Commit Graph

3 Commits

Author SHA1 Message Date
Onur Solmaz
530d576cc6 fix(agents): time out local streams without first event (#98525)
* fix(agents): time out streams without first event

* fix(agents): propagate first-event stream timeout

* fix(agents): abort streams on first-event timeout

* fix(agents): time out chatgpt websocket streams

* fix(agents): clamp first event stream timeouts

* fix(agents): keep first event timeout internal

* fix(agents): classify first event stream stalls as timeouts

* fix(agents): preserve provider first-event timeouts

* fix(agents): satisfy first-event timeout lint

* fix(agents): classify first-event stalls as idle timeouts

* test(tooling): include transcript store helper route
2026-07-01 23:01:46 +08:00
weiqinl
552ec2b49d fix(opencode-go): re-arm idle timer on block-boundary events to prevent false stalled-stream abort (#97128)
* fix(opencode-go): re-arm idle timer on block-boundary events to prevent false stalled-stream abort

When the opencode-go model finalizes a tool call and deliberates before
the next one, the provider emits real block-boundary SSE events
(text_end, thinking_end, toolcall_start, toolcall_end) that prove the
socket is alive, but the watchdog's isProviderProgressEvent only
returned true for token deltas (text_delta, thinking_delta,
toolcall_delta). This caused the idle timer to fire and falsely abort a
live stream, replacing a completed answer with a stalled error and
dropping the provider's real done event.

Fix: include block-boundary events in isProviderProgressEvent so the
idle timer is re-armed on any forward-progress provider event.
text_start and thinking_start are intentionally excluded because they
are synthetic preamble events that should not shorten the first-event
window.

Closes #96518

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(opencode-go): satisfy lint in stream regression

* test(opencode-go): satisfy lint in stream regression

* test(opencode-go): satisfy lint in stream regression

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-06-27 09:48:42 +08:00
zhang-guiping
769579bcf0 fix(opencode-go): streaming completes when provider ends responses (#93965)
* fix(opencode-go): abort stalled SSE streams at provider-owned raw boundary

opencode-go routes through the shared OpenAI-compatible completions provider,
where a stalled SSE socket (provider emits tokens then never closes the stream)
hangs the gateway until stuckSessionAbortMs (~622s) and surfaces as
'LLM request failed' / 'Request was aborted'. Issue #93610 reports ~90% of
opencode-go cron jobs failing intermittently this way.

Add a provider-owned stream wrapper at the opencode-go raw SSE boundary that
injects an AbortController into the underlying OpenAI SDK request and aborts
it after a configurable idle window (default 30s, far below 622s) elapses
without any forward-progress event. The wrapper is:

- Provider-scoped: only applies when model.provider === 'opencode-go'; the
  shared openai-completions.ts path is untouched.
- Abortable: calls controller.abort() on the injected AbortSignal, which
  propagates through OpenAI SDK requestOptions.signal and genuinely
  interrupts the underlying fetch/stream (not just iterator return()).
- Idle-based: every event (text/tool/thinking delta, including delayed
  usage-only chunks) refreshes the timer; natural completion (done/error)
  cancels it. Normal delayed usage-only completion is preserved.
- Boundary-terminal: pushes a terminal { type: 'error', reason: 'aborted' }
  event downstream so consumers do not hang.

TDD: stream-termination.test.ts covers (a) stalled stream after first
progress is aborted within the idle window with a downstream 'aborted'
terminal event, and (b) normal delayed completion within the idle window
is not aborted and the done event is forwarded unchanged.

* fix(opencode-go): align stalled-stream idle default with runtime (120s)

Match the runtime's shared `DEFAULT_LLM_IDLE_TIMEOUT_MS` (120s) so
non-cron interactive opencode-go runs see no behavior change versus the
existing watchdog. Cron runs — for which the runtime disables its idle
watchdog entirely (`resolveLlmIdleTimeoutMs` returns 0 when trigger is
cron and no explicit timeout is set) — still get provider-owned
termination well before the ~622s stuck-session recovery.

Refs #93610

* fix(opencode-go): satisfy CI lint and test type checks

- Remove unnecessary `?? {}` fallback in spread (oxlint
  no-useless-fallback-in-spread).
- Drop non-narrowing `!` on the wrapper return type; use
  `await Promise.resolve(...)` to collapse the
  `StreamLike | Promise<StreamLike>` union before `for await`.

Refs #93610

* fix(opencode-go): arm stalled-stream idle timer only after first event

The wrapper armed the idle timer before the first upstream event, which
would mis-abort slow time-to-first-byte requests — including the
opencode-go cron runs that the runtime deliberately leaves uncapped via
resolveLlmIdleTimeoutMs. Arm only after the first forwarded event, and
add regression coverage for the slow-first-event path.

* fix(opencode-go): cover stalled stream first event

* fix(opencode-go): respect explicit stream timeout

* fix(opencode-go): preserve first-event timer after synthetic start

* fix(opencode-go): satisfy stream termination test lint

* fix(opencode-go): distinguish synthetic stream preambles

* fix(opencode-go): route stalled streams through failover
2026-06-22 19:57:21 +00:00