zhang-guiping 769579bcf0 fix(opencode-go): streaming completes when provider ends responses (#93965)
* fix(opencode-go): abort stalled SSE streams at provider-owned raw boundary

opencode-go routes through the shared OpenAI-compatible completions provider,
where a stalled SSE socket (provider emits tokens then never closes the stream)
hangs the gateway until stuckSessionAbortMs (~622s) and surfaces as
'LLM request failed' / 'Request was aborted'. Issue #93610 reports ~90% of
opencode-go cron jobs failing intermittently this way.

Add a provider-owned stream wrapper at the opencode-go raw SSE boundary that
injects an AbortController into the underlying OpenAI SDK request and aborts
it after a configurable idle window (default 30s, far below 622s) elapses
without any forward-progress event. The wrapper is:

- Provider-scoped: only applies when model.provider === 'opencode-go'; the
  shared openai-completions.ts path is untouched.
- Abortable: calls controller.abort() on the injected AbortSignal, which
  propagates through OpenAI SDK requestOptions.signal and genuinely
  interrupts the underlying fetch/stream (not just iterator return()).
- Idle-based: every event (text, tool, or thinking delta, as well as
  delayed usage-only chunks) refreshes the timer, and natural completion
  (done/error) cancels it, so a normal delayed usage-only completion is
  never aborted.
- Boundary-terminal: pushes a terminal { type: 'error', reason: 'aborted' }
  event downstream so consumers do not hang.
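
The four properties above can be sketched as a small watchdog. This is an illustration only, not the wrapper added by this commit: `IdleWatchdog`, `Scheduler`, `touch`, and `finish` are hypothetical names, and the injectable scheduler exists only so the timer logic can be exercised without real delays.

```typescript
// Hypothetical sketch; none of these names come from the opencode-go source.
type TerminalEvent = { type: "error"; reason: "aborted" };

// Timer abstraction so the idle logic can be driven by a fake clock in tests.
interface Scheduler {
  set(fn: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

const realScheduler: Scheduler = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

class IdleWatchdog {
  private readonly controller = new AbortController();
  private readonly idleMs: number;
  private readonly onAbort: (event: TerminalEvent) => void;
  private readonly scheduler: Scheduler;
  private handle: unknown = undefined;
  private settled = false;

  constructor(
    idleMs: number,
    onAbort: (event: TerminalEvent) => void,
    scheduler: Scheduler = realScheduler,
  ) {
    this.idleMs = idleMs;
    this.onAbort = onAbort;
    this.scheduler = scheduler;
  }

  // Signal to hand to the underlying request so abort() interrupts the fetch.
  get signal(): AbortSignal {
    return this.controller.signal;
  }

  // Call for every forwarded event: restarts the idle window.
  touch(): void {
    if (this.settled) return;
    this.scheduler.clear(this.handle);
    this.handle = this.scheduler.set(() => this.expire(), this.idleMs);
  }

  // Call on natural completion (done/error): cancels the pending timer.
  finish(): void {
    this.settled = true;
    this.scheduler.clear(this.handle);
  }

  // Idle window elapsed: abort the request and emit the terminal event.
  private expire(): void {
    if (this.settled) return;
    this.settled = true;
    this.controller.abort();
    this.onAbort({ type: "error", reason: "aborted" });
  }
}
```

In this sketch the caller would pass `watchdog.signal` as the `signal` request option of the SDK call, invoke `touch()` for each forwarded event, and invoke `finish()` when the stream ends on its own.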

TDD: stream-termination.test.ts covers (a) stalled stream after first
progress is aborted within the idle window with a downstream 'aborted'
terminal event, and (b) normal delayed completion within the idle window
is not aborted and the done event is forwarded unchanged.

* fix(opencode-go): align stalled-stream idle default with runtime (120s)

Match the runtime's shared `DEFAULT_LLM_IDLE_TIMEOUT_MS` (120s) so
non-cron interactive opencode-go runs see no behavior change versus the
existing watchdog. Cron runs still get provider-owned termination well
before the ~622s stuck-session recovery, even though the runtime disables
its idle watchdog for them entirely (`resolveLlmIdleTimeoutMs` returns 0
when trigger is cron and no explicit timeout is set).
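
The interaction between the two timeouts can be laid out as a small decision table. The functions below are hypothetical stand-ins, not the runtime's `resolveLlmIdleTimeoutMs` or the provider wrapper. In particular, how an explicit timeout is honored on either side is an assumption made for illustration.

```typescript
// Hypothetical sketch; names and shapes are illustrative, not the runtime's API.
const SKETCH_DEFAULT_IDLE_MS = 120_000; // the 120s default named in this commit

interface RunContext {
  trigger: "cron" | "interactive";
  explicitTimeoutMs?: number;
}

// Stand-in for the runtime watchdog; 0 means "disabled".
// Assumption: an explicit timeout, when present, wins.
function sketchRuntimeIdleMs(ctx: RunContext): number {
  if (ctx.explicitTimeoutMs !== undefined) return ctx.explicitTimeoutMs;
  return ctx.trigger === "cron" ? 0 : SKETCH_DEFAULT_IDLE_MS;
}

// Stand-in for the provider-owned idle window; never disabled by default.
// Assumption: it also defers to an explicit timeout.
function sketchProviderIdleMs(ctx: RunContext): number {
  return ctx.explicitTimeoutMs ?? SKETCH_DEFAULT_IDLE_MS;
}
```

Under these assumptions a cron run with no explicit timeout has no runtime watchdog (0) but still gets a 120s provider-owned window, while an interactive run sees 120s from both.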

Refs #93610

* fix(opencode-go): satisfy CI lint and test type checks

- Remove unnecessary `?? {}` fallback in spread (oxlint
  no-useless-fallback-in-spread).
- Drop non-narrowing `!` on the wrapper return type; use
  `await Promise.resolve(...)` to collapse the
  `StreamLike | Promise<StreamLike>` union before `for await`.
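
A minimal sketch of the union-collapsing pattern from the second bullet. `StreamLike` is modeled here as an `AsyncIterable<string>`, and `openStream` and `collect` are hypothetical helpers. The real type and call sites may differ.

```typescript
// Hypothetical sketch; StreamLike's real definition is not shown in this commit.
type StreamLike = AsyncIterable<string>;

async function* sampleStream(): AsyncGenerator<string> {
  yield "a";
  yield "b";
}

// A wrapper that may hand back the stream directly or wrapped in a promise.
function openStream(asPromise: boolean): StreamLike | Promise<StreamLike> {
  return asPromise ? Promise.resolve(sampleStream()) : sampleStream();
}

async function collect(
  maybeStream: StreamLike | Promise<StreamLike>,
): Promise<string[]> {
  // Promise.resolve accepts both arms of the union, so after the await the
  // value is a plain StreamLike with no non-null assertion needed.
  const stream = await Promise.resolve(maybeStream);
  const chunks: string[] = [];
  for await (const chunk of stream) chunks.push(chunk);
  return chunks;
}
```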

Refs #93610

* fix(opencode-go): arm stalled-stream idle timer only after first event

The wrapper armed the idle timer before the first upstream event, which
would mis-abort slow time-to-first-byte requests — including the
opencode-go cron runs that the runtime deliberately leaves uncapped via
resolveLlmIdleTimeoutMs. Arm only after the first forwarded event, and
add regression coverage for the slow-first-event path.
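
The first-event rule can be sketched as an async generator that races each read against an idle timer, but only once `armed` is set. This is an illustration under assumed names (`abortWhenIdle`, `StreamEvent`), not the code in this commit. It also omits the synthetic-preamble handling mentioned in later commits.

```typescript
// Hypothetical sketch; event shapes and names are assumptions.
type StreamEvent =
  | { type: "delta"; text: string }
  | { type: "done" }
  | { type: "error"; reason: string };

async function* abortWhenIdle(
  source: AsyncIterable<StreamEvent>,
  idleMs: number,
  controller: AbortController,
): AsyncGenerator<StreamEvent> {
  const it = source[Symbol.asyncIterator]();
  let armed = false; // stays false until the first upstream event is forwarded

  while (true) {
    const read = it.next().then((result) => ({ kind: "event" as const, result }));

    let cancel = (): void => {};
    const idle = new Promise<{ kind: "idle" }>((resolve) => {
      if (!armed) return; // slow time-to-first-byte is never capped here
      const timer = setTimeout(() => resolve({ kind: "idle" }), idleMs);
      cancel = () => clearTimeout(timer);
    });

    const winner = await Promise.race([read, idle]);
    cancel();

    if (winner.kind === "idle") {
      controller.abort(); // interrupts the underlying request via its signal
      read.catch(() => undefined); // ignore the rejection the abort may cause
      yield { type: "error", reason: "aborted" };
      return;
    }

    const { result } = winner;
    if (result.done) return;
    armed = true;
    yield result.value;
    if (result.value.type === "done" || result.value.type === "error") return;
  }
}
```

With this shape, a request that takes longer than `idleMs` to produce its first event is left alone, while a stream that goes quiet after producing one is aborted and closed with a terminal event.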

* fix(opencode-go): cover stalled stream first event

* fix(opencode-go): respect explicit stream timeout

* fix(opencode-go): preserve first-event timer after synthetic start

* fix(opencode-go): satisfy stream termination test lint

* fix(opencode-go): distinguish synthetic stream preambles

* fix(opencode-go): route stalled streams through failover
2026-06-22 19:57:21 +00:00