* fix(agents): run heartbeat_prompt_contribution on harness prompt builds
Harness runtimes (e.g. the Codex app-server) assemble the prompt through
resolveAgentHarnessBeforePromptBuildResult rather than the embedded runner's
resolvePromptBuildHookResult. The harness helper ran before_prompt_build and
before_agent_start but never invoked heartbeat_prompt_contribution, so that hook
silently no-ops on those runtimes: plugins that contribute heartbeat context via
the documented hook get nothing on heartbeat turns.
Invoke heartbeat_prompt_contribution from the harness helper too, gated on
ctx.trigger === "heartbeat", merging its prepend/append context ahead of the
before_prompt_build / before_agent_start contributions (matching the embedded
path's ordering). before_prompt_build appendContext is already honored here, so
no change is needed for boot-style append contributions.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(agents): preserve heartbeat hook ordering
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Vincent Koc <25068+vincentkoc@users.noreply.github.com>
* fix(exec): preserve turn-source routing target in approval followups for plugin channels
When an async exec approval is resolved and the originating session is
resumed, buildAgentFollowupArgs forwarded the turn-source to/accountId/threadId
only for built-in deliverable channels or gateway-internal channels. For an
external channel plugin whose channel is not in the in-process deliverable set,
the followup dispatched channel alone and dropped the recipient, so the resumed
agent reply routed to webchat instead of the originating channel.
Forward the turn-source routing fields whenever the resolved delivery target is
not used, matching how the channel itself is already preserved, so the gateway
can route the post-approval reply back to the originating channel.
Fixes#96103
* fix(exec): normalize followup thread routing
* fix(exec): normalize followup thread routing
---------
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
Subagent runs do not have a live stream consumer; their result is delivered from
the terminal message path after the child run finishes. The intermediate
message_update stream work only feeds live preview output.
Thread suppressLiveStreamOutput from the subagent lane into the embedded runner
subscription and return from handleMessageUpdate after accumulating the raw
chunk. This keeps final delivery unchanged while skipping per-chunk visible text
and reasoning stream parsing for subagents, which reduces event-loop pressure
when multiple child agents stream long answers in parallel.
Interactive and Control UI runs keep the existing live preview path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
(cherry picked from commit e0382c2c58c3eabdf64638777ec82cb1e68514e9)
* fix(model-fallback): classify Codex usage-limit payloads
* test: add real behavior proof for Codex usage-limit fallback
Adds a permanent real behavior proof test that exercises the production
classifyEmbeddedAgentRunResultForModelFallback() classifier with the exact
Codex subscription usage-limit error text.
Covers:
- Primary path: isError payload with usage-limit text -> rate_limit fallback
- Non-error payload: same text as normal assistant output -> no fallback
- Visible output already delivered -> no fallback
- Cross-provider: same text via openrouter -> rate_limit fallback
* fix(fallback-classifier): guard on finalAssistantVisibleText delivery evidence
When finalAssistantVisibleText contains real visible output (non-empty,
non-silent-reply), the agent already delivered a response to the user.
The classifier must not trigger model fallback in that case, because the
user already has their answer and rotating models would only burn quota
without improving the outcome.
Adds a guard in classifyEmbeddedAgentRunResultForModelFallback() that
checks finalAssistantVisibleText after committed outbound delivery
evidence and before the hook_block check. Uses the existing
isSilentReplyPayloadText() helper to avoid suppressing NO_REPLY and
similar intentional silent tokens.
This fixes the already-delivered-output test case in the Codex
usage-limit real behavior proof test.
* fix(test): use toEqual for cross-provider proof test type safety
The ModelFallbackResultClassification union includes { error: unknown },
so accessing .reason/.code after not.toBeNull() fails type checking.
Use toEqual with the full expected object instead, matching the pattern
used in result-fallback-classifier.test.ts.
* fix(model-fallback): refresh usage-limit fallback
Signed-off-by: sallyom <somalley@redhat.com>
---------
Signed-off-by: sallyom <somalley@redhat.com>
Co-authored-by: sallyom <somalley@redhat.com>
The Google embedded-agent prompt-cache helpers parsed cachedContents
metadata with an unbounded `await response.json()` in both
createGooglePromptCache and updateGooglePromptCacheTtl. A buggy or
hostile Generative Language endpoint returning a 200 with a large or
never-ending body (especially with no Content-Length) would be fully
buffered into memory before parsing, with the existing
cancelUnreadResponseBody guard firing too late (json() already drained
the body).
Route both reads through the shared streaming byte-cap reader
(readResponseWithLimit) under a 1 MiB cap, cancelling the stream on
overflow instead of buffering it, then JSON.parse the bounded buffer.
This is the symmetric Google-endpoint counterpart to the Anthropic
error-stream and gateway pricing-catalog bounds.
Adds regressions that stream an oversized no-Content-Length body through
the real create and TTL-refresh paths and assert the body is cancelled.
The OpenRouter /models catalog read in fetchOpenRouterModels hardened only
the error/early-return path (dbd5689 cancels the body when res.bodyUsed is
false), but the success branch still buffered the whole body with an
unbounded `await res.json()`. The response is a provider-controlled,
runtime-fetched body, so a faulty or hostile provider can stream an
effectively unbounded JSON document and exhaust process memory before the
parse completes; the finally-cancel is a no-op once .json() has drained.
Read the success body through the canonical byte-cap reader
(readResponseWithLimit) under a 4 MiB ceiling before JSON.parse, cancelling
the stream on overflow and bounding idle stalls with the call's existing
timeout. This is the symmetric success-path counterpart to the bounded-stream
hardening landed in #95103 (pricing catalog) and #95108 (Anthropic error
streams), reusing the same helper rather than a new abstraction.
* fix(agents): bound OpenRouter model catalog response reads
The runtime OpenRouter model-capability detector fetched the full
/models catalog with an unbounded `await response.json()`, so a
compromised or misbehaving endpoint could stream an arbitrarily large
body and force the process to buffer the whole payload before parsing.
Read the body through the shared bounded reader instead, capping it at
16 MiB (matching the sibling pricing-cache endpoint hardened in #95103)
and cancelling the stream on overflow. This mirrors the symmetric
bound-stream fixes in #95103 and #95108.
Adds coverage that an oversized streamed catalog is cancelled instead of
buffered and that an under-cap chunked body still reassembles, parses,
and round-trips through the SQLite cache on a fresh import.
* fix(agents): avoid OpenRouter refetch after capped catalog miss
---------
Co-authored-by: sallyom <somalley@redhat.com>
* fix(cron): trim trailing whitespace from recognized job object keys (#95407)
Some tool-call extraction/serialization pipelines can produce cron object
keys with trailing spaces (e.g. 'schedule ' instead of 'schedule'), causing
gateway validation to reject the job.
Add repairPaddedCronKeys() to canonicalizeCronToolObject() that trims only
recognized CRON_RECOVERABLE_OBJECT_KEYS. Non-recognized keys (including
special ones like '__proto__') are never trimmed, preventing prototype
pollution. When both padded and canonical forms exist, the canonical key
wins.
Tests:
- add job with trailing-space keys -> trimmed
- update patch with trailing-space keys -> trimmed
- non-recognized padded keys left intact (safety)
- canonical key preserved over padded duplicate
- clean keys unchanged
133 tests pass (128 existing + 5 new).
* fix(cron): preserve padded duplicate keys when canonical form already exists (#95407)
When both a padded key (e.g. 'schedule ') and its canonical form
('schedule') exist, the padded key is now preserved so strict gateway
validation rejects the ambiguous input rather than silently picking one
value. Only padded keys without a canonical counterpart are trimmed.
* fix(opencode-go): abort stalled SSE streams at provider-owned raw boundary
opencode-go routes through the shared OpenAI-compatible completions provider,
where a stalled SSE socket (provider emits tokens then never closes the stream)
hangs the gateway until stuckSessionAbortMs (~622s) and surfaces as
'LLM request failed' / 'Request was aborted'. Issue #93610 reports ~90% of
opencode-go cron jobs failing intermittently this way.
Add a provider-owned stream wrapper at the opencode-go raw SSE boundary that
injects an AbortController into the underlying OpenAI SDK request and aborts
it after a configurable idle window (default 30s, far below 622s) elapses
without any forward-progress event. The wrapper is:
- Provider-scoped: only applies when model.provider === 'opencode-go'; the
shared openai-completions.ts path is untouched.
- Abortable: calls controller.abort() on the injected AbortSignal, which
propagates through OpenAI SDK requestOptions.signal and genuinely
interrupts the underlying fetch/stream (not just iterator return()).
- Idle-based: every event (text/tool/thinking delta, including delayed
usage-only chunks) refreshes the timer; natural completion (done/error)
cancels it. Normal delayed usage-only completion is preserved.
- Boundary-terminal: pushes a terminal { type: 'error', reason: 'aborted' }
event downstream so consumers do not hang.
TDD: stream-termination.test.ts covers (a) stalled stream after first
progress is aborted within the idle window with a downstream 'aborted'
terminal event, and (b) normal delayed completion within the idle window
is not aborted and the done event is forwarded unchanged.
* fix(opencode-go): align stalled-stream idle default with runtime (120s)
Match the runtime's shared `DEFAULT_LLM_IDLE_TIMEOUT_MS` (120s) so
non-cron interactive opencode-go runs see no behavior change versus the
existing watchdog. Cron runs — for which the runtime disables its idle
watchdog entirely (`resolveLlmIdleTimeoutMs` returns 0 when trigger is
cron and no explicit timeout is set) — still get provider-owned
termination well before the ~622s stuck-session recovery.
Refs #93610
* fix(opencode-go): satisfy CI lint and test type checks
- Remove unnecessary `?? {}` fallback in spread (oxlint
no-useless-fallback-in-spread).
- Drop non-narrowing `!` on the wrapper return type; use
`await Promise.resolve(...)` to collapse the
`StreamLike | Promise<StreamLike>` union before `for await`.
Refs #93610
* fix(opencode-go): arm stalled-stream idle timer only after first event
The wrapper armed the idle timer before the first upstream event, which
would mis-abort slow time-to-first-byte requests — including the
opencode-go cron runs that the runtime deliberately leaves uncapped via
resolveLlmIdleTimeoutMs. Arm only after the first forwarded event, and
add regression coverage for the slow-first-event path.
* fix(opencode-go): cover stalled stream first event
* fix(opencode-go): respect explicit stream timeout
* fix(opencode-go): preserve first-event timer after synthetic start
* fix(opencode-go): satisfy stream termination test lint
* fix(opencode-go): distinguish synthetic stream preambles
* fix(opencode-go): route stalled streams through failover
The generic assistant error text "LLM request failed." (GENERIC_ASSISTANT_ERROR_TEXT) is
produced by formatUserFacingAssistantErrorText when the underlying provider error cannot
be formatted into a specific category. For local providers (LM Studio, Ollama) this wraps
connection/availability failures when the model is not loaded or the endpoint is unreachable.
Without this match, the error is not classified as any transient type (rate_limit, overloaded,
network, server_error, timeout), so cron retry and payload.fallbacks never engage — even
though the configured fallback chain should handle provider availability failures.
Add /^llm request failed\.$/i as an exact-match regex in the timeout error patterns. This
strictly matches only the bare "LLM request failed." string, not variants like
"LLM request failed: provider rejected the request schema or tool payload." (which is a
format/schema error, not transient). Variants with specific transient reasons (connection
refused, network error, etc.) are classified through their own existing patterns.
Closes#93931