Addresses Codex P3 review finding: when shouldRotateAssistant fires on
idleTimedOut alone (timedOut=false), mergeRetryFailoverReason was passed
timedOut: params.timedOut (false), so the accumulated retry reason did
not record 'timeout'. Pass timedOut || idleTimedOut so the timeout reason
survives idle-only rotations and downstream fallback_model receives the
correct reason.
- failover-policy.test.ts: move 4 new it() blocks inside describe()
(they were orphaned outside the block and would not execute)
- run.ts: add idleTimedOut to the assistantFailoverDecision call site
(missing required field caused TypeScript error and reproduced the freeze
for the initial-decision code path in the outer loop)
- assistant-failover.ts: treat idleTimedOut same as timedOut in
markFailedProfile to avoid incorrect profile failure recording
- assistant-failover.ts: add warn log when idle timeout rotates a profile
- assistant-failover.ts: extend resolveAssistantFailoverErrorMessage to
accept idleTimedOut so surface_error emits "LLM request timed out."
instead of the generic "LLM request failed."
When the LLM idle watchdog fires (model produced no tokens for N seconds),
idleTimedOut is set in handleAssistantFailover but was never passed into
resolveRunFailoverDecision. As a result, shouldRotateAssistant saw neither
failoverReason nor timedOut (the run-budget timeout) set, returned false,
and the decision fell through to continue_normal -- the agent silently froze
without surfacing an error or advancing the fallback chain.
Fixes#76877 (regression since 2026.4.24).
Changes:
- failover-policy.ts: add idleTimedOut to AssistantDecisionParams; include it
in shouldRotateAssistant and reason selection in resolveRunFailoverDecision
- assistant-failover.ts: pass idleTimedOut into resolveRunFailoverDecision
- failover-policy.test.ts: 4 new cases for idle timeout path; update existing
assistant stage cases with the new required field (idleTimedOut: false)
- Forward temperature and top_p through OpenAI-compatible chat and responses gateway paths.
- Return OpenAI-compatible 400 errors for invalid sampling params and provider validation failures instead of collapsing them to 500s.
- Add regression coverage and changelog credit.
Co-authored-by: lellansin <lellansin@gmail.com>