diff --git a/CHANGELOG.md b/CHANGELOG.md
index b7f6e4a2ee3..d7b12106e18 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -399,7 +399,7 @@ Docs: https://docs.openclaw.ai
 - Agents/sessions: stop session write-lock timeouts from entering model failover, so local lock contention surfaces directly instead of cascading across providers. (#68700) Thanks @MonkeyLeeT.
 - Auto-reply: run inbound reply delivery through `message_sending` hooks so plugins can transform or cancel generated replies before they are sent. (#70118) Thanks @jzakirov.
 - CI/release-checks: pass workflow inputs and matrix values through step environment variables instead of embedding them directly into `run:` shell commands, reducing template-injection surface in the cross-OS release-check workflow. (#66884) Thanks @alexlomt.
-- Agents/failover: classify the bare `An unknown error occurred` stream-wrapper message that pi-ai providers throw when streams end with `stopReason: "aborted" | "error"` as a transient timeout regardless of provider, so configured fallback chains rotate for non-Anthropic providers (Google, OpenRouter, Bedrock, etc.) instead of surfacing the literal string to users. Fixes #71620. Thanks @mattcproctor.
+- Agents/failover: classify the bare `An unknown error occurred` stream-wrapper message that pi-ai providers throw when streams end with `stopReason: "aborted" | "error"` as a transient timeout regardless of provider, so configured fallback chains rotate for non-Anthropic providers (Google, OpenRouter, Bedrock, etc.) instead of surfacing the literal string to users. Fixes #71620. Thanks @willamhou and @mattcproctor.
 ## 2026.4.23
diff --git a/docs/concepts/model-failover.md b/docs/concepts/model-failover.md
index 14a7a859a03..55a41711708 100644
--- a/docs/concepts/model-failover.md
+++ b/docs/concepts/model-failover.md
@@ -129,15 +129,18 @@ validation failures) are treated as failover‑worthy and use the same cooldowns
 OpenAI-compatible stop-reason errors such as `Unhandled stop reason: error`, `stop reason: error`, and `reason: error` are classified as timeout/failover signals.
-Provider-scoped generic server text can also land in that timeout bucket when
-the source matches a known transient pattern. For example, Anthropic bare
-`An unknown error occurred` and JSON `api_error` payloads with transient server
-text such as `internal server error`, `unknown error, 520`, `upstream error`,
-or `backend error` are treated as failover-worthy timeouts. OpenRouter-specific
-generic upstream text such as bare `Provider returned error` is also treated as
-timeout only when the provider context is actually OpenRouter. Generic internal
-fallback text such as `LLM request failed with an unknown error.` stays
-conservative and does not trigger failover by itself.
+Generic server text can also land in that timeout bucket when the source matches
+a known transient pattern. For example, the bare pi-ai stream-wrapper message
+`An unknown error occurred` is treated as failover-worthy for every provider
+because pi-ai emits it when provider streams end with `stopReason: "aborted"` or
+`stopReason: "error"` without specific details. JSON `api_error` payloads with
+transient server text such as `internal server error`, `unknown error, 520`,
+`upstream error`, or `backend error` are also treated as failover-worthy
+timeouts.
+OpenRouter-specific generic upstream text such as bare `Provider returned error`
+is treated as timeout only when the provider context is actually OpenRouter.
+Generic internal fallback text such as `LLM request failed with an unknown
+error.` stays conservative and does not trigger failover by itself.
 Some provider SDKs may otherwise sleep for a long `Retry-After` window before returning control to OpenClaw. For Stainless-based SDKs such as Anthropic and