From 209522e2e091608cbafb9e6fbbe13293f6ce0f63 Mon Sep 17 00:00:00 2001 From: Vincent Koc Date: Sun, 26 Apr 2026 03:59:53 -0700 Subject: [PATCH] docs(model-failover): rewrite with Steps for runtime flow and rotation, AccordionGroup for cooldown buckets and chain rules, Tabs for which errors advance fallback --- docs/concepts/model-failover.md | 318 ++++++++++++++------------------ 1 file changed, 140 insertions(+), 178 deletions(-) diff --git a/docs/concepts/model-failover.md b/docs/concepts/model-failover.md index 55a41711708..3b200709505 100644 --- a/docs/concepts/model-failover.md +++ b/docs/concepts/model-failover.md @@ -5,6 +5,7 @@ read_when: - Updating failover rules for auth profiles or models - Understanding how session model overrides interact with fallback retries title: "Model failover" +sidebarTitle: "Model failover" --- OpenClaw handles failures in two stages: @@ -18,29 +19,31 @@ This doc explains the runtime rules and the data that backs them. For a normal text run, OpenClaw evaluates candidates in this order: -1. The currently selected session model. -2. Configured `agents.defaults.model.fallbacks` in order. -3. The configured primary model at the end when the run started from an override. + + + Resolve the active session model and auth-profile preference. + + + Build the model candidate chain from the currently selected session model, then `agents.defaults.model.fallbacks` in order, ending with the configured primary when the run started from an override. + + + Try the current provider with auth-profile rotation/cooldown rules. + + + If that provider is exhausted with a failover-worthy error, move to the next model candidate. + + + Persist the selected fallback override before the retry starts so other session readers see the same provider/model the runner is about to use. + + + If the fallback candidate fails, roll back only the fallback-owned session override fields when they still match that failed candidate. + + + If every candidate fails, throw a `FallbackSummaryError` with per-attempt detail and the soonest cooldown expiry when one is known. + + -Inside each candidate, OpenClaw tries auth-profile failover before advancing to -the next model candidate. - -High-level sequence: - -1. Resolve the active session model and auth-profile preference. -2. Build the model candidate chain. -3. Try the current provider with auth-profile rotation/cooldown rules. -4. If that provider is exhausted with a failover-worthy error, move to the next - model candidate. -5. Persist the selected fallback override before the retry starts so other - session readers see the same provider/model the runner is about to use. -6. If the fallback candidate fails, roll back only the fallback-owned session - override fields when they still match that failed candidate. -7. If every candidate fails, throw a `FallbackSummaryError` with per-attempt - detail and the soonest cooldown expiry when one is known. - -This is intentionally narrower than "save and restore the whole session". The -reply runner only persists the model-selection fields it owns for fallback: +This is intentionally narrower than "save and restore the whole session". The reply runner only persists the model-selection fields it owns for fallback: - `providerOverride` - `modelOverride` @@ -48,9 +51,7 @@ reply runner only persists the model-selection fields it owns for fallback: - `authProfileOverrideSource` - `authProfileOverrideCompactionCount` -That prevents a failed fallback retry from overwriting newer unrelated session -mutations such as manual `/model` changes or session rotation updates that -happened while the attempt was running. +That prevents a failed fallback retry from overwriting newer unrelated session mutations such as manual `/model` changes or session rotation updates that happened while the attempt was running. ## Auth storage (keys + OAuth) @@ -61,7 +62,7 @@ OpenClaw uses **auth profiles** for both API keys and OAuth tokens. - Config `auth.profiles` / `auth.order` are **metadata + routing only** (no secrets). - Legacy import-only OAuth file: `~/.openclaw/credentials/oauth.json` (imported into `auth-profiles.json` on first use). -More detail: [/concepts/oauth](/concepts/oauth) +More detail: [OAuth](/concepts/oauth) Credential types: @@ -81,9 +82,17 @@ Profiles live in `~/.openclaw/agents//agent/auth-profiles.json` under ` When a provider has multiple profiles, OpenClaw chooses an order like this: -1. **Explicit config**: `auth.order[provider]` (if set). -2. **Configured profiles**: `auth.profiles` filtered by provider. -3. **Stored profiles**: entries in `auth-profiles.json` for the provider. + + + `auth.order[provider]` (if set). + + + `auth.profiles` filtered by provider. + + + Entries in `auth-profiles.json` for the provider. + + If no explicit order is configured, OpenClaw uses a round‑robin order: @@ -93,20 +102,17 @@ If no explicit order is configured, OpenClaw uses a round‑robin order: ### Session stickiness (cache-friendly) -OpenClaw **pins the chosen auth profile per session** to keep provider caches warm. -It does **not** rotate on every request. The pinned profile is reused until: +OpenClaw **pins the chosen auth profile per session** to keep provider caches warm. It does **not** rotate on every request. The pinned profile is reused until: - the session is reset (`/new` / `/reset`) - a compaction completes (compaction count increments) - the profile is in cooldown/disabled -Manual selection via `/model …@` sets a **user override** for that session -and is not auto‑rotated until a new session starts. +Manual selection via `/model …@` sets a **user override** for that session and is not auto-rotated until a new session starts. -Auto‑pinned profiles (selected by the session router) are treated as a **preference**: -they are tried first, but OpenClaw may rotate to another profile on rate limits/timeouts. -User‑pinned profiles stay locked to that profile; if it fails and model fallbacks -are configured, OpenClaw moves to the next model instead of switching profiles. + +Auto-pinned profiles (selected by the session router) are treated as a **preference**: they are tried first, but OpenClaw may rotate to another profile on rate limits/timeouts. User-pinned profiles stay locked to that profile; if it fails and model fallbacks are configured, OpenClaw moves to the next model instead of switching profiles. + ### Why OAuth can "look lost" @@ -117,45 +123,31 @@ If you have both an OAuth profile and an API key profile for the same provider, ## Cooldowns -When a profile fails due to auth/rate‑limit errors (or a timeout that looks -like rate limiting), OpenClaw marks it in cooldown and moves to the next profile. -That rate-limit bucket is broader than plain `429`: it also includes provider -messages such as `Too many concurrent requests`, `ThrottlingException`, -`concurrency limit reached`, `workers_ai ... quota limit exceeded`, -`throttled`, `resource exhausted`, and periodic usage-window limits such as -`weekly/monthly limit reached`. -Format/invalid‑request errors (for example Cloud Code Assist tool call ID -validation failures) are treated as failover‑worthy and use the same cooldowns. -OpenAI-compatible stop-reason errors such as `Unhandled stop reason: error`, -`stop reason: error`, and `reason: error` are classified as timeout/failover -signals. -Generic server text can also land in that timeout bucket when the source matches -a known transient pattern. For example, the bare pi-ai stream-wrapper message -`An unknown error occurred` is treated as failover-worthy for every provider -because pi-ai emits it when provider streams end with `stopReason: "aborted"` or -`stopReason: "error"` without specific details. JSON `api_error` payloads with -transient server text such as `internal server error`, `unknown error, 520`, -`upstream error`, or `backend error` are also treated as failover-worthy -timeouts. -OpenRouter-specific generic upstream text such as bare `Provider returned error` -is treated as timeout only when the provider context is actually OpenRouter. -Generic internal fallback text such as `LLM request failed with an unknown -error.` stays conservative and does not trigger failover by itself. +When a profile fails due to auth/rate-limit errors (or a timeout that looks like rate limiting), OpenClaw marks it in cooldown and moves to the next profile. -Some provider SDKs may otherwise sleep for a long `Retry-After` window before -returning control to OpenClaw. For Stainless-based SDKs such as Anthropic and -OpenAI, OpenClaw caps SDK-internal `retry-after-ms` / `retry-after` waits at 60 -seconds by default and surfaces longer retryable responses immediately so this -failover path can run. Tune or disable the cap with -`OPENCLAW_SDK_RETRY_MAX_WAIT_SECONDS`; see [/concepts/retry](/concepts/retry). + + + That rate-limit bucket is broader than plain `429`: it also includes provider messages such as `Too many concurrent requests`, `ThrottlingException`, `concurrency limit reached`, `workers_ai ... quota limit exceeded`, `throttled`, `resource exhausted`, and periodic usage-window limits such as `weekly/monthly limit reached`. -Rate-limit cooldowns can also be model-scoped: + Format/invalid-request errors (for example Cloud Code Assist tool call ID validation failures) are treated as failover-worthy and use the same cooldowns. OpenAI-compatible stop-reason errors such as `Unhandled stop reason: error`, `stop reason: error`, and `reason: error` are classified as timeout/failover signals. -- OpenClaw records `cooldownModel` for rate-limit failures when the failing - model id is known. -- A sibling model on the same provider can still be tried when the cooldown is - scoped to a different model. -- Billing/disabled windows still block the whole profile across models. + Generic server text can also land in that timeout bucket when the source matches a known transient pattern. For example, the bare pi-ai stream-wrapper message `An unknown error occurred` is treated as failover-worthy for every provider because pi-ai emits it when provider streams end with `stopReason: "aborted"` or `stopReason: "error"` without specific details. JSON `api_error` payloads with transient server text such as `internal server error`, `unknown error, 520`, `upstream error`, or `backend error` are also treated as failover-worthy timeouts. + + OpenRouter-specific generic upstream text such as bare `Provider returned error` is treated as timeout only when the provider context is actually OpenRouter. Generic internal fallback text such as `LLM request failed with an unknown error.` stays conservative and does not trigger failover by itself. + + + + Some provider SDKs may otherwise sleep for a long `Retry-After` window before returning control to OpenClaw. For Stainless-based SDKs such as Anthropic and OpenAI, OpenClaw caps SDK-internal `retry-after-ms` / `retry-after` waits at 60 seconds by default and surfaces longer retryable responses immediately so this failover path can run. Tune or disable the cap with `OPENCLAW_SDK_RETRY_MAX_WAIT_SECONDS`; see [Retry behavior](/concepts/retry). + + + Rate-limit cooldowns can also be model-scoped: + + - OpenClaw records `cooldownModel` for rate-limit failures when the failing model id is known. + - A sibling model on the same provider can still be tried when the cooldown is scoped to a different model. + - Billing/disabled windows still block the whole profile across models. + + + Cooldowns use exponential backoff: @@ -180,18 +172,13 @@ State is stored in `auth-state.json` under `usageStats`: ## Billing disables -Billing/credit failures (for example “insufficient credits” / “credit balance too low”) are treated as failover‑worthy, but they’re usually not transient. Instead of a short cooldown, OpenClaw marks the profile as **disabled** (with a longer backoff) and rotates to the next profile/provider. +Billing/credit failures (for example "insufficient credits" / "credit balance too low") are treated as failover-worthy, but they're usually not transient. Instead of a short cooldown, OpenClaw marks the profile as **disabled** (with a longer backoff) and rotates to the next profile/provider. -Not every billing-shaped response is `402`, and not every HTTP `402` lands -here. OpenClaw keeps explicit billing text in the billing lane even when a -provider returns `401` or `403` instead, but provider-specific matchers stay -scoped to the provider that owns them (for example OpenRouter `403 Key limit -exceeded`). Meanwhile temporary `402` usage-window and -organization/workspace spend-limit errors are classified as `rate_limit` when -the message looks retryable (for example `weekly usage limit exhausted`, `daily -limit reached, resets tomorrow`, or `organization spending limit exceeded`). -Those stay on the short cooldown/failover path instead of the long -billing-disable path. + +Not every billing-shaped response is `402`, and not every HTTP `402` lands here. OpenClaw keeps explicit billing text in the billing lane even when a provider returns `401` or `403` instead, but provider-specific matchers stay scoped to the provider that owns them (for example OpenRouter `403 Key limit exceeded`). + +Meanwhile temporary `402` usage-window and organization/workspace spend-limit errors are classified as `rate_limit` when the message looks retryable (for example `weekly usage limit exhausted`, `daily limit reached, resets tomorrow`, or `organization spending limit exceeded`). Those stay on the short cooldown/failover path instead of the long billing-disable path. + State is stored in `auth-state.json`: @@ -209,139 +196,114 @@ State is stored in `auth-state.json`: Defaults: - Billing backoff starts at **5 hours**, doubles per billing failure, and caps at **24 hours**. -- Backoff counters reset if the profile hasn’t failed for **24 hours** (configurable). +- Backoff counters reset if the profile hasn't failed for **24 hours** (configurable). - Overloaded retries allow **1 same-provider profile rotation** before model fallback. - Overloaded retries use **0 ms backoff** by default. ## Model fallback -If all profiles for a provider fail, OpenClaw moves to the next model in -`agents.defaults.model.fallbacks`. This applies to auth failures, rate limits, and -timeouts that exhausted profile rotation (other errors do not advance fallback). +If all profiles for a provider fail, OpenClaw moves to the next model in `agents.defaults.model.fallbacks`. This applies to auth failures, rate limits, and timeouts that exhausted profile rotation (other errors do not advance fallback). -Overloaded and rate-limit errors are handled more aggressively than billing -cooldowns. By default, OpenClaw allows one same-provider auth-profile retry, -then switches to the next configured model fallback without waiting. -Provider-busy signals such as `ModelNotReadyException` land in that overloaded -bucket. Tune this with `auth.cooldowns.overloadedProfileRotations`, -`auth.cooldowns.overloadedBackoffMs`, and -`auth.cooldowns.rateLimitedProfileRotations`. +Overloaded and rate-limit errors are handled more aggressively than billing cooldowns. By default, OpenClaw allows one same-provider auth-profile retry, then switches to the next configured model fallback without waiting. Provider-busy signals such as `ModelNotReadyException` land in that overloaded bucket. Tune this with `auth.cooldowns.overloadedProfileRotations`, `auth.cooldowns.overloadedBackoffMs`, and `auth.cooldowns.rateLimitedProfileRotations`. -When a run starts with a model override (hooks or CLI), fallbacks still end at -`agents.defaults.model.primary` after trying any configured fallbacks. +When a run starts with a model override (hooks or CLI), fallbacks still end at `agents.defaults.model.primary` after trying any configured fallbacks. ### Candidate chain rules -OpenClaw builds the candidate list from the currently requested `provider/model` -plus configured fallbacks. +OpenClaw builds the candidate list from the currently requested `provider/model` plus configured fallbacks. -Rules: - -- The requested model is always first. -- Explicit configured fallbacks are deduplicated but not filtered by the model - allowlist. They are treated as explicit operator intent. -- If the current run is already on a configured fallback in the same provider - family, OpenClaw keeps using the full configured chain. -- If the current run is on a different provider than config and that current - model is not already part of the configured fallback chain, OpenClaw does not - append unrelated configured fallbacks from another provider. -- When the run started from an override, the configured primary is appended at - the end so the chain can settle back onto the normal default once earlier - candidates are exhausted. + + + - The requested model is always first. + - Explicit configured fallbacks are deduplicated but not filtered by the model allowlist. They are treated as explicit operator intent. + - If the current run is already on a configured fallback in the same provider family, OpenClaw keeps using the full configured chain. + - If the current run is on a different provider than config and that current model is not already part of the configured fallback chain, OpenClaw does not append unrelated configured fallbacks from another provider. + - When the run started from an override, the configured primary is appended at the end so the chain can settle back onto the normal default once earlier candidates are exhausted. + + ### Which errors advance fallback -Model fallback continues on: - -- auth failures -- rate limits and cooldown exhaustion -- overloaded/provider-busy errors -- timeout-shaped failover errors -- billing disables -- `LiveSessionModelSwitchError`, which is normalized into a failover path so a - stale persisted model does not create an outer retry loop -- other unrecognized errors when there are still remaining candidates - -Model fallback does not continue on: - -- explicit aborts that are not timeout/failover-shaped -- context overflow errors that should stay inside compaction/retry logic - (for example `request_too_large`, `INVALID_ARGUMENT: input exceeds the maximum -number of tokens`, `input token count exceeds the maximum number of input -tokens`, `The input is too long for the model`, or `ollama error: context -length exceeded`) -- a final unknown error when there are no candidates left + + + - auth failures + - rate limits and cooldown exhaustion + - overloaded/provider-busy errors + - timeout-shaped failover errors + - billing disables + - `LiveSessionModelSwitchError`, which is normalized into a failover path so a stale persisted model does not create an outer retry loop + - other unrecognized errors when there are still remaining candidates + + + - explicit aborts that are not timeout/failover-shaped + - context overflow errors that should stay inside compaction/retry logic (for example `request_too_large`, `INVALID_ARGUMENT: input exceeds the maximum number of tokens`, `input token count exceeds the maximum number of input tokens`, `The input is too long for the model`, or `ollama error: context length exceeded`) + - a final unknown error when there are no candidates left + + ### Cooldown skip vs probe behavior -When every auth profile for a provider is already in cooldown, OpenClaw does -not automatically skip that provider forever. It makes a per-candidate decision: +When every auth profile for a provider is already in cooldown, OpenClaw does not automatically skip that provider forever. It makes a per-candidate decision: -- Persistent auth failures skip the whole provider immediately. -- Billing disables usually skip, but the primary candidate can still be probed - on a throttle so recovery is possible without restarting. -- The primary candidate may be probed near cooldown expiry, with a per-provider - throttle. -- Same-provider fallback siblings can be attempted despite cooldown when the - failure looks transient (`rate_limit`, `overloaded`, or unknown). This is - especially relevant when a rate limit is model-scoped and a sibling model may - still recover immediately. -- Transient cooldown probes are limited to one per provider per fallback run so - a single provider does not stall cross-provider fallback. + + + - Persistent auth failures skip the whole provider immediately. + - Billing disables usually skip, but the primary candidate can still be probed on a throttle so recovery is possible without restarting. + - The primary candidate may be probed near cooldown expiry, with a per-provider throttle. + - Same-provider fallback siblings can be attempted despite cooldown when the failure looks transient (`rate_limit`, `overloaded`, or unknown). This is especially relevant when a rate limit is model-scoped and a sibling model may still recover immediately. + - Transient cooldown probes are limited to one per provider per fallback run so a single provider does not stall cross-provider fallback. + + ## Session overrides and live model switching -Session model changes are shared state. The active runner, `/model` command, -compaction/session updates, and live-session reconciliation all read or write -parts of the same session entry. +Session model changes are shared state. The active runner, `/model` command, compaction/session updates, and live-session reconciliation all read or write parts of the same session entry. That means fallback retries have to coordinate with live model switching: -- Only explicit user-driven model changes mark a pending live switch. That - includes `/model`, `session_status(model=...)`, and `sessions.patch`. -- System-driven model changes such as fallback rotation, heartbeat overrides, - or compaction never mark a pending live switch on their own. -- Before a fallback retry starts, the reply runner persists the selected - fallback override fields to the session entry. -- Live-session reconciliation prefers persisted session overrides over stale - runtime model fields. -- If the fallback attempt fails, the runner rolls back only the override fields - it wrote, and only if they still match that failed candidate. +- Only explicit user-driven model changes mark a pending live switch. That includes `/model`, `session_status(model=...)`, and `sessions.patch`. +- System-driven model changes such as fallback rotation, heartbeat overrides, or compaction never mark a pending live switch on their own. +- Before a fallback retry starts, the reply runner persists the selected fallback override fields to the session entry. +- Live-session reconciliation prefers persisted session overrides over stale runtime model fields. +- If the fallback attempt fails, the runner rolls back only the override fields it wrote, and only if they still match that failed candidate. This prevents the classic race: -1. Primary fails. -2. Fallback candidate is chosen in memory. -3. Session store still says the old primary. -4. Live-session reconciliation reads the stale session state. -5. The retry gets snapped back to the old model before the fallback attempt - starts. + + + The selected primary model fails. + + + Fallback candidate is chosen in memory. + + + Session store still reflects the old primary. + + + Live-session reconciliation reads the stale session state. + + + The retry gets snapped back to the old model before the fallback attempt starts. + + -The persisted fallback override closes that window, and the narrow rollback -keeps newer manual or runtime session changes intact. +The persisted fallback override closes that window, and the narrow rollback keeps newer manual or runtime session changes intact. ## Observability and failure summaries -`runWithModelFallback(...)` records per-attempt details that feed logs and -user-facing cooldown messaging: +`runWithModelFallback(...)` records per-attempt details that feed logs and user-facing cooldown messaging: - provider/model attempted -- reason (`rate_limit`, `overloaded`, `billing`, `auth`, `model_not_found`, and - similar failover reasons) +- reason (`rate_limit`, `overloaded`, `billing`, `auth`, `model_not_found`, and similar failover reasons) - optional status/code - human-readable error summary -When every candidate fails, OpenClaw throws `FallbackSummaryError`. The outer -reply runner can use that to build a more specific message such as "all models -are temporarily rate-limited" and include the soonest cooldown expiry when one -is known. +When every candidate fails, OpenClaw throws `FallbackSummaryError`. The outer reply runner can use that to build a more specific message such as "all models are temporarily rate-limited" and include the soonest cooldown expiry when one is known. That cooldown summary is model-aware: -- unrelated model-scoped rate limits are ignored for the attempted - provider/model chain -- if the remaining block is a matching model-scoped rate limit, OpenClaw - reports the last matching expiry that still blocks that model +- unrelated model-scoped rate limits are ignored for the attempted provider/model chain +- if the remaining block is a matching model-scoped rate limit, OpenClaw reports the last matching expiry that still blocks that model ## Related config