diff --git a/docs/concepts/model-failover.md b/docs/concepts/model-failover.md
index 55a41711708..3b200709505 100644
--- a/docs/concepts/model-failover.md
+++ b/docs/concepts/model-failover.md
@@ -5,6 +5,7 @@ read_when:
- Updating failover rules for auth profiles or models
- Understanding how session model overrides interact with fallback retries
title: "Model failover"
+sidebarTitle: "Model failover"
---
OpenClaw handles failures in two stages:
@@ -18,29 +19,31 @@ This doc explains the runtime rules and the data that backs them.
For a normal text run, OpenClaw evaluates candidates in this order:
-1. The currently selected session model.
-2. Configured `agents.defaults.model.fallbacks` in order.
-3. The configured primary model at the end when the run started from an override.
+
+
+ Resolve the active session model and auth-profile preference.
+
+
+ Build the model candidate chain from the currently selected session model, then `agents.defaults.model.fallbacks` in order, ending with the configured primary when the run started from an override.
+
+
+ Try the current provider with auth-profile rotation/cooldown rules.
+
+
+ If that provider is exhausted with a failover-worthy error, move to the next model candidate.
+
+
+ Persist the selected fallback override before the retry starts so other session readers see the same provider/model the runner is about to use.
+
+
+ If the fallback candidate fails, roll back only the fallback-owned session override fields when they still match that failed candidate.
+
+
+ If every candidate fails, throw a `FallbackSummaryError` with per-attempt detail and the soonest cooldown expiry when one is known.
+
+
-Inside each candidate, OpenClaw tries auth-profile failover before advancing to
-the next model candidate.
-
-High-level sequence:
-
-1. Resolve the active session model and auth-profile preference.
-2. Build the model candidate chain.
-3. Try the current provider with auth-profile rotation/cooldown rules.
-4. If that provider is exhausted with a failover-worthy error, move to the next
- model candidate.
-5. Persist the selected fallback override before the retry starts so other
- session readers see the same provider/model the runner is about to use.
-6. If the fallback candidate fails, roll back only the fallback-owned session
- override fields when they still match that failed candidate.
-7. If every candidate fails, throw a `FallbackSummaryError` with per-attempt
- detail and the soonest cooldown expiry when one is known.
-
-This is intentionally narrower than "save and restore the whole session". The
-reply runner only persists the model-selection fields it owns for fallback:
+This is intentionally narrower than "save and restore the whole session". The reply runner only persists the model-selection fields it owns for fallback:
- `providerOverride`
- `modelOverride`
@@ -48,9 +51,7 @@ reply runner only persists the model-selection fields it owns for fallback:
- `authProfileOverrideSource`
- `authProfileOverrideCompactionCount`
-That prevents a failed fallback retry from overwriting newer unrelated session
-mutations such as manual `/model` changes or session rotation updates that
-happened while the attempt was running.
+That prevents a failed fallback retry from overwriting newer unrelated session mutations such as manual `/model` changes or session rotation updates that happened while the attempt was running.
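The narrow rollback can be sketched as follows. This is a minimal illustration, not OpenClaw's implementation: `SessionEntry`, `FallbackCandidate`, `rollbackFallbackOverride`, and the `updatedBy` field are hypothetical, and restoring the previous values (rather than clearing the fields) is an assumption. Only the override field names come from the list above.

```typescript
// Hypothetical shapes for illustration; only the override field names come from this doc.
interface SessionEntry {
  providerOverride?: string;
  modelOverride?: string;
  // Stand-in for unrelated session state that the runner does not own.
  updatedBy?: string;
}

interface FallbackCandidate {
  provider: string;
  model: string;
}

type OwnedOverride = Pick<SessionEntry, "providerOverride" | "modelOverride">;

// Roll back only the fallback-owned fields, and only when they still
// point at the candidate that just failed.
function rollbackFallbackOverride(
  current: SessionEntry,
  failed: FallbackCandidate,
  previous: OwnedOverride,
): SessionEntry {
  const stillOwned =
    current.providerOverride === failed.provider &&
    current.modelOverride === failed.model;
  if (!stillOwned) {
    // A newer write (for example a manual /model change) won; leave it alone.
    return current;
  }
  return {
    ...current,
    providerOverride: previous.providerOverride,
    modelOverride: previous.modelOverride,
  };
}
```

Because the match is checked against the failed candidate, a session entry that was changed while the attempt was running is returned untouched.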
## Auth storage (keys + OAuth)
@@ -61,7 +62,7 @@ OpenClaw uses **auth profiles** for both API keys and OAuth tokens.
- Config `auth.profiles` / `auth.order` are **metadata + routing only** (no secrets).
- Legacy import-only OAuth file: `~/.openclaw/credentials/oauth.json` (imported into `auth-profiles.json` on first use).
-More detail: [/concepts/oauth](/concepts/oauth)
+More detail: [OAuth](/concepts/oauth)
Credential types:
@@ -81,9 +82,17 @@ Profiles live in `~/.openclaw/agents//agent/auth-profiles.json` under `
When a provider has multiple profiles, OpenClaw chooses an order like this:
-1. **Explicit config**: `auth.order[provider]` (if set).
-2. **Configured profiles**: `auth.profiles` filtered by provider.
-3. **Stored profiles**: entries in `auth-profiles.json` for the provider.
+
+
+ `auth.order[provider]` (if set).
+
+
+ `auth.profiles` filtered by provider.
+
+
+ Entries in `auth-profiles.json` for the provider.
+
+
If no explicit order is configured, OpenClaw uses a round‑robin order:
@@ -93,20 +102,17 @@ If no explicit order is configured, OpenClaw uses a round‑robin order:
### Session stickiness (cache-friendly)
-OpenClaw **pins the chosen auth profile per session** to keep provider caches warm.
-It does **not** rotate on every request. The pinned profile is reused until:
+OpenClaw **pins the chosen auth profile per session** to keep provider caches warm. It does **not** rotate on every request. The pinned profile is reused until:
- the session is reset (`/new` / `/reset`)
- a compaction completes (compaction count increments)
- the profile is in cooldown/disabled
-Manual selection via `/model …@` sets a **user override** for that session
-and is not auto‑rotated until a new session starts.
+Manual selection via `/model …@` sets a **user override** for that session; the override is not auto-rotated until a new session starts.
-Auto‑pinned profiles (selected by the session router) are treated as a **preference**:
-they are tried first, but OpenClaw may rotate to another profile on rate limits/timeouts.
-User‑pinned profiles stay locked to that profile; if it fails and model fallbacks
-are configured, OpenClaw moves to the next model instead of switching profiles.
+
+Auto-pinned profiles (selected by the session router) are treated as a **preference**: they are tried first, but OpenClaw may rotate to another profile on rate limits/timeouts. User-pinned profiles stay locked to that profile; if it fails and model fallbacks are configured, OpenClaw moves to the next model instead of switching profiles.
+
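One way to read the stickiness rules above, as a hedged sketch: `PinnedProfile`, `SessionState`, and `keepsPinnedProfile` are hypothetical names, and treating a user pin as released only by a session reset is an interpretation of the text, not confirmed behavior.

```typescript
// Hypothetical types; none of these names are taken from OpenClaw's code.
type PinSource = "auto" | "user";

interface PinnedProfile {
  profileId: string;
  source: PinSource;
  // Compaction count observed when the pin was created.
  compactionCountAtPin: number;
}

interface SessionState {
  wasReset: boolean; // /new or /reset happened
  compactionCount: number;
  profileInCooldownOrDisabled: boolean;
}

// Returns true when the session should keep using the pinned profile.
function keepsPinnedProfile(pin: PinnedProfile, state: SessionState): boolean {
  if (state.wasReset) {
    return false;
  }
  if (pin.source === "user") {
    // Assumption: a user pin stays locked until a new session starts.
    return true;
  }
  if (state.compactionCount > pin.compactionCountAtPin) {
    return false; // a compaction completed since the pin was taken
  }
  return !state.profileInCooldownOrDisabled;
}
```

The sketch covers only whether the pin survives. What happens after a locked user pin fails (moving to the next model) belongs to the model fallback path.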
### Why OAuth can "look lost"
@@ -117,45 +123,31 @@ If you have both an OAuth profile and an API key profile for the same provider,
## Cooldowns
-When a profile fails due to auth/rate‑limit errors (or a timeout that looks
-like rate limiting), OpenClaw marks it in cooldown and moves to the next profile.
-That rate-limit bucket is broader than plain `429`: it also includes provider
-messages such as `Too many concurrent requests`, `ThrottlingException`,
-`concurrency limit reached`, `workers_ai ... quota limit exceeded`,
-`throttled`, `resource exhausted`, and periodic usage-window limits such as
-`weekly/monthly limit reached`.
-Format/invalid‑request errors (for example Cloud Code Assist tool call ID
-validation failures) are treated as failover‑worthy and use the same cooldowns.
-OpenAI-compatible stop-reason errors such as `Unhandled stop reason: error`,
-`stop reason: error`, and `reason: error` are classified as timeout/failover
-signals.
-Generic server text can also land in that timeout bucket when the source matches
-a known transient pattern. For example, the bare pi-ai stream-wrapper message
-`An unknown error occurred` is treated as failover-worthy for every provider
-because pi-ai emits it when provider streams end with `stopReason: "aborted"` or
-`stopReason: "error"` without specific details. JSON `api_error` payloads with
-transient server text such as `internal server error`, `unknown error, 520`,
-`upstream error`, or `backend error` are also treated as failover-worthy
-timeouts.
-OpenRouter-specific generic upstream text such as bare `Provider returned error`
-is treated as timeout only when the provider context is actually OpenRouter.
-Generic internal fallback text such as `LLM request failed with an unknown
-error.` stays conservative and does not trigger failover by itself.
+When a profile fails due to auth/rate-limit errors (or a timeout that looks like rate limiting), OpenClaw marks it in cooldown and moves to the next profile.
-Some provider SDKs may otherwise sleep for a long `Retry-After` window before
-returning control to OpenClaw. For Stainless-based SDKs such as Anthropic and
-OpenAI, OpenClaw caps SDK-internal `retry-after-ms` / `retry-after` waits at 60
-seconds by default and surfaces longer retryable responses immediately so this
-failover path can run. Tune or disable the cap with
-`OPENCLAW_SDK_RETRY_MAX_WAIT_SECONDS`; see [/concepts/retry](/concepts/retry).
+
+
+ That rate-limit bucket is broader than plain `429`: it also includes provider messages such as `Too many concurrent requests`, `ThrottlingException`, `concurrency limit reached`, `workers_ai ... quota limit exceeded`, `throttled`, `resource exhausted`, and periodic usage-window limits such as `weekly/monthly limit reached`.
-Rate-limit cooldowns can also be model-scoped:
+ Format/invalid-request errors (for example Cloud Code Assist tool call ID validation failures) are treated as failover-worthy and use the same cooldowns. OpenAI-compatible stop-reason errors such as `Unhandled stop reason: error`, `stop reason: error`, and `reason: error` are classified as timeout/failover signals.
-- OpenClaw records `cooldownModel` for rate-limit failures when the failing
- model id is known.
-- A sibling model on the same provider can still be tried when the cooldown is
- scoped to a different model.
-- Billing/disabled windows still block the whole profile across models.
+ Generic server text can also land in that timeout bucket when the source matches a known transient pattern. For example, the bare pi-ai stream-wrapper message `An unknown error occurred` is treated as failover-worthy for every provider because pi-ai emits it when provider streams end with `stopReason: "aborted"` or `stopReason: "error"` without specific details. JSON `api_error` payloads with transient server text such as `internal server error`, `unknown error, 520`, `upstream error`, or `backend error` are also treated as failover-worthy timeouts.
+
+ OpenRouter-specific generic upstream text such as bare `Provider returned error` is treated as timeout only when the provider context is actually OpenRouter. Generic internal fallback text such as `LLM request failed with an unknown error.` stays conservative and does not trigger failover by itself.
+
+
+
+ Some provider SDKs may otherwise sleep for a long `Retry-After` window before returning control to OpenClaw. For Stainless-based SDKs such as Anthropic and OpenAI, OpenClaw caps SDK-internal `retry-after-ms` / `retry-after` waits at 60 seconds by default and surfaces longer retryable responses immediately so this failover path can run. Tune or disable the cap with `OPENCLAW_SDK_RETRY_MAX_WAIT_SECONDS`; see [Retry behavior](/concepts/retry).
+
+
+ Rate-limit cooldowns can also be model-scoped:
+
+ - OpenClaw records `cooldownModel` for rate-limit failures when the failing model id is known.
+ - A sibling model on the same provider can still be tried when the cooldown is scoped to a different model.
+ - Billing/disabled windows still block the whole profile across models.
+
+
+
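The model-scoped cooldown rules can be pictured with a small check. This is a sketch under stated assumptions: `cooldownModel` is named in this doc, while `cooldownUntil`, `disabledUntil`, `ProfileUsageState`, and `isProfileBlockedForModel` are hypothetical names chosen for the example.

```typescript
// Hypothetical state shape; only `cooldownModel` is named in this doc.
interface ProfileUsageState {
  cooldownUntil?: number; // epoch ms (assumed field name)
  cooldownModel?: string; // model id the rate-limit cooldown is scoped to
  disabledUntil?: number; // epoch ms for billing/disabled windows (assumed)
}

// Returns true when the profile must not be used for `modelId` at time `now`.
function isProfileBlockedForModel(
  state: ProfileUsageState,
  modelId: string,
  now: number,
): boolean {
  // Billing/disabled windows block the whole profile across models.
  if (state.disabledUntil !== undefined && state.disabledUntil > now) {
    return true;
  }
  // No active cooldown: the profile is usable.
  if (state.cooldownUntil === undefined || state.cooldownUntil <= now) {
    return false;
  }
  // An unscoped cooldown blocks every model on the profile.
  if (state.cooldownModel === undefined) {
    return true;
  }
  // A scoped cooldown blocks only the model that was rate-limited.
  return state.cooldownModel === modelId;
}
```

With a cooldown scoped to one model, a sibling model on the same profile passes the check, which matches the second bullet above.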
Cooldowns use exponential backoff:
@@ -180,18 +172,13 @@ State is stored in `auth-state.json` under `usageStats`:
## Billing disables
-Billing/credit failures (for example “insufficient credits” / “credit balance too low”) are treated as failover‑worthy, but they’re usually not transient. Instead of a short cooldown, OpenClaw marks the profile as **disabled** (with a longer backoff) and rotates to the next profile/provider.
+Billing/credit failures (for example "insufficient credits" / "credit balance too low") are treated as failover-worthy, but they're usually not transient. Instead of a short cooldown, OpenClaw marks the profile as **disabled** (with a longer backoff) and rotates to the next profile/provider.
-Not every billing-shaped response is `402`, and not every HTTP `402` lands
-here. OpenClaw keeps explicit billing text in the billing lane even when a
-provider returns `401` or `403` instead, but provider-specific matchers stay
-scoped to the provider that owns them (for example OpenRouter `403 Key limit
-exceeded`). Meanwhile temporary `402` usage-window and
-organization/workspace spend-limit errors are classified as `rate_limit` when
-the message looks retryable (for example `weekly usage limit exhausted`, `daily
-limit reached, resets tomorrow`, or `organization spending limit exceeded`).
-Those stay on the short cooldown/failover path instead of the long
-billing-disable path.
+
+Not every billing-shaped response is `402`, and not every HTTP `402` lands here. OpenClaw keeps explicit billing text in the billing lane even when a provider returns `401` or `403` instead, but provider-specific matchers stay scoped to the provider that owns them (for example OpenRouter `403 Key limit exceeded`).
+
+Meanwhile temporary `402` usage-window and organization/workspace spend-limit errors are classified as `rate_limit` when the message looks retryable (for example `weekly usage limit exhausted`, `daily limit reached, resets tomorrow`, or `organization spending limit exceeded`). Those stay on the short cooldown/failover path instead of the long billing-disable path.
+
State is stored in `auth-state.json`:
@@ -209,139 +196,114 @@ State is stored in `auth-state.json`:
Defaults:
- Billing backoff starts at **5 hours**, doubles per billing failure, and caps at **24 hours**.
-- Backoff counters reset if the profile hasn’t failed for **24 hours** (configurable).
+- Backoff counters reset if the profile hasn't failed for **24 hours** (configurable).
- Overloaded retries allow **1 same-provider profile rotation** before model fallback.
- Overloaded retries use **0 ms backoff** by default.
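As a worked example of the billing defaults, the backoff schedule can be computed like this. `billingBackoffMs` is a hypothetical helper, and counting failures from 1 is an assumption; the 5-hour start, doubling, and 24-hour cap come from the list above.

```typescript
const HOUR_MS = 60 * 60 * 1000;

// Hypothetical helper: backoff for the Nth consecutive billing failure (N >= 1).
function billingBackoffMs(
  failureCount: number,
  baseHours = 5,
  maxHours = 24,
): number {
  const n = Math.max(1, Math.floor(failureCount));
  // Double per failure, then clamp to the cap.
  const hours = Math.min(maxHours, baseHours * 2 ** (n - 1));
  return hours * HOUR_MS;
}

// Print the first four steps of the schedule in hours.
for (const n of [1, 2, 3, 4]) {
  console.log(n, billingBackoffMs(n) / HOUR_MS);
}
```

Under these assumptions the first four failures yield 5, 10, 20, and 24 hours; every later failure stays at the 24-hour cap.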
## Model fallback
-If all profiles for a provider fail, OpenClaw moves to the next model in
-`agents.defaults.model.fallbacks`. This applies to auth failures, rate limits, and
-timeouts that exhausted profile rotation (other errors do not advance fallback).
+If all profiles for a provider fail, OpenClaw moves to the next model in `agents.defaults.model.fallbacks`. This applies to auth failures, rate limits, and timeouts that exhausted profile rotation; the full list of errors that do and do not advance fallback is under "Which errors advance fallback".
-Overloaded and rate-limit errors are handled more aggressively than billing
-cooldowns. By default, OpenClaw allows one same-provider auth-profile retry,
-then switches to the next configured model fallback without waiting.
-Provider-busy signals such as `ModelNotReadyException` land in that overloaded
-bucket. Tune this with `auth.cooldowns.overloadedProfileRotations`,
-`auth.cooldowns.overloadedBackoffMs`, and
-`auth.cooldowns.rateLimitedProfileRotations`.
+Overloaded and rate-limit errors are handled more aggressively than billing cooldowns. By default, OpenClaw allows one same-provider auth-profile retry, then switches to the next configured model fallback without waiting. Provider-busy signals such as `ModelNotReadyException` land in that overloaded bucket. Tune this with `auth.cooldowns.overloadedProfileRotations`, `auth.cooldowns.overloadedBackoffMs`, and `auth.cooldowns.rateLimitedProfileRotations`.
-When a run starts with a model override (hooks or CLI), fallbacks still end at
-`agents.defaults.model.primary` after trying any configured fallbacks.
+When a run starts with a model override (hooks or CLI), fallbacks still end at `agents.defaults.model.primary` after trying any configured fallbacks.
### Candidate chain rules
-OpenClaw builds the candidate list from the currently requested `provider/model`
-plus configured fallbacks.
+OpenClaw builds the candidate list from the currently requested `provider/model` plus configured fallbacks.
-Rules:
-
-- The requested model is always first.
-- Explicit configured fallbacks are deduplicated but not filtered by the model
- allowlist. They are treated as explicit operator intent.
-- If the current run is already on a configured fallback in the same provider
- family, OpenClaw keeps using the full configured chain.
-- If the current run is on a different provider than config and that current
- model is not already part of the configured fallback chain, OpenClaw does not
- append unrelated configured fallbacks from another provider.
-- When the run started from an override, the configured primary is appended at
- the end so the chain can settle back onto the normal default once earlier
- candidates are exhausted.
+
+
+ - The requested model is always first.
+ - Explicitly configured fallbacks are deduplicated but not filtered by the model allowlist. They are treated as explicit operator intent.
+ - If the current run is already on a configured fallback in the same provider family, OpenClaw keeps using the full configured chain.
+ - If the current run is on a different provider than the configured one and the current model is not already part of the configured fallback chain, OpenClaw does not append unrelated configured fallbacks from another provider.
+ - When the run started from an override, the configured primary is appended at the end so the chain can settle back onto the normal default once earlier candidates are exhausted.
+
+
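The simpler chain rules can be sketched as below. This is illustrative only: `buildCandidateChain` and its parameters are hypothetical, model references are treated as plain `provider/model` strings, and the two provider-family rules are deliberately left out.

```typescript
// Hypothetical helper covering three rules: requested model first,
// deduplicated fallbacks in order, primary appended after an override start.
// The provider-family rules are NOT modeled here.
function buildCandidateChain(
  requested: string,
  fallbacks: readonly string[],
  primary: string,
  startedFromOverride: boolean,
): string[] {
  const chain: string[] = [];
  const seen = new Set<string>();
  const push = (ref: string): void => {
    if (!seen.has(ref)) {
      seen.add(ref);
      chain.push(ref);
    }
  };

  push(requested); // the requested model is always first
  for (const ref of fallbacks) {
    push(ref); // configured order is kept; duplicates are dropped
  }
  if (startedFromOverride) {
    push(primary); // settle back onto the default at the end
  }
  return chain;
}
```

For example, a run that starts from an override `x/override` with fallbacks `a/one`, `b/two`, `a/one` and primary `p/main` produces `x/override`, `a/one`, `b/two`, `p/main` in this sketch.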
### Which errors advance fallback
-Model fallback continues on:
-
-- auth failures
-- rate limits and cooldown exhaustion
-- overloaded/provider-busy errors
-- timeout-shaped failover errors
-- billing disables
-- `LiveSessionModelSwitchError`, which is normalized into a failover path so a
- stale persisted model does not create an outer retry loop
-- other unrecognized errors when there are still remaining candidates
-
-Model fallback does not continue on:
-
-- explicit aborts that are not timeout/failover-shaped
-- context overflow errors that should stay inside compaction/retry logic
- (for example `request_too_large`, `INVALID_ARGUMENT: input exceeds the maximum
-number of tokens`, `input token count exceeds the maximum number of input
-tokens`, `The input is too long for the model`, or `ollama error: context
-length exceeded`)
-- a final unknown error when there are no candidates left
+
+Model fallback continues on:
+ - auth failures
+ - rate limits and cooldown exhaustion
+ - overloaded/provider-busy errors
+ - timeout-shaped failover errors
+ - billing disables
+ - `LiveSessionModelSwitchError`, which is normalized into a failover path so a stale persisted model does not create an outer retry loop
+ - other unrecognized errors when there are still remaining candidates
+
+Model fallback does not continue on:
+ - explicit aborts that are not timeout/failover-shaped
+ - context overflow errors that should stay inside compaction/retry logic (for example `request_too_large`, `INVALID_ARGUMENT: input exceeds the maximum number of tokens`, `input token count exceeds the maximum number of input tokens`, `The input is too long for the model`, or `ollama error: context length exceeded`)
+ - a final unknown error when there are no candidates left
+
+
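The split between errors that advance to the next candidate and errors that do not can be summarized in a small decision function. This is a sketch: `FailureKind` and `isFallbackWorthy` are hypothetical, and only `rate_limit`, `overloaded`, `billing`, and `auth` appear as reason names elsewhere in this doc; the other labels are invented for the example.

```typescript
// Hypothetical failure labels for illustration.
type FailureKind =
  | "auth"
  | "rate_limit"
  | "overloaded"
  | "timeout"
  | "billing"
  | "live_session_model_switch"
  | "explicit_abort"
  | "context_overflow"
  | "unknown";

// Returns true when the failure should move the run to the next candidate.
function isFallbackWorthy(kind: FailureKind, remainingCandidates: number): boolean {
  switch (kind) {
    case "explicit_abort":
    case "context_overflow":
      // These stay with abort handling or compaction/retry logic.
      return false;
    case "unknown":
      // Unrecognized errors advance only while candidates remain.
      return remainingCandidates > 0;
    default:
      return true;
  }
}
```

In this sketch a final unknown error with no candidates left is not fallback-worthy, so it surfaces as the original error rather than as another fallback hop.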
### Cooldown skip vs probe behavior
-When every auth profile for a provider is already in cooldown, OpenClaw does
-not automatically skip that provider forever. It makes a per-candidate decision:
+When every auth profile for a provider is already in cooldown, OpenClaw does not automatically skip that provider forever. It makes a per-candidate decision:
-- Persistent auth failures skip the whole provider immediately.
-- Billing disables usually skip, but the primary candidate can still be probed
- on a throttle so recovery is possible without restarting.
-- The primary candidate may be probed near cooldown expiry, with a per-provider
- throttle.
-- Same-provider fallback siblings can be attempted despite cooldown when the
- failure looks transient (`rate_limit`, `overloaded`, or unknown). This is
- especially relevant when a rate limit is model-scoped and a sibling model may
- still recover immediately.
-- Transient cooldown probes are limited to one per provider per fallback run so
- a single provider does not stall cross-provider fallback.
+
+
+ - Persistent auth failures skip the whole provider immediately.
+ - Billing disables usually skip, but the primary candidate can still be probed at a throttled rate so recovery is possible without restarting.
+ - The primary candidate may be probed near cooldown expiry, with a per-provider throttle.
+ - Same-provider fallback siblings can be attempted despite cooldown when the failure looks transient (`rate_limit`, `overloaded`, or unknown). This is especially relevant when a rate limit is model-scoped and a sibling model may still recover immediately.
+ - Transient cooldown probes are limited to one per provider per fallback run so a single provider does not stall cross-provider fallback.
+
+
## Session overrides and live model switching
-Session model changes are shared state. The active runner, `/model` command,
-compaction/session updates, and live-session reconciliation all read or write
-parts of the same session entry.
+Session model changes are shared state. The active runner, `/model` command, compaction/session updates, and live-session reconciliation all read or write parts of the same session entry.
That means fallback retries have to coordinate with live model switching:
-- Only explicit user-driven model changes mark a pending live switch. That
- includes `/model`, `session_status(model=...)`, and `sessions.patch`.
-- System-driven model changes such as fallback rotation, heartbeat overrides,
- or compaction never mark a pending live switch on their own.
-- Before a fallback retry starts, the reply runner persists the selected
- fallback override fields to the session entry.
-- Live-session reconciliation prefers persisted session overrides over stale
- runtime model fields.
-- If the fallback attempt fails, the runner rolls back only the override fields
- it wrote, and only if they still match that failed candidate.
+- Only explicit user-driven model changes mark a pending live switch. That includes `/model`, `session_status(model=...)`, and `sessions.patch`.
+- System-driven model changes such as fallback rotation, heartbeat overrides, or compaction never mark a pending live switch on their own.
+- Before a fallback retry starts, the reply runner persists the selected fallback override fields to the session entry.
+- Live-session reconciliation prefers persisted session overrides over stale runtime model fields.
+- If the fallback attempt fails, the runner rolls back only the override fields it wrote, and only if they still match that failed candidate.
This prevents the classic race:
-1. Primary fails.
-2. Fallback candidate is chosen in memory.
-3. Session store still says the old primary.
-4. Live-session reconciliation reads the stale session state.
-5. The retry gets snapped back to the old model before the fallback attempt
- starts.
+
+
+ The selected primary model fails.
+
+
+ Fallback candidate is chosen in memory.
+
+
+ Session store still reflects the old primary.
+
+
+ Live-session reconciliation reads the stale session state.
+
+
+ The retry gets snapped back to the old model before the fallback attempt starts.
+
+
-The persisted fallback override closes that window, and the narrow rollback
-keeps newer manual or runtime session changes intact.
+The persisted fallback override closes that window, and the narrow rollback keeps newer manual or runtime session changes intact.
## Observability and failure summaries
-`runWithModelFallback(...)` records per-attempt details that feed logs and
-user-facing cooldown messaging:
+`runWithModelFallback(...)` records per-attempt details that feed logs and user-facing cooldown messaging:
- provider/model attempted
-- reason (`rate_limit`, `overloaded`, `billing`, `auth`, `model_not_found`, and
- similar failover reasons)
+- reason (`rate_limit`, `overloaded`, `billing`, `auth`, `model_not_found`, and similar failover reasons)
- optional status/code
- human-readable error summary
-When every candidate fails, OpenClaw throws `FallbackSummaryError`. The outer
-reply runner can use that to build a more specific message such as "all models
-are temporarily rate-limited" and include the soonest cooldown expiry when one
-is known.
+When every candidate fails, OpenClaw throws `FallbackSummaryError`. The outer reply runner can use that to build a more specific message such as "all models are temporarily rate-limited" and include the soonest cooldown expiry when one is known.
That cooldown summary is model-aware:
-- unrelated model-scoped rate limits are ignored for the attempted
- provider/model chain
-- if the remaining block is a matching model-scoped rate limit, OpenClaw
- reports the last matching expiry that still blocks that model
+- unrelated model-scoped rate limits are ignored for the attempted provider/model chain
+- if the remaining block is a matching model-scoped rate limit, OpenClaw reports the last matching expiry that still blocks that model
## Related config