From 209522e2e091608cbafb9e6fbbe13293f6ce0f63 Mon Sep 17 00:00:00 2001
From: Vincent Koc <vincentkoc@ieee.org>
Date: Sun, 26 Apr 2026 03:59:53 -0700
Subject: [PATCH] docs(model-failover): rewrite with Steps for runtime flow and
 rotation, AccordionGroup for cooldown buckets and chain rules, Tabs for which
 errors advance fallback

---
 docs/concepts/model-failover.md | 318 ++++++++++++++------------------
 1 file changed, 140 insertions(+), 178 deletions(-)
diff --git a/docs/concepts/model-failover.md b/docs/concepts/model-failover.md
index 55a41711708..3b200709505 100644
--- a/docs/concepts/model-failover.md
+++ b/docs/concepts/model-failover.md
@@ -5,6 +5,7 @@ read_when:
   - Updating failover rules for auth profiles or models
   - Understanding how session model overrides interact with fallback retries
 title: "Model failover"
+sidebarTitle: "Model failover"
 ---
 
 OpenClaw handles failures in two stages:
@@ -18,29 +19,31 @@ This doc explains the runtime rules and the data that backs them.
 
 For a normal text run, OpenClaw evaluates candidates in this order:
 
-1. The currently selected session model.
-2. Configured `agents.defaults.model.fallbacks` in order.
-3. The configured primary model at the end when the run started from an override.
+<Steps>
+  <Step title="Resolve session state">
+    Resolve the active session model and auth-profile preference.
+  </Step>
+  <Step title="Build candidate chain">
+    Build the model candidate chain from the currently selected session model, then `agents.defaults.model.fallbacks` in order, ending with the configured primary when the run started from an override.
+  </Step>
+  <Step title="Try the current provider">
+    Try the current provider, rotating through its auth profiles under the cooldown rules.
+  </Step>
+  <Step title="Advance on failover-worthy errors">
+    If every auth profile for that provider is exhausted with a failover-worthy error, move to the next model candidate.
+  </Step>
+  <Step title="Persist fallback override">
+    Persist the selected fallback override before the retry starts so other session readers see the same provider/model the runner is about to use.
+  </Step>
+  <Step title="Roll back narrowly on failure">
+    If the fallback candidate fails, roll back only the fallback-owned session override fields when they still match that failed candidate.
+  </Step>
+  <Step title="Throw FallbackSummaryError if exhausted">
+    If every candidate fails, throw a `FallbackSummaryError` with per-attempt detail and the soonest cooldown expiry when one is known.
+  </Step>
+</Steps>
 
-Inside each candidate, OpenClaw tries auth-profile failover before advancing to
-the next model candidate.
-
-High-level sequence:
-
-1. Resolve the active session model and auth-profile preference.
-2. Build the model candidate chain.
-3. Try the current provider with auth-profile rotation/cooldown rules.
-4. If that provider is exhausted with a failover-worthy error, move to the next
-   model candidate.
-5. Persist the selected fallback override before the retry starts so other
-   session readers see the same provider/model the runner is about to use.
-6. If the fallback candidate fails, roll back only the fallback-owned session
-   override fields when they still match that failed candidate.
-7. If every candidate fails, throw a `FallbackSummaryError` with per-attempt
-   detail and the soonest cooldown expiry when one is known.
-
-This is intentionally narrower than "save and restore the whole session". The
-reply runner only persists the model-selection fields it owns for fallback:
+This is intentionally narrower than "save and restore the whole session". The reply runner only persists the model-selection fields it owns for fallback:
 
 - `providerOverride`
 - `modelOverride`
@@ -48,9 +51,7 @@ reply runner only persists the model-selection fields it owns for fallback:
 - `authProfileOverrideSource`
 - `authProfileOverrideCompactionCount`
 
-That prevents a failed fallback retry from overwriting newer unrelated session
-mutations such as manual `/model` changes or session rotation updates that
-happened while the attempt was running.
+That prevents a failed fallback retry from overwriting newer unrelated session mutations such as manual `/model` changes or session rotation updates that happened while the attempt was running.
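+
+The flow above, including the narrow rollback, can be sketched in TypeScript. This is a minimal, hypothetical sketch rather than OpenClaw's implementation. `runWithFallbackSketch`, `SessionEntry`, and `FallbackSummaryErrorSketch` are illustrative names, and only the `providerOverride`/`modelOverride` pair is modeled. The sketch also assumes every thrown error is failover-worthy, and it leaves out auth-profile rotation, cooldowns, and the cooldown-expiry hint.
+
+```typescript
+// Hypothetical sketch of the fallback loop; not OpenClaw's implementation.
+interface Candidate {
+  provider: string;
+  model: string;
+}
+
+// Only the fallback-owned fields are modeled; other session fields are untouched.
+interface SessionEntry {
+  providerOverride?: string;
+  modelOverride?: string;
+}
+
+interface AttemptRecord {
+  provider: string;
+  model: string;
+  error: string;
+}
+
+class FallbackSummaryErrorSketch extends Error {
+  readonly attempts: AttemptRecord[];
+  constructor(attempts: AttemptRecord[]) {
+    super(`all ${attempts.length} model candidates failed`);
+    this.attempts = attempts;
+  }
+}
+
+function runWithFallbackSketch<T>(
+  session: SessionEntry,
+  candidates: Candidate[],
+  attempt: (candidate: Candidate) => T,
+): T {
+  const attempts: AttemptRecord[] = [];
+  for (let i = 0; i < candidates.length; i++) {
+    const candidate = candidates[i];
+    const isFallback = i > 0;
+    // Remember the current values so a failed fallback can be undone.
+    const previous = { ...session };
+    if (isFallback) {
+      // Persist the override before the retry so other session readers agree.
+      session.providerOverride = candidate.provider;
+      session.modelOverride = candidate.model;
+    }
+    try {
+      return attempt(candidate);
+    } catch (err) {
+      attempts.push({
+        provider: candidate.provider,
+        model: candidate.model,
+        error: err instanceof Error ? err.message : String(err),
+      });
+      // Narrow rollback: restore only the fields this loop wrote, and only if
+      // they still match the failed candidate (nothing changed them meanwhile).
+      if (
+        isFallback &&
+        session.providerOverride === candidate.provider &&
+        session.modelOverride === candidate.model
+      ) {
+        session.providerOverride = previous.providerOverride;
+        session.modelOverride = previous.modelOverride;
+      }
+    }
+  }
+  throw new FallbackSummaryErrorSketch(attempts);
+}
+```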
 
 ## Auth storage (keys + OAuth)
 
@@ -61,7 +62,7 @@ OpenClaw uses **auth profiles** for both API keys and OAuth tokens.
 - Config `auth.profiles` / `auth.order` are **metadata + routing only** (no secrets).
 - Legacy import-only OAuth file: `~/.openclaw/credentials/oauth.json` (imported into `auth-profiles.json` on first use).
 
-More detail: [/concepts/oauth](/concepts/oauth)
+More detail: [OAuth](/concepts/oauth)
 
 Credential types:
 
@@ -81,9 +82,17 @@ Profiles live in `~/.openclaw/agents/<agentId>/agent/auth-profiles.json` under `
 
 When a provider has multiple profiles, OpenClaw chooses an order like this:
 
-1. **Explicit config**: `auth.order[provider]` (if set).
-2. **Configured profiles**: `auth.profiles` filtered by provider.
-3. **Stored profiles**: entries in `auth-profiles.json` for the provider.
+<Steps>
+  <Step title="Explicit config">
+    `auth.order[provider]` (if set).
+  </Step>
+  <Step title="Configured profiles">
+    `auth.profiles` filtered by provider.
+  </Step>
+  <Step title="Stored profiles">
+    Entries in `auth-profiles.json` for the provider.
+  </Step>
+</Steps>
 
 If no explicit order is configured, OpenClaw uses a round‑robin order:
 
@@ -93,20 +102,17 @@ If no explicit order is configured, OpenClaw uses a round‑robin order:
 
 ### Session stickiness (cache-friendly)
 
-OpenClaw **pins the chosen auth profile per session** to keep provider caches warm.
-It does **not** rotate on every request. The pinned profile is reused until:
+OpenClaw **pins the chosen auth profile per session** to keep provider caches warm. It does **not** rotate on every request. The pinned profile is reused until:
 
 - the session is reset (`/new` / `/reset`)
 - a compaction completes (compaction count increments)
 - the profile is in cooldown/disabled
 
-Manual selection via `/model …@<profileId>` sets a **user override** for that session
-and is not auto‑rotated until a new session starts.
+Manual selection via `/model …@<profileId>` sets a **user override** for that session and is not auto-rotated until a new session starts.
 
-Auto‑pinned profiles (selected by the session router) are treated as a **preference**:
-they are tried first, but OpenClaw may rotate to another profile on rate limits/timeouts.
-User‑pinned profiles stay locked to that profile; if it fails and model fallbacks
-are configured, OpenClaw moves to the next model instead of switching profiles.
+<Note>
+Auto-pinned profiles (selected by the session router) are treated as a **preference**: they are tried first, but OpenClaw may rotate to another profile on rate limits/timeouts. User-pinned profiles stay locked to that profile; if it fails and model fallbacks are configured, OpenClaw moves to the next model instead of switching profiles.
+</Note>
 
 ### Why OAuth can "look lost"
 
@@ -117,45 +123,31 @@ If you have both an OAuth profile and an API key profile for the same provider,
 
 ## Cooldowns
 
-When a profile fails due to auth/rate‑limit errors (or a timeout that looks
-like rate limiting), OpenClaw marks it in cooldown and moves to the next profile.
-That rate-limit bucket is broader than plain `429`: it also includes provider
-messages such as `Too many concurrent requests`, `ThrottlingException`,
-`concurrency limit reached`, `workers_ai ... quota limit exceeded`,
-`throttled`, `resource exhausted`, and periodic usage-window limits such as
-`weekly/monthly limit reached`.
-Format/invalid‑request errors (for example Cloud Code Assist tool call ID
-validation failures) are treated as failover‑worthy and use the same cooldowns.
-OpenAI-compatible stop-reason errors such as `Unhandled stop reason: error`,
-`stop reason: error`, and `reason: error` are classified as timeout/failover
-signals.
-Generic server text can also land in that timeout bucket when the source matches
-a known transient pattern. For example, the bare pi-ai stream-wrapper message
-`An unknown error occurred` is treated as failover-worthy for every provider
-because pi-ai emits it when provider streams end with `stopReason: "aborted"` or
-`stopReason: "error"` without specific details. JSON `api_error` payloads with
-transient server text such as `internal server error`, `unknown error, 520`,
-`upstream error`, or `backend error` are also treated as failover-worthy
-timeouts.
-OpenRouter-specific generic upstream text such as bare `Provider returned error`
-is treated as timeout only when the provider context is actually OpenRouter.
-Generic internal fallback text such as `LLM request failed with an unknown
-error.` stays conservative and does not trigger failover by itself.
+When a profile fails due to auth/rate-limit errors (or a timeout that looks like rate limiting), OpenClaw marks it in cooldown and moves to the next profile.
 
-Some provider SDKs may otherwise sleep for a long `Retry-After` window before
-returning control to OpenClaw. For Stainless-based SDKs such as Anthropic and
-OpenAI, OpenClaw caps SDK-internal `retry-after-ms` / `retry-after` waits at 60
-seconds by default and surfaces longer retryable responses immediately so this
-failover path can run. Tune or disable the cap with
-`OPENCLAW_SDK_RETRY_MAX_WAIT_SECONDS`; see [/concepts/retry](/concepts/retry).
+<AccordionGroup>
+  <Accordion title="What lands in the rate-limit / timeout bucket">
+    The rate-limit bucket is broader than plain HTTP `429`: it also includes provider messages such as `Too many concurrent requests`, `ThrottlingException`, `concurrency limit reached`, `workers_ai ... quota limit exceeded`, `throttled`, `resource exhausted`, and periodic usage-window limits such as `weekly/monthly limit reached`.
 
-Rate-limit cooldowns can also be model-scoped:
+    Format/invalid-request errors (for example Cloud Code Assist tool call ID validation failures) are treated as failover-worthy and use the same cooldowns. OpenAI-compatible stop-reason errors such as `Unhandled stop reason: error`, `stop reason: error`, and `reason: error` are classified as timeout/failover signals.
 
-- OpenClaw records `cooldownModel` for rate-limit failures when the failing
-  model id is known.
-- A sibling model on the same provider can still be tried when the cooldown is
-  scoped to a different model.
-- Billing/disabled windows still block the whole profile across models.
+    Generic server text can also land in that timeout bucket when the source matches a known transient pattern. For example, the bare pi-ai stream-wrapper message `An unknown error occurred` is treated as failover-worthy for every provider because pi-ai emits it when provider streams end with `stopReason: "aborted"` or `stopReason: "error"` without specific details. JSON `api_error` payloads with transient server text such as `internal server error`, `unknown error, 520`, `upstream error`, or `backend error` are also treated as failover-worthy timeouts.
+
+    OpenRouter-specific generic upstream text such as bare `Provider returned error` is treated as timeout only when the provider context is actually OpenRouter. Generic internal fallback text such as `LLM request failed with an unknown error.` stays conservative and does not trigger failover by itself.
+
+  </Accordion>
+  <Accordion title="SDK retry-after caps">
+    Without a cap, some provider SDKs would sleep through a long `Retry-After` window before returning control to OpenClaw. For Stainless-based SDKs such as Anthropic and OpenAI, OpenClaw caps SDK-internal `retry-after-ms` / `retry-after` waits at 60 seconds by default and surfaces longer retryable responses immediately so the failover path can run. Tune or disable the cap with `OPENCLAW_SDK_RETRY_MAX_WAIT_SECONDS`; see [Retry behavior](/concepts/retry).
+  </Accordion>
+  <Accordion title="Model-scoped cooldowns">
+    Rate-limit cooldowns can also be model-scoped:
+
+    - OpenClaw records `cooldownModel` for rate-limit failures when the failing model id is known.
+    - A sibling model on the same provider can still be tried when the cooldown is scoped to a different model.
+    - Billing/disabled windows still block the whole profile across models.
+
+  </Accordion>
+</AccordionGroup>
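+
+A minimal sketch of how a model-scoped cooldown could gate candidates, assuming epoch-millisecond timestamps. Apart from `cooldownModel`, the field names (`cooldownUntil`, `disabledUntil`) and the `isProfileBlockedFor` helper are hypothetical and may not match the real `auth-state.json` layout.
+
+```typescript
+// Hypothetical per-profile state; only `cooldownModel` is named in these docs.
+interface ProfileCooldownState {
+  cooldownUntil?: number; // epoch ms when the rate-limit cooldown ends
+  cooldownModel?: string; // model id that hit the rate limit, when known
+  disabledUntil?: number; // epoch ms when a billing/disabled window ends
+}
+
+function isProfileBlockedFor(state: ProfileCooldownState, model: string, now: number): boolean {
+  // Billing/disabled windows block the profile for every model.
+  if (state.disabledUntil !== undefined && now < state.disabledUntil) return true;
+  // No active rate-limit cooldown.
+  if (state.cooldownUntil === undefined || now >= state.cooldownUntil) return false;
+  // Unscoped cooldowns block every model; scoped ones block only the failing model.
+  return state.cooldownModel === undefined || state.cooldownModel === model;
+}
+```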
 
 Cooldowns use exponential backoff:
 
@@ -180,18 +172,13 @@ State is stored in `auth-state.json` under `usageStats`:
 
 ## Billing disables
 
-Billing/credit failures (for example “insufficient credits” / “credit balance too low”) are treated as failover‑worthy, but they’re usually not transient. Instead of a short cooldown, OpenClaw marks the profile as **disabled** (with a longer backoff) and rotates to the next profile/provider.
+Billing/credit failures (for example "insufficient credits" / "credit balance too low") are treated as failover-worthy, but they're usually not transient. Instead of a short cooldown, OpenClaw marks the profile as **disabled** (with a longer backoff) and rotates to the next profile/provider.
 
-Not every billing-shaped response is `402`, and not every HTTP `402` lands
-here. OpenClaw keeps explicit billing text in the billing lane even when a
-provider returns `401` or `403` instead, but provider-specific matchers stay
-scoped to the provider that owns them (for example OpenRouter `403 Key limit
-exceeded`). Meanwhile temporary `402` usage-window and
-organization/workspace spend-limit errors are classified as `rate_limit` when
-the message looks retryable (for example `weekly usage limit exhausted`, `daily
-limit reached, resets tomorrow`, or `organization spending limit exceeded`).
-Those stay on the short cooldown/failover path instead of the long
-billing-disable path.
+<Note>
+Not every billing-shaped response is `402`, and not every HTTP `402` lands here. OpenClaw keeps explicit billing text in the billing lane even when a provider returns `401` or `403` instead, but provider-specific matchers stay scoped to the provider that owns them (for example OpenRouter `403 Key limit exceeded`).
+
+Meanwhile, temporary `402` usage-window and organization/workspace spend-limit errors are classified as `rate_limit` when the message looks retryable (for example `weekly usage limit exhausted`, `daily limit reached, resets tomorrow`, or `organization spending limit exceeded`). Those stay on the short cooldown/failover path instead of the long billing-disable path.
+</Note>
 
 State is stored in `auth-state.json`:
 
@@ -209,139 +196,114 @@ State is stored in `auth-state.json`:
 Defaults:
 
 - Billing backoff starts at **5 hours**, doubles per billing failure, and caps at **24 hours**.
-- Backoff counters reset if the profile hasn’t failed for **24 hours** (configurable).
+- Backoff counters reset if the profile hasn't failed for **24 hours** (configurable).
 - Overloaded retries allow **1 same-provider profile rotation** before model fallback.
 - Overloaded retries use **0 ms backoff** by default.
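+
+The billing backoff defaults can be sketched as two small helpers. `billingDisableMs` and `effectiveBillingFailures` are hypothetical names. The sketch assumes failures are counted from 1 and timestamps are epoch milliseconds, and it hard-codes the configurable reset window to its 24-hour default. Overloaded-retry handling is not modeled.
+
+```typescript
+// Hypothetical helpers mirroring the documented billing defaults.
+const HOUR_MS = 60 * 60 * 1000;
+const BILLING_BASE_MS = 5 * HOUR_MS; // first billing disable
+const BILLING_MAX_MS = 24 * HOUR_MS; // cap
+const COUNTER_RESET_MS = 24 * HOUR_MS; // failure-free window that resets counters
+
+// Disable duration for the nth consecutive billing failure (n >= 1).
+function billingDisableMs(billingFailures: number): number {
+  const n = Math.max(1, Math.floor(billingFailures));
+  return Math.min(BILLING_BASE_MS * 2 ** (n - 1), BILLING_MAX_MS);
+}
+
+// Counters start over once the profile has gone 24 hours without failing.
+function effectiveBillingFailures(count: number, lastFailureAt: number, now: number): number {
+  return now - lastFailureAt >= COUNTER_RESET_MS ? 0 : count;
+}
+```
+
+With these defaults, consecutive billing failures disable a profile for 5, 10, 20, and then 24 hours.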
 
 ## Model fallback
 
-If all profiles for a provider fail, OpenClaw moves to the next model in
-`agents.defaults.model.fallbacks`. This applies to auth failures, rate limits, and
-timeouts that exhausted profile rotation (other errors do not advance fallback).
+If all profiles for a provider fail, OpenClaw moves to the next model in `agents.defaults.model.fallbacks`. This applies to auth failures, rate limits, and timeouts that exhausted profile rotation; the full list of errors that do and do not advance fallback is under [Which errors advance fallback](#which-errors-advance-fallback).
 
-Overloaded and rate-limit errors are handled more aggressively than billing
-cooldowns. By default, OpenClaw allows one same-provider auth-profile retry,
-then switches to the next configured model fallback without waiting.
-Provider-busy signals such as `ModelNotReadyException` land in that overloaded
-bucket. Tune this with `auth.cooldowns.overloadedProfileRotations`,
-`auth.cooldowns.overloadedBackoffMs`, and
-`auth.cooldowns.rateLimitedProfileRotations`.
+Overloaded and rate-limit errors are handled more aggressively than billing cooldowns. By default, OpenClaw allows one same-provider auth-profile retry, then switches to the next configured model fallback without waiting. Provider-busy signals such as `ModelNotReadyException` land in that overloaded bucket. Tune this with `auth.cooldowns.overloadedProfileRotations`, `auth.cooldowns.overloadedBackoffMs`, and `auth.cooldowns.rateLimitedProfileRotations`.
 
-When a run starts with a model override (hooks or CLI), fallbacks still end at
-`agents.defaults.model.primary` after trying any configured fallbacks.
+When a run starts with a model override (hooks or CLI), fallbacks still end at `agents.defaults.model.primary` after trying any configured fallbacks.
 
 ### Candidate chain rules
 
-OpenClaw builds the candidate list from the currently requested `provider/model`
-plus configured fallbacks.
+OpenClaw builds the candidate list from the currently requested `provider/model` plus configured fallbacks.
 
-Rules:
-
-- The requested model is always first.
-- Explicit configured fallbacks are deduplicated but not filtered by the model
-  allowlist. They are treated as explicit operator intent.
-- If the current run is already on a configured fallback in the same provider
-  family, OpenClaw keeps using the full configured chain.
-- If the current run is on a different provider than config and that current
-  model is not already part of the configured fallback chain, OpenClaw does not
-  append unrelated configured fallbacks from another provider.
-- When the run started from an override, the configured primary is appended at
-  the end so the chain can settle back onto the normal default once earlier
-  candidates are exhausted.
+<AccordionGroup>
+  <Accordion title="Rules">
+    - The requested model is always first.
+    - Explicit configured fallbacks are deduplicated but not filtered by the model allowlist. They are treated as explicit operator intent.
+    - If the current run is already on a configured fallback in the same provider family, OpenClaw keeps using the full configured chain.
+    - If the current run is on a different provider from the configured primary and the current model is not already part of the configured fallback chain, OpenClaw does not append unrelated configured fallbacks from another provider.
+    - When the run started from an override, the configured primary is appended at the end so the chain can settle back onto the normal default once earlier candidates are exhausted.
+  </Accordion>
+</AccordionGroup>
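+
+A hypothetical TypeScript sketch of these rules follows. It assumes model refs are `provider/model` strings and treats "provider family" as an exact provider match. It also takes a run to be off-config when its provider differs from that of `agents.defaults.model.primary`. `buildCandidateChain` and `ChainInput` are illustrative names, not OpenClaw APIs.
+
+```typescript
+// Hypothetical sketch of the candidate chain rules; not OpenClaw's code.
+interface ChainInput {
+  requested: string; // "provider/model" the current run is on
+  primary: string; // agents.defaults.model.primary
+  fallbacks: string[]; // agents.defaults.model.fallbacks, in order
+  startedFromOverride: boolean; // run began from a hook/CLI override
+}
+
+const providerOf = (ref: string): string => ref.split("/")[0];
+
+function buildCandidateChain(input: ChainInput): string[] {
+  const chain: string[] = [];
+  const seen = new Set<string>();
+  const push = (ref: string): void => {
+    if (!seen.has(ref)) {
+      seen.add(ref);
+      chain.push(ref);
+    }
+  };
+
+  push(input.requested); // the requested model always comes first
+
+  const onConfiguredChain =
+    input.requested === input.primary || input.fallbacks.includes(input.requested);
+  const offConfigProvider = providerOf(input.requested) !== providerOf(input.primary);
+
+  for (const ref of input.fallbacks) {
+    // Off-config runs do not pick up fallbacks that belong to another provider.
+    if (offConfigProvider && !onConfiguredChain && providerOf(ref) !== providerOf(input.requested)) {
+      continue;
+    }
+    push(ref); // explicit fallbacks are deduplicated, never allowlist-filtered
+  }
+
+  // Settle back onto the normal default once earlier candidates are exhausted.
+  if (input.startedFromOverride) push(input.primary);
+  return chain;
+}
+```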
 
 ### Which errors advance fallback
 
-Model fallback continues on:
-
-- auth failures
-- rate limits and cooldown exhaustion
-- overloaded/provider-busy errors
-- timeout-shaped failover errors
-- billing disables
-- `LiveSessionModelSwitchError`, which is normalized into a failover path so a
-  stale persisted model does not create an outer retry loop
-- other unrecognized errors when there are still remaining candidates
-
-Model fallback does not continue on:
-
-- explicit aborts that are not timeout/failover-shaped
-- context overflow errors that should stay inside compaction/retry logic
-  (for example `request_too_large`, `INVALID_ARGUMENT: input exceeds the maximum
-number of tokens`, `input token count exceeds the maximum number of input
-tokens`, `The input is too long for the model`, or `ollama error: context
-length exceeded`)
-- a final unknown error when there are no candidates left
+<Tabs>
+  <Tab title="Continues on">
+    - auth failures
+    - rate limits and cooldown exhaustion
+    - overloaded/provider-busy errors
+    - timeout-shaped failover errors
+    - billing disables
+    - `LiveSessionModelSwitchError`, which is normalized into a failover path so a stale persisted model does not create an outer retry loop
+    - other unrecognized errors when there are still remaining candidates
+  </Tab>
+  <Tab title="Does not continue on">
+    - explicit aborts that are not timeout/failover-shaped
+    - context overflow errors that should stay inside compaction/retry logic (for example `request_too_large`, `INVALID_ARGUMENT: input exceeds the maximum number of tokens`, `input token count exceeds the maximum number of input tokens`, `The input is too long for the model`, or `ollama error: context length exceeded`)
+    - a final unknown error when there are no candidates left
+  </Tab>
+</Tabs>
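+
+The split between the two tabs can be sketched as a small decision function. The `FailureKind` labels and both helpers are hypothetical. The sketch assumes errors were already classified upstream, so timeout-shaped aborts arrive as `timeout`, and it matches only the context-overflow messages quoted above.
+
+```typescript
+// Hypothetical reason labels; OpenClaw's internal classifier may differ.
+type FailureKind =
+  | "auth"
+  | "rate_limit"
+  | "overloaded"
+  | "timeout"
+  | "billing"
+  | "live_session_model_switch"
+  | "abort"
+  | "context_overflow"
+  | "unknown";
+
+// Context-overflow messages that stay inside compaction/retry logic.
+const CONTEXT_OVERFLOW_PATTERNS: RegExp[] = [
+  /request_too_large/i,
+  /input exceeds the maximum number of tokens/i,
+  /input token count exceeds the maximum number of input tokens/i,
+  /the input is too long for the model/i,
+  /context length exceeded/i,
+];
+
+function isContextOverflow(message: string): boolean {
+  return CONTEXT_OVERFLOW_PATTERNS.some((pattern) => pattern.test(message));
+}
+
+function shouldAdvanceFallback(kind: FailureKind, remainingCandidates: number): boolean {
+  // A final error with nothing left to try is surfaced, not retried.
+  if (remainingCandidates <= 0) return false;
+  switch (kind) {
+    case "abort": // explicit aborts that are not timeout/failover-shaped
+    case "context_overflow":
+      return false;
+    case "auth":
+    case "rate_limit":
+    case "overloaded":
+    case "timeout":
+    case "billing":
+    case "live_session_model_switch":
+    case "unknown":
+      return true;
+  }
+}
+```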
 
 ### Cooldown skip vs probe behavior
 
-When every auth profile for a provider is already in cooldown, OpenClaw does
-not automatically skip that provider forever. It makes a per-candidate decision:
+When every auth profile for a provider is already in cooldown, OpenClaw does not automatically skip that provider forever. It makes a per-candidate decision:
 
-- Persistent auth failures skip the whole provider immediately.
-- Billing disables usually skip, but the primary candidate can still be probed
-  on a throttle so recovery is possible without restarting.
-- The primary candidate may be probed near cooldown expiry, with a per-provider
-  throttle.
-- Same-provider fallback siblings can be attempted despite cooldown when the
-  failure looks transient (`rate_limit`, `overloaded`, or unknown). This is
-  especially relevant when a rate limit is model-scoped and a sibling model may
-  still recover immediately.
-- Transient cooldown probes are limited to one per provider per fallback run so
-  a single provider does not stall cross-provider fallback.
+<AccordionGroup>
+  <Accordion title="Per-candidate decisions">
+    - Persistent auth failures skip the whole provider immediately.
+    - Billing disables usually skip, but the primary candidate can still be probed on a throttle so recovery is possible without restarting.
+    - The primary candidate may be probed near cooldown expiry, with a per-provider throttle.
+    - Same-provider fallback siblings can be attempted despite cooldown when the failure looks transient (`rate_limit`, `overloaded`, or unknown). This is especially relevant when a rate limit is model-scoped and a sibling model may still recover immediately.
+    - Transient cooldown probes are limited to one per provider per fallback run so a single provider does not stall cross-provider fallback.
+  </Accordion>
+</AccordionGroup>
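+
+A hypothetical sketch of the per-candidate decision, under a simplified reading of the rules. `decideCooldownCandidate` and its input fields are illustrative names. The "near expiry" check and the probe throttle are assumed to be computed elsewhere. The sketch applies the one-probe-per-provider limit to both primary probes and sibling attempts for transient cooldowns.
+
+```typescript
+// Hypothetical sketch of the cooldown skip/probe decision; not OpenClaw's code.
+type CooldownKind = "auth_persistent" | "billing" | "rate_limit" | "overloaded" | "unknown";
+type CooldownDecision = "skip" | "probe" | "attempt";
+
+interface CooldownCandidate {
+  provider: string;
+  kind: CooldownKind; // why every profile for this provider is cooling down
+  isPrimary: boolean;
+  isSameProviderSibling: boolean; // fallback on the provider that just failed
+  nearExpiry: boolean; // cooldown ends soon (threshold assumed)
+  probeThrottleOpen: boolean; // per-provider probe throttle allows a probe now
+}
+
+function decideCooldownCandidate(
+  c: CooldownCandidate,
+  transientProbesUsed: Set<string>, // providers already probed in this fallback run
+): CooldownDecision {
+  if (c.kind === "auth_persistent") return "skip";
+  if (c.kind === "billing") return c.isPrimary && c.probeThrottleOpen ? "probe" : "skip";
+
+  // Transient kinds: at most one probe per provider per fallback run.
+  if (transientProbesUsed.has(c.provider)) return "skip";
+  if (c.isPrimary && c.nearExpiry && c.probeThrottleOpen) {
+    transientProbesUsed.add(c.provider);
+    return "probe";
+  }
+  if (c.isSameProviderSibling) {
+    transientProbesUsed.add(c.provider);
+    return "attempt";
+  }
+  return "skip";
+}
+```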
 
 ## Session overrides and live model switching
 
-Session model changes are shared state. The active runner, `/model` command,
-compaction/session updates, and live-session reconciliation all read or write
-parts of the same session entry.
+Session model changes are shared state. The active runner, `/model` command, compaction/session updates, and live-session reconciliation all read or write parts of the same session entry.
 
 That means fallback retries have to coordinate with live model switching:
 
-- Only explicit user-driven model changes mark a pending live switch. That
-  includes `/model`, `session_status(model=...)`, and `sessions.patch`.
-- System-driven model changes such as fallback rotation, heartbeat overrides,
-  or compaction never mark a pending live switch on their own.
-- Before a fallback retry starts, the reply runner persists the selected
-  fallback override fields to the session entry.
-- Live-session reconciliation prefers persisted session overrides over stale
-  runtime model fields.
-- If the fallback attempt fails, the runner rolls back only the override fields
-  it wrote, and only if they still match that failed candidate.
+- Only explicit user-driven model changes mark a pending live switch. That includes `/model`, `session_status(model=...)`, and `sessions.patch`.
+- System-driven model changes such as fallback rotation, heartbeat overrides, or compaction never mark a pending live switch on their own.
+- Before a fallback retry starts, the reply runner persists the selected fallback override fields to the session entry.
+- Live-session reconciliation prefers persisted session overrides over stale runtime model fields.
+- If the fallback attempt fails, the runner rolls back only the override fields it wrote, and only if they still match that failed candidate.
 
 This prevents the classic race:
 
-1. Primary fails.
-2. Fallback candidate is chosen in memory.
-3. Session store still says the old primary.
-4. Live-session reconciliation reads the stale session state.
-5. The retry gets snapped back to the old model before the fallback attempt
-   starts.
+<Steps>
+  <Step title="Primary fails">
+    The selected primary model fails.
+  </Step>
+  <Step title="Fallback chosen in memory">
+    A fallback candidate is chosen in memory.
+  </Step>
+  <Step title="Session store still says old primary">
+    The session store still reflects the old primary.
+  </Step>
+  <Step title="Live reconciliation reads stale state">
+    Live-session reconciliation reads the stale session state.
+  </Step>
+  <Step title="Retry snapped back">
+    The retry gets snapped back to the old model before the fallback attempt starts.
+  </Step>
+</Steps>
 
-The persisted fallback override closes that window, and the narrow rollback
-keeps newer manual or runtime session changes intact.
+The persisted fallback override closes that window, and the narrow rollback keeps newer manual or runtime session changes intact.
 
 ## Observability and failure summaries
 
-`runWithModelFallback(...)` records per-attempt details that feed logs and
-user-facing cooldown messaging:
+`runWithModelFallback(...)` records per-attempt details that feed logs and user-facing cooldown messaging:
 
 - provider/model attempted
-- reason (`rate_limit`, `overloaded`, `billing`, `auth`, `model_not_found`, and
-  similar failover reasons)
+- reason (`rate_limit`, `overloaded`, `billing`, `auth`, `model_not_found`, and similar failover reasons)
 - optional status/code
 - human-readable error summary
 
-When every candidate fails, OpenClaw throws `FallbackSummaryError`. The outer
-reply runner can use that to build a more specific message such as "all models
-are temporarily rate-limited" and include the soonest cooldown expiry when one
-is known.
+When every candidate fails, OpenClaw throws `FallbackSummaryError`. The outer reply runner can use that to build a more specific message such as "all models are temporarily rate-limited" and include the soonest cooldown expiry when one is known.
 
 That cooldown summary is model-aware:
 
-- unrelated model-scoped rate limits are ignored for the attempted
-  provider/model chain
-- if the remaining block is a matching model-scoped rate limit, OpenClaw
-  reports the last matching expiry that still blocks that model
+- unrelated model-scoped rate limits are ignored for the attempted provider/model chain
+- if the remaining block is a matching model-scoped rate limit, OpenClaw reports the last matching expiry that still blocks that model
 
 ## Related config