diff --git a/docs/concepts/model-failover.md b/docs/concepts/model-failover.md index 790c116f7bb..f7eb0976e83 100644 --- a/docs/concepts/model-failover.md +++ b/docs/concepts/model-failover.md @@ -3,6 +3,7 @@ summary: "How OpenClaw rotates auth profiles and falls back across models" read_when: - Diagnosing auth profile rotation, cooldowns, or model fallback behavior - Updating failover rules for auth profiles or models + - Understanding how session model overrides interact with fallback retries title: "Model Failover" --- @@ -15,6 +16,44 @@ OpenClaw handles failures in two stages: This doc explains the runtime rules and the data that backs them. +## Runtime flow + +For a normal text run, OpenClaw evaluates candidates in this order: + +1. The currently selected session model. +2. Configured `agents.defaults.model.fallbacks` in order. +3. The configured primary model at the end when the run started from an override. + +Inside each candidate, OpenClaw tries auth-profile failover before advancing to +the next model candidate. + +High-level sequence: + +1. Resolve the active session model and auth-profile preference. +2. Build the model candidate chain. +3. Try the current provider with auth-profile rotation/cooldown rules. +4. If that provider is exhausted with a failover-worthy error, move to the next + model candidate. +5. Persist the selected fallback override before the retry starts so other + session readers see the same provider/model the runner is about to use. +6. If the fallback candidate fails, roll back only the fallback-owned session + override fields when they still match that failed candidate. +7. If every candidate fails, throw a `FallbackSummaryError` with per-attempt + detail and the soonest cooldown expiry when one is known. + +This is intentionally narrower than "save and restore the whole session". The +reply runner only persists the model-selection fields it owns for fallback: + +- `providerOverride` +- `modelOverride` +- `authProfileOverride` +- `authProfileOverrideSource` +- `authProfileOverrideCompactionCount` + +That prevents a failed fallback retry from overwriting newer unrelated session +mutations such as manual `/model` changes or session rotation updates that +happened while the attempt was running. + ## Auth storage (keys + OAuth) OpenClaw uses **auth profiles** for both API keys and OAuth tokens. @@ -148,6 +187,102 @@ with `auth.cooldowns.overloadedProfileRotations`, When a run starts with a model override (hooks or CLI), fallbacks still end at `agents.defaults.model.primary` after trying any configured fallbacks. +### Candidate chain rules + +OpenClaw builds the candidate list from the currently requested `provider/model` +plus configured fallbacks. + +Rules: + +- The requested model is always first. +- Explicit configured fallbacks are deduplicated but not filtered by the model + allowlist. They are treated as explicit operator intent. +- If the current run is already on a configured fallback in the same provider + family, OpenClaw keeps using the full configured chain. +- If the current run is on a different provider than config and that current + model is not already part of the configured fallback chain, OpenClaw does not + append unrelated configured fallbacks from another provider. +- When the run started from an override, the configured primary is appended at + the end so the chain can settle back onto the normal default once earlier + candidates are exhausted. + +### Which errors advance fallback + +Model fallback continues on: + +- auth failures +- rate limits and cooldown exhaustion +- overloaded/provider-busy errors +- timeout-shaped failover errors +- billing disables +- `LiveSessionModelSwitchError`, which is normalized into a failover path so a + stale persisted model does not create an outer retry loop +- other unrecognized errors when there are still remaining candidates + +Model fallback does not continue on: + +- explicit aborts that are not timeout/failover-shaped +- context overflow errors that should stay inside compaction/retry logic +- a final unknown error when there are no candidates left + +### Cooldown skip vs probe behavior + +When every auth profile for a provider is already in cooldown, OpenClaw does +not automatically skip that provider forever. It makes a per-candidate decision: + +- Persistent auth failures skip the whole provider immediately. +- Billing disables usually skip, but the primary candidate can still be probed + on a throttle so recovery is possible without restarting. +- The primary candidate may be probed near cooldown expiry, with a per-provider + throttle. +- Same-provider fallback siblings can be attempted despite cooldown when the + failure looks transient (`rate_limit`, `overloaded`, or unknown). +- Transient cooldown probes are limited to one per provider per fallback run so + a single provider does not stall cross-provider fallback. + +## Session overrides and live model switching + +Session model changes are shared state. The active runner, `/model` command, +compaction/session updates, and live-session reconciliation all read or write +parts of the same session entry. + +That means fallback retries have to coordinate with live model switching: + +- Before a fallback retry starts, the reply runner persists the selected + fallback override fields to the session entry. +- Live-session reconciliation prefers persisted session overrides over stale + runtime model fields. +- If the fallback attempt fails, the runner rolls back only the override fields + it wrote, and only if they still match that failed candidate. + +This prevents the classic race: + +1. Primary fails. +2. Fallback candidate is chosen in memory. +3. Session store still says the old primary. +4. Live-session reconciliation reads the stale session state. +5. The retry gets snapped back to the old model before the fallback attempt + starts. + +The persisted fallback override closes that window, and the narrow rollback +keeps newer manual or runtime session changes intact. + +## Observability and failure summaries + +`runWithModelFallback(...)` records per-attempt details that feed logs and +user-facing cooldown messaging: + +- provider/model attempted +- reason (`rate_limit`, `overloaded`, `billing`, `auth`, `model_not_found`, and + similar failover reasons) +- optional status/code +- human-readable error summary + +When every candidate fails, OpenClaw throws `FallbackSummaryError`. The outer +reply runner can use that to build a more specific message such as "all models +are temporarily rate-limited" and include the soonest cooldown expiry when one +is known. + ## Related config See [Gateway configuration](/gateway/configuration) for: diff --git a/docs/concepts/model-providers.md b/docs/concepts/model-providers.md index 84e44d4a959..96d5702ceaf 100644 --- a/docs/concepts/model-providers.md +++ b/docs/concepts/model-providers.md @@ -16,6 +16,8 @@ For model selection rules, see [/concepts/models](/concepts/models). - Model refs use `provider/model` (example: `opencode/claude-opus-4-6`). - If you set `agents.defaults.models`, it becomes the allowlist. - CLI helpers: `openclaw onboard`, `openclaw models list`, `openclaw models set `. +- Fallback runtime rules, cooldown probes, and session-override persistence are + documented in [/concepts/model-failover](/concepts/model-failover). - Provider plugins can inject model catalogs via `registerProvider({ catalog })`; OpenClaw merges that output into `models.providers` before writing `models.json`.