openclaw/docs/concepts/model-failover.md at 3de5979bdc8e4e9e9d3fee446eaab53cad2ff605

mirror of https://github.com/openclaw/openclaw.git synced 2026-05-22 13:34:04 +00:00

Files

Peter Steinberger f91de52f0d refactor: move runtime state to SQLite

* refactor: remove stale file-backed shims

* fix: harden sqlite state ci boundaries

* refactor: store matrix idb snapshots in sqlite

* fix: satisfy rebased CI guardrails

* refactor: store current conversation bindings in sqlite table

* refactor: store tui last sessions in sqlite table

* refactor: reset sqlite schema history

* refactor: drop unshipped sqlite table migration

* refactor: remove plugin index file rollback

* refactor: drop unshipped sqlite sidecar migrations

* refactor: remove runtime commitments kv migration

* refactor: preserve kysely sync result types

* refactor: drop unshipped sqlite schema migration table

* test: keep session usage coverage sqlite-backed

* refactor: keep sqlite migration doctor-only

* refactor: isolate device legacy imports

* refactor: isolate push voicewake legacy imports

* refactor: isolate remaining runtime legacy imports

* refactor: tighten sqlite migration guardrails

* test: cover sqlite persisted enum parsing

* refactor: isolate legacy update and tui imports

* refactor: tighten sqlite state ownership

* refactor: move legacy imports behind doctor

* refactor: remove legacy session row lookup

* refactor: canonicalize memory transcript locators

* refactor: drop transcript path scope fallbacks

* refactor: drop runtime legacy session delivery pruning

* refactor: store tts prefs only in sqlite

* refactor: remove cron store path runtime

* refactor: use cron sqlite store keys

* refactor: rename telegram message cache scope

* refactor: read memory dreaming status from sqlite

* refactor: rename cron status store key

* refactor: stop remembering transcript file paths

* test: use sqlite locators in agent fixtures

* refactor: remove file-shaped commitments and cron store surfaces

* refactor: keep compaction transcript handles out of session rows

* refactor: derive transcript handles from session identity

* refactor: derive runtime transcript handles

* refactor: remove gateway session locator reads

* refactor: remove transcript locator from session rows

* refactor: store raw stream diagnostics in sqlite

* refactor: remove file-shaped transcript rotation

* refactor: hide legacy trajectory paths from runtime

* refactor: remove runtime transcript file bridges

* refactor: repair database-first rebase fallout

* refactor: align tests with database-first state

* refactor: remove transcript file handoffs

* refactor: sync post-compaction memory by transcript scope

* refactor: run codex app-server sessions by id

* refactor: bind codex runtime state by session id

* refactor: pass memory transcripts by sqlite scope

* refactor: remove transcript locator cleanup leftovers

* test: remove stale transcript file fixtures

* refactor: remove transcript locator test helper

* test: make cron sqlite keys explicit

* test: remove cron runtime store paths

* test: remove stale session file fixtures

* test: use sqlite cron keys in diagnostics

* refactor: remove runtime delivery queue backfill

* test: drop fake export session file mocks

* refactor: rename acp session read failure flag

* refactor: rename acp row session key

* refactor: remove session store test seams

* refactor: move legacy session parser tests to doctor

* refactor: reindex managed memory in place

* refactor: drop stale session store wording

* refactor: rename session row helpers

* refactor: rename sqlite session entry modules

* refactor: remove transcript locator leftovers

* refactor: trim file-era audit wording

* refactor: clean managed media through sqlite

* fix: prefer explicit agent for exports

* fix: use prepared agent for session resets

* fix: canonicalize legacy codex binding import

* test: rename state cleanup helper

* docs: align backup docs with sqlite state

* refactor: drop legacy Pi usage auth fallback

* refactor: move legacy auth profile imports to doctor

* refactor: keep Pi model discovery auth in memory

* refactor: remove MSTeams legacy learning key fallback

* refactor: store model catalog config in sqlite

* refactor: use sqlite model catalog at runtime

* refactor: remove model json compatibility aliases

* refactor: store auth profiles in sqlite

* refactor: seed copied auth profiles in sqlite

* refactor: make auth profile runtime sqlite-addressed

* refactor: migrate hermes secrets into sqlite auth store

* refactor: move plugin install config migration to doctor

* refactor: rename plugin index audit checks

* test: drop auth file assumptions

* test: remove legacy transcript file assertions

* refactor: drop legacy cli session aliases

* refactor: store skill uploads in sqlite

* refactor: keep subagent attachments in sqlite vfs

* refactor: drop subagent attachment cleanup state

* refactor: move legacy session aliases to doctor

* refactor: require node 24 for sqlite state runtime

* refactor: move provider caches into sqlite state

* fix: harden virtual agent filesystem

* refactor: enforce database-first runtime state

* refactor: rename compaction transcript rotation setting

* test: clean sqlite refactor test types

* refactor: consolidate sqlite runtime state

* refactor: model session conversations in sqlite

* refactor: stop deriving cron delivery from session keys

* refactor: stop classifying sessions from key shape

* refactor: hydrate announce targets from typed delivery

* refactor: route heartbeat delivery from typed sqlite context

* refactor: tighten typed sqlite session routing

* refactor: remove session origin routing shadow

* refactor: drop session origin shadow fixtures

* perf: query sqlite vfs paths by prefix

* refactor: use typed conversation metadata for sessions

* refactor: prefer typed session routing metadata

* refactor: require typed session routing metadata

* refactor: resolve group tool policy from typed sessions

* refactor: delete dead session thread info bridge

* Show Codex subscription reset times in channel errors (#80456)

* feat(plugin-sdk): consolidate session workflow APIs

* fix(agents): allow read-only agent mount reads

* [codex] refresh plugin regression fixtures

* fix(agents): restore compaction gateway logs

* test: tighten gateway startup assertions

* Redact persisted secret-shaped payloads [AI] (#79006)

* test: tighten device pair notify assertions

* test: tighten hermes secret assertions

* test: assert matrix client error shapes

* test: assert config compat warnings

* fix(heartbeat): remap cron-run exec events to session keys (#80214)

* fix(codex): route btw through native side threads

* fix(auth): accept friendly OpenAI order for Codex profiles

* fix(codex): rotate auth profiles inside harness

* fix: keep browser status page probe within timeout

* test: assert agents add outputs

* test: pin cron read status

* fix(agents): avoid Pi resource discovery stalls

Co-authored-by: dataCenter430 <titan032000@gmail.com>

* fix: retire timed-out codex app-server clients

* test: tighten qa lab runtime assertions

* test: check security fix outputs

* test: verify extension runtime messages

* feat(wake): expose typed sessionKey on wake protocol + system event CLI

* fix(gateway): await session_end during shutdown drain and track channel + compaction lifecycle paths (#57790)

* test: guard talk consult call helper

* fix(codex): scale context engine projection (#80761)

* fix(codex): scale context engine projection

* fix: document Codex context projection scaling

* fix: document Codex context projection scaling

* fix: document Codex context projection scaling

* fix: document Codex context projection scaling

* chore: align Codex projection changelog

* chore: realign Codex projection changelog

* fix: isolate Codex projection patch

---------

Co-authored-by: Eva (agent) <eva+agent-78055@100yen.org>
Co-authored-by: Josh Lehman <josh@martian.engineering>

* refactor: move agent runtime state toward piless

* refactor: remove cron session reaper

* refactor: move session management to sqlite

* refactor: finish database-first state migration

* chore: refresh generated sqlite db types

* refactor: remove stale file-backed shims

* test: harden kysely type coverage

# Conflicts:
#	.agents/skills/kysely-database-access/SKILL.md
#	src/infra/kysely-sync.types.test.ts
#	src/proxy-capture/store.sqlite.test.ts
#	src/state/openclaw-agent-db.test.ts
#	src/state/openclaw-state-db.test.ts

* refactor: remove cron store path runtime

* refactor: keep compaction transcript handles out of session rows

* refactor: derive embedded transcripts from sqlite identity

* refactor: remove embedded transcript locator handoff

* refactor: remove runtime transcript file bridges

* refactor: remove transcript file handoffs

* refactor: remove MSTeams legacy learning key fallback

* refactor: store model catalog config in sqlite

* refactor: use sqlite model catalog at runtime

# Conflicts:
#	docs/cli/secrets.md
#	docs/gateway/authentication.md
#	docs/gateway/secrets.md

* fix: keep oauth sibling sync sqlite-local

# Conflicts:
#	src/commands/onboard-auth.test.ts

* refactor: remove task session store maintenance

# Conflicts:
#	src/commands/tasks.ts

* refactor: keep diagnostics in state sqlite

* refactor: enforce database-first runtime state

* refactor: consolidate sqlite runtime state

* Show Codex subscription reset times in channel errors (#80456)

* fix(codex): refresh subscription limit resets

* fix(codex): format reset times for channels

* Update CHANGELOG with latest changes and fixes

Updated CHANGELOG with recent fixes and improvements.

* fix(codex): keep command load failures on codex surface

* fix(codex): format account rate limits as rows

* fix(codex): summarize account limits as usage status

* fix(codex): simplify account limit status

* test: tighten subagent announce queue assertion

* test: tighten session delete lifecycle assertions

* test: tighten cron ops assertions

* fix: track cron execution milestones

* test: tighten hermes secret assertions

* test: assert matrix sync store payloads

* test: assert config compat warnings

* fix(codex): align btw side thread semantics

* fix(codex): honor codex fallback blocking

* fix(agents): avoid Pi resource discovery stalls

* test: tighten codex event assertions

* test: tighten cron assertions

* Fix Codex app-server OAuth harness auth

* refactor: move agent runtime state toward piless

* refactor: move device and push state to sqlite

* refactor: move runtime json state imports to doctor

* refactor: finish database-first state migration

* chore: refresh generated sqlite db types

* refactor: clarify cron sqlite store keys

* refactor: remove stale file-backed shims

* refactor: bind codex runtime state by session id

* test: expect sqlite trajectory branch export

* refactor: rename session row helpers

* fix: keep legacy device identity import in doctor

* refactor: enforce database-first runtime state

* refactor: consolidate sqlite runtime state

* build: align pi contract wrappers

* chore: repair database-first rebase

* refactor: remove session file test contracts

* test: update gateway session expectations

* refactor: stop routing from session compatibility shadows

* refactor: stop persisting session route shadows

* refactor: use typed delivery context in clients

* refactor: stop echoing session route shadows

* refactor: repair embedded runner rebase imports

# Conflicts:
#	src/agents/pi-embedded-runner/run/attempt.tool-call-argument-repair.ts

* refactor: align pi contract imports

* refactor: satisfy kysely sync helper guard

* refactor: remove file transcript bridge remnants

* refactor: remove session locator compatibility

* refactor: remove session file test contracts

* refactor: keep rebase database-first clean

* refactor: remove session file assumptions from e2e

* docs: clarify database-first goal state

* test: remove legacy store markers from sqlite runtime tests

* refactor: remove legacy store assumptions from runtime seams

* refactor: align sqlite runtime helper seams

* test: update memory recall sqlite audit mock

* refactor: align database-first runtime type seams

* test: clarify doctor cron legacy store names

* fix: preserve sqlite session route projections

* test: fix copilot token cache test syntax

* docs: update database-first proof status

* test: align database-first test fixtures

* docs: update database-first proof status

* refactor: clean extension database-first drift

* test: align agent session route proof

* test: clarify doctor legacy path fixtures

* chore: clean database-first changed checks

* chore: repair database-first rebase markers

* build: allow baileys git subdependency

* chore: repair exp-vfs rebase drift

* chore: finish exp-vfs rebase cleanup

* chore: satisfy rebase lint drift

* chore: fix qqbot rebase type seam

* chore: fix rebase drift leftovers

* fix: keep auth profile oauth secrets out of sqlite

* fix: repair rebase drift tests

* test: stabilize pairing request ordering

* test: use source manifests in plugin contract checks

* fix: restore gateway session metadata after rebase

* fix: repair database-first rebase drift

* fix: clean up database-first rebase fallout

* test: stabilize line quick reply receipt time

* fix: repair extension rebase drift

* test: keep transcript redaction tests sqlite-backed

* fix: carry injected transcript redaction through sqlite

* chore: clean database branch rebase residue

* fix: repair database branch CI drift

* fix: repair database branch CI guard drift

* fix: stabilize oauth tls preflight test

* test: align database branch fast guards

* test: repair build artifact boundary guards

* chore: clean changelog rebase markers

---------

Co-authored-by: pashpashpash <nik@vault77.ai>
Co-authored-by: Eva <eva@100yen.org>
Co-authored-by: stainlu <stainlu@newtype-ai.org>
Co-authored-by: Jason Zhou <jason.zhou.design@gmail.com>
Co-authored-by: Ruben Cuevas <hi@rubencu.com>
Co-authored-by: Pavan Kumar Gondhi <pavangondhi@gmail.com>
Co-authored-by: Shakker <shakkerdroid@gmail.com>
Co-authored-by: Kaspre <36520309+Kaspre@users.noreply.github.com>
Co-authored-by: dataCenter430 <titan032000@gmail.com>
Co-authored-by: Kaspre <kaspre@gmail.com>
Co-authored-by: pandadev66 <nova.full.stack@outlook.com>
Co-authored-by: Eva <admin@100yen.org>
Co-authored-by: Eva (agent) <eva+agent-78055@100yen.org>
Co-authored-by: Josh Lehman <josh@martian.engineering>
Co-authored-by: jeffjhunter <support@aipersonamethod.com>

2026-05-13 13:15:12 +01:00

22 KiB

Raw Blame History

summary, read_when, title, sidebarTitle

summary

read_when

title

sidebarTitle

How OpenClaw rotates auth profiles and falls back across models

Diagnosing auth profile rotation, cooldowns, or model fallback behavior

Updating failover rules for auth profiles or models

Understanding how session model overrides interact with fallback retries

Model failover

OpenClaw handles failures in two stages:

Auth profile rotation within the current provider.
Model fallback to the next model in agents.defaults.model.fallbacks.

This doc explains the runtime rules and the data that backs them.

Runtime flow

For a normal text run, OpenClaw evaluates candidates in this order:

Resolve the active session model and auth-profile preference. Build the model candidate chain from the current model selection and the fallback policy for that selection source. Configured defaults, cron job primaries, and auto-selected fallback models can use configured fallbacks; explicit user session selections are strict. Try the current provider with auth-profile rotation/cooldown rules. If that provider is exhausted with a failover-worthy error, move to the next model candidate. Persist the selected fallback override before the retry starts so other session readers see the same provider/model the runner is about to use. The persisted model override is marked `modelOverrideSource: "auto"`. If the fallback candidate fails, roll back only the fallback-owned session override fields when they still match that failed candidate. If every candidate fails, throw a `FallbackSummaryError` with per-attempt detail and the soonest cooldown expiry when one is known.

This is intentionally narrower than "save and restore the whole session". The reply runner only persists the model-selection fields it owns for fallback:

providerOverride
modelOverride
modelOverrideSource
authProfileOverride
authProfileOverrideSource
authProfileOverrideCompactionCount

That prevents a failed fallback retry from overwriting newer unrelated session mutations such as manual /model changes or session rotation updates that happened while the attempt was running.

Selection source policy

OpenClaw separates the selected provider/model from why it was selected. That source controls whether the fallback chain is allowed:

Configured default: agents.defaults.model.primary uses agents.defaults.model.fallbacks.
Agent primary: agents.list[].model is strict unless that agent model object includes its own fallbacks. Use fallbacks: [] to make the strict behavior explicit, or provide a non-empty list to opt that agent into model fallback.
Auto fallback override: a runtime fallback writes providerOverride, modelOverride, modelOverrideSource: "auto", and the selected origin model before retrying. That auto override can keep walking the configured fallback chain and is cleared by /new, /reset, and sessions.reset. Heartbeat runs without an explicit heartbeat.model also clear a direct auto override when its origin no longer matches the current configured default.
User session override: /model, the model picker, session_status(model=...), and sessions.patch write modelOverrideSource: "user". That is an exact session selection. If the selected provider/model fails before producing a reply, OpenClaw reports the failure instead of answering from an unrelated configured fallback.
Legacy session override: older session entries may have modelOverride without modelOverrideSource. OpenClaw treats those as user overrides so an explicit old selection is not silently converted into fallback behavior.
Cron payload model: a cron job payload.model / --model is a job primary, not a user session override. It uses configured fallbacks unless the job provides payload.fallbacks; payload.fallbacks: [] makes the cron run strict.

Auth storage (keys + OAuth)

OpenClaw uses auth profiles for both API keys and OAuth tokens.

Secrets live in ~/.openclaw/state/openclaw.sqlite#table/auth_profile_stores/<agentDir>.
Runtime auth-routing state is SQLite-primary. Legacy per-agent auth-state.json files are doctor-import inputs only.
Config auth.profiles / auth.order are metadata + routing only (no secrets).
Legacy import-only OAuth file: ~/.openclaw/credentials/oauth.json (imported by doctor into SQLite).

More detail: OAuth

Credential types:

type: "api_key" → { provider, key }
type: "oauth" → { provider, access, refresh, expires, email? } (+ projectId/enterpriseUrl for some providers)

Profile IDs

OAuth logins create distinct profiles so multiple accounts can coexist.

Default: provider:default when no email is available.
OAuth with email: provider:<email> (for example google-antigravity:user@gmail.com).

Profiles live in ~/.openclaw/state/openclaw.sqlite#table/auth_profile_stores/<agentDir> under profiles.

Rotation order

When a provider has multiple profiles, OpenClaw chooses an order like this:

`auth.order[provider]` (if set). `auth.profiles` filtered by provider. Entries in the SQLite auth-profile row for the provider.

If no explicit order is configured, OpenClaw uses a round-robin order:

Primary key: profile type (OAuth before API keys).
Secondary key: usageStats.lastUsed (oldest first, within each type).
Cooldown/disabled profiles are moved to the end, ordered by soonest expiry.

Session stickiness (cache-friendly)

OpenClaw pins the chosen auth profile per session to keep provider caches warm. It does not rotate on every request. The pinned profile is reused until:

the session is reset (/new / /reset)
a compaction completes (compaction count increments)
the profile is in cooldown/disabled

Manual selection via /model …@<profileId> sets a user override for that session and is not auto-rotated until a new session starts.

Auto-pinned profiles (selected by the session router) are treated as a **preference**: they are tried first, but OpenClaw may rotate to another profile on rate limits/timeouts. When the original profile becomes available again, new runs can prefer it again without changing the selected model or runtime. User-pinned profiles stay locked to that profile; if it fails and model fallbacks are configured, OpenClaw moves to the next model instead of switching profiles.

OpenAI Codex subscription plus API-key backup

For OpenAI agent models, auth and runtime are separate. openai/gpt-* stays on the Codex harness while auth can rotate between a Codex subscription profile and an OpenAI API-key backup.

Use auth.order.openai for the user-facing order:

{
  auth: {
    order: {
      openai: ["openai-codex:user@example.com", "openai:api-key-backup"],
    },
  },
}

Existing Codex subscription profiles may still use the legacy openai-codex:* profile id. The ordered API-key backup can be a normal openai:* API-key profile. When the subscription hits a Codex usage limit, OpenClaw records the exact reset time when Codex provides one, tries the next ordered auth profile, and keeps the run inside the Codex harness. Once the reset time passes, the subscription profile is eligible again and the next automatic selection can return to it.

Use a user-pinned profile only when you want to force one account/key for that session. User-pinned profiles are intentionally strict and do not silently jump to another profile.

Cooldowns

When a profile fails due to auth/rate-limit errors (or a timeout that looks like rate limiting), OpenClaw marks it in cooldown and moves to the next profile.

That rate-limit bucket is broader than plain `429`: it also includes provider messages such as `Too many concurrent requests`, `ThrottlingException`, `concurrency limit reached`, `workers_ai ... quota limit exceeded`, `throttled`, `resource exhausted`, and periodic usage-window limits such as `weekly/monthly limit reached`.

Format/invalid-request errors are usually terminal because retrying the same payload would fail the same way, so OpenClaw surfaces them instead of rotating auth profiles. Known retry-repair paths can opt in explicitly: for example Cloud Code Assist tool call ID validation failures are sanitized and retried once through the `allowFormatRetry` policy. OpenAI-compatible stop-reason errors such as `Unhandled stop reason: error`, `stop reason: error`, and `reason: error` are classified as timeout/failover signals.

Generic server text can also land in that timeout bucket when the source matches a known transient pattern. For example, the bare pi-ai stream-wrapper message `An unknown error occurred` is treated as failover-worthy for every provider because pi-ai emits it when provider streams end with `stopReason: "aborted"` or `stopReason: "error"` without specific details. JSON `api_error` payloads with transient server text such as `internal server error`, `unknown error, 520`, `upstream error`, or `backend error` are also treated as failover-worthy timeouts.

OpenRouter-specific generic upstream text such as bare `Provider returned error` is treated as timeout only when the provider context is actually OpenRouter. Generic internal fallback text such as `LLM request failed with an unknown error.` stays conservative and does not trigger failover by itself.

Some provider SDKs may otherwise sleep for a long `Retry-After` window before returning control to OpenClaw. For Stainless-based SDKs such as Anthropic and OpenAI, OpenClaw caps SDK-internal `retry-after-ms` / `retry-after` waits at 60 seconds by default and surfaces longer retryable responses immediately so this failover path can run. Tune or disable the cap with `OPENCLAW_SDK_RETRY_MAX_WAIT_SECONDS`; see [Retry behavior](/concepts/retry). Rate-limit cooldowns can also be model-scoped:

- OpenClaw records `cooldownModel` for rate-limit failures when the failing model id is known.
- A sibling model on the same provider can still be tried when the cooldown is scoped to a different model.
- Billing/disabled windows still block the whole profile across models.

Cooldowns use exponential backoff:

1 minute
5 minutes
25 minutes
1 hour (cap)

State is stored in SQLite under usageStats:

{
  "usageStats": {
    "provider:profile": {
      "lastUsed": 1736160000000,
      "cooldownUntil": 1736160600000,
      "errorCount": 2
    }
  }
}

Billing disables

Billing/credit failures (for example "insufficient credits" / "credit balance too low") are treated as failover-worthy, but they're usually not transient. Instead of a short cooldown, OpenClaw marks the profile as disabled (with a longer backoff) and rotates to the next profile/provider.

Not every billing-shaped response is `402`, and not every HTTP `402` lands here. OpenClaw keeps explicit billing text in the billing lane even when a provider returns `401` or `403` instead, but provider-specific matchers stay scoped to the provider that owns them (for example OpenRouter `403 Key limit exceeded`).

Meanwhile temporary 402 usage-window and organization/workspace spend-limit errors are classified as rate_limit when the message looks retryable (for example weekly usage limit exhausted, daily limit reached, resets tomorrow, or organization spending limit exceeded). Those stay on the short cooldown/failover path instead of the long billing-disable path.

State is stored in SQLite:

{
  "usageStats": {
    "provider:profile": {
      "disabledUntil": 1736178000000,
      "disabledReason": "billing"
    }
  }
}

Defaults:

Billing backoff starts at 5 hours, doubles per billing failure, and caps at 24 hours.
Backoff counters reset if the profile hasn't failed for 24 hours (configurable).
Overloaded retries allow 1 same-provider profile rotation before model fallback.
Overloaded retries use 0 ms backoff by default.

Model fallback

If all profiles for a provider fail, OpenClaw moves to the next model in agents.defaults.model.fallbacks. This applies to auth failures, rate limits, and timeouts that exhausted profile rotation (other errors do not advance fallback). Provider errors that do not expose enough detail are still labeled precisely in fallback state: empty_response means the provider returned no usable message or status, no_error_details means the provider explicitly returned Unknown error (no error details in response), and unclassified means OpenClaw preserved the raw preview but no classifier matched it yet.

Overloaded and rate-limit errors are handled more aggressively than billing cooldowns. By default, OpenClaw allows one same-provider auth-profile retry, then switches to the next configured model fallback without waiting. Provider-busy signals such as ModelNotReadyException land in that overloaded bucket. Tune this with auth.cooldowns.overloadedProfileRotations, auth.cooldowns.overloadedBackoffMs, and auth.cooldowns.rateLimitedProfileRotations.

When a run starts from the configured default primary, a cron job primary, an agent primary with explicit fallbacks, or an auto-selected fallback override, OpenClaw can walk the matching configured fallback chain. Agent primaries without explicit fallbacks and explicit user selections (for example /model ollama/qwen3.5:27b, the model picker, sessions.patch, or one-off CLI provider/model overrides) are strict: if that provider/model is unreachable or fails before producing a reply, OpenClaw reports the failure instead of answering from an unrelated fallback.

Candidate chain rules

OpenClaw builds the candidate list from the currently requested provider/model plus configured fallbacks.

- The requested model is always first. - Explicit configured fallbacks are deduplicated but not filtered by the model allowlist. They are treated as explicit operator intent. - If the current run is already on a configured fallback in the same provider family, OpenClaw keeps using the full configured chain. - When no explicit fallback override is supplied, configured fallbacks are tried before the configured primary even if the requested model uses a different provider. - When no explicit fallback override is supplied to the fallback runner, the configured primary is appended at the end so the chain can settle back onto the normal default once earlier candidates are exhausted. - When a caller supplies `fallbacksOverride`, the runner uses exactly the requested model plus that override list. An empty list disables model fallback and prevents the configured primary from being appended as a hidden retry target.

Which errors advance fallback

- auth failures - rate limits and cooldown exhaustion - overloaded/provider-busy errors - timeout-shaped failover errors - billing disables - `LiveSessionModelSwitchError`, which is normalized into a failover path so a stale persisted model does not create an outer retry loop - other unrecognized errors when there are still remaining candidates - explicit aborts that are not timeout/failover-shaped - context overflow errors that should stay inside compaction/retry logic (for example `request_too_large`, `INVALID_ARGUMENT: input exceeds the maximum number of tokens`, `input token count exceeds the maximum number of input tokens`, `The input is too long for the model`, or `ollama error: context length exceeded`) - a final unknown error when there are no candidates left

Cooldown skip vs probe behavior

When every auth profile for a provider is already in cooldown, OpenClaw does not automatically skip that provider forever. It makes a per-candidate decision:

- Persistent auth failures skip the whole provider immediately. - Billing disables usually skip, but the primary candidate can still be probed on a throttle so recovery is possible without restarting. - The primary candidate may be probed near cooldown expiry, with a per-provider throttle. - Same-provider fallback siblings can be attempted despite cooldown when the failure looks transient (`rate_limit`, `overloaded`, or unknown). This is especially relevant when a rate limit is model-scoped and a sibling model may still recover immediately. - Transient cooldown probes are limited to one per provider per fallback run so a single provider does not stall cross-provider fallback.

Session overrides and live model switching

Session model changes are shared state. The active runner, /model command, compaction/session updates, and live-session reconciliation all read or write parts of the same session entry.

That means fallback retries have to coordinate with live model switching:

Only explicit user-driven model changes mark a pending live switch. That includes /model, session_status(model=...), and sessions.patch.
System-driven model changes such as fallback rotation, heartbeat overrides, or compaction never mark a pending live switch on their own.
User-driven model overrides are treated as exact selections for fallback policy, so an unreachable selected provider surfaces as a failure instead of being masked by agents.defaults.model.fallbacks.
Before a fallback retry starts, the reply runner persists the selected fallback override fields to the session entry.
Auto fallback overrides remain selected on subsequent turns so OpenClaw does not probe a known-bad primary on every message. /new, /reset, and sessions.reset clear auto-sourced overrides and return the session to the configured default.
/status shows the selected model and, when fallback state differs, the active fallback model and reason.
Live-session reconciliation prefers persisted session overrides over stale runtime model fields.
If a live-switch error points at a later candidate in the active fallback chain, OpenClaw jumps directly to that selected model instead of walking unrelated candidates first.
If the fallback attempt fails, the runner rolls back only the override fields it wrote, and only if they still match that failed candidate.

This prevents the classic race:

The selected primary model fails. Fallback candidate is chosen in memory. Session store still reflects the old primary. Live-session reconciliation reads the stale session state. The retry gets snapped back to the old model before the fallback attempt starts.

The persisted fallback override closes that window, and the narrow rollback keeps newer manual or runtime session changes intact.

Observability and failure summaries

runWithModelFallback(...) records per-attempt details that feed logs and user-facing cooldown messaging:

provider/model attempted
reason (rate_limit, overloaded, billing, auth, model_not_found, and similar failover reasons)
optional status/code
human-readable error summary

Structured model_fallback_decision logs also include flat fallbackStep* fields when a candidate fails, is skipped, or a later fallback succeeds. These fields make the attempted transition explicit (fallbackStepFromModel, fallbackStepToModel, fallbackStepFromFailureReason, fallbackStepFromFailureDetail, fallbackStepFinalOutcome) so log and diagnostic exporters can reconstruct the primary failure even when the terminal fallback also fails.

When every candidate fails, OpenClaw throws FallbackSummaryError. The outer reply runner can use that to build a more specific message such as "all models are temporarily rate-limited" and include the soonest cooldown expiry when one is known.

That cooldown summary is model-aware:

unrelated model-scoped rate limits are ignored for the attempted provider/model chain
if the remaining block is a matching model-scoped rate limit, OpenClaw reports the last matching expiry that still blocks that model

See Gateway configuration for:

auth.profiles / auth.order
auth.cooldowns.billingBackoffHours / auth.cooldowns.billingBackoffHoursByProvider
auth.cooldowns.billingMaxHours / auth.cooldowns.failureWindowHours
auth.cooldowns.overloadedProfileRotations / auth.cooldowns.overloadedBackoffMs
auth.cooldowns.rateLimitedProfileRotations
agents.defaults.model.primary / agents.defaults.model.fallbacks
agents.defaults.imageModel routing

See Models for the broader model selection and fallback overview.

22 KiB Raw Blame History