fix: retry primary after auto model fallback

Peter Steinberger
2026-04-27 09:18:54 +01:00
parent 983bac7afa
commit 9611260225
4 changed files with 145 additions and 7 deletions

@@ -21,7 +21,7 @@ For a normal text run, OpenClaw evaluates candidates in this order:
<Steps>
<Step title="Resolve session state">
-Resolve the active session model and auth-profile preference.
+Resolve the active session model and auth-profile preference. A session override with `modelOverrideSource: "auto"` came from an earlier fallback, so the next run clears it first and retries the configured primary; user-selected overrides stay sticky.
</Step>
<Step title="Build candidate chain">
Build the model candidate chain from the currently selected session model, then `agents.defaults.model.fallbacks` in order, ending with the configured primary when the run started from an override.
@@ -33,7 +33,7 @@ For a normal text run, OpenClaw evaluates candidates in this order:
If that provider is exhausted with a failover-worthy error, move to the next model candidate.
</Step>
<Step title="Persist fallback override">
-Persist the selected fallback override before the retry starts so other session readers see the same provider/model the runner is about to use.
+Persist the selected fallback override before the retry starts so other session readers see the same provider/model the runner is about to use. The persisted model override is marked `modelOverrideSource: "auto"`.
</Step>
<Step title="Roll back narrowly on failure">
If the fallback candidate fails, roll back only the fallback-owned session override fields when they still match that failed candidate.
@@ -264,6 +264,8 @@ That means fallback retries have to coordinate with live model switching:
- Only explicit user-driven model changes mark a pending live switch. That includes `/model`, `session_status(model=...)`, and `sessions.patch`.
- System-driven model changes such as fallback rotation, heartbeat overrides, or compaction never mark a pending live switch on their own.
- Before a fallback retry starts, the reply runner persists the selected fallback override fields to the session entry.
+- On the next run, auto fallback overrides are cleared before model selection so the configured primary is retried. If it is still unhealthy, the fallback loop records a fresh auto override for that new attempt.
+- User model overrides (`modelOverrideSource: "user"`) and legacy overrides without a source field remain persistent across turns.
- Live-session reconciliation prefers persisted session overrides over stale runtime model fields.
- If a live-switch error points at a later candidate in the active fallback chain, OpenClaw jumps directly to that selected model instead of walking unrelated candidates first.
- If the fallback attempt fails, the runner rolls back only the override fields it wrote, and only if they still match that failed candidate.
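
The override lifecycle described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the `SessionEntry` shape, the `modelOverride` field name, and every function name here are hypothetical. Only `modelOverrideSource` and its `"auto"` / `"user"` values come from the documentation text.

```typescript
// Hypothetical session shape for illustration. Only `modelOverrideSource`
// and its "auto" / "user" values are taken from the docs text above.
type ModelOverrideSource = "auto" | "user";

interface SessionEntry {
  modelOverride?: string; // hypothetical field name
  modelOverrideSource?: ModelOverrideSource; // absent on legacy entries
}

// Return a copy of the entry with the override fields removed.
function withoutOverride(entry: SessionEntry): SessionEntry {
  const next: SessionEntry = { ...entry };
  delete next.modelOverride;
  delete next.modelOverrideSource;
  return next;
}

// Next run: drop an override left by an earlier fallback so the configured
// primary is retried. User and legacy (no source) overrides are kept.
function clearAutoOverride(entry: SessionEntry): SessionEntry {
  return entry.modelOverrideSource === "auto" ? withoutOverride(entry) : entry;
}

// Pick the model the run starts from after clearing auto overrides.
function selectModel(entry: SessionEntry, primary: string): string {
  return clearAutoOverride(entry).modelOverride ?? primary;
}

// Before a fallback retry: persist the fallback and mark it as "auto".
function recordFallback(entry: SessionEntry, fallbackModel: string): SessionEntry {
  return { ...entry, modelOverride: fallbackModel, modelOverrideSource: "auto" };
}

// After a failed fallback: roll back only if the persisted override is still
// the auto override written for that failed candidate.
function rollbackFallback(entry: SessionEntry, failedModel: string): SessionEntry {
  const ownsOverride =
    entry.modelOverrideSource === "auto" && entry.modelOverride === failedModel;
  return ownsOverride ? withoutOverride(entry) : entry;
}
```

In this sketch, an entry without a source field is treated the same as a user override, matching the statement that legacy overrides remain persistent across turns.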