fix: escalate to model fallback after rate-limit profile rotation cap (#58707)

* fix: escalate to model fallback after rate-limit profile rotation cap

  Per-model rate limits (e.g. Anthropic Sonnet-only quotas) are not relieved by rotating auth profiles: if all profiles share the same model quota, cycling between them loops forever without falling back to the next model in the configured fallbacks chain.

  Apply the same rotation-cap pattern introduced for overloaded_error (#58348) to rate_limit errors:

  - Add `rateLimitedProfileRotations` to auth.cooldowns config (default: 1)
  - After N profile rotations on a rate_limit error, throw FailoverError to trigger cross-provider model fallback
  - Add `resolveRateLimitProfileRotationLimit` helper following the same pattern as `resolveOverloadProfileRotationLimit`

  Fixes #58572

* fix: cap prompt-side rate-limit failover (#58707) (thanks @Forgely3D)

* fix: restore latest-main gates for #58707

---------

Co-authored-by: Ember (Forgely3D) <ember@forgely.co>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
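
For readers skimming the change, the rotation cap described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this commit: the `Config` shape, the `FailoverError` constructor, the body of `resolveRateLimitProfileRotationLimit`, and the `runWithRotationCap` driver are all assumptions made for the example. Only the helper name, the error name, the config key, and the default of 1 come from the commit message.

```typescript
// Hypothetical config shape; the real OpenClaw config type is not shown here.
type Config = { auth?: { cooldowns?: { rateLimitedProfileRotations?: number } } };

// Stand-in for the real FailoverError; its actual constructor is an assumption.
class FailoverError extends Error {
  reason: string;
  constructor(message: string, reason: string) {
    super(message);
    this.name = "FailoverError";
    this.reason = reason;
  }
}

// Default taken from the commit message (default: 1).
const DEFAULT_RATE_LIMIT_PROFILE_ROTATIONS = 1;

// Assumed behavior: use the configured non-negative integer, else the default.
function resolveRateLimitProfileRotationLimit(cfg?: Config): number {
  const value = cfg?.auth?.cooldowns?.rateLimitedProfileRotations;
  return typeof value === "number" && Number.isInteger(value) && value >= 0
    ? value
    : DEFAULT_RATE_LIMIT_PROFILE_ROTATIONS;
}

type AttemptResult = "ok" | "rate_limit";

// Hypothetical driver: try profiles in order, allow at most `limit` rotations
// after a rate_limit error, then throw so the caller can move to model fallback.
function runWithRotationCap(
  profiles: string[],
  attempt: (profile: string) => AttemptResult,
  cfg?: Config,
): string {
  const limit = resolveRateLimitProfileRotationLimit(cfg);
  let rotations = 0;
  for (const profile of profiles) {
    if (attempt(profile) === "ok") {
      return profile;
    }
    if (rotations >= limit) {
      // Cap reached: stop cycling profiles that share the same model quota.
      throw new FailoverError("rate-limit profile rotation cap reached", "rate_limit");
    }
    rotations += 1;
  }
  // Every profile was tried before the cap was hit.
  throw new FailoverError("all profiles rate-limited", "rate_limit");
}
```

With the default cap of 1 and three profiles that all return `rate_limit`, this sketch tries the first two profiles and then throws, instead of cycling through every profile.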
2026-04-22 06:32:00 +00:00 · 2026-04-01 02:54:10 -06:00
parent 8fce663861
commit 4fa11632b4
22 changed files with 357 additions and 45 deletions
--- a/docs/.generated/config-baseline.json
+++ b/docs/.generated/config-baseline.json
@@ -7907,6 +7907,23 @@
      "help": "Maximum same-provider auth-profile rotations allowed for overloaded errors before switching to model fallback (default: 1).",
      "hasChildren": false
    },
+    {
+      "path": "auth.cooldowns.rateLimitedProfileRotations",
+      "kind": "core",
+      "type": "integer",
+      "required": false,
+      "deprecated": false,
+      "sensitive": false,
+      "tags": [
+        "access",
+        "auth",
+        "performance",
+        "storage"
+      ],
+      "label": "Rate-Limited Profile Rotations",
+      "help": "Maximum same-provider auth-profile rotations allowed for rate-limit errors before switching to model fallback (default: 1).",
+      "hasChildren": false
+    },
    {
      "path": "auth.order",
      "kind": "core",
--- a/docs/.generated/config-baseline.jsonl
+++ b/docs/.generated/config-baseline.jsonl
@@ -1,4 +1,4 @@
-{"generatedBy":"scripts/generate-config-doc-baseline.ts","recordType":"meta","totalPaths":5729}
+{"generatedBy":"scripts/generate-config-doc-baseline.ts","recordType":"meta","totalPaths":5730}
 {"recordType":"path","path":"acp","kind":"core","type":"object","required":false,"deprecated":false,"sensitive":false,"tags":["advanced"],"label":"ACP","help":"ACP runtime controls for enabling dispatch, selecting backends, constraining allowed agent targets, and tuning streamed turn projection behavior.","hasChildren":true}
 {"recordType":"path","path":"acp.allowedAgents","kind":"core","type":"array","required":false,"deprecated":false,"sensitive":false,"tags":["access"],"label":"ACP Allowed Agents","help":"Allowlist of ACP target agent ids permitted for ACP runtime sessions. Empty means no additional allowlist restriction.","hasChildren":true}
 {"recordType":"path","path":"acp.allowedAgents.*","kind":"core","type":"string","required":false,"deprecated":false,"sensitive":false,"tags":[],"hasChildren":false}
@@ -701,6 +701,7 @@
 {"recordType":"path","path":"auth.cooldowns.failureWindowHours","kind":"core","type":"number","required":false,"deprecated":false,"sensitive":false,"tags":["access","auth"],"label":"Failover Window (hours)","help":"Failure window (hours) for backoff counters (default: 24).","hasChildren":false}
 {"recordType":"path","path":"auth.cooldowns.overloadedBackoffMs","kind":"core","type":"integer","required":false,"deprecated":false,"sensitive":false,"tags":["access","auth","reliability","storage"],"label":"Overloaded Backoff (ms)","help":"Fixed delay in milliseconds before retrying an overloaded provider/profile rotation (default: 0).","hasChildren":false}
 {"recordType":"path","path":"auth.cooldowns.overloadedProfileRotations","kind":"core","type":"integer","required":false,"deprecated":false,"sensitive":false,"tags":["access","auth","storage"],"label":"Overloaded Profile Rotations","help":"Maximum same-provider auth-profile rotations allowed for overloaded errors before switching to model fallback (default: 1).","hasChildren":false}
+{"recordType":"path","path":"auth.cooldowns.rateLimitedProfileRotations","kind":"core","type":"integer","required":false,"deprecated":false,"sensitive":false,"tags":["access","auth","performance","storage"],"label":"Rate-Limited Profile Rotations","help":"Maximum same-provider auth-profile rotations allowed for rate-limit errors before switching to model fallback (default: 1).","hasChildren":false}
 {"recordType":"path","path":"auth.order","kind":"core","type":"object","required":false,"deprecated":false,"sensitive":false,"tags":["access","auth"],"label":"Auth Profile Order","help":"Ordered auth profile IDs per provider (used for automatic failover).","hasChildren":true}
 {"recordType":"path","path":"auth.order.*","kind":"core","type":"array","required":false,"deprecated":false,"sensitive":false,"tags":[],"hasChildren":true}
 {"recordType":"path","path":"auth.order.*.*","kind":"core","type":"string","required":false,"deprecated":false,"sensitive":false,"tags":[],"hasChildren":false}
--- a/docs/concepts/model-failover.md
+++ b/docs/concepts/model-failover.md
@@ -138,10 +138,12 @@ If all profiles for a provider fail, OpenClaw moves to the next model in
 `agents.defaults.model.fallbacks`. This applies to auth failures, rate limits, and
 timeouts that exhausted profile rotation (other errors do not advance fallback).

-Overloaded errors are handled more aggressively than billing cooldowns. By default,
-OpenClaw allows one same-provider auth-profile retry, then switches to the next
-configured model fallback without waiting. Tune this with
-`auth.cooldowns.overloadedProfileRotations` and `auth.cooldowns.overloadedBackoffMs`.
+Overloaded and rate-limit errors are handled more aggressively than billing
+cooldowns. By default, OpenClaw allows one same-provider auth-profile retry,
+then switches to the next configured model fallback without waiting. Tune this
+with `auth.cooldowns.overloadedProfileRotations`,
+`auth.cooldowns.overloadedBackoffMs`, and
+`auth.cooldowns.rateLimitedProfileRotations`.

 When a run starts with a model override (hooks or CLI), fallbacks still end at
 `agents.defaults.model.primary` after trying any configured fallbacks.
@@ -154,6 +156,7 @@ See [Gateway configuration](/gateway/configuration) for:
 - `auth.cooldowns.billingBackoffHours` / `auth.cooldowns.billingBackoffHoursByProvider`
 - `auth.cooldowns.billingMaxHours` / `auth.cooldowns.failureWindowHours`
 - `auth.cooldowns.overloadedProfileRotations` / `auth.cooldowns.overloadedBackoffMs`
+- `auth.cooldowns.rateLimitedProfileRotations`
 - `agents.defaults.model.primary` / `agents.defaults.model.fallbacks`
 - `agents.defaults.imageModel` routing

--- a/docs/gateway/configuration-reference.md
+++ b/docs/gateway/configuration-reference.md
@@ -3031,6 +3031,7 @@ Notes:
      failureWindowHours: 24,
      overloadedProfileRotations: 1,
      overloadedBackoffMs: 0,
+      rateLimitedProfileRotations: 1,
    },
  },
 }
@@ -3042,6 +3043,7 @@ Notes:
 - `failureWindowHours`: rolling window in hours used for backoff counters (default: `24`).
 - `overloadedProfileRotations`: maximum same-provider auth-profile rotations for overloaded errors before switching to model fallback (default: `1`).
 - `overloadedBackoffMs`: fixed delay before retrying an overloaded provider/profile rotation (default: `0`).
+- `rateLimitedProfileRotations`: maximum same-provider auth-profile rotations for rate-limit errors before switching to model fallback (default: `1`).

 ---