From 73584b1d33d76bceec1f172c13db69f80c5cbc52 Mon Sep 17 00:00:00 2001
From: Peter Steinberger
Date: Sat, 4 Apr 2026 14:44:42 +0100
Subject: [PATCH] docs: refresh failover and compaction refs

---
 docs/concepts/compaction.md                     |  4 +++-
 docs/concepts/model-failover.md                 |  6 ++++++
 docs/concepts/model-providers.md                |  5 ++++-
 docs/help/faq.md                                | 10 ++++++++++
 docs/reference/session-management-compaction.md |  5 ++++-
 5 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/docs/concepts/compaction.md b/docs/concepts/compaction.md
index cf69945358b..5efb21b3f03 100644
--- a/docs/concepts/compaction.md
+++ b/docs/concepts/compaction.md
@@ -25,7 +25,9 @@ model sees on the next turn.
 
 Auto-compaction is on by default. It runs when the session nears the context
 limit, or when the model returns a context-overflow error (in which case
-OpenClaw compacts and retries).
+OpenClaw compacts and retries). Typical overflow signatures include
+`request_too_large`, `context length exceeded`, `input exceeds the maximum
+number of tokens`, and `input is too long for the model`.
 
 Before compacting, OpenClaw automatically reminds the agent to save important
diff --git a/docs/concepts/model-failover.md b/docs/concepts/model-failover.md
index 86488571b8f..a743190d848 100644
--- a/docs/concepts/model-failover.md
+++ b/docs/concepts/model-failover.md
@@ -120,6 +120,10 @@ If you have both an OAuth profile and an API key profile for the same provider,
 When a profile fails due to auth/rate‑limit errors (or a timeout that looks
 like rate limiting), OpenClaw marks it in cooldown and moves to the next
 profile.
+That rate-limit bucket is broader than plain `429`: it also includes provider
+messages such as `Too many concurrent requests`, `ThrottlingException`,
+`throttled`, `resource exhausted`, and periodic usage-window limits such as
+`weekly/monthly limit reached`.
 
 Format/invalid‑request errors (for example Cloud Code Assist tool call ID
 validation failures) are treated as failover‑worthy and use the same cooldowns.
 OpenAI-compatible stop-reason errors such as `Unhandled stop reason: error`,
@@ -223,6 +227,8 @@ Model fallback does not continue on:
 
 - explicit aborts that are not timeout/failover-shaped
 - context overflow errors that should stay inside compaction/retry logic
+  (for example `request_too_large`, `INVALID_ARGUMENT: input exceeds the maximum
+  number of tokens`, or `The input is too long for the model`)
 - a final unknown error when there are no candidates left
 
 ### Cooldown skip vs probe behavior
diff --git a/docs/concepts/model-providers.md b/docs/concepts/model-providers.md
index d894438f7a0..34e85813f20 100644
--- a/docs/concepts/model-providers.md
+++ b/docs/concepts/model-providers.md
@@ -154,7 +154,10 @@ surface.
 
 - `_API_KEY_*` (numbered list, e.g. `_API_KEY_1`)
 - For Google providers, `GOOGLE_API_KEY` is also included as fallback.
 - Key selection order preserves priority and deduplicates values.
-- Requests are retried with the next key only on rate-limit responses (for example `429`, `rate_limit`, `quota`, `resource exhausted`).
+- Requests are retried with the next key only on rate-limit responses (for
+  example `429`, `rate_limit`, `quota`, `resource exhausted`, `Too many
+  concurrent requests`, `ThrottlingException`, or periodic usage-limit
+  messages).
 - Non-rate-limit failures fail immediately; no key rotation is attempted.
 - When all candidate keys fail, the final error is returned from the last attempt.
diff --git a/docs/help/faq.md b/docs/help/faq.md
index 4aef00d4594..53d6eb986b6 100644
--- a/docs/help/faq.md
+++ b/docs/help/faq.md
@@ -2362,6 +2362,16 @@ for usage/billing and raise limits as needed.
 
 Cooldowns apply to failing profiles (exponential backoff), so OpenClaw can
 keep responding even when a provider is rate-limited or temporarily failing.
+The rate-limit bucket includes more than plain `429` responses. OpenClaw
+also treats messages like `Too many concurrent requests`,
+`ThrottlingException`, `resource exhausted`, and periodic usage-window
+limits (`weekly/monthly limit reached`) as failover-worthy rate limits.
+
+Context-overflow errors are different: signatures such as
+`request_too_large`, `input exceeds the maximum number of tokens`, or
+`input is too long for the model` stay on the compaction/retry path instead
+of advancing model fallback.
+
diff --git a/docs/reference/session-management-compaction.md b/docs/reference/session-management-compaction.md
index f3dbd61c6d3..f394db5a170 100644
--- a/docs/reference/session-management-compaction.md
+++ b/docs/reference/session-management-compaction.md
@@ -216,7 +216,10 @@ Compaction is **persistent** (unlike session pruning). See [/concepts/session-pr
 
 In the embedded Pi agent, auto-compaction triggers in two cases:
 
-1. **Overflow recovery**: the model returns a context overflow error → compact → retry.
+1. **Overflow recovery**: the model returns a context overflow error
+   (`request_too_large`, `context length exceeded`, `input exceeds the maximum
+   number of tokens`, `input is too long for the model`, and similar
+   provider-shaped variants) → compact → retry.
 2. **Threshold maintenance**: after a successful turn, when:
    `contextTokens > contextWindow - reserveTokens`
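Reviewer note: the two error buckets this patch documents can be sketched as a small classifier. This is an illustrative TypeScript sketch, not OpenClaw's actual implementation; only the signature strings come from the docs above, while `classifyProviderError`, `RATE_LIMIT_SIGNATURES`, and `OVERFLOW_SIGNATURES` are hypothetical names.

```typescript
// Hypothetical sketch of the rate-limit vs context-overflow buckets.
type ErrorBucket = "rate-limit" | "context-overflow" | "other";

// Signatures listed in the patch (lowercased for matching).
const RATE_LIMIT_SIGNATURES: string[] = [
  "429",
  "rate_limit",
  "quota",
  "too many concurrent requests",
  "throttlingexception",
  "throttled",
  "resource exhausted",
  "weekly limit reached",
  "monthly limit reached",
];

const OVERFLOW_SIGNATURES: string[] = [
  "request_too_large",
  "context length exceeded",
  "input exceeds the maximum number of tokens",
  "input is too long for the model",
];

function classifyProviderError(message: string): ErrorBucket {
  const m = message.toLowerCase();
  // Overflow is checked first: those errors stay on the compaction/retry
  // path and must not advance profile cooldown or model fallback.
  if (OVERFLOW_SIGNATURES.some((s) => m.includes(s))) {
    return "context-overflow";
  }
  if (RATE_LIMIT_SIGNATURES.some((s) => m.includes(s))) {
    return "rate-limit";
  }
  return "other";
}

console.log(classifyProviderError("ThrottlingException: slow down"));
// → "rate-limit"
```

Substring matching is a simplification; the real implementation may match structured error codes rather than message text.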
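The key-rotation rule in the `docs/concepts/model-providers.md` hunk (rotate only on rate limits, fail fast otherwise, return the last error when all keys are exhausted) can be sketched as follows. `requestWithKeyRotation` is a hypothetical name under assumed types, not OpenClaw's real API:

```typescript
// Hedged sketch of the key-rotation rule: retry with the next candidate
// key only on rate-limit responses; any other failure is surfaced
// immediately; exhausting every key re-throws the last error.
function requestWithKeyRotation<T>(
  keys: string[],
  attempt: (key: string) => T,
  isRateLimit: (err: unknown) => boolean,
): T {
  let lastError: unknown = new Error("no candidate keys");
  for (const key of keys) {
    try {
      return attempt(key);
    } catch (err) {
      if (!isRateLimit(err)) {
        throw err; // non-rate-limit failures fail immediately
      }
      lastError = err; // rate-limited: fall through to the next key
    }
  }
  throw lastError; // all candidate keys failed
}

// Example: the first key is rate-limited, the second succeeds.
const result = requestWithKeyRotation(
  ["KEY_1", "KEY_2"],
  (key) => {
    if (key === "KEY_1") throw new Error("429: rate_limit exceeded");
    return `ok via ${key}`;
  },
  (err) => err instanceof Error && err.message.includes("429"),
);
console.log(result); // → "ok via KEY_2"
```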
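The threshold-maintenance trigger quoted in the `docs/reference/session-management-compaction.md` hunk is a single inequality. A minimal sketch, where `shouldCompact` is an illustrative name and only the inequality itself comes from the patch:

```typescript
// Minimal sketch of the threshold-maintenance check:
// compact once the running context eats into the reserved headroom.
function shouldCompact(
  contextTokens: number,
  contextWindow: number,
  reserveTokens: number,
): boolean {
  return contextTokens > contextWindow - reserveTokens;
}

console.log(shouldCompact(195_000, 200_000, 10_000)); // → true
console.log(shouldCompact(150_000, 200_000, 10_000)); // → false
```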