From e336300e600d17b34861da519fb4ac86fe8ffbdf Mon Sep 17 00:00:00 2001
From: Peter Steinberger
Date: Sat, 4 Apr 2026 20:43:58 +0100
Subject: [PATCH] docs: refresh failover and compaction pattern refs

---
 docs/concepts/compaction.md             |  4 +++-
 docs/concepts/model-failover.md         | 10 +++++++---
 docs/concepts/model-providers.md        |  4 ++--
 docs/gateway/authentication.md          |  4 +++-
 docs/gateway/configuration-reference.md |  4 ++--
 docs/help/faq.md                        | 20 ++++++++++++-------
 docs/pi.md                              |  7 ++++++-
 .../session-management-compaction.md    |  5 +++--
 8 files changed, 39 insertions(+), 19 deletions(-)

diff --git a/docs/concepts/compaction.md b/docs/concepts/compaction.md
index 2ce5b64bd75..1ebc0e78f6f 100644
--- a/docs/concepts/compaction.md
+++ b/docs/concepts/compaction.md
@@ -32,7 +32,9 @@ Auto-compaction is on by default. It runs when the session nears the context
 limit, or when the model returns a context-overflow error (in which case
 OpenClaw compacts and retries). Typical overflow signatures include
 `request_too_large`, `context length exceeded`, `input exceeds the maximum
-number of tokens`, and `input is too long for the model`.
+number of tokens`, `input token count exceeds the maximum number of input
+tokens`, `input is too long for the model`, and `ollama error: context length
+exceeded`.
 
 Before compacting, OpenClaw automatically reminds the agent to save important
diff --git a/docs/concepts/model-failover.md b/docs/concepts/model-failover.md
index a38650144b2..0a841d2e997 100644
--- a/docs/concepts/model-failover.md
+++ b/docs/concepts/model-failover.md
@@ -122,6 +122,7 @@ When a profile fails due to auth/rate‑limit errors (or a timeout that looks
 like rate limiting), OpenClaw marks it in cooldown and moves to the next profile.
 That rate-limit bucket is broader than plain `429`: it also includes provider
 messages such as `Too many concurrent requests`, `ThrottlingException`,
+`concurrency limit reached`, `workers_ai ... quota limit exceeded`,
 `throttled`, `resource exhausted`, and periodic usage-window limits such as
 `weekly/monthly limit reached`.
 Format/invalid‑request errors (for example Cloud Code Assist tool call ID
@@ -203,8 +204,9 @@ timeouts that exhausted profile rotation (other errors do not advance fallback).
 
 Overloaded and rate-limit errors are handled more aggressively than billing
 cooldowns. By default, OpenClaw allows one same-provider auth-profile retry,
-then switches to the next configured model fallback without waiting. Tune this
-with `auth.cooldowns.overloadedProfileRotations`,
+then switches to the next configured model fallback without waiting.
+Provider-busy signals such as `ModelNotReadyException` land in that overloaded
+bucket. Tune this with `auth.cooldowns.overloadedProfileRotations`,
 `auth.cooldowns.overloadedBackoffMs`, and
 `auth.cooldowns.rateLimitedProfileRotations`.
 
@@ -248,7 +250,9 @@ Model fallback does not continue on:
 - explicit aborts that are not timeout/failover-shaped
 - context overflow errors that should stay inside compaction/retry logic
   (for example `request_too_large`, `INVALID_ARGUMENT: input exceeds the maximum
-number of tokens`, or `The input is too long for the model`)
+number of tokens`, `input token count exceeds the maximum number of input
+tokens`, `The input is too long for the model`, or `ollama error: context
+length exceeded`)
 - a final unknown error when there are no candidates left
 
 ### Cooldown skip vs probe behavior
diff --git a/docs/concepts/model-providers.md b/docs/concepts/model-providers.md
index 58e705fdf6c..68107cb2993 100644
--- a/docs/concepts/model-providers.md
+++ b/docs/concepts/model-providers.md
@@ -168,8 +168,8 @@ surface.
 - Key selection order preserves priority and deduplicates values.
 - Requests are retried with the next key only on rate-limit responses (for
   example `429`, `rate_limit`, `quota`, `resource exhausted`, `Too many
-concurrent requests`, `ThrottlingException`, or periodic usage-limit
-  messages).
+concurrent requests`, `ThrottlingException`, `concurrency limit reached`,
+  `workers_ai ... quota limit exceeded`, or periodic usage-limit messages).
 - Non-rate-limit failures fail immediately; no key rotation is attempted.
 - When all candidate keys fail, the final error is returned from the last attempt.
 
diff --git a/docs/gateway/authentication.md b/docs/gateway/authentication.md
index 38a22c08250..a559b8b4abf 100644
--- a/docs/gateway/authentication.md
+++ b/docs/gateway/authentication.md
@@ -144,7 +144,9 @@ hits a provider rate limit.
 - Google providers also include `GOOGLE_API_KEY` as an additional fallback.
 - The same key list is deduplicated before use.
 - OpenClaw retries with the next key only for rate-limit errors (for example
-  `429`, `rate_limit`, `quota`, `resource exhausted`).
+  `429`, `rate_limit`, `quota`, `resource exhausted`, `Too many concurrent
+requests`, `ThrottlingException`, `concurrency limit reached`, or
+  `workers_ai ... quota limit exceeded`).
 - Non-rate-limit errors are not retried with alternate keys.
 - If all keys fail, the final error from the last attempt is returned.
 
diff --git a/docs/gateway/configuration-reference.md b/docs/gateway/configuration-reference.md
index e398227107a..12a93d74d0e 100644
--- a/docs/gateway/configuration-reference.md
+++ b/docs/gateway/configuration-reference.md
@@ -3161,9 +3161,9 @@ Notes:
 - `authPermanentBackoffMinutes`: base backoff in minutes for high-confidence `auth_permanent` failures (default: `10`).
 - `authPermanentMaxMinutes`: cap in minutes for `auth_permanent` backoff growth (default: `60`).
 - `failureWindowHours`: rolling window in hours used for backoff counters (default: `24`).
-- `overloadedProfileRotations`: maximum same-provider auth-profile rotations for overloaded errors before switching to model fallback (default: `1`).
+- `overloadedProfileRotations`: maximum same-provider auth-profile rotations for overloaded errors before switching to model fallback (default: `1`). Provider-busy shapes such as `ModelNotReadyException` land here.
 - `overloadedBackoffMs`: fixed delay before retrying an overloaded provider/profile rotation (default: `0`).
-- `rateLimitedProfileRotations`: maximum same-provider auth-profile rotations for rate-limit errors before switching to model fallback (default: `1`).
+- `rateLimitedProfileRotations`: maximum same-provider auth-profile rotations for rate-limit errors before switching to model fallback (default: `1`). That rate-limit bucket includes provider-shaped text such as `Too many concurrent requests`, `ThrottlingException`, `concurrency limit reached`, `workers_ai ... quota limit exceeded`, and `resource exhausted`.
 
 ---
 
diff --git a/docs/help/faq.md b/docs/help/faq.md
index 96f9f19c03e..62f6b840eb4 100644
--- a/docs/help/faq.md
+++ b/docs/help/faq.md
@@ -2476,8 +2476,10 @@ for usage/billing and raise limits as needed.
 
     The rate-limit bucket includes more than plain `429` responses. OpenClaw
     also treats messages like `Too many concurrent requests`,
-    `ThrottlingException`, `resource exhausted`, and periodic usage-window
-    limits (`weekly/monthly limit reached`) as failover-worthy rate limits.
+    `ThrottlingException`, `concurrency limit reached`,
+    `workers_ai ... quota limit exceeded`, `resource exhausted`, and periodic
+    usage-window limits (`weekly/monthly limit reached`) as failover-worthy
+    rate limits.
 
     Some billing-looking responses are not `402`, and some HTTP `402`
     responses also stay in that transient bucket. If a provider returns
@@ -2490,17 +2492,21 @@ for usage/billing and raise limits as needed.
     `rate_limit`, not a long billing disable.
 
     Context-overflow errors are different: signatures such as
-    `request_too_large`, `input exceeds the maximum number of tokens`, or
-    `input is too long for the model` stay on the compaction/retry path instead
-    of advancing model fallback.
+    `request_too_large`, `input exceeds the maximum number of tokens`,
+    `input token count exceeds the maximum number of input tokens`,
+    `input is too long for the model`, or `ollama error: context length
+    exceeded` stay on the compaction/retry path instead of advancing model
+    fallback.
 
     Generic server-error text is intentionally narrower than "anything with
     unknown/error in it". OpenClaw does treat provider-scoped transient shapes
     such as Anthropic bare `An unknown error occurred`, OpenRouter bare
     `Provider returned error`, stop-reason errors like `Unhandled stop reason:
-    error`, and JSON `api_error` payloads with transient server text
+    error`, JSON `api_error` payloads with transient server text
     (`internal server error`, `unknown error, 520`, `upstream error`, `backend
-    error`) as timeout/failover signals when the provider context matches.
+    error`), and provider-busy errors such as `ModelNotReadyException` as
+    failover-worthy timeout/overloaded signals when the provider context
+    matches.
 
     Generic internal fallback text like `LLM request failed with an unknown
     error.` stays conservative and does not trigger model fallback by itself.
diff --git a/docs/pi.md b/docs/pi.md
index c7e878a5573..09cb9ac819b 100644
--- a/docs/pi.md
+++ b/docs/pi.md
@@ -326,7 +326,12 @@ trackSessionManagerAccess(params.sessionFile);
 
 ### Compaction
 
-Auto-compaction triggers on context overflow. `compactEmbeddedPiSessionDirect()` handles manual compaction:
+Auto-compaction triggers on context overflow. Common overflow signatures
+include `request_too_large`, `context length exceeded`, `input exceeds the
+maximum number of tokens`, `input token count exceeds the maximum number of
+input tokens`, `input is too long for the model`, and `ollama error: context
+length exceeded`. `compactEmbeddedPiSessionDirect()` handles manual
+compaction:
 
 ```typescript
 const compactResult = await compactEmbeddedPiSessionDirect({
diff --git a/docs/reference/session-management-compaction.md b/docs/reference/session-management-compaction.md
index da48bd5c6ae..1cbd70a7bcb 100644
--- a/docs/reference/session-management-compaction.md
+++ b/docs/reference/session-management-compaction.md
@@ -231,8 +231,9 @@ In the embedded Pi agent, auto-compaction triggers in two cases:
 
 1. **Overflow recovery**: the model returns a context overflow error
    (`request_too_large`, `context length exceeded`, `input exceeds the maximum
-number of tokens`, `input is too long for the model`, and similar
-   provider-shaped variants) → compact → retry.
+number of tokens`, `input token count exceeds the maximum number of input
+tokens`, `input is too long for the model`, `ollama error: context length
+exceeded`, and similar provider-shaped variants) → compact → retry.
 2. **Threshold maintenance**: after a successful turn, when:
    `contextTokens > contextWindow - reserveTokens`
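
Reviewer note: the buckets these doc changes describe can be sketched as a small classifier. This is a minimal illustration, not OpenClaw's implementation. The names `classifyProviderError`, `needsThresholdCompaction`, and `ErrorBucket` are hypothetical, and plain case-insensitive substring matching is an assumption; the patch only lists the signature strings and the threshold condition.

```typescript
// Hypothetical sketch: every name here is illustrative, not OpenClaw's API.
type ErrorBucket = "context_overflow" | "rate_limit" | "overloaded" | "unknown";

// Signature strings are taken from the doc text in this patch (lowercased).
// "ollama error: context length exceeded" is covered by the shorter
// "context length exceeded" entry.
const OVERFLOW_SIGNATURES: string[] = [
  "request_too_large",
  "context length exceeded",
  "input exceeds the maximum number of tokens",
  "input token count exceeds the maximum number of input tokens",
  "input is too long for the model",
];

// The docs elide part of the workers_ai message ("workers_ai ... quota limit
// exceeded"), so matching on the "quota limit exceeded" tail is an assumption.
const RATE_LIMIT_SIGNATURES: string[] = [
  "too many concurrent requests",
  "throttlingexception",
  "concurrency limit reached",
  "quota limit exceeded",
  "throttled",
  "resource exhausted",
];

const OVERLOADED_SIGNATURES: string[] = ["modelnotreadyexception"];

function classifyProviderError(message: string): ErrorBucket {
  const text = message.toLowerCase();
  const matches = (signatures: string[]): boolean =>
    signatures.some((signature) => text.includes(signature));

  // Overflow is checked first: per the docs it stays on the compaction/retry
  // path and does not advance model fallback.
  if (matches(OVERFLOW_SIGNATURES)) return "context_overflow";
  if (matches(OVERLOADED_SIGNATURES)) return "overloaded";
  if (matches(RATE_LIMIT_SIGNATURES)) return "rate_limit";
  return "unknown";
}

// Threshold maintenance condition quoted in session-management-compaction.md.
function needsThresholdCompaction(
  contextTokens: number,
  contextWindow: number,
  reserveTokens: number,
): boolean {
  return contextTokens > contextWindow - reserveTokens;
}
```

Under these assumptions, `classifyProviderError("ollama error: context length exceeded")` lands in the overflow bucket, while generic text such as `LLM request failed with an unknown error.` stays `unknown`, matching the conservative behavior the FAQ describes.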