From e336300e600d17b34861da519fb4ac86fe8ffbdf Mon Sep 17 00:00:00 2001
From: Peter Steinberger <steipete@gmail.com>
Date: Sat, 4 Apr 2026 20:43:58 +0100
Subject: [PATCH] docs: refresh failover and compaction pattern refs

---
 docs/concepts/compaction.md                   |  4 +++-
 docs/concepts/model-failover.md               | 10 +++++++---
 docs/concepts/model-providers.md              |  4 ++--
 docs/gateway/authentication.md                |  4 +++-
 docs/gateway/configuration-reference.md       |  4 ++--
 docs/help/faq.md                              | 20 ++++++++++++-------
 docs/pi.md                                    |  7 ++++++-
 .../session-management-compaction.md          |  5 +++--
 8 files changed, 39 insertions(+), 19 deletions(-)
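
Reviewer note (not part of the commit): the hunks below extend three signature lists, namely context overflow (compact and retry), rate limit (rotate key/profile, then fall back), and overloaded (provider busy). The sketch below shows one way such lists could be matched against an error message. It is not OpenClaw's implementation: `classifyProviderError`, the `ErrorBucket` type, the regexes, and the check order are all hypothetical, and the `...` in `workers_ai ... quota limit exceeded` is assumed to mean "any text in between".

```typescript
// Hypothetical buckets mirroring the three categories named in the docs.
type ErrorBucket = "context_overflow" | "rate_limit" | "overloaded" | "unknown";

// Signatures copied from the doc text in this patch; the regex form is an assumption.
const OVERFLOW_PATTERNS: RegExp[] = [
  /request_too_large/i,
  /context length exceeded/i, // also matches "ollama error: context length exceeded"
  /input exceeds the maximum number of tokens/i,
  /input token count exceeds the maximum number of input tokens/i,
  /input is too long for the model/i,
];

const RATE_LIMIT_PATTERNS: RegExp[] = [
  /\b429\b/,
  /rate_limit/i,
  /resource exhausted/i,
  /too many concurrent requests/i,
  /ThrottlingException/i,
  /concurrency limit reached/i,
  /workers_ai.*quota limit exceeded/i,
];

const OVERLOADED_PATTERNS: RegExp[] = [/ModelNotReadyException/i];

// Overflow is checked first (an assumed ordering) because the docs say those
// errors stay on the compaction/retry path and never advance model fallback.
function classifyProviderError(message: string): ErrorBucket {
  const matches = (patterns: RegExp[]) => patterns.some((p) => p.test(message));
  if (matches(OVERFLOW_PATTERNS)) return "context_overflow";
  if (matches(OVERLOADED_PATTERNS)) return "overloaded";
  if (matches(RATE_LIMIT_PATTERNS)) return "rate_limit";
  return "unknown";
}

console.log(classifyProviderError("ollama error: context length exceeded")); // "context_overflow"
console.log(classifyProviderError("ModelNotReadyException: model is loading")); // "overloaded"
```

Keeping the lists as data rather than inline conditionals would make a refresh like this one a matter of appending one pattern per new provider message, which is roughly what each doc hunk does in prose.
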
diff --git a/docs/concepts/compaction.md b/docs/concepts/compaction.md
index 2ce5b64bd75..1ebc0e78f6f 100644
--- a/docs/concepts/compaction.md
+++ b/docs/concepts/compaction.md
@@ -32,7 +32,9 @@ Auto-compaction is on by default. It runs when the session nears the context
 limit, or when the model returns a context-overflow error (in which case
 OpenClaw compacts and retries). Typical overflow signatures include
 `request_too_large`, `context length exceeded`, `input exceeds the maximum
-number of tokens`, and `input is too long for the model`.
+number of tokens`, `input token count exceeds the maximum number of input
+tokens`, `input is too long for the model`, and `ollama error: context length
+exceeded`.
 
 <Info>
 Before compacting, OpenClaw automatically reminds the agent to save important
diff --git a/docs/concepts/model-failover.md b/docs/concepts/model-failover.md
index a38650144b2..0a841d2e997 100644
--- a/docs/concepts/model-failover.md
+++ b/docs/concepts/model-failover.md
@@ -122,6 +122,7 @@ When a profile fails due to auth/rate‑limit errors (or a timeout that looks
 like rate limiting), OpenClaw marks it in cooldown and moves to the next profile.
 That rate-limit bucket is broader than plain `429`: it also includes provider
 messages such as `Too many concurrent requests`, `ThrottlingException`,
+`concurrency limit reached`, `workers_ai ... quota limit exceeded`,
 `throttled`, `resource exhausted`, and periodic usage-window limits such as
 `weekly/monthly limit reached`.
 Format/invalid‑request errors (for example Cloud Code Assist tool call ID
@@ -203,8 +204,9 @@ timeouts that exhausted profile rotation (other errors do not advance fallback).
 
 Overloaded and rate-limit errors are handled more aggressively than billing
 cooldowns. By default, OpenClaw allows one same-provider auth-profile retry,
-then switches to the next configured model fallback without waiting. Tune this
-with `auth.cooldowns.overloadedProfileRotations`,
+then switches to the next configured model fallback without waiting.
+Provider-busy signals such as `ModelNotReadyException` land in that overloaded
+bucket. Tune this with `auth.cooldowns.overloadedProfileRotations`,
 `auth.cooldowns.overloadedBackoffMs`, and
 `auth.cooldowns.rateLimitedProfileRotations`.
 
@@ -248,7 +250,9 @@ Model fallback does not continue on:
 - explicit aborts that are not timeout/failover-shaped
 - context overflow errors that should stay inside compaction/retry logic
   (for example `request_too_large`, `INVALID_ARGUMENT: input exceeds the maximum
-number of tokens`, or `The input is too long for the model`)
+number of tokens`, `input token count exceeds the maximum number of input
+tokens`, `The input is too long for the model`, or `ollama error: context
+length exceeded`)
 - a final unknown error when there are no candidates left
 
 ### Cooldown skip vs probe behavior
diff --git a/docs/concepts/model-providers.md b/docs/concepts/model-providers.md
index 58e705fdf6c..68107cb2993 100644
--- a/docs/concepts/model-providers.md
+++ b/docs/concepts/model-providers.md
@@ -168,8 +168,8 @@ surface.
 - Key selection order preserves priority and deduplicates values.
 - Requests are retried with the next key only on rate-limit responses (for
   example `429`, `rate_limit`, `quota`, `resource exhausted`, `Too many
-concurrent requests`, `ThrottlingException`, or periodic usage-limit
-  messages).
+concurrent requests`, `ThrottlingException`, `concurrency limit reached`,
+  `workers_ai ... quota limit exceeded`, or periodic usage-limit messages).
 - Non-rate-limit failures fail immediately; no key rotation is attempted.
 - When all candidate keys fail, the final error is returned from the last attempt.
 
diff --git a/docs/gateway/authentication.md b/docs/gateway/authentication.md
index 38a22c08250..a559b8b4abf 100644
--- a/docs/gateway/authentication.md
+++ b/docs/gateway/authentication.md
@@ -144,7 +144,9 @@ hits a provider rate limit.
 - Google providers also include `GOOGLE_API_KEY` as an additional fallback.
 - The same key list is deduplicated before use.
 - OpenClaw retries with the next key only for rate-limit errors (for example
-  `429`, `rate_limit`, `quota`, `resource exhausted`).
+  `429`, `rate_limit`, `quota`, `resource exhausted`, `Too many concurrent
+requests`, `ThrottlingException`, `concurrency limit reached`, or
+  `workers_ai ... quota limit exceeded`).
 - Non-rate-limit errors are not retried with alternate keys.
 - If all keys fail, the final error from the last attempt is returned.
 
diff --git a/docs/gateway/configuration-reference.md b/docs/gateway/configuration-reference.md
index e398227107a..12a93d74d0e 100644
--- a/docs/gateway/configuration-reference.md
+++ b/docs/gateway/configuration-reference.md
@@ -3161,9 +3161,9 @@ Notes:
 - `authPermanentBackoffMinutes`: base backoff in minutes for high-confidence `auth_permanent` failures (default: `10`).
 - `authPermanentMaxMinutes`: cap in minutes for `auth_permanent` backoff growth (default: `60`).
 - `failureWindowHours`: rolling window in hours used for backoff counters (default: `24`).
-- `overloadedProfileRotations`: maximum same-provider auth-profile rotations for overloaded errors before switching to model fallback (default: `1`).
+- `overloadedProfileRotations`: maximum same-provider auth-profile rotations for overloaded errors before switching to model fallback (default: `1`). Provider-busy shapes such as `ModelNotReadyException` land here.
 - `overloadedBackoffMs`: fixed delay before retrying an overloaded provider/profile rotation (default: `0`).
-- `rateLimitedProfileRotations`: maximum same-provider auth-profile rotations for rate-limit errors before switching to model fallback (default: `1`).
+- `rateLimitedProfileRotations`: maximum same-provider auth-profile rotations for rate-limit errors before switching to model fallback (default: `1`). That rate-limit bucket includes provider-shaped text such as `Too many concurrent requests`, `ThrottlingException`, `concurrency limit reached`, `workers_ai ... quota limit exceeded`, and `resource exhausted`.
 
 ---
 
diff --git a/docs/help/faq.md b/docs/help/faq.md
index 96f9f19c03e..62f6b840eb4 100644
--- a/docs/help/faq.md
+++ b/docs/help/faq.md
@@ -2476,8 +2476,10 @@ for usage/billing and raise limits as needed.
 
     The rate-limit bucket includes more than plain `429` responses. OpenClaw
     also treats messages like `Too many concurrent requests`,
-    `ThrottlingException`, `resource exhausted`, and periodic usage-window
-    limits (`weekly/monthly limit reached`) as failover-worthy rate limits.
+    `ThrottlingException`, `concurrency limit reached`,
+    `workers_ai ... quota limit exceeded`, `resource exhausted`, and periodic
+    usage-window limits (`weekly/monthly limit reached`) as failover-worthy
+    rate limits.
 
     Some billing-looking responses are not `402`, and some HTTP `402`
     responses also stay in that transient bucket. If a provider returns
@@ -2490,17 +2492,21 @@ for usage/billing and raise limits as needed.
     `rate_limit`, not a long billing disable.
 
     Context-overflow errors are different: signatures such as
-    `request_too_large`, `input exceeds the maximum number of tokens`, or
-    `input is too long for the model` stay on the compaction/retry path instead
-    of advancing model fallback.
+    `request_too_large`, `input exceeds the maximum number of tokens`,
+    `input token count exceeds the maximum number of input tokens`,
+    `input is too long for the model`, or `ollama error: context length
+    exceeded` stay on the compaction/retry path instead of advancing model
+    fallback.
 
     Generic server-error text is intentionally narrower than "anything with
     unknown/error in it". OpenClaw does treat provider-scoped transient shapes
     such as Anthropic bare `An unknown error occurred`, OpenRouter bare
     `Provider returned error`, stop-reason errors like `Unhandled stop reason:
-    error`, and JSON `api_error` payloads with transient server text
+    error`, JSON `api_error` payloads with transient server text
     (`internal server error`, `unknown error, 520`, `upstream error`, `backend
-    error`) as timeout/failover signals when the provider context matches.
+    error`), and provider-busy errors such as `ModelNotReadyException` as
+    failover-worthy timeout/overloaded signals when the provider context
+    matches.
     Generic internal fallback text like `LLM request failed with an unknown
     error.` stays conservative and does not trigger model fallback by itself.
 
diff --git a/docs/pi.md b/docs/pi.md
index c7e878a5573..09cb9ac819b 100644
--- a/docs/pi.md
+++ b/docs/pi.md
@@ -326,7 +326,12 @@ trackSessionManagerAccess(params.sessionFile);
 
 ### Compaction
 
-Auto-compaction triggers on context overflow. `compactEmbeddedPiSessionDirect()` handles manual compaction:
+Auto-compaction triggers on context overflow. Common overflow signatures
+include `request_too_large`, `context length exceeded`, `input exceeds the
+maximum number of tokens`, `input token count exceeds the maximum number of
+input tokens`, `input is too long for the model`, and `ollama error: context
+length exceeded`. `compactEmbeddedPiSessionDirect()` handles manual
+compaction:
 
 ```typescript
 const compactResult = await compactEmbeddedPiSessionDirect({
diff --git a/docs/reference/session-management-compaction.md b/docs/reference/session-management-compaction.md
index da48bd5c6ae..1cbd70a7bcb 100644
--- a/docs/reference/session-management-compaction.md
+++ b/docs/reference/session-management-compaction.md
@@ -231,8 +231,9 @@ In the embedded Pi agent, auto-compaction triggers in two cases:
 
 1. **Overflow recovery**: the model returns a context overflow error
    (`request_too_large`, `context length exceeded`, `input exceeds the maximum
-number of tokens`, `input is too long for the model`, and similar
-   provider-shaped variants) → compact → retry.
+number of tokens`, `input token count exceeds the maximum number of input
+tokens`, `input is too long for the model`, `ollama error: context length
+exceeded`, and similar provider-shaped variants) → compact → retry.
 2. **Threshold maintenance**: after a successful turn, when:
 
 `contextTokens > contextWindow - reserveTokens`