From 73584b1d33d76bceec1f172c13db69f80c5cbc52 Mon Sep 17 00:00:00 2001
From: Peter Steinberger
Date: Sat, 4 Apr 2026 14:44:42 +0100
Subject: [PATCH] docs: refresh failover and compaction refs

---
 docs/concepts/compaction.md                     |  4 +++-
 docs/concepts/model-failover.md                 |  6 ++++++
 docs/concepts/model-providers.md                |  5 ++++-
 docs/help/faq.md                                | 10 ++++++++++
 docs/reference/session-management-compaction.md |  5 ++++-
 5 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/docs/concepts/compaction.md b/docs/concepts/compaction.md
index cf69945358b..5efb21b3f03 100644
--- a/docs/concepts/compaction.md
+++ b/docs/concepts/compaction.md
@@ -25,7 +25,9 @@ model sees on the next turn.
 
 Auto-compaction is on by default. It runs when the session nears the context
 limit, or when the model returns a context-overflow error (in which case
-OpenClaw compacts and retries).
+OpenClaw compacts and retries). Typical overflow signatures include
+`request_too_large`, `context length exceeded`, `input exceeds the maximum
+number of tokens`, and `input is too long for the model`.
 
 Before compacting, OpenClaw automatically reminds the agent to save important
diff --git a/docs/concepts/model-failover.md b/docs/concepts/model-failover.md
index 86488571b8f..a743190d848 100644
--- a/docs/concepts/model-failover.md
+++ b/docs/concepts/model-failover.md
@@ -120,6 +120,10 @@ If you have both an OAuth profile and an API key profile for the same provider,
 When a profile fails due to auth/rate‑limit errors (or a timeout that looks
 like rate limiting), OpenClaw marks it in cooldown and moves to the next
 profile.
+That rate-limit bucket is broader than plain `429`: it also includes provider
+messages such as `Too many concurrent requests`, `ThrottlingException`,
+`throttled`, `resource exhausted`, and periodic usage-window limits such as
+`weekly/monthly limit reached`.
 
 Format/invalid‑request errors (for example Cloud Code Assist tool call ID
 validation failures) are treated as failover‑worthy and use the same cooldowns.
 OpenAI-compatible stop-reason errors such as `Unhandled stop reason: error`,
@@ -223,6 +227,8 @@ Model fallback does not continue on:
 
 - explicit aborts that are not timeout/failover-shaped
 - context overflow errors that should stay inside compaction/retry logic
+  (for example `request_too_large`, `INVALID_ARGUMENT: input exceeds the maximum
+  number of tokens`, or `The input is too long for the model`)
 - a final unknown error when there are no candidates left
 
 ### Cooldown skip vs probe behavior
diff --git a/docs/concepts/model-providers.md b/docs/concepts/model-providers.md
index d894438f7a0..34e85813f20 100644
--- a/docs/concepts/model-providers.md
+++ b/docs/concepts/model-providers.md
@@ -154,7 +154,10 @@ surface.
 
 - `_API_KEY_*` (numbered list, e.g. `_API_KEY_1`)
 - For Google providers, `GOOGLE_API_KEY` is also included as fallback.
 - Key selection order preserves priority and deduplicates values.
-- Requests are retried with the next key only on rate-limit responses (for example `429`, `rate_limit`, `quota`, `resource exhausted`).
+- Requests are retried with the next key only on rate-limit responses (for
+  example `429`, `rate_limit`, `quota`, `resource exhausted`, `Too many
+  concurrent requests`, `ThrottlingException`, or periodic usage-limit
+  messages).
 - Non-rate-limit failures fail immediately; no key rotation is attempted.
 - When all candidate keys fail, the final error is returned from the last attempt.
diff --git a/docs/help/faq.md b/docs/help/faq.md
index 4aef00d4594..53d6eb986b6 100644
--- a/docs/help/faq.md
+++ b/docs/help/faq.md
@@ -2362,6 +2362,16 @@ for usage/billing and raise limits as needed.
 
 Cooldowns apply to failing profiles (exponential backoff), so OpenClaw can
 keep responding even when a provider is rate-limited or temporarily failing.
+The rate-limit bucket includes more than plain `429` responses. OpenClaw
+also treats messages like `Too many concurrent requests`,
+`ThrottlingException`, `resource exhausted`, and periodic usage-window
+limits (`weekly/monthly limit reached`) as failover-worthy rate limits.
+
+Context-overflow errors are different: signatures such as
+`request_too_large`, `input exceeds the maximum number of tokens`, or
+`input is too long for the model` stay on the compaction/retry path instead
+of advancing model fallback.
+
diff --git a/docs/reference/session-management-compaction.md b/docs/reference/session-management-compaction.md
index f3dbd61c6d3..f394db5a170 100644
--- a/docs/reference/session-management-compaction.md
+++ b/docs/reference/session-management-compaction.md
@@ -216,7 +216,10 @@ Compaction is **persistent** (unlike session pruning). See [/concepts/session-pr
 
 In the embedded Pi agent, auto-compaction triggers in two cases:
 
-1. **Overflow recovery**: the model returns a context overflow error → compact → retry.
+1. **Overflow recovery**: the model returns a context overflow error
+   (`request_too_large`, `context length exceeded`, `input exceeds the maximum
+   number of tokens`, `input is too long for the model`, and similar
+   provider-shaped variants) → compact → retry.
 2. **Threshold maintenance**: after a successful turn, when:
    `contextTokens > contextWindow - reserveTokens`
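Reviewer note: the two error buckets this patch documents can be sketched as a small classifier. This is an illustrative TypeScript sketch, not OpenClaw's actual implementation; only the signature strings come from the docs above, while `classifyProviderError`, `RATE_LIMIT_SIGNATURES`, and `OVERFLOW_SIGNATURES` are hypothetical names.

```typescript
// Hypothetical sketch of the rate-limit vs context-overflow buckets.
type ErrorBucket = "rate-limit" | "context-overflow" | "other";

// Signatures listed in the patch (lowercased for matching).
const RATE_LIMIT_SIGNATURES: string[] = [
  "429",
  "rate_limit",
  "quota",
  "too many concurrent requests",
  "throttlingexception",
  "throttled",
  "resource exhausted",
  "weekly limit reached",
  "monthly limit reached",
];

const OVERFLOW_SIGNATURES: string[] = [
  "request_too_large",
  "context length exceeded",
  "input exceeds the maximum number of tokens",
  "input is too long for the model",
];

function classifyProviderError(message: string): ErrorBucket {
  const m = message.toLowerCase();
  // Overflow is checked first: those errors stay on the compaction/retry
  // path and must not advance profile cooldown or model fallback.
  if (OVERFLOW_SIGNATURES.some((s) => m.includes(s))) {
    return "context-overflow";
  }
  if (RATE_LIMIT_SIGNATURES.some((s) => m.includes(s))) {
    return "rate-limit";
  }
  return "other";
}

console.log(classifyProviderError("ThrottlingException: slow down"));
// → "rate-limit"
```

Substring matching is a simplification; the real implementation may match structured error codes rather than message text.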
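The key-rotation rule in the `docs/concepts/model-providers.md` hunk (rotate only on rate limits, fail fast otherwise, return the last error when all keys are exhausted) can be sketched as follows. `requestWithKeyRotation` is a hypothetical name under assumed types, not OpenClaw's real API:

```typescript
// Hedged sketch of the key-rotation rule: retry with the next candidate
// key only on rate-limit responses; any other failure is surfaced
// immediately; exhausting every key re-throws the last error.
function requestWithKeyRotation<T>(
  keys: string[],
  attempt: (key: string) => T,
  isRateLimit: (err: unknown) => boolean,
): T {
  let lastError: unknown = new Error("no candidate keys");
  for (const key of keys) {
    try {
      return attempt(key);
    } catch (err) {
      if (!isRateLimit(err)) {
        throw err; // non-rate-limit failures fail immediately
      }
      lastError = err; // rate-limited: fall through to the next key
    }
  }
  throw lastError; // all candidate keys failed
}

// Example: the first key is rate-limited, the second succeeds.
const result = requestWithKeyRotation(
  ["KEY_1", "KEY_2"],
  (key) => {
    if (key === "KEY_1") throw new Error("429: rate_limit exceeded");
    return `ok via ${key}`;
  },
  (err) => err instanceof Error && err.message.includes("429"),
);
console.log(result); // → "ok via KEY_2"
```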
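The threshold-maintenance trigger quoted in the `docs/reference/session-management-compaction.md` hunk is a single inequality. A minimal sketch, where `shouldCompact` is an illustrative name and only the inequality itself comes from the patch:

```typescript
// Minimal sketch of the threshold-maintenance check:
// compact once the running context eats into the reserved headroom.
function shouldCompact(
  contextTokens: number,
  contextWindow: number,
  reserveTokens: number,
): boolean {
  return contextTokens > contextWindow - reserveTokens;
}

console.log(shouldCompact(195_000, 200_000, 10_000)); // → true
console.log(shouldCompact(150_000, 200_000, 10_000)); // → false
```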