Mirror of https://github.com/openclaw/openclaw.git, synced 2026-04-05 22:32:12 +00:00
docs: refresh failover and compaction pattern refs
@@ -32,7 +32,9 @@ Auto-compaction is on by default. It runs when the session nears the context
 limit, or when the model returns a context-overflow error (in which case
 OpenClaw compacts and retries). Typical overflow signatures include
 `request_too_large`, `context length exceeded`, `input exceeds the maximum
-number of tokens`, and `input is too long for the model`.
+number of tokens`, `input token count exceeds the maximum number of input
+tokens`, `input is too long for the model`, and `ollama error: context length
+exceeded`.
 
 <Info>
 Before compacting, OpenClaw automatically reminds the agent to save important

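Reviewer note: the overflow signatures listed in this hunk can be read as a substring check over the provider's error text. The sketch below is only an illustration of that idea. The names `OVERFLOW_SIGNATURES` and `looksLikeContextOverflow` are hypothetical and are not taken from the OpenClaw source, and the real matcher may be provider-scoped or stricter.

```typescript
// Hypothetical list: the strings are copied from the documentation text,
// not from OpenClaw's actual matcher.
const OVERFLOW_SIGNATURES: readonly string[] = [
  "request_too_large",
  "context length exceeded",
  "input exceeds the maximum number of tokens",
  "input token count exceeds the maximum number of input tokens",
  "input is too long for the model",
  "ollama error: context length exceeded",
];

// Returns true when the error text contains one of the documented
// overflow signatures (case-insensitive substring match).
function looksLikeContextOverflow(message: string): boolean {
  const text = message.toLowerCase();
  return OVERFLOW_SIGNATURES.some((sig) => text.indexOf(sig) !== -1);
}
```

Under this reading, a prefixed message such as `INVALID_ARGUMENT: input exceeds the maximum number of tokens` still matches, which is consistent with the examples the docs give for the compaction/retry path.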
@@ -122,6 +122,7 @@ When a profile fails due to auth/rate‑limit errors (or a timeout that looks
 like rate limiting), OpenClaw marks it in cooldown and moves to the next profile.
 That rate-limit bucket is broader than plain `429`: it also includes provider
 messages such as `Too many concurrent requests`, `ThrottlingException`,
+`concurrency limit reached`, `workers_ai ... quota limit exceeded`,
 `throttled`, `resource exhausted`, and periodic usage-window limits such as
 `weekly/monthly limit reached`.
 Format/invalid‑request errors (for example Cloud Code Assist tool call ID
@@ -203,8 +204,9 @@ timeouts that exhausted profile rotation (other errors do not advance fallback).
 
 Overloaded and rate-limit errors are handled more aggressively than billing
 cooldowns. By default, OpenClaw allows one same-provider auth-profile retry,
-then switches to the next configured model fallback without waiting. Tune this
-with `auth.cooldowns.overloadedProfileRotations`,
+then switches to the next configured model fallback without waiting.
+Provider-busy signals such as `ModelNotReadyException` land in that overloaded
+bucket. Tune this with `auth.cooldowns.overloadedProfileRotations`,
 `auth.cooldowns.overloadedBackoffMs`, and
 `auth.cooldowns.rateLimitedProfileRotations`.
 
@@ -248,7 +250,9 @@ Model fallback does not continue on:
 - explicit aborts that are not timeout/failover-shaped
 - context overflow errors that should stay inside compaction/retry logic
 (for example `request_too_large`, `INVALID_ARGUMENT: input exceeds the maximum
-number of tokens`, or `The input is too long for the model`)
+number of tokens`, `input token count exceeds the maximum number of input
+tokens`, `The input is too long for the model`, or `ollama error: context
+length exceeded`)
 - a final unknown error when there are no candidates left
 
 ### Cooldown skip vs probe behavior

@@ -168,8 +168,8 @@ surface.
 - Key selection order preserves priority and deduplicates values.
 - Requests are retried with the next key only on rate-limit responses (for
 example `429`, `rate_limit`, `quota`, `resource exhausted`, `Too many
-concurrent requests`, `ThrottlingException`, or periodic usage-limit
-messages).
+concurrent requests`, `ThrottlingException`, `concurrency limit reached`,
+`workers_ai ... quota limit exceeded`, or periodic usage-limit messages).
 - Non-rate-limit failures fail immediately; no key rotation is attempted.
 - When all candidate keys fail, the final error is returned from the last attempt.
 

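Reviewer note: the four bullets in this hunk describe a small retry loop, and a sketch may make the intended behavior easier to check. Everything below is an assumption made for illustration: `RATE_LIMIT_PATTERNS`, `isRateLimitError`, and `callWithKeyRotation` are hypothetical names, and the patterns only approximate the examples quoted in the docs (the periodic usage-limit messages are left out because their exact wording is not given here).

```typescript
// Hypothetical patterns approximating the documented rate-limit examples.
// "workers_ai ... quota limit exceeded" is covered by the /quota/ pattern.
const RATE_LIMIT_PATTERNS: readonly RegExp[] = [
  /\b429\b/,
  /rate_limit/i,
  /quota/i,
  /resource exhausted/i,
  /too many concurrent requests/i,
  /ThrottlingException/i,
  /concurrency limit reached/i,
];

function isRateLimitError(err: unknown): boolean {
  const text = err instanceof Error ? err.message : String(err);
  return RATE_LIMIT_PATTERNS.some((p) => p.test(text));
}

// Tries each key in priority order; rotates only on rate-limit errors.
async function callWithKeyRotation<T>(
  keys: readonly string[],
  attempt: (key: string) => Promise<T>,
): Promise<T> {
  // A Set keeps first-seen order, so priority is preserved while deduplicating.
  const candidates = Array.from(new Set(keys));
  if (candidates.length === 0) {
    throw new Error("no API keys configured");
  }
  let lastError: unknown;
  for (const key of candidates) {
    try {
      return await attempt(key);
    } catch (err) {
      lastError = err;
      // Non-rate-limit failures fail immediately; no rotation is attempted.
      if (!isRateLimitError(err)) throw err;
    }
  }
  // All candidates were rate limited: surface the error from the last attempt.
  throw lastError;
}
```

With keys `["a", "b", "a"]`, this sketch tries `a` and then `b` at most once each, and a failure such as `invalid api key` on `a` is rethrown without touching `b`.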
@@ -144,7 +144,9 @@ hits a provider rate limit.
 - Google providers also include `GOOGLE_API_KEY` as an additional fallback.
 - The same key list is deduplicated before use.
 - OpenClaw retries with the next key only for rate-limit errors (for example
-`429`, `rate_limit`, `quota`, `resource exhausted`).
+`429`, `rate_limit`, `quota`, `resource exhausted`, `Too many concurrent
+requests`, `ThrottlingException`, `concurrency limit reached`, or
+`workers_ai ... quota limit exceeded`).
 - Non-rate-limit errors are not retried with alternate keys.
 - If all keys fail, the final error from the last attempt is returned.
 

@@ -3161,9 +3161,9 @@ Notes:
 - `authPermanentBackoffMinutes`: base backoff in minutes for high-confidence `auth_permanent` failures (default: `10`).
 - `authPermanentMaxMinutes`: cap in minutes for `auth_permanent` backoff growth (default: `60`).
 - `failureWindowHours`: rolling window in hours used for backoff counters (default: `24`).
-- `overloadedProfileRotations`: maximum same-provider auth-profile rotations for overloaded errors before switching to model fallback (default: `1`).
+- `overloadedProfileRotations`: maximum same-provider auth-profile rotations for overloaded errors before switching to model fallback (default: `1`). Provider-busy shapes such as `ModelNotReadyException` land here.
 - `overloadedBackoffMs`: fixed delay before retrying an overloaded provider/profile rotation (default: `0`).
-- `rateLimitedProfileRotations`: maximum same-provider auth-profile rotations for rate-limit errors before switching to model fallback (default: `1`).
+- `rateLimitedProfileRotations`: maximum same-provider auth-profile rotations for rate-limit errors before switching to model fallback (default: `1`). That rate-limit bucket includes provider-shaped text such as `Too many concurrent requests`, `ThrottlingException`, `concurrency limit reached`, `workers_ai ... quota limit exceeded`, and `resource exhausted`.
 
 ---
 

@@ -2476,8 +2476,10 @@ for usage/billing and raise limits as needed.
 
 The rate-limit bucket includes more than plain `429` responses. OpenClaw
 also treats messages like `Too many concurrent requests`,
-`ThrottlingException`, `resource exhausted`, and periodic usage-window
-limits (`weekly/monthly limit reached`) as failover-worthy rate limits.
+`ThrottlingException`, `concurrency limit reached`,
+`workers_ai ... quota limit exceeded`, `resource exhausted`, and periodic
+usage-window limits (`weekly/monthly limit reached`) as failover-worthy
+rate limits.
 
 Some billing-looking responses are not `402`, and some HTTP `402`
 responses also stay in that transient bucket. If a provider returns
@@ -2490,17 +2492,21 @@ for usage/billing and raise limits as needed.
 `rate_limit`, not a long billing disable.
 
 Context-overflow errors are different: signatures such as
-`request_too_large`, `input exceeds the maximum number of tokens`, or
-`input is too long for the model` stay on the compaction/retry path instead
-of advancing model fallback.
+`request_too_large`, `input exceeds the maximum number of tokens`,
+`input token count exceeds the maximum number of input tokens`,
+`input is too long for the model`, or `ollama error: context length
+exceeded` stay on the compaction/retry path instead of advancing model
+fallback.
 
 Generic server-error text is intentionally narrower than "anything with
 unknown/error in it". OpenClaw does treat provider-scoped transient shapes
 such as Anthropic bare `An unknown error occurred`, OpenRouter bare
 `Provider returned error`, stop-reason errors like `Unhandled stop reason:
-error`, and JSON `api_error` payloads with transient server text
+error`, JSON `api_error` payloads with transient server text
 (`internal server error`, `unknown error, 520`, `upstream error`, `backend
-error`) as timeout/failover signals when the provider context matches.
+error`), and provider-busy errors such as `ModelNotReadyException` as
+failover-worthy timeout/overloaded signals when the provider context
+matches.
 Generic internal fallback text like `LLM request failed with an unknown
 error.` stays conservative and does not trigger model fallback by itself.
 

@@ -326,7 +326,12 @@ trackSessionManagerAccess(params.sessionFile);
 
 ### Compaction
 
-Auto-compaction triggers on context overflow. `compactEmbeddedPiSessionDirect()` handles manual compaction:
+Auto-compaction triggers on context overflow. Common overflow signatures
+include `request_too_large`, `context length exceeded`, `input exceeds the
+maximum number of tokens`, `input token count exceeds the maximum number of
+input tokens`, `input is too long for the model`, and `ollama error: context
+length exceeded`. `compactEmbeddedPiSessionDirect()` handles manual
+compaction:
 
 ```typescript
 const compactResult = await compactEmbeddedPiSessionDirect({

@@ -231,8 +231,9 @@ In the embedded Pi agent, auto-compaction triggers in two cases:
 
 1. **Overflow recovery**: the model returns a context overflow error
 (`request_too_large`, `context length exceeded`, `input exceeds the maximum
-number of tokens`, `input is too long for the model`, and similar
-provider-shaped variants) → compact → retry.
+number of tokens`, `input token count exceeds the maximum number of input
+tokens`, `input is too long for the model`, `ollama error: context length
+exceeded`, and similar provider-shaped variants) → compact → retry.
 2. **Threshold maintenance**: after a successful turn, when:
 
 `contextTokens > contextWindow - reserveTokens`

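Reviewer note: the threshold condition quoted in this hunk is a strict inequality, which is easy to misread at the boundary. A minimal sketch follows, assuming all three values are token counts. The `ContextBudget` type and `shouldCompactAfterTurn` function are hypothetical names used only for illustration, and the numbers in the usage comment are made up.

```typescript
// Hypothetical shape; field names mirror the symbols in the documented formula.
interface ContextBudget {
  contextTokens: number; // tokens currently held in the session context
  contextWindow: number; // the model's total context window
  reserveTokens: number; // headroom kept free for the next turn
}

// Direct transcription of: contextTokens > contextWindow - reserveTokens
function shouldCompactAfterTurn(budget: ContextBudget): boolean {
  return budget.contextTokens > budget.contextWindow - budget.reserveTokens;
}

// Illustrative numbers only: a 200000-token window with 16000 reserved
// gives a threshold of 184000 tokens.
const atThreshold = shouldCompactAfterTurn({
  contextTokens: 184000,
  contextWindow: 200000,
  reserveTokens: 16000,
});
const overThreshold = shouldCompactAfterTurn({
  contextTokens: 184001,
  contextWindow: 200000,
  reserveTokens: 16000,
});
```

Because the comparison is strict, `atThreshold` is `false` and `overThreshold` is `true` in this sketch: compaction starts only once usage exceeds the window minus the reserve.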