Commit Graph

34 Commits

Author SHA1 Message Date
Peter Steinberger
6739c28718 refactor: clarify auth failover policy 2026-04-04 02:49:18 +09:00
Peter Steinberger
865fa2ba72 fix: narrow auth permanent lockouts 2026-04-04 02:35:27 +09:00
Extra Small
42e1d489fd fix(auth): use shorter backoff for auth_permanent failures
auth_permanent errors (e.g. API_KEY_INVALID) can be caused by transient
provider outages rather than genuinely revoked credentials. Previously
these used the same 5h-24h billing backoff, which left providers disabled
long after the upstream issue resolved.

Introduce separate authPermanentBackoffMinutes (default: 10) and
authPermanentMaxMinutes (default: 60) config options so auth_permanent
failures recover in minutes rather than hours.

Fixes #56838
2026-04-04 02:35:27 +09:00
ryanngit
0b3d31c0ce feat(auth): WHAM-aware Codex cooldown for multi-profile setups (#58625)
Instead of exponential backoff guesses on Codex 429, probe the WHAM
usage API to determine real availability and write accurate cooldowns.

- Burst/concurrency contention: 15s circuit-break
- Genuine rate limit: proportional to real reset time (capped 2-4h)
- Expired token: 12h cooldown
- Dead account: 24h cooldown
- Probe failure: 30s fail-open

This prevents cascade lockouts when multiple agents share a pool of
Codex profiles, and avoids wasted retries on genuinely exhausted
profiles.

Closes #26329, relates to #1815, #1522, #23996, #54060

Co-authored-by: ryanngit <ryanngit@users.noreply.github.com>
2026-03-31 21:09:25 -04:00
kiranvk2011
84401223c7 fix: per-model cooldown scope, stepped backoff, and user-facing rate-limit message (#49834)
Merged via squash.

Prepared head SHA: 7c488c070c
Co-authored-by: kiranvk-2011 <91108465+kiranvk-2011@users.noreply.github.com>
Co-authored-by: altaywtf <9790196+altaywtf@users.noreply.github.com>
Reviewed-by: @altaywtf
2026-03-25 22:03:49 +03:00
Peter Steinberger
f182c3a292 test: inject thread-safe deps for agent tools 2026-03-23 04:16:53 -07:00
VibhorGautam
4473242b4f fix: use unknown instead of rate_limit as default cooldown reason (#42911)
Merged via squash.

Prepared head SHA: bebf6704d7
Co-authored-by: VibhorGautam <55019395+VibhorGautam@users.noreply.github.com>
Co-authored-by: altaywtf <9790196+altaywtf@users.noreply.github.com>
Reviewed-by: @altaywtf
2026-03-11 21:34:14 +03:00
Altay
531e8362b1 Agents: add fallback error observations (#41337)
Merged via squash.

Prepared head SHA: 852469c82f
Co-authored-by: altaywtf <9790196+altaywtf@users.noreply.github.com>
Co-authored-by: altaywtf <9790196+altaywtf@users.noreply.github.com>
Reviewed-by: @altaywtf
2026-03-10 01:12:10 +03:00
zerone0x
5f90883ad3 fix(auth): reset cooldown error counters on expiry to prevent infinite escalation (#41028)
Merged via squash.

Prepared head SHA: 89bd83f09a
Co-authored-by: zerone0x <39543393+zerone0x@users.noreply.github.com>
Co-authored-by: altaywtf <9790196+altaywtf@users.noreply.github.com>
Reviewed-by: @altaywtf
2026-03-09 23:40:11 +03:00
Florian Hines
33e7394861 fix(providers): make all models available in kilocode provider (#32352)
* kilocode: dynamic model discovery, kilo/auto default, cooldown exemption

- Replace 9-model hardcoded catalog with dynamic discovery from
  GET /api/gateway/models (Venice-like pattern with static fallback)
- Default model changed from anthropic/claude-opus-4.6 to kilo/auto
  (smart routing model)
- Add createKilocodeWrapper for X-KILOCODE-FEATURE header injection
  and reasoning.effort handling (skip for kilo/auto)
- Add kilocode to cooldown-exempt providers (proxy like OpenRouter)
- Keep sync buildKilocodeProvider for onboarding, add async
  buildKilocodeProviderWithDiscovery for implicit provider resolution
- Per-token gateway pricing converted to per-1M-token for cost fields

* kilocode: skip reasoning injection for x-ai models, harden discovery loop

* fix(kilocode): keep valid discovered duplicates (openclaw#32352, thanks @pandemicsyn)

* refactor(proxy): normalize reasoning payload guards (openclaw#32352, thanks @pandemicsyn)

* chore(changelog): note kilocode hardening (openclaw#32352, thanks @pandemicsyn and @vincentkoc)

* chore(changelog): fix kilocode note format (openclaw#32352, thanks @pandemicsyn and @vincentkoc)

* test(kilocode): support auto-model override cases (openclaw#32352, thanks @pandemicsyn)

* Update CHANGELOG.md

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-03-07 08:14:06 -08:00
Altay
6e962d8b9e fix(agents): handle overloaded failover separately (#38301)
* fix(agents): skip auth-profile failure on overload

* fix(agents): note overload auth-profile fallback fix

* fix(agents): classify overloaded failures separately

* fix(agents): back off before overload failover

* fix(agents): tighten overload probe and backoff state

* fix(agents): persist overloaded cooldown across runs

* fix(agents): tighten overloaded status handling

* test(agents): add overload regression coverage

* fix(agents): restore runner imports after rebase

* test(agents): add overload fallback integration coverage

* fix(agents): harden overloaded failover abort handling

* test(agents): tighten overload classifier coverage

* test(agents): cover all-overloaded fallback exhaustion

* fix(cron): retry overloaded fallback summaries

* fix(cron): treat HTTP 529 as overloaded retry
2026-03-07 01:42:11 +03:00
Shadow
e28ff1215c fix: discord auto presence health signal (#33277) (thanks @thewilloftheshadow) (#33277) 2026-03-03 11:20:59 -06:00
Peter Steinberger
9617ac9dd5 refactor: dedupe agent and reply runtimes 2026-03-02 19:57:33 +00:00
Aleksandrs Tihenko
c0026274d9 fix(auth): distinguish revoked API keys from transient auth errors (#25754)
Merged via /review-pr -> /prepare-pr -> /merge-pr.

Prepared head SHA: 8f9c07a200
Co-authored-by: rrenamed <87486610+rrenamed@users.noreply.github.com>
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Reviewed-by: @gumadeiras
2026-02-25 19:47:16 -05:00
Vincent Koc
daa4f34ce8 Auth: bypass cooldown tracking for OpenRouter 2026-02-24 19:12:08 -05:00
Vignesh Natarajan
5c7c37a02a Agents: infer auth-profile unavailable failover reason 2026-02-22 16:10:32 -08:00
Peter Steinberger
07527e22ce refactor(auth-profiles): centralize active-window logic + strengthen regression coverage 2026-02-22 13:23:19 +01:00
Peter Steinberger
7c3c406a35 fix: keep auth-profile cooldown windows immutable in-window (#23536) (thanks @arosstale) 2026-02-22 13:14:02 +01:00
artale
dc69610d51 fix(auth-profiles): never shorten cooldown deadline on retry
When the backoff saturates at 60 min and retries fire every 30 min
(e.g. cron jobs), each failed request was resetting cooldownUntil to
now+60m.  Because now+60m < existing deadline, the window kept getting
renewed and the profile never recovered without manually clearing
usageStats in auth-profiles.json.

Fix: only write a new cooldownUntil (or disabledUntil for billing) when
the new deadline is strictly later than the existing one.  This lets the
original window expire naturally while still allowing genuine backoff
extension when error counts climb further.

Fixes #23516

[AI-assisted]
2026-02-22 13:14:02 +01:00
Nabbil Khan
f91034aa6b fix(auth): clear all usage stats fields in clearAuthProfileCooldown (openclaw#19211) thanks @nabbilkhan
Verified:
- pnpm install --frozen-lockfile
- pnpm build
- pnpm check
- pnpm test:macmini

Co-authored-by: nabbilkhan <203121263+nabbilkhan@users.noreply.github.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-02-19 22:21:37 -06:00
Peter Steinberger
28d49b8d44 refactor(auth-profiles): reuse cooldown timestamp resolver 2026-02-18 17:13:47 +00:00
Peter Steinberger
b8b43175c5 style: align formatting with oxfmt 0.33 2026-02-18 01:34:35 +00:00
Peter Steinberger
31f9be126c style: run oxfmt and fix gate failures 2026-02-18 01:29:02 +00:00
cpojer
d0cb8c19b2 chore: wtf. 2026-02-17 13:36:48 +09:00
Sebastian
ed11e93cf2 chore(format) 2026-02-16 23:20:16 -05:00
cpojer
90ef2d6bdf chore: Update formatting. 2026-02-17 09:18:40 +09:00
nabbilkhan
03cadc4b7a fix(auth): auto-expire stale auth profile cooldowns and reset error count
When an auth profile hits a rate limit, `errorCount` is incremented and
`cooldownUntil` is set with exponential backoff. After the cooldown
expires, the time-based check correctly returns false — but `errorCount`
persists. The next transient failure immediately escalates to a much
longer cooldown because the backoff formula uses the stale count:

  60s × 5^(errorCount-1), max 1h

This creates a positive feedback loop where profiles appear permanently
stuck after rate limits, requiring manual JSON editing to recover.

Add `clearExpiredCooldowns()` which sweeps all profiles on every call to
`resolveAuthProfileOrder()` and clears expired `cooldownUntil` /
`disabledUntil` values along with resetting `errorCount` and
`failureCounts` — giving the profile a fair retry window (circuit-breaker
half-open → closed transition).

Key design decisions:
- `cooldownUntil` and `disabledUntil` handled independently (a profile
  can have both; only the expired one is cleared)
- `errorCount` reset only when ALL unusable windows have expired
- `lastFailureAt` preserved for the existing failureWindowMs decay logic
- In-memory mutation; disk persistence happens lazily on the next store
  write, matching the existing save pattern

Fixes #3604
Related: #13623, #15851, #11972, #8434
2026-02-16 12:53:45 -06:00
Ítalo Souza
39bb1b3322 fix: auto-recover primary model after rate-limit cooldown expires (#17478) (#18045)
Merged via /review-pr -> /prepare-pr -> /merge-pr.

Prepared head SHA: f7a7865727
Co-authored-by: PlayerGhost <28265945+PlayerGhost@users.noreply.github.com>
Co-authored-by: sebslight <19554889+sebslight@users.noreply.github.com>
Reviewed-by: @sebslight
2026-02-16 08:03:35 -05:00
cpojer
f06dd8df06 chore: Enable "experimentalSortImports" in Oxfmt and reformat all imorts. 2026-02-01 10:03:47 +09:00
cpojer
5ceff756e1 chore: Enable "curly" rule to avoid single-statement if confusion/errors. 2026-01-31 16:19:20 +09:00
Peter Steinberger
9a7160786a refactor: rename to openclaw 2026-01-30 03:16:21 +01:00
Peter Steinberger
6d16a658e5 refactor: rename clawdbot to moltbot with legacy compat 2026-01-27 12:21:02 +00:00
Peter Steinberger
c379191f80 chore: migrate to oxlint and oxfmt
Co-authored-by: Christoph Nakazawa <christoph.pojer@gmail.com>
2026-01-14 15:02:19 +00:00
Peter Steinberger
bcbfb357be refactor(src): split oversized modules 2026-01-14 01:17:56 +00:00