Commit Graph

6686 Commits

Author SHA1 Message Date
Peter Steinberger
2a1e47ffcb fix(agents): restore raw model run type coverage 2026-04-28 09:20:58 +01:00
Peter Steinberger
fb3ea9efb1 fix: keep gateway model probes raw 2026-04-28 09:11:47 +01:00
Ayaan Zaidi
5e4c29e9bc fix(agents): require claude fallback source provider 2026-04-28 13:35:59 +05:30
stainlu
4369c20bfe fix(agents): make originalProvider optional in runAgentAttempt params
The required-typed param introduced in 9987e7797f broke
attempt-execution.cli.test.ts and auth-profile-runtime-contract.test.ts
which construct runAgentAttempt params without an originalProvider field.
Make it optional and explicitly require the typeof check before passing
to isClaudeCliProvider so a missing field correctly skips the seed
(defensive default for fallback paths that didn't plumb the original
provider through, no-op for non-fallback paths).
2026-04-28 13:35:59 +05:30
stainlu
0bfcdcf044 fix(agents): scope claude-cli fallback seed and pair summary with boundary
Addresses review on #72069:

- Codex P1 ("Gate Claude prelude seeding by source provider"): the
  guard checked the *current* fallback candidate but not the failed
  attempt. A session that still carried a stale
  cliSessionBindings["claude-cli"] from an unrelated past run would
  inject Claude transcript context into a fallback chain that started
  on a different provider (e.g. openai -> openai-codex), leaking
  irrelevant prior conversation. Plumb `originalProvider` (the
  user-requested provider for the chain) through to runAgentAttempt
  and require `isClaudeCliProvider(originalProvider)` before reading
  Claude history.

- Codex P2 ("Prefer latest compact boundary when summary is missing"):
  the resolver always preferred the most recent explicit summary, so
  a later compaction without its own summary entry (rare crash case)
  paired stale summary text with post-latest-boundary turns. Restructure
  readClaudeCliFallbackSeed to queue summaries into pendingSummary and
  flush each boundary's pair atomically. A boundary with no preceding
  summary now correctly falls back to the boundary's own content
  rather than serving an older summary alongside fresh turns.

- Greptile P2 (newest-first break vs sparse coverage): the
  formatFallbackTurns walk intentionally stops on the first oversized
  turn so the prelude stays a contiguous "what was happening just
  before the failure" window. Document the design choice inline so a
  future maintainer doesn't reflexively change it to skip-and-continue.

Tests:
- New gateway cases for the boundary-without-summary edge case and
  for trailing summaries written without a paired boundary.
- existing 33 attempt-execution + 14 cli-session-history tests still
  pass; broader src/agents/command suite stays green (63/63).
2026-04-28 13:35:59 +05:30
stainlu
9691399e53 fix(agents): drop unnecessary non-null assertion in fallback prelude formatter
Local default oxlint did not run --type-aware so the warning was missed
on the initial commit; CI surfaced it via check-lint. Hoist the heading
into a named const so its length is read directly without the assertion.
2026-04-28 13:35:59 +05:30
stainlu
a96f1fa5ef fix(agents): seed claude-cli fallback prompts with prior-session context (#69973)
When a claude-cli attempt failed with a fallbackable error (e.g. a 402
billing limit), the next candidate -- typically a non-CLI provider --
ran with no prior conversation context. Claude Code keeps its own
JSONL session under ~/.claude/projects/, but the fallback runner only
sees what OpenClaw assembles from its own transcript, which is empty
for claude-cli sessions. The fallback model therefore behaved as if
the conversation just started, even though Claude later resumed fine.

Resolution mirrors what Claude Code itself does on resume after
compaction: prefer the explicit `/compact` summary, then append the
most recent post-boundary turns up to a char budget. Concretely:

- `readClaudeCliFallbackSeed` (gateway): walks the Claude JSONL with
  awareness of `type: "summary"` and `type: "system",
  subtype: "compact_boundary"` entries. Pre-boundary turns are dropped
  (they are represented by the summary); post-boundary turns become
  the recent-window. Multiple compactions are handled by preferring
  the latest summary. Path safety reuses the existing
  `resolveClaudeCliSessionFilePath` validation.

- `formatClaudeCliFallbackPrelude` / `buildClaudeCliFallbackContext\
Prelude` (agents helpers): format the harvested seed into a labeled
  prelude. Tool blocks are coalesced to compact "(tool call: name)" /
  "(tool result: …)" hints to keep the prompt budget honest. Newest
  turns are kept first when truncating; the summary is clearly
  labeled "(truncated)" if it overflows.

- `resolveFallbackRetryPrompt`: gains an optional
  `priorContextPrelude` that prepends before the existing retry
  marker. Empty/whitespace preludes are ignored; first-attempt prompts
  are unchanged.

- `runAgentAttempt`: builds the prelude when `isFallbackRetry === true`
  AND the new candidate is non-claude-cli AND a Claude-cli session
  binding is present. Same-provider fallbacks (claude-cli to
  claude-cli) are unaffected because Claude's own --resume still works.

Verified the new tests (12 in cli-session-history, 12 added to
attempt-execution) catch the regression: removing the prelude prepend
in resolveFallbackRetryPrompt makes both new prelude cases fail,
restoring the original cold-start behavior.

References:
- https://code.claude.com/docs/en/how-claude-code-works
- "Inside Claude Code: The Session File Format"
  https://databunny.medium.com/inside-claude-code-the-session-file-format-and-how-to-inspect-it-b9998e66d56b
2026-04-28 13:35:59 +05:30
Peter Steinberger
f3191b7962 fix(agents): abort stalled Anthropic SSE reads 2026-04-28 09:00:37 +01:00
Zhang Xiaofeng
a0900926c3 fix: add CJK error patterns to failover classification (#56242)
* fix: add CJK error patterns to failover classification

Chinese LLM providers (ZhipuAI/GLM, Bailian, Kimi/Moonshot, DeepSeek,
etc.) return error messages in Chinese. The existing failover
classification only matches English patterns, causing these errors to
fall through as unclassified — surfacing raw provider errors to users
instead of triggering model fallback.

Real production example: ZhipuAI error code 1234 returns
'网络错误,错误id:xxx,请联系客服。' (network error). This was not
matched by the existing 'network error' English pattern, so no failover
was triggered despite having a configured fallback model.

Changes:
- Add Chinese patterns to all error categories in failover-matches.ts:
  timeout, serverError, rateLimit, billing, auth, overloaded
- Add Chinese network error detection in formatTransportErrorCopy()
  for user-friendly error messages
- Add comprehensive test coverage for all CJK error categories

Follows the existing precedent set by Chinese context overflow patterns
in isContextOverflowError().

* fix: narrow billing pattern and fix placeholder issue URL

- Change '账户余额' to '账户余额不足' to avoid false positives on
  messages that merely mention account balance (per greptile review)
- Replace XXXXX placeholder with actual issue #56242

* fix: wire CJK auth failover patterns

* fix: classify CJK provider failover errors

* fix: place failover changelog entry in unreleased

---------

Co-authored-by: Altay <altay@uinaf.dev>
2026-04-28 10:44:17 +03:00
Peter Steinberger
04e96c11ea fix(gateway): skip plugin pricing scans when disabled 2026-04-28 08:23:53 +01:00
Peter Steinberger
12962dd883 fix(models): keep agent primaries strict 2026-04-28 08:01:42 +01:00
scoootscooob
3c636208b0 fix(messages): keep group replies tool-only by default
Rewrites the always-on reply handling so group/channel rooms default to message-tool-visible output, while `messages.groupChat.visibleReplies: \"automatic\"` preserves legacy auto-posting.\n\nThanks @scoootscooob.
2026-04-28 07:36:43 +01:00
Frank Yang
e008830d0e fix(agents): clean up local Claude stdio runs (#73292)
Clean up local Claude stdio one-shot runs before returning from embedded `openclaw agent --local`, including bundle MCP loopback teardown for local process resources.

Keeps gateway-owned MCP loopback cleanup internal to the Gateway, documents the local-vs-gateway behavior, and aligns the stale OpenAI provider-runtime fixture with the current unsupported Codex mini route.
2026-04-28 07:06:01 +01:00
Peter Steinberger
b5371bfd63 fix(auth): migrate flat auth profiles in doctor 2026-04-28 06:53:48 +01:00
Peter Steinberger
dc6031197b fix(models): hide unsupported codex mini route 2026-04-28 06:43:51 +01:00
Peter Steinberger
82eb90b8a2 fix(agents): preserve trusted tool media metadata 2026-04-28 06:19:41 +01:00
Peter Steinberger
8c8dfa768a refactor(models): share catalog capability lookup 2026-04-28 06:18:54 +01:00
Peter Steinberger
017b8db616 ci: speed up release validation shards 2026-04-28 06:14:23 +01:00
Peter Steinberger
3d53b39917 fix(gateway): honor configured vision models 2026-04-28 06:10:14 +01:00
Peter Steinberger
738f5f7508 fix: prevent channel login exec wedges 2026-04-28 05:16:43 +01:00
Peter Steinberger
ab95812d65 fix: record model fallback steps in trajectories 2026-04-28 05:08:34 +01:00
Peter Steinberger
714f3b59cc fix: preserve unknown compaction failure detail 2026-04-28 05:08:34 +01:00
Shakker
08cc44b57d feat: lazily load tool result middleware plugins 2026-04-28 04:33:47 +01:00
Peter Steinberger
6c0cdf43e4 fix: honor subagent spawn model overrides 2026-04-28 04:25:31 +01:00
Peter Steinberger
993fee4066 fix(agents): avoid empty Anthropic tool result blocks 2026-04-28 04:20:49 +01:00
Peter Steinberger
343f2d7245 fix: fail closed for invalid cron payload models 2026-04-28 04:12:54 +01:00
Peter Steinberger
2628326264 refactor: expose agent runtime test contracts 2026-04-28 03:40:57 +01:00
Peter Steinberger
e1acb61317 refactor: expose SDK test helper subpaths 2026-04-28 03:28:17 +01:00
Peter Steinberger
a0a0ab4d9e fix(memory): resolve custom embedding provider ids 2026-04-28 03:11:19 +01:00
Peter Steinberger
632b0fd580 chore: update workspace dependencies 2026-04-28 03:09:44 +01:00
Shakker
9682f3937e fix: skip runtime normalization for broad model lists 2026-04-28 02:59:06 +01:00
Neerav Makwana
1106cc7fd2 fix(cli): skip memory eager context warmup 2026-04-28 02:56:56 +01:00
Peter Steinberger
947aae5a99 refactor(models): move suppressions to manifests 2026-04-28 02:38:31 +01:00
Peter Steinberger
c0fdf9923b perf(agents): keep model resolution caches warm 2026-04-28 02:38:31 +01:00
Jochen Roessner
e9be25b554 perf: cache model resolution to avoid repeated plugin-provider loads
On ARM64 devices (e.g. Raspberry Pi 4), resolvePluginProviders takes ~20s
on first call. Three bugs cause this cost to be paid repeatedly:

1. ensureOpenClawModelsJson readyCache fingerprint includes models.json
   mtime. After a write, the stored fingerprint (pre-write mtime) never
   matches again, forcing every caller to re-run planOpenClawModelsJson.

2. readyCache has one entry per file path. Agents with different configs
   (e.g. main agent vs active-memory subagent) overwrite each other's
   entry, so neither benefits from caching.

3. resolveExplicitModelWithRegistry calls shouldSuppressBuiltInModel →
   resolveProviderPluginsForCatalogHooks on every agent run. The internal
   cache key includes the full config, so callers with slightly different
   configs each pay the full provider-load cost.

Fixes:
- Remove modelsFileMtimeMs from fingerprint (bug 1)
- Add noopCache to MODELS_JSON_STATE keyed by (path, mtime) — a noop
  result is config-agnostic, so any caller can reuse it (bug 2)
- Cache resolveExplicitModelWithRegistry by (provider, modelId, agentDir),
  stable for the lifetime of a gateway session (bug 3)

Measured on Raspberry Pi 4 (ARM64):
  active-memory subagent preprocessing: 66-75s → ~3s (warm)
  active-memory total elapsed:           ~96s  → ~14s (warm)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 02:38:31 +01:00
Peter Steinberger
13ff3142bd fix(agents): classify terminal results for fallback 2026-04-28 02:35:51 +01:00
Peter Steinberger
82ca94fdd7 test: curate google live profile signal 2026-04-28 02:32:44 +01:00
Peter Steinberger
de76ad506c test: stabilize release live e2e lanes 2026-04-28 02:30:36 +01:00
Peter Steinberger
b6a90188e7 test: trim hot test runtime imports 2026-04-28 02:25:55 +01:00
Peter Steinberger
b891dbb133 test: curate openrouter live profile signal 2026-04-28 02:17:48 +01:00
Peter Steinberger
6ebe3087fc test: narrow live gateway profile signal 2026-04-28 01:30:59 +01:00
Peter Steinberger
e508d81f79 perf: avoid registry loads in hot tests 2026-04-28 01:20:47 +01:00
Peter Steinberger
6b1089ffe5 fix: keep group silence on no-reply path 2026-04-28 01:20:00 +01:00
Peter Steinberger
e27c32b9b0 refactor(plugin-sdk): publish route helpers 2026-04-28 01:13:01 +01:00
Peter Steinberger
3eec9e4642 refactor(channels): reuse route context helpers 2026-04-28 01:13:00 +01:00
Peter Steinberger
3876682635 refactor(channels): centralize route normalization 2026-04-28 01:13:00 +01:00
EVA
1adaa28dc8 [plugin sdk] Add generic plugin host-hook contracts (#72287)
Merged via squash.

Prepared head SHA: 68e5f2ce19
Co-authored-by: 100yenadmin <239388517+100yenadmin@users.noreply.github.com>
Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com>
Reviewed-by: @jalehman
2026-04-27 17:07:02 -07:00
Peter Steinberger
802f13ac15 fix(memory): cap ollama non-batch embedding concurrency 2026-04-28 00:34:18 +01:00
Peter Steinberger
56ef6334f0 perf: combine pty exec coverage 2026-04-28 00:17:03 +01:00
Peter Steinberger
05a93c1788 perf: avoid sdk client setup in openai transport test 2026-04-28 00:07:29 +01:00