Addresses review on #72069:
- Codex P1 ("Gate Claude prelude seeding by source provider"): the
guard checked the *current* fallback candidate but not the failed
attempt. A session that still carried a stale
cliSessionBindings["claude-cli"] from an unrelated past run would
inject Claude transcript context into a fallback chain that started
on a different provider (e.g. openai -> openai-codex), leaking
irrelevant prior conversation. Plumb `originalProvider` (the
user-requested provider for the chain) through to runAgentAttempt
and require `isClaudeCliProvider(originalProvider)` before reading
Claude history.
- Codex P2 ("Prefer latest compact boundary when summary is missing"):
the resolver always preferred the most recent explicit summary, so
a later compaction without its own summary entry (rare crash case)
paired stale summary text with post-latest-boundary turns. Restructure
readClaudeCliFallbackSeed to queue summaries into pendingSummary and
flush each boundary's pair atomically. A boundary with no preceding
summary now correctly falls back to the boundary's own content
rather than serving an older summary alongside fresh turns.
- Greptile P2 (newest-first break vs sparse coverage): the
formatFallbackTurns walk intentionally stops on the first oversized
turn so the prelude stays a contiguous "what was happening just
before the failure" window. Document the design choice inline so a
future maintainer doesn't reflexively change it to skip-and-continue.
Tests:
- New gateway cases for the boundary-without-summary edge case and
for trailing summaries written without a paired boundary.
- existing 33 attempt-execution + 14 cli-session-history tests still
pass; broader src/agents/command suite stays green (63/63).
Local default oxlint did not run --type-aware so the warning was missed
on the initial commit; CI surfaced it via check-lint. Hoist the heading
into a named const so its length is read directly without the assertion.
When a claude-cli attempt failed with a fallbackable error (e.g. a 402
billing limit), the next candidate -- typically a non-CLI provider --
ran with no prior conversation context. Claude Code keeps its own
JSONL session under ~/.claude/projects/, but the fallback runner only
sees what OpenClaw assembles from its own transcript, which is empty
for claude-cli sessions. The fallback model therefore behaved as if
the conversation just started, even though Claude later resumed fine.
Resolution mirrors what Claude Code itself does on resume after
compaction: prefer the explicit `/compact` summary, then append the
most recent post-boundary turns up to a char budget. Concretely:
- `readClaudeCliFallbackSeed` (gateway): walks the Claude JSONL with
awareness of `type: "summary"` and `type: "system",
subtype: "compact_boundary"` entries. Pre-boundary turns are dropped
(they are represented by the summary); post-boundary turns become
the recent-window. Multiple compactions are handled by preferring
the latest summary. Path safety reuses the existing
`resolveClaudeCliSessionFilePath` validation.
- `formatClaudeCliFallbackPrelude` / `buildClaudeCliFallbackContext\
Prelude` (agents helpers): format the harvested seed into a labeled
prelude. Tool blocks are coalesced to compact "(tool call: name)" /
"(tool result: …)" hints to keep the prompt budget honest. Newest
turns are kept first when truncating; the summary is clearly
labeled "(truncated)" if it overflows.
- `resolveFallbackRetryPrompt`: gains an optional
`priorContextPrelude` that prepends before the existing retry
marker. Empty/whitespace preludes are ignored; first-attempt prompts
are unchanged.
- `runAgentAttempt`: builds the prelude when `isFallbackRetry === true`
AND the new candidate is non-claude-cli AND a Claude-cli session
binding is present. Same-provider fallbacks (claude-cli to
claude-cli) are unaffected because Claude's own --resume still works.
Verified the new tests (12 in cli-session-history, 12 added to
attempt-execution) catch the regression: removing the prelude prepend
in resolveFallbackRetryPrompt makes both new prelude cases fail,
restoring the original cold-start behavior.
References:
- https://code.claude.com/docs/en/how-claude-code-works
- "Inside Claude Code: The Session File Format"
https://databunny.medium.com/inside-claude-code-the-session-file-format-and-how-to-inspect-it-b9998e66d56b
* fix: add CJK error patterns to failover classification
Chinese LLM providers (ZhipuAI/GLM, Bailian, Kimi/Moonshot, DeepSeek,
etc.) return error messages in Chinese. The existing failover
classification only matches English patterns, causing these errors to
fall through as unclassified — surfacing raw provider errors to users
instead of triggering model fallback.
Real production example: ZhipuAI error code 1234 returns
'网络错误,错误id:xxx,请联系客服。' (network error). This was not
matched by the existing 'network error' English pattern, so no failover
was triggered despite having a configured fallback model.
Changes:
- Add Chinese patterns to all error categories in failover-matches.ts:
timeout, serverError, rateLimit, billing, auth, overloaded
- Add Chinese network error detection in formatTransportErrorCopy()
for user-friendly error messages
- Add comprehensive test coverage for all CJK error categories
Follows the existing precedent set by Chinese context overflow patterns
in isContextOverflowError().
* fix: narrow billing pattern and fix placeholder issue URL
- Change '账户余额' to '账户余额不足' to avoid false positives on
messages that merely mention account balance (per greptile review)
- Replace XXXXX placeholder with actual issue #56242
* fix: wire CJK auth failover patterns
* fix: classify CJK provider failover errors
* fix: place failover changelog entry in unreleased
---------
Co-authored-by: Altay <altay@uinaf.dev>
When a bundled plugin (e.g. plugin-sdk loaded transitively) is resolved via a
pluginRoot already inside the existing plugin-runtime-deps cache, its path
does not match the `dist/extensions/<plugin>` shape, so
resolveBundledPluginPackageRoot() returns null and the caller falls back to
the raw pluginRoot. resolveExistingExternalBundledRuntimeDepsRoots() then
rejected the path because the relative segment crossed a directory separator,
causing the resolver to mint a fresh `openclaw-unknown-<pathhash>` cache
beside the real versioned one. The two caches raced replaceNodeModulesDir()
and triggered ENOTEMPTY crash loops.
Treat any descendant of `<base>/openclaw-*` as belonging to that cache key
so nested resolutions return the existing versioned root instead of creating
a self-referential zombie cache.
Fixes#72956
createSubsystemLogger writes through writeConsoleLine, which intentionally
bypasses the patched console.* capture handler in src/logging/console.ts to
avoid recursion. That bypass also skipped the sink-boundary
redactSensitiveText() gate, so secrets reaching subsystem loggers as
message strings or formatted meta could appear verbatim on the terminal —
a follow-up to the file-transport redaction landed in #67953, tracked
under #64046.
Apply redactSensitiveText() at the writeConsoleLine() exit, immediately
after the existing Windows surrogate sanitization and before dispatching
to the rawConsole sink. This covers all subsystem console paths
(trace/debug/info/warn/error/fatal and .raw) because they share the same
writeConsoleLine() exit, matching the redact-at-sink-boundary pattern
already used in console.ts and the file transport.
Closes#73284