Detect text accumulated before tool_use blocks in the Claude CLI
streaming parser and emit it as commentary via a new onCommentaryText
callback. This enables the same commentary progress display that the
Codex backend already provides through preamble item events.
- Add onCommentaryText optional callback to createCliJsonlStreamingParser
- Flush accumulated assistantText as commentary when content_block_start
with tool_use type is encountered
- Track last flushed position to avoid duplicate emissions on consecutive
tool_use blocks without intervening text
- Wire the callback in both execute.ts (regular CLI spawn + live session)
and claude-live-session.ts, emitting AgentItemEventData with
kind=preamble and progressText
- Add 3 test cases covering: text before tool_use, empty text before
tool_use, and consecutive tool_use dedup
Fix live model inference edge cases across provider streaming, model switching, outbound delivery, and gateway tool resolution.
Includes live/provider issue fixes and leaves #89100 explicitly partial for the remaining FM-2 group routing case.
* fix(agents): route media task hints below the system-prompt cache boundary
Per-turn image/video/music generation task hints were injected into the
static prependSystemContext slot, landing above the cache boundary inside the
cacheable prefix. The hints are present only on user/manual turns and vary
with active media tasks, so the cacheable prefix shifted turn-to-turn and
defeated Anthropic/OpenAI prompt caching (#85203).
Split the per-turn media hints out of the prepend resolver into
resolveAttemptMediaTaskSystemPromptAddition and route them below the boundary
via the existing prependSystemPromptAddition helper, matching how subagent and
context-engine system-prompt additions are already routed. The static plugin
prependSystemContext / appendSystemContext hook fields are unchanged and
remain in the cacheable prefix. Applied at both consumers (embedded agent
runner and CLI runner).
* fix(agents): keep media task hints below the cache boundary for hook systemPrompt overrides
A before_prompt_build hook that returns a full systemPrompt override replaces
the base prompt with marker-free text. Per-turn media-generation task hints
were then front-prepended into that marker-free prompt, which providers cache
as a single block, so the cached prefix still shifted turn-to-turn on the
override path (#85203).
Wrap the base with ensureSystemPromptCacheBoundary at both media-routing sites
(embedded agent runner and CLI runner) so a marker-free override gets an
appended boundary and the hint routes into the uncached suffix. The helper is
idempotent, so marker-bearing prompts are unchanged. The shared
prependSystemPromptAddition wrapper and the static prependSystemContext /
appendSystemContext hook fields are untouched.
* fix(agents): keep marker-free idle prompts cacheable below the boundary
A marker-free hook systemPrompt override only had the cache boundary
ensured on turns with an active media task. On idle turns the later
appendModelIdentitySystemPrompt landed above the absent boundary, so the
idle cached system prefix diverged from active turns and prompt caching
broke across active/idle transitions. Ensure the boundary regardless of
media state in both the embedded and CLI runners, and extend the
regression to cover the model-identity append across active->idle.
* fix(agents): scope cache-boundary ensure to the model-identity append
Ensuring the boundary unconditionally on media-idle turns appended a
boundary marker to empty raw/gateway system prompts (turning "" into a
marker-only prompt) and to prompts with nothing below the boundary.
Instead ensure the boundary only when a model identity line is actually
appended to a non-empty prompt, in both the embedded and CLI runners.
This still keeps the identity below the boundary for marker-free hook
systemPrompt overrides (the #85203 idle-cache regression) while leaving
empty and identity-less prompts untouched.
* test: refresh stale type and lint expectations
* test: stabilize CI timeout checks
* test: satisfy channel entry lint
* fix(agents): skip cache boundary for blank prompts
* fix(channels): keep draft flush timer referenced
* test(agents): tolerate failed exec timeout setup
---------
Co-authored-by: Peter Steinberger <steipete@gmail.com>
Extract shared normalization/coercion helpers into private @openclaw/normalization-core workspace package while preserving existing plugin SDK helper subpaths.\n\nAlso keeps direct normalization-core imports internal, wires UI/build/loader resolution, and replaces the slow PR network CodeQL lane with a fast added-line boundary scan while retaining full CodeQL for scheduled/manual runs.\n\nVerification: local moved tests, plugin SDK boundary tests, extension loader tests, agents-support shard, UI build/test, build artifacts, lint, workflow guards, autoreview, and GitHub CI passed on PR head 963d893715.
Add MCP server add/configure/login/reload flows plus config/runtime support for enablement, filters, timeouts, OAuth, TLS, and parallel execution hints. Update docs and tests for the expanded MCP operator surface.
A claude-cli session whose JSONL transcript ends with an assistant
`tool_use` content block that was never answered by a `tool_result` user
message cannot resume — claude-cli will sit waiting for the missing
`tool_result`, hit its no-output watchdog, and the runtime kills it
with `reason=abort`. The dispatcher then sees an empty payload and emits
NO_REPLY, which to the user looks like the agent silently ignored their
message — same end-user symptom as the binding-flush amnesia bug, but a
different root cause.
The orphan can be left behind when:
- Gateway restarts mid-tool (brew upgrade, manual kickstart, OOM,
crash) — claude was waiting on a tool result that never arrived.
- `claude-live-session.ts` no-output watchdog fires while a tool is
actively running and OC kills the subprocess.
- The tool itself crashed or hung past its own deadline.
In all cases the resumed session is dead until the binding gets cleared,
because every subsequent resume hits the same trailing tool_use and the
same kill cycle. Observed in production on a personal OpenClaw gateway
(3d-engineer agent, 50-message-deep transcript ending in a Bash
`tool_use`; every Telegram message after the orphan landed silently
aborted at the 180s no-output mark).
Add `claudeCliSessionTranscriptHasOrphanedToolUse` to the helpers that
walks the JSONL, finds the last assistant message, and returns true if
any of its `tool_use` ids has no matching `tool_result` later in the
file. Wire into `prepareCliRunContext` as a second invalidator gate
alongside `missing-transcript`. The new `invalidatedReason:
"orphaned-tool-use"` follows the same path as missing-transcript: the
binding is dropped, this turn starts a fresh session, and the prior
context is reseeded into the new session via `RAW_TRANSCRIPT_RESEED`.
Detection only considers TRAILING orphans — an unanswered tool_use
deeper in history is inert because a later assistant message already
moved past it. Only the most recent assistant message's tool_use ids
matter for forward progress.
Probe runs only for claude-cli providers and only when the transcript-
content gate already passed, so we add no I/O on already-invalidated
sessions and no behavior change for non-claude providers.
AI-assisted: yes. Tooling: Claude Opus + claude-cli.
Fix claude-cli transcript resume so session-id rotation and transcript flush timing do not drop valid resume state.
- Capture the latest claude-cli session_id from JSONL output.
- Resolve Claude project transcript paths through the shared canonical project-dir resolver.
- Probe transcript content from the actual CLI process cwd.
- Thanks @benjamin1492!
* feat: add Claude Opus 4.8 support
* fix: omit Vertex Opus sampling overrides
* fix: preserve Opus adaptive thinking levels
* fix: clamp Anthropic max effort support
* fix: use sha256 for QA mock call ids
* fix: type Anthropic transport test model metadata
* test: update PDF model default for Opus 4.8