265 Commits

Author SHA1 Message Date
Ayaan Zaidi
66c9feb41d fix(cli): type claude live commentary flag (#89834) (thanks @anagnorisis2peripeteia) 2026-06-08 21:13:22 +05:30
Ayaan Zaidi
817a0910f3 fix(cli): suppress claude commentary answer partials 2026-06-08 21:13:22 +05:30
Cameron Beeley
5fef91f1de fix: apply backend output transforms to commentary progress text 2026-06-08 21:13:22 +05:30
Cameron Beeley
d03952ccd4 feat: add commentary text emission to Claude CLI streaming parser
Detect text accumulated before tool_use blocks in the Claude CLI
streaming parser and emit it as commentary via a new onCommentaryText
callback. This enables the same commentary progress display that the
Codex backend already provides through preamble item events.

- Add onCommentaryText optional callback to createCliJsonlStreamingParser
- Flush accumulated assistantText as commentary when content_block_start
  with tool_use type is encountered
- Track last flushed position to avoid duplicate emissions on consecutive
  tool_use blocks without intervening text
- Wire the callback in both execute.ts (regular CLI spawn + live session)
  and claude-live-session.ts, emitting AgentItemEventData with
  kind=preamble and progressText
- Add 3 test cases covering: text before tool_use, empty text before
  tool_use, and consecutive tool_use dedup
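
The flush-and-dedup behavior in the list above can be sketched roughly as follows. This is a minimal illustration, not the parser in the repository: the event shape is simplified, and `createCommentaryTracker`, `StreamEvent`, and `blockType` are hypothetical names. Only `onCommentaryText`, `assistantText`, and the `content_block_start` / `tool_use` trigger come from the commit message.

```typescript
// Simplified, hypothetical event shape; the real CLI JSONL events carry more fields.
type StreamEvent =
  | { type: "text_delta"; text: string }
  | { type: "content_block_start"; blockType: "text" | "tool_use" };

function createCommentaryTracker(onCommentaryText?: (text: string) => void) {
  let assistantText = "";
  // Offset into assistantText that has already been emitted as commentary.
  let lastFlushed = 0;

  return {
    handle(event: StreamEvent): void {
      if (event.type === "text_delta") {
        assistantText += event.text;
        return;
      }
      if (event.blockType !== "tool_use") return;

      // Emit only the text accumulated since the previous flush, so
      // consecutive tool_use blocks do not repeat the same commentary.
      const pending = assistantText.slice(lastFlushed);
      lastFlushed = assistantText.length;
      if (pending.trim().length > 0) onCommentaryText?.(pending);
    },
  };
}
```

Tracking an offset rather than clearing `assistantText` keeps the full answer text available to whatever else reads it, which is one plausible reason for the "last flushed position" wording above.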
2026-06-08 21:13:22 +05:30
Shakker
ca7047e460 fix: restore cli prepare session env 2026-06-08 14:19:41 +01:00
Shakker
226341e847 test: scope bundle mcp harness env 2026-06-08 14:19:41 +01:00
Shakker
81d9c2f41f test: scope session history state 2026-06-04 22:49:01 +01:00
Peter Steinberger
8fb70a90bd docs: document cli runner preparation tests 2026-06-04 14:11:04 -04:00
Peter Steinberger
429bf9fe84 docs: document cli runner bundle mcp tests 2026-06-04 14:07:58 -04:00
Vincent Koc
ecb30fece4 fix(ci): stabilize include permission checks 2026-06-04 07:35:25 -07:00
Peter Steinberger
087fcf4085 docs: document cli runner history helpers 2026-06-04 09:08:46 -04:00
Peter Steinberger
1d7d8a1658 docs: document cli runner execution helpers 2026-06-04 09:06:53 -04:00
Peter Steinberger
f178c31305 docs: document cli runner shared helpers 2026-06-04 09:05:29 -04:00
Peter Steinberger
e6ec78ede4 docs: document cli runner mcp helpers 2026-06-04 09:04:02 -04:00
Peter Steinberger
b7b069c4d6 docs: document Claude CLI runner helpers 2026-06-04 01:08:59 -04:00
Peter Steinberger
e168a82367 docs: document cli runner mcp helpers 2026-06-03 22:59:39 -04:00
NVIDIAN
eb417bc672 fix(messages): preserve inbound audio for message-tool TTS
Preserve inbound-audio context for message-tool TTS across embedded reply runs, CLI MCP loopback, and queued follow-up paths.

Thanks @ai-hpc.

Co-authored-by: ai-hpc <mail.speedy.hpc@hotmail.com>
2026-06-02 06:45:34 -04:00
Peter Steinberger
9ead0ae921 fix: repair live model inference edge cases
Fix live model inference edge cases across provider streaming, model switching, outbound delivery, and gateway tool resolution.

Includes live/provider issue fixes and leaves #89100 explicitly partial for the remaining FM-2 group routing case.
2026-06-01 23:03:27 -04:00
Peter Steinberger
27dde7a4d6 chore(lint): enable stricter error rules 2026-06-01 01:12:21 +01:00
Peter Steinberger
22cb7fb6b7 chore(lint): enable no-promise-executor-return 2026-05-31 23:06:13 +01:00
Peter Steinberger
2df95c0b10 chore(lint): enable no-misused-promises 2026-05-31 20:42:13 +01:00
brokemac79
e8c7c933f8 Retry stale CLI sessions in runner lifecycle 2026-05-31 17:28:05 +05:30
Peter Steinberger
77f1359612 refactor: extract media and ACP core packages (#88534)
* refactor: extract media and acp core packages

* refactor: remove relocated media and acp sources

* build: wire new core packages into dependency checks

* test: alias new core packages in vitest

* build: keep media sniffer runtime dependency

* docs: refresh plugin sdk api baseline

* fix: keep normalized proposal queries non-empty

* test: keep channel timer tests isolated

* fix: keep rebased plugin checks green

* fix: preserve sms numeric allowlist entries

* test: harden exec foreground timeout failure

* test: remove duplicate skill workshop assertion

* fix: remove channel config lint suppression

* test: refresh lint suppression allowlist
2026-05-31 11:30:33 +01:00
David
778c4f90b9 fix(agents): route per-turn media task hints below the cache boundary (#87998)
* fix(agents): route media task hints below the system-prompt cache boundary

Per-turn image/video/music generation task hints were injected into the
static prependSystemContext slot, landing above the cache boundary inside the
cacheable prefix. The hints are present only on user/manual turns and vary
with active media tasks, so the cacheable prefix shifted turn-to-turn and
defeated Anthropic/OpenAI prompt caching (#85203).

Split the per-turn media hints out of the prepend resolver into
resolveAttemptMediaTaskSystemPromptAddition and route them below the boundary
via the existing prependSystemPromptAddition helper, matching how subagent and
context-engine system-prompt additions are already routed. The static plugin
prependSystemContext / appendSystemContext hook fields are unchanged and
remain in the cacheable prefix. Applied at both consumers (embedded agent
runner and CLI runner).

* fix(agents): keep media task hints below the cache boundary for hook systemPrompt overrides

A before_prompt_build hook that returns a full systemPrompt override replaces
the base prompt with marker-free text. Per-turn media-generation task hints
were then front-prepended into that marker-free prompt, which providers cache
as a single block, so the cached prefix still shifted turn-to-turn on the
override path (#85203).

Wrap the base with ensureSystemPromptCacheBoundary at both media-routing sites
(embedded agent runner and CLI runner) so a marker-free override gets an
appended boundary and the hint routes into the uncached suffix. The helper is
idempotent, so marker-bearing prompts are unchanged. The shared
prependSystemPromptAddition wrapper and the static prependSystemContext /
appendSystemContext hook fields are untouched.

* fix(agents): keep marker-free idle prompts cacheable below the boundary

A marker-free hook systemPrompt override only had the cache boundary
ensured on turns with an active media task. On idle turns the later
appendModelIdentitySystemPrompt landed above the absent boundary, so the
idle cached system prefix diverged from active turns and prompt caching
broke across active/idle transitions. Ensure the boundary regardless of
media state in both the embedded and CLI runners, and extend the
regression to cover the model-identity append across active->idle.

* fix(agents): scope cache-boundary ensure to the model-identity append

Ensuring the boundary unconditionally on media-idle turns appended a
boundary marker to empty raw/gateway system prompts (turning "" into a
marker-only prompt) and to prompts with nothing below the boundary.
Instead ensure the boundary only when a model identity line is actually
appended to a non-empty prompt, in both the embedded and CLI runners.
This still keeps the identity below the boundary for marker-free hook
systemPrompt overrides (the #85203 idle-cache regression) while leaving
empty and identity-less prompts untouched.

* test: refresh stale type and lint expectations

* test: stabilize CI timeout checks

* test: satisfy channel entry lint

* fix(agents): skip cache boundary for blank prompts

* fix(channels): keep draft flush timer referenced

* test(agents): tolerate failed exec timeout setup
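
The cache-boundary routing described in the fixes above can be sketched as below. This is an illustration under stated assumptions: the marker string and the function names `ensureBoundary`, `routeBelowBoundary`, and `cachedPrefix` are hypothetical stand-ins, not the repository's `ensureSystemPromptCacheBoundary` or `prependSystemPromptAddition`, whose exact behavior is not shown in this log.

```typescript
// Hypothetical marker; the real boundary marker is not given in the commit message.
const CACHE_BOUNDARY = "<!-- cache-boundary -->";

// Idempotent: prompts that already carry the marker, and blank prompts, are unchanged.
function ensureBoundary(prompt: string): string {
  if (prompt.trim() === "" || prompt.includes(CACHE_BOUNDARY)) return prompt;
  return `${prompt}\n${CACHE_BOUNDARY}\n`;
}

// Place a per-turn addition directly below the boundary so the text above it
// (the cacheable prefix) stays byte-identical from turn to turn.
function routeBelowBoundary(prompt: string, addition: string): string {
  const bounded = ensureBoundary(prompt);
  const idx = bounded.indexOf(CACHE_BOUNDARY);
  if (idx < 0) return addition; // blank prompt: assumed to have nothing to cache
  const split = idx + CACHE_BOUNDARY.length;
  const suffix = bounded.slice(split).replace(/^\n+/, "");
  const parts = [bounded.slice(0, split), addition];
  if (suffix) parts.push(suffix);
  return parts.join("\n");
}

// Everything before the marker is what a provider could cache as the prefix.
function cachedPrefix(prompt: string): string {
  const idx = prompt.indexOf(CACHE_BOUNDARY);
  return idx < 0 ? prompt : prompt.slice(0, idx);
}
```

With this shape, an active turn (with a media hint) and an idle turn (without one) share the same prefix, which is the property the regression tests above are described as checking.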

---------

Co-authored-by: Peter Steinberger <steipete@gmail.com>
2026-05-31 10:35:53 +01:00
Peter Steinberger
4eba3e5d7d chore(lint): enable more readability rules 2026-05-31 07:38:33 +01:00
Peter Steinberger
deb7bc6539 chore(lint): enable readability lint rules 2026-05-31 07:17:57 +01:00
Peter Steinberger
23dac6c263 test: keep vitest cases under one second 2026-05-31 06:51:34 +01:00
Peter Steinberger
00d8d7ead0 refactor: extract normalization core package
Extract shared normalization/coercion helpers into private @openclaw/normalization-core workspace package while preserving existing plugin SDK helper subpaths.

Also keeps direct normalization-core imports internal, wires UI/build/loader resolution, and replaces the slow PR network CodeQL lane with a fast added-line boundary scan while retaining full CodeQL for scheduled/manual runs.

Verification: local moved tests, plugin SDK boundary tests, extension loader tests, agents-support shard, UI build/test, build artifacts, lint, workflow guards, autoreview, and GitHub CI passed on PR head 963d893715.
2026-05-31 01:33:00 +01:00
Peter Steinberger
4c33aaa86c refactor: unify OpenAI provider identity (#88451)
* refactor: unify OpenAI provider identity

* refactor: move legacy oauth sidecar doctor helpers

* test: align OpenAI fixtures after rebase

* test: clean OpenAI provider unification

* fix: finish OpenAI provider cleanup

* fix: finish OpenAI cleanup follow-through

* fix: finish OpenAI CI cleanup
2026-05-31 00:29:44 +01:00
Peter Steinberger
38d3d11cbc feat: improve MCP operator workflows
Add MCP server add/configure/login/reload flows plus config/runtime support for enablement, filters, timeouts, OAuth, TLS, and parallel execution hints. Update docs and tests for the expanded MCP operator surface.
2026-05-30 23:51:40 +01:00
Ayaan Zaidi
a176b8ec2f perf(cli): compact resumed room-event prompts 2026-05-30 18:53:59 +05:30
Ayaan Zaidi
21b5f601b6 fix(agents): preserve auth-boundary cli invalidation 2026-05-30 10:09:19 +05:30
Ayaan Zaidi
2e21158d04 refactor(agents): simplify cli session recovery probes 2026-05-30 10:09:19 +05:30
Abdel Gomez-Perez
16b510807b fix(agents/cli-runner): invalidate sessions whose transcript ends mid-tool
A claude-cli session whose JSONL transcript ends with an assistant
`tool_use` content block that was never answered by a `tool_result` user
message cannot resume — claude-cli will sit waiting for the missing
`tool_result`, hit its no-output watchdog, and the runtime kills it
with `reason=abort`. The dispatcher then sees an empty payload and emits
NO_REPLY, which to the user looks like the agent silently ignored their
message — same end-user symptom as the binding-flush amnesia bug, but a
different root cause.

The orphan can be left behind when:
  - Gateway restarts mid-tool (brew upgrade, manual kickstart, OOM,
    crash) — claude was waiting on a tool result that never arrived.
  - `claude-live-session.ts` no-output watchdog fires while a tool is
    actively running and OC kills the subprocess.
  - The tool itself crashed or hung past its own deadline.

In all cases the resumed session is dead until the binding gets cleared,
because every subsequent resume hits the same trailing tool_use and the
same kill cycle. Observed in production on a personal OpenClaw gateway
(3d-engineer agent, 50-message-deep transcript ending in a Bash
`tool_use`; every Telegram message sent after the orphan landed was
silently aborted at the 180s no-output mark).

Add `claudeCliSessionTranscriptHasOrphanedToolUse` to the helpers. It
walks the JSONL, finds the last assistant message, and returns true if
any of its `tool_use` ids has no matching `tool_result` later in the
file. Wire into `prepareCliRunContext` as a second invalidator gate
alongside `missing-transcript`. The new `invalidatedReason:
"orphaned-tool-use"` follows the same path as missing-transcript: the
binding is dropped, this turn starts a fresh session, and the prior
context is reseeded into the new session via `RAW_TRANSCRIPT_RESEED`.

Detection only considers TRAILING orphans — an unanswered tool_use
deeper in history is inert because a later assistant message already
moved past it. Only the most recent assistant message's tool_use ids
matter for forward progress.

Probe runs only for claude-cli providers and only when the transcript-
content gate already passed, so we add no I/O on already-invalidated
sessions and no behavior change for non-claude providers.

AI-assisted: yes. Tooling: Claude Opus + claude-cli.
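
The trailing-orphan check described above might look roughly like this. It is a sketch under assumptions, not the repository's `claudeCliSessionTranscriptHasOrphanedToolUse`: the function name `hasTrailingOrphanedToolUse` is hypothetical, and the transcript entry shape (a top-level `type` plus `message.content` blocks with `id` / `tool_use_id`) is assumed for illustration.

```typescript
// Assumed, simplified transcript shapes for illustration only.
type Block = { type?: string; id?: string; tool_use_id?: string };
type Entry = { type?: string; message?: { content?: unknown } };

function blocksOf(entry: Entry): Block[] {
  const content = entry.message?.content;
  return Array.isArray(content) ? (content as Block[]) : [];
}

function hasTrailingOrphanedToolUse(jsonl: string): boolean {
  const entries: Entry[] = [];
  for (const line of jsonl.split("\n")) {
    if (!line.trim()) continue;
    try {
      const parsed: unknown = JSON.parse(line);
      if (typeof parsed === "object" && parsed !== null) entries.push(parsed as Entry);
    } catch {
      // Skip malformed lines, for example a partially flushed final line.
    }
  }

  // Only the most recent assistant message matters for forward progress.
  let lastAssistant = -1;
  for (let i = entries.length - 1; i >= 0; i--) {
    if (entries[i]?.type === "assistant") {
      lastAssistant = i;
      break;
    }
  }
  if (lastAssistant < 0) return false;

  const pending = new Set<string>();
  for (const block of blocksOf(entries[lastAssistant] ?? {})) {
    if (block.type === "tool_use" && typeof block.id === "string") pending.add(block.id);
  }
  // Any tool_result later in the file answers its tool_use id.
  for (const entry of entries.slice(lastAssistant + 1)) {
    for (const block of blocksOf(entry)) {
      if (block.type === "tool_result" && typeof block.tool_use_id === "string") {
        pending.delete(block.tool_use_id);
      }
    }
  }
  return pending.size > 0;
}
```

An unanswered `tool_use` deeper in history is ignored here because a later assistant entry replaces it as the last assistant message, matching the "trailing orphans only" rule above.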
2026-05-30 10:09:19 +05:30
Josh Avant
584fa3215c Fix restart sentinel internal continuations (#88161)
* fix restart sentinel internal continuations

* update gateway prompt snapshots

* stabilize sandbox browser audit timer tests

* drive sandbox audit timeouts deterministically

* drive gh-read timeout tests deterministically

* drive label-open-issues timeout tests deterministically

* document deterministic timeout test timers

* test: preserve deterministic timer setup after rebase
2026-05-29 19:06:54 -07:00
benjamin1492
de455304cc fix(command): stabilize claude-cli transcript resume (#81048)
Fix claude-cli transcript resume so session-id rotation and transcript flush timing do not drop valid resume state.

- Capture the latest claude-cli session_id from JSONL output.
- Resolve Claude project transcript paths through the shared canonical project-dir resolver.
- Probe transcript content from the actual CLI process cwd.
- Thanks @benjamin1492!
2026-05-29 22:56:09 +05:30
Shakker
d9278c8efd refactor: organize skills subsystem layout 2026-05-29 17:35:02 +01:00
Shakker
355fb4d860 refactor: use direct skills imports 2026-05-29 17:35:02 +01:00
Shakker
8640b6aa7f fix: drop stale system prompt override imports 2026-05-29 17:35:02 +01:00
Shakker
5fff679aea fix: align skills branch with upstream tar verbose test 2026-05-29 17:35:02 +01:00
Shakker
22e2d1560f refactor: centralize skills subsystem 2026-05-29 17:35:02 +01:00
Rajvardhan Patil
5518ac998f fix(agents): add CLI turn output digests
Adds content-safe output fingerprints to CLI backend turn logs so repeated byte-identical responses can be detected from gateway logs without exposing response text.

Covers Claude live-session turns, synthetic cron before_agent_reply short-circuits, and ordinary CLI subprocess turns with shared outBytes/outHash fields.
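
A content-safe fingerprint of this kind could be computed along the following lines. This is a minimal sketch: only the `outBytes` and `outHash` field names come from the commit message, while the function name `digestTurnOutput`, the choice of SHA-256, and the hex encoding are assumptions made for illustration.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: returns the size and a hash of the response text so
// logs can reveal byte-identical repeats without containing the text itself.
function digestTurnOutput(text: string): { outBytes: number; outHash: string } {
  const bytes = Buffer.from(text, "utf8");
  return {
    outBytes: bytes.byteLength,
    // SHA-256 is an assumed choice; the commit does not name the algorithm.
    outHash: createHash("sha256").update(bytes).digest("hex"),
  };
}
```

Two turns that log the same `outBytes` and `outHash` almost certainly produced identical output, which is the detection use case described above.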

Verification:
- pnpm test src/agents/cli-runner.spawn.test.ts src/agents/cli-runner.before-agent-reply-cron.test.ts -- --reporter=verbose
- pnpm check:changed (Blacksmith Testbox tbx_01kssdqes22wqhas0v7h339zr7)
- .agents/skills/autoreview/scripts/autoreview --mode local
- .agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main
- GitHub PR checks passed for e130c1acbf

Fixes #81004

Co-authored-by: Rajvardhan Patil <raj@Rajvardhans-MacBook-Air.local>
2026-05-29 09:50:56 +01:00
Abdel Gomez-Perez
9de6abd8d7 fix(agents): bridge CLI tool progress events 2026-05-29 13:04:31 +05:30
Peter Steinberger
1188aa3b81 feat: add Claude Opus 4.8 support (#87890)
* feat: add Claude Opus 4.8 support

* fix: omit Vertex Opus sampling overrides

* fix: preserve Opus adaptive thinking levels

* fix: clamp Anthropic max effort support

* fix: use sha256 for QA mock call ids

* fix: type Anthropic transport test model metadata

* test: update PDF model default for Opus 4.8
2026-05-29 06:10:42 +01:00
Peter Steinberger
f09b69a78f test: drop removed gateway live shard fixture 2026-05-28 20:41:11 -04:00
Peter Steinberger
e12a6d6a67 refactor(agents): own system prompt assembly 2026-05-29 01:22:09 +01:00
Josh Avant
4c3a0292ff Fix Claude live tool progress for watchdog recovery (#87546)
* fix: keep claude live tools fresh for watchdog

* fix: avoid claude live active tool spread
2026-05-28 01:37:40 -07:00
Mariano
7299c56953 Fix sub-agent cwd/workspace separation (#87218)
Merged via squash.

Prepared head SHA: f47b073830
Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com>
Reviewed-by: @mbelinky
2026-05-27 23:55:24 +02:00
Alix-007
f4329fe0d6 fix(agents): bound plugin system context
* fix(agents): bound plugin system context

* test(agents): align wrapped system context expectations

* style(agents): format hook context helper

* test(codex): expect plugin system context boundary

---------

Co-authored-by: Alix-007 <267018309+Alix-007@users.noreply.github.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
2026-05-27 21:16:15 +01:00
Peter Steinberger
bb46b79d3c refactor: internalize OpenClaw agent runtime (#85341)
* refactor: extract agent core package

Introduce packages/agent-core as the OpenClaw-owned home for reusable agent loop, harness, session, prompt, and runtime dependency contracts.

* refactor: extract shared llm runtime

Move provider model registries, stream wrappers, OAuth helpers, and LLM utilities into src/llm with plugin-sdk barrels instead of depending on the old embedded runtime layout.

* refactor: remove pi runtime internals

Rename remaining Pi-shaped agent surfaces to OpenClaw agent runtime names, delete obsolete Pi docs and package graph checks, and add the third-party notice for incorporated code.

* refactor: tighten agent session runtime

Make agent-core/runtime dependencies explicit, consolidate compaction and session transcript helpers, and move model/session helpers behind OpenClaw-owned contracts.

* refactor: remove static model and pi auth paths

Drop static model catalogs and Pi auth bridges, move model/provider facts to manifest-owned runtime contracts, and harden internal embedded-agent utilities.

* refactor: remove legacy provider compat paths

* docs: remove agent parity notes

* fix: skip provider wildcard metadata parsing

* refactor: share session extension sdk loading

* refactor: inline acpx proxy error formatter

* refactor: fold edit recovery into edit tool

* fix: accept extension batch separator

* test: align startup provider plugin expectations

* fix: restore provider-scoped release discovery

* test: align static asset packaging expectations

* fix: run static provider catalogs during scoped discovery

* fix: add provider entry catalogs for scoped live discovery

* fix: load lightweight provider catalog entries

* fix: refresh provider-scoped plugin metadata

* fix: keep provider catalog entries on release live path

* fix: keep static manifest models in release live checks

* fix: harden release model discovery

* fix: reduce OpenAI live cache probe reasoning

* fix: disable OpenAI cache probe reasoning

* ci: extend OpenAI gateway live timeout

* fix: extend live gateway model budget

* fix: stabilize release validation regressions

* fix: honor provider aliases in model rows

* fix: stabilize release validation lanes

* fix: stabilize release memory qa

* ci: stabilize release validation lanes

* ci: prefer ipv4 for live docker node calls

* fix: restore shared tool-call stream wrapper

* ci: remove legacy pi test shard alias

* fix: clean up embedded agent test drift

* fix: stabilize runtime alias status

* fix: clean up embedded agent ci drift

* fix: restore release ci invariants

* fix: clean up post-rebase runtime drift

* fix: restore release ci checks

* fix: restore release ci after rebase

* fix: remove stale pi runtime path

* test: align compaction runtime expectations

* test: update plugin prerelease expectations

* fix: handle claude live tool approvals

* fix: stabilize release validation gates

* fix: finish agent runtime import

* test: finish post-rebase agent runtime mocks

* fix: keep codex compaction native

* fix: stabilize codex app-server hook tests

* test: isolate codex diagnostic active run

* test: remove codex diagnostic completion race

* ci: fix full release manifest performance run id

* refactor: narrow llm plugin sdk boundary

* chore: drop generated google boundary stamps

* fix: repair rebase fallout

* fix: clean up rebased runtime references

* fix: decode codex jwt payloads as base64url

* fix: preserve shipped pi runtime alias

* fix: add scoped sdk virtual modules

* fix: decode llm codex oauth jwt as base64url

* fix: avoid stale vertex adc negative cache

* fix: harden tool arg decoding and codeql path

* fix: keep vertex adc negative checks live

* refactor: consolidate codex jwt and edit helpers

* fix: await codex oauth node runtime imports

* fix: preserve sdk tool and notice contracts

* fix: preserve shipped compat config boundaries

* fix: align codex oauth callback host

* fix: terminate agent-core loop streams on failure

* fix: keep codex oauth callback alive during fallback

* ci: include session tools in critical codeql scans

* fix: keep Cloudflare Anthropic provider auth header

* docs: redirect legacy pi runtime pages

* fix: honor bundled web provider compat discovery

* fix: protect session output spill files

* fix: keep legacy agent dir env blocked

* fix: contain auto-discovered skill symlinks

* fix: harden agent core sdk proxy surfaces

* fix: restore approval reaction sdk compat

* fix: keep live docker runs bounded

* fix: keep codex oauth redirect host aligned

* fix: resolve post-rebase agent runtime drift

* fix: redact anthropic oauth parse failures

* fix: preserve responses strict tool shaping

* fix: repair agent runtime rebase cleanup

* docs: redirect retired parity pages

* fix: bound auto-discovered resources to roots

* fix: repair post-rebase agent test drift

* fix: preserve bundled provider allowlist migration

* fix: preserve manifest-owned provider aliases

* fix: declare photon image dependency

* fix: keep provider headers out of proxy body

* fix: preserve shipped env aliases

* fix: refresh control ui i18n generated state

* fix: quote read fallback paths

* fix: preview edits through configured backend

* test: satisfy core test typecheck

* fix: preserve ZAI usage auth fallback

* test: repair codex diagnostic test

* fix: repair agent runtime rebase drift

* test: finish embedded runner import rename

* fix: repair agent runtime rebase integrations

* test: align compaction oauth fallback expectations

* fix: allow sdk-auth session models

* fix: update doctor tool schema import

* fix: preserve bedrock plugin region

* fix: stream harmony-like prose immediately

* ci: include session runtime in codeql shards

* fix: repair latest rebase integrations

* fix: honor explicit codex websocket transport

* fix: keep openai-compatible credentials provider-scoped

* fix: refresh sdk api baseline after rebase

* fix: route cli runtime aliases through openclaw harness

* test: rename stale harness mock expectation

* test: rename embedded agent overflow calls

* test: clean embedded auth test wording

* test: use openclaw stream types in deepinfra cache test

* fix: refresh sdk api baseline on latest main

* fix: honor bundled discovery compat allowlists

* fix: refresh sdk api baseline after latest rebase

* fix: remove stale rebase imports

* test: rename stale model catalog mock

* test: mock renamed doctor runtime modules

* fix: map canonical kimi env auth

* fix: use internal model registry in bench script

* fix: migrate deepinfra provider catalog entry

* fix: enforce builtin tool suppression

* fix: route compaction auth and proxy payloads safely

* refactor: prune unused llm registry leftovers

* test: update codex hooks session import

* test: fix model picker ci coverage

* test: align model picker auth mock types
2026-05-27 19:24:04 +01:00