Adds `messages.responseUsage` (precedence session -> channel -> config default
-> off) so the persistent /usage footer can default-on, with three distinct
states: explicit on (tokens/full), explicit off (persisted), and unset (inherit
the configured default).
Unifies effective-value resolution behind a single channel-aware resolver
`resolveEffectiveResponseUsage` used by reply rendering, the no-arg /usage
toggle, the ACP control, and the gateway session-row builder; the row builder's
`effectiveResponseUsage` is carried through sessions.changed events, chat
snapshots, and the UI row so live consumers never go stale. `/usage reset`
(aliases inherit/clear/default) clears the override to inherit; only explicit
off persists; a full session reset preserves the preference. ACP "Usage detail"
gains an "inherit" option for unset sessions. Docs/help/completions updated; "on"
documented as a legacy alias; config-doc baseline regenerated.
* fix(tasks): preserve both cron-run session key shapes during maintenance
Session-registry maintenance keeps running cron jobs' session rows, but
readRunningCronJobIds built the preserve-set with job.id.toLowerCase() only.
Cron-run session keys carry two job-segment shapes: main-session runs use the
slugified segment (normalizeCronLaneSegment, e.g. "daily-report") while
default-isolated runs use the raw lowercased id ("daily report", built from
cron:${job.id} via toAgentStoreSessionKey, which lowercases but does not
slugify). The lowercase-only matcher preserved isolated runs but pruned
main-session runs of any non-slug job id (e.g. "Daily Report") as stale.
Preserve both shapes (raw lowercased id and slugified segment). This is
strictly more-preserving, so no live running cron session is dropped. Adds a
regression test seeding both a slug main-session run and a raw isolated run for
a non-slug job id, asserting both survive while a non-running job's run is still
pruned.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(tasks): match cron session keys to target shape
* fix(tasks): preserve active cron aliases across retargeting
* fix(tasks): retain explicit cron session aliases
---------
Co-authored-by: ly-wang19 <ly-wang19@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
#91499 auto-stamps the creator's tool surface as a default toolsAllow cap
on agentTurn cron payloads whenever the creating session is tool-restricted
(a narrowing allow-policy or an explicit deny). CLI backends cannot enforce
a runtime toolsAllow — cli-runner/prepare.ts rejects any defined allow-list
— so every scheduled agentTurn that resolves to a CLI backend (e.g.
claude-cli) fails to start. This silently broke per-thread scheduled
continuations on CLI backends.
A CLI backend is not a runtime tool-policy boundary: it runs with its own
configured tool set, as the operator, on the local machine, and refuses a
runtime allow-list outright. An inherited default cap is therefore
unenforceable on a CLI backend. Decide at run time, where the backend is
known:
- Flag the default. capCronAgentTurnToolsAllow stamps toolsAllowIsDefault
when it fills in the creator surface because the cron requested nothing
(or a bare "*"). An explicit narrowing or empty allow-list is a real
per-cron restriction and carries no flag.
- Drop only the default, only on CLI. The run-executor drops a flagged
default in the CLI branch and lets the run proceed. An explicit per-cron
restriction (no flag) is deliberately passed through, so prepare.ts still
fails it closed and surfaces that the requested policy needs an embedded
runtime. Embedded runs are untouched and keep the full cap enforced.
- Persist the flag. New nullable cron_jobs.payload_tools_allow_is_default
column (additive ensureColumn migration + codec read/write) so the
decision survives a gateway restart, plus toolsAllowIsDefault on the
gateway-protocol agentTurn payload schema — the stamped payload is
otherwise rejected by the contract's additionalProperties:false.
- Preserve the flag across updates. A no-toolsAllow update (reschedule,
prompt edit) no longer carries the stored default forward as a literal
value — that routed it through the explicit-narrowing branch, stripped the
flag, and re-broke the job on CLI after the next restart. The default is
re-derived (flag intact); an explicit restriction is still carried forward
unflagged.
Net policy: on CLI only the unenforceable inherited default is relaxed;
explicit per-cron restrictions still fail closed; embedded backends are
unchanged.
Tests: run-executor drops the flagged default but propagates an explicit
restriction on CLI; cron-tool stamps/clears the flag across create and
update and preserves it across a no-toolsAllow update; store round-trips the
flag (and its absence) through SQLite.
Not covered: agentTurn crons created during the regression window carry a
flagless toolsAllow and remain fail-closed on CLI until recreated or updated
with an explicit toolsAllow.
When a session's model comes from steering/fallback runtime fields
(entry.modelProvider/entry.model) rather than explicit override fields,
switching back to the default model via /model default would not set
liveModelSwitchPending. The isDefault branch in applyModelOverrideToSessionEntry
only sets selectionUpdated when it deletes override fields — but when no
override fields exist, selectionUpdated stays false, preventing the
liveModelSwitchPending flag from being set at the gate condition.
Fix: after the runtime alignment check, set selectionUpdated when
selection.isDefault and runtime fields are misaligned, so that
liveModelSwitchPending is properly set for the pending live switch.
Adds test coverage for this previously untested scenario.
Related to #96269
Co-authored-by: Claude <noreply@anthropic.com>
Summary:
- The PR changes the agent tools manager to treat spawned-but-nonzero fd/rg probes as missing and adds regression tests for non-zero and zero spawn status.
- PR surface: Source +3, Tests +27. Total +30 across 2 files.
- Reproducibility: yes. Current main ignores non-zero `spawnSync.status`, and a live Node probe confirms a spawned child can exit non-zero while leaving `error` unset.
Automerge notes:
- No ClawSweeper repair was needed after automerge opt-in.
Validation:
- ClawSweeper review passed for head 377d560eff.
- Required merge gates passed before the squash merge.
Prepared head SHA: 377d560eff
Review: https://github.com/openclaw/openclaw/pull/96361#issuecomment-4788071605
Co-authored-by: liyuanbin <li.yuanbin1@xydigit.com>
Co-authored-by: Claude <noreply@anthropic.com>
Approved-by: takhoffman
* perf(mcp): parallelize MCP server connections in getCatalog to reduce prep latency
Every agent request incurred 6-7s of prep latency because bundle-tools
connected to configured MCP servers sequentially, one at a time. With
4-5 MCP servers at ~1.5s each (default tools/list timeout), the total
was the sum of all servers' connection times.
Fix: split getCatalog() into two phases:
1. Synchronous pre-computation of safe server names (fast, sequential)
2. Async connection + tool listing (parallelized via Promise.allSettled)
Now MCP servers connect and list tools concurrently, reducing the total
latency from the sum of all servers to roughly the slowest single server.
Each server still has its own error handling — individual failures are
gracefully demoted to diagnostics, not fatal to the catalog.
Prep stage timing change:
Before: bundle-tools = sum(connection + listTools) for each server
After: bundle-tools = max(connection + listTools) across all servers
Closes#94162
Co-Authored-By: Claude <noreply@anthropic.com>
* fix(mcp): add missing braces for eslint curly rule
Two if-statements lacked braces, failing the CI check-lint job.
Co-Authored-By: Claude <noreply@anthropic.com>
* test(mcp): add deterministic regression test for parallel catalog loading
- Add focused timing test that proves parallel MCP catalog loading
completes in max(server delays) not sum(server delays)
- Test creates 3 slow stdio MCP servers (200/400/600ms delays) and
asserts wall time < sum(delays) to verify parallelism
- Would fail under the original sequential for-await loop
- Add standalone scripts/repro-94162-timing.mjs for documentation
Part of #94162
* fix(agents): bound MCP catalog fanout
* fix: harden bundle MCP catalog session lifecycle
---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: mmyzwl <mmyzwl@users.noreply.github.com>
Co-authored-by: Vincent Koc <25068+vincentkoc@users.noreply.github.com>
inferParamBFromIdOrName used a consuming trailing boundary `b(?:[^a-z0-9]|$)`,
so when two `<num>b` parameter tokens are separated by a single delimiter
("8b 70b", "8b-70b"), the first match ate the shared delimiter and the second
token's required leading boundary had nothing to match, silently skipping it —
returning the first (often smaller) size instead of the largest. Make the
trailing boundary a non-consuming lookahead.
Co-authored-by: ly-wang19 <ly-wang19@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(agents): run heartbeat_prompt_contribution on harness prompt builds
Harness runtimes (e.g. the Codex app-server) assemble the prompt through
resolveAgentHarnessBeforePromptBuildResult rather than the embedded runner's
resolvePromptBuildHookResult. The harness helper ran before_prompt_build and
before_agent_start but never invoked heartbeat_prompt_contribution, so that hook
silently no-ops on those runtimes: plugins that contribute heartbeat context via
the documented hook get nothing on heartbeat turns.
Invoke heartbeat_prompt_contribution from the harness helper too, gated on
ctx.trigger === "heartbeat", merging its prepend/append context ahead of the
before_prompt_build / before_agent_start contributions (matching the embedded
path's ordering). before_prompt_build appendContext is already honored here, so
no change is needed for boot-style append contributions.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(agents): preserve heartbeat hook ordering
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Vincent Koc <25068+vincentkoc@users.noreply.github.com>
* fix(exec): preserve turn-source routing target in approval followups for plugin channels
When an async exec approval is resolved and the originating session is
resumed, buildAgentFollowupArgs forwarded the turn-source to/accountId/threadId
only for built-in deliverable channels or gateway-internal channels. For an
external channel plugin whose channel is not in the in-process deliverable set,
the followup dispatched channel alone and dropped the recipient, so the resumed
agent reply routed to webchat instead of the originating channel.
Forward the turn-source routing fields whenever the resolved delivery target is
not used, matching how the channel itself is already preserved, so the gateway
can route the post-approval reply back to the originating channel.
Fixes#96103
* fix(exec): normalize followup thread routing
* fix(exec): normalize followup thread routing
---------
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
* fix(ports): route isPortBusy through checkPortInUse to catch IPv4-only occupants
* fix(ports): treat PortUsageStatus unknown as busy in isPortBusy
Per ClawSweeper review: checkPortInUse returns 'unknown' when every host
probe fails for a non-EADDRINUSE reason. Treating unknown as 'not busy'
could cause forceFreePortAndWait to exit before lsof/fuser inspects the
port. Conservative fix: only 'free' means not busy; everything else
(busy or unknown) triggers further inspection.
* fix(ports): reuse canonical multi-address probe
* fix(ports): reuse canonical multi-address probe
---------
Co-authored-by: Vincent Koc <25068+vincentkoc@users.noreply.github.com>
* fix(doctor): ignore skipped local embedding probe
* fix(doctor): keep skipped local model diagnostics
---------
Co-authored-by: Vincent Koc <25068+vincentkoc@users.noreply.github.com>