When a session's model comes from steering/fallback runtime fields
(entry.modelProvider/entry.model) rather than explicit override fields,
switching back to the default model via /model default would not set
liveModelSwitchPending. The isDefault branch in applyModelOverrideToSessionEntry
only sets selectionUpdated when it deletes override fields — but when no
override fields exist, selectionUpdated stays false, preventing the
liveModelSwitchPending flag from being set at the gate condition.
Fix: after the runtime alignment check, set selectionUpdated when
selection.isDefault and runtime fields are misaligned, so that
liveModelSwitchPending is properly set for the pending live switch.
Adds test coverage for this previously untested scenario.
Related to #96269
Co-authored-by: Claude <noreply@anthropic.com>
Summary:
- The PR changes the agent tools manager to treat spawned-but-nonzero fd/rg probes as missing and adds regression tests for non-zero and zero spawn status.
- PR surface: Source +3, Tests +27. Total +30 across 2 files.
- Reproducibility: yes. Current main ignores non-zero `spawnSync.status`, and a live Node probe confirms a spawned child can exit non-zero while leaving `error` unset.
Automerge notes:
- No ClawSweeper repair was needed after automerge opt-in.
Validation:
- ClawSweeper review passed for head 377d560eff.
- Required merge gates passed before the squash merge.
Prepared head SHA: 377d560eff
Review: https://github.com/openclaw/openclaw/pull/96361#issuecomment-4788071605
Co-authored-by: liyuanbin <li.yuanbin1@xydigit.com>
Co-authored-by: Claude <noreply@anthropic.com>
Approved-by: takhoffman
* perf(mcp): parallelize MCP server connections in getCatalog to reduce prep latency
Every agent request incurred 6-7s of prep latency because bundle-tools
connected to configured MCP servers sequentially, one at a time. With
4-5 MCP servers at ~1.5s each (default tools/list timeout), the total
was the sum of all servers' connection times.
Fix: split getCatalog() into two phases:
1. Synchronous pre-computation of safe server names (fast, sequential)
2. Async connection + tool listing (parallelized via Promise.allSettled)
Now MCP servers connect and list tools concurrently, reducing the total
latency from the sum of all servers to roughly the slowest single server.
Each server still has its own error handling — individual failures are
gracefully demoted to diagnostics, not fatal to the catalog.
Prep stage timing change:
Before: bundle-tools = sum(connection + listTools) for each server
After: bundle-tools = max(connection + listTools) across all servers
Closes#94162
Co-Authored-By: Claude <noreply@anthropic.com>
* fix(mcp): add missing braces for eslint curly rule
Two if-statements lacked braces, failing the CI check-lint job.
Co-Authored-By: Claude <noreply@anthropic.com>
* test(mcp): add deterministic regression test for parallel catalog loading
- Add focused timing test that proves parallel MCP catalog loading
completes in max(server delays) not sum(server delays)
- Test creates 3 slow stdio MCP servers (200/400/600ms delays) and
asserts wall time < sum(delays) to verify parallelism
- Would fail under the original sequential for-await loop
- Add standalone scripts/repro-94162-timing.mjs for documentation
Part of #94162
* fix(agents): bound MCP catalog fanout
* fix: harden bundle MCP catalog session lifecycle
---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: mmyzwl <mmyzwl@users.noreply.github.com>
Co-authored-by: Vincent Koc <25068+vincentkoc@users.noreply.github.com>
inferParamBFromIdOrName used a consuming trailing boundary `b(?:[^a-z0-9]|$)`,
so when two `<num>b` parameter tokens are separated by a single delimiter
("8b 70b", "8b-70b"), the first match ate the shared delimiter and the second
token's required leading boundary had nothing to match, silently skipping it —
returning the first (often smaller) size instead of the largest. Make the
trailing boundary a non-consuming lookahead.
Co-authored-by: ly-wang19 <ly-wang19@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(agents): run heartbeat_prompt_contribution on harness prompt builds
Harness runtimes (e.g. the Codex app-server) assemble the prompt through
resolveAgentHarnessBeforePromptBuildResult rather than the embedded runner's
resolvePromptBuildHookResult. The harness helper ran before_prompt_build and
before_agent_start but never invoked heartbeat_prompt_contribution, so that hook
silently no-ops on those runtimes: plugins that contribute heartbeat context via
the documented hook get nothing on heartbeat turns.
Invoke heartbeat_prompt_contribution from the harness helper too, gated on
ctx.trigger === "heartbeat", merging its prepend/append context ahead of the
before_prompt_build / before_agent_start contributions (matching the embedded
path's ordering). before_prompt_build appendContext is already honored here, so
no change is needed for boot-style append contributions.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(agents): preserve heartbeat hook ordering
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Vincent Koc <25068+vincentkoc@users.noreply.github.com>
* fix(exec): preserve turn-source routing target in approval followups for plugin channels
When an async exec approval is resolved and the originating session is
resumed, buildAgentFollowupArgs forwarded the turn-source to/accountId/threadId
only for built-in deliverable channels or gateway-internal channels. For an
external channel plugin whose channel is not in the in-process deliverable set,
the followup dispatched channel alone and dropped the recipient, so the resumed
agent reply routed to webchat instead of the originating channel.
Forward the turn-source routing fields whenever the resolved delivery target is
not used, matching how the channel itself is already preserved, so the gateway
can route the post-approval reply back to the originating channel.
Fixes#96103
* fix(exec): normalize followup thread routing
* fix(exec): normalize followup thread routing
---------
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
* fix(ports): route isPortBusy through checkPortInUse to catch IPv4-only occupants
* fix(ports): treat PortUsageStatus unknown as busy in isPortBusy
Per ClawSweeper review: checkPortInUse returns 'unknown' when every host
probe fails for a non-EADDRINUSE reason. Treating unknown as 'not busy'
could cause forceFreePortAndWait to exit before lsof/fuser inspects the
port. Conservative fix: only 'free' means not busy; everything else
(busy or unknown) triggers further inspection.
* fix(ports): reuse canonical multi-address probe
* fix(ports): reuse canonical multi-address probe
---------
Co-authored-by: Vincent Koc <25068+vincentkoc@users.noreply.github.com>
* fix(doctor): ignore skipped local embedding probe
* fix(doctor): keep skipped local model diagnostics
---------
Co-authored-by: Vincent Koc <25068+vincentkoc@users.noreply.github.com>
The install-resolution path (fetchClawHubSkillInstallResolution) still read
ClawHub JSON with an unbounded response.json(), the one ClawHub JSON reader
left uncapped by the prior hardening. Route it through the existing
parseClawHubJsonBody helper so every ClawHub JSON success/structured-block
body is bounded by the same 16 MiB cap and cancels the stream on overflow.
Pure reuse of the helper introduced in this PR (no new abstraction); adds a
regression test that an oversized install-resolution body is rejected and the
underlying stream is cancelled.
ClawHub is an external marketplace (untrusted source); fetchJson read the
success body via response.json() and readErrorBody read the error body via
response.text(), both without a byte cap, so a hostile or malfunctioning host
could exhaust memory with an unbounded response. Read both through the existing
read-response-with-limit helpers (16 MiB cap for JSON, 8 KiB / 400 chars for the
error snippet), cancelling the stream on overflow/idle. Symmetric counterpart to
the Anthropic error-stream hardening in #95108.