* fix(agents): preserve native Anthropic tool IDs for hybrid providers
Fixes#66892
MiniMax and other hybrid providers use api.minimaxi.com/anthropic
(modelApi: anthropic-messages), which generates and expects native
Anthropic tool_call_ids in toolu_* format. The hybrid replay policy
(buildHybridAnthropicOrOpenAIReplayPolicy) applied strict
sanitization that stripped underscores from these IDs, causing
MiniMax to reject them with error 2013.
The native Anthropic provider already preserved these IDs via
preserveNativeAnthropicToolUseIds (added in 4613f121ad). This
commit enables the same flag for the hybrid anthropic-messages
branch, so toolu_* IDs pass through unsanitized while other
synthetic IDs still get strict cleanup.
* fix(agents): repair sanitized replay tool results before send
* fix: repair sanitized replay tool results before send (#67620) (thanks @stainlu)
* fix: preserve aborted-span tool results during replay sanitize (#67620) (thanks @stainlu)
---------
Co-authored-by: Ayaan Zaidi <hi@obviy.us>
* fix: tighten trusted tool media passthrough
* changelog: tighten trusted tool media passthrough (#67303)
* address review: thread rawToolName into emitToolResultOutput and keep plugin-tool media passthrough
- Pass rawToolName through emitToolResultOutput params so the emit and
collect calls no longer reference an out-of-scope identifier
(ReferenceError on any verbose tool-output path).
- Widen builtinToolNames to all effective tool raw names for this run
(core + bundled/trusted plugin tools), so plugin tools on the trusted
media list still receive local MEDIA: passthrough. Admission-time
client-tool conflict check keeps using the core-only set so unrelated
plugin names do not spuriously reject client definitions; MEDIA
passthrough is still gated by the raw-name set, so a client tool that
normalize-collides with a plugin name cannot inherit its media trust.
- Add unit coverage for bundled-plugin raw-name passthrough and for
case-variant plugin-name collisions.
* drop redundant String() casts flagged by oxlint no-useless-cast
The names from effectiveTools, client tool function names, and the
existingToolNames iterable are already typed as string, so wrapping them
in String(...) adds nothing and trips oxlint's no-useless-cast rule.
* fix(context-engine): pass deferred maintenance token budget
Thread tokenBudget through the after-turn runtime context so background context-engine maintenance reuses the real model context window instead of falling back to 128k. Also pass through a best-effort currentTokenCount from the latest call total and make the runtime context type explicit about both fields.
Regeneration-Prompt: |
OpenClaw already passed the real context token budget into direct context-engine calls like afterTurn and assemble, but deferred maintain() reused only the runtimeContext object and that object did not carry tokenBudget. Lossless Claw therefore fell back to 128k during background maintenance, which made budget-trigger fire much more aggressively than the live model context warranted. Thread the real contextTokenBudget into buildAfterTurnRuntimeContext so deferred maintenance receives the same budget, and pass a straightforward best-effort currentTokenCount from the latest call total while the relevant data is already in scope. Keep the change additive, update the runtime-context type, and cover the background maintenance/runtime-context behavior with focused tests.
* fix(context-engine): use prompt usage for deferred maintenance
Fixes#65465. Caps the compaction reserveTokensFloor so that at least min(8 000, 50%) of the context window remains available for
prompt content, preventing the default 20 000-token floor from exceeding the entire context window on small-context local models (e.g. Ollama
16K). The cap is only applied when contextTokenBudget is provided, preserving backward compatibility.
* fix: forward optional params dropped at the runEmbeddedAttempt call site
runEmbeddedPiAgent in pi-embedded-runner/run.ts hand-enumerates ~85 fields
when calling runEmbeddedAttempt({...}). Several optional fields on
RunEmbeddedPiAgentParams were added to the type and to attempt.ts (the
consumer) but were never wired at this specific call site. Because every
field is declared as ?: optional on EmbeddedRunAttemptParams, TypeScript
does not flag the missing fields and the attempt silently receives
undefined for each.
Four fields were affected:
- toolsAllow (#58504, #62569): cron's --tools allow-list. Persisted in
jobs.json by the CLI, forwarded by cron/isolated-agent/run-executor.ts
to runEmbeddedPiAgent, but dropped here. Result: provider request
ships the full tool catalog on every cron run regardless of toolsAllow,
defeating the ~95% input-token reduction documented in #58504 and the
--tools restriction documented in docs/automation/cron-jobs.md:85.
- disableMessageTool: cron/isolated-agent/run-executor.ts:164 sets it
from toolPolicy.disableMessageTool, derived at run.ts:110 as
`params.deliveryContract === "cron-owned" ? true : params.deliveryRequested`.
Every cron-owned delivery (the default per docs) is supposed to disable
the message tool so the runner owns the final delivery path. Without
forwarding, the agent can call messaging tools mid-cron and cause
duplicate or wrong-channel sends.
- requireExplicitMessageTarget: cron/isolated-agent/run-executor.ts:163
sets it from toolPolicy.requireExplicitMessageTarget. Has a fallback at
attempt.ts:568-569 to `?? isSubagentSessionKey(params.sessionKey)`, so
non-subagent crons silently get false instead of the intended value.
- internalEvents: agents/command/attempt-execution.ts:478 passes it via
params.opts.internalEvents. Different caller path from cron, but the
same drop point. Internal events array silently dropped before reaching
the consumer at attempt.ts:1480.
The fix is four lines in the runEmbeddedAttempt({...}) call, immediately
after the bootstrapContextMode/bootstrapContextRunKind lines added by
PR #62264 (which fixed two more fields with the identical pattern at the
same call site).
A regression test (run.attempt-param-forwarding.test.ts) covers all six
optional fields shown to have been bitten by this class of bug at this
seam. The next ?: optional field added to RunEmbeddedPiAgentParams without
wiring at the runEmbeddedAttempt call site will fail a test instead of
silently shipping broken — addressing the missing-guardrail concern PR
#60776's writeup explicitly noted.
Verified locally: 6/6 forwarding tests pass, 258 pi-embedded-runner/run*
tests pass, 176 cron/isolated-agent tests pass, oxlint and tsgo deltas
versus origin/main are zero.
Fixes#62569
* test: distill param forwarding guardrails
* fix: restore embedded-run param forwarding (#62675) (thanks @hexsprite)
---------
Co-authored-by: Ayaan Zaidi <hi@obviy.us>
* openclaw-11f.1: retry reasoning-only OpenAI turns
Regeneration-Prompt: |
Patch the embedded runner so a signed reasoning-only assistant turn with no user-visible text is treated as recoverable instead of silently ending the run. Keep the change focused on the active OpenAI GPT-style path, retry the turn with an explicit visible-answer continuation instruction, and fall back to the existing incomplete-turn error handling only after retries are exhausted. Add regression coverage for the helper classification and for the outer run loop retry behavior, and keep unrelated provider behavior unchanged.
* openclaw-11f.1: address reasoning-only review feedback
Regeneration-Prompt: |
Follow up on PR review feedback for the reasoning-only retry patch. Keep the fix narrow: move the retry limit into a named constant alongside the other retry-policy values, document why the limit is 2, and prevent reasoning-only auto-retries after any side effects so the runner falls back to the existing caution path instead of risking duplicate actions. Add regression coverage for the side-effect guard and the named limit behavior.
* openclaw-11f.1: drop local pebbles artifacts
Regeneration-Prompt: |
Remove accidentally committed local pebbles tracker artifacts from the PR branch without changing runtime code. Keep the cleanup limited to deleting the tracked .pebbles files from version control, and rely on local git excludes for future pebbles activity so these files stay out of diffs.
* openclaw-11f.1: tighten reasoning-only retry guards
Regeneration-Prompt: |
Follow up on the remaining review feedback for the reasoning-only retry path. Keep the fix narrow: do not auto-retry a reasoning-only turn when the assistant already terminated with stopReason error, and evaluate the OpenAI-specific retry guard against the provider/model metadata of the assistant turn that actually produced the partial output rather than the outer run configuration. Add regression coverage for both behaviors in the incomplete-turn runner tests.
* openclaw-11f.1: retry empty GPT turns once
Regeneration-Prompt: |
Extend the embedded runner's GPT-style incomplete-turn recovery with a separate generic empty-response retry path. Keep it narrower than the existing reasoning-only recovery: one retry only, replay-safe only, no side effects, no assistant error turns, and scoped to the active assistant provider/model metadata. Add explicit warning logs when the empty-response retry triggers and when its single retry budget is exhausted, and add regression coverage for the success and exhaustion cases without changing broader provider fallback behavior.
* openclaw-11f.1: harden reasoning-only retry completion checks
Regeneration-Prompt: |
Follow up on the remaining review feedback for the GPT-style recovery path. Keep the change narrow: only retry reasoning-only turns when there is no visible assistant answer yet, and if the reasoning-only retry budget is exhausted without any visible answer, surface the existing incomplete-turn error instead of treating reasoning-only payloads as a successful completion. Add focused regression coverage for both scenarios and preserve the adjacent empty-response retry behavior.
* openclaw-11f.1: preserve profile cooldown on retry exhaustion
Regeneration-Prompt: |
Follow up on the final review comment for the GPT-style recovery path. Keep the change narrow: when the reasoning-only retry budget is exhausted and the run returns the incomplete-turn error early, preserve the same auth-profile cooldown behavior that the normal incomplete-turn branch already applies so multi-profile failover continues to work consistently. Verify the touched runner suites still pass.
* fix: recover GPT-style empty turns
Regeneration-Prompt: |
Add the required changelog entry for the PR that hardens embedded GPT-style recovery of reasoning-only and empty-response turns. Keep the changelog update under ## Unreleased > ### Fixes, append-only, and include the PR number plus author attribution on the same line.
* improve trace raw diagnostics and command acks
* address trace review feedback
* avoid sync transcript reads in raw trace
* preserve raw cli output for trace
* gate trace emission at reply time
* reflect raw trace mode in status surfaces
* agents: auto-activate strict-agentic for GPT-5 and emit blocked-exit liveness
Closes two hard blockers on the GPT-5.4 parity completion gate:
1) Criterion 1 (no stalls after planning) is universal, but the pre-existing
strict-agentic execution contract was opt-in only. Out-of-the-box GPT-5
openai / openai-codex users who never set
`agents.defaults.embeddedPi.executionContract` still got only 1
planning-only retry and then fell through to the normal completion path
with the plan-only text, i.e. they still stalled.
Introduce `resolveEffectiveExecutionContract(...)` in
src/agents/execution-contract.ts. Behavior:
- supported provider/model (openai or openai-codex + gpt-5-family) AND
explicit "strict-agentic" or unspecified → "strict-agentic"
- supported provider/model AND explicit "default" → "default" (opt-out)
- unsupported provider/model → "default" regardless of explicit value
`isStrictAgenticExecutionContractActive` now delegates to the effective
resolver so the 2-retry + blocked-state treatment applies by default to
every GPT-5 openai/codex run. Explicit opt-out still works for users who
intentionally want the pre-parity-program behavior.
2) Criterion 4 (replay/liveness failures are explicit, not silent
disappearance) is violated by the strict-agentic blocked exit itself.
Every other terminal return path in src/agents/pi-embedded-runner/run.ts
sets `replayInvalid` + `livenessState` via `setTerminalLifecycleMeta`,
but the strict-agentic exit at run.ts:1615 falls through without them.
Add explicit `livenessState: "abandoned"` + `replayInvalid` (via the
shared `resolveReplayInvalidForAttempt` helper) to that exit, plus a
`setTerminalLifecycleMeta` call so downstream observers (lifecycle log,
ACP bridge, telemetry) see the same explicit terminal state they see on
every other exit branch.
Regressions added:
- `auto-enables update_plan for unconfigured GPT-5 openai runs`
- `respects explicit default contract opt-out on GPT-5 runs`
- `does not auto-enable update_plan for non-openai providers even when unconfigured`
- `emits explicit replayInvalid + abandoned liveness state at the strict-agentic blocked exit`
- `auto-activates strict-agentic for unconfigured GPT-5 openai runs and surfaces the blocked state`
- `respects explicit default contract opt-out on GPT-5 openai runs`
Local validation:
- pnpm test src/agents/openclaw-tools.update-plan.test.ts src/agents/pi-embedded-runner/run.incomplete-turn.test.ts src/agents/pi-embedded-runner.buildembeddedsandboxinfo.test.ts src/agents/system-prompt.test.ts src/agents/openclaw-tools.sessions.test.ts src/agents/pi-embedded-runner/run.overflow-compaction.test.ts
122/122 passing.
Refs #64227
* agents: address loop-6 review comments on strict-agentic contract
Triages all three loop-6 review comments on PR #64679:
1. Copilot: 'The strict-agentic blocked exit returns an error payload
(isError: true) but sets livenessState to "abandoned". Elsewhere in
the runner/lifecycle flow, error terminal states are treated as
"blocked".' Verified: every other hardcoded error terminal branch in
run.ts (role ordering at 1152, image size at 1206, schema error at
1244, compaction timeout at 1128, aborted-with-no-payloads at 606)
uses livenessState: "blocked". Match that convention at the
strict-agentic blocked exit at 1634. Updated the 'emits explicit
replayInvalid + abandoned liveness state' regression test to assert
the new "blocked" value and renamed the assertion commentary.
2. Copilot: 'The JSDoc for resolveEffectiveExecutionContract says
explicit "strict-agentic" in config always resolves to
"strict-agentic", but the implementation collapses to "default"
whenever the provider/mode is unsupported.' Rewrite the JSDoc to
explicitly document the unsupported-provider collapse as the lead
case (strict-agentic is a GPT-5-family openai/openai-codex-only
runtime contract) before listing the supported-lane behavior matrix.
No code change; this is a docstring-only clarification.
3. Greptile P2: 'Non-preferred Anthropic model constant. CLAUDE.md says
to prefer sonnet-4.6 for Anthropic test constants.' Swap
claude-opus-4-6 → claude-sonnet-4-6 in the two update_plan gating
fixtures that assert non-openai providers don't auto-enable the
planning tool. Behavior unchanged; model constant now matches repo
testing guidance.
Local validation:
- pnpm test src/agents/openclaw-tools.update-plan.test.ts src/agents/pi-embedded-runner/run.incomplete-turn.test.ts
29/29 passing.
Refs #64227
* test: rename strict-agentic blocked-exit liveness regression to match blocked state
Addresses loop-7 Copilot finding on PR #64679: loop 6 changed the
assertion to livenessState === 'blocked' to match the rest of the
hard-error terminal branches in run.ts, but the test title still said
'abandoned liveness state', which made failures and test output
misleading. Rename the test title to match the asserted value. No
code change beyond the it(...) title.
Validation: pnpm test src/agents/pi-embedded-runner/run.incomplete-turn.test.ts
(19/19 pass).
Refs #64227
* agents: widen strict-agentic auto-activation to handle prefixed and variant GPT-5 model ids
* Align strict-agentic retry matching
* runtime: harden strict-agentic model matching
---------
Co-authored-by: Eva <eva@100yen.org>