#91499 auto-stamps the creator's tool surface as a default toolsAllow cap
on agentTurn cron payloads whenever the creating session is tool-restricted
(a narrowing allow-policy or an explicit deny). CLI backends cannot enforce
a runtime toolsAllow — cli-runner/prepare.ts rejects any defined allow-list
— so every scheduled agentTurn that resolves to a CLI backend (e.g.
claude-cli) fails to start. This silently broke per-thread scheduled
continuations on CLI backends.
A CLI backend is not a runtime tool-policy boundary: it runs with its own
configured tool set, as the operator, on the local machine, and refuses a
runtime allow-list outright. An inherited default cap is therefore
unenforceable on a CLI backend. Decide at run time, where the backend is
known:
- Flag the default. capCronAgentTurnToolsAllow stamps toolsAllowIsDefault
when it fills in the creator surface because the cron requested nothing
(or a bare "*"). An explicit narrowing or empty allow-list is a real
per-cron restriction and carries no flag.
- Drop only the default, only on CLI. The run-executor drops a flagged
default in the CLI branch and lets the run proceed. An explicit per-cron
restriction (no flag) is deliberately passed through, so prepare.ts still
fails it closed and surfaces that the requested policy needs an embedded
runtime. Embedded runs are untouched and keep the full cap enforced.
- Persist the flag. New nullable cron_jobs.payload_tools_allow_is_default
column (additive ensureColumn migration + codec read/write) so the
decision survives a gateway restart, plus toolsAllowIsDefault on the
gateway-protocol agentTurn payload schema — the stamped payload is
otherwise rejected by the contract's additionalProperties:false.
- Preserve the flag across updates. A no-toolsAllow update (reschedule,
prompt edit) no longer carries the stored default forward as a literal
value — that routed it through the explicit-narrowing branch, stripped the
flag, and re-broke the job on CLI after the next restart. The default is
re-derived (flag intact); an explicit restriction is still carried forward
unflagged.
Net policy: on CLI only the unenforceable inherited default is relaxed;
explicit per-cron restrictions still fail closed; embedded backends are
unchanged.
Tests: run-executor drops the flagged default but propagates an explicit
restriction on CLI; cron-tool stamps/clears the flag across create and
update and preserves it across a no-toolsAllow update; store round-trips the
flag (and its absence) through SQLite.
Not covered: agentTurn crons created during the regression window carry a
flagless toolsAllow and remain fail-closed on CLI until recreated or updated
with an explicit toolsAllow.
`stringifyNonErrorCause` is typed `string`, but its `try` returned
`JSON.stringify(value)`, which is `undefined` for functions, symbols, and
undefined causes — leaking undefined to callers that format nested ACP runtime
failures and expect a string. Fall back to a tag string when stringify yields
undefined, matching the already-correct sibling at `src/infra/errors.ts`.
Co-authored-by: ly-wang19 <ly-wang19@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PR #91742 wired memory_search's 15s deadline AbortSignal through the builtin
memory manager but missed the QMD backend behind the same
MemorySearchManager.search interface. With QMD, the tool returns "timed out
after 15s" to the agent while the spawned qmd query/search subprocess keeps
running for the full qmd command timeout (memory.qmd.limits.timeoutMs, whose
embed-heavy default was raised to 600s in #87572), leaving orphaned
embedding/search work running after the agent already moved on.
Add optional AbortSignal support to runCliCommand: an aborting signal kills the
spawned child immediately and rejects with the abort reason, funneled through a
single settle() guard so abort/timeout/error/close cannot double-settle. Thread
the search signal through QmdMemoryManager.search -> runQmdSearch -> runQmd ->
runCliCommand for the default direct-qmd subprocess path (including the query
fallback), and fast-fail search() when the signal is already aborted.
findCutPoint defaulted cutIndex to the earliest valid cut (cutPoints[0],
keep everything) and only moved it forward to a cut point at or after the
backward token cursor. When the final entry is a toolResult whose estimate
alone meets keepRecentTokens, the cursor stops at that trailing toolResult
index, no valid cut point sits at or after it (toolResult entries are not
valid cut points), and the default stuck at keep-everything. Compaction then
summarized zero messages, so preflight and overflow compaction silently
no-op and the session loops on a context it cannot shrink.
Default cutIndex to the most recent valid cut before the forward search.
When a cut point exists at or after the cursor the search still finds it and
behavior is unchanged; only the trailing-tool-result case now keeps the
recent tail and summarizes the prefix.
Persistent ACP threads died on the second turn for Kiro: when the backend
can no longer resume a stale session, acpx raises a SessionResumeRequiredError
whose reason text varies by backend ("Resource not found" for Claude,
"Internal error" / RequestError -32603 for Kiro). The recovery gate matched
the human reason text and required "resource not found", so Kiro's "Internal
error" never triggered the fresh-session retry and the thread produced no
reply (ACP_TURN_FAILED).
Recover by acpx's structured detail code instead of the reason text: acpx
tags every such failure with detailCode "SESSION_RESUME_REQUIRED"
(retryable), independent of wording. The two AcpRuntimeError construction
seams were discarding detailCode, so preserve it on AcpRuntimeError and match
it across the error and its cause chain. This fixes every backend's
resume-required failure and is more precise than the reason regex — a generic
"Internal error" without the code is still surfaced rather than silently
retried.
Fixes#87830. Reported by @chouzz.
normalizeAgentEventType checked the `phase:"end" || status==="completed"`
branch before the `failed/blocked` branch, but terminal tool/item events are
emitted with phase:"end" AND the real status, so failed and blocked tools were
normalized to tool.call.completed and the tool.call.failed branch was dead for
the item stream. SDK consumers filtering on tool.call.failed never saw tool
failures (they looked like successes). Reorder so failed/blocked is classified
before end/completed.
Co-authored-by: ly-wang19 <ly-wang19@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
estimateTokens charged 4800 chars per image in the toolResult branch but
counted only text in the user branch, so image blocks in recent user turns
scored zero. findCutPoint never reached keepRecentTokens and left the cut at
the earliest point, so image-heavy sessions compacted to a no-op and looped on
context overflow. Fold the per-image accounting into one shared helper used by
both branches.
SDK list helpers now send an empty params object when filters are omitted while preserving explicit invalid params for Gateway validation.\n\nVerification:\n- git diff --check origin/main...HEAD\n- node --check packages/sdk/src/client.ts\n- codex review --base origin/main\n- GitHub Actions CI release gate 27855603923 succeeded on 353f13c0d1
Distinguish validated gateway reachability from pre-open and TLS-validation failures, and sanitize close diagnostics before terminal output.
Fixes#79099.
Co-authored-by: xialonglee <li.xialong@xydigit.com>