* feat(openai): add GPT-5.6 series support
* docs: refresh map for GPT-5.6
* fix(openai): preserve GPT-5.6 thinking metadata
* fix(codex): sync managed app server version
* fix(codex): sync managed app server version
* fix(openai): account for GPT-5.6 cache writes
---------
Co-authored-by: Peter Steinberger <steipete@golden-gate.local>
* fix(cron): clear agentTurn thinking override when patched with null
Cron agentTurn patches could clear model/fallbacks/toolsAllow overrides by
sending an explicit null, but thinking had no clear path: the patch schema and
normalizer dropped thinking:null before it reached the merge logic, and the
payload merge only handled string values. Blanking the Thinking/Effort field in
the Cron Control UI therefore silently preserved the old value.
Add thinking:null support across the patch schema, exported type, normalizer,
and payload merge (mirroring model). The Control UI now sends an explicit clear
for model/thinking when an edited job blanks a previously stored override, and
the CLI gains --clear-thinking for parity with --clear-model.
* docs(cron): document --clear-thinking beside sibling clear flags
* fix(llm): coerce stringified JSON arrays/objects in tool argument validation
When LLMs serialize array or object tool parameters as JSON strings
(e.g. tags: '["test","debug"]' instead of tags: ["test","debug"]),
validateToolArguments now attempts JSON.parse coercion before
rejecting the value. This mirrors the existing numeric string
coercion path and fixes MCP tool calls from providers like MiMo,
Ollama, and others that stringify complex parameters.
Fixes#96916
* fix(llm): bound JSON.parse size for schema-gated array/object coercion
Add MAX_JSON_COERCE_LENGTH (64KB) guard before JSON.parse in the array
and object coercion branches. Oversized stringified arguments are left
for normal validation to reject rather than synchronously parsed.
Addresses Codex review finding: unbounded JSON.parse on model-controlled
tool arguments could block the event loop or spike memory.
truncateLine could cut a surrogate pair when the maxChars boundary
fell between a high surrogate and its paired low surrogate, producing
a broken unpaired surrogate in grep tool output.
wrapNoteMessage measures every fit decision in visible columns, but its
splitLongWord fallback (for a single word longer than the line budget) sliced
the word into groups of maxLen code points. maxLen is a visible-column budget,
so a run of wide characters (CJK / fullwidth / emoji, 2 columns each) produced
lines up to twice the budget — e.g. a 24-char CJK word at maxWidth 20 emitted a
40-column line.
Accumulate grapheme visible width (the same unit visibleWidth uses) and start a
new segment when the next grapheme would exceed maxLen, so every wrapped segment
fits the budget; a single grapheme wider than the budget still emits, preserving
progress. ASCII wrapping is unchanged.
Co-authored-by: ly-wang19 <ly-wang19@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(validation): preserve null in anyOf unions instead of coercing to empty string
Fixes#96716
* fix(validation): preserve null in anyOf unions instead of coercing to empty string
* fix(validation): preserve null in anyOf unions instead of coercing to empty string
---------
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
* fix(markdown): a fenced-code line with trailing text is content, not a closing fence
scanFenceSpans accepted any line starting with >=3 matching fence markers as a
closing fence, ignoring trailing text after the marker. Per CommonMark a closing
fence may be followed only by whitespace, so a code-content line such as
"``` not a close" was wrongly treated as a close: the block ended early, the
following lines were reported as outside any fence, and the trailing marker line
became a new unclosed opener.
That made isSafeFenceBreak() return true for offsets inside the real code block
and findFenceSpanAt() return undefined, so chunkers (chunkMarkdownText, the
embedded-agent block chunker) could split inside a fenced code block — the exact
thing this module exists to prevent.
Require the closing fence's trailing text to be whitespace-only. Opening info
strings, bare closes, and longer same-marker closes are unaffected.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(markdown): honor fence suffix whitespace rules
Co-authored-by: ly-wang19 <ly-wang19@users.noreply.github.com>
---------
Co-authored-by: ly-wang19 <ly-wang19@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
#91499 auto-stamps the creator's tool surface as a default toolsAllow cap
on agentTurn cron payloads whenever the creating session is tool-restricted
(a narrowing allow-policy or an explicit deny). CLI backends cannot enforce
a runtime toolsAllow — cli-runner/prepare.ts rejects any defined allow-list
— so every scheduled agentTurn that resolves to a CLI backend (e.g.
claude-cli) fails to start. This silently broke per-thread scheduled
continuations on CLI backends.
A CLI backend is not a runtime tool-policy boundary: it runs with its own
configured tool set, as the operator, on the local machine, and refuses a
runtime allow-list outright. An inherited default cap is therefore
unenforceable on a CLI backend. Decide at run time, where the backend is
known:
- Flag the default. capCronAgentTurnToolsAllow stamps toolsAllowIsDefault
when it fills in the creator surface because the cron requested nothing
(or a bare "*"). An explicit narrowing or empty allow-list is a real
per-cron restriction and carries no flag.
- Drop only the default, only on CLI. The run-executor drops a flagged
default in the CLI branch and lets the run proceed. An explicit per-cron
restriction (no flag) is deliberately passed through, so prepare.ts still
fails it closed and surfaces that the requested policy needs an embedded
runtime. Embedded runs are untouched and keep the full cap enforced.
- Persist the flag. New nullable cron_jobs.payload_tools_allow_is_default
column (additive ensureColumn migration + codec read/write) so the
decision survives a gateway restart, plus toolsAllowIsDefault on the
gateway-protocol agentTurn payload schema — the stamped payload is
otherwise rejected by the contract's additionalProperties:false.
- Preserve the flag across updates. A no-toolsAllow update (reschedule,
prompt edit) no longer carries the stored default forward as a literal
value — that routed it through the explicit-narrowing branch, stripped the
flag, and re-broke the job on CLI after the next restart. The default is
re-derived (flag intact); an explicit restriction is still carried forward
unflagged.
Net policy: on CLI only the unenforceable inherited default is relaxed;
explicit per-cron restrictions still fail closed; embedded backends are
unchanged.
Tests: run-executor drops the flagged default but propagates an explicit
restriction on CLI; cron-tool stamps/clears the flag across create and
update and preserves it across a no-toolsAllow update; store round-trips the
flag (and its absence) through SQLite.
Not covered: agentTurn crons created during the regression window carry a
flagless toolsAllow and remain fail-closed on CLI until recreated or updated
with an explicit toolsAllow.
`stringifyNonErrorCause` is typed `string`, but its `try` returned
`JSON.stringify(value)`, which is `undefined` for functions, symbols, and
undefined causes — leaking undefined to callers that format nested ACP runtime
failures and expect a string. Fall back to a tag string when stringify yields
undefined, matching the already-correct sibling at `src/infra/errors.ts`.
Co-authored-by: ly-wang19 <ly-wang19@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PR #91742 wired memory_search's 15s deadline AbortSignal through the builtin
memory manager but missed the QMD backend behind the same
MemorySearchManager.search interface. With QMD, the tool returns "timed out
after 15s" to the agent while the spawned qmd query/search subprocess keeps
running for the full qmd command timeout (memory.qmd.limits.timeoutMs, whose
embed-heavy default was raised to 600s in #87572), leaving orphaned
embedding/search work running after the agent already moved on.
Add optional AbortSignal support to runCliCommand: an aborting signal kills the
spawned child immediately and rejects with the abort reason, funneled through a
single settle() guard so abort/timeout/error/close cannot double-settle. Thread
the search signal through QmdMemoryManager.search -> runQmdSearch -> runQmd ->
runCliCommand for the default direct-qmd subprocess path (including the query
fallback), and fast-fail search() when the signal is already aborted.
findCutPoint defaulted cutIndex to the earliest valid cut (cutPoints[0],
keep everything) and only moved it forward to a cut point at or after the
backward token cursor. When the final entry is a toolResult whose estimate
alone meets keepRecentTokens, the cursor stops at that trailing toolResult
index, no valid cut point sits at or after it (toolResult entries are not
valid cut points), and the default stuck at keep-everything. Compaction then
summarized zero messages, so preflight and overflow compaction silently
no-op and the session loops on a context it cannot shrink.
Default cutIndex to the most recent valid cut before the forward search.
When a cut point exists at or after the cursor the search still finds it and
behavior is unchanged; only the trailing-tool-result case now keeps the
recent tail and summarizes the prefix.
Persistent ACP threads died on the second turn for Kiro: when the backend
can no longer resume a stale session, acpx raises a SessionResumeRequiredError
whose reason text varies by backend ("Resource not found" for Claude,
"Internal error" / RequestError -32603 for Kiro). The recovery gate matched
the human reason text and required "resource not found", so Kiro's "Internal
error" never triggered the fresh-session retry and the thread produced no
reply (ACP_TURN_FAILED).
Recover by acpx's structured detail code instead of the reason text: acpx
tags every such failure with detailCode "SESSION_RESUME_REQUIRED"
(retryable), independent of wording. The two AcpRuntimeError construction
seams were discarding detailCode, so preserve it on AcpRuntimeError and match
it across the error and its cause chain. This fixes every backend's
resume-required failure and is more precise than the reason regex — a generic
"Internal error" without the code is still surfaced rather than silently
retried.
Fixes#87830. Reported by @chouzz.
normalizeAgentEventType checked the `phase:"end" || status==="completed"`
branch before the `failed/blocked` branch, but terminal tool/item events are
emitted with phase:"end" AND the real status, so failed and blocked tools were
normalized to tool.call.completed and the tool.call.failed branch was dead for
the item stream. SDK consumers filtering on tool.call.failed never saw tool
failures (they looked like successes). Reorder so failed/blocked is classified
before end/completed.
Co-authored-by: ly-wang19 <ly-wang19@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>