* fix: async transcript I/O to unblock gateway event loop
Two related fixes for event-loop starvation caused by synchronous file
operations on session transcript files during gateway hot paths.
## sessions.list: yield between transcript reads (#75330)
Extract filterAndSortSessionEntries() from listSessionsFromStore() and
add a new listSessionsFromStoreAsync() that yields to the event loop
via setImmediate every 10 session rows. The sessions.list RPC handler
now uses the async version.
The synchronous version is kept for callers that need it (sessions-
resolve visibility checks, embedded backends, subagent tools).
The dominant blocker is readSessionTitleFieldsFromTranscript(), which
performs fs.statSync + fs.openSync + fs.readSync (head) + fs.readSync
(tail) for every session row that requests derived titles or last-
message previews. With 100+ sessions, this blocks the event loop for
32-64 seconds, starving WebSocket heartbeats, channel I/O, and
concurrent RPC.
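A minimal sketch of the batching idea, assuming a generic per-row
callback. mapWithYield and yieldToEventLoop are hypothetical names for
illustration, not the actual listSessionsFromStoreAsync() code:

```typescript
// Illustrative only: helper names and the default batch size are assumptions.
const YIELD_EVERY = 10;

function yieldToEventLoop(): Promise<void> {
  // setImmediate defers the continuation until pending I/O callbacks have run.
  return new Promise<void>((resolve) => setImmediate(resolve));
}

async function mapWithYield<T, R>(
  rows: readonly T[],
  fn: (row: T) => R,
  yieldEvery: number = YIELD_EVERY,
): Promise<R[]> {
  const out: R[] = [];
  for (let i = 0; i < rows.length; i++) {
    // fn may still do synchronous work (e.g. a sync transcript read) per row.
    out.push(fn(rows[i]));
    // After each full batch, give heartbeats and other RPCs a chance to run.
    if ((i + 1) % yieldEvery === 0 && i + 1 < rows.length) {
      await yieldToEventLoop();
    }
  }
  return out;
}
```

Each row is still processed synchronously; the yield only bounds how
long the loop can hold the event loop between batches.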
## session compaction: async file copy (#75414)
Add captureCompactionCheckpointSnapshotAsync() using fs.promises for
stat, copyFile, and unlink instead of fsSync equivalents. Switch both
compact.ts and compact.queued.ts to the async version.
The synchronous copyFileSync of large transcript files (20MB+ observed
in production) was blocking the event loop for the entire copy duration
— one reporter measured a 43-minute event loop block from a single
compaction checkpoint capture.
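For illustration, a sketch of a non-blocking snapshot copy built on
fs.promises. copyTranscriptSnapshot is a hypothetical helper; the real
captureCompactionCheckpointSnapshotAsync() signature and cleanup policy
may differ:

```typescript
import { promises as fs } from "node:fs";

// Hypothetical helper: copies a transcript to a checkpoint path without
// blocking the event loop. Returns the source size, or null if it is missing.
async function copyTranscriptSnapshot(
  src: string,
  dest: string,
): Promise<number | null> {
  let size: number;
  try {
    size = (await fs.stat(src)).size;
  } catch {
    return null; // nothing to snapshot
  }
  try {
    await fs.copyFile(src, dest);
    return size;
  } catch (err) {
    // Best-effort removal of a partial copy; ignore failures here.
    await fs.unlink(dest).catch(() => undefined);
    throw err;
  }
}
```

The copy runs on the libuv thread pool, so a large transcript no
longer stalls timers or socket I/O while it is being written.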
Refs: #75330, #75414
* test: cover async transcript I/O responsiveness
* fix: avoid sync checkpoint metadata reads
Fix three child-process stdin write paths that let async EPIPE errors
escape to uncaughtException and crash the gateway.
extensions/imessage/src/client.ts (the actual #75438 crash path):
- Add child.stdin.on('error') listener in start() to catch async EPIPE
and reject all pending requests via failAll().
- Add write callback to request() stdin.write() that rejects the
specific pending request on error, instead of leaving it hanging
until timeout.
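A rough sketch of the two guards above, using a plain Writable in
place of child.stdin. RpcClient and its fields are hypothetical and
only mirror the described shape (an 'error' listener calling failAll(),
plus a per-request write callback):

```typescript
import { Writable } from "node:stream";

type Pending = { resolve: (value: unknown) => void; reject: (err: Error) => void };

// Illustrative client; not the actual extensions/imessage implementation.
class RpcClient {
  private pending = new Map<number, Pending>();
  private nextId = 1;

  constructor(private readonly stdin: Writable) {
    // Without a listener, an async EPIPE surfaces as an unhandled 'error'
    // event, which Node turns into an uncaughtException.
    stdin.on("error", (err: Error) => this.failAll(err));
  }

  request(method: string): Promise<unknown> {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.stdin.write(JSON.stringify({ id, method }) + "\n", (err) => {
        // Reject this specific request instead of waiting for a timeout.
        if (err && this.pending.delete(id)) reject(err);
      });
    });
  }

  private failAll(err: Error): void {
    for (const entry of this.pending.values()) entry.reject(err);
    this.pending.clear();
  }
}
```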
src/agents/mcp-stdio-transport.ts:
- Fix write callback race in send(): the promise was previously
resolved as soon as write() returned true, so an EPIPE delivered to
the write callback afterwards arrived once the promise was already
fulfilled and could not reject it. The promise is now always settled
from the write callback, so the outcome is known before resolving.
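The callback-only settlement can be sketched as follows. This send()
is a simplified stand-in that takes the stream as a parameter; the
real transport method will look different:

```typescript
import { Writable } from "node:stream";

// Illustrative: settle only from the write callback and ignore write()'s
// boolean return value, which signals backpressure, not success.
function send(stdin: Writable, message: unknown): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    stdin.write(JSON.stringify(message) + "\n", (err) => {
      if (err) reject(err);
      else resolve();
    });
  });
}
```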
src/process/exec.ts:
- Add stdin.on('error') before writing input so EPIPE from a
prematurely-exited child is swallowed — the process exit handler
reports the real status.
One reporter observed a gateway crash after 10.5 hours of stable
uptime — a single EPIPE on an iMessage RPC child process stdin write
killed the gateway with code 1.
Fixes: #75438
* fix(agents): trim trailing assistant turns and rewrite blank user messages in session repair
Session-file repair now:
- Trims trailing assistant messages so the JSONL never ends on
role=assistant, preventing the Anthropic 400 prefill-loop that
fires when thinking is enabled. (#75271)
- Rewrites blank-only user messages to a synthetic '(continue)'
placeholder instead of dropping them, so strict providers
(Qwen/mlx-vlm, Anthropic) no longer reject transcripts missing
a user turn. (#75313)
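As a sketch of the two repair rules, assuming a simplified entry shape
with string content. The Entry type and repairEntries are hypothetical;
the real repair operates on JSONL session records:

```typescript
// Simplified, assumed entry shape for illustration.
type Entry = { role: "user" | "assistant" | "system"; content: string };

function repairEntries(entries: readonly Entry[]): Entry[] {
  // Rule 1: keep blank-only user messages, but give them placeholder text.
  const repaired = entries.map((entry) =>
    entry.role === "user" && entry.content.trim() === ""
      ? { ...entry, content: "(continue)" }
      : entry,
  );
  // Rule 2: drop trailing assistant messages so the transcript
  // never ends on role=assistant.
  while (repaired.length > 0 && repaired[repaired.length - 1].role === "assistant") {
    repaired.pop();
  }
  return repaired;
}
```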
Closes #75271, closes #75313.
* refactor: clean up comments in session-file repair
* fix(agents): preserve trailing assistant tool-call turns during session trim
Mirror the outbound guard (stripTrailingAssistantPrefillTurns):
skip assistant entries containing toolCall/toolUse/functionCall
blocks so transcript repair can synthesize missing tool results.
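One possible reading of this guard, sketched with an assumed
content-block shape. The Turn and Block types are hypothetical, and
this version stops trimming at the first trailing assistant turn that
carries a tool call; the actual trim logic may differ:

```typescript
// Assumed shapes for illustration only.
type Block = { type: string };
type Turn = { role: "user" | "assistant"; content: Block[] };

const TOOL_CALL_TYPES = new Set(["toolCall", "toolUse", "functionCall"]);

function hasToolCall(turn: Turn): boolean {
  return turn.content.some((block) => TOOL_CALL_TYPES.has(block.type));
}

function trimTrailingAssistantTurns(turns: readonly Turn[]): Turn[] {
  let end = turns.length;
  while (end > 0) {
    const last = turns[end - 1];
    // Keep user turns, and keep assistant turns with tool calls so a later
    // repair step can synthesize the missing tool results.
    if (last.role !== "assistant" || hasToolCall(last)) break;
    end--;
  }
  return turns.slice(0, end);
}
```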
Addresses PR review feedback from clawsweeper on #75606.