Fix three child-process stdin write paths that let async EPIPE errors
escape to uncaughtException and crash the gateway.
extensions/imessage/src/client.ts (the actual #75438 crash path):
- Add child.stdin.on('error') listener in start() to catch async EPIPE
and reject all pending requests via failAll().
- Add write callback to request() stdin.write() that rejects the
specific pending request on error, instead of leaving it hanging
until timeout.
src/agents/mcp-stdio-transport.ts:
- Fix write callback race in send(): previously resolved the promise
immediately when write() returned true, then the write callback with
EPIPE would fire after the promise was already fulfilled. Now always
settles the promise from the write callback so the outcome is known
before resolving.
src/process/exec.ts:
- Add stdin.on('error') before writing input so EPIPE from a
prematurely-exited child is swallowed — the process exit handler
reports the real status.
One reporter observed a gateway crash after 10.5 hours of stable
uptime — a single EPIPE on an iMessage RPC child process stdin write
killed the gateway with code 1.
Fixes: #75438
* fix(agents): trim trailing assistant turns and rewrite blank user messages in session repair
Session-file repair now:
- Trims trailing assistant messages so the JSONL never ends on
role=assistant, preventing the Anthropic 400 prefill-loop that
fires when thinking is enabled. (#75271)
- Rewrites blank-only user messages to a synthetic '(continue)'
placeholder instead of dropping them, so strict providers
(Qwen/mlx-vlm, Anthropic) no longer reject transcripts missing
a user turn. (#75313)
Closes#75271, closes#75313.
* refactor: clean up comments in session-file repair
* fix(agents): preserve trailing assistant tool-call turns during session trim
Mirror the outbound guard (stripTrailingAssistantPrefillTurns):
skip assistant entries containing toolCall/toolUse/functionCall
blocks so transcript repair can synthesize missing tool results.
Addresses PR review feedback from clawsweeper on #75606.
Summary:
- The PR updates the shared status reaction controller to track active remove-capable reactions, defer cleanup until clear/restoreInitial, adjust controller and Slack lifecycle tests, add a changelog entry, and carries qrcode runtime-dependency mirror hunks from its older base.
ClawSweeper fixups:
- Included follow-up commit: fix: limit status reaction restore cleanup
- Included follow-up commit: chore: merge main into status reaction cleanup
- Included follow-up commit: fix: mirror qrcode runtime dependency
Validation:
- ClawSweeper review passed for head f3efcb4fd3.
- Required merge gates passed before the squash merge.
Prepared head SHA: f3efcb4fd3
Review: https://github.com/openclaw/openclaw/pull/75582#issuecomment-4358876584
Co-authored-by: Peter Steinberger <steipete@steipete-macstudio.local>
Co-authored-by: Peter Steinberger <steipete@gmail.com>