Show input-wait hints in process log/poll for idle interactive background sessions, keep list markers and structured stdin metadata, and document the recovery flow through log plus existing input actions.
Docs: updated docs/gateway/background-process.md.
Verification:
- pnpm test src/agents/bash-tools.test.ts
- pnpm test src/agents/bash-tools.process.input-hints.test.ts
- pnpm test src/agents/bash-tools.process.input-hints.test.ts src/agents/bash-tools.process.poll-timeout.test.ts src/agents/bash-tools.process.supervisor.test.ts src/agents/bash-tools.process-send-keys.test.ts
- pnpm check:docs
- git diff --check
- CI on 4aea1f11fe: check, check-additional, check-docs, checks-node-core, process/security relevant shards, real behavior proof passed
Fixes#33957.
Thanks @bitloi and @vincentkoc.
Co-authored-by: bitloi <89318445+bitloi@users.noreply.github.com>
Co-authored-by: bitloi <raphaelaloi.eth@gmail.com>
* fix(process): skip kill-tree group kill when child wasn't detached (#71662)
When the supervisor spawns a child with detached:false (service-managed
runtime under launchd/systemd), the child shares the gateway's process
group. On session abort or SIGKILL, killProcessTree was unconditionally
issuing process.kill(-pid, 'SIGTERM') — which targets the entire process
GROUP (negative pid is POSIX group-kill semantics) and therefore
SIGTERMs the gateway parent along with the child.
Reporter saw this on macOS (LaunchAgent + KeepAlive=true): aborting a
claude-cli/claude-opus-4-7 session caused the gateway to receive
SIGTERM, then auto-restart, dropping all in-flight sessions. Switching
the primary model to a non-cli provider eliminated it because the
non-cli paths don't go through this kill-tree call. Did not occur on
Linux VPS where the gateway runs detached, because there
useDetached === true and the child got its own process group.
Fix:
- killProcessTree now accepts opts.detached?: boolean. When detached:false,
killProcessTreeUnix skips the `-pid` group-kill and goes straight to
direct-pid SIGTERM/SIGKILL. Group-kill default (detached:true) is
preserved so all existing callers behave exactly as before.
- supervisor/adapters/child.ts:286 now threads the spawn-time `useDetached`
flag into killProcessTree, so the kill-tree path matches the spawn-time
detachment decision (line 45 of the same file already computes
useDetached = process.platform !== 'win32' && !isServiceManagedRuntime()).
Tests:
- new: detached:false skips group kill and uses direct pid SIGTERM only.
- new: default behaviour (detached:true) still uses group kill (regression
guard so the existing test case isn't accidentally weakened).
Existing tests still pass (6/6 in kill-tree.test.ts). Lint clean.
Out of scope: other killProcessTree callers (mcp-stdio-transport,
bash-tools.process, etc.) keep the default group-kill behaviour because
those processes are typically detached from the gateway. Only the
supervisor/adapters/child.ts path threads `detached` through, since it's
the path that knows whether the child was actually spawned detached.
* fixup(process): also gate kill-tree group-kill on the no-detach spawn fallback (#71662)
Greptile review on the original PR caught a P1 gap: when
spawnWithFallback's initial detached spawn fails and it retries with the
no-detach fallback (label: "no-detach", options.detached: false), the
child runs detached:false but my variable useDetached was still true.
The kill closure then passed `detached: useDetached` = true to
killProcessTree, which still group-killed the gateway — same bug, just
on the fallback path.
Compute the actual detachment as
`useDetached && !spawned.usedFallback` after spawn returns, and pass
that through. This closes the gap: the kill path now correctly skips
group-kill in BOTH:
1. Service-managed runtime (useDetached=false from the start, original case)
2. Detached-spawn fallback to no-detach (useDetached=true at intent
time but spawned.usedFallback=true)
Tests:
- existing 'uses process-tree kill for default SIGKILL' updated to
assert the new {detached} option is forwarded.
- new: passes detached:false to killProcessTree when spawn fell back.
- new: passes detached:false in service-managed mode (regression guard
for the original fix).
11/11 tests pass in child.test.ts. 6/6 in kill-tree.test.ts.
Raise eligible Linux child processes own oom_score_adj from a child-side /bin/sh exec shim so cgroup memory pressure prefers transient workers over the long-lived gateway. Cover supervisor children, PTY shells, MCP stdio servers, and OpenClaw-launched browser processes through the shared process runtime seam.
Harden the wrapper for distroless images, shell startup env, per-child and process-level opt-outs, dash-compatible exec, and leading-dash command names. Document Linux verification and OOM behavior.
Fixes#70404.
Co-authored-by: Neerav Makwana <261249544+neeravmakwana@users.noreply.github.com>