Commit Graph

75 Commits

Author SHA1 Message Date
Vincent Koc
2252cf6f03 fix(supervisor): bound captured process output 2026-05-28 13:43:36 +02:00
WhatsSkiLL
b13166bc0c fix: gracefully escalate process supervisor cancellations (#85865)
* fix: gracefully escalate supervisor cancellations

* fix: preserve process-tree cancellation during grace

* fix: satisfy signal monitor allSettled lint

* fix(process): split graceful cancel signal escalation

---------

Co-authored-by: JARVIS-Glasses <284122573+JARVIS-Glasses@users.noreply.github.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
2026-05-24 03:35:37 +01:00
Peter Steinberger
c69b58157f test: guard process child adapter calls 2026-05-11 22:41:27 +01:00
Peter Steinberger
ca6a513c55 test: guard process queue calls 2026-05-11 22:40:24 +01:00
Peter Steinberger
70a0236ccc test: guard process spawn calls 2026-05-11 22:36:06 +01:00
bitloi
ed6b030a43 feat(process): show input-wait hints in log and poll
Show input-wait hints in process log/poll for idle interactive background sessions, keep list markers and structured stdin metadata, and document the recovery flow through log plus existing input actions.

Docs: updated docs/gateway/background-process.md.

Verification:
- pnpm test src/agents/bash-tools.test.ts
- pnpm test src/agents/bash-tools.process.input-hints.test.ts
- pnpm test src/agents/bash-tools.process.input-hints.test.ts src/agents/bash-tools.process.poll-timeout.test.ts src/agents/bash-tools.process.supervisor.test.ts src/agents/bash-tools.process-send-keys.test.ts
- pnpm check:docs
- git diff --check
- CI on 4aea1f11fe: check, check-additional, check-docs, checks-node-core, process/security relevant shards, real behavior proof passed

Fixes #33957.
Thanks @bitloi and @vincentkoc.

Co-authored-by: bitloi <89318445+bitloi@users.noreply.github.com>
Co-authored-by: bitloi <raphaelaloi.eth@gmail.com>
2026-05-10 04:13:07 -04:00
Peter Steinberger
b447d30349 test: tighten process assertions 2026-05-09 12:28:59 +01:00
Shakker
8cd978c02a test: tighten core empty array assertions 2026-05-09 05:12:12 +01:00
Peter Steinberger
03e7fcfcc8 test: simplify supervisor adapter fixture 2026-05-08 20:13:35 +01:00
Peter Steinberger
03ac05a3cd test: tighten core helper assertions 2026-05-08 16:48:41 +01:00
Peter Steinberger
5c589673ec test: clarify loose boolean assertions 2026-05-08 14:00:34 +01:00
Peter Steinberger
9ef37d1907 test: tighten assertions and harness coverage 2026-05-08 05:28:12 +01:00
Vincent Koc
e6d2c9b080 fix(process): decode Windows command output with console codepage awareness (#72393)
* fix(process): decode Windows command output with console codepage awareness

* fix(clownfish): address review for ghcrawl-199248-agentic-merge (1)
2026-04-26 23:10:59 -07:00
hcl
4a72e1b990 fix(process): skip kill-tree group kill when child wasn't detached (#71662) (#71681)
* fix(process): skip kill-tree group kill when child wasn't detached (#71662)

When the supervisor spawns a child with detached:false (service-managed
runtime under launchd/systemd), the child shares the gateway's process
group. On session abort or SIGKILL, killProcessTree was unconditionally
issuing process.kill(-pid, 'SIGTERM') — which targets the entire process
GROUP (negative pid is POSIX group-kill semantics) and therefore
SIGTERMs the gateway parent along with the child.

Reporter saw this on macOS (LaunchAgent + KeepAlive=true): aborting a
claude-cli/claude-opus-4-7 session caused the gateway to receive
SIGTERM, then auto-restart, dropping all in-flight sessions. Switching
the primary model to a non-cli provider eliminated it because the
non-cli paths don't go through this kill-tree call. Did not occur on
Linux VPS where the gateway runs detached, because there
useDetached === true and the child got its own process group.

Fix:
- killProcessTree now accepts opts.detached?: boolean. When detached:false,
  killProcessTreeUnix skips the `-pid` group-kill and goes straight to
  direct-pid SIGTERM/SIGKILL. Group-kill default (detached:true) is
  preserved so all existing callers behave exactly as before.
- supervisor/adapters/child.ts:286 now threads the spawn-time `useDetached`
  flag into killProcessTree, so the kill-tree path matches the spawn-time
  detachment decision (line 45 of the same file already computes
  useDetached = process.platform !== 'win32' && !isServiceManagedRuntime()).

Tests:
- new: detached:false skips group kill and uses direct pid SIGTERM only.
- new: default behaviour (detached:true) still uses group kill (regression
  guard so the existing test case isn't accidentally weakened).

Existing tests still pass (6/6 in kill-tree.test.ts). Lint clean.

Out of scope: other killProcessTree callers (mcp-stdio-transport,
bash-tools.process, etc.) keep the default group-kill behaviour because
those processes are typically detached from the gateway. Only the
supervisor/adapters/child.ts path threads `detached` through, since it's
the path that knows whether the child was actually spawned detached.

* fixup(process): also gate kill-tree group-kill on the no-detach spawn fallback (#71662)

Greptile review on the original PR caught a P1 gap: when
spawnWithFallback's initial detached spawn fails and it retries with the
no-detach fallback (label: "no-detach", options.detached: false), the
child runs detached:false but my variable useDetached was still true.
The kill closure then passed `detached: useDetached` = true to
killProcessTree, which still group-killed the gateway — same bug, just
on the fallback path.

Compute the actual detachment as
`useDetached && !spawned.usedFallback` after spawn returns, and pass
that through. This closes the gap: the kill path now correctly skips
group-kill in BOTH:
1. Service-managed runtime (useDetached=false from the start, original case)
2. Detached-spawn fallback to no-detach (useDetached=true at intent
   time but spawned.usedFallback=true)

Tests:
- existing 'uses process-tree kill for default SIGKILL' updated to
  assert the new {detached} option is forwarded.
- new: passes detached:false to killProcessTree when spawn fell back.
- new: passes detached:false in service-managed mode (regression guard
  for the original fix).

11/11 tests pass in child.test.ts. 6/6 in kill-tree.test.ts.
2026-04-25 17:08:53 -04:00
Peter Steinberger
01bc49c88c test: move pty cleanup coverage to adapter 2026-04-24 11:09:55 +01:00
Peter Steinberger
cc9dcd3d69 fix(gateway): prefer linux child OOM victims
Raise eligible Linux child processes own oom_score_adj from a child-side /bin/sh exec shim so cgroup memory pressure prefers transient workers over the long-lived gateway. Cover supervisor children, PTY shells, MCP stdio servers, and OpenClaw-launched browser processes through the shared process runtime seam.

Harden the wrapper for distroless images, shell startup env, per-child and process-level opt-outs, dash-compatible exec, and leading-dash command names. Document Linux verification and OOM behavior.

Fixes #70404.

Co-authored-by: Neerav Makwana <261249544+neeravmakwana@users.noreply.github.com>
2026-04-23 05:23:40 +01:00
Peter Steinberger
0195da6b0e refactor: cache optional runtime imports 2026-04-18 20:45:26 +01:00
Peter Steinberger
4fa961d4f1 refactor(lint): enable map spread rule 2026-04-18 20:37:12 +01:00
Peter Steinberger
c035c5c0d2 refactor: cache lazy runtime imports 2026-04-18 16:18:26 +01:00
Vincent Koc
e1e20c424b test(process): share supervisor sigkill wait assertions 2026-04-12 04:52:29 +01:00
Peter Steinberger
ebfd468ee0 refactor: simplify typed conversions 2026-04-11 01:01:30 +01:00
Vincent Koc
78d2e9e2a8 fix(ci): repair main type drift 2026-04-10 08:13:02 +01:00
Ayaan Zaidi
c003e982a2 fix(process): drain Windows stdio before exit fallback settle 2026-04-10 10:09:25 +05:30
Ayaan Zaidi
063049c0d4 fix(process): wait for close after Windows exit fallback 2026-04-10 10:09:25 +05:30
Ayaan Zaidi
4b6b1a3ed3 fix(process): settle Windows supervisor waits from exit state 2026-04-10 10:09:25 +05:30
Peter Steinberger
371c4147f3 fix: restore ci after rebase drift 2026-04-07 07:36:11 +01:00
Peter Steinberger
0a6fd459f9 refactor: dedupe channel and cli readers 2026-04-07 07:36:11 +01:00
Peter Steinberger
6b6ddcd2a6 test: speed up core runtime suites 2026-03-31 02:25:02 +01:00
Shakker
81e65e119f test: mock supervisor timeout flows 2026-03-31 01:40:55 +01:00
Gustavo Madeira Santana
6a37ecad82 Supervisor: unblock waits after forced child kill 2026-03-30 00:45:22 -04:00
Peter Steinberger
351a931a62 fix(ci): restore runtime-api guardrails 2026-03-27 15:56:54 +00:00
Peter Steinberger
2383107711 fix: unblock supervisor and memory gate failures 2026-03-24 18:40:46 +00:00
Peter Steinberger
e7817ad12a test: continue vitest threads migration 2026-03-24 08:37:00 +00:00
Peter Steinberger
5fb7a1363f fix: stabilize full gate 2026-03-17 07:06:25 +00:00
Peter Steinberger
344b2286aa refactor: share windows command shim resolution 2026-03-10 22:18:04 +00:00
yuweuii
6c9b49a10b fix(sessions): clear stale contextTokens on model switch (#38044)
Merged via squash.

Prepared head SHA: bac2df4b7f
Co-authored-by: yuweuii <82372187+yuweuii@users.noreply.github.com>
Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com>
Reviewed-by: @jalehman
2026-03-08 10:59:16 -07:00
Peter Steinberger
e20f445099 fix(supervisor): keep service-managed children attached (#38463, thanks @spirittechie)
Co-authored-by: Jesse Paul <drzin69@gmail.com>
2026-03-07 21:36:24 +00:00
Peter Steinberger
09748ab109 test(perf): speed up supervisor and exec process tests 2026-03-02 13:53:10 +00:00
Peter Steinberger
d9ff3bf1af test(perf): tighten process exec and supervisor timing fixtures 2026-03-02 11:56:57 +00:00
Peter Steinberger
bff785aecc test(perf): tighten process test timeouts and fs setup 2026-03-02 11:16:24 +00:00
Peter Steinberger
8a1465c314 test(perf): trim timer-heavy suites and guardrail scanning 2026-03-02 10:28:39 +00:00
Peter Steinberger
04030ddf68 test(runtime): trim timer-heavy regression suites 2026-03-02 09:47:29 +00:00
Peter Steinberger
3b5a276a48 test: speed up supervisor test timing 2026-02-23 20:15:56 +00:00
Peter Steinberger
fe62711342 test(gate): stabilize env- and timing-sensitive process/web-search checks 2026-02-23 19:19:58 +00:00
Peter Steinberger
34ea33f057 refactor: dedupe core config and runtime helpers 2026-02-22 17:11:54 +00:00
Peter Steinberger
d01cc69ef0 test: tighten process timeout fixtures 2026-02-22 17:06:35 +00:00
Peter Steinberger
d116bcfb14 refactor(runtime): consolidate followup, gateway, and provider dedupe paths 2026-02-22 14:08:51 +00:00
Peter Steinberger
00eb2541dc test: shorten idle child timers in timeout assertions 2026-02-22 12:37:49 +00:00
Peter Steinberger
d6d73d0ed9 test(core): trim redundant test resets and use mockClear 2026-02-22 08:12:55 +00:00
Peter Steinberger
1f0695ba47 test(core): use lightweight clears in update, child adapter, and copilot token setup 2026-02-22 08:01:16 +00:00