Mirror of https://github.com/openclaw/openclaw.git, synced 2026-05-06 10:50:44 +00:00
fix(gateway): prefer linux child OOM victims
Raise eligible Linux child processes' own oom_score_adj from a child-side /bin/sh exec shim so cgroup memory pressure prefers transient workers over the long-lived gateway. Cover supervisor children, PTY shells, MCP stdio servers, and OpenClaw-launched browser processes through the shared process runtime seam. Harden the wrapper for distroless images, shell startup env, per-child and process-level opt-outs, dash-compatible exec, and leading-dash command names. Document Linux verification and OOM behavior.

Fixes #70404.

Co-authored-by: Neerav Makwana <261249544+neeravmakwana@users.noreply.github.com>
@@ -1,2 +1,2 @@
e10f01ce10a381ecb098b805cee95b7278d16de42e02c7873f54448eb2b6c5cc plugin-sdk-api-baseline.json
918b646ff2e0849c4feba5ef930a08187a7bdad3a2d35ba4e1dd456fe3ea2cea plugin-sdk-api-baseline.jsonl
6297ca54fecbf277f3ed2e76410cc79aef95cf7dd887ab2383858a2132f81777 plugin-sdk-api-baseline.json
aa3343fda656a0034f9dd5ec7e28fcf45d49b15c1ed64329673ac1629285730c plugin-sdk-api-baseline.jsonl
@@ -3,6 +3,7 @@ summary: "Linux support + companion app status"
read_when:
- Looking for Linux companion app status
- Planning platform coverage or contributions
- Debugging Linux OOM kills or exit 137 on a VPS or container
title: "Linux App"
---

@@ -98,3 +99,39 @@ Enable it:
```
systemctl --user enable --now openclaw-gateway[-<profile>].service
```

## Memory pressure and OOM kills

On Linux, the kernel chooses an OOM victim when a host, VM, or container cgroup
runs out of memory. The Gateway can be a poor victim because it owns long-lived
sessions and channel connections. OpenClaw therefore biases transient child
processes to be killed before the Gateway when possible.

For eligible Linux child spawns, OpenClaw starts the child through a short
`/bin/sh` wrapper that raises the child's own `oom_score_adj` to `1000`, then
`exec`s the real command. This is an unprivileged operation because the child is
only increasing its own OOM kill likelihood.
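
The wrapper described above can be sketched roughly as follows. This is a minimal illustration, not OpenClaw's actual shim: the function name `run_as_oom_victim` is hypothetical, and the real wrapper also handles cases not shown here, such as leading-dash command names.

```shell
#!/bin/sh
# Hypothetical sketch: raise our own oom_score_adj, then exec the real command.
# Usage: run_as_oom_victim COMMAND [ARGS...]
run_as_oom_victim() {
  # Best effort: a process may raise its own score without privileges.
  # Errors (non-Linux host, read-only /proc) are ignored so the command still runs.
  { echo 1000 > /proc/self/oom_score_adj; } 2>/dev/null || true

  # Replace the shell with the real command so no wrapper process remains and
  # the command keeps the raised score.
  exec "$@"
}
```

Because `exec` replaces the calling shell, call the function from a subshell when experimenting, for example `( run_as_oom_victim sleep 60 ) &`.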

Covered child process surfaces include:

- supervisor-managed command children,
- PTY shell children,
- MCP stdio server children,
- OpenClaw-launched browser/Chrome processes.

The wrapper is Linux-only and is skipped when `/bin/sh` is unavailable. It is
also skipped if the child environment sets `OPENCLAW_CHILD_OOM_SCORE_ADJ` to
`0`, `false`, `no`, or `off`.
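
The opt-out check amounts to a small string match on the environment variable. The sketch below assumes exact, lowercase matches only; whether OpenClaw also trims whitespace or ignores case is not stated here, and the function name `oom_wrapper_disabled` is hypothetical.

```shell
# Hypothetical helper: succeeds (returns 0) when the opt-out variable is set
# to one of the documented "disable" values, and fails (returns 1) otherwise.
oom_wrapper_disabled() {
  case "${OPENCLAW_CHILD_OOM_SCORE_ADJ:-}" in
    0|false|no|off) return 0 ;;
    *) return 1 ;;
  esac
}
```

A launcher could consult it before deciding whether to wrap a command, for example `if oom_wrapper_disabled; then echo "wrapper skipped"; fi`.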

To verify a child process:

```bash
cat /proc/<child-pid>/oom_score_adj
```

The expected value for covered children is `1000`. The Gateway process should keep
its normal score, usually `0`.
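
To compare several processes at once, a small helper can print the score for a parent and its direct children. This sketch assumes `pgrep` (from procps) is installed and that you already know the Gateway PID; the function names are hypothetical.

```shell
# Print "PID SCORE" for a process, or "PID n/a" if /proc has no readable entry.
print_oom_score_adj() {
  pid=$1
  if [ -r "/proc/$pid/oom_score_adj" ]; then
    printf '%s %s\n' "$pid" "$(cat "/proc/$pid/oom_score_adj")"
  else
    printf '%s n/a\n' "$pid"
  fi
}

# Print the score for a parent PID, then for each of its direct children.
print_tree_oom_scores() {
  parent=$1
  print_oom_score_adj "$parent"
  for child in $(pgrep -P "$parent" 2>/dev/null); do
    print_oom_score_adj "$child"
  done
}
```

Running `print_tree_oom_scores <gateway-pid>` lists the Gateway first, followed by its direct children. Only direct children are listed, so deeper descendants need a separate call.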

This does not replace normal memory tuning. If a VPS or container repeatedly
kills children, increase the memory limit, reduce concurrency, or add stronger
resource controls such as systemd `MemoryMax=` or container-level memory limits.
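
As one illustration of a stronger resource control, a systemd drop-in can cap the memory of the Gateway unit's cgroup. The sketch below writes such a drop-in; the function name, the `memory.conf` file name, and the `2G` value in the usage line are placeholders, and the drop-in path in the usage line assumes a user unit under `~/.config/systemd/user`.

```shell
# Hypothetical helper: write a drop-in that sets MemoryMax= for a unit.
# Usage: write_memory_dropin DROPIN_DIR LIMIT
write_memory_dropin() {
  dropin_dir=$1
  limit=$2
  mkdir -p "$dropin_dir"
  printf '[Service]\nMemoryMax=%s\n' "$limit" > "$dropin_dir/memory.conf"
}
```

For a user unit named `openclaw-gateway.service`, this might be called as `write_memory_dropin ~/.config/systemd/user/openclaw-gateway.service.d 2G`, followed by `systemctl --user daemon-reload` and a restart of the unit. By default the limit applies to the unit's whole cgroup, so the Gateway and its children share it.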

@@ -114,3 +114,6 @@ If you deliberately installed a system unit instead, edit

How `Restart=` policies help automated recovery:
[systemd can automate service recovery](https://www.redhat.com/en/blog/systemd-automate-recovery).

For Linux OOM behavior, child process victim selection, and `exit 137`
diagnostics, see [Linux memory pressure and OOM kills](/platforms/linux#memory-pressure-and-oom-kills).
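
As background for `exit 137` diagnostics: shells report a process killed by a signal as 128 plus the signal number, so 137 corresponds to signal 9 (`SIGKILL`), which is the signal the kernel OOM killer sends. The helper below is an illustrative sketch with a hypothetical name. `SIGKILL` can also come from other sources, so a 137 status alone does not prove an OOM kill.

```shell
# Hypothetical helper: explain an exit status as reported by the shell.
describe_exit_status() {
  status=$1
  if [ "$status" -gt 128 ]; then
    # Statuses above 128 conventionally mean "terminated by signal (status - 128)".
    signal=$((status - 128))
    echo "killed by signal $signal ($(kill -l "$signal" 2>/dev/null))"
  else
    echo "exited with status $status"
  fi
}
```

For example, `sh -c 'kill -KILL $$'; describe_exit_status $?` reports signal 9. To confirm that a kill really came from the OOM killer, check the kernel log (for instance with `dmesg` or `journalctl -k`) for OOM messages around the same time.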