feat(codex): add guardian app-server mode (#70090)

Reworks the Codex app-server Guardian change into the final landing shape: - keep YOLO as the default local app-server mode - add explicit `appServer.mode: "guardian"` - remove the legacy `OPENCLAW_CODEX_APP_SERVER_GUARDIAN` shortcut - document Guardian configuration and behavior - add Guardian event projection and Docker live probes for approved/ask-back decisions Co-authored-by: pashpashpash <nik@vault77.ai>
2026-05-06 09:10:45 +00:00 · 2026-04-22 16:25:43 -07:00
parent 34e45ecfcc
commit ff02563c7c
15 changed files with 482 additions and 38 deletions
--- a/docs/help/testing.md
+++ b/docs/help/testing.md
@@ -608,11 +608,15 @@ Docker notes:
    thread can resume
  - run `/codex status` and `/codex models` through the same gateway command
    path
+  - optionally run two Guardian-reviewed escalated shell probes: one benign
+    command that should be approved and one fake-secret upload that should be
+    denied so the agent asks back
 - Test: `src/gateway/gateway-codex-harness.live.test.ts`
 - Enable: `OPENCLAW_LIVE_CODEX_HARNESS=1`
 - Default model: `codex/gpt-5.4`
 - Optional image probe: `OPENCLAW_LIVE_CODEX_HARNESS_IMAGE_PROBE=1`
 - Optional MCP/tool probe: `OPENCLAW_LIVE_CODEX_HARNESS_MCP_PROBE=1`
+- Optional Guardian probe: `OPENCLAW_LIVE_CODEX_HARNESS_GUARDIAN_PROBE=1`
 - The smoke sets `OPENCLAW_AGENT_HARNESS_FALLBACK=none` so a broken Codex
  harness cannot pass by silently falling back to PI.
 - Auth: `OPENAI_API_KEY` from the shell/profile, plus optional copied
@@ -625,6 +629,7 @@ source ~/.profile
 OPENCLAW_LIVE_CODEX_HARNESS=1 \
  OPENCLAW_LIVE_CODEX_HARNESS_IMAGE_PROBE=1 \
  OPENCLAW_LIVE_CODEX_HARNESS_MCP_PROBE=1 \
+  OPENCLAW_LIVE_CODEX_HARNESS_GUARDIAN_PROBE=1 \
  OPENCLAW_LIVE_CODEX_HARNESS_MODEL=codex/gpt-5.4 \
  pnpm test:live -- src/gateway/gateway-codex-harness.live.test.ts
 ```
@@ -642,9 +647,11 @@ Docker notes:
 - It sources the mounted `~/.profile`, passes `OPENAI_API_KEY`, copies Codex CLI
  auth files when present, installs `@openai/codex` into a writable mounted npm
  prefix, stages the source tree, then runs only the Codex-harness live test.
- Docker enables the image and MCP/tool probes by default. Set
+- Docker enables the image, MCP/tool, and Guardian probes by default. Set
  `OPENCLAW_LIVE_CODEX_HARNESS_IMAGE_PROBE=0` or
-  `OPENCLAW_LIVE_CODEX_HARNESS_MCP_PROBE=0` when you need a narrower debug run.
+  `OPENCLAW_LIVE_CODEX_HARNESS_MCP_PROBE=0` or
+  `OPENCLAW_LIVE_CODEX_HARNESS_GUARDIAN_PROBE=0` when you need a narrower debug
+  run.
 - Docker also exports `OPENCLAW_AGENT_HARNESS_FALLBACK=none`, matching the live
  test config so `openai-codex/*` or PI fallback cannot hide a Codex harness
  regression.
--- a/docs/plugins/codex-harness.md
+++ b/docs/plugins/codex-harness.md
@@ -271,12 +271,14 @@ By default, the plugin starts Codex locally with:
 codex app-server --listen stdio://
 ```

-By default, OpenClaw starts local Codex harness sessions fully unchained:
-`approvalPolicy: "never"` and `sandbox: "danger-full-access"`. That matches the
-trusted local operator posture used by the Codex CLI and lets autonomous
-heartbeats use network and shell tools without waiting on an invisible native
-approval path. You can tighten that policy, for example by routing reviews
-through the guardian:
+By default, OpenClaw starts local Codex harness sessions in YOLO mode:
+`approvalPolicy: "never"`, `approvalsReviewer: "user"`, and
+`sandbox: "danger-full-access"`. This is the trusted local operator posture used
+for autonomous heartbeats: Codex can use shell and network tools without
+stopping on native approval prompts that nobody is around to answer.
+
+To opt in to Codex guardian-reviewed approvals, set `appServer.mode:
+"guardian"`:

 ```json5
 {
@@ -286,9 +288,7 @@ through the guardian:
        enabled: true,
        config: {
          appServer: {
-            approvalPolicy: "untrusted",
-            approvalsReviewer: "guardian_subagent",
-            sandbox: "workspace-write",
+            mode: "guardian",
            serviceTier: "priority",
          },
        },
@@ -298,6 +298,45 @@ through the guardian:
 }
 ```

+Guardian mode expands to:
+
+```json5
+{
+  plugins: {
+    entries: {
+      codex: {
+        enabled: true,
+        config: {
+          appServer: {
+            mode: "guardian",
+            approvalPolicy: "on-request",
+            approvalsReviewer: "guardian_subagent",
+            sandbox: "workspace-write",
+          },
+        },
+      },
+    },
+  },
+}
+```
+
+Guardian is a native Codex approval reviewer. When Codex asks to leave the
+sandbox, write outside the workspace, or add permissions such as network access,
+Codex routes that approval request to a reviewer subagent instead of a human
+prompt. The reviewer gathers context and applies Codex's risk framework, then
+approves or denies the specific request. Guardian is useful when you want more
+guardrails than YOLO mode but still need unattended agents and heartbeats to
+make progress.
+
+The Docker live harness includes a Guardian probe when
+`OPENCLAW_LIVE_CODEX_HARNESS_GUARDIAN_PROBE=1`. It starts the Codex harness in
+Guardian mode, verifies that a benign escalated shell command is approved, and
+verifies that a fake-secret upload to an untrusted external destination is
+denied so the agent asks back for explicit approval.
+
+The individual policy fields still win over `mode`, so advanced deployments can
+mix the preset with explicit choices.
+
 For an already-running app-server, use WebSocket transport:

 ```json5
@@ -322,30 +361,35 @@ For an already-running app-server, use WebSocket transport:

 Supported `appServer` fields:

-| Field               | Default                                  | Meaning                                                                  |
-| ------------------- | ---------------------------------------- | ------------------------------------------------------------------------ |
-| `transport`         | `"stdio"`                                | `"stdio"` spawns Codex; `"websocket"` connects to `url`.                 |
-| `command`           | `"codex"`                                | Executable for stdio transport.                                          |
-| `args`              | `["app-server", "--listen", "stdio://"]` | Arguments for stdio transport.                                           |
-| `url`               | unset                                    | WebSocket app-server URL.                                                |
-| `authToken`         | unset                                    | Bearer token for WebSocket transport.                                    |
-| `headers`           | `{}`                                     | Extra WebSocket headers.                                                 |
-| `requestTimeoutMs`  | `60000`                                  | Timeout for app-server control-plane calls.                              |
-| `approvalPolicy`    | `"never"`                                | Native Codex approval policy sent to thread start/resume/turn.           |
-| `sandbox`           | `"danger-full-access"`                   | Native Codex sandbox mode sent to thread start/resume.                   |
-| `approvalsReviewer` | `"user"`                                 | Use `"guardian_subagent"` to let Codex guardian review native approvals. |
-| `serviceTier`       | unset                                    | Optional Codex service tier, for example `"priority"`.                   |
+| Field               | Default                                  | Meaning                                                         |
+| ------------------- | ---------------------------------------- | --------------------------------------------------------------- |
+| `transport`         | `"stdio"`                                | `"stdio"` spawns Codex; `"websocket"` connects to `url`.        |
+| `command`           | `"codex"`                                | Executable for stdio transport.                                 |
+| `args`              | `["app-server", "--listen", "stdio://"]` | Arguments for stdio transport.                                  |
+| `url`               | unset                                    | WebSocket app-server URL.                                       |
+| `authToken`         | unset                                    | Bearer token for WebSocket transport.                           |
+| `headers`           | `{}`                                     | Extra WebSocket headers.                                        |
+| `requestTimeoutMs`  | `60000`                                  | Timeout for app-server control-plane calls.                     |
+| `mode`              | `"yolo"`                                 | Preset for YOLO or guardian-reviewed execution.                 |
+| `approvalPolicy`    | `"never"`                                | Native Codex approval policy sent to thread start/resume/turn.  |
+| `sandbox`           | `"danger-full-access"`                   | Native Codex sandbox mode sent to thread start/resume.          |
+| `approvalsReviewer` | `"user"`                                 | Use `"guardian_subagent"` to let Codex Guardian review prompts. |
+| `serviceTier`       | unset                                    | Optional Codex service tier, for example `"priority"`.          |

 The older environment variables still work as fallbacks for local testing when
 the matching config field is unset:

 - `OPENCLAW_CODEX_APP_SERVER_BIN`
 - `OPENCLAW_CODEX_APP_SERVER_ARGS`
+- `OPENCLAW_CODEX_APP_SERVER_MODE=yolo|guardian`
 - `OPENCLAW_CODEX_APP_SERVER_APPROVAL_POLICY`
 - `OPENCLAW_CODEX_APP_SERVER_SANDBOX`
- `OPENCLAW_CODEX_APP_SERVER_GUARDIAN=1`

-Config is preferred for repeatable deployments.
+`OPENCLAW_CODEX_APP_SERVER_GUARDIAN=1` was removed. Use
+`plugins.entries.codex.config.appServer.mode: "guardian"` instead, or
+`OPENCLAW_CODEX_APP_SERVER_MODE=guardian` for one-off local testing. Config is
+preferred for repeatable deployments because it keeps the plugin behavior in the
+same reviewed file as the rest of the Codex harness setup.

 ## Common recipes

@@ -390,6 +434,7 @@ Guardian-reviewed Codex approvals:
        enabled: true,
        config: {
          appServer: {
+            mode: "guardian",
            approvalPolicy: "on-request",
            approvalsReviewer: "guardian_subagent",
            sandbox: "workspace-write",