docs(tools): rewrite loop detection, code execution, and tighten elevated/skills

Loop detection (docs/tools/loop-detection.md): substantial rewrite. Fixed the post-compaction guard default story: the guard runs whenever tools.loopDetection.enabled is not explicitly false, even with no config block at all (verified in src/agents/pi-embedded-runner/run.ts near line 800: 'enabled: resolvedLoopDetectionConfig?.enabled !== false'). The previous doc framed it as opt-in. Added the missing unknownToolThreshold field (default 10) sourced from src/config/schema.help.ts, a complete fields table, and a CardGroup related links section.

Code execution (docs/tools/code-execution.md): rewrote with Steps-driven setup, code-verified defaults from extensions/xai/src/code-execution-shared.ts (default model grok-4-1-fast, default timeout 30 s, optional maxTurns), the missing_xai_api_key structured error documented as JSON, and a properties summary table. Replaced the trailing bullet list with a CardGroup pointing at exec, exec-approvals, web tools, and the xAI provider page.

Elevated (docs/tools/elevated.md): converted Related to a CardGroup and added a Note that the bash chat command (! prefix / /bash alias) also requires tools.elevated, sourced from src/config/schema.help.ts:1375.

Skills config (docs/tools/skills-config.md): renamed the 'Sandboxed skills + env vars' subhead to remove the brittle '+' character per docs/CLAUDE.md, promoted the host-only env warning to a Warning block so the most common skill-config footgun stays visible, and converted Related to a CardGroup including a config-reference link.
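
For reviewers, the loop-detection default described in this message can be sketched as below. This is a minimal illustration, not the code in run.ts: the `LoopDetectionConfig` type and both function names are hypothetical, and only the `enabled !== false` expression comes from the source quoted above. The `=== true` check for the rolling-history detectors is an assumption based on the doc describing them as disabled by default.

```typescript
// Hypothetical config shape; only the `enabled` field matters for this sketch.
interface LoopDetectionConfig {
  enabled?: boolean;
}

// Mirrors the expression quoted from run.ts: anything other than an
// explicit `false` (including a missing config block) leaves the guard on.
function isPostCompactionGuardEnabled(config?: LoopDetectionConfig): boolean {
  return config?.enabled !== false;
}

// Assumption: the rolling-history detectors are opt-in, so they run only
// when `enabled` is explicitly `true`.
function isRollingDetectionEnabled(config?: LoopDetectionConfig): boolean {
  return config?.enabled === true;
}
```

Under these assumptions, a user with no `tools.loopDetection` block gets the post-compaction guard but not the rolling-history detectors, and `enabled: false` turns both off.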
2026-05-06 05:20:43 +00:00 · 2026-05-05 16:48:27 -07:00
parent b3ab3cde96
commit 180e295dc6
4 changed files with 209 additions and 97 deletions
--- a/docs/tools/code-execution.md
+++ b/docs/tools/code-execution.md
@@ -1,5 +1,5 @@
 ---
-summary: "code_execution -- run sandboxed remote Python analysis with xAI"
+summary: "code_execution: run sandboxed remote Python analysis with xAI"
 read_when:
  - You want to enable or configure code_execution
  - You want remote analysis without local shell access
@@ -7,53 +7,95 @@ read_when:
 title: "Code execution"
 ---

-`code_execution` runs sandboxed remote Python analysis on xAI's Responses API.
+`code_execution` runs sandboxed remote Python analysis on xAI's Responses API. It is registered by the bundled `xai` plugin (under the `tools` contract) and dispatches to the same `https://api.x.ai/v1/responses` endpoint used by `x_search`.
+
+| Property           | Value                                                          |
+| ------------------ | -------------------------------------------------------------- |
+| Tool name          | `code_execution`                                               |
+| Provider plugin    | `xai` (bundled, `enabledByDefault: true`)                      |
+| Auth               | `XAI_API_KEY` or `plugins.entries.xai.config.webSearch.apiKey` |
+| Default model      | `grok-4-1-fast`                                                |
+| Default timeout    | 30 seconds                                                     |
+| Default `maxTurns` | unset (xAI applies its own internal limit)                     |
+
 This is different from local [`exec`](/tools/exec):

- `exec` runs shell commands on your machine or node
- `code_execution` runs Python in xAI's remote sandbox
+- `exec` runs shell commands on your machine or paired node.
+- `code_execution` runs Python in xAI's remote sandbox.

 Use `code_execution` for:

- calculations
- tabulation
- quick statistics
- chart-style analysis
- analyzing data returned by `x_search` or `web_search`
+- Calculations.
+- Tabulation.
+- Quick statistics.
+- Chart-style analysis.
+- Analyzing data returned by `x_search` or `web_search`.

-Do **not** use it when you need local files, your shell, your repo, or paired
-devices. Use [`exec`](/tools/exec) for that.
+Do **not** use it when you need local files, your shell, your repo, or paired devices. Use [`exec`](/tools/exec) for that.

 ## Setup

-You need an xAI API key. Any of these work:
+<Steps>
+  <Step title="Provide an xAI API key">
+    Set `XAI_API_KEY` in the gateway environment, or configure the key under the xAI plugin so the same credential covers `code_execution`, `x_search`, web search, and other xAI tools:

- `XAI_API_KEY`
- `plugins.entries.xai.config.webSearch.apiKey`
+    ```bash
+    export XAI_API_KEY=xai-...
+    ```

-Example:
+    Or via config:

-```json5
-{
-  plugins: {
-    entries: {
-      xai: {
-        config: {
-          webSearch: {
-            apiKey: "xai-...",
-          },
-          codeExecution: {
-            enabled: true,
-            model: "grok-4-1-fast",
-            maxTurns: 2,
-            timeoutSeconds: 30,
+    ```json5
+    {
+      plugins: {
+        entries: {
+          xai: {
+            config: {
+              webSearch: {
+                apiKey: "xai-...",
+              },
+            },
          },
        },
      },
-    },
-  },
-}
-```
+    }
+    ```
+
+  </Step>
+
+  <Step title="Enable and tune code_execution">
+    The tool is gated on `plugins.entries.xai.config.codeExecution.enabled`. Default is off.
+
+    ```json5
+    {
+      plugins: {
+        entries: {
+          xai: {
+            config: {
+              codeExecution: {
+                enabled: true,
+                model: "grok-4-1-fast", // override the default xAI code-execution model
+                maxTurns: 2,            // optional cap on internal tool turns
+                timeoutSeconds: 30,     // request timeout (default: 30)
+              },
+            },
+          },
+        },
+      },
+    }
+    ```
+
+  </Step>
+
+  <Step title="Restart the Gateway">
+    ```bash
+    openclaw gateway restart
+    ```
+
+    `code_execution` shows up in the agent's tool list once the xAI plugin re-registers with `enabled: true`.
+
+  </Step>
+</Steps>

 ## How to use it

@@ -71,20 +113,40 @@ Use x_search to find posts mentioning OpenClaw this week, then use code_executio
 Use web_search to gather the latest AI benchmark numbers, then use code_execution to compare percent changes.
 ```

-The tool takes a single `task` parameter internally, so the agent should send
-the full analysis request and any inline data in one prompt.
+The tool takes a single `task` parameter internally, so the agent should send the full analysis request and any inline data in one prompt.
+
+## Errors
+
+When the tool runs without auth, it returns a structured `missing_xai_api_key` error pointing at the env var and config path. The error is JSON, not a thrown exception, so the agent can self-correct:
+
+```json
+{
+  "error": "missing_xai_api_key",
+  "message": "code_execution needs an xAI API key. Set XAI_API_KEY in the Gateway environment, or configure plugins.entries.xai.config.webSearch.apiKey.",
+  "docs": "https://docs.openclaw.ai/tools/code-execution"
+}
+```

 ## Limits

 - This is remote xAI execution, not local process execution.
- It should be treated as ephemeral analysis, not a persistent notebook.
+- Treat results as ephemeral analysis, not a persistent notebook session.
 - Do not assume access to local files or your workspace.
- For fresh X data, use [`x_search`](/tools/web#x_search) first.
+- For fresh X data, use [`x_search`](/tools/web#x_search) first and pipe the result into `code_execution`.

 ## Related

- [Exec tool](/tools/exec)
- [Exec approvals](/tools/exec-approvals)
- [apply_patch tool](/tools/apply-patch)
- [Web tools](/tools/web)
- [xAI](/providers/xai)
+<CardGroup cols={2}>
+  <Card title="Exec tool" href="/tools/exec" icon="terminal">
+    Local shell execution on your machine or paired node.
+  </Card>
+  <Card title="Exec approvals" href="/tools/exec-approvals" icon="shield">
+    Allow/deny policy for shell execution.
+  </Card>
+  <Card title="Web tools" href="/tools/web" icon="globe">
+    `web_search`, `x_search`, and `web_fetch`.
+  </Card>
+  <Card title="xAI provider" href="/providers/xai" icon="microchip">
+    Grok models, web/x search, and code execution config.
+  </Card>
+</CardGroup>
--- a/docs/tools/elevated.md
+++ b/docs/tools/elevated.md
@@ -102,13 +102,27 @@ Allowlist entry formats:

 ## What elevated does not control

- **Tool policy**: if `exec` is denied by tool policy, elevated cannot override it
+- **Tool policy**: if `exec` is denied by tool policy, elevated cannot override it.
 - **Host selection policy**: elevated does not turn `auto` into a free cross-host override. It uses the configured/session exec target rules, choosing `node` only when the target is already `node`.
- **Separate from `/exec`**: the `/exec` directive adjusts per-session exec defaults for authorized senders and does not require elevated mode
+- **Separate from `/exec`**: the `/exec` directive adjusts per-session exec defaults for authorized senders and does not require elevated mode.
+
+<Note>
+  The bash chat command (`!` prefix; `/bash` alias) is a separate gate that requires `tools.elevated` to be enabled in addition to its own `tools.bash.enabled` flag. Disabling elevated locks `!` shell commands out as well.
+</Note>

 ## Related

- [Exec tool](/tools/exec) — shell command execution
- [Exec approvals](/tools/exec-approvals) — approval and allowlist system
- [Sandboxing](/gateway/sandboxing) — sandbox configuration
- [Sandbox vs Tool Policy vs Elevated](/gateway/sandbox-vs-tool-policy-vs-elevated)
+<CardGroup cols={2}>
+  <Card title="Exec tool" href="/tools/exec" icon="terminal">
+    Shell command execution from the agent.
+  </Card>
+  <Card title="Exec approvals" href="/tools/exec-approvals" icon="shield">
+    Approval and allowlist system for `exec`.
+  </Card>
+  <Card title="Sandboxing" href="/gateway/sandboxing" icon="box">
+    Gateway-level sandbox configuration.
+  </Card>
+  <Card title="Sandbox vs Tool Policy vs Elevated" href="/gateway/sandbox-vs-tool-policy-vs-elevated" icon="scale-balanced">
+    How the three gates compose during a tool call.
+  </Card>
+</CardGroup>
--- a/docs/tools/loop-detection.md
+++ b/docs/tools/loop-detection.md
@@ -5,37 +5,45 @@ read_when:
  - A user reports agents getting stuck repeating tool calls
  - You need to tune repetitive-call protection
  - You are editing agent tool/runtime policies
+  - You hit `compaction_loop_persisted` aborts after a context-overflow retry
 ---

-OpenClaw can keep agents from getting stuck in repeated tool-call patterns.
-The guard is **disabled by default**.
+OpenClaw has two cooperating guardrails for repetitive tool-call patterns:

-Enable it only where needed, because it can block legitimate repeated calls with strict settings.
+1. **Loop detection** (`tools.loopDetection.enabled`) — disabled by default. Watches the rolling tool-call history for repeated patterns and unknown-tool retries.
+2. **Post-compaction guard** (`tools.loopDetection.postCompactionGuard`) — enabled by default unless `tools.loopDetection.enabled` is explicitly `false`. Arms after every compaction-retry and aborts the run when the agent emits the same `(tool, args, result)` triple within the window.
+
+Both are configured under the same `tools.loopDetection` block, but the post-compaction guard runs whenever the master switch is not explicitly off. Set `tools.loopDetection.enabled: false` to silence both surfaces.

 ## Why this exists

 - Detect repetitive sequences that do not make progress.
 - Detect high-frequency no-result loops (same tool, same inputs, repeated errors).
 - Detect specific repeated-call patterns for known polling tools.
+- Prevent context-overflow then compaction then same-loop cycles from running indefinitely.

 ## Configuration block

-Global defaults:
+Global defaults, with every documented field shown:

 ```json5
 {
  tools: {
    loopDetection: {
-      enabled: false,
+      enabled: false, // master switch for the rolling-history detectors
      historySize: 30,
      warningThreshold: 10,
      criticalThreshold: 20,
+      unknownToolThreshold: 10,
      globalCircuitBreakerThreshold: 30,
      detectors: {
        genericRepeat: true,
        knownPollNoProgress: true,
        pingPong: true,
      },
+      postCompactionGuard: {
+        windowSize: 3, // armed after compaction-retry; runs unless enabled is explicitly false
+      },
    },
  },
 }
@@ -64,67 +72,83 @@ Per-agent override (optional):

 ### Field behavior

- `enabled`: Master switch. `false` means no loop detection is performed.
- `historySize`: number of recent tool calls kept for analysis.
- `warningThreshold`: threshold before classifying a pattern as warning-only.
- `criticalThreshold`: threshold for blocking repetitive loop patterns.
- `globalCircuitBreakerThreshold`: global no-progress breaker threshold.
- `detectors.genericRepeat`: detects repeated same-tool + same-params patterns.
- `detectors.knownPollNoProgress`: detects known polling-like patterns with no state change.
- `detectors.pingPong`: detects alternating ping-pong patterns.
+| Field                            | Default | Effect                                                                                                                          |
+| -------------------------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------- |
+| `enabled`                        | `false` | Master switch for the rolling-history detectors. Setting `false` also disables the post-compaction guard.                       |
+| `historySize`                    | `30`    | Number of recent tool calls kept for analysis.                                                                                  |
+| `warningThreshold`               | `10`    | Threshold before a pattern is classified as warning-only.                                                                       |
+| `criticalThreshold`              | `20`    | Threshold for blocking repetitive loop patterns.                                                                                |
+| `unknownToolThreshold`           | `10`    | Block repeated calls to the same unavailable tool after this many misses.                                                       |
+| `globalCircuitBreakerThreshold`  | `30`    | Global no-progress breaker threshold across all detectors.                                                                      |
+| `detectors.genericRepeat`        | `true`  | Detects repeated same-tool + same-params patterns.                                                                              |
+| `detectors.knownPollNoProgress`  | `true`  | Detects known polling-like patterns with no state change.                                                                       |
+| `detectors.pingPong`             | `true`  | Detects alternating ping-pong patterns.                                                                                         |
+| `postCompactionGuard.windowSize` | `3`     | Number of post-compaction tool calls during which the guard stays armed and the count of identical triples that aborts the run. |

-For `exec`, no-progress checks compare stable command outcomes and ignore volatile runtime metadata such as duration, PID, session ID, and working directory.
-When a run id is available, recent tool-call history is evaluated only within that run so scheduled heartbeat cycles and fresh runs do not inherit stale loop counts from earlier runs.
+For `exec`, no-progress checks compare stable command outcomes and ignore volatile runtime metadata such as duration, PID, session ID, and working directory. When a run id is available, recent tool-call history is evaluated only within that run so scheduled heartbeat cycles and fresh runs do not inherit stale loop counts from earlier runs.

 ## Recommended setup

- For smaller models, start with `enabled: true`, defaults unchanged. Flagship models rarely need loop detection and can leave it disabled.
+- For smaller models, set `enabled: true` and leave the thresholds at their defaults. Flagship models rarely need rolling-history detection and can leave the master switch at `false` while still benefiting from the post-compaction guard.
 - Keep thresholds ordered as `warningThreshold < criticalThreshold < globalCircuitBreakerThreshold`.
 - If false positives occur:
-  - raise `warningThreshold` and/or `criticalThreshold`
-  - (optionally) raise `globalCircuitBreakerThreshold`
-  - disable only the detector causing issues
-  - reduce `historySize` for less strict historical context
+  - Raise `warningThreshold` and/or `criticalThreshold`.
+  - Optionally raise `globalCircuitBreakerThreshold`.
+  - Disable only the specific detector causing issues (`detectors.<name>: false`).
+  - Reduce `historySize` for less strict historical context.
+- To disable everything (including the post-compaction guard), set `tools.loopDetection.enabled: false` explicitly.

 ## Post-compaction guard

-When the runner completes an auto-compaction-retry (after a context-overflow), it arms a short-window guard that watches the next few tool calls. If the agent emits the _same_ `(toolName, args, result)` triple multiple times within that window, the guard concludes that compaction did not break the loop and aborts the run with a `compaction_loop_persisted` error.
+When the runner completes a compaction-retry after a context-overflow, it arms a short-window guard that watches the next few tool calls. If the agent emits the same `(toolName, argsHash, resultHash)` triple multiple times within the window, the guard concludes that compaction did not break the loop and aborts the run with a `compaction_loop_persisted` error.

-This is a separate code path from the global `tools.loopDetection` detectors. It is independently configurable:
+The guard is gated by the master `tools.loopDetection.enabled` flag with one twist: it stays **enabled when the flag is unset or `true`** and only deactivates when the flag is explicitly `false`. This is intentional. The guard exists to escape compaction loops that would otherwise burn unbounded tokens, so a no-config user still gets the protection.

 ```json5
 {
  tools: {
    loopDetection: {
-      enabled: true, // existing master switch; set false to disable loop guards
+      // master switch; set false to disable the guard along with the rolling detectors
+      enabled: true,
      postCompactionGuard: {
-        windowSize: 3, // default: 3
+        windowSize: 3, // default
      },
    },
  },
 }
 ```

- `windowSize`: number of post-compaction tool calls during which the guard stays armed _and_ the count of identical (tool, args, result) triples that triggers an abort.
+- Lower `windowSize` is stricter (fewer attempts before abort).
+- Higher `windowSize` gives the agent more recovery attempts.
+- The guard never aborts when results are changing, only when results are byte-identical across the window.
+- It is intentionally narrow: it fires only in the immediate aftermath of a compaction-retry.

-The guard never aborts when results are changing, only when results are byte-identical across the window. It is intentionally narrow: it fires only in the immediate aftermath of a compaction-retry.
+<Note>
+  The post-compaction guard runs whenever the master flag is not explicitly `false`, even if you never wrote a `tools.loopDetection` block. To verify, look for `post-compaction guard armed for N attempts` in the gateway log immediately after a compaction event.
+</Note>

 ## Logs and expected behavior

-When a loop is detected, OpenClaw reports a loop event and blocks or dampens the next tool-cycle depending on severity.
-This protects users from runaway token spend and lockups while preserving normal tool access.
+When a loop is detected, OpenClaw reports a loop event and either dampens or blocks the next tool-cycle depending on severity. This protects users from runaway token spend and lockups while preserving normal tool access.

- Prefer warning and temporary suppression first.
- Escalate only when repeated evidence accumulates.
-
-## Notes
-
- `tools.loopDetection` is merged with agent-level overrides.
- Per-agent config fully overrides or extends global values.
- If no config exists, guardrails stay off.
+- Warnings come first.
+- Suppression follows when patterns persist past the warning threshold.
+- Critical thresholds block the next tool-cycle and surface a clear loop-detection reason in the run record.
+- The post-compaction guard emits `compaction_loop_persisted` errors with the offending tool name and identical-call count.

 ## Related

- [Exec approvals](/tools/exec-approvals)
- [Thinking levels](/tools/thinking)
- [Sub-agents](/tools/subagents)
+<CardGroup cols={2}>
+  <Card title="Exec approvals" href="/tools/exec-approvals" icon="shield">
+    Allow/deny policy for shell execution.
+  </Card>
+  <Card title="Thinking levels" href="/tools/thinking" icon="brain">
+    Reasoning effort levels and provider-policy interaction.
+  </Card>
+  <Card title="Sub-agents" href="/tools/subagents" icon="users">
+    Spawning isolated agents to bound runaway behavior.
+  </Card>
+  <Card title="Configuration reference" href="/gateway/configuration-reference" icon="gear">
+    Full `tools.loopDetection` schema and merging semantics.
+  </Card>
+</CardGroup>
--- a/docs/tools/skills-config.md
+++ b/docs/tools/skills-config.md
@@ -118,20 +118,32 @@ Per-skill fields:
  `skills.load.extraDirs`.
 - Changes to skills are picked up on the next agent turn when the watcher is enabled.

-### Sandboxed skills + env vars
+### Sandboxed skills and env vars

-When a session is **sandboxed**, skill processes run inside the configured
-sandbox backend. The sandbox does **not** inherit the host `process.env`.
+When a session is **sandboxed**, skill processes run inside the configured sandbox backend. The sandbox does **not** inherit the host `process.env`.
+
+<Warning>
+  Global `env` and `skills.entries.<skill>.env`/`apiKey` apply to **host** runs only. Inside a sandbox they have no effect, so a skill that depends on `GEMINI_API_KEY` will fail with `apiKey not configured` unless the sandbox is given the variable separately.
+</Warning>

 Use one of:

- `agents.defaults.sandbox.docker.env` for the Docker backend (or per-agent `agents.list[].sandbox.docker.env`)
- bake the env into your custom sandbox image or remote sandbox environment
-
-Global `env` and `skills.entries.<skill>.env/apiKey` apply to **host** runs only.
+- `agents.defaults.sandbox.docker.env` for the Docker backend (or per-agent `agents.list[].sandbox.docker.env`).
+- Bake the env into your custom sandbox image or remote sandbox environment.

 ## Related

- [Skills](/tools/skills)
- [Creating skills](/tools/creating-skills)
- [Slash commands](/tools/slash-commands)
+<CardGroup cols={2}>
+  <Card title="Skills" href="/tools/skills" icon="puzzle-piece">
+    What skills are and how they load.
+  </Card>
+  <Card title="Creating skills" href="/tools/creating-skills" icon="hammer">
+    Authoring custom skill packs.
+  </Card>
+  <Card title="Slash commands" href="/tools/slash-commands" icon="terminal">
+    Native command catalog and chat directives.
+  </Card>
+  <Card title="Configuration reference" href="/gateway/configuration-reference" icon="gear">
+    Full `skills` and `agents.skills` schema.
+  </Card>
+</CardGroup>