From 180e295dc604461d7e586dabe467e28a78a513fb Mon Sep 17 00:00:00 2001
From: Vincent Koc
Date: Tue, 5 May 2026 16:48:27 -0700
Subject: [PATCH] docs(tools): rewrite loop detection, code execution, and tighten elevated/skills
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Loop detection (docs/tools/loop-detection.md): substantial rewrite.
Fixed the post-compaction guard default story — the guard runs whenever
tools.loopDetection.enabled is not explicitly false, even with no config
block at all (verified in src/agents/pi-embedded-runner/run.ts near line
800: 'enabled: resolvedLoopDetectionConfig?.enabled !== false'). The
previous doc framed it as opt-in. Added the missing unknownToolThreshold
field (default 10) sourced from src/config/schema.help.ts, a complete
fields table, and a CardGroup related links section.

Code execution (docs/tools/code-execution.md): rewrote with Steps-driven
setup, code-verified defaults from
extensions/xai/src/code-execution-shared.ts (default model grok-4-1-fast,
default timeout 30 s, optional maxTurns), the missing_xai_api_key
structured error documented as JSON, and a properties summary table.
Replaced the trailing bullet list with a CardGroup pointing at exec,
exec-approvals, web tools, and the xAI provider page.

Elevated (docs/tools/elevated.md): converted Related to a CardGroup and
added a Note that the bash chat command (! prefix / /bash alias) also
requires tools.elevated, sourced from src/config/schema.help.ts:1375.

Skills config (docs/tools/skills-config.md): renamed the 'Sandboxed
skills + env vars' subhead to remove the brittle '+' character per
docs/CLAUDE.md, promoted the host-only env warning to a Warning block so
the most common skill-config footgun stays visible, and converted Related
to a CardGroup including a config-reference link.
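The default described for the post-compaction guard can be illustrated with a small TypeScript sketch. Only the `enabled !== false` expression is taken from the commit message; the `LoopDetectionConfig` shape and the `isPostCompactionGuardActive` helper are hypothetical names used for illustration and are not the actual code in run.ts.

```typescript
// Hypothetical config shape; only the `enabled` field comes from the docs.
interface LoopDetectionConfig {
  enabled?: boolean;
}

// Mirrors the quoted check `resolvedLoopDetectionConfig?.enabled !== false`:
// the guard is active unless `enabled` is explicitly false.
function isPostCompactionGuardActive(config?: LoopDetectionConfig): boolean {
  return config?.enabled !== false;
}

console.log(isPostCompactionGuardActive(undefined)); // no config block: active
console.log(isPostCompactionGuardActive({})); // flag unset: active
console.log(isPostCompactionGuardActive({ enabled: true })); // active
console.log(isPostCompactionGuardActive({ enabled: false })); // inactive
```

The point of the tri-state check is that an absent block and an unset flag behave the same as `true`, so only an explicit `false` opts out.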
---
 docs/tools/code-execution.md | 146 +++++++++++++++++++++++++----------
 docs/tools/elevated.md       |  26 +++++--
 docs/tools/loop-detection.md | 102 ++++++++++++++----------
 docs/tools/skills-config.md  |  32 +++++---
 4 files changed, 209 insertions(+), 97 deletions(-)

diff --git a/docs/tools/code-execution.md b/docs/tools/code-execution.md
index efc8db68aa9..de0e1b846f8 100644
--- a/docs/tools/code-execution.md
+++ b/docs/tools/code-execution.md
@@ -1,5 +1,5 @@
 ---
-summary: "code_execution -- run sandboxed remote Python analysis with xAI"
+summary: "code_execution: run sandboxed remote Python analysis with xAI"
 read_when:
   - You want to enable or configure code_execution
   - You want remote analysis without local shell access
@@ -7,53 +7,95 @@ read_when:
 title: "Code execution"
 ---
 
-`code_execution` runs sandboxed remote Python analysis on xAI's Responses API.
+`code_execution` runs sandboxed remote Python analysis on xAI's Responses API. It is registered by the bundled `xai` plugin (under the `tools` contract) and dispatches to the same `https://api.x.ai/v1/responses` endpoint used by `x_search`.
+
+| Property | Value |
+| ------------------ | -------------------------------------------------------------- |
+| Tool name | `code_execution` |
+| Provider plugin | `xai` (bundled, `enabledByDefault: true`) |
+| Auth | `XAI_API_KEY` or `plugins.entries.xai.config.webSearch.apiKey` |
+| Default model | `grok-4-1-fast` |
+| Default timeout | 30 seconds |
+| Default `maxTurns` | unset (xAI applies its own internal limit) |
+
 This is different from local [`exec`](/tools/exec):
 
-- `exec` runs shell commands on your machine or node
-- `code_execution` runs Python in xAI's remote sandbox
+- `exec` runs shell commands on your machine or paired node.
+- `code_execution` runs Python in xAI's remote sandbox.
 
 Use `code_execution` for:
 
-- calculations
-- tabulation
-- quick statistics
-- chart-style analysis
-- analyzing data returned by `x_search` or `web_search`
+- Calculations.
+- Tabulation.
+- Quick statistics.
+- Chart-style analysis.
+- Analyzing data returned by `x_search` or `web_search`.
 
-Do **not** use it when you need local files, your shell, your repo, or paired
-devices. Use [`exec`](/tools/exec) for that.
+Do **not** use it when you need local files, your shell, your repo, or paired devices. Use [`exec`](/tools/exec) for that.
 
 ## Setup
 
-You need an xAI API key. Any of these work:
+
+
+    Set `XAI_API_KEY` in the gateway environment, or configure the key under the xAI plugin so the same credential covers `code_execution`, `x_search`, web search, and other xAI tools:
 
-- `XAI_API_KEY`
-- `plugins.entries.xai.config.webSearch.apiKey`
+    ```bash
+    export XAI_API_KEY=xai-...
+    ```
 
-Example:
+    Or via config:
 
-```json5
-{
-  plugins: {
-    entries: {
-      xai: {
-        config: {
-          webSearch: {
-            apiKey: "xai-...",
-          },
-          codeExecution: {
-            enabled: true,
-            model: "grok-4-1-fast",
-            maxTurns: 2,
-            timeoutSeconds: 30,
+    ```json5
+    {
+      plugins: {
+        entries: {
+          xai: {
+            config: {
+              webSearch: {
+                apiKey: "xai-...",
+              },
+            },
           },
         },
       },
-    },
-  },
-}
-```
+    }
+    ```
+
+
+
+
+    The tool is gated on `plugins.entries.xai.config.codeExecution.enabled`. Default is off.
+
+    ```json5
+    {
+      plugins: {
+        entries: {
+          xai: {
+            config: {
+              codeExecution: {
+                enabled: true,
+                model: "grok-4-1-fast", // override the default xAI code-execution model
+                maxTurns: 2, // optional cap on internal tool turns
+                timeoutSeconds: 30, // request timeout (default: 30)
+              },
+            },
+          },
+        },
+      },
+    }
+    ```
+
+
+
+
+    ```bash
+    openclaw gateway restart
+    ```
+
+    `code_execution` shows up in the agent's tool list once the xAI plugin re-registers with `enabled: true`.
+
+
+
 
 ## How to use it
 
@@ -71,20 +113,40 @@ Use x_search to find posts mentioning OpenClaw this week, then use code_executio
 Use web_search to gather the latest AI benchmark numbers, then use code_execution to compare percent changes.
 ```
 
-The tool takes a single `task` parameter internally, so the agent should send
-the full analysis request and any inline data in one prompt.
+The tool takes a single `task` parameter internally, so the agent should send the full analysis request and any inline data in one prompt.
+
+## Errors
+
+When the tool runs without auth, it returns a structured `missing_xai_api_key` error pointing at the env var and config path. The error is JSON, not a thrown exception, so the agent can self-correct:
+
+```json
+{
+  "error": "missing_xai_api_key",
+  "message": "code_execution needs an xAI API key. Set XAI_API_KEY in the Gateway environment, or configure plugins.entries.xai.config.webSearch.apiKey.",
+  "docs": "https://docs.openclaw.ai/tools/code-execution"
+}
+```
 
 ## Limits
 
 - This is remote xAI execution, not local process execution.
-- It should be treated as ephemeral analysis, not a persistent notebook.
+- Treat results as ephemeral analysis, not a persistent notebook session.
 - Do not assume access to local files or your workspace.
-- For fresh X data, use [`x_search`](/tools/web#x_search) first.
+- For fresh X data, use [`x_search`](/tools/web#x_search) first and pipe the result into `code_execution`.
 
 ## Related
 
-- [Exec tool](/tools/exec)
-- [Exec approvals](/tools/exec-approvals)
-- [apply_patch tool](/tools/apply-patch)
-- [Web tools](/tools/web)
-- [xAI](/providers/xai)
+
+
+    Local shell execution on your machine or paired node.
+
+
+    Allow/deny policy for shell execution.
+
+
+    `web_search`, `x_search`, and `web_fetch`.
+
+
+    Grok models, web/x search, and code execution config.
+
+
diff --git a/docs/tools/elevated.md b/docs/tools/elevated.md
index 470b1a2208c..3908d811728 100644
--- a/docs/tools/elevated.md
+++ b/docs/tools/elevated.md
@@ -102,13 +102,27 @@ Allowlist entry formats:
 
 ## What elevated does not control
 
-- **Tool policy**: if `exec` is denied by tool policy, elevated cannot override it
+- **Tool policy**: if `exec` is denied by tool policy, elevated cannot override it.
 - **Host selection policy**: elevated does not turn `auto` into a free cross-host override. It uses the configured/session exec target rules, choosing `node` only when the target is already `node`.
-- **Separate from `/exec`**: the `/exec` directive adjusts per-session exec defaults for authorized senders and does not require elevated mode
+- **Separate from `/exec`**: the `/exec` directive adjusts per-session exec defaults for authorized senders and does not require elevated mode.
+
+
+  The bash chat command (`!` prefix; `/bash` alias) is a separate gate that requires `tools.elevated` to be enabled in addition to its own `tools.bash.enabled` flag. Disabling elevated locks `!` shell commands out as well.
+
 
 ## Related
 
-- [Exec tool](/tools/exec) — shell command execution
-- [Exec approvals](/tools/exec-approvals) — approval and allowlist system
-- [Sandboxing](/gateway/sandboxing) — sandbox configuration
-- [Sandbox vs Tool Policy vs Elevated](/gateway/sandbox-vs-tool-policy-vs-elevated)
+
+
+    Shell command execution from the agent.
+
+
+    Approval and allowlist system for `exec`.
+
+
+    Gateway-level sandbox configuration.
+
+
+    How the three gates compose during a tool call.
+
+
diff --git a/docs/tools/loop-detection.md b/docs/tools/loop-detection.md
index 4503d2c8ff7..25696660304 100644
--- a/docs/tools/loop-detection.md
+++ b/docs/tools/loop-detection.md
@@ -5,37 +5,45 @@ read_when:
   - A user reports agents getting stuck repeating tool calls
   - You need to tune repetitive-call protection
   - You are editing agent tool/runtime policies
+  - You hit `compaction_loop_persisted` aborts after a context-overflow retry
 ---
 
-OpenClaw can keep agents from getting stuck in repeated tool-call patterns.
-The guard is **disabled by default**.
+OpenClaw has two cooperating guardrails for repetitive tool-call patterns:
 
-Enable it only where needed, because it can block legitimate repeated calls with strict settings.
+1. **Loop detection** (`tools.loopDetection.enabled`) — disabled by default. Watches the rolling tool-call history for repeated patterns and unknown-tool retries.
+2. **Post-compaction guard** (`tools.loopDetection.postCompactionGuard`) — enabled by default unless `tools.loopDetection.enabled` is explicitly `false`. Arms after every compaction-retry and aborts the run when the agent emits the same `(tool, args, result)` triple within the window.
+
+Both are configured under the same `tools.loopDetection` block, but the post-compaction guard runs whenever the master switch is not explicitly off. Set `tools.loopDetection.enabled: false` to silence both surfaces.
 
 ## Why this exists
 
 - Detect repetitive sequences that do not make progress.
 - Detect high-frequency no-result loops (same tool, same inputs, repeated errors).
 - Detect specific repeated-call patterns for known polling tools.
+- Prevent context-overflow then compaction then same-loop cycles from running indefinitely.
 
 ## Configuration block
 
-Global defaults:
+Global defaults, with every documented field shown:
 
 ```json5
 {
   tools: {
     loopDetection: {
-      enabled: false,
+      enabled: false, // master switch for the rolling-history detectors
       historySize: 30,
       warningThreshold: 10,
       criticalThreshold: 20,
+      unknownToolThreshold: 10,
       globalCircuitBreakerThreshold: 30,
       detectors: {
         genericRepeat: true,
         knownPollNoProgress: true,
         pingPong: true,
       },
+      postCompactionGuard: {
+        windowSize: 3, // armed after compaction-retry; runs unless enabled is explicitly false
+      },
     },
   },
 }
@@ -64,67 +72,83 @@ Per-agent override (optional):
 
 ### Field behavior
 
-- `enabled`: Master switch. `false` means no loop detection is performed.
-- `historySize`: number of recent tool calls kept for analysis.
-- `warningThreshold`: threshold before classifying a pattern as warning-only.
-- `criticalThreshold`: threshold for blocking repetitive loop patterns.
-- `globalCircuitBreakerThreshold`: global no-progress breaker threshold.
-- `detectors.genericRepeat`: detects repeated same-tool + same-params patterns.
-- `detectors.knownPollNoProgress`: detects known polling-like patterns with no state change.
-- `detectors.pingPong`: detects alternating ping-pong patterns.
+| Field | Default | Effect |
+| -------------------------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------- |
+| `enabled` | `false` | Master switch for the rolling-history detectors. Setting `false` also disables the post-compaction guard. |
+| `historySize` | `30` | Number of recent tool calls kept for analysis. |
+| `warningThreshold` | `10` | Threshold before a pattern is classified as warning-only. |
+| `criticalThreshold` | `20` | Threshold for blocking repetitive loop patterns. |
+| `unknownToolThreshold` | `10` | Block repeated calls to the same unavailable tool after this many misses. |
+| `globalCircuitBreakerThreshold` | `30` | Global no-progress breaker threshold across all detectors. |
+| `detectors.genericRepeat` | `true` | Detects repeated same-tool + same-params patterns. |
+| `detectors.knownPollNoProgress` | `true` | Detects known polling-like patterns with no state change. |
+| `detectors.pingPong` | `true` | Detects alternating ping-pong patterns. |
+| `postCompactionGuard.windowSize` | `3` | Number of post-compaction tool calls during which the guard stays armed and the count of identical triples that aborts the run. |
 
-For `exec`, no-progress checks compare stable command outcomes and ignore volatile runtime metadata such as duration, PID, session ID, and working directory.
-When a run id is available, recent tool-call history is evaluated only within that run so scheduled heartbeat cycles and fresh runs do not inherit stale loop counts from earlier runs.
+For `exec`, no-progress checks compare stable command outcomes and ignore volatile runtime metadata such as duration, PID, session ID, and working directory. When a run id is available, recent tool-call history is evaluated only within that run so scheduled heartbeat cycles and fresh runs do not inherit stale loop counts from earlier runs.
 
 ## Recommended setup
 
-- For smaller models, start with `enabled: true`, defaults unchanged. Flagship models rarely need loop detection and can leave it disabled.
+- For smaller models, set `enabled: true` and leave the thresholds at their defaults. Flagship models rarely need rolling-history detection and can leave the master switch at `false` while still benefiting from the post-compaction guard.
 - Keep thresholds ordered as `warningThreshold < criticalThreshold < globalCircuitBreakerThreshold`.
 - If false positives occur:
-  - raise `warningThreshold` and/or `criticalThreshold`
-  - (optionally) raise `globalCircuitBreakerThreshold`
-  - disable only the detector causing issues
-  - reduce `historySize` for less strict historical context
+  - Raise `warningThreshold` and/or `criticalThreshold`.
+  - Optionally raise `globalCircuitBreakerThreshold`.
+  - Disable only the specific detector causing issues (`detectors.: false`).
+  - Reduce `historySize` for less strict historical context.
+- To disable everything (including the post-compaction guard), set `tools.loopDetection.enabled: false` explicitly.
 
 ## Post-compaction guard
 
-When the runner completes an auto-compaction-retry (after a context-overflow), it arms a short-window guard that watches the next few tool calls. If the agent emits the _same_ `(toolName, args, result)` triple multiple times within that window, the guard concludes that compaction did not break the loop and aborts the run with a `compaction_loop_persisted` error.
+When the runner completes a compaction-retry after a context-overflow, it arms a short-window guard that watches the next few tool calls. If the agent emits the same `(toolName, argsHash, resultHash)` triple multiple times within the window, the guard concludes that compaction did not break the loop and aborts the run with a `compaction_loop_persisted` error.
 
-This is a separate code path from the global `tools.loopDetection` detectors. It is independently configurable:
+The guard is gated by the master `tools.loopDetection.enabled` flag with one twist: it stays **enabled when the flag is unset or `true`** and only deactivates when the flag is explicitly `false`. This is intentional. The guard exists to escape compaction loops that would otherwise burn unbounded tokens, so a no-config user still gets the protection.
 
 ```json5
 {
   tools: {
     loopDetection: {
-      enabled: true, // existing master switch; set false to disable loop guards
+      // master switch; set false to disable the guard along with the rolling detectors
+      enabled: true,
       postCompactionGuard: {
-        windowSize: 3, // default: 3
+        windowSize: 3, // default
       },
     },
   },
 }
 ```
 
-- `windowSize`: number of post-compaction tool calls during which the guard stays armed _and_ the count of identical (tool, args, result) triples that triggers an abort.
+- Lower `windowSize` is stricter (fewer attempts before abort).
+- Higher `windowSize` gives the agent more recovery attempts.
+- The guard never aborts when results are changing, only when results are byte-identical across the window.
+- It is intentionally narrow: it fires only in the immediate aftermath of a compaction-retry.
 
-The guard never aborts when results are changing, only when results are byte-identical across the window. It is intentionally narrow: it fires only in the immediate aftermath of a compaction-retry.
+
+  The post-compaction guard runs whenever the master flag is not explicitly `false`, even if you never wrote a `tools.loopDetection` block. To verify, look for `post-compaction guard armed for N attempts` in the gateway log immediately after a compaction event.
+
 
 ## Logs and expected behavior
 
-When a loop is detected, OpenClaw reports a loop event and blocks or dampens the next tool-cycle depending on severity.
-This protects users from runaway token spend and lockups while preserving normal tool access.
+When a loop is detected, OpenClaw reports a loop event and either dampens or blocks the next tool-cycle depending on severity. This protects users from runaway token spend and lockups while preserving normal tool access.
 
-- Prefer warning and temporary suppression first.
-- Escalate only when repeated evidence accumulates.
-
-## Notes
-
-- `tools.loopDetection` is merged with agent-level overrides.
-- Per-agent config fully overrides or extends global values.
-- If no config exists, guardrails stay off.
+- Warnings come first.
+- Suppression follows when patterns persist past the warning threshold.
+- Critical thresholds block the next tool-cycle and surface a clear loop-detection reason in the run record.
+- The post-compaction guard emits `compaction_loop_persisted` errors with the offending tool name and identical-call count.
 
 ## Related
 
-- [Exec approvals](/tools/exec-approvals)
-- [Thinking levels](/tools/thinking)
-- [Sub-agents](/tools/subagents)
+
+
+    Allow/deny policy for shell execution.
+
+
+    Reasoning effort levels and provider-policy interaction.
+
+
+    Spawning isolated agents to bound runaway behavior.
+
+
+    Full `tools.loopDetection` schema and merging semantics.
+
+
diff --git a/docs/tools/skills-config.md b/docs/tools/skills-config.md
index 79cad91324e..f9a3e80fe3b 100644
--- a/docs/tools/skills-config.md
+++ b/docs/tools/skills-config.md
@@ -118,20 +118,32 @@ Per-skill fields:
   `skills.load.extraDirs`.
 - Changes to skills are picked up on the next agent turn when the watcher is enabled.
 
-### Sandboxed skills + env vars
+### Sandboxed skills and env vars
 
-When a session is **sandboxed**, skill processes run inside the configured
-sandbox backend. The sandbox does **not** inherit the host `process.env`.
+When a session is **sandboxed**, skill processes run inside the configured sandbox backend. The sandbox does **not** inherit the host `process.env`.
+
+
+  Global `env` and `skills.entries..env`/`apiKey` apply to **host** runs only. Inside a sandbox they have no effect, so a skill that depends on `GEMINI_API_KEY` will fail with `apiKey not configured` unless the sandbox is given the variable separately.
+
 
 Use one of:
 
-- `agents.defaults.sandbox.docker.env` for the Docker backend (or per-agent `agents.list[].sandbox.docker.env`)
-- bake the env into your custom sandbox image or remote sandbox environment
-
-Global `env` and `skills.entries..env/apiKey` apply to **host** runs only.
+- `agents.defaults.sandbox.docker.env` for the Docker backend (or per-agent `agents.list[].sandbox.docker.env`).
+- Bake the env into your custom sandbox image or remote sandbox environment.
 
 ## Related
 
-- [Skills](/tools/skills)
-- [Creating skills](/tools/creating-skills)
-- [Slash commands](/tools/slash-commands)
+
+
+    What skills are and how they load.
+
+
+    Authoring custom skill packs.
+
+
+    Native command catalog and chat directives.
+
+
+    Full `skills` and `agents.skills` schema.
+
+
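
The post-compaction guard semantics documented in the loop-detection hunks above can be sketched roughly in TypeScript. This is a minimal illustration under stated assumptions, not the runner's implementation: the `PostCompactionGuard` class, its `arm` and `observe` methods, and the JSON-encoded key are hypothetical. Only the `windowSize` meaning (window length and identical-triple count) and the `compaction_loop_persisted` error name come from the documented text.

```typescript
// Hypothetical sketch of the documented post-compaction guard semantics.
type ToolCall = { toolName: string; argsHash: string; resultHash: string };

class PostCompactionGuard {
  private readonly windowSize: number;
  private remaining = 0; // tool calls left in the armed window
  private counts = new Map<string, number>(); // identical-triple counts

  constructor(windowSize = 3) {
    this.windowSize = windowSize;
  }

  // Called after a compaction-retry: watch the next `windowSize` tool calls.
  arm(): void {
    this.remaining = this.windowSize;
    this.counts.clear();
  }

  // Returns the error name once the same triple has been seen `windowSize`
  // times inside the armed window; returns null otherwise.
  observe(call: ToolCall): "compaction_loop_persisted" | null {
    if (this.remaining <= 0) return null; // guard is not armed
    this.remaining -= 1;
    const key = JSON.stringify([call.toolName, call.argsHash, call.resultHash]);
    const seen = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, seen);
    return seen >= this.windowSize ? "compaction_loop_persisted" : null;
  }
}

const guard = new PostCompactionGuard(3);
guard.arm();
const stuck: ToolCall = { toolName: "exec", argsHash: "a1", resultHash: "r1" };
const outcomes = [stuck, stuck, stuck].map((call) => guard.observe(call));
console.log(outcomes); // the third identical call yields "compaction_loop_persisted"
```

Because the key includes the result hash, calls whose results keep changing never reach the count, which matches the documented claim that the guard only fires on identical results inside the window.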