Mirror of https://github.com/openclaw/openclaw.git, synced 2026-05-06 05:20:43 +00:00
docs(tools): rewrite loop detection, code execution, and tighten elevated/skills
Loop detection (docs/tools/loop-detection.md): substantial rewrite. Fixed the post-compaction guard default story: the guard runs whenever tools.loopDetection.enabled is not explicitly false, even with no config block at all (verified in src/agents/pi-embedded-runner/run.ts near line 800: 'enabled: resolvedLoopDetectionConfig?.enabled !== false'). The previous doc framed it as opt-in. Added the missing unknownToolThreshold field (default 10) sourced from src/config/schema.help.ts, a complete fields table, and a CardGroup related links section.

Code execution (docs/tools/code-execution.md): rewrote with Steps-driven setup, code-verified defaults from extensions/xai/src/code-execution-shared.ts (default model grok-4-1-fast, default timeout 30 s, optional maxTurns), the missing_xai_api_key structured error documented as JSON, and a properties summary table. Replaced the trailing bullet list with a CardGroup pointing at exec, exec-approvals, web tools, and the xAI provider page.

Elevated (docs/tools/elevated.md): converted Related to a CardGroup and added a Note that the bash chat command (! prefix / /bash alias) also requires tools.elevated, sourced from src/config/schema.help.ts:1375.

Skills config (docs/tools/skills-config.md): renamed the 'Sandboxed skills + env vars' subhead to remove the brittle '+' character per docs/CLAUDE.md, promoted the host-only env warning to a Warning block so the most common skill-config footgun stays visible, and converted Related to a CardGroup including a config-reference link.
docs/tools/code-execution.md
@@ -1,5 +1,5 @@
---
summary: "code_execution: run sandboxed remote Python analysis with xAI"
read_when:
  - You want to enable or configure code_execution
  - You want remote analysis without local shell access
@@ -7,53 +7,95 @@ read_when:
title: "Code execution"
---

`code_execution` runs sandboxed remote Python analysis on xAI's Responses API. It is registered by the bundled `xai` plugin (under the `tools` contract) and dispatches to the same `https://api.x.ai/v1/responses` endpoint used by `x_search`.

| Property           | Value                                                          |
| ------------------ | -------------------------------------------------------------- |
| Tool name          | `code_execution`                                               |
| Provider plugin    | `xai` (bundled, `enabledByDefault: true`)                      |
| Auth               | `XAI_API_KEY` or `plugins.entries.xai.config.webSearch.apiKey` |
| Default model      | `grok-4-1-fast`                                                |
| Default timeout    | 30 seconds                                                     |
| Default `maxTurns` | unset (xAI applies its own internal limit)                     |

This is different from local [`exec`](/tools/exec):

- `exec` runs shell commands on your machine or paired node.
- `code_execution` runs Python in xAI's remote sandbox.

Use `code_execution` for:

- Calculations.
- Tabulation.
- Quick statistics.
- Chart-style analysis.
- Analyzing data returned by `x_search` or `web_search`.

Do **not** use it when you need local files, your shell, your repo, or paired devices. Use [`exec`](/tools/exec) for that.

## Setup

<Steps>
<Step title="Provide an xAI API key">
Set `XAI_API_KEY` in the gateway environment, or configure the key under the xAI plugin so the same credential covers `code_execution`, `x_search`, web search, and other xAI tools:

```bash
export XAI_API_KEY=xai-...
```

Or via config:

```json5
{
  plugins: {
    entries: {
      xai: {
        config: {
          webSearch: {
            apiKey: "xai-...",
          },
        },
      },
    },
  },
}
```

</Step>

<Step title="Enable and tune code_execution">
The tool is gated on `plugins.entries.xai.config.codeExecution.enabled`. Default is off.

```json5
{
  plugins: {
    entries: {
      xai: {
        config: {
          codeExecution: {
            enabled: true,
            model: "grok-4-1-fast", // override the default xAI code-execution model
            maxTurns: 2, // optional cap on internal tool turns
            timeoutSeconds: 30, // request timeout (default: 30)
          },
        },
      },
    },
  },
}
```

</Step>

<Step title="Restart the Gateway">
```bash
openclaw gateway restart
```

`code_execution` shows up in the agent's tool list once the xAI plugin re-registers with `enabled: true`.

</Step>
</Steps>

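The defaults in the properties table and the options in the tuning step can be pictured as a small resolver. This TypeScript sketch is illustrative only. `CodeExecutionConfig`, `ResolvedCodeExecution`, and `resolveCodeExecutionConfig` are hypothetical names and are not the plugin's code. Only the values come from this page: off unless `enabled: true`, `grok-4-1-fast`, 30 seconds, and no `maxTurns` cap.

```typescript
// Hypothetical shape of plugins.entries.xai.config.codeExecution (illustration only).
interface CodeExecutionConfig {
  enabled?: boolean;
  model?: string;
  maxTurns?: number;
  timeoutSeconds?: number;
}

interface ResolvedCodeExecution {
  enabled: boolean;
  model: string;
  maxTurns?: number; // undefined means "let xAI apply its own internal limit"
  timeoutSeconds: number;
}

// Default values as documented on this page.
const DEFAULT_MODEL = "grok-4-1-fast";
const DEFAULT_TIMEOUT_SECONDS = 30;

function resolveCodeExecutionConfig(cfg: CodeExecutionConfig = {}): ResolvedCodeExecution {
  return {
    enabled: cfg.enabled === true, // the tool stays off unless explicitly enabled
    model: cfg.model ?? DEFAULT_MODEL,
    maxTurns: cfg.maxTurns, // optional; no OpenClaw-side default
    timeoutSeconds: cfg.timeoutSeconds ?? DEFAULT_TIMEOUT_SECONDS,
  };
}
```

For example, `resolveCodeExecutionConfig({ enabled: true, maxTurns: 2 })` keeps `grok-4-1-fast` and the 30-second timeout while capping turns at 2. Leaving `maxTurns` undefined, rather than picking a number, matches the documented behavior of deferring to xAI's own limit.
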
## How to use it

@@ -71,20 +113,40 @@ Use x_search to find posts mentioning OpenClaw this week, then use code_executio
Use web_search to gather the latest AI benchmark numbers, then use code_execution to compare percent changes.
```

The tool takes a single `task` parameter internally, so the agent should send the full analysis request and any inline data in one prompt.

## Errors

When the tool runs without auth, it returns a structured `missing_xai_api_key` error pointing at the env var and config path. The error is JSON, not a thrown exception, so the agent can self-correct:

```json
{
  "error": "missing_xai_api_key",
  "message": "code_execution needs an xAI API key. Set XAI_API_KEY in the Gateway environment, or configure plugins.entries.xai.config.webSearch.apiKey.",
  "docs": "https://docs.openclaw.ai/tools/code-execution"
}
```

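The return-instead-of-throw behavior can be sketched as a function that hands back either the key or the documented JSON payload. Several details are assumptions for illustration. The lookup order (config key first, then `XAI_API_KEY`) is assumed, because this page only says either source works. The names `XaiPluginConfig`, `KeyResult`, and `resolveXaiApiKey` are hypothetical. The environment is passed in as a plain record so the sketch does not depend on Node's `process`.

```typescript
// Hypothetical subset of the xAI plugin config relevant to auth.
interface XaiPluginConfig {
  webSearch?: { apiKey?: string };
}

type KeyResult =
  | { ok: true; apiKey: string }
  | { ok: false; error: { error: string; message: string; docs: string } };

// Returns the key, or the documented missing_xai_api_key payload instead of throwing.
// Assumption: the config key takes precedence over the environment variable.
function resolveXaiApiKey(cfg: XaiPluginConfig, env: Record<string, string | undefined>): KeyResult {
  const apiKey = cfg.webSearch?.apiKey?.trim() || env["XAI_API_KEY"]?.trim();
  if (apiKey) {
    return { ok: true, apiKey };
  }
  return {
    ok: false,
    error: {
      error: "missing_xai_api_key",
      message:
        "code_execution needs an xAI API key. Set XAI_API_KEY in the Gateway environment, or configure plugins.entries.xai.config.webSearch.apiKey.",
      docs: "https://docs.openclaw.ai/tools/code-execution",
    },
  };
}
```

Returning a discriminated union keeps the failure as data. The caller can serialize `error` straight into the tool result, so the agent can self-correct.
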
## Limits

- This is remote xAI execution, not local process execution.
- Treat results as ephemeral analysis, not a persistent notebook session.
- Do not assume access to local files or your workspace.
- For fresh X data, use [`x_search`](/tools/web#x_search) first and pipe the result into `code_execution`.

## Related

<CardGroup cols={2}>
<Card title="Exec tool" href="/tools/exec" icon="terminal">
Local shell execution on your machine or paired node.
</Card>
<Card title="Exec approvals" href="/tools/exec-approvals" icon="shield">
Allow/deny policy for shell execution.
</Card>
<Card title="Web tools" href="/tools/web" icon="globe">
`web_search`, `x_search`, and `web_fetch`.
</Card>
<Card title="xAI provider" href="/providers/xai" icon="microchip">
Grok models, web/x search, and code execution config.
</Card>
</CardGroup>

docs/tools/elevated.md
@@ -102,13 +102,27 @@ Allowlist entry formats:

## What elevated does not control

- **Tool policy**: if `exec` is denied by tool policy, elevated cannot override it.
- **Host selection policy**: elevated does not turn `auto` into a free cross-host override. It uses the configured/session exec target rules, choosing `node` only when the target is already `node`.
- **Separate from `/exec`**: the `/exec` directive adjusts per-session exec defaults for authorized senders and does not require elevated mode.

<Note>
The bash chat command (`!` prefix; `/bash` alias) is a separate gate that requires `tools.elevated` to be enabled in addition to its own `tools.bash.enabled` flag. Disabling elevated also locks out `!` shell commands.
</Note>

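The two-flag requirement in the note above reduces to a plain conjunction. This minimal sketch assumes both settings have already been resolved to booleans. The `ToolsGateConfig` shape and `isBashChatCommandAllowed` name are hypothetical and are not taken from OpenClaw's source.

```typescript
// Hypothetical, simplified view of the two settings named in the note.
interface ToolsGateConfig {
  elevatedEnabled: boolean; // stands in for tools.elevated being enabled
  bashEnabled: boolean; // stands in for tools.bash.enabled
}

// The `!` prefix / `/bash` alias only runs when BOTH gates are open:
// turning elevated off locks bash out even if tools.bash.enabled is true.
function isBashChatCommandAllowed(cfg: ToolsGateConfig): boolean {
  return cfg.elevatedEnabled && cfg.bashEnabled;
}
```
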
## Related

<CardGroup cols={2}>
<Card title="Exec tool" href="/tools/exec" icon="terminal">
Shell command execution from the agent.
</Card>
<Card title="Exec approvals" href="/tools/exec-approvals" icon="shield">
Approval and allowlist system for `exec`.
</Card>
<Card title="Sandboxing" href="/gateway/sandboxing" icon="box">
Gateway-level sandbox configuration.
</Card>
<Card title="Sandbox vs Tool Policy vs Elevated" href="/gateway/sandbox-vs-tool-policy-vs-elevated" icon="scale-balanced">
How the three gates compose during a tool call.
</Card>
</CardGroup>

docs/tools/loop-detection.md
@@ -5,37 +5,45 @@ read_when:
  - A user reports agents getting stuck repeating tool calls
  - You need to tune repetitive-call protection
  - You are editing agent tool/runtime policies
  - You hit `compaction_loop_persisted` aborts after a context-overflow retry
---

OpenClaw has two cooperating guardrails for repetitive tool-call patterns:

1. **Loop detection** (`tools.loopDetection.enabled`): disabled by default. Watches the rolling tool-call history for repeated patterns and unknown-tool retries.
2. **Post-compaction guard** (`tools.loopDetection.postCompactionGuard`): enabled by default unless `tools.loopDetection.enabled` is explicitly `false`. Arms after every compaction-retry and aborts the run when the agent repeatedly emits the same `(tool, args, result)` triple within the window.

Both are configured under the same `tools.loopDetection` block, but the post-compaction guard runs whenever the master switch is not explicitly off. Set `tools.loopDetection.enabled: false` to silence both surfaces.

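The asymmetric defaults come down to two different checks on the same flag. The post-compaction check mirrors the expression quoted in this commit's message (`resolvedLoopDetectionConfig?.enabled !== false`). The rolling-history check is inferred from the documented `false` default. The `LoopDetectionConfig` type and both function names are hypothetical.

```typescript
// Hypothetical subset of tools.loopDetection.
interface LoopDetectionConfig {
  enabled?: boolean;
}

// Rolling-history detectors: opt-in, so only an explicit `true` turns them on.
function rollingDetectorsActive(cfg?: LoopDetectionConfig): boolean {
  return cfg?.enabled === true;
}

// Post-compaction guard: opt-out, so unset (or a missing block) still means on.
function postCompactionGuardActive(cfg?: LoopDetectionConfig): boolean {
  return cfg?.enabled !== false;
}
```

With no `tools.loopDetection` block at all, the first check returns `false` and the second returns `true`. Only an explicit `enabled: false` makes both `false`.
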
## Why this exists

- Detect repetitive sequences that do not make progress.
- Detect high-frequency no-result loops (same tool, same inputs, repeated errors).
- Detect specific repeated-call patterns for known polling tools.
- Stop runs from cycling indefinitely through context overflow, compaction, and the same loop again.

## Configuration block

Global defaults, with every documented field shown:

```json5
{
  tools: {
    loopDetection: {
      enabled: false, // master switch for the rolling-history detectors; writing false explicitly also turns off the post-compaction guard
      historySize: 30,
      warningThreshold: 10,
      criticalThreshold: 20,
      unknownToolThreshold: 10,
      globalCircuitBreakerThreshold: 30,
      detectors: {
        genericRepeat: true,
        knownPollNoProgress: true,
        pingPong: true,
      },
      postCompactionGuard: {
        windowSize: 3, // armed after compaction-retry; runs unless enabled is explicitly false
      },
    },
  },
}
```

@@ -64,67 +72,83 @@ Per-agent override (optional):

### Field behavior

| Field                            | Default | Effect                                                                                                                                                  |
| -------------------------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `enabled`                        | `false` | Master switch for the rolling-history detectors. Leaving it unset keeps them off; setting it explicitly to `false` also disables the post-compaction guard. |
| `historySize`                    | `30`    | Number of recent tool calls kept for analysis.                                                                                                          |
| `warningThreshold`               | `10`    | Threshold before a pattern is classified as warning-only.                                                                                               |
| `criticalThreshold`              | `20`    | Threshold for blocking repetitive loop patterns.                                                                                                        |
| `unknownToolThreshold`           | `10`    | Block repeated calls to the same unavailable tool after this many misses.                                                                               |
| `globalCircuitBreakerThreshold`  | `30`    | Global no-progress breaker threshold across all detectors.                                                                                              |
| `detectors.genericRepeat`        | `true`  | Detects repeated same-tool + same-params patterns.                                                                                                      |
| `detectors.knownPollNoProgress`  | `true`  | Detects known polling-like patterns with no state change.                                                                                               |
| `detectors.pingPong`             | `true`  | Detects alternating ping-pong patterns.                                                                                                                 |
| `postCompactionGuard.windowSize` | `3`     | Number of post-compaction tool calls during which the guard stays armed, and the count of identical triples that aborts the run.                        |

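One way to picture how the table's defaults combine with global and per-agent settings is in three layers. Start from the defaults, apply the global block, then apply the agent's block, merging `detectors` and `postCompactionGuard` key by key. This is a sketch under that assumption only. The page says agent overrides are merged with `tools.loopDetection` but does not spell out the exact merge semantics. `resolveLoopDetection` and the types are hypothetical names.

```typescript
interface LoopDetectionSettings {
  enabled?: boolean;
  historySize: number;
  warningThreshold: number;
  criticalThreshold: number;
  unknownToolThreshold: number;
  globalCircuitBreakerThreshold: number;
  detectors: { genericRepeat: boolean; knownPollNoProgress: boolean; pingPong: boolean };
  postCompactionGuard: { windowSize: number };
}

// Partial input: every field optional, nested objects partial too.
type LoopDetectionInput = Partial<Omit<LoopDetectionSettings, "detectors" | "postCompactionGuard">> & {
  detectors?: Partial<LoopDetectionSettings["detectors"]>;
  postCompactionGuard?: Partial<LoopDetectionSettings["postCompactionGuard"]>;
};

// Defaults from the field table. `enabled` is deliberately left unset so a
// `!== false` check for the post-compaction guard still sees "not explicitly off".
const DEFAULTS: LoopDetectionSettings = {
  historySize: 30,
  warningThreshold: 10,
  criticalThreshold: 20,
  unknownToolThreshold: 10,
  globalCircuitBreakerThreshold: 30,
  detectors: { genericRepeat: true, knownPollNoProgress: true, pingPong: true },
  postCompactionGuard: { windowSize: 3 },
};

// Hypothetical layering: defaults, then global, then per-agent (later wins).
function resolveLoopDetection(global: LoopDetectionInput = {}, agent: LoopDetectionInput = {}): LoopDetectionSettings {
  return {
    ...DEFAULTS,
    ...global,
    ...agent,
    detectors: { ...DEFAULTS.detectors, ...global.detectors, ...agent.detectors },
    postCompactionGuard: {
      ...DEFAULTS.postCompactionGuard,
      ...global.postCompactionGuard,
      ...agent.postCompactionGuard,
    },
  };
}
```

Merging the nested objects separately means an agent can flip one detector off without restating the other two.
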
For `exec`, no-progress checks compare stable command outcomes and ignore volatile runtime metadata such as duration, PID, session ID, and working directory. When a run id is available, recent tool-call history is evaluated only within that run so scheduled heartbeat cycles and fresh runs do not inherit stale loop counts from earlier runs.

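This minimal sketch covers the two ideas in the paragraph above. It compares `exec` outcomes on a fingerprint that drops volatile metadata, and it counts repeats only among history entries from the same run. The `ExecOutcome` fields, the `HistoryEntry` shape, and both function names are hypothetical. OpenClaw's actual outcome structure and hashing are not documented on this page.

```typescript
// Hypothetical exec outcome; only some fields are stable across identical runs.
interface ExecOutcome {
  exitCode: number;
  stdout: string;
  stderr: string;
  durationMs?: number; // volatile
  pid?: number; // volatile
  sessionId?: string; // volatile
  cwd?: string; // volatile
}

interface HistoryEntry {
  runId?: string;
  toolName: string;
  fingerprint: string;
}

// Keep only the stable parts of the outcome so two no-progress runs compare equal.
function execFingerprint(outcome: ExecOutcome): string {
  const { exitCode, stdout, stderr } = outcome;
  return JSON.stringify({ exitCode, stdout, stderr });
}

// Count identical outcomes, scoped to the current run when a run id is known,
// so a fresh run or heartbeat cycle does not inherit older repeats.
function countRepeats(history: HistoryEntry[], current: HistoryEntry): number {
  const scoped =
    current.runId === undefined ? history : history.filter((entry) => entry.runId === current.runId);
  return scoped.filter(
    (entry) => entry.toolName === current.toolName && entry.fingerprint === current.fingerprint,
  ).length;
}
```
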
## Recommended setup

- For smaller models, set `enabled: true` and leave the thresholds at their defaults. Flagship models rarely need rolling-history detection and can leave the master switch unset while still benefiting from the post-compaction guard (an explicit `false` turns the guard off too).
- Keep thresholds ordered as `warningThreshold < criticalThreshold < globalCircuitBreakerThreshold`.
- If false positives occur:
  - Raise `warningThreshold` and/or `criticalThreshold`.
  - Optionally raise `globalCircuitBreakerThreshold`.
  - Disable only the specific detector causing issues (`detectors.<name>: false`).
  - Reduce `historySize` for less strict historical context.
- To disable everything (including the post-compaction guard), set `tools.loopDetection.enabled: false` explicitly.

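The ordering rule above is easy to check before rolling out a config change. This small sketch assumes the values have already been resolved to numbers. `checkThresholdOrder` is a hypothetical helper, not an OpenClaw command or API.

```typescript
interface Thresholds {
  warningThreshold: number;
  criticalThreshold: number;
  globalCircuitBreakerThreshold: number;
}

// Returns human-readable problems; an empty array means the ordering
// warningThreshold < criticalThreshold < globalCircuitBreakerThreshold holds.
function checkThresholdOrder(t: Thresholds): string[] {
  const problems: string[] = [];
  if (!(t.warningThreshold < t.criticalThreshold)) {
    problems.push(`warningThreshold (${t.warningThreshold}) should be below criticalThreshold (${t.criticalThreshold})`);
  }
  if (!(t.criticalThreshold < t.globalCircuitBreakerThreshold)) {
    problems.push(
      `criticalThreshold (${t.criticalThreshold}) should be below globalCircuitBreakerThreshold (${t.globalCircuitBreakerThreshold})`,
    );
  }
  return problems;
}
```

The defaults (10, 20, 30) pass. Raising only `criticalThreshold` to 40 while the breaker stays at 30 reports one problem, which is a common slip when tuning for false positives.
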
## Post-compaction guard

When the runner completes a compaction-retry after a context-overflow, it arms a short-window guard that watches the next few tool calls. If the agent emits the same `(toolName, argsHash, resultHash)` triple `windowSize` times within the window, the guard concludes that compaction did not break the loop and aborts the run with a `compaction_loop_persisted` error.

The guard is gated by the master `tools.loopDetection.enabled` flag with one twist: it stays **enabled when the flag is unset or `true`** and only deactivates when the flag is explicitly `false`. This is intentional. The guard exists to escape compaction loops that would otherwise burn unbounded tokens, so a no-config user still gets the protection.

```json5
{
  tools: {
    loopDetection: {
      // master switch; set false to disable the guard along with the rolling detectors
      enabled: true,
      postCompactionGuard: {
        windowSize: 3, // default
      },
    },
  },
}
```

- `windowSize`: number of post-compaction tool calls during which the guard stays armed _and_ the count of identical (tool, args, result) triples that triggers an abort.
- Lower `windowSize` is stricter (fewer attempts before abort).
- Higher `windowSize` gives the agent more recovery attempts.

The guard never aborts when results are changing, only when results are byte-identical across the window. It is intentionally narrow: it fires only in the immediate aftermath of a compaction-retry.

<Note>
The post-compaction guard runs whenever the master flag is not explicitly `false`, even if you never wrote a `tools.loopDetection` block. To verify, look for `post-compaction guard armed for N attempts` in the gateway log immediately after a compaction event.
</Note>

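The guard's window logic can be modelled as a small state machine. It arms after a compaction-retry, watches at most `windowSize` calls, and aborts once the same `(toolName, argsHash, resultHash)` triple has been seen `windowSize` times. This is a simplified sketch of the documented behavior, not the runner's implementation. The class name, its methods, and the `"continue"`/`"abort"` verdicts are hypothetical.

```typescript
type GuardVerdict = "continue" | "abort";

class PostCompactionGuardSketch {
  private readonly windowSize: number;
  private remaining = 0; // calls left in the armed window
  private counts = new Map<string, number>(); // identical-triple counts

  constructor(windowSize = 3) {
    this.windowSize = windowSize;
  }

  // Called when the runner finishes a compaction-retry.
  arm(): void {
    this.remaining = this.windowSize;
    this.counts.clear();
  }

  // Called for each tool call after compaction; inert when not armed.
  observe(toolName: string, argsHash: string, resultHash: string): GuardVerdict {
    if (this.remaining <= 0) return "continue";
    this.remaining -= 1;
    const key = `${toolName}\u0000${argsHash}\u0000${resultHash}`;
    const seen = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, seen);
    // Changing results produce different keys, so they never reach the limit.
    return seen >= this.windowSize ? "abort" : "continue";
  }
}
```

With the default `windowSize: 3`, three identical calls right after compaction abort the run. Three calls whose results differ simply let the window expire.
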
## Logs and expected behavior

When a loop is detected, OpenClaw reports a loop event and either dampens or blocks the next tool-cycle depending on severity. This protects users from runaway token spend and lockups while preserving normal tool access.

- Prefer warning and temporary suppression first.
- Escalate only when repeated evidence accumulates.

## Notes

- `tools.loopDetection` is merged with agent-level overrides.
- Per-agent config fully overrides or extends global values.
- If no config exists, the rolling-history detectors stay off; the post-compaction guard still runs.
- Warnings come first.
- Suppression follows when patterns persist past the warning threshold.
- Critical thresholds block the next tool-cycle and surface a clear loop-detection reason in the run record.
- The post-compaction guard emits `compaction_loop_persisted` errors with the offending tool name and identical-call count.

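The escalation order in these notes can be sketched as a classifier over repeat counts: warn first, block at the critical threshold, and apply a global breaker on top. This is a simplified reading with several assumptions. It assumes each threshold is compared with `>=` against a single count. It also folds "suppression" into the warning band, because the page does not separate them numerically. The real detectors track several pattern types, and the severity labels and function name here are hypothetical.

```typescript
type LoopSeverity = "none" | "warn" | "block" | "break";

interface SeverityThresholds {
  warningThreshold: number; // default 10
  criticalThreshold: number; // default 20
  globalCircuitBreakerThreshold: number; // default 30
}

// patternCount: repeats of one pattern; globalNoProgress: no-progress calls
// across all detectors in the current run (both hypothetical inputs).
function classifyLoop(patternCount: number, globalNoProgress: number, t: SeverityThresholds): LoopSeverity {
  if (globalNoProgress >= t.globalCircuitBreakerThreshold) return "break"; // global breaker trips
  if (patternCount >= t.criticalThreshold) return "block"; // next tool-cycle is blocked
  if (patternCount >= t.warningThreshold) return "warn"; // warn and dampen first
  return "none";
}
```
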
## Related

<CardGroup cols={2}>
<Card title="Exec approvals" href="/tools/exec-approvals" icon="shield">
Allow/deny policy for shell execution.
</Card>
<Card title="Thinking levels" href="/tools/thinking" icon="brain">
Reasoning effort levels and provider-policy interaction.
</Card>
<Card title="Sub-agents" href="/tools/subagents" icon="users">
Spawning isolated agents to bound runaway behavior.
</Card>
<Card title="Configuration reference" href="/gateway/configuration-reference" icon="gear">
Full `tools.loopDetection` schema and merging semantics.
</Card>
</CardGroup>

docs/tools/skills-config.md
@@ -118,20 +118,32 @@ Per-skill fields:
`skills.load.extraDirs`.
- Changes to skills are picked up on the next agent turn when the watcher is enabled.

### Sandboxed skills and env vars

When a session is **sandboxed**, skill processes run inside the configured sandbox backend. The sandbox does **not** inherit the host `process.env`.

<Warning>
Global `env` and `skills.entries.<skill>.env`/`apiKey` apply to **host** runs only. Inside a sandbox they have no effect, so a skill that depends on `GEMINI_API_KEY` will fail with `apiKey not configured` unless the sandbox is given the variable separately.
</Warning>

Use one of:

- `agents.defaults.sandbox.docker.env` for the Docker backend (or per-agent `agents.list[].sandbox.docker.env`).
- Bake the env into your custom sandbox image or remote sandbox environment.

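This minimal sketch shows why host-side keys do not reach a sandboxed skill: the sandbox environment is built only from sandbox-specific settings and is never copied from the host environment. The function name and the flat `Record<string, string>` shapes are hypothetical. The `defaultsDockerEnv` and `agentDockerEnv` parameters stand in for `agents.defaults.sandbox.docker.env` and a per-agent `agents.list[].sandbox.docker.env`. The per-agent-wins precedence is an assumption, and env baked into an image is not modelled.

```typescript
type Env = Record<string, string>;

// Hypothetical: compute the env a skill process would see.
function skillProcessEnv(opts: {
  sandboxed: boolean;
  hostEnv: Env; // host process.env plus global `env` and skills.entries.<skill>.env/apiKey
  defaultsDockerEnv?: Env; // agents.defaults.sandbox.docker.env
  agentDockerEnv?: Env; // agents.list[].sandbox.docker.env
}): Env {
  if (!opts.sandboxed) {
    return { ...opts.hostEnv }; // host runs see host-side settings
  }
  // Sandboxed runs start empty: nothing from hostEnv is inherited.
  return { ...opts.defaultsDockerEnv, ...opts.agentDockerEnv };
}
```

In this model, a `GEMINI_API_KEY` set only on the host is absent inside the sandbox until it is added to the Docker `env`. That absence is what surfaces as `apiKey not configured`.
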
## Related

<CardGroup cols={2}>
<Card title="Skills" href="/tools/skills" icon="puzzle-piece">
What skills are and how they load.
</Card>
<Card title="Creating skills" href="/tools/creating-skills" icon="hammer">
Authoring custom skill packs.
</Card>
<Card title="Slash commands" href="/tools/slash-commands" icon="terminal">
Native command catalog and chat directives.
</Card>
<Card title="Configuration reference" href="/gateway/configuration-reference" icon="gear">
Full `skills` and `agents.skills` schema.
</Card>
</CardGroup>