docs: replace english locale mirrors with translated landing pages

This commit is contained in:
xvlad
2026-02-05 17:51:15 -03:00
committed by Peter Steinberger
parent 8bccf9e8ed
commit 72676d318e
613 changed files with 34 additions and 110885 deletions


@@ -1,114 +0,0 @@
---
summary: "How inbound audio/voice notes are downloaded, transcribed, and injected into replies"
read_when:
- Changing audio transcription or media handling
title: "Audio and Voice Notes"
---
# Audio / Voice Notes — 2026-01-17
## What works
- **Media understanding (audio)**: If audio understanding is enabled (or autodetected), OpenClaw:
1. Locates the first audio attachment (local path or URL) and downloads it if needed.
2. Enforces `maxBytes` before sending to each model entry.
3. Runs the first eligible model entry in order (provider or CLI).
4. If it fails or skips (size/timeout), it tries the next entry.
5. On success, it replaces `Body` with an `[Audio]` block and sets `{{Transcript}}`.
- **Command parsing**: When transcription succeeds, `CommandBody`/`RawBody` are set to the transcript so slash commands still work.
- **Verbose logging**: In `--verbose`, we log when transcription runs and when it replaces the body.
## Auto-detection (default)
If you **don't configure models** and `tools.media.audio.enabled` is **not** set to `false`,
OpenClaw auto-detects in this order and stops at the first working option:
1. **Local CLIs** (if installed)
- `sherpa-onnx-offline` (requires `SHERPA_ONNX_MODEL_DIR` with encoder/decoder/joiner/tokens)
- `whisper-cli` (from `whisper-cpp`; uses `WHISPER_CPP_MODEL` or the bundled tiny model)
- `whisper` (Python CLI; downloads models automatically)
2. **Gemini CLI** (`gemini`) using `read_many_files`
3. **Provider keys** (OpenAI → Groq → Deepgram → Google)
To disable auto-detection, set `tools.media.audio.enabled: false`.
To customize, set `tools.media.audio.models`.
Note: Binary detection is best-effort across macOS/Linux/Windows; ensure the CLI is on `PATH` (we expand `~`), or set an explicit CLI model with a full command path.
## Config examples
### Provider + CLI fallback (OpenAI + Whisper CLI)
```json5
{
tools: {
media: {
audio: {
enabled: true,
maxBytes: 20971520,
models: [
{ provider: "openai", model: "gpt-4o-mini-transcribe" },
{
type: "cli",
command: "whisper",
args: ["--model", "base", "{{MediaPath}}"],
timeoutSeconds: 45,
},
],
},
},
},
}
```
### Provider-only with scope gating
```json5
{
tools: {
media: {
audio: {
enabled: true,
scope: {
default: "allow",
rules: [{ action: "deny", match: { chatType: "group" } }],
},
models: [{ provider: "openai", model: "gpt-4o-mini-transcribe" }],
},
},
},
}
```
### Provider-only (Deepgram)
```json5
{
tools: {
media: {
audio: {
enabled: true,
models: [{ provider: "deepgram", model: "nova-3" }],
},
},
},
}
```
## Notes & limits
- Provider auth follows the standard model auth order (auth profiles, env vars, `models.providers.*.apiKey`).
- Deepgram picks up `DEEPGRAM_API_KEY` when `provider: "deepgram"` is used.
- Deepgram setup details: [Deepgram (audio transcription)](/providers/deepgram).
- Audio providers can override `baseUrl`, `headers`, and `providerOptions` via `tools.media.audio`.
- Default size cap is 20MB (`tools.media.audio.maxBytes`). Oversize audio is skipped for that model and the next entry is tried.
- Default `maxChars` for audio is **unset** (full transcript). Set `tools.media.audio.maxChars` or per-entry `maxChars` to trim output.
- OpenAI auto default is `gpt-4o-mini-transcribe`; set `model: "gpt-4o-transcribe"` for higher accuracy.
- Use `tools.media.audio.attachments` to process multiple voice notes (`mode: "all"` + `maxAttachments`).
- Transcript is available to templates as `{{Transcript}}`.
- CLI stdout is capped (5MB); keep CLI output concise.
## Gotchas
- Scope rules are evaluated first-match-wins. `chatType` is normalized to `direct`, `group`, or `room`.
- Ensure your CLI exits 0 and prints plain text; if it emits JSON, extract the text field (e.g. pipe through `jq -r .text`; see the sketch after this list).
- Keep timeouts reasonable (`timeoutSeconds`, default 60s) to avoid blocking the reply queue.
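If your transcription CLI only emits JSON, one option is to wrap it in a shell pipeline so the entry still yields plain text. A minimal sketch (the `my-transcriber` binary and the `sh -c`/`jq` wrapper are illustrative, not built-ins):
```json5
{
  tools: {
    media: {
      audio: {
        models: [
          {
            type: "cli",
            // Hypothetical wrapper: run a JSON-emitting CLI, then extract plain text.
            command: "sh",
            args: ["-c", "my-transcriber --json '{{MediaPath}}' | jq -r .text"],
            timeoutSeconds: 60,
          },
        ],
      },
    },
  },
}
```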


@@ -1,156 +0,0 @@
---
summary: "Camera capture (iOS node + macOS app) for agent use: photos (jpg) and short video clips (mp4)"
read_when:
- Adding or modifying camera capture on iOS nodes or macOS
- Extending agent-accessible MEDIA temp-file workflows
title: "Camera Capture"
---
# Camera capture (agent)
OpenClaw supports **camera capture** for agent workflows:
- **iOS node** (paired via Gateway): capture a **photo** (`jpg`) or **short video clip** (`mp4`, with optional audio) via `node.invoke`.
- **Android node** (paired via Gateway): capture a **photo** (`jpg`) or **short video clip** (`mp4`, with optional audio) via `node.invoke`.
- **macOS app** (node via Gateway): capture a **photo** (`jpg`) or **short video clip** (`mp4`, with optional audio) via `node.invoke`.
All camera access is gated behind **user-controlled settings**.
## iOS node
### User setting (default on)
- iOS Settings tab → **Camera** → **Allow Camera** (`camera.enabled`)
- Default: **on** (missing key is treated as enabled).
- When off: `camera.*` commands return `CAMERA_DISABLED`.
### Commands (via Gateway `node.invoke`)
- `camera.list`
- Response payload:
- `devices`: array of `{ id, name, position, deviceType }`
- `camera.snap`
- Params:
- `facing`: `front|back` (default: `front`)
- `maxWidth`: number (optional; default `1600` on the iOS node)
- `quality`: `0..1` (optional; default `0.9`)
- `format`: currently `jpg`
- `delayMs`: number (optional; default `0`)
- `deviceId`: string (optional; from `camera.list`)
- Response payload:
- `format: "jpg"`
- `base64: "<...>"`
- `width`, `height`
- Payload guard: photos are recompressed to keep the base64 payload under 5 MB.
- `camera.clip`
- Params:
- `facing`: `front|back` (default: `front`)
- `durationMs`: number (default `3000`, clamped to a max of `60000`)
- `includeAudio`: boolean (default `true`)
- `format`: currently `mp4`
- `deviceId`: string (optional; from `camera.list`)
- Response payload:
- `format: "mp4"`
- `base64: "<...>"`
- `durationMs`
- `hasAudio`
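For ad-hoc testing, these commands can also be driven through the generic raw-RPC helper (`openclaw nodes invoke`); the parameter values below are illustrative:
```bash
# Hypothetical values; get real device ids from camera.list
openclaw nodes invoke --node <idOrNameOrIp> --command camera.snap \
  --params '{"facing":"back","maxWidth":1280,"quality":0.8}'
openclaw nodes invoke --node <idOrNameOrIp> --command camera.clip \
  --params '{"durationMs":5000,"includeAudio":false}'
```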
### Foreground requirement
Like `canvas.*`, the iOS node only allows `camera.*` commands in the **foreground**. Background invocations return `NODE_BACKGROUND_UNAVAILABLE`.
### CLI helper (temp files + MEDIA)
The easiest way to get attachments is via the CLI helper, which writes decoded media to a temp file and prints `MEDIA:<path>`.
Examples:
```bash
openclaw nodes camera snap --node <id> # default: both front + back (2 MEDIA lines)
openclaw nodes camera snap --node <id> --facing front
openclaw nodes camera clip --node <id> --duration 3000
openclaw nodes camera clip --node <id> --no-audio
```
Notes:
- `nodes camera snap` defaults to **both** facings to give the agent both views.
- Output files are temporary (in the OS temp directory) unless you build your own wrapper.
## Android node
### User setting (default on)
- Android Settings sheet → **Camera** → **Allow Camera** (`camera.enabled`)
- Default: **on** (missing key is treated as enabled).
- When off: `camera.*` commands return `CAMERA_DISABLED`.
### Permissions
- Android requires runtime permissions:
- `CAMERA` for both `camera.snap` and `camera.clip`.
- `RECORD_AUDIO` for `camera.clip` when `includeAudio=true`.
If permissions are missing, the app will prompt when possible; if denied, `camera.*` requests fail with a
`*_PERMISSION_REQUIRED` error.
### Foreground requirement
Like `canvas.*`, the Android node only allows `camera.*` commands in the **foreground**. Background invocations return `NODE_BACKGROUND_UNAVAILABLE`.
### Payload guard
Photos are recompressed to keep the base64 payload under 5 MB.
## macOS app
### User setting (default off)
The macOS companion app exposes a checkbox:
- **Settings → General → Allow Camera** (`openclaw.cameraEnabled`)
- Default: **off**
- When off: camera requests return “Camera disabled by user”.
### CLI helper (node invoke)
Use the main `openclaw` CLI to invoke camera commands on the macOS node.
Examples:
```bash
openclaw nodes camera list --node <id> # list camera ids
openclaw nodes camera snap --node <id> # prints MEDIA:<path>
openclaw nodes camera snap --node <id> --max-width 1280
openclaw nodes camera snap --node <id> --delay-ms 2000
openclaw nodes camera snap --node <id> --device-id <id>
openclaw nodes camera clip --node <id> --duration 10s # prints MEDIA:<path>
openclaw nodes camera clip --node <id> --duration-ms 3000 # prints MEDIA:<path> (legacy flag)
openclaw nodes camera clip --node <id> --device-id <id>
openclaw nodes camera clip --node <id> --no-audio
```
Notes:
- `openclaw nodes camera snap` defaults to `maxWidth=1600` unless overridden.
- On macOS, `camera.snap` waits `delayMs` (default 2000ms) after warm-up/exposure settling before capturing.
- Photo payloads are recompressed to keep base64 under 5 MB.
## Safety + practical limits
- Camera and microphone access trigger the usual OS permission prompts (and require usage strings in Info.plist).
- Video clips are capped (currently `<= 60s`) to avoid oversized node payloads (base64 overhead + message limits).
## macOS screen video (OS-level)
For _screen_ video (not camera), use the macOS companion:
```bash
openclaw nodes screen record --node <id> --duration 10s --fps 15 # prints MEDIA:<path>
```
Notes:
- Requires macOS **Screen Recording** permission (TCC).


@@ -1,72 +0,0 @@
---
summary: "Image and media handling rules for send, gateway, and agent replies"
read_when:
- Modifying media pipeline or attachments
title: "Image and Media Support"
---
# Image & Media Support — 2025-12-05
The WhatsApp channel runs via **Baileys Web**. This document captures the current media handling rules for send, gateway, and agent replies.
## Goals
- Send media with optional captions via `openclaw message send --media`.
- Allow auto-replies from the web inbox to include media alongside text.
- Keep per-type limits sane and predictable.
## CLI Surface
- `openclaw message send --media <path-or-url> [--message <caption>]`
- `--media` optional; caption can be empty for media-only sends.
- `--dry-run` prints the resolved payload; `--json` emits `{ channel, to, messageId, mediaUrl, caption }`.
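For example (only the flags documented above are shown; channel/recipient selection flags are omitted):
```bash
# Media-only send (empty caption is allowed)
openclaw message send --media ./photo.jpg

# Caption plus payload preview, without actually sending
openclaw message send --media https://example.com/clip.mp4 \
  --message "Check this out" --dry-run --json
```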
## WhatsApp Web channel behavior
- Input: local file path **or** HTTP(S) URL.
- Flow: load into a Buffer, detect media kind, and build the correct payload:
- **Images:** resize & recompress to JPEG (max side 2048px) targeting `agents.defaults.mediaMaxMb` (default 5MB), capped at 6MB.
- **Audio/Voice/Video:** pass-through up to 16MB; audio is sent as a voice note (`ptt: true`).
- **Documents:** anything else, up to 100MB, with filename preserved when available.
- WhatsApp GIF-style playback: send an MP4 with `gifPlayback: true` (CLI: `--gif-playback`) so mobile clients loop inline.
- MIME detection prefers magic bytes, then headers, then file extension.
- Caption comes from `--message` or `reply.text`; empty caption is allowed.
- Logging: non-verbose shows `↩️`/`✅`; verbose includes size and source path/URL.
## Auto-Reply Pipeline
- `getReplyFromConfig` returns `{ text?, mediaUrl?, mediaUrls? }`.
- When media is present, the web sender resolves local paths or URLs using the same pipeline as `openclaw message send`.
- Multiple media entries are sent sequentially if provided.
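As a sketch, a reply that fans out text plus two media entries would look like this (values illustrative):
```json
{
  "text": "Here are both renders",
  "mediaUrls": ["./renders/a.png", "https://example.com/b.png"]
}
```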
## Inbound Media to Commands (Pi)
- When inbound web messages include media, OpenClaw downloads to a temp file and exposes templating variables:
- `{{MediaUrl}}` pseudo-URL for the inbound media.
- `{{MediaPath}}` local temp path written before running the command.
- When a per-session Docker sandbox is enabled, inbound media is copied into the sandbox workspace and `MediaPath`/`MediaUrl` are rewritten to a relative path like `media/inbound/<filename>`.
- Media understanding (if configured via `tools.media.*` or shared `tools.media.models`) runs before templating and can insert `[Image]`, `[Audio]`, and `[Video]` blocks into `Body`.
- Audio sets `{{Transcript}}` and uses the transcript for command parsing so slash commands still work.
- Video and image descriptions preserve any caption text for command parsing.
- By default only the first matching image/audio/video attachment is processed; set `tools.media.<cap>.attachments` to process multiple attachments.
## Limits & Errors
**Outbound send caps (WhatsApp web send)**
- Images: ~6MB cap after recompression.
- Audio/voice/video: 16MB cap; documents: 100MB cap.
- Oversize or unreadable media → clear error in logs and the reply is skipped.
**Media understanding caps (transcription/description)**
- Image default: 10MB (`tools.media.image.maxBytes`).
- Audio default: 20MB (`tools.media.audio.maxBytes`).
- Video default: 50MB (`tools.media.video.maxBytes`).
- Oversize media skips understanding, but replies still go through with the original body.
## Notes for Tests
- Cover send + reply flows for image/audio/document cases.
- Validate recompression for images (size bound) and voice-note flag for audio.
- Ensure multi-media replies fan out as sequential sends.


@@ -1,341 +0,0 @@
---
summary: "Nodes: pairing, capabilities, permissions, and CLI helpers for canvas/camera/screen/system"
read_when:
- Pairing iOS/Android nodes to a gateway
- Using node canvas/camera for agent context
- Adding new node commands or CLI helpers
title: "Nodes"
---
# Nodes
A **node** is a companion device (macOS/iOS/Android/headless) that connects to the Gateway **WebSocket** (same port as operators) with `role: "node"` and exposes a command surface (e.g. `canvas.*`, `camera.*`, `system.*`) via `node.invoke`. Protocol details: [Gateway protocol](/gateway/protocol).
Legacy transport: [Bridge protocol](/gateway/bridge-protocol) (TCP JSONL; deprecated/removed for current nodes).
macOS can also run in **node mode**: the menubar app connects to the Gateway's WS server and exposes its local canvas/camera commands as a node (so `openclaw nodes …` works against this Mac).
Notes:
- Nodes are **peripherals**, not gateways. They don't run the gateway service.
- Telegram/WhatsApp/etc. messages land on the **gateway**, not on nodes.
## Pairing + status
**WS nodes use device pairing.** Nodes present a device identity during `connect`; the Gateway
creates a device pairing request for `role: node`. Approve via the devices CLI (or UI).
Quick CLI:
```bash
openclaw devices list
openclaw devices approve <requestId>
openclaw devices reject <requestId>
openclaw nodes status
openclaw nodes describe --node <idOrNameOrIp>
```
Notes:
- `nodes status` marks a node as **paired** when its device pairing role includes `node`.
- `node.pair.*` (CLI: `openclaw nodes pending/approve/reject`) is a separate gateway-owned
node pairing store; it does **not** gate the WS `connect` handshake.
## Remote node host (system.run)
Use a **node host** when your Gateway runs on one machine and you want commands
to execute on another. The model still talks to the **gateway**; the gateway
forwards `exec` calls to the **node host** when `host=node` is selected.
### What runs where
- **Gateway host**: receives messages, runs the model, routes tool calls.
- **Node host**: executes `system.run`/`system.which` on the node machine.
- **Approvals**: enforced on the node host via `~/.openclaw/exec-approvals.json`.
### Start a node host (foreground)
On the node machine:
```bash
openclaw node run --host <gateway-host> --port 18789 --display-name "Build Node"
```
### Remote gateway via SSH tunnel (loopback bind)
If the Gateway binds to loopback (`gateway.bind=loopback`, default in local mode),
remote node hosts cannot connect directly. Create an SSH tunnel and point the
node host at the local end of the tunnel.
Example (node host -> gateway host):
```bash
# Terminal A (keep running): forward local 18790 -> gateway 127.0.0.1:18789
ssh -N -L 18790:127.0.0.1:18789 user@gateway-host
# Terminal B: export the gateway token and connect through the tunnel
export OPENCLAW_GATEWAY_TOKEN="<gateway-token>"
openclaw node run --host 127.0.0.1 --port 18790 --display-name "Build Node"
```
Notes:
- The token is `gateway.auth.token` from the gateway config (`~/.openclaw/openclaw.json` on the gateway host).
- `openclaw node run` reads `OPENCLAW_GATEWAY_TOKEN` for auth.
### Start a node host (service)
```bash
openclaw node install --host <gateway-host> --port 18789 --display-name "Build Node"
openclaw node restart
```
### Pair + name
On the gateway host:
```bash
openclaw nodes pending
openclaw nodes approve <requestId>
openclaw nodes list
```
Naming options:
- `--display-name` on `openclaw node run` / `openclaw node install` (persists in `~/.openclaw/node.json` on the node).
- `openclaw nodes rename --node <id|name|ip> --name "Build Node"` (gateway override).
### Allowlist the commands
Exec approvals are **per node host**. Add allowlist entries from the gateway:
```bash
openclaw approvals allowlist add --node <id|name|ip> "/usr/bin/uname"
openclaw approvals allowlist add --node <id|name|ip> "/usr/bin/sw_vers"
```
Approvals live on the node host at `~/.openclaw/exec-approvals.json`.
### Point exec at the node
Configure defaults (gateway config):
```bash
openclaw config set tools.exec.host node
openclaw config set tools.exec.security allowlist
openclaw config set tools.exec.node "<id-or-name>"
```
Or per session:
```
/exec host=node security=allowlist node=<id-or-name>
```
Once set, any `exec` call with `host=node` runs on the node host (subject to the
node allowlist/approvals).
Related:
- [Node host CLI](/cli/node)
- [Exec tool](/tools/exec)
- [Exec approvals](/tools/exec-approvals)
## Invoking commands
Low-level (raw RPC):
```bash
openclaw nodes invoke --node <idOrNameOrIp> --command canvas.eval --params '{"javaScript":"location.href"}'
```
Higher-level helpers exist for the common “give the agent a MEDIA attachment” workflows.
## Screenshots (canvas snapshots)
If the node is showing the Canvas (WebView), `canvas.snapshot` returns `{ format, base64 }`.
CLI helper (writes to a temp file and prints `MEDIA:<path>`):
```bash
openclaw nodes canvas snapshot --node <idOrNameOrIp> --format png
openclaw nodes canvas snapshot --node <idOrNameOrIp> --format jpg --max-width 1200 --quality 0.9
```
### Canvas controls
```bash
openclaw nodes canvas present --node <idOrNameOrIp> --target https://example.com
openclaw nodes canvas hide --node <idOrNameOrIp>
openclaw nodes canvas navigate https://example.com --node <idOrNameOrIp>
openclaw nodes canvas eval --node <idOrNameOrIp> --js "document.title"
```
Notes:
- `canvas present` accepts URLs or local file paths (`--target`), plus optional `--x/--y/--width/--height` for positioning.
- `canvas eval` accepts inline JS (`--js`) or a positional arg.
### A2UI (Canvas)
```bash
openclaw nodes canvas a2ui push --node <idOrNameOrIp> --text "Hello"
openclaw nodes canvas a2ui push --node <idOrNameOrIp> --jsonl ./payload.jsonl
openclaw nodes canvas a2ui reset --node <idOrNameOrIp>
```
Notes:
- Only A2UI v0.8 JSONL is supported (v0.9/createSurface is rejected).
## Photos + videos (node camera)
Photos (`jpg`):
```bash
openclaw nodes camera list --node <idOrNameOrIp>
openclaw nodes camera snap --node <idOrNameOrIp> # default: both facings (2 MEDIA lines)
openclaw nodes camera snap --node <idOrNameOrIp> --facing front
```
Video clips (`mp4`):
```bash
openclaw nodes camera clip --node <idOrNameOrIp> --duration 10s
openclaw nodes camera clip --node <idOrNameOrIp> --duration 3000 --no-audio
```
Notes:
- The node must be **foregrounded** for `canvas.*` and `camera.*` (background calls return `NODE_BACKGROUND_UNAVAILABLE`).
- Clip duration is clamped (currently `<= 60s`) to avoid oversized base64 payloads.
- Android will prompt for `CAMERA`/`RECORD_AUDIO` permissions when possible; denied permissions fail with `*_PERMISSION_REQUIRED`.
## Screen recordings (nodes)
Nodes expose `screen.record` (mp4). Example:
```bash
openclaw nodes screen record --node <idOrNameOrIp> --duration 10s --fps 10
openclaw nodes screen record --node <idOrNameOrIp> --duration 10s --fps 10 --no-audio
```
Notes:
- `screen.record` requires the node app to be foregrounded.
- Android will show the system screen-capture prompt before recording.
- Screen recordings are clamped to `<= 60s`.
- `--no-audio` disables microphone capture (supported on iOS/Android; macOS uses system capture audio).
- Use `--screen <index>` to select a display when multiple screens are available.
## Location (nodes)
Nodes expose `location.get` when Location is enabled in settings.
CLI helper:
```bash
openclaw nodes location get --node <idOrNameOrIp>
openclaw nodes location get --node <idOrNameOrIp> --accuracy precise --max-age 15000 --location-timeout 10000
```
Notes:
- Location is **off by default**.
- “Always” requires system permission; background fetch is best-effort.
- The response includes lat/lon, accuracy (meters), and timestamp.
## SMS (Android nodes)
Android nodes can expose `sms.send` when the user grants **SMS** permission and the device supports telephony.
Low-level invoke:
```bash
openclaw nodes invoke --node <idOrNameOrIp> --command sms.send --params '{"to":"+15555550123","message":"Hello from OpenClaw"}'
```
Notes:
- The permission prompt must be accepted on the Android device before the capability is advertised.
- Wi-Fi-only devices without telephony will not advertise `sms.send`.
## System commands (node host / mac node)
The macOS node exposes `system.run`, `system.notify`, and `system.execApprovals.get/set`.
The headless node host exposes `system.run`, `system.which`, and `system.execApprovals.get/set`.
Examples:
```bash
openclaw nodes run --node <idOrNameOrIp> -- echo "Hello from mac node"
openclaw nodes notify --node <idOrNameOrIp> --title "Ping" --body "Gateway ready"
```
Notes:
- `system.run` returns stdout/stderr/exit code in the payload.
- `system.notify` respects notification permission state on the macOS app.
- `system.run` supports `--cwd`, `--env KEY=VAL`, `--command-timeout`, and `--needs-screen-recording`.
- `system.notify` supports `--priority <passive|active|timeSensitive>` and `--delivery <system|overlay|auto>`.
- macOS nodes drop `PATH` overrides; headless node hosts only accept `PATH` when it prepends the node host PATH.
- On macOS node mode, `system.run` is gated by exec approvals in the macOS app (Settings → Exec approvals).
Ask/allowlist/full behave the same as the headless node host; denied prompts return `SYSTEM_RUN_DENIED`.
- On headless node host, `system.run` is gated by exec approvals (`~/.openclaw/exec-approvals.json`).
## Exec node binding
When multiple nodes are available, you can bind exec to a specific node.
This sets the default node for `exec host=node` (and can be overridden per agent).
Global default:
```bash
openclaw config set tools.exec.node "node-id-or-name"
```
Per-agent override:
```bash
openclaw config get agents.list
openclaw config set agents.list[0].tools.exec.node "node-id-or-name"
```
Unset to allow any node:
```bash
openclaw config unset tools.exec.node
openclaw config unset agents.list[0].tools.exec.node
```
## Permissions map
Nodes may include a `permissions` map in `node.list` / `node.describe`, keyed by permission name (e.g. `screenRecording`, `accessibility`) with boolean values (`true` = granted).
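As an illustrative fragment of a `node.describe` response:
```json
{ "permissions": { "screenRecording": true, "accessibility": false } }
```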
## Headless node host (cross-platform)
OpenClaw can run a **headless node host** (no UI) that connects to the Gateway
WebSocket and exposes `system.run` / `system.which`. This is useful on Linux/Windows
or for running a minimal node alongside a server.
Start it:
```bash
openclaw node run --host <gateway-host> --port 18789
```
Notes:
- Pairing is still required (the Gateway will show a node approval prompt).
- The node host stores its node id, token, display name, and gateway connection info in `~/.openclaw/node.json`.
- Exec approvals are enforced locally via `~/.openclaw/exec-approvals.json`
(see [Exec approvals](/tools/exec-approvals)).
- On macOS, the headless node host prefers the companion app exec host when reachable and falls
back to local execution if the app is unavailable. Set `OPENCLAW_NODE_EXEC_HOST=app` to require
the app, or `OPENCLAW_NODE_EXEC_FALLBACK=0` to disable fallback.
- Add `--tls` / `--tls-fingerprint` when the Gateway WS uses TLS.
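For the macOS exec-host preference above, set the environment variables when starting the node host, e.g.:
```bash
# Require the companion app exec host (fail rather than exec locally)
OPENCLAW_NODE_EXEC_HOST=app openclaw node run --host <gateway-host> --port 18789

# Keep the app preference but disable the local-exec fallback
OPENCLAW_NODE_EXEC_FALLBACK=0 openclaw node run --host <gateway-host> --port 18789
```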
## Mac node mode
- The macOS menubar app connects to the Gateway WS server as a node (so `openclaw nodes …` works against this Mac).
- In remote mode, the app opens an SSH tunnel for the Gateway port and connects to `localhost`.


@@ -1,113 +0,0 @@
---
summary: "Location command for nodes (location.get), permission modes, and background behavior"
read_when:
- Adding location node support or permissions UI
- Designing background location + push flows
title: "Location Command"
---
# Location command (nodes)
## TL;DR
- `location.get` is a node command (via `node.invoke`).
- Off by default.
- Settings use a selector: Off / While Using / Always.
- Separate toggle: Precise Location.
## Why a selector (not just a switch)
OS permissions are multi-level. We can expose a selector in-app, but the OS still decides the actual grant.
- iOS/macOS: user can choose **While Using** or **Always** in system prompts/Settings. App can request upgrade, but OS may require Settings.
- Android: background location is a separate permission; on Android 10+ it often requires a Settings flow.
- Precise location is a separate grant (iOS 14+ “Precise”, Android “fine” vs “coarse”).
The in-app selector drives the requested mode; the actual grant lives in OS settings.
## Settings model
Per node device:
- `location.enabledMode`: `off | whileUsing | always`
- `location.preciseEnabled`: bool
UI behavior:
- Selecting `whileUsing` requests foreground permission.
- Selecting `always` first ensures `whileUsing`, then requests background (or sends user to Settings if required).
- If OS denies requested level, revert to the highest granted level and show status.
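A sketch of the resulting per-device settings (keys as above; storage format is device-specific):
```json
{ "location": { "enabledMode": "whileUsing", "preciseEnabled": true } }
```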
## Permissions mapping (node.permissions)
Optional. macOS node reports `location` via the permissions map; iOS/Android may omit it.
## Command: `location.get`
Called via `node.invoke`.
Params (suggested):
```json
{
"timeoutMs": 10000,
"maxAgeMs": 15000,
"desiredAccuracy": "coarse|balanced|precise"
}
```
Response payload:
```json
{
"lat": 48.20849,
"lon": 16.37208,
"accuracyMeters": 12.5,
"altitudeMeters": 182.0,
"speedMps": 0.0,
"headingDeg": 270.0,
"timestamp": "2026-01-03T12:34:56.000Z",
"isPrecise": true,
"source": "gps|wifi|cell|unknown"
}
```
Errors (stable codes):
- `LOCATION_DISABLED`: selector is off.
- `LOCATION_PERMISSION_REQUIRED`: permission missing for requested mode.
- `LOCATION_BACKGROUND_UNAVAILABLE`: app is backgrounded but only While Using allowed.
- `LOCATION_TIMEOUT`: no fix in time.
- `LOCATION_UNAVAILABLE`: system failure / no providers.
## Background behavior (future)
Goal: model can request location even when node is backgrounded, but only when:
- User selected **Always**.
- OS grants background location.
- App is allowed to run in background for location (iOS background mode / Android foreground service or special allowance).
Push-triggered flow (future):
1. Gateway sends a push to the node (silent push or FCM data).
2. Node wakes briefly and requests location from the device.
3. Node forwards payload to Gateway.
Notes:
- iOS: Always permission + background location mode required. Silent push may be throttled; expect intermittent failures.
- Android: background location may require a foreground service; otherwise, expect denial.
## Model/tooling integration
- Tool surface: `nodes` tool adds `location_get` action (node required).
- CLI: `openclaw nodes location get --node <id>`.
- Agent guidelines: only call when user enabled location and understands the scope.
## UX copy (suggested)
- Off: “Location sharing is disabled.”
- While Using: “Only when OpenClaw is open.”
- Always: “Allow background location. Requires system permission.”
- Precise: “Use precise GPS location. Toggle off to share approximate location.”


@@ -1,379 +0,0 @@
---
summary: "Inbound image/audio/video understanding (optional) with provider + CLI fallbacks"
read_when:
- Designing or refactoring media understanding
- Tuning inbound audio/video/image preprocessing
title: "Media Understanding"
---
# Media Understanding (Inbound) — 2026-01-17
OpenClaw can **summarize inbound media** (image/audio/video) before the reply pipeline runs. It autodetects when local tools or provider keys are available, and can be disabled or customized. If understanding is off, models still receive the original files/URLs as usual.
## Goals
- Optional: predigest inbound media into short text for faster routing + better command parsing.
- Preserve original media delivery to the model (always).
- Support **provider APIs** and **CLI fallbacks**.
- Allow multiple models with ordered fallback (error/size/timeout).
## High-level behavior
1. Collect inbound attachments (`MediaPaths`, `MediaUrls`, `MediaTypes`).
2. For each enabled capability (image/audio/video), select attachments per policy (default: **first**).
3. Choose the first eligible model entry (size + capability + auth).
4. If a model fails or the media is too large, **fall back to the next entry**.
5. On success:
- `Body` becomes `[Image]`, `[Audio]`, or `[Video]` block.
- Audio sets `{{Transcript}}`; command parsing uses caption text when present,
otherwise the transcript.
- Captions are preserved as `User text:` inside the block.
If understanding fails or is disabled, **the reply flow continues** with the original body + attachments.
## Config overview
`tools.media` supports **shared models** plus per-capability overrides:
- `tools.media.models`: shared model list (use `capabilities` to gate).
- `tools.media.image` / `tools.media.audio` / `tools.media.video`:
- defaults (`prompt`, `maxChars`, `maxBytes`, `timeoutSeconds`, `language`)
- provider overrides (`baseUrl`, `headers`, `providerOptions`)
- Deepgram audio options via `tools.media.audio.providerOptions.deepgram`
- optional **per-capability `models` list** (tried before the shared list)
- `attachments` policy (`mode`, `maxAttachments`, `prefer`)
- `scope` (optional gating by channel/chatType/session key)
- `tools.media.concurrency`: max concurrent capability runs (default **2**).
```json5
{
tools: {
media: {
models: [
/* shared list */
],
image: {
/* optional overrides */
},
audio: {
/* optional overrides */
},
video: {
/* optional overrides */
},
},
},
}
```
### Model entries
Each `models[]` entry can be **provider** or **CLI**:
```json5
{
type: "provider", // default if omitted
provider: "openai",
model: "gpt-5.2",
prompt: "Describe the image in <= 500 chars.",
maxChars: 500,
maxBytes: 10485760,
timeoutSeconds: 60,
capabilities: ["image"], // optional, used for multimodal entries
profile: "vision-profile",
preferredProfile: "vision-fallback",
}
```
```json5
{
type: "cli",
command: "gemini",
args: [
"-m",
"gemini-3-flash",
"--allowed-tools",
"read_file",
"Read the media at {{MediaPath}} and describe it in <= {{MaxChars}} characters.",
],
maxChars: 500,
maxBytes: 52428800,
timeoutSeconds: 120,
capabilities: ["video", "image"],
}
```
CLI templates can also use:
- `{{MediaDir}}` (directory containing the media file)
- `{{OutputDir}}` (scratch dir created for this run)
- `{{OutputBase}}` (scratch file base path, no extension)
## Defaults and limits
Recommended defaults:
- `maxChars`: **500** for image/video (short, command-friendly)
- `maxChars`: **unset** for audio (full transcript unless you set a limit)
- `maxBytes`:
- image: **10MB**
- audio: **20MB**
- video: **50MB**
Rules:
- If media exceeds `maxBytes`, that model is skipped and the **next model is tried**.
- If the model returns more than `maxChars`, output is trimmed.
- `prompt` defaults to a simple “Describe the {media}.” plus the `maxChars` guidance (image/video only).
- If `<capability>.enabled: true` but no models are configured, OpenClaw tries the
**active reply model** when its provider supports the capability.
### Auto-detect media understanding (default)
If `tools.media.<capability>.enabled` is **not** set to `false` and you haven't
configured models, OpenClaw auto-detects in this order and **stops at the first
working option**:
1. **Local CLIs** (audio only; if installed)
- `sherpa-onnx-offline` (requires `SHERPA_ONNX_MODEL_DIR` with encoder/decoder/joiner/tokens)
- `whisper-cli` (`whisper-cpp`; uses `WHISPER_CPP_MODEL` or the bundled tiny model)
- `whisper` (Python CLI; downloads models automatically)
2. **Gemini CLI** (`gemini`) using `read_many_files`
3. **Provider keys**
- Audio: OpenAI → Groq → Deepgram → Google
- Image: OpenAI → Anthropic → Google → MiniMax
- Video: Google
To disable auto-detection, set:
```json5
{
tools: {
media: {
audio: {
enabled: false,
},
},
},
}
```
Note: Binary detection is best-effort across macOS/Linux/Windows; ensure the CLI is on `PATH` (we expand `~`), or set an explicit CLI model with a full command path.
## Capabilities (optional)
If you set `capabilities`, the entry only runs for those media types. For shared
lists, OpenClaw can infer defaults:
- `openai`, `anthropic`, `minimax`: **image**
- `google` (Gemini API): **image + audio + video**
- `groq`: **audio**
- `deepgram`: **audio**
For CLI entries, **set `capabilities` explicitly** to avoid surprising matches.
If you omit `capabilities`, the entry is eligible for the list it appears in.
## Provider support matrix (OpenClaw integrations)
| Capability | Provider integration | Notes |
| ---------- | ------------------------------------------------ | ------------------------------------------------- |
| Image | OpenAI / Anthropic / Google / others via `pi-ai` | Any image-capable model in the registry works. |
| Audio | OpenAI, Groq, Deepgram, Google | Provider transcription (Whisper/Deepgram/Gemini). |
| Video | Google (Gemini API) | Provider video understanding. |
## Recommended providers
**Image**
- Prefer your active model if it supports images.
- Good defaults: `openai/gpt-5.2`, `anthropic/claude-opus-4-5`, `google/gemini-3-pro-preview`.
**Audio**
- `openai/gpt-4o-mini-transcribe`, `groq/whisper-large-v3-turbo`, or `deepgram/nova-3`.
- CLI fallback: `whisper-cli` (whisper-cpp) or `whisper`.
- Deepgram setup: [Deepgram (audio transcription)](/providers/deepgram).
**Video**
- `google/gemini-3-flash-preview` (fast), `google/gemini-3-pro-preview` (richer).
- CLI fallback: `gemini` CLI (supports `read_file` on video/audio).
## Attachment policy
Per-capability `attachments` controls which attachments are processed:
- `mode`: `first` (default) or `all`
- `maxAttachments`: cap the number processed (default **1**)
- `prefer`: `first`, `last`, `path`, `url`
When `mode: "all"`, outputs are labeled `[Image 1/2]`, `[Audio 2/2]`, etc.
## Config examples
### 1) Shared models list + overrides
```json5
{
tools: {
media: {
models: [
{ provider: "openai", model: "gpt-5.2", capabilities: ["image"] },
{
provider: "google",
model: "gemini-3-flash-preview",
capabilities: ["image", "audio", "video"],
},
{
type: "cli",
command: "gemini",
args: [
"-m",
"gemini-3-flash",
"--allowed-tools",
"read_file",
"Read the media at {{MediaPath}} and describe it in <= {{MaxChars}} characters.",
],
capabilities: ["image", "video"],
},
],
audio: {
attachments: { mode: "all", maxAttachments: 2 },
},
video: {
maxChars: 500,
},
},
},
}
```
### 2) Audio + Video only (image off)
```json5
{
tools: {
media: {
audio: {
enabled: true,
models: [
{ provider: "openai", model: "gpt-4o-mini-transcribe" },
{
type: "cli",
command: "whisper",
args: ["--model", "base", "{{MediaPath}}"],
},
],
},
video: {
enabled: true,
maxChars: 500,
models: [
{ provider: "google", model: "gemini-3-flash-preview" },
{
type: "cli",
command: "gemini",
args: [
"-m",
"gemini-3-flash",
"--allowed-tools",
"read_file",
"Read the media at {{MediaPath}} and describe it in <= {{MaxChars}} characters.",
],
},
],
},
},
},
}
```
### 3) Optional image understanding
```json5
{
tools: {
media: {
image: {
enabled: true,
maxBytes: 10485760,
maxChars: 500,
models: [
{ provider: "openai", model: "gpt-5.2" },
{ provider: "anthropic", model: "claude-opus-4-5" },
{
type: "cli",
command: "gemini",
args: [
"-m",
"gemini-3-flash",
"--allowed-tools",
"read_file",
"Read the media at {{MediaPath}} and describe it in <= {{MaxChars}} characters.",
],
},
],
},
},
},
}
```
### 4) Multimodal single entry (explicit capabilities)
```json5
{
tools: {
media: {
image: {
models: [
{
provider: "google",
model: "gemini-3-pro-preview",
capabilities: ["image", "video", "audio"],
},
],
},
audio: {
models: [
{
provider: "google",
model: "gemini-3-pro-preview",
capabilities: ["image", "video", "audio"],
},
],
},
video: {
models: [
{
provider: "google",
model: "gemini-3-pro-preview",
capabilities: ["image", "video", "audio"],
},
],
},
},
},
}
```
## Status output
When media understanding runs, `/status` includes a short summary line:
```
📎 Media: image ok (openai/gpt-5.2) · audio skipped (maxBytes)
```
This shows per-capability outcomes and the chosen provider/model when applicable.
## Notes
- Understanding is **best-effort**. Errors do not block replies.
- Attachments are still passed to models even when understanding is disabled.
- Use `scope` to limit where understanding runs (e.g. only DMs).
## Related docs
- [Configuration](/gateway/configuration)
- [Image & Media Support](/nodes/images)


@@ -1,90 +0,0 @@
---
summary: "Talk mode: continuous speech conversations with ElevenLabs TTS"
read_when:
- Implementing Talk mode on macOS/iOS/Android
- Changing voice/TTS/interrupt behavior
title: "Talk Mode"
---
# Talk Mode
Talk mode is a continuous voice conversation loop:
1. Listen for speech
2. Send transcript to the model (main session, chat.send)
3. Wait for the response
4. Speak it via ElevenLabs (streaming playback)
## Behavior (macOS)
- **Always-on overlay** while Talk mode is enabled.
- **Listening → Thinking → Speaking** phase transitions.
- On a **short pause** (silence window), the current transcript is sent.
- Replies are **written to WebChat** (same as typing).
- **Interrupt on speech** (default on): if the user starts talking while the assistant is speaking, we stop playback and note the interruption timestamp for the next prompt.
## Voice directives in replies
The assistant may prefix its reply with a **single JSON line** to control voice:
```json
{ "voice": "<voice-id>", "once": true }
```
Rules:
- First non-empty line only.
- Unknown keys are ignored.
- `once: true` applies to the current reply only.
- Without `once`, the voice becomes the new default for Talk mode.
- The JSON line is stripped before TTS playback.
Supported keys:
- `voice` / `voice_id` / `voiceId`
- `model` / `model_id` / `modelId`
- `speed`, `rate` (WPM), `stability`, `similarity`, `style`, `speakerBoost`
- `seed`, `normalize`, `lang`, `output_format`, `latency_tier`
- `once`
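For example, a directive that switches the voice and slows playback for the current reply only (placeholder id):
```json
{ "voice": "<voice-id>", "speed": 0.9, "once": true }
```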
## Config (`~/.openclaw/openclaw.json`)
```json5
{
talk: {
voiceId: "elevenlabs_voice_id",
modelId: "eleven_v3",
outputFormat: "mp3_44100_128",
apiKey: "elevenlabs_api_key",
interruptOnSpeech: true,
},
}
```
Defaults:
- `interruptOnSpeech`: true
- `voiceId`: falls back to `ELEVENLABS_VOICE_ID` / `SAG_VOICE_ID` (or first ElevenLabs voice when API key is available)
- `modelId`: defaults to `eleven_v3` when unset
- `apiKey`: falls back to `ELEVENLABS_API_KEY` (or gateway shell profile if available)
- `outputFormat`: defaults to `pcm_44100` on macOS/iOS and `pcm_24000` on Android (set `mp3_*` to force MP3 streaming)
## macOS UI
- Menu bar toggle: **Talk**
- Config tab: **Talk Mode** group (voice id + interrupt toggle)
- Overlay:
- **Listening**: cloud pulses with mic level
- **Thinking**: sinking animation
- **Speaking**: radiating rings
- Click cloud: stop speaking
- Click X: exit Talk mode
## Notes
- Requires Speech + Microphone permissions.
- Uses `chat.send` against session key `main`.
- TTS uses ElevenLabs streaming API with `ELEVENLABS_API_KEY` and incremental playback on macOS/iOS/Android for lower latency.
- `stability` for `eleven_v3` is validated to `0.0`, `0.5`, or `1.0`; other models accept `0..1`.
- `latency_tier` is validated to `0..4` when set.
- Android supports `pcm_16000`, `pcm_22050`, `pcm_24000`, and `pcm_44100` output formats for low-latency AudioTrack streaming.


@@ -1,65 +0,0 @@
---
summary: "Global voice wake words (Gateway-owned) and how they sync across nodes"
read_when:
- Changing voice wake words behavior or defaults
- Adding new node platforms that need wake word sync
title: "Voice Wake"
---
# Voice Wake (Global Wake Words)
OpenClaw treats **wake words as a single global list** owned by the **Gateway**.
- There are **no per-node custom wake words**.
- **Any node/app UI may edit** the list; changes are persisted by the Gateway and broadcast to everyone.
- Each device still keeps its own **Voice Wake enabled/disabled** toggle (local UX + permissions differ).
## Storage (Gateway host)
Wake words are stored on the gateway machine at:
- `~/.openclaw/settings/voicewake.json`
Shape:
```json
{ "triggers": ["openclaw", "claude", "computer"], "updatedAtMs": 1730000000000 }
```
## Protocol
### Methods
- `voicewake.get` → `{ triggers: string[] }`
- `voicewake.set` with params `{ triggers: string[] }` → `{ triggers: string[] }`
Notes:
- Triggers are normalized (trimmed, empties dropped). Empty lists fall back to defaults.
- Limits are enforced for safety (count/length caps).
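As a sketch, a `voicewake.set` call (WS envelope fields such as request ids are omitted; see the Gateway protocol doc for the exact framing):
```json
{ "method": "voicewake.set", "params": { "triggers": ["openclaw", "computer"] } }
```
The result echoes the persisted list: `{ "triggers": ["openclaw", "computer"] }`.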
### Events
- `voicewake.changed` payload `{ triggers: string[] }`
Who receives it:
- All WebSocket clients (macOS app, WebChat, etc.)
- All connected nodes (iOS/Android), and also on node connect as an initial “current state” push.
## Client behavior
### macOS app
- Uses the global list to gate `VoiceWakeRuntime` triggers.
- Editing “Trigger words” in Voice Wake settings calls `voicewake.set` and then relies on the broadcast to keep other clients in sync.
### iOS node
- Uses the global list for `VoiceWakeManager` trigger detection.
- Editing Wake Words in Settings calls `voicewake.set` (over the Gateway WS) and also keeps local wake-word detection responsive.
### Android node
- Exposes a Wake Words editor in Settings.
- Calls `voicewake.set` over the Gateway WS so edits sync everywhere.