From 743b69d307ef8d1f267df3851a60f855c9cc7269 Mon Sep 17 00:00:00 2001
From: Vincent Koc
Date: Thu, 23 Apr 2026 19:34:42 -0700
Subject: [PATCH] docs(tools): split browser docs by extracting control API and CLI reference

---
 docs/docs.json                |   1 +
 docs/tools/browser-control.md | 343 ++++++++++++++++++++++++++++++++++
 docs/tools/browser.md         | 325 +--------------------------------
 3 files changed, 348 insertions(+), 321 deletions(-)
 create mode 100644 docs/tools/browser-control.md

diff --git a/docs/docs.json b/docs/docs.json
index 65403aeda9b..590647ff348 100644
--- a/docs/docs.json
+++ b/docs/docs.json
@@ -1209,6 +1209,7 @@
   "group": "Web Browser",
   "pages": [
     "tools/browser",
+    "tools/browser-control",
     "tools/browser-login",
     "tools/browser-linux-troubleshooting",
     "tools/browser-wsl2-windows-remote-cdp-troubleshooting"
diff --git a/docs/tools/browser-control.md b/docs/tools/browser-control.md
new file mode 100644
index 00000000000..10fd96d642d
--- /dev/null
+++ b/docs/tools/browser-control.md
@@ -0,0 +1,343 @@
+---
+summary: "OpenClaw browser control API, CLI reference, and scripting actions"
+read_when:
+  - Scripting or debugging the agent browser via the local control API
+  - Looking for the `openclaw browser` CLI reference
+  - Adding custom browser automation with snapshots and refs
+title: "Browser control API"
+---
+
+For setup, configuration, and troubleshooting, see [Browser](/tools/browser).
+This page is the reference for the local control HTTP API, the `openclaw browser`
+CLI, and scripting patterns (snapshots, refs, waits, debug flows).
+
+## Control API (optional)
+
+For local integrations only, the Gateway exposes a small loopback HTTP API:
+
+- Status/start/stop: `GET /`, `POST /start`, `POST /stop`
+- Tabs: `GET /tabs`, `POST /tabs/open`, `POST /tabs/focus`, `DELETE /tabs/:targetId`
+- Snapshot/screenshot: `GET /snapshot`, `POST /screenshot`
+- Actions: `POST /navigate`, `POST /act`
+- Hooks: `POST /hooks/file-chooser`, `POST /hooks/dialog`
+- Downloads: `POST /download`, `POST /wait/download`
+- Debugging: `GET /console`, `POST /pdf`
+- Debugging: `GET /errors`, `GET /requests`, `POST /trace/start`, `POST /trace/stop`, `POST /highlight`
+- Network: `POST /response/body`
+- State: `GET /cookies`, `POST /cookies/set`, `POST /cookies/clear`
+- State: `GET /storage/:kind`, `POST /storage/:kind/set`, `POST /storage/:kind/clear`
+- Settings: `POST /set/offline`, `POST /set/headers`, `POST /set/credentials`, `POST /set/geolocation`, `POST /set/media`, `POST /set/timezone`, `POST /set/locale`, `POST /set/device`
+
+All endpoints accept `?profile=`.
+
+If shared-secret gateway auth is configured, browser HTTP routes require auth too:
+
+- `Authorization: Bearer `
+- `x-openclaw-password: ` or HTTP Basic auth with that password
+
+Notes:
+
+- This standalone loopback browser API does **not** consume trusted-proxy or
+  Tailscale Serve identity headers.
+- If `gateway.auth.mode` is `none` or `trusted-proxy`, these loopback browser
+  routes do not inherit those identity-bearing modes; keep them loopback-only.
+
+### `/act` error contract
+
+`POST /act` uses a structured error response for route-level validation and
+policy failures:
+
+```json
+{ "error": "", "code": "ACT_*" }
+```
+
+Current `code` values:
+
+- `ACT_KIND_REQUIRED` (HTTP 400): `kind` is missing or unrecognized.
+- `ACT_INVALID_REQUEST` (HTTP 400): action payload failed normalization or validation.
+- `ACT_SELECTOR_UNSUPPORTED` (HTTP 400): `selector` was used with an unsupported action kind.
+- `ACT_EVALUATE_DISABLED` (HTTP 403): `evaluate` (or `wait --fn`) is disabled by config.
+- `ACT_TARGET_ID_MISMATCH` (HTTP 403): top-level or batched `targetId` conflicts with request target.
+- `ACT_EXISTING_SESSION_UNSUPPORTED` (HTTP 501): action is not supported for existing-session profiles.
+
+Other runtime failures may still return `{ "error": "" }` without a
+`code` field.
+
+### Playwright requirement
+
+Some features (navigate/act/AI snapshot/role snapshot, element screenshots,
+PDF) require Playwright. If Playwright isn’t installed, those endpoints return
+a clear 501 error.
+
+What still works without Playwright:
+
+- ARIA snapshots
+- Page screenshots for the managed `openclaw` browser when a per-tab CDP
+  WebSocket is available
+- Page screenshots for `existing-session` / Chrome MCP profiles
+- `existing-session` ref-based screenshots (`--ref`) from snapshot output
+
+What still needs Playwright:
+
+- `navigate`
+- `act`
+- AI snapshots / role snapshots
+- CSS-selector element screenshots (`--element`)
+- full browser PDF export
+
+Element screenshots also reject `--full-page`; the route returns `fullPage is
+not supported for element screenshots`.
+
+If you see `Playwright is not available in this gateway build`, repair the
+bundled browser plugin runtime dependencies so `playwright-core` is installed,
+then restart the gateway. For packaged installs, run `openclaw doctor --fix`.
+For Docker, also install the Chromium browser binaries as shown below.
+
+#### Docker Playwright install
+
+If your Gateway runs in Docker, avoid `npx playwright` (npm override conflicts).
+Use the bundled CLI instead:
+
+```bash
+docker compose run --rm openclaw-cli \
+  node /app/node_modules/playwright-core/cli.js install chromium
+```
+
+To persist browser downloads, set `PLAYWRIGHT_BROWSERS_PATH` (for example,
+`/home/node/.cache/ms-playwright`) and make sure `/home/node` is persisted via
+`OPENCLAW_HOME_VOLUME` or a bind mount. See [Docker](/install/docker).
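Reviewer sketch, not part of the patch: the `/act` error contract above pairs each `code` with an HTTP status, so a script that receives an error body could branch on the code alone. The helper name `act_error_status` is hypothetical and only encodes the table documented in this page; it does not call the Gateway, and it reports `unknown` for anything outside the documented list (including runtime failures that omit `code`).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not an OpenClaw command): look up the HTTP status that
# the documented `/act` error contract associates with each `code` value.
act_error_status() {
  case "$1" in
    ACT_KIND_REQUIRED|ACT_INVALID_REQUEST|ACT_SELECTOR_UNSUPPORTED)
      echo 400 ;;  # request-shape problems: fix the payload
    ACT_EVALUATE_DISABLED|ACT_TARGET_ID_MISMATCH)
      echo 403 ;;  # policy refusals: change config or target
    ACT_EXISTING_SESSION_UNSUPPORTED)
      echo 501 ;;  # capability missing for existing-session profiles
    *)
      echo unknown ;;  # no `code`, or a code not listed in this page
  esac
}

# Usage: print the status documented for one code.
act_error_status ACT_EVALUATE_DISABLED
```

Grouping by status keeps the three documented classes (bad request, policy refusal, unsupported) visible, which may be all a caller needs to decide whether to fix the payload or change configuration.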
+
+## How it works (internal)
+
+A small loopback control server accepts HTTP requests and connects to Chromium-based browsers via CDP. Advanced actions (click/type/snapshot/PDF) go through Playwright on top of CDP; when Playwright is missing, only non-Playwright operations are available. The agent sees one stable interface while local/remote browsers and profiles swap freely underneath.
+
+## CLI quick reference
+
+All commands accept `--browser-profile ` to target a specific profile, and `--json` for machine-readable output.
+
+```bash
+openclaw browser status
+openclaw browser start
+openclaw browser stop # also clears emulation on attach-only/remote CDP
+openclaw browser tabs
+openclaw browser tab # shortcut for current tab
+openclaw browser tab new
+openclaw browser tab select 2
+openclaw browser tab close 2
+openclaw browser open https://example.com
+openclaw browser focus abcd1234
+openclaw browser close abcd1234
+```
+
+```bash
+openclaw browser screenshot
+openclaw browser screenshot --full-page
+openclaw browser screenshot --ref 12 # or --ref e12
+openclaw browser snapshot
+openclaw browser snapshot --format aria --limit 200
+openclaw browser snapshot --interactive --compact --depth 6
+openclaw browser snapshot --efficient
+openclaw browser snapshot --labels
+openclaw browser snapshot --selector "#main" --interactive
+openclaw browser snapshot --frame "iframe#main" --interactive
+openclaw browser console --level error
+openclaw browser errors --clear
+openclaw browser requests --filter api --clear
+openclaw browser pdf
+openclaw browser responsebody "**/api" --max-chars 5000
+```
+
+```bash
+openclaw browser navigate https://example.com
+openclaw browser resize 1280 720
+openclaw browser click 12 --double # or e12 for role refs
+openclaw browser type 23 "hello" --submit
+openclaw browser press Enter
+openclaw browser hover 44
+openclaw browser scrollintoview e12
+openclaw browser drag 10 11
+openclaw browser select 9 OptionA OptionB
+openclaw browser download e12 report.pdf
+openclaw browser waitfordownload report.pdf
+openclaw browser upload /tmp/openclaw/uploads/file.pdf
+openclaw browser fill --fields '[{"ref":"1","type":"text","value":"Ada"}]'
+openclaw browser dialog --accept
+openclaw browser wait --text "Done"
+openclaw browser wait "#main" --url "**/dash" --load networkidle --fn "window.ready===true"
+openclaw browser evaluate --fn '(el) => el.textContent' --ref 7
+openclaw browser highlight e12
+openclaw browser trace start
+openclaw browser trace stop
+```
+
+```bash
+openclaw browser cookies
+openclaw browser cookies set session abc123 --url "https://example.com"
+openclaw browser cookies clear
+openclaw browser storage local get
+openclaw browser storage local set theme dark
+openclaw browser storage session clear
+openclaw browser set offline on
+openclaw browser set headers --headers-json '{"X-Debug":"1"}'
+openclaw browser set credentials user pass # --clear to remove
+openclaw browser set geo 37.7749 -122.4194 --origin "https://example.com"
+openclaw browser set media dark
+openclaw browser set timezone America/New_York
+openclaw browser set locale en-US
+openclaw browser set device "iPhone 14"
+```
+
+Notes:
+
+- `upload` and `dialog` are **arming** calls; run them before the click/press that triggers the chooser/dialog.
+- `click`/`type`/etc require a `ref` from `snapshot` (numeric `12` or role ref `e12`). CSS selectors are intentionally not supported for actions.
+- Download, trace, and upload paths are constrained to OpenClaw temp roots: `/tmp/openclaw{,/downloads,/uploads}` (fallback: `${os.tmpdir()}/openclaw/...`).
+- `upload` can also set file inputs directly via `--input-ref` or `--element`.
+
+Snapshot flags at a glance:
+
+- `--format ai` (default with Playwright): AI snapshot with numeric refs (`aria-ref=""`).
+- `--format aria`: accessibility tree, no refs; inspection only.
+- `--efficient` (or `--mode efficient`): compact role snapshot preset. Set `browser.snapshotDefaults.mode: "efficient"` to make this the default (see [Gateway configuration](/gateway/configuration-reference#browser)).
+- `--interactive`, `--compact`, `--depth`, `--selector` force a role snapshot with `ref=e12` refs. `--frame "