docs(browser): explain automation skill and tab handles

Peter Steinberger
2026-04-25 00:23:36 +01:00
parent 45e2a15e29
commit dea05aae6b
5 changed files with 44 additions and 8 deletions

View File

@@ -103,6 +103,8 @@ Docs: https://docs.openclaw.ai
- WhatsApp/groups+direct: setting `systemPrompt: ""` on a specific `groups.<id>` or `direct.<peerId>` entry now suppresses the wildcard system prompt instead of falling through to it, so users can silence the global prompt for a specific group or peer. (#70381) Thanks @Bluetegu.
- Browser/tool: tell agents not to pass per-call `timeoutMs` on existing-session type, evaluate, and other Chrome MCP actions that reject timeout overrides. Thanks @steipete.
- Browser/tool: use Playwright's current AI aria snapshot API for `refs="aria"` and fall back to role refs when a node browser cannot provide aria refs, so agents can still inspect and click controls such as Google Meet admission buttons. Thanks @steipete.
- Browser/tool: expose stable `tabId` handles such as `t1` plus optional tab labels, and accept those handles anywhere a browser tab target is needed. Thanks @steipete.
- Browser/tool: bundle a `browser-automation` skill with the multi-step snapshot, stable-tab, stale-ref, and manual-blocker loop for agent-controlled pages. Thanks @steipete.
- Plugins/Google Meet: use browser automation to classify and clear Meet's microphone-choice interstitial during browser meeting creation, and reuse in-progress create tabs on retry instead of opening duplicates. Thanks @steipete.
- Codex/GPT-5.4: harden fallback, auth-profile, tool-schema, and replay edge cases across native and embedded runtime paths. (#70743) Thanks @100yenadmin.
- Models/fallback: resolve bare fallback model provider ids before model switching, so configured fallback chains keep working when a fallback is named without an explicit provider prefix. Thanks @steipete.

View File

@@ -111,14 +111,20 @@ openclaw browser --browser-profile work tabs
```bash
openclaw browser tabs
openclaw browser tab new
openclaw browser tab new --label docs
openclaw browser tab label t1 docs
openclaw browser tab select 2
openclaw browser tab close 2
openclaw browser open https://docs.openclaw.ai
openclaw browser focus <targetId>
openclaw browser close <targetId>
openclaw browser open https://docs.openclaw.ai --label docs
openclaw browser focus docs
openclaw browser close t1
```
`tabs` returns the raw `targetId` plus a stable `tabId` such as `t1`. You can
also assign a label with `open --label`, `tab new --label`, or `tab label`.
`focus`, `close`, snapshots, and actions accept the raw `targetId`, `tabId`,
label, or a unique target-id prefix.
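
The lookup described above can be sketched as follows. This is a minimal illustration, not OpenClaw's implementation: the `Tab` class, `resolve_tab`, the sample ids, and the matching order (exact `targetId`, then `tabId`, then label, then unique `targetId` prefix) are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tab:
    """Hypothetical tab record; field names are illustrative only."""

    target_id: str                # raw id reported by the browser
    tab_id: str                   # stable handle such as "t1"
    label: Optional[str] = None   # optional user-assigned label


def resolve_tab(tabs: list[Tab], target: str) -> Tab:
    """Resolve a target string to exactly one tab.

    Assumed order: exact targetId, tabId, label, then unique targetId prefix.
    """
    if not target:
        raise LookupError("empty tab target")

    # Exact matches win over prefix matches.
    for field in ("target_id", "tab_id", "label"):
        exact = [t for t in tabs if getattr(t, field) == target]
        if len(exact) == 1:
            return exact[0]
        if len(exact) > 1:
            raise LookupError(f"ambiguous {field} {target!r}")

    # A prefix is accepted only when it identifies a single tab.
    prefixed = [t for t in tabs if t.target_id.startswith(target)]
    if len(prefixed) == 1:
        return prefixed[0]
    if not prefixed:
        raise LookupError(f"no tab matches {target!r}")
    raise LookupError(f"ambiguous target-id prefix {target!r}")


tabs = [Tab("A1B2C3", "t1", "docs"), Tab("A9F0E1", "t2")]
print(resolve_tab(tabs, "docs").tab_id)   # label lookup
print(resolve_tab(tabs, "A9").tab_id)     # unique prefix lookup
```

Rejecting ambiguous prefixes, rather than picking the first match, keeps targeting deterministic in this sketch.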
## Snapshot / screenshot / actions
Snapshot:

View File

@@ -171,6 +171,10 @@ Eligibility includes skill metadata gates, runtime environment/config checks,
and the effective agent skill allowlist when `agents.defaults.skills` or
`agents.list[].skills` is configured.
Plugin-bundled skills are eligible only when their owning plugin is enabled.
This lets tool plugins expose deeper operating guides without embedding all of
that guidance directly in every tool description.
```
<available_skills>
<skill>

View File

@@ -23,6 +23,9 @@ Beginner view:
- A separate browser profile named **openclaw** (orange accent by default).
- Deterministic tab control (list/open/focus/close).
- Agent actions (click/type/drag/select), snapshots, screenshots, PDFs.
- A bundled `browser-automation` skill that teaches agents the snapshot,
stable-tab, stale-ref, and manual-blocker recovery loop when the browser
plugin is enabled.
- Optional multi-profile support (`openclaw`, `work`, `remote`, ...).
This browser is **not** your daily driver. It is a safe, isolated surface for
@@ -63,6 +66,22 @@ Defaults need both `plugins.entries.browser.enabled` **and** `browser.enabled=tr
Browser config changes require a Gateway restart so the plugin can re-register its service.
## Agent guidance
The browser plugin ships two levels of agent guidance:
- The `browser` tool description carries the compact always-on contract: pick
the right profile, keep refs on the same tab, use `tabId`/labels for tab
targeting, and load the browser skill for multi-step work.
- The bundled `browser-automation` skill carries the longer operating loop:
check status/tabs first, label task tabs, snapshot before acting, resnapshot
after UI changes, recover stale refs once, and report login/2FA/captcha or
camera/microphone blockers as manual action instead of guessing.
Plugin-bundled skills are listed in the agent's available skills when the
plugin is enabled. The full skill instructions are loaded on demand, so routine
turns do not pay the full token cost.
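
The "snapshot before acting, recover stale refs once" part of that loop might look like the sketch below. Everything in it is hypothetical: `FakePage`, `StaleRefError`, and `click_by_name` are in-memory stand-ins for illustration and do not mirror the real browser tool API.

```python
from __future__ import annotations

from typing import Optional


class StaleRefError(Exception):
    """Hypothetical error: the ref is not part of the current page state."""


class FakePage:
    """In-memory stand-in for a browser tab (not the real browser tool)."""

    def __init__(self) -> None:
        self.generation = 0
        self.clicked: list[str] = []

    def snapshot(self) -> dict[str, str]:
        # Map each control name to a ref that is valid for this generation only.
        return {"Admit": f"e{self.generation}"}

    def rerender(self) -> None:
        # A UI change invalidates every previously issued ref.
        self.generation += 1

    def click(self, ref: str) -> None:
        if ref != f"e{self.generation}":
            raise StaleRefError(ref)
        self.clicked.append(ref)


def click_by_name(
    page: FakePage, name: str, refs: Optional[dict[str, str]] = None
) -> dict[str, str]:
    """Click a control by name, taking a fresh snapshot once if the ref is stale."""
    if refs is None:
        refs = page.snapshot()      # snapshot before acting
    try:
        page.click(refs[name])
    except StaleRefError:
        refs = page.snapshot()      # recover once with fresh refs
        page.click(refs[name])      # a second failure propagates to the caller
    return refs


page = FakePage()
old_refs = page.snapshot()
page.rerender()                      # the page changes after the snapshot
click_by_name(page, "Admit", old_refs)
print(page.clicked)
```

Retrying exactly once keeps the sketch from looping on a page that keeps changing; a second failure is surfaced instead of hidden.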
## Missing browser command or tool
If `openclaw browser` is unknown after an upgrade, `browser.request` is missing, or the agent reports the browser tool as unavailable, the usual cause is a `plugins.allow` list that omits `browser`. Add it:
@@ -494,7 +513,8 @@ Compared to the managed `openclaw` profile, existing-session drivers are more co
- **Dedicated user data dir**: never touches your personal browser profile.
- **Dedicated ports**: avoids `9222` to prevent collisions with dev workflows.
- **Deterministic tab control**: target tabs by `targetId`, not “last tab”.
- **Deterministic tab control**: target tabs by raw `targetId`, stable `tabId`
handles such as `t1`, or labels you assign with `open --label` / `tab label`.
## Browser selection

View File

@@ -81,9 +81,13 @@ slash-command discovery, sandbox sync, and skill snapshots.
Plugins can ship their own skills by listing `skills` directories in
`openclaw.plugin.json` (paths relative to the plugin root). Plugin skills load
when the plugin is enabled. Today those directories are merged into the same
low-precedence path as `skills.load.extraDirs`, so a same-named bundled,
managed, agent, or workspace skill overrides them.
when the plugin is enabled. This is the right place for tool-specific operating
guides that are too long for the tool description but should be available
whenever the plugin is installed; for example, the browser plugin ships a
`browser-automation` skill for multi-step browser control. Today those
directories are merged into the same low-precedence path as
`skills.load.extraDirs`, so a same-named bundled, managed, agent, or workspace
skill overrides them.
You can gate them via `metadata.openclaw.requires.config` on the plugin's config
entry. See [Plugins](/tools/plugin) for discovery/config and [Tools](/tools) for the
tool surface those skills teach.
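
The override behavior can be pictured as a last-writer-wins merge over skill layers. The sketch below is an assumption-heavy illustration, not the loader's code: `merge_skills`, the layer names, and the paths are hypothetical, and the only ordering taken from the text above is that plugin skills sit in a low-precedence layer that a same-named workspace skill overrides.

```python
from __future__ import annotations


def merge_skills(
    layers: list[tuple[str, dict[str, str]]],
) -> dict[str, tuple[str, str]]:
    """Merge skill layers given in order of increasing precedence.

    Each layer is (source_name, {skill_name: path}); later layers override
    same-named skills from earlier ones.
    """
    merged: dict[str, tuple[str, str]] = {}
    for source, skills in layers:
        for name, path in skills.items():
            merged[name] = (source, path)
    return merged


# Hypothetical layers and paths, lowest precedence first.
layers = [
    ("plugin", {"browser-automation": "plugins/browser/skills/browser-automation"}),
    ("workspace", {"browser-automation": "skills/browser-automation"}),
]
print(merge_skills(layers)["browser-automation"][0])
```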