Commit Graph

17 Commits

Author SHA1 Message Date
FMLS
3a88142ddd fix(browser): document stable tab references (#88393)
Summary:
- The branch documents friendly browser tab references across docs, the browser skill, CLI help, and tool schema descriptions, and adds tests for target reference resolution and tab alias behavior.
- PR surface: Source +24, Tests +328, Docs +9. Total +361 across 21 files.
- Reproducibility: yes. for the documentation mismatch by source inspection: current main supports friendly ta ... schema/help surfaces still emphasize raw CDP target ids. Runtime behavior itself is not a new failing path.

Automerge notes:
- PR branch already contained follow-up commit before automerge: refactor(browser): share tab reference CLI help

Validation:
- ClawSweeper review passed for head 118af80b0b.
- Required merge gates passed before the squash merge.

Prepared head SHA: 118af80b0b
Review: https://github.com/openclaw/openclaw/pull/88393#issuecomment-4583558133

Co-authored-by: FMLS <kfliuyang@gmail.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: hxy91819
Co-authored-by: hxy91819 <8814856+hxy91819@users.noreply.github.com>
2026-05-31 12:09:50 +00:00
scotthuang
7920af0c9e refactor: route browser screenshot vision through shared media understanding
* feat(browser): add optional vision understanding to screenshot tool

* fix(browser): wrap vision output as external content, enforce maxBytes, forward auth profiles

* fix(browser): remove no-op scope/attachments config, drop profile pass-through lacking runtime support

* feat(media-understanding): add profile/preferredProfile to DescribeImageFileWithModelParams and forward to describeImage

* style(browser): add curly braces to satisfy eslint curly rule

* fix(browser): correct tools.browser.enabled help text to match actual behavior

* fix(browser): thread agentDir/workspaceDir from plugin tool context into browser vision

* refactor(browser): move vision config from tools.browser to browser.models

The browser plugin's vision configuration now lives on the top-level
`browser` config namespace (browser.models, browser.visionEnabled,
browser.visionPrompt, etc.) instead of `tools.browser`. This aligns
with the plugin's existing config location and avoids confusion between
tool-level and plugin-level settings.

- Remove tools.browser from ToolsSchema and ToolsConfig
- Add models/vision* fields to BrowserConfig and its zod schema
- Update getBrowserVisionConfig to read from cfg.browser
- Update schema help, labels, and quality test
- Update vision.test.ts to use new config shape

* docs(browser): add screenshot vision configuration section

Document the new browser.models config for automatic screenshot
description via vision models, enabling text-only main models to
reason about web page content.

* fix(browser): remove deliverable media markers from vision result, drop unused import

P1: Vision-success path no longer exposes the raw screenshot as
deliverable media (removes MEDIA: line and details.media.mediaUrl).
This prevents channel delivery from auto-sending sensitive page content
when the intended output is a text description.

P2: Remove unused ToolsMediaUnderstandingSchema import that would fail
noUnusedLocals typecheck.

* fix(browser): add command/args fields to browser models schema

The browser vision model schema uses .strict(), so CLI-type entries
with command/args were rejected by TypeScript. Add these fields to
align with MediaUnderstandingModelSchema.

* chore(browser): remove debug console.log statements

* fix(browser): harden screenshot vision result against MEDIA: directive injection and restore image sanitization on failure fallback

ClawSweeper #84247 review round 2:

P1 (security, high): neutralize line-start MEDIA: directives in vision descriptions
before wrapping with wrapExternalContent. The agent media extractor scans every
browser tool-result text block via splitMediaFromOutput which treats line-start
MEDIA: as a trusted local-media delivery directive, and browser is on the
trusted-media allowlist. Without neutralization, page or vision-provider output
containing 'MEDIA:/tmp/secret.png' could synthesize a channel-deliverable media
artifact from untrusted content. wrapExternalContent itself does not strip
line-start directives. Introduce neutralizeMediaDirectives in vision.ts that
prepends '[neutralized] ' to any line whose trimStart() begins with MEDIA:
(case-insensitive), defanging the parser anchor while keeping the original
text human-readable.

P2 (compatibility): pass resolveRuntimeImageSanitization() to imageResultFromFile
in the vision-failure catch fallback. The non-vision screenshot path already
forwards this option (d5cc0d53b7) so configured agents.defaults.imageMaxDimensionPx
takes effect. Without this fix, any provider timeout/error silently bypasses the
sanitization guard and returns a raw full-resolution screenshot.

Regression coverage:
- vision.test.ts: 6 unit cases for neutralizeMediaDirectives (no-op fast path,
  mid-line MEDIA: untouched, line-start defanged, leading-whitespace defanged,
  case-insensitive, multiple directives per blob).
- browser-tool.test.ts: 2 integration cases that drive the full screenshot
  tool execute path:
    - 'neutralizes MEDIA: directives in vision text and does not attach media'
      asserts no line matches /^\s*MEDIA:/i in returned text, secret path text
      is preserved verbatim, details.media is absent, and imageResultFromFile
      is not called on the success path.
    - 'preserves screenshot image sanitization on vision failure fallback'
      mocks describeImageFileWithModel to reject and asserts the fallback
      imageResultFromFile call receives imageSanitization: {maxDimensionPx:1600}
      plus the 'browser screenshot vision failed' extraText.

* fix(browser): apply clawsweeper fallback media fix from PR #84247

* refactor: reuse media image understanding for browser screenshots

* refactor: use structured media delivery

* test: update music completion media instruction expectation

* fix: trim buffered reply directive padding

* test: refresh codex prompt snapshots for message media aliases

---------

Co-authored-by: scotthuang <scotthuang@tencent.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
2026-05-31 00:00:19 +01:00
Zee Zheng
8be581cbf8 fix(browser): allow inbound media uploads
Allow the browser upload tool to resolve OpenClaw-managed inbound media refs such as `media://inbound/<id>` and sandbox-relative `media/inbound/<id>` while preserving the existing upload-root path contract.

Keep upload-root files ahead of sandbox-relative inbound fallback, reject nested absolute inbound media files, and validate raw `media://` paths before URL normalization so traversal-shaped refs cannot resolve to direct media ids.

Verification:
- `OPENCLAW_VITEST_MAX_WORKERS=1 node scripts/run-vitest.mjs extensions/browser/src/browser/paths.test.ts --reporter=verbose`
- `OPENCLAW_VITEST_MAX_WORKERS=1 node scripts/run-vitest.mjs extensions/browser/src/browser/paths.test.ts --reporter=dot`
- `OPENCLAW_HEAVY_CHECK_LOCK_SCOPE=worktree node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.extensions.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/extensions-test.tsbuildinfo`
- `pnpm lint --threads=8`
- `.agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main`
- `git diff --check`
- GitHub PR checks on be08e6c8a8: dependency-guard, check-lint, check-test-types, check-additional-extension-bundled, checks-fast-contracts-plugins-a, checks-fast-contracts-plugins-b all passed.

Fixes #83544.

Co-authored-by: Zee Zheng <zheng.zuo0@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-30 23:49:07 +01:00
clawsweeper[bot]
fa814eb9ed feat(browser): add evaluate timeout CLI option (#83696)
Summary:
- The branch adds `openclaw browser evaluate --timeout-ms`, forwards it to the evaluate body and request timeo ... ents and tests it, adds a changelog entry, and includes a config.patch no-op shortcut from the repair pass.
- Reproducibility: not applicable. this is a feature PR rather than a bug report. Source inspection shows current main lacks the CLI flag while the branch wires it into an already-supported evaluate `timeoutMs` payload.

Automerge notes:
- PR branch already contained follow-up commit before automerge: feat(browser): add evaluate timeout CLI option

Validation:
- ClawSweeper review passed for head 0d81d3d93e.
- Required merge gates passed before the squash merge.

Prepared head SHA: 0d81d3d93e
Review: https://github.com/openclaw/openclaw/pull/83696#issuecomment-4479900502

Co-authored-by: fred <fengruifree@gmail.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
2026-05-18 17:30:33 +00:00
Peter Steinberger
7afac6015f feat(browser): surface observed dialogs (#83099) 2026-05-18 00:05:29 +01:00
Ayaan Zaidi
4a009612c9 fix(docker): prune with source workspace policy 2026-05-11 10:50:30 +05:30
Ayaan Zaidi
d40e062800 docs(browser): note Docker Chromium autodetect 2026-05-10 11:37:37 +05:30
Vincent Koc
ae9f779e5f docs: typography hygiene + 1 in-body H1 removal across 6 pages
Replaced 84 typography characters (curly quotes, apostrophes, em/en
dashes, non-breaking hyphens) with ASCII equivalents per
docs/CLAUDE.md heading and content hygiene rules.

- docs/gateway/tools-invoke-http-api.md: 14 chars; removed the
  duplicate '# Tools Invoke (HTTP)' H1 (Mintlify renders title from
  frontmatter; the in-body H1 with parens produced a brittle anchor).
- docs/tools/browser-control.md: 14 chars
- docs/security/formal-verification.md: 14 chars
- docs/gateway/configuration-reference.md: 14 chars
- docs/concepts/agent.md: 14 chars
- docs/channels/qa-channel.md: 14 chars
2026-05-05 20:26:16 -07:00
Peter Steinberger
ed8f50f240 refactor: simplify plugin dependency handling
Simplify plugin installation and runtime loading around package-manager-owned dependencies, with Jiti reserved for local/TS fallback paths.

Also scans npm plugin install roots so hoisted transitive dependencies are covered by dependency denylist and node_modules symlink checks.
2026-05-01 21:32:22 +01:00
Peter Steinberger
1f7b7c249a fix(google-meet): grant browser media permissions 2026-04-27 14:54:07 +01:00
Peter Steinberger
ed1ac2fc44 feat(browser): add CDP role snapshot fallback 2026-04-26 04:40:26 +01:00
Peter Steinberger
e6ee4d6e68 fix(browser): preserve tabs across target swaps 2026-04-26 01:21:59 +01:00
Peter Steinberger
8cbb62d93c docs(browser): document headless start override 2026-04-25 11:42:04 +01:00
Peter Steinberger
19017bad96 docs(browser): explain actionable aria snapshot refs 2026-04-25 09:51:34 +01:00
Peter Steinberger
209d50b52c feat(browser): add coordinate click action
Co-authored-by: Daniel Lutts <daniellutts@10-19-94-204.dynapool.wireless.nyu.edu>
2026-04-25 07:31:33 +01:00
Peter Steinberger
60e7b692cc docs(browser): document inspection diagnostics 2026-04-25 00:56:35 +01:00
Vincent Koc
743b69d307 docs(tools): split browser docs by extracting control API and CLI reference 2026-04-23 19:34:50 -07:00