Commit Graph

661 Commits

Author SHA1 Message Date
scotthuang
7920af0c9e refactor: route browser screenshot vision through shared media understanding
* feat(browser): add optional vision understanding to screenshot tool

* fix(browser): wrap vision output as external content, enforce maxBytes, forward auth profiles

* fix(browser): remove no-op scope/attachments config, drop profile pass-through lacking runtime support

* feat(media-understanding): add profile/preferredProfile to DescribeImageFileWithModelParams and forward to describeImage

* style(browser): add curly braces to satisfy eslint curly rule

* fix(browser): correct tools.browser.enabled help text to match actual behavior

* fix(browser): thread agentDir/workspaceDir from plugin tool context into browser vision

* refactor(browser): move vision config from tools.browser to browser.models

The browser plugin's vision configuration now lives on the top-level
`browser` config namespace (browser.models, browser.visionEnabled,
browser.visionPrompt, etc.) instead of `tools.browser`. This aligns
with the plugin's existing config location and avoids confusion between
tool-level and plugin-level settings.

- Remove tools.browser from ToolsSchema and ToolsConfig
- Add models/vision* fields to BrowserConfig and its zod schema
- Update getBrowserVisionConfig to read from cfg.browser
- Update schema help, labels, and quality test
- Update vision.test.ts to use new config shape

* docs(browser): add screenshot vision configuration section

Document the new browser.models config for automatic screenshot
description via vision models, enabling text-only main models to
reason about web page content.

* fix(browser): remove deliverable media markers from vision result, drop unused import

P1: Vision-success path no longer exposes the raw screenshot as
deliverable media (removes MEDIA: line and details.media.mediaUrl).
This prevents channel delivery from auto-sending sensitive page content
when the intended output is a text description.

P2: Remove unused ToolsMediaUnderstandingSchema import that would fail
noUnusedLocals typecheck.

* fix(browser): add command/args fields to browser models schema

The browser vision model schema uses .strict(), so CLI-type entries
with command/args were rejected by TypeScript. Add these fields to
align with MediaUnderstandingModelSchema.

* chore(browser): remove debug console.log statements

* fix(browser): harden screenshot vision result against MEDIA: directive injection and restore image sanitization on failure fallback

ClawSweeper #84247 review round 2:

P1 (security, high): neutralize line-start MEDIA: directives in vision descriptions
before wrapping with wrapExternalContent. The agent media extractor scans every
browser tool-result text block via splitMediaFromOutput which treats line-start
MEDIA: as a trusted local-media delivery directive, and browser is on the
trusted-media allowlist. Without neutralization, page or vision-provider output
containing 'MEDIA:/tmp/secret.png' could synthesize a channel-deliverable media
artifact from untrusted content. wrapExternalContent itself does not strip
line-start directives. Introduce neutralizeMediaDirectives in vision.ts that
prepends '[neutralized] ' to any line whose trimStart() begins with MEDIA:
(case-insensitive), defanging the parser anchor while keeping the original
text human-readable.

P2 (compatibility): pass resolveRuntimeImageSanitization() to imageResultFromFile
in the vision-failure catch fallback. The non-vision screenshot path already
forwards this option (d5cc0d53b7) so configured agents.defaults.imageMaxDimensionPx
takes effect. Without this fix, any provider timeout/error silently bypasses the
sanitization guard and returns a raw full-resolution screenshot.

Regression coverage:
- vision.test.ts: 6 unit cases for neutralizeMediaDirectives (no-op fast path,
  mid-line MEDIA: untouched, line-start defanged, leading-whitespace defanged,
  case-insensitive, multiple directives per blob).
- browser-tool.test.ts: 2 integration cases that drive the full screenshot
  tool execute path:
    - 'neutralizes MEDIA: directives in vision text and does not attach media'
      asserts no line matches /^\s*MEDIA:/i in returned text, secret path text
      is preserved verbatim, details.media is absent, and imageResultFromFile
      is not called on the success path.
    - 'preserves screenshot image sanitization on vision failure fallback'
      mocks describeImageFileWithModel to reject and asserts the fallback
      imageResultFromFile call receives imageSanitization: {maxDimensionPx:1600}
      plus the 'browser screenshot vision failed' extraText.
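
A minimal sketch of the P1 neutralization described above, assuming only what the commit message states: a line whose `trimStart()` begins with `MEDIA:` (case-insensitive) gets `[neutralized] ` prepended, and text without any such directive is returned unchanged. The function name comes from the message; the body, the fast-path check, and the line-splitting strategy are assumptions, not the code in vision.ts.

```typescript
// Prefix named in the commit message; prepended to any defanged line.
const NEUTRALIZED_PREFIX = "[neutralized] ";

// Sketch only: the real vision.ts implementation may differ.
function neutralizeMediaDirectives(text: string): string {
  // Assumed fast path: no "media:" anywhere means nothing to defang.
  if (!/media:/i.test(text)) {
    return text;
  }
  return text
    .split("\n")
    .map((line) => {
      // Only a directive at the start of a line (after optional whitespace)
      // is treated as a parser anchor; mid-line occurrences stay untouched.
      const startsWithDirective = line.trimStart().toUpperCase().startsWith("MEDIA:");
      return startsWithDirective ? NEUTRALIZED_PREFIX + line : line;
    })
    .join("\n");
}
```

Prepending to the whole line, rather than stripping the directive, keeps the original text readable while ensuring no line still matches `/^\s*MEDIA:/i`.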

* fix(browser): apply clawsweeper fallback media fix from PR #84247

* refactor: reuse media image understanding for browser screenshots

* refactor: use structured media delivery

* test: update music completion media instruction expectation

* fix: trim buffered reply directive padding

* test: refresh codex prompt snapshots for message media aliases

---------

Co-authored-by: scotthuang <scotthuang@tencent.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
2026-05-31 00:00:19 +01:00
Zee Zheng
8be581cbf8 fix(browser): allow inbound media uploads
Allow the browser upload tool to resolve OpenClaw-managed inbound media refs such as `media://inbound/<id>` and sandbox-relative `media/inbound/<id>` while preserving the existing upload-root path contract.

Keep upload-root files ahead of sandbox-relative inbound fallback, reject nested absolute inbound media files, and validate raw `media://` paths before URL normalization so traversal-shaped refs cannot resolve to direct media ids.
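
As an illustration of why the raw ref is validated before URL normalization, the check could be sketched as follows. URL parsers typically collapse `..` segments while normalizing, so a traversal-shaped ref can come out looking like a direct media id unless the raw string is inspected first. The helper name `parseInboundMediaId` and the id character rule are hypothetical; only the two ref shapes come from the commit message.

```typescript
// Ref shapes named in the commit message.
const INBOUND_URI_PREFIX = "media://inbound/";
const INBOUND_RELATIVE_PREFIX = "media/inbound/";

// Hypothetical id rule: a single path segment with no separators,
// percent-escapes, query, or fragment characters.
const SAFE_ID = /^[A-Za-z0-9][A-Za-z0-9._-]*$/;

// Returns the inbound media id, or null when the raw ref is not a
// well-formed direct id. Operates on the raw string, before any URL parsing.
function parseInboundMediaId(raw: string): string | null {
  let id: string;
  if (raw.startsWith(INBOUND_URI_PREFIX)) {
    id = raw.slice(INBOUND_URI_PREFIX.length);
  } else if (raw.startsWith(INBOUND_RELATIVE_PREFIX)) {
    id = raw.slice(INBOUND_RELATIVE_PREFIX.length);
  } else {
    return null;
  }
  // Reject traversal-shaped or nested refs outright instead of normalizing them.
  if (id.includes("..") || !SAFE_ID.test(id)) {
    return null;
  }
  return id;
}
```

Under these assumptions, `media://inbound/x/../secret` is rejected on the raw string and never reaches a normalizer that could reduce it to `secret`.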

Verification:
- `OPENCLAW_VITEST_MAX_WORKERS=1 node scripts/run-vitest.mjs extensions/browser/src/browser/paths.test.ts --reporter=verbose`
- `OPENCLAW_VITEST_MAX_WORKERS=1 node scripts/run-vitest.mjs extensions/browser/src/browser/paths.test.ts --reporter=dot`
- `OPENCLAW_HEAVY_CHECK_LOCK_SCOPE=worktree node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.extensions.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/extensions-test.tsbuildinfo`
- `pnpm lint --threads=8`
- `.agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main`
- `git diff --check`
- GitHub PR checks on be08e6c8a8: dependency-guard, check-lint, check-test-types, check-additional-extension-bundled, checks-fast-contracts-plugins-a, checks-fast-contracts-plugins-b all passed.

Fixes #83544.

Co-authored-by: Zee Zheng <zheng.zuo0@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-30 23:49:07 +01:00
Peter Steinberger
4df27b9626 fix(browser): bound armed dialog expiry 2026-05-30 13:08:52 -04:00
Vincent Koc
3ef2935ac9 perf(browser): reuse chrome mcp import 2026-05-30 13:00:31 +02:00
Vincent Koc
b2c85bc0a2 perf(browser): cache registration runtime import 2026-05-30 13:00:30 +02:00
Vincent Koc
8348af99e8 fix(ci): clear stale changed-check failures 2026-05-30 09:55:59 +01:00
Peter Steinberger
069ea7942d fix(browser): cap proxy request timeouts 2026-05-30 04:39:55 -04:00
Peter Steinberger
5d75f64369 fix(browser): cap cdp reachability timeouts 2026-05-30 04:36:23 -04:00
Peter Steinberger
915f88a0a3 fix(browser): centralize route timeout clamping 2026-05-30 03:59:45 -04:00
Peter Steinberger
cec50aa047 fix(browser): cap act action timeouts 2026-05-30 03:52:29 -04:00
Peter Steinberger
aae0d54752 fix(browser): cap Chrome MCP navigation timeout grace 2026-05-30 03:41:53 -04:00
Peter Steinberger
cd07d013ba chore(release): bump version to 2026.5.30 2026-05-30 06:49:13 +01:00
Peter Steinberger
040f14b641 fix(browser): cap node runtime timeouts 2026-05-29 17:07:33 -04:00
Peter Steinberger
5230a23202 fix(browser): cap control fetch timeouts 2026-05-29 17:04:43 -04:00
Peter Steinberger
61031d1b1c feat(workboard): add agent coordination tools
Summary:
- Add Workboard agent coordination tools for list/read/claim/heartbeat/release/comment/proof/unblock flows.
- Store artifacts, claims, diagnostics, and notifications in the Workboard SQLite-backed plugin state; surface the new metadata through Gateway, Control UI, docs, and plugin manifest contracts.
- Add scoped claim authorization, token redaction, stale diagnostic cleanup, atomic proof artifact writes, and generated i18n metadata.

Verification:
- pnpm test ui/src/i18n/test/translate.test.ts extensions/browser/src/cli/browser-cli-actions-input/register.element.test.ts extensions/workboard/src/store.test.ts extensions/workboard/src/gateway.test.ts extensions/workboard/src/tools.test.ts ui/src/ui/controllers/workboard.test.ts ui/src/ui/views/workboard.test.ts
- pnpm ui:i18n:check
- env -u OPENCLAW_TESTBOX pnpm check:changed
- autoreview --mode local: clean
- PR CI passed; Windows checkout failure rerun passed on attempt 2
2026-05-29 20:23:21 +02:00
Peter Steinberger
615199a6a4 fix(browser): centralize cli index parsing 2026-05-29 07:29:52 -04:00
Peter Steinberger
182d60535a test: fix main test type checks 2026-05-29 11:21:42 +01:00
Peter Steinberger
9f28e8c5f4 fix(browser): centralize cli integer option parsing 2026-05-29 06:05:01 -04:00
Vincent Koc
966c274f20 refactor: share browser snapshot helpers 2026-05-29 11:11:46 +02:00
Vincent Koc
bee163bf37 refactor: share chrome cdp websocket diagnostics 2026-05-29 10:57:12 +02:00
Vincent Koc
850f7c24d4 refactor: share browser basic route helpers 2026-05-29 10:45:55 +02:00
Vincent Koc
628104662b refactor: share browser client request helpers 2026-05-29 10:26:44 +02:00
Vincent Koc
a4bb9b1438 refactor: share browser debug route responses 2026-05-29 10:14:18 +02:00
Vincent Koc
3e050d05e8 refactor: share session tab registry helpers 2026-05-29 10:02:18 +02:00
Peter Steinberger
24614ac100 refactor(browser): centralize route numeric readers 2026-05-29 03:59:19 -04:00
Vincent Koc
6c309b9883 refactor: share browser route navigation policy 2026-05-29 09:52:12 +02:00
Peter Steinberger
2ea8d88d63 fix(browser): validate cookie expiry values 2026-05-29 03:50:19 -04:00
Peter Steinberger
ac52499aca fix(browser): validate screenshot timeout 2026-05-29 03:46:53 -04:00
Peter Steinberger
c48a4a3188 fix(browser): validate geolocation options 2026-05-29 03:43:06 -04:00
Peter Steinberger
c7f50738c0 fix(browser): validate permission grant timeout 2026-05-29 03:34:06 -04:00
Peter Steinberger
dca86d47e0 fix(browser): validate hook download timeouts 2026-05-29 03:30:46 -04:00
Peter Steinberger
854cb9292d fix(browser): validate response body numeric options 2026-05-29 03:27:34 -04:00
Peter Steinberger
0b24f47465 fix(browser): tighten act numeric parsing 2026-05-29 03:23:42 -04:00
Peter Steinberger
4fae13e29e fix(browser): centralize snapshot numeric parsing 2026-05-29 03:15:56 -04:00
Peter Steinberger
0bc591a7d7 fix(browser): reject invalid tab indexes 2026-05-29 03:07:15 -04:00
Peter Steinberger
286883cc54 fix(browser): cap route timer delays 2026-05-29 03:03:07 -04:00
Peter Steinberger
b0730944eb fix(browser): cap cli request timeouts 2026-05-29 02:50:51 -04:00
Vincent Koc
611adb2ee0 test(browser): align loopback auth mock types 2026-05-29 08:01:21 +02:00
Vincent Koc
2e042fbca8 fix(browser): reject excessive viewport resizes 2026-05-29 07:51:27 +02:00
Peter Steinberger
621db8f0b1 fix(browser): reject explicit zero cdp ports 2026-05-29 01:43:05 -04:00
Peter Steinberger
59cec74d89 fix(browser): clamp non-finite viewport dimensions 2026-05-29 00:46:07 -04:00
Peter Steinberger
1e48ca4e32 fix(browser): default non-finite chrome mcp click delays 2026-05-29 00:42:37 -04:00
Peter Steinberger
4638f58615 fix(browser): default non-finite keypress delays 2026-05-29 00:38:45 -04:00
Peter Steinberger
c7144a8689 fix(browser): default non-finite DOM text budgets 2026-05-29 00:35:43 -04:00
Peter Steinberger
4dd3ba149c fix(browser): default non-finite snapshot limits 2026-05-29 00:32:35 -04:00
Peter Steinberger
adabff1bf0 fix(browser): centralize non-finite tool timeouts 2026-05-29 00:04:04 -04:00
Peter Steinberger
dac13d9a69 fix(browser): default non-finite navigation timeouts 2026-05-29 00:00:44 -04:00
Peter Steinberger
3c8ad8cbaa fix(browser): default non-finite fetch timeouts 2026-05-28 23:52:40 -04:00
Peter Steinberger
b2bdad5bee fix(browser): default non-finite snapshot timeouts 2026-05-28 23:48:33 -04:00
Peter Steinberger
18ef59bb33 fix(browser): default non-finite dialog arm timeouts 2026-05-28 23:44:42 -04:00
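
The run of "default non-finite ..." and "cap ... timeouts" commits above follows one recurring pattern, which might be sketched like this. The helper name, its signature, and the treatment of non-positive values are assumptions for illustration; the commit titles only state that non-finite inputs fall back to a default and that timeouts are capped.

```typescript
// Hypothetical helper: resolve a caller-supplied timeout to a safe value.
function resolveTimeoutMs(value: unknown, defaultMs: number, maxMs: number): number {
  // NaN, Infinity, and non-numbers fall back to the default.
  // Treating zero and negative values the same way is an assumption.
  if (typeof value !== "number" || !Number.isFinite(value) || value <= 0) {
    return Math.min(defaultMs, maxMs);
  }
  // Finite values are truncated to whole milliseconds and capped.
  return Math.min(Math.floor(value), maxMs);
}
```

Centralizing the check in one helper, as several of the commit titles suggest, keeps every route and tool from re-implementing its own parsing.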