Commit Graph

25 Commits

Author SHA1 Message Date
scotthuang
7920af0c9e refactor: route browser screenshot vision through shared media understanding
* feat(browser): add optional vision understanding to screenshot tool

* fix(browser): wrap vision output as external content, enforce maxBytes, forward auth profiles

* fix(browser): remove no-op scope/attachments config, drop profile pass-through lacking runtime support

* feat(media-understanding): add profile/preferredProfile to DescribeImageFileWithModelParams and forward to describeImage

* style(browser): add curly braces to satisfy eslint curly rule

* fix(browser): correct tools.browser.enabled help text to match actual behavior

* fix(browser): thread agentDir/workspaceDir from plugin tool context into browser vision

* refactor(browser): move vision config from tools.browser to browser.models

The browser plugin's vision configuration now lives on the top-level
`browser` config namespace (browser.models, browser.visionEnabled,
browser.visionPrompt, etc.) instead of `tools.browser`. This aligns
with the plugin's existing config location and avoids confusion between
tool-level and plugin-level settings.

- Remove tools.browser from ToolsSchema and ToolsConfig
- Add models/vision* fields to BrowserConfig and its zod schema
- Update getBrowserVisionConfig to read from cfg.browser
- Update schema help, labels, and quality test
- Update vision.test.ts to use new config shape

* docs(browser): add screenshot vision configuration section

Document the new browser.models config for automatic screenshot
description via vision models, enabling text-only main models to
reason about web page content.

* fix(browser): remove deliverable media markers from vision result, drop unused import

P1: Vision-success path no longer exposes the raw screenshot as
deliverable media (removes MEDIA: line and details.media.mediaUrl).
This prevents channel delivery from auto-sending sensitive page content
when the intended output is a text description.

P2: Remove unused ToolsMediaUnderstandingSchema import that would fail
noUnusedLocals typecheck.

* fix(browser): add command/args fields to browser models schema

The browser vision model schema uses .strict(), so CLI-type entries
with command/args were rejected by TypeScript. Add these fields to
align with MediaUnderstandingModelSchema.

* chore(browser): remove debug console.log statements

* fix(browser): harden screenshot vision result against MEDIA: directive injection and restore image sanitization on failure fallback

ClawSweeper #84247 review round 2:

P1 (security, high): neutralize line-start MEDIA: directives in vision descriptions
before wrapping with wrapExternalContent. The agent media extractor scans every
browser tool-result text block via splitMediaFromOutput which treats line-start
MEDIA: as a trusted local-media delivery directive, and browser is on the
trusted-media allowlist. Without neutralization, page or vision-provider output
containing 'MEDIA:/tmp/secret.png' could synthesize a channel-deliverable media
artifact from untrusted content. wrapExternalContent itself does not strip
line-start directives. Introduce neutralizeMediaDirectives in vision.ts that
prepends '[neutralized] ' to any line whose trimStart() begins with MEDIA:
(case-insensitive), defanging the parser anchor while keeping the original
text human-readable.

P2 (compatibility): pass resolveRuntimeImageSanitization() to imageResultFromFile
in the vision-failure catch fallback. The non-vision screenshot path already
forwards this option (d5cc0d53b7) so configured agents.defaults.imageMaxDimensionPx
takes effect. Without this fix, any provider timeout/error silently bypasses the
sanitization guard and returns a raw full-resolution screenshot.

Regression coverage:
- vision.test.ts: 6 unit cases for neutralizeMediaDirectives (no-op fast path,
  mid-line MEDIA: untouched, line-start defanged, leading-whitespace defanged,
  case-insensitive, multiple directives per blob).
- browser-tool.test.ts: 2 integration cases that drive the full screenshot
  tool execute path:
    - 'neutralizes MEDIA: directives in vision text and does not attach media'
      asserts no line matches /^\s*MEDIA:/i in returned text, secret path text
      is preserved verbatim, details.media is absent, and imageResultFromFile
      is not called on the success path.
    - 'preserves screenshot image sanitization on vision failure fallback'
      mocks describeImageFileWithModel to reject and asserts the fallback
      imageResultFromFile call receives imageSanitization: {maxDimensionPx:1600}
      plus the 'browser screenshot vision failed' extraText.

* fix(browser): apply clawsweeper fallback media fix from PR #84247

* refactor: reuse media image understanding for browser screenshots

* refactor: use structured media delivery

* test: update music completion media instruction expectation

* fix: trim buffered reply directive padding

* test: refresh codex prompt snapshots for message media aliases

---------

Co-authored-by: scotthuang <scotthuang@tencent.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
2026-05-31 00:00:19 +01:00
Peter Steinberger
5a10326612 test: dedupe node media log mock read 2026-05-13 06:17:56 +01:00
Peter Steinberger
14042cff90 test: guard nodes media mock calls 2026-05-12 07:56:48 +01:00
Shakker
b62750ac7e test: check node media payloads 2026-05-12 05:44:51 +01:00
Peter Steinberger
bf17d01a1d test: clear nodes media broad matchers 2026-05-10 18:20:51 +01:00
Peter Steinberger
5377e952ca test: simplify nodes media path collection 2026-05-09 00:52:06 +01:00
Peter Steinberger
378cfe2da2 test: clarify cli media error assertion 2026-05-08 11:08:58 +01:00
Peter Steinberger
9ef37d1907 test: tighten assertions and harness coverage 2026-05-08 05:28:12 +01:00
Peter Steinberger
330ba1fa31 refactor: move canvas to plugin surfaces 2026-05-07 09:07:18 +01:00
Peter Steinberger
9470b616c9 refactor: remove redundant camera CLI conversions 2026-04-10 21:53:47 +01:00
Peter Steinberger
3bf19d6f40 fix(security): fail-close node camera URL downloads 2026-03-02 23:23:39 +00:00
Peter Steinberger
b1c30f0ba9 refactor: dedupe cli config cron and install flows 2026-03-02 19:57:33 +00:00
Peter Steinberger
a13586619b test: move integration-heavy suites to e2e lane 2026-03-02 05:33:07 +00:00
Peter Steinberger
7fdf54f078 test: move cli local suites out of e2e 2026-02-22 11:30:29 +00:00
Peter Steinberger
d6ad647f56 test(cli): share nodes ios fixture helpers 2026-02-22 07:44:56 +00:00
Peter Steinberger
a1cb700a05 test: dedupe and optimize test suites 2026-02-19 15:19:38 +00:00
cpojer
048e29ea35 chore: Fix types in tests 45/N. 2026-02-17 15:50:07 +09:00
Peter Steinberger
f717a13039 refactor(agent): dedupe harness and command workflows 2026-02-16 14:59:30 +00:00
Peter Steinberger
6c0dca30b8 fix: accept auth code in chutes oauth manual flow 2026-02-15 02:53:39 +00:00
Peter Steinberger
4de879a6c5 fix(test): avoid base-to-string in nodes-media e2e logs 2026-02-15 00:26:46 +00:00
Peter Steinberger
07fbf46091 fix(test): avoid vitest mock type inference issues 2026-02-15 01:06:02 +01:00
Peter Steinberger
86e4cc56b9 refactor(test): reuse base CLI program mocks 2026-02-14 23:51:42 +00:00
Peter Steinberger
55a25f9875 refactor(test): reuse nodes media gateway mock 2026-02-14 22:43:59 +00:00
Ahmad Bitar
c179f71f42 feat: Android companion app improvements & gateway URL camera payloads (#13541)
Merged via /review-pr -> /prepare-pr -> /merge-pr.

Prepared head SHA: 9c179c9c31
Co-authored-by: smartprogrammer93 <33181301+smartprogrammer93@users.noreply.github.com>
Co-authored-by: steipete <58493+steipete@users.noreply.github.com>
Reviewed-by: @steipete
2026-02-13 16:49:28 +01:00
Peter Steinberger
9131b22a28 test: migrate suites to e2e coverage layout 2026-02-13 14:28:22 +00:00