openclaw

mirror of https://github.com/openclaw/openclaw.git synced 2026-06-03 20:04:05 +00:00

Author	SHA1	Message	Date
scotthuang	7920af0c9e	refactor: route browser screenshot vision through shared media understanding * feat(browser): add optional vision understanding to screenshot tool * fix(browser): wrap vision output as external content, enforce maxBytes, forward auth profiles * fix(browser): remove no-op scope/attachments config, drop profile pass-through lacking runtime support * feat(media-understanding): add profile/preferredProfile to DescribeImageFileWithModelParams and forward to describeImage * style(browser): add curly braces to satisfy eslint curly rule * fix(browser): correct tools.browser.enabled help text to match actual behavior * fix(browser): thread agentDir/workspaceDir from plugin tool context into browser vision * refactor(browser): move vision config from tools.browser to browser.models The browser plugin's vision configuration now lives on the top-level `browser` config namespace (browser.models, browser.visionEnabled, browser.visionPrompt, etc.) instead of `tools.browser`. This aligns with the plugin's existing config location and avoids confusion between tool-level and plugin-level settings. - Remove tools.browser from ToolsSchema and ToolsConfig - Add models/vision* fields to BrowserConfig and its zod schema - Update getBrowserVisionConfig to read from cfg.browser - Update schema help, labels, and quality test - Update vision.test.ts to use new config shape * docs(browser): add screenshot vision configuration section Document the new browser.models config for automatic screenshot description via vision models, enabling text-only main models to reason about web page content. * fix(browser): remove deliverable media markers from vision result, drop unused import P1: Vision-success path no longer exposes the raw screenshot as deliverable media (removes MEDIA: line and details.media.mediaUrl). This prevents channel delivery from auto-sending sensitive page content when the intended output is a text description. 
P2: Remove unused ToolsMediaUnderstandingSchema import that would fail noUnusedLocals typecheck. * fix(browser): add command/args fields to browser models schema The browser vision model schema uses .strict(), so CLI-type entries with command/args were rejected by TypeScript. Add these fields to align with MediaUnderstandingModelSchema. * chore(browser): remove debug console.log statements * fix(browser): harden screenshot vision result against MEDIA: directive injection and restore image sanitization on failure fallback ClawSweeper #84247 review round 2: P1 (security, high): neutralize line-start MEDIA: directives in vision descriptions before wrapping with wrapExternalContent. The agent media extractor scans every browser tool-result text block via splitMediaFromOutput which treats line-start MEDIA: as a trusted local-media delivery directive, and browser is on the trusted-media allowlist. Without neutralization, page or vision-provider output containing 'MEDIA:/tmp/secret.png' could synthesize a channel-deliverable media artifact from untrusted content. wrapExternalContent itself does not strip line-start directives. Introduce neutralizeMediaDirectives in vision.ts that prepends '[neutralized] ' to any line whose trimStart() begins with MEDIA: (case-insensitive), defanging the parser anchor while keeping the original text human-readable. P2 (compatibility): pass resolveRuntimeImageSanitization() to imageResultFromFile in the vision-failure catch fallback. The non-vision screenshot path already forwards this option (`d5cc0d53b7`) so configured agents.defaults.imageMaxDimensionPx takes effect. Without this fix, any provider timeout/error silently bypasses the sanitization guard and returns a raw full-resolution screenshot. Regression coverage: - vision.test.ts: 6 unit cases for neutralizeMediaDirectives (no-op fast path, mid-line MEDIA: untouched, line-start defanged, leading-whitespace defanged, case-insensitive, multiple directives per blob). 
- browser-tool.test.ts: 2 integration cases that drive the full screenshot tool execute path: - 'neutralizes MEDIA: directives in vision text and does not attach media' asserts no line matches /^\sMEDIA:/i in returned text, secret path text is preserved verbatim, details.media is absent, and imageResultFromFile is not called on the success path. - 'preserves screenshot image sanitization on vision failure fallback' mocks describeImageFileWithModel to reject and asserts the fallback imageResultFromFile call receives imageSanitization: {maxDimensionPx:1600} plus the 'browser screenshot vision failed' extraText. fix(browser): apply clawsweeper fallback media fix from PR #84247 * refactor: reuse media image understanding for browser screenshots * refactor: use structured media delivery * test: update music completion media instruction expectation * fix: trim buffered reply directive padding * test: refresh codex prompt snapshots for message media aliases --------- Co-authored-by: scotthuang <scotthuang@tencent.com> Co-authored-by: Peter Steinberger <steipete@gmail.com>	2026-05-31 00:00:19 +01:00
Zee Zheng	8be581cbf8	fix(browser): allow inbound media uploads Allow the browser upload tool to resolve OpenClaw-managed inbound media refs such as `media://inbound/<id>` and sandbox-relative `media/inbound/<id>` while preserving the existing upload-root path contract. Keep upload-root files ahead of sandbox-relative inbound fallback, reject nested absolute inbound media files, and validate raw `media://` paths before URL normalization so traversal-shaped refs cannot resolve to direct media ids. Verification: - `OPENCLAW_VITEST_MAX_WORKERS=1 node scripts/run-vitest.mjs extensions/browser/src/browser/paths.test.ts --reporter=verbose` - `OPENCLAW_VITEST_MAX_WORKERS=1 node scripts/run-vitest.mjs extensions/browser/src/browser/paths.test.ts --reporter=dot` - `OPENCLAW_HEAVY_CHECK_LOCK_SCOPE=worktree node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.extensions.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/extensions-test.tsbuildinfo` - `pnpm lint --threads=8` - `.agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main` - `git diff --check` - GitHub PR checks on `be08e6c8a8`: dependency-guard, check-lint, check-test-types, check-additional-extension-bundled, checks-fast-contracts-plugins-a, checks-fast-contracts-plugins-b all passed. Fixes #83544. Co-authored-by: Zee Zheng <zheng.zuo0@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-05-30 23:49:07 +01:00
clawsweeper[bot]	fa814eb9ed	feat(browser): add evaluate timeout CLI option (#83696) Summary: - The branch adds `openclaw browser evaluate --timeout-ms`, forwards it to the evaluate body and request timeo ... ents and tests it, adds a changelog entry, and includes a config.patch no-op shortcut from the repair pass. - Reproducibility: not applicable. this is a feature PR rather than a bug report. Source inspection shows current main lacks the CLI flag while the branch wires it into an already-supported evaluate `timeoutMs` payload. Automerge notes: - PR branch already contained follow-up commit before automerge: feat(browser): add evaluate timeout CLI option Validation: - ClawSweeper review passed for head `0d81d3d93e`. - Required merge gates passed before the squash merge. Prepared head SHA: `0d81d3d93e` Review: https://github.com/openclaw/openclaw/pull/83696#issuecomment-4479900502 Co-authored-by: fred <fengruifree@gmail.com> Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com> Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com> Approved-by: takhoffman Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>	2026-05-18 17:30:33 +00:00
Peter Steinberger	7afac6015f	feat(browser): surface observed dialogs (#83099)	2026-05-18 00:05:29 +01:00
Ayaan Zaidi	4a009612c9	fix(docker): prune with source workspace policy	2026-05-11 10:50:30 +05:30
Ayaan Zaidi	d40e062800	docs(browser): note Docker Chromium autodetect	2026-05-10 11:37:37 +05:30
Vincent Koc	ae9f779e5f	docs: typography hygiene + 1 in-body H1 removal across 6 pages Replaced 84 typography characters (curly quotes, apostrophes, em/en dashes, non-breaking hyphens) with ASCII equivalents per docs/CLAUDE.md heading and content hygiene rules. - docs/gateway/tools-invoke-http-api.md: 14 chars; removed the duplicate '# Tools Invoke (HTTP)' H1 (Mintlify renders title from frontmatter; the in-body H1 with parens produced a brittle anchor). - docs/tools/browser-control.md: 14 chars - docs/security/formal-verification.md: 14 chars - docs/gateway/configuration-reference.md: 14 chars - docs/concepts/agent.md: 14 chars - docs/channels/qa-channel.md: 14 chars	2026-05-05 20:26:16 -07:00
Peter Steinberger	ed8f50f240	refactor: simplify plugin dependency handling Simplify plugin installation and runtime loading around package-manager-owned dependencies, with Jiti reserved for local/TS fallback paths. Also scans npm plugin install roots so hoisted transitive dependencies are covered by dependency denylist and node_modules symlink checks.	2026-05-01 21:32:22 +01:00
Peter Steinberger	1f7b7c249a	fix(google-meet): grant browser media permissions	2026-04-27 14:54:07 +01:00
Peter Steinberger	ed1ac2fc44	feat(browser): add CDP role snapshot fallback	2026-04-26 04:40:26 +01:00
Peter Steinberger	e6ee4d6e68	fix(browser): preserve tabs across target swaps	2026-04-26 01:21:59 +01:00
Peter Steinberger	8cbb62d93c	docs(browser): document headless start override	2026-04-25 11:42:04 +01:00
Peter Steinberger	19017bad96	docs(browser): explain actionable aria snapshot refs	2026-04-25 09:51:34 +01:00
Peter Steinberger	209d50b52c	feat(browser): add coordinate click action Co-authored-by: Daniel Lutts <daniellutts@10-19-94-204.dynapool.wireless.nyu.edu>	2026-04-25 07:31:33 +01:00
Peter Steinberger	60e7b692cc	docs(browser): document inspection diagnostics	2026-04-25 00:56:35 +01:00
Vincent Koc	743b69d307	docs(tools): split browser docs by extracting control API and CLI reference	2026-04-23 19:34:50 -07:00

16 Commits
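
Commit 7920af0c9e describes `neutralizeMediaDirectives` as prepending `[neutralized] ` to any line whose `trimStart()` begins with `MEDIA:` (case-insensitive), leaving mid-line occurrences untouched. The following is a minimal sketch of that behavior, written only from the commit message: the function body, the fast-path check, and the splitting on `\n` are assumptions, not the code in `vision.ts`.

```typescript
// Hypothetical sketch reconstructed from the commit message; the real
// implementation in vision.ts may differ in details.
const NEUTRALIZED_PREFIX = "[neutralized] ";

function neutralizeMediaDirectives(text: string): string {
  // Fast path (assumed): skip the per-line work when no directive can be present.
  if (!/media:/i.test(text)) {
    return text;
  }
  return text
    .split("\n")
    .map((line) => {
      // A directive only counts when it starts the line, ignoring leading whitespace.
      const startsWithDirective = line.trimStart().toUpperCase().startsWith("MEDIA:");
      // Prepending a prefix moves MEDIA: off the line start while keeping the text readable.
      return startsWithDirective ? NEUTRALIZED_PREFIX + line : line;
    })
    .join("\n");
}
```

Prefixing rather than deleting the line keeps the original description intact for the reader, which matches the commit's stated goal of defanging the parser anchor while preserving the text.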