openclaw

mirror of https://github.com/openclaw/openclaw.git synced 2026-06-03 19:44:06 +00:00

Author	SHA1	Message	Date
scotthuang	7920af0c9e	refactor: route browser screenshot vision through shared media understanding * feat(browser): add optional vision understanding to screenshot tool * fix(browser): wrap vision output as external content, enforce maxBytes, forward auth profiles * fix(browser): remove no-op scope/attachments config, drop profile pass-through lacking runtime support * feat(media-understanding): add profile/preferredProfile to DescribeImageFileWithModelParams and forward to describeImage * style(browser): add curly braces to satisfy eslint curly rule * fix(browser): correct tools.browser.enabled help text to match actual behavior * fix(browser): thread agentDir/workspaceDir from plugin tool context into browser vision * refactor(browser): move vision config from tools.browser to browser.models The browser plugin's vision configuration now lives on the top-level `browser` config namespace (browser.models, browser.visionEnabled, browser.visionPrompt, etc.) instead of `tools.browser`. This aligns with the plugin's existing config location and avoids confusion between tool-level and plugin-level settings. - Remove tools.browser from ToolsSchema and ToolsConfig - Add models/vision* fields to BrowserConfig and its zod schema - Update getBrowserVisionConfig to read from cfg.browser - Update schema help, labels, and quality test - Update vision.test.ts to use new config shape * docs(browser): add screenshot vision configuration section Document the new browser.models config for automatic screenshot description via vision models, enabling text-only main models to reason about web page content. * fix(browser): remove deliverable media markers from vision result, drop unused import P1: Vision-success path no longer exposes the raw screenshot as deliverable media (removes MEDIA: line and details.media.mediaUrl). This prevents channel delivery from auto-sending sensitive page content when the intended output is a text description. 
P2: Remove unused ToolsMediaUnderstandingSchema import that would fail noUnusedLocals typecheck. * fix(browser): add command/args fields to browser models schema The browser vision model schema uses .strict(), so CLI-type entries with command/args were rejected by TypeScript. Add these fields to align with MediaUnderstandingModelSchema. * chore(browser): remove debug console.log statements * fix(browser): harden screenshot vision result against MEDIA: directive injection and restore image sanitization on failure fallback ClawSweeper #84247 review round 2: P1 (security, high): neutralize line-start MEDIA: directives in vision descriptions before wrapping with wrapExternalContent. The agent media extractor scans every browser tool-result text block via splitMediaFromOutput which treats line-start MEDIA: as a trusted local-media delivery directive, and browser is on the trusted-media allowlist. Without neutralization, page or vision-provider output containing 'MEDIA:/tmp/secret.png' could synthesize a channel-deliverable media artifact from untrusted content. wrapExternalContent itself does not strip line-start directives. Introduce neutralizeMediaDirectives in vision.ts that prepends '[neutralized] ' to any line whose trimStart() begins with MEDIA: (case-insensitive), defanging the parser anchor while keeping the original text human-readable. P2 (compatibility): pass resolveRuntimeImageSanitization() to imageResultFromFile in the vision-failure catch fallback. The non-vision screenshot path already forwards this option (`d5cc0d53b7`) so configured agents.defaults.imageMaxDimensionPx takes effect. Without this fix, any provider timeout/error silently bypasses the sanitization guard and returns a raw full-resolution screenshot. Regression coverage: - vision.test.ts: 6 unit cases for neutralizeMediaDirectives (no-op fast path, mid-line MEDIA: untouched, line-start defanged, leading-whitespace defanged, case-insensitive, multiple directives per blob). 
- browser-tool.test.ts: 2 integration cases that drive the full screenshot tool execute path: - 'neutralizes MEDIA: directives in vision text and does not attach media' asserts no line matches /^\sMEDIA:/i in returned text, secret path text is preserved verbatim, details.media is absent, and imageResultFromFile is not called on the success path. - 'preserves screenshot image sanitization on vision failure fallback' mocks describeImageFileWithModel to reject and asserts the fallback imageResultFromFile call receives imageSanitization: {maxDimensionPx:1600} plus the 'browser screenshot vision failed' extraText. fix(browser): apply clawsweeper fallback media fix from PR #84247 * refactor: reuse media image understanding for browser screenshots * refactor: use structured media delivery * test: update music completion media instruction expectation * fix: trim buffered reply directive padding * test: refresh codex prompt snapshots for message media aliases --------- Co-authored-by: scotthuang <scotthuang@tencent.com> Co-authored-by: Peter Steinberger <steipete@gmail.com>	2026-05-31 00:00:19 +01:00
Aman113114-IITD	c876fecbe7	docs: clarify media directive formatting Summary: - Document that MEDIA directives must be plain-text line-start metadata. Verification: - Source check: src/media/parse.ts only recognizes lines whose trimmed start begins with MEDIA: and skips fenced code blocks. - PR CI: check-docs succeeded.	2026-05-22 17:59:01 +01:00
Peter Steinberger	3312ce5acb	fix: support home-relative media paths	2026-05-02 22:23:45 +01:00
Bartok	f0b327cf68	fix(media): gate markdown image extraction by channel (#72718) Closes #72642 Co-authored-by: Peter Steinberger <steipete@gmail.com>	2026-04-27 11:27:35 +01:00
pashpashpash	5404bbbb71	Avoid duplicate generated media attachments Generated media can be produced in intermediate tool results before the assistant chooses which assets to share in its final reply. This change keeps those intermediate files from being appended a second time when the final reply already names the assets to deliver, and tightens the media directive parsing around unsafe or ambiguous URLs.	2026-04-25 17:56:29 -07:00
Peter Steinberger	8e7d382c37	refactor(tts): clarify text media directives	2026-04-25 18:18:34 +01:00
Peter Steinberger	60f9358348	fix(tts): preserve legacy tool voice hints	2026-04-25 17:56:37 +01:00
Vincent Koc	c948c63bbd	docs: unify casing and replace path-as-text links across recent doc surfaces Sweep recent (last ~5h) doc edits for two readability/uniformity issues: - Replace 42 path-as-text links of the form '[/foo/bar](/foo/bar)' with descriptive labels derived from each target page's frontmatter title (e.g. '[Anthropic]', '[Token use and costs]', '[OpenAI-compatible endpoints]'). Affected files include gateway/troubleshooting, concepts/oauth, reference/session-management-compaction, and reference/transcript-hygiene. - Sentence-case Title-Cased headings and link text in Related sections across codex-harness, model-providers, tools/plugin, sdk-runtime, sdk-setup, prompt-caching, ci, cli/config, google-meet, browser, rich-output-protocol, subagents, web/control-ui, while preserving brand and proper-noun capitalization (OpenAI, Codex, Chrome, Parallels, Z.AI, etc.).	2026-04-24 22:18:22 -07:00
Peter Steinberger	759fe0bf95	docs: cover reply media and voice-call fixes	2026-04-25 05:48:29 +01:00
Peter Steinberger	dbdf2863d6	docs: fix broken internal links	2026-04-24 04:13:20 +01:00
Vincent Koc	f0b6c65e3b	docs(install,reference): add Related sections to pages missing them	2026-04-23 20:07:25 -07:00
Vincent Koc	2777b089b5	docs: normalize frontmatter titles to sentence case	2026-04-23 13:15:17 -07:00
Vincent Koc	4a2cd533ac	docs: remove duplicate H1 where frontmatter title already sets it	2026-04-23 13:11:14 -07:00
Peter Steinberger	834fdc4832	docs: align documentation with current surfaces	2026-04-23 07:25:06 +01:00
Tak Hoffman	cc5c691f00	feat(ui): render assistant directives and add embed tag (#64104) * Add embed rendering for Control UI assistant output * Add changelog entry for embed rendering * Harden canvas path resolution and stage isolation * Secure assistant media route and preserve UI avatar override * Fix chat media and history regressions * Harden embed iframe URL handling * Fix embed follow-up review regressions * Restore offloaded chat attachment persistence * Harden hook and media routing * Fix embed review follow-ups * feat(ui): add configurable embed sandbox mode * fix(gateway): harden assistant media and auth rotation * fix(gateway): restore websocket pairing handshake flows * fix(gateway): restore ws hello policy details * Restore dropped control UI shell wiring * Fix control UI reconnect cleanup regressions * fix(gateway): restore media root and auth getter compatibility * feat(ui): rename public canvas tag to embed * fix(ui): address remaining media and gateway review issues * fix(ui): address remaining embed and attachment review findings * fix(ui): restore stop control and tool card inputs * fix(ui): address history and attachment review findings * fix(ui): restore prompt contribution wiring * fix(ui): address latest history and directive reviews * fix(ui): forward password auth for assistant media * fix(ui): suppress silent transcript tokens with media * feat(ui): add granular embed sandbox modes * fix(ui): preserve relative media directives in history * docs(ui): document embed sandbox modes * fix(gateway): restrict canvas history hoisting to tool entries * fix(gateway): tighten embed follow-up review fixes * fix(ci): repair merged branch type drift * fix(prompt): restore stable runtime prompt rendering * fix(ui): harden local attachment preview checks * fix(prompt): restore channel-aware approval guidance * fix(gateway): enforce auth rotation and media cleanup * feat(ui): gate external embed urls behind config * fix(ci): repair rebased branch drift * 
fix(ci): resolve remaining branch check failures	2026-04-11 07:32:53 -05:00

15 Commits
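
The message of commit 7920af0c9e describes a neutralizeMediaDirectives helper that prepends '[neutralized] ' to any line whose trimmed start begins with MEDIA: (case-insensitive), with a no-op fast path and mid-line occurrences left untouched. A minimal sketch of that behavior, written only from the commit text, might look like the following. The function body, the regular expressions, and the fast-path check are assumptions for illustration; the actual code in vision.ts may differ.

```typescript
// Hypothetical sketch based on the commit message of 7920af0c9e.
// Not the repository's implementation; names and details are assumed.

// Matches a line whose first non-whitespace characters are "MEDIA:" in any case.
const LINE_START_MEDIA = /^\s*MEDIA:/i;

function neutralizeMediaDirectives(text: string): string {
  // Fast path: if "media:" appears nowhere, return the input unchanged.
  if (!/media:/i.test(text)) {
    return text;
  }
  return text
    .split("\n")
    // Prefix only lines that start with the directive; mid-line "MEDIA:" stays as-is.
    .map((line) => (LINE_START_MEDIA.test(line) ? `[neutralized] ${line}` : line))
    .join("\n");
}

const sample = "Page summary\n  media:/tmp/secret.png\nsee MEDIA: inline";
console.log(neutralizeMediaDirectives(sample));
```

Prefixing the line rather than deleting it keeps the original path text readable while ensuring no line in the result begins with the directive, which is the property the commit's integration test is described as asserting.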