Allow media understanding providers to opt into synthetic non-secret auth for local or self-hosted no-auth audio/video execution.
This preserves configured env/profile/literal provider credentials first, keeps explicit profile failures hard-fail, and leaves unmarked remote providers fail-closed.
Fixes#74644.
Extract shared normalization/coercion helpers into private @openclaw/normalization-core workspace package while preserving existing plugin SDK helper subpaths.\n\nAlso keeps direct normalization-core imports internal, wires UI/build/loader resolution, and replaces the slow PR network CodeQL lane with a fast added-line boundary scan while retaining full CodeQL for scheduled/manual runs.\n\nVerification: local moved tests, plugin SDK boundary tests, extension loader tests, agents-support shard, UI build/test, build artifacts, lint, workflow guards, autoreview, and GitHub CI passed on PR head 963d893715.
* feat(browser): add optional vision understanding to screenshot tool
* fix(browser): wrap vision output as external content, enforce maxBytes, forward auth profiles
* fix(browser): remove no-op scope/attachments config, drop profile pass-through lacking runtime support
* feat(media-understanding): add profile/preferredProfile to DescribeImageFileWithModelParams and forward to describeImage
* style(browser): add curly braces to satisfy eslint curly rule
* fix(browser): correct tools.browser.enabled help text to match actual behavior
* fix(browser): thread agentDir/workspaceDir from plugin tool context into browser vision
* refactor(browser): move vision config from tools.browser to browser.models
The browser plugin's vision configuration now lives on the top-level
`browser` config namespace (browser.models, browser.visionEnabled,
browser.visionPrompt, etc.) instead of `tools.browser`. This aligns
with the plugin's existing config location and avoids confusion between
tool-level and plugin-level settings.
- Remove tools.browser from ToolsSchema and ToolsConfig
- Add models/vision* fields to BrowserConfig and its zod schema
- Update getBrowserVisionConfig to read from cfg.browser
- Update schema help, labels, and quality test
- Update vision.test.ts to use new config shape
* docs(browser): add screenshot vision configuration section
Document the new browser.models config for automatic screenshot
description via vision models, enabling text-only main models to
reason about web page content.
* fix(browser): remove deliverable media markers from vision result, drop unused import
P1: Vision-success path no longer exposes the raw screenshot as
deliverable media (removes MEDIA: line and details.media.mediaUrl).
This prevents channel delivery from auto-sending sensitive page content
when the intended output is a text description.
P2: Remove unused ToolsMediaUnderstandingSchema import that would fail
noUnusedLocals typecheck.
* fix(browser): add command/args fields to browser models schema
The browser vision model schema uses .strict(), so CLI-type entries
with command/args were rejected by TypeScript. Add these fields to
align with MediaUnderstandingModelSchema.
* chore(browser): remove debug console.log statements
* fix(browser): harden screenshot vision result against MEDIA: directive injection and restore image sanitization on failure fallback
ClawSweeper #84247 review round 2:
P1 (security, high): neutralize line-start MEDIA: directives in vision descriptions
before wrapping with wrapExternalContent. The agent media extractor scans every
browser tool-result text block via splitMediaFromOutput which treats line-start
MEDIA: as a trusted local-media delivery directive, and browser is on the
trusted-media allowlist. Without neutralization, page or vision-provider output
containing 'MEDIA:/tmp/secret.png' could synthesize a channel-deliverable media
artifact from untrusted content. wrapExternalContent itself does not strip
line-start directives. Introduce neutralizeMediaDirectives in vision.ts that
prepends '[neutralized] ' to any line whose trimStart() begins with MEDIA:
(case-insensitive), defanging the parser anchor while keeping the original
text human-readable.
P2 (compatibility): pass resolveRuntimeImageSanitization() to imageResultFromFile
in the vision-failure catch fallback. The non-vision screenshot path already
forwards this option (d5cc0d53b7) so configured agents.defaults.imageMaxDimensionPx
takes effect. Without this fix, any provider timeout/error silently bypasses the
sanitization guard and returns a raw full-resolution screenshot.
Regression coverage:
- vision.test.ts: 6 unit cases for neutralizeMediaDirectives (no-op fast path,
mid-line MEDIA: untouched, line-start defanged, leading-whitespace defanged,
case-insensitive, multiple directives per blob).
- browser-tool.test.ts: 2 integration cases that drive the full screenshot
tool execute path:
- 'neutralizes MEDIA: directives in vision text and does not attach media'
asserts no line matches /^\s*MEDIA:/i in returned text, secret path text
is preserved verbatim, details.media is absent, and imageResultFromFile
is not called on the success path.
- 'preserves screenshot image sanitization on vision failure fallback'
mocks describeImageFileWithModel to reject and asserts the fallback
imageResultFromFile call receives imageSanitization: {maxDimensionPx:1600}
plus the 'browser screenshot vision failed' extraText.
* fix(browser): apply clawsweeper fallback media fix from PR #84247
* refactor: reuse media image understanding for browser screenshots
* refactor: use structured media delivery
* test: update music completion media instruction expectation
* fix: trim buffered reply directive padding
* test: refresh codex prompt snapshots for message media aliases
---------
Co-authored-by: scotthuang <scotthuang@tencent.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
Route internal model catalog imports to the extracted @openclaw/model-catalog-core package and delete obsolete internal facades.
Keep public SDK declarations self-contained by wrapping core helpers at public boundaries instead of leaking private package imports.
Verification:
- pnpm test src/plugins/contracts/model-catalog-core-imports.test.ts src/plugins/sdk-alias.test.ts packages/model-catalog-core/src/configured-model-refs.test.ts packages/model-catalog-core/src/provider-model-id-normalize.test.ts packages/model-catalog-core/src/provider-model-id-normalization.test.ts src/config/config.model-ref-validation.test.ts src/agents/model-selection.test.ts src/plugin-sdk/provider-model-shared.test.ts -- --reporter=verbose
- pnpm check:test-types
- pnpm test:extensions:package-boundary:compile
- pnpm build
- rg "@openclaw/model-catalog-core" dist/plugin-sdk packages/plugin-sdk/dist -n --glob '*.d.ts' || true
- git diff --check
- autoreview clean after fix
CI note: merged with admin override because checks-node-agentic-commands-doctor and checks-node-core-runtime-infra-state failed twice with exit 143/no-output watchdog termination after prior passing test output, while relevant local proof and the rest of CI were green.
Normalize Google Gemini 3.1 Flash Lite routing to the GA model id and keep the retired preview spelling as a compatibility alias. Align default alias docs, FAQ guidance, and deprecated-model manifest recommendations with the GA id.
Fixes#86151.
Co-authored-by: Sebastien Tardif <sebtardif@ncf.ca>
Summary:
- The PR adds HEIC/HEIF-to-JPEG normalization before media-understanding image description providers run, with regression tests and a changelog entry.
- PR surface: Source +58, Tests +82, Docs +1. Total +141 across 6 files.
- Reproducibility: yes. at source level: current main forwards HEIC buffers to `describeImage` without normali ... ody includes a red HEIC regression test before the patch. I did not execute tests in this read-only review.
Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(media-understanding): normalize HEIC before image descriptions
Validation:
- ClawSweeper review passed for head ed34620bd7.
- Required merge gates passed before the squash merge.
Prepared head SHA: ed34620bd7
Review: https://github.com/openclaw/openclaw/pull/86037#issuecomment-4528578874
Co-authored-by: luoyanglang <hanwanlonga@gmail.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
Restore the describeImageWithModel default token budget to the helper-level 4096-token default instead of forcing 512 before resolution.
Add regression coverage for the default and for smaller model caps, and record the user-facing fix in the changelog.
Co-authored-by: scotthuang <scotthuang@tencent.com>
Restores WebChat image uploads to the media-understanding flow without one-turn model overrides.
- removes image-model override plumbing from the reply run
- stages WebChat images as MediaPaths for enrichment
- avoids replaying already-understood images to text-only reply models while preserving undescribed images
Co-authored-by: NianJiuZst <3235467914@qq.com>
Summary:
- The PR changes sherpa-onnx CLI audio parsing so structured JSON with an empty `text` field becomes no transcript, while preserving non-empty JSON extraction and adding direct plus auto-detect regression coverage.
- Reproducibility: yes. Source inspection on current main shows empty sherpa structured JSON misses extraction ... scord voice can skip empty transcripts; I did not run a live Discord reproduction in this read-only review.
Automerge notes:
- PR branch already contained follow-up commit before automerge: Fix stale CI guardrails for sherpa transcript PR
- PR branch already contained follow-up commit before automerge: Skip empty sherpa structured transcripts
Validation:
- ClawSweeper review passed for head ac03171cfc.
- Required merge gates passed before the squash merge.
Prepared head SHA: ac03171cfc
Review: https://github.com/openclaw/openclaw/pull/84667#issuecomment-4501484167
Co-authored-by: Andy Ye <35905412+TurboTheTurtle@users.noreply.github.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
Fixes Copilot image understanding by exchanging OAuth tokens for Copilot API tokens, routing Copilot Gemini image requests through Chat Completions, and sending the prompt in user content with Copilot vision headers.
Real behavior proof:
- Old Responses route with real Copilot key reproduced `400 model gemini-3.1-pro-preview does not support Responses API`.
- Fixed route with the same real Copilot key returned `Cat`.
- Final CLI live smoke returned `ok: true` and `text: Cat` for `github-copilot/gemini-3.1-pro-preview`.
Verification:
- pnpm test src/media-understanding/image.test.ts extensions/github-copilot/models.test.ts extensions/github-copilot/stream.test.ts src/agents/pi-hooks/compaction-safeguard.test.ts -- --reporter=verbose
- pnpm check:changed via Blacksmith Testbox tbx_01krgt56pqmft8txekt017wke6, Actions run https://github.com/openclaw/openclaw/actions/runs/25803926150, exit 0.
Refs #80393, #80442.
Co-authored-by: Yang Haoyu <150496764+afunnyhy@users.noreply.github.com>