openclaw/docs/nodes/images.md at fb1dfd486bb9aca05055d88c51efe4fbc279a9fc

mirror of https://github.com/openclaw/openclaw.git synced 2026-05-29 19:39:27 +00:00

Peter Steinberger bb46b79d3c refactor: internalize OpenClaw agent runtime (#85341)

2026-05-27 19:24:04 +01:00

summary: Image and media handling rules for send, gateway, and agent replies
read_when: Modifying media pipeline or attachments
title: Image and media support

The WhatsApp channel runs via Baileys Web. This document captures the current media handling rules for send, gateway, and agent replies.

Goals

Send media with optional captions via openclaw message send --media.
Allow auto-replies from the web inbox to include media alongside text.
Keep per-type limits sane and predictable.

CLI Surface

openclaw message send --media <path-or-url> [--message <caption>]
- --media is optional; the caption can be empty for media-only sends.
- --dry-run prints the resolved payload; --json emits { channel, to, messageId, mediaUrl, caption }.

WhatsApp Web channel behavior

Input: local file path or HTTP(S) URL.
Flow: load into a Buffer, detect media kind, and build the correct payload:
- Images: resize & recompress to JPEG (max side 2048px) targeting channels.whatsapp.mediaMaxMb (default: 50 MB).
- Audio/Voice/Video: pass-through up to 16 MB; audio is sent as a voice note (ptt: true).
- Documents: anything else, up to 100 MB, with filename preserved when available.
WhatsApp GIF-style playback: send an MP4 with gifPlayback: true (CLI: --gif-playback) so mobile clients loop inline.
MIME detection prefers magic bytes, then headers, then file extension.
Caption comes from --message or reply.text; empty caption is allowed.
Logging: non-verbose shows ↩️/✅; verbose includes size and source path/URL.
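
The detection order described above (magic bytes, then headers, then file extension) could be sketched roughly as follows. This is a minimal illustration, not OpenClaw's implementation: the function names, the signature table, and the extension map are all hypothetical, and the magic-byte list is deliberately incomplete.

```typescript
// Hypothetical sketch; none of these names come from the OpenClaw codebase.
type MediaKind = "image" | "audio" | "video" | "document";

// Recognize a handful of well-known file signatures.
function sniffMagic(bytes: Uint8Array): string | undefined {
  const startsWith = (sig: number[], offset = 0): boolean =>
    sig.every((b, i) => bytes[offset + i] === b);
  if (startsWith([0xff, 0xd8, 0xff])) return "image/jpeg";
  if (startsWith([0x89, 0x50, 0x4e, 0x47])) return "image/png";
  if (startsWith([0x25, 0x50, 0x44, 0x46])) return "application/pdf"; // "%PDF"
  if (startsWith([0x4f, 0x67, 0x67, 0x53])) return "audio/ogg"; // "OggS", assumed audio
  // "ftyp" at offset 4 marks an ISO base media container; treated as MP4 here.
  if (startsWith([0x66, 0x74, 0x79, 0x70], 4)) return "video/mp4";
  return undefined;
}

// Illustrative extension fallback table.
const EXTENSION_MIME = new Map<string, string>([
  ["jpg", "image/jpeg"],
  ["jpeg", "image/jpeg"],
  ["png", "image/png"],
  ["mp3", "audio/mpeg"],
  ["ogg", "audio/ogg"],
  ["mp4", "video/mp4"],
  ["pdf", "application/pdf"],
]);

// Precedence: magic bytes, then the Content-Type header, then the extension.
function detectMime(bytes: Uint8Array, headerType?: string, fileName?: string): string {
  const fromMagic = sniffMagic(bytes);
  if (fromMagic) return fromMagic;
  const fromHeader = headerType?.split(";")[0]?.trim().toLowerCase();
  if (fromHeader) return fromHeader;
  const ext = fileName?.split(".").pop()?.toLowerCase() ?? "";
  return EXTENSION_MIME.get(ext) ?? "application/octet-stream";
}

// Anything that is not image, audio, or video is treated as a document.
function classify(mime: string): MediaKind {
  if (mime.startsWith("image/")) return "image";
  if (mime.startsWith("audio/")) return "audio";
  if (mime.startsWith("video/")) return "video";
  return "document";
}
```

Checking the bytes first means a mislabeled upload (for example a JPEG served as text/plain) is still classified by what it actually contains.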

Auto-Reply Pipeline

getReplyFromConfig returns { text?, mediaUrl?, mediaUrls? }.
When media is present, the web sender resolves local paths or URLs using the same pipeline as openclaw message send.
Multiple media entries are sent sequentially if provided.
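
As a rough sketch of the sequential fan-out, a reply payload can be flattened into an ordered list of sends that are then awaited one at a time. Only the `{ text?, mediaUrl?, mediaUrls? }` shape comes from this document. The type and function names are hypothetical, and two behaviors are assumptions: `mediaUrls` takes precedence over `mediaUrl`, and the caption is attached to the first media item only.

```typescript
// Hypothetical sketch; only the reply shape is taken from the text above.
interface ReplyPayload {
  text?: string;
  mediaUrl?: string;
  mediaUrls?: string[];
}

interface PlannedSend {
  mediaUrl?: string;
  caption: string;
}

// Flatten a reply into an ordered list of sends.
// Assumptions: mediaUrls wins over mediaUrl; the caption rides on the first item.
function planSends(reply: ReplyPayload): PlannedSend[] {
  const urls =
    reply.mediaUrls && reply.mediaUrls.length > 0
      ? reply.mediaUrls
      : reply.mediaUrl
        ? [reply.mediaUrl]
        : [];
  const text = reply.text ?? "";
  if (urls.length === 0) return text ? [{ caption: text }] : [];
  return urls.map((mediaUrl, i) => ({ mediaUrl, caption: i === 0 ? text : "" }));
}

// Send one item at a time so ordering is preserved; the sender is injected.
async function sendSequentially(
  plan: PlannedSend[],
  send: (item: PlannedSend) => Promise<void>,
): Promise<void> {
  for (const item of plan) {
    await send(item);
  }
}
```

Awaiting each send inside the loop, rather than using `Promise.all`, keeps the items arriving in the order the reply listed them.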

Inbound Media To Commands

When inbound web messages include media, OpenClaw downloads to a temp file and exposes templating variables:
- {{MediaUrl}} pseudo-URL for the inbound media.
- {{MediaPath}} local temp path written before running the command.
When a per-session Docker sandbox is enabled, inbound media is copied into the sandbox workspace and MediaPath/MediaUrl are rewritten to a relative path like media/inbound/<filename>.
Media understanding (if configured via tools.media.* or shared tools.media.models) runs before templating and can insert [Image], [Audio], and [Video] blocks into Body.
- Audio sets {{Transcript}} and uses the transcript for command parsing so slash commands still work.
- Video and image descriptions preserve any caption text for command parsing.
- If the active primary image model already supports vision natively, OpenClaw skips the [Image] summary block and passes the original image to the model instead.
By default only the first matching image/audio/video attachment is processed; set tools.media.<cap>.attachments to process multiple attachments.
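
To illustrate the templating step, the sketch below substitutes `{{MediaUrl}}`, `{{MediaPath}}`, and `{{Transcript}}` into a command string and rewrites a host path to the sandbox-relative form `media/inbound/<filename>`. The helper names are hypothetical, and rendering unknown variables as empty strings is an assumption rather than documented behavior.

```typescript
// Hypothetical sketch; helper names are not OpenClaw's.
type TemplateVars = Record<string, string | undefined>;

// Replace {{Name}} placeholders; unknown names render as "" (assumption).
function renderTemplate(template: string, vars: TemplateVars): string {
  return template.replace(
    /\{\{\s*(\w+)\s*\}\}/g,
    (_match: string, name: string) => vars[name] ?? "",
  );
}

// Rewrite a host temp path to the sandbox-relative form described above.
function toSandboxPath(hostPath: string): string {
  const fileName = hostPath.split(/[\\/]/).pop() ?? hostPath;
  return `media/inbound/${fileName}`;
}
```

For example, `renderTemplate("describe {{MediaPath}}", { MediaPath: toSandboxPath("/tmp/openclaw/abc.jpg") })` yields `describe media/inbound/abc.jpg`.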

Limits and errors

Outbound send caps (WhatsApp web send)

Images: up to channels.whatsapp.mediaMaxMb (default: 50 MB) after recompression.
Audio/voice/video: 16 MB cap; documents: 100 MB cap.
Oversize or unreadable media logs a clear error, and the reply is skipped.
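
The outbound caps above could be enforced with a check along these lines. This is a sketch under stated assumptions: the helper names are hypothetical, a megabyte is taken to be 1024 × 1024 bytes (the document does not say), and for images the size passed in is the size after recompression.

```typescript
// Hypothetical helper; names are illustrative and not taken from OpenClaw.
type MediaKind = "image" | "audio" | "video" | "document";

const MB = 1024 * 1024; // assumption: caps are counted in binary megabytes

// Caps from the list above; images use the configurable mediaMaxMb value.
function outboundCapBytes(kind: MediaKind, mediaMaxMb = 50): number {
  switch (kind) {
    case "image":
      return mediaMaxMb * MB;
    case "audio":
    case "video":
      return 16 * MB;
    case "document":
      return 100 * MB;
  }
}

type CapResult = { ok: true } | { ok: false; reason: string };

// Returns a reason string on failure so the caller can log it and skip the reply.
function checkOutbound(kind: MediaKind, sizeBytes: number, mediaMaxMb?: number): CapResult {
  const cap = outboundCapBytes(kind, mediaMaxMb);
  if (sizeBytes > cap) {
    return { ok: false, reason: `${kind} is ${sizeBytes} bytes; cap is ${cap} bytes` };
  }
  return { ok: true };
}
```

Returning a result object instead of throwing lets the caller decide how to log the failure before skipping the reply.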

Media understanding caps (transcription/description)

Image default: 10 MB (tools.media.image.maxBytes).
Audio default: 20 MB (tools.media.audio.maxBytes).
Video default: 50 MB (tools.media.video.maxBytes).
Oversize media skips understanding, but replies still go through with the original body.
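
Unlike the outbound caps, exceeding an understanding cap does not drop the reply; it only skips the description step. A minimal sketch of that behavior follows. The config shape mirrors `tools.media.<cap>.maxBytes`, but the function names, the synchronous `describe` callback, and the way the block is joined to the body are assumptions made for illustration.

```typescript
// Hypothetical sketch; only the defaults and config keys come from the text above.
type Capability = "image" | "audio" | "video";
type UnderstandingConfig = Partial<Record<Capability, { maxBytes?: number }>>;

const MB = 1024 * 1024; // assumption: binary megabytes

const DEFAULT_MAX_BYTES: Record<Capability, number> = {
  image: 10 * MB,
  audio: 20 * MB,
  video: 50 * MB,
};

const BLOCK_LABEL: Record<Capability, string> = {
  image: "[Image]",
  audio: "[Audio]",
  video: "[Video]",
};

// Resolve the effective cap, falling back to the documented default.
function maxBytesFor(cap: Capability, config: UnderstandingConfig = {}): number {
  return config[cap]?.maxBytes ?? DEFAULT_MAX_BYTES[cap];
}

// Oversize media leaves the body untouched; otherwise a labeled block is prepended.
// The block layout here is illustrative only.
function buildBody(
  originalBody: string,
  cap: Capability,
  sizeBytes: number,
  describe: () => string,
  config?: UnderstandingConfig,
): string {
  if (sizeBytes > maxBytesFor(cap, config)) return originalBody;
  return `${BLOCK_LABEL[cap]}\n${describe()}\n${originalBody}`;
}
```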

Notes for Tests

Cover send + reply flows for image/audio/document cases.
Validate recompression for images (size bound) and voice-note flag for audio.
Ensure multi-media replies fan out as sequential sends.
