diff --git a/CHANGELOG.md b/CHANGELOG.md
index 73f0b04a807..b84c94e2870 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,7 @@ Docs: https://docs.openclaw.ai
 ### Changes
 
 - Outbound adapters/plugins: add shared `sendPayload` support across direct-text-media, Discord, Slack, WhatsApp, Zalo, and Zalouser with multi-media iteration and chunk-aware text fallback. (#30144) Thanks @nohat.
+- Media understanding/audio echo: add optional `tools.media.audio.echoTranscript` + `echoFormat` to send a pre-agent transcript confirmation message to the originating chat, with echo disabled by default. (#32150) Thanks @AytuncYildizli.
 - Plugin runtime/STT: add `api.runtime.stt.transcribeAudioFile(...)` so extensions can transcribe local audio files through OpenClaw's configured media-understanding audio providers. (#22402) Thanks @benthecarman.
 - Sessions/Attachments: add inline file attachment support for `sessions_spawn` (subagent runtime only) with base64/utf8 encoding, transcript content redaction, lifecycle cleanup, and configurable limits via `tools.sessions_spawn.attachments`. (#16761) Thanks @napetrov.
 - Tools/PDF analysis: add a first-class `pdf` tool with native Anthropic and Google PDF provider support, extraction fallback for non-native models, configurable defaults (`agents.defaults.pdfModel`, `pdfMaxBytesMb`, `pdfMaxPages`), and docs/tests covering routing, validation, and registration. (#31319) Thanks @tyler6204.
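Reviewer note: the changelog entry describes `echoFormat` as a template for the confirmation message, and the docs example in this change uses a `{transcript}` placeholder. The sketch below shows one way such a template could be expanded. It is an illustration only: `AudioEchoConfig`, `formatTranscriptEcho`, and `FALLBACK_ECHO_FORMAT` are hypothetical names, and the fallback template is assumed from the docs example rather than taken from the OpenClaw source.

```typescript
// Hypothetical shape for the two new options; not the real OpenClaw type.
interface AudioEchoConfig {
  echoTranscript?: boolean; // documented default: false
  echoFormat?: string; // template with a {transcript} placeholder
}

// Assumed fallback, copied from the docs example; the real default may differ.
const FALLBACK_ECHO_FORMAT = '📝 "{transcript}"';

// Returns the message to echo, or undefined when echo is disabled.
function formatTranscriptEcho(
  transcript: string,
  config: AudioEchoConfig = {},
): string | undefined {
  if (config.echoTranscript !== true) {
    return undefined; // echo is opt-in
  }
  const template = config.echoFormat ?? FALLBACK_ECHO_FORMAT;
  // split/join replaces every placeholder and keeps "$" sequences in the
  // transcript literal, unlike String.prototype.replace with a string value.
  return template.split("{transcript}").join(transcript);
}
```

With `echoTranscript` left unset the helper returns `undefined`, which matches the "disabled by default" behavior the changelog describes.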
diff --git a/docs/nodes/media-understanding.md b/docs/nodes/media-understanding.md
index a40921582b0..c04037a7147 100644
--- a/docs/nodes/media-understanding.md
+++ b/docs/nodes/media-understanding.md
@@ -40,6 +40,7 @@ If understanding fails or is disabled, **the reply flow continues** with the ori
   - defaults (`prompt`, `maxChars`, `maxBytes`, `timeoutSeconds`, `language`)
   - provider overrides (`baseUrl`, `headers`, `providerOptions`)
   - Deepgram audio options via `tools.media.audio.providerOptions.deepgram`
+  - audio transcript echo controls (`echoTranscript`, default `false`; `echoFormat`)
   - optional **per‑capability `models` list** (preferred before shared models)
   - `attachments` policy (`mode`, `maxAttachments`, `prefer`)
   - `scope` (optional gating by channel/chatType/session key)
@@ -57,6 +58,8 @@ If understanding fails or is disabled, **the reply flow continues** with the ori
       },
       audio: {
         /* optional overrides */
+        echoTranscript: true,
+        echoFormat: '📝 "{transcript}"',
       },
       video: {
         /* optional overrides */
diff --git a/src/media-understanding/apply.echo-transcript.test.ts b/src/media-understanding/apply.echo-transcript.test.ts
index afda260e2f3..a088525ae46 100644
--- a/src/media-understanding/apply.echo-transcript.test.ts
+++ b/src/media-understanding/apply.echo-transcript.test.ts
@@ -68,7 +68,7 @@ let suiteTempMediaRootDir = "";
 async function createTempAudioFile(): Promise<string> {
   const dir = await fs.mkdtemp(path.join(suiteTempMediaRootDir, "case-"));
   const filePath = path.join(dir, "note.ogg");
-  await fs.writeFile(filePath, Buffer.from([0, 255, 0, 1, 2, 3, 4, 5, 6, 7, 8]));
+  await fs.writeFile(filePath, Buffer.alloc(2048, 0xab));
   return filePath;
 }
 
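Reviewer note: the test change above replaces an 11-byte fixture with `Buffer.alloc(2048, 0xab)`. The snippet below only compares the two buffers. The diff does not state why the fixture grew. One plausible reason, which is an assumption here, is that very small audio files are skipped somewhere before transcription, so the old fixture would never reach the echo path.

```typescript
import { Buffer } from "node:buffer";

// Old fixture: 11 explicit byte values.
const oldFixture = Buffer.from([0, 255, 0, 1, 2, 3, 4, 5, 6, 7, 8]);

// New fixture: 2048 bytes, each set to the fill value 0xab.
const newFixture = Buffer.alloc(2048, 0xab);

console.log(oldFixture.length); // 11
console.log(newFixture.length); // 2048
console.log(newFixture.every((byte) => byte === 0xab)); // true
```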