When echoTranscript is enabled in tools.media.audio config, the
transcription text is sent back to the originating chat immediately
after successful audio transcription — before the agent processes it.
This lets users verify what was heard from their voice note.
Changes:
- config/types.tools.ts: add echoTranscript (bool) and echoFormat
(string template) to MediaUnderstandingConfig
- media-understanding/apply.ts: sendTranscriptEcho() helper that
resolves channel/to from ctx, guards on isDeliverableMessageChannel,
and calls deliverOutboundPayloads best-effort
- config/schema.help.ts: help text for both new fields
- config/schema.labels.ts: labels for both new fields
- media-understanding/apply.echo-transcript.test.ts: 10 vitest cases
covering disabled/enabled/custom-format/no-audio/failed-transcription/
non-deliverable-channel/missing-from/OriginatingTo/delivery-failure
Default echoFormat: '📝 "{transcript}"'
Closes#32102
Expose audio transcription through the PluginRuntime so external
plugins (e.g. marmot) can use openclaw's media-understanding provider
framework without importing unexported internal modules.
The new transcribeAudioFile() wraps runCapability({capability: "audio"})
and reads provider/model/apiKey from tools.media.audio in the config,
matching the pattern used by the Discord VC implementation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Increase test audio file sizes to meet MIN_AUDIO_FILE_BYTES (1024) threshold
introduced by the skip-empty-audio feature. Fix localPathRoots in skip-tiny-audio
tests so temp files pass path validation. Remove undefined loadApply() call
in apply.test.ts.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a minimum file size guard (MIN_AUDIO_FILE_BYTES = 1024) before
sending audio to transcription APIs. Files below this threshold are
almost certainly empty or corrupt and would cause unhelpful errors
from Whisper/Deepgram/Groq providers.
Changes:
- Add 'tooSmall' skip reason to MediaUnderstandingSkipError
- Add MIN_AUDIO_FILE_BYTES constant (1024 bytes) to defaults
- Guard both provider and CLI audio paths in runner.ts
- Add comprehensive tests for tiny, empty, and valid audio files
- Update existing test fixtures to use audio files above threshold
runProviderEntry now calls resolveProxyFetchFromEnv() and passes the
result as fetchFn to transcribeAudio/describeVideo, so media provider
API calls respect HTTPS_PROXY/HTTP_PROXY behind corporate proxies.
Move makeProxyFetch to src/infra/net/proxy-fetch.ts and add
resolveProxyFetchFromEnv which reads standard proxy env vars
(HTTPS_PROXY, HTTP_PROXY, and lowercase variants) and returns a
proxy-aware fetch via undici's EnvHttpProxyAgent. Telegram re-exports
from the shared location to avoid duplication.
The openai provider implements transcribeAudio via
transcribeOpenAiCompatibleAudio (Whisper API), but its capabilities
array only declared ["image"]. This caused the media-understanding
runner to skip the openai provider when processing inbound audio
messages, resulting in raw audio files being passed to agents
instead of transcribed text.
Fix: Add "audio" to the capabilities array so the runner correctly
selects the openai provider for audio transcription.
Co-authored-by: Cursor <cursoragent@cursor.com>
Thread history and thread starter were being fetched and included on
every message in a Slack thread, causing unnecessary token bloat. The
session transcript already contains the full conversation history, so
re-fetching and re-injecting thread history on each turn is redundant.
Now thread history is only fetched for new thread sessions
(!threadSessionPreviousTimestamp). Existing sessions rely on their
transcript for context.
Fixes#32121
When `allowSyntheticToolResults` is false (OpenAI, OpenRouter, and most
third-party providers), the guard never cleared its pending tool call map
when a user message arrived during in-flight tool execution. This left
orphaned tool_use blocks in the transcript with no matching tool_result,
causing the provider API to reject all subsequent requests with 400 errors
and permanently breaking the session.
The fix removes the `allowSyntheticToolResults` gate around the flush
calls. `flushPendingToolResults()` already handles both cases correctly:
it only inserts synthetic results when allowed, and always clears the
pending map. The gate was preventing the map from being cleared at all
for providers that disable synthetic results.
Fixes#32098
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The health monitor was created once at startup and never touched by
applyHotReload(), so changing channelHealthCheckMinutes only took
effect after a full gateway restart.
Wire up a "restart-health-monitor" reload action so hot-reload can
stop the old monitor and (re)create one with the updated interval —
or disable it entirely when set to 0.
Closes#32105
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The `fromMe` flag from Baileys' WAMessage.key was only used for
access-control filtering and then discarded. This meant agents
could not distinguish owner-sent messages from contact messages
in DM conversations (everything appeared as from the contact).
Add `fromMe` to `WebInboundMessage`, store it during message
construction, and thread it through `buildInboundLine` →
`formatInboundEnvelope` so DM transcripts prefix owner messages
with `(self):`.
Closes#32061
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>