Expose audio transcription through the PluginRuntime so external
plugins (e.g. marmot) can use openclaw's media-understanding provider
framework without importing unexported internal modules.
The new transcribeAudioFile() wraps runCapability({capability: "audio"})
and reads provider/model/apiKey from tools.media.audio in the config,
matching the pattern used by the Discord VC implementation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Increase test audio file sizes to meet MIN_AUDIO_FILE_BYTES (1024) threshold
introduced by the skip-empty-audio feature. Fix localPathRoots in skip-tiny-audio
tests so temp files pass path validation. Remove undefined loadApply() call
in apply.test.ts.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a minimum file size guard (MIN_AUDIO_FILE_BYTES = 1024) before
sending audio to transcription APIs. Files below this threshold are
almost certainly empty or corrupt and would cause unhelpful errors
from Whisper/Deepgram/Groq providers.
Changes:
- Add 'tooSmall' skip reason to MediaUnderstandingSkipError
- Add MIN_AUDIO_FILE_BYTES constant (1024 bytes) to defaults
- Guard both provider and CLI audio paths in runner.ts
- Add comprehensive tests for tiny, empty, and valid audio files
- Update existing test fixtures to use audio files above threshold
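The guard described above can be sketched roughly as follows. Only MIN_AUDIO_FILE_BYTES, MediaUnderstandingSkipError, and the 'tooSmall' reason come from this change; the constructor signature and the assertAudioLargeEnough helper are assumptions made for illustration.

```typescript
// Threshold named in this commit.
const MIN_AUDIO_FILE_BYTES = 1024;

type SkipReason = "tooSmall";

// Assumed shape: the real error class may carry other reasons and fields.
class MediaUnderstandingSkipError extends Error {
  readonly reason: SkipReason;

  constructor(reason: SkipReason, message: string) {
    super(message);
    this.name = "MediaUnderstandingSkipError";
    this.reason = reason;
  }
}

// Hypothetical helper: throws a skip error before any provider or CLI call
// when the file is below the threshold.
function assertAudioLargeEnough(sizeBytes: number): void {
  if (sizeBytes < MIN_AUDIO_FILE_BYTES) {
    throw new MediaUnderstandingSkipError(
      "tooSmall",
      `audio file is ${sizeBytes} bytes; minimum is ${MIN_AUDIO_FILE_BYTES}`,
    );
  }
}
```

A file of exactly 1024 bytes passes in this sketch, since only files below the threshold are skipped.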
runProviderEntry now calls resolveProxyFetchFromEnv() and passes the
result as fetchFn to transcribeAudio/describeVideo, so media provider
API calls respect HTTPS_PROXY/HTTP_PROXY behind corporate proxies.
Move makeProxyFetch to src/infra/net/proxy-fetch.ts and add
resolveProxyFetchFromEnv which reads standard proxy env vars
(HTTPS_PROXY, HTTP_PROXY, and lowercase variants) and returns a
proxy-aware fetch via undici's EnvHttpProxyAgent. Telegram re-exports
from the shared location to avoid duplication.
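As a rough illustration of the env lookup only: the sketch below finds the first proxy variable that is set. The helper name findProxyEnvVar and the lookup order are assumptions; the actual resolveProxyFetchFromEnv delegates to undici's EnvHttpProxyAgent, which applies its own precedence rules.

```typescript
type Env = Record<string, string | undefined>;

// The four variables named in this commit. The order here is an assumption.
const PROXY_ENV_KEYS = ["HTTPS_PROXY", "HTTP_PROXY", "https_proxy", "http_proxy"] as const;

// Hypothetical helper: returns the first non-empty proxy variable, or
// undefined when no proxy is configured (the caller would then use plain fetch).
function findProxyEnvVar(env: Env): { key: string; url: string } | undefined {
  for (const key of PROXY_ENV_KEYS) {
    const raw = env[key];
    const value = raw === undefined ? "" : raw.trim();
    if (value !== "") {
      return { key, url: value };
    }
  }
  return undefined;
}
```

Passing the environment as a parameter (for example process.env in Node.js) keeps the lookup easy to test.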
The openai provider implements transcribeAudio via
transcribeOpenAiCompatibleAudio (Whisper API), but its capabilities
array only declared ["image"]. This caused the media-understanding
runner to skip the openai provider when processing inbound audio
messages, resulting in raw audio files being passed to agents
instead of transcribed text.
Fix: Add "audio" to the capabilities array so the runner correctly
selects the openai provider for audio transcription.
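A minimal sketch of why the missing capability mattered, assuming the runner filters providers by their declared capabilities. The ProviderEntry shape and selectProviders helper are hypothetical, not the runner's actual code.

```typescript
type Capability = "image" | "audio" | "video";

// Assumed, simplified provider registration shape.
interface ProviderEntry {
  id: string;
  capabilities: Capability[];
}

// Hypothetical selection step: a provider is only considered for a
// capability it declares, regardless of which methods it implements.
function selectProviders(providers: ProviderEntry[], capability: Capability): ProviderEntry[] {
  return providers.filter((p) => p.capabilities.indexOf(capability) !== -1);
}

const before: ProviderEntry[] = [{ id: "openai", capabilities: ["image"] }];
const after: ProviderEntry[] = [{ id: "openai", capabilities: ["image", "audio"] }];
```

With the `before` registration the audio lookup comes back empty, so nothing transcribes the message; with `after` the openai entry is selected.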
Co-authored-by: Cursor <cursoragent@cursor.com>
Thread history and thread starter were being fetched and included on
every message in a Slack thread, causing unnecessary token bloat. The
session transcript already contains the full conversation history, so
re-fetching and re-injecting thread history on each turn is redundant.
Now thread history is only fetched for new thread sessions
(!threadSessionPreviousTimestamp). Existing sessions rely on their
transcript for context.
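The condition can be sketched as below. The state shape, the numeric timestamp type, and the synchronous fetchHistory callback are assumptions for illustration; the real fetch is presumably asynchronous.

```typescript
// Assumed shape: only threadSessionPreviousTimestamp is named in this commit.
interface ThreadSessionState {
  threadSessionPreviousTimestamp?: number;
}

// A session with no previous timestamp is treated as a new thread session.
function shouldFetchThreadHistory(state: ThreadSessionState): boolean {
  return !state.threadSessionPreviousTimestamp;
}

// Hypothetical wrapper: existing sessions skip the fetch and rely on
// their transcript instead.
function buildThreadContext(state: ThreadSessionState, fetchHistory: () => string[]): string[] {
  return shouldFetchThreadHistory(state) ? fetchHistory() : [];
}
```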
Fixes #32121
When `allowSyntheticToolResults` is false (OpenAI, OpenRouter, and most
third-party providers), the guard never cleared its pending tool call map
when a user message arrived during in-flight tool execution. This left
orphaned tool_use blocks in the transcript with no matching tool_result,
causing the provider API to reject all subsequent requests with 400 errors
and permanently breaking the session.
The fix removes the `allowSyntheticToolResults` gate around the flush
calls. `flushPendingToolResults()` already handles both cases correctly:
it only inserts synthetic results when allowed, and always clears the
pending map. The gate was preventing the map from being cleared at all
for providers that disable synthetic results.
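A simplified model of the behavior after the fix, assuming a guard that tracks pending tool calls in a map. The class, its method names other than flushPendingToolResults, and the result shape are hypothetical.

```typescript
interface SyntheticToolResult {
  toolCallId: string;
  content: string;
}

class PendingToolCallGuard {
  // tool call id -> tool name
  private readonly pending = new Map<string, string>();

  constructor(private readonly allowSyntheticToolResults: boolean) {}

  trackToolCall(id: string, name: string): void {
    this.pending.set(id, name);
  }

  pendingCount(): number {
    return this.pending.size;
  }

  // Inserts synthetic results only when allowed, but always clears the map.
  flushPendingToolResults(): SyntheticToolResult[] {
    const results: SyntheticToolResult[] = [];
    if (this.allowSyntheticToolResults) {
      this.pending.forEach((name, id) => {
        results.push({ toolCallId: id, content: `[${name} interrupted by user message]` });
      });
    }
    this.pending.clear();
    return results;
  }

  // After the fix: flush unconditionally, with no allowSyntheticToolResults gate.
  onUserMessage(): SyntheticToolResult[] {
    return this.flushPendingToolResults();
  }
}
```

In this model, a guard created with `false` returns no synthetic results on a user message but still ends up with an empty pending map, which is the property the gate previously broke.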
Fixes #32098
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The health monitor was created once at startup and never touched by
applyHotReload(), so changing channelHealthCheckMinutes only took
effect after a full gateway restart.
Wire up a "restart-health-monitor" reload action so hot-reload can
stop the old monitor and (re)create one with the updated interval —
or disable it entirely when set to 0.
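The reload action can be sketched as a stop-then-recreate step. The HealthMonitor interface, the restartHealthMonitor function, and the factory parameter are assumptions; only the action name and the channelHealthCheckMinutes setting come from this commit.

```typescript
// Assumed minimal monitor surface.
interface HealthMonitor {
  readonly intervalMinutes: number;
  stop(): void;
}

// Hypothetical handler for the "restart-health-monitor" reload action.
function restartHealthMonitor(
  current: HealthMonitor | undefined,
  channelHealthCheckMinutes: number,
  createMonitor: (intervalMinutes: number) => HealthMonitor,
): HealthMonitor | undefined {
  // Always stop the old monitor so its timer does not keep the stale interval.
  if (current) {
    current.stop();
  }
  // 0 disables health checks entirely.
  if (channelHealthCheckMinutes === 0) {
    return undefined;
  }
  return createMonitor(channelHealthCheckMinutes);
}
```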
Closes #32105
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The `fromMe` flag from Baileys' WAMessage.key was only used for
access-control filtering and then discarded. This meant agents
could not distinguish owner-sent messages from contact messages
in DM conversations (everything appeared as from the contact).
Add `fromMe` to `WebInboundMessage`, store it during message
construction, and thread it through `buildInboundLine` →
`formatInboundEnvelope` so DM transcripts prefix owner messages
with `(self):`.
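A minimal sketch of the prefixing, assuming a simplified message shape. The field names other than fromMe, and the exact envelope layout, are hypothetical; the real formatInboundEnvelope likely includes more metadata.

```typescript
// Assumed, reduced version of WebInboundMessage.
interface InboundMessageSketch {
  body: string;
  chatType: "direct" | "group";
  fromMe?: boolean;
}

// Hypothetical formatter: owner-sent DM lines get a "(self):" prefix so the
// agent can tell them apart from the contact's messages.
function formatInboundLineSketch(msg: InboundMessageSketch): string {
  const isOwnerDm = msg.chatType === "direct" && msg.fromMe === true;
  return isOwnerDm ? `(self): ${msg.body}` : msg.body;
}
```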
Closes #32061
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Address Greptile review: show "not a valid OpenClaw plugin" when the
npm package was found but lacks openclaw.extensions, instead of the
misleading "npm package unavailable" message.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When `openclaw plugins install diffs` downloads the unrelated npm
package `diffs@0.1.1` (which lacks `openclaw.extensions`), the install
fails without trying the bundled `@openclaw/diffs` plugin.
Two fixes:
1. Broaden the bundled-fallback trigger to also fire on
   "missing openclaw.extensions" errors (not just npm 404s)
2. Match bundled plugins by pluginId in addition to npmSpec so
   unscoped names like "diffs" resolve to `@openclaw/diffs`
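The two fixes can be sketched as a pair of small predicates. The BundledPlugin shape, both function names, and the error-message matching are assumptions; the actual installer may classify errors differently.

```typescript
// Assumed registry entry for a plugin shipped with openclaw.
interface BundledPlugin {
  pluginId: string;
  npmSpec: string;
}

// Fix 1 (sketch): fall back on npm 404s and on packages that exist but
// lack openclaw.extensions. Matching on message text is an assumption.
function shouldTryBundledFallback(errorMessage: string): boolean {
  return errorMessage.indexOf("404") !== -1 || errorMessage.indexOf("missing openclaw.extensions") !== -1;
}

// Fix 2 (sketch): match by npmSpec or by pluginId.
function findBundledPlugin(bundled: BundledPlugin[], requested: string): BundledPlugin | undefined {
  for (const plugin of bundled) {
    if (plugin.npmSpec === requested || plugin.pluginId === requested) {
      return plugin;
    }
  }
  return undefined;
}
```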
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Addresses #31699 — config .bak files persist with sensitive data.
Changes:
- Explicitly chmod 0o600 on all .bak files after creation, instead of
  relying on copyFile to preserve source permissions (not guaranteed on
  all platforms, e.g. Windows, NFS mounts).
- Clean up orphan .bak files that fall outside the managed 5-deep
  rotation ring (e.g. PID-stamped leftovers from interrupted writes,
  manual backups like .bak.before-marketing).
- Add tests for permission hardening and orphan cleanup.
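The orphan detection can be sketched as a pure classification over file names. The ring naming used here (`<config>.bak`, then `.bak.1` through `.bak.4`) is an assumption, as are both function names; the permission step (chmod 0o600 after each copy) is not shown.

```typescript
// 5-deep ring, per this change.
const BACKUP_RING_DEPTH = 5;

// Assumed naming: "<config>.bak", "<config>.bak.1", ... "<config>.bak.4".
function isManagedBackup(fileName: string, configName: string): boolean {
  const base = `${configName}.bak`;
  if (fileName === base) {
    return true;
  }
  for (let i = 1; i < BACKUP_RING_DEPTH; i++) {
    if (fileName === `${base}.${i}`) {
      return true;
    }
  }
  return false;
}

// Anything that looks like a backup of this config but is not a ring slot
// is treated as an orphan and becomes a cleanup candidate.
function findOrphanBackups(fileNames: string[], configName: string): string[] {
  const prefix = `${configName}.bak`;
  return fileNames.filter((name) => name.indexOf(prefix) === 0 && !isManagedBackup(name, configName));
}
```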
The backup ring itself is preserved — it's a valuable recovery mechanism.
This PR hardens the security surface by ensuring backup files are
always owner-only and stale copies don't accumulate indefinitely.
xAI rejects minLength, maxLength, minItems, maxItems, minContains, and
maxContains in tool schemas with a 502 error instead of ignoring them.
This causes all requests to fail when any tool definition includes these
validation-constraint keywords (e.g. sessions_spawn uses maxLength and
maxItems on its attachment fields).
Add stripXaiUnsupportedKeywords() in schema/clean-for-xai.ts, mirroring
the existing cleanSchemaForGemini() pattern. Apply it in normalizeToolParameters()
when the provider is xai directly, or openrouter with an x-ai/* model id.
Fixes tool calls for x-ai/grok-* models, both directly and via OpenRouter.
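A minimal sketch of the stripping step, assuming a recursive walk over a plain JSON schema. The function name and keyword list come from this commit; the traversal details, the isPlainObject helper, and the isXaiTarget gate are assumptions.

```typescript
const XAI_UNSUPPORTED_KEYWORDS: ReadonlyArray<string> = [
  "minLength", "maxLength", "minItems", "maxItems", "minContains", "maxContains",
];

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns a copy of the schema without the constraint keywords xAI rejects.
function stripXaiUnsupportedKeywords(schema: unknown): unknown {
  if (Array.isArray(schema)) {
    return schema.map((item) => stripXaiUnsupportedKeywords(item));
  }
  if (!isPlainObject(schema)) {
    return schema;
  }
  const cleaned: Record<string, unknown> = {};
  for (const key of Object.keys(schema)) {
    const value = schema[key];
    if (key === "properties" && isPlainObject(value)) {
      // Keys under "properties" are field names, not keywords: keep them all.
      const props: Record<string, unknown> = {};
      for (const name of Object.keys(value)) {
        props[name] = stripXaiUnsupportedKeywords(value[name]);
      }
      cleaned[key] = props;
    } else if (XAI_UNSUPPORTED_KEYWORDS.indexOf(key) === -1) {
      cleaned[key] = stripXaiUnsupportedKeywords(value);
    }
  }
  return cleaned;
}

// Hypothetical gate: xai directly, or openrouter with an x-ai/* model id.
function isXaiTarget(provider: string, modelId: string): boolean {
  return provider === "xai" || (provider === "openrouter" && modelId.indexOf("x-ai/") === 0);
}
```

The `properties` special case means a field that happens to be called `maxLength` survives, while the constraint keyword of the same name is removed.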
The downloadAndSaveTelegramFile inner function only used the server-side
file path (e.g. "documents/file_42.pdf") or the Content-Disposition
header (which Telegram doesn't send) to derive the saved filename.
The original filename provided by Telegram via msg.document.file_name,
msg.audio.file_name, msg.video.file_name, and msg.animation.file_name
was never passed through, causing all inbound files to lose their
user-provided names.
Now downloadAndSaveTelegramFile accepts an optional telegramFileName
parameter that takes priority over the fetched/server-side name.
The resolveMedia call site extracts the original name from the message
and passes it through.
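The name resolution can be sketched as two small steps. The message shape is reduced to the four fields named above; the helper names, the field order in extractTelegramFileName, and the options object are assumptions.

```typescript
interface NamedFile {
  file_name?: string;
}

// Reduced message shape covering only the fields named in this commit.
interface TelegramMessageSketch {
  document?: NamedFile;
  audio?: NamedFile;
  video?: NamedFile;
  animation?: NamedFile;
}

// Hypothetical extraction at the resolveMedia call site.
function extractTelegramFileName(msg: TelegramMessageSketch): string | undefined {
  const candidates = [msg.document, msg.audio, msg.video, msg.animation];
  for (const candidate of candidates) {
    if (candidate && candidate.file_name) {
      return candidate.file_name;
    }
  }
  return undefined;
}

// The user-provided name wins; otherwise fall back to the
// Content-Disposition name, then to the server-side path's last segment.
function resolveSavedFileName(opts: {
  telegramFileName?: string;
  contentDispositionName?: string;
  serverFilePath: string;
}): string {
  const segments = opts.serverFilePath.split("/");
  const serverName = segments[segments.length - 1] || opts.serverFilePath;
  return opts.telegramFileName || opts.contentDispositionName || serverName;
}
```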
Closes #31768
Made-with: Cursor
Replace the single per-account messageQueue Promise chain in
DiscordMessageListener with per-channel queues. This restores parallel
processing for channel-bound agents that regressed in 2026.3.1.
Messages within the same channel remain serialized to preserve ordering,
while messages to different channels now proceed independently. Completed
queue entries are cleaned up to prevent memory accumulation.
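The per-channel approach can be sketched as a map of Promise chains keyed by channel id. The class name, method names, and cleanup strategy are assumptions for illustration, not the listener's actual code.

```typescript
class PerChannelQueue {
  // channel id -> tail of that channel's Promise chain (never rejects)
  private readonly tails = new Map<string, Promise<void>>();

  enqueue(channelId: string, task: () => Promise<void>): Promise<void> {
    const previous = this.tails.get(channelId) ?? Promise.resolve();
    // Same channel: run strictly after the previous task in that channel.
    const run = previous.then(task);
    const cleanup = (): void => {
      // Drop the entry only if no newer task has been queued behind this one.
      if (this.tails.get(channelId) === tail) {
        this.tails.delete(channelId);
      }
    };
    // The stored tail swallows failures so one bad message cannot block the channel.
    const tail: Promise<void> = run.then(cleanup, cleanup);
    this.tails.set(channelId, tail);
    return run;
  }

  get activeChannels(): number {
    return this.tails.size;
  }
}
```

Because each channel has its own chain, a slow handler in one channel delays only later messages in that channel.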
Closes #31530
The native streaming path (chatStream) and preview final edit path
(chat.update) send raw Markdown text without converting to Slack
mrkdwn format. This causes **bold** to appear as literal asterisks
instead of rendered bold text.
Apply markdownToSlackMrkdwn() in streaming.ts (start/append/stop) and
in dispatch.ts (preview final edit via chat.update) to match the
non-streaming delivery path behavior.
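To illustrate the kind of conversion involved, here is a deliberately reduced stand-in that handles only bold and links. It is not the project's markdownToSlackMrkdwn(), whose behavior is broader; the function name below is hypothetical.

```typescript
// Simplified sketch: covers **bold** and [text](url) only.
function toSlackMrkdwnSketch(markdown: string): string {
  return markdown
    // Markdown bold (**text**) becomes Slack bold (*text*).
    .replace(/\*\*(.+?)\*\*/g, "*$1*")
    // Markdown links become Slack's <url|text> form.
    .replace(/\[([^\]]+)\]\(([^)]+)\)/g, "<$2|$1>");
}
```

Without a step like this, Slack shows the double asterisks literally, which is the symptom described above.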
Closes #31892
When enforceFinalTag is active (Google providers), stripBlockTags
correctly returns empty for text without <final> tags. However, the
handleMessageEnd fallback recovered raw text, bypassing this protection
and leaking internal reasoning (e.g. "**Applying single-bot mention
rule**NO_REPLY") to Discord.
Guard the fallback with an enforceFinalTag check: if the provider is
supposed to use <final> tags and none were seen, treat the text as
leaked reasoning and suppress it.
Also harden the stripSilentToken regex to allow bold markdown (**) as
a separator before NO_REPLY, matching the pattern Gemini Flash Lite
produces.
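The fallback guard can be sketched as below. The function name, the options shape, and the sawFinalTag flag are assumptions; the stripSilentToken regex change is not modeled here.

```typescript
// Hypothetical inputs to the handleMessageEnd fallback decision.
interface FinalTextOptions {
  strippedText: string; // output of stripBlockTags
  rawText: string; // unprocessed model output
  enforceFinalTag: boolean;
  sawFinalTag: boolean;
}

function resolveFinalTextSketch(opts: FinalTextOptions): string {
  if (opts.strippedText !== "") {
    return opts.strippedText;
  }
  // With enforcement on and no <final> tag seen, the raw text is treated as
  // leaked reasoning and suppressed instead of being recovered.
  if (opts.enforceFinalTag && !opts.sawFinalTag) {
    return "";
  }
  // Providers without tag enforcement keep the raw-text fallback.
  return opts.rawText;
}
```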
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>