fix: apply missed media/runtime follow-ups from merged PRs

This commit is contained in:
Peter Steinberger
2026-03-02 21:45:29 +00:00
parent f2b37f0aa9
commit a183656f8f
9 changed files with 205 additions and 9 deletions

View File

@@ -123,6 +123,7 @@ Recommended defaults:
Rules:
- If media exceeds `maxBytes`, that model is skipped and the **next model is tried**.
- Audio files smaller than **1024 bytes** are treated as empty/corrupt and skipped before provider/CLI transcription.
- If the model returns more than `maxChars`, output is trimmed.
- `prompt` defaults to simple “Describe the {media}.” plus the `maxChars` guidance (image/video only).
- If `<capability>.enabled: true` but no models are configured, OpenClaw tries the
@@ -160,6 +161,20 @@ To disable auto-detection, set:
Note: Binary detection is best-effort across macOS/Linux/Windows; ensure the CLI is on `PATH` (we expand `~`), or set an explicit CLI model with a full command path.
### Proxy environment support (provider models)
When provider-based **audio** and **video** media understanding is enabled, OpenClaw
honors standard outbound proxy environment variables for provider HTTP calls:
- `HTTPS_PROXY`
- `HTTP_PROXY`
- `https_proxy`
- `http_proxy`
If no proxy env vars are set, media understanding uses direct egress.
If the proxy value is malformed, OpenClaw logs a warning and falls back to direct
fetch.
## Capabilities (optional)
If you set `capabilities`, the entry only runs for those media types. For shared

View File

@@ -90,6 +90,22 @@ Notes:
- Returns PCM audio buffer + sample rate. Plugins must resample/encode for providers.
- Edge TTS is not supported for telephony.
For STT/transcription, plugins can call:
```ts
const { text } = await api.runtime.stt.transcribeAudioFile({
filePath: "/tmp/inbound-audio.ogg",
cfg: api.config,
// Optional when MIME cannot be inferred reliably:
mime: "audio/ogg",
});
```
Notes:
- Uses core media-understanding audio configuration (`tools.media.audio`) and provider fallback order.
- Returns `{ text: undefined }` when no transcription output is produced (for example skipped/unsupported input).
## Discovery & precedence
OpenClaw scans, in order: