mirror of
https://github.com/openclaw/openclaw.git
synced 2026-05-06 09:20:43 +00:00
fix(google): emit opus voice-note tts
This commit is contained in:
@@ -1,2 +1,2 @@
|
||||
f813474b1623f06e1465daacd56db970e8e92ab1be122faee0fa2a1dc2d4fc43 plugin-sdk-api-baseline.json
|
||||
b3ea88c0c9b4cf6d9a46f0d34149063303853e78ef9708224608e4da79b23190 plugin-sdk-api-baseline.jsonl
|
||||
c911117176b41eebf26470618274a7e093910e9b36855bc045bc8a92f6856745 plugin-sdk-api-baseline.json
|
||||
ff360635f95beb217b9dd207a87eaf331319a7671aea03acfe05911756741b21 plugin-sdk-api-baseline.jsonl
|
||||
|
||||
@@ -252,8 +252,8 @@ The bundled `google` speech provider uses the Gemini API TTS path with
|
||||
|
||||
- Default voice: `Kore`
|
||||
- Auth: `messages.tts.providers.google.apiKey`, `models.providers.google.apiKey`, `GEMINI_API_KEY`, or `GOOGLE_API_KEY`
|
||||
- Output: WAV for regular TTS attachments, PCM for Talk/telephony
|
||||
- Native voice-note output: not supported on this Gemini API path because the API returns PCM rather than Opus
|
||||
- Output: WAV for regular TTS attachments, Opus for voice-note targets, PCM for Talk/telephony
|
||||
- Voice-note output: Google PCM is wrapped as WAV and transcoded to 48 kHz Opus with `ffmpeg`
|
||||
|
||||
To use Google as the default TTS provider:
|
||||
|
||||
|
||||
@@ -584,7 +584,7 @@ These override `messages.tts.*` for that host.
|
||||
- **Local CLI**: uses the configured `outputFormat`. Voice-note targets are
|
||||
converted to Ogg/Opus and telephony output is converted to raw 16 kHz mono PCM
|
||||
with `ffmpeg`.
|
||||
- **Google Gemini**: Gemini API TTS returns raw 24kHz PCM. OpenClaw wraps it as WAV for audio attachments and returns PCM directly for Talk/telephony. Native Opus voice-note format is not supported by this path.
|
||||
- **Google Gemini**: Gemini API TTS returns raw 24kHz PCM. OpenClaw wraps it as WAV for audio attachments, transcodes it to 48kHz Opus for voice-note targets, and returns PCM directly for Talk/telephony.
|
||||
- **Gradium**: WAV for audio attachments, Opus for voice-note targets, and `ulaw_8000` at 8 kHz for telephony.
|
||||
- **xAI**: MP3 by default; `responseFormat` may be `mp3`, `wav`, `pcm`, `mulaw`, or `alaw`. OpenClaw uses xAI's batch REST TTS endpoint and returns a complete audio attachment; xAI's streaming TTS WebSocket is not used by this provider path. Native Opus voice-note format is not supported by this path.
|
||||
- **Microsoft**: uses `microsoft.outputFormat` (default `audio-24khz-48kbitrate-mono-mp3`).
|
||||
|
||||
Reference in New Issue
Block a user