fix(google): emit opus voice-note tts

2026-05-06 17:50:45 +00:00 · 2026-04-25 21:33:15 +01:00
parent d5b6667823
commit e2fd3dcee9
14 changed files with 300 additions and 123 deletions
--- a/docs/tools/tts.md
+++ b/docs/tools/tts.md
@@ -584,7 +584,7 @@ These override `messages.tts.*` for that host.
 - **Local CLI**: uses the configured `outputFormat`. Voice-note targets are
  converted to Ogg/Opus and telephony output is converted to raw 16 kHz mono PCM
  with `ffmpeg`.
- **Google Gemini**: Gemini API TTS returns raw 24kHz PCM. OpenClaw wraps it as WAV for audio attachments and returns PCM directly for Talk/telephony. Native Opus voice-note format is not supported by this path.
+- **Google Gemini**: Gemini API TTS returns raw 24kHz PCM. OpenClaw wraps it as WAV for audio attachments, transcodes it to 48kHz Opus for voice-note targets, and returns PCM directly for Talk/telephony.
 - **Gradium**: WAV for audio attachments, Opus for voice-note targets, and `ulaw_8000` at 8 kHz for telephony.
 - **xAI**: MP3 by default; `responseFormat` may be `mp3`, `wav`, `pcm`, `mulaw`, or `alaw`. OpenClaw uses xAI's batch REST TTS endpoint and returns a complete audio attachment; xAI's streaming TTS WebSocket is not used by this provider path. Native Opus voice-note format is not supported by this path.
 - **Microsoft**: uses `microsoft.outputFormat` (default `audio-24khz-48kbitrate-mono-mp3`).