mirror of
https://github.com/openclaw/openclaw.git
synced 2026-05-06 14:30:45 +00:00
fix(minimax): transcode voice-note tts to opus
This commit is contained in:
@@ -244,6 +244,18 @@ exposed separately through the plugin-owned `MiniMax-VL-01` media provider.
|
||||
See [Image Generation](/tools/image-generation) for shared tool parameters, provider selection, and failover behavior.
|
||||
</Note>
|
||||
|
||||
### Text-to-speech
|
||||
|
||||
The bundled `minimax` plugin registers MiniMax T2A v2 as a speech provider for
|
||||
`messages.tts`.
|
||||
|
||||
- Default TTS model: `speech-2.8-hd`
|
||||
- Default voice: `English_expressive_narrator`
|
||||
- Normal audio attachments stay MP3.
|
||||
- Voice-note targets such as Feishu and Telegram are transcoded from MiniMax
|
||||
MP3 to 48kHz Opus with `ffmpeg`, because the Feishu/Lark file API only
|
||||
accepts `file_type: "opus"` for native audio messages.
|
||||
|
||||
### Music generation
|
||||
|
||||
The bundled `minimax` plugin also registers music generation through the shared
|
||||
|
||||
@@ -488,7 +488,7 @@ These override `messages.tts.*` for that host.
|
||||
- 48kHz / 64kbps is a good voice message tradeoff.
|
||||
- **Other channels**: MP3 (`mp3_44100_128` from ElevenLabs, `mp3` from OpenAI).
|
||||
- 44.1kHz / 128kbps is the default balance for speech clarity.
|
||||
- **MiniMax**: MP3 (`speech-2.8-hd` model, 32kHz sample rate). Voice-note format not natively supported; use OpenAI or ElevenLabs for guaranteed Opus voice messages.
|
||||
- **MiniMax**: MP3 (`speech-2.8-hd` model, 32kHz sample rate) for normal audio attachments. For voice-note targets such as Feishu and Telegram, OpenClaw transcodes the MiniMax MP3 to 48kHz Opus with `ffmpeg` before delivery.
|
||||
- **Google Gemini**: Gemini API TTS returns raw 24kHz PCM. OpenClaw wraps it as WAV for audio attachments and returns PCM directly for Talk/telephony. Native Opus voice-note format is not supported by this path.
|
||||
- **Gradium**: WAV for audio attachments, Opus for voice-note targets, and `ulaw_8000` at 8 kHz for telephony.
|
||||
- **xAI**: MP3 by default; `responseFormat` may be `mp3`, `wav`, `pcm`, `mulaw`, or `alaw`. OpenClaw uses xAI's batch REST TTS endpoint and returns a complete audio attachment; xAI's streaming TTS WebSocket is not used by this provider path. Native Opus voice-note format is not supported by this path.
|
||||
|
||||
Reference in New Issue
Block a user