feat(tts): add xiaomi mimo speech provider

Peter Steinberger
2026-04-25 09:47:52 +01:00
parent e10f20032a
commit ec8dbc4595
10 changed files with 789 additions and 10 deletions

@@ -53,6 +53,46 @@ OpenAI-compatible endpoint with API-key authentication.
The default model ref is `xiaomi/mimo-v2-flash`. The provider is injected automatically when `XIAOMI_API_KEY` is set or an auth profile exists.
</Tip>
## Text-to-speech
The bundled `xiaomi` plugin also registers Xiaomi MiMo as a speech provider for
`messages.tts`. It calls Xiaomi's chat-completions TTS endpoint, sending the
text to speak as an `assistant` message and any style guidance as an optional
`user` message.
| Property | Value |
| -------- | ---------------------------------------- |
| TTS id | `xiaomi` (`mimo` alias) |
| Auth | `XIAOMI_API_KEY` |
| API | `POST /v1/chat/completions` with `audio` |
| Default | `mimo-v2.5-tts`, voice `mimo_default` |
| Output | MP3 by default; WAV when configured |
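The request shape described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: `buildXiaomiTtsBody`, the option names, the layout of the `audio` field, and the order of the two messages are all assumptions made for the example. Only the roles, the defaults, and the presence of an `audio` field come from this page.

```typescript
// Hypothetical option names, mirroring the documented config keys.
interface XiaomiTtsOptions {
  model?: string;
  voice?: string;
  format?: "mp3" | "wav";
  style?: string;
}

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

interface XiaomiTtsRequestBody {
  model: string;
  messages: ChatMessage[];
  // Assumed layout of the `audio` field; check Xiaomi's API reference.
  audio: { voice: string; format: string };
}

// Builds a body for `POST /v1/chat/completions` (hypothetical helper).
function buildXiaomiTtsBody(
  text: string,
  opts: XiaomiTtsOptions = {},
): XiaomiTtsRequestBody {
  const messages: ChatMessage[] = [];
  // Style guidance is optional and travels as a `user` message.
  if (opts.style) {
    messages.push({ role: "user", content: opts.style });
  }
  // The text to be spoken travels as an `assistant` message.
  messages.push({ role: "assistant", content: text });
  return {
    model: opts.model ?? "mimo-v2.5-tts",
    messages,
    audio: {
      voice: opts.voice ?? "mimo_default",
      format: opts.format ?? "mp3",
    },
  };
}
```

Calling `buildXiaomiTtsBody("Hello")` without a style yields a single `assistant` message and the documented defaults for model, voice, and format.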
```json5
{
  messages: {
    tts: {
      auto: "always",
      provider: "xiaomi",
      providers: {
        xiaomi: {
          apiKey: "xiaomi_api_key",
          model: "mimo-v2.5-tts",
          voice: "mimo_default",
          format: "mp3",
          style: "Bright, natural, conversational tone.",
        },
      },
    },
  },
}
```
Supported built-in voices include `mimo_default`, `default_zh`, `default_en`,
`Mia`, `Chloe`, `Milo`, and `Dean`. The default model is the current
`mimo-v2.5-tts`; `mimo-v2-tts` remains supported for older MiMo TTS accounts.
For voice-note targets such as Feishu and Telegram, OpenClaw transcodes Xiaomi
output to 48 kHz Opus with `ffmpeg` before delivery.
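As a rough illustration of the voice-note transcode step, the sketch below builds an `ffmpeg` argument list that re-encodes an audio file to 48 kHz Opus. `buildOpusTranscodeArgs` is a hypothetical helper and the file paths are placeholders. The flags are standard `ffmpeg` options, but the exact invocation OpenClaw uses is not shown here and may differ.

```typescript
// Hypothetical helper: returns ffmpeg arguments for a 48 kHz Opus transcode.
function buildOpusTranscodeArgs(
  inputPath: string,
  outputPath: string,
): string[] {
  return [
    "-y", // overwrite the output file if it already exists
    "-i", inputPath, // source audio (for example the MP3 from Xiaomi)
    "-c:a", "libopus", // encode the audio stream with the Opus codec
    "-ar", "48000", // resample to 48 kHz
    outputPath, // container is inferred from the extension, e.g. .ogg
  ];
}

// Example argument list for a placeholder input and output path.
const exampleArgs = buildOpusTranscodeArgs("speech.mp3", "speech.ogg");
```

The resulting array could be passed to Node's `child_process.spawn("ffmpeg", exampleArgs)`, assuming `ffmpeg` is installed and on `PATH`.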
## Config example
```json5