diff --git a/docs/tools/tts.md b/docs/tools/tts.md index 12bd04d7069..cb17494312d 100644 --- a/docs/tools/tts.md +++ b/docs/tools/tts.md @@ -1,120 +1,122 @@ --- -summary: "Text-to-speech (TTS) for outbound replies" +summary: "Text-to-speech for outbound replies — providers, personas, slash commands, and per-channel output" read_when: - Enabling text-to-speech for replies - - Configuring TTS providers or limits - - Using /tts commands + - Configuring a TTS provider, fallback chain, or persona + - Using /tts commands or directives title: "Text-to-speech" --- -OpenClaw can convert outbound replies into audio using Azure Speech, ElevenLabs, Google Gemini, Gradium, Inworld, Local CLI, Microsoft, MiniMax, OpenAI, Volcengine, Vydra, xAI, or Xiaomi MiMo. -It works anywhere OpenClaw can send audio. +OpenClaw can convert outbound replies into audio across **14 speech providers** +and deliver native voice messages on Feishu, Matrix, Telegram, and WhatsApp, +audio attachments everywhere else, and PCM/Ulaw streams for telephony and Talk. 
-## Supported services +## Quick start -- **Azure Speech** (primary or fallback provider; uses the Azure AI Speech REST API) -- **ElevenLabs** (primary or fallback provider) -- **Google Gemini** (primary or fallback provider; uses Gemini API TTS) -- **Gradium** (primary or fallback provider; supports voice-note and telephony output) -- **Inworld** (primary or fallback provider; uses the Inworld streaming TTS API) -- **Local CLI** (primary or fallback provider; runs a configured local TTS command) -- **Microsoft** (primary or fallback provider; current bundled implementation uses `node-edge-tts`) -- **MiniMax** (primary or fallback provider; uses the T2A v2 API) -- **OpenAI** (primary or fallback provider; also used for summaries) -- **Volcengine** (primary or fallback provider; uses the BytePlus Seed Speech HTTP API) -- **Vydra** (primary or fallback provider; shared image, video, and speech provider) -- **xAI** (primary or fallback provider; uses the xAI TTS API) -- **Xiaomi MiMo** (primary or fallback provider; uses MiMo TTS through Xiaomi chat completions) + + + OpenAI and ElevenLabs are the most reliable hosted options. Microsoft and + Local CLI work without an API key. See the [provider matrix](#supported-providers) + for the full list. + + + Export the env var for your provider (for example `OPENAI_API_KEY`, + `ELEVENLABS_API_KEY`). Microsoft and Local CLI need no key. + + + Set `messages.tts.auto: "always"` and `messages.tts.provider`: -### Microsoft speech notes + ```json5 + { + messages: { + tts: { + auto: "always", + provider: "elevenlabs", + }, + }, + } + ``` -The bundled Microsoft speech provider currently uses Microsoft Edge's online -neural TTS service via the `node-edge-tts` library. It's a hosted service (not -local), uses Microsoft endpoints, and does not require an API key. -`node-edge-tts` exposes speech configuration options and output formats, but -not all options are supported by the service. 
Legacy config and directive input -using `edge` still works and is normalized to `microsoft`. + + + `/tts status` shows the current state. `/tts audio Hello from OpenClaw` + sends a one-off audio reply. + + -Because this path is a public web service without a published SLA or quota, -treat it as best-effort. If you need guaranteed limits and support, use OpenAI -or ElevenLabs. + +Auto-TTS is **off** by default. When `messages.tts.provider` is unset, +OpenClaw picks the first configured provider in registry auto-select order. + -## Optional keys +## Supported providers -If you want Azure Speech, ElevenLabs, Google Gemini, Gradium, Inworld, MiniMax, OpenAI, Volcengine, Vydra, xAI, or Xiaomi MiMo: +| Provider | Auth | Notes | +| ----------------- | ---------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------- | +| **OpenAI** | `OPENAI_API_KEY` | Also used for auto-summary; supports persona `instructions`. | +| **ElevenLabs** | `ELEVENLABS_API_KEY` or `XI_API_KEY` | Voice cloning, multilingual, deterministic via `seed`. | +| **Google Gemini** | `GEMINI_API_KEY` or `GOOGLE_API_KEY` | Gemini API TTS; persona-aware via `promptTemplate: "audio-profile-v1"`. | +| **Azure Speech** | `AZURE_SPEECH_KEY` + `AZURE_SPEECH_REGION` (also `AZURE_SPEECH_API_KEY`, `SPEECH_KEY`, `SPEECH_REGION`) | Native Ogg/Opus voice-note output and telephony. | +| **Microsoft** | none | Public Edge neural TTS via `node-edge-tts`. Best-effort, no SLA. | +| **MiniMax** | `MINIMAX_API_KEY` (or Token Plan: `MINIMAX_OAUTH_TOKEN`, `MINIMAX_CODE_PLAN_KEY`, `MINIMAX_CODING_API_KEY`) | T2A v2 API. Defaults to `speech-2.8-hd`. | +| **Inworld** | `INWORLD_API_KEY` | Streaming TTS API. Native Opus voice-note and PCM telephony. | +| **xAI** | `XAI_API_KEY` | xAI batch TTS. Native Opus voice-note is **not** supported. 
| +| **Volcengine** | `VOLCENGINE_TTS_API_KEY` or `BYTEPLUS_SEED_SPEECH_API_KEY` (legacy AppID/token: `VOLCENGINE_TTS_APPID`/`_TOKEN`) | BytePlus Seed Speech HTTP API. | +| **Xiaomi MiMo** | `XIAOMI_API_KEY` | MiMo TTS through Xiaomi chat completions. | +| **OpenRouter** | `OPENROUTER_API_KEY` (can reuse `models.providers.openrouter.apiKey`) | Default model `hexgrad/kokoro-82m`. | +| **Gradium** | `GRADIUM_API_KEY` | Voice-note and telephony output. | +| **Vydra** | `VYDRA_API_KEY` | Shared image, video, and speech provider. | +| **Local CLI** | none | Runs a configured local TTS command. | -- `AZURE_SPEECH_KEY` plus `AZURE_SPEECH_REGION` (also accepts - `AZURE_SPEECH_API_KEY`, `SPEECH_KEY`, and `SPEECH_REGION`) -- `ELEVENLABS_API_KEY` (or `XI_API_KEY`) -- `GEMINI_API_KEY` (or `GOOGLE_API_KEY`) -- `GRADIUM_API_KEY` -- `INWORLD_API_KEY` -- `MINIMAX_API_KEY`; MiniMax TTS also accepts Token Plan auth via - `MINIMAX_OAUTH_TOKEN`, `MINIMAX_CODE_PLAN_KEY`, or - `MINIMAX_CODING_API_KEY` -- `OPENAI_API_KEY` -- `VOLCENGINE_TTS_API_KEY` (or `BYTEPLUS_SEED_SPEECH_API_KEY`); - legacy AppID/token auth also accepts `VOLCENGINE_TTS_APPID` and - `VOLCENGINE_TTS_TOKEN` -- `VYDRA_API_KEY` -- `XAI_API_KEY` -- `XIAOMI_API_KEY` +If multiple providers are configured, the selected one is used first and the +others are fallback options. Auto-summary uses `summaryModel` (or +`agents.defaults.model.primary`), so that provider must also be authenticated +if you keep summaries enabled. -Local CLI and Microsoft speech do **not** require an API key. + +The bundled **Microsoft** provider uses Microsoft Edge's online neural TTS +service via `node-edge-tts`. It is a public web service without a published +SLA or quota — treat it as best-effort. The legacy provider id `edge` is +normalized to `microsoft` and `openclaw doctor --fix` rewrites persisted +config; new configs should always use `microsoft`. 
+ -If multiple providers are configured, the selected provider is used first and the others are fallback options. -Auto-summary uses the configured `summaryModel` (or `agents.defaults.model.primary`), -so that provider must also be authenticated if you enable summaries. +## Configuration -## Service links - -- [OpenAI Text-to-Speech guide](https://platform.openai.com/docs/guides/text-to-speech) -- [OpenAI Audio API reference](https://platform.openai.com/docs/api-reference/audio) -- [Azure Speech REST text-to-speech](https://learn.microsoft.com/azure/ai-services/speech-service/rest-text-to-speech) -- [Azure Speech provider](/providers/azure-speech) -- [ElevenLabs Text to Speech](https://elevenlabs.io/docs/api-reference/text-to-speech) -- [ElevenLabs Authentication](https://elevenlabs.io/docs/api-reference/authentication) -- [Gradium](/providers/gradium) -- [Inworld TTS API](https://docs.inworld.ai/tts/tts) -- [MiniMax T2A v2 API](https://platform.minimaxi.com/document/T2A%20V2) -- [Volcengine TTS HTTP API](/providers/volcengine#text-to-speech) -- [Xiaomi MiMo speech synthesis](/providers/xiaomi#text-to-speech) -- [node-edge-tts](https://github.com/SchneeHertz/node-edge-tts) -- [Microsoft Speech output formats](https://learn.microsoft.com/azure/ai-services/speech-service/rest-text-to-speech#audio-outputs) -- [xAI Text to Speech](https://docs.x.ai/developers/rest-api-reference/inference/voice#text-to-speech-rest) - -## Is it enabled by default? - -No. Auto‑TTS is **off** by default. Enable it in config with -`messages.tts.auto` or locally with `/tts on`. - -When `messages.tts.provider` is unset, OpenClaw picks the first configured -speech provider in registry auto-select order. - -## Config - -TTS config lives under `messages.tts` in `openclaw.json`. -Full schema is in [Gateway configuration](/gateway/configuration). - -### Minimal config (enable + provider) +TTS config lives under `messages.tts` in `~/.openclaw/openclaw.json`. 
Pick a +preset and adapt the provider block: + + ```json5 { messages: { tts: { auto: "always", - provider: "elevenlabs", + provider: "openai", + summaryModel: "openai/gpt-4.1-mini", + modelOverrides: { enabled: true }, + providers: { + openai: { + apiKey: "${OPENAI_API_KEY}", + model: "gpt-4o-mini-tts", + voice: "alloy", + }, + elevenlabs: { + apiKey: "${ELEVENLABS_API_KEY}", + model: "eleven_multilingual_v2", + voiceId: "EXAVITQu4vr4xnSDxMaL", + voiceSettings: { stability: 0.5, similarityBoost: 0.75, style: 0.0, useSpeakerBoost: true, speed: 1.0 }, + applyTextNormalization: "auto", + languageCode: "en", + }, + }, }, }, } ``` - -### Per-agent voice overrides - -Use `agents.list[].tts` when one agent should speak with a different provider, -voice, model, style, or auto-TTS mode. The agent block deep-merges over -`messages.tts`, so provider credentials can stay in the global provider config. - + + ```json5 { messages: { @@ -125,78 +127,37 @@ voice, model, style, or auto-TTS mode. The agent block deep-merges over elevenlabs: { apiKey: "${ELEVENLABS_API_KEY}", model: "eleven_multilingual_v2", + voiceId: "EXAVITQu4vr4xnSDxMaL", }, }, }, }, - agents: { - list: [ - { - id: "reader", - tts: { - providers: { - elevenlabs: { - voiceId: "EXAVITQu4vr4xnSDxMaL", - }, - }, - }, - }, - ], - }, } ``` - -Precedence for automatic replies, `/tts audio`, `/tts status`, and the `tts` -agent tool is: - -1. `messages.tts` -2. active `agents.list[].tts` -3. local `/tts` preferences for this host -4. 
inline `[[tts:...]]` directives when model overrides are enabled - -### OpenAI primary with ElevenLabs fallback - + + ```json5 { messages: { tts: { auto: "always", - provider: "openai", - summaryModel: "openai/gpt-4.1-mini", - modelOverrides: { - enabled: true, - }, + provider: "google", providers: { - openai: { - apiKey: "openai_api_key", - baseUrl: "https://api.openai.com/v1", - model: "gpt-4o-mini-tts", - voice: "alloy", - }, - elevenlabs: { - apiKey: "elevenlabs_api_key", - baseUrl: "https://api.elevenlabs.io", - voiceId: "voice_id", - modelId: "eleven_multilingual_v2", - seed: 42, - applyTextNormalization: "auto", - languageCode: "en", - voiceSettings: { - stability: 0.5, - similarityBoost: 0.75, - style: 0.0, - useSpeakerBoost: true, - speed: 1.0, - }, + google: { + apiKey: "${GEMINI_API_KEY}", + model: "gemini-3.1-flash-tts-preview", + voiceName: "Kore", + // Optional natural-language style prompts: + // audioProfile: "Speak in a calm, podcast-host tone.", + // speakerName: "Alex", }, }, }, }, } ``` - -### Azure Speech primary - + + ```json5 { messages: { @@ -205,8 +166,8 @@ agent tool is: provider: "azure-speech", providers: { "azure-speech": { - // apiKey falls back to AZURE_SPEECH_KEY. - // region falls back to AZURE_SPEECH_REGION. + apiKey: "${AZURE_SPEECH_KEY}", + region: "eastus", voice: "en-US-JennyNeural", lang: "en-US", outputFormat: "audio-24khz-48kbitrate-mono-mp3", @@ -217,16 +178,8 @@ agent tool is: }, } ``` - -Azure Speech uses a Speech resource key, not an Azure OpenAI key. Resolution -order is `messages.tts.providers.azure-speech.apiKey` -> -`AZURE_SPEECH_KEY` -> `AZURE_SPEECH_API_KEY` -> `SPEECH_KEY`, plus -`messages.tts.providers.azure-speech.region` -> `AZURE_SPEECH_REGION` -> -`SPEECH_REGION` for the region. New config should use `azure-speech`; `azure` -is accepted as a provider alias. - -### Microsoft primary (no API key) - + + ```json5 { messages: { @@ -239,17 +192,16 @@ is accepted as a provider alias. 
voice: "en-US-MichelleNeural", lang: "en-US", outputFormat: "audio-24khz-48kbitrate-mono-mp3", - rate: "+10%", - pitch: "-5%", + rate: "+0%", + pitch: "+0%", }, }, }, }, } ``` - -### MiniMax primary - + + ```json5 { messages: { @@ -258,8 +210,7 @@ is accepted as a provider alias. provider: "minimax", providers: { minimax: { - apiKey: "minimax_api_key", - baseUrl: "https://api.minimax.io", + apiKey: "${MINIMAX_API_KEY}", model: "speech-2.8-hd", voiceId: "English_expressive_narrator", speed: 1.0, @@ -271,42 +222,8 @@ is accepted as a provider alias. }, } ``` - -MiniMax TTS auth resolution is `messages.tts.providers.minimax.apiKey`, then -stored `minimax-portal` OAuth/token profiles, then Token Plan environment keys -(`MINIMAX_OAUTH_TOKEN`, `MINIMAX_CODE_PLAN_KEY`, -`MINIMAX_CODING_API_KEY`), then `MINIMAX_API_KEY`. When no explicit TTS -`baseUrl` is set, OpenClaw can reuse the configured `minimax-portal` OAuth -host for Token Plan speech. - -### Google Gemini primary - -```json5 -{ - messages: { - tts: { - auto: "always", - provider: "google", - providers: { - google: { - apiKey: "gemini_api_key", - model: "gemini-3.1-flash-tts-preview", - voiceName: "Kore", - }, - }, - }, - }, -} -``` - -Google Gemini TTS uses the Gemini API key path. A Google Cloud Console API key -restricted to the Gemini API is valid here, and it is the same style of key used -by the bundled Google image-generation provider. Resolution order is -`messages.tts.providers.google.apiKey` -> `models.providers.google.apiKey` -> -`GEMINI_API_KEY` -> `GOOGLE_API_KEY`. - -### Inworld primary - + + ```json5 { messages: { @@ -315,56 +232,18 @@ by the bundled Google image-generation provider. 
Resolution order is provider: "inworld", providers: { inworld: { - apiKey: "inworld_api_key", - baseUrl: "https://api.inworld.ai", - voiceId: "Sarah", + apiKey: "${INWORLD_API_KEY}", modelId: "inworld-tts-1.5-max", - temperature: 0.8, + voiceId: "Sarah", + temperature: 0.7, }, }, }, }, } ``` - -The `apiKey` value must be the Base64-encoded credential string copied -verbatim from the Inworld dashboard (Workspace > API Keys). The provider -sends it as `Authorization: Basic ` without any additional -encoding, so do not pass a raw bearer token and do not Base64-encode it -yourself. The key falls back to the `INWORLD_API_KEY` env var. See -[Inworld provider](/providers/inworld) for full setup. - -### Volcengine primary - -```json5 -{ - messages: { - tts: { - auto: "always", - provider: "volcengine", - providers: { - volcengine: { - apiKey: "byteplus_seed_speech_api_key", - resourceId: "seed-tts-1.0", - voice: "en_female_anna_mars_bigtts", - speedRatio: 1.0, - }, - }, - }, - }, -} -``` - -Volcengine TTS uses the BytePlus Seed Speech API key from the Speech Console, -not the OpenAI-compatible `VOLCANO_ENGINE_API_KEY` used for Doubao model -providers. Resolution order is `messages.tts.providers.volcengine.apiKey` -> -`VOLCENGINE_TTS_API_KEY` -> `BYTEPLUS_SEED_SPEECH_API_KEY`. Legacy AppID/token -auth still works through `messages.tts.providers.volcengine.appId` / `token` or -`VOLCENGINE_TTS_APPID` / `VOLCENGINE_TTS_TOKEN`. Voice-note targets request -provider-native `ogg_opus`; normal audio-file targets request `mp3`. - -### xAI primary - + + ```json5 { messages: { @@ -373,25 +252,37 @@ provider-native `ogg_opus`; normal audio-file targets request `mp3`. provider: "xai", providers: { xai: { - apiKey: "xai_api_key", + apiKey: "${XAI_API_KEY}", voiceId: "eve", language: "en", responseFormat: "mp3", - speed: 1.0, }, }, }, }, } ``` - -xAI TTS uses the same `XAI_API_KEY` path as the bundled Grok model provider. 
-Resolution order is `messages.tts.providers.xai.apiKey` -> `XAI_API_KEY`. -Current live voices are `ara`, `eve`, `leo`, `rex`, `sal`, and `una`; `eve` is -the default. `language` accepts a BCP-47 tag or `auto`. - -### Xiaomi MiMo primary - + + +```json5 +{ + messages: { + tts: { + auto: "always", + provider: "volcengine", + providers: { + volcengine: { + apiKey: "${VOLCENGINE_TTS_API_KEY}", + resourceId: "seed-tts-1.0", + voice: "en_female_anna_mars_bigtts", + }, + }, + }, + }, +} +``` + + ```json5 { messages: { @@ -400,26 +291,18 @@ the default. `language` accepts a BCP-47 tag or `auto`. provider: "xiaomi", providers: { xiaomi: { - apiKey: "xiaomi_api_key", - baseUrl: "https://api.xiaomimimo.com/v1", + apiKey: "${XIAOMI_API_KEY}", model: "mimo-v2.5-tts", voice: "mimo_default", format: "mp3", - style: "Bright, natural, conversational tone.", }, }, }, }, } ``` - -Xiaomi MiMo TTS uses the same `XIAOMI_API_KEY` path as the bundled Xiaomi model -provider. The speech provider id is `xiaomi`; `mimo` is accepted as an alias. -The target text is sent as the assistant message, matching Xiaomi's TTS -contract. Optional `style` is sent as a user instruction and is not spoken. - -### OpenRouter primary - + + ```json5 { messages: { @@ -428,7 +311,7 @@ contract. Optional `style` is sent as a user instruction and is not spoken. provider: "openrouter", providers: { openrouter: { - apiKey: "openrouter_api_key", + apiKey: "${OPENROUTER_API_KEY}", model: "hexgrad/kokoro-82m", voice: "af_alloy", responseFormat: "mp3", @@ -438,14 +321,26 @@ contract. Optional `style` is sent as a user instruction and is not spoken. }, } ``` - -OpenRouter TTS uses the same `OPENROUTER_API_KEY` path as the bundled -OpenRouter model provider. Resolution order is -`messages.tts.providers.openrouter.apiKey` -> -`models.providers.openrouter.apiKey` -> `OPENROUTER_API_KEY`. 
- -### Local CLI primary - + + +```json5 +{ + messages: { + tts: { + auto: "always", + provider: "gradium", + providers: { + gradium: { + apiKey: "${GRADIUM_API_KEY}", + voiceId: "YTpq7expH9539ERJ", + }, + }, + }, + }, +} +``` + + ```json5 { messages: { @@ -464,28 +359,74 @@ OpenRouter model provider. Resolution order is }, } ``` + + -Local CLI TTS runs the configured command on the gateway host. `{{Text}}`, -`{{OutputPath}}`, `{{OutputDir}}`, and `{{OutputBase}}` placeholders are -expanded in `args`; if no `{{Text}}` placeholder is present, OpenClaw writes the -spoken text to stdin. `outputFormat` accepts `mp3`, `opus`, or `wav`. -Voice-note targets are transcoded to Ogg/Opus and telephony output is -transcoded to raw 16 kHz mono PCM with `ffmpeg`. The legacy provider alias -`cli` still works, but new config should use `tts-local-cli`. +### Per-agent voice overrides -### Gradium primary +Use `agents.list[].tts` when one agent should speak with a different provider, +voice, model, persona, or auto-TTS mode. The agent block deep-merges over +`messages.tts`, so provider credentials can stay in the global provider config: ```json5 { messages: { tts: { auto: "always", - provider: "gradium", + provider: "elevenlabs", providers: { - gradium: { - apiKey: "gradium_api_key", - baseUrl: "https://api.gradium.ai", - voiceId: "YTpq7expH9539ERJ", + elevenlabs: { apiKey: "${ELEVENLABS_API_KEY}", model: "eleven_multilingual_v2" }, + }, + }, + }, + agents: { + list: [ + { + id: "reader", + tts: { + providers: { + elevenlabs: { voiceId: "EXAVITQu4vr4xnSDxMaL" }, + }, + }, + }, + ], + }, +} +``` + +To pin a per-agent persona, set `agents.list[].tts.persona` alongside provider +config — it overrides the global `messages.tts.persona` for that agent only. + +Precedence order for automatic replies, `/tts audio`, `/tts status`, and the +`tts` agent tool: + +1. `messages.tts` +2. active `agents.list[].tts` +3. local `/tts` preferences for this host +4. 
inline `[[tts:...]]` directives when [model overrides](#model-driven-directives) are enabled + +## Personas + +A **persona** is a stable spoken identity that can be applied deterministically +across providers. It can prefer one provider, define provider-neutral prompt +intent, and carry provider-specific bindings for voices, models, prompt +templates, seeds, and voice settings. + +### Minimal persona + +```json5 +{ + messages: { + tts: { + auto: "always", + persona: "narrator", + personas: { + narrator: { + label: "Narrator", + provider: "elevenlabs", + providers: { + elevenlabs: { voiceId: "EXAVITQu4vr4xnSDxMaL", modelId: "eleven_multilingual_v2" }, + }, }, }, }, @@ -493,12 +434,7 @@ transcoded to raw 16 kHz mono PCM with `ffmpeg`. The legacy provider alias } ``` -### TTS personas - -Use `messages.tts.personas` when you want a stable spoken identity that can be -applied deterministically across providers. A persona can prefer one provider, -define provider-neutral prompt intent, and carry provider-specific bindings for -voices, models, prompt templates, seeds, and voice settings. +### Full persona (provider-neutral prompt) ```json5 { @@ -527,10 +463,7 @@ voices, models, prompt templates, seeds, and voice settings. voiceName: "Algieba", promptTemplate: "audio-profile-v1", }, - openai: { - model: "gpt-4o-mini-tts", - voice: "cedar", - }, + openai: { model: "gpt-4o-mini-tts", voice: "cedar" }, elevenlabs: { voiceId: "voice_id", modelId: "eleven_multilingual_v2", @@ -551,376 +484,184 @@ voices, models, prompt templates, seeds, and voice settings. } ``` -Resolution is deterministic: +### Persona resolution + +The active persona is selected deterministically: 1. `/tts persona ` local preference, if set. 2. `messages.tts.persona`, if set. 3. No persona. -Provider selection is explicit-first: +Provider selection runs explicit-first: -1. Direct provider overrides from CLI, gateway, Talk, or allowed TTS directives. +1. 
Direct overrides (CLI, gateway, Talk, allowed TTS directives). 2. `/tts provider ` local preference. -3. Active persona `provider`. +3. Active persona's `provider`. 4. `messages.tts.provider`. 5. Registry auto-select. -For each provider attempt, OpenClaw merges: +For each provider attempt, OpenClaw merges configs in this order: 1. `messages.tts.providers.` 2. `messages.tts.personas..providers.` -3. trusted request overrides -4. allowed model-emitted TTS directive overrides +3. Trusted request overrides +4. Allowed model-emitted TTS directive overrides -`fallbackPolicy` controls what happens when an active persona has no binding for -an attempted provider: +### How providers use persona prompts -- `preserve-persona` keeps provider-neutral persona prompt fields available to - providers. This is the default. -- `provider-defaults` omits the persona from provider prompt preparation for - that attempt, so the provider uses its neutral defaults while still allowing - fallback to continue. -- `fail` skips that provider attempt with `reasonCode: "not_configured"` and - `personaBinding: "missing"`. Fallback providers are still tried; the whole TTS - request fails only if every attempted provider is skipped or fails. +Persona prompt fields (`profile`, `scene`, `sampleContext`, `style`, `accent`, +`pacing`, `constraints`) are **provider-neutral**. Each provider decides how +to use them: -Persona prompt fields are provider-neutral. Providers decide how to use them. -Google wraps them only when the effective Google provider config sets -`promptTemplate: "audio-profile-v1"` or `personaPrompt`; its older -`audioProfile` and `speakerName` fields are still prepended as Google-specific -prompt text. OpenAI maps prompt fields to `instructions` when no explicit -OpenAI `instructions` value is configured. Providers without prompt-like -controls use the provider-specific persona bindings only. 
+ + + Wraps persona prompt fields in a Gemini TTS prompt structure **only when** + the effective Google provider config sets `promptTemplate: "audio-profile-v1"` + or `personaPrompt`. The older `audioProfile` and `speakerName` fields are + still prepended as Google-specific prompt text. Inline audio tags such as + `[whispers]` or `[laughs]` inside a `[[tts:text]]` block are preserved + inside the Gemini transcript; OpenClaw does not generate these tags. + + + Maps persona prompt fields to the request `instructions` field **only when** + no explicit OpenAI `instructions` is configured. Explicit `instructions` + always wins. + + + Use only the provider-specific persona bindings under + `personas..providers.`. Persona prompt fields are ignored + unless the provider implements its own persona-prompt mapping. + + -Gemini inline audio tags are transcript content, not persona config. If the -assistant or an explicit `[[tts:text]]` block includes tags such as `[whispers]` -or `[laughs]`, OpenClaw preserves them inside the Gemini transcript. OpenClaw -does not generate configured start tags. +### Fallback policy -### Disable Microsoft speech +`fallbackPolicy` controls behavior when a persona has **no binding** for the +attempted provider: -```json5 -{ - messages: { - tts: { - providers: { - microsoft: { - enabled: false, - }, - }, - }, - }, -} -``` +| Policy | Behavior | +| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ | +| `preserve-persona` | **Default.** Provider-neutral prompt fields stay available; the provider may use them or ignore them. | +| `provider-defaults` | Persona is omitted from prompt preparation for that attempt; the provider uses its neutral defaults while fallback to other providers continues. | +| `fail` | Skip that provider attempt with `reasonCode: "not_configured"` and `personaBinding: "missing"`. 
Fallback providers are still tried. | -### Custom limits + prefs path +The whole TTS request only fails when **every** attempted provider is skipped +or fails. -```json5 -{ - messages: { - tts: { - auto: "always", - maxTextLength: 4000, - timeoutMs: 30000, - prefsPath: "~/.openclaw/settings/tts.json", - }, - }, -} -``` +## Model-driven directives -### Only reply with audio after an inbound voice message +By default, the assistant **can** emit `[[tts:...]]` directives to override +voice, model, or speed for a single reply, plus an optional +`[[tts:text]]...[[/tts:text]]` block for expressive cues that should appear in +audio only: -```json5 -{ - messages: { - tts: { - auto: "inbound", - }, - }, -} -``` - -### Disable auto-summary for long replies - -```json5 -{ - messages: { - tts: { - auto: "always", - }, - }, -} -``` - -Then run: - -``` -/tts summary off -``` - -### Notes on fields - -- `auto`: auto‑TTS mode (`off`, `always`, `inbound`, `tagged`). - - `inbound` only sends audio after an inbound voice message. - - `tagged` only sends audio when the reply includes `[[tts:key=value]]` directives or a `[[tts:text]]...[[/tts:text]]` block. -- `enabled`: legacy toggle (doctor migrates this to `auto`). -- `mode`: `"final"` (default) or `"all"` (includes tool/block replies). -- `provider`: speech provider id such as `"elevenlabs"`, `"google"`, `"gradium"`, `"inworld"`, `"microsoft"`, `"minimax"`, `"openai"`, `"volcengine"`, `"vydra"`, `"xai"`, or `"xiaomi"` (fallback is automatic). -- If `provider` is **unset**, OpenClaw uses the first configured speech provider in registry auto-select order. -- Legacy `provider: "edge"` config is repaired by `openclaw doctor --fix` and - rewritten to `provider: "microsoft"`. -- `persona`: default TTS persona id from `personas`. -- `personas.`: stable spoken identity. The id is normalized to lowercase. -- `personas..provider`: preferred speech provider for the persona. Explicit provider overrides and local provider prefs still win. 
-- `personas..fallbackPolicy`: `preserve-persona` (default), `provider-defaults`, or `fail`; see [TTS personas](#tts-personas). -- `personas..prompt`: provider-neutral persona prompt fields (`profile`, `scene`, `sampleContext`, `style`, `accent`, `pacing`, `constraints`). -- `personas..providers.`: provider-specific persona binding merged over `providers.`. -- `summaryModel`: optional cheap model for auto-summary; defaults to `agents.defaults.model.primary`. - - Accepts `provider/model` or a configured model alias. -- `modelOverrides`: allow the model to emit TTS directives (on by default). - - `allowProvider` defaults to `false` (provider switching is opt-in). -- `providers.`: provider-owned settings keyed by speech provider id. -- Legacy direct provider blocks (`messages.tts.openai`, `messages.tts.elevenlabs`, `messages.tts.microsoft`, `messages.tts.edge`) are repaired by `openclaw doctor --fix`; committed config should use `messages.tts.providers.`. -- Legacy `messages.tts.providers.edge` is also repaired by `openclaw doctor --fix`; committed config should use `messages.tts.providers.microsoft`. -- `maxTextLength`: hard cap for TTS input (chars). `/tts audio` fails if exceeded. -- `timeoutMs`: request timeout (ms). -- `prefsPath`: override the local prefs JSON path (provider/limit/summary). -- `apiKey` values fall back to env vars (`AZURE_SPEECH_KEY`/`AZURE_SPEECH_API_KEY`/`SPEECH_KEY`, `ELEVENLABS_API_KEY`/`XI_API_KEY`, `GEMINI_API_KEY`/`GOOGLE_API_KEY`, `GRADIUM_API_KEY`, `INWORLD_API_KEY`, `MINIMAX_API_KEY`, `OPENAI_API_KEY`, `VYDRA_API_KEY`, `XAI_API_KEY`, `XIAOMI_API_KEY`). Volcengine uses `appId`/`token` instead. -- `providers.azure-speech.apiKey`: Azure Speech resource key (env: - `AZURE_SPEECH_KEY`, `AZURE_SPEECH_API_KEY`, or `SPEECH_KEY`). -- `providers.azure-speech.region`: Azure Speech region such as `eastus` (env: - `AZURE_SPEECH_REGION` or `SPEECH_REGION`). 
-- `providers.azure-speech.endpoint` / `providers.azure-speech.baseUrl`: optional - Azure Speech endpoint/base URL override. -- `providers.azure-speech.voice`: Azure voice ShortName (default - `en-US-JennyNeural`). -- `providers.azure-speech.lang`: SSML language code (default `en-US`). -- `providers.azure-speech.outputFormat`: Azure `X-Microsoft-OutputFormat` for - standard audio output (default `audio-24khz-48kbitrate-mono-mp3`). -- `providers.azure-speech.voiceNoteOutputFormat`: Azure - `X-Microsoft-OutputFormat` for voice-note output (default - `ogg-24khz-16bit-mono-opus`). -- `providers.elevenlabs.baseUrl`: override ElevenLabs API base URL. -- `providers.openai.baseUrl`: override the OpenAI TTS endpoint. - - Resolution order: `messages.tts.providers.openai.baseUrl` -> `OPENAI_TTS_BASE_URL` -> `https://api.openai.com/v1` - - Non-default values are treated as OpenAI-compatible TTS endpoints, so custom model and voice names are accepted. -- `providers.elevenlabs.voiceSettings`: - - `stability`, `similarityBoost`, `style`: `0..1` - - `useSpeakerBoost`: `true|false` - - `speed`: `0.5..2.0` (1.0 = normal) -- `providers.elevenlabs.applyTextNormalization`: `auto|on|off` -- `providers.elevenlabs.languageCode`: 2-letter ISO 639-1 (e.g. `en`, `de`) -- `providers.elevenlabs.seed`: integer `0..4294967295` (best-effort determinism) -- `providers.minimax.baseUrl`: override MiniMax API base URL (default `https://api.minimax.io`, env: `MINIMAX_API_HOST`). -- `providers.minimax.model`: TTS model (default `speech-2.8-hd`, env: `MINIMAX_TTS_MODEL`). -- `providers.minimax.voiceId`: voice identifier (default `English_expressive_narrator`, env: `MINIMAX_TTS_VOICE_ID`). -- `providers.minimax.speed`: playback speed `0.5..2.0` (default 1.0). -- `providers.minimax.vol`: volume `(0, 10]` (default 1.0; must be greater than 0). -- `providers.minimax.pitch`: integer pitch shift `-12..12` (default 0). 
Fractional values are truncated before calling MiniMax T2A because the API rejects non-integer pitch values. -- `providers.tts-local-cli.command`: local executable or command string for CLI TTS. -- `providers.tts-local-cli.args`: command arguments; supports `{{Text}}`, `{{OutputPath}}`, `{{OutputDir}}`, and `{{OutputBase}}` placeholders. -- `providers.tts-local-cli.outputFormat`: expected CLI output format (`mp3`, `opus`, or `wav`; default `mp3` for audio attachments). -- `providers.tts-local-cli.timeoutMs`: command timeout in milliseconds (default `120000`). -- `providers.tts-local-cli.cwd`: optional command working directory. -- `providers.tts-local-cli.env`: optional string environment overrides for the command. -- `providers.inworld.baseUrl`: override Inworld API base URL (default `https://api.inworld.ai`). -- `providers.inworld.voiceId`: Inworld voice identifier (default `Sarah`). -- `providers.inworld.modelId`: Inworld TTS model (default `inworld-tts-1.5-max`; also supports `inworld-tts-1.5-mini`, `inworld-tts-1-max`, `inworld-tts-1`). -- `providers.inworld.temperature`: sampling temperature `0..2` (optional). -- `providers.google.model`: Gemini TTS model (default `gemini-3.1-flash-tts-preview`). -- `providers.google.voiceName`: Gemini prebuilt voice name (default `Kore`; `voice` is also accepted). -- `providers.google.audioProfile`: natural-language style prompt prepended before the spoken text. -- `providers.google.speakerName`: optional speaker label prepended before the spoken text when your TTS prompt uses a named speaker. -- `providers.google.promptTemplate`: set to `audio-profile-v1` to wrap active persona prompt fields in a deterministic Gemini TTS prompt structure. -- `providers.google.personaPrompt`: Google-specific extra persona prompt text appended to the template's Director's Notes. -- `providers.google.baseUrl`: override the Gemini API base URL. Only `https://generativelanguage.googleapis.com` is accepted. 
- - If `messages.tts.providers.google.apiKey` is omitted, TTS can reuse `models.providers.google.apiKey` before env fallback. -- `providers.gradium.baseUrl`: override Gradium API base URL (default `https://api.gradium.ai`). -- `providers.gradium.voiceId`: Gradium voice identifier (default Emma, `YTpq7expH9539ERJ`). -- `providers.volcengine.apiKey`: BytePlus Seed Speech API key (env: - `VOLCENGINE_TTS_API_KEY` or `BYTEPLUS_SEED_SPEECH_API_KEY`). -- `providers.volcengine.resourceId`: BytePlus Seed Speech resource id (default - `seed-tts-1.0`, env: `VOLCENGINE_TTS_RESOURCE_ID`; use `seed-tts-2.0` when - your BytePlus project has TTS 2.0 entitlement). -- `providers.volcengine.appKey`: BytePlus Seed Speech app key header (default - `aGjiRDfUWi`, env: `VOLCENGINE_TTS_APP_KEY`). -- `providers.volcengine.baseUrl`: override the Seed Speech TTS HTTP endpoint - (env: `VOLCENGINE_TTS_BASE_URL`). -- `providers.volcengine.appId`: legacy Volcengine Speech Console application id (env: `VOLCENGINE_TTS_APPID`). -- `providers.volcengine.token`: legacy Volcengine Speech Console access token (env: `VOLCENGINE_TTS_TOKEN`). -- `providers.volcengine.cluster`: legacy Volcengine TTS cluster (default `volcano_tts`, env: `VOLCENGINE_TTS_CLUSTER`). -- `providers.volcengine.voice`: voice type (default `en_female_anna_mars_bigtts`, env: `VOLCENGINE_TTS_VOICE`). -- `providers.volcengine.speedRatio`: provider-native speed ratio. -- `providers.volcengine.emotion`: provider-native emotion tag. -- `providers.xai.apiKey`: xAI TTS API key (env: `XAI_API_KEY`). -- `providers.xai.baseUrl`: override the xAI TTS base URL (default `https://api.x.ai/v1`, env: `XAI_BASE_URL`). -- `providers.xai.voiceId`: xAI voice id (default `eve`; current live voices: `ara`, `eve`, `leo`, `rex`, `sal`, `una`). -- `providers.xai.language`: BCP-47 language code or `auto` (default `en`). -- `providers.xai.responseFormat`: `mp3`, `wav`, `pcm`, `mulaw`, or `alaw` (default `mp3`). 
-- `providers.xai.speed`: provider-native speed override. -- `providers.xiaomi.apiKey`: Xiaomi MiMo API key (env: `XIAOMI_API_KEY`). -- `providers.xiaomi.baseUrl`: override the Xiaomi MiMo API base URL (default `https://api.xiaomimimo.com/v1`, env: `XIAOMI_BASE_URL`). -- `providers.xiaomi.model`: TTS model (default `mimo-v2.5-tts`, env: `XIAOMI_TTS_MODEL`; `mimo-v2-tts` is also supported). -- `providers.xiaomi.voice`: MiMo voice id (default `mimo_default`, env: `XIAOMI_TTS_VOICE`). -- `providers.xiaomi.format`: `mp3` or `wav` (default `mp3`, env: `XIAOMI_TTS_FORMAT`). -- `providers.xiaomi.style`: optional natural-language style instruction sent as the user message; it is not spoken. -- `providers.openrouter.apiKey`: OpenRouter API key (env: `OPENROUTER_API_KEY`; can reuse `models.providers.openrouter.apiKey`). -- `providers.openrouter.baseUrl`: override the OpenRouter TTS base URL (default `https://openrouter.ai/api/v1`; legacy `https://openrouter.ai/v1` is normalized). -- `providers.openrouter.model`: OpenRouter TTS model id (default `hexgrad/kokoro-82m`; `modelId` is also accepted). -- `providers.openrouter.voice`: provider-specific voice id (default `af_alloy`; `voiceId` is also accepted). -- `providers.openrouter.responseFormat`: `mp3` or `pcm` (default `mp3`). -- `providers.openrouter.speed`: provider-native speed override. -- `providers.microsoft.enabled`: allow Microsoft speech usage (default `true`; no API key). -- `providers.microsoft.voice`: Microsoft neural voice name (e.g. `en-US-MichelleNeural`). -- `providers.microsoft.lang`: language code (e.g. `en-US`). -- `providers.microsoft.outputFormat`: Microsoft output format (e.g. `audio-24khz-48kbitrate-mono-mp3`). - - See Microsoft Speech output formats for valid values; not all formats are supported by the bundled Edge-backed transport. -- `providers.microsoft.rate` / `providers.microsoft.pitch` / `providers.microsoft.volume`: percent strings (e.g. `+10%`, `-5%`). 
-- `providers.microsoft.saveSubtitles`: write JSON subtitles alongside the audio file. -- `providers.microsoft.proxy`: proxy URL for Microsoft speech requests. -- `providers.microsoft.timeoutMs`: request timeout override (ms). -- `edge.*`: legacy alias for the same Microsoft settings. Run - `openclaw doctor --fix` to rewrite persisted config to `providers.microsoft`. - -## Model-driven overrides (default on) - -By default, the model **can** emit TTS directives for a single reply. -When `messages.tts.auto` is `tagged`, these directives are required to trigger audio. - -When enabled, the model can emit `[[tts:...]]` directives to override the voice -for a single reply, plus an optional `[[tts:text]]...[[/tts:text]]` block to -provide expressive tags (laughter, singing cues, etc) that should only appear in -the audio. - -Streaming block delivery strips these directives from visible text before the -channel sees them, even when a directive is split across adjacent blocks. Final -mode still parses the accumulated raw reply for TTS synthesis. - -`provider=...` directives are ignored unless `modelOverrides.allowProvider: true`. -When a reply declares `provider=...`, the other keys in that directive are -parsed only by that provider. Unsupported keys are stripped from visible text -and reported as TTS directive warnings instead of being routed to another -provider. - -Example reply payload: - -``` +```text Here you go. [[tts:voiceId=pMsXgVXv3BLzUgSXRplE model=eleven_v3 speed=1.1]] [[tts:text]](laughs) Read the song once more.[[/tts:text]] ``` -Available directive keys (when enabled): +When `messages.tts.auto` is `"tagged"`, **directives are required** to trigger +audio. Streaming block delivery strips directives from visible text before the +channel sees them, even when split across adjacent blocks. 
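+
+For example, a minimal config that only speaks directive-tagged replies might
+look like this (a sketch; other fields omitted):
+
+```json5
+{
+  messages: {
+    tts: {
+      // synthesize audio only when the reply carries [[tts:...]] directives
+      auto: "tagged",
+      provider: "openai",
+    },
+  },
+}
+```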
-- `provider` (registered speech provider id, for example `openai`, `elevenlabs`, `google`, `gradium`, `minimax`, `microsoft`, `volcengine`, `vydra`, `xai`, or `xiaomi`; requires `allowProvider: true`)
-- `voice` (OpenAI, Gradium, Volcengine, or Xiaomi voice), `voiceName` / `voice_name` / `google_voice` (Google voice), or `voiceId` (ElevenLabs / Gradium / MiniMax / xAI)
-- `model` (OpenAI TTS model, ElevenLabs model id, MiniMax model, or Xiaomi MiMo TTS model) or `google_model` (Google TTS model)
+`provider=...` is ignored unless `modelOverrides.allowProvider: true`. When a
+reply declares `provider=...`, the other keys in that directive are parsed
+only by that provider; unsupported keys are stripped and reported as TTS
+directive warnings.
+
+**Available directive keys:**
+
+- `provider` (registered provider id; requires `allowProvider: true`)
+- `voice` / `voiceName` / `voice_name` / `google_voice` / `voiceId`
+- `model` / `google_model`
 - `stability`, `similarityBoost`, `style`, `speed`, `useSpeakerBoost`
-- `vol` / `volume` (MiniMax volume, 0-10)
+- `vol` / `volume` (MiniMax volume, 0–10)
+- `pitch` (MiniMax integer pitch, −12 to 12; fractional values are truncated)
 - `emotion` (Volcengine emotion tag)
 - `applyTextNormalization` (`auto|on|off`)
 - `languageCode` (ISO 639-1)
 - `seed`
 
-Disable all model overrides:
+**Disable model overrides entirely:**
 
 ```json5
-{
-  messages: {
-    tts: {
-      modelOverrides: {
-        enabled: false,
-      },
-    },
-  },
-}
+{ messages: { tts: { modelOverrides: { enabled: false } } } }
 ```
 
-Optional allowlist (enable provider switching while keeping other knobs configurable):
+**Allow provider switching while keeping other knobs configurable:**
 
 ```json5
-{
-  messages: {
-    tts: {
-      modelOverrides: {
-        enabled: true,
-        allowProvider: true,
-        allowSeed: false,
-      },
-    },
-  },
-}
+{ messages: { tts: { modelOverrides: { enabled: true, allowProvider: true, allowSeed: false } } } }
 ```
 
+## Slash commands
+
+There is a single command, `/tts`. On Discord, OpenClaw also registers
+`/voice` because `/tts` is a built-in Discord command; text `/tts ...` still
+works.
+
+```text
+/tts off | on | status
+/tts chat on | off | default
+/tts latest
+/tts provider <provider>
+/tts persona <persona> | off
+/tts limit <chars>
+/tts summary off
+/tts audio <text>
+```
+
+Commands require an authorized sender (allowlist/owner rules apply), and
+either `commands.text` or native command registration must be enabled.
+
+Behavior notes:
+
+- `/tts on` writes the local TTS preference to `always`; `/tts off` writes it to `off`.
+- `/tts chat on|off|default` writes a session-scoped auto-TTS override for the current chat.
+- `/tts persona <persona>` writes the local persona preference; `/tts persona off` clears it.
+- `/tts latest` reads the latest assistant reply from the current session transcript and sends it as audio once. It stores only a hash of that reply on the session entry to suppress duplicate voice sends.
+- `/tts audio` generates a one-off audio reply (does **not** toggle TTS on).
+- `limit` and `summary` are stored in **local prefs**, not the main config.
+- `/tts status` includes fallback diagnostics for the latest attempt: `Fallback: <from> -> <to>`, `Attempts: ...`, and per-attempt detail (`provider:outcome(reasonCode) latency`).
+- `/status` shows the active TTS mode plus configured provider, model, voice, and sanitized custom endpoint metadata when TTS is enabled.
+
 ## Per-user preferences
 
-Slash commands write local overrides to `prefsPath` (default:
-`~/.openclaw/settings/tts.json`, override with `OPENCLAW_TTS_PREFS` or
-`messages.tts.prefsPath`).
+Slash commands write local overrides to `prefsPath`. The default is
+`~/.openclaw/settings/tts.json`; override with the `OPENCLAW_TTS_PREFS` env var
+or `messages.tts.prefsPath`.
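+
+An illustrative prefs file (the values here are examples, and `narrator` is a
+hypothetical persona id):
+
+```json
+{
+  "auto": "always",
+  "provider": "elevenlabs",
+  "persona": "narrator",
+  "maxLength": 2000,
+  "summarize": true
+}
+```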
-Stored fields: - -- `auto` -- `provider` -- `persona` -- `maxLength` (summary threshold; default 1500 chars) -- `summarize` (default `true`) +| Stored field | Effect | +| ------------ | -------------------------------------------- | +| `auto` | Local auto-TTS override (`always`, `off`, …) | +| `provider` | Local primary provider override | +| `persona` | Local persona override | +| `maxLength` | Summary threshold (default `1500` chars) | +| `summarize` | Summary toggle (default `true`) | These override the effective config from `messages.tts` plus the active `agents.list[].tts` block for that host. -## Output formats (fixed) - -- **Feishu / Matrix / Telegram / WhatsApp**: voice-note replies prefer Opus (`opus_48000_64` from ElevenLabs, `opus` from OpenAI). - - 48kHz / 64kbps is a good voice message tradeoff. -- **Feishu / WhatsApp**: when a voice-note reply is produced as MP3/WebM/WAV/M4A - or another likely audio file, the channel plugin transcodes it to 48kHz - Ogg/Opus with `ffmpeg` before sending the native voice message. WhatsApp sends - the result through the Baileys `audio` payload with `ptt: true` and - `audio/ogg; codecs=opus`. If conversion fails, Feishu receives the original - file as an attachment; WhatsApp send fails rather than posting an incompatible - PTT payload. -- **Other channels**: MP3 (`mp3_44100_128` from ElevenLabs, `mp3` from OpenAI). - - 44.1kHz / 128kbps is the default balance for speech clarity. -- **MiniMax**: MP3 (`speech-2.8-hd` model, 32kHz sample rate) for normal audio attachments. For voice-note targets such as Feishu, Telegram, and WhatsApp, OpenClaw transcodes the MiniMax MP3 to 48kHz Opus with `ffmpeg` before delivery. -- **Xiaomi MiMo**: MP3 by default, or WAV when configured. For voice-note targets such as Feishu, Telegram, and WhatsApp, OpenClaw transcodes Xiaomi output to 48kHz Opus with `ffmpeg` before delivery. -- **Local CLI**: uses the configured `outputFormat`. 
Voice-note targets are - converted to Ogg/Opus and telephony output is converted to raw 16 kHz mono PCM - with `ffmpeg`. -- **Google Gemini**: Gemini API TTS returns raw 24kHz PCM. OpenClaw wraps it as WAV for audio attachments, transcodes it to 48kHz Opus for voice-note targets, and returns PCM directly for Talk/telephony. -- **Gradium**: WAV for audio attachments, Opus for voice-note targets, and `ulaw_8000` at 8 kHz for telephony. -- **Inworld**: MP3 for normal audio attachments, native `OGG_OPUS` for voice-note targets, and raw `PCM` at 22050 Hz for Talk/telephony. -- **xAI**: MP3 by default; `responseFormat` may be `mp3`, `wav`, `pcm`, `mulaw`, or `alaw`. OpenClaw uses xAI's batch REST TTS endpoint and returns a complete audio attachment; xAI's streaming TTS WebSocket is not used by this provider path. Native Opus voice-note format is not supported by this path. -- **Microsoft**: uses `microsoft.outputFormat` (default `audio-24khz-48kbitrate-mono-mp3`). - - The bundled transport accepts an `outputFormat`, but not all formats are available from the service. - - Output format values follow Microsoft Speech output formats (including Ogg/WebM Opus). - - Telegram `sendVoice` accepts OGG/MP3/M4A; use OpenAI/ElevenLabs if you need - guaranteed Opus voice messages. - - If the configured Microsoft output format fails, OpenClaw retries with MP3. - -OpenAI/ElevenLabs output formats are fixed per channel (see above). - ## Auto-TTS behavior -When enabled, OpenClaw: +When `messages.tts.auto` is enabled, OpenClaw: -- skips TTS if the reply already contains media or a `MEDIA:` directive. -- skips very short replies (< 10 chars). -- summarizes long replies when enabled using `agents.defaults.model.primary` (or `summaryModel`). -- attaches the generated audio to the reply. -- in `mode: "final"`, still sends audio-only TTS for streamed final replies +- Skips TTS if the reply already contains media or a `MEDIA:` directive. +- Skips very short replies (under 10 chars). 
+- Summarizes long replies when summaries are enabled, using + `summaryModel` (or `agents.defaults.model.primary`). +- Attaches the generated audio to the reply. +- In `mode: "final"`, still sends audio-only TTS for streamed final replies after the text stream completes; the generated media goes through the same channel media normalization as normal reply attachments. If the reply exceeds `maxLength` and summary is off (or no API key for the -summary model), audio -is skipped and the normal text reply is sent. +summary model), audio is skipped and the normal text reply is sent. -## Flow diagram - -``` +```text Reply -> TTS enabled? no -> send text yes -> has media / MEDIA: / short? @@ -929,80 +670,247 @@ Reply -> TTS enabled? no -> TTS -> attach audio yes -> summary enabled? no -> send text - yes -> summarize (summaryModel or agents.defaults.model.primary) - -> TTS -> attach audio + yes -> summarize -> TTS -> attach audio ``` -## Slash command usage +## Output formats by channel -There is a single command: `/tts`. -See [Slash commands](/tools/slash-commands) for enablement details. +| Target | Format | +| ------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------- | +| Feishu / Matrix / Telegram / WhatsApp | Voice-note replies prefer **Opus** (`opus_48000_64` from ElevenLabs, `opus` from OpenAI). 48 kHz / 64 kbps balances clarity and size. | +| Other channels | **MP3** (`mp3_44100_128` from ElevenLabs, `mp3` from OpenAI). 44.1 kHz / 128 kbps default for speech. | +| Talk / telephony | Provider-native **PCM** (Inworld 22050 Hz, Google 24 kHz), or `ulaw_8000` from Gradium for telephony. | -Discord note: `/tts` is a built-in Discord command, so OpenClaw registers -`/voice` as the native command there. Text `/tts ...` still works. 
+Per-provider notes: -``` -/tts off -/tts on -/tts status -/tts chat on -/tts chat off -/tts chat default -/tts latest -/tts provider openai -/tts persona alfred -/tts limit 2000 -/tts summary off -/tts audio Hello from OpenClaw -``` +- **Feishu / WhatsApp transcoding:** When a voice-note reply lands as MP3/WebM/WAV/M4A, the channel plugin transcodes to 48 kHz Ogg/Opus with `ffmpeg`. WhatsApp sends through Baileys with `ptt: true` and `audio/ogg; codecs=opus`. If conversion fails: Feishu falls back to attaching the original file; WhatsApp send fails rather than posting an incompatible PTT payload. +- **MiniMax / Xiaomi MiMo:** Default MP3 (32 kHz for MiniMax `speech-2.8-hd`); transcoded to 48 kHz Opus for voice-note targets via `ffmpeg`. +- **Local CLI:** Uses configured `outputFormat`. Voice-note targets are converted to Ogg/Opus and telephony output to raw 16 kHz mono PCM. +- **Google Gemini:** Returns raw 24 kHz PCM. OpenClaw wraps as WAV for attachments, transcodes to 48 kHz Opus for voice-note targets, returns PCM directly for Talk/telephony. +- **Inworld:** MP3 attachments, native `OGG_OPUS` voice-note, raw `PCM` 22050 Hz for Talk/telephony. +- **xAI:** MP3 by default; `responseFormat` may be `mp3|wav|pcm|mulaw|alaw`. Uses xAI's batch REST endpoint — streaming WebSocket TTS is **not** used. Native Opus voice-note format is **not** supported. +- **Microsoft:** Uses `microsoft.outputFormat` (default `audio-24khz-48kbitrate-mono-mp3`). Telegram `sendVoice` accepts OGG/MP3/M4A; use OpenAI/ElevenLabs if you need guaranteed Opus voice messages. If the configured Microsoft format fails, OpenClaw retries with MP3. -Notes: +OpenAI and ElevenLabs output formats are fixed per channel as listed above. -- Commands require an authorized sender (allowlist/owner rules still apply). -- `commands.text` or native command registration must be enabled. -- Config `messages.tts.auto` accepts `off|always|inbound|tagged`. 
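+
+The Local CLI path above is config-driven end to end. A sketch of a provider
+block (the `my-tts` binary and its flags are hypothetical; the `{{...}}`
+placeholders are the documented ones):
+
+```json5
+{
+  messages: {
+    tts: {
+      provider: "tts-local-cli",
+      providers: {
+        "tts-local-cli": {
+          command: "my-tts", // hypothetical local TTS binary
+          args: ["--text", "{{Text}}", "--out", "{{OutputPath}}"],
+          outputFormat: "wav", // mp3 | opus | wav
+          timeoutMs: 60000,
+        },
+      },
+    },
+  },
+}
+```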
-- `/tts on` writes the local TTS preference to `always`; `/tts off` writes it to `off`.
-- `/tts chat on|off|default` writes a session-scoped auto-TTS override for the current chat.
-- Use config when you want `inbound` or `tagged` defaults.
-- `/tts persona ` writes the local persona preference; `/tts persona off` clears it.
-- `limit` and `summary` are stored in local prefs, not the main config.
-- `/tts audio` generates a one-off audio reply (does not toggle TTS on).
-- `/tts latest` reads the latest assistant reply from the current session transcript and sends it as audio once. It stores only a hash of that reply on the session entry to suppress duplicate voice sends.
-- `/tts status` includes fallback visibility for the latest attempt:
-  - success fallback: `Fallback: -> ` plus `Attempts: ...`
-  - failure: `Error: ...` plus `Attempts: ...`
-  - detailed diagnostics: `Attempt details: provider:outcome(reasonCode) latency`
-- `/status` shows the active TTS mode plus configured provider, model, voice,
-  and sanitized custom endpoint metadata when TTS is enabled.
-- OpenAI and ElevenLabs API failures now include parsed provider error detail and request id (when returned by the provider), which is surfaced in TTS errors/logs.
+## Field reference
+
+### Core (`messages.tts`)
+
+- `auto`: Auto-TTS mode (`off|always|inbound|tagged`). `inbound` only sends audio after an inbound voice message; `tagged` only sends audio when the reply includes `[[tts:...]]` directives or a `[[tts:text]]` block.
+- `enabled`: legacy toggle. `openclaw doctor --fix` migrates this to `auto`.
+- `mode`: `"all"` includes tool/block replies in addition to final replies.
+- `provider`: speech provider id. When unset, OpenClaw uses the first configured provider in registry auto-select order. Legacy `provider: "edge"` is rewritten to `"microsoft"` by `openclaw doctor --fix`.
+- `persona`: active persona id from `personas`. Normalized to lowercase.
+- `personas`: stable spoken identities. Fields: `label`, `description`, `provider`, `fallbackPolicy`, `prompt`, `providers.<provider>`. See [Personas](#personas).
+- `summaryModel`: cheap model for auto-summary; defaults to `agents.defaults.model.primary`. Accepts `provider/model` or a configured model alias.
+- `modelOverrides`: allow the model to emit TTS directives. `enabled` defaults to `true`; `allowProvider` defaults to `false`.
+- `providers`: provider-owned settings keyed by speech provider id. Legacy direct blocks (`messages.tts.openai`, `.elevenlabs`, `.microsoft`, `.edge`) are rewritten by `openclaw doctor --fix`; commit only `messages.tts.providers.<provider>`.
+- Input hard cap: maximum number of TTS input characters; `/tts audio` fails if exceeded.
+- `timeoutMs`: request timeout in milliseconds.
+- `prefsPath`: override the local prefs JSON path (provider/limit/summary). Default `~/.openclaw/settings/tts.json`.
+
+### OpenAI (`providers.openai`)
+
+- `apiKey`: falls back to `OPENAI_API_KEY`.
+- `model`: OpenAI TTS model id (e.g. `gpt-4o-mini-tts`).
+- `voice`: voice name (e.g. `alloy`, `cedar`).
+- `instructions`: explicit OpenAI `instructions` field. When set, persona prompt fields are **not** auto-mapped.
+- `baseUrl`: override the OpenAI TTS endpoint. Resolution order: config → `OPENAI_TTS_BASE_URL` → `https://api.openai.com/v1`. Non-default values are treated as OpenAI-compatible TTS endpoints, so custom model and voice names are accepted.
+
+### ElevenLabs (`providers.elevenlabs`)
+
+- `apiKey`: falls back to `ELEVENLABS_API_KEY` or `XI_API_KEY`.
+- `modelId`: model id (e.g. `eleven_multilingual_v2`, `eleven_v3`).
+- `voiceId`: ElevenLabs voice id.
+- `voiceSettings`: `stability`, `similarityBoost`, `style` (each `0..1`), `useSpeakerBoost` (`true|false`), `speed` (`0.5..2.0`, `1.0` = normal).
+- `applyTextNormalization`: text normalization mode (`auto|on|off`).
+- `languageCode`: 2-letter ISO 639-1 (e.g. `en`, `de`).
+- `seed`: integer `0..4294967295` for best-effort determinism.
+- `baseUrl`: override ElevenLabs API base URL.
+
+### Google Gemini (`providers.google`)
+
+- `apiKey`: falls back to `GEMINI_API_KEY` / `GOOGLE_API_KEY`. If omitted, TTS can reuse `models.providers.google.apiKey` before env fallback.
+- `model`: Gemini TTS model. Default `gemini-3.1-flash-tts-preview`.
+- `voiceName`: Gemini prebuilt voice name. Default `Kore`. Alias: `voice`.
+- `audioProfile`: natural-language style prompt prepended before spoken text.
+- `speakerName`: optional speaker label prepended before spoken text when your prompt uses a named speaker.
+- `promptTemplate`: set to `audio-profile-v1` to wrap active persona prompt fields in a deterministic Gemini TTS prompt structure.
+- `personaPrompt`: Google-specific extra persona prompt text appended to the template's Director's Notes.
+- `baseUrl`: only `https://generativelanguage.googleapis.com` is accepted.
+
+### Azure Speech (`providers.azure-speech`)
+
+- `apiKey`: env `AZURE_SPEECH_KEY`, `AZURE_SPEECH_API_KEY`, or `SPEECH_KEY`.
+- `region`: Azure Speech region (e.g. `eastus`). Env: `AZURE_SPEECH_REGION` or `SPEECH_REGION`.
+- `endpoint`: optional Azure Speech endpoint override (alias `baseUrl`).
+- `voice`: Azure voice ShortName. Default `en-US-JennyNeural`.
+- `lang`: SSML language code. Default `en-US`.
+- `outputFormat`: Azure `X-Microsoft-OutputFormat` for standard audio. Default `audio-24khz-48kbitrate-mono-mp3`.
+- `voiceNoteOutputFormat`: Azure `X-Microsoft-OutputFormat` for voice-note output. Default `ogg-24khz-16bit-mono-opus`.
+
+### Microsoft (`providers.microsoft`)
+
+- `enabled`: allow Microsoft speech usage (default `true`; no API key).
+- `voice`: Microsoft neural voice name (e.g. `en-US-MichelleNeural`).
+- `lang`: language code (e.g. `en-US`).
+- `outputFormat`: Microsoft output format. Default `audio-24khz-48kbitrate-mono-mp3`. Not all formats are supported by the bundled Edge-backed transport.
+- `rate` / `pitch` / `volume`: percent strings (e.g. `+10%`, `-5%`).
+- `saveSubtitles`: write JSON subtitles alongside the audio file.
+- `proxy`: proxy URL for Microsoft speech requests.
+- `timeoutMs`: request timeout override (ms).
+- `edge.*`: legacy alias. Run `openclaw doctor --fix` to rewrite persisted config to `providers.microsoft`.
+
+### MiniMax (`providers.minimax`)
+
+- `apiKey`: falls back to `MINIMAX_API_KEY`. Token Plan auth via `MINIMAX_OAUTH_TOKEN`, `MINIMAX_CODE_PLAN_KEY`, or `MINIMAX_CODING_API_KEY`.
+- `baseUrl`: default `https://api.minimax.io`. Env: `MINIMAX_API_HOST`.
+- `model`: default `speech-2.8-hd`. Env: `MINIMAX_TTS_MODEL`.
+- `voiceId`: default `English_expressive_narrator`. Env: `MINIMAX_TTS_VOICE_ID`.
+- `speed`: `0.5..2.0`. Default `1.0`.
+- `vol`: `(0, 10]`. Default `1.0`.
+- `pitch`: integer `-12..12`. Default `0`. Fractional values are truncated before the request.
+
+### Inworld (`providers.inworld`)
+
+- `apiKey`: env `INWORLD_API_KEY`.
+- `baseUrl`: default `https://api.inworld.ai`.
+- `modelId`: default `inworld-tts-1.5-max`. Also: `inworld-tts-1.5-mini`, `inworld-tts-1-max`, `inworld-tts-1`.
+- `voiceId`: default `Sarah`.
+- `temperature`: sampling temperature `0..2`.
+
+### xAI (`providers.xai`)
+
+- `apiKey`: env `XAI_API_KEY`.
+- `baseUrl`: default `https://api.x.ai/v1`. Env: `XAI_BASE_URL`.
+- `voiceId`: default `eve`. Live voices: `ara`, `eve`, `leo`, `rex`, `sal`, `una`.
+- `language`: BCP-47 language code or `auto`. Default `en`.
+- `responseFormat`: `mp3`, `wav`, `pcm`, `mulaw`, or `alaw`. Default `mp3`.
+- `speed`: provider-native speed override.
+
+### Volcengine (`providers.volcengine`)
+
+- `apiKey`: env `VOLCENGINE_TTS_API_KEY` or `BYTEPLUS_SEED_SPEECH_API_KEY`.
+- `resourceId`: default `seed-tts-1.0`. Env: `VOLCENGINE_TTS_RESOURCE_ID`. Use `seed-tts-2.0` when your project has TTS 2.0 entitlement.
+- `appKey`: app key header. Default `aGjiRDfUWi`. Env: `VOLCENGINE_TTS_APP_KEY`.
+- `baseUrl`: override the Seed Speech TTS HTTP endpoint. Env: `VOLCENGINE_TTS_BASE_URL`.
+- `voice`: voice type. Default `en_female_anna_mars_bigtts`. Env: `VOLCENGINE_TTS_VOICE`.
+- `speedRatio`: provider-native speed ratio.
+- `emotion`: provider-native emotion tag.
+- `appId` / `token` / `cluster`: legacy Volcengine Speech Console fields. Env: `VOLCENGINE_TTS_APPID`, `VOLCENGINE_TTS_TOKEN`, `VOLCENGINE_TTS_CLUSTER` (default `volcano_tts`).
+
+### Xiaomi MiMo (`providers.xiaomi`)
+
+- `apiKey`: env `XIAOMI_API_KEY`.
+- `baseUrl`: default `https://api.xiaomimimo.com/v1`. Env: `XIAOMI_BASE_URL`.
+- `model`: default `mimo-v2.5-tts`. Env: `XIAOMI_TTS_MODEL`. Also supports `mimo-v2-tts`.
+- `voice`: default `mimo_default`. Env: `XIAOMI_TTS_VOICE`.
+- `format`: `mp3` or `wav`. Default `mp3`. Env: `XIAOMI_TTS_FORMAT`.
+- `style`: optional natural-language style instruction sent as the user message; not spoken.
+
+### OpenRouter (`providers.openrouter`)
+
+- `apiKey`: env `OPENROUTER_API_KEY`. Can reuse `models.providers.openrouter.apiKey`.
+- `baseUrl`: default `https://openrouter.ai/api/v1`. Legacy `https://openrouter.ai/v1` is normalized.
+- `model`: default `hexgrad/kokoro-82m`. Alias: `modelId`.
+- `voice`: default `af_alloy`. Alias: `voiceId`.
+- `responseFormat`: `mp3` or `pcm`. Default `mp3`.
+- `speed`: provider-native speed override.
+
+### Gradium (`providers.gradium`)
+
+- `apiKey`: env `GRADIUM_API_KEY`.
+- `baseUrl`: default `https://api.gradium.ai`.
+- `voiceId`: default Emma (`YTpq7expH9539ERJ`).
+
+### Local CLI (`providers.tts-local-cli`)
+
+- `command`: local executable or command string for CLI TTS.
+- `args`: command arguments. Supports `{{Text}}`, `{{OutputPath}}`, `{{OutputDir}}`, `{{OutputBase}}` placeholders.
+- `outputFormat`: expected CLI output format (`mp3`, `opus`, or `wav`). Default `mp3` for audio attachments.
+- `timeoutMs`: command timeout in milliseconds. Default `120000`.
+- `cwd`: optional command working directory.
+- `env`: optional environment overrides for the command.
+
 ## Agent tool
 
 The `tts` tool converts text to speech and returns an audio attachment for
-reply delivery. When the channel is Feishu, Matrix, Telegram, or WhatsApp,
-the audio is delivered as a voice message rather than a file attachment.
-Feishu and WhatsApp can transcode non-Opus TTS output on this path when
-`ffmpeg` is available.
+reply delivery. On Feishu, Matrix, Telegram, and WhatsApp, the audio is
+delivered as a voice message rather than a file attachment. Feishu and
+WhatsApp can transcode non-Opus TTS output on this path when `ffmpeg` is
+available.
+
 WhatsApp sends audio through Baileys as a PTT voice note (`audio` with
-`ptt: true`), and sends visible text separately from PTT audio because clients
-do not consistently render captions on voice notes.
-It accepts optional `channel` and `timeoutMs` fields; `timeoutMs` is a
+`ptt: true`) and sends visible text **separately** from PTT audio because
+clients do not consistently render captions on voice notes.
+
+The tool accepts optional `channel` and `timeoutMs` fields; `timeoutMs` is a
 per-call provider request timeout in milliseconds.
 
 ## Gateway RPC
 
-Gateway methods:
+| Method            | Purpose                                  |
+| ----------------- | ---------------------------------------- |
+| `tts.status`      | Read current TTS state and last attempt. |
+| `tts.enable`      | Set local auto preference to `always`.   |
+| `tts.disable`     | Set local auto preference to `off`.      |
+| `tts.convert`     | One-off text → audio.                    |
+| `tts.setProvider` | Set local provider preference.           |
+| `tts.setPersona`  | Set local persona preference.            |
+| `tts.providers`   | List configured providers and status.
| -- `tts.status` -- `tts.enable` -- `tts.disable` -- `tts.convert` -- `tts.setProvider` -- `tts.setPersona` -- `tts.providers` +## Service links + +- [OpenAI text-to-speech guide](https://platform.openai.com/docs/guides/text-to-speech) +- [OpenAI Audio API reference](https://platform.openai.com/docs/api-reference/audio) +- [Azure Speech REST text-to-speech](https://learn.microsoft.com/azure/ai-services/speech-service/rest-text-to-speech) +- [Azure Speech provider](/providers/azure-speech) +- [ElevenLabs Text to Speech](https://elevenlabs.io/docs/api-reference/text-to-speech) +- [ElevenLabs Authentication](https://elevenlabs.io/docs/api-reference/authentication) +- [Gradium](/providers/gradium) +- [Inworld TTS API](https://docs.inworld.ai/tts/tts) +- [MiniMax T2A v2 API](https://platform.minimaxi.com/document/T2A%20V2) +- [Volcengine TTS HTTP API](/providers/volcengine#text-to-speech) +- [Xiaomi MiMo speech synthesis](/providers/xiaomi#text-to-speech) +- [node-edge-tts](https://github.com/SchneeHertz/node-edge-tts) +- [Microsoft Speech output formats](https://learn.microsoft.com/azure/ai-services/speech-service/rest-text-to-speech#audio-outputs) +- [xAI text to speech](https://docs.x.ai/developers/rest-api-reference/inference/voice#text-to-speech-rest) ## Related - [Media overview](/tools/media-overview) - [Music generation](/tools/music-generation) - [Video generation](/tools/video-generation) +- [Slash commands](/tools/slash-commands) +- [Voice call plugin](/plugins/voice-call)