openclaw/docs/providers/azure-speech.md
2026-04-26 01:42:51 +01:00

---
summary: "Azure AI Speech text-to-speech for OpenClaw replies"
read_when:
  - You want Azure Speech synthesis for outbound replies
  - You need native Ogg Opus voice-note output from Azure Speech
title: "Azure Speech"
---

Azure Speech is the OpenClaw text-to-speech provider backed by Azure AI Speech. It synthesizes outbound reply audio as MP3 by default, as native Ogg/Opus for voice notes, and as 8 kHz mulaw audio for telephony channels such as Voice Call.

OpenClaw calls the Azure Speech REST API directly, sends the text as SSML, and passes the provider-owned output format in the `X-Microsoft-OutputFormat` header.
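To make the request shape concrete, here is a minimal Python sketch of one synthesis call against the Azure Speech REST API. It is not OpenClaw's implementation: the function name `build_tts_request` and the `User-Agent` value are hypothetical, and the `/cognitiveservices/v1` path is the standard Azure text-to-speech REST path rather than something stated on this page.

```python
from xml.sax.saxutils import escape, quoteattr


def build_tts_request(region, key, text,
                      voice="en-US-JennyNeural",
                      lang="en-US",
                      output_format="audio-24khz-48kbitrate-mono-mp3"):
    """Return the URL, headers, and SSML body for one synthesis call.

    Hypothetical helper for illustration; nothing is sent over the network.
    """
    # Escape the text so characters such as & and < cannot break the SSML.
    ssml = (
        f'<speak version="1.0" xml:lang={quoteattr(lang)}>'
        f'<voice name={quoteattr(voice)}>{escape(text)}</voice>'
        '</speak>'
    )
    return {
        "url": f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1",
        "headers": {
            # Speech resource key, not an Azure OpenAI key.
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/ssml+xml",
            # The requested audio encoding travels in this header.
            "X-Microsoft-OutputFormat": output_format,
            "User-Agent": "tts-sketch",  # assumption: any short client name
        },
        "body": ssml.encode("utf-8"),
    }
```

The returned dictionary can be passed to any HTTP client as a `POST`; the response body is the audio in the requested format.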

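| Detail                  | Value                                       |
| ----------------------- | ------------------------------------------- |
| Website                 | Azure AI Speech                             |
| Docs                    | Speech REST text-to-speech                  |
| Auth                    | `AZURE_SPEECH_KEY` plus `AZURE_SPEECH_REGION` |
| Default voice           | `en-US-JennyNeural`                         |
| Default file output     | `audio-24khz-48kbitrate-mono-mp3`           |
| Default voice-note file | `ogg-24khz-16bit-mono-opus`                 |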

## Getting started

In the Azure portal, create a Speech resource. Copy **KEY 1** from Resource Management > Keys and Endpoint, and copy the resource location such as `eastus`.

Provide the key and region as environment variables:

```
AZURE_SPEECH_KEY=<speech-resource-key>
AZURE_SPEECH_REGION=eastus
```

Select Azure Speech as the TTS provider in `messages.tts`:

```json5
{
  messages: {
    tts: {
      auto: "always",
      provider: "azure-speech",
      providers: {
        "azure-speech": {
          voice: "en-US-JennyNeural",
          lang: "en-US",
        },
      },
    },
  },
}
```

Send a reply through any connected channel. OpenClaw synthesizes the audio with Azure Speech and delivers MP3 for standard audio, or Ogg/Opus when the channel expects a voice note.
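The choice of format per delivery target can be sketched as follows. This is an illustrative Python sketch, not OpenClaw's code: `pick_output_format` and the `"telephony"` target name are hypothetical, and `raw-8khz-8bit-mono-mulaw` is a standard Azure 8 kHz mulaw format name assumed here because this page does not say which format OpenClaw requests for telephony.

```python
DEFAULT_FILE_FORMAT = "audio-24khz-48kbitrate-mono-mp3"
DEFAULT_VOICE_NOTE_FORMAT = "ogg-24khz-16bit-mono-opus"
# Assumption: one of Azure's 8 kHz mulaw formats; the exact telephony
# format used by OpenClaw is not documented on this page.
TELEPHONY_FORMAT = "raw-8khz-8bit-mono-mulaw"


def pick_output_format(target, provider_config=None):
    """Choose the X-Microsoft-OutputFormat value for a delivery target."""
    cfg = provider_config or {}
    if target == "voice-note":
        # Native Ogg/Opus avoids converting from MP3 afterwards.
        return cfg.get("voiceNoteOutputFormat", DEFAULT_VOICE_NOTE_FORMAT)
    if target == "telephony":  # hypothetical target name
        return TELEPHONY_FORMAT
    # Any other target receives the regular audio-file format.
    return cfg.get("outputFormat", DEFAULT_FILE_FORMAT)
```

Calling `pick_output_format("voice-note")` with no overrides yields the Ogg/Opus default, while a configured `outputFormat` only affects regular audio files.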

## Configuration options

| Option                  | Path                                                        | Description                                                                                               |
| ----------------------- | ----------------------------------------------------------- | --------------------------------------------------------------------------------------------------------- |
| `apiKey`                | `messages.tts.providers.azure-speech.apiKey`                | Azure Speech resource key. Falls back to `AZURE_SPEECH_KEY`, `AZURE_SPEECH_API_KEY`, or `SPEECH_KEY`.     |
| `region`                | `messages.tts.providers.azure-speech.region`                | Azure Speech resource region. Falls back to `AZURE_SPEECH_REGION` or `SPEECH_REGION`.                     |
| `endpoint`              | `messages.tts.providers.azure-speech.endpoint`              | Optional Azure Speech endpoint/base URL override.                                                         |
| `baseUrl`               | `messages.tts.providers.azure-speech.baseUrl`               | Optional Azure Speech base URL override.                                                                  |
| `voice`                 | `messages.tts.providers.azure-speech.voice`                 | Azure voice `ShortName` (default `en-US-JennyNeural`).                                                    |
| `lang`                  | `messages.tts.providers.azure-speech.lang`                  | SSML language code (default `en-US`).                                                                     |
| `outputFormat`          | `messages.tts.providers.azure-speech.outputFormat`          | Audio-file output format (default `audio-24khz-48kbitrate-mono-mp3`).                                     |
| `voiceNoteOutputFormat` | `messages.tts.providers.azure-speech.voiceNoteOutputFormat` | Voice-note output format (default `ogg-24khz-16bit-mono-opus`).                                           |
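The fallback order for `apiKey` and `region` can be illustrated with a short Python sketch. It is an approximation under stated assumptions, not the provider's actual code: the helper names are hypothetical, the environment variables are tried in the order they are listed above, and `endpoint` is assumed to take precedence over `baseUrl` when both are set. When neither override is present, the host is derived from the region as `https://<region>.tts.speech.microsoft.com`.

```python
import os

KEY_ENV_VARS = ("AZURE_SPEECH_KEY", "AZURE_SPEECH_API_KEY", "SPEECH_KEY")
REGION_ENV_VARS = ("AZURE_SPEECH_REGION", "SPEECH_REGION")


def first_set(env, names):
    """Return the first non-empty value among the named variables."""
    for name in names:
        value = env.get(name)
        if value:
            return value
    return None


def resolve_azure_speech(provider_config, env=None):
    """Resolve key, region, and base URL for the azure-speech provider."""
    env = os.environ if env is None else env
    # Explicit config wins; environment variables are the fallback.
    api_key = provider_config.get("apiKey") or first_set(env, KEY_ENV_VARS)
    region = provider_config.get("region") or first_set(env, REGION_ENV_VARS)
    # Assumption: endpoint is preferred over baseUrl when both are given.
    base_url = provider_config.get("endpoint") or provider_config.get("baseUrl")
    if not base_url and region:
        base_url = f"https://{region}.tts.speech.microsoft.com"
    return {"apiKey": api_key, "region": region, "baseUrl": base_url}
```

Passing `env` explicitly keeps the function easy to test; omitting it reads from the process environment.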

## Notes

- Azure Speech uses a Speech resource key, not an Azure OpenAI key. The key is sent as `Ocp-Apim-Subscription-Key`; OpenClaw derives `https://<region>.tts.speech.microsoft.com` from `region` unless you provide `endpoint` or `baseUrl`.
- Use the Azure Speech voice `ShortName` value, for example `en-US-JennyNeural`. The bundled provider can list voices through the same Speech resource and filters out voices marked deprecated or retired.
- Azure accepts output formats such as `audio-24khz-48kbitrate-mono-mp3`, `ogg-24khz-16bit-mono-opus`, and `riff-24khz-16bit-mono-pcm`. OpenClaw requests Ogg/Opus for `voice-note` targets so channels can send native voice bubbles without an extra MP3 conversion.
- `azure` is accepted as a provider alias for existing PRs and user config, but new config should use `azure-speech` to avoid confusion with Azure OpenAI model providers.

Related documentation:

- TTS overview, providers, and `messages.tts` config.
- Full config reference, including `messages.tts` settings.
- All bundled OpenClaw providers.
- Common issues and debugging steps.