From 5163a2fbf771dd90b1b5f51985bd4b5946fac038 Mon Sep 17 00:00:00 2001
From: Peter Steinberger
Date: Sat, 25 Apr 2026 08:42:23 +0100
Subject: [PATCH] docs: document Talk MLX config

---
 docs/gateway/config-agents.md |  6 ++++++
 docs/nodes/talk.md            | 28 ++++++++++++++++++++--------
 2 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/docs/gateway/config-agents.md b/docs/gateway/config-agents.md
index a9b98244602..1dfed57c2f5 100644
--- a/docs/gateway/config-agents.md
+++ b/docs/gateway/config-agents.md
@@ -1326,6 +1326,10 @@ Defaults for Talk mode (macOS/iOS/Android).
         outputFormat: "mp3_44100_128",
         apiKey: "elevenlabs_api_key",
       },
+      mlx: {
+        modelId: "mlx-community/Soprano-80M-bf16",
+      },
+      system: {},
     },
     silenceTimeoutMs: 1500,
     interruptOnSpeech: true,
@@ -1339,6 +1343,8 @@ Defaults for Talk mode (macOS/iOS/Android).
 - `providers.*.apiKey` accepts plaintext strings or SecretRef objects.
 - `ELEVENLABS_API_KEY` fallback applies only when no Talk API key is configured.
 - `providers.*.voiceAliases` lets Talk directives use friendly names.
+- `providers.mlx.modelId` selects the Hugging Face repo used by the macOS local MLX helper. If omitted, macOS uses `mlx-community/Soprano-80M-bf16`.
+- macOS MLX playback runs through the bundled `openclaw-mlx-tts` helper when present, or an executable on `PATH`; `OPENCLAW_MLX_TTS_BIN` overrides the helper path for development.
 - `silenceTimeoutMs` controls how long Talk mode waits after user silence before it sends the transcript. Unset keeps the platform default pause window (`700 ms on macOS and Android, 900 ms on iOS`).
 
 ---
diff --git a/docs/nodes/talk.md b/docs/nodes/talk.md
index dd3dd6b7941..fe65b915aca 100644
--- a/docs/nodes/talk.md
+++ b/docs/nodes/talk.md
@@ -1,5 +1,5 @@
 ---
-summary: "Talk mode: continuous speech conversations with ElevenLabs TTS"
+summary: "Talk mode: continuous speech conversations with configured TTS providers"
 read_when:
   - Implementing Talk mode on macOS/iOS/Android
   - Changing voice/TTS/interrupt behavior
@@ -50,10 +50,19 @@ Supported keys:
 ```json5
 {
   talk: {
-    voiceId: "elevenlabs_voice_id",
-    modelId: "eleven_v3",
-    outputFormat: "mp3_44100_128",
-    apiKey: "elevenlabs_api_key",
+    provider: "elevenlabs",
+    providers: {
+      elevenlabs: {
+        voiceId: "elevenlabs_voice_id",
+        modelId: "eleven_v3",
+        outputFormat: "mp3_44100_128",
+        apiKey: "elevenlabs_api_key",
+      },
+      mlx: {
+        modelId: "mlx-community/Soprano-80M-bf16",
+      },
+      system: {},
+    },
     silenceTimeoutMs: 1500,
     interruptOnSpeech: true,
   },
@@ -64,9 +73,11 @@ Defaults:
 
 - `interruptOnSpeech`: true
 - `silenceTimeoutMs`: when unset, Talk keeps the platform default pause window before sending the transcript (`700 ms on macOS and Android, 900 ms on iOS`)
-- `voiceId`: falls back to `ELEVENLABS_VOICE_ID` / `SAG_VOICE_ID` (or first ElevenLabs voice when API key is available)
-- `modelId`: defaults to `eleven_v3` when unset
-- `apiKey`: falls back to `ELEVENLABS_API_KEY` (or gateway shell profile if available)
+- `provider`: selects the active Talk provider. Use `elevenlabs`, `mlx`, or `system` for the macOS-local playback paths.
+- `providers..voiceId`: falls back to `ELEVENLABS_VOICE_ID` / `SAG_VOICE_ID` for ElevenLabs (or first ElevenLabs voice when API key is available).
+- `providers.elevenlabs.modelId`: defaults to `eleven_v3` when unset.
+- `providers.mlx.modelId`: defaults to `mlx-community/Soprano-80M-bf16` when unset.
+- `providers.elevenlabs.apiKey`: falls back to `ELEVENLABS_API_KEY` (or gateway shell profile if available).
 - `outputFormat`: defaults to `pcm_44100` on macOS/iOS and `pcm_24000` on Android (set `mp3_*` to force MP3 streaming)
 
 ## macOS UI
@@ -85,6 +96,7 @@ Defaults:
 - Requires Speech + Microphone permissions.
 - Uses `chat.send` against session key `main`.
 - The gateway resolves Talk playback through `talk.speak` using the active Talk provider. Android falls back to local system TTS only when that RPC is unavailable.
+- macOS local MLX playback uses the bundled `openclaw-mlx-tts` helper when present, or an executable on `PATH`. Set `OPENCLAW_MLX_TTS_BIN` to point at a custom helper binary during development.
 - `stability` for `eleven_v3` is validated to `0.0`, `0.5`, or `1.0`; other models accept `0..1`.
 - `latency_tier` is validated to `0..4` when set.
 - Android supports `pcm_16000`, `pcm_22050`, `pcm_24000`, and `pcm_44100` output formats for low-latency AudioTrack streaming.
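
Reviewer note: the patch documents two fallbacks for the MLX provider, the default `modelId` and the helper lookup order. The TypeScript sketch below illustrates one reading of those rules. It is not taken from the codebase: `TalkConfig`, `resolveMlxModelId`, `resolveMlxHelper`, and the `exists` callback are hypothetical names, and the assumed precedence is `OPENCLAW_MLX_TTS_BIN` first, then the bundled helper, then a `PATH` lookup by bare name.

```typescript
// Illustrative sketch only. Every name here except OPENCLAW_MLX_TTS_BIN,
// openclaw-mlx-tts, and the default model id is an assumption for this example.
const DEFAULT_MLX_MODEL_ID = "mlx-community/Soprano-80M-bf16";
const MLX_HELPER_NAME = "openclaw-mlx-tts";

// Hypothetical subset of the `talk` config block shown in the docs.
interface TalkConfig {
  provider?: string;
  providers?: { mlx?: { modelId?: string } };
}

// `providers.mlx.modelId` falls back to the documented default when unset.
function resolveMlxModelId(talk: TalkConfig): string {
  return talk.providers?.mlx?.modelId ?? DEFAULT_MLX_MODEL_ID;
}

// Assumed order: env override, then bundled helper if present, then the bare
// helper name so the caller's spawn step resolves it through PATH.
function resolveMlxHelper(
  env: Record<string, string | undefined>,
  bundledPath: string | undefined,
  exists: (path: string) => boolean,
): string {
  const override = env.OPENCLAW_MLX_TTS_BIN;
  if (override) return override;
  if (bundledPath && exists(bundledPath)) return bundledPath;
  return MLX_HELPER_NAME;
}
```

Passing `env` and `exists` as parameters keeps the sketch free of filesystem and process access, so the precedence can be checked in isolation.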