TTS: add provider personas

2026-07-13 14:46:06 +00:00 · 2026-04-23 07:26:32 -07:00
parent 80219ed1b3
commit 0594fa3c4d
39 changed files with 2021 additions and 136 deletions
--- a/docs/tools/tts.md
+++ b/docs/tools/tts.md
@@ -493,6 +493,110 @@ transcoded to raw 16 kHz mono PCM with `ffmpeg`. The legacy provider alias
 }
 ```

+### TTS personas
+
+Use `messages.tts.personas` when you want a stable spoken identity that can be
+applied deterministically across providers. A persona can prefer one provider,
+define provider-neutral prompt intent, and carry provider-specific bindings for
+voices, models, prompt templates, seeds, and voice settings.
+
+```json5
+{
+  messages: {
+    tts: {
+      auto: "always",
+      persona: "alfred",
+      personas: {
+        alfred: {
+          label: "Alfred",
+          description: "Dry, warm British butler narrator.",
+          provider: "google",
+          fallbackPolicy: "preserve-persona",
+          prompt: {
+            profile: "A brilliant British butler. Dry, witty, warm, charming, emotionally expressive, never generic.",
+            scene: "A quiet late-night study. Close-mic narration for a trusted operator.",
+            sampleContext: "The speaker is answering a private technical request with concise confidence and dry warmth.",
+            style: "Refined, understated, lightly amused.",
+            accent: "British English.",
+            pacing: "Measured, with short dramatic pauses.",
+            constraints: ["Do not read configuration values aloud.", "Do not explain the persona."],
+          },
+          providers: {
+            google: {
+              model: "gemini-3.1-flash-tts-preview",
+              voiceName: "Algieba",
+              promptTemplate: "audio-profile-v1",
+            },
+            openai: {
+              model: "gpt-4o-mini-tts",
+              voice: "cedar",
+            },
+            elevenlabs: {
+              voiceId: "voice_id",
+              modelId: "eleven_multilingual_v2",
+              seed: 42,
+              voiceSettings: {
+                stability: 0.65,
+                similarityBoost: 0.8,
+                style: 0.25,
+                useSpeakerBoost: true,
+                speed: 0.95,
+              },
+            },
+          },
+        },
+      },
+    },
+  },
+}
+```
+
+Resolution is deterministic:
+
+1. `/tts persona <id>` local preference, if set.
+2. `messages.tts.persona`, if set.
+3. No persona.
+
+Provider selection is explicit-first:
+
+1. Direct provider overrides from CLI, gateway, Talk, or allowed TTS directives.
+2. `/tts provider <id>` local preference.
+3. Active persona `provider`.
+4. `messages.tts.provider`.
+5. Registry auto-select.
+
+For each provider attempt, OpenClaw merges:
+
+1. `messages.tts.providers.<id>`
+2. `messages.tts.personas.<persona>.providers.<id>`
+3. trusted request overrides
+4. allowed model-emitted TTS directive overrides
+
+`fallbackPolicy` controls what happens when an active persona has no binding for
+an attempted provider:
+
+- `preserve-persona` keeps provider-neutral persona prompt fields available to
+  providers. This is the default.
+- `provider-defaults` omits the persona from provider prompt preparation for
+  that attempt, so the provider uses its neutral defaults while still allowing
+  fallback to continue.
+- `fail` skips that provider attempt with `reasonCode: "not_configured"` and
+  `personaBinding: "missing"`. Fallback providers are still tried; the whole TTS
+  request fails only if every attempted provider is skipped or fails.
+
+Persona prompt fields are provider-neutral. Providers decide how to use them.
+Google wraps them only when the effective Google provider config sets
+`promptTemplate: "audio-profile-v1"` or `personaPrompt`; its older
+`audioProfile` and `speakerName` fields are still prepended as Google-specific
+prompt text. OpenAI maps prompt fields to `instructions` when no explicit
+OpenAI `instructions` value is configured. Providers without prompt-like
+controls use the provider-specific persona bindings only.
+
+Gemini inline audio tags are transcript content, not persona config. If the
+assistant or an explicit `[[tts:text]]` block includes tags such as `[whispers]`
+or `[laughs]`, OpenClaw preserves them inside the Gemini transcript. OpenClaw
+does not generate configured start tags.
+
 ### Disable Microsoft speech

 ```json5
@@ -565,6 +669,12 @@ Then run:
 - If `provider` is **unset**, OpenClaw uses the first configured speech provider in registry auto-select order.
 - Legacy `provider: "edge"` config is repaired by `openclaw doctor --fix` and
  rewritten to `provider: "microsoft"`.
+- `persona`: default TTS persona id from `personas`.
+- `personas.<id>`: stable spoken identity. The id is normalized to lowercase.
+- `personas.<id>.provider`: preferred speech provider for the persona. Explicit provider overrides and local provider prefs still win.
+- `personas.<id>.fallbackPolicy`: `preserve-persona` (default), `provider-defaults`, or `fail`; see [TTS personas](#tts-personas).
+- `personas.<id>.prompt`: provider-neutral persona prompt fields (`profile`, `scene`, `sampleContext`, `style`, `accent`, `pacing`, `constraints`).
+- `personas.<id>.providers.<provider>`: provider-specific persona binding merged over `providers.<provider>`.
 - `summaryModel`: optional cheap model for auto-summary; defaults to `agents.defaults.model.primary`.
  - Accepts `provider/model` or a configured model alias.
 - `modelOverrides`: allow the model to emit TTS directives (on by default).
@@ -621,6 +731,8 @@ Then run:
 - `providers.google.voiceName`: Gemini prebuilt voice name (default `Kore`; `voice` is also accepted).
 - `providers.google.audioProfile`: natural-language style prompt prepended before the spoken text.
 - `providers.google.speakerName`: optional speaker label prepended before the spoken text when your TTS prompt uses a named speaker.
+- `providers.google.promptTemplate`: set to `audio-profile-v1` to wrap active persona prompt fields in a deterministic Gemini TTS prompt structure.
+- `providers.google.personaPrompt`: Google-specific extra persona prompt text appended to the template's Director's Notes.
 - `providers.google.baseUrl`: override the Gemini API base URL. Only `https://generativelanguage.googleapis.com` is accepted.
  - If `messages.tts.providers.google.apiKey` is omitted, TTS can reuse `models.providers.google.apiKey` before env fallback.
 - `providers.gradium.baseUrl`: override Gradium API base URL (default `https://api.gradium.ai`).
@@ -750,8 +862,9 @@ Slash commands write local overrides to `prefsPath` (default:

 Stored fields:

- `enabled`
+- `auto`
 - `provider`
+- `persona`
 - `maxLength` (summary threshold; default 1500 chars)
 - `summarize` (default `true`)

@@ -837,6 +950,7 @@ Discord note: `/tts` is a built-in Discord command, so OpenClaw registers
 /tts chat default
 /tts latest
 /tts provider openai
+/tts persona alfred
 /tts limit 2000
 /tts summary off
 /tts audio Hello from OpenClaw
@@ -850,6 +964,7 @@ Notes:
 - `/tts on` writes the local TTS preference to `always`; `/tts off` writes it to `off`.
 - `/tts chat on|off|default` writes a session-scoped auto-TTS override for the current chat.
 - Use config when you want `inbound` or `tagged` defaults.
+- `/tts persona <id>` writes the local persona preference; `/tts persona off` clears it.
 - `limit` and `summary` are stored in local prefs, not the main config.
 - `/tts audio` generates a one-off audio reply (does not toggle TTS on).
 - `/tts latest` reads the latest assistant reply from the current session transcript and sends it as audio once. It stores only a hash of that reply on the session entry to suppress duplicate voice sends.
@@ -883,6 +998,7 @@ Gateway methods:
 - `tts.disable`
 - `tts.convert`
 - `tts.setProvider`
+- `tts.setPersona`
 - `tts.providers`

 ## Related