mirror of
https://github.com/openclaw/openclaw.git
synced 2026-05-06 11:50:43 +00:00
TTS: add provider personas
This commit is contained in:
@@ -493,6 +493,110 @@ transcoded to raw 16 kHz mono PCM with `ffmpeg`. The legacy provider alias
|
||||
}
|
||||
```
|
||||
|
||||
### TTS personas
|
||||
|
||||
Use `messages.tts.personas` when you want a stable spoken identity that can be
|
||||
applied deterministically across providers. A persona can prefer one provider,
|
||||
define provider-neutral prompt intent, and carry provider-specific bindings for
|
||||
voices, models, prompt templates, seeds, and voice settings.
|
||||
|
||||
```json5
|
||||
{
|
||||
messages: {
|
||||
tts: {
|
||||
auto: "always",
|
||||
persona: "alfred",
|
||||
personas: {
|
||||
alfred: {
|
||||
label: "Alfred",
|
||||
description: "Dry, warm British butler narrator.",
|
||||
provider: "google",
|
||||
fallbackPolicy: "preserve-persona",
|
||||
prompt: {
|
||||
profile: "A brilliant British butler. Dry, witty, warm, charming, emotionally expressive, never generic.",
|
||||
scene: "A quiet late-night study. Close-mic narration for a trusted operator.",
|
||||
sampleContext: "The speaker is answering a private technical request with concise confidence and dry warmth.",
|
||||
style: "Refined, understated, lightly amused.",
|
||||
accent: "British English.",
|
||||
pacing: "Measured, with short dramatic pauses.",
|
||||
constraints: ["Do not read configuration values aloud.", "Do not explain the persona."],
|
||||
},
|
||||
providers: {
|
||||
google: {
|
||||
model: "gemini-3.1-flash-tts-preview",
|
||||
voiceName: "Algieba",
|
||||
promptTemplate: "audio-profile-v1",
|
||||
},
|
||||
openai: {
|
||||
model: "gpt-4o-mini-tts",
|
||||
voice: "cedar",
|
||||
},
|
||||
elevenlabs: {
|
||||
voiceId: "voice_id",
|
||||
modelId: "eleven_multilingual_v2",
|
||||
seed: 42,
|
||||
voiceSettings: {
|
||||
stability: 0.65,
|
||||
similarityBoost: 0.8,
|
||||
style: 0.25,
|
||||
useSpeakerBoost: true,
|
||||
speed: 0.95,
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
Resolution is deterministic:
|
||||
|
||||
1. `/tts persona <id>` local preference, if set.
|
||||
2. `messages.tts.persona`, if set.
|
||||
3. No persona.
|
||||
|
||||
Provider selection is explicit-first:
|
||||
|
||||
1. Direct provider overrides from CLI, gateway, Talk, or allowed TTS directives.
|
||||
2. `/tts provider <id>` local preference.
|
||||
3. Active persona `provider`.
|
||||
4. `messages.tts.provider`.
|
||||
5. Registry auto-select.
|
||||
|
||||
For each provider attempt, OpenClaw merges:
|
||||
|
||||
1. `messages.tts.providers.<id>`
|
||||
2. `messages.tts.personas.<persona>.providers.<id>`
|
||||
3. trusted request overrides
|
||||
4. allowed model-emitted TTS directive overrides
|
||||
|
||||
`fallbackPolicy` controls what happens when an active persona has no binding for
|
||||
an attempted provider:
|
||||
|
||||
- `preserve-persona` keeps provider-neutral persona prompt fields available to
|
||||
providers. This is the default.
|
||||
- `provider-defaults` omits the persona from provider prompt preparation for
|
||||
that attempt, so the provider uses its neutral defaults while still allowing
|
||||
fallback to continue.
|
||||
- `fail` skips that provider attempt with `reasonCode: "not_configured"` and
|
||||
`personaBinding: "missing"`. Fallback providers are still tried; the whole TTS
|
||||
request fails only if every attempted provider is skipped or fails.
|
||||
|
||||
Persona prompt fields are provider-neutral. Providers decide how to use them.
|
||||
Google wraps them only when the effective Google provider config sets
|
||||
`promptTemplate: "audio-profile-v1"` or `personaPrompt`; its older
|
||||
`audioProfile` and `speakerName` fields are still prepended as Google-specific
|
||||
prompt text. OpenAI maps prompt fields to `instructions` when no explicit
|
||||
OpenAI `instructions` value is configured. Providers without prompt-like
|
||||
controls use the provider-specific persona bindings only.
|
||||
|
||||
Gemini inline audio tags are transcript content, not persona config. If the
|
||||
assistant or an explicit `[[tts:text]]` block includes tags such as `[whispers]`
|
||||
or `[laughs]`, OpenClaw preserves them inside the Gemini transcript. OpenClaw
|
||||
does not generate configured start tags.
|
||||
|
||||
### Disable Microsoft speech
|
||||
|
||||
```json5
|
||||
@@ -565,6 +669,12 @@ Then run:
|
||||
- If `provider` is **unset**, OpenClaw uses the first configured speech provider in registry auto-select order.
|
||||
- Legacy `provider: "edge"` config is repaired by `openclaw doctor --fix` and
|
||||
rewritten to `provider: "microsoft"`.
|
||||
- `persona`: default TTS persona id from `personas`.
|
||||
- `personas.<id>`: stable spoken identity. The id is normalized to lowercase.
|
||||
- `personas.<id>.provider`: preferred speech provider for the persona. Explicit provider overrides and local provider prefs still win.
|
||||
- `personas.<id>.fallbackPolicy`: `preserve-persona` (default), `provider-defaults`, or `fail`; see [TTS personas](#tts-personas).
|
||||
- `personas.<id>.prompt`: provider-neutral persona prompt fields (`profile`, `scene`, `sampleContext`, `style`, `accent`, `pacing`, `constraints`).
|
||||
- `personas.<id>.providers.<provider>`: provider-specific persona binding merged over `providers.<provider>`.
|
||||
- `summaryModel`: optional cheap model for auto-summary; defaults to `agents.defaults.model.primary`.
|
||||
- Accepts `provider/model` or a configured model alias.
|
||||
- `modelOverrides`: allow the model to emit TTS directives (on by default).
|
||||
@@ -621,6 +731,8 @@ Then run:
|
||||
- `providers.google.voiceName`: Gemini prebuilt voice name (default `Kore`; `voice` is also accepted).
|
||||
- `providers.google.audioProfile`: natural-language style prompt prepended before the spoken text.
|
||||
- `providers.google.speakerName`: optional speaker label prepended before the spoken text when your TTS prompt uses a named speaker.
|
||||
- `providers.google.promptTemplate`: set to `audio-profile-v1` to wrap active persona prompt fields in a deterministic Gemini TTS prompt structure.
|
||||
- `providers.google.personaPrompt`: Google-specific extra persona prompt text appended to the template's Director's Notes.
|
||||
- `providers.google.baseUrl`: override the Gemini API base URL. Only `https://generativelanguage.googleapis.com` is accepted.
|
||||
- If `messages.tts.providers.google.apiKey` is omitted, TTS can reuse `models.providers.google.apiKey` before env fallback.
|
||||
- `providers.gradium.baseUrl`: override Gradium API base URL (default `https://api.gradium.ai`).
|
||||
@@ -750,8 +862,9 @@ Slash commands write local overrides to `prefsPath` (default:
|
||||
|
||||
Stored fields:
|
||||
|
||||
- `enabled`
|
||||
- `auto`
|
||||
- `provider`
|
||||
- `persona`
|
||||
- `maxLength` (summary threshold; default 1500 chars)
|
||||
- `summarize` (default `true`)
|
||||
|
||||
@@ -837,6 +950,7 @@ Discord note: `/tts` is a built-in Discord command, so OpenClaw registers
|
||||
/tts chat default
|
||||
/tts latest
|
||||
/tts provider openai
|
||||
/tts persona alfred
|
||||
/tts limit 2000
|
||||
/tts summary off
|
||||
/tts audio Hello from OpenClaw
|
||||
@@ -850,6 +964,7 @@ Notes:
|
||||
- `/tts on` writes the local TTS preference to `always`; `/tts off` writes it to `off`.
|
||||
- `/tts chat on|off|default` writes a session-scoped auto-TTS override for the current chat.
|
||||
- Use config when you want `inbound` or `tagged` defaults.
|
||||
- `/tts persona <id>` writes the local persona preference; `/tts persona off` clears it.
|
||||
- `limit` and `summary` are stored in local prefs, not the main config.
|
||||
- `/tts audio` generates a one-off audio reply (does not toggle TTS on).
|
||||
- `/tts latest` reads the latest assistant reply from the current session transcript and sends it as audio once. It stores only a hash of that reply on the session entry to suppress duplicate voice sends.
|
||||
@@ -883,6 +998,7 @@ Gateway methods:
|
||||
- `tts.disable`
|
||||
- `tts.convert`
|
||||
- `tts.setProvider`
|
||||
- `tts.setPersona`
|
||||
- `tts.providers`
|
||||
|
||||
## Related
|
||||
|
||||
Reference in New Issue
Block a user