TTS: add provider personas

Barron Roth
2026-04-23 07:26:32 -07:00
committed by Ayaan Zaidi
parent 80219ed1b3
commit 0594fa3c4d
39 changed files with 2021 additions and 136 deletions


@@ -1048,6 +1048,7 @@ Docs: https://docs.openclaw.ai
- Anthropic/models: add Claude Opus 4.7 `xhigh` reasoning effort support and keep it separate from adaptive thinking.
- Control UI/settings: overhaul the settings and slash-command experience with faster presets, quick-create flows, and refreshed command discovery. (#67819) Thanks @BunsDev.
- macOS/gateway: add `screen.snapshot` support for macOS app nodes, including runtime plumbing, default macOS allowlisting, and docs for monitor preview flows. (#67954) Thanks @BunsDev.
- TTS/personas: add provider-aware TTS personas with deterministic provider binding merges, `/tts persona` controls, gateway/CLI persona state, Google Gemini `audio-profile-v1` prompt wrapping, and OpenAI instruction mapping. (#68323)
### Fixes


@@ -493,6 +493,110 @@ transcoded to raw 16 kHz mono PCM with `ffmpeg`. The legacy provider alias
}
```
### TTS personas
Use `messages.tts.personas` when you want a stable spoken identity that can be
applied deterministically across providers. A persona can prefer one provider,
define provider-neutral prompt intent, and carry provider-specific bindings for
voices, models, prompt templates, seeds, and voice settings.
```json5
{
messages: {
tts: {
auto: "always",
persona: "alfred",
personas: {
alfred: {
label: "Alfred",
description: "Dry, warm British butler narrator.",
provider: "google",
fallbackPolicy: "preserve-persona",
prompt: {
profile: "A brilliant British butler. Dry, witty, warm, charming, emotionally expressive, never generic.",
scene: "A quiet late-night study. Close-mic narration for a trusted operator.",
sampleContext: "The speaker is answering a private technical request with concise confidence and dry warmth.",
style: "Refined, understated, lightly amused.",
accent: "British English.",
pacing: "Measured, with short dramatic pauses.",
constraints: ["Do not read configuration values aloud.", "Do not explain the persona."],
},
providers: {
google: {
model: "gemini-3.1-flash-tts-preview",
voiceName: "Algieba",
promptTemplate: "audio-profile-v1",
},
openai: {
model: "gpt-4o-mini-tts",
voice: "cedar",
},
elevenlabs: {
voiceId: "voice_id",
modelId: "eleven_multilingual_v2",
seed: 42,
voiceSettings: {
stability: 0.65,
similarityBoost: 0.8,
style: 0.25,
useSpeakerBoost: true,
speed: 0.95,
},
},
},
},
},
},
},
}
```
Resolution is deterministic:
1. `/tts persona <id>` local preference, if set.
2. `messages.tts.persona`, if set.
3. No persona.
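The persona resolution order above can be sketched as a small lookup. This is a minimal illustration, not OpenClaw's implementation: the `LocalTtsPrefs` and `TtsConfig` shapes and both function names are hypothetical, and the lowercase normalization follows the `personas.<id>` note in the config reference.

```typescript
// Hypothetical shapes for illustration; the real OpenClaw types may differ.
type LocalTtsPrefs = { persona?: string }; // written by `/tts persona <id>`
type TtsConfig = { persona?: string }; // `messages.tts.persona`

// Persona ids are documented as normalized to lowercase.
function normalizePersonaId(id: string | undefined): string | undefined {
  const normalized = id?.trim().toLowerCase();
  return normalized ? normalized : undefined;
}

// 1. local preference, 2. config default, 3. no persona (undefined).
function resolveActivePersonaId(
  prefs: LocalTtsPrefs,
  config: TtsConfig,
): string | undefined {
  return normalizePersonaId(prefs.persona) ?? normalizePersonaId(config.persona);
}
```

With this shape, clearing the local preference (as `/tts persona off` does) simply lets the config default apply again.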
Provider selection is explicit-first:
1. Direct provider overrides from CLI, gateway, Talk, or allowed TTS directives.
2. `/tts provider <id>` local preference.
3. Active persona `provider`.
4. `messages.tts.provider`.
5. Registry auto-select.
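As a rough sketch of the explicit-first order above, provider selection reduces to a chain of fallbacks. The `ProviderSources` type, its field names, and `selectProvider` are assumptions made for this example and are not taken from the OpenClaw codebase.

```typescript
// Hypothetical input shape; each field corresponds to one step in the list above.
type ProviderSources = {
  directOverride?: string; // CLI, gateway, Talk, or an allowed TTS directive
  localPreference?: string; // `/tts provider <id>`
  personaProvider?: string; // active persona `provider`
  configProvider?: string; // `messages.tts.provider`
};

// Returns the first defined source, falling back to registry auto-select.
function selectProvider(
  sources: ProviderSources,
  registryAutoSelect: () => string | undefined,
): string | undefined {
  return (
    sources.directOverride ??
    sources.localPreference ??
    sources.personaProvider ??
    sources.configProvider ??
    registryAutoSelect()
  );
}
```

Note that a persona's preferred provider sits below both explicit overrides and the local `/tts provider` preference, which matches the `personas.<id>.provider` description in the config reference.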
For each provider attempt, OpenClaw merges:
1. `messages.tts.providers.<id>`
2. `messages.tts.personas.<persona>.providers.<id>`
3. trusted request overrides
4. allowed model-emitted TTS directive overrides
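The layered merge above can be pictured as "later layers win". The sketch below assumes a shallow, key-by-key merge that ignores `undefined` values; whether OpenClaw merges nested objects such as `voiceSettings` more deeply is not stated here, and `mergeProviderLayers` is a hypothetical helper.

```typescript
type ProviderSettings = Record<string, unknown>;

// Shallow merge in list order: later layers override earlier ones.
// Assumption: undefined values never erase an earlier layer's value.
function mergeProviderLayers(
  ...layers: Array<ProviderSettings | undefined>
): ProviderSettings {
  const merged: ProviderSettings = {};
  for (const layer of layers) {
    if (!layer) continue;
    for (const [key, value] of Object.entries(layer)) {
      if (value !== undefined) merged[key] = value;
    }
  }
  return merged;
}

const effectiveGoogleConfig = mergeProviderLayers(
  { model: "gemini-3.1-flash-tts-preview", voiceName: "Kore" }, // messages.tts.providers.google
  { voiceName: "Algieba", promptTemplate: "audio-profile-v1" }, // persona binding
  undefined, // trusted request overrides (none in this example)
  undefined, // allowed model-emitted directive overrides (none in this example)
);
```

In this example the persona binding replaces the base voice while the base model is kept.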
`fallbackPolicy` controls what happens when an active persona has no binding for
an attempted provider:
- `preserve-persona` keeps provider-neutral persona prompt fields available to
providers. This is the default.
- `provider-defaults` omits the persona from provider prompt preparation for
that attempt, so the provider uses its neutral defaults while still allowing
fallback to continue.
- `fail` skips that provider attempt with `reasonCode: "not_configured"` and
`personaBinding: "missing"`. Fallback providers are still tried; the whole TTS
request fails only if every attempted provider is skipped or fails.
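One way to read the three policies is as a per-attempt decision. The sketch below is illustrative only: `AttemptPlan` and `planProviderAttempt` are hypothetical names, while the `reasonCode` and `personaBinding` values are the ones documented above.

```typescript
type FallbackPolicy = "preserve-persona" | "provider-defaults" | "fail";

// Hypothetical result shape for a single provider attempt.
type AttemptPlan =
  | { action: "attempt"; usePersonaPrompt: boolean }
  | { action: "skip"; reasonCode: "not_configured"; personaBinding: "missing" };

// Decides how one provider attempt treats the active persona.
function planProviderAttempt(
  hasBinding: boolean,
  policy: FallbackPolicy = "preserve-persona",
): AttemptPlan {
  // The policy only matters when the persona has no binding for this provider.
  if (hasBinding) return { action: "attempt", usePersonaPrompt: true };
  switch (policy) {
    case "preserve-persona":
      return { action: "attempt", usePersonaPrompt: true };
    case "provider-defaults":
      return { action: "attempt", usePersonaPrompt: false };
    case "fail":
      return { action: "skip", reasonCode: "not_configured", personaBinding: "missing" };
  }
}
```

A skipped attempt does not end the request by itself; the caller would move on to the next fallback provider.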
Persona prompt fields are provider-neutral. Providers decide how to use them.
Google wraps them only when the effective Google provider config sets
`promptTemplate: "audio-profile-v1"` or `personaPrompt`; its older
`audioProfile` and `speakerName` fields are still prepended as Google-specific
prompt text. OpenAI maps prompt fields to `instructions` when no explicit
OpenAI `instructions` value is configured. Providers without prompt-like
controls use the provider-specific persona bindings only.
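The OpenAI rule above is a precedence rule: an explicit `instructions` value wins, and persona prompt fields are used only when none is configured. The sketch below shows that precedence only. The `resolveOpenAiInstructions` helper and the line format it emits (`Style: ...`, `Accent: ...`) are invented for illustration and may not match how OpenClaw actually renders the fields.

```typescript
// Subset of the provider-neutral persona prompt fields used in this sketch.
type PersonaPrompt = {
  profile?: string;
  style?: string;
  accent?: string;
  pacing?: string;
  constraints?: string[];
};

// Explicit instructions win; otherwise derive text from the persona prompt.
function resolveOpenAiInstructions(
  explicitInstructions: string | undefined,
  prompt: PersonaPrompt | undefined,
): string | undefined {
  const explicit = explicitInstructions?.trim();
  if (explicit) return explicit;
  if (!prompt) return undefined;

  // Assumed rendering: one labeled line per populated field.
  const lines: string[] = [];
  if (prompt.profile) lines.push(prompt.profile);
  if (prompt.style) lines.push(`Style: ${prompt.style}`);
  if (prompt.accent) lines.push(`Accent: ${prompt.accent}`);
  if (prompt.pacing) lines.push(`Pacing: ${prompt.pacing}`);
  for (const constraint of prompt.constraints ?? []) {
    lines.push(`Constraint: ${constraint}`);
  }
  return lines.length > 0 ? lines.join("\n") : undefined;
}
```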
Gemini inline audio tags are transcript content, not persona config. If the
assistant or an explicit `[[tts:text]]` block includes tags such as `[whispers]`
or `[laughs]`, OpenClaw preserves them inside the Gemini transcript. OpenClaw
does not generate configured start tags.
### Disable Microsoft speech
```json5
@@ -565,6 +669,12 @@ Then run:
- If `provider` is **unset**, OpenClaw uses the first configured speech provider in registry auto-select order.
- Legacy `provider: "edge"` config is repaired by `openclaw doctor --fix` and
rewritten to `provider: "microsoft"`.
- `persona`: default TTS persona id from `personas`.
- `personas.<id>`: stable spoken identity. The id is normalized to lowercase.
- `personas.<id>.provider`: preferred speech provider for the persona. Explicit provider overrides and local provider prefs still win.
- `personas.<id>.fallbackPolicy`: `preserve-persona` (default), `provider-defaults`, or `fail`; see [TTS personas](#tts-personas).
- `personas.<id>.prompt`: provider-neutral persona prompt fields (`profile`, `scene`, `sampleContext`, `style`, `accent`, `pacing`, `constraints`).
- `personas.<id>.providers.<provider>`: provider-specific persona binding merged over `providers.<provider>`.
- `summaryModel`: optional cheap model for auto-summary; defaults to `agents.defaults.model.primary`.
- Accepts `provider/model` or a configured model alias.
- `modelOverrides`: allow the model to emit TTS directives (on by default).
@@ -621,6 +731,8 @@ Then run:
- `providers.google.voiceName`: Gemini prebuilt voice name (default `Kore`; `voice` is also accepted).
- `providers.google.audioProfile`: natural-language style prompt prepended before the spoken text.
- `providers.google.speakerName`: optional speaker label prepended before the spoken text when your TTS prompt uses a named speaker.
- `providers.google.promptTemplate`: set to `audio-profile-v1` to wrap active persona prompt fields in a deterministic Gemini TTS prompt structure.
- `providers.google.personaPrompt`: Google-specific extra persona prompt text appended to the template's Director's Notes.
- `providers.google.baseUrl`: override the Gemini API base URL. Only `https://generativelanguage.googleapis.com` is accepted.
- If `messages.tts.providers.google.apiKey` is omitted, TTS can reuse `models.providers.google.apiKey` before env fallback.
- `providers.gradium.baseUrl`: override Gradium API base URL (default `https://api.gradium.ai`).
@@ -750,8 +862,9 @@ Slash commands write local overrides to `prefsPath` (default:
Stored fields:
- `enabled` - `auto`
- `provider`
- `persona`
- `maxLength` (summary threshold; default 1500 chars)
- `summarize` (default `true`)
@@ -837,6 +950,7 @@ Discord note: `/tts` is a built-in Discord command, so OpenClaw registers
/tts chat default
/tts latest
/tts provider openai
/tts persona alfred
/tts limit 2000
/tts summary off
/tts audio Hello from OpenClaw
@@ -850,6 +964,7 @@ Notes:
- `/tts on` writes the local TTS preference to `always`; `/tts off` writes it to `off`.
- `/tts chat on|off|default` writes a session-scoped auto-TTS override for the current chat.
- Use config when you want `inbound` or `tagged` defaults.
- `/tts persona <id>` writes the local persona preference; `/tts persona off` clears it.
- `limit` and `summary` are stored in local prefs, not the main config.
- `/tts audio` generates a one-off audio reply (does not toggle TTS on).
- `/tts latest` reads the latest assistant reply from the current session transcript and sends it as audio once. It stores only a hash of that reply on the session entry to suppress duplicate voice sends.
@@ -883,6 +998,7 @@ Gateway methods:
- `tts.disable`
- `tts.convert`
- `tts.setProvider`
- `tts.setPersona`
- `tts.providers`
## Related


@@ -1,5 +1,8 @@
import * as providerHttp from "openclaw/plugin-sdk/provider-http"; import { afterEach, beforeAll, describe, expect, it, vi } from "vitest";
import { afterEach, describe, expect, it, vi } from "vitest"; import {
getProviderHttpMocks,
installProviderHttpMockCleanup,
} from "../../test/helpers/media-generation/provider-http-mocks.js";
const transcodeAudioBufferToOpusMock = vi.hoisted(() => vi.fn()); const transcodeAudioBufferToOpusMock = vi.hoisted(() => vi.fn());
@@ -7,10 +10,23 @@ vi.mock("openclaw/plugin-sdk/media-runtime", () => ({
transcodeAudioBufferToOpus: transcodeAudioBufferToOpusMock, transcodeAudioBufferToOpus: transcodeAudioBufferToOpusMock,
})); }));
import { buildGoogleSpeechProvider, __testing } from "./speech-provider.js"; const {
assertOkOrThrowProviderErrorMock,
postJsonRequestMock,
resolveProviderHttpRequestConfigMock,
} = getProviderHttpMocks();
function installGoogleTtsFetchMock(pcm = Buffer.from([1, 0, 2, 0])) { let buildGoogleSpeechProvider: typeof import("./speech-provider.js").buildGoogleSpeechProvider;
const fetchMock = vi.fn().mockResolvedValue({ let __testing: typeof import("./speech-provider.js").__testing;
beforeAll(async () => {
({ buildGoogleSpeechProvider, __testing } = await import("./speech-provider.js"));
});
installProviderHttpMockCleanup();
function googleTtsResponse(pcm = Buffer.from([1, 0, 2, 0])) {
return {
ok: true, ok: true,
json: async () => ({ json: async () => ({
candidates: [ candidates: [
@@ -28,21 +44,26 @@ function installGoogleTtsFetchMock(pcm = Buffer.from([1, 0, 2, 0])) {
}, },
], ],
}), }),
};
}
function installGoogleTtsRequestMock(pcm = Buffer.from([1, 0, 2, 0])) {
postJsonRequestMock.mockResolvedValue({
response: googleTtsResponse(pcm),
release: vi.fn(async () => {}),
}); });
vi.stubGlobal("fetch", fetchMock); return postJsonRequestMock;
return fetchMock;
} }
describe("Google speech provider", () => { describe("Google speech provider", () => {
afterEach(() => { afterEach(() => {
vi.restoreAllMocks();
vi.unstubAllGlobals(); vi.unstubAllGlobals();
vi.unstubAllEnvs(); vi.unstubAllEnvs();
transcodeAudioBufferToOpusMock.mockReset(); transcodeAudioBufferToOpusMock.mockReset();
}); });
it("synthesizes Gemini PCM as WAV and preserves audio tags in the request text", async () => { it("synthesizes Gemini PCM as WAV and preserves audio tags in the request text", async () => {
const fetchMock = installGoogleTtsFetchMock(); const requestMock = installGoogleTtsRequestMock();
const provider = buildGoogleSpeechProvider(); const provider = buildGoogleSpeechProvider();
const result = await provider.synthesize({ const result = await provider.synthesize({
@@ -57,11 +78,10 @@ describe("Google speech provider", () => {
timeoutMs: 12_345, timeoutMs: 12_345,
}); });
expect(fetchMock).toHaveBeenCalledWith( expect(requestMock).toHaveBeenCalledWith(
"https://generativelanguage.googleapis.com/v1beta/models/gemini-3.1-flash-tts-preview:generateContent",
expect.objectContaining({ expect.objectContaining({
method: "POST", url: "https://generativelanguage.googleapis.com/v1beta/models/gemini-3.1-flash-tts-preview:generateContent",
body: JSON.stringify({ body: {
contents: [ contents: [
{ {
role: "user", role: "user",
@@ -78,11 +98,14 @@ describe("Google speech provider", () => {
}, },
}, },
}, },
}), },
fetchFn: fetch,
pinDns: false,
timeoutMs: 12_345,
}), }),
); );
const [, init] = fetchMock.mock.calls[0]; const request = requestMock.mock.calls[0]?.[0] as { headers?: HeadersInit };
expect(new Headers(init.headers).get("x-goog-api-key")).toBe("google-test-key"); expect(new Headers(request.headers).get("x-goog-api-key")).toBe("google-test-key");
expect(result.outputFormat).toBe("wav"); expect(result.outputFormat).toBe("wav");
expect(result.fileExtension).toBe(".wav"); expect(result.fileExtension).toBe(".wav");
expect(result.voiceCompatible).toBe(false); expect(result.voiceCompatible).toBe(false);
@@ -94,7 +117,7 @@ describe("Google speech provider", () => {
}); });
it("transcodes Gemini PCM to Opus for voice-note targets", async () => { it("transcodes Gemini PCM to Opus for voice-note targets", async () => {
installGoogleTtsFetchMock(Buffer.from([5, 0, 6, 0])); installGoogleTtsRequestMock(Buffer.from([5, 0, 6, 0]));
transcodeAudioBufferToOpusMock.mockResolvedValueOnce(Buffer.from("google-opus")); transcodeAudioBufferToOpusMock.mockResolvedValueOnce(Buffer.from("google-opus"));
const provider = buildGoogleSpeechProvider(); const provider = buildGoogleSpeechProvider();
@@ -125,9 +148,138 @@ describe("Google speech provider", () => {
expect(audioBuffer.subarray(8, 12).toString("ascii")).toBe("WAVE"); expect(audioBuffer.subarray(8, 12).toString("ascii")).toBe("WAVE");
}); });
it("advertises all documented Gemini TTS-capable models", () => {
const provider = buildGoogleSpeechProvider();
expect(provider.models).toEqual(__testing.GOOGLE_TTS_MODELS);
});
it("renders deterministic audio-profile-v1 prompts without generating tags", async () => {
const provider = buildGoogleSpeechProvider();
const prepared = await provider.prepareSynthesis?.({
text: "[whispers] The door is open.",
cfg: {},
providerConfig: {
promptTemplate: "audio-profile-v1",
personaPrompt: "Keep a close-mic feel.",
},
persona: {
id: "alfred",
label: "Alfred",
prompt: {
profile: "A brilliant British butler.",
scene: "A quiet late-night study.",
sampleContext: "The speaker is answering a trusted operator.",
style: "Refined and lightly amused.",
accent: "British English.",
pacing: "Measured.",
constraints: ["Do not read configuration values aloud."],
},
},
target: "audio-file",
timeoutMs: 1_000,
});
expect(prepared?.text).toBe(
[
"Synthesize speech from the TRANSCRIPT section only. Use the other sections only",
"as performance direction. Do not read section titles, notes, labels, or",
"configuration aloud.",
"",
"# AUDIO PROFILE: Alfred",
"A brilliant British butler.",
"",
"## THE SCENE",
"A quiet late-night study.",
"",
"### DIRECTOR'S NOTES",
"Style: Refined and lightly amused.",
"Accent: British English.",
"Pacing: Measured.",
"Constraints:",
"- Do not read configuration values aloud.",
"Provider notes:",
"Keep a close-mic feel.",
"",
"### SAMPLE CONTEXT",
"The speaker is answering a trusted operator.",
"",
"### TRANSCRIPT",
"[whispers] The door is open.",
].join("\n"),
);
});
it("does not wrap an OpenClaw audio-profile-v1 prompt twice", async () => {
const provider = buildGoogleSpeechProvider();
const text = [
"Synthesize speech from the TRANSCRIPT section only. Use the other sections only",
"as performance direction. Do not read section titles, notes, labels, or",
"configuration aloud.",
"",
"# AUDIO PROFILE: Alfred",
"A brilliant British butler.",
"",
"### TRANSCRIPT",
"Hello.",
].join("\n");
const prepared = await provider.prepareSynthesis?.({
text,
cfg: {},
providerConfig: {
promptTemplate: "audio-profile-v1",
},
persona: {
id: "alfred",
label: "Alfred",
prompt: {
profile: "A brilliant British butler.",
},
},
target: "audio-file",
timeoutMs: 1_000,
});
expect(prepared).toBeUndefined();
});
it("retries once when Gemini returns no audio payload", async () => {
const pcm = Buffer.from([5, 0, 6, 0]);
const requestSequence = vi
.fn()
.mockResolvedValueOnce({
response: {
ok: true,
json: async () => ({ candidates: [{ content: { parts: [{ text: "not audio" }] } }] }),
},
release: vi.fn(async () => {}),
})
.mockResolvedValueOnce({
response: googleTtsResponse(pcm),
release: vi.fn(async () => {}),
});
postJsonRequestMock.mockImplementation(requestSequence);
const provider = buildGoogleSpeechProvider();
const result = await provider.synthesize({
text: "Retry this.",
cfg: {},
providerConfig: {
apiKey: "google-test-key",
},
target: "audio-file",
timeoutMs: 5_000,
});
expect(requestSequence).toHaveBeenCalledTimes(2);
expect(result.audioBuffer.subarray(44)).toEqual(pcm);
});
it("falls back to GEMINI_API_KEY and configured Google API base URL", async () => { it("falls back to GEMINI_API_KEY and configured Google API base URL", async () => {
vi.stubEnv("GEMINI_API_KEY", "env-google-key"); vi.stubEnv("GEMINI_API_KEY", "env-google-key");
const fetchMock = installGoogleTtsFetchMock(); const requestMock = installGoogleTtsRequestMock();
const provider = buildGoogleSpeechProvider(); const provider = buildGoogleSpeechProvider();
expect(provider.isConfigured({ providerConfig: {}, timeoutMs: 1 })).toBe(true); expect(provider.isConfigured({ providerConfig: {}, timeoutMs: 1 })).toBe(true);
@@ -149,16 +301,17 @@ describe("Google speech provider", () => {
timeoutMs: 10_000, timeoutMs: 10_000,
}); });
expect(fetchMock).toHaveBeenCalledWith( expect(requestMock).toHaveBeenCalledWith(
"https://generativelanguage.googleapis.com/v1beta/models/gemini-3.1-flash-tts-preview:generateContent", expect.objectContaining({
expect.any(Object), url: "https://generativelanguage.googleapis.com/v1beta/models/gemini-3.1-flash-tts-preview:generateContent",
}),
); );
const [, init] = fetchMock.mock.calls[0]; const request = requestMock.mock.calls[0]?.[0] as { headers?: HeadersInit };
expect(new Headers(init.headers).get("x-goog-api-key")).toBe("env-google-key"); expect(new Headers(request.headers).get("x-goog-api-key")).toBe("env-google-key");
}); });
it("can reuse a configured Google model-provider API key without auth profiles", async () => { it("can reuse a configured Google model-provider API key without auth profiles", async () => {
const fetchMock = installGoogleTtsFetchMock(); const requestMock = installGoogleTtsRequestMock();
const provider = buildGoogleSpeechProvider(); const provider = buildGoogleSpeechProvider();
const cfg = { const cfg = {
models: { models: {
@@ -182,13 +335,13 @@ describe("Google speech provider", () => {
timeoutMs: 10_000, timeoutMs: 10_000,
}); });
const [, init] = fetchMock.mock.calls[0]; const request = requestMock.mock.calls[0]?.[0] as { headers?: HeadersInit };
expect(new Headers(init.headers).get("x-goog-api-key")).toBe("model-provider-google-key"); expect(new Headers(request.headers).get("x-goog-api-key")).toBe("model-provider-google-key");
}); });
it("returns Gemini PCM directly for telephony synthesis", async () => { it("returns Gemini PCM directly for telephony synthesis", async () => {
const pcm = Buffer.from([3, 0, 4, 0]); const pcm = Buffer.from([3, 0, 4, 0]);
installGoogleTtsFetchMock(pcm); installGoogleTtsRequestMock(pcm);
const provider = buildGoogleSpeechProvider(); const provider = buildGoogleSpeechProvider();
const result = await provider.synthesizeTelephony?.({ const result = await provider.synthesizeTelephony?.({
@@ -209,7 +362,7 @@ describe("Google speech provider", () => {
}); });
it("prepends configured Gemini TTS profile text", async () => { it("prepends configured Gemini TTS profile text", async () => {
const fetchMock = installGoogleTtsFetchMock(); const requestMock = installGoogleTtsRequestMock();
const provider = buildGoogleSpeechProvider(); const provider = buildGoogleSpeechProvider();
await provider.synthesize({ await provider.synthesize({
@@ -224,8 +377,7 @@ describe("Google speech provider", () => {
timeoutMs: 10_000, timeoutMs: 10_000,
}); });
const [, init] = fetchMock.mock.calls[0]; expect(requestMock.mock.calls[0]?.[0].body).toMatchObject({
expect(JSON.parse(String(init.body))).toMatchObject({
contents: [ contents: [
{ {
parts: [ parts: [
@@ -326,23 +478,26 @@ describe("Google speech provider", () => {
}); });
it("formats Google TTS HTTP errors with provider details", async () => { it("formats Google TTS HTTP errors with provider details", async () => {
vi.stubGlobal( assertOkOrThrowProviderErrorMock.mockRejectedValue(
"fetch", new Error(
vi.fn().mockResolvedValue( "Google TTS failed (429): Quota exceeded [code=RESOURCE_EXHAUSTED] [request_id=google_req_123]",
new Response(
JSON.stringify({
error: {
message: "Quota exceeded",
status: "RESOURCE_EXHAUSTED",
},
}),
{
status: 429,
headers: { "x-request-id": "google_req_123" },
},
),
), ),
); );
postJsonRequestMock.mockResolvedValue({
response: new Response(
JSON.stringify({
error: {
message: "Quota exceeded",
status: "RESOURCE_EXHAUSTED",
},
}),
{
status: 429,
headers: { "x-request-id": "google_req_123" },
},
),
release: vi.fn(async () => {}),
});
const provider = buildGoogleSpeechProvider(); const provider = buildGoogleSpeechProvider();
await expect( await expect(
@@ -359,8 +514,7 @@ describe("Google speech provider", () => {
}); });
it("honors configured private-network opt-in for Google TTS", async () => { it("honors configured private-network opt-in for Google TTS", async () => {
installGoogleTtsFetchMock(); installGoogleTtsRequestMock();
const postJsonRequestSpy = vi.spyOn(providerHttp, "postJsonRequest");
const provider = buildGoogleSpeechProvider(); const provider = buildGoogleSpeechProvider();
await provider.synthesize({ await provider.synthesize({
@@ -381,14 +535,16 @@ describe("Google speech provider", () => {
timeoutMs: 12_345, timeoutMs: 12_345,
}); });
expect(postJsonRequestSpy).toHaveBeenCalledWith( expect(resolveProviderHttpRequestConfigMock).toHaveBeenCalledWith(
expect.objectContaining({ allowPrivateNetwork: true }), expect.objectContaining({
allowPrivateNetwork: true,
request: expect.objectContaining({ allowPrivateNetwork: true }),
}),
); );
}); });
it("honors configured private-network opt-in for Google telephony TTS", async () => { it("honors configured private-network opt-in for Google telephony TTS", async () => {
installGoogleTtsFetchMock(); installGoogleTtsRequestMock();
const postJsonRequestSpy = vi.spyOn(providerHttp, "postJsonRequest");
const provider = buildGoogleSpeechProvider(); const provider = buildGoogleSpeechProvider();
await provider.synthesizeTelephony?.({ await provider.synthesizeTelephony?.({
@@ -408,8 +564,11 @@ describe("Google speech provider", () => {
timeoutMs: 12_345, timeoutMs: 12_345,
}); });
expect(postJsonRequestSpy).toHaveBeenCalledWith( expect(resolveProviderHttpRequestConfigMock).toHaveBeenCalledWith(
expect.objectContaining({ allowPrivateNetwork: true }), expect.objectContaining({
allowPrivateNetwork: true,
request: expect.objectContaining({ allowPrivateNetwork: true }),
}),
); );
}); });
}); });


@@ -21,6 +21,13 @@ const DEFAULT_GOOGLE_TTS_VOICE = "Kore";
const GOOGLE_TTS_SAMPLE_RATE = 24_000; const GOOGLE_TTS_SAMPLE_RATE = 24_000;
const GOOGLE_TTS_CHANNELS = 1; const GOOGLE_TTS_CHANNELS = 1;
const GOOGLE_TTS_BITS_PER_SAMPLE = 16; const GOOGLE_TTS_BITS_PER_SAMPLE = 16;
const GOOGLE_AUDIO_PROFILE_PROMPT_TEMPLATE = "audio-profile-v1";
const GOOGLE_TTS_MODELS = [
"gemini-3.1-flash-tts-preview",
"gemini-2.5-flash-preview-tts",
"gemini-2.5-pro-preview-tts",
] as const;
const GOOGLE_TTS_VOICES = [ const GOOGLE_TTS_VOICES = [
"Zephyr", "Zephyr",
@@ -62,6 +69,8 @@ type GoogleTtsProviderConfig = {
voiceName: string; voiceName: string;
audioProfile?: string; audioProfile?: string;
speakerName?: string; speakerName?: string;
promptTemplate?: typeof GOOGLE_AUDIO_PROFILE_PROMPT_TEMPLATE;
personaPrompt?: string;
}; };
type GoogleTtsProviderOverrides = { type GoogleTtsProviderOverrides = {
@@ -91,6 +100,13 @@ type GoogleGenerateSpeechResponse = {
}>; }>;
}; };
class GoogleTtsRetryableError extends Error {
constructor(message: string) {
super(message);
this.name = "GoogleTtsRetryableError";
}
}
function normalizeGoogleTtsModel(model: unknown): string { function normalizeGoogleTtsModel(model: unknown): string {
const trimmed = normalizeOptionalString(model); const trimmed = normalizeOptionalString(model);
if (!trimmed) { if (!trimmed) {
@@ -104,6 +120,19 @@ function normalizeGoogleTtsVoiceName(voiceName: unknown): string {
return normalizeOptionalString(voiceName) ?? DEFAULT_GOOGLE_TTS_VOICE; return normalizeOptionalString(voiceName) ?? DEFAULT_GOOGLE_TTS_VOICE;
} }
function normalizeGooglePromptTemplate(
value: unknown,
): typeof GOOGLE_AUDIO_PROFILE_PROMPT_TEMPLATE | undefined {
const trimmed = normalizeOptionalString(value);
if (!trimmed) {
return undefined;
}
if (trimmed === GOOGLE_AUDIO_PROFILE_PROMPT_TEMPLATE) {
return trimmed;
}
throw new Error(`Invalid Google TTS promptTemplate: ${trimmed}`);
}
function resolveGoogleTtsEnvApiKey(): string | undefined { function resolveGoogleTtsEnvApiKey(): string | undefined {
return ( return (
normalizeOptionalString(process.env.GEMINI_API_KEY) ?? normalizeOptionalString(process.env.GEMINI_API_KEY) ??
@@ -149,6 +178,8 @@ function normalizeGoogleTtsProviderConfig(
rawConfig: Record<string, unknown>, rawConfig: Record<string, unknown>,
): GoogleTtsProviderConfig { ): GoogleTtsProviderConfig {
const raw = resolveGoogleTtsConfigRecord(rawConfig); const raw = resolveGoogleTtsConfigRecord(rawConfig);
const promptTemplate = normalizeGooglePromptTemplate(raw?.promptTemplate);
const personaPrompt = trimToUndefined(raw?.personaPrompt);
return { return {
apiKey: normalizeResolvedSecretInputString({ apiKey: normalizeResolvedSecretInputString({
value: raw?.apiKey, value: raw?.apiKey,
@@ -159,11 +190,16 @@ function normalizeGoogleTtsProviderConfig(
voiceName: normalizeGoogleTtsVoiceName(raw?.voiceName ?? raw?.voice), voiceName: normalizeGoogleTtsVoiceName(raw?.voiceName ?? raw?.voice),
audioProfile: trimToUndefined(raw?.audioProfile), audioProfile: trimToUndefined(raw?.audioProfile),
speakerName: trimToUndefined(raw?.speakerName), speakerName: trimToUndefined(raw?.speakerName),
...(promptTemplate ? { promptTemplate } : {}),
...(personaPrompt ? { personaPrompt } : {}),
}; };
} }
function readGoogleTtsProviderConfig(config: SpeechProviderConfig): GoogleTtsProviderConfig { function readGoogleTtsProviderConfig(config: SpeechProviderConfig): GoogleTtsProviderConfig {
const normalized = normalizeGoogleTtsProviderConfig({}); const normalized = normalizeGoogleTtsProviderConfig({});
const promptTemplate =
normalizeGooglePromptTemplate(config.promptTemplate) ?? normalized.promptTemplate;
const personaPrompt = trimToUndefined(config.personaPrompt) ?? normalized.personaPrompt;
return { return {
apiKey: trimToUndefined(config.apiKey) ?? normalized.apiKey, apiKey: trimToUndefined(config.apiKey) ?? normalized.apiKey,
baseUrl: trimToUndefined(config.baseUrl) ?? normalized.baseUrl, baseUrl: trimToUndefined(config.baseUrl) ?? normalized.baseUrl,
@@ -173,6 +209,8 @@ function readGoogleTtsProviderConfig(config: SpeechProviderConfig): GoogleTtsPro
), ),
audioProfile: trimToUndefined(config.audioProfile) ?? normalized.audioProfile, audioProfile: trimToUndefined(config.audioProfile) ?? normalized.audioProfile,
speakerName: trimToUndefined(config.speakerName) ?? normalized.speakerName, speakerName: trimToUndefined(config.speakerName) ?? normalized.speakerName,
...(promptTemplate ? { promptTemplate } : {}),
...(personaPrompt ? { personaPrompt } : {}),
}; };
} }
@@ -243,6 +281,116 @@ function extractGoogleSpeechPcm(payload: GoogleGenerateSpeechResponse): Buffer {
throw new Error("Google TTS response missing audio data"); throw new Error("Google TTS response missing audio data");
} }
function normalizePromptSectionText(value: string | undefined): string | undefined {
const trimmed = trimToUndefined(value?.replace(/\r\n?/g, "\n"));
if (!trimmed) {
return undefined;
}
let sanitized = "";
for (const char of trimmed) {
const code = char.charCodeAt(0);
if (
(code >= 0 && code <= 8) ||
code === 11 ||
code === 12 ||
(code >= 14 && code <= 31) ||
code === 127
) {
continue;
}
sanitized += char;
}
return sanitized;
}
function normalizePromptList(values: readonly string[] | undefined): string[] {
return (values ?? [])
.map((value) => normalizePromptSectionText(value))
.filter((value): value is string => Boolean(value));
}
function isOpenClawGoogleAudioProfilePrompt(text: string): boolean {
return (
text.includes("# AUDIO PROFILE:") &&
text.includes("### TRANSCRIPT") &&
text.startsWith("Synthesize speech from the TRANSCRIPT section only.")
);
}
function renderGoogleAudioProfilePrompt(params: {
text: string;
persona?: {
id: string;
label?: string;
prompt?: {
profile?: string;
scene?: string;
sampleContext?: string;
style?: string;
accent?: string;
pacing?: string;
constraints?: string[];
};
};
personaPrompt?: string;
}): string {
const transcript = params.text.replace(/\r\n?/g, "\n").trim();
const prompt = params.persona?.prompt;
const profile = normalizePromptSectionText(prompt?.profile);
const scene = normalizePromptSectionText(prompt?.scene);
const sampleContext = normalizePromptSectionText(prompt?.sampleContext);
const style = normalizePromptSectionText(prompt?.style);
const accent = normalizePromptSectionText(prompt?.accent);
const pacing = normalizePromptSectionText(prompt?.pacing);
const constraints = normalizePromptList(prompt?.constraints);
const personaPrompt = normalizePromptSectionText(params.personaPrompt);
const label =
normalizePromptSectionText(params.persona?.label) ??
normalizePromptSectionText(params.persona?.id);
const sections = [
[
"Synthesize speech from the TRANSCRIPT section only. Use the other sections only",
"as performance direction. Do not read section titles, notes, labels, or",
"configuration aloud.",
].join("\n"),
];
if (label || profile) {
sections.push([`# AUDIO PROFILE: ${label ?? "voice"}`, profile].filter(Boolean).join("\n"));
}
if (scene) {
sections.push(["## THE SCENE", scene].join("\n"));
}
const directorNotes: string[] = [];
if (style) {
directorNotes.push(`Style: ${style}`);
}
if (accent) {
directorNotes.push(`Accent: ${accent}`);
}
if (pacing) {
directorNotes.push(`Pacing: ${pacing}`);
}
if (constraints.length > 0) {
directorNotes.push(["Constraints:", ...constraints.map((item) => `- ${item}`)].join("\n"));
}
if (personaPrompt) {
directorNotes.push(["Provider notes:", personaPrompt].join("\n"));
}
if (directorNotes.length > 0) {
sections.push(["### DIRECTOR'S NOTES", ...directorNotes].join("\n"));
}
if (sampleContext) {
sections.push(["### SAMPLE CONTEXT", sampleContext].join("\n"));
}
sections.push(["### TRANSCRIPT", transcript].join("\n"));
return sections.join("\n\n");
}
function wrapPcm16MonoToWav(pcm: Buffer, sampleRate = GOOGLE_TTS_SAMPLE_RATE): Buffer {
const byteRate = sampleRate * GOOGLE_TTS_CHANNELS * (GOOGLE_TTS_BITS_PER_SAMPLE / 8);
const blockAlign = GOOGLE_TTS_CHANNELS * (GOOGLE_TTS_BITS_PER_SAMPLE / 8);
@@ -265,7 +413,7 @@ function wrapPcm16MonoToWav(pcm: Buffer, sampleRate = GOOGLE_TTS_SAMPLE_RATE): B
return Buffer.concat([header, pcm]);
}
- async function synthesizeGoogleTtsPcm(params: {
async function synthesizeGoogleTtsPcmOnce(params: {
text: string;
apiKey: string;
baseUrl?: string;
@@ -322,19 +470,59 @@ async function synthesizeGoogleTtsPcm(params: {
});
try {
- await assertOkOrThrowProviderError(res, "Google TTS failed");
- return extractGoogleSpeechPcm((await res.json()) as GoogleGenerateSpeechResponse);
if (!res.ok) {
try {
await assertOkOrThrowProviderError(res, "Google TTS failed");
} catch (err) {
const message = err instanceof Error ? err.message : String(err);
if (res.status >= 500 && res.status < 600) {
throw new GoogleTtsRetryableError(message);
}
throw err;
}
}
try {
return extractGoogleSpeechPcm((await res.json()) as GoogleGenerateSpeechResponse);
} catch (err) {
const message = err instanceof Error ? err.message : String(err);
throw new GoogleTtsRetryableError(message);
}
} finally {
await release();
}
}
async function synthesizeGoogleTtsPcm(params: {
text: string;
apiKey: string;
baseUrl?: string;
request?: ReturnType<typeof sanitizeConfiguredModelProviderRequest>;
model: string;
voiceName: string;
audioProfile?: string;
speakerName?: string;
timeoutMs: number;
}): Promise<Buffer> {
let lastError: unknown;
for (let attempt = 0; attempt < 2; attempt += 1) {
try {
return await synthesizeGoogleTtsPcmOnce(params);
} catch (err) {
lastError = err;
if (!(err instanceof GoogleTtsRetryableError) || attempt > 0) {
throw err;
}
}
}
throw lastError instanceof Error ? lastError : new Error(String(lastError));
}
export function buildGoogleSpeechProvider(): SpeechProviderPlugin {
return {
id: "google",
label: "Google",
autoSelectOrder: 50,
- models: [DEFAULT_GOOGLE_TTS_MODEL],
models: GOOGLE_TTS_MODELS,
voices: GOOGLE_TTS_VOICES,
resolveConfig: ({ rawConfig }) => normalizeGoogleTtsProviderConfig(rawConfig),
parseDirectiveToken,
@@ -372,6 +560,22 @@ export function buildGoogleSpeechProvider(): SpeechProviderPlugin {
listVoices: async () => GOOGLE_TTS_VOICES.map((voice) => ({ id: voice, name: voice })),
isConfigured: ({ cfg, providerConfig }) =>
Boolean(resolveGoogleTtsApiKey({ cfg, providerConfig })),
prepareSynthesis: (ctx) => {
const config = readGoogleTtsProviderConfig(ctx.providerConfig);
const shouldWrap =
config.promptTemplate === GOOGLE_AUDIO_PROFILE_PROMPT_TEMPLATE ||
Boolean(config.personaPrompt);
if (!shouldWrap || isOpenClawGoogleAudioProfilePrompt(ctx.text)) {
return undefined;
}
return {
text: renderGoogleAudioProfilePrompt({
text: ctx.text,
persona: ctx.persona,
personaPrompt: config.personaPrompt,
}),
};
},
synthesize: async (req) => {
const config = readGoogleTtsProviderConfig(req.providerConfig);
const overrides = readGoogleTtsOverrides(req.providerOverrides);
@@ -449,7 +653,10 @@ export function buildGoogleSpeechProvider(): SpeechProviderPlugin {
export const __testing = {
DEFAULT_GOOGLE_TTS_MODEL,
DEFAULT_GOOGLE_TTS_VOICE,
GOOGLE_AUDIO_PROFILE_PROMPT_TEMPLATE,
GOOGLE_TTS_MODELS,
GOOGLE_TTS_SAMPLE_RATE,
normalizeGoogleTtsModel,
renderGoogleAudioProfilePrompt,
wrapPcm16MonoToWav,
};
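The Google provider changes above wrap the reply text in an `audio-profile-v1` style prompt and skip the wrap when the text already looks wrapped. A condensed sketch of that idea follows. The names `PersonaPrompt`, `isWrapped`, and `wrapTranscript` are hypothetical, and only a subset of the sections in the diff (profile, style, transcript) is modelled, so treat it as an illustration rather than the provider's actual code.

```typescript
// Hypothetical, reduced persona shape; the diff carries more fields (scene, accent, pacing, ...).
interface PersonaPrompt {
  label?: string;
  profile?: string;
  style?: string;
}

const PREAMBLE = "Synthesize speech from the TRANSCRIPT section only.";

// Mirrors the guard in the diff: all three markers must be present.
function isWrapped(text: string): boolean {
  return (
    text.startsWith(PREAMBLE) &&
    text.includes("# AUDIO PROFILE:") &&
    text.includes("### TRANSCRIPT")
  );
}

// Builds the sectioned prompt; already-wrapped text is returned unchanged.
function wrapTranscript(text: string, persona: PersonaPrompt): string {
  if (isWrapped(text)) {
    return text;
  }
  const sections: string[] = [PREAMBLE];
  if (persona.label || persona.profile) {
    sections.push(
      [`# AUDIO PROFILE: ${persona.label ?? "voice"}`, persona.profile].filter(Boolean).join("\n"),
    );
  }
  if (persona.style) {
    sections.push(["### DIRECTOR'S NOTES", `Style: ${persona.style}`].join("\n"));
  }
  // Normalize CRLF/CR line endings and trim, as the diff does for the transcript.
  sections.push(["### TRANSCRIPT", text.replace(/\r\n?/g, "\n").trim()].join("\n"));
  return sections.join("\n\n");
}
```

Because the guard requires the `# AUDIO PROFILE:` marker, a prompt rendered without a label or profile would not be recognised as wrapped on a second pass; the sketch keeps that behaviour on purpose.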

@@ -134,6 +134,7 @@ function createLiveTtsConfig(): ResolvedTtsConfig {
voice: "alloy",
},
},
personas: {},
maxTextLength: 4_000,
timeoutMs: 30_000,
};

@@ -162,6 +162,40 @@ describe("buildOpenAISpeechProvider", () => {
});
});
it("maps persona prompt fields to instructions when instructions are unset", async () => {
const provider = buildOpenAISpeechProvider();
const prepared = await provider.prepareSynthesis?.({
text: "hello",
cfg: {} as never,
providerConfig: {
apiKey: "sk-test",
model: "gpt-4o-mini-tts",
voice: "cedar",
},
persona: {
id: "alfred",
label: "Alfred",
prompt: {
profile: "A brilliant British butler.",
scene: "A quiet late-night study.",
sampleContext: "The speaker is answering a trusted operator.",
style: "Refined and lightly amused.",
accent: "British English.",
pacing: "Measured.",
constraints: ["Do not read configuration values aloud."],
},
},
target: "audio-file",
timeoutMs: 1_000,
});
expect(prepared?.providerConfig?.instructions).toContain("Persona: Alfred");
expect(prepared?.providerConfig?.instructions).toContain(
"Constraint: Do not read configuration values aloud.",
);
});
it("uses wav for Groq-compatible OpenAI TTS endpoints", async () => {
const provider = buildOpenAISpeechProvider();
mockSpeechFetchExpectingFormat("wav");

@@ -71,7 +71,7 @@ function isGroqSpeechBaseUrl(baseUrl: string): boolean {
function resolveSpeechResponseFormat(
baseUrl: string,
- target: "audio-file" | "voice-note",
target: "audio-file" | "voice-note" | "telephony",
configuredFormat?: OpenAiSpeechResponseFormat,
): OpenAiSpeechResponseFormat {
if (configuredFormat) {
@@ -145,6 +145,37 @@ function readOpenAIOverrides(
};
}
function renderOpenAITtsPersonaInstructions(req: {
label?: string;
prompt?: {
profile?: string;
scene?: string;
sampleContext?: string;
style?: string;
accent?: string;
pacing?: string;
constraints?: string[];
};
}): string | undefined {
const prompt = req.prompt;
if (!prompt) {
return undefined;
}
const lines = [
req.label ? `Persona: ${req.label}` : undefined,
prompt.profile ? `Profile: ${prompt.profile}` : undefined,
prompt.scene ? `Scene: ${prompt.scene}` : undefined,
prompt.style ? `Style: ${prompt.style}` : undefined,
prompt.accent ? `Accent: ${prompt.accent}` : undefined,
prompt.pacing ? `Pacing: ${prompt.pacing}` : undefined,
prompt.sampleContext ? `Sample context: ${prompt.sampleContext}` : undefined,
...(prompt.constraints ?? []).map((constraint) => `Constraint: ${constraint}`),
]
.map((line) => trimToUndefined(line))
.filter((line): line is string => Boolean(line));
return lines.length > 0 ? lines.join("\n") : undefined;
}
function parseDirectiveToken(ctx: SpeechDirectiveTokenParseContext): {
handled: boolean;
overrides?: SpeechProviderOverrides;
@@ -229,6 +260,23 @@ export function buildOpenAISpeechProvider(): SpeechProviderPlugin {
listVoices: async () => OPENAI_TTS_VOICES.map((voice) => ({ id: voice, name: voice })),
isConfigured: ({ providerConfig }) =>
Boolean(readOpenAIProviderConfig(providerConfig).apiKey || process.env.OPENAI_API_KEY),
prepareSynthesis: (ctx) => {
const config = readOpenAIProviderConfig(ctx.providerConfig);
if (config.instructions) {
return undefined;
}
const instructions = renderOpenAITtsPersonaInstructions({
label: ctx.persona?.label ?? ctx.persona?.id,
prompt: ctx.persona?.prompt,
});
return instructions
? {
providerConfig: {
instructions,
},
}
: undefined;
},
synthesize: async (req) => {
const config = readOpenAIProviderConfig(req.providerConfig);
const overrides = readOpenAIOverrides(req.providerOverrides);
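The OpenAI changes above map persona prompt fields to an `instructions` string only when no instructions are configured. The precedence can be sketched as below. `Prompt`, `personaInstructions`, and `resolveInstructions` are hypothetical names, only some prompt fields are included, and the diff's `prepareSynthesis` returns a config patch rather than the final string, so this is a simplification.

```typescript
// Hypothetical, reduced prompt shape for illustration.
interface Prompt {
  profile?: string;
  style?: string;
  constraints?: string[];
}

// Render one "Key: value" line per populated field, one "Constraint:" line per constraint.
function personaInstructions(label: string | undefined, prompt: Prompt | undefined): string | undefined {
  if (!prompt) {
    return undefined;
  }
  const lines = [
    label ? `Persona: ${label}` : undefined,
    prompt.profile ? `Profile: ${prompt.profile}` : undefined,
    prompt.style ? `Style: ${prompt.style}` : undefined,
    ...(prompt.constraints ?? []).map((constraint) => `Constraint: ${constraint}`),
  ].filter((line): line is string => Boolean(line && line.trim()));
  return lines.length > 0 ? lines.join("\n") : undefined;
}

// Explicitly configured instructions win; persona-derived text is only a fallback.
function resolveInstructions(
  configured: string | undefined,
  label: string | undefined,
  prompt: Prompt | undefined,
): string | undefined {
  return configured || personaInstructions(label, prompt);
}
```

Keeping the configured value first means a persona never overrides instructions an operator has already set.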

@@ -3,11 +3,13 @@ export {
getLastTtsAttempt,
getResolvedSpeechProviderConfig,
getTtsMaxLength,
getTtsPersona,
getTtsProvider,
isSummarizationEnabled,
isTtsEnabled,
isTtsProviderConfigured,
listSpeechVoices,
listTtsPersonas,
maybeApplyTtsToPayload,
resolveExplicitTtsOverrides,
resolveTtsAutoMode,
@@ -19,6 +21,7 @@ export {
setTtsAutoMode,
setTtsEnabled,
setTtsMaxLength,
setTtsPersona,
setTtsProvider,
synthesizeSpeech,
textToSpeech,

@@ -1,7 +1,12 @@
import { rmSync } from "node:fs";
import path from "node:path";
import type { OpenClawConfig } from "openclaw/plugin-sdk/config-runtime";
- import type { SpeechProviderPlugin, SpeechSynthesisRequest } from "openclaw/plugin-sdk/speech-core";
import type { ReplyPayload } from "openclaw/plugin-sdk/reply-payload";
import type {
SpeechProviderPlugin,
SpeechProviderPrepareSynthesisContext,
SpeechSynthesisRequest,
} from "openclaw/plugin-sdk/speech-core";
import { afterEach, describe, expect, it, vi } from "vitest";
type MockSpeechSynthesisResult = Awaited<ReturnType<SpeechProviderPlugin["synthesize"]>>;
@@ -16,6 +21,9 @@ const synthesizeMock = vi.hoisted(() =>
}),
),
);
const prepareSynthesisMock = vi.hoisted(() =>
vi.fn(async (_ctx: SpeechProviderPrepareSynthesisContext) => undefined),
);
const listSpeechProvidersMock = vi.hoisted(() => vi.fn());
const getSpeechProviderMock = vi.hoisted(() => vi.fn());
@@ -31,6 +39,7 @@ vi.mock("../api.js", async () => {
label: "Mock",
autoSelectOrder: 1,
isConfigured: () => true,
prepareSynthesis: prepareSynthesisMock,
synthesize: synthesizeMock,
};
listSpeechProvidersMock.mockImplementation(() => [mockProvider]);
@@ -49,10 +58,40 @@ vi.mock("../api.js", async () => {
};
});
- const { _test, maybeApplyTtsToPayload, resolveTtsConfig } = await import("./tts.js");
const {
_test,
getTtsPersona,
getTtsProvider,
maybeApplyTtsToPayload,
resolveTtsConfig,
synthesizeSpeech,
textToSpeechTelephony,
} = await import("./tts.js");
const nativeVoiceNoteChannels = ["discord", "feishu", "matrix", "telegram", "whatsapp"] as const;
function createMockSpeechProvider(
id = "mock",
options: Partial<SpeechProviderPlugin> = {},
): SpeechProviderPlugin {
return {
id,
label: id,
autoSelectOrder: id === "mock" ? 1 : 2,
isConfigured: () => true,
prepareSynthesis: prepareSynthesisMock,
synthesize: synthesizeMock,
...options,
};
}
function installSpeechProviders(providers: SpeechProviderPlugin[]): void {
listSpeechProvidersMock.mockImplementation(() => providers);
getSpeechProviderMock.mockImplementation(
(providerId: string) => providers.find((provider) => provider.id === providerId) ?? null,
);
}
function createTtsConfig(prefsName: string): OpenClawConfig {
return {
messages: {
@@ -102,6 +141,8 @@ async function expectTtsPayloadResult(params: {
describe("speech-core native voice-note routing", () => {
afterEach(() => {
synthesizeMock.mockClear();
prepareSynthesisMock.mockClear();
installSpeechProviders([createMockSpeechProvider()]);
});
it("keeps native voice-note channel support centralized", () => {
@@ -153,6 +194,268 @@ describe("speech-core native voice-note routing", () => {
audioAsVoice: undefined,
});
});
it("selects persona preferred provider before config fallback", () => {
const cfg: OpenClawConfig = {
messages: {
tts: {
enabled: true,
provider: "other",
persona: "alfred",
personas: {
alfred: {
label: "Alfred",
provider: "mock",
providers: {
mock: {
voice: "Algieba",
},
},
},
},
},
},
};
const config = resolveTtsConfig(cfg);
const prefsPath = "/tmp/openclaw-speech-core-persona-provider.json";
expect(getTtsPersona(config, prefsPath)?.id).toBe("alfred");
expect(getTtsProvider(config, prefsPath)).toBe("mock");
});
it("merges active persona provider binding into synthesis config", async () => {
const cfg: OpenClawConfig = {
messages: {
tts: {
enabled: true,
provider: "mock",
prefsPath: "/tmp/openclaw-speech-core-persona-merge.json",
providers: {
mock: {
model: "base-model",
voice: "base-voice",
},
},
persona: "alfred",
personas: {
alfred: {
provider: "mock",
providers: {
mock: {
voice: "persona-voice",
style: "dry",
},
},
},
},
},
},
};
const payload: ReplyPayload = {
text: "This reply should use persona-specific provider configuration.",
};
let mediaDir: string | undefined;
try {
const result = await maybeApplyTtsToPayload({
payload,
cfg,
channel: "slack",
kind: "final",
});
expect(synthesizeMock).toHaveBeenCalledWith(
expect.objectContaining({
providerConfig: expect.objectContaining({
model: "base-model",
voice: "persona-voice",
style: "dry",
}),
}),
);
expect(result.mediaUrl).toMatch(/voice-\d+\.ogg$/);
mediaDir = result.mediaUrl ? path.dirname(result.mediaUrl) : undefined;
} finally {
if (mediaDir) {
rmSync(mediaDir, { recursive: true, force: true });
}
}
});
it("does not mark skipped unregistered providers as missing persona bindings", async () => {
const result = await synthesizeSpeech({
text: "Use fallback provider.",
cfg: {
messages: {
tts: {
enabled: true,
provider: "missing",
persona: "alfred",
personas: {
alfred: {
providers: {
missing: {
voice: "configured-but-unregistered",
},
},
},
},
},
},
},
});
expect(result.success).toBe(true);
expect(result.attempts?.[0]).toMatchObject({
provider: "missing",
outcome: "skipped",
reasonCode: "no_provider_registered",
persona: "alfred",
});
expect(result.attempts?.[0]).not.toHaveProperty("personaBinding");
});
it("does not mark skipped telephony providers as missing persona bindings", async () => {
const result = await textToSpeechTelephony({
text: "Use telephony provider.",
cfg: {
messages: {
tts: {
enabled: true,
provider: "mock",
persona: "alfred",
personas: {
alfred: {
providers: {
mock: {
voice: "persona-voice",
},
},
},
},
},
},
},
});
expect(result.success).toBe(false);
expect(result.attempts?.[0]).toMatchObject({
provider: "mock",
outcome: "skipped",
reasonCode: "unsupported_for_telephony",
persona: "alfred",
});
expect(result.attempts?.[0]).not.toHaveProperty("personaBinding");
});
it("uses provider defaults when fallback policy allows missing persona bindings", async () => {
await synthesizeSpeech({
text: "Use neutral provider defaults.",
cfg: {
messages: {
tts: {
enabled: true,
provider: "mock",
persona: "alfred",
personas: {
alfred: {
fallbackPolicy: "provider-defaults",
prompt: {
profile: "A precise butler.",
},
},
},
},
},
},
});
expect(prepareSynthesisMock).toHaveBeenCalledWith(
expect.objectContaining({
persona: undefined,
personaProviderConfig: undefined,
}),
);
});
it("preserves persona prompts by default when provider bindings are missing", async () => {
await synthesizeSpeech({
text: "Use persona prompt.",
cfg: {
messages: {
tts: {
enabled: true,
provider: "mock",
persona: "alfred",
personas: {
alfred: {
prompt: {
profile: "A precise butler.",
},
},
},
},
},
},
});
expect(prepareSynthesisMock).toHaveBeenCalledWith(
expect.objectContaining({
persona: expect.objectContaining({ id: "alfred" }),
personaProviderConfig: undefined,
}),
);
});
it("skips unbound providers under fail policy while allowing bound fallbacks", async () => {
installSpeechProviders([
createMockSpeechProvider("mock", { autoSelectOrder: 1 }),
createMockSpeechProvider("fallback", { autoSelectOrder: 2 }),
]);
const result = await synthesizeSpeech({
text: "Use the first persona-bound provider.",
cfg: {
messages: {
tts: {
enabled: true,
provider: "mock",
persona: "alfred",
personas: {
alfred: {
fallbackPolicy: "fail",
providers: {
fallback: {
voice: "fallback-voice",
},
},
},
},
},
},
},
});
expect(result.success).toBe(true);
expect(result.provider).toBe("fallback");
expect(result.fallbackFrom).toBe("mock");
expect(result.attempts?.[0]).toMatchObject({
provider: "mock",
outcome: "skipped",
reasonCode: "not_configured",
persona: "alfred",
personaBinding: "missing",
error: "mock: persona alfred has no provider binding",
});
expect(result.attempts?.[1]).toMatchObject({
provider: "fallback",
outcome: "success",
persona: "alfred",
personaBinding: "applied",
});
});
});
describe("speech-core per-agent TTS config", () => {
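The persona tests above expect a provider binding to be shallow-merged over the base provider config, with the attempt tagged `applied`, `missing`, or `none`. A minimal sketch of that merge follows. The `Persona` shape and `mergeWithPersona` are hypothetical names, and provider-id normalization (for example the `edge` alias) is deliberately left out.

```typescript
type ProviderConfig = Record<string, unknown>;
type Binding = "applied" | "missing" | "none";

// Hypothetical, reduced persona shape: only the per-provider bindings matter here.
interface Persona {
  id: string;
  providers?: Record<string, ProviderConfig>;
}

// Shallow merge: keys from the persona binding override keys from the base config.
function mergeWithPersona(
  base: ProviderConfig,
  providerId: string,
  persona?: Persona,
): { config: ProviderConfig; binding: Binding } {
  if (!persona) {
    return { config: base, binding: "none" };
  }
  const providers = persona.providers ?? {};
  // Own-property check so inherited keys such as "constructor" never count as bindings.
  if (!Object.prototype.hasOwnProperty.call(providers, providerId)) {
    return { config: base, binding: "missing" };
  }
  return { config: { ...base, ...providers[providerId] }, binding: "applied" };
}
```

With a base of `{ model: "base-model", voice: "base-voice" }` and a binding of `{ voice: "persona-voice", style: "dry" }`, the merged config keeps the base model and takes the voice and style from the persona, which matches the expectation in the merge test.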

@@ -12,6 +12,7 @@ import path from "node:path";
import { normalizeChannelId, type ChannelId } from "openclaw/plugin-sdk/channel-targets";
import type {
OpenClawConfig,
ResolvedTtsPersona,
TtsAutoMode,
TtsConfig,
TtsModelOverrideConfig,
@@ -40,6 +41,7 @@ import {
normalizeSpeechProviderId,
normalizeTtsAutoMode,
parseTtsDirectives,
resolveEffectiveTtsConfig,
type ResolvedTtsConfig,
type ResolvedTtsModelOverrides,
scheduleCleanup,
@@ -62,13 +64,13 @@ const DEFAULT_TIMEOUT_MS = 30_000;
const DEFAULT_TTS_MAX_LENGTH = 1500;
const DEFAULT_TTS_SUMMARIZE = true;
const DEFAULT_MAX_TEXT_LENGTH = 4096;
- const BLOCKED_MERGE_KEYS = new Set(["__proto__", "prototype", "constructor"]);
type TtsUserPrefs = {
tts?: {
auto?: TtsAutoMode;
enabled?: boolean;
provider?: TtsProvider;
persona?: string | null;
maxLength?: number;
summarize?: boolean;
};
@@ -86,6 +88,8 @@ export type TtsProviderAttempt = {
provider: string;
outcome: "success" | "skipped" | "failed";
reasonCode: TtsAttemptReasonCode;
persona?: string;
personaBinding?: "applied" | "missing" | "none";
latencyMs?: number;
error?: string;
};
@@ -96,6 +100,7 @@ export type TtsResult = {
error?: string;
latencyMs?: number;
provider?: string;
persona?: string;
fallbackFrom?: string;
attemptedProviders?: string[];
attempts?: TtsProviderAttempt[];
@@ -111,6 +116,7 @@ export type TtsSynthesisResult = {
error?: string;
latencyMs?: number;
provider?: string;
persona?: string;
fallbackFrom?: string;
attemptedProviders?: string[];
attempts?: TtsProviderAttempt[];
@@ -126,6 +132,7 @@ export type TtsTelephonyResult = {
error?: string;
latencyMs?: number;
provider?: string;
persona?: string;
fallbackFrom?: string;
attemptedProviders?: string[];
attempts?: TtsProviderAttempt[];
@@ -139,6 +146,7 @@ type TtsStatusEntry = {
textLength: number;
summarized: boolean;
provider?: string;
persona?: string;
fallbackFrom?: string;
attemptedProviders?: string[];
attempts?: TtsProviderAttempt[];
@@ -162,6 +170,10 @@ function normalizeConfiguredSpeechProviderId(
return normalized === "edge" ? "microsoft" : normalized;
}
function normalizeTtsPersonaId(personaId: string | null | undefined): string | undefined {
return normalizeOptionalLowercaseString(personaId ?? undefined);
}
function resolveTtsPrefsPathValue(prefsPath: string | undefined): string {
if (prefsPath?.trim()) {
return resolveUserPath(prefsPath.trim());
@@ -229,6 +241,87 @@ function asProviderConfigMap(value: unknown): Record<string, unknown> {
: {};
}
function hasOwnProperty(value: object, key: string): boolean {
return Object.prototype.hasOwnProperty.call(value, key);
}
function normalizeProviderConfigMap(
value: unknown,
): Record<string, SpeechProviderConfig> | undefined {
const rawMap = asProviderConfigMap(value);
if (Object.keys(rawMap).length === 0) {
return undefined;
}
const next: Record<string, SpeechProviderConfig> = {};
for (const [providerId, providerConfig] of Object.entries(rawMap)) {
const normalized = normalizeConfiguredSpeechProviderId(providerId) ?? providerId;
next[normalized] = asProviderConfig(providerConfig);
}
return next;
}
function collectTtsPersonas(raw: TtsConfig): Record<string, ResolvedTtsPersona> {
const rawPersonas = asProviderConfigMap(raw.personas);
const personas: Record<string, ResolvedTtsPersona> = {};
for (const [id, value] of Object.entries(rawPersonas)) {
const normalizedId = normalizeTtsPersonaId(id);
if (!normalizedId || typeof value !== "object" || value === null || Array.isArray(value)) {
continue;
}
const persona = value as Omit<ResolvedTtsPersona, "id">;
personas[normalizedId] = {
...persona,
id: normalizedId,
provider: normalizeConfiguredSpeechProviderId(persona.provider) ?? persona.provider,
providers: normalizeProviderConfigMap(persona.providers),
};
}
return personas;
}
function resolvePersonaProviderConfig(
persona: ResolvedTtsPersona | undefined,
providerId: string,
): SpeechProviderConfig | undefined {
if (!persona?.providers) {
return undefined;
}
const normalized = normalizeConfiguredSpeechProviderId(providerId) ?? providerId;
if (hasOwnProperty(persona.providers, normalized)) {
return persona.providers[normalized];
}
if (hasOwnProperty(persona.providers, providerId)) {
return persona.providers[providerId];
}
return undefined;
}
function mergeProviderConfigWithPersona(params: {
providerConfig: SpeechProviderConfig;
persona?: ResolvedTtsPersona;
providerId: string;
}): {
providerConfig: SpeechProviderConfig;
personaProviderConfig?: SpeechProviderConfig;
personaBinding: "applied" | "missing" | "none";
} {
if (!params.persona) {
return { providerConfig: params.providerConfig, personaBinding: "none" };
}
const personaProviderConfig = resolvePersonaProviderConfig(params.persona, params.providerId);
if (!personaProviderConfig) {
return { providerConfig: params.providerConfig, personaBinding: "missing" };
}
return {
providerConfig: {
...params.providerConfig,
...personaProviderConfig,
},
personaProviderConfig,
personaBinding: "applied",
};
}
function resolveRawProviderConfig(
raw: TtsConfig | undefined,
providerId: string,
@@ -241,48 +334,6 @@ function resolveRawProviderConfig(
return asProviderConfig(direct);
}
- function isPlainObject(value: unknown): value is Record<string, unknown> {
- return Boolean(value) && typeof value === "object" && !Array.isArray(value);
- }
- function deepMergeDefined(base: unknown, override: unknown): unknown {
- if (!isPlainObject(base) || !isPlainObject(override)) {
- return override === undefined ? base : override;
- }
- const result: Record<string, unknown> = { ...base };
- for (const [key, value] of Object.entries(override)) {
- if (BLOCKED_MERGE_KEYS.has(key) || value === undefined) {
- continue;
- }
- const existing = result[key];
- result[key] = key in result ? deepMergeDefined(existing, value) : value;
- }
- return result;
- }
- function normalizeAgentConfigId(value: string | undefined | null): string {
- return normalizeLowercaseStringOrEmpty(value);
- }
- function resolveAgentTtsOverride(
- cfg: OpenClawConfig,
- agentId: string | undefined,
- ): TtsConfig | undefined {
- if (!agentId || !Array.isArray(cfg.agents?.list)) {
- return undefined;
- }
- const normalized = normalizeAgentConfigId(agentId);
- const agent = cfg.agents.list.find((entry) => normalizeAgentConfigId(entry.id) === normalized);
- return agent?.tts;
- }
- function resolveEffectiveTtsRawConfig(cfg: OpenClawConfig, agentId?: string): TtsConfig {
- const base = cfg.messages?.tts ?? {};
- const override = resolveAgentTtsOverride(cfg, agentId);
- return deepMergeDefined(base, override ?? {}) as TtsConfig;
- }
function resolveLazyProviderConfig(
config: ResolvedTtsConfig,
providerId: string,
@@ -325,6 +376,8 @@ function collectDirectProviderConfigEntries(raw: TtsConfig): Record<string, Spee
"maxTextLength",
"mode",
"modelOverrides",
"persona",
"personas",
"prefsPath",
"provider",
"providers",
@@ -357,10 +410,11 @@ export function getResolvedSpeechProviderConfig(
}
export function resolveTtsConfig(cfg: OpenClawConfig, agentId?: string): ResolvedTtsConfig {
- const raw: TtsConfig = resolveEffectiveTtsRawConfig(cfg, agentId);
const raw: TtsConfig = resolveEffectiveTtsConfig(cfg, agentId);
const providerSource = raw.provider ? "config" : "default";
const timeoutMs = raw.timeoutMs ?? DEFAULT_TIMEOUT_MS;
const auto = resolveConfiguredTtsAutoMode(raw);
const persona = normalizeTtsPersonaId(raw.persona);
return {
auto,
mode: raw.mode ?? "final",
@@ -368,6 +422,8 @@ export function resolveTtsConfig(cfg: OpenClawConfig, agentId?: string): Resolve
normalizeConfiguredSpeechProviderId(raw.provider) ??
(providerSource === "config" ? (normalizeOptionalLowercaseString(raw.provider) ?? "") : ""),
providerSource,
persona,
personas: collectTtsPersonas(raw),
summaryModel: normalizeOptionalString(raw.summaryModel),
modelOverrides: resolveModelOverridePolicy(raw.modelOverrides),
providerConfigs: collectDirectProviderConfigEntries(raw),
@@ -418,7 +474,7 @@ function resolveEffectiveTtsAutoState(params: {
autoMode: TtsAutoMode;
prefsPath: string;
} {
- const raw: TtsConfig = resolveEffectiveTtsRawConfig(params.cfg, params.agentId);
const raw: TtsConfig = resolveEffectiveTtsConfig(params.cfg, params.agentId);
const prefsPath = resolveTtsPrefsPathValue(raw.prefsPath);
const sessionAuto = normalizeTtsAutoMode(params.sessionAuto);
if (sessionAuto) {
@@ -443,6 +499,7 @@ export function buildTtsSystemPromptHint(
return undefined;
}
const _config = resolveTtsConfig(cfg, agentId);
const persona = getTtsPersona(_config, prefsPath);
const maxLength = getTtsMaxLength(prefsPath);
const summarize = isSummarizationEnabled(prefsPath) ? "on" : "off";
const autoHint =
@@ -454,6 +511,9 @@ export function buildTtsSystemPromptHint(
return [
"Voice (TTS) is enabled.",
autoHint,
persona
? `Active TTS persona: ${persona.label ?? persona.id}${persona.description ? ` - ${persona.description}` : ""}.`
: undefined,
`Keep spoken text ≤${maxLength} chars to avoid auto-summary (summary ${summarize}).`,
"Use [[tts:...]] and optional [[tts:text]]...[[/tts:text]] to control voice/expressiveness.",
]
@@ -523,6 +583,13 @@ export function getTtsProvider(config: ResolvedTtsConfig, prefsPath: string): Tt
if (prefsProvider) {
return prefsProvider;
}
const activePersona = resolveTtsPersonaFromPrefs(config, prefs);
const personaProvider =
canonicalizeSpeechProviderId(activePersona?.provider, config.sourceConfig) ??
normalizeConfiguredSpeechProviderId(activePersona?.provider);
if (personaProvider && getSpeechProvider(personaProvider, config.sourceConfig)) {
return personaProvider;
}
if (config.providerSource === "config") {
return normalizeConfiguredSpeechProviderId(config.provider) ?? config.provider;
}
@@ -542,6 +609,38 @@ export function getTtsProvider(config: ResolvedTtsConfig, prefsPath: string): Tt
return config.provider;
}
function resolveTtsPersonaFromPrefs(
config: ResolvedTtsConfig,
prefs: TtsUserPrefs,
): ResolvedTtsPersona | undefined {
if (prefs.tts && hasOwnProperty(prefs.tts, "persona")) {
const prefsPersona = normalizeTtsPersonaId(prefs.tts.persona);
return prefsPersona ? config.personas[prefsPersona] : undefined;
}
const configPersona = normalizeTtsPersonaId(config.persona);
return configPersona ? config.personas[configPersona] : undefined;
}
export function getTtsPersona(
config: ResolvedTtsConfig,
prefsPath: string,
): ResolvedTtsPersona | undefined {
return resolveTtsPersonaFromPrefs(config, readPrefs(prefsPath));
}
export function listTtsPersonas(config: ResolvedTtsConfig): ResolvedTtsPersona[] {
return Object.values(config.personas).toSorted((left, right) => left.id.localeCompare(right.id));
}
export function setTtsPersona(prefsPath: string, persona: string | null | undefined): void {
updatePrefs(prefsPath, (prefs) => {
const next = { ...prefs.tts };
const normalized = normalizeTtsPersonaId(persona);
next.persona = normalized ?? null;
prefs.tts = next;
});
}
export function setTtsProvider(prefsPath: string, provider: TtsProvider): void { export function setTtsProvider(prefsPath: string, provider: TtsProvider): void {
updatePrefs(prefsPath, (prefs) => { updatePrefs(prefsPath, (prefs) => {
prefs.tts = { ...prefs.tts, provider: canonicalizeSpeechProviderId(provider) ?? provider }; prefs.tts = { ...prefs.tts, provider: canonicalizeSpeechProviderId(provider) ?? provider };
@@ -714,17 +813,20 @@ function buildTtsFailureResult(
errors: string[], errors: string[],
attemptedProviders?: string[], attemptedProviders?: string[],
attempts?: TtsProviderAttempt[], attempts?: TtsProviderAttempt[],
persona?: string,
): { ): {
success: false; success: false;
error: string; error: string;
attemptedProviders?: string[]; attemptedProviders?: string[];
attempts?: TtsProviderAttempt[]; attempts?: TtsProviderAttempt[];
persona?: string;
} { } {
return { return {
success: false, success: false,
error: `TTS conversion failed: ${errors.join("; ") || "no providers available"}`, error: `TTS conversion failed: ${errors.join("; ") || "no providers available"}`,
attemptedProviders, attemptedProviders,
attempts, attempts,
persona,
}; };
} }
@@ -733,17 +835,22 @@ type TtsProviderReadyResolution =
kind: "ready"; kind: "ready";
provider: NonNullable<ReturnType<typeof getSpeechProvider>>; provider: NonNullable<ReturnType<typeof getSpeechProvider>>;
providerConfig: SpeechProviderConfig; providerConfig: SpeechProviderConfig;
personaProviderConfig?: SpeechProviderConfig;
synthesisPersona?: ResolvedTtsPersona;
personaBinding: "applied" | "missing" | "none";
} }
| { | {
kind: "skip"; kind: "skip";
reasonCode: "no_provider_registered" | "not_configured" | "unsupported_for_telephony"; reasonCode: "no_provider_registered" | "not_configured" | "unsupported_for_telephony";
message: string; message: string;
personaBinding?: "missing";
}; };
function resolveReadySpeechProvider(params: { function resolveReadySpeechProvider(params: {
provider: TtsProvider; provider: TtsProvider;
cfg: OpenClawConfig; cfg: OpenClawConfig;
config: ResolvedTtsConfig; config: ResolvedTtsConfig;
persona?: ResolvedTtsPersona;
requireTelephony?: boolean; requireTelephony?: boolean;
}): TtsProviderReadyResolution { }): TtsProviderReadyResolution {
const resolvedProvider = getSpeechProvider(params.provider, params.cfg); const resolvedProvider = getSpeechProvider(params.provider, params.cfg);
@@ -759,10 +866,23 @@ function resolveReadySpeechProvider(params: {
resolvedProvider.id, resolvedProvider.id,
params.cfg, params.cfg,
); );
const merged = mergeProviderConfigWithPersona({
providerConfig,
persona: params.persona,
providerId: resolvedProvider.id,
});
if (params.persona?.fallbackPolicy === "fail" && merged.personaBinding === "missing") {
return {
kind: "skip",
reasonCode: "not_configured",
message: `${params.provider}: persona ${params.persona.id} has no provider binding`,
personaBinding: "missing",
};
}
if ( if (
!resolvedProvider.isConfigured({ !resolvedProvider.isConfigured({
cfg: params.cfg, cfg: params.cfg,
providerConfig, providerConfig: merged.providerConfig,
timeoutMs: params.config.timeoutMs, timeoutMs: params.config.timeoutMs,
}) })
) { ) {
@@ -782,7 +902,56 @@ function resolveReadySpeechProvider(params: {
return { return {
kind: "ready", kind: "ready",
provider: resolvedProvider, provider: resolvedProvider,
providerConfig, providerConfig: merged.providerConfig,
personaProviderConfig: merged.personaProviderConfig,
synthesisPersona:
params.persona?.fallbackPolicy === "provider-defaults" && merged.personaBinding === "missing"
? undefined
: params.persona,
personaBinding: merged.personaBinding,
};
}
async function prepareSpeechSynthesis(params: {
provider: NonNullable<ReturnType<typeof getSpeechProvider>>;
text: string;
cfg: OpenClawConfig;
providerConfig: SpeechProviderConfig;
providerOverrides?: SpeechProviderOverrides;
persona?: ResolvedTtsPersona;
personaProviderConfig?: SpeechProviderConfig;
target: "audio-file" | "voice-note" | "telephony";
timeoutMs: number;
}): Promise<{
text: string;
providerConfig: SpeechProviderConfig;
providerOverrides?: SpeechProviderOverrides;
}> {
if (!params.provider.prepareSynthesis) {
return {
text: params.text,
providerConfig: params.providerConfig,
providerOverrides: params.providerOverrides,
};
}
const prepared = await params.provider.prepareSynthesis({
text: params.text,
cfg: params.cfg,
providerConfig: params.providerConfig,
providerOverrides: params.providerOverrides,
persona: params.persona,
personaProviderConfig: params.personaProviderConfig,
target: params.target,
timeoutMs: params.timeoutMs,
});
return {
text: prepared?.text ?? params.text,
providerConfig: prepared?.providerConfig
? { ...params.providerConfig, ...prepared.providerConfig }
: params.providerConfig,
providerOverrides: prepared?.providerOverrides
? { ...params.providerOverrides, ...prepared.providerOverrides }
: params.providerOverrides,
}; };
} }
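Review note: the persona handling added to `resolveReadySpeechProvider` above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the project's implementation. `Persona`, `classifyBinding`, and `decide` are hypothetical names, and the lookup assumes bindings live under `persona.providers[providerId]`, which `Object.keys(persona.providers ?? {})` elsewhere in this diff suggests but `mergeProviderConfigWithPersona` (not shown) may handle differently.

```typescript
// Hypothetical, simplified shapes; the real types are not shown in this diff.
type PersonaBinding = "applied" | "missing" | "none";

interface Persona {
  id: string;
  // Only "fail" and "provider-defaults" appear in the diff; other values are not assumed here.
  fallbackPolicy?: string;
  providers?: Record<string, Record<string, unknown>>;
}

type Decision =
  | { kind: "skip"; message: string; personaBinding: "missing" }
  | { kind: "ready"; synthesisPersona?: Persona; personaBinding: PersonaBinding };

// Assumption: a binding counts as "applied" when the persona has an entry for the provider.
function classifyBinding(persona: Persona | undefined, providerId: string): PersonaBinding {
  if (!persona) {
    return "none";
  }
  return persona.providers?.[providerId] != null ? "applied" : "missing";
}

function decide(persona: Persona | undefined, providerId: string): Decision {
  const personaBinding = classifyBinding(persona, providerId);
  // "fail": a provider without a binding is skipped rather than used with defaults.
  if (persona?.fallbackPolicy === "fail" && personaBinding === "missing") {
    return {
      kind: "skip",
      message: `${providerId}: persona ${persona.id} has no provider binding`,
      personaBinding: "missing",
    };
  }
  // "provider-defaults": the provider runs, but the persona is withheld from synthesis.
  const dropPersona =
    persona?.fallbackPolicy === "provider-defaults" && personaBinding === "missing";
  return {
    kind: "ready",
    synthesisPersona: dropPersona ? undefined : persona,
    personaBinding,
  };
}
```

With any other policy value, this sketch passes the persona through to synthesis even when the binding is missing, which matches the `synthesisPersona` expression in the hunk above.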
@@ -796,6 +965,7 @@ function resolveTtsRequestSetup(params: {
}): }):
| { | {
config: ResolvedTtsConfig; config: ResolvedTtsConfig;
persona?: ResolvedTtsPersona;
providers: TtsProvider[]; providers: TtsProvider[];
} }
| { | {
@@ -814,6 +984,7 @@ function resolveTtsRequestSetup(params: {
canonicalizeSpeechProviderId(params.providerOverride, params.cfg) ?? userProvider; canonicalizeSpeechProviderId(params.providerOverride, params.cfg) ?? userProvider;
return { return {
config, config,
persona: getTtsPersona(config, prefsPath),
providers: params.disableFallback ? [provider] : resolveTtsProviderOrder(provider, params.cfg), providers: params.disableFallback ? [provider] : resolveTtsProviderOrder(provider, params.cfg),
}; };
} }
@@ -833,6 +1004,7 @@ export async function textToSpeech(params: {
return { return {
success: false, success: false,
error: synthesis.error ?? "TTS conversion failed", error: synthesis.error ?? "TTS conversion failed",
persona: synthesis.persona,
attemptedProviders: synthesis.attemptedProviders, attemptedProviders: synthesis.attemptedProviders,
attempts: synthesis.attempts, attempts: synthesis.attempts,
}; };
@@ -850,6 +1022,7 @@ export async function textToSpeech(params: {
audioPath, audioPath,
latencyMs: synthesis.latencyMs, latencyMs: synthesis.latencyMs,
provider: synthesis.provider, provider: synthesis.provider,
persona: synthesis.persona,
fallbackFrom: synthesis.fallbackFrom, fallbackFrom: synthesis.fallbackFrom,
attemptedProviders: synthesis.attemptedProviders, attemptedProviders: synthesis.attemptedProviders,
attempts: synthesis.attempts, attempts: synthesis.attempts,
@@ -886,7 +1059,7 @@ export async function synthesizeSpeech(params: {
return { success: false, error: setup.error }; return { success: false, error: setup.error };
} }
const { config, providers } = setup; const { config, persona, providers } = setup;
const timeoutMs = params.timeoutMs ?? config.timeoutMs; const timeoutMs = params.timeoutMs ?? config.timeoutMs;
const target = supportsNativeVoiceNoteTts(params.channel) ? "voice-note" : "audio-file"; const target = supportsNativeVoiceNoteTts(params.channel) ? "voice-note" : "audio-file";
@@ -906,6 +1079,7 @@ export async function synthesizeSpeech(params: {
provider, provider,
cfg: params.cfg, cfg: params.cfg,
config, config,
persona,
}); });
if (resolvedProvider.kind === "skip") { if (resolvedProvider.kind === "skip") {
errors.push(resolvedProvider.message); errors.push(resolvedProvider.message);
@@ -913,17 +1087,32 @@ export async function synthesizeSpeech(params: {
provider, provider,
outcome: "skipped", outcome: "skipped",
reasonCode: resolvedProvider.reasonCode, reasonCode: resolvedProvider.reasonCode,
persona: persona?.id,
...(resolvedProvider.personaBinding
? { personaBinding: resolvedProvider.personaBinding }
: {}),
error: resolvedProvider.message, error: resolvedProvider.message,
}); });
logVerbose(`TTS: provider ${provider} skipped (${resolvedProvider.message})`); logVerbose(`TTS: provider ${provider} skipped (${resolvedProvider.message})`);
continue; continue;
} }
const synthesis = await resolvedProvider.provider.synthesize({ const prepared = await prepareSpeechSynthesis({
provider: resolvedProvider.provider,
text: params.text, text: params.text,
cfg: params.cfg, cfg: params.cfg,
providerConfig: resolvedProvider.providerConfig, providerConfig: resolvedProvider.providerConfig,
target,
providerOverrides: params.overrides?.providerOverrides?.[resolvedProvider.provider.id], providerOverrides: params.overrides?.providerOverrides?.[resolvedProvider.provider.id],
persona: resolvedProvider.synthesisPersona,
personaProviderConfig: resolvedProvider.personaProviderConfig,
target,
timeoutMs,
});
const synthesis = await resolvedProvider.provider.synthesize({
text: prepared.text,
cfg: params.cfg,
providerConfig: prepared.providerConfig,
target,
providerOverrides: prepared.providerOverrides,
timeoutMs, timeoutMs,
}); });
const latencyMs = Date.now() - providerStart; const latencyMs = Date.now() - providerStart;
@@ -931,6 +1120,8 @@ export async function synthesizeSpeech(params: {
provider, provider,
outcome: "success", outcome: "success",
reasonCode: "success", reasonCode: "success",
persona: persona?.id,
personaBinding: resolvedProvider.personaBinding,
latencyMs, latencyMs,
}); });
return { return {
@@ -938,6 +1129,7 @@ export async function synthesizeSpeech(params: {
audioBuffer: synthesis.audioBuffer, audioBuffer: synthesis.audioBuffer,
latencyMs, latencyMs,
provider, provider,
persona: persona?.id,
fallbackFrom: provider !== primaryProvider ? primaryProvider : undefined, fallbackFrom: provider !== primaryProvider ? primaryProvider : undefined,
attemptedProviders, attemptedProviders,
attempts, attempts,
@@ -956,6 +1148,13 @@ export async function synthesizeSpeech(params: {
reasonCode: reasonCode:
err instanceof Error && err.name === "AbortError" ? "timeout" : "provider_error", err instanceof Error && err.name === "AbortError" ? "timeout" : "provider_error",
latencyMs, latencyMs,
persona: persona?.id,
personaBinding:
resolvePersonaProviderConfig(persona, provider) != null
? "applied"
: persona
? "missing"
: "none",
error: errorMsg, error: errorMsg,
}); });
const rawError = sanitizeTtsErrorForLog(err); const rawError = sanitizeTtsErrorForLog(err);
@@ -970,7 +1169,7 @@ export async function synthesizeSpeech(params: {
} }
} }
return buildTtsFailureResult(errors, attemptedProviders, attempts); return buildTtsFailureResult(errors, attemptedProviders, attempts, persona?.id);
} }
export async function textToSpeechTelephony(params: { export async function textToSpeechTelephony(params: {
@@ -987,7 +1186,7 @@ export async function textToSpeechTelephony(params: {
return { success: false, error: setup.error }; return { success: false, error: setup.error };
} }
const { config, providers } = setup; const { config, persona, providers } = setup;
const errors: string[] = []; const errors: string[] = [];
const attemptedProviders: string[] = []; const attemptedProviders: string[] = [];
const attempts: TtsProviderAttempt[] = []; const attempts: TtsProviderAttempt[] = [];
@@ -1004,6 +1203,7 @@ export async function textToSpeechTelephony(params: {
provider, provider,
cfg: params.cfg, cfg: params.cfg,
config, config,
persona,
requireTelephony: true, requireTelephony: true,
}); });
if (resolvedProvider.kind === "skip") { if (resolvedProvider.kind === "skip") {
@@ -1012,28 +1212,32 @@ export async function textToSpeechTelephony(params: {
provider, provider,
outcome: "skipped", outcome: "skipped",
reasonCode: resolvedProvider.reasonCode, reasonCode: resolvedProvider.reasonCode,
persona: persona?.id,
...(resolvedProvider.personaBinding
? { personaBinding: resolvedProvider.personaBinding }
: {}),
error: resolvedProvider.message, error: resolvedProvider.message,
}); });
logVerbose(`TTS telephony: provider ${provider} skipped (${resolvedProvider.message})`); logVerbose(`TTS telephony: provider ${provider} skipped (${resolvedProvider.message})`);
continue; continue;
} }
const synthesizeTelephony = resolvedProvider.provider.synthesizeTelephony; const synthesizeTelephony = resolvedProvider.provider.synthesizeTelephony as NonNullable<
if (!synthesizeTelephony) { typeof resolvedProvider.provider.synthesizeTelephony
const message = `${provider}: unsupported for telephony`; >;
errors.push(message); const prepared = await prepareSpeechSynthesis({
attempts.push({ provider: resolvedProvider.provider,
provider,
outcome: "skipped",
reasonCode: "unsupported_for_telephony",
error: message,
});
logVerbose(`TTS telephony: provider ${provider} skipped (${message})`);
continue;
}
const synthesis = await synthesizeTelephony({
text: params.text, text: params.text,
cfg: params.cfg, cfg: params.cfg,
providerConfig: resolvedProvider.providerConfig, providerConfig: resolvedProvider.providerConfig,
persona: resolvedProvider.synthesisPersona,
personaProviderConfig: resolvedProvider.personaProviderConfig,
target: "telephony",
timeoutMs: config.timeoutMs,
});
const synthesis = await synthesizeTelephony({
text: prepared.text,
cfg: params.cfg,
providerConfig: prepared.providerConfig,
timeoutMs: config.timeoutMs, timeoutMs: config.timeoutMs,
}); });
const latencyMs = Date.now() - providerStart; const latencyMs = Date.now() - providerStart;
@@ -1041,6 +1245,8 @@ export async function textToSpeechTelephony(params: {
provider, provider,
outcome: "success", outcome: "success",
reasonCode: "success", reasonCode: "success",
persona: persona?.id,
personaBinding: resolvedProvider.personaBinding,
latencyMs, latencyMs,
}); });
@@ -1049,6 +1255,7 @@ export async function textToSpeechTelephony(params: {
audioBuffer: synthesis.audioBuffer, audioBuffer: synthesis.audioBuffer,
latencyMs, latencyMs,
provider, provider,
persona: persona?.id,
fallbackFrom: provider !== primaryProvider ? primaryProvider : undefined, fallbackFrom: provider !== primaryProvider ? primaryProvider : undefined,
attemptedProviders, attemptedProviders,
attempts, attempts,
@@ -1065,6 +1272,13 @@ export async function textToSpeechTelephony(params: {
reasonCode: reasonCode:
err instanceof Error && err.name === "AbortError" ? "timeout" : "provider_error", err instanceof Error && err.name === "AbortError" ? "timeout" : "provider_error",
latencyMs, latencyMs,
persona: persona?.id,
personaBinding:
resolvePersonaProviderConfig(persona, provider) != null
? "applied"
: persona
? "missing"
: "none",
error: errorMsg, error: errorMsg,
}); });
const rawError = sanitizeTtsErrorForLog(err); const rawError = sanitizeTtsErrorForLog(err);
@@ -1079,7 +1293,7 @@ export async function textToSpeechTelephony(params: {
} }
} }
return buildTtsFailureResult(errors, attemptedProviders, attempts); return buildTtsFailureResult(errors, attemptedProviders, attempts, persona?.id);
} }
export async function listSpeechVoices(params: { export async function listSpeechVoices(params: {
@@ -1250,6 +1464,7 @@ export async function maybeApplyTtsToPayload(params: {
textLength: text.length, textLength: text.length,
summarized: wasSummarized, summarized: wasSummarized,
provider: result.provider, provider: result.provider,
persona: result.persona,
fallbackFrom: result.fallbackFrom, fallbackFrom: result.fallbackFrom,
attemptedProviders: result.attemptedProviders, attemptedProviders: result.attemptedProviders,
attempts: result.attempts, attempts: result.attempts,
@@ -1268,6 +1483,7 @@ export async function maybeApplyTtsToPayload(params: {
success: false, success: false,
textLength: text.length, textLength: text.length,
summarized: wasSummarized, summarized: wasSummarized,
persona: result.persona,
attemptedProviders: result.attemptedProviders, attemptedProviders: result.attemptedProviders,
attempts: result.attempts, attempts: result.attempts,
error: result.error, error: result.error,

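Review note: the persona lookup order introduced by `resolveTtsPersonaFromPrefs` in this file can be summarized with a small sketch. It is an illustration under assumptions rather than the actual code. `PersonaPrefs`, `PersonaConfig`, `normalizeId`, and `pickPersona` are hypothetical names, and the trim-and-lowercase normalization is a guess, because `normalizeTtsPersonaId` is not shown in this diff.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface PersonaPrefs {
  tts?: { persona?: string | null };
}

interface PersonaConfig {
  persona?: string;
  personas: Record<string, { id: string }>;
}

// Assumption: normalization trims and lowercases the id.
function normalizeId(value: string | null | undefined): string | undefined {
  const trimmed = value?.trim().toLowerCase();
  return trimmed ? trimmed : undefined;
}

function pickPersona(config: PersonaConfig, prefs: PersonaPrefs): { id: string } | undefined {
  // An own `persona` key in prefs wins even when it is null,
  // so an explicit "off" is not overridden by the configured default.
  if (prefs.tts && Object.prototype.hasOwnProperty.call(prefs.tts, "persona")) {
    const fromPrefs = normalizeId(prefs.tts.persona);
    return fromPrefs ? config.personas[fromPrefs] : undefined;
  }
  // Without a prefs entry, fall back to the persona named in config.
  const fromConfig = normalizeId(config.persona);
  return fromConfig ? config.personas[fromConfig] : undefined;
}
```

The design point worth noting is that `setTtsPersona` stores `null` rather than deleting the key, which is what lets the stored preference distinguish "disabled by the user" from "never set".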

@@ -6,6 +6,7 @@ import {
type SpeechProviderConfig, type SpeechProviderConfig,
type SpeechProviderOverrides, type SpeechProviderOverrides,
type SpeechProviderPlugin, type SpeechProviderPlugin,
type SpeechSynthesisTarget,
} from "openclaw/plugin-sdk/speech"; } from "openclaw/plugin-sdk/speech";
import { normalizeLowercaseStringOrEmpty } from "openclaw/plugin-sdk/text-runtime"; import { normalizeLowercaseStringOrEmpty } from "openclaw/plugin-sdk/text-runtime";
import { import {
@@ -48,7 +49,7 @@ function normalizeXaiSpeechResponseFormat(value: unknown): XaiSpeechResponseForm
} }
function resolveSpeechResponseFormat( function resolveSpeechResponseFormat(
target: "audio-file" | "voice-note", target: SpeechSynthesisTarget,
configuredFormat?: XaiSpeechResponseFormat, configuredFormat?: XaiSpeechResponseFormat,
): XaiSpeechResponseFormat { ): XaiSpeechResponseFormat {
if (configuredFormat) { if (configuredFormat) {


@@ -9,16 +9,19 @@ const ttsMocks = vi.hoisted(() => ({
getResolvedSpeechProviderConfig: vi.fn(), getResolvedSpeechProviderConfig: vi.fn(),
getLastTtsAttempt: vi.fn(), getLastTtsAttempt: vi.fn(),
getTtsMaxLength: vi.fn(), getTtsMaxLength: vi.fn(),
getTtsPersona: vi.fn(),
getTtsProvider: vi.fn(), getTtsProvider: vi.fn(),
isSummarizationEnabled: vi.fn(), isSummarizationEnabled: vi.fn(),
isTtsEnabled: vi.fn(), isTtsEnabled: vi.fn(),
isTtsProviderConfigured: vi.fn(), isTtsProviderConfigured: vi.fn(),
listTtsPersonas: vi.fn(),
resolveTtsConfig: vi.fn(), resolveTtsConfig: vi.fn(),
resolveTtsPrefsPath: vi.fn(), resolveTtsPrefsPath: vi.fn(),
setLastTtsAttempt: vi.fn(), setLastTtsAttempt: vi.fn(),
setSummarizationEnabled: vi.fn(), setSummarizationEnabled: vi.fn(),
setTtsEnabled: vi.fn(), setTtsEnabled: vi.fn(),
setTtsMaxLength: vi.fn(), setTtsMaxLength: vi.fn(),
setTtsPersona: vi.fn(),
setTtsProvider: vi.fn(), setTtsProvider: vi.fn(),
textToSpeech: vi.fn(), textToSpeech: vi.fn(),
})); }));
@@ -66,10 +69,12 @@ describe("handleTtsCommands status fallback reporting", () => {
ttsMocks.resolveTtsPrefsPath.mockReturnValue("/tmp/tts-prefs.json"); ttsMocks.resolveTtsPrefsPath.mockReturnValue("/tmp/tts-prefs.json");
ttsMocks.isTtsEnabled.mockReturnValue(true); ttsMocks.isTtsEnabled.mockReturnValue(true);
ttsMocks.getTtsProvider.mockReturnValue(PRIMARY_TTS_PROVIDER); ttsMocks.getTtsProvider.mockReturnValue(PRIMARY_TTS_PROVIDER);
ttsMocks.getTtsPersona.mockReturnValue(undefined);
ttsMocks.isTtsProviderConfigured.mockReturnValue(true); ttsMocks.isTtsProviderConfigured.mockReturnValue(true);
ttsMocks.getTtsMaxLength.mockReturnValue(1500); ttsMocks.getTtsMaxLength.mockReturnValue(1500);
ttsMocks.isSummarizationEnabled.mockReturnValue(true); ttsMocks.isSummarizationEnabled.mockReturnValue(true);
ttsMocks.getLastTtsAttempt.mockReturnValue(undefined); ttsMocks.getLastTtsAttempt.mockReturnValue(undefined);
ttsMocks.listTtsPersonas.mockReturnValue([]);
}); });
it("shows fallback provider details for successful attempts", async () => { it("shows fallback provider details for successful attempts", async () => {
@@ -234,6 +239,24 @@ describe("handleTtsCommands status fallback reporting", () => {
); );
}); });
it("lists and sets configured TTS personas", async () => {
ttsMocks.listTtsPersonas.mockReturnValue([
{
id: "alfred",
label: "Alfred",
provider: "google",
},
]);
const listResult = await handleTtsCommands(buildTtsParams("/tts persona"), true);
expect(listResult?.shouldContinue).toBe(false);
expect(listResult?.reply?.text).toContain("alfred (Alfred) provider=google");
const setResult = await handleTtsCommands(buildTtsParams("/tts persona alfred"), true);
expect(setResult?.shouldContinue).toBe(false);
expect(ttsMocks.setTtsPersona).toHaveBeenCalledWith("/tmp/tts-prefs.json", "alfred");
});
it("reads the latest assistant transcript reply once", async () => { it("reads the latest assistant transcript reply once", async () => {
const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), "openclaw-tts-latest-")); const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), "openclaw-tts-latest-"));
const sessionFile = path.join(tempDir, "session.jsonl"); const sessionFile = path.join(tempDir, "session.jsonl");


@@ -14,16 +14,19 @@ import {
getResolvedSpeechProviderConfig, getResolvedSpeechProviderConfig,
getLastTtsAttempt, getLastTtsAttempt,
getTtsMaxLength, getTtsMaxLength,
getTtsPersona,
getTtsProvider, getTtsProvider,
isSummarizationEnabled, isSummarizationEnabled,
isTtsEnabled, isTtsEnabled,
isTtsProviderConfigured, isTtsProviderConfigured,
listTtsPersonas,
resolveTtsConfig, resolveTtsConfig,
resolveTtsPrefsPath, resolveTtsPrefsPath,
setLastTtsAttempt, setLastTtsAttempt,
setSummarizationEnabled, setSummarizationEnabled,
setTtsEnabled, setTtsEnabled,
setTtsMaxLength, setTtsMaxLength,
setTtsPersona,
setTtsProvider, setTtsProvider,
textToSpeech, textToSpeech,
} from "../../tts/tts.js"; } from "../../tts/tts.js";
@@ -68,7 +71,11 @@ function formatAttemptDetails(attempts: TtsAttemptDetail[] | undefined): string
.map((attempt) => { .map((attempt) => {
const reason = attempt.reasonCode === "success" ? "ok" : attempt.reasonCode; const reason = attempt.reasonCode === "success" ? "ok" : attempt.reasonCode;
const latency = Number.isFinite(attempt.latencyMs) ? ` ${attempt.latencyMs}ms` : ""; const latency = Number.isFinite(attempt.latencyMs) ? ` ${attempt.latencyMs}ms` : "";
return `${attempt.provider}:${attempt.outcome}(${reason})${latency}`; const persona =
attempt.persona && attempt.personaBinding && attempt.personaBinding !== "none"
? ` persona=${attempt.persona}:${attempt.personaBinding}`
: "";
return `${attempt.provider}:${attempt.outcome}(${reason})${persona}${latency}`;
}) })
.join(", "); .join(", ");
} }
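Review note: the attempt formatting in the hunk above can be exercised on its own. The sketch below mirrors the new per-attempt string for a single attempt; `AttemptDetail` is a hypothetical, trimmed-down stand-in for `TtsAttemptDetail`, whose full definition is not part of this diff.

```typescript
// Hypothetical, reduced shape of one attempt record.
interface AttemptDetail {
  provider: string;
  outcome: string;
  reasonCode: string;
  latencyMs?: number;
  persona?: string;
  personaBinding?: "applied" | "missing" | "none";
}

function formatAttempt(attempt: AttemptDetail): string {
  const reason = attempt.reasonCode === "success" ? "ok" : attempt.reasonCode;
  const latency = Number.isFinite(attempt.latencyMs) ? ` ${attempt.latencyMs}ms` : "";
  // The persona suffix is shown only when a persona was active and the binding is not "none".
  const persona =
    attempt.persona && attempt.personaBinding && attempt.personaBinding !== "none"
      ? ` persona=${attempt.persona}:${attempt.personaBinding}`
      : "";
  return `${attempt.provider}:${attempt.outcome}(${reason})${persona}${latency}`;
}
```

For a successful Google attempt with persona `alfred`, an applied binding, and 120 ms latency, this yields `google:success(ok) persona=alfred:applied 120ms`.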
@@ -83,6 +90,7 @@ function ttsUsage(): ReplyPayload {
`• /tts off — Disable TTS\n` + `• /tts off — Disable TTS\n` +
`• /tts status — Show current settings\n` + `• /tts status — Show current settings\n` +
`• /tts provider [name] — View/change provider\n` + `• /tts provider [name] — View/change provider\n` +
`• /tts persona [id|off] — View/change persona\n` +
`• /tts limit [number] — View/change text limit\n` + `• /tts limit [number] — View/change text limit\n` +
`• /tts summary [on|off] — View/change auto-summary\n` + `• /tts summary [on|off] — View/change auto-summary\n` +
`• /tts audio <text> — Generate audio from text\n` + `• /tts audio <text> — Generate audio from text\n` +
@@ -96,6 +104,7 @@ function ttsUsage(): ReplyPayload {
`• Summary OFF: Truncates text, then generates audio\n\n` + `• Summary OFF: Truncates text, then generates audio\n\n` +
`**Examples:**\n` + `**Examples:**\n` +
`/tts provider <id>\n` + `/tts provider <id>\n` +
`/tts persona <id>\n` +
`/tts limit 2000\n` + `/tts limit 2000\n` +
`/tts latest\n` + `/tts latest\n` +
`/tts audio Hello, this is a test!`, `/tts audio Hello, this is a test!`,
@@ -129,6 +138,7 @@ async function buildTtsAudioReply(params: {
textLength: params.text.length, textLength: params.text.length,
summarized: false, summarized: false,
provider: result.provider, provider: result.provider,
persona: result.persona,
fallbackFrom: result.fallbackFrom, fallbackFrom: result.fallbackFrom,
attemptedProviders: result.attemptedProviders, attemptedProviders: result.attemptedProviders,
attempts: result.attempts, attempts: result.attempts,
@@ -150,6 +160,7 @@ async function buildTtsAudioReply(params: {
success: false, success: false,
textLength: params.text.length, textLength: params.text.length,
summarized: false, summarized: false,
persona: result.persona,
attemptedProviders: result.attemptedProviders, attemptedProviders: result.attemptedProviders,
attempts: result.attempts, attempts: result.attempts,
error: result.error, error: result.error,
@@ -349,6 +360,50 @@ export const handleTtsCommands: CommandHandler = async (params, allowTextCommand
}; };
} }
if (action === "persona") {
const personas = listTtsPersonas(config);
const activePersona = getTtsPersona(config, prefsPath);
if (!args.trim()) {
const lines = [
"🎭 TTS persona",
`Active: ${activePersona?.id ?? "none"}`,
personas.length > 0
? personas
.map((persona) => {
const label = persona.label ? ` (${persona.label})` : "";
const provider = persona.provider ? ` provider=${persona.provider}` : "";
return `${persona.id}${label}${provider}`;
})
.join("\n")
: "No personas configured.",
"Usage: /tts persona <id> | off",
];
return { shouldContinue: false, reply: { text: lines.join("\n") } };
}
const requested = normalizeOptionalLowercaseString(args) ?? "";
if (requested === "off" || requested === "none" || requested === "default") {
setTtsPersona(prefsPath, null);
return { shouldContinue: false, reply: { text: "✅ TTS persona disabled." } };
}
const persona = personas.find((entry) => entry.id === requested);
if (!persona) {
return {
shouldContinue: false,
reply: {
text:
`❌ Unknown TTS persona: ${requested || args}.\n` +
`Use /tts persona to list configured personas.`,
},
};
}
setTtsPersona(prefsPath, persona.id);
return {
shouldContinue: false,
reply: { text: `✅ TTS persona set to ${persona.id}.` },
};
}
if (action === "limit") { if (action === "limit") {
if (!args.trim()) { if (!args.trim()) {
const currentLimit = getTtsMaxLength(prefsPath); const currentLimit = getTtsMaxLength(prefsPath);
@@ -410,6 +465,7 @@ export const handleTtsCommands: CommandHandler = async (params, allowTextCommand
if (action === "status") { if (action === "status") {
const enabled = isTtsEnabled(config, prefsPath); const enabled = isTtsEnabled(config, prefsPath);
const provider = getTtsProvider(config, prefsPath); const provider = getTtsProvider(config, prefsPath);
const persona = getTtsPersona(config, prefsPath);
const hasKey = isTtsProviderConfigured(config, provider, params.cfg); const hasKey = isTtsProviderConfigured(config, provider, params.cfg);
const maxLength = getTtsMaxLength(prefsPath); const maxLength = getTtsMaxLength(prefsPath);
const summarize = isSummarizationEnabled(prefsPath); const summarize = isSummarizationEnabled(prefsPath);
@@ -419,6 +475,7 @@ export const handleTtsCommands: CommandHandler = async (params, allowTextCommand
`State: ${enabled ? "✅ enabled" : "❌ disabled"}`, `State: ${enabled ? "✅ enabled" : "❌ disabled"}`,
`Chat override: ${params.sessionEntry?.ttsAuto ?? "default"}`, `Chat override: ${params.sessionEntry?.ttsAuto ?? "default"}`,
`Provider: ${provider} (${hasKey ? "✅ configured" : "❌ not configured"})`, `Provider: ${provider} (${hasKey ? "✅ configured" : "❌ not configured"})`,
`Persona: ${persona?.id ?? "none"}`,
`Text limit: ${maxLength} chars`, `Text limit: ${maxLength} chars`,
`Auto-summary: ${summarize ? "on" : "off"}`, `Auto-summary: ${summarize ? "on" : "off"}`,
]; ];
@@ -429,6 +486,9 @@ export const handleTtsCommands: CommandHandler = async (params, allowTextCommand
lines.push(`Text: ${last.textLength} chars${last.summarized ? " (summarized)" : ""}`); lines.push(`Text: ${last.textLength} chars${last.summarized ? " (summarized)" : ""}`);
if (last.success) { if (last.success) {
lines.push(`Provider: ${last.provider ?? "unknown"}`); lines.push(`Provider: ${last.provider ?? "unknown"}`);
if (last.persona) {
lines.push(`Persona: ${last.persona}`);
}
if (last.fallbackFrom && last.provider && last.fallbackFrom !== last.provider) { if (last.fallbackFrom && last.provider && last.fallbackFrom !== last.provider) {
lines.push(`Fallback: ${last.fallbackFrom} -> ${last.provider}`); lines.push(`Fallback: ${last.fallbackFrom} -> ${last.provider}`);
} }

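Review note: the argument handling of the new `/tts persona` action can be sketched as a pure function. This is an illustrative reduction, not the handler itself. `PersonaCommand` and `parsePersonaArg` are hypothetical names, and the trim-and-lowercase step assumes that is what `normalizeOptionalLowercaseString` does.

```typescript
// Hypothetical result type describing what the handler would do next.
type PersonaCommand =
  | { kind: "list" }
  | { kind: "clear" }
  | { kind: "set"; id: string }
  | { kind: "unknown"; requested: string };

function parsePersonaArg(args: string, knownIds: readonly string[]): PersonaCommand {
  // Assumption: normalization trims whitespace and lowercases the argument.
  const requested = args.trim().toLowerCase();
  if (!requested) {
    return { kind: "list" };
  }
  // The reserved words are checked before the persona lookup.
  if (requested === "off" || requested === "none" || requested === "default") {
    return { kind: "clear" };
  }
  return knownIds.includes(requested)
    ? { kind: "set", id: requested }
    : { kind: "unknown", requested };
}
```

One consequence of this ordering is that a persona whose id is `off`, `none`, or `default` cannot be selected through the slash command, since those words always clear the preference.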

@@ -73,6 +73,7 @@ const mocks = vi.hoisted(() => ({
attempts: [], attempts: [],
})), })),
setTtsProvider: vi.fn(), setTtsProvider: vi.fn(),
setTtsPersona: vi.fn(),
resolveExplicitTtsOverrides: vi.fn( resolveExplicitTtsOverrides: vi.fn(
({ ({
provider, provider,
@@ -220,11 +221,14 @@ vi.mock("../video-generation/runtime.js", () => ({
})); }));
vi.mock("../tts/tts.js", () => ({ vi.mock("../tts/tts.js", () => ({
getTtsPersona: vi.fn(() => undefined),
getTtsProvider: vi.fn(() => "openai"), getTtsProvider: vi.fn(() => "openai"),
listTtsPersonas: vi.fn(() => []),
listSpeechVoices: vi.fn(async () => []), listSpeechVoices: vi.fn(async () => []),
resolveTtsConfig: vi.fn(() => ({})), resolveTtsConfig: vi.fn(() => ({})),
resolveTtsPrefsPath: vi.fn(() => "/tmp/tts.json"), resolveTtsPrefsPath: vi.fn(() => "/tmp/tts.json"),
setTtsEnabled: vi.fn(), setTtsEnabled: vi.fn(),
setTtsPersona: mocks.setTtsPersona as typeof import("../tts/tts.js").setTtsPersona,
setTtsProvider: mocks.setTtsProvider as typeof import("../tts/tts.js").setTtsProvider, setTtsProvider: mocks.setTtsProvider as typeof import("../tts/tts.js").setTtsProvider,
resolveExplicitTtsOverrides: resolveExplicitTtsOverrides:
mocks.resolveExplicitTtsOverrides as typeof import("../tts/tts.js").resolveExplicitTtsOverrides, mocks.resolveExplicitTtsOverrides as typeof import("../tts/tts.js").resolveExplicitTtsOverrides,


@@ -56,11 +56,14 @@ import { theme } from "../terminal/theme.js";
import { canonicalizeSpeechProviderId, listSpeechProviders } from "../tts/provider-registry.js"; import { canonicalizeSpeechProviderId, listSpeechProviders } from "../tts/provider-registry.js";
import { import {
getTtsProvider, getTtsProvider,
getTtsPersona,
listTtsPersonas,
listSpeechVoices, listSpeechVoices,
resolveExplicitTtsOverrides, resolveExplicitTtsOverrides,
resolveTtsConfig, resolveTtsConfig,
resolveTtsPrefsPath, resolveTtsPrefsPath,
setTtsEnabled, setTtsEnabled,
setTtsPersona,
setTtsProvider, setTtsProvider,
textToSpeech, textToSpeech,
} from "../tts/tts.js"; } from "../tts/tts.js";
@@ -256,6 +259,13 @@ const CAPABILITY_METADATA: CapabilityMetadata[] = [
flags: ["--local", "--gateway", "--json"], flags: ["--local", "--gateway", "--json"],
resultShape: "provider ids, configured state, models, voices", resultShape: "provider ids, configured state, models, voices",
}, },
{
id: "tts.personas",
description: "List TTS personas.",
transports: ["local", "gateway"],
flags: ["--local", "--gateway", "--json"],
resultShape: "persona ids, labels, providers, active persona",
},
{ {
id: "tts.status", id: "tts.status",
description: "Show gateway-managed TTS state.", description: "Show gateway-managed TTS state.",
@@ -284,6 +294,13 @@ const CAPABILITY_METADATA: CapabilityMetadata[] = [
flags: ["--provider", "--local", "--gateway", "--json"], flags: ["--provider", "--local", "--gateway", "--json"],
resultShape: "selected provider", resultShape: "selected provider",
}, },
{
id: "tts.set-persona",
description: "Set the active TTS persona.",
transports: ["local", "gateway"],
flags: ["--persona", "--off", "--local", "--gateway", "--json"],
resultShape: "selected persona",
},
{ {
id: "video.generate", id: "video.generate",
description: "Generate video files with configured video providers.", description: "Generate video files with configured video providers.",
@@ -1181,6 +1198,30 @@ async function runTtsProviders(transport: CapabilityTransport) {
}; };
} }
async function runTtsPersonas(transport: CapabilityTransport) {
if (transport === "gateway") {
return await callGateway({
method: "tts.personas",
timeoutMs: 30_000,
});
}
const cfg = loadConfig();
const config = resolveTtsConfig(cfg);
const prefsPath = resolveTtsPrefsPath(config);
const active = getTtsPersona(config, prefsPath);
return {
active: active?.id ?? null,
personas: listTtsPersonas(config).map((persona) => ({
id: persona.id,
label: persona.label,
description: persona.description,
provider: persona.provider,
fallbackPolicy: persona.fallbackPolicy,
providers: Object.keys(persona.providers ?? {}),
})),
};
}
async function runTtsVoices(providerRaw?: string) {
const cfg = loadConfig();
const config = resolveTtsConfig(cfg);
@@ -1194,9 +1235,10 @@ async function runTtsVoices(providerRaw?: string) {
}
async function runTtsStateMutation(params: {
- capability: "tts.enable" | "tts.disable" | "tts.set-provider";
+ capability: "tts.enable" | "tts.disable" | "tts.set-provider" | "tts.set-persona";
transport: CapabilityTransport;
provider?: string;
persona?: string | null;
}) {
if (params.transport === "gateway") {
const method =
@@ -1204,10 +1246,17 @@ async function runTtsStateMutation(params: {
? "tts.enable"
: params.capability === "tts.disable"
? "tts.disable"
- : "tts.setProvider";
+ : params.capability === "tts.set-provider"
+ ? "tts.setProvider"
+ : "tts.setPersona";
const payload = await callGateway({
method,
- params: params.provider ? { provider: params.provider } : undefined,
+ params:
+ params.capability === "tts.set-provider"
+ ? { provider: params.provider }
+ : params.capability === "tts.set-persona"
+ ? { persona: params.persona ?? "off" }
+ : undefined,
timeoutMs: 30_000,
});
return payload;
@@ -1224,6 +1273,20 @@ async function runTtsStateMutation(params: {
setTtsEnabled(prefsPath, false);
return { enabled: false };
}
if (params.capability === "tts.set-persona") {
if (!params.persona) {
setTtsPersona(prefsPath, null);
return { persona: null };
}
const persona = listTtsPersonas(config).find(
(entry) => entry.id === normalizeLowercaseStringOrEmpty(params.persona ?? ""),
);
if (!persona) {
throw new Error(`Unknown TTS persona: ${params.persona}`);
}
setTtsPersona(prefsPath, persona.id);
return { persona: persona.id };
}
if (!params.provider) {
throw new Error("--provider is required");
}
@@ -1746,6 +1809,27 @@ export function registerCapabilityCli(program: Command) {
});
});
tts
.command("personas")
.description("List TTS personas")
.option("--local", "Force local execution", false)
.option("--gateway", "Force gateway execution", false)
.option("--json", "Output JSON", false)
.action(async (opts) => {
await runCommandWithRuntime(defaultRuntime, async () => {
const transport = resolveTransport({
local: Boolean(opts.local),
gateway: Boolean(opts.gateway),
supported: ["local", "gateway"],
defaultTransport: "local",
});
const result = await runTtsPersonas(transport);
emitJsonOrText(defaultRuntime, Boolean(opts.json), result, (value) =>
JSON.stringify(value, null, 2),
);
});
});
tts
.command("status")
.description("Show TTS status")
@@ -1823,6 +1907,36 @@ export function registerCapabilityCli(program: Command) {
});
});
tts
.command("set-persona")
.description("Set the active TTS persona")
.option("--persona <id>", "TTS persona id")
.option("--off", "Disable the active TTS persona", false)
.option("--local", "Force local execution", false)
.option("--gateway", "Force gateway execution", false)
.option("--json", "Output JSON", false)
.action(async (opts) => {
await runCommandWithRuntime(defaultRuntime, async () => {
const transport = resolveTransport({
local: Boolean(opts.local),
gateway: Boolean(opts.gateway),
supported: ["local", "gateway"],
defaultTransport: "gateway",
});
if (!opts.off && !opts.persona) {
throw new Error("--persona is required unless --off is set");
}
const result = await runTtsStateMutation({
capability: "tts.set-persona",
persona: opts.off ? null : String(opts.persona),
transport,
});
emitJsonOrText(defaultRuntime, Boolean(opts.json), result, (value) =>
JSON.stringify(value, null, 2),
);
});
});
const video = capability.command("video").description("Video generation and description");
video
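
A minimal sketch of how the `set-persona` options above map onto the gateway request, assuming only what the hunk shows: `--persona` is required unless `--off` is set, `--off` wins when both are given, and a cleared persona travels as the string `"off"`. The helper names `resolvePersonaArg` and `buildGatewayCall` are hypothetical and do not appear in the commit.

```typescript
// Hypothetical helpers; they restate the CLI logic in isolation.
type SetPersonaOptions = { persona?: string; off?: boolean };
type GatewayCall = { method: "tts.setPersona"; params: { persona: string } };

// Mirrors the CLI check: --persona is required unless --off is set.
function resolvePersonaArg(opts: SetPersonaOptions): string | null {
  if (!opts.off && !opts.persona) {
    throw new Error("--persona is required unless --off is set");
  }
  return opts.off ? null : String(opts.persona);
}

// On the gateway transport a cleared persona (null) is sent as "off".
function buildGatewayCall(persona: string | null): GatewayCall {
  return { method: "tts.setPersona", params: { persona: persona ?? "off" } };
}

const cleared = buildGatewayCall(resolvePersonaArg({ off: true }));
const selected = buildGatewayCall(resolvePersonaArg({ persona: "alfred" }));
console.log(cleared.params.persona, selected.params.persona);
```

Keeping `null` as the in-process value and `"off"` as the wire value means the local transport can clear the preference directly while the gateway handler reuses its string sentinels.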

View File

@@ -19116,6 +19116,222 @@ export const GENERATED_BASE_CONFIG_SCHEMA: BaseConfigSchemaResponse = {
type: "string",
minLength: 1,
},
persona: {
type: "string",
title: "TTS Persona",
description:
"Default TTS persona id. Local TTS persona preferences can override this per host.",
},
personas: {
type: "object",
propertyNames: {
type: "string",
},
additionalProperties: {
type: "object",
properties: {
label: {
type: "string",
},
description: {
type: "string",
},
provider: {
type: "string",
minLength: 1,
},
fallbackPolicy: {
anyOf: [
{
type: "string",
const: "preserve-persona",
},
{
type: "string",
const: "provider-defaults",
},
{
type: "string",
const: "fail",
},
],
},
prompt: {
type: "object",
properties: {
profile: {
type: "string",
},
scene: {
type: "string",
},
sampleContext: {
type: "string",
},
style: {
type: "string",
},
accent: {
type: "string",
},
pacing: {
type: "string",
},
constraints: {
type: "array",
items: {
type: "string",
},
},
},
additionalProperties: false,
title: "TTS Persona Prompt",
description:
"Provider-neutral persona prompt intent. Providers decide whether and how to map this into request instructions.",
},
rewrite: {
type: "object",
properties: {
enabled: {
type: "boolean",
},
model: {
type: "string",
},
preserveMeaning: {
type: "boolean",
},
compressForSpeech: {
type: "boolean",
},
inCharacter: {
type: "boolean",
},
maxChars: {
type: "integer",
minimum: 1,
maximum: 9007199254740991,
},
},
additionalProperties: false,
},
providers: {
type: "object",
propertyNames: {
type: "string",
},
additionalProperties: {
type: "object",
properties: {
apiKey: {
anyOf: [
{
type: "string",
},
{
oneOf: [
{
type: "object",
properties: {
source: {
type: "string",
const: "env",
},
provider: {
type: "string",
pattern: "^[a-z][a-z0-9_-]{0,63}$",
},
id: {
type: "string",
pattern: "^[A-Z][A-Z0-9_]{0,127}$",
},
},
required: ["source", "provider", "id"],
additionalProperties: false,
},
{
type: "object",
properties: {
source: {
type: "string",
const: "file",
},
provider: {
type: "string",
pattern: "^[a-z][a-z0-9_-]{0,63}$",
},
id: {
type: "string",
},
},
required: ["source", "provider", "id"],
additionalProperties: false,
},
{
type: "object",
properties: {
source: {
type: "string",
const: "exec",
},
provider: {
type: "string",
pattern: "^[a-z][a-z0-9_-]{0,63}$",
},
id: {
type: "string",
},
},
required: ["source", "provider", "id"],
additionalProperties: false,
},
],
},
],
},
},
additionalProperties: {
anyOf: [
{
type: "string",
},
{
type: "number",
},
{
type: "boolean",
},
{
type: "null",
},
{
type: "array",
items: {},
},
{
type: "object",
propertyNames: {
type: "string",
},
additionalProperties: {},
},
],
},
},
title: "TTS Persona Provider Bindings",
description:
"Provider-specific TTS persona bindings keyed by speech provider id. These merge over messages.tts.providers for the active persona.",
},
},
additionalProperties: false,
title: "TTS Persona",
description:
"One TTS persona. Use provider-specific bindings for exact voices/models and prompt templates.",
},
title: "TTS Personas",
description:
"Named TTS personas that define stable spoken identity plus provider-specific speech bindings.",
},
summaryModel: {
type: "string",
},
@@ -27520,6 +27736,31 @@ export const GENERATED_BASE_CONFIG_SCHEMA: BaseConfigSchemaResponse = {
help: "Text-to-speech policy for reading agent replies aloud on supported voice or audio surfaces. Keep disabled unless voice playback is part of your operator/user workflow.",
tags: ["media"],
},
"messages.tts.persona": {
label: "TTS Persona",
help: "Default TTS persona id. Local TTS persona preferences can override this per host.",
tags: ["media"],
},
"messages.tts.personas": {
label: "TTS Personas",
help: "Named TTS personas that define stable spoken identity plus provider-specific speech bindings.",
tags: ["media"],
},
"messages.tts.personas.*": {
label: "TTS Persona",
help: "One TTS persona. Use provider-specific bindings for exact voices/models and prompt templates.",
tags: ["media"],
},
"messages.tts.personas.*.prompt": {
label: "TTS Persona Prompt",
help: "Provider-neutral persona prompt intent. Providers decide whether and how to map this into request instructions.",
tags: ["media"],
},
"messages.tts.personas.*.providers": {
label: "TTS Persona Provider Bindings",
help: "Provider-specific TTS persona bindings keyed by speech provider id. These merge over messages.tts.providers for the active persona.",
tags: ["media"],
},
"messages.tts.providers": {
label: "TTS Provider Settings",
help: "Provider-specific TTS settings keyed by speech provider id. Use this instead of bundled provider-specific top-level keys so speech plugins stay decoupled from core config schema.",
@@ -28081,6 +28322,10 @@ export const GENERATED_BASE_CONFIG_SCHEMA: BaseConfigSchemaResponse = {
sensitive: true,
tags: ["security", "media", "tools"],
},
"messages.tts.personas.*.providers.*.apiKey": {
sensitive: true,
tags: ["security", "auth", "media"],
},
"mcp.servers.*.headers.*": {
sensitive: true,
tags: ["security"],

View File

@@ -1589,6 +1589,16 @@ export const FIELD_HELP: Record<string, string> = {
"Removes the acknowledgment reaction after final reply delivery when enabled. Keep enabled for cleaner UX in channels where persistent ack reactions create clutter.",
"messages.tts":
"Text-to-speech policy for reading agent replies aloud on supported voice or audio surfaces. Keep disabled unless voice playback is part of your operator/user workflow.",
"messages.tts.persona":
"Default TTS persona id. Local TTS persona preferences can override this per host.",
"messages.tts.personas":
"Named TTS personas that define stable spoken identity plus provider-specific speech bindings.",
"messages.tts.personas.*":
"One TTS persona. Use provider-specific bindings for exact voices/models and prompt templates.",
"messages.tts.personas.*.prompt":
"Provider-neutral persona prompt intent. Providers decide whether and how to map this into request instructions.",
"messages.tts.personas.*.providers":
"Provider-specific TTS persona bindings keyed by speech provider id. These merge over messages.tts.providers for the active persona.",
"messages.tts.providers":
"Provider-specific TTS settings keyed by speech provider id. Use this instead of bundled provider-specific top-level keys so speech plugins stay decoupled from core config schema.",
"messages.tts.providers.*": "messages.tts.providers.*":

View File

@@ -820,6 +820,11 @@ export const FIELD_LABELS: Record<string, string> = {
"messages.inbound.debounceMs": "Inbound Message Debounce (ms)",
"messages.inbound.byChannel": "Inbound Debounce by Channel (ms)",
"messages.tts": "Message Text-to-Speech",
"messages.tts.persona": "TTS Persona",
"messages.tts.personas": "TTS Personas",
"messages.tts.personas.*": "TTS Persona",
"messages.tts.personas.*.prompt": "TTS Persona Prompt",
"messages.tts.personas.*.providers": "TTS Persona Provider Bindings",
"messages.tts.providers": "TTS Provider Settings",
"messages.tts.providers.*": "TTS Provider Config",
"messages.tts.providers.*.apiKey": "TTS Provider API Key", // pragma: allowlist secret

View File

@@ -25,6 +25,43 @@ export type TtsModelOverrideConfig = {
export type TtsProviderConfigMap = Record<string, Record<string, unknown>>;
export type TtsPersonaFallbackPolicy = "preserve-persona" | "provider-defaults" | "fail";
export type TtsPersonaPromptConfig = {
profile?: string;
scene?: string;
sampleContext?: string;
style?: string;
accent?: string;
pacing?: string;
constraints?: string[];
};
export type TtsPersonaRewriteConfig = {
enabled?: boolean;
model?: string;
preserveMeaning?: boolean;
compressForSpeech?: boolean;
inCharacter?: boolean;
maxChars?: number;
};
export type TtsPersonaConfig = {
label?: string;
description?: string;
/** Preferred provider for this persona. Explicit provider prefs still win. */
provider?: TtsProvider;
fallbackPolicy?: TtsPersonaFallbackPolicy;
prompt?: TtsPersonaPromptConfig;
rewrite?: TtsPersonaRewriteConfig;
/** Provider-specific persona bindings keyed by speech provider id. */
providers?: TtsProviderConfigMap;
};
export type ResolvedTtsPersona = TtsPersonaConfig & {
id: string;
};
export type TtsConfig = {
/** Auto-TTS mode (preferred). */
auto?: TtsAutoMode;
@@ -34,6 +71,10 @@ export type TtsConfig = {
mode?: TtsMode;
/** Primary TTS provider (fallbacks are automatic). */
provider?: TtsProvider;
/** Active TTS persona id. */
persona?: string;
/** Named TTS personas. */
personas?: Record<string, TtsPersonaConfig>;
/** Optional model override for TTS auto-summary (provider/model or alias). */
summaryModel?: string;
/** Allow the model to override TTS parameters. */
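
The field help states that persona provider bindings "merge over messages.tts.providers for the active persona". The sketch below illustrates one way such a merge could behave, assuming a shallow per-provider merge in which persona keys win. The real resolution logic is not shown in this part of the diff, so `mergePersonaProviders` is a hypothetical helper, and the base voice `"alloy"` is only an illustrative value.

```typescript
type ProviderConfigMap = Record<string, Record<string, unknown>>;

// Hypothetical helper: shallow per-provider merge where persona keys win.
// Inputs are not mutated; a new map is returned.
function mergePersonaProviders(
  base: ProviderConfigMap,
  persona: ProviderConfigMap | undefined,
): ProviderConfigMap {
  const merged: ProviderConfigMap = {};
  // Sort provider ids so the result does not depend on key insertion order.
  const ids = Object.keys({ ...base, ...(persona ?? {}) }).sort();
  for (const id of ids) {
    merged[id] = { ...(base[id] ?? {}), ...(persona?.[id] ?? {}) };
  }
  return merged;
}

const baseProviders: ProviderConfigMap = {
  openai: { model: "gpt-4o-mini-tts", voice: "alloy" },
};
const alfredBindings: ProviderConfigMap = {
  openai: { voice: "cedar", instructions: "Speak with dry warmth." },
  google: { voiceName: "Algieba", promptTemplate: "audio-profile-v1" },
};
const effective = mergePersonaProviders(baseProviders, alfredBindings);
console.log(JSON.stringify(effective, null, 2));
```

Under these assumptions the persona overrides `voice` for OpenAI, keeps the base `model`, and contributes a Google binding that the base config lacks.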

View File

@@ -497,12 +497,48 @@ const TtsProviderConfigSchema = z
z.record(z.string(), z.unknown()),
]),
);
const TtsPersonaPromptSchema = z
.object({
profile: z.string().optional(),
scene: z.string().optional(),
sampleContext: z.string().optional(),
style: z.string().optional(),
accent: z.string().optional(),
pacing: z.string().optional(),
constraints: z.array(z.string()).optional(),
})
.strict();
const TtsPersonaRewriteSchema = z
.object({
enabled: z.boolean().optional(),
model: z.string().optional(),
preserveMeaning: z.boolean().optional(),
compressForSpeech: z.boolean().optional(),
inCharacter: z.boolean().optional(),
maxChars: z.number().int().min(1).optional(),
})
.strict();
const TtsPersonaSchema = z
.object({
label: z.string().optional(),
description: z.string().optional(),
provider: TtsProviderSchema.optional(),
fallbackPolicy: z
.union([z.literal("preserve-persona"), z.literal("provider-defaults"), z.literal("fail")])
.optional(),
prompt: TtsPersonaPromptSchema.optional(),
rewrite: TtsPersonaRewriteSchema.optional(),
providers: z.record(z.string(), TtsProviderConfigSchema).optional(),
})
.strict();
export const TtsConfigSchema = z
.object({
auto: TtsAutoSchema.optional(),
enabled: z.boolean().optional(),
mode: TtsModeSchema.optional(),
provider: TtsProviderSchema.optional(),
persona: z.string().optional(),
personas: z.record(z.string(), TtsPersonaSchema).optional(),
summaryModel: z.string().optional(),
modelOverrides: z
.object({
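
Two of the constraints in `TtsPersonaSchema` and `TtsPersonaRewriteSchema` can be restated without zod, which may help when reading the generated JSON schema alongside the zod source. This is a sketch only: `isFallbackPolicy` and `isValidMaxChars` are hypothetical names, not helpers from the commit.

```typescript
// Hypothetical helpers that restate two schema constraints without zod.
const FALLBACK_POLICIES = ["preserve-persona", "provider-defaults", "fail"] as const;
type FallbackPolicy = (typeof FALLBACK_POLICIES)[number];

// True only for the three literals the schema accepts.
function isFallbackPolicy(value: unknown): value is FallbackPolicy {
  return (
    typeof value === "string" &&
    (FALLBACK_POLICIES as readonly string[]).indexOf(value) !== -1
  );
}

// Mirrors z.number().int().min(1): a whole number of at least 1.
function isValidMaxChars(value: unknown): boolean {
  return typeof value === "number" && Number.isInteger(value) && value >= 1;
}

console.log(isFallbackPolicy("preserve-persona"), isFallbackPolicy("retry"));
console.log(isValidMaxChars(1500), isValidMaxChars(0), isValidMaxChars(1.5));
```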

View File

@@ -39,4 +39,47 @@ describe("TtsConfigSchema openai speed and instructions", () => {
}),
).not.toThrow();
});
it("accepts provider-specific persona bindings and structured prompt fields", () => {
expect(() =>
TtsConfigSchema.parse({
persona: "alfred",
personas: {
alfred: {
label: "Alfred",
description: "Dry, warm British butler narrator.",
provider: "google",
fallbackPolicy: "preserve-persona",
prompt: {
profile: "A brilliant British butler.",
scene: "A quiet late-night study.",
sampleContext: "The speaker is answering a trusted operator.",
style: "Refined and lightly amused.",
accent: "British English.",
pacing: "Measured.",
constraints: ["Do not read configuration values aloud."],
},
rewrite: {
enabled: false,
preserveMeaning: true,
compressForSpeech: true,
maxChars: 1500,
},
providers: {
google: {
model: "gemini-3.1-flash-tts-preview",
voiceName: "Algieba",
promptTemplate: "audio-profile-v1",
},
openai: {
model: "gpt-4o-mini-tts",
voice: "cedar",
instructions: "Speak with dry warmth.",
},
},
},
},
}),
).not.toThrow();
});
});

View File

@@ -78,6 +78,7 @@ const METHOD_SCOPE_GROUPS: Record<OperatorScope, readonly string[]> = {
"usage.cost",
"tts.status",
"tts.providers",
"tts.personas",
"commands.list",
"models.list",
"models.authStatus",
@@ -131,6 +132,7 @@ const METHOD_SCOPE_GROUPS: Record<OperatorScope, readonly string[]> = {
"tts.disable",
"tts.convert",
"tts.setProvider",
"tts.setPersona",
"voicewake.set",
"node.invoke",
"chat.send",

View File

@@ -20,10 +20,12 @@ const BASE_METHODS = [
"usage.cost",
"tts.status",
"tts.providers",
"tts.personas",
"tts.enable",
"tts.disable",
"tts.convert",
"tts.setProvider",
"tts.setPersona",
"config.get",
"config.set",
"config.apply",

View File

@@ -25,9 +25,11 @@ vi.mock("../../tts/provider-registry.js", () => ({
vi.mock("../../tts/tts.js", () => ({
getResolvedSpeechProviderConfig: vi.fn(),
getTtsPersona: vi.fn(() => undefined),
getTtsProvider: vi.fn(() => "openai"),
isTtsEnabled: vi.fn(() => true),
isTtsProviderConfigured: vi.fn(() => true),
listTtsPersonas: vi.fn(() => []),
resolveExplicitTtsOverrides:
mocks.resolveExplicitTtsOverrides as typeof import("../../tts/tts.js").resolveExplicitTtsOverrides,
resolveTtsAutoMode: vi.fn(() => false),
@@ -35,6 +37,7 @@ vi.mock("../../tts/tts.js", () => ({
resolveTtsPrefsPath: vi.fn(() => "/tmp/tts.json"),
resolveTtsProviderOrder: vi.fn(() => ["openai"]),
setTtsEnabled: vi.fn(),
setTtsPersona: vi.fn(),
setTtsProvider: vi.fn(),
textToSpeech: mocks.textToSpeech as typeof import("../../tts/tts.js").textToSpeech,
}));

View File

@@ -7,15 +7,18 @@ import {
} from "../../tts/provider-registry.js";
import {
getResolvedSpeechProviderConfig,
getTtsPersona,
getTtsProvider,
isTtsEnabled,
isTtsProviderConfigured,
listTtsPersonas,
resolveExplicitTtsOverrides,
resolveTtsAutoMode,
resolveTtsConfig,
resolveTtsPrefsPath,
resolveTtsProviderOrder,
setTtsEnabled,
setTtsPersona,
setTtsProvider,
textToSpeech,
} from "../../tts/tts.js";
@@ -30,6 +33,7 @@ export const ttsHandlers: GatewayRequestHandlers = {
const config = resolveTtsConfig(cfg);
const prefsPath = resolveTtsPrefsPath(config);
const provider = getTtsProvider(config, prefsPath);
const persona = getTtsPersona(config, prefsPath);
const autoMode = resolveTtsAutoMode({ config, prefsPath });
const fallbackProviders = resolveTtsProviderOrder(provider, cfg)
.slice(1)
@@ -47,6 +51,13 @@ export const ttsHandlers: GatewayRequestHandlers = {
enabled: isTtsEnabled(config, prefsPath),
auto: autoMode,
provider,
persona: persona?.id ?? null,
personas: listTtsPersonas(config).map((entry) => ({
id: entry.id,
label: entry.label,
description: entry.description,
provider: entry.provider,
})),
fallbackProvider: fallbackProviders[0] ?? null,
fallbackProviders,
prefsPath,
@@ -157,6 +168,58 @@ export const ttsHandlers: GatewayRequestHandlers = {
respond(false, undefined, errorShape(ErrorCodes.UNAVAILABLE, formatForLog(err)));
}
},
"tts.personas": async ({ respond }) => {
try {
const cfg = loadConfig();
const config = resolveTtsConfig(cfg);
const prefsPath = resolveTtsPrefsPath(config);
const active = getTtsPersona(config, prefsPath);
respond(true, {
active: active?.id ?? null,
personas: listTtsPersonas(config).map((persona) => ({
id: persona.id,
label: persona.label,
description: persona.description,
provider: persona.provider,
fallbackPolicy: persona.fallbackPolicy,
providers: Object.keys(persona.providers ?? {}),
})),
});
} catch (err) {
respond(false, undefined, errorShape(ErrorCodes.UNAVAILABLE, formatForLog(err)));
}
},
"tts.setPersona": async ({ params, respond }) => {
const cfg = loadConfig();
const rawPersona = normalizeOptionalString(params.persona);
try {
const config = resolveTtsConfig(cfg);
const prefsPath = resolveTtsPrefsPath(config);
if (!rawPersona || ["off", "none", "default"].includes(rawPersona.toLowerCase())) {
setTtsPersona(prefsPath, null);
respond(true, { persona: null });
return;
}
const persona = listTtsPersonas(config).find(
(entry) => entry.id === rawPersona.toLowerCase(),
);
if (!persona) {
respond(
false,
undefined,
errorShape(
ErrorCodes.INVALID_REQUEST,
"Invalid persona. Use a configured TTS persona id.",
),
);
return;
}
setTtsPersona(prefsPath, persona.id);
respond(true, { persona: persona.id });
} catch (err) {
respond(false, undefined, errorShape(ErrorCodes.UNAVAILABLE, formatForLog(err)));
}
},
"tts.providers": async ({ respond }) => {
try {
const cfg = loadConfig();
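
The `tts.setPersona` handler above resolves a request in three ways: clear the persona for an empty value or one of the sentinels `off`, `none`, `default`; select a configured persona by lowercased id; or reject the request. A minimal sketch of that decision as a pure function follows. `resolvePersonaRequest` and the `Resolution` type are hypothetical, and the sketch assumes that configured persona ids are already lowercase and that `normalizeOptionalString` trims its input.

```typescript
// Hypothetical pure-function restatement of the handler's branching.
type Persona = { id: string; label?: string };
type Resolution = { kind: "clear" } | { kind: "set"; id: string } | { kind: "invalid" };

const CLEAR_VALUES = ["off", "none", "default"];

function resolvePersonaRequest(raw: string | undefined, personas: Persona[]): Resolution {
  // Assumption: trimming stands in for normalizeOptionalString.
  const value = raw?.trim();
  if (!value || CLEAR_VALUES.indexOf(value.toLowerCase()) !== -1) {
    return { kind: "clear" };
  }
  // Lookup is case-insensitive on the request side only.
  const wanted = value.toLowerCase();
  const match = personas.filter((entry) => entry.id === wanted)[0];
  return match ? { kind: "set", id: match.id } : { kind: "invalid" };
}

const configured: Persona[] = [{ id: "alfred", label: "Alfred" }];
console.log(resolvePersonaRequest("Alfred", configured));
console.log(resolvePersonaRequest("OFF", configured));
console.log(resolvePersonaRequest("jeeves", configured));
```

One consequence worth noting: because `off`, `none`, and `default` are treated as sentinels before the lookup, a persona configured under one of those ids could not be selected through this method.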

View File

@@ -133,10 +133,15 @@ export type {
TelegramInlineButtonsScope,
TelegramNetworkConfig,
TelegramTopicConfig,
ResolvedTtsPersona,
TtsAutoMode,
TtsConfig,
TtsMode,
TtsModelOverrideConfig,
TtsPersonaConfig,
TtsPersonaFallbackPolicy,
TtsPersonaPromptConfig,
TtsPersonaRewriteConfig,
TtsProvider,
} from "../config/types.js";
export {

View File

@@ -9,11 +9,14 @@ export type {
SpeechModelOverridePolicy,
SpeechProviderConfig,
SpeechProviderConfiguredContext,
SpeechProviderPreparedSynthesis,
SpeechProviderPrepareSynthesisContext,
SpeechProviderResolveConfigContext,
SpeechProviderResolveTalkConfigContext,
SpeechProviderResolveTalkOverridesContext,
SpeechProviderOverrides,
SpeechSynthesisRequest,
SpeechSynthesisTarget,
SpeechTelephonySynthesisRequest,
SpeechVoiceOption,
TtsDirectiveOverrides,
@@ -35,6 +38,7 @@ export {
listSpeechProviders,
normalizeSpeechProviderId,
} from "../tts/provider-registry.js";
export { resolveEffectiveTtsConfig } from "../tts/tts-config.js";
export { normalizeTtsAutoMode, TTS_AUTO_MODES } from "../tts/tts-auto-mode.js";
export {
asBoolean,

View File

@@ -12,11 +12,14 @@ export type {
SpeechModelOverridePolicy,
SpeechProviderConfig,
SpeechProviderConfiguredContext,
SpeechProviderPreparedSynthesis,
SpeechProviderPrepareSynthesisContext,
SpeechProviderResolveConfigContext,
SpeechProviderResolveTalkConfigContext,
SpeechProviderResolveTalkOverridesContext,
SpeechProviderOverrides,
SpeechSynthesisRequest,
SpeechSynthesisTarget,
SpeechTelephonySynthesisRequest,
SpeechVoiceOption,
TtsDirectiveOverrides,

View File

@@ -40,6 +40,10 @@ export const getTtsMaxLength: FacadeModule["getTtsMaxLength"] = createLazyFacade
loadFacadeModule,
"getTtsMaxLength",
);
export const getTtsPersona: FacadeModule["getTtsPersona"] = createLazyFacadeRuntimeValue(
loadFacadeModule,
"getTtsPersona",
);
export const getTtsProvider: FacadeModule["getTtsProvider"] = createLazyFacadeRuntimeValue(
loadFacadeModule,
"getTtsProvider",
@@ -56,6 +60,10 @@ export const listSpeechVoices: FacadeModule["listSpeechVoices"] = createLazyFaca
loadFacadeModule,
"listSpeechVoices",
);
export const listTtsPersonas: FacadeModule["listTtsPersonas"] = createLazyFacadeRuntimeValue(
loadFacadeModule,
"listTtsPersonas",
);
export const maybeApplyTtsToPayload: FacadeModule["maybeApplyTtsToPayload"] =
createLazyFacadeRuntimeValue(loadFacadeModule, "maybeApplyTtsToPayload");
export const resolveExplicitTtsOverrides: FacadeModule["resolveExplicitTtsOverrides"] =
@@ -90,6 +98,10 @@ export const setTtsMaxLength: FacadeModule["setTtsMaxLength"] = createLazyFacade
loadFacadeModule,
"setTtsMaxLength",
);
export const setTtsPersona: FacadeModule["setTtsPersona"] = createLazyFacadeRuntimeValue(
loadFacadeModule,
"setTtsPersona",
);
export const setTtsProvider: FacadeModule["setTtsProvider"] = createLazyFacadeRuntimeValue(
loadFacadeModule,
"setTtsProvider",

View File

@@ -1,5 +1,5 @@
import type { OpenClawConfig } from "../config/types.openclaw.js";
- import type { TtsAutoMode, TtsProvider } from "../config/types.tts.js";
+ import type { ResolvedTtsPersona, TtsAutoMode, TtsProvider } from "../config/types.tts.js";
import type {
SpeechProviderConfig,
SpeechVoiceOption,
@@ -24,6 +24,8 @@ export type TtsProviderAttempt = {
provider: string;
outcome: "success" | "skipped" | "failed";
reasonCode: TtsAttemptReasonCode;
persona?: string;
personaBinding?: "applied" | "missing" | "none";
latencyMs?: number;
error?: string;
};
@@ -34,6 +36,7 @@ export type TtsStatusEntry = {
textLength: number;
summarized: boolean;
provider?: string;
persona?: string;
fallbackFrom?: string;
attemptedProviders?: string[];
attempts?: TtsProviderAttempt[];
@@ -126,6 +129,7 @@ export type TtsResult = {
error?: string;
latencyMs?: number;
provider?: string;
persona?: string;
fallbackFrom?: string;
attemptedProviders?: string[];
attempts?: TtsProviderAttempt[];
@@ -141,6 +145,7 @@ export type TtsSynthesisResult = {
error?: string;
latencyMs?: number;
provider?: string;
persona?: string;
fallbackFrom?: string;
attemptedProviders?: string[];
attempts?: TtsProviderAttempt[];
@@ -156,6 +161,7 @@ export type TtsTelephonyResult = {
error?: string;
latencyMs?: number;
provider?: string;
persona?: string;
fallbackFrom?: string;
attemptedProviders?: string[];
attempts?: TtsProviderAttempt[];
@@ -179,6 +185,7 @@ export type TtsRuntimeFacade = {
cfg?: OpenClawConfig,
) => SpeechProviderConfig;
getTtsMaxLength: (prefsPath: string) => number;
getTtsPersona: (config: ResolvedTtsConfig, prefsPath: string) => ResolvedTtsPersona | undefined;
getTtsProvider: (config: ResolvedTtsConfig, prefsPath: string) => TtsProvider;
isSummarizationEnabled: (prefsPath: string) => boolean;
isTtsEnabled: (config: ResolvedTtsConfig, prefsPath: string, sessionAuto?: string) => boolean;
@@ -188,6 +195,7 @@ export type TtsRuntimeFacade = {
cfg?: OpenClawConfig,
) => boolean;
listSpeechVoices: ListSpeechVoices;
listTtsPersonas: (config: ResolvedTtsConfig) => ResolvedTtsPersona[];
maybeApplyTtsToPayload: (params: MaybeApplyTtsToPayloadParams) => Promise<ReplyPayload>;
resolveExplicitTtsOverrides: (params: ResolveExplicitTtsOverridesParams) => TtsDirectiveOverrides;
resolveTtsAutoMode: (params: ResolveTtsAutoModeParams) => TtsAutoMode;
@@ -199,6 +207,7 @@ export type TtsRuntimeFacade = {
setTtsAutoMode: (prefsPath: string, mode: TtsAutoMode) => void;
setTtsEnabled: (prefsPath: string, enabled: boolean) => void;
setTtsMaxLength: (prefsPath: string, maxLength: number) => void;
setTtsPersona: (prefsPath: string, persona: string | null | undefined) => void;
setTtsProvider: (prefsPath: string, provider: TtsProvider) => void;
synthesizeSpeech: (params: TtsRequestParams) => Promise<TtsSynthesisResult>;
textToSpeech: TextToSpeech;

View File

@@ -65,6 +65,8 @@ import type {
 SpeechProviderResolveTalkConfigContext,
 SpeechProviderResolveTalkOverridesContext,
 SpeechListVoicesRequest,
+SpeechProviderPrepareSynthesisContext,
+SpeechProviderPreparedSynthesis,
 SpeechProviderId,
 SpeechSynthesisRequest,
 SpeechSynthesisResult,
@@ -1724,6 +1726,12 @@ export type SpeechProviderPlugin = {
 resolveTalkOverrides?: (
 ctx: SpeechProviderResolveTalkOverridesContext,
 ) => SpeechProviderConfig | undefined;
+prepareSynthesis?: (
+ctx: SpeechProviderPrepareSynthesisContext,
+) =>
+| SpeechProviderPreparedSynthesis
+| undefined
+| Promise<SpeechProviderPreparedSynthesis | undefined>;
 isConfigured: (ctx: SpeechProviderConfiguredContext) => boolean;
 synthesize: (req: SpeechSynthesisRequest) => Promise<SpeechSynthesisResult>;
 synthesizeTelephony?: (

View File

@@ -465,6 +465,9 @@ const formatVoiceModeLine = (
 return null;
 }
 const parts = [`🔊 Voice: ${snapshot.autoMode}`, `provider=${snapshot.provider}`];
+if (snapshot.persona) {
+parts.push(`persona=${snapshot.persona}`);
+}
 if (snapshot.displayName) {
 parts.push(`name=${snapshot.displayName}`);
 }
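
The hunk above appends a `persona=` segment to the voice status line only when a persona is resolved. A minimal sketch of that behavior follows. The `VoiceSnapshot` type is a trimmed, hypothetical stand-in for the real snapshot, and the `" · "` separator is an assumption, since the diff does not show how `parts` is joined.

```typescript
// Hypothetical, trimmed-down snapshot shape; the real status snapshot has more fields.
type VoiceSnapshot = {
  autoMode: string;
  provider: string;
  persona?: string;
  displayName?: string;
};

// Builds the status line segments in the same order as the diff:
// auto mode, provider, then optional persona and display name.
function formatVoiceLineSketch(snapshot: VoiceSnapshot): string {
  const parts = [`🔊 Voice: ${snapshot.autoMode}`, `provider=${snapshot.provider}`];
  if (snapshot.persona) {
    parts.push(`persona=${snapshot.persona}`);
  }
  if (snapshot.displayName) {
    parts.push(`name=${snapshot.displayName}`);
  }
  // Assumption: the separator is illustrative only.
  return parts.join(" · ");
}
```

Because the persona segment is conditional, status output for configurations without personas should stay unchanged under this sketch.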

View File

@@ -1,9 +1,10 @@
 import type { TalkProviderConfig } from "../config/types.gateway.js";
 import type { OpenClawConfig } from "../config/types.js";
+import type { ResolvedTtsPersona } from "../config/types.tts.js";
 export type SpeechProviderId = string;
-export type SpeechSynthesisTarget = "audio-file" | "voice-note";
+export type SpeechSynthesisTarget = "audio-file" | "voice-note" | "telephony";
 export type SpeechProviderConfig = Record<string, unknown>;
@@ -69,6 +70,23 @@ export type SpeechTelephonySynthesisResult = {
 sampleRate: number;
 };
+export type SpeechProviderPrepareSynthesisContext = {
+text: string;
+cfg: OpenClawConfig;
+providerConfig: SpeechProviderConfig;
+providerOverrides?: SpeechProviderOverrides;
+persona?: ResolvedTtsPersona;
+personaProviderConfig?: SpeechProviderConfig;
+target: SpeechSynthesisTarget;
+timeoutMs: number;
+};
+export type SpeechProviderPreparedSynthesis = {
+text?: string;
+providerConfig?: SpeechProviderConfig;
+providerOverrides?: SpeechProviderOverrides;
+};
 export type SpeechVoiceOption = {
 id: string;
 name?: string;
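
The new `prepareSynthesis` hook receives the persona and its provider-specific binding and may return a partial replacement for the text, provider config, or overrides. The sketch below shows one way a provider might implement it. All types are simplified local stand-ins, `PersonaLike` is hypothetical, and the merge order (persona binding over base provider config) is an assumption rather than the confirmed runtime behavior.

```typescript
type ProviderConfigSketch = Record<string, unknown>;
// Hypothetical persona shape; the real ResolvedTtsPersona is not shown in this diff.
type PersonaLike = { id: string };

type PrepareContextSketch = {
  text: string;
  providerConfig: ProviderConfigSketch;
  persona?: PersonaLike;
  personaProviderConfig?: ProviderConfigSketch;
};

type PreparedSketch = {
  text?: string;
  providerConfig?: ProviderConfigSketch;
};

// Returns undefined when there is nothing to change, mirroring the optional
// return type of the hook. Assumption: persona binding keys win over base keys.
function examplePrepareSynthesis(ctx: PrepareContextSketch): PreparedSketch | undefined {
  if (!ctx.persona || !ctx.personaProviderConfig) {
    return undefined;
  }
  return {
    providerConfig: { ...ctx.providerConfig, ...ctx.personaProviderConfig },
  };
}

// A caller can treat sync and async hooks uniformly by awaiting the result
// and falling back to the original values for any field the hook omitted.
async function applyPreparedSketch(
  ctx: PrepareContextSketch,
  hook?: (c: PrepareContextSketch) => PreparedSketch | undefined | Promise<PreparedSketch | undefined>,
): Promise<{ text: string; providerConfig: ProviderConfigSketch }> {
  const prepared = await hook?.(ctx);
  return {
    text: prepared?.text ?? ctx.text,
    providerConfig: prepared?.providerConfig ?? ctx.providerConfig,
  };
}
```

Keeping every field of the prepared result optional lets a provider adjust only what it needs, for example wrapping the prompt text without touching the config.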

View File

@@ -138,6 +138,44 @@ describe("resolveStatusTtsSnapshot", () => {
 });
 });
+it("reports per-agent persona provider over global persona", async () => {
+await withStatusTempHome(async () => {
+expect(
+resolveStatusTtsSnapshot({
+cfg: {
+messages: {
+tts: {
+auto: "always",
+persona: "alfred",
+personas: {
+alfred: { provider: "google" },
+jarvis: { provider: "edge" },
+},
+},
+},
+agents: {
+list: [
+{
+id: "reader",
+tts: {
+persona: "jarvis",
+},
+},
+],
+},
+} as OpenClawConfig,
+agentId: "reader",
+}),
+).toEqual({
+autoMode: "always",
+provider: "microsoft",
+persona: "jarvis",
+maxLength: 1500,
+summarize: true,
+});
+});
+});
 it("reports configured OpenAI TTS model, voice, and sanitized custom endpoint", async () => {
 await withStatusTempHome(async () => {
 expect(

View File

@@ -20,6 +20,7 @@ type TtsUserPrefs = {
 auto?: TtsAutoMode;
 enabled?: boolean;
 provider?: TtsProvider;
+persona?: string | null;
 maxLength?: number;
 summarize?: boolean;
 };
@@ -31,6 +32,7 @@ type TtsStatusSnapshot = {
 displayName?: string;
 model?: string;
 voice?: string;
+persona?: string;
 baseUrl?: string;
 customBaseUrl?: boolean;
 maxLength: number;
@@ -51,6 +53,27 @@ function normalizeConfiguredSpeechProviderId(
 return normalized === "edge" ? "microsoft" : normalized;
 }
+function normalizeTtsPersonaId(personaId: string | null | undefined): string | undefined {
+return normalizeOptionalLowercaseString(personaId ?? undefined);
+}
+function resolvePersonaPreferredProvider(
+raw: TtsConfig,
+personaId: string | undefined,
+): TtsProvider | undefined {
+if (!personaId || !raw.personas) {
+return undefined;
+}
+for (const [id, persona] of Object.entries(raw.personas)) {
+if (normalizeTtsPersonaId(id) !== personaId) {
+continue;
+}
+const provider = normalizeConfiguredSpeechProviderId(persona.provider) ?? persona.provider;
+return normalizeOptionalString(provider);
+}
+return undefined;
+}
 function resolveTtsPrefsPathValue(prefsPath: string | undefined): string {
 const configuredPath = normalizeOptionalString(prefsPath);
 if (configuredPath) {
@@ -212,8 +235,13 @@ export function resolveStatusTtsSnapshot(params: {
 return null;
 }
+const persona =
+prefs.tts && Object.prototype.hasOwnProperty.call(prefs.tts, "persona")
+? normalizeTtsPersonaId(prefs.tts.persona)
+: normalizeTtsPersonaId(raw.persona);
 const provider =
 normalizeConfiguredSpeechProviderId(prefs.tts?.provider) ??
+resolvePersonaPreferredProvider(raw, persona) ??
 normalizeConfiguredSpeechProviderId(raw.provider) ??
 "auto";
@@ -221,6 +249,7 @@ export function resolveStatusTtsSnapshot(params: {
 autoMode,
 provider,
 ...resolveStatusProviderDetails(raw, provider),
+...(persona ? { persona } : {}),
 maxLength: prefs.tts?.maxLength ?? DEFAULT_TTS_MAX_LENGTH,
 summarize: prefs.tts?.summarize ?? DEFAULT_TTS_SUMMARIZE,
 };
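
The status snapshot changes above imply a precedence order: an explicit provider preference wins, then the active persona's preferred provider, then the configured provider, then `"auto"`. The persona itself is tri-state in prefs: a stored `persona` key (even `null`) overrides the config default, while an absent key falls back to it. The sketch below restates that logic in a self-contained form. The normalizers are simplified assumptions (trim plus lowercase, with the `edge` to `microsoft` alias shown in the diff), and the `PrefsSketch` and `RawTtsSketch` types are hypothetical reductions of the real ones.

```typescript
type PrefsSketch = { provider?: string; persona?: string | null };
type RawTtsSketch = {
  provider?: string;
  persona?: string;
  personas?: Record<string, { provider?: string }>;
};

// Assumption: normalization is trim + lowercase, with empty strings treated as unset.
function normalizeIdSketch(value: string | null | undefined): string | undefined {
  const trimmed = value?.trim().toLowerCase();
  return trimmed ? trimmed : undefined;
}

// The legacy "edge" alias maps to "microsoft", as in the diff.
function normalizeProviderSketch(value: string | null | undefined): string | undefined {
  const normalized = normalizeIdSketch(value);
  return normalized === "edge" ? "microsoft" : normalized;
}

// An own "persona" key in prefs (including null) takes priority over config.
function resolvePersonaSketch(prefs: PrefsSketch, raw: RawTtsSketch): string | undefined {
  return Object.prototype.hasOwnProperty.call(prefs, "persona")
    ? normalizeIdSketch(prefs.persona)
    : normalizeIdSketch(raw.persona);
}

function resolveProviderSketch(prefs: PrefsSketch, raw: RawTtsSketch): string {
  const persona = resolvePersonaSketch(prefs, raw);
  const match = persona
    ? Object.entries(raw.personas ?? {}).find(([id]) => normalizeIdSketch(id) === persona)
    : undefined;
  return (
    normalizeProviderSketch(prefs.provider) ??
    normalizeProviderSketch(match?.[1].provider) ??
    normalizeProviderSketch(raw.provider) ??
    "auto"
  );
}
```

Under these assumptions, a user who clears the persona in prefs (`persona: null`) also drops the persona's provider preference, so resolution continues to the configured provider.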

View File

@@ -1,5 +1,11 @@
 import type { OpenClawConfig } from "../config/types.openclaw.js";
-import type { TtsAutoMode, TtsConfig, TtsMode, TtsProvider } from "../config/types.tts.js";
+import type {
+ResolvedTtsPersona,
+TtsAutoMode,
+TtsConfig,
+TtsMode,
+TtsProvider,
+} from "../config/types.tts.js";
 import type { SpeechModelOverridePolicy, SpeechProviderConfig } from "./provider-types.js";
 export type ResolvedTtsModelOverrides = SpeechModelOverridePolicy;
@@ -9,6 +15,8 @@ export type ResolvedTtsConfig = {
 mode: TtsMode;
 provider: TtsProvider;
 providerSource: "config" | "default";
+persona?: string;
+personas: Record<string, ResolvedTtsPersona>;
 summaryModel?: string;
 modelOverrides: ResolvedTtsModelOverrides;
 providerConfigs: Record<string, SpeechProviderConfig>;

View File

@@ -4,11 +4,13 @@ export {
 getLastTtsAttempt,
 getResolvedSpeechProviderConfig,
 getTtsMaxLength,
+getTtsPersona,
 getTtsProvider,
 isSummarizationEnabled,
 isTtsEnabled,
 isTtsProviderConfigured,
 listSpeechVoices,
+listTtsPersonas,
 maybeApplyTtsToPayload,
 resolveExplicitTtsOverrides,
 resolveTtsAutoMode,
@@ -20,6 +22,7 @@ export {
 setTtsAutoMode,
 setTtsEnabled,
 setTtsMaxLength,
+setTtsPersona,
 setTtsProvider,
 synthesizeSpeech,
 textToSpeech,

View File

@@ -15,6 +15,7 @@ const providerHttpMocks = vi.hoisted(() => ({
 fetchWithTimeoutMock: vi.fn(),
 pollProviderOperationJsonMock: vi.fn(),
 assertOkOrThrowHttpErrorMock: vi.fn(async (_response: Response, _label: string) => {}),
+assertOkOrThrowProviderErrorMock: vi.fn(async (_response: Response, _label: string) => {}),
 resolveProviderHttpRequestConfigMock: vi.fn((params: ResolveProviderHttpRequestConfigParams) => ({
 baseUrl: params.baseUrl ?? params.defaultBaseUrl,
 allowPrivateNetwork: false,
@@ -55,6 +56,7 @@ vi.mock("openclaw/plugin-sdk/provider-auth-runtime", () => ({
 vi.mock("openclaw/plugin-sdk/provider-http", () => ({
 assertOkOrThrowHttpError: providerHttpMocks.assertOkOrThrowHttpErrorMock,
+assertOkOrThrowProviderError: providerHttpMocks.assertOkOrThrowProviderErrorMock,
 createProviderOperationDeadline: ({
 label,
 timeoutMs,
@@ -85,6 +87,7 @@ export function installProviderHttpMockCleanup(): void {
 providerHttpMocks.fetchWithTimeoutMock.mockReset();
 providerHttpMocks.pollProviderOperationJsonMock.mockClear();
 providerHttpMocks.assertOkOrThrowHttpErrorMock.mockClear();
+providerHttpMocks.assertOkOrThrowProviderErrorMock.mockClear();
 providerHttpMocks.resolveProviderHttpRequestConfigMock.mockClear();
 });
 }

View File

@@ -499,6 +499,7 @@ function createResolvedSummarizationConfig(cfg: OpenClawConfig): ResolvedTtsConf
 allowSeed: true,
 },
 providerConfigs: {},
+personas: {},
 prefsPath: typeof rawConfig.prefsPath === "string" ? rawConfig.prefsPath : undefined,
 maxTextLength: typeof rawConfig.maxTextLength === "number" ? rawConfig.maxTextLength : 4096,
 timeoutMs: typeof rawConfig.timeoutMs === "number" ? rawConfig.timeoutMs : 30_000,
@@ -715,6 +716,7 @@ export function describeTtsConfigContract() {
 microsoft: {},
 elevenlabs: {},
 },
+personas: {},
 prefsPath: undefined,
 maxTextLength: 4000,
 timeoutMs: 30_000,