mirror of
https://github.com/openclaw/openclaw.git
synced 2026-05-06 19:20:43 +00:00
TTS: add provider personas
@@ -1048,6 +1048,7 @@ Docs: https://docs.openclaw.ai
 - Anthropic/models: add Claude Opus 4.7 `xhigh` reasoning effort support and keep it separate from adaptive thinking.
 - Control UI/settings: overhaul the settings and slash-command experience with faster presets, quick-create flows, and refreshed command discovery. (#67819) Thanks @BunsDev.
 - macOS/gateway: add `screen.snapshot` support for macOS app nodes, including runtime plumbing, default macOS allowlisting, and docs for monitor preview flows. (#67954) Thanks @BunsDev.
+- TTS/personas: add provider-aware TTS personas with deterministic provider binding merges, `/tts persona` controls, gateway/CLI persona state, Google Gemini `audio-profile-v1` prompt wrapping, and OpenAI instruction mapping. (#68323)
 
 ### Fixes
 
@@ -493,6 +493,110 @@ transcoded to raw 16 kHz mono PCM with `ffmpeg`. The legacy provider alias
 }
 ```
 
+### TTS personas
+
+Use `messages.tts.personas` when you want a stable spoken identity that can be
+applied deterministically across providers. A persona can prefer one provider,
+define provider-neutral prompt intent, and carry provider-specific bindings for
+voices, models, prompt templates, seeds, and voice settings.
+
+```json5
+{
+  messages: {
+    tts: {
+      auto: "always",
+      persona: "alfred",
+      personas: {
+        alfred: {
+          label: "Alfred",
+          description: "Dry, warm British butler narrator.",
+          provider: "google",
+          fallbackPolicy: "preserve-persona",
+          prompt: {
+            profile: "A brilliant British butler. Dry, witty, warm, charming, emotionally expressive, never generic.",
+            scene: "A quiet late-night study. Close-mic narration for a trusted operator.",
+            sampleContext: "The speaker is answering a private technical request with concise confidence and dry warmth.",
+            style: "Refined, understated, lightly amused.",
+            accent: "British English.",
+            pacing: "Measured, with short dramatic pauses.",
+            constraints: ["Do not read configuration values aloud.", "Do not explain the persona."],
+          },
+          providers: {
+            google: {
+              model: "gemini-3.1-flash-tts-preview",
+              voiceName: "Algieba",
+              promptTemplate: "audio-profile-v1",
+            },
+            openai: {
+              model: "gpt-4o-mini-tts",
+              voice: "cedar",
+            },
+            elevenlabs: {
+              voiceId: "voice_id",
+              modelId: "eleven_multilingual_v2",
+              seed: 42,
+              voiceSettings: {
+                stability: 0.65,
+                similarityBoost: 0.8,
+                style: 0.25,
+                useSpeakerBoost: true,
+                speed: 0.95,
+              },
+            },
+          },
+        },
+      },
+    },
+  },
+}
+```
+
+Resolution is deterministic:
+
+1. `/tts persona <id>` local preference, if set.
+2. `messages.tts.persona`, if set.
+3. No persona.
+
+Provider selection is explicit-first:
+
+1. Direct provider overrides from CLI, gateway, Talk, or allowed TTS directives.
+2. `/tts provider <id>` local preference.
+3. Active persona `provider`.
+4. `messages.tts.provider`.
+5. Registry auto-select.
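The two precedence lists above read as first-match lookups. The following is a minimal sketch under that reading, not OpenClaw's implementation: `ResolutionInputs`, `resolvePersona`, and `resolveProvider` are hypothetical names, and the input shape is an assumption made for illustration.

```typescript
// Hypothetical input shape for illustration only; not OpenClaw's real types.
type ResolutionInputs = {
  // Direct override from CLI, gateway, Talk, or an allowed TTS directive.
  directProviderOverride?: string;
  // Local preferences written by `/tts persona <id>` and `/tts provider <id>`.
  localPrefs: { persona?: string; provider?: string };
  // The `messages.tts` config section.
  config: {
    persona?: string;
    provider?: string;
    personas?: Record<string, { provider?: string }>;
  };
  // Stand-in for registry auto-select.
  registryAutoSelect: () => string | undefined;
};

// Persona: local preference first, then `messages.tts.persona`, else none.
function resolvePersona(input: ResolutionInputs): string | undefined {
  return input.localPrefs.persona ?? input.config.persona;
}

// Provider: the first defined value in explicit-first order wins.
function resolveProvider(input: ResolutionInputs): string | undefined {
  const personaId = resolvePersona(input);
  const persona = personaId ? input.config.personas?.[personaId] : undefined;
  return (
    input.directProviderOverride ??
    input.localPrefs.provider ??
    persona?.provider ??
    input.config.provider ??
    input.registryAutoSelect()
  );
}
```

With the `alfred` example config, an unset local preference resolves to persona `alfred` and provider `google`, while `/tts provider openai` would take precedence over the persona's preferred provider.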
+
+For each provider attempt, OpenClaw merges:
+
+1. `messages.tts.providers.<id>`
+2. `messages.tts.personas.<persona>.providers.<id>`
+3. trusted request overrides
+4. allowed model-emitted TTS directive overrides
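The merge order above can be pictured as layered settings in which each later layer overrides keys from the earlier ones. The sketch below assumes a shallow, key-by-key merge; the text does not say whether nested objects such as `voiceSettings` merge deeply, and `mergeProviderSettings` is a hypothetical helper name.

```typescript
type ProviderSettings = Record<string, unknown>;

// Later layers win key by key. This sketch merges shallowly, which is an
// assumption: the documentation only states the layer order.
function mergeProviderSettings(
  ...layers: Array<ProviderSettings | undefined>
): ProviderSettings {
  return layers.reduce<ProviderSettings>(
    (merged, layer) => ({ ...merged, ...(layer ?? {}) }),
    {},
  );
}

// Layers in documented order for one Google attempt (values are illustrative).
const effectiveGoogleConfig = mergeProviderSettings(
  { model: "gemini-3.1-flash-tts-preview", voiceName: "Kore" }, // messages.tts.providers.google
  { voiceName: "Algieba", promptTemplate: "audio-profile-v1" }, // persona binding
  undefined, // trusted request overrides (none here)
  undefined, // allowed model-emitted directive overrides (none here)
);
```

Here the persona binding replaces the base `voiceName` and adds `promptTemplate`, while `model` from the base provider config is kept.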
+
+`fallbackPolicy` controls what happens when an active persona has no binding for
+an attempted provider:
+
+- `preserve-persona` keeps provider-neutral persona prompt fields available to
+  providers. This is the default.
+- `provider-defaults` omits the persona from provider prompt preparation for
+  that attempt, so the provider uses its neutral defaults while still allowing
+  fallback to continue.
+- `fail` skips that provider attempt with `reasonCode: "not_configured"` and
+  `personaBinding: "missing"`. Fallback providers are still tried; the whole TTS
+  request fails only if every attempted provider is skipped or fails.
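The three policies above can be sketched as a per-attempt decision. This is an illustrative reading of the documented behavior, not the actual code path: `Persona`, `AttemptPlan`, and `planAttempt` are hypothetical names, and only the `reasonCode` and `personaBinding` values come from the text.

```typescript
type FallbackPolicy = "preserve-persona" | "provider-defaults" | "fail";

// Hypothetical persona shape, reduced to the fields this decision needs.
type Persona = {
  id: string;
  fallbackPolicy?: FallbackPolicy;
  providers?: Record<string, Record<string, unknown>>;
};

type AttemptPlan =
  | { action: "attempt"; persona: Persona | undefined }
  | { action: "skip"; reasonCode: "not_configured"; personaBinding: "missing" };

function planAttempt(persona: Persona | undefined, providerId: string): AttemptPlan {
  // No active persona, or the persona has a binding for this provider.
  if (!persona || persona.providers?.[providerId] !== undefined) {
    return { action: "attempt", persona };
  }
  const policy = persona.fallbackPolicy ?? "preserve-persona";
  if (policy === "fail") {
    // Skip only this attempt; the caller continues with fallback providers.
    return { action: "skip", reasonCode: "not_configured", personaBinding: "missing" };
  }
  if (policy === "provider-defaults") {
    // Hide the persona from prompt preparation for this attempt.
    return { action: "attempt", persona: undefined };
  }
  // preserve-persona: provider-neutral prompt fields stay available.
  return { action: "attempt", persona };
}
```

A caller would run `planAttempt` once per provider in the fallback chain and report overall failure only when every attempt is skipped or fails.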
+
+Persona prompt fields are provider-neutral. Providers decide how to use them.
+Google wraps them only when the effective Google provider config sets
+`promptTemplate: "audio-profile-v1"` or `personaPrompt`; its older
+`audioProfile` and `speakerName` fields are still prepended as Google-specific
+prompt text. OpenAI maps prompt fields to `instructions` when no explicit
+OpenAI `instructions` value is configured. Providers without prompt-like
+controls use the provider-specific persona bindings only.
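The Google provider test in this commit pins the exact text produced by `audio-profile-v1`. The sketch below reproduces that layout from the test expectations only. `renderAudioProfileV1` and its parameter shape are hypothetical, and the handling of omitted fields is an assumption drawn from the two prompts shown in the tests.

```typescript
type PersonaPrompt = {
  profile?: string;
  scene?: string;
  sampleContext?: string;
  style?: string;
  accent?: string;
  pacing?: string;
  constraints?: string[];
};

// Hypothetical renderer; section titles and ordering are copied from the test expectation.
function renderAudioProfileV1(params: {
  label: string;
  prompt: PersonaPrompt;
  personaPrompt?: string; // Google-specific extra text, listed under "Provider notes:"
  transcript: string; // spoken text; inline tags such as [whispers] pass through unchanged
}): string {
  const { label, prompt, personaPrompt, transcript } = params;
  const lines: string[] = [
    "Synthesize speech from the TRANSCRIPT section only. Use the other sections only",
    "as performance direction. Do not read section titles, notes, labels, or",
    "configuration aloud.",
    "",
    `# AUDIO PROFILE: ${label}`,
  ];
  if (prompt.profile) lines.push(prompt.profile);
  lines.push("");
  if (prompt.scene) lines.push("## THE SCENE", prompt.scene, "");

  const notes: string[] = [];
  if (prompt.style) notes.push(`Style: ${prompt.style}`);
  if (prompt.accent) notes.push(`Accent: ${prompt.accent}`);
  if (prompt.pacing) notes.push(`Pacing: ${prompt.pacing}`);
  if (prompt.constraints && prompt.constraints.length > 0) {
    notes.push("Constraints:", ...prompt.constraints.map((item) => `- ${item}`));
  }
  if (personaPrompt) notes.push("Provider notes:", personaPrompt);
  if (notes.length > 0) lines.push("### DIRECTOR'S NOTES", ...notes, "");

  if (prompt.sampleContext) lines.push("### SAMPLE CONTEXT", prompt.sampleContext, "");
  lines.push("### TRANSCRIPT", transcript);
  return lines.join("\n");
}
```

Keeping the transcript as the final section matches the opening instruction line, which tells the model to speak only the TRANSCRIPT section and treat everything else as direction.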
+
+Gemini inline audio tags are transcript content, not persona config. If the
+assistant or an explicit `[[tts:text]]` block includes tags such as `[whispers]`
+or `[laughs]`, OpenClaw preserves them inside the Gemini transcript. OpenClaw
+does not generate configured start tags.
+
 ### Disable Microsoft speech
 
 ```json5
@@ -565,6 +669,12 @@ Then run:
 - If `provider` is **unset**, OpenClaw uses the first configured speech provider in registry auto-select order.
 - Legacy `provider: "edge"` config is repaired by `openclaw doctor --fix` and
   rewritten to `provider: "microsoft"`.
+- `persona`: default TTS persona id from `personas`.
+- `personas.<id>`: stable spoken identity. The id is normalized to lowercase.
+- `personas.<id>.provider`: preferred speech provider for the persona. Explicit provider overrides and local provider prefs still win.
+- `personas.<id>.fallbackPolicy`: `preserve-persona` (default), `provider-defaults`, or `fail`; see [TTS personas](#tts-personas).
+- `personas.<id>.prompt`: provider-neutral persona prompt fields (`profile`, `scene`, `sampleContext`, `style`, `accent`, `pacing`, `constraints`).
+- `personas.<id>.providers.<provider>`: provider-specific persona binding merged over `providers.<provider>`.
 - `summaryModel`: optional cheap model for auto-summary; defaults to `agents.defaults.model.primary`.
   - Accepts `provider/model` or a configured model alias.
 - `modelOverrides`: allow the model to emit TTS directives (on by default).
@@ -621,6 +731,8 @@ Then run:
 - `providers.google.voiceName`: Gemini prebuilt voice name (default `Kore`; `voice` is also accepted).
 - `providers.google.audioProfile`: natural-language style prompt prepended before the spoken text.
 - `providers.google.speakerName`: optional speaker label prepended before the spoken text when your TTS prompt uses a named speaker.
+- `providers.google.promptTemplate`: set to `audio-profile-v1` to wrap active persona prompt fields in a deterministic Gemini TTS prompt structure.
+- `providers.google.personaPrompt`: Google-specific extra persona prompt text appended to the template's Director's Notes.
 - `providers.google.baseUrl`: override the Gemini API base URL. Only `https://generativelanguage.googleapis.com` is accepted.
   - If `messages.tts.providers.google.apiKey` is omitted, TTS can reuse `models.providers.google.apiKey` before env fallback.
 - `providers.gradium.baseUrl`: override Gradium API base URL (default `https://api.gradium.ai`).
@@ -750,8 +862,9 @@ Slash commands write local overrides to `prefsPath` (default:
 
 Stored fields:
 
-- `enabled`
+- `auto`
 - `provider`
+- `persona`
 - `maxLength` (summary threshold; default 1500 chars)
 - `summarize` (default `true`)
 
@@ -837,6 +950,7 @@ Discord note: `/tts` is a built-in Discord command, so OpenClaw registers
 /tts chat default
 /tts latest
 /tts provider openai
+/tts persona alfred
 /tts limit 2000
 /tts summary off
 /tts audio Hello from OpenClaw
@@ -850,6 +964,7 @@ Notes:
 - `/tts on` writes the local TTS preference to `always`; `/tts off` writes it to `off`.
 - `/tts chat on|off|default` writes a session-scoped auto-TTS override for the current chat.
 - Use config when you want `inbound` or `tagged` defaults.
+- `/tts persona <id>` writes the local persona preference; `/tts persona off` clears it.
 - `limit` and `summary` are stored in local prefs, not the main config.
 - `/tts audio` generates a one-off audio reply (does not toggle TTS on).
 - `/tts latest` reads the latest assistant reply from the current session transcript and sends it as audio once. It stores only a hash of that reply on the session entry to suppress duplicate voice sends.
@@ -883,6 +998,7 @@ Gateway methods:
 - `tts.disable`
 - `tts.convert`
 - `tts.setProvider`
+- `tts.setPersona`
 - `tts.providers`
 
 ## Related
@@ -1,5 +1,8 @@
-import * as providerHttp from "openclaw/plugin-sdk/provider-http";
-import { afterEach, describe, expect, it, vi } from "vitest";
+import { afterEach, beforeAll, describe, expect, it, vi } from "vitest";
+import {
+  getProviderHttpMocks,
+  installProviderHttpMockCleanup,
+} from "../../test/helpers/media-generation/provider-http-mocks.js";
 
 const transcodeAudioBufferToOpusMock = vi.hoisted(() => vi.fn());
 
@@ -7,10 +10,23 @@ vi.mock("openclaw/plugin-sdk/media-runtime", () => ({
   transcodeAudioBufferToOpus: transcodeAudioBufferToOpusMock,
 }));
 
-import { buildGoogleSpeechProvider, __testing } from "./speech-provider.js";
+const {
+  assertOkOrThrowProviderErrorMock,
+  postJsonRequestMock,
+  resolveProviderHttpRequestConfigMock,
+} = getProviderHttpMocks();
 
-function installGoogleTtsFetchMock(pcm = Buffer.from([1, 0, 2, 0])) {
-  const fetchMock = vi.fn().mockResolvedValue({
+let buildGoogleSpeechProvider: typeof import("./speech-provider.js").buildGoogleSpeechProvider;
+let __testing: typeof import("./speech-provider.js").__testing;
+
+beforeAll(async () => {
+  ({ buildGoogleSpeechProvider, __testing } = await import("./speech-provider.js"));
+});
+
+installProviderHttpMockCleanup();
+
+function googleTtsResponse(pcm = Buffer.from([1, 0, 2, 0])) {
+  return {
     ok: true,
     json: async () => ({
       candidates: [
@@ -28,21 +44,26 @@ function installGoogleTtsFetchMock(pcm = Buffer.from([1, 0, 2, 0])) {
         },
       ],
     }),
+  };
+}
+
+function installGoogleTtsRequestMock(pcm = Buffer.from([1, 0, 2, 0])) {
+  postJsonRequestMock.mockResolvedValue({
+    response: googleTtsResponse(pcm),
+    release: vi.fn(async () => {}),
   });
-  vi.stubGlobal("fetch", fetchMock);
-  return fetchMock;
+  return postJsonRequestMock;
 }
 
 describe("Google speech provider", () => {
   afterEach(() => {
-    vi.restoreAllMocks();
     vi.unstubAllGlobals();
     vi.unstubAllEnvs();
     transcodeAudioBufferToOpusMock.mockReset();
   });
 
   it("synthesizes Gemini PCM as WAV and preserves audio tags in the request text", async () => {
-    const fetchMock = installGoogleTtsFetchMock();
+    const requestMock = installGoogleTtsRequestMock();
     const provider = buildGoogleSpeechProvider();
 
     const result = await provider.synthesize({
@@ -57,11 +78,10 @@ describe("Google speech provider", () => {
       timeoutMs: 12_345,
     });
 
-    expect(fetchMock).toHaveBeenCalledWith(
-      "https://generativelanguage.googleapis.com/v1beta/models/gemini-3.1-flash-tts-preview:generateContent",
+    expect(requestMock).toHaveBeenCalledWith(
       expect.objectContaining({
-        method: "POST",
-        body: JSON.stringify({
+        url: "https://generativelanguage.googleapis.com/v1beta/models/gemini-3.1-flash-tts-preview:generateContent",
+        body: {
           contents: [
             {
               role: "user",
@@ -78,11 +98,14 @@ describe("Google speech provider", () => {
               },
             },
           },
-        }),
+        },
+        fetchFn: fetch,
+        pinDns: false,
+        timeoutMs: 12_345,
       }),
     );
-    const [, init] = fetchMock.mock.calls[0];
-    expect(new Headers(init.headers).get("x-goog-api-key")).toBe("google-test-key");
+    const request = requestMock.mock.calls[0]?.[0] as { headers?: HeadersInit };
+    expect(new Headers(request.headers).get("x-goog-api-key")).toBe("google-test-key");
     expect(result.outputFormat).toBe("wav");
     expect(result.fileExtension).toBe(".wav");
     expect(result.voiceCompatible).toBe(false);
@@ -94,7 +117,7 @@ describe("Google speech provider", () => {
   });
 
   it("transcodes Gemini PCM to Opus for voice-note targets", async () => {
-    installGoogleTtsFetchMock(Buffer.from([5, 0, 6, 0]));
+    installGoogleTtsRequestMock(Buffer.from([5, 0, 6, 0]));
     transcodeAudioBufferToOpusMock.mockResolvedValueOnce(Buffer.from("google-opus"));
     const provider = buildGoogleSpeechProvider();
 
@@ -125,9 +148,138 @@ describe("Google speech provider", () => {
     expect(audioBuffer.subarray(8, 12).toString("ascii")).toBe("WAVE");
   });
 
+  it("advertises all documented Gemini TTS-capable models", () => {
+    const provider = buildGoogleSpeechProvider();
+
+    expect(provider.models).toEqual(__testing.GOOGLE_TTS_MODELS);
+  });
+
+  it("renders deterministic audio-profile-v1 prompts without generating tags", async () => {
+    const provider = buildGoogleSpeechProvider();
+
+    const prepared = await provider.prepareSynthesis?.({
+      text: "[whispers] The door is open.",
+      cfg: {},
+      providerConfig: {
+        promptTemplate: "audio-profile-v1",
+        personaPrompt: "Keep a close-mic feel.",
+      },
+      persona: {
+        id: "alfred",
+        label: "Alfred",
+        prompt: {
+          profile: "A brilliant British butler.",
+          scene: "A quiet late-night study.",
+          sampleContext: "The speaker is answering a trusted operator.",
+          style: "Refined and lightly amused.",
+          accent: "British English.",
+          pacing: "Measured.",
+          constraints: ["Do not read configuration values aloud."],
+        },
+      },
+      target: "audio-file",
+      timeoutMs: 1_000,
+    });
+
+    expect(prepared?.text).toBe(
+      [
+        "Synthesize speech from the TRANSCRIPT section only. Use the other sections only",
+        "as performance direction. Do not read section titles, notes, labels, or",
+        "configuration aloud.",
+        "",
+        "# AUDIO PROFILE: Alfred",
+        "A brilliant British butler.",
+        "",
+        "## THE SCENE",
+        "A quiet late-night study.",
+        "",
+        "### DIRECTOR'S NOTES",
+        "Style: Refined and lightly amused.",
+        "Accent: British English.",
+        "Pacing: Measured.",
+        "Constraints:",
+        "- Do not read configuration values aloud.",
+        "Provider notes:",
+        "Keep a close-mic feel.",
+        "",
+        "### SAMPLE CONTEXT",
+        "The speaker is answering a trusted operator.",
+        "",
+        "### TRANSCRIPT",
+        "[whispers] The door is open.",
+      ].join("\n"),
+    );
+  });
+
+  it("does not wrap an OpenClaw audio-profile-v1 prompt twice", async () => {
+    const provider = buildGoogleSpeechProvider();
+    const text = [
+      "Synthesize speech from the TRANSCRIPT section only. Use the other sections only",
+      "as performance direction. Do not read section titles, notes, labels, or",
+      "configuration aloud.",
+      "",
+      "# AUDIO PROFILE: Alfred",
+      "A brilliant British butler.",
+      "",
+      "### TRANSCRIPT",
+      "Hello.",
+    ].join("\n");
+
+    const prepared = await provider.prepareSynthesis?.({
+      text,
+      cfg: {},
+      providerConfig: {
+        promptTemplate: "audio-profile-v1",
+      },
+      persona: {
+        id: "alfred",
+        label: "Alfred",
+        prompt: {
+          profile: "A brilliant British butler.",
+        },
+      },
+      target: "audio-file",
+      timeoutMs: 1_000,
+    });
+
+    expect(prepared).toBeUndefined();
+  });
+
+  it("retries once when Gemini returns no audio payload", async () => {
+    const pcm = Buffer.from([5, 0, 6, 0]);
+    const requestSequence = vi
+      .fn()
+      .mockResolvedValueOnce({
+        response: {
+          ok: true,
+          json: async () => ({ candidates: [{ content: { parts: [{ text: "not audio" }] } }] }),
+        },
+        release: vi.fn(async () => {}),
+      })
+      .mockResolvedValueOnce({
+        response: googleTtsResponse(pcm),
+        release: vi.fn(async () => {}),
+      });
+    postJsonRequestMock.mockImplementation(requestSequence);
+    const provider = buildGoogleSpeechProvider();
+
+    const result = await provider.synthesize({
+      text: "Retry this.",
+      cfg: {},
+      providerConfig: {
+        apiKey: "google-test-key",
+      },
+      target: "audio-file",
+      timeoutMs: 5_000,
+    });
+
+    expect(requestSequence).toHaveBeenCalledTimes(2);
+    expect(result.audioBuffer.subarray(44)).toEqual(pcm);
+  });
+
   it("falls back to GEMINI_API_KEY and configured Google API base URL", async () => {
     vi.stubEnv("GEMINI_API_KEY", "env-google-key");
-    const fetchMock = installGoogleTtsFetchMock();
+    const requestMock = installGoogleTtsRequestMock();
     const provider = buildGoogleSpeechProvider();
 
     expect(provider.isConfigured({ providerConfig: {}, timeoutMs: 1 })).toBe(true);
@@ -149,16 +301,17 @@ describe("Google speech provider", () => {
       timeoutMs: 10_000,
     });
 
-    expect(fetchMock).toHaveBeenCalledWith(
-      "https://generativelanguage.googleapis.com/v1beta/models/gemini-3.1-flash-tts-preview:generateContent",
-      expect.any(Object),
+    expect(requestMock).toHaveBeenCalledWith(
+      expect.objectContaining({
+        url: "https://generativelanguage.googleapis.com/v1beta/models/gemini-3.1-flash-tts-preview:generateContent",
+      }),
     );
-    const [, init] = fetchMock.mock.calls[0];
-    expect(new Headers(init.headers).get("x-goog-api-key")).toBe("env-google-key");
+    const request = requestMock.mock.calls[0]?.[0] as { headers?: HeadersInit };
+    expect(new Headers(request.headers).get("x-goog-api-key")).toBe("env-google-key");
   });
 
   it("can reuse a configured Google model-provider API key without auth profiles", async () => {
-    const fetchMock = installGoogleTtsFetchMock();
+    const requestMock = installGoogleTtsRequestMock();
     const provider = buildGoogleSpeechProvider();
     const cfg = {
       models: {
@@ -182,13 +335,13 @@ describe("Google speech provider", () => {
       timeoutMs: 10_000,
     });
 
-    const [, init] = fetchMock.mock.calls[0];
-    expect(new Headers(init.headers).get("x-goog-api-key")).toBe("model-provider-google-key");
+    const request = requestMock.mock.calls[0]?.[0] as { headers?: HeadersInit };
+    expect(new Headers(request.headers).get("x-goog-api-key")).toBe("model-provider-google-key");
   });
 
   it("returns Gemini PCM directly for telephony synthesis", async () => {
     const pcm = Buffer.from([3, 0, 4, 0]);
-    installGoogleTtsFetchMock(pcm);
+    installGoogleTtsRequestMock(pcm);
     const provider = buildGoogleSpeechProvider();
 
     const result = await provider.synthesizeTelephony?.({
@@ -209,7 +362,7 @@ describe("Google speech provider", () => {
   });
 
   it("prepends configured Gemini TTS profile text", async () => {
-    const fetchMock = installGoogleTtsFetchMock();
+    const requestMock = installGoogleTtsRequestMock();
     const provider = buildGoogleSpeechProvider();
 
     await provider.synthesize({
@@ -224,8 +377,7 @@ describe("Google speech provider", () => {
       timeoutMs: 10_000,
     });
 
-    const [, init] = fetchMock.mock.calls[0];
-    expect(JSON.parse(String(init.body))).toMatchObject({
+    expect(requestMock.mock.calls[0]?.[0].body).toMatchObject({
       contents: [
         {
           parts: [
@@ -326,23 +478,26 @@ describe("Google speech provider", () => {
   });
 
   it("formats Google TTS HTTP errors with provider details", async () => {
-    vi.stubGlobal(
-      "fetch",
-      vi.fn().mockResolvedValue(
-        new Response(
-          JSON.stringify({
-            error: {
-              message: "Quota exceeded",
-              status: "RESOURCE_EXHAUSTED",
-            },
-          }),
-          {
-            status: 429,
-            headers: { "x-request-id": "google_req_123" },
-          },
-        ),
+    assertOkOrThrowProviderErrorMock.mockRejectedValue(
+      new Error(
+        "Google TTS failed (429): Quota exceeded [code=RESOURCE_EXHAUSTED] [request_id=google_req_123]",
       ),
     );
+    postJsonRequestMock.mockResolvedValue({
+      response: new Response(
+        JSON.stringify({
+          error: {
+            message: "Quota exceeded",
+            status: "RESOURCE_EXHAUSTED",
+          },
+        }),
+        {
+          status: 429,
+          headers: { "x-request-id": "google_req_123" },
+        },
+      ),
+      release: vi.fn(async () => {}),
+    });
     const provider = buildGoogleSpeechProvider();
 
     await expect(
@@ -359,8 +514,7 @@ describe("Google speech provider", () => {
   });
 
   it("honors configured private-network opt-in for Google TTS", async () => {
-    installGoogleTtsFetchMock();
-    const postJsonRequestSpy = vi.spyOn(providerHttp, "postJsonRequest");
+    installGoogleTtsRequestMock();
 
     const provider = buildGoogleSpeechProvider();
     await provider.synthesize({
@@ -381,14 +535,16 @@ describe("Google speech provider", () => {
       timeoutMs: 12_345,
     });
 
-    expect(postJsonRequestSpy).toHaveBeenCalledWith(
-      expect.objectContaining({ allowPrivateNetwork: true }),
+    expect(resolveProviderHttpRequestConfigMock).toHaveBeenCalledWith(
+      expect.objectContaining({
+        allowPrivateNetwork: true,
+        request: expect.objectContaining({ allowPrivateNetwork: true }),
+      }),
     );
   });
 
   it("honors configured private-network opt-in for Google telephony TTS", async () => {
-    installGoogleTtsFetchMock();
-    const postJsonRequestSpy = vi.spyOn(providerHttp, "postJsonRequest");
+    installGoogleTtsRequestMock();
 
     const provider = buildGoogleSpeechProvider();
     await provider.synthesizeTelephony?.({
@@ -408,8 +564,11 @@ describe("Google speech provider", () => {
       timeoutMs: 12_345,
     });
 
-    expect(postJsonRequestSpy).toHaveBeenCalledWith(
-      expect.objectContaining({ allowPrivateNetwork: true }),
+    expect(resolveProviderHttpRequestConfigMock).toHaveBeenCalledWith(
+      expect.objectContaining({
+        allowPrivateNetwork: true,
+        request: expect.objectContaining({ allowPrivateNetwork: true }),
+      }),
     );
   });
 });
@@ -21,6 +21,13 @@ const DEFAULT_GOOGLE_TTS_VOICE = "Kore";
 const GOOGLE_TTS_SAMPLE_RATE = 24_000;
 const GOOGLE_TTS_CHANNELS = 1;
 const GOOGLE_TTS_BITS_PER_SAMPLE = 16;
+const GOOGLE_AUDIO_PROFILE_PROMPT_TEMPLATE = "audio-profile-v1";
+
+const GOOGLE_TTS_MODELS = [
+  "gemini-3.1-flash-tts-preview",
+  "gemini-2.5-flash-preview-tts",
+  "gemini-2.5-pro-preview-tts",
+] as const;
 
 const GOOGLE_TTS_VOICES = [
   "Zephyr",
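The sample-rate, channel, and bit-depth constants above are the values a WAV container needs, and the tests check that the synthesized buffer carries `WAVE` at bytes 8 to 12 with the raw PCM starting at byte 44. The sketch below shows a standard 44-byte RIFF/WAVE header built from those values. The provider's own wrapping helper is not shown in this diff, so `pcmToWav` is a hypothetical stand-in that uses `Uint8Array` rather than Node's `Buffer`.

```typescript
const SAMPLE_RATE = 24_000; // mirrors GOOGLE_TTS_SAMPLE_RATE
const CHANNELS = 1; // mirrors GOOGLE_TTS_CHANNELS
const BITS_PER_SAMPLE = 16; // mirrors GOOGLE_TTS_BITS_PER_SAMPLE

// Hypothetical helper: prepend a canonical 44-byte PCM WAV header.
function pcmToWav(pcm: Uint8Array): Uint8Array {
  const wav = new Uint8Array(44 + pcm.length);
  const view = new DataView(wav.buffer);
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i += 1) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };
  const blockAlign = (CHANNELS * BITS_PER_SAMPLE) / 8; // bytes per sample frame
  const byteRate = SAMPLE_RATE * blockAlign;

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // RIFF chunk size, little-endian
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = linear PCM
  view.setUint16(22, CHANNELS, true);
  view.setUint32(24, SAMPLE_RATE, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, BITS_PER_SAMPLE, true);
  writeAscii(36, "data");
  view.setUint32(40, pcm.length, true); // data chunk size
  wav.set(pcm, 44); // raw samples follow the header
  return wav;
}
```

This layout is why the retry test compares `result.audioBuffer.subarray(44)` against the mocked PCM bytes.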
@@ -62,6 +69,8 @@ type GoogleTtsProviderConfig = {
   voiceName: string;
   audioProfile?: string;
   speakerName?: string;
+  promptTemplate?: typeof GOOGLE_AUDIO_PROFILE_PROMPT_TEMPLATE;
+  personaPrompt?: string;
 };
 
 type GoogleTtsProviderOverrides = {
@@ -91,6 +100,13 @@ type GoogleGenerateSpeechResponse = {
   }>;
 };
 
+class GoogleTtsRetryableError extends Error {
+  constructor(message: string) {
+    super(message);
+    this.name = "GoogleTtsRetryableError";
+  }
+}
+
 function normalizeGoogleTtsModel(model: unknown): string {
   const trimmed = normalizeOptionalString(model);
   if (!trimmed) {
@@ -104,6 +120,19 @@ function normalizeGoogleTtsVoiceName(voiceName: unknown): string {
   return normalizeOptionalString(voiceName) ?? DEFAULT_GOOGLE_TTS_VOICE;
 }
 
+function normalizeGooglePromptTemplate(
+  value: unknown,
+): typeof GOOGLE_AUDIO_PROFILE_PROMPT_TEMPLATE | undefined {
+  const trimmed = normalizeOptionalString(value);
+  if (!trimmed) {
+    return undefined;
+  }
+  if (trimmed === GOOGLE_AUDIO_PROFILE_PROMPT_TEMPLATE) {
+    return trimmed;
+  }
+  throw new Error(`Invalid Google TTS promptTemplate: ${trimmed}`);
+}
+
 function resolveGoogleTtsEnvApiKey(): string | undefined {
   return (
     normalizeOptionalString(process.env.GEMINI_API_KEY) ??
@@ -149,6 +178,8 @@ function normalizeGoogleTtsProviderConfig(
   rawConfig: Record<string, unknown>,
 ): GoogleTtsProviderConfig {
   const raw = resolveGoogleTtsConfigRecord(rawConfig);
+  const promptTemplate = normalizeGooglePromptTemplate(raw?.promptTemplate);
+  const personaPrompt = trimToUndefined(raw?.personaPrompt);
   return {
     apiKey: normalizeResolvedSecretInputString({
       value: raw?.apiKey,
@@ -159,11 +190,16 @@ function normalizeGoogleTtsProviderConfig(
     voiceName: normalizeGoogleTtsVoiceName(raw?.voiceName ?? raw?.voice),
     audioProfile: trimToUndefined(raw?.audioProfile),
     speakerName: trimToUndefined(raw?.speakerName),
+    ...(promptTemplate ? { promptTemplate } : {}),
+    ...(personaPrompt ? { personaPrompt } : {}),
   };
 }
 
 function readGoogleTtsProviderConfig(config: SpeechProviderConfig): GoogleTtsProviderConfig {
   const normalized = normalizeGoogleTtsProviderConfig({});
+  const promptTemplate =
+    normalizeGooglePromptTemplate(config.promptTemplate) ?? normalized.promptTemplate;
+  const personaPrompt = trimToUndefined(config.personaPrompt) ?? normalized.personaPrompt;
   return {
     apiKey: trimToUndefined(config.apiKey) ?? normalized.apiKey,
     baseUrl: trimToUndefined(config.baseUrl) ?? normalized.baseUrl,
@@ -173,6 +209,8 @@ function readGoogleTtsProviderConfig(config: SpeechProviderConfig): GoogleTtsPro
     ),
     audioProfile: trimToUndefined(config.audioProfile) ?? normalized.audioProfile,
     speakerName: trimToUndefined(config.speakerName) ?? normalized.speakerName,
+    ...(promptTemplate ? { promptTemplate } : {}),
+    ...(personaPrompt ? { personaPrompt } : {}),
   };
 }
 
@@ -243,6 +281,116 @@ function extractGoogleSpeechPcm(payload: GoogleGenerateSpeechResponse): Buffer {
   throw new Error("Google TTS response missing audio data");
 }
 
+function normalizePromptSectionText(value: string | undefined): string | undefined {
+  const trimmed = trimToUndefined(value?.replace(/\r\n?/g, "\n"));
+  if (!trimmed) {
+    return undefined;
+  }
+  let sanitized = "";
+  for (const char of trimmed) {
+    const code = char.charCodeAt(0);
+    if (
+      (code >= 0 && code <= 8) ||
+      code === 11 ||
+      code === 12 ||
+      (code >= 14 && code <= 31) ||
+      code === 127
+    ) {
+      continue;
+    }
+    sanitized += char;
+  }
+  return sanitized;
+}
+
+function normalizePromptList(values: readonly string[] | undefined): string[] {
+  return (values ?? [])
+    .map((value) => normalizePromptSectionText(value))
+    .filter((value): value is string => Boolean(value));
+}
+
+function isOpenClawGoogleAudioProfilePrompt(text: string): boolean {
+  return (
+    text.includes("# AUDIO PROFILE:") &&
+    text.includes("### TRANSCRIPT") &&
+    text.startsWith("Synthesize speech from the TRANSCRIPT section only.")
+  );
+}
+
+function renderGoogleAudioProfilePrompt(params: {
+  text: string;
+  persona?: {
+    id: string;
+    label?: string;
+    prompt?: {
+      profile?: string;
+      scene?: string;
+      sampleContext?: string;
+      style?: string;
+      accent?: string;
+      pacing?: string;
+      constraints?: string[];
+    };
+  };
+  personaPrompt?: string;
+}): string {
+  const transcript = params.text.replace(/\r\n?/g, "\n").trim();
+  const prompt = params.persona?.prompt;
+  const profile = normalizePromptSectionText(prompt?.profile);
+  const scene = normalizePromptSectionText(prompt?.scene);
+  const sampleContext = normalizePromptSectionText(prompt?.sampleContext);
+  const style = normalizePromptSectionText(prompt?.style);
+  const accent = normalizePromptSectionText(prompt?.accent);
+  const pacing = normalizePromptSectionText(prompt?.pacing);
+  const constraints = normalizePromptList(prompt?.constraints);
+  const personaPrompt = normalizePromptSectionText(params.personaPrompt);
+  const label =
+    normalizePromptSectionText(params.persona?.label) ??
+    normalizePromptSectionText(params.persona?.id);
+
+  const sections = [
+    [
+      "Synthesize speech from the TRANSCRIPT section only. Use the other sections only",
+      "as performance direction. Do not read section titles, notes, labels, or",
+      "configuration aloud.",
+    ].join("\n"),
+  ];
+
+  if (label || profile) {
+    sections.push([`# AUDIO PROFILE: ${label ?? "voice"}`, profile].filter(Boolean).join("\n"));
+  }
+  if (scene) {
+    sections.push(["## THE SCENE", scene].join("\n"));
+  }
+
+  const directorNotes: string[] = [];
+  if (style) {
+    directorNotes.push(`Style: ${style}`);
+  }
+  if (accent) {
+    directorNotes.push(`Accent: ${accent}`);
+  }
+  if (pacing) {
+    directorNotes.push(`Pacing: ${pacing}`);
+  }
+  if (constraints.length > 0) {
+    directorNotes.push(["Constraints:", ...constraints.map((item) => `- ${item}`)].join("\n"));
+  }
+  if (personaPrompt) {
+    directorNotes.push(["Provider notes:", personaPrompt].join("\n"));
+  }
+  if (directorNotes.length > 0) {
+    sections.push(["### DIRECTOR'S NOTES", ...directorNotes].join("\n"));
+  }
+
+  if (sampleContext) {
+    sections.push(["### SAMPLE CONTEXT", sampleContext].join("\n"));
+  }
+
+  sections.push(["### TRANSCRIPT", transcript].join("\n"));
+  return sections.join("\n\n");
+}
+
 function wrapPcm16MonoToWav(pcm: Buffer, sampleRate = GOOGLE_TTS_SAMPLE_RATE): Buffer {
   const byteRate = sampleRate * GOOGLE_TTS_CHANNELS * (GOOGLE_TTS_BITS_PER_SAMPLE / 8);
   const blockAlign = GOOGLE_TTS_CHANNELS * (GOOGLE_TTS_BITS_PER_SAMPLE / 8);
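The hunk above adds two ideas: each prompt section is normalized (newlines unified, ASCII control characters dropped), and the sections are then joined around a final TRANSCRIPT block. A minimal standalone sketch of that shape follows. The names `stripControlChars` and `renderProfilePrompt` are hypothetical, the section set is reduced to three fields, and the code is not the provider's actual helper.

```typescript
// Illustrative sketch only: `stripControlChars` and `renderProfilePrompt` are
// hypothetical names, not exports of the Google speech provider.

/** Normalize newlines, trim, and drop ASCII control characters except tab and newline. */
function stripControlChars(value: string | undefined): string | undefined {
  const trimmed = value?.replace(/\r\n?/g, "\n").trim();
  if (!trimmed) {
    return undefined;
  }
  let out = "";
  for (const char of trimmed) {
    const code = char.charCodeAt(0);
    const isControl = (code <= 31 && code !== 9 && code !== 10) || code === 127;
    if (!isControl) {
      out += char;
    }
  }
  return out;
}

/** Build a sectioned prompt; only the TRANSCRIPT section is meant to be spoken. */
function renderProfilePrompt(params: {
  transcript: string;
  label?: string;
  profile?: string;
  style?: string;
}): string {
  const label = stripControlChars(params.label);
  const profile = stripControlChars(params.profile);
  const style = stripControlChars(params.style);

  const sections: string[] = ["Synthesize speech from the TRANSCRIPT section only."];
  if (label || profile) {
    sections.push([`# AUDIO PROFILE: ${label ?? "voice"}`, profile].filter(Boolean).join("\n"));
  }
  if (style) {
    sections.push(["### DIRECTOR'S NOTES", `Style: ${style}`].join("\n"));
  }
  const transcript = params.transcript.replace(/\r\n?/g, "\n").trim();
  sections.push(["### TRANSCRIPT", transcript].join("\n"));
  return sections.join("\n\n");
}

// Usage: the bell character (\u0007) and the trailing CRLF are removed.
const rendered = renderProfilePrompt({
  transcript: "Good evening.\r\n",
  label: "Alfred",
  profile: "A precise\u0007 butler.",
  style: "Dry",
});
```

Keeping the transcript as the last section, with a fixed opening instruction, is what lets a guard such as `isOpenClawGoogleAudioProfilePrompt` recognize already-wrapped text and avoid wrapping it twice.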
@@ -265,7 +413,7 @@ function wrapPcm16MonoToWav(pcm: Buffer, sampleRate = GOOGLE_TTS_SAMPLE_RATE): B
   return Buffer.concat([header, pcm]);
 }
 
-async function synthesizeGoogleTtsPcm(params: {
+async function synthesizeGoogleTtsPcmOnce(params: {
   text: string;
   apiKey: string;
   baseUrl?: string;
@@ -322,19 +470,59 @@ async function synthesizeGoogleTtsPcm(params: {
   });
 
   try {
-    await assertOkOrThrowProviderError(res, "Google TTS failed");
-    return extractGoogleSpeechPcm((await res.json()) as GoogleGenerateSpeechResponse);
+    if (!res.ok) {
+      try {
+        await assertOkOrThrowProviderError(res, "Google TTS failed");
+      } catch (err) {
+        const message = err instanceof Error ? err.message : String(err);
+        if (res.status >= 500 && res.status < 600) {
+          throw new GoogleTtsRetryableError(message);
+        }
+        throw err;
+      }
+    }
+    try {
+      return extractGoogleSpeechPcm((await res.json()) as GoogleGenerateSpeechResponse);
+    } catch (err) {
+      const message = err instanceof Error ? err.message : String(err);
+      throw new GoogleTtsRetryableError(message);
+    }
   } finally {
     await release();
   }
 }
 
+async function synthesizeGoogleTtsPcm(params: {
+  text: string;
+  apiKey: string;
+  baseUrl?: string;
+  request?: ReturnType<typeof sanitizeConfiguredModelProviderRequest>;
+  model: string;
+  voiceName: string;
+  audioProfile?: string;
+  speakerName?: string;
+  timeoutMs: number;
+}): Promise<Buffer> {
+  let lastError: unknown;
+  for (let attempt = 0; attempt < 2; attempt += 1) {
+    try {
+      return await synthesizeGoogleTtsPcmOnce(params);
+    } catch (err) {
+      lastError = err;
+      if (!(err instanceof GoogleTtsRetryableError) || attempt > 0) {
+        throw err;
+      }
+    }
+  }
+  throw lastError instanceof Error ? lastError : new Error(String(lastError));
+}
+
 export function buildGoogleSpeechProvider(): SpeechProviderPlugin {
   return {
     id: "google",
     label: "Google",
     autoSelectOrder: 50,
-    models: [DEFAULT_GOOGLE_TTS_MODEL],
+    models: GOOGLE_TTS_MODELS,
     voices: GOOGLE_TTS_VOICES,
     resolveConfig: ({ rawConfig }) => normalizeGoogleTtsProviderConfig(rawConfig),
     parseDirectiveToken,
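The change above splits synthesis into a single-attempt function and a wrapper that retries once, but only for errors tagged as retryable (5xx responses and unparseable audio payloads). The pattern can be sketched in isolation as below. `RetryableError`, `withSingleRetry`, and `flaky` are hypothetical names used for illustration, and the sketch assumes a runtime with native class inheritance from `Error` (ES2015 or later).

```typescript
// Illustrative sketch only: `RetryableError` and `withSingleRetry` are
// hypothetical names standing in for the provider's internal retry wrapper.
class RetryableError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "RetryableError";
  }
}

/** Run `attempt`, retrying exactly once if the first failure is a RetryableError. */
async function withSingleRetry<T>(attempt: () => Promise<T>): Promise<T> {
  for (let tries = 0; tries < 2; tries += 1) {
    try {
      return await attempt();
    } catch (err) {
      const canRetry = err instanceof RetryableError && tries === 0;
      if (!canRetry) {
        throw err;
      }
    }
  }
  // Not reached in practice: the loop returns or rethrows by the second attempt.
  throw new Error("retry loop exited unexpectedly");
}

// Usage: the first call fails with a retryable error, the second succeeds.
let calls = 0;
const flaky = async (): Promise<string> => {
  calls += 1;
  if (calls === 1) {
    throw new RetryableError("HTTP 503");
  }
  return "pcm-bytes";
};
```

Tagging errors at the point where the HTTP status is known keeps the retry loop itself free of transport details; client errors such as a 400 pass through unchanged on the first attempt.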
@@ -372,6 +560,22 @@ export function buildGoogleSpeechProvider(): SpeechProviderPlugin {
     listVoices: async () => GOOGLE_TTS_VOICES.map((voice) => ({ id: voice, name: voice })),
     isConfigured: ({ cfg, providerConfig }) =>
       Boolean(resolveGoogleTtsApiKey({ cfg, providerConfig })),
+    prepareSynthesis: (ctx) => {
+      const config = readGoogleTtsProviderConfig(ctx.providerConfig);
+      const shouldWrap =
+        config.promptTemplate === GOOGLE_AUDIO_PROFILE_PROMPT_TEMPLATE ||
+        Boolean(config.personaPrompt);
+      if (!shouldWrap || isOpenClawGoogleAudioProfilePrompt(ctx.text)) {
+        return undefined;
+      }
+      return {
+        text: renderGoogleAudioProfilePrompt({
+          text: ctx.text,
+          persona: ctx.persona,
+          personaPrompt: config.personaPrompt,
+        }),
+      };
+    },
     synthesize: async (req) => {
       const config = readGoogleTtsProviderConfig(req.providerConfig);
       const overrides = readGoogleTtsOverrides(req.providerOverrides);
@@ -449,7 +653,10 @@ export function buildGoogleSpeechProvider(): SpeechProviderPlugin {
 export const __testing = {
   DEFAULT_GOOGLE_TTS_MODEL,
   DEFAULT_GOOGLE_TTS_VOICE,
+  GOOGLE_AUDIO_PROFILE_PROMPT_TEMPLATE,
+  GOOGLE_TTS_MODELS,
   GOOGLE_TTS_SAMPLE_RATE,
   normalizeGoogleTtsModel,
+  renderGoogleAudioProfilePrompt,
   wrapPcm16MonoToWav,
 };
|
|||||||
@@ -134,6 +134,7 @@ function createLiveTtsConfig(): ResolvedTtsConfig {
         voice: "alloy",
       },
     },
+    personas: {},
     maxTextLength: 4_000,
     timeoutMs: 30_000,
   };
|
|||||||
@@ -162,6 +162,40 @@ describe("buildOpenAISpeechProvider", () => {
     });
   });
 
+  it("maps persona prompt fields to instructions when instructions are unset", async () => {
+    const provider = buildOpenAISpeechProvider();
+
+    const prepared = await provider.prepareSynthesis?.({
+      text: "hello",
+      cfg: {} as never,
+      providerConfig: {
+        apiKey: "sk-test",
+        model: "gpt-4o-mini-tts",
+        voice: "cedar",
+      },
+      persona: {
+        id: "alfred",
+        label: "Alfred",
+        prompt: {
+          profile: "A brilliant British butler.",
+          scene: "A quiet late-night study.",
+          sampleContext: "The speaker is answering a trusted operator.",
+          style: "Refined and lightly amused.",
+          accent: "British English.",
+          pacing: "Measured.",
+          constraints: ["Do not read configuration values aloud."],
+        },
+      },
+      target: "audio-file",
+      timeoutMs: 1_000,
+    });
+
+    expect(prepared?.providerConfig?.instructions).toContain("Persona: Alfred");
+    expect(prepared?.providerConfig?.instructions).toContain(
+      "Constraint: Do not read configuration values aloud.",
+    );
+  });
+
   it("uses wav for Groq-compatible OpenAI TTS endpoints", async () => {
     const provider = buildOpenAISpeechProvider();
     mockSpeechFetchExpectingFormat("wav");
|
|||||||
@@ -71,7 +71,7 @@ function isGroqSpeechBaseUrl(baseUrl: string): boolean {
 
 function resolveSpeechResponseFormat(
   baseUrl: string,
-  target: "audio-file" | "voice-note",
+  target: "audio-file" | "voice-note" | "telephony",
   configuredFormat?: OpenAiSpeechResponseFormat,
 ): OpenAiSpeechResponseFormat {
   if (configuredFormat) {
@@ -145,6 +145,37 @@ function readOpenAIOverrides(
   };
 }
 
+function renderOpenAITtsPersonaInstructions(req: {
+  label?: string;
+  prompt?: {
+    profile?: string;
+    scene?: string;
+    sampleContext?: string;
+    style?: string;
+    accent?: string;
+    pacing?: string;
+    constraints?: string[];
+  };
+}): string | undefined {
+  const prompt = req.prompt;
+  if (!prompt) {
+    return undefined;
+  }
+  const lines = [
+    req.label ? `Persona: ${req.label}` : undefined,
+    prompt.profile ? `Profile: ${prompt.profile}` : undefined,
+    prompt.scene ? `Scene: ${prompt.scene}` : undefined,
+    prompt.style ? `Style: ${prompt.style}` : undefined,
+    prompt.accent ? `Accent: ${prompt.accent}` : undefined,
+    prompt.pacing ? `Pacing: ${prompt.pacing}` : undefined,
+    prompt.sampleContext ? `Sample context: ${prompt.sampleContext}` : undefined,
+    ...(prompt.constraints ?? []).map((constraint) => `Constraint: ${constraint}`),
+  ]
+    .map((line) => trimToUndefined(line))
+    .filter((line): line is string => Boolean(line));
+  return lines.length > 0 ? lines.join("\n") : undefined;
+}
+
 function parseDirectiveToken(ctx: SpeechDirectiveTokenParseContext): {
   handled: boolean;
   overrides?: SpeechProviderOverrides;
@@ -229,6 +260,23 @@ export function buildOpenAISpeechProvider(): SpeechProviderPlugin {
     listVoices: async () => OPENAI_TTS_VOICES.map((voice) => ({ id: voice, name: voice })),
     isConfigured: ({ providerConfig }) =>
       Boolean(readOpenAIProviderConfig(providerConfig).apiKey || process.env.OPENAI_API_KEY),
+    prepareSynthesis: (ctx) => {
+      const config = readOpenAIProviderConfig(ctx.providerConfig);
+      if (config.instructions) {
+        return undefined;
+      }
+      const instructions = renderOpenAITtsPersonaInstructions({
+        label: ctx.persona?.label ?? ctx.persona?.id,
+        prompt: ctx.persona?.prompt,
+      });
+      return instructions
+        ? {
+            providerConfig: {
+              instructions,
+            },
+          }
+        : undefined;
+    },
     synthesize: async (req) => {
       const config = readOpenAIProviderConfig(req.providerConfig);
       const overrides = readOpenAIOverrides(req.providerOverrides);
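For the OpenAI provider, the persona is flattened into `instructions` text, and explicitly configured instructions take precedence over persona-derived ones. A reduced sketch of that precedence rule is shown below. `PersonaPrompt`, `renderInstructions`, and `resolveInstructions` are hypothetical names, and only a subset of the prompt fields is modelled; the real hook returns a partial `providerConfig` rather than a string.

```typescript
// Illustrative sketch only: `PersonaPrompt`, `renderInstructions`, and
// `resolveInstructions` are hypothetical stand-ins for the provider internals.
type PersonaPrompt = {
  profile?: string;
  style?: string;
  pacing?: string;
  constraints?: string[];
};

/** Flatten a persona into newline-separated "Label: value" instruction lines. */
function renderInstructions(
  label: string | undefined,
  prompt: PersonaPrompt | undefined,
): string | undefined {
  if (!prompt) {
    return undefined;
  }
  const lines = [
    label ? `Persona: ${label}` : undefined,
    prompt.profile ? `Profile: ${prompt.profile}` : undefined,
    prompt.style ? `Style: ${prompt.style}` : undefined,
    prompt.pacing ? `Pacing: ${prompt.pacing}` : undefined,
    ...(prompt.constraints ?? []).map((constraint) => `Constraint: ${constraint}`),
  ].filter((line): line is string => Boolean(line && line.trim()));
  return lines.length > 0 ? lines.join("\n") : undefined;
}

/** Configured instructions win; persona text only fills the gap when they are unset. */
function resolveInstructions(
  configured: string | undefined,
  label: string | undefined,
  prompt: PersonaPrompt | undefined,
): string | undefined {
  return configured ? configured : renderInstructions(label, prompt);
}

// Usage: no configured instructions, so the persona is rendered.
const personaInstructions = resolveInstructions(undefined, "Alfred", {
  profile: "A brilliant British butler.",
  constraints: ["Do not read configuration values aloud."],
});
```

Returning `undefined` when nothing applies mirrors the hook in the diff, which leaves the provider config untouched unless there is persona text to contribute.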
|
|||||||
@@ -3,11 +3,13 @@ export {
   getLastTtsAttempt,
   getResolvedSpeechProviderConfig,
   getTtsMaxLength,
+  getTtsPersona,
   getTtsProvider,
   isSummarizationEnabled,
   isTtsEnabled,
   isTtsProviderConfigured,
   listSpeechVoices,
+  listTtsPersonas,
   maybeApplyTtsToPayload,
   resolveExplicitTtsOverrides,
   resolveTtsAutoMode,
@@ -19,6 +21,7 @@ export {
   setTtsAutoMode,
   setTtsEnabled,
   setTtsMaxLength,
+  setTtsPersona,
   setTtsProvider,
   synthesizeSpeech,
   textToSpeech,
|
|||||||
@@ -1,7 +1,12 @@
 import { rmSync } from "node:fs";
 import path from "node:path";
 import type { OpenClawConfig } from "openclaw/plugin-sdk/config-runtime";
-import type { SpeechProviderPlugin, SpeechSynthesisRequest } from "openclaw/plugin-sdk/speech-core";
+import type { ReplyPayload } from "openclaw/plugin-sdk/reply-payload";
+import type {
+  SpeechProviderPlugin,
+  SpeechProviderPrepareSynthesisContext,
+  SpeechSynthesisRequest,
+} from "openclaw/plugin-sdk/speech-core";
 import { afterEach, describe, expect, it, vi } from "vitest";
 
 type MockSpeechSynthesisResult = Awaited<ReturnType<SpeechProviderPlugin["synthesize"]>>;
@@ -16,6 +21,9 @@ const synthesizeMock = vi.hoisted(() =>
     }),
   ),
 );
+const prepareSynthesisMock = vi.hoisted(() =>
+  vi.fn(async (_ctx: SpeechProviderPrepareSynthesisContext) => undefined),
+);
 
 const listSpeechProvidersMock = vi.hoisted(() => vi.fn());
 const getSpeechProviderMock = vi.hoisted(() => vi.fn());
@@ -31,6 +39,7 @@ vi.mock("../api.js", async () => {
     label: "Mock",
     autoSelectOrder: 1,
     isConfigured: () => true,
+    prepareSynthesis: prepareSynthesisMock,
     synthesize: synthesizeMock,
   };
   listSpeechProvidersMock.mockImplementation(() => [mockProvider]);
@@ -49,10 +58,40 @@ vi.mock("../api.js", async () => {
   };
 });
 
-const { _test, maybeApplyTtsToPayload, resolveTtsConfig } = await import("./tts.js");
+const {
+  _test,
+  getTtsPersona,
+  getTtsProvider,
+  maybeApplyTtsToPayload,
+  resolveTtsConfig,
+  synthesizeSpeech,
+  textToSpeechTelephony,
+} = await import("./tts.js");
 
 const nativeVoiceNoteChannels = ["discord", "feishu", "matrix", "telegram", "whatsapp"] as const;
 
+function createMockSpeechProvider(
+  id = "mock",
+  options: Partial<SpeechProviderPlugin> = {},
+): SpeechProviderPlugin {
+  return {
+    id,
+    label: id,
+    autoSelectOrder: id === "mock" ? 1 : 2,
+    isConfigured: () => true,
+    prepareSynthesis: prepareSynthesisMock,
+    synthesize: synthesizeMock,
+    ...options,
+  };
+}
+
+function installSpeechProviders(providers: SpeechProviderPlugin[]): void {
+  listSpeechProvidersMock.mockImplementation(() => providers);
+  getSpeechProviderMock.mockImplementation(
+    (providerId: string) => providers.find((provider) => provider.id === providerId) ?? null,
+  );
+}
+
 function createTtsConfig(prefsName: string): OpenClawConfig {
   return {
     messages: {
@@ -102,6 +141,8 @@ async function expectTtsPayloadResult(params: {
 describe("speech-core native voice-note routing", () => {
   afterEach(() => {
     synthesizeMock.mockClear();
+    prepareSynthesisMock.mockClear();
+    installSpeechProviders([createMockSpeechProvider()]);
   });
 
   it("keeps native voice-note channel support centralized", () => {
@@ -153,6 +194,268 @@ describe("speech-core native voice-note routing", () => {
|
|||||||
audioAsVoice: undefined,
|
audioAsVoice: undefined,
|
||||||
});
|
});
|
||||||
});
|
});
|
||||||
|
|
||||||
|
it("selects persona preferred provider before config fallback", () => {
|
||||||
|
const cfg: OpenClawConfig = {
|
||||||
|
messages: {
|
||||||
|
tts: {
|
||||||
|
enabled: true,
|
||||||
|
provider: "other",
|
||||||
|
persona: "alfred",
|
||||||
|
personas: {
|
||||||
|
alfred: {
|
||||||
|
label: "Alfred",
|
||||||
|
provider: "mock",
|
||||||
|
providers: {
|
||||||
|
mock: {
|
||||||
|
voice: "Algieba",
|
||||||
|
},
|
||||||
|
},
|
||||||
|
},
|
||||||
|
},
|
||||||
|
},
|
||||||
|
},
|
||||||
|
};
|
||||||
|
const config = resolveTtsConfig(cfg);
|
||||||
|
const prefsPath = "/tmp/openclaw-speech-core-persona-provider.json";
|
||||||
|
|
||||||
|
expect(getTtsPersona(config, prefsPath)?.id).toBe("alfred");
|
||||||
|
expect(getTtsProvider(config, prefsPath)).toBe("mock");
|
||||||
|
});
|
||||||
|
|
||||||
|
it("merges active persona provider binding into synthesis config", async () => {
|
||||||
|
const cfg: OpenClawConfig = {
|
||||||
|
messages: {
|
||||||
|
tts: {
|
||||||
|
enabled: true,
|
||||||
|
provider: "mock",
|
||||||
|
prefsPath: "/tmp/openclaw-speech-core-persona-merge.json",
|
||||||
|
providers: {
|
||||||
|
mock: {
|
||||||
|
model: "base-model",
|
||||||
|
voice: "base-voice",
|
||||||
|
},
|
||||||
|
},
|
||||||
|
persona: "alfred",
|
||||||
|
personas: {
|
||||||
|
alfred: {
|
||||||
|
provider: "mock",
|
||||||
|
providers: {
|
||||||
|
mock: {
|
||||||
|
voice: "persona-voice",
|
||||||
|
style: "dry",
|
||||||
|
},
|
||||||
|
},
|
||||||
|
},
|
||||||
|
},
|
||||||
|
},
|
||||||
|
},
|
||||||
|
};
|
||||||
|
|
||||||
|
const payload: ReplyPayload = {
|
||||||
|
text: "This reply should use persona-specific provider configuration.",
|
||||||
|
};
|
||||||
|
|
||||||
|
let mediaDir: string | undefined;
|
||||||
|
try {
|
||||||
|
const result = await maybeApplyTtsToPayload({
|
||||||
|
payload,
|
||||||
|
cfg,
|
||||||
|
channel: "slack",
|
||||||
|
kind: "final",
|
||||||
|
});
|
||||||
|
|
||||||
|
expect(synthesizeMock).toHaveBeenCalledWith(
|
||||||
|
expect.objectContaining({
|
||||||
|
providerConfig: expect.objectContaining({
|
||||||
|
model: "base-model",
|
||||||
|
voice: "persona-voice",
|
||||||
|
style: "dry",
|
||||||
|
}),
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
expect(result.mediaUrl).toMatch(/voice-\d+\.ogg$/);
|
||||||
|
|
||||||
|
mediaDir = result.mediaUrl ? path.dirname(result.mediaUrl) : undefined;
|
||||||
|
} finally {
|
||||||
|
if (mediaDir) {
|
||||||
|
rmSync(mediaDir, { recursive: true, force: true });
|
||||||
|
}
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
it("does not mark skipped unregistered providers as missing persona bindings", async () => {
|
||||||
|
const result = await synthesizeSpeech({
|
||||||
|
text: "Use fallback provider.",
|
||||||
|
cfg: {
|
||||||
|
messages: {
|
||||||
|
tts: {
|
||||||
|
enabled: true,
|
||||||
|
provider: "missing",
|
||||||
|
persona: "alfred",
|
||||||
|
personas: {
|
||||||
|
alfred: {
|
||||||
|
providers: {
|
||||||
|
missing: {
|
||||||
|
voice: "configured-but-unregistered",
|
||||||
|
},
|
||||||
|
},
|
||||||
|
},
|
||||||
|
},
|
||||||
|
},
|
||||||
|
},
|
||||||
|
},
|
||||||
|
});
|
||||||
|
|
||||||
|
expect(result.success).toBe(true);
|
||||||
|
expect(result.attempts?.[0]).toMatchObject({
|
||||||
|
provider: "missing",
|
||||||
|
outcome: "skipped",
|
||||||
|
reasonCode: "no_provider_registered",
|
||||||
|
persona: "alfred",
|
||||||
|
});
|
||||||
|
expect(result.attempts?.[0]).not.toHaveProperty("personaBinding");
|
||||||
|
});
|
||||||
|
|
||||||
|
it("does not mark skipped telephony providers as missing persona bindings", async () => {
|
||||||
|
const result = await textToSpeechTelephony({
|
||||||
|
text: "Use telephony provider.",
|
||||||
|
cfg: {
|
||||||
|
messages: {
|
||||||
|
tts: {
|
||||||
|
enabled: true,
|
||||||
|
provider: "mock",
|
||||||
|
persona: "alfred",
|
||||||
|
personas: {
|
||||||
|
alfred: {
|
||||||
|
providers: {
|
||||||
|
mock: {
|
||||||
|
voice: "persona-voice",
|
||||||
|
},
|
||||||
|
},
|
||||||
|
},
|
||||||
|
},
|
||||||
|
},
|
||||||
|
},
|
||||||
|
},
|
||||||
|
});
|
||||||
|
|
||||||
|
expect(result.success).toBe(false);
|
||||||
|
expect(result.attempts?.[0]).toMatchObject({
|
||||||
|
provider: "mock",
|
||||||
|
outcome: "skipped",
|
||||||
|
reasonCode: "unsupported_for_telephony",
|
||||||
|
persona: "alfred",
|
||||||
|
});
|
||||||
|
expect(result.attempts?.[0]).not.toHaveProperty("personaBinding");
|
||||||
|
});
|
||||||
|
+  it("uses provider defaults when fallback policy allows missing persona bindings", async () => {
+    await synthesizeSpeech({
+      text: "Use neutral provider defaults.",
+      cfg: {
+        messages: {
+          tts: {
+            enabled: true,
+            provider: "mock",
+            persona: "alfred",
+            personas: {
+              alfred: {
+                fallbackPolicy: "provider-defaults",
+                prompt: {
+                  profile: "A precise butler.",
+                },
+              },
+            },
+          },
+        },
+      },
+    });
+
+    expect(prepareSynthesisMock).toHaveBeenCalledWith(
+      expect.objectContaining({
+        persona: undefined,
+        personaProviderConfig: undefined,
+      }),
+    );
+  });
+
+  it("preserves persona prompts by default when provider bindings are missing", async () => {
+    await synthesizeSpeech({
+      text: "Use persona prompt.",
+      cfg: {
+        messages: {
+          tts: {
+            enabled: true,
+            provider: "mock",
+            persona: "alfred",
+            personas: {
+              alfred: {
+                prompt: {
+                  profile: "A precise butler.",
+                },
+              },
+            },
+          },
+        },
+      },
+    });
+
+    expect(prepareSynthesisMock).toHaveBeenCalledWith(
+      expect.objectContaining({
+        persona: expect.objectContaining({ id: "alfred" }),
+        personaProviderConfig: undefined,
+      }),
+    );
+  });
+
+  it("skips unbound providers under fail policy while allowing bound fallbacks", async () => {
+    installSpeechProviders([
+      createMockSpeechProvider("mock", { autoSelectOrder: 1 }),
+      createMockSpeechProvider("fallback", { autoSelectOrder: 2 }),
+    ]);
+
+    const result = await synthesizeSpeech({
+      text: "Use the first persona-bound provider.",
+      cfg: {
+        messages: {
+          tts: {
+            enabled: true,
+            provider: "mock",
+            persona: "alfred",
+            personas: {
+              alfred: {
+                fallbackPolicy: "fail",
+                providers: {
+                  fallback: {
+                    voice: "fallback-voice",
+                  },
+                },
+              },
+            },
+          },
+        },
+      },
+    });
+
+    expect(result.success).toBe(true);
+    expect(result.provider).toBe("fallback");
+    expect(result.fallbackFrom).toBe("mock");
+    expect(result.attempts?.[0]).toMatchObject({
+      provider: "mock",
+      outcome: "skipped",
+      reasonCode: "not_configured",
+      persona: "alfred",
+      personaBinding: "missing",
+      error: "mock: persona alfred has no provider binding",
+    });
+    expect(result.attempts?.[1]).toMatchObject({
+      provider: "fallback",
+      outcome: "success",
+      persona: "alfred",
+      personaBinding: "applied",
+    });
+  });
 });
 
 describe("speech-core per-agent TTS config", () => {
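The three tests above pin down what happens when the active persona has no binding for the provider being tried. The sketch below restates that decision as a standalone function. The type names and `decideForProvider` are hypothetical simplifications for illustration, not the module's API, and the real `fallbackPolicy` type may allow more values than shown here.

```typescript
// Hypothetical, simplified persona shape; only the fields the tests exercise.
type PersonaSketch = {
  id: string;
  fallbackPolicy?: "provider-defaults" | "fail"; // assumption: other values may exist upstream
  providers?: Record<string, { voice?: string }>;
};

type Decision =
  | { kind: "skip"; message: string }
  | { kind: "ready"; personaForPrompt?: PersonaSketch; binding: "applied" | "missing" };

function decideForProvider(persona: PersonaSketch, providerId: string): Decision {
  const bound =
    persona.providers !== undefined &&
    Object.prototype.hasOwnProperty.call(persona.providers, providerId);
  if (bound) {
    return { kind: "ready", personaForPrompt: persona, binding: "applied" };
  }
  if (persona.fallbackPolicy === "fail") {
    // Unbound provider under "fail": skip it so a later, bound provider can run.
    return {
      kind: "skip",
      message: `${providerId}: persona ${persona.id} has no provider binding`,
    };
  }
  return {
    kind: "ready",
    // "provider-defaults" drops the persona entirely; the default keeps its prompt.
    personaForPrompt: persona.fallbackPolicy === "provider-defaults" ? undefined : persona,
    binding: "missing",
  };
}

const alfred: PersonaSketch = {
  id: "alfred",
  fallbackPolicy: "fail",
  providers: { fallback: { voice: "fallback-voice" } },
};
console.log(decideForProvider(alfred, "mock").kind);
console.log(decideForProvider(alfred, "fallback").kind);
```

Under these assumptions the first call yields a skip and the second is ready with the binding applied, which mirrors the `attempts` array asserted in the last test.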
@@ -12,6 +12,7 @@ import path from "node:path";
 import { normalizeChannelId, type ChannelId } from "openclaw/plugin-sdk/channel-targets";
 import type {
   OpenClawConfig,
+  ResolvedTtsPersona,
   TtsAutoMode,
   TtsConfig,
   TtsModelOverrideConfig,
@@ -40,6 +41,7 @@ import {
   normalizeSpeechProviderId,
   normalizeTtsAutoMode,
   parseTtsDirectives,
+  resolveEffectiveTtsConfig,
   type ResolvedTtsConfig,
   type ResolvedTtsModelOverrides,
   scheduleCleanup,
@@ -62,13 +64,13 @@ const DEFAULT_TIMEOUT_MS = 30_000;
 const DEFAULT_TTS_MAX_LENGTH = 1500;
 const DEFAULT_TTS_SUMMARIZE = true;
 const DEFAULT_MAX_TEXT_LENGTH = 4096;
-const BLOCKED_MERGE_KEYS = new Set(["__proto__", "prototype", "constructor"]);
 
 type TtsUserPrefs = {
   tts?: {
     auto?: TtsAutoMode;
     enabled?: boolean;
     provider?: TtsProvider;
+    persona?: string | null;
     maxLength?: number;
     summarize?: boolean;
   };
@@ -86,6 +88,8 @@ export type TtsProviderAttempt = {
   provider: string;
   outcome: "success" | "skipped" | "failed";
   reasonCode: TtsAttemptReasonCode;
+  persona?: string;
+  personaBinding?: "applied" | "missing" | "none";
   latencyMs?: number;
   error?: string;
 };
@@ -96,6 +100,7 @@ export type TtsResult = {
   error?: string;
   latencyMs?: number;
   provider?: string;
+  persona?: string;
   fallbackFrom?: string;
   attemptedProviders?: string[];
   attempts?: TtsProviderAttempt[];
@@ -111,6 +116,7 @@ export type TtsSynthesisResult = {
   error?: string;
   latencyMs?: number;
   provider?: string;
+  persona?: string;
   fallbackFrom?: string;
   attemptedProviders?: string[];
   attempts?: TtsProviderAttempt[];
@@ -126,6 +132,7 @@ export type TtsTelephonyResult = {
   error?: string;
   latencyMs?: number;
   provider?: string;
+  persona?: string;
   fallbackFrom?: string;
   attemptedProviders?: string[];
   attempts?: TtsProviderAttempt[];
@@ -139,6 +146,7 @@ type TtsStatusEntry = {
   textLength: number;
   summarized: boolean;
   provider?: string;
+  persona?: string;
   fallbackFrom?: string;
   attemptedProviders?: string[];
   attempts?: TtsProviderAttempt[];
@@ -162,6 +170,10 @@ function normalizeConfiguredSpeechProviderId(
   return normalized === "edge" ? "microsoft" : normalized;
 }
 
+function normalizeTtsPersonaId(personaId: string | null | undefined): string | undefined {
+  return normalizeOptionalLowercaseString(personaId ?? undefined);
+}
+
 function resolveTtsPrefsPathValue(prefsPath: string | undefined): string {
   if (prefsPath?.trim()) {
     return resolveUserPath(prefsPath.trim());
@@ -229,6 +241,87 @@ function asProviderConfigMap(value: unknown): Record<string, unknown> {
     : {};
 }
 
+function hasOwnProperty(value: object, key: string): boolean {
+  return Object.prototype.hasOwnProperty.call(value, key);
+}
+
+function normalizeProviderConfigMap(
+  value: unknown,
+): Record<string, SpeechProviderConfig> | undefined {
+  const rawMap = asProviderConfigMap(value);
+  if (Object.keys(rawMap).length === 0) {
+    return undefined;
+  }
+  const next: Record<string, SpeechProviderConfig> = {};
+  for (const [providerId, providerConfig] of Object.entries(rawMap)) {
+    const normalized = normalizeConfiguredSpeechProviderId(providerId) ?? providerId;
+    next[normalized] = asProviderConfig(providerConfig);
+  }
+  return next;
+}
+
+function collectTtsPersonas(raw: TtsConfig): Record<string, ResolvedTtsPersona> {
+  const rawPersonas = asProviderConfigMap(raw.personas);
+  const personas: Record<string, ResolvedTtsPersona> = {};
+  for (const [id, value] of Object.entries(rawPersonas)) {
+    const normalizedId = normalizeTtsPersonaId(id);
+    if (!normalizedId || typeof value !== "object" || value === null || Array.isArray(value)) {
+      continue;
+    }
+    const persona = value as Omit<ResolvedTtsPersona, "id">;
+    personas[normalizedId] = {
+      ...persona,
+      id: normalizedId,
+      provider: normalizeConfiguredSpeechProviderId(persona.provider) ?? persona.provider,
+      providers: normalizeProviderConfigMap(persona.providers),
+    };
+  }
+  return personas;
+}
+
+function resolvePersonaProviderConfig(
+  persona: ResolvedTtsPersona | undefined,
+  providerId: string,
+): SpeechProviderConfig | undefined {
+  if (!persona?.providers) {
+    return undefined;
+  }
+  const normalized = normalizeConfiguredSpeechProviderId(providerId) ?? providerId;
+  if (hasOwnProperty(persona.providers, normalized)) {
+    return persona.providers[normalized];
+  }
+  if (hasOwnProperty(persona.providers, providerId)) {
+    return persona.providers[providerId];
+  }
+  return undefined;
+}
+
+function mergeProviderConfigWithPersona(params: {
+  providerConfig: SpeechProviderConfig;
+  persona?: ResolvedTtsPersona;
+  providerId: string;
+}): {
+  providerConfig: SpeechProviderConfig;
+  personaProviderConfig?: SpeechProviderConfig;
+  personaBinding: "applied" | "missing" | "none";
+} {
+  if (!params.persona) {
+    return { providerConfig: params.providerConfig, personaBinding: "none" };
+  }
+  const personaProviderConfig = resolvePersonaProviderConfig(params.persona, params.providerId);
+  if (!personaProviderConfig) {
+    return { providerConfig: params.providerConfig, personaBinding: "missing" };
+  }
+  return {
+    providerConfig: {
+      ...params.providerConfig,
+      ...personaProviderConfig,
+    },
+    personaProviderConfig,
+    personaBinding: "applied",
+  };
+}
+
 function resolveRawProviderConfig(
   raw: TtsConfig | undefined,
   providerId: string,
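The merge added in this hunk is a shallow object spread, so persona binding keys override provider config keys one level deep and nested objects are replaced rather than merged. A minimal standalone sketch of that behavior follows. `mergeWithPersona`, the simplified types, and the sample values (`model-a`, `voiceSettings`) are hypothetical stand-ins for illustration, not the module's own definitions.

```typescript
// Hypothetical simplified types for illustration only.
type ProviderConfig = Record<string, unknown>;
type Persona = { id: string; providers?: Record<string, ProviderConfig> };
type Binding = "applied" | "missing" | "none";

function mergeWithPersona(
  base: ProviderConfig,
  persona: Persona | undefined,
  providerId: string,
): { providerConfig: ProviderConfig; binding: Binding } {
  if (!persona) {
    return { providerConfig: base, binding: "none" };
  }
  const providers = persona.providers;
  // Own-property check so inherited keys such as "constructor" never match.
  const bound =
    providers && Object.prototype.hasOwnProperty.call(providers, providerId)
      ? providers[providerId]
      : undefined;
  if (!bound) {
    return { providerConfig: base, binding: "missing" };
  }
  // Shallow spread: persona keys win, nested objects are replaced wholesale.
  return { providerConfig: { ...base, ...bound }, binding: "applied" };
}

const base: ProviderConfig = {
  voice: "default",
  model: "model-a",
  voiceSettings: { speed: 1, stability: 0.5 },
};
const alfred: Persona = {
  id: "alfred",
  providers: { mock: { voice: "butler", voiceSettings: { speed: 0.9 } } },
};
const merged = mergeWithPersona(base, alfred, "mock");
console.log(merged.binding, merged.providerConfig);
```

In this sketch the merged config keeps `model` from the base, takes `voice` from the persona, and ends up with `voiceSettings` equal to `{ speed: 0.9 }`. The base `stability` value is dropped because the spread does not recurse.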
@@ -241,48 +334,6 @@ function resolveRawProviderConfig(
   return asProviderConfig(direct);
 }
 
-function isPlainObject(value: unknown): value is Record<string, unknown> {
-  return Boolean(value) && typeof value === "object" && !Array.isArray(value);
-}
-
-function deepMergeDefined(base: unknown, override: unknown): unknown {
-  if (!isPlainObject(base) || !isPlainObject(override)) {
-    return override === undefined ? base : override;
-  }
-
-  const result: Record<string, unknown> = { ...base };
-  for (const [key, value] of Object.entries(override)) {
-    if (BLOCKED_MERGE_KEYS.has(key) || value === undefined) {
-      continue;
-    }
-    const existing = result[key];
-    result[key] = key in result ? deepMergeDefined(existing, value) : value;
-  }
-  return result;
-}
-
-function normalizeAgentConfigId(value: string | undefined | null): string {
-  return normalizeLowercaseStringOrEmpty(value);
-}
-
-function resolveAgentTtsOverride(
-  cfg: OpenClawConfig,
-  agentId: string | undefined,
-): TtsConfig | undefined {
-  if (!agentId || !Array.isArray(cfg.agents?.list)) {
-    return undefined;
-  }
-  const normalized = normalizeAgentConfigId(agentId);
-  const agent = cfg.agents.list.find((entry) => normalizeAgentConfigId(entry.id) === normalized);
-  return agent?.tts;
-}
-
-function resolveEffectiveTtsRawConfig(cfg: OpenClawConfig, agentId?: string): TtsConfig {
-  const base = cfg.messages?.tts ?? {};
-  const override = resolveAgentTtsOverride(cfg, agentId);
-  return deepMergeDefined(base, override ?? {}) as TtsConfig;
-}
-
 function resolveLazyProviderConfig(
   config: ResolvedTtsConfig,
   providerId: string,
@@ -325,6 +376,8 @@ function collectDirectProviderConfigEntries(raw: TtsConfig): Record<string, Spee
     "maxTextLength",
     "mode",
     "modelOverrides",
+    "persona",
+    "personas",
     "prefsPath",
     "provider",
     "providers",
@@ -357,10 +410,11 @@ export function getResolvedSpeechProviderConfig(
 }
 
 export function resolveTtsConfig(cfg: OpenClawConfig, agentId?: string): ResolvedTtsConfig {
-  const raw: TtsConfig = resolveEffectiveTtsRawConfig(cfg, agentId);
+  const raw: TtsConfig = resolveEffectiveTtsConfig(cfg, agentId);
   const providerSource = raw.provider ? "config" : "default";
   const timeoutMs = raw.timeoutMs ?? DEFAULT_TIMEOUT_MS;
   const auto = resolveConfiguredTtsAutoMode(raw);
+  const persona = normalizeTtsPersonaId(raw.persona);
   return {
     auto,
     mode: raw.mode ?? "final",
@@ -368,6 +422,8 @@ export function resolveTtsConfig(cfg: OpenClawConfig, agentId?: string): Resolve
       normalizeConfiguredSpeechProviderId(raw.provider) ??
       (providerSource === "config" ? (normalizeOptionalLowercaseString(raw.provider) ?? "") : ""),
     providerSource,
+    persona,
+    personas: collectTtsPersonas(raw),
     summaryModel: normalizeOptionalString(raw.summaryModel),
     modelOverrides: resolveModelOverridePolicy(raw.modelOverrides),
     providerConfigs: collectDirectProviderConfigEntries(raw),
@@ -418,7 +474,7 @@ function resolveEffectiveTtsAutoState(params: {
   autoMode: TtsAutoMode;
   prefsPath: string;
 } {
-  const raw: TtsConfig = resolveEffectiveTtsRawConfig(params.cfg, params.agentId);
+  const raw: TtsConfig = resolveEffectiveTtsConfig(params.cfg, params.agentId);
   const prefsPath = resolveTtsPrefsPathValue(raw.prefsPath);
   const sessionAuto = normalizeTtsAutoMode(params.sessionAuto);
   if (sessionAuto) {
@@ -443,6 +499,7 @@ export function buildTtsSystemPromptHint(
     return undefined;
   }
   const _config = resolveTtsConfig(cfg, agentId);
+  const persona = getTtsPersona(_config, prefsPath);
   const maxLength = getTtsMaxLength(prefsPath);
   const summarize = isSummarizationEnabled(prefsPath) ? "on" : "off";
   const autoHint =
@@ -454,6 +511,9 @@ export function buildTtsSystemPromptHint(
   return [
     "Voice (TTS) is enabled.",
     autoHint,
+    persona
+      ? `Active TTS persona: ${persona.label ?? persona.id}${persona.description ? ` - ${persona.description}` : ""}.`
+      : undefined,
     `Keep spoken text ≤${maxLength} chars to avoid auto-summary (summary ${summarize}).`,
     "Use [[tts:...]] and optional [[tts:text]]...[[/tts:text]] to control voice/expressiveness.",
   ]
@@ -523,6 +583,13 @@ export function getTtsProvider(config: ResolvedTtsConfig, prefsPath: string): Tt
   if (prefsProvider) {
     return prefsProvider;
   }
+  const activePersona = resolveTtsPersonaFromPrefs(config, prefs);
+  const personaProvider =
+    canonicalizeSpeechProviderId(activePersona?.provider, config.sourceConfig) ??
+    normalizeConfiguredSpeechProviderId(activePersona?.provider);
+  if (personaProvider && getSpeechProvider(personaProvider, config.sourceConfig)) {
+    return personaProvider;
+  }
   if (config.providerSource === "config") {
     return normalizeConfiguredSpeechProviderId(config.provider) ?? config.provider;
   }
@@ -542,6 +609,38 @@ export function getTtsProvider(config: ResolvedTtsConfig, prefsPath: string): Tt
   return config.provider;
 }
 
+function resolveTtsPersonaFromPrefs(
+  config: ResolvedTtsConfig,
+  prefs: TtsUserPrefs,
+): ResolvedTtsPersona | undefined {
+  if (prefs.tts && hasOwnProperty(prefs.tts, "persona")) {
+    const prefsPersona = normalizeTtsPersonaId(prefs.tts.persona);
+    return prefsPersona ? config.personas[prefsPersona] : undefined;
+  }
+  const configPersona = normalizeTtsPersonaId(config.persona);
+  return configPersona ? config.personas[configPersona] : undefined;
+}
+
+export function getTtsPersona(
+  config: ResolvedTtsConfig,
+  prefsPath: string,
+): ResolvedTtsPersona | undefined {
+  return resolveTtsPersonaFromPrefs(config, readPrefs(prefsPath));
+}
+
+export function listTtsPersonas(config: ResolvedTtsConfig): ResolvedTtsPersona[] {
+  return Object.values(config.personas).toSorted((left, right) => left.id.localeCompare(right.id));
+}
+
+export function setTtsPersona(prefsPath: string, persona: string | null | undefined): void {
+  updatePrefs(prefsPath, (prefs) => {
+    const next = { ...prefs.tts };
+    const normalized = normalizeTtsPersonaId(persona);
+    next.persona = normalized ?? null;
+    prefs.tts = next;
+  });
+}
+
 function resolveRawProviderConfig... 
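The persona lookup in this hunk distinguishes a prefs file that never mentions `persona` from one that stores `persona: null`. The first falls through to the configured persona, while the second (what `setTtsPersona` writes when clearing) turns the persona off even if the config names one. The sketch below illustrates that precedence in isolation. `resolvePersonaId` and the `normalize` helper are hypothetical, and it assumes the repo's `normalizeOptionalLowercaseString` trims and lowercases its input.

```typescript
// Hypothetical simplified prefs shape for illustration only.
type Prefs = { tts?: { persona?: string | null } };

// Assumption: mirrors a trim + lowercase normalizer that maps empty input to undefined.
const normalize = (id: string | null | undefined): string | undefined =>
  id?.trim().toLowerCase() || undefined;

function resolvePersonaId(
  prefs: Prefs,
  configPersona: string | undefined,
  known: Record<string, unknown>,
): string | undefined {
  const lookup = (id: string | undefined): string | undefined =>
    id && Object.prototype.hasOwnProperty.call(known, id) ? id : undefined;
  // An own "persona" key in prefs always wins, even when its value is null.
  if (prefs.tts && Object.prototype.hasOwnProperty.call(prefs.tts, "persona")) {
    return lookup(normalize(prefs.tts.persona));
  }
  return lookup(normalize(configPersona));
}

const known = { alfred: {}, jeeves: {} };
console.log(resolvePersonaId({}, "Alfred", known));
console.log(resolvePersonaId({ tts: { persona: null } }, "alfred", known));
console.log(resolvePersonaId({ tts: { persona: "jeeves" } }, "alfred", known));
```

With these inputs the first call resolves to `alfred` from config, the second resolves to nothing because prefs explicitly cleared the persona, and the third resolves to the prefs override `jeeves`.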
@@ -714,17 +813,20 @@ function buildTtsFailureResult(
   errors: string[],
   attemptedProviders?: string[],
   attempts?: TtsProviderAttempt[],
+  persona?: string,
 ): {
   success: false;
   error: string;
   attemptedProviders?: string[];
   attempts?: TtsProviderAttempt[];
+  persona?: string;
 } {
   return {
     success: false,
     error: `TTS conversion failed: ${errors.join("; ") || "no providers available"}`,
     attemptedProviders,
     attempts,
+    persona,
   };
 }
 
@@ -733,17 +835,22 @@ type TtsProviderReadyResolution =
       kind: "ready";
       provider: NonNullable<ReturnType<typeof getSpeechProvider>>;
       providerConfig: SpeechProviderConfig;
+      personaProviderConfig?: SpeechProviderConfig;
+      synthesisPersona?: ResolvedTtsPersona;
+      personaBinding: "applied" | "missing" | "none";
     }
   | {
       kind: "skip";
       reasonCode: "no_provider_registered" | "not_configured" | "unsupported_for_telephony";
       message: string;
+      personaBinding?: "missing";
     };
 
 function resolveReadySpeechProvider(params: {
   provider: TtsProvider;
   cfg: OpenClawConfig;
   config: ResolvedTtsConfig;
+  persona?: ResolvedTtsPersona;
   requireTelephony?: boolean;
 }): TtsProviderReadyResolution {
   const resolvedProvider = getSpeechProvider(params.provider, params.cfg);
@@ -759,10 +866,23 @@ function resolveReadySpeechProvider(params: {
     resolvedProvider.id,
     params.cfg,
   );
+  const merged = mergeProviderConfigWithPersona({
+    providerConfig,
+    persona: params.persona,
+    providerId: resolvedProvider.id,
+  });
+  if (params.persona?.fallbackPolicy === "fail" && merged.personaBinding === "missing") {
+    return {
+      kind: "skip",
+      reasonCode: "not_configured",
+      message: `${params.provider}: persona ${params.persona.id} has no provider binding`,
+      personaBinding: "missing",
+    };
+  }
   if (
     !resolvedProvider.isConfigured({
       cfg: params.cfg,
-      providerConfig,
+      providerConfig: merged.providerConfig,
       timeoutMs: params.config.timeoutMs,
     })
   ) {
@@ -782,7 +902,56 @@ function resolveReadySpeechProvider(params: {
   return {
     kind: "ready",
     provider: resolvedProvider,
-    providerConfig,
+    providerConfig: merged.providerConfig,
+    personaProviderConfig: merged.personaProviderConfig,
+    synthesisPersona:
+      params.persona?.fallbackPolicy === "provider-defaults" && merged.personaBinding === "missing"
+        ? undefined
+        : params.persona,
+    personaBinding: merged.personaBinding,
+  };
+}
+
+async function prepareSpeechSynthesis(params: {
+  provider: NonNullable<ReturnType<typeof getSpeechProvider>>;
+  text: string;
+  cfg: OpenClawConfig;
+  providerConfig: SpeechProviderConfig;
+  providerOverrides?: SpeechProviderOverrides;
+  persona?: ResolvedTtsPersona;
+  personaProviderConfig?: SpeechProviderConfig;
+  target: "audio-file" | "voice-note" | "telephony";
+  timeoutMs: number;
+}): Promise<{
+  text: string;
+  providerConfig: SpeechProviderConfig;
+  providerOverrides?: SpeechProviderOverrides;
+}> {
+  if (!params.provider.prepareSynthesis) {
+    return {
+      text: params.text,
+      providerConfig: params.providerConfig,
+      providerOverrides: params.providerOverrides,
+    };
+  }
+  const prepared = await params.provider.prepareSynthesis({
+    text: params.text,
+    cfg: params.cfg,
+    providerConfig: params.providerConfig,
+    providerOverrides: params.providerOverrides,
+    persona: params.persona,
+    personaProviderConfig: params.personaProviderConfig,
+    target: params.target,
+    timeoutMs: params.timeoutMs,
+  });
+  return {
+    text: prepared?.text ?? params.text,
+    providerConfig: prepared?.providerConfig
+      ? { ...params.providerConfig, ...prepared.providerConfig }
+      : params.providerConfig,
+    providerOverrides: prepared?.providerOverrides
+      ? { ...params.providerOverrides, ...prepared.providerOverrides }
+      : params.providerOverrides,
   };
 }
 
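`prepareSpeechSynthesis` treats `prepareSynthesis` as an optional provider hook. When the hook is absent the inputs pass through unchanged, and when it is present any fields it returns are layered over the inputs with a shallow spread. The sketch below shows that pattern in isolation. The `Provider` type, `prepare`, and the `instructions` field are hypothetical and only loosely modeled on the hunk above.

```typescript
// Hypothetical simplified types for illustration only.
type Config = Record<string, unknown>;
type PersonaLike = { id: string; prompt?: { profile?: string } };
type PrepareInput = { text: string; providerConfig: Config; persona?: PersonaLike };
type PrepareOutput = { text?: string; providerConfig?: Config } | undefined;
type Provider = {
  id: string;
  prepareSynthesis?: (input: PrepareInput) => Promise<PrepareOutput> | PrepareOutput;
};

async function prepare(
  provider: Provider,
  input: PrepareInput,
): Promise<{ text: string; providerConfig: Config }> {
  if (!provider.prepareSynthesis) {
    // No hook: hand the request through untouched.
    return { text: input.text, providerConfig: input.providerConfig };
  }
  const prepared = await provider.prepareSynthesis(input);
  return {
    text: prepared?.text ?? input.text,
    providerConfig: prepared?.providerConfig
      ? { ...input.providerConfig, ...prepared.providerConfig }
      : input.providerConfig,
  };
}

// Hypothetical provider that maps the persona profile onto an "instructions" field.
const instructing: Provider = {
  id: "mock",
  prepareSynthesis: ({ persona }) =>
    persona?.prompt?.profile
      ? { providerConfig: { instructions: persona.prompt.profile } }
      : undefined,
};

void prepare(instructing, {
  text: "Dinner is served.",
  providerConfig: { voice: "butler" },
  persona: { id: "alfred", prompt: { profile: "A precise butler." } },
}).then((out) => console.log(out));
```

Because the hook may return nothing, callers never need to special-case providers that ignore personas. In this sketch the spoken text is unchanged and the provider config gains an `instructions` key next to the existing `voice`.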
@@ -796,6 +965,7 @@ function resolveTtsRequestSetup(params: {
 }):
   | {
       config: ResolvedTtsConfig;
+      persona?: ResolvedTtsPersona;
       providers: TtsProvider[];
     }
   | {
@@ -814,6 +984,7 @@ function resolveTtsRequestSetup(params: {
     canonicalizeSpeechProviderId(params.providerOverride, params.cfg) ?? userProvider;
   return {
     config,
+    persona: getTtsPersona(config, prefsPath),
     providers: params.disableFallback ? [provider] : resolveTtsProviderOrder(provider, params.cfg),
   };
 }
@@ -833,6 +1004,7 @@ export async function textToSpeech(params: {
     return {
       success: false,
       error: synthesis.error ?? "TTS conversion failed",
+      persona: synthesis.persona,
       attemptedProviders: synthesis.attemptedProviders,
       attempts: synthesis.attempts,
     };
@@ -850,6 +1022,7 @@ export async function textToSpeech(params: {
     audioPath,
     latencyMs: synthesis.latencyMs,
     provider: synthesis.provider,
+    persona: synthesis.persona,
     fallbackFrom: synthesis.fallbackFrom,
     attemptedProviders: synthesis.attemptedProviders,
     attempts: synthesis.attempts,
@@ -886,7 +1059,7 @@ export async function synthesizeSpeech(params: {
     return { success: false, error: setup.error };
   }
 
-  const { config, providers } = setup;
+  const { config, persona, providers } = setup;
   const timeoutMs = params.timeoutMs ?? config.timeoutMs;
   const target = supportsNativeVoiceNoteTts(params.channel) ? "voice-note" : "audio-file";
 
@@ -906,6 +1079,7 @@ export async function synthesizeSpeech(params: {
         provider,
         cfg: params.cfg,
         config,
+        persona,
       });
       if (resolvedProvider.kind === "skip") {
         errors.push(resolvedProvider.message);
@@ -913,17 +1087,32 @@ export async function synthesizeSpeech(params: {
           provider,
           outcome: "skipped",
           reasonCode: resolvedProvider.reasonCode,
+          persona: persona?.id,
+          ...(resolvedProvider.personaBinding
+            ? { personaBinding: resolvedProvider.personaBinding }
+            : {}),
           error: resolvedProvider.message,
         });
         logVerbose(`TTS: provider ${provider} skipped (${resolvedProvider.message})`);
         continue;
       }
-      const synthesis = await resolvedProvider.provider.synthesize({
+      const prepared = await prepareSpeechSynthesis({
+        provider: resolvedProvider.provider,
         text: params.text,
         cfg: params.cfg,
         providerConfig: resolvedProvider.providerConfig,
-        target,
         providerOverrides: params.overrides?.providerOverrides?.[resolvedProvider.provider.id],
+        persona: resolvedProvider.synthesisPersona,
+        personaProviderConfig: resolvedProvider.personaProviderConfig,
+        target,
+        timeoutMs,
+      });
+      const synthesis = await resolvedProvider.provider.synthesize({
+        text: prepared.text,
+        cfg: params.cfg,
+        providerConfig: prepared.providerConfig,
+        target,
+        providerOverrides: prepared.providerOverrides,
         timeoutMs,
       });
       const latencyMs = Date.now() - providerStart;
@@ -931,6 +1120,8 @@ export async function synthesizeSpeech(params: {
         provider,
         outcome: "success",
         reasonCode: "success",
+        persona: persona?.id,
+        personaBinding: resolvedProvider.personaBinding,
         latencyMs,
       });
       return {
@@ -938,6 +1129,7 @@ export async function synthesizeSpeech(params: {
         audioBuffer: synthesis.audioBuffer,
         latencyMs,
         provider,
+        persona: persona?.id,
         fallbackFrom: provider !== primaryProvider ? primaryProvider : undefined,
         attemptedProviders,
         attempts,
@@ -956,6 +1148,13 @@ export async function synthesizeSpeech(params: {
         reasonCode:
           err instanceof Error && err.name === "AbortError" ? "timeout" : "provider_error",
         latencyMs,
+        persona: persona?.id,
+        personaBinding:
+          resolvePersonaProviderConfig(persona, provider) != null
+            ? "applied"
+            : persona
+              ? "missing"
+              : "none",
         error: errorMsg,
       });
       const rawError = sanitizeTtsErrorForLog(err);
@@ -970,7 +1169,7 @@ export async function synthesizeSpeech(params: {
     }
   }
 
-  return buildTtsFailureResult(errors, attemptedProviders, attempts);
+  return buildTtsFailureResult(errors, attemptedProviders, attempts, persona?.id);
 }
 
 export async function textToSpeechTelephony(params: {
||||||
@@ -987,7 +1186,7 @@ export async function textToSpeechTelephony(params: {
     return { success: false, error: setup.error };
   }
 
-  const { config, providers } = setup;
+  const { config, persona, providers } = setup;
   const errors: string[] = [];
   const attemptedProviders: string[] = [];
   const attempts: TtsProviderAttempt[] = [];
@@ -1004,6 +1203,7 @@ export async function textToSpeechTelephony(params: {
         provider,
         cfg: params.cfg,
         config,
+        persona,
         requireTelephony: true,
       });
       if (resolvedProvider.kind === "skip") {
@@ -1012,28 +1212,32 @@ export async function textToSpeechTelephony(params: {
           provider,
           outcome: "skipped",
           reasonCode: resolvedProvider.reasonCode,
+          persona: persona?.id,
+          ...(resolvedProvider.personaBinding
+            ? { personaBinding: resolvedProvider.personaBinding }
+            : {}),
           error: resolvedProvider.message,
         });
         logVerbose(`TTS telephony: provider ${provider} skipped (${resolvedProvider.message})`);
         continue;
       }
-      const synthesizeTelephony = resolvedProvider.provider.synthesizeTelephony;
-      if (!synthesizeTelephony) {
-        const message = `${provider}: unsupported for telephony`;
-        errors.push(message);
-        attempts.push({
-          provider,
-          outcome: "skipped",
-          reasonCode: "unsupported_for_telephony",
-          error: message,
-        });
-        logVerbose(`TTS telephony: provider ${provider} skipped (${message})`);
-        continue;
-      }
-      const synthesis = await synthesizeTelephony({
+      const synthesizeTelephony = resolvedProvider.provider.synthesizeTelephony as NonNullable<
+        typeof resolvedProvider.provider.synthesizeTelephony
+      >;
+      const prepared = await prepareSpeechSynthesis({
+        provider: resolvedProvider.provider,
         text: params.text,
         cfg: params.cfg,
         providerConfig: resolvedProvider.providerConfig,
+        persona: resolvedProvider.synthesisPersona,
+        personaProviderConfig: resolvedProvider.personaProviderConfig,
+        target: "telephony",
+        timeoutMs: config.timeoutMs,
+      });
+      const synthesis = await synthesizeTelephony({
+        text: prepared.text,
+        cfg: params.cfg,
+        providerConfig: prepared.providerConfig,
         timeoutMs: config.timeoutMs,
       });
       const latencyMs = Date.now() - providerStart;
@@ -1041,6 +1245,8 @@ export async function textToSpeechTelephony(params: {
         provider,
         outcome: "success",
         reasonCode: "success",
+        persona: persona?.id,
+        personaBinding: resolvedProvider.personaBinding,
         latencyMs,
       });
 
@@ -1049,6 +1255,7 @@ export async function textToSpeechTelephony(params: {
         audioBuffer: synthesis.audioBuffer,
         latencyMs,
         provider,
+        persona: persona?.id,
         fallbackFrom: provider !== primaryProvider ? primaryProvider : undefined,
         attemptedProviders,
         attempts,
@@ -1065,6 +1272,13 @@ export async function textToSpeechTelephony(params: {
         reasonCode:
           err instanceof Error && err.name === "AbortError" ? "timeout" : "provider_error",
         latencyMs,
+        persona: persona?.id,
+        personaBinding:
+          resolvePersonaProviderConfig(persona, provider) != null
+            ? "applied"
+            : persona
+              ? "missing"
+              : "none",
         error: errorMsg,
       });
       const rawError = sanitizeTtsErrorForLog(err);
@@ -1079,7 +1293,7 @@ export async function textToSpeechTelephony(params: {
     }
   }
 
-  return buildTtsFailureResult(errors, attemptedProviders, attempts);
+  return buildTtsFailureResult(errors, attemptedProviders, attempts, persona?.id);
 }
 
 export async function listSpeechVoices(params: {
@@ -1250,6 +1464,7 @@ export async function maybeApplyTtsToPayload(params: {
       textLength: text.length,
       summarized: wasSummarized,
       provider: result.provider,
+      persona: result.persona,
       fallbackFrom: result.fallbackFrom,
       attemptedProviders: result.attemptedProviders,
       attempts: result.attempts,
@@ -1268,6 +1483,7 @@ export async function maybeApplyTtsToPayload(params: {
     success: false,
     textLength: text.length,
     summarized: wasSummarized,
+    persona: result.persona,
     attemptedProviders: result.attemptedProviders,
     attempts: result.attempts,
     error: result.error,
|
|||||||
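The failure paths in the hunks above label each attempt with a `personaBinding` of `"applied"`, `"missing"`, or `"none"`. A minimal sketch of that decision follows, assuming a persona carries a `providers` map keyed by provider id. `PersonaSketch`, `lookupPersonaProviderConfig`, and `describePersonaBinding` are hypothetical names for illustration; the real `resolvePersonaProviderConfig` is not shown in this diff and may do more than a plain lookup.

```typescript
// Hypothetical shapes for illustration; the real types live in the TTS module.
type PersonaBindingStatus = "applied" | "missing" | "none";

interface PersonaSketch {
  id: string;
  providers?: Record<string, Record<string, unknown>>;
}

// Assumed stand-in for resolvePersonaProviderConfig: a plain lookup by provider id.
function lookupPersonaProviderConfig(
  persona: PersonaSketch | undefined,
  provider: string,
): Record<string, unknown> | undefined {
  return persona?.providers?.[provider];
}

// Same decision as the nested ternary recorded on failed attempts:
// a binding for this provider wins, then "missing" if a persona is active, else "none".
function describePersonaBinding(
  persona: PersonaSketch | undefined,
  provider: string,
): PersonaBindingStatus {
  if (lookupPersonaProviderConfig(persona, provider) != null) {
    return "applied";
  }
  return persona ? "missing" : "none";
}

const alfred: PersonaSketch = {
  id: "alfred",
  providers: { google: { voice: "example-voice" } },
};

console.log(describePersonaBinding(alfred, "google")); // "applied"
console.log(describePersonaBinding(alfred, "openai")); // "missing"
console.log(describePersonaBinding(undefined, "openai")); // "none"
```

Writing the decision as early returns rather than a nested ternary keeps the three outcomes easy to scan; the behavior is intended to be the same.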
@@ -6,6 +6,7 @@ import {
   type SpeechProviderConfig,
   type SpeechProviderOverrides,
   type SpeechProviderPlugin,
+  type SpeechSynthesisTarget,
 } from "openclaw/plugin-sdk/speech";
 import { normalizeLowercaseStringOrEmpty } from "openclaw/plugin-sdk/text-runtime";
 import {
@@ -48,7 +49,7 @@ function normalizeXaiSpeechResponseFormat(value: unknown): XaiSpeechResponseForm
 }
 
 function resolveSpeechResponseFormat(
-  target: "audio-file" | "voice-note",
+  target: SpeechSynthesisTarget,
   configuredFormat?: XaiSpeechResponseFormat,
 ): XaiSpeechResponseFormat {
   if (configuredFormat) {
|
|||||||
@@ -9,16 +9,19 @@ const ttsMocks = vi.hoisted(() => ({
   getResolvedSpeechProviderConfig: vi.fn(),
   getLastTtsAttempt: vi.fn(),
   getTtsMaxLength: vi.fn(),
+  getTtsPersona: vi.fn(),
   getTtsProvider: vi.fn(),
   isSummarizationEnabled: vi.fn(),
   isTtsEnabled: vi.fn(),
   isTtsProviderConfigured: vi.fn(),
+  listTtsPersonas: vi.fn(),
   resolveTtsConfig: vi.fn(),
   resolveTtsPrefsPath: vi.fn(),
   setLastTtsAttempt: vi.fn(),
   setSummarizationEnabled: vi.fn(),
   setTtsEnabled: vi.fn(),
   setTtsMaxLength: vi.fn(),
+  setTtsPersona: vi.fn(),
   setTtsProvider: vi.fn(),
   textToSpeech: vi.fn(),
 }));
@@ -66,10 +69,12 @@ describe("handleTtsCommands status fallback reporting", () => {
     ttsMocks.resolveTtsPrefsPath.mockReturnValue("/tmp/tts-prefs.json");
     ttsMocks.isTtsEnabled.mockReturnValue(true);
     ttsMocks.getTtsProvider.mockReturnValue(PRIMARY_TTS_PROVIDER);
+    ttsMocks.getTtsPersona.mockReturnValue(undefined);
     ttsMocks.isTtsProviderConfigured.mockReturnValue(true);
     ttsMocks.getTtsMaxLength.mockReturnValue(1500);
     ttsMocks.isSummarizationEnabled.mockReturnValue(true);
     ttsMocks.getLastTtsAttempt.mockReturnValue(undefined);
+    ttsMocks.listTtsPersonas.mockReturnValue([]);
   });
 
   it("shows fallback provider details for successful attempts", async () => {
@@ -234,6 +239,24 @@ describe("handleTtsCommands status fallback reporting", () => {
     );
   });
 
+  it("lists and sets configured TTS personas", async () => {
+    ttsMocks.listTtsPersonas.mockReturnValue([
+      {
+        id: "alfred",
+        label: "Alfred",
+        provider: "google",
+      },
+    ]);
+
+    const listResult = await handleTtsCommands(buildTtsParams("/tts persona"), true);
+    expect(listResult?.shouldContinue).toBe(false);
+    expect(listResult?.reply?.text).toContain("alfred (Alfred) provider=google");
+
+    const setResult = await handleTtsCommands(buildTtsParams("/tts persona alfred"), true);
+    expect(setResult?.shouldContinue).toBe(false);
+    expect(ttsMocks.setTtsPersona).toHaveBeenCalledWith("/tmp/tts-prefs.json", "alfred");
+  });
+
   it("reads the latest assistant transcript reply once", async () => {
     const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), "openclaw-tts-latest-"));
     const sessionFile = path.join(tempDir, "session.jsonl");
|
|||||||
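The test above exercises two paths of `/tts persona`: listing (no argument) and selecting by id. The argument handling can be sketched as a pure function, separate from preference storage. `PersonaSummary`, `PersonaCommandResult`, and `resolvePersonaCommand` are hypothetical names, and `trim().toLowerCase()` is only an assumed approximation of the project's `normalizeOptionalLowercaseString` helper.

```typescript
// Hypothetical shapes; only id, label, and provider are taken from the diff.
interface PersonaSummary {
  id: string;
  label?: string;
  provider?: string;
}

type PersonaCommandResult =
  | { kind: "list"; text: string }
  | { kind: "clear" }
  | { kind: "set"; id: string }
  | { kind: "unknown"; requested: string };

// One line per persona, e.g. "alfred (Alfred) provider=google".
function formatPersonaLine(persona: PersonaSummary): string {
  const label = persona.label ? ` (${persona.label})` : "";
  const provider = persona.provider ? ` provider=${persona.provider}` : "";
  return `${persona.id}${label}${provider}`;
}

function resolvePersonaCommand(args: string, personas: PersonaSummary[]): PersonaCommandResult {
  // Assumption: normalization is trim + lowercase.
  const requested = args.trim().toLowerCase();
  if (!requested) {
    const text =
      personas.length > 0 ? personas.map(formatPersonaLine).join("\n") : "No personas configured.";
    return { kind: "list", text };
  }
  // "off", "none", and "default" all clear the active persona.
  if (requested === "off" || requested === "none" || requested === "default") {
    return { kind: "clear" };
  }
  const match = personas.find((entry) => entry.id === requested);
  return match ? { kind: "set", id: match.id } : { kind: "unknown", requested };
}

const configured: PersonaSummary[] = [{ id: "alfred", label: "Alfred", provider: "google" }];
console.log(resolvePersonaCommand("", configured));
console.log(resolvePersonaCommand("alfred", configured));
console.log(resolvePersonaCommand("off", configured));
```

Returning a tagged result instead of writing preferences directly keeps the parsing testable without mocks; the caller would map `clear` and `set` onto `setTtsPersona`.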
@@ -14,16 +14,19 @@ import {
   getResolvedSpeechProviderConfig,
   getLastTtsAttempt,
   getTtsMaxLength,
+  getTtsPersona,
   getTtsProvider,
   isSummarizationEnabled,
   isTtsEnabled,
   isTtsProviderConfigured,
+  listTtsPersonas,
   resolveTtsConfig,
   resolveTtsPrefsPath,
   setLastTtsAttempt,
   setSummarizationEnabled,
   setTtsEnabled,
   setTtsMaxLength,
+  setTtsPersona,
   setTtsProvider,
   textToSpeech,
 } from "../../tts/tts.js";
@@ -68,7 +71,11 @@ function formatAttemptDetails(attempts: TtsAttemptDetail[] | undefined): string
     .map((attempt) => {
       const reason = attempt.reasonCode === "success" ? "ok" : attempt.reasonCode;
       const latency = Number.isFinite(attempt.latencyMs) ? ` ${attempt.latencyMs}ms` : "";
-      return `${attempt.provider}:${attempt.outcome}(${reason})${latency}`;
+      const persona =
+        attempt.persona && attempt.personaBinding && attempt.personaBinding !== "none"
+          ? ` persona=${attempt.persona}:${attempt.personaBinding}`
+          : "";
+      return `${attempt.provider}:${attempt.outcome}(${reason})${persona}${latency}`;
     })
     .join(", ");
 }
@@ -83,6 +90,7 @@ function ttsUsage(): ReplyPayload {
       `• /tts off — Disable TTS\n` +
       `• /tts status — Show current settings\n` +
       `• /tts provider [name] — View/change provider\n` +
+      `• /tts persona [id|off] — View/change persona\n` +
       `• /tts limit [number] — View/change text limit\n` +
       `• /tts summary [on|off] — View/change auto-summary\n` +
       `• /tts audio <text> — Generate audio from text\n` +
@@ -96,6 +104,7 @@ function ttsUsage(): ReplyPayload {
       `• Summary OFF: Truncates text, then generates audio\n\n` +
       `**Examples:**\n` +
       `/tts provider <id>\n` +
+      `/tts persona <id>\n` +
       `/tts limit 2000\n` +
       `/tts latest\n` +
       `/tts audio Hello, this is a test!`,
@@ -129,6 +138,7 @@ async function buildTtsAudioReply(params: {
       textLength: params.text.length,
       summarized: false,
       provider: result.provider,
+      persona: result.persona,
       fallbackFrom: result.fallbackFrom,
       attemptedProviders: result.attemptedProviders,
       attempts: result.attempts,
@@ -150,6 +160,7 @@ async function buildTtsAudioReply(params: {
     success: false,
     textLength: params.text.length,
     summarized: false,
+    persona: result.persona,
     attemptedProviders: result.attemptedProviders,
     attempts: result.attempts,
     error: result.error,
@@ -349,6 +360,50 @@ export const handleTtsCommands: CommandHandler = async (params, allowTextCommand
     };
   }
 
+  if (action === "persona") {
+    const personas = listTtsPersonas(config);
+    const activePersona = getTtsPersona(config, prefsPath);
+    if (!args.trim()) {
+      const lines = [
+        "🎭 TTS persona",
+        `Active: ${activePersona?.id ?? "none"}`,
+        personas.length > 0
+          ? personas
+              .map((persona) => {
+                const label = persona.label ? ` (${persona.label})` : "";
+                const provider = persona.provider ? ` provider=${persona.provider}` : "";
+                return `${persona.id}${label}${provider}`;
+              })
+              .join("\n")
+          : "No personas configured.",
+        "Usage: /tts persona <id> | off",
+      ];
+      return { shouldContinue: false, reply: { text: lines.join("\n") } };
+    }
+
+    const requested = normalizeOptionalLowercaseString(args) ?? "";
+    if (requested === "off" || requested === "none" || requested === "default") {
+      setTtsPersona(prefsPath, null);
+      return { shouldContinue: false, reply: { text: "✅ TTS persona disabled." } };
+    }
+    const persona = personas.find((entry) => entry.id === requested);
+    if (!persona) {
+      return {
+        shouldContinue: false,
+        reply: {
+          text:
+            `❌ Unknown TTS persona: ${requested || args}.\n` +
+            `Use /tts persona to list configured personas.`,
+        },
+      };
+    }
+    setTtsPersona(prefsPath, persona.id);
+    return {
+      shouldContinue: false,
+      reply: { text: `✅ TTS persona set to ${persona.id}.` },
+    };
+  }
+
   if (action === "limit") {
     if (!args.trim()) {
       const currentLimit = getTtsMaxLength(prefsPath);
@@ -410,6 +465,7 @@ export const handleTtsCommands: CommandHandler = async (params, allowTextCommand
   if (action === "status") {
     const enabled = isTtsEnabled(config, prefsPath);
     const provider = getTtsProvider(config, prefsPath);
+    const persona = getTtsPersona(config, prefsPath);
     const hasKey = isTtsProviderConfigured(config, provider, params.cfg);
     const maxLength = getTtsMaxLength(prefsPath);
     const summarize = isSummarizationEnabled(prefsPath);
@@ -419,6 +475,7 @@ export const handleTtsCommands: CommandHandler = async (params, allowTextCommand
       `State: ${enabled ? "✅ enabled" : "❌ disabled"}`,
       `Chat override: ${params.sessionEntry?.ttsAuto ?? "default"}`,
       `Provider: ${provider} (${hasKey ? "✅ configured" : "❌ not configured"})`,
+      `Persona: ${persona?.id ?? "none"}`,
       `Text limit: ${maxLength} chars`,
       `Auto-summary: ${summarize ? "on" : "off"}`,
     ];
@@ -429,6 +486,9 @@ export const handleTtsCommands: CommandHandler = async (params, allowTextCommand
       lines.push(`Text: ${last.textLength} chars${last.summarized ? " (summarized)" : ""}`);
       if (last.success) {
         lines.push(`Provider: ${last.provider ?? "unknown"}`);
+        if (last.persona) {
+          lines.push(`Persona: ${last.persona}`);
+        }
         if (last.fallbackFrom && last.provider && last.fallbackFrom !== last.provider) {
           lines.push(`Fallback: ${last.fallbackFrom} -> ${last.provider}`);
         }
|
|||||||
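The `formatAttemptDetails` change above appends a `persona=<id>:<binding>` segment to each attempt, but only when a persona is set and its binding is something other than `"none"`. The sketch below reproduces that formatting in isolation so the resulting status strings are easy to see. `AttemptSketch` is a hypothetical stand-in for `TtsAttemptDetail`, whose full definition is not part of this diff.

```typescript
// Hypothetical attempt shape; field names follow the diff, the type itself is an assumption.
interface AttemptSketch {
  provider: string;
  outcome: string;
  reasonCode: string;
  latencyMs?: number;
  persona?: string;
  personaBinding?: "applied" | "missing" | "none";
}

function formatAttempt(attempt: AttemptSketch): string {
  const reason = attempt.reasonCode === "success" ? "ok" : attempt.reasonCode;
  // Number.isFinite returns false for undefined, so a missing latency adds nothing.
  const latency = Number.isFinite(attempt.latencyMs) ? ` ${attempt.latencyMs}ms` : "";
  // The persona segment is omitted when no persona is set or the binding is "none".
  const persona =
    attempt.persona && attempt.personaBinding && attempt.personaBinding !== "none"
      ? ` persona=${attempt.persona}:${attempt.personaBinding}`
      : "";
  return `${attempt.provider}:${attempt.outcome}(${reason})${persona}${latency}`;
}

function formatAttempts(attempts: AttemptSketch[]): string {
  return attempts.map(formatAttempt).join(", ");
}

const sampleAttempts: AttemptSketch[] = [
  { provider: "openai", outcome: "skipped", reasonCode: "not_configured", persona: "alfred", personaBinding: "missing" },
  { provider: "google", outcome: "success", reasonCode: "success", latencyMs: 120, persona: "alfred", personaBinding: "applied" },
];

console.log(formatAttempts(sampleAttempts));
// openai:skipped(not_configured) persona=alfred:missing, google:success(ok) persona=alfred:applied 120ms
```

The `not_configured` reason code in the sample data is made up for the example; the diff only shows `success`, `timeout`, `provider_error`, and `unsupported_for_telephony`.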
@@ -73,6 +73,7 @@ const mocks = vi.hoisted(() => ({
     attempts: [],
   })),
   setTtsProvider: vi.fn(),
+  setTtsPersona: vi.fn(),
   resolveExplicitTtsOverrides: vi.fn(
     ({
       provider,
@@ -220,11 +221,14 @@ vi.mock("../video-generation/runtime.js", () => ({
 }));
 
 vi.mock("../tts/tts.js", () => ({
+  getTtsPersona: vi.fn(() => undefined),
   getTtsProvider: vi.fn(() => "openai"),
+  listTtsPersonas: vi.fn(() => []),
   listSpeechVoices: vi.fn(async () => []),
   resolveTtsConfig: vi.fn(() => ({})),
   resolveTtsPrefsPath: vi.fn(() => "/tmp/tts.json"),
   setTtsEnabled: vi.fn(),
+  setTtsPersona: mocks.setTtsPersona as typeof import("../tts/tts.js").setTtsPersona,
   setTtsProvider: mocks.setTtsProvider as typeof import("../tts/tts.js").setTtsProvider,
   resolveExplicitTtsOverrides:
     mocks.resolveExplicitTtsOverrides as typeof import("../tts/tts.js").resolveExplicitTtsOverrides,
|
|||||||
@@ -56,11 +56,14 @@ import { theme } from "../terminal/theme.js";
 import { canonicalizeSpeechProviderId, listSpeechProviders } from "../tts/provider-registry.js";
 import {
   getTtsProvider,
+  getTtsPersona,
+  listTtsPersonas,
   listSpeechVoices,
   resolveExplicitTtsOverrides,
   resolveTtsConfig,
   resolveTtsPrefsPath,
   setTtsEnabled,
+  setTtsPersona,
   setTtsProvider,
   textToSpeech,
 } from "../tts/tts.js";
@@ -256,6 +259,13 @@ const CAPABILITY_METADATA: CapabilityMetadata[] = [
     flags: ["--local", "--gateway", "--json"],
     resultShape: "provider ids, configured state, models, voices",
   },
+  {
+    id: "tts.personas",
+    description: "List TTS personas.",
+    transports: ["local", "gateway"],
+    flags: ["--local", "--gateway", "--json"],
+    resultShape: "persona ids, labels, providers, active persona",
+  },
   {
     id: "tts.status",
     description: "Show gateway-managed TTS state.",
@@ -284,6 +294,13 @@ const CAPABILITY_METADATA: CapabilityMetadata[] = [
     flags: ["--provider", "--local", "--gateway", "--json"],
     resultShape: "selected provider",
   },
+  {
+    id: "tts.set-persona",
+    description: "Set the active TTS persona.",
+    transports: ["local", "gateway"],
+    flags: ["--persona", "--off", "--local", "--gateway", "--json"],
+    resultShape: "selected persona",
+  },
   {
     id: "video.generate",
     description: "Generate video files with configured video providers.",
@@ -1181,6 +1198,30 @@ async function runTtsProviders(transport: CapabilityTransport) {
   };
 }
 
+async function runTtsPersonas(transport: CapabilityTransport) {
+  if (transport === "gateway") {
+    return await callGateway({
+      method: "tts.personas",
+      timeoutMs: 30_000,
+    });
+  }
+  const cfg = loadConfig();
+  const config = resolveTtsConfig(cfg);
+  const prefsPath = resolveTtsPrefsPath(config);
+  const active = getTtsPersona(config, prefsPath);
+  return {
+    active: active?.id ?? null,
+    personas: listTtsPersonas(config).map((persona) => ({
+      id: persona.id,
+      label: persona.label,
+      description: persona.description,
+      provider: persona.provider,
+      fallbackPolicy: persona.fallbackPolicy,
+      providers: Object.keys(persona.providers ?? {}),
+    })),
+  };
+}
+
 async function runTtsVoices(providerRaw?: string) {
   const cfg = loadConfig();
   const config = resolveTtsConfig(cfg);
@@ -1194,9 +1235,10 @@ async function runTtsVoices(providerRaw?: string) {
 }
 
 async function runTtsStateMutation(params: {
-  capability: "tts.enable" | "tts.disable" | "tts.set-provider";
+  capability: "tts.enable" | "tts.disable" | "tts.set-provider" | "tts.set-persona";
   transport: CapabilityTransport;
   provider?: string;
+  persona?: string | null;
 }) {
   if (params.transport === "gateway") {
     const method =
@@ -1204,10 +1246,17 @@ async function runTtsStateMutation(params: {
         ? "tts.enable"
         : params.capability === "tts.disable"
           ? "tts.disable"
-          : "tts.setProvider";
+          : params.capability === "tts.set-provider"
+            ? "tts.setProvider"
+            : "tts.setPersona";
     const payload = await callGateway({
       method,
-      params: params.provider ? { provider: params.provider } : undefined,
+      params:
+        params.capability === "tts.set-provider"
+          ? { provider: params.provider }
+          : params.capability === "tts.set-persona"
+            ? { persona: params.persona ?? "off" }
+            : undefined,
       timeoutMs: 30_000,
     });
     return payload;
@@ -1224,6 +1273,20 @@ async function runTtsStateMutation(params: {
     setTtsEnabled(prefsPath, false);
     return { enabled: false };
   }
+  if (params.capability === "tts.set-persona") {
+    if (!params.persona) {
+      setTtsPersona(prefsPath, null);
+      return { persona: null };
+    }
+    const persona = listTtsPersonas(config).find(
+      (entry) => entry.id === normalizeLowercaseStringOrEmpty(params.persona ?? ""),
+    );
+    if (!persona) {
+      throw new Error(`Unknown TTS persona: ${params.persona}`);
+    }
+    setTtsPersona(prefsPath, persona.id);
+    return { persona: persona.id };
+  }
   if (!params.provider) {
     throw new Error("--provider is required");
   }
@@ -1746,6 +1809,27 @@ export function registerCapabilityCli(program: Command) {
       });
     });
 
+  tts
+    .command("personas")
+    .description("List TTS personas")
+    .option("--local", "Force local execution", false)
+    .option("--gateway", "Force gateway execution", false)
+    .option("--json", "Output JSON", false)
+    .action(async (opts) => {
+      await runCommandWithRuntime(defaultRuntime, async () => {
+        const transport = resolveTransport({
+          local: Boolean(opts.local),
+          gateway: Boolean(opts.gateway),
+          supported: ["local", "gateway"],
+          defaultTransport: "local",
+        });
+        const result = await runTtsPersonas(transport);
+        emitJsonOrText(defaultRuntime, Boolean(opts.json), result, (value) =>
+          JSON.stringify(value, null, 2),
+        );
+      });
+    });
+
   tts
     .command("status")
     .description("Show TTS status")
@@ -1823,6 +1907,36 @@ export function registerCapabilityCli(program: Command) {
       });
     });
 
+  tts
+    .command("set-persona")
+    .description("Set the active TTS persona")
+    .option("--persona <id>", "TTS persona id")
+    .option("--off", "Disable the active TTS persona", false)
+    .option("--local", "Force local execution", false)
+    .option("--gateway", "Force gateway execution", false)
+    .option("--json", "Output JSON", false)
+    .action(async (opts) => {
+      await runCommandWithRuntime(defaultRuntime, async () => {
+        const transport = resolveTransport({
+          local: Boolean(opts.local),
+          gateway: Boolean(opts.gateway),
+          supported: ["local", "gateway"],
+          defaultTransport: "gateway",
+        });
+        if (!opts.off && !opts.persona) {
+          throw new Error("--persona is required unless --off is set");
+        }
+        const result = await runTtsStateMutation({
+          capability: "tts.set-persona",
+          persona: opts.off ? null : String(opts.persona),
+          transport,
+        });
+        emitJsonOrText(defaultRuntime, Boolean(opts.json), result, (value) =>
+          JSON.stringify(value, null, 2),
+        );
+      });
+    });
+
   const video = capability.command("video").description("Video generation and description");
 
   video
|
|||||||
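On the gateway transport, `runTtsStateMutation` above maps each CLI capability id to a gateway method name and a params payload, sending `"off"` when the persona is cleared. The sketch below restates that mapping as a `switch` rather than the nested ternaries used in the diff, purely for readability. `GatewayCallSketch` and `buildGatewayCall` are hypothetical names; the method strings and the `"off"` fallback are taken from the hunk.

```typescript
type TtsMutationCapability =
  | "tts.enable"
  | "tts.disable"
  | "tts.set-provider"
  | "tts.set-persona";

// Hypothetical request shape; the real callGateway options are not shown in full.
interface GatewayCallSketch {
  method: string;
  params?: Record<string, unknown>;
}

function buildGatewayCall(input: {
  capability: TtsMutationCapability;
  provider?: string;
  persona?: string | null;
}): GatewayCallSketch {
  switch (input.capability) {
    case "tts.enable":
      return { method: "tts.enable" };
    case "tts.disable":
      return { method: "tts.disable" };
    case "tts.set-provider":
      return { method: "tts.setProvider", params: { provider: input.provider } };
    case "tts.set-persona":
      // A null or missing persona is sent as "off", matching the diff.
      return { method: "tts.setPersona", params: { persona: input.persona ?? "off" } };
  }
}

console.log(buildGatewayCall({ capability: "tts.set-persona", persona: "alfred" }));
console.log(buildGatewayCall({ capability: "tts.set-persona", persona: null }));
console.log(buildGatewayCall({ capability: "tts.enable" }));
```

With an exhaustive `switch`, adding a fifth capability to the union would surface as a compile-time error in this function, which is one reason to prefer it over a trailing `else` branch.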
@@ -19116,6 +19116,222 @@ export const GENERATED_BASE_CONFIG_SCHEMA: BaseConfigSchemaResponse = {
                 type: "string",
                 minLength: 1,
               },
+              persona: {
+                type: "string",
+                title: "TTS Persona",
+                description:
+                  "Default TTS persona id. Local TTS persona preferences can override this per host.",
+              },
+              personas: {
+                type: "object",
+                propertyNames: {
+                  type: "string",
+                },
+                additionalProperties: {
+                  type: "object",
+                  properties: {
+                    label: {
+                      type: "string",
+                    },
+                    description: {
+                      type: "string",
+                    },
+                    provider: {
+                      type: "string",
+                      minLength: 1,
+                    },
+                    fallbackPolicy: {
+                      anyOf: [
+                        {
+                          type: "string",
+                          const: "preserve-persona",
+                        },
+                        {
+                          type: "string",
+                          const: "provider-defaults",
+                        },
+                        {
+                          type: "string",
+                          const: "fail",
+                        },
+                      ],
+                    },
+                    prompt: {
+                      type: "object",
+                      properties: {
+                        profile: {
+                          type: "string",
+                        },
+                        scene: {
+                          type: "string",
+                        },
+                        sampleContext: {
+                          type: "string",
+                        },
+                        style: {
+                          type: "string",
+                        },
+                        accent: {
+                          type: "string",
+                        },
+                        pacing: {
+                          type: "string",
+                        },
+                        constraints: {
+                          type: "array",
+                          items: {
+                            type: "string",
+                          },
+                        },
+                      },
+                      additionalProperties: false,
+                      title: "TTS Persona Prompt",
+                      description:
+                        "Provider-neutral persona prompt intent. Providers decide whether and how to map this into request instructions.",
+                    },
+                    rewrite: {
+                      type: "object",
+                      properties: {
+                        enabled: {
+                          type: "boolean",
+                        },
+                        model: {
+                          type: "string",
+                        },
+                        preserveMeaning: {
+                          type: "boolean",
+                        },
+                        compressForSpeech: {
+                          type: "boolean",
+                        },
+                        inCharacter: {
+                          type: "boolean",
+                        },
+                        maxChars: {
+                          type: "integer",
+                          minimum: 1,
+                          maximum: 9007199254740991,
+                        },
+                      },
+                      additionalProperties: false,
+                    },
+                    providers: {
+                      type: "object",
+                      propertyNames: {
+                        type: "string",
+                      },
+                      additionalProperties: {
+                        type: "object",
+                        properties: {
+                          apiKey: {
+                            anyOf: [
+                              {
+                                type: "string",
+                              },
+                              {
+                                oneOf: [
+                                  {
+                                    type: "object",
+                                    properties: {
+                                      source: {
+                                        type: "string",
+                                        const: "env",
+                                      },
+                                      provider: {
+                                        type: "string",
+                                        pattern: "^[a-z][a-z0-9_-]{0,63}$",
+                                      },
+                                      id: {
+                                        type: "string",
+                                        pattern: "^[A-Z][A-Z0-9_]{0,127}$",
+                                      },
+                                    },
+                                    required: ["source", "provider", "id"],
+                                    additionalProperties: false,
+                                  },
+                                  {
+                                    type: "object",
+                                    properties: {
+                                      source: {
+                                        type: "string",
+                                        const: "file",
+                                      },
+                                      provider: {
+                                        type: "string",
+                                        pattern: "^[a-z][a-z0-9_-]{0,63}$",
+                                      },
+                                      id: {
+                                        type: "string",
+                                      },
+                                    },
+                                    required: ["source", "provider", "id"],
+                                    additionalProperties: false,
+                                  },
+                                  {
+                                    type: "object",
+                                    properties: {
+                                      source: {
+                                        type: "string",
+                                        const: "exec",
+                                      },
+                                      provider: {
+                                        type: "string",
+                                        pattern: "^[a-z][a-z0-9_-]{0,63}$",
+                                      },
+                                      id: {
+                                        type: "string",
+                                      },
+                                    },
+                                    required: ["source", "provider", "id"],
+                                    additionalProperties: false,
+                                  },
+                                ],
+                              },
+                            ],
+                          },
+                        },
+                        additionalProperties: {
+                          anyOf: [
+                            {
+                              type: "string",
+                            },
+                            {
+                              type: "number",
+                            },
+                            {
+                              type: "boolean",
+                            },
+                            {
+                              type: "null",
+                            },
+                            {
+                              type: "array",
+                              items: {},
+                            },
+                            {
+                              type: "object",
+                              propertyNames: {
+                                type: "string",
+                              },
+                              additionalProperties: {},
+                            },
+                          ],
+                        },
+                      },
+                      title: "TTS Persona Provider Bindings",
+                      description:
+                        "Provider-specific TTS persona bindings keyed by speech provider id. These merge over messages.tts.providers for the active persona.",
+                    },
+                  },
+                  additionalProperties: false,
+                  title: "TTS Persona",
+                  description:
+                    "One TTS persona. Use provider-specific bindings for exact voices/models and prompt templates.",
+                },
+                title: "TTS Personas",
+                description:
+                  "Named TTS personas that define stable spoken identity plus provider-specific speech bindings.",
+              },
               summaryModel: {
                 type: "string",
               },
@@ -27520,6 +27736,31 @@ export const GENERATED_BASE_CONFIG_SCHEMA: BaseConfigSchemaResponse = {
       help: "Text-to-speech policy for reading agent replies aloud on supported voice or audio surfaces. Keep disabled unless voice playback is part of your operator/user workflow.",
       tags: ["media"],
     },
+    "messages.tts.persona": {
+      label: "TTS Persona",
+      help: "Default TTS persona id. Local TTS persona preferences can override this per host.",
+      tags: ["media"],
+    },
+    "messages.tts.personas": {
+      label: "TTS Personas",
+      help: "Named TTS personas that define stable spoken identity plus provider-specific speech bindings.",
+      tags: ["media"],
+    },
+    "messages.tts.personas.*": {
+      label: "TTS Persona",
+      help: "One TTS persona. Use provider-specific bindings for exact voices/models and prompt templates.",
+      tags: ["media"],
+    },
+    "messages.tts.personas.*.prompt": {
+      label: "TTS Persona Prompt",
+      help: "Provider-neutral persona prompt intent. Providers decide whether and how to map this into request instructions.",
+      tags: ["media"],
+    },
+    "messages.tts.personas.*.providers": {
+      label: "TTS Persona Provider Bindings",
+      help: "Provider-specific TTS persona bindings keyed by speech provider id. These merge over messages.tts.providers for the active persona.",
+      tags: ["media"],
+    },
     "messages.tts.providers": {
       label: "TTS Provider Settings",
       help: "Provider-specific TTS settings keyed by speech provider id. Use this instead of bundled provider-specific top-level keys so speech plugins stay decoupled from core config schema.",
@@ -28081,6 +28322,10 @@ export const GENERATED_BASE_CONFIG_SCHEMA: BaseConfigSchemaResponse = {
       sensitive: true,
       tags: ["security", "media", "tools"],
     },
+    "messages.tts.personas.*.providers.*.apiKey": {
+      sensitive: true,
+      tags: ["security", "auth", "media"],
+    },
     "mcp.servers.*.headers.*": {
       sensitive: true,
       tags: ["security"],
@@ -1589,6 +1589,16 @@ export const FIELD_HELP: Record<string, string> = {
     "Removes the acknowledgment reaction after final reply delivery when enabled. Keep enabled for cleaner UX in channels where persistent ack reactions create clutter.",
   "messages.tts":
     "Text-to-speech policy for reading agent replies aloud on supported voice or audio surfaces. Keep disabled unless voice playback is part of your operator/user workflow.",
+  "messages.tts.persona":
+    "Default TTS persona id. Local TTS persona preferences can override this per host.",
+  "messages.tts.personas":
+    "Named TTS personas that define stable spoken identity plus provider-specific speech bindings.",
+  "messages.tts.personas.*":
+    "One TTS persona. Use provider-specific bindings for exact voices/models and prompt templates.",
+  "messages.tts.personas.*.prompt":
+    "Provider-neutral persona prompt intent. Providers decide whether and how to map this into request instructions.",
+  "messages.tts.personas.*.providers":
+    "Provider-specific TTS persona bindings keyed by speech provider id. These merge over messages.tts.providers for the active persona.",
   "messages.tts.providers":
     "Provider-specific TTS settings keyed by speech provider id. Use this instead of bundled provider-specific top-level keys so speech plugins stay decoupled from core config schema.",
   "messages.tts.providers.*":
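The help text above says persona provider bindings "merge over `messages.tts.providers` for the active persona". The commit does not show the merge itself in this section, so the following is only a sketch of one plausible reading: a shallow, per-provider merge in which persona keys win. The helper name `mergePersonaBinding` and the sample values are assumptions for illustration.

```typescript
type ProviderConfig = Record<string, unknown>;
type ProviderConfigMap = Record<string, ProviderConfig>;

// Hypothetical helper: shallow-merges the persona binding over the base provider
// settings for a single provider id. The real merge may differ (for example, it
// could merge nested objects); this only illustrates "persona keys win".
function mergePersonaBinding(
  base: ProviderConfigMap,
  personaProviders: ProviderConfigMap | undefined,
  providerId: string,
): ProviderConfig {
  return { ...(base[providerId] ?? {}), ...(personaProviders?.[providerId] ?? {}) };
}

const merged = mergePersonaBinding(
  { openai: { model: "gpt-4o-mini-tts", voice: "alloy", speed: 1 } },
  { openai: { voice: "cedar", instructions: "Speak with dry warmth." } },
  "openai",
);
```

Under this reading, keys the persona does not set (here `model` and `speed`) fall through from the base provider settings, which keeps the result deterministic for a given config.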
@@ -820,6 +820,11 @@ export const FIELD_LABELS: Record<string, string> = {
   "messages.inbound.debounceMs": "Inbound Message Debounce (ms)",
   "messages.inbound.byChannel": "Inbound Debounce by Channel (ms)",
   "messages.tts": "Message Text-to-Speech",
+  "messages.tts.persona": "TTS Persona",
+  "messages.tts.personas": "TTS Personas",
+  "messages.tts.personas.*": "TTS Persona",
+  "messages.tts.personas.*.prompt": "TTS Persona Prompt",
+  "messages.tts.personas.*.providers": "TTS Persona Provider Bindings",
   "messages.tts.providers": "TTS Provider Settings",
   "messages.tts.providers.*": "TTS Provider Config",
   "messages.tts.providers.*.apiKey": "TTS Provider API Key", // pragma: allowlist secret
@@ -25,6 +25,43 @@ export type TtsModelOverrideConfig = {
 
 export type TtsProviderConfigMap = Record<string, Record<string, unknown>>;
 
+export type TtsPersonaFallbackPolicy = "preserve-persona" | "provider-defaults" | "fail";
+
+export type TtsPersonaPromptConfig = {
+  profile?: string;
+  scene?: string;
+  sampleContext?: string;
+  style?: string;
+  accent?: string;
+  pacing?: string;
+  constraints?: string[];
+};
+
+export type TtsPersonaRewriteConfig = {
+  enabled?: boolean;
+  model?: string;
+  preserveMeaning?: boolean;
+  compressForSpeech?: boolean;
+  inCharacter?: boolean;
+  maxChars?: number;
+};
+
+export type TtsPersonaConfig = {
+  label?: string;
+  description?: string;
+  /** Preferred provider for this persona. Explicit provider prefs still win. */
+  provider?: TtsProvider;
+  fallbackPolicy?: TtsPersonaFallbackPolicy;
+  prompt?: TtsPersonaPromptConfig;
+  rewrite?: TtsPersonaRewriteConfig;
+  /** Provider-specific persona bindings keyed by speech provider id. */
+  providers?: TtsProviderConfigMap;
+};
+
+export type ResolvedTtsPersona = TtsPersonaConfig & {
+  id: string;
+};
+
 export type TtsConfig = {
   /** Auto-TTS mode (preferred). */
   auto?: TtsAutoMode;
@@ -34,6 +71,10 @@ export type TtsConfig = {
   mode?: TtsMode;
   /** Primary TTS provider (fallbacks are automatic). */
   provider?: TtsProvider;
+  /** Active TTS persona id. */
+  persona?: string;
+  /** Named TTS personas. */
+  personas?: Record<string, TtsPersonaConfig>;
   /** Optional model override for TTS auto-summary (provider/model or alias). */
   summaryModel?: string;
   /** Allow the model to override TTS parameters. */
@@ -497,12 +497,48 @@ const TtsProviderConfigSchema = z
     z.record(z.string(), z.unknown()),
   ]),
 );
+const TtsPersonaPromptSchema = z
+  .object({
+    profile: z.string().optional(),
+    scene: z.string().optional(),
+    sampleContext: z.string().optional(),
+    style: z.string().optional(),
+    accent: z.string().optional(),
+    pacing: z.string().optional(),
+    constraints: z.array(z.string()).optional(),
+  })
+  .strict();
+const TtsPersonaRewriteSchema = z
+  .object({
+    enabled: z.boolean().optional(),
+    model: z.string().optional(),
+    preserveMeaning: z.boolean().optional(),
+    compressForSpeech: z.boolean().optional(),
+    inCharacter: z.boolean().optional(),
+    maxChars: z.number().int().min(1).optional(),
+  })
+  .strict();
+const TtsPersonaSchema = z
+  .object({
+    label: z.string().optional(),
+    description: z.string().optional(),
+    provider: TtsProviderSchema.optional(),
+    fallbackPolicy: z
+      .union([z.literal("preserve-persona"), z.literal("provider-defaults"), z.literal("fail")])
+      .optional(),
+    prompt: TtsPersonaPromptSchema.optional(),
+    rewrite: TtsPersonaRewriteSchema.optional(),
+    providers: z.record(z.string(), TtsProviderConfigSchema).optional(),
+  })
+  .strict();
 export const TtsConfigSchema = z
   .object({
     auto: TtsAutoSchema.optional(),
     enabled: z.boolean().optional(),
     mode: TtsModeSchema.optional(),
     provider: TtsProviderSchema.optional(),
+    persona: z.string().optional(),
+    personas: z.record(z.string(), TtsPersonaSchema).optional(),
     summaryModel: z.string().optional(),
     modelOverrides: z
       .object({
@@ -39,4 +39,47 @@ describe("TtsConfigSchema openai speed and instructions", () => {
       }),
     ).not.toThrow();
   });
+
+  it("accepts provider-specific persona bindings and structured prompt fields", () => {
+    expect(() =>
+      TtsConfigSchema.parse({
+        persona: "alfred",
+        personas: {
+          alfred: {
+            label: "Alfred",
+            description: "Dry, warm British butler narrator.",
+            provider: "google",
+            fallbackPolicy: "preserve-persona",
+            prompt: {
+              profile: "A brilliant British butler.",
+              scene: "A quiet late-night study.",
+              sampleContext: "The speaker is answering a trusted operator.",
+              style: "Refined and lightly amused.",
+              accent: "British English.",
+              pacing: "Measured.",
+              constraints: ["Do not read configuration values aloud."],
+            },
+            rewrite: {
+              enabled: false,
+              preserveMeaning: true,
+              compressForSpeech: true,
+              maxChars: 1500,
+            },
+            providers: {
+              google: {
+                model: "gemini-3.1-flash-tts-preview",
+                voiceName: "Algieba",
+                promptTemplate: "audio-profile-v1",
+              },
+              openai: {
+                model: "gpt-4o-mini-tts",
+                voice: "cedar",
+                instructions: "Speak with dry warmth.",
+              },
+            },
+          },
+        },
+      }),
+    ).not.toThrow();
+  });
 });
@@ -78,6 +78,7 @@ const METHOD_SCOPE_GROUPS: Record<OperatorScope, readonly string[]> = {
     "usage.cost",
     "tts.status",
     "tts.providers",
+    "tts.personas",
     "commands.list",
     "models.list",
     "models.authStatus",
@@ -131,6 +132,7 @@ const METHOD_SCOPE_GROUPS: Record<OperatorScope, readonly string[]> = {
     "tts.disable",
     "tts.convert",
     "tts.setProvider",
+    "tts.setPersona",
     "voicewake.set",
     "node.invoke",
     "chat.send",
@@ -20,10 +20,12 @@ const BASE_METHODS = [
   "usage.cost",
   "tts.status",
   "tts.providers",
+  "tts.personas",
   "tts.enable",
   "tts.disable",
   "tts.convert",
   "tts.setProvider",
+  "tts.setPersona",
   "config.get",
   "config.set",
   "config.apply",
@@ -25,9 +25,11 @@ vi.mock("../../tts/provider-registry.js", () => ({
 
 vi.mock("../../tts/tts.js", () => ({
   getResolvedSpeechProviderConfig: vi.fn(),
+  getTtsPersona: vi.fn(() => undefined),
   getTtsProvider: vi.fn(() => "openai"),
   isTtsEnabled: vi.fn(() => true),
   isTtsProviderConfigured: vi.fn(() => true),
+  listTtsPersonas: vi.fn(() => []),
   resolveExplicitTtsOverrides:
     mocks.resolveExplicitTtsOverrides as typeof import("../../tts/tts.js").resolveExplicitTtsOverrides,
   resolveTtsAutoMode: vi.fn(() => false),
@@ -35,6 +37,7 @@ vi.mock("../../tts/tts.js", () => ({
   resolveTtsPrefsPath: vi.fn(() => "/tmp/tts.json"),
   resolveTtsProviderOrder: vi.fn(() => ["openai"]),
   setTtsEnabled: vi.fn(),
+  setTtsPersona: vi.fn(),
   setTtsProvider: vi.fn(),
   textToSpeech: mocks.textToSpeech as typeof import("../../tts/tts.js").textToSpeech,
 }));
@@ -7,15 +7,18 @@ import {
 } from "../../tts/provider-registry.js";
 import {
   getResolvedSpeechProviderConfig,
+  getTtsPersona,
   getTtsProvider,
   isTtsEnabled,
   isTtsProviderConfigured,
+  listTtsPersonas,
   resolveExplicitTtsOverrides,
   resolveTtsAutoMode,
   resolveTtsConfig,
   resolveTtsPrefsPath,
   resolveTtsProviderOrder,
   setTtsEnabled,
+  setTtsPersona,
   setTtsProvider,
   textToSpeech,
 } from "../../tts/tts.js";
@@ -30,6 +33,7 @@ export const ttsHandlers: GatewayRequestHandlers = {
       const config = resolveTtsConfig(cfg);
       const prefsPath = resolveTtsPrefsPath(config);
       const provider = getTtsProvider(config, prefsPath);
+      const persona = getTtsPersona(config, prefsPath);
       const autoMode = resolveTtsAutoMode({ config, prefsPath });
       const fallbackProviders = resolveTtsProviderOrder(provider, cfg)
         .slice(1)
@@ -47,6 +51,13 @@ export const ttsHandlers: GatewayRequestHandlers = {
         enabled: isTtsEnabled(config, prefsPath),
         auto: autoMode,
         provider,
+        persona: persona?.id ?? null,
+        personas: listTtsPersonas(config).map((entry) => ({
+          id: entry.id,
+          label: entry.label,
+          description: entry.description,
+          provider: entry.provider,
+        })),
         fallbackProvider: fallbackProviders[0] ?? null,
         fallbackProviders,
         prefsPath,
@@ -157,6 +168,58 @@ export const ttsHandlers: GatewayRequestHandlers = {
       respond(false, undefined, errorShape(ErrorCodes.UNAVAILABLE, formatForLog(err)));
     }
   },
+  "tts.personas": async ({ respond }) => {
+    try {
+      const cfg = loadConfig();
+      const config = resolveTtsConfig(cfg);
+      const prefsPath = resolveTtsPrefsPath(config);
+      const active = getTtsPersona(config, prefsPath);
+      respond(true, {
+        active: active?.id ?? null,
+        personas: listTtsPersonas(config).map((persona) => ({
+          id: persona.id,
+          label: persona.label,
+          description: persona.description,
+          provider: persona.provider,
+          fallbackPolicy: persona.fallbackPolicy,
+          providers: Object.keys(persona.providers ?? {}),
+        })),
+      });
+    } catch (err) {
+      respond(false, undefined, errorShape(ErrorCodes.UNAVAILABLE, formatForLog(err)));
+    }
+  },
+  "tts.setPersona": async ({ params, respond }) => {
+    const cfg = loadConfig();
+    const rawPersona = normalizeOptionalString(params.persona);
+    try {
+      const config = resolveTtsConfig(cfg);
+      const prefsPath = resolveTtsPrefsPath(config);
+      if (!rawPersona || ["off", "none", "default"].includes(rawPersona.toLowerCase())) {
+        setTtsPersona(prefsPath, null);
+        respond(true, { persona: null });
+        return;
+      }
+      const persona = listTtsPersonas(config).find(
+        (entry) => entry.id === rawPersona.toLowerCase(),
+      );
+      if (!persona) {
+        respond(
+          false,
+          undefined,
+          errorShape(
+            ErrorCodes.INVALID_REQUEST,
+            "Invalid persona. Use a configured TTS persona id.",
+          ),
+        );
+        return;
+      }
+      setTtsPersona(prefsPath, persona.id);
+      respond(true, { persona: persona.id });
+    } catch (err) {
+      respond(false, undefined, errorShape(ErrorCodes.UNAVAILABLE, formatForLog(err)));
+    }
+  },
   "tts.providers": async ({ respond }) => {
     try {
       const cfg = loadConfig();
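The `tts.setPersona` handler above has three outcomes: clear the persona (empty input or one of `off`, `none`, `default`), select a configured id, or reject the request. A minimal sketch of that decision as a pure function follows. `PersonaSelection`, `selectPersona`, and `normalizeOptional` are hypothetical names; `normalizeOptional` is a stand-in for `normalizeOptionalString`, which is assumed here to trim input and drop empty values.

```typescript
// Hypothetical result type for illustration; the handler itself responds directly.
type PersonaSelection =
  | { kind: "clear" }
  | { kind: "set"; id: string }
  | { kind: "invalid" };

const CLEAR_KEYWORDS = ["off", "none", "default"];

// Stand-in for normalizeOptionalString (assumed behavior: trim, drop empty values).
function normalizeOptional(value: unknown): string | undefined {
  if (typeof value !== "string") return undefined;
  const trimmed = value.trim();
  return trimmed.length > 0 ? trimmed : undefined;
}

// Mirrors the handler's branching: clear keywords first, then an exact match
// of the lowercased input against the configured persona ids.
function selectPersona(raw: unknown, configuredIds: readonly string[]): PersonaSelection {
  const normalized = normalizeOptional(raw);
  if (!normalized || CLEAR_KEYWORDS.includes(normalized.toLowerCase())) {
    return { kind: "clear" };
  }
  const id = configuredIds.find((entry) => entry === normalized.toLowerCase());
  return id ? { kind: "set", id } : { kind: "invalid" };
}

const ids = ["alfred"];
const results = [
  selectPersona("Alfred", ids),
  selectPersona("off", ids),
  selectPersona("jeeves", ids),
];
```

Because the lookup compares configured ids against the lowercased input, this sketch only matches ids that are already lowercase; whether `listTtsPersonas` normalizes ids is not visible in this hunk.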
@@ -133,10 +133,15 @@ export type {
   TelegramInlineButtonsScope,
   TelegramNetworkConfig,
   TelegramTopicConfig,
+  ResolvedTtsPersona,
   TtsAutoMode,
   TtsConfig,
   TtsMode,
   TtsModelOverrideConfig,
+  TtsPersonaConfig,
+  TtsPersonaFallbackPolicy,
+  TtsPersonaPromptConfig,
+  TtsPersonaRewriteConfig,
   TtsProvider,
 } from "../config/types.js";
 export {
@@ -9,11 +9,14 @@ export type {
   SpeechModelOverridePolicy,
   SpeechProviderConfig,
   SpeechProviderConfiguredContext,
+  SpeechProviderPreparedSynthesis,
+  SpeechProviderPrepareSynthesisContext,
   SpeechProviderResolveConfigContext,
   SpeechProviderResolveTalkConfigContext,
   SpeechProviderResolveTalkOverridesContext,
   SpeechProviderOverrides,
   SpeechSynthesisRequest,
+  SpeechSynthesisTarget,
   SpeechTelephonySynthesisRequest,
   SpeechVoiceOption,
   TtsDirectiveOverrides,
@@ -35,6 +38,7 @@ export {
   listSpeechProviders,
   normalizeSpeechProviderId,
 } from "../tts/provider-registry.js";
+export { resolveEffectiveTtsConfig } from "../tts/tts-config.js";
 export { normalizeTtsAutoMode, TTS_AUTO_MODES } from "../tts/tts-auto-mode.js";
 export {
   asBoolean,
@@ -12,11 +12,14 @@ export type {
   SpeechModelOverridePolicy,
   SpeechProviderConfig,
   SpeechProviderConfiguredContext,
+  SpeechProviderPreparedSynthesis,
+  SpeechProviderPrepareSynthesisContext,
   SpeechProviderResolveConfigContext,
   SpeechProviderResolveTalkConfigContext,
   SpeechProviderResolveTalkOverridesContext,
   SpeechProviderOverrides,
   SpeechSynthesisRequest,
+  SpeechSynthesisTarget,
   SpeechTelephonySynthesisRequest,
   SpeechVoiceOption,
   TtsDirectiveOverrides,
@@ -40,6 +40,10 @@ export const getTtsMaxLength: FacadeModule["getTtsMaxLength"] = createLazyFacade
   loadFacadeModule,
   "getTtsMaxLength",
 );
+export const getTtsPersona: FacadeModule["getTtsPersona"] = createLazyFacadeRuntimeValue(
+  loadFacadeModule,
+  "getTtsPersona",
+);
 export const getTtsProvider: FacadeModule["getTtsProvider"] = createLazyFacadeRuntimeValue(
   loadFacadeModule,
   "getTtsProvider",
@@ -56,6 +60,10 @@ export const listSpeechVoices: FacadeModule["listSpeechVoices"] = createLazyFaca
   loadFacadeModule,
   "listSpeechVoices",
 );
+export const listTtsPersonas: FacadeModule["listTtsPersonas"] = createLazyFacadeRuntimeValue(
+  loadFacadeModule,
+  "listTtsPersonas",
+);
 export const maybeApplyTtsToPayload: FacadeModule["maybeApplyTtsToPayload"] =
   createLazyFacadeRuntimeValue(loadFacadeModule, "maybeApplyTtsToPayload");
 export const resolveExplicitTtsOverrides: FacadeModule["resolveExplicitTtsOverrides"] =
@@ -90,6 +98,10 @@ export const setTtsMaxLength: FacadeModule["setTtsMaxLength"] = createLazyFacade
   loadFacadeModule,
   "setTtsMaxLength",
 );
+export const setTtsPersona: FacadeModule["setTtsPersona"] = createLazyFacadeRuntimeValue(
+  loadFacadeModule,
+  "setTtsPersona",
+);
 export const setTtsProvider: FacadeModule["setTtsProvider"] = createLazyFacadeRuntimeValue(
   loadFacadeModule,
   "setTtsProvider",
@@ -1,5 +1,5 @@
 import type { OpenClawConfig } from "../config/types.openclaw.js";
-import type { TtsAutoMode, TtsProvider } from "../config/types.tts.js";
+import type { ResolvedTtsPersona, TtsAutoMode, TtsProvider } from "../config/types.tts.js";
 import type {
   SpeechProviderConfig,
   SpeechVoiceOption,
@@ -24,6 +24,8 @@ export type TtsProviderAttempt = {
   provider: string;
   outcome: "success" | "skipped" | "failed";
   reasonCode: TtsAttemptReasonCode;
+  persona?: string;
+  personaBinding?: "applied" | "missing" | "none";
   latencyMs?: number;
   error?: string;
 };
@@ -34,6 +36,7 @@ export type TtsStatusEntry = {
   textLength: number;
   summarized: boolean;
   provider?: string;
+  persona?: string;
   fallbackFrom?: string;
   attemptedProviders?: string[];
   attempts?: TtsProviderAttempt[];
@@ -126,6 +129,7 @@ export type TtsResult = {
   error?: string;
   latencyMs?: number;
   provider?: string;
+  persona?: string;
   fallbackFrom?: string;
   attemptedProviders?: string[];
   attempts?: TtsProviderAttempt[];
@@ -141,6 +145,7 @@ export type TtsSynthesisResult = {
|
|||||||
error?: string;
|
error?: string;
|
||||||
latencyMs?: number;
|
latencyMs?: number;
|
||||||
provider?: string;
|
provider?: string;
|
||||||
|
persona?: string;
|
||||||
fallbackFrom?: string;
|
fallbackFrom?: string;
|
||||||
attemptedProviders?: string[];
|
attemptedProviders?: string[];
|
||||||
attempts?: TtsProviderAttempt[];
|
attempts?: TtsProviderAttempt[];
|
||||||
@@ -156,6 +161,7 @@ export type TtsTelephonyResult = {
|
|||||||
error?: string;
|
error?: string;
|
||||||
latencyMs?: number;
|
latencyMs?: number;
|
||||||
provider?: string;
|
provider?: string;
|
||||||
|
persona?: string;
|
||||||
fallbackFrom?: string;
|
fallbackFrom?: string;
|
||||||
attemptedProviders?: string[];
|
attemptedProviders?: string[];
|
||||||
attempts?: TtsProviderAttempt[];
|
attempts?: TtsProviderAttempt[];
|
||||||
@@ -179,6 +185,7 @@ export type TtsRuntimeFacade = {
|
|||||||
cfg?: OpenClawConfig,
|
cfg?: OpenClawConfig,
|
||||||
) => SpeechProviderConfig;
|
) => SpeechProviderConfig;
|
||||||
getTtsMaxLength: (prefsPath: string) => number;
|
getTtsMaxLength: (prefsPath: string) => number;
|
||||||
|
getTtsPersona: (config: ResolvedTtsConfig, prefsPath: string) => ResolvedTtsPersona | undefined;
|
||||||
getTtsProvider: (config: ResolvedTtsConfig, prefsPath: string) => TtsProvider;
|
getTtsProvider: (config: ResolvedTtsConfig, prefsPath: string) => TtsProvider;
|
||||||
isSummarizationEnabled: (prefsPath: string) => boolean;
|
isSummarizationEnabled: (prefsPath: string) => boolean;
|
||||||
isTtsEnabled: (config: ResolvedTtsConfig, prefsPath: string, sessionAuto?: string) => boolean;
|
isTtsEnabled: (config: ResolvedTtsConfig, prefsPath: string, sessionAuto?: string) => boolean;
|
||||||
@@ -188,6 +195,7 @@ export type TtsRuntimeFacade = {
|
|||||||
cfg?: OpenClawConfig,
|
cfg?: OpenClawConfig,
|
||||||
) => boolean;
|
) => boolean;
|
||||||
listSpeechVoices: ListSpeechVoices;
|
listSpeechVoices: ListSpeechVoices;
|
||||||
|
listTtsPersonas: (config: ResolvedTtsConfig) => ResolvedTtsPersona[];
|
||||||
maybeApplyTtsToPayload: (params: MaybeApplyTtsToPayloadParams) => Promise<ReplyPayload>;
|
maybeApplyTtsToPayload: (params: MaybeApplyTtsToPayloadParams) => Promise<ReplyPayload>;
|
||||||
resolveExplicitTtsOverrides: (params: ResolveExplicitTtsOverridesParams) => TtsDirectiveOverrides;
|
resolveExplicitTtsOverrides: (params: ResolveExplicitTtsOverridesParams) => TtsDirectiveOverrides;
|
||||||
resolveTtsAutoMode: (params: ResolveTtsAutoModeParams) => TtsAutoMode;
|
resolveTtsAutoMode: (params: ResolveTtsAutoModeParams) => TtsAutoMode;
|
||||||
@@ -199,6 +207,7 @@ export type TtsRuntimeFacade = {
|
|||||||
setTtsAutoMode: (prefsPath: string, mode: TtsAutoMode) => void;
|
setTtsAutoMode: (prefsPath: string, mode: TtsAutoMode) => void;
|
||||||
setTtsEnabled: (prefsPath: string, enabled: boolean) => void;
|
setTtsEnabled: (prefsPath: string, enabled: boolean) => void;
|
||||||
setTtsMaxLength: (prefsPath: string, maxLength: number) => void;
|
setTtsMaxLength: (prefsPath: string, maxLength: number) => void;
|
||||||
|
setTtsPersona: (prefsPath: string, persona: string | null | undefined) => void;
|
||||||
setTtsProvider: (prefsPath: string, provider: TtsProvider) => void;
|
setTtsProvider: (prefsPath: string, provider: TtsProvider) => void;
|
||||||
synthesizeSpeech: (params: TtsRequestParams) => Promise<TtsSynthesisResult>;
|
synthesizeSpeech: (params: TtsRequestParams) => Promise<TtsSynthesisResult>;
|
||||||
textToSpeech: TextToSpeech;
|
textToSpeech: TextToSpeech;
|
||||||
|
|||||||
@@ -65,6 +65,8 @@ import type {
   SpeechProviderResolveTalkConfigContext,
   SpeechProviderResolveTalkOverridesContext,
   SpeechListVoicesRequest,
+  SpeechProviderPrepareSynthesisContext,
+  SpeechProviderPreparedSynthesis,
   SpeechProviderId,
   SpeechSynthesisRequest,
   SpeechSynthesisResult,
@@ -1724,6 +1726,12 @@ export type SpeechProviderPlugin = {
   resolveTalkOverrides?: (
     ctx: SpeechProviderResolveTalkOverridesContext,
   ) => SpeechProviderConfig | undefined;
+  prepareSynthesis?: (
+    ctx: SpeechProviderPrepareSynthesisContext,
+  ) =>
+    | SpeechProviderPreparedSynthesis
+    | undefined
+    | Promise<SpeechProviderPreparedSynthesis | undefined>;
   isConfigured: (ctx: SpeechProviderConfiguredContext) => boolean;
   synthesize: (req: SpeechSynthesisRequest) => Promise<SpeechSynthesisResult>;
   synthesizeTelephony?: (
@@ -465,6 +465,9 @@ const formatVoiceModeLine = (
     return null;
   }
   const parts = [`🔊 Voice: ${snapshot.autoMode}`, `provider=${snapshot.provider}`];
+  if (snapshot.persona) {
+    parts.push(`persona=${snapshot.persona}`);
+  }
   if (snapshot.displayName) {
     parts.push(`name=${snapshot.displayName}`);
   }
@@ -1,9 +1,10 @@
 import type { TalkProviderConfig } from "../config/types.gateway.js";
 import type { OpenClawConfig } from "../config/types.js";
+import type { ResolvedTtsPersona } from "../config/types.tts.js";

 export type SpeechProviderId = string;

-export type SpeechSynthesisTarget = "audio-file" | "voice-note";
+export type SpeechSynthesisTarget = "audio-file" | "voice-note" | "telephony";

 export type SpeechProviderConfig = Record<string, unknown>;

@@ -69,6 +70,23 @@ export type SpeechTelephonySynthesisResult = {
   sampleRate: number;
 };

+export type SpeechProviderPrepareSynthesisContext = {
+  text: string;
+  cfg: OpenClawConfig;
+  providerConfig: SpeechProviderConfig;
+  providerOverrides?: SpeechProviderOverrides;
+  persona?: ResolvedTtsPersona;
+  personaProviderConfig?: SpeechProviderConfig;
+  target: SpeechSynthesisTarget;
+  timeoutMs: number;
+};
+
+export type SpeechProviderPreparedSynthesis = {
+  text?: string;
+  providerConfig?: SpeechProviderConfig;
+  providerOverrides?: SpeechProviderOverrides;
+};
+
 export type SpeechVoiceOption = {
   id: string;
   name?: string;
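Review note: the `prepareSynthesis` hook and its context/result types suggest that a provider can rewrite the text and provider config before synthesis when a persona is active. The sketch below is one way such a hook might look, not the implementation in this PR. The `PersonaSketch` shape (including its `prompt` field), the function name `prepareSynthesisSketch`, and the merge order are all assumptions; the real `ResolvedTtsPersona` fields are not visible in this diff, and `cfg`, `providerOverrides`, and `timeoutMs` are omitted for brevity.

```typescript
// Local, trimmed copies of the shapes added in the hunk above.
type SpeechProviderConfig = Record<string, unknown>;
type SpeechSynthesisTarget = "audio-file" | "voice-note" | "telephony";

// Hypothetical persona shape; the real ResolvedTtsPersona is not shown in this diff.
type PersonaSketch = { id: string; prompt?: string };

type PrepareContext = {
  text: string;
  providerConfig: SpeechProviderConfig;
  persona?: PersonaSketch;
  personaProviderConfig?: SpeechProviderConfig;
  target: SpeechSynthesisTarget;
};

type Prepared = { text?: string; providerConfig?: SpeechProviderConfig };

// Hypothetical hook: returns undefined when there is nothing to adjust.
function prepareSynthesisSketch(ctx: PrepareContext): Prepared | undefined {
  if (!ctx.persona) {
    return undefined;
  }
  // Assumption: persona binding values override the base provider config.
  const providerConfig = { ...ctx.providerConfig, ...ctx.personaProviderConfig };
  // Assumption: the persona prompt is prepended to the spoken text.
  const text = ctx.persona.prompt ? `${ctx.persona.prompt}\n\n${ctx.text}` : ctx.text;
  return { text, providerConfig };
}
```

Because every field of `SpeechProviderPreparedSynthesis` is optional and the hook may return `undefined`, a provider presumably only needs to return the parts it wants to change.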
@@ -138,6 +138,44 @@ describe("resolveStatusTtsSnapshot", () => {
     });
   });

+  it("reports per-agent persona provider over global persona", async () => {
+    await withStatusTempHome(async () => {
+      expect(
+        resolveStatusTtsSnapshot({
+          cfg: {
+            messages: {
+              tts: {
+                auto: "always",
+                persona: "alfred",
+                personas: {
+                  alfred: { provider: "google" },
+                  jarvis: { provider: "edge" },
+                },
+              },
+            },
+            agents: {
+              list: [
+                {
+                  id: "reader",
+                  tts: {
+                    persona: "jarvis",
+                  },
+                },
+              ],
+            },
+          } as OpenClawConfig,
+          agentId: "reader",
+        }),
+      ).toEqual({
+        autoMode: "always",
+        provider: "microsoft",
+        persona: "jarvis",
+        maxLength: 1500,
+        summarize: true,
+      });
+    });
+  });
+
   it("reports configured OpenAI TTS model, voice, and sanitized custom endpoint", async () => {
     await withStatusTempHome(async () => {
       expect(
@@ -20,6 +20,7 @@ type TtsUserPrefs = {
     auto?: TtsAutoMode;
     enabled?: boolean;
     provider?: TtsProvider;
+    persona?: string | null;
     maxLength?: number;
     summarize?: boolean;
   };
@@ -31,6 +32,7 @@ type TtsStatusSnapshot = {
   displayName?: string;
   model?: string;
   voice?: string;
+  persona?: string;
   baseUrl?: string;
   customBaseUrl?: boolean;
   maxLength: number;
@@ -51,6 +53,27 @@ function normalizeConfiguredSpeechProviderId(
   return normalized === "edge" ? "microsoft" : normalized;
 }

+function normalizeTtsPersonaId(personaId: string | null | undefined): string | undefined {
+  return normalizeOptionalLowercaseString(personaId ?? undefined);
+}
+
+function resolvePersonaPreferredProvider(
+  raw: TtsConfig,
+  personaId: string | undefined,
+): TtsProvider | undefined {
+  if (!personaId || !raw.personas) {
+    return undefined;
+  }
+  for (const [id, persona] of Object.entries(raw.personas)) {
+    if (normalizeTtsPersonaId(id) !== personaId) {
+      continue;
+    }
+    const provider = normalizeConfiguredSpeechProviderId(persona.provider) ?? persona.provider;
+    return normalizeOptionalString(provider);
+  }
+  return undefined;
+}
+
 function resolveTtsPrefsPathValue(prefsPath: string | undefined): string {
   const configuredPath = normalizeOptionalString(prefsPath);
   if (configuredPath) {
@@ -212,8 +235,13 @@ export function resolveStatusTtsSnapshot(params: {
     return null;
   }

+  const persona =
+    prefs.tts && Object.prototype.hasOwnProperty.call(prefs.tts, "persona")
+      ? normalizeTtsPersonaId(prefs.tts.persona)
+      : normalizeTtsPersonaId(raw.persona);
   const provider =
     normalizeConfiguredSpeechProviderId(prefs.tts?.provider) ??
+    resolvePersonaPreferredProvider(raw, persona) ??
     normalizeConfiguredSpeechProviderId(raw.provider) ??
     "auto";

@@ -221,6 +249,7 @@ export function resolveStatusTtsSnapshot(params: {
     autoMode,
     provider,
     ...resolveStatusProviderDetails(raw, provider),
+    ...(persona ? { persona } : {}),
     maxLength: prefs.tts?.maxLength ?? DEFAULT_TTS_MAX_LENGTH,
     summarize: prefs.tts?.summarize ?? DEFAULT_TTS_SUMMARIZE,
   };
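Review note: the status snapshot change above implies a precedence order. For the persona, a `persona` key stored in prefs wins even when its value is `null` (which clears the configured persona); otherwise the configured persona applies. For the provider, the prefs provider comes first, then the active persona's preferred provider, then the configured provider, then `"auto"`. The self-contained sketch below mirrors that order with simplified stand-in types. The names `TtsConfigSketch`, `TtsPrefsSketch`, `normalizeId`, `normalizeProvider`, and `resolveSketch` are illustrative and not part of the PR, and the normalization helpers are assumed to trim and lowercase.

```typescript
// Simplified stand-ins for the real config and prefs types (assumptions).
type PersonaConfig = { provider?: string };
type TtsConfigSketch = {
  provider?: string;
  persona?: string;
  personas?: Record<string, PersonaConfig>;
};
type TtsPrefsSketch = { provider?: string; persona?: string | null };

// Trim and lowercase; empty or missing values become undefined.
function normalizeId(value: string | null | undefined): string | undefined {
  const trimmed = value?.trim().toLowerCase();
  return trimmed ? trimmed : undefined;
}

// Mirrors the "edge" -> "microsoft" alias visible in the hunk context.
function normalizeProvider(value: string | undefined): string | undefined {
  const normalized = normalizeId(value);
  return normalized === "edge" ? "microsoft" : normalized;
}

function resolveSketch(raw: TtsConfigSketch, prefs: TtsPrefsSketch | undefined) {
  // A "persona" key in prefs wins even when its value is null.
  const persona =
    prefs && Object.prototype.hasOwnProperty.call(prefs, "persona")
      ? normalizeId(prefs.persona)
      : normalizeId(raw.persona);

  // Persona ids are matched after normalization, so "Alfred" matches "alfred".
  const personaEntry = Object.entries(raw.personas ?? {}).find(
    ([id]) => persona !== undefined && normalizeId(id) === persona,
  );

  const provider =
    normalizeProvider(prefs?.provider) ??
    normalizeProvider(personaEntry?.[1].provider) ??
    normalizeProvider(raw.provider) ??
    "auto";

  return { provider, ...(persona ? { persona } : {}) };
}
```

The `hasOwnProperty` check is what separates "no persona preference stored" from "persona explicitly cleared"; a plain `??` fallback would treat both the same and fall back to the configured persona.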
@@ -1,5 +1,11 @@
 import type { OpenClawConfig } from "../config/types.openclaw.js";
-import type { TtsAutoMode, TtsConfig, TtsMode, TtsProvider } from "../config/types.tts.js";
+import type {
+  ResolvedTtsPersona,
+  TtsAutoMode,
+  TtsConfig,
+  TtsMode,
+  TtsProvider,
+} from "../config/types.tts.js";
 import type { SpeechModelOverridePolicy, SpeechProviderConfig } from "./provider-types.js";

 export type ResolvedTtsModelOverrides = SpeechModelOverridePolicy;
@@ -9,6 +15,8 @@ export type ResolvedTtsConfig = {
   mode: TtsMode;
   provider: TtsProvider;
   providerSource: "config" | "default";
+  persona?: string;
+  personas: Record<string, ResolvedTtsPersona>;
   summaryModel?: string;
   modelOverrides: ResolvedTtsModelOverrides;
   providerConfigs: Record<string, SpeechProviderConfig>;
@@ -4,11 +4,13 @@ export {
   getLastTtsAttempt,
   getResolvedSpeechProviderConfig,
   getTtsMaxLength,
+  getTtsPersona,
   getTtsProvider,
   isSummarizationEnabled,
   isTtsEnabled,
   isTtsProviderConfigured,
   listSpeechVoices,
+  listTtsPersonas,
   maybeApplyTtsToPayload,
   resolveExplicitTtsOverrides,
   resolveTtsAutoMode,
@@ -20,6 +22,7 @@ export {
   setTtsAutoMode,
   setTtsEnabled,
   setTtsMaxLength,
+  setTtsPersona,
   setTtsProvider,
   synthesizeSpeech,
   textToSpeech,
@@ -15,6 +15,7 @@ const providerHttpMocks = vi.hoisted(() => ({
   fetchWithTimeoutMock: vi.fn(),
   pollProviderOperationJsonMock: vi.fn(),
   assertOkOrThrowHttpErrorMock: vi.fn(async (_response: Response, _label: string) => {}),
+  assertOkOrThrowProviderErrorMock: vi.fn(async (_response: Response, _label: string) => {}),
   resolveProviderHttpRequestConfigMock: vi.fn((params: ResolveProviderHttpRequestConfigParams) => ({
     baseUrl: params.baseUrl ?? params.defaultBaseUrl,
     allowPrivateNetwork: false,
@@ -55,6 +56,7 @@ vi.mock("openclaw/plugin-sdk/provider-auth-runtime", () => ({

 vi.mock("openclaw/plugin-sdk/provider-http", () => ({
   assertOkOrThrowHttpError: providerHttpMocks.assertOkOrThrowHttpErrorMock,
+  assertOkOrThrowProviderError: providerHttpMocks.assertOkOrThrowProviderErrorMock,
   createProviderOperationDeadline: ({
     label,
     timeoutMs,
@@ -85,6 +87,7 @@ export function installProviderHttpMockCleanup(): void {
     providerHttpMocks.fetchWithTimeoutMock.mockReset();
     providerHttpMocks.pollProviderOperationJsonMock.mockClear();
     providerHttpMocks.assertOkOrThrowHttpErrorMock.mockClear();
+    providerHttpMocks.assertOkOrThrowProviderErrorMock.mockClear();
     providerHttpMocks.resolveProviderHttpRequestConfigMock.mockClear();
   });
 }
@@ -499,6 +499,7 @@ function createResolvedSummarizationConfig(cfg: OpenClawConfig): ResolvedTtsConf
       allowSeed: true,
     },
     providerConfigs: {},
+    personas: {},
     prefsPath: typeof rawConfig.prefsPath === "string" ? rawConfig.prefsPath : undefined,
     maxTextLength: typeof rawConfig.maxTextLength === "number" ? rawConfig.maxTextLength : 4096,
     timeoutMs: typeof rawConfig.timeoutMs === "number" ? rawConfig.timeoutMs : 30_000,
@@ -715,6 +716,7 @@ export function describeTtsConfigContract() {
       microsoft: {},
       elevenlabs: {},
     },
+    personas: {},
     prefsPath: undefined,
     maxTextLength: 4000,
     timeoutMs: 30_000,