mirror of
https://github.com/openclaw/openclaw.git
synced 2026-05-06 04:50:44 +00:00
feat: add Gradium text-to-speech provider (#64958)
Adds the Gradium bundled plugin with TTS and speech-provider registration, docs, label routing, and focused/live coverage. Also carries the current main lint cleanup needed for the rebased CI lane. Co-authored-by: laurent <laurent.mazare@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
4
.github/labeler.yml
vendored
@@ -387,3 +387,7 @@
   - changed-files:
       - any-glob-to-any-file:
           - "extensions/fal/**"
+"extensions: gradium":
+  - changed-files:
+      - any-glob-to-any-file:
+          - "extensions/gradium/**"
@@ -6,6 +6,7 @@ Docs: https://docs.openclaw.ai

 ### Changes

+- Gradium: add a bundled text-to-speech provider with voice-note and telephony output support. (#64958) Thanks @LaurentMazare.
 - TUI/dependencies: remove direct `cli-highlight` usage from the OpenClaw TUI code-block renderer, keeping themed code coloring without the extra root dependency. Thanks @vincentkoc.
 - Diagnostics/OTEL: export run, model-call, and tool-execution diagnostic lifecycle events as OTEL spans without retaining live span state. Thanks @vincentkoc.
 - Plugins/activation: expose activation plan reasons and a richer plan API so callers can inspect why a plugin was selected while preserving existing id-list activation behavior. (#70943) Thanks @vincentkoc.
@@ -1300,6 +1300,7 @@
 "providers/github-copilot",
 "providers/glm",
 "providers/google",
+"providers/gradium",
 "providers/groq",
 "providers/huggingface",
 "providers/inferrs",
66
docs/providers/gradium.md
Normal file
@@ -0,0 +1,66 @@
---
summary: "Use Gradium text-to-speech in OpenClaw"
read_when:
  - You want Gradium for text-to-speech
  - You need Gradium API key or voice configuration
title: "Gradium"
---

# Gradium

Gradium is a bundled text-to-speech provider for OpenClaw. It can generate normal audio replies, voice-note-compatible Opus output, and 8 kHz u-law audio for telephony surfaces.

## Setup

Create a Gradium API key, then expose it to OpenClaw:

```bash
export GRADIUM_API_KEY="gsk_..."
```

You can also store the key in config under `messages.tts.providers.gradium.apiKey`.

## Config

```json5
{
  messages: {
    tts: {
      auto: "always",
      provider: "gradium",
      providers: {
        gradium: {
          voiceId: "YTpq7expH9539ERJ",
          // apiKey: "${GRADIUM_API_KEY}",
          // baseUrl: "https://api.gradium.ai",
        },
      },
    },
  },
}
```

## Voices

| Name      | Voice ID           |
| --------- | ------------------ |
| Emma      | `YTpq7expH9539ERJ` |
| Kent      | `LFZvm12tW_z0xfGo` |
| Tiffany   | `Eu9iL_CYe8N-Gkx_` |
| Christina | `2H4HY2CBNyJHBCrP` |
| Sydney    | `jtEKaLYNn6iif5PR` |
| John      | `KWJiFWu2O9nMPYcR` |
| Arthur    | `3jUdJyOi9pgbxBTK` |

Default voice: Emma.

## Output

- Audio-file replies use WAV.
- Voice-note replies use Opus and are marked voice-compatible.
- Telephony synthesis uses `ulaw_8000` at 8 kHz.

## Related

- [Text-to-Speech](/tools/tts)
- [Media Overview](/tools/media-overview)
|
||||
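The config above takes a voice ID, not a display name. As a minimal sketch, assuming the voice table from the Voices section, a name can be mapped to its ID as follows. The `findGradiumVoiceId` helper is hypothetical and is not part of the plugin.

```typescript
// Voice table copied from the Voices section; IDs are opaque strings.
const GRADIUM_VOICES = [
  { id: "YTpq7expH9539ERJ", name: "Emma" },
  { id: "LFZvm12tW_z0xfGo", name: "Kent" },
  { id: "Eu9iL_CYe8N-Gkx_", name: "Tiffany" },
  { id: "2H4HY2CBNyJHBCrP", name: "Christina" },
  { id: "jtEKaLYNn6iif5PR", name: "Sydney" },
  { id: "KWJiFWu2O9nMPYcR", name: "John" },
  { id: "3jUdJyOi9pgbxBTK", name: "Arthur" },
] as const;

// Hypothetical helper (not part of the plugin): case-insensitive lookup by name.
function findGradiumVoiceId(name: string): string | undefined {
  const wanted = name.trim().toLowerCase();
  return GRADIUM_VOICES.find((voice) => voice.name.toLowerCase() === wanted)?.id;
}
```

The returned ID is what `messages.tts.providers.gradium.voiceId` expects; an unknown name yields `undefined` so the caller can fall back to the default voice.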
@@ -40,6 +40,7 @@ Looking for chat channel docs (WhatsApp/Telegram/Discord/Slack/Mattermost (plugi
 - [fal](/providers/fal)
 - [Fireworks](/providers/fireworks)
 - [GitHub Copilot](/providers/github-copilot)
+- [Gradium](/providers/gradium)
 - [GLM models](/providers/glm)
 - [Google (Gemini)](/providers/google)
 - [Groq (LPU inference)](/providers/groq)
@@ -18,7 +18,7 @@ OpenClaw generates images, videos, and music, understands inbound media (images,
 | Image generation | `image_generate` | ComfyUI, fal, Google, MiniMax, OpenAI, Vydra, xAI | Creates or edits images from text prompts or references |
 | Video generation | `video_generate` | Alibaba, BytePlus, ComfyUI, fal, Google, MiniMax, OpenAI, Qwen, Runway, Together, Vydra, xAI | Creates videos from text, images, or existing videos |
 | Music generation | `music_generate` | ComfyUI, Google, MiniMax | Creates music or audio tracks from text prompts |
-| Text-to-speech (TTS) | `tts` | ElevenLabs, Google, Microsoft, MiniMax, OpenAI, xAI | Converts outbound replies to spoken audio |
+| Text-to-speech (TTS) | `tts` | ElevenLabs, Google, Gradium, Microsoft, MiniMax, OpenAI, Vydra, xAI | Converts outbound replies to spoken audio |
 | Media understanding | (automatic) | Any vision/audio-capable model provider, plus CLI fallbacks | Summarizes inbound images, audio, and video |

 ## Provider capability matrix
@@ -34,6 +34,7 @@ This table shows which providers support which media capabilities across the pla
 | ElevenLabs | | | | Yes | Yes | | |
 | fal | Yes | Yes | | | | | |
 | Google | Yes | Yes | Yes | Yes | | Yes | Yes |
+| Gradium | | | | Yes | | | |
 | Microsoft | | | | Yes | | | |
 | MiniMax | Yes | Yes | Yes | Yes | | | |
 | Mistral | | | | | Yes | | |
@@ -41,7 +42,7 @@ This table shows which providers support which media capabilities across the pla
 | Qwen | | Yes | | | | | |
 | Runway | | Yes | | | | | |
 | Together | | Yes | | | | | |
-| Vydra | Yes | Yes | | | | | |
+| Vydra | Yes | Yes | | Yes | | | |
 | xAI | Yes | Yes | | Yes | Yes | | Yes |

 <Note>
@@ -7,16 +7,18 @@ read_when:
 title: "Text-to-speech"
 ---

-OpenClaw can convert outbound replies into audio using ElevenLabs, Google Gemini, Microsoft, MiniMax, OpenAI, or xAI.
+OpenClaw can convert outbound replies into audio using ElevenLabs, Google Gemini, Gradium, Microsoft, MiniMax, OpenAI, Vydra, or xAI.
 It works anywhere OpenClaw can send audio.

 ## Supported services

 - **ElevenLabs** (primary or fallback provider)
 - **Google Gemini** (primary or fallback provider; uses Gemini API TTS)
+- **Gradium** (primary or fallback provider; supports voice-note and telephony output)
 - **Microsoft** (primary or fallback provider; current bundled implementation uses `node-edge-tts`)
 - **MiniMax** (primary or fallback provider; uses the T2A v2 API)
 - **OpenAI** (primary or fallback provider; also used for summaries)
+- **Vydra** (primary or fallback provider; shared image, video, and speech provider)
 - **xAI** (primary or fallback provider; uses the xAI TTS API)

 ### Microsoft speech notes
@@ -34,12 +36,14 @@ or ElevenLabs.

 ## Optional keys

-If you want OpenAI, ElevenLabs, Google Gemini, MiniMax, or xAI:
+If you want OpenAI, ElevenLabs, Google Gemini, Gradium, MiniMax, Vydra, or xAI:

 - `ELEVENLABS_API_KEY` (or `XI_API_KEY`)
 - `GEMINI_API_KEY` (or `GOOGLE_API_KEY`)
+- `GRADIUM_API_KEY`
 - `MINIMAX_API_KEY`
 - `OPENAI_API_KEY`
+- `VYDRA_API_KEY`
 - `XAI_API_KEY`

 Microsoft speech does **not** require an API key.
@@ -54,6 +58,7 @@ so that provider must also be authenticated if you enable summaries.
 - [OpenAI Audio API reference](https://platform.openai.com/docs/api-reference/audio)
 - [ElevenLabs Text to Speech](https://elevenlabs.io/docs/api-reference/text-to-speech)
 - [ElevenLabs Authentication](https://elevenlabs.io/docs/api-reference/authentication)
+- [Gradium](/providers/gradium)
 - [MiniMax T2A v2 API](https://platform.minimaxi.com/document/T2A%20V2)
 - [node-edge-tts](https://github.com/SchneeHertz/node-edge-tts)
 - [Microsoft Speech output formats](https://learn.microsoft.com/azure/ai-services/speech-service/rest-text-to-speech#audio-outputs)
@@ -226,6 +231,26 @@ Resolution order is `messages.tts.providers.xai.apiKey` -> `XAI_API_KEY`.
 Current live voices are `ara`, `eve`, `leo`, `rex`, `sal`, and `una`; `eve` is
 the default. `language` accepts a BCP-47 tag or `auto`.

+### Gradium primary
+
+```json5
+{
+  messages: {
+    tts: {
+      auto: "always",
+      provider: "gradium",
+      providers: {
+        gradium: {
+          apiKey: "gradium_api_key",
+          baseUrl: "https://api.gradium.ai",
+          voiceId: "YTpq7expH9539ERJ",
+        },
+      },
+    },
+  },
+}
+```
+
 ### Disable Microsoft speech

 ```json5
@@ -294,7 +319,7 @@ Then run:
 - `tagged` only sends audio when the reply includes `[[tts:key=value]]` directives or a `[[tts:text]]...[[/tts:text]]` block.
 - `enabled`: legacy toggle (doctor migrates this to `auto`).
 - `mode`: `"final"` (default) or `"all"` (includes tool/block replies).
-- `provider`: speech provider id such as `"elevenlabs"`, `"google"`, `"microsoft"`, `"minimax"`, or `"openai"` (fallback is automatic).
+- `provider`: speech provider id such as `"elevenlabs"`, `"google"`, `"gradium"`, `"microsoft"`, `"minimax"`, `"openai"`, `"vydra"`, or `"xai"` (fallback is automatic).
 - If `provider` is **unset**, OpenClaw uses the first configured speech provider in registry auto-select order.
 - Legacy `provider: "edge"` still works and is normalized to `microsoft`.
 - `summaryModel`: optional cheap model for auto-summary; defaults to `agents.defaults.model.primary`.
@@ -306,7 +331,7 @@ Then run:
 - `maxTextLength`: hard cap for TTS input (chars). `/tts audio` fails if exceeded.
 - `timeoutMs`: request timeout (ms).
 - `prefsPath`: override the local prefs JSON path (provider/limit/summary).
-- `apiKey` values fall back to env vars (`ELEVENLABS_API_KEY`/`XI_API_KEY`, `GEMINI_API_KEY`/`GOOGLE_API_KEY`, `MINIMAX_API_KEY`, `OPENAI_API_KEY`).
+- `apiKey` values fall back to env vars (`ELEVENLABS_API_KEY`/`XI_API_KEY`, `GEMINI_API_KEY`/`GOOGLE_API_KEY`, `GRADIUM_API_KEY`, `MINIMAX_API_KEY`, `OPENAI_API_KEY`, `VYDRA_API_KEY`, `XAI_API_KEY`).
 - `providers.elevenlabs.baseUrl`: override ElevenLabs API base URL.
 - `providers.openai.baseUrl`: override the OpenAI TTS endpoint.
 - Resolution order: `messages.tts.providers.openai.baseUrl` -> `OPENAI_TTS_BASE_URL` -> `https://api.openai.com/v1`
|
||||
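For Gradium, the `apiKey` fallback described in this hunk amounts to "config value first, then `GRADIUM_API_KEY`". The following is a minimal sketch of that order, modeled on the `config.apiKey || process.env.GRADIUM_API_KEY` expression in the plugin source. The `resolveGradiumApiKey` name and the explicit `env` parameter are assumptions made so the example runs without Node globals.

```typescript
// Environment passed in explicitly (an assumption for testability);
// the plugin itself reads process.env.
type Env = Record<string, string | undefined>;

// Hypothetical helper: config value wins, then the GRADIUM_API_KEY variable.
function resolveGradiumApiKey(configApiKey: string | undefined, env: Env): string {
  const fromConfig = configApiKey?.trim();
  const apiKey = fromConfig || env["GRADIUM_API_KEY"];
  if (!apiKey) {
    // Same message the plugin's tests expect when no key is available.
    throw new Error("Gradium API key missing");
  }
  return apiKey;
}
```

A blank or whitespace-only config value is treated as unset, so the environment variable still applies in that case.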
@@ -328,6 +353,8 @@ Then run:
 - `providers.google.voiceName`: Gemini prebuilt voice name (default `Kore`; `voice` is also accepted).
 - `providers.google.baseUrl`: override the Gemini API base URL. Only `https://generativelanguage.googleapis.com` is accepted.
 - If `messages.tts.providers.google.apiKey` is omitted, TTS can reuse `models.providers.google.apiKey` before env fallback.
+- `providers.gradium.baseUrl`: override Gradium API base URL (default `https://api.gradium.ai`).
+- `providers.gradium.voiceId`: Gradium voice identifier (default Emma, `YTpq7expH9539ERJ`).
 - `providers.xai.apiKey`: xAI TTS API key (env: `XAI_API_KEY`).
 - `providers.xai.baseUrl`: override the xAI TTS base URL (default `https://api.x.ai/v1`, env: `XAI_BASE_URL`).
 - `providers.xai.voiceId`: xAI voice id (default `eve`; current live voices: `ara`, `eve`, `leo`, `rex`, `sal`, `una`).
@@ -368,8 +395,8 @@ Here you go.

 Available directive keys (when enabled):

-- `provider` (registered speech provider id, for example `openai`, `elevenlabs`, `google`, `minimax`, or `microsoft`; requires `allowProvider: true`)
-- `voice` (OpenAI voice), `voiceName` / `voice_name` / `google_voice` (Google voice), or `voiceId` (ElevenLabs / MiniMax / xAI)
+- `provider` (registered speech provider id, for example `openai`, `elevenlabs`, `google`, `gradium`, `minimax`, `microsoft`, `vydra`, or `xai`; requires `allowProvider: true`)
+- `voice` (OpenAI or Gradium voice), `voiceName` / `voice_name` / `google_voice` (Google voice), or `voiceId` (ElevenLabs / Gradium / MiniMax / xAI)
 - `model` (OpenAI TTS model, ElevenLabs model id, or MiniMax model) or `google_model` (Google TTS model)
 - `stability`, `similarityBoost`, `style`, `speed`, `useSpeakerBoost`
 - `vol` / `volume` (MiniMax volume, 0-10)
|
||||
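On the Gradium side, the `voice` and `voiceId` directive keys listed above both end up as a `voiceId` override. The sketch below mirrors the `parseDirectiveToken` switch in `extensions/gradium/speech-provider.ts`, with simplified stand-in types instead of the SDK's. The type names are assumptions, and the example presumes directive keys arrive already lowercased, which the source does not show.

```typescript
// Simplified stand-in for the SDK's directive context (an assumption for illustration).
type DirectiveContext = {
  key: string;
  value: string;
  policy: { allowVoice: boolean };
  currentOverrides: Record<string, unknown>;
};

type DirectiveResult = { handled: boolean; overrides?: Record<string, unknown> };

// Keys the Gradium provider recognizes, taken from the switch in the plugin source.
const GRADIUM_VOICE_KEYS = new Set([
  "voice",
  "voice_id",
  "voiceid",
  "gradium_voice",
  "gradiumvoice",
]);

function parseGradiumDirective(ctx: DirectiveContext): DirectiveResult {
  if (!GRADIUM_VOICE_KEYS.has(ctx.key)) {
    // Not a Gradium key: leave it for another handler.
    return { handled: false };
  }
  if (!ctx.policy.allowVoice) {
    // Recognized, but voice overrides are disabled by policy, so nothing changes.
    return { handled: true };
  }
  return { handled: true, overrides: { ...ctx.currentOverrides, voiceId: ctx.value } };
}
```

Returning `handled: true` without overrides when the policy forbids voice changes consumes the directive silently instead of passing it on.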
@@ -431,6 +458,7 @@ These override `messages.tts.*` for that host.
 - 44.1kHz / 128kbps is the default balance for speech clarity.
 - **MiniMax**: MP3 (`speech-2.8-hd` model, 32kHz sample rate). Voice-note format not natively supported; use OpenAI or ElevenLabs for guaranteed Opus voice messages.
 - **Google Gemini**: Gemini API TTS returns raw 24kHz PCM. OpenClaw wraps it as WAV for audio attachments and returns PCM directly for Talk/telephony. Native Opus voice-note format is not supported by this path.
+- **Gradium**: WAV for audio attachments, Opus for voice-note targets, and `ulaw_8000` at 8 kHz for telephony.
 - **xAI**: MP3 by default; `responseFormat` may be `mp3`, `wav`, `pcm`, `mulaw`, or `alaw`. OpenClaw uses xAI's batch REST TTS endpoint and returns a complete audio attachment; xAI's streaming TTS WebSocket is not used by this provider path. Native Opus voice-note format is not supported by this path.
 - **Microsoft**: uses `microsoft.outputFormat` (default `audio-24khz-48kbitrate-mono-mp3`).
 - The bundled transport accepts an `outputFormat`, but not all formats are available from the service.
42
extensions/gradium/gradium.live.test.ts
Normal file
@@ -0,0 +1,42 @@
import { writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { describe, expect, it } from "vitest";
import { isLiveTestEnabled } from "../../src/agents/live-test-helpers.js";
import {
  registerProviderPlugin,
  requireRegisteredProvider,
} from "../../test/helpers/plugins/provider-registration.js";
import plugin from "./index.js";

const LIVE = isLiveTestEnabled();
const GRADIUM_API_KEY = process.env.GRADIUM_API_KEY?.trim() ?? "";

const registerGradiumPlugin = () =>
  registerProviderPlugin({
    plugin,
    id: "gradium",
    name: "Gradium Speech",
  });

describe.skipIf(!LIVE || !GRADIUM_API_KEY)("gradium live", () => {
  it("synthesizes speech through the registered provider", async () => {
    const { speechProviders } = await registerGradiumPlugin();
    const provider = requireRegisteredProvider(speechProviders, "gradium");

    const result = await provider.synthesize({
      text: "Hello, this is a test of Gradium text to speech.",
      cfg: { plugins: { enabled: true } } as never,
      providerConfig: { apiKey: GRADIUM_API_KEY },
      target: "audio-file",
      timeoutMs: 45_000,
    });

    expect(result.outputFormat).toBe("wav");
    expect(result.audioBuffer.byteLength).toBeGreaterThan(512);

    const outPath = join(tmpdir(), "gradium-live-test.wav");
    writeFileSync(outPath, result.audioBuffer);
    console.log(`Audio written to ${outPath}`);
  }, 60_000);
});
11
extensions/gradium/index.ts
Normal file
@@ -0,0 +1,11 @@
import { definePluginEntry } from "openclaw/plugin-sdk/plugin-entry";
import { buildGradiumSpeechProvider } from "./speech-provider.js";

export default definePluginEntry({
  id: "gradium",
  name: "Gradium Speech",
  description: "Bundled Gradium speech provider",
  register(api) {
    api.registerSpeechProvider(buildGradiumSpeechProvider());
  },
});
14
extensions/gradium/openclaw.plugin.json
Normal file
@@ -0,0 +1,14 @@
{
  "id": "gradium",
  "providerAuthEnvVars": {
    "gradium": ["GRADIUM_API_KEY"]
  },
  "contracts": {
    "speechProviders": ["gradium"]
  },
  "configSchema": {
    "type": "object",
    "additionalProperties": false,
    "properties": {}
  }
}
15
extensions/gradium/package.json
Normal file
@@ -0,0 +1,15 @@
{
  "name": "@openclaw/gradium-speech",
  "version": "2026.4.10",
  "private": true,
  "description": "OpenClaw Gradium speech plugin",
  "type": "module",
  "devDependencies": {
    "@openclaw/plugin-sdk": "workspace:*"
  },
  "openclaw": {
    "extensions": [
      "./index.ts"
    ]
  }
}
6
extensions/gradium/plugin-registration.contract.test.ts
Normal file
@@ -0,0 +1,6 @@
import { describePluginRegistrationContract } from "../../test/helpers/plugins/plugin-registration-contract.js";

describePluginRegistrationContract({
  pluginId: "gradium",
  speechProviderIds: ["gradium"],
});
17
extensions/gradium/shared.ts
Normal file
@@ -0,0 +1,17 @@
export const DEFAULT_GRADIUM_BASE_URL = "https://api.gradium.ai";
export const DEFAULT_GRADIUM_VOICE_ID = "YTpq7expH9539ERJ";

export const GRADIUM_VOICES = [
  { id: "YTpq7expH9539ERJ", name: "Emma" },
  { id: "LFZvm12tW_z0xfGo", name: "Kent" },
  { id: "Eu9iL_CYe8N-Gkx_", name: "Tiffany" },
  { id: "2H4HY2CBNyJHBCrP", name: "Christina" },
  { id: "jtEKaLYNn6iif5PR", name: "Sydney" },
  { id: "KWJiFWu2O9nMPYcR", name: "John" },
  { id: "3jUdJyOi9pgbxBTK", name: "Arthur" },
] as const;

export function normalizeGradiumBaseUrl(baseUrl?: string): string {
  const trimmed = baseUrl?.trim();
  return trimmed?.replace(/\/+$/, "") || DEFAULT_GRADIUM_BASE_URL;
}
|
||||
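To make the behavior of `normalizeGradiumBaseUrl` concrete, the sketch below repeats the function from `shared.ts` and runs it over a few inputs. The proxy URL is a made-up example value, not a real endpoint.

```typescript
const DEFAULT_GRADIUM_BASE_URL = "https://api.gradium.ai";

// Same logic as shared.ts: trim whitespace, drop trailing slashes,
// and fall back to the default when nothing usable remains.
function normalizeGradiumBaseUrl(baseUrl?: string): string {
  const trimmed = baseUrl?.trim();
  return trimmed?.replace(/\/+$/, "") || DEFAULT_GRADIUM_BASE_URL;
}

// Example inputs; the proxy host is hypothetical.
const examples: Array<string | undefined> = [
  undefined,
  "   ",
  "https://api.gradium.ai/",
  " https://proxy.example.test/gradium// ",
];

// Result: the default for the first two inputs, then the two URLs with
// surrounding whitespace and trailing slashes removed.
const normalized = examples.map((input) => normalizeGradiumBaseUrl(input));
```

Because the fallback uses `||`, an input that becomes empty after trimming (for example `"/"`) also resolves to the default base URL.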
131
extensions/gradium/speech-provider.test.ts
Normal file
@@ -0,0 +1,131 @@
import { installPinnedHostnameTestHooks } from "openclaw/plugin-sdk/testing";
import { afterEach, describe, expect, it, vi } from "vitest";
import { buildGradiumSpeechProvider } from "./speech-provider.js";

describe("gradium speech provider", () => {
  installPinnedHostnameTestHooks();

  const provider = buildGradiumSpeechProvider();

  afterEach(() => {
    vi.unstubAllGlobals();
    vi.restoreAllMocks();
  });

  it("reports configured when GRADIUM_API_KEY is set", () => {
    const original = process.env.GRADIUM_API_KEY;
    try {
      process.env.GRADIUM_API_KEY = "gsk_test";
      expect(provider.isConfigured({ providerConfig: {}, timeoutMs: 5_000 })).toBe(true);
    } finally {
      if (original === undefined) {
        delete process.env.GRADIUM_API_KEY;
      } else {
        process.env.GRADIUM_API_KEY = original;
      }
    }
  });

  it("reports not configured when no key is available", () => {
    const original = process.env.GRADIUM_API_KEY;
    try {
      delete process.env.GRADIUM_API_KEY;
      expect(provider.isConfigured({ providerConfig: {}, timeoutMs: 5_000 })).toBe(false);
    } finally {
      if (original !== undefined) {
        process.env.GRADIUM_API_KEY = original;
      }
    }
  });

  it("synthesizes audio via the Gradium TTS endpoint", async () => {
    const audioData = Buffer.from("wav-audio-data");
    const fetchMock = vi.fn().mockResolvedValue(new Response(audioData, { status: 200 }));
    vi.stubGlobal("fetch", fetchMock);

    const result = await provider.synthesize({
      text: "OpenClaw test",
      cfg: {} as never,
      providerConfig: { apiKey: "gsk_test123" },
      target: "audio-file",
      timeoutMs: 30_000,
    });

    expect(fetchMock).toHaveBeenCalledOnce();
    const [url, init] = fetchMock.mock.calls[0] as [string, RequestInit];
    expect(url).toBe("https://api.gradium.ai/api/post/speech/tts");
    const headers = new Headers(init.headers);
    expect(headers.get("x-api-key")).toBe("gsk_test123");
    expect(JSON.parse(init.body as string)).toEqual({
      text: "OpenClaw test",
      voice_id: "YTpq7expH9539ERJ",
      only_audio: true,
      output_format: "wav",
      json_config: '{"padding_bonus":0}',
    });
    expect(result.outputFormat).toBe("wav");
    expect(result.fileExtension).toBe(".wav");
    expect(result.voiceCompatible).toBe(false);
    expect(result.audioBuffer).toEqual(audioData);
  });

  it("uses opus and voiceCompatible for voice-note target", async () => {
    const audioData = Buffer.from("opus-audio-data");
    const fetchMock = vi.fn().mockResolvedValue(new Response(audioData, { status: 200 }));
    vi.stubGlobal("fetch", fetchMock);

    const result = await provider.synthesize({
      text: "Voice note test",
      cfg: {} as never,
      providerConfig: { apiKey: "gsk_test123" },
      target: "voice-note",
      timeoutMs: 30_000,
    });

    const [, init] = fetchMock.mock.calls[0] as [string, RequestInit];
    expect(JSON.parse(init.body as string).output_format).toBe("opus");
    expect(result.outputFormat).toBe("opus");
    expect(result.fileExtension).toBe(".opus");
    expect(result.voiceCompatible).toBe(true);
    expect(result.audioBuffer).toEqual(audioData);
  });

  it("uses ulaw_8000 for telephony synthesis", async () => {
    const audioData = Buffer.from("ulaw-audio-data");
    const fetchMock = vi.fn().mockResolvedValue(new Response(audioData, { status: 200 }));
    vi.stubGlobal("fetch", fetchMock);

    const result = await provider.synthesizeTelephony!({
      text: "Telephony test",
      cfg: {} as never,
      providerConfig: { apiKey: "gsk_test123" },
      timeoutMs: 30_000,
    });

    const [, init] = fetchMock.mock.calls[0] as [string, RequestInit];
    expect(JSON.parse(init.body as string).output_format).toBe("ulaw_8000");
    expect(result.outputFormat).toBe("ulaw_8000");
    expect(result.sampleRate).toBe(8_000);
    expect(result.audioBuffer).toEqual(audioData);
  });

  it("throws when no API key is available", async () => {
    const original = process.env.GRADIUM_API_KEY;
    try {
      delete process.env.GRADIUM_API_KEY;
      await expect(
        provider.synthesize({
          text: "test",
          cfg: {} as never,
          providerConfig: {},
          target: "audio-file",
          timeoutMs: 5_000,
        }),
      ).rejects.toThrow("Gradium API key missing");
    } finally {
      if (original !== undefined) {
        process.env.GRADIUM_API_KEY = original;
      }
    }
  });
});
|
||||
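The payload assertions in these tests pin down the wire format of a Gradium TTS call. As a sketch under that assumption, the request can be described as a plain object. The `buildGradiumTtsRequest` helper and its return shape are hypothetical; the real `gradiumTTS` sends the request through the SDK's guarded fetch and is not reproduced here. The `content-type` header is the one asserted in `tts.test.ts`.

```typescript
// Plain-object description of the HTTP request the tests assert on.
type GradiumTtsRequest = {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
};

// Hypothetical helper: assembles the URL, headers, and JSON body seen in the tests.
function buildGradiumTtsRequest(params: {
  text: string;
  apiKey: string;
  baseUrl: string;
  voiceId: string;
  outputFormat: string;
}): GradiumTtsRequest {
  return {
    url: `${params.baseUrl}/api/post/speech/tts`,
    method: "POST",
    headers: {
      "content-type": "application/json",
      "x-api-key": params.apiKey,
    },
    body: JSON.stringify({
      text: params.text,
      voice_id: params.voiceId,
      only_audio: true,
      output_format: params.outputFormat,
      // The tests expect json_config to be a JSON string, not a nested object.
      json_config: JSON.stringify({ padding_bonus: 0 }),
    }),
  };
}
```

Note that `json_config` is serialized twice: once on its own, and again as part of the outer body.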
116
extensions/gradium/speech-provider.ts
Normal file
@@ -0,0 +1,116 @@
import { normalizeResolvedSecretInputString } from "openclaw/plugin-sdk/secret-input";
import type {
  SpeechDirectiveTokenParseContext,
  SpeechProviderConfig,
  SpeechProviderPlugin,
} from "openclaw/plugin-sdk/speech";
import { asObject, trimToUndefined } from "openclaw/plugin-sdk/speech";
import { DEFAULT_GRADIUM_VOICE_ID, GRADIUM_VOICES, normalizeGradiumBaseUrl } from "./shared.js";
import { gradiumTTS } from "./tts.js";

type GradiumProviderConfig = {
  apiKey?: string;
  baseUrl: string;
  voiceId: string;
};

function normalizeGradiumProviderConfig(rawConfig: Record<string, unknown>): GradiumProviderConfig {
  const providers = asObject(rawConfig.providers);
  const raw = asObject(providers?.gradium) ?? asObject(rawConfig.gradium);
  return {
    apiKey: normalizeResolvedSecretInputString({
      value: raw?.apiKey,
      path: "messages.tts.providers.gradium.apiKey",
    }),
    baseUrl: normalizeGradiumBaseUrl(trimToUndefined(raw?.baseUrl)),
    voiceId: trimToUndefined(raw?.voiceId) ?? DEFAULT_GRADIUM_VOICE_ID,
  };
}

function readGradiumProviderConfig(config: SpeechProviderConfig): GradiumProviderConfig {
  const defaults = normalizeGradiumProviderConfig({});
  return {
    apiKey: trimToUndefined(config.apiKey) ?? defaults.apiKey,
    baseUrl: normalizeGradiumBaseUrl(trimToUndefined(config.baseUrl) ?? defaults.baseUrl),
    voiceId: trimToUndefined(config.voiceId) ?? defaults.voiceId,
  };
}

function parseDirectiveToken(ctx: SpeechDirectiveTokenParseContext): {
  handled: boolean;
  overrides?: Record<string, unknown>;
  warnings?: string[];
} {
  switch (ctx.key) {
    case "voice":
    case "voice_id":
    case "voiceid":
    case "gradium_voice":
    case "gradiumvoice":
      if (!ctx.policy.allowVoice) {
        return { handled: true };
      }
      return {
        handled: true,
        overrides: { ...ctx.currentOverrides, voiceId: ctx.value },
      };
    default:
      return { handled: false };
  }
}

export function buildGradiumSpeechProvider(): SpeechProviderPlugin {
  return {
    id: "gradium",
    label: "Gradium",
    autoSelectOrder: 30,
    voices: GRADIUM_VOICES.map((v) => v.id),
    resolveConfig: ({ rawConfig }) => normalizeGradiumProviderConfig(rawConfig),
    parseDirectiveToken,
    listVoices: async () => GRADIUM_VOICES.map((v) => ({ id: v.id, name: v.name })),
    isConfigured: ({ providerConfig }) =>
      Boolean(readGradiumProviderConfig(providerConfig).apiKey || process.env.GRADIUM_API_KEY),
    synthesize: async (req) => {
      const config = readGradiumProviderConfig(req.providerConfig);
      const overrides = req.providerOverrides ?? {};
      const apiKey = config.apiKey || process.env.GRADIUM_API_KEY;
      if (!apiKey) {
        throw new Error("Gradium API key missing");
      }
      const wantsVoiceNote = req.target === "voice-note";
      const outputFormat = wantsVoiceNote ? "opus" : "wav";
      const audioBuffer = await gradiumTTS({
        text: req.text,
        apiKey,
        baseUrl: config.baseUrl,
        voiceId: trimToUndefined(overrides.voiceId) ?? config.voiceId,
        outputFormat,
        timeoutMs: req.timeoutMs,
      });
      return {
        audioBuffer,
        outputFormat,
        fileExtension: wantsVoiceNote ? ".opus" : ".wav",
        voiceCompatible: wantsVoiceNote,
      };
    },
    synthesizeTelephony: async (req) => {
      const config = readGradiumProviderConfig(req.providerConfig);
      const apiKey = config.apiKey || process.env.GRADIUM_API_KEY;
      if (!apiKey) {
        throw new Error("Gradium API key missing");
      }
      const outputFormat = "ulaw_8000";
      const sampleRate = 8_000;
      const audioBuffer = await gradiumTTS({
        text: req.text,
        apiKey,
        baseUrl: config.baseUrl,
        voiceId: config.voiceId,
        outputFormat,
        timeoutMs: req.timeoutMs,
      });
      return { audioBuffer, outputFormat, sampleRate };
    },
  };
}
|
||||
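The format selection inside `synthesize` and `synthesizeTelephony` can be isolated into a small pure function, which makes the target-to-format mapping easier to see. This is a sketch, not the plugin's structure: `selectGradiumOutput`, `SynthesisTarget`, and `GRADIUM_TELEPHONY_OUTPUT` are hypothetical names, and the target union lists only the two values exercised in this commit.

```typescript
// Only the two targets that appear in this commit; the SDK may define more.
type SynthesisTarget = "audio-file" | "voice-note";

type OutputSelection = {
  outputFormat: "wav" | "opus";
  fileExtension: ".wav" | ".opus";
  voiceCompatible: boolean;
};

// Hypothetical helper: voice notes get Opus, everything else gets WAV.
function selectGradiumOutput(target: SynthesisTarget): OutputSelection {
  const wantsVoiceNote = target === "voice-note";
  return {
    outputFormat: wantsVoiceNote ? "opus" : "wav",
    fileExtension: wantsVoiceNote ? ".opus" : ".wav",
    voiceCompatible: wantsVoiceNote,
  };
}

// Telephony does not consult the target and always uses 8 kHz u-law.
const GRADIUM_TELEPHONY_OUTPUT = { outputFormat: "ulaw_8000", sampleRate: 8_000 } as const;
```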
16
extensions/gradium/tsconfig.json
Normal file
@@ -0,0 +1,16 @@
{
  "extends": "../tsconfig.package-boundary.base.json",
  "compilerOptions": {
    "rootDir": "."
  },
  "include": ["./*.ts", "./src/**/*.ts"],
  "exclude": [
    "./**/*.test.ts",
    "./dist/**",
    "./node_modules/**",
    "./src/test-support/**",
    "./src/**/*test-helpers.ts",
    "./src/**/*test-harness.ts",
    "./src/**/*test-support.ts"
  ]
}
137
extensions/gradium/tts.test.ts
Normal file
@@ -0,0 +1,137 @@
import { installPinnedHostnameTestHooks } from "openclaw/plugin-sdk/testing";
import { afterEach, describe, expect, it, vi } from "vitest";
import { gradiumTTS } from "./tts.js";

describe("gradium tts diagnostics", () => {
  installPinnedHostnameTestHooks();

  function createStreamingErrorResponse(params: {
    status: number;
    chunkCount: number;
    chunkSize: number;
    byte: number;
  }): { response: Response; getReadCount: () => number } {
    let reads = 0;
    const stream = new ReadableStream<Uint8Array>({
      pull(controller) {
        if (reads >= params.chunkCount) {
          controller.close();
          return;
        }
        reads += 1;
        controller.enqueue(new Uint8Array(params.chunkSize).fill(params.byte));
      },
    });
    return {
      response: new Response(stream, { status: params.status }),
      getReadCount: () => reads,
    };
  }

  afterEach(() => {
    vi.unstubAllGlobals();
    vi.restoreAllMocks();
  });

  it("includes parsed provider detail and request id for JSON API errors", async () => {
    const fetchMock = vi.fn().mockResolvedValue(
      new Response(
        JSON.stringify({
          message: "Invalid API key",
        }),
        {
          status: 401,
          headers: {
            "Content-Type": "application/json",
            "x-request-id": "grad_req_123",
          },
        },
      ),
    );
    vi.stubGlobal("fetch", fetchMock);

    await expect(
      gradiumTTS({
        text: "hello",
        apiKey: "bad-key",
        baseUrl: "https://api.gradium.ai",
        voiceId: "YTpq7expH9539ERJ",
        outputFormat: "wav",
        timeoutMs: 5_000,
      }),
    ).rejects.toThrow("Gradium API error (401): Invalid API key [request_id=grad_req_123]");
    expect(fetchMock).toHaveBeenCalledOnce();
  });

  it("falls back to raw body text when the error body is non-JSON", async () => {
    vi.stubGlobal(
      "fetch",
      vi.fn().mockResolvedValue(new Response("service unavailable", { status: 503 })),
    );

    await expect(
      gradiumTTS({
        text: "hello",
        apiKey: "test-key",
        baseUrl: "https://api.gradium.ai",
        voiceId: "YTpq7expH9539ERJ",
        outputFormat: "wav",
        timeoutMs: 5_000,
      }),
    ).rejects.toThrow("Gradium API error (503): service unavailable");
  });

  it("caps streamed non-JSON error reads instead of consuming full response bodies", async () => {
    const streamed = createStreamingErrorResponse({
      status: 503,
      chunkCount: 200,
      chunkSize: 1024,
      byte: 121,
    });
    vi.stubGlobal("fetch", vi.fn().mockResolvedValue(streamed.response));

    await expect(
      gradiumTTS({
        text: "hello",
        apiKey: "test-key",
        baseUrl: "https://api.gradium.ai",
        voiceId: "YTpq7expH9539ERJ",
        outputFormat: "wav",
        timeoutMs: 5_000,
      }),
    ).rejects.toThrow("Gradium API error (503)");

    expect(streamed.getReadCount()).toBeLessThan(200);
  });

  it("sends the correct request payload", async () => {
    const audioData = Buffer.from("fake-wav-data");
    const fetchMock = vi.fn().mockResolvedValue(new Response(audioData, { status: 200 }));
    vi.stubGlobal("fetch", fetchMock);

    const result = await gradiumTTS({
      text: "Hello world",
      apiKey: "gsk_test123",
      baseUrl: "https://api.gradium.ai",
      voiceId: "YTpq7expH9539ERJ",
      outputFormat: "wav",
      timeoutMs: 5_000,
    });

    expect(fetchMock).toHaveBeenCalledOnce();
    const [url, init] = fetchMock.mock.calls[0] as [string, RequestInit];
    expect(url).toBe("https://api.gradium.ai/api/post/speech/tts");
    expect(init.method).toBe("POST");
    const headers = new Headers(init.headers);
    expect(headers.get("x-api-key")).toBe("gsk_test123");
    expect(headers.get("content-type")).toBe("application/json");
    expect(JSON.parse(init.body as string)).toEqual({
      text: "Hello world",
      voice_id: "YTpq7expH9539ERJ",
      only_audio: true,
      output_format: "wav",
      json_config: '{"padding_bonus":0}',
    });
    expect(result).toEqual(audioData);
  });
});
|
||||
86
extensions/gradium/tts.ts
Normal file
@@ -0,0 +1,86 @@
import {
  asObject,
  readResponseTextLimited,
  trimToUndefined,
  truncateErrorDetail,
} from "openclaw/plugin-sdk/speech";
import { fetchWithSsrFGuard } from "openclaw/plugin-sdk/ssrf-runtime";
import { normalizeGradiumBaseUrl } from "./shared.js";

function formatGradiumErrorPayload(payload: unknown): string | undefined {
  const root = asObject(payload);
  if (!root) {
    return undefined;
  }
  const message =
    trimToUndefined(root.message) ?? trimToUndefined(root.error) ?? trimToUndefined(root.detail);
  if (message) {
    return truncateErrorDetail(message);
  }
  return undefined;
}

async function extractGradiumErrorDetail(response: Response): Promise<string | undefined> {
  const rawBody = trimToUndefined(await readResponseTextLimited(response));
  if (!rawBody) {
    return undefined;
  }
  try {
    return formatGradiumErrorPayload(JSON.parse(rawBody)) ?? truncateErrorDetail(rawBody);
  } catch {
    return truncateErrorDetail(rawBody);
  }
}

export async function gradiumTTS(params: {
  text: string;
  apiKey: string;
  baseUrl: string;
  voiceId: string;
  outputFormat: "wav" | "opus" | "ulaw_8000" | "pcm" | "pcm_24000" | "alaw_8000";
  timeoutMs: number;
}): Promise<Buffer> {
  const { text, apiKey, baseUrl, voiceId, outputFormat, timeoutMs } = params;
  const normalizedBaseUrl = normalizeGradiumBaseUrl(baseUrl);
  const url = `${normalizedBaseUrl}/api/post/speech/tts`;
  const hostname = new URL(normalizedBaseUrl).hostname;

  const { response, release } = await fetchWithSsrFGuard({
    url,
    init: {
      method: "POST",
      headers: {
        "x-api-key": apiKey,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        text,
        voice_id: voiceId,
        only_audio: true,
        output_format: outputFormat,
        json_config: JSON.stringify({ padding_bonus: 0 }),
      }),
    },
    timeoutMs,
    policy: { hostnameAllowlist: [hostname] },
    auditContext: "gradium.tts",
  });

  try {
    if (!response.ok) {
      const detail = await extractGradiumErrorDetail(response);
      const requestId =
        trimToUndefined(response.headers.get("x-request-id")) ??
        trimToUndefined(response.headers.get("request-id"));
      throw new Error(
        `Gradium API error (${response.status})` +
          (detail ? `: ${detail}` : "") +
          (requestId ? ` [request_id=${requestId}]` : ""),
      );
    }

    return Buffer.from(await response.arrayBuffer());
  } finally {
    await release();
  }
}
6
pnpm-lock.yaml
generated
@@ -643,6 +643,12 @@ importers:
        specifier: workspace:*
        version: link:../..

  extensions/gradium:
    devDependencies:
      '@openclaw/plugin-sdk':
        specifier: workspace:*
        version: link:../../packages/plugin-sdk

  extensions/groq:
    devDependencies:
      '@openclaw/plugin-sdk':