mirror of
https://github.com/openclaw/openclaw.git
synced 2026-05-06 16:50:43 +00:00
feat(tts): add BytePlus Seed Speech provider
Add Volcengine/BytePlus Seed Speech as a bundled TTS provider with current API-key auth, legacy AppID/token fallback, native Ogg/Opus voice-note output, and MP3 audio-file output.

Co-authored-by: Peter Steinberger <steipete@gmail.com>
Committed by Peter Steinberger
Parent: b1b29a8fc2
Commit: 1531123d35
````diff
@@ -111,6 +111,10 @@
     "source": "BytePlus (International)",
     "target": "BytePlus(国际版)"
   },
+  {
+    "source": "Volcengine TTS HTTP API",
+    "target": "Volcengine TTS HTTP API"
+  },
   {
     "source": "Amazon Bedrock Mantle",
     "target": "Amazon Bedrock Mantle"
````
````diff
@@ -1,20 +1,23 @@
 ---
-summary: "Volcano Engine setup (Doubao models, general + coding endpoints)"
+summary: "Volcano Engine setup (Doubao models, coding endpoints, and Seed Speech TTS)"
 title: "Volcengine (Doubao)"
 read_when:
   - You want to use Volcano Engine or Doubao models with OpenClaw
   - You need the Volcengine API key setup
+  - You want to use Volcengine Speech text-to-speech
 ---

 The Volcengine provider gives access to Doubao models and third-party models
 hosted on Volcano Engine, with separate endpoints for general and coding
-workloads.
+workloads. The same bundled plugin can also register Volcengine Speech as a TTS
+provider.

-| Detail | Value |
-| --------- | --------------------------------------------------- |
-| Providers | `volcengine` (general) + `volcengine-plan` (coding) |
-| Auth | `VOLCANO_ENGINE_API_KEY` |
-| API | OpenAI-compatible |
+| Detail | Value |
+| ---------- | ---------------------------------------------------------- |
+| Providers | `volcengine` (general + TTS) + `volcengine-plan` (coding) |
+| Model auth | `VOLCANO_ENGINE_API_KEY` |
+| TTS auth | `VOLCENGINE_TTS_API_KEY` or `BYTEPLUS_SEED_SPEECH_API_KEY` |
+| API | OpenAI-compatible models, BytePlus Seed Speech TTS |

 ## Getting started

````
````diff
@@ -95,6 +98,59 @@ Both providers are configured from a single API key. Setup registers both automa
   </Tab>
 </Tabs>

+## Text-to-speech
+
+Volcengine TTS uses the BytePlus Seed Speech HTTP API and is configured
+separately from the OpenAI-compatible Doubao model API key. In the BytePlus
+console, open Seed Speech > Settings > API Keys and copy the API key, then set:
+
+```bash
+export VOLCENGINE_TTS_API_KEY="byteplus_seed_speech_api_key"
+export VOLCENGINE_TTS_RESOURCE_ID="seed-tts-1.0"
+```
+
+Then enable it in `openclaw.json`:
+
+```json5
+{
+  messages: {
+    tts: {
+      auto: "always",
+      provider: "volcengine",
+      providers: {
+        volcengine: {
+          apiKey: "byteplus_seed_speech_api_key",
+          voice: "en_female_anna_mars_bigtts",
+          speedRatio: 1.0,
+        },
+      },
+    },
+  },
+}
+```
+
+For voice-note targets, OpenClaw asks Volcengine for provider-native
+`ogg_opus`. For normal audio attachments, it asks for `mp3`. Provider aliases
+`bytedance` and `doubao` also resolve to the same speech provider.
+
+The default resource id is `seed-tts-1.0` because that is what BytePlus grants
+to newly created Seed Speech API keys in the default project. If your project
+has TTS 2.0 entitlement, set `VOLCENGINE_TTS_RESOURCE_ID=seed-tts-2.0`.
+
+<Warning>
+`VOLCANO_ENGINE_API_KEY` is for the ModelArk/Doubao model endpoints and is not a
+Seed Speech API key. TTS needs a Seed Speech API key from the BytePlus Speech
+Console, or a legacy Speech Console AppID/token pair.
+</Warning>
+
+Legacy AppID/token auth remains supported for older Speech Console applications:
+
+```bash
+export VOLCENGINE_TTS_APPID="speech_app_id"
+export VOLCENGINE_TTS_TOKEN="speech_access_token"
+export VOLCENGINE_TTS_CLUSTER="volcano_tts"
+```
+
 ## Advanced configuration

 <AccordionGroup>
````
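The new Text-to-speech section describes three request-time choices: the output format follows the delivery target, `bytedance` and `doubao` are aliases for `volcengine`, and the resource id defaults to `seed-tts-1.0` unless `VOLCENGINE_TTS_RESOURCE_ID` overrides it. A minimal TypeScript sketch of that selection logic, under stated assumptions: `normalizeSpeechProviderId`, `pickOutputFormat`, `resolveResourceId`, and the `"voice-note" | "audio-file"` target type are hypothetical names, not OpenClaw's plugin API, and config-over-env precedence for the resource id is assumed by analogy with the documented `apiKey` order.

```typescript
// Hypothetical sketch of the request choices described above; these helper
// names are illustrative and are not OpenClaw's plugin API.
type SpeechTarget = "voice-note" | "audio-file";
type VolcengineFormat = "ogg_opus" | "mp3";

// Ids documented as resolving to the same Volcengine speech provider.
const VOLCENGINE_ALIASES = new Set(["volcengine", "bytedance", "doubao"]);

function normalizeSpeechProviderId(id: string): string {
  return VOLCENGINE_ALIASES.has(id) ? "volcengine" : id;
}

// Voice notes get provider-native Ogg/Opus; ordinary attachments get MP3.
function pickOutputFormat(target: SpeechTarget): VolcengineFormat {
  return target === "voice-note" ? "ogg_opus" : "mp3";
}

// Assumed precedence: explicit config, then env var, then the documented default.
// `||` treats empty strings as unset.
function resolveResourceId(
  configured: string | undefined,
  env: Record<string, string | undefined>,
): string {
  return configured || env.VOLCENGINE_TTS_RESOURCE_ID || "seed-tts-1.0";
}

console.log(normalizeSpeechProviderId("doubao")); // volcengine
console.log(pickOutputFormat("voice-note")); // ogg_opus
console.log(resolveResourceId(undefined, {})); // seed-tts-1.0
```

Keeping format choice keyed on the target, rather than on a user setting, matches the docs' statement that voice notes always request `ogg_opus` and attachments always request `mp3`.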
````diff
@@ -112,8 +168,10 @@ Both providers are configured from a single API key. Setup registers both automa
   </Accordion>

   <Accordion title="Environment variables for daemon processes">
-    If the Gateway runs as a daemon (launchd/systemd), make sure
-    `VOLCANO_ENGINE_API_KEY` is available to that process (for example, in
+    If the Gateway runs as a daemon (launchd/systemd), make sure model and TTS
+    env vars such as `VOLCANO_ENGINE_API_KEY`, `VOLCENGINE_TTS_API_KEY`,
+    `BYTEPLUS_SEED_SPEECH_API_KEY`, `VOLCENGINE_TTS_APPID`, and
+    `VOLCENGINE_TTS_TOKEN` are available to that process (for example, in
     `~/.openclaw/.env` or via `env.shellEnv`).
   </Accordion>
 </AccordionGroup>
````
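To confirm that a daemon's environment actually carries the variables listed in the accordion, a small check run under the same launchd/systemd environment can report what is missing. A minimal sketch, assuming a Node.js-compatible runtime; `checkVolcengineEnv` is a hypothetical helper, not an OpenClaw command, and it treats the model key as required, so drop that check if you only use TTS.

```typescript
// Hypothetical diagnostic, not an OpenClaw command: run it with the same
// environment as the Gateway daemon to see which Volcengine variables it lacks.
type Env = Record<string, string | undefined>;

const has = (env: Env, name: string): boolean => Boolean(env[name]?.trim());

function checkVolcengineEnv(env: Env): string[] {
  const problems: string[] = [];
  if (!has(env, "VOLCANO_ENGINE_API_KEY")) {
    problems.push("VOLCANO_ENGINE_API_KEY (Doubao model endpoints) is not set");
  }
  // TTS is satisfied by either a Seed Speech API key or the legacy AppID/token pair.
  const apiKeyAuth =
    has(env, "VOLCENGINE_TTS_API_KEY") || has(env, "BYTEPLUS_SEED_SPEECH_API_KEY");
  const legacyAuth = has(env, "VOLCENGINE_TTS_APPID") && has(env, "VOLCENGINE_TTS_TOKEN");
  if (!apiKeyAuth && !legacyAuth) {
    problems.push(
      "no Seed Speech credentials: set VOLCENGINE_TTS_API_KEY, BYTEPLUS_SEED_SPEECH_API_KEY, " +
        "or both VOLCENGINE_TTS_APPID and VOLCENGINE_TTS_TOKEN",
    );
  }
  return problems;
}

// Read process.env without depending on Node type definitions.
const runtimeEnv: Env =
  (globalThis as unknown as { process?: { env: Env } }).process?.env ?? {};
const found = checkVolcengineEnv(runtimeEnv);
console.log(found.length === 0 ? "ok" : found.join("\n"));
```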
````diff
@@ -7,7 +7,7 @@ read_when:
 title: "Text-to-speech"
 ---

-OpenClaw can convert outbound replies into audio using ElevenLabs, Google Gemini, Gradium, Inworld, Local CLI, Microsoft, MiniMax, OpenAI, Vydra, xAI, or Xiaomi MiMo.
+OpenClaw can convert outbound replies into audio using ElevenLabs, Google Gemini, Gradium, Inworld, Local CLI, Microsoft, MiniMax, OpenAI, Volcengine, Vydra, xAI, or Xiaomi MiMo.
 It works anywhere OpenClaw can send audio.

 ## Supported services
````
````diff
@@ -20,6 +20,7 @@ It works anywhere OpenClaw can send audio.
 - **Microsoft** (primary or fallback provider; current bundled implementation uses `node-edge-tts`)
 - **MiniMax** (primary or fallback provider; uses the T2A v2 API)
 - **OpenAI** (primary or fallback provider; also used for summaries)
+- **Volcengine** (primary or fallback provider; uses the BytePlus Seed Speech HTTP API)
 - **Vydra** (primary or fallback provider; shared image, video, and speech provider)
 - **xAI** (primary or fallback provider; uses the xAI TTS API)
 - **Xiaomi MiMo** (primary or fallback provider; uses MiMo TTS through Xiaomi chat completions)
````
````diff
@@ -39,7 +40,7 @@ or ElevenLabs.

 ## Optional keys

-If you want ElevenLabs, Google Gemini, Gradium, Inworld, MiniMax, OpenAI, Vydra, xAI, or Xiaomi MiMo:
+If you want ElevenLabs, Google Gemini, Gradium, Inworld, MiniMax, OpenAI, Volcengine, Vydra, xAI, or Xiaomi MiMo:

 - `ELEVENLABS_API_KEY` (or `XI_API_KEY`)
 - `GEMINI_API_KEY` (or `GOOGLE_API_KEY`)
````
````diff
@@ -49,6 +50,9 @@ If you want ElevenLabs, Google Gemini, Gradium, Inworld, MiniMax, OpenAI, Vydra,
   `MINIMAX_OAUTH_TOKEN`, `MINIMAX_CODE_PLAN_KEY`, or
   `MINIMAX_CODING_API_KEY`
 - `OPENAI_API_KEY`
+- `VOLCENGINE_TTS_API_KEY` (or `BYTEPLUS_SEED_SPEECH_API_KEY`);
+  legacy AppID/token auth also accepts `VOLCENGINE_TTS_APPID` and
+  `VOLCENGINE_TTS_TOKEN`
 - `VYDRA_API_KEY`
 - `XAI_API_KEY`
 - `XIAOMI_API_KEY`
````
````diff
@@ -68,6 +72,7 @@ so that provider must also be authenticated if you enable summaries.
 - [Gradium](/providers/gradium)
 - [Inworld TTS API](https://docs.inworld.ai/tts/tts)
 - [MiniMax T2A v2 API](https://platform.minimaxi.com/document/T2A%20V2)
+- [Volcengine TTS HTTP API](/providers/volcengine#text-to-speech)
 - [Xiaomi MiMo speech synthesis](/providers/xiaomi#text-to-speech)
 - [node-edge-tts](https://github.com/SchneeHertz/node-edge-tts)
 - [Microsoft Speech output formats](https://learn.microsoft.com/azure/ai-services/speech-service/rest-text-to-speech#audio-outputs)
````
````diff
@@ -249,6 +254,35 @@ encoding, so do not pass a raw bearer token and do not Base64-encode it
 yourself. The key falls back to the `INWORLD_API_KEY` env var. See
 [Inworld provider](/providers/inworld) for full setup.

+### Volcengine primary
+
+```json5
+{
+  messages: {
+    tts: {
+      auto: "always",
+      provider: "volcengine",
+      providers: {
+        volcengine: {
+          apiKey: "byteplus_seed_speech_api_key",
+          resourceId: "seed-tts-1.0",
+          voice: "en_female_anna_mars_bigtts",
+          speedRatio: 1.0,
+        },
+      },
+    },
+  },
+}
+```
+
+Volcengine TTS uses the BytePlus Seed Speech API key from the Speech Console,
+not the OpenAI-compatible `VOLCANO_ENGINE_API_KEY` used for Doubao model
+providers. Resolution order is `messages.tts.providers.volcengine.apiKey` ->
+`VOLCENGINE_TTS_API_KEY` -> `BYTEPLUS_SEED_SPEECH_API_KEY`. Legacy AppID/token
+auth still works through `messages.tts.providers.volcengine.appId` / `token` or
+`VOLCENGINE_TTS_APPID` / `VOLCENGINE_TTS_TOKEN`. Voice-note targets request
+provider-native `ogg_opus`; normal audio-file targets request `mp3`.
+
 ### xAI primary

 ```json5
````
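The Volcengine resolution order can be read as a small decision procedure: the first non-empty API key wins, and legacy AppID/token auth is the fallback when no API key is found. A minimal TypeScript sketch under those assumptions: `resolveVolcengineAuth` and the `VolcengineTtsConfig` shape are hypothetical, the rule that config `appId`/`token`/`cluster` take precedence over their env vars is assumed rather than documented, and the sketch does not show how OpenClaw builds request headers.

```typescript
// Hypothetical sketch of the documented key resolution order; not OpenClaw's code.
type Env = Record<string, string | undefined>;

interface VolcengineTtsConfig {
  apiKey?: string;
  appId?: string;
  token?: string;
  cluster?: string;
}

type VolcengineAuth =
  | { kind: "apiKey"; apiKey: string }
  | { kind: "legacy"; appId: string; token: string; cluster: string }
  | { kind: "none" };

// Return the first value that is set and not blank.
function firstSet(...values: Array<string | undefined>): string | undefined {
  return values.find((v) => v !== undefined && v.trim() !== "");
}

function resolveVolcengineAuth(cfg: VolcengineTtsConfig, env: Env): VolcengineAuth {
  // Documented order: config apiKey -> VOLCENGINE_TTS_API_KEY -> BYTEPLUS_SEED_SPEECH_API_KEY.
  const apiKey = firstSet(cfg.apiKey, env.VOLCENGINE_TTS_API_KEY, env.BYTEPLUS_SEED_SPEECH_API_KEY);
  if (apiKey) return { kind: "apiKey", apiKey };

  // Legacy fallback; config-before-env precedence here is an assumption.
  const appId = firstSet(cfg.appId, env.VOLCENGINE_TTS_APPID);
  const token = firstSet(cfg.token, env.VOLCENGINE_TTS_TOKEN);
  if (appId && token) {
    const cluster = firstSet(cfg.cluster, env.VOLCENGINE_TTS_CLUSTER) ?? "volcano_tts";
    return { kind: "legacy", appId, token, cluster };
  }
  return { kind: "none" };
}

// VOLCANO_ENGINE_API_KEY is never consulted: it authenticates Doubao models, not Seed Speech.
const auth = resolveVolcengineAuth({}, {
  VOLCANO_ENGINE_API_KEY: "model-key",
  BYTEPLUS_SEED_SPEECH_API_KEY: "speech-key",
});
console.log(auth.kind); // apiKey
```

Returning a tagged union keeps the "no credentials" case explicit, so a caller can fall back to another speech provider instead of sending an unauthenticated request.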
````diff
@@ -447,7 +481,7 @@ Then run:
 - `tagged` only sends audio when the reply includes `[[tts:key=value]]` directives or a `[[tts:text]]...[[/tts:text]]` block.
 - `enabled`: legacy toggle (doctor migrates this to `auto`).
 - `mode`: `"final"` (default) or `"all"` (includes tool/block replies).
-- `provider`: speech provider id such as `"elevenlabs"`, `"google"`, `"gradium"`, `"inworld"`, `"microsoft"`, `"minimax"`, `"openai"`, `"vydra"`, `"xai"`, or `"xiaomi"` (fallback is automatic).
+- `provider`: speech provider id such as `"elevenlabs"`, `"google"`, `"gradium"`, `"inworld"`, `"microsoft"`, `"minimax"`, `"openai"`, `"volcengine"`, `"vydra"`, `"xai"`, or `"xiaomi"` (fallback is automatic).
 - If `provider` is **unset**, OpenClaw uses the first configured speech provider in registry auto-select order.
 - Legacy `provider: "edge"` config is repaired by `openclaw doctor --fix` and
   rewritten to `provider: "microsoft"`.
````
````diff
@@ -461,7 +495,7 @@ Then run:
 - `maxTextLength`: hard cap for TTS input (chars). `/tts audio` fails if exceeded.
 - `timeoutMs`: request timeout (ms).
 - `prefsPath`: override the local prefs JSON path (provider/limit/summary).
-- `apiKey` values fall back to env vars (`ELEVENLABS_API_KEY`/`XI_API_KEY`, `GEMINI_API_KEY`/`GOOGLE_API_KEY`, `GRADIUM_API_KEY`, `INWORLD_API_KEY`, `MINIMAX_API_KEY`, `OPENAI_API_KEY`, `VYDRA_API_KEY`, `XAI_API_KEY`, `XIAOMI_API_KEY`).
+- `apiKey` values fall back to env vars (`ELEVENLABS_API_KEY`/`XI_API_KEY`, `GEMINI_API_KEY`/`GOOGLE_API_KEY`, `GRADIUM_API_KEY`, `INWORLD_API_KEY`, `MINIMAX_API_KEY`, `OPENAI_API_KEY`, `VYDRA_API_KEY`, `XAI_API_KEY`, `XIAOMI_API_KEY`). Volcengine's `apiKey` falls back to `VOLCENGINE_TTS_API_KEY`, then `BYTEPLUS_SEED_SPEECH_API_KEY`; legacy `appId`/`token` auth is also accepted.
 - `providers.elevenlabs.baseUrl`: override ElevenLabs API base URL.
 - `providers.openai.baseUrl`: override the OpenAI TTS endpoint.
 - Resolution order: `messages.tts.providers.openai.baseUrl` -> `OPENAI_TTS_BASE_URL` -> `https://api.openai.com/v1`
````
````diff
@@ -497,6 +531,21 @@ Then run:
 - If `messages.tts.providers.google.apiKey` is omitted, TTS can reuse `models.providers.google.apiKey` before env fallback.
 - `providers.gradium.baseUrl`: override Gradium API base URL (default `https://api.gradium.ai`).
 - `providers.gradium.voiceId`: Gradium voice identifier (default Emma, `YTpq7expH9539ERJ`).
+- `providers.volcengine.apiKey`: BytePlus Seed Speech API key (env:
+  `VOLCENGINE_TTS_API_KEY` or `BYTEPLUS_SEED_SPEECH_API_KEY`).
+- `providers.volcengine.resourceId`: BytePlus Seed Speech resource id (default
+  `seed-tts-1.0`, env: `VOLCENGINE_TTS_RESOURCE_ID`; use `seed-tts-2.0` when
+  your BytePlus project has TTS 2.0 entitlement).
+- `providers.volcengine.appKey`: BytePlus Seed Speech app key header (default
+  `aGjiRDfUWi`, env: `VOLCENGINE_TTS_APP_KEY`).
+- `providers.volcengine.baseUrl`: override the Seed Speech TTS HTTP endpoint
+  (env: `VOLCENGINE_TTS_BASE_URL`).
+- `providers.volcengine.appId`: legacy Volcengine Speech Console application id (env: `VOLCENGINE_TTS_APPID`).
+- `providers.volcengine.token`: legacy Volcengine Speech Console access token (env: `VOLCENGINE_TTS_TOKEN`).
+- `providers.volcengine.cluster`: legacy Volcengine TTS cluster (default `volcano_tts`, env: `VOLCENGINE_TTS_CLUSTER`).
+- `providers.volcengine.voice`: voice type (default `en_female_anna_mars_bigtts`, env: `VOLCENGINE_TTS_VOICE`).
+- `providers.volcengine.speedRatio`: provider-native speed ratio.
+- `providers.volcengine.emotion`: provider-native emotion tag.
 - `providers.xai.apiKey`: xAI TTS API key (env: `XAI_API_KEY`).
 - `providers.xai.baseUrl`: override the xAI TTS base URL (default `https://api.x.ai/v1`, env: `XAI_BASE_URL`).
 - `providers.xai.voiceId`: xAI voice id (default `eve`; current live voices: `ara`, `eve`, `leo`, `rex`, `sal`, `una`).
````
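Most `providers.volcengine.*` fields in the reference pair a config value with an env var and a default, which suggests a layered lookup. A minimal sketch, assuming config beats env and env beats the documented default (the docs state that order only for `apiKey`); `resolveVolcengineSettings` is a hypothetical name. `baseUrl` has no documented default, so it stays `undefined` unless set and the provider's built-in endpoint applies.

```typescript
// Hypothetical layered lookup for the documented defaults; not OpenClaw's code.
type Env = Record<string, string | undefined>;

interface VolcengineSettings {
  resourceId: string;
  appKey: string;
  baseUrl?: string; // no documented default
  cluster: string; // only used by legacy AppID/token auth
  voice: string;
}

// Pick the first non-empty value; `||` treats "" as unset (assumption).
const pick = (cfg: string | undefined, envValue: string | undefined, fallback: string): string =>
  cfg || envValue || fallback;

function resolveVolcengineSettings(cfg: Partial<VolcengineSettings>, env: Env): VolcengineSettings {
  return {
    resourceId: pick(cfg.resourceId, env.VOLCENGINE_TTS_RESOURCE_ID, "seed-tts-1.0"),
    appKey: pick(cfg.appKey, env.VOLCENGINE_TTS_APP_KEY, "aGjiRDfUWi"),
    baseUrl: cfg.baseUrl || env.VOLCENGINE_TTS_BASE_URL || undefined,
    cluster: pick(cfg.cluster, env.VOLCENGINE_TTS_CLUSTER, "volcano_tts"),
    voice: pick(cfg.voice, env.VOLCENGINE_TTS_VOICE, "en_female_anna_mars_bigtts"),
  };
}

// A project with TTS 2.0 entitlement overrides only the resource id.
const settings = resolveVolcengineSettings({}, { VOLCENGINE_TTS_RESOURCE_ID: "seed-tts-2.0" });
console.log(settings.resourceId, settings.voice); // seed-tts-2.0 en_female_anna_mars_bigtts
```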
````diff
@@ -550,12 +599,13 @@ Here you go.

 Available directive keys (when enabled):

-- `provider` (registered speech provider id, for example `openai`, `elevenlabs`, `google`, `gradium`, `minimax`, `microsoft`, `vydra`, `xai`, or `xiaomi`; requires `allowProvider: true`)
-- `voice` (OpenAI, Gradium, or Xiaomi voice), `voiceName` / `voice_name` / `google_voice` (Google voice), or `voiceId` (ElevenLabs / Gradium / MiniMax / xAI)
+- `provider` (registered speech provider id, for example `openai`, `elevenlabs`, `google`, `gradium`, `minimax`, `microsoft`, `volcengine`, `vydra`, `xai`, or `xiaomi`; requires `allowProvider: true`)
+- `voice` (OpenAI, Gradium, Volcengine, or Xiaomi voice), `voiceName` / `voice_name` / `google_voice` (Google voice), or `voiceId` (ElevenLabs / Gradium / MiniMax / xAI)
 - `model` (OpenAI TTS model, ElevenLabs model id, MiniMax model, or Xiaomi MiMo TTS model) or `google_model` (Google TTS model)
 - `stability`, `similarityBoost`, `style`, `speed`, `useSpeakerBoost`
 - `vol` / `volume` (MiniMax volume, 0-10)
 - `pitch` (MiniMax integer pitch, -12 to 12; fractional values are truncated before the MiniMax request)
+- `emotion` (Volcengine emotion tag)
 - `applyTextNormalization` (`auto|on|off`)
 - `languageCode` (ISO 639-1)
 - `seed`
````
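Directive keys arrive inline in the reply as `[[tts:key=value]]` tags. The sketch below extracts the keys relevant to Volcengine (`provider`, `voice`, `emotion`) and drops `provider` unless `allowProvider` is true, mirroring the rule in the directive list. It is a hypothetical illustration: `parseVolcengineDirectives`, the one-key-per-tag regex, and the last-tag-wins rule are assumptions rather than OpenClaw's actual parser, and the `[[tts:text]]...[[/tts:text]]` block form is not handled.

```typescript
// Hypothetical sketch, not OpenClaw's parser: one key=value per tag is assumed,
// and the [[tts:text]]...[[/tts:text]] block form is not handled.
interface VolcengineDirectives {
  provider?: string;
  voice?: string;
  emotion?: string;
}

const DIRECTIVE = /\[\[tts:([A-Za-z_]+)=([^\]]*)\]\]/g;

function parseVolcengineDirectives(
  reply: string,
  allowProvider: boolean,
): { text: string; directives: VolcengineDirectives } {
  const directives: VolcengineDirectives = {};
  for (const match of reply.matchAll(DIRECTIVE)) {
    const key = match[1];
    const value = (match[2] ?? "").trim();
    if (key === "voice" || key === "emotion") {
      directives[key] = value; // a later tag overwrites an earlier one
    } else if (key === "provider" && allowProvider) {
      directives.provider = value; // honored only when allowProvider is true
    }
    // Keys for other providers (voiceId, pitch, vol, ...) are ignored here.
  }
  // Remove the tags so they are not read aloud, then tidy leftover spaces.
  const text = reply.replace(DIRECTIVE, "").replace(/[ \t]{2,}/g, " ").trim();
  return { text, directives };
}

const parsed = parseVolcengineDirectives(
  "Here you go. [[tts:voice=en_female_anna_mars_bigtts]] [[tts:provider=volcengine]]",
  false,
);
console.log(parsed.text); // Here you go.
console.log(parsed.directives.voice); // en_female_anna_mars_bigtts
console.log(parsed.directives.provider); // undefined
```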