mirror of
https://github.com/openclaw/openclaw.git
synced 2026-05-06 07:00:43 +00:00
docs(voice-call): rewrite around Steps Tabs and provider Tabs
The voice-call plugin doc was 664 lines with a flat install/setup walkthrough, three flat 'Realtime' / 'Streaming' / 'TTS' provider config blocks each shown twice, an italicised webhook-security section in Title Case, and a duplicate-Voice Call body H1.

Restructure for scan-first reading without losing operational detail:

- Wrap Quick start in a Steps component (install -> configure -> verify -> smoke), with the 'install from npm' vs 'install from local folder' choice as a nested Tabs.
- Surface the public-webhook-URL constraint as a Warning at the top of Quick start so readers see it before they hit setup.
- Move provider exposure caveats, streaming connection caps, and legacy config migration notes into a single AccordionGroup so the Configuration section reads as the canonical config plus collapsible operational details.
- Convert the Realtime, Streaming, and TTS provider examples to Tabs with one tab per provider (Google/OpenAI for realtime; OpenAI/xAI for streaming; Core/ElevenLabs/OpenAI override for TTS), removing the previous duplicate-block-per-provider pattern.
- Convert the realtime tool-policy bullet list to a 3-row table.
- Convert the agent tool action list and gateway RPC list into small tables (action -> args).
- Surface inboundPolicy caller-ID weakness, microsoft-not-supported for telephony, and realtime+streaming exclusivity as Warning callouts where they were previously buried inline.
- Sentence-case 'Webhook security' (was Title Case), drop the duplicate body H1, and refresh the Related list to alphabetical sentence-case.

Provider names, env vars, defaults, models, voice ids, command flags, and field semantics are unchanged. Pure restructure plus Mintlify component upgrades.
@@ -1,63 +1,95 @@
|
||||
---
|
||||
summary: "Voice Call plugin: outbound + inbound calls via Twilio/Telnyx/Plivo (plugin install + config + CLI)"
|
||||
summary: "Place outbound and accept inbound voice calls via Twilio, Telnyx, or Plivo, with optional realtime voice and streaming transcription"
|
||||
read_when:
|
||||
- You want to place an outbound voice call from OpenClaw
|
||||
- You are configuring or developing the voice-call plugin
|
||||
- You need realtime voice or streaming transcription on telephony
|
||||
title: "Voice call plugin"
|
||||
sidebarTitle: "Voice call"
|
||||
---
|
||||
|
||||
Voice calls for OpenClaw via a plugin. Supports outbound notifications and
|
||||
multi-turn conversations with inbound policies.
|
||||
Voice calls for OpenClaw via a plugin. Supports outbound notifications,
|
||||
multi-turn conversations, full-duplex realtime voice, streaming
|
||||
transcription, and inbound calls with allowlist policies.
|
||||
|
||||
Current providers:
|
||||
**Current providers:** `twilio` (Programmable Voice + Media Streams),
|
||||
`telnyx` (Call Control v2), `plivo` (Voice API + XML transfer + GetInput
|
||||
speech), `mock` (dev/no network).
|
||||
|
||||
- `twilio` (Programmable Voice + Media Streams)
|
||||
- `telnyx` (Call Control v2)
|
||||
- `plivo` (Voice API + XML transfer + GetInput speech)
|
||||
- `mock` (dev/no network)
|
||||
<Note>
|
||||
The Voice Call plugin runs **inside the Gateway process**. If you use a
|
||||
remote Gateway, install and configure the plugin on the machine running
|
||||
the Gateway, then restart the Gateway to load it.
|
||||
</Note>
|
||||
|
||||
Quick mental model:
|
||||
## Quick start
|
||||
|
||||
- Install plugin
|
||||
- Restart Gateway
|
||||
- Configure under `plugins.entries.voice-call.config`
|
||||
- Use `openclaw voicecall ...` or the `voice_call` tool
|
||||
<Steps>
|
||||
<Step title="Install the plugin">
|
||||
<Tabs>
|
||||
<Tab title="From npm (recommended)">
|
||||
```bash
|
||||
openclaw plugins install @openclaw/voice-call
|
||||
```
|
||||
</Tab>
|
||||
<Tab title="From a local folder (dev)">
|
||||
```bash
|
||||
PLUGIN_SRC=./path/to/local/voice-call-plugin
|
||||
openclaw plugins install "$PLUGIN_SRC"
|
||||
cd "$PLUGIN_SRC" && pnpm install
|
||||
```
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
## Where it runs (local vs remote)
|
||||
Restart the Gateway afterwards so the plugin loads.
|
||||
|
||||
The Voice Call plugin runs **inside the Gateway process**.
|
||||
</Step>
|
||||
<Step title="Configure provider and webhook">
|
||||
Set config under `plugins.entries.voice-call.config` (see
|
||||
[Configuration](#configuration) below for the full shape). At minimum:
|
||||
`provider`, provider credentials, `fromNumber`, and a publicly
|
||||
reachable webhook URL.
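
As a rough sketch of that minimum, the fragment below uses Twilio. The phone number, credentials, and hostname are placeholders, and `publicUrl` is only one of the public exposure options, so treat the values as assumptions rather than a verified setup.

```json5
{
  plugins: {
    entries: {
      "voice-call": {
        config: {
          provider: "twilio",
          // Placeholder caller number in E.164 form.
          fromNumber: "+15550001234",
          twilio: {
            accountSid: "ACxxxxxxxx",
            authToken: "...",
          },
          // Must resolve to a publicly reachable URL, not loopback or private space.
          publicUrl: "https://voice.example.com/voice/webhook",
        },
      },
    },
  },
}
```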
|
||||
</Step>
|
||||
<Step title="Verify setup">
|
||||
```bash
|
||||
openclaw voicecall setup
|
||||
```
|
||||
|
||||
If you use a remote Gateway, install/configure the plugin on the **machine running the Gateway**, then restart the Gateway to load it.
|
||||
The default output is readable in chat logs and terminals. It checks
|
||||
plugin enablement, provider credentials, webhook exposure, and that
|
||||
only one audio mode (`streaming` or `realtime`) is active. Use
|
||||
`--json` for scripts.
|
||||
|
||||
## Install
|
||||
</Step>
|
||||
<Step title="Smoke test">
|
||||
```bash
|
||||
openclaw voicecall smoke
|
||||
openclaw voicecall smoke --to "+15555550123"
|
||||
```
|
||||
|
||||
### Option A: install from npm (recommended)
|
||||
Both are dry runs by default. Add `--yes` to actually place a short
|
||||
outbound notify call:
|
||||
|
||||
```bash
|
||||
openclaw plugins install @openclaw/voice-call
|
||||
```
|
||||
```bash
|
||||
openclaw voicecall smoke --to "+15555550123" --yes
|
||||
```
|
||||
|
||||
Restart the Gateway afterwards.
|
||||
</Step>
|
||||
</Steps>
|
||||
|
||||
### Option B: install from a local folder (dev, no copying)
|
||||
<Warning>
|
||||
For Twilio, Telnyx, and Plivo, setup must resolve to a **public webhook URL**.
|
||||
If `publicUrl`, the tunnel URL, the Tailscale URL, or the serve fallback
|
||||
resolves to loopback or private network space, setup fails instead of
|
||||
starting a provider that cannot receive carrier webhooks.
|
||||
</Warning>
|
||||
|
||||
```bash
|
||||
PLUGIN_SRC=./path/to/local/voice-call-plugin
|
||||
openclaw plugins install "$PLUGIN_SRC"
|
||||
cd "$PLUGIN_SRC" && pnpm install
|
||||
```
|
||||
## Configuration
|
||||
|
||||
Restart the Gateway afterwards.
|
||||
|
||||
## Config
|
||||
|
||||
Set config under `plugins.entries.voice-call.config`:
|
||||
|
||||
If `enabled` is true but the selected provider is missing credentials, Gateway
|
||||
startup logs a setup-incomplete warning with the missing keys and skips starting
|
||||
the runtime. Run `openclaw voicecall setup` to see the same readiness details.
|
||||
Commands, RPC calls, and agent tools still return the exact missing provider
|
||||
configuration when used.
|
||||
If `enabled: true` but the selected provider is missing credentials,
|
||||
Gateway startup logs a setup-incomplete warning with the missing keys and
|
||||
skips starting the runtime. Commands, RPC calls, and agent tools still
|
||||
return the exact missing provider configuration when used.
|
||||
|
||||
```json5
|
||||
{
|
||||
@@ -74,15 +106,13 @@ configuration when used.
|
||||
accountSid: "ACxxxxxxxx",
|
||||
authToken: "...",
|
||||
},
|
||||
|
||||
telnyx: {
|
||||
apiKey: "...",
|
||||
connectionId: "...",
|
||||
// Telnyx webhook public key from the Telnyx Mission Control Portal
|
||||
// (Base64 string; can also be set via TELNYX_PUBLIC_KEY).
|
||||
// Telnyx webhook public key from the Mission Control Portal
|
||||
// (Base64; can also be set via TELNYX_PUBLIC_KEY).
|
||||
publicKey: "...",
|
||||
},
|
||||
|
||||
plivo: {
|
||||
authId: "MAxxxxxxxxxxxxxxxxxxxx",
|
||||
authToken: "...",
|
||||
@@ -103,41 +133,14 @@ configuration when used.
|
||||
// Public exposure (pick one)
|
||||
// publicUrl: "https://example.ngrok.app/voice/webhook",
|
||||
// tunnel: { provider: "ngrok" },
|
||||
// tailscale: { mode: "funnel", path: "/voice/webhook" }
|
||||
// tailscale: { mode: "funnel", path: "/voice/webhook" },
|
||||
|
||||
outbound: {
|
||||
defaultMode: "notify", // notify | conversation
|
||||
},
|
||||
|
||||
streaming: {
|
||||
enabled: true,
|
||||
provider: "openai", // optional; first registered realtime transcription provider when unset
|
||||
streamPath: "/voice/stream",
|
||||
providers: {
|
||||
openai: {
|
||||
apiKey: "sk-...", // optional if OPENAI_API_KEY is set
|
||||
model: "gpt-4o-transcribe",
|
||||
silenceDurationMs: 800,
|
||||
vadThreshold: 0.5,
|
||||
},
|
||||
},
|
||||
preStartTimeoutMs: 5000,
|
||||
maxPendingConnections: 32,
|
||||
maxPendingConnectionsPerIp: 4,
|
||||
maxConnections: 128,
|
||||
},
|
||||
|
||||
realtime: {
|
||||
enabled: false,
|
||||
provider: "google", // optional; first registered realtime voice provider when unset
|
||||
toolPolicy: "safe-read-only",
|
||||
providers: {
|
||||
google: {
|
||||
model: "gemini-2.5-flash-native-audio-preview-12-2025",
|
||||
voice: "Kore",
|
||||
},
|
||||
},
|
||||
},
|
||||
streaming: { enabled: true /* see Streaming transcription */ },
|
||||
realtime: { enabled: false /* see Realtime voice */ },
|
||||
},
|
||||
},
|
||||
},
|
||||
@@ -145,152 +148,135 @@ configuration when used.
|
||||
}
|
||||
```
|
||||
|
||||
Check setup before testing with a real provider:
|
||||
<AccordionGroup>
|
||||
<Accordion title="Provider exposure and security notes">
|
||||
- Twilio, Telnyx, and Plivo all require a **publicly reachable** webhook URL.
|
||||
- `mock` is a local dev provider (no network calls).
|
||||
- Telnyx requires `telnyx.publicKey` (or `TELNYX_PUBLIC_KEY`) unless `skipSignatureVerification` is true.
|
||||
- `skipSignatureVerification` is for local testing only.
|
||||
- On ngrok free tier, set `publicUrl` to the exact ngrok URL; signature verification is always enforced.
|
||||
- `tunnel.allowNgrokFreeTierLoopbackBypass: true` allows Twilio webhooks with invalid signatures **only** when `tunnel.provider="ngrok"` and `serve.bind` is loopback (ngrok local agent). Local dev only.
|
||||
- Ngrok free-tier URLs can change or add interstitial behavior; if `publicUrl` drifts, Twilio signatures fail. Production: prefer a stable domain or a Tailscale funnel.
|
||||
</Accordion>
|
||||
<Accordion title="Streaming connection caps">
|
||||
- `streaming.preStartTimeoutMs` closes sockets that never send a valid `start` frame.
|
||||
- `streaming.maxPendingConnections` caps total unauthenticated pre-start sockets.
|
||||
- `streaming.maxPendingConnectionsPerIp` caps unauthenticated pre-start sockets per source IP.
|
||||
- `streaming.maxConnections` caps total open media stream sockets (pending + active).
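
A sketch of how these caps might sit together under `streaming`. The numbers are illustrative starting values, not tuning advice; adjust them to your expected call volume.

```json5
{
  plugins: {
    entries: {
      "voice-call": {
        config: {
          streaming: {
            enabled: true,
            // Close sockets that never send a valid `start` frame within 5 s.
            preStartTimeoutMs: 5000,
            // Unauthenticated pre-start sockets: total and per source IP.
            maxPendingConnections: 32,
            maxPendingConnectionsPerIp: 4,
            // All open media stream sockets (pending + active).
            maxConnections: 128,
          },
        },
      },
    },
  },
}
```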
|
||||
</Accordion>
|
||||
<Accordion title="Legacy config migrations">
|
||||
Older configs using `provider: "log"`, `twilio.from`, or legacy
|
||||
`streaming.*` OpenAI keys are rewritten by `openclaw doctor --fix`.
|
||||
Runtime fallback still accepts the old voice-call keys for now, but
|
||||
the rewrite path is `openclaw doctor --fix` and the compat shim is
|
||||
temporary.
|
||||
|
||||
```bash
|
||||
openclaw voicecall setup
|
||||
```
|
||||
Auto-migrated streaming keys:
|
||||
|
||||
The default output is readable in chat logs and terminal sessions. It checks
|
||||
whether the plugin is enabled, the provider and credentials are present, webhook
|
||||
exposure is configured, and only one audio mode is active. Use
|
||||
`openclaw voicecall setup --json` for scripts.
|
||||
- `streaming.sttProvider` → `streaming.provider`
|
||||
- `streaming.openaiApiKey` → `streaming.providers.openai.apiKey`
|
||||
- `streaming.sttModel` → `streaming.providers.openai.model`
|
||||
- `streaming.silenceDurationMs` → `streaming.providers.openai.silenceDurationMs`
|
||||
- `streaming.vadThreshold` → `streaming.providers.openai.vadThreshold`
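
Read in the other direction, a migrated config could look like the sketch below, with each comment naming the legacy key the value came from. The API key is a placeholder and the other values are illustrative assumptions.

```json5
{
  plugins: {
    entries: {
      "voice-call": {
        config: {
          streaming: {
            provider: "openai", // was streaming.sttProvider
            providers: {
              openai: {
                apiKey: "sk-...", // was streaming.openaiApiKey
                model: "gpt-4o-transcribe", // was streaming.sttModel
                silenceDurationMs: 800, // was streaming.silenceDurationMs
                vadThreshold: 0.5, // was streaming.vadThreshold
              },
            },
          },
        },
      },
    },
  },
}
```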
|
||||
|
||||
For Twilio, Telnyx, and Plivo, setup must resolve to a public webhook URL. If the
|
||||
configured `publicUrl`, tunnel URL, Tailscale URL, or serve fallback resolves to
|
||||
loopback or private network space, setup fails instead of starting a provider
|
||||
that cannot receive real carrier webhooks.
|
||||
|
||||
For a no-surprises smoke test, run:
|
||||
|
||||
```bash
|
||||
openclaw voicecall smoke
|
||||
openclaw voicecall smoke --to "+15555550123"
|
||||
```
|
||||
|
||||
The second command is still a dry run. Add `--yes` to place a short outbound
|
||||
notify call:
|
||||
|
||||
```bash
|
||||
openclaw voicecall smoke --to "+15555550123" --yes
|
||||
```
|
||||
|
||||
Notes:
|
||||
|
||||
- Twilio/Telnyx require a **publicly reachable** webhook URL.
|
||||
- Plivo requires a **publicly reachable** webhook URL.
|
||||
- `mock` is a local dev provider (no network calls).
|
||||
- If older configs still use `provider: "log"`, `twilio.from`, or legacy `streaming.*` OpenAI keys, run `openclaw doctor --fix` to rewrite them.
|
||||
- Telnyx requires `telnyx.publicKey` (or `TELNYX_PUBLIC_KEY`) unless `skipSignatureVerification` is true.
|
||||
- `skipSignatureVerification` is for local testing only.
|
||||
- If you use ngrok free tier, set `publicUrl` to the exact ngrok URL; signature verification is always enforced.
|
||||
- `tunnel.allowNgrokFreeTierLoopbackBypass: true` allows Twilio webhooks with invalid signatures **only** when `tunnel.provider="ngrok"` and `serve.bind` is loopback (ngrok local agent). Use for local dev only.
|
||||
- Ngrok free tier URLs can change or add interstitial behavior; if `publicUrl` drifts, Twilio signatures will fail. For production, prefer a stable domain or Tailscale funnel.
|
||||
- `realtime.enabled` starts full voice-to-voice conversations; do not enable it together with `streaming.enabled`.
|
||||
- Streaming security defaults:
|
||||
- `streaming.preStartTimeoutMs` closes sockets that never send a valid `start` frame.
|
||||
- `streaming.maxPendingConnections` caps total unauthenticated pre-start sockets.
|
||||
- `streaming.maxPendingConnectionsPerIp` caps unauthenticated pre-start sockets per source IP.
|
||||
- `streaming.maxConnections` caps total open media stream sockets (pending + active).
|
||||
- Runtime fallback still accepts those old voice-call keys for now, but the rewrite path is `openclaw doctor --fix` and the compat shim is temporary.
|
||||
</Accordion>
|
||||
</AccordionGroup>
|
||||
|
||||
## Realtime voice conversations
|
||||
|
||||
`realtime` selects a full duplex realtime voice provider for live call audio.
|
||||
It is separate from `streaming`, which only forwards audio to realtime
|
||||
transcription providers.
|
||||
`realtime` selects a full-duplex realtime voice provider for live call
|
||||
audio. It is separate from `streaming`, which only forwards audio to
|
||||
realtime transcription providers.
|
||||
|
||||
Current runtime behavior:
|
||||
<Warning>
|
||||
`realtime.enabled` cannot be combined with `streaming.enabled`. Pick one
|
||||
audio mode per call.
|
||||
</Warning>
|
||||
|
||||
Current runtime behavior:
|
||||
|
||||
- `realtime.enabled` is supported for Twilio Media Streams.
|
||||
- `realtime.enabled` cannot be combined with `streaming.enabled`.
|
||||
- `realtime.provider` is optional. If unset, Voice Call uses the first
|
||||
registered realtime voice provider.
|
||||
- Bundled realtime voice providers include Google Gemini Live (`google`) and
|
||||
OpenAI (`openai`), registered by their provider plugins.
|
||||
- `realtime.provider` is optional. If unset, Voice Call uses the first registered realtime voice provider.
|
||||
- Bundled realtime voice providers: Google Gemini Live (`google`) and OpenAI (`openai`), registered by their provider plugins.
|
||||
- Provider-owned raw config lives under `realtime.providers.<providerId>`.
|
||||
- Voice Call exposes the shared `openclaw_agent_consult` realtime tool by
|
||||
default. The realtime model can call it when the caller asks for deeper
|
||||
reasoning, current information, or normal OpenClaw tools.
|
||||
- `realtime.toolPolicy` controls the consult run:
|
||||
- `safe-read-only`: expose the consult tool and limit the regular agent to
|
||||
`read`, `web_search`, `web_fetch`, `x_search`, `memory_search`, and
|
||||
`memory_get`.
|
||||
- `owner`: expose the consult tool and let the regular agent use the normal
|
||||
agent tool policy.
|
||||
- `none`: do not expose the consult tool. Custom `realtime.tools` are still
|
||||
passed through to the realtime provider.
|
||||
- Consult session keys reuse the existing voice session when available, then
|
||||
fall back to the caller/callee phone number so follow-up consult calls keep
|
||||
context during the call.
|
||||
- If `realtime.provider` points at an unregistered provider, or no realtime
|
||||
voice provider is registered at all, Voice Call logs a warning and skips
|
||||
realtime media instead of failing the whole plugin.
|
||||
- Voice Call exposes the shared `openclaw_agent_consult` realtime tool by default. The realtime model can call it when the caller asks for deeper reasoning, current information, or normal OpenClaw tools.
|
||||
- If `realtime.provider` points at an unregistered provider, or no realtime voice provider is registered at all, Voice Call logs a warning and skips realtime media instead of failing the whole plugin.
|
||||
- Consult session keys reuse the existing voice session when available, then fall back to the caller/callee phone number so follow-up consult calls keep context during the call.
|
||||
|
||||
Google Gemini Live realtime defaults:
|
||||
### Tool policy
|
||||
|
||||
- API key: `realtime.providers.google.apiKey`, `GEMINI_API_KEY`, or
|
||||
`GOOGLE_GENERATIVE_AI_API_KEY`
|
||||
- model: `gemini-2.5-flash-native-audio-preview-12-2025`
|
||||
- voice: `Kore`
|
||||
`realtime.toolPolicy` controls the consult run:
|
||||
|
||||
Example:
|
||||
| Policy | Behavior |
|
||||
| ---------------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| `safe-read-only` | Expose the consult tool and limit the regular agent to `read`, `web_search`, `web_fetch`, `x_search`, `memory_search`, and `memory_get`. |
|
||||
| `owner` | Expose the consult tool and let the regular agent use the normal agent tool policy. |
|
||||
| `none` | Do not expose the consult tool. Custom `realtime.tools` are still passed through to the realtime provider. |
|
||||
|
||||
```json5
|
||||
{
|
||||
plugins: {
|
||||
entries: {
|
||||
"voice-call": {
|
||||
config: {
|
||||
provider: "twilio",
|
||||
inboundPolicy: "allowlist",
|
||||
allowFrom: ["+15550005678"],
|
||||
realtime: {
|
||||
enabled: true,
|
||||
provider: "google",
|
||||
instructions: "Speak briefly. Call openclaw_agent_consult before using deeper tools.",
|
||||
toolPolicy: "safe-read-only",
|
||||
providers: {
|
||||
google: {
|
||||
apiKey: "${GEMINI_API_KEY}",
|
||||
model: "gemini-2.5-flash-native-audio-preview-12-2025",
|
||||
voice: "Kore",
|
||||
### Realtime provider examples
|
||||
|
||||
<Tabs>
|
||||
<Tab title="Google Gemini Live">
|
||||
Defaults: API key from `realtime.providers.google.apiKey`,
|
||||
`GEMINI_API_KEY`, or `GOOGLE_GENERATIVE_AI_API_KEY`; model
|
||||
`gemini-2.5-flash-native-audio-preview-12-2025`; voice `Kore`.
|
||||
|
||||
```json5
|
||||
{
|
||||
plugins: {
|
||||
entries: {
|
||||
"voice-call": {
|
||||
config: {
|
||||
provider: "twilio",
|
||||
inboundPolicy: "allowlist",
|
||||
allowFrom: ["+15550005678"],
|
||||
realtime: {
|
||||
enabled: true,
|
||||
provider: "google",
|
||||
instructions: "Speak briefly. Call openclaw_agent_consult before using deeper tools.",
|
||||
toolPolicy: "safe-read-only",
|
||||
providers: {
|
||||
google: {
|
||||
apiKey: "${GEMINI_API_KEY}",
|
||||
model: "gemini-2.5-flash-native-audio-preview-12-2025",
|
||||
voice: "Kore",
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
}
|
||||
```
|
||||
}
|
||||
```
|
||||
|
||||
Use OpenAI instead:
|
||||
|
||||
```json5
|
||||
{
|
||||
plugins: {
|
||||
entries: {
|
||||
"voice-call": {
|
||||
config: {
|
||||
realtime: {
|
||||
enabled: true,
|
||||
provider: "openai",
|
||||
providers: {
|
||||
openai: {
|
||||
apiKey: "${OPENAI_API_KEY}",
|
||||
</Tab>
|
||||
<Tab title="OpenAI">
|
||||
```json5
|
||||
{
|
||||
plugins: {
|
||||
entries: {
|
||||
"voice-call": {
|
||||
config: {
|
||||
realtime: {
|
||||
enabled: true,
|
||||
provider: "openai",
|
||||
providers: {
|
||||
openai: { apiKey: "${OPENAI_API_KEY}" },
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
}
|
||||
```
|
||||
}
|
||||
```
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
See [Google provider](/providers/google) and [OpenAI provider](/providers/openai)
|
||||
for provider-specific realtime voice options.
|
||||
See [Google provider](/providers/google) and
|
||||
[OpenAI provider](/providers/openai) for provider-specific realtime voice
|
||||
options.
|
||||
|
||||
## Streaming transcription
|
||||
|
||||
@@ -298,173 +284,84 @@ for provider-specific realtime voice options.
|
||||
|
||||
Current runtime behavior:
|
||||
|
||||
- `streaming.provider` is optional. If unset, Voice Call uses the first
|
||||
registered realtime transcription provider.
|
||||
- Bundled realtime transcription providers include Deepgram (`deepgram`),
|
||||
ElevenLabs (`elevenlabs`), Mistral (`mistral`), OpenAI (`openai`), and xAI
|
||||
(`xai`), registered by their provider plugins.
|
||||
- `streaming.provider` is optional. If unset, Voice Call uses the first registered realtime transcription provider.
|
||||
- Bundled realtime transcription providers: Deepgram (`deepgram`), ElevenLabs (`elevenlabs`), Mistral (`mistral`), OpenAI (`openai`), and xAI (`xai`), registered by their provider plugins.
|
||||
- Provider-owned raw config lives under `streaming.providers.<providerId>`.
|
||||
- If `streaming.provider` points at an unregistered provider, or no realtime
|
||||
transcription provider is registered at all, Voice Call logs a warning and
|
||||
skips media streaming instead of failing the whole plugin.
|
||||
- If `streaming.provider` points at an unregistered provider, or none is registered, Voice Call logs a warning and skips media streaming instead of failing the whole plugin.
|
||||
|
||||
OpenAI streaming transcription defaults:
|
||||
### Streaming provider examples
|
||||
|
||||
- API key: `streaming.providers.openai.apiKey` or `OPENAI_API_KEY`
|
||||
- model: `gpt-4o-transcribe`
|
||||
- `silenceDurationMs`: `800`
|
||||
- `vadThreshold`: `0.5`
|
||||
<Tabs>
|
||||
<Tab title="OpenAI">
|
||||
Defaults: API key `streaming.providers.openai.apiKey` or
|
||||
`OPENAI_API_KEY`; model `gpt-4o-transcribe`; `silenceDurationMs: 800`;
|
||||
`vadThreshold: 0.5`.
|
||||
|
||||
xAI streaming transcription defaults:
|
||||
|
||||
- API key: `streaming.providers.xai.apiKey` or `XAI_API_KEY`
|
||||
- endpoint: `wss://api.x.ai/v1/stt`
|
||||
- `encoding`: `mulaw`
|
||||
- `sampleRate`: `8000`
|
||||
- `endpointingMs`: `800`
|
||||
- `interimResults`: `true`
|
||||
|
||||
Example:
|
||||
|
||||
```json5
|
||||
{
|
||||
plugins: {
|
||||
entries: {
|
||||
"voice-call": {
|
||||
config: {
|
||||
streaming: {
|
||||
enabled: true,
|
||||
provider: "openai",
|
||||
streamPath: "/voice/stream",
|
||||
providers: {
|
||||
openai: {
|
||||
apiKey: "sk-...", // optional if OPENAI_API_KEY is set
|
||||
model: "gpt-4o-transcribe",
|
||||
silenceDurationMs: 800,
|
||||
vadThreshold: 0.5,
|
||||
```json5
|
||||
{
|
||||
plugins: {
|
||||
entries: {
|
||||
"voice-call": {
|
||||
config: {
|
||||
streaming: {
|
||||
enabled: true,
|
||||
provider: "openai",
|
||||
streamPath: "/voice/stream",
|
||||
providers: {
|
||||
openai: {
|
||||
apiKey: "sk-...", // optional if OPENAI_API_KEY is set
|
||||
model: "gpt-4o-transcribe",
|
||||
silenceDurationMs: 800,
|
||||
vadThreshold: 0.5,
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
}
|
||||
```
|
||||
}
|
||||
```
|
||||
|
||||
Use xAI instead:
|
||||
</Tab>
|
||||
<Tab title="xAI">
|
||||
Defaults: API key `streaming.providers.xai.apiKey` or `XAI_API_KEY`;
|
||||
endpoint `wss://api.x.ai/v1/stt`; encoding `mulaw`; sample rate `8000`;
|
||||
`endpointingMs: 800`; `interimResults: true`.
|
||||
|
||||
```json5
|
||||
{
|
||||
plugins: {
|
||||
entries: {
|
||||
"voice-call": {
|
||||
config: {
|
||||
streaming: {
|
||||
enabled: true,
|
||||
provider: "xai",
|
||||
streamPath: "/voice/stream",
|
||||
providers: {
|
||||
xai: {
|
||||
apiKey: "${XAI_API_KEY}", // optional if XAI_API_KEY is set
|
||||
endpointingMs: 800,
|
||||
language: "en",
|
||||
```json5
|
||||
{
|
||||
plugins: {
|
||||
entries: {
|
||||
"voice-call": {
|
||||
config: {
|
||||
streaming: {
|
||||
enabled: true,
|
||||
provider: "xai",
|
||||
streamPath: "/voice/stream",
|
||||
providers: {
|
||||
xai: {
|
||||
apiKey: "${XAI_API_KEY}", // optional if XAI_API_KEY is set
|
||||
endpointingMs: 800,
|
||||
language: "en",
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
}
|
||||
```
|
||||
}
|
||||
```
|
||||
|
||||
Legacy keys are still auto-migrated by `openclaw doctor --fix`:
|
||||
|
||||
- `streaming.sttProvider` → `streaming.provider`
|
||||
- `streaming.openaiApiKey` → `streaming.providers.openai.apiKey`
|
||||
- `streaming.sttModel` → `streaming.providers.openai.model`
|
||||
- `streaming.silenceDurationMs` → `streaming.providers.openai.silenceDurationMs`
|
||||
- `streaming.vadThreshold` → `streaming.providers.openai.vadThreshold`
|
||||
|
||||
## Stale call reaper
|
||||
|
||||
Use `staleCallReaperSeconds` to end calls that never receive a terminal webhook
|
||||
(for example, notify-mode calls that never complete). The default is `0`
|
||||
(disabled).
|
||||
|
||||
Recommended ranges:
|
||||
|
||||
- **Production:** `120`–`300` seconds for notify-style flows.
|
||||
- Keep this value **higher than `maxDurationSeconds`** so normal calls can
|
||||
finish. A good starting point is `maxDurationSeconds + 30–60` seconds.
|
||||
|
||||
Example:
|
||||
|
||||
```json5
|
||||
{
|
||||
plugins: {
|
||||
entries: {
|
||||
"voice-call": {
|
||||
config: {
|
||||
maxDurationSeconds: 300,
|
||||
staleCallReaperSeconds: 360,
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
## Webhook Security
|
||||
|
||||
When a proxy or tunnel sits in front of the Gateway, the plugin reconstructs the
|
||||
public URL for signature verification. These options control which forwarded
|
||||
headers are trusted.
|
||||
|
||||
`webhookSecurity.allowedHosts` allowlists hosts from forwarding headers.
|
||||
|
||||
`webhookSecurity.trustForwardingHeaders` trusts forwarded headers without an allowlist.
|
||||
|
||||
`webhookSecurity.trustedProxyIPs` only trusts forwarded headers when the request
|
||||
remote IP matches the list.
|
||||
|
||||
Webhook replay protection is enabled for Twilio and Plivo. Replayed valid webhook
|
||||
requests are acknowledged but skipped for side effects.
|
||||
|
||||
Twilio conversation turns include a per-turn token in `<Gather>` callbacks, so
|
||||
stale/replayed speech callbacks cannot satisfy a newer pending transcript turn.
|
||||
|
||||
Unauthenticated webhook requests are rejected before body reads when the
|
||||
provider's required signature headers are missing.
|
||||
|
||||
The voice-call webhook uses the shared pre-auth body profile (64 KB / 5 seconds)
|
||||
plus a per-IP in-flight cap before signature verification.
|
||||
|
||||
Example with a stable public host:
|
||||
|
||||
```json5
|
||||
{
|
||||
plugins: {
|
||||
entries: {
|
||||
"voice-call": {
|
||||
config: {
|
||||
publicUrl: "https://voice.example.com/voice/webhook",
|
||||
webhookSecurity: {
|
||||
allowedHosts: ["voice.example.com"],
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
}
|
||||
```
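
When the Gateway sits behind a reverse proxy you control, a variant might also restrict which peers may supply forwarded headers. This is a sketch only: it assumes `trustedProxyIPs` takes a list of IP address strings, and the proxy address is a placeholder from a documentation-reserved range.

```json5
{
  plugins: {
    entries: {
      "voice-call": {
        config: {
          publicUrl: "https://voice.example.com/voice/webhook",
          webhookSecurity: {
            allowedHosts: ["voice.example.com"],
            // Assumed shape: list of proxy IPs whose forwarded headers are trusted.
            trustedProxyIPs: ["203.0.113.10"],
          },
        },
      },
    },
  },
}
```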
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
## TTS for calls
|
||||
|
||||
Voice Call uses the core `messages.tts` configuration for
|
||||
streaming speech on calls. You can override it under the plugin config with the
|
||||
**same shape** — it deep‑merges with `messages.tts`.
|
||||
Voice Call uses the core `messages.tts` configuration for streaming
|
||||
speech on calls. You can override it under the plugin config with the
|
||||
**same shape** — it deep-merges with `messages.tts`.
|
||||
|
||||
```json5
|
||||
{
|
||||
@@ -480,21 +377,23 @@ streaming speech on calls. You can override it under the plugin config with the
|
||||
}
|
||||
```
|
||||
|
||||
Notes:
|
||||
<Warning>
|
||||
**Microsoft speech is ignored for voice calls.** Telephony audio needs PCM;
|
||||
the current Microsoft transport does not expose telephony PCM output.
|
||||
</Warning>
|
||||
|
||||
Behavior notes:
|
||||
|
||||
- Legacy `tts.<provider>` keys inside plugin config (`openai`, `elevenlabs`, `microsoft`, `edge`) are repaired by `openclaw doctor --fix`; committed config should use `tts.providers.<provider>`.
|
||||
- **Microsoft speech is ignored for voice calls** (telephony audio needs PCM; the current Microsoft transport does not expose telephony PCM output).
|
||||
- Core TTS is used when Twilio media streaming is enabled; otherwise calls fall back to provider native voices.
|
||||
- Core TTS is used when Twilio media streaming is enabled; otherwise calls fall back to provider-native voices.
|
||||
- If a Twilio media stream is already active, Voice Call does not fall back to TwiML `<Say>`. If telephony TTS is unavailable in that state, the playback request fails instead of mixing two playback paths.
|
||||
- When telephony TTS falls back to a secondary provider, Voice Call logs a warning with the provider chain (`from`, `to`, `attempts`) for debugging.
|
||||
- When Twilio barge-in or stream teardown clears the pending TTS queue, queued
|
||||
playback requests settle instead of hanging callers that are awaiting playback
|
||||
completion.
|
||||
- When Twilio barge-in or stream teardown clears the pending TTS queue, queued playback requests settle instead of hanging callers awaiting playback completion.
|
||||
|
||||
### More examples
|
||||
|
||||
Use core TTS only (no override):
|
||||
### TTS examples
|
||||
|
||||
<Tabs>
|
||||
<Tab title="Core TTS only">
|
||||
```json5
|
||||
{
|
||||
messages: {
|
||||
@@ -507,9 +406,8 @@ Use core TTS only (no override):
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
Override to ElevenLabs just for calls (keep core default elsewhere):
|
||||
|
||||
</Tab>
|
||||
<Tab title="Override to ElevenLabs (calls only)">
|
||||
```json5
|
||||
{
|
||||
plugins: {
|
||||
@@ -532,9 +430,8 @@ Override to ElevenLabs just for calls (keep core default elsewhere):
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
Override only the OpenAI model for calls (deep‑merge example):
|
||||
|
||||
</Tab>
|
||||
<Tab title="OpenAI model override (deep-merge)">
|
||||
```json5
|
||||
{
|
||||
plugins: {
|
||||
@@ -555,6 +452,8 @@ Override only the OpenAI model for calls (deep‑merge example):
|
||||
},
|
||||
}
|
||||
```
|
||||
</Tab>
|
||||
</Tabs>

## Inbound calls

@@ -568,50 +467,122 @@ Inbound policy defaults to `disabled`. To enable inbound calls, set:
}
```

`inboundPolicy: "allowlist"` is a low-assurance caller-ID screen. The plugin
normalizes the provider-supplied `From` value and compares it to `allowFrom`.
Webhook verification authenticates provider delivery and payload integrity, but
it does not prove PSTN/VoIP caller-number ownership. Treat `allowFrom` as
caller-ID filtering, not strong caller identity.

<Warning>
  `inboundPolicy: "allowlist"` is a low-assurance caller-ID screen. The
  plugin normalizes the provider-supplied `From` value and compares it to
  `allowFrom`. Webhook verification authenticates provider delivery and
  payload integrity, but it does **not** prove PSTN/VoIP caller-number
  ownership. Treat `allowFrom` as caller-ID filtering, not strong caller
  identity.
</Warning>
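
The allowlist screen described in the warning can be pictured with a small sketch. This is an illustration only, not the plugin's code: the normalization rule (keep digits and a leading `+`) and the names `normalize_number` and `is_allowed_caller` are assumptions made for the example.

```python
import re


def normalize_number(raw: str) -> str:
    """Hypothetical normalization: keep digits and a single leading '+'."""
    raw = raw.strip()
    digits = re.sub(r"\D", "", raw)
    return ("+" + digits) if raw.startswith("+") else digits


def is_allowed_caller(from_value: str, allow_from: list[str]) -> bool:
    """Compare the provider-supplied From value against allowFrom.

    This filters on caller ID only. Caller ID can be spoofed, so a match
    is not proof of who is actually calling.
    """
    allowed = {normalize_number(entry) for entry in allow_from}
    candidate = normalize_number(from_value)
    return bool(candidate) and candidate in allowed


print(is_allowed_caller("+1 (555) 000-1234", ["+15550001234"]))
print(is_allowed_caller("+15550009999", ["+15550001234"]))
```

Because the comparison runs on a value the provider passes along, anything that needs real caller identity should add its own verification step on top of this filter.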

Auto-responses use the agent system. Tune with:

- `responseModel`
- `responseSystemPrompt`
- `responseTimeoutMs`

Auto-responses use the agent system. Tune with `responseModel`,
`responseSystemPrompt`, and `responseTimeoutMs`.

### Spoken output contract

For auto-responses, Voice Call appends a strict spoken-output contract to the system prompt:

For auto-responses, Voice Call appends a strict spoken-output contract to
the system prompt:

- `{"spoken":"..."}`

```text
{"spoken":"..."}
```

Voice Call then extracts speech text defensively:

Voice Call extracts speech text defensively:

- Ignores payloads marked as reasoning/error content.
- Parses direct JSON, fenced JSON, or inline `"spoken"` keys.
- Falls back to plain text and removes likely planning/meta lead-in paragraphs.

This keeps spoken playback focused on caller-facing text and avoids leaking planning text into audio.

This keeps spoken playback focused on caller-facing text and avoids
leaking planning text into audio.
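
The extraction order above (direct JSON, then fenced JSON, then an inline `"spoken"` key, then plain text) can be sketched roughly as follows. This is a simplified illustration, not the plugin's implementation: the function names and regular expressions are assumptions, and the filtering of reasoning/error payloads and of planning lead-in paragraphs is left out.

```python
import json
import re

# Hypothetical patterns for a fenced JSON object and an inline "spoken" key.
FENCE_RE = re.compile(r"```(?:json)?\s*(\{.*?\})\s*```", re.DOTALL)
INLINE_RE = re.compile(r'"spoken"\s*:\s*("(?:[^"\\]|\\.)*")')


def _spoken_from_json(candidate: str):
    """Return the 'spoken' string if candidate is a JSON object holding one."""
    try:
        data = json.loads(candidate)
    except json.JSONDecodeError:
        return None
    if isinstance(data, dict) and isinstance(data.get("spoken"), str):
        return data["spoken"]
    return None


def extract_spoken(reply: str) -> str:
    text = reply.strip()

    # 1. The whole reply is the JSON contract.
    spoken = _spoken_from_json(text)
    if spoken is not None:
        return spoken

    # 2. The contract is wrapped in a fenced code block.
    match = FENCE_RE.search(text)
    if match:
        spoken = _spoken_from_json(match.group(1))
        if spoken is not None:
            return spoken

    # 3. A "spoken" key appears somewhere inline.
    match = INLINE_RE.search(text)
    if match:
        try:
            return json.loads(match.group(1))
        except json.JSONDecodeError:
            pass

    # 4. Fall back to the plain text.
    return text


print(extract_spoken('{"spoken":"Hello there"}'))
```

Trying the strictest form first means a well-behaved reply is never touched by the looser heuristics.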

### Conversation startup behavior

For outbound `conversation` calls, first-message handling is tied to live playback state:

For outbound `conversation` calls, first-message handling is tied to live
playback state:

- Barge-in queue clear and auto-response are suppressed only while the initial greeting is actively speaking.
- If initial playback fails, the call returns to `listening` and the initial message remains queued for retry.
- Initial playback for Twilio streaming starts on stream connect without extra delay.
- Barge-in aborts active playback and clears queued-but-not-yet-playing Twilio
  TTS entries. Cleared entries resolve as skipped, so follow-up response logic
  can continue without waiting on audio that will never play.
- Realtime voice conversations use the realtime stream's own opening turn. Voice Call does not post a legacy `<Say>` TwiML update for that initial message, so outbound `<Connect><Stream>` sessions stay attached.
- Barge-in aborts active playback and clears queued-but-not-yet-playing Twilio TTS entries. Cleared entries resolve as skipped, so follow-up response logic can continue without waiting on audio that will never play.
- Realtime voice conversations use the realtime stream's own opening turn. Voice Call does **not** post a legacy `<Say>` TwiML update for that initial message, so outbound `<Connect><Stream>` sessions stay attached.
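
The "cleared entries resolve as skipped" behavior in the list above can be illustrated with a small asyncio sketch. The `PlaybackQueue` class and the `"skipped"` result value are hypothetical stand-ins chosen for the example; the plugin's real queue is not shown in this document.

```python
import asyncio


class PlaybackQueue:
    """Hypothetical queue of TTS entries that are waiting to be played."""

    def __init__(self) -> None:
        self._pending: list[tuple[str, asyncio.Future]] = []

    def enqueue(self, text: str) -> asyncio.Future:
        # The caller awaits this future to learn when playback has settled.
        future = asyncio.get_running_loop().create_future()
        self._pending.append((text, future))
        return future

    def clear(self) -> int:
        """Barge-in or teardown: settle every queued entry as 'skipped'."""
        cleared = 0
        for _text, future in self._pending:
            if not future.done():
                future.set_result("skipped")
                cleared += 1
        self._pending.clear()
        return cleared


async def demo() -> list[str]:
    queue = PlaybackQueue()
    waiters = [queue.enqueue("first"), queue.enqueue("second")]
    queue.clear()  # simulate a barge-in before either entry plays
    return list(await asyncio.gather(*waiters))


print(asyncio.run(demo()))
```

Settling the futures, rather than simply dropping them, is what lets follow-up logic continue instead of waiting on audio that will never play.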

### Twilio stream disconnect grace

When a Twilio media stream disconnects, Voice Call waits `2000ms` before auto-ending the call:

When a Twilio media stream disconnects, Voice Call waits **2000 ms** before
auto-ending the call:

- If the stream reconnects during that window, auto-end is canceled.
- If no stream is re-registered after the grace period, the call is ended to prevent stuck active calls.
- If no stream re-registers after the grace period, the call is ended to prevent stuck active calls.
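
A grace window like this is commonly built as a cancellable timer. The sketch below shows that pattern under stated assumptions: `StreamGrace` and its method names are invented for the example, and the demo uses a 50 ms window instead of 2000 ms so it finishes quickly.

```python
import asyncio
from typing import Optional


class StreamGrace:
    """Hypothetical per-call grace timer for media stream disconnects."""

    def __init__(self, grace_seconds: float = 2.0) -> None:
        self.grace_seconds = grace_seconds
        self.ended = False
        self._timer: Optional[asyncio.Task] = None

    def on_disconnect(self) -> None:
        # Start (or restart) the countdown toward auto-ending the call.
        self._cancel_timer()
        loop = asyncio.get_running_loop()
        self._timer = loop.create_task(self._end_after_grace())

    def on_reconnect(self) -> None:
        # The stream came back in time, so the pending auto-end is canceled.
        self._cancel_timer()

    def _cancel_timer(self) -> None:
        if self._timer is not None and not self._timer.done():
            self._timer.cancel()
        self._timer = None

    async def _end_after_grace(self) -> None:
        await asyncio.sleep(self.grace_seconds)
        self.ended = True  # stand-in for actually ending the call


async def demo() -> tuple[bool, bool]:
    reconnected = StreamGrace(grace_seconds=0.05)
    reconnected.on_disconnect()
    await asyncio.sleep(0.01)
    reconnected.on_reconnect()

    dropped = StreamGrace(grace_seconds=0.05)
    dropped.on_disconnect()

    await asyncio.sleep(0.2)
    return reconnected.ended, dropped.ended


print(asyncio.run(demo()))
```

The first call reconnects inside the window and stays active; the second never reconnects and is ended once the window expires.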

## Stale call reaper

Use `staleCallReaperSeconds` to end calls that never receive a terminal
webhook (for example, notify-mode calls that never complete). The default
is `0` (disabled).

Recommended ranges:

- **Production:** `120`–`300` seconds for notify-style flows.
- Keep this value **higher than `maxDurationSeconds`** so normal calls can finish. A good starting point is `maxDurationSeconds + 30–60` seconds.

```json5
{
  plugins: {
    entries: {
      "voice-call": {
        config: {
          maxDurationSeconds: 300,
          staleCallReaperSeconds: 360,
        },
      },
    },
  },
}
```
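
The relationship between the two settings can be checked with a few lines of code. This is a hypothetical helper for reviewing your own config, not something the plugin ships; the function name and return strings are made up for the example.

```python
def check_reaper(max_duration_seconds: int, stale_call_reaper_seconds: int) -> str:
    """Sanity-check staleCallReaperSeconds against maxDurationSeconds."""
    if stale_call_reaper_seconds == 0:
        return "disabled"
    if stale_call_reaper_seconds <= max_duration_seconds:
        return "too low: normal calls could be reaped before they finish"
    margin = stale_call_reaper_seconds - max_duration_seconds
    return f"ok: {margin}s margin above maxDurationSeconds"


print(check_reaper(300, 360))
```

With the values from the config above (300 and 360), the margin is 60 seconds, which sits at the upper end of the suggested 30–60 second starting point.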

## Webhook security

When a proxy or tunnel sits in front of the Gateway, the plugin
reconstructs the public URL for signature verification. These options
control which forwarded headers are trusted:

<ParamField path="webhookSecurity.allowedHosts" type="string[]">
  Allowlist hosts from forwarding headers.
</ParamField>
<ParamField path="webhookSecurity.trustForwardingHeaders" type="boolean">
  Trust forwarded headers without an allowlist.
</ParamField>
<ParamField path="webhookSecurity.trustedProxyIPs" type="string[]">
  Only trust forwarded headers when the request remote IP matches the list.
</ParamField>

Additional protections:

- Webhook **replay protection** is enabled for Twilio and Plivo. Replayed valid webhook requests are acknowledged but skipped for side effects.
- Twilio conversation turns include a per-turn token in `<Gather>` callbacks, so stale/replayed speech callbacks cannot satisfy a newer pending transcript turn.
- Unauthenticated webhook requests are rejected before body reads when the provider's required signature headers are missing.
- The voice-call webhook uses the shared pre-auth body profile (64 KB / 5 seconds) plus a per-IP in-flight cap before signature verification.
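
The "acknowledged but skipped for side effects" replay behavior can be pictured with the sketch below. It is an assumption-laden illustration: the `ReplayGuard` class, the key derivation (hash of signature plus body), and the unbounded in-memory set are all invented here, and a real implementation would also expire old entries.

```python
import hashlib


class ReplayGuard:
    """Hypothetical in-memory record of webhook deliveries already handled."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def first_delivery(self, signature: str, body: bytes) -> bool:
        # Assumption: a delivery is identified by its signature plus body.
        key = hashlib.sha256(signature.encode() + b"\n" + body).hexdigest()
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


def handle_webhook(guard: ReplayGuard, signature: str, body: bytes, effects: list) -> int:
    """Assumes the signature has already been verified upstream."""
    if guard.first_delivery(signature, body):
        effects.append(body)  # stand-in for real call-state side effects
    return 200  # replays are acknowledged too, just without side effects


guard, effects = ReplayGuard(), []
print(handle_webhook(guard, "sig-1", b"CallStatus=completed", effects))
print(handle_webhook(guard, "sig-1", b"CallStatus=completed", effects))
print(len(effects))
```

Returning a success status for the replay keeps the provider from retrying, while the call state is only changed once.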

Example with a stable public host:

```json5
{
  plugins: {
    entries: {
      "voice-call": {
        config: {
          publicUrl: "https://voice.example.com/voice/webhook",
          webhookSecurity: {
            allowedHosts: ["voice.example.com"],
          },
        },
      },
    },
  },
}
```
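
One way to picture how the three `webhookSecurity` options could interact is the sketch below. It is a guess at a reasonable precedence, not the plugin's documented behavior: the `X-Forwarded-Host` header name, the order of the checks, and the `resolve_public_host` function are all assumptions.

```python
def resolve_public_host(headers: dict, remote_ip: str, security: dict, fallback_host: str) -> str:
    """Pick the host to use when rebuilding the public webhook URL."""
    forwarded = (headers.get("x-forwarded-host") or "").strip().lower()
    if not forwarded:
        return fallback_host

    # trustedProxyIPs: ignore forwarded headers from unknown peers.
    trusted_ips = security.get("trustedProxyIPs") or []
    if trusted_ips and remote_ip not in trusted_ips:
        return fallback_host

    # allowedHosts: accept only forwarded hosts on the allowlist.
    allowed_hosts = [host.lower() for host in security.get("allowedHosts") or []]
    if allowed_hosts:
        return forwarded if forwarded in allowed_hosts else fallback_host

    # trustForwardingHeaders: accept the forwarded host with no allowlist.
    if security.get("trustForwardingHeaders"):
        return forwarded
    return fallback_host


security = {"allowedHosts": ["voice.example.com"]}
headers = {"x-forwarded-host": "voice.example.com"}
print(resolve_public_host(headers, "10.0.0.5", security, "localhost:3334"))
```

In this sketch an unlisted or untrusted forwarded host silently falls back to the local host, so signature verification would then fail rather than trusting attacker-supplied headers.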

## CLI

@@ -624,41 +595,43 @@ openclaw voicecall dtmf --call-id <id> --digits "ww123456#"
openclaw voicecall end --call-id <id>
openclaw voicecall status --call-id <id>
openclaw voicecall tail
openclaw voicecall latency # summarize turn latency from logs
openclaw voicecall expose --mode funnel
```

`latency` reads `calls.jsonl` from the default voice-call storage path. Use
`--file <path>` to point at a different log and `--last <n>` to limit analysis
to the last N records (default 200). Output includes p50/p90/p99 for turn
latency and listen-wait times.

`latency` reads `calls.jsonl` from the default voice-call storage path.
Use `--file <path>` to point at a different log and `--last <n>` to limit
analysis to the last N records (default 200). Output includes p50/p90/p99
for turn latency and listen-wait times.
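
For readers unfamiliar with percentile summaries, the sketch below shows one way to compute p50/p90/p99 over the last N records of a JSONL log. It is not the CLI's code: the record field name `turnLatencyMs` is a hypothetical placeholder (the actual `calls.jsonl` schema is not shown here), and nearest-rank is only one of several percentile definitions.

```python
import json


def percentile(sorted_values, pct):
    """Nearest-rank percentile of a non-empty ascending list (integer math)."""
    rank = max(1, (pct * len(sorted_values) + 99) // 100)
    return sorted_values[rank - 1]


def summarize_latency(jsonl_lines, field="turnLatencyMs", last=200):
    """Summarize one numeric field over the last `last` records."""
    records = [json.loads(line) for line in jsonl_lines if line.strip()]
    values = sorted(
        record[field]
        for record in records[-last:]
        if isinstance(record, dict) and isinstance(record.get(field), (int, float))
    )
    if not values:
        return {}
    return {f"p{pct}": percentile(values, pct) for pct in (50, 90, 99)}


# Ten synthetic records with latencies 100, 200, ..., 1000 ms.
sample = [json.dumps({"turnLatencyMs": ms}) for ms in range(100, 1100, 100)]
print(summarize_latency(sample))
```

With only a handful of records, p90 and p99 collapse toward the maximum, so the tail percentiles become meaningful only once the log holds a larger sample.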

## Agent tool

Tool name: `voice_call`

Tool name: `voice_call`.

Actions:

- `initiate_call` (message, to?, mode?)
- `continue_call` (callId, message)
- `speak_to_user` (callId, message)
- `send_dtmf` (callId, digits)
- `end_call` (callId)
- `get_status` (callId)

| Action          | Args                      |
| --------------- | ------------------------- |
| `initiate_call` | `message`, `to?`, `mode?` |
| `continue_call` | `callId`, `message`       |
| `speak_to_user` | `callId`, `message`       |
| `send_dtmf`     | `callId`, `digits`        |
| `end_call`      | `callId`                  |
| `get_status`    | `callId`                  |

This repo ships a matching skill doc at `skills/voice-call/SKILL.md`.

## Gateway RPC

- `voicecall.initiate` (`to?`, `message`, `mode?`)
- `voicecall.continue` (`callId`, `message`)
- `voicecall.speak` (`callId`, `message`)
- `voicecall.dtmf` (`callId`, `digits`)
- `voicecall.end` (`callId`)
- `voicecall.status` (`callId`)

| Method               | Args                      |
| -------------------- | ------------------------- |
| `voicecall.initiate` | `to?`, `message`, `mode?` |
| `voicecall.continue` | `callId`, `message`       |
| `voicecall.speak`    | `callId`, `message`       |
| `voicecall.dtmf`     | `callId`, `digits`        |
| `voicecall.end`      | `callId`                  |
| `voicecall.status`   | `callId`                  |

## Related

- [Talk mode](/nodes/talk)
- [Text-to-speech](/tools/tts)
- [Voice wake](/nodes/voicewake)