Files
openclaw/extensions/voice-call/README.md

182 lines
4.8 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# @openclaw/voice-call
Official Voice Call plugin for **OpenClaw**.
Providers:
- **Twilio** (Programmable Voice + Media Streams)
- **Telnyx** (Call Control v2)
- **Plivo** (Voice API + XML transfer + GetInput speech)
- **Mock** (dev/no network)
Docs: `https://docs.openclaw.ai/plugins/voice-call`
Plugin system: `https://docs.openclaw.ai/plugin`
## Install (local dev)
### Option A: install via OpenClaw (recommended)
```bash
openclaw plugins install @openclaw/voice-call
```
Restart the Gateway afterwards.
### Option B: copy into your global extensions folder (dev)
```bash
mkdir -p ~/.openclaw/extensions
cp -R extensions/voice-call ~/.openclaw/extensions/voice-call
cd ~/.openclaw/extensions/voice-call && pnpm install
```
## Config
Put under `plugins.entries.voice-call.config`:
```json5
{
provider: "twilio", // or "telnyx" | "plivo" | "mock"
fromNumber: "+15550001234",
toNumber: "+15550005678",
twilio: {
accountSid: "ACxxxxxxxx",
authToken: "your_token",
},
telnyx: {
apiKey: "KEYxxxx",
connectionId: "CONNxxxx",
// Telnyx webhook public key from the Telnyx Mission Control Portal
// (Base64 string; can also be set via TELNYX_PUBLIC_KEY).
publicKey: "...",
},
plivo: {
authId: "MAxxxxxxxxxxxxxxxxxxxx",
authToken: "your_token",
},
// Webhook server
serve: {
port: 3334,
path: "/voice/webhook",
},
// Public exposure (pick one):
// publicUrl: "https://example.ngrok.app/voice/webhook",
// tunnel: { provider: "ngrok" },
// tailscale: { mode: "funnel", path: "/voice/webhook" }
outbound: {
defaultMode: "notify", // or "conversation"
},
streaming: {
enabled: true,
streamPath: "/voice/stream",
preStartTimeoutMs: 5000,
maxPendingConnections: 32,
maxPendingConnectionsPerIp: 4,
maxConnections: 128,
},
}
```
Notes:
- Twilio/Telnyx/Plivo require a **publicly reachable** webhook URL.
- `mock` is a local dev provider (no network calls).
- Telnyx requires `telnyx.publicKey` (or `TELNYX_PUBLIC_KEY`) unless `skipSignatureVerification` is true.
- `tunnel.allowNgrokFreeTierLoopbackBypass: true` allows Twilio webhooks with invalid signatures **only** when `tunnel.provider="ngrok"` and `serve.bind` is loopback (ngrok local agent). Use for local dev only.
Streaming security defaults:
- `streaming.preStartTimeoutMs` closes sockets that never send a valid `start` frame.
- `streaming.maxPendingConnections` caps total unauthenticated pre-start sockets.
- `streaming.maxPendingConnectionsPerIp` caps unauthenticated pre-start sockets per source IP.
- `streaming.maxConnections` caps total open media stream sockets (pending + active).
## Stale call reaper
Use `staleCallReaperSeconds` to end calls that never receive a terminal webhook
(for example, notify-mode calls that never complete). The default is `0`
(disabled).
Recommended ranges:
- **Production:** `120``300` seconds for notify-style flows.
- Keep this value **higher than `maxDurationSeconds`** so normal calls can
finish. A good starting point is `maxDurationSeconds + 3060` seconds.
Example:
```json5
{
staleCallReaperSeconds: 360,
}
```
## TTS for calls
Voice Call uses the core `messages.tts` configuration (OpenAI or ElevenLabs) for
streaming speech on calls. You can override it under the plugin config with the
same shape — overrides deep-merge with `messages.tts`.
```json5
{
tts: {
provider: "openai",
openai: {
voice: "alloy",
},
},
}
```
Notes:
- Edge TTS is ignored for voice calls (telephony audio needs PCM; Edge output is unreliable).
- Core TTS is used when Twilio media streaming is enabled; otherwise calls fall back to provider native voices.
## CLI
```bash
openclaw voicecall call --to "+15555550123" --message "Hello from OpenClaw"
openclaw voicecall continue --call-id <id> --message "Any questions?"
openclaw voicecall speak --call-id <id> --message "One moment"
openclaw voicecall end --call-id <id>
openclaw voicecall status --call-id <id>
openclaw voicecall tail
openclaw voicecall expose --mode funnel
```
## Tool
Tool name: `voice_call`
Actions:
- `initiate_call` (message, to?, mode?)
- `continue_call` (callId, message)
- `speak_to_user` (callId, message)
- `end_call` (callId)
- `get_status` (callId)
## Gateway RPC
- `voicecall.initiate` (to?, message, mode?)
- `voicecall.continue` (callId, message)
- `voicecall.speak` (callId, message)
- `voicecall.end` (callId)
- `voicecall.status` (callId)
## Notes
- Uses webhook signature verification for Twilio/Telnyx/Plivo.
- Adds replay protection for Twilio and Plivo webhooks (valid duplicate callbacks are ignored safely).
- Twilio speech turns include a per-turn token so stale/replayed callbacks cannot complete a newer turn.
- `responseModel` / `responseSystemPrompt` control AI auto-responses.
- Media streaming requires `ws` and OpenAI Realtime API key.