mirror of https://github.com/openclaw/openclaw.git synced 2026-04-23 07:01:40 +00:00

Files

Michael Appel afadb7dae6 fix(voice-call): reject oversized realtime WebSocket frames

Reject realtime voice WebSocket frames above 256 KB before JSON parsing or bridge setup, and absorb ws error events so oversized frames close the connection instead of crashing the gateway.

Local verification:
- pnpm test extensions/voice-call/src/webhook/realtime-handler.test.ts
- pnpm check

Thanks @mmaps.

Co-authored-by: mmaps <3399869+mmaps@users.noreply.github.com>

2026-04-10 17:58:44 +01:00

src

fix(voice-call): reject oversized realtime WebSocket frames

2026-04-10 17:58:44 +01:00

api.ts

refactor: privatize bundled sdk facades

2026-03-20 15:56:14 +00:00

CHANGELOG.md

chore: bump version to 2026.4.10

2026-04-09 03:56:22 +01:00

cli-metadata.ts

perf(plugins): narrow boundary compile sdk imports

2026-04-08 08:52:51 +01:00

config-api.ts

refactor(voice-call): migrate legacy config via doctor

2026-04-04 14:06:52 +09:00

index.test.ts

fix: repair test typing for check gate

2026-04-07 20:58:01 +01:00

index.ts

refactor: dedupe trimmed string readers

2026-04-07 22:57:52 +01:00

openclaw.plugin.json

perf(secrets): scope compat migration scans

2026-04-07 09:52:56 +01:00

package.json

chore: bump version to 2026.4.10

2026-04-09 03:56:22 +01:00

README.md

refactor(voice-call): migrate legacy config via doctor

2026-04-04 14:06:52 +09:00

runtime-api.ts

fix(plugin-sdk): remove relative extension boundary escapes (#51939 )

2026-03-21 20:03:18 -05:00

runtime-entry.ts

refactor: add runtime-boundary plugin test seams

2026-03-27 13:46:17 +00:00

setup-api.ts

refactor: dedupe broad record guard

2026-04-07 00:21:12 +01:00

tsconfig.json

chore(plugins): bulk add package boundary tsconfig rollout

2026-04-07 08:48:23 +01:00

README.md

@openclaw/voice-call

Official Voice Call plugin for OpenClaw.

Providers:

Twilio (Programmable Voice + Media Streams)
Telnyx (Call Control v2)
Plivo (Voice API + XML transfer + GetInput speech)
Mock (dev/no network)

Docs: https://docs.openclaw.ai/plugins/voice-call Plugin system: https://docs.openclaw.ai/plugin

Install (local dev)

Option A: install via OpenClaw (recommended)

openclaw plugins install @openclaw/voice-call

Restart the Gateway afterwards.

Option B: copy into your global extensions folder (dev)

PLUGIN_HOME=~/.openclaw/extensions
mkdir -p "$PLUGIN_HOME"
cp -R <local-plugin-checkout> "$PLUGIN_HOME/voice-call"
cd "$PLUGIN_HOME/voice-call" && pnpm install

Config

Put under plugins.entries.voice-call.config:

{
  provider: "twilio", // or "telnyx" | "plivo" | "mock"
  fromNumber: "+15550001234",
  toNumber: "+15550005678",

  twilio: {
    accountSid: "ACxxxxxxxx",
    authToken: "your_token",
  },

  telnyx: {
    apiKey: "KEYxxxx",
    connectionId: "CONNxxxx",
    // Telnyx webhook public key from the Telnyx Mission Control Portal
    // (Base64 string; can also be set via TELNYX_PUBLIC_KEY).
    publicKey: "...",
  },

  plivo: {
    authId: "MAxxxxxxxxxxxxxxxxxxxx",
    authToken: "your_token",
  },

  // Webhook server
  serve: {
    port: 3334,
    path: "/voice/webhook",
  },

  // Public exposure (pick one):
  // publicUrl: "https://example.ngrok.app/voice/webhook",
  // tunnel: { provider: "ngrok" },
  // tailscale: { mode: "funnel", path: "/voice/webhook" }

  outbound: {
    defaultMode: "notify", // or "conversation"
  },

  streaming: {
    enabled: true,
    // optional; if omitted, Voice Call picks the first registered
    // realtime-transcription provider by autoSelectOrder
    provider: "<realtime-transcription-provider-id>",
    streamPath: "/voice/stream",
    providers: {
      "<realtime-transcription-provider-id>": {
        // provider-owned options
      },
    },
    preStartTimeoutMs: 5000,
    maxPendingConnections: 32,
    maxPendingConnectionsPerIp: 4,
    maxConnections: 128,
  },
}

Notes:

Twilio/Telnyx/Plivo require a publicly reachable webhook URL.
mock is a local dev provider (no network calls).
Telnyx requires telnyx.publicKey (or TELNYX_PUBLIC_KEY) unless skipSignatureVerification is true.
If older configs still use provider: "log", twilio.from, or legacy streaming.* OpenAI keys, run openclaw doctor --fix to rewrite them.
advanced webhook, streaming, and tunnel notes: https://docs.openclaw.ai/plugins/voice-call
responseModel is optional. When unset, voice responses use the runtime default model.

Stale call reaper

See the plugin docs for recommended ranges and production examples: https://docs.openclaw.ai/plugins/voice-call#stale-call-reaper

TTS for calls

Voice Call uses the core messages.tts configuration for streaming speech on calls. Override examples and provider caveats live here: https://docs.openclaw.ai/plugins/voice-call#tts-for-calls

CLI

openclaw voicecall call --to "+15555550123" --message "Hello from OpenClaw"
openclaw voicecall continue --call-id <id> --message "Any questions?"
openclaw voicecall speak --call-id <id> --message "One moment"
openclaw voicecall end --call-id <id>
openclaw voicecall status --call-id <id>
openclaw voicecall tail
openclaw voicecall expose --mode funnel

Tool

Tool name: voice_call

Actions:

initiate_call (message, to?, mode?)
continue_call (callId, message)
speak_to_user (callId, message)
end_call (callId)
get_status (callId)

Gateway RPC

voicecall.initiate (to?, message, mode?)
voicecall.continue (callId, message)
voicecall.speak (callId, message)
voicecall.end (callId)
voicecall.status (callId)

Notes

Uses webhook signature verification for Twilio/Telnyx/Plivo.
Adds replay protection for Twilio and Plivo webhooks (valid duplicate callbacks are ignored safely).
Twilio speech turns include a per-turn token so stale/replayed callbacks cannot complete a newer turn.
responseModel / responseSystemPrompt control AI auto-responses.
Voice-call auto-responses enforce a spoken JSON contract ({"spoken":"..."}) and filter reasoning/meta output before playback.
While a Twilio stream is active, playback does not fall back to TwiML <Say>; stream-TTS failures fail the playback request.
Outbound conversation calls suppress barge-in only while the initial greeting is actively speaking, then re-enable normal interruption.
Twilio stream disconnect auto-end uses a short grace window so quick reconnects do not end the call.
Realtime provider selection is generic. Configure streaming.provider / realtime.provider and put provider-owned options under providers.<id>.
Runtime fallback still accepts the old voice-call keys for now, but migration is a doctor step and the compat shim is scheduled to go away in a future release.