feat(providers): add streaming stt providers

2026-05-06 14:40:43 +00:00 · 2026-04-23 03:05:44 +01:00
parent 5b68092351
commit 51ed22e608
32 changed files with 2399 additions and 16 deletions
--- a/docs/providers/deepgram.md
+++ b/docs/providers/deepgram.md
@@ -2,18 +2,22 @@
 summary: "Deepgram transcription for inbound voice notes"
 read_when:
  - You want Deepgram speech-to-text for audio attachments
+  - You want Deepgram streaming transcription for Voice Call
  - You need a quick Deepgram config example
 title: "Deepgram"
 ---

 # Deepgram (Audio Transcription)

-Deepgram is a speech-to-text API. In OpenClaw it is used for **inbound audio/voice note
-transcription** via `tools.media.audio`.
+Deepgram is a speech-to-text API. In OpenClaw it is used for inbound
+audio/voice-note transcription through `tools.media.audio` and for Voice Call
+streaming STT through `plugins.entries.voice-call.config.streaming`.

-When enabled, OpenClaw uploads the audio file to Deepgram and injects the transcript
-into the reply pipeline (`{{Transcript}}` + `[Audio]` block). This is **not streaming**;
-it uses the pre-recorded transcription endpoint.
+For batch transcription, OpenClaw uploads the complete audio file to Deepgram
+and injects the transcript into the reply pipeline (`{{Transcript}}` +
+`[Audio]` block). For Voice Call streaming, OpenClaw forwards live G.711
+u-law frames to Deepgram's streaming `listen` endpoint over WebSocket and
+emits interim (partial) and final transcripts as Deepgram returns them.

 | Detail        | Value                                                      |
 | ------------- | ---------------------------------------------------------- |
@@ -101,6 +105,52 @@ it uses the pre-recorded transcription endpoint.
  </Tab>
 </Tabs>

+## Voice Call streaming STT
+
+The bundled `deepgram` plugin also registers a realtime transcription provider
+for the Voice Call plugin.
+
+| Setting         | Config path                                                             | Default                          |
+| --------------- | ----------------------------------------------------------------------- | -------------------------------- |
+| API key         | `plugins.entries.voice-call.config.streaming.providers.deepgram.apiKey` | Falls back to `DEEPGRAM_API_KEY` |
+| Model           | `...deepgram.model`                                                     | `nova-3`                         |
+| Language        | `...deepgram.language`                                                  | (unset)                          |
+| Encoding        | `...deepgram.encoding`                                                  | `mulaw`                          |
+| Sample rate     | `...deepgram.sampleRate`                                                | `8000`                           |
+| Endpointing     | `...deepgram.endpointingMs`                                             | `800`                            |
+| Interim results | `...deepgram.interimResults`                                            | `true`                           |
+
+```json5
+{
+  plugins: {
+    entries: {
+      "voice-call": {
+        config: {
+          streaming: {
+            enabled: true,
+            provider: "deepgram",
+            providers: {
+              deepgram: {
+                apiKey: "${DEEPGRAM_API_KEY}",
+                model: "nova-3",
+                endpointingMs: 800,
+                language: "en-US",
+              },
+            },
+          },
+        },
+      },
+    },
+  },
+}
+```
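The TypeScript sketch below is a rough illustration of how the settings above could map onto Deepgram's streaming API. It builds a `listen` WebSocket URL from a config object like the one in the example. It assumes the provider passes these values to Deepgram as its documented query parameters (`model`, `encoding`, `sample_rate`, `endpointing`, `interim_results`, `language`). `DeepgramStreamingConfig` and `buildDeepgramListenUrl` are hypothetical names and are not OpenClaw's actual implementation.

```typescript
// Hypothetical config shape mirroring the documented keys under
// plugins.entries.voice-call.config.streaming.providers.deepgram.
interface DeepgramStreamingConfig {
  model?: string;
  language?: string;
  encoding?: string;
  sampleRate?: number;
  endpointingMs?: number;
  interimResults?: boolean;
}

// Builds a Deepgram streaming `listen` URL. Omitted settings fall back to the
// defaults listed in the table. The API key is deliberately not part of the URL.
function buildDeepgramListenUrl(cfg: DeepgramStreamingConfig = {}): string {
  const url = new URL("wss://api.deepgram.com/v1/listen");
  const params = url.searchParams;
  params.set("model", cfg.model ?? "nova-3");
  params.set("encoding", cfg.encoding ?? "mulaw");
  params.set("sample_rate", String(cfg.sampleRate ?? 8000));
  params.set("endpointing", String(cfg.endpointingMs ?? 800));
  params.set("interim_results", String(cfg.interimResults ?? true));
  // Language has no default here; leaving it out lets Deepgram apply its own.
  if (cfg.language) {
    params.set("language", cfg.language);
  }
  return url.toString();
}

// Example: the values from the json5 snippet.
console.log(
  buildDeepgramListenUrl({ model: "nova-3", endpointingMs: 800, language: "en-US" }),
);
```

With the example config, this sketch yields `wss://api.deepgram.com/v1/listen?model=nova-3&encoding=mulaw&sample_rate=8000&endpointing=800&interim_results=true&language=en-US`. Deepgram expects the API key in an `Authorization: Token <key>` header on the WebSocket handshake rather than in the query string.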
+
+<Note>
+Voice Call receives telephony audio as 8 kHz G.711 u-law. The Deepgram
+streaming provider defaults to `encoding: "mulaw"` and `sampleRate: 8000`, so
+Twilio media frames can be forwarded without transcoding or resampling.
+</Note>
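The minimal Node.js sketch below shows why no transcoding step is needed. It assumes Twilio Media Streams JSON messages, whose `media` events carry base64-encoded audio in `media.payload`. The sketch unwraps one message into raw u-law bytes that could be sent to Deepgram as a binary WebSocket frame. `TwilioStreamMessage`, `twilioMessageToMuLaw`, and `muLawBytes` are hypothetical helpers for illustration only.

```typescript
// Assumed subset of a Twilio Media Streams WebSocket message; only the fields
// used here are declared.
interface TwilioStreamMessage {
  event: string; // e.g. "connected", "start", "media", "stop"
  media?: { payload: string }; // base64-encoded 8 kHz u-law audio
}

// Hypothetical helper: turns one Twilio message into raw u-law bytes for a
// binary WebSocket send to Deepgram. No transcoding or resampling happens
// because both sides use 8 kHz u-law.
function twilioMessageToMuLaw(raw: string): Uint8Array | null {
  const msg = JSON.parse(raw) as TwilioStreamMessage;
  if (msg.event !== "media" || !msg.media) {
    return null; // control events carry no audio
  }
  return Buffer.from(msg.media.payload, "base64");
}

// u-law stores one byte per sample, so bytes = sampleRate * seconds.
function muLawBytes(durationMs: number, sampleRate = 8000): number {
  return (sampleRate * durationMs) / 1000;
}
```

Because u-law uses one byte per sample, `muLawBytes(20)` returns 160, so a 20 ms frame at 8 kHz is 160 bytes. The decoded bytes can be written to the Deepgram socket unchanged.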
+
 ## Notes

 <AccordionGroup>
@@ -118,12 +168,6 @@ it uses the pre-recorded transcription endpoint.
  </Accordion>
 </AccordionGroup>

-<Note>
-Deepgram transcription is **pre-recorded only** (not real-time streaming). OpenClaw
-uploads the complete audio file and waits for the full transcript before injecting
-it into the conversation.
-</Note>
-
 ## Related

 <CardGroup cols={2}>