fix(voice-call): stabilize Twilio STT startup (#75257)

Fix Twilio voice-call startup so accepted media streams register immediately, realtime transcription readiness gates only the initial greeting, and early inbound media is preserved while STT connects.

Fixes #75197.
Thanks @PfanP and @donkeykong91.
Ben
2026-05-01 14:25:36 +09:00
committed by GitHub
parent 4ea0556f64
commit e8f9c3e6de
6 changed files with 268 additions and 14 deletions

@@ -297,6 +297,7 @@ Current runtime behavior:
- `streaming.provider` is optional. If unset, Voice Call uses the first registered realtime transcription provider.
- Bundled realtime transcription providers: Deepgram (`deepgram`), ElevenLabs (`elevenlabs`), Mistral (`mistral`), OpenAI (`openai`), and xAI (`xai`), registered by their provider plugins.
- Provider-owned raw config lives under `streaming.providers.<providerId>`.
- After Twilio sends an accepted stream `start` message, Voice Call registers the stream immediately, queues inbound media through the transcription provider while the provider connects, and starts the initial greeting only after realtime transcription is ready.
- If `streaming.provider` points at an unregistered provider, or none is registered, Voice Call logs a warning and skips media streaming instead of failing the whole plugin.
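The startup ordering described above (register on `start`, hold early media while the provider connects, greet only once transcription is ready) can be illustrated with a minimal sketch. Every name here (`SttSession`, `StreamStartup`, `FakeStt`, `finishConnect`) is hypothetical and not taken from the Voice Call source, and the buffering is done in a wrapper class purely for illustration; the real plugin queues media through the transcription provider itself.

```typescript
// Illustrative sketch only: all names are hypothetical, not the plugin's API.
// Twilio media payloads are base64 strings, so a frame is modeled as a string.
type Frame = string;

interface SttSession {
  connect(onReady: () => void): void; // assumed callback-style readiness signal
  send(frame: Frame): void;
}

class StreamStartup {
  private stt: SttSession;
  private greet: () => void;
  private registered = false;
  private ready = false;
  private pending: Frame[] = [];

  constructor(stt: SttSession, greet: () => void) {
    this.stt = stt;
    this.greet = greet;
  }

  // Called when the accepted stream `start` message arrives.
  onStart(): void {
    this.registered = true; // register immediately, before STT is connected
    this.stt.connect(() => {
      this.ready = true;
      for (const frame of this.pending) this.stt.send(frame); // flush in arrival order
      this.pending = [];
      this.greet(); // the greeting is the only step gated on readiness
    });
  }

  // Called for each inbound media message.
  onMedia(frame: Frame): void {
    if (!this.registered) return; // ignore media for streams that never started
    if (this.ready) this.stt.send(frame);
    else this.pending.push(frame); // preserve early audio while STT connects
  }

  isRegistered(): boolean {
    return this.registered;
  }
}

// Stand-in provider that connects only when finishConnect() is called.
class FakeStt implements SttSession {
  received: Frame[] = [];
  private onReady: (() => void) | null = null;

  connect(onReady: () => void): void {
    this.onReady = onReady;
  }

  send(frame: Frame): void {
    this.received.push(frame);
  }

  finishConnect(): void {
    if (this.onReady) this.onReady();
  }
}
```

In this sketch, frames that arrive between `onStart()` and `finishConnect()` reach the provider in order once it is ready, and the greeting callback runs only after that flush.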
### Streaming provider examples