feat: add browser realtime talk transports

Peter Steinberger
2026-04-27 14:21:38 +01:00
parent 5dd1e264eb
commit 93bbbe5e37
26 changed files with 2607 additions and 319 deletions

@@ -352,11 +352,17 @@ SDK rejects language-code hints on this API path.
</Note>
<Note>
-Control UI Talk browser sessions still require a realtime voice provider with a
-browser WebRTC session implementation. Today that path is OpenAI Realtime; the
-Google provider is for backend realtime bridges.
+Control UI Talk supports Google Live browser sessions with constrained one-use
+tokens. Backend-only realtime voice providers can also run through the generic
+Gateway relay transport, which keeps provider credentials on the Gateway.
</Note>
+For maintainer live verification, run
+`OPENAI_API_KEY=... GEMINI_API_KEY=... node --import tsx scripts/dev/realtime-talk-live-smoke.ts`.
+The Google leg mints the same constrained Live API token shape used by Control
+UI Talk, opens the browser WebSocket endpoint, sends the initial setup payload,
+and waits for `setupComplete`.
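
The setup handshake in the Google leg can be illustrated with a small sketch. This is not the smoke script's code. The names `SocketLike`, `buildSetupMessage`, `isSetupComplete`, and `runSetupHandshake` are hypothetical, and the sketch assumes the server acknowledges setup with a JSON frame that carries a top-level `setupComplete` key. Token minting and the WebSocket URL are left out on purpose.

```typescript
// Hypothetical helpers for illustration; not taken from the smoke script.

/** Minimal socket surface so the sketch works with any WebSocket implementation. */
interface SocketLike {
  send(data: string): void;
}

/** Build the initial setup payload. The model id is supplied by the caller. */
function buildSetupMessage(model: string): string {
  return JSON.stringify({ setup: { model } });
}

/** Return true when a raw server frame is the setup acknowledgement. */
function isSetupComplete(raw: string): boolean {
  try {
    const parsed: unknown = JSON.parse(raw);
    return typeof parsed === "object" && parsed !== null && "setupComplete" in parsed;
  } catch {
    return false; // Non-JSON frames are ignored rather than treated as errors.
  }
}

/** Send setup, then resolve on the first acknowledgement frame or reject on timeout. */
function runSetupHandshake(
  socket: SocketLike,
  onMessage: (handler: (raw: string) => void) => void,
  model: string,
  timeoutMs = 10_000,
): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error("setupComplete timed out")),
      timeoutMs,
    );
    // Register the listener before sending so a fast reply is not missed.
    onMessage((raw) => {
      if (isSetupComplete(raw)) {
        clearTimeout(timer);
        resolve();
      }
    });
    socket.send(buildSetupMessage(model));
  });
}
```

Taking the message subscription as a callback keeps the sketch independent of any particular WebSocket library, so it can be exercised with a fake socket in tests.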
## Advanced configuration
<AccordionGroup>

@@ -546,7 +546,17 @@ Legacy `plugins.entries.openai.config.personality` is still read as a compatibil
| API key | `...openai.apiKey` | Falls back to `OPENAI_API_KEY` |
<Note>
-Supports Azure OpenAI via `azureEndpoint` and `azureDeployment` config keys. Supports bidirectional tool calling. Uses G.711 u-law audio format.
+Supports Azure OpenAI via `azureEndpoint` and `azureDeployment` config keys for backend realtime bridges. Supports bidirectional tool calling. Uses G.711 u-law audio format.
</Note>
+<Note>
+Control UI Talk uses OpenAI browser realtime sessions with a Gateway-minted
+ephemeral client secret and a direct browser WebRTC SDP exchange against the
+OpenAI Realtime API. Maintainer live verification is available with
+`OPENAI_API_KEY=... GEMINI_API_KEY=... node --import tsx scripts/dev/realtime-talk-live-smoke.ts`;
+the OpenAI leg mints a client secret in Node, generates a browser SDP offer
+with fake microphone media, posts it to OpenAI, and applies the SDP answer
+without logging secrets.
+</Note>
</Accordion>