mirror of
https://github.com/openclaw/openclaw.git
synced 2026-05-06 17:40:44 +00:00
docs: outline unified talk API
This commit is contained in:
@@ -197,6 +197,9 @@ Node commands must pass two gates before they can be invoked:
|
||||
|
||||
Windows and macOS companion nodes allow safe declared commands such as
|
||||
`canvas.*`, `camera.list`, `location.get`, and `screen.snapshot` by default.
|
||||
Trusted nodes that advertise the `talk` capability or declare `talk.*` commands
|
||||
also allow declared push-to-talk commands (`talk.ptt.start`, `talk.ptt.stop`,
|
||||
`talk.ptt.cancel`, `talk.ptt.once`) by default, independent of platform label.
|
||||
Dangerous or privacy-heavy commands such as `camera.snap`, `camera.clip`, and
|
||||
`screen.record` still require explicit opt-in with
|
||||
`gateway.nodes.allowCommands`. `gateway.nodes.denyCommands` always wins over
|
||||
|
||||
@@ -1,18 +1,28 @@
|
||||
---
|
||||
summary: "Talk mode: continuous speech conversations with configured TTS providers"
|
||||
summary: "Talk mode: continuous speech conversations across local STT/TTS and realtime voice"
|
||||
read_when:
|
||||
- Implementing Talk mode on macOS/iOS/Android
|
||||
- Changing voice/TTS/interrupt behavior
|
||||
title: "Talk mode"
|
||||
---
|
||||
|
||||
Talk mode is a continuous voice conversation loop:
|
||||
Talk mode has two runtime shapes:
|
||||
|
||||
- Native macOS/iOS/Android Talk uses local speech recognition, Gateway chat, and `talk.speak` TTS. Nodes advertise the `talk` capability and declare the `talk.*` commands they support.
|
||||
- Browser Talk uses `talk.realtime.session` with canonical transports: `webrtc`, `provider-websocket`, or `gateway-relay`. `managed-room` is reserved for Gateway handoff rooms.
|
||||
- Transcription-only clients use `talk.transcription.session` plus `talk.transcription.relayAudio`, `talk.transcription.relayCancel`, and `talk.transcription.relayStop` when they need captions or dictation without an assistant voice response.
|
||||
|
||||
Native Talk is a continuous voice conversation loop:
|
||||
|
||||
1. Listen for speech
|
||||
2. Send transcript to the model (main session, chat.send)
|
||||
2. Send transcript to the model through the active session
|
||||
3. Wait for the response
|
||||
4. Speak it via the configured Talk provider (`talk.speak`)
|
||||
|
||||
Browser realtime Talk forwards provider tool calls through `talk.realtime.toolCall`; browser clients do not call `chat.send` directly for realtime consults.
|
||||
|
||||
Transcription-only Talk emits the same common Talk event envelope as realtime and STT/TTS sessions, but uses `mode: "transcription"` and `brain: "none"`. It is for captions, dictation, and observe-only speech capture; one-shot uploaded voice notes still use the media/audio path.
|
||||
|
||||
## Behavior (macOS)
|
||||
|
||||
- **Always-on overlay** while Talk mode is enabled.
|
||||
@@ -66,6 +76,19 @@ Supported keys:
|
||||
speechLocale: "ru-RU",
|
||||
silenceTimeoutMs: 1500,
|
||||
interruptOnSpeech: true,
|
||||
realtime: {
|
||||
provider: "openai",
|
||||
providers: {
|
||||
openai: {
|
||||
apiKey: "openai_api_key",
|
||||
model: "gpt-realtime",
|
||||
voice: "alloy",
|
||||
},
|
||||
},
|
||||
mode: "realtime",
|
||||
transport: "webrtc",
|
||||
brain: "agent-consult",
|
||||
},
|
||||
},
|
||||
}
|
||||
```
|
||||
@@ -79,6 +102,11 @@ Defaults:
|
||||
- `providers.elevenlabs.modelId`: defaults to `eleven_v3` when unset.
|
||||
- `providers.mlx.modelId`: defaults to `mlx-community/Soprano-80M-bf16` when unset.
|
||||
- `providers.elevenlabs.apiKey`: falls back to `ELEVENLABS_API_KEY` (or gateway shell profile if available).
|
||||
- `realtime.provider`: selects the active browser/server realtime voice provider. Use `openai` for WebRTC, `google` for provider WebSocket, or a bridge-only provider through Gateway relay.
|
||||
- `realtime.providers.<provider>` stores provider-owned realtime config. The browser receives only ephemeral or constrained session credentials, never a standard API key.
|
||||
- `realtime.brain`: `agent-consult` routes realtime tool calls through Gateway policy; `direct-tools` is owner-only compatibility behavior; `none` is for transcription or external orchestration.
|
||||
- `talk.catalog` exposes each provider's valid modes, transports, brain strategies, realtime audio formats, and capability flags so first-party Talk clients can avoid unsupported combinations.
|
||||
- Streaming transcription providers are discovered through `talk.catalog.transcription`. The current Gateway relay uses the Voice Call streaming provider config until the dedicated Talk transcription config surface is added.
|
||||
- `speechLocale`: optional BCP 47 locale id for on-device Talk speech recognition on iOS/macOS. Leave unset to use the device default.
|
||||
- `outputFormat`: defaults to `pcm_44100` on macOS/iOS and `pcm_24000` on Android (set `mp3_*` to force MP3 streaming)
|
||||
|
||||
@@ -103,7 +131,9 @@ Defaults:
|
||||
## Notes
|
||||
|
||||
- Requires Speech + Microphone permissions.
|
||||
- Uses `chat.send` against session key `main`.
|
||||
- Native Talk uses the active Gateway session and only falls back to history polling when response events are unavailable.
|
||||
- Browser realtime Talk uses `talk.realtime.toolCall` for `openclaw_agent_consult` instead of exposing `chat.send` to provider-owned browser sessions.
|
||||
- Transcription-only Talk uses `talk.transcription.session`, `talk.transcription.relayAudio`, `talk.transcription.relayCancel`, and `talk.transcription.relayStop`; clients subscribe to `talk.transcription.relay` events for partial/final transcript updates.
|
||||
- The gateway resolves Talk playback through `talk.speak` using the active Talk provider. Android falls back to local system TTS only when that RPC is unavailable.
|
||||
- macOS local MLX playback uses the bundled `openclaw-mlx-tts` helper when present, or an executable on `PATH`. Set `OPENCLAW_MLX_TTS_BIN` to point at a custom helper binary during development.
|
||||
- `stability` for `eleven_v3` is validated to `0.0`, `0.5`, or `1.0`; other models accept `0..1`.
|
||||
|
||||
Reference in New Issue
Block a user