feat(voice-call): improve realtime Meet voice agent

* feat(voice-call): inject agent context into realtime voice

* fix(voice-call): stabilize realtime meet audio

* fix(voice-call): delegate realtime consults to agent

* Improve realtime Meet voice consult routing

* Pin voice consult delivery to call session

* Move voice changelog entries to changes

* fix(voice-call): isolate final realtime transcripts

* test(voice-call): trim redundant realtime coverage
This commit is contained in:
scoootscooob
2026-05-05 12:56:31 -07:00
committed by GitHub
parent 782963ae66
commit 79dd65e208
35 changed files with 2088 additions and 137 deletions

View File

@@ -229,6 +229,8 @@ Current runtime behaviour:
- Bundled realtime voice providers: Google Gemini Live (`google`) and OpenAI (`openai`), registered by their provider plugins.
- Provider-owned raw config lives under `realtime.providers.<providerId>`.
- Voice Call exposes the shared `openclaw_agent_consult` realtime tool by default. The realtime model can call it when the caller asks for deeper reasoning, current information, or normal OpenClaw tools.
- `realtime.consultPolicy` optionally adds guidance for when the realtime model should call `openclaw_agent_consult`.
- `realtime.agentContext.enabled` is default-off. When enabled, Voice Call injects a bounded agent identity, system prompt override, and selected workspace-file capsule into the realtime provider instructions at session setup.
- `realtime.fastContext.enabled` is default-off. When enabled, Voice Call first searches indexed memory/session context for the consult question and returns those snippets to the realtime model within `realtime.fastContext.timeoutMs` before falling back to the full consult agent only if `realtime.fastContext.fallbackToConsult` is true.
- If `realtime.provider` points at an unregistered provider, or no realtime voice provider is registered at all, Voice Call logs a warning and skips realtime media instead of failing the whole plugin.
- Consult session keys reuse the stored call session when available, then fall back to the configured `sessionScope` (`per-phone` by default, or `per-call` for isolated calls).
@@ -243,6 +245,51 @@ Current runtime behaviour:
| `owner` | Expose the consult tool and let the regular agent use the normal agent tool policy. |
| `none` | Do not expose the consult tool. Custom `realtime.tools` are still passed through to the realtime provider. |
`realtime.consultPolicy` controls only the realtime model instructions:
| Policy | Guidance |
| ------------- | ----------------------------------------------------------------------------------------------- |
| `auto` | Keep the default prompt and let the provider decide when to call the consult tool. |
| `substantive` | Answer simple conversational glue directly and consult before facts, memory, tools, or context. |
| `always` | Consult before every substantive answer. |
### Agent voice context
Enable `realtime.agentContext` when the voice bridge should sound like the
configured OpenClaw agent without paying a full agent-consult round trip on
ordinary turns. The context capsule is added once when the realtime session is
created, so it does not add per-turn latency. Calls to
`openclaw_agent_consult` still run the full OpenClaw agent and should be used
for tool work, current information, memory lookups, or workspace state.
```json5
{
plugins: {
entries: {
"voice-call": {
config: {
agentId: "main",
realtime: {
enabled: true,
provider: "google",
toolPolicy: "safe-read-only",
consultPolicy: "substantive",
agentContext: {
enabled: true,
maxChars: 6000,
includeIdentity: true,
includeSystemPrompt: true,
includeWorkspaceFiles: true,
files: ["SOUL.md", "IDENTITY.md", "USER.md"],
},
},
},
},
},
},
}
```
### Realtime provider examples
<Tabs>
@@ -268,6 +315,8 @@ Current runtime behaviour:
provider: "google",
instructions: "Speak briefly. Call openclaw_agent_consult before using deeper tools.",
toolPolicy: "safe-read-only",
consultPolicy: "substantive",
agentContext: { enabled: true },
providers: {
google: {
apiKey: "${GEMINI_API_KEY}",