Mirror of https://github.com/openclaw/openclaw.git, synced 2026-04-06 06:41:08 +00:00
refactor: move voice-call realtime providers into extensions
@@ -32,6 +32,7 @@ native OpenClaw plugin registers against one or more capability types:
 | Text inference | `api.registerProvider(...)` | `openai`, `anthropic` |
 | CLI inference backend | `api.registerCliBackend(...)` | `openai`, `anthropic` |
 | Speech | `api.registerSpeechProvider(...)` | `elevenlabs`, `microsoft` |
+| Realtime voice | `api.registerRealtimeVoiceProvider(...)` | `openai` |
 | Media understanding | `api.registerMediaUnderstandingProvider(...)` | `openai`, `google` |
 | Image generation | `api.registerImageGenerationProvider(...)` | `openai`, `google` |
 | Web search | `api.registerWebSearchProvider(...)` | `google` |
@@ -239,8 +240,9 @@ Examples:
 - the bundled `minimax`, `mistral`, `moonshot`, and `zai` plugins own their
   media-understanding backends
 - the `voice-call` plugin is a feature plugin: it owns call transport, tools,
-  CLI, routes, and runtime, but it consumes core TTS/STT capability instead of
-  inventing a second speech stack
+  CLI, routes, and Twilio media-stream bridging, but it consumes shared speech
+  plus realtime-transcription and realtime-voice capabilities instead of
+  importing vendor plugins directly

 The intended end state is:
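Review note on the `voice-call` hunk above: the revised bullet says the feature plugin consumes shared realtime capabilities rather than importing vendor plugins. A minimal sketch of that dependency direction follows. Everything here is an assumption for illustration: `RealtimeVoiceRegistry`, `resolve`, `startCall`, and `makeFakeProvider` are hypothetical names, and the real OpenClaw registry and types may look different.

```typescript
// Hypothetical shapes; only the method names connect/close/isConnected,
// isConfigured, and createBridge appear in this commit's example.
interface RealtimeVoiceBridge {
  connect(): Promise<void>;
  close(): void;
  isConnected(): boolean;
}

interface RealtimeVoiceProvider {
  id: string;
  isConfigured(): boolean;
  createBridge(): RealtimeVoiceBridge;
}

// Stand-in for the host's capability registry (assumed, not the real API).
class RealtimeVoiceRegistry {
  private providers: RealtimeVoiceProvider[] = [];

  register(provider: RealtimeVoiceProvider): void {
    this.providers.push(provider);
  }

  // Prefer an explicitly requested id, else the first configured provider.
  resolve(preferredId?: string): RealtimeVoiceProvider | undefined {
    const configured = this.providers.filter((p) => p.isConfigured());
    for (const provider of configured) {
      if (provider.id === preferredId) return provider;
    }
    return configured.length > 0 ? configured[0] : undefined;
  }
}

// Feature-plugin side: depends on the registry and the interface only,
// never on a vendor module.
async function startCall(registry: RealtimeVoiceRegistry, preferredId?: string): Promise<string> {
  const provider = registry.resolve(preferredId);
  if (!provider) throw new Error("no configured realtime voice provider");
  const bridge = provider.createBridge();
  await bridge.connect();
  bridge.close();
  return provider.id;
}

// In-memory fake used only to exercise the sketch.
function makeFakeProvider(id: string, configured: boolean): RealtimeVoiceProvider {
  return {
    id,
    isConfigured: () => configured,
    createBridge: () => {
      let connected = false;
      return {
        connect: async () => { connected = true; },
        close: () => { connected = false; },
        isConnected: () => connected,
      };
    },
  };
}
```

With this shape, swapping vendors is a registration change on the provider side; the feature plugin's call path stays untouched.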
@@ -146,6 +146,7 @@ A single plugin can register any number of capabilities via the `api` object:
 | CLI inference backend | `api.registerCliBackend(...)` | [CLI Backends](/gateway/cli-backends) |
 | Channel / messaging | `api.registerChannel(...)` | [Channel Plugins](/plugins/sdk-channel-plugins) |
 | Speech (TTS/STT) | `api.registerSpeechProvider(...)` | [Provider Plugins](/plugins/sdk-provider-plugins#step-5-add-extra-capabilities) |
+| Realtime voice | `api.registerRealtimeVoiceProvider(...)` | [Provider Plugins](/plugins/sdk-provider-plugins#step-5-add-extra-capabilities) |
 | Media understanding | `api.registerMediaUnderstandingProvider(...)` | [Provider Plugins](/plugins/sdk-provider-plugins#step-5-add-extra-capabilities) |
 | Image generation | `api.registerImageGenerationProvider(...)` | [Provider Plugins](/plugins/sdk-provider-plugins#step-5-add-extra-capabilities) |
 | Web search | `api.registerWebSearchProvider(...)` | [Provider Plugins](/plugins/sdk-provider-plugins#step-5-add-extra-capabilities) |
@@ -196,6 +196,8 @@ read without importing the plugin runtime.
 {
   "contracts": {
     "speechProviders": ["openai"],
+    "realtimeTranscriptionProviders": ["openai"],
+    "realtimeVoiceProviders": ["openai"],
     "mediaUnderstandingProviders": ["openai", "openai-codex"],
     "imageGenerationProviders": ["openai"],
     "webSearchProviders": ["gemini"],
@@ -206,13 +208,15 @@ read without importing the plugin runtime.
 Each list is optional:

-| Field | Type | What it means |
-| ----------------------------- | ---------- | -------------------------------------------------------------- |
-| `speechProviders` | `string[]` | Speech provider ids this plugin owns. |
-| `mediaUnderstandingProviders` | `string[]` | Media-understanding provider ids this plugin owns. |
-| `imageGenerationProviders` | `string[]` | Image-generation provider ids this plugin owns. |
-| `webSearchProviders` | `string[]` | Web-search provider ids this plugin owns. |
-| `tools` | `string[]` | Agent tool names this plugin owns for bundled contract checks. |
+| Field | Type | What it means |
+| -------------------------------- | ---------- | -------------------------------------------------------------- |
+| `speechProviders` | `string[]` | Speech provider ids this plugin owns. |
+| `realtimeTranscriptionProviders` | `string[]` | Realtime-transcription provider ids this plugin owns. |
+| `realtimeVoiceProviders` | `string[]` | Realtime-voice provider ids this plugin owns. |
+| `mediaUnderstandingProviders` | `string[]` | Media-understanding provider ids this plugin owns. |
+| `imageGenerationProviders` | `string[]` | Image-generation provider ids this plugin owns. |
+| `webSearchProviders` | `string[]` | Web-search provider ids this plugin owns. |
+| `tools` | `string[]` | Agent tool names this plugin owns for bundled contract checks. |

 Legacy top-level `speechProviders`, `mediaUnderstandingProviders`, and
 `imageGenerationProviders` are deprecated. Use `openclaw doctor --fix` to move
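Review note on the `contracts` hunk above: every field in the table is an optional `string[]`, which is straightforward to express and check in TypeScript. The sketch below is only an illustration of that shape. The field names come from the table, while `PluginContracts`, `validateContracts`, and the decision to flag unknown keys are assumptions rather than OpenClaw's own manifest loader.

```typescript
// Field names copied from the manifest table in this commit.
const CONTRACT_FIELDS = [
  "speechProviders",
  "realtimeTranscriptionProviders",
  "realtimeVoiceProviders",
  "mediaUnderstandingProviders",
  "imageGenerationProviders",
  "webSearchProviders",
  "tools",
] as const;

type ContractField = (typeof CONTRACT_FIELDS)[number];

// Every list is optional, matching "Each list is optional".
type PluginContracts = Partial<Record<ContractField, string[]>>;

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Hypothetical helper: returns human-readable problems; an empty array
// means the block is well-formed. Flagging unknown keys is an assumption.
function validateContracts(raw: unknown): string[] {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    return ["contracts must be an object"];
  }
  const problems: string[] = [];
  const known: readonly string[] = CONTRACT_FIELDS;
  for (const key of Object.keys(raw)) {
    const value = (raw as Record<string, unknown>)[key];
    if (known.indexOf(key) === -1) {
      problems.push(`unknown contract field: ${key}`);
    } else if (!isStringArray(value)) {
      problems.push(`${key} must be a string[]`);
    }
  }
  return problems;
}

// Example value using the two fields this commit adds.
const exampleContracts: PluginContracts = {
  realtimeTranscriptionProviders: ["openai"],
  realtimeVoiceProviders: ["openai"],
};
```

A check like this can run against the manifest alone, which fits the surrounding text's point that contracts are read without importing the plugin runtime.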
@@ -128,15 +128,17 @@ methods:
 ### Capability registration

-| Method | What it registers |
-| --------------------------------------------- | ------------------------------ |
-| `api.registerProvider(...)` | Text inference (LLM) |
-| `api.registerCliBackend(...)` | Local CLI inference backend |
-| `api.registerChannel(...)` | Messaging channel |
-| `api.registerSpeechProvider(...)` | Text-to-speech / STT synthesis |
-| `api.registerMediaUnderstandingProvider(...)` | Image/audio/video analysis |
-| `api.registerImageGenerationProvider(...)` | Image generation |
-| `api.registerWebSearchProvider(...)` | Web search |
+| Method | What it registers |
+| ------------------------------------------------ | -------------------------------- |
+| `api.registerProvider(...)` | Text inference (LLM) |
+| `api.registerCliBackend(...)` | Local CLI inference backend |
+| `api.registerChannel(...)` | Messaging channel |
+| `api.registerSpeechProvider(...)` | Text-to-speech / STT synthesis |
+| `api.registerRealtimeTranscriptionProvider(...)` | Streaming realtime transcription |
+| `api.registerRealtimeVoiceProvider(...)` | Duplex realtime voice sessions |
+| `api.registerMediaUnderstandingProvider(...)` | Image/audio/video analysis |
+| `api.registerImageGenerationProvider(...)` | Image generation |
+| `api.registerWebSearchProvider(...)` | Web search |

 ### Tools and commands
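Review note on the capability-registration hunk above: the new "Streaming realtime transcription" row implies a session with a connect, send, close lifecycle, and this commit's provider example stubs exactly those methods (`connect`, `sendAudio`, `close`, `isConnected`). The sketch below shows one way a consumer might drive such a session. The parameter types, `streamChunks`, and `createFakeSession` are hypothetical and not taken from the OpenClaw SDK.

```typescript
// Method names mirror the stub in this commit's example; the parameter
// and return types are assumptions.
interface RealtimeTranscriptionSession {
  connect(): Promise<void>;
  sendAudio(chunk: Uint8Array): void;
  close(): void;
  isConnected(): boolean;
}

// Hypothetical helper: push chunks through a session and always close it,
// returning how many chunks were actually sent.
async function streamChunks(
  session: RealtimeTranscriptionSession,
  chunks: Uint8Array[],
): Promise<number> {
  let sent = 0;
  await session.connect();
  try {
    for (const chunk of chunks) {
      if (!session.isConnected()) break; // stop if the provider dropped the link
      session.sendAudio(chunk);
      sent += 1;
    }
  } finally {
    session.close(); // runs even when sendAudio throws
  }
  return sent;
}

// In-memory fake that disconnects after `dropAfter` chunks; test-only.
function createFakeSession(dropAfter: number) {
  let connected = false;
  let received = 0;
  return {
    connect: async () => { connected = true; },
    sendAudio: (_chunk: Uint8Array) => {
      received += 1;
      if (received >= dropAfter) connected = false;
    },
    close: () => { connected = false; },
    isConnected: () => connected,
    receivedCount: () => received,
  };
}
```

Closing in a `finally` block keeps a failed send from leaking an open provider connection, which matters more for duplex voice bridges than for one-shot requests.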
@@ -324,8 +324,8 @@ API key auth, and dynamic model resolution.
 <Step title="Add extra capabilities (optional)">
 <a id="step-5-add-extra-capabilities"></a>
-A provider plugin can register speech, media understanding, image
-generation, and web search alongside text inference:
+A provider plugin can register speech, realtime transcription, realtime voice, media
+understanding, image generation, and web search alongside text inference:

 ```typescript
 register(api) {
@@ -343,6 +343,33 @@ API key auth, and dynamic model resolution.
     }),
   });

+  api.registerRealtimeTranscriptionProvider({
+    id: "acme-ai",
+    label: "Acme Realtime Transcription",
+    isConfigured: () => true,
+    createSession: (req) => ({
+      connect: async () => {},
+      sendAudio: () => {},
+      close: () => {},
+      isConnected: () => true,
+    }),
+  });
+
+  api.registerRealtimeVoiceProvider({
+    id: "acme-ai",
+    label: "Acme Realtime Voice",
+    isConfigured: ({ providerConfig }) => Boolean(providerConfig.apiKey),
+    createBridge: (req) => ({
+      connect: async () => {},
+      sendAudio: () => {},
+      setMediaTimestamp: () => {},
+      submitToolResult: () => {},
+      acknowledgeMark: () => {},
+      close: () => {},
+      isConnected: () => true,
+    }),
+  });
+
   api.registerMediaUnderstandingProvider({
     id: "acme-ai",
     capabilities: ["image", "audio"],