refactor(stt): share transcription helpers

Peter Steinberger
2026-04-23 04:29:19 +01:00
parent a58633d809
commit c866820fed
24 changed files with 360 additions and 779 deletions

@@ -279,7 +279,7 @@ Current bundled provider examples:
| `plugin-sdk/provider-model-shared` | Shared provider model/replay helpers | `ProviderReplayFamily`, `buildProviderReplayFamilyHooks`, `normalizeModelCompat`, shared replay-policy builders, provider-endpoint helpers, and model-id normalization helpers |
| `plugin-sdk/provider-catalog-shared` | Shared provider catalog helpers | `findCatalogTemplate`, `buildSingleProviderApiKeyCatalog`, `supportsNativeStreamingUsageCompat`, `applyProviderNativeStreamingUsageCompat` |
| `plugin-sdk/provider-onboard` | Provider onboarding patches | Onboarding config helpers |
-| `plugin-sdk/provider-http` | Provider HTTP helpers | Generic provider HTTP/endpoint capability helpers |
+| `plugin-sdk/provider-http` | Provider HTTP helpers | Generic provider HTTP/endpoint capability helpers, including audio transcription multipart form helpers |
| `plugin-sdk/provider-web-fetch` | Provider web-fetch helpers | Web-fetch provider registration/cache helpers |
| `plugin-sdk/provider-web-search-config-contract` | Provider web-search config helpers | Narrow web-search config/credential helpers for providers that do not need plugin-enable wiring |
| `plugin-sdk/provider-web-search-contract` | Provider web-search contract helpers | Narrow web-search config/credential contract helpers such as `createWebSearchProviderContractFields`, `enablePluginInConfig`, `resolveProviderWebSearchPluginConfig`, and scoped credential setters/getters |

@@ -137,7 +137,7 @@ explicitly promotes one as public.
| `plugin-sdk/provider-auth` | `createProviderApiKeyAuthMethod`, `ensureApiKeyFromOptionEnvOrPrompt`, `upsertAuthProfile`, `upsertApiKeyProfile`, `writeOAuthCredentials` |
| `plugin-sdk/provider-model-shared` | `ProviderReplayFamily`, `buildProviderReplayFamilyHooks`, `normalizeModelCompat`, shared replay-policy builders, provider-endpoint helpers, and model-id normalization helpers such as `normalizeNativeXaiModelId` |
| `plugin-sdk/provider-catalog-shared` | `findCatalogTemplate`, `buildSingleProviderApiKeyCatalog`, `supportsNativeStreamingUsageCompat`, `applyProviderNativeStreamingUsageCompat` |
-| `plugin-sdk/provider-http` | Generic provider HTTP/endpoint capability helpers |
+| `plugin-sdk/provider-http` | Generic provider HTTP/endpoint capability helpers, including audio transcription multipart form helpers |
| `plugin-sdk/provider-web-fetch-contract` | Narrow web-fetch config/selection contract helpers such as `enablePluginInConfig` and `WebFetchProviderPlugin` |
| `plugin-sdk/provider-web-fetch` | Web-fetch provider registration/cache helpers |
| `plugin-sdk/provider-web-search-config-contract` | Narrow web-search config/credential helpers for providers that do not need plugin-enable wiring |

@@ -716,6 +716,17 @@ API key auth, and dynamic model resolution.
as `maxInputImages`, `maxInputVideos`, and `maxDurationSeconds` are not
enough to advertise transform-mode support or disabled modes cleanly.
Prefer the shared WebSocket helper for streaming STT providers. It keeps
proxy capture, reconnect backoff, close flushing, ready handshakes, audio
queueing, and close-event diagnostics consistent across providers while
leaving provider code responsible for only the upstream event mapping.
Batch STT providers that POST multipart audio should use
`buildAudioTranscriptionFormData(...)` from
`openclaw/plugin-sdk/provider-http` together with the provider HTTP request
helpers. The form helper normalizes upload filenames, including AAC uploads
that need an M4A-style filename for compatible transcription APIs.
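
As a rough illustration of the filename normalization described above, the sketch below builds a multipart form with the standard `FormData` and `Blob` APIs. It does not use or reproduce `buildAudioTranscriptionFormData(...)`, whose signature is not shown in this change. `AudioUploadSketch`, `normalizeUploadFileName`, `sketchAudioFormData`, and the `file`/`model` field names are all hypothetical, chosen only for the example.

```typescript
// Hypothetical stand-in for illustration only; this is NOT the SDK's
// buildAudioTranscriptionFormData, whose real signature is not shown here.
interface AudioUploadSketch {
  audio: Blob; // raw audio with its MIME type set
  fileName?: string; // caller-supplied name, if any
  model?: string; // assumed optional model field
}

// Give AAC uploads an M4A-style filename; leave other names untouched.
function normalizeUploadFileName(
  fileName: string | undefined,
  mimeType: string,
): string {
  const name = fileName?.trim() || "audio";
  const isAac =
    mimeType.toLowerCase() === "audio/aac" || /\.aac$/i.test(name);
  if (!isAac || /\.m4a$/i.test(name)) return name;
  return name.replace(/\.aac$/i, "") + ".m4a";
}

function sketchAudioFormData(upload: AudioUploadSketch): FormData {
  const form = new FormData();
  const fileName = normalizeUploadFileName(upload.fileName, upload.audio.type);
  // "file" and "model" are assumed field names, not confirmed by the SDK.
  form.append("file", upload.audio, fileName);
  if (upload.model) form.append("model", upload.model);
  return form;
}
```

Keeping the rename in one small pure function means every batch provider applies the same rule, which is the consistency argument the paragraph above makes for the shared helper.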
Music-generation providers should follow the same pattern:
`generate` for prompt-only generation and `edit` for reference-image-based
generation. Flat aggregate fields such as `maxInputImages`,

@@ -155,7 +155,8 @@ Current runtime behavior:
- `streaming.provider` is optional. If unset, Voice Call uses the first
registered realtime transcription provider.
-- Bundled realtime transcription providers include OpenAI (`openai`) and xAI
+- Bundled realtime transcription providers include Deepgram (`deepgram`),
+  ElevenLabs (`elevenlabs`), Mistral (`mistral`), OpenAI (`openai`), and xAI
(`xai`), registered by their provider plugins.
- Provider-owned raw config lives under `streaming.providers.<providerId>`.
- If `streaming.provider` points at an unregistered provider, or no realtime