refactor(stt): share transcription helpers

Peter Steinberger
2026-04-23 04:29:19 +01:00
parent a58633d809
commit c866820fed
24 changed files with 360 additions and 779 deletions

@@ -716,6 +716,17 @@ API key auth, and dynamic model resolution.
as `maxInputImages`, `maxInputVideos`, and `maxDurationSeconds` are not
enough to advertise transform-mode support or disabled modes cleanly.
Prefer the shared WebSocket helper for streaming STT providers. It keeps
proxy capture, reconnect backoff, close flushing, ready handshakes, audio
queueing, and close-event diagnostics consistent across providers while
leaving provider code responsible for only the upstream event mapping.
Batch STT providers that POST multipart audio should use
`buildAudioTranscriptionFormData(...)` from
`openclaw/plugin-sdk/provider-http` together with the provider HTTP request
helpers. The form helper normalizes upload filenames, including giving AAC
uploads an M4A-style filename for transcription APIs that require one.
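
The filename normalization described above can be illustrated with a minimal sketch. It is not the implementation of `buildAudioTranscriptionFormData(...)`, whose signature is not shown here. The function name `normalizeAudioUploadName`, its parameters, and the `"audio"` fallback name are hypothetical and chosen only for illustration.

```typescript
// Hypothetical stand-in for the filename step of a multipart audio upload.
// Assumption: AAC input is detected by a ".aac" extension or an "audio/aac"
// MIME type, and is renamed with an ".m4a" extension. Other names pass through.
function normalizeAudioUploadName(fileName?: string, mimeType?: string): string {
  // Fall back to a generic name when the caller did not provide one.
  const name = fileName?.trim() || "audio";

  // Split into stem and extension; a leading dot does not count as an extension.
  const dot = name.lastIndexOf(".");
  const hasExt = dot > 0;
  const stem = hasExt ? name.slice(0, dot) : name;
  const ext = hasExt ? name.slice(dot + 1).toLowerCase() : "";

  const isAac = ext === "aac" || mimeType?.toLowerCase() === "audio/aac";
  return isAac ? `${stem}.m4a` : name;
}
```

In a real provider the resulting name would typically be passed as the third argument to `FormData.append(...)` alongside the audio `Blob`. With the shared helper in place, provider code should not need to do this by hand.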
Music-generation providers should follow the same pattern:
`generate` for prompt-only generation and `edit` for reference-image-based
generation. Flat aggregate fields such as `maxInputImages`,