Mirror of https://github.com/openclaw/openclaw.git, synced 2026-05-06 12:30:44 +00:00
refactor(stt): share transcription helpers
@@ -716,6 +716,17 @@ API key auth, and dynamic model resolution.
as `maxInputImages`, `maxInputVideos`, and `maxDurationSeconds` are not
enough to advertise transform-mode support or disabled modes cleanly.

Prefer the shared WebSocket helper for streaming STT providers. It keeps
proxy capture, reconnect backoff, close flushing, ready handshakes, audio
queueing, and close-event diagnostics consistent across providers while
leaving provider code responsible for only the upstream event mapping.

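To make two of those responsibilities concrete, here is a minimal TypeScript sketch of reconnect backoff and pre-handshake audio queueing. It is not the shared helper's API: `reconnectDelayMs`, `AudioSendQueue`, and every option name below are hypothetical, and the real helper may behave differently (for example, by adding jitter).

```typescript
// Hypothetical sketch. None of these names come from the OpenClaw plugin SDK.
interface BackoffOptions {
  baseMs: number; // delay before the first reconnect attempt
  maxMs: number; // upper bound on any single delay
}

// Exponential backoff without jitter: baseMs, 2 * baseMs, 4 * baseMs, ... capped at maxMs.
function reconnectDelayMs(attempt: number, opts: BackoffOptions): number {
  return Math.min(opts.maxMs, opts.baseMs * 2 ** attempt);
}

// Buffers audio chunks until the upstream socket reports ready,
// then flushes them in arrival order and sends later chunks directly.
class AudioSendQueue {
  private pending: Uint8Array[] = [];
  private ready = false;
  private send: (chunk: Uint8Array) => void;

  constructor(send: (chunk: Uint8Array) => void) {
    this.send = send;
  }

  push(chunk: Uint8Array): void {
    if (this.ready) {
      this.send(chunk);
    } else {
      this.pending.push(chunk);
    }
  }

  // Call once the provider's ready handshake has completed.
  markReady(): void {
    this.ready = true;
    for (const chunk of this.pending) {
      this.send(chunk);
    }
    this.pending = [];
  }

  // Call on socket close so new chunks are buffered until the next handshake.
  markDisconnected(): void {
    this.ready = false;
  }
}
```

Under this split, a provider would only translate upstream events into transcript updates; the queueing and retry timing stay in one shared place.
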
Batch STT providers that POST multipart audio should use
`buildAudioTranscriptionFormData(...)` from
`openclaw/plugin-sdk/provider-http` together with the provider HTTP request
helpers. The form helper normalizes upload filenames, including AAC uploads
that need an M4A-style filename for compatible transcription APIs.

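The filename normalization can be pictured with the standalone sketch below. It deliberately does not call `buildAudioTranscriptionFormData(...)`, whose signature is not shown here; `normalizeUploadFileName`, `sketchTranscriptionFormData`, the `file` and `model` field names, and the exact AAC detection rule are all assumptions made for illustration.

```typescript
// Hypothetical sketch. This is NOT the signature of buildAudioTranscriptionFormData.
interface SketchUpload {
  audio: Blob; // audio payload; its `type` carries the MIME type
  fileName: string; // caller-supplied filename, possibly ending in .aac
  model: string; // assumed form field; real providers may differ
}

// Assumed rule: AAC audio gets an .m4a filename so that APIs which infer
// the container from the extension accept the upload.
function normalizeUploadFileName(fileName: string, mimeType: string): string {
  const isAac =
    mimeType.toLowerCase() === "audio/aac" || /\.aac$/i.test(fileName);
  if (!isAac) {
    return fileName;
  }
  // Drop the last extension, if any, and fall back to a generic stem.
  const stem = fileName.replace(/\.[^./\\]+$/, "");
  return `${stem || "audio"}.m4a`;
}

// Builds a multipart body with the normalized filename attached to the file part.
function sketchTranscriptionFormData(upload: SketchUpload): FormData {
  const form = new FormData();
  form.append(
    "file",
    upload.audio,
    normalizeUploadFileName(upload.fileName, upload.audio.type),
  );
  form.append("model", upload.model);
  return form;
}
```

Keeping this rule in one shared helper means each batch provider does not need its own special case for AAC uploads.
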
Music-generation providers should follow the same pattern:
`generate` for prompt-only generation and `edit` for reference-image-based
generation. Flat aggregate fields such as `maxInputImages`,
||||