feat: add xai speech-to-text support

Peter Steinberger
2026-04-23 00:46:19 +01:00
parent 2bec189174
commit 012841816d
14 changed files with 307 additions and 30 deletions

@@ -41,7 +41,7 @@ This table shows which providers support which media capabilities across the pla
| Runway | | Yes | | | | |
| Together | | Yes | | | | |
| Vydra | Yes | Yes | | | | |
-| xAI | Yes | Yes | | Yes | | |
+| xAI | Yes | Yes | | Yes | Yes | Yes |
<Note>
Media understanding uses any vision-capable or audio-capable model registered in your provider config. The table above highlights providers with dedicated media-understanding support; most LLM providers with multimodal models (Anthropic, Google, OpenAI, etc.) can also understand inbound media when configured as the active reply model.
@@ -51,10 +51,10 @@ Media understanding uses any vision-capable or audio-capable model registered in
Video and music generation run as background tasks because provider processing typically takes 30 seconds to several minutes. When the agent calls `video_generate` or `music_generate`, OpenClaw submits the request to the provider, returns a task ID immediately, and tracks the job in the task ledger. The agent continues responding to other messages while the job runs. When the provider finishes, OpenClaw wakes the agent so it can post the finished media back into the original channel. Image generation and TTS are synchronous and complete inline with the reply.
-xAI currently maps to OpenClaw's image, video, search, code-execution, and
-batch TTS surfaces. xAI STT and Realtime voice are upstream capabilities, but
-they are not registered in OpenClaw until the shared transcription and realtime
-voice contracts can represent them.
+xAI currently maps to OpenClaw's image, video, search, code-execution, batch
+TTS, and batch STT surfaces. xAI streaming STT and Realtime voice are upstream
+capabilities, but they are not registered in OpenClaw until the shared
+streaming transcription and realtime voice contracts can represent them.
## Quick links