fix(xai): support video reference images

Peter Steinberger
2026-04-25 18:14:36 +01:00
parent 768bbc7cc0
commit 67506ac2a9
4 changed files with 195 additions and 11 deletions

@@ -132,12 +132,14 @@ Legacy aliases still normalize to the canonical bundled ids:
`video_generate` tool.
- Default video model: `xai/grok-imagine-video`
-- Modes: text-to-video, image-to-video, remote video edit, and remote video
-extension
+- Modes: text-to-video, image-to-video, reference-image generation, remote
+video edit, and remote video extension
- Aspect ratios: `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `3:2`, `2:3`
- Resolutions: `480P`, `720P`
-- Duration: 1-15 seconds for generation/image-to-video, 2-10 seconds for
-extension
+- Duration: 1-15 seconds for generation/image-to-video, 1-10 seconds when
+using `reference_image` roles, 2-10 seconds for extension
+- Reference-image generation: set `imageRoles` to `reference_image` for
+every supplied image; xAI accepts up to 7 such images
<Warning>
Local video buffers are not accepted. Use remote `http(s)` URLs for

@@ -97,7 +97,7 @@ Duplicate prevention: if a video task is already `queued` or `running` for the c
| Runway | `gen4.5` | Yes | 1 image | 1 video | `RUNWAYML_API_SECRET` |
| Together | `Wan-AI/Wan2.2-T2V-A14B` | Yes | 1 image | No | `TOGETHER_API_KEY` |
| Vydra | `veo3` | Yes | 1 image (`kling`) | No | `VYDRA_API_KEY` |
-| xAI | `grok-imagine-video` | Yes | 1 image | 1 video | `XAI_API_KEY` |
+| xAI | `grok-imagine-video` | Yes | 1 first-frame image or up to 7 `reference_image`s | 1 video | `XAI_API_KEY` |
Some providers accept additional or alternate API key env vars. See individual [provider pages](#related) for details.
@@ -150,7 +150,9 @@ Role hints are forwarded to the provider as-is. Canonical values come from
the `VideoGenerationAssetRole` union but providers may accept additional
role strings. `*Roles` arrays must not have more entries than the
corresponding reference list; off-by-one mistakes fail with a clear error.
-Use an empty string to leave a slot unset.
+Use an empty string to leave a slot unset. For xAI, set every image role to
+`reference_image` to use its `reference_images` generation mode; omit the role
+or use `first_frame` for single-image image-to-video.
### Style controls
@@ -326,7 +328,7 @@ entries.
</Accordion>
<Accordion title="xAI">
-Supports text-to-video, image-to-video, and remote video edit/extend flows.
+Supports text-to-video, single first-frame image-to-video, up to 7 `reference_image` inputs through xAI `reference_images`, and remote video edit/extend flows.
</Accordion>
</AccordionGroup>
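
The role rules documented in this change can be sketched as a small mode resolver. This is only an illustration of the documented behavior, not the code from this commit: `XaiVideoRequest`, `resolveXaiVideoMode`, and the field names `images` and `durationSeconds` are hypothetical, and rejecting mixed roles is an assumption (the docs only describe the all-`reference_image` and single first-frame cases).

```typescript
// Hypothetical types and names for illustration; not the repository's API.
type XaiVideoMode = "text-to-video" | "image-to-video" | "reference-images";

interface XaiVideoRequest {
  images: string[]; // remote http(s) image URLs
  imageRoles?: string[]; // optional role hints, parallel to `images`
  durationSeconds?: number;
}

// Limits taken from the documentation text in this change.
const MAX_REFERENCE_IMAGES = 7;
const MAX_DURATION_DEFAULT = 15;
const MAX_DURATION_REFERENCE = 10;

function resolveXaiVideoMode(req: XaiVideoRequest): XaiVideoMode {
  const roles = req.imageRoles ?? [];
  if (roles.length > req.images.length) {
    throw new Error("imageRoles must not have more entries than images");
  }

  let mode: XaiVideoMode;
  if (req.images.length === 0) {
    mode = "text-to-video";
  } else if (req.images.every((_, i) => roles[i] === "reference_image")) {
    // Every supplied image carries the reference_image role.
    if (req.images.length > MAX_REFERENCE_IMAGES) {
      throw new Error(`at most ${MAX_REFERENCE_IMAGES} reference images`);
    }
    mode = "reference-images";
  } else {
    // Assumption: anything else must be a single first-frame image.
    const role = roles[0] ?? "";
    if (req.images.length !== 1 || (role !== "" && role !== "first_frame")) {
      throw new Error("image-to-video expects one image with no role or first_frame");
    }
    mode = "image-to-video";
  }

  // Duration window: 1-15 s normally, 1-10 s with reference_image roles.
  const max = mode === "reference-images" ? MAX_DURATION_REFERENCE : MAX_DURATION_DEFAULT;
  const d = req.durationSeconds;
  if (d !== undefined && (d < 1 || d > max)) {
    throw new Error(`duration must be between 1 and ${max} seconds for ${mode}`);
  }
  return mode;
}
```

For example, two images with `imageRoles: ["reference_image", "reference_image"]` resolve to the reference-image mode, while one image with no role resolves to image-to-video.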