feat: add xai media providers
Add xAI image generation and text-to-speech provider support with docs, live tests, and guarded provider HTTP handling.

Thanks @KateWilkins.
@@ -9,6 +9,7 @@ Docs: https://docs.openclaw.ai
- OpenAI/Responses: use OpenAI's native `web_search` tool automatically for direct OpenAI Responses models when web search is enabled and no managed search provider is pinned; explicit providers such as Brave keep the managed `web_search` tool.
- ACPX: add an explicit `openClawToolsMcpBridge` option that injects a core OpenClaw MCP server for selected built-in tools, starting with `cron`.
- Providers/GPT-5: move the GPT-5 prompt overlay into the shared provider runtime so compatible GPT-5 models receive the same behavior and heartbeat guidance through OpenAI, OpenRouter, OpenCode, Codex, and other GPT providers; add `agents.defaults.promptOverlays.gpt5.personality` as the global friendly-style toggle while keeping the OpenAI plugin setting as a fallback.
- Providers/xAI: add image generation and text-to-speech support, including `grok-imagine-image` / `grok-imagine-image-pro`, reference-image edits, six live xAI voices, and MP3/WAV/PCM/G.711 TTS formats. (#68694) Thanks @KateWilkins.
- Models/commands: add `/models add <provider> <modelId>` so you can register a model from chat and use it without restarting the gateway; keep `/models` as a simple provider browser while adding clearer add guidance and copy-friendly command examples. (#70211) Thanks @Takhoffman.
- Pi/models: update the bundled pi packages to `0.68.1` and let the OpenCode Go catalog come from pi instead of plugin-maintained model aliases, adding the refreshed `opencode-go/kimi-k2.6`, Qwen, GLM, MiMo, and MiniMax entries.
- CLI/doctor plugins: lazy-load doctor plugin paths and prefer installed plugin `dist/*` runtime entries over source-adjacent JavaScript fallbacks, reducing the measured `doctor --non-interactive` runtime by about 74% while keeping cold doctor startup on built plugin artifacts. (#69840) Thanks @gumadeiras.
@@ -781,10 +781,11 @@ If you want to rely on env keys (e.g. exported in your `~/.profile`), run local
- Current bundled providers covered:
  - `openai`
  - `google`
  - `xai`
- Optional narrowing:
  - `OPENCLAW_LIVE_IMAGE_GENERATION_PROVIDERS="openai,google,xai"`
  - `OPENCLAW_LIVE_IMAGE_GENERATION_MODELS="openai/gpt-image-2,google/gemini-3.1-flash-image-preview,xai/grok-imagine-image"`
  - `OPENCLAW_LIVE_IMAGE_GENERATION_CASES="google:flash-generate,google:pro-edit,xai:default-generate,xai:default-edit"`
- Optional auth behavior:
  - `OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS=1` to force profile-store auth and ignore env-only overrides
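For example, a one-off run narrowed to just the xAI cases might look like this (a sketch; it assumes `XAI_API_KEY` is available and reuses the runtime live test path shown elsewhere in these docs):

```bash
OPENCLAW_LIVE_TEST=1 \
OPENCLAW_LIVE_IMAGE_GENERATION_PROVIDERS="xai" \
OPENCLAW_LIVE_IMAGE_GENERATION_MODELS="xai/grok-imagine-image" \
OPENCLAW_LIVE_IMAGE_GENERATION_CASES="xai:default-generate,xai:default-edit" \
pnpm test:live -- test/image-generation.runtime.live.test.ts
```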
@@ -63,6 +63,32 @@ they follow the same API shape.
current image-capable Grok refs in the bundled catalog.
</Tip>

## OpenClaw feature coverage

The bundled plugin maps xAI's current public API surface onto OpenClaw's shared
provider and tool contracts where the behavior fits cleanly.

| xAI capability             | OpenClaw surface                       | Status                                                              |
| -------------------------- | -------------------------------------- | ------------------------------------------------------------------- |
| Chat / Responses           | `xai/<model>` model provider           | Yes                                                                 |
| Server-side web search     | `web_search` provider `grok`           | Yes                                                                 |
| Server-side X search       | `x_search` tool                        | Yes                                                                 |
| Server-side code execution | `code_execution` tool                  | Yes                                                                 |
| Images                     | `image_generate`                       | Yes                                                                 |
| Videos                     | `video_generate`                       | Yes                                                                 |
| Batch text-to-speech       | `messages.tts.provider: "xai"` / `tts` | Yes                                                                 |
| Streaming TTS              | —                                      | Not exposed; OpenClaw's TTS contract returns complete audio buffers |
| Speech-to-text             | —                                      | Not exposed yet; needs a transcription provider surface             |
| Realtime voice             | —                                      | Not exposed yet; different session/WebSocket contract               |
| Files / batches            | Generic model API compatibility only   | Not a first-class OpenClaw tool                                     |

<Note>
OpenClaw uses xAI's REST image/video/TTS APIs for media generation and the
Responses API for model, search, and code-execution tools. Features that need
new OpenClaw contracts, such as streaming STT or Realtime voice sessions, are
documented here as upstream capabilities rather than hidden plugin behavior.
</Note>

### Fast-mode mappings

`/fast on` or `agents.defaults.models["xai/<model>"].params.fastMode: true`
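Expanded into config, that path looks roughly like this (a minimal sketch; `xai/grok-4` stands in for whichever Grok model you use):

```json5
{
  agents: {
    defaults: {
      models: {
        "xai/grok-4": {
          params: { fastMode: true },
        },
      },
    },
  },
}
```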
@@ -103,12 +129,17 @@ Legacy aliases still normalize to the canonical bundled ids:
`video_generate` tool.

- Default video model: `xai/grok-imagine-video`
- Modes: text-to-video, image-to-video, remote video edit, and remote video
  extension
- Aspect ratios: `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `3:2`, `2:3`
- Resolutions: `480P`, `720P`
- Duration: 1-15 seconds for generation/image-to-video, 2-10 seconds for
  extension

<Warning>
Local video buffers are not accepted. Use remote `http(s)` URLs for
video edit/extend inputs. Image-to-video accepts local image buffers because
OpenClaw can encode those as data URLs for xAI.
</Warning>

To use xAI as the default video provider:
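A minimal sketch, mirroring the image-provider config shown in the image accordion below; the `videoGenerationModel` key name is assumed by analogy and may differ in your gateway version:

```json5
{
  agents: {
    defaults: {
      videoGenerationModel: {
        primary: "xai/grok-imagine-video",
      },
    },
  },
}
```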
@@ -132,6 +163,82 @@ Legacy aliases still normalize to the canonical bundled ids:
</Accordion>

<Accordion title="Image generation">
The bundled `xai` plugin registers image generation through the shared
`image_generate` tool.

- Default image model: `xai/grok-imagine-image`
- Additional model: `xai/grok-imagine-image-pro`
- Modes: text-to-image and reference-image edit
- Reference inputs: one `image` or up to five `images`
- Aspect ratios: `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `2:3`, `3:2`
- Resolutions: `1K`, `2K`
- Count: up to 4 images

OpenClaw asks xAI for `b64_json` image responses so generated media can be
stored and delivered through the normal channel attachment path. Local
reference images are converted to data URLs; remote `http(s)` references are
passed through.

To use xAI as the default image provider:

```json5
{
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "xai/grok-imagine-image",
      },
    },
  },
}
```

<Note>
xAI also documents `quality`, `mask`, `user`, and additional native ratios
such as `1:2`, `2:1`, `9:20`, and `20:9`. OpenClaw forwards only the
shared cross-provider image controls today; unsupported native-only knobs
are intentionally not exposed through `image_generate`.
</Note>

</Accordion>

<Accordion title="Text-to-speech">
The bundled `xai` plugin registers text-to-speech through the shared `tts`
provider surface.

- Voices: `eve`, `ara`, `rex`, `sal`, `leo`, `una`
- Default voice: `eve`
- Formats: `mp3`, `wav`, `pcm`, `mulaw`, `alaw`
- Language: BCP-47 code or `auto`
- Speed: provider-native speed override
- Native Opus voice-note format is not supported

To use xAI as the default TTS provider:

```json5
{
  messages: {
    tts: {
      provider: "xai",
      providers: {
        xai: {
          voiceId: "eve",
        },
      },
    },
  },
}
```

<Note>
OpenClaw uses xAI's batch `/v1/tts` endpoint. xAI also offers streaming TTS
over WebSocket, but the OpenClaw speech provider contract currently expects
a complete audio buffer before reply delivery.
</Note>

</Accordion>

<Accordion title="x_search configuration">
The bundled xAI plugin exposes `x_search` as an OpenClaw tool for searching
X (formerly Twitter) content via Grok.
@@ -209,6 +316,12 @@ Legacy aliases still normalize to the canonical bundled ids:
- `grok-4.20-multi-agent-experimental-beta-0304` is not supported on the
  normal xAI provider path because it requires a different upstream API
  surface than the standard OpenClaw xAI transport.
- xAI STT and Realtime voice are not registered as OpenClaw providers yet.
  They require transcription/session contracts rather than the existing
  batch TTS provider shape.
- xAI image `quality`, image `mask`, and extra native-only aspect ratios are
  not exposed until the shared `image_generate` tool has corresponding
  cross-provider controls.
</Accordion>

<Accordion title="Advanced notes">
@@ -229,6 +342,23 @@ Legacy aliases still normalize to the canonical bundled ids:
</Accordion>
</AccordionGroup>

## Live testing

The xAI media paths are covered by unit tests and opt-in live suites. The live
commands load secrets from your login shell, including `~/.profile`, before
probing `XAI_API_KEY`.

```bash
pnpm test extensions/xai
OPENCLAW_LIVE_TEST=1 OPENCLAW_LIVE_TEST_QUIET=1 pnpm test:live -- extensions/xai/xai.live.test.ts
OPENCLAW_LIVE_TEST=1 OPENCLAW_LIVE_TEST_QUIET=1 OPENCLAW_LIVE_IMAGE_GENERATION_PROVIDERS=xai pnpm test:live -- test/image-generation.runtime.live.test.ts
```

The provider-specific live file synthesizes normal TTS, telephony-friendly PCM
TTS, text-to-image generation, and reference-image editing. The shared image
live file verifies the same xAI provider through OpenClaw's runtime selection,
fallback, normalization, and media attachment path.

## Related

<CardGroup cols={2}>
@@ -1,5 +1,5 @@
---
summary: "Generate and edit images using configured providers (OpenAI, Google Gemini, fal, MiniMax, ComfyUI, Vydra, xAI)"
read_when:
  - Generating images via the agent
  - Configuring image generation providers and models

@@ -46,6 +46,7 @@ The agent calls `image_generate` automatically. No tool allow-listing needed —

| MiniMax | `image-01` | Yes (subject reference) | `MINIMAX_API_KEY` or MiniMax OAuth (`minimax-portal`) |
| ComfyUI | `workflow` | Yes (1 image, workflow-configured) | `COMFY_API_KEY` or `COMFY_CLOUD_API_KEY` for cloud |
| Vydra | `grok-imagine` | No | `VYDRA_API_KEY` |
| xAI | `grok-imagine-image` | Yes (up to 5 images) | `XAI_API_KEY` |

Use `action: "list"` to inspect available providers and models at runtime:

@@ -115,13 +116,13 @@ Notes:

### Image editing

OpenAI, Google, fal, MiniMax, ComfyUI, and xAI support editing reference images. Pass a reference image path or URL:

```
"Generate a watercolor version of this photo" + image: "/path/to/photo.jpg"
```

OpenAI, Google, and xAI support up to 5 reference images via the `images` parameter. fal, MiniMax, and ComfyUI support 1.
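With multiple references the same pattern applies, for example (a sketch; the exact invocation shape depends on your client):

```
"Combine these two photos into one scene" + images: ["/path/to/photo-1.jpg", "/path/to/photo-2.jpg"]
```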
### OpenAI `gpt-image-2`

@@ -166,13 +167,29 @@ MiniMax image generation is available through both bundled MiniMax auth paths:

## Provider capabilities

| Capability            | OpenAI               | Google               | fal                 | MiniMax                    | ComfyUI                            | Vydra   | xAI                  |
| --------------------- | -------------------- | -------------------- | ------------------- | -------------------------- | ---------------------------------- | ------- | -------------------- |
| Generate              | Yes (up to 4)        | Yes (up to 4)        | Yes (up to 4)       | Yes (up to 9)              | Yes (workflow-defined outputs)     | Yes (1) | Yes (up to 4)        |
| Edit/reference        | Yes (up to 5 images) | Yes (up to 5 images) | Yes (1 image)       | Yes (1 image, subject ref) | Yes (1 image, workflow-configured) | No      | Yes (up to 5 images) |
| Size control          | Yes (up to 4K)       | Yes                  | Yes                 | No                         | No                                 | No      | No                   |
| Aspect ratio          | No                   | Yes                  | Yes (generate only) | Yes                        | No                                 | No      | Yes                  |
| Resolution (1K/2K/4K) | No                   | Yes                  | Yes                 | No                         | No                                 | No      | Yes (1K/2K)          |

### xAI `grok-imagine-image`

The bundled xAI provider uses `/v1/images/generations` for prompt-only requests
and `/v1/images/edits` when `image` or `images` is present.

- Models: `xai/grok-imagine-image`, `xai/grok-imagine-image-pro`
- Count: up to 4
- References: one `image` or up to five `images`
- Aspect ratios: `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `2:3`, `3:2`
- Resolutions: `1K`, `2K`
- Outputs: returned as OpenClaw-managed image attachments
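For reference, an edit request body assembled by the bundled provider looks roughly like this (illustrative values; reference images become `{ url, type: "image_url" }` entries, local files are inlined as data URLs, and the resolution is lowercased before sending):

```json5
{
  model: "grok-imagine-image",
  prompt: "Render this as a pencil sketch",
  n: 1,
  response_format: "b64_json",
  aspect_ratio: "1:1",
  resolution: "1k",
  image: { url: "data:image/png;base64,...", type: "image_url" },
}
```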
OpenClaw intentionally does not expose xAI-native `quality`, `mask`, `user`, or
extra native-only aspect ratios until those controls exist in the shared
cross-provider `image_generate` contract.

## Related

@@ -183,5 +200,6 @@ MiniMax image generation is available through both bundled MiniMax auth paths:

- [MiniMax](/providers/minimax) — MiniMax image provider setup
- [OpenAI](/providers/openai) — OpenAI Images provider setup
- [Vydra](/providers/vydra) — Vydra image, video, and speech setup
- [xAI](/providers/xai) — Grok image, video, search, code execution, and TTS setup
- [Configuration Reference](/gateway/configuration-reference#agent-defaults) — `imageGenerationModel` config
- [Models](/concepts/models) — model configuration and failover
@@ -15,10 +15,10 @@ OpenClaw generates images, videos, and music, understands inbound media (images,
| Capability           | Tool             | Providers                                                                                    | What it does                                             |
| -------------------- | ---------------- | -------------------------------------------------------------------------------------------- | --------------------------------------------------------- |
| Image generation     | `image_generate` | ComfyUI, fal, Google, MiniMax, OpenAI, Vydra, xAI                                            | Creates or edits images from text prompts or references  |
| Video generation     | `video_generate` | Alibaba, BytePlus, ComfyUI, fal, Google, MiniMax, OpenAI, Qwen, Runway, Together, Vydra, xAI | Creates videos from text, images, or existing videos     |
| Music generation     | `music_generate` | ComfyUI, Google, MiniMax                                                                     | Creates music or audio tracks from text prompts          |
| Text-to-speech (TTS) | `tts`            | ElevenLabs, Microsoft, MiniMax, OpenAI, xAI                                                  | Converts outbound replies to spoken audio                |
| Media understanding  | (automatic)      | Any vision/audio-capable model provider, plus CLI fallbacks                                  | Summarizes inbound images, audio, and video              |

## Provider capability matrix
@@ -41,7 +41,7 @@ This table shows which providers support which media capabilities across the pla
| Runway   |     | Yes |     |     |     |     |
| Together |     | Yes |     |     |     |     |
| Vydra    | Yes | Yes |     |     |     |     |
| xAI      | Yes | Yes |     | Yes |     |     |

<Note>
Media understanding uses any vision-capable or audio-capable model registered in your provider config. The table above highlights providers with dedicated media-understanding support; most LLM providers with multimodal models (Anthropic, Google, OpenAI, etc.) can also understand inbound media when configured as the active reply model.
@@ -51,6 +51,11 @@ Media understanding uses any vision-capable or audio-capable model registered in
Video and music generation run as background tasks because provider processing typically takes 30 seconds to several minutes. When the agent calls `video_generate` or `music_generate`, OpenClaw submits the request to the provider, returns a task ID immediately, and tracks the job in the task ledger. The agent continues responding to other messages while the job runs. When the provider finishes, OpenClaw wakes the agent so it can post the finished media back into the original channel. Image generation and TTS are synchronous and complete inline with the reply.

xAI currently maps to OpenClaw's image, video, search, code-execution, and
batch TTS surfaces. xAI STT and Realtime voice are upstream capabilities, but
they are not registered in OpenClaw until the shared transcription and realtime
voice contracts can represent them.

## Quick links

- [Image Generation](/tools/image-generation) -- generating and editing images
@@ -9,7 +9,7 @@ title: "Text-to-Speech"
# Text-to-speech (TTS)

OpenClaw can convert outbound replies into audio using ElevenLabs, Google Gemini, Microsoft, MiniMax, OpenAI, or xAI.
It works anywhere OpenClaw can send audio.

## Supported services
@@ -19,6 +19,7 @@ It works anywhere OpenClaw can send audio.
- **Microsoft** (primary or fallback provider; current bundled implementation uses `node-edge-tts`)
- **MiniMax** (primary or fallback provider; uses the T2A v2 API)
- **OpenAI** (primary or fallback provider; also used for summaries)
- **xAI** (primary or fallback provider; uses the xAI TTS API)

### Microsoft speech notes
@@ -35,12 +36,13 @@ or ElevenLabs.
## Optional keys

If you want OpenAI, ElevenLabs, Google Gemini, MiniMax, or xAI:

- `ELEVENLABS_API_KEY` (or `XI_API_KEY`)
- `GEMINI_API_KEY` (or `GOOGLE_API_KEY`)
- `MINIMAX_API_KEY`
- `OPENAI_API_KEY`
- `XAI_API_KEY`

Microsoft speech does **not** require an API key.
@@ -57,6 +59,7 @@ so that provider must also be authenticated if you enable summaries.
- [MiniMax T2A v2 API](https://platform.minimaxi.com/document/T2A%20V2)
- [node-edge-tts](https://github.com/SchneeHertz/node-edge-tts)
- [Microsoft Speech output formats](https://learn.microsoft.com/azure/ai-services/speech-service/rest-text-to-speech#audio-outputs)
- [xAI Text to Speech](https://docs.x.ai/developers/rest-api-reference/inference/voice#text-to-speech-rest)

## Is it enabled by default?
@@ -198,6 +201,33 @@ by the bundled Google image-generation provider. Resolution order is
`messages.tts.providers.google.apiKey` -> `models.providers.google.apiKey` ->
`GEMINI_API_KEY` -> `GOOGLE_API_KEY`.

### xAI primary

```json5
{
  messages: {
    tts: {
      auto: "always",
      provider: "xai",
      providers: {
        xai: {
          apiKey: "xai_api_key",
          voiceId: "eve",
          language: "en",
          responseFormat: "mp3",
          speed: 1.0,
        },
      },
    },
  },
}
```

xAI TTS uses the same `XAI_API_KEY` path as the bundled Grok model provider.
Resolution order is `messages.tts.providers.xai.apiKey` -> `XAI_API_KEY`.
Current live voices are `ara`, `eve`, `leo`, `rex`, `sal`, and `una`; `eve` is
the default. `language` accepts a BCP-47 tag or `auto`.

### Disable Microsoft speech

```json5
@@ -300,6 +330,12 @@ Then run:
- `providers.google.voiceName`: Gemini prebuilt voice name (default `Kore`; `voice` is also accepted).
- `providers.google.baseUrl`: override the Gemini API base URL. Only `https://generativelanguage.googleapis.com` is accepted.
- If `messages.tts.providers.google.apiKey` is omitted, TTS can reuse `models.providers.google.apiKey` before env fallback.
- `providers.xai.apiKey`: xAI TTS API key (env: `XAI_API_KEY`).
- `providers.xai.baseUrl`: override the xAI TTS base URL (default `https://api.x.ai/v1`, env: `XAI_BASE_URL`).
- `providers.xai.voiceId`: xAI voice id (default `eve`; current live voices: `ara`, `eve`, `leo`, `rex`, `sal`, `una`).
- `providers.xai.language`: BCP-47 language code or `auto` (default `en`).
- `providers.xai.responseFormat`: `mp3`, `wav`, `pcm`, `mulaw`, or `alaw` (default `mp3`).
- `providers.xai.speed`: provider-native speed override.
- `providers.microsoft.enabled`: allow Microsoft speech usage (default `true`; no API key).
- `providers.microsoft.voice`: Microsoft neural voice name (e.g. `en-US-MichelleNeural`).
- `providers.microsoft.lang`: language code (e.g. `en-US`).

@@ -335,7 +371,7 @@ Here you go.

Available directive keys (when enabled):

- `provider` (registered speech provider id, for example `openai`, `elevenlabs`, `google`, `minimax`, or `microsoft`; requires `allowProvider: true`)
- `voice` (OpenAI voice), `voiceName` / `voice_name` / `google_voice` (Google voice), or `voiceId` (ElevenLabs / MiniMax / xAI)
- `model` (OpenAI TTS model, ElevenLabs model id, or MiniMax model) or `google_model` (Google TTS model)
- `stability`, `similarityBoost`, `style`, `speed`, `useSpeakerBoost`
- `vol` / `volume` (MiniMax volume, 0-10)

@@ -397,6 +433,7 @@ These override `messages.tts.*` for that host.

- 44.1kHz / 128kbps is the default balance for speech clarity.
- **MiniMax**: MP3 (`speech-2.8-hd` model, 32kHz sample rate). Voice-note format not natively supported; use OpenAI or ElevenLabs for guaranteed Opus voice messages.
- **Google Gemini**: Gemini API TTS returns raw 24kHz PCM. OpenClaw wraps it as WAV for audio attachments and returns PCM directly for Talk/telephony. Native Opus voice-note format is not supported by this path.
- **xAI**: MP3 by default; `responseFormat` may be `mp3`, `wav`, `pcm`, `mulaw`, or `alaw`. OpenClaw uses xAI's batch REST TTS endpoint and returns a complete audio attachment; xAI's streaming TTS WebSocket is not used by this provider path. Native Opus voice-note format is not supported by this path.
- **Microsoft**: uses `microsoft.outputFormat` (default `audio-24khz-48kbitrate-mono-mp3`).
  - The bundled transport accepts an `outputFormat`, but not all formats are available from the service.
  - Output format values follow Microsoft Speech output formats (including Ogg/WebM Opus).
@@ -11,15 +11,18 @@ import { readStringValue } from "openclaw/plugin-sdk/text-runtime";
export { buildXaiProvider } from "./provider-catalog.js";
export { applyXaiConfig, applyXaiProviderConfig } from "./onboard.js";
export { buildXaiImageGenerationProvider } from "./image-generation-provider.js";
export {
  buildXaiCatalogModels,
  buildXaiModelDefinition,
  resolveXaiCatalogEntry,
  XAI_BASE_URL,
  XAI_DEFAULT_CONTEXT_WINDOW,
  XAI_DEFAULT_IMAGE_MODEL,
  XAI_DEFAULT_MODEL_ID,
  XAI_DEFAULT_MODEL_REF,
  XAI_DEFAULT_MAX_TOKENS,
  XAI_IMAGE_MODELS,
} from "./model-definitions.js";
export { isModernXaiModel, resolveXaiForwardCompatModel } from "./provider-models.js";
export {

@@ -88,3 +91,18 @@ export function resolveXaiTransport(params: {

    baseUrl: readStringValue(params.baseUrl),
  };
}

export function resolveXaiBaseUrl(baseUrlOrConfig?: unknown): string {
  let candidate = baseUrlOrConfig;
  if (
    baseUrlOrConfig &&
    typeof baseUrlOrConfig === "object" &&
    !Array.isArray(baseUrlOrConfig) &&
    "cfg" in baseUrlOrConfig
  ) {
    candidate =
      (baseUrlOrConfig as { cfg?: { models?: { providers?: { xai?: { baseUrl?: unknown } } } } })
        .cfg?.models?.providers?.xai?.baseUrl ?? baseUrlOrConfig;
  }
  return readStringValue(candidate) || "https://api.x.ai/v1";
}
extensions/xai/image-generation-provider.test.ts (new file, 177 lines)
@@ -0,0 +1,177 @@
|
||||
import { afterEach, describe, expect, it, vi } from "vitest";
|
||||
import { buildXaiImageGenerationProvider } from "./image-generation-provider.js";
|
||||
|
||||
const {
|
||||
resolveApiKeyForProviderMock,
|
||||
postJsonRequestMock,
|
||||
assertOkOrThrowHttpErrorMock,
|
||||
resolveProviderHttpRequestConfigMock,
|
||||
createProviderOperationDeadlineMock,
|
||||
resolveProviderOperationTimeoutMsMock,
|
||||
} = vi.hoisted(() => ({
|
||||
resolveApiKeyForProviderMock: vi.fn(async () => ({ apiKey: "xai-key" })),
|
||||
postJsonRequestMock: vi.fn(),
|
||||
assertOkOrThrowHttpErrorMock: vi.fn(async () => {}),
|
||||
resolveProviderHttpRequestConfigMock: vi.fn((params: Record<string, unknown>) => ({
|
||||
baseUrl: params.baseUrl ?? params.defaultBaseUrl ?? "https://api.x.ai/v1",
|
||||
allowPrivateNetwork: false,
|
||||
headers: new Headers(params.defaultHeaders as HeadersInit | undefined),
|
||||
dispatcherPolicy: undefined,
|
||||
})),
|
||||
createProviderOperationDeadlineMock: vi.fn((params: Record<string, unknown>) => ({
|
||||
timeoutMs: params.timeoutMs,
|
||||
label: params.label,
|
||||
})),
|
||||
resolveProviderOperationTimeoutMsMock: vi.fn(
|
||||
(params: Record<string, unknown>) => params.defaultTimeoutMs ?? 60000,
|
||||
),
|
||||
}));
|
||||
|
||||
vi.mock("openclaw/plugin-sdk/provider-auth-runtime", () => ({
|
||||
resolveApiKeyForProvider: resolveApiKeyForProviderMock,
|
||||
}));
|
||||
|
||||
vi.mock("openclaw/plugin-sdk/provider-http", () => ({
|
||||
assertOkOrThrowHttpError: assertOkOrThrowHttpErrorMock,
|
||||
createProviderOperationDeadline: createProviderOperationDeadlineMock,
|
||||
postJsonRequest: postJsonRequestMock,
|
||||
resolveProviderHttpRequestConfig: resolveProviderHttpRequestConfigMock,
|
||||
resolveProviderOperationTimeoutMs: resolveProviderOperationTimeoutMsMock,
|
||||
}));
|
||||
|
||||
vi.mock("openclaw/plugin-sdk/text-runtime", () => ({
|
||||
normalizeOptionalString: (v: unknown) => (typeof v === "string" ? v.trim() : undefined),
|
||||
normalizeOptionalLowercaseString: (v: unknown) =>
|
||||
typeof v === "string" ? v.trim().toLowerCase() : undefined,
|
||||
readStringValue: (v: unknown) => (typeof v === "string" ? v.trim() : undefined),
|
||||
}));
|
||||
|
||||
describe("xai image generation provider", () => {
|
||||
afterEach(() => {
|
||||
resolveApiKeyForProviderMock.mockClear();
|
||||
postJsonRequestMock.mockReset();
|
||||
assertOkOrThrowHttpErrorMock.mockClear();
|
||||
resolveProviderHttpRequestConfigMock.mockClear();
|
||||
createProviderOperationDeadlineMock.mockClear();
|
||||
resolveProviderOperationTimeoutMsMock.mockClear();
|
||||
});
|
||||
|
||||
it("builds provider with correct models, default, and capabilities", () => {
|
||||
const provider = buildXaiImageGenerationProvider();
|
||||
expect(provider.id).toBe("xai");
|
||||
expect(provider.label).toBe("xAI");
|
||||
expect(provider.defaultModel).toBe("grok-imagine-image");
|
||||
expect(provider.models).toEqual(["grok-imagine-image", "grok-imagine-image-pro"]);
|
||||
expect(provider.capabilities.generate.maxCount).toBe(4);
|
||||
expect(provider.capabilities.generate.supportsAspectRatio).toBe(true);
|
||||
expect(provider.capabilities.geometry?.aspectRatios).toEqual([
|
||||
"1:1",
|
||||
"16:9",
|
||||
"9:16",
|
||||
"4:3",
|
||||
"3:4",
|
||||
"2:3",
|
||||
"3:2",
|
||||
]);
|
||||
expect(provider.capabilities.edit.enabled).toBe(true);
|
||||
expect(provider.capabilities.edit.maxInputImages).toBe(5);
|
||||
expect(provider.isConfigured).toBeDefined();
|
||||
expect(provider.generateImage).toBeDefined();
|
||||
});
|
||||
|
||||
it("uses main provider URL and resolves auth for generation", async () => {
|
||||
postJsonRequestMock.mockResolvedValue({
|
||||
response: {
|
||||
json: async () => ({
|
||||
data: [{ b64_json: Buffer.from("testpng").toString("base64") }],
|
||||
}),
|
||||
},
|
||||
release: vi.fn(async () => {}),
|
||||
});
|
||||
|
||||
const provider = buildXaiImageGenerationProvider();
|
||||
await provider.generateImage({
|
||||
provider: "xai",
|
||||
model: "grok-imagine-image",
|
||||
prompt: "test prompt",
|
||||
aspectRatio: "2:3",
|
||||
resolution: "2K",
|
||||
cfg: {
|
||||
models: {
|
||||
providers: {
|
||||
xai: {
|
||||
baseUrl: "https://custom.x.ai/v1",
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
} as any);
|
||||
|
||||
expect(resolveApiKeyForProviderMock).toHaveBeenCalledWith(
|
||||
expect.objectContaining({ provider: "xai" }),
|
||||
);
|
||||
expect(resolveProviderHttpRequestConfigMock).toHaveBeenCalledWith(
|
||||
expect.objectContaining({
|
||||
provider: "xai",
|
||||
capability: "image",
|
||||
baseUrl: "https://custom.x.ai/v1",
|
||||
}),
|
||||
);
|
||||
expect(postJsonRequestMock).toHaveBeenCalledWith(
|
||||
expect.objectContaining({
|
||||
url: expect.stringContaining("/images/generations"),
|
||||
body: expect.objectContaining({
|
||||
aspect_ratio: "2:3",
|
||||
resolution: "2k",
|
||||
}),
|
||||
}),
|
||||
);
|
||||
});
|
||||
|
||||
it("supports edit with exact user-provided payload format including image object with type image_url", async () => {
|
||||
postJsonRequestMock.mockResolvedValue({
|
||||
response: {
|
||||
json: async () => ({
|
||||
data: [
|
||||
{
|
||||
b64_json:
|
||||
"iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNkYGD4z0ABAAEfAG0B0xMAAAAASUVORK5CYII=",
|
||||
mime_type: "image/png",
|
||||
},
|
||||
],
|
||||
}),
|
||||
},
|
||||
release: vi.fn(async () => {}),
|
||||
});
|
||||
|
||||
const provider = buildXaiImageGenerationProvider();
|
||||
const buffer = Buffer.from("fakeimage");
|
||||
await provider.generateImage({
|
||||
provider: "xai",
|
||||
model: "grok-imagine-image-pro",
|
||||
prompt: "Render this as a pencil sketch with detailed shading",
|
||||
inputImages: [
|
||||
{
|
||||
buffer,
|
||||
mimeType: "image/png",
|
||||
},
|
||||
],
|
||||
cfg: {},
|
||||
} as any);
|
||||
|
||||
expect(postJsonRequestMock).toHaveBeenCalledWith(
|
||||
expect.objectContaining({
|
||||
url: expect.stringContaining("/images/edits"),
|
||||
body: expect.objectContaining({
|
||||
model: "grok-imagine-image-pro",
|
||||
prompt: "Render this as a pencil sketch with detailed shading",
|
||||
image: {
|
||||
url: expect.stringContaining("data:image/png;base64,"),
|
||||
type: "image_url",
|
||||
},
|
||||
response_format: "b64_json",
|
||||
}),
|
||||
}),
|
||||
);
|
||||
});
|
||||
});
|
||||
extensions/xai/image-generation-provider.ts (new file, 220 lines)
@@ -0,0 +1,220 @@
import type {
  GeneratedImageAsset,
  ImageGenerationProvider,
  ImageGenerationRequest,
  ImageGenerationResult,
} from "openclaw/plugin-sdk/image-generation";
import { isProviderApiKeyConfigured } from "openclaw/plugin-sdk/provider-auth";
import { resolveApiKeyForProvider } from "openclaw/plugin-sdk/provider-auth-runtime";
import {
  assertOkOrThrowHttpError,
  createProviderOperationDeadline,
  postJsonRequest,
  resolveProviderHttpRequestConfig,
  resolveProviderOperationTimeoutMs,
} from "openclaw/plugin-sdk/provider-http";
import {
  normalizeOptionalLowercaseString,
  normalizeOptionalString,
} from "openclaw/plugin-sdk/text-runtime";
import { XAI_BASE_URL, XAI_DEFAULT_IMAGE_MODEL, XAI_IMAGE_MODELS } from "./model-definitions.js";

const DEFAULT_OUTPUT_MIME = "image/png";
const DEFAULT_TIMEOUT_MS = 60_000;

const XAI_SUPPORTED_ASPECT_RATIOS = ["1:1", "16:9", "9:16", "4:3", "3:4", "2:3", "3:2"] as const;

type XaiImageApiResponse = {
  data?: Array<{
    b64_json?: string;
    mime_type?: string;
    revised_prompt?: string;
  }>;
};

function toDataUrl(buffer: Buffer, mimeType: string): string {
  return `data:${mimeType};base64,${buffer.toString("base64")}`;
}

// Remote http(s) references pass through unchanged; local buffers are inlined as data URLs.
function resolveImageForEdit(
  input: { url?: string; buffer?: Buffer; mimeType?: string } | undefined,
): string {
  if (!input) {
    throw new Error("xAI image edit requires an input image.");
  }
  const url = normalizeOptionalString(input.url);
  if (url) {
    return url;
  }
  if (!input.buffer) {
    throw new Error("xAI image edit input is missing both URL and buffer data.");
  }
  const mime = normalizeOptionalString(input.mimeType) ?? "image/png";
  return toDataUrl(input.buffer, mime);
}

function isEdit(req: ImageGenerationRequest): boolean {
  return (req.inputImages?.length ?? 0) > 0;
}

function resolveXaiImageBaseUrl(req: ImageGenerationRequest): string {
  return normalizeOptionalString(req.cfg?.models?.providers?.xai?.baseUrl) ?? XAI_BASE_URL;
}

// Builds the JSON payload; reference images are attached as { url, type: "image_url" } objects,
// either a single `image` or an `images` array depending on how many were provided.
function buildBody(req: ImageGenerationRequest, edit: boolean): Record<string, unknown> {
  const model = normalizeOptionalString(req.model) ?? XAI_DEFAULT_IMAGE_MODEL;
  const count = req.count ?? 1;
  const body: Record<string, unknown> = {
    model,
    prompt: req.prompt,
    n: Math.min(count, 4),
    response_format: "b64_json" as const,
  };

  const aspect = normalizeOptionalString(req.aspectRatio);
  if (aspect && (XAI_SUPPORTED_ASPECT_RATIOS as readonly string[]).includes(aspect)) {
    body.aspect_ratio = aspect;
  }

  const resolution = normalizeOptionalLowercaseString(req.resolution);
  if (resolution) {
    body.resolution = resolution;
  }

  if (edit) {
    const inputImages = req.inputImages ?? [];
    if (inputImages.length > 1) {
      body.images = inputImages.map((input) => ({
        url: resolveImageForEdit(input),
        type: "image_url",
      }));
    } else {
      body.image = {
        url: resolveImageForEdit(inputImages[0]),
        type: "image_url",
      };
    }
  }

  return body;
}

export function buildXaiImageGenerationProvider(): ImageGenerationProvider {
  return {
    id: "xai",
    label: "xAI",
    defaultModel: XAI_DEFAULT_IMAGE_MODEL,
    models: [...XAI_IMAGE_MODELS],
    isConfigured: ({ agentDir }) =>
      isProviderApiKeyConfigured({
        provider: "xai",
        agentDir,
      }),
    capabilities: {
      generate: {
        maxCount: 4,
        supportsAspectRatio: true,
        supportsResolution: true,
        supportsSize: false,
      },
      edit: {
        enabled: true,
        maxCount: 4,
        maxInputImages: 5,
        supportsAspectRatio: true,
        supportsResolution: true,
        supportsSize: false,
      },
      geometry: {
        aspectRatios: [...XAI_SUPPORTED_ASPECT_RATIOS],
        resolutions: ["1K", "2K"],
      },
    },
    async generateImage(req: ImageGenerationRequest): Promise<ImageGenerationResult> {
      const edit = isEdit(req);
      const auth = await resolveApiKeyForProvider({
        provider: "xai",
        cfg: req.cfg,
        agentDir: req.agentDir,
        store: req.authStore,
      });
      if (!auth.apiKey) {
        throw new Error("xAI API key missing");
      }

      const fetchFn = fetch;
      const deadline = createProviderOperationDeadline({
        timeoutMs: req.timeoutMs,
        label: edit ? "xAI image edit" : "xAI image generation",
      });
      const {
        baseUrl: resolvedBaseUrl,
        allowPrivateNetwork,
        headers,
        dispatcherPolicy,
      } = resolveProviderHttpRequestConfig({
        baseUrl: resolveXaiImageBaseUrl(req),
        defaultBaseUrl: XAI_BASE_URL,
        allowPrivateNetwork: false,
        defaultHeaders: {
          Authorization: `Bearer ${auth.apiKey}`,
          "Content-Type": "application/json",
        },
        provider: "xai",
        capability: "image",
        transport: "http",
      });

      const body = buildBody(req, edit);
      const endpoint = edit ? "/images/edits" : "/images/generations";
      const { response, release } = await postJsonRequest({
        url: `${resolvedBaseUrl}${endpoint}`,
        headers,
        body,
        timeoutMs: resolveProviderOperationTimeoutMs({
          deadline,
          defaultTimeoutMs: DEFAULT_TIMEOUT_MS,
        }),
        fetchFn,
        allowPrivateNetwork,
        dispatcherPolicy,
      });

      try {
        await assertOkOrThrowHttpError(
          response,
          edit ? "xAI image edit failed" : "xAI image generation failed",
        );

        const payload = (await response.json()) as XaiImageApiResponse;
        const images: GeneratedImageAsset[] = (payload.data ?? []).flatMap((item, idx) => {
          if (!item) {
            return [];
          }
          const b64 = normalizeOptionalString(item.b64_json);
          if (!b64) {
            return [];
          }
          const mimeType = normalizeOptionalString(item.mime_type) ?? DEFAULT_OUTPUT_MIME;
          return [
            {
              buffer: Buffer.from(b64, "base64"),
              mimeType,
              fileName: `image-${idx + 1}.${mimeType.split("/")[1] || "png"}`,
              ...(item.revised_prompt
                ? { revisedPrompt: normalizeOptionalString(item.revised_prompt) }
                : {}),
            },
          ];
        });

        return {
          images,
          model: normalizeOptionalString(req.model) ?? XAI_DEFAULT_IMAGE_MODEL,
        };
      } finally {
        await release();
      }
    },
  };
}
@@ -5,6 +5,7 @@ import { defaultToolStreamExtraParams } from "openclaw/plugin-sdk/provider-strea
import { jsonResult, readProviderEnvValue } from "openclaw/plugin-sdk/provider-web-search";
import {
  applyXaiModelCompat,
  buildXaiImageGenerationProvider,
  normalizeXaiModelId,
  resolveXaiTransport,
  resolveXaiModelCompatPatch,

@@ -13,6 +14,7 @@ import {

import { applyXaiConfig, XAI_DEFAULT_MODEL_REF } from "./onboard.js";
import { buildXaiProvider } from "./provider-catalog.js";
import { isModernXaiModel, resolveXaiForwardCompatModel } from "./provider-models.js";
import { buildXaiSpeechProvider } from "./speech-provider.js";
import { resolveFallbackXaiAuth } from "./src/tool-auth-shared.js";
import { resolveEffectiveXSearchConfig } from "./src/x-search-config.js";
import { wrapXaiProviderStream } from "./stream.js";

@@ -203,6 +205,8 @@ export default defineSingleProviderPluginEntry({

  register(api) {
    api.registerWebSearchProvider(createXaiWebSearchProvider());
    api.registerVideoGenerationProvider(buildXaiVideoGenerationProvider());
    api.registerImageGenerationProvider(buildXaiImageGenerationProvider());
    api.registerSpeechProvider(buildXaiSpeechProvider());
    api.registerTool((ctx) => createLazyCodeExecutionTool(ctx), { name: "code_execution" });
    api.registerTool((ctx) => createLazyXSearchTool(ctx), { name: "x_search" });
  },
@@ -2,14 +2,16 @@ import type { ModelDefinitionConfig } from "openclaw/plugin-sdk/provider-model-s
import { normalizeOptionalLowercaseString } from "openclaw/plugin-sdk/text-runtime";

export const XAI_BASE_URL = "https://api.x.ai/v1";
export const XAI_DEFAULT_MODEL_ID = "grok-4";
export const XAI_DEFAULT_MODEL_REF = `xai/${XAI_DEFAULT_MODEL_ID}`;
export const XAI_DEFAULT_IMAGE_MODEL = "grok-imagine-image";
export const XAI_IMAGE_MODELS = ["grok-imagine-image", "grok-imagine-image-pro"] as const;
export const XAI_DEFAULT_CONTEXT_WINDOW = 256_000;
export const XAI_LARGE_CONTEXT_WINDOW = 2_000_000;
export const XAI_CODE_CONTEXT_WINDOW = 256_000;
export const XAI_DEFAULT_MAX_TOKENS = 64_000;
export const XAI_LEGACY_CONTEXT_WINDOW = 131_072;
export const XAI_LEGACY_MAX_TOKENS = 8_192;

type XaiCost = ModelDefinitionConfig["cost"];
@@ -85,6 +85,8 @@
  "contracts": {
    "webSearchProviders": ["grok"],
    "videoGenerationProviders": ["xai"],
    "speechProviders": ["xai"],
    "imageGenerationProviders": ["xai"],
    "tools": ["code_execution", "x_search"]
  },
  "configContracts": {
extensions/xai/speech-provider.test.ts (new file, 71 lines)
@@ -0,0 +1,71 @@
import { describe, expect, it, vi } from "vitest";
import { buildXaiSpeechProvider } from "./speech-provider.js";

const { xaiTTSMock } = vi.hoisted(() => ({
  xaiTTSMock: vi.fn(async () => Buffer.from("audio-bytes")),
}));

vi.mock("./tts.js", () => ({
  XAI_BASE_URL: "https://api.x.ai/v1",
  XAI_TTS_VOICES: ["eve", "ara", "rex", "sal", "leo", "una"],
  isValidXaiTtsVoice: (voice: string) => ["eve", "ara", "rex", "sal", "leo", "una"].includes(voice),
  normalizeXaiLanguageCode: (value: unknown) =>
    typeof value === "string" && value.trim() ? value.trim().toLowerCase() : undefined,
  normalizeXaiTtsBaseUrl: (baseUrl?: string) =>
    baseUrl?.trim().replace(/\/+$/, "") || "https://api.x.ai/v1",
  xaiTTS: xaiTTSMock,
}));

describe("xai speech provider", () => {
  it("synthesizes mp3 audio and does not claim native voice-note compatibility", async () => {
    const provider = buildXaiSpeechProvider();
    const result = await provider.synthesize({
      text: "hello",
      cfg: {},
      providerConfig: {
        apiKey: "xai-key",
        voiceId: "eve",
      },
      target: "voice-note",
      timeoutMs: 5_000,
    });

    expect(result).toMatchObject({
      outputFormat: "mp3",
      fileExtension: ".mp3",
      voiceCompatible: false,
    });
    expect(result.audioBuffer.byteLength).toBeGreaterThan(0);
    expect(xaiTTSMock).toHaveBeenCalledWith(
      expect.objectContaining({
        text: "hello",
        apiKey: "xai-key",
        baseUrl: "https://api.x.ai/v1",
        voiceId: "eve",
        responseFormat: "mp3",
      }),
    );
  });

  it("honors configured response formats", async () => {
    const provider = buildXaiSpeechProvider();
    const result = await provider.synthesize({
      text: "hello",
      cfg: {},
      providerConfig: {
        apiKey: "xai-key",
        responseFormat: "wav",
      },
      target: "audio-file",
      timeoutMs: 5_000,
    });

    expect(result.outputFormat).toBe("wav");
    expect(result.fileExtension).toBe(".wav");
    expect(xaiTTSMock).toHaveBeenLastCalledWith(
      expect.objectContaining({
        responseFormat: "wav",
      }),
    );
  });
});
extensions/xai/speech-provider.ts (new file, 251 lines)
@@ -0,0 +1,251 @@
import { normalizeResolvedSecretInputString } from "openclaw/plugin-sdk/secret-input";
import {
  asFiniteNumber,
  trimToUndefined,
  type SpeechDirectiveTokenParseContext,
  type SpeechProviderConfig,
  type SpeechProviderOverrides,
  type SpeechProviderPlugin,
} from "openclaw/plugin-sdk/speech";
import { normalizeLowercaseStringOrEmpty } from "openclaw/plugin-sdk/text-runtime";
import {
  isValidXaiTtsVoice,
  normalizeXaiLanguageCode,
  normalizeXaiTtsBaseUrl,
  XAI_BASE_URL,
  XAI_TTS_VOICES,
  xaiTTS,
} from "./tts.js";

const XAI_SPEECH_RESPONSE_FORMATS = ["mp3", "wav", "pcm", "mulaw", "alaw"] as const;

type XaiSpeechResponseFormat = (typeof XAI_SPEECH_RESPONSE_FORMATS)[number];

type XaiTtsProviderConfig = {
  apiKey?: string;
  baseUrl: string;
  voiceId: string;
  language?: string;
  speed?: number;
  responseFormat?: XaiSpeechResponseFormat;
};

type XaiTtsProviderOverrides = {
  voiceId?: string;
  language?: string;
  speed?: number;
};

function normalizeXaiSpeechResponseFormat(value: unknown): XaiSpeechResponseFormat | undefined {
  const next = normalizeLowercaseStringOrEmpty(value);
  if (!next) {
    return undefined;
  }
  if (XAI_SPEECH_RESPONSE_FORMATS.some((format) => format === next)) {
    return next as XaiSpeechResponseFormat;
  }
  throw new Error(`Invalid xAI speech responseFormat: ${next}`);
}

function resolveSpeechResponseFormat(
  target: "audio-file" | "voice-note",
  configuredFormat?: XaiSpeechResponseFormat,
): XaiSpeechResponseFormat {
  if (configuredFormat) {
    return configuredFormat;
  }
  return "mp3";
}

function responseFormatToFileExtension(
  format: XaiSpeechResponseFormat,
): ".mp3" | ".pcm" | ".wav" | ".mulaw" | ".alaw" {
  switch (format) {
    case "wav":
      return ".wav";
    case "pcm":
      return ".pcm";
    case "mulaw":
      return ".mulaw";
    case "alaw":
      return ".alaw";
    default:
      return ".mp3";
  }
}

function normalizeXaiProviderConfig(rawConfig: Record<string, unknown>): XaiTtsProviderConfig {
  const providers = rawConfig?.providers as Record<string, unknown> | undefined;
  const xai = (providers?.xai ?? rawConfig?.xai ?? rawConfig) as Record<string, unknown>;
  return {
    apiKey: normalizeResolvedSecretInputString({
      value: xai?.apiKey,
      path: "messages.tts.providers.xai.apiKey",
    }),
    baseUrl: normalizeXaiTtsBaseUrl(
      trimToUndefined(xai?.baseUrl) ?? trimToUndefined(process.env.XAI_BASE_URL) ?? XAI_BASE_URL,
    ),
    voiceId: trimToUndefined(xai?.voiceId ?? xai?.voice) ?? "eve",
    language: normalizeXaiLanguageCode(trimToUndefined(xai?.language ?? xai?.languageCode)),
    speed: asFiniteNumber(xai?.speed),
    responseFormat: normalizeXaiSpeechResponseFormat(xai?.responseFormat),
  };
}

function readXaiProviderConfig(config: SpeechProviderConfig): XaiTtsProviderConfig {
  const normalized = normalizeXaiProviderConfig({});
  return {
    apiKey: trimToUndefined(config.apiKey) ?? normalized.apiKey,
    baseUrl: trimToUndefined(config.baseUrl) ?? normalized.baseUrl,
    voiceId: trimToUndefined(config.voiceId ?? config.voice) ?? normalized.voiceId,
    language:
      normalizeXaiLanguageCode(trimToUndefined(config.language ?? config.languageCode)) ??
      normalized.language,
    speed: asFiniteNumber(config.speed) ?? normalized.speed,
    responseFormat:
      normalizeXaiSpeechResponseFormat(config.responseFormat) ?? normalized.responseFormat,
  };
}

function readXaiOverrides(overrides: SpeechProviderOverrides | undefined): XaiTtsProviderOverrides {
  if (!overrides) {
    return {};
  }
  return {
    voiceId: trimToUndefined(overrides.voiceId ?? overrides.voice),
    language: normalizeXaiLanguageCode(trimToUndefined(overrides.language)),
    speed: asFiniteNumber(overrides.speed),
  };
}

function parseDirectiveToken(ctx: SpeechDirectiveTokenParseContext): {
  handled: boolean;
  overrides?: SpeechProviderOverrides;
  warnings?: string[];
} {
  const providerConfig = ctx.providerConfig as Record<string, unknown> | undefined;
  const baseUrl = trimToUndefined(providerConfig?.baseUrl);
  switch (ctx.key) {
    case "voice":
    case "voice_id":
    case "voiceid":
    case "xai_voice":
    case "xaivoice":
      if (!ctx.policy.allowVoice) {
        return { handled: true };
      }
      if (!isValidXaiTtsVoice(ctx.value, baseUrl)) {
        return { handled: true, warnings: [`invalid xAI voice "${ctx.value}"`] };
      }
      return { handled: true, overrides: { voiceId: ctx.value } };
    default:
      return { handled: false };
  }
}

export function buildXaiSpeechProvider(): SpeechProviderPlugin {
  return {
    id: "xai",
    label: "xAI",
    autoSelectOrder: 25,
    models: [],
    voices: XAI_TTS_VOICES,
    resolveConfig: ({ rawConfig }) => normalizeXaiProviderConfig(rawConfig),
    parseDirectiveToken,
    resolveTalkConfig: ({ baseTtsConfig, talkProviderConfig }) => {
      const base = normalizeXaiProviderConfig(baseTtsConfig);
      const responseFormat = normalizeXaiSpeechResponseFormat(talkProviderConfig.responseFormat);
      return {
        ...base,
        ...(talkProviderConfig.apiKey === undefined
          ? {}
          : {
              apiKey: normalizeResolvedSecretInputString({
                value: talkProviderConfig.apiKey,
                path: "talk.providers.xai.apiKey",
              }),
            }),
        ...(trimToUndefined(talkProviderConfig.baseUrl) == null
          ? {}
          : { baseUrl: normalizeXaiTtsBaseUrl(trimToUndefined(talkProviderConfig.baseUrl)) }),
        ...(trimToUndefined(talkProviderConfig.voiceId) == null
          ? {}
          : { voiceId: trimToUndefined(talkProviderConfig.voiceId) }),
        ...(normalizeXaiLanguageCode(
          trimToUndefined(talkProviderConfig.language ?? talkProviderConfig.languageCode),
        ) == null
          ? {}
          : {
              language: normalizeXaiLanguageCode(
                trimToUndefined(talkProviderConfig.language ?? talkProviderConfig.languageCode),
              ),
            }),
        ...(asFiniteNumber(talkProviderConfig.speed) == null
          ? {}
          : { speed: asFiniteNumber(talkProviderConfig.speed) }),
        ...(responseFormat == null ? {} : { responseFormat }),
      };
    },
    resolveTalkOverrides: ({ params }) => ({
      ...(trimToUndefined(params.voiceId ?? params.voice) == null
        ? {}
        : { voiceId: trimToUndefined(params.voiceId ?? params.voice) }),
      ...(normalizeXaiLanguageCode(trimToUndefined(params.language ?? params.languageCode)) == null
        ? {}
        : {
            language: normalizeXaiLanguageCode(
              trimToUndefined(params.language ?? params.languageCode),
            ),
          }),
      ...(asFiniteNumber(params.speed) == null ? {} : { speed: asFiniteNumber(params.speed) }),
    }),
    listVoices: async () => XAI_TTS_VOICES.map((voice) => ({ id: voice, name: voice })),
    isConfigured: ({ providerConfig }) =>
      Boolean(readXaiProviderConfig(providerConfig).apiKey || process.env.XAI_API_KEY),
    synthesize: async (req) => {
      const config = readXaiProviderConfig(req.providerConfig);
      const overrides = readXaiOverrides(req.providerOverrides);
      const apiKey = config.apiKey || process.env.XAI_API_KEY;
      if (!apiKey) {
        throw new Error("xAI API key missing");
      }
      const responseFormat = resolveSpeechResponseFormat(req.target, config.responseFormat);
      const audioBuffer = await xaiTTS({
        text: req.text,
        apiKey,
        baseUrl: config.baseUrl,
        voiceId: overrides.voiceId ?? config.voiceId,
        language: overrides.language ?? config.language,
        speed: overrides.speed ?? config.speed,
        responseFormat,
        timeoutMs: req.timeoutMs,
      });
      return {
        audioBuffer,
        outputFormat: responseFormat,
        fileExtension: responseFormatToFileExtension(responseFormat),
        voiceCompatible: false,
      };
    },
    synthesizeTelephony: async (req) => {
      const config = readXaiProviderConfig(req.providerConfig);
      const apiKey = config.apiKey || process.env.XAI_API_KEY;
      if (!apiKey) {
        throw new Error("xAI API key missing");
      }
      const outputFormat = "pcm" as const;
      const sampleRate = 24000;
      const audioBuffer = await xaiTTS({
        text: req.text,
        apiKey,
        baseUrl: config.baseUrl,
        voiceId: config.voiceId,
        language: config.language,
        speed: config.speed,
        responseFormat: outputFormat,
        timeoutMs: req.timeoutMs,
      });
      return { audioBuffer, outputFormat, sampleRate };
    },
  };
}
extensions/xai/tts.test.ts (new file, 89 lines)
@@ -0,0 +1,89 @@
import { afterEach, describe, expect, it, vi } from "vitest";
import { isValidXaiTtsVoice, XAI_BASE_URL, XAI_TTS_VOICES, xaiTTS } from "./tts.js";

describe("xai tts", () => {
  const originalFetch = globalThis.fetch;

  afterEach(() => {
    globalThis.fetch = originalFetch;
    vi.restoreAllMocks();
  });

  describe("isValidXaiTtsVoice", () => {
    it("accepts all valid voices", () => {
      for (const voice of XAI_TTS_VOICES) {
        expect(isValidXaiTtsVoice(voice)).toBe(true);
      }
    });

    it("rejects invalid voice names", () => {
      expect(isValidXaiTtsVoice("invalid")).toBe(false);
      expect(isValidXaiTtsVoice("")).toBe(false);
      expect(isValidXaiTtsVoice("ALLOY")).toBe(false);
      expect(isValidXaiTtsVoice("alloy ")).toBe(false);
      expect(isValidXaiTtsVoice(" alloy")).toBe(false);
    });

    it("treats custom endpoints as permissive", () => {
      expect(isValidXaiTtsVoice("grok-voice-custom", "https://custom.api.x.ai/v1")).toBe(true);
    });
  });

  describe("xaiTTS diagnostics", () => {
    it("includes parsed provider detail and request id for JSON API errors", async () => {
      const fetchMock = vi.fn(
        async () =>
          new Response(
            JSON.stringify({
              error: {
                message: "Invalid API key",
                type: "invalid_request_error",
                code: "invalid_api_key",
              },
            }),
            {
              status: 401,
              headers: {
                "Content-Type": "application/json",
                "x-request-id": "req_123",
              },
            },
          ),
      );
      globalThis.fetch = fetchMock as unknown as typeof fetch;

      await expect(
        xaiTTS({
          text: "hello",
          apiKey: "bad-key",
          baseUrl: XAI_BASE_URL,
          voiceId: "eve",
          language: "en",
          responseFormat: "mp3",
          timeoutMs: 5_000,
        }),
      ).rejects.toThrow(
        "xAI TTS API error (401): Invalid API key [type=invalid_request_error, code=invalid_api_key] [request_id=req_123]",
      );
    });

    it("falls back to raw body text when the error body is non-JSON", async () => {
      const fetchMock = vi.fn(
        async () => new Response("temporary upstream outage", { status: 503 }),
      );
      globalThis.fetch = fetchMock as unknown as typeof fetch;

      await expect(
        xaiTTS({
          text: "hello",
          apiKey: "test-key",
          baseUrl: XAI_BASE_URL,
          voiceId: "eve",
          language: "en",
          responseFormat: "mp3",
          timeoutMs: 5_000,
        }),
      ).rejects.toThrow("xAI TTS API error (503): temporary upstream outage");
    });
  });
});
extensions/xai/tts.ts (new file, 148 lines)
@@ -0,0 +1,148 @@
import { postJsonRequest } from "openclaw/plugin-sdk/provider-http";
import {
  asObject,
  readResponseTextLimited,
  trimToUndefined,
  truncateErrorDetail,
} from "openclaw/plugin-sdk/speech";
import { XAI_BASE_URL } from "./api.js";
export { XAI_BASE_URL };

export const XAI_TTS_VOICES = ["eve", "ara", "rex", "sal", "leo", "una"] as const;

type XaiTtsVoice = (typeof XAI_TTS_VOICES)[number];

export function normalizeXaiTtsBaseUrl(baseUrl?: string): string {
  const trimmed = baseUrl?.trim();
  if (!trimmed) {
    return XAI_BASE_URL;
  }
  return trimmed.replace(/\/+$/, "");
}

export function isValidXaiTtsVoice(voice: string, baseUrl?: string): voice is XaiTtsVoice {
  const normalizedBase = normalizeXaiTtsBaseUrl(baseUrl ?? process.env.XAI_BASE_URL);
  const host = normalizedBase.includes("://") ? new URL(normalizedBase).hostname : normalizedBase;
  const isNative = host === "api.x.ai" || host === "api.grok.x.ai";
  if (!isNative) {
    return true;
  }
  return XAI_TTS_VOICES.includes(voice as XaiTtsVoice);
}

export function normalizeXaiLanguageCode(value: unknown): string | undefined {
  const trimmed = trimToUndefined(value);
  if (!trimmed) {
    return undefined;
  }
  const normalized = trimmed.toLowerCase();
  if (normalized === "auto" || /^[a-z]{2,3}(?:-[a-z]{2,4})?$/.test(normalized)) {
    return normalized;
  }
  throw new Error(
    `xAI language must be "auto" or a BCP-47 tag (e.g. "en", "pt-br", "zh-cn"); got: ${normalized}`,
  );
}

function formatXaiErrorPayload(payload: unknown): string | undefined {
  const root = asObject(payload);
  const subject = asObject(root?.error) ?? root;
  if (!subject) {
    return undefined;
  }
  const message =
    trimToUndefined(subject.message) ??
    trimToUndefined(subject.detail) ??
    trimToUndefined(root?.message);
  const type = trimToUndefined(subject.type);
  const code = trimToUndefined(subject.code);
  const metadata = [type ? `type=${type}` : undefined, code ? `code=${code}` : undefined]
    .filter((value): value is string => Boolean(value))
    .join(", ");
  if (message && metadata) {
    return `${truncateErrorDetail(message)} [${metadata}]`;
  }
  if (message) {
    return truncateErrorDetail(message);
  }
  if (metadata) {
    return `[${metadata}]`;
  }
  return undefined;
}

async function extractXaiErrorDetail(response: Response): Promise<string | undefined> {
  const rawBody = trimToUndefined(await readResponseTextLimited(response));
  if (!rawBody) {
    return undefined;
  }
  try {
    return formatXaiErrorPayload(JSON.parse(rawBody)) ?? truncateErrorDetail(rawBody);
  } catch {
    return truncateErrorDetail(rawBody);
  }
}

export async function xaiTTS(params: {
  text: string;
  apiKey: string;
  baseUrl: string;
  voiceId: string;
  language?: string;
  speed?: number;
  responseFormat?: "mp3" | "wav" | "pcm" | "mulaw" | "alaw";
  timeoutMs: number;
}): Promise<Buffer> {
  const {
    text,
    apiKey,
    baseUrl,
    voiceId,
    language: rawLanguage,
    speed,
    responseFormat = "mp3",
    timeoutMs,
  } = params;
  const language = normalizeXaiLanguageCode(rawLanguage) ?? "en";

  if (!isValidXaiTtsVoice(voiceId, baseUrl)) {
    throw new Error(`Invalid voice: ${voiceId}`);
  }

  const { response, release } = await postJsonRequest({
    url: `${normalizeXaiTtsBaseUrl(baseUrl)}/tts`,
    headers: new Headers({
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    }),
    body: {
      text,
      voice_id: voiceId,
      language,
      output_format: {
        codec: responseFormat,
      },
      ...(speed != null && { speed }),
    },
    timeoutMs,
    fetchFn: fetch,
    auditContext: "xai tts",
  });
  try {
    if (!response.ok) {
      const detail = await extractXaiErrorDetail(response);
      const requestId =
        trimToUndefined(response.headers.get("x-request-id")) ??
        trimToUndefined(response.headers.get("request-id"));
      throw new Error(
        `xAI TTS API error (${response.status})` +
          (detail ? `: ${detail}` : "") +
          (requestId ? ` [request_id=${requestId}]` : ""),
      );
    }

    return Buffer.from(await response.arrayBuffer());
  } finally {
    await release();
  }
}
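`xaiTTS` can also be exercised directly, outside the provider wiring shown earlier. The following is a minimal sketch, not part of this change: it assumes `XAI_API_KEY` is exported in the environment and that the `openclaw/plugin-sdk` modules imported by `tts.ts` resolve in your runtime; all parameter names mirror the signature above.

```ts
import fs from "node:fs/promises";
import { xaiTTS, XAI_BASE_URL } from "./tts.js";

// Sketch: synthesize a short MP3 clip with one of the bundled voices and save it to disk.
const audio = await xaiTTS({
  text: "Hello from the xAI TTS provider.",
  apiKey: process.env.XAI_API_KEY ?? "",
  baseUrl: XAI_BASE_URL,
  voiceId: "eve",
  language: "en",
  responseFormat: "mp3",
  timeoutMs: 30_000,
});
await fs.writeFile("hello.mp3", audio);
```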
extensions/xai/xai.live.test.ts (new file, 162 lines)
@@ -0,0 +1,162 @@
import fs from "node:fs/promises";
import os from "node:os";
import path from "node:path";
import { loadConfig, type OpenClawConfig } from "openclaw/plugin-sdk/config-runtime";
import { encodePngRgba, fillPixel } from "openclaw/plugin-sdk/media-runtime";
import { describe, expect, it } from "vitest";
import {
  registerProviderPlugin,
  requireRegisteredProvider,
} from "../../test/helpers/plugins/provider-registration.js";
import plugin from "./index.js";

const XAI_API_KEY = process.env.XAI_API_KEY ?? "";
const LIVE_IMAGE_MODEL = process.env.OPENCLAW_LIVE_XAI_IMAGE_MODEL?.trim() || "grok-imagine-image";
const liveEnabled = XAI_API_KEY.trim().length > 0 && process.env.OPENCLAW_LIVE_TEST === "1";
const describeLive = liveEnabled ? describe : describe.skip;
const EMPTY_AUTH_STORE = { version: 1, profiles: {} } as const;

function createLiveConfig(): OpenClawConfig {
  const cfg = loadConfig();
  return {
    ...cfg,
    models: {
      ...cfg.models,
      providers: {
        ...cfg.models?.providers,
        xai: {
          ...cfg.models?.providers?.xai,
          apiKey: XAI_API_KEY,
          baseUrl: "https://api.x.ai/v1",
        },
      },
    },
  } as OpenClawConfig;
}

function createReferencePng(): Buffer {
  const width = 96;
  const height = 96;
  const buf = Buffer.alloc(width * height * 4, 255);

  for (let y = 0; y < height; y += 1) {
    for (let x = 0; x < width; x += 1) {
      fillPixel(buf, x, y, width, 230, 244, 255, 255);
    }
  }

  for (let y = 24; y < 72; y += 1) {
    for (let x = 24; x < 72; x += 1) {
      fillPixel(buf, x, y, width, 255, 153, 51, 255);
    }
  }

  return encodePngRgba(buf, width, height);
}

async function createTempAgentDir(): Promise<string> {
  return await fs.mkdtemp(path.join(os.tmpdir(), "xai-plugin-live-"));
}

const registerXaiPlugin = () =>
  registerProviderPlugin({
    plugin,
    id: "xai",
    name: "xAI Provider",
  });

describeLive("xai plugin live", () => {
  it("synthesizes TTS through the registered speech provider", async () => {
    const { speechProviders } = await registerXaiPlugin();
    const speechProvider = requireRegisteredProvider(speechProviders, "xai");
    const cfg = createLiveConfig();

    const voices = await speechProvider.listVoices?.({});
    expect(voices).toEqual(expect.arrayContaining([expect.objectContaining({ id: "eve" })]));

    const audioFile = await speechProvider.synthesize({
      text: "OpenClaw xAI text to speech integration test OK.",
      cfg,
      providerConfig: {
        apiKey: XAI_API_KEY,
        baseUrl: "https://api.x.ai/v1",
        voiceId: "eve",
      },
      target: "audio-file",
      timeoutMs: 90_000,
    });

    expect(audioFile.outputFormat).toBe("mp3");
    expect(audioFile.fileExtension).toBe(".mp3");
    expect(audioFile.voiceCompatible).toBe(false);
    expect(audioFile.audioBuffer.byteLength).toBeGreaterThan(512);

    const telephony = await speechProvider.synthesizeTelephony?.({
      text: "OpenClaw xAI telephony check OK.",
      cfg,
      providerConfig: {
        apiKey: XAI_API_KEY,
        baseUrl: "https://api.x.ai/v1",
        voiceId: "eve",
      },
      timeoutMs: 90_000,
    });
    expect(telephony?.outputFormat).toBe("pcm");
    expect(telephony?.sampleRate).toBe(24_000);
    expect(telephony?.audioBuffer.byteLength).toBeGreaterThan(512);
  }, 120_000);

  it("generates and edits images through the registered image provider", async () => {
    const { imageProviders } = await registerXaiPlugin();
    const imageProvider = requireRegisteredProvider(imageProviders, "xai");
    const cfg = createLiveConfig();
    const agentDir = await createTempAgentDir();

    try {
      const generated = await imageProvider.generateImage({
        provider: "xai",
        model: LIVE_IMAGE_MODEL,
        prompt: "Create a minimal flat orange square centered on a white background.",
        cfg,
        agentDir,
        authStore: EMPTY_AUTH_STORE,
        timeoutMs: 180_000,
        count: 1,
        aspectRatio: "1:1",
        resolution: "1K",
      });

      expect(generated.model).toBe(LIVE_IMAGE_MODEL);
      expect(generated.images.length).toBeGreaterThan(0);
      expect(generated.images[0]?.mimeType.startsWith("image/")).toBe(true);
      expect(generated.images[0]?.buffer.byteLength).toBeGreaterThan(1_000);

      const edited = await imageProvider.generateImage({
        provider: "xai",
        model: LIVE_IMAGE_MODEL,
        prompt:
          "Render this image as a pencil sketch with detailed shading. Keep the same framing.",
        cfg,
        agentDir,
        authStore: EMPTY_AUTH_STORE,
        timeoutMs: 180_000,
        count: 1,
        resolution: "1K",
        inputImages: [
          {
            buffer: createReferencePng(),
            mimeType: "image/png",
            fileName: "reference.png",
          },
        ],
      });

      expect(edited.model).toBe(LIVE_IMAGE_MODEL);
      expect(edited.images.length).toBeGreaterThan(0);
      expect(edited.images[0]?.mimeType.startsWith("image/")).toBe(true);
      expect(edited.images[0]?.buffer.byteLength).toBeGreaterThan(1_000);
    } finally {
      await fs.rm(agentDir, { recursive: true, force: true });
    }
  }, 300_000);
});
@@ -33,7 +33,7 @@ export const MEDIA_SUITES: Record<MediaSuiteId, MediaSuiteConfig> = {
    id: "image",
    testFile: "test/image-generation.runtime.live.test.ts",
    providerEnvVar: "OPENCLAW_LIVE_IMAGE_GENERATION_PROVIDERS",
    providers: ["fal", "google", "minimax", "openai", "vydra"],
    providers: ["fal", "google", "minimax", "openai", "vydra", "xai"],
  },
  music: {
    id: "music",

@@ -16,6 +16,7 @@ export const DEFAULT_LIVE_IMAGE_MODELS: Record<string, string> = {
  minimax: "minimax/image-01",
  openai: "openai/gpt-image-2",
  vydra: "vydra/grok-imagine",
  xai: "xai/grok-imagine-image",
};

export function parseCaseFilter(raw?: string): Set<string> | null {

@@ -79,6 +79,11 @@ const PROVIDER_CASES: LiveProviderCase[] = [
    pluginName: "Vydra Provider",
    providerId: "vydra",
  },
  {
    pluginId: "xai",
    pluginName: "xAI Provider",
    providerId: "xai",
  },
]
  .filter((entry) => (providerFilter ? providerFilter.has(entry.providerId) : true))
  .toSorted((left, right) => left.providerId.localeCompare(right.providerId));

@@ -59,6 +59,7 @@ import { createRuntimeConfigVitestConfig } from "./vitest/vitest.runtime-config.
import { createScopedVitestConfig, resolveVitestIsolation } from "./vitest/vitest.scoped-config.ts";
import { createSecretsVitestConfig } from "./vitest/vitest.secrets.config.ts";
import { createSharedCoreVitestConfig } from "./vitest/vitest.shared-core.config.ts";
import { sharedVitestConfig } from "./vitest/vitest.shared.config.ts";
import { createTasksVitestConfig } from "./vitest/vitest.tasks.config.ts";
import { createToolingVitestConfig } from "./vitest/vitest.tooling.config.ts";
import { createTuiVitestConfig } from "./vitest/vitest.tui.config.ts";
@@ -321,7 +322,7 @@ describe("scoped vitest configs", () => {
  });

  it("keeps the broad agents lane on shared file parallelism", () => {
    expect(defaultAgentsConfig.test?.fileParallelism).toBe(true);
    expect(defaultAgentsConfig.test?.fileParallelism).toBe(sharedVitestConfig.test.fileParallelism);
  });

  it("keeps selected plugin-sdk and commands light lanes off the openclaw runtime setup", () => {