mirror of
https://github.com/openclaw/openclaw.git
synced 2026-05-06 13:30:42 +00:00
docs(providers): improve sglang, fal, groq, bedrock-mantle, vllm with Mintlify components
@@ -13,55 +13,95 @@ the Mantle OpenAI-compatible endpoint. Mantle hosts open-source and
third-party models (GPT-OSS, Qwen, Kimi, GLM, and similar) through a standard
`/v1/chat/completions` surface backed by Bedrock infrastructure.

## What OpenClaw supports

| Property       | Value                                                                               |
| -------------- | ----------------------------------------------------------------------------------- |
| Provider ID    | `amazon-bedrock-mantle`                                                             |
| API            | `openai-completions` (OpenAI-compatible)                                            |
| Auth           | Explicit `AWS_BEARER_TOKEN_BEDROCK` or IAM credential-chain bearer-token generation |
| Default region | `us-east-1` (override with `AWS_REGION` or `AWS_DEFAULT_REGION`)                    |

## Getting started

Choose your preferred auth method and follow the setup steps.

<Tabs>
<Tab title="Explicit bearer token">
**Best for:** environments where you already have a Mantle bearer token.

<Steps>
<Step title="Set the bearer token on the gateway host">
```bash
export AWS_BEARER_TOKEN_BEDROCK="..."
```

Optionally set a region (defaults to `us-east-1`):

```bash
export AWS_REGION="us-west-2"
```
</Step>
<Step title="Verify models are discovered">
```bash
openclaw models list
```

Discovered models appear under the `amazon-bedrock-mantle` provider. No
additional config is required unless you want to override defaults.
</Step>
</Steps>
</Tab>

<Tab title="IAM credentials">
**Best for:** using AWS SDK-compatible credentials (shared config, SSO, web identity, instance or task roles).

<Steps>
<Step title="Configure AWS credentials on the gateway host">
Any AWS SDK-compatible auth source works:

```bash
export AWS_PROFILE="default"
export AWS_REGION="us-west-2"
```
</Step>
<Step title="Verify models are discovered">
```bash
openclaw models list
```

OpenClaw generates a Mantle bearer token from the credential chain automatically.
</Step>
</Steps>

<Tip>
When `AWS_BEARER_TOKEN_BEDROCK` is not set, OpenClaw mints the bearer token for you from the AWS default credential chain, including shared credentials/config profiles, SSO, web identity, and instance or task roles.
</Tip>
</Tab>
</Tabs>
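The precedence between the two auth paths, and between the region variables, can be sketched as below. This is an illustrative sketch only; the function names are not OpenClaw internals.

```python
def resolve_bedrock_auth(env: dict) -> str:
    """Pick the Mantle auth path described above: a non-empty explicit
    bearer token wins; otherwise fall back to the AWS credential chain."""
    if env.get("AWS_BEARER_TOKEN_BEDROCK"):
        return "explicit-bearer-token"
    return "iam-credential-chain"


def resolve_region(env: dict) -> str:
    # AWS_REGION takes precedence over AWS_DEFAULT_REGION; default is us-east-1.
    return env.get("AWS_REGION") or env.get("AWS_DEFAULT_REGION") or "us-east-1"
```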

## Automatic model discovery

When `AWS_BEARER_TOKEN_BEDROCK` is set, OpenClaw uses it directly. Otherwise,
OpenClaw attempts to generate a Mantle bearer token from the AWS default
credential chain, including shared credentials/config profiles, SSO, web
identity, and instance or task roles. It then discovers available Mantle
models by querying the region's `/v1/models` endpoint. Discovery results are
cached for 1 hour, and IAM-derived bearer tokens are refreshed hourly.

| Behavior          | Detail                    |
| ----------------- | ------------------------- |
| Discovery cache   | Results cached for 1 hour |
| IAM token refresh | Hourly                    |
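The 1-hour discovery cache can be sketched as a simple TTL wrapper. This is illustrative only, assuming a refetch once the entry is older than one hour; it is not OpenClaw's actual implementation.

```python
import time


class DiscoveryCache:
    """Cache discovery results and refetch after the TTL elapses."""

    TTL_SECONDS = 3600  # 1 hour, matching the table above

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._models = None
        self._fetched_at = None

    def get(self, fetch):
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self.TTL_SECONDS:
            # Entry missing or stale: call the provider's /v1/models fetcher.
            self._models = fetch()
            self._fetched_at = now
        return self._models
```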

<Note>
The bearer token is the same `AWS_BEARER_TOKEN_BEDROCK` used by the standard [Amazon Bedrock](/providers/bedrock) provider.
</Note>

### Supported regions

`us-east-1`, `us-east-2`, `us-west-2`, `ap-northeast-1`,
`ap-south-1`, `ap-southeast-3`, `eu-central-1`, `eu-west-1`, `eu-west-2`,
`eu-south-1`, `eu-north-1`, `sa-east-1`.
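A configured region can be validated against this list before startup. The helper below is a hypothetical sketch, not part of OpenClaw.

```python
SUPPORTED_MANTLE_REGIONS = {
    "us-east-1", "us-east-2", "us-west-2", "ap-northeast-1",
    "ap-south-1", "ap-southeast-3", "eu-central-1", "eu-west-1",
    "eu-west-2", "eu-south-1", "eu-north-1", "sa-east-1",
}


def check_region(region: str) -> str:
    # Fail fast with a readable error instead of a runtime discovery failure.
    if region not in SUPPORTED_MANTLE_REGIONS:
        raise ValueError(f"region {region!r} is not a supported Mantle region")
    return region
```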

## Manual configuration

If you prefer explicit config instead of auto-discovery:

@@ -92,13 +132,46 @@ If you prefer explicit config instead of auto-discovery:
}
```

## Advanced notes

<AccordionGroup>
<Accordion title="Reasoning support">
Reasoning support is inferred from model IDs containing patterns like
`thinking`, `reasoner`, or `gpt-oss-120b`. OpenClaw sets `reasoning: true`
automatically for matching models during discovery.
</Accordion>

<Accordion title="Endpoint unavailability">
If the Mantle endpoint is unavailable or returns no models, the provider is
silently skipped. OpenClaw does not error; other configured providers
continue to work normally.
</Accordion>

<Accordion title="Relationship to Amazon Bedrock provider">
Bedrock Mantle is a separate provider from the standard
[Amazon Bedrock](/providers/bedrock) provider. Mantle uses an
OpenAI-compatible `/v1` surface, while the standard Bedrock provider uses
the native Bedrock API.

Both providers share the same `AWS_BEARER_TOKEN_BEDROCK` credential when
present.
</Accordion>
</AccordionGroup>

## Related

<CardGroup cols={2}>
<Card title="Amazon Bedrock" href="/providers/bedrock" icon="cloud">
Native Bedrock provider for Anthropic Claude, Titan, and other models.
</Card>
<Card title="Model selection" href="/concepts/model-providers" icon="layers">
Choosing providers, model refs, and failover behavior.
</Card>
<Card title="OAuth and auth" href="/gateway/authentication" icon="key">
Auth details and credential reuse rules.
</Card>
<Card title="Troubleshooting" href="/help/troubleshooting" icon="wrench">
Common issues and how to resolve them.
</Card>
</CardGroup>

@@ -11,42 +11,51 @@ read_when:

OpenClaw ships a bundled `fal` provider for hosted image and video generation.

| Property | Value                                                         |
| -------- | ------------------------------------------------------------- |
| Provider | `fal`                                                         |
| Auth     | `FAL_KEY` (canonical; `FAL_API_KEY` also works as a fallback) |
| API      | fal model endpoints                                           |

## Getting started

<Steps>
<Step title="Set the API key">
```bash
openclaw onboard --auth-choice fal-api-key
```
</Step>
<Step title="Set a default image model">
```json5
{
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "fal/fal-ai/flux/dev",
      },
    },
  },
}
```
</Step>
</Steps>

## Image generation

The bundled `fal` image-generation provider defaults to
`fal/fal-ai/flux/dev`.

| Capability     | Value                      |
| -------------- | -------------------------- |
| Max images     | 4 per request              |
| Edit mode      | Enabled, 1 reference image |
| Size overrides | Supported                  |
| Aspect ratio   | Supported                  |
| Resolution     | Supported                  |

<Warning>
The fal image edit endpoint does **not** support `aspectRatio` overrides.
</Warning>
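Client code that shares one code path for generate and edit requests can drop `aspectRatio` when editing, matching the caveat above. The parameter names here are illustrative, not the fal API schema.

```python
def build_image_params(prompt, *, edit=False, aspect_ratio=None, size=None):
    """Assemble request parameters; the edit path omits aspectRatio
    because the fal edit endpoint does not support it."""
    params = {"prompt": prompt}
    if size is not None:
        params["size"] = size
    if aspect_ratio is not None and not edit:
        params["aspectRatio"] = aspect_ratio
    return params
```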

To use fal as the default image provider:

@@ -67,46 +76,70 @@ To use fal as the default image provider:
The bundled `fal` video-generation provider defaults to
`fal/fal-ai/minimax/video-01-live`.
| Capability | Value                                                        |
| ---------- | ------------------------------------------------------------ |
| Modes      | Text-to-video, single-image reference                        |
| Runtime    | Queue-backed submit/status/result flow for long-running jobs |

<AccordionGroup>
<Accordion title="Available video models">
**HeyGen video-agent:**

- `fal/fal-ai/heygen/v2/video-agent`

**Seedance 2.0:**

- `fal/bytedance/seedance-2.0/fast/text-to-video`
- `fal/bytedance/seedance-2.0/fast/image-to-video`
- `fal/bytedance/seedance-2.0/text-to-video`
- `fal/bytedance/seedance-2.0/image-to-video`
</Accordion>

<Accordion title="Seedance 2.0 config example">
```json5
{
  agents: {
    defaults: {
      videoGenerationModel: {
        primary: "fal/bytedance/seedance-2.0/fast/text-to-video",
      },
    },
  },
}
```
</Accordion>

<Accordion title="HeyGen video-agent config example">
```json5
{
  agents: {
    defaults: {
      videoGenerationModel: {
        primary: "fal/fal-ai/heygen/v2/video-agent",
      },
    },
  },
}
```
</Accordion>
</AccordionGroup>

<Tip>
Use `openclaw models list --provider fal` to see the full list of available fal
models, including any recently added entries.
</Tip>

## Related

<CardGroup cols={2}>
<Card title="Image generation" href="/tools/image-generation" icon="image">
Shared image tool parameters and provider selection.
</Card>
<Card title="Video generation" href="/tools/video-generation" icon="video">
Shared video tool parameters and provider selection.
</Card>
<Card title="Configuration reference" href="/gateway/configuration-reference#agent-defaults" icon="gear">
Agent defaults including image and video model selection.
</Card>
</CardGroup>

@@ -12,33 +12,37 @@ read_when:
(Llama, Gemma, Mistral, and more) using custom LPU hardware. OpenClaw connects
to Groq through its OpenAI-compatible API.

| Property | Value             |
| -------- | ----------------- |
| Provider | `groq`            |
| Auth     | `GROQ_API_KEY`    |
| API      | OpenAI-compatible |

## Getting started

<Steps>
<Step title="Get an API key">
Create an API key at [console.groq.com/keys](https://console.groq.com/keys).
</Step>
<Step title="Set the API key">
```bash
export GROQ_API_KEY="gsk_..."
```
</Step>
<Step title="Set a default model">
```json5
{
  agents: {
    defaults: {
      model: { primary: "groq/llama-3.3-70b-versatile" },
    },
  },
}
```
</Step>
</Steps>

### Config file example

```json5
{
@@ -51,6 +55,24 @@ export GROQ_API_KEY="gsk_..."
}
```

## Available models

Groq's model catalog changes frequently. Run `openclaw models list | grep groq`
to see currently available models, or check
[console.groq.com/docs/models](https://console.groq.com/docs/models).

| Model                       | Notes                              |
| --------------------------- | ---------------------------------- |
| **Llama 3.3 70B Versatile** | General-purpose, large context     |
| **Llama 3.1 8B Instant**    | Fast, lightweight                  |
| **Gemma 2 9B**              | Compact, efficient                 |
| **Mixtral 8x7B**            | MoE architecture, strong reasoning |

<Tip>
Use `openclaw models list --provider groq` for the most up-to-date list of
models available on your account.
</Tip>

## Audio transcription

Groq also provides fast Whisper-based audio transcription. When configured as a
@@ -70,36 +92,43 @@ surface.
}
```

<AccordionGroup>
<Accordion title="Audio transcription details">
| Property           | Value                                     |
| ------------------ | ----------------------------------------- |
| Shared config path | `tools.media.audio`                       |
| Default base URL   | `https://api.groq.com/openai/v1`          |
| Default model      | `whisper-large-v3-turbo`                  |
| API endpoint       | OpenAI-compatible `/audio/transcriptions` |
</Accordion>

<Accordion title="Environment note">
If the Gateway runs as a daemon (launchd/systemd), make sure `GROQ_API_KEY` is
available to that process (for example, in `~/.openclaw/.env` or via
`env.shellEnv`).

<Warning>
Keys set only in your interactive shell are not visible to daemon-managed
gateway processes. Use `~/.openclaw/.env` or `env.shellEnv` config for
persistent availability.
</Warning>
</Accordion>
</AccordionGroup>
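Using the defaults from the table above, an OpenAI-compatible transcription request can be described as a URL plus form fields. This is a shape-only sketch with an illustrative helper name; it does not perform the upload.

```python
def build_transcription_request(file_name: str,
                                base_url: str = "https://api.groq.com/openai/v1") -> dict:
    """Describe a request against the OpenAI-compatible
    /audio/transcriptions path using Groq's default audio model."""
    return {
        "url": f"{base_url}/audio/transcriptions",
        "data": {"model": "whisper-large-v3-turbo"},
        "files": {"file": file_name},
    }
```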

## Related

<CardGroup cols={2}>
<Card title="Model selection" href="/concepts/model-providers" icon="layers">
Choosing providers, model refs, and failover behavior.
</Card>
<Card title="Configuration reference" href="/gateway/configuration-reference" icon="gear">
Full config schema including provider and audio settings.
</Card>
<Card title="Groq Console" href="https://console.groq.com" icon="arrow-up-right-from-square">
Groq dashboard, API docs, and pricing.
</Card>
<Card title="Groq model list" href="https://console.groq.com/docs/models" icon="list">
Official Groq model catalog.
</Card>
</CardGroup>

@@ -15,36 +15,44 @@ OpenClaw can also **auto-discover** available models from SGLang when you opt
in with `SGLANG_API_KEY` (any value works if your server does not enforce auth)
and you do not define an explicit `models.providers.sglang` entry.

## Getting started

<Steps>
<Step title="Start SGLang">
Launch SGLang with an OpenAI-compatible server. Your base URL should expose
`/v1` endpoints (for example `/v1/models`, `/v1/chat/completions`). SGLang
commonly runs on:

- `http://127.0.0.1:30000/v1`
</Step>
<Step title="Set an API key">
Any value works if no auth is configured on your server:

```bash
export SGLANG_API_KEY="sglang-local"
```
</Step>
<Step title="Run onboarding or set a model directly">
```bash
openclaw onboard
```

Or configure the model manually:

```json5
{
  agents: {
    defaults: {
      model: { primary: "sglang/your-model-id" },
    },
  },
}
```
</Step>
</Steps>

## Model discovery (implicit provider)

@@ -55,8 +63,10 @@ define `models.providers.sglang`, OpenClaw will query:

and convert the returned IDs into model entries.

<Note>
If you set `models.providers.sglang` explicitly, auto-discovery is skipped and
you must define models manually.
</Note>

## Explicit configuration (manual models)

@@ -91,25 +101,52 @@ Use explicit config when:
}
```

## Advanced configuration

<AccordionGroup>
<Accordion title="Proxy-style behavior">
SGLang is treated as a proxy-style OpenAI-compatible `/v1` backend, not a
native OpenAI endpoint.

| Behavior                                                           | SGLang                                  |
| ------------------------------------------------------------------ | --------------------------------------- |
| OpenAI-only request shaping                                        | Not applied                             |
| `service_tier`, Responses `store`, prompt-cache hints              | Not sent                                |
| Reasoning-compat payload shaping                                   | Not applied                             |
| Hidden attribution headers (`originator`, `version`, `User-Agent`) | Not injected on custom SGLang base URLs |
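The "not sent" rows above amount to stripping OpenAI-only fields before forwarding a payload to a proxy-style backend. The sketch below is illustrative; the exact field set and any names beyond `service_tier` and `store` are assumptions, not OpenClaw internals.

```python
# Fields assumed to be OpenAI-only for this sketch.
OPENAI_ONLY_FIELDS = {"service_tier", "store", "prompt_cache_key"}


def shape_payload(payload: dict, proxy_backend: bool) -> dict:
    """Return a copy of the payload; for proxy-style backends such as
    SGLang, drop fields that only the native OpenAI endpoint accepts."""
    if not proxy_backend:
        return dict(payload)
    return {k: v for k, v in payload.items() if k not in OPENAI_ONLY_FIELDS}
```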

</Accordion>

<Accordion title="Troubleshooting">
**Server not reachable**

Verify the server is running and responding:

```bash
curl http://127.0.0.1:30000/v1/models
```

**Auth errors**

If requests fail with auth errors, set a real `SGLANG_API_KEY` that matches
your server configuration, or configure the provider explicitly under
`models.providers.sglang`.

<Tip>
If you run SGLang without authentication, any non-empty value for
`SGLANG_API_KEY` is sufficient to opt in to model discovery.
</Tip>
</Accordion>
</AccordionGroup>

## Related

<CardGroup cols={2}>
<Card title="Model selection" href="/concepts/model-providers" icon="layers">
Choosing providers, model refs, and failover behavior.
</Card>
<Card title="Configuration reference" href="/gateway/configuration-reference" icon="gear">
Full config schema including provider entries.
</Card>
</CardGroup>

@@ -8,53 +8,78 @@ title: "vLLM"

# vLLM

vLLM can serve open-source (and some custom) models via an **OpenAI-compatible** HTTP API. OpenClaw connects to vLLM using the `openai-completions` API.

OpenClaw can also **auto-discover** available models from vLLM when you opt in with `VLLM_API_KEY` (any value works if your server does not enforce auth) and you do not define an explicit `models.providers.vllm` entry.

| Property         | Value                                    |
| ---------------- | ---------------------------------------- |
| Provider ID      | `vllm`                                   |
| API              | `openai-completions` (OpenAI-compatible) |
| Auth             | `VLLM_API_KEY` environment variable      |
| Default base URL | `http://127.0.0.1:8000/v1`               |

## Getting started

<Steps>
<Step title="Start vLLM with an OpenAI-compatible server">
Your base URL should expose `/v1` endpoints (e.g. `/v1/models`, `/v1/chat/completions`). vLLM commonly runs on:

```
http://127.0.0.1:8000/v1
```
</Step>
<Step title="Set the API key environment variable">
Any value works if your server does not enforce auth:

```bash
export VLLM_API_KEY="vllm-local"
```
</Step>
<Step title="Select a model">
Replace with one of your vLLM model IDs:

```json5
{
  agents: {
    defaults: {
      model: { primary: "vllm/your-model-id" },
    },
  },
}
```
</Step>
<Step title="Verify the model is available">
```bash
openclaw models list --provider vllm
```
</Step>
</Steps>

## Model discovery (implicit provider)

When `VLLM_API_KEY` is set (or an auth profile exists) and you **do not** define `models.providers.vllm`, OpenClaw queries:

```
GET http://127.0.0.1:8000/v1/models
```

and converts the returned IDs into model entries.

<Note>
If you set `models.providers.vllm` explicitly, auto-discovery is skipped and you must define models manually.
</Note>
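The conversion step above takes the standard `/v1/models` list response and turns each ID into a provider-prefixed model entry. The entry shape in this sketch is a simplified illustration, not OpenClaw's actual schema.

```python
def models_from_discovery(response: dict, provider: str = "vllm") -> list:
    """Convert a /v1/models response body ({"data": [{"id": ...}, ...]})
    into model entries with provider-prefixed refs."""
    return [
        {"id": m["id"], "ref": f"{provider}/{m['id']}"}
        for m in response.get("data", [])
    ]
```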

## Explicit configuration (manual models)

Use explicit config when:

- vLLM runs on a different host or port
- You want to pin `contextWindow` or `maxTokens` values
- Your server requires a real API key (or you want to control headers)

```json5
{
@@ -81,23 +106,99 @@ Use explicit config when:
}
```

## Advanced notes

<AccordionGroup>
<Accordion title="Proxy-style behavior">
vLLM is treated as a proxy-style OpenAI-compatible `/v1` backend, not a native
OpenAI endpoint. This means:

| Behavior                                | Applied?                         |
| --------------------------------------- | -------------------------------- |
| Native OpenAI request shaping           | No                               |
| `service_tier`                          | Not sent                         |
| Responses `store`                       | Not sent                         |
| Prompt-cache hints                      | Not sent                         |
| OpenAI reasoning-compat payload shaping | Not applied                      |
| Hidden OpenClaw attribution headers     | Not injected on custom base URLs |
</Accordion>

<Accordion title="Custom base URL">
If your vLLM server runs on a non-default host or port, set `baseUrl` in the explicit provider config:

```json5
{
  models: {
    providers: {
      vllm: {
        baseUrl: "http://192.168.1.50:9000/v1",
        apiKey: "${VLLM_API_KEY}",
        api: "openai-completions",
        models: [
          {
            id: "my-custom-model",
            name: "Remote vLLM Model",
            reasoning: false,
            input: ["text"],
            contextWindow: 64000,
            maxTokens: 4096,
          },
        ],
      },
    },
  },
}
```
</Accordion>
</AccordionGroup>

## Troubleshooting

<AccordionGroup>
<Accordion title="Server not reachable">
Check that the vLLM server is running and accessible:

```bash
curl http://127.0.0.1:8000/v1/models
```

If you see a connection error, verify the host, port, and that vLLM started with the OpenAI-compatible server mode.
</Accordion>

<Accordion title="Auth errors on requests">
If requests fail with auth errors, set a real `VLLM_API_KEY` that matches your server configuration, or configure the provider explicitly under `models.providers.vllm`.

<Tip>
If your vLLM server does not enforce auth, any non-empty value for `VLLM_API_KEY` works as an opt-in signal for OpenClaw.
</Tip>
</Accordion>

<Accordion title="No models discovered">
Auto-discovery requires `VLLM_API_KEY` to be set **and** no explicit `models.providers.vllm` config entry. If you have defined the provider manually, OpenClaw skips discovery and uses only your declared models.
</Accordion>
</AccordionGroup>

<Warning>
More help: [Troubleshooting](/help/troubleshooting) and [FAQ](/help/faq).
</Warning>

## Related

<CardGroup cols={2}>
<Card title="Model selection" href="/concepts/model-providers" icon="layers">
Choosing providers, model refs, and failover behavior.
</Card>
<Card title="OpenAI" href="/providers/openai" icon="bolt">
Native OpenAI provider and OpenAI-compatible route behavior.
</Card>
<Card title="OAuth and auth" href="/gateway/authentication" icon="key">
Auth details and credential reuse rules.
</Card>
<Card title="Troubleshooting" href="/help/troubleshooting" icon="wrench">
Common issues and how to resolve them.
</Card>
</CardGroup>