docs(models): clarify local chat completions routing
@@ -432,7 +432,7 @@ OpenClaw uses the built-in model catalog. Add custom providers via `models.provi
- Safe edits: use `openclaw config set models.providers.<id> '<json>' --strict-json --merge` or `openclaw config set models.providers.<id>.models '<json-array>' --strict-json --merge` for additive updates. `config set` refuses destructive replacements unless you pass `--replace`.
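
  For example, a hypothetical additive update in that style (the `local` provider id and the JSON payloads are placeholders):

  ```bash
  # Merge one field into an existing provider entry; `config set` refuses a
  # destructive replacement unless you pass --replace.
  openclaw config set models.providers.local '{"timeoutSeconds": 300}' --strict-json --merge

  # Merge an extra model entry into the provider's model list.
  openclaw config set models.providers.local.models '[{"id": "my-local-model"}]' --strict-json --merge
  ```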
</Accordion>
<Accordion title="Provider connection and auth">
- `models.providers.*.api`: request adapter (`openai-completions`, `openai-responses`, `anthropic-messages`, `google-generative-ai`, etc.). For self-hosted `/v1/chat/completions` backends such as MLX, vLLM, SGLang, and most OpenAI-compatible local servers, use `openai-completions`. Use `openai-responses` only when the backend supports `/v1/responses`.
- `models.providers.*.apiKey`: provider credential (prefer SecretRef/env substitution).
- `models.providers.*.auth`: auth strategy (`api-key`, `token`, `oauth`, `aws-sdk`).
- `models.providers.*.contextWindow`: default native context window for models under this provider when the model entry does not set `contextWindow`. A combined sketch of these keys follows this list.
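
Taken together, a minimal sketch of a custom provider entry using these keys (the `local-llm` id, endpoint, and field values are illustrative placeholders, not confirmed defaults):

```json5
{
  models: {
    providers: {
      "local-llm": {
        // Request adapter for an OpenAI-compatible /v1/chat/completions backend.
        api: "openai-completions",
        baseUrl: "http://127.0.0.1:8000/v1", // placeholder endpoint
        auth: "api-key",
        // The docs recommend SecretRef/env substitution rather than a literal key.
        apiKey: "sk-local",
        // Provider-wide default when a model entry omits contextWindow.
        contextWindow: 32768,
      },
    },
  },
}
```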
@@ -113,17 +113,26 @@ Swap the primary and fallback order; keep the same providers block and `models.m
## Other OpenAI-compatible local proxies

MLX (`mlx_lm.server`), vLLM, SGLang, LiteLLM, OAI-proxy, or custom gateways work if they expose an OpenAI-style `/v1/chat/completions` endpoint. Use the Chat Completions adapter unless the backend explicitly documents `/v1/responses` support. Replace the provider block above with your endpoint and model ID:
```json5
{
  agents: {
    defaults: {
      model: { primary: "local/my-local-model" },
    },
  },
  models: {
    mode: "merge",
    providers: {
      local: {
        baseUrl: "http://127.0.0.1:8000/v1",
        apiKey: "sk-local",
        api: "openai-completions",
        timeoutSeconds: 300,
        models: [
          {
@@ -142,6 +151,14 @@ vLLM, LiteLLM, OAI-proxy, or custom gateways work if they expose an OpenAI-style
}
```
The `models.providers.<id>.models[].id` value is provider-local. Do not include the provider prefix there. For example, an MLX server started with `mlx_lm.server --model mlx-community/Qwen3-30B-A3B-6bit` should use this catalog id and model ref, combined in the sketch after the list:

- `models.providers.mlx.models[].id: "mlx-community/Qwen3-30B-A3B-6bit"`
- `agents.defaults.model.primary: "mlx/mlx-community/Qwen3-30B-A3B-6bit"`
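
A minimal sketch combining the two, assuming `mlx_lm.server` on its default `127.0.0.1:8080` and a model entry trimmed to the `id` field (real entries may carry more fields):

```json5
{
  agents: {
    defaults: {
      // Provider-prefixed model ref selects the catalog entry below.
      model: { primary: "mlx/mlx-community/Qwen3-30B-A3B-6bit" },
    },
  },
  models: {
    mode: "merge", // keep hosted models available as fallbacks
    providers: {
      mlx: {
        baseUrl: "http://127.0.0.1:8080/v1", // assumed mlx_lm.server default port
        api: "openai-completions",
        models: [
          // Bare provider-local id: no "mlx/" prefix inside the catalog entry.
          { id: "mlx-community/Qwen3-30B-A3B-6bit" },
        ],
      },
    },
  },
}
```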
Keep `models.mode: "merge"` so hosted models stay available as fallbacks. Use `models.providers.<id>.timeoutSeconds` for slow local or remote model servers before raising `agents.defaults.timeoutSeconds`. The provider timeout
@@ -118,12 +118,15 @@ openclaw logs --follow
Look for:

- direct tiny calls succeed, but OpenClaw runs fail only on larger prompts (a direct-check sketch follows this list)
- `model_not_found` or 404 errors even though direct `/v1/chat/completions` works with the same bare model id
- backend errors about `messages[].content` expecting a string
- intermittent `incomplete turn detected ... stopReason=stop payloads=0` warnings with an OpenAI-compatible local backend
- backend crashes that appear only with larger prompt-token counts or full agent runtime prompts
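
To separate "direct tiny call" from "OpenClaw run", a hypothetical direct check against the local example above (host, port, key, and model id are placeholders for your setup):

```bash
# Tiny direct request to the backend; if this succeeds while OpenClaw runs
# fail, suspect prompt size, content shape, or model-id routing instead.
curl -sS http://127.0.0.1:8000/v1/chat/completions \
  -H "Authorization: Bearer sk-local" \
  -H "Content-Type: application/json" \
  -d '{"model": "my-local-model", "messages": [{"role": "user", "content": "ping"}]}'
```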
<AccordionGroup>
<Accordion title="Common signatures">
- `model_not_found` with a local MLX/vLLM-style server → verify `baseUrl` includes `/v1`, `api` is `"openai-completions"` for `/v1/chat/completions` backends, and `models.providers.<provider>.models[].id` is the bare provider-local id. Select the model with the provider prefix (for example `mlx/mlx-community/Qwen3-30B-A3B-6bit`), but keep the catalog entry bare (`mlx-community/Qwen3-30B-A3B-6bit`).
- `messages[...].content: invalid type: sequence, expected a string` → the backend rejects structured Chat Completions content parts. Fix: set `models.providers.<provider>.models[].compat.requiresStringContent: true` (placement sketch after this list).
- `incomplete turn detected ... stopReason=stop payloads=0` → the backend completed the Chat Completions request but returned no user-visible assistant text for that turn. OpenClaw retries replay-safe empty OpenAI-compatible turns once; persistent failures usually mean the backend is emitting empty/non-text content or suppressing final-answer text.
- direct tiny requests succeed, but OpenClaw agent runs fail with backend/model crashes (for example Gemma on some `inferrs` builds) → the OpenClaw transport is likely already correct; the backend is failing on the larger agent-runtime prompt shape.
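
For the string-content fix, a sketch of where the `compat` flag sits (the provider and model ids are placeholders; only the `compat.requiresStringContent` path comes from the signature above):

```json5
{
  models: {
    providers: {
      local: {
        models: [
          {
            id: "my-local-model",
            // Send message content as a plain string for backends that
            // reject structured Chat Completions content parts.
            compat: { requiresStringContent: true },
          },
        ],
      },
    },
  },
}
```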