From ed52e2ec914c08c024c783fdd82209128b8019c7 Mon Sep 17 00:00:00 2001 From: Vincent Koc Date: Thu, 23 Apr 2026 09:20:52 -0700 Subject: [PATCH] =?UTF-8?q?docs(concepts):=20compress=20model-providers=20?= =?UTF-8?q?reference=20=E2=80=94=20move=20SDK=20hook=20list=20to=20provide?= =?UTF-8?q?r=20plugins=20doc,=20tabulate=20bundled=20providers?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- docs/concepts/model-providers.md | 311 +++++-------------------------- 1 file changed, 42 insertions(+), 269 deletions(-) diff --git a/docs/concepts/model-providers.md b/docs/concepts/model-providers.md index d8676d514e7..8688dedee63 100644 --- a/docs/concepts/model-providers.md +++ b/docs/concepts/model-providers.md @@ -3,7 +3,7 @@ summary: "Model provider overview with example configs + CLI flows" read_when: - You need a provider-by-provider model setup reference - You want example configs or CLI onboarding commands for model providers -title: "Model Providers" +title: "Model providers" --- # Model providers @@ -14,218 +14,21 @@ For model selection rules, see [/concepts/models](/concepts/models). ## Quick rules - Model refs use `provider/model` (example: `opencode/claude-opus-4-6`). -- If you set `agents.defaults.models`, it becomes the allowlist. +- `agents.defaults.models` acts as an allowlist when set. - CLI helpers: `openclaw onboard`, `openclaw models list`, `openclaw models set <model>`. -- Fallback runtime rules, cooldown probes, and session-override persistence are - documented in [/concepts/model-failover](/concepts/model-failover). -- `models.providers.*.models[].contextWindow` is native model metadata; - `models.providers.*.models[].contextTokens` is the effective runtime cap. -- Provider plugins can inject model catalogs via `registerProvider({ catalog })`; - OpenClaw merges that output into `models.providers` before writing - `models.json`. 
-- Provider manifests can declare `providerAuthEnvVars` and - `providerAuthAliases` so generic env-based auth probes and provider variants - do not need to load plugin runtime. The remaining core env-var map is now - just for non-plugin/core providers and a few generic-precedence cases such - as Anthropic API-key-first onboarding. -- Provider plugins can also own provider runtime behavior via - `normalizeModelId`, `normalizeTransport`, `normalizeConfig`, - `applyNativeStreamingUsageCompat`, `resolveConfigApiKey`, - `resolveSyntheticAuth`, `shouldDeferSyntheticProfileAuth`, - `resolveDynamicModel`, `prepareDynamicModel`, - `normalizeResolvedModel`, `contributeResolvedModelCompat`, - `capabilities`, `normalizeToolSchemas`, - `inspectToolSchemas`, `resolveReasoningOutputMode`, - `prepareExtraParams`, `createStreamFn`, `wrapStreamFn`, - `resolveTransportTurnState`, `resolveWebSocketSessionPolicy`, - `createEmbeddingProvider`, `formatApiKey`, `refreshOAuth`, - `buildAuthDoctorHint`, - `matchesContextOverflowError`, `classifyFailoverReason`, - `isCacheTtlEligible`, `buildMissingAuthMessage`, `suppressBuiltInModel`, - `augmentModelCatalog`, `resolveThinkingProfile`, `isBinaryThinking`, - `supportsXHighThinking`, `resolveDefaultThinkingLevel`, - `applyConfigDefaults`, `isModernModelRef`, - `prepareRuntimeAuth`, `resolveUsageAuth`, `fetchUsageSnapshot`, and - `onModelSelected`. -- Note: provider runtime `capabilities` is shared runner metadata (provider - family, transcript/tooling quirks, transport/cache hints). It is not the - same as the [public capability model](/plugins/architecture#public-capability-model) - which describes what a plugin registers (text inference, speech, etc.). -- The bundled `codex` provider is paired with the bundled Codex agent harness. - Use `codex/gpt-*` when you want Codex-owned login, model discovery, native - thread resume, and app-server execution. 
Plain `openai/gpt-*` refs continue - to use the OpenAI provider and the normal OpenClaw provider transport. - Codex-only deployments can disable automatic PI fallback with - `agents.defaults.embeddedHarness.fallback: "none"`; see - [Codex Harness](/plugins/codex-harness). +- `models.providers.*.models[].contextWindow` is native model metadata; `contextTokens` is the effective runtime cap. +- Fallback rules, cooldown probes, and session-override persistence: [Model failover](/concepts/model-failover). +- Bundled `codex` is paired with the Codex agent harness — use `codex/gpt-*` for Codex-owned login, discovery, native thread resume, and app-server execution. Plain `openai/gpt-*` uses the OpenAI provider and normal transport. Disable automatic PI fallback for Codex-only deployments via `agents.defaults.embeddedHarness.fallback: "none"` — see [Codex harness](/plugins/codex-harness). ## Plugin-owned provider behavior -Provider plugins can now own most provider-specific logic while OpenClaw keeps -the generic inference loop. +Most provider-specific logic lives in provider plugins (`registerProvider(...)`) while OpenClaw keeps the generic inference loop. Plugins own onboarding, model catalogs, auth env-var mapping, transport/config normalization, tool-schema cleanup, failover classification, OAuth refresh, usage reporting, thinking/reasoning profiles, and more. -Typical split: +The full list of provider-SDK hooks and bundled-plugin examples lives in [Provider plugins](/plugins/sdk-provider-plugins). A provider that needs a totally custom request executor is a separate, deeper extension surface. 
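+To make the split concrete, here is a minimal sketch of what one runtime hook could look like. The interface shape, type names, and error fields below are illustrative assumptions, not the actual provider-SDK surface; only the hook name (`classifyFailoverReason`) and its role (mapping provider-specific raw transport/API errors to failover reasons such as rate limit or overload) come from this doc.

```typescript
// Hypothetical shapes -- the real provider-SDK types may differ.
type FailoverReason = "rate_limit" | "overloaded" | "auth" | "unknown";

interface ProviderRuntimeHooks {
  // Map a raw transport/API error to a failover reason the core loop understands.
  classifyFailoverReason?: (error: { status?: number; message: string }) => FailoverReason;
}

// Example plugin hook: recognizes provider-specific throttle/not-ready errors
// (the status codes and message patterns here are invented for illustration)
// and defers to the core's generic classification for everything else.
const exampleProvider: ProviderRuntimeHooks = {
  classifyFailoverReason(error) {
    if (error.status === 429 || /quota exceeded/i.test(error.message)) return "rate_limit";
    if (error.status === 529 || /model is warming up/i.test(error.message)) return "overloaded";
    return "unknown"; // core falls back to generic heuristics
  },
};
```

+The core inference loop stays generic: it calls the matched provider's hook first and only applies its built-in heuristics when the hook declines to classify.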
-- `auth[].run` / `auth[].runNonInteractive`: provider owns onboarding/login - flows for `openclaw onboard`, `openclaw models auth`, and headless setup -- `wizard.setup` / `wizard.modelPicker`: provider owns auth-choice labels, - legacy aliases, onboarding allowlist hints, and setup entries in onboarding/model pickers -- `catalog`: provider appears in `models.providers` -- `normalizeModelId`: provider normalizes legacy/preview model ids before - lookup or canonicalization -- `normalizeTransport`: provider normalizes transport-family `api` / `baseUrl` - before generic model assembly; OpenClaw checks the matched provider first, - then other hook-capable provider plugins until one actually changes the - transport -- `normalizeConfig`: provider normalizes `models.providers.` config before - runtime uses it; OpenClaw checks the matched provider first, then other - hook-capable provider plugins until one actually changes the config. If no - provider hook rewrites the config, bundled Google-family helpers still - normalize supported Google provider entries. -- `applyNativeStreamingUsageCompat`: provider applies endpoint-driven native streaming-usage compat rewrites for config providers -- `resolveConfigApiKey`: provider resolves env-marker auth for config providers - without forcing full runtime auth loading. `amazon-bedrock` also has a - built-in AWS env-marker resolver here, even though Bedrock runtime auth uses - the AWS SDK default chain. 
-- `resolveSyntheticAuth`: provider can expose local/self-hosted or other - config-backed auth availability without persisting plaintext secrets -- `shouldDeferSyntheticProfileAuth`: provider can mark stored synthetic profile - placeholders as lower precedence than env/config-backed auth -- `resolveDynamicModel`: provider accepts model ids not present in the local - static catalog yet -- `prepareDynamicModel`: provider needs a metadata refresh before retrying - dynamic resolution -- `normalizeResolvedModel`: provider needs transport or base URL rewrites -- `contributeResolvedModelCompat`: provider contributes compat flags for its - vendor models even when they arrive through another compatible transport -- `capabilities`: provider publishes transcript/tooling/provider-family quirks -- `normalizeToolSchemas`: provider cleans tool schemas before the embedded - runner sees them -- `inspectToolSchemas`: provider surfaces transport-specific schema warnings - after normalization -- `resolveReasoningOutputMode`: provider chooses native vs tagged - reasoning-output contracts -- `prepareExtraParams`: provider defaults or normalizes per-model request params -- `createStreamFn`: provider replaces the normal stream path with a fully - custom transport -- `wrapStreamFn`: provider applies request headers/body/model compat wrappers -- `resolveTransportTurnState`: provider supplies per-turn native transport - headers or metadata -- `resolveWebSocketSessionPolicy`: provider supplies native WebSocket session - headers or session cool-down policy -- `createEmbeddingProvider`: provider owns memory embedding behavior when it - belongs with the provider plugin instead of the core embedding switchboard -- `formatApiKey`: provider formats stored auth profiles into the runtime - `apiKey` string expected by the transport -- `refreshOAuth`: provider owns OAuth refresh when the shared `pi-ai` - refreshers are not enough -- `buildAuthDoctorHint`: provider appends repair guidance when OAuth 
refresh - fails -- `matchesContextOverflowError`: provider recognizes provider-specific - context-window overflow errors that generic heuristics would miss -- `classifyFailoverReason`: provider maps provider-specific raw transport/API - errors to failover reasons such as rate limit or overload -- `isCacheTtlEligible`: provider decides which upstream model ids support prompt-cache TTL -- `buildMissingAuthMessage`: provider replaces the generic auth-store error - with a provider-specific recovery hint -- `suppressBuiltInModel`: provider hides stale upstream rows and can return a - vendor-owned error for direct resolution failures -- `augmentModelCatalog`: provider appends synthetic/final catalog rows after - discovery and config merging -- `resolveThinkingProfile`: provider owns the exact `/think` level set, - optional display labels, and default level for a selected model -- `isBinaryThinking`: compatibility hook for binary on/off thinking UX -- `supportsXHighThinking`: compatibility hook for selected `xhigh` models -- `resolveDefaultThinkingLevel`: compatibility hook for default `/think` policy -- `applyConfigDefaults`: provider applies provider-specific global defaults - during config materialization based on auth mode, env, or model family -- `isModernModelRef`: provider owns live/smoke preferred-model matching -- `prepareRuntimeAuth`: provider turns a configured credential into a short - lived runtime token -- `resolveUsageAuth`: provider resolves usage/quota credentials for `/usage` - and related status/reporting surfaces -- `fetchUsageSnapshot`: provider owns the usage endpoint fetch/parsing while - core still owns the summary shell and formatting -- `onModelSelected`: provider runs post-selection side effects such as - telemetry or provider-owned session bookkeeping - -Current bundled examples: - -- `anthropic`: Claude 4.6 forward-compat fallback, auth repair hints, usage - endpoint fetching, cache-TTL/provider-family metadata, and auth-aware global - config 
defaults -- `amazon-bedrock`: provider-owned context-overflow matching and failover - reason classification for Bedrock-specific throttle/not-ready errors, plus - the shared `anthropic-by-model` replay family for Claude-only replay-policy - guards on Anthropic traffic -- `anthropic-vertex`: Claude-only replay-policy guards on Anthropic-message - traffic -- `openrouter`: pass-through model ids, request wrappers, provider capability - hints, Gemini thought-signature sanitation on proxy Gemini traffic, proxy - reasoning injection through the `openrouter-thinking` stream family, routing - metadata forwarding, and cache-TTL policy -- `github-copilot`: onboarding/device login, forward-compat model fallback, - Claude-thinking transcript hints, runtime token exchange, and usage endpoint - fetching -- `openai`: GPT-5.4 forward-compat fallback, direct OpenAI transport - normalization, Codex-aware missing-auth hints, Spark suppression, synthetic - OpenAI/Codex catalog rows, thinking/live-model policy, usage-token alias - normalization (`input` / `output` and `prompt` / `completion` families), the - shared `openai-responses-defaults` stream family for native OpenAI/Codex - wrappers, provider-family metadata, bundled image-generation provider - registration for `gpt-image-2`, and bundled video-generation provider - registration for `sora-2` -- `google` and `google-gemini-cli`: Gemini 3.1 forward-compat fallback, - native Gemini replay validation, bootstrap replay sanitation, tagged - reasoning-output mode, modern-model matching, bundled image-generation - provider registration for Gemini image-preview models, and bundled - video-generation provider registration for Veo models; Gemini CLI OAuth also - owns auth-profile token formatting, usage-token parsing, and quota endpoint - fetching for usage surfaces -- `moonshot`: shared transport, plugin-owned thinking payload normalization -- `kilocode`: shared transport, plugin-owned request headers, reasoning payload - normalization, 
proxy-Gemini thought-signature sanitation, and cache-TTL - policy -- `zai`: GLM-5 forward-compat fallback, `tool_stream` defaults, cache-TTL - policy, binary-thinking/live-model policy, and usage auth + quota fetching; - unknown `glm-5*` ids synthesize from the bundled `glm-4.7` template -- `xai`: native Responses transport normalization, `/fast` alias rewrites for - Grok fast variants, default `tool_stream`, xAI-specific tool-schema / - reasoning-payload cleanup, and bundled video-generation provider - registration for `grok-imagine-video` -- `mistral`: plugin-owned capability metadata -- `opencode` and `opencode-go`: plugin-owned capability metadata plus - proxy-Gemini thought-signature sanitation -- `alibaba`: plugin-owned video-generation catalog for direct Wan model refs - such as `alibaba/wan2.6-t2v` -- `byteplus`: plugin-owned catalogs plus bundled video-generation provider - registration for Seedance text-to-video/image-to-video models -- `fal`: bundled video-generation provider registration for hosted third-party - image-generation provider registration for FLUX image models plus bundled - video-generation provider registration for hosted third-party video models -- `cloudflare-ai-gateway`, `huggingface`, `kimi`, `nvidia`, `qianfan`, - `stepfun`, `synthetic`, `venice`, `vercel-ai-gateway`, and `volcengine`: - plugin-owned catalogs only -- `qwen`: plugin-owned catalogs for text models plus shared - media-understanding and video-generation provider registrations for its - multimodal surfaces; Qwen video generation uses the Standard DashScope video - endpoints with bundled Wan models such as `wan2.6-t2v` and `wan2.7-r2v` -- `runway`: plugin-owned video-generation provider registration for native - Runway task-based models such as `gen4.5` -- `minimax`: plugin-owned catalogs, bundled video-generation provider - registration for Hailuo video models, bundled image-generation provider - registration for `image-01`, hybrid Anthropic/OpenAI replay-policy - 
selection, and usage auth/snapshot logic -- `together`: plugin-owned catalogs plus bundled video-generation provider - registration for Wan video models -- `xiaomi`: plugin-owned catalogs plus usage auth/snapshot logic - -The bundled `openai` plugin now owns both provider ids: `openai` and -`openai-codex`. - -That covers providers that still fit OpenClaw's normal transports. A provider -that needs a totally custom request executor is a separate, deeper extension -surface. + +Provider runtime `capabilities` is shared runner metadata (provider family, transcript/tooling quirks, transport/cache hints). It is not the same as the [public capability model](/plugins/architecture#public-capability-model), which describes what a plugin registers (text inference, speech, etc.). + ## API key rotation @@ -411,68 +214,38 @@ See [/providers/kilocode](/providers/kilocode) for setup details. ### Other bundled provider plugins -- OpenRouter: `openrouter` (`OPENROUTER_API_KEY`) -- Example models: `openrouter/auto`, `openrouter/moonshotai/kimi-k2.6` -- OpenClaw applies OpenRouter's documented app-attribution headers only when - the request actually targets `openrouter.ai` -- OpenRouter-specific Anthropic `cache_control` markers are likewise gated to - verified OpenRouter routes, not arbitrary proxy URLs -- OpenRouter remains on the proxy-style OpenAI-compatible path, so native - OpenAI-only request shaping (`serviceTier`, Responses `store`, - prompt-cache hints, OpenAI reasoning-compat payloads) is not forwarded -- Gemini-backed OpenRouter refs keep proxy-Gemini thought-signature sanitation - only; native Gemini replay validation and bootstrap rewrites stay off -- Kilo Gateway: `kilocode` (`KILOCODE_API_KEY`) -- Example model: `kilocode/kilo/auto` -- Gemini-backed Kilo refs keep the same proxy-Gemini thought-signature - sanitation path; `kilocode/kilo/auto` and other proxy-reasoning-unsupported - hints skip proxy reasoning injection -- MiniMax: `minimax` (API key) and 
`minimax-portal` (OAuth) -- Auth: `MINIMAX_API_KEY` for `minimax`; `MINIMAX_OAUTH_TOKEN` or `MINIMAX_API_KEY` for `minimax-portal` -- Example model: `minimax/MiniMax-M2.7` or `minimax-portal/MiniMax-M2.7` -- MiniMax onboarding/API-key setup writes explicit M2.7 model definitions with - `input: ["text", "image"]`; the bundled provider catalog keeps the chat refs - text-only until that provider config is materialized -- Moonshot: `moonshot` (`MOONSHOT_API_KEY`) -- Example model: `moonshot/kimi-k2.6` -- Kimi Coding: `kimi` (`KIMI_API_KEY` or `KIMICODE_API_KEY`) -- Example model: `kimi/kimi-code` -- Qianfan: `qianfan` (`QIANFAN_API_KEY`) -- Example model: `qianfan/deepseek-v3.2` -- Qwen Cloud: `qwen` (`QWEN_API_KEY`, `MODELSTUDIO_API_KEY`, or `DASHSCOPE_API_KEY`) -- Example model: `qwen/qwen3.5-plus` -- NVIDIA: `nvidia` (`NVIDIA_API_KEY`) -- Example model: `nvidia/nvidia/llama-3.1-nemotron-70b-instruct` -- StepFun: `stepfun` / `stepfun-plan` (`STEPFUN_API_KEY`) -- Example models: `stepfun/step-3.5-flash`, `stepfun-plan/step-3.5-flash-2603` -- Together: `together` (`TOGETHER_API_KEY`) -- Example model: `together/moonshotai/Kimi-K2.5` -- Venice: `venice` (`VENICE_API_KEY`) -- Xiaomi: `xiaomi` (`XIAOMI_API_KEY`) -- Example model: `xiaomi/mimo-v2-flash` -- Vercel AI Gateway: `vercel-ai-gateway` (`AI_GATEWAY_API_KEY`) -- Hugging Face Inference: `huggingface` (`HUGGINGFACE_HUB_TOKEN` or `HF_TOKEN`) -- Cloudflare AI Gateway: `cloudflare-ai-gateway` (`CLOUDFLARE_AI_GATEWAY_API_KEY`) -- Volcengine: `volcengine` (`VOLCANO_ENGINE_API_KEY`) -- Example model: `volcengine-plan/ark-code-latest` -- BytePlus: `byteplus` (`BYTEPLUS_API_KEY`) -- Example model: `byteplus-plan/ark-code-latest` -- xAI: `xai` (`XAI_API_KEY`) - - Native bundled xAI requests use the xAI Responses path - - `/fast` or `params.fastMode: true` rewrites `grok-3`, `grok-3-mini`, - `grok-4`, and `grok-4-0709` to their `*-fast` variants - - `tool_stream` defaults on; set - 
`agents.defaults.models["xai/"].params.tool_stream` to `false` to - disable it -- Mistral: `mistral` (`MISTRAL_API_KEY`) -- Example model: `mistral/mistral-large-latest` -- CLI: `openclaw onboard --auth-choice mistral-api-key` -- Groq: `groq` (`GROQ_API_KEY`) -- Cerebras: `cerebras` (`CEREBRAS_API_KEY`) - - GLM models on Cerebras use ids `zai-glm-4.7` and `zai-glm-4.6`. - - OpenAI-compatible base URL: `https://api.cerebras.ai/v1`. -- GitHub Copilot: `github-copilot` (`COPILOT_GITHUB_TOKEN` / `GH_TOKEN` / `GITHUB_TOKEN`) -- Hugging Face Inference example model: `huggingface/deepseek-ai/DeepSeek-R1`; CLI: `openclaw onboard --auth-choice huggingface-api-key`. See [Hugging Face (Inference)](/providers/huggingface). +| Provider | Id | Auth env | Example model | +| ----------------------- | -------------------------------- | ------------------------------------------------------------ | ----------------------------------------------- | +| BytePlus | `byteplus` / `byteplus-plan` | `BYTEPLUS_API_KEY` | `byteplus-plan/ark-code-latest` | +| Cerebras | `cerebras` | `CEREBRAS_API_KEY` | `cerebras/zai-glm-4.7` | +| Cloudflare AI Gateway | `cloudflare-ai-gateway` | `CLOUDFLARE_AI_GATEWAY_API_KEY` | — | +| GitHub Copilot | `github-copilot` | `COPILOT_GITHUB_TOKEN` / `GH_TOKEN` / `GITHUB_TOKEN` | — | +| Groq | `groq` | `GROQ_API_KEY` | — | +| Hugging Face Inference | `huggingface` | `HUGGINGFACE_HUB_TOKEN` or `HF_TOKEN` | `huggingface/deepseek-ai/DeepSeek-R1` | +| Kilo Gateway | `kilocode` | `KILOCODE_API_KEY` | `kilocode/kilo/auto` | +| Kimi Coding | `kimi` | `KIMI_API_KEY` or `KIMICODE_API_KEY` | `kimi/kimi-code` | +| MiniMax | `minimax` / `minimax-portal` | `MINIMAX_API_KEY` / `MINIMAX_OAUTH_TOKEN` | `minimax/MiniMax-M2.7` | +| Mistral | `mistral` | `MISTRAL_API_KEY` | `mistral/mistral-large-latest` | +| Moonshot | `moonshot` | `MOONSHOT_API_KEY` | `moonshot/kimi-k2.6` | +| NVIDIA | `nvidia` | `NVIDIA_API_KEY` | `nvidia/nvidia/llama-3.1-nemotron-70b-instruct` | +| OpenRouter | 
`openrouter` | `OPENROUTER_API_KEY` | `openrouter/auto` | +| Qianfan | `qianfan` | `QIANFAN_API_KEY` | `qianfan/deepseek-v3.2` | +| Qwen Cloud | `qwen` | `QWEN_API_KEY` / `MODELSTUDIO_API_KEY` / `DASHSCOPE_API_KEY` | `qwen/qwen3.5-plus` | +| StepFun | `stepfun` / `stepfun-plan` | `STEPFUN_API_KEY` | `stepfun/step-3.5-flash` | +| Together | `together` | `TOGETHER_API_KEY` | `together/moonshotai/Kimi-K2.5` | +| Venice | `venice` | `VENICE_API_KEY` | — | +| Vercel AI Gateway | `vercel-ai-gateway` | `AI_GATEWAY_API_KEY` | `vercel-ai-gateway/anthropic/claude-opus-4.6` | +| Volcano Engine (Doubao) | `volcengine` / `volcengine-plan` | `VOLCANO_ENGINE_API_KEY` | `volcengine-plan/ark-code-latest` | +| xAI | `xai` | `XAI_API_KEY` | `xai/grok-4` | +| Xiaomi | `xiaomi` | `XIAOMI_API_KEY` | `xiaomi/mimo-v2-flash` | + +Quirks worth knowing: + +- **OpenRouter** applies its app-attribution headers and Anthropic `cache_control` markers only on verified `openrouter.ai` routes. As a proxy-style OpenAI-compatible path, it skips native-OpenAI-only shaping (`serviceTier`, Responses `store`, prompt-cache hints, OpenAI reasoning-compat). Gemini-backed refs keep proxy-Gemini thought-signature sanitation only. +- **Kilo Gateway** Gemini-backed refs follow the same proxy-Gemini sanitation path; `kilocode/kilo/auto` and other proxy-reasoning-unsupported refs skip proxy reasoning injection. +- **MiniMax** API-key onboarding writes explicit M2.7 model definitions with `input: ["text", "image"]`; the bundled catalog keeps chat refs text-only until that config is materialized. +- **xAI** uses the xAI Responses path. `/fast` or `params.fastMode: true` rewrites `grok-3`, `grok-3-mini`, `grok-4`, and `grok-4-0709` to their `*-fast` variants. `tool_stream` defaults on; disable via `agents.defaults.models["xai/"].params.tool_stream=false`. +- **Cerebras** GLM models use `zai-glm-4.7` / `zai-glm-4.6`; OpenAI-compatible base URL is `https://api.cerebras.ai/v1`. 
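+For instance, the xAI `tool_stream` toggle above might look like this in config. This is a sketch only: the concrete `xai/grok-4` key and the nested object shape are assumptions extrapolated from the key path quoted in this doc, not a verified schema.

```json
{
  "agents": {
    "defaults": {
      "models": {
        "xai/grok-4": {
          "params": { "tool_stream": false }
        }
      }
    }
  }
}
```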
## Providers via `models.providers` (custom/base URL)