From 7c0fdae9b95bfcc667af3ad341b63ecc194464e4 Mon Sep 17 00:00:00 2001
From: Peter Steinberger
Date: Mon, 27 Apr 2026 05:27:35 +0100
Subject: [PATCH] docs(providers): document local model request timeout

---
 docs/concepts/model-providers.md |  2 ++
 docs/gateway/local-models.md     |  5 +++++
 docs/providers/vllm.md           | 30 ++++++++++++++++++++++++++++++
 3 files changed, 37 insertions(+)

diff --git a/docs/concepts/model-providers.md b/docs/concepts/model-providers.md
index c7847935428..01535f1d78e 100644
--- a/docs/concepts/model-providers.md
+++ b/docs/concepts/model-providers.md
@@ -625,6 +625,7 @@ Example (OpenAI‑compatible):
       baseUrl: "http://localhost:1234/v1",
       apiKey: "${LM_API_TOKEN}",
       api: "openai-completions",
+      timeoutSeconds: 300,
       models: [
         {
           id: "my-local-model",
@@ -660,6 +661,7 @@ Example (OpenAI‑compatible):
   - Proxy-style OpenAI-compatible routes also skip native OpenAI-only request shaping: no `service_tier`, no Responses `store`, no Completions `store`, no prompt-cache hints, no OpenAI reasoning-compat payload shaping, and no hidden OpenClaw attribution headers.
   - For OpenAI-compatible Completions proxies that need vendor-specific fields, set `agents.defaults.models["provider/model"].params.extra_body` (or `extraBody`) to merge extra JSON into the outbound request body.
   - For vLLM chat-template controls, set `agents.defaults.models["provider/model"].params.chat_template_kwargs`. OpenClaw automatically sends `enable_thinking: false` and `force_nonempty_content: true` for `vllm/nemotron-3-*` when the session thinking level is off.
+  - For slow local models or remote LAN/tailnet hosts, set `models.providers.<id>.timeoutSeconds`. This raises the timeout for provider model HTTP requests, covering connect, headers, body streaming, and the total guarded-fetch abort, without raising the whole agent runtime timeout.
   - If `baseUrl` is empty/omitted, OpenClaw keeps the default OpenAI behavior (which resolves to `api.openai.com`).
   - For safety, an explicit `compat.supportsDeveloperRole: true` is still overridden on non-native `openai-completions` endpoints.

diff --git a/docs/gateway/local-models.md b/docs/gateway/local-models.md
index 0b023a8743a..a8b91ed2ea0 100644
--- a/docs/gateway/local-models.md
+++ b/docs/gateway/local-models.md
@@ -124,6 +124,7 @@ vLLM, LiteLLM, OAI-proxy, or custom gateways work if they expose an OpenAI-style
       baseUrl: "http://127.0.0.1:8000/v1",
       apiKey: "sk-local",
       api: "openai-responses",
+      timeoutSeconds: 300,
       models: [
         {
           id: "my-local-model",
@@ -142,6 +143,10 @@ vLLM, LiteLLM, OAI-proxy, or custom gateways work if they expose an OpenAI-style
 ```

 Keep `models.mode: "merge"` so hosted models stay available as fallbacks.
+Use `models.providers.<id>.timeoutSeconds` for slow local or remote model
+servers before raising `agents.defaults.timeoutSeconds`. The provider timeout
+applies only to model HTTP requests, including connect, headers, body streaming,
+and the total guarded-fetch abort.

 Behavior note for local/proxied `/v1` backends:

diff --git a/docs/providers/vllm.md b/docs/providers/vllm.md
index 5ab11cd33e9..8d2970ed929 100644
--- a/docs/providers/vllm.md
+++ b/docs/providers/vllm.md
@@ -93,6 +93,7 @@ Use explicit config when:
       apiKey: "${VLLM_API_KEY}",
       api: "openai-completions",
       request: { allowPrivateNetwork: true },
+      timeoutSeconds: 300, // Optional: extend connect/header/body/request timeout for slow local models
       models: [
         {
           id: "your-model-id",
@@ -179,6 +180,7 @@ Use explicit config when:
       apiKey: "${VLLM_API_KEY}",
       api: "openai-completions",
       request: { allowPrivateNetwork: true },
+      timeoutSeconds: 300,
       models: [
         {
           id: "my-custom-model",
@@ -201,6 +203,34 @@ Use explicit config when:

 ## Troubleshooting

+
+  For large local models, remote LAN hosts, or tailnet links, set a
+  provider-scoped request timeout:
+
+  ```json5
+  {
+    models: {
+      providers: {
+        vllm: {
+          baseUrl: "http://192.168.1.50:8000/v1",
+          apiKey: "${VLLM_API_KEY}",
+          api: "openai-completions",
+          request: { allowPrivateNetwork: true },
+          timeoutSeconds: 300,
+          models: [{ id: "your-model-id", name: "Local vLLM Model" }],
+        },
+      },
+    },
+  }
+  ```
+
+  `timeoutSeconds` applies to vLLM model HTTP requests only, including
+  connection setup, response headers, body streaming, and the total
+  guarded-fetch abort. Prefer this before increasing
+  `agents.defaults.timeoutSeconds`, which controls the whole agent run.
+
+
 Check that the vLLM server is running and accessible:
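
For reference, a minimal TypeScript sketch of the semantics the docs describe: one deadline that spans connection setup, response headers, and body streaming, the phases `timeoutSeconds` is documented to cover. The function name `guardedFetch` and its shape are illustrative assumptions, not OpenClaw's actual implementation.

```ts
// Sketch only: one timeoutSeconds budget guards connect + headers + body
// streaming via a single AbortController. Hypothetical helper, not the
// real OpenClaw guarded-fetch code.
async function guardedFetch(
  url: string,
  init: RequestInit,
  timeoutSeconds: number,
): Promise<string> {
  const controller = new AbortController();
  // One deadline covers the entire request lifecycle.
  const deadline = setTimeout(
    () => controller.abort(new Error(`request exceeded ${timeoutSeconds}s`)),
    timeoutSeconds * 1000,
  );
  try {
    // fetch() resolves once headers arrive; the body may still be streaming.
    const res = await fetch(url, { ...init, signal: controller.signal });
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    // Body reads share the same abort signal, so a stalled stream is cut
    // off by the same budget instead of hanging forever.
    const reader = res.body!.getReader();
    const decoder = new TextDecoder();
    let text = "";
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      text += decoder.decode(value, { stream: true });
    }
    return text;
  } finally {
    clearTimeout(deadline);
  }
}
```

Under this model, a slow or stalled local model is bounded by the provider-scoped timeout alone, while `agents.defaults.timeoutSeconds` remains the separate bound on the whole agent run, which is why the docs recommend raising the provider timeout first.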