From 624eaf5d4a7939c7b565257ecc379cdba1114d4b Mon Sep 17 00:00:00 2001
From: Vincent Koc
Date: Sat, 2 May 2026 14:43:43 -0700
Subject: [PATCH] docs(local-models): add hardware floor + backend picker,
 restructure stricter backend section as numbered triage flow

---
 docs/gateway/local-models.md | 56 +++++++++++++++++++----------
 1 file changed, 30 insertions(+), 26 deletions(-)

diff --git a/docs/gateway/local-models.md b/docs/gateway/local-models.md
index 2e6b26cf456..d5bcf482a12 100644
--- a/docs/gateway/local-models.md
+++ b/docs/gateway/local-models.md
@@ -7,9 +7,22 @@ read_when:
 title: "Local models"
 ---
 
-Local is doable, but OpenClaw expects large context + strong defenses against prompt injection. Small cards truncate context and leak safety. Aim high: **≥2 maxed-out Mac Studios or equivalent GPU rig (~$30k+)**. A single **24 GB** GPU works only for lighter prompts with higher latency. Use the **largest / full-size model variant you can run**; aggressively quantized or “small” checkpoints raise prompt-injection risk (see [Security](/gateway/security)).
+Local models are doable. They also raise the bar on hardware, context size, and prompt-injection defense: small or aggressively quantized models truncate context and weaken safety behavior. This page is the opinionated guide for higher-end local stacks and custom OpenAI-compatible local servers. For the lowest-friction onboarding, start with [LM Studio](/providers/lmstudio) or [Ollama](/providers/ollama) and `openclaw onboard`.
 
-If you want the lowest-friction local setup, start with [LM Studio](/providers/lmstudio) or [Ollama](/providers/ollama) and `openclaw onboard`. This page is the opinionated guide for higher-end local stacks and custom OpenAI-compatible local servers.
+## Hardware floor
+
+Aim high: **≥2 maxed-out Mac Studios or an equivalent GPU rig (~$30k+)** for a comfortable agent loop. A single **24 GB** GPU works only for lighter prompts at higher latency. Always run the **largest / full-size variant you can host**; small or heavily quantized checkpoints raise prompt-injection risk (see [Security](/gateway/security)).
+
+## Pick a backend
+
+| Backend                                              | Use when                                                                     |
+| ---------------------------------------------------- | ---------------------------------------------------------------------------- |
+| [LM Studio](/providers/lmstudio)                     | First-time local setup, GUI loader, native Responses API                     |
+| [Ollama](/providers/ollama)                          | CLI workflow, model library, hands-off systemd service                       |
+| MLX / vLLM / SGLang                                  | High-throughput self-hosted serving with an OpenAI-compatible HTTP endpoint  |
+| LiteLLM / OAI-proxy / custom OpenAI-compatible proxy | You front another model API and need OpenClaw to treat it as OpenAI          |
+
+Use the Responses API (`api: "openai-responses"`) when the backend supports it (LM Studio does). Otherwise stick to Chat Completions (`api: "openai-completions"`).
 
 **WSL2 + Ollama + NVIDIA/CUDA users:** The official Ollama Linux installer enables a systemd service with `Restart=always`. On WSL2 GPU setups, autostart can reload the last model during boot and pin host memory. If your WSL2 VM repeatedly restarts after enabling Ollama, see [WSL2 crash loop](/providers/ollama#wsl2-crash-loop-repeated-reboots).
 
@@ -279,36 +292,27 @@ Compatibility notes for stricter OpenAI-compatible backends:
 }
 ```
 
-- Some smaller or stricter local backends are unstable with OpenClaw's full
-  agent-runtime prompt shape, especially when tool schemas are included. First
-  verify the provider path with the lean local probe:
-
-  ```bash
-  openclaw infer model run --local --model <model> --prompt "Reply with exactly: pong" --json
-  ```
-
-  To verify the Gateway route without the full agent prompt shape, use the
-  Gateway model probe instead:
-
-  ```bash
-  openclaw infer model run --gateway --model <model> --prompt "Reply with exactly: pong" --json
-  ```
-
-  Both local and Gateway model probes send only the supplied prompt. The
-  Gateway probe still validates Gateway routing, auth, and provider selection,
-  but it intentionally skips prior session transcript, AGENTS/bootstrap context,
-  context-engine assembly, tools, and bundled MCP servers.
-
-  If that succeeds but normal OpenClaw agent turns fail, first try
-  `agents.defaults.experimental.localModelLean: true` to drop heavyweight
-  default tools like `browser`, `cron`, and `message`; this is an experimental
-  flag, not a stable default-mode setting. See
-  [Experimental Features](/concepts/experimental-features). If that still fails, try
-  `models.providers.<provider>.models[].compat.supportsTools: false`.
-
-- If the backend still fails only on larger OpenClaw runs, the remaining issue
-  is usually upstream model/server capacity or a backend bug, not OpenClaw's
-  transport layer.
+## Smaller or stricter backends
+
+If the model loads cleanly but full agent turns misbehave, work top-down: confirm transport first, then narrow the surface.
+
+1. **Confirm the local model itself responds.** No tools, no agent context:
+
+   ```bash
+   openclaw infer model run --local --model <model> --prompt "Reply with exactly: pong" --json
+   ```
+
+2. **Confirm Gateway routing.** This probe sends only the supplied prompt. It skips the prior session transcript, AGENTS/bootstrap context, context-engine assembly, tools, and bundled MCP servers, while still exercising Gateway routing, auth, and provider selection:
+
+   ```bash
+   openclaw infer model run --gateway --model <model> --prompt "Reply with exactly: pong" --json
+   ```
+
+3. **Try lean mode.** If both probes pass but real agent turns fail with malformed tool calls or oversized prompts, enable `agents.defaults.experimental.localModelLean: true`. It drops the three heaviest default tools (`browser`, `cron`, `message`) so the prompt shape is smaller and less brittle. See [Experimental Features → Local model lean mode](/concepts/experimental-features#local-model-lean-mode) for the full explanation, when to use it, and how to confirm it is on.
+
+4. **Disable tools entirely as a last resort.** If lean mode is not enough, set `models.providers.<provider>.models[].compat.supportsTools: false` for that model entry. The agent then operates without tool calls on that model; see the config sketch after this list.
+
+5. **Past that, the bottleneck is upstream.** If the backend still fails only on larger OpenClaw runs after lean mode and `supportsTools: false`, the remaining issue is usually upstream model or server capacity (context window, GPU memory, KV-cache eviction) or a backend bug, not OpenClaw's transport layer.
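+
+As a concrete illustration of steps 3 and 4, here is a minimal config sketch. Only the dotted key paths come from the steps above; the JSON5 shape, the `my-local` provider name, and the `<model>` id are placeholder assumptions to adapt to your setup:
+
+```json5
+{
+  agents: {
+    defaults: {
+      // Step 3: lean mode drops the heavyweight browser/cron/message tools
+      experimental: { localModelLean: true },
+    },
+  },
+  models: {
+    providers: {
+      // "my-local" is a placeholder provider name
+      "my-local": {
+        models: [
+          {
+            id: "<model>",
+            // Step 4 (last resort): disable tool calls for this model only
+            compat: { supportsTools: false },
+          },
+        ],
+      },
+    },
+  },
+}
+```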
 
 ## Troubleshooting