fix(cli): streamline local model probes

Peter Steinberger
2026-04-27 23:02:26 +01:00
parent d7dcd0e21e
commit 42dddbbe78
14 changed files with 605 additions and 56 deletions

View File

@@ -130,7 +130,8 @@ This table maps common inference tasks to the corresponding infer command.
- Stateless execution commands default to local.
- Gateway-managed state commands default to gateway.
- The normal local path does not require the gateway to be running.
- `model run` is one-shot. MCP servers opened through the agent runtime for that command are retired after the reply for both local and `--gateway` execution, so repeated scripted invocations do not keep stdio MCP child processes alive.
- Local `model run` is a lean one-shot provider completion. It resolves the configured agent model and auth, but does not start a chat-agent turn, load tools, or open bundled MCP servers.
- `model run --gateway` still uses the Gateway agent runtime so it can exercise the same routed runtime path as a normal Gateway-backed turn. MCP servers opened through that runtime are retired after the reply, so repeated scripted invocations do not keep stdio MCP child processes alive. Both invocation shapes are sketched below.
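As a rough sketch of the two paths (the model ref is an example; substitute any configured `<provider/model>`, and the Gateway must be running for the second command):
```bash
# Lean local probe: one-shot provider completion, no agent turn or MCP servers
openclaw infer model run --local --model anthropic/claude-sonnet-4-6 \
  --prompt "Reply with exactly: pong" --json

# Gateway probe: routes through the Gateway agent runtime
openclaw infer model run --gateway --model anthropic/claude-sonnet-4-6 \
  --prompt "Reply with exactly: pong" --json
```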
## Model
@@ -143,10 +144,22 @@ openclaw infer model providers --json
openclaw infer model inspect --name gpt-5.5 --json
```
Use full `<provider/model>` refs to smoke-test a specific provider without
starting the Gateway or loading the full agent tool surface:
```bash
openclaw infer model run --local --model anthropic/claude-sonnet-4-6 --prompt "Reply with exactly: pong" --json
openclaw infer model run --local --model cerebras/zai-glm-4.7 --prompt "Reply with exactly: pong" --json
openclaw infer model run --local --model google/gemini-2.5-flash --prompt "Reply with exactly: pong" --json
openclaw infer model run --local --model groq/llama-3.1-8b-instant --prompt "Reply with exactly: pong" --json
openclaw infer model run --local --model mistral/mistral-small-latest --prompt "Reply with exactly: pong" --json
openclaw infer model run --local --model openai/gpt-4.1 --prompt "Reply with exactly: pong" --json
```
Notes:
- `model run` reuses the agent runtime so provider/model overrides behave like normal agent execution.
- Because `model run` is intended for headless automation, it does not retain per-session bundled MCP runtimes after the command finishes.
- Local `model run` is the narrowest CLI smoke test for provider/model/auth health because it sends only the supplied prompt to the selected model.
- Use `model run --gateway` when you need to test Gateway routing, agent-runtime setup, or Gateway-managed provider state instead of the lean local completion path.
- `model auth login`, `model auth logout`, and `model auth status` manage saved provider auth state.
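If a probe fails with an authentication error, check saved credentials first. A minimal sketch using the auth subcommands above (exact flags, if any, may differ):
```bash
# Show which providers have saved credentials
openclaw infer model auth status

# Re-authenticate a provider, then re-run the failing probe
openclaw infer model auth login
```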
## Image

View File

@@ -239,14 +239,20 @@ Compatibility notes for stricter OpenAI-compatible backends:
```
- Some smaller or stricter local backends are unstable with OpenClaw's full
agent-runtime prompt shape, especially when tool schemas are included. If the
backend works for tiny direct `/v1/chat/completions` calls but fails on normal
OpenClaw agent turns, first try
agent-runtime prompt shape, especially when tool schemas are included. First
verify the provider path with the lean local probe:
```bash
openclaw infer model run --local --model <provider/model> --prompt "Reply with exactly: pong" --json
```
If that succeeds but normal OpenClaw agent turns fail, first try
`agents.defaults.experimental.localModelLean: true` to drop heavyweight
default tools like `browser`, `cron`, and `message`; this is an experimental
flag, not a stable default-mode setting. See
[Experimental Features](/concepts/experimental-features). If that still fails, try
`models.providers.<provider>.models[].compat.supportsTools: false`.
- If the backend still fails only on larger OpenClaw runs, the remaining issue
is usually upstream model/server capacity or a backend bug, not OpenClaw's
transport layer (see the direct-call sketch below for comparison).
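For comparison, a "tiny direct `/v1/chat/completions` call" means a raw request straight to the backend with no OpenClaw prompt shaping or tool schemas. A minimal sketch, assuming a local OpenAI-compatible server (host, port, and model name are placeholders for your setup):
```bash
# Direct request to the backend; bypasses OpenClaw entirely
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-local-model",
    "messages": [{"role": "user", "content": "Reply with exactly: pong"}]
  }'
```
If this direct call works but the lean local probe fails, suspect OpenClaw's provider URL, model ref, or auth config rather than the backend itself.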
@@ -264,10 +270,11 @@ Compatibility notes for stricter OpenAI-compatible backends:
- Context errors? Lower `contextWindow` or raise your server limit.
- OpenAI-compatible server returns `messages[].content ... expected a string`?
Add `compat.requiresStringContent: true` on that model entry.
- Direct tiny `/v1/chat/completions` calls work, but `openclaw infer model run`
fails on Gemma or another local model? Disable tool schemas first with
`compat.supportsTools: false`, then retest. If the server still crashes only
on larger OpenClaw prompts, treat it as an upstream server/model limitation.
- Direct tiny `/v1/chat/completions` calls work, but `openclaw infer model run --local`
fails on Gemma or another local model? Check the provider URL, model ref, auth
marker, and server logs first; local `model run` does not include agent tools.
If local `model run` succeeds but larger agent turns fail, reduce the agent
tool surface with `localModelLean` or `compat.supportsTools: false`.
- Tool calls show up as raw JSON/XML/ReAct text, or the provider returns an
empty `tool_calls` array? Do not add a proxy that blindly converts assistant
text into tool execution. Fix the server chat template/parser first. If the

View File

@@ -185,7 +185,7 @@ When you set `OLLAMA_API_KEY` (or an auth profile) and **do not** define `models
| Token limits | Sets `maxTokens` to the default Ollama max-token cap used by OpenClaw |
| Costs | Sets all costs to `0` |
This avoids manual model entries while keeping the catalog aligned with the local Ollama instance.
This avoids manual model entries while keeping the catalog aligned with the local Ollama instance. You can use a full ref such as `ollama/<pulled-model>:latest` in local `infer model run`; OpenClaw resolves that installed model from Ollama's live catalog without requiring a hand-written `models.json` entry.
```bash
# See what models are available
@@ -193,6 +193,31 @@ ollama list
openclaw models list
```
For a narrow text-generation smoke test that avoids the full agent tool surface,
use local `infer model run` with a full Ollama model ref:
```bash
OLLAMA_API_KEY=ollama-local \
openclaw infer model run \
--local \
--model ollama/llama3.2:latest \
--prompt "Reply with exactly: pong" \
--json
```
That path still uses OpenClaw's configured provider, auth, and native Ollama
transport, but it does not start a chat-agent turn or load MCP/tool context. If
this succeeds while normal agent replies fail, troubleshoot the model's agent
prompt/tool capacity next.
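To then exercise the fuller agent-runtime path against the same model, one option is the Gateway-routed probe (assuming the Gateway is running and already has the Ollama provider configured):
```bash
# Same probe, routed through the Gateway agent runtime instead of the lean local path
openclaw infer model run \
  --gateway \
  --model ollama/llama3.2:latest \
  --prompt "Reply with exactly: pong" \
  --json
```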
Live-verify the local text path, native stream path, and embeddings against
local Ollama with:
```bash
OPENCLAW_LIVE_TEST=1 OPENCLAW_LIVE_OLLAMA=1 OPENCLAW_LIVE_OLLAMA_WEB_SEARCH=0 \
pnpm test:live -- extensions/ollama/ollama.live.test.ts
```
To add a new model, simply pull it with Ollama:
```bash