fix(cli): streamline local model probes

- Stateless execution commands default to local.
- Gateway-managed state commands default to gateway.
- The normal local path does not require the gateway to be running.
- Local `model run` is a lean one-shot provider completion. It resolves the configured agent model and auth, but does not start a chat-agent turn, load tools, or open bundled MCP servers.
- `model run --gateway` still uses the Gateway agent runtime so it can exercise the same routed runtime path as a normal Gateway-backed turn. MCP servers opened through that runtime are retired after the reply, so repeated scripted invocations do not keep stdio MCP child processes alive. Both paths are sketched just below.
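
The two execution paths can be probed side by side. A minimal sketch using the placeholder ref style from this page; it assumes a Gateway is already running for the `--gateway` variant:

```bash
# Lean local completion: no Gateway needed, no tools, no MCP servers.
openclaw infer model run --local --model <provider/model> --prompt "Reply with exactly: pong" --json

# Same one-shot routed through the Gateway agent runtime; MCP servers
# opened for the turn are retired after the reply.
openclaw infer model run --gateway --model <provider/model> --prompt "Reply with exactly: pong" --json
```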

## Model

```bash
openclaw infer model providers --json
openclaw infer model inspect --name gpt-5.5 --json
```
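
Both commands emit JSON, so they drop into scripts. A small sketch that only assumes the `--json` output is one valid JSON document; it does not rely on any particular field names (`jq -e` merely validates the input and sets the exit status):

```bash
# Fail fast if inspect errors out or emits something that is not valid JSON.
openclaw infer model inspect --name gpt-5.5 --json | jq -e . >/dev/null \
  && echo "model metadata OK" \
  || echo "inspect failed or returned non-JSON"
```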

Use full `<provider/model>` refs to smoke-test a specific provider without
starting the Gateway or loading the full agent tool surface:

```bash
openclaw infer model run --local --model anthropic/claude-sonnet-4-6 --prompt "Reply with exactly: pong" --json
openclaw infer model run --local --model cerebras/zai-glm-4.7 --prompt "Reply with exactly: pong" --json
openclaw infer model run --local --model google/gemini-2.5-flash --prompt "Reply with exactly: pong" --json
openclaw infer model run --local --model groq/llama-3.1-8b-instant --prompt "Reply with exactly: pong" --json
openclaw infer model run --local --model mistral/mistral-small-latest --prompt "Reply with exactly: pong" --json
openclaw infer model run --local --model openai/gpt-4.1 --prompt "Reply with exactly: pong" --json
```
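
For scripted sweeps the same probe loops cleanly. A sketch assuming `model run` exits nonzero when the probe fails; the refs are taken from the list above:

```bash
# Probe several provider/model refs and summarize pass/fail.
for ref in anthropic/claude-sonnet-4-6 openai/gpt-4.1 groq/llama-3.1-8b-instant; do
  if openclaw infer model run --local --model "$ref" --prompt "Reply with exactly: pong" --json >/dev/null; then
    echo "ok   $ref"
  else
    echo "FAIL $ref"
  fi
done
```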

Notes:

- Because `model run` is intended for headless automation, it does not retain per-session bundled MCP runtimes after the command finishes.
- Local `model run` is the narrowest CLI smoke for provider/model/auth health because it sends only the supplied prompt to the selected model.
- Use `model run --gateway` when you need to test Gateway routing, agent-runtime setup, or Gateway-managed provider state instead of the lean local completion path.
- `model auth login`, `model auth logout`, and `model auth status` manage saved provider auth state (see the sketch after this list).
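
A quick pre-flight before probing models. This sketch assumes the auth subcommands sit under the same `infer model` group as the commands above:

```bash
# Check saved provider auth state before smoke-testing models.
openclaw infer model auth status
```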

## Image

Compatibility notes for stricter OpenAI-compatible backends:

- Some smaller or stricter local backends are unstable with OpenClaw's full
agent-runtime prompt shape, especially when tool schemas are included. First
verify the provider path with the lean local probe:

```bash
openclaw infer model run --local --model <provider/model> --prompt "Reply with exactly: pong" --json
```

If that succeeds but normal OpenClaw agent turns fail, first try
`agents.defaults.experimental.localModelLean: true` to drop heavyweight
default tools like `browser`, `cron`, and `message`; this is an experimental
flag, not a stable default-mode setting. See
[Experimental Features](/concepts/experimental-features). If that still fails, try
`models.providers.<provider>.models[].compat.supportsTools: false`.

- If the backend still fails only on larger OpenClaw runs, the remaining issue
is usually upstream model/server capacity or a backend bug, not OpenClaw's
transport layer.
- Context errors? Lower `contextWindow` or raise your server limit.
- OpenAI-compatible server returns `messages[].content ... expected a string`?
Add `compat.requiresStringContent: true` on that model entry.
- Direct tiny `/v1/chat/completions` calls work, but `openclaw infer model run --local`
fails on Gemma or another local model? Check the provider URL, model ref, auth
marker, and server logs first; local `model run` does not include agent tools.
If local `model run` succeeds but larger agent turns fail, reduce the agent
tool surface with `localModelLean` or `compat.supportsTools: false`.
- Tool calls show up as raw JSON/XML/ReAct text, or the provider returns an
empty `tool_calls` array? Do not add a proxy that blindly converts assistant
text into tool execution. Fix the server chat template/parser first. If the

When you set `OLLAMA_API_KEY` (or an auth profile) and **do not** define `models

| Token limits | Sets `maxTokens` to the default Ollama max-token cap used by OpenClaw |
| Costs | Sets all costs to `0` |

This avoids manual model entries while keeping the catalog aligned with the local Ollama instance. You can use a full ref such as `ollama/<pulled-model>:latest` in local `infer model run`; OpenClaw resolves that installed model from Ollama's live catalog without requiring a hand-written `models.json` entry.

```bash
# See what models are available
ollama list
openclaw models list
```
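
To spot-check that a pulled model is visible on both sides, something like the following works as a sketch; grepping the listings is an assumption about their output, not a documented interface:

```bash
# Look for the same model in Ollama's catalog and in OpenClaw's.
ollama list | grep llama3.2
openclaw models list | grep -i llama3.2 || echo "llama3.2 not in the OpenClaw catalog yet"
```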

For a narrow text-generation smoke test that avoids the full agent tool surface,
use local `infer model run` with a full Ollama model ref:

```bash
OLLAMA_API_KEY=ollama-local \
openclaw infer model run \
  --local \
  --model ollama/llama3.2:latest \
  --prompt "Reply with exactly: pong" \
  --json
```

That path still uses OpenClaw's configured provider, auth, and native Ollama
transport, but it does not start a chat-agent turn or load MCP/tool context. If
this succeeds while normal agent replies fail, troubleshoot the model's agent
prompt/tool capacity next.
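
To sweep every locally pulled model through that lean probe, a loop such as this is a reasonable sketch; it assumes `ollama list` keeps its usual tabular output with the model name (including tag) in the first column:

```bash
export OLLAMA_API_KEY=ollama-local  # same auth marker as the single-model example

# Probe every locally pulled Ollama model with the lean local completion.
for m in $(ollama list | awk 'NR>1 {print $1}'); do
  if openclaw infer model run --local --model "ollama/$m" --prompt "Reply with exactly: pong" --json >/dev/null; then
    echo "ok   ollama/$m"
  else
    echo "FAIL ollama/$m"
  fi
done
```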

Live-verify the local text path, native stream path, and embeddings against
local Ollama with:

```bash
OPENCLAW_LIVE_TEST=1 OPENCLAW_LIVE_OLLAMA=1 OPENCLAW_LIVE_OLLAMA_WEB_SEARCH=0 \
  pnpm test:live -- extensions/ollama/ollama.live.test.ts
```

To add a new model, simply pull it with Ollama:

```bash
ollama pull <model>
```