fix(cli): streamline local model probes

Peter Steinberger
2026-04-27 23:02:26 +01:00
parent d7dcd0e21e
commit 42dddbbe78
14 changed files with 605 additions and 56 deletions


@@ -130,7 +130,8 @@ This table maps common inference tasks to the corresponding infer command.
- Stateless execution commands default to local.
- Gateway-managed state commands default to gateway.
- The normal local path does not require the gateway to be running.
- `model run` is one-shot. MCP servers opened through the agent runtime for that command are retired after the reply for both local and `--gateway` execution, so repeated scripted invocations do not keep stdio MCP child processes alive.
- Local `model run` is a lean one-shot provider completion. It resolves the configured agent model and auth, but does not start a chat-agent turn, load tools, or open bundled MCP servers.
- `model run --gateway` still uses the Gateway agent runtime so it can exercise the same routed runtime path as a normal Gateway-backed turn. MCP servers opened through that runtime are retired after the reply, so repeated scripted invocations do not keep stdio MCP child processes alive. The two paths are contrasted in the sketch after this list.
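A minimal sketch contrasting the two paths, using only the flags documented in this section (the model ref and prompt are illustrative):
```bash
# Lean local path: resolves the configured model and auth and sends only the prompt;
# no chat-agent turn, tools, or bundled MCP servers are started.
openclaw infer model run --local --model openai/gpt-4.1 --prompt "Reply with exactly: pong" --json

# Gateway path: routes through the Gateway agent runtime; MCP servers opened for
# the turn are retired after the reply.
openclaw infer model run --gateway --model openai/gpt-4.1 --prompt "Reply with exactly: pong" --json
```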
## Model
@@ -143,10 +144,22 @@ openclaw infer model providers --json
openclaw infer model inspect --name gpt-5.5 --json
```
Use full `<provider/model>` refs to smoke-test a specific provider without
starting the Gateway or loading the full agent tool surface:
```bash
openclaw infer model run --local --model anthropic/claude-sonnet-4-6 --prompt "Reply with exactly: pong" --json
openclaw infer model run --local --model cerebras/zai-glm-4.7 --prompt "Reply with exactly: pong" --json
openclaw infer model run --local --model google/gemini-2.5-flash --prompt "Reply with exactly: pong" --json
openclaw infer model run --local --model groq/llama-3.1-8b-instant --prompt "Reply with exactly: pong" --json
openclaw infer model run --local --model mistral/mistral-small-latest --prompt "Reply with exactly: pong" --json
openclaw infer model run --local --model openai/gpt-4.1 --prompt "Reply with exactly: pong" --json
```
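For scripted sweeps across several providers, a loop like the one below is one option. This is a sketch, not part of the documented interface: it assumes `model run` exits non-zero when a probe fails, which is not stated in this section.
```bash
# Hypothetical sweep: probe each ref and report failures.
# Assumes a non-zero exit status when the probe fails.
for ref in anthropic/claude-sonnet-4-6 openai/gpt-4.1 groq/llama-3.1-8b-instant; do
  if openclaw infer model run --local --model "$ref" \
       --prompt "Reply with exactly: pong" --json >/dev/null; then
    echo "ok   $ref"
  else
    echo "FAIL $ref"
  fi
done
```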
Notes:
- `model run` reuses the agent runtime so provider/model overrides behave like normal agent execution.
- Because `model run` is intended for headless automation, it does not retain per-session bundled MCP runtimes after the command finishes.
- Local `model run` is the narrowest CLI smoke test for provider/model/auth health because it sends only the supplied prompt to the selected model.
- Use `model run --gateway` when you need to test Gateway routing, agent-runtime setup, or Gateway-managed provider state instead of the lean local completion path.
- `model auth login`, `model auth logout`, and `model auth status` manage saved provider auth state (a usage sketch follows these notes).
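A rough sketch of the auth workflow, assuming these subcommands take the same `openclaw infer` prefix as the commands above; any provider-selection flags are omitted because they are not documented in this section:
```bash
# Check which providers have saved credentials.
openclaw infer model auth status

# Save credentials for a provider, then remove them when done.
openclaw infer model auth login
openclaw infer model auth logout
```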
## Image