docs: rename and improve infer docs

2026-04-17 20:21:13 +00:00 · 2026-04-07 09:42:25 -05:00
parent c2995bc470
commit ac6693986b
5 changed files with 286 additions and 193 deletions
--- a/docs/cli/capability.md
+++ b/docs/cli/capability.md
@@ -1,191 +0,0 @@
---
-summary: "Infer-first CLI for provider-backed model, image, audio, TTS, video, web, and embedding workflows"
-read_when:
-  - Adding or modifying `openclaw infer` commands
-  - Designing stable headless capability automation
-title: "Inference CLI"
---
-
-# Inference CLI
-
-`openclaw infer` is the canonical headless surface for provider-backed inference workflows.
-
-It intentionally exposes capability families, not raw gateway RPC names and not raw agent tool ids.
-
-## Common tasks
-
-This table maps common inference tasks to the corresponding infer command.
-
-| If the user wants to...         | Use this command                                                       |
-| ------------------------------- | ---------------------------------------------------------------------- |
-| run a text/model prompt         | `openclaw infer model run --prompt "..." --json`                       |
-| list configured model providers | `openclaw infer model providers --json`                                |
-| generate an image               | `openclaw infer image generate --prompt "..." --json`                  |
-| describe an image file          | `openclaw infer image describe --file ./image.png --json`              |
-| transcribe audio                | `openclaw infer audio transcribe --file ./memo.m4a --json`             |
-| synthesize speech               | `openclaw infer tts convert --text "..." --output ./speech.mp3 --json` |
-| generate a video                | `openclaw infer video generate --prompt "..." --json`                  |
-| describe a video file           | `openclaw infer video describe --file ./clip.mp4 --json`               |
-| search the web                  | `openclaw infer web search --query "..." --json`                       |
-| fetch a web page                | `openclaw infer web fetch --url https://example.com --json`            |
-| create embeddings               | `openclaw infer embedding create --text "..." --json`                  |
-
-## Command tree
-
-```text
- openclaw infer
-  list
-  inspect
-
-  model
-    run
-    list
-    inspect
-    providers
-    auth login
-    auth logout
-    auth status
-
-  image
-    generate
-    edit
-    describe
-    describe-many
-    providers
-
-  audio
-    transcribe
-    providers
-
-  tts
-    convert
-    voices
-    providers
-    status
-    enable
-    disable
-    set-provider
-
-  video
-    generate
-    describe
-    providers
-
-  web
-    search
-    fetch
-    providers
-
-  embedding
-    create
-    providers
-```
-
-## Examples
-
-These examples show the standard command shape across the infer surface.
-
-```bash
-openclaw infer list --json
-openclaw infer inspect --name image.generate --json
-openclaw infer model run --prompt "Reply with exactly: smoke-ok" --json
-openclaw infer model providers --json
-openclaw infer image generate --prompt "friendly lobster illustration" --json
-openclaw infer image describe --file ./photo.jpg --json
-openclaw infer audio transcribe --file ./memo.m4a --json
-openclaw infer tts convert --text "hello from openclaw" --output ./hello.mp3 --json
-openclaw infer video generate --prompt "cinematic sunset over the ocean" --json
-openclaw infer video describe --file ./clip.mp4 --json
-openclaw infer web search --query "OpenClaw docs" --json
-openclaw infer embedding create --text "friendly lobster" --json
-```
-
-## Additional examples
-
-```bash
-openclaw infer audio transcribe --file ./team-sync.m4a --language en --prompt "Focus on names and action items" --json
-openclaw infer image describe --file ./ui-screenshot.png --model openai/gpt-4.1-mini --json
-openclaw infer tts convert --text "Your build is complete" --output ./build-complete.mp3 --json
-openclaw infer web search --query "OpenClaw docs infer web providers" --json
-openclaw infer embedding create --text "customer support ticket: delayed shipment" --model openai/text-embedding-3-large --json
-```
-
-## Transport
-
-Supported transport flags:
-
- `--local`
- `--gateway`
-
-Default transport is implicit auto at the command-family level:
-
- Stateless execution commands default to local.
- Gateway-managed state commands default to gateway.
-
-Examples:
-
-```bash
-openclaw infer model run --prompt "hello" --json
-openclaw infer image generate --prompt "friendly lobster" --json
-openclaw infer tts status --json
-openclaw infer embedding create --text "hello world" --json
-```
-
-## Usage notes
-
- `openclaw infer ...` is the primary CLI surface for these workflows.
- Use `--json` when the output will be consumed by another command or script.
- Use `--provider` or `--model provider/model` when a specific backend is required.
- For `image describe`, `audio transcribe`, and `video describe`, `--model` must use the form `<provider/model>`.
- The normal local path does not require the gateway to be running.
-
-## JSON output
-
-Capability commands normalize JSON output under a shared envelope:
-
-```json
-{
-  "ok": true,
-  "capability": "image.generate",
-  "transport": "local",
-  "provider": "openai",
-  "model": "gpt-image-1",
-  "attempts": [],
-  "outputs": []
-}
-```
-
-Top-level fields are stable:
-
- `ok`
- `capability`
- `transport`
- `provider`
- `model`
- `attempts`
- `outputs`
- `error`
-
-## Common pitfalls
-
-```bash
-# Bad
-openclaw infer media image generate --prompt "friendly lobster"
-
-# Good
-openclaw infer image generate --prompt "friendly lobster"
-```
-
-```bash
-# Bad
-openclaw infer audio transcribe --file ./memo.m4a --model whisper-1 --json
-
-# Good
-openclaw infer audio transcribe --file ./memo.m4a --model openai/whisper-1 --json
-```
-
-## Notes
-
- `model run` reuses the agent runtime so provider/model overrides behave like normal agent execution.
- `tts status` defaults to gateway because it reflects gateway-managed TTS state.
- `openclaw capability ...` is an alias for `openclaw infer ...`.
--- a/docs/cli/index.md
+++ b/docs/cli/index.md
@@ -35,7 +35,7 @@ This page describes the current CLI behavior. If commands change, update this do
 - [`logs`](/cli/logs)
 - [`system`](/cli/system)
 - [`models`](/cli/models)
- [`infer`](/cli/capability)
+- [`infer`](/cli/infer)
 - [`memory`](/cli/memory)
 - [`directory`](/cli/directory)
 - [`nodes`](/cli/nodes)
--- a/docs/cli/infer.md
+++ b/docs/cli/infer.md
@@ -0,0 +1,280 @@
+---
+summary: "Infer-first CLI for provider-backed model, image, audio, TTS, video, web, and embedding workflows"
+read_when:
+  - Adding or modifying `openclaw infer` commands
+  - Designing stable headless capability automation
+title: "Inference CLI"
+---
+
+# Inference CLI
+
+`openclaw infer` is the canonical headless surface for provider-backed inference workflows.
+
+It intentionally exposes capability families, not raw gateway RPC names and not raw agent tool ids.
+
+## Turn infer into a skill
+
+Copy and paste this to an agent:
+
+```text
+Read https://docs.openclaw.ai/cli/infer, then create a skill that routes my common workflows to `openclaw infer`.
+Focus on model runs, image generation, video generation, audio transcription, TTS, web search, and embeddings.
+```
+
+A good infer-based skill should:
+
+- map common user intents to the correct infer subcommand
+- include a few canonical infer examples for the workflows it covers
+- prefer `openclaw infer ...` in examples and suggestions
+- avoid re-documenting the entire infer surface inside the skill body
+
+Typical infer-focused skill coverage:
+
+- `openclaw infer model run`
+- `openclaw infer image generate`
+- `openclaw infer audio transcribe`
+- `openclaw infer tts convert`
+- `openclaw infer web search`
+- `openclaw infer embedding create`
+
+## Why use infer
+
+`openclaw infer` provides one consistent CLI for provider-backed inference tasks inside OpenClaw.
+
+Benefits:
+
+- Use the providers and models already configured in OpenClaw instead of wiring up one-off wrappers for each backend.
+- Keep model, image, audio transcription, TTS, video, web, and embedding workflows under one command tree.
+- Use a stable `--json` output shape for scripts, automation, and agent-driven workflows.
+- Prefer a first-party OpenClaw surface when the task is fundamentally "run inference."
+- Use the normal local path without requiring the gateway for most infer commands.
+
+## Command tree
+
+```text
+ openclaw infer
+  list
+  inspect
+
+  model
+    run
+    list
+    inspect
+    providers
+    auth login
+    auth logout
+    auth status
+
+  image
+    generate
+    edit
+    describe
+    describe-many
+    providers
+
+  audio
+    transcribe
+    providers
+
+  tts
+    convert
+    voices
+    providers
+    status
+    enable
+    disable
+    set-provider
+
+  video
+    generate
+    describe
+    providers
+
+  web
+    search
+    fetch
+    providers
+
+  embedding
+    create
+    providers
+```
+
+## Common tasks
+
+This table maps common inference tasks to the corresponding infer command.
+
+| Task                    | Command                                                                | Notes                                                |
+| ----------------------- | ---------------------------------------------------------------------- | ---------------------------------------------------- |
+| Run a text/model prompt | `openclaw infer model run --prompt "..." --json`                       | Uses the normal local path by default                |
+| Generate an image       | `openclaw infer image generate --prompt "..." --json`                  | Use `image edit` when starting from an existing file |
+| Describe an image file  | `openclaw infer image describe --file ./image.png --json`              | `--model` must be `<provider/model>`                 |
+| Transcribe audio        | `openclaw infer audio transcribe --file ./memo.m4a --json`             | `--model` must be `<provider/model>`                 |
+| Synthesize speech       | `openclaw infer tts convert --text "..." --output ./speech.mp3 --json` | `tts status` is gateway-oriented                     |
+| Generate a video        | `openclaw infer video generate --prompt "..." --json`                  |                                                      |
+| Describe a video file   | `openclaw infer video describe --file ./clip.mp4 --json`               | `--model` must be `<provider/model>`                 |
+| Search the web          | `openclaw infer web search --query "..." --json`                       |                                                      |
+| Fetch a web page        | `openclaw infer web fetch --url https://example.com --json`            |                                                      |
+| Create embeddings       | `openclaw infer embedding create --text "..." --json`                  |                                                      |
+
+## Behavior
+
+- `openclaw infer ...` is the primary CLI surface for these workflows.
+- Use `--json` when the output will be consumed by another command or script.
+- Use `--provider` or `--model provider/model` when a specific backend is required.
+- For `image describe`, `audio transcribe`, and `video describe`, `--model` must use the form `<provider/model>`.
+- Stateless execution commands default to local.
+- Gateway-managed state commands default to gateway.
+- The normal local path does not require the gateway to be running.
+
+## Model
+
+Use `model` for provider-backed text inference and model/provider inspection.
+
+```bash
+openclaw infer model run --prompt "Reply with exactly: smoke-ok" --json
+openclaw infer model run --prompt "Summarize this changelog entry" --provider openai --json
+openclaw infer model providers --json
+openclaw infer model inspect --name gpt-5.4 --json
+```
+
+Notes:
+
+- `model run` reuses the agent runtime so provider/model overrides behave like normal agent execution.
+- `model auth login`, `model auth logout`, and `model auth status` manage saved provider auth state.
+
+## Image
+
+Use `image` for generation, edit, and description.
+
+```bash
+openclaw infer image generate --prompt "friendly lobster illustration" --json
+openclaw infer image generate --prompt "cinematic product photo of headphones" --json
+openclaw infer image describe --file ./photo.jpg --json
+openclaw infer image describe --file ./ui-screenshot.png --model openai/gpt-4.1-mini --json
+```
+
+Notes:
+
+- Use `image edit` when starting from existing input files.
+- For `image describe`, `--model` must be `<provider/model>`.
+
+## Audio
+
+Use `audio` for file transcription.
+
+```bash
+openclaw infer audio transcribe --file ./memo.m4a --json
+openclaw infer audio transcribe --file ./team-sync.m4a --language en --prompt "Focus on names and action items" --json
+openclaw infer audio transcribe --file ./memo.m4a --model openai/whisper-1 --json
+```
+
+Notes:
+
+- `audio transcribe` is for file transcription, not realtime session management.
+- `--model` must be `<provider/model>`.
+
+## TTS
+
+Use `tts` for speech synthesis and TTS provider state.
+
+```bash
+openclaw infer tts convert --text "hello from openclaw" --output ./hello.mp3 --json
+openclaw infer tts convert --text "Your build is complete" --output ./build-complete.mp3 --json
+openclaw infer tts providers --json
+openclaw infer tts status --json
+```
+
+Notes:
+
+- `tts status` defaults to gateway because it reflects gateway-managed TTS state.
+- Use `tts providers`, `tts voices`, and `tts set-provider` to inspect and configure TTS behavior.
+
+## Video
+
+Use `video` for generation and description.
+
+```bash
+openclaw infer video generate --prompt "cinematic sunset over the ocean" --json
+openclaw infer video generate --prompt "slow drone shot over a forest lake" --json
+openclaw infer video describe --file ./clip.mp4 --json
+openclaw infer video describe --file ./clip.mp4 --model openai/gpt-4.1-mini --json
+```
+
+Notes:
+
+- `--model` must be `<provider/model>` for `video describe`.
+
+## Web
+
+Use `web` for search and fetch workflows.
+
+```bash
+openclaw infer web search --query "OpenClaw docs" --json
+openclaw infer web search --query "OpenClaw infer web providers" --json
+openclaw infer web fetch --url https://docs.openclaw.ai/cli/infer --json
+openclaw infer web providers --json
+```
+
+Notes:
+
+- Use `web providers` to inspect available, configured, and selected providers.
+
+## Embedding
+
+Use `embedding` for vector creation and embedding provider inspection.
+
+```bash
+openclaw infer embedding create --text "friendly lobster" --json
+openclaw infer embedding create --text "customer support ticket: delayed shipment" --model openai/text-embedding-3-large --json
+openclaw infer embedding providers --json
+```
+
+## JSON output
+
+Infer commands normalize JSON output under a shared envelope:
+
+```json
+{
+  "ok": true,
+  "capability": "image.generate",
+  "transport": "local",
+  "provider": "openai",
+  "model": "gpt-image-1",
+  "attempts": [],
+  "outputs": []
+}
+```
+
+Top-level fields are stable:
+
+- `ok`
+- `capability`
+- `transport`
+- `provider`
+- `model`
+- `attempts`
+- `outputs`
+- `error`
+
+## Common pitfalls
+
+```bash
+# Bad
+openclaw infer media image generate --prompt "friendly lobster"
+
+# Good
+openclaw infer image generate --prompt "friendly lobster"
+```
+
+```bash
+# Bad
+openclaw infer audio transcribe --file ./memo.m4a --model whisper-1 --json
+
+# Good
+openclaw infer audio transcribe --file ./memo.m4a --model openai/whisper-1 --json
+```
+
+## Notes
+
+- `openclaw capability ...` is an alias for `openclaw infer ...`.
--- a/docs/docs.json
+++ b/docs/docs.json
@@ -76,6 +76,10 @@
      "source": "/plugins/agent-tools",
      "destination": "/plugins/building-plugins#registering-agent-tools"
    },
+    {
+      "source": "/cli/capability",
+      "destination": "/cli/infer"
+    },
    {
      "source": "/tools/capability-cookbook",
      "destination": "/plugins/architecture"
--- a/src/cli/capability-cli.ts
+++ b/src/cli/capability-cli.ts
@@ -1185,7 +1185,7 @@ export function registerCapabilityCli(program: Command) {
    .addHelpText(
      "after",
      () =>
-        `\n${theme.muted("Docs:")} ${formatDocsLink("/cli/capability", "docs.openclaw.ai/cli/capability")}\n`,
+        `\n${theme.muted("Docs:")} ${formatDocsLink("/cli/infer", "docs.openclaw.ai/cli/infer")}\n`,
    );

  registerCapabilityListAndInspect(capability);