Files
openclaw/docs/cli/capability.md
2026-04-07 07:56:03 -05:00

5.6 KiB

summary, read_when, title
summary read_when title
Infer-first CLI for provider-backed model, image, audio, TTS, video, web, and embedding workflows
Adding or modifying `openclaw infer` commands
Designing stable headless capability automation
Inference CLI

Inference CLI

openclaw infer is the canonical headless surface for provider-backed inference workflows.

It intentionally exposes capability families, not raw gateway RPC names and not raw agent tool ids.

Common tasks

This table maps common inference tasks to the corresponding infer command.

If the user wants to... Use this command
run a text/model prompt openclaw infer model run --prompt "..." --json
list configured model providers openclaw infer model providers --json
generate an image openclaw infer image generate --prompt "..." --json
describe an image file openclaw infer image describe --file ./image.png --json
transcribe audio openclaw infer audio transcribe --file ./memo.m4a --json
synthesize speech openclaw infer tts convert --text "..." --output ./speech.mp3 --json
generate a video openclaw infer video generate --prompt "..." --json
describe a video file openclaw infer video describe --file ./clip.mp4 --json
search the web openclaw infer web search --query "..." --json
fetch a web page openclaw infer web fetch --url https://example.com --json
create embeddings openclaw infer embedding create --text "..." --json

Command tree

 openclaw infer
  list
  inspect

  model
    run
    list
    inspect
    providers
    auth login
    auth logout
    auth status

  image
    generate
    edit
    describe
    describe-many
    providers

  audio
    transcribe
    providers

  tts
    convert
    voices
    providers
    status
    enable
    disable
    set-provider

  video
    generate
    describe
    providers

  web
    search
    fetch
    providers

  embedding
    create
    providers

Examples

These examples show the standard command shape across the infer surface.

openclaw infer list --json
openclaw infer inspect --name image.generate --json
openclaw infer model run --prompt "Reply with exactly: smoke-ok" --json
openclaw infer model providers --json
openclaw infer image generate --prompt "friendly lobster illustration" --json
openclaw infer image describe --file ./photo.jpg --json
openclaw infer audio transcribe --file ./memo.m4a --json
openclaw infer tts convert --text "hello from openclaw" --output ./hello.mp3 --json
openclaw infer video generate --prompt "cinematic sunset over the ocean" --json
openclaw infer video describe --file ./clip.mp4 --json
openclaw infer web search --query "OpenClaw docs" --json
openclaw infer embedding create --text "friendly lobster" --json

Additional examples

openclaw infer audio transcribe --file ./team-sync.m4a --language en --prompt "Focus on names and action items" --json
openclaw infer image describe --file ./ui-screenshot.png --model openai/gpt-4.1-mini --json
openclaw infer tts convert --text "Your build is complete" --output ./build-complete.mp3 --json
openclaw infer web search --query "OpenClaw docs infer web providers" --json
openclaw infer embedding create --text "customer support ticket: delayed shipment" --model openai/text-embedding-3-large --json

Transport

Supported transport flags:

  • --local
  • --gateway

Default transport is implicit auto at the command-family level:

  • Stateless execution commands default to local.
  • Gateway-managed state commands default to gateway.

Examples:

openclaw infer model run --prompt "hello" --json
openclaw infer image generate --prompt "friendly lobster" --json
openclaw infer tts status --json
openclaw infer embedding create --text "hello world" --json

Usage notes

  • openclaw infer ... is the primary CLI surface for these workflows.
  • Use --json when the output will be consumed by another command or script.
  • Use --provider or --model provider/model when a specific backend is required.
  • For image describe, audio transcribe, and video describe, --model must use the form <provider/model>.
  • The normal local path does not require the gateway to be running.

JSON output

Capability commands normalize JSON output under a shared envelope:

{
  "ok": true,
  "capability": "image.generate",
  "transport": "local",
  "provider": "openai",
  "model": "gpt-image-1",
  "attempts": [],
  "outputs": []
}

Top-level fields are stable:

  • ok
  • capability
  • transport
  • provider
  • model
  • attempts
  • outputs
  • error

Common pitfalls

# Bad
openclaw infer media image generate --prompt "friendly lobster"

# Good
openclaw infer image generate --prompt "friendly lobster"
# Bad
openclaw infer audio transcribe --file ./memo.m4a --model whisper-1 --json

# Good
openclaw infer audio transcribe --file ./memo.m4a --model openai/whisper-1 --json

Notes

  • model run reuses the agent runtime so provider/model overrides behave like normal agent execution.
  • tts status defaults to gateway because it reflects gateway-managed TTS state.
  • openclaw capability ... is an alias for openclaw infer ....