
---
title: Inferrs
summary: Run OpenClaw through inferrs (OpenAI-compatible local server)
read_when:
  - You want to run OpenClaw against a local inferrs server
  - You are serving Gemma or another model through inferrs
  - You need the exact OpenClaw compat flags for inferrs
---

inferrs can serve local models behind an OpenAI-compatible `/v1` API. OpenClaw works with inferrs through the generic `openai-completions` path.

| Property | Value |
| --- | --- |
| Provider id | `inferrs` (custom; configure under `models.providers.inferrs`) |
| Plugin | None; inferrs is not a bundled OpenClaw provider plugin |
| Auth env var | Optional. Any value works if your inferrs server has no auth |
| API | OpenAI-compatible (`openai-completions`) |
| Suggested base URL | `http://127.0.0.1:8080/v1` (or wherever your inferrs server lives) |
`inferrs` is currently best treated as a custom self-hosted OpenAI-compatible backend, not a dedicated OpenClaw provider plugin. You configure it through `models.providers.inferrs` rather than an onboarding choice flag. If you need a true bundled plugin with auto-discovery, see [SGLang](/providers/sglang) or [vLLM](/providers/vllm).

## Getting started

Start the server:

```bash
inferrs serve google/gemma-4-E2B-it \
  --host 127.0.0.1 \
  --port 8080 \
  --device metal
```

Confirm it is reachable and serving your model:

```bash
curl http://127.0.0.1:8080/health
curl http://127.0.0.1:8080/v1/models
```

Then add an explicit provider entry and point your default model at it. See the full config example below.

## Full config example

This example uses Gemma 4 on a local inferrs server.

```json5
{
  agents: {
    defaults: {
      model: { primary: "inferrs/google/gemma-4-E2B-it" },
      models: {
        "inferrs/google/gemma-4-E2B-it": {
          alias: "Gemma 4 (inferrs)",
        },
      },
    },
  },
  models: {
    mode: "merge",
    providers: {
      inferrs: {
        baseUrl: "http://127.0.0.1:8080/v1",
        apiKey: "inferrs-local",
        api: "openai-completions",
        models: [
          {
            id: "google/gemma-4-E2B-it",
            name: "Gemma 4 E2B (inferrs)",
            reasoning: false,
            input: ["text"],
            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
            contextWindow: 131072,
            maxTokens: 4096,
            compat: {
              requiresStringContent: true,
            },
          },
        ],
      },
    },
  },
}
```
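Note that the model ref in `agents.defaults.model` contains two slashes: only the first separates the provider name from the provider-local model id. A minimal sketch of that split (assumed behavior for illustration, not OpenClaw's actual parser):

```python
def split_model_ref(ref: str) -> tuple[str, str]:
    """Split an OpenClaw-style model ref on the FIRST slash only.

    Illustrative sketch of the assumed provider/model-id split;
    not OpenClaw's actual parser.
    """
    provider, _, model_id = ref.partition("/")
    return provider, model_id


print(split_model_ref("inferrs/google/gemma-4-E2B-it"))
# ('inferrs', 'google/gemma-4-E2B-it')
```

This is why the `id` inside `models.providers.inferrs.models` keeps its own slash (`google/gemma-4-E2B-it`) while the provider prefix does not repeat.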

## Advanced configuration

Some `inferrs` Chat Completions routes accept only string `messages[].content`, not structured content-part arrays.
<Warning>
If OpenClaw runs fail with an error like:

```text
messages[1].content: invalid type: sequence, expected a string
```

set `compat.requiresStringContent: true` in your model entry.
</Warning>

```json5
compat: {
  requiresStringContent: true
}
```

OpenClaw will flatten pure text content parts into plain strings before sending
the request.
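To illustrate what that flattening amounts to, here is a minimal sketch (an illustration of the behavior described above, not OpenClaw's actual implementation) of collapsing OpenAI-style text content parts into a plain string:

```python
def flatten_text_content(messages):
    """Flatten OpenAI-style content-part arrays into plain strings.

    Illustrative sketch of the behavior described for
    compat.requiresStringContent; not OpenClaw's actual implementation.
    """
    out = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list):
            # Keep only pure text parts and join their text into one string.
            content = "".join(
                part.get("text", "") for part in content if part.get("type") == "text"
            )
        out.append({**msg, "content": content})
    return out


messages = [
    {"role": "system", "content": "You are concise."},
    {"role": "user", "content": [{"type": "text", "text": "What is 2 + 2?"}]},
]
print(flatten_text_content(messages)[1]["content"])
# What is 2 + 2?
```

After this transform, `messages[].content` is always a string, which is the shape the stricter `inferrs` routes expect.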
Some current `inferrs` + Gemma combinations accept small direct `/v1/chat/completions` requests but still fail on full OpenClaw agent-runtime turns.
If that happens, try this first:

```json5
compat: {
  requiresStringContent: true,
  supportsTools: false
}
```

That disables OpenClaw's tool schema surface for the model and can reduce prompt
pressure on stricter local backends.

If tiny direct requests still work but normal OpenClaw agent turns continue to
crash inside `inferrs`, the remaining issue is usually upstream model/server
behavior rather than OpenClaw's transport layer.
Once configured, test both layers:
```bash
curl http://127.0.0.1:8080/v1/chat/completions \
  -H 'content-type: application/json' \
  -d '{"model":"google/gemma-4-E2B-it","messages":[{"role":"user","content":"What is 2 + 2?"}],"stream":false}'
```

```bash
openclaw infer model run \
  --model inferrs/google/gemma-4-E2B-it \
  --prompt "What is 2 + 2? Reply with one short sentence." \
  --json
```

If the first command works but the second fails, check the troubleshooting section below.
`inferrs` is treated as a proxy-style OpenAI-compatible `/v1` backend, not a native OpenAI endpoint.
- Native OpenAI-only request shaping does not apply here
- No `service_tier`, no Responses `store`, no prompt-cache hints, and no
  OpenAI reasoning-compat payload shaping
- Hidden OpenClaw attribution headers (`originator`, `version`, `User-Agent`)
  are not injected on custom `inferrs` base URLs

## Troubleshooting

| Symptom | Fix |
| --- | --- |
| Connection refused or timeouts | `inferrs` is not running, not reachable, or not bound to the expected host/port. Make sure the server is started and listening on the address you configured. |
| `invalid type: sequence, expected a string` | Set `compat.requiresStringContent: true` in the model entry. See the `requiresStringContent` section above for details. |
| Schema errors on agent turns | Try setting `compat.supportsTools: false` to disable the tool schema surface. See the Gemma tool-schema caveat above. |
| Crashes on larger agent turns | If OpenClaw no longer gets schema errors but `inferrs` still crashes, treat it as an upstream `inferrs` or model limitation. Reduce prompt pressure or switch to a different local backend or model. |

For general help, see [Troubleshooting](/help/troubleshooting) and [FAQ](/help/faq).

Related reading:

- Running OpenClaw against local model servers
- Debugging local OpenAI-compatible backends that pass probes but fail agent runs
- Overview of all providers, model refs, and failover behavior
`inferrs` is not running, not reachable, or not bound to the expected host/port. Make sure the server is started and listening on the address you configured. Set `compat.requiresStringContent: true` in the model entry. See the `requiresStringContent` section above for details. Try setting `compat.supportsTools: false` to disable the tool schema surface. See the Gemma tool-schema caveat above. If OpenClaw no longer gets schema errors but `inferrs` still crashes on larger agent turns, treat it as an upstream `inferrs` or model limitation. Reduce prompt pressure or switch to a different local backend or model. For general help, see [Troubleshooting](/help/troubleshooting) and [FAQ](/help/faq). Running OpenClaw against local model servers. Debugging local OpenAI-compatible backends that pass probes but fail agent runs. Overview of all providers, model refs, and failover behavior.