refactor(vllm): own qwen thinking payloads

Peter Steinberger
2026-04-27 11:47:54 +01:00
parent 4f7038ae33
commit 836d4b4105
20 changed files with 467 additions and 129 deletions


@@ -169,6 +169,13 @@ Availability can still vary by endpoint and billing plan even when a model is
present in the bundled catalog.
</Note>
## Thinking Controls
For reasoning-enabled Qwen Cloud models, the bundled provider maps OpenClaw
thinking levels to DashScope's top-level `enable_thinking` request flag. Disabled
thinking sends `enable_thinking: false`; other thinking levels send
`enable_thinking: true`.
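For illustration, a disabled-thinking request body with the top-level flag would look roughly like this; the model name and message are placeholders, and only `enable_thinking` is the relevant field:
```json
{
  "model": "qwen-plus",
  "messages": [
    { "role": "user", "content": "Summarize the release notes." }
  ],
  "enable_thinking": false
}
```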
## Multimodal add-ons
The `qwen` plugin also exposes multimodal capabilities on the **Standard**


@@ -131,7 +131,7 @@ Use explicit config when:
<Accordion title="Qwen thinking controls">
For Qwen models served through vLLM, set
-`compat.thinkingFormat: "qwen-chat-template"` on the model entry when the
+`params.qwenThinkingFormat: "chat-template"` on the model entry when the
server expects Qwen chat-template kwargs. OpenClaw maps `/think off` to:
```json
@@ -145,8 +145,8 @@ Use explicit config when:
Non-`off` thinking levels send `enable_thinking: true`. If your endpoint
expects DashScope-style top-level flags instead, use
-`compat.thinkingFormat: "qwen"` to send `enable_thinking` at the request
-root.
+`params.qwenThinkingFormat: "top-level"` to send `enable_thinking` at the
+request root. Snake-case `params.qwen_thinking_format` is also accepted.
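As a rough sketch, a model entry that opts into the top-level format could look like the fragment below; the surrounding entry shape is illustrative, and only the `params.qwenThinkingFormat` key comes from this section:
```jsonc
{
  // Illustrative model entry; every field except params.qwenThinkingFormat is an assumption.
  "id": "qwen3-32b",
  "params": {
    "qwenThinkingFormat": "top-level"
  }
}
```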
</Accordion>