The image-generation page was 395 lines with a 3-step quick-start
written as plain numbered prose, a sprawling 'OpenAI gpt-image-2'
section that mixed routing/legacy/OpenAI options with five inline
slash-command examples, and provider tables that mixed alphabetic
and recency order.
Restructure for scan-first reading without losing technical content:
- Wrap Quick start in a Steps component (auth -> default model ->
ask the agent), pulling the Codex OAuth note inline with the model
step where it belongs and surfacing the LAN/SSRF caveat as a
Warning callout.
- Alphabetize the Supported providers table (ComfyUI, fal, Google,
LiteLLM, MiniMax, OpenAI, OpenRouter, Vydra, xAI) and the Provider
capabilities table (same order across both). Convert the Yes/No
capability table to checkmarks plus exact counts for readability.
- Replace the long inline OpenAI / OpenRouter / MiniMax / xAI prose
with a 'Provider deep dives' AccordionGroup so each backend's
routing, legacy URL handling, and provider-specific knobs collapse
by default.
- Move the four provider-selection-order notes into a small
AccordionGroup ('Per-call overrides are exact', 'Auto-detection is
auth-aware', 'Timeouts', 'Inspect at runtime').
- Collapse the five flat slash-command examples into a single Tabs
component (4K landscape / transparent PNG / two-square /
edit-one-ref / edit-multi-ref) with the matching CLI variant inline
on the transparent-PNG tab.
- Sentence-case the Related list (Tools overview, Configuration
reference) and drop the redundant generic introductory wording.
- Add sidebarTitle so the nav reads 'Image generation' explicitly.
Wording, schema fields, defaults, model refs, env vars, and the
detailed OpenAI/OpenRouter/Codex routing rules are unchanged.
The media overview was a 91-line page that opened with a redundant
Title-Case body H1 ('# Media Generation and Understanding'), then
mixed a capability table, a Yes/Yes/Yes provider matrix, dense prose
about async behaviour and STT/Voice Call surfaces, plus duplicate
'Quick links' and 'Related' sections at the end.
Restructure for scan-first reading without losing any content:
- Drop the redundant body H1; lead with a one-paragraph summary.
- Replace the 'Capabilities at a glance' table with a CardGroup of six
entry cards (Image / Video / Music / TTS / Media understanding / STT)
each linking directly to its dedicated page. Mode (sync/async) is
noted on the card so readers see latency expectations up front.
- Convert the provider matrix to checkmarks for readability and align
the column header names. Provider rows were already alphabetized.
- Pull async vs synchronous behaviour into a 5-row table that names
why each capability is sync or async, then keep the operator-facing
paragraph that explains task-id handoff.
- Move the long 'Google maps to ... OpenAI maps to ... xAI maps to ...'
paragraph into a per-vendor AccordionGroup so each mapping is a
collapsible panel instead of one large prose block.
- Drop duplicate 'Quick links' section in favour of a single Related
list, sentence-cased to match the rest of the docs.
The default sidebar label fell back to the title 'Text-to-speech', which is fine
on the page header but readers scanning the Tools sidebar look for the
acronym 'TTS'. Add a sidebarTitle so Mintlify renders 'Text to speech
(TTS)' in the sidebar while keeping the canonical page title intact.
Sentence case matches the rest of the Tools sidebar group (e.g.
'Image generation', 'Music generation', 'Video generation').
Preserve exact Telegram selected quote text for native quote replies, share Telegram reply parameter construction between bot delivery and direct outbound sends, and retry with legacy replies when Telegram rejects native quote parameters.

Thanks @rubencu.
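A minimal sketch of the fallback described above, for illustration only. The field names `reply_parameters`, `message_id`, `quote`, and `reply_to_message_id` come from the Telegram Bot API; `buildReplyOptions`, `sendWithQuoteFallback`, and `SendFn` are hypothetical names, not the project's helpers. The sketch also retries on any send error, which is a simplification of "when Telegram rejects native quote parameters".

```typescript
// Hypothetical shapes; only the Telegram field names are real Bot API fields.
type NativeQuoteReply = { reply_parameters: { message_id: number; quote: string } };
type LegacyReply = { reply_to_message_id: number };
type ReplyOptions = NativeQuoteReply | LegacyReply;
type SendFn = (text: string, options: ReplyOptions) => Promise<void>;

// One shared builder, so bot delivery and direct outbound sends agree.
// The quote is passed through untouched: no trimming or normalization.
function buildReplyOptions(messageId: number, quote?: string): ReplyOptions {
  return quote === undefined
    ? { reply_to_message_id: messageId }
    : { reply_parameters: { message_id: messageId, quote } };
}

// Try a native quote reply first; if the send fails, retry once as a legacy reply.
// Assumption: any error triggers the retry (a real implementation would
// likely inspect the error before falling back).
async function sendWithQuoteFallback(
  send: SendFn,
  text: string,
  messageId: number,
  quote: string,
): Promise<"native" | "legacy"> {
  try {
    await send(text, buildReplyOptions(messageId, quote));
    return "native";
  } catch {
    await send(text, buildReplyOptions(messageId));
    return "legacy";
  }
}
```

Routing both send paths through one builder keeps the quote text and the fallback shape from drifting apart between callers.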
- docs/tools/tts.md: alphabetize providers in three places that listed
them: the supported-providers table (Azure Speech ... Xiaomi MiMo),
the configuration Tabs (12 provider presets in A-Z), and the field
reference AccordionGroup. Top-level fields stay first; provider
tabs/accordions follow strict alphabetical order. Wording, schema,
and defaults unchanged.
- docs/docs.json: add tools/tts to the main Tools sidebar group
(slotted between trajectory and video-generation, matching the
alphabetical neighborhood with image-generation, music-generation,
video-generation). Previously tts only appeared under
Nodes > Media capabilities, which was a discoverability gap for
readers looking for TTS alongside the other generation tools.
The TTS doc had grown to 1008 lines with 11 separate flat 'X primary'
config blocks, a 100-line dense 'Notes on fields' bullet list, and
the new provider-personas feature (#70748) buried near the bottom.
Restructure for readability and feature visibility:
- Lead with a Steps-based 'Quick start' so first-time readers can
enable TTS in 4 explicit steps.
- Replace the 13-bullet provider list with a single 'Supported
providers' table that names auth env vars and per-provider notes
inline. Add a Warning callout for the Microsoft/edge legacy alias.
- Collapse the 11 'X primary' config blocks into one Tabs component
('OpenAI + ElevenLabs', 'Google Gemini', 'Azure Speech',
'Microsoft (no key)', 'MiniMax', 'Inworld', 'xAI', 'Volcengine',
'Xiaomi MiMo', 'OpenRouter', 'Gradium', 'Local CLI') so users see
one preset at a time and the page is scannable.
- Promote 'Personas' to its own top-level section with two examples
(minimal and the Alfred provider-neutral persona), and add a new
'How providers use persona prompts' AccordionGroup covering Google
(promptTemplate audio-profile-v1, personaPrompt), OpenAI
(instructions auto-mapping), and Other providers, plus a fallback
policy table.
- Note that agents.list[].tts.persona overrides global persona
per-agent (covers the recent feat(tts) per-agent voice-override
work).
- Convert the 100-line 'Notes on fields' wall into a per-provider
AccordionGroup using ParamField, so the field reference is
scannable and field types/defaults are visually distinct.
- Sentence-case headings, drop redundant body H1, fold the flow
diagram inline with Auto-TTS behavior, and refresh the Output
formats section to a table-first layout.
- Schema fields (label/description/provider/fallbackPolicy/prompt
with profile/scene/sampleContext/style/accent/pacing/constraints
and providers map) verified against src/config/types.tts.ts; all
defaults and env-var fallbacks preserved verbatim.
Net diff: 585 insertions, 684 deletions across the same surface
area.
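The per-agent persona override noted above can be sketched as follows. This is an illustration under assumptions: only the `agents.list[].tts.persona` path comes from the text; the global `tts.persona` location, the `id` field, and the `resolvePersona` helper are hypothetical and may not match src/config/types.tts.ts.

```typescript
// Hypothetical config shapes for illustration only.
interface TtsConfig {
  persona?: string;
}
interface AgentEntry {
  id: string;
  tts?: TtsConfig;
}
interface Config {
  tts?: TtsConfig; // assumed location of the global persona
  agents?: { list?: AgentEntry[] };
}

// The per-agent persona wins; otherwise fall back to the global persona.
function resolvePersona(config: Config, agentId: string): string | undefined {
  const agent = config.agents?.list?.find((entry) => entry.id === agentId);
  return agent?.tts?.persona ?? config.tts?.persona;
}
```

With a global persona of `narrator` and one agent overriding it with `alfred`, that agent resolves to `alfred` while every other agent keeps `narrator`.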
Two recent commits added user-facing surface that left signature-style
references in docs stale:
- 4428661779 Alvin Tang (#20721, thanks @alvinttang) extends the
configured model 'input' modality set to also accept 'audio' and
'video', matching what providers like LM Studio already report.
docs/plugins/manifest.md model-fields table listed only
'text | image | document', so add 'audio' and 'video'.
- 44da034516 Vincent (thanks @oc-factus) adds a bounded openclaw.agent
attribute on the openclaw.tokens counter so per-agent dashboards can
group usage. docs/gateway/opentelemetry.md metric reference omitted
it; add it to the attrs list.
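For reference, the widened 'input' modality set could be modeled roughly like this. The five values come from the text above; the names `INPUT_MODALITIES`, `InputModality`, and `isInputModality` are hypothetical and are not taken from the codebase.

```typescript
// The three documented values plus the two newly accepted ones.
const INPUT_MODALITIES = ["text", "image", "document", "audio", "video"] as const;
type InputModality = (typeof INPUT_MODALITIES)[number];

// Type guard for validating a configured modality string.
function isInputModality(value: string): value is InputModality {
  return (INPUT_MODALITIES as readonly string[]).includes(value);
}
```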
Honor the parent `models auth --agent <id>` flag across auth write commands: `add`, `login`, `setup-token`, `paste-token`, and `login-github-copilot`.
The auth helpers now resolve the requested configured agent before choosing the auth-profile store and provider workspace, while preserving default-agent behavior when `--agent` is omitted.
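The resolution order can be sketched as below. This is an illustration, not the actual helper: `AgentEntry`, `isDefault`, and `resolveAuthAgent` are hypothetical names, and the error behavior for an unknown id is an assumption.

```typescript
// Hypothetical agent shape for illustration only.
interface AgentEntry {
  id: string;
  isDefault?: boolean;
}

// Resolve the agent whose auth-profile store and workspace should be used.
// With no --agent value, keep the default-agent behavior.
function resolveAuthAgent(agents: AgentEntry[], requestedId?: string): AgentEntry {
  if (requestedId === undefined) {
    const fallback = agents.find((agent) => agent.isDefault) ?? agents[0];
    if (!fallback) throw new Error("no agents configured");
    return fallback;
  }
  const match = agents.find((agent) => agent.id === requestedId);
  // Assumption: an unknown id is an error rather than a silent fallback.
  if (!match) throw new Error(`unknown agent: ${requestedId}`);
  return match;
}
```

Resolving the agent first, before touching any store, means each auth write command can share one lookup instead of repeating it.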
Validation:
- `pnpm test src/cli/models-cli.test.ts src/commands/models/auth.test.ts`
- `pnpm test src/commands/models/auth.test.ts`
- `pnpm docs:check-mdx`
- `pnpm check:changed`
- `pnpm check`
- `pnpm build`
- `pnpm test src/cli/run-main.test.ts`
Full `pnpm test` was also run; it failed on unrelated `src/cli/run-main.test.ts` assertions when that file ran in full-suite order, while the same file passes in isolation on both latest main and this branch. The PR diff only touches models auth CLI/auth files, docs, and changelog.
Fixes #71864.
Thanks @neeravmakwana.