The skills doc was 409 lines of nested bullet lists describing
precedence, allowlist rules, gating fields, installer specs, and
config overrides. Heavy reference content but no Mintlify structure.
Restructure for scan-first reading without losing reference detail:
- Convert 'Locations and precedence' from a numbered list + arrow
string into a 6-row precedence table.
- Convert 'Per-agent vs shared skills' bullet/paragraph mix into a
scope/path/visibility table.
- Move agent-allowlist rules into an AccordionGroup so the example
config is the headline and the rules collapse on demand.
- Convert ClawHub install/update/sync bullets into a 3-row command
table.
- Convert SKILL.md frontmatter optional keys (homepage, user-invocable,
disable-model-invocation, command-dispatch, command-tool,
command-arg-mode) into ParamField definitions.
- Convert metadata.openclaw fields (always, emoji, homepage, os,
requires.bins, requires.anyBins, requires.env, requires.config,
primaryEnv, install) into ParamField definitions.
- Move installer-selection rules and per-installer details (Go, download)
into an AccordionGroup so the gating section reads as the canonical
schema plus collapsible operational notes.
- Convert skills.entries config-override rules (enabled, apiKey, env,
config, allowBundled) into ParamField definitions.
- Surface security caveats as a Warning callout up top instead of a
bullet list.
- Move 'Looking for more skills?' into the trailing Related list and
drop the dangling --- separator.
- Sentence-case headings (Format, Gating, Config overrides, Token
impact) and the Related entries (Creating skills, Skills config,
Slash commands).
- Drop the redundant 'Skill Workshop' Title-Case heading variant.
- Add sidebarTitle 'Skills' for explicit nav.
Skill source paths, frontmatter parser rules, gating semantics,
installer selection logic, sandboxing notes, env-injection scope,
snapshot/refresh behaviour, remote-node behaviour, and token-impact
formula are unchanged. Pure restructure plus Mintlify components.
The voice-call plugin doc was 664 lines with a flat install/setup
walkthrough, three flat 'Realtime' / 'Streaming' / 'TTS' provider
config blocks each shown twice, an italicised webhook-security
section in Title Case, and a duplicate-Voice Call body H1.
Restructure for scan-first reading without losing operational detail:
- Wrap Quick start in a Steps component (install -> configure ->
verify -> smoke), with the 'install from npm' vs 'install from
local folder' choice as a nested Tabs.
- Surface the public-webhook-URL constraint as a Warning at the top
of Quick start so readers see it before they hit setup.
- Move provider exposure caveats, streaming connection caps, and
legacy config migration notes into a single AccordionGroup so
the Configuration section reads as the canonical config plus
collapsible operational details.
- Convert the Realtime, Streaming, and TTS provider examples to
Tabs with one tab per provider (Google/OpenAI for realtime;
OpenAI/xAI for streaming; Core/ElevenLabs/OpenAI override for TTS),
removing the previous duplicate-block-per-provider pattern.
- Convert the realtime tool-policy bullet list to a 3-row table.
- Convert the agent tool action list and gateway RPC list into
small tables (action -> args).
- Surface inboundPolicy caller-ID weakness, microsoft-not-supported
for telephony, and realtime+streaming exclusivity as Warning
callouts where they were previously buried inline.
- Sentence-case 'Webhook security' (was Title Case), drop the
duplicate body H1, and refresh the Related list to alphabetical
sentence-case.
Provider names, env vars, defaults, models, voice ids, command
flags, and field semantics are unchanged. Pure restructure plus
Mintlify component upgrades.
The video-generation page was 454 lines with a 3-step Quick start
written as flat numbered prose, four separate parameter tables (Required,
Content inputs, Style controls, Advanced), the task lifecycle as a
numbered list, and a Related list mixing alphabetic and recency order.
Restructure for scan-first reading without losing technical content:
- Wrap Quick start in a Steps component (auth -> default model ->
ask the agent).
- Convert all four parameter tables into ParamField definitions grouped
under their existing sub-section headings (Required / Content inputs /
Style controls / Advanced), so types, defaults, and required flags
show as visual chips and long descriptions wrap cleanly.
- Convert the task lifecycle from a numbered list to a 4-row table for
at-a-glance scanning.
- Convert Yes/No checkmarks in both the Supported providers and
Capability matrix tables to ✓ and em-dash, matching the rest of the
media docs.
- Convert the bullet list under Actions into a 3-row table.
- Sentence-case Related entries and alphabetize the Related list.
- Add sidebarTitle so the nav reads 'Video generation' explicitly.
Schema fields, defaults, model refs, env vars, capability declarations,
fallback rules, and provider notes are unchanged. AccordionGroup of 14
provider notes was already alphabetized and is preserved verbatim.
The music-generation page was 291 lines with two side-by-side
'Quick start' subsections (shared provider-backed vs. ComfyUI
workflow), a flat parameter table, two prose paragraphs explaining
async behaviour and task lifecycle, and a 'Provider notes' bullet
list mixed with a separate 'Choosing the right path' section.
Restructure for scan-first reading without losing technical content:
- Wrap Quick start in a top-level Tabs with two child Steps blocks
(Shared provider-backed | ComfyUI workflow), so readers pick a path
first and only see the matching steps.
- Convert the tool parameter list to ParamField definitions with
type signatures and required flags surfaced visually.
- Convert the four async-behaviour bullets to a labelled bullet list
and the four-state task lifecycle to a table for at-a-glance
scanning.
- Change Capability matrix Yes/No values to checkmarks/em-dashes for
alignment with the rest of the media docs.
- Convert the 'Provider notes' free-form paragraphs into an
AccordionGroup keyed by provider (ComfyUI / Google Lyria 3 /
MiniMax), keeping wording faithful.
- Sentence-case Related entries and add sidebarTitle so the nav reads
'Music generation' explicitly.
Provider rows already alphabetized in the supported providers table
(ComfyUI / Google / MiniMax), kept that order. Wording, model refs,
defaults, env vars, and capability declarations are unchanged.
The image-generation page was 395 lines with a 3-step quick-start
written as plain numbered prose, a sprawling 'OpenAI gpt-image-2'
section that mixed routing/legacy/OpenAI options with five inline
slash-command examples, and provider tables that mixed alphabetic
and recency order.
Restructure for scan-first reading without losing technical content:
- Wrap Quick start in a Steps component (auth -> default model ->
ask the agent), pulling the Codex OAuth note inline with the model
step where it belongs and surfacing the LAN/SSRF caveat as a
Warning callout.
- Alphabetize the Supported providers table (ComfyUI, fal, Google,
LiteLLM, MiniMax, OpenAI, OpenRouter, Vydra, xAI) and the Provider
capabilities table (same order across both). Convert the Yes/No
capability table to checkmarks plus exact counts for readability.
- Replace the long inline OpenAI / OpenRouter / MiniMax / xAI prose
with a 'Provider deep dives' AccordionGroup so each backend's
routing, legacy URL handling, and provider-specific knobs collapse
by default.
- Move the four provider-selection-order notes into a small
AccordionGroup ('Per-call overrides are exact', 'Auto-detection is
auth-aware', 'Timeouts', 'Inspect at runtime').
- Collapse the five flat slash-command examples into a single Tabs
component (4K landscape / transparent PNG / two-square /
edit-one-ref / edit-multi-ref) with the matching CLI variant inline
on the transparent-PNG tab.
- Sentence-case the Related list (Tools overview, Configuration
reference) and drop the redundant generic introductory wording.
- Add sidebarTitle so the nav reads 'Image generation' explicitly.
Wording, schema fields, defaults, model refs, env vars, and the
detailed OpenAI/OpenRouter/Codex routing rules are unchanged.
The media overview was a 91-line page that opened with a redundant
Title-Case body H1 ('# Media Generation and Understanding'), then
mixed a capability table, a Yes/Yes/Yes provider matrix, dense prose
about async behaviour and STT/Voice Call surfaces, plus duplicate
'Quick links' and 'Related' sections at the end.
Restructure for scan-first reading without losing any content:
- Drop the redundant body H1; lead with a one-paragraph summary.
- Replace the 'Capabilities at a glance' table with a CardGroup of six
entry cards (Image / Video / Music / TTS / Media understanding / STT)
each linking directly to its dedicated page. Mode (sync/async) is
noted on the card so readers see latency expectations up front.
- Convert the provider matrix to checkmarks for readability and align
the column header names. Provider rows already alphabetized.
- Pull async vs synchronous behaviour into a 5-row table that names
why each capability is sync or async, then keep the operator-facing
paragraph that explains task-id handoff.
- Move the long 'Google maps to ... OpenAI maps to ... xAI maps to ...'
paragraph into a per-vendor AccordionGroup so each mapping is a
collapsible panel instead of one large prose block.
- Drop duplicate 'Quick links' section in favour of a single Related
list, sentence-cased to match the rest of the docs.
Default sidebar label fell back to title 'Text-to-speech', which is fine
on the page header but readers scanning the Tools sidebar look for the
acronym 'TTS'. Add a sidebarTitle so Mintlify renders 'Text to speech
(TTS)' in the sidebar while keeping the canonical page title intact.
Sentence case matches the rest of the Tools sidebar group (e.g.
'Image generation', 'Music generation', 'Video generation').
Preserve exact Telegram selected quote text for native quote replies, share Telegram reply parameter construction between bot delivery and direct outbound sends, and retry with legacy replies when Telegram rejects native quote parameters.\n\nThanks @rubencu.
- docs/tools/tts.md: alphabetize providers in three places that listed
them: the supported-providers table (Azure Speech ... Xiaomi MiMo),
the configuration Tabs (12 provider presets in A-Z), and the field
reference AccordionGroup. Top-level fields stay first; provider
tabs/accordions follow strict alphabetical order. Wording, schema,
and defaults unchanged.
- docs/docs.json: add tools/tts to the main Tools sidebar group
(slotted between trajectory and video-generation, matching the
alphabetical neighborhood with image-generation, music-generation,
video-generation). Previously tts only appeared under
Nodes > Media capabilities, which was a discoverability gap for
readers looking for TTS alongside the other generation tools.
The TTS doc had grown to 1008 lines with 11 separate flat 'X primary'
config blocks, a 100-line dense 'Notes on fields' bullet list, and
the new provider-personas feature (#70748) buried near the bottom.
Restructure for readability and feature visibility:
- Lead with a Steps-based 'Quick start' so first-time readers can
enable TTS in 4 explicit steps.
- Replace the 13-bullet provider list with a single 'Supported
providers' table that names auth env vars and per-provider notes
inline. Add a Warning callout for the Microsoft/edge legacy alias.
- Collapse the 11 'X primary' config blocks into one Tabs component
('OpenAI + ElevenLabs', 'Google Gemini', 'Azure Speech',
'Microsoft (no key)', 'MiniMax', 'Inworld', 'xAI', 'Volcengine',
'Xiaomi MiMo', 'OpenRouter', 'Gradium', 'Local CLI') so users see
one preset at a time and the page is scannable.
- Promote 'Personas' to its own top-level section with two examples
(minimal and the Alfred provider-neutral persona), and add a new
'How providers use persona prompts' AccordionGroup covering Google
(promptTemplate audio-profile-v1, personaPrompt), OpenAI
(instructions auto-mapping), and Other providers, plus a fallback
policy table.
- Note that agents.list[].tts.persona overrides global persona
per-agent (covers the recent feat(tts) per-agent voice-override
work).
- Convert the 100-line 'Notes on fields' wall into a per-provider
AccordionGroup using ParamField, so the field reference is
scannable and field types/defaults are visually distinct.
- Sentence-case headings, drop redundant body H1, fold the flow
diagram inline with Auto-TTS behavior, and refresh the Output
formats section to a table-first layout.
- Schema fields (label/description/provider/fallbackPolicy/prompt
with profile/scene/sampleContext/style/accent/pacing/constraints
and providers map) verified against src/config/types.tts.ts; all
defaults and env-var fallbacks preserved verbatim.
Net diff: 585 insertions, 684 deletions across the same surface
area.