Default sidebar label fell back to title 'Text-to-speech', which is fine
on the page header but readers scanning the Tools sidebar look for the
acronym 'TTS'. Add a sidebarTitle so Mintlify renders 'Text to speech
(TTS)' in the sidebar while keeping the canonical page title intact.
Sentence case matches the rest of the Tools sidebar group (e.g.
'Image generation', 'Music generation', 'Video generation').
Preserve exact Telegram selected quote text for native quote replies, share Telegram reply parameter construction between bot delivery and direct outbound sends, and retry with legacy replies when Telegram rejects native quote parameters.\n\nThanks @rubencu.
- docs/tools/tts.md: alphabetize providers in three places that listed
them: the supported-providers table (Azure Speech ... Xiaomi MiMo),
the configuration Tabs (12 provider presets in A-Z), and the field
reference AccordionGroup. Top-level fields stay first; provider
tabs/accordions follow strict alphabetical order. Wording, schema,
and defaults unchanged.
- docs/docs.json: add tools/tts to the main Tools sidebar group
(slotted between trajectory and video-generation, matching the
alphabetical neighborhood with image-generation, music-generation,
video-generation). Previously tts only appeared under
Nodes > Media capabilities, which was a discoverability gap for
readers looking for TTS alongside the other generation tools.
The TTS doc had grown to 1008 lines with 11 separate flat 'X primary'
config blocks, a 100-line dense 'Notes on fields' bullet list, and
the new provider-personas feature (#70748) buried near the bottom.
Restructure for readability and feature visibility:
- Lead with a Steps-based 'Quick start' so first-time readers can
enable TTS in 4 explicit steps.
- Replace the 13-bullet provider list with a single 'Supported
providers' table that names auth env vars and per-provider notes
inline. Add a Warning callout for the Microsoft/edge legacy alias.
- Collapse the 11 'X primary' config blocks into one Tabs component
('OpenAI + ElevenLabs', 'Google Gemini', 'Azure Speech',
'Microsoft (no key)', 'MiniMax', 'Inworld', 'xAI', 'Volcengine',
'Xiaomi MiMo', 'OpenRouter', 'Gradium', 'Local CLI') so users see
one preset at a time and the page is scannable.
- Promote 'Personas' to its own top-level section with two examples
(minimal and the Alfred provider-neutral persona), and add a new
'How providers use persona prompts' AccordionGroup covering Google
(promptTemplate audio-profile-v1, personaPrompt), OpenAI
(instructions auto-mapping), and Other providers, plus a fallback
policy table.
- Note that agents.list[].tts.persona overrides global persona
per-agent (covers the recent feat(tts) per-agent voice-override
work).
- Convert the 100-line 'Notes on fields' wall into a per-provider
AccordionGroup using ParamField, so the field reference is
scannable and field types/defaults are visually distinct.
- Sentence-case headings, drop redundant body H1, fold the flow
diagram inline with Auto-TTS behavior, and refresh the Output
formats section to a table-first layout.
- Schema fields (label/description/provider/fallbackPolicy/prompt
with profile/scene/sampleContext/style/accent/pacing/constraints
and providers map) verified against src/config/types.tts.ts; all
defaults and env-var fallbacks preserved verbatim.
Net diff: 585 insertions, 684 deletions across the same surface
area.
Two recent commits added user-facing surface that left signature-style
references in docs stale:
- 4428661779 Alvin Tang (#20721, thanks @alvinttang) extends the
configured model 'input' modality set to also accept 'audio' and
'video', matching what providers like LM Studio already report.
docs/plugins/manifest.md model-fields table listed only
'text | image | document', so add 'audio' and 'video'.
- 44da034516 Vincent (thanks @oc-factus) adds a bounded openclaw.agent
attribute on the openclaw.tokens counter so per-agent dashboards can
group usage. docs/gateway/opentelemetry.md metric reference omitted
it; add it to the attrs list.
Honor the parent `models auth --agent <id>` flag across auth write commands: `add`, `login`, `setup-token`, `paste-token`, and `login-github-copilot`.
The auth helpers now resolve the requested configured agent before choosing the auth-profile store and provider workspace, while preserving default-agent behavior when `--agent` is omitted.
Validation:
- `pnpm test src/cli/models-cli.test.ts src/commands/models/auth.test.ts`
- `pnpm test src/commands/models/auth.test.ts`
- `pnpm docs:check-mdx`
- `pnpm check:changed`
- `pnpm check`
- `pnpm build`
- `pnpm test src/cli/run-main.test.ts`
Full `pnpm test` was also run; it failed in unrelated `src/cli/run-main.test.ts` assertions during the full-suite order, while the exact file passes on both latest main and this branch. The PR diff only touches models auth CLI/auth files, docs, and changelog.
Fixes#71864.
Thanks @neeravmakwana.
* fix: add placeholder transcript for silent voice notes
* fix: handle placeholder transcripts per skipped attachment
* fix: preserve synthetic transcript attachment order
* fix: scope synthetic audio merge to audio slice only, preserve cross-capability and prefer ordering
Replace the global outputs.sort() with a targeted merge that:
1. Only sorts within the audio output slice (real + synthetic),
preserving CAPABILITY_ORDER and per-capability attachments.prefer
ordering for non-audio outputs.
2. Excludes synthetic placeholder indexes from audioAttachmentIndexes
used by extractFileBlocks, so tiny audio-MIME files with text
extensions can still be recovered via forcedTextMime.
Adds mergeAudioOutputsPreservingAttachmentOrder helper.
* fix: remove unused function and use toSorted() for oxlint compliance
* fix(media-understanding): preserve selected audio order for synthetic placeholders
- merge synthetic skipped-audio placeholders using audio decision order
instead of raw attachmentIndex sorting, preserving attachments.prefer
- insert synthetic-only audio outputs at the audio capability slot
(before video) when no real audio outputs were produced
* fix(media-understanding): use neutral too-small placeholder text
Clarify that this synthetic transcript path is triggered by attachment size,
not by a silence/no-speech detection result.
* test(media-understanding): update too-small audio placeholder expectations
* test(media-understanding): cover mixed too-small audio placeholder
* test(media-understanding): cover too-small audio context
* fix(tasks): preserve visible task title before internal context
* Revert "fix(tasks): preserve visible task title before internal context"
This reverts commit dc536fb4d3c8a01168de5d05e8562193dd68a88e.
---------
Co-authored-by: Eulices Lopez <eulices@users.noreply.github.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>