52 Commits

Author SHA1 Message Date
Peter Steinberger
24853ced11 docs: outline unified talk API 2026-05-06 02:39:15 +01:00
Peter Steinberger
2dfa4b082a docs: sync docs with source truth 2026-05-02 21:45:03 +01:00
Peter Steinberger
c02605253d fix: require explicit TTS intent 2026-05-02 03:16:57 +01:00
Peter Steinberger
5e3265b09b feat: support openai tts extra body 2026-05-01 22:57:35 +01:00
Peter Steinberger
0294aebe6f feat(providers): add DeepInfra provider plugin (#73038)
* feat(providers): add DeepInfra provider plugin

* feat(deepinfra): add media provider surfaces

* fix(deepinfra): satisfy provider boundary checks

* docs: add gitcrawl maintainer skill

* test: include deepinfra in live media sweeps

* fix: remove stale tts contract import
2026-04-28 01:12:54 +01:00
Peter Steinberger
d419fb561d feat(tts): resolve channel account config generically 2026-04-26 08:10:36 +01:00
Peter Steinberger
d613c8e29b refactor(tts): resolve voice delivery from channel capabilities 2026-04-26 07:03:25 +01:00
Vincent Koc
724e92505a docs(tts): add sidebarTitle 'Text to speech (TTS)' for the nav
Default sidebar label fell back to title 'Text-to-speech', which is fine
on the page header but readers scanning the Tools sidebar look for the
acronym 'TTS'. Add a sidebarTitle so Mintlify renders 'Text to speech
(TTS)' in the sidebar while keeping the canonical page title intact.

Sentence case matches the rest of the Tools sidebar group (e.g.
'Image generation', 'Music generation', 'Video generation').
2026-04-25 22:11:31 -07:00
Vincent Koc
fbd6b3ce3c docs(tts): A-Z order providers and add tools/tts to Tools nav group
- docs/tools/tts.md: alphabetize providers in three places that listed
  them: the supported-providers table (Azure Speech ... Xiaomi MiMo),
  the configuration Tabs (12 provider presets in A-Z), and the field
  reference AccordionGroup. Top-level fields stay first; provider
  tabs/accordions follow strict alphabetical order. Wording, schema,
  and defaults unchanged.
- docs/docs.json: add tools/tts to the main Tools sidebar group
  (slotted between trajectory and video-generation, matching the
  alphabetical neighborhood with image-generation, music-generation,
  video-generation). Previously tts only appeared under
  Nodes > Media capabilities, which was a discoverability gap for
  readers looking for TTS alongside the other generation tools.
2026-04-25 22:05:46 -07:00
Vincent Koc
71b79f49ad docs(tts): rewrite tts.md around personas with Mintlify components
The TTS doc had grown to 1008 lines with 11 separate flat 'X primary'
config blocks, a 100-line dense 'Notes on fields' bullet list, and
the new provider-personas feature (#70748) buried near the bottom.
Restructure for readability and feature visibility:

- Lead with a Steps-based 'Quick start' so first-time readers can
  enable TTS in 4 explicit steps.
- Replace the 13-bullet provider list with a single 'Supported
  providers' table that names auth env vars and per-provider notes
  inline. Add a Warning callout for the Microsoft/edge legacy alias.
- Collapse the 11 'X primary' config blocks into one Tabs component
  ('OpenAI + ElevenLabs', 'Google Gemini', 'Azure Speech',
  'Microsoft (no key)', 'MiniMax', 'Inworld', 'xAI', 'Volcengine',
  'Xiaomi MiMo', 'OpenRouter', 'Gradium', 'Local CLI') so users see
  one preset at a time and the page is scannable.
- Promote 'Personas' to its own top-level section with two examples
  (minimal and the Alfred provider-neutral persona), and add a new
  'How providers use persona prompts' AccordionGroup covering Google
  (promptTemplate audio-profile-v1, personaPrompt), OpenAI
  (instructions auto-mapping), and Other providers, plus a fallback
  policy table.
- Note that agents.list[].tts.persona overrides global persona
  per-agent (covers the recent feat(tts) per-agent voice-override
  work).
- Convert the 100-line 'Notes on fields' wall into a per-provider
  AccordionGroup using ParamField, so the field reference is
  scannable and field types/defaults are visually distinct.
- Sentence-case headings, drop redundant body H1, fold the flow
  diagram inline with Auto-TTS behavior, and refresh the Output
  formats section to a table-first layout.
- Schema fields (label/description/provider/fallbackPolicy/prompt
  with profile/scene/sampleContext/style/accent/pacing/constraints
  and providers map) verified against src/config/types.tts.ts; all
  defaults and env-var fallbacks preserved verbatim.

Net diff: 585 insertions, 684 deletions across the same surface
area.
2026-04-25 22:00:19 -07:00
Barron Roth
0594fa3c4d TTS: add provider personas 2026-04-26 09:42:38 +05:30
Peter Steinberger
2c8c79de5c fix(tts): normalize streamed tts voice media 2026-04-26 04:28:19 +01:00
Peter Steinberger
a91baa16de fix(tts): honor explicit directive providers 2026-04-26 04:14:48 +01:00
Peter Steinberger
cf834e2a21 fix(tts): clean streamed directive text 2026-04-26 04:09:56 +01:00
Peter Steinberger
7a85c1a822 fix(tts): surface voice status and harden providers 2026-04-26 03:51:30 +01:00
Peter Steinberger
97ae1c7c2e feat(tts): add read-latest voice command 2026-04-26 03:44:44 +01:00
Peter Steinberger
6855b33255 docs(tts): clarify WhatsApp voice-note delivery 2026-04-26 03:28:51 +01:00
Peter Steinberger
9b91040053 fix(tts): route WhatsApp MP3 TTS as voice notes 2026-04-26 03:26:00 +01:00
Peter Steinberger
9b4f0779ce fix(tts): honor per-agent config in tts commands 2026-04-26 03:12:30 +01:00
Peter Steinberger
0ca952cdd5 feat(tts): add per-agent voice overrides 2026-04-26 02:54:13 +01:00
Peter Steinberger
5b80d0c15e feat(tts): add Azure Speech provider
Co-authored-by: Leon Chui <84605354+leonchui@users.noreply.github.com>
2026-04-26 01:42:51 +01:00
Rui Xu
1531123d35 feat(tts): add BytePlus Seed Speech provider
Add Volcengine/BytePlus Seed Speech as a bundled TTS provider with current API-key auth, legacy AppID/token fallback, native Ogg/Opus voice-note output, and MP3 audio-file output.

Co-authored-by: Peter Steinberger <steipete@gmail.com>
2026-04-25 23:46:04 +01:00
Cale Shapera
0bcb4c95c1 feat(tts): add Inworld speech provider (#55972)
Adds the bundled Inworld speech provider with docs, config surface, SSRF-guarded fetches, directive overrides, native voice-note/telephony output coverage, and live `.profile` verification.

Co-authored-by: cshape <cshape@users.noreply.github.com>
2026-04-25 22:33:21 +01:00
Peter Steinberger
e2fd3dcee9 fix(google): emit opus voice-note tts 2026-04-25 21:33:33 +01:00
Peter Steinberger
9ffe764416 fix(whatsapp): send voice note text separately 2026-04-25 18:55:03 +01:00
Peter Steinberger
b511250e5c feat(media): add voice conversion and speech plugins 2026-04-25 12:12:33 +01:00
Peter Steinberger
a7604f8170 fix(minimax): support token plan tts auth 2026-04-25 10:36:12 +01:00
Peter Steinberger
ec8dbc4595 feat(tts): add xiaomi mimo speech provider 2026-04-25 09:48:05 +01:00
Peter Steinberger
b0c55eb659 fix(feishu): transcode voice TTS audio 2026-04-25 09:26:42 +01:00
Peter Steinberger
8acc92c881 feat(google): support Gemini TTS style profile 2026-04-25 06:11:23 +01:00
Peter Steinberger
e31aef7e19 fix(tts): migrate legacy edge config in doctor 2026-04-25 05:55:54 +01:00
Peter Steinberger
c03e5b3c3a docs(tts): clarify legacy provider migration 2026-04-25 05:01:09 +01:00
Peter Steinberger
978a50a3c5 fix(minimax): normalize tts pitch for api 2026-04-25 04:58:20 +01:00
Peter Steinberger
225ff9a866 fix(minimax): transcode voice-note tts to opus 2026-04-25 04:52:25 +01:00
Peter Steinberger
7875092f4d feat(openrouter): add tts provider 2026-04-25 04:36:49 +01:00
Laurent Mazare
d7e2939791 feat: add Gradium text-to-speech provider (#64958)
Adds the Gradium bundled plugin with TTS and speech-provider registration, docs, label routing, and focused/live coverage.

Also carries the current main lint cleanup needed for the rebased CI lane.

Co-authored-by: laurent <laurent.mazare@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 18:43:53 +01:00
Peter Steinberger
f0a7a85e7a feat(agents): add generation tool timeouts 2026-04-24 00:05:38 +01:00
Vincent Koc
789e71cdb8 docs: remove H1 on pages where frontmatter + summary already cover the parenthetical 2026-04-23 15:47:48 -07:00
Vincent Koc
6667f66fd8 docs(tools): add Related sections and unify See also to Related 2026-04-23 15:41:56 -07:00
Vincent Koc
2777b089b5 docs: normalize frontmatter titles to sentence case 2026-04-23 13:15:17 -07:00
KateWilkins
f342da5fcc feat: add xai media providers
Add xAI image generation and text-to-speech provider support with docs, live tests, and guarded provider HTTP handling.\n\nThanks @KateWilkins.
2026-04-23 00:07:39 +01:00
Barron Roth
bf59917cd1 fix: add Google Gemini TTS provider (#67515) (thanks @barronlroth)
* Add Google Gemini TTS provider

* Remove committed planning artifact

* Explain Google media provider type shape

* google: distill Gemini TTS provider

* fix: add Google Gemini TTS provider (#67515) (thanks @barronlroth)

* fix: honor cfg-backed Google TTS selection (#67515) (thanks @barronlroth)

* fix: narrow Google TTS directive aliases (#67515) (thanks @barronlroth)

---------

Co-authored-by: Ayaan Zaidi <hi@obviy.us>
2026-04-16 11:54:35 +05:30
Marcus Castro
403783a3b1 fix(tts): correct tagged TTS syntax guidance (#65573) 2026-04-12 19:41:13 -03:00
Gustavo Madeira Santana
17a2290f49 Docs: refresh schema, slash commands, and TTS refs 2026-04-08 01:10:00 -04:00
gnuduncan
e934211170 fix(minimax): use global TTS endpoint default and add missing Talk Mode overrides
Switch DEFAULT_MINIMAX_TTS_BASE_URL from api.minimaxi.com (CN) to
api.minimax.io (global) so international API keys work out of the box.
Add vol and pitch to resolveTalkOverrides for parity with resolveTalkConfig.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 09:19:45 +01:00
gnuduncan
7d7f5d85b4 feat(minimax): add native TTS speech provider (T2A v2)
Add MiniMax as a fourth TTS provider alongside OpenAI, ElevenLabs, and
Microsoft. Registers a SpeechProviderPlugin in the existing minimax
extension with config resolution, directive parsing, and Talk Mode
support. Hex-encoded audio response from the T2A v2 API is decoded to
MP3.

Closes #52720

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 09:19:45 +01:00
Josh Avant
44674525f2 feat(tts): add structured provider diagnostics and fallback attempt analytics (#57954)
* feat(tts): add structured fallback diagnostics and attempt analytics

* docs(tts): document attempt-detail and provider error diagnostics

* TTS: harden fallback loops and share error helpers

* TTS: bound provider error-body reads

* tts: add double-prefix regression test and clean baseline drift

* tests(tts): satisfy error narrowing in double-prefix regression

* changelog

Signed-off-by: joshavant <830519+joshavant@users.noreply.github.com>

---------

Signed-off-by: joshavant <830519+joshavant@users.noreply.github.com>
2026-03-30 22:55:28 -05:00
Josh Avant
c918ab4faf fix(tts): restore 3.28 schema compatibility and fallback observability (#57953)
* fix(tts): restore legacy config compatibility and fallback observability

* fix(tts): surface fallback attempts in status and telephony

* test(tts): cover /tts audio to /tts status fallback flow

* docs(tts): align migration and fallback observability guidance

* TTS: redact fallback logs and scope legacy plugin migration

* Infra: dedupe UV_EXTRA_INDEX_URL in host env policy

* Docs: scope doctor TTS migration to voice-call

* voice-call: restore strict known TTS provider validation
2026-03-30 22:05:03 -05:00
Peter Steinberger
01bcbcf8d5 refactor: require legacy config migration on read 2026-03-26 23:23:47 +00:00
Jealous
2c3cf4f387 chore(tts): rename VOICE_BUBBLE identifiers to OPUS and update docs 2026-03-25 10:49:21 +05:30