Commit Graph

1313 Commits

Author SHA1 Message Date
Vincent Koc
6e3eeb526f docs(video-generation): rewrite around Steps, ParamField, AZ providers
The video-generation page was 454 lines with a 3-step Quick start
written as flat numbered prose, four separate parameter tables (Required,
Content inputs, Style controls, Advanced), the task lifecycle as a
numbered list, and a Related list mixing alphabetic and recency order.

Restructure for scan-first reading without losing technical content:

- Wrap Quick start in a Steps component (auth -> default model ->
  ask the agent).
- Convert all four parameter tables into ParamField definitions grouped
  under their existing sub-section headings (Required / Content inputs /
  Style controls / Advanced), so types, defaults, and required flags
  show as visual chips and long descriptions wrap cleanly.
- Convert the task lifecycle from a numbered list to a 4-row table for
  at-a-glance scanning.
- Convert the Yes/No values in both the Supported providers and
  Capability matrix tables to ✓ and em-dash, matching the rest of the
  media docs.
- Convert the bullet list under Actions into a 3-row table.
- Sentence-case Related entries and alphabetize the Related list.
- Add sidebarTitle so the nav reads 'Video generation' explicitly.

Schema fields, defaults, model refs, env vars, capability declarations,
fallback rules, and provider notes are unchanged. The AccordionGroup of
14 provider notes was already alphabetized and is preserved verbatim.
2026-04-25 22:27:56 -07:00
Vincent Koc
d531760898 docs(music-generation): rewrite around Steps, Tabs, and provider Accordion
The music-generation page was 291 lines with two side-by-side
'Quick start' subsections (shared provider-backed vs. ComfyUI
workflow), a flat parameter table, two prose paragraphs explaining
async behaviour and task lifecycle, and a 'Provider notes' bullet
list mixed with a separate 'Choosing the right path' section.

Restructure for scan-first reading without losing technical content:

- Wrap Quick start in a top-level Tabs with two child Steps blocks
  (Shared provider-backed | ComfyUI workflow), so readers pick a path
  first and only see the matching steps.
- Convert the tool parameter list to ParamField definitions with
  type signatures and required flags surfaced visually.
- Convert the four async-behaviour bullets to a labelled bullet list
  and the four-state task lifecycle to a table for at-a-glance
  scanning.
- Change Capability matrix Yes/No values to checkmarks/em-dashes for
  alignment with the rest of the media docs.
- Convert the 'Provider notes' free-form paragraphs into an
  AccordionGroup keyed by provider (ComfyUI / Google Lyria 3 /
  MiniMax), keeping wording faithful.
- Sentence-case Related entries and add sidebarTitle so the nav reads
  'Music generation' explicitly.

Provider rows in the Supported providers table were already
alphabetized (ComfyUI / Google / MiniMax), so that order is kept. Wording,
model refs, defaults, env vars, and capability declarations are unchanged.
2026-04-25 22:24:58 -07:00
Vincent Koc
f0ea901a0d docs(image-generation): rewrite around Steps, Tabs, and AZ providers
The image-generation page was 395 lines with a 3-step quick-start
written as plain numbered prose, a sprawling 'OpenAI gpt-image-2'
section that mixed routing/legacy/OpenAI options with five inline
slash-command examples, and provider tables that mixed alphabetic
and recency order.

Restructure for scan-first reading without losing technical content:

- Wrap Quick start in a Steps component (auth -> default model ->
  ask the agent), pulling the Codex OAuth note inline with the model
  step where it belongs and surfacing the LAN/SSRF caveat as a
  Warning callout.
- Alphabetize the Supported providers table (ComfyUI, fal, Google,
  LiteLLM, MiniMax, OpenAI, OpenRouter, Vydra, xAI) and the Provider
  capabilities table (same order across both). Convert the Yes/No
  capability table to checkmarks plus exact counts for readability.
- Replace the long inline OpenAI / OpenRouter / MiniMax / xAI prose
  with a 'Provider deep dives' AccordionGroup so each backend's
  routing, legacy URL handling, and provider-specific knobs collapse
  by default.
- Move the four provider-selection-order notes into a small
  AccordionGroup ('Per-call overrides are exact', 'Auto-detection is
  auth-aware', 'Timeouts', 'Inspect at runtime').
- Collapse the five flat slash-command examples into a single Tabs
  component (4K landscape / transparent PNG / two-square /
  edit-one-ref / edit-multi-ref) with the matching CLI variant inline
  on the transparent-PNG tab.
- Sentence-case the Related list (Tools overview, Configuration
  reference) and drop the redundant generic introductory wording.
- Add sidebarTitle so the nav reads 'Image generation' explicitly.

Wording, schema fields, defaults, model refs, env vars, and the
detailed OpenAI/OpenRouter/Codex routing rules are unchanged.
2026-04-25 22:23:09 -07:00
Vincent Koc
d1502c2ba1 docs(media-overview): rewrite around CardGroup, sync/async split, and AZ providers
The media overview was a 91-line page that opened with a redundant
Title-Case body H1 ('# Media Generation and Understanding'), then
mixed a capability table, a Yes/Yes/Yes provider matrix, dense prose
about async behaviour and STT/Voice Call surfaces, plus duplicate
'Quick links' and 'Related' sections at the end.

Restructure for scan-first reading without losing any content:

- Drop the redundant body H1; lead with a one-paragraph summary.
- Replace the 'Capabilities at a glance' table with a CardGroup of six
  entry cards (Image / Video / Music / TTS / Media understanding / STT)
  each linking directly to its dedicated page. Mode (sync/async) is
  noted on the card so readers see latency expectations up front.
- Convert the provider matrix to checkmarks for readability and align
  the column header names. Provider rows already alphabetized.
- Pull async vs synchronous behaviour into a 5-row table that names
  why each capability is sync or async, then keep the operator-facing
  paragraph that explains task-id handoff.
- Move the long 'Google maps to ... OpenAI maps to ... xAI maps to ...'
  paragraph into a per-vendor AccordionGroup so each mapping is a
  collapsible panel instead of one large prose block.
- Drop duplicate 'Quick links' section in favour of a single Related
  list, sentence-cased to match the rest of the docs.
2026-04-25 22:20:35 -07:00
Vincent Koc
724e92505a docs(tts): add sidebarTitle 'Text to speech (TTS)' for the nav
The default sidebar label fell back to the page title 'Text-to-speech',
which works in the page header, but readers scanning the Tools sidebar
look for the acronym 'TTS'. Add a sidebarTitle so Mintlify renders
'Text to speech (TTS)' in the sidebar while keeping the canonical page
title intact.

Sentence case matches the rest of the Tools sidebar group (e.g.
'Image generation', 'Music generation', 'Video generation').
2026-04-25 22:11:31 -07:00
Vincent Koc
fbd6b3ce3c docs(tts): A-Z order providers and add tools/tts to Tools nav group
- docs/tools/tts.md: alphabetize providers in three places that listed
  them: the supported-providers table (Azure Speech ... Xiaomi MiMo),
  the configuration Tabs (12 provider presets in A-Z), and the field
  reference AccordionGroup. Top-level fields stay first; provider
  tabs/accordions follow strict alphabetical order. Wording, schema,
  and defaults unchanged.
- docs/docs.json: add tools/tts to the main Tools sidebar group
  (slotted between trajectory and video-generation, matching the
  alphabetical neighborhood with image-generation, music-generation,
  video-generation). Previously tts only appeared under
  Nodes > Media capabilities, which was a discoverability gap for
  readers looking for TTS alongside the other generation tools.
2026-04-25 22:05:46 -07:00
Vincent Koc
71b79f49ad docs(tts): rewrite tts.md around personas with Mintlify components
The TTS doc had grown to 1008 lines with 11 separate flat 'X primary'
config blocks, a 100-line dense 'Notes on fields' bullet list, and
the new provider-personas feature (#70748) buried near the bottom.
Restructure for readability and feature visibility:

- Lead with a Steps-based 'Quick start' so first-time readers can
  enable TTS in 4 explicit steps.
- Replace the 13-bullet provider list with a single 'Supported
  providers' table that names auth env vars and per-provider notes
  inline. Add a Warning callout for the Microsoft/edge legacy alias.
- Collapse the 11 'X primary' config blocks into one Tabs component
  ('OpenAI + ElevenLabs', 'Google Gemini', 'Azure Speech',
  'Microsoft (no key)', 'MiniMax', 'Inworld', 'xAI', 'Volcengine',
  'Xiaomi MiMo', 'OpenRouter', 'Gradium', 'Local CLI') so users see
  one preset at a time and the page is scannable.
- Promote 'Personas' to its own top-level section with two examples
  (minimal and the Alfred provider-neutral persona), and add a new
  'How providers use persona prompts' AccordionGroup covering Google
  (promptTemplate audio-profile-v1, personaPrompt), OpenAI
  (instructions auto-mapping), and Other providers, plus a fallback
  policy table.
- Note that agents.list[].tts.persona overrides the global persona on a
  per-agent basis (covers the recent feat(tts) per-agent voice-override
  work).
- Convert the 100-line 'Notes on fields' wall into a per-provider
  AccordionGroup using ParamField, so the field reference is
  scannable and field types/defaults are visually distinct.
- Sentence-case headings, drop redundant body H1, fold the flow
  diagram inline with Auto-TTS behavior, and refresh the Output
  formats section to a table-first layout.
- Schema fields (label/description/provider/fallbackPolicy/prompt
  with profile/scene/sampleContext/style/accent/pacing/constraints
  and providers map) verified against src/config/types.tts.ts; all
  defaults and env-var fallbacks preserved verbatim.

Net diff: 585 insertions, 684 deletions across the same surface
area.
2026-04-25 22:00:19 -07:00
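Commit 71b79f49ad notes that `agents.list[].tts.persona` overrides the global persona for a single agent. The precedence rule can be sketched as follows. This is a minimal illustration, not the project's code. The `TtsSettings` and `AgentEntry` shapes and the `resolveEffectivePersona` helper are hypothetical, and persona values are assumed to be plain identifier strings. The real schema lives in `src/config/types.tts.ts`.

```typescript
// Hypothetical config shapes for illustration only; the real schema in
// src/config/types.tts.ts is richer than this.
interface TtsSettings {
  persona?: string; // assumed: a persona identifier
}

interface AgentEntry {
  id: string;
  tts?: TtsSettings;
}

// Hypothetical helper: a per-agent persona wins when set;
// otherwise the global persona applies.
function resolveEffectivePersona(
  globalTts: TtsSettings | undefined,
  agent: AgentEntry | undefined,
): string | undefined {
  return agent?.tts?.persona ?? globalTts?.persona;
}

const globalTts: TtsSettings = { persona: "narrator" };
const agents: AgentEntry[] = [
  { id: "butler", tts: { persona: "alfred" } }, // per-agent override
  { id: "helper" }, // no override, inherits the global persona
];

for (const agent of agents) {
  console.log(agent.id, resolveEffectivePersona(globalTts, agent));
}
```

Using `??` rather than `||` means only a missing (`undefined`/`null`) per-agent value falls through to the global one.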
Peter Steinberger
6a67f65568 fix(voice): reuse preflight transcripts across channels 2026-04-26 05:42:04 +01:00
Barron Roth
0594fa3c4d TTS: add provider personas 2026-04-26 09:42:38 +05:30
Peter Steinberger
9ed11d6c49 fix: steer agents to safe gateway config flow 2026-04-26 05:00:17 +01:00
Peter Steinberger
540c70d166 fix(plugins): ignore bundled load path aliases 2026-04-26 04:46:05 +01:00
Peter Steinberger
4edf22f63f fix(acpx): avoid startup agent probes by default 2026-04-26 04:40:26 +01:00
Peter Steinberger
ed1ac2fc44 feat(browser): add CDP role snapshot fallback 2026-04-26 04:40:26 +01:00
Peter Steinberger
6d4f65c9d4 docs: clarify codex runtime routing 2026-04-26 04:38:39 +01:00
Peter Steinberger
2c8c79de5c fix(tts): normalize streamed tts voice media 2026-04-26 04:28:19 +01:00
Peter Steinberger
a91baa16de fix(tts): honor explicit directive providers 2026-04-26 04:14:48 +01:00
Peter Steinberger
cf834e2a21 fix(tts): clean streamed directive text 2026-04-26 04:09:56 +01:00
Peter Steinberger
7a85c1a822 fix(tts): surface voice status and harden providers 2026-04-26 03:51:30 +01:00
Peter Steinberger
97ae1c7c2e feat(tts): add read-latest voice command 2026-04-26 03:44:44 +01:00
Peter Steinberger
3989510251 docs: expand ACP agents guide 2026-04-26 03:42:44 +01:00
Peter Steinberger
f0fa35082b fix: keep ACP completion prompts harness-safe 2026-04-26 03:39:24 +01:00
Peter Steinberger
6855b33255 docs(tts): clarify WhatsApp voice-note delivery 2026-04-26 03:28:51 +01:00
Peter Steinberger
9b91040053 fix(tts): route WhatsApp MP3 TTS as voice notes 2026-04-26 03:26:00 +01:00
Peter Steinberger
9b4f0779ce fix(tts): honor per-agent config in tts commands 2026-04-26 03:12:30 +01:00
Peter Steinberger
a6d9926d1d fix: keep acp management commands local 2026-04-26 03:02:04 +01:00
Peter Steinberger
0ca952cdd5 feat(tts): add per-agent voice overrides 2026-04-26 02:54:13 +01:00
Shivanker Goel
a932a58e87 feat(fal): support Seedance reference video
Adds fal Seedance 2.0 reference-to-video support with model-aware reference input limits.
2026-04-26 02:30:23 +01:00
Peter Steinberger
5b80d0c15e feat(tts): add Azure Speech provider
Co-authored-by: Leon Chui <84605354+leonchui@users.noreply.github.com>
2026-04-26 01:42:51 +01:00
Peter Steinberger
81c2a1de26 test: add Droid ACP bind Docker lane 2026-04-26 01:31:27 +01:00
Peter Steinberger
e6ee4d6e68 fix(browser): preserve tabs across target swaps 2026-04-26 01:21:59 +01:00
Vincent Koc
f3accc753c feat(plugins): add before agent finalize hook (#71765) 2026-04-25 17:21:17 -07:00
Peter Steinberger
3a4325b285 fix: prevent duplicate channel plugin tools 2026-04-26 01:06:11 +01:00
Shakker
babbad81a9 fix: preserve plugin install records without manifests 2026-04-26 01:03:13 +01:00
Shakker
37ce39b5c5 docs: describe plugin install index store 2026-04-26 01:03:12 +01:00
Peter Steinberger
8e12c24d17 fix: prefer native codex app-server controls 2026-04-26 00:59:02 +01:00
Peter Steinberger
12c16576cd fix: gate acp spawn affordances 2026-04-26 00:30:27 +01:00
Peter Steinberger
41b27024bb docs(gateway): clarify backend RPC pairing 2026-04-26 00:26:35 +01:00
Rui Xu
1531123d35 feat(tts): add BytePlus Seed Speech provider
Add Volcengine/BytePlus Seed Speech as a bundled TTS provider with current API-key auth, legacy AppID/token fallback, native Ogg/Opus voice-note output, and MP3 audio-file output.

Co-authored-by: Peter Steinberger <steipete@gmail.com>
2026-04-25 23:46:04 +01:00
Peter Steinberger
b1b29a8fc2 fix: stabilize remote skill node probes 2026-04-25 23:42:02 +01:00
Peter Steinberger
b721f1dbad fix: update Ollama web search endpoint 2026-04-25 22:34:43 +01:00
Cale Shapera
0bcb4c95c1 feat(tts): add Inworld speech provider (#55972)
Adds the bundled Inworld speech provider with docs, config surface, SSRF-guarded fetches, directive overrides, native voice-note/telephony output coverage, and live `.profile` verification.

Co-authored-by: cshape <cshape@users.noreply.github.com>
2026-04-25 22:33:21 +01:00
Peter Steinberger
2febe72108 fix: isolate ACP spawned runs 2026-04-25 22:06:53 +01:00
Peter Steinberger
9e9e024188 docs: clarify ACP model override support 2026-04-25 21:52:36 +01:00
Peter Steinberger
e2fd3dcee9 fix(google): emit opus voice-note tts 2026-04-25 21:33:33 +01:00
Tars
d5b6667823 fix(minimax): enable portal music and video generation 2026-04-25 21:30:10 +01:00
Peter Steinberger
6a7b76e119 fix(acp): guard sessions_spawn runtime targets 2026-04-25 21:23:24 +01:00
Vincent Koc
793b58b3f1 fix(plugins): add doctor registry repair 2026-04-25 12:45:43 -07:00
Peter Steinberger
75d64cd4b8 feat: expose generic image background option 2026-04-25 20:21:46 +01:00
Quratulain-bilal
7d58362f3f docs(browser): note tilde expansion also covers per-profile paths (#71601)
* docs(browser): note tilde expansion also covers per-profile paths

The 95a2c9b fix expanded "~" for both `browser.executablePath` and
per-profile `profiles.<name>.executablePath` (config.ts:382 calls
`normalizeExecutablePath` for profile overrides). Per-profile
`userDataDir` on existing-session profiles is also tilde-expanded
(config.ts:391 via `resolveUserPath`). The configuration reference
only mentioned the top-level `browser.executablePath` case.

* docs(browser): align tilde path config help

---------

Co-authored-by: Peter Steinberger <steipete@gmail.com>
2026-04-25 20:05:03 +01:00
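Commit 7d58362f3f notes that "~" expansion covers `browser.executablePath`, per-profile `profiles.<name>.executablePath`, and per-profile `userDataDir`. The sketch below is a minimal illustration of that behavior and does not reproduce the project's `normalizeExecutablePath` or `resolveUserPath`. The `expandTilde` and `expandBrowserPaths` helpers and the config interfaces are hypothetical. The sketch also simplifies one point: it expands `userDataDir` on every profile, whereas the commit says this happens on existing-session profiles.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical shapes covering the three paths named in the commit.
interface ProfileConfig {
  executablePath?: string;
  userDataDir?: string;
}

interface BrowserConfig {
  executablePath?: string;
  profiles?: Record<string, ProfileConfig>;
}

// Expand a leading "~" (alone, or followed by a separator) to the home directory.
// "~user" forms are deliberately left untouched.
function expandTilde(p: string, home: string = os.homedir()): string {
  if (p === "~") return home;
  if (p.startsWith("~/") || p.startsWith("~\\")) return path.join(home, p.slice(2));
  return p;
}

// Apply the expansion to the top-level path and to every per-profile override.
function expandBrowserPaths(cfg: BrowserConfig, home: string = os.homedir()): BrowserConfig {
  const expand = (p: string | undefined) => (p === undefined ? undefined : expandTilde(p, home));
  const profiles: Record<string, ProfileConfig> = {};
  for (const [name, profile] of Object.entries(cfg.profiles ?? {})) {
    profiles[name] = {
      ...profile,
      executablePath: expand(profile.executablePath),
      userDataDir: expand(profile.userDataDir),
    };
  }
  return { ...cfg, executablePath: expand(cfg.executablePath), profiles };
}
```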
Quratulain-bilal
8170df9127 docs(browser): document local startup timeout bounds (#71672)
* docs(browser): document local startup timeout bounds

The new browser.localLaunchTimeoutMs and browser.localCdpReadyTimeoutMs
options are clamped to MAX_BROWSER_STARTUP_TIMEOUT_MS (120000 ms) by
normalizeStartupTimeoutMs in extensions/browser/src/browser/config.ts,
and zero/negative/non-finite values fall back to the defaults. Without
this in the configuration reference, users who set a higher value get
no error and silently hit the 120 s ceiling, and users who set 0
expecting 'no timeout' silently get the default.

* docs(browser): clarify startup timeout validation

---------

Co-authored-by: Peter Steinberger <steipete@gmail.com>
2026-04-25 19:59:53 +01:00
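Commit 8170df9127 documents a validation rule for `browser.localLaunchTimeoutMs` and `browser.localCdpReadyTimeoutMs`. Values above `MAX_BROWSER_STARTUP_TIMEOUT_MS` (120000 ms) are clamped, and zero, negative, or non-finite values fall back to the default. The following is a minimal sketch of that rule, not the real `normalizeStartupTimeoutMs`. The `clampStartupTimeoutMs` name is hypothetical. The commit does not state the actual default values, so the default is passed in as a parameter.

```typescript
// Ceiling stated in the commit message: 120000 ms.
const MAX_BROWSER_STARTUP_TIMEOUT_MS = 120_000;

// Hypothetical sketch of the documented rule, not the project's implementation.
function clampStartupTimeoutMs(value: unknown, defaultMs: number): number {
  // Non-numbers, NaN, Infinity, zero, and negatives all fall back to the default.
  if (typeof value !== "number" || !Number.isFinite(value) || value <= 0) {
    return defaultMs;
  }
  // Larger values are clamped silently; no error is raised.
  return Math.min(value, MAX_BROWSER_STARTUP_TIMEOUT_MS);
}
```

With a placeholder default of 15000 ms, `clampStartupTimeoutMs(300_000, 15_000)` yields the 120000 ms ceiling and `clampStartupTimeoutMs(0, 15_000)` yields 15000. This is the silent behavior the commit warns users about.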