fix(tts): restore 3.28 schema compatibility and fallback observability (#57953)

* fix(tts): restore legacy config compatibility and fallback observability
* fix(tts): surface fallback attempts in status and telephony
* test(tts): cover /tts audio to /tts status fallback flow
* docs(tts): align migration and fallback observability guidance
* TTS: redact fallback logs and scope legacy plugin migration
* Infra: dedupe UV_EXTRA_INDEX_URL in host env policy
* Docs: scope doctor TTS migration to voice-call
* voice-call: restore strict known TTS provider validation
2026-05-05 08:40:21 +00:00 · 2026-03-30 22:05:03 -05:00
parent 697dddbeb6
commit c918ab4faf
19 changed files with 838 additions and 154 deletions
--- a/docs/gateway/doctor.md
+++ b/docs/gateway/doctor.md
@@ -122,6 +122,10 @@ Current migrations:
 - `routing.agents`/`routing.defaultAgentId` → `agents.list` + `agents.list[].default`
 - `routing.agentToAgent` → `tools.agentToAgent`
 - `routing.transcribeAudio` → `tools.media.audio.models`
+- `messages.tts.<provider>` (`openai`/`elevenlabs`/`microsoft`/`edge`) → `messages.tts.providers.<provider>`
+- `channels.discord.voice.tts.<provider>` (`openai`/`elevenlabs`/`microsoft`/`edge`) → `channels.discord.voice.tts.providers.<provider>`
+- `channels.discord.accounts.<id>.voice.tts.<provider>` (`openai`/`elevenlabs`/`microsoft`/`edge`) → `channels.discord.accounts.<id>.voice.tts.providers.<provider>`
+- `plugins.entries.voice-call.config.tts.<provider>` (`openai`/`elevenlabs`/`microsoft`/`edge`) → `plugins.entries.voice-call.config.tts.providers.<provider>`
 - `bindings[].match.accountID` → `bindings[].match.accountId`
 - For channels with named `accounts` but missing `accounts.default`, move account-scoped top-level single-account channel values into `channels.<channel>.accounts.default` when present
 - `identity` → `agents.list[].identity`
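The four provider-block migrations above share one pattern: a known provider key sitting directly under a `tts` block moves into that block's `providers` map. The TypeScript below is a minimal sketch of that move under stated assumptions, not the doctor's actual code. The function name `migrateLegacyTtsProviders`, the loose `TtsConfig` type, and the rule that values already under `providers` win on conflict are all hypothetical. The same function could be applied to each of the four `tts` paths listed.

```typescript
// Hypothetical sketch of the legacy `tts.<provider>` -> `tts.providers.<provider>` move.
// Names, types, and the conflict rule are assumptions, not the doctor implementation.

type TtsConfig = Record<string, unknown>;

// Provider ids named in the migration list above.
const LEGACY_PROVIDER_KEYS = ["openai", "elevenlabs", "microsoft", "edge"] as const;

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns a migrated copy; the input object is left untouched.
function migrateLegacyTtsProviders(tts: TtsConfig): TtsConfig {
  const next: TtsConfig = { ...tts };
  const providers: Record<string, unknown> = isPlainObject(tts.providers)
    ? { ...tts.providers }
    : {};
  let moved = false;

  for (const key of LEGACY_PROVIDER_KEYS) {
    const legacy = next[key];
    if (!isPlainObject(legacy)) continue;

    // Assumption: keys already under `providers.<key>` override legacy ones.
    const existing = providers[key];
    providers[key] = isPlainObject(existing) ? { ...legacy, ...existing } : legacy;

    delete next[key];
    moved = true;
  }

  // Only introduce `providers` when something was actually migrated.
  if (moved) next.providers = providers;
  return next;
}

console.log(
  JSON.stringify(migrateLegacyTtsProviders({ provider: "openai", openai: { voice: "alloy" } })),
);
// prints {"provider":"openai","providers":{"openai":{"voice":"alloy"}}}
```

Returning a copy instead of mutating keeps the loaded config available unchanged, which is convenient if a caller wants to compare the before and after shapes.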
--- a/docs/plugins/voice-call.md
+++ b/docs/plugins/voice-call.md
@@ -219,9 +219,11 @@ streaming speech on calls. You can override it under the plugin config with the
 {
  tts: {
    provider: "elevenlabs",
-    elevenlabs: {
-      voiceId: "pMsXgVXv3BLzUgSXRplE",
-      modelId: "eleven_multilingual_v2",
+    providers: {
+      elevenlabs: {
+        voiceId: "pMsXgVXv3BLzUgSXRplE",
+        modelId: "eleven_multilingual_v2",
+      },
    },
  },
 }
@@ -229,9 +231,11 @@ streaming speech on calls. You can override it under the plugin config with the

 Notes:

+- Legacy `tts.<provider>` keys inside plugin config (`openai`, `elevenlabs`, `microsoft`, `edge`) are auto-migrated to `tts.providers.<provider>` on load. Prefer the `providers` shape in committed config.
 - **Microsoft speech is ignored for voice calls** (telephony audio needs PCM; the current Microsoft transport does not expose telephony PCM output).
 - Core TTS is used when Twilio media streaming is enabled; otherwise calls fall back to provider native voices.
 - If a Twilio media stream is already active, Voice Call does not fall back to TwiML `<Say>`. If telephony TTS is unavailable in that state, the playback request fails instead of mixing two playback paths.
+- When telephony TTS falls back to a secondary provider, Voice Call logs a warning with the provider chain (`from`, `to`, `attempts`) for debugging.
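The fallback warning described in the last note can be pictured as an ordered provider loop. This is a hypothetical sketch: `synthesizeWithFallback`, the `Synthesize` and `Warn` callback types, and the shape of each attempt record are illustrative names, not the plugin's API. Only the `from`, `to`, and `attempts` fields come from the note above. Failed attempts keep just the error's class name, an assumption made in the spirit of the commit's "redact fallback logs" item.

```typescript
// Hypothetical sketch: try telephony TTS providers in order and warn on fallback.
// All names and types are illustrative assumptions, not the Voice Call API.

type Attempt = { provider: string; ok: boolean; error?: string };

type Synthesize = (provider: string, text: string) => Promise<Uint8Array>;

type Warn = (message: string, meta: { from: string; to: string; attempts: Attempt[] }) => void;

async function synthesizeWithFallback(
  chain: readonly string[],
  text: string,
  synthesize: Synthesize,
  warn: Warn,
): Promise<{ provider: string; audio: Uint8Array; attempts: Attempt[] }> {
  const attempts: Attempt[] = [];
  const primary = chain[0];

  for (const provider of chain) {
    try {
      const audio = await synthesize(provider, text);
      attempts.push({ provider, ok: true });
      // Warn only when a provider other than the primary produced the audio.
      if (primary !== undefined && provider !== primary) {
        warn("telephony TTS fell back to a secondary provider", {
          from: primary,
          to: provider,
          attempts: [...attempts],
        });
      }
      return { provider, audio, attempts };
    } catch (err) {
      // Keep only the error class name so provider messages stay out of logs
      // (assumed redaction rule).
      attempts.push({
        provider,
        ok: false,
        error: err instanceof Error ? err.name : "UnknownError",
      });
    }
  }

  throw new Error(
    `telephony TTS failed for all providers: ${attempts.map((a) => a.provider).join(", ")}`,
  );
}
```

Injecting `synthesize` and `warn` keeps the loop independent of any particular provider client or logger, which also makes the fallback path easy to exercise in tests.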

 ### More examples

@@ -242,7 +246,9 @@ Use core TTS only (no override):
  messages: {
    tts: {
      provider: "openai",
-      openai: { voice: "alloy" },
+      providers: {
+        openai: { voice: "alloy" },
+      },
    },
  },
 }
@@ -258,10 +264,12 @@ Override to ElevenLabs just for calls (keep core default elsewhere):
        config: {
          tts: {
            provider: "elevenlabs",
-            elevenlabs: {
-              apiKey: "elevenlabs_key",
-              voiceId: "pMsXgVXv3BLzUgSXRplE",
-              modelId: "eleven_multilingual_v2",
+            providers: {
+              elevenlabs: {
+                apiKey: "elevenlabs_key",
+                voiceId: "pMsXgVXv3BLzUgSXRplE",
+                modelId: "eleven_multilingual_v2",
+              },
            },
          },
        },
@@ -280,9 +288,11 @@ Override only the OpenAI model for calls (deep‑merge example):
      "voice-call": {
        config: {
          tts: {
-            openai: {
-              model: "gpt-4o-mini-tts",
-              voice: "marin",
+            providers: {
+              openai: {
+                model: "gpt-4o-mini-tts",
+                voice: "marin",
+              },
            },
          },
        },
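The deep-merge example above relies on the plugin's `tts` block being merged key by key over the core `messages.tts` block, so a call-only override can change one nested field and inherit the rest. The sketch below illustrates that merge rule on plain JSON-like objects. The `deepMerge` name is hypothetical, and treating arrays as whole values (replaced, not concatenated) is an assumption rather than documented behavior.

```typescript
// Hypothetical sketch of a recursive config merge; not the Voice Call implementation.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isJsonObject(value: Json | undefined): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Override values win; nested objects merge recursively, while strings,
// numbers, booleans, null, and arrays are replaced wholesale.
function deepMerge(base: JsonObject, override: JsonObject): JsonObject {
  const out: JsonObject = { ...base };
  for (const [key, value] of Object.entries(override)) {
    const current = out[key];
    out[key] = isJsonObject(current) && isJsonObject(value) ? deepMerge(current, value) : value;
  }
  return out;
}

// Core settings (as in `messages.tts`) plus a call-only override
// (as in `plugins.entries.voice-call.config.tts`) that changes only the model.
const coreTts: JsonObject = { provider: "openai", providers: { openai: { voice: "alloy" } } };
const callTts: JsonObject = { providers: { openai: { model: "gpt-4o-mini-tts" } } };

const effective = deepMerge(coreTts, callTts);
console.log(JSON.stringify(effective));
// prints {"provider":"openai","providers":{"openai":{"voice":"alloy","model":"gpt-4o-mini-tts"}}}
```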
--- a/docs/tools/tts.md
+++ b/docs/tools/tts.md
@@ -219,6 +219,7 @@ Then run:
 - `modelOverrides`: allow the model to emit TTS directives (on by default).
  - `allowProvider` defaults to `false` (provider switching is opt-in).
 - `providers.<id>`: provider-owned settings keyed by speech provider id.
+- Legacy direct provider blocks (`messages.tts.openai`, `messages.tts.elevenlabs`, `messages.tts.microsoft`, `messages.tts.edge`) are auto-migrated to `messages.tts.providers.<id>` on load.
 - `maxTextLength`: hard cap for TTS input (chars). `/tts audio` fails if exceeded.
 - `timeoutMs`: request timeout (ms).
 - `prefsPath`: override the local prefs JSON path (provider/limit/summary).
@@ -391,6 +392,9 @@ Notes:
 - `off|always|inbound|tagged` are per‑session toggles (`/tts on` is an alias for `/tts always`).
 - `limit` and `summary` are stored in local prefs, not the main config.
 - `/tts audio` generates a one-off audio reply (does not toggle TTS on).
+- `/tts status` reports fallback details for the latest attempt:
+  - success via fallback: `Fallback: <primary> -> <used>` plus `Attempts: ...`
+  - failure: `Error: ...` plus `Attempts: ...`
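The two `/tts status` shapes above can be produced from a record of the latest attempt chain. This TypeScript sketch is illustrative only: the `LastTtsAttempt` record, the `formatFallbackStatus` name, the `provider (outcome)` rendering inside `Attempts:`, and returning no extra lines when the primary succeeds directly are all assumptions, since the docs elide those details as `...`. Only the `Fallback: <primary> -> <used>` and `Error: ...` prefixes come from the text.

```typescript
// Hypothetical record of the most recent TTS attempt chain.
type AttemptOutcome = "success" | "failed";

interface LastTtsAttempt {
  primary: string; // provider configured as primary
  used?: string; // provider that produced audio, if any
  error?: string; // final error when every provider failed
  attempts: { provider: string; outcome: AttemptOutcome }[];
}

// Returns the extra status lines; empty when the primary succeeded directly (assumed).
function formatFallbackStatus(last: LastTtsAttempt): string[] {
  // Assumed rendering for the elided `Attempts: ...` part.
  const attemptsLine =
    "Attempts: " + last.attempts.map((a) => `${a.provider} (${a.outcome})`).join(", ");

  if (last.error !== undefined) {
    return [`Error: ${last.error}`, attemptsLine];
  }
  if (last.used !== undefined && last.used !== last.primary) {
    return [`Fallback: ${last.primary} -> ${last.used}`, attemptsLine];
  }
  return [];
}

console.log(
  formatFallbackStatus({
    primary: "openai",
    used: "elevenlabs",
    attempts: [
      { provider: "openai", outcome: "failed" },
      { provider: "elevenlabs", outcome: "success" },
    ],
  }).join("\n"),
);
// prints:
// Fallback: openai -> elevenlabs
// Attempts: openai (failed), elevenlabs (success)
```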

 ## Agent tool

--- a/docs/tts.md
+++ b/docs/tts.md
@@ -219,6 +219,7 @@ Then run:
 - `modelOverrides`: allow the model to emit TTS directives (on by default).
  - `allowProvider` defaults to `false` (provider switching is opt-in).
 - `providers.<id>`: provider-owned settings keyed by speech provider id.
+- Legacy direct provider blocks (`messages.tts.openai`, `messages.tts.elevenlabs`, `messages.tts.microsoft`, `messages.tts.edge`) are auto-migrated to `messages.tts.providers.<id>` on load.
 - `maxTextLength`: hard cap for TTS input (chars). `/tts audio` fails if exceeded.
 - `timeoutMs`: request timeout (ms).
 - `prefsPath`: override the local prefs JSON path (provider/limit/summary).
@@ -391,6 +392,9 @@ Notes:
 - `off|always|inbound|tagged` are per‑session toggles (`/tts on` is an alias for `/tts always`).
 - `limit` and `summary` are stored in local prefs, not the main config.
 - `/tts audio` generates a one-off audio reply (does not toggle TTS on).
+- `/tts status` reports fallback details for the latest attempt:
+  - success via fallback: `Fallback: <primary> -> <used>` plus `Attempts: ...`
+  - failure: `Error: ...` plus `Attempts: ...`

 ## Agent tool