fix(whatsapp): send voice note text separately

2026-05-06 17:00:50 +00:00 · 2026-04-25 18:54:33 +01:00
parent 617e1dd6bf
commit 9ffe764416
9 changed files with 64 additions and 16 deletions
--- a/docs/channels/whatsapp.md
+++ b/docs/channels/whatsapp.md
@@ -365,7 +365,7 @@ When the linked self number is also present in `allowFrom`, WhatsApp self-chat s
    - non-Ogg audio, including Microsoft Edge TTS MP3/WebM output, is transcoded to Ogg/Opus before PTT delivery
    - native Ogg/Opus audio is sent with `audio/ogg; codecs=opus` for voice-note compatibility
    - animated GIF playback is supported via `gifPlayback: true` on video sends
-    - captions are applied to the first media item when sending multi-media reply payloads
+    - captions are applied to the first media item when sending multi-media reply payloads; PTT voice notes are the exception: the audio is sent first and the visible text is sent separately, because WhatsApp clients do not render voice-note captions consistently
    - media source can be HTTP(S), `file://`, or local paths
  </Accordion>

--- a/docs/tools/tts.md
+++ b/docs/tools/tts.md
@@ -664,6 +664,8 @@ reply delivery. When the channel is Feishu, Matrix, Telegram, or WhatsApp,
 the audio is delivered as a voice message rather than a file attachment.
 Feishu can transcode non-Opus TTS output on this path when `ffmpeg` is
 available.
+WhatsApp sends visible text separately from PTT voice-note audio because clients
+do not consistently render captions on voice notes.
 It accepts optional `channel` and `timeoutMs` fields; `timeoutMs` is a
 per-call provider request timeout in milliseconds.
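
The delivery order described in both doc changes can be sketched roughly as follows. This is a minimal illustration, not the code touched by this commit: `MediaItem`, `OutboundMessage`, and `planWhatsAppSends` are hypothetical names, and placing the text immediately after the voice note (before any remaining media) is an assumption.

```typescript
// Hypothetical shapes for illustration only; not the project's real types.
type MediaItem = { url: string; ptt?: boolean };

type OutboundMessage =
  | { kind: "media"; url: string; ptt: boolean; caption?: string }
  | { kind: "text"; text: string };

// Plan the ordered list of WhatsApp sends for one reply payload.
function planWhatsAppSends(text: string, media: MediaItem[]): OutboundMessage[] {
  const visible = text.trim();
  const [first, ...rest] = media;

  // No media at all: send the text on its own, if there is any.
  if (first === undefined) {
    return visible ? [{ kind: "text", text: visible }] : [];
  }

  const out: OutboundMessage[] = [];
  if (first.ptt) {
    // PTT voice note: audio goes first with no caption, because clients
    // do not render voice-note captions consistently.
    out.push({ kind: "media", url: first.url, ptt: true });
    // Assumption: the visible text follows right after the audio.
    if (visible) out.push({ kind: "text", text: visible });
  } else {
    // Default rule: the caption is attached to the first media item.
    out.push({
      kind: "media",
      url: first.url,
      ptt: false,
      ...(visible ? { caption: visible } : {}),
    });
  }

  // Remaining media items are sent without captions.
  for (const item of rest) {
    out.push({ kind: "media", url: item.url, ptt: Boolean(item.ptt) });
  }
  return out;
}
```

Under these assumptions, a reply with text plus a single PTT item yields two sends (the uncaptioned audio, then the text), while a reply with text plus an image yields one captioned media send.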