fix(tts): pre-transcode synthesized audio to opus-in-CAF for native iMessage voice-memo bubbles via BlueBubbles (#72586)

End-to-end testing on macOS + BlueBubbles + ElevenLabs walked through three CAF flavors before landing on the format Apple's Messages.app actually emits when a user records a native iMessage voice memo:

- PCM int16 @ 44.1 kHz CAF: BlueBubbles' internal `afconvert -f m4af -d aac` conversion fails; the original CAF reaches iMessage but renders with 0 s duration.
- AAC @ 22.05 kHz mono CAF: BlueBubbles' conversion succeeds, but the server silently downgrades the delivery, sending the converted MP3 as a generic audio attachment.
- **Opus @ 24 kHz mono CAF**: the descriptor block is byte-identical to the one Apple's Messages.app produces; BlueBubbles passes the file through unchanged and iMessage renders a native voice-memo bubble with proper duration and waveform UI.
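
For reference, the winning flavor can be written as an `afconvert` argument list. This is a minimal sketch, assuming the `-f caff -d opus@24000 -c 1` flags (container, data format at sample rate, channel count); `buildOpusCafArgs` is a hypothetical helper for illustration, not the function this commit adds.

```typescript
// Hypothetical helper: builds an afconvert argument list that targets
// Opus @ 24 kHz mono inside a CAF container. The flag values are an
// assumption based on the format described in the list above.
function buildOpusCafArgs(inputPath: string, outputPath: string): string[] {
  return [
    "-f", "caff",       // container: Core Audio Format
    "-d", "opus@24000", // data format: Opus at a 24 kHz sample rate
    "-c", "1",          // channel count: mono
    inputPath,
    outputPath,
  ];
}
```

On a macOS host the list would be passed to `/usr/bin/afconvert`, for example through `execFile` from `node:child_process`.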

Adds an opt-in `tts.voice.preferAudioFileFormat` channel capability and a macOS `afconvert`-backed pre-transcode in the speech-core pipeline. BlueBubbles declares `preferAudioFileFormat: "caf"`. Other channels are unaffected. Falls back to the original buffer when the host platform, the source/target pair, or the transcoder process can't produce the preferred container, so behavior on non-Darwin hosts and with unsupported provider combinations is unchanged.
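
The fallback rule can be sketched as below. This is an illustration only: `applyPreferredFormat`, `TranscodeEnv`, and `SUPPORTED_PAIRS` are hypothetical names, the supported pair list is an assumption, and the sketch is synchronous for brevity where a real transcoder call would be asynchronous.

```typescript
type AudioResult = { buffer: Uint8Array; format: string };

// Hypothetical environment so the decision logic can be exercised on any host.
interface TranscodeEnv {
  platform: string; // e.g. process.platform
  // Returns the transcoded bytes, or null when the transcoder fails.
  transcode: (input: Uint8Array, from: string, to: string) => Uint8Array | null;
}

// Assumption: only this source/target pair is supported in the sketch.
const SUPPORTED_PAIRS = new Set(["mp3->caf"]);

function applyPreferredFormat(
  src: AudioResult,
  preferred: string | undefined,
  env: TranscodeEnv,
): AudioResult {
  // Channel did not opt in, or the audio is already in the preferred format.
  if (!preferred || preferred === src.format) return src;
  // afconvert only exists on macOS hosts.
  if (env.platform !== "darwin") return src;
  // Unsupported source/target pair.
  if (!SUPPORTED_PAIRS.has(`${src.format}->${preferred}`)) return src;
  try {
    const out = env.transcode(src.buffer, src.format, preferred);
    return out ? { buffer: out, format: preferred } : src;
  } catch {
    // Transcoder process failed: keep the original buffer.
    return src;
  }
}
```

Every failure path returns the original `src` object, so a channel that never declares a preference sees no change.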

Also adds a `caff` magic-byte sniff in `src/media/mime.ts` so the auto-reply host-local-media validator (which uses `file-type` and didn't recognize CAF natively) accepts the buffer instead of dropping it as "⚠️ Media failed."
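
A magic-byte sniff of this kind might look like the sketch below. CAF files begin with the four ASCII bytes `caff`; the function name `sniffCafMime` is hypothetical and is not necessarily what `src/media/mime.ts` exports.

```typescript
// ASCII "caff", the file-type tag at the start of a CAF file header.
const CAF_MAGIC = [0x63, 0x61, 0x66, 0x66];

// Hypothetical helper: returns the CAF MIME type when the buffer starts
// with the magic bytes, otherwise undefined so other sniffers can run.
function sniffCafMime(buffer: Uint8Array): string | undefined {
  if (buffer.length < CAF_MAGIC.length) return undefined;
  const matches = CAF_MAGIC.every((byte, i) => buffer[i] === byte);
  return matches ? "audio/x-caf" : undefined;
}
```

Running a check like this before or after the `file-type` lookup lets the validator classify the buffer as audio instead of treating it as unknown.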

Fixes #72506.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Omar Shahine
2026-04-27 14:15:16 -07:00
committed by GitHub
parent fb4d9fc4fb
commit da3d17e1ca
9 changed files with 409 additions and 7 deletions

@@ -35,6 +35,13 @@ export const bluebubblesCapabilities: ChannelPlugin<ResolvedBlueBubblesAccount>[
     voice: {
       synthesisTarget: "audio-file",
       audioFileFormats: ["mp3", "caf", "audio/mpeg", "audio/x-caf"],
+      // Prefer CAF when the host can pre-transcode (afconvert on macOS).
+      // The BlueBubbles server otherwise races a CAF→MP3 conversion against
+      // the upload write completing and silently falls back to a generic
+      // attachment send when its conversion fails. Pre-encoding to CAF
+      // bypasses that race so iMessage renders the result as a native voice
+      // memo bubble (waveform UI) instead of a plain audio attachment.
+      preferAudioFileFormat: "caf",
     },
   },
   reactions: true,