openclaw/extensions/speech-core/src/audio-transcode.test.ts
Omar Shahine da3d17e1ca fix(tts): pre-transcode synthesized audio to opus-in-CAF for native iMessage voice-memo bubbles via BlueBubbles (#72586)
End-to-end testing on macOS + BlueBubbles + ElevenLabs walked through three CAF flavors before landing on the format Apple's Messages.app actually emits when a user records a native iMessage voice memo:

- PCM int16 @ 44.1 kHz CAF: BlueBubbles' internal `afconvert -f m4af -d aac` conversion fails; the original CAF reaches iMessage but renders with 0 s duration.
- AAC @ 22.05 kHz mono CAF: BlueBubbles' conversion succeeds, but the server silently downgrades the delivery, sending the converted M4A as a generic audio attachment.
- **Opus @ 24 kHz mono CAF**: the descriptor block is byte-identical to the one Apple's Messages.app produces; BlueBubbles passes the file through unchanged and iMessage renders a native voice-memo bubble with proper duration and waveform UI.

Adds an opt-in `tts.voice.preferAudioFileFormat` channel capability and a macOS `afconvert`-backed pre-transcode in the speech-core pipeline. BlueBubbles declares `preferAudioFileFormat: "caf"`. Other channels are unaffected. Falls back to the original buffer when the host platform, the source/target pair, or the transcoder process can't produce the preferred container — so non-Darwin hosts and unsupported provider combinations are unchanged.
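
The fallback rule described above can be sketched as a small selection step. This is a minimal illustration, not the code from this commit: the `{ ok: false, reason }` shape matches what the tests in this file assert, but the `ok: true` variant, the `AudioPayload` type, and the `selectDeliveryAudio` helper are hypothetical names introduced only for this example.

```typescript
// Failure shape mirrors the test expectations; the success shape is an assumption.
type TranscodeResult =
  | { ok: true; buffer: Uint8Array; extension: string }
  | { ok: false; reason: string };

// Hypothetical container for the audio that will be handed to the channel.
interface AudioPayload {
  buffer: Uint8Array;
  extension: string;
}

// Use the transcoded audio when the transcoder succeeded; otherwise keep the
// original payload untouched so channels and hosts without support see no change.
function selectDeliveryAudio(original: AudioPayload, result: TranscodeResult): AudioPayload {
  if (result.ok) {
    return { buffer: result.buffer, extension: result.extension };
  }
  return original;
}
```

With this shape, every failure reason (`platform-unsupported`, `no-recipe`, `noop-same-container`, `invalid-extension`) takes the same path: the caller delivers the audio it already had.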

Also adds a `caff` magic-byte sniff in `src/media/mime.ts` so the auto-reply host-local-media validator (which uses `file-type` and didn't recognize CAF natively) accepts the buffer instead of dropping it as "⚠️ Media failed."
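
For illustration, a magic-byte check of this kind might look like the sketch below. A CAF file begins with the four ASCII bytes `caff`; the function name `looksLikeCaf` is hypothetical, and this is not necessarily how `src/media/mime.ts` implements the sniff.

```typescript
// "caff" in ASCII: the four-byte file type that opens a Core Audio Format header.
const CAF_MAGIC = [0x63, 0x61, 0x66, 0x66];

// Returns true when the buffer starts with the CAF file-type marker.
function looksLikeCaf(bytes: Uint8Array): boolean {
  if (bytes.length < CAF_MAGIC.length) {
    return false;
  }
  return CAF_MAGIC.every((byte, index) => bytes[index] === byte);
}

// Usage: a CAF header prefix matches, an ID3-tagged MP3 prefix does not.
const cafHeader = new Uint8Array([0x63, 0x61, 0x66, 0x66, 0x00, 0x01, 0x00, 0x00]);
const id3Header = new Uint8Array([0x49, 0x44, 0x33, 0x04]);
console.log(looksLikeCaf(cafHeader), looksLikeCaf(id3Header));
```

A check like this would run before or alongside a generic detector, so a buffer that the detector cannot classify is still recognized as audio.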

Fixes #72506.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:15:16 -07:00


import { describe, expect, it } from "vitest";
import { transcodeAudioBuffer } from "./audio-transcode.js";

describe("transcodeAudioBuffer", () => {
  it("returns noop-same-container when source and target containers match", async () => {
    const result = await transcodeAudioBuffer({
      audioBuffer: Buffer.from("payload"),
      sourceExtension: "mp3",
      targetExtension: ".mp3",
    });
    expect(result).toEqual({ ok: false, reason: "noop-same-container" });
  });

  it("returns no-recipe when no afconvert recipe is defined for the requested pair", async () => {
    const result = await transcodeAudioBuffer({
      audioBuffer: Buffer.from("payload"),
      sourceExtension: "mp3",
      targetExtension: "flac",
    });
    expect(result).toEqual({ ok: false, reason: "no-recipe" });
  });

  it("returns invalid-extension for an empty source extension", async () => {
    const result = await transcodeAudioBuffer({
      audioBuffer: Buffer.from("payload"),
      sourceExtension: "",
      targetExtension: "caf",
    });
    expect(result).toEqual({ ok: false, reason: "invalid-extension" });
  });

  it("returns invalid-extension for an empty target extension", async () => {
    const result = await transcodeAudioBuffer({
      audioBuffer: Buffer.from("payload"),
      sourceExtension: "mp3",
      targetExtension: "",
    });
    expect(result).toEqual({ ok: false, reason: "invalid-extension" });
  });

  it("rejects path-traversal style extensions", async () => {
    const result = await transcodeAudioBuffer({
      audioBuffer: Buffer.from("payload"),
      sourceExtension: "../etc/passwd",
      targetExtension: "caf",
    });
    expect(result).toEqual({ ok: false, reason: "invalid-extension" });
  });

  it("returns platform-unsupported off-Darwin without invoking afconvert", async () => {
    if (process.platform === "darwin") {
      // macOS: a valid mp3→caf request would proceed to spawn `afconvert`,
      // which we don't want to run from a unit test. The Darwin happy path
      // is exercised end-to-end via the BlueBubbles voice-memo flow.
      return;
    }
    const result = await transcodeAudioBuffer({
      audioBuffer: Buffer.from("payload"),
      sourceExtension: "mp3",
      targetExtension: "caf",
    });
    expect(result).toEqual({ ok: false, reason: "platform-unsupported" });
  });
});