fix: add placeholder transcript for silent voice notes (#49131)

* fix: add placeholder transcript for silent voice notes

* fix: handle placeholder transcripts per skipped attachment

* fix: preserve synthetic transcript attachment order

* fix: scope synthetic audio merge to audio slice only, preserve cross-capability and prefer ordering

  Replace the global outputs.sort() with a targeted merge that:

  1. Only sorts within the audio output slice (real + synthetic), preserving CAPABILITY_ORDER and per-capability attachments.prefer ordering for non-audio outputs.
  2. Excludes synthetic placeholder indexes from audioAttachmentIndexes used by extractFileBlocks, so tiny audio-MIME files with text extensions can still be recovered via forcedTextMime.

  Adds mergeAudioOutputsPreservingAttachmentOrder helper.

* fix: remove unused function and use toSorted() for oxlint compliance

* fix(media-understanding): preserve selected audio order for synthetic placeholders

  - merge synthetic skipped-audio placeholders using audio decision order instead of raw attachmentIndex sorting, preserving attachments.prefer
  - insert synthetic-only audio outputs at the audio capability slot (before video) when no real audio outputs were produced

* fix(media-understanding): use neutral too-small placeholder text

  Clarify that this synthetic transcript path is triggered by attachment size, not by a silence/no-speech detection result.

* test(media-understanding): update too-small audio placeholder expectations

* test(media-understanding): cover mixed too-small audio placeholder

* test(media-understanding): cover too-small audio context

* fix(tasks): preserve visible task title before internal context

* Revert "fix(tasks): preserve visible task title before internal context"

  This reverts commit dc536fb4d3c8a01168de5d05e8562193dd68a88e.

---------

Co-authored-by: Eulices Lopez <eulices@users.noreply.github.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
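
The merge behaviour described in the commit message can be illustrated with a minimal sketch. This is not the repository's `mergeAudioOutputsPreservingAttachmentOrder` helper. The `Output` type, the `mergeAudioOutputsSketch` function, its parameters, and the placeholder text used in the usage note are all hypothetical, and the sketch assumes that real audio outputs form one contiguous slice and that audio sits before video in the capability order.

```typescript
// Hypothetical shapes for illustration only; the real types in the repo may differ.
type Capability = "image" | "audio" | "video";

interface Output {
  kind: Capability;
  attachmentIndex: number;
  text: string;
}

// Merge synthetic too-small-audio placeholders into the audio slice only.
// `audioDecisionOrder` lists attachment indexes in the order audio attachments
// were selected (assumed to already reflect attachments.prefer).
function mergeAudioOutputsSketch(
  outputs: Output[],
  synthetic: Output[],
  audioDecisionOrder: number[],
): Output[] {
  if (synthetic.length === 0) return outputs;

  // Rank each attachment by its position in the decision order.
  const rank = new Map<number, number>();
  audioDecisionOrder.forEach((attachmentIndex, position) => {
    rank.set(attachmentIndex, position);
  });
  const rankOf = (o: Output): number =>
    rank.get(o.attachmentIndex) ?? Number.MAX_SAFE_INTEGER;

  // Sort only real + synthetic audio; non-audio outputs are never reordered.
  const realAudio = outputs.filter((o) => o.kind === "audio");
  const mergedAudio = [...realAudio, ...synthetic].sort(
    (a, b) => rankOf(a) - rankOf(b),
  );

  // Audio slot: where real audio starts, else just before the first video
  // output, else the end of the list.
  let insertAt = outputs.findIndex((o) => o.kind === "audio");
  if (insertAt === -1) {
    insertAt = outputs.findIndex((o) => o.kind === "video");
    if (insertAt === -1) insertAt = outputs.length;
  }

  const before = outputs.slice(0, insertAt).filter((o) => o.kind !== "audio");
  const after = outputs.slice(insertAt).filter((o) => o.kind !== "audio");
  return [...before, ...mergedAudio, ...after];
}
```

For example, given an image output for attachment 0, a real audio output for attachment 3, and a video output for attachment 1, merging a placeholder for attachment 2 with decision order `[3, 2]` yields attachments `0, 3, 2, 1`. The placeholder follows the preferred audio attachment rather than sorting ahead of it by raw index, and the image and video outputs keep their positions.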
2026-05-06 17:31:06 +00:00 · 2026-04-26 00:14:01 -04:00
parent bcc9fc4cf5
commit 008e4ca81f
5 changed files with 287 additions and 5 deletions
--- a/docs/nodes/media-understanding.md
+++ b/docs/nodes/media-understanding.md
@@ -130,7 +130,7 @@ Recommended defaults:
 Rules:

 - If media exceeds `maxBytes`, that model is skipped and the **next model is tried**.
- Audio files smaller than **1024 bytes** are treated as empty/corrupt and skipped before provider/CLI transcription.
+- Audio files smaller than **1024 bytes** are treated as empty/corrupt and skipped before provider/CLI transcription; inbound reply context receives a deterministic placeholder transcript so the agent knows the note was too small.
 - If the model returns more than `maxChars`, output is trimmed.
 - `prompt` defaults to simple “Describe the {media}.” plus the `maxChars` guidance (image/video only).
 - If the active primary image model already supports vision natively, OpenClaw