fix: add placeholder transcript for silent voice notes (#49131)

* fix: add placeholder transcript for silent voice notes

* fix: handle placeholder transcripts per skipped attachment

* fix: preserve synthetic transcript attachment order

* fix: scope synthetic audio merge to audio slice only, preserve cross-capability and prefer ordering

Replace the global outputs.sort() with a targeted merge that:
1. Only sorts within the audio output slice (real + synthetic),
   preserving CAPABILITY_ORDER and per-capability attachments.prefer
   ordering for non-audio outputs.
2. Excludes synthetic placeholder indexes from audioAttachmentIndexes
   used by extractFileBlocks, so tiny audio-MIME files with text
   extensions can still be recovered via forcedTextMime.

Adds mergeAudioOutputsPreservingAttachmentOrder helper.

* fix: remove unused function and use toSorted() for oxlint compliance

* fix(media-understanding): preserve selected audio order for synthetic placeholders

- merge synthetic skipped-audio placeholders using audio decision order
  instead of raw attachmentIndex sorting, preserving attachments.prefer
- insert synthetic-only audio outputs at the audio capability slot
  (before video) when no real audio outputs were produced

* fix(media-understanding): use neutral too-small placeholder text

Clarify that this synthetic transcript path is triggered by attachment size,
not by a silence/no-speech detection result.

* test(media-understanding): update too-small audio placeholder expectations

* test(media-understanding): cover mixed too-small audio placeholder

* test(media-understanding): cover too-small audio context

* fix(tasks): preserve visible task title before internal context

* Revert "fix(tasks): preserve visible task title before internal context"

This reverts commit dc536fb4d3c8a01168de5d05e8562193dd68a88e.

---------

Co-authored-by: Eulices Lopez <eulices@users.noreply.github.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
This commit is contained in:
Eulices
2026-04-26 00:14:01 -04:00
committed by GitHub
parent bcc9fc4cf5
commit 008e4ca81f
5 changed files with 287 additions and 5 deletions

View File

@@ -130,7 +130,7 @@ Recommended defaults:
Rules:
- If media exceeds `maxBytes`, that model is skipped and the **next model is tried**.
- Audio files smaller than **1024 bytes** are treated as empty/corrupt and skipped before provider/CLI transcription.
- Audio files smaller than **1024 bytes** are treated as empty/corrupt and skipped before provider/CLI transcription; inbound reply context receives a deterministic placeholder transcript so the agent knows the note was too small.
- If the model returns more than `maxChars`, output is trimmed.
- `prompt` defaults to simple “Describe the {media}.” plus the `maxChars` guidance (image/video only).
- If the active primary image model already supports vision natively, OpenClaw