Mirror of https://github.com/openclaw/openclaw.git (synced 2026-05-06 17:31:06 +00:00)
fix: add placeholder transcript for silent voice notes (#49131)
* fix: add placeholder transcript for silent voice notes

* fix: handle placeholder transcripts per skipped attachment

* fix: preserve synthetic transcript attachment order

* fix: scope synthetic audio merge to audio slice only, preserve cross-capability and prefer ordering

  Replace the global outputs.sort() with a targeted merge that:

  1. Only sorts within the audio output slice (real + synthetic), preserving CAPABILITY_ORDER and per-capability attachments.prefer ordering for non-audio outputs.
  2. Excludes synthetic placeholder indexes from audioAttachmentIndexes used by extractFileBlocks, so tiny audio-MIME files with text extensions can still be recovered via forcedTextMime.

  Adds mergeAudioOutputsPreservingAttachmentOrder helper.

* fix: remove unused function and use toSorted() for oxlint compliance

* fix(media-understanding): preserve selected audio order for synthetic placeholders

  - merge synthetic skipped-audio placeholders using audio decision order instead of raw attachmentIndex sorting, preserving attachments.prefer
  - insert synthetic-only audio outputs at the audio capability slot (before video) when no real audio outputs were produced

* fix(media-understanding): use neutral too-small placeholder text

  Clarify that this synthetic transcript path is triggered by attachment size, not by a silence/no-speech detection result.

* test(media-understanding): update too-small audio placeholder expectations

* test(media-understanding): cover mixed too-small audio placeholder

* test(media-understanding): cover too-small audio context

* fix(tasks): preserve visible task title before internal context

* Revert "fix(tasks): preserve visible task title before internal context"

  This reverts commit dc536fb4d3c8a01168de5d05e8562193dd68a88e.

---------

Co-authored-by: Eulices Lopez <eulices@users.noreply.github.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
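The ordering behaviour described in the commit message can be illustrated with a minimal sketch. It is not the project's `mergeAudioOutputsPreservingAttachmentOrder` helper: the `Output` shape, the `mergeAudioOutputs` name, and the `decisionOrder` parameter are hypothetical, and the sketch assumes real audio outputs sit next to each other in the output list.

```typescript
// Hypothetical shapes for illustration; the real OpenClaw types may differ.
type Capability = "image" | "audio" | "video";

interface Output {
  kind: Capability;
  attachmentIndex: number;
  text: string;
}

// Merge synthetic placeholder outputs into the audio slice only.
// decisionOrder lists audio attachment indexes in the order they were selected
// (an assumption standing in for the "audio decision order" in the message).
function mergeAudioOutputs(
  outputs: Output[],
  synthetic: Output[],
  decisionOrder: number[],
): Output[] {
  if (synthetic.length === 0) return outputs;

  // Rank each attachment by its position in the decision order.
  const rank = new Map<number, number>();
  decisionOrder.forEach((idx, pos) => rank.set(idx, pos));
  const position = (o: Output): number =>
    rank.get(o.attachmentIndex) ?? Number.MAX_SAFE_INTEGER;

  // Sort a copy of the audio outputs (real + synthetic); inputs stay untouched.
  const realAudio = outputs.filter((o) => o.kind === "audio");
  const mergedAudio = [...realAudio, ...synthetic].sort(
    (a, b) => position(a) - position(b),
  );

  const firstAudio = outputs.findIndex((o) => o.kind === "audio");
  if (firstAudio === -1) {
    // No real audio was produced: place placeholders before the first video
    // output, or at the end if there is no video output.
    const firstVideo = outputs.findIndex((o) => o.kind === "video");
    const at = firstVideo === -1 ? outputs.length : firstVideo;
    return [...outputs.slice(0, at), ...mergedAudio, ...outputs.slice(at)];
  }

  // Keep non-audio outputs in their original relative order around the slice.
  const before = outputs.slice(0, firstAudio);
  const after = outputs.slice(firstAudio).filter((o) => o.kind !== "audio");
  return [...before, ...mergedAudio, ...after];
}
```

Sorting only the audio slice, rather than the whole list, is what keeps image and video outputs from being reordered by attachment index.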
@@ -130,7 +130,7 @@ Recommended defaults:
 Rules:
 
 - If media exceeds `maxBytes`, that model is skipped and the **next model is tried**.
-- Audio files smaller than **1024 bytes** are treated as empty/corrupt and skipped before provider/CLI transcription.
+- Audio files smaller than **1024 bytes** are treated as empty/corrupt and skipped before provider/CLI transcription; inbound reply context receives a deterministic placeholder transcript so the agent knows the note was too small.
 - If the model returns more than `maxChars`, output is trimmed.
 - `prompt` defaults to simple “Describe the {media}.” plus the `maxChars` guidance (image/video only).
 - If the active primary image model already supports vision natively, OpenClaw
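The size rules in this hunk can be sketched roughly as follows. Only the 1024-byte threshold, `maxBytes`, and `maxChars` come from the documented rules; the `ModelEntry` and `TranscriptResult` types, the `transcribeAudio` function, and the placeholder wording are hypothetical and do not reflect the project's actual code or strings.

```typescript
// Threshold taken from the documented rule; everything else is illustrative.
const MIN_AUDIO_BYTES = 1024;

// Assumed placeholder wording, not the project's actual string.
const TOO_SMALL_PLACEHOLDER = "[audio attachment too small to transcribe]";

interface ModelEntry {
  name: string;
  maxBytes: number;
  maxChars: number;
  // Stand-in for a provider or CLI transcription call.
  transcribe: (sizeBytes: number) => string;
}

interface TranscriptResult {
  text: string;
  model: string | null; // null when no model was invoked
  synthetic: boolean; // true for the deterministic placeholder
}

function transcribeAudio(
  sizeBytes: number,
  models: ModelEntry[],
): TranscriptResult | null {
  // Too-small audio is skipped before any model runs and gets a placeholder.
  if (sizeBytes < MIN_AUDIO_BYTES) {
    return { text: TOO_SMALL_PLACEHOLDER, model: null, synthetic: true };
  }
  for (const m of models) {
    // Media over this model's maxBytes: skip it and try the next model.
    if (sizeBytes > m.maxBytes) continue;
    const raw = m.transcribe(sizeBytes);
    // Output longer than maxChars is trimmed.
    const text = raw.length > m.maxChars ? raw.slice(0, m.maxChars) : raw;
    return { text, model: m.name, synthetic: false };
  }
  // Every model was skipped.
  return null;
}
```

Returning a placeholder instead of nothing means a downstream consumer can tell "a voice note arrived but was too small" apart from "no audio was attached".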