mirror of
https://github.com/openclaw/openclaw.git
synced 2026-05-06 07:10:43 +00:00
The tts tool previously returned a fixed "Generated audio reply." string in its content, so session transcripts lost what was actually spoken. Across every channel, a voice-only reply left no text record for future turns, forcing users to recover transcripts from the provider's API.

Echo the synthesized text back in the tool result content (audio is still delivered via details.media).

Sanitize the transcript before embedding it, so crafted utterances cannot inject reply directives when tool output is rendered in verbose mode: MEDIA: at line start and [[…]] markers are interrupted with a zero-width word joiner (U+2060), which defuses parseReplyDirectives without altering the visible text.
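A minimal sketch of the sanitization step described above, not the code from this change. The names sanitizeTranscript and buildTtsToolResult, the result shape, and the exact matching rules (case-insensitive MEDIA, optional leading whitespace) are assumptions made for illustration; only the U+2060 word joiner, the MEDIA: and [[ targets, and the details.media field come from the description.

```typescript
// Hypothetical helper names; the real implementation may differ.
const WORD_JOINER = "\u2060"; // zero-width, non-breaking, invisible when rendered

// Interrupt directive-like sequences so a directive parser no longer matches
// them, while the text a reader sees stays the same.
function sanitizeTranscript(text: string): string {
  return text
    // "MEDIA:" at the start of any line (assumed: leading spaces/tabs allowed,
    // case-insensitive). The joiner goes between the keyword and the colon.
    .replace(/^([ \t]*MEDIA)(:)/gim, `$1${WORD_JOINER}$2`)
    // Every "[[" opener, so bracketed markers are no longer well-formed.
    .replace(/\[\[/g, `[${WORD_JOINER}[`);
}

// Assumed result shape: text content for the transcript, audio via details.media.
function buildTtsToolResult(spokenText: string, audioPath: string) {
  return {
    content: [{ type: "text", text: sanitizeTranscript(spokenText) }],
    details: { media: audioPath },
  };
}
```

Inserting a joiner rather than deleting or escaping the sequence keeps the transcript faithful to what was spoken: stripping U+2060 from the sanitized string gives back the original text exactly.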