fix(gateway): skip text-only assistant media supplements

Gate WebChat assistant-media transcript supplements on resolved display media so stale TTS/media refs cannot persist a text-only gateway-injected duplicate.

Keep resolved media supplements and non-agent command fallback behavior covered by adjacent tests.

Fixes #73956.
Hemant Sudarshan
2026-05-02 10:52:02 +05:30
committed by GitHub
parent 63c9fbcfa3
commit d5dbc45eb6
3 changed files with 46 additions and 0 deletions

@@ -33,6 +33,7 @@ Docs: https://docs.openclaw.ai
- Telegram: inherit the process DNS result order for Bot API transport and downgrade recovered sticky IPv4 fallback promotions to debug logs, while keeping pinned-IP escalation warnings visible. Fixes #75904. Thanks @highfly-hi and @neeravmakwana.
- Web search/MiniMax: allow `MINIMAX_OAUTH_TOKEN` to satisfy MiniMax Search credentials, so OAuth-authorized MiniMax Token Plan setups do not need a separate web-search key. Fixes #65768. Thanks @kikibrian and @zhouhe-xydt.
- Providers/MiniMax: derive Coding Plan usage polling from the configured MiniMax base URL, so global setups no longer query the CN usage host. Fixes #65054. Thanks @sixone74 and @Yanhu007.
- Control UI/WebChat: skip assistant-media transcript supplements when stale media refs resolve to no playable media, so text-only final replies are not stored a second time as gateway-injected assistant messages. Fixes #73956. Thanks @HemantSudarshan.
- Sessions: reject `sessions_send` targets that resolve to thread-scoped chat sessions, so inter-agent coordination cannot be injected into active human-facing Slack or Discord threads. Fixes #52496. Thanks @barry-p5cc.
- Subagents: honor `sessions_spawn` with `expectsCompletionMessage: false` by skipping parent completion handoff delivery while still running child cleanup. Fixes #75848. Thanks @alfredjbclaw.
- Media/completions: treat media-only message-tool sends as delivered async completion output, avoiding duplicate raw `MEDIA:` fallback posts after video or music generation finishes.

@@ -715,6 +715,48 @@ describe("chat directive tag stripping for non-streaming final payloads", () =>
    );
  });
  it("does not persist agent media supplements when no playable media resolves", async () => {
    const transcriptDir = createTranscriptFixture("openclaw-chat-send-agent-stale-tts-");
    const staleAudioPath = path.join(transcriptDir, "stale.mp3");
    mockState.config = {
      agents: {
        defaults: {
          workspace: transcriptDir,
        },
      },
    };
    mockState.triggerAgentRunStart = true;
    mockState.dispatchedReplies = [
      {
        kind: "final",
        payload: {
          text: "Text-only test: one clean reply, no TTS, no media, no tool narration.",
          mediaUrl: staleAudioPath,
          mediaUrls: [staleAudioPath],
          trustedLocalMedia: true,
        },
      },
    ];
    const respond = vi.fn();
    const context = createChatContext();
    await runNonStreamingChatSend({
      context,
      respond,
      idempotencyKey: "idem-stale-agent-media",
      expectBroadcast: false,
      waitFor: "dedupe",
    });
    const assistantUpdates = mockState.emittedTranscriptUpdates.filter(
      (update) =>
        typeof update.message === "object" &&
        update.message !== null &&
        (update.message as { role?: unknown }).role === "assistant",
    );
    expect(assistantUpdates).toEqual([]);
  });
  it("keeps visible text on non-agent TTS final media because no model transcript exists", async () => {
    const transcriptDir = createTranscriptFixture("openclaw-chat-send-command-tts-final-");
    const audioPath = path.join(transcriptDir, "tts.mp3");

@@ -2265,6 +2265,9 @@ export const chatHandlers: GatewayRequestHandlers = {
const persistedContentForAppend = hasAssistantDisplayMediaContent(persistedAssistantContent)
  ? persistedAssistantContent
  : undefined;
if (!persistedContentForAppend?.length) {
  return;
}
const transcriptReply =
  mediaMessage?.transcriptText ??
  extractAssistantDisplayTextFromContent(assistantContent) ??
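
The early return added in the handler hunk can be illustrated in isolation. The sketch below is not the gateway's code: `ContentBlock`, `hasDisplayMedia`, and `selectSupplement` are hypothetical stand-ins (the real `hasAssistantDisplayMediaContent` and its content types are not shown in this diff), and it assumes that "display media" means any non-text block.

```typescript
// Hypothetical content-block shapes; the real gateway types are not part of this diff.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "audio"; url: string }
  | { type: "image"; url: string };

// Stand-in for hasAssistantDisplayMediaContent (assumption): true when at least
// one block is something other than plain text.
function hasDisplayMedia(content: ContentBlock[] | undefined): boolean {
  return (content ?? []).some((block) => block.type !== "text");
}

// Returns the blocks to persist as a media supplement, or undefined to skip.
function selectSupplement(
  content: ContentBlock[] | undefined,
): ContentBlock[] | undefined {
  const forAppend = hasDisplayMedia(content) ? content : undefined;
  if (!forAppend?.length) {
    // No playable media resolved: skip so a text-only duplicate is not stored.
    return undefined;
  }
  return forAppend;
}

const textOnly: ContentBlock[] = [{ type: "text", text: "one clean reply" }];
const withAudio: ContentBlock[] = [
  { type: "text", text: "spoken reply" },
  { type: "audio", url: "file:///tmp/tts.mp3" },
];
```

With these inputs, `selectSupplement(textOnly)` yields `undefined` (the supplement is skipped), while `selectSupplement(withAudio)` returns the original array unchanged, which mirrors the two cases the adjacent tests are described as covering.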