docs: cover reply media and voice-call fixes

Peter Steinberger
2026-04-25 05:48:29 +01:00
parent 938b53698e
commit 759fe0bf95
5 changed files with 62 additions and 0 deletions


@@ -33,6 +33,10 @@ scripts:
openclaw voicecall setup --json
```
For external providers (`twilio`, `telnyx`, `plivo`), setup must resolve a public
webhook URL from `publicUrl`, a tunnel, or Tailscale exposure. A serve fallback
that resolves to a loopback or private address is rejected because carriers
cannot reach it.
`smoke` runs the same readiness checks. It will not place a real phone call
unless both `--to` and `--yes` are present:


@@ -54,6 +54,19 @@ Legend:
`message_end` still uses the chunker if the buffered text exceeds `maxChars`, so it can emit multiple chunks at the end.
### Media delivery with block streaming
`MEDIA:` directives are normal delivery metadata. When block streaming sends a
media block early, OpenClaw remembers that delivery for the turn. If the final
assistant payload repeats the same media URL, the final delivery strips the
duplicate media instead of sending the attachment again.
Exact duplicate final payloads are suppressed. If the final payload adds
distinct text around media that was already streamed, OpenClaw still sends the
new text while keeping the media single-delivery. This prevents duplicate voice
notes or files on channels such as Telegram when an agent emits `MEDIA:` during
streaming and the provider also includes it in the completed reply.
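
The single-delivery rule above can be sketched as a small per-turn tracker. This is a minimal illustration, not OpenClaw's implementation: the `ReplyPayload` shape, the `TurnDeliveryTracker` class, and its method names are all hypothetical, and real payloads likely carry more fields.

```typescript
// Hypothetical payload shape, assumed for illustration only.
interface ReplyPayload {
  text?: string;
  mediaUrls?: string[];
}

// Remembers what block streaming already delivered during one turn.
class TurnDeliveryTracker {
  private readonly sentMedia = new Set<string>();
  private readonly sentText = new Set<string>();

  // Call this whenever a streamed block is delivered early.
  noteStreamedBlock(block: ReplyPayload): void {
    for (const url of block.mediaUrls ?? []) this.sentMedia.add(url);
    const text = block.text?.trim();
    if (text) this.sentText.add(text);
  }

  // Returns the part of the final payload that still needs delivery,
  // or null when the final payload is an exact duplicate.
  filterFinal(final: ReplyPayload): ReplyPayload | null {
    const media = (final.mediaUrls ?? []).filter((url) => !this.sentMedia.has(url));
    const text = final.text?.trim() ?? "";
    const hasNewText = text.length > 0 && !this.sentText.has(text);
    if (!hasNewText && media.length === 0) return null;

    const out: ReplyPayload = {};
    if (hasNewText) out.text = text;
    if (media.length > 0) out.mediaUrls = media;
    return out;
  }
}
```

In this sketch, a final payload that only repeats an already streamed voice note yields `null`, while a final payload that adds new text around the same URL yields the text alone, so the attachment is still sent once.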
## Chunking algorithm (low/high bounds)
Block chunking is implemented by `EmbeddedBlockChunker`:


@@ -13,6 +13,35 @@ For quick start, QA runners, unit/integration suites, and Docker flows, see
suites: model matrix, CLI backends, ACP, and media-provider live tests, plus
credential handling.
## Live: local profile smoke commands
Source `~/.profile` before ad hoc live checks so provider keys and local tool
paths match your shell:
```bash
source ~/.profile
```
Safe media smoke:
```bash
pnpm openclaw infer tts convert --local --json \
--text "OpenClaw live smoke." \
--output /tmp/openclaw-live-smoke.mp3
```
Safe voice-call readiness smoke:
```bash
pnpm openclaw voicecall setup --json
pnpm openclaw voicecall smoke --to "+15555550123"
```
`voicecall smoke` is a dry run unless `--yes` is also present. Use `--yes` only
when you intentionally want to place a real notify call. For Twilio, Telnyx, and
Plivo, a successful readiness check requires a public webhook URL; local-only
loopback/private fallbacks are rejected by design.
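
The flag gating described above can be summarized in a few lines. This is only a sketch of the documented behavior: the `SmokeFlags` and `SmokeMode` names are hypothetical, and the `readiness-only` case for a missing `--to` is an assumption based on `smoke` running the same readiness checks as `setup`.

```typescript
// Hypothetical names; this mirrors the documented rule, not the CLI source.
type SmokeMode = "readiness-only" | "dry-run" | "live-call";

interface SmokeFlags {
  to?: string;   // value of --to, if given
  yes?: boolean; // true when --yes is present
}

function resolveSmokeMode(flags: SmokeFlags): SmokeMode {
  // Assumption: without a target number there is nothing to call.
  if (!flags.to) return "readiness-only";
  // A real call needs both --to and --yes; --to alone stays a dry run.
  return flags.yes ? "live-call" : "dry-run";
}
```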
## Live: Android node capability sweep
- Test: `src/gateway/android-node.capabilities.live.test.ts`


@@ -152,6 +152,11 @@ whether the plugin is enabled, the provider and credentials are present, webhook
exposure is configured, and only one audio mode is active. Use
`openclaw voicecall setup --json` for scripts.
For Twilio, Telnyx, and Plivo, setup must resolve to a public webhook URL. If the
configured `publicUrl`, tunnel URL, Tailscale URL, or serve fallback resolves to
loopback or private network space, setup fails instead of starting a provider
that cannot receive real carrier webhooks.
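
To make the "loopback or private network space" rule concrete, here is a minimal sketch of such a check. It is an assumption about how a classifier could look, not the plugin's actual code: `classifyWebhookUrl` and `parseIpv4` are hypothetical helpers, and the sketch covers only literal addresses and `localhost` (it does not resolve DNS names).

```typescript
// Hypothetical helper, for illustration only.
type Reachability = "public" | "loopback" | "private" | "invalid";

function parseIpv4(host: string): [number, number, number, number] | null {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return null;
  const octets: [number, number, number, number] = [
    Number(m[1]), Number(m[2]), Number(m[3]), Number(m[4]),
  ];
  return octets.every((n) => n <= 255) ? octets : null;
}

function classifyWebhookUrl(raw: string): Reachability {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return "invalid";
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return "invalid";

  // URL.hostname keeps the brackets around IPv6 literals; strip them.
  const host = url.hostname.replace(/^\[|\]$/g, "").toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost") || host === "::1") {
    return "loopback";
  }

  const v4 = parseIpv4(host);
  if (v4) {
    const [a, b] = v4;
    if (a === 127) return "loopback";
    if (a === 10) return "private";                       // 10.0.0.0/8
    if (a === 172 && b >= 16 && b <= 31) return "private"; // 172.16.0.0/12
    if (a === 192 && b === 168) return "private";          // 192.168.0.0/16
    if (a === 169 && b === 254) return "private";          // link-local
    return "public";
  }

  // IPv6 unique-local (fc00::/7) and link-local (fe80::/10) literals.
  if (host.includes(":") && /^(f[cd]|fe[89ab])/.test(host)) return "private";
  return "public";
}
```

Under this sketch, setup for an external provider would proceed only when the resolved URL classifies as `public`, and would fail for any other result.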
For a no-surprises smoke test, run:
```bash
@@ -478,6 +483,9 @@ Notes:
- Core TTS is used when Twilio media streaming is enabled; otherwise calls fall back to provider-native voices.
- If a Twilio media stream is already active, Voice Call does not fall back to TwiML `<Say>`. If telephony TTS is unavailable in that state, the playback request fails instead of mixing two playback paths.
- When telephony TTS falls back to a secondary provider, Voice Call logs a warning with the provider chain (`from`, `to`, `attempts`) for debugging.
- When Twilio barge-in or stream teardown clears the pending TTS queue, queued
  playback requests settle instead of leaving callers hanging while they await
  playback completion.
### More examples
@@ -589,6 +597,9 @@ For outbound `conversation` calls, first-message handling is tied to live playba
- Barge-in queue clear and auto-response are suppressed only while the initial greeting is actively speaking.
- If initial playback fails, the call returns to `listening` and the initial message remains queued for retry.
- Initial playback for Twilio streaming starts on stream connect without extra delay.
- Barge-in aborts active playback and clears queued-but-not-yet-playing Twilio
TTS entries. Cleared entries resolve as skipped, so follow-up response logic
can continue without waiting on audio that will never play.
- Realtime voice conversations use the realtime stream's own opening turn. Voice Call does not post a legacy `<Say>` TwiML update for that initial message, so outbound `<Connect><Stream>` sessions stay attached.
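
The "cleared entries resolve as skipped" behavior in the list above can be pictured as a queue whose entries each carry a promise resolver. This is a minimal sketch under assumed names (`TtsPlaybackQueue`, `PlaybackResult`, `QueuedEntry`); the actual Voice Call queue may be structured differently.

```typescript
// Hypothetical types, for illustration only.
type PlaybackResult = "played" | "skipped";

interface QueuedEntry {
  text: string;
  settle: (result: PlaybackResult) => void;
}

class TtsPlaybackQueue {
  private readonly pending: QueuedEntry[] = [];

  // Callers await the returned promise to learn how playback ended.
  enqueue(text: string): Promise<PlaybackResult> {
    return new Promise<PlaybackResult>((resolve) => {
      this.pending.push({ text, settle: resolve });
    });
  }

  // The player takes the next entry and calls settle("played") when done.
  next(): QueuedEntry | undefined {
    return this.pending.shift();
  }

  get size(): number {
    return this.pending.length;
  }

  // Barge-in or stream teardown: drop queued-but-not-yet-playing entries
  // and settle each one so no awaiting caller is left hanging.
  clear(): number {
    const dropped = this.pending.splice(0, this.pending.length);
    for (const entry of dropped) entry.settle("skipped");
    return dropped.length;
  }
}
```

Because `clear()` resolves every dropped entry rather than discarding it silently, code that awaits `enqueue(...)` continues with a `"skipped"` result and can move on to follow-up response logic.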
### Twilio stream disconnect grace


@@ -15,6 +15,11 @@ Assistant output can carry a small set of delivery/render directives:
These directives are separate. `MEDIA:` and reply/voice tags remain delivery metadata; `[embed ...]` is the web-only rich render path.
When block streaming is enabled, `MEDIA:` remains single-delivery metadata for a
turn. If the same media URL is sent in a streamed block and repeated in the final
assistant payload, OpenClaw delivers the attachment once and strips the duplicate
from the final payload.
## `[embed ...]`
`[embed ...]` is the only agent-facing rich render syntax for the Control UI.