mirror of
https://github.com/openclaw/openclaw.git
synced 2026-05-06 06:50:43 +00:00
docs: cover reply media and voice-call fixes
@@ -33,6 +33,10 @@ scripts:
openclaw voicecall setup --json
```

For external providers (`twilio`, `telnyx`, `plivo`), setup must resolve a public
webhook URL from `publicUrl`, a tunnel, or Tailscale exposure. A loopback/private
serve fallback is rejected because carriers cannot reach it.
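
The public-URL requirement above can be illustrated with a small check. This is a minimal sketch, not OpenClaw's actual validation: `isPublicWebhookUrl` and the prefix list are hypothetical names, and the address ranges covered (loopback, RFC 1918 private space, link-local, and the matching IPv6 ranges) are an assumption about what "loopback/private" means here.

```typescript
// Hypothetical helper (not part of OpenClaw): decides whether a resolved
// webhook URL could be reached by an external carrier.
const PRIVATE_IPV4_PREFIXES: RegExp[] = [
  /^0\./, // "this network"
  /^10\./, // RFC 1918
  /^127\./, // loopback
  /^169\.254\./, // link-local
  /^192\.168\./, // RFC 1918
  /^172\.(1[6-9]|2\d|3[01])\./, // RFC 1918 (172.16.0.0/12)
];

function isPublicWebhookUrl(candidate: string): boolean {
  let url: URL;
  try {
    url = new URL(candidate);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;

  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return false;

  // IPv6 literals keep their brackets in `hostname`.
  if (host.startsWith("[")) {
    const v6 = host.slice(1, -1);
    const isLocal =
      v6 === "::1" || v6 === "::" || /^f[cd]/.test(v6) || /^fe[89ab]/.test(v6);
    return !isLocal;
  }

  // Dotted IPv4 literal: reject loopback, private, and link-local ranges.
  if (/^\d+\.\d+\.\d+\.\d+$/.test(host)) {
    return !PRIVATE_IPV4_PREFIXES.some((re) => re.test(host));
  }

  return true; // a DNS name; this sketch does not resolve it
}
```

A hostname that only resolves to a private address through DNS would pass this sketch, so a real check would need to look at the resolved address as well.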

`smoke` runs the same readiness checks. It will not place a real phone call
unless both `--to` and `--yes` are present:
@@ -54,6 +54,19 @@ Legend:
`message_end` still uses the chunker if the buffered text exceeds `maxChars`, so it can emit multiple chunks at the end.

### Media delivery with block streaming

`MEDIA:` directives are normal delivery metadata. When block streaming sends a
media block early, OpenClaw remembers that delivery for the turn. If the final
assistant payload repeats the same media URL, the final delivery strips the
duplicate media instead of sending the attachment again.

Exact duplicate final payloads are suppressed. If the final payload adds
distinct text around media that was already streamed, OpenClaw still sends the
new text while keeping the media single-delivery. This prevents duplicate voice
notes or files on channels such as Telegram when an agent emits `MEDIA:` during
streaming and the provider also includes it in the completed reply.
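
The per-turn bookkeeping described above can be sketched roughly as follows. This is an illustrative model only: `TurnDeliveryTracker`, `ReplyPayload`, and `payloadKey` are hypothetical names, and the real OpenClaw delivery path may track more state than a set of URLs and a set of payload keys.

```typescript
// Hypothetical payload shape for illustration; not OpenClaw's real type.
interface ReplyPayload {
  text: string;
  mediaUrls: string[];
}

// Stable key used to detect an exact duplicate payload within one turn.
function payloadKey(p: ReplyPayload): string {
  return JSON.stringify([p.text.trim(), [...p.mediaUrls].sort()]);
}

class TurnDeliveryTracker {
  private sentMedia = new Set<string>();
  private sentPayloads = new Set<string>();

  // Call when block streaming delivers a block early.
  recordStreamedBlock(p: ReplyPayload): void {
    for (const url of p.mediaUrls) this.sentMedia.add(url);
    this.sentPayloads.add(payloadKey(p));
  }

  // Returns what still needs to be sent for the final payload,
  // or null when nothing new remains.
  resolveFinal(p: ReplyPayload): ReplyPayload | null {
    if (this.sentPayloads.has(payloadKey(p))) return null; // exact duplicate
    const mediaUrls = p.mediaUrls.filter((url) => !this.sentMedia.has(url));
    const text = p.text.trim();
    if (text === "" && mediaUrls.length === 0) return null;
    return { text, mediaUrls }; // new text survives, streamed media is stripped
  }
}
```

In this sketch, a voice note streamed early and then repeated in the final reply alongside new text yields a final delivery with the text only, which matches the single-delivery behavior described for channels such as Telegram.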

## Chunking algorithm (low/high bounds)

Block chunking is implemented by `EmbeddedBlockChunker`:
@@ -13,6 +13,35 @@ For quick start, QA runners, unit/integration suites, and Docker flows, see
suites: model matrix, CLI backends, ACP, and media-provider live tests, plus
credential handling.

## Live: local profile smoke commands

Source `~/.profile` before ad hoc live checks so provider keys and local tool
paths match your shell:

```bash
source ~/.profile
```

Safe media smoke:

```bash
pnpm openclaw infer tts convert --local --json \
  --text "OpenClaw live smoke." \
  --output /tmp/openclaw-live-smoke.mp3
```

Safe voice-call readiness smoke:

```bash
pnpm openclaw voicecall setup --json
pnpm openclaw voicecall smoke --to "+15555550123"
```

`voicecall smoke` is a dry run unless `--yes` is also present. Use `--yes` only
when you intentionally want to place a real notify call. For Twilio, Telnyx, and
Plivo, a successful readiness check requires a public webhook URL; local-only
loopback/private fallbacks are rejected by design.
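
The flag rule for `voicecall smoke` amounts to a small decision table. The sketch below only models that rule as stated in the text; `resolveSmokeMode` and the mode labels are hypothetical and are not names used by the CLI.

```typescript
// Hypothetical labels for illustration; the CLI does not expose these names.
type SmokeMode = "readiness-only" | "dry-run" | "live-call";

interface SmokeFlags {
  to?: string; // value of --to, if given
  yes?: boolean; // true when --yes is present
}

// A real call is placed only when both --to and --yes are present.
function resolveSmokeMode(flags: SmokeFlags): SmokeMode {
  if (!flags.to) return "readiness-only"; // no target number: checks only
  return flags.yes ? "live-call" : "dry-run";
}
```

Under this reading, `--to "+15555550123"` without `--yes` maps to `"dry-run"`, which is the safe command shown above.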

## Live: Android node capability sweep

- Test: `src/gateway/android-node.capabilities.live.test.ts`
@@ -152,6 +152,11 @@ whether the plugin is enabled, the provider and credentials are present, webhook
exposure is configured, and only one audio mode is active. Use
`openclaw voicecall setup --json` for scripts.

For Twilio, Telnyx, and Plivo, setup must resolve to a public webhook URL. If the
configured `publicUrl`, tunnel URL, Tailscale URL, or serve fallback resolves to
loopback or private network space, setup fails instead of starting a provider
that cannot receive real carrier webhooks.

For a no-surprises smoke test, run:

```bash
@@ -478,6 +483,9 @@ Notes:
- Core TTS is used when Twilio media streaming is enabled; otherwise calls fall back to provider native voices.
- If a Twilio media stream is already active, Voice Call does not fall back to TwiML `<Say>`. If telephony TTS is unavailable in that state, the playback request fails instead of mixing two playback paths.
- When telephony TTS falls back to a secondary provider, Voice Call logs a warning with the provider chain (`from`, `to`, `attempts`) for debugging.
- When Twilio barge-in or stream teardown clears the pending TTS queue, queued
  playback requests settle instead of hanging callers that are awaiting playback
  completion.
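
The last note above, that cleared playback requests settle rather than hang, can be sketched with a promise-backed queue. This is a simplified model under stated assumptions: `TtsPlaybackQueue`, `PlaybackOutcome`, and the method names are hypothetical, and the real Voice Call queue also has to drive actual audio playback.

```typescript
// Hypothetical outcome labels; "skipped" marks entries cleared before playing.
type PlaybackOutcome = "played" | "skipped";

interface QueuedPlayback {
  text: string;
  settle: (outcome: PlaybackOutcome) => void;
}

class TtsPlaybackQueue {
  private pending: QueuedPlayback[] = [];

  // Callers await the returned promise to learn when playback is done.
  enqueue(text: string): Promise<PlaybackOutcome> {
    return new Promise<PlaybackOutcome>((resolve) => {
      this.pending.push({ text, settle: resolve });
    });
  }

  // Stand-in for real playback: takes the next entry and marks it played.
  playNext(): string | undefined {
    const entry = this.pending.shift();
    if (!entry) return undefined;
    entry.settle("played");
    return entry.text;
  }

  // Barge-in or stream teardown: settle every queued entry as skipped
  // so nobody awaits audio that will never play. Returns the count cleared.
  clear(): number {
    const cleared = this.pending.splice(0);
    for (const entry of cleared) entry.settle("skipped");
    return cleared.length;
  }
}
```

The design point is that `clear()` resolves the pending promises instead of dropping them, so code awaiting `enqueue()` continues with a `"skipped"` outcome.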

### More examples
@@ -589,6 +597,9 @@ For outbound `conversation` calls, first-message handling is tied to live playba
- Barge-in queue clear and auto-response are suppressed only while the initial greeting is actively speaking.
- If initial playback fails, the call returns to `listening` and the initial message remains queued for retry.
- Initial playback for Twilio streaming starts on stream connect without extra delay.
- Barge-in aborts active playback and clears queued-but-not-yet-playing Twilio
  TTS entries. Cleared entries resolve as skipped, so follow-up response logic
  can continue without waiting on audio that will never play.
- Realtime voice conversations use the realtime stream's own opening turn. Voice Call does not post a legacy `<Say>` TwiML update for that initial message, so outbound `<Connect><Stream>` sessions stay attached.

### Twilio stream disconnect grace
@@ -15,6 +15,11 @@ Assistant output can carry a small set of delivery/render directives:
These directives are separate. `MEDIA:` and reply/voice tags remain delivery metadata; `[embed ...]` is the web-only rich render path.

When block streaming is enabled, `MEDIA:` remains single-delivery metadata for a
turn. If the same media URL is sent in a streamed block and repeated in the final
assistant payload, OpenClaw delivers the attachment once and strips the duplicate
from the final payload.

## `[embed ...]`

`[embed ...]` is the only agent-facing rich render syntax for the Control UI.