mirror of
https://github.com/openclaw/openclaw.git
synced 2026-05-06 05:30:42 +00:00
docs: remove refactor notes
@@ -187,8 +187,6 @@ Core owns Talk session semantics. Provider plugins own vendor session setup.
Voice-call and Google Meet own telephony/meeting adapters. Browser and native
apps own device capture/playback UX.

The detailed implementation plan lives in [Talk refactor plan](/refactor/talk).

## Compatibility policy

For external plugins, compatibility work follows this order:

@@ -1,448 +0,0 @@
---
title: "fs-safe Cleanup Plan"
summary: "Plan for consolidating OpenClaw filesystem helpers around @openclaw/fs-safe"
read_when:
  - You are refactoring OpenClaw filesystem helpers
  - You are changing @openclaw/fs-safe imports, wrappers, or plugin SDK file APIs
  - You are deciding whether a local file helper belongs in OpenClaw or fs-safe
---

## Status

Implemented on `codex/extract-fs-safe-primitives`. Keep this file as the
cleanup checklist for follow-up reviews and future fs-safe surface changes.

## Goal

Make OpenClaw's filesystem access boring and predictable:

- Core code uses one small set of OpenClaw wrappers that apply OpenClaw policy.
- Plugin SDK compatibility aliases stay deliberate and documented.
- fs-safe keeps a small public surface centered on `root()`, with lower-level
  primitives behind explicit subpaths.
- Duplicate JSON, temp, private-store, and path helper names disappear from
  OpenClaw internals.
- Security-sensitive behavior gets regression tests before any names move.

## Non-goals

- Do not remove public plugin SDK exports in this cleanup. Keep deprecated
  aliases until a versioned SDK migration removes them.
- Do not make fs-safe a sandbox. It remains a library guardrail for local file
  access, not OS isolation.
- Do not convert all absolute-path reads to root-bounded reads. Some OpenClaw
  paths are trusted absolute paths and should stay explicit.
- Do not chase cosmetic import churn that neither reduces helper count nor
  clarifies trust boundaries.

## fs-safe Package Pin

`@openclaw/fs-safe` is published on npm and consumed through a semver range.
Fresh checkouts and CI runners should install the package from the public
registry, not from a local `link:../fs-safe` checkout or a GitHub tarball.

Current range:

- `^0.1.0`

The published package ships built `dist` files, so OpenClaw should not list it
in `pnpm.onlyBuiltDependencies`.
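
For `0.x` versions, the caret is narrower than it looks. Under npm's semver rules `^0.1.0` means `>=0.1.0 <0.2.0`, so a `0.2.0` fs-safe release needs an explicit range bump in OpenClaw. Below is a minimal sketch of that rule. It handles plain `major.minor.patch` strings only; the real `semver` package also handles prerelease tags, which this ignores. `satisfiesCaret` is a hypothetical helper:

```ts
// Sketch of npm-style caret matching for plain "major.minor.patch" versions.
// Prerelease tags and build metadata are deliberately not handled.
type Version = [major: number, minor: number, patch: number];

function parseVersion(text: string): Version {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(text);
  if (!match) throw new Error(`unsupported version: ${text}`);
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

function compare(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// "^base" allows changes that keep the left-most non-zero component fixed.
function satisfiesCaret(version: string, base: string): boolean {
  const v = parseVersion(version);
  const lower = parseVersion(base);
  if (compare(v, lower) < 0) return false;
  const [major, minor, patch] = lower;
  const upper: Version =
    major > 0 ? [major + 1, 0, 0] : minor > 0 ? [0, minor + 1, 0] : [0, 0, patch + 1];
  return compare(v, upper) < 0;
}

console.log(satisfiesCaret("0.1.7", "0.1.0")); // true
console.log(satisfiesCaret("0.2.0", "0.1.0")); // false: 0.x minor bumps are out of range
```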

## Current Shape

fs-safe's main entry is intentionally narrow:

- `root`
- `FsSafeError`
- `categorizeFsSafeError`
- root option/result types
- Python helper configuration

The wider surface lives behind subpaths:

- `/json`
- `/store`
- `/temp`
- `/atomic`
- `/root`
- `/advanced`
- `/archive`
- `/walk`

OpenClaw now keeps fs-safe behind a small wrapper boundary:

- local `src/infra/*` wrappers for core policy defaults
- public plugin SDK aliases, including older names from before fs-safe
- package-local utility exports where importing `src/infra` would cross a
  package boundary

An import-boundary test rejects new direct fs-safe imports outside those
allowed areas.

## Usage Map

### Root-bounded access

Representative use:

- `src/gateway/server-methods/agents.ts`
- `src/agents/pi-tools.read.ts`
- `src/agents/apply-patch.ts`
- `src/plugins/install.ts`
- `src/auto-reply/reply/stage-sandbox-media.ts`
- `src/gateway/canvas-documents.ts`

Keep this family. `root()` is the fs-safe product surface OpenClaw should push
callers toward.
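
The sketch below shows the lexical check a root-bounded read has to pass. `resolveInsideRoot` is a hypothetical helper and is not fs-safe's API. A purely lexical check like this does not defend against symlinks or hardlinks, which is why fs-safe keeps separate guards for those:

```ts
import * as path from "node:path";

// Hypothetical helper: resolve an untrusted, root-relative path and refuse
// anything that lands outside rootDir. The check is lexical only; it does not
// follow symlinks, so it is not a substitute for fs-safe's root() guards.
function resolveInsideRoot(rootDir: string, untrusted: string): string {
  if (path.isAbsolute(untrusted)) {
    throw new Error(`absolute path not allowed: ${untrusted}`);
  }
  const root = path.resolve(rootDir);
  const target = path.resolve(root, untrusted);
  const relative = path.relative(root, target);
  // ".." or "../x" means the target climbed out; an absolute result means a
  // different drive on Windows.
  if (relative === ".." || relative.startsWith(`..${path.sep}`) || path.isAbsolute(relative)) {
    throw new Error(`path escapes root: ${untrusted}`);
  }
  return target;
}

const agentRoot = path.resolve("/srv/openclaw/agent-1");
console.log(resolveInsideRoot(agentRoot, "notes/todo.md"));
// resolveInsideRoot(agentRoot, "../agent-2/secrets.json") throws
```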

### JSON helpers

OpenClaw still uses many names for overlapping operations:

- `readJsonFile`
- `readJsonFileStrict`
- `readDurableJsonFile`
- `writeJsonAtomic`
- `loadJsonFile`
- `saveJsonFile`
- `readJsonFileWithFallback`
- `writeJsonFileAtomically`

fs-safe's canonical names are clearer:

- `tryReadJson`
- `readJson`
- `readJsonIfExists`
- `writeJson`
- `readJsonSync`
- `tryReadJsonSync`
- `writeJsonSync`

This was the highest-value cleanup because it removed naming drift without
changing semantics. Compatibility aliases stay in `src/infra/json-files.ts` and
plugin SDK barrels.

### Private state and stores

Representative use:

- `src/commitments/store.ts`
- `src/agents/models-config.ts`
- `src/agents/pi-auth-json.ts`
- `src/cron/run-log.ts`
- `src/secrets/shared.ts`
- `src/infra/device-auth-store.ts`
- `src/infra/device-identity.ts`

Current overlap:

- `fileStore`
- `fileStore({ private: true })`
- plugin SDK private-state aliases

The concepts are now one family. fs-safe exposes private mode through
`fileStore({ private: true })`; OpenClaw internals and bundled plugins use
store-shaped wrappers instead of standalone private JSON/text helpers.

### Temp workspaces

Representative use:

- `src/media/qr-image.ts`
- `extensions/discord/src/send.voice.ts`
- `extensions/discord/src/voice/audio.ts`
- `extensions/qa-lab/src/temp-dir.test-helper.ts`

`tempWorkspace` is the stable, broadly useful primitive. One-shot temp targets and
sibling-temp helpers are lower-level implementation tools.
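
A workspace primitive ties every temp file's lifetime to one scope, and that scope always cleans up. The sketch below shows the idea synchronously, using Node's `mkdtempSync`/`rmSync`. `withTempWorkspace` is hypothetical; it is not fs-safe's `tempWorkspace`, which is async and disposable. Because cleanup runs as soon as `fn` returns, this version is only correct for synchronous callbacks:

```ts
import * as fs from "node:fs";
import * as os from "node:os";
import * as nodePath from "node:path";

// Hypothetical stand-in for a temp workspace: a fresh directory, a helper for
// naming files inside it, and unconditional cleanup.
interface TempWorkspace {
  readonly dir: string;
  path(name: string): string;
}

function withTempWorkspace<T>(prefix: string, fn: (workspace: TempWorkspace) => T): T {
  const dir = fs.mkdtempSync(nodePath.join(os.tmpdir(), prefix));
  const workspace: TempWorkspace = {
    dir,
    path(name) {
      // Only plain child names, so targets cannot point outside the workspace.
      if (!name || name === "." || name === ".." || name !== nodePath.basename(name)) {
        throw new Error(`invalid workspace file name: ${name}`);
      }
      return nodePath.join(dir, name);
    },
  };
  try {
    return fn(workspace);
  } finally {
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

const size = withTempWorkspace("openclaw-", (workspace) => {
  const target = workspace.path("payload.bin");
  fs.writeFileSync(target, Buffer.from([1, 2, 3]));
  return fs.statSync(target).size;
});
console.log(size); // 3
```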

### Atomic writes

Representative use:

- config and session stores
- cron stores
- plugin install paths
- extension state files

Keep atomic replacement as a public fs-safe subpath. OpenClaw should use the
same canonical JSON/text helpers where possible instead of hand-picking
lower-level atomic calls for ordinary JSON state.
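
Atomic replacement usually means writing a sibling temp file in the same directory and renaming it over the target. Readers then see either the old contents or the new contents, never a partial write. Below is a minimal sketch of that pattern using Node's synchronous `fs` API. `replaceFileAtomicSync` is a hypothetical name. The sketch omits things a production helper would likely handle, such as fsyncing the directory, symlink policy, and Windows rename quirks:

```ts
import { randomBytes } from "node:crypto";
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: write a sibling temp file in the target's directory,
// flush it, then rename it over the target. On POSIX filesystems a rename
// within one directory swaps the file in a single step.
function replaceFileAtomicSync(target: string, data: string | Uint8Array, mode = 0o644): void {
  const temp = path.join(
    path.dirname(target),
    `.${path.basename(target)}.${randomBytes(6).toString("hex")}.tmp`,
  );
  const fd = fs.openSync(temp, "wx", mode); // "wx": fail rather than reuse an existing file
  try {
    try {
      fs.writeFileSync(fd, data);
      fs.fsyncSync(fd);
    } finally {
      fs.closeSync(fd);
    }
    fs.renameSync(temp, target);
  } catch (error) {
    fs.rmSync(temp, { force: true }); // never leave a half-written temp behind
    throw error;
  }
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "atomic-"));
const file = path.join(dir, "state.json");
replaceFileAtomicSync(file, JSON.stringify({ version: 1 }));
replaceFileAtomicSync(file, JSON.stringify({ version: 2 }));
console.log(fs.readFileSync(file, "utf8")); // {"version":2}
```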

### Regular, secure, and root file reads

These are not true duplicates:

- `root()` protects root-relative untrusted paths.
- regular-file helpers read trusted absolute paths with regular-file checks.
- secure-file helpers add ownership and mode checks for secret references.

Keep them separate. Document the trust boundary instead of hiding it behind one
generic "read file" helper.
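
The secure-file family differs from the others by adding an ownership and permission check. The sketch below expresses that check as a pure function over `fs.Stats`-like fields. `assertSecretFileStat` and its exact policy are assumptions for illustration, not fs-safe's documented rules. The assumed policy is: a regular file, owned by the expected user, with no group or other permission bits:

```ts
// The stat fields the check needs; fs.Stats satisfies this shape.
interface StatLike {
  mode: number;
  uid: number;
  isFile(): boolean;
}

// Hypothetical policy for secret references: a regular file, owned by the
// expected user, with no group or other permission bits (0o600 or 0o400).
function assertSecretFileStat(stat: StatLike, expectedUid: number, label: string): void {
  if (!stat.isFile()) throw new Error(`${label}: not a regular file`);
  if (stat.uid !== expectedUid) throw new Error(`${label}: owned by uid ${stat.uid}`);
  if ((stat.mode & 0o077) !== 0) {
    throw new Error(`${label}: mode ${(stat.mode & 0o777).toString(8)} is too permissive`);
  }
}

// With a real file on POSIX (lstat, so a symlink fails the isFile() check):
// assertSecretFileStat(fs.lstatSync(secretPath), process.getuid!(), secretPath);
```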

### Archive helpers

Representative use:

- plugin install
- skill install
- marketplace and ClawHub archive flows

Keep as a separate fs-safe subpath. Do not leak archive entry plumbing into
OpenClaw core call sites unless the caller is actually validating archive
metadata.
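
Validating archive metadata mostly means refusing entry names that would escape the extraction directory (the classic "zip slip" bug). Below is a minimal sketch of that validation for entry names, using `node:path`'s POSIX helpers. `validateArchiveEntryName` is hypothetical. A real extractor also has to police entry types such as symlinks and hardlinks:

```ts
import * as path from "node:path";

// Hypothetical check for one archive entry name. Archive formats use "/" as the
// separator, so normalize with the POSIX helpers regardless of host OS.
function validateArchiveEntryName(entryName: string): string {
  if (entryName.includes("\0")) throw new Error("NUL byte in entry name");
  const unified = entryName.replace(/\\/g, "/"); // treat backslashes as separators too
  if (unified.startsWith("/") || /^[A-Za-z]:/.test(unified)) {
    throw new Error(`absolute entry: ${entryName}`);
  }
  const normalized = path.posix.normalize(unified);
  if (normalized === ".." || normalized.startsWith("../")) {
    throw new Error(`entry escapes destination: ${entryName}`);
  }
  return normalized;
}

console.log(validateArchiveEntryName("package/dist/index.js")); // package/dist/index.js
// validateArchiveEntryName("../../.ssh/authorized_keys") throws
```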

## Target Design

### OpenClaw imports

Core OpenClaw code should use local policy wrappers:

- `src/infra/fs-safe.ts` for common root/error helpers
- `src/infra/json-files.ts` for the temporary JSON compatibility layer
- `src/infra/private-file-store.ts` until private stores are unified
- `src/infra/replace-file.ts` for low-level atomic replacement
- `src/infra/boundary-file-read.ts` for loader/package boundary reads
- `src/infra/archive.ts` for archive extraction policy
- `src/infra/file-lock-manager.ts` for the rare core service that needs
  manager-style lock lifecycle/diagnostics

New direct imports from `@openclaw/fs-safe/*` should be reserved for:

- package-level utilities outside core that cannot import `src/infra`
- compatibility shims
- code that intentionally consumes a narrow fs-safe subpath, such as
  `openclaw/plugin-sdk/file-lock` using `@openclaw/fs-safe/file-lock`
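
An import-boundary test for this rule can be a plain scan over source text. The sketch below is minimal, and its allowlist prefixes, `findFsSafeViolations`, and the sample file contents are all hypothetical. The real test's allowlist and how it discovers files are not described in this plan:

```ts
// Hypothetical allowlist: files under these prefixes may import fs-safe directly.
const ALLOWED_PREFIXES = ["src/infra/", "src/plugin-sdk/"];

// Static imports/re-exports and dynamic import() of "@openclaw/fs-safe" or a subpath.
const FS_SAFE_SPECIFIER = /(?:from\s+|import\s*\(\s*)["'](@openclaw\/fs-safe(?:\/[\w-]+)*)["']/g;

interface BoundaryViolation {
  file: string;
  specifier: string;
}

// `files` maps repo-relative paths to source text.
function findFsSafeViolations(files: Record<string, string>): BoundaryViolation[] {
  const violations: BoundaryViolation[] = [];
  for (const [file, source] of Object.entries(files)) {
    if (ALLOWED_PREFIXES.some((prefix) => file.startsWith(prefix))) continue;
    for (const match of source.matchAll(FS_SAFE_SPECIFIER)) {
      violations.push({ file, specifier: match[1] });
    }
  }
  return violations;
}

const violations = findFsSafeViolations({
  "src/infra/json-files.ts": `export { writeJson } from "@openclaw/fs-safe/json";`,
  "src/agents/apply-patch.ts": `import { root } from "@openclaw/fs-safe";`,
});
for (const v of violations) console.log(`${v.file} imports ${v.specifier}`);
// src/agents/apply-patch.ts imports @openclaw/fs-safe
```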

### Plugin SDK exports

Plugin SDK exports are contractual. Keep aliases even when OpenClaw internals
move to canonical names.

Mark older names as deprecated in types/docs when the replacement is stable:

- `readJsonFileWithFallback` -> `readJsonIfExists` or a store method
- `writeJsonFileAtomically` -> `writeJson`
- `loadJsonFile` -> `tryReadJson`
- `saveJsonFile` -> `writeJson`
- `readFileWithinRoot` -> `root(...).read*`
- `writeFileWithinRoot` -> `root(...).write`
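
You can keep an alias while steering callers to the canonical name with a re-export that carries a `@deprecated` JSDoc tag. TypeScript-aware editors show such names struck through. The sketch below is minimal, and its `writeJson` body is a hypothetical stand-in for fs-safe's implementation. Only names with identical semantics can be plain aliases like this; `saveJsonFile`, for example, keeps its own symlink-target behavior until callers migrate:

```ts
import * as fs from "node:fs";

// Hypothetical stand-in for the canonical helper. The old name promised an
// atomic write, so the real target must keep that guarantee; this stand-in
// does a plain write to stay short.
export function writeJson(filePath: string, value: unknown): void {
  fs.writeFileSync(filePath, `${JSON.stringify(value, null, 2)}\n`);
}

/**
 * @deprecated Use `writeJson`. Kept so plugins built against older plugin SDK
 * releases keep resolving this name.
 */
export const writeJsonFileAtomically = writeJson;
```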

### fs-safe stores

Move toward one store family:

```ts
const store = fileStore({
  rootDir,
  private: true,
  mode: 0o600,
  dirMode: 0o700,
});
```

or a thin alias:

```ts
const store = stateStore({ rootDir, private: true });
```

The store family should cover:

- `read`
- `readText`
- `readJson`
- `readTextIfExists`
- `readJsonIfExists`
- `write`
- `writeJson`
- `remove`
- `exists`
- `open`
- `copyIn`
- `writeStream`
- `pruneExpired`
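
The sketch below shows what "private" has to mean for such a store, using a deliberately small store that covers four of the listed methods. `createPrivateJsonStore` is hypothetical and is not fs-safe's `fileStore`. It assumes the `mode: 0o600` / `dirMode: 0o700` values from the example above, flat key names, and non-atomic writes for brevity:

```ts
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical private store: one JSON file per key under rootDir.
interface PrivateJsonStore {
  readJsonIfExists<T>(key: string): T | undefined;
  writeJson(key: string, value: unknown): void;
  exists(key: string): boolean;
  remove(key: string): void;
}

function createPrivateJsonStore(rootDir: string, mode = 0o600, dirMode = 0o700): PrivateJsonStore {
  fs.mkdirSync(rootDir, { recursive: true, mode: dirMode });
  fs.chmodSync(rootDir, dirMode); // mkdir's mode is umask-filtered and skipped for existing dirs

  const fileFor = (key: string): string => {
    // Flat keys only, so callers cannot reach outside rootDir.
    if (!/^[\w.-]+$/.test(key) || key === "." || key === "..") {
      throw new Error(`invalid store key: ${key}`);
    }
    return path.join(rootDir, `${key}.json`);
  };

  return {
    readJsonIfExists<T>(key: string): T | undefined {
      try {
        return JSON.parse(fs.readFileSync(fileFor(key), "utf8")) as T;
      } catch (error) {
        if ((error as NodeJS.ErrnoException).code === "ENOENT") return undefined;
        throw error;
      }
    },
    writeJson(key, value) {
      const file = fileFor(key);
      fs.writeFileSync(file, JSON.stringify(value), { mode });
      fs.chmodSync(file, mode); // writeFileSync's mode only applies when the file is created
    },
    exists: (key) => fs.existsSync(fileFor(key)),
    remove: (key) => fs.rmSync(fileFor(key), { force: true }),
  };
}

const store = createPrivateJsonStore(fs.mkdtempSync(path.join(os.tmpdir(), "private-store-")));
store.writeJson("device-auth", { token: "redacted" });
console.log(store.readJsonIfExists<{ token: string }>("device-auth")); // { token: 'redacted' }
```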

This cleanup added that store shape in fs-safe, removed the unshipped
`privateStateStore` surface, and moved OpenClaw internals and bundled plugins
onto explicit store reads/writes.

### Temp

Keep the stable public temp surface small:

```ts
await using workspace = await tempWorkspace({ prefix: "openclaw-" });
const target = workspace.path("payload.bin");
```

Move one-shot temp target helpers and sibling-temp helpers to advanced/internal
unless a concrete OpenClaw caller needs the public contract.

## Refactor Phases

### Phase 1: Inventory and Guards

- Add a small import-boundary test that lists allowed direct
  `@openclaw/fs-safe/*` imports in OpenClaw core.
- Add regression tests for the JSON symlink behavior kept by
  `src/infra/json-file.ts`.
- Add regression tests for public plugin SDK aliases that must keep resolving.
- Add a doc note to the plugin SDK runtime docs once aliases are marked
  deprecated.

Exit criteria:

- The current compatibility surface is executable-tested.
- New direct fs-safe imports are visible in review.

### Phase 2: JSON Name Cleanup

- Convert OpenClaw internal callers from old JSON names to canonical fs-safe
  names where the semantics are identical.
- Keep plugin SDK aliases unchanged.
- Collapse `src/infra/json-file.ts` and `src/infra/json-files.ts` into one
  compatibility module if that reduces indirection without losing symlink
  semantics.
- Keep `saveJsonFile` symlink-target behavior until every caller/test is
  intentionally migrated.

Exit criteria:

- Core internal code no longer imports `readJsonFileStrict`,
  `readDurableJsonFile`, or `writeJsonAtomic` unless it is a compatibility shim.
- Plugin SDK aliases still pass import/type tests.

### Phase 3: Store Unification

- Add the unified private mode to fs-safe's store API.
- Remove the unshipped `privateStateStore` surface instead of keeping a second
  store family.
- Migrate OpenClaw private-state internals to the unified store shape in small
  groups:
  - auth/profile state
  - device identity and device auth
  - cron/run logs
  - commitments
  - extension state
- Regenerate the plugin SDK API baseline for the intentional pre-release
  private-helper removal.

Exit criteria:

- OpenClaw internals and bundled plugins do not call standalone private
  JSON/text helpers.
- `fileStore({ private: true })` is the only private multi-file store API.

### Phase 4: Temp Simplification

- Replace OpenClaw one-shot temp target call sites with `tempWorkspace`.
- Keep `resolvePreferredOpenClawTmpDir` as OpenClaw policy.
- Move one-shot temp and sibling-temp helpers out of the curated OpenClaw
  wrapper surface.

Exit criteria:

- OpenClaw uses `tempWorkspace` for temporary file lifetimes unless a low-level
  atomic helper owns the temp path.

### Phase 5: Shim Reduction

- Group one-line fs-safe shims into a smaller number of named OpenClaw policy
  modules.
- Delete shims that are no longer imported.
- Keep shims that preserve public SDK names or OpenClaw-specific defaults.

Candidate stable shims:

- `src/infra/fs-safe.ts`
- `src/infra/json-files.ts`
- `src/infra/private-file-store.ts`
- `src/infra/replace-file.ts`
- `src/infra/boundary-file-read.ts`
- `src/infra/archive.ts`

Candidate advanced-only grouping:

- path guards
- symlink parent guards
- hardlink guards
- move-path helpers
- file identity helpers
- sibling temp helpers

Exit criteria:

- The local wrapper list has policy meaning, not one file per fs-safe module.

### Phase 6: fs-safe Public Surface Finalization

- Keep `@openclaw/fs-safe` main entry curated.
- Keep `root()` as the primary README/API story.
- Keep `openPinnedFileSync` internal. Use `readSecureFile`, `root().open`, or
  `openRootFile*` wrappers instead of exposing the fd-level pinned primitive.
- Keep `createSidecarLockManager` internal. Public callers should use
  `acquireFileLock` / `withFileLock`; `createFileLockManager` is subpath-only
  for long-lived services that need held-lock inspection or drain/reset.
- Move rare root escape hatches such as `openWritable` to advanced only if API
  checks show no supported caller needs the main root interface.
- Keep `regular-file`, `secure-file`, archive, and root helpers separate
  because their trust models differ.
- Remove or mark unstable any standalone helper that is fully covered by root or
  store methods.
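
The `withFileLock` shape is worth spelling out because it makes release unconditional. Below is a minimal synchronous sketch that uses an exclusive-create sidecar lock file: `open` with the `wx` flag fails with `EEXIST` if another holder exists. `withFileLockSync` is hypothetical. It skips what a real lock needs, such as stale-lock detection, retries, and support for async callers:

```ts
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Exclusive create: succeeds only if no other holder has "<target>.lock".
function openLockFile(lockPath: string): number {
  try {
    return fs.openSync(lockPath, "wx", 0o600);
  } catch (error) {
    if ((error as NodeJS.ErrnoException).code === "EEXIST") {
      throw new Error(`lock held: ${lockPath}`);
    }
    throw error;
  }
}

// Hypothetical sidecar lock that records the holder's pid for diagnostics.
// Not reentrant, and there is no stale-lock recovery.
function withFileLockSync<T>(target: string, fn: () => T): T {
  const lockPath = `${target}.lock`;
  const fd = openLockFile(lockPath);
  try {
    fs.writeSync(fd, String(process.pid));
    return fn();
  } finally {
    fs.closeSync(fd);
    fs.rmSync(lockPath, { force: true });
  }
}

const target = path.join(fs.mkdtempSync(path.join(os.tmpdir(), "lock-")), "cron-store.json");
withFileLockSync(target, () => fs.writeFileSync(target, "{}"));
console.log(fs.existsSync(`${target}.lock`)); // false: released after fn returns
```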

Exit criteria:

- fs-safe has a stable pre-1.0 public surface.
- OpenClaw imports only stable fs-safe APIs outside compatibility shims.

## Verification

Use targeted proof per phase:

- JSON cleanup:
  - JSON symlink tests
  - plugin SDK JSON-store import tests
  - representative extension tests that use JSON store aliases
- Store unification:
  - private mode tests in fs-safe
  - auth profile persistence tests
  - device identity tests
  - cron/run-log tests
- Temp cleanup:
  - media temp tests
  - Discord voice temp tests
  - QA-lab temp helper tests
- Shim reduction:
  - plugin SDK API generation/check
  - import-boundary tests
  - `pnpm build`

Before merging a broad cleanup batch, run the changed gate and build:

```sh
pnpm check:changed
pnpm build
```

Implementation proof from this cleanup:

- `pnpm test src/infra/fs-safe-import-boundary.test.ts src/plugin-sdk/temp-path.test.ts src/agents/models-config.write-serialization.test.ts src/infra/json-file.test.ts src/infra/json-files.test.ts`
- `pnpm test src/infra/fs-safe-import-boundary.test.ts src/infra/device-auth-store.test.ts src/infra/device-identity.test.ts src/infra/exec-approvals.test.ts src/agents/models-config.write-serialization.test.ts src/agents/pi-embedded-runner/openrouter-model-capabilities.test.ts src/agents/harness/native-hook-relay.test.ts`
- `pnpm test src/infra/fs-safe-import-boundary.test.ts src/infra/hardlink-guards.test.ts src/infra/file-identity.test.ts src/plugin-sdk/fs-safe-compat.test.ts src/plugin-sdk/temp-path.test.ts`
- `pnpm plugin-sdk:api:check`
- `pnpm build`
- Blacksmith Testbox `pnpm install --frozen-lockfile --config.minimum-release-age=0 && pnpm check:changed`
- In `../fs-safe`: `pnpm docs:site && pnpm build && pnpm test test/api-coverage.test.ts test/new-primitives.test.ts`

## Review Checklist

- Does this change reduce a public name, local wrapper, or duplicated semantic
  family?
- Is the old name public plugin SDK surface? If yes, keep a deprecated alias.
- Does the replacement preserve symlink, hardlink, mode, and missing-file
  behavior?
- Is the caller using an untrusted relative path, trusted absolute path, secret
  path, archive entry, or temp lifetime? Pick the helper that says that out
  loud.
- Are docs and plugin SDK API snapshots updated when exported names change?

@@ -1,320 +0,0 @@
---
summary: "Detailed API, event, runtime, cancellation, and tool-policy contract for the Talk refactor"
read_when:
  - Implementing Talk Gateway methods or protocol schemas
  - Changing Talk config, events, cancellation, or provider tool policy
  - Reviewing whether a Talk behavior belongs in core or an adapter
title: "Talk API and runtime contract"
---

# Talk API And Runtime Contract

This is the detailed contract for [Talk refactor plan](/refactor/talk).

## Config Contract

Config stays under the existing `talk` object. Do not add `talk.speech` in this
refactor.

```ts
type TalkConfig = {
  provider?: string;
  providers?: Record<string, unknown>;
  realtime?: {
    provider?: string;
    model?: string;
    voice?: string;
    mode?: TalkMode;
    transport?: TalkTransport;
    brain?: TalkBrain;
    providers?: Record<string, unknown>;
  };
  input?: {
    interruptOnSpeech?: boolean;
    silenceTimeoutMs?: number;
  };
};
```

Rules:

- `talk.provider` and `talk.providers.*` remain speech/STT/TTS provider config.
- `talk.realtime.provider` and `talk.realtime.providers.*` are realtime voice provider config.
- `talk.config` returns effective config without secrets unless the caller is privileged.
- `talk.catalog` returns capabilities, not inferred provider-id guesses.
- Doctor migrates old realtime selectors into `talk.realtime`.
- Runtime does not silently reinterpret Voice Call or TTS config as realtime config.
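
The split between speech and realtime selectors is easiest to keep honest with one resolver that never falls back across the boundary. The sketch below assumes a trimmed-down config type and the mode, transport, and brain values named elsewhere in this contract; the real unions may include more. `resolveRealtimeSelection`, its error text, and the provider ids are hypothetical:

```ts
// Unions assembled from values named in this contract; the real ones may be wider.
type TalkMode = "realtime" | "transcription" | "stt-tts";
type TalkTransport = "webrtc" | "provider-websocket" | "gateway-relay" | "managed-room";
type TalkBrain = "agent-consult" | "direct-tools" | "none";

// Trimmed-down view of TalkConfig for this example.
type TalkConfigSubset = {
  provider?: string; // speech/STT/TTS provider
  realtime?: {
    provider?: string;
    model?: string;
    mode?: TalkMode;
    transport?: TalkTransport;
    brain?: TalkBrain;
  };
};

// Hypothetical resolver: realtime selection reads only talk.realtime.* and never
// falls back to talk.provider, so speech config is not reinterpreted as realtime.
function resolveRealtimeSelection(talk: TalkConfigSubset): { provider: string; model?: string } {
  const provider = talk.realtime?.provider;
  if (!provider) throw new Error("talk.realtime.provider is not configured");
  return { provider, model: talk.realtime?.model };
}

const selection = resolveRealtimeSelection({
  provider: "speech-vendor",
  realtime: { provider: "realtime-vendor", transport: "gateway-relay" },
});
console.log(selection.provider); // realtime-vendor
```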

## Method Semantics

### `talk.catalog`

Returns effective Talk capabilities:

- modes
- transports
- brain strategies
- providers
- models
- voices
- input audio formats
- output audio formats
- browser-safe client session support
- Gateway relay support
- managed-room support
- local STT/TTS support

Provider capability declarations drive this. Core must not infer support from
provider ids.
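
Driving the catalog from declarations means core only unions what providers say they support. The sketch below uses a hypothetical `TalkProviderCapabilities` shape that covers a few of the fields listed above, because the real declaration type is not specified here. The provider ids and audio format names are also made up:

```ts
// Hypothetical capability declaration; field names are illustrative only.
interface TalkProviderCapabilities {
  id: string;
  modes: string[];
  transports: string[];
  voices: string[];
  inputAudioFormats: string[];
}

interface TalkCatalog {
  providers: string[];
  modes: string[];
  transports: string[];
  voices: Record<string, string[]>;
  inputAudioFormats: string[];
}

// Union of declared capabilities. Nothing is inferred from a provider's id.
function buildTalkCatalog(declarations: TalkProviderCapabilities[]): TalkCatalog {
  const union = (pick: (d: TalkProviderCapabilities) => string[]) =>
    [...new Set(declarations.flatMap(pick))].sort();
  return {
    providers: declarations.map((d) => d.id),
    modes: union((d) => d.modes),
    transports: union((d) => d.transports),
    voices: Object.fromEntries(declarations.map((d) => [d.id, [...d.voices]])),
    inputAudioFormats: union((d) => d.inputAudioFormats),
  };
}

const catalog = buildTalkCatalog([
  { id: "vendor-a", modes: ["realtime"], transports: ["webrtc", "gateway-relay"], voices: ["alloy"], inputAudioFormats: ["pcm16"] },
  { id: "vendor-b", modes: ["stt-tts"], transports: ["managed-room"], voices: [], inputAudioFormats: ["pcm16", "mulaw"] },
]);
console.log(catalog.transports); // [ 'gateway-relay', 'managed-room', 'webrtc' ]
```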

### `talk.speak`

One-shot TTS:

```ts
await gateway.request("talk.speak", {
  text: "Ready.",
  voice: "alloy",
});
```

`talk.speak` does not create live session state, turn state, transcript state,
barge-in state, or provider realtime state.

### `talk.client.create`

Creates a client-owned provider session while Gateway still owns config,
instructions, credentials, and tool policy.

Use it for browser WebRTC, browser provider WebSocket, and native provider media
sessions that require client-owned sockets. Reject `gateway-relay` and
`managed-room`; the error points clients to `talk.session.create`.

### `talk.client.toolCall`

Forwards provider tool calls from client-owned provider sessions to Gateway
policy:

```ts
await gateway.request("talk.client.toolCall", {
  sessionId,
  callId,
  name,
  argumentsJson,
});
```

Validate session identity, caller ownership, brain strategy, and policy. Pass an
`AbortSignal` into the agent/tool runtime, reject stale or closed sessions, and
never accept request-time instructions.
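
The order of those checks matters. Identity and ownership come first, then session state and policy, and only then execution under the session's abort signal. The sketch below uses hypothetical `ClientSessionRecord` and `validateClientToolCall` shapes. It validates the request and returns the signal to pass into the tool runtime, but it does not run any tool. Treating brain `none` as "no tool calls" is an assumption:

```ts
// Hypothetical shapes; the real session record and error codes are not specified here.
interface ClientSessionRecord {
  sessionId: string;
  ownerConnId: string;
  brain: "agent-consult" | "direct-tools" | "none";
  closed: boolean;
  abort: AbortController;
}

interface ClientToolCallRequest {
  sessionId: string;
  callId: string;
  name: string;
  argumentsJson: string;
  [extra: string]: unknown;
}

function validateClientToolCall(
  sessions: Map<string, ClientSessionRecord>,
  connId: string,
  request: ClientToolCallRequest,
): { signal: AbortSignal; args: unknown } {
  // Instructions always come from trusted config, never from the request.
  if ("instructions" in request) throw new Error("request-time instructions are not accepted");
  const session = sessions.get(request.sessionId);
  if (!session) throw new Error("unknown session");
  if (session.ownerConnId !== connId) throw new Error("session belongs to another client");
  if (session.closed || session.abort.signal.aborted) throw new Error("session is closed");
  if (session.brain === "none") throw new Error("this session's brain strategy allows no tool calls");
  const args: unknown = JSON.parse(request.argumentsJson); // throws on malformed JSON
  return { signal: session.abort.signal, args };
}
```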

### `talk.session.create`

Creates a Gateway-owned live Talk session.

| Mode            | Transport       | Brain           | Owner               |
| --------------- | --------------- | --------------- | ------------------- |
| `realtime`      | `gateway-relay` | `agent-consult` | Gateway             |
| `transcription` | `gateway-relay` | `none`          | Gateway             |
| `stt-tts`       | `managed-room`  | `agent-consult` | Gateway/client room |
| `stt-tts`       | `managed-room`  | `direct-tools`  | trusted room        |

Reject `webrtc` and `provider-websocket`; the error points clients to
`talk.client.create`.
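
The two create methods split cleanly by transport, so the rejection rule and the supported combinations can live in one table-driven check. The rows in the sketch below mirror the table above; `checkTalkCreate` and its error wording are hypothetical:

```ts
type Method = "talk.client.create" | "talk.session.create";

// Transports owned by the client vs. by Gateway, per the create-method rules.
const CLIENT_TRANSPORTS = new Set(["webrtc", "provider-websocket"]);
const SESSION_TRANSPORTS = new Set(["gateway-relay", "managed-room"]);

// Gateway-owned combinations from the talk.session.create table: mode/transport/brain.
const SESSION_COMBINATIONS = new Set([
  "realtime/gateway-relay/agent-consult",
  "transcription/gateway-relay/none",
  "stt-tts/managed-room/agent-consult",
  "stt-tts/managed-room/direct-tools",
]);

function checkTalkCreate(method: Method, mode: string, transport: string, brain: string): void {
  if (method === "talk.client.create") {
    // Other client transports are left to provider validation in this sketch.
    if (SESSION_TRANSPORTS.has(transport)) {
      throw new Error(`${transport} is Gateway-owned; use talk.session.create`);
    }
    return;
  }
  if (CLIENT_TRANSPORTS.has(transport)) {
    throw new Error(`${transport} is client-owned; use talk.client.create`);
  }
  if (!SESSION_COMBINATIONS.has(`${mode}/${transport}/${brain}`)) {
    throw new Error(`unsupported session: ${mode} over ${transport} with ${brain}`);
  }
}

checkTalkCreate("talk.session.create", "transcription", "gateway-relay", "none"); // accepted
// checkTalkCreate("talk.session.create", "realtime", "webrtc", "agent-consult") throws
```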

### `talk.session.join`

Joins or reconnects to a Gateway-owned managed room. Validate session id and
token, never expose token hashes, emit `session.replaced` to the displaced
client, and emit `session.ready` to the new owner.
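
Never exposing token hashes is easier if the session record stores only a digest and compares it in constant time. The sketch below uses Node's `crypto` module. The `ManagedRoom` shape, `joinManagedRoom`, and the event callback are hypothetical; the event names come from the paragraph above:

```ts
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical managed-room record: only the SHA-256 digest of the join token is kept.
interface ManagedRoom {
  sessionId: string;
  tokenDigest: Buffer;
  ownerConnId?: string;
}

const digest = (token: string): Buffer => createHash("sha256").update(token).digest();

function joinManagedRoom(
  room: ManagedRoom,
  sessionId: string,
  token: string,
  connId: string,
  emit: (connId: string, type: "session.replaced" | "session.ready") => void,
): void {
  // Both digests are 32 bytes, so timingSafeEqual never throws on length.
  if (sessionId !== room.sessionId || !timingSafeEqual(digest(token), room.tokenDigest)) {
    throw new Error("invalid session id or token"); // no hash material in the error
  }
  const displaced = room.ownerConnId;
  room.ownerConnId = connId;
  if (displaced && displaced !== connId) emit(displaced, "session.replaced");
  emit(connId, "session.ready");
}
```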

### `talk.session.appendAudio`

Appends an input audio frame to a Gateway-owned relay session:

```ts
await gateway.request("talk.session.appendAudio", {
  sessionId,
  audioBase64,
  timestamp,
});
```

Use it for realtime Gateway relay and streaming transcription. Do not use it for
managed-room native push-to-talk when the native node captures audio locally and
returns transcript/output through node command results.

### Turn Verbs

Use explicit verbs instead of generic controls:

```ts
await gateway.request("talk.session.startTurn", { sessionId });
await gateway.request("talk.session.endTurn", { sessionId, turnId });
await gateway.request("talk.session.cancelTurn", { sessionId, turnId, reason });
await gateway.request("talk.session.cancelOutput", { sessionId, turnId, reason });
```

`endTurn` rejects a stale `turnId` before clearing active state. `cancelTurn`
aborts capture, STT, provider response, agent consult, tools, TTS, relay output,
and room streams tied to that turn. `cancelOutput` stops assistant audio without
necessarily ending the user turn. Barge-in must be speech/VAD gated.
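
The stale-turn and abort rules fit in a small per-session tracker that gives each turn its own `AbortController`. The sketch below is minimal. `TurnTracker`, the counter-based turn ids, and the choice to cancel a still-active turn when a new one starts are all assumptions for illustration:

```ts
type ActiveTurn = { turnId: string; controller: AbortController; capturing: boolean };

// Hypothetical per-session turn state. Work started for a turn (STT, consult,
// tools, TTS) should receive that turn's `signal`.
class TurnTracker {
  private counter = 0;
  private active?: ActiveTurn;

  startTurn(): { turnId: string; signal: AbortSignal } {
    if (this.active) this.cancelTurn(this.active.turnId, "superseded");
    const turnId = `turn-${++this.counter}`;
    const controller = new AbortController();
    this.active = { turnId, controller, capturing: true };
    return { turnId, signal: controller.signal };
  }

  // Reject stale ids before touching any state.
  private requireActive(turnId: string): ActiveTurn {
    if (!this.active || this.active.turnId !== turnId) {
      throw new Error(`stale turn: ${turnId}`);
    }
    return this.active;
  }

  endTurn(turnId: string): void {
    const turn = this.requireActive(turnId);
    turn.capturing = false; // input capture ends; the response may still run
  }

  cancelTurn(turnId: string, reason: string): void {
    const turn = this.requireActive(turnId);
    this.active = undefined;
    turn.controller.abort(reason); // propagates to everything holding the signal
  }
}
```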

### `talk.session.submitToolResult`

Completes a provider tool call emitted inside a Gateway-owned relay session:

```ts
await gateway.request("talk.session.submitToolResult", {
  sessionId,
  callId,
  output,
});
```

### `talk.session.close`

Closes a Gateway-owned session. Close emits one terminal event, stops capture and
playback, aborts provider and agent work, drains TTS, revokes room join state,
and removes retained state after its replay/debug window.

## Event Contract

All live Talk paths emit one public event channel:

```text
talk.event
```

Every event uses this envelope:

```ts
type TalkEvent<TPayload = unknown> = {
  id: string;
  type: TalkEventType;
  sessionId: string;
  turnId?: string;
  captureId?: string;
  seq: number;
  timestamp: string;
  mode: TalkMode;
  transport: TalkTransport;
  brain: TalkBrain;
  provider?: string;
  final?: boolean;
  callId?: string;
  itemId?: string;
  parentId?: string;
  source?: string;
  payload: TPayload;
};
```

Core event types include `session.*`, `turn.*`, `capture.*`, `input.audio.*`,
`transcript.*`, `output.text.*`, `output.audio.*`, `tool.*`, `usage.metrics`,
`latency.metrics`, and `health.changed`.

Rules:

- `sessionId` is required for every event.
- `turnId` is required for turn-bound input, output, transcript, tool, and cancellation events.
- `captureId` is required while capture is active.
- `seq` increases monotonically per session.
- `timestamp` uses ISO 8601 UTC.
- `callId`, `itemId`, and `parentId` correlate provider responses, tool calls, TTS jobs, and relay frames.
- payloads must not duplicate large raw audio frames when the transport already carries them.
- consumers should rely on envelope fields instead of provider-specific payloads.
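
Several of the envelope rules can be enforced at a single emit point. One sequencer per session assigns `id`, `seq`, and the ISO 8601 UTC `timestamp`, and refuses turn-bound events that lack a `turnId`. The sketch below is minimal. `createTalkEventSequencer`, the UUID ids, and the prefix list that decides what counts as turn-bound are hypothetical:

```ts
import { randomUUID } from "node:crypto";

// Trimmed envelope for this example; the full TalkEvent type has more fields.
interface TalkEventEnvelope<TPayload = unknown> {
  id: string;
  type: string;
  sessionId: string;
  turnId?: string;
  seq: number;
  timestamp: string;
  payload: TPayload;
}

// Assumed set of turn-bound event families.
const TURN_BOUND_PREFIXES = ["turn.", "input.audio.", "transcript.", "output.text.", "output.audio.", "tool."];

function createTalkEventSequencer(sessionId: string) {
  let seq = 0;
  return function emit<TPayload>(type: string, payload: TPayload, turnId?: string): TalkEventEnvelope<TPayload> {
    if (!turnId && TURN_BOUND_PREFIXES.some((prefix) => type.startsWith(prefix))) {
      throw new Error(`${type} requires a turnId`);
    }
    seq += 1; // strictly increasing within this session
    return { id: randomUUID(), type, sessionId, turnId, seq, timestamp: new Date().toISOString(), payload };
  };
}
```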

Text-ready is not audio-ready. Clients may show text after `output.text.done`,
but should not enter speaking/playback state until `output.audio.started` or
`output.audio.delta`.
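
On the client, that rule reduces to a small state transition in which text events never move the UI into playback. The sketch below is minimal. The `ClientTalkState` names and the reset on `output.audio.done` are assumptions, not part of the contract:

```ts
type ClientTalkState = "idle" | "listening" | "thinking" | "speaking";

// Only audio events enter "speaking"; text completion alone keeps the client out of playback.
function nextClientState(state: ClientTalkState, eventType: string): ClientTalkState {
  switch (eventType) {
    case "output.audio.started":
    case "output.audio.delta":
      return "speaking";
    case "output.audio.done":
      return state === "speaking" ? "idle" : state;
    case "output.text.done":
      return state; // text may be rendered, but playback has not started
    default:
      return state;
  }
}

const finalState = ["output.text.done", "output.audio.delta"].reduce<ClientTalkState>(nextClientState, "thinking");
console.log(finalState); // speaking
```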

## Shared Runtime Target

Keep one provider-agnostic runtime under `src/talk`. The first pass keeps names
close to the old runtime modules so the move stays reviewable:

```text
src/talk/
  audio-codec.ts
  agent-consult-runtime.ts
  agent-consult-tool.ts
  agent-talkback-runtime.ts
  fast-context-runtime.ts
  provider-registry.ts
  provider-resolver.ts
  provider-types.ts
  session-log-runtime.ts
  session-runtime.ts
  talk-events.ts
  talk-session-controller.ts
```

New code should import the shared runtime from `src/talk` inside core. Plugins
that already use the stable SDK subpath keep importing
`openclaw/plugin-sdk/realtime-voice`; that facade re-exports the Talk runtime
contract without exposing core file layout.

Responsibilities:

- normalize modes, transports, brains, codecs, and audio metadata
- create, close, and replace session records
- allocate turn ids and capture ids
- reject stale turn ids before mutation
- sequence events
- retain recent events for replay, reconnect, and diagnostics
- track active input capture and assistant output
- coordinate barge-in and output cancellation
- propagate abort signals
- register provider tool calls and bind tool results
- expose test builders for session/event assertions
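
Registering tool calls and binding tool results work as a pair. A provider tool call is registered under its `callId` when it is emitted, and a result may bind to it exactly once. `ToolCallRegistry` in the sketch below is hypothetical and keeps everything in memory:

```ts
type PendingCall = { name: string; turnId?: string; settled: boolean };

// Hypothetical registry: results bind to registered call ids exactly once.
class ToolCallRegistry {
  private readonly calls = new Map<string, PendingCall>();

  register(callId: string, name: string, turnId?: string): void {
    if (this.calls.has(callId)) throw new Error(`duplicate callId: ${callId}`);
    this.calls.set(callId, { name, turnId, settled: false });
  }

  bindResult(callId: string, output: unknown): { name: string; output: unknown } {
    const call = this.calls.get(callId);
    if (!call) throw new Error(`unknown callId: ${callId}`);
    if (call.settled) throw new Error(`result already submitted for ${callId}`);
    call.settled = true;
    return { name: call.name, output };
  }

  // On turn cancellation, drop unsettled calls so late results are rejected.
  dropTurn(turnId: string): void {
    for (const [callId, call] of this.calls) {
      if (call.turnId === turnId && !call.settled) this.calls.delete(callId);
    }
  }
}
```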

Gateway method files should become thin adapters:

```text
src/gateway/server-methods/
  talk.ts
  talk-client.ts
  talk-session.ts
```

Internal Gateway helpers may exist only as staging files while code moves to
`src/talk`.

## Cancellation Contract

Cancellation must abort underlying work, not only ignore stale output.

When a turn or session is cancelled:

- provider realtime response is cancelled when supported
- provider session is closed or reset when cancellation cannot be scoped
- streaming STT receives abort
- agent consult receives abort
- queued tools do not start after abort
- already-started side-effecting tools receive abort and report cancellation
- pending TTS jobs are drained
- playback sources are stopped
- relay streams are cleared
- managed-room capture and output state are reset
- stale finals and stale audio deltas are ignored
- one terminal cancellation event is emitted
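
Two of those rules are easy to get subtly wrong. Queued tools must check the signal before they start. Repeated cancel paths (explicit cancel, barge-in, close) must still produce exactly one terminal event. The sketch below covers both synchronously. `runToolQueue` and `onceTerminal` are hypothetical helpers:

```ts
type Tool = (signal: AbortSignal) => string;

// Queued tools never start once the turn's signal is aborted.
function runToolQueue(tools: Tool[], signal: AbortSignal): { results: string[]; skipped: number } {
  const results: string[] = [];
  let skipped = 0;
  for (const tool of tools) {
    if (signal.aborted) {
      skipped += 1;
      continue;
    }
    results.push(tool(signal)); // started tools get the signal and report cancellation themselves
  }
  return { results, skipped };
}

// Wraps an emitter so only the first terminal cancellation event goes out.
function onceTerminal(emit: (type: string) => void): (type: string) => void {
  let emitted = false;
  return (type) => {
    if (emitted) return;
    emitted = true;
    emit(type);
  };
}

const controller = new AbortController();
const terminal = onceTerminal((type) => console.log(type));
const outcome = runToolQueue(
  [
    (signal) => {
      controller.abort("barge-in"); // user speech arrives while the first tool runs
      return signal.aborted ? "cancelled" : "done";
    },
    () => "should not run",
  ],
  controller.signal,
);
terminal("turn.cancelled"); // prints turn.cancelled
terminal("turn.cancelled"); // ignored
console.log(outcome); // { results: [ 'cancelled' ], skipped: 1 }
```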

Barge-in uses VAD or provider speech-started signals, ignores silence and echo,
cancels output only after real user speech, and starts or ensures a turn before
emitting `turn.cancelled`.

## Tool Policy Contract

Gateway owns Talk tool policy.

Client-owned flow: `talk.client.create` -> provider tool call to client ->
`talk.client.toolCall` -> Gateway policy validation -> agent/direct-tool
execution -> client result submission to provider.

Gateway-owned flow: `talk.session.create` -> provider tool call to Gateway ->
Gateway policy validation -> agent/direct-tool execution -> provider result
submission -> `talk.event` emission.

No Talk path accepts caller-provided instructions. Gateway builds instructions
from trusted config and session context.

@@ -1,229 +0,0 @@
---
summary: "Implementation packages, deletion checklist, test matrix, and verification commands for the Talk refactor"
read_when:
  - Implementing the Talk refactor plan
  - Deleting legacy Talk RPCs, event channels, or realtime endpoint code
  - Verifying browser, native, telephony, meeting, STT, or TTS Talk behavior after refactor work
title: "Talk refactor execution checklist"
---

# Talk Refactor Execution Checklist

Use this as the PR tracker for [Talk refactor plan](/refactor/talk).

## Implementation Packages

### Package 1: Protocol

- update `src/gateway/protocol/schema/channels.ts`
- update `src/gateway/protocol/schema/protocol-schemas.ts`
- update `src/gateway/protocol/schema/types.ts`
- update `src/gateway/protocol/index.ts`
- regenerate the generated protocol clients
- remove old schemas from generated metadata
- update protocol tests

Done when old RPC/event names are absent from generated protocol output.
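
A "prove absent" check can be a plain scan of the generated method and event names. The sketch below compares them against the RPC entries visible in this file's deletion checklist, with the wildcard entries turned into prefixes. `findRemovedTalkNames` and its input are hypothetical:

```ts
// Exact names and prefixes taken from the Talk deletion checklist.
const REMOVED_EXACT = new Set([
  "talk.session.inputAudio",
  "talk.session.control",
  "talk.session.toolResult",
  "talk.realtime.relay",
  "talk.transcription.relay",
]);
const REMOVED_PREFIXES = ["talk.realtime.", "talk.transcription.", "talk.handoff."];

// Returns every generated name that should have been deleted, sorted for stable output.
function findRemovedTalkNames(generatedNames: Iterable<string>): string[] {
  const hits: string[] = [];
  for (const name of generatedNames) {
    if (REMOVED_EXACT.has(name) || REMOVED_PREFIXES.some((prefix) => name.startsWith(prefix))) {
      hits.push(name);
    }
  }
  return hits.sort();
}

console.log(findRemovedTalkNames(["talk.event", "talk.session.create", "talk.handoff.start"])); // [ 'talk.handoff.start' ]
```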
|
||||
|
||||
### Package 2: Gateway Methods
|
||||
|
||||
- split client-owned methods into `talk-client.ts`
|
||||
- keep session-owned methods in `talk-session.ts`
|
||||
- keep catalog/config/speak/mode in `talk.ts`
|
||||
- classify every new method in method scopes
|
||||
- advertise only `talk.event` in hello event features
|
||||
- remove old method list entries
|
||||
- update authorization tests
|
||||
|
||||
Done when every public Talk method has an explicit scope.
|
||||
|
||||
### Package 3: Session Runtime
|
||||
|
||||
- add `src/talk` primitives
|
||||
- move event sequencing into shared runtime
|
||||
- move stale-turn rejection into shared runtime
|
||||
- move active output state into shared runtime
|
||||
- move cancellation bookkeeping into shared runtime
|
||||
- expose small test helpers
|
||||
|
||||
Done when relay, transcription, handoff, telephony, and meetings do not each
|
||||
invent event and turn bookkeeping.
|
||||
|
||||
### Package 4: Browser UI
|
||||
|
||||
- update realtime startup to `talk.client.create`
|
||||
- update realtime tool consult to `talk.client.toolCall`
|
||||
- update relay startup to `talk.session.create`
|
||||
- update relay audio to `talk.session.appendAudio`
|
||||
- update relay tool result to `talk.session.submitToolResult`
|
||||
- update relay output cancel to `talk.session.cancelOutput`
|
||||
- update relay close to `talk.session.close`
|
||||
- listen only to `talk.event`
|
||||
- remove relay mark RPC
|
||||
|
||||
Done when UI tests prove no removed RPC names remain.
|
||||
|
||||
### Package 5: Native And Nodes

- route native Talk through session events
- map push-to-talk commands to managed-room turn lifecycle
- clean capture state on failed start
- keep local STT/TTS as adapter behavior
- remove chat-history polling from the success path
- keep fallback polling only if explicitly needed

Done when the native voice success path is event-driven.

### Package 6: Voice Call

- map telephony realtime events into `talk.event`
- map local speech detection to `startTurn`, `cancelOutput`, and `cancelTurn`
- pass abort through agent consult and tools
- keep marks, clear, u-law, and call lifecycle in the plugin
- add tests for early speech before provider speech-started

Done when Voice Call shares event and cancellation semantics without leaking
telephony into core.

### Package 7: Meetings

- map meeting speech and transcript state into `talk.event`
- keep participant and room state in the meeting adapter
- add echo-suppression-aware barge-in tests
- ensure meeting adapters can choose `realtime`, `transcription`, or `stt-tts`

Done when meeting behavior is an adapter over Talk, not a parallel realtime loop.

### Package 8: Doctor And Migration

- detect old realtime selectors outside `talk.realtime`
- write explicit `talk.realtime.provider`, `model`, `voice`, `transport`, and `brain`
- report removed RPC names when logs show old clients
- keep startup free of hidden config rewrites
- update SDK migration, Gateway protocol, Talk node, Control UI, and TTS docs

Done when runtime config is explicit and docs mention removed API only in
migration notes.

## Deletion Checklist

Delete or prove absent:

- `src/gateway/voiceclaw-realtime/`
- `/voiceclaw/realtime`
- `instructionsOverride`
- `talk.realtime.*` public RPCs
- `talk.transcription.*` public RPCs
- `talk.handoff.*` public RPCs
- `talk.session.inputAudio`
- `talk.session.control`
- `talk.session.toolResult`
- `talk.realtime.relay`
- `talk.transcription.relay`
- old generated protocol models
- old UI relay method calls

Keep these old names only in explicit migration tables.

## Test Matrix

Protocol:

- final methods exist in protocol schemas
- removed methods are absent from protocol schemas
- final event is advertised in hello features
- removed events are absent from broadcast guards
- generated clients match schema
- request-time instruction override is rejected or impossible by schema

Gateway:

- `talk.client.create` creates WebRTC session result
- `talk.client.create` creates provider WebSocket session result
- `talk.client.create` rejects Gateway-owned transports
- `talk.client.toolCall` validates caller, session, brain, and policy
- `talk.session.create` creates realtime Gateway relay
- `talk.session.create` creates transcription relay
- `talk.session.create` creates STT/TTS managed room
- `talk.session.create` rejects client-owned transports
- `talk.session.join` replacement notifies displaced client
- `talk.session.appendAudio` routes to relay/transcription session
- `talk.session.startTurn` starts managed-room turn
- `talk.session.endTurn` rejects stale turn ids
- `talk.session.cancelTurn` aborts provider, agent, tools, TTS, and streams
- `talk.session.cancelOutput` cancels playback only
- `talk.session.submitToolResult` binds to provider call id
- `talk.session.close` emits terminal event and releases resources

Browser:

- WebRTC path calls `talk.client.create`
- provider WebSocket path calls `talk.client.create`
- provider tool calls use `talk.client.toolCall`
- Gateway relay uses only `talk.session.*`
- Gateway relay listens only to `talk.event`
- barge-in requires speech/VAD
- relay close rejects or aborts pending consult runs
- no removed RPC names in UI tests

Native:

- push-to-talk start emits capture/turn events
- failed push-to-talk start cleans capture state
- cancel clears capture and output state
- STT/TTS success path is event-driven
- fallback polling is explicit and tested if kept
- node policy rejects untrusted Talk commands

Telephony:

- early speech before provider speech-started creates or guards turn before cancellation
- marks and clear events map to output state
- u-law codec stays adapter-owned
- cancellation aborts consult run
- closed call prevents stale tool result submission

Meetings:

- participant context appears as metadata, not core branching
- echo suppression prevents false barge-in
- transcript events use common envelope
- meeting close aborts active work

Architecture:

- no removed public RPC names in protocol metadata
- no retired realtime endpoint route
- no retired realtime folder
- no request-time instruction override field
- no core branches on app platform names
- provider behavior comes from capabilities

## Verification Commands

Focused local loop:

```sh
pnpm test src/gateway/protocol/index.test.ts
pnpm test src/gateway/server-methods/talk.test.ts
pnpm test src/gateway/method-scopes.test.ts src/gateway/server-methods-list.test.ts
pnpm test src/gateway/talk-realtime-relay.test.ts src/gateway/talk-transcription-relay.test.ts
pnpm test ui/src/ui/realtime-talk.test.ts ui/src/ui/realtime-talk-gateway-relay.test.ts ui/src/ui/realtime-talk-webrtc.test.ts ui/src/ui/realtime-talk-google-live.test.ts
pnpm exec oxfmt --check --threads=1 docs/refactor/talk.md docs/refactor/talk-execution.md
```

Generation and docs:

```sh
pnpm protocol:gen && pnpm protocol:gen:swift
pnpm docs:check-mdx
pnpm plugin-sdk:api:check
```

Broad gate before push:

```sh
pnpm check:changed
```

Use Testbox for broad gates on maintainer machines.
@@ -1,128 +0,0 @@
---
summary: "Surface adapter plan for browser, native, walkie-talkie, telephony, and meeting Talk refactor work"
read_when:
- Updating browser realtime Talk, native Talk, walkie-talkie handoff, Voice Call, or meeting voice code
- Deciding whether a Talk behavior belongs in an adapter or shared runtime
title: "Talk surface mapping"
---

# Talk Surface Mapping

This page maps product surfaces onto the primitives defined in the [Talk refactor plan](/refactor/talk).

## Browser

WebRTC:

- call `talk.client.create`
- open the provider media connection in the browser
- forward provider tool calls through `talk.client.toolCall`
- receive provider audio through the provider media/data channel

Provider WebSocket:

- call `talk.client.create`
- connect using the constrained provider result
- keep provider-specific framing in the browser adapter
- forward tool calls through `talk.client.toolCall`

Gateway relay:

- call `talk.session.create`
- send PCM frames with `talk.session.appendAudio`
- listen only to `talk.event`
- submit tool results with `talk.session.submitToolResult`
- handle barge-in with `talk.session.cancelOutput`
- close with `talk.session.close`

## Native And Nodes

Native apps map the local audio lifecycle onto Talk primitives.

Native realtime:

- use `talk.client.create` when the app owns provider media
- use `talk.session.create` when the Gateway owns the provider relay

Native STT/TTS:

- use `talk.session.create({ mode: "stt-tts", transport: "managed-room" })`
- keep local STT and local TTS behind native adapters
- drive the success path from Talk events
- keep history polling only as a degraded fallback, and only if explicitly tested

Native push-to-talk:

- press maps to `talk.session.startTurn`
- release maps to `talk.session.endTurn`
- cancel maps to `talk.session.cancelTurn`
- node capture commands emit capture events
- failed start cleans capture state
- opening voice UI never mutates global Talk config

Trusted node command adapters may remain:

```ts
talk.ptt.start;
talk.ptt.stop;
talk.ptt.cancel;
talk.ptt.once;
```

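The push-to-talk mapping above, including cleanup on a failed start, can be sketched as a small adapter. This is a minimal sketch under stated assumptions: `TalkSessionClient`, `CaptureHooks`, and `PushToTalkAdapter` are hypothetical names, and the client methods only mirror the `talk.session.*` turn verbs rather than reproducing the real request API.

```typescript
// Hypothetical thin wrapper over the talk.session.* turn verbs.
interface TalkSessionClient {
  startTurn(sessionId: string): Promise<{ turnId: string }>;
  endTurn(sessionId: string, turnId: string): Promise<void>;
  cancelTurn(sessionId: string, turnId: string): Promise<void>;
}

// Hypothetical native capture hooks, owned by the app rather than core.
interface CaptureHooks {
  start(): void;
  stop(): void;
}

class PushToTalkAdapter {
  private readonly sessionId: string;
  private readonly client: TalkSessionClient;
  private readonly capture: CaptureHooks;
  private activeTurnId: string | undefined;

  constructor(sessionId: string, client: TalkSessionClient, capture: CaptureHooks) {
    this.sessionId = sessionId;
    this.client = client;
    this.capture = capture;
  }

  // Press: open local capture, then start a managed-room turn.
  async press(): Promise<void> {
    this.capture.start();
    try {
      const { turnId } = await this.client.startTurn(this.sessionId);
      this.activeTurnId = turnId;
    } catch (error) {
      // A failed start must not leave the microphone open or a half-started turn behind.
      this.capture.stop();
      this.activeTurnId = undefined;
      throw error;
    }
  }

  // Release: stop capture and end the active turn, if any.
  async release(): Promise<void> {
    const turnId = this.takeActiveTurn();
    if (turnId !== undefined) await this.client.endTurn(this.sessionId, turnId);
  }

  // Cancel: stop capture and cancel the active turn, if any.
  async cancel(): Promise<void> {
    const turnId = this.takeActiveTurn();
    if (turnId !== undefined) await this.client.cancelTurn(this.sessionId, turnId);
  }

  private takeActiveTurn(): string | undefined {
    const turnId = this.activeTurnId;
    if (turnId !== undefined) this.capture.stop();
    this.activeTurnId = undefined;
    return turnId;
  }
}
```

Release and cancel are no-ops when no turn is active. A release that arrives while `startTurn` is still pending is not handled here and would need its own guard.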
## Walkie-Talkie

Walkie-talkie is managed-room Talk:

```ts
// `gateway` and `sessionKey` come from the surrounding client code.
await gateway.request("talk.session.create", {
  mode: "stt-tts",
  transport: "managed-room",
  brain: "agent-consult",
  sessionKey,
});
```

Then:

- client joins with `talk.session.join`
- press calls `talk.session.startTurn`
- release calls `talk.session.endTurn`
- cancel calls `talk.session.cancelTurn`
- assistant speech emits `output.text.*` and `output.audio.*`
- replacement emits `session.replaced` to the old owner
- close calls `talk.session.close`

Room state includes canonical session id, route/channel target, caller identity,
mode, transport, brain, provider, model, voice, locale, expiry, token hash,
active client id, active turn id, and replacement state.

Two simultaneous rooms must not share turn ids, transcripts, audio output, or
cancellation tokens.

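The replacement and isolation rules for rooms can be sketched as a small state holder. This is a minimal sketch that covers only a subset of the listed room state (session id, active client id, active turn id); `ManagedRoom`, its methods, and the `room/turn-N` id format are hypothetical. Treating replacement as dropping the displaced client's open turn is an assumption of this sketch, not a documented rule.

```typescript
type RoomResult =
  | { ok: true; turnId?: string; displacedClientId?: string }
  | { ok: false; error: string };

// Hypothetical managed-room controller. Each room keeps its own counter,
// so two rooms never hand out the same turn id.
class ManagedRoom {
  readonly sessionId: string;
  private activeClientId: string | undefined;
  private activeTurnId: string | undefined;
  private nextTurn = 1;

  constructor(sessionId: string) {
    this.sessionId = sessionId;
  }

  // Joining replaces the current owner; the caller sends session.replaced to the displaced client.
  join(clientId: string): RoomResult {
    const previous = this.activeClientId;
    this.activeClientId = clientId;
    if (previous === undefined || previous === clientId) return { ok: true };
    // Assumption for this sketch: replacement drops the displaced client's open turn.
    this.activeTurnId = undefined;
    return { ok: true, displacedClientId: previous };
  }

  startTurn(clientId: string): RoomResult {
    if (clientId !== this.activeClientId) return { ok: false, error: "not the active client" };
    if (this.activeTurnId !== undefined) return { ok: false, error: "a turn is already active" };
    const turnId = `${this.sessionId}/turn-${this.nextTurn}`;
    this.nextTurn += 1;
    this.activeTurnId = turnId;
    return { ok: true, turnId };
  }

  // Stale or foreign turn ids are rejected before any state changes.
  endTurn(clientId: string, turnId: string): RoomResult {
    if (clientId !== this.activeClientId) return { ok: false, error: "not the active client" };
    if (turnId !== this.activeTurnId) return { ok: false, error: `stale turn ${turnId}` };
    this.activeTurnId = undefined;
    return { ok: true, turnId };
  }
}
```

Rejecting a stale `endTurn` before touching state mirrors the plan's rule to reject stale turn completion before mutating active room state.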
## Telephony

Voice Call becomes a telephony adapter over Talk semantics.

Keep telephony-owned: Twilio/Plivo WebSocket contracts, stream ids, call ids,
G.711 u-law, marks, clear events, backpressure, phone call lifecycle, and inbound
speech detection quirks.

Move shared behavior to Talk: event envelope, turn ids, cancellation, agent
consult abort, tool policy, usage and latency metrics, and output state.

Telephony should emit `talk.event` for observability, even if phone media
remains plugin-owned.

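Early speech is the subtle case for a telephony adapter: the adapter's own speech detector can fire before the provider reports speech-started, and before any turn exists. A minimal sketch of a guard that creates the turn before cancelling output and ignores the second detector for the same speech. `TelephonyTurnGuard` and the signal names are hypothetical, and marks, clear events, and u-law handling stay with the plugin and are not modeled.

```typescript
type TalkTurnVerb = "startTurn" | "cancelOutput" | "cancelTurn";

type TelephonySignal =
  | "local-speech" // adapter-owned inbound speech detector fired
  | "provider-speech-started" // provider reported speech start
  | "output-started"
  | "output-finished"
  | "turn-ended"
  | "call-ended";

// Hypothetical guard that turns telephony signals into Talk turn verbs.
class TelephonyTurnGuard {
  private turnActive = false;
  private outputActive = false;

  handle(signal: TelephonySignal): TalkTurnVerb[] {
    const verbs: TalkTurnVerb[] = [];
    switch (signal) {
      case "local-speech":
      case "provider-speech-started":
        // Create the turn before any cancellation so the cancellation has a turn to attach to.
        // Whichever detector fires second finds the turn already active.
        if (!this.turnActive) {
          this.turnActive = true;
          verbs.push("startTurn");
        }
        // Real speech during assistant output is barge-in.
        if (this.outputActive) {
          this.outputActive = false;
          verbs.push("cancelOutput");
        }
        break;
      case "output-started":
        this.outputActive = true;
        break;
      case "output-finished":
        this.outputActive = false;
        break;
      case "turn-ended":
        this.turnActive = false;
        break;
      case "call-ended":
        if (this.turnActive) verbs.push("cancelTurn");
        this.turnActive = false;
        this.outputActive = false;
        break;
    }
    return verbs;
  }
}
```

For example, local speech during assistant playback yields `startTurn` then `cancelOutput`, and the provider's later speech-started signal yields nothing.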
## Meetings

Google Meet and future meeting integrations become meeting adapters over Talk
semantics.

Keep meeting-owned: meeting join/leave, participant identity, room permissions,
echo suppression, transcript context, and meeting-specific mute/deafen behavior.

Move shared behavior to Talk: turn lifecycle, transcript events, assistant output
events, tool policy, cancellation, and metrics.

Meeting adapters may run `transcription`, `stt-tts`, or `realtime` depending on
provider support.
@@ -1,499 +0,0 @@
---
summary: "Breaking refactor plan for one Talk architecture across realtime voice, STT/TTS, browser, native, telephony, meetings, and walkie-talkie handoff"
read_when:
- Refactoring Talk mode, realtime voice, voice-call, Google Meet, browser realtime voice, native push-to-talk, STT, or TTS
- Changing Talk Gateway protocol, provider contracts, realtime transports, managed rooms, audio events, cancellation, or tool policy
- Deciding whether a voice feature belongs in core, a provider plugin, a native app, a meeting adapter, or a telephony adapter
title: "Talk refactor plan"
---

# Talk Refactor Plan

This is the clean-break plan for unifying every live voice path behind one
Talk architecture.

The old architecture grew by product surface: browser realtime, Gateway relay,
managed native handoff, streaming transcription, Voice Call, Google Meet, local
STT/TTS, one-shot TTS, and a retired realtime WebSocket endpoint each learned
their own names for sessions, turns, capture, output, barge-in, tool calls,
cancellation, and transcript events.

The new architecture grows by primitive. There is one public Talk API, one
event envelope, one turn model, one cancellation contract, one provider policy
boundary, and one place for shared runtime state. Browser, native, telephony,
meetings, and walkie-talkie become adapters over those primitives.

## Product Target

OpenClaw supports three Talk products:

| Product | User experience | Mode |
| --------------------- | ----------------------------------------------------------------------- | --------------- |
| Realtime conversation | Low-latency duplex speech with interruption and provider tool calls | `realtime` |
| Walkie-talkie | Press or hold to speak, release, then hear OpenClaw answer | `stt-tts` |
| Transcription | Live captions, dictation, notes, meeting transcript, no assistant audio | `transcription` |

All three products share session identity, join/reconnect state, turn and
capture ids, input audio metadata, output text/audio state, transcript finality,
tool-call correlation, cancellation, replay, provider capabilities, policy,
auth, and observability.

One-shot uploaded audio and one-shot TTS do not need live Talk session state
unless they participate in live capture, turns, interruption, replay, or
cancellation.

## Hard Decisions

This refactor intentionally removes compatibility that would keep the design
muddy:

- remove public `talk.realtime.*` RPCs
- remove public `talk.transcription.*` RPCs
- remove public `talk.handoff.*` RPCs
- remove generic `talk.session.inputAudio`, `talk.session.control`, and
  `talk.session.toolResult`
- remove old relay event channels
- remove `/voiceclaw/realtime`
- remove `src/gateway/voiceclaw-realtime/`
- remove request-time instruction overrides
- keep `talk.speak` as one-shot TTS, not a live session API
- keep legacy realtime config repair in doctor, not startup
- keep platform and product names out of core branching

## Vocabulary

Keep mode, transport, brain, and surface separate.

```ts
type TalkMode = "realtime" | "stt-tts" | "transcription";

type TalkTransport = "webrtc" | "provider-websocket" | "gateway-relay" | "managed-room";

type TalkBrain = "agent-consult" | "direct-tools" | "none";
```

### Modes

`realtime` means a provider owns a live voice session. Audio goes in, audio
comes out, interruptions are possible, and provider tool calls may happen during
one provider session.

`stt-tts` means input speech is transcribed, OpenClaw answers as text, and TTS
renders the answer. This is the native Talk and walkie-talkie path when a full
duplex provider session is not the right shape.

`transcription` means speech-to-text without assistant audio output. It covers
captions, dictation, notes, meeting transcript capture, and live voice-note
ingestion.

### Transports

`webrtc` is client-owned SDP/media/data-channel transport. It fits browser-owned
OpenAI Realtime sessions with ephemeral credentials.

`provider-websocket` is client-owned provider JSON and audio framing. It fits
browser-owned Google Live-style sessions.

`gateway-relay` means the Gateway owns the provider connection. The client sends
authenticated audio frames to the Gateway and receives `talk.event` plus audio
output through Gateway-managed relay state.

`managed-room` means the Gateway owns a room-like session that clients can join,
replace, and drive with explicit turn verbs. It is the primitive for
walkie-talkie and native handoff.

Telephony and meetings are not core transports. They are adapters that map
phone or meeting media into `gateway-relay`, `managed-room`, or `stt-tts` while
keeping call and meeting lifecycle outside core.

### Brain Strategies

`agent-consult` means provider tool calls or session turns consult an OpenClaw
agent. Gateway owns prompt construction, context selection, authorization, abort
signals, and final result delivery.

`direct-tools` means a trusted first-party surface can call selected OpenClaw
tools directly through Gateway policy. Keep this privileged.

`none` means transcription-only, external orchestration, or no OpenClaw tool
access.

## Ownership Boundaries

Core owns generic Talk semantics:

- mode, transport, brain, codec, and audio descriptors
- session records and session ownership
- turn ids and capture ids
- event envelope, sequencing, replay, and stale-output suppression
- active capture state
- active assistant output state
- replacement and reconnect state
- cancellation propagation
- tool policy and tool-call correlation
- usage, latency, and health events

Provider plugins own vendor behavior:

- OpenAI Realtime SDP and data-channel details
- Google Live WebSocket framing
- streaming STT provider details
- TTS provider details
- provider auth, model, voice, codec, and resume quirks
- provider capability declarations

Surface adapters own IO and product quirks:

- browser capture and playback
- native audio sessions, local speech engines, and foreground Talk UX
- node command dispatch
- telephony media streams, marks, clear events, u-law, and call lifecycle
- meeting join/leave, participants, echo suppression, and authorization

Core may store optional surface metadata for diagnostics. Core must not branch
on browser, iOS, Android, macOS, Google Meet, Voice Call, or any retired product
name.

## Final Gateway API

The public Gateway surface is deliberately small:

```ts
// Discovery and configuration.
talk.catalog;
talk.config;

// One-shot speech output.
talk.speak;

// Client-owned provider sessions.
talk.client.create;
talk.client.toolCall;

// Gateway-owned live sessions.
talk.session.create;
talk.session.join;
talk.session.appendAudio;
talk.session.startTurn;
talk.session.endTurn;
talk.session.cancelTurn;
talk.session.cancelOutput;
talk.session.submitToolResult;
talk.session.close;

// Events and foreground node mode.
talk.event;
talk.mode;
```

Use `talk.client.*` when the client owns provider media transport. Use
`talk.session.*` when the Gateway owns live session state.

`talk.mode` is the existing foreground node mode broadcast. It can stay, but it
is not part of the Talk session control API.

### Supported Creation Matrix

| Method | Mode | Transport | Brain | Owner |
| --------------------- | --------------- | -------------------- | --------------- | ------- |
| `talk.client.create` | `realtime` | `webrtc` | `agent-consult` | client |
| `talk.client.create` | `realtime` | `provider-websocket` | `agent-consult` | client |
| `talk.session.create` | `realtime` | `gateway-relay` | `agent-consult` | Gateway |
| `talk.session.create` | `transcription` | `gateway-relay` | `none` | Gateway |
| `talk.session.create` | `stt-tts` | `managed-room` | `agent-consult` | Gateway |
| `talk.session.create` | `stt-tts` | `managed-room` | `direct-tools` | Gateway |

Reject combinations that blur ownership. `talk.client.create` must reject
Gateway-owned transports. `talk.session.create` must reject client-owned
transports.

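The matrix can be enforced as a lookup before any provider work starts. A minimal sketch, assuming a hypothetical `validateTalkCreate` helper and a plain result object; the real handlers in `talk-client.ts` and `talk-session.ts` may shape errors differently. It checks ownership first, then the supported rows.

```typescript
type TalkMode = "realtime" | "stt-tts" | "transcription";
type TalkTransport = "webrtc" | "provider-websocket" | "gateway-relay" | "managed-room";
type TalkBrain = "agent-consult" | "direct-tools" | "none";
type TalkCreateMethod = "talk.client.create" | "talk.session.create";

interface TalkCreateParams {
  mode: TalkMode;
  transport: TalkTransport;
  brain: TalkBrain;
}

type ValidationResult = { ok: true } | { ok: false; error: string };

// Transports the client owns; every other transport is Gateway-owned.
const CLIENT_OWNED = new Set<TalkTransport>(["webrtc", "provider-websocket"]);

// Supported rows from the creation matrix, keyed as "mode/transport/brain".
const SUPPORTED: Record<TalkCreateMethod, ReadonlySet<string>> = {
  "talk.client.create": new Set<string>([
    "realtime/webrtc/agent-consult",
    "realtime/provider-websocket/agent-consult",
  ]),
  "talk.session.create": new Set<string>([
    "realtime/gateway-relay/agent-consult",
    "transcription/gateway-relay/none",
    "stt-tts/managed-room/agent-consult",
    "stt-tts/managed-room/direct-tools",
  ]),
};

// Hypothetical helper: rejects ownership mismatches first, then unsupported rows.
function validateTalkCreate(method: TalkCreateMethod, params: TalkCreateParams): ValidationResult {
  const clientOwned = CLIENT_OWNED.has(params.transport);
  if (method === "talk.client.create" && !clientOwned) {
    return { ok: false, error: `${method} rejects Gateway-owned transport "${params.transport}"` };
  }
  if (method === "talk.session.create" && clientOwned) {
    return { ok: false, error: `${method} rejects client-owned transport "${params.transport}"` };
  }
  const key = `${params.mode}/${params.transport}/${params.brain}`;
  if (!SUPPORTED[method].has(key)) {
    return { ok: false, error: `${method} does not support ${key}` };
  }
  return { ok: true };
}
```

Checking ownership before the row lookup keeps the error message specific: `talk.client.create` with `transport: "gateway-relay"` fails as an ownership violation rather than as an unsupported combination.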
## Removed API

Remove these names from handlers, method lists, scopes, protocol schemas,
generated clients, broadcast guards, tests, and docs, except in explicit
migration tables:

| Removed | Replacement |
| ------------------------------- | -------------------------------------------------------- |
| `talk.realtime.session` | `talk.client.create` |
| `talk.realtime.toolCall` | `talk.client.toolCall` |
| `talk.realtime.relayAudio` | `talk.session.appendAudio` |
| `talk.realtime.relayCancel` | `talk.session.cancelOutput` or `talk.session.cancelTurn` |
| `talk.realtime.relayMark` | internal relay output state |
| `talk.realtime.relayToolResult` | `talk.session.submitToolResult` |
| `talk.realtime.relayClose` | `talk.session.close` |
| `talk.realtime.relay` | `talk.event` |
| `talk.transcription.session` | `talk.session.create({ mode: "transcription" })` |
| `talk.transcription.audio` | `talk.session.appendAudio` |
| `talk.transcription.cancel` | `talk.session.cancelTurn` |
| `talk.transcription.close` | `talk.session.close` |
| `talk.transcription.relay` | `talk.event` |
| `talk.handoff.create` | `talk.session.create({ transport: "managed-room" })` |
| `talk.handoff.join` | `talk.session.join` |
| `talk.handoff.revoke` | `talk.session.close` |
| `talk.session.inputAudio` | `talk.session.appendAudio` |
| `talk.session.control` | explicit turn/output verbs |
| `talk.session.toolResult` | `talk.session.submitToolResult` |

Delete this endpoint:

```text
/voiceclaw/realtime
```

Delete this folder:

```text
src/gateway/voiceclaw-realtime/
```

Do not leave a compatibility namespace around retired code.

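A doctor check that reports removed names seen from old clients could be driven directly by the migration table. A minimal sketch that encodes the table as data; `REMOVED_TALK_NAMES` and `describeRemovedTalkName` are hypothetical names, not an existing doctor API. A `Map` is used instead of a plain object so that inherited keys such as `toString` never match.

```typescript
// The removed-name migration table as data (removed name -> replacement).
const REMOVED_TALK_NAMES = new Map<string, string>([
  ["talk.realtime.session", "talk.client.create"],
  ["talk.realtime.toolCall", "talk.client.toolCall"],
  ["talk.realtime.relayAudio", "talk.session.appendAudio"],
  ["talk.realtime.relayCancel", "talk.session.cancelOutput or talk.session.cancelTurn"],
  ["talk.realtime.relayMark", "internal relay output state"],
  ["talk.realtime.relayToolResult", "talk.session.submitToolResult"],
  ["talk.realtime.relayClose", "talk.session.close"],
  ["talk.realtime.relay", "talk.event"],
  ["talk.transcription.session", 'talk.session.create({ mode: "transcription" })'],
  ["talk.transcription.audio", "talk.session.appendAudio"],
  ["talk.transcription.cancel", "talk.session.cancelTurn"],
  ["talk.transcription.close", "talk.session.close"],
  ["talk.transcription.relay", "talk.event"],
  ["talk.handoff.create", 'talk.session.create({ transport: "managed-room" })'],
  ["talk.handoff.join", "talk.session.join"],
  ["talk.handoff.revoke", "talk.session.close"],
  ["talk.session.inputAudio", "talk.session.appendAudio"],
  ["talk.session.control", "explicit turn/output verbs"],
  ["talk.session.toolResult", "talk.session.submitToolResult"],
]);

// Hypothetical helper: returns a migration hint for a removed RPC or event name, or undefined.
function describeRemovedTalkName(name: string): string | undefined {
  const replacement = REMOVED_TALK_NAMES.get(name);
  return replacement === undefined ? undefined : `${name} was removed; replacement: ${replacement}`;
}
```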
## Target Source Layout

Shared runtime:

```text
src/talk/
  audio-codec.ts
  agent-consult-runtime.ts
  agent-consult-tool.ts
  agent-talkback-runtime.ts
  fast-context-runtime.ts
  provider-registry.ts
  provider-resolver.ts
  provider-types.ts
  session-log-runtime.ts
  session-runtime.ts
  talk-events.ts
  talk-session-controller.ts
```

Gateway adapters:

```text
src/gateway/server-methods/
  talk.ts          # catalog, config, speak, mode, composition
  talk-client.ts   # client-owned provider sessions
  talk-session.ts  # Gateway-owned live sessions
```

Gateway relay helpers can exist while the code moves, but the long-term shape
is that relay, transcription, and handoff state use `src/talk` primitives
instead of each reimplementing turns and events.

Public SDK:

```text
src/plugin-sdk/realtime-voice.ts
```

Keep this SDK subpath as the stable plugin import facade. It may re-export
Talk runtime contracts, but plugin authors should not import core file layout.

## Event Contract

All live paths emit `talk.event` with the envelope defined in
[Talk API and runtime contract](/refactor/talk-api-contract). The required
shape is: `id`, `type`, `sessionId`, `seq`, `timestamp`, `mode`, `transport`,
`brain`, and `payload`, with `turnId`, `captureId`, `callId`, `itemId`, and
`parentId` when the event is tied to turn, capture, provider item, tool call, or
TTS output.

Core event families are `session.*`, `turn.*`, `capture.*`, `input.audio.*`,
`transcript.*`, `output.text.*`, `output.audio.*`, `tool.*`, `usage.metrics`,
`latency.metrics`, and `health.changed`. Payloads must not duplicate large raw
audio frames when the transport already carries them. Text-ready is not
audio-ready; clients enter playback state only on audio events.

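A minimal sketch of the envelope and of the per-session sequencing that Phase 2 centralizes in `src/talk`. `TalkEventSequencer` and `isAudioOutputEvent` are hypothetical names, the specific event type strings and the `sessionId:seq` id format are assumptions, payloads stay `unknown`, and replay uses a bounded in-memory buffer. The linked contract page remains the authority on field semantics.

```typescript
type TalkMode = "realtime" | "stt-tts" | "transcription";
type TalkTransport = "webrtc" | "provider-websocket" | "gateway-relay" | "managed-room";
type TalkBrain = "agent-consult" | "direct-tools" | "none";

interface TalkEvent {
  id: string;
  type: string; // e.g. "turn.started" or "output.audio.delta" (exact names assumed)
  sessionId: string;
  seq: number;
  timestamp: number;
  mode: TalkMode;
  transport: TalkTransport;
  brain: TalkBrain;
  payload: unknown;
  // Set only when the event is tied to a turn, capture, provider item, tool call, or TTS output.
  turnId?: string;
  captureId?: string;
  callId?: string;
  itemId?: string;
  parentId?: string;
}

type SessionIdentity = Pick<TalkEvent, "sessionId" | "mode" | "transport" | "brain">;
type EventInput = Pick<TalkEvent, "type" | "payload"> &
  Partial<Pick<TalkEvent, "turnId" | "captureId" | "callId" | "itemId" | "parentId">>;

// Hypothetical per-session sequencer: one monotonic seq per session plus a
// bounded replay buffer for reconnecting clients.
class TalkEventSequencer {
  private seq = 0;
  private readonly history: TalkEvent[] = [];
  private readonly session: SessionIdentity;
  private readonly maxHistory: number;

  constructor(session: SessionIdentity, maxHistory = 256) {
    this.session = session;
    this.maxHistory = maxHistory;
  }

  emit(input: EventInput): TalkEvent {
    this.seq += 1;
    const event: TalkEvent = {
      ...this.session,
      ...input,
      id: `${this.session.sessionId}:${this.seq}`, // id format is an assumption
      seq: this.seq,
      timestamp: Date.now(),
    };
    this.history.push(event);
    if (this.history.length > this.maxHistory) this.history.shift();
    return event;
  }

  // Events still buffered after the last seq a reconnecting client saw.
  replay(afterSeq: number): TalkEvent[] {
    return this.history.filter((event) => event.seq > afterSeq);
  }
}

// Text-ready is not audio-ready: only audio output events may move a client into playback.
function isAudioOutputEvent(event: TalkEvent): boolean {
  return event.type.startsWith("output.audio.");
}
```

Keeping `seq` per session, not global, lets a reconnecting client ask for everything after the last number it saw without coordinating with other sessions.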
## Cancellation Contract

Cancellation must abort underlying work, not only ignore stale output.

When a turn or session is cancelled:

- provider realtime response is cancelled when supported
- provider session is closed or reset when cancellation cannot be scoped
- streaming STT receives abort
- agent consult receives abort
- queued tools do not start after abort
- already-started side-effecting tools receive abort and report cancellation
- pending TTS jobs are drained
- playback sources are stopped
- relay streams are cleared
- managed-room capture and output state are reset
- stale finals and stale audio deltas are ignored
- one terminal cancellation event is emitted

Barge-in requires real speech: provider speech-started, local VAD, or an
adapter-owned speech detector. Silence, echo, or microphone buffers alone must
not cancel assistant output.

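One way to meet this contract is a single `AbortController` per turn whose signal is threaded through every consumer, plus a guard so the terminal event fires exactly once. A minimal sketch; `TurnCancellation`, `shouldBargeIn`, and the speech-source names are hypothetical, and real provider, STT, TTS, and relay cleanup would be registered through `onAbort` or receive the signal directly.

```typescript
type SpeechSource =
  | "provider-speech-started"
  | "local-vad"
  | "adapter-speech-detector"
  | "silence"
  | "echo"
  | "microphone-buffer";

type TerminalListener = (turnId: string, reason: string) => void;

// Hypothetical per-turn cancellation scope: one AbortController fans out to
// provider, STT, agent consult, tools, TTS, playback, and relay cleanup.
class TurnCancellation {
  readonly turnId: string;
  private readonly controller = new AbortController();
  private readonly onTerminal: TerminalListener;
  private terminalEmitted = false;

  constructor(turnId: string, onTerminal: TerminalListener) {
    this.turnId = turnId;
    this.onTerminal = onTerminal;
  }

  get signal(): AbortSignal {
    return this.controller.signal;
  }

  // Register cleanup such as draining TTS jobs, stopping playback, or clearing relay streams.
  onAbort(cleanup: () => void): void {
    if (this.signal.aborted) {
      cleanup();
      return;
    }
    this.signal.addEventListener("abort", cleanup, { once: true });
  }

  // Queued tools never start after abort; started tools receive the signal.
  startTool<T>(tool: (signal: AbortSignal) => Promise<T>): Promise<T> | undefined {
    if (this.signal.aborted) return undefined;
    return tool(this.signal);
  }

  // Idempotent: aborts underlying work and emits exactly one terminal event.
  cancel(reason: string): void {
    if (this.terminalEmitted) return;
    this.terminalEmitted = true;
    this.controller.abort(reason);
    this.onTerminal(this.turnId, reason);
  }
}

// Barge-in requires real speech; silence, echo, or raw microphone buffers never cancel output.
function shouldBargeIn(source: SpeechSource): boolean {
  return (
    source === "provider-speech-started" ||
    source === "local-vad" ||
    source === "adapter-speech-detector"
  );
}
```

Because `AbortSignal` dispatches `abort` listeners synchronously, every registered cleanup has run by the time `cancel` emits the terminal event.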
## Config Contract

Config stays under `talk`; do not add `talk.speech`. `talk.provider` and
`talk.providers.*` remain speech/STT/TTS provider config. Realtime selectors
live under `talk.realtime`: `provider`, `providers.*`, `model`, `voice`, `mode`,
`transport`, and `brain`.

`talk.config` returns effective config without secrets unless the caller is
privileged. `talk.catalog` returns provider capabilities, not inferred
provider-id guesses. Doctor migrates old realtime placement into
`talk.realtime`; runtime startup does not reinterpret Voice Call, STT, or TTS
config as realtime config.

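The no-guessing rule can be expressed as a resolver that reads only `talk.realtime` and reports missing selectors instead of borrowing them from speech, Voice Call, STT, or TTS config. A minimal sketch: `TalkConfigShape` is a hypothetical subset of the `talk` block, `resolveRealtimeSelection` is a hypothetical helper, and treating the five selectors that doctor writes explicitly as required at runtime is an assumption of this sketch.

```typescript
type TalkMode = "realtime" | "stt-tts" | "transcription";
type TalkTransport = "webrtc" | "provider-websocket" | "gateway-relay" | "managed-room";
type TalkBrain = "agent-consult" | "direct-tools" | "none";

// Hypothetical subset of the `talk.realtime` config block.
interface TalkRealtimeConfig {
  provider?: string;
  providers?: Record<string, unknown>;
  model?: string;
  voice?: string;
  mode?: TalkMode;
  transport?: TalkTransport;
  brain?: TalkBrain;
}

interface TalkConfigShape {
  provider?: string; // speech/STT/TTS provider; never read for realtime
  providers?: Record<string, unknown>;
  realtime?: TalkRealtimeConfig;
}

type RealtimeSelection =
  | { ok: true; provider: string; model: string; voice: string; transport: TalkTransport; brain: TalkBrain }
  | { ok: false; missing: string[] };

// Assumption: the selectors doctor writes explicitly are required at runtime.
const REQUIRED_SELECTORS = ["provider", "model", "voice", "transport", "brain"] as const;

function resolveRealtimeSelection(config: TalkConfigShape): RealtimeSelection {
  const realtime: TalkRealtimeConfig = config.realtime ?? {};
  const { provider, model, voice, transport, brain } = realtime;
  if (provider && model && voice && transport && brain) {
    return { ok: true, provider, model, voice, transport, brain };
  }
  // Report what is missing instead of borrowing from talk.provider or other plugin config.
  const missing = REQUIRED_SELECTORS.filter((key) => !realtime[key]).map((key) => `talk.realtime.${key}`);
  return { ok: false, missing };
}
```

The provider, model, and voice strings in any real config come from provider capabilities; the resolver only checks presence.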
## Surface Mapping

| Surface | Talk mapping |
| ------------------------------- | ----------------------------------------------------------------------------------------------------- |
| Browser WebRTC | `talk.client.create`, client-owned provider media, `talk.client.toolCall` for provider tool calls |
| Browser provider WebSocket | `talk.client.create`, browser-owned provider framing, Gateway-owned credentials and policy |
| Browser Gateway relay | `talk.session.create`, `appendAudio`, `submitToolResult`, `cancelOutput`, `close`, and `talk.event` |
| Native push-to-talk | `stt-tts` plus `managed-room`; press/startTurn, release/endTurn, cancel/cancelTurn |
| Walkie-talkie | managed-room join/replacement plus shared turn/output events |
| Voice Call | telephony adapter over Talk events; call ids, stream ids, u-law, marks, and clear events stay plugin-side |
| Google Meet and future meetings | meeting adapter over Talk events; participant state, permissions, mute, and echo suppression stay out of core |

See [Talk surface mapping](/refactor/talk-surfaces) for the adapter-level
rules.

## Detailed Refactor Phases

### Phase 1: Protocol Is The Source Of Truth

- define final `talk.client.*`, `talk.session.*`, `talk.event`, `talk.catalog`, `talk.config`, `talk.speak`, and `talk.mode`
- delete removed RPCs from method lists and generated metadata
- delete removed event channels from hello feature advertising
- classify every final method in `METHOD_SCOPE_GROUPS`
- regenerate TypeScript and Swift protocol clients
- add protocol tests proving removed names are absent

Exit criteria: generated clients expose only the final public Talk API.

### Phase 2: Shared Runtime Becomes `src/talk`

- move provider-agnostic realtime voice modules into `src/talk`
- keep the plugin SDK facade at `openclaw/plugin-sdk/realtime-voice`
- rename logs and tests from realtime-voice wording to Talk wording where that improves clarity
- centralize event sequencing, active turn state, capture state, output state, stale-turn rejection, and replay history
- keep provider adapters out of this folder

Exit criteria: core and bundled surfaces import shared semantics from `src/talk`
or the SDK facade, not from surface-local helpers.

### Phase 3: Gateway Method Split

- make `talk.ts` a composition point for catalog, config, speak, mode, client, and session handlers
- put client-owned provider session methods in `talk-client.ts`
- put Gateway-owned session methods in `talk-session.ts`
- make relay, transcription, and managed-room handlers thin adapters over shared runtime primitives
- route session replacement notifications to the displaced connection
- reject stale turn completion before mutating active room state

Exit criteria: public RPC handlers read like API adapters, not separate Talk
implementations.

### Phase 4: Browser UI Uses The Final API

- update WebRTC and provider WebSocket startup to `talk.client.create`
- update browser provider tool calls to `talk.client.toolCall`
- update Gateway relay startup to `talk.session.create`
- update relay audio to `talk.session.appendAudio`
- update relay tool result submission to `talk.session.submitToolResult`
- update relay close to `talk.session.close`
- listen only to `talk.event`
- handle aborted consult runs immediately instead of timing out
- gate relay barge-in on speech or VAD

Exit criteria: UI tests contain no calls to removed Talk RPC names.

### Phase 5: Native And Nodes Become Event-Driven

- map native push-to-talk into managed-room sessions
- start, end, cancel, and replace turns through explicit session verbs
- clean capture state when push-to-talk start fails
- keep local STT and TTS as native adapter behavior
- remove chat-history polling from the success path
- keep fallback polling only if there is an explicit degraded-mode test

Exit criteria: native Talk success path is driven by `talk.event`, not hidden
chat side effects.

### Phase 6: Telephony And Meetings Become Adapters

- map Voice Call realtime and streaming STT into Talk event/cancellation semantics
- create or guard a turn before early speech cancellation events
- keep telephony codec, marks, clear events, and call lifecycle outside core
- map Google Meet transcript and assistant output into `talk.event`
- keep participant and echo-suppression behavior in the meeting adapter
- pass abort signals into agent consult and tool runtime

Exit criteria: Voice Call and meetings share event and cancellation semantics
without introducing telephony or meeting branches in core.

### Phase 7: Config And Doctor Cleanup

- keep `talk.provider` and `talk.providers.*` as speech/STT/TTS config
- keep realtime voice selectors under `talk.realtime`
- make `talk.config` return only resolved effective provider data
- repair legacy realtime placement in doctor
- document that runtime startup does not guess or rewrite config
- update SDK migration, Gateway protocol, Talk node, Control UI, and TTS docs

Exit criteria: no second speech namespace, no startup migrations, and no
ambiguous active provider in `talk.config`.

### Phase 8: Delete The Retired Stack

- remove `/voiceclaw/realtime`
- delete `src/gateway/voiceclaw-realtime/`
- remove request-time `instructionsOverride`
- remove old RPC handlers, scopes, broadcast guards, protocol schemas, generated clients, docs, and UI calls
- keep old names only in explicit migration tables and negative tests

Exit criteria: repository search finds removed public names only in migration
notes or tests that assert absence.

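
The exit criterion can be checked mechanically. The sketch below scans an in-memory list of files for the retired names from this phase. Two parts of it are assumptions. The allowed-location rule treats a path containing `/migration`, or a `*.test.ts`-style file, as a permitted place for old names. A real check would also read files from the repository instead of taking them as input.

```typescript
// Hypothetical absence check: the file list is passed in so the sketch stays
// self-contained; the allowed-location rule is an assumption.
interface SourceFile {
  path: string;
  text: string;
}

// Retired names from Phase 8. "voiceclaw-realtime" also catches imports of
// the deleted src/gateway/voiceclaw-realtime/ folder.
const REMOVED_NAMES = ["/voiceclaw/realtime", "voiceclaw-realtime", "instructionsOverride"];

// Old names may survive only in migration notes and negative tests.
function isAllowedLocation(path: string): boolean {
  return path.includes("/migration") || /\.test\.[cm]?[jt]s$/.test(path);
}

// Returns "path: name" for every retired name found outside allowed locations.
function findRetiredNames(files: SourceFile[]): string[] {
  const hits: string[] = [];
  for (const file of files) {
    if (isAllowedLocation(file.path)) continue;
    for (const name of REMOVED_NAMES) {
      if (file.text.includes(name)) hits.push(`${file.path}: ${name}`);
    }
  }
  return hits;
}
```
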
## Test And Verification Plan

The full matrix lives in
[Talk refactor execution checklist](/refactor/talk-execution). The required
proof areas are:

- protocol and generated clients expose only the final Talk API
- Gateway tests cover every `talk.client.*` and `talk.session.*` method
- UI tests prove browser WebRTC, provider WebSocket, and relay paths use the final API
- native tests prove managed-room push-to-talk cleanup, replacement, and event flow
- Voice Call and meeting tests prove early speech, barge-in, output state, and cancellation behavior
- config tests prove `talk.config` reports only resolved effective provider data
- architecture searches prove removed RPCs, events, endpoint, folder, and instruction override stay gone
- docs, protocol generation, SDK API checks, Android tests, build, and `pnpm check:changed` pass before push

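
The first two proof areas can share one small helper. It compares the advertised method list against the final surface and against the set of tested methods. This is an illustrative sketch only. The rule for what counts as final API (`talk.client.*`, `talk.session.*`, `talk.speak`, `talk.config`) is inferred from this plan and is an assumption, and the function names are hypothetical.

```typescript
// Hypothetical coverage check: the final-API rule is an assumption drawn
// from the plan's wording, not OpenClaw's actual method registry.
function isFinalTalkMethod(name: string): boolean {
  return (
    name.startsWith("talk.client.") ||
    name.startsWith("talk.session.") ||
    name === "talk.speak" ||
    name === "talk.config"
  );
}

interface CoverageReport {
  unexpected: string[]; // advertised talk.* methods outside the final API
  untested: string[]; // talk.client.* / talk.session.* methods without a test
}

function checkTalkSurface(advertised: string[], tested: Set<string>): CoverageReport {
  const talkMethods = advertised.filter((m) => m.startsWith("talk."));
  const unexpected = talkMethods.filter((m) => !isFinalTalkMethod(m));
  const untested = talkMethods.filter(
    (m) => (m.startsWith("talk.client.") || m.startsWith("talk.session.")) && !tested.has(m),
  );
  return { unexpected, untested };
}
```

A test that requires both arrays to be empty fails in two cases: when a retired method is advertised again, and when a new session method ships without coverage.
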
## Definition Of Done

The refactor is complete when:

- final API is the only advertised public API
- removed RPCs are gone from handlers, scopes, method lists, schemas, generated clients, docs, and UI
- removed event channels are gone
- retired realtime HTTP endpoint is gone
- retired realtime folder is gone
- browser Talk works through `talk.client.*` or `talk.session.*`
- native Talk works through session events
- streaming STT works through `talk.session.*`
- TTS one-shot remains `talk.speak`
- walkie-talkie works through managed-room sessions
- Voice Call and meetings use shared events and cancellation semantics
- cancellation aborts underlying work
- event envelopes are consistent
- config migration is handled by doctor
- tests prove the deleted API cannot accidentally return

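
"Event envelopes are consistent" is easier to enforce with a single factory and a type guard that every surface shares. The envelope fields below (`sessionId`, `turnId`, per-session `seq`, `type`, `payload`) are a hypothetical shape chosen for illustration. The plan does not specify the real `talk.event` envelope.

```typescript
// Hypothetical envelope shape: field names are illustrative assumptions,
// not the actual talk.event schema.
interface TalkEventEnvelope {
  sessionId: string;
  turnId: string | null; // null for session-level events
  seq: number; // per-session, strictly increasing from 1
  type: string;
  payload: unknown;
}

// One factory for every surface, so browser, native, telephony, and meeting
// adapters cannot drift into different envelope shapes.
class EnvelopeFactory {
  private readonly seqBySession = new Map<string, number>();

  make(sessionId: string, type: string, payload: unknown, turnId: string | null = null): TalkEventEnvelope {
    const seq = (this.seqBySession.get(sessionId) ?? 0) + 1;
    this.seqBySession.set(sessionId, seq);
    return { sessionId, turnId, seq, type, payload };
  }
}

// Runtime guard for envelopes received over the wire.
function isTalkEventEnvelope(value: unknown): value is TalkEventEnvelope {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const turnId = v.turnId;
  const seq = v.seq;
  return (
    typeof v.sessionId === "string" &&
    (turnId === null || typeof turnId === "string") &&
    typeof seq === "number" &&
    Number.isInteger(seq) &&
    seq > 0 &&
    typeof v.type === "string" &&
    "payload" in v
  );
}
```
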
Supporting details:

- [Talk API and runtime contract](/refactor/talk-api-contract)
- [Talk surface mapping](/refactor/talk-surfaces)
- [Talk refactor execution checklist](/refactor/talk-execution)

The end state: one Talk system, a small public API, provider-owned vendor
logic, surface-owned IO, and a Gateway core that owns policy, events, sessions,
turns, cancellation, and observability.