From 71cd132f1f3dec572c8b17ede9072542120c6acd Mon Sep 17 00:00:00 2001
From: Peter Steinberger
Date: Wed, 6 May 2026 02:40:34 +0100
Subject: [PATCH] docs: remove refactor notes

---
 docs/plugins/sdk-migration.md      |   2 -
 docs/refactor/fs-cleanup.md        | 448 --------------------------
 docs/refactor/talk-api-contract.md | 320 ------------------
 docs/refactor/talk-execution.md    | 229 -------------
 docs/refactor/talk-surfaces.md     | 128 --------
 docs/refactor/talk.md              | 499 -----------------------------
 6 files changed, 1626 deletions(-)
 delete mode 100644 docs/refactor/fs-cleanup.md
 delete mode 100644 docs/refactor/talk-api-contract.md
 delete mode 100644 docs/refactor/talk-execution.md
 delete mode 100644 docs/refactor/talk-surfaces.md
 delete mode 100644 docs/refactor/talk.md

diff --git a/docs/plugins/sdk-migration.md b/docs/plugins/sdk-migration.md
index 153edf10924..8ddb9f322a8 100644
--- a/docs/plugins/sdk-migration.md
+++ b/docs/plugins/sdk-migration.md
@@ -187,8 +187,6 @@ Core owns Talk session semantics. Provider plugins own vendor session setup.
 Voice-call and Google Meet own telephony/meeting adapters. Browser and native
 apps own device capture/playback UX.
 
-The detailed implementation plan lives in [Talk refactor plan](/refactor/talk).
-
 ## Compatibility policy
 
 For external plugins, compatibility work follows this order:
diff --git a/docs/refactor/fs-cleanup.md b/docs/refactor/fs-cleanup.md
deleted file mode 100644
index 0e5481c6bdc..00000000000
--- a/docs/refactor/fs-cleanup.md
+++ /dev/null
@@ -1,448 +0,0 @@
----
-title: "fs-safe Cleanup Plan"
-summary: "Plan for consolidating OpenClaw filesystem helpers around @openclaw/fs-safe"
-read_when:
-  - You are refactoring OpenClaw filesystem helpers
-  - You are changing @openclaw/fs-safe imports, wrappers, or plugin SDK file APIs
-  - You are deciding whether a local file helper belongs in OpenClaw or fs-safe
----
-
-## Status
-
-Implemented on `codex/extract-fs-safe-primitives`. Keep this file as the
-cleanup checklist for follow-up reviews and future fs-safe surface changes.
-
-## Goal
-
-Make OpenClaw's filesystem access boring and predictable:
-
-- Core code uses one small set of OpenClaw wrappers that apply OpenClaw policy.
-- Plugin SDK compatibility aliases stay deliberate and documented.
-- fs-safe keeps a small public story centered on `root()`, with lower-level
-  primitives behind explicit subpaths.
-- Duplicate JSON, temp, private-store, and path helper names disappear from
-  OpenClaw internals.
-- Security-sensitive behavior keeps regression tests before names move.
-
-## Non-goals
-
-- Do not remove public plugin SDK exports in this cleanup. Keep deprecated
-  aliases until a versioned SDK migration removes them.
-- Do not make fs-safe a sandbox. It remains a library guardrail for local file
-  access, not OS isolation.
-- Do not convert all absolute-path reads to root-bounded reads. Some OpenClaw
-  paths are trusted absolute paths and should stay explicit.
-- Do not chase cosmetic import churn without reducing helper count or clarifying
-  trust boundaries.
-
-## fs-safe Package Pin
-
-`@openclaw/fs-safe` is published on npm and consumed through a semver range.
-Fresh checkouts and CI runners should install the package from the public
-registry, not from a local `link:../fs-safe` checkout or a GitHub tarball.
-
-Current range:
-
-- `^0.1.0`
-
-The published package ships built `dist` files, so OpenClaw should not list it
-in `pnpm.onlyBuiltDependencies`.
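Editorial sketch (not part of the deleted file): the `^0.1.0` range above follows the usual semver caret rule, where a `0.x` caret only admits patch-level updates within the same minor. The helper names below are hypothetical, prerelease tags are deliberately ignored, and this is not how pnpm itself resolves ranges.

```ts
// Minimal caret-range check, assuming plain MAJOR.MINOR.PATCH versions only.
type Version = [number, number, number];

function parseVersion(text: string): Version {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(text);
  if (!match) throw new Error(`unsupported version: ${text}`);
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

function compareVersions(a: Version, b: Version): number {
  for (let i = 0; i < 3; i += 1) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

function satisfiesCaret(version: string, range: string): boolean {
  if (!range.startsWith("^")) throw new Error(`not a caret range: ${range}`);
  const floor = parseVersion(range.slice(1));
  const [major, minor, patch] = floor;
  // The caret keeps the left-most non-zero component fixed.
  const ceiling: Version =
    major > 0 ? [major + 1, 0, 0] : minor > 0 ? [0, minor + 1, 0] : [0, 0, patch + 1];
  const candidate = parseVersion(version);
  return compareVersions(candidate, floor) >= 0 && compareVersions(candidate, ceiling) < 0;
}
```

Under this rule `satisfiesCaret("0.1.5", "^0.1.0")` holds while `satisfiesCaret("0.2.0", "^0.1.0")` does not, so a `0.2.0` release of fs-safe would need an explicit range bump.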
-
-## Current Shape
-
-fs-safe's main entry is intentionally narrow:
-
-- `root`
-- `FsSafeError`
-- `categorizeFsSafeError`
-- root option/result types
-- Python helper configuration
-
-The wider surface lives behind subpaths:
-
-- `/json`
-- `/store`
-- `/temp`
-- `/atomic`
-- `/root`
-- `/advanced`
-- `/archive`
-- `/walk`
-
-OpenClaw now keeps fs-safe behind a small wrapper boundary:
-
-- local `src/infra/*` wrappers for core policy defaults
-- public plugin SDK aliases, including older names from before fs-safe
-- package-local utility exports where importing `src/infra` would cross a
-  package boundary
-
-An import-boundary test rejects new direct fs-safe imports outside those
-allowed areas.
-
-## Usage Map
-
-### Root-bounded access
-
-Representative use:
-
-- `src/gateway/server-methods/agents.ts`
-- `src/agents/pi-tools.read.ts`
-- `src/agents/apply-patch.ts`
-- `src/plugins/install.ts`
-- `src/auto-reply/reply/stage-sandbox-media.ts`
-- `src/gateway/canvas-documents.ts`
-
-Keep this family. `root()` is the fs-safe product surface OpenClaw should push
-callers toward.
-
-### JSON helpers
-
-OpenClaw still uses many names for the same operations:
-
-- `readJsonFile`
-- `readJsonFileStrict`
-- `readDurableJsonFile`
-- `writeJsonAtomic`
-- `loadJsonFile`
-- `saveJsonFile`
-- `readJsonFileWithFallback`
-- `writeJsonFileAtomically`
-
-fs-safe's canonical names are clearer:
-
-- `tryReadJson`
-- `readJson`
-- `readJsonIfExists`
-- `writeJson`
-- `readJsonSync`
-- `tryReadJsonSync`
-- `writeJsonSync`
-
-This was the highest-value cleanup because it removed naming drift without
-changing semantics. Compatibility aliases stay in `src/infra/json-files.ts` and
-plugin SDK barrels.
-
-### Private state and stores
-
-Representative use:
-
-- `src/commitments/store.ts`
-- `src/agents/models-config.ts`
-- `src/agents/pi-auth-json.ts`
-- `src/cron/run-log.ts`
-- `src/secrets/shared.ts`
-- `src/infra/device-auth-store.ts`
-- `src/infra/device-identity.ts`
-
-Current overlap:
-
-- `fileStore`
-- `fileStore({ private: true })`
-- plugin SDK private-state aliases
-
-The concepts are now one family. fs-safe exposes private mode through
-`fileStore({ private: true })`; OpenClaw internals and bundled plugins use
-store-shaped wrappers instead of standalone private JSON/text helpers.
-
-### Temp workspaces
-
-Representative use:
-
-- `src/media/qr-image.ts`
-- `extensions/discord/src/send.voice.ts`
-- `extensions/discord/src/voice/audio.ts`
-- `extensions/qa-lab/src/temp-dir.test-helper.ts`
-
-`tempWorkspace` is the stable useful primitive. One-shot temp targets and
-sibling-temp helpers are lower-level implementation tools.
-
-### Atomic writes
-
-Representative use:
-
-- config and session stores
-- cron stores
-- plugin install paths
-- extension state files
-
-Keep atomic replacement as a public fs-safe subpath. OpenClaw should use the
-same canonical JSON/text helpers where possible instead of hand-picking lower
-level atomic calls for ordinary JSON state.
-
-### Regular, secure, and root file reads
-
-These are not true duplicates:
-
-- `root()` protects root-relative untrusted paths.
-- regular-file helpers read trusted absolute paths with regular-file checks.
-- secure-file helpers add ownership and mode checks for secret references.
-
-Keep them separate. Document the trust boundary instead of hiding it behind one
-generic "read file" helper.
-
-### Archive helpers
-
-Representative use:
-
-- plugin install
-- skill install
-- marketplace and ClawHub archive flows
-
-Keep as a separate fs-safe subpath. Do not leak archive entry plumbing into
-OpenClaw core call sites unless the caller is actually validating archive
-metadata.
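Editorial sketch (not part of the deleted file): the "root-relative untrusted paths" boundary from the usage map can be illustrated with a purely lexical check. `resolveInsideRoot` is a hypothetical name, it is not the `root()` implementation, it assumes POSIX-style separators, and it does nothing about symlinks or hardlinks, which the plan says fs-safe guards separately.

```ts
// Lexical containment only: reject absolute paths and ".." segments that climb past the root.
function resolveInsideRoot(rootDir: string, relativePath: string): string {
  if (relativePath.startsWith("/")) {
    throw new Error(`absolute path not allowed: ${relativePath}`);
  }
  const segments: string[] = [];
  for (const part of relativePath.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") {
      // Popping past the root means the path tries to escape it.
      if (segments.length === 0) {
        throw new Error(`path escapes root: ${relativePath}`);
      }
      segments.pop();
      continue;
    }
    segments.push(part);
  }
  const base = rootDir.replace(/\/+$/, "");
  return [base, ...segments].join("/") || "/";
}
```

A trusted absolute path would skip this helper entirely, which is the point of keeping the helper families separate: the call site states which trust model applies.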
-
-## Target Design
-
-### OpenClaw imports
-
-Core OpenClaw code should use local policy wrappers:
-
-- `src/infra/fs-safe.ts` for common root/error helpers
-- `src/infra/json-files.ts` for the temporary JSON compatibility layer
-- `src/infra/private-file-store.ts` until private stores are unified
-- `src/infra/replace-file.ts` for low-level atomic replacement
-- `src/infra/boundary-file-read.ts` for loader/package boundary reads
-- `src/infra/archive.ts` for archive extraction policy
-- `src/infra/file-lock-manager.ts` for the rare core service that needs
-  manager-style lock lifecycle/diagnostics
-
-New direct imports from `@openclaw/fs-safe/*` should be reserved for:
-
-- package-level utilities outside core that cannot import `src/infra`
-- compatibility shims
-- code that intentionally consumes a narrow fs-safe subpath, such as
-  `openclaw/plugin-sdk/file-lock` using `@openclaw/fs-safe/file-lock`
-
-### Plugin SDK exports
-
-Plugin SDK exports are contractual. Keep aliases even when OpenClaw internals
-move to canonical names.
-
-Mark older names as deprecated in types/docs when the replacement is stable:
-
-- `readJsonFileWithFallback` -> `readJsonIfExists` or a store method
-- `writeJsonFileAtomically` -> `writeJson`
-- `loadJsonFile` -> `tryReadJson`
-- `saveJsonFile` -> `writeJson`
-- `readFileWithinRoot` -> `root(...).read*`
-- `writeFileWithinRoot` -> `root(...).write`
-
-### fs-safe stores
-
-Move toward one store family:
-
-```ts
-const store = fileStore({
-  rootDir,
-  private: true,
-  mode: 0o600,
-  dirMode: 0o700,
-});
-```
-
-or a thin alias:
-
-```ts
-const store = stateStore({ rootDir, private: true });
-```
-
-The store family should cover:
-
-- `read`
-- `readText`
-- `readJson`
-- `readTextIfExists`
-- `readJsonIfExists`
-- `write`
-- `writeJson`
-- `remove`
-- `exists`
-- `open`
-- `copyIn`
-- `writeStream`
-- `pruneExpired`
-
-This cleanup added that store shape in fs-safe, removed the unshipped
-`privateStateStore` surface, and moved OpenClaw internals and bundled plugins
-onto explicit store reads/writes.
-
-### Temp
-
-Keep stable public temp surface small:
-
-```ts
-await using workspace = await tempWorkspace({ prefix: "openclaw-" });
-const target = workspace.path("payload.bin");
-```
-
-Move one-shot temp target helpers and sibling-temp helpers to advanced/internal
-unless a concrete OpenClaw caller needs the public contract.
-
-## Refactor Phases
-
-### Phase 1: Inventory and Guards
-
-- Add a small import-boundary test that lists allowed direct
-  `@openclaw/fs-safe/*` imports in OpenClaw core.
-- Add regression tests for the JSON symlink behavior kept by
-  `src/infra/json-file.ts`.
-- Add regression tests for public plugin SDK aliases that must keep resolving.
-- Add a doc note to the plugin SDK runtime docs once aliases are marked
-  deprecated.
-
-Exit criteria:
-
-- The current compatibility surface is executable-tested.
-- New direct fs-safe imports are visible in review.
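Editorial sketch (not part of the deleted file): the Phase 1 import-boundary guard could be approximated as a scan of source text against an allowlist. The prefixes, the `SourceFile` shape, and `findBoundaryViolations` are all assumptions made for illustration; the actual `fs-safe-import-boundary.test.ts` is not shown in this patch and may work differently.

```ts
// Hypothetical allowlist of areas that may import fs-safe directly.
const ALLOWED_PREFIXES = ["src/infra/", "src/plugin-sdk/"];

type SourceFile = { path: string; text: string };

// Matches `from "@openclaw/fs-safe"` and `from "@openclaw/fs-safe/<subpath>"`.
const FS_SAFE_IMPORT = /from\s+["']@openclaw\/fs-safe(?:\/[^"']*)?["']/;

function findBoundaryViolations(files: SourceFile[]): string[] {
  return files
    .filter((file) => FS_SAFE_IMPORT.test(file.text))
    .filter((file) => !ALLOWED_PREFIXES.some((prefix) => file.path.startsWith(prefix)))
    .map((file) => file.path);
}
```

A test would feed in the repository's source files and assert that the returned list is empty, so any new direct import shows up as a failing path in review.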
-
-### Phase 2: JSON Name Cleanup
-
-- Convert OpenClaw internal callers from old JSON names to canonical fs-safe
-  names where the semantics are identical.
-- Keep plugin SDK aliases unchanged.
-- Collapse `src/infra/json-file.ts` and `src/infra/json-files.ts` into one
-  compatibility module if that reduces indirection without losing symlink
-  semantics.
-- Keep `saveJsonFile` symlink-target behavior until every caller/test is
-  intentionally migrated.
-
-Exit criteria:
-
-- Core internal code no longer imports `readJsonFileStrict`,
-  `readDurableJsonFile`, or `writeJsonAtomic` unless it is a compatibility shim.
-- Plugin SDK aliases still pass import/type tests.
-
-### Phase 3: Store Unification
-
-- Add the unified private mode to fs-safe's store API.
-- Remove the unshipped `privateStateStore` surface instead of keeping a second
-  store family.
-- Migrate OpenClaw private-state internals to the unified store shape in small
-  groups:
-  - auth/profile state
-  - device identity and device auth
-  - cron/run logs
-  - commitments
-  - extension state
-- Regenerate the plugin SDK API baseline for the intentional pre-release
-  private-helper removal.
-
-Exit criteria:
-
-- OpenClaw internals and bundled plugins do not call standalone private
-  JSON/text helpers.
-- `fileStore({ private: true })` is the only private multi-file store API.
-
-### Phase 4: Temp Simplification
-
-- Replace OpenClaw one-shot temp target call sites with `tempWorkspace`.
-- Keep `resolvePreferredOpenClawTmpDir` as OpenClaw policy.
-- Move one-shot temp and sibling-temp helpers out of the curated OpenClaw
-  wrapper surface.
-
-Exit criteria:
-
-- OpenClaw uses `tempWorkspace` for temporary file lifetimes unless a low-level
-  atomic helper owns the temp path.
-
-### Phase 5: Shim Reduction
-
-- Group one-line fs-safe shims into a smaller number of named OpenClaw policy
-  modules.
-- Delete shims that are no longer imported.
-- Keep shims that preserve public SDK names or OpenClaw-specific defaults.
-
-Candidate stable shims:
-
-- `src/infra/fs-safe.ts`
-- `src/infra/json-files.ts`
-- `src/infra/private-file-store.ts`
-- `src/infra/replace-file.ts`
-- `src/infra/boundary-file-read.ts`
-- `src/infra/archive.ts`
-
-Candidate advanced-only grouping:
-
-- path guards
-- symlink parent guards
-- hardlink guards
-- move-path helpers
-- file identity helpers
-- sibling temp helpers
-
-Exit criteria:
-
-- The local wrapper list has policy meaning, not one file per fs-safe module.
-
-### Phase 6: fs-safe Public Surface Finalization
-
-- Keep `@openclaw/fs-safe` main entry curated.
-- Keep `root()` as the primary README/API story.
-- Keep `openPinnedFileSync` internal. Use `readSecureFile`, `root().open`, or
-  `openRootFile*` wrappers instead of exposing the fd-level pinned primitive.
-- Keep `createSidecarLockManager` internal. Public callers should use
-  `acquireFileLock` / `withFileLock`; `createFileLockManager` is subpath-only
-  for long-lived services that need held-lock inspection or drain/reset.
-- Move rare root escape hatches such as `openWritable` to advanced only if API
-  checks show no supported caller needs the main root interface.
-- Keep `regular-file`, `secure-file`, archive, and root helpers separate
-  because their trust models differ.
-- Remove or mark unstable any standalone helper that is fully covered by root or
-  store methods.
-
-Exit criteria:
-
-- fs-safe has a stable pre-1.0 public surface.
-- OpenClaw imports only stable fs-safe APIs outside compatibility shims.
-
-## Verification
-
-Use targeted proof per phase:
-
-- JSON cleanup:
-  - JSON symlink tests
-  - plugin SDK JSON-store import tests
-  - representative extension tests that use JSON store aliases
-- Store unification:
-  - private mode tests in fs-safe
-  - auth profile persistence tests
-  - device identity tests
-  - cron/run-log tests
-- Temp cleanup:
-  - media temp tests
-  - Discord voice temp tests
-  - QA-lab temp helper tests
-- Shim reduction:
-  - plugin SDK API generation/check
-  - import-boundary tests
-  - `pnpm build`
-
-Before merging a broad cleanup batch, run the changed gate and build:
-
-```sh
-pnpm check:changed
-pnpm build
-```
-
-Implementation proof from this cleanup:
-
-- `pnpm test src/infra/fs-safe-import-boundary.test.ts src/plugin-sdk/temp-path.test.ts src/agents/models-config.write-serialization.test.ts src/infra/json-file.test.ts src/infra/json-files.test.ts`
-- `pnpm test src/infra/fs-safe-import-boundary.test.ts src/infra/device-auth-store.test.ts src/infra/device-identity.test.ts src/infra/exec-approvals.test.ts src/agents/models-config.write-serialization.test.ts src/agents/pi-embedded-runner/openrouter-model-capabilities.test.ts src/agents/harness/native-hook-relay.test.ts`
-- `pnpm test src/infra/fs-safe-import-boundary.test.ts src/infra/hardlink-guards.test.ts src/infra/file-identity.test.ts src/plugin-sdk/fs-safe-compat.test.ts src/plugin-sdk/temp-path.test.ts`
-- `pnpm plugin-sdk:api:check`
-- `pnpm build`
-- Blacksmith Testbox `pnpm install --frozen-lockfile --config.minimum-release-age=0 && pnpm check:changed`
-- In `../fs-safe`: `pnpm docs:site && pnpm build && pnpm test test/api-coverage.test.ts test/new-primitives.test.ts`
-
-## Review Checklist
-
-- Does this change reduce a public name, local wrapper, or duplicated semantic
-  family?
-- Is the old name public plugin SDK surface? If yes, keep a deprecated alias.
-- Does the replacement preserve symlink, hardlink, mode, and missing-file
-  behavior?
-- Is the caller using an untrusted relative path, trusted absolute path, secret
-  path, archive entry, or temp lifetime? Pick the helper that says that out
-  loud.
-- Are docs and plugin SDK API snapshots updated when exported names change?
diff --git a/docs/refactor/talk-api-contract.md b/docs/refactor/talk-api-contract.md
deleted file mode 100644
index d39d8d0c637..00000000000
--- a/docs/refactor/talk-api-contract.md
+++ /dev/null
@@ -1,320 +0,0 @@
----
-summary: "Detailed API, event, runtime, cancellation, and tool-policy contract for the Talk refactor"
-read_when:
-  - Implementing Talk Gateway methods or protocol schemas
-  - Changing Talk config, events, cancellation, or provider tool policy
-  - Reviewing whether a Talk behavior belongs in core or an adapter
-title: "Talk API and runtime contract"
----
-
-# Talk API And Runtime Contract
-
-This is the detailed contract for [Talk refactor plan](/refactor/talk).
-
-## Config Contract
-
-Config stays under the existing `talk` object. Do not add `talk.speech` in this
-refactor.
-
-```ts
-type TalkConfig = {
-  provider?: string;
-  providers?: Record;
-  realtime?: {
-    provider?: string;
-    model?: string;
-    voice?: string;
-    mode?: TalkMode;
-    transport?: TalkTransport;
-    brain?: TalkBrain;
-    providers?: Record;
-  };
-  input?: {
-    interruptOnSpeech?: boolean;
-    silenceTimeoutMs?: number;
-  };
-};
-```
-
-Rules:
-
-- `talk.provider` and `talk.providers.*` remain speech/STT/TTS provider config.
-- `talk.realtime.provider` and `talk.realtime.providers.*` are realtime voice provider config.
-- `talk.config` returns effective config without secrets unless privileged.
-- `talk.catalog` returns capabilities, not inferred provider-id guesses.
-- Doctor migrates old realtime selectors into `talk.realtime`.
-- Runtime does not silently reinterpret Voice Call or TTS config as realtime config.
-
-## Method Semantics
-
-### `talk.catalog`
-
-Returns effective Talk capabilities:
-
-- modes
-- transports
-- brain strategies
-- providers
-- models
-- voices
-- input audio formats
-- output audio formats
-- browser-safe client session support
-- Gateway relay support
-- managed-room support
-- local STT/TTS support
-
-Provider capability declarations drive this. Core must not infer support from
-provider ids.
-
-### `talk.speak`
-
-One-shot TTS:
-
-```ts
-await gateway.request("talk.speak", {
-  text: "Ready.",
-  voice: "alloy",
-});
-```
-
-`talk.speak` does not create live session state, turn state, transcript state,
-barge-in state, or provider realtime state.
-
-### `talk.client.create`
-
-Creates a client-owned provider session while Gateway still owns config,
-instructions, credentials, and tool policy.
-
-Use it for browser WebRTC, browser provider WebSocket, and native provider media
-sessions that require client-owned sockets. Reject `gateway-relay` and
-`managed-room`; the error points clients to `talk.session.create`.
-
-### `talk.client.toolCall`
-
-Forwards provider tool calls from client-owned provider sessions to Gateway
-policy:
-
-```ts
-await gateway.request("talk.client.toolCall", {
-  sessionId,
-  callId,
-  name,
-  argumentsJson,
-});
-```
-
-Validate session identity, caller ownership, brain strategy, and policy. Pass an
-`AbortSignal` into agent/tool runtime, reject stale or closed sessions, and never
-accept request-time instructions.
-
-### `talk.session.create`
-
-Creates a Gateway-owned live Talk session.
-
-| Mode            | Transport       | Brain           | Owner               |
-| --------------- | --------------- | --------------- | ------------------- |
-| `realtime`      | `gateway-relay` | `agent-consult` | Gateway             |
-| `transcription` | `gateway-relay` | `none`          | Gateway             |
-| `stt-tts`       | `managed-room`  | `agent-consult` | Gateway/client room |
-| `stt-tts`       | `managed-room`  | `direct-tools`  | trusted room        |
-
-Reject `webrtc` and `provider-websocket`; the error points clients to
-`talk.client.create`.
-
-### `talk.session.join`
-
-Joins or reconnects to a Gateway-owned managed room. Validate session id and
-token, never expose token hashes, emit `session.replaced` to the displaced
-client, and emit `session.ready` to the new owner.
-
-### `talk.session.appendAudio`
-
-Appends an input audio frame to a Gateway-owned relay session:
-
-```ts
-await gateway.request("talk.session.appendAudio", {
-  sessionId,
-  audioBase64,
-  timestamp,
-});
-```
-
-Use for realtime Gateway relay and streaming transcription. Do not use this for
-managed-room native push-to-talk when the native node captures audio locally and
-returns transcript/output through node command results.
-
-### Turn Verbs
-
-Use explicit verbs instead of generic controls:
-
-```ts
-await gateway.request("talk.session.startTurn", { sessionId });
-await gateway.request("talk.session.endTurn", { sessionId, turnId });
-await gateway.request("talk.session.cancelTurn", { sessionId, turnId, reason });
-await gateway.request("talk.session.cancelOutput", { sessionId, turnId, reason });
-```
-
-`endTurn` rejects stale `turnId` before clearing active state. `cancelTurn`
-aborts capture, STT, provider response, agent consult, tools, TTS, relay output,
-and room streams tied to that turn. `cancelOutput` stops assistant audio without
-necessarily ending the user turn. Barge-in must be speech/VAD gated.
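Editorial sketch (not part of the deleted file): the rule that `endTurn` rejects a stale `turnId` before clearing active state amounts to validate-then-mutate ordering. `TurnState`, the id format, and both functions are hypothetical stand-ins, not the `src/talk` runtime.

```ts
type TurnState = { sessionId: string; activeTurnId?: string; nextTurn: number };

function startTurn(state: TurnState): string {
  const turnId = `${state.sessionId}-turn-${state.nextTurn}`;
  state.nextTurn += 1;
  state.activeTurnId = turnId;
  return turnId;
}

function endTurn(state: TurnState, turnId: string): void {
  // Validate first so a stale request cannot clear a newer active turn.
  if (state.activeTurnId !== turnId) {
    throw new Error(`stale turnId: ${turnId}`);
  }
  state.activeTurnId = undefined;
}
```

If the check ran after the mutation, a late `endTurn` for an old turn would silently end whatever turn is currently active, which is the failure the contract is guarding against.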
-
-### `talk.session.submitToolResult`
-
-Completes a provider tool call emitted inside a Gateway-owned relay session:
-
-```ts
-await gateway.request("talk.session.submitToolResult", {
-  sessionId,
-  callId,
-  output,
-});
-```
-
-### `talk.session.close`
-
-Closes a Gateway-owned session. Close emits one terminal event, stops capture and
-playback, aborts provider and agent work, drains TTS, revokes room join state,
-and removes retained state after its replay/debug window.
-
-## Event Contract
-
-All live Talk paths emit one public event channel:
-
-```ts
-talk.event;
-```
-
-Every event uses this envelope:
-
-```ts
-type TalkEvent = {
-  id: string;
-  type: TalkEventType;
-  sessionId: string;
-  turnId?: string;
-  captureId?: string;
-  seq: number;
-  timestamp: string;
-  mode: TalkMode;
-  transport: TalkTransport;
-  brain: TalkBrain;
-  provider?: string;
-  final?: boolean;
-  callId?: string;
-  itemId?: string;
-  parentId?: string;
-  source?: string;
-  payload: TPayload;
-};
-```
-
-Core event types include `session.*`, `turn.*`, `capture.*`, `input.audio.*`,
-`transcript.*`, `output.text.*`, `output.audio.*`, `tool.*`, `usage.metrics`,
-`latency.metrics`, and `health.changed`.
-
-Rules:
-
-- `sessionId` is required for every event.
-- `turnId` is required for turn-bound input, output, transcript, tool, and cancellation events.
-- `captureId` is required while capture is active.
-- `seq` monotonically increases per session.
-- `timestamp` uses ISO 8601 UTC.
-- `callId`, `itemId`, and `parentId` correlate provider responses, tool calls, TTS jobs, and relay frames.
-- payloads must not duplicate large raw audio frames when transport already carries them.
-- consumers should rely on envelope fields instead of provider-specific payloads.
-
-Text-ready is not audio-ready. Clients may show text after `output.text.done`,
-but should not enter speaking/playback state until `output.audio.started` or
-`output.audio.delta`.
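Editorial sketch (not part of the deleted file): the `seq` and `timestamp` rules can be met with a per-session counter held in a closure. `EventSketch` is a reduced, hypothetical subset of the envelope, and the id format is invented for the example.

```ts
type EventSketch<TPayload> = {
  id: string;
  type: string;
  sessionId: string;
  turnId?: string;
  seq: number;
  timestamp: string;
  payload: TPayload;
};

function createEventSequencer(sessionId: string) {
  // One counter per session keeps `seq` monotonically increasing within that session.
  let seq = 0;
  return function emit<TPayload>(
    type: string,
    payload: TPayload,
    turnId?: string,
  ): EventSketch<TPayload> {
    seq += 1;
    return {
      id: `${sessionId}:${seq}`,
      type,
      sessionId,
      turnId,
      seq,
      // Date.prototype.toISOString always renders ISO 8601 in UTC.
      timestamp: new Date().toISOString(),
      payload,
    };
  };
}
```

Each session gets its own sequencer, so a reconnecting client can detect gaps by comparing the last `seq` it saw against replayed events.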
-
-## Shared Runtime Target
-
-Keep one provider-agnostic runtime under `src/talk`. The first pass keeps names
-close to the old runtime modules so the move stays reviewable:
-
-```text
-src/talk/
-  audio-codec.ts
-  agent-consult-runtime.ts
-  agent-consult-tool.ts
-  agent-talkback-runtime.ts
-  fast-context-runtime.ts
-  provider-registry.ts
-  provider-resolver.ts
-  provider-types.ts
-  session-log-runtime.ts
-  session-runtime.ts
-  talk-events.ts
-  talk-session-controller.ts
-```
-
-New code should import the shared runtime from `src/talk` inside core. Plugins
-that already use the stable SDK subpath keep importing
-`openclaw/plugin-sdk/realtime-voice`; that facade re-exports the Talk runtime
-contract without exposing core file layout.
-
-Responsibilities:
-
-- normalize modes, transports, brains, codecs, and audio metadata
-- create, close, and replace session records
-- allocate turn ids and capture ids
-- reject stale turn ids before mutation
-- sequence events
-- retain recent events for replay, reconnect, and diagnostics
-- track active input capture and assistant output
-- coordinate barge-in and output cancellation
-- propagate abort signals
-- register provider tool calls and bind tool results
-- expose test builders for session/event assertions
-
-Gateway method files should become thin adapters:
-
-```text
-src/gateway/server-methods/
-  talk.ts
-  talk-client.ts
-  talk-session.ts
-```
-
-Internal Gateway helpers may exist only as staging files while code moves to
-`src/talk`.
-
-## Cancellation Contract
-
-Cancellation must abort underlying work, not only ignore stale output.
-
-When a turn or session is cancelled:
-
-- provider realtime response is cancelled when supported
-- provider session is closed or reset when cancellation cannot be scoped
-- streaming STT receives abort
-- agent consult receives abort
-- queued tools do not start after abort
-- already-started side-effecting tools receive abort and report cancellation
-- pending TTS jobs are drained
-- playback sources are stopped
-- relay streams are cleared
-- managed-room capture and output state reset
-- stale finals and stale audio deltas are ignored
-- one terminal cancellation event is emitted
-
-Barge-in uses VAD or provider speech-started signals, ignores silence and echo,
-cancels output only after real user speech, and starts or ensures a turn before
-emitting `turn.cancelled`.
-
-## Tool Policy Contract
-
-Gateway owns Talk tool policy.
-
-Client-owned flow: `talk.client.create`, provider tool call to client,
-`talk.client.toolCall`, Gateway policy validation, agent/direct-tool execution,
-client result submission to provider.
-
-Gateway-owned flow: `talk.session.create`, provider tool call to Gateway,
-Gateway policy validation, agent/direct-tool execution, provider result
-submission, `talk.event` emission.
-
-No Talk path accepts caller-provided instructions. Gateway builds instructions
-from trusted config and session context.
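Editorial sketch (not part of the deleted file): the cancellation rule "queued tools do not start after abort" comes down to checking the signal before each start, not only when a result arrives. `ToolJob`, `AbortLike`, and `runQueuedTools` are hypothetical; `AbortLike` is a structural stand-in that a standard `AbortSignal` also satisfies, used here so the example has no runtime dependencies.

```ts
type AbortLike = { readonly aborted: boolean };
type ToolJob = { name: string; run: (signal: AbortLike) => void };

function runQueuedTools(jobs: ToolJob[], signal: AbortLike): string[] {
  const started: string[] = [];
  for (const job of jobs) {
    // Check before each start: queued tools must not begin after abort.
    if (signal.aborted) break;
    started.push(job.name);
    // Already-started tools receive the signal so they can stop and report cancellation.
    job.run(signal);
  }
  return started;
}
```

In a real runtime the same signal would also be handed to STT, agent consult, and TTS, so one abort fans out to all work tied to the turn.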
diff --git a/docs/refactor/talk-execution.md b/docs/refactor/talk-execution.md
deleted file mode 100644
index 1a17bcbc5bf..00000000000
--- a/docs/refactor/talk-execution.md
+++ /dev/null
@@ -1,229 +0,0 @@
----
-summary: "Implementation packages, deletion checklist, test matrix, and verification commands for the Talk refactor"
-read_when:
-  - Implementing the Talk refactor plan
-  - Deleting legacy Talk RPCs, event channels, or realtime endpoint code
-  - Verifying browser, native, telephony, meeting, STT, or TTS Talk behavior after refactor work
-title: "Talk refactor execution checklist"
----
-
-# Talk Refactor Execution Checklist
-
-Use this as the PR tracker for [Talk refactor plan](/refactor/talk).
-
-## Implementation Packages
-
-### Package 1: Protocol
-
-- update `src/gateway/protocol/schema/channels.ts`
-- update `src/gateway/protocol/schema/protocol-schemas.ts`
-- update `src/gateway/protocol/schema/types.ts`
-- update `src/gateway/protocol/index.ts`
-- regenerate generated protocol clients
-- remove old schemas from generated metadata
-- update protocol tests
-
-Done when old RPC/event names are absent from generated protocol output.
-
-### Package 2: Gateway Methods
-
-- split client-owned methods into `talk-client.ts`
-- keep session-owned methods in `talk-session.ts`
-- keep catalog/config/speak/mode in `talk.ts`
-- classify every new method in method scopes
-- advertise only `talk.event` in hello event features
-- remove old method list entries
-- update authorization tests
-
-Done when every public Talk method has an explicit scope.
-
-### Package 3: Session Runtime
-
-- add `src/talk` primitives
-- move event sequencing into shared runtime
-- move stale-turn rejection into shared runtime
-- move active output state into shared runtime
-- move cancellation bookkeeping into shared runtime
-- expose small test helpers
-
-Done when relay, transcription, handoff, telephony, and meetings do not each
-invent event and turn bookkeeping.
-
-### Package 4: Browser UI
-
-- update realtime startup to `talk.client.create`
-- update realtime tool consult to `talk.client.toolCall`
-- update relay startup to `talk.session.create`
-- update relay audio to `talk.session.appendAudio`
-- update relay tool result to `talk.session.submitToolResult`
-- update relay output cancel to `talk.session.cancelOutput`
-- update relay close to `talk.session.close`
-- listen only to `talk.event`
-- remove relay mark RPC
-
-Done when UI tests prove no removed RPC names remain.
-
-### Package 5: Native And Nodes
-
-- route native Talk through session events
-- map push-to-talk commands to managed-room turn lifecycle
-- clean capture state on failed start
-- keep local STT/TTS as adapter behavior
-- remove chat-history polling from the success path
-- keep fallback polling only if explicitly needed
-
-Done when native voice success path is event-driven.
-
-### Package 6: Voice Call
-
-- map telephony realtime events into `talk.event`
-- map local speech detection to `startTurn`, `cancelOutput`, and `cancelTurn`
-- pass abort through agent consult and tools
-- keep marks, clear, u-law, and call lifecycle in the plugin
-- add tests for early speech before provider speech-started
-
-Done when Voice Call shares event and cancellation semantics without leaking
-telephony into core.
-
-### Package 7: Meetings
-
-- map meeting speech and transcript state into `talk.event`
-- keep participant and room state in meeting adapter
-- add echo-suppression aware barge-in tests
-- ensure meeting adapters can choose realtime, transcription, or `stt-tts`
-
-Done when meeting behavior is an adapter over Talk, not a parallel realtime loop.
-
-### Package 8: Doctor And Migration
-
-- detect old realtime selectors outside `talk.realtime`
-- write explicit `talk.realtime.provider`, `model`, `voice`, `transport`, and `brain`
-- report removed RPC names when logs show old clients
-- keep startup free of hidden config rewrites
-- update SDK migration, Gateway protocol, Talk node, Control UI, and TTS docs
-
-Done when runtime config is explicit and docs mention removed API only in
-migration notes.
-
-## Deletion Checklist
-
-Delete or prove absent:
-
-- `src/gateway/voiceclaw-realtime/`
-- `/voiceclaw/realtime`
-- `instructionsOverride`
-- `talk.realtime.*` public RPCs
-- `talk.transcription.*` public RPCs
-- `talk.handoff.*` public RPCs
-- `talk.session.inputAudio`
-- `talk.session.control`
-- `talk.session.toolResult`
-- `talk.realtime.relay`
-- `talk.transcription.relay`
-- old generated protocol models
-- old UI relay method calls
-
-Keep only these old names in explicit migration tables.
-
-## Test Matrix
-
-Protocol:
-
-- final methods exist in protocol schemas
-- removed methods are absent from protocol schemas
-- final event is advertised in hello features
-- removed events are absent from broadcast guards
-- generated clients match schema
-- request-time instruction override is rejected or impossible by schema
-
-Gateway:
-
-- `talk.client.create` creates WebRTC session result
-- `talk.client.create` creates provider WebSocket session result
-- `talk.client.create` rejects Gateway-owned transports
-- `talk.client.toolCall` validates caller, session, brain, and policy
-- `talk.session.create` creates realtime Gateway relay
-- `talk.session.create` creates transcription relay
-- `talk.session.create` creates STT/TTS managed room
-- `talk.session.create` rejects client-owned transports
-- `talk.session.join` replacement notifies displaced client
-- `talk.session.appendAudio` routes to relay/transcription session
-- `talk.session.startTurn` starts managed-room turn
-- `talk.session.endTurn` rejects stale turn ids
-- `talk.session.cancelTurn` aborts provider, agent, tools, TTS, and streams
-- `talk.session.cancelOutput` cancels playback only
-- `talk.session.submitToolResult` binds to provider call id
-- `talk.session.close` emits terminal event and releases resources
-
-Browser:
-
-- WebRTC path calls `talk.client.create`
-- provider WebSocket path calls `talk.client.create`
-- provider tool calls use `talk.client.toolCall`
-- Gateway relay uses only `talk.session.*`
-- Gateway relay listens only to `talk.event`
-- barge-in requires speech/VAD
-- relay close rejects or aborts pending consult runs
-- no removed RPC names in UI tests
-
-Native:
-
-- push-to-talk start emits capture/turn events
-- failed push-to-talk start cleans capture state
-- cancel clears capture and output state
-- STT/TTS success path is event-driven
-- fallback polling is explicit and tested if kept
-- node policy rejects untrusted Talk commands
-
-Telephony:
-
-- early speech before provider speech-started creates or guards turn before cancellation
-- marks and clear events map to output state
-- u-law codec stays adapter-owned
-- cancellation aborts consult run
-- closed call prevents stale tool result submission
-
-Meetings:
-
-- participant context appears as metadata, not core branching
-- echo suppression prevents false barge-in
-- transcript events use common envelope
-- meeting close aborts active work
-
-Architecture:
-
-- no removed public RPC names in protocol metadata
-- no retired realtime endpoint route
-- no retired realtime folder
-- no request-time instruction override field
-- no core branches on app platform names
-- provider behavior comes from capabilities
-
-## Verification Commands
-
-Focused local loop:
-
-```sh
-pnpm test src/gateway/protocol/index.test.ts
-pnpm test src/gateway/server-methods/talk.test.ts
-pnpm test src/gateway/method-scopes.test.ts src/gateway/server-methods-list.test.ts
-pnpm test src/gateway/talk-realtime-relay.test.ts src/gateway/talk-transcription-relay.test.ts
-pnpm test ui/src/ui/realtime-talk.test.ts ui/src/ui/realtime-talk-gateway-relay.test.ts ui/src/ui/realtime-talk-webrtc.test.ts ui/src/ui/realtime-talk-google-live.test.ts
-pnpm exec oxfmt --check --threads=1 docs/refactor/talk.md docs/refactor/talk-execution.md
-```
-
-Generation and docs:
-
-```sh
-pnpm protocol:gen && pnpm protocol:gen:swift
-pnpm docs:check-mdx
-pnpm plugin-sdk:api:check
-```
-
-Broad gate before push:
-
-```sh
-pnpm check:changed
-```
-
-Use Testbox for broad gates on maintainer machines.
diff --git a/docs/refactor/talk-surfaces.md b/docs/refactor/talk-surfaces.md
deleted file mode 100644
index 52c31dc1b82..00000000000
--- a/docs/refactor/talk-surfaces.md
+++ /dev/null
@@ -1,128 +0,0 @@
----
-summary: "Surface adapter plan for browser, native, walkie-talkie, telephony, and meeting Talk refactor work"
-read_when:
-  - Updating browser realtime Talk, native Talk, walkie-talkie handoff, Voice Call, or meeting voice code
-  - Deciding whether a Talk behavior belongs in an adapter or shared runtime
-title: "Talk surface mapping"
----
-
-# Talk Surface Mapping
-
-This maps product surfaces into [Talk refactor plan](/refactor/talk) primitives.
- -## Browser - -WebRTC: - -- call `talk.client.create` -- open provider media connection in browser -- forward provider tool calls through `talk.client.toolCall` -- receive provider audio through provider media/data channel - -Provider WebSocket: - -- call `talk.client.create` -- connect using constrained provider result -- keep provider-specific framing in the browser adapter -- forward tool calls through `talk.client.toolCall` - -Gateway relay: - -- call `talk.session.create` -- send PCM frames with `talk.session.appendAudio` -- listen only to `talk.event` -- submit tool results with `talk.session.submitToolResult` -- barge-in with `talk.session.cancelOutput` -- close with `talk.session.close` - -## Native And Nodes - -Native apps map local audio lifecycle into Talk primitives. - -Native realtime: - -- use `talk.client.create` when the app owns provider media -- use `talk.session.create` when Gateway owns provider relay - -Native STT/TTS: - -- use `talk.session.create({ mode: "stt-tts", transport: "managed-room" })` -- keep local STT and local TTS behind native adapters -- drive success path from Talk events -- keep history polling only as a degraded fallback if explicitly tested - -Native push-to-talk: - -- press maps to `talk.session.startTurn` -- release maps to `talk.session.endTurn` -- cancel maps to `talk.session.cancelTurn` -- node capture commands emit capture events -- failed start cleans capture state -- opening voice UI never mutates global Talk config - -Trusted node command adapters may remain: - -```ts -talk.ptt.start; -talk.ptt.stop; -talk.ptt.cancel; -talk.ptt.once; -``` - -## Walkie-Talkie - -Walkie-talkie is managed-room Talk: - -```ts -await gateway.request("talk.session.create", { - mode: "stt-tts", - transport: "managed-room", - brain: "agent-consult", - sessionKey, -}); -``` - -Then: - -- client joins with `talk.session.join` -- press calls `talk.session.startTurn` -- release calls `talk.session.endTurn` -- cancel calls 
`talk.session.cancelTurn` -- assistant speech emits `output.text.*` and `output.audio.*` -- replacement emits `session.replaced` to old owner -- close calls `talk.session.close` - -Room state includes canonical session id, route/channel target, caller identity, -mode, transport, brain, provider, model, voice, locale, expiry, token hash, -active client id, active turn id, and replacement state. - -Two simultaneous rooms must not share turn ids, transcripts, audio output, or -cancellation tokens. - -## Telephony - -Voice Call becomes a telephony adapter over Talk semantics. - -Keep telephony-owned: Twilio/Plivo WebSocket contracts, stream ids, call ids, -G.711 u-law, marks, clear events, backpressure, phone call lifecycle, and inbound -speech detection quirks. - -Move shared behavior to Talk: event envelope, turn ids, cancellation, agent -consult abort, tool policy, usage and latency metrics, and output state. - -Telephony should emit `talk.event` for observability, even if phone media -remains plugin-owned. - -## Meetings - -Google Meet and future meeting integrations become meeting adapters over Talk -semantics. - -Keep meeting-owned: meeting join/leave, participant identity, room permissions, -echo suppression, transcript context, and meeting-specific mute/deafen behavior. - -Move shared behavior to Talk: turn lifecycle, transcript events, assistant output -events, tool policy, cancellation, and metrics. - -Meeting adapters may run `transcription`, `stt-tts`, or `realtime` depending on -provider support. 
diff --git a/docs/refactor/talk.md b/docs/refactor/talk.md deleted file mode 100644 index 485448e0dca..00000000000 --- a/docs/refactor/talk.md +++ /dev/null @@ -1,499 +0,0 @@ ---- -summary: "Breaking refactor plan for one Talk architecture across realtime voice, STT/TTS, browser, native, telephony, meetings, and walkie-talkie handoff" -read_when: - - Refactoring Talk mode, realtime voice, voice-call, Google Meet, browser realtime voice, native push-to-talk, STT, or TTS - - Changing Talk Gateway protocol, provider contracts, realtime transports, managed rooms, audio events, cancellation, or tool policy - - Deciding whether a voice feature belongs in core, a provider plugin, a native app, a meeting adapter, or a telephony adapter -title: "Talk refactor plan" ---- - -# Talk Refactor Plan - -This is the breaking-clean plan for unifying every live voice path behind one -Talk architecture. - -The old architecture grew by product surface: browser realtime, Gateway relay, -managed native handoff, streaming transcription, Voice Call, Google Meet, local -STT/TTS, one-shot TTS, and a retired realtime WebSocket endpoint each learned -their own names for sessions, turns, capture, output, barge-in, tool calls, -cancellation, and transcript events. - -The new architecture grows by primitive. There is one public Talk API, one -event envelope, one turn model, one cancellation contract, one provider policy -boundary, and one place for shared runtime state. Browser, native, telephony, -meetings, and walkie-talkie become adapters over those primitives. 
- -## Product Target - -OpenClaw supports three Talk products: - -| Product | User experience | Mode | -| --------------------- | ----------------------------------------------------------------------- | --------------- | -| Realtime conversation | Low-latency duplex speech with interruption and provider tool calls | `realtime` | -| Walkie-talkie | Press or hold to speak, release, then hear OpenClaw answer | `stt-tts` | -| Transcription | Live captions, dictation, notes, meeting transcript, no assistant audio | `transcription` | - -All three products share session identity, join/reconnect state, turn and -capture ids, input audio metadata, output text/audio state, transcript finality, -tool-call correlation, cancellation, replay, provider capabilities, policy, -auth, and observability. - -One-shot uploaded audio and one-shot TTS do not need live Talk session state -unless they participate in live capture, turns, interruption, replay, or -cancellation. - -## Hard Decisions - -This refactor intentionally removes compatibility that would keep the design -muddy: - -- remove public `talk.realtime.*` RPCs -- remove public `talk.transcription.*` RPCs -- remove public `talk.handoff.*` RPCs -- remove generic `talk.session.inputAudio`, `talk.session.control`, and - `talk.session.toolResult` -- remove old relay event channels -- remove `/voiceclaw/realtime` -- remove `src/gateway/voiceclaw-realtime/` -- remove request-time instruction overrides -- keep `talk.speak` as one-shot TTS, not a live session API -- keep legacy realtime config repair in doctor, not startup -- keep platform and product names out of core branching - -## Vocabulary - -Keep mode, transport, brain, and surface separate. 
- -```ts -type TalkMode = "realtime" | "stt-tts" | "transcription"; - -type TalkTransport = "webrtc" | "provider-websocket" | "gateway-relay" | "managed-room"; - -type TalkBrain = "agent-consult" | "direct-tools" | "none"; -``` - -### Modes - -`realtime` means a provider owns a live voice session. Audio goes in, audio -comes out, interruptions are possible, and provider tool calls may happen during -one provider session. - -`stt-tts` means input speech is transcribed, OpenClaw answers as text, and TTS -renders the answer. This is the native Talk and walkie-talkie path when a full -duplex provider session is not the right shape. - -`transcription` means speech-to-text without assistant audio output. It covers -captions, dictation, notes, meeting transcript capture, and live voice-note -ingestion. - -### Transports - -`webrtc` is client-owned SDP/media/data-channel transport. It fits browser-owned -OpenAI Realtime sessions with ephemeral credentials. - -`provider-websocket` is client-owned provider JSON and audio framing. It fits -browser-owned Google Live style sessions. - -`gateway-relay` means the Gateway owns the provider connection. The client sends -authenticated audio frames to the Gateway and receives `talk.event` plus audio -output through Gateway-managed relay state. - -`managed-room` means the Gateway owns a room-like session that clients can join, -replace, and drive with explicit turn verbs. It is the primitive for -walkie-talkie and native handoff. - -Telephony and meetings are not core transports. They are adapters that map -phone or meeting media into `gateway-relay`, `managed-room`, or `stt-tts` while -keeping call and meeting lifecycle outside core. - -### Brain Strategies - -`agent-consult` means provider tool calls or session turns consult an OpenClaw -agent. Gateway owns prompt construction, context selection, authorization, abort -signals, and final result delivery. 
- -`direct-tools` means a trusted first-party surface can call selected OpenClaw -tools directly through Gateway policy. Keep this privileged. - -`none` means transcription-only, external orchestration, or no OpenClaw tool -access. - -## Ownership Boundaries - -Core owns generic Talk semantics: - -- mode, transport, brain, codec, and audio descriptors -- session records and session ownership -- turn ids and capture ids -- event envelope, sequencing, replay, and stale-output suppression -- active capture state -- active assistant output state -- replacement and reconnect state -- cancellation propagation -- tool policy and tool-call correlation -- usage, latency, and health events - -Provider plugins own vendor behavior: - -- OpenAI Realtime SDP and data-channel details -- Google Live WebSocket framing -- streaming STT provider details -- TTS provider details -- provider auth, model, voice, codec, and resume quirks -- provider capability declarations - -Surface adapters own IO and product quirks: - -- browser capture and playback -- native audio sessions, local speech engines, and foreground Talk UX -- node command dispatch -- telephony media streams, marks, clear events, u-law, and call lifecycle -- meeting join/leave, participants, echo suppression, and authorization - -Core may store optional surface metadata for diagnostics. Core must not branch -on browser, iOS, Android, macOS, Google Meet, Voice Call, or any retired product -name. - -## Final Gateway API - -The public Gateway surface is deliberately small: - -```ts -// Discovery and configuration. -talk.catalog; -talk.config; - -// One-shot speech output. -talk.speak; - -// Client-owned provider sessions. -talk.client.create; -talk.client.toolCall; - -// Gateway-owned live sessions. 
-talk.session.create; -talk.session.join; -talk.session.appendAudio; -talk.session.startTurn; -talk.session.endTurn; -talk.session.cancelTurn; -talk.session.cancelOutput; -talk.session.submitToolResult; -talk.session.close; - -// Events and foreground node mode. -talk.event; -talk.mode; -``` - -Use `talk.client.*` when the client owns provider media transport. Use -`talk.session.*` when the Gateway owns live session state. - -`talk.mode` is the existing foreground node mode broadcast. It can stay, but it -is not part of the Talk session control API. - -### Supported Creation Matrix - -| Method | Mode | Transport | Brain | Owner | -| --------------------- | --------------- | -------------------- | --------------- | ------- | -| `talk.client.create` | `realtime` | `webrtc` | `agent-consult` | client | -| `talk.client.create` | `realtime` | `provider-websocket` | `agent-consult` | client | -| `talk.session.create` | `realtime` | `gateway-relay` | `agent-consult` | Gateway | -| `talk.session.create` | `transcription` | `gateway-relay` | `none` | Gateway | -| `talk.session.create` | `stt-tts` | `managed-room` | `agent-consult` | Gateway | -| `talk.session.create` | `stt-tts` | `managed-room` | `direct-tools` | Gateway | - -Reject combinations that blur ownership. `talk.client.create` must reject -Gateway-owned transports. `talk.session.create` must reject client-owned -transports. 
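The creation matrix and its two rejection rules can be sketched as a small lookup. This is a minimal illustration, not the Gateway's actual handler: `CreateParams`, `SUPPORTED`, and `rejectReason` are hypothetical names, and only the mode, transport, brain, and method values come from the plan.

```ts
type TalkMode = "realtime" | "stt-tts" | "transcription";
type TalkTransport = "webrtc" | "provider-websocket" | "gateway-relay" | "managed-room";
type TalkBrain = "agent-consult" | "direct-tools" | "none";
type CreateMethod = "talk.client.create" | "talk.session.create";

// Hypothetical request shape; the real schema may carry more fields.
interface CreateParams {
  mode: TalkMode;
  transport: TalkTransport;
  brain: TalkBrain;
}

// Rows copied from the supported creation matrix.
const SUPPORTED: ReadonlyArray<CreateParams & { method: CreateMethod }> = [
  { method: "talk.client.create", mode: "realtime", transport: "webrtc", brain: "agent-consult" },
  { method: "talk.client.create", mode: "realtime", transport: "provider-websocket", brain: "agent-consult" },
  { method: "talk.session.create", mode: "realtime", transport: "gateway-relay", brain: "agent-consult" },
  { method: "talk.session.create", mode: "transcription", transport: "gateway-relay", brain: "none" },
  { method: "talk.session.create", mode: "stt-tts", transport: "managed-room", brain: "agent-consult" },
  { method: "talk.session.create", mode: "stt-tts", transport: "managed-room", brain: "direct-tools" },
];

// Hypothetical helper: returns null when the request is allowed, otherwise a reason.
function rejectReason(method: CreateMethod, p: CreateParams): string | null {
  const clientOwned = p.transport === "webrtc" || p.transport === "provider-websocket";
  if (method === "talk.client.create" && !clientOwned) {
    return `talk.client.create rejects Gateway-owned transport "${p.transport}"`;
  }
  if (method === "talk.session.create" && clientOwned) {
    return `talk.session.create rejects client-owned transport "${p.transport}"`;
  }
  const listed = SUPPORTED.some(
    (row) =>
      row.method === method &&
      row.mode === p.mode &&
      row.transport === p.transport &&
      row.brain === p.brain,
  );
  return listed ? null : "combination is not in the supported creation matrix";
}
```

Checking ownership before the matrix lookup keeps the error message specific: a caller who picks the wrong method family learns that first, rather than getting a generic "unsupported combination".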
- -## Removed API - -Remove these names from handlers, method lists, scopes, protocol schemas, -generated clients, broadcast guards, tests, and docs except explicit migration -tables: - -| Removed | Replacement | -| ------------------------------- | -------------------------------------------------------- | -| `talk.realtime.session` | `talk.client.create` | -| `talk.realtime.toolCall` | `talk.client.toolCall` | -| `talk.realtime.relayAudio` | `talk.session.appendAudio` | -| `talk.realtime.relayCancel` | `talk.session.cancelOutput` or `talk.session.cancelTurn` | -| `talk.realtime.relayMark` | internal relay output state | -| `talk.realtime.relayToolResult` | `talk.session.submitToolResult` | -| `talk.realtime.relayClose` | `talk.session.close` | -| `talk.realtime.relay` | `talk.event` | -| `talk.transcription.session` | `talk.session.create({ mode: "transcription" })` | -| `talk.transcription.audio` | `talk.session.appendAudio` | -| `talk.transcription.cancel` | `talk.session.cancelTurn` | -| `talk.transcription.close` | `talk.session.close` | -| `talk.transcription.relay` | `talk.event` | -| `talk.handoff.create` | `talk.session.create({ transport: "managed-room" })` | -| `talk.handoff.join` | `talk.session.join` | -| `talk.handoff.revoke` | `talk.session.close` | -| `talk.session.inputAudio` | `talk.session.appendAudio` | -| `talk.session.control` | explicit turn/output verbs | -| `talk.session.toolResult` | `talk.session.submitToolResult` | - -Delete this endpoint: - -```text -/voiceclaw/realtime -``` - -Delete this folder: - -```text -src/gateway/voiceclaw-realtime/ -``` - -Do not leave a compatibility namespace around retired code. 
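A negative test for the removed names could be built on a lookup like the one below. This is a sketch under assumptions: `REMOVED_TALK_METHODS` and `findRemovedMethods` are hypothetical names, the map holds only a subset of the table above, and the shape of the repository's real method list is not shown in this plan.

```ts
// Subset of the removed-to-replacement table; remaining rows would follow the same shape.
const REMOVED_TALK_METHODS: Readonly<Record<string, string>> = {
  "talk.realtime.session": "talk.client.create",
  "talk.realtime.toolCall": "talk.client.toolCall",
  "talk.realtime.relayAudio": "talk.session.appendAudio",
  "talk.realtime.relayToolResult": "talk.session.submitToolResult",
  "talk.realtime.relayClose": "talk.session.close",
  "talk.transcription.audio": "talk.session.appendAudio",
  "talk.transcription.close": "talk.session.close",
  "talk.handoff.join": "talk.session.join",
  "talk.session.inputAudio": "talk.session.appendAudio",
  "talk.session.toolResult": "talk.session.submitToolResult",
};

// Hypothetical check: return every advertised method name that appears in the removed table.
function findRemovedMethods(advertised: readonly string[]): string[] {
  return advertised.filter((name) =>
    Object.prototype.hasOwnProperty.call(REMOVED_TALK_METHODS, name),
  );
}
```

A protocol test would assert that `findRemovedMethods` returns an empty array for the advertised method list, and a migration note could print the replacement for each hit.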
- -## Target Source Layout - -Shared runtime: - -```text -src/talk/ - audio-codec.ts - agent-consult-runtime.ts - agent-consult-tool.ts - agent-talkback-runtime.ts - fast-context-runtime.ts - provider-registry.ts - provider-resolver.ts - provider-types.ts - session-log-runtime.ts - session-runtime.ts - talk-events.ts - talk-session-controller.ts -``` - -Gateway adapters: - -```text -src/gateway/server-methods/ - talk.ts # catalog, config, speak, mode, composition - talk-client.ts # client-owned provider sessions - talk-session.ts # Gateway-owned live sessions -``` - -Gateway relay helpers can exist while the code moves, but the long-term shape -is that relay, transcription, and handoff state use `src/talk` primitives -instead of each reimplementing turns and events. - -Public SDK: - -```text -src/plugin-sdk/realtime-voice.ts -``` - -Keep this SDK subpath as the stable plugin import facade. It may re-export -Talk runtime contracts, but plugin authors should not import core file layout. - -## Event Contract - -All live paths emit `talk.event` with the envelope defined in -[Talk API and runtime contract](/refactor/talk-api-contract). The required -shape is: `id`, `type`, `sessionId`, `seq`, `timestamp`, `mode`, `transport`, -`brain`, and `payload`, with `turnId`, `captureId`, `callId`, `itemId`, and -`parentId` when the event is tied to turn, capture, provider item, tool call, or -TTS output. - -Core event families are `session.*`, `turn.*`, `capture.*`, `input.audio.*`, -`transcript.*`, `output.text.*`, `output.audio.*`, `tool.*`, `usage.metrics`, -`latency.metrics`, and `health.changed`. Payloads must not duplicate large raw -audio frames when the transport already carries them. Text-ready is not -audio-ready; clients enter playback state only on audio events. - -## Cancellation Contract - -Cancellation must abort underlying work, not only ignore stale output. 
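The difference between aborting work and merely ignoring its output can be sketched with the standard `AbortController`. This is an illustration only: `runWork` and `cancelTurnDemo` are hypothetical, and the timers stand in for streaming STT or an agent consult run.

```ts
// Hypothetical unit of work that honors an AbortSignal.
function runWork(name: string, signal: AbortSignal, log: string[]): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    if (signal.aborted) {
      log.push(`${name}: not started`);
      reject(new Error("aborted"));
      return;
    }
    // The timer stands in for real provider or agent work.
    const timer = setTimeout(() => {
      log.push(`${name}: finished`);
      resolve();
    }, 1000);
    signal.addEventListener(
      "abort",
      () => {
        clearTimeout(timer); // stop the work itself, not just its output
        log.push(`${name}: aborted`);
        reject(new Error("aborted"));
      },
      { once: true },
    );
  });
}

// Hypothetical demo: one controller per turn, shared by every piece of work in that turn.
async function cancelTurnDemo(): Promise<string[]> {
  const log: string[] = [];
  const controller = new AbortController();
  const pending = Promise.all([
    runWork("stt", controller.signal, log).catch(() => undefined),
    runWork("agent-consult", controller.signal, log).catch(() => undefined),
  ]);
  controller.abort(); // cancelling the turn propagates to all work
  await pending;
  return log;
}
```

Because both tasks share one signal, a single `abort()` call stops them together, and neither timer ever reaches its "finished" branch.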
- -When a turn or session is cancelled: - -- provider realtime response is cancelled when supported -- provider session is closed or reset when cancellation cannot be scoped -- streaming STT receives abort -- agent consult receives abort -- queued tools do not start after abort -- already-started side-effecting tools receive abort and report cancellation -- pending TTS jobs are drained -- playback sources are stopped -- relay streams are cleared -- managed-room capture and output state reset -- stale finals and stale audio deltas are ignored -- one terminal cancellation event is emitted - -Barge-in requires real speech: provider speech-started, local VAD, or an -adapter-owned speech detector. Silence, echo, or microphone buffers alone must -not cancel assistant output. - -## Config Contract - -Config stays under `talk`; do not add `talk.speech`. `talk.provider` and -`talk.providers.*` remain speech/STT/TTS provider config. Realtime selectors -live under `talk.realtime.provider`, `talk.realtime.providers.*`, `model`, -`voice`, `mode`, `transport`, and `brain`. - -`talk.config` returns effective config without secrets unless privileged. -`talk.catalog` returns provider capabilities, not inferred provider-id guesses. -Doctor migrates old realtime placement into `talk.realtime`; runtime startup -does not reinterpret Voice Call, STT, or TTS config as realtime config. 
- -## Surface Mapping - -| Surface | Talk mapping | -| ------------------------------- | ----------------------------------------------------------------------------------------------------- | -| Browser WebRTC | `talk.client.create`, client-owned provider media, `talk.client.toolCall` for provider tool calls | -| Browser provider WebSocket | `talk.client.create`, browser-owned provider framing, Gateway-owned credentials and policy | -| Browser Gateway relay | `talk.session.create`, `appendAudio`, `submitToolResult`, `cancelOutput`, `close`, and `talk.event` | -| Native push-to-talk | `stt-tts` plus `managed-room`; press/startTurn, release/endTurn, cancel/cancelTurn | -| Walkie-talkie | managed-room join/replacement plus shared turn/output events | -| Voice Call | telephony adapter over Talk events; call ids, stream ids, u-law, marks, clear events stay plugin side | -| Google Meet and future meetings | meeting adapter over Talk events; participant state, permissions, mute, and echo suppression stay out | - -See [Talk surface mapping](/refactor/talk-surfaces) for the adapter-level -rules. - -## Detailed Refactor Phases - -### Phase 1: Protocol Is The Source Of Truth - -- define final `talk.client.*`, `talk.session.*`, `talk.event`, `talk.catalog`, `talk.config`, `talk.speak`, and `talk.mode` -- delete removed RPCs from method lists and generated metadata -- delete removed event channels from hello feature advertising -- classify every final method in `METHOD_SCOPE_GROUPS` -- regenerate TypeScript and Swift protocol clients -- add protocol tests proving removed names are absent - -Exit criteria: generated clients expose only the final public Talk API. 
- -### Phase 2: Shared Runtime Becomes `src/talk` - -- move provider-agnostic realtime voice modules into `src/talk` -- keep the plugin SDK facade at `openclaw/plugin-sdk/realtime-voice` -- rename logs and tests from realtime-voice wording to Talk wording where that improves clarity -- centralize event sequencing, active turn state, capture state, output state, stale-turn rejection, and replay history -- keep provider adapters out of this folder - -Exit criteria: core and bundled surfaces import shared semantics from `src/talk` -or the SDK facade, not from surface-local helpers. - -### Phase 3: Gateway Method Split - -- make `talk.ts` a composition point for catalog, config, speak, mode, client, and session handlers -- put client-owned provider session methods in `talk-client.ts` -- put Gateway-owned session methods in `talk-session.ts` -- make relay, transcription, and managed-room handlers thin adapters over shared runtime primitives -- route session replacement notifications to the displaced connection -- reject stale turn completion before mutating active room state - -Exit criteria: public RPC handlers read like API adapters, not separate Talk -implementations. - -### Phase 4: Browser UI Uses The Final API - -- update WebRTC and provider WebSocket startup to `talk.client.create` -- update browser provider tool calls to `talk.client.toolCall` -- update Gateway relay startup to `talk.session.create` -- update relay audio to `talk.session.appendAudio` -- update relay tool result submission to `talk.session.submitToolResult` -- update relay close to `talk.session.close` -- listen only to `talk.event` -- handle aborted consult runs immediately instead of timing out -- gate relay barge-in on speech or VAD - -Exit criteria: UI tests contain no calls to removed Talk RPC names. 
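The barge-in gate from Phase 4 can be sketched as a pure predicate. The signal names below are hypothetical; the plan only states that provider speech-started, local VAD, or an adapter-owned detector count as speech, and that silence, echo, or microphone buffers alone do not.

```ts
// Hypothetical signal names for illustration.
type BargeInSignal =
  | "provider-speech-started"
  | "local-vad-speech"
  | "adapter-speech-detected"
  | "mic-buffer"
  | "silence"
  | "echo";

// Only these signals count as real speech.
const SPEECH_SIGNALS: ReadonlySet<BargeInSignal> = new Set<BargeInSignal>([
  "provider-speech-started",
  "local-vad-speech",
  "adapter-speech-detected",
]);

// Hypothetical gate: cancel assistant output only when it is playing and real speech arrived.
function shouldCancelOutput(signal: BargeInSignal, assistantIsSpeaking: boolean): boolean {
  return assistantIsSpeaking && SPEECH_SIGNALS.has(signal);
}
```

When the predicate returns true, a relay client would then call `talk.session.cancelOutput`; raw microphone frames on their own never reach that call.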
- -### Phase 5: Native And Nodes Become Event-Driven - -- map native push-to-talk into managed-room sessions -- start, end, cancel, and replace turns through explicit session verbs -- clean capture state when push-to-talk start fails -- keep local STT and TTS as native adapter behavior -- remove chat-history polling from the success path -- keep fallback polling only if there is an explicit degraded-mode test - -Exit criteria: native Talk success path is driven by `talk.event`, not hidden -chat side effects. - -### Phase 6: Telephony And Meetings Become Adapters - -- map Voice Call realtime and streaming STT into Talk event/cancellation semantics -- create or guard a turn before early speech cancellation events -- keep telephony codec, marks, clear events, and call lifecycle outside core -- map Google Meet transcript and assistant output into `talk.event` -- keep participant and echo-suppression behavior in the meeting adapter -- pass abort signals into agent consult and tool runtime - -Exit criteria: Voice Call and meetings share event and cancellation semantics -without introducing telephony or meeting branches in core. - -### Phase 7: Config And Doctor Cleanup - -- keep `talk.provider` and `talk.providers.*` as speech/STT/TTS config -- keep realtime voice selectors under `talk.realtime` -- make `talk.config` return only resolved effective provider data -- repair legacy realtime placement in doctor -- document that runtime startup does not guess or rewrite config -- update SDK migration, Gateway protocol, Talk node, Control UI, and TTS docs - -Exit criteria: no second speech namespace, no startup migrations, and no -ambiguous active provider in `talk.config`. 
- -### Phase 8: Delete The Retired Stack - -- remove `/voiceclaw/realtime` -- delete `src/gateway/voiceclaw-realtime/` -- remove request-time `instructionsOverride` -- remove old RPC handlers, scopes, broadcast guards, protocol schemas, generated clients, docs, and UI calls -- keep old names only in explicit migration tables and negative tests - -Exit criteria: repository search finds removed public names only in migration -notes or tests that assert absence. - -## Test And Verification Plan - -The full matrix lives in -[Talk refactor execution checklist](/refactor/talk-execution). The required -proof areas are: - -- protocol and generated clients expose only the final Talk API -- Gateway tests cover every `talk.client.*` and `talk.session.*` method -- UI tests prove browser WebRTC, provider WebSocket, and relay paths use the final API -- native tests prove managed-room push-to-talk cleanup, replacement, and event flow -- Voice Call and meeting tests prove early speech, barge-in, output state, and cancellation behavior -- config tests prove `talk.config` reports only resolved effective provider data -- architecture searches prove removed RPCs, events, endpoint, folder, and instruction override stay gone -- docs, protocol generation, SDK API checks, Android tests, build, and `pnpm check:changed` pass before push - -## Definition Of Done - -The refactor is complete when: - -- final API is the only advertised public API -- removed RPCs are gone from handlers, scopes, method lists, schemas, generated clients, docs, and UI -- removed event channels are gone -- retired realtime HTTP endpoint is gone -- retired realtime folder is gone -- browser Talk works through `talk.client.*` or `talk.session.*` -- native Talk works through session events -- streaming STT works through `talk.session.*` -- TTS one-shot remains `talk.speak` -- walkie-talkie works through managed-room sessions -- Voice Call and meetings use shared events and cancellation semantics -- cancellation aborts 
underlying work -- event envelopes are consistent -- config migration is handled by doctor -- tests prove the deleted API cannot accidentally return - -Supporting details: - -- [Talk API and runtime contract](/refactor/talk-api-contract) -- [Talk surface mapping](/refactor/talk-surfaces) -- [Talk refactor execution checklist](/refactor/talk-execution) - -The end state: one Talk system, a small public API, provider-owned vendor -logic, surface-owned IO, and a Gateway core that owns policy, events, sessions, -turns, cancellation, and observability.