docs: remove refactor notes

Peter Steinberger
2026-05-06 02:40:34 +01:00
parent 9b1d28edf1
commit 71cd132f1f
6 changed files with 0 additions and 1626 deletions

@@ -187,8 +187,6 @@ Core owns Talk session semantics. Provider plugins own vendor session setup.
Voice-call and Google Meet own telephony/meeting adapters. Browser and native
apps own device capture/playback UX.
The detailed implementation plan lives in [Talk refactor plan](/refactor/talk).
## Compatibility policy
For external plugins, compatibility work follows this order:

@@ -1,448 +0,0 @@
---
title: "fs-safe Cleanup Plan"
summary: "Plan for consolidating OpenClaw filesystem helpers around @openclaw/fs-safe"
read_when:
- You are refactoring OpenClaw filesystem helpers
- You are changing @openclaw/fs-safe imports, wrappers, or plugin SDK file APIs
- You are deciding whether a local file helper belongs in OpenClaw or fs-safe
---
## Status
Implemented on `codex/extract-fs-safe-primitives`. Keep this file as the
cleanup checklist for follow-up reviews and future fs-safe surface changes.
## Goal
Make OpenClaw's filesystem access boring and predictable:
- Core code uses one small set of OpenClaw wrappers that apply OpenClaw policy.
- Plugin SDK compatibility aliases stay deliberate and documented.
- fs-safe keeps a small public story centered on `root()`, with lower-level
primitives behind explicit subpaths.
- Duplicate JSON, temp, private-store, and path helper names disappear from
OpenClaw internals.
- Security-sensitive behavior keeps regression tests before names move.
## Non-goals
- Do not remove public plugin SDK exports in this cleanup. Keep deprecated
aliases until a versioned SDK migration removes them.
- Do not make fs-safe a sandbox. It remains a library guardrail for local file
access, not OS isolation.
- Do not convert all absolute-path reads to root-bounded reads. Some OpenClaw
paths are trusted absolute paths and should stay explicit.
- Do not chase cosmetic import churn without reducing helper count or clarifying
trust boundaries.
## fs-safe Package Pin
`@openclaw/fs-safe` is published on npm and consumed through a semver range.
Fresh checkouts and CI runners should install the package from the public
registry, not from a local `link:../fs-safe` checkout or a GitHub tarball.
Current range:
- `^0.1.0`
The published package ships built `dist` files, so OpenClaw should not list it
in `pnpm.onlyBuiltDependencies`.
## Current Shape
fs-safe's main entry is intentionally narrow:
- `root`
- `FsSafeError`
- `categorizeFsSafeError`
- root option/result types
- Python helper configuration
The wider surface lives behind subpaths:
- `/json`
- `/store`
- `/temp`
- `/atomic`
- `/root`
- `/advanced`
- `/archive`
- `/walk`
OpenClaw now keeps fs-safe behind a small wrapper boundary:
- local `src/infra/*` wrappers for core policy defaults
- public plugin SDK aliases, including older names from before fs-safe
- package-local utility exports where importing `src/infra` would cross a
package boundary
An import-boundary test rejects new direct fs-safe imports outside those
allowed areas.
## Usage Map
### Root-bounded access
Representative use:
- `src/gateway/server-methods/agents.ts`
- `src/agents/pi-tools.read.ts`
- `src/agents/apply-patch.ts`
- `src/plugins/install.ts`
- `src/auto-reply/reply/stage-sandbox-media.ts`
- `src/gateway/canvas-documents.ts`
Keep this family. `root()` is the fs-safe product surface OpenClaw should push
callers toward.
### JSON helpers
OpenClaw has accumulated many names for the same operations:
- `readJsonFile`
- `readJsonFileStrict`
- `readDurableJsonFile`
- `writeJsonAtomic`
- `loadJsonFile`
- `saveJsonFile`
- `readJsonFileWithFallback`
- `writeJsonFileAtomically`
fs-safe's canonical names are clearer:
- `tryReadJson`
- `readJson`
- `readJsonIfExists`
- `writeJson`
- `readJsonSync`
- `tryReadJsonSync`
- `writeJsonSync`
This was the highest-value cleanup because it removed naming drift without
changing semantics. Compatibility aliases stay in `src/infra/json-files.ts` and
plugin SDK barrels.
### Private state and stores
Representative use:
- `src/commitments/store.ts`
- `src/agents/models-config.ts`
- `src/agents/pi-auth-json.ts`
- `src/cron/run-log.ts`
- `src/secrets/shared.ts`
- `src/infra/device-auth-store.ts`
- `src/infra/device-identity.ts`
Current overlap:
- `fileStore`
- `fileStore({ private: true })`
- plugin SDK private-state aliases
The concepts are now one family. fs-safe exposes private mode through
`fileStore({ private: true })`; OpenClaw internals and bundled plugins use
store-shaped wrappers instead of standalone private JSON/text helpers.
### Temp workspaces
Representative use:
- `src/media/qr-image.ts`
- `extensions/discord/src/send.voice.ts`
- `extensions/discord/src/voice/audio.ts`
- `extensions/qa-lab/src/temp-dir.test-helper.ts`
`tempWorkspace` is the stable useful primitive. One-shot temp targets and
sibling-temp helpers are lower-level implementation tools.
### Atomic writes
Representative use:
- config and session stores
- cron stores
- plugin install paths
- extension state files
Keep atomic replacement as a public fs-safe subpath. OpenClaw should use the
same canonical JSON/text helpers where possible instead of hand-picking
lower-level atomic calls for ordinary JSON state.
### Regular, secure, and root file reads
These are not true duplicates:
- `root()` protects root-relative untrusted paths.
- regular-file helpers read trusted absolute paths with regular-file checks.
- secure-file helpers add ownership and mode checks for secret references.
Keep them separate. Document the trust boundary instead of hiding it behind one
generic "read file" helper.
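As a rough illustration of what "root-relative untrusted path" means, the sketch below normalizes a relative path and rejects anything that would climb out of the root. It is not fs-safe's implementation: `resolveWithinRoot` is a hypothetical name, it only handles POSIX-style separators, and a real guard also has to deal with symlinks, hardlinks, and races, which pure string checks cannot see.

```ts
// Illustrative only: NOT fs-safe's implementation. String-level checks cannot
// detect symlink or hardlink escapes; a real guard must inspect the filesystem.
function resolveWithinRoot(rootDir: string, relativePath: string): string {
  if (relativePath.startsWith("/")) {
    throw new Error(`absolute path rejected: ${relativePath}`);
  }
  const segments: string[] = [];
  for (const part of relativePath.split("/")) {
    if (part === "" || part === ".") continue; // skip empty and current-dir parts
    if (part === "..") {
      if (segments.length === 0) {
        throw new Error(`path escapes root: ${relativePath}`);
      }
      segments.pop(); // step back, but never above the root
      continue;
    }
    segments.push(part);
  }
  const base = rootDir.replace(/\/+$/, ""); // drop trailing slashes
  if (segments.length === 0) return base || "/";
  return `${base}/${segments.join("/")}`;
}
```

Trusted absolute paths and secret references would not go through a check like this; they keep their own helpers because their trust model is different.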
### Archive helpers
Representative use:
- plugin install
- skill install
- marketplace and ClawHub archive flows
Keep as a separate fs-safe subpath. Do not leak archive entry plumbing into
OpenClaw core call sites unless the caller is actually validating archive
metadata.
## Target Design
### OpenClaw imports
Core OpenClaw code should use local policy wrappers:
- `src/infra/fs-safe.ts` for common root/error helpers
- `src/infra/json-files.ts` for the temporary JSON compatibility layer
- `src/infra/private-file-store.ts` until private stores are unified
- `src/infra/replace-file.ts` for low-level atomic replacement
- `src/infra/boundary-file-read.ts` for loader/package boundary reads
- `src/infra/archive.ts` for archive extraction policy
- `src/infra/file-lock-manager.ts` for the rare core service that needs
manager-style lock lifecycle/diagnostics
New direct imports from `@openclaw/fs-safe/*` should be reserved for:
- package-level utilities outside core that cannot import `src/infra`
- compatibility shims
- code that intentionally consumes a narrow fs-safe subpath, such as
`openclaw/plugin-sdk/file-lock` using `@openclaw/fs-safe/file-lock`
### Plugin SDK exports
Plugin SDK exports are contractual. Keep aliases even when OpenClaw internals
move to canonical names.
Mark older names as deprecated in types/docs when the replacement is stable:
- `readJsonFileWithFallback` -> `readJsonIfExists` or a store method
- `writeJsonFileAtomically` -> `writeJson`
- `loadJsonFile` -> `tryReadJson`
- `saveJsonFile` -> `writeJson`
- `readFileWithinRoot` -> `root(...).read*`
- `writeFileWithinRoot` -> `root(...).write`
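A deprecated alias can be as small as a re-binding with a `@deprecated` JSDoc tag, so editors flag old call sites while they keep working. The sketch below is an assumption-heavy stand-in: the canonical helpers here are fake, synchronous, and backed by an in-memory map, and the "return `undefined` on missing or invalid JSON" behavior of `tryReadJson` is assumed rather than taken from fs-safe.

```ts
// Hypothetical stand-ins for the canonical helpers; the real ones live in
// fs-safe and touch the filesystem. An in-memory map plays the disk here.
const fakeDisk = new Map<string, string>();

// Assumed semantics: missing or unparsable content yields undefined.
function tryReadJson<T>(filePath: string): T | undefined {
  const raw = fakeDisk.get(filePath);
  if (raw === undefined) return undefined;
  try {
    return JSON.parse(raw) as T;
  } catch {
    return undefined;
  }
}

function writeJson(filePath: string, value: unknown): void {
  fakeDisk.set(filePath, JSON.stringify(value));
}

/** @deprecated Use `tryReadJson` instead. */
const loadJsonFile = tryReadJson;

/** @deprecated Use `writeJson` instead. */
const saveJsonFile = writeJson;
```

Because the alias is the same function value, behavior cannot drift between the old and new names; only the documentation differs.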
### fs-safe stores
Move toward one store family:
```ts
const store = fileStore({
  rootDir,
  private: true,
  mode: 0o600,
  dirMode: 0o700,
});
```
or a thin alias:
```ts
const store = stateStore({ rootDir, private: true });
```
The store family should cover:
- `read`
- `readText`
- `readJson`
- `readTextIfExists`
- `readJsonIfExists`
- `write`
- `writeJson`
- `remove`
- `exists`
- `open`
- `copyIn`
- `writeStream`
- `pruneExpired`
This cleanup added that store shape in fs-safe, removed the unshipped
`privateStateStore` surface, and moved OpenClaw internals and bundled plugins
onto explicit store reads/writes.
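To make the store shape concrete, here is a minimal in-memory sketch of a few of the listed methods. Everything in it is hypothetical: `SketchStore` and `createMemoryStore` are illustration names, the methods are synchronous for brevity, and the "throw on missing" versus "return `undefined`" split between `readJson` and `readJsonIfExists` is an assumption based on the method names, not a statement about fs-safe's actual behavior.

```ts
// Hypothetical, synchronous, in-memory stand-in for part of the store family.
// The real fileStore writes files with private modes; this only shows shape.
interface SketchStore {
  exists(key: string): boolean;
  readJson<T>(key: string): T;
  readJsonIfExists<T>(key: string): T | undefined;
  writeJson(key: string, value: unknown): void;
  remove(key: string): void;
}

function createMemoryStore(): SketchStore {
  const entries = new Map<string, string>();
  return {
    exists(key: string): boolean {
      return entries.has(key);
    },
    // Assumed: a strict read fails loudly when the entry is missing.
    readJson<T>(key: string): T {
      const raw = entries.get(key);
      if (raw === undefined) throw new Error(`missing store entry: ${key}`);
      return JSON.parse(raw) as T;
    },
    // Assumed: the IfExists variant maps "missing" to undefined.
    readJsonIfExists<T>(key: string): T | undefined {
      const raw = entries.get(key);
      return raw === undefined ? undefined : (JSON.parse(raw) as T);
    },
    writeJson(key: string, value: unknown): void {
      entries.set(key, JSON.stringify(value));
    },
    remove(key: string): void {
      entries.delete(key);
    },
  };
}
```

Callers written against a shape like this read as explicit store reads and writes, which is the property the cleanup is after.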
### Temp
Keep the stable public temp surface small:
```ts
await using workspace = await tempWorkspace({ prefix: "openclaw-" });
const target = workspace.path("payload.bin");
```
Move one-shot temp target helpers and sibling-temp helpers to advanced/internal
unless a concrete OpenClaw caller needs the public contract.
## Refactor Phases
### Phase 1: Inventory and Guards
- Add a small import-boundary test that lists allowed direct
`@openclaw/fs-safe/*` imports in OpenClaw core.
- Add regression tests for the JSON symlink behavior kept by
`src/infra/json-file.ts`.
- Add regression tests for public plugin SDK aliases that must keep resolving.
- Add a doc note to the plugin SDK runtime docs once aliases are marked
deprecated.
Exit criteria:
- The current compatibility surface is executable-tested.
- New direct fs-safe imports are visible in review.
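The import-boundary guard could look roughly like the following. This is a sketch, not the actual `fs-safe-import-boundary.test.ts`: `findBoundaryViolations` is a hypothetical helper, sources are passed in as strings instead of being read from disk, and the regex only catches static `from "..."` imports.

```ts
// Hypothetical helper: report files that import @openclaw/fs-safe directly
// without living under one of the allowed path prefixes.
function findBoundaryViolations(
  sources: Record<string, string>,
  allowedPrefixes: readonly string[],
): string[] {
  const violations: string[] = [];
  for (const file of Object.keys(sources)) {
    if (allowedPrefixes.some((prefix) => file.startsWith(prefix))) continue;
    // Fresh regex per file so lastIndex state never leaks between files.
    const importPattern = /from\s+["'](@openclaw\/fs-safe(?:\/[^"']*)?)["']/g;
    const text = sources[file] ?? "";
    let match = importPattern.exec(text);
    while (match !== null) {
      violations.push(`${file} imports ${match[1]}`);
      match = importPattern.exec(text);
    }
  }
  return violations;
}
```

A test built on this would fail with the offending file names, which keeps new direct imports visible in review.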
### Phase 2: JSON Name Cleanup
- Convert OpenClaw internal callers from old JSON names to canonical fs-safe
names where the semantics are identical.
- Keep plugin SDK aliases unchanged.
- Collapse `src/infra/json-file.ts` and `src/infra/json-files.ts` into one
compatibility module if that reduces indirection without losing symlink
semantics.
- Keep `saveJsonFile` symlink-target behavior until every caller/test is
intentionally migrated.
Exit criteria:
- Core internal code no longer imports `readJsonFileStrict`,
`readDurableJsonFile`, or `writeJsonAtomic` unless it is a compatibility shim.
- Plugin SDK aliases still pass import/type tests.
### Phase 3: Store Unification
- Add the unified private mode to fs-safe's store API.
- Remove the unshipped `privateStateStore` surface instead of keeping a second
store family.
- Migrate OpenClaw private-state internals to the unified store shape in small
  groups:
  - auth/profile state
  - device identity and device auth
  - cron/run logs
  - commitments
  - extension state
- Regenerate the plugin SDK API baseline for the intentional pre-release
private-helper removal.
Exit criteria:
- OpenClaw internals and bundled plugins do not call standalone private
JSON/text helpers.
- `fileStore({ private: true })` is the only private multi-file store API.
### Phase 4: Temp Simplification
- Replace OpenClaw one-shot temp target call sites with `tempWorkspace`.
- Keep `resolvePreferredOpenClawTmpDir` as OpenClaw policy.
- Move one-shot temp and sibling-temp helpers out of the curated OpenClaw
wrapper surface.
Exit criteria:
- OpenClaw uses `tempWorkspace` for temporary file lifetimes unless a low-level
atomic helper owns the temp path.
### Phase 5: Shim Reduction
- Group one-line fs-safe shims into a smaller number of named OpenClaw policy
modules.
- Delete shims that are no longer imported.
- Keep shims that preserve public SDK names or OpenClaw-specific defaults.
Candidate stable shims:
- `src/infra/fs-safe.ts`
- `src/infra/json-files.ts`
- `src/infra/private-file-store.ts`
- `src/infra/replace-file.ts`
- `src/infra/boundary-file-read.ts`
- `src/infra/archive.ts`
Candidate advanced-only grouping:
- path guards
- symlink parent guards
- hardlink guards
- move-path helpers
- file identity helpers
- sibling temp helpers
Exit criteria:
- Each local wrapper carries policy meaning; the list is not one file per
  fs-safe module.
### Phase 6: fs-safe Public Surface Finalization
- Keep `@openclaw/fs-safe` main entry curated.
- Keep `root()` as the primary README/API story.
- Keep `openPinnedFileSync` internal. Use `readSecureFile`, `root().open`, or
`openRootFile*` wrappers instead of exposing the fd-level pinned primitive.
- Keep `createSidecarLockManager` internal. Public callers should use
`acquireFileLock` / `withFileLock`; `createFileLockManager` is subpath-only
for long-lived services that need held-lock inspection or drain/reset.
- Move rare root escape hatches such as `openWritable` to advanced only if API
checks show no supported caller needs the main root interface.
- Keep `regular-file`, `secure-file`, archive, and root helpers separate
because their trust models differ.
- Remove or mark unstable any standalone helper that is fully covered by root or
store methods.
Exit criteria:
- fs-safe has a stable pre-1.0 public surface.
- OpenClaw imports only stable fs-safe APIs outside compatibility shims.
## Verification
Use targeted proof per phase:
- JSON cleanup:
  - JSON symlink tests
  - plugin SDK JSON-store import tests
  - representative extension tests that use JSON store aliases
- Store unification:
  - private mode tests in fs-safe
  - auth profile persistence tests
  - device identity tests
  - cron/run-log tests
- Temp cleanup:
  - media temp tests
  - Discord voice temp tests
  - QA-lab temp helper tests
- Shim reduction:
  - plugin SDK API generation/check
  - import-boundary tests
  - `pnpm build`
Before merging a broad cleanup batch, run the changed gate and build:
```sh
pnpm check:changed
pnpm build
```
Implementation proof from this cleanup:
- `pnpm test src/infra/fs-safe-import-boundary.test.ts src/plugin-sdk/temp-path.test.ts src/agents/models-config.write-serialization.test.ts src/infra/json-file.test.ts src/infra/json-files.test.ts`
- `pnpm test src/infra/fs-safe-import-boundary.test.ts src/infra/device-auth-store.test.ts src/infra/device-identity.test.ts src/infra/exec-approvals.test.ts src/agents/models-config.write-serialization.test.ts src/agents/pi-embedded-runner/openrouter-model-capabilities.test.ts src/agents/harness/native-hook-relay.test.ts`
- `pnpm test src/infra/fs-safe-import-boundary.test.ts src/infra/hardlink-guards.test.ts src/infra/file-identity.test.ts src/plugin-sdk/fs-safe-compat.test.ts src/plugin-sdk/temp-path.test.ts`
- `pnpm plugin-sdk:api:check`
- `pnpm build`
- Blacksmith Testbox `pnpm install --frozen-lockfile --config.minimum-release-age=0 && pnpm check:changed`
- In `../fs-safe`: `pnpm docs:site && pnpm build && pnpm test test/api-coverage.test.ts test/new-primitives.test.ts`
## Review Checklist
- Does this change reduce a public name, local wrapper, or duplicated semantic
family?
- Is the old name public plugin SDK surface? If yes, keep a deprecated alias.
- Does the replacement preserve symlink, hardlink, mode, and missing-file
behavior?
- Is the caller using an untrusted relative path, trusted absolute path, secret
path, archive entry, or temp lifetime? Pick the helper that says that out
loud.
- Are docs and plugin SDK API snapshots updated when exported names change?

@@ -1,320 +0,0 @@
---
summary: "Detailed API, event, runtime, cancellation, and tool-policy contract for the Talk refactor"
read_when:
- Implementing Talk Gateway methods or protocol schemas
- Changing Talk config, events, cancellation, or provider tool policy
- Reviewing whether a Talk behavior belongs in core or an adapter
title: "Talk API and runtime contract"
---
# Talk API And Runtime Contract
This is the detailed contract for [Talk refactor plan](/refactor/talk).
## Config Contract
Config stays under the existing `talk` object. Do not add `talk.speech` in this
refactor.
```ts
type TalkConfig = {
  provider?: string;
  providers?: Record<string, unknown>;
  realtime?: {
    provider?: string;
    model?: string;
    voice?: string;
    mode?: TalkMode;
    transport?: TalkTransport;
    brain?: TalkBrain;
    providers?: Record<string, unknown>;
  };
  input?: {
    interruptOnSpeech?: boolean;
    silenceTimeoutMs?: number;
  };
};
```
Rules:
- `talk.provider` and `talk.providers.*` remain speech/STT/TTS provider config.
- `talk.realtime.provider` and `talk.realtime.providers.*` are realtime voice provider config.
- `talk.config` returns effective config without secrets unless privileged.
- `talk.catalog` returns capabilities, not inferred provider-id guesses.
- Doctor migrates old realtime selectors into `talk.realtime`.
- Runtime does not silently reinterpret Voice Call or TTS config as realtime config.
## Method Semantics
### `talk.catalog`
Returns effective Talk capabilities:
- modes
- transports
- brain strategies
- providers
- models
- voices
- input audio formats
- output audio formats
- browser-safe client session support
- Gateway relay support
- managed-room support
- local STT/TTS support
Provider capability declarations drive this. Core must not infer support from
provider ids.
### `talk.speak`
One-shot TTS:
```ts
await gateway.request("talk.speak", {
  text: "Ready.",
  voice: "alloy",
});
```
`talk.speak` does not create live session state, turn state, transcript state,
barge-in state, or provider realtime state.
### `talk.client.create`
Creates a client-owned provider session while Gateway still owns config,
instructions, credentials, and tool policy.
Use it for browser WebRTC, browser provider WebSocket, and native provider media
sessions that require client-owned sockets. Reject `gateway-relay` and
`managed-room`; the error points clients to `talk.session.create`.
### `talk.client.toolCall`
Forwards provider tool calls from client-owned provider sessions to Gateway
policy:
```ts
await gateway.request("talk.client.toolCall", {
  sessionId,
  callId,
  name,
  argumentsJson,
});
```
Validate session identity, caller ownership, brain strategy, and policy. Pass an
`AbortSignal` into agent/tool runtime, reject stale or closed sessions, and never
accept request-time instructions.
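The validation order above might be sketched as follows. This is a simplified assumption, not the Gateway implementation: `SketchSession` and `validateToolCall` are hypothetical, brain-strategy and policy checks are omitted, and "never accept request-time instructions" is modeled by rejecting any field outside the four documented ones.

```ts
// Hypothetical session record; the real one carries far more state.
type SketchSession = { id: string; ownerId: string; closed: boolean };

// The only fields the documented request shape carries.
const TOOL_CALL_KEYS = ["sessionId", "callId", "name", "argumentsJson"];

// Returns an error message, or null when the call may proceed to policy checks.
function validateToolCall(
  sessions: Map<string, SketchSession>,
  callerId: string,
  params: Record<string, unknown>,
): string | null {
  for (const key of Object.keys(params)) {
    // Unknown fields (for example an instructions override) are rejected outright.
    if (TOOL_CALL_KEYS.indexOf(key) === -1) return `unexpected field: ${key}`;
  }
  for (const key of TOOL_CALL_KEYS) {
    if (typeof params[key] !== "string") return `missing or non-string field: ${key}`;
  }
  const session = sessions.get(params["sessionId"] as string);
  if (!session || session.closed) return "stale or closed session";
  if (session.ownerId !== callerId) return "caller does not own session";
  return null;
}
```

Rejecting unknown fields at the boundary means an instruction override cannot slip through as an ignored extra property.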
### `talk.session.create`
Creates a Gateway-owned live Talk session.
| Mode | Transport | Brain | Owner |
| --------------- | --------------- | --------------- | ------------------- |
| `realtime` | `gateway-relay` | `agent-consult` | Gateway |
| `transcription` | `gateway-relay` | `none` | Gateway |
| `stt-tts` | `managed-room` | `agent-consult` | Gateway/client room |
| `stt-tts` | `managed-room` | `direct-tools` | trusted room |
Reject `webrtc` and `provider-websocket`; the error points clients to
`talk.client.create`.
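One way to encode the table is a lookup of supported combinations plus an early rejection for client-owned transports. The sketch below makes two assumptions that the text does not confirm: the union members of `TalkMode`, `TalkTransport`, and `TalkBrain` are inferred from the table and the rejection rule, and the table is treated as the complete set of supported combinations. `checkSessionCreate` is a hypothetical name.

```ts
// Union members inferred from the table above; the real types may be wider.
type TalkMode = "realtime" | "transcription" | "stt-tts";
type TalkTransport = "gateway-relay" | "managed-room" | "webrtc" | "provider-websocket";
type TalkBrain = "agent-consult" | "direct-tools" | "none";

// Assumption: the documented table is exhaustive.
const SUPPORTED_COMBINATIONS: ReadonlyArray<readonly [TalkMode, TalkTransport, TalkBrain]> = [
  ["realtime", "gateway-relay", "agent-consult"],
  ["transcription", "gateway-relay", "none"],
  ["stt-tts", "managed-room", "agent-consult"],
  ["stt-tts", "managed-room", "direct-tools"],
];

type CreateCheck = { ok: true } | { ok: false; error: string };

function checkSessionCreate(
  mode: TalkMode,
  transport: TalkTransport,
  brain: TalkBrain,
): CreateCheck {
  // Client-owned transports belong to talk.client.create, so point callers there.
  if (transport === "webrtc" || transport === "provider-websocket") {
    return {
      ok: false,
      error: `transport "${transport}" is client-owned; use talk.client.create`,
    };
  }
  const supported = SUPPORTED_COMBINATIONS.some(
    ([m, t, b]) => m === mode && t === transport && b === brain,
  );
  return supported
    ? { ok: true }
    : { ok: false, error: `unsupported combination: ${mode}/${transport}/${brain}` };
}
```

The mirror-image check for `talk.client.create` would reject `gateway-relay` and `managed-room` and point to `talk.session.create` instead.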
### `talk.session.join`
Joins or reconnects to a Gateway-owned managed room. Validate session id and
token, never expose token hashes, emit `session.replaced` to the displaced
client, and emit `session.ready` to the new owner.
### `talk.session.appendAudio`
Appends an input audio frame to a Gateway-owned relay session:
```ts
await gateway.request("talk.session.appendAudio", {
  sessionId,
  audioBase64,
  timestamp,
});
```
Use for realtime Gateway relay and streaming transcription. Do not use this for
managed-room native push-to-talk when the native node captures audio locally and
returns transcript/output through node command results.
### Turn Verbs
Use explicit verbs instead of generic controls:
```ts
await gateway.request("talk.session.startTurn", { sessionId });
await gateway.request("talk.session.endTurn", { sessionId, turnId });
await gateway.request("talk.session.cancelTurn", { sessionId, turnId, reason });
await gateway.request("talk.session.cancelOutput", { sessionId, turnId, reason });
```
`endTurn` rejects stale `turnId` before clearing active state. `cancelTurn`
aborts capture, STT, provider response, agent consult, tools, TTS, relay output,
and room streams tied to that turn. `cancelOutput` stops assistant audio without
necessarily ending the user turn. Barge-in must be speech/VAD gated.
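The "reject stale `turnId` before clearing active state" rule is easy to get backwards, so a small sketch may help. `TurnTracker` is hypothetical, the `turn-N` id format is invented for illustration, and starting a new turn while one is active simply replaces it here, which is an assumption rather than documented behavior.

```ts
// Hypothetical tracker showing check-then-mutate ordering for endTurn.
class TurnTracker {
  private activeTurnId: string | null = null;
  private counter = 0;

  // Assumption: a new turn replaces any active one, making the old id stale.
  startTurn(): string {
    this.counter += 1;
    this.activeTurnId = `turn-${this.counter}`;
    return this.activeTurnId;
  }

  endTurn(turnId: string): void {
    // Validate first: a stale id must not clear the currently active turn.
    if (turnId !== this.activeTurnId) {
      throw new Error(`stale turnId: ${turnId}`);
    }
    this.activeTurnId = null;
  }

  get active(): string | null {
    return this.activeTurnId;
  }
}
```

If the check ran after the mutation, a late `endTurn` from a previous turn would silently end the turn the user is currently speaking in.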
### `talk.session.submitToolResult`
Completes a provider tool call emitted inside a Gateway-owned relay session:
```ts
await gateway.request("talk.session.submitToolResult", {
  sessionId,
  callId,
  output,
});
```
### `talk.session.close`
Closes a Gateway-owned session. Close emits one terminal event, stops capture and
playback, aborts provider and agent work, drains TTS, revokes room join state,
and removes retained state after its replay/debug window.
## Event Contract
All live Talk paths emit one public event channel:
```ts
talk.event;
```
Every event uses this envelope:
```ts
type TalkEvent<TPayload = unknown> = {
  id: string;
  type: TalkEventType;
  sessionId: string;
  turnId?: string;
  captureId?: string;
  seq: number;
  timestamp: string;
  mode: TalkMode;
  transport: TalkTransport;
  brain: TalkBrain;
  provider?: string;
  final?: boolean;
  callId?: string;
  itemId?: string;
  parentId?: string;
  source?: string;
  payload: TPayload;
};
```
Core event types include `session.*`, `turn.*`, `capture.*`, `input.audio.*`,
`transcript.*`, `output.text.*`, `output.audio.*`, `tool.*`, `usage.metrics`,
`latency.metrics`, and `health.changed`.
Rules:
- `sessionId` is required for every event.
- `turnId` is required for turn-bound input, output, transcript, tool, and cancellation events.
- `captureId` is required while capture is active.
- `seq` monotonically increases per session.
- `timestamp` uses ISO 8601 UTC.
- `callId`, `itemId`, and `parentId` correlate provider responses, tool calls, TTS jobs, and relay frames.
- payloads must not duplicate large raw audio frames when transport already carries them.
- consumers should rely on envelope fields instead of provider-specific payloads.
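The `seq` and `timestamp` rules can be sketched with a per-session counter. This is a minimal illustration under stated assumptions: `SketchEvent` is a trimmed-down envelope, `EventSequencer` is a hypothetical name, sequences start at 1, and the `sessionId:seq` id format is invented here.

```ts
// Trimmed-down envelope for illustration; the real TalkEvent has more fields.
type SketchEvent = {
  id: string;
  type: string;
  sessionId: string;
  turnId?: string;
  seq: number;
  timestamp: string;
};

class EventSequencer {
  private readonly lastSeq = new Map<string, number>();

  stamp(sessionId: string, type: string, turnId?: string, now: Date = new Date()): SketchEvent {
    // Each session gets its own strictly increasing counter.
    const seq = (this.lastSeq.get(sessionId) ?? 0) + 1;
    this.lastSeq.set(sessionId, seq);
    const stamped: SketchEvent = {
      id: `${sessionId}:${seq}`, // invented id format
      type,
      sessionId,
      seq,
      timestamp: now.toISOString(), // ISO 8601, always UTC ("Z" suffix)
    };
    if (turnId !== undefined) stamped.turnId = turnId;
    return stamped;
  }
}
```

Keeping the counter inside the shared runtime is what lets relay, telephony, and meeting adapters agree on ordering without each inventing its own.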
Text-ready is not audio-ready. Clients may show text after `output.text.done`,
but should not enter speaking/playback state until `output.audio.started` or
`output.audio.delta`.
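A client-side state function makes the text-ready versus audio-ready distinction explicit. The sketch below is an assumption about how a client might model it: `PlaybackState` and `nextPlaybackState` are hypothetical names, and resetting to idle on `turn.cancelled` is one plausible choice rather than a documented rule.

```ts
type PlaybackState = "idle" | "text-ready" | "speaking";

// Hypothetical reducer: only audio events may move the UI into "speaking".
function nextPlaybackState(state: PlaybackState, eventType: string): PlaybackState {
  switch (eventType) {
    case "output.text.done":
      // Text may be shown, but it never starts playback state on its own.
      return state === "speaking" ? "speaking" : "text-ready";
    case "output.audio.started":
    case "output.audio.delta":
      return "speaking";
    case "turn.cancelled":
      // Assumption: a cancelled turn resets the client to idle.
      return "idle";
    default:
      return state;
  }
}
```

With this shape, a provider that emits text long before audio cannot make the UI show a speaking indicator over silence.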
## Shared Runtime Target
Keep one provider-agnostic runtime under `src/talk`. The first pass keeps names
close to the old runtime modules so the move stays reviewable:
```text
src/talk/
  audio-codec.ts
  agent-consult-runtime.ts
  agent-consult-tool.ts
  agent-talkback-runtime.ts
  fast-context-runtime.ts
  provider-registry.ts
  provider-resolver.ts
  provider-types.ts
  session-log-runtime.ts
  session-runtime.ts
  talk-events.ts
  talk-session-controller.ts
```
New code should import the shared runtime from `src/talk` inside core. Plugins
that already use the stable SDK subpath keep importing
`openclaw/plugin-sdk/realtime-voice`; that facade re-exports the Talk runtime
contract without exposing core file layout.
Responsibilities:
- normalize modes, transports, brains, codecs, and audio metadata
- create, close, and replace session records
- allocate turn ids and capture ids
- reject stale turn ids before mutation
- sequence events
- retain recent events for replay, reconnect, and diagnostics
- track active input capture and assistant output
- coordinate barge-in and output cancellation
- propagate abort signals
- register provider tool calls and bind tool results
- expose test builders for session/event assertions
Gateway method files should become thin adapters:
```text
src/gateway/server-methods/
  talk.ts
  talk-client.ts
  talk-session.ts
```
Internal Gateway helpers may exist only as staging files while code moves to
`src/talk`.
## Cancellation Contract
Cancellation must abort underlying work, not only ignore stale output.
When a turn or session is cancelled:
- provider realtime response is cancelled when supported
- provider session is closed or reset when cancellation cannot be scoped
- streaming STT receives abort
- agent consult receives abort
- queued tools do not start after abort
- already-started side-effecting tools receive abort and report cancellation
- pending TTS jobs are drained
- playback sources are stopped
- relay streams are cleared
- managed-room capture and output state reset
- stale finals and stale audio deltas are ignored
- one terminal cancellation event is emitted
Barge-in uses VAD or provider speech-started signals, ignores silence and echo,
cancels output only after real user speech, and starts or ensures a turn before
emitting `turn.cancelled`.
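The "queued tools do not start after abort" rule can be sketched with a standard `AbortController`. The helper below is hypothetical and synchronous for brevity: `QueuedTool` and `runQueuedTools` are illustration names, and real tools would be asynchronous and would also need to observe the signal while running.

```ts
// Hypothetical queue entry: each tool receives the shared abort signal.
type QueuedTool = { label: string; run: (signal: AbortSignal) => void };

// Runs tools in order and returns the labels of the tools that actually started.
function runQueuedTools(queue: readonly QueuedTool[], controller: AbortController): string[] {
  const started: string[] = [];
  for (const tool of queue) {
    // Check before every start: once aborted, nothing further may begin.
    if (controller.signal.aborted) break;
    started.push(tool.label);
    tool.run(controller.signal);
  }
  return started;
}
```

The important detail is that the check happens before each start, not once up front, so a cancellation arriving mid-queue still prevents the remaining side effects.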
## Tool Policy Contract
Gateway owns Talk tool policy.
Client-owned flow: `talk.client.create`, provider tool call to client,
`talk.client.toolCall`, Gateway policy validation, agent/direct-tool execution,
client result submission to provider.
Gateway-owned flow: `talk.session.create`, provider tool call to Gateway,
Gateway policy validation, agent/direct-tool execution, provider result
submission, `talk.event` emission.
No Talk path accepts caller-provided instructions. Gateway builds instructions
from trusted config and session context.

@@ -1,229 +0,0 @@
---
summary: "Implementation packages, deletion checklist, test matrix, and verification commands for the Talk refactor"
read_when:
- Implementing the Talk refactor plan
- Deleting legacy Talk RPCs, event channels, or realtime endpoint code
- Verifying browser, native, telephony, meeting, STT, or TTS Talk behavior after refactor work
title: "Talk refactor execution checklist"
---
# Talk Refactor Execution Checklist
Use this as the PR tracker for [Talk refactor plan](/refactor/talk).
## Implementation Packages
### Package 1: Protocol
- update `src/gateway/protocol/schema/channels.ts`
- update `src/gateway/protocol/schema/protocol-schemas.ts`
- update `src/gateway/protocol/schema/types.ts`
- update `src/gateway/protocol/index.ts`
- regenerate generated protocol clients
- remove old schemas from generated metadata
- update protocol tests
Done when old RPC/event names are absent from generated protocol output.
### Package 2: Gateway Methods
- split client-owned methods into `talk-client.ts`
- keep session-owned methods in `talk-session.ts`
- keep catalog/config/speak/mode in `talk.ts`
- classify every new method in method scopes
- advertise only `talk.event` in hello event features
- remove old method list entries
- update authorization tests
Done when every public Talk method has an explicit scope.
### Package 3: Session Runtime
- add `src/talk` primitives
- move event sequencing into shared runtime
- move stale-turn rejection into shared runtime
- move active output state into shared runtime
- move cancellation bookkeeping into shared runtime
- expose small test helpers
Done when relay, transcription, handoff, telephony, and meetings do not each
invent event and turn bookkeeping.
### Package 4: Browser UI
- update realtime startup to `talk.client.create`
- update realtime tool consult to `talk.client.toolCall`
- update relay startup to `talk.session.create`
- update relay audio to `talk.session.appendAudio`
- update relay tool result to `talk.session.submitToolResult`
- update relay output cancel to `talk.session.cancelOutput`
- update relay close to `talk.session.close`
- listen only to `talk.event`
- remove relay mark RPC
Done when UI tests prove no removed RPC names remain.
### Package 5: Native And Nodes
- route native Talk through session events
- map push-to-talk commands to managed-room turn lifecycle
- clean capture state on failed start
- keep local STT/TTS as adapter behavior
- remove chat-history polling from the success path
- keep fallback polling only if explicitly needed
Done when native voice success path is event-driven.
### Package 6: Voice Call
- map telephony realtime events into `talk.event`
- map local speech detection to `startTurn`, `cancelOutput`, and `cancelTurn`
- pass abort through agent consult and tools
- keep marks, clear, u-law, and call lifecycle in the plugin
- add tests for early speech before provider speech-started
Done when Voice Call shares event and cancellation semantics without leaking
telephony into core.
### Package 7: Meetings
- map meeting speech and transcript state into `talk.event`
- keep participant and room state in meeting adapter
- add echo-suppression aware barge-in tests
- ensure meeting adapters can choose realtime, transcription, or `stt-tts`
Done when meeting behavior is an adapter over Talk, not a parallel realtime loop.
### Package 8: Doctor And Migration
- detect old realtime selectors outside `talk.realtime`
- write explicit `talk.realtime.provider`, `model`, `voice`, `transport`, and `brain`
- report removed RPC names when logs show old clients
- keep startup free of hidden config rewrites
- update SDK migration, Gateway protocol, Talk node, Control UI, and TTS docs
Done when runtime config is explicit and docs mention removed APIs only in
migration notes.
## Deletion Checklist
Delete or prove absent:
- `src/gateway/voiceclaw-realtime/`
- `/voiceclaw/realtime`
- `instructionsOverride`
- `talk.realtime.*` public RPCs
- `talk.transcription.*` public RPCs
- `talk.handoff.*` public RPCs
- `talk.session.inputAudio`
- `talk.session.control`
- `talk.session.toolResult`
- `talk.realtime.relay`
- `talk.transcription.relay`
- old generated protocol models
- old UI relay method calls
Keep these old names only in explicit migration tables.
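"Prove absent" can be automated with a small scan over advertised method names. The sketch below is an assumption about how such a check might look: `findRemovedNames` is a hypothetical helper, the prefix and exact-name lists are copied from the RPC entries in the checklist above, and it deliberately scans method names only, since `talk.realtime.*` remains a valid config path.

```ts
// Removed RPC families and names, taken from the deletion checklist.
const REMOVED_PREFIXES = ["talk.realtime.", "talk.transcription.", "talk.handoff."];
const REMOVED_EXACT = [
  "talk.session.inputAudio",
  "talk.session.control",
  "talk.session.toolResult",
];

// Hypothetical check: returns every advertised method that should be gone.
function findRemovedNames(methodNames: readonly string[]): string[] {
  return methodNames.filter(
    (method) =>
      REMOVED_EXACT.indexOf(method) !== -1 ||
      REMOVED_PREFIXES.some((prefix) => method.startsWith(prefix)),
  );
}
```

A protocol test could feed this the generated method list and assert that the result is empty.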
## Test Matrix
Protocol:
- final methods exist in protocol schemas
- removed methods are absent from protocol schemas
- final event is advertised in hello features
- removed events are absent from broadcast guards
- generated clients match schema
- request-time instruction override is rejected or impossible by schema
Gateway:
- `talk.client.create` creates WebRTC session result
- `talk.client.create` creates provider WebSocket session result
- `talk.client.create` rejects Gateway-owned transports
- `talk.client.toolCall` validates caller, session, brain, and policy
- `talk.session.create` creates realtime Gateway relay
- `talk.session.create` creates transcription relay
- `talk.session.create` creates STT/TTS managed room
- `talk.session.create` rejects client-owned transports
- `talk.session.join` replacement notifies displaced client
- `talk.session.appendAudio` routes to relay/transcription session
- `talk.session.startTurn` starts managed-room turn
- `talk.session.endTurn` rejects stale turn ids
- `talk.session.cancelTurn` aborts provider, agent, tools, TTS, and streams
- `talk.session.cancelOutput` cancels playback only
- `talk.session.submitToolResult` binds to provider call id
- `talk.session.close` emits terminal event and releases resources
Browser:
- WebRTC path calls `talk.client.create`
- provider WebSocket path calls `talk.client.create`
- provider tool calls use `talk.client.toolCall`
- Gateway relay uses only `talk.session.*`
- Gateway relay listens only to `talk.event`
- barge-in requires speech/VAD
- relay close rejects or aborts pending consult runs
- no removed RPC names in UI tests
Native:
- push-to-talk start emits capture/turn events
- failed push-to-talk start cleans capture state
- cancel clears capture and output state
- STT/TTS success path is event-driven
- fallback polling is explicit and tested if kept
- node policy rejects untrusted Talk commands
Telephony:
- early speech before provider speech-started creates or guards turn before cancellation
- marks and clear events map to output state
- u-law codec stays adapter-owned
- cancellation aborts consult run
- closed call prevents stale tool result submission
Meetings:
- participant context appears as metadata, not core branching
- echo suppression prevents false barge-in
- transcript events use common envelope
- meeting close aborts active work
Architecture:
- no removed public RPC names in protocol metadata
- no retired realtime endpoint route
- no retired realtime folder
- no request-time instruction override field
- no core branches on app platform names
- provider behavior comes from capabilities
## Verification Commands
Focused local loop:
```sh
pnpm test src/gateway/protocol/index.test.ts
pnpm test src/gateway/server-methods/talk.test.ts
pnpm test src/gateway/method-scopes.test.ts src/gateway/server-methods-list.test.ts
pnpm test src/gateway/talk-realtime-relay.test.ts src/gateway/talk-transcription-relay.test.ts
pnpm test ui/src/ui/realtime-talk.test.ts ui/src/ui/realtime-talk-gateway-relay.test.ts ui/src/ui/realtime-talk-webrtc.test.ts ui/src/ui/realtime-talk-google-live.test.ts
pnpm exec oxfmt --check --threads=1 docs/refactor/talk.md docs/refactor/talk-execution.md
```
Generation and docs:
```sh
pnpm protocol:gen && pnpm protocol:gen:swift
pnpm docs:check-mdx
pnpm plugin-sdk:api:check
```
Broad gate before push:
```sh
pnpm check:changed
```
Use Testbox for broad gates on maintainer machines.

@@ -1,128 +0,0 @@
---
summary: "Surface adapter plan for browser, native, walkie-talkie, telephony, and meeting Talk refactor work"
read_when:
- Updating browser realtime Talk, native Talk, walkie-talkie handoff, Voice Call, or meeting voice code
- Deciding whether a Talk behavior belongs in an adapter or shared runtime
title: "Talk surface mapping"
---
# Talk Surface Mapping
This maps product surfaces into [Talk refactor plan](/refactor/talk) primitives.
## Browser
WebRTC:
- call `talk.client.create`
- open provider media connection in browser
- forward provider tool calls through `talk.client.toolCall`
- receive provider audio through provider media/data channel
Provider WebSocket:
- call `talk.client.create`
- connect using constrained provider result
- keep provider-specific framing in the browser adapter
- forward tool calls through `talk.client.toolCall`
Gateway relay:
- call `talk.session.create`
- send PCM frames with `talk.session.appendAudio`
- listen only to `talk.event`
- submit tool results with `talk.session.submitToolResult`
- barge-in with `talk.session.cancelOutput`
- close with `talk.session.close`
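The Gateway relay sequence above can be sketched as a thin client wrapper. This is a minimal sketch under stated assumptions: `RelayGatewaySketch` and `RelaySessionSketch` are hypothetical names, the `sessionId` result field and the `audio`, `callId`, and `result` parameter names are guesses, and only the method names come from this plan.

```ts
// Hypothetical Gateway client shape; only `request(method, params)` is implied by this plan.
interface RelayGatewaySketch {
  request(method: string, params: Record<string, unknown>): Promise<Record<string, unknown>>;
}

class RelaySessionSketch {
  private readonly gateway: RelayGatewaySketch;
  readonly sessionId: string;

  private constructor(gateway: RelayGatewaySketch, sessionId: string) {
    this.gateway = gateway;
    this.sessionId = sessionId;
  }

  // Assumption: the create result carries a `sessionId` field.
  static async open(gateway: RelayGatewaySketch): Promise<RelaySessionSketch> {
    const result = await gateway.request("talk.session.create", {
      mode: "realtime",
      transport: "gateway-relay",
      brain: "agent-consult",
    });
    return new RelaySessionSketch(gateway, String(result.sessionId));
  }

  // Assumption: PCM frames travel as base64 strings in an `audio` field.
  appendAudio(pcmBase64: string): Promise<Record<string, unknown>> {
    return this.call("talk.session.appendAudio", { audio: pcmBase64 });
  }

  submitToolResult(callId: string, result: unknown): Promise<Record<string, unknown>> {
    return this.call("talk.session.submitToolResult", { callId, result });
  }

  // Barge-in stops assistant output without closing the session.
  cancelOutput(): Promise<Record<string, unknown>> {
    return this.call("talk.session.cancelOutput", {});
  }

  close(): Promise<Record<string, unknown>> {
    return this.call("talk.session.close", {});
  }

  // Every session call is scoped by the session id returned from create.
  private call(method: string, params: Record<string, unknown>): Promise<Record<string, unknown>> {
    return this.gateway.request(method, { sessionId: this.sessionId, ...params });
  }
}
```

Inbound traffic is deliberately absent from the wrapper: in this flow the client listens only to `talk.event`, so the wrapper exposes no second event channel.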
## Native And Nodes
Native apps map local audio lifecycle into Talk primitives.
Native realtime:
- use `talk.client.create` when the app owns provider media
- use `talk.session.create` when Gateway owns provider relay
Native STT/TTS:
- use `talk.session.create({ mode: "stt-tts", transport: "managed-room" })`
- keep local STT and local TTS behind native adapters
- drive success path from Talk events
- keep history polling only as a degraded fallback if explicitly tested
Native push-to-talk:
- press maps to `talk.session.startTurn`
- release maps to `talk.session.endTurn`
- cancel maps to `talk.session.cancelTurn`
- node capture commands emit capture events
- failed start cleans capture state
- opening voice UI never mutates global Talk config
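The press/release/cancel mapping, including the rule that a failed start cleans capture state, might look like the sketch below. `PushToTalkSketch`, `PttGatewaySketch`, and the `sessionId` parameter are hypothetical; only the three method names come from the list above.

```ts
type PttGesture = "press" | "release" | "cancel";

// Gesture-to-verb mapping taken from the list above.
const PTT_METHODS: Record<PttGesture, string> = {
  press: "talk.session.startTurn",
  release: "talk.session.endTurn",
  cancel: "talk.session.cancelTurn",
};

// Hypothetical Gateway client shape.
interface PttGatewaySketch {
  request(method: string, params: Record<string, unknown>): Promise<unknown>;
}

class PushToTalkSketch {
  capturing = false;
  private readonly gateway: PttGatewaySketch;
  private readonly sessionId: string;

  constructor(gateway: PttGatewaySketch, sessionId: string) {
    this.gateway = gateway;
    this.sessionId = sessionId;
  }

  async handle(gesture: PttGesture): Promise<void> {
    if (gesture === "press") {
      this.capturing = true; // local capture starts optimistically
    }
    try {
      await this.gateway.request(PTT_METHODS[gesture], { sessionId: this.sessionId });
    } catch (error) {
      this.capturing = false; // a failed start must not leave capture state behind
      throw error;
    }
    if (gesture !== "press") {
      this.capturing = false; // release and cancel both end local capture
    }
  }
}
```

The sketch keeps all state on the instance, so opening the voice UI has nothing global to mutate.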
Trusted node command adapters may remain:
```ts
talk.ptt.start;
talk.ptt.stop;
talk.ptt.cancel;
talk.ptt.once;
```
## Walkie-Talkie
Walkie-talkie is managed-room Talk:
```ts
await gateway.request("talk.session.create", {
  mode: "stt-tts",
  transport: "managed-room",
  brain: "agent-consult",
  sessionKey,
});
```
Then:
- client joins with `talk.session.join`
- press calls `talk.session.startTurn`
- release calls `talk.session.endTurn`
- cancel calls `talk.session.cancelTurn`
- assistant speech emits `output.text.*` and `output.audio.*`
- replacement emits `session.replaced` to old owner
- close calls `talk.session.close`
Room state includes canonical session id, route/channel target, caller identity,
mode, transport, brain, provider, model, voice, locale, expiry, token hash,
active client id, active turn id, and replacement state.
Two simultaneous rooms must not share turn ids, transcripts, audio output, or
cancellation tokens.
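One way to keep rooms isolated is to scope every id to its room. The sketch below is illustrative only: `RoomRegistrySketch`, the trimmed-down room record, and the `sessionId:turn-N` id format are assumptions, not the planned implementation.

```ts
// A small subset of the room state listed above.
interface ManagedRoomSketch {
  sessionId: string;
  activeClientId: string | null;
  activeTurnId: string | null;
  turnCount: number;
}

class RoomRegistrySketch {
  private readonly rooms = new Map<string, ManagedRoomSketch>();

  create(sessionId: string): void {
    this.rooms.set(sessionId, { sessionId, activeClientId: null, activeTurnId: null, turnCount: 0 });
  }

  // Returns the displaced client id so the caller can send it `session.replaced`.
  join(sessionId: string, clientId: string): string | null {
    const room = this.mustGet(sessionId);
    const replaced = room.activeClientId !== clientId ? room.activeClientId : null;
    room.activeClientId = clientId;
    return replaced;
  }

  // Turn ids embed the session id, so two rooms can never share one.
  startTurn(sessionId: string): string {
    const room = this.mustGet(sessionId);
    room.turnCount += 1;
    room.activeTurnId = `${sessionId}:turn-${room.turnCount}`;
    return room.activeTurnId;
  }

  // Rejects a stale completion before touching active room state.
  endTurn(sessionId: string, turnId: string): boolean {
    const room = this.mustGet(sessionId);
    if (room.activeTurnId !== turnId) return false;
    room.activeTurnId = null;
    return true;
  }

  private mustGet(sessionId: string): ManagedRoomSketch {
    const room = this.rooms.get(sessionId);
    if (!room) throw new Error(`unknown room: ${sessionId}`);
    return room;
  }
}
```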
## Telephony
Voice Call becomes a telephony adapter over Talk semantics.
Keep telephony-owned: Twilio/Plivo WebSocket contracts, stream ids, call ids,
G.711 u-law, marks, clear events, backpressure, phone call lifecycle, and inbound
speech detection quirks.
Move shared behavior to Talk: event envelope, turn ids, cancellation, agent
consult abort, tool policy, usage and latency metrics, and output state.
Telephony should emit `talk.event` for observability, even if phone media
remains plugin-owned.
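As an illustration of codec work that stays in the telephony adapter, here is a standard G.711 u-law decoder. Whether the adapter decodes to PCM or forwards u-law with a codec descriptor is not decided by this plan; the function names are hypothetical.

```ts
// Decode one G.711 u-law byte to a signed 16-bit PCM sample.
function decodeUlawSample(byte: number): number {
  const u = ~byte & 0xff; // u-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  // 0x84 is the standard u-law bias (132).
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign !== 0 ? -magnitude : magnitude;
}

// Decode a whole media frame, as an adapter might before handing PCM onward.
function decodeUlawFrame(frame: Uint8Array): Int16Array {
  return Int16Array.from(frame, (byte) => decodeUlawSample(byte));
}
```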
## Meetings
Google Meet and future meeting integrations become meeting adapters over Talk
semantics.
Keep meeting-owned: meeting join/leave, participant identity, room permissions,
echo suppression, transcript context, and meeting-specific mute/deafen behavior.
Move shared behavior to Talk: turn lifecycle, transcript events, assistant output
events, tool policy, cancellation, and metrics.
Meeting adapters may run `transcription`, `stt-tts`, or `realtime` depending on
provider support.

@@ -1,499 +0,0 @@
---
summary: "Breaking refactor plan for one Talk architecture across realtime voice, STT/TTS, browser, native, telephony, meetings, and walkie-talkie handoff"
read_when:
- Refactoring Talk mode, realtime voice, voice-call, Google Meet, browser realtime voice, native push-to-talk, STT, or TTS
- Changing Talk Gateway protocol, provider contracts, realtime transports, managed rooms, audio events, cancellation, or tool policy
- Deciding whether a voice feature belongs in core, a provider plugin, a native app, a meeting adapter, or a telephony adapter
title: "Talk refactor plan"
---
# Talk Refactor Plan
This is the breaking-clean plan for unifying every live voice path behind one
Talk architecture.
The old architecture grew by product surface: browser realtime, Gateway relay,
managed native handoff, streaming transcription, Voice Call, Google Meet, local
STT/TTS, one-shot TTS, and a retired realtime WebSocket endpoint each learned
their own names for sessions, turns, capture, output, barge-in, tool calls,
cancellation, and transcript events.
The new architecture grows by primitive. There is one public Talk API, one
event envelope, one turn model, one cancellation contract, one provider policy
boundary, and one place for shared runtime state. Browser, native, telephony,
meetings, and walkie-talkie become adapters over those primitives.
## Product Target
OpenClaw supports three Talk products:
| Product | User experience | Mode |
| --------------------- | ----------------------------------------------------------------------- | --------------- |
| Realtime conversation | Low-latency duplex speech with interruption and provider tool calls | `realtime` |
| Walkie-talkie | Press or hold to speak, release, then hear OpenClaw answer | `stt-tts` |
| Transcription | Live captions, dictation, notes, meeting transcript, no assistant audio | `transcription` |
All three products share session identity, join/reconnect state, turn and
capture ids, input audio metadata, output text/audio state, transcript finality,
tool-call correlation, cancellation, replay, provider capabilities, policy,
auth, and observability.
One-shot uploaded audio and one-shot TTS do not need live Talk session state
unless they participate in live capture, turns, interruption, replay, or
cancellation.
## Hard Decisions
This refactor intentionally removes compatibility that would keep the design
muddy:
- remove public `talk.realtime.*` RPCs
- remove public `talk.transcription.*` RPCs
- remove public `talk.handoff.*` RPCs
- remove generic `talk.session.inputAudio`, `talk.session.control`, and
`talk.session.toolResult`
- remove old relay event channels
- remove `/voiceclaw/realtime`
- remove `src/gateway/voiceclaw-realtime/`
- remove request-time instruction overrides
- keep `talk.speak` as one-shot TTS, not a live session API
- keep legacy realtime config repair in doctor, not startup
- keep platform and product names out of core branching
## Vocabulary
Keep mode, transport, brain, and surface separate.
```ts
type TalkMode = "realtime" | "stt-tts" | "transcription";
type TalkTransport = "webrtc" | "provider-websocket" | "gateway-relay" | "managed-room";
type TalkBrain = "agent-consult" | "direct-tools" | "none";
```
### Modes
`realtime` means a provider owns a live voice session. Audio goes in, audio
comes out, interruptions are possible, and provider tool calls may happen during
one provider session.
`stt-tts` means input speech is transcribed, OpenClaw answers as text, and TTS
renders the answer. This is the native Talk and walkie-talkie path when a full
duplex provider session is not the right shape.
`transcription` means speech-to-text without assistant audio output. It covers
captions, dictation, notes, meeting transcript capture, and live voice-note
ingestion.
### Transports
`webrtc` is client-owned SDP/media/data-channel transport. It fits browser-owned
OpenAI Realtime sessions with ephemeral credentials.
`provider-websocket` is client-owned provider JSON and audio framing. It fits
browser-owned Google Live style sessions.
`gateway-relay` means the Gateway owns the provider connection. The client sends
authenticated audio frames to the Gateway and receives `talk.event` plus audio
output through Gateway-managed relay state.
`managed-room` means the Gateway owns a room-like session that clients can join,
replace, and drive with explicit turn verbs. It is the primitive for
walkie-talkie and native handoff.
Telephony and meetings are not core transports. They are adapters that map
phone or meeting media into the `gateway-relay` or `managed-room` transport, in
whichever mode fits (for example `stt-tts`), while keeping call and meeting
lifecycle outside core.
### Brain Strategies
`agent-consult` means provider tool calls or session turns consult an OpenClaw
agent. Gateway owns prompt construction, context selection, authorization, abort
signals, and final result delivery.
`direct-tools` means a trusted first-party surface can call selected OpenClaw
tools directly through Gateway policy. Keep this privileged.
`none` means transcription-only, external orchestration, or no OpenClaw tool
access.
## Ownership Boundaries
Core owns generic Talk semantics:
- mode, transport, brain, codec, and audio descriptors
- session records and session ownership
- turn ids and capture ids
- event envelope, sequencing, replay, and stale-output suppression
- active capture state
- active assistant output state
- replacement and reconnect state
- cancellation propagation
- tool policy and tool-call correlation
- usage, latency, and health events
Provider plugins own vendor behavior:
- OpenAI Realtime SDP and data-channel details
- Google Live WebSocket framing
- streaming STT provider details
- TTS provider details
- provider auth, model, voice, codec, and resume quirks
- provider capability declarations
Surface adapters own IO and product quirks:
- browser capture and playback
- native audio sessions, local speech engines, and foreground Talk UX
- node command dispatch
- telephony media streams, marks, clear events, u-law, and call lifecycle
- meeting join/leave, participants, echo suppression, and authorization
Core may store optional surface metadata for diagnostics. Core must not branch
on browser, iOS, Android, macOS, Google Meet, Voice Call, or any retired product
name.
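A small sketch of what capability-driven behavior could look like in core. The capability field `supportsResponseCancel` and the record shape are hypothetical; the point is only that the decision reads declared capabilities and never the surface name.

```ts
// Hypothetical capability declaration; the field name is illustrative only.
interface ProviderCapabilitiesSketch {
  supportsResponseCancel: boolean;
}

interface SessionRecordSketch {
  capabilities: ProviderCapabilitiesSketch;
  // Optional diagnostics only. Core never reads this to choose behavior.
  surface?: string;
}

type CancelPlan = "cancel-response" | "reset-session";

// Chooses a cancellation strategy from capabilities, not from platform or product names.
function planOutputCancel(session: SessionRecordSketch): CancelPlan {
  return session.capabilities.supportsResponseCancel ? "cancel-response" : "reset-session";
}
```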
## Final Gateway API
The public Gateway surface is deliberately small:
```ts
// Discovery and configuration.
talk.catalog;
talk.config;
// One-shot speech output.
talk.speak;
// Client-owned provider sessions.
talk.client.create;
talk.client.toolCall;
// Gateway-owned live sessions.
talk.session.create;
talk.session.join;
talk.session.appendAudio;
talk.session.startTurn;
talk.session.endTurn;
talk.session.cancelTurn;
talk.session.cancelOutput;
talk.session.submitToolResult;
talk.session.close;
// Events and foreground node mode.
talk.event;
talk.mode;
```
Use `talk.client.*` when the client owns provider media transport. Use
`talk.session.*` when the Gateway owns live session state.
`talk.mode` is the existing foreground node mode broadcast. It can stay, but it
is not part of the Talk session control API.
### Supported Creation Matrix
| Method | Mode | Transport | Brain | Owner |
| --------------------- | --------------- | -------------------- | --------------- | ------- |
| `talk.client.create` | `realtime` | `webrtc` | `agent-consult` | client |
| `talk.client.create` | `realtime` | `provider-websocket` | `agent-consult` | client |
| `talk.session.create` | `realtime` | `gateway-relay` | `agent-consult` | Gateway |
| `talk.session.create` | `transcription` | `gateway-relay` | `none` | Gateway |
| `talk.session.create` | `stt-tts` | `managed-room` | `agent-consult` | Gateway |
| `talk.session.create` | `stt-tts` | `managed-room` | `direct-tools` | Gateway |
Reject combinations that blur ownership. `talk.client.create` must reject
Gateway-owned transports. `talk.session.create` must reject client-owned
transports.
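The matrix can be encoded as data and checked before any session is created. This is a minimal sketch; `CreateCombo`, `SUPPORTED_COMBOS`, and `isSupportedCreate` are hypothetical names, and the rows are copied from the table above.

```ts
type TalkMode = "realtime" | "stt-tts" | "transcription";
type TalkTransport = "webrtc" | "provider-websocket" | "gateway-relay" | "managed-room";
type TalkBrain = "agent-consult" | "direct-tools" | "none";
type CreateMethod = "talk.client.create" | "talk.session.create";

interface CreateCombo {
  method: CreateMethod;
  mode: TalkMode;
  transport: TalkTransport;
  brain: TalkBrain;
}

// One entry per row of the supported creation matrix.
const SUPPORTED_COMBOS: readonly CreateCombo[] = [
  { method: "talk.client.create", mode: "realtime", transport: "webrtc", brain: "agent-consult" },
  { method: "talk.client.create", mode: "realtime", transport: "provider-websocket", brain: "agent-consult" },
  { method: "talk.session.create", mode: "realtime", transport: "gateway-relay", brain: "agent-consult" },
  { method: "talk.session.create", mode: "transcription", transport: "gateway-relay", brain: "none" },
  { method: "talk.session.create", mode: "stt-tts", transport: "managed-room", brain: "agent-consult" },
  { method: "talk.session.create", mode: "stt-tts", transport: "managed-room", brain: "direct-tools" },
];

// Anything not listed is rejected, which covers ownership mix-ups by construction.
function isSupportedCreate(combo: CreateCombo): boolean {
  return SUPPORTED_COMBOS.some(
    (row) =>
      row.method === combo.method &&
      row.mode === combo.mode &&
      row.transport === combo.transport &&
      row.brain === combo.brain,
  );
}
```

An allowlist keeps the rejection rule implicit: a new combination only becomes valid when someone adds a row.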
## Removed API
Remove these names from handlers, method lists, scopes, protocol schemas,
generated clients, broadcast guards, tests, and docs except explicit migration
tables:
| Removed | Replacement |
| ------------------------------- | -------------------------------------------------------- |
| `talk.realtime.session` | `talk.client.create` |
| `talk.realtime.toolCall` | `talk.client.toolCall` |
| `talk.realtime.relayAudio` | `talk.session.appendAudio` |
| `talk.realtime.relayCancel` | `talk.session.cancelOutput` or `talk.session.cancelTurn` |
| `talk.realtime.relayMark` | internal relay output state |
| `talk.realtime.relayToolResult` | `talk.session.submitToolResult` |
| `talk.realtime.relayClose` | `talk.session.close` |
| `talk.realtime.relay` | `talk.event` |
| `talk.transcription.session` | `talk.session.create({ mode: "transcription" })` |
| `talk.transcription.audio` | `talk.session.appendAudio` |
| `talk.transcription.cancel` | `talk.session.cancelTurn` |
| `talk.transcription.close` | `talk.session.close` |
| `talk.transcription.relay` | `talk.event` |
| `talk.handoff.create` | `talk.session.create({ transport: "managed-room" })` |
| `talk.handoff.join` | `talk.session.join` |
| `talk.handoff.revoke` | `talk.session.close` |
| `talk.session.inputAudio` | `talk.session.appendAudio` |
| `talk.session.control` | explicit turn/output verbs |
| `talk.session.toolResult` | `talk.session.submitToolResult` |
Delete this endpoint:
```text
/voiceclaw/realtime
```
Delete this folder:
```text
src/gateway/voiceclaw-realtime/
```
Do not leave a compatibility namespace around retired code.
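A negative check over an advertised method list is one way to keep the removed names from returning. The helper below is a sketch: `findRemovedTalkNames` is a hypothetical name, and how the real tests obtain the advertised list is not shown here.

```ts
// Removed namespaces and the three removed generic session verbs from this plan.
const REMOVED_TALK_PREFIXES: readonly string[] = [
  "talk.realtime.",
  "talk.transcription.",
  "talk.handoff.",
];
const REMOVED_TALK_EXACT: ReadonlySet<string> = new Set([
  "talk.session.inputAudio",
  "talk.session.control",
  "talk.session.toolResult",
]);

// Returns every advertised name that belongs to the retired API.
function findRemovedTalkNames(advertised: readonly string[]): string[] {
  return advertised.filter(
    (name) =>
      REMOVED_TALK_EXACT.has(name) ||
      REMOVED_TALK_PREFIXES.some((prefix) => name.startsWith(prefix)),
  );
}
```

A test would assert that the result is empty for the generated method and event lists.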
## Target Source Layout
Shared runtime:
```text
src/talk/
  audio-codec.ts
  agent-consult-runtime.ts
  agent-consult-tool.ts
  agent-talkback-runtime.ts
  fast-context-runtime.ts
  provider-registry.ts
  provider-resolver.ts
  provider-types.ts
  session-log-runtime.ts
  session-runtime.ts
  talk-events.ts
  talk-session-controller.ts
```
Gateway adapters:
```text
src/gateway/server-methods/
  talk.ts          # catalog, config, speak, mode, composition
  talk-client.ts   # client-owned provider sessions
  talk-session.ts  # Gateway-owned live sessions
```
Gateway relay helpers can exist while the code moves, but the long-term shape
is that relay, transcription, and handoff state use `src/talk` primitives
instead of each reimplementing turns and events.
Public SDK:
```text
src/plugin-sdk/realtime-voice.ts
```
Keep this SDK subpath as the stable plugin import facade. It may re-export
Talk runtime contracts, and plugin authors should import from it rather than
from core file paths.
## Event Contract
All live paths emit `talk.event` with the envelope defined in
[Talk API and runtime contract](/refactor/talk-api-contract). The required
shape is: `id`, `type`, `sessionId`, `seq`, `timestamp`, `mode`, `transport`,
`brain`, and `payload`, with `turnId`, `captureId`, `callId`, `itemId`, and
`parentId` when the event is tied to turn, capture, provider item, tool call, or
TTS output.
Core event families are `session.*`, `turn.*`, `capture.*`, `input.audio.*`,
`transcript.*`, `output.text.*`, `output.audio.*`, `tool.*`, `usage.metrics`,
`latency.metrics`, and `health.changed`. Payloads must not duplicate large raw
audio frames when the transport already carries them. Text-ready is not
audio-ready; clients enter playback state only on audio events.
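The envelope fields can be read as a type plus a per-session sequencer. This is a hedged sketch: the field types (for example `timestamp` as epoch milliseconds), the `id` format, and concrete event names such as `output.audio.delta` are assumptions; the authoritative shape lives in the API contract.

```ts
type TalkMode = "realtime" | "stt-tts" | "transcription";
type TalkTransport = "webrtc" | "provider-websocket" | "gateway-relay" | "managed-room";
type TalkBrain = "agent-consult" | "direct-tools" | "none";

interface TalkEventSketch {
  id: string;
  type: string;
  sessionId: string;
  seq: number;
  timestamp: number; // assumed: epoch milliseconds
  mode: TalkMode;
  transport: TalkTransport;
  brain: TalkBrain;
  payload: unknown;
  turnId?: string;
  captureId?: string;
  callId?: string;
  itemId?: string;
  parentId?: string;
}

type TalkCorrelationSketch = Pick<TalkEventSketch, "turnId" | "captureId" | "callId" | "itemId" | "parentId">;

interface SessionContextSketch {
  sessionId: string;
  mode: TalkMode;
  transport: TalkTransport;
  brain: TalkBrain;
}

// Stamps each event with a per-session, strictly increasing sequence number.
class TalkEventSequencerSketch {
  private seq = 0;
  private readonly context: SessionContextSketch;

  constructor(context: SessionContextSketch) {
    this.context = context;
  }

  next(type: string, payload: unknown, correlation: TalkCorrelationSketch = {}): TalkEventSketch {
    this.seq += 1;
    return {
      ...correlation,
      id: `${this.context.sessionId}:${this.seq}`, // assumed id format
      type,
      sessionId: this.context.sessionId,
      seq: this.seq,
      timestamp: Date.now(),
      mode: this.context.mode,
      transport: this.context.transport,
      brain: this.context.brain,
      payload,
    };
  }
}

// Text-ready is not audio-ready: only audio events move a client into playback.
function entersPlayback(talkEvent: TalkEventSketch): boolean {
  return talkEvent.type.startsWith("output.audio.");
}
```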
## Cancellation Contract
Cancellation must abort underlying work, not only ignore stale output.
When a turn or session is cancelled:
- provider realtime response is cancelled when supported
- provider session is closed or reset when cancellation cannot be scoped
- streaming STT receives abort
- agent consult receives abort
- queued tools do not start after abort
- already-started side-effecting tools receive abort and report cancellation
- pending TTS jobs are drained
- playback sources are stopped
- relay streams are cleared
- managed-room capture and output state reset
- stale finals and stale audio deltas are ignored
- one terminal cancellation event is emitted
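Part of this contract maps naturally onto `AbortController`. The sketch below covers only three of the rules (abort reaches started work, queued tools do not start, one terminal event); `TurnCancellationSketch` and the `turn.cancelled` event name are hypothetical.

```ts
class TurnCancellationSketch {
  readonly terminalEvents: string[] = [];
  private readonly controller = new AbortController();
  private cancelled = false;

  // Starts a tool only if the turn is still live, and hands it the abort signal.
  startTool(run: (signal: AbortSignal) => void): boolean {
    if (this.controller.signal.aborted) return false; // queued tools do not start after abort
    run(this.controller.signal);
    return true;
  }

  // Aborts underlying work once and emits exactly one terminal event.
  cancel(reason: string): void {
    if (this.cancelled) return;
    this.cancelled = true;
    this.controller.abort();
    this.terminalEvents.push(`turn.cancelled:${reason}`); // assumed event name
  }

  // Stale finals and stale audio deltas are dropped after cancellation.
  acceptOutput(): boolean {
    return !this.controller.signal.aborted;
  }
}
```

Provider, STT, TTS, and relay cleanup would subscribe to the same signal; they are left out to keep the sketch short.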
Barge-in requires real speech: provider speech-started, local VAD, or an
adapter-owned speech detector. Silence, echo, or microphone buffers alone must
not cancel assistant output.
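The barge-in rule reduces to an allowlist of speech evidence. The signal names below are hypothetical labels for the sources named in the paragraph above.

```ts
type SpeechSignal =
  | "provider-speech-started"
  | "local-vad"
  | "adapter-speech-detector"
  | "silence"
  | "echo"
  | "mic-buffer";

// Only these count as real speech.
const REAL_SPEECH_SIGNALS: ReadonlySet<SpeechSignal> = new Set<SpeechSignal>([
  "provider-speech-started",
  "local-vad",
  "adapter-speech-detector",
]);

// Cancels assistant output only when it is playing and real speech was detected.
function shouldCancelOutput(signal: SpeechSignal, assistantSpeaking: boolean): boolean {
  return assistantSpeaking && REAL_SPEECH_SIGNALS.has(signal);
}
```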
## Config Contract
Config stays under `talk`; do not add `talk.speech`. `talk.provider` and
`talk.providers.*` remain speech/STT/TTS provider config. Realtime selectors
live under `talk.realtime`: `provider`, `providers.*`, `model`, `voice`, `mode`,
`transport`, and `brain`.
`talk.config` returns effective config without secrets unless privileged.
`talk.catalog` returns provider capabilities, not inferred provider-id guesses.
Doctor migrates old realtime placement into `talk.realtime`; runtime startup
does not reinterpret Voice Call, STT, or TTS config as realtime config.
## Surface Mapping
| Surface | Talk mapping |
| ------------------------------- | ----------------------------------------------------------------------------------------------------- |
| Browser WebRTC | `talk.client.create`, client-owned provider media, `talk.client.toolCall` for provider tool calls |
| Browser provider WebSocket | `talk.client.create`, browser-owned provider framing, Gateway-owned credentials and policy |
| Browser Gateway relay | `talk.session.create`, `appendAudio`, `submitToolResult`, `cancelOutput`, `close`, and `talk.event` |
| Native push-to-talk | `stt-tts` plus `managed-room`; press/startTurn, release/endTurn, cancel/cancelTurn |
| Walkie-talkie | managed-room join/replacement plus shared turn/output events |
| Voice Call | telephony adapter over Talk events; call ids, stream ids, u-law, marks, clear events stay plugin side |
| Google Meet and future meetings | meeting adapter over Talk events; participants, permissions, mute, echo suppression stay adapter-side |
See [Talk surface mapping](/refactor/talk-surfaces) for the adapter-level
rules.
## Detailed Refactor Phases
### Phase 1: Protocol Is The Source Of Truth
- define final `talk.client.*`, `talk.session.*`, `talk.event`, `talk.catalog`, `talk.config`, `talk.speak`, and `talk.mode`
- delete removed RPCs from method lists and generated metadata
- delete removed event channels from hello feature advertising
- classify every final method in `METHOD_SCOPE_GROUPS`
- regenerate TypeScript and Swift protocol clients
- add protocol tests proving removed names are absent
Exit criteria: generated clients expose only the final public Talk API.
### Phase 2: Shared Runtime Becomes `src/talk`
- move provider-agnostic realtime voice modules into `src/talk`
- keep the plugin SDK facade at `openclaw/plugin-sdk/realtime-voice`
- rename logs and tests from realtime-voice wording to Talk wording where that improves clarity
- centralize event sequencing, active turn state, capture state, output state, stale-turn rejection, and replay history
- keep provider adapters out of this folder
Exit criteria: core and bundled surfaces import shared semantics from `src/talk`
or the SDK facade, not from surface-local helpers.
### Phase 3: Gateway Method Split
- make `talk.ts` a composition point for catalog, config, speak, mode, client, and session handlers
- put client-owned provider session methods in `talk-client.ts`
- put Gateway-owned session methods in `talk-session.ts`
- make relay, transcription, and managed-room handlers thin adapters over shared runtime primitives
- route session replacement notifications to the displaced connection
- reject stale turn completion before mutating active room state
Exit criteria: public RPC handlers read like API adapters, not separate Talk
implementations.
### Phase 4: Browser UI Uses The Final API
- update WebRTC and provider WebSocket startup to `talk.client.create`
- update browser provider tool calls to `talk.client.toolCall`
- update Gateway relay startup to `talk.session.create`
- update relay audio to `talk.session.appendAudio`
- update relay tool result submission to `talk.session.submitToolResult`
- update relay close to `talk.session.close`
- listen only to `talk.event`
- handle aborted consult runs immediately instead of timing out
- gate relay barge-in on speech or VAD
Exit criteria: UI tests contain no calls to removed Talk RPC names.
### Phase 5: Native And Nodes Become Event-Driven
- map native push-to-talk into managed-room sessions
- start, end, cancel, and replace turns through explicit session verbs
- clean capture state when push-to-talk start fails
- keep local STT and TTS as native adapter behavior
- remove chat-history polling from the success path
- keep fallback polling only if there is an explicit degraded-mode test
Exit criteria: native Talk success path is driven by `talk.event`, not hidden
chat side effects.
### Phase 6: Telephony And Meetings Become Adapters
- map Voice Call realtime and streaming STT into Talk event/cancellation semantics
- create or guard a turn before early speech cancellation events
- keep telephony codec, marks, clear events, and call lifecycle outside core
- map Google Meet transcript and assistant output into `talk.event`
- keep participant and echo-suppression behavior in the meeting adapter
- pass abort signals into agent consult and tool runtime
Exit criteria: Voice Call and meetings share event and cancellation semantics
without introducing telephony or meeting branches in core.
### Phase 7: Config And Doctor Cleanup
- keep `talk.provider` and `talk.providers.*` as speech/STT/TTS config
- keep realtime voice selectors under `talk.realtime`
- make `talk.config` return only resolved effective provider data
- repair legacy realtime placement in doctor
- document that runtime startup does not guess or rewrite config
- update SDK migration, Gateway protocol, Talk node, Control UI, and TTS docs
Exit criteria: no second speech namespace, no startup migrations, and no
ambiguous active provider in `talk.config`.
### Phase 8: Delete The Retired Stack
- remove `/voiceclaw/realtime`
- delete `src/gateway/voiceclaw-realtime/`
- remove request-time `instructionsOverride`
- remove old RPC handlers, scopes, broadcast guards, protocol schemas, generated clients, docs, and UI calls
- keep old names only in explicit migration tables and negative tests
Exit criteria: repository search finds removed public names only in migration
notes or tests that assert absence.
## Test And Verification Plan
The full matrix lives in
[Talk refactor execution checklist](/refactor/talk-execution). The required
proof areas are:
- protocol and generated clients expose only the final Talk API
- Gateway tests cover every `talk.client.*` and `talk.session.*` method
- UI tests prove browser WebRTC, provider WebSocket, and relay paths use the final API
- native tests prove managed-room push-to-talk cleanup, replacement, and event flow
- Voice Call and meeting tests prove early speech, barge-in, output state, and cancellation behavior
- config tests prove `talk.config` reports only resolved effective provider data
- architecture searches prove removed RPCs, events, endpoint, folder, and instruction override stay gone
- docs, protocol generation, SDK API checks, Android tests, build, and `pnpm check:changed` pass before push
## Definition Of Done
The refactor is complete when:
- final API is the only advertised public API
- removed RPCs are gone from handlers, scopes, method lists, schemas, generated clients, docs, and UI
- removed event channels are gone
- retired realtime HTTP endpoint is gone
- retired realtime folder is gone
- browser Talk works through `talk.client.*` or `talk.session.*`
- native Talk works through session events
- streaming STT works through `talk.session.*`
- TTS one-shot remains `talk.speak`
- walkie-talkie works through managed-room sessions
- Voice Call and meetings use shared events and cancellation semantics
- cancellation aborts underlying work
- event envelopes are consistent
- config migration is handled by doctor
- tests prove the deleted API cannot accidentally return
Supporting details:
- [Talk API and runtime contract](/refactor/talk-api-contract)
- [Talk surface mapping](/refactor/talk-surfaces)
- [Talk refactor execution checklist](/refactor/talk-execution)
The end state: one Talk system, a small public API, provider-owned vendor
logic, surface-owned IO, and a Gateway core that owns policy, events, sessions,
turns, cancellation, and observability.