mirror of
https://github.com/openclaw/openclaw.git
synced 2026-03-14 19:40:40 +00:00
* refactor discord thread bindings to idle and max-age lifecycle * fix: migrate legacy thread binding expiry and reduce hot-path disk writes * refactor: remove remaining thread-binding ttl legacy paths * fix: harden thread-binding lifecycle persistence * Discord: fix thread binding types in message/reply paths * Infra: handle win32 unknown inode in file identity checks * Infra: relax win32 guarded-open identity checks * Config: migrate threadBindings ttlHours to idleHours * Revert "Infra: relax win32 guarded-open identity checks" This reverts commitde94126771. * Revert "Infra: handle win32 unknown inode in file identity checks" This reverts commit96fc5ddfb3. * Discord: re-read live binding state before sweep unbind * fix: add changelog note for thread binding lifecycle update (#27845) (thanks @osolmaz) --------- Co-authored-by: Onur Solmaz <onur@textcortex.com>
801 lines
29 KiB
Markdown
801 lines
29 KiB
Markdown
---
|
|
summary: "Integrate ACP coding agents via a first-class ACP control plane in core and plugin-backed runtimes (acpx first)"
|
|
owner: "onutc"
|
|
status: "draft"
|
|
last_updated: "2026-02-25"
|
|
title: "ACP Thread Bound Agents"
|
|
---
|
|
|
|
# ACP Thread Bound Agents
|
|
|
|
## Overview
|
|
|
|
This plan defines how OpenClaw should support ACP coding agents in thread-capable channels (Discord first) with production-level lifecycle and recovery.
|
|
|
|
Related document:
|
|
|
|
- [Unified Runtime Streaming Refactor Plan](/experiments/plans/acp-unified-streaming-refactor)
|
|
|
|
Target user experience:
|
|
|
|
- a user spawns or focuses an ACP session into a thread
|
|
- user messages in that thread route to the bound ACP session
|
|
- agent output streams back to the same thread persona
|
|
- session can be persistent or one shot with explicit cleanup controls
|
|
|
|
## Decision summary
|
|
|
|
Long term recommendation is a hybrid architecture:
|
|
|
|
- OpenClaw core owns ACP control plane concerns
|
|
- session identity and metadata
|
|
- thread binding and routing decisions
|
|
- delivery invariants and duplicate suppression
|
|
- lifecycle cleanup and recovery semantics
|
|
- ACP runtime backend is pluggable
|
|
- first backend is an acpx-backed plugin service
|
|
- runtime does ACP transport, queueing, cancel, reconnect
|
|
|
|
OpenClaw should not reimplement ACP transport internals in core.
|
|
OpenClaw should not rely on a pure plugin-only interception path for routing.
|
|
|
|
## North-star architecture (holy grail)
|
|
|
|
Treat ACP as a first-class control plane in OpenClaw, with pluggable runtime adapters.
|
|
|
|
Non-negotiable invariants:
|
|
|
|
- every ACP thread binding references a valid ACP session record
|
|
- every ACP session has explicit lifecycle state (`creating`, `idle`, `running`, `cancelling`, `closed`, `error`)
|
|
- every ACP run has explicit run state (`queued`, `running`, `completed`, `failed`, `cancelled`)
|
|
- spawn, bind, and initial enqueue are atomic
|
|
- command retries are idempotent (no duplicate runs or duplicate Discord outputs)
|
|
- bound-thread channel output is a projection of ACP run events, never ad-hoc side effects
|
|
|
|
Long-term ownership model:
|
|
|
|
- `AcpSessionManager` is the single ACP writer and orchestrator
|
|
- manager lives in gateway process first; can be moved to a dedicated sidecar later behind the same interface
|
|
- per ACP session key, manager owns one in-memory actor (serialized command execution)
|
|
- adapters (`acpx`, future backends) are transport/runtime implementations only
|
|
|
|
Long-term persistence model:
|
|
|
|
- move ACP control-plane state to a dedicated SQLite store (WAL mode) under OpenClaw state dir
|
|
- keep `SessionEntry.acp` as compatibility projection during migration, not source-of-truth
|
|
- store ACP events append-only to support replay, crash recovery, and deterministic delivery
|
|
|
|
### Delivery strategy (bridge to holy-grail)
|
|
|
|
- short-term bridge
|
|
- keep current thread binding mechanics and existing ACP config surface
|
|
- fix metadata-gap bugs and route ACP turns through a single core ACP branch
|
|
- add idempotency keys and fail-closed routing checks immediately
|
|
- long-term cutover
|
|
- move ACP source-of-truth to control-plane DB + actors
|
|
- make bound-thread delivery purely event-projection based
|
|
- remove legacy fallback behavior that depends on opportunistic session-entry metadata
|
|
|
|
## Why not pure plugin only
|
|
|
|
Current plugin hooks are not sufficient for end to end ACP session routing without core changes.
|
|
|
|
- inbound routing from thread binding resolves to a session key in core dispatch first
|
|
- message hooks are fire-and-forget and cannot short-circuit the main reply path
|
|
- plugin commands are good for control operations but not for replacing core per-turn dispatch flow
|
|
|
|
Result:
|
|
|
|
- ACP runtime can be pluginized
|
|
- ACP routing branch must exist in core
|
|
|
|
## Existing foundation to reuse
|
|
|
|
Already implemented and should remain canonical:
|
|
|
|
- thread binding target supports `subagent` and `acp`
|
|
- inbound thread routing override resolves by binding before normal dispatch
|
|
- outbound thread identity via webhook in reply delivery
|
|
- `/focus` and `/unfocus` flow with ACP target compatibility
|
|
- persistent binding store with restore on startup
|
|
- unbind lifecycle on archive, delete, unfocus, reset, and delete
|
|
|
|
This plan extends that foundation rather than replacing it.
|
|
|
|
## Architecture
|
|
|
|
### Boundary model
|
|
|
|
Core (must be in OpenClaw core):
|
|
|
|
- ACP session-mode dispatch branch in the reply pipeline
|
|
- delivery arbitration to avoid parent plus thread duplication
|
|
- ACP control-plane persistence (with `SessionEntry.acp` compatibility projection during migration)
|
|
- lifecycle unbind and runtime detach semantics tied to session reset/delete
|
|
|
|
Plugin backend (acpx implementation):
|
|
|
|
- ACP runtime worker supervision
|
|
- acpx process invocation and event parsing
|
|
- ACP command handlers (`/acp ...`) and operator UX
|
|
- backend-specific config defaults and diagnostics
|
|
|
|
### Runtime ownership model
|
|
|
|
- one gateway process owns ACP orchestration state
|
|
- ACP execution runs in supervised child processes via acpx backend
|
|
- process strategy is long lived per active ACP session key, not per message
|
|
|
|
This avoids startup cost on every prompt and keeps cancel and reconnect semantics reliable.
|
|
|
|
### Core runtime contract
|
|
|
|
Add a core ACP runtime contract so routing code does not depend on CLI details and can switch backends without changing dispatch logic:
|
|
|
|
```ts
|
|
export type AcpRuntimePromptMode = "prompt" | "steer";
|
|
|
|
export type AcpRuntimeHandle = {
|
|
sessionKey: string;
|
|
backend: string;
|
|
runtimeSessionName: string;
|
|
};
|
|
|
|
export type AcpRuntimeEvent =
|
|
| { type: "text_delta"; stream: "output" | "thought"; text: string }
|
|
| { type: "tool_call"; name: string; argumentsText: string }
|
|
| { type: "done"; usage?: Record<string, number> }
|
|
| { type: "error"; code: string; message: string; retryable?: boolean };
|
|
|
|
export interface AcpRuntime {
|
|
ensureSession(input: {
|
|
sessionKey: string;
|
|
agent: string;
|
|
mode: "persistent" | "oneshot";
|
|
cwd?: string;
|
|
env?: Record<string, string>;
|
|
idempotencyKey: string;
|
|
}): Promise<AcpRuntimeHandle>;
|
|
|
|
submit(input: {
|
|
handle: AcpRuntimeHandle;
|
|
text: string;
|
|
mode: AcpRuntimePromptMode;
|
|
idempotencyKey: string;
|
|
}): Promise<{ runtimeRunId: string }>;
|
|
|
|
stream(input: {
|
|
handle: AcpRuntimeHandle;
|
|
runtimeRunId: string;
|
|
onEvent: (event: AcpRuntimeEvent) => Promise<void> | void;
|
|
signal?: AbortSignal;
|
|
}): Promise<void>;
|
|
|
|
cancel(input: {
|
|
handle: AcpRuntimeHandle;
|
|
runtimeRunId?: string;
|
|
reason?: string;
|
|
idempotencyKey: string;
|
|
}): Promise<void>;
|
|
|
|
close(input: { handle: AcpRuntimeHandle; reason: string; idempotencyKey: string }): Promise<void>;
|
|
|
|
health?(): Promise<{ ok: boolean; details?: string }>;
|
|
}
|
|
```
|
|
|
|
Implementation detail:
|
|
|
|
- first backend: `AcpxRuntime` shipped as a plugin service
|
|
- core resolves runtime via registry and fails with explicit operator error when no ACP runtime backend is available
|
|
|
|
### Control-plane data model and persistence
|
|
|
|
Long-term source-of-truth is a dedicated ACP SQLite database (WAL mode), for transactional updates and crash-safe recovery:
|
|
|
|
- `acp_sessions`
|
|
- `session_key` (pk), `backend`, `agent`, `mode`, `cwd`, `state`, `created_at`, `updated_at`, `last_error`
|
|
- `acp_runs`
|
|
- `run_id` (pk), `session_key` (fk), `state`, `requester_message_id`, `idempotency_key`, `started_at`, `ended_at`, `error_code`, `error_message`
|
|
- `acp_bindings`
|
|
- `binding_key` (pk), `thread_id`, `channel_id`, `account_id`, `session_key` (fk), `expires_at`, `bound_at`
|
|
- `acp_events`
|
|
- `event_id` (pk), `run_id` (fk), `seq`, `kind`, `payload_json`, `created_at`
|
|
- `acp_delivery_checkpoint`
|
|
- `run_id` (pk/fk), `last_event_seq`, `last_discord_message_id`, `updated_at`
|
|
- `acp_idempotency`
|
|
- `scope`, `idempotency_key`, `result_json`, `created_at`, unique `(scope, idempotency_key)`
|
|
|
|
```ts
|
|
export type AcpSessionMeta = {
|
|
backend: string;
|
|
agent: string;
|
|
runtimeSessionName: string;
|
|
mode: "persistent" | "oneshot";
|
|
cwd?: string;
|
|
state: "idle" | "running" | "error";
|
|
lastActivityAt: number;
|
|
lastError?: string;
|
|
};
|
|
```
|
|
|
|
Storage rules:
|
|
|
|
- keep `SessionEntry.acp` as a compatibility projection during migration
|
|
- process ids and sockets stay in memory only
|
|
- durable lifecycle and run status live in ACP DB, not generic session JSON
|
|
- if runtime owner dies, gateway rehydrates from ACP DB and resumes from checkpoints
|
|
|
|
### Routing and delivery
|
|
|
|
Inbound:
|
|
|
|
- keep current thread binding lookup as first routing step
|
|
- if bound target is ACP session, route to ACP runtime branch instead of `getReplyFromConfig`
|
|
- explicit `/acp steer` command uses `mode: "steer"`
|
|
|
|
Outbound:
|
|
|
|
- ACP event stream is normalized to OpenClaw reply chunks
|
|
- delivery target is resolved through existing bound destination path
|
|
- when a bound thread is active for that session turn, parent channel completion is suppressed
|
|
|
|
Streaming policy:
|
|
|
|
- stream partial output with coalescing window
|
|
- configurable min interval and max chunk bytes to stay under Discord rate limits
|
|
- final message always emitted on completion or failure
|
|
|
|
### State machines and transaction boundaries
|
|
|
|
Session state machine:
|
|
|
|
- `creating -> idle -> running -> idle`
|
|
- `running -> cancelling -> idle | error`
|
|
- `idle -> closed`
|
|
- `error -> idle | closed`
|
|
|
|
Run state machine:
|
|
|
|
- `queued -> running -> completed`
|
|
- `running -> failed | cancelled`
|
|
- `queued -> cancelled`
|
|
|
|
Required transaction boundaries:
|
|
|
|
- spawn transaction
|
|
- create ACP session row
|
|
- create/update ACP thread binding row
|
|
- enqueue initial run row
|
|
- close transaction
|
|
- mark session closed
|
|
- delete/expire binding rows
|
|
- write final close event
|
|
- cancel transaction
|
|
- mark target run cancelling/cancelled with idempotency key
|
|
|
|
No partial success is allowed across these boundaries.
|
|
|
|
### Per-session actor model
|
|
|
|
`AcpSessionManager` runs one actor per ACP session key:
|
|
|
|
- actor mailbox serializes `submit`, `cancel`, `close`, and `stream` side effects
|
|
- actor owns runtime handle hydration and runtime adapter process lifecycle for that session
|
|
- actor writes run events in-order (`seq`) before any Discord delivery
|
|
- actor updates delivery checkpoints after successful outbound send
|
|
|
|
This removes cross-turn races and prevents duplicate or out-of-order thread output.
|
|
|
|
### Idempotency and delivery projection
|
|
|
|
All external ACP actions must carry idempotency keys:
|
|
|
|
- spawn idempotency key
|
|
- prompt/steer idempotency key
|
|
- cancel idempotency key
|
|
- close idempotency key
|
|
|
|
Delivery rules:
|
|
|
|
- Discord messages are derived from `acp_events` plus `acp_delivery_checkpoint`
|
|
- retries resume from checkpoint without re-sending already delivered chunks
|
|
- final reply emission is exactly-once per run from projection logic
|
|
|
|
### Recovery and self-healing
|
|
|
|
On gateway start:
|
|
|
|
- load non-terminal ACP sessions (`creating`, `idle`, `running`, `cancelling`, `error`)
|
|
- recreate actors lazily on first inbound event or eagerly under configured cap
|
|
- reconcile any `running` runs missing heartbeats and mark `failed` or recover via adapter
|
|
|
|
On inbound Discord thread message:
|
|
|
|
- if binding exists but ACP session is missing, fail closed with explicit stale-binding message
|
|
- optionally auto-unbind stale binding after operator-safe validation
|
|
- never silently route stale ACP bindings to normal LLM path
|
|
|
|
### Lifecycle and safety
|
|
|
|
Supported operations:
|
|
|
|
- cancel current run: `/acp cancel`
|
|
- unbind thread: `/unfocus`
|
|
- close ACP session: `/acp close`
|
|
- auto close idle sessions by effective TTL
|
|
|
|
TTL policy:
|
|
|
|
- effective TTL is minimum of
|
|
- global/session TTL
|
|
- Discord thread binding TTL
|
|
- ACP runtime owner TTL
|
|
|
|
Safety controls:
|
|
|
|
- allowlist ACP agents by name
|
|
- restrict workspace roots for ACP sessions
|
|
- env allowlist passthrough
|
|
- max concurrent ACP sessions per account and globally
|
|
- bounded restart backoff for runtime crashes
|
|
|
|
## Config surface
|
|
|
|
Core keys:
|
|
|
|
- `acp.enabled`
|
|
- `acp.dispatch.enabled` (independent ACP routing kill switch)
|
|
- `acp.backend` (default `acpx`)
|
|
- `acp.defaultAgent`
|
|
- `acp.allowedAgents[]`
|
|
- `acp.maxConcurrentSessions`
|
|
- `acp.stream.coalesceIdleMs`
|
|
- `acp.stream.maxChunkChars`
|
|
- `acp.runtime.ttlMinutes`
|
|
- `acp.controlPlane.store` (`sqlite` default)
|
|
- `acp.controlPlane.storePath`
|
|
- `acp.controlPlane.recovery.eagerActors`
|
|
- `acp.controlPlane.recovery.reconcileRunningAfterMs`
|
|
- `acp.controlPlane.checkpoint.flushEveryEvents`
|
|
- `acp.controlPlane.checkpoint.flushEveryMs`
|
|
- `acp.idempotency.ttlHours`
|
|
- `channels.discord.threadBindings.spawnAcpSessions`
|
|
|
|
Plugin/backend keys (acpx plugin section):
|
|
|
|
- backend command/path overrides
|
|
- backend env allowlist
|
|
- backend per-agent presets
|
|
- backend startup/stop timeouts
|
|
- backend max inflight runs per session
|
|
|
|
## Implementation specification
|
|
|
|
### Control-plane modules (new)
|
|
|
|
Add dedicated ACP control-plane modules in core:
|
|
|
|
- `src/acp/control-plane/manager.ts`
|
|
- owns ACP actors, lifecycle transitions, command serialization
|
|
- `src/acp/control-plane/store.ts`
|
|
- SQLite schema management, transactions, query helpers
|
|
- `src/acp/control-plane/events.ts`
|
|
- typed ACP event definitions and serialization
|
|
- `src/acp/control-plane/checkpoint.ts`
|
|
- durable delivery checkpoints and replay cursors
|
|
- `src/acp/control-plane/idempotency.ts`
|
|
- idempotency key reservation and response replay
|
|
- `src/acp/control-plane/recovery.ts`
|
|
- boot-time reconciliation and actor rehydrate plan
|
|
|
|
Compatibility bridge modules:
|
|
|
|
- `src/acp/runtime/session-meta.ts`
|
|
- remains temporarily for projection into `SessionEntry.acp`
|
|
- must stop being source-of-truth after migration cutover
|
|
|
|
### Required invariants (must enforce in code)
|
|
|
|
- ACP session creation and thread bind are atomic (single transaction)
|
|
- there is at most one active run per ACP session actor at a time
|
|
- event `seq` is strictly increasing per run
|
|
- delivery checkpoint never advances past last committed event
|
|
- idempotency replay returns previous success payload for duplicate command keys
|
|
- stale/missing ACP metadata cannot route into normal non-ACP reply path
|
|
|
|
### Core touchpoints
|
|
|
|
Core files to change:
|
|
|
|
- `src/auto-reply/reply/dispatch-from-config.ts`
|
|
- ACP branch calls `AcpSessionManager.submit` and event-projection delivery
|
|
- remove direct ACP fallback that bypasses control-plane invariants
|
|
- `src/auto-reply/reply/inbound-context.ts` (or nearest normalized context boundary)
|
|
- expose normalized routing keys and idempotency seeds for ACP control plane
|
|
- `src/config/sessions/types.ts`
|
|
- keep `SessionEntry.acp` as projection-only compatibility field
|
|
- `src/gateway/server-methods/sessions.ts`
|
|
- reset/delete/archive must call ACP manager close/unbind transaction path
|
|
- `src/infra/outbound/bound-delivery-router.ts`
|
|
- enforce fail-closed destination behavior for ACP bound session turns
|
|
- `src/discord/monitor/thread-bindings.ts`
|
|
- add ACP stale-binding validation helpers wired to control-plane lookups
|
|
- `src/auto-reply/reply/commands-acp.ts`
|
|
- route spawn/cancel/close/steer through ACP manager APIs
|
|
- `src/agents/acp-spawn.ts`
|
|
- stop ad-hoc metadata writes; call ACP manager spawn transaction
|
|
- `src/plugin-sdk/**` and plugin runtime bridge
|
|
- expose ACP backend registration and health semantics cleanly
|
|
|
|
Core files explicitly not replaced:
|
|
|
|
- `src/discord/monitor/message-handler.preflight.ts`
|
|
- keep thread binding override behavior as the canonical session-key resolver
|
|
|
|
### ACP runtime registry API
|
|
|
|
Add a core registry module:
|
|
|
|
- `src/acp/runtime/registry.ts`
|
|
|
|
Required API:
|
|
|
|
```ts
|
|
export type AcpRuntimeBackend = {
|
|
id: string;
|
|
runtime: AcpRuntime;
|
|
healthy?: () => boolean;
|
|
};
|
|
|
|
export function registerAcpRuntimeBackend(backend: AcpRuntimeBackend): void;
|
|
export function unregisterAcpRuntimeBackend(id: string): void;
|
|
export function getAcpRuntimeBackend(id?: string): AcpRuntimeBackend | null;
|
|
export function requireAcpRuntimeBackend(id?: string): AcpRuntimeBackend;
|
|
```
|
|
|
|
Behavior:
|
|
|
|
- `requireAcpRuntimeBackend` throws a typed ACP backend missing error when unavailable
|
|
- plugin service registers backend on `start` and unregisters on `stop`
|
|
- runtime lookups are read-only and process-local
|
|
|
|
### acpx runtime plugin contract (implementation detail)
|
|
|
|
For the first production backend (`extensions/acpx`), OpenClaw and acpx are
|
|
connected with a strict command contract:
|
|
|
|
- backend id: `acpx`
|
|
- plugin service id: `acpx-runtime`
|
|
- runtime handle encoding: `runtimeSessionName = acpx:v1:<base64url(json)>`
|
|
- encoded payload fields:
|
|
- `name` (acpx named session; uses OpenClaw `sessionKey`)
|
|
- `agent` (acpx agent command)
|
|
- `cwd` (session workspace root)
|
|
- `mode` (`persistent | oneshot`)
|
|
|
|
Command mapping:
|
|
|
|
- ensure session:
|
|
- `acpx --format json --json-strict --cwd <cwd> <agent> sessions ensure --name <name>`
|
|
- prompt turn:
|
|
- `acpx --format json --json-strict --cwd <cwd> <agent> prompt --session <name> --file -`
|
|
- cancel:
|
|
- `acpx --format json --json-strict --cwd <cwd> <agent> cancel --session <name>`
|
|
- close:
|
|
- `acpx --format json --json-strict --cwd <cwd> <agent> sessions close <name>`
|
|
|
|
Streaming:
|
|
|
|
- OpenClaw consumes ndjson events from `acpx --format json --json-strict`
|
|
- `text` => `text_delta/output`
|
|
- `thought` => `text_delta/thought`
|
|
- `tool_call` => `tool_call`
|
|
- `done` => `done`
|
|
- `error` => `error`
|
|
|
|
### Session schema patch
|
|
|
|
Patch `SessionEntry` in `src/config/sessions/types.ts`:
|
|
|
|
```ts
|
|
type SessionAcpMeta = {
|
|
backend: string;
|
|
agent: string;
|
|
runtimeSessionName: string;
|
|
mode: "persistent" | "oneshot";
|
|
cwd?: string;
|
|
state: "idle" | "running" | "error";
|
|
lastActivityAt: number;
|
|
lastError?: string;
|
|
};
|
|
```
|
|
|
|
Persisted field:
|
|
|
|
- `SessionEntry.acp?: SessionAcpMeta`
|
|
|
|
Migration rules:
|
|
|
|
- phase A: dual-write (`acp` projection + ACP SQLite source-of-truth)
|
|
- phase B: read-primary from ACP SQLite, fallback-read from legacy `SessionEntry.acp`
|
|
- phase C: migration command backfills missing ACP rows from valid legacy entries
|
|
- phase D: remove fallback-read and keep projection optional for UX only
|
|
- legacy fields (`cliSessionIds`, `claudeCliSessionId`) remain untouched
|
|
|
|
### Error contract
|
|
|
|
Add stable ACP error codes and user-facing messages:
|
|
|
|
- `ACP_BACKEND_MISSING`
|
|
- message: `ACP runtime backend is not configured. Install and enable the acpx runtime plugin.`
|
|
- `ACP_BACKEND_UNAVAILABLE`
|
|
- message: `ACP runtime backend is currently unavailable. Try again in a moment.`
|
|
- `ACP_SESSION_INIT_FAILED`
|
|
- message: `Could not initialize ACP session runtime.`
|
|
- `ACP_TURN_FAILED`
|
|
- message: `ACP turn failed before completion.`
|
|
|
|
Rules:
|
|
|
|
- return actionable user-safe message in-thread
|
|
- log detailed backend/system error only in runtime logs
|
|
- never silently fall back to normal LLM path when ACP routing was explicitly selected
|
|
|
|
### Duplicate delivery arbitration
|
|
|
|
Single routing rule for ACP bound turns:
|
|
|
|
- if an active thread binding exists for the target ACP session and requester context, deliver only to that bound thread
|
|
- do not also send to parent channel for the same turn
|
|
- if bound destination selection is ambiguous, fail closed with explicit error (no implicit parent fallback)
|
|
- if no active binding exists, use normal session destination behavior
|
|
|
|
### Observability and operational readiness
|
|
|
|
Required metrics:
|
|
|
|
- ACP spawn success/failure count by backend and error code
|
|
- ACP run latency percentiles (queue wait, runtime turn time, delivery projection time)
|
|
- ACP actor restart count and restart reason
|
|
- stale-binding detection count
|
|
- idempotency replay hit rate
|
|
- Discord delivery retry and rate-limit counters
|
|
|
|
Required logs:
|
|
|
|
- structured logs keyed by `sessionKey`, `runId`, `backend`, `threadId`, `idempotencyKey`
|
|
- explicit state transition logs for session and run state machines
|
|
- adapter command logs with redaction-safe arguments and exit summary
|
|
|
|
Required diagnostics:
|
|
|
|
- `/acp sessions` includes state, active run, last error, and binding status
|
|
- `/acp doctor` (or equivalent) validates backend registration, store health, and stale bindings
|
|
|
|
### Config precedence and effective values
|
|
|
|
ACP enablement precedence:
|
|
|
|
- account override: `channels.discord.accounts.<id>.threadBindings.spawnAcpSessions`
|
|
- channel override: `channels.discord.threadBindings.spawnAcpSessions`
|
|
- global ACP gate: `acp.enabled`
|
|
- dispatch gate: `acp.dispatch.enabled`
|
|
- backend availability: registered backend for `acp.backend`
|
|
|
|
Auto-enable behavior:
|
|
|
|
- when ACP is configured (`acp.enabled=true`, `acp.dispatch.enabled=true`, or
|
|
`acp.backend=acpx`), plugin auto-enable marks `plugins.entries.acpx.enabled=true`
|
|
unless denylisted or explicitly disabled
|
|
|
|
TTL effective value:
|
|
|
|
- `min(session ttl, discord thread binding ttl, acp runtime ttl)`
|
|
|
|
### Test map
|
|
|
|
Unit tests:
|
|
|
|
- `src/acp/runtime/registry.test.ts` (new)
|
|
- `src/auto-reply/reply/dispatch-from-config.acp.test.ts` (new)
|
|
- `src/infra/outbound/bound-delivery-router.test.ts` (extend ACP fail-closed cases)
|
|
- `src/config/sessions/types.test.ts` or nearest session-store tests (ACP metadata persistence)
|
|
|
|
Integration tests:
|
|
|
|
- `src/discord/monitor/reply-delivery.test.ts` (bound ACP delivery target behavior)
|
|
- `src/discord/monitor/message-handler.preflight*.test.ts` (bound ACP session-key routing continuity)
|
|
- acpx plugin runtime tests in backend package (service register/start/stop + event normalization)
|
|
|
|
Gateway e2e tests:
|
|
|
|
- `src/gateway/server.sessions.gateway-server-sessions-a.e2e.test.ts` (extend ACP reset/delete lifecycle coverage)
|
|
- ACP thread turn roundtrip e2e for spawn, message, stream, cancel, unfocus, restart recovery
|
|
|
|
### Rollout guard
|
|
|
|
Add independent ACP dispatch kill switch:
|
|
|
|
- `acp.dispatch.enabled` default `false` for first release
|
|
- when disabled:
|
|
- ACP spawn/focus control commands may still bind sessions
|
|
- ACP dispatch path does not activate
|
|
- user receives explicit message that ACP dispatch is disabled by policy
|
|
- after canary validation, default can be flipped to `true` in a later release
|
|
|
|
## Command and UX plan
|
|
|
|
### New commands
|
|
|
|
- `/acp spawn <agent-id> [--mode persistent|oneshot] [--thread auto|here|off]`
|
|
- `/acp cancel [session]`
|
|
- `/acp steer <instruction>`
|
|
- `/acp close [session]`
|
|
- `/acp sessions`
|
|
|
|
### Existing command compatibility
|
|
|
|
- `/focus <sessionKey>` continues to support ACP targets
|
|
- `/unfocus` keeps current semantics
|
|
- `/session idle` and `/session max-age` replace the old TTL override
|
|
|
|
## Phased rollout
|
|
|
|
### Phase 0 ADR and schema freeze
|
|
|
|
- ship ADR for ACP control-plane ownership and adapter boundaries
|
|
- freeze DB schema (`acp_sessions`, `acp_runs`, `acp_bindings`, `acp_events`, `acp_delivery_checkpoint`, `acp_idempotency`)
|
|
- define stable ACP error codes, event contract, and state-transition guards
|
|
|
|
### Phase 1 Control-plane foundation in core
|
|
|
|
- implement `AcpSessionManager` and per-session actor runtime
|
|
- implement ACP SQLite store and transaction helpers
|
|
- implement idempotency store and replay helpers
|
|
- implement event append + delivery checkpoint modules
|
|
- wire spawn/cancel/close APIs to manager with transactional guarantees
|
|
|
|
### Phase 2 Core routing and lifecycle integration
|
|
|
|
- route thread-bound ACP turns from dispatch pipeline into ACP manager
|
|
- enforce fail-closed routing when ACP binding/session invariants fail
|
|
- integrate reset/delete/archive/unfocus lifecycle with ACP close/unbind transactions
|
|
- add stale-binding detection and optional auto-unbind policy
|
|
|
|
### Phase 3 acpx backend adapter/plugin
|
|
|
|
- implement `acpx` adapter against runtime contract (`ensureSession`, `submit`, `stream`, `cancel`, `close`)
|
|
- add backend health checks and startup/teardown registration
|
|
- normalize acpx ndjson events into ACP runtime events
|
|
- enforce backend timeouts, process supervision, and restart/backoff policy
|
|
|
|
### Phase 4 Delivery projection and channel UX (Discord first)
|
|
|
|
- implement event-driven channel projection with checkpoint resume (Discord first)
|
|
- coalesce streaming chunks with rate-limit aware flush policy
|
|
- guarantee exactly-once final completion message per run
|
|
- ship `/acp spawn`, `/acp cancel`, `/acp steer`, `/acp close`, `/acp sessions`
|
|
|
|
### Phase 5 Migration and cutover
|
|
|
|
- introduce dual-write to `SessionEntry.acp` projection plus ACP SQLite source-of-truth
|
|
- add migration utility for legacy ACP metadata rows
|
|
- flip read path to ACP SQLite primary
|
|
- remove legacy fallback routing that depends on missing `SessionEntry.acp`
|
|
|
|
### Phase 6 Hardening, SLOs, and scale limits
|
|
|
|
- enforce concurrency limits (global/account/session), queue policies, and timeout budgets
|
|
- add full telemetry, dashboards, and alert thresholds
|
|
- chaos-test crash recovery and duplicate-delivery suppression
|
|
- publish runbook for backend outage, DB corruption, and stale-binding remediation
|
|
|
|
### Full implementation checklist
|
|
|
|
- core control-plane modules and tests
|
|
- DB migrations and rollback plan
|
|
- ACP manager API integration across dispatch and commands
|
|
- adapter registration interface in plugin runtime bridge
|
|
- acpx adapter implementation and tests
|
|
- thread-capable channel delivery projection logic with checkpoint replay (Discord first)
|
|
- lifecycle hooks for reset/delete/archive/unfocus
|
|
- stale-binding detector and operator-facing diagnostics
|
|
- config validation and precedence tests for all new ACP keys
|
|
- operational docs and troubleshooting runbook
|
|
|
|
## Test plan
|
|
|
|
Unit tests:
|
|
|
|
- ACP DB transaction boundaries (spawn/bind/enqueue atomicity, cancel, close)
|
|
- ACP state-machine transition guards for sessions and runs
|
|
- idempotency reservation/replay semantics across all ACP commands
|
|
- per-session actor serialization and queue ordering
|
|
- acpx event parser and chunk coalescer
|
|
- runtime supervisor restart and backoff policy
|
|
- config precedence and effective TTL calculation
|
|
- core ACP routing branch selection and fail-closed behavior when backend/session is invalid
|
|
|
|
Integration tests:
|
|
|
|
- fake ACP adapter process for deterministic streaming and cancel behavior
|
|
- ACP manager + dispatch integration with transactional persistence
|
|
- thread-bound inbound routing to ACP session key
|
|
- thread-bound outbound delivery suppresses parent channel duplication
|
|
- checkpoint replay recovers after delivery failure and resumes from last event
|
|
- plugin service registration and teardown of ACP runtime backend
|
|
|
|
Gateway e2e tests:
|
|
|
|
- spawn ACP with thread, exchange multi-turn prompts, unfocus
|
|
- gateway restart with persisted ACP DB and bindings, then continue same session
|
|
- concurrent ACP sessions in multiple threads have no cross-talk
|
|
- duplicate command retries (same idempotency key) do not create duplicate runs or replies
|
|
- stale-binding scenario yields explicit error and optional auto-clean behavior
|
|
|
|
## Risks and mitigations
|
|
|
|
- Duplicate deliveries during transition
|
|
- Mitigation: single destination resolver and idempotent event checkpoint
|
|
- Runtime process churn under load
|
|
- Mitigation: long lived per session owners + concurrency caps + backoff
|
|
- Plugin absent or misconfigured
|
|
- Mitigation: explicit operator-facing error and fail-closed ACP routing (no implicit fallback to normal session path)
|
|
- Config confusion between subagent and ACP gates
|
|
- Mitigation: explicit ACP keys and command feedback that includes effective policy source
|
|
- Control-plane store corruption or migration bugs
|
|
- Mitigation: WAL mode, backup/restore hooks, migration smoke tests, and read-only fallback diagnostics
|
|
- Actor deadlocks or mailbox starvation
|
|
- Mitigation: watchdog timers, actor health probes, and bounded mailbox depth with rejection telemetry
|
|
|
|
## Acceptance checklist
|
|
|
|
- ACP session spawn can create or bind a thread in a supported channel adapter (currently Discord)
|
|
- all thread messages route to bound ACP session only
|
|
- ACP outputs appear in the same thread identity with streaming or batches
|
|
- no duplicate output in parent channel for bound turns
|
|
- spawn+bind+initial enqueue are atomic in persistent store
|
|
- ACP command retries are idempotent and do not duplicate runs or outputs
|
|
- cancel, close, unfocus, archive, reset, and delete perform deterministic cleanup
|
|
- crash restart preserves mapping and resumes multi turn continuity
|
|
- concurrent thread bound ACP sessions work independently
|
|
- ACP backend missing state produces clear actionable error
|
|
- stale bindings are detected and surfaced explicitly (with optional safe auto-clean)
|
|
- control-plane metrics and diagnostics are available for operators
|
|
- new unit, integration, and e2e coverage passes
|
|
|
|
## Addendum: targeted refactors for current implementation (status)
|
|
|
|
These are non-blocking follow-ups to keep the ACP path maintainable after the current feature set lands.
|
|
|
|
### 1) Centralize ACP dispatch policy evaluation (completed)
|
|
|
|
- implemented via shared ACP policy helpers in `src/acp/policy.ts`
|
|
- dispatch, ACP command lifecycle handlers, and ACP spawn path now consume shared policy logic
|
|
|
|
### 2) Split ACP command handler by subcommand domain (completed)
|
|
|
|
- `src/auto-reply/reply/commands-acp.ts` is now a thin router
|
|
- subcommand behavior is split into:
|
|
- `src/auto-reply/reply/commands-acp/lifecycle.ts`
|
|
- `src/auto-reply/reply/commands-acp/runtime-options.ts`
|
|
- `src/auto-reply/reply/commands-acp/diagnostics.ts`
|
|
- shared helpers in `src/auto-reply/reply/commands-acp/shared.ts`
|
|
|
|
### 3) Split ACP session manager by responsibility (completed)
|
|
|
|
- manager is split into:
|
|
- `src/acp/control-plane/manager.ts` (public facade + singleton)
|
|
- `src/acp/control-plane/manager.core.ts` (manager implementation)
|
|
- `src/acp/control-plane/manager.types.ts` (manager types/deps)
|
|
- `src/acp/control-plane/manager.utils.ts` (normalization + helper functions)
|
|
|
|
### 4) Optional acpx runtime adapter cleanup
|
|
|
|
- `extensions/acpx/src/runtime.ts` can be split into:
|
|
- process execution/supervision
|
|
- ndjson event parsing/normalization
|
|
- runtime API surface (`submit`, `cancel`, `close`, etc.)
|
|
- improves testability and makes backend behavior easier to audit
|