* refactor: extract agent core package Introduce packages/agent-core as the OpenClaw-owned home for reusable agent loop, harness, session, prompt, and runtime dependency contracts. * refactor: extract shared llm runtime Move provider model registries, stream wrappers, OAuth helpers, and LLM utilities into src/llm with plugin-sdk barrels instead of depending on the old embedded runtime layout. * refactor: remove pi runtime internals Rename remaining Pi-shaped agent surfaces to OpenClaw agent runtime names, delete obsolete Pi docs and package graph checks, and add the third-party notice for incorporated code. * refactor: tighten agent session runtime Make agent-core/runtime dependencies explicit, consolidate compaction and session transcript helpers, and move model/session helpers behind OpenClaw-owned contracts. * refactor: remove static model and pi auth paths Drop static model catalogs and Pi auth bridges, move model/provider facts to manifest-owned runtime contracts, and harden internal embedded-agent utilities. * refactor: remove legacy provider compat paths * docs: remove agent parity notes * fix: skip provider wildcard metadata parsing * refactor: share session extension sdk loading * refactor: inline acpx proxy error formatter * refactor: fold edit recovery into edit tool * fix: accept extension batch separator * test: align startup provider plugin expectations * fix: restore provider-scoped release discovery * test: align static asset packaging expectations * fix: run static provider catalogs during scoped discovery * fix: add provider entry catalogs for scoped live discovery * fix: load lightweight provider catalog entries * fix: refresh provider-scoped plugin metadata * fix: keep provider catalog entries on release live path * fix: keep static manifest models in release live checks * fix: harden release model discovery * fix: reduce OpenAI live cache probe reasoning * fix: disable OpenAI cache probe reasoning * ci: extend OpenAI gateway live timeout * fix: extend live gateway model budget * fix: stabilize release validation regressions * fix: honor provider aliases in model rows * fix: stabilize release validation lanes * fix: stabilize release memory qa * ci: stabilize release validation lanes * ci: prefer ipv4 for live docker node calls * fix: restore shared tool-call stream wrapper * ci: remove legacy pi test shard alias * fix: clean up embedded agent test drift * fix: stabilize runtime alias status * fix: clean up embedded agent ci drift * fix: restore release ci invariants * fix: clean up post-rebase runtime drift * fix: restore release ci checks * fix: restore release ci after rebase * fix: remove stale pi runtime path * test: align compaction runtime expectations * test: update plugin prerelease expectations * fix: handle claude live tool approvals * fix: stabilize release validation gates * fix: finish agent runtime import * test: finish post-rebase agent runtime mocks * fix: keep codex compaction native * fix: stabilize codex app-server hook tests * test: isolate codex diagnostic active run * test: remove codex diagnostic completion race # Conflicts: # extensions/codex/src/app-server/run-attempt.test.ts * ci: fix full release manifest performance run id * refactor: narrow llm plugin sdk boundary * chore: drop generated google boundary stamps * fix: repair rebase fallout * fix: clean up rebased runtime references * fix: decode codex jwt payloads as base64url * fix: preserve shipped pi runtime alias * fix: add scoped sdk virtual modules * fix: decode llm codex oauth jwt as base64url * fix: avoid stale vertex adc negative cache * fix: harden tool arg decoding and codeql path * fix: keep vertex adc negative checks live * refactor: consolidate codex jwt and edit helpers * fix: await codex oauth node runtime imports * fix: preserve sdk tool and notice contracts * fix: preserve shipped compat config boundaries * fix: align codex oauth callback host * fix: terminate agent-core loop streams on failure * fix: keep codex oauth callback alive during fallback * ci: include session tools in critical codeql scans * fix: keep Cloudflare Anthropic provider auth header * docs: redirect legacy pi runtime pages * fix: honor bundled web provider compat discovery * fix: protect session output spill files * fix: keep legacy agent dir env blocked * fix: contain auto-discovered skill symlinks * fix: harden agent core sdk proxy surfaces * fix: restore approval reaction sdk compat * fix: keep live docker runs bounded * fix: keep codex oauth redirect host aligned * fix: resolve post-rebase agent runtime drift * fix: redact anthropic oauth parse failures * fix: preserve responses strict tool shaping * fix: repair agent runtime rebase cleanup * docs: redirect retired parity pages * fix: bound auto-discovered resources to roots * fix: repair post-rebase agent test drift * fix: preserve bundled provider allowlist migration * fix: preserve manifest-owned provider aliases * fix: declare photon image dependency * fix: keep provider headers out of proxy body * fix: preserve shipped env aliases * fix: refresh control ui i18n generated state * fix: quote read fallback paths * fix: preview edits through configured backend * test: satisfy core test typecheck * fix: preserve ZAI usage auth fallback * test: repair codex diagnostic test * fix: repair agent runtime rebase drift * test: finish embedded runner import rename * fix: repair agent runtime rebase integrations * test: align compaction oauth fallback expectations * fix: allow sdk-auth session models * fix: update doctor tool schema import * fix: preserve bedrock plugin region * fix: stream harmony-like prose immediately * ci: include session runtime in codeql shards * fix: repair latest rebase integrations * fix: honor explicit codex websocket transport * fix: keep openai-compatible credentials provider-scoped * fix: refresh sdk api baseline after rebase * fix: route cli runtime aliases through openclaw harness * test: rename stale harness mock expectation * test: rename embedded agent overflow calls * test: clean embedded auth test wording * test: use openclaw stream types in deepinfra cache test * fix: refresh sdk api baseline on latest main * fix: honor bundled discovery compat allowlists * fix: refresh sdk api baseline after latest rebase * fix: remove stale rebase imports * test: rename stale model catalog mock * test: mock renamed doctor runtime modules * fix: map canonical kimi env auth * fix: use internal model registry in bench script * fix: migrate deepinfra provider catalog entry * fix: enforce builtin tool suppression * fix: route compaction auth and proxy payloads safely * refactor: prune unused llm registry leftovers * test: update codex hooks session import * test: fix model picker ci coverage * test: align model picker auth mock types
23 KiB
summary, read_when, title
| summary | read_when | title | |||
|---|---|---|---|---|---|
| Deep dive: session store + transcripts, lifecycle, and (auto)compaction internals |
|
Session management deep dive |
OpenClaw manages sessions end-to-end across these areas:
- Session routing (how inbound messages map to a
sessionKey) - Session store (
sessions.json) and what it tracks - Transcript persistence (
*.jsonl) and its structure - Transcript hygiene (provider-specific fixups before runs)
- Context limits (context window vs tracked tokens)
- Compaction (manual and auto-compaction) and where to hook pre-compaction work
- Silent housekeeping (memory writes that should not produce user-visible output)
If you want a higher-level overview first, start with:
Source of truth: the Gateway
OpenClaw is designed around a single Gateway process that owns session state.
- UIs (macOS app, web Control UI, TUI) should query the Gateway for session lists and token counts.
- In remote mode, session files are on the remote host; "checking your local Mac files" won't reflect what the Gateway is using.
Two persistence layers
OpenClaw persists sessions in two layers:
-
Session store (
sessions.json)- Key/value map:
sessionKey -> SessionEntry - Small, mutable, safe to edit (or delete entries)
- Tracks session metadata (current session id, last activity, toggles, token counters, etc.)
- Key/value map:
-
Transcript (
<sessionId>.jsonl)- Append-only transcript with tree structure (entries have
id+parentId) - Stores the actual conversation + tool calls + compaction summaries
- Used to rebuild the model context for future turns
- Compaction checkpoints are metadata over the compacted successor
transcript. New compactions do not write a second
.checkpoint.*.jsonlcopy.
- Append-only transcript with tree structure (entries have
Gateway history readers should avoid materializing the whole transcript unless
the surface explicitly needs arbitrary historical access. First-page history,
embedded chat history, restart recovery, and token/usage checks use bounded tail
reads. Full transcript scans go through the async transcript index, which is
cached by file path plus mtimeMs/size and shared across concurrent readers.
On-disk locations
Per agent, on the Gateway host:
- Store:
~/.openclaw/agents/<agentId>/sessions/sessions.json - Transcripts:
~/.openclaw/agents/<agentId>/sessions/<sessionId>.jsonl- Telegram topic sessions:
.../<sessionId>-topic-<threadId>.jsonl
- Telegram topic sessions:
OpenClaw resolves these via src/config/sessions.ts.
Store maintenance and disk controls
Session persistence has automatic maintenance controls (session.maintenance) for sessions.json, transcript artifacts, and trajectory sidecars:
mode:warn(default) orenforcepruneAfter: stale-entry age cutoff (default30d)maxEntries: cap entries insessions.json(default500)resetArchiveRetention: retention for*.reset.<timestamp>transcript archives (default: same aspruneAfter;falsedisables cleanup)maxDiskBytes: optional sessions-directory budgethighWaterBytes: optional target after cleanup (default80%ofmaxDiskBytes)
Normal Gateway writes flow through a per-store session writer that serializes in-process mutations without taking a runtime file lock. Hot-path patch helpers borrow the validated mutable cache while they hold that writer slot, so large sessions.json files are not cloned or reread for every metadata update. Runtime code should prefer updateSessionStore(...) or updateSessionStoreEntry(...); direct whole-store saves are compatibility and offline-maintenance tools. When a Gateway is reachable, non-dry-run openclaw sessions cleanup and openclaw agents delete delegate store mutations to the Gateway so cleanup joins the same writer queue; --store <path> is the explicit offline repair path for direct file maintenance. maxEntries cleanup is still batched for production-sized caps, so a store may briefly exceed the configured cap before the next high-water cleanup rewrites it back down. Session store reads do not prune or cap entries during Gateway startup; use writes or openclaw sessions cleanup --enforce for cleanup. openclaw sessions cleanup --enforce still applies the configured cap immediately and prunes old unreferenced transcript, checkpoint, and trajectory artifacts even when no disk budget is configured.
Maintenance keeps durable external conversation pointers such as group sessions and thread-scoped chat sessions, but synthetic runtime entries for cron, hooks, heartbeat, ACP, and sub-agents can still be removed when they exceed the configured age, count, or disk budget.
OpenClaw no longer creates automatic sessions.json.bak.* rotation backups during Gateway writes. The legacy session.maintenance.rotateBytes key is ignored and openclaw doctor --fix removes it from older configs.
Transcript mutations use a session write lock on the transcript file. Lock acquisition waits up to
session.writeLock.acquireTimeoutMs before surfacing a busy-session error; the default is 60000
ms. Raise this only when legitimate prep, cleanup, compaction, or transcript mirror work contends
longer on slow machines. session.writeLock.staleMs controls when an existing lock can be
reclaimed as stale; the default is 1800000 ms. session.writeLock.maxHoldMs controls the
in-process watchdog release threshold; the default is 300000 ms. Emergency env overrides are
OPENCLAW_SESSION_WRITE_LOCK_ACQUIRE_TIMEOUT_MS, OPENCLAW_SESSION_WRITE_LOCK_STALE_MS, and
OPENCLAW_SESSION_WRITE_LOCK_MAX_HOLD_MS.
Enforcement order for disk budget cleanup (mode: "enforce"):
- Remove oldest archived, orphan transcript, or orphan trajectory artifacts first.
- If still above the target, evict oldest session entries and their transcript/trajectory files.
- Keep going until usage is at or below
highWaterBytes.
In mode: "warn", OpenClaw reports potential evictions but does not mutate the store/files.
Run maintenance on demand:
openclaw sessions cleanup --dry-run
openclaw sessions cleanup --enforce
Cron sessions and run logs
Isolated cron runs also create session entries/transcripts, and they have dedicated retention controls:
cron.sessionRetention(default24h) prunes old isolated cron run sessions from the session store (falsedisables).cron.runLog.maxBytes+cron.runLog.keepLinesprune~/.openclaw/cron/runs/<jobId>.jsonlfiles (defaults:2_000_000bytes and2000lines).
When cron force-creates a new isolated run session, it sanitizes the previous
cron:<jobId> session entry before writing the new row. It carries safe
preferences such as thinking/fast/verbose settings, labels, and explicit
user-selected model/auth overrides. It drops ambient conversation context such
as channel/group routing, send or queue policy, elevation, origin, and ACP
runtime binding so a fresh isolated run cannot inherit stale delivery or
runtime authority from an older run.
Session keys (sessionKey)
A sessionKey identifies which conversation bucket you're in (routing + isolation).
Common patterns:
- Main/direct chat (per agent):
agent:<agentId>:<mainKey>(defaultmain) - Group:
agent:<agentId>:<channel>:group:<id> - Room/channel (Discord/Slack):
agent:<agentId>:<channel>:channel:<id>or...:room:<id> - Cron:
cron:<job.id> - Webhook:
hook:<uuid>(unless overridden)
The canonical rules are documented at /concepts/session.
Session ids (sessionId)
Each sessionKey points at a current sessionId (the transcript file that continues the conversation).
Rules of thumb:
- Reset (
/new,/reset) creates a newsessionIdfor thatsessionKey. - Daily reset (default 4:00 AM local time on the gateway host) creates a new
sessionIdon the next message after the reset boundary. - Idle expiry (
session.reset.idleMinutesor legacysession.idleMinutes) creates a newsessionIdwhen a message arrives after the idle window. When daily + idle are both configured, whichever expires first wins. - System events (heartbeat, cron wakeups, exec notifications, gateway bookkeeping) may mutate the session row but do not extend daily/idle reset freshness. Reset rollover discards queued system-event notices for the previous session before the fresh prompt is built.
- Parent fork policy uses OpenClaw's active branch when creating a thread or subagent fork. If that branch is too large, OpenClaw starts the child with isolated context instead of failing or inheriting unusable history. The sizing policy is automatic; legacy
session.parentForkMaxTokensconfig is removed byopenclaw doctor --fix.
Implementation detail: the decision happens in initSessionState() in src/auto-reply/reply/session.ts.
Session store schema (sessions.json)
The store's value type is SessionEntry in src/config/sessions.ts.
Key fields (not exhaustive):
sessionId: current transcript id (filename is derived from this unlesssessionFileis set)sessionStartedAt: start timestamp for the currentsessionId; daily reset freshness uses this. Legacy rows may derive it from the JSONL session header.lastInteractionAt: last real user/channel interaction timestamp; idle reset freshness uses this so heartbeat, cron, and exec events do not keep sessions alive. Legacy rows without this field fall back to the recovered session start time for idle freshness.updatedAt: last store-row mutation timestamp, used for listing, pruning, and bookkeeping. It is not the authority for daily/idle reset freshness.sessionFile: optional explicit transcript path overridechatType:direct | group | room(helps UIs and send policy)provider,subject,room,space,displayName: metadata for group/channel labeling- Toggles:
thinkingLevel,verboseLevel,reasoningLevel,elevatedLevelsendPolicy(per-session override)
- Model selection:
providerOverride,modelOverride,authProfileOverride
- Token counters (best-effort / provider-dependent):
inputTokens,outputTokens,totalTokens,contextTokens
compactionCount: how often auto-compaction completed for this session keymemoryFlushAt: timestamp for the last pre-compaction memory flushmemoryFlushCompactionCount: compaction count when the last flush ran
The store is safe to edit, but the Gateway is the authority: it may rewrite or rehydrate entries as sessions run.
Transcript structure (*.jsonl)
Transcripts are managed by openclaw/plugin-sdk/agent-sessions's SessionManager.
The file is JSONL:
- First line: session header (
type: "session", includesid,cwd,timestamp, optionalparentSession) - Then: session entries with
id+parentId(tree)
Notable entry types:
message: user/assistant/toolResult messagescustom_message: extension-injected messages that do enter model context (can be hidden from UI)custom: extension state that does not enter model contextcompaction: persisted compaction summary withfirstKeptEntryIdandtokensBeforebranch_summary: persisted summary when navigating a tree branch
OpenClaw intentionally does not "fix up" transcripts; the Gateway uses SessionManager to read/write them.
Context windows vs tracked tokens
Two different concepts matter:
- Model context window: hard cap per model (tokens visible to the model)
- Session store counters: rolling stats written into
sessions.json(used for /status and dashboards)
If you're tuning limits:
- The context window comes from the model catalog (and can be overridden via config).
contextTokensin the store is a runtime estimate/reporting value; don't treat it as a strict guarantee.
For more, see /token-use.
Compaction: what it is
Compaction summarizes older conversation into a persisted compaction entry in the transcript and keeps recent messages intact.
After compaction, future turns see:
- The compaction summary
- Messages after
firstKeptEntryId
AGENTS.md section reinjection after compaction is opt-in via
agents.defaults.compaction.postCompactionSections; when unset or [],
OpenClaw does not append AGENTS.md excerpts on top of the compaction summary.
Compaction is persistent (unlike session pruning). See /concepts/session-pruning.
Compaction chunk boundaries and tool pairing
When OpenClaw splits a long transcript into compaction chunks, it keeps
assistant tool calls paired with their matching toolResult entries.
- If the token-share split lands between a tool call and its result, OpenClaw shifts the boundary to the assistant tool-call message instead of separating the pair.
- If a trailing tool-result block would otherwise push the chunk over target, OpenClaw preserves that pending tool block and keeps the unsummarized tail intact.
- Aborted/error tool-call blocks do not hold a pending split open.
When auto-compaction happens (OpenClaw runtime)
In the embedded OpenClaw agent, auto-compaction triggers in two cases:
- Overflow recovery: the model returns a context overflow error
(
request_too_large,context length exceeded,input exceeds the maximum number of tokens,input token count exceeds the maximum number of input tokens,input is too long for the model,ollama error: context length exceeded, and similar provider-shaped variants) → compact → retry. When the provider reports the attempted token count, OpenClaw forwards that observed count into overflow recovery compaction. If the provider confirms overflow but does not expose a parseable count, OpenClaw passes a minimally over-budget synthetic count to compaction engines and diagnostics. If overflow recovery still fails, OpenClaw surfaces explicit guidance to the user and preserves the current session mapping instead of silently rotating the session key to a fresh session id. The next step is operator-controlled: retry the message, run/compact, or run/newwhen a fresh session is preferred. - Threshold maintenance: after a successful turn, when:
contextTokens > contextWindow - reserveTokens
Where:
contextWindowis the model's context windowreserveTokensis headroom reserved for prompts + the next model output
These are OpenClaw runtime semantics.
OpenClaw can also trigger a preflight local compaction before opening the next
run when agents.defaults.compaction.maxActiveTranscriptBytes is set and the
active transcript file reaches that size. This is a file-size guard for local
reopen cost, not raw archival: OpenClaw still runs normal semantic compaction,
and it requires truncateAfterCompaction so the compacted summary can become a
new successor transcript.
For embedded OpenClaw runs, agents.defaults.compaction.midTurnPrecheck.enabled: true
adds an opt-in tool-loop guard. After a tool result is appended and before the
next model call, OpenClaw estimates the prompt pressure using the same preflight
budget logic used at turn start. If the context no longer fits, the guard does
not compact inside OpenClaw runtime's transformContext hook. It raises a structured
mid-turn precheck signal, stops the current prompt submission, and lets the
outer run loop use the existing recovery path: truncate oversized tool results
when that is enough, or trigger the configured compaction mode and retry. The
option is disabled by default and works with both default and safeguard
compaction modes, including provider-backed safeguard compaction.
This is independent of maxActiveTranscriptBytes: the byte-size guard runs
before a turn opens, while mid-turn precheck runs later in the embedded OpenClaw tool
loop after new tool results have been appended.
Compaction settings (reserveTokens, keepRecentTokens)
OpenClaw runtime's compaction settings live in agent settings:
{
compaction: {
enabled: true,
reserveTokens: 16384,
keepRecentTokens: 20000,
},
}
OpenClaw also enforces a safety floor for embedded runs:
- If
compaction.reserveTokens < reserveTokensFloor, OpenClaw bumps it. - Default floor is
20000tokens. - Set
agents.defaults.compaction.reserveTokensFloor: 0to disable the floor. - If it's already higher, OpenClaw leaves it alone.
- Manual
/compacthonors an explicitagents.defaults.compaction.keepRecentTokensand keeps OpenClaw runtime's recent-tail cut point. Without an explicit keep budget, manual compaction remains a hard checkpoint and rebuilt context starts from the new summary. - Set
agents.defaults.compaction.midTurnPrecheck.enabled: trueto run the optional tool-loop precheck after new tool results and before the next model call. This is a trigger only; summary generation still uses the configured compaction path. It is independent ofmaxActiveTranscriptBytes, which is a turn-start active-transcript byte-size guard. - Set
agents.defaults.compaction.maxActiveTranscriptBytesto a byte value or string such as"20mb"to run local compaction before a turn when the active transcript gets large. This guard is active only whentruncateAfterCompactionis also enabled. Leave it unset or set0to disable. - When
agents.defaults.compaction.truncateAfterCompactionis enabled, OpenClaw rotates the active transcript to a compacted successor JSONL after compaction. Branch/restore checkpoint actions use that compacted successor; legacy pre-compaction checkpoint files remain readable while referenced.
Why: leave enough headroom for multi-turn "housekeeping" (like memory writes) before compaction becomes unavoidable.
Implementation: ensureAgentCompactionReserveTokens() in src/agents/agent-settings.ts
(called from src/agents/embedded-agent-runner.ts).
Pluggable compaction providers
Plugins can register a compaction provider via registerCompactionProvider() on the plugin API. When agents.defaults.compaction.provider is set to a registered provider id, the safeguard extension delegates summarization to that provider instead of the built-in summarizeInStages pipeline.
provider: id of a registered compaction provider plugin. Leave unset for default LLM summarization.- Setting a
providerforcesmode: "safeguard". - Providers receive the same compaction instructions and identifier-preservation policy as the built-in path.
- The safeguard still preserves recent-turn and split-turn suffix context after provider output.
- Built-in safeguard summarization re-distills prior summaries with new messages instead of preserving the full previous summary verbatim.
- Safeguard mode enables summary quality audits by default; set
qualityGuard.enabled: falseto skip retry-on-malformed-output behavior. - If the provider fails or returns an empty result, OpenClaw falls back to built-in LLM summarization automatically.
- Abort/timeout signals are re-thrown (not swallowed) to respect caller cancellation.
Source: src/plugins/compaction-provider.ts, src/agents/agent-hooks/compaction-safeguard.ts.
User-visible surfaces
You can observe compaction and session state via:
/status(in any chat session)openclaw status(CLI)openclaw sessions/sessions --json- Gateway logs (
pnpm gateway:watchoropenclaw logs --follow):embedded run auto-compaction start+complete - Verbose mode:
🧹 Auto-compaction complete+ compaction count
Silent housekeeping (NO_REPLY)
OpenClaw supports "silent" turns for background tasks where the user should not see intermediate output.
Convention:
- The assistant starts its output with the exact silent token
NO_REPLY/no_replyto indicate "do not deliver a reply to the user". - OpenClaw strips/suppresses this in the delivery layer.
- Exact silent-token suppression is case-insensitive, so
NO_REPLYandno_replyboth count when the whole payload is just the silent token. - This is for true background/no-delivery turns only; it is not a shortcut for ordinary actionable user requests.
As of 2026.1.10, OpenClaw also suppresses draft/typing streaming when a
partial chunk begins with NO_REPLY, so silent operations don't leak partial
output mid-turn.
Pre-compaction "memory flush" (implemented)
Goal: before auto-compaction happens, run a silent agentic turn that writes durable
state to disk (e.g. memory/YYYY-MM-DD.md in the agent workspace) so compaction can't
erase critical context.
OpenClaw uses the pre-threshold flush approach:
- Monitor session context usage.
- When it crosses a "soft threshold" (below OpenClaw runtime's compaction threshold), run a silent "write memory now" directive to the agent.
- Use the exact silent token
NO_REPLY/no_replyso the user sees nothing.
Config (agents.defaults.compaction.memoryFlush):
enabled(default:true)model(optional exact provider/model override for the flush turn, for exampleollama/qwen3:8b)softThresholdTokens(default:4000)prompt(user message for the flush turn)systemPrompt(extra system prompt appended for the flush turn)
Notes:
- The default prompt/system prompt include a
NO_REPLYhint to suppress delivery. - When
modelis set, the flush turn uses that model without inheriting the active session fallback chain, so local-only housekeeping does not silently fall back to a paid conversation model. - The flush runs once per compaction cycle (tracked in
sessions.json). - The flush runs only for embedded OpenClaw sessions (CLI backends skip it).
- The flush is skipped when the session workspace is read-only (
workspaceAccess: "ro"or"none"). - See Memory for the workspace file layout and write patterns.
OpenClaw also exposes a session_before_compact hook in the extension API, but OpenClaw's
flush logic lives on the Gateway side today.
Troubleshooting checklist
- Session key wrong? Start with /concepts/session and confirm the
sessionKeyin/status. - Store vs transcript mismatch? Confirm the Gateway host and the store path from
openclaw status. - Compaction spam? Check:
- model context window (too small)
- compaction settings (
reserveTokenstoo high for the model window can cause earlier compaction) - tool-result bloat: enable/tune session pruning
- Silent turns leaking? Confirm the reply starts with
NO_REPLY(case-insensitive exact token) and you're on a build that includes the streaming suppression fix.