openclaw/docs/plan/codex-context-engine-harness.md at 3de5979bdc8e4e9e9d3fee446eaab53cad2ff605

mirror of https://github.com/openclaw/openclaw.git synced 2026-05-20 05:34:45 +00:00

Peter Steinberger f91de52f0d refactor: move runtime state to SQLite

2026-05-13 13:15:12 +01:00

title: Codex Harness Context Engine Port
summary: Specification for making the bundled Codex app-server harness honor OpenClaw context-engine plugins
read_when:
- You are wiring context-engine lifecycle behavior into the Codex harness
- You need lossless-claw or another context-engine plugin to work with codex/* embedded harness sessions
- You are comparing embedded PI and Codex app-server context behavior

Status

Draft implementation specification.

Goal

Make the bundled Codex app-server harness honor the same OpenClaw context-engine lifecycle contract that embedded PI turns already honor.

A session using agents.defaults.embeddedHarness.runtime: "codex" or a codex/* model should still let the selected context-engine plugin, such as lossless-claw, control context assembly, post-turn ingest, maintenance, and OpenClaw-level compaction policy as far as the Codex app-server boundary allows.

Non-goals

Do not reimplement Codex app-server internals.
Do not make Codex native thread compaction produce a lossless-claw summary.
Do not require non-Codex models to use the Codex harness.
Do not change ACP/acpx session behavior. This specification is for the non-ACP embedded agent harness path only.
Do not make third-party plugins register Codex app-server extension factories; the existing bundled-plugin trust boundary remains unchanged.

Current architecture

The embedded run loop resolves the configured context engine once per run before selecting a concrete low-level harness:

src/agents/pi-embedded-runner/run.ts
- initializes context-engine plugins
- calls resolveContextEngine(params.config)
- passes contextEngine and contextTokenBudget into runEmbeddedAttemptWithBackend(...)

runEmbeddedAttemptWithBackend(...) delegates to the selected agent harness:

src/agents/pi-embedded-runner/run/backend.ts
src/agents/harness/selection.ts

The Codex app-server harness is registered by the bundled Codex plugin:

extensions/codex/index.ts
extensions/codex/harness.ts

The Codex harness implementation receives the same EmbeddedRunAttemptParams as PI-backed attempts:

extensions/codex/src/app-server/run-attempt.ts

That means the required hook point is in OpenClaw-controlled code. The external boundary is the Codex app-server protocol itself: OpenClaw can control what it sends to thread/start, thread/resume, and turn/start, and can observe notifications, but it cannot change Codex's internal thread store or native compactor.

Current gap

Embedded PI attempts call the context-engine lifecycle directly:

bootstrap/maintenance before the attempt
assemble before the model call
afterTurn or ingest after the attempt
maintenance after a successful turn
context-engine compaction for engines that own compaction

Relevant PI code:

src/agents/pi-embedded-runner/run/attempt.ts
src/agents/pi-embedded-runner/run/attempt.context-engine-helpers.ts
src/agents/pi-embedded-runner/context-engine-maintenance.ts

Codex app-server attempts currently run generic agent-harness hooks and mirror the transcript, but do not call params.contextEngine.bootstrap, params.contextEngine.assemble, params.contextEngine.afterTurn, params.contextEngine.ingestBatch, params.contextEngine.ingest, or params.contextEngine.maintain.

Relevant Codex code:

extensions/codex/src/app-server/run-attempt.ts
extensions/codex/src/app-server/thread-lifecycle.ts
extensions/codex/src/app-server/event-projector.ts
extensions/codex/src/app-server/compact.ts

Desired behavior

For Codex harness turns, OpenClaw should preserve this lifecycle:

Read the mirrored OpenClaw session transcript.
Bootstrap the active context engine when previous SQLite transcript rows exist.
Run bootstrap maintenance when available.
Assemble context using the active context engine.
Convert the assembled context into Codex-compatible inputs.
Start or resume the Codex thread with developer instructions that include any context-engine systemPromptAddition.
Start the Codex turn with the assembled user-facing prompt.
Mirror the Codex result back into the OpenClaw transcript.
Call afterTurn if implemented, otherwise ingestBatch/ingest, using the mirrored transcript snapshot.
Run turn maintenance after successful non-aborted turns.
Preserve Codex native compaction signals and OpenClaw compaction hooks.

Design constraints

Codex app-server remains canonical for native thread state

Codex owns its native thread and any internal extended history. OpenClaw should not try to mutate the app-server's internal history except through supported protocol calls.

OpenClaw's transcript mirror remains the source for OpenClaw features:

chat history
search
/new and /reset bookkeeping
future model or harness switching
context-engine plugin state

Context engine assembly must be projected into Codex inputs

The context-engine interface returns OpenClaw AgentMessage[], not a Codex thread patch. Codex app-server turn/start accepts a current user input, while thread/start and thread/resume accept developer instructions.

Therefore the implementation needs a projection layer. The safe first version should avoid pretending it can replace Codex internal history. It should inject assembled context as deterministic prompt/developer-instruction material around the current turn.

Prompt-cache stability matters

For engines like lossless-claw, the assembled context should be deterministic for unchanged inputs. Do not add timestamps, random ids, or nondeterministic ordering to generated context text.

Runtime selection semantics do not change

Harness selection remains as-is:

runtime: "pi" forces PI
runtime: "codex" selects the registered Codex harness
runtime: "auto" lets plugin harnesses claim supported providers
unmatched auto runs use PI

This work changes what happens after the Codex harness is selected.
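
The selection rules above can be sketched as a small pure function. This is only an illustration: `selectHarnessId`, `RegisteredHarness`, and `supportsProvider` are hypothetical names, not the API in src/agents/harness/selection.ts, and throwing when no Codex harness is registered is an assumption.

```typescript
// Hypothetical shapes for illustration; the real selection code may differ.
type HarnessRuntime = "pi" | "codex" | "auto";
type RegisteredHarness = { id: string; supportsProvider: (provider: string) => boolean };

function selectHarnessId(params: {
  runtime: HarnessRuntime;
  provider: string;
  registered: RegisteredHarness[];
}): string {
  // runtime: "pi" forces PI regardless of registered plugin harnesses.
  if (params.runtime === "pi") return "pi";
  if (params.runtime === "codex") {
    const codex = params.registered.find((harness) => harness.id === "codex");
    // Assumption: a missing Codex harness is an error rather than a silent PI fallback.
    if (!codex) throw new Error("Codex harness is not registered");
    return codex.id;
  }
  // runtime: "auto" lets plugin harnesses claim supported providers; unmatched runs use PI.
  const claimed = params.registered.find((harness) => harness.supportsProvider(params.provider));
  return claimed ? claimed.id : "pi";
}
```

Nothing in this sketch changes with the context-engine work; the lifecycle wiring only begins once the function has returned the Codex harness.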

Implementation plan

1. Export or relocate reusable context-engine attempt helpers

Today the reusable lifecycle helpers live under the PI runner:

src/agents/pi-embedded-runner/run/attempt.context-engine-helpers.ts
src/agents/pi-embedded-runner/run/attempt.prompt-helpers.ts
src/agents/pi-embedded-runner/context-engine-maintenance.ts

Codex should not import from an implementation path whose name implies PI if we can avoid it.

Create a harness-neutral module, for example:

src/agents/harness/context-engine-lifecycle.ts

Move or re-export:

runAttemptContextEngineBootstrap
assembleAttemptContextEngine
finalizeAttemptContextEngineTurn
buildAfterTurnRuntimeContext
buildAfterTurnRuntimeContextFromUsage
a small wrapper around runContextEngineMaintenance

Keep PI imports working either by re-exporting from the old files or updating PI call sites in the same PR.

The neutral helper names should not mention PI.

Suggested names:

bootstrapHarnessContextEngine
assembleHarnessContextEngine
finalizeHarnessContextEngineTurn
buildHarnessContextEngineRuntimeContext
runHarnessContextEngineMaintenance

2. Add a Codex context projection helper

Add a new module:

extensions/codex/src/app-server/context-engine-projection.ts

Responsibilities:

Accept the assembled AgentMessage[], original mirrored history, and current prompt.
Determine which context belongs in developer instructions vs current user input.
Preserve the current user prompt as the final actionable request.
Render prior messages in a stable, explicit format.
Avoid volatile metadata.

Proposed API:

export type CodexContextProjection = {
  developerInstructionAddition?: string;
  promptText: string;
  assembledMessages: AgentMessage[];
  prePromptMessageCount: number;
};

export function projectContextEngineAssemblyForCodex(params: {
  assembledMessages: AgentMessage[];
  originalHistoryMessages: AgentMessage[];
  prompt: string;
  systemPromptAddition?: string;
}): CodexContextProjection;

Recommended first projection:

Put systemPromptAddition into developer instructions.
Put the assembled transcript context before the current prompt in promptText.
Label it clearly as OpenClaw assembled context.
Keep current prompt last.
Exclude duplicate current user prompt if it already appears at the tail.

Example prompt shape:

OpenClaw assembled context for this turn:

<conversation_context>
[user]
...

[assistant]
...
</conversation_context>

Current user request:
...

This is less elegant than native Codex history surgery, but it is implementable inside OpenClaw and preserves context-engine semantics.

Future improvement: if Codex app-server exposes a protocol for replacing or supplementing thread history, swap this projection layer to use that API.
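
A minimal sketch of the recommended first projection follows, assuming a simplified `AgentMessage` with only `role` and `text` (the real OpenClaw type is richer). The duplicate-tail check and the choice to return the bare prompt when there is no context are assumptions of this sketch, not settled behavior.

```typescript
// Simplified stand-in for OpenClaw's AgentMessage; an assumption for this sketch.
type AgentMessage = { role: "user" | "assistant" | "system" | "tool"; text: string };

type CodexContextProjection = {
  developerInstructionAddition?: string;
  promptText: string;
  assembledMessages: AgentMessage[];
  prePromptMessageCount: number;
};

function projectContextEngineAssemblyForCodex(params: {
  assembledMessages: AgentMessage[];
  originalHistoryMessages: AgentMessage[];
  prompt: string;
  systemPromptAddition?: string;
}): CodexContextProjection {
  const messages = [...params.assembledMessages];
  // Drop a trailing copy of the current prompt so it is not rendered twice.
  const last = messages[messages.length - 1];
  if (last !== undefined && last.role === "user" && last.text.trim() === params.prompt.trim()) {
    messages.pop();
  }
  // Fixed labels and delimiters only: no timestamps, ids, or other volatile metadata.
  const promptText =
    messages.length === 0
      ? params.prompt
      : [
          "OpenClaw assembled context for this turn:",
          "",
          "<conversation_context>",
          messages.map((message) => `[${message.role}]\n${message.text}`).join("\n\n"),
          "</conversation_context>",
          "",
          "Current user request:",
          params.prompt,
        ].join("\n");
  const addition = params.systemPromptAddition?.trim();
  return {
    // systemPromptAddition goes to developer instructions, never into promptText.
    ...(addition ? { developerInstructionAddition: addition } : {}),
    promptText,
    assembledMessages: messages,
    prePromptMessageCount: params.originalHistoryMessages.length,
  };
}
```

Because the function is pure and uses only its inputs, identical inputs yield identical `promptText`, which is the property the prompt-cache constraint depends on.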

3. Wire bootstrap before Codex thread startup

In extensions/codex/src/app-server/run-attempt.ts:

Read mirrored session history as today.
Determine whether SQLite already has transcript rows for {agentId, sessionId} before mirroring writes.
Use the SQLite transcript scope helpers; do not open a transcript file or derive a locator.
Call the neutral bootstrap helper when params.contextEngine exists.

Pseudo-flow:

const transcriptScope = { agentId: params.agentId, sessionId: params.sessionId };
const historyMessages = readMirroredSessionHistoryMessages(transcriptScope);
const hadTranscriptRows = historyMessages.length > 0;

await bootstrapHarnessContextEngine({
  hadTranscriptRows,
  contextEngine: params.contextEngine,
  sessionId: params.sessionId,
  sessionKey: sandboxSessionKey,
  transcriptScope,
  runtimeContext: buildHarnessContextEngineRuntimeContext(...),
  runMaintenance: runHarnessContextEngineMaintenance,
  warn,
});

Use the same sessionKey convention as the Codex tool bridge and transcript mirror. Today Codex computes sandboxSessionKey from params.sessionKey or params.sessionId; use that consistently unless there is a reason to preserve raw params.sessionKey.

4. Wire assemble before `thread/start` / `thread/resume` and `turn/start`

In runCodexAppServerAttempt:

Build dynamic tools first, so the context engine sees the actual available tool names.
Read mirrored session history.
Run context-engine assemble(...) when params.contextEngine exists.
Project the assembled result into:
- developer instruction addition
- prompt text for turn/start

The existing hook call:

resolveAgentHarnessBeforePromptBuildResult({
  prompt: params.prompt,
  developerInstructions: buildDeveloperInstructions(params),
  messages: historyMessages,
  ctx: hookContext,
});

should become context-aware:

compute base developer instructions with buildDeveloperInstructions(params)
apply context-engine assembly/projection
run before_prompt_build with the projected prompt/developer instructions

This order lets generic prompt hooks see the same prompt Codex will receive. If we need strict PI parity, run context-engine assembly before hook composition, because PI applies context-engine systemPromptAddition to the final system prompt after its prompt pipeline. The important invariant is that both context engine and hooks get a deterministic, documented order.

Recommended order for first implementation:

buildDeveloperInstructions(params)
context-engine assemble()
append/prepend systemPromptAddition to developer instructions
project assembled messages into prompt text
resolveAgentHarnessBeforePromptBuildResult(...)
pass final developer instructions to startOrResumeThread(...)
pass final prompt text to buildTurnStartParams(...)

This order should be encoded in tests so future changes do not reorder it by accident.
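
The recommended order can be pinned down with a small composition function that a test can trace. This is a sketch under assumptions: `composeCodexTurnInputs`, `Assembly`, and `PromptBuild` are hypothetical names, the assembled context is already rendered to text, and the `systemPromptAddition` is appended (the spec leaves append versus prepend open).

```typescript
// Hypothetical shapes for illustration only.
type Assembly = { contextText: string; systemPromptAddition?: string };
type PromptBuild = { prompt: string; developerInstructions: string };

async function composeCodexTurnInputs(steps: {
  buildDeveloperInstructions: () => string;
  assemble: () => Promise<Assembly>;
  beforePromptBuild: (input: PromptBuild) => Promise<PromptBuild>;
  prompt: string;
}): Promise<PromptBuild> {
  // 1. Base developer instructions.
  const base = steps.buildDeveloperInstructions();
  // 2. Context-engine assemble().
  const assembly = await steps.assemble();
  // 3. Append systemPromptAddition to developer instructions (assumption: append).
  const developerInstructions = assembly.systemPromptAddition
    ? `${base}\n\n${assembly.systemPromptAddition}`
    : base;
  // 4. Project assembled context into the prompt text, current prompt last.
  const prompt = assembly.contextText
    ? `${assembly.contextText}\n\n${steps.prompt}`
    : steps.prompt;
  // 5. Generic prompt hooks see the same prompt Codex will receive.
  return steps.beforePromptBuild({ prompt, developerInstructions });
}
```

A test can pass recording stubs for each step and assert both the call order and that the hook receives the projected prompt rather than the raw one.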

5. Preserve prompt-cache stable formatting

The projection helper must produce byte-stable output for identical inputs:

stable message order
stable role labels
no generated timestamps
no object key order leakage
no random delimiters
no per-run ids

Use fixed delimiters and explicit sections.
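
One way to avoid object key order leakage, if the projection ever has to render structured message content, is to serialize with sorted keys. The helper below is a hypothetical sketch (`stableStringify` is not an existing OpenClaw function) and assumes JSON-like values only.

```typescript
// Hypothetical helper: JSON-like serialization with keys sorted at every level,
// so two objects with the same entries always render to the same bytes.
function stableStringify(value: unknown): string {
  if (value === undefined || typeof value === "function") {
    return "null"; // mirror JSON's treatment of unsupported values inside arrays
  }
  if (Array.isArray(value)) {
    return `[${value.map((item) => stableStringify(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    const entries = Object.keys(record)
      .filter((key) => record[key] !== undefined)
      .sort() // default sort is by UTF-16 code units, independent of insertion order
      .map((key) => `${JSON.stringify(key)}:${stableStringify(record[key])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}
```

Array order is preserved on purpose: message order is meaningful, while object key order is not.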

6. Wire post-turn after transcript mirroring

Codex's CodexAppServerEventProjector builds a local messagesSnapshot for the current turn. mirrorTranscriptBestEffort(...) writes that snapshot into the OpenClaw transcript mirror.

Whether mirroring succeeds or fails, call the context-engine finalizer with the best available message snapshot:

Prefer full mirrored session context after the write, because afterTurn expects the session snapshot, not only the current turn.
Fall back to historyMessages + result.messagesSnapshot if the SQLite read fails.

Pseudo-flow:

const prePromptMessageCount = historyMessages.length;
await mirrorTranscriptBestEffort(...);
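// Assumption: the read helper returns undefined (rather than throwing) when the SQLite read fails.
const finalMessages = readMirroredSessionHistoryMessages(transcriptScope)
  ?? [...historyMessages, ...result.messagesSnapshot];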

await finalizeHarnessContextEngineTurn({
  contextEngine: params.contextEngine,
  promptError: Boolean(finalPromptError),
  aborted: finalAborted,
  yieldAborted,
  sessionIdUsed: params.sessionId,
  sessionKey: sandboxSessionKey,
  transcriptScope,
  messagesSnapshot: finalMessages,
  prePromptMessageCount,
  tokenBudget: params.contextTokenBudget,
  runtimeContext: buildHarnessContextEngineRuntimeContextFromUsage({
    attempt: params,
    workspaceDir: effectiveWorkspace,
    agentDir,
    tokenBudget: params.contextTokenBudget,
    lastCallUsage: result.attemptUsage,
    promptCache: result.promptCache,
  }),
  runMaintenance: runHarnessContextEngineMaintenance,
  sessionManager,
  warn,
});

If mirroring fails, still call afterTurn with the fallback snapshot, but log that the context engine is ingesting from fallback turn data.
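
The snapshot selection can be isolated into a helper that tolerates both a thrown read error and an empty result. This is a sketch with hypothetical names (`selectFinalMessages`, `readMirrored`) and a simplified message type; whether the real read helper throws or returns undefined on failure is not specified here, so the sketch handles both.

```typescript
// Simplified stand-in for OpenClaw's AgentMessage; an assumption for this sketch.
type AgentMessage = { role: string; text: string };

function selectFinalMessages(params: {
  readMirrored: () => AgentMessage[] | undefined;
  historyMessages: AgentMessage[];
  turnMessages: AgentMessage[];
  warn: (message: string) => void;
}): { messages: AgentMessage[]; source: "mirror" | "fallback" } {
  try {
    // Prefer the full mirrored session, because afterTurn expects the session snapshot.
    const mirrored = params.readMirrored();
    if (mirrored !== undefined) {
      return { messages: mirrored, source: "mirror" };
    }
  } catch {
    // Fall through to the in-memory fallback below.
  }
  params.warn("context engine is ingesting from fallback turn data");
  return {
    messages: [...params.historyMessages, ...params.turnMessages],
    source: "fallback",
  };
}
```

Returning the `source` alongside the messages keeps the fallback visible to callers and logs without inspecting message contents.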

7. Normalize usage and prompt-cache runtime context

Codex results include normalized usage from app-server token notifications when available. Pass that usage into the context-engine runtime context.

If Codex app-server eventually exposes cache read/write details, map them into ContextEnginePromptCacheInfo. Until then, omit promptCache rather than inventing zeros.
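
A sketch of the "omit rather than invent zeros" rule, using hypothetical field names: the real `ContextEnginePromptCacheInfo` and usage shapes are not shown in this document, so `PromptCacheInfo`, `NormalizedUsage`, and `buildRuntimeContextFromUsage` below are illustrative assumptions.

```typescript
// Hypothetical shapes; the real OpenClaw types are not reproduced here.
type NormalizedUsage = { inputTokens: number; outputTokens: number };
type PromptCacheInfo = { cacheReadTokens: number; cacheWriteTokens: number };
type RuntimeContext = {
  tokenBudget?: number;
  lastCallUsage?: NormalizedUsage;
  promptCache?: PromptCacheInfo;
};

function buildRuntimeContextFromUsage(params: {
  tokenBudget?: number;
  usage?: NormalizedUsage;
  promptCache?: PromptCacheInfo;
}): RuntimeContext {
  const context: RuntimeContext = {};
  if (params.tokenBudget !== undefined) context.tokenBudget = params.tokenBudget;
  if (params.usage !== undefined) context.lastCallUsage = params.usage;
  // Only set promptCache when the app-server actually reported cache details;
  // a zero-filled object would look like "cache measured, nothing hit".
  if (params.promptCache !== undefined) context.promptCache = params.promptCache;
  return context;
}
```

Leaving the key absent lets a context engine distinguish "no cache data" from "cache data showing zero reads".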

8. Compaction policy

There are two compaction systems:

OpenClaw context-engine compact()
Codex app-server native thread/compact/start

Do not silently conflate them.

`/compact` and explicit OpenClaw compaction

When the selected context engine has info.ownsCompaction === true, explicit OpenClaw compaction should prefer the context engine's compact() result for the OpenClaw transcript mirror and plugin state.

When the selected Codex harness has a native thread binding, we may additionally request Codex native compaction to keep the app-server thread healthy, but this must be reported as a separate backend action in details.

Recommended behavior:

If contextEngine.info.ownsCompaction === true:
- call context-engine compact() first
- then best-effort call Codex native compaction when a thread binding exists
- return the context-engine result as the primary result
- include Codex native compaction status in details.codexNativeCompaction
If the active context engine does not own compaction:
- preserve current Codex native compaction behavior

This likely requires changing extensions/codex/src/app-server/compact.ts or wrapping it from the generic compaction path, depending on where maybeCompactAgentHarnessSession(...) is invoked.
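
The recommended behavior can be sketched as one function that keeps the two compaction systems separate in its result. `compactWithPolicy` and `CompactResult` are hypothetical names, and the result returned when a non-owning engine has no thread binding is an assumption of this sketch.

```typescript
// Hypothetical result shape for illustration.
type CompactResult = { ok: boolean; summary?: string; details?: Record<string, unknown> };

async function compactWithPolicy(params: {
  ownsCompaction: boolean;
  engineCompact: () => Promise<CompactResult>;
  // Present only when a Codex thread binding exists.
  nativeCompact?: () => Promise<CompactResult>;
}): Promise<CompactResult> {
  if (!params.ownsCompaction) {
    // Non-owning engine: preserve current Codex native compaction behavior.
    if (!params.nativeCompact) {
      return { ok: false, details: { reason: "no codex thread binding" } }; // assumption
    }
    return params.nativeCompact();
  }
  // Owning engine: its compact() result is the primary result.
  const primary = await params.engineCompact();
  let native: Record<string, unknown> = { attempted: false };
  if (params.nativeCompact) {
    try {
      const result = await params.nativeCompact();
      native = { attempted: true, ok: result.ok };
    } catch (error) {
      // Best effort: a native failure must not fail the owning engine's compaction.
      native = {
        attempted: true,
        ok: false,
        error: error instanceof Error ? error.message : String(error),
      };
    }
  }
  return { ...primary, details: { ...primary.details, codexNativeCompaction: native } };
}
```

Reporting the native outcome under `details.codexNativeCompaction` keeps it auditable as a separate backend action instead of folding it into the primary result.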

In-turn Codex native contextCompaction events

Codex may emit contextCompaction item events during a turn. Keep the current before/after compaction hook emission in event-projector.ts, but do not treat that as a completed context-engine compaction.

For engines that own compaction, emit an explicit diagnostic when Codex performs native compaction anyway:

stream/event name: existing compaction stream is acceptable
details: { backend: "codex-app-server", ownsCompaction: true }

This makes the split auditable.

9. Session reset and binding behavior

The existing Codex harness reset(...) clears the Codex app-server binding for the OpenClaw session scope. Preserve that behavior.

Also ensure context-engine state cleanup continues to happen through existing OpenClaw session lifecycle paths. Do not add Codex-specific cleanup unless the context-engine lifecycle currently misses reset/delete events for all harnesses.

10. Error handling

Follow PI semantics:

bootstrap failures warn and continue
assemble failures warn and fall back to unassembled pipeline messages/prompt
afterTurn/ingest failures warn and mark post-turn finalization unsuccessful
maintenance runs only after successful, non-aborted, non-yield turns
compaction errors should not be retried as fresh prompts

Codex-specific additions:

If context projection fails, warn and fall back to the original prompt.
If transcript mirror fails, still attempt context-engine finalization with fallback messages.
If Codex native compaction fails after context-engine compaction succeeds, do not fail the whole OpenClaw compaction when the context engine is primary.
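
Two of these rules are easy to express as small helpers: the gate for turn maintenance, and the warn-and-fall-back pattern used for projection failures. Both helpers are hypothetical sketches (`maintenanceSkipReason` and `warnAndFallBack` are illustrative names), not existing OpenClaw functions.

```typescript
type TurnOutcome = { promptError: boolean; aborted: boolean; yieldAborted: boolean };

// Returns why turn maintenance is skipped, or undefined when it should run.
function maintenanceSkipReason(outcome: TurnOutcome): string | undefined {
  if (outcome.promptError) return "prompt error";
  if (outcome.aborted) return "aborted";
  if (outcome.yieldAborted) return "yield abort";
  return undefined;
}

// Runs a synchronous step; on failure, warns and returns the fallback instead of throwing.
function warnAndFallBack<T>(
  label: string,
  run: () => T,
  fallback: T,
  warn: (message: string) => void,
): T {
  try {
    return run();
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error);
    warn(`${label} failed, using fallback: ${reason}`);
    return fallback;
  }
}
```

For example, wrapping the projection call in `warnAndFallBack("context projection", ..., params.prompt, warn)` would send the original prompt to Codex when projection throws, and the skip reason can feed the "maintenance skipped with reason" debug log.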

Test plan

Unit tests

Add tests under extensions/codex/src/app-server:

run-attempt.context-engine.test.ts
- Codex calls bootstrap when SQLite transcript rows exist.
- Codex calls assemble with mirrored messages, token budget, tool names, citations mode, model id, and prompt.
- systemPromptAddition is included in developer instructions.
- Assembled messages are projected into the prompt before current request.
- Codex calls afterTurn after transcript mirroring.
- Without afterTurn, Codex calls ingestBatch or per-message ingest.
- Turn maintenance runs after successful turns.
- Turn maintenance does not run on prompt error, abort, or yield abort.
context-engine-projection.test.ts
- stable output for identical inputs
- no duplicate current prompt when assembled history includes it
- handles empty history
- preserves role order
- includes system prompt addition only in developer instructions
compact.context-engine.test.ts
- owning context engine primary result wins
- Codex native compaction status appears in details when also attempted
- Codex native failure does not fail owning context-engine compaction
- non-owning context engine preserves current native compaction behavior
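
To make the projection expectations concrete, here is a minimal sketch of a projection function that would satisfy the context-engine-projection.test.ts cases listed above. The names `HistoryMessage`, `ProjectionInput`, `Projection`, and `projectContext`, and the `<context>` wrapper format, are assumptions made for illustration; the plan does not fix a wrapper format.

```typescript
// Hypothetical shapes and wrapper format; the plan does not fix either.
interface HistoryMessage {
  role: "user" | "assistant";
  text: string;
}

interface ProjectionInput {
  history: HistoryMessage[];
  currentPrompt: string;
  developerInstructions: string;
  systemPromptAddition?: string;
}

interface Projection {
  developerInstructions: string;
  userPrompt: string;
}

function projectContext(input: ProjectionInput): Projection {
  const { history, currentPrompt, systemPromptAddition } = input;

  // Drop a trailing copy of the current prompt so it is not rendered twice.
  const last = history[history.length - 1];
  const prior =
    last !== undefined && last.role === "user" && last.text === currentPrompt
      ? history.slice(0, -1)
      : history;

  // The addition goes only into developer instructions, never the user prompt.
  const developerInstructions = systemPromptAddition
    ? `${input.developerInstructions}\n\n${systemPromptAddition}`
    : input.developerInstructions;

  // Empty history: pass the prompt through untouched.
  if (prior.length === 0) {
    return { developerInstructions, userPrompt: currentPrompt };
  }

  // Render history in its original order, then the current request last.
  const lines = prior.map((m) => `[${m.role}] ${m.text}`);
  const userPrompt = ["<context>", ...lines, "</context>", "", currentPrompt].join("\n");
  return { developerInstructions, userPrompt };
}
```

Because the function is pure and uses no timestamps or random identifiers, identical inputs produce identical output, which is what the stability test checks.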

Existing tests to update

extensions/codex/src/app-server/run-attempt.test.ts if present; otherwise the nearest Codex app-server run tests.
extensions/codex/src/app-server/event-projector.test.ts only if compaction event details change.
src/agents/harness/selection.test.ts should not need changes unless config behavior changes; it should remain stable.
PI context-engine tests should continue to pass unchanged.

Integration / live tests

Add or extend live Codex harness smoke tests:

configure plugins.slots.contextEngine to a test engine
configure agents.defaults.model to a codex/* model
configure agents.defaults.embeddedHarness.runtime = "codex"
assert that the test engine observed:
- bootstrap
- assemble
- afterTurn or ingest
- maintenance

Avoid requiring lossless-claw in OpenClaw core tests. Use a small in-repo fake context engine plugin.
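
One way such a fake engine could look is sketched below. It only records which lifecycle hooks were called. `RecordingContextEngine`, `LifecycleCall`, and `missingLifecycleCalls` are hypothetical names, and the method signatures are simplified assumptions rather than the real context-engine plugin interface.

```typescript
// Hypothetical minimal engine: hook names follow the lifecycle described in this plan,
// but the signatures are simplified assumptions.
type LifecycleCall = "bootstrap" | "assemble" | "afterTurn" | "ingest" | "maintain";

class RecordingContextEngine {
  readonly calls: LifecycleCall[] = [];

  bootstrap(): void {
    this.calls.push("bootstrap");
  }

  // Returns the input unchanged so the fake never alters the turn.
  assemble(messages: string[]): { messages: string[] } {
    this.calls.push("assemble");
    return { messages };
  }

  afterTurn(): void {
    this.calls.push("afterTurn");
  }

  ingest(): void {
    this.calls.push("ingest");
  }

  maintain(): void {
    this.calls.push("maintain");
  }
}

// Lists the lifecycle stages the smoke test expected but did not observe.
function missingLifecycleCalls(calls: readonly LifecycleCall[]): string[] {
  const seen = new Set<LifecycleCall>(calls);
  const missing: string[] = [];
  if (!seen.has("bootstrap")) missing.push("bootstrap");
  if (!seen.has("assemble")) missing.push("assemble");
  if (!seen.has("afterTurn") && !seen.has("ingest")) missing.push("afterTurn or ingest");
  if (!seen.has("maintain")) missing.push("maintenance");
  return missing;
}
```

A smoke test would run one Codex turn against this engine and assert that `missingLifecycleCalls(engine.calls)` is empty.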

Observability

Add debug logs around Codex context-engine lifecycle calls:

codex context engine bootstrap started/completed/failed
codex context engine assemble applied
codex context engine finalize completed/failed
codex context engine maintenance skipped with reason
codex native compaction completed alongside context-engine compaction

Avoid logging full prompts or transcript contents.

Add structured fields where useful:

sessionId
sessionKey (redacted or omitted according to existing logging practice)
engineId
threadId
turnId
assembledMessageCount
estimatedTokens
hasSystemPromptAddition
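
As an illustration of the fields above, the sketch below derives log fields from an assemble result while logging only counts and flags, never message text. `CodexContextEngineLogFields`, `AssembleLogInput`, and `toLogFields` are hypothetical names. `sessionKey` is omitted here on the assumption that existing logging practice allows omitting it.

```typescript
// Hypothetical field bag; names mirror the structured fields listed in this plan.
interface CodexContextEngineLogFields {
  sessionId: string;
  engineId: string;
  threadId?: string;
  turnId?: string;
  assembledMessageCount: number;
  estimatedTokens?: number;
  hasSystemPromptAddition: boolean;
}

// Hypothetical input shape; the real assemble result may carry more data.
interface AssembleLogInput {
  sessionId: string;
  engineId: string;
  threadId?: string;
  turnId?: string;
  messages: readonly { content: string }[];
  estimatedTokens?: number;
  systemPromptAddition?: string;
}

function toLogFields(input: AssembleLogInput): CodexContextEngineLogFields {
  return {
    sessionId: input.sessionId,
    engineId: input.engineId,
    threadId: input.threadId,
    turnId: input.turnId,
    // Only the count is logged; message contents never reach the log record.
    assembledMessageCount: input.messages.length,
    estimatedTokens: input.estimatedTokens,
    // A boolean flag stands in for the addition text itself.
    hasSystemPromptAddition: Boolean(input.systemPromptAddition),
  };
}
```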

Migration / compatibility

This should be backward-compatible:

If no context engine is configured, the legacy context engine applies and behavior should be equivalent to today's Codex harness behavior.
If context-engine assemble fails, Codex should continue with the original prompt path.
Existing Codex thread bindings should remain valid.
Dynamic tool fingerprinting should not include context-engine output; otherwise every context change could force a new Codex thread. Only the tool catalog should affect the dynamic tool fingerprint.
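
The fingerprinting constraint can be illustrated with a sketch that hashes only the tool catalog. `ToolSpec` and `dynamicToolFingerprint` are hypothetical names, and the hashed fields are an assumption; the existing harness may fingerprint different tool properties. What matters is that context-engine output is not an input.

```typescript
import { createHash } from "node:crypto";

// Hypothetical tool shape; the real catalog entries may carry more fields.
interface ToolSpec {
  name: string;
  description: string;
}

// Fingerprint depends only on the tool catalog. Assembled context and
// systemPromptAddition are deliberately not parameters of this function.
function dynamicToolFingerprint(tools: readonly ToolSpec[]): string {
  const canonical = [...tools]
    // Sort by name so catalog ordering does not change the fingerprint.
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0))
    .map((t) => [t.name, t.description]);
  return createHash("sha256").update(JSON.stringify(canonical)).digest("hex");
}
```

With this shape, two turns that share a tool catalog but differ in assembled context produce the same fingerprint, so a context change alone would not force a new Codex thread.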

Open questions

Should assembled context be injected entirely into the user prompt, entirely into developer instructions, or split?

Recommendation: split. Put systemPromptAddition in developer instructions; put assembled transcript context in the user prompt wrapper. This best matches the current Codex protocol without mutating native thread history.

Should Codex native compaction be disabled when a context engine owns compaction?

Recommendation: no, not initially. Codex native compaction may still be necessary to keep the app-server thread alive. But it must be reported as native Codex compaction, not as context-engine compaction.

Should before_prompt_build run before or after context-engine assembly?

Recommendation: after context-engine projection for Codex, so generic harness hooks see the actual prompt/developer instructions Codex will receive. If PI parity requires the opposite, encode the chosen order in tests and document it here.

Can Codex app-server accept a future structured context/history override?

Unknown. If it can, replace the text projection layer with that protocol and keep the lifecycle calls unchanged.

Acceptance criteria

A codex/* embedded harness turn invokes the selected context engine's assemble lifecycle.
A context-engine systemPromptAddition affects Codex developer instructions.
Assembled context affects the Codex turn input deterministically.
Successful Codex turns call afterTurn or ingest fallback.
Successful Codex turns run context-engine turn maintenance.
Failed/aborted/yield-aborted turns do not run turn maintenance.
Context-engine-owned compaction remains primary for OpenClaw/plugin state.
Codex native compaction remains auditable as native Codex behavior.
Existing PI context-engine behavior is unchanged.
Existing Codex harness behavior is unchanged when no non-legacy context engine is selected or when assembly fails.
