openclaw/docs/plan/codex-context-engine-harness.md

---
title: "Codex Harness Context Engine Port"
summary: "Specification for making the bundled Codex app-server harness honor OpenClaw context-engine plugins"
read_when:
  - You are wiring context-engine lifecycle behavior into the Codex harness
  - You need lossless-claw or another context-engine plugin to work with codex/* embedded harness sessions
  - You are comparing embedded PI and Codex app-server context behavior
---

## Status

Draft implementation specification.

## Goal

Make the bundled Codex app-server harness honor the same OpenClaw context-engine
lifecycle contract that embedded PI turns already honor.

A session using `agents.defaults.embeddedHarness.runtime: "codex"` or a
`codex/*` model should still let the selected context-engine plugin, such as
`lossless-claw`, control context assembly, post-turn ingest, maintenance, and
OpenClaw-level compaction policy as far as the Codex app-server boundary allows.

## Non-goals

- Do not reimplement Codex app-server internals.
- Do not make Codex native thread compaction produce a lossless-claw summary.
- Do not require non-Codex models to use the Codex harness.
- Do not change ACP/acpx session behavior. This specification is for the
  non-ACP embedded agent harness path only.
- Do not make third-party plugins register Codex app-server extension factories;
  the existing bundled-plugin trust boundary remains unchanged.

## Current architecture

The embedded run loop resolves the configured context engine once per run before
selecting a concrete low-level harness:

- `src/agents/pi-embedded-runner/run.ts`
  - initializes context-engine plugins
  - calls `resolveContextEngine(params.config)`
  - passes `contextEngine` and `contextTokenBudget` into
    `runEmbeddedAttemptWithBackend(...)`

`runEmbeddedAttemptWithBackend(...)` delegates to the selected agent harness:

- `src/agents/pi-embedded-runner/run/backend.ts`
- `src/agents/harness/selection.ts`

The Codex app-server harness is registered by the bundled Codex plugin:

- `extensions/codex/index.ts`
- `extensions/codex/harness.ts`

The Codex harness implementation receives the same `EmbeddedRunAttemptParams`
as PI-backed attempts:

- `extensions/codex/src/app-server/run-attempt.ts`

That means the required hook point is in OpenClaw-controlled code. The external
boundary is the Codex app-server protocol itself: OpenClaw can control what it
sends to `thread/start`, `thread/resume`, and `turn/start`, and can observe
notifications, but it cannot change Codex's internal thread store or native
compactor.

## Current gap

Embedded PI attempts call the context-engine lifecycle directly:

- bootstrap/maintenance before the attempt
- assemble before the model call
- afterTurn or ingest after the attempt
- maintenance after a successful turn
- context-engine compaction for engines that own compaction

Relevant PI code:

- `src/agents/pi-embedded-runner/run/attempt.ts`
- `src/agents/pi-embedded-runner/run/attempt.context-engine-helpers.ts`
- `src/agents/pi-embedded-runner/context-engine-maintenance.ts`

Codex app-server attempts currently run generic agent-harness hooks and mirror
the transcript, but do not call `params.contextEngine.bootstrap`,
`params.contextEngine.assemble`, `params.contextEngine.afterTurn`,
`params.contextEngine.ingestBatch`, `params.contextEngine.ingest`, or
`params.contextEngine.maintain`.

Relevant Codex code:

- `extensions/codex/src/app-server/run-attempt.ts`
- `extensions/codex/src/app-server/thread-lifecycle.ts`
- `extensions/codex/src/app-server/event-projector.ts`
- `extensions/codex/src/app-server/compact.ts`

## Desired behavior

For Codex harness turns, OpenClaw should preserve this lifecycle:

1. Read the mirrored OpenClaw session transcript.
2. Bootstrap the active context engine when a previous session file exists.
3. Run bootstrap maintenance when available.
4. Assemble context using the active context engine.
5. Convert the assembled context into Codex-compatible inputs.
6. Start or resume the Codex thread with developer instructions that include any
   context-engine `systemPromptAddition`.
7. Start the Codex turn with the assembled user-facing prompt.
8. Mirror the Codex result back into the OpenClaw transcript.
9. Call `afterTurn` if implemented, otherwise `ingestBatch`/`ingest`, using the
   mirrored transcript snapshot.
10. Run turn maintenance after successful non-aborted turns.
11. Preserve Codex native compaction signals and OpenClaw compaction hooks.

## Design constraints

### Codex app-server remains canonical for native thread state

Codex owns its native thread and any internal extended history. OpenClaw should
not try to mutate the app-server's internal history except through supported
protocol calls.

OpenClaw's transcript mirror remains the source for OpenClaw features:

- chat history
- search
- `/new` and `/reset` bookkeeping
- future model or harness switching
- context-engine plugin state

### Context engine assembly must be projected into Codex inputs

The context-engine interface returns OpenClaw `AgentMessage[]`, not a Codex
thread patch. Codex app-server `turn/start` accepts a current user input, while
`thread/start` and `thread/resume` accept developer instructions.

Therefore the implementation needs a projection layer. The safe first version
should avoid pretending it can replace Codex internal history. It should inject
assembled context as deterministic prompt/developer-instruction material around
the current turn.

### Prompt-cache stability matters

For engines like lossless-claw, the assembled context should be deterministic
for unchanged inputs. Do not add timestamps, random ids, or nondeterministic
ordering to generated context text.

### Runtime selection semantics do not change

Harness selection remains as-is:

- `runtime: "pi"` forces PI
- `runtime: "codex"` selects the registered Codex harness
- `runtime: "auto"` lets plugin harnesses claim supported providers
- unmatched `auto` runs use PI

This work changes what happens after the Codex harness is selected.

## Implementation plan

### 1. Export or relocate reusable context-engine attempt helpers

Today the reusable lifecycle helpers live under the PI runner:

- `src/agents/pi-embedded-runner/run/attempt.context-engine-helpers.ts`
- `src/agents/pi-embedded-runner/run/attempt.prompt-helpers.ts`
- `src/agents/pi-embedded-runner/context-engine-maintenance.ts`

Codex should not import from an implementation path whose name implies PI if we
can avoid it.

Create a harness-neutral module, for example:

- `src/agents/harness/context-engine-lifecycle.ts`

Move or re-export:

- `runAttemptContextEngineBootstrap`
- `assembleAttemptContextEngine`
- `finalizeAttemptContextEngineTurn`
- `buildAfterTurnRuntimeContext`
- `buildAfterTurnRuntimeContextFromUsage`
- a small wrapper around `runContextEngineMaintenance`

Keep PI imports working either by re-exporting from the old files or updating PI
call sites in the same PR.

The neutral helper names should not mention PI.

Suggested names:

- `bootstrapHarnessContextEngine`
- `assembleHarnessContextEngine`
- `finalizeHarnessContextEngineTurn`
- `buildHarnessContextEngineRuntimeContext`
- `runHarnessContextEngineMaintenance`

### 2. Add a Codex context projection helper

Add a new module:

- `extensions/codex/src/app-server/context-engine-projection.ts`

Responsibilities:

- Accept the assembled `AgentMessage[]`, original mirrored history, and current
  prompt.
- Determine which context belongs in developer instructions vs current user
  input.
- Preserve the current user prompt as the final actionable request.
- Render prior messages in a stable, explicit format.
- Avoid volatile metadata.

Proposed API:

```ts
export type CodexContextProjection = {
  developerInstructionAddition?: string;
  promptText: string;
  assembledMessages: AgentMessage[];
  prePromptMessageCount: number;
};

export function projectContextEngineAssemblyForCodex(params: {
  assembledMessages: AgentMessage[];
  originalHistoryMessages: AgentMessage[];
  prompt: string;
  systemPromptAddition?: string;
}): CodexContextProjection;
```

Recommended first projection:

- Put `systemPromptAddition` into developer instructions.
- Put the assembled transcript context before the current prompt in `promptText`.
- Label it clearly as OpenClaw assembled context.
- Keep current prompt last.
- Exclude duplicate current user prompt if it already appears at the tail.

Example prompt shape:

```text
OpenClaw assembled context for this turn:

<conversation_context>
[user]
...

[assistant]
...
</conversation_context>

Current user request:
...
```

This is less elegant than native Codex history surgery, but it is implementable
inside OpenClaw and preserves context-engine semantics.

Future improvement: if Codex app-server exposes a protocol for replacing or
supplementing thread history, swap this projection layer to use that API.

### 3. Wire bootstrap before Codex thread startup

In `extensions/codex/src/app-server/run-attempt.ts`:

- Read mirrored session history as today.
- Determine whether the session file existed before this run. Prefer a helper
  that checks `fs.stat(params.sessionFile)` before mirroring writes.
- Open a `SessionManager` or use a narrow session manager adapter if the helper
  requires it.
- Call the neutral bootstrap helper when `params.contextEngine` exists.

Pseudo-flow:

```ts
const hadSessionFile = await fileExists(params.sessionFile);
const sessionManager = SessionManager.open(params.sessionFile);
const historyMessages = sessionManager.buildSessionContext().messages;

await bootstrapHarnessContextEngine({
  hadSessionFile,
  contextEngine: params.contextEngine,
  sessionId: params.sessionId,
  sessionKey: sandboxSessionKey,
  sessionFile: params.sessionFile,
  sessionManager,
  runtimeContext: buildHarnessContextEngineRuntimeContext(...),
  runMaintenance: runHarnessContextEngineMaintenance,
  warn,
});
```

Use the same `sessionKey` convention as the Codex tool bridge and transcript
mirror. Today Codex computes `sandboxSessionKey` from `params.sessionKey` or
`params.sessionId`; use that consistently unless there is a reason to preserve
raw `params.sessionKey`.

### 4. Wire assemble before `thread/start` / `thread/resume` and `turn/start`

In `runCodexAppServerAttempt`:

1. Build dynamic tools first, so the context engine sees the actual available
   tool names.
2. Read mirrored session history.
3. Run context-engine `assemble(...)` when `params.contextEngine` exists.
4. Project the assembled result into:
   - developer instruction addition
   - prompt text for `turn/start`

The existing hook call:

```ts
resolveAgentHarnessBeforePromptBuildResult({
  prompt: params.prompt,
  developerInstructions: buildDeveloperInstructions(params),
  messages: historyMessages,
  ctx: hookContext,
});
```

should become context-aware:

1. compute base developer instructions with `buildDeveloperInstructions(params)`
2. apply context-engine assembly/projection
3. run `before_prompt_build` with the projected prompt/developer instructions

This order lets generic prompt hooks see the same prompt Codex will receive. If
we need strict PI parity, run context-engine assembly before hook composition,
because PI applies context-engine `systemPromptAddition` to the final system
prompt after its prompt pipeline. The important invariant is that both context
engine and hooks get a deterministic, documented order.

Recommended order for first implementation:

1. `buildDeveloperInstructions(params)`
2. context-engine `assemble()`
3. append/prepend `systemPromptAddition` to developer instructions
4. project assembled messages into prompt text
5. `resolveAgentHarnessBeforePromptBuildResult(...)`
6. pass final developer instructions to `startOrResumeThread(...)`
7. pass final prompt text to `buildTurnStartParams(...)`

The spec should be encoded in tests so future changes do not reorder it by
accident.

### 5. Preserve prompt-cache stable formatting

The projection helper must produce byte-stable output for identical inputs:

- stable message order
- stable role labels
- no generated timestamps
- no object key order leakage
- no random delimiters
- no per-run ids

Use fixed delimiters and explicit sections.

### 6. Wire post-turn after transcript mirroring

Codex's `CodexAppServerEventProjector` builds a local `messagesSnapshot` for the
current turn. `mirrorTranscriptBestEffort(...)` writes that snapshot into the
OpenClaw transcript mirror.

After mirroring succeeds or fails, call the context-engine finalizer with the
best available message snapshot:

- Prefer full mirrored session context after the write, because `afterTurn`
  expects the session snapshot, not only the current turn.
- Fall back to `historyMessages + result.messagesSnapshot` if the session file
  cannot be reopened.

Pseudo-flow:

```ts
const prePromptMessageCount = historyMessages.length;
await mirrorTranscriptBestEffort(...);
const finalMessages = readMirroredSessionHistoryMessages(params.sessionFile)
  ?? [...historyMessages, ...result.messagesSnapshot];

await finalizeHarnessContextEngineTurn({
  contextEngine: params.contextEngine,
  promptError: Boolean(finalPromptError),
  aborted: finalAborted,
  yieldAborted,
  sessionIdUsed: params.sessionId,
  sessionKey: sandboxSessionKey,
  sessionFile: params.sessionFile,
  messagesSnapshot: finalMessages,
  prePromptMessageCount,
  tokenBudget: params.contextTokenBudget,
  runtimeContext: buildHarnessContextEngineRuntimeContextFromUsage({
    attempt: params,
    workspaceDir: effectiveWorkspace,
    agentDir,
    tokenBudget: params.contextTokenBudget,
    lastCallUsage: result.attemptUsage,
    promptCache: result.promptCache,
  }),
  runMaintenance: runHarnessContextEngineMaintenance,
  sessionManager,
  warn,
});
```

If mirroring fails, still call `afterTurn` with the fallback snapshot, but log
that the context engine is ingesting from fallback turn data.

### 7. Normalize usage and prompt-cache runtime context

Codex results include normalized usage from app-server token notifications when
available. Pass that usage into the context-engine runtime context.

If Codex app-server eventually exposes cache read/write details, map them into
`ContextEnginePromptCacheInfo`. Until then, omit `promptCache` rather than
inventing zeros.

### 8. Compaction policy

There are two compaction systems:

1. OpenClaw context-engine `compact()`
2. Codex app-server native `thread/compact/start`

Do not silently conflate them.

#### `/compact` and explicit OpenClaw compaction

When the selected context engine has `info.ownsCompaction === true`, explicit
OpenClaw compaction should prefer the context engine's `compact()` result for
the OpenClaw transcript mirror and plugin state.

When the selected Codex harness has a native thread binding, we may additionally
request Codex native compaction to keep the app-server thread healthy, but this
must be reported as a separate backend action in details.

Recommended behavior:

- If `contextEngine.info.ownsCompaction === true`:
  - call context-engine `compact()` first
  - then best-effort call Codex native compaction when a thread binding exists
  - return the context-engine result as the primary result
  - include Codex native compaction status in `details.codexNativeCompaction`
- If the active context engine does not own compaction:
  - preserve current Codex native compaction behavior

This likely requires changing `extensions/codex/src/app-server/compact.ts` or
wrapping it from the generic compaction path, depending on where
`maybeCompactAgentHarnessSession(...)` is invoked.

#### In-turn Codex native contextCompaction events

Codex may emit `contextCompaction` item events during a turn. Keep the current
before/after compaction hook emission in `event-projector.ts`, but do not treat
that as a completed context-engine compaction.

For engines that own compaction, emit an explicit diagnostic when Codex performs
native compaction anyway:

- stream/event name: existing `compaction` stream is acceptable
- details: `{ backend: "codex-app-server", ownsCompaction: true }`

This makes the split auditable.

### 9. Session reset and binding behavior

The existing Codex harness `reset(...)` clears the Codex app-server binding from
the OpenClaw session file. Preserve that behavior.

Also ensure context-engine state cleanup continues to happen through existing
OpenClaw session lifecycle paths. Do not add Codex-specific cleanup unless the
context-engine lifecycle currently misses reset/delete events for all harnesses.

### 10. Error handling

Follow PI semantics:

- bootstrap failures warn and continue
- assemble failures warn and fall back to unassembled pipeline messages/prompt
- afterTurn/ingest failures warn and mark post-turn finalization unsuccessful
- maintenance runs only after successful, non-aborted, non-yield turns
- compaction errors should not be retried as fresh prompts

Codex-specific additions:

- If context projection fails, warn and fall back to the original prompt.
- If transcript mirror fails, still attempt context-engine finalization with
  fallback messages.
- If Codex native compaction fails after context-engine compaction succeeds,
  do not fail the whole OpenClaw compaction when the context engine is primary.

## Test plan

### Unit tests

Add tests under `extensions/codex/src/app-server`:

1. `run-attempt.context-engine.test.ts`
   - Codex calls `bootstrap` when a session file exists.
   - Codex calls `assemble` with mirrored messages, token budget, tool names,
     citations mode, model id, and prompt.
   - `systemPromptAddition` is included in developer instructions.
   - Assembled messages are projected into the prompt before current request.
   - Codex calls `afterTurn` after transcript mirroring.
   - Without `afterTurn`, Codex calls `ingestBatch` or per-message `ingest`.
   - Turn maintenance runs after successful turns.
   - Turn maintenance does not run on prompt error, abort, or yield abort.

2. `context-engine-projection.test.ts`
   - stable output for identical inputs
   - no duplicate current prompt when assembled history includes it
   - handles empty history
   - preserves role order
   - includes system prompt addition only in developer instructions

3. `compact.context-engine.test.ts`
   - owning context engine primary result wins
   - Codex native compaction status appears in details when also attempted
   - Codex native failure does not fail owning context-engine compaction
   - non-owning context engine preserves current native compaction behavior

### Existing tests to update

- `extensions/codex/src/app-server/run-attempt.test.ts` if present, otherwise
  nearest Codex app-server run tests.
- `extensions/codex/src/app-server/event-projector.test.ts` only if compaction
  event details change.
- `src/agents/harness/selection.test.ts` should not need changes unless config
  behavior changes; it should remain stable.
- PI context-engine tests should continue to pass unchanged.

### Integration / live tests

Add or extend live Codex harness smoke tests:

- configure `plugins.slots.contextEngine` to a test engine
- configure `agents.defaults.model` to a `codex/*` model
- configure `agents.defaults.embeddedHarness.runtime = "codex"`
- assert test engine observed:
  - bootstrap
  - assemble
  - afterTurn or ingest
  - maintenance

Avoid requiring lossless-claw in OpenClaw core tests. Use a small in-repo fake
context engine plugin.

## Observability

Add debug logs around Codex context-engine lifecycle calls:

- `codex context engine bootstrap started/completed/failed`
- `codex context engine assemble applied`
- `codex context engine finalize completed/failed`
- `codex context engine maintenance skipped` with reason
- `codex native compaction completed alongside context-engine compaction`

Avoid logging full prompts or transcript contents.

Add structured fields where useful:

- `sessionId`
- `sessionKey` redacted or omitted according to existing logging practice
- `engineId`
- `threadId`
- `turnId`
- `assembledMessageCount`
- `estimatedTokens`
- `hasSystemPromptAddition`

## Migration / compatibility

This should be backward-compatible:

- If no context engine is configured, legacy context engine behavior should be
  equivalent to today's Codex harness behavior.
- If context-engine `assemble` fails, Codex should continue with the original
  prompt path.
- Existing Codex thread bindings should remain valid.
- Dynamic tool fingerprinting should not include context-engine output; otherwise
  every context change could force a new Codex thread. Only the tool catalog
  should affect the dynamic tool fingerprint.

## Open questions

1. Should assembled context be injected entirely into the user prompt, entirely
   into developer instructions, or split?

   Recommendation: split. Put `systemPromptAddition` in developer instructions;
   put assembled transcript context in the user prompt wrapper. This best matches
   the current Codex protocol without mutating native thread history.

2. Should Codex native compaction be disabled when a context engine owns
   compaction?

   Recommendation: no, not initially. Codex native compaction may still be
   necessary to keep the app-server thread alive. But it must be reported as
   native Codex compaction, not as context-engine compaction.

3. Should `before_prompt_build` run before or after context-engine assembly?

   Recommendation: after context-engine projection for Codex, so generic harness
   hooks see the actual prompt/developer instructions Codex will receive. If PI
   parity requires the opposite, encode the chosen order in tests and document it
   here.

4. Can Codex app-server accept a future structured context/history override?

   Unknown. If it can, replace the text projection layer with that protocol and
   keep the lifecycle calls unchanged.

## Acceptance criteria

- A `codex/*` embedded harness turn invokes the selected context engine's
  assemble lifecycle.
- A context-engine `systemPromptAddition` affects Codex developer instructions.
- Assembled context affects the Codex turn input deterministically.
- Successful Codex turns call `afterTurn` or ingest fallback.
- Successful Codex turns run context-engine turn maintenance.
- Failed/aborted/yield-aborted turns do not run turn maintenance.
- Context-engine-owned compaction remains primary for OpenClaw/plugin state.
- Codex native compaction remains auditable as native Codex behavior.
- Existing PI context-engine behavior is unchanged.
- Existing Codex harness behavior is unchanged when no non-legacy context engine
  is selected or when assembly fails.