| title | summary | read_when |
|---|---|---|
| Codex Harness Context Engine Port | Specification for making the bundled Codex app-server harness honor OpenClaw context-engine plugins | |
Status
Draft implementation specification.
Goal
Make the bundled Codex app-server harness honor the same OpenClaw context-engine lifecycle contract that embedded OpenClaw turns already honor.
A session using provider/model agentRuntime.id: "codex" or a codex/* model
should still let the selected context-engine plugin, such as
lossless-claw, control context assembly, post-turn ingest, maintenance, and
OpenClaw-level compaction policy as far as the Codex app-server boundary allows.
Non-goals
- Do not reimplement Codex app-server internals.
- Do not make Codex native thread compaction produce a lossless-claw summary.
- Do not require non-Codex models to use the Codex harness.
- Do not change ACP/acpx session behavior. This specification is for the non-ACP embedded agent harness path only.
- Do not make third-party plugins register Codex app-server extension factories; the existing bundled-plugin trust boundary remains unchanged.
Current architecture
The embedded run loop resolves the configured context engine once per run before selecting a concrete low-level harness:
- src/agents/embedded-agent-runner/run.ts
  - initializes context-engine plugins
  - calls resolveContextEngine(params.config)
  - passes contextEngine and contextTokenBudget into runEmbeddedAttemptWithBackend(...)
runEmbeddedAttemptWithBackend(...) delegates to the selected agent harness:
- src/agents/embedded-agent-runner/run/backend.ts
- src/agents/harness/selection.ts
The Codex app-server harness is registered by the bundled Codex plugin:
- extensions/codex/index.ts
- extensions/codex/harness.ts
The Codex harness implementation receives the same EmbeddedRunAttemptParams
as built-in OpenClaw attempts:
extensions/codex/src/app-server/run-attempt.ts
That means the required hook point is in OpenClaw-controlled code. The external
boundary is the Codex app-server protocol itself: OpenClaw can control what it
sends to thread/start, thread/resume, and turn/start, and can observe
notifications, but it cannot change Codex's internal thread store or native
compactor.
Current gap
Built-in OpenClaw attempts call the context-engine lifecycle directly:
- bootstrap/maintenance before the attempt
- assemble before the model call
- afterTurn or ingest after the attempt
- maintenance after a successful turn
- context-engine compaction for engines that own compaction
Relevant OpenClaw code:
- src/agents/embedded-agent-runner/run/attempt.ts
- src/agents/embedded-agent-runner/run/attempt.context-engine-helpers.ts
- src/agents/embedded-agent-runner/context-engine-maintenance.ts
Codex app-server attempts currently run generic agent-harness hooks and mirror
the transcript, but do not call params.contextEngine.bootstrap,
params.contextEngine.assemble, params.contextEngine.afterTurn,
params.contextEngine.ingestBatch, params.contextEngine.ingest, or
params.contextEngine.maintain.
Relevant Codex code:
- extensions/codex/src/app-server/run-attempt.ts
- extensions/codex/src/app-server/thread-lifecycle.ts
- extensions/codex/src/app-server/event-projector.ts
- extensions/codex/src/app-server/compact.ts
Desired behavior
For Codex harness turns, OpenClaw should preserve this lifecycle:
- Read the mirrored OpenClaw session transcript.
- Bootstrap the active context engine when a previous session file exists.
- Run bootstrap maintenance when available.
- Assemble context using the active context engine.
- Convert the assembled context into Codex-compatible inputs.
- Start or resume the Codex thread with developer instructions that include any context-engine systemPromptAddition.
- Start the Codex turn with the assembled user-facing prompt.
- Mirror the Codex result back into the OpenClaw transcript.
- Call afterTurn if implemented, otherwise ingestBatch/ingest, using the mirrored transcript snapshot.
- Run turn maintenance after successful non-aborted turns.
- Preserve Codex native compaction signals and OpenClaw compaction hooks.
Design constraints
Codex app-server remains canonical for native thread state
Codex owns its native thread and any internal extended history. OpenClaw should not try to mutate the app-server's internal history except through supported protocol calls.
OpenClaw's transcript mirror remains the source for OpenClaw features:
- chat history
- search
- /new and /reset bookkeeping
- future model or harness switching
- context-engine plugin state
Context engine assembly must be projected into Codex inputs
The context-engine interface returns OpenClaw AgentMessage[], not a Codex
thread patch. Codex app-server turn/start accepts a current user input, while
thread/start and thread/resume accept developer instructions.
Therefore the implementation needs a projection layer. The safe first version should avoid pretending it can replace Codex internal history. It should inject assembled context as deterministic prompt/developer-instruction material around the current turn.
Prompt-cache stability matters
For engines like lossless-claw, the assembled context should be deterministic for unchanged inputs. Do not add timestamps, random ids, or nondeterministic ordering to generated context text.
Runtime selection semantics do not change
Harness selection remains as-is:
- runtime: "openclaw" selects the built-in OpenClaw harness
- runtime: "codex" selects the registered Codex harness
- runtime: "auto" lets plugin harnesses claim supported providers
- unmatched auto runs use the built-in OpenClaw harness
This work changes what happens after the Codex harness is selected.
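The selection rules listed above can be sketched as a small pure function. This is only an illustration of the stated semantics: `HarnessCandidate`, `supportsProvider`, and `selectHarnessId` are hypothetical names, not the actual API of src/agents/harness/selection.ts.

```typescript
// Hypothetical shapes for illustration only; not the real selection.ts types.
type RuntimeSetting = "openclaw" | "codex" | "auto";

interface HarnessCandidate {
  id: string;
  // Returns true when this plugin harness claims the given provider.
  supportsProvider(provider: string): boolean;
}

const BUILT_IN_HARNESS_ID = "openclaw";

function selectHarnessId(
  runtime: RuntimeSetting,
  provider: string,
  pluginHarnesses: HarnessCandidate[],
): string {
  // Explicit runtimes map directly to a harness.
  if (runtime === "openclaw") return BUILT_IN_HARNESS_ID;
  if (runtime === "codex") return "codex";
  // "auto": the first plugin harness that claims the provider wins;
  // unmatched runs fall back to the built-in harness.
  const claimed = pluginHarnesses.find((h) => h.supportsProvider(provider));
  return claimed ? claimed.id : BUILT_IN_HARNESS_ID;
}
```

Nothing in this port touches that decision; the context-engine work starts only once the returned id is the Codex harness.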
Implementation plan
1. Export or relocate reusable context-engine attempt helpers
Today the reusable lifecycle helpers live under the embedded agent runner:
- src/agents/embedded-agent-runner/run/attempt.context-engine-helpers.ts
- src/agents/embedded-agent-runner/run/attempt.prompt-helpers.ts
- src/agents/embedded-agent-runner/context-engine-maintenance.ts
Codex should import harness-neutral helpers rather than reaching into runner implementation details.
Create a harness-neutral module, for example:
src/agents/harness/context-engine-lifecycle.ts
Move or re-export:
- runAttemptContextEngineBootstrap
- assembleAttemptContextEngine
- finalizeAttemptContextEngineTurn
- buildAfterTurnRuntimeContext
- buildAfterTurnRuntimeContextFromUsage
- a small wrapper around runContextEngineMaintenance
Update built-in harness call sites in the same PR.
The neutral helper names should not mention the built-in harness.
Suggested names:
- bootstrapHarnessContextEngine
- assembleHarnessContextEngine
- finalizeHarnessContextEngineTurn
- buildHarnessContextEngineRuntimeContext
- runHarnessContextEngineMaintenance
2. Add a Codex context projection helper
Add a new module:
extensions/codex/src/app-server/context-engine-projection.ts
Responsibilities:
- Accept the assembled AgentMessage[], original mirrored history, and current prompt.
- Determine which context belongs in developer instructions vs current user input.
- Preserve the current user prompt as the final actionable request.
- Render prior messages in a stable, explicit format.
- Avoid volatile metadata.
Proposed API:
export type CodexContextProjection = {
  developerInstructionAddition?: string;
  promptText: string;
  assembledMessages: AgentMessage[];
  prePromptMessageCount: number;
};

export function projectContextEngineAssemblyForCodex(params: {
  assembledMessages: AgentMessage[];
  originalHistoryMessages: AgentMessage[];
  prompt: string;
  systemPromptAddition?: string;
}): CodexContextProjection;
Recommended first projection:
- Put systemPromptAddition into developer instructions.
- Put the assembled transcript context before the current prompt in promptText.
- Label it clearly as OpenClaw assembled context.
- Keep current prompt last.
- Exclude duplicate current user prompt if it already appears at the tail.
Example prompt shape:
OpenClaw assembled context for this turn:
<conversation_context>
[user]
...
[assistant]
...
</conversation_context>
Current user request:
...
This is less elegant than native Codex history surgery, but it is implementable inside OpenClaw and preserves context-engine semantics.
Future improvement: if Codex app-server exposes a protocol for replacing or supplementing thread history, swap this projection layer to use that API.
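A minimal sketch of the recommended first projection follows. It assumes a simplified `AgentMessage` with a string `content` (the real OpenClaw type is richer), it assumes that an empty assembled history leaves the prompt unwrapped, and it treats `prePromptMessageCount` as the length of the original mirrored history. None of these choices are confirmed by the spec beyond the proposed API shape.

```typescript
// Simplified stand-in for OpenClaw's AgentMessage (assumption: string content only).
type AgentMessage = { role: "user" | "assistant" | "system"; content: string };

type CodexContextProjection = {
  developerInstructionAddition?: string;
  promptText: string;
  assembledMessages: AgentMessage[];
  prePromptMessageCount: number;
};

function projectContextEngineAssemblyForCodex(params: {
  assembledMessages: AgentMessage[];
  originalHistoryMessages: AgentMessage[];
  prompt: string;
  systemPromptAddition?: string;
}): CodexContextProjection {
  const context = [...params.assembledMessages];
  // Drop a trailing copy of the current prompt so it is not rendered twice.
  const last = context[context.length - 1];
  if (last && last.role === "user" && last.content.trim() === params.prompt.trim()) {
    context.pop();
  }
  // Fixed role labels and delimiters keep the output byte-stable.
  const rendered = context.map((m) => `[${m.role}]\n${m.content}`).join("\n");
  const promptText =
    context.length === 0
      ? params.prompt // assumption: nothing to wrap when no context remains
      : [
          "OpenClaw assembled context for this turn:",
          "<conversation_context>",
          rendered,
          "</conversation_context>",
          "Current user request:",
          params.prompt,
        ].join("\n");
  return {
    developerInstructionAddition: params.systemPromptAddition,
    promptText,
    assembledMessages: params.assembledMessages,
    prePromptMessageCount: params.originalHistoryMessages.length,
  };
}
```

The current request always lands last, and `systemPromptAddition` never appears in `promptText`, which matches the split described in the list above.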
3. Wire bootstrap before Codex thread startup
In extensions/codex/src/app-server/run-attempt.ts:
- Read mirrored session history as today.
- Determine whether the session file existed before this run. Prefer a helper that checks fs.stat(params.sessionFile) before mirroring writes.
- Open a SessionManager or use a narrow session manager adapter if the helper requires it.
- Call the neutral bootstrap helper when params.contextEngine exists.
Pseudo-flow:
const hadSessionFile = await fileExists(params.sessionFile);
const sessionManager = SessionManager.open(params.sessionFile);
const historyMessages = sessionManager.buildSessionContext().messages;

await bootstrapHarnessContextEngine({
  hadSessionFile,
  contextEngine: params.contextEngine,
  sessionId: params.sessionId,
  sessionKey: sandboxSessionKey,
  sessionFile: params.sessionFile,
  sessionManager,
  runtimeContext: buildHarnessContextEngineRuntimeContext(...),
  runMaintenance: runHarnessContextEngineMaintenance,
  warn,
});
Use the same sessionKey convention as the Codex tool bridge and transcript
mirror. Today Codex computes sandboxSessionKey from params.sessionKey or
params.sessionId; use that consistently unless there is a reason to preserve
raw params.sessionKey.
4. Wire assemble before thread/start / thread/resume and turn/start
In runCodexAppServerAttempt:
- Build dynamic tools first, so the context engine sees the actual available tool names.
- Read mirrored session history.
- Run context-engine assemble(...) when params.contextEngine exists.
- Project the assembled result into:
  - developer instruction addition
  - prompt text for turn/start
The existing hook call:
resolveAgentHarnessBeforePromptBuildResult({
  prompt: params.prompt,
  developerInstructions: buildDeveloperInstructions(params),
  messages: historyMessages,
  ctx: hookContext,
});
should become context-aware:
- compute base developer instructions with buildDeveloperInstructions(params)
- apply context-engine assembly/projection
- run before_prompt_build with the projected prompt/developer instructions
This order lets generic prompt hooks see the same prompt Codex will receive. If
we need strict OpenClaw parity, run context-engine assembly before hook
composition, because the built-in harness applies context-engine
systemPromptAddition to the final system prompt after its prompt pipeline. The
important invariant is that both context engine and hooks get a deterministic,
documented order.
Recommended order for first implementation:
- buildDeveloperInstructions(params)
- context-engine assemble()
- append/prepend systemPromptAddition to developer instructions
- project assembled messages into prompt text
- resolveAgentHarnessBeforePromptBuildResult(...)
- pass final developer instructions to startOrResumeThread(...)
- pass final prompt text to buildTurnStartParams(...)
This order should be encoded in tests so future changes do not reorder it by accident.
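One way to pin the order down is to route the steps through a single function with injected dependencies, so a test can record the call sequence. The sketch below is an assumption about how that could look: `PromptBuildDeps`, `Assembly`, and `buildCodexTurnInputs` are hypothetical names, the real helpers take much richer parameters, and appending (rather than prepending) `systemPromptAddition` is an arbitrary choice here.

```typescript
// Hypothetical stand-ins; the real helpers take richer parameters.
type Assembly = { contextText: string; systemPromptAddition?: string };
type TurnInputs = { prompt: string; developerInstructions: string };

type PromptBuildDeps = {
  buildDeveloperInstructions(): string;
  // Returns undefined when no context engine is configured.
  assemble(): Assembly | undefined;
  // Stand-in for the before_prompt_build hook resolution.
  runBeforePromptBuild(input: TurnInputs): TurnInputs;
};

function buildCodexTurnInputs(prompt: string, deps: PromptBuildDeps): TurnInputs {
  // Step 1: base developer instructions.
  let developerInstructions = deps.buildDeveloperInstructions();
  // Step 2: context-engine assembly.
  const assembly = deps.assemble();
  let promptText = prompt;
  if (assembly) {
    // Step 3: add systemPromptAddition to developer instructions (appended here).
    if (assembly.systemPromptAddition) {
      developerInstructions += "\n\n" + assembly.systemPromptAddition;
    }
    // Step 4: project assembled context in front of the current prompt.
    promptText = `${assembly.contextText}\n\n${prompt}`;
  }
  // Step 5: hooks see exactly what Codex will receive.
  return deps.runBeforePromptBuild({ prompt: promptText, developerInstructions });
}
```

A test can pass recording stubs for each dependency and assert the sequence base, assemble, hook, which fails as soon as someone moves the hook ahead of assembly.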
5. Preserve prompt-cache stable formatting
The projection helper must produce byte-stable output for identical inputs:
- stable message order
- stable role labels
- no generated timestamps
- no object key order leakage
- no random delimiters
- no per-run ids
Use fixed delimiters and explicit sections.
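Object key order is the least obvious item in that list. If the projection ever has to render structured message content (for example tool arguments), a key-sorted serializer avoids leaking insertion order into the prompt. The helper below is a hypothetical sketch for JSON-like values, not an existing OpenClaw utility.

```typescript
// Serialize a JSON-like value with object keys sorted, so that two objects
// with the same entries always produce the same bytes regardless of
// insertion order. Undefined object properties are dropped, as in JSON.
function stableStringify(value: unknown): string {
  if (value === null || typeof value !== "object") {
    // JSON.stringify returns undefined for undefined/functions; render those as null.
    return JSON.stringify(value) ?? "null";
  }
  if (Array.isArray(value)) {
    // Array order is meaningful and is preserved.
    return `[${value.map(stableStringify).join(",")}]`;
  }
  const record = value as Record<string, unknown>;
  const entries = Object.keys(record)
    .sort() // default sort compares UTF-16 code units, independent of locale
    .filter((key) => record[key] !== undefined)
    .map((key) => `${JSON.stringify(key)}:${stableStringify(record[key])}`);
  return `{${entries.join(",")}}`;
}
```

For example, `stableStringify({ b: 1, a: 2 })` and `stableStringify({ a: 2, b: 1 })` both yield `{"a":2,"b":1}`.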
6. Wire post-turn after transcript mirroring
Codex's CodexAppServerEventProjector builds a local messagesSnapshot for the
current turn. mirrorTranscriptBestEffort(...) writes that snapshot into the
OpenClaw transcript mirror.
After mirroring succeeds or fails, call the context-engine finalizer with the best available message snapshot:
- Prefer full mirrored session context after the write, because afterTurn expects the session snapshot, not only the current turn.
- Fall back to historyMessages + result.messagesSnapshot if the session file cannot be reopened.
Pseudo-flow:
const prePromptMessageCount = historyMessages.length;
await mirrorTranscriptBestEffort(...);
const finalMessages = readMirroredSessionHistoryMessages(params.sessionFile)
  ?? [...historyMessages, ...result.messagesSnapshot];

await finalizeHarnessContextEngineTurn({
  contextEngine: params.contextEngine,
  promptError: Boolean(finalPromptError),
  aborted: finalAborted,
  yieldAborted,
  sessionIdUsed: params.sessionId,
  sessionKey: sandboxSessionKey,
  sessionFile: params.sessionFile,
  messagesSnapshot: finalMessages,
  prePromptMessageCount,
  tokenBudget: params.contextTokenBudget,
  runtimeContext: buildHarnessContextEngineRuntimeContextFromUsage({
    attempt: params,
    workspaceDir: effectiveWorkspace,
    agentDir,
    tokenBudget: params.contextTokenBudget,
    lastCallUsage: result.attemptUsage,
    promptCache: result.promptCache,
  }),
  runMaintenance: runHarnessContextEngineMaintenance,
  sessionManager,
  warn,
});
If mirroring fails, still call afterTurn with the fallback snapshot, but log
that the context engine is ingesting from fallback turn data.
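The snapshot choice and the fallback log can be isolated in a small helper so both paths are easy to test. This is a sketch under assumptions: `resolveFinalMessages` and its parameter names are hypothetical, `AgentMessage` is simplified, and the mirror reader is modeled as a synchronous callback that may throw or return undefined.

```typescript
// Simplified stand-in for OpenClaw's AgentMessage.
type AgentMessage = { role: string; content: string };

type FinalSnapshot = { messages: AgentMessage[]; source: "mirror" | "fallback" };

function resolveFinalMessages(params: {
  // Reads the full mirrored session after the write; may throw or return undefined.
  readMirrored: () => AgentMessage[] | undefined;
  historyMessages: AgentMessage[];
  turnMessages: AgentMessage[];
  warn: (message: string) => void;
}): FinalSnapshot {
  try {
    const mirrored = params.readMirrored();
    if (mirrored) return { messages: mirrored, source: "mirror" };
  } catch {
    // Treat a failed reopen the same as a missing mirror and fall through.
  }
  // Make the degraded path visible without logging transcript contents.
  params.warn("context engine is ingesting from fallback turn data");
  return {
    messages: [...params.historyMessages, ...params.turnMessages],
    source: "fallback",
  };
}
```

Returning the `source` alongside the messages lets the caller attach it to debug logs or test assertions without inspecting the messages themselves.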
7. Normalize usage and prompt-cache runtime context
Codex results include normalized usage from app-server token notifications when available. Pass that usage into the context-engine runtime context.
If Codex app-server eventually exposes cache read/write details, map them into
ContextEnginePromptCacheInfo. Until then, omit promptCache rather than
inventing zeros.
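A short sketch of the "omit rather than invent zeros" rule is shown below. The field names on `PromptCacheInfo` and `Usage` are assumptions made for illustration; the real `ContextEnginePromptCacheInfo` and usage shapes may differ.

```typescript
// Hypothetical field names; the real ContextEnginePromptCacheInfo may differ.
type PromptCacheInfo = { cacheReadTokens: number; cacheWriteTokens: number };
type Usage = { inputTokens: number; outputTokens: number };
type UsageRuntimeContext = { lastCallUsage?: Usage; promptCache?: PromptCacheInfo };

function buildUsageRuntimeContext(usage?: Usage, cache?: PromptCacheInfo): UsageRuntimeContext {
  const context: UsageRuntimeContext = {};
  if (usage) context.lastCallUsage = usage;
  // Attach promptCache only when the backend actually reported it. A missing
  // key means "unknown", which an engine can tell apart from "zero cache hits".
  if (cache) context.promptCache = cache;
  return context;
}
```

With this shape, a context engine can check whether the `promptCache` key is present before using cache statistics in its own heuristics.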
8. Compaction policy
There are two compaction systems:
- OpenClaw context-engine compact()
- Codex app-server native thread/compact/start
Do not silently conflate them.
/compact and explicit OpenClaw compaction
When the selected context engine has info.ownsCompaction === true, explicit
OpenClaw compaction should prefer the context engine's compact() result for
the OpenClaw transcript mirror and plugin state.
When the selected Codex harness has a native thread binding, we may additionally request Codex native compaction to keep the app-server thread healthy, but this must be reported as a separate backend action in details.
Recommended behavior:
- If contextEngine.info.ownsCompaction === true:
  - call context-engine compact() first
  - then best-effort call Codex native compaction when a thread binding exists
  - return the context-engine result as the primary result
  - include Codex native compaction status in details.codexNativeCompaction
- If the active context engine does not own compaction:
  - preserve current Codex native compaction behavior
This likely requires changing extensions/codex/src/app-server/compact.ts or
wrapping it from the generic compaction path, depending on where
maybeCompactAgentHarnessSession(...) is invoked.
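The recommended behavior could look roughly like the sketch below. The result and status shapes (`CompactResult`, `NativeStatus`) and the function name `compactCodexSession` are assumptions for illustration; only `details.codexNativeCompaction` comes from the text above. A failure of the context engine's own `compact()` is left to propagate, since the spec does not say otherwise.

```typescript
// Hypothetical result shapes for illustration.
type CompactResult = { ok: boolean; summary?: string };
type NativeStatus = { attempted: boolean; ok?: boolean; error?: string };
type CompactOutcome = CompactResult & { details?: { codexNativeCompaction: NativeStatus } };

async function compactCodexSession(params: {
  ownsCompaction: boolean;
  hasThreadBinding: boolean;
  engineCompact: () => Promise<CompactResult>;
  nativeCompact: () => Promise<CompactResult>;
}): Promise<CompactOutcome> {
  if (!params.ownsCompaction) {
    // Non-owning engine: preserve current Codex native compaction behavior.
    return params.nativeCompact();
  }
  // Owning engine: its result is primary for the transcript mirror and plugin state.
  const primary = await params.engineCompact();
  let native: NativeStatus = { attempted: false };
  if (params.hasThreadBinding) {
    try {
      const result = await params.nativeCompact();
      native = { attempted: true, ok: result.ok };
    } catch (error) {
      // Best effort: a native failure is reported but does not fail the compaction.
      native = { attempted: true, ok: false, error: String(error) };
    }
  }
  return { ...primary, details: { codexNativeCompaction: native } };
}
```

Keeping the native status in a separate `details` field is what makes the two compaction systems distinguishable in the reported result.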
In-turn Codex native contextCompaction events
Codex may emit contextCompaction item events during a turn. Keep the current
before/after compaction hook emission in event-projector.ts, but do not treat
that as a completed context-engine compaction.
For engines that own compaction, emit an explicit diagnostic when Codex performs native compaction anyway:
- stream/event name: existing compaction stream is acceptable
- details: { backend: "codex-app-server", ownsCompaction: true }
This makes the split auditable.
9. Session reset and binding behavior
The existing Codex harness reset(...) clears the Codex app-server binding from
the OpenClaw session file. Preserve that behavior.
Also ensure context-engine state cleanup continues to happen through existing OpenClaw session lifecycle paths. Do not add Codex-specific cleanup unless the context-engine lifecycle currently misses reset/delete events for all harnesses.
10. Error handling
Follow built-in OpenClaw semantics:
- bootstrap failures warn and continue
- assemble failures warn and fall back to unassembled pipeline messages/prompt
- afterTurn/ingest failures warn and mark post-turn finalization unsuccessful
- maintenance runs only after successful, non-aborted, non-yield turns
- compaction errors should not be retried as fresh prompts
Codex-specific additions:
- If context projection fails, warn and fall back to the original prompt.
- If transcript mirror fails, still attempt context-engine finalization with fallback messages.
- If Codex native compaction fails after context-engine compaction succeeds, do not fail the whole OpenClaw compaction when the context engine is primary.
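Most of these rules share one pattern: run a step, warn on failure, and continue with a known fallback. A generic helper makes that explicit. `runWithWarning` is a hypothetical name, not an existing OpenClaw function.

```typescript
// Run a lifecycle step; on failure, warn and continue with the given fallback.
async function runWithWarning<T>(
  label: string,
  step: () => Promise<T>,
  fallback: T,
  warn: (message: string) => void,
): Promise<T> {
  try {
    return await step();
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error);
    warn(`${label} failed: ${reason}`);
    return fallback;
  }
}
```

For the projection case, the step would produce the projected prompt and the fallback would be the original prompt, so a projection failure degrades to today's behavior instead of failing the turn.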
Test plan
Unit tests
Add tests under extensions/codex/src/app-server:
- run-attempt.context-engine.test.ts
  - Codex calls bootstrap when a session file exists.
  - Codex calls assemble with mirrored messages, token budget, tool names, citations mode, model id, and prompt.
  - systemPromptAddition is included in developer instructions.
  - Assembled messages are projected into the prompt before current request.
  - Codex calls afterTurn after transcript mirroring.
  - Without afterTurn, Codex calls ingestBatch or per-message ingest.
  - Turn maintenance runs after successful turns.
  - Turn maintenance does not run on prompt error, abort, or yield abort.
- context-engine-projection.test.ts
  - stable output for identical inputs
  - no duplicate current prompt when assembled history includes it
  - handles empty history
  - preserves role order
  - includes system prompt addition only in developer instructions
- compact.context-engine.test.ts
  - owning context engine primary result wins
  - Codex native compaction status appears in details when also attempted
  - Codex native failure does not fail owning context-engine compaction
  - non-owning context engine preserves current native compaction behavior
Existing tests to update
- extensions/codex/src/app-server/run-attempt.test.ts if present, otherwise nearest Codex app-server run tests.
- extensions/codex/src/app-server/event-projector.test.ts only if compaction event details change.
- src/agents/harness/selection.test.ts should not need changes unless config behavior changes; it should remain stable.
- Built-in harness context-engine tests should continue to pass unchanged.
Integration / live tests
Add or extend live Codex harness smoke tests:
- configure plugins.slots.contextEngine to a test engine
- configure agents.defaults.model to a codex/* model
- configure provider/model agentRuntime.id = "codex"
- assert test engine observed:
  - bootstrap
  - assemble
  - afterTurn or ingest
  - maintenance
Avoid requiring lossless-claw in OpenClaw core tests. Use a small in-repo fake context engine plugin.
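A fake engine for these tests only needs to record which lifecycle methods were called. The sketch below uses a deliberately reduced surface: the method parameters, the `info.id` value, and the `createRecordingContextEngine` name are assumptions, and the real context-engine interface takes more arguments than shown.

```typescript
// Simplified stand-in for OpenClaw's AgentMessage.
type AgentMessage = { role: string; content: string };

// Hypothetical recording engine with a reduced lifecycle surface.
function createRecordingContextEngine() {
  const calls: string[] = [];
  return {
    calls,
    info: { id: "fake-recording-engine", ownsCompaction: false },
    async bootstrap(): Promise<void> {
      calls.push("bootstrap");
    },
    async assemble(params: { messages: AgentMessage[] }) {
      calls.push("assemble");
      // Pass messages through unchanged and add a recognizable marker.
      return { messages: params.messages, systemPromptAddition: "fake-engine-addition" };
    },
    async afterTurn(): Promise<void> {
      calls.push("afterTurn");
    },
    async maintain(): Promise<void> {
      calls.push("maintain");
    },
  };
}
```

A smoke test can then assert on `engine.calls` and check that the `fake-engine-addition` marker reached the developer instructions sent to Codex.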
Observability
Add debug logs around Codex context-engine lifecycle calls:
- codex context engine bootstrap started/completed/failed
- codex context engine assemble applied
- codex context engine finalize completed/failed
- codex context engine maintenance skipped with reason
- codex native compaction completed alongside context-engine compaction
Avoid logging full prompts or transcript contents.
Add structured fields where useful:
- sessionId
- sessionKey redacted or omitted according to existing logging practice
- engineId
- threadId
- turnId
- assembledMessageCount
- estimatedTokens
- hasSystemPromptAddition
Migration / compatibility
This should be backward-compatible:
- If no non-legacy context engine is configured, behavior should be equivalent to today's Codex harness behavior.
- If context-engine assemble fails, Codex should continue with the original prompt path.
- Existing Codex thread bindings should remain valid.
- Dynamic tool fingerprinting should not include context-engine output; otherwise every context change could force a new Codex thread. Only the tool catalog should affect the dynamic tool fingerprint.
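To illustrate the last point, the sketch below computes a fingerprint from the tool catalog alone. It is an assumption about the shape of the computation, not the existing implementation: `ToolSpec`, `dynamicToolFingerprint`, and the FNV-1a-style hash over UTF-16 code units are stand-ins for whatever the Codex harness uses today.

```typescript
// Hypothetical tool shape; the real dynamic tool definition has more fields.
type ToolSpec = { name: string; description: string };

// 32-bit FNV-1a-style hash over UTF-16 code units (illustrative stand-in).
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// The fingerprint takes only the tool catalog as input. Assembled context,
// prompts, and systemPromptAddition are not parameters, so context changes
// cannot force a new Codex thread through this path.
function dynamicToolFingerprint(tools: ToolSpec[]): string {
  const canonical = [...tools]
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0))
    .map((tool) => `${tool.name}\u0000${tool.description}`)
    .join("\u0001");
  return fnv1a(canonical);
}
```

Sorting by name before hashing also keeps the fingerprint stable when tools are registered in a different order between runs.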
Open questions
- Should assembled context be injected entirely into the user prompt, entirely into developer instructions, or split?

  Recommendation: split. Put systemPromptAddition in developer instructions; put assembled transcript context in the user prompt wrapper. This best matches the current Codex protocol without mutating native thread history.

- Should Codex native compaction be disabled when a context engine owns compaction?

  Recommendation: no, not initially. Codex native compaction may still be necessary to keep the app-server thread alive. But it must be reported as native Codex compaction, not as context-engine compaction.

- Should before_prompt_build run before or after context-engine assembly?

  Recommendation: after context-engine projection for Codex, so generic harness hooks see the actual prompt/developer instructions Codex will receive. If built-in harness parity requires the opposite, encode the chosen order in tests and document it here.

- Can Codex app-server accept a future structured context/history override?

  Unknown. If it can, replace the text projection layer with that protocol and keep the lifecycle calls unchanged.
Acceptance criteria
- A codex/* embedded harness turn invokes the selected context engine's assemble lifecycle.
- A context-engine systemPromptAddition affects Codex developer instructions.
- Assembled context affects the Codex turn input deterministically.
- Successful Codex turns call afterTurn or ingest fallback.
- Successful Codex turns run context-engine turn maintenance.
- Failed/aborted/yield-aborted turns do not run turn maintenance.
- Context-engine-owned compaction remains primary for OpenClaw/plugin state.
- Codex native compaction remains auditable as native Codex behavior.
- Existing built-in harness context-engine behavior is unchanged.
- Existing Codex harness behavior is unchanged when no non-legacy context engine is selected or when assembly fails.