openclaw/docs/plugins/sdk-agent-harness.md at 3de5979bdc8e4e9e9d3fee446eaab53cad2ff605

mirror of https://github.com/openclaw/openclaw.git synced 2026-05-25 04:23:03 +00:00

Files

Peter Steinberger f91de52f0d refactor: move runtime state to SQLite

* refactor: remove stale file-backed shims

* fix: harden sqlite state ci boundaries

* refactor: store matrix idb snapshots in sqlite

* fix: satisfy rebased CI guardrails

* refactor: store current conversation bindings in sqlite table

* refactor: store tui last sessions in sqlite table

* refactor: reset sqlite schema history

* refactor: drop unshipped sqlite table migration

* refactor: remove plugin index file rollback

* refactor: drop unshipped sqlite sidecar migrations

* refactor: remove runtime commitments kv migration

* refactor: preserve kysely sync result types

* refactor: drop unshipped sqlite schema migration table

* test: keep session usage coverage sqlite-backed

* refactor: keep sqlite migration doctor-only

* refactor: isolate device legacy imports

* refactor: isolate push voicewake legacy imports

* refactor: isolate remaining runtime legacy imports

* refactor: tighten sqlite migration guardrails

* test: cover sqlite persisted enum parsing

* refactor: isolate legacy update and tui imports

* refactor: tighten sqlite state ownership

* refactor: move legacy imports behind doctor

* refactor: remove legacy session row lookup

* refactor: canonicalize memory transcript locators

* refactor: drop transcript path scope fallbacks

* refactor: drop runtime legacy session delivery pruning

* refactor: store tts prefs only in sqlite

* refactor: remove cron store path runtime

* refactor: use cron sqlite store keys

* refactor: rename telegram message cache scope

* refactor: read memory dreaming status from sqlite

* refactor: rename cron status store key

* refactor: stop remembering transcript file paths

* test: use sqlite locators in agent fixtures

* refactor: remove file-shaped commitments and cron store surfaces

* refactor: keep compaction transcript handles out of session rows

* refactor: derive transcript handles from session identity

* refactor: derive runtime transcript handles

* refactor: remove gateway session locator reads

* refactor: remove transcript locator from session rows

* refactor: store raw stream diagnostics in sqlite

* refactor: remove file-shaped transcript rotation

* refactor: hide legacy trajectory paths from runtime

* refactor: remove runtime transcript file bridges

* refactor: repair database-first rebase fallout

* refactor: align tests with database-first state

* refactor: remove transcript file handoffs

* refactor: sync post-compaction memory by transcript scope

* refactor: run codex app-server sessions by id

* refactor: bind codex runtime state by session id

* refactor: pass memory transcripts by sqlite scope

* refactor: remove transcript locator cleanup leftovers

* test: remove stale transcript file fixtures

* refactor: remove transcript locator test helper

* test: make cron sqlite keys explicit

* test: remove cron runtime store paths

* test: remove stale session file fixtures

* test: use sqlite cron keys in diagnostics

* refactor: remove runtime delivery queue backfill

* test: drop fake export session file mocks

* refactor: rename acp session read failure flag

* refactor: rename acp row session key

* refactor: remove session store test seams

* refactor: move legacy session parser tests to doctor

* refactor: reindex managed memory in place

* refactor: drop stale session store wording

* refactor: rename session row helpers

* refactor: rename sqlite session entry modules

* refactor: remove transcript locator leftovers

* refactor: trim file-era audit wording

* refactor: clean managed media through sqlite

* fix: prefer explicit agent for exports

* fix: use prepared agent for session resets

* fix: canonicalize legacy codex binding import

* test: rename state cleanup helper

* docs: align backup docs with sqlite state

* refactor: drop legacy Pi usage auth fallback

* refactor: move legacy auth profile imports to doctor

* refactor: keep Pi model discovery auth in memory

* refactor: remove MSTeams legacy learning key fallback

* refactor: store model catalog config in sqlite

* refactor: use sqlite model catalog at runtime

* refactor: remove model json compatibility aliases

* refactor: store auth profiles in sqlite

* refactor: seed copied auth profiles in sqlite

* refactor: make auth profile runtime sqlite-addressed

* refactor: migrate hermes secrets into sqlite auth store

* refactor: move plugin install config migration to doctor

* refactor: rename plugin index audit checks

* test: drop auth file assumptions

* test: remove legacy transcript file assertions

* refactor: drop legacy cli session aliases

* refactor: store skill uploads in sqlite

* refactor: keep subagent attachments in sqlite vfs

* refactor: drop subagent attachment cleanup state

* refactor: move legacy session aliases to doctor

* refactor: require node 24 for sqlite state runtime

* refactor: move provider caches into sqlite state

* fix: harden virtual agent filesystem

* refactor: enforce database-first runtime state

* refactor: rename compaction transcript rotation setting

* test: clean sqlite refactor test types

* refactor: consolidate sqlite runtime state

* refactor: model session conversations in sqlite

* refactor: stop deriving cron delivery from session keys

* refactor: stop classifying sessions from key shape

* refactor: hydrate announce targets from typed delivery

* refactor: route heartbeat delivery from typed sqlite context

* refactor: tighten typed sqlite session routing

* refactor: remove session origin routing shadow

* refactor: drop session origin shadow fixtures

* perf: query sqlite vfs paths by prefix

* refactor: use typed conversation metadata for sessions

* refactor: prefer typed session routing metadata

* refactor: require typed session routing metadata

* refactor: resolve group tool policy from typed sessions

* refactor: delete dead session thread info bridge

* Show Codex subscription reset times in channel errors (#80456)

* feat(plugin-sdk): consolidate session workflow APIs

* fix(agents): allow read-only agent mount reads

* [codex] refresh plugin regression fixtures

* fix(agents): restore compaction gateway logs

* test: tighten gateway startup assertions

* Redact persisted secret-shaped payloads [AI] (#79006)

* test: tighten device pair notify assertions

* test: tighten hermes secret assertions

* test: assert matrix client error shapes

* test: assert config compat warnings

* fix(heartbeat): remap cron-run exec events to session keys (#80214)

* fix(codex): route btw through native side threads

* fix(auth): accept friendly OpenAI order for Codex profiles

* fix(codex): rotate auth profiles inside harness

* fix: keep browser status page probe within timeout

* test: assert agents add outputs

* test: pin cron read status

* fix(agents): avoid Pi resource discovery stalls

Co-authored-by: dataCenter430 <titan032000@gmail.com>

* fix: retire timed-out codex app-server clients

* test: tighten qa lab runtime assertions

* test: check security fix outputs

* test: verify extension runtime messages

* feat(wake): expose typed sessionKey on wake protocol + system event CLI

* fix(gateway): await session_end during shutdown drain and track channel + compaction lifecycle paths (#57790)

* test: guard talk consult call helper

* fix(codex): scale context engine projection (#80761)

* fix(codex): scale context engine projection

* fix: document Codex context projection scaling

* fix: document Codex context projection scaling

* fix: document Codex context projection scaling

* fix: document Codex context projection scaling

* chore: align Codex projection changelog

* chore: realign Codex projection changelog

* fix: isolate Codex projection patch

---------

Co-authored-by: Eva (agent) <eva+agent-78055@100yen.org>
Co-authored-by: Josh Lehman <josh@martian.engineering>

* refactor: move agent runtime state toward piless

* refactor: remove cron session reaper

* refactor: move session management to sqlite

* refactor: finish database-first state migration

* chore: refresh generated sqlite db types

* refactor: remove stale file-backed shims

* test: harden kysely type coverage

# Conflicts:
#	.agents/skills/kysely-database-access/SKILL.md
#	src/infra/kysely-sync.types.test.ts
#	src/proxy-capture/store.sqlite.test.ts
#	src/state/openclaw-agent-db.test.ts
#	src/state/openclaw-state-db.test.ts

* refactor: remove cron store path runtime

* refactor: keep compaction transcript handles out of session rows

* refactor: derive embedded transcripts from sqlite identity

* refactor: remove embedded transcript locator handoff

* refactor: remove runtime transcript file bridges

* refactor: remove transcript file handoffs

* refactor: remove MSTeams legacy learning key fallback

* refactor: store model catalog config in sqlite

* refactor: use sqlite model catalog at runtime

# Conflicts:
#	docs/cli/secrets.md
#	docs/gateway/authentication.md
#	docs/gateway/secrets.md

* fix: keep oauth sibling sync sqlite-local

# Conflicts:
#	src/commands/onboard-auth.test.ts

* refactor: remove task session store maintenance

# Conflicts:
#	src/commands/tasks.ts

* refactor: keep diagnostics in state sqlite

* refactor: enforce database-first runtime state

* refactor: consolidate sqlite runtime state

* Show Codex subscription reset times in channel errors (#80456)

* fix(codex): refresh subscription limit resets

* fix(codex): format reset times for channels

* Update CHANGELOG with latest changes and fixes

Updated CHANGELOG with recent fixes and improvements.

* fix(codex): keep command load failures on codex surface

* fix(codex): format account rate limits as rows

* fix(codex): summarize account limits as usage status

* fix(codex): simplify account limit status

* test: tighten subagent announce queue assertion

* test: tighten session delete lifecycle assertions

* test: tighten cron ops assertions

* fix: track cron execution milestones

* test: tighten hermes secret assertions

* test: assert matrix sync store payloads

* test: assert config compat warnings

* fix(codex): align btw side thread semantics

* fix(codex): honor codex fallback blocking

* fix(agents): avoid Pi resource discovery stalls

* test: tighten codex event assertions

* test: tighten cron assertions

* Fix Codex app-server OAuth harness auth

* refactor: move agent runtime state toward piless

* refactor: move device and push state to sqlite

* refactor: move runtime json state imports to doctor

* refactor: finish database-first state migration

* chore: refresh generated sqlite db types

* refactor: clarify cron sqlite store keys

* refactor: remove stale file-backed shims

* refactor: bind codex runtime state by session id

* test: expect sqlite trajectory branch export

* refactor: rename session row helpers

* fix: keep legacy device identity import in doctor

* refactor: enforce database-first runtime state

* refactor: consolidate sqlite runtime state

* build: align pi contract wrappers

* chore: repair database-first rebase

* refactor: remove session file test contracts

* test: update gateway session expectations

* refactor: stop routing from session compatibility shadows

* refactor: stop persisting session route shadows

* refactor: use typed delivery context in clients

* refactor: stop echoing session route shadows

* refactor: repair embedded runner rebase imports

# Conflicts:
#	src/agents/pi-embedded-runner/run/attempt.tool-call-argument-repair.ts

* refactor: align pi contract imports

* refactor: satisfy kysely sync helper guard

* refactor: remove file transcript bridge remnants

* refactor: remove session locator compatibility

* refactor: remove session file test contracts

* refactor: keep rebase database-first clean

* refactor: remove session file assumptions from e2e

* docs: clarify database-first goal state

* test: remove legacy store markers from sqlite runtime tests

* refactor: remove legacy store assumptions from runtime seams

* refactor: align sqlite runtime helper seams

* test: update memory recall sqlite audit mock

* refactor: align database-first runtime type seams

* test: clarify doctor cron legacy store names

* fix: preserve sqlite session route projections

* test: fix copilot token cache test syntax

* docs: update database-first proof status

* test: align database-first test fixtures

* docs: update database-first proof status

* refactor: clean extension database-first drift

* test: align agent session route proof

* test: clarify doctor legacy path fixtures

* chore: clean database-first changed checks

* chore: repair database-first rebase markers

* build: allow baileys git subdependency

* chore: repair exp-vfs rebase drift

* chore: finish exp-vfs rebase cleanup

* chore: satisfy rebase lint drift

* chore: fix qqbot rebase type seam

* chore: fix rebase drift leftovers

* fix: keep auth profile oauth secrets out of sqlite

* fix: repair rebase drift tests

* test: stabilize pairing request ordering

* test: use source manifests in plugin contract checks

* fix: restore gateway session metadata after rebase

* fix: repair database-first rebase drift

* fix: clean up database-first rebase fallout

* test: stabilize line quick reply receipt time

* fix: repair extension rebase drift

* test: keep transcript redaction tests sqlite-backed

* fix: carry injected transcript redaction through sqlite

* chore: clean database branch rebase residue

* fix: repair database branch CI drift

* fix: repair database branch CI guard drift

* fix: stabilize oauth tls preflight test

* test: align database branch fast guards

* test: repair build artifact boundary guards

* chore: clean changelog rebase markers

---------

Co-authored-by: pashpashpash <nik@vault77.ai>
Co-authored-by: Eva <eva@100yen.org>
Co-authored-by: stainlu <stainlu@newtype-ai.org>
Co-authored-by: Jason Zhou <jason.zhou.design@gmail.com>
Co-authored-by: Ruben Cuevas <hi@rubencu.com>
Co-authored-by: Pavan Kumar Gondhi <pavangondhi@gmail.com>
Co-authored-by: Shakker <shakkerdroid@gmail.com>
Co-authored-by: Kaspre <36520309+Kaspre@users.noreply.github.com>
Co-authored-by: dataCenter430 <titan032000@gmail.com>
Co-authored-by: Kaspre <kaspre@gmail.com>
Co-authored-by: pandadev66 <nova.full.stack@outlook.com>
Co-authored-by: Eva <admin@100yen.org>
Co-authored-by: Eva (agent) <eva+agent-78055@100yen.org>
Co-authored-by: Josh Lehman <josh@martian.engineering>
Co-authored-by: jeffjhunter <support@aipersonamethod.com>

2026-05-13 13:15:12 +01:00

12 KiB

Raw Blame History

summary, title, sidebarTitle, read_when

summary

title

sidebarTitle

read_when

Experimental SDK surface for plugins that replace the low level embedded agent executor

Agent harness plugins

Agent Harness

You are changing the embedded agent runtime or harness registry

You are registering an agent harness from a bundled or trusted plugin

You need to understand how the Codex plugin relates to model providers

An agent harness is the low level executor for one prepared OpenClaw agent turn. It is not a model provider, not a channel, and not a tool registry. For the user-facing mental model, see Agent runtimes.

Use this surface only for bundled or trusted native plugins. The contract is still experimental because the parameter types intentionally mirror the current embedded runner.

When to use a harness

Register an agent harness when a model family has its own native session runtime and the normal OpenClaw provider transport is the wrong abstraction.

Examples:

a native coding-agent server that owns threads and compaction
a local CLI or daemon that must stream native plan/reasoning/tool events
a model runtime that needs its own resume id in addition to the OpenClaw session transcript

Do not register a harness just to add a new LLM API. For normal HTTP or WebSocket model APIs, build a provider plugin.

What core still owns

Before a harness is selected, OpenClaw has already resolved:

provider and model
runtime auth state
thinking level and context budget
the OpenClaw session scope and SQLite transcript rows
workspace, sandbox, and tool policy
channel reply callbacks and streaming callbacks
model fallback and live model switching policy

That split is intentional. A harness runs a prepared attempt; it does not pick providers, replace channel delivery, or silently switch models.

The prepared attempt also includes params.runtimePlan, an OpenClaw-owned policy bundle for runtime decisions that must stay shared across PI and native harnesses:

runtimePlan.tools.normalize(...) and runtimePlan.tools.logDiagnostics(...) for provider-aware tool schema policy
runtimePlan.transcript.resolvePolicy(...) for transcript sanitization and tool-call repair policy
runtimePlan.delivery.isSilentPayload(...) for shared NO_REPLY and media delivery suppression
runtimePlan.outcome.classifyRunResult(...) for model fallback classification
runtimePlan.observability for resolved provider/model/harness metadata

Harnesses may use the plan for decisions that need to match PI behavior, but should still treat it as host-owned attempt state. Do not mutate it or use it to switch providers/models inside a turn.

Register a harness

Import: openclaw/plugin-sdk/agent-harness

import type { AgentHarness } from "openclaw/plugin-sdk/agent-harness";
import { definePluginEntry } from "openclaw/plugin-sdk/plugin-entry";

const myHarness: AgentHarness = {
  id: "my-harness",
  label: "My native agent harness",

  supports(ctx) {
    return ctx.provider === "my-provider"
      ? { supported: true, priority: 100 }
      : { supported: false };
  },

  async runAttempt(params) {
    // Start or resume your native thread.
    // Use params.prompt, params.tools, params.images, params.onPartialReply,
    // params.onAgentEvent, and the other prepared attempt fields.
    return await runMyNativeTurn(params);
  },
};

export default definePluginEntry({
  id: "my-native-agent",
  name: "My Native Agent",
  description: "Runs selected models through a native agent daemon.",
  register(api) {
    api.registerAgentHarness(myHarness);
  },
});

Selection policy

OpenClaw chooses a harness after provider/model resolution:

Model-scoped runtime policy wins.
Provider-scoped runtime policy comes next.
auto asks registered harnesses if they support the resolved provider/model.
If no registered harness matches, OpenClaw uses PI unless PI fallback is disabled.

Plugin harness failures surface as run failures. In auto mode, PI fallback is only used when no registered plugin harness supports the resolved provider/model. Once a plugin harness has claimed a run, OpenClaw does not replay that same turn through PI because that can change auth/runtime semantics or duplicate side effects.

Whole-session and whole-agent runtime pins are ignored by selection. That includes stale session agentHarnessId values, agents.defaults.agentRuntime, agents.list[].agentRuntime, and OPENCLAW_AGENT_RUNTIME. /status shows the effective runtime selected from the provider/model route. If the selected harness is surprising, enable agents/harness debug logging and inspect the gateway's structured agent harness selected record. It includes the selected harness id, selection reason, runtime/fallback policy, and, in auto mode, each plugin candidate's support result.

The bundled Codex plugin registers codex as its harness id. Core treats that as an ordinary plugin harness id; Codex-specific aliases belong in the plugin or operator config, not in the shared runtime selector.

Provider plus harness pairing

Most harnesses should also register a provider. The provider makes model refs, auth status, model metadata, and /model selection visible to the rest of OpenClaw. The harness then claims that provider in supports(...).

The bundled Codex plugin follows this pattern:

preferred user model refs: openai/gpt-5.5
compatibility refs: legacy codex/gpt-* refs remain accepted, but new configs should not use them as normal provider/model refs
harness id: codex
auth: synthetic provider availability, because the Codex harness owns the native Codex login/session
app-server request: OpenClaw sends the bare model id to Codex and lets the harness talk to the native app-server protocol

The Codex plugin is additive. Plain openai/gpt-* agent refs on the official OpenAI provider select the Codex harness by default. Older codex/gpt-* refs still select the Codex provider and harness for compatibility.

For operator setup, model prefix examples, and Codex-only configs, see Codex Harness.

OpenClaw requires Codex app-server 0.125.0 or newer. The Codex plugin checks the app-server initialize handshake and blocks older or unversioned servers so OpenClaw only runs against the protocol surface it has been tested with. The 0.125.0 floor includes the native MCP hook payload support that landed in Codex 0.124.0, while pinning OpenClaw to the newer tested stable line.

Tool-result middleware

Bundled plugins can attach runtime-neutral tool-result middleware through api.registerAgentToolResultMiddleware(...) when their manifest declares the targeted runtime ids in contracts.agentToolResultMiddleware. This trusted seam is for async tool-result transforms that must run before PI or Codex feeds tool output back into the model.

Legacy bundled plugins can still use api.registerCodexAppServerExtensionFactory(...) for Codex app-server-only middleware, but new result transforms should use the runtime-neutral API. The Pi-only api.registerEmbeddedExtensionFactory(...) hook has been removed; Pi tool-result transforms must use runtime-neutral middleware.

Terminal outcome classification

Native harnesses that own their own protocol projection can use classifyAgentHarnessTerminalOutcome(...) from openclaw/plugin-sdk/agent-harness-runtime when a completed turn produced no visible assistant text. The helper returns empty, reasoning-only, or planning-only so OpenClaw's fallback policy can decide whether to retry on a different model. It intentionally leaves prompt errors, in-flight turns, and intentional silent replies such as NO_REPLY unclassified.

Native Codex harness mode

The bundled codex harness is the native Codex mode for embedded OpenClaw agent turns. Enable the bundled codex plugin first, and include codex in plugins.allow if your config uses a restrictive allowlist. Native app-server configs should use openai/gpt-*; OpenAI agent turns select the Codex harness by default. Legacy openai-codex/* routes should be repaired with openclaw doctor --fix, and legacy codex/* model refs remain compatibility aliases for the native harness.

When this mode runs, Codex owns the native thread id, resume behavior, compaction, and app-server execution. OpenClaw still owns the chat channel, visible transcript mirror, tool policy, approvals, media delivery, and session selection. Use provider/model agentRuntime.id: "codex" when you need to prove that only the Codex app-server path can claim the run. Explicit plugin runtimes fail closed; Codex app-server selection failures and runtime failures are not retried through PI.

Runtime strictness

By default, OpenClaw uses auto provider/model runtime policy: registered plugin harnesses can claim a provider/model pair, and PI handles the turn when none match. OpenAI agent refs on the official OpenAI provider default to Codex. Use an explicit provider/model plugin runtime such as agentRuntime.id: "codex" when missing harness selection should fail instead of routing through PI. Selected plugin harness failures always fail hard. This does not block an explicit provider/model agentRuntime.id: "pi".

For Codex-only embedded runs:

{
  "models": {
    "providers": {
      "openai": {
        "agentRuntime": {
          "id": "codex"
        }
      }
    }
  },
  "agents": {
    "defaults": {
      "model": "openai/gpt-5.5"
    }
  }
}

If you want a CLI backend for one canonical model, put the runtime on that model entry:

{
  "agents": {
    "defaults": {
      "model": "anthropic/claude-opus-4-7",
      "models": {
        "anthropic/claude-opus-4-7": {
          "agentRuntime": {
            "id": "claude-cli"
          }
        }
      }
    }
  }
}

Per-agent overrides use the same model-scoped shape:

{
  "agents": {
    "list": [
      {
        "id": "codex-only",
        "model": "openai/gpt-5.5",
        "models": {
          "openai/gpt-5.5": {
            "agentRuntime": { "id": "codex" }
          }
        }
      }
    ]
  }
}

Legacy whole-agent runtime examples like this are ignored:

{
  "agents": {
    "defaults": {
      "agentRuntime": {
        "id": "codex"
      }
    }
  }
}

With an explicit plugin runtime, a session fails early when the requested harness is not registered, does not support the resolved provider/model, or fails before producing turn side effects. That is intentional for Codex-only deployments and for live tests that must prove the Codex app-server path is actually in use.

This setting only controls the embedded agent harness. It does not disable image, video, music, TTS, PDF, or other provider-specific model routing.

Native sessions and transcript mirror

A harness may keep a native session id, thread id, or daemon-side resume token. Keep that binding explicitly associated with the OpenClaw session, and keep mirroring user-visible assistant/tool output into the OpenClaw transcript.

The OpenClaw transcript remains the compatibility layer for:

channel-visible session history
transcript search and indexing
switching back to the built-in PI harness on a later turn
generic /new, /reset, and session deletion behavior

If your harness stores a sidecar binding, implement reset(...) so OpenClaw can clear it when the owning OpenClaw session is reset.

Tool and media results

Core constructs the OpenClaw tool list and passes it into the prepared attempt. When a harness executes a dynamic tool call, return the tool result back through the harness result shape instead of sending channel media yourself.

This keeps text, image, video, music, TTS, approval, and messaging-tool outputs on the same delivery path as PI-backed runs.

Current limitations

The public import path is generic, but some attempt/result type aliases still carry Pi names for compatibility.
Third-party harness installation is experimental. Prefer provider plugins until you need a native session runtime.
Harness switching is supported across turns. Do not switch harnesses in the middle of a turn after native tools, approvals, assistant text, or message sends have started.

12 KiB Raw Blame History