Make harness failures fail honestly (#69981)

* Agents: fail honestly on harness errors

* Docs: clarify Codex harness fallback
pashpashpash
2026-04-21 22:33:21 -07:00
committed by GitHub
parent a0ccf69259
commit dc4e97472d
10 changed files with 55 additions and 54 deletions

@@ -8,7 +8,7 @@ on:
required: true
pull_request:
paths:
- - '.github/workflows/**'
+ - ".github/workflows/**"
permissions:
contents: read

@@ -33,6 +33,7 @@ Docs: https://docs.openclaw.ai
  - Doctor/channels: merge configured-channel doctor hooks across read-only, loaded, setup, and runtime plugin discovery so partial adapters no longer hide runtime-only compatibility repair or allowlist warnings, preserve disabled-channel opt-outs, and ignore malformed hook values before they can mask valid fallbacks. (#69919) Thanks @gumadeiras.
  - Models/CLI: show bundled provider-owned static catalog rows in `models list --all` before auth is configured, including Kimi K2.6 rows for Moonshot, OpenRouter, and Vercel AI Gateway, while keeping local-only and workspace plugin catalog paths isolated. (#69909) Thanks @shakkernerd.
  - Configure: skip generic CLI startup bootstrap for `openclaw configure` and bound hint-only gateway probes so the onboarding TUI reaches its first prompt faster when the Gateway is unavailable. (#69984) Thanks @obviyus.
+ - Agents/harness: surface selected plugin harness failures directly instead of replaying the same turn through embedded PI, preventing misleading secondary PI auth errors and avoiding duplicate side effects.
  ## 2026.4.21

@@ -1,4 +1,4 @@
- 1022a9497ff0481675c483742b8e92c6063e53c9bb3e5c5c3bd39300cf2e1f31 config-baseline.json
- 7956c319e82d288d496a51cb2ff4485ab72ef4900cb089f99e1df8b9ef3bfb73 config-baseline.core.json
+ 7c1b8b34618f44d56817ff54b930701710087dc7e76beaf4a554b6a5a25ba87c config-baseline.json
+ ed0c093e8acab2364608be3e65b98836600aea07df73ebb51d11919969c6c8fe config-baseline.core.json
  6c0069b971ae298ae68516ebcd3eae0e8c82820d2e8f42ecbd2f53a2f9077371 config-baseline.channel.json
- a7f297a3461e807fd15f8a7c8c68e41071dfc09af2118c24a26d5f534301a654 config-baseline.plugin.json
+ e5b7756b5f45ba227aa1bfab990dcf8a2a8b409b9ca01ea8bb1d5cd7adc06c90 config-baseline.plugin.json

@@ -1266,7 +1266,7 @@ Codex app-server harness.
```
  - `runtime`: `"auto"`, `"pi"`, or a registered plugin harness id. The bundled Codex plugin registers `codex`.
- - `fallback`: `"pi"` or `"none"`. `"pi"` keeps the built-in PI harness as the compatibility fallback. `"none"` makes missing or unsupported plugin harness selection fail instead of silently using PI.
+ - `fallback`: `"pi"` or `"none"`. `"pi"` keeps the built-in PI harness as the compatibility fallback when no plugin harness is selected. `"none"` makes missing or unsupported plugin harness selection fail instead of silently using PI. Selected plugin harness failures always surface directly.
  - Environment overrides: `OPENCLAW_AGENT_RUNTIME=<id|auto|pi>` overrides `runtime`; `OPENCLAW_AGENT_HARNESS_FALLBACK=none` disables PI fallback for that process.
  - For Codex-only deployments, set `model: "codex/gpt-5.4"`, `embeddedHarness.runtime: "codex"`, and `embeddedHarness.fallback: "none"`.
  - This only controls the embedded chat harness. Media generation, vision, PDF, music, video, and TTS still use their provider/model settings.
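
Editor's sketch, not part of the commit: one way the documented `runtime` and `fallback` settings could combine with the two environment overrides. The helper name `resolvePolicySketch` and its types are hypothetical, and the treatment of empty or unrecognized environment values is an assumption. Only the option names, the environment variable names, and the `{ runtime: "auto", fallback: "pi" }` defaults come from the docs in this diff.

```typescript
// Hypothetical types and helper for illustration; these are not OpenClaw's real API.
type FallbackSketch = "pi" | "none";

interface PolicySketch {
  runtime: string; // "auto", "pi", or a registered plugin harness id such as "codex"
  fallback: FallbackSketch;
}

function resolvePolicySketch(
  config: Partial<PolicySketch>,
  env: Record<string, string | undefined>,
): PolicySketch {
  // OPENCLAW_AGENT_RUNTIME overrides the configured runtime; the documented default is "auto".
  const runtime = env.OPENCLAW_AGENT_RUNTIME?.trim() || config.runtime || "auto";
  // Only the documented value "none" is honored from the environment (assumption:
  // any other value leaves the configured fallback in place). Documented default is "pi".
  const fallback: FallbackSketch =
    env.OPENCLAW_AGENT_HARNESS_FALLBACK === "none" ? "none" : (config.fallback ?? "pi");
  return { runtime, fallback };
}

// Codex-only deployment as described in the bullet list above.
const codexOnlyPolicy = resolvePolicySketch({ runtime: "codex", fallback: "none" }, {});

// The environment override wins over a configured fallback of "pi".
const envOverridePolicy = resolvePolicySketch(
  { fallback: "pi" },
  { OPENCLAW_AGENT_HARNESS_FALLBACK: "none" },
);
```

In this sketch the environment is passed in as a plain record rather than read from `process.env`, which keeps the helper easy to exercise in isolation.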

@@ -469,8 +469,12 @@ understanding continue to use the matching provider/model settings such as
**Codex does not appear in `/model`:** enable `plugins.entries.codex.enabled`,
set a `codex/*` model ref, or check whether `plugins.allow` excludes `codex`.
- **OpenClaw falls back to PI:** set `embeddedHarness.fallback: "none"` or
- `OPENCLAW_AGENT_HARNESS_FALLBACK=none` while testing.
+ **OpenClaw uses PI instead of Codex:** if no Codex harness claims the run,
+ OpenClaw may use PI as the compatibility backend. Set
+ `embeddedHarness.runtime: "codex"` to force Codex selection while testing, or
+ `embeddedHarness.fallback: "none"` to fail when no plugin harness matches. Once
+ Codex app-server is selected, its failures surface directly without extra
+ fallback config.
**The app-server is rejected:** upgrade Codex so the app-server handshake
reports version `0.118.0` or newer.

@@ -94,10 +94,11 @@ OpenClaw chooses a harness after provider/model resolution:
4. If no registered harness matches, OpenClaw uses PI unless PI fallback is
disabled.
- Forced plugin harness failures surface as run failures. In `auto` mode,
- OpenClaw may fall back to PI when the selected plugin harness fails before a
- turn has produced side effects. Set `OPENCLAW_AGENT_HARNESS_FALLBACK=none` or
- `embeddedHarness.fallback: "none"` to make that fallback a hard failure instead.
+ Plugin harness failures surface as run failures. In `auto` mode, PI fallback is
+ only used when no registered plugin harness supports the resolved
+ provider/model. Once a plugin harness has claimed a run, OpenClaw does not
+ replay that same turn through PI because that can change auth/runtime semantics
+ or duplicate side effects.
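
Editor's sketch, not part of the commit: a simplified, synchronous model of the behavior the new paragraph describes. PI is chosen only at selection time, and an error from a harness that has claimed the run propagates instead of triggering a replay. The interface `HarnessSketch` and the function `runAttemptSketch` are hypothetical. The real `runAttempt` is asynchronous and takes richer parameters.

```typescript
// Hypothetical, simplified harness shape; the real API is asynchronous.
interface HarnessSketch {
  id: string;
  supports(provider: string): boolean;
  runAttempt(prompt: string): string;
}

function runAttemptSketch(
  plugins: HarnessSketch[],
  piHarness: HarnessSketch,
  provider: string,
  prompt: string,
): string {
  // PI is only picked here, when no plugin harness claims the provider.
  const harness = plugins.find((h) => h.supports(provider)) ?? piHarness;
  // No catch-and-retry through PI: a claimed harness's error reaches the caller.
  return harness.runAttempt(prompt);
}

const harnessCalls: string[] = [];

const piSketch: HarnessSketch = {
  id: "pi",
  supports: () => true,
  runAttempt: (prompt) => {
    harnessCalls.push("pi");
    return `pi:${prompt}`;
  },
};

const failingCodexSketch: HarnessSketch = {
  id: "codex",
  supports: (provider) => provider === "codex",
  runAttempt: () => {
    harnessCalls.push("codex");
    throw new Error("codex startup failed");
  },
};

// Claimed run: the Codex failure surfaces and PI is never invoked for this turn.
let surfacedMessage = "";
try {
  runAttemptSketch([failingCodexSketch], piSketch, "codex", "hello");
} catch (error) {
  surfacedMessage = error instanceof Error ? error.message : String(error);
}

// Unclaimed run: no plugin harness supports the provider, so PI handles it.
const unclaimedResult = runAttemptSketch([failingCodexSketch], piSketch, "other-provider", "hello");
```

The point of the design is that a turn runs at most once: a replay through a second backend could repeat side effects the first harness already produced.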
The bundled Codex plugin registers `codex` as its harness id. Core treats that
as an ordinary plugin harness id; Codex-specific aliases belong in the plugin
@@ -149,19 +150,20 @@ When this mode runs, Codex owns the native thread id, resume behavior,
compaction, and app-server execution. OpenClaw still owns the chat channel,
visible transcript mirror, tool policy, approvals, media delivery, and session
selection. Use `embeddedHarness.runtime: "codex"` with
- `embeddedHarness.fallback: "none"` when you need to prove that the Codex
- app-server path is used and PI fallback is not hiding a broken native harness.
+ `embeddedHarness.fallback: "none"` when you need to prove that only the Codex
+ app-server path can claim the run. That config is only a selection guard:
+ Codex app-server failures already fail directly instead of retrying through PI.
## Disable PI fallback
By default, OpenClaw runs embedded agents with `agents.defaults.embeddedHarness`
set to `{ runtime: "auto", fallback: "pi" }`. In `auto` mode, registered plugin
- harnesses can claim a provider/model pair. If none match, or if an auto-selected
- plugin harness fails before producing output, OpenClaw falls back to PI.
+ harnesses can claim a provider/model pair. If none match, OpenClaw falls back
+ to PI.
- Set `fallback: "none"` when you need to prove that a plugin harness is the only
- runtime being exercised. This disables automatic PI fallback; it does not block
- an explicit `runtime: "pi"` or `OPENCLAW_AGENT_RUNTIME=pi`.
+ Set `fallback: "none"` when you need missing plugin harness selection to fail
+ instead of using PI. Selected plugin harness failures already fail hard. This
+ does not block an explicit `runtime: "pi"` or `OPENCLAW_AGENT_RUNTIME=pi`.
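
Editor's sketch, not part of the commit: how `fallback: "none"` could act purely as a selection guard. The function `selectHarnessIdSketch` and its input shape are hypothetical, and the handling of a forced but unregistered plugin id is an assumption based on the "missing or unsupported plugin harness selection" wording. The error text reuses the substring that the updated test in this diff expects.

```typescript
// Hypothetical input shape for illustration only.
interface SelectionSketch {
  runtime: string; // "auto", "pi", or a plugin harness id
  fallback: "pi" | "none";
  registered: string[]; // ids of registered plugin harnesses
  claimedBy?: string; // plugin harness supporting the resolved provider/model, if any
}

function selectHarnessIdSketch(input: SelectionSketch): string {
  // An explicit PI runtime is never blocked by fallback: "none".
  if (input.runtime === "pi") return "pi";

  let pluginId: string | undefined;
  if (input.runtime === "auto") {
    pluginId = input.claimedBy;
  } else if (input.registered.includes(input.runtime)) {
    pluginId = input.runtime;
  }
  if (pluginId !== undefined) return pluginId;

  // Nothing was selected: fallback decides between PI and a hard failure.
  if (input.fallback === "none") throw new Error("PI fallback is disabled");
  return "pi";
}

const autoNoMatch = selectHarnessIdSketch({ runtime: "auto", fallback: "pi", registered: ["codex"] });
const autoClaimed = selectHarnessIdSketch({
  runtime: "auto",
  fallback: "none",
  registered: ["codex"],
  claimedBy: "codex",
});
const explicitPi = selectHarnessIdSketch({ runtime: "pi", fallback: "none", registered: [] });

let guardMessage = "";
try {
  selectHarnessIdSketch({ runtime: "auto", fallback: "none", registered: ["codex"] });
} catch (error) {
  guardMessage = error instanceof Error ? error.message : String(error);
}
```

Note that `fallback` is consulted only when no harness was selected. It plays no role once a plugin harness owns the run.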
For Codex-only embedded runs:

@@ -105,9 +105,8 @@ describe("runAgentHarnessAttemptWithFallback", () => {
expect(piRunAttempt).toHaveBeenCalledTimes(1);
});
- it("falls back to the PI harness in auto mode when the selected plugin harness fails", async () => {
+ it("falls back to the PI harness in auto mode when no plugin harness matches", async () => {
process.env.OPENCLAW_AGENT_RUNTIME = "auto";
- registerFailingCodexHarness();
const result = await runAgentHarnessAttemptWithFallback(createAttemptParams());
@@ -115,6 +114,16 @@ describe("runAgentHarnessAttemptWithFallback", () => {
expect(piRunAttempt).toHaveBeenCalledTimes(1);
});
+ it("surfaces an auto-selected plugin harness failure instead of replaying through PI", async () => {
+ process.env.OPENCLAW_AGENT_RUNTIME = "auto";
+ registerFailingCodexHarness();
+ await expect(runAgentHarnessAttemptWithFallback(createAttemptParams())).rejects.toThrow(
+ "codex startup failed",
+ );
+ expect(piRunAttempt).not.toHaveBeenCalled();
+ });
it("surfaces a forced plugin harness failure instead of replaying through PI", async () => {
process.env.OPENCLAW_AGENT_RUNTIME = "codex";
registerFailingCodexHarness();
@@ -125,26 +134,15 @@ describe("runAgentHarnessAttemptWithFallback", () => {
expect(piRunAttempt).not.toHaveBeenCalled();
});
- it("disables PI retry fallback when auto-selected harness fails and fallback is none", async () => {
- process.env.OPENCLAW_AGENT_RUNTIME = "auto";
- registerFailingCodexHarness();
- await expect(
- runAgentHarnessAttemptWithFallback(
- createAttemptParams({ agents: { defaults: { embeddedHarness: { fallback: "none" } } } }),
- ),
- ).rejects.toThrow("codex startup failed");
- expect(piRunAttempt).not.toHaveBeenCalled();
- });
it("honors env fallback override over config fallback", async () => {
process.env.OPENCLAW_AGENT_RUNTIME = "auto";
process.env.OPENCLAW_AGENT_HARNESS_FALLBACK = "none";
- registerFailingCodexHarness();
- await expect(runAgentHarnessAttemptWithFallback(createAttemptParams())).rejects.toThrow(
- "codex startup failed",
- );
+ await expect(
+ runAgentHarnessAttemptWithFallback(
+ createAttemptParams({ agents: { defaults: { embeddedHarness: { fallback: "pi" } } } }),
+ ),
+ ).rejects.toThrow("PI fallback is disabled");
expect(piRunAttempt).not.toHaveBeenCalled();
});
});

@@ -1,5 +1,6 @@
import type { AgentEmbeddedHarnessConfig } from "../../config/types.agents-shared.js";
import type { OpenClawConfig } from "../../config/types.openclaw.js";
+ import { formatErrorMessage } from "../../infra/errors.js";
import { createSubsystemLogger } from "../../logging/subsystem.js";
import { normalizeAgentId } from "../../routing/session-key.js";
import { listAgentEntries, resolveSessionAgentIds } from "../agent-scope.js";
@@ -108,13 +109,6 @@ export function selectAgentHarness(params: {
export async function runAgentHarnessAttemptWithFallback(
params: EmbeddedRunAttemptParams,
): Promise<EmbeddedRunAttemptResult> {
- const policy = resolveAgentHarnessPolicy({
- provider: params.provider,
- modelId: params.modelId,
- config: params.config,
- agentId: params.agentId,
- sessionKey: params.sessionKey,
- });
const harness = selectAgentHarness({
provider: params.provider,
modelId: params.modelId,
@@ -129,11 +123,13 @@ export async function runAgentHarnessAttemptWithFallback(
try {
return await harness.runAttempt(params);
} catch (error) {
- if (policy.runtime !== "auto" || policy.fallback === "none") {
- throw error;
- }
- log.warn(`${harness.label} failed; falling back to embedded PI backend`, { error });
- return createPiAgentHarness().runAttempt(params);
+ log.warn(`${harness.label} failed; not falling back to embedded PI backend`, {
+ harnessId: harness.id,
+ provider: params.provider,
+ modelId: params.modelId,
+ error: formatErrorMessage(error),
+ });
+ throw error;
}
}

@@ -3023,7 +3023,7 @@ export const GENERATED_BASE_CONFIG_SCHEMA: BaseConfigSchemaResponse = {
enum: ["pi", "none"],
title: "Default Embedded Harness Fallback",
  description:
- "Embedded harness fallback when no plugin harness matches or an auto-selected plugin harness fails before side effects. Set none to disable automatic PI fallback.",
+ "Embedded harness fallback when no plugin harness matches. Selected plugin harness failures surface directly. Set none to disable automatic PI fallback.",
},
},
additionalProperties: false,
@@ -5721,7 +5721,7 @@ export const GENERATED_BASE_CONFIG_SCHEMA: BaseConfigSchemaResponse = {
additionalProperties: false,
title: "Agent Embedded Harness",
  description:
- "Per-agent embedded harness policy override. Use fallback=none to make this agent fail instead of falling back to PI.",
+ "Per-agent embedded harness policy override. Use fallback=none to make missing plugin harness selection fail instead of falling back to PI.",
},
model: {
anyOf: [
@@ -23416,7 +23416,7 @@ export const GENERATED_BASE_CONFIG_SCHEMA: BaseConfigSchemaResponse = {
},
"agents.defaults.embeddedHarness.fallback": {
label: "Default Embedded Harness Fallback",
- help: "Embedded harness fallback when no plugin harness matches or an auto-selected plugin harness fails before side effects. Set none to disable automatic PI fallback.",
+ help: "Embedded harness fallback when no plugin harness matches. Selected plugin harness failures surface directly. Set none to disable automatic PI fallback.",
tags: ["reliability"],
},
"agents.list": {
@@ -23461,7 +23461,7 @@ export const GENERATED_BASE_CONFIG_SCHEMA: BaseConfigSchemaResponse = {
},
"agents.list.*.embeddedHarness": {
label: "Agent Embedded Harness",
- help: "Per-agent embedded harness policy override. Use fallback=none to make this agent fail instead of falling back to PI.",
+ help: "Per-agent embedded harness policy override. Use fallback=none to make missing plugin harness selection fail instead of falling back to PI.",
tags: ["advanced"],
},
"agents.list.*.embeddedHarness.runtime": {

@@ -1145,9 +1145,9 @@ export const FIELD_HELP: Record<string, string> = {
  "agents.defaults.embeddedHarness.runtime":
  "Embedded harness runtime: auto, pi, or a registered plugin harness id such as codex.",
  "agents.defaults.embeddedHarness.fallback":
- "Embedded harness fallback when no plugin harness matches or an auto-selected plugin harness fails before side effects. Set none to disable automatic PI fallback.",
+ "Embedded harness fallback when no plugin harness matches. Selected plugin harness failures surface directly. Set none to disable automatic PI fallback.",
  "agents.list.*.embeddedHarness":
- "Per-agent embedded harness policy override. Use fallback=none to make this agent fail instead of falling back to PI.",
+ "Per-agent embedded harness policy override. Use fallback=none to make missing plugin harness selection fail instead of falling back to PI.",
"agents.list.*.embeddedHarness.runtime":
"Per-agent embedded harness runtime: auto, pi, or a registered plugin harness id such as codex.",
"agents.list.*.embeddedHarness.fallback":