mirror of
https://github.com/openclaw/openclaw.git
synced 2026-05-06 11:50:43 +00:00
docs: typography hygiene + 2 in-body H1 removals across 5 pages
Replaced 112 typography characters (curly quotes, apostrophes, em/en dashes, non-breaking hyphens) with ASCII equivalents per docs/CLAUDE.md heading and content hygiene rules.

- docs/help/gpt55-codex-agentic-parity.md: 22 chars; removed the duplicate '# GPT-5.5 / Codex Agentic Parity in OpenClaw' H1 (Mintlify renders the title from frontmatter; the in-body H1 with the slash produced a brittle anchor).
- docs/platforms/mac/menu-bar.md: 21 chars; removed the duplicate '# Menu Bar Status Logic' H1.
- docs/tools/acp-agents.md: 23 chars
- docs/concepts/qa-matrix.md: 23 chars
- docs/concepts/qa-e2e-automation.md: 23 chars
@@ -7,8 +7,6 @@ read_when:
 - Reviewing the strict-agentic, tool-schema, elevation, and replay fixes
 ---
 
-# GPT-5.5 / Codex Agentic Parity in OpenClaw
-
 OpenClaw already worked well with tool-using frontier models, but GPT-5.5 and Codex-style models were still underperforming in a few practical ways:
 
 - they could stop after planning instead of doing the work
@@ -25,11 +23,11 @@ This parity program fixes those gaps in four reviewable slices.
 
 This slice adds an opt-in `strict-agentic` execution contract for embedded Pi GPT-5 runs.
 
-When enabled, OpenClaw stops accepting plan-only turns as “good enough” completion. If the model only says what it intends to do and does not actually use tools or make progress, OpenClaw retries with an act-now steer and then fails closed with an explicit blocked state instead of silently ending the task.
+When enabled, OpenClaw stops accepting plan-only turns as "good enough" completion. If the model only says what it intends to do and does not actually use tools or make progress, OpenClaw retries with an act-now steer and then fails closed with an explicit blocked state instead of silently ending the task.
 
 This improves the GPT-5.5 experience most on:
 
-- short “ok do it” follow-ups
+- short "ok do it" follow-ups
 - code tasks where the first step is obvious
 - flows where `update_plan` should be progress tracking rather than filler text
 
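Reviewer note: the retry-then-fail-closed behavior described in the hunk above can be pictured with a small sketch. This is only an illustration under stated assumptions, not OpenClaw's implementation: `Turn`, `Outcome`, `run_strict_agentic`, the steer text, and the single-retry default are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Turn:
    """One model turn (hypothetical shape, not an OpenClaw type)."""
    text: str
    tool_calls: int  # number of tool invocations made during the turn


@dataclass
class Outcome:
    """Result reported to the caller (hypothetical shape)."""
    status: str  # "acted" or "blocked"
    reason: Optional[str] = None


# Assumed wording; the real act-now steer is not shown in the source.
ACT_NOW_STEER = "Do not restate the plan; take the first concrete action now."


def run_strict_agentic(
    run_turn: Callable[[Optional[str]], Turn],
    max_steers: int = 1,
) -> Outcome:
    """Reject plan-only turns: steer, retry, then fail closed."""
    steer: Optional[str] = None
    for _ in range(max_steers + 1):
        turn = run_turn(steer)
        if turn.tool_calls > 0:
            # The model actually used a tool, so the turn counts as progress.
            return Outcome(status="acted")
        # Plan-only turn: retry with an explicit act-now steer.
        steer = ACT_NOW_STEER
    # Still plan-only after steering: surface an explicit blocked state
    # instead of silently ending the task.
    return Outcome(status="blocked", reason="plan-only turns after act-now steer")
```

With a model stub that never calls a tool, the function returns a `blocked` outcome after one steered retry; a stub that acts once steered returns `acted`.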
@@ -86,21 +84,21 @@ The goal is not to make GPT-5.5 imitate Opus. The goal is to give GPT-5.5 a runt
 
 That changes the user experience from:
 
-- “the model had a good plan but stopped”
+- "the model had a good plan but stopped"
 
 to:
 
-- “the model either acted, or OpenClaw surfaced the exact reason it could not”
+- "the model either acted, or OpenClaw surfaced the exact reason it could not"
 
 ## Before vs after for GPT-5.5 users
 
 | Before this program | After PR A-D |
 | ---------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------- |
-| GPT-5.5 could stop after a reasonable plan without taking the next tool step | PR A turns “plan only” into “act now or surface a blocked state” |
+| GPT-5.5 could stop after a reasonable plan without taking the next tool step | PR A turns "plan only" into "act now or surface a blocked state" |
 | Strict tool schemas could reject parameter-free or OpenAI/Codex-shaped tools in confusing ways | PR C makes provider-owned tool registration and invocation more predictable |
 | `/elevated full` guidance could be vague or wrong in blocked runtimes | PR B gives GPT-5.5 and the user truthful runtime and permission hints |
 | Replay or compaction failures could feel like the task silently disappeared | PR C surfaces paused, blocked, abandoned, and replay-invalid outcomes explicitly |
-| “GPT-5.5 feels worse than Opus” was mostly anecdotal | PR D turns that into the same scenario pack, the same metrics, and a hard pass/fail gate |
+| "GPT-5.5 feels worse than Opus" was mostly anecdotal | PR D turns that into the same scenario pack, the same metrics, and a hard pass/fail gate |
 
 ## Architecture
 
@@ -142,7 +140,7 @@ The first-wave parity pack currently covers five scenarios:
 
 ### `approval-turn-tool-followthrough`
 
-Checks that the model does not stop at “I’ll do that” after a short approval. It should take the first concrete action in the same turn.
+Checks that the model does not stop at "I'll do that" after a short approval. It should take the first concrete action in the same turn.
 
 ### `model-switch-tool-continuity`
 
@@ -210,8 +208,8 @@ Use the verdict in `qa-agentic-parity-summary.json` as the final machine-readabl
 
 - `pass` means GPT-5.5 covered the same scenarios as Opus 4.6 and did not regress on the agreed aggregate metrics.
 - `fail` means at least one hard gate tripped: weaker completion, worse unintended stops, weaker valid tool use, any fake-success case, or mismatched scenario coverage.
-- “shared/base CI issue” is not itself a parity result. If CI noise outside PR D blocks a run, the verdict should wait for a clean merged-runtime execution instead of being inferred from branch-era logs.
-- Auth, proxy, DNS, and `/elevated full` truthfulness still come from PR B’s deterministic suites, so the final release claim needs both: a passing PR D parity verdict and green PR B truthfulness coverage.
+- "shared/base CI issue" is not itself a parity result. If CI noise outside PR D blocks a run, the verdict should wait for a clean merged-runtime execution instead of being inferred from branch-era logs.
+- Auth, proxy, DNS, and `/elevated full` truthfulness still come from PR B's deterministic suites, so the final release claim needs both: a passing PR D parity verdict and green PR B truthfulness coverage.
 
 ## Who should enable `strict-agentic`
 
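Reviewer note: the pass/fail rule in the hunk above reads as a conjunction of hard gates, which a short sketch may make easier to check. Everything named here is an assumption for illustration: `Aggregate`, its field names, and `parity_verdict` are hypothetical and do not reflect the actual schema of `qa-agentic-parity-summary.json` or the real metric definitions.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Aggregate:
    """Aggregate metrics for one model over the scenario pack (hypothetical fields)."""
    scenarios: FrozenSet[str]
    completion_rate: float        # higher is better
    unintended_stop_rate: float   # lower is better
    valid_tool_use_rate: float    # higher is better
    fake_success_count: int       # must be zero for the candidate


def parity_verdict(candidate: Aggregate, baseline: Aggregate) -> Tuple[str, List[str]]:
    """Return ("pass" | "fail", tripped gates); any tripped gate fails the run."""
    tripped: List[str] = []
    if candidate.scenarios != baseline.scenarios:
        tripped.append("mismatched scenario coverage")
    if candidate.completion_rate < baseline.completion_rate:
        tripped.append("weaker completion")
    if candidate.unintended_stop_rate > baseline.unintended_stop_rate:
        tripped.append("worse unintended stops")
    if candidate.valid_tool_use_rate < baseline.valid_tool_use_rate:
        tripped.append("weaker valid tool use")
    if candidate.fake_success_count > 0:
        tripped.append("fake-success case")
    return ("fail" if tripped else "pass", tripped)
```

Returning the list of tripped gates alongside the verdict keeps a failing run explainable. Note that the sketch covers only the PR D parity verdict; per the text above, the release claim also needs green PR B truthfulness coverage, which is outside this function.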
@@ -219,7 +217,7 @@ Use `strict-agentic` when:
 
 - the agent is expected to act immediately when a next step is obvious
 - GPT-5.5 or Codex-family models are the primary runtime
-- you prefer explicit blocked states over “helpful” recap-only replies
+- you prefer explicit blocked states over "helpful" recap-only replies
 
 Keep the default contract when:
 