docs: clarify parity verdict interpretation

2026-05-06 18:40:44 +00:00 · 2026-04-11 03:30:59 +07:00
parent db09edacfc
commit c73d005c7a
2 changed files with 29 additions and 0 deletions
--- a/docs/help/gpt54-codex-agentic-parity-maintainers.md
+++ b/docs/help/gpt54-codex-agentic-parity-maintainers.md
@@ -141,3 +141,13 @@ The parity harness is not the only evidence source. Keep this split explicit in

 - PR D owns the scenario-based GPT-5.4 vs Opus 4.6 comparison
 - PR B deterministic suites still own auth/proxy/DNS and full-access truthfulness evidence
+
+## Reviewer shorthand: before vs after
+
+| User-visible problem before                                 | Review signal after                                                                     |
+| ----------------------------------------------------------- | --------------------------------------------------------------------------------------- |
+| GPT-5.4 stopped after planning                              | PR A shows act-or-block behavior instead of commentary-only completion                  |
+| Tool use felt brittle with strict OpenAI/Codex schemas      | PR C keeps tool registration and parameter-free invocation predictable                  |
+| `/elevated full` hints were sometimes misleading            | PR B ties guidance to actual runtime capability and blocked reasons                     |
+| Long tasks could disappear into replay/compaction ambiguity | PR C emits explicit paused, blocked, abandoned, and replay-invalid states               |
+| Parity claims were anecdotal                                | PR D produces a report plus JSON verdict with the same scenario coverage on both models |