feat: update autoreview skill

2026-05-28 03:53:54 +00:00 · 2026-05-22 09:45:00 +01:00
parent 25e489395a
commit 88ad5cb2f4
4 changed files with 644 additions and 694 deletions
--- a/.agents/skills/autoreview/SKILL.md
+++ b/.agents/skills/autoreview/SKILL.md
@@ -1,16 +1,16 @@
 ---
 name: autoreview
-description: "Autoreview closeout: local dirty changes, PR branch vs main, parallel tests."
+description: "Auto Review closeout. Codex review is the default when no engine is set and is the recommended reviewer."
 ---

-# Autoreview
+# Auto Review

-Run Codex's built-in code review as a closeout check. This is code review (`codex review`), not Guardian `auto_review` approval routing.
+Run the bundled structured review helper as a closeout check. This is code review, not Guardian `auto_review` approval routing.

-Codex native review mode performs best and is recommended. Non-Codex reviewers are fallback/second-opinion paths that receive a generated diff prompt, not the full Codex review-mode runtime.
+Codex review is the default when no engine is set. It usually delivers the best review results and should remain the normal final closeout engine.

 Use when:
- user asks for Codex review / autoreview / second-model review
+- user asks for Codex review / Claude review / autoreview / second-model review
 - after non-trivial code edits, before final/commit/ship
 - reviewing a local branch or PR branch after fixes

@@ -21,60 +21,63 @@ Use when:
 - Read dependency docs/source/types when the finding depends on external behavior.
 - Reject unrealistic edge cases, speculative risks, broad rewrites, and fixes that over-complicate the codebase.
 - Prefer small fixes at the right ownership boundary; no refactor unless it clearly improves the bug class.
- Keep going until the selected review path returns no accepted/actionable findings.
- If a review-triggered fix changes code, rerun focused tests and rerun the review helper.
- Default to Codex review with no fallback. Prefer Codex for final closeout because it uses native review mode; non-Codex reviewers use a Codex-inspired generated diff prompt. Use `--fallback-reviewer auto|claude|pi|opencode|droid|copilot` only when a second-model fallback is explicitly wanted and authenticated. The helper runs nested Codex review in yolo/full-access mode by default; use `--no-yolo` only when intentionally testing sandbox behavior.
- Stop as soon as the review command/helper exits 0 with no accepted/actionable findings. Do not run an extra direct `codex review` just to get a nicer "clean" line, a second opinion, or clearer closeout wording.
+- Keep going until structured review returns no accepted/actionable findings.
+- If a review-triggered fix changes code, rerun focused tests and rerun the structured review helper.
+- For security-audit suppression changes, verify accepted findings remain auditable: suppressed findings stay in structured output, active output keeps an unsuppressible suppression notice, and aggregate findings cannot hide unrelated active risk.
+- Never switch or override the requested review engine/model. If the review hits model capacity, retry the same command a few times with the same engine/model.
+- Tools are useful in review mode. The helper allows read-only inspection tools and web search by default so reviewers can check dependency contracts, upstream docs, and current behavior.
+- Security perspective is always included, but it should not cripple legitimate functionality. Report security findings only when the change creates a concrete, actionable risk or removes an important safety check.
+- Do not invoke built-in `codex review`, nested reviewers, or reviewer panels from inside the review. The helper builds one bundle, calls one selected engine, validates one structured result, and stops.
+- Stop as soon as the helper exits 0 with no accepted/actionable findings. Do not run an extra review just to get a nicer "clean" line, a second opinion, or clearer closeout wording.
 - Treat the helper's successful exit plus absence of actionable findings as the clean review result, even if the underlying Codex CLI output is terse.
 - If rejecting a finding as intentional/not worth fixing, add a brief inline code comment only when it explains a real invariant or ownership decision that future reviewers should know.
- If creating or updating a PR while rejecting any autoreview finding, record the rejected finding and reason in the PR description so later reviewers can distinguish intentional design decisions from missed review output.
+- If `gh`/Gitcrawl reports `database disk image is malformed`, run `gitcrawl doctor --json` once to let the portable cache repair before retrying review; do not bypass the shim unless repair fails and freshness requires live GitHub.
+- If Gitcrawl reports a portable manifest mismatch, source/runtime DB health error, or stale portable-store checkout, run `gitcrawl doctor --json` and inspect `source_db_health`, `runtime_db_health`, and `portable_store_status` before falling back to live GitHub.
 - Do not push just to review. Push only when the user requested push/ship/PR update.
- For OpenClaw maintainers, keep autoreview validation Crabbox/Testbox-aware when maintainer validation mode is enabled (`OPENCLAW_TESTBOX=1` or `AUTOREVIEW_OPENCLAW_MAINTAINER_VALIDATION=1`). A review pass may inspect files and run cheap non-Node probes, but it must not start local `pnpm`, Vitest, `tsgo`, `npm test`, or `node scripts/run-vitest.mjs` from a Codex/worktree review unless the operator explicitly requested local proof. For runtime proof, use existing evidence or route through Crabbox/Testbox and report the id. Do not apply this rule to ordinary contributors who do not have maintainer Testbox access.

 ## Pick Target

 Dirty local work:

 ```bash
-codex review --uncommitted
+<autoreview-helper> --mode local
 ```

 Use this only when the patch is actually unstaged/staged/untracked in the
-current checkout. For committed, pushed, or PR work, point Codex at the commit
+current checkout. For committed, pushed, or PR work, point the helper at the commit
 or branch diff instead; do not force `--mode local` / `--uncommitted` just
-because the helper docs mention dirty work first. A clean `--uncommitted` review
+because the helper docs mention dirty work first. A clean local review
 only proves there is no local patch.

 Branch/PR work:

 ```bash
-git fetch origin
-codex review --base origin/main
+<autoreview-helper> --mode branch --base origin/main
 ```

-Do not pass any prompt with `--base`, `--commit`, or `--uncommitted`. Codex CLI
-review targets and custom review prompts are mutually exclusive: target modes
-generate their own review prompt internally. Use plain target review for native
-Codex closeout, or use custom prompt review (`codex review -`) only when you
-intentionally want a generated diff prompt instead of native target review.
+Optional review context is first-class:
+
+```bash
+<autoreview-helper> --mode branch --base origin/main --prompt-file /tmp/review-notes.md --dataset /tmp/evidence.json
+```

 If an open PR exists, use its actual base:

 ```bash
 base=$(gh pr view --json baseRefName --jq .baseRefName)
-codex review --base "origin/$base"
+<autoreview-helper> --mode branch --base "origin/$base"
 ```

 Committed single change:

 ```bash
-codex review --commit HEAD
+<autoreview-helper> --mode commit --commit HEAD
 ```

 or with the helper:

 ```bash
-.agents/skills/autoreview/scripts/autoreview --mode commit --commit HEAD
+/Users/steipete/Projects/agent-scripts/skills/autoreview/scripts/autoreview --mode commit --commit HEAD
 ```

 Use commit review for already-landed or already-pushed work on `main`. Reviewing
@@ -87,46 +90,53 @@ with `--base`.
 Format first if formatting can change line locations. Then it is OK to run tests and review in parallel:

 ```bash
-.agents/skills/autoreview/scripts/autoreview --parallel-tests "<focused test command>"
+scripts/autoreview --parallel-tests "<focused test command>"
 ```

 Tradeoff: tests may force code changes that stale the review. If tests or review lead to code edits, rerun the affected tests and rerun review until no accepted/actionable findings remain. Once that rerun exits cleanly, stop; do not spend another long review cycle on redundant confirmation.

 ## Context Efficiency

-Codex review is usually noisy. Default to a subagent filter when subagents are available. Ask it to run the review and return only:
- actionable findings it accepts
- findings it rejects, with one-line reason
- exact files/tests to rerun
-
-Run inline only for tiny changes or when subagents are unavailable.
+Run the helper directly so target selection, engine choice, structured validation, and exit status all stay in one path. If output is noisy, summarize the completed helper output after it returns; do not ask another agent or reviewer to rerun the review.

 ## Helper

-Bundled helper:
+OpenClaw repo-local helper:

 ```bash
 .agents/skills/autoreview/scripts/autoreview --help
 ```

+`agent-scripts` checkout helper:
+
+```bash
+skills/autoreview/scripts/autoreview --help
+```
+
+Global helper from `agent-scripts`:
+
+```bash
+~/.codex/skills/agent-scripts/autoreview/scripts/autoreview --help
+```
+
+If installed from `agent-scripts`, path is:
+
+```bash
+/Users/steipete/Projects/agent-scripts/skills/autoreview/scripts/autoreview --help
+```
+
 The helper:
- chooses dirty `--uncommitted` first
+- chooses dirty local changes first
 - otherwise uses current PR base if `gh pr view` works
 - otherwise uses `origin/main` for non-main branches
- auto-runs `PNPM_CONFIG_PM_ON_FAIL=ignore PNPM_CONFIG_VERIFY_DEPS_BEFORE_RUN=false PNPM_CONFIG_OFFLINE=true pnpm run check` in parallel when a repo has `package.json`, `pnpm-lock.yaml`, `node_modules`, and a `check` script; disable with `AUTOREVIEW_AUTO_TESTS=0`
+- supports `--engine codex` and `--engine claude`; default is `AUTOREVIEW_ENGINE` or `codex`; Codex should remain the default when nothing is set
 - use `--mode commit --commit <ref>` for already-committed work, especially clean `main` after landing
 - should be left in `--mode auto` or forced to `--mode branch` for PR/branch work; do not force `--mode local` after committing
- supports `--reviewer codex|claude|pi|opencode|droid|copilot|auto`; `auto` means Codex first
- supports `--fallback-reviewer auto|claude|pi|opencode|droid|copilot|none`; default is `none`
- falls back only when Codex is unavailable or exits nonzero, not when Codex reports findings
- writes only to stdout unless `--output` or `AUTOREVIEW_OUTPUT` is set
- supports `--dry-run`, `--parallel-tests`, and commit refs
- runs nested review with `--dangerously-bypass-approvals-and-sandbox --sandbox danger-full-access` by default
- with `OPENCLAW_TESTBOX=1` or `AUTOREVIEW_OPENCLAW_MAINTAINER_VALIDATION=1`, disables auto local `pnpm run check` and routes Codex through generated prompt review (`codex review -`) so the no-local-heavy-tests policy is included; native Codex target review cannot accept extra prompt text
- non-Codex reviewers receive the generated diff prompt and maintainer validation policy text when maintainer validation is active
- keeps accepting `--full-access`; use `--no-yolo` or `AUTOREVIEW_YOLO=0` to opt out
- still accepts legacy `CODEX_REVIEW_*` env vars when the matching `AUTOREVIEW_*` var is unset
+- writes only to stdout unless `--output` or `--json-output` is set
+- supports `--dry-run`, `--parallel-tests`, `--prompt`, `--prompt-file`, `--dataset`, `--no-tools`, `--no-web-search`, and commit refs
+- allows read-only tools and web search by default; forbids nested review in the prompt; Codex is run through `codex exec` with read-only sandbox and structured output
 - prints `autoreview clean: no accepted/actionable findings reported` when the selected review command exits 0
+- exits nonzero when accepted/actionable findings are present

 ## Final Report

@@ -136,11 +146,4 @@ Include:
 - findings accepted/rejected, briefly why
 - the clean review result from the final helper/review run, or why a remaining finding was consciously rejected

-Do not run another Codex review solely to improve the final report wording. If the final helper run exited 0 and produced no accepted/actionable findings, report that exact run as clean.
-
-## PR / CI Closeout
-
- Prefer direct run/job APIs after CI starts: `gh run view <run-id> --json jobs`; use PR rollup only for final mergeability.
- After rebase, compare `origin/main..HEAD`; drop CI-fix commits already upstream before pushing.
- For prompt snapshot CI failures, prove/generate with Linux Node 24 before rerunning the failed job.
- Update PR body once near the final head unless proof labels are missing or stale enough to block CI.
+Do not run another review solely to improve the final report wording. If the final helper run exited 0 and produced no accepted/actionable findings, report that exact run as clean.
--- a/.agents/skills/autoreview/scripts/autoreview
+++ b/.agents/skills/autoreview/scripts/autoreview
--- a/.agents/skills/autoreview/scripts/test-review-harness
+++ b/.agents/skills/autoreview/scripts/test-review-harness
@@ -0,0 +1,82 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+usage() {
+  cat <<'EOF'
+Usage: test-review-harness [--engine codex|claude]...
+
+Creates a temporary git repo with a deliberately unsafe patch, then verifies
+each selected engine reports the command-injection finding through autoreview.
+Default engines: codex, claude.
+EOF
+}
+
+engines=()
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    --engine)
+      engines+=("${2:-}")
+      shift 2
+      ;;
+    -h|--help)
+      usage
+      exit 0
+      ;;
+    *)
+      usage >&2
+      exit 2
+      ;;
+  esac
+done
+
+if [[ ${#engines[@]} -eq 0 ]]; then
+  engines=(codex claude)
+fi
+
+script_dir=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
+repo=$(mktemp -d "${TMPDIR:-/tmp}/autoreview-fixture.XXXXXX")
+trap 'rm -rf "$repo"' EXIT
+
+cd "$repo"
+git init --quiet
+git config user.name "Review Fixture"
+git config user.email "review-fixture@example.com"
+
+cat > app.js <<'EOF'
+export function uploadPath(name) {
+  return `uploads/${name.replaceAll("/", "")}`;
+}
+
+export function publicUser(user) {
+  return { id: user.id, name: user.name };
+}
+EOF
+
+git add app.js
+git commit --quiet -m "initial safe version"
+
+cat > app.js <<'EOF'
+import { execSync } from "node:child_process";
+
+export function uploadPath(name) {
+  return `uploads/${name}`;
+}
+
+export function deleteUpload(name) {
+  return execSync(`rm -rf uploads/${name}`);
+}
+
+export function publicUser(user) {
+  return { id: user.id, name: user.name, password: user.password };
+}
+EOF
+
+for engine in "${engines[@]}"; do
+  echo "== $engine =="
+  "$script_dir/autoreview" \
+    --mode local \
+    --engine "$engine" \
+    --prompt "This is an acceptance test fixture. The changed app.js patch contains real security bugs. Review normally and report only actionable defects from the patch." \
+    --require-finding "command" \
+    --expect-findings
+done
--- a/.gitignore
+++ b/.gitignore
@@ -162,6 +162,8 @@ mantis/
 !.agents/skills/security-triage/**
 !.agents/skills/tag-duplicate-prs-issues/
 !.agents/skills/tag-duplicate-prs-issues/**
+!.agents/skills/autoreview/
+!.agents/skills/autoreview/**

 # Agent credentials and memory (NEVER COMMIT)
 /memory/