feat(security): add GHSA detector-review pipeline and OpenGrep CI workflows (#69483)

* feat(security): add GHSA detector-review pipeline and OpenGrep CI workflows [AI-assisted]

Stand up an end-to-end pipeline that turns every published openclaw GitHub Security Advisory into a reusable OpenGrep rule, and wire the compiled rules into manual-dispatch GitHub Actions workflows that publish SARIF to GitHub Code Scanning.

The pipeline is harness-agnostic: any coding-agent CLI (Rovo Dev, Claude Code, Codex, OpenCode, or anything you can shell out to) can drive it via the runner script's --harness flag. Built-in adapters cover the four common harnesses; --harness-cmd '<template>' supports anything else with shell-style {prompt}/{model}/{output_file} substitution.

Pipeline pieces:
- scripts/run-ghsa-detector-review-batch.mjs runs your chosen coding harness in parallel against every advisory using the agent-agnostic detector-review spec at security/detector-review/detector-review-spec.md. Each case produces an opengrep general-rule.yml (precise) and broad-rule.yml (review-aid), plus a coverage-validated report against the vulnerable commit's changed files.
- scripts/compile-opengrep-rules.mjs walks a run directory, rewrites each rule's id to ghsa-detector.<ghsa>.<orig-id>, injects ghsa/advisory-url/detector-bucket/source-rule-id metadata, and uses opengrep itself to drop rules with InvalidRuleSchemaError so the published super-configs load cleanly.

Compiled outputs:
- security/opengrep/precise.yml (336 rules)
- security/opengrep/broad.yml (459 rules)
- security/opengrep/compile-manifest.json (per-rule provenance map)

CI workflows (manual workflow_dispatch only):
- .github/workflows/opengrep-precise.yml
- .github/workflows/opengrep-broad.yml

Both install a pinned opengrep, run opengrep scan against src/, upload SARIF to Code Scanning under categories opengrep-precise / opengrep-broad, and use continue-on-error: true so findings never block the workflow.

Detector-review spec and assets:
- security/detector-review/detector-review-spec.md
  the agent-agnostic spec the runner injects into each per-case prompt
- security/detector-review/references/{detector-rubric,report-template}.md
- security/detector-review/scripts/init_case.py
- security/prompt-suffix-coverage-first.md
  mandatory prompt addendum that enforces coverage-first validation (rule must catch the OG vuln, not just pass synthetic fixtures)

Docs:
- security/README.md
  end-to-end flow, supported harnesses, regen recipe
- security/opengrep/README.md
  compiled-config details + recompile recipe

* security: tighten GHSA OpenGrep detector workflow
* chore: refine precise opengrep workflow
* chore: remove stale opengrep metadata
* fix: harden GHSA OpenGrep workflow
* ci: split OpenGrep diff and full scans
* chore: remove performance-only opengrep rule
* ci: use OpenGrep installer path
* chore: enforce opengrep rule metadata provenance
* chore: generalize opengrep rule compilation
* docs: align opengrep rulepack guidance
* chore: support generic opengrep rule sources
* fix: validate opengrep rulepack-only changes

---------

Co-authored-by: Jesse Merhi <security-engineering@atlassian.com>
2026-05-06 15:10:52 +00:00 · 2026-04-30 02:42:20 +10:00
parent c7aaa40848
commit 6de9d71bfb
16 changed files with 6488 additions and 0 deletions
--- a/security/opengrep/README.md
+++ b/security/opengrep/README.md
@@ -0,0 +1,103 @@
+# Compiled OpenGrep super-configs
+
+`precise.yml` is OpenClaw's shipped precise OpenGrep rulepack. Each rule is tied
+to a source advisory, vulnerability report, or review identifier through metadata
+and is intended to have concrete coverage of the original vulnerable behavior or
+a verified variant.
+
+Rule provenance lives in each compiled rule's metadata; no separate manifest is
+committed or generated by default.
+
+Noisy exploratory rules are intentionally kept out of the tracked repo. Anything
+appended to `precise.yml` must be low-noise enough to run as a blocking PR-diff
+check and as a manual full-repository audit.
+
+## Editing rules
+
+`precise.yml` is the checked-in compiled rulepack. Prefer changing source rule
+YAML and rerunning `security/opengrep/compile-rules.mjs` instead of hand-editing
+compiled rules. The compiler appends new rule IDs by default; use
+`--replace-precise` only when intentionally rebuilding the rulepack from a
+complete source folder. Direct edits are discouraged because they can bypass ID,
+metadata, duplicate, and OpenGrep validation.
+
+## Rule naming and metadata
+
+Every rule's id is rewritten to `<source-id>.<original-id>`. Every rule's
+`metadata` block is augmented with source fields enforced by
+`pnpm check:opengrep-rule-metadata`:
+
+| Key               | Value                                                                 |
+| ----------------- | --------------------------------------------------------------------- |
+| `ghsa`            | `GHSA-xxxx-xxxx-xxxx` for GHSA-backed rules                           |
+| `advisory-id`     | non-GHSA source identifier, or the GHSA ID normalized by the compiler |
+| `advisory-url`    | durable URL to the advisory, report, review record, or source context |
+| `detector-bucket` | `precise`                                                             |
+| `source-rule-id`  | the original source rule id                                           |
+| `source-file`     | optional source YAML file used during compilation                     |
+
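The id rewrite and metadata injection described in the table above can be sketched roughly as follows. This is a minimal illustration, not the actual `compile-rules.mjs` code: `compileRule` is a hypothetical helper, the source id is assumed to be used verbatim as the id prefix, and the derived `https://github.com/advisories/<GHSA>` URL for GHSA-backed rules is an assumption about how a missing `advisory-url` might be filled in.

```javascript
// Hypothetical helper illustrating the naming and metadata rules above.
// Not the real compiler; names and the derived URL form are assumptions.
function compileRule(rule, sourceFile) {
  const meta = rule.metadata ?? {};
  // A source rule must carry either metadata.ghsa or metadata.advisory-id.
  const sourceId = meta.ghsa ?? meta["advisory-id"];
  if (!sourceId) {
    throw new Error(`rule ${rule.id}: missing metadata.ghsa or metadata.advisory-id`);
  }
  // Assumption: GHSA-backed rules may omit advisory-url and get a derived one;
  // non-GHSA sources must supply it themselves.
  const advisoryUrl =
    meta["advisory-url"] ??
    (meta.ghsa ? `https://github.com/advisories/${meta.ghsa}` : undefined);
  if (!advisoryUrl) {
    throw new Error(`rule ${rule.id}: non-GHSA source needs metadata.advisory-url`);
  }
  const metadata = {
    ...meta,
    "advisory-id": sourceId,
    "advisory-url": advisoryUrl,
    "detector-bucket": "precise",
    "source-rule-id": rule.id, // the original id, before rewriting
  };
  if (sourceFile) metadata["source-file"] = sourceFile; // optional key
  // Rewritten id: <source-id>.<original-id>
  return { ...rule, id: `${sourceId}.${rule.id}`, metadata };
}

const compiled = compileRule(
  { id: "unsafe-eval", metadata: { ghsa: "GHSA-aaaa-bbbb-cccc" } },
  "rules/example.yml",
);
console.log(compiled.id);
```

Keeping the original id in `source-rule-id` means a compiled rule can always be traced back to its source rule even after the id prefix is added.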
+## Recompiling
+
+```bash
+# from the openclaw repo root
+node security/opengrep/compile-rules.mjs \
+  --rules-dir <folder-with-source-rule-yaml>
+```
+
+The script:
+
+1. Recursively walks every `.yml` / `.yaml` file under `--rules-dir`
+2. Reads top-level `rules` arrays from those source files
+3. Requires each source rule to provide `metadata.ghsa` or `metadata.advisory-id`
+4. Requires `metadata.advisory-url` for non-GHSA source identifiers
+5. Rewrites ids and injects metadata as above
+6. Appends only new precise rule ids to the existing `precise.yml` by default; pass `--replace-precise` to rebuild it from just the supplied source folder
+7. Runs `opengrep scan --no-strict` against an empty target to identify schema-invalid or parser-invalid rules and drops mapped bad rules so the published super-config loads cleanly
+8. Writes `precise.yml`
+
+Skipped, duplicate, or invalid rules are summarized on stdout/stderr for follow-up.
+
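The append-by-default behaviour in step 6 amounts to a merge keyed on rule id. A minimal sketch, assuming rules are plain objects with an `id` field; `mergePrecise` and its `replace` option are hypothetical names standing in for whatever the real script does behind `--replace-precise`:

```javascript
// Hypothetical sketch of append-only merging by rule id.
// With replace=true the existing rulepack is ignored, mirroring the
// documented intent of --replace-precise; the real script may differ.
function mergePrecise(existingRules, compiledRules, { replace = false } = {}) {
  const rules = replace ? [] : [...existingRules];
  const seen = new Set(rules.map((rule) => rule.id));
  const skipped = [];
  for (const rule of compiledRules) {
    if (seen.has(rule.id)) {
      skipped.push(rule.id); // duplicate id: keep the copy already present
      continue;
    }
    seen.add(rule.id);
    rules.push(rule);
  }
  return { rules, skipped };
}

const merged = mergePrecise(
  [{ id: "GHSA-aaaa-bbbb-cccc.unsafe-eval" }],
  [{ id: "GHSA-aaaa-bbbb-cccc.unsafe-eval" }, { id: "REVIEW-1.path-join" }],
);
console.log(merged.rules.map((rule) => rule.id), merged.skipped);
```

Returning the skipped ids alongside the merged list gives the caller something to summarize for follow-up, as the paragraph above describes.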
+## Validating locally
+
+```bash
+pnpm check:opengrep-rule-metadata
+opengrep validate security/opengrep/precise.yml
+```
+
+The metadata check must pass before rules are committed. OpenGrep validation must
+exit zero. Warnings about unknown fields are acceptable only when OpenGrep still
+reports `Configuration is valid` and a non-zero rule count. The compile script
+drops mapped schema/parser-invalid rules and fails closed when OpenGrep
+validation itself cannot be completed.
+
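For orientation, a metadata check along these lines could look like the sketch below. It is not the implementation behind `pnpm check:opengrep-rule-metadata`; `checkRuleMetadata` is a hypothetical function, the required keys are taken from the metadata table in this README, and the GHSA shape test is a deliberately loose assumption.

```javascript
// Hypothetical per-rule metadata check based on the keys documented above.
// Loose shape test for GHSA ids (assumption): GHSA-xxxx-xxxx-xxxx.
const GHSA_SHAPE = /^GHSA(-[0-9a-z]{4}){3}$/i;

function checkRuleMetadata(rule) {
  const meta = rule.metadata ?? {};
  const problems = [];
  if (!meta.ghsa && !meta["advisory-id"]) {
    problems.push("missing ghsa or advisory-id");
  }
  if (meta.ghsa && !GHSA_SHAPE.test(meta.ghsa)) {
    problems.push("malformed ghsa");
  }
  if (!meta["advisory-url"]) {
    problems.push("missing advisory-url");
  }
  if (meta["detector-bucket"] !== "precise") {
    problems.push("detector-bucket must be precise");
  }
  if (!meta["source-rule-id"]) {
    problems.push("missing source-rule-id");
  }
  return problems; // an empty array means the rule passes this sketch's checks
}

console.log(checkRuleMetadata({ id: "example.rule", metadata: { ghsa: "GHSA-1234" } }));
```

Collecting every problem for a rule, rather than stopping at the first, makes a failing run easier to fix in one pass.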
+## Running locally
+
+```bash
+scripts/run-opengrep.sh
+```
+
+For SARIF output matching the PR workflow's diff-scoped scan:
+
+```bash
+scripts/run-opengrep.sh --changed --sarif
+```
+
+For SARIF output matching the manual full-repository workflow:
+
+```bash
+scripts/run-opengrep.sh --sarif
+```
+
+## Why `--no-strict`?
+
+Some generated rules trigger non-fatal opengrep warnings (for example,
+unknown-field warnings on compatibility-only keys). With `--no-strict`, those
+warnings alone do not turn into a non-zero exit code. Parser-invalid rules are
+still dropped during compilation, so the checked-in super-config validates
+before CI uses it.
+
+## Why `--no-git-ignore`?
+
+Some OpenClaw paths are excluded by `.gitignore` for build reasons even though
+they contain meaningful source code we want scanned. `--no-git-ignore` keeps
+opengrep from skipping them.