feat(security): add GHSA detector-review pipeline and OpenGrep CI workflows (#69483)

* feat(security): add GHSA detector-review pipeline and OpenGrep CI workflows [AI-assisted]

Stand up an end-to-end pipeline that turns every published openclaw GitHub
Security Advisory into a reusable OpenGrep rule, and wire the compiled rules
into manual-dispatch GitHub Actions workflows that publish SARIF to GitHub
Code Scanning.

The pipeline is harness-agnostic: any coding-agent CLI (Rovo Dev, Claude
Code, Codex, OpenCode, or anything you can shell out to) can drive it via
the runner script's --harness flag. Built-in adapters cover the four common
harnesses; --harness-cmd '<template>' supports anything else with shell-style
{prompt}/{model}/{output_file} substitution.
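As a rough sketch of that substitution (the command name and flags below are hypothetical, not a real harness):

```javascript
// Minimal sketch of --harness-cmd template expansion. Assumes plain token
// replacement; the real runner presumably also handles shell quoting.
function expandHarnessCmd(template, { prompt, model, outputFile }) {
  return template
    .replaceAll("{prompt}", prompt)
    .replaceAll("{model}", model)
    .replaceAll("{output_file}", outputFile);
}

// Hypothetical CLI invocation:
expandHarnessCmd("my-agent --model {model} --prompt-file {prompt} --out {output_file}", {
  prompt: "case-001/prompt.md",
  model: "some-model",
  outputFile: "case-001/report.md",
});
// → "my-agent --model some-model --prompt-file case-001/prompt.md --out case-001/report.md"
```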

Pipeline pieces:

- scripts/run-ghsa-detector-review-batch.mjs runs your chosen coding harness
  in parallel against every advisory using the agent-agnostic detector-review
  spec at security/detector-review/detector-review-spec.md. Each case
  produces an opengrep general-rule.yml (precise) and broad-rule.yml
  (review-aid), plus a coverage-validated report against the vulnerable
  commit's changed files.
- scripts/compile-opengrep-rules.mjs walks a run directory, rewrites each
  rule's id to ghsa-detector.<ghsa>.<orig-id>, injects ghsa/advisory-url/
  detector-bucket/source-rule-id metadata, and uses opengrep itself to drop
  rules with InvalidRuleSchemaError so the published super-configs load
  cleanly.

Compiled outputs:

- security/opengrep/precise.yml     (336 rules)
- security/opengrep/broad.yml       (459 rules)
- security/opengrep/compile-manifest.json    (per-rule provenance map)

CI workflows (manual workflow_dispatch only):

- .github/workflows/opengrep-precise.yml
- .github/workflows/opengrep-broad.yml

Both install a pinned opengrep, run opengrep scan against src/, upload SARIF
to Code Scanning under categories opengrep-precise / opengrep-broad, and use
continue-on-error: true so findings never block the workflow.

Detector-review spec and assets:

- security/detector-review/detector-review-spec.md   the agent-agnostic spec
  the runner injects into each per-case prompt
- security/detector-review/references/{detector-rubric,report-template}.md
- security/detector-review/scripts/init_case.py
- security/prompt-suffix-coverage-first.md   mandatory prompt addendum that
  enforces coverage-first validation (rule must catch the OG vuln, not just
  pass synthetic fixtures)

Docs:

- security/README.md          end-to-end flow, supported harnesses, regen recipe
- security/opengrep/README.md compiled-config details + recompile recipe

* security: tighten GHSA OpenGrep detector workflow

* chore: refine precise opengrep workflow

* chore: remove stale opengrep metadata

* fix: harden GHSA OpenGrep workflow

* ci: split OpenGrep diff and full scans

* chore: remove performance-only opengrep rule

* ci: use OpenGrep installer path

* chore: enforce opengrep rule metadata provenance

* chore: generalize opengrep rule compilation

* docs: align opengrep rulepack guidance

* chore: support generic opengrep rule sources

* fix: validate opengrep rulepack-only changes

---------

Co-authored-by: Jesse Merhi <security-engineering@atlassian.com>
Jesse Merhi authored on 2026-04-30 02:42:20 +10:00; committed by GitHub
parent c7aaa40848, commit 6de9d71bfb
16 changed files with 6488 additions and 0 deletions

security/README.md (new file, 136 lines)

@@ -0,0 +1,136 @@
# Security tooling
This directory holds OpenClaw's shipped OpenGrep security rulepack and the
supporting tooling that validates and runs it. Maintainer-only advisory triage
and detector-generation prompts live outside the public repo; this repo keeps the
durable artifacts needed to block regressions in PRs and support local rule
validation.
## Layout
```text
security/
├── README.md       <- this file
└── opengrep/
    ├── README.md   <- precise rulepack details + compile recipe
    └── precise.yml <- compiled super-config: precise rules
```
The related scripts are:
- `security/opengrep/compile-rules.mjs` — gathers source OpenGrep rule YAMLs from
a folder, compiles them, and appends rules with new IDs to `security/opengrep/precise.yml`.
- `security/opengrep/check-rule-metadata.mjs` — enforces that every committed
rule carries durable source/provenance metadata.
- `scripts/run-opengrep.sh` — runs the compiled precise rulepack locally or in
CI with consistent paths and exclusions.
## Rule lifecycle
Maintainers investigate advisories and generate candidate rules outside the public repo.
Once a candidate rule has been validated and reviewed, put the shippable source
rule YAML in any local folder and compile it into this repo:
```bash
node security/opengrep/compile-rules.mjs \
  --rules-dir <folder-with-source-rule-yaml>
```
Commit the resulting `security/opengrep/precise.yml` diff. Durable rule
provenance lives in each compiled rule's metadata and is checked by
`pnpm check:opengrep-rule-metadata`.
Rule quality contract: precise rules must catch the vulnerable behavior they were
written for, should be silent on corresponding fixed behavior when a fix exists,
and should keep current findings limited to verified regressions or variants.
## Writing precise OpenGrep rules
A rule is appropriate for `security/opengrep/precise.yml` only when the dangerous
shape is stable enough to block PRs. Prefer, in order:
1. **Variant detector** — source-to-sink or missing-guard detection across the
same bug family.
2. **Scoped behavioral regression** — a narrow subsystem-specific rule anchored
on the affected API or trust boundary.
3. **Exact regression canary** — a labelled canary for the original vulnerable
shape when broader variants would be noisy.
4. **No OpenGrep rule** — if runtime state, product policy, or external data is
required to distinguish vulnerable and safe behavior.
Before compiling a rule, validate it against vulnerable/fixed/current code when
those surfaces exist. Every current finding must be classified as a true original
issue or true variant, or the rule must be tightened/dropped before it ships.
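That validation contract can be sketched as a pure check; the function and field names below are illustrative, not part of the shipped tooling, and assume finding counts have been collected from separate opengrep runs against the vulnerable, fixed, and current surfaces:

```javascript
// Illustrative coverage-first check (not part of the shipped scripts).
// A precise rule qualifies only if it fires on the vulnerable surface,
// stays silent on the fixed surface (when one exists), and every current
// finding has been triaged as a true original issue or true variant.
function ruleQualifies({ vulnerableFindings, fixedFindings, currentFindings }) {
  if (vulnerableFindings < 1) {
    return { ok: false, reason: "misses the original vulnerability" };
  }
  if (fixedFindings !== null && fixedFindings > 0) {
    return { ok: false, reason: "fires on fixed behavior" };
  }
  const untriaged = currentFindings.filter(
    (f) => !["true-original", "true-variant"].includes(f.triage),
  );
  if (untriaged.length > 0) {
    return { ok: false, reason: `${untriaged.length} untriaged current finding(s)` };
  }
  return { ok: true };
}
```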
## Running the rules locally
The wrapper script handles paths, exclusions, and output formatting so local
scans match CI exactly.
```bash
scripts/run-opengrep.sh # precise rules, human output
scripts/run-opengrep.sh --json # write .opengrep-out/precise.json
scripts/run-opengrep.sh --sarif # write .opengrep-out/precise.sarif
scripts/run-opengrep.sh --changed # scan changed first-party paths
scripts/run-opengrep.sh -- src/agents/ # scan a single dir
```
If you'd rather invoke `opengrep` directly, the equivalent is:
```bash
opengrep scan --no-strict --no-git-ignore \
  --config security/opengrep/precise.yml \
  src/ extensions/ apps/ packages/ scripts/
```
Both forms read `.semgrepignore` at the repo root automatically — that's the
single source of truth for which paths are skipped (test files, fixtures, mocks,
QA-tooling extensions, test-orchestration scripts, …). Add a glob there if a new
test naming convention shows up.
## Running the rules in CI
There are two OpenGrep workflows:
- **OpenGrep — PR Diff** (`.github/workflows/opengrep-precise.yml`) runs on pull
requests and executes `scripts/run-opengrep.sh --changed --sarif --error` so
findings stay scoped to changed first-party paths.
- **OpenGrep — Full** (`.github/workflows/opengrep-precise-full.yml`) is manual
dispatch only and executes `scripts/run-opengrep.sh --sarif --error` across
the full first-party source set for maintainers who want a repository-wide
audit.
Both workflows:
- Inherit the same `.semgrepignore` exclusions used by the local wrapper
- Upload SARIF to GitHub Code Scanning under stable OpenGrep categories
- Fail on precise findings so the rulepack acts as a regression firewall
- Enforce committed rule provenance with `pnpm check:opengrep-rule-metadata`
## Editing, silencing, or removing rules
`precise.yml` is the checked-in compiled rulepack. Prefer editing source rule
YAML and recompiling instead of hand-editing compiled rules, because the compiler
normalizes rule IDs, metadata, duplicates, and OpenGrep validation. The compiler
appends new rule IDs by default; use `--replace-precise` only when intentionally
rebuilding the rulepack from a complete source folder.
To drop a noisy rule:
1. Delete the offending source rule from the local source-rule folder.
2. Re-run `node security/opengrep/compile-rules.mjs --rules-dir <folder-with-source-rule-yaml>`.
3. Commit the resulting `security/opengrep/precise.yml` diff.
To narrow a rule's path scope, edit the source rule's `paths.include` /
`paths.exclude` fields in the same local artifact location and recompile.
## Tracing a finding back to its source
Every compiled rule's `id` is `<source-id>.<original-id>`. For GHSA-backed rules,
`<source-id>` is the lower-case GHSA ID. For other source-backed rules, use a
stable, dot-free source identifier such as a CVE ID, OSV ID, internal advisory
ID, or other review identifier. Rule `metadata` must include `advisory-url`,
`detector-bucket`, and `source-rule-id`, plus either `ghsa` or `advisory-id`.
New compilations also add `source-file` when available.
`pnpm check:opengrep-rule-metadata` enforces these durable source fields so each
committed rule is traceable without a separate committed manifest.
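Because the source id never contains dots, tracing a compiled id back is a single split on the first dot. A minimal sketch (not part of the shipped scripts), using a hypothetical rule id:

```javascript
// Split a compiled rule id into its source id and original rule id.
// The source id is dot-free by construction, so everything after the
// first dot is the original rule id (which may itself contain dots).
function traceRuleId(compiledId) {
  const dot = compiledId.indexOf(".");
  if (dot <= 0) throw new Error(`not a compiled rule id: ${compiledId}`);
  return {
    sourceId: compiledId.slice(0, dot),
    originalId: compiledId.slice(dot + 1),
  };
}

// e.g. a GHSA-backed rule (example id is hypothetical):
traceRuleId("ghsa-aaaa-bbbb-cccc.exec-unsanitized-input");
// → { sourceId: "ghsa-aaaa-bbbb-cccc", originalId: "exec-unsanitized-input" }
```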

security/opengrep/README.md (new file, 103 lines)

@@ -0,0 +1,103 @@
# Compiled OpenGrep super-configs
`precise.yml` is OpenClaw's shipped precise OpenGrep rulepack. Each rule is tied
to a source advisory, vulnerability report, or review identifier through metadata
and is intended to have concrete coverage of the original vulnerable behavior or
a verified variant.
Rule provenance lives in each compiled rule's metadata; no separate manifest is
committed or generated by default.
Noisy exploratory rules are intentionally kept out of the tracked repo. Anything
appended to `precise.yml` must be low-noise enough to run as a blocking PR-diff
check and as a manual full-repository audit.
## Editing rules
`precise.yml` is the checked-in compiled rulepack. Prefer changing source rule
YAML and rerunning `security/opengrep/compile-rules.mjs` instead of hand-editing
compiled rules. The compiler appends new rule IDs by default; use
`--replace-precise` only when intentionally rebuilding the rulepack from a
complete source folder. Direct edits are discouraged because they can bypass ID,
metadata, duplicate, and OpenGrep validation.
## Rule naming and metadata
Every rule's id is rewritten to `<source-id>.<original-id>`. Every rule's
`metadata` block is augmented with source fields enforced by
`pnpm check:opengrep-rule-metadata`:
| Key | Value |
| ----------------- | --------------------------------------------------------------------- |
| `ghsa` | `GHSA-xxxx-xxxx-xxxx` for GHSA-backed rules |
| `advisory-id` | non-GHSA source identifier, or the GHSA ID normalized by the compiler |
| `advisory-url` | durable URL to the advisory, report, review record, or source context |
| `detector-bucket` | `precise` |
| `source-rule-id` | the original source rule id |
| `source-file` | optional source YAML file used during compilation |
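Putting the table together, a GHSA-backed compiled rule's metadata looks roughly like the object below. All values are hypothetical; the authoritative checks live in `check-rule-metadata.mjs`:

```javascript
// Hypothetical compiled-rule metadata for a GHSA-backed rule.
const metadata = {
  ghsa: "GHSA-1234-ABCD-5678",
  "advisory-id": "GHSA-1234-ABCD-5678",
  "advisory-url":
    "https://github.com/openclaw/openclaw/security/advisories/GHSA-1234-ABCD-5678",
  "detector-bucket": "precise",
  "source-rule-id": "exec-unsanitized-input",
  "source-file": "rules/GHSA-1234-ABCD-5678/general-rule.yml",
};

// Spot-check the invariants the metadata checker enforces.
const GHSA_RE = /^GHSA-[0-9A-Z]{4}-[0-9A-Z]{4}-[0-9A-Z]{4}$/;
console.assert(GHSA_RE.test(metadata.ghsa));
console.assert(metadata["advisory-id"] === metadata.ghsa);
console.assert(metadata["advisory-url"].endsWith(metadata.ghsa));
console.assert(metadata["detector-bucket"] === "precise");
```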
## Recompiling
```bash
# from the openclaw repo root
node security/opengrep/compile-rules.mjs \
  --rules-dir <folder-with-source-rule-yaml>
```
The script:
1. Recursively walks every `.yml` / `.yaml` file under `--rules-dir`
2. Reads top-level `rules` arrays from those source files
3. Requires each source rule to provide `metadata.ghsa` or `metadata.advisory-id`
4. Requires `metadata.advisory-url` for non-GHSA source identifiers
5. Rewrites ids and injects metadata as above
6. Appends only new precise rule ids to the existing `precise.yml` by default; pass `--replace-precise` to rebuild it from just the supplied source folder
7. Runs `opengrep scan --no-strict` against an empty target to identify schema-invalid or parser-invalid rules and drops mapped bad rules so the published super-config loads cleanly
8. Writes `precise.yml`
Skipped, duplicate, or invalid rules are summarized on stdout/stderr for follow-up.
## Validating locally
```bash
pnpm check:opengrep-rule-metadata
opengrep validate security/opengrep/precise.yml
```
The metadata check must pass before rules are committed. OpenGrep validation must
exit zero. Warnings about unknown fields are acceptable only when OpenGrep still
reports `Configuration is valid` and a non-zero rule count. The compile script
drops mapped schema/parser-invalid rules and fails closed when OpenGrep
validation itself cannot be completed.
## Running locally
```bash
scripts/run-opengrep.sh
```
For SARIF output matching the PR workflow's diff-scoped scan:
```bash
scripts/run-opengrep.sh --changed --sarif
```
For SARIF output matching the manual full-repository workflow:
```bash
scripts/run-opengrep.sh --sarif
```
## Why `--no-strict`?
Some generated rules trigger non-fatal opengrep warnings (for example,
unknown-field warnings on compatibility-only keys). `--no-strict` keeps
opengrep's exit code clean for those warnings. Parser-invalid rules are still
dropped during compilation so the checked-in super-config validates before CI
uses it.
## Why `--no-git-ignore`?
Some OpenClaw paths are excluded by `.gitignore` for build reasons even though
they contain meaningful source code we want scanned. `--no-git-ignore` keeps
opengrep from skipping them.

security/opengrep/check-rule-metadata.mjs (new file, 138 lines)

@@ -0,0 +1,138 @@
#!/usr/bin/env node
import { promises as fs } from "node:fs";
import * as path from "node:path";
import { parseDocument } from "yaml";
const DEFAULT_RULEPACK = path.resolve("security", "opengrep", "precise.yml");
const GHSA_RE = /^GHSA-[0-9A-Z]{4}-[0-9A-Z]{4}-[0-9A-Z]{4}$/;
const RULE_ID_RE = /^([a-z0-9][a-z0-9_-]*)\..+$/;
function printHelp() {
console.log(`Usage: node security/opengrep/check-rule-metadata.mjs [rulepack.yml]
Checks that every compiled OpenGrep rule carries source/provenance metadata.
Default rulepack: ${DEFAULT_RULEPACK}
`);
}
export async function readRules(rulepackPath) {
const raw = await fs.readFile(rulepackPath, "utf8");
const doc = parseDocument(raw, { keepSourceTokens: false });
if (doc.errors.length > 0) {
throw new Error(
`Could not parse ${rulepackPath}: ${doc.errors.map((e) => e.message).join("; ")}`,
);
}
const data = doc.toJSON();
if (!data || !Array.isArray(data.rules)) {
throw new Error(`${rulepackPath} must contain a top-level rules array`);
}
return data.rules;
}
function hasNonEmptyString(value) {
return typeof value === "string" && value.trim().length > 0;
}
function sanitizeIdComponent(value) {
return (
String(value || "")
.replace(/[^a-zA-Z0-9._-]+/g, "-")
.replace(/^-+|-+$/g, "")
.toLowerCase() || "rule"
);
}
function sanitizeSourceIdComponent(value) {
return sanitizeIdComponent(value).replace(/[.]+/g, "-");
}
export function validateRuleMetadata(rules) {
const violations = [];
for (const [index, rule] of rules.entries()) {
const id = String(rule?.id ?? "");
const label = id || `rules[${index}]`;
const metadata = rule?.metadata;
if (!metadata || typeof metadata !== "object" || Array.isArray(metadata)) {
violations.push(`${label}: missing metadata object`);
continue;
}
const idMatch = id.match(RULE_ID_RE);
if (!idMatch) {
violations.push(`${label}: id must match <source-id>.<source-rule-id>`);
}
const ghsa = String(metadata.ghsa ?? "");
const advisoryId = String(metadata["advisory-id"] ?? metadata.ghsa ?? "")
.trim()
.toUpperCase();
if (!hasNonEmptyString(advisoryId)) {
violations.push(`${label}: missing metadata.advisory-id or metadata.ghsa`);
} else if (idMatch && idMatch[1] !== sanitizeSourceIdComponent(advisoryId)) {
violations.push(
`${label}: source id in metadata (${advisoryId}) must match source id in rule id (${idMatch[1]})`,
);
}
if (ghsa && !GHSA_RE.test(ghsa)) {
violations.push(`${label}: metadata.ghsa must match GHSA-XXXX-XXXX-XXXX when present`);
} else if (ghsa && advisoryId !== ghsa) {
violations.push(
`${label}: metadata.advisory-id must match metadata.ghsa when both are present`,
);
}
const advisoryUrl = String(metadata["advisory-url"] ?? "");
const expectedGhsaUrl = GHSA_RE.test(advisoryId)
? `https://github.com/openclaw/openclaw/security/advisories/${advisoryId}`
: "";
if (!hasNonEmptyString(advisoryUrl)) {
violations.push(`${label}: missing metadata.advisory-url`);
} else if (expectedGhsaUrl && advisoryUrl !== expectedGhsaUrl) {
violations.push(`${label}: metadata.advisory-url must be ${expectedGhsaUrl}`);
}
if (metadata["detector-bucket"] !== "precise") {
violations.push(`${label}: metadata.detector-bucket must be precise`);
}
if (!hasNonEmptyString(metadata["source-rule-id"])) {
violations.push(`${label}: missing metadata.source-rule-id`);
}
}
return violations;
}
export async function checkRulepack(rulepackPath = DEFAULT_RULEPACK) {
const rules = await readRules(rulepackPath);
return validateRuleMetadata(rules);
}
export async function main(argv = process.argv.slice(2)) {
if (argv.includes("--help") || argv.includes("-h")) {
printHelp();
return 0;
}
const rulepackPath = path.resolve(argv[0] ?? DEFAULT_RULEPACK);
const violations = await checkRulepack(rulepackPath);
if (violations.length > 0) {
console.error(
`check-opengrep-rule-metadata: ${violations.length} violation(s) in ${rulepackPath}`,
);
for (const violation of violations.slice(0, 50)) {
console.error(` - ${violation}`);
}
if (violations.length > 50) {
console.error(` ... ${violations.length - 50} more`);
}
return 1;
}
console.log(`check-opengrep-rule-metadata: ${rulepackPath} ok`);
return 0;
}
if (import.meta.main) {
process.exitCode = await main();
}

security/opengrep/compile-rules.mjs (new file, 602 lines)

@@ -0,0 +1,602 @@
#!/usr/bin/env node
/**
* compile-rules.mjs
*
* Compiles source OpenGrep rule YAML files from a folder into OpenClaw's shipped
* precise super-config. The input folder is intentionally generic: any nested
* .yml/.yaml file containing a top-level `rules` array can be compiled as long
* as each rule carries metadata.ghsa or metadata.advisory-id.
*/
import { spawn } from "node:child_process";
import { promises as fs } from "node:fs";
import * as os from "node:os";
import * as path from "node:path";
import { fileURLToPath } from "node:url";
import { parseDocument, stringify } from "yaml";
const REPO_BASENAME = "openclaw/openclaw";
const SCRIPT_DIR = path.dirname(fileURLToPath(import.meta.url));
const REPO_ROOT = path.resolve(SCRIPT_DIR, "..", "..");
const DEFAULT_OUT_DIR = path.resolve(REPO_ROOT, "security", "opengrep");
const GHSA_RE = /^GHSA-[0-9A-Z]{4}-[0-9A-Z]{4}-[0-9A-Z]{4}$/;
function printHelp() {
console.log(`Usage: node security/opengrep/compile-rules.mjs --rules-dir <path> [options]
Options:
--rules-dir <path> Required. Directory containing source OpenGrep YAML files.
--out-dir <path> Output directory for precise.yml (default: <repo>/security/opengrep).
--advisory-repo <r> GitHub owner/repo used in advisory-url metadata.
Default: ${REPO_BASENAME}
--replace-precise Replace precise.yml instead of appending new rule ids.
--help Show this help.
`);
}
function parseArgs(argv) {
const opts = {
rulesDir: "",
outDir: "",
advisoryRepo: REPO_BASENAME,
replacePrecise: false,
};
for (let i = 0; i < argv.length; i += 1) {
const arg = argv[i];
switch (arg) {
case "--rules-dir":
opts.rulesDir = path.resolve(argv[i + 1] ?? "");
i += 1;
break;
case "--run-dir":
throw new Error(
"--run-dir was replaced by --rules-dir; pass a folder of source rule YAML files",
);
case "--out-dir":
opts.outDir = path.resolve(argv[i + 1] ?? "");
i += 1;
break;
case "--advisory-repo":
opts.advisoryRepo = argv[i + 1] ?? REPO_BASENAME;
i += 1;
break;
case "--replace-precise":
opts.replacePrecise = true;
break;
case "--help":
case "-h":
printHelp();
process.exit(0);
default:
throw new Error(`Unknown argument: ${arg}`);
}
}
if (!opts.rulesDir) {
printHelp();
throw new Error("--rules-dir is required");
}
return opts;
}
function sanitizeIdComponent(value) {
return (
String(value || "")
.replace(/[^a-zA-Z0-9._-]+/g, "-")
.replace(/^-+|-+$/g, "")
.toLowerCase() || "rule"
);
}
function normalizeSourceId(value) {
return String(value || "")
.trim()
.toUpperCase();
}
function sanitizeSourceIdComponent(value) {
return sanitizeIdComponent(value).replace(/[.]+/g, "-");
}
function sourceIdFromMetadata(metadata) {
return normalizeSourceId(metadata?.["advisory-id"] || metadata?.ghsa);
}
function buildGhsaAdvisoryUrl(advisoryRepo, ghsa) {
return `https://github.com/${advisoryRepo}/security/advisories/${ghsa}`;
}
function toPortablePath(filePath, repoRoot = REPO_ROOT) {
const resolved = path.resolve(filePath);
const relative = path.relative(repoRoot, resolved);
if (relative && !relative.startsWith("..") && !path.isAbsolute(relative)) {
return relative.split(path.sep).join("/");
}
return path.basename(resolved);
}
function rewriteRule(rule, params) {
const originalId = String(rule.id ?? "rule");
const metadata = { ...(rule.metadata ?? {}) };
const sourceId = sourceIdFromMetadata(metadata);
if (!sourceId) {
throw new Error(
`${params.sourceFile}: rule ${originalId} must set metadata.advisory-id or metadata.ghsa`,
);
}
if (GHSA_RE.test(sourceId)) {
metadata.ghsa = sourceId;
metadata["advisory-url"] =
metadata["advisory-url"] || buildGhsaAdvisoryUrl(params.advisoryRepo, sourceId);
} else if (!metadata["advisory-url"]) {
throw new Error(
`${params.sourceFile}: rule ${originalId} must set metadata.advisory-url for non-GHSA source ${sourceId}`,
);
}
metadata["advisory-id"] = sourceId;
metadata["detector-bucket"] = "precise";
metadata["source-rule-id"] = originalId;
metadata["source-file"] = toPortablePath(params.sourceFile);
const newId = `${sanitizeSourceIdComponent(sourceId)}.${sanitizeIdComponent(originalId)}`;
return { ...rule, id: newId, metadata };
}
async function readRuleFile(filePath) {
const raw = await fs.readFile(filePath, "utf8");
if (!raw.trim()) {
return { rules: [], error: null };
}
let doc;
try {
doc = parseDocument(raw, { keepSourceTokens: false });
} catch (error) {
return { rules: [], error: `parse-error: ${error.message}` };
}
if (doc.errors && doc.errors.length > 0) {
return { rules: [], error: `yaml-errors: ${doc.errors.map((e) => e.message).join("; ")}` };
}
const data = doc.toJSON();
if (!data || !Array.isArray(data.rules)) {
return { rules: [], error: "no-rules-array" };
}
return { rules: data.rules, error: null };
}
async function listYamlFiles(dir) {
const out = [];
async function walk(current) {
const entries = await fs.readdir(current, { withFileTypes: true });
for (const entry of entries) {
const fullPath = path.join(current, entry.name);
if (entry.isDirectory()) {
if (entry.name === "node_modules" || entry.name === ".git") {
continue;
}
await walk(fullPath);
} else if (entry.isFile() && /\.ya?ml$/i.test(entry.name)) {
if (entry.name === "precise.yml") {
continue;
}
out.push(fullPath);
}
}
}
await walk(dir);
return out.toSorted();
}
async function compile(opts) {
const sourceFiles = await listYamlFiles(opts.rulesDir);
const buckets = {
precise: { rules: [], skipped: [] },
};
const manifest = {
rulesDir: toPortablePath(opts.rulesDir),
advisoryRepo: opts.advisoryRepo,
generatedAt: new Date().toISOString(),
totals: {},
files: {},
};
for (const filePath of sourceFiles) {
const fileKey = toPortablePath(filePath);
const fileEntry = { precise: [], errors: {} };
const { rules, error } = await readRuleFile(filePath);
if (error) {
buckets.precise.skipped.push({ file: fileKey, error });
fileEntry.errors.precise = error;
} else {
for (const rule of rules) {
try {
const rewritten = rewriteRule(rule, {
advisoryRepo: opts.advisoryRepo,
sourceFile: filePath,
});
buckets.precise.rules.push(rewritten);
fileEntry.precise.push(rewritten.id);
} catch (error_) {
const errorMessage = error_ instanceof Error ? error_.message : String(error_);
buckets.precise.skipped.push({ file: fileKey, error: errorMessage });
fileEntry.errors.precise = errorMessage;
}
}
}
if (fileEntry.precise.length || Object.keys(fileEntry.errors).length) {
manifest.files[fileKey] = fileEntry;
}
}
manifest.totals = {
filesScanned: sourceFiles.length,
filesWithAnyRule: Object.keys(manifest.files).length,
preciseRulesGenerated: buckets.precise.rules.length,
preciseSkipped: buckets.precise.skipped.length,
};
return { buckets, manifest };
}
function buildBucketHeader(bucket, manifest, ruleCount) {
const count = ruleCount ?? manifest.totals.preciseRules;
return [
`# OpenGrep super-config: ${bucket}`,
`#`,
`# Auto-generated by security/opengrep/compile-rules.mjs.`,
`# DO NOT EDIT BY HAND. Re-run the compile script after editing source rules.`,
`#`,
`# Source rules dir: ${manifest.rulesDir}`,
`# Generated at : ${manifest.generatedAt}`,
`# Rule count : ${count}`,
"",
].join("\n");
}
async function readExistingRules(filePath) {
const { rules, error } = await readRuleFile(filePath);
if (error) {
throw new Error(`Could not read existing precise rules from ${filePath}: ${error}`);
}
return rules;
}
function appendNewRules(existingRules, generatedRules) {
const existingIds = new Set(existingRules.map((rule) => String(rule.id ?? "")));
const appendedRules = [];
const skippedDuplicateIds = [];
for (const rule of generatedRules) {
const id = String(rule.id ?? "");
if (existingIds.has(id)) {
skippedDuplicateIds.push(id);
continue;
}
existingIds.add(id);
appendedRules.push(rule);
}
return {
rules: [...existingRules, ...appendedRules],
appendedRules,
skippedDuplicateIds,
};
}
function detectIdCollisions(rules) {
const seen = new Map();
const dupes = [];
for (const r of rules) {
if (seen.has(r.id)) {
dupes.push({ id: r.id, ghsas: [seen.get(r.id), r.metadata?.ghsa] });
} else {
seen.set(r.id, r.metadata?.ghsa || "");
}
}
return dupes;
}
function disambiguateCollisions(rules) {
const seen = new Map();
const out = [];
for (const r of rules) {
let id = r.id;
if (seen.has(id)) {
const next = (seen.get(id) ?? 1) + 1;
seen.set(id, next);
id = `${id}-${next}`;
} else {
seen.set(id, 1);
}
out.push({ ...r, id });
}
return out;
}
function runCommand(argv, options = {}) {
return new Promise((resolve) => {
const { timeoutMs, ...spawnOptions } = options;
const child = spawn(argv[0], argv.slice(1), {
stdio: ["ignore", "pipe", "pipe"],
...spawnOptions,
});
let stdout = "";
let stderr = "";
let settled = false;
const finish = (result) => {
if (settled) {
return;
}
settled = true;
if (timer) {
clearTimeout(timer);
}
resolve(result);
};
const timer =
timeoutMs && timeoutMs > 0
? setTimeout(() => {
child.kill("SIGKILL");
finish({ code: null, stdout, stderr, timedOut: true });
}, timeoutMs)
: null;
child.stdout.on("data", (chunk) => (stdout += chunk));
child.stderr.on("data", (chunk) => (stderr += chunk));
child.on("close", (code) => finish({ code, stdout, stderr, timedOut: false }));
child.on("error", (err) => finish({ code: -1, stdout, stderr: String(err), timedOut: false }));
});
}
async function findInvalidRuleSpans(superConfigPath) {
const emptyDir = await fs.mkdtemp(path.join(os.tmpdir(), "opengrep-empty-"));
try {
const result = await runCommand(
[
"opengrep",
"scan",
"--no-strict",
"--config",
superConfigPath,
"--json",
"--no-git-ignore",
emptyDir,
],
{ timeoutMs: 120_000 },
);
if (!result.stdout || result.stdout.trim() === "") {
const tail = (result.stderr || "").trim().slice(-500);
return {
invalidLines: new Set(),
invalidRuleIds: new Set(),
errorCount: 0,
validatorOk: false,
validatorError: `opengrep produced no JSON output (exit code ${result.code}). stderr tail: ${tail || "(empty)"}`,
};
}
let parsed;
try {
parsed = JSON.parse(result.stdout);
} catch (parseErr) {
return {
invalidLines: new Set(),
invalidRuleIds: new Set(),
errorCount: 0,
validatorOk: false,
validatorError: `opengrep stdout was not valid JSON (exit code ${result.code}): ${String(parseErr).slice(0, 200)}`,
};
}
const invalidLines = new Set();
const invalidRuleIds = new Set();
const unmappedErrors = [];
let errorCount = 0;
for (const err of parsed.errors || []) {
const ruleId = typeof err.rule_id === "string" ? err.rule_id : "";
if (ruleId) {
invalidRuleIds.add(ruleId);
errorCount += 1;
continue;
}
if (err.type === "InvalidRuleSchemaError") {
errorCount += 1;
for (const span of err.spans || []) {
const start = span.start?.line;
const end = span.end?.line ?? start;
if (typeof start === "number" && typeof end === "number") {
for (let line = start; line <= end; line += 1) {
invalidLines.add(line);
}
}
}
if (!err.spans || err.spans.length === 0) {
unmappedErrors.push(err.type);
}
continue;
}
unmappedErrors.push(err.type || "unknown");
}
if (result.code !== 0 && unmappedErrors.length > 0) {
return {
invalidLines,
invalidRuleIds,
errorCount,
validatorOk: false,
validatorError: `opengrep exited ${result.code} with unmapped errors: ${unmappedErrors.join(", ")}`,
};
}
if (result.code !== 0 && invalidLines.size === 0 && invalidRuleIds.size === 0) {
return {
invalidLines,
invalidRuleIds,
errorCount,
validatorOk: false,
validatorError: `opengrep exited ${result.code} with no mappable rule errors`,
};
}
return { invalidLines, invalidRuleIds, errorCount, validatorOk: true };
} finally {
await fs.rm(emptyDir, { recursive: true, force: true }).catch(() => {});
}
}
function rulesOverlappingLines(superConfigText, invalidLines) {
const lines = superConfigText.split("\n");
const ruleStarts = [];
for (let i = 0; i < lines.length; i += 1) {
if (/^\s{2}-\s+id:\s*/.test(lines[i])) {
ruleStarts.push(i + 1);
}
}
const bad = new Set();
for (const ln of invalidLines) {
let lo = 0;
let hi = ruleStarts.length - 1;
let pick = -1;
while (lo <= hi) {
const mid = (lo + hi) >> 1;
if (ruleStarts[mid] <= ln) {
pick = mid;
lo = mid + 1;
} else {
hi = mid - 1;
}
}
if (pick >= 0) {
bad.add(pick);
}
}
return bad;
}
async function pruneInvalidRulesForBucket(rules, manifest, bucket, outDir, maxIterations = 4) {
let working = rules.slice();
const droppedDetails = [];
for (let iter = 0; iter < maxIterations; iter += 1) {
const yamlText =
buildBucketHeader(bucket, manifest, working.length) +
stringify({ rules: working }, { lineWidth: 0 });
const tmpPath = path.join(outDir, `.tmp-${bucket}.yml`);
await fs.writeFile(tmpPath, yamlText);
const { invalidLines, invalidRuleIds, errorCount, validatorOk, validatorError } =
await findInvalidRuleSpans(tmpPath);
await fs.rm(tmpPath, { force: true }).catch(() => {});
if (!validatorOk) {
throw new Error(
`opengrep schema validation failed for bucket '${bucket}'. Install opengrep ` +
`(https://opengrep.dev) and retry. Validator error: ${validatorError}`,
);
}
if (
errorCount === 0 ||
(invalidLines.size === 0 && (!invalidRuleIds || invalidRuleIds.size === 0))
) {
return { rules: working, droppedDetails };
}
const badIndices = rulesOverlappingLines(yamlText, invalidLines);
if (invalidRuleIds && invalidRuleIds.size > 0) {
for (let i = 0; i < working.length; i += 1) {
const ruleId = String(working[i].id ?? "");
for (const invalidRuleId of invalidRuleIds) {
if (invalidRuleId === ruleId || invalidRuleId.endsWith(`.${ruleId}`)) {
badIndices.add(i);
break;
}
}
}
}
if (badIndices.size === 0) {
throw new Error(
`opengrep reported ${errorCount} invalid ${bucket} rule(s), but the compiler could not map them to generated rules`,
);
}
const next = [];
for (let i = 0; i < working.length; i += 1) {
if (badIndices.has(i)) {
droppedDetails.push({
id: working[i].id,
ghsa: working[i].metadata?.ghsa,
});
} else {
next.push(working[i]);
}
}
working = next;
}
return { rules: working, droppedDetails };
}
async function writeOutputs(buckets, manifest, outDir, opts) {
await fs.mkdir(outDir, { recursive: true });
const precisePath = path.join(outDir, "precise.yml");
const existingRules = opts.replacePrecise ? [] : await readExistingRules(precisePath);
const collisions = detectIdCollisions(buckets.precise.rules);
if (collisions.length > 0) {
console.error(
`[warn] precise: ${collisions.length} duplicate generated rule ids will be auto-suffixed (-2, -3, ...).`,
);
}
const disambiguated = disambiguateCollisions(buckets.precise.rules);
const appendResult = opts.replacePrecise
? { rules: disambiguated, appendedRules: disambiguated, skippedDuplicateIds: [] }
: appendNewRules(existingRules, disambiguated);
let validRules = appendResult.rules;
let droppedDetails = [];
if (appendResult.rules.length > 0) {
console.error(`[info] precise: validating ${appendResult.rules.length} rules with opengrep...`);
({ rules: validRules, droppedDetails } = await pruneInvalidRulesForBucket(
appendResult.rules,
manifest,
"precise",
outDir,
));
} else {
console.error("[info] precise: no rules to validate with opengrep.");
}
buckets.precise.invalid = droppedDetails;
if (droppedDetails.length > 0) {
console.error(`[warn] precise: dropped ${droppedDetails.length} rules with invalid schema.`);
}
const yaml = stringify({ rules: validRules }, { lineWidth: 0 });
await fs.writeFile(precisePath, buildBucketHeader("precise", manifest, validRules.length) + yaml);
manifest.totals.preciseRulesExisting = existingRules.length;
manifest.totals.preciseRulesAppended = appendResult.appendedRules.length;
manifest.totals.preciseRulesDuplicateSkipped = appendResult.skippedDuplicateIds.length;
manifest.totals.preciseRules = validRules.length;
manifest.totals.preciseInvalid = droppedDetails.length;
manifest.preciseInvalid = droppedDetails;
manifest.preciseDuplicateSkipped = appendResult.skippedDuplicateIds;
}
function printSummary(buckets, manifest, outDir) {
console.log(`compile-rules: done`);
console.log(` out-dir : ${outDir}`);
console.log(` files scanned : ${manifest.totals.filesScanned}`);
console.log(` files with rules : ${manifest.totals.filesWithAnyRule}`);
console.log(
` precise rules : ${manifest.totals.preciseRules} total (${manifest.totals.preciseRulesExisting ?? 0} existing, ${manifest.totals.preciseRulesAppended ?? 0} appended, ${manifest.totals.preciseRulesDuplicateSkipped ?? 0} duplicate skipped, yaml-skipped: ${manifest.totals.preciseSkipped}, schema-invalid: ${manifest.totals.preciseInvalid ?? 0})`,
);
const totalDropped =
(manifest.totals.preciseSkipped ?? 0) + (manifest.totals.preciseInvalid ?? 0);
if (totalDropped > 0) {
console.log("\nFirst few skipped/invalid rules:");
for (const s of (buckets.precise.skipped ?? []).slice(0, 3)) {
console.log(` [precise] ${s.file}: yaml: ${s.error.split("\n")[0]}`);
}
for (const s of (buckets.precise.invalid ?? []).slice(0, 3)) {
console.log(` [precise] ${s.id}: schema-invalid`);
}
}
}
async function main() {
const opts = parseArgs(process.argv.slice(2));
if (!opts.outDir) {
opts.outDir = DEFAULT_OUT_DIR;
}
const { buckets, manifest } = await compile(opts);
await writeOutputs(buckets, manifest, opts.outDir, opts);
printSummary(buckets, manifest, opts.outDir);
}
main().catch((err) => {
console.error(`compile-rules: error: ${err.message ?? err}`);
process.exit(1);
});

File diff suppressed because it is too large.