docs(skills): add release ci workflow skill

2026-05-18 10:14:45 +00:00 · 2026-05-15 06:34:59 +01:00
parent a543d21352
commit c5ca8a17ce
6 changed files with 329 additions and 0 deletions
--- a/.agents/skills/openclaw-release-ci/SKILL.md
+++ b/.agents/skills/openclaw-release-ci/SKILL.md
@@ -0,0 +1,90 @@
+---
+name: openclaw-release-ci
+description: "Run, watch, debug, and summarize OpenClaw full release CI, release checks, live provider gates, install/update proofs, and release-secret preflights."
+---
+
+# OpenClaw Release CI
+
+Use this with `$openclaw-release-maintainer` and `$openclaw-testing` when a release candidate needs full validation, install/update proof, live provider checks, or CI recovery.
+
+## Guardrails
+
+- No version bump, tag, npm publish, GitHub release, or release promotion without explicit operator approval.
+- Validate provider secrets before dispatching expensive full release matrices.
+- Do not set GitHub secrets from unvalidated 1Password candidates. If a candidate returns 401/403, leave the existing secret alone and report exactly which provider lacks a valid key.
+- Use `$one-password` for secret reads/writes: one persistent tmux session, targeted items only, no secret output.
+- Watch one parent run plus compact child summaries. Avoid broad `gh run view` polling loops; REST quota is easy to burn.
+- Fetch logs only for failed or currently-blocking jobs. If quota is low, stop polling and wait for reset.
+- Treat live-provider flakes separately from code failures: prove key validity, provider HTTP status, retry evidence, and exact failing lane before editing code.
+
+## Preflight
+
+Before full release validation:
+
+```bash
+node .agents/skills/openclaw-release-ci/scripts/verify-provider-secrets.mjs --required openai,anthropic,fireworks
+gh api rate_limit --jq '.resources.core'
+git status --short --branch
+git rev-parse HEAD
+```
+
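+If the environment lacks keys, use `$one-password` to inject or set them, then rerun the script. The script prints each provider's status (`ok`, `missing`, `timeout`, `error`, or the exact HTTP status such as `http_401`), the env var name it read, and whether the provider is required; it never prints tokens.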
+
+## Dispatch
+
+Prefer dispatching the trusted workflow from `main` and targeting the exact release SHA:
+
+```bash
+gh workflow run full-release-validation.yml \
+  --repo openclaw/openclaw \
+  --ref main \
+  -f ref=<release-sha> \
+  -f provider=openai \
+  -f mode=both \
+  -f release_profile=full \
+  -f rerun_group=all
+```
+
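+The example above shows `release_profile=full`; by default use `release_profile=stable`, and switch only when the operator explicitly asks for the broad advisory provider/media matrix. After a focused fix, rerun with a narrow `rerun_group` instead of `all`.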
+
+## Watch
+
+Use the summary helper instead of repeated raw polling:
+
+```bash
+node .agents/skills/openclaw-release-ci/scripts/release-ci-summary.mjs <full-release-run-id>
+```
+
+Then watch only when useful:
+
+```bash
+gh run watch <full-release-run-id> --repo openclaw/openclaw --exit-status
+```
+
+Stop watchers before ending the turn or switching strategy.
+
+## Failure Triage
+
+1. Confirm parent SHA and child run IDs.
+2. List failed jobs only:
+   ```bash
+   gh run view <child-run-id> --repo openclaw/openclaw --json jobs \
+     --jq '.jobs[] | select(.conclusion=="failure" or .conclusion=="timed_out" or .conclusion=="cancelled") | [.databaseId,.name,.conclusion,.url] | @tsv'
+   ```
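+3. Fetch one failed job log, for example `gh run view --job <job-id> --repo openclaw/openclaw --log-failed`. If rate-limited, note the reset time and make no further REST calls until then.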
+4. For secret-looking failures, validate the provider endpoint from the same secret source before editing code.
+5. For live-cache failures, classify the cause: missing or invalid key, empty text, provider refusal, timeout, or baseline miss. Do not weaken release gates without clear provider evidence.
+6. Fix narrowly, run local proof for the changed code, commit, push, and rerun the smallest matching `rerun_group`.
+
+## Evidence
+
+Record:
+
+- release SHA
+- full parent run URL
+- child run IDs and conclusions: CI, Release Checks, Plugin Prerelease, NPM Telegram
+- targeted local proof commands
+- provider-secret preflight result
+- known gaps or unrelated failures
+
+For lessons and recovery patterns, read `references/release-ci-notes.md`.
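A helper script that already holds parsed `gh` JSON can apply the Failure Triage filter from SKILL.md (step 2) in Node instead of jq. This is a minimal sketch. It assumes the input is the `jobs` array from `gh run view --json jobs`, with the same fields the jq filter reads. `failingJobs` and the sample jobs are hypothetical. Unlike jq's `@tsv`, `join("\t")` does not escape tabs inside values.

```javascript
// Mirror of the jq filter in SKILL.md Failure Triage step 2 (a sketch).
// Input shape assumed: the `jobs` array from `gh run view --json jobs`.
const FAILING_CONCLUSIONS = new Set(["failure", "timed_out", "cancelled"]);

// Hypothetical helper: keep only jobs in a failing terminal state and
// format each as "databaseId<TAB>name<TAB>conclusion<TAB>url".
function failingJobs(jobs) {
  return (jobs ?? [])
    .filter((job) => FAILING_CONCLUSIONS.has(job.conclusion))
    .map((job) => [job.databaseId, job.name, job.conclusion, job.url].join("\t"));
}

// Made-up sample; in-progress jobs have an empty conclusion and are skipped.
const sampleJobs = [
  { databaseId: 1, name: "build", conclusion: "success", url: "https://example.invalid/1" },
  { databaseId: 2, name: "live-cache (anthropic)", conclusion: "failure", url: "https://example.invalid/2" },
  { databaseId: 3, name: "install-proof", conclusion: "timed_out", url: "https://example.invalid/3" },
  { databaseId: 4, name: "docs", conclusion: "", url: "https://example.invalid/4" },
];

for (const row of failingJobs(sampleJobs)) console.log(row);
```

The `databaseId` in each row is the job ID to pass to `gh run view --job` when fetching that one failed log.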
--- a/.agents/skills/openclaw-release-ci/agents/openai.yaml
+++ b/.agents/skills/openclaw-release-ci/agents/openai.yaml
@@ -0,0 +1,4 @@
+interface:
+  display_name: "OpenClaw Release CI"
+  short_description: "Verify and debug OpenClaw release validation runs"
+  default_prompt: "Use $openclaw-release-ci to preflight provider secrets, watch full release validation, summarize child runs, and triage only failing release lanes."
--- a/.agents/skills/openclaw-release-ci/references/release-ci-notes.md
+++ b/.agents/skills/openclaw-release-ci/references/release-ci-notes.md
@@ -0,0 +1,41 @@
+# Release CI Notes
+
+## What Went Wrong
+
+- Full validation was started before all provider keys were proven valid.
+- GitHub secret presence was confused with key validity.
+- Repeated `gh run view` and log fetches exhausted REST quota.
+- Parent run state was less useful than child run evidence.
+- Live-cache failures needed structured classification: invalid key, empty provider output, timeout, or real cache regression.
+- Background watchers accumulated and made interruption recovery harder.
+
+## Better Defaults
+
+- Run provider-secret preflight first. Require real `/models` or equivalent endpoint checks for release-blocking providers.
+- Keep one watcher open. Use child summaries every few minutes, not every few seconds.
+- Fetch failed-job logs only after a job reaches a terminal failing state.
+- Prefer narrow `rerun_group` recovery after a focused fix.
+- Never write a candidate that fails validation. A 1Password candidate that returns 401 must not overwrite the existing GitHub secret.
+- Make the final release evidence note durable: parent URL, child run URLs, SHA, command proof, and gaps.
+
+## Secret Handling Pattern
+
+- Use `$one-password`; never run broad env dumps.
+- Search exact item titles or known ids.
+- Validate candidates without printing values.
+- Set GitHub secrets only after endpoint validation succeeds.
+- After setting, verify metadata with `gh secret list`, not value output.
+
+## Live Cache Pattern
+
+- Empty text with token usage is a provider/output issue until proven otherwise.
+- Retry lane-level mismatches once with a fresh session id.
+- Keep cache baselines strict, but log enough structured usage to distinguish cache miss from response mismatch.
+- If a provider key validates locally but fails in Actions, inspect whether the workflow reads the expected secret name.
+
+## Quota-Safe GitHub Pattern
+
+- Check `gh api rate_limit --jq '.resources.core'` before log-heavy work.
+- Use one child-run listing call, then inspect failed jobs only.
+- If remaining quota is low, pause until reset; do not keep polling.
+- When REST quota is exhausted, fall back to GraphQL for metadata only; job logs still require REST.
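The Quota-Safe GitHub Pattern in the notes comes down to one decision before each poll. This is a minimal sketch. It assumes `core` carries the `limit`, `remaining`, and epoch-second `reset` fields that `gh api rate_limit --jq '.resources.core'` returns. `quotaGate` is hypothetical, and its threshold of 20 is borrowed from `release-ci-summary.mjs`.

```javascript
// Sketch of a quota gate run before each GitHub REST poll.
// `core` is assumed to be the object from
// `gh api rate_limit --jq '.resources.core'`: { limit, remaining, reset }
// where `reset` is in epoch seconds. quotaGate is hypothetical.
function quotaGate(core, { minRemaining = 20, nowMs = Date.now() } = {}) {
  const resetMs = core.reset * 1000;
  if (core.remaining >= minRemaining) {
    return { poll: true, waitMs: 0 };
  }
  // Quota is low: skip the call and wait for the window to reset.
  return {
    poll: false,
    waitMs: Math.max(0, resetMs - nowMs),
    resetAt: new Date(resetMs).toISOString(),
  };
}

// Fixed clock so the example is deterministic.
const now = Date.UTC(2026, 4, 18, 10, 0, 0);
const reset = now / 1000 + 600; // window resets ten minutes later

console.log(quotaGate({ limit: 5000, remaining: 4200, reset }, { nowMs: now }));
console.log(quotaGate({ limit: 5000, remaining: 7, reset }, { nowMs: now }));
```

When `poll` is false, a watcher would sleep for `waitMs` instead of retrying. Otherwise it would space child summaries a few minutes apart, as the notes recommend.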
--- a/.agents/skills/openclaw-release-ci/scripts/release-ci-summary.mjs
+++ b/.agents/skills/openclaw-release-ci/scripts/release-ci-summary.mjs
@@ -0,0 +1,79 @@
+#!/usr/bin/env node
+import { execFileSync } from "node:child_process";
+import process from "node:process";
+
+const runId = process.argv[2];
+const repo = process.env.OPENCLAW_RELEASE_REPO || "openclaw/openclaw";
+
+if (!runId) {
+  console.error("usage: release-ci-summary.mjs <full-release-run-id>");
+  process.exit(2);
+}
+
+function gh(args) {
+  return execFileSync("gh", args, {
+    encoding: "utf8",
+    stdio: ["ignore", "pipe", "pipe"],
+  });
+}
+
+function jsonGh(args) {
+  return JSON.parse(gh(args));
+}
+
+function rate() {
+  try {
+    return jsonGh(["api", "rate_limit"]).resources.core;
+  } catch {
+    return undefined;
+  }
+}
+
+const core = rate();
+if (core) {
+  const reset = new Date(core.reset * 1000).toISOString();
+  console.log(`rate: remaining=${core.remaining}/${core.limit} reset=${reset}`);
+  if (core.remaining < 20) {
+    console.error("rate too low for CI summary; wait for reset before polling");
+    process.exit(3);
+  }
+}
+
+const parent = jsonGh([
+  "run",
+  "view",
+  runId,
+  "--repo",
+  repo,
+  "--json",
+  "status,conclusion,createdAt,headSha,url,jobs",
+]);
+
+console.log(`parent: ${runId} ${parent.status}/${parent.conclusion || "none"}`);
+console.log(`sha: ${parent.headSha}`);
+console.log(`url: ${parent.url}`);
+
+for (const job of parent.jobs ?? []) {
+  const marker = job.conclusion || job.status;
+  console.log(`parent-job: ${marker} ${job.name}`);
+}
+
+const since = parent.createdAt;
+const runList = gh([
+  "api",
+  `repos/${repo}/actions/runs?per_page=100`,
+  "--jq",
+  `.workflow_runs[] | select(.created_at >= "${since}") | select(.name=="CI" or .name=="OpenClaw Release Checks" or .name=="Plugin Prerelease" or .name=="NPM Telegram Beta E2E" or .name=="Full Release Validation") | [.id,.name,.status,.conclusion,.head_sha,.html_url] | @tsv`,
+]).trim();
+
+if (!runList) {
+  console.log("children: none found yet");
+  process.exit(0);
+}
+
+console.log("children:");
+for (const line of runList.split("\n")) {
+  const [id, name, status, conclusion, sha, url] = line.split("\t");
+  console.log(`child: ${id} ${name} ${status}/${conclusion || "none"} sha=${sha}`);
+  console.log(`child-url: ${url}`);
+}
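For the Evidence record in SKILL.md, the `child:` lines printed by `release-ci-summary.mjs` can be folded into one entry per child run. This is a minimal sketch that assumes exactly the line format in the script above. `parseChildren`, `evidenceLines`, and the sample output are hypothetical. Workflow names such as `OpenClaw Release Checks` contain spaces, so the regex treats the name as everything between the run ID and the final `<status>/<conclusion>` token.

```javascript
// Sketch: fold release-ci-summary.mjs output into Evidence entries.
// Assumes the script's format: "child: <id> <name> <status>/<conclusion> sha=<sha>".
// The name may contain spaces; it is everything between the id and the final
// "<status>/<conclusion>" token. parseChildren and evidenceLines are hypothetical.
const CHILD_LINE = /^child: (\d+) (.+) (\S+)\/(\S+) sha=(\S+)$/;

function parseChildren(output) {
  const children = [];
  for (const line of output.split("\n")) {
    const match = CHILD_LINE.exec(line.trim());
    if (!match) continue; // skips rate:, parent:, child-url:, and similar lines
    const [, id, name, status, conclusion, sha] = match;
    children.push({ id: Number(id), name, status, conclusion, sha });
  }
  return children;
}

// Report the conclusion when the run finished, otherwise its current status.
function evidenceLines(children) {
  return children.map(
    (child) =>
      `${child.name}: ${child.id} ${child.conclusion === "none" ? child.status : child.conclusion} sha=${child.sha}`,
  );
}

const sampleOutput = [
  "parent: 100 in_progress/none",
  "children:",
  "child: 101 CI completed/success sha=abc123",
  "child-url: https://example.invalid/101",
  "child: 102 OpenClaw Release Checks in_progress/none sha=abc123",
].join("\n");

console.log(evidenceLines(parseChildren(sampleOutput)).join("\n"));
```

The script prints `none` for a missing conclusion, so the sketch falls back to the run status for children that are still running.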
--- a/.agents/skills/openclaw-release-ci/scripts/verify-provider-secrets.mjs
+++ b/.agents/skills/openclaw-release-ci/scripts/verify-provider-secrets.mjs
@@ -0,0 +1,113 @@
+#!/usr/bin/env node
+import process from "node:process";
+
+const args = new Map();
+for (let index = 2; index < process.argv.length; index += 1) {
+  const arg = process.argv[index];
+  if (!arg.startsWith("--")) continue;
+  const [key, inlineValue] = arg.slice(2).split("=", 2);
+  const value = inlineValue ?? process.argv[index + 1];
+  if (inlineValue === undefined) index += 1;
+  args.set(key, value);
+}
+
+const requiredInput = String(args.get("required") ?? "openai,anthropic").trim();
+const required = new Set(
+  (requiredInput.toLowerCase() === "none" ? "" : requiredInput)
+    .split(",")
+    .map((entry) => entry.trim().toLowerCase())
+    .filter(Boolean),
+);
+
+const timeoutMs = Number(args.get("timeout-ms")) || 10_000; // NaN or 0 would abort every check at once
+
+function envFirst(names) {
+  for (const name of names) {
+    const value = process.env[name]?.trim();
+    if (value) return { name, value };
+  }
+  return undefined;
+}
+
+async function checkProvider(id, config) {
+  const secret = envFirst(config.env);
+  if (!secret) {
+    return { id, ok: false, status: "missing", env: config.env.join("|") };
+  }
+
+  const controller = new AbortController();
+  const timer = setTimeout(() => controller.abort(), timeoutMs);
+  try {
+    const headers = config.headers(secret.value);
+    const response = await fetch(config.url, {
+      headers,
+      signal: controller.signal,
+    });
+    return {
+      id,
+      ok: response.ok,
+      status: response.ok ? "ok" : `http_${response.status}`,
+      env: secret.name,
+    };
+  } catch (error) {
+    return {
+      id,
+      ok: false,
+      status: error?.name === "AbortError" ? "timeout" : "error",
+      env: secret.name,
+    };
+  } finally {
+    clearTimeout(timer);
+  }
+}
+
+const providers = {
+  openai: {
+    env: ["OPENAI_API_KEY"],
+    url: "https://api.openai.com/v1/models",
+    headers: (token) => ({ authorization: `Bearer ${token}` }),
+  },
+  anthropic: {
+    env: ["ANTHROPIC_API_KEY", "ANTHROPIC_API_TOKEN"],
+    url: "https://api.anthropic.com/v1/models",
+    headers: (token) => ({
+      "anthropic-version": "2023-06-01",
+      "x-api-key": token,
+    }),
+  },
+  fireworks: {
+    env: ["FIREWORKS_API_KEY"],
+    url: "https://api.fireworks.ai/inference/v1/models",
+    headers: (token) => ({ authorization: `Bearer ${token}` }),
+  },
+  openrouter: {
+    env: ["OPENROUTER_API_KEY"],
+    url: "https://openrouter.ai/api/v1/models", // public listing: a 200 here does not prove the key is valid
+    headers: (token) => ({ authorization: `Bearer ${token}` }),
+  },
+};
+
+const unknown = [...required].filter((id) => !providers[id]);
+if (unknown.length > 0) {
+  console.error(`unknown providers: ${unknown.join(",")}`);
+  process.exit(2);
+}
+
+const results = [];
+for (const id of Object.keys(providers)) {
+  if (required.has(id) || envFirst(providers[id].env)) {
+    results.push(await checkProvider(id, providers[id]));
+  }
+}
+
+let failed = false;
+for (const result of results) {
+  const requiredLabel = required.has(result.id) ? "required" : "optional";
+  console.log(`${result.id}: ${result.status} env=${result.env} ${requiredLabel}`);
+  if (required.has(result.id) && !result.ok) failed = true;
+}
+
+if (failed) {
+  console.error("release provider secret preflight failed");
+  process.exit(1);
+}
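The secret-handling guardrails say that only providers passing preflight should have their secrets written to GitHub. This is a minimal sketch. It assumes the `<provider>: <status> env=<ENV> <required|optional>` lines that `verify-provider-secrets.mjs` prints on stdout. `planSecretWrites` and the sample output are hypothetical.

```javascript
// Sketch: read verify-provider-secrets.mjs output and decide which secrets are
// safe to copy into GitHub. Assumes the line format the script prints:
//   <provider>: <status> env=<ENV_NAME> <required|optional>
// Only "ok" results qualify; any other status (http_401, http_403, missing,
// timeout, error) is reported and never written. planSecretWrites is hypothetical.
const RESULT_LINE = /^(\w+): (\S+) env=(\S+) (required|optional)$/;

function planSecretWrites(output) {
  const write = [];
  const skip = [];
  for (const line of output.split("\n")) {
    const match = RESULT_LINE.exec(line.trim());
    if (!match) continue; // ignore anything that is not a result line
    const [, provider, status, env, requirement] = match;
    if (status === "ok") write.push(env);
    else skip.push(`${provider} (${status}, ${requirement})`);
  }
  return { write, skip };
}

const preflightOutput = [
  "openai: ok env=OPENAI_API_KEY required",
  "anthropic: http_401 env=ANTHROPIC_API_KEY required",
  "fireworks: missing env=FIREWORKS_API_KEY required",
].join("\n");

console.log(planSecretWrites(preflightOutput));
```

Set each name in `write` from the same source that was validated. For example, `gh secret set <NAME> --repo openclaw/openclaw` reads the value from stdin when no `--body` is given. Then confirm with `gh secret list` rather than by printing values.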
--- a/.gitignore
+++ b/.gitignore
@@ -139,6 +139,8 @@ mantis/
 !.agents/skills/openclaw-refactor-docs/**
 !.agents/skills/openclaw-qa-testing/
 !.agents/skills/openclaw-qa-testing/**
+!.agents/skills/openclaw-release-ci/
+!.agents/skills/openclaw-release-ci/**
 !.agents/skills/openclaw-release-maintainer/
 !.agents/skills/openclaw-release-maintainer/**
 !.agents/skills/openclaw-secret-scanning-maintainer/