fix(oc-path): tighten path contracts

Gio Della-Libera
2026-05-08 21:00:14 -07:00
committed by Peter Steinberger
parent 295c30fd61
commit 8f26422840
8 changed files with 191 additions and 36 deletions

@@ -9,11 +9,19 @@ title: "Path"
# `openclaw path`
Plugin-provided shell access to the `oc://` addressing substrate one universal,
kind-dispatched path scheme for inspecting and surgically editing workspace
files (markdown, jsonc, jsonl). Self-hosters and editor extensions use
it to read or write a single leaf inside a workspace file without scripting
against the SDK directly.
Plugin-provided shell access to the `oc://` addressing substrate: one
kind-dispatched path scheme for inspecting and editing addressable workspace
files (markdown, jsonc, jsonl). Self-hosters, plugin authors, and editor
extensions use it to read, find, or update a narrow location without
hand-rolling per-file parsers.
The CLI mirrors the substrate's public verbs:
- `resolve` is concrete and single-match.
- `find` is the multi-match verb for wildcards, unions, predicates, and
positional expansion.
- `set` only accepts concrete paths or insertion markers; wildcard patterns are
rejected before writing.
`path` is provided by the bundled optional `oc-path` plugin. Enable it before
first use:
@@ -26,10 +34,10 @@ openclaw plugins enable oc-path
| Subcommand | Purpose |
| ----------------------- | ---------------------------------------------------------------------------- |
| `resolve <oc-path>` | Print the match at the path (or "not found"). |
| `find <pattern>` | Enumerate matches for a wildcard / predicate path. |
| `set <oc-path> <value>` | Write a leaf at the path. Supports `--dry-run`. |
| `validate <oc-path>` | Parse-only print structural breakdown (file / section / item / field). |
| `resolve <oc-path>` | Print the concrete match at the path (or "not found"). |
| `find <pattern>` | Enumerate matches for a wildcard / union / predicate path. |
| `set <oc-path> <value>` | Write a leaf or insertion target at a concrete path. Supports `--dry-run`. |
| `validate <oc-path>` | Parse-only; print structural breakdown (file / section / item / field). |
| `emit <file>` | Round-trip a file through `parseXxx` + `emitXxx` (byte-fidelity diagnostic). |
## Global flags
@@ -48,7 +56,7 @@ openclaw plugins enable oc-path
oc://FILE/SECTION/ITEM/FIELD?session=SCOPE
```
Slot rules `field` requires `item`, `item` requires `section`. Across all
Slot rules: `field` requires `item`, and `item` requires `section`. Across all
four slots:
- **Quoted segments** — `"a/b.c"` survives `/` and `.` separators.
@@ -65,12 +73,49 @@ four slots:
- **Ordinal** — `#N` for Nth match by document order.
- **Insertion markers** — `+`, `+key`, `+nnn` for keyed / indexed
insertion (use with `set`).
- **Session scope** — `?session=cron:daily` etc. Orthogonal to slot
nesting.
- **Session scope** — `?session=cron-daily` etc. Orthogonal to slot
nesting. Session values are raw, not percent-decoded; they may not contain
control characters or reserved query delimiters (`?`, `&`, `%`).
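The insertion markers listed above (`+`, `+key`, `+nnn`) can be distinguished with a few string checks. The following is a hypothetical sketch, not the oc-path parser: `InsertionSketch` and `classifyInsertionSketch` are illustrative names, and it assumes that an all-digit suffix always means an index while any other suffix means a key.

```typescript
// Hypothetical sketch: classify an insertion-marker segment.
// `+` = end insertion, `+nnn` = indexed insertion, `+key` = keyed insertion.
type InsertionSketch =
  | { kind: "end" }
  | { kind: "index"; index: number }
  | { kind: "key"; key: string };

function classifyInsertionSketch(segment: string): InsertionSketch | undefined {
  if (!segment.startsWith("+")) return undefined; // not an insertion marker
  const rest = segment.slice(1);
  if (rest.length === 0) return { kind: "end" };
  // Assumption: an all-digit suffix is an index; anything else is a key.
  if (/^\d+$/.test(rest)) return { kind: "index", index: Number(rest) };
  return { kind: "key", key: rest };
}
```

The text above does not say how a numeric suffix is treated when the target is an object; this sketch always reads it as an index.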
Reserved characters (`?`, `&`, `%`) outside quoted, predicate, or union
segments are rejected. Control characters (U+0000–U+001F, U+007F) are
rejected anywhere.
segments are rejected. Control characters (U+0000-U+001F, U+007F) are rejected
anywhere, including the `session` query value.
`formatOcPath(parseOcPath(path)) === path` is guaranteed for canonical paths.
Non-canonical query parameters are ignored except for the first non-empty
`session=` value.
## Addressing by file kind
| Kind | Addressing model |
| ---------- | -------------------------------------------------------------------------------- |
| Markdown | H2 sections by slug, bullet items by slug or `#N`, frontmatter via `[frontmatter]`. |
| JSONC/JSON | Object keys and array indexes; dots split nested sub-segments unless quoted. |
| JSONL | Top-level line addresses (`L1`, `L2`, `$last`), then JSONC-style descent inside the line. |
`resolve` returns a structured match: `root`, `node`, `leaf`, or
`insertion-point`, with a 1-based line number. Leaf values are surfaced as text
plus a `leafType` so plugin authors can render previews without depending on
the per-kind AST shape.
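A plugin-side preview only needs the match kind, the line, and (for leaves) the text and `leafType`. The type below is a hypothetical model of the structured match described above; the exact field names and the `leafType` values are assumptions, not the published oc-path types.

```typescript
// Hypothetical model of a structured `resolve` match: `root` / `node` /
// `leaf` / `insertion-point`, a 1-based line, and a `leafType` for leaves.
type LeafTypeSketch = "string" | "number" | "boolean" | "null";

type OcMatchSketch =
  | { kind: "root"; line: number }
  | { kind: "node"; line: number }
  | { kind: "leaf"; line: number; text: string; leafType: LeafTypeSketch }
  | { kind: "insertion-point"; line: number };

// Render a one-line preview without depending on any per-kind AST.
function previewSketch(match: OcMatchSketch): string {
  switch (match.kind) {
    case "leaf":
      return `L${match.line}: ${match.text} (${match.leafType})`;
    case "insertion-point":
      return `L${match.line}: <insert here>`;
    default:
      // Remaining kinds ("root" | "node") carry no value to show.
      return `L${match.line}: ${match.kind}`;
  }
}
```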
## Mutation contract
`set` writes one concrete target:
- Markdown frontmatter values and `- key: value` item fields are string leaves.
Markdown insertions append sections, frontmatter keys, or section items and
render a canonical markdown shape for the changed file.
- JSONC leaf writes coerce the string value to the existing leaf type
(`string`, finite `number`, `true`/`false`, or `null`) and go through the
`jsonc-parser` edit path, preserving comments and nearby formatting. JSONC
object and array insertions parse `<value>` as JSON.
- JSONL leaf writes coerce like JSONC inside a line. Whole-line replacement and
append parse `<value>` as JSON. Rendered JSONL preserves the file's dominant
LF/CRLF line-ending convention.
Use `--dry-run` before user-visible writes when the exact bytes matter. The
substrate preserves byte-identical output for parse/emit round-trips, but a
mutation can canonicalize the edited region or file depending on kind.
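The JSONC/JSONL leaf-coercion rule (`string`, finite `number`, `true`/`false`, `null`) can be sketched as a function from the existing leaf and the raw CLI string to the new leaf. `coerceLeafSketch` is a hypothetical helper; the real substrate's error messages and edge-case handling may differ.

```typescript
// Minimal sketch: coerce a raw string to the type of the existing leaf.
type JsonLeaf = string | number | boolean | null;

function coerceLeafSketch(existing: JsonLeaf, input: string): JsonLeaf {
  if (typeof existing === "string") return input; // strings pass through as-is
  if (typeof existing === "number") {
    const n = Number(input);
    // Only finite numbers are accepted; "", "NaN", and "Infinity" are rejected.
    if (input.trim() === "" || !Number.isFinite(n)) {
      throw new Error(`expected a finite number, got ${JSON.stringify(input)}`);
    }
    return n;
  }
  if (typeof existing === "boolean") {
    if (input === "true") return true;
    if (input === "false") return false;
    throw new Error(`expected true/false, got ${JSON.stringify(input)}`);
  }
  // Remaining case: the existing leaf is null.
  if (input === "null") return null;
  throw new Error(`expected null, got ${JSON.stringify(input)}`);
}
```

Under this rule, `set 'oc://gateway.jsonc/version' '2.0'` keeps `"2.0"` as a string if `version` is already a string leaf, and writes the number `2` if it is a number leaf.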
## Examples
@@ -94,6 +139,40 @@ openclaw path set 'oc://gateway.jsonc/version' '2.0'
openclaw path emit ./AGENTS.md
```
More grammar examples:
```bash
# Quote keys containing / or .
openclaw path resolve 'oc://config.jsonc/agents.defaults.models/"anthropic/claude-opus-4-7"/alias'
# Predicate search over JSONC children
openclaw path find 'oc://config.jsonc/plugins/[enabled=true]/id'
# Insert into a JSONC array
openclaw path set 'oc://config.jsonc/items/+1' '{"id":"new","enabled":true}' --dry-run
# Insert a JSONC object key
openclaw path set 'oc://config.jsonc/plugins/+github' '{"enabled":true}' --dry-run
# Append a JSONL event
openclaw path set 'oc://session.jsonl/+' '{"event":"checkpoint","ok":true}' --file ./logs/session.jsonl
# Resolve the last JSONL value line
openclaw path resolve 'oc://session.jsonl/$last/event' --file ./logs/session.jsonl
# Address markdown frontmatter
openclaw path resolve 'oc://AGENTS.md/[frontmatter]/name'
# Insert markdown frontmatter
openclaw path set 'oc://AGENTS.md/[frontmatter]/+description' 'Agent instructions' --dry-run
# Find markdown item fields
openclaw path find 'oc://SKILL.md/Tools/*/send_email'
# Validate a session-scoped path
openclaw path validate 'oc://AGENTS.md/Tools/$last/risk?session=cron-daily'
```
## Exit codes
| Code | Meaning |
@@ -110,7 +189,7 @@ auto-detection.
## Notes
- `set` writes raw bytes through the substrate's emit path, which applies the
- `set` writes bytes through the substrate's emit path, which applies the
redaction-sentinel guard automatically. A leaf carrying
`__OPENCLAW_REDACTED__` (verbatim or as a substring) is refused at write
time.

@@ -1,14 +1,11 @@
/**
* Cross-kind utilities. The substrate exposes per-kind verbs only;
* `inferKind` is a convention helper for callers who want to map
* filename → kind so they can pick the right `parseXxx` / `setXxx` /
* `resolveXxx` function.
* Cross-kind utilities. `inferKind` is a convention helper for callers
* who want to map filename to the parser they should use before calling
* the universal verbs (`resolveOcPath`, `findOcPaths`, `setOcPath`).
*
* Earlier drafts had `resolveOcPath` / `setOcPath` / `appendOcPath`
* universal dispatchers with tagged-union AST inputs. They were dropped
* — the kind tag bled through every consumer (lint runner, doctor
* fixers, tests) since those code paths still needed to know the kind
* to use the result. Per-kind verbs are honest about input/output.
* Encoding remains per-kind (`parseMd`, `parseJsonc`, `parseJsonl`),
* while addressing and mutation dispatch are universal once callers
* have an AST carrying its `kind` discriminator.
*
* @module @openclaw/oc-path/dispatch
*/

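The `inferKind` convention (filename to parser choice, then the universal verbs) can be illustrated with a small extension table. This is a hypothetical sketch: `inferKindSketch`, the `"md" | "jsonc" | "jsonl"` labels, and the extension list are assumptions, so check the real helper before relying on its cases.

```typescript
// Hypothetical filename-to-kind mapping in the spirit of `inferKind`.
type KindSketch = "md" | "jsonc" | "jsonl";

function inferKindSketch(filename: string): KindSketch | undefined {
  const lower = filename.toLowerCase();
  if (lower.endsWith(".jsonl")) return "jsonl";
  // Plain JSON is addressed with the JSONC model (see the file-kind table).
  if (lower.endsWith(".jsonc") || lower.endsWith(".json")) return "jsonc";
  if (lower.endsWith(".md")) return "md";
  return undefined; // unknown: the caller must pick a parser explicitly
}
```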
@@ -7,16 +7,17 @@
* addressing (resolve/set) is universal.
*
* **Public verbs**:
* - One `setOcPath(ast, path, value)` — universal, kind-dispatched
* - One `resolveOcPath(ast, path)` — universal, kind-dispatched
* - Per-kind `parseXxx` / `emitXxx` (parsing IS per-kind by nature)
* - One `resolveOcPath(ast, path)` - concrete, kind-dispatched
* - One `findOcPaths(ast, pattern)` - multi-match, kind-dispatched
* - One `setOcPath(ast, path, value)` - concrete mutation / insertion
* - Per-kind `parseXxx` / `emitXxx` (parsing is per-kind by nature)
*
* `setOcPath` accepts a string value; the substrate coerces based on
* AST shape at the path location. The OcPath syntax encodes the
* operation: plain path = leaf set, `+` suffix = insertion.
*
* Per-kind set/resolve helpers exist as internal implementation; they
* aren't on the public surface. Callers don't need to pick a kind
* aren't on the public surface. Callers don't need to pick a kind -
* the AST carries its `kind` discriminator and the universal verbs
* dispatch internally.
*

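The split between `findOcPaths` (patterns) and `resolveOcPath` / `setOcPath` (concrete paths) implies a guard that rejects pattern segments before a write. The sketch below is hypothetical: `isPatternSegmentSketch` and `assertConcreteSketch` are illustrative names, it works on already-split segments, and it models only the two pattern forms shown in the docs (`*` and `[key=value]` predicates), leaving unions and other positional expansions out.

```typescript
// Hypothetical guard: does a single segment look like a pattern?
function isPatternSegmentSketch(segment: string): boolean {
  if (segment.length >= 2 && segment.startsWith('"') && segment.endsWith('"')) {
    return false; // quoted segments are literal keys, e.g. "anthropic/claude-opus-4-7"
  }
  if (segment === "*") return true;
  // `[frontmatter]` is a concrete address; `[enabled=true]` is a predicate.
  return segment.startsWith("[") && segment.endsWith("]") && segment.includes("=");
}

// Refuse pattern segments before attempting a concrete resolve or set.
function assertConcreteSketch(segments: readonly string[]): void {
  const bad = segments.find(isPatternSegmentSketch);
  if (bad !== undefined) {
    throw new Error(`pattern segment ${bad} needs findOcPaths, not setOcPath`);
  }
}
```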
@@ -188,7 +188,12 @@ export function appendJsonlOcPath(ast: JsonlAst, value: JsoncValue): JsonlAst {
value,
raw: "",
};
const next: JsonlAst = { kind: "jsonl", raw: "", lines: [...ast.lines, newLine] };
const next: JsonlAst = {
kind: "jsonl",
raw: "",
lines: [...ast.lines, newLine],
...(ast.lineEnding !== undefined ? { lineEnding: ast.lineEnding } : {}),
};
const rendered = emitJsonl(next, { mode: "render" });
return { ...next, raw: rendered };
}

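Carrying `lineEnding` onto the appended AST is what lets rendering keep a CRLF file as CRLF. The sketch below shows one way to detect a dominant ending and join lines with it; `detectLineEndingSketch` and `renderLinesSketch` are hypothetical helpers, not oc-path exports, and resolving ties to LF is an assumption.

```typescript
// Minimal sketch of "preserve the file's dominant LF/CRLF convention".
type LineEnding = "\n" | "\r\n";

function detectLineEndingSketch(raw: string): LineEnding {
  const crlf = (raw.match(/\r\n/g) ?? []).length;
  const lf = (raw.match(/\n/g) ?? []).length - crlf; // bare LF only
  return crlf > lf ? "\r\n" : "\n"; // ties (including empty input) fall back to LF
}

// Join rendered lines with the detected ending. As in the output the CRLF
// test in this commit expects, no terminator follows the last line.
function renderLinesSketch(lines: readonly string[], ending: LineEnding): string {
  return lines.join(ending);
}
```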
@@ -3,7 +3,9 @@
*
* oc://{file}[/{section}[/{item}[/{field}]]][?session={id}]
*
* Round-trip contract: `formatOcPath(parseOcPath(s)) === s`.
* Canonical round-trip contract: `formatOcPath(parseOcPath(s)) === s`
* for canonical paths. Extra query parameters are ignored except for
* the first non-empty `session=` value.
*
* @module @openclaw/oc-path/oc-path
*/
@@ -49,7 +51,9 @@ function printable(s: string): string {
* Parsed `oc://` path. Components nest strictly: `item` implies
* `section`, `field` implies `item`. `field` directly under file
* addresses a frontmatter key; under item it addresses the value of a
* `- key: value` bullet.
* `- key: value` bullet. `session` is an opaque raw scope string; it is
* not percent-decoded and cannot contain control characters or reserved
* query delimiters (`?`, `&`, `%`).
*/
export interface OcPath {
readonly file: string;
@@ -102,6 +106,23 @@ function validateFileSlot(file: string, contextInput: string): void {
}
}
function validateSessionSlot(session: string, contextInput: string): void {
if (hasControlChar(session)) {
fail(
`Control character in oc:// session query: ${printable(contextInput)}`,
contextInput,
"OC_PATH_CONTROL_CHAR",
);
}
if (RESERVED_CHARS_RE.test(session)) {
fail(
`Reserved character (\`?\` / \`&\` / \`%\`) in oc:// session query: ${printable(contextInput)}`,
contextInput,
"OC_PATH_RESERVED_CHAR",
);
}
}
/** Parse an `oc://` path string into a structured `OcPath`. */
export function parseOcPath(input: string): OcPath {
if (typeof input !== "string") {
@@ -131,6 +152,13 @@ export function parseOcPath(input: string): OcPath {
if (!normalized.startsWith(OC_SCHEME)) {
fail(`Missing oc:// scheme: ${printable(input)}`, input, "OC_PATH_MISSING_SCHEME");
}
if (hasControlChar(normalized)) {
fail(
`Control character in oc:// path: ${printable(input)}`,
input,
"OC_PATH_CONTROL_CHAR",
);
}
const afterScheme = normalized.slice(OC_SCHEME.length);
// Top-level split skips quoted keys so `"foo?bar"` isn't broken.
@@ -178,7 +206,7 @@ export function parseOcPath(input: string): OcPath {
const file = isQuotedSeg(fileSeg) ? unquoteSeg(fileSeg) : fileSeg;
validateFileSlot(file, input);
const session = extractSession(queryPart);
const session = extractSession(queryPart, input);
return {
file,
...(segments[1] !== undefined ? { section: segments[1] } : {}),
@@ -244,7 +272,10 @@ export function formatOcPath(path: OcPath): string {
if (path.section !== undefined) {out += "/" + formatSlot(path.section, "section");}
if (path.item !== undefined) {out += "/" + formatSlot(path.item, "item");}
if (path.field !== undefined) {out += "/" + formatSlot(path.field, "field");}
if (path.session !== undefined) {out += "?session=" + path.session;}
if (path.session !== undefined) {
validateSessionSlot(path.session, path.file);
out += "?session=" + path.session;
}
if (out.length > MAX_PATH_LENGTH) {
fail(
@@ -464,14 +495,17 @@ export function repackPath(pattern: OcPath, subs: readonly string[]): OcPath {
};
}
function extractSession(queryPart: string): string | undefined {
function extractSession(queryPart: string, input: string): string | undefined {
if (queryPart.length === 0) {return undefined;}
for (const pair of queryPart.split("&")) {
const eqIndex = pair.indexOf("=");
if (eqIndex === -1) {continue;}
const key = pair.slice(0, eqIndex);
const value = pair.slice(eqIndex + 1);
if (key === "session" && value.length > 0) {return value;}
if (key === "session" && value.length > 0) {
validateSessionSlot(value, input);
return value;
}
}
return undefined;
}

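The session rules enforced by `extractSession` and `validateSessionSlot` can be condensed into a standalone function: reject control characters anywhere in the query, take the first non-empty `session=` value, keep it raw, and reject `?`, `&`, or `%` in it. This is a minimal sketch with hypothetical names; the real parser throws its own OcPath errors rather than a plain `Error` whose message is the error code.

```typescript
// Standalone sketch of the `?session=` rules (names are hypothetical).
const CONTROL_CHAR_RE = /[\u0000-\u001f\u007f]/;
const RESERVED_RE = /[?&%]/;

function extractSessionSketch(queryPart: string): string | undefined {
  if (CONTROL_CHAR_RE.test(queryPart)) {
    throw new Error("OC_PATH_CONTROL_CHAR"); // rejected even in ignored params
  }
  for (const pair of queryPart.split("&")) {
    const eq = pair.indexOf("=");
    if (eq === -1) continue; // bare flags are ignored
    const key = pair.slice(0, eq);
    const value = pair.slice(eq + 1);
    if (key !== "session" || value.length === 0) continue;
    // Values stay raw: "cron%2Fdaily" is rejected, never percent-decoded.
    if (RESERVED_RE.test(value)) throw new Error("OC_PATH_RESERVED_CHAR");
    return value;
  }
  return undefined;
}
```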
@@ -85,6 +85,15 @@ describe("appendJsonlOcPath — session checkpointing primitive", () => {
expect(out).toHaveLength(2);
expect(JSON.parse(out[1] ?? "")).toEqual({ b: 2 });
});
it("preserves CRLF line endings when appending", () => {
const { ast } = parseJsonl('{"a":1}\r\n');
const next = appendJsonlOcPath(ast, {
kind: "object",
entries: [{ key: "b", line: 0, value: { kind: "number", value: 2 } }],
});
expect(emitJsonl(next)).toBe('{"a":1}\r\n{"b":2}');
});
});
describe("setJsonlOcPath — $last line address", () => {

@@ -37,6 +37,27 @@ describe("parseOcPath", () => {
});
});
it("rejects reserved chars in session query values", () => {
expectOcPathError(
() => parseOcPath("oc://SOUL.md?session=cron%2Fdaily"),
"OC_PATH_RESERVED_CHAR",
);
});
it("rejects control chars in session query values", () => {
expectOcPathError(
() => parseOcPath("oc://SOUL.md?session=daily\x00cron"),
"OC_PATH_CONTROL_CHAR",
);
});
it("rejects control chars in ignored query values", () => {
expectOcPathError(
() => parseOcPath("oc://SOUL.md?ignored=\x00"),
"OC_PATH_CONTROL_CHAR",
);
});
it("rejects missing scheme", () => {
expectOcPathError(() => parseOcPath("SOUL.md"), "OC_PATH_MISSING_SCHEME");
});
@@ -88,6 +109,13 @@ describe("formatOcPath", () => {
expect(formatOcPath({ file: "SOUL.md", session: "cron" })).toBe("oc://SOUL.md?session=cron");
});
it("rejects reserved chars in formatted session values", () => {
expectOcPathError(
() => formatOcPath({ file: "SOUL.md", session: "cron&scope=daily" }),
"OC_PATH_RESERVED_CHAR",
);
});
it("rejects empty file", () => {
expectOcPathError(() => formatOcPath({ file: "" }), "OC_PATH_FILE_REQUIRED");
});

@@ -2,7 +2,9 @@
* Universal `setOcPath` / `resolveOcPath` / `detectInsertion`.
* Addressing is universal; encoding is per-kind. Callers pass any AST
* + path + value; the substrate dispatches on `ast.kind` and coerces
* the value based on the AST shape at the resolution point.
* the value based on the AST shape at the resolution point. Wildcard,
* union, and predicate expansion belong to `findOcPaths`; `resolveOcPath`
* and `setOcPath` require concrete paths.
*
* oc://FILE/section/item/field → leaf address
* oc://FILE/section/+ → end-insertion