From f7a07d300a9ceff151740bae45ba61df66804e8b Mon Sep 17 00:00:00 2001
From: Peter Steinberger
Date: Sun, 10 May 2026 14:56:40 +0100
Subject: [PATCH] docs(plugin-sdk): document consolidated workflow seams

---
 CHANGELOG.md                                  |  3 +
 .../.generated/plugin-sdk-api-baseline.sha256 |  4 +-
 docs/plugins/architecture-internals.md        | 29 ++++++++++
 docs/plugins/sdk-overview.md                  | 58 +++++++++++++++----
 docs/plugins/sdk-runtime.md                   | 33 +++++++++++
 docs/plugins/sdk-subpaths.md                  |  2 +-
 6 files changed, 114 insertions(+), 15 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 5e47b45269b..25f0a6ac417 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -37,6 +37,8 @@ Docs: https://docs.openclaw.ai
 - Agents/Codex: remove the configurable Codex dynamic-tools profile so Codex app-server always owns workspace, edit, patch, exec, process, and plan tools while OpenClaw integration tools remain available.
 - macOS app: update the Peekaboo bridge dependency to Peekaboo 3.0.0.
 - Dependencies: refresh workspace pins and move the WhatsApp plugin from `@whiskeysockets/baileys` to `baileys` while keeping the `7.0.0-rc10` runtime.
+- Plugin SDK: add bundled-plugin session actions, `sendSessionAttachment`, and Cron-backed `scheduleSessionTurn`/tag cleanup under the grouped session namespace. Replaces #75578/#75581/#75588 and part of #73384/#74483. Thanks @100yenadmin.
+- Plugin SDK/media-understanding: add `extractStructuredWithModel(...)` plus the optional provider-side `extractStructured(...)` seam so trusted plugins can run bounded image-first structured extraction with optional supplemental text context through provider-owned runtimes such as Codex.
 
 ### Fixes
 
@@ -331,6 +333,7 @@ Docs: https://docs.openclaw.ai
 - QA/Mantis: accept Blacksmith Testbox `tbx_...` lease ids from desktop smoke warmup, so provider overrides do not fail before inspect/run. Thanks @vincentkoc.
 - Plugins/SDK: add bounded `before_agent_finalize` retry instructions so workflow plugins can request one more model pass. Thanks @100yenadmin.
 - Plugin SDK: add plugin-owned `SessionEntry` slot projection and scoped trusted-policy session extension reads. (#75609; replaces part of #73384/#74483) Thanks @100yenadmin.
+- Plugin SDK/Gateway: add scoped `plugins.sessionAction` dispatch and plugin-attributed `emitAgentEvent` support so plugins can expose typed session actions and workflow events to trusted clients. (#75578; replaces part of #73384/#74483) Thanks @100yenadmin.
 - Plugins/SDK: expose host-derived tool target paths to `before_tool_call` and trusted policy hooks so workflow plugins can reason about known file targets without reparsing tool envelopes. (#75605) Thanks @100yenadmin.
 - Control UI/WebChat: show a persistent compact context usage indicator from fresh session token data before the high-pressure warning state, while keeping the existing compaction prompt threshold. Fixes #46398; refs #45048, #50071, and #73744. Thanks @walterwkchoy, @AxelrodAI, @Brissux, @vincentkoc, and @BunsDev.
 - Contributor PRs: require external pull requests to include after-fix real behavior proof from a real OpenClaw setup, with terminal screenshots, console output, redacted runtime logs, linked artifacts, and copied live output treated as valid evidence while unit tests, mocks, lint, typechecks, snapshots, and CI remain supplemental only.
diff --git a/docs/.generated/plugin-sdk-api-baseline.sha256 b/docs/.generated/plugin-sdk-api-baseline.sha256
index 06672938506..dbc60115afc 100644
--- a/docs/.generated/plugin-sdk-api-baseline.sha256
+++ b/docs/.generated/plugin-sdk-api-baseline.sha256
@@ -1,2 +1,2 @@
-c1d16721a00a26fc578878e508e9a4e2190fe2fadd80534873d0b47ef5e4b513 plugin-sdk-api-baseline.json
-cb008db0486c3ba433bfda844a1c2526bb9729afa21f865e0e86f3ad95ac7cfe plugin-sdk-api-baseline.jsonl
+19455aee06dd33e2679cfcd8075b10cce806069667097fd7e717aa641c262e51 plugin-sdk-api-baseline.json
+ea6e0b36ab14977bed8dcf64118e58a8e58a76f41860c32055a73bcd04612826 plugin-sdk-api-baseline.jsonl
diff --git a/docs/plugins/architecture-internals.md b/docs/plugins/architecture-internals.md
index 9856a33015f..45e7ecfcd20 100644
--- a/docs/plugins/architecture-internals.md
+++ b/docs/plugins/architecture-internals.md
@@ -499,6 +499,30 @@ const video = await api.runtime.mediaUnderstanding.describeVideoFile({
   filePath: "/tmp/inbound-video.mp4",
   cfg: api.config,
 });
+
+const extraction = await api.runtime.mediaUnderstanding.extractStructuredWithModel({
+  provider: "codex",
+  model: "gpt-5.5",
+  input: [
+    {
+      type: "image",
+      buffer: receiptImageBuffer,
+      fileName: "receipt.png",
+      mime: "image/png",
+    },
+    { type: "text", text: "Use the printed fields as the source of truth." },
+  ],
+  instructions: "Return entities and searchable tags.",
+  schemaName: "example.evidence",
+  jsonSchema: {
+    type: "object",
+    properties: {
+      entities: { type: "array", items: { type: "string" } },
+      tags: { type: "array", items: { type: "string" } },
+    },
+  },
+  cfg: api.config,
+});
 ```
 
 For audio transcription, plugins can use either the media-understanding runtime
@@ -517,6 +541,11 @@ Notes:
 
 - `api.runtime.mediaUnderstanding.*` is the preferred shared surface for
   image/audio/video understanding.
+- `extractStructuredWithModel(...)` is the plugin-facing seam for bounded
+  provider-owned image-first extraction. Include at least one image input;
+  text inputs are supplemental context.
+  Product plugins own their routes and schemas while OpenClaw owns the
+  provider/runtime boundary.
 - Uses core media-understanding audio configuration (`tools.media.audio`) and provider fallback order.
 - Returns `{ text: undefined }` when no transcription output is produced (for example skipped/unsupported input).
 - `api.runtime.stt.transcribeAudioFile(...)` remains as a compatibility alias.
diff --git a/docs/plugins/sdk-overview.md b/docs/plugins/sdk-overview.md
index 392bf179679..4d315d95056 100644
--- a/docs/plugins/sdk-overview.md
+++ b/docs/plugins/sdk-overview.md
@@ -140,18 +140,52 @@ generic contracts; Plan Mode can use them, but so can approval workflows,
 workspace policy gates, background monitors, setup wizards, and UI companion
 plugins.
 
-| Method | Contract it owns |
-| ------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------- |
-| `api.registerSessionExtension(...)` | Plugin-owned, JSON-compatible session state projected through Gateway sessions |
-| `api.enqueueNextTurnInjection(...)` | Durable exactly-once context injected into the next agent turn for one session |
-| `api.registerTrustedToolPolicy(...)` | Bundled/trusted pre-plugin tool policy that can block or rewrite tool params |
-| `api.registerToolMetadata(...)` | Tool catalog display metadata without changing the tool implementation |
-| `api.registerCommand(...)` | Scoped plugin commands; command results can set `continueAgent: true`; Discord native commands support `descriptionLocalizations` |
-| `api.registerControlUiDescriptor(...)` | Control UI contribution descriptors for session, tool, run, or settings surfaces |
-| `api.registerRuntimeLifecycle(...)` | Cleanup callbacks for plugin-owned runtime resources on reset/delete/reload paths |
-| `api.registerAgentEventSubscription(...)` | Sanitized event subscriptions for workflow state and monitors |
-| `api.setRunContext(...)` / `getRunContext(...)` / `clearRunContext(...)` | Per-run plugin scratch state cleared on terminal run lifecycle |
-| `api.registerSessionSchedulerJob(...)` | Plugin-owned session scheduler job records with deterministic cleanup |
+| Method | Contract it owns |
+| ------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------- |
+| `api.session.state.registerSessionExtension(...)` | Plugin-owned, JSON-compatible session state projected through Gateway sessions |
+| `api.session.workflow.enqueueNextTurnInjection(...)` | Durable exactly-once context injected into the next agent turn for one session |
+| `api.registerTrustedToolPolicy(...)` | Bundled/trusted pre-plugin tool policy that can block or rewrite tool params |
+| `api.registerToolMetadata(...)` | Tool catalog display metadata without changing the tool implementation |
+| `api.registerCommand(...)` | Scoped plugin commands; command results can set `continueAgent: true`; Discord native commands support `descriptionLocalizations` |
+| `api.session.controls.registerControlUiDescriptor(...)` | Control UI contribution descriptors for session, tool, run, or settings surfaces |
+| `api.lifecycle.registerRuntimeLifecycle(...)` | Cleanup callbacks for plugin-owned runtime resources on reset/delete/reload paths |
+| `api.agent.events.registerAgentEventSubscription(...)` | Sanitized event subscriptions for workflow state and monitors |
+| `api.runContext.setRunContext(...)` / `getRunContext(...)` / `clearRunContext(...)` | Per-run plugin scratch state cleared on terminal run lifecycle |
+| `api.session.workflow.registerSessionSchedulerJob(...)` | Cleanup metadata for plugin-owned scheduler jobs; does not schedule work or create task records |
+| `api.session.workflow.sendSessionAttachment(...)` | Bundled-only host-mediated file attachment delivery to the active direct-outbound session route |
+| `api.session.workflow.scheduleSessionTurn(...)` / `unscheduleSessionTurnsByTag(...)` | Bundled-only Cron-backed scheduled session turns plus tag-based cleanup |
+| `api.session.controls.registerSessionAction(...)` | Typed session actions clients can dispatch through the Gateway |
+
+Use the grouped namespaces for new plugin code:
+
+- `api.session.state.registerSessionExtension(...)`
+- `api.session.workflow.enqueueNextTurnInjection(...)`
+- `api.session.workflow.registerSessionSchedulerJob(...)`
+- `api.session.workflow.sendSessionAttachment(...)`
+- `api.session.workflow.scheduleSessionTurn(...)`
+- `api.session.workflow.unscheduleSessionTurnsByTag(...)`
+- `api.session.controls.registerSessionAction(...)`
+- `api.session.controls.registerControlUiDescriptor(...)`
+- `api.agent.events.registerAgentEventSubscription(...)`
+- `api.agent.events.emitAgentEvent(...)`
+- `api.runContext.setRunContext(...)` / `getRunContext(...)` / `clearRunContext(...)`
+- `api.lifecycle.registerRuntimeLifecycle(...)`
+
+The equivalent flat methods remain available as deprecated compatibility
+aliases for existing plugins. Do not add new plugin code that calls
+`api.registerSessionExtension`, `api.enqueueNextTurnInjection`,
+`api.registerControlUiDescriptor`, `api.registerRuntimeLifecycle`,
+`api.registerAgentEventSubscription`, `api.emitAgentEvent`,
+`api.setRunContext`, `api.getRunContext`, `api.clearRunContext`,
+`api.registerSessionSchedulerJob`, `api.registerSessionAction`,
+`api.sendSessionAttachment`, `api.scheduleSessionTurn`, or
+`api.unscheduleSessionTurnsByTag` directly.
+
+`scheduleSessionTurn(...)` is a session-scoped convenience over the Gateway
+Cron scheduler. Cron owns timing and creates the background task record when the
+turn runs; the Plugin SDK only constrains the target session, plugin-owned
+naming, and cleanup. Use `api.runtime.tasks.managedFlows` inside the scheduled
+turn when the work itself needs durable multi-step Task Flow state.
 
 The contracts intentionally split authority:
 
diff --git a/docs/plugins/sdk-runtime.md b/docs/plugins/sdk-runtime.md
index 104f18df93d..9f853248c06 100644
--- a/docs/plugins/sdk-runtime.md
+++ b/docs/plugins/sdk-runtime.md
@@ -217,6 +217,11 @@ Provider and channel execution paths must use the active runtime config snapshot
   Bind a Task Flow runtime to an existing OpenClaw session key or trusted tool
   context, then create and manage Task Flows without passing an owner on every call.
 
+  Task Flow tracks durable multi-step workflow state. It is not a scheduler:
+  use Cron or `api.session.workflow.scheduleSessionTurn(...)` for future
+  wakeups, then use `managedFlows` from the scheduled turn when that work
+  needs flow state, child tasks, waits, or cancellation.
+
   ```typescript
   const taskFlow = api.runtime.tasks.managedFlows.fromToolContext(ctx);
 
@@ -300,6 +305,34 @@ Provider and channel execution paths must use the active runtime config snapshot
     filePath: "/tmp/inbound-file.pdf",
     cfg: api.config,
   });
+
+  // Structured image extraction through a specific provider/model.
+  // Include at least one image; text inputs are supplemental context.
+  const evidence = await api.runtime.mediaUnderstanding.extractStructuredWithModel({
+    provider: "codex",
+    model: "gpt-5.5",
+    input: [
+      {
+        type: "image",
+        buffer: receiptImageBuffer,
+        fileName: "receipt.png",
+        mime: "image/png",
+      },
+      { type: "text", text: "Prefer the printed total over handwritten notes." },
+    ],
+    instructions: "Extract vendor, total, and searchable tags.",
+    schemaName: "receipt.evidence",
+    jsonSchema: {
+      type: "object",
+      properties: {
+        vendor: { type: "string" },
+        total: { type: "number" },
+        tags: { type: "array", items: { type: "string" } },
+      },
+      required: ["vendor", "total"],
+    },
+    cfg: api.config,
+  });
   ```
 
   Returns `{ text: undefined }` when no output is produced (e.g. skipped input).
diff --git a/docs/plugins/sdk-subpaths.md b/docs/plugins/sdk-subpaths.md
index 90b1093d134..8c3fca0322e 100644
--- a/docs/plugins/sdk-subpaths.md
+++ b/docs/plugins/sdk-subpaths.md
@@ -306,7 +306,7 @@ focused channel/runtime subpaths, `config-contracts`, `string-coerce-runtime`,
   | `plugin-sdk/media-mime` | Narrow MIME normalization, file-extension mapping, MIME detection, and media-kind helpers |
   | `plugin-sdk/media-store` | Narrow media store helpers such as `saveMediaBuffer` |
   | `plugin-sdk/media-generation-runtime` | Shared media-generation failover helpers, candidate selection, and missing-model messaging |
-  | `plugin-sdk/media-understanding` | Media understanding provider types plus provider-facing image/audio helper exports |
+  | `plugin-sdk/media-understanding` | Media understanding provider types plus provider-facing image/audio/structured-extraction helper exports |
   | `plugin-sdk/text-chunking` | Text and markdown chunking/render helpers, markdown table conversion, directive-tag stripping, and safe-text utilities |
   | `plugin-sdk/text-chunking` | Outbound text chunking helper |
   | `plugin-sdk/speech` | Speech provider types plus provider-facing directive, registry, validation, OpenAI-compatible TTS builder, and speech helper exports |
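
Reviewer note: the docs added in this patch say that `extractStructuredWithModel(...)` needs at least one image input and treats text inputs as supplemental context. A plugin could check that precondition before calling the runtime. The following is a minimal sketch, assuming the input shapes shown in the patch's examples; `ImageInput`, `TextInput`, `ExtractionInput`, and `assertHasImageInput` are hypothetical names for illustration, not SDK exports, and the real runtime may already enforce this rule itself.

```typescript
// Hypothetical input shapes modeled on the examples in this patch.
// The real Plugin SDK types may differ.
type ImageInput = {
  type: "image";
  buffer: Uint8Array;
  fileName: string;
  mime: string;
};
type TextInput = { type: "text"; text: string };
type ExtractionInput = ImageInput | TextInput;

// Hypothetical pre-flight check (not an SDK export): returns the image
// inputs, or throws when the list contains only supplemental text.
function assertHasImageInput(input: ExtractionInput[]): ImageInput[] {
  const images = input.filter(
    (item): item is ImageInput => item.type === "image",
  );
  if (images.length === 0) {
    throw new Error("structured extraction needs at least one image input");
  }
  return images;
}

// Placeholder bytes for illustration only; this is not a decodable PNG.
const receiptImageBuffer = new Uint8Array([0x89, 0x50, 0x4e, 0x47]);

const input: ExtractionInput[] = [
  {
    type: "image",
    buffer: receiptImageBuffer,
    fileName: "receipt.png",
    mime: "image/png",
  },
  { type: "text", text: "Prefer the printed total over handwritten notes." },
];

// Passes for the mixed input above; a text-only list would throw.
const images = assertHasImageInput(input);
```

Keeping the check in plugin code fails fast with a plugin-owned error message before any provider call is made; whether that duplicates validation inside the runtime is not stated in this patch.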