mirror of https://github.com/openclaw/openclaw.git synced 2026-04-12 09:41:11 +00:00

xieyongliang 2c57ec7b5f video_generate: add providerOptions, inputAudios, and imageRoles (#61987)

* video_generate: add providerOptions, inputAudios, and imageRoles

- VideoGenerationSourceAsset gains an optional `role` field (e.g.
  "first_frame", "last_frame"); core treats it as opaque and forwards it
  to the provider unchanged.

- VideoGenerationRequest gains `inputAudios` (reference audio assets,
  e.g. background music) and `providerOptions` (arbitrary
  provider-specific key/value pairs forwarded as-is).

- VideoGenerationProviderCapabilities gains `maxInputAudios`.

- video_generate tool schema adds:
  - `imageRoles` array (parallel to `images`, sets role per asset)
  - `audioRef` / `audioRefs` (single/multi reference audio inputs)
  - `providerOptions` (JSON object passed through to the provider)
  - `MAX_INPUT_IMAGES` bumped 5 → 9; `MAX_INPUT_AUDIOS` = 3

- Capability validation extended to gate on `maxInputAudios`.

- runtime.ts threads `inputAudios` and `providerOptions` through to
  `provider.generateVideo`.

- Docs and runtime tests updated.

Made-with: Cursor

* docs: fix BytePlus Seedance capability table — split 1.5 and 2.0 rows

1.5 Pro supports at most 2 input images (first_frame + last_frame);
2.0 supports up to 9 reference images, 3 videos, and 3 audios.
Provider notes section updated accordingly.

Made-with: Cursor

* docs: list all Seedance 1.0 models in video-generation provider table

- Default model updated to seedance-1-0-pro-250528 (was the T2V lite)
- Provider notes now enumerate all five 1.0 model IDs with T2V/I2V capability notes

Made-with: Cursor

* video_generate: address review feedback (P1/P2)

P1: Add "adaptive" to SUPPORTED_ASPECT_RATIOS so provider-specific ratio
passthrough (used by Seedance 1.5/2.0) is accepted instead of throwing.
Update error message to include "adaptive" in the allowed list.

P1: Fix audio input capability default — when a provider does not declare
maxInputAudios, default to 0 (no audio support) instead of MAX_INPUT_AUDIOS.
Providers must explicitly opt in via maxInputAudios to accept audio inputs.
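
A minimal sketch of the opt-in default described in the P1 note above. The helper names (`resolveMaxInputAudios`, `checkAudioCount`) and the trimmed capability shape are hypothetical and only illustrate the rule; the field name `maxInputAudios` is the one taken from this change.

```typescript
// Hypothetical, simplified capability shape; only `maxInputAudios` is taken
// from the commit message.
interface ProviderCapabilitiesSketch {
  maxInputAudios?: number;
}

// A provider that does not declare maxInputAudios gets 0 (no audio support)
// instead of inheriting a global ceiling such as MAX_INPUT_AUDIOS.
function resolveMaxInputAudios(caps: ProviderCapabilitiesSketch | undefined): number {
  return caps?.maxInputAudios ?? 0;
}

// Returns an error message when the request carries more reference audio
// inputs than the provider opted in to, otherwise undefined.
function checkAudioCount(
  caps: ProviderCapabilitiesSketch | undefined,
  audioCount: number,
): string | undefined {
  const max = resolveMaxInputAudios(caps);
  return audioCount > max
    ? `provider accepts at most ${max} reference audio input(s), got ${audioCount}`
    : undefined;
}
```

With this shape, `checkAudioCount({}, 1)` reports a problem, while `checkAudioCount({ maxInputAudios: 3 }, 3)` does not.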

P2: Remove unnecessary type cast in imageRoles assignment; VideoGenerationSourceAsset
already declares role?: string so a non-null assertion suffices.

P2: Add videoRoles and audioRoles tool parameters, parallel to imageRoles,
so callers can assign semantic role hints to reference video and audio assets
(e.g. "reference_video", "reference_audio" for Seedance 2.0).

Made-with: Cursor

* video_generate: fix check-docs formatting and snake_case param reading

Made-with: Cursor

* video_generate: clarify *Roles are parallel to combined input list (P2)

Made-with: Cursor

* video_generate: add missing duration import; fix corrupted docs section

Made-with: Cursor

* video_generate: pass mode inputs to duration resolver; note plugin requirement (P2)

Made-with: Cursor

* plugin-sdk: sync new video-gen fields — role, inputAudios, providerOptions, maxInputAudios

Add fields introduced by core in the PR1 batch to the public plugin-sdk
mirror so TypeScript provider plugins can declare and consume them
without type assertions:
- VideoGenerationSourceAsset.role?: string
- VideoGenerationRequest.inputAudios and .providerOptions
- VideoGenerationModeCapabilities.maxInputAudios

The AssertAssignable bidirectional checks still pass because all new
fields are optional; this change makes the SDK surface complete.

Made-with: Cursor

* video-gen runtime: skip failover candidates lacking audio capability

Made-with: Cursor

* video-gen: fall back to flat capabilities.maxInputAudios in failover and tool validation

Made-with: Cursor

* video-gen: defer audio-count check to runtime, enabling fallback for audio-capable candidates

Made-with: Cursor

* video-gen: defer maxDurationSeconds check to runtime, enabling fallback for higher-cap candidates

Made-with: Cursor

* video-gen: add VideoGenerationAssetRole union and typed providerOptions capability

Introduces a canonical VideoGenerationAssetRole union (first_frame,
last_frame, reference_image, reference_video, reference_audio) for the
source-asset role hint, and a VideoGenerationProviderOptionType tag
('number' | 'boolean' | 'string') plus a new capabilities.providerOptions
schema that providers use to declare which opaque providerOptions keys
they accept and with what primitive type.

Types are additive and backwards compatible. The role field accepts both
canonical union values and arbitrary provider-specific strings via a
`VideoGenerationAssetRole | (string & {})` union, so autocomplete works
for the common case without blocking provider-specific extensions.
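
The union trick described above can be sketched as follows. The five canonical literals come from this commit; `AssetRoleHint`, `SourceAssetSketch`, `isCanonicalRole`, and the `"style_reference"` value are hypothetical names used only for illustration.

```typescript
// Canonical role values listed in the commit message.
type VideoGenerationAssetRole =
  | "first_frame"
  | "last_frame"
  | "reference_image"
  | "reference_video"
  | "reference_audio";

// `string & {}` stops the union from collapsing to plain `string`, so editors
// still suggest the canonical literals while any other string type-checks.
type AssetRoleHint = VideoGenerationAssetRole | (string & {});

// Hypothetical, trimmed-down asset shape for illustration only.
interface SourceAssetSketch {
  url: string;
  role?: AssetRoleHint;
}

const canonical: SourceAssetSketch = { url: "https://example.com/a.png", role: "first_frame" };
// "style_reference" is a made-up provider-specific role, not a real one.
const custom: SourceAssetSketch = { url: "https://example.com/b.png", role: "style_reference" };

const CANONICAL_ROLES: readonly string[] = [
  "first_frame",
  "last_frame",
  "reference_image",
  "reference_video",
  "reference_audio",
];

// Hypothetical runtime helper to tell canonical roles from provider extensions.
function isCanonicalRole(role: string): role is VideoGenerationAssetRole {
  return CANONICAL_ROLES.includes(role);
}
```

Both assignments compile; only the first passes `isCanonicalRole`.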

Runtime enforcement of providerOptions (skip-in-fallback, unknown key
and type mismatch) lands in a follow-up commit.

Co-authored-by: yongliang.xie <yongliang.xie@bytedance.com>

* video-gen: enforce typed providerOptions schema via skip-in-fallback

Adds `validateProviderOptionsAgainstDeclaration` in the video-generation
runtime and wires it into the `generateVideo` candidate loop alongside
the existing audio-count and duration-cap skip guards.

Behavior:
  - Candidates with no declared `capabilities.providerOptions` are
    skipped, with a clear skip reason, whenever the request carries a
    non-empty providerOptions payload, so a provider that would ignore
    `{seed: 42}` and succeed without honoring the caller's intent is
    never reached.
  - Candidates that declare a schema reject unknown keys with the list
    of accepted keys in the error.
  - Candidates that declare a schema reject type mismatches (expected
    number/boolean/string) with the declared type in the error.
  - All skip reasons push into `attempts` so the aggregated failure
    message at the end of the fallback chain explains exactly why each
    candidate was rejected.

Also hardens the tool boundary: `providerOptions` that is not a plain
JSON object (including bogus arrays like `["seed", 42]`) now throws a
`ToolInputError` up front instead of being cast to `Record` and
forwarded with numeric-string keys.
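
A minimal sketch of the tool-boundary check described above. `readProviderOptions` is a hypothetical helper name, and the local `ToolInputError` class is a stand-in for the repository's own error type, which may differ.

```typescript
// Local stand-in for the ToolInputError mentioned above.
class ToolInputError extends Error {}

// Accept only plain JSON objects. Arrays such as ["seed", 42] are rejected
// explicitly, because `typeof [] === "object"` and a cast to Record would
// otherwise forward them with numeric-string keys ("0", "1").
function readProviderOptions(raw: unknown): Record<string, unknown> | undefined {
  if (raw === undefined) return undefined;
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new ToolInputError("providerOptions must be a JSON object");
  }
  return raw as Record<string, unknown>;
}
```

Under these assumptions, `readProviderOptions({ seed: 42 })` returns the object unchanged, while an array, a string, or `null` throws before any provider is contacted.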

Consistent with the audio/duration skip-in-fallback pattern introduced
by yongliang.xie in earlier commits on this branch.

Co-authored-by: yongliang.xie <yongliang.xie@bytedance.com>

* video-gen: harden *Roles parity + document canonical role values

Replaces the inline `parseRolesArg` lambda with a dedicated
`parseRoleArray` helper that throws a ToolInputError when the caller
supplies more roles than assets. Off-by-one alignment mistakes in
`imageRoles` / `videoRoles` / `audioRoles` now fail loudly at the tool
boundary instead of silently dropping trailing roles.

Also tightens the schema descriptions to document the canonical
VideoGenerationAssetRole values (first_frame, last_frame, reference_*)
and the skip-in-fallback contract on providerOptions, and rejects
non-array inputs to any `*Roles` field early rather than coercing them
to an empty list.
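
The parity guard might look roughly like this. The helper name `parseRoleArray` is from this commit, but its signature, the error messages, the per-element string check, and the local `ToolInputError` stand-in are assumptions.

```typescript
// Local stand-in for the repository's ToolInputError.
class ToolInputError extends Error {}

// Parse a `*Roles` argument that runs parallel to a list of assets.
function parseRoleArray(raw: unknown, paramName: string, assetCount: number): string[] {
  if (raw === undefined) return [];
  // Non-array input fails early instead of being coerced to an empty list.
  if (!Array.isArray(raw)) {
    throw new ToolInputError(`${paramName} must be an array of strings`);
  }
  // More roles than assets indicates an alignment mistake: fail loudly.
  if (raw.length > assetCount) {
    throw new ToolInputError(
      `${paramName} has ${raw.length} entries but only ${assetCount} asset(s) were supplied`,
    );
  }
  // Assumption: every entry must be a string.
  return raw.map((value, index) => {
    if (typeof value !== "string") {
      throw new ToolInputError(`${paramName}[${index}] must be a string`);
    }
    return value;
  });
}

// Attach roles positionally; assets without a matching role are left untouched.
function attachRoles<T extends { role?: string }>(assets: T[], roles: string[]): T[] {
  return assets.map((asset, i) => (roles[i] ? { ...asset, role: roles[i] } : asset));
}
```

Fewer roles than assets is allowed in this sketch, so trailing assets simply carry no role hint.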

Co-authored-by: yongliang.xie <yongliang.xie@bytedance.com>

* video-gen: surface dropped aspectRatio sentinels in ignoredOverrides

"adaptive" and other provider-specific sentinel aspect ratios are
unparseable as numeric ratios, so when the active provider does not
declare the sentinel in caps.aspectRatios, `resolveClosestAspectRatio`
returns undefined and the previous code silently nulled out
`aspectRatio` without surfacing a warning.

Push the dropped value into `ignoredOverrides` so the tool result
warning path ("Ignored unsupported overrides for …") picks it up, and
the caller gets visible feedback that the requested aspect ratio was
dropped instead of a silent no-op. Also corrects the tool-side comment
on SUPPORTED_ASPECT_RATIOS to describe actual behavior.

Co-authored-by: yongliang.xie <yongliang.xie@bytedance.com>

* video-gen: surface declared providerOptions + maxInputAudios in action=list

`video_generate action=list` now includes the declared providerOptions
schema (key:type) per provider, so agents can discover which opaque
keys each provider accepts without trial and error. Both mode-level and
flat-provider providerOptions declarations are merged, matching the
runtime lookup order in `generateVideo`.

Also surfaces `maxInputAudios` alongside the other max-input counts for
completeness — previously the list output did not expose the audio cap
at all, even though the tool validates against it.

Co-authored-by: yongliang.xie <yongliang.xie@bytedance.com>

* video-gen: warn once per request when runtime skips a fallback candidate

The skip-in-fallback guards (audio cap, duration cap, providerOptions)
all logged at debug level, which meant operators had no visible signal
when the primary provider was silently passed over in favor of a
fallback. Add a first-skip log.warn in the runtime loop so the reason
for the first rejection is surfaced once per request, and leave the
rest of the skip events at debug to avoid flooding on long chains.

Co-authored-by: yongliang.xie <yongliang.xie@bytedance.com>

* video-gen: cover new tool-level behavior with regression tests

Adds regression tests for:
  - providerOptions shape rejection (arrays, strings)
  - providerOptions happy-path forwarding to runtime
  - imageRoles length-parity guard
  - *Roles non-array rejection
  - positional role attachment to loaded reference images
  - audio data: URL templated rejection branch
  - aspectRatio='adaptive' acceptance and forwarding
  - unsupported aspectRatio rejection (mentions 'adaptive' in the error)

All eight new cases run in the existing video-generate-tool suite and
use the same provider-mock pattern already established in the file.

Co-authored-by: yongliang.xie <yongliang.xie@bytedance.com>

* video-gen: cover runtime providerOptions skip-in-fallback branches

Adds runtime regression tests for the new typed-providerOptions guard:
  - candidates without a declared providerOptions schema are skipped
    when any providerOptions is supplied (prevents silent drop)
  - candidates that declare a schema skip on unknown keys with the
    accepted-key list surfaced in the error
  - candidates that declare a schema skip on type mismatches with the
    declared type surfaced in the error
  - end-to-end fallback: openai (no providerOptions) is skipped and
    byteplus (declared schema) accepts the same request, with an
    attempt entry recording the first skip reason

Also updates the existing 'forwards providerOptions to the provider
unchanged' case so the destination provider declares the matching
typed schema, and wires a `warn` stub into the hoisted logger mock
so the new first-skip log.warn call path does not blow up.

Co-authored-by: yongliang.xie <yongliang.xie@bytedance.com>

* changelog: note video_generate providerOptions / inputAudios / role hints

Adds an Unreleased Changes entry describing the user-visible surface
expansion for video_generate: typed providerOptions capability,
inputAudios reference audio, per-asset role hints via the canonical
VideoGenerationAssetRole union, the 'adaptive' aspect-ratio sentinel,
maxInputAudios capability, and the relaxed 9-image cap.

Credits the original PR author.

Co-authored-by: yongliang.xie <yongliang.xie@bytedance.com>

* byteplus: declare providerOptions schema (seed, draft, camerafixed) and forward to API

Made-with: Cursor

* byteplus: fix camera_fixed body field (API uses underscore, not camerafixed)

Made-with: Cursor

* fix(byteplus): normalize resolution to lowercase before API call

The Seedance API rejects resolution values with uppercase letters —
"480P", "720P" etc return InvalidParameter, while "480p", "720p"
are accepted. This was breaking the video generation live test
(resolveLiveVideoResolution returns "480P").

Normalize req.resolution to lowercase at the provider layer before
setting body.resolution, so any caller-supplied casing is corrected
without requiring changes to the VideoGenerationResolution type or
live-test helpers.
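
A minimal sketch of the provider-layer normalization. `buildBodySketch`, the request shape, and the `prompt` field are hypothetical; only `body.resolution` and the lowercase rule come from the description above.

```typescript
// Hypothetical, trimmed-down request shape for illustration.
interface RequestSketch {
  prompt: string;
  resolution?: string;
}

// Lowercase the resolution right before it goes into the request body, so
// "480P" and "480p" both reach the API as "480p".
function buildBodySketch(req: RequestSketch): Record<string, unknown> {
  const body: Record<string, unknown> = { prompt: req.prompt };
  if (req.resolution) {
    body.resolution = req.resolution.toLowerCase();
  }
  return body;
}
```

Normalizing at this layer leaves the caller-facing resolution type unchanged.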

Verified via direct API call:
  body.resolution = "480P" → HTTP 400 InvalidParameter
  body.resolution = "480p" → task created successfully
  body.resolution = "720p" → task created successfully (t2v, i2v, 1.5-pro)
  body.resolution = "1080p" → task created successfully

Made-with: Cursor

* video-gen/byteplus: auto-select i2v model when input images provided with t2v model

Seedance 1.0 uses separate model IDs for T2V (seedance-1-0-lite-t2v-250428)
and I2V (seedance-1-0-lite-i2v-250428). When the caller requests a T2V model
but also provides inputImages, the API rejects with task_type i2v not supported
on t2v model.

Fix: when inputImages are present and the requested model contains "-t2v-",
auto-substitute "-i2v-" so the API receives the correct model. Seedance 1.5 Pro
uses a single model ID for both modes and is unaffected by this substitution.
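
The substitution rule can be sketched as below. `resolveSeedanceModel` is a hypothetical helper name; the `-t2v-` / `-i2v-` markers and the model IDs are the ones quoted in this commit.

```typescript
// When input images are present and the requested model is a T2V variant,
// swap in the matching I2V model ID. Models without "-t2v-" in their ID
// are returned unchanged.
function resolveSeedanceModel(model: string, inputImageCount: number): string {
  if (inputImageCount > 0 && model.includes("-t2v-")) {
    return model.replace("-t2v-", "-i2v-");
  }
  return model;
}
```

For example, `resolveSeedanceModel("seedance-1-0-lite-t2v-250428", 1)` yields `"seedance-1-0-lite-i2v-250428"`, and the same call with `0` images returns the T2V ID untouched.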

Verified via live test: both mode=generate and mode=imageToVideo pass for
byteplus/seedance-1-0-lite-t2v-250428 with no failures.

Co-authored-by: odysseus0 <odysseus0@example.com>
Made-with: Cursor

* video-gen: fix duration rounding + align BytePlus (1.0) docs (P2)

Made-with: Cursor

* video-gen: relax providerOptions gate for undeclared-schema providers (P1)

Distinguish undefined (not declared = backward-compat pass-through) from
{} (explicitly declared empty = no options accepted) in
validateProviderOptionsAgainstDeclaration. Providers without a declared
schema receive providerOptions as-is; providers with an explicit empty
schema still skip. Typed schemas continue to validate key names and types.
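
A sketch of the three cases after this relaxation. The function name carries a `Sketch` suffix because the real `validateProviderOptionsAgainstDeclaration` signature and messages are not shown here; the return convention (a skip reason string, or `undefined` to proceed) is an assumption.

```typescript
type ProviderOptionType = "number" | "boolean" | "string";
type ProviderOptionsDeclaration = Record<string, ProviderOptionType>;

// Returns a skip reason, or undefined when the candidate may receive the options.
function validateProviderOptionsSketch(
  declared: ProviderOptionsDeclaration | undefined,
  options: Record<string, unknown> | undefined,
): string | undefined {
  const opts = options ?? {};
  const keys = Object.keys(opts);
  if (keys.length === 0) return undefined; // nothing to validate
  if (declared === undefined) return undefined; // not declared: pass-through
  const accepted = Object.keys(declared);
  if (accepted.length === 0) {
    return "provider declares an empty providerOptions schema and accepts no options";
  }
  for (const key of keys) {
    if (!Object.prototype.hasOwnProperty.call(declared, key)) {
      return `unknown providerOptions key "${key}" (accepted: ${accepted.join(", ")})`;
    }
    const expected = declared[key];
    const actual = typeof opts[key];
    if (actual !== expected) {
      return `providerOptions.${key} must be of type ${expected}, got ${actual}`;
    }
  }
  return undefined;
}
```

Under these assumptions, `{ seed: 42 }` passes for an undeclared provider and for one declaring `{ seed: "number" }`, but is skipped by a provider declaring `{}`.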

Also: restore camera_fixed (underscore) in BytePlus provider schema and
body key (regression from earlier rebase), remove duplicate local
readBooleanToolParam definition now imported from media-tool-shared,
update tests and docs accordingly.

Made-with: Cursor

* video_generate: add landing follow-up coverage

* video_generate: finalize plugin-sdk baseline (#61987) (thanks @xieyongliang)

---------

Co-authored-by: yongliang.xie <yongliang.xie@bytedance.com>
Co-authored-by: George Zhang <georgezhangtj97@gmail.com>
Co-authored-by: odysseus0 <odysseus0@example.com>

2026-04-11 02:23:14 -07:00

.agents

test: harden macOS npm update smoke fallback

2026-04-09 04:07:45 +01:00

.github

fix(release): write npm auth for latest promotion

2026-04-11 04:29:25 +01:00

.pi

ui: fix sessions table collapse on narrow widths (#12175)

2026-03-09 23:14:07 -05:00

.vscode

…

apps

fix(talk): fix ensure permissions on first execution of Talk Mode in MacOS (#62459)

2026-04-11 18:08:45 +10:00

assets

!refactor(browser): remove Chrome extension path and add MCP doctor migration (#47893)

2026-03-15 23:56:08 -07:00

docs

video_generate: add providerOptions, inputAudios, and imageRoles (#61987)

2026-04-11 02:23:14 -07:00

extensions

video_generate: add providerOptions, inputAudios, and imageRoles (#61987)

2026-04-11 02:23:14 -07:00

git-hooks

fix(hooks): skip full gate for docs-only commits

2026-04-01 20:02:54 +09:00

packages

refactor: simplify cli conversions

2026-04-11 01:27:48 +01:00

patches

…

test: tighten qa live scenarios

2026-04-11 00:58:40 +01:00

scripts

test(install): pin smoke docker platform

2026-04-11 03:31:47 +01:00

skills

style: apply oxfmt cleanup

2026-04-10 23:09:37 +01:00

src

video_generate: add providerOptions, inputAudios, and imageRoles (#61987)

2026-04-11 02:23:14 -07:00

Swabble

refactor(voicewake): mark transcript parameter unused

2026-03-14 12:48:12 +11:00

test

video_generate: add providerOptions, inputAudios, and imageRoles (#61987)

2026-04-11 02:23:14 -07:00

test-fixtures

test: sync gateway and config expectations

2026-04-07 08:05:32 +01:00

fix: restore memory wiki and dreaming checks

2026-04-11 06:15:21 +01:00

vendor/a2ui

build: refresh deps and vitest cache lanes

2026-03-27 02:26:07 +00:00

.codex

fix(browser): block SSRF redirect bypass via real-time route interception (#58771)

2026-04-02 09:07:57 -07:00

.detect-secrets.cfg

fix: stabilize launchd paths and appcast secret scan

2026-03-09 08:37:37 +00:00

.dockerignore

Config: separate core/plugin baseline entries (#60162)

2026-04-03 18:26:23 +09:00

.env.example

…

.gitattributes

…

.gitignore

chore: polish qa lab follow-ups

2026-04-05 23:21:56 +01:00

.jscpd.json

build: ignore generated docs and changelogs in jscpd

2026-03-13 20:19:39 +00:00

.mailmap

…

.markdownlint-cli2.jsonc

docs: allow sponsor table markup in markdownlint

2026-03-31 23:18:55 +01:00

.npmignore

refactor(plugin-sdk): untangle extension test seams

2026-03-29 23:43:53 +01:00

.npmrc

Build: use hoisted pnpm linker

2026-03-18 10:14:53 -07:00

.oxfmtrc.jsonc

chore: update dependencies and oxc tooling

2026-04-10 19:28:42 +01:00

.oxlintrc.json

lint: enable small oxlint rules

2026-04-11 02:15:21 +01:00

.pre-commit-config.yaml

fix: stabilize launchd paths and appcast secret scan

2026-03-09 08:37:37 +00:00

.prettierignore

fix(ci): restore config baseline release-check output (#47629)

2026-03-15 14:14:30 -07:00

.secrets.baseline

feat(gateway): make health monitor stale threshold and max restarts configurable (openclaw#42107)

2026-03-14 21:21:56 -05:00

.shellcheckrc

…

.swiftformat

fix(swiftformat): sync GatewayModels exclusions with OpenClawProtocol (#41242)

2026-03-09 20:42:54 -05:00

.swiftlint.yml

fix(swiftformat): sync GatewayModels exclusions with OpenClawProtocol (#41242)

2026-03-09 20:42:54 -05:00

AGENTS.md

docs(agents): add tsgo triage guidance

2026-04-10 08:40:54 +01:00

appcast.xml

chore(release): update macOS appcast for v2026.4.10

2026-04-11 04:32:55 +01:00

CHANGELOG.md

video_generate: add providerOptions, inputAudios, and imageRoles (#61987)

2026-04-11 02:23:14 -07:00

CLAUDE.md

…

CONTRIBUTING.md

docs(boundary): codify shared test helper plugin seams

2026-04-10 08:27:35 +01:00

docker-compose.yml

docs(install): update container setup paths

2026-03-19 13:40:26 -07:00

docker-setup.sh

refactor(scripts): move container setup entrypoints

2026-03-19 13:40:26 -07:00

Dockerfile

fix: make qa lab docker boot resilient

2026-04-07 09:04:18 +01:00

Dockerfile.sandbox

docker: add apt-get upgrade to all Dockerfiles (#45384)

2026-03-13 16:23:02 -07:00

Dockerfile.sandbox-browser

fix(sandbox): add CJK fonts to browser image (#56905)

2026-03-29 18:21:51 +09:00

Dockerfile.sandbox-common

docker: add apt-get upgrade to all Dockerfiles (#45384)

2026-03-13 16:23:02 -07:00

docs.acp.md

acp: enrich streaming updates for ide clients (#41442)

2026-03-09 22:26:46 +01:00

dream-diary-preview-v2.html

style(preview): format dream diary preview files

2026-04-06 16:16:10 +01:00

dream-diary-preview-v3.html

style(preview): format dream diary preview files

2026-04-06 16:16:10 +01:00

fix2.py

fix(heartbeat): preserve HEARTBEAT.md directives in task-mode prompt

2026-04-04 15:09:48 +01:00

fly.private.toml

…

fly.toml

…

INCIDENT_RESPONSE.md

Update INCIDENT_RESPONSE.md

2026-04-11 01:43:58 +01:00

knip.config.ts

chore(deadcode): fix knip scan config

2026-04-06 16:13:26 +01:00

LICENSE

…

Makefile

fix(tasks): restore registry build compatibility

2026-03-31 19:59:21 +01:00

openclaw.mjs

fix(cli): precompute bare root help startup path

2026-03-24 12:24:52 -07:00

openclaw.podman.env

…

package.json

chore(release): bump version to 2026.4.11

2026-04-11 04:51:17 +01:00

pnpm-lock.yaml

chore: prepare 2026.4.10 release

2026-04-11 03:22:18 +01:00

pnpm-workspace.yaml

build(deps): update workspace dependencies

2026-04-10 19:17:39 +01:00

pyproject.toml

…

README.md

docs: make README model guidance provider-agnostic

2026-04-07 09:17:05 +01:00

render.yaml

docs(render): fix port env var, remove nonexistent setup wizard

2026-03-22 22:10:28 +00:00

SECURITY.md

docs(security): clarify localhost shared-auth trust model

2026-04-05 23:12:52 +01:00

setup-podman.sh

refactor(scripts): move container setup entrypoints

2026-03-19 13:40:26 -07:00

tsconfig.json

perf: trim tsgo input graph

2026-04-10 15:56:56 +01:00

tsconfig.oxlint.json

chore: update dependencies and oxc tooling

2026-04-10 19:28:42 +01:00

tsconfig.plugin-sdk.dts.json

build: narrow plugin SDK declaration build

2026-04-08 20:00:51 +01:00

tsdown.config.ts

build: stage nostr runtime dependencies

2026-04-08 20:05:55 +01:00

VISION.md

…

vitest.config.ts

test: move Vitest configs under test

2026-04-10 13:44:51 +01:00

zizmor.yml

…

README.md

🦞 OpenClaw — Personal AI Assistant

EXFOLIATE! EXFOLIATE!

OpenClaw is a personal AI assistant you run on your own devices. It answers you on the channels you already use (WhatsApp, Telegram, Slack, Discord, Google Chat, Signal, iMessage, BlueBubbles, IRC, Microsoft Teams, Matrix, Feishu, LINE, Mattermost, Nextcloud Talk, Nostr, Synology Chat, Tlon, Twitch, Zalo, Zalo Personal, WeChat, WebChat). It can speak and listen on macOS/iOS/Android, and can render a live Canvas you control. The Gateway is just the control plane — the product is the assistant.

If you want a personal, single-user assistant that feels local, fast, and always-on, this is it.

Website · Docs · Vision · DeepWiki · Getting Started · Updating · Showcase · FAQ · Onboarding · Nix · Docker · Discord

Preferred setup: run openclaw onboard in your terminal. OpenClaw Onboard guides you step by step through setting up the gateway, workspace, channels, and skills. It is the recommended CLI setup path and works on macOS, Linux, and Windows (via WSL2; strongly recommended). Works with npm, pnpm, or bun. New install? Start here: Getting started

Models (selection + auth)

Models config + CLI: Models
Auth profile rotation (OAuth vs API keys) + fallbacks: Model failover

Install (recommended)

Runtime: Node 24 (recommended) or Node 22.16+.

npm install -g openclaw@latest
# or: pnpm add -g openclaw@latest

openclaw onboard --install-daemon

OpenClaw Onboard installs the Gateway daemon (launchd/systemd user service) so it stays running.

Quick start (TL;DR)

Runtime: Node 24 (recommended) or Node 22.16+.

Full beginner guide (auth, pairing, channels): Getting started

openclaw onboard --install-daemon

openclaw gateway --port 18789 --verbose

# Send a message
openclaw message send --to +1234567890 --message "Hello from OpenClaw"

# Talk to the assistant (optionally deliver back to any connected channel: WhatsApp/Telegram/Slack/Discord/Google Chat/Signal/iMessage/BlueBubbles/IRC/Microsoft Teams/Matrix/Feishu/LINE/Mattermost/Nextcloud Talk/Nostr/Synology Chat/Tlon/Twitch/Zalo/Zalo Personal/WeChat/WebChat)
openclaw agent --message "Ship checklist" --thinking high

Upgrading? Updating guide (and run openclaw doctor).

Development channels

stable: tagged releases (vYYYY.M.D or vYYYY.M.D-<patch>), npm dist-tag latest.
beta: prerelease tags (vYYYY.M.D-beta.N), npm dist-tag beta (macOS app may be missing).
dev: moving head of main, npm dist-tag dev (when published).
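
As an illustration of the tag formats above, a tag can be mapped to its channel with a small parser. This is a sketch only: `channelForTag` is a hypothetical helper, and it assumes `<patch>` is a plain number, which the list above does not specify. The dev channel tracks `main` rather than a tag, so it is not modeled.

```typescript
type TaggedChannel = "stable" | "beta";

// vYYYY.M.D, optionally followed by -<patch> (assumed numeric) or -beta.N.
const TAG_PATTERN = /^v(\d{4})\.(\d{1,2})\.(\d{1,2})(?:-(beta\.\d+|\d+))?$/;

// Returns the channel a release tag belongs to, or undefined if the tag does
// not match either documented format.
function channelForTag(tag: string): TaggedChannel | undefined {
  const match = TAG_PATTERN.exec(tag);
  if (!match) return undefined;
  const suffix = match[4];
  return suffix !== undefined && suffix.startsWith("beta.") ? "beta" : "stable";
}
```

Here `channelForTag("v2026.4.10")` and `channelForTag("v2026.4.10-1")` map to `"stable"`, and `channelForTag("v2026.4.10-beta.2")` maps to `"beta"`.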

Switch channels (git + npm): openclaw update --channel stable|beta|dev. Details: Development channels.

From source (development)

Prefer pnpm for builds from source. Bun is optional for running TypeScript directly.

git clone https://github.com/openclaw/openclaw.git
cd openclaw

pnpm install
pnpm ui:build # auto-installs UI deps on first run
pnpm build

pnpm openclaw onboard --install-daemon

# Dev loop (auto-reload on source/config changes)
pnpm gateway:watch

Note: pnpm openclaw ... runs TypeScript directly (via tsx). pnpm build produces dist/ for running via Node / the packaged openclaw binary.

Security defaults (DM access)

OpenClaw connects to real messaging surfaces. Treat inbound DMs as untrusted input.

Full security guide: Security

Default behavior on Telegram/WhatsApp/Signal/iMessage/Microsoft Teams/Discord/Google Chat/Slack:

DM pairing (dmPolicy="pairing" / channels.discord.dmPolicy="pairing" / channels.slack.dmPolicy="pairing"; legacy: channels.discord.dm.policy, channels.slack.dm.policy): unknown senders receive a short pairing code and the bot does not process their message.
Approve with: openclaw pairing approve <channel> <code> (then the sender is added to a local allowlist store).
Public inbound DMs require an explicit opt-in: set dmPolicy="open" and include "*" in the channel allowlist (allowFrom / channels.discord.allowFrom / channels.slack.allowFrom; legacy: channels.discord.dm.allowFrom, channels.slack.dm.allowFrom).

Run openclaw doctor to surface risky/misconfigured DM policies.

Highlights

Local-first Gateway — single control plane for sessions, channels, tools, and events.
Multi-channel inbox — WhatsApp, Telegram, Slack, Discord, Google Chat, Signal, BlueBubbles (iMessage), iMessage (legacy), IRC, Microsoft Teams, Matrix, Feishu, LINE, Mattermost, Nextcloud Talk, Nostr, Synology Chat, Tlon, Twitch, Zalo, Zalo Personal, WeChat, WebChat, macOS, iOS/Android.
Multi-agent routing — route inbound channels/accounts/peers to isolated agents (workspaces + per-agent sessions).
Voice Wake + Talk Mode — wake words on macOS/iOS and continuous voice on Android (ElevenLabs + system TTS fallback).
Live Canvas — agent-driven visual workspace with A2UI.
First-class tools — browser, canvas, nodes, cron, sessions, and Discord/Slack actions.
Companion apps — macOS menu bar app + iOS/Android nodes.
Onboarding + skills — onboarding-driven setup with bundled/managed/workspace skills.

Everything we built so far

Core platform

Gateway WS control plane with sessions, presence, config, cron, webhooks, Control UI, and Canvas host.
CLI surface: gateway, agent, send, onboarding, and doctor.
Pi agent runtime in RPC mode with tool streaming and block streaming.
Session model: main for direct chats, group isolation, activation modes, queue modes, reply-back. Group rules: Groups.
Media pipeline: images/audio/video, transcription hooks, size caps, temp file lifecycle. Audio details: Audio.

Channels

Channels: WhatsApp (Baileys), Telegram (grammY), Slack (Bolt), Discord (discord.js), Google Chat (Chat API), Signal (signal-cli), BlueBubbles (iMessage, recommended), iMessage (legacy imsg), IRC, Microsoft Teams, Matrix, Feishu, LINE, Mattermost, Nextcloud Talk, Nostr, Synology Chat, Tlon, Twitch, Zalo, Zalo Personal, WeChat (@tencent-weixin/openclaw-weixin), WebChat.
Group routing: mention gating, reply tags, per-channel chunking and routing. Channel rules: Channels.

Apps + nodes

macOS app: menu bar control plane, Voice Wake/PTT, Talk Mode overlay, WebChat, debug tools, remote gateway control.
iOS node: Canvas, Voice Wake, Talk Mode, camera, screen recording, Bonjour + device pairing.
Android node: Connect tab (setup code/manual), chat sessions, voice tab, Canvas, camera/screen recording, and Android device commands (notifications/location/SMS/photos/contacts/calendar/motion/app update).
macOS node mode: system.run/notify + canvas/camera exposure.

Tools + automation

Browser control: dedicated openclaw Chrome/Chromium, snapshots, actions, uploads, profiles.
Canvas: A2UI push/reset, eval, snapshot.
Nodes: camera snap/clip, screen record, location.get, notifications.
Cron + wakeups; webhooks; Gmail Pub/Sub.
Skills platform: bundled, managed, and workspace skills with install gating + UI.

Runtime + safety

Channel routing, retry policy, and streaming/chunking.
Presence, typing indicators, and usage tracking.
Models, model failover, and session pruning.
Security and troubleshooting.

Ops + packaging

Control UI + WebChat served directly from the Gateway.
Tailscale Serve/Funnel or SSH tunnels with token/password auth.
Nix mode for declarative config; Docker-based installs.
Doctor migrations, logging.

How it works (short)

WhatsApp / Telegram / Slack / Discord / Google Chat / Signal / iMessage / BlueBubbles / IRC / Microsoft Teams / Matrix / Feishu / LINE / Mattermost / Nextcloud Talk / Nostr / Synology Chat / Tlon / Twitch / Zalo / Zalo Personal / WeChat / WebChat
               │
               ▼
┌───────────────────────────────┐
│            Gateway            │
│       (control plane)         │
│     ws://127.0.0.1:18789      │
└──────────────┬────────────────┘
               │
               ├─ Pi agent (RPC)
               ├─ CLI (openclaw …)
               ├─ WebChat UI
               ├─ macOS app
               └─ iOS / Android nodes

Key subsystems

Gateway WebSocket network — single WS control plane for clients, tools, and events (plus ops: Gateway runbook).
Tailscale exposure — Serve/Funnel for the Gateway dashboard + WS (remote access: Remote).
Browser control — openclaw‑managed Chrome/Chromium with CDP control.
Canvas + A2UI — agent‑driven visual workspace (A2UI host: Canvas/A2UI).
Voice Wake + Talk Mode — wake words on macOS/iOS plus continuous voice on Android.
Nodes — Canvas, camera snap/clip, screen record, location.get, notifications, plus macOS‑only system.run/system.notify.

Tailscale access (Gateway dashboard)

OpenClaw can auto-configure Tailscale Serve (tailnet-only) or Funnel (public) while the Gateway stays bound to loopback. Configure gateway.tailscale.mode:

off: no Tailscale automation (default).
serve: tailnet-only HTTPS via tailscale serve (uses Tailscale identity headers by default).
funnel: public HTTPS via tailscale funnel (requires shared password auth).

Notes:

gateway.bind must stay loopback when Serve/Funnel is enabled (OpenClaw enforces this).
Serve can be forced to require a password by setting gateway.auth.mode: "password" or gateway.auth.allowTailscale: false.
Funnel refuses to start unless gateway.auth.mode: "password" is set.
Optional: gateway.tailscale.resetOnExit to undo Serve/Funnel on shutdown.
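
The constraints in these notes can be expressed as a small check. This is a sketch under stated assumptions, not OpenClaw's actual validation: `checkTailscaleConfig` and `GatewaySketch` are hypothetical, and representing a loopback bind as the string `"loopback"` is a guess at the config value.

```typescript
type TailscaleMode = "off" | "serve" | "funnel";

// Hypothetical, trimmed-down view of the `gateway` config block.
interface GatewaySketch {
  bind: string; // assumption: "loopback" denotes a loopback bind
  auth?: { mode?: string; allowTailscale?: boolean };
  tailscale?: { mode?: TailscaleMode; resetOnExit?: boolean };
}

// Returns a list of human-readable problems; an empty list means the
// combination is consistent with the notes above.
function checkTailscaleConfig(gateway: GatewaySketch): string[] {
  const problems: string[] = [];
  const mode = gateway.tailscale?.mode ?? "off";
  if (mode !== "off" && gateway.bind !== "loopback") {
    problems.push("gateway.bind must stay loopback when Serve/Funnel is enabled");
  }
  if (mode === "funnel" && gateway.auth?.mode !== "password") {
    problems.push('Funnel requires gateway.auth.mode: "password"');
  }
  return problems;
}
```

A Funnel setup with a loopback bind and password auth yields no problems, while Funnel without password auth yields one.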

Details: Tailscale guide · Web surfaces

Remote Gateway (Linux is great)

It’s perfectly fine to run the Gateway on a small Linux instance. Clients (macOS app, CLI, WebChat) can connect over Tailscale Serve/Funnel or SSH tunnels, and you can still pair device nodes (macOS/iOS/Android) to execute device‑local actions when needed.

Gateway host runs the exec tool and channel connections by default.
Device nodes run device‑local actions (system.run, camera, screen recording, notifications) via node.invoke. In short: exec runs where the Gateway lives; device actions run where the device lives.

Details: Remote access · Nodes · Security

macOS permissions via the Gateway protocol

The macOS app can run in node mode and advertises its capabilities + permission map over the Gateway WebSocket (node.list / node.describe). Clients can then execute local actions via node.invoke:

system.run runs a local command and returns stdout/stderr/exit code; set needsScreenRecording: true to require screen-recording permission (otherwise you’ll get PERMISSION_MISSING).
system.notify posts a user notification and fails if notifications are denied.
canvas.*, camera.*, screen.record, and location.get are also routed via node.invoke and follow TCC permission status.

Elevated bash (host permissions) is separate from macOS TCC:

Use /elevated on|off to toggle per‑session elevated access when enabled + allowlisted.
Gateway persists the per‑session toggle via sessions.patch (WS method) alongside thinkingLevel, verboseLevel, model, sendPolicy, and groupActivation.

Details: Nodes · macOS app · Gateway protocol

Agent to Agent (sessions_* tools)

Use these to coordinate work across sessions without jumping between chat surfaces.
sessions_list — discover active sessions (agents) and their metadata.
sessions_history — fetch transcript logs for a session.
sessions_send — message another session; optional reply‑back ping‑pong + announce step (REPLY_SKIP, ANNOUNCE_SKIP).

Details: Session tools

Skills registry (ClawHub)

ClawHub is a minimal skill registry. With ClawHub enabled, the agent can search for skills automatically and pull in new ones as needed.

ClawHub

Chat commands

Send these in WhatsApp/Telegram/Slack/Google Chat/Microsoft Teams/WebChat (group commands are owner-only):

/status — compact session status (model + tokens, cost when available)
/new or /reset — reset the session
/compact — compact session context (summary)
/think <level> — off|minimal|low|medium|high|xhigh (GPT-5.2 + Codex models only)
/verbose on|off
/usage off|tokens|full — per-response usage footer
/restart — restart the gateway (owner-only in groups)
/activation mention|always — group activation toggle (groups only)
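
To make the command grammar above concrete, here is a minimal parser for a few of the listed commands. It is an illustrative sketch, not OpenClaw's own implementation; the ChatCommand type and parseChatCommand function are hypothetical names, and model-specific restrictions on think levels are not modeled.

```typescript
// Think levels as listed for /think above.
const THINK_LEVELS = ["off", "minimal", "low", "medium", "high", "xhigh"] as const;
type ThinkLevel = (typeof THINK_LEVELS)[number];

// Hypothetical parsed representation of a subset of the chat commands.
type ChatCommand =
  | { kind: "status" }
  | { kind: "reset" }
  | { kind: "think"; level: ThinkLevel }
  | { kind: "verbose"; on: boolean };

// Returns the parsed command, or null when the text is not a recognized command.
function parseChatCommand(text: string): ChatCommand | null {
  const [name, arg] = text.trim().split(/\s+/, 2);
  switch (name) {
    case "/status":
      return { kind: "status" };
    case "/new":
    case "/reset":
      return { kind: "reset" };
    case "/think": {
      const level = THINK_LEVELS.find((l) => l === arg);
      return level ? { kind: "think", level } : null;
    }
    case "/verbose":
      if (arg === "on" || arg === "off") return { kind: "verbose", on: arg === "on" };
      return null;
    default:
      return null; // anything else is treated as a normal message
  }
}
```

For example, parseChatCommand("/think high") yields a think command with level "high", while "/think max" and plain messages return null.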

Apps (optional)

The Gateway alone delivers a great experience. All apps are optional and add extra features.

If you plan to build/run companion apps, follow the platform runbooks below.

macOS (OpenClaw.app) (optional)

Menu bar control for the Gateway and health.
Voice Wake + push-to-talk overlay.
WebChat + debug tools.
Remote gateway control over SSH.

Note: signed builds are required for macOS permissions to persist across rebuilds (see macOS Permissions).

iOS node (optional)

Pairs as a node over the Gateway WebSocket (device pairing).
Voice trigger forwarding + Canvas surface.
Controlled via openclaw nodes ….

Runbook: iOS connect.

Android node (optional)

Pairs as a WS node via device pairing (openclaw devices ...).
Exposes Connect/Chat/Voice tabs plus Canvas, Camera, Screen capture, and Android device command families.
Runbook: Android connect.

Agent workspace + skills

Workspace root: ~/.openclaw/workspace (configurable via agents.defaults.workspace).
Injected prompt files: AGENTS.md, SOUL.md, TOOLS.md.
Skills: ~/.openclaw/workspace/skills/<skill>/SKILL.md.
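
The layout above can be expressed as a small path helper. This is a sketch under stated assumptions: POSIX-style paths, a caller-supplied home directory, and a conservative skill-name check that is not taken from OpenClaw; skillFilePath and expandHome are hypothetical names.

```typescript
// Default workspace root from the list above (overridable via agents.defaults.workspace).
const DEFAULT_WORKSPACE = "~/.openclaw/workspace";

// Expand a leading "~" using the given home directory.
function expandHome(p: string, home: string): string {
  return p === "~" || p.startsWith("~/") ? home + p.slice(1) : p;
}

// Resolve <workspace>/skills/<skill>/SKILL.md.
function skillFilePath(
  skill: string,
  home: string,
  workspace: string = DEFAULT_WORKSPACE,
): string {
  // Assumption: reject names that could escape the skills directory.
  if (!/^[A-Za-z0-9._-]+$/.test(skill) || skill === "." || skill === "..") {
    throw new Error(`invalid skill name: ${skill}`);
  }
  const root = expandHome(workspace, home).replace(/\/+$/, ""); // drop trailing slashes
  return `${root}/skills/${skill}/SKILL.md`;
}
```

With home set to /home/me, skillFilePath("weather", "/home/me") resolves to /home/me/.openclaw/workspace/skills/weather/SKILL.md.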

Configuration

Minimal ~/.openclaw/openclaw.json (model + defaults):

{
  agent: {
    model: "<provider>/<model-id>",
  },
}

Full configuration reference (all keys + examples).

Security model (important)

Default: tools run on the host for the main session, so the agent has full access when it’s just you.
Group/channel safety: set agents.defaults.sandbox.mode: "non-main" to run non‑main sessions (groups/channels) inside per‑session Docker sandboxes; bash then runs in Docker for those sessions.
Sandbox defaults: allowlist bash, process, read, write, edit, sessions_list, sessions_history, sessions_send, sessions_spawn; denylist browser, canvas, nodes, cron, discord, gateway.
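
The rules above can be read as a simple decision function. The sketch below copies the default allowlist and denylist from the text; the evaluation order (deny wins, anything unlisted is refused) and the function names are assumptions, not OpenClaw's actual policy engine.

```typescript
// Default sandbox tool lists, copied from the text above.
const SANDBOX_ALLOW = [
  "bash", "process", "read", "write", "edit",
  "sessions_list", "sessions_history", "sessions_send", "sessions_spawn",
];
const SANDBOX_DENY = ["browser", "canvas", "nodes", "cron", "discord", "gateway"];

// With mode "non-main", only non-main sessions (groups/channels) are sandboxed.
function runsInSandbox(mode: string | undefined, isMainSession: boolean): boolean {
  return mode === "non-main" && !isMainSession;
}

// Assumption: in a sandboxed session, deny wins and unlisted tools are refused.
function toolAllowed(tool: string, mode: string | undefined, isMainSession: boolean): boolean {
  if (!runsInSandbox(mode, isMainSession)) return true; // host session: no sandbox policy in this sketch
  if (SANDBOX_DENY.includes(tool)) return false;
  return SANDBOX_ALLOW.includes(tool);
}
```

Under these assumptions, a group session with mode "non-main" can use bash (inside Docker) but not browser, while the main session keeps full host access.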

Details: Security guide · Docker + sandboxing · Sandbox config

WhatsApp

Link the device: pnpm openclaw channels login (stores creds in ~/.openclaw/credentials).
Allowlist who can talk to the assistant via channels.whatsapp.allowFrom.
If channels.whatsapp.groups is set, it becomes a group allowlist; include "*" to allow all.
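
The group allowlist rule above can be sketched as a single check. The shape of the groups value (an object keyed by group ID) and the behavior when it is unset (no group restriction) are assumptions for illustration; isGroupAllowed is a hypothetical name.

```typescript
// Assumed shape: an object keyed by group ID, with "*" as the wildcard entry.
type GroupsConfig = Record<string, unknown>;

function isGroupAllowed(groups: GroupsConfig | undefined, groupId: string): boolean {
  // Assumption: when `groups` is not set, no group allowlist applies.
  if (groups === undefined) return true;
  const has = (key: string) => Object.prototype.hasOwnProperty.call(groups, key);
  // Once set, only listed groups pass, unless "*" is present.
  return has("*") || has(groupId);
}
```

Note that under this reading an empty groups object allows no groups at all, which is why the wildcard entry matters.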

Telegram

Set TELEGRAM_BOT_TOKEN or channels.telegram.botToken (env wins).
Optional: set channels.telegram.groups (with channels.telegram.groups."*".requireMention); when set, it is a group allowlist (include "*" to allow all). Also set channels.telegram.allowFrom, or channels.telegram.webhookUrl + channels.telegram.webhookSecret, as needed.

{
  channels: {
    telegram: {
      botToken: "123456:ABCDEF",
    },
  },
}
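
The "env wins" precedence for the Telegram token can be sketched as follows. The config type mirrors the fragment above; treating an empty or whitespace-only environment value as unset is an assumption, and resolveTelegramToken is a hypothetical helper name.

```typescript
// Minimal view of the config fragment shown above.
type TelegramConfig = { channels?: { telegram?: { botToken?: string } } };

// Environment variable takes precedence over the config file value.
function resolveTelegramToken(
  env: Record<string, string | undefined>,
  config: TelegramConfig,
): string | undefined {
  const fromEnv = env["TELEGRAM_BOT_TOKEN"]?.trim();
  if (fromEnv) return fromEnv; // assumption: blank env values count as unset
  return config.channels?.telegram?.botToken;
}
```

In a Node.js process the first argument would typically be process.env; passing it explicitly keeps the function easy to test.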

Slack

Set SLACK_BOT_TOKEN + SLACK_APP_TOKEN (or channels.slack.botToken + channels.slack.appToken).

Discord

Set DISCORD_BOT_TOKEN or channels.discord.token.
Optional: set commands.native, commands.text, or commands.useAccessGroups, plus channels.discord.allowFrom, channels.discord.guilds, or channels.discord.mediaMaxMb as needed.

{
  channels: {
    discord: {
      token: "1234abcd",
    },
  },
}

Signal

Requires signal-cli and a channels.signal config section.

BlueBubbles (iMessage)

Recommended iMessage integration.
Configure channels.bluebubbles.serverUrl + channels.bluebubbles.password and a webhook (channels.bluebubbles.webhookPath).
The BlueBubbles server runs on macOS; the Gateway can run on macOS or elsewhere.

iMessage (legacy)

Legacy macOS-only integration via imsg (Messages must be signed in).
If channels.imessage.groups is set, it becomes a group allowlist; include "*" to allow all.

Microsoft Teams

Configure a Teams app + Bot Framework, then add a msteams config section.
Allowlist who can talk via msteams.allowFrom; group access via msteams.groupAllowFrom or msteams.groupPolicy: "open".

WeChat

Official Tencent plugin via @tencent-weixin/openclaw-weixin (iLink Bot API). Private chats only; v2.x requires OpenClaw >=2026.3.22.
Install: openclaw plugins install "@tencent-weixin/openclaw-weixin", then openclaw channels login --channel openclaw-weixin to scan the QR code.
Requires the WeChat ClawBot plugin (WeChat > Me > Settings > Plugins); gradual rollout by Tencent.

WebChat

Uses the Gateway WebSocket; no separate WebChat port/config.

Browser control (optional):

{
  browser: {
    enabled: true,
    color: "#FF4500",
  },
}

Docs

Use these when you’re past the onboarding flow and want the deeper reference.

Advanced docs (discovery + control)

Operations & troubleshooting

Deep dives

Workspace & skills

Platform internals

Email hooks (Gmail)

docs.openclaw.ai/gmail-pubsub

Molty

OpenClaw was built for Molty, a space lobster AI assistant. 🦞 Built by Peter Steinberger and the community.

Community

See CONTRIBUTING.md for guidelines, maintainers, and how to submit PRs. AI/vibe-coded PRs welcome! 🤖

Special thanks to Mario Zechner for his support and for pi-mono. Special thanks to Adam Doppelt for lobster.bot.

Thanks to all clawtributors:

