When users put a runtime command name like "dreaming" into `plugins.allow`,
validation now explains that it is a command provided by a specific plugin
(e.g. "memory-core") and suggests using the plugin id instead, rather than
the generic "plugin not found" warning that previously created a circular
trap with the CLI error message.
Similarly, running `openclaw dreaming` from the CLI now explains that
`/dreaming` is a runtime slash command (not a CLI command) and points users
to `openclaw memory` for CLI operations or `/dreaming` in a chat session.
Fixes two related UX problems:
1. `plugins.allow: ["dreaming"]` → validation warned "plugin not found"
2. `openclaw dreaming status` → CLI said "add dreaming to plugins.allow"
(which then triggered problem 1)
Root cause: "dreaming" is a slash command registered by the memory-core
plugin via `api.registerCommand()`, not a standalone plugin or CLI command.
When the simple-completion model selected for thread-title generation is a
reasoning model (e.g. MiniMax M2, Claude thinking models, OpenAI o-series),
the 24-token output budget is entirely consumed by the internal thinking
block before any user-visible text is emitted. extractAssistantText then
returns an empty string, generateThreadTitle returns null, and the
auto-thread rename is silently skipped while the feature appears to do
nothing.
Raise DISCORD_THREAD_TITLE_MAX_TOKENS to 512 so there is enough headroom
for a short thinking pass plus the 3-6 word title output. The generous
ceiling only matters when the provider actually reasons; non-reasoning
models still emit a short title and stop early at end-of-sequence.
Verified live against a MiniMax M2 reasoning model served through an
Anthropic-compatible API endpoint: before the fix, the rename never fired;
after the fix, the thread is renamed with a concise generated title.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Treat duplicate registerService calls from the same plugin id as idempotent so plugin snapshot and activation loads stop emitting spurious service already registered diagnostics.\n\nThanks @ly85206559.
Auto-compaction never triggered for self-hosted llama.cpp HTTP servers
(used directly or behind an OpenAI-compatible shim configured with
`api: "openai-completions"`) because llama.cpp's native overflow wording
isn't covered by any existing pattern in `isContextOverflowError()` or
`matchesProviderContextOverflow()`.
When the prompt overshoots a slot's `--ctx-size`, llama.cpp returns:
400 request (66202 tokens) exceeds the available context size (65536 tokens), try increasing it
That message uses "context size" rather than "context length", says
"request (N tokens)" instead of "input/prompt is too long", and the
status code is 400 (not 413), so it slips past every existing string
check and every regex in `PROVIDER_CONTEXT_OVERFLOW_PATTERNS`. The
generic candidate pre-check passes, but the concrete provider regexes
all miss, so the agent runner reports `surface_error reason=...` and
the user gets the raw upstream error instead of compaction + retry.
This commit adds a llama.cpp-shaped pattern next to the existing Bedrock
/ Vertex / Ollama / Cohere ones in
`PROVIDER_CONTEXT_OVERFLOW_PATTERNS`, plus four test cases (three
parameterised messages exercising the new regex directly, and one
end-to-end assertion that `isContextOverflowError()` now returns true
for the verbatim message produced by llama.cpp's slot manager).
The pattern is anchored on llama.cpp's stable slot-manager wording
(`(?:request|prompt) (N tokens) exceeds (the )?available context size`)
so it won't accidentally swallow unrelated provider errors.
Closes#64180
AI-assisted: drafted with Claude Code (Opus 4.6, 1M context).
Testing: targeted tests pass via `pnpm vitest run
src/agents/pi-embedded-helpers/provider-error-patterns.test.ts`
(26/26). Broader vitest run shows 2 unrelated failures in
`group-policy.fallback.contract.test.ts` that are not touched by this
change.