Merge branch 'main' into feat/deepseek-provider

This commit is contained in:
07akioni
2026-03-17 14:41:27 +08:00
committed by GitHub
88 changed files with 2385 additions and 2112 deletions

View File

@@ -877,6 +877,7 @@ Time format in system prompt. Default: `auto` (OS preference).
},
imageGenerationModel: {
primary: "openai/gpt-image-1",
fallbacks: ["google/gemini-3.1-flash-image-preview"],
},
pdfModel: {
primary: "anthropic/claude-opus-4-6",

View File

@@ -360,6 +360,15 @@ If you want to rely on env keys (e.g. exported in your `~/.profile`), run local
- Enable: `BYTEPLUS_API_KEY=... BYTEPLUS_LIVE_TEST=1 pnpm test:live src/agents/byteplus.live.test.ts`
- Optional model override: `BYTEPLUS_CODING_MODEL=ark-code-latest`
## Google image generation live
- Test: `src/image-generation/providers/google.live.test.ts`
- Enable: `GOOGLE_LIVE_TEST=1 pnpm test:live src/image-generation/providers/google.live.test.ts`
- Key source: `GEMINI_API_KEY` or `GOOGLE_API_KEY`
- Optional overrides:
- `GOOGLE_IMAGE_GENERATION_MODEL=gemini-3.1-flash-image-preview`
- `GOOGLE_IMAGE_BASE_URL=https://generativelanguage.googleapis.com/v1beta`
## Docker runners (optional “works in Linux” checks)
These run `pnpm test:live` inside the repo Docker image, mounting your local config dir and workspace (and sourcing `~/.profile` if mounted). They also bind-mount CLI auth homes like `~/.codex`, `~/.claude`, `~/.qwen`, and `~/.minimax` when present, then copy them into the container home before the run so external-CLI OAuth can refresh tokens without mutating the host auth store:

View File

@@ -88,7 +88,7 @@ Image generation follows the standard shape:
1. core defines `ImageGenerationProvider`
2. core exposes `registerImageGenerationProvider(...)`
3. core exposes `runtime.imageGeneration.generate(...)`
4. the `openai` plugin registers an OpenAI-backed implementation
4. the `openai` and `google` plugins register vendor-backed implementations
5. future vendors can register the same contract without changing channels/tools
The config key is separate from vision-analysis routing:

View File

@@ -116,8 +116,10 @@ Examples:
speech + media-understanding + image-generation behavior
- the bundled `elevenlabs` plugin owns ElevenLabs speech behavior
- the bundled `microsoft` plugin owns Microsoft speech behavior
- the bundled `google`, `minimax`, `mistral`, `moonshot`, and `zai` plugins own
their media-understanding backends
- the bundled `google` plugin owns Google model-provider behavior plus Google
media-understanding + image-generation + web-search behavior
- the bundled `minimax`, `mistral`, `moonshot`, and `zai` plugins own their
media-understanding backends
- the `voice-call` plugin is a feature plugin: it owns call transport, tools,
CLI, routes, and runtime, but it consumes core TTS/STT capability instead of
inventing a second speech stack