| summary | title | read_when |
|---|---|---|
| Google Gemini setup (API key + OAuth, image generation, media understanding, TTS, web search) | Google (Gemini) | |
The Google plugin provides access to Gemini models through Google AI Studio, plus image generation, media understanding (image/audio/video), text-to-speech, and web search via Gemini Grounding.
- Provider: `google`
- Auth: `GEMINI_API_KEY` or `GOOGLE_API_KEY`
- API: Google Gemini API
- Runtime option: `agents.defaults.agentRuntime.id: "google-gemini-cli"` reuses Gemini CLI OAuth while keeping model refs canonical as `google/*`.
Getting started
Choose your preferred auth method and follow the setup steps.
**Best for:** standard Gemini API access through Google AI Studio.
<Steps>
<Step title="Run onboarding">
```bash
openclaw onboard --auth-choice gemini-api-key
```
Or pass the key directly:
```bash
openclaw onboard --non-interactive \
  --mode local \
  --auth-choice gemini-api-key \
  --gemini-api-key "$GEMINI_API_KEY"
```
</Step>
<Step title="Set a default model">
```json5
{
  agents: {
    defaults: {
      model: { primary: "google/gemini-3.1-pro-preview" },
    },
  },
}
```
</Step>
<Step title="Verify the model is available">
```bash
openclaw models list --provider google
```
</Step>
</Steps>
<Tip>
The environment variables `GEMINI_API_KEY` and `GOOGLE_API_KEY` are both accepted. Use whichever you already have configured.
</Tip>
**Best for:** reusing an existing Gemini CLI login via PKCE OAuth instead of a separate API key.
<Warning>
The `google-gemini-cli` provider is an unofficial integration. Some users
report account restrictions when using OAuth this way. Use at your own risk.
</Warning>
<Steps>
<Step title="Install the Gemini CLI">
The local `gemini` command must be available on `PATH`.
```bash
# Homebrew
brew install gemini-cli
# or npm
npm install -g @google/gemini-cli
```
OpenClaw supports both Homebrew installs and global npm installs, including
common Windows/npm layouts.
</Step>
<Step title="Log in via OAuth">
```bash
openclaw models auth login --provider google-gemini-cli --set-default
```
</Step>
<Step title="Verify the model is available">
```bash
openclaw models list --provider google
```
</Step>
</Steps>
- Default model: `google/gemini-3.1-pro-preview`
- Runtime: `google-gemini-cli`
- Alias: `gemini-cli`
Gemini 3.1 Pro's Gemini API model id is `gemini-3.1-pro-preview`. OpenClaw accepts the shorter `google/gemini-3.1-pro` as a convenience alias and normalizes it before provider calls.
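The alias handling described above can be pictured with a small sketch. The alias table and function name are hypothetical and are not OpenClaw's actual code; only the `gemini-3.1-pro` to `gemini-3.1-pro-preview` mapping comes from this page.

```python
# Hypothetical alias table: only this one mapping is documented on this page.
MODEL_ALIASES = {
    "gemini-3.1-pro": "gemini-3.1-pro-preview",
}


def normalize_google_model_ref(model_ref: str) -> str:
    """Return the Gemini API model id for a `google/<model>` ref."""
    provider, sep, model_id = model_ref.partition("/")
    if not sep or provider != "google":
        raise ValueError(f"not a google/* model ref: {model_ref!r}")
    # Aliases are expanded; ids that are already canonical pass through unchanged.
    return MODEL_ALIASES.get(model_id, model_id)


print(normalize_google_model_ref("google/gemini-3.1-pro"))
print(normalize_google_model_ref("google/gemini-3.1-pro-preview"))
```

Both calls print `gemini-3.1-pro-preview`, which is the id that would be sent to the provider in this sketch.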
**Environment variables:**
- `OPENCLAW_GEMINI_OAUTH_CLIENT_ID`
- `OPENCLAW_GEMINI_OAUTH_CLIENT_SECRET`
(Or the `GEMINI_CLI_*` variants.)
<Note>
If Gemini CLI OAuth requests fail after login, set `GOOGLE_CLOUD_PROJECT` or
`GOOGLE_CLOUD_PROJECT_ID` on the gateway host and retry.
</Note>
<Note>
If login fails before the browser flow starts, make sure the local `gemini`
command is installed and on `PATH`.
</Note>
`google-gemini-cli/*` model refs are legacy compatibility aliases. New
configs should use `google/*` model refs plus the `google-gemini-cli`
runtime when they want local Gemini CLI execution.
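As a rough illustration of that migration, the sketch below splits a legacy ref into a canonical `google/*` ref plus the runtime id. The helper name and return shape are assumptions made for this example, not part of OpenClaw.

```python
from typing import Optional, Tuple

LEGACY_PROVIDER = "google-gemini-cli"


def migrate_legacy_ref(model_ref: str) -> Tuple[str, Optional[str]]:
    """Map a legacy `google-gemini-cli/*` ref to (canonical ref, runtime id).

    Refs that are not legacy aliases are returned unchanged with no runtime.
    """
    provider, sep, model_id = model_ref.partition("/")
    if sep and provider == LEGACY_PROVIDER:
        return f"google/{model_id}", LEGACY_PROVIDER
    return model_ref, None


print(migrate_legacy_ref("google-gemini-cli/gemini-3.1-pro-preview"))
```

The returned runtime id corresponds to the value you would place in `agents.defaults.agentRuntime.id`.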
Capabilities
| Capability | Supported |
|---|---|
| Chat completions | Yes |
| Image generation | Yes |
| Music generation | Yes |
| Text-to-speech | Yes |
| Realtime voice | Yes (Google Live API) |
| Image understanding | Yes |
| Audio transcription | Yes |
| Video understanding | Yes |
| Web search (Grounding) | Yes |
| Thinking/reasoning | Yes (Gemini 2.5+ / Gemini 3+) |
| Gemma 4 models | Yes |
Web search
The bundled `gemini` web-search provider uses Gemini Google Search grounding.
Configure a dedicated search key under `plugins.entries.google.config.webSearch`,
or let it fall back to `GEMINI_API_KEY` and then `models.providers.google.apiKey`:
```json5
{
  plugins: {
    entries: {
      google: {
        config: {
          webSearch: {
            apiKey: "AIza...", // optional if GEMINI_API_KEY or models.providers.google.apiKey is set
            baseUrl: "https://generativelanguage.googleapis.com/v1beta", // falls back to models.providers.google.baseUrl
            model: "gemini-2.5-flash",
          },
        },
      },
    },
  },
}
```
Credentials are resolved in this order: the dedicated `webSearch.apiKey`, then
`GEMINI_API_KEY`, then `models.providers.google.apiKey`. `webSearch.baseUrl` is
optional and exists for operator proxies or compatible Gemini API endpoints;
when omitted, Gemini web search reuses `models.providers.google.baseUrl`. See
Gemini search for the provider-specific tool behavior.
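The precedence rules above can be sketched as a small resolver. This is an illustration under stated assumptions: the function name and the plain-dict config shape are hypothetical, and only the lookup order comes from this page.

```python
from typing import Any, Dict, Mapping


def resolve_web_search_settings(config: Mapping[str, Any], env: Mapping[str, str]) -> Dict[str, Any]:
    """Resolve the Gemini web-search API key and base URL from config and env."""
    web_search = (
        config.get("plugins", {})
        .get("entries", {})
        .get("google", {})
        .get("config", {})
        .get("webSearch", {})
    )
    provider = config.get("models", {}).get("providers", {}).get("google", {})

    # Key order: dedicated webSearch.apiKey, then GEMINI_API_KEY, then provider apiKey.
    api_key = web_search.get("apiKey") or env.get("GEMINI_API_KEY") or provider.get("apiKey")
    # Base URL: dedicated webSearch.baseUrl, else the provider-level baseUrl.
    base_url = web_search.get("baseUrl") or provider.get("baseUrl")
    return {"apiKey": api_key, "baseUrl": base_url}


example = {"models": {"providers": {"google": {"apiKey": "provider-key"}}}}
print(resolve_web_search_settings(example, {"GEMINI_API_KEY": "env-key"}))
```

In this sketch the environment variable wins over the provider-level key, so the example resolves to `env-key` with no base URL override.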
`/think adaptive` keeps Google's dynamic thinking semantics instead of choosing
a fixed OpenClaw level. Gemini 3 and Gemini 3.1 omit a fixed `thinkingLevel` so
Google can choose the level; Gemini 2.5 sends Google's dynamic sentinel
`thinkingBudget: -1`.
Gemma 4 models (for example `gemma-4-26b-a4b-it`) support thinking mode. OpenClaw
rewrites `thinkingBudget` to a supported Google `thinkingLevel` for Gemma 4.
Setting thinking to `off` keeps thinking disabled rather than mapping it to
`MINIMAL`.
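The `/think adaptive` behavior described above might be sketched as follows. The function name and the prefix-based model family detection are illustrative assumptions; only the two outcomes (no fixed `thinkingLevel` for Gemini 3.x, `thinkingBudget: -1` for Gemini 2.5) come from this page. The Gemma 4 rewrite is not modeled because its mapping is not specified here.

```python
from typing import Any, Dict


def adaptive_thinking_config(model_id: str) -> Dict[str, Any]:
    """Return the thinking fields to send for `/think adaptive`."""
    if model_id.startswith("gemini-2.5"):
        # Gemini 2.5: -1 is Google's sentinel for a dynamic thinking budget.
        return {"thinkingBudget": -1}
    if model_id.startswith("gemini-3"):
        # Gemini 3 and 3.1: send no fixed level so Google chooses one.
        return {}
    raise ValueError(f"adaptive thinking is not sketched for {model_id!r}")


print(adaptive_thinking_config("gemini-2.5-flash"))
print(adaptive_thinking_config("gemini-3.1-pro-preview"))
```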
Image generation
The bundled `google` image-generation provider defaults to
`google/gemini-3.1-flash-image-preview`.
- Also supports `google/gemini-3-pro-image-preview`
- Generate: up to 4 images per request
- Edit mode: enabled, up to 5 input images
- Geometry controls: `size`, `aspectRatio`, and `resolution`
To use Google as the default image provider:
```json5
{
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "google/gemini-3.1-flash-image-preview",
      },
    },
  },
}
```
Video generation
The bundled `google` plugin also registers video generation through the shared
`video_generate` tool.
- Default video model: `google/veo-3.1-fast-generate-preview`
- Modes: text-to-video, image-to-video, and single-video reference flows
- Supports `aspectRatio`, `resolution`, and `audio`
- Current duration clamp: 4 to 8 seconds
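To make the duration clamp concrete, here is a minimal sketch. The function and constant names are hypothetical; only the 4 to 8 second range comes from the list above.

```python
MIN_SECONDS = 4
MAX_SECONDS = 8


def clamp_video_duration(requested_seconds: float) -> float:
    """Clamp a requested video duration into the 4 to 8 second range."""
    return max(MIN_SECONDS, min(MAX_SECONDS, requested_seconds))


print(clamp_video_duration(2))   # raised to the minimum
print(clamp_video_duration(6))   # already in range
print(clamp_video_duration(30))  # lowered to the maximum
```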
To use Google as the default video provider:
```json5
{
  agents: {
    defaults: {
      videoGenerationModel: {
        primary: "google/veo-3.1-fast-generate-preview",
      },
    },
  },
}
```
Music generation
The bundled `google` plugin also registers music generation through the shared
`music_generate` tool.
- Default music model: `google/lyria-3-clip-preview`
- Also supports `google/lyria-3-pro-preview`
- Prompt controls: `lyrics` and `instrumental`
- Output format: `mp3` by default, plus `wav` on `google/lyria-3-pro-preview`
- Reference inputs: up to 10 images
- Session-backed runs detach through the shared task/status flow, including `action: "status"`
To use Google as the default music provider:
```json5
{
  agents: {
    defaults: {
      musicGenerationModel: {
        primary: "google/lyria-3-clip-preview",
      },
    },
  },
}
```
Text-to-speech
The bundled `google` speech provider uses the Gemini API TTS path with
`gemini-3.1-flash-tts-preview`.
- Default voice: `Kore`
- Auth: `messages.tts.providers.google.apiKey`, `models.providers.google.apiKey`, `GEMINI_API_KEY`, or `GOOGLE_API_KEY`
- Output: WAV for regular TTS attachments, Opus for voice-note targets, PCM for Talk/telephony
- Voice-note output: Google PCM is wrapped as WAV and transcoded to 48 kHz Opus with `ffmpeg`
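The WAV-wrapping step in the voice-note path can be illustrated with Python's standard `wave` module. This is a sketch, not OpenClaw's implementation: the function name is hypothetical, the 24 kHz, 16-bit, mono defaults are assumptions about the PCM that Gemini TTS returns, and the later `ffmpeg` transcode to Opus is not shown.

```python
import io
import wave


def wrap_pcm_as_wav(
    pcm: bytes,
    sample_rate: int = 24000,  # assumed Gemini TTS output rate
    channels: int = 1,         # assumed mono
    sample_width: int = 2,     # assumed 16-bit samples (2 bytes)
) -> bytes:
    """Prepend a WAV header to raw PCM samples and return the WAV bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


# 240 frames of 16-bit silence, which is 10 ms at 24 kHz.
wav_bytes = wrap_pcm_as_wav(b"\x00\x00" * 240)
print(wav_bytes[:4], len(wav_bytes))
```

If the real PCM parameters differ, pass them explicitly; a header that disagrees with the sample data produces audio at the wrong speed or pitch.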
To use Google as the default TTS provider:
```json5
{
  messages: {
    tts: {
      auto: "always",
      provider: "google",
      providers: {
        google: {
          model: "gemini-3.1-flash-tts-preview",
          voiceName: "Kore",
          audioProfile: "Speak professionally with a calm tone.",
        },
      },
    },
  },
}
```
Gemini API TTS uses natural-language prompting for style control. Set
`audioProfile` to prepend a reusable style prompt before the spoken text. Set
`speakerName` when your prompt text refers to a named speaker.
Gemini API TTS also accepts expressive square-bracket audio tags in the text,
such as `[whispers]` or `[laughs]`. To keep tags out of the visible chat reply
while sending them to TTS, put them inside a `[[tts:text]]...[[/tts:text]]`
block:
```text
Here is the clean reply text.
[[tts:text]][whispers] Here is the spoken version.[[/tts:text]]
```
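One way to picture how such a reply separates into visible and spoken text is the sketch below. It is not OpenClaw's parser: the function name is hypothetical, and falling back to the visible text when no block is present is an assumption made for this example.

```python
import re
from typing import Tuple

# Matches one [[tts:text]]...[[/tts:text]] block, including multi-line content.
TTS_BLOCK = re.compile(r"\[\[tts:text\]\](.*?)\[\[/tts:text\]\]", re.DOTALL)


def split_tts_reply(reply: str) -> Tuple[str, str]:
    """Return (visible_text, spoken_text) for a reply with optional TTS blocks."""
    spoken_parts = [part.strip() for part in TTS_BLOCK.findall(reply)]
    visible = TTS_BLOCK.sub("", reply).strip()
    # Assumption: without a TTS block, the visible text is what gets spoken.
    spoken = " ".join(spoken_parts) if spoken_parts else visible
    return visible, spoken


reply = (
    "Here is the clean reply text.\n"
    "[[tts:text]][whispers] Here is the spoken version.[[/tts:text]]"
)
visible, spoken = split_tts_reply(reply)
print(visible)
print(spoken)
```

For the example reply, the visible text is `Here is the clean reply text.` and the spoken text keeps the `[whispers]` tag.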
Realtime voice
The bundled `google` plugin registers a realtime voice provider backed by the
Gemini Live API for backend audio bridges such as Voice Call and Google Meet.
| Setting | Config path | Default |
|---|---|---|
| Model | `plugins.entries.voice-call.config.realtime.providers.google.model` | `gemini-2.5-flash-native-audio-preview-12-2025` |
| Voice | `...google.voice` | `Kore` |
| Temperature | `...google.temperature` | (unset) |
| VAD start sensitivity | `...google.startSensitivity` | (unset) |
| VAD end sensitivity | `...google.endSensitivity` | (unset) |
| Silence duration | `...google.silenceDurationMs` | (unset) |
| Activity handling | `...google.activityHandling` | Google default, `start-of-activity-interrupts` |
| Turn coverage | `...google.turnCoverage` | Google default, `only-activity` |
| Disable auto VAD | `...google.automaticActivityDetectionDisabled` | `false` |
| Session resumption | `...google.sessionResumption` | `true` |
| Context compression | `...google.contextWindowCompression` | `true` |
| API key | `...google.apiKey` | Falls back to `models.providers.google.apiKey`, `GEMINI_API_KEY`, or `GOOGLE_API_KEY` |
Example Voice Call realtime config:
```json5
{
  plugins: {
    entries: {
      "voice-call": {
        enabled: true,
        config: {
          realtime: {
            enabled: true,
            provider: "google",
            providers: {
              google: {
                model: "gemini-2.5-flash-native-audio-preview-12-2025",
                voice: "Kore",
                activityHandling: "start-of-activity-interrupts",
                turnCoverage: "only-activity",
              },
            },
          },
        },
      },
    },
  },
}
```
For maintainer live verification, run
`OPENAI_API_KEY=... GEMINI_API_KEY=... node --import tsx scripts/dev/realtime-talk-live-smoke.ts`.
The Google leg mints the same constrained Live API token shape used by Control
UI Talk, opens the browser WebSocket endpoint, sends the initial setup payload,
and waits for `setupComplete`.
Advanced configuration
For direct Gemini API runs (`api: "google-generative-ai"`), OpenClaw passes a configured `cachedContent` handle through to Gemini requests.
- Configure per-model or global params with either `cachedContent` or legacy `cached_content`
- If both are present, `cachedContent` wins
- Example value: `cachedContents/prebuilt-context`
- Gemini cache-hit usage is normalized into OpenClaw `cacheRead` from upstream `cachedContentTokenCount`
```json5
{
  agents: {
    defaults: {
      models: {
        "google/gemini-2.5-pro": {
          params: {
            cachedContent: "cachedContents/prebuilt-context",
          },
        },
      },
    },
  },
}
```
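The key precedence and usage normalization described above can be sketched like this. Both helper names are hypothetical, and treating an empty string as unset is an assumption; the precedence rule and the `cachedContentTokenCount` to `cacheRead` mapping come from this page.

```python
from typing import Any, Mapping, Optional


def resolve_cached_content(params: Mapping[str, Any]) -> Optional[str]:
    """Pick the cache handle, preferring `cachedContent` over `cached_content`."""
    for key in ("cachedContent", "cached_content"):
        value = params.get(key)
        if isinstance(value, str) and value:
            return value
    return None


def cache_read_tokens(usage_metadata: Mapping[str, Any]) -> int:
    """Normalize Gemini's `cachedContentTokenCount` into a `cacheRead` count."""
    return int(usage_metadata.get("cachedContentTokenCount") or 0)


params = {
    "cached_content": "cachedContents/legacy",
    "cachedContent": "cachedContents/prebuilt-context",
}
print(resolve_cached_content(params))
print(cache_read_tokens({"cachedContentTokenCount": 2048}))
```

With both keys present, the sketch resolves to `cachedContents/prebuilt-context`, matching the documented rule that `cachedContent` wins.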
When using the `google-gemini-cli` OAuth provider, OpenClaw normalizes
the CLI JSON output as follows:
- Reply text comes from the CLI JSON `response` field.
- Usage falls back to `stats` when the CLI leaves `usage` empty.
- `stats.cached` is normalized into OpenClaw `cacheRead`.
- If `stats.input` is missing, OpenClaw derives input tokens from
`stats.input_tokens - stats.cached`.
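The normalization rules above might look like the following sketch. It is illustrative only: the function name and the output keys other than `cacheRead` are hypothetical, and passing a non-empty `usage` object through unchanged is an assumption. The sample payload is invented to exercise the fallback path.

```python
import json
from typing import Any, Dict


def normalize_cli_output(raw_json: str) -> Dict[str, Any]:
    """Normalize Gemini CLI JSON output into reply text plus usage counts."""
    payload = json.loads(raw_json)
    result: Dict[str, Any] = {"text": payload.get("response", "")}

    usage = payload.get("usage") or {}
    if usage:
        # Assumption: a populated `usage` object is used as-is.
        result["usage"] = usage
        return result

    # `usage` is empty, so fall back to `stats`.
    stats = payload.get("stats") or {}
    cached = stats.get("cached", 0)
    input_tokens = stats.get("input")
    if input_tokens is None:
        # Derive uncached input tokens when `stats.input` is missing.
        input_tokens = stats.get("input_tokens", 0) - cached
    result["usage"] = {"input": input_tokens, "cacheRead": cached}
    return result


sample = '{"response": "hi", "usage": {}, "stats": {"input_tokens": 120, "cached": 20}}'
print(normalize_cli_output(sample))
```

For the sample payload, 120 total input tokens with 20 cached yields 100 uncached input tokens and a `cacheRead` of 20.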
If the Gateway runs as a daemon (launchd/systemd), make sure `GEMINI_API_KEY`
is available to that process (for example, in `~/.openclaw/.env` or via
`env.shellEnv`).