mirror of https://github.com/openclaw/openclaw.git, synced 2026-05-06 17:31:06 +00:00
fix(web-search): support self-hosted Firecrawl
@@ -197,7 +197,7 @@ See [MCP](/cli/mcp#openclaw-as-an-mcp-client-registry) and
 - Channel plugin account/runtime settings live under `channels.<id>` and should be described by the owning plugin's manifest `channelConfigs` metadata, not by a central OpenClaw option registry.
 - `plugins.entries.firecrawl.config.webFetch`: Firecrawl web-fetch provider settings.
 - `apiKey`: Firecrawl API key (accepts SecretRef). Falls back to `plugins.entries.firecrawl.config.webSearch.apiKey`, legacy `tools.web.fetch.firecrawl.apiKey`, or `FIRECRAWL_API_KEY` env var.
-- `baseUrl`: Firecrawl API base URL (default: `https://api.firecrawl.dev`).
+- `baseUrl`: Firecrawl API base URL (default: `https://api.firecrawl.dev`; self-hosted overrides must target private/internal endpoints).
 - `onlyMainContent`: extract only the main content from pages (default: `true`).
 - `maxAgeMs`: maximum cache age in milliseconds (default: `172800000` / 2 days).
 - `timeoutSeconds`: scrape request timeout in seconds (default: `60`).
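The `apiKey` fallback chain described in the hunk above can be sketched as follows. This is a minimal illustration, not OpenClaw's implementation: the `FirecrawlKeyConfig` type and `resolveFetchApiKey` function are hypothetical names, the precedence is assumed to follow the order in which the doc line lists the sources, and SecretRef values are ignored (only plain strings are handled).

```typescript
// Hypothetical config shape covering only the key paths named in the docs.
interface FirecrawlKeyConfig {
  plugins?: {
    entries?: {
      firecrawl?: {
        config?: {
          webFetch?: { apiKey?: string };
          webSearch?: { apiKey?: string };
        };
      };
    };
  };
  // Legacy location, consulted only as a fallback.
  tools?: { web?: { fetch?: { firecrawl?: { apiKey?: string } } } };
}

// Returns the first non-empty key, trying sources in the assumed order:
// webFetch.apiKey, webSearch.apiKey, legacy tools.web.fetch.firecrawl.apiKey,
// then the FIRECRAWL_API_KEY environment variable.
function resolveFetchApiKey(
  cfg: FirecrawlKeyConfig,
  env: Record<string, string | undefined>,
): string | undefined {
  const firecrawl = cfg.plugins?.entries?.firecrawl?.config;
  const candidates = [
    firecrawl?.webFetch?.apiKey,
    firecrawl?.webSearch?.apiKey,
    cfg.tools?.web?.fetch?.firecrawl?.apiKey,
    env["FIRECRAWL_API_KEY"],
  ];
  return candidates.find((key) => typeof key === "string" && key.trim() !== "");
}
```

Passing the environment as a parameter (for example `process.env` in Node) keeps the sketch easy to exercise without touching real secrets.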
@@ -54,7 +54,7 @@ Notes:
 - Choosing Firecrawl in onboarding or `openclaw configure --section web` enables the bundled Firecrawl plugin automatically.
 - `web_search` with Firecrawl supports `query` and `count`.
 - For Firecrawl-specific controls like `sources`, `categories`, or result scraping, use `firecrawl_search`.
-- `baseUrl` overrides must stay on `https://api.firecrawl.dev`.
+- `baseUrl` defaults to hosted Firecrawl at `https://api.firecrawl.dev`. Self-hosted overrides are allowed only for private/internal endpoints; HTTP is accepted only for those private targets.
 - `FIRECRAWL_BASE_URL` is the shared env fallback for Firecrawl search and scrape base URLs.
 
 ## Configure Firecrawl scrape + web_fetch fallback
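As a rough sketch of how the base URL might be chosen for either search or scrape: the function name `resolveBaseUrl` is hypothetical, and the precedence (explicit config value, then `FIRECRAWL_BASE_URL`, then the hosted default) is an assumption inferred from the word "fallback" in the notes above, not something the diff spells out.

```typescript
// Hosted default taken from the docs.
const DEFAULT_FIRECRAWL_BASE_URL = "https://api.firecrawl.dev";

// Assumed precedence: configured baseUrl, then the shared env fallback,
// then the hosted default. Blank strings are treated as unset.
function resolveBaseUrl(
  configured: string | undefined,
  env: Record<string, string | undefined>,
): string {
  const fromConfig = configured?.trim();
  if (fromConfig) return fromConfig;

  const fromEnv = env["FIRECRAWL_BASE_URL"]?.trim();
  if (fromEnv) return fromEnv;

  return DEFAULT_FIRECRAWL_BASE_URL;
}
```

The same helper would serve both `webSearch.baseUrl` and `webFetch.baseUrl`, which matches the note that the env variable is shared between the two.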
@@ -85,10 +85,19 @@ Notes:
 - Firecrawl fallback attempts run only when an API key is available (`plugins.entries.firecrawl.config.webFetch.apiKey` or `FIRECRAWL_API_KEY`).
 - `maxAgeMs` controls how old cached results can be (ms). Default is 2 days.
 - Legacy `tools.web.fetch.firecrawl.*` config is auto-migrated by `openclaw doctor --fix`.
-- Firecrawl scrape/base URL overrides are restricted to `https://api.firecrawl.dev`.
+- Firecrawl scrape/base URL overrides follow the same hosted/private rule as search: public hosted traffic uses `https://api.firecrawl.dev`; self-hosted overrides must resolve to private/internal endpoints.
 
 `firecrawl_scrape` reuses the same `plugins.entries.firecrawl.config.webFetch.*` settings and env vars.
 
+### Self-hosted Firecrawl
+
+Set `plugins.entries.firecrawl.config.webSearch.baseUrl`,
+`plugins.entries.firecrawl.config.webFetch.baseUrl`, or `FIRECRAWL_BASE_URL`
+when you run Firecrawl yourself. OpenClaw accepts `http://` only for loopback,
+private-network, `.local`, `.internal`, or `.localhost` targets. Public custom
+hosts are rejected so Firecrawl API keys are not sent to arbitrary endpoints by
+accident.
+
 ## Firecrawl plugin tools
 
 ### `firecrawl_search`
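The hosted/private rule from the "Self-hosted Firecrawl" section can be approximated with a hostname-only check. The sketch below is illustrative and is not the code from this commit: all function names are hypothetical, "private-network" is assumed to mean the RFC 1918 IPv4 ranges plus loopback, and no DNS resolution is performed even though the docs say overrides must resolve to private endpoints.

```typescript
const HOSTED_HOST = "api.firecrawl.dev";
const PRIVATE_SUFFIXES = [".local", ".internal", ".localhost"];

// True for dotted-quad IPv4 loopback (127/8) and RFC 1918 ranges.
function isPrivateIPv4(host: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  const octets = m.slice(1).map(Number);
  if (octets.some((o) => o > 255)) return false;
  const [a, b] = octets;
  return (
    a === 127 ||
    a === 10 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

// Loopback names, private suffixes, or private IPv4 literals.
function isPrivateHost(hostname: string): boolean {
  const host = hostname.toLowerCase();
  if (host === "localhost" || host === "[::1]") return true;
  if (PRIVATE_SUFFIXES.some((suffix) => host.endsWith(suffix))) return true;
  return isPrivateIPv4(host);
}

// Hosted traffic must use https://api.firecrawl.dev; any other host is
// accepted (over http or https) only when it looks private. Public custom
// hosts are rejected.
function isAllowedFirecrawlBaseUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false;
  }
  if (url.protocol !== "https:" && url.protocol !== "http:") return false;
  if (url.hostname === HOSTED_HOST) return url.protocol === "https:";
  return isPrivateHost(url.hostname);
}
```

Under these assumptions, `http://localhost:3002` and `http://firecrawl.internal` pass, while `https://firecrawl.example.com` and `http://api.firecrawl.dev` are rejected. A production check would likely also need to resolve the hostname and cover additional IPv6 private ranges.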
@@ -126,8 +126,9 @@ Legacy `tools.web.fetch.firecrawl.*` config is auto-migrated by `openclaw doctor
 </Note>
 
 <Note>
-Firecrawl `baseUrl` overrides are locked down: they must use `https://` and
-the official Firecrawl host (`api.firecrawl.dev`).
+Firecrawl `baseUrl` overrides are locked down: hosted traffic uses
+`https://api.firecrawl.dev`; self-hosted overrides must target private or
+internal endpoints, and `http://` is accepted only for those private targets.
 </Note>
 
 Current runtime behavior: