---
summary: web_fetch tool -- HTTP fetch with readable content extraction
read_when:
title: Web Fetch
sidebarTitle: Web Fetch
---

# Web Fetch
The `web_fetch` tool performs a plain HTTP GET and extracts readable content (HTML to markdown or text). It does not execute JavaScript. For JS-heavy sites or login-protected pages, use the Web Browser instead.
## Quick start

`web_fetch` is enabled by default -- no configuration needed. The agent can call it immediately:

```js
await web_fetch({ url: "https://example.com/article" });
```
## Tool parameters

| Parameter     | Type   | Description                               |
|---------------|--------|-------------------------------------------|
| `url`         | string | URL to fetch (required, http/https only)  |
| `extractMode` | string | `"markdown"` (default) or `"text"`        |
| `maxChars`    | number | Truncate output to this many chars        |
## How it works

1. Sends an HTTP GET with a Chrome-like User-Agent and `Accept-Language` header.
2. Blocks private/internal hostnames and re-checks every redirect hop.
3. Runs Readability (main-content extraction) on the HTML response.
4. If Readability fails and Firecrawl is configured, retries through the Firecrawl API with bot-circumvention mode.
5. Caches results for 15 minutes (configurable) to reduce repeated fetches of the same URL.

## Config

```json5
{
  tools: {
    web: {
      fetch: {
        enabled: true, // default: true
        maxChars: 50000, // max output chars
        maxCharsCap: 50000, // hard cap for maxChars param
        maxResponseBytes: 2000000, // max download size before truncation
        timeoutSeconds: 30,
        cacheTtlMinutes: 15,
        maxRedirects: 3,
        readability: true, // use Readability extraction
        userAgent: "Mozilla/5.0 ...", // override User-Agent
      },
    },
  },
}
```
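The private-hostname blocking described above can be sketched roughly as follows. This is a simplified illustration, not the actual openclaw implementation: `isPrivateHost` is a hypothetical name, and a real guard would also resolve DNS and handle IPv6.

```typescript
// Hypothetical sketch of a private-hostname guard: reject loopback,
// RFC 1918 ranges, and common internal suffixes before fetching.
function isPrivateHost(hostname: string): boolean {
  const h = hostname.toLowerCase();
  if (h === "localhost" || h.endsWith(".local") || h.endsWith(".internal")) {
    return true;
  }
  const m = h.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (!m) return false; // not an IPv4 literal; a real check would resolve DNS first
  const a = Number(m[1]);
  const b = Number(m[2]);
  if (a === 127 || a === 10) return true;           // 127/8 loopback, 10/8 private
  if (a === 172 && b >= 16 && b <= 31) return true; // 172.16/12 private
  if (a === 192 && b === 168) return true;          // 192.168/16 private
  return false;
}
```

Because redirects are re-checked, a guard like this has to run on every hop, not just the initial URL.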
## Firecrawl fallback

If Readability extraction fails, `web_fetch` can fall back to Firecrawl for bot circumvention and better extraction:
```json5
{
  tools: {
    web: {
      fetch: {
        firecrawl: {
          enabled: true,
          apiKey: "fc-...", // optional if FIRECRAWL_API_KEY is set
          baseUrl: "https://api.firecrawl.dev",
          onlyMainContent: true,
          maxAgeMs: 86400000, // cache duration (1 day)
          timeoutSeconds: 60,
        },
      },
    },
  },
}
```
`tools.web.fetch.firecrawl.apiKey` supports `SecretRef` objects.
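The fallback order reduces to: try Readability first, then Firecrawl only if it is enabled. A minimal sketch with injected extractors -- `extractContent` and both extractor parameters are illustrative names, not the real openclaw API:

```typescript
type Extractor = (url: string) => Promise<string>;

// Try the primary Readability path; on failure, fall back to Firecrawl
// when a fallback extractor is configured, otherwise rethrow.
async function extractContent(
  url: string,
  readability: Extractor,
  firecrawl?: Extractor,
): Promise<string> {
  try {
    return await readability(url);
  } catch (err) {
    if (firecrawl) return firecrawl(url);
    throw err;
  }
}
```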
## Limits and safety

- `maxChars` is clamped to `tools.web.fetch.maxCharsCap`
- Response body is capped at `maxResponseBytes` before parsing; oversized responses are truncated with a warning
- Private/internal hostnames are blocked
- Redirects are checked and limited by `maxRedirects`
- `web_fetch` is best-effort -- some sites need the Web Browser
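The clamping rule is simple: a caller's `maxChars` can lower the output limit but never raise it past `maxCharsCap`. A sketch, using a hypothetical helper name:

```typescript
// Effective output limit: the requested maxChars, defaulting to the
// configured maximum, and never exceeding the hard cap.
function effectiveMaxChars(
  requested: number | undefined,
  configured: number, // tools.web.fetch.maxChars
  cap: number,        // tools.web.fetch.maxCharsCap
): number {
  return Math.min(requested ?? configured, cap);
}
```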
## Tool profiles

If you use tool profiles or allowlists, add `web_fetch` or `group:web`:

```json5
{
  tools: {
    allow: ["web_fetch"],
    // or: allow: ["group:web"] (includes both web_fetch and web_search)
  },
}
```
## Related
- Web Search -- search the web with multiple providers
- Web Browser -- full browser automation for JS-heavy sites
- Firecrawl -- Firecrawl search and scrape tools