---
summary: web_fetch tool -- HTTP fetch with readable content extraction
read_when:
title: Web Fetch
sidebarTitle: Web Fetch
---

# Web Fetch
The `web_fetch` tool performs a plain HTTP GET and extracts readable content (HTML to markdown or text). It does not execute JavaScript. For JS-heavy sites or login-protected pages, use the Web Browser instead.
## Quick start

`web_fetch` is enabled by default -- no configuration needed. The agent can call it immediately:

```js
await web_fetch({ url: "https://example.com/article" });
```
## Tool parameters

| Parameter     | Type   | Description                               |
|---------------|--------|-------------------------------------------|
| `url`         | string | URL to fetch (required, http/https only)  |
| `extractMode` | string | `"markdown"` (default) or `"text"`        |
| `maxChars`    | number | Truncate output to this many chars        |
## How it works

1. Sends an HTTP GET with a Chrome-like User-Agent and `Accept-Language` header.
2. Blocks private/internal hostnames and re-checks every redirect hop.
3. Runs Readability (main-content extraction) on the HTML response.
4. If Readability fails and Firecrawl is configured, retries through the Firecrawl API with bot-circumvention mode.
5. Caches results for 15 minutes (configurable) to reduce repeated fetches of the same URL.

## Config

```json5
{
  tools: {
    web: {
      fetch: {
        enabled: true, // default: true
        maxChars: 50000, // max output chars
        maxCharsCap: 50000, // hard cap for maxChars param
        maxResponseBytes: 2000000, // max download size before truncation
        timeoutSeconds: 30,
        cacheTtlMinutes: 15,
        maxRedirects: 3,
        readability: true, // use Readability extraction
        userAgent: "Mozilla/5.0 ...", // override User-Agent
      },
    },
  },
}
```
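The private-hostname blocking described above can be sketched roughly as follows. This is a simplified illustration, not the actual openclaw implementation: `isPrivateHost` is a hypothetical name, and a real guard would also resolve DNS and handle IPv6.

```typescript
// Hypothetical sketch of a private-hostname guard: reject loopback,
// RFC 1918 ranges, and common internal suffixes before fetching.
function isPrivateHost(hostname: string): boolean {
  const h = hostname.toLowerCase();
  if (h === "localhost" || h.endsWith(".local") || h.endsWith(".internal")) {
    return true;
  }
  const m = h.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (!m) return false; // not an IPv4 literal; a real check would resolve DNS first
  const a = Number(m[1]);
  const b = Number(m[2]);
  if (a === 127 || a === 10) return true;           // 127/8 loopback, 10/8 private
  if (a === 172 && b >= 16 && b <= 31) return true; // 172.16/12 private
  if (a === 192 && b === 168) return true;          // 192.168/16 private
  return false;
}
```

Because redirects are re-checked, a guard like this has to run on every hop, not just the initial URL.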
## Firecrawl fallback

If Readability extraction fails, `web_fetch` can fall back to Firecrawl for bot circumvention and better extraction:
```json5
{
  tools: {
    web: {
      fetch: {
        firecrawl: {
          enabled: true,
          apiKey: "fc-...", // optional if FIRECRAWL_API_KEY is set
          baseUrl: "https://api.firecrawl.dev",
          onlyMainContent: true,
          maxAgeMs: 86400000, // cache duration (1 day)
          timeoutSeconds: 60,
        },
      },
    },
  },
}
```
`tools.web.fetch.firecrawl.apiKey` supports `SecretRef` objects.
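The fallback order reduces to: try Readability first, then Firecrawl only if it is enabled. A minimal sketch with injected extractors -- `extractContent` and both extractor parameters are illustrative names, not the real openclaw API:

```typescript
type Extractor = (url: string) => Promise<string>;

// Try the primary Readability path; on failure, fall back to Firecrawl
// when a fallback extractor is configured, otherwise rethrow.
async function extractContent(
  url: string,
  readability: Extractor,
  firecrawl?: Extractor,
): Promise<string> {
  try {
    return await readability(url);
  } catch (err) {
    if (firecrawl) return firecrawl(url);
    throw err;
  }
}
```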
## Limits and safety

- `maxChars` is clamped to `tools.web.fetch.maxCharsCap`
- Response body is capped at `maxResponseBytes` before parsing; oversized responses are truncated with a warning
- Private/internal hostnames are blocked
- Redirects are checked and limited by `maxRedirects`
- `web_fetch` is best-effort -- some sites need the Web Browser
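The clamping rule is simple: a caller's `maxChars` can lower the output limit but never raise it past `maxCharsCap`. A sketch, using a hypothetical helper name:

```typescript
// Effective output limit: the requested maxChars, defaulting to the
// configured maximum, and never exceeding the hard cap.
function effectiveMaxChars(
  requested: number | undefined,
  configured: number, // tools.web.fetch.maxChars
  cap: number,        // tools.web.fetch.maxCharsCap
): number {
  return Math.min(requested ?? configured, cap);
}
```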
## Tool profiles

If you use tool profiles or allowlists, add `web_fetch` or `group:web`:

```json5
{
  tools: {
    allow: ["web_fetch"],
    // or: allow: ["group:web"] (includes both web_fetch and web_search)
  },
}
```
## Related
- Web Search -- search the web with multiple providers
- Web Browser -- full browser automation for JS-heavy sites
- Firecrawl -- Firecrawl search and scrape tools