mirror of
https://github.com/openclaw/openclaw.git
synced 2026-05-04 22:50:22 +00:00
docs(tools): restructure web tools IA and rewrite web.md
Navigation restructure:

- "Browser" group -> "Web Browser"
- New "Web Tools" group containing Web Fetch, Web Search, and all 7 search provider sub-pages
- Other tools (btw, diffs, etc.) stay at top level

New page:

- tools/web-fetch.md: dedicated web_fetch reference with Steps, config, Firecrawl fallback, limits

Rewritten page:

- tools/web.md: now "Web Search" -- a focused search overview with Steps quick-start, CardGroup provider picker, Tabs for key storage, provider comparison table, auto-detection, parameters, and examples. Removed all inline provider setup (lives in sub-pages) and web_fetch content (now in the dedicated page).

Final sidebar:

Tools
├── Web Browser (browser, login, troubleshooting)
├── Web Tools
│   ├── Web Fetch
│   ├── Web Search
│   └── Brave / Firecrawl / Gemini / Grok / Kimi / Perplexity / Tavily
├── btw, diffs, exec, ...
135
docs/tools/web-fetch.md
Normal file
@@ -0,0 +1,135 @@
---
summary: "web_fetch tool -- HTTP fetch with readable content extraction"
read_when:
  - You want to fetch a URL and extract readable content
  - You need to configure web_fetch or its Firecrawl fallback
  - You want to understand web_fetch limits and caching
title: "Web Fetch"
sidebarTitle: "Web Fetch"
---

# Web Fetch

The `web_fetch` tool does a plain HTTP GET and extracts readable content (HTML to markdown or text). It does **not** execute JavaScript.

For JS-heavy sites or login-protected pages, use the [Web Browser](/tools/browser) instead.

## Quick start

`web_fetch` is **enabled by default** -- no configuration needed. The agent can call it immediately:

```javascript
await web_fetch({ url: "https://example.com/article" });
```

## Tool parameters

| Parameter     | Type     | Description                              |
| ------------- | -------- | ---------------------------------------- |
| `url`         | `string` | URL to fetch (required, http/https only) |
| `extractMode` | `string` | `"markdown"` (default) or `"text"`       |
| `maxChars`    | `number` | Truncate output to this many chars       |

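The constraints in the table (http/https only, two extract modes) can be sketched as a small validation helper. This is an illustrative sketch with a hypothetical `validateFetchArgs` name, not the actual openclaw code:

```javascript
// Hypothetical pre-flight check mirroring the parameter table above.
function validateFetchArgs(args) {
  const url = new URL(args.url); // throws on malformed input
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`unsupported protocol: ${url.protocol}`);
  }
  if (args.extractMode && !["markdown", "text"].includes(args.extractMode)) {
    throw new Error(`invalid extractMode: ${args.extractMode}`);
  }
  return args;
}
```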
## How it works

<Steps>
  <Step title="Fetch">
    Sends an HTTP GET with a Chrome-like User-Agent and `Accept-Language` header. Blocks private/internal hostnames and re-checks redirects.
  </Step>
  <Step title="Extract">
    Runs Readability (main-content extraction) on the HTML response.
  </Step>
  <Step title="Fallback (optional)">
    If Readability fails and Firecrawl is configured, retries through the Firecrawl API with bot-circumvention mode.
  </Step>
  <Step title="Cache">
    Results are cached for 15 minutes (configurable) to reduce repeated fetches of the same URL.
  </Step>
</Steps>

## Config

```json5
{
  tools: {
    web: {
      fetch: {
        enabled: true, // default: true
        maxChars: 50000, // max output chars
        maxCharsCap: 50000, // hard cap for the maxChars parameter
        maxResponseBytes: 2000000, // max download size before truncation
        timeoutSeconds: 30,
        cacheTtlMinutes: 15,
        maxRedirects: 3,
        readability: true, // use Readability extraction
        userAgent: "Mozilla/5.0 ...", // override User-Agent
      },
    },
  },
}
```

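The `cacheTtlMinutes` setting drives the per-URL result cache from the Cache step. A minimal sketch of such a TTL cache, with an injectable clock for testing (assumed shape; the real cache lives inside the gateway):

```javascript
// Minimal TTL cache keyed by URL, as described in the Cache step.
function makeFetchCache(ttlMinutes = 15, now = Date.now) {
  const entries = new Map();
  return {
    get(url) {
      const entry = entries.get(url);
      if (!entry) return undefined;
      if (now() - entry.at > ttlMinutes * 60_000) {
        entries.delete(url); // expired
        return undefined;
      }
      return entry.value;
    },
    set(url, value) {
      entries.set(url, { value, at: now() });
    },
  };
}
```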
## Firecrawl fallback

If Readability extraction fails, `web_fetch` can fall back to [Firecrawl](/tools/firecrawl) for bot circumvention and better extraction:

```json5
{
  tools: {
    web: {
      fetch: {
        firecrawl: {
          enabled: true,
          apiKey: "fc-...", // optional if FIRECRAWL_API_KEY is set
          baseUrl: "https://api.firecrawl.dev",
          onlyMainContent: true,
          maxAgeMs: 86400000, // cache duration (1 day)
          timeoutSeconds: 60,
        },
      },
    },
  },
}
```

`tools.web.fetch.firecrawl.apiKey` supports SecretRef objects.

<Note>
  If Firecrawl is enabled and its SecretRef is unresolved with no `FIRECRAWL_API_KEY` env fallback, gateway startup fails fast.
</Note>

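The fallback ordering above (Readability first, Firecrawl only when extraction fails and the fallback is enabled) can be sketched as follows. The helper names here are hypothetical, not openclaw's internals:

```javascript
// Sketch of the extraction pipeline: try Readability, then Firecrawl.
async function extractContent(url, { readability, firecrawl }) {
  const primary = await readability(url);
  if (primary) return { source: "readability", content: primary };
  if (firecrawl?.enabled) {
    const rescued = await firecrawl.scrape(url);
    if (rescued) return { source: "firecrawl", content: rescued };
  }
  throw new Error(`web_fetch: could not extract readable content from ${url}`);
}
```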
## Limits and safety

- `maxChars` is clamped to `tools.web.fetch.maxCharsCap`
- Response body is capped at `maxResponseBytes` before parsing; oversized responses are truncated with a warning
- Private/internal hostnames are blocked
- Redirects are checked and limited by `maxRedirects`
- `web_fetch` is best-effort -- some sites need the [Web Browser](/tools/browser)

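The first rule above -- clamping a caller's `maxChars` to the configured cap -- amounts to a one-line min with config defaults. A sketch under the assumption that both settings default to 50000, as shown in the Config section:

```javascript
// Clamp the per-call maxChars against the configured hard cap.
function effectiveMaxChars(requested, config) {
  const fallback = config.maxChars ?? 50000;
  const cap = config.maxCharsCap ?? 50000;
  return Math.min(requested ?? fallback, cap);
}
```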
## Tool profiles

If you use tool profiles or allowlists, add `web_fetch` or `group:web`:

```json5
{
  tools: {
    allow: ["web_fetch"],
    // or: allow: ["group:web"] (includes both web_fetch and web_search)
  },
}
```

## Related

- [Web Search](/tools/web) -- search the web with multiple providers
- [Web Browser](/tools/browser) -- full browser automation for JS-heavy sites
- [Firecrawl](/tools/firecrawl) -- Firecrawl search and scrape tools