feat: integrate Upstash Vector for enhanced document retrieval in chat API

- Implemented Upstash Vector as a cloud storage backend for document chunks, complementing the existing local LanceDB option.
- Added auto-detection of the storage backend based on environment variables, so no extra configuration is needed to switch modes.
- Updated the chat API to use the new unified retriever for document retrieval.
- Enhanced README with setup instructions for Upstash and updated environment variable requirements.
- Introduced new scripts and configurations for managing the vector index and API interactions.
Buns Enchantress
2026-02-03 04:39:04 -06:00
parent 4a543d15d1
commit 4b42a54452
16 changed files with 907 additions and 122 deletions

.gitignore

@@ -73,4 +73,5 @@ USER.md
.serena/
# docs-chat vector database
scripts/docs-chat/.lance-db/
.lancedb
.lance-db

docs/assets/docs-chat-config.js

@@ -0,0 +1,17 @@
/**
* Configuration for the docs-chat widget.
* Automatically selects API URL based on Mintlify environment.
*/
(() => {
const hostname = window.location.hostname;
// Mintlify local dev (mintlify dev runs on localhost)
if (hostname === "localhost" || hostname === "127.0.0.1") {
window.DOCS_CHAT_API_URL = "http://localhost:3001";
return;
}
// Production (docs.openclaw.ai and *.mintlify.app previews)
// TODO: Update this to the actual production URL
window.DOCS_CHAT_API_URL = "https://claw-api.openknot.ai";
})();
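// Sketch: if *.mintlify.app previews should target a staging deployment
// instead of production, a branch like this could be added before the
// fallback above (the staging URL is a hypothetical placeholder):
//
// if (hostname.endsWith(".mintlify.app")) {
//   window.DOCS_CHAT_API_URL = "https://docs-chat-staging.vercel.app";
//   return;
// }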

package.json

@@ -174,6 +174,7 @@
"@sinclair/typebox": "0.34.48",
"@slack/bolt": "^4.6.0",
"@slack/web-api": "^7.13.0",
"@upstash/vector": "^1.2.2",
"@whiskeysockets/baileys": "7.0.0-rc.9",
"ajv": "^8.17.1",
"chalk": "^5.6.2",

pnpm-lock.yaml

@@ -75,6 +75,9 @@ importers:
'@slack/web-api':
specifier: ^7.13.0
version: 7.13.0
'@upstash/vector':
specifier: ^1.2.2
version: 1.2.2
'@whiskeysockets/baileys':
specifier: 7.0.0-rc.9
version: 7.0.0-rc.9(audio-decode@2.2.3)(sharp@0.34.5)
@@ -2772,6 +2775,9 @@ packages:
resolution: {integrity: sha512-IlqQ/Gv22xUC1r/WQm4StLkYQmaaTsXAhUVsNE0+xiyf0yRFiH5++q78U3bw6bLKDCTmh0uqKB9eG9+Bt75Dkg==}
engines: {node: '>=20.0.0'}
'@upstash/vector@1.2.2':
resolution: {integrity: sha512-ptQ9xnxtKqmpNK52PCcHCszlPOLxIBfjsv7ty8RoF95pkjctS9rSjTQ3Pl9bx5VFbpDj+0dMXw88WLt6swDkgQ==}
'@urbit/aura@3.0.0':
resolution: {integrity: sha512-N8/FHc/lmlMDCumMuTXyRHCxlov5KZY6unmJ9QR2GOw+OpROZMBsXYGwE+ZMtvN21ql9+Xb8KhGNBj08IrG3Wg==}
engines: {node: '>=16', npm: '>=8'}
@@ -8018,6 +8024,8 @@ snapshots:
transitivePeerDependencies:
- supports-color
'@upstash/vector@1.2.2': {}
'@urbit/aura@3.0.0': {}
'@urbit/http-api@3.0.0':

scripts/docs-chat/README.md

@@ -3,79 +3,6 @@
Docs chatbot that uses RAG (Retrieval-Augmented Generation) to answer questions
from the OpenClaw documentation via semantic search.
## RAG Pipeline (Recommended)
The vector-based RAG pipeline uses OpenAI embeddings and LanceDB for semantic
search. This provides much better results than keyword matching.
### Build the vector index
```bash
OPENAI_API_KEY=sk-... pnpm docs:chat:index:vector
```
This generates embeddings for all doc chunks and stores them in
`scripts/docs-chat/.lance-db/` (gitignored).
### Run the RAG API
```bash
OPENAI_API_KEY=sk-... pnpm docs:chat:serve:vector
```
Defaults to `http://localhost:3001`. Optional environment variables:
| Variable | Default | Description |
|----------|---------|-------------|
| `PORT` | `3001` | Server port |
| `RATE_LIMIT` | `20` | Max requests per window per IP |
| `RATE_WINDOW_MS` | `60000` | Rate limit window in milliseconds |
Health check:
```bash
curl http://localhost:3001/health
# Returns: {"ok":true,"chunks":N,"mode":"vector"}
```
## Legacy Keyword Pipeline
The original keyword-based implementation is still available for backward
compatibility.
### Build the keyword index
```bash
pnpm docs:chat:index
```
This generates `scripts/docs-chat/search-index.json` from `docs/**/*.md`.
### Run the keyword API
```bash
OPENAI_API_KEY=sk-... pnpm docs:chat:serve
```
## Pipeline Integration
CI rebuilds the keyword index whenever docs change so PRs keep
`scripts/docs-chat/search-index.json` in sync. For production deployments with
RAG, run `pnpm docs:chat:index:vector` during deploy.
## Mintlify widget
Mintlify loads any `.js` in the docs content directory on every page.
`docs/assets/docs-chat-widget.js` injects a floating "Ask Molty" button and
calls the API at:
```
window.DOCS_CHAT_API_URL || "http://localhost:3001"
```
To use a deployed API, set `window.DOCS_CHAT_API_URL` before the widget runs
(for example by adding another small `.js` file in `docs/assets/` that sets it).
## Architecture
```
@@ -89,12 +16,183 @@ docs/**/*.md
┌─────────────────┐
│  Vector Store   │  Upstash (cloud) or LanceDB (local)
└────────┬────────┘
┌─────────────────┐
│   API Server    │  Hybrid Retrieval (Vector + Keyword Boost)
│    serve.ts     │  → GPT-4o-mini Streaming Response
└─────────────────┘
```
## Storage Backends
The pipeline supports two vector storage backends, auto-detected based on
environment variables:
| Backend | When Used | Best For |
| ----------- | --------------------------------------------- | ------------------------- |
| **LanceDB** | Default (no Upstash credentials) | Local dev, POC, testing |
| **Upstash** | When `UPSTASH_VECTOR_REST_*` env vars are set | Production, Vercel deploy |
**Recommendation:** For production deployments, use Upstash Vector for its
serverless scalability and Vercel compatibility. LanceDB is great for local
development and proof-of-concept work without external dependencies.
## Quick Start (Local with LanceDB)
For local development without external services:
```bash
# Only OPENAI_API_KEY is required - uses LanceDB automatically
OPENAI_API_KEY=sk-... pnpm docs:chat:index:vector
OPENAI_API_KEY=sk-... pnpm docs:chat:serve:vector
```
The index is stored locally in `scripts/docs-chat/.lancedb/`.
## Production Setup (Upstash Vector)
### 1. Create Upstash Vector Index
1. Go to [Upstash Console](https://console.upstash.com/)
2. Create a new Vector index with:
- **Dimensions:** 3072 (for `text-embedding-3-large`)
- **Distance Metric:** Cosine
3. Copy the REST URL and token
### 2. Environment Variables
| Variable | Required | Description |
| --------------------------- | -------- | ------------------------------------ |
| `OPENAI_API_KEY` | Yes | OpenAI API key for embeddings + chat |
| `UPSTASH_VECTOR_REST_URL` | No\* | Upstash Vector REST endpoint |
| `UPSTASH_VECTOR_REST_TOKEN` | No\* | Upstash Vector REST token |
\* Required for Upstash mode; omit both for LanceDB mode.
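For local runs, the variables can be exported once in the shell instead of being prefixed on every command (a sketch; all values are placeholders copied from the Upstash console and OpenAI dashboard):

```bash
export OPENAI_API_KEY=sk-...
export UPSTASH_VECTOR_REST_URL=https://your-index.upstash.io
export UPSTASH_VECTOR_REST_TOKEN=...
```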
### 3. Build the Vector Index
```bash
OPENAI_API_KEY=sk-... \
UPSTASH_VECTOR_REST_URL=https://... \
UPSTASH_VECTOR_REST_TOKEN=... \
pnpm docs:chat:index:vector
```
This generates embeddings for all doc chunks and upserts them to Upstash Vector.
### 4. Deploy to Vercel
```bash
cd scripts/docs-chat
npm install
vercel
```
Set the environment variables in the Vercel dashboard.
## Local Development
### Run the API locally
```bash
# With Upstash (cloud):
OPENAI_API_KEY=sk-... \
UPSTASH_VECTOR_REST_URL=https://... \
UPSTASH_VECTOR_REST_TOKEN=... \
pnpm docs:chat:serve:vector
# With LanceDB (local):
OPENAI_API_KEY=sk-... pnpm docs:chat:serve:vector
```
Defaults to `http://localhost:3001`. Optional environment variables:
| Variable | Default | Description |
| ---------------- | ------- | ---------------------------------------------- |
| `PORT` | `3001` | Server port |
| `RATE_LIMIT` | `20` | Max requests per window per IP (Upstash only) |
| `RATE_WINDOW_MS` | `60000` | Rate limit window in milliseconds (Upstash only) |
> **Note:** Rate limiting is only enforced in Upstash (production) mode. Local
> development with LanceDB has no rate limits.
### Health check
```bash
curl http://localhost:3001/health
# Returns: {"ok":true,"chunks":N,"mode":"upstash"} # or "lancedb"
```
## Mintlify Widget
Mintlify loads `.js` files from the docs content directory on every page.
- `docs/assets/docs-chat-config.js` - Sets the API URL
- `docs/assets/docs-chat-widget.js` - The chat widget
To configure the production API URL, edit `docs/assets/docs-chat-config.js`:
```javascript
window.DOCS_CHAT_API_URL = "https://your-project.vercel.app";
```
## API Endpoints
### POST /chat
Send a message and receive a streaming response.
**Request:**
```json
{ "message": "How do I configure the gateway?" }
```
**Response:** Streaming text/plain with the AI response.
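For example, the stream can be exercised with `curl` (`-N` disables output buffering; the domain is a placeholder):

```bash
curl -N -X POST https://your-project.vercel.app/chat \
  -H "Content-Type: application/json" \
  -d '{"message":"How do I configure the gateway?"}'
```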
### GET /health
Check API health and vector count.
**Response:**
```json
{ "ok": true, "chunks": 847, "mode": "upstash-vector" }
```
## Legacy Pipelines
### Keyword-based search
The keyword-based implementation is still available for backward compatibility:
```bash
pnpm docs:chat:index # Build keyword index
pnpm docs:chat:serve # Run keyword API
```
## File Structure
```
scripts/docs-chat/
├── api/
│ ├── chat.ts # Vercel serverless function for chat
│ └── health.ts # Vercel serverless function for health check
├── rag/
│ ├── embeddings.ts # OpenAI embeddings wrapper
│ ├── retriever-factory.ts # Unified retriever (works with any store)
│ ├── retriever-upstash.ts # Legacy Upstash-specific retriever
│ ├── retriever.ts # Legacy LanceDB retriever
│ ├── store-factory.ts # Auto-selects Upstash or LanceDB
│ ├── store-upstash.ts # Upstash Vector store
│ └── store.ts # LanceDB store (local)
├── build-vector-index.ts # Index builder script
├── serve.ts # Local dev server
├── package.json # Standalone package for Vercel
├── tsconfig.json # TypeScript config
├── vercel.json # Vercel deployment config
└── README.md
```

scripts/docs-chat/api/chat.ts

@@ -0,0 +1,191 @@
/**
* Vercel serverless function for docs-chat API.
* Handles RAG-based question answering with streaming responses.
*
* Environment variables:
* OPENAI_API_KEY - for embeddings and chat completions
* UPSTASH_VECTOR_REST_URL - Upstash Vector endpoint
* UPSTASH_VECTOR_REST_TOKEN - Upstash Vector auth token
*/
import type { VercelRequest, VercelResponse } from "@vercel/node";
import { Embeddings } from "../rag/embeddings.js";
import { DocsStore } from "../rag/store-upstash.js";
import { Retriever } from "../rag/retriever-upstash.js";
const MAX_MESSAGE_LENGTH = 2000;
const corsHeaders: Record<string, string> = {
"Access-Control-Allow-Origin": "*",
"Access-Control-Allow-Methods": "GET, POST, OPTIONS",
"Access-Control-Allow-Headers": "Content-Type",
};
function sendJson(
res: VercelResponse,
status: number,
body: Record<string, unknown>,
) {
Object.entries(corsHeaders).forEach(([key, value]) => {
res.setHeader(key, value);
});
res.status(status).json(body);
}
async function streamOpenAI(
apiKey: string,
systemPrompt: string,
userMessage: string,
res: VercelResponse,
) {
const response = await fetch("https://api.openai.com/v1/chat/completions", {
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: `Bearer ${apiKey}`,
},
body: JSON.stringify({
model: "gpt-4o-mini",
stream: true,
messages: [
{ role: "system", content: systemPrompt },
{ role: "user", content: userMessage },
],
}),
});
if (!response.ok || !response.body) {
const errorText = await response.text();
throw new Error(`OpenAI ${response.status}: ${errorText}`);
}
const decoder = new TextDecoder();
let buffer = "";
for await (const chunk of response.body as AsyncIterable<Uint8Array>) {
buffer += decoder.decode(chunk, { stream: true });
const lines = buffer.split("\n");
buffer = lines.pop() ?? "";
for (const line of lines) {
const trimmed = line.trim();
if (!trimmed.startsWith("data:")) continue;
const data = trimmed.slice(5).trim();
if (data === "[DONE]") return;
try {
const json = JSON.parse(data);
const delta = json.choices?.[0]?.delta?.content;
if (delta) {
res.write(delta);
}
} catch {
// Ignore malformed SSE lines
}
}
}
}
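// For reference, the raw SSE lines parsed above look like this (standard
// OpenAI streaming format; abridged illustration, not captured output):
//
//   data: {"choices":[{"delta":{"content":"Hel"}}]}
//   data: {"choices":[{"delta":{"content":"lo"}}]}
//   data: [DONE]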
export default async function handler(req: VercelRequest, res: VercelResponse) {
// Handle CORS preflight
if (req.method === "OPTIONS") {
Object.entries(corsHeaders).forEach(([key, value]) => {
res.setHeader(key, value);
});
res.status(204).end();
return;
}
// Only accept POST
if (req.method !== "POST") {
sendJson(res, 405, { error: "Method not allowed" });
return;
}
// Validate environment
const apiKey = process.env.OPENAI_API_KEY;
if (!apiKey) {
sendJson(res, 500, { error: "Server configuration error" });
return;
}
// Parse body
let message = "";
try {
const body = typeof req.body === "string" ? JSON.parse(req.body) : req.body;
message = body?.message;
} catch {
sendJson(res, 400, { error: "Invalid JSON" });
return;
}
if (!message || typeof message !== "string") {
sendJson(res, 400, { error: "message required" });
return;
}
const trimmedMessage = message.trim();
if (!trimmedMessage) {
sendJson(res, 400, { error: "message required" });
return;
}
if (trimmedMessage.length > MAX_MESSAGE_LENGTH) {
sendJson(res, 400, {
error: `Message too long (max ${MAX_MESSAGE_LENGTH} characters)`,
});
return;
}
try {
// Initialize RAG components
const embeddings = new Embeddings(apiKey);
const store = new DocsStore();
const retriever = new Retriever(store, embeddings);
// Retrieve relevant docs
const results = await retriever.retrieve(trimmedMessage, 8);
if (results.length === 0) {
Object.entries(corsHeaders).forEach(([key, value]) => {
res.setHeader(key, value);
});
res.setHeader("Content-Type", "text/plain; charset=utf-8");
res.status(200).send(
"I couldn't find relevant documentation excerpts for that question. Try rephrasing or search the docs.",
);
return;
}
// Build context from retrieved chunks
const context = results
.map(
(result) =>
`[${result.chunk.title}](${result.chunk.url})\n${result.chunk.content.slice(0, 1200)}`,
)
.join("\n\n---\n\n");
const systemPrompt =
"You are a helpful assistant for OpenClaw documentation. " +
"Answer only from the provided documentation excerpts. " +
"If the answer is not in the excerpts, say so and suggest checking the docs. " +
"Cite sources by name or URL when relevant.\n\nDocumentation excerpts:\n" +
context;
// Set up streaming response
Object.entries(corsHeaders).forEach(([key, value]) => {
res.setHeader(key, value);
});
res.setHeader("Content-Type", "text/plain; charset=utf-8");
res.setHeader("Transfer-Encoding", "chunked");
await streamOpenAI(apiKey, systemPrompt, trimmedMessage, res);
res.end();
} catch (err) {
console.error("Chat error:", err);
if (!res.headersSent) {
sendJson(res, 500, { error: "Internal server error" });
} else {
res.end("\n\n[Error processing request]");
}
}
}
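A minimal sketch of how a browser client (such as the docs widget) could consume this endpoint; the function name and fallback URL are illustrative, only the request/response shape comes from the handler above:

```ts
// Sketch: read the streamed text/plain chat response incrementally in a browser.
async function askDocs(message: string, onDelta: (text: string) => void) {
  const base = (window as any).DOCS_CHAT_API_URL ?? "http://localhost:3001";
  const res = await fetch(`${base}/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message }),
  });
  if (!res.ok || !res.body) throw new Error(`chat API error ${res.status}`);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    onDelta(decoder.decode(value, { stream: true })); // append to chat UI
  }
}
```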

scripts/docs-chat/api/health.ts

@@ -0,0 +1,40 @@
/**
* Health check endpoint for docs-chat API.
*/
import type { VercelRequest, VercelResponse } from "@vercel/node";
import { DocsStore } from "../rag/store-upstash.js";
const corsHeaders: Record<string, string> = {
"Access-Control-Allow-Origin": "*",
"Access-Control-Allow-Methods": "GET, OPTIONS",
"Access-Control-Allow-Headers": "Content-Type",
};
export default async function handler(req: VercelRequest, res: VercelResponse) {
// Handle CORS preflight
if (req.method === "OPTIONS") {
Object.entries(corsHeaders).forEach(([key, value]) => {
res.setHeader(key, value);
});
res.status(204).end();
return;
}
if (req.method !== "GET") {
res.status(405).json({ error: "Method not allowed" });
return;
}
try {
const store = new DocsStore();
const count = await store.count();
Object.entries(corsHeaders).forEach(([key, value]) => {
res.setHeader(key, value);
});
res.status(200).json({ ok: true, chunks: count, mode: "upstash-vector" });
} catch (err) {
console.error("Health check error:", err);
res.status(500).json({ ok: false, error: "Failed to connect to vector store" });
}
}

scripts/docs-chat/build-vector-index.ts

@@ -3,33 +3,36 @@
* Build a vector search index from docs/*.md for the docs-chat RAG pipeline.
* Usage: bun build-vector-index.ts [--docs path/to/docs] [--base-url https://docs.openclaw.ai]
*
* Requires: OPENAI_API_KEY environment variable
* Requires environment variables:
* OPENAI_API_KEY - for embeddings
*
* Optional (for Upstash cloud store):
* UPSTASH_VECTOR_REST_URL - Upstash Vector endpoint
* UPSTASH_VECTOR_REST_TOKEN - Upstash Vector auth token
*
* If Upstash credentials are not set, falls back to LanceDB (local file store).
*/
import fs from "node:fs";
import path from "node:path";
import { fileURLToPath } from "node:url";
import { randomUUID } from "node:crypto";
import { Embeddings } from "./rag/embeddings.js";
import { DocsStore, type DocsChunk } from "./rag/store.js";
import { createStore, detectStoreMode, type DocsChunk } from "./rag/store-factory.js";
const __dirname = path.dirname(fileURLToPath(import.meta.url));
const root = path.resolve(__dirname, "../..");
const defaultDocsDir = path.join(root, "docs");
const defaultDbPath = path.join(__dirname, ".lancedb"); // match store-factory default
// Parse CLI arguments
const args = process.argv.slice(2);
let docsDir = defaultDocsDir;
let baseUrl = "https://docs.openclaw.ai";
let dbPath = defaultDbPath;
for (let i = 0; i < args.length; i++) {
if (args[i] === "--docs" && args[i + 1]) {
docsDir = path.resolve(args[++i]);
} else if (args[i] === "--base-url" && args[i + 1]) {
baseUrl = args[++i].replace(/\/$/, "");
} else if (args[i] === "--db" && args[i + 1]) {
dbPath = path.resolve(args[++i]);
}
}
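// Example invocations of the flags parsed above (sketch; paths are illustrative):
//   bun build-vector-index.ts --docs ../../docs --base-url https://docs.openclaw.ai
//   bun build-vector-index.ts --db /tmp/docs-chat.lancedb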
@@ -255,13 +258,18 @@ async function main() {
vector: vectors[i],
}));
// Store in LanceDB
console.error(`Storing in LanceDB at: ${dbPath}`);
const store = new DocsStore(dbPath, embeddings.dimensions);
// Store in vector database (auto-detects Upstash or LanceDB)
const storeMode = detectStoreMode();
console.error(
`Storing in ${storeMode === "upstash" ? "Upstash Vector" : "LanceDB (local)"}...`,
);
const { store, mode } = await createStore(undefined, dbPath); // honor --db (LanceDB mode only)
await store.replaceAll(docsChunks);
const count = await store.count();
console.error(`Done! Stored ${count} chunks in vector database.`);
console.error(
`Done! Stored ${count} chunks in ${mode === "upstash" ? "Upstash Vector" : "LanceDB"}.`,
);
}
main().catch((err) => {

scripts/docs-chat/package.json

@@ -0,0 +1,20 @@
{
"name": "openclaw-docs-chat",
"version": "1.0.0",
"private": true,
"description": "RAG-based docs chat API for OpenClaw documentation",
"type": "module",
"scripts": {
"dev": "vercel dev",
"build:index": "bun build-vector-index.ts",
"deploy": "vercel"
},
"dependencies": {
"@upstash/vector": "^1.2.2",
"openai": "^6.17.0"
},
"devDependencies": {
"@vercel/node": "^5.5.28",
"typescript": "^5.9.3"
}
}
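Typical usage of these scripts (a sketch; assumes the Vercel CLI is installed and authenticated):

```bash
cd scripts/docs-chat
npm install
npm run build:index   # bun build-vector-index.ts
npm run dev           # vercel dev (local emulation of the api/ functions)
npm run deploy        # vercel
```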

scripts/docs-chat/rag/retriever-factory.ts

@@ -0,0 +1,76 @@
/**
* Unified Retriever for docs-chat RAG pipeline.
* Works with any IDocsStore implementation (Upstash or LanceDB).
*/
import { Embeddings } from "./embeddings.js";
import type { IDocsStore, DocsChunk } from "./store-factory.js";
export interface RetrievalResult {
chunk: Omit<DocsChunk, "vector">;
score: number;
}
export class Retriever {
constructor(
private readonly store: IDocsStore,
private readonly embeddings: Embeddings,
) {}
/**
* Retrieve relevant chunks using hybrid scoring:
* - Primary: vector similarity search
* - Secondary: keyword boost for exact term matches
*/
async retrieve(query: string, limit: number = 8): Promise<RetrievalResult[]> {
// Generate query embedding
const queryVector = await this.embeddings.embed(query);
// Over-fetch for reranking (2x limit)
const searchResults = await this.store.search(queryVector, limit * 2);
if (searchResults.length === 0) {
return [];
}
// Apply hybrid scoring
const scored = searchResults.map((result) => ({
chunk: result.chunk,
score: this.hybridScore(result.similarity, query, result.chunk),
}));
// Sort by hybrid score and take top-k
scored.sort((a, b) => b.score - a.score);
return scored.slice(0, limit).map((item) => ({
chunk: {
id: item.chunk.id,
path: item.chunk.path,
title: item.chunk.title,
content: item.chunk.content,
url: item.chunk.url,
},
score: item.score,
}));
}
/**
* Compute hybrid score combining vector similarity and keyword boost.
*/
private hybridScore(
vectorSimilarity: number,
query: string,
chunk: DocsChunk,
): number {
const words = query
.toLowerCase()
.split(/\s+/)
.filter((w) => w.length > 2);
const text = `${chunk.title} ${chunk.content}`.toLowerCase();
// Count matching words and apply boost
const matchingWords = words.filter((word) => text.includes(word));
const keywordBoost = matchingWords.length * 0.05;
return vectorSimilarity + keywordBoost;
}
}
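A usage sketch tying the unified retriever to the store factory (names come from the files in this commit; the query string and scores are illustrative):

```ts
import { Embeddings } from "./embeddings.js";
import { createStore } from "./store-factory.js";
import { Retriever } from "./retriever-factory.js";

const { store, mode } = await createStore(); // "upstash" or "lancedb"
const embeddings = new Embeddings(process.env.OPENAI_API_KEY!);
const retriever = new Retriever(store, embeddings);

const results = await retriever.retrieve("How do I configure the gateway?", 8);
// e.g. vector similarity 0.82 + 2 matched keywords * 0.05 boost = 0.92
console.log(mode, results.map((r) => `${r.score.toFixed(2)} ${r.chunk.title}`));
```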

scripts/docs-chat/rag/retriever-upstash.ts

@@ -0,0 +1,76 @@
/**
* Hybrid retriever for docs-chat RAG pipeline (Upstash Vector version).
* Combines vector similarity with keyword boosting for improved relevance.
*/
import { Embeddings } from "./embeddings.js";
import { DocsStore, type DocsChunk, type SearchResult } from "./store-upstash.js";
export interface RetrievalResult {
chunk: Omit<DocsChunk, "vector">;
score: number;
}
export class Retriever {
constructor(
private readonly store: DocsStore,
private readonly embeddings: Embeddings,
) {}
/**
* Retrieve relevant chunks using hybrid scoring:
* - Primary: vector similarity search
* - Secondary: keyword boost for exact term matches
*/
async retrieve(query: string, limit: number = 8): Promise<RetrievalResult[]> {
// Generate query embedding
const queryVector = await this.embeddings.embed(query);
// Over-fetch for reranking (2x limit)
const searchResults = await this.store.search(queryVector, limit * 2);
if (searchResults.length === 0) {
return [];
}
// Apply hybrid scoring
const scored = searchResults.map((result) => ({
chunk: result.chunk,
score: this.hybridScore(result.similarity, query, result.chunk),
}));
// Sort by hybrid score and take top-k
scored.sort((a, b) => b.score - a.score);
return scored.slice(0, limit).map((item) => ({
chunk: {
id: item.chunk.id,
path: item.chunk.path,
title: item.chunk.title,
content: item.chunk.content,
url: item.chunk.url,
},
score: item.score,
}));
}
/**
* Compute hybrid score combining vector similarity and keyword boost.
*/
private hybridScore(
vectorSimilarity: number,
query: string,
chunk: DocsChunk,
): number {
const words = query
.toLowerCase()
.split(/\s+/)
.filter((w) => w.length > 2);
const text = `${chunk.title} ${chunk.content}`.toLowerCase();
// Count matching words and apply boost
const matchingWords = words.filter((word) => text.includes(word));
const keywordBoost = matchingWords.length * 0.05;
return vectorSimilarity + keywordBoost;
}
}

scripts/docs-chat/rag/store-factory.ts

@@ -0,0 +1,69 @@
/**
* Factory for docs-chat vector store.
* Auto-selects Upstash (cloud) or LanceDB (local) based on environment.
*
* Priority:
* 1. If UPSTASH_VECTOR_REST_URL and UPSTASH_VECTOR_REST_TOKEN are set → Upstash
* 2. Otherwise → LanceDB (local file-based store)
*/
import path from "node:path";
import { fileURLToPath } from "node:url";
// Common interfaces shared by both stores
export interface DocsChunk {
id: string;
path: string;
title: string;
content: string;
url: string;
vector: number[];
}
export interface SearchResult {
chunk: DocsChunk;
distance: number;
similarity: number;
}
export interface IDocsStore {
replaceAll(chunks: DocsChunk[]): Promise<void>;
search(vector: number[], limit?: number): Promise<SearchResult[]>;
count(): Promise<number>;
}
export type StoreMode = "upstash" | "lancedb";
const __dirname = path.dirname(fileURLToPath(import.meta.url));
const DEFAULT_LANCEDB_PATH = path.resolve(__dirname, "../.lancedb");
const VECTOR_DIM = 3072; // text-embedding-3-large
/**
* Detect which store backend to use based on environment.
*/
export function detectStoreMode(): StoreMode {
const hasUpstash =
process.env.UPSTASH_VECTOR_REST_URL &&
process.env.UPSTASH_VECTOR_REST_TOKEN;
return hasUpstash ? "upstash" : "lancedb";
}
/**
* Create the appropriate store based on environment.
* Returns the store instance and which mode was selected.
*/
export async function createStore(
mode?: StoreMode,
lancedbPath?: string,
): Promise<{ store: IDocsStore; mode: StoreMode }> {
const selectedMode = mode ?? detectStoreMode();
if (selectedMode === "upstash") {
const { DocsStore } = await import("./store-upstash.js");
return { store: new DocsStore(), mode: "upstash" };
}
// LanceDB (local)
const { DocsStore } = await import("./store.js");
const dbPath = lancedbPath ?? DEFAULT_LANCEDB_PATH;
return { store: new DocsStore(dbPath, VECTOR_DIM), mode: "lancedb" };
}

scripts/docs-chat/rag/store-upstash.ts

@@ -0,0 +1,122 @@
/**
* Upstash Vector storage layer for docs-chat RAG pipeline.
* Stores document chunks with vector embeddings for semantic search.
* Replaces LanceDB for serverless deployment compatibility.
*/
import { Index } from "@upstash/vector";
export interface DocsChunk {
id: string;
path: string;
title: string;
content: string;
url: string;
vector: number[];
}
export interface SearchResult {
chunk: DocsChunk;
distance: number;
similarity: number;
}
interface ChunkMetadata {
path: string;
title: string;
content: string;
url: string;
}
// Upstash Vector has a limit of 1000 vectors per upsert batch
const UPSERT_BATCH_SIZE = 1000;
export class DocsStore {
private index: Index<ChunkMetadata>;
constructor() {
const url = process.env.UPSTASH_VECTOR_REST_URL;
const token = process.env.UPSTASH_VECTOR_REST_TOKEN;
if (!url || !token) {
throw new Error(
"UPSTASH_VECTOR_REST_URL and UPSTASH_VECTOR_REST_TOKEN are required",
);
}
this.index = new Index<ChunkMetadata>({ url, token });
}
/**
* Drop existing vectors and upsert new chunks.
* Used during index rebuild.
*/
async replaceAll(chunks: DocsChunk[]): Promise<void> {
// Reset the index (delete all vectors)
await this.index.reset();
if (chunks.length === 0) {
return;
}
// Upsert in batches to respect API limits
for (let i = 0; i < chunks.length; i += UPSERT_BATCH_SIZE) {
const batch = chunks.slice(i, i + UPSERT_BATCH_SIZE);
const vectors = batch.map((chunk) => ({
id: chunk.id,
vector: chunk.vector,
metadata: {
path: chunk.path,
title: chunk.title,
content: chunk.content,
url: chunk.url,
},
}));
await this.index.upsert(vectors);
console.error(
`Upserted batch ${Math.floor(i / UPSERT_BATCH_SIZE) + 1}/${Math.ceil(chunks.length / UPSERT_BATCH_SIZE)}`,
);
}
}
/**
* Search for similar chunks using vector similarity.
*/
async search(vector: number[], limit: number = 8): Promise<SearchResult[]> {
const results = await this.index.query<ChunkMetadata>({
vector,
topK: limit,
includeMetadata: true,
includeVectors: false,
});
return results.map((result) => {
// Upstash returns cosine similarity score (0-1, higher is more similar)
const similarity = result.score;
// Convert to distance for compatibility with existing code
const distance = 1 - similarity;
const metadata = result.metadata!;
return {
chunk: {
id: result.id as string,
path: metadata.path,
title: metadata.title,
content: metadata.content,
url: metadata.url,
vector: [], // Don't return vector to save memory
},
distance,
similarity,
};
});
}
/**
* Get count of stored chunks.
*/
async count(): Promise<number> {
const info = await this.index.info();
return info.vectorCount;
}
}
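A standalone usage sketch (assumes both `UPSTASH_VECTOR_REST_*` variables are set; the chunk count is illustrative):

```ts
import { DocsStore } from "./store-upstash.js";

const store = new DocsStore(); // throws if UPSTASH_VECTOR_REST_* are missing
console.log(`chunks indexed: ${await store.count()}`);
// replaceAll() with 2,500 chunks issues ceil(2500 / 1000) = 3 upsert batches
```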

scripts/docs-chat/serve.ts

@@ -1,18 +1,17 @@
#!/usr/bin/env bun
/**
* Docs-chat API with RAG (vector search).
* Env: OPENAI_API_KEY, DOCS_CHAT_DB, PORT, RATE_LIMIT, RATE_WINDOW_MS
* Auto-detects Upstash Vector (cloud) or LanceDB (local) based on environment.
*
* Env: OPENAI_API_KEY (required)
* UPSTASH_VECTOR_REST_URL, UPSTASH_VECTOR_REST_TOKEN (optional, for cloud)
* PORT, RATE_LIMIT, RATE_WINDOW_MS
*/
import path from "node:path";
import { fileURLToPath } from "node:url";
import http from "node:http";
import { Embeddings } from "./rag/embeddings.js";
import { DocsStore } from "./rag/store.js";
import { Retriever } from "./rag/retriever.js";
import { createStore, type IDocsStore, type StoreMode } from "./rag/store-factory.js";
import { Retriever } from "./rag/retriever-factory.js";
const __dirname = path.dirname(fileURLToPath(import.meta.url));
const defaultDbPath = path.join(__dirname, ".lance-db");
const dbPath = process.env.DOCS_CHAT_DB || defaultDbPath;
const port = Number(process.env.PORT || 3001);
// Rate limiting configuration
@@ -78,10 +77,11 @@ if (!apiKey) {
process.exit(1);
}
// Initialize RAG components
// RAG components (initialized async before server starts)
let store: IDocsStore;
let storeMode: StoreMode;
let retriever: Retriever;
const embeddings = new Embeddings(apiKey);
const store = new DocsStore(dbPath, embeddings.dimensions);
const retriever = new Retriever(store, embeddings);
const corsHeaders: Record<string, string> = {
"Access-Control-Allow-Origin": "*",
@@ -241,28 +241,29 @@ const server = http.createServer(async (req, res) => {
if (req.method === "GET" && (req.url === "/" || req.url === "/health")) {
const count = await store.count();
sendJson(res, 200, { ok: true, chunks: count, mode: "vector" });
sendJson(res, 200, { ok: true, chunks: count, mode: storeMode });
return;
}
if (req.method === "POST" && req.url === "/chat") {
// Apply rate limiting
const clientIP = getClientIP(req);
const rateCheck = checkRateLimit(clientIP);
// Only apply rate limiting in production (Upstash) mode
if (storeMode === "upstash") {
const clientIP = getClientIP(req);
const rateCheck = checkRateLimit(clientIP);
// Add rate limit headers
res.setHeader("X-RateLimit-Limit", RATE_LIMIT);
res.setHeader("X-RateLimit-Remaining", Math.max(0, rateCheck.remaining));
res.setHeader("X-RateLimit-Reset", Math.ceil(rateCheck.resetAt / 1000));
if (!rateCheck.allowed) {
const retryAfter = Math.ceil((rateCheck.resetAt - Date.now()) / 1000);
res.setHeader("Retry-After", retryAfter);
sendJson(res, 429, {
error: "Too many requests. Please wait before trying again.",
retryAfter,
});
return;
}
}
await handleChat(req, res);
@@ -272,12 +273,28 @@ const server = http.createServer(async (req, res) => {
sendJson(res, 404, { error: "Not found" });
});
server.listen(port, async () => {
const count = await store.count();
console.error(
`docs-chat API (RAG) running at http://localhost:${port} (chunks: ${count})`,
);
console.error(
`Rate limit: ${RATE_LIMIT} requests per ${RATE_WINDOW_MS / 1000}s window`,
);
// Initialize store and start server
async function main() {
const result = await createStore();
store = result.store;
storeMode = result.mode;
retriever = new Retriever(store, embeddings);
server.listen(port, async () => {
const count = await store.count();
const modeName = storeMode === "upstash" ? "Upstash Vector" : "LanceDB (local)";
console.error(
`docs-chat API (${modeName}) running at http://localhost:${port} (chunks: ${count})`,
);
if (storeMode === "upstash") {
console.error(
`Rate limit: ${RATE_LIMIT} requests per ${RATE_WINDOW_MS / 1000}s window`,
);
}
});
}
main().catch((err) => {
console.error("Failed to start server:", err);
process.exit(1);
});

scripts/docs-chat/tsconfig.json

@@ -0,0 +1,17 @@
{
"compilerOptions": {
"target": "ES2022",
"module": "ESNext",
"moduleResolution": "bundler",
"esModuleInterop": true,
"strict": true,
"skipLibCheck": true,
"outDir": "dist",
"rootDir": ".",
"declaration": false,
"resolveJsonModule": true,
"allowSyntheticDefaultImports": true
},
"include": ["api/**/*.ts", "rag/**/*.ts", "*.ts"],
"exclude": ["node_modules", "dist"]
}

scripts/docs-chat/vercel.json

@@ -0,0 +1,24 @@
{
"$schema": "https://openapi.vercel.sh/vercel.json",
"version": 2,
"builds": [
{
"src": "api/**/*.ts",
"use": "@vercel/node"
}
],
"routes": [
{
"src": "/health",
"dest": "/api/health"
},
{
"src": "/chat",
"dest": "/api/chat"
},
{
"src": "/(.*)",
"dest": "/api/health"
}
]
}
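Given these routes, requests map as follows (sketch; the domain is a placeholder):

```bash
curl https://your-project.vercel.app/health   # → api/health.ts
curl https://your-project.vercel.app/other    # catch-all also serves api/health.ts
```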