feat: integrate Upstash Vector for enhanced document retrieval in chat API

- Implemented Upstash Vector as a cloud storage backend for document chunks, complementing the existing local LanceDB option.
- Added auto-detection of the storage backend based on environment variables, so no extra configuration is needed to switch modes.
- Updated the chat API to use the new unified retriever for document retrieval.
- Enhanced README with setup instructions for Upstash and updated environment variable requirements.
- Introduced new scripts and configurations for managing the vector index and API interactions.
Buns Enchantress
2026-02-03 04:39:04 -06:00
parent 4a543d15d1
commit 4b42a54452
16 changed files with 907 additions and 122 deletions

.gitignore

@@ -73,4 +73,5 @@ USER.md
.serena/
# docs-chat vector database
scripts/docs-chat/.lance-db/
.lancedb
.lance-db

docs/assets/docs-chat-config.js

@@ -0,0 +1,17 @@
/**
* Configuration for the docs-chat widget.
* Automatically selects API URL based on Mintlify environment.
*/
(() => {
const hostname = window.location.hostname;
// Mintlify local dev (mintlify dev runs on localhost)
if (hostname === "localhost" || hostname === "127.0.0.1") {
window.DOCS_CHAT_API_URL = "http://localhost:3001";
return;
}
// Production (docs.openclaw.ai and *.mintlify.app previews)
// TODO: Update this to the actual production URL
window.DOCS_CHAT_API_URL = "https://claw-api.openknot.ai";
})();
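// Sketch: if *.mintlify.app previews should target a staging deployment
// instead of production, a branch like this could be added before the
// fallback above (the staging URL is a hypothetical placeholder):
//
// if (hostname.endsWith(".mintlify.app")) {
//   window.DOCS_CHAT_API_URL = "https://docs-chat-staging.vercel.app";
//   return;
// }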

package.json

@@ -174,6 +174,7 @@
"@sinclair/typebox": "0.34.48",
"@slack/bolt": "^4.6.0",
"@slack/web-api": "^7.13.0",
"@upstash/vector": "^1.2.2",
"@whiskeysockets/baileys": "7.0.0-rc.9",
"ajv": "^8.17.1",
"chalk": "^5.6.2",

pnpm-lock.yaml

@@ -75,6 +75,9 @@ importers:
'@slack/web-api':
specifier: ^7.13.0
version: 7.13.0
'@upstash/vector':
specifier: ^1.2.2
version: 1.2.2
'@whiskeysockets/baileys':
specifier: 7.0.0-rc.9
version: 7.0.0-rc.9(audio-decode@2.2.3)(sharp@0.34.5)
@@ -2772,6 +2775,9 @@ packages:
resolution: {integrity: sha512-IlqQ/Gv22xUC1r/WQm4StLkYQmaaTsXAhUVsNE0+xiyf0yRFiH5++q78U3bw6bLKDCTmh0uqKB9eG9+Bt75Dkg==}
engines: {node: '>=20.0.0'}
'@upstash/vector@1.2.2':
resolution: {integrity: sha512-ptQ9xnxtKqmpNK52PCcHCszlPOLxIBfjsv7ty8RoF95pkjctS9rSjTQ3Pl9bx5VFbpDj+0dMXw88WLt6swDkgQ==}
'@urbit/aura@3.0.0':
resolution: {integrity: sha512-N8/FHc/lmlMDCumMuTXyRHCxlov5KZY6unmJ9QR2GOw+OpROZMBsXYGwE+ZMtvN21ql9+Xb8KhGNBj08IrG3Wg==}
engines: {node: '>=16', npm: '>=8'}
@@ -8018,6 +8024,8 @@ snapshots:
transitivePeerDependencies:
- supports-color
'@upstash/vector@1.2.2': {}
'@urbit/aura@3.0.0': {}
'@urbit/http-api@3.0.0':

scripts/docs-chat/README.md

@@ -3,79 +3,6 @@
Docs chatbot that uses RAG (Retrieval-Augmented Generation) to answer questions
from the OpenClaw documentation via semantic search.
## RAG Pipeline (Recommended)
The vector-based RAG pipeline uses OpenAI embeddings and LanceDB for semantic
search. This provides much better results than keyword matching.
### Build the vector index
```bash
OPENAI_API_KEY=sk-... pnpm docs:chat:index:vector
```
This generates embeddings for all doc chunks and stores them in
`scripts/docs-chat/.lance-db/` (gitignored).
### Run the RAG API
```bash
OPENAI_API_KEY=sk-... pnpm docs:chat:serve:vector
```
Defaults to `http://localhost:3001`. Optional environment variables:
| Variable | Default | Description |
|----------|---------|-------------|
| `PORT` | `3001` | Server port |
| `RATE_LIMIT` | `20` | Max requests per window per IP |
| `RATE_WINDOW_MS` | `60000` | Rate limit window in milliseconds |
Health check:
```bash
curl http://localhost:3001/health
# Returns: {"ok":true,"chunks":N,"mode":"vector"}
```
## Legacy Keyword Pipeline
The original keyword-based implementation is still available for backward
compatibility.
### Build the keyword index
```bash
pnpm docs:chat:index
```
This generates `scripts/docs-chat/search-index.json` from `docs/**/*.md`.
### Run the keyword API
```bash
OPENAI_API_KEY=sk-... pnpm docs:chat:serve
```
## Pipeline Integration
CI rebuilds the keyword index whenever docs change so PRs keep
`scripts/docs-chat/search-index.json` in sync. For production deployments with
RAG, run `pnpm docs:chat:index:vector` during deploy.
## Mintlify widget
Mintlify loads any `.js` in the docs content directory on every page.
`docs/assets/docs-chat-widget.js` injects a floating "Ask Molty" button and
calls the API at:
```
window.DOCS_CHAT_API_URL || "http://localhost:3001"
```
To use a deployed API, set `window.DOCS_CHAT_API_URL` before the widget runs
(for example by adding another small `.js` file in `docs/assets/` that sets it).
## Architecture
```
@@ -89,12 +16,183 @@ docs/**/*.md
┌─────────────────┐
│  Vector Store   │  Upstash (cloud) or LanceDB (local)
└────────┬────────┘
┌─────────────────┐
│   API Server    │  Hybrid Retrieval (Vector + Keyword Boost)
│    serve.ts     │  → GPT-4o-mini Streaming Response
└─────────────────┘
```
## Storage Backends
The pipeline supports two vector storage backends, auto-detected based on
environment variables:
| Backend | When Used | Best For |
| ----------- | --------------------------------------------- | ------------------------- |
| **LanceDB** | Default (no Upstash credentials) | Local dev, POC, testing |
| **Upstash** | When `UPSTASH_VECTOR_REST_*` env vars are set | Production, Vercel deploy |
**Recommendation:** For production deployments, use Upstash Vector for its
serverless scalability and Vercel compatibility. LanceDB is great for local
development and proof-of-concept work without external dependencies.
## Quick Start (Local with LanceDB)
For local development without external services:
```bash
# Only OPENAI_API_KEY is required - uses LanceDB automatically
OPENAI_API_KEY=sk-... pnpm docs:chat:index:vector
OPENAI_API_KEY=sk-... pnpm docs:chat:serve:vector
```
The index is stored locally in `scripts/docs-chat/.lancedb/`.
## Production Setup (Upstash Vector)
### 1. Create Upstash Vector Index
1. Go to [Upstash Console](https://console.upstash.com/)
2. Create a new Vector index with:
- **Dimensions:** 3072 (for `text-embedding-3-large`)
- **Distance Metric:** Cosine
3. Copy the REST URL and token
### 2. Environment Variables
| Variable | Required | Description |
| --------------------------- | -------- | ------------------------------------ |
| `OPENAI_API_KEY` | Yes | OpenAI API key for embeddings + chat |
| `UPSTASH_VECTOR_REST_URL` | No\* | Upstash Vector REST endpoint |
| `UPSTASH_VECTOR_REST_TOKEN` | No\* | Upstash Vector REST token |
\* Required for Upstash mode; omit both for LanceDB mode.
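For local runs, the variables can be exported once in the shell instead of being prefixed on every command (a sketch; all values are placeholders copied from the Upstash console and OpenAI dashboard):

```bash
export OPENAI_API_KEY=sk-...
export UPSTASH_VECTOR_REST_URL=https://your-index.upstash.io
export UPSTASH_VECTOR_REST_TOKEN=...
```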
### 3. Build the Vector Index
```bash
OPENAI_API_KEY=sk-... \
UPSTASH_VECTOR_REST_URL=https://... \
UPSTASH_VECTOR_REST_TOKEN=... \
pnpm docs:chat:index:vector
```
This generates embeddings for all doc chunks and upserts them to Upstash Vector.
### 4. Deploy to Vercel
```bash
cd scripts/docs-chat
npm install
vercel
```
Set the environment variables in the Vercel dashboard.
## Local Development
### Run the API locally
```bash
# With Upstash (cloud):
OPENAI_API_KEY=sk-... \
UPSTASH_VECTOR_REST_URL=https://... \
UPSTASH_VECTOR_REST_TOKEN=... \
pnpm docs:chat:serve:vector
# With LanceDB (local):
OPENAI_API_KEY=sk-... pnpm docs:chat:serve:vector
```
Defaults to `http://localhost:3001`. Optional environment variables:
| Variable | Default | Description |
| ---------------- | ------- | ---------------------------------------------- |
| `PORT` | `3001` | Server port |
| `RATE_LIMIT` | `20` | Max requests per window per IP (Upstash only) |
| `RATE_WINDOW_MS` | `60000` | Rate limit window in milliseconds (Upstash only) |
> **Note:** Rate limiting is only enforced in Upstash (production) mode. Local
> development with LanceDB has no rate limits.
### Health check
```bash
curl http://localhost:3001/health
# Returns: {"ok":true,"chunks":N,"mode":"upstash"} # or "lancedb"
```
## Mintlify Widget
Mintlify loads `.js` files from the docs content directory on every page.
- `docs/assets/docs-chat-config.js` - Sets the API URL
- `docs/assets/docs-chat-widget.js` - The chat widget
To configure the production API URL, edit `docs/assets/docs-chat-config.js`:
```javascript
window.DOCS_CHAT_API_URL = "https://your-project.vercel.app";
```
## API Endpoints
### POST /chat
Send a message and receive a streaming response.
**Request:**
```json
{ "message": "How do I configure the gateway?" }
```
**Response:** Streaming text/plain with the AI response.
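For example, the stream can be exercised with `curl` (`-N` disables output buffering; the domain is a placeholder):

```bash
curl -N -X POST https://your-project.vercel.app/chat \
  -H "Content-Type: application/json" \
  -d '{"message":"How do I configure the gateway?"}'
```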
### GET /health
Check API health and vector count.
**Response:**
```json
{ "ok": true, "chunks": 847, "mode": "upstash-vector" }
```
## Legacy Pipelines
### Keyword-based search
The keyword-based implementation is still available for backward compatibility:
```bash
pnpm docs:chat:index # Build keyword index
pnpm docs:chat:serve # Run keyword API
```
## File Structure
```
scripts/docs-chat/
├── api/
│ ├── chat.ts # Vercel serverless function for chat
│ └── health.ts # Vercel serverless function for health check
├── rag/
│ ├── embeddings.ts # OpenAI embeddings wrapper
│ ├── retriever-factory.ts # Unified retriever (works with any store)
│ ├── retriever-upstash.ts # Legacy Upstash-specific retriever
│ ├── retriever.ts # Legacy LanceDB retriever
│ ├── store-factory.ts # Auto-selects Upstash or LanceDB
│ ├── store-upstash.ts # Upstash Vector store
│ └── store.ts # LanceDB store (local)
├── build-vector-index.ts # Index builder script
├── serve.ts # Local dev server
├── package.json # Standalone package for Vercel
├── tsconfig.json # TypeScript config
├── vercel.json # Vercel deployment config
└── README.md
```

scripts/docs-chat/api/chat.ts

@@ -0,0 +1,191 @@
/**
* Vercel serverless function for docs-chat API.
* Handles RAG-based question answering with streaming responses.
*
* Environment variables:
* OPENAI_API_KEY - for embeddings and chat completions
* UPSTASH_VECTOR_REST_URL - Upstash Vector endpoint
* UPSTASH_VECTOR_REST_TOKEN - Upstash Vector auth token
*/
import type { VercelRequest, VercelResponse } from "@vercel/node";
import { Embeddings } from "../rag/embeddings.js";
import { DocsStore } from "../rag/store-upstash.js";
import { Retriever } from "../rag/retriever-upstash.js";
const MAX_MESSAGE_LENGTH = 2000;
const corsHeaders: Record<string, string> = {
"Access-Control-Allow-Origin": "*",
"Access-Control-Allow-Methods": "GET, POST, OPTIONS",
"Access-Control-Allow-Headers": "Content-Type",
};
function sendJson(
res: VercelResponse,
status: number,
body: Record<string, unknown>,
) {
Object.entries(corsHeaders).forEach(([key, value]) => {
res.setHeader(key, value);
});
res.status(status).json(body);
}
async function streamOpenAI(
apiKey: string,
systemPrompt: string,
userMessage: string,
res: VercelResponse,
) {
const response = await fetch("https://api.openai.com/v1/chat/completions", {
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: `Bearer ${apiKey}`,
},
body: JSON.stringify({
model: "gpt-4o-mini",
stream: true,
messages: [
{ role: "system", content: systemPrompt },
{ role: "user", content: userMessage },
],
}),
});
if (!response.ok || !response.body) {
const errorText = await response.text();
throw new Error(`OpenAI ${response.status}: ${errorText}`);
}
const decoder = new TextDecoder();
let buffer = "";
for await (const chunk of response.body as AsyncIterable<Uint8Array>) {
buffer += decoder.decode(chunk, { stream: true });
const lines = buffer.split("\n");
buffer = lines.pop() ?? "";
for (const line of lines) {
const trimmed = line.trim();
if (!trimmed.startsWith("data:")) continue;
const data = trimmed.slice(5).trim();
if (data === "[DONE]") return;
try {
const json = JSON.parse(data);
const delta = json.choices?.[0]?.delta?.content;
if (delta) {
res.write(delta);
}
} catch {
// Ignore malformed SSE lines
}
}
}
}
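// For reference, the raw SSE lines parsed above look like this (standard
// OpenAI streaming format; abridged illustration, not captured output):
//
//   data: {"choices":[{"delta":{"content":"Hel"}}]}
//   data: {"choices":[{"delta":{"content":"lo"}}]}
//   data: [DONE]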
export default async function handler(req: VercelRequest, res: VercelResponse) {
// Handle CORS preflight
if (req.method === "OPTIONS") {
Object.entries(corsHeaders).forEach(([key, value]) => {
res.setHeader(key, value);
});
res.status(204).end();
return;
}
// Only accept POST
if (req.method !== "POST") {
sendJson(res, 405, { error: "Method not allowed" });
return;
}
// Validate environment
const apiKey = process.env.OPENAI_API_KEY;
if (!apiKey) {
sendJson(res, 500, { error: "Server configuration error" });
return;
}
// Parse body
let message = "";
try {
const body = typeof req.body === "string" ? JSON.parse(req.body) : req.body;
message = body?.message;
} catch {
sendJson(res, 400, { error: "Invalid JSON" });
return;
}
if (!message || typeof message !== "string") {
sendJson(res, 400, { error: "message required" });
return;
}
const trimmedMessage = message.trim();
if (!trimmedMessage) {
sendJson(res, 400, { error: "message required" });
return;
}
if (trimmedMessage.length > MAX_MESSAGE_LENGTH) {
sendJson(res, 400, {
error: `Message too long (max ${MAX_MESSAGE_LENGTH} characters)`,
});
return;
}
try {
// Initialize RAG components
const embeddings = new Embeddings(apiKey);
const store = new DocsStore();
const retriever = new Retriever(store, embeddings);
// Retrieve relevant docs
const results = await retriever.retrieve(trimmedMessage, 8);
if (results.length === 0) {
Object.entries(corsHeaders).forEach(([key, value]) => {
res.setHeader(key, value);
});
res.setHeader("Content-Type", "text/plain; charset=utf-8");
res.status(200).send(
"I couldn't find relevant documentation excerpts for that question. Try rephrasing or search the docs.",
);
return;
}
// Build context from retrieved chunks
const context = results
.map(
(result) =>
`[${result.chunk.title}](${result.chunk.url})\n${result.chunk.content.slice(0, 1200)}`,
)
.join("\n\n---\n\n");
const systemPrompt =
"You are a helpful assistant for OpenClaw documentation. " +
"Answer only from the provided documentation excerpts. " +
"If the answer is not in the excerpts, say so and suggest checking the docs. " +
"Cite sources by name or URL when relevant.\n\nDocumentation excerpts:\n" +
context;
// Set up streaming response
Object.entries(corsHeaders).forEach(([key, value]) => {
res.setHeader(key, value);
});
res.setHeader("Content-Type", "text/plain; charset=utf-8");
res.setHeader("Transfer-Encoding", "chunked");
await streamOpenAI(apiKey, systemPrompt, trimmedMessage, res);
res.end();
} catch (err) {
console.error("Chat error:", err);
if (!res.headersSent) {
sendJson(res, 500, { error: "Internal server error" });
} else {
res.end("\n\n[Error processing request]");
}
}
}
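A minimal sketch of how a browser client (such as the docs widget) could consume this endpoint; the function name and fallback URL are illustrative, only the request/response shape comes from the handler above:

```ts
// Sketch: read the streamed text/plain chat response incrementally in a browser.
async function askDocs(message: string, onDelta: (text: string) => void) {
  const base = (window as any).DOCS_CHAT_API_URL ?? "http://localhost:3001";
  const res = await fetch(`${base}/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message }),
  });
  if (!res.ok || !res.body) throw new Error(`chat API error ${res.status}`);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    onDelta(decoder.decode(value, { stream: true })); // append to chat UI
  }
}
```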

scripts/docs-chat/api/health.ts

@@ -0,0 +1,40 @@
/**
* Health check endpoint for docs-chat API.
*/
import type { VercelRequest, VercelResponse } from "@vercel/node";
import { DocsStore } from "../rag/store-upstash.js";
const corsHeaders: Record<string, string> = {
"Access-Control-Allow-Origin": "*",
"Access-Control-Allow-Methods": "GET, OPTIONS",
"Access-Control-Allow-Headers": "Content-Type",
};
export default async function handler(req: VercelRequest, res: VercelResponse) {
// Handle CORS preflight
if (req.method === "OPTIONS") {
Object.entries(corsHeaders).forEach(([key, value]) => {
res.setHeader(key, value);
});
res.status(204).end();
return;
}
if (req.method !== "GET") {
res.status(405).json({ error: "Method not allowed" });
return;
}
try {
const store = new DocsStore();
const count = await store.count();
Object.entries(corsHeaders).forEach(([key, value]) => {
res.setHeader(key, value);
});
res.status(200).json({ ok: true, chunks: count, mode: "upstash-vector" });
} catch (err) {
console.error("Health check error:", err);
res.status(500).json({ ok: false, error: "Failed to connect to vector store" });
}
}

scripts/docs-chat/build-vector-index.ts

@@ -3,33 +3,36 @@
* Build a vector search index from docs/*.md for the docs-chat RAG pipeline.
* Usage: bun build-vector-index.ts [--docs path/to/docs] [--base-url https://docs.openclaw.ai]
*
* Requires: OPENAI_API_KEY environment variable
* Requires environment variables:
* OPENAI_API_KEY - for embeddings
*
* Optional (for Upstash cloud store):
* UPSTASH_VECTOR_REST_URL - Upstash Vector endpoint
* UPSTASH_VECTOR_REST_TOKEN - Upstash Vector auth token
*
* If Upstash credentials are not set, falls back to LanceDB (local file store).
*/
import fs from "node:fs";
import path from "node:path";
import { fileURLToPath } from "node:url";
import { randomUUID } from "node:crypto";
import { Embeddings } from "./rag/embeddings.js";
import { DocsStore, type DocsChunk } from "./rag/store.js";
import { createStore, detectStoreMode, type DocsChunk } from "./rag/store-factory.js";
const __dirname = path.dirname(fileURLToPath(import.meta.url));
const root = path.resolve(__dirname, "../..");
const defaultDocsDir = path.join(root, "docs");
const defaultDbPath = path.join(__dirname, ".lancedb"); // match store-factory default
// Parse CLI arguments
const args = process.argv.slice(2);
let docsDir = defaultDocsDir;
let baseUrl = "https://docs.openclaw.ai";
let dbPath = defaultDbPath;
for (let i = 0; i < args.length; i++) {
if (args[i] === "--docs" && args[i + 1]) {
docsDir = path.resolve(args[++i]);
} else if (args[i] === "--base-url" && args[i + 1]) {
baseUrl = args[++i].replace(/\/$/, "");
} else if (args[i] === "--db" && args[i + 1]) {
dbPath = path.resolve(args[++i]);
}
}
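// Example invocations of the flags parsed above (sketch; paths are illustrative):
//   bun build-vector-index.ts --docs ../../docs --base-url https://docs.openclaw.ai
//   bun build-vector-index.ts --db /tmp/docs-chat.lancedb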
@@ -255,13 +258,18 @@ async function main() {
vector: vectors[i],
}));
// Store in LanceDB
console.error(`Storing in LanceDB at: ${dbPath}`);
const store = new DocsStore(dbPath, embeddings.dimensions);
// Store in vector database (auto-detects Upstash or LanceDB)
const storeMode = detectStoreMode();
console.error(
`Storing in ${storeMode === "upstash" ? "Upstash Vector" : "LanceDB (local)"}...`,
);
const { store, mode } = await createStore(undefined, dbPath); // honor --db (LanceDB mode only)
await store.replaceAll(docsChunks);
const count = await store.count();
console.error(`Done! Stored ${count} chunks in vector database.`);
console.error(
`Done! Stored ${count} chunks in ${mode === "upstash" ? "Upstash Vector" : "LanceDB"}.`,
);
}
main().catch((err) => {

scripts/docs-chat/package.json

@@ -0,0 +1,20 @@
{
"name": "openclaw-docs-chat",
"version": "1.0.0",
"private": true,
"description": "RAG-based docs chat API for OpenClaw documentation",
"type": "module",
"scripts": {
"dev": "vercel dev",
"build:index": "bun build-vector-index.ts",
"deploy": "vercel"
},
"dependencies": {
"@upstash/vector": "^1.2.2",
"openai": "^6.17.0"
},
"devDependencies": {
"@vercel/node": "^5.5.28",
"typescript": "^5.9.3"
}
}
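Typical usage of these scripts (a sketch; assumes the Vercel CLI is installed and authenticated):

```bash
cd scripts/docs-chat
npm install
npm run build:index   # bun build-vector-index.ts
npm run dev           # vercel dev (local emulation of the api/ functions)
npm run deploy        # vercel
```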

scripts/docs-chat/rag/retriever-factory.ts

@@ -0,0 +1,76 @@
/**
* Unified Retriever for docs-chat RAG pipeline.
* Works with any IDocsStore implementation (Upstash or LanceDB).
*/
import { Embeddings } from "./embeddings.js";
import type { IDocsStore, DocsChunk } from "./store-factory.js";
export interface RetrievalResult {
chunk: Omit<DocsChunk, "vector">;
score: number;
}
export class Retriever {
constructor(
private readonly store: IDocsStore,
private readonly embeddings: Embeddings,
) {}
/**
* Retrieve relevant chunks using hybrid scoring:
* - Primary: vector similarity search
* - Secondary: keyword boost for exact term matches
*/
async retrieve(query: string, limit: number = 8): Promise<RetrievalResult[]> {
// Generate query embedding
const queryVector = await this.embeddings.embed(query);
// Over-fetch for reranking (2x limit)
const searchResults = await this.store.search(queryVector, limit * 2);
if (searchResults.length === 0) {
return [];
}
// Apply hybrid scoring
const scored = searchResults.map((result) => ({
chunk: result.chunk,
score: this.hybridScore(result.similarity, query, result.chunk),
}));
// Sort by hybrid score and take top-k
scored.sort((a, b) => b.score - a.score);
return scored.slice(0, limit).map((item) => ({
chunk: {
id: item.chunk.id,
path: item.chunk.path,
title: item.chunk.title,
content: item.chunk.content,
url: item.chunk.url,
},
score: item.score,
}));
}
/**
* Compute hybrid score combining vector similarity and keyword boost.
*/
private hybridScore(
vectorSimilarity: number,
query: string,
chunk: DocsChunk,
): number {
const words = query
.toLowerCase()
.split(/\s+/)
.filter((w) => w.length > 2);
const text = `${chunk.title} ${chunk.content}`.toLowerCase();
// Count matching words and apply boost
const matchingWords = words.filter((word) => text.includes(word));
const keywordBoost = matchingWords.length * 0.05;
return vectorSimilarity + keywordBoost;
}
}
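A usage sketch tying the unified retriever to the store factory (names come from the files in this commit; the query string and scores are illustrative):

```ts
import { Embeddings } from "./embeddings.js";
import { createStore } from "./store-factory.js";
import { Retriever } from "./retriever-factory.js";

const { store, mode } = await createStore(); // "upstash" or "lancedb"
const embeddings = new Embeddings(process.env.OPENAI_API_KEY!);
const retriever = new Retriever(store, embeddings);

const results = await retriever.retrieve("How do I configure the gateway?", 8);
// e.g. vector similarity 0.82 + 2 matched keywords * 0.05 boost = 0.92
console.log(mode, results.map((r) => `${r.score.toFixed(2)} ${r.chunk.title}`));
```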

scripts/docs-chat/rag/retriever-upstash.ts

@@ -0,0 +1,76 @@
/**
* Hybrid retriever for docs-chat RAG pipeline (Upstash Vector version).
* Combines vector similarity with keyword boosting for improved relevance.
*/
import { Embeddings } from "./embeddings.js";
import { DocsStore, type DocsChunk, type SearchResult } from "./store-upstash.js";
export interface RetrievalResult {
chunk: Omit<DocsChunk, "vector">;
score: number;
}
export class Retriever {
constructor(
private readonly store: DocsStore,
private readonly embeddings: Embeddings,
) {}
/**
* Retrieve relevant chunks using hybrid scoring:
* - Primary: vector similarity search
* - Secondary: keyword boost for exact term matches
*/
async retrieve(query: string, limit: number = 8): Promise<RetrievalResult[]> {
// Generate query embedding
const queryVector = await this.embeddings.embed(query);
// Over-fetch for reranking (2x limit)
const searchResults = await this.store.search(queryVector, limit * 2);
if (searchResults.length === 0) {
return [];
}
// Apply hybrid scoring
const scored = searchResults.map((result) => ({
chunk: result.chunk,
score: this.hybridScore(result.similarity, query, result.chunk),
}));
// Sort by hybrid score and take top-k
scored.sort((a, b) => b.score - a.score);
return scored.slice(0, limit).map((item) => ({
chunk: {
id: item.chunk.id,
path: item.chunk.path,
title: item.chunk.title,
content: item.chunk.content,
url: item.chunk.url,
},
score: item.score,
}));
}
/**
* Compute hybrid score combining vector similarity and keyword boost.
*/
private hybridScore(
vectorSimilarity: number,
query: string,
chunk: DocsChunk,
): number {
const words = query
.toLowerCase()
.split(/\s+/)
.filter((w) => w.length > 2);
const text = `${chunk.title} ${chunk.content}`.toLowerCase();
// Count matching words and apply boost
const matchingWords = words.filter((word) => text.includes(word));
const keywordBoost = matchingWords.length * 0.05;
return vectorSimilarity + keywordBoost;
}
}

scripts/docs-chat/rag/store-factory.ts

@@ -0,0 +1,69 @@
/**
* Factory for docs-chat vector store.
* Auto-selects Upstash (cloud) or LanceDB (local) based on environment.
*
* Priority:
* 1. If UPSTASH_VECTOR_REST_URL and UPSTASH_VECTOR_REST_TOKEN are set → Upstash
* 2. Otherwise → LanceDB (local file-based store)
*/
import path from "node:path";
import { fileURLToPath } from "node:url";
// Common interfaces shared by both stores
export interface DocsChunk {
id: string;
path: string;
title: string;
content: string;
url: string;
vector: number[];
}
export interface SearchResult {
chunk: DocsChunk;
distance: number;
similarity: number;
}
export interface IDocsStore {
replaceAll(chunks: DocsChunk[]): Promise<void>;
search(vector: number[], limit?: number): Promise<SearchResult[]>;
count(): Promise<number>;
}
export type StoreMode = "upstash" | "lancedb";
const __dirname = path.dirname(fileURLToPath(import.meta.url));
const DEFAULT_LANCEDB_PATH = path.resolve(__dirname, "../.lancedb");
const VECTOR_DIM = 3072; // text-embedding-3-large
/**
* Detect which store backend to use based on environment.
*/
export function detectStoreMode(): StoreMode {
const hasUpstash =
process.env.UPSTASH_VECTOR_REST_URL &&
process.env.UPSTASH_VECTOR_REST_TOKEN;
return hasUpstash ? "upstash" : "lancedb";
}
/**
* Create the appropriate store based on environment.
* Returns the store instance and which mode was selected.
*/
export async function createStore(
mode?: StoreMode,
lancedbPath?: string,
): Promise<{ store: IDocsStore; mode: StoreMode }> {
const selectedMode = mode ?? detectStoreMode();
if (selectedMode === "upstash") {
const { DocsStore } = await import("./store-upstash.js");
return { store: new DocsStore(), mode: "upstash" };
}
// LanceDB (local)
const { DocsStore } = await import("./store.js");
const dbPath = lancedbPath ?? DEFAULT_LANCEDB_PATH;
return { store: new DocsStore(dbPath, VECTOR_DIM), mode: "lancedb" };
}

scripts/docs-chat/rag/store-upstash.ts

@@ -0,0 +1,122 @@
/**
* Upstash Vector storage layer for docs-chat RAG pipeline.
* Stores document chunks with vector embeddings for semantic search.
* Replaces LanceDB for serverless deployment compatibility.
*/
import { Index } from "@upstash/vector";
export interface DocsChunk {
id: string;
path: string;
title: string;
content: string;
url: string;
vector: number[];
}
export interface SearchResult {
chunk: DocsChunk;
distance: number;
similarity: number;
}
interface ChunkMetadata {
path: string;
title: string;
content: string;
url: string;
}
// Upstash Vector has a limit of 1000 vectors per upsert batch
const UPSERT_BATCH_SIZE = 1000;
export class DocsStore {
private index: Index<ChunkMetadata>;
constructor() {
const url = process.env.UPSTASH_VECTOR_REST_URL;
const token = process.env.UPSTASH_VECTOR_REST_TOKEN;
if (!url || !token) {
throw new Error(
"UPSTASH_VECTOR_REST_URL and UPSTASH_VECTOR_REST_TOKEN are required",
);
}
this.index = new Index<ChunkMetadata>({ url, token });
}
/**
* Drop existing vectors and upsert new chunks.
* Used during index rebuild.
*/
async replaceAll(chunks: DocsChunk[]): Promise<void> {
// Reset the index (delete all vectors)
await this.index.reset();
if (chunks.length === 0) {
return;
}
// Upsert in batches to respect API limits
for (let i = 0; i < chunks.length; i += UPSERT_BATCH_SIZE) {
const batch = chunks.slice(i, i + UPSERT_BATCH_SIZE);
const vectors = batch.map((chunk) => ({
id: chunk.id,
vector: chunk.vector,
metadata: {
path: chunk.path,
title: chunk.title,
content: chunk.content,
url: chunk.url,
},
}));
await this.index.upsert(vectors);
console.error(
`Upserted batch ${Math.floor(i / UPSERT_BATCH_SIZE) + 1}/${Math.ceil(chunks.length / UPSERT_BATCH_SIZE)}`,
);
}
}
/**
* Search for similar chunks using vector similarity.
*/
async search(vector: number[], limit: number = 8): Promise<SearchResult[]> {
const results = await this.index.query<ChunkMetadata>({
vector,
topK: limit,
includeMetadata: true,
includeVectors: false,
});
return results.map((result) => {
// Upstash returns cosine similarity score (0-1, higher is more similar)
const similarity = result.score;
// Convert to distance for compatibility with existing code
const distance = 1 - similarity;
const metadata = result.metadata!;
return {
chunk: {
id: result.id as string,
path: metadata.path,
title: metadata.title,
content: metadata.content,
url: metadata.url,
vector: [], // Don't return vector to save memory
},
distance,
similarity,
};
});
}
/**
* Get count of stored chunks.
*/
async count(): Promise<number> {
const info = await this.index.info();
return info.vectorCount;
}
}
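A standalone usage sketch (assumes both `UPSTASH_VECTOR_REST_*` variables are set; the chunk count is illustrative):

```ts
import { DocsStore } from "./store-upstash.js";

const store = new DocsStore(); // throws if UPSTASH_VECTOR_REST_* are missing
console.log(`chunks indexed: ${await store.count()}`);
// replaceAll() with 2,500 chunks issues ceil(2500 / 1000) = 3 upsert batches
```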

scripts/docs-chat/serve.ts

@@ -1,18 +1,17 @@
#!/usr/bin/env bun
/**
* Docs-chat API with RAG (vector search).
* Env: OPENAI_API_KEY, DOCS_CHAT_DB, PORT, RATE_LIMIT, RATE_WINDOW_MS
* Auto-detects Upstash Vector (cloud) or LanceDB (local) based on environment.
*
* Env: OPENAI_API_KEY (required)
* UPSTASH_VECTOR_REST_URL, UPSTASH_VECTOR_REST_TOKEN (optional, for cloud)
* PORT, RATE_LIMIT, RATE_WINDOW_MS
*/
import path from "node:path";
import { fileURLToPath } from "node:url";
import http from "node:http";
import { Embeddings } from "./rag/embeddings.js";
import { DocsStore } from "./rag/store.js";
import { Retriever } from "./rag/retriever.js";
import { createStore, type IDocsStore, type StoreMode } from "./rag/store-factory.js";
import { Retriever } from "./rag/retriever-factory.js";
const __dirname = path.dirname(fileURLToPath(import.meta.url));
const defaultDbPath = path.join(__dirname, ".lance-db");
const dbPath = process.env.DOCS_CHAT_DB || defaultDbPath;
const port = Number(process.env.PORT || 3001);
// Rate limiting configuration
@@ -78,10 +77,11 @@ if (!apiKey) {
process.exit(1);
}
// Initialize RAG components
// RAG components (initialized async before server starts)
let store: IDocsStore;
let storeMode: StoreMode;
let retriever: Retriever;
const embeddings = new Embeddings(apiKey);
const store = new DocsStore(dbPath, embeddings.dimensions);
const retriever = new Retriever(store, embeddings);
const corsHeaders: Record<string, string> = {
"Access-Control-Allow-Origin": "*",
@@ -241,28 +241,29 @@ const server = http.createServer(async (req, res) => {
if (req.method === "GET" && (req.url === "/" || req.url === "/health")) {
const count = await store.count();
sendJson(res, 200, { ok: true, chunks: count, mode: "vector" });
sendJson(res, 200, { ok: true, chunks: count, mode: storeMode });
return;
}
if (req.method === "POST" && req.url === "/chat") {
// Apply rate limiting
const clientIP = getClientIP(req);
const rateCheck = checkRateLimit(clientIP);
// Only apply rate limiting in production (Upstash) mode
if (storeMode === "upstash") {
const clientIP = getClientIP(req);
const rateCheck = checkRateLimit(clientIP);
// Add rate limit headers
res.setHeader("X-RateLimit-Limit", RATE_LIMIT);
res.setHeader("X-RateLimit-Remaining", Math.max(0, rateCheck.remaining));
res.setHeader("X-RateLimit-Reset", Math.ceil(rateCheck.resetAt / 1000));
if (!rateCheck.allowed) {
const retryAfter = Math.ceil((rateCheck.resetAt - Date.now()) / 1000);
res.setHeader("Retry-After", retryAfter);
sendJson(res, 429, {
error: "Too many requests. Please wait before trying again.",
retryAfter,
});
return;
}
}
await handleChat(req, res);
@@ -272,12 +273,28 @@ const server = http.createServer(async (req, res) => {
sendJson(res, 404, { error: "Not found" });
});
server.listen(port, async () => {
const count = await store.count();
console.error(
`docs-chat API (RAG) running at http://localhost:${port} (chunks: ${count})`,
);
console.error(
`Rate limit: ${RATE_LIMIT} requests per ${RATE_WINDOW_MS / 1000}s window`,
);
// Initialize store and start server
async function main() {
const result = await createStore();
store = result.store;
storeMode = result.mode;
retriever = new Retriever(store, embeddings);
server.listen(port, async () => {
const count = await store.count();
const modeName = storeMode === "upstash" ? "Upstash Vector" : "LanceDB (local)";
console.error(
`docs-chat API (${modeName}) running at http://localhost:${port} (chunks: ${count})`,
);
if (storeMode === "upstash") {
console.error(
`Rate limit: ${RATE_LIMIT} requests per ${RATE_WINDOW_MS / 1000}s window`,
);
}
});
}
main().catch((err) => {
console.error("Failed to start server:", err);
process.exit(1);
});

scripts/docs-chat/tsconfig.json

@@ -0,0 +1,17 @@
{
"compilerOptions": {
"target": "ES2022",
"module": "ESNext",
"moduleResolution": "bundler",
"esModuleInterop": true,
"strict": true,
"skipLibCheck": true,
"outDir": "dist",
"rootDir": ".",
"declaration": false,
"resolveJsonModule": true,
"allowSyntheticDefaultImports": true
},
"include": ["api/**/*.ts", "rag/**/*.ts", "*.ts"],
"exclude": ["node_modules", "dist"]
}

scripts/docs-chat/vercel.json

@@ -0,0 +1,24 @@
{
"$schema": "https://openapi.vercel.sh/vercel.json",
"version": 2,
"builds": [
{
"src": "api/**/*.ts",
"use": "@vercel/node"
}
],
"routes": [
{
"src": "/health",
"dest": "/api/health"
},
{
"src": "/chat",
"dest": "/api/chat"
},
{
"src": "/(.*)",
"dest": "/api/health"
}
]
}
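Given these routes, requests map as follows (sketch; the domain is a placeholder):

```bash
curl https://your-project.vercel.app/health   # → api/health.ts
curl https://your-project.vercel.app/other    # catch-all also serves api/health.ts
```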