Group G — markdown tech-debt cleanup (post-audit 2026-05-02).
- 36 SKILL.md files: added "## When to use" section. Was missing across the
catalog; orchestrator routing by keyword could not auto-dispatch.
- 20 code-implementer agent .md files: added Output Footer block prescribing
RULE 0.16 STATUS-TRUTH MARKER schema in agent's final report. Previously only
code-implementer-rust.md had it; other 27 language/role variants were silent
about the marker, breaking RULE 0.16 §3 status-truth aggregation for non-Rust
batches.
- skills/site-create/: added phase-5-preview.md and phase-6-deploy.md skeleton
files. SKILL.md table-of-contents referenced 7 phases; only 5 existed on disk.
- skills/{ai-animation,rag-pipeline}/skill.md: added migration banner comment
noting they should be SKILL.md (canonical filename). Case-rename via git is a
separate orchestrator task (macOS APFS is case-insensitive; Linux deploy needs
explicit rename).
- 3 deprecated skills (site-builder, competitor-analysis, design-inspiration):
added concrete removed-after dates (was vague "before v2").
- docs/CONVERGENCE-PLAN.md:129: TBD on _blocks/evidence-grading.md duplicate
resolved (file exists, not duplicated).
- docs/DNA-INDEX.md: count edits made then overwritten by auto-encyclopedia-refresh
hook during agent run. The .kei-registry-ignore files in test fixtures (Group F)
are the structural fix; kei-registry walker implementation is the follow-up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
7.5 KiB
name: rag-pipeline description: Use when building RAG (Retrieval-Augmented Generation) systems — embedding pipeline, vector database, document ingestion, semantic search, hybrid search. Triggers on "RAG", "embeddings", "vector search", "semantic search", "document ingestion", "knowledge base". arguments:
- name: command description: "Command: init, ingest, search, upgrade" required: false
- name: provider description: "Embedding provider: openai, gemini, voyage, cohere, local (default: openai)" required: false
RAG Pipeline Skill
When to use
- Building a RAG system: embedding pipeline, vector store ingestion, semantic or hybrid search over documents.
- Choosing between embedding providers (OpenAI, Gemini, Voyage, local) or vector databases (LanceDB, Qdrant, Pinecone).
- Adding a knowledge base or document search capability to an existing application.
Build retrieval-augmented generation systems with swappable components.
Architecture
Documents → Ingestion → Chunking → Embedding → Vector DB
↓
Query → Embed Query → Hybrid Search (dense + BM25) → Rerank → LLM Context
Tier Selection
| Tier | Embedding | Vector DB | Cost | Use Case |
|---|---|---|---|---|
| Minimal | OpenAI small ($0.02/MTok) | LanceDB (embedded) | ~$0 | Prototyping, offline |
| Production | Voyage-4 or OpenAI large | LanceDB hybrid / Qdrant | Low | Most projects |
| Multimodal | Gemini Embedding 2 ($0.20/MTok) | LanceDB / Pinecone | Medium | Text + images + video |
Step 1: Init — Choose Stack
Default: LanceDB + OpenAI (zero infrastructure)
npm install lancedb @lancedb/vectordb openai
LanceDB: embedded (no server), Apache Arrow, hybrid search via RRF, scales to billions, Node.js + Python native. Free forever [E1].
Embedding Providers [E1]
| Provider | Model | $/MTok | Dims | Context | Multimodal |
|---|---|---|---|---|---|
| OpenAI | text-embedding-3-small | $0.02 | 1536 | 8K | No |
| OpenAI | text-embedding-3-large | $0.13 | 3072 | 8K | No |
| Gemini | Embedding 2 | $0.20 | 3072 | 8K | Text+Image+Video+Audio |
| Voyage | voyage-3.5 | $0.06 | flex | 32K | No |
| Cohere | Embed 4 | $0.12 | 1536 | 128K | Text+Image |
| Local | nomic-embed-text-v2-moe | FREE | 768 | 8K | No |
Decision: OpenAI small for text-only (cheapest quality). Gemini 2 for multimodal (only unified embedding space). Voyage for domain-specific (code/law/finance). Local nomic for privacy/offline.
Vector DB Comparison [E1]
| DB | Type | Free Tier | Hybrid Search | Setup |
|---|---|---|---|---|
| LanceDB | Embedded | Unlimited (OSS) | Yes (RRF) | npm install |
| ChromaDB | Embedded | Unlimited (OSS) | Yes (BM25) | pip install |
| Pinecone | Cloud | 2GB, 2M writes/mo | Yes | API key |
| Qdrant | Cloud+self | 1GB RAM free | Yes | Docker or API |
Default: LanceDB — zero ops, no server, embedded, free.
Step 2: Ingest — Document Processing
PDF Parsing [E2]
Python (best quality):
pip install pymupdf4llm # PyMuPDF with LLM-optimized markdown output
import pymupdf4llm
md_text = pymupdf4llm.to_markdown("document.pdf")
Node.js:
npm install pdf-parse # basic text extraction
For complex PDFs with tables/images: use LlamaParse API or call PyMuPDF via subprocess.
Chunking Strategy [E2]
Default: Recursive character splitting (512 tokens, 50 overlap)
function chunkText(text: string, maxTokens = 512, overlap = 50): string[] {
const separators = ['\n\n', '\n', '. ', ' '];
const chunks: string[] = [];
let remaining = text;
for (const sep of separators) {
if (remaining.length <= maxTokens * 4) break; // ~4 chars/token
const parts = remaining.split(sep);
let current = '';
for (const part of parts) {
if ((current + sep + part).length > maxTokens * 4) {
if (current) chunks.push(current.trim());
current = part;
} else {
current = current ? current + sep + part : part;
}
}
remaining = current;
}
if (remaining.trim()) chunks.push(remaining.trim());
return chunks;
}
Advanced (production):
- Semantic chunking: split on topic boundaries (+70% accuracy vs fixed) [E2]
- Contextual retrieval: prepend document context to each chunk (-69% error rate with hybrid) [E2]
- Hierarchical: paragraph + section level chunks for multi-granularity retrieval
Step 3: Embed & Store
Embedding
import OpenAI from 'openai';
const openai = new OpenAI();
async function embed(texts: string[]): Promise<number[][]> {
const res = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: texts,
});
return res.data.map(d => d.embedding);
}
Gemini Multimodal (images + video + audio in same space)
import { GoogleGenAI } from '@google/genai';
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const result = await ai.models.embedContent({
model: 'gemini-embedding-exp-03-07',
contents: [{ parts: [{ text: 'query' }] }],
config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 },
});
Store in LanceDB
import lancedb from 'lancedb';
const db = await lancedb.connect('./vectors');
const table = await db.createTable('docs', [
{ id: '1', text: 'chunk text', vector: embedding, source: 'file.pdf', page: 1 },
]);
Step 4: Search
Dense Search (cosine similarity)
const results = await table.search(queryEmbedding).limit(5).toArray();
Hybrid Search (dense + BM25 via RRF) [E2]
const results = await table
.search(queryEmbedding, 'vector') // dense
.search('keyword query', 'text') // full-text BM25
.rerank('rrf') // Reciprocal Rank Fusion
.limit(5)
.toArray();
Hybrid search reduces error rate ~69% vs dense-only when combined with contextual retrieval [E2].
Vercel AI SDK Pattern
import { embed, cosineSimilarity } from 'ai';
const { embedding } = await embed({
model: openai.embedding('text-embedding-3-small'),
value: query,
});
const results = chunks
.map(c => ({ ...c, score: cosineSimilarity(embedding, c.embedding) }))
.sort((a, b) => b.score - a.score)
.slice(0, 5);
Claude Tool-Based Retrieval
const tools = [{
name: 'search_documents',
description: 'Search the knowledge base for relevant information',
input_schema: {
type: 'object',
properties: {
query: { type: 'string', description: 'Search query' },
limit: { type: 'number', description: 'Max results (default 5)' },
},
required: ['query'],
},
}];
// Claude decides when to search. Backend queries vector DB, returns as tool result.
Cost Calculator
For 1000 documents (~500 pages, ~0.4M tokens):
| Component | OpenAI small | Gemini 2 | Local |
|---|---|---|---|
| Embedding | $0.008 | $0.080 | $0 |
| Storage (LanceDB) | $0 | $0 | $0 |
| Per query embed | $0.000002 | $0.00002 | $0 |
| LLM call dominates query cost | ~$0.003-0.015 per query |
Upgrade Paths
- Minimal → Production: Add hybrid search (BM25 + vector), add reranking
- Production → Multimodal: Switch to Gemini Embedding 2, add image/video ingestion
- Embedded → Cloud: Swap LanceDB for Qdrant Cloud or Pinecone (API-compatible)