Embeddings Pattern

Generate and manage vector embeddings for semantic search, similarity matching, and retrieval-augmented generation (RAG).

Overview

Embeddings convert text into numerical vectors that capture semantic meaning. Similar texts produce similar vectors, enabling semantic search, recommendations, and RAG applications.
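
Similarity between two embeddings is typically measured with cosine similarity: scores near 1 mean the texts are semantically close, scores near 0 mean they are unrelated. A minimal sketch (the helper name is ours, not from a library):

// Cosine similarity between two equal-length vectors
export function cosineSimilarity(a: number[], b: number[]) {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// 'How do I return an item?' vs 'What is your refund policy?' scores much
// higher than either does against 'Best hiking trails in Norway'.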

When to use:

  • Semantic search (find similar content by meaning)
  • RAG applications (knowledge bases, document Q&A)
  • Recommendation systems
  • Content deduplication
  • Clustering and classification

Key features:

  • OpenAI embedding models
  • PostgreSQL vector storage with pgvector
  • Semantic similarity search
  • Text chunking strategies
  • Batch processing

Code Examples

OpenAI Embeddings

// lib/embeddings.ts
import OpenAI from 'openai'

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY
})

export async function getEmbedding(text: string) {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text
  })

  return response.data[0].embedding
}

export async function getEmbeddings(texts: string[]) {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: texts
  })

  return response.data.map((d) => d.embedding)
}
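
For reference, text-embedding-3-small returns 1,536-dimensional vectors, which is why the schema below declares vector(1536). A quick usage sketch (the sample strings are illustrative):

const embedding = await getEmbedding('How do I reset my password?')
console.log(embedding.length) // 1536

// Batch variant: one API call for several texts
const embeddings = await getEmbeddings([
  'Reset your password from the account settings page.',
  'Contact support if you are locked out.'
])
console.log(embeddings.length) // 2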

Store with Prisma + pgvector

// lib/embeddings.ts
import { prisma } from '@/lib/db'

export async function storeDocument(
  content: string,
  metadata: Record<string, any>
) {
  const embedding = await getEmbedding(content)

  // JSON.stringify produces '[0.1,0.2,...]' for the embedding and a JSON
  // object string for the metadata - both are valid text input for the
  // ::vector and ::jsonb casts, and avoid passing raw JS values Prisma
  // cannot bind
  await prisma.$executeRaw`
    INSERT INTO documents (content, metadata, embedding)
    VALUES (${content}, ${JSON.stringify(metadata)}::jsonb, ${JSON.stringify(embedding)}::vector)
  `
}
// lib/embeddings.ts
export async function semanticSearch(
  query: string,
  limit = 5
) {
  const queryEmbedding = await getEmbedding(query)

  // Cosine distance (<=>) is 0 for identical vectors, so 1 - distance
  // gives a similarity score where higher means more relevant
  const results = await prisma.$queryRaw`
    SELECT
      id,
      content,
      metadata,
      1 - (embedding <=> ${JSON.stringify(queryEmbedding)}::vector) AS similarity
    FROM documents
    ORDER BY embedding <=> ${JSON.stringify(queryEmbedding)}::vector
    LIMIT ${limit}
  `

  return results
}
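
A call site might look like this; the SearchResult shape mirrors the SELECT list above and is our own annotation, not a generated type:

interface SearchResult {
  id: number
  content: string
  metadata: unknown
  similarity: number
}

const results = (await semanticSearch('how do I reset my password?')) as SearchResult[]
for (const r of results) {
  console.log(r.similarity.toFixed(3), r.content.slice(0, 80))
}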

Text Chunking Strategy

// lib/chunking.ts
import { storeDocument } from '@/lib/embeddings'

export function chunkText(
  text: string,
  chunkSize = 1000,
  overlap = 200
) {
  const chunks: string[] = []
  let start = 0

  while (start < text.length) {
    const end = Math.min(start + chunkSize, text.length)
    chunks.push(text.slice(start, end))
    if (end === text.length) break // avoid a redundant trailing chunk
    start += chunkSize - overlap
  }

  return chunks
}

// Index a document with chunking
export async function indexDocument(content: string, docId: string) {
  const chunks = chunkText(content)

  for (let i = 0; i < chunks.length; i++) {
    await storeDocument(chunks[i], {
      source: 'document',
      documentId: docId,
      chunkIndex: i
    })
  }
}
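
A quick usage sketch (the file path and document id are placeholders):

import { readFile } from 'node:fs/promises'
import { indexDocument } from '@/lib/chunking'

const content = await readFile('docs/handbook.md', 'utf8')
await indexDocument(content, 'handbook')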

Database Schema

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create documents table with vector column
CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  content TEXT NOT NULL,
  metadata JSONB,
  embedding vector(1536),
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Create index for fast similarity search
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

Prisma Schema with pgvector

// prisma/schema.prisma
generator client {
  provider        = "prisma-client-js"
  previewFeatures = ["postgresqlExtensions"]
}

datasource db {
  provider   = "postgresql"
  url        = env("DATABASE_URL")
  extensions = [vector]
}

model Document {
  id        String   @id @default(cuid())
  content   String
  metadata  Json?
  embedding Unsupported("vector(1536)")?
  createdAt DateTime @default(now())

  @@map("documents") // match the table name used in the raw SQL queries
}
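
Note that the raw SQL DDL and the Prisma model describe the same table in two ways; pick one as the source of truth, since they differ in id type (SERIAL integer versus cuid() string). Prisma's generated client cannot read or write Unsupported(...) columns, which is why the insert and search examples above go through $executeRaw and $queryRaw.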

Usage Instructions

  1. Set up pgvector: Enable the pgvector extension in your PostgreSQL database
  2. Generate embeddings: Use OpenAI's embedding API to convert text to vectors
  3. Store embeddings: Save vectors alongside your content in the database
  4. Create indexes: Add IVFFlat or HNSW indexes for fast similarity search
  5. Search semantically: Query using cosine distance for relevant results (see the end-to-end sketch after this list)
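
Putting the steps together, a minimal end-to-end sketch (the document text and query are illustrative placeholders):

// scripts/demo.ts
import { indexDocument } from '@/lib/chunking'
import { semanticSearch } from '@/lib/embeddings'

// Steps 2-3: chunk, embed, and store a document
await indexDocument('Our refund policy allows returns within 30 days of purchase...', 'policy-1')

// Step 5: query by meaning rather than keywords
const matches = await semanticSearch('can I get my money back?')
console.log(matches)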

Best Practices

  1. Choose the right model - text-embedding-3-small is cost-effective (1,536 dimensions); text-embedding-3-large offers higher quality (3,072 dimensions); the vector(N) column size must match the model's output
  2. Chunk appropriately - Keep chunks between 500-1500 characters with overlap
  3. Use overlap - 10-20% overlap prevents losing context at chunk boundaries
  4. Batch requests - Process multiple texts in single API calls to reduce latency
  5. Cache embeddings - Store embeddings to avoid regenerating for unchanged content
  6. Index wisely - IVFFlat builds faster and uses less memory; HNSW offers better query speed and recall at the cost of slower index builds
  7. Normalize vectors - Normalize to unit length for consistent similarity scores; OpenAI's text-embedding-3 models already return unit-length vectors, but other sources may not (see the helper sketched below)
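
For embedding sources that are not already unit-length, a minimal normalization helper (the module path is a suggestion):

// lib/normalize.ts
export function normalize(vec: number[]) {
  const norm = Math.sqrt(vec.reduce((sum, x) => sum + x * x, 0))
  if (norm === 0) return vec // zero vector: nothing to normalize
  return vec.map((x) => x / norm)
}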