# Embeddings Pattern
Generate and manage vector embeddings for semantic search, similarity matching, and retrieval-augmented generation (RAG).
## Overview
Embeddings convert text into numerical vectors that capture semantic meaning. Similar texts produce similar vectors, enabling semantic search, recommendations, and RAG applications.
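To make "similar vectors" concrete: similarity is typically measured with cosine similarity, which is 1 for identical directions and near 0 for unrelated ones. A minimal sketch:

```ts
// Cosine similarity between two embedding vectors
function cosineSimilarity(a: number[], b: number[]) {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}
```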
**When to use:**
- Semantic search (find similar content by meaning)
- RAG applications (knowledge bases, document Q&A)
- Recommendation systems
- Content deduplication
- Clustering and classification
**Key features:**
- OpenAI embedding models
- PostgreSQL vector storage with pgvector
- Semantic similarity search
- Text chunking strategies
- Batch processing
## Code Examples
### OpenAI Embeddings
```ts
// lib/embeddings.ts
import OpenAI from 'openai'

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY
})

// Embed a single piece of text
export async function getEmbedding(text: string) {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text
  })

  return response.data[0].embedding
}

// Embed several texts in one API call
export async function getEmbeddings(texts: string[]) {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: texts
  })

  return response.data.map((d) => d.embedding)
}
```
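`getEmbeddings` sends all texts in one request, but the API caps how many inputs a single embeddings request may contain, so large corpora need to be split up. A minimal batching sketch; the batch size of 100 is an illustrative assumption, not an API-documented limit:

```ts
// lib/embeddings.ts (continued)
// Embed a large list of texts in fixed-size batches
export async function getEmbeddingsBatched(
  texts: string[],
  batchSize = 100
) {
  const embeddings: number[][] = []

  for (let i = 0; i < texts.length; i += batchSize) {
    const batch = texts.slice(i, i + batchSize)
    embeddings.push(...(await getEmbeddings(batch)))
  }

  return embeddings
}
```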
### Store with Prisma + pgvector

```ts
// lib/embeddings.ts
import { prisma } from '@/lib/db'

export async function storeDocument(
  content: string,
  metadata: Record<string, any>
) {
  const embedding = await getEmbedding(content)

  // pgvector accepts the '[1,2,3]' text format, so serialize the
  // embedding array (and the metadata object) before casting
  await prisma.$executeRaw`
    INSERT INTO documents (content, metadata, embedding)
    VALUES (${content}, ${JSON.stringify(metadata)}::jsonb, ${JSON.stringify(embedding)}::vector)
  `
}
```
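A quick usage example (the metadata shape here is just an illustration):

```ts
await storeDocument('Refunds are issued within 14 days of purchase.', {
  source: 'faq',
  url: '/help/refunds'
})
```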
### Semantic Search

```ts
// lib/embeddings.ts
export async function semanticSearch(
  query: string,
  limit = 5
) {
  const queryEmbedding = await getEmbedding(query)

  // <=> is pgvector's cosine distance operator;
  // 1 - distance gives cosine similarity
  const results = await prisma.$queryRaw`
    SELECT
      id,
      content,
      metadata,
      1 - (embedding <=> ${JSON.stringify(queryEmbedding)}::vector) AS similarity
    FROM documents
    ORDER BY embedding <=> ${JSON.stringify(queryEmbedding)}::vector
    LIMIT ${limit}
  `

  return results
}
```
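`$queryRaw` returns untyped rows, so it helps to cast the result to an interface mirroring the SELECT list. A sketch (the interface name and the sample query are illustrative):

```ts
interface SearchResult {
  id: number
  content: string
  metadata: Record<string, any> | null
  similarity: number
}

const results = (await semanticSearch('how do refunds work?')) as SearchResult[]
for (const r of results) {
  console.log(r.similarity.toFixed(3), r.content.slice(0, 80))
}
```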
### Text Chunking Strategy

```ts
// lib/chunking.ts
export function chunkText(
  text: string,
  chunkSize = 1000,
  overlap = 200
) {
  const chunks: string[] = []
  let start = 0

  while (start < text.length) {
    const end = Math.min(start + chunkSize, text.length)
    chunks.push(text.slice(start, end))
    if (end === text.length) break // avoid a trailing overlap-only chunk
    start += chunkSize - overlap
  }

  return chunks
}

// Index a document with chunking
export async function indexDocument(content: string, docId: string) {
  const chunks = chunkText(content)

  for (let i = 0; i < chunks.length; i++) {
    await storeDocument(chunks[i], {
      source: 'document',
      documentId: docId,
      chunkIndex: i
    })
  }
}
```
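`indexDocument` embeds chunks one at a time; since `getEmbeddings` accepts many inputs per call, indexing can be batched. A hedged sketch that inserts rows directly instead of going through `storeDocument` (it assumes the `documents` table from the schema below):

```ts
// lib/chunking.ts (continued)
import { prisma } from '@/lib/db'
import { getEmbeddings } from '@/lib/embeddings'

export async function indexDocumentBatched(content: string, docId: string) {
  const chunks = chunkText(content)
  // One API call for all chunks instead of one per chunk
  const embeddings = await getEmbeddings(chunks)

  for (let i = 0; i < chunks.length; i++) {
    await prisma.$executeRaw`
      INSERT INTO documents (content, metadata, embedding)
      VALUES (
        ${chunks[i]},
        ${JSON.stringify({ source: 'document', documentId: docId, chunkIndex: i })}::jsonb,
        ${JSON.stringify(embeddings[i])}::vector
      )
    `
  }
}
```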
### Database Schema

```sql
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create documents table with vector column
CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  content TEXT NOT NULL,
  metadata JSONB,
  embedding vector(1536),
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Create index for fast similarity search
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
```
### Prisma Schema with pgvector

```prisma
// prisma/schema.prisma
generator client {
  provider        = "prisma-client-js"
  previewFeatures = ["postgresqlExtensions"]
}

datasource db {
  provider   = "postgresql"
  url        = env("DATABASE_URL")
  extensions = [vector]
}

// Mirrors the SQL schema above (serial id, documents table)
model Document {
  id        Int      @id @default(autoincrement())
  content   String
  metadata  Json?
  embedding Unsupported("vector(1536)")?
  createdAt DateTime @default(now()) @map("created_at")

  @@map("documents")
}
```
## Usage Instructions

1. **Set up pgvector**: Enable the pgvector extension in your PostgreSQL database
2. **Generate embeddings**: Use OpenAI's embedding API to convert text to vectors
3. **Store embeddings**: Save vectors alongside your content in the database
4. **Create indexes**: Add IVFFlat or HNSW indexes for fast similarity search
5. **Search semantically**: Query using cosine distance for relevant results
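Putting these steps together, a minimal end-to-end sketch (the document text, id, and query are placeholders):

```ts
import { indexDocument } from '@/lib/chunking'
import { semanticSearch } from '@/lib/embeddings'

// 1. Chunk, embed, and store a document
await indexDocument('Long product manual text...', 'doc-123')

// 2. Later, retrieve the most relevant chunks for a query
const matches = await semanticSearch('how do I reset the device?', 3)
console.log(matches)
```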
## Best Practices
- **Choose the right model** - `text-embedding-3-small` is cost-effective; `text-embedding-3-large` offers higher quality
- **Chunk appropriately** - Keep chunks between 500-1500 characters with overlap
- **Use overlap** - 10-20% overlap prevents losing context at chunk boundaries
- **Batch requests** - Process multiple texts in single API calls to reduce latency
- **Cache embeddings** - Store embeddings to avoid regenerating for unchanged content (see the sketch after this list)
- **Index wisely** - IVFFlat builds faster and uses less memory; HNSW offers better query speed and recall
- **Normalize vectors** - OpenAI embeddings are already unit-length; normalize vectors from other sources so cosine similarity scores stay comparable
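For the caching point, one simple approach keys embeddings by a content hash, so unchanged text never hits the API twice. A minimal in-memory sketch, continuing `lib/embeddings.ts` (a production app would more likely persist the cache in the database):

```ts
import { createHash } from 'node:crypto'

const cache = new Map<string, number[]>()

export async function getEmbeddingCached(text: string) {
  const key = createHash('sha256').update(text).digest('hex')

  const hit = cache.get(key)
  if (hit) return hit // unchanged content: reuse the stored vector

  const embedding = await getEmbedding(text)
  cache.set(key, embedding)
  return embedding
}
```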
## Related Patterns
- RAG - Use embeddings for retrieval-augmented generation
- Semantic Search - Full-text and vector search
- Database Patterns - Prisma setup and queries