Indexes are the most important tool for query optimization. A well-designed index can turn a query from seconds to milliseconds. Here's how to use them effectively.
How Indexes Work#
Without index (table scan):
Query: SELECT * FROM users WHERE email = 'john@example.com'
Table: 1,000,000 rows
Operation: Scan every row → 1,000,000 comparisons
Time: O(n)
With index (index lookup):
B-tree index on email column
Operation: Tree traversal → ~20 comparisons
Time: O(log n)
Index Types#
B-Tree (Default)#
1-- Most common, works for equality and range queries
2CREATE INDEX idx_users_email ON users(email);
3
4-- Good for:
5SELECT * FROM users WHERE email = 'john@example.com';
6SELECT * FROM users WHERE created_at > '2024-01-01';
7SELECT * FROM users WHERE name LIKE 'John%'; -- prefix search
8SELECT * FROM users ORDER BY created_at DESC;Hash Index#
1-- Only for equality comparisons, faster than B-tree for exact matches
2CREATE INDEX idx_users_email_hash ON users USING HASH (email);
3
4-- Good for:
5SELECT * FROM users WHERE email = 'john@example.com';
6
7-- NOT good for:
8SELECT * FROM users WHERE email LIKE 'john%'; -- Can't use hash
9SELECT * FROM users ORDER BY email; -- Can't use for sortingGIN (Generalized Inverted Index)#
1-- For array, JSONB, and full-text search
2CREATE INDEX idx_users_tags ON users USING GIN (tags);
3CREATE INDEX idx_products_data ON products USING GIN (metadata);
4
5-- Good for:
6SELECT * FROM users WHERE tags @> ARRAY['developer'];
7SELECT * FROM products WHERE metadata @> '{"featured": true}';GiST (Generalized Search Tree)#
1-- For geometric and full-text data
2CREATE INDEX idx_locations_point ON locations USING GIST (coordinates);
3CREATE INDEX idx_documents_search ON documents USING GIST (search_vector);
4
5-- Good for:
6SELECT * FROM locations
7WHERE coordinates <@ box '((0,0),(100,100))';Composite Indexes#
Column Order Matters#
1-- Index on (a, b, c)
2CREATE INDEX idx_composite ON orders(customer_id, status, created_at);
3
4-- Can use index:
5WHERE customer_id = 123
6WHERE customer_id = 123 AND status = 'pending'
7WHERE customer_id = 123 AND status = 'pending' AND created_at > '2024-01-01'
8
9-- Cannot use index efficiently:
10WHERE status = 'pending' -- Skips first column
11WHERE customer_id = 123 AND created_at > '2024-01-01' -- Skips middle columnDesign Principles#
1-- Put equality conditions first, range conditions last
2-- Good
3CREATE INDEX idx_orders ON orders(status, customer_id, created_at);
4WHERE status = 'pending' AND customer_id = 123 AND created_at > '2024-01-01'
5
6-- Put high-selectivity columns first
7-- If status has 5 values but customer_id has 100,000 values:
8CREATE INDEX idx_orders ON orders(customer_id, status);Covering Indexes (Index-Only Scans)#
1-- Include all columns needed by query
2CREATE INDEX idx_users_email_name ON users(email) INCLUDE (name, created_at);
3
4-- Query satisfied entirely from index, no table access
5SELECT name, created_at FROM users WHERE email = 'john@example.com';
6
7-- Check if using index-only scan
8EXPLAIN SELECT name, created_at FROM users WHERE email = 'john@example.com';
9-- Look for "Index Only Scan"Partial Indexes#
1-- Index only subset of rows
2CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';
3CREATE INDEX idx_recent_orders ON orders(customer_id) WHERE created_at > '2024-01-01';
4
5-- Much smaller than full index
6-- Only used when query matches WHERE clause
7
8-- Good for:
9SELECT * FROM users WHERE email = 'john@example.com' AND status = 'active';Expression Indexes#
1-- Index on computed values
2CREATE INDEX idx_users_lower_email ON users(LOWER(email));
3CREATE INDEX idx_orders_year ON orders(EXTRACT(YEAR FROM created_at));
4CREATE INDEX idx_products_price_cents ON products((price * 100)::integer);
5
6-- Query must match expression exactly
7SELECT * FROM users WHERE LOWER(email) = 'john@example.com'; -- Uses index
8SELECT * FROM users WHERE email = 'john@example.com'; -- Doesn't use indexAnalyzing Index Usage#
EXPLAIN ANALYZE#
1EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'john@example.com';
2
3-- Output:
4Index Scan using idx_users_email on users (cost=0.42..8.44 rows=1 width=128) (actual time=0.028..0.029 rows=1 loops=1)
5 Index Cond: (email = 'john@example.com'::text)
6Planning Time: 0.081 ms
7Execution Time: 0.048 ms
8
9-- Key things to look for:
10-- - "Index Scan" or "Index Only Scan" = using index
11-- - "Seq Scan" = table scan (might need index)
12-- - "Bitmap Index Scan" = using index, then fetching rows
13-- - actual time vs estimated costFinding Unused Indexes#
1-- PostgreSQL: Check index usage stats
2SELECT
3 schemaname || '.' || relname AS table,
4 indexrelname AS index,
5 idx_scan AS times_used,
6 pg_size_pretty(pg_relation_size(indexrelid)) AS size
7FROM pg_stat_user_indexes
8WHERE idx_scan = 0
9ORDER BY pg_relation_size(indexrelid) DESC;Finding Missing Indexes#
1-- Check for sequential scans on large tables
2SELECT
3 schemaname || '.' || relname AS table,
4 seq_scan,
5 seq_tup_read,
6 idx_scan,
7 n_live_tup AS row_count
8FROM pg_stat_user_tables
9WHERE seq_scan > 0
10 AND n_live_tup > 10000
11ORDER BY seq_tup_read DESC;Common Patterns#
Foreign Key Indexes#
1-- Always index foreign keys (not automatic in PostgreSQL)
2CREATE TABLE orders (
3 id SERIAL PRIMARY KEY,
4 customer_id INTEGER REFERENCES customers(id),
5 product_id INTEGER REFERENCES products(id)
6);
7
8CREATE INDEX idx_orders_customer ON orders(customer_id);
9CREATE INDEX idx_orders_product ON orders(product_id);Pagination#
1-- Keyset pagination is faster than OFFSET
2CREATE INDEX idx_posts_created ON posts(created_at DESC, id DESC);
3
4-- Instead of:
5SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 1000;
6
7-- Use:
8SELECT * FROM posts
9WHERE (created_at, id) < ('2024-01-15', 12345)
10ORDER BY created_at DESC, id DESC
11LIMIT 20;Soft Deletes#
-- Partial index for non-deleted rows
CREATE INDEX idx_users_active_email ON users(email) WHERE deleted_at IS NULL;
-- Most queries only need active records
SELECT * FROM users WHERE email = 'john@example.com' AND deleted_at IS NULL;Anti-Patterns#
Over-Indexing#
1-- Don't index every column
2-- Each index:
3-- - Uses disk space
4-- - Slows down writes (INSERT, UPDATE, DELETE)
5-- - Requires maintenance
6
7-- Only index columns that are:
8-- - Used in WHERE clauses
9-- - Used in JOIN conditions
10-- - Used in ORDER BYLow Selectivity Indexes#
-- Boolean columns often not worth indexing
CREATE INDEX idx_users_active ON users(is_active); -- Usually not helpful
-- Better: partial index
CREATE INDEX idx_users_inactive ON users(email) WHERE is_active = false;Functions Preventing Index Use#
1-- Index on email won't help:
2SELECT * FROM users WHERE LOWER(email) = 'john@example.com';
3SELECT * FROM users WHERE email::text = 'john@example.com';
4SELECT * FROM users WHERE COALESCE(email, '') = 'john@example.com';
5
6-- Either create expression index or rewrite queryMaintenance#
Monitoring Index Bloat#
1-- Check index bloat (PostgreSQL)
2SELECT
3 schemaname || '.' || relname AS table,
4 indexrelname AS index,
5 pg_size_pretty(pg_relation_size(indexrelid)) AS size,
6 idx_scan AS scans
7FROM pg_stat_user_indexes
8ORDER BY pg_relation_size(indexrelid) DESC;Rebuilding Indexes#
1-- Rebuild without blocking (PostgreSQL)
2REINDEX INDEX CONCURRENTLY idx_users_email;
3
4-- Or create new and swap
5CREATE INDEX CONCURRENTLY idx_users_email_new ON users(email);
6DROP INDEX idx_users_email;
7ALTER INDEX idx_users_email_new RENAME TO idx_users_email;Conclusion#
Indexes are essential for query performance but come with trade-offs. Start with indexes on foreign keys and frequently-queried columns. Use EXPLAIN ANALYZE to verify index usage. Monitor for unused indexes and remove them.
Remember: the best index is one that helps your actual queries. Profile first, then optimize.