Slow queries can cripple application performance. This guide covers techniques to analyze, understand, and optimize SQL queries for PostgreSQL and other databases.
## Understanding Query Execution
### EXPLAIN ANALYZE
Always start with understanding how your query executes:
```sql
EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id;

-- Output:
-- HashAggregate (cost=1234.56..1234.78 rows=100 width=40) (actual time=45.123..45.456 rows=98 loops=1)
--   Group Key: u.id
--   ->  Hash Right Join (cost=123.45..1200.00 rows=5000 width=36) (actual time=1.234..40.567 rows=4532 loops=1)
--         Hash Cond: (o.user_id = u.id)
--         ->  Seq Scan on orders o (cost=0.00..800.00 rows=50000 width=8) (actual time=0.012..20.345 rows=50000 loops=1)
--         ->  Hash (cost=100.00..100.00 rows=1000 width=36) (actual time=1.000..1.000 rows=1000 loops=1)
--               ->  Seq Scan on users u (cost=0.00..100.00 rows=1000 width=36) (actual time=0.010..0.800 rows=1000 loops=1)
--                     Filter: (created_at > '2024-01-01'::date)
-- Planning Time: 0.234 ms
-- Execution Time: 45.789 ms
```

### Key Metrics to Watch
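One of the most actionable checks in an EXPLAIN ANALYZE plan is comparing estimated row counts against actual ones. As a small sketch of how you might automate that check, here is a Python parser for the plan-line format shown above (the sample line is copied from that output; the regex is my own, not part of any PostgreSQL tooling):

```python
import re

# Matches "... rows=EST width=N) (actual time=A..B rows=ACT ..." in a plan line
PLAN_RE = re.compile(r"rows=(\d+) width=\d+\) \(actual time=[\d.]+\.\.[\d.]+ rows=(\d+)")

def row_estimate_ratio(plan_line: str) -> float:
    """Return actual/estimated row ratio; values far from 1.0 suggest stale statistics."""
    m = PLAN_RE.search(plan_line)
    if not m:
        raise ValueError("no row counts found in plan line")
    estimated, actual = int(m.group(1)), int(m.group(2))
    return actual / max(estimated, 1)

line = ("Hash Right Join (cost=123.45..1200.00 rows=5000 width=36) "
        "(actual time=1.234..40.567 rows=4532 loops=1)")
print(round(row_estimate_ratio(line), 2))  # 4532 estimated vs 5000 actual -> ~0.91
```

A ratio near 1.0 means the planner's statistics are accurate; an order-of-magnitude gap is a hint to run `ANALYZE` on the table.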
```sql
-- Cost: Estimated work (arbitrary units)
-- Rows: Estimated vs actual row counts
-- Time: Planning + Execution time
-- Loops: Number of times the operation executed

-- Look for:
-- 1. Seq Scan on large tables (missing index?)
-- 2. Large difference between estimated and actual rows
-- 3. Nested loops with high loop counts
-- 4. Sort operations on large datasets
```

## Common Optimizations
### 1. Use Appropriate Indexes
```sql
-- Before: Seq Scan
SELECT * FROM orders WHERE customer_email = 'john@example.com';

-- Add index
CREATE INDEX idx_orders_customer_email ON orders(customer_email);

-- After: Index Scan (much faster)

-- Composite index for multiple columns
CREATE INDEX idx_orders_status_date ON orders(status, created_at);

-- Covers queries like:
SELECT * FROM orders WHERE status = 'pending' AND created_at > '2024-01-01';
```

### 2. Avoid SELECT *
```sql
-- Bad: Fetches all columns
SELECT * FROM users WHERE id = 123;

-- Good: Only fetch what you need
SELECT id, name, email FROM users WHERE id = 123;

-- Even better with a covering index
CREATE INDEX idx_users_id_covering ON users(id) INCLUDE (name, email);
-- Enables an index-only scan
```

### 3. Optimize JOINs
```sql
-- Ensure join columns are indexed
CREATE INDEX idx_orders_user_id ON orders(user_id);

-- Use the appropriate join type
-- INNER JOIN when you need matches in both tables
SELECT u.name, o.total
FROM users u
INNER JOIN orders o ON o.user_id = u.id;

-- LEFT JOIN when you need all rows from the left table
-- (COUNT(o.id) is already 0 for users with no orders)
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id;
```

### 4. Limit Result Sets
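This section's "better" option, keyset pagination, is easiest to see as a loop that carries a cursor forward. A runnable sketch using Python's built-in sqlite3 as a stand-in for PostgreSQL (the table, data, and page size are made up); note the `id` tie-breaker, which keeps pages stable when `created_at` values repeat:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, category TEXT, created_at TEXT)")
rows = [(i, "electronics", f"2024-01-{i % 28 + 1:02d}") for i in range(1, 101)]
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", rows)

def fetch_page(cursor, page_size=20):
    """cursor is the (id, created_at) of the last row seen, or None for page 1."""
    if cursor is None:
        sql = """SELECT id, created_at FROM products WHERE category = ?
                 ORDER BY created_at DESC, id DESC LIMIT ?"""
        return conn.execute(sql, ("electronics", page_size)).fetchall()
    last_id, ts = cursor
    # Strictly "before" the cursor row, with id breaking created_at ties
    sql = """SELECT id, created_at FROM products
             WHERE category = ?
               AND (created_at < ? OR (created_at = ? AND id < ?))
             ORDER BY created_at DESC, id DESC LIMIT ?"""
    return conn.execute(sql, ("electronics", ts, ts, last_id, page_size)).fetchall()

page1 = fetch_page(None)
page2 = fetch_page(page1[-1])  # last row of page 1 becomes the next cursor
print(len(page1), len(page2), set(page1) & set(page2))  # two full pages, no overlap
```

Unlike OFFSET, each page costs the same regardless of how deep into the result set it is, because the index seek starts at the cursor.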
```sql
-- Bad: Fetch everything, filter in the app
SELECT * FROM products WHERE category = 'electronics';

-- Good: Paginate in the database
SELECT * FROM products
WHERE category = 'electronics'
ORDER BY created_at DESC
LIMIT 20 OFFSET 0;

-- Better: Keyset pagination for large datasets
SELECT * FROM products
WHERE category = 'electronics'
  AND created_at < '2024-01-15 10:30:00'
ORDER BY created_at DESC
LIMIT 20;
```

### 5. Optimize Subqueries
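To confirm that the rewrite below is safe, here is a check (sqlite3 as a stand-in for PostgreSQL, made-up data) that the correlated-subquery form and the JOIN + GROUP BY form return identical results:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cal');
    INSERT INTO orders (user_id) VALUES (1), (1), (1), (2);
""")

# Correlated subquery: re-executed once per users row
correlated = conn.execute("""
    SELECT u.name,
           (SELECT COUNT(*) FROM orders WHERE user_id = u.id) AS order_count
    FROM users u ORDER BY u.id
""").fetchall()

# Rewritten as a single LEFT JOIN + GROUP BY: one aggregation pass
joined = conn.execute("""
    SELECT u.name, COUNT(o.id) AS order_count
    FROM users u
    LEFT JOIN orders o ON o.user_id = u.id
    GROUP BY u.id, u.name ORDER BY u.id
""").fetchall()

print(correlated)           # [('ann', 3), ('bob', 1), ('cal', 0)]
print(correlated == joined) # True
```

The LEFT JOIN preserves users with zero orders, so the two forms agree even on the empty case.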
```sql
-- Bad: Correlated subquery (runs once per row)
SELECT u.name,
  (SELECT COUNT(*) FROM orders WHERE user_id = u.id) as order_count
FROM users u;

-- Good: JOIN with aggregation
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name;

-- Alternative: LATERAL join for complex subqueries
SELECT u.name, recent.total
FROM users u
LEFT JOIN LATERAL (
  SELECT SUM(total) as total
  FROM orders
  WHERE user_id = u.id
    AND created_at > NOW() - INTERVAL '30 days'
) recent ON true;
```

### 6. Use EXISTS Instead of IN
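A caveat worth stating up front: modern PostgreSQL usually plans `IN (subquery)` and `EXISTS` as the same semi-join, so measure before rewriting; the advice matters more on older planners and other engines. Either way, the two forms are interchangeable for non-NULL keys, as this sqlite3 sketch (made-up data) verifies:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cal');
    INSERT INTO orders (user_id, total) VALUES (1, 1500), (1, 200), (3, 2400);
""")

in_rows = conn.execute("""
    SELECT id, name FROM users
    WHERE id IN (SELECT user_id FROM orders WHERE total > 1000)
    ORDER BY id
""").fetchall()

exists_rows = conn.execute("""
    SELECT u.id, u.name FROM users u
    WHERE EXISTS (SELECT 1 FROM orders o
                  WHERE o.user_id = u.id AND o.total > 1000)
    ORDER BY u.id
""").fetchall()

print(in_rows)                 # users 1 and 3 have a qualifying order
print(in_rows == exists_rows)  # True
```

The one place the forms genuinely diverge is `NOT IN` versus `NOT EXISTS`: a single NULL in the subquery result makes `NOT IN` return no rows, so prefer `NOT EXISTS` for anti-joins.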
```sql
-- Can be slower with large subquery results
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders WHERE total > 1000);

-- Often faster: EXISTS can stop at the first match
-- (recent PostgreSQL versions typically plan both as a semi-join)
SELECT * FROM users u
WHERE EXISTS (
  SELECT 1 FROM orders o
  WHERE o.user_id = u.id AND o.total > 1000
);
```

### 7. Optimize GROUP BY
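The day-bucketing pattern shown below with `date_trunc` can be tried out locally with sqlite3, which lacks `date_trunc`; in this made-up example `strftime('%Y-%m-%d', ...)` plays the same role:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany("INSERT INTO orders (created_at) VALUES (?)", [
    ("2024-01-01 09:00:00",), ("2024-01-01 17:30:00",),
    ("2024-01-02 08:15:00",),
])

# SQLite has no date_trunc; strftime truncates timestamps to day granularity here
daily = conn.execute("""
    SELECT strftime('%Y-%m-%d', created_at) AS day, COUNT(*)
    FROM orders
    GROUP BY day
    ORDER BY day
""").fetchall()
print(daily)  # [('2024-01-01', 2), ('2024-01-02', 1)]
```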
```sql
-- Add an index on the grouped columns
CREATE INDEX idx_orders_status ON orders(status);

-- Filter before grouping
SELECT status, COUNT(*)
FROM orders
WHERE created_at > '2024-01-01'  -- Filter first
GROUP BY status;

-- Bucket by day for time-series aggregation
SELECT
  date_trunc('day', created_at) as day,
  COUNT(*)
FROM orders
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY date_trunc('day', created_at)
ORDER BY day;
```

## Advanced Techniques
### Materialized Views
```sql
-- Create a materialized view for expensive aggregations
CREATE MATERIALIZED VIEW daily_sales AS
SELECT
  date_trunc('day', created_at) as day,
  COUNT(*) as order_count,
  SUM(total) as revenue
FROM orders
GROUP BY date_trunc('day', created_at);

-- A UNIQUE index is required for REFRESH ... CONCURRENTLY
CREATE UNIQUE INDEX idx_daily_sales_day ON daily_sales(day);

-- Refresh periodically
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales;

-- Query the view instead
SELECT * FROM daily_sales WHERE day > '2024-01-01';
```

### Partitioning
```sql
-- Partition large tables by time
CREATE TABLE orders (
  id SERIAL,
  user_id INTEGER,
  total DECIMAL,
  created_at TIMESTAMP
) PARTITION BY RANGE (created_at);

-- Create partitions
CREATE TABLE orders_2024_q1 PARTITION OF orders
  FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');

CREATE TABLE orders_2024_q2 PARTITION OF orders
  FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');

-- Queries automatically use partition pruning
SELECT * FROM orders WHERE created_at > '2024-03-01';
-- Only scans orders_2024_q1 and later partitions
```

### Common Table Expressions (CTEs)
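CTEs work the same way across engines, so the query below can be exercised locally. A runnable miniature of it (sqlite3 stand-in, made-up data, date filters dropped for brevity):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO orders (user_id, total) VALUES (1, 10.0), (1, 25.0);
""")

# Same shape as the article's query: aggregate in a CTE, then LEFT JOIN
result = conn.execute("""
    WITH user_orders AS (
        SELECT user_id, SUM(total) AS total_spent
        FROM orders
        GROUP BY user_id
    )
    SELECT u.name, COALESCE(uo.total_spent, 0) AS total_spent
    FROM users u
    LEFT JOIN user_orders uo ON uo.user_id = u.id
    ORDER BY total_spent DESC
""").fetchall()
print(result)  # [('ann', 35.0), ('bob', 0)]
```

One PostgreSQL-specific note: before version 12, CTEs were optimization fences (always materialized); newer versions inline them unless you write `WITH ... AS MATERIALIZED`.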
```sql
-- Use CTEs for readability and reuse
WITH active_users AS (
  SELECT id, name
  FROM users
  WHERE last_login > NOW() - INTERVAL '30 days'
),
user_orders AS (
  SELECT user_id, SUM(total) as total_spent
  FROM orders
  WHERE created_at > NOW() - INTERVAL '30 days'
  GROUP BY user_id
)
SELECT
  au.name,
  COALESCE(uo.total_spent, 0) as total_spent
FROM active_users au
LEFT JOIN user_orders uo ON uo.user_id = au.id
ORDER BY total_spent DESC;
```

## Query Anti-Patterns
### 1. Functions on Indexed Columns
```sql
-- Bad: Can't use a plain index on email
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';

-- Good: Expression index
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

-- Or store normalized data
ALTER TABLE users ADD COLUMN email_normalized VARCHAR(255);
CREATE INDEX idx_users_email_normalized ON users(email_normalized);
```

### 2. Implicit Type Conversions
```sql
-- Bad: user_id is INTEGER, comparing to a string
SELECT * FROM orders WHERE user_id = '123';

-- Good: Use the correct type
SELECT * FROM orders WHERE user_id = 123;
```

### 3. OR Conditions
```sql
-- Bad: OR can prevent index usage
SELECT * FROM orders
WHERE status = 'pending' OR status = 'processing';

-- Good: Use IN
SELECT * FROM orders
WHERE status IN ('pending', 'processing');

-- Or UNION for conditions on different columns
-- (UNION, not UNION ALL, so rows matching both conditions aren't duplicated)
SELECT * FROM orders WHERE status = 'pending'
UNION
SELECT * FROM orders WHERE customer_id = 123;
```

### 4. LIKE with Leading Wildcard
```sql
-- Bad: Can't use a B-tree index
SELECT * FROM products WHERE name LIKE '%phone%';

-- Good: Full-text search
CREATE INDEX idx_products_search ON products USING GIN(to_tsvector('english', name));

SELECT * FROM products
WHERE to_tsvector('english', name) @@ to_tsquery('phone');
```

## Monitoring Query Performance
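When pg_stat_statements isn't available (or you're not on PostgreSQL), the same idea can be approximated in the application: per-query call counts and cumulative execution time. A minimal sketch using sqlite3; the wrapper and its names are my own, not a real library:

```python
import sqlite3
import time
from collections import defaultdict

# Tiny application-side stand-in for pg_stat_statements:
# accumulate calls and total wall-clock time per SQL string
stats = defaultdict(lambda: {"calls": 0, "total_time": 0.0})

def timed_execute(conn, sql, params=()):
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    entry = stats[sql]
    entry["calls"] += 1
    entry["total_time"] += time.perf_counter() - start
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(100)])

for _ in range(3):
    timed_execute(conn, "SELECT COUNT(*) FROM t")

# Equivalent of "ORDER BY total_exec_time DESC LIMIT 1"
slowest = max(stats.items(), key=lambda kv: kv[1]["total_time"])
print(slowest[0], slowest[1]["calls"])
```

This catches hot queries at the application layer; the pg_stat_statements queries below do the same server-side, across all clients.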
```sql
-- Enable query statistics
-- (requires pg_stat_statements in shared_preload_libraries, then:)
CREATE EXTENSION pg_stat_statements;

-- Find the slowest queries
SELECT
  query,
  calls,
  total_exec_time / 1000 as total_seconds,
  mean_exec_time / 1000 as avg_seconds,
  rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

-- Find queries reading the most blocks
SELECT
  query,
  shared_blks_read + shared_blks_hit as total_blocks,
  rows
FROM pg_stat_statements
ORDER BY shared_blks_read DESC
LIMIT 20;
```

## Conclusion
Query optimization is iterative. Start with EXPLAIN ANALYZE, identify bottlenecks, add appropriate indexes, and restructure queries. Monitor production queries continuously to catch regressions early.