Database performance can make or break an application. Slow queries frustrate users, overloaded databases crash systems, and poor schema design creates technical debt that compounds over time. AI helps at every stage—from initial design to production optimization.
## Query Optimization
### Analyzing Slow Queries
Start with EXPLAIN ANALYZE output:
Analyze this slow query and suggest optimizations:
Query:
SELECT u.*, COUNT(o.id) as order_count, SUM(o.total) as total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id
ORDER BY total_spent DESC
LIMIT 100;
EXPLAIN ANALYZE output:
[paste output]
Current execution time: 4.2 seconds
Target: Under 100ms
AI identifies:
- Missing indexes
- Inefficient join strategies
- Opportunities for query rewriting
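For the query above, a typical recommendation targets the filter and join columns. A hedged sketch (index names are illustrative, and `INCLUDE` requires PostgreSQL 11+):

```sql
-- Hypothetical indexes for the slow query above; names are illustrative.
-- The filter on users.created_at benefits from a plain b-tree index:
CREATE INDEX idx_users_created_at ON users (created_at);

-- The join probes orders by user_id; including total lets the aggregate
-- read from the index without heap fetches (index-only scan):
CREATE INDEX idx_orders_user_id_total ON orders (user_id) INCLUDE (total);
```

Always confirm with a fresh EXPLAIN ANALYZE that the planner actually uses the new indexes.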
### Index Recommendations
Recommend indexes for these query patterns:
Table: orders (10M rows)
Columns: id, user_id, status, created_at, total, shipping_address_id
Query patterns:
1. WHERE user_id = ? ORDER BY created_at DESC (1000/sec)
2. WHERE status = 'pending' AND created_at < ? (10/min)
3. WHERE created_at BETWEEN ? AND ? (100/day)
4. Full-text search on shipping address (rare)
Current indexes: PRIMARY(id), INDEX(user_id)
Recommend additional indexes with impact analysis.
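One plausible answer, useful for calibrating what the AI returns (index names are illustrative):

```sql
-- Pattern 1 (1000/sec) dominates: a composite index serves both the
-- equality filter and the sort, so the planner can stop early.
CREATE INDEX idx_orders_user_created ON orders (user_id, created_at DESC);

-- Pattern 2 touches only 'pending' rows; a partial index stays small
-- and cheap to maintain on a 10M-row table:
CREATE INDEX idx_orders_pending_created
  ON orders (created_at) WHERE status = 'pending';
```

Pattern 3 is already covered by the leading `created_at` of the partial index only for pending rows; at 100 queries/day a full-table range scan may be acceptable, so weigh write overhead before adding a third index.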
### Query Rewriting
Rewrite this query for better performance:
```sql
SELECT *
FROM products p
WHERE p.category_id IN (
SELECT c.id FROM categories c WHERE c.parent_id = 5
)
AND p.price < (
SELECT AVG(price) FROM products WHERE category_id = p.category_id
)
ORDER BY p.created_at DESC;
```
Correlated subquery runs for every row. Optimize this.
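One common rewrite computes the per-category averages once and joins against them; this should be equivalent for this query, but verify with EXPLAIN ANALYZE on your data:

```sql
-- Decorrelated version: the AVG is computed once per category,
-- not once per product row.
SELECT p.*
FROM products p
JOIN categories c
  ON c.id = p.category_id AND c.parent_id = 5
JOIN (
  SELECT category_id, AVG(price) AS avg_price
  FROM products
  GROUP BY category_id
) a ON a.category_id = p.category_id
WHERE p.price < a.avg_price
ORDER BY p.created_at DESC;
```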
## Schema Design
### Normalization Analysis
Review this schema for normalization issues:
```sql
CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  customer_name VARCHAR(255),
  customer_email VARCHAR(255),
  customer_phone VARCHAR(50),
  customer_address TEXT,
  product_name VARCHAR(255),
  product_price DECIMAL(10,2),
  product_category VARCHAR(100),
  quantity INT,
  order_date TIMESTAMP
);
```
Identify normalization issues and suggest improved schema.
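A good response separates customers, products, and line items. One sketch of a normalized form (table and column names are illustrative):

```sql
-- Customer and product data move to their own tables, removing the
-- repeated-group and update-anomaly problems of the flat design.
CREATE TABLE customers (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255), email VARCHAR(255),
  phone VARCHAR(50), address TEXT
);

CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255), price DECIMAL(10,2), category VARCHAR(100)
);

CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  customer_id INT NOT NULL REFERENCES customers(id),
  order_date TIMESTAMP
);

CREATE TABLE order_items (
  order_id INT REFERENCES orders(id),
  product_id INT REFERENCES products(id),
  quantity INT,
  unit_price DECIMAL(10,2),  -- price at time of sale, kept deliberately
  PRIMARY KEY (order_id, product_id)
);
```

Note the deliberate exception: `unit_price` is copied into `order_items` because historical orders must not change when the catalog price does.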
### Denormalization Strategy
Design denormalization for this read-heavy use case:
Normalized tables:
- users (id, name, email)
- posts (id, user_id, title, content, created_at)
- comments (id, post_id, user_id, content, created_at)
- likes (id, post_id, user_id)
Read patterns:
- Post feed with author name, comment count, like count (10k/sec)
- Post detail with all comments and authors (1k/sec)
- User profile with post count, total likes received (500/sec)
Current approach: JOIN everything
Problem: 200ms average query time
Suggest denormalization with consistency strategy.
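A common answer is counter columns on `posts`, kept in sync by triggers or application code. A hedged sketch (column and function names are illustrative; triggers add write latency, so high-write systems often prefer async updates instead):

```sql
-- Counter columns turn the 10k/sec feed query into a single-table read.
ALTER TABLE posts ADD COLUMN comment_count INT NOT NULL DEFAULT 0;
ALTER TABLE posts ADD COLUMN like_count INT NOT NULL DEFAULT 0;
ALTER TABLE posts ADD COLUMN author_name VARCHAR(255);  -- copied from users

-- Trigger keeps comment_count consistent on insert (delete path omitted).
CREATE OR REPLACE FUNCTION bump_comment_count() RETURNS trigger AS $$
BEGIN
  UPDATE posts SET comment_count = comment_count + 1
  WHERE id = NEW.post_id;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER comments_after_insert
AFTER INSERT ON comments
FOR EACH ROW EXECUTE FUNCTION bump_comment_count();
```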
### Time-Series Optimization
Design a time-series schema for IoT sensor data:
Requirements:
- 10,000 sensors reporting every minute
- Store 2 years of data (10+ billion rows)
- Query patterns: last hour, daily aggregates, monthly trends
- Alerting on threshold breaches
Options to evaluate:
- PostgreSQL with partitioning
- TimescaleDB
- ClickHouse
Recommend schema and infrastructure.
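As a baseline for comparison, plain PostgreSQL handles this with declarative range partitioning; TimescaleDB's hypertables automate the same idea. A sketch (names are illustrative):

```sql
-- Monthly range partitions: queries for "last hour" touch one partition,
-- and 2-year retention becomes a cheap DETACH/DROP of old months.
CREATE TABLE sensor_readings (
  sensor_id   INT NOT NULL,
  recorded_at TIMESTAMPTZ NOT NULL,
  value       DOUBLE PRECISION,
  PRIMARY KEY (sensor_id, recorded_at)
) PARTITION BY RANGE (recorded_at);

CREATE TABLE sensor_readings_2024_01 PARTITION OF sensor_readings
  FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
-- ...one partition per month, usually created by a scheduled job.
```

At 10k sensors x 1/min (~14.4M rows/day), also plan for pre-aggregated daily rollup tables so monthly-trend queries never scan raw rows.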
## Connection Management
### Pool Sizing
Calculate optimal connection pool settings:
Database: PostgreSQL 14
Server: 8 vCPUs, 32GB RAM
Application servers: 10 pods, each with a connection pool
Workload: mixed read/write, average query time 5ms
Current issues:
- "connection refused" errors during traffic spikes
- Idle connections consuming memory
- Occasional "too many connections" errors
Recommend:
- max_connections for PostgreSQL
- Pool size per application instance
- Idle timeout settings
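A commonly cited starting point is the PostgreSQL wiki sizing formula, `connections ≈ (cores * 2) + effective_spindle_count`; treat the numbers below as illustrative, not tuned:

```sql
-- 8 vCPUs on SSD => roughly 16-20 usefully busy connections server-wide.
-- With 10 pods, that suggests a small per-pod pool (2-3) plus headroom;
-- over-sized pools just queue inside PostgreSQL instead of the app.
-- Cap the server comfortably above the sum of all pools:
ALTER SYSTEM SET max_connections = 100;  -- illustrative; requires a restart
```

With 5ms queries, a small pool still sustains high throughput (one connection can serve ~200 queries/sec), which is why the "connection refused" spikes usually indicate pools sized too large, not too small.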
### Connection Pooler Setup
Configure PgBouncer for this workload:
Application: 50 serverless functions
Database: PostgreSQL (max 100 connections)
Query patterns: short queries, mostly reads
PgBouncer settings to configure:
- Pool mode (session vs transaction vs statement)
- Pool size
- Reserve pool
- Server lifetime
Provide pgbouncer.ini configuration.
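For reference, one plausible shape of that configuration (values are starting points to tune, not recommendations; the database name and host are placeholders):

```ini
; Illustrative pgbouncer.ini for many short-lived serverless clients.
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
pool_mode = transaction      ; serverless clients hold no session state
default_pool_size = 20       ; well under the server's 100-connection cap
reserve_pool_size = 5        ; burst headroom
reserve_pool_timeout = 3     ; seconds before dipping into the reserve
max_client_conn = 1000       ; many function instances may connect at once
server_lifetime = 3600       ; recycle server connections hourly
```

Transaction pooling is the usual fit here, but it disables session features such as prepared statements held across transactions and `SET`-based session state; confirm the application does not rely on them.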
## Caching Strategies
### Cache Layer Design
Design a caching strategy for this application:
Hot data:
- User sessions (1M active)
- Product catalog (50k products, updates hourly)
- Cart data (100k active carts)
- Recently viewed (per user)
Cache options:
- Application memory
- Redis cluster
- Database query cache
Requirements:
- Cache invalidation must be accurate
- Eventual consistency acceptable for catalog
- Sessions must be strongly consistent
Design the caching architecture.
### Cache Invalidation
Implement cache invalidation for this scenario:
Cached: user profile with follower count
Cache key: user:profile:{userId}
TTL: 1 hour
Invalidation triggers:
- User updates profile
- User gains/loses follower
- User changes privacy settings
Problem: Follow/unfollow is high-frequency, invalidating cache constantly.
Design efficient invalidation strategy.
## Replication and Scaling
### Read Replica Strategy
Design read replica usage for this application:
Primary: PostgreSQL (write-heavy)
Read patterns:
- User-facing queries (must be fresh)
- Analytics dashboards (can tolerate lag)
- Search indexing (batch, can lag)
- Reporting (nightly, can lag significantly)
Questions:
- How many replicas?
- How to route queries?
- How to handle replica lag?
- How to failover?
### Sharding Strategy
Design a sharding strategy for this table:
Table: events (currently 500M rows, growing 10M/day)
Schema: id, user_id, event_type, data, created_at
Access patterns:
- 99% queries filter by user_id
- Occasional queries across all users (analytics)
- Time-range queries within user's events
Evaluate:
- Shard by user_id hash
- Shard by time range
- Hybrid approach
Recommend sharding key and migration strategy.
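Because 99% of queries filter by `user_id`, hashing on it keeps them single-shard. Declarative hash partitioning sketches the same idea on one node (a sketch, not a full sharding setup):

```sql
-- Hash-partitioned events table: each user's rows land in one partition,
-- so user-scoped queries prune to a single partition/shard.
CREATE TABLE events (
  id         BIGINT,
  user_id    BIGINT NOT NULL,
  event_type TEXT,
  data       JSONB,
  created_at TIMESTAMPTZ NOT NULL
) PARTITION BY HASH (user_id);

CREATE TABLE events_p0 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE events_p1 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE events_p2 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE events_p3 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 3);
-- Cross-user analytics queries fan out across every partition/shard.
```

Time-range queries within a user's events are then handled by a local index on `(user_id, created_at)` inside each partition.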
## Migration and Maintenance
### Zero-Downtime Migrations
Plan a zero-downtime migration:
Change: add a NOT NULL column with a default value to the users table
Table size: 50M rows
Constraint: no downtime, no locks held longer than 1 second
Steps needed:
- Add nullable column
- Backfill data
- Add NOT NULL constraint
Provide migration scripts and rollback plan.
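A hedged sketch of the three steps (column name `is_verified` is illustrative; on PostgreSQL 11+, step 1 with a default is metadata-only and the backfill may find nothing to do, but the pattern generalizes to older versions and volatile defaults):

```sql
-- Step 1: add the column nullable; with a constant default this is
-- a metadata-only change on PG 11+ (no table rewrite).
ALTER TABLE users ADD COLUMN is_verified BOOLEAN DEFAULT false;

-- Step 2: backfill in small batches so no lock is held long
-- (run in a loop from a script until zero rows are updated).
UPDATE users SET is_verified = false
WHERE id IN (
  SELECT id FROM users WHERE is_verified IS NULL LIMIT 10000
);

-- Step 3: add a NOT VALID check, validate it without an exclusive lock,
-- then SET NOT NULL can skip the full-table scan (PG 12+).
ALTER TABLE users ADD CONSTRAINT users_is_verified_nn
  CHECK (is_verified IS NOT NULL) NOT VALID;
ALTER TABLE users VALIDATE CONSTRAINT users_is_verified_nn;
ALTER TABLE users ALTER COLUMN is_verified SET NOT NULL;
ALTER TABLE users DROP CONSTRAINT users_is_verified_nn;
```

Rollback at any step is simply `ALTER TABLE users DROP COLUMN is_verified;`, which is itself a fast metadata change.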
### Maintenance Automation
Create maintenance automation for PostgreSQL:
Tasks:
- VACUUM ANALYZE (prevent bloat)
- Reindex (prevent index bloat)
- Identify unused indexes
- Archive old data
- Monitor table sizes
Provide:
- SQL scripts for each task
- Scheduling recommendations
- Alerting thresholds
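For the unused-index task, the standard approach reads the statistics collector. A sketch (note that `idx_scan` counts are per-server, so check replicas too before dropping anything):

```sql
-- Indexes never scanned since the last stats reset, largest first;
-- candidates for removal after checking every replica.
SELECT schemaname, relname, indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```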
## Monitoring and Alerting
### Key Metrics
Define monitoring for this PostgreSQL database:
Metrics to track:
- Query performance
- Connection usage
- Replication lag
- Disk usage
- Cache hit rates
For each metric:
- How to measure
- Warning threshold
- Critical threshold
- Remediation steps
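As an example of "how to measure", the buffer-cache hit ratio comes straight from `pg_stat_database` (the ~0.99 threshold below is a common rule of thumb for OLTP, not a universal law):

```sql
-- Hit ratio per database; sustained values below ~0.99 on an OLTP
-- workload often indicate shared_buffers or RAM pressure.
SELECT datname,
       round(blks_hit::numeric / NULLIF(blks_hit + blks_read, 0), 4) AS hit_ratio
FROM pg_stat_database
WHERE blks_hit + blks_read > 0;
```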
### Query Analysis
Analyze these pg_stat_statements results:
Top queries by total time:
- SELECT * FROM products WHERE category = $1 (calls: 1M, mean: 50ms)
- INSERT INTO events ... (calls: 500k, mean: 5ms)
- SELECT * FROM users WHERE id = $1 (calls: 2M, mean: 2ms)
- UPDATE sessions SET ... (calls: 800k, mean: 100ms)
Identify optimization priorities.
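The key is to rank by total time, not mean: the 800k x 100ms UPDATE (~22 hours of database time) outweighs the slower-looking 1M x 50ms SELECT (~14 hours). The underlying query (column names per PostgreSQL 13+):

```sql
-- Optimization priority = cumulative cost, so order by total time.
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```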
## Conclusion
Database optimization is iterative and ongoing. AI accelerates each cycle—analyzing query plans, suggesting indexes, designing schemas, and identifying bottlenecks. The database that runs efficiently today may struggle tomorrow as data grows and patterns change.
Build observability into your database from the start. Monitor continuously, optimize proactively, and use AI to spot opportunities that might otherwise be missed. Your users will experience the difference in every page load.