Database replication ensures your data survives failures and scales to handle read-heavy workloads. Here's how to implement replication effectively.
Replication Patterns#
Primary-Replica (Master-Slave)#
┌──────────────┐
│ Primary │ ← All writes
│ (Master) │
└──────┬───────┘
│ Replication
├──────────────┬──────────────┐
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Replica 1 │ │ Replica 2 │ │ Replica 3 │
│ (Read) │ │ (Read) │ │ (Read) │
└──────────────┘ └──────────────┘ └──────────────┘
Benefits:
- Read scaling
- Backup without impacting primary
- Geographic distribution
Drawbacks:
- Write bottleneck
- Replication lag
- Failover complexity
Multi-Primary (Master-Master)#
┌──────────────┐ ┌──────────────┐
│ Primary 1 │◄───►│ Primary 2 │
│ (R/W) │ │ (R/W) │
└──────────────┘ └──────────────┘
Benefits:
- Write scaling
- No single point of failure
Drawbacks:
- Conflict resolution needed
- More complex
- Higher latency for consistency
PostgreSQL Replication#
Streaming Replication Setup#
1# Primary configuration (postgresql.conf)
2wal_level = replica
3max_wal_senders = 10
4wal_keep_size = 1GB
5synchronous_commit = on
6
7# pg_hba.conf - allow replication connections
8host replication replicator replica_ip/32 scram-sha-2561# Replica setup
2# Stop PostgreSQL, remove data directory
3pg_basebackup -h primary_host -D /var/lib/postgresql/data -U replicator -P -R
4
5# The -R flag creates standby.signal and configures recovery
6# Start PostgreSQL - it will begin replicatingApplication Configuration#
1// Read/write splitting with Prisma
2import { PrismaClient } from '@prisma/client';
3
4const writeClient = new PrismaClient({
5 datasources: { db: { url: process.env.DATABASE_URL_PRIMARY } },
6});
7
8const readClient = new PrismaClient({
9 datasources: { db: { url: process.env.DATABASE_URL_REPLICA } },
10});
11
12// Use appropriate client
13async function getUser(id: string) {
14 return readClient.user.findUnique({ where: { id } });
15}
16
17async function createUser(data: CreateUserInput) {
18 return writeClient.user.create({ data });
19}
20
21// Or use a single client with read replicas
22const prisma = new PrismaClient().$extends({
23 query: {
24 $allModels: {
25 async $allOperations({ operation, model, args, query }) {
26 const readOperations = ['findUnique', 'findFirst', 'findMany', 'count', 'aggregate'];
27 if (readOperations.includes(operation)) {
28 // Route to replica
29 }
30 return query(args);
31 },
32 },
33 },
34});Synchronous vs Asynchronous#
Asynchronous Replication#
Primary commits → Returns to client → Replicates later
Pros:
- Lower latency
- Primary doesn't wait
Cons:
- Data loss possible
- Replication lag
Synchronous Replication#
1-- PostgreSQL synchronous replication
2-- postgresql.conf on primary
3synchronous_standby_names = 'replica1,replica2'
4synchronous_commit = on
5
6-- Check replication status
7SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn
8FROM pg_stat_replication;Primary commits → Waits for replica → Returns to client
Pros:
- Zero data loss
- Strong consistency
Cons:
- Higher latency
- Primary blocked if replica fails
Failover Strategies#
Automatic Failover with Patroni#
1# patroni.yml
2scope: postgres-cluster
3name: node1
4
5restapi:
6 listen: 0.0.0.0:8008
7
8etcd:
9 hosts: etcd1:2379,etcd2:2379,etcd3:2379
10
11bootstrap:
12 dcs:
13 ttl: 30
14 loop_wait: 10
15 retry_timeout: 10
16 maximum_lag_on_failover: 1048576
17 postgresql:
18 use_pg_rewind: true
19 parameters:
20 wal_level: replica
21 hot_standby: on
22 max_wal_senders: 10
23
24postgresql:
25 listen: 0.0.0.0:5432
26 data_dir: /var/lib/postgresql/data
27 authentication:
28 replication:
29 username: replicator
30 password: secretConnection Pooling with PgBouncer#
1; pgbouncer.ini
2[databases]
3mydb = host=primary_host port=5432 dbname=mydb
4mydb_ro = host=replica_host port=5432 dbname=mydb
5
6[pgbouncer]
7listen_addr = 0.0.0.0
8listen_port = 6432
9auth_type = scram-sha-256
10pool_mode = transaction
11max_client_conn = 1000
12default_pool_size = 25Multi-Region Replication#
US-East (Primary)
│
├──► US-West (Async Replica)
│
└──► EU-West (Async Replica)
Considerations:
- Network latency (50-200ms cross-region)
- Conflict resolution strategy
- Data sovereignty requirements
- Failover and failback procedures
AWS RDS Multi-AZ#
1// Automatic failover with RDS
2// Connection string points to endpoint that routes to current primary
3const connectionString = `postgres://user:pass@mydb.cluster-xxx.us-east-1.rds.amazonaws.com:5432/mydb`;
4
5// Read replicas for scaling reads
6const readReplicaString = `postgres://user:pass@mydb.cluster-ro-xxx.us-east-1.rds.amazonaws.com:5432/mydb`;Monitoring Replication#
1-- PostgreSQL replication lag
2SELECT
3 client_addr,
4 state,
5 pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes,
6 pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) / 1024 / 1024 AS lag_mb
7FROM pg_stat_replication;
8
9-- On replica: check lag
10SELECT
11 CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
12 THEN 0
13 ELSE EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())
14 END AS replication_lag_seconds;1# Prometheus alerting
2- alert: ReplicationLagHigh
3 expr: pg_replication_lag_seconds > 30
4 for: 5m
5 labels:
6 severity: warning
7 annotations:
8 summary: "Database replication lag is {{ $value }}s"Conclusion#
Database replication is essential for high availability and read scaling. Start with primary-replica for most cases, add synchronous replication for zero data loss requirements, and implement automatic failover for production reliability.
Monitor replication lag continuously and test failover procedures regularly. The best replication setup is one you've practiced recovering from.