
Database Backup and Recovery Strategies

Protect your data from loss. From backup strategies to point-in-time recovery to disaster recovery planning.

Bootspring Team
Engineering
December 5, 2023
6 min read

Data loss can destroy a business. A solid backup strategy protects against hardware failures, human errors, and disasters. Here's how to implement bulletproof backups.

Backup Types#

**Full Backup**
- Complete copy of the database
- Longest to create, fastest to restore
- Use for: weekly/monthly backups

**Incremental Backup**
- Only changes since the last backup (full or incremental)
- Fastest to create, but restoring requires the full backup plus every increment
- Use for: daily/hourly backups

**Differential Backup**
- Changes since the last full backup
- Medium creation time; restoring needs only the full backup plus the latest differential
- Use for: daily backups
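The restore-chain difference between incremental and differential backups can be sketched in shell. The filenames below are hypothetical, but the logic is the tradeoff itself: incrementals need every link in the chain, differentials need only two files.

```shell
#!/bin/sh
# Hypothetical weekly scheme: a Sunday full backup, then one
# incremental OR one differential per weekday.

# Incremental restore: the full backup plus EVERY increment up to the target day.
incremental_chain() {
  target_day=$1   # e.g. 3 = Wednesday
  chain="full_sun.bak"
  day=1
  while [ "$day" -le "$target_day" ]; do
    chain="$chain incr_day${day}.bak"
    day=$((day + 1))
  done
  echo "$chain"
}

# Differential restore: only the full backup plus the LATEST differential.
differential_chain() {
  target_day=$1
  echo "full_sun.bak diff_day${target_day}.bak"
}

incremental_chain 3    # full_sun.bak incr_day1.bak incr_day2.bak incr_day3.bak
differential_chain 3   # full_sun.bak diff_day3.bak
```

The longer the incremental chain, the more files must survive intact for a restore to succeed, which is why incrementals pair well with frequent fulls.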

PostgreSQL Backup#

pg_dump (Logical Backup)#

```bash
# Full database backup (custom format)
pg_dump -h localhost -U postgres -d mydb -F c -f backup.dump

# Specific tables
pg_dump -h localhost -U postgres -d mydb -t users -t orders -F c -f partial.dump

# Schema only
pg_dump -h localhost -U postgres -d mydb --schema-only -f schema.sql

# Data only
pg_dump -h localhost -U postgres -d mydb --data-only -f data.sql

# Compressed backup (plain SQL piped through gzip)
pg_dump -h localhost -U postgres -d mydb | gzip > backup.sql.gz

# Restore a custom-format dump
pg_restore -h localhost -U postgres -d mydb backup.dump

# Restore with options
pg_restore -h localhost -U postgres -d mydb \
  --clean --if-exists \
  --no-owner --no-privileges \
  backup.dump
```
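A backup you cannot read is no backup at all, so it is worth sanity-checking the archive before relying on it. A minimal sketch for gzip-compressed dumps (the filename and helper are assumptions, not part of the commands above):

```shell
#!/bin/sh
# Verify a gzip-compressed backup: confirm the compressed stream is
# intact, then record a checksum so later copies can be compared.
verify_gzip_backup() {
  backup_file=$1
  if ! gzip -t "$backup_file" 2>/dev/null; then
    echo "CORRUPT: $backup_file"
    return 1
  fi
  sha256sum "$backup_file" > "${backup_file}.sha256"
  echo "OK: $backup_file"
}
```

For custom-format dumps, `pg_restore --list backup.dump` additionally lists the archive's table of contents without restoring anything, catching truncated files early.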

pg_basebackup (Physical Backup)#

```bash
# Physical backup for point-in-time recovery
pg_basebackup -h localhost -U replication -D /backup/base \
  -Fp -Xs -P -R

# With compression (tar format)
pg_basebackup -h localhost -U replication -D /backup/base \
  -Ft -z -Xs -P
```

Continuous Archiving (WAL)#

```bash
# postgresql.conf
archive_mode = on
archive_command = 'cp %p /archive/wal/%f'
wal_level = replica

# Point-in-time recovery (PostgreSQL 12+: set these in postgresql.conf
# and create an empty recovery.signal file in the data directory)
restore_command = 'cp /archive/wal/%f %p'
recovery_target_time = '2024-01-15 14:30:00'
```

Automated Backup Script#

```bash
#!/bin/bash
# backup.sh

set -e

# Configuration
DB_HOST="${DB_HOST:-localhost}"
DB_NAME="${DB_NAME:-mydb}"
DB_USER="${DB_USER:-postgres}"
BACKUP_DIR="/backups"
RETENTION_DAYS=30
S3_BUCKET="my-backups"
DATE=$(date +%Y%m%d_%H%M%S)

# Create backup
BACKUP_FILE="${BACKUP_DIR}/${DB_NAME}_${DATE}.dump"
echo "Creating backup: ${BACKUP_FILE}"

pg_dump -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" \
  -F c -f "$BACKUP_FILE"

# Compress
gzip "$BACKUP_FILE"
BACKUP_FILE="${BACKUP_FILE}.gz"

# Upload to S3
echo "Uploading to S3..."
aws s3 cp "$BACKUP_FILE" "s3://${S3_BUCKET}/postgres/${DB_NAME}/"

# Cleanup old local backups
echo "Cleaning up old backups..."
find "$BACKUP_DIR" -name "*.dump.gz" -mtime +$RETENTION_DAYS -delete

# Cleanup old S3 backups
aws s3 ls "s3://${S3_BUCKET}/postgres/${DB_NAME}/" | \
  while read -r line; do
    createDate=$(echo "$line" | awk '{print $1" "$2}')
    createDate=$(date -d "$createDate" +%s)
    olderThan=$(date -d "-${RETENTION_DAYS} days" +%s)
    if [[ $createDate -lt $olderThan ]]; then
      fileName=$(echo "$line" | awk '{print $4}')
      aws s3 rm "s3://${S3_BUCKET}/postgres/${DB_NAME}/$fileName"
    fi
  done

echo "Backup complete: ${BACKUP_FILE}"
```

Kubernetes CronJob#

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
spec:
  schedule: "0 2 * * *"  # Daily at 2 AM
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              # Note: the stock postgres image does not include the AWS CLI;
              # use a custom image that ships both pg_dump and aws.
              image: postgres:15
              command:
                - /bin/bash
                - -c
                - |
                  pg_dump -h $DB_HOST -U $DB_USER -d $DB_NAME -F c | \
                    gzip | \
                    aws s3 cp - s3://$S3_BUCKET/backups/$(date +%Y%m%d).dump.gz
              env:
                - name: DB_HOST
                  value: postgres-service
                - name: S3_BUCKET
                  value: my-backups
                - name: DB_USER
                  valueFrom:
                    secretKeyRef:
                      name: postgres-secret
                      key: username
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: postgres-secret
                      key: password
                - name: AWS_ACCESS_KEY_ID
                  valueFrom:
                    secretKeyRef:
                      name: aws-secret
                      key: access-key
                - name: AWS_SECRET_ACCESS_KEY
                  valueFrom:
                    secretKeyRef:
                      name: aws-secret
                      key: secret-key
          restartPolicy: OnFailure
```

Recovery Procedures#

```typescript
// Recovery runbook
const recoveryProcedure = {
  steps: [
    {
      name: 'Assess the situation',
      actions: [
        'Identify what data was lost',
        'Determine recovery point needed',
        'Notify stakeholders',
      ],
    },
    {
      name: 'Prepare recovery environment',
      actions: [
        'Provision new database server if needed',
        'Download backup from S3',
        'Verify backup integrity',
      ],
    },
    {
      name: 'Restore database',
      commands: [
        'createdb -h localhost -U postgres mydb_restored',
        'pg_restore -h localhost -U postgres -d mydb_restored backup.dump',
      ],
    },
    {
      name: 'Verify data',
      actions: [
        'Run data integrity checks',
        'Compare row counts',
        'Test application connectivity',
      ],
    },
    {
      name: 'Switch traffic',
      actions: [
        'Update connection strings',
        'Restart application servers',
        'Monitor for errors',
      ],
    },
  ],
};
```

Backup Testing#

```typescript
// Automated backup verification.
// exec, query, getExpectedCounts, verifyConstraints, verifyIndexes, and
// runSampleQueries are application-specific helpers.
interface CheckResult {
  passed: boolean;
  details?: unknown;
}

async function verifyBackup(backupPath: string): Promise<boolean> {
  const tempDb = `verify_${Date.now()}`;

  try {
    // Create temp database
    await exec(`createdb ${tempDb}`);

    // Restore backup
    await exec(`pg_restore -d ${tempDb} ${backupPath}`);

    // Run integrity checks
    const checks = await Promise.all([
      verifyTableCounts(tempDb),
      verifyConstraints(tempDb),
      verifyIndexes(tempDb),
      runSampleQueries(tempDb),
    ]);

    return checks.every((c) => c.passed);
  } finally {
    // Cleanup
    await exec(`dropdb ${tempDb}`);
  }
}

async function verifyTableCounts(db: string): Promise<CheckResult> {
  const result = await query(db, `
    SELECT schemaname, relname, n_live_tup
    FROM pg_stat_user_tables
    ORDER BY n_live_tup DESC
  `);

  // Compare with expected counts, tolerating small drift
  const expected = await getExpectedCounts();
  const mismatches = result.filter(
    (row) => Math.abs(row.n_live_tup - expected[row.relname]) > 100
  );

  return {
    passed: mismatches.length === 0,
    details: mismatches,
  };
}
```

Disaster Recovery Plan#

```markdown
## DR Checklist

### RTO (Recovery Time Objective): 4 hours
### RPO (Recovery Point Objective): 1 hour

### Primary Failure
1. [ ] Detect failure (monitoring alerts)
2. [ ] Assess impact
3. [ ] Initiate failover to replica
4. [ ] Update DNS/load balancer
5. [ ] Verify application connectivity
6. [ ] Notify stakeholders

### Regional Failure
1. [ ] Activate DR region
2. [ ] Restore from cross-region backup
3. [ ] Update global DNS
4. [ ] Scale infrastructure
5. [ ] Verify all services operational

### Data Corruption
1. [ ] Identify affected tables/timeframe
2. [ ] Stop writes to affected tables
3. [ ] Restore from point-in-time backup
4. [ ] Reconcile any lost transactions
5. [ ] Resume normal operations

### Contact List
- DBA On-Call: +1-xxx-xxx-xxxx
- Infrastructure: +1-xxx-xxx-xxxx
- CTO: +1-xxx-xxx-xxxx
```
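The RPO in the checklist is only meaningful if it is enforced: an alert should fire whenever the newest backup is older than the objective. A minimal sketch, assuming backups land in a local directory (the directory path and function name are illustrative):

```shell
#!/bin/sh
# Fail (exit 1) if no file in $1 is newer than $2 minutes -- the RPO.
check_rpo() {
  backup_dir=$1
  rpo_minutes=$2
  # find -mmin -N matches files modified less than N minutes ago
  newest=$(find "$backup_dir" -type f -mmin "-$rpo_minutes" | head -n 1)
  if [ -z "$newest" ]; then
    echo "RPO VIOLATION: no backup in $backup_dir newer than ${rpo_minutes} minutes"
    return 1
  fi
  echo "RPO OK"
}

# Example: a 1-hour RPO means a backup must exist from the last 60 minutes.
# check_rpo /backups 60
```

Wiring this into a cron job or monitoring check turns the RPO from a document into an invariant that pages someone when it breaks.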

Best Practices#

**DO:**
- ✓ Test backups regularly (monthly)
- ✓ Store backups in multiple regions
- ✓ Encrypt backups at rest
- ✓ Monitor backup job success
- ✓ Document recovery procedures
- ✓ Practice recovery drills

**DON'T:**
- ✗ Store backups only on the same server
- ✗ Skip backup verification
- ✗ Ignore backup failures
- ✗ Keep backups indefinitely (storage costs add up)
- ✗ Forget to back up WAL files needed for PITR

Conclusion#

Backups are insurance—you hope to never need them, but when you do, they're invaluable. Automate backups, test recovery regularly, and document procedures for when disaster strikes.

The time to prepare for data loss is before it happens.
