19. Database Replication

How databases copy data across multiple nodes for durability, read scaling, and fault tolerance.


[!NOTE] Replication means keeping identical copies of data on multiple machines. It serves three purposes: fault tolerance (if one machine dies, others still have the data), read scaling (spread read traffic across replicas), and geographic locality (place data closer to users). The tradeoff is always consistency vs latency.

Single-Leader Replication

The most common replication model. One node (the leader/primary) accepts all writes. It streams changes to one or more follower/replica nodes, which serve read traffic.

  Writes ─────→ [Leader/Primary]
                    │         │
                    ▼         ▼
              [Follower 1] [Follower 2]
                    ↑         ↑
              Reads ──────────┘
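
To make this concrete, here is a minimal application-side routing sketch in Python. It is purely illustrative (the connection strings and the `route` helper are made up, not part of any driver): writes go to the leader, reads rotate across the followers.

```python
import itertools

# Hypothetical endpoints; in practice these would be real database clients.
LEADER = "postgres://primary:5432"
FOLLOWERS = ["postgres://replica1:5432", "postgres://replica2:5432"]
_read_cycle = itertools.cycle(FOLLOWERS)

def route(query: str) -> str:
    """Send writes to the leader; spread reads across followers round-robin."""
    is_write = query.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE"))
    return LEADER if is_write else next(_read_cycle)

print(route("UPDATE users SET name = 'Alice' WHERE id = 123"))  # -> leader
print(route("SELECT name FROM users WHERE id = 123"))           # -> a follower
```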

Synchronous vs Asynchronous Replication

| Type | How It Works | Pros | Cons |
|------|--------------|------|------|
| Synchronous | Leader waits for followers to confirm the write | Guaranteed consistency — followers always have the latest data | Higher write latency; one slow follower blocks everything |
| Asynchronous | Leader confirms the write immediately, streams to followers in the background | Low write latency; fast | Replication lag — followers may serve stale data |
| Semi-synchronous | Leader waits for at least one follower (not all) | Balance of durability and performance | Slightly more complex |

Real-world: PostgreSQL streaming replication defaults to asynchronous. For critical financial systems, you can enable synchronous_commit = on with a synchronous standby, accepting higher write latency for guaranteed durability.
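
The difference between the three modes is simply how many follower acknowledgements the leader waits for before confirming the write. The sketch below illustrates that idea in Python; `replicate_to` is a stand-in for the real replication stream, so treat this as a toy model rather than how any specific database implements it.

```python
from concurrent.futures import ALL_COMPLETED, FIRST_COMPLETED, ThreadPoolExecutor, wait

pool = ThreadPoolExecutor(max_workers=4)  # kept alive so async replication can finish later

def replicate_to(follower: str, record: dict) -> str:
    # Stand-in for shipping the write over the network to one follower.
    return f"ack from {follower}"

def write(record: dict, followers: list[str], mode: str = "semi-sync") -> None:
    """Apply the write locally, then wait for follower ACKs depending on the mode."""
    # ... commit to the leader's local storage here ...
    futures = [pool.submit(replicate_to, f, record) for f in followers]
    if mode == "sync":           # wait for every follower: consistent, but slow
        wait(futures, return_when=ALL_COMPLETED)
    elif mode == "semi-sync":    # wait for at least one follower
        wait(futures, return_when=FIRST_COMPLETED)
    # mode == "async": confirm immediately; replication continues in the background

write({"user:123": "Alice"}, ["follower-1", "follower-2"])
```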

Replication Lag

In async replication, the follower is always slightly behind the leader. This delay is called replication lag. Under normal load, it is milliseconds. Under heavy write load, it can grow to seconds or longer.

Replication lag causes these consistency anomalies:

  • Read-your-writes violation: A user updates their profile on the leader, then reads from a follower that has not replicated yet — they see their old data. Fix: route reads after writes to the leader for that user, or use a "read-after-write" token.
  • Monotonic reads violation: A user makes two reads, each served by a different follower. The second follower might be further behind, showing older data than the first read. Fix: pin a user to a specific follower (using consistent hashing by user ID).

Multi-Leader Replication

Multiple nodes accept writes independently. Each leader replicates to every other leader. This is useful for multi-datacenter deployments where you want low-latency writes in every region.

  [Leader US-East] ←──────→ [Leader EU-West]
       │                          │
  [Follower]                 [Follower]

The big problem is write conflicts. If two users edit the same record simultaneously on different leaders, you have a conflict that must be resolved.

Conflict Resolution Strategies

| Strategy | How | Pro | Con |
|----------|-----|-----|-----|
| Last Writer Wins (LWW) | Highest timestamp wins | Simple, deterministic | Silently drops one write; clock skew issues |
| Merge | Combine both values (e.g., union of sets) | No data loss | Only works for certain data types |
| Custom logic | App-level conflict handler | Full control | Complex to implement correctly |

Real-world: CouchDB supports multi-leader replication with custom conflict handlers. Google Docs uses Operational Transformation (OT) — a form of merge-based resolution for collaborative editing.
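
As a toy illustration of the first two strategies (the Version type and field names are invented for this example; real systems attach version or timestamp metadata to every replicated write):

```python
from dataclasses import dataclass

@dataclass
class Version:
    value: set          # e.g., a user's set of tags
    timestamp: float    # wall-clock time of the write (subject to clock skew!)

def last_writer_wins(a: Version, b: Version) -> Version:
    """LWW: keep the write with the higher timestamp; the other is silently dropped."""
    return a if a.timestamp >= b.timestamp else b

def merge(a: Version, b: Version) -> Version:
    """Merge: union the sets so neither write is lost (only works for mergeable types)."""
    return Version(a.value | b.value, max(a.timestamp, b.timestamp))

us_east = Version({"admin"}, timestamp=100.0)
eu_west = Version({"beta-tester"}, timestamp=101.0)
print(last_writer_wins(us_east, eu_west).value)  # {'beta-tester'}  (the admin tag is lost)
print(merge(us_east, eu_west).value)             # {'admin', 'beta-tester'}
```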

Leaderless Replication

No single node is the leader. Writes and reads go to multiple nodes simultaneously. This model is used by DynamoDB, Cassandra, and Riak.

Quorum Reads and Writes

With N replicas, you configure:

  • W: Number of replicas that must confirm a write
  • R: Number of replicas you read from
  • If W + R > N, you are guaranteed to read the latest write (at least one node overlaps).
Example: N=3, W=2, R=2

Write "user:123 → Alice" → sent to all 3, wait for 2 ACKs
Read  "user:123"          → read from 2 nodes, pick latest version

W + R = 4 > 3 = N → guaranteed to overlap with latest write
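
The same scenario as a toy simulation (data structures are invented for illustration): the write reaches only W of the N replicas, yet any R-replica read still returns the newest value because the two sets must overlap.

```python
import random

N, W, R = 3, 2, 2    # replicas, write quorum, read quorum
assert W + R > N     # the overlap guarantee from the text

# Each replica stores key -> (value, version); all start out of date.
replicas = [{"user:123": ("old", 0)} for _ in range(N)]

def quorum_write(key: str, value: str, version: int) -> None:
    # The write is sent to all N, but only W need to acknowledge; here we
    # simulate the slowest replica missing the write entirely.
    for replica in random.sample(replicas, W):
        replica[key] = (value, version)

def quorum_read(key: str) -> str:
    # Read from R replicas and keep the value with the highest version.
    answers = [replica[key] for replica in random.sample(replicas, R)]
    return max(answers, key=lambda v: v[1])[0]

quorum_write("user:123", "Alice", version=1)
print(quorum_read("user:123"))  # always "Alice", because W + R > N
```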

Sloppy Quorums and Hinted Handoff

If a node is temporarily unreachable during a write, the write is sent to a different available node instead (sloppy quorum). That temporary node stores a "hint" and forwards the data to the correct node when it comes back online (hinted handoff). This prioritizes availability over strict consistency — a classic AP tradeoff.

Real-world: Amazon's Dynamo design (the paper behind DynamoDB, Cassandra, and Riak) uses sloppy quorums and hinted handoff to achieve its famous availability guarantees.
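
Roughly, the mechanism looks like this (a self-contained toy where in-memory dicts stand in for each node's local storage):

```python
storage = {"A": {}, "B": {}, "C": {}, "D": {}}
hints: list[tuple[str, str, str]] = []   # (intended_node, key, value)

def sloppy_write(key: str, value: str, preferred: list[str], alive: set[str]) -> None:
    for node in preferred:
        if node in alive:
            storage[node][key] = value
        else:
            # Node is down: write to some other healthy node and record a hint.
            fallback = next(n for n in alive if n not in preferred)
            storage[fallback][key] = value
            hints.append((node, key, value))

def hinted_handoff(recovered: str) -> None:
    """When a node comes back, forward the writes it missed and drop the hints."""
    global hints
    for intended, key, value in [h for h in hints if h[0] == recovered]:
        storage[recovered][key] = value
    hints = [h for h in hints if h[0] != recovered]

sloppy_write("user:123", "Alice", preferred=["A", "B", "C"], alive={"A", "C", "D"})
hinted_handoff("B")
print(storage["B"])  # {'user:123': 'Alice'} once B is back online
```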

Redis vs Memcached: The Replication Difference

Both are in-memory caches, but they differ significantly in replication:

| Feature | Redis | Memcached |
|---------|-------|-----------|
| Replication | Built-in leader-follower replication | No built-in replication |
| Persistence | RDB snapshots + AOF append-only file | None (pure in-memory) |
| Data types | Strings, hashes, lists, sets, sorted sets, streams, HyperLogLog | Strings only (key-value) |
| Clustering | Redis Cluster with automatic sharding | Client-side sharding (no coordination) |
| Pub/Sub | Built-in | Not supported |
| Atomic operations | Lua scripting, transactions (MULTI/EXEC) | CAS (compare-and-swap) only |
| Best for | Complex caching, sessions, queues, real-time features | Simple key-value caching at massive scale |

[!TIP] When to choose which: Use Redis when you need data structures beyond simple strings, persistence, replication, or pub/sub. Use Memcached when you need the simplest possible cache with maximum multi-threaded throughput and your data fits a pure key-value model. Facebook runs the world's largest Memcached deployment because their cache usage is almost entirely simple string lookups at extreme scale.

Gossip Protocol: How Nodes Discover Each Other

In leaderless and peer-to-peer systems, nodes need to know which other nodes are alive, what data they hold, and their health status. The Gossip Protocol (also called epidemic protocol) solves this:

  1. Every few seconds, each node picks a random peer and exchanges state information.
  2. That peer then does the same with another random peer.
  3. Information spreads exponentially — like a rumor or a virus — reaching all nodes in O(log N) rounds.

Gossip is used by Cassandra for failure detection and cluster membership, DynamoDB for state propagation, and Consul for service discovery. It is inherently decentralized — no single point of failure.
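
The O(log N) convergence is easy to see with a small simulation (illustrative only; real implementations exchange digests of membership and health state rather than a single rumor):

```python
import math
import random

def gossip_rounds(num_nodes: int, seed: int = 42) -> int:
    """Count rounds until a rumor started by node 0 reaches every node."""
    random.seed(seed)
    informed = {0}
    rounds = 0
    while len(informed) < num_nodes:
        # Each informed node gossips to one randomly chosen peer this round.
        for _ in range(len(informed)):
            informed.add(random.randrange(num_nodes))
        rounds += 1
    return rounds

for n in (10, 100, 1_000, 10_000):
    print(n, gossip_rounds(n), round(math.log2(n), 1))  # rounds grow roughly like log N
```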

Replication Lag Patterns & Solutions

Replication lag is the delay between a write on the leader and its appearance on followers. Even sub-second lag can cause bugs if your application isn't designed for it.

Read-After-Write Consistency

Problem: User updates their profile, but the next page load reads from a follower that hasn't received the update yet. The user thinks their change was lost.

Timeline:
  t=0ms: User writes "name=Alice" → Leader
  t=1ms: Leader confirms write → User sees success
  t=2ms: User refreshes page → Read goes to Follower
  t=2ms: Follower hasn't received update yet → Shows old name!
  t=50ms: Follower receives update (too late)

Solutions:

  • Read-your-own-writes: After a write, route that user's reads to the leader for a short window (e.g., 10 seconds); a sketch follows this list.
  • Causal consistency: Track the write's log position. Only serve reads from followers that have caught up past that position.
  • Client-side cache: Show the written data from the client's local state until the next full refresh.
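
Here is a sketch of the first option (the window length, names, and in-memory tracking are illustrative):

```python
import time

READ_YOUR_WRITES_WINDOW = 10.0        # seconds, as in the text
last_write_at: dict[str, float] = {}  # user_id -> timestamp of their last write

def record_write(user_id: str) -> None:
    last_write_at[user_id] = time.monotonic()

def pick_read_target(user_id: str) -> str:
    """Route to the leader shortly after that user's own write, otherwise to a follower."""
    since_write = time.monotonic() - last_write_at.get(user_id, float("-inf"))
    return "leader" if since_write < READ_YOUR_WRITES_WINDOW else "follower"

record_write("user:123")
print(pick_read_target("user:123"))  # "leader" for the next ~10 seconds
print(pick_read_target("user:456"))  # "follower" (no recent write)
```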

Monotonic Read Consistency

Problem: User reads data from Follower A (which is up-to-date), then reads from Follower B (which is lagging). The data appears to "go back in time."

Solution: Pin each user session to one specific follower (sticky sessions), or use a version vector to ensure monotonically increasing reads.
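
A minimal pinning sketch (plain modulo hashing here as a stand-in for the consistent hashing mentioned earlier; replica names are placeholders):

```python
import hashlib

FOLLOWERS = ["replica-1", "replica-2", "replica-3"]

def follower_for(user_id: str) -> str:
    """Always map the same user to the same follower so their reads never go back in time."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return FOLLOWERS[int(digest, 16) % len(FOLLOWERS)]

print(follower_for("user:123"))  # the same replica on every call for this user
```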

Setting Up PostgreSQL Streaming Replication

On primary (postgresql.conf):

```
wal_level = replica
max_wal_senders = 5
synchronous_commit = on  # or 'off' for async
```

On replica:

```
primary_conninfo = 'host=primary_ip port=5432 user=replicator'
primary_slot_name = 'replica_1'
```

Monitor lag:

```sql
SELECT client_addr, state, sent_lsn, write_lsn,
       flush_lsn, replay_lsn,
       (sent_lsn - replay_lsn) AS replication_lag
FROM pg_stat_replication;
```

Failover: What Happens When the Leader Dies?

Leader failure is the most critical event in a replicated system. The process of promoting a follower to leader is called failover:

Normal operation:     After leader failure:
  Leader ← writes      Leader (dead) ✗
    ↓                   Follower 1 → promoted to NEW Leader ✓
  Follower 1            Follower 2 → repoints to new leader
  Follower 2

Dangers of failover:

  • Data loss: If the old leader had unreplicated writes, those are lost when the new leader takes over.
  • Split brain: If the old leader comes back online, both nodes think they are the leader. Fencing (STONITH) prevents this; see the sketch after this list.
  • Cascading failures: The failover itself can trigger more load on the remaining nodes, causing a cascade.
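
In software, a common complement to STONITH is a fencing token: every newly elected leader receives a strictly higher epoch number, and storage nodes reject writes that carry a stale epoch. A minimal sketch (class and field names are invented for illustration):

```python
class FencedStore:
    """Storage that rejects writes carrying a stale leadership epoch (fencing token)."""

    def __init__(self) -> None:
        self.highest_epoch = 0
        self.data: dict[str, str] = {}

    def write(self, key: str, value: str, epoch: int) -> bool:
        if epoch < self.highest_epoch:
            return False                 # an old "zombie" leader: reject its writes
        self.highest_epoch = epoch
        self.data[key] = value
        return True

store = FencedStore()
store.write("user:123", "Alice", epoch=2)       # new leader, accepted
print(store.write("user:123", "Bob", epoch=1))  # old leader after failover -> False
```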

Real-World: GitHub's 2018 Database Incident

In October 2018, a 43-second network partition between GitHub's primary MySQL database and its replicas caused a failover. The new primary had subtly different data than the old one. This led to 24 hours of degraded service as GitHub painstakingly reconciled the divergent data. The lesson: automated failover without proper consistency guarantees can cause more damage than manual intervention.

Common Mistakes

  • ❌ Assuming replication = backup — replication copies live operations (including deletes!). If you accidentally delete data, the delete replicates. Use separate backup snapshots for disaster recovery.
  • ❌ Ignoring replication lag — if your application reads from followers, you must handle stale data gracefully. Use read-after-write consistency patterns.
  • ❌ Using multi-leader without a conflict strategy — "last writer wins" silently drops data. Be intentional about conflict resolution.
  • ❌ Not monitoring replication lag — always track replay_lag and alert when it exceeds your SLO. A lagging replica is a ticking time bomb during failover.
  • ❌ Trusting automated failover blindly — GitHub's 2018 incident proves that automated failover can make things worse without proper data reconciliation.

[!TIP] Key Takeaways:
• Single-leader: simplest model, one write point, scale reads via followers.
• Multi-leader: low-latency writes in every region, but conflict resolution is hard.
• Leaderless: high availability via quorums (W + R > N), used by DynamoDB and Cassandra.
• Replication lag causes read-after-write and monotonic read anomalies — use sticky sessions or causal consistency.
• Failover is dangerous: risk of data loss, split brain, and cascading failures.
• Gossip protocol: decentralized node discovery, O(log N) convergence.
• Redis offers replication + rich data types; Memcached is simpler but faster for pure key-value.
