> [!NOTE]
> Replication means keeping identical copies of data on multiple machines. It serves three purposes: fault tolerance (if one machine dies, others still have the data), read scaling (spread read traffic across replicas), and geographic locality (place data closer to users). The tradeoff is always consistency vs. latency.
## Single-Leader Replication

The most common replication model. One node (the leader, or primary) accepts all writes and streams changes to one or more followers (replicas), which serve read traffic.
```
Writes ─────→ [Leader/Primary]
                 │        │
                 ▼        ▼
          [Follower 1] [Follower 2]
                 ↑        ↑
Reads ───────────┴────────┘
```
### Synchronous vs Asynchronous Replication
| Type | How It Works | Pros | Cons |
|---|---|---|---|
| Synchronous | Leader waits for follower to confirm the write | Guaranteed consistency — followers always have latest data | Higher write latency; one slow follower blocks everything |
| Asynchronous | Leader confirms the write immediately, streams to followers in background | Low write latency; fast | Replication lag — followers may serve stale data |
| Semi-synchronous | Leader waits for at least 1 follower (not all) | Balance of durability and performance | Slightly more complex |
Real-world: PostgreSQL streaming replication is asynchronous by default. For critical financial systems, you can configure a synchronous standby (via `synchronous_standby_names`) and keep `synchronous_commit = on`, accepting higher write latency in exchange for guaranteed durability.
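The semi-synchronous row in the table above can be sketched in Python: the leader ships the write to every follower in parallel and acknowledges the commit as soon as the required number of ACKs arrive. `replicate` and `semi_sync_commit` are hypothetical helpers for illustration, not a real database API:

```python
import concurrent.futures
import time

def replicate(ack_delay_s):
    """Simulate shipping one write to a follower that acks after ack_delay_s."""
    time.sleep(ack_delay_s)
    return True

def semi_sync_commit(follower_ack_delays, required_acks=1):
    """Leader-side sketch: ship the write to all followers in parallel and
    acknowledge the commit once `required_acks` of them have confirmed."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(follower_ack_delays))
    futures = [pool.submit(replicate, d) for d in follower_ack_delays]
    acks = 0
    for done in concurrent.futures.as_completed(futures):
        if done.result():
            acks += 1
        if acks >= required_acks:
            pool.shutdown(wait=False)  # slow followers keep catching up asynchronously
            return True
    pool.shutdown(wait=False)
    return False
```

With `required_acks=1` this is semi-synchronous; `required_acks=len(followers)` would be fully synchronous, and `required_acks=0` degenerates to asynchronous.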
### Replication Lag
In async replication, the follower is always slightly behind the leader. This delay is called replication lag. Under normal load, it is milliseconds. Under heavy write load, it can grow to seconds or longer.
Replication lag causes these consistency anomalies:
- Read-your-writes violation: A user updates their profile on the leader, then reads from a follower that has not replicated yet — they see their old data. Fix: route reads after writes to the leader for that user, or use a "read-after-write" token.
- Monotonic reads violation: A user makes two reads, each served by a different follower. If the second follower is further behind than the first, the second read shows older data than the first. Fix: pin each user to a specific follower (e.g., by consistent hashing on user ID).
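The pinning fix for monotonic reads can be sketched with a stable hash. This is a toy sketch: a production system would use a consistent-hash ring so that adding or removing a follower remaps only a fraction of users, and the follower names here are assumptions:

```python
import hashlib

FOLLOWERS = ["follower-1", "follower-2", "follower-3"]  # hypothetical replica names

def follower_for(user_id):
    """Route a given user's reads to the same follower every time, so two
    consecutive reads can never appear to go back in time across replicas."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return FOLLOWERS[int.from_bytes(digest[:8], "big") % len(FOLLOWERS)]
```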
## Multi-Leader Replication
Multiple nodes accept writes independently. Each leader replicates to every other leader. This is useful for multi-datacenter deployments where you want low-latency writes in every region.
```
[Leader US-East] ←──────→ [Leader EU-West]
       │                        │
  [Follower]               [Follower]
```
The big problem is write conflicts. If two users edit the same record simultaneously on different leaders, you have a conflict that must be resolved.
### Conflict Resolution Strategies
| Strategy | How | Pro | Con |
|---|---|---|---|
| Last Writer Wins (LWW) | Highest timestamp wins | Simple, deterministic | Silently drops one write; clock skew issues |
| Merge | Combine both values (e.g., union of sets) | No data loss | Only works for certain data types |
| Custom logic | App-level conflict handler | Full control | Complex to implement correctly |
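The first two strategies in the table above can be sketched in a few lines, assuming each LWW write carries a `(timestamp, value)` pair and merge-able values are plain sets:

```python
def lww(a, b):
    """Last-writer-wins: each write is a (timestamp, value) pair; keep the
    higher timestamp. Simple and deterministic, but silently drops the loser."""
    return a if a[0] >= b[0] else b

def merge_sets(a, b):
    """Merge resolution for set-valued data: the union keeps both writes."""
    return a | b
```

Note how `lww` makes data loss explicit: one of the two writes simply vanishes, which is exactly the silent-drop risk the table warns about.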
Real-world: CouchDB supports multi-leader replication with custom conflict handlers. Google Docs uses Operational Transformation (OT) — a form of merge-based resolution for collaborative editing.
## Leaderless Replication
No single node is the leader. Writes and reads go to multiple nodes simultaneously. This model was popularized by Amazon's Dynamo paper and is used by Cassandra and Riak.
### Quorum Reads and Writes
With N replicas, you configure:
- W: Number of replicas that must confirm a write
- R: Number of replicas you read from
- If W + R > N, you are guaranteed to read the latest write (at least one node overlaps).
Example: N=3, W=2, R=2

```
Write "user:123 → Alice" → sent to all 3 replicas, wait for 2 ACKs
Read  "user:123"         → read from 2 replicas, pick the latest version
W + R = 4 > 3 = N        → the read set overlaps the write set in at least 1 node
```
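The quorum rule and the "pick latest version" read can be sketched as follows, assuming each replica returns a `(version, value)` pair (e.g., a monotonically increasing write version):

```python
def quorum_ok(n, w, r):
    """W + R > N guarantees the read set and write set share at least one replica."""
    return w + r > n

def quorum_read(responses):
    """Given (version, value) pairs from R replicas, return the newest value.
    Stale replicas lose the comparison, so the reader sees the latest write."""
    _, value = max(responses, key=lambda p: p[0])
    return value
```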
### Sloppy Quorums and Hinted Handoff
If a node is temporarily unreachable during a write, the write is sent to a different available node instead (sloppy quorum). That temporary node stores a "hint" and forwards the data to the correct node when it comes back online (hinted handoff). This prioritizes availability over strict consistency — a classic AP tradeoff.
Real-world: Amazon's Dynamo (the internal system described in the 2007 paper, a different system from DynamoDB) used sloppy quorums and hinted handoff to achieve its famous availability guarantees; Cassandra and Riak offer the same mechanisms.
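A minimal sketch of a sloppy write with hinted handoff, using hypothetical `Node` objects and a reachability flag that a real system would get from its failure detector:

```python
class Node:
    """Toy replica: `data` holds keys it owns; `hints` holds writes parked
    here on behalf of an unreachable peer."""
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.hints = []  # list of (home_node_name, key, value)

def sloppy_write(key, value, home, fallback, home_is_up):
    """Write to the home replica if reachable; otherwise accept the write on a
    fallback node together with a hint naming its real home (sloppy quorum)."""
    if home_is_up:
        home.data[key] = value
    else:
        fallback.hints.append((home.name, key, value))

def hinted_handoff(fallback, home):
    """When the home node recovers, replay its hinted writes and drop the hints."""
    kept = []
    for home_name, key, value in fallback.hints:
        if home_name == home.name:
            home.data[key] = value
        else:
            kept.append((home_name, key, value))
    fallback.hints = kept
```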
## Redis vs Memcached: The Replication Difference
Both are in-memory caches, but they differ significantly in replication:
| Feature | Redis | Memcached |
|---|---|---|
| Replication | Built-in leader-follower replication | No built-in replication |
| Persistence | RDB snapshots + AOF append-only file | None (pure in-memory) |
| Data types | Strings, hashes, lists, sets, sorted sets, streams, HyperLogLog | Strings only (key-value) |
| Clustering | Redis Cluster with automatic sharding | Client-side sharding (no coordination) |
| Pub/Sub | Built-in | Not supported |
| Atomic operations | Lua scripting, transactions (MULTI/EXEC) | CAS (compare-and-swap) only |
| Best for | Complex caching, sessions, queues, real-time features | Simple key-value caching at massive scale |
> [!TIP]
> When to choose which: use Redis when you need data structures beyond simple strings, persistence, replication, or pub/sub. Use Memcached when you need the simplest possible cache with maximum multi-threaded throughput and your data fits a pure key-value model. Facebook runs the world's largest Memcached deployment because its cache usage is almost entirely simple string lookups at extreme scale.
## Gossip Protocol: How Nodes Discover Each Other
In leaderless and peer-to-peer systems, nodes need to know which other nodes are alive, what data they hold, and their health status. The Gossip Protocol (also called epidemic protocol) solves this:
- Every few seconds, each node picks a random peer and exchanges state information.
- That peer then does the same with another random peer.
- Information spreads exponentially — like a rumor or a virus — reaching all nodes in O(log N) rounds.
Gossip is used by Cassandra for failure detection and cluster membership, by Amazon's Dynamo for membership propagation, and by Consul for service discovery. It is inherently decentralized — no single point of failure.
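The O(log N) convergence claim can be checked with a toy simulation in which each informed node tells one random peer per round (real implementations exchange state with several peers per round, which spreads even faster):

```python
import random

def gossip_rounds(n, seed=0):
    """Simulate rumor spread: each informed node tells one random peer per
    round. Returns how many rounds it takes until all n nodes are informed."""
    rng = random.Random(seed)
    informed = {0}  # node 0 starts with the rumor
    rounds = 0
    while len(informed) < n:
        rounds += 1
        for node in list(informed):
            informed.add(rng.randrange(n))  # pick a random peer to gossip with
    return rounds
```

For n=1024 nodes this converges in a few dozen rounds at most, in line with logarithmic spread.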
## Replication Lag Patterns & Solutions
Replication lag is the delay between a write on the leader and its appearance on followers. Even sub-second lag can cause bugs if your application isn't designed for it.
### Read-After-Write Consistency
Problem: User updates their profile, but the next page load reads from a follower that hasn't received the update yet. The user thinks their change was lost.
Timeline:

```
t=0ms:  User writes "name=Alice" → Leader
t=1ms:  Leader confirms write    → User sees success
t=2ms:  User refreshes page      → Read goes to a Follower
t=2ms:  Follower hasn't received the update yet → shows the old name!
t=50ms: Follower receives the update (too late)
```
Solutions:
- Read-your-own-writes: After a write, route that user's reads to the leader for a short window (e.g., 10 seconds).
- Causal consistency: Track the write's log position. Only serve reads from followers that have caught up past that position.
- Client-side cache: Show the written data from the client's local state until the next full refresh.
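The read-your-own-writes solution above can be sketched as a routing rule. The 10-second window and the `leader`/`follower` labels are assumptions for illustration; tune the window to your typical lag:

```python
import time

LEADER, FOLLOWER = "leader", "follower"  # assumed routing targets
READ_YOUR_WRITES_WINDOW_S = 10.0         # assumed window; tune to observed lag

last_write_at = {}                       # user_id -> time of that user's last write

def record_write(user_id, now=None):
    """Call this after acknowledging a user's write on the leader."""
    last_write_at[user_id] = time.monotonic() if now is None else now

def route_read(user_id, now=None):
    """Route reads to the leader shortly after that user's own write;
    otherwise spread them across followers."""
    now = time.monotonic() if now is None else now
    if now - last_write_at.get(user_id, float("-inf")) < READ_YOUR_WRITES_WINDOW_S:
        return LEADER
    return FOLLOWER
```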
### Monotonic Read Consistency
Problem: User reads data from Follower A (which is up-to-date), then reads from Follower B (which is lagging). The data appears to "go back in time."
Solution: Pin each user session to one specific follower (sticky sessions), or use a version vector to ensure monotonically increasing reads.
## Setting Up PostgreSQL Streaming Replication

On the primary (`postgresql.conf`):

```
wal_level = replica
max_wal_senders = 5
synchronous_commit = on   # or 'off' for async
```

On the replica:

```
primary_conninfo = 'host=primary_ip port=5432 user=replicator'
primary_slot_name = 'replica_1'
```

Monitor lag:

```sql
SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn,
       (sent_lsn - replay_lsn) AS replication_lag
FROM pg_stat_replication;
```
## Failover: What Happens When the Leader Dies?
Leader failure is the most critical event in a replicated system. The process of promoting a follower to leader is called failover:
```
Normal operation:          After leader failure:

Leader  ← writes           Leader (dead) ✗
  ↓                        Follower 1 → promoted to NEW leader ✓
Follower 1                 Follower 2 → repoints to new leader
Follower 2
```
Dangers of failover:
- Data loss: If the old leader had unreplicated writes, those are lost when the new leader takes over.
- Split brain: If the old leader comes back online, both nodes think they are the leader. Fencing (STONITH) prevents this.
- Cascading failures: The failover itself can trigger more load on the remaining nodes, causing a cascade.
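The fencing idea mentioned under split brain can be sketched with fencing tokens: each newly elected leader receives a higher epoch number, and storage rejects writes stamped with an older epoch. This is a toy sketch of the token check, not a real STONITH implementation:

```python
class FencedStore:
    """Storage that rejects writes carrying a stale leadership epoch, so a
    deposed leader that comes back online cannot overwrite the new leader."""
    def __init__(self):
        self.highest_epoch = 0
        self.data = {}

    def write(self, key, value, epoch):
        if epoch < self.highest_epoch:
            return False              # fenced off: this leader was deposed
        self.highest_epoch = epoch
        self.data[key] = value
        return True
```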
### Real-World: GitHub's 2018 Database Incident

In October 2018, a 43-second network partition between GitHub's primary MySQL database and its replicas triggered a failover. The new primary had subtly different data than the old one, leading to 24 hours of degraded service as GitHub painstakingly reconciled the divergent data. The lesson: automated failover without proper consistency guarantees can cause more damage than manual intervention.
## Common Mistakes
- ❌ Assuming replication = backup — replication copies live operations (including deletes!). If you accidentally delete data, the delete replicates. Use separate backup snapshots for disaster recovery.
- ❌ Ignoring replication lag — if your application reads from followers, you must handle stale data gracefully. Use read-after-write consistency patterns.
- ❌ Using multi-leader without a conflict strategy — "last writer wins" silently drops data. Be intentional about conflict resolution.
- ❌ Not monitoring replication lag — always track `replay_lag` and alert when it exceeds your SLO. A lagging replica is a ticking time bomb during failover.
- ❌ Trusting automated failover blindly — GitHub's 2018 incident proves that automated failover can make things worse without proper data reconciliation.
> [!TIP]
> Key Takeaways:
> • Single-leader: simplest model, one write point, scale reads via followers.
> • Multi-leader: low-latency writes in every region, but conflict resolution is hard.
> • Leaderless: high availability via quorums (W + R > N), used by Dynamo-style stores such as Cassandra and Riak.
> • Replication lag causes read-after-write and monotonic-read anomalies — use sticky sessions or causal consistency.
> • Failover is dangerous: risk of data loss, split brain, and cascading failures.
> • Gossip protocol: decentralized node discovery, O(log N) convergence.
> • Redis offers replication + rich data types; Memcached is simpler but faster for pure key-value workloads.