19. Database Replication

How databases copy data across multiple nodes for durability, read scaling, and fault tolerance.


[!NOTE] Replication means keeping identical copies of data on multiple machines. It serves three purposes: fault tolerance (if one machine dies, others still have the data), read scaling (spread read traffic across replicas), and geographic locality (place data closer to users). The tradeoff is always consistency vs latency.

Single-Leader Replication

The most common replication model. One node (the leader/primary) accepts all writes. It streams changes to one or more follower/replica nodes, which serve read traffic.

  Writes ─────→ [Leader/Primary]
                    │         │
                    ▼         ▼
              [Follower 1] [Follower 2]
                    ↑         ↑
              Reads ──────────┘
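
To make this concrete, here is a minimal application-side routing sketch in Python. It is purely illustrative (the connection strings and the `route` helper are made up, not part of any driver): writes go to the leader, reads rotate across the followers.

```python
import itertools

# Hypothetical endpoints; in practice these would be real database clients.
LEADER = "postgres://primary:5432"
FOLLOWERS = ["postgres://replica1:5432", "postgres://replica2:5432"]
_read_cycle = itertools.cycle(FOLLOWERS)

def route(query: str) -> str:
    """Send writes to the leader; spread reads across followers round-robin."""
    is_write = query.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE"))
    return LEADER if is_write else next(_read_cycle)

print(route("UPDATE users SET name = 'Alice' WHERE id = 123"))  # -> leader
print(route("SELECT name FROM users WHERE id = 123"))           # -> a follower
```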

Synchronous vs Asynchronous Replication

| Type | How It Works | Pros | Cons |
|------|--------------|------|------|
| Synchronous | Leader waits for followers to confirm the write | Guaranteed consistency — followers always have the latest data | Higher write latency; one slow follower blocks everything |
| Asynchronous | Leader confirms the write immediately, streams to followers in the background | Low write latency; fast | Replication lag — followers may serve stale data |
| Semi-synchronous | Leader waits for at least one follower (not all) | Balance of durability and performance | Slightly more complex |

Real-world: PostgreSQL streaming replication defaults to asynchronous. For critical financial systems, you can enable synchronous_commit = on with a synchronous standby, accepting higher write latency for guaranteed durability.
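
The difference between the three modes is simply how many follower acknowledgements the leader waits for before confirming the write. The sketch below illustrates that idea in Python; `replicate_to` is a stand-in for the real replication stream, so treat this as a toy model rather than how any specific database implements it.

```python
from concurrent.futures import ALL_COMPLETED, FIRST_COMPLETED, ThreadPoolExecutor, wait

pool = ThreadPoolExecutor(max_workers=4)  # kept alive so async replication can finish later

def replicate_to(follower: str, record: dict) -> str:
    # Stand-in for shipping the write over the network to one follower.
    return f"ack from {follower}"

def write(record: dict, followers: list[str], mode: str = "semi-sync") -> None:
    """Apply the write locally, then wait for follower ACKs depending on the mode."""
    # ... commit to the leader's local storage here ...
    futures = [pool.submit(replicate_to, f, record) for f in followers]
    if mode == "sync":           # wait for every follower: consistent, but slow
        wait(futures, return_when=ALL_COMPLETED)
    elif mode == "semi-sync":    # wait for at least one follower
        wait(futures, return_when=FIRST_COMPLETED)
    # mode == "async": confirm immediately; replication continues in the background

write({"user:123": "Alice"}, ["follower-1", "follower-2"])
```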

Replication Lag

In async replication, the follower is always slightly behind the leader. This delay is called replication lag. Under normal load, it is milliseconds. Under heavy write load, it can grow to seconds or longer.

Replication lag causes these consistency anomalies:

  • Read-your-writes violation: A user updates their profile on the leader, then reads from a follower that has not replicated yet — they see their old data. Fix: route reads after writes to the leader for that user, or use a "read-after-write" token.
  • Monotonic reads violation: A user makes two reads, each served by a different follower. The second follower might be further behind, showing older data than the first read. Fix: pin a user to a specific follower (using consistent hashing by user ID).

Multi-Leader Replication

Multiple nodes accept writes independently. Each leader replicates to every other leader. This is useful for multi-datacenter deployments where you want low-latency writes in every region.

  [Leader US-East] ←──────→ [Leader EU-West]
       │                          │
  [Follower]                 [Follower]

The big problem is write conflicts. If two users edit the same record simultaneously on different leaders, you have a conflict that must be resolved.

Conflict Resolution Strategies

| Strategy | How | Pro | Con |
|----------|-----|-----|-----|
| Last Writer Wins (LWW) | Highest timestamp wins | Simple, deterministic | Silently drops one write; clock skew issues |
| Merge | Combine both values (e.g., union of sets) | No data loss | Only works for certain data types |
| Custom logic | App-level conflict handler | Full control | Complex to implement correctly |

Real-world: CouchDB supports multi-leader replication with custom conflict handlers. Google Docs uses Operational Transformation (OT) — a form of merge-based resolution for collaborative editing.
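
As a toy illustration of the first two strategies (the Version type and field names are invented for this example; real systems attach version or timestamp metadata to every replicated write):

```python
from dataclasses import dataclass

@dataclass
class Version:
    value: set          # e.g., a user's set of tags
    timestamp: float    # wall-clock time of the write (subject to clock skew!)

def last_writer_wins(a: Version, b: Version) -> Version:
    """LWW: keep the write with the higher timestamp; the other is silently dropped."""
    return a if a.timestamp >= b.timestamp else b

def merge(a: Version, b: Version) -> Version:
    """Merge: union the sets so neither write is lost (only works for mergeable types)."""
    return Version(a.value | b.value, max(a.timestamp, b.timestamp))

us_east = Version({"admin"}, timestamp=100.0)
eu_west = Version({"beta-tester"}, timestamp=101.0)
print(last_writer_wins(us_east, eu_west).value)  # {'beta-tester'}  (the admin tag is lost)
print(merge(us_east, eu_west).value)             # {'admin', 'beta-tester'}
```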

Leaderless Replication

No single node is the leader. Writes and reads go to multiple nodes simultaneously. This model is used by DynamoDB, Cassandra, and Riak.

Quorum Reads and Writes

With N replicas, you configure:

  • W: Number of replicas that must confirm a write
  • R: Number of replicas you read from
  • If W + R > N, you are guaranteed to read the latest write (at least one node overlaps).
Example: N=3, W=2, R=2

Write "user:123 → Alice" → sent to all 3, wait for 2 ACKs
Read  "user:123"          → read from 2 nodes, pick latest version

W + R = 4 > 3 = N → guaranteed to overlap with latest write
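
The same scenario as a toy simulation (data structures are invented for illustration): the write reaches only W of the N replicas, yet any R-replica read still returns the newest value because the two sets must overlap.

```python
import random

N, W, R = 3, 2, 2    # replicas, write quorum, read quorum
assert W + R > N     # the overlap guarantee from the text

# Each replica stores key -> (value, version); all start out of date.
replicas = [{"user:123": ("old", 0)} for _ in range(N)]

def quorum_write(key: str, value: str, version: int) -> None:
    # The write is sent to all N, but only W need to acknowledge; here we
    # simulate the slowest replica missing the write entirely.
    for replica in random.sample(replicas, W):
        replica[key] = (value, version)

def quorum_read(key: str) -> str:
    # Read from R replicas and keep the value with the highest version.
    answers = [replica[key] for replica in random.sample(replicas, R)]
    return max(answers, key=lambda v: v[1])[0]

quorum_write("user:123", "Alice", version=1)
print(quorum_read("user:123"))  # always "Alice", because W + R > N
```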

Sloppy Quorums and Hinted Handoff

If a node is temporarily unreachable during a write, the write is sent to a different available node instead (sloppy quorum). That temporary node stores a "hint" and forwards the data to the correct node when it comes back online (hinted handoff). This prioritizes availability over strict consistency — a classic AP tradeoff.

Real-world: Amazon's Dynamo design (the paper behind DynamoDB, Cassandra, and Riak) uses sloppy quorums and hinted handoff to achieve its famous availability guarantees.
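
Roughly, the mechanism looks like this (a self-contained toy where in-memory dicts stand in for each node's local storage):

```python
storage = {"A": {}, "B": {}, "C": {}, "D": {}}
hints: list[tuple[str, str, str]] = []   # (intended_node, key, value)

def sloppy_write(key: str, value: str, preferred: list[str], alive: set[str]) -> None:
    for node in preferred:
        if node in alive:
            storage[node][key] = value
        else:
            # Node is down: write to some other healthy node and record a hint.
            fallback = next(n for n in alive if n not in preferred)
            storage[fallback][key] = value
            hints.append((node, key, value))

def hinted_handoff(recovered: str) -> None:
    """When a node comes back, forward the writes it missed and drop the hints."""
    global hints
    for intended, key, value in [h for h in hints if h[0] == recovered]:
        storage[recovered][key] = value
    hints = [h for h in hints if h[0] != recovered]

sloppy_write("user:123", "Alice", preferred=["A", "B", "C"], alive={"A", "C", "D"})
hinted_handoff("B")
print(storage["B"])  # {'user:123': 'Alice'} once B is back online
```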

Redis vs Memcached: The Replication Difference

Both are in-memory caches, but they differ significantly in replication:

| Feature | Redis | Memcached |
|---------|-------|-----------|
| Replication | Built-in leader-follower replication | No built-in replication |
| Persistence | RDB snapshots + AOF append-only file | None (pure in-memory) |
| Data types | Strings, hashes, lists, sets, sorted sets, streams, HyperLogLog | Strings only (key-value) |
| Clustering | Redis Cluster with automatic sharding | Client-side sharding (no coordination) |
| Pub/Sub | Built-in | Not supported |
| Atomic operations | Lua scripting, transactions (MULTI/EXEC) | CAS (compare-and-swap) only |
| Best for | Complex caching, sessions, queues, real-time features | Simple key-value caching at massive scale |

[!TIP] When to choose which: Use Redis when you need data structures beyond simple strings, persistence, replication, or pub/sub. Use Memcached when you need the simplest possible cache with maximum multi-threaded throughput and your data fits a pure key-value model. Facebook runs the world's largest Memcached deployment because their cache usage is almost entirely simple string lookups at extreme scale.

Gossip Protocol: How Nodes Discover Each Other

In leaderless and peer-to-peer systems, nodes need to know which other nodes are alive, what data they hold, and their health status. The Gossip Protocol (also called epidemic protocol) solves this:

  1. Every few seconds, each node picks a random peer and exchanges state information.
  2. That peer then does the same with another random peer.
  3. Information spreads exponentially — like a rumor or a virus — reaching all nodes in O(log N) rounds.

Gossip is used by Cassandra for failure detection and cluster membership, DynamoDB for state propagation, and Consul for service discovery. It is inherently decentralized — no single point of failure.
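
The O(log N) convergence is easy to see with a small simulation (illustrative only; real implementations exchange digests of membership and health state rather than a single rumor):

```python
import math
import random

def gossip_rounds(num_nodes: int, seed: int = 42) -> int:
    """Count rounds until a rumor started by node 0 reaches every node."""
    random.seed(seed)
    informed = {0}
    rounds = 0
    while len(informed) < num_nodes:
        # Each informed node gossips to one randomly chosen peer this round.
        for _ in range(len(informed)):
            informed.add(random.randrange(num_nodes))
        rounds += 1
    return rounds

for n in (10, 100, 1_000, 10_000):
    print(n, gossip_rounds(n), round(math.log2(n), 1))  # rounds grow roughly like log N
```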

Replication Lag Patterns & Solutions

Replication lag is the delay between a write on the leader and its appearance on followers. Even sub-second lag can cause bugs if your application isn't designed for it.

Read-After-Write Consistency

Problem: User updates their profile, but the next page load reads from a follower that hasn't received the update yet. The user thinks their change was lost.

Timeline:
  t=0ms: User writes "name=Alice" → Leader
  t=1ms: Leader confirms write → User sees success
  t=2ms: User refreshes page → Read goes to Follower
  t=2ms: Follower hasn't received update yet → Shows old name!
  t=50ms: Follower receives update (too late)

Solutions:

  • Read-your-own-writes: After a write, route that user's reads to the leader for a short window (e.g., 10 seconds); a sketch follows this list.
  • Causal consistency: Track the write's log position. Only serve reads from followers that have caught up past that position.
  • Client-side cache: Show the written data from the client's local state until the next full refresh.
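
Here is a sketch of the first option (the window length, names, and in-memory tracking are illustrative):

```python
import time

READ_YOUR_WRITES_WINDOW = 10.0        # seconds, as in the text
last_write_at: dict[str, float] = {}  # user_id -> timestamp of their last write

def record_write(user_id: str) -> None:
    last_write_at[user_id] = time.monotonic()

def pick_read_target(user_id: str) -> str:
    """Route to the leader shortly after that user's own write, otherwise to a follower."""
    since_write = time.monotonic() - last_write_at.get(user_id, float("-inf"))
    return "leader" if since_write < READ_YOUR_WRITES_WINDOW else "follower"

record_write("user:123")
print(pick_read_target("user:123"))  # "leader" for the next ~10 seconds
print(pick_read_target("user:456"))  # "follower" (no recent write)
```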

Monotonic Read Consistency

Problem: User reads data from Follower A (which is up-to-date), then reads from Follower B (which is lagging). The data appears to "go back in time."

Solution: Pin each user session to one specific follower (sticky sessions), or use a version vector to ensure monotonically increasing reads.
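
A minimal pinning sketch (plain modulo hashing here as a stand-in for the consistent hashing mentioned earlier; replica names are placeholders):

```python
import hashlib

FOLLOWERS = ["replica-1", "replica-2", "replica-3"]

def follower_for(user_id: str) -> str:
    """Always map the same user to the same follower so their reads never go back in time."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return FOLLOWERS[int(digest, 16) % len(FOLLOWERS)]

print(follower_for("user:123"))  # the same replica on every call for this user
```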

Setting Up PostgreSQL Streaming Replication

On primary (postgresql.conf):

```
wal_level = replica
max_wal_senders = 5
synchronous_commit = on  # or 'off' for async
```

On replica:

```
primary_conninfo = 'host=primary_ip port=5432 user=replicator'
primary_slot_name = 'replica_1'
```

Monitor lag:

```sql
SELECT client_addr, state, sent_lsn, write_lsn,
       flush_lsn, replay_lsn,
       (sent_lsn - replay_lsn) AS replication_lag
FROM pg_stat_replication;
```

Failover: What Happens When the Leader Dies?

Leader failure is the most critical event in a replicated system. The process of promoting a follower to leader is called failover:

Normal operation:     After leader failure:
  Leader ← writes      Leader (dead) ✗
    ↓                   Follower 1 → promoted to NEW Leader ✓
  Follower 1            Follower 2 → repoints to new leader
  Follower 2

Dangers of failover:

  • Data loss: If the old leader had unreplicated writes, those are lost when the new leader takes over.
  • Split brain: If the old leader comes back online, both nodes think they are the leader. Fencing (STONITH) prevents this; see the sketch after this list.
  • Cascading failures: The failover itself can trigger more load on the remaining nodes, causing a cascade.
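
In software, a common complement to STONITH is a fencing token: every newly elected leader receives a strictly higher epoch number, and storage nodes reject writes that carry a stale epoch. A minimal sketch (class and field names are invented for illustration):

```python
class FencedStore:
    """Storage that rejects writes carrying a stale leadership epoch (fencing token)."""

    def __init__(self) -> None:
        self.highest_epoch = 0
        self.data: dict[str, str] = {}

    def write(self, key: str, value: str, epoch: int) -> bool:
        if epoch < self.highest_epoch:
            return False                 # an old "zombie" leader: reject its writes
        self.highest_epoch = epoch
        self.data[key] = value
        return True

store = FencedStore()
store.write("user:123", "Alice", epoch=2)       # new leader, accepted
print(store.write("user:123", "Bob", epoch=1))  # old leader after failover -> False
```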

Real-World: GitHub's 2018 Database Incident

In October 2018, a 43-second network partition between GitHub's primary MySQL database and its replicas caused a failover. The new primary had subtly different data than the old one. This led to 24 hours of degraded service as GitHub painstakingly reconciled the divergent data. The lesson: automated failover without proper consistency guarantees can cause more damage than manual intervention.

Common Mistakes

  • ❌ Assuming replication = backup — replication copies live operations (including deletes!). If you accidentally delete data, the delete replicates. Use separate backup snapshots for disaster recovery.
  • ❌ Ignoring replication lag — if your application reads from followers, you must handle stale data gracefully. Use read-after-write consistency patterns.
  • ❌ Using multi-leader without a conflict strategy — "last writer wins" silently drops data. Be intentional about conflict resolution.
  • ❌ Not monitoring replication lag — always track replay_lag and alert when it exceeds your SLO. A lagging replica is a ticking time bomb during failover.
  • ❌ Trusting automated failover blindly — GitHub's 2018 incident proves that automated failover can make things worse without proper data reconciliation.

[!TIP] Key Takeaways:
• Single-leader: simplest model, one write point, scale reads via followers.
• Multi-leader: low-latency writes in every region, but conflict resolution is hard.
• Leaderless: high availability via quorums (W + R > N), used by DynamoDB and Cassandra.
• Replication lag causes read-after-write and monotonic read anomalies — use sticky sessions or causal consistency.
• Failover is dangerous: risk of data loss, split brain, and cascading failures.
• Gossip protocol: decentralized node discovery, O(log N) convergence.
• Redis offers replication + rich data types; Memcached is simpler but faster for pure key-value.
