> [!NOTE]
> When you have N cache servers and you use `hash(key) % N` to decide which server holds the data, adding or removing a single server remaps almost every key. Consistent hashing solves this by ensuring only ~1/N of keys need to move. It is the backbone of DynamoDB, Cassandra, and most distributed caches.
## The Problem with Naive Hashing
Suppose you have 4 cache servers and you route requests using `server = hash(key) % 4`:

```text
hash("user:123") % 4 = 2  → Server 2
hash("user:456") % 4 = 0  → Server 0
hash("user:789") % 4 = 3  → Server 3
```
This works perfectly until Server 3 crashes. Now you have 3 servers, so the formula becomes `hash(key) % 3`:

```text
hash("user:123") % 3 = 1  → Server 1 (was 2!)
hash("user:456") % 3 = 0  → Server 0 (same)
hash("user:789") % 3 = 0  → Server 0 (was 3!)
```
Nearly every key maps to a different server. This causes a cache stampede—all those keys are now cache misses, and your database gets hammered with millions of simultaneous requests. At scale, this can cause a full outage.
How bad is it exactly? When going from N to N-1 servers with modulo hashing, approximately (N-1)/N of keys are remapped. For 100 servers losing 1: 99% of keys move. With consistent hashing, only ~1% of keys move. That's the difference between a minor blip and a total system meltdown.
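A quick way to see the damage is to count remapped keys directly. The sketch below uses a stable md5-based hash (Python's built-in `hash` is salted per process, so it would give different answers on every run); the key names are illustrative:

```python
# Demo: fraction of keys remapped when a modulo-hashed cluster shrinks
# from 4 servers to 3.
import hashlib

def stable_hash(key: str) -> int:
    # Stable across processes, unlike Python's built-in hash().
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

keys = [f"user:{i}" for i in range(100_000)]
moved = sum(1 for k in keys if stable_hash(k) % 4 != stable_hash(k) % 3)
print(f"{moved / len(keys):.1%} of keys remapped")  # ~75% for N=4 → N=3
```

A key stays put only when `h % 4 == h % 3`, which happens for 3 of every 12 hash values, so roughly 75% of keys move.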
## How Consistent Hashing Works

### The Hash Ring

Instead of using modulo, imagine a circular number line (a "ring") from 0 to 2^32 - 1. Both servers and keys are hashed onto positions on this ring:
```text
            0
          /   \
       S1       S3
      K1         K3
      K2         S2
       S4       K4
          \   /
   (positions wrap back to 0 at 2^32)
```
To find which server owns a key, you walk clockwise from the key's position until you hit a server. Key K1 → walks clockwise → hits Server S1. Key K3 → walks clockwise → hits Server S2.
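In code, the clockwise walk is just a binary search over sorted server positions. A minimal sketch, assuming an md5-based placement and made-up server names:

```python
# Clockwise lookup: find the first server position >= the key's hash,
# wrapping around to the start of the ring if we run off the top.
import bisect
import hashlib

RING_SIZE = 2**32

def h(s: str) -> int:
    # Stable hash onto the ring; md5 is an arbitrary choice for this sketch.
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % RING_SIZE

servers = ["S1", "S2", "S3", "S4"]
ring = sorted((h(s), s) for s in servers)
positions = [p for p, _ in ring]

def owner(key: str) -> str:
    i = bisect.bisect_left(positions, h(key)) % len(positions)  # wrap around
    return ring[i][1]

print(owner("user:123"))
```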
### What Happens When a Node Leaves?
If Server S3 is removed from the ring, only the keys between S2 and S3 need to be remapped to the next server clockwise (S4). All other keys stay exactly where they are.
```text
Before S3 removal:        After S3 removal:
K5 → S3                   K5 → S4  (remapped)
K6 → S3                   K6 → S4  (remapped)
K1 → S1                   K1 → S1  (unchanged)
K2 → S1                   K2 → S1  (unchanged)
K3 → S2                   K3 → S2  (unchanged)
K4 → S4                   K4 → S4  (unchanged)
```
Impact: Only ~1/N of keys move (where N = number of servers). Compare this to naive hashing where ~(N-1)/N of keys move.
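This can be measured with a small experiment. The sketch below builds a ring of 10 servers (one position each, no vnodes yet), removes one, and counts remapped keys; names are illustrative and md5 keeps the result stable across runs:

```python
# Experiment: remove one of 10 servers from a consistent-hash ring and
# count how many keys change owner. On average ~1/10 move; without vnodes
# the exact fraction depends on how large an arc the removed server owned.
import bisect
import hashlib

RING_SIZE = 2**32

def h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % RING_SIZE

def build(servers):
    return sorted((h(s), s) for s in servers)

def owner(ring, key):
    positions = [p for p, _ in ring]
    i = bisect.bisect_left(positions, h(key)) % len(ring)
    return ring[i][1]

servers = [f"server-{i}" for i in range(10)]
before = build(servers)
after = build(servers[:-1])  # "server-9" leaves the ring
keys = [f"user:{i}" for i in range(20_000)]
moved = sum(owner(before, k) != owner(after, k) for k in keys)
print(f"{moved / len(keys):.1%} of keys moved")
```

Contrast with the modulo experiment earlier: there, shrinking the cluster moved ~75% of keys; here, only the departed server's arc is affected.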
### What Happens When a Node Joins?

If a new Server S5 is added between S1 and S3, it takes over some keys from S3. Only the keys between S1 and S5's new position are affected. Everything else stays put.
### The Uneven Distribution Problem

With only 4 physical servers on the ring, the arc lengths between them are likely uneven. One server might own 50% of the ring's key space while another owns 5%. This defeats the purpose of load balancing.
Uneven distribution example (4 nodes on ring):

```text
S1: 10% of keys   ← underloaded
S2: 45% of keys   ← overloaded (hot spot)
S3: 15% of keys
S4: 30% of keys
```
### Virtual Nodes (vnodes): The Solution

Instead of placing each server at one position on the ring, place it at multiple positions, e.g. by hashing the server ID combined with a vnode index (the positions are called "virtual nodes" or "vnodes"):
```text
Server A → positions: A1, A2, A3, A4, A5 (5 vnodes)
Server B → positions: B1, B2, B3, B4, B5 (5 vnodes)
Server C → positions: C1, C2, C3, C4, C5 (5 vnodes)
```
With 150+ vnodes per server, the key distribution becomes nearly uniform. When a server is removed, its vnodes are spread around the ring, so the load is redistributed evenly across all remaining servers—not dumped on a single neighbor.
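The smoothing effect is easy to measure: compute how much of the ring each server owns with 1 vnode versus 150 vnodes. A sketch under the same md5/illustrative-name assumptions as above:

```python
# Measure load balance: the share of the ring each server owns, with and
# without vnodes. Each ring point owns the arc from its predecessor to itself.
import hashlib

RING_SIZE = 2**32

def h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % RING_SIZE

def ownership(servers, vnodes):
    points = sorted(
        (h(f"{s}:vnode:{i}"), s) for s in servers for i in range(vnodes)
    )
    share = {s: 0.0 for s in servers}
    # Pair each point with its predecessor (the last point wraps around).
    for (prev, _), (pos, s) in zip([points[-1]] + points[:-1], points):
        share[s] += (pos - prev) % RING_SIZE / RING_SIZE
    return share

servers = [f"server-{i}" for i in range(4)]
for v in (1, 150):
    loads = ownership(servers, v)
    print(f"{v:>3} vnodes → max share: {max(loads.values()):.1%}")
```

With a single position per server, one server can easily own 40-50% of the ring; with 150 vnodes, every share lands close to the ideal 25%.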
How many vnodes? The standard is 100–256 vnodes per physical node. Cassandra defaults to 256. More vnodes = better distribution but more metadata to manage. The sweet spot depends on cluster size:
| Cluster Size | Recommended Vnodes | Distribution Variance |
|---|---|---|
| 3–10 nodes | 128–256 | ±5% |
| 10–100 nodes | 64–128 | ±2% |
| 100+ nodes | 32–64 | ±1% |
## Comparison Table

| Approach | Keys Remapped on Node Removal | Load Distribution | Complexity |
|---|---|---|---|
| Naive modulo `hash % N` | ~(N-1)/N (~75% for 4 nodes) | Even only with a good hash | Trivial |
| Consistent Hashing (no vnodes) | ~1/N (~25% for 4 nodes) | Often uneven | Low |
| Consistent Hashing (with vnodes) | ~1/N, spread evenly | Nearly uniform | Medium |
| Jump Hash (Google) | ~1/N, perfectly even | Perfect | Low (but no node removal) |
## Implementation
```python
# Runnable version of the ring. SortedDict comes from the third-party
# `sortedcontainers` package (pip install sortedcontainers).
import hashlib
from sortedcontainers import SortedDict

def stable_hash(s: str) -> int:
    # Python's built-in hash() is salted per process, so a ring built with
    # it would not be reproducible; md5 gives a stable position in [0, 2^32).
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % 2**32

class ConsistentHashRing:
    def __init__(self, vnodes_per_server=150):
        self.ring = SortedDict()  # position → server_id
        self.vnodes = vnodes_per_server

    def add_server(self, server_id):
        for i in range(self.vnodes):
            position = stable_hash(f"{server_id}:vnode:{i}")
            self.ring[position] = server_id

    def remove_server(self, server_id):
        for i in range(self.vnodes):
            position = stable_hash(f"{server_id}:vnode:{i}")
            del self.ring[position]

    def get_server(self, key):
        if not self.ring:
            return None
        position = stable_hash(key)
        # Walk clockwise: find the first vnode position >= the key's hash.
        idx = self.ring.bisect_left(position)
        if idx == len(self.ring):
            idx = 0  # wrap around to the beginning of the ring
        return self.ring.values()[idx]
```
The `bisect_left` operation is O(log V), where V is the total number of vnodes on the ring. For 100 servers with 150 vnodes each, that's log₂(15,000) ≈ 14 comparisons, which is extremely fast.
## Data Replication on the Hash Ring
In production databases like Cassandra and DynamoDB, consistent hashing is extended for replication. A key is stored not just on its primary node, but also on the next R-1 nodes clockwise on the ring (where R = replication factor, typically 3):
```text
Replication Factor = 3:

Key K hashes to position P on the ring.
  → Primary:   next node clockwise from P  (Node A)
  → Replica 1: second node clockwise       (Node B)
  → Replica 2: third node clockwise        (Node C)

If Node A fails:
  → Nodes B and C still have copies
  → Reads succeed immediately
  → New writes go to Node B (temporary coordinator)
  → When Node A recovers, it syncs from B and C
```
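In code, this is a "preference list": the first R distinct physical servers encountered walking clockwise from the key. With vnodes, consecutive ring entries can belong to the same server, so duplicates must be skipped. A self-contained sketch with illustrative server names:

```python
# Preference list: first R distinct physical servers clockwise from the key.
import bisect
import hashlib

RING_SIZE = 2**32

def h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % RING_SIZE

servers = ["A", "B", "C", "D"]
points = sorted((h(f"{s}:vnode:{i}"), s) for s in servers for i in range(8))
positions = [p for p, _ in points]

def preference_list(key: str, r: int = 3):
    i = bisect.bisect_left(positions, h(key))
    owners = []
    while len(owners) < r:
        _, s = points[i % len(points)]
        if s not in owners:  # skip vnodes of servers already chosen
            owners.append(s)
        i += 1
    return owners

print(preference_list("user:123"))  # 3 distinct servers, primary first
```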
This is how DynamoDB and Cassandra achieve high availability without a single point of failure.
## Real-World Usage (Deep Dive)

### Amazon DynamoDB
DynamoDB uses consistent hashing to assign partition ranges to storage nodes. When a partition gets hot (e.g., during a Prime Day sale), DynamoDB can split the partition and reassign it to new nodes—all without affecting other partitions. This is how DynamoDB delivers single-digit millisecond reads at any scale.
Key insight: DynamoDB's auto-splitting is essentially adding a new vnode in the middle of a hot range, taking half the traffic from the original partition. This is transparent to the application.
### Apache Cassandra

Cassandra's token ring architecture is a textbook implementation of consistent hashing with vnodes. Each node in a Cassandra cluster is assigned 256 vnodes by default. When you run `nodetool status`, you can see the token ranges each node owns. Adding a new node to a Cassandra cluster automatically rebalances tokens by streaming data from neighboring nodes.
```text
$ nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load      Tokens  Owns (effective)
UN  10.0.0.1   256.5 GB  256     33.8%
UN  10.0.0.2   248.1 GB  256     33.1%
UN  10.0.0.3   251.3 GB  256     33.1%
```
### Discord

Discord uses consistent hashing to route messages to the correct Cassandra partition based on channel ID. With over 200 million monthly users generating billions of messages, even distribution across storage nodes is critical. Consistent hashing ensures that adding new storage nodes doesn't trigger a global data migration.
### Load Balancers (Nginx, HAProxy)
Many load balancers offer consistent hashing as a load balancing algorithm. When a backend server goes down, only the requests that were going to that server are redistributed—all other clients keep hitting the same backend, maintaining their session state or cache affinity. This is critical for WebSocket connections and sticky sessions.
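In nginx, this is the `hash ... consistent` directive of the upstream module (ketama consistent hashing). A minimal sketch, with illustrative backend addresses and an assumed session cookie name:

```nginx
upstream backend {
    # Consistent (ketama) hashing on a session cookie, so each client
    # sticks to one backend; the cookie name is an assumption for the sketch.
    hash $cookie_session_id consistent;

    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
    server 10.0.0.3:8080;
}
```

If `10.0.0.3` goes down, only clients that hashed to it are redistributed; everyone else keeps their backend affinity.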
## Consistent Hashing vs. Jump Hash vs. Rendezvous Hashing
| Algorithm | Key Movement | Lookup Time | Supports Node Removal? | Memory Usage |
|---|---|---|---|---|
| Consistent Hash | ~1/N | O(log N) | Yes | O(N × vnodes) |
| Jump Hash (Google) | ~1/N (perfect) | O(log N) | No (append-only) | O(1) |
| Rendezvous Hash | ~1/N | O(N) | Yes | O(N) |
When to use which: Consistent hashing is the default choice for most distributed systems. Jump Hash is better when nodes are only added (never removed) — perfect for sharded databases. Rendezvous hashing is simpler to implement but has O(N) lookup time.
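Jump Hash is short enough to show in full. This is a Python port of the published algorithm (Lamport and Veach, 2014); the remap measurement at the end is an illustrative check:

```python
# Jump consistent hash: maps a 64-bit integer key to a bucket in
# [0, num_buckets) using O(1) memory and no ring. Buckets can only be
# appended at the end; removing an arbitrary bucket is not supported.
def jump_hash(key: int, num_buckets: int) -> int:
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = (key * 2862933555777941757 + 1) % 2**64  # 64-bit LCG step
        j = int((b + 1) * (1 << 31) / ((key >> 33) + 1))
    return b

# Growing from 10 to 11 buckets moves only ~1/11 of keys:
keys = range(100_000)
moved = sum(jump_hash(k, 10) != jump_hash(k, 11) for k in keys)
print(f"{moved / len(keys):.1%} of keys moved")
```

Note the trade-off in action: no per-node state at all, but the bucket index is positional, so "remove bucket 3" has no meaning; only the last bucket can be dropped.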
## Common Mistakes
- ❌ Using naive modulo for distributed systems — it causes cascading cache misses on topology changes.
- ❌ Forgetting virtual nodes — without vnodes, consistent hashing still produces uneven distribution.
- ❌ Not considering replication — in production, keys must be replicated to multiple nodes for durability.
- ❌ Confusing consistent hashing with rendezvous hashing — both minimize remapping, but use different algorithms. Consistent hashing uses a ring; rendezvous hashing scores all nodes and picks the highest.
- ❌ Using too few vnodes — fewer than 100 vnodes per node leads to noticeable imbalance. Use 100–256.
> [!TIP]
> Key Takeaways:
> - Naive `hash % N` remaps nearly all keys when N changes, which is catastrophic for caches.
> - Consistent hashing uses a ring so only ~1/N of keys move on node add/remove.
> - Virtual nodes (vnodes) fix uneven distribution; use 100–256 per physical node.
> - Extend with replication: store each key on R consecutive nodes for fault tolerance.
> - Used by DynamoDB, Cassandra, Discord, Memcached, and most distributed caches.
> - In interviews, mention consistent hashing whenever you discuss caching, sharding, or load balancing.