[!NOTE] In a monolith with a single database, auto-incrementing IDs (
id SERIAL) work fine. In a distributed system with multiple database shards, multiple services, and messages flowing through queues, you need IDs that are globally unique without coordination, roughly sortable by time, and fast to generate.
Why Auto-Increment Fails in Distributed Systems
- Single point of failure: The database sequence is on one machine. If it goes down, no service can generate IDs.
- Cross-shard conflicts: If Shard A and Shard B both have
id SERIAL, they will both generate ID 1, 2, 3... — collisions everywhere. - Performance bottleneck: Every service must round-trip to the database just to get an ID, even before doing useful work.
UUID v4 (Random)
UUID v4 generates 128-bit random identifiers: 550e8400-e29b-41d4-a716-446655440000
- Pro: No coordination needed — any node can generate one instantly. Collision probability is astronomically low (2122 possibilities).
- Con: Not sortable by time. Random values cause poor B-tree index performance in databases — new inserts scatter across the index instead of appending to the end, causing excessive page splits.
UUID v7 (Timestamp-Ordered)
UUID v7 (RFC 9562, 2024) embeds a Unix timestamp in the first 48 bits, followed by random bits:
Timestamp (48 bits) | Random (80 bits)
─────────────────────┼──────────────────
018f5e3c-7a00 | 7xxx-xxxx-......
- Pro: Sortable by time (newer UUIDs sort after older ones). B-tree friendly — new inserts always append near the end. No coordination needed.
- Con: Slightly less random than v4 (but still astronomically unique). Not universally supported yet (newer standard).
Twitter Snowflake
Twitter created the Snowflake ID generator in 2010 to produce unique, time-sorted, 64-bit integers at massive scale. A Snowflake ID is a 64-bit integer packed as:
0 | Timestamp (41 bits) | Datacenter (5 bits) | Machine (5 bits) | Sequence (12 bits)
─┼─────────────────────────┼────────────────────┼──────────────────┼────────────────────
| Milliseconds since epoch| 32 datacenters | 32 machines/DC | 4096 IDs/ms/machine
- Capacity: 4,096 IDs per millisecond per machine × 1,024 machines = ~4 million IDs per millisecond.
- Sortable: Timestamp is in the most significant bits, so IDs are naturally sorted by creation time.
- No coordination: Each machine generates IDs independently using its unique datacenter + machine ID.
- Compact: 64-bit integer fits in a single database
BIGINTcolumn — half the size of a UUID.
Clock Skew: The Achilles'' Heel
Snowflake relies on system clocks. If a machine''s clock drifts backward (e.g., after NTP correction), it could generate duplicate IDs. Solutions:
- Wait until the clock catches up (Snowflake''s approach — refuse to generate IDs during clock regression).
- Use a logical clock component that tracks the highest seen timestamp.
Comparison Table
| Method | Size | Sortable | Coordination | DB Index Performance | Collision Risk |
|---|---|---|---|---|---|
| DB Auto-Increment | 32/64 bit | Yes | Centralized DB | Excellent | None (single source) |
| UUID v4 | 128 bit | No | None | Poor (random scattering) | Near zero |
| UUID v7 | 128 bit | Yes | None | Good (time-ordered) | Near zero |
| Snowflake | 64 bit | Yes | Machine ID assignment | Excellent | Near zero |
Real-World Usage
- Twitter: Invented Snowflake. Every tweet ID is a Snowflake — you can extract the timestamp from any tweet ID.
- Discord: Uses a Snowflake variant for message IDs. Since IDs are time-sorted, fetching "messages after ID X" is a simple range query.
- Instagram: Uses a modified Snowflake that combines a timestamp with a shard ID and a per-shard auto-increment sequence, embedded in a 64-bit integer.
Snowflake ID: Implementation Deep Dive
Snowflake ID Structure (64 bits):
| 1 bit | 41 bits | 10 bits | 12 bits |
| unused | timestamp (ms) | machine ID | sequence num |
timestamp: milliseconds since custom epoch (e.g., 2020-01-01)
→ 2^41 ms = ~69 years before overflow
machine_id: unique per server (1024 possible machines)
sequence: per-millisecond counter (4096 IDs per ms per machine)
Max throughput per machine: 4,096,000 IDs/second
Max throughput (1024 machines): ~4 billion IDs/second
class SnowflakeGenerator:
EPOCH = 1577836800000 # 2020-01-01 00:00:00 UTC
def init(self, machine_id):
self.machine_id = machine_id # 0-1023
self.sequence = 0
self.last_timestamp = -1
def generate(self):
timestamp = current_time_ms() - self.EPOCH
if timestamp == self.last_timestamp:
self.sequence = (self.sequence + 1) & 0xFFF # 12-bit mask
if self.sequence == 0:
timestamp = wait_for_next_ms(self.last_timestamp)
else:
self.sequence = 0
if timestamp < self.last_timestamp:
raise ClockMovedBackwardsError() # Handle NTP skew!
self.last_timestamp = timestamp
return ((timestamp << 22) |
(self.machine_id << 12) |
self.sequence)
Database Index Impact: UUID v4 vs v7 Benchmark
| Metric | UUID v4 (random) | UUID v7 (time-sorted) | Snowflake (64-bit) | Auto-increment |
|---|---|---|---|---|
| Insert throughput | ~5,000/sec | ~12,000/sec | ~15,000/sec | ~15,000/sec |
| Index size (1M rows) | ~80 MB | ~80 MB | ~40 MB | ~30 MB |
| Index fragmentation | Very high (random) | Low (sequential) | Low (sequential) | None |
| B-tree page splits | Frequent | Rare (append-only) | Rare | Never |
| Distributed-friendly | Yes | Yes | Yes (needs machine_id) | No (SPOF) |
Why UUID v4 is slow for inserts: B-tree indexes are optimized for sequential inserts (new rows go to the rightmost leaf). Random UUIDs cause inserts across all leaf pages, triggering constant page splits and cache misses. UUID v7 and Snowflake IDs are time-sorted, so inserts are always sequential — same performance as auto-increment.
Choosing the Right ID Strategy
| Use Case | Best Choice | Why |
|---|---|---|
| Simple web app, single DB | Auto-increment | Simplest, best performance |
| Microservices, needs coordination-free | UUID v7 | Time-sorted, no central authority |
| High-throughput (>100K IDs/sec) | Snowflake | Compact (64-bit), sortable, fast |
| Public-facing URLs | NanoID or hashids | Short, URL-safe, non-guessable |
| Globally unique, don''t care about sorting | UUID v4 | Simplest distributed option |
| Short URL slugs | Base62 encoded counter | Compact, human-readable |
Common Mistakes
- ❌ Using UUID v4 as a primary key in a relational database — random inserts destroy B-tree performance. Use UUID v7 or Snowflake instead.
- ❌ Exposing sequential IDs in URLs — users can enumerate resources (e.g.,
/user/1,/user/2,/user/3). Use non-guessable IDs for public-facing resources. - ❌ Assuming clocks are always monotonic — NTP adjustments can move clocks backward. Snowflake implementations must handle this with a wait or error.
- ❌ Using 128-bit UUIDs when 64-bit Snowflake would suffice — doubles your index size for no benefit. Use Snowflake if you control the ID generation infrastructure.
- ❌ Not planning for machine_id assignment — in Snowflake, each server needs a unique machine_id. Use ZooKeeper, Consul, or Kubernetes pod index for assignment.
[!TIP] Key Takeaways:
• Auto-increment fails in distributed systems (SPOF, cross-shard collisions).
• UUID v4: simple but not sortable and bad for DB indexes. Avoid as primary key.
• UUID v7: sortable, DB-friendly, no coordination. Best default choice for new systems.
• Snowflake: 64-bit, sortable, compact (4M IDs/sec/machine). Best for high-throughput systems.
• Clock skew is real. Never assume monotonic system time in distributed environments.
• Use NanoID or hashids for short, URL-safe, non-guessable public identifiers.