8. The CAP Theorem

[!CAUTION] CAP is about what happens during a network partition. When the network splits, you must choose: reject requests (consistency) or serve responses that may be stale (availability). This is not a theoretical exercise—it defines how your users experience outages.

The CAP Properties

Consistency (C)

Every read receives the most recent successful write (or returns an error). All nodes see the same data at the same time.

Availability (A)

Every request to a non-failing node gets a response—but the response may not include the latest write.

Partition Tolerance (P)

The system continues operating even if network messages between nodes are dropped or delayed. In real-world distributed systems, partitions are inevitable—cables get cut, switches fail, cloud regions lose connectivity.

Why Partition Tolerance Is Mandatory

If your system spans more than one node (multiple availability zones, regions, or data centers), network partitions will happen. This is a certainty, not a theoretical possibility: every major cloud provider, including AWS, Google Cloud, and Azure, has had connectivity incidents between zones or regions. So the real decision is: during a partition, do you prioritize Consistency or Availability?

CP Systems: Correctness Over Uptime

When the network splits, a CP system rejects some requests rather than risk returning incorrect data.

Real-World Example: Google Spanner

Google Spanner is a globally distributed SQL database that chooses Consistency + Partition Tolerance. It relies on TrueTime, a clock API backed by GPS receivers and atomic clocks in each data center, which reports the current time together with a bounded uncertainty. Spanner uses those bounds to order transactions globally, so a read sees every write that committed before it. When a partition occurs, replicas that cannot reach a majority stop accepting writes until connectivity returns. Spanner was built in part to back F1, the database behind Google's advertising business, where an inconsistent read could turn into a billing error.

Best for: Banking, payments, inventory counts, uniqueness constraints (anything where "wrong" data is worse than "no" data).
Trade-off: Some user actions temporarily fail. Clients must handle retries gracefully.
Other CP systems: ZooKeeper (used by Kafka for controller election and metadata before KRaft mode), etcd (used by Kubernetes), HBase.
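
To make the CP trade-off concrete, here is a minimal sketch of a replica that refuses writes when it cannot reach a majority of the cluster. The `CPReplica` class, the `Unavailable` exception, and the `reachable_peers` parameter are hypothetical names for illustration; real systems such as etcd or ZooKeeper use full consensus protocols rather than a simple count.

```python
class Unavailable(Exception):
    """Raised when a replica cannot confirm a majority of the cluster."""


class CPReplica:
    """Toy replica (hypothetical) that accepts writes only with a majority."""

    def __init__(self, cluster_size):
        self.cluster_size = cluster_size
        self.data = {}

    def write(self, key, value, reachable_peers):
        # Count this replica plus every peer it can currently reach.
        votes = 1 + reachable_peers
        majority = self.cluster_size // 2 + 1
        if votes < majority:
            # Minority side of the partition: refuse rather than risk divergence.
            raise Unavailable(
                f"only {votes} of {self.cluster_size} nodes reachable"
            )
        self.data[key] = value


replica = CPReplica(cluster_size=5)
replica.write("balance", "100", reachable_peers=2)  # 3 of 5 reachable: accepted

try:
    replica.write("balance", "90", reachable_peers=1)  # 2 of 5 reachable: rejected
except Unavailable as exc:
    print("rejected:", exc)
```

The rejected write is the "temporarily fail" case from the trade-off above: the client sees an error and is expected to retry once the partition heals.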

AP Systems: Uptime Over Correctness

When the network splits, an AP system keeps answering requests—even if some responses are stale or need later reconciliation.

Real-World Example: Amazon DynamoDB & the Shopping Cart

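The shopping cart story comes from Dynamo, the internal key-value store Amazon described in 2007 and the design ancestor of today's DynamoDB service. Amazon built Dynamo for Availability + Partition Tolerance because of a business lesson: an unavailable shopping cart costs more money than a slightly stale one. If a user adds an item to their cart during a network partition, Dynamo accepts the write on whichever replica is reachable. When the partition heals, divergent versions are detected with vector clocks and reconciled, in the cart's case by merging them so that added items are not lost. DynamoDB itself does not expose vector clocks; its multi-region global tables resolve conflicting writes with "last writer wins". In both cases the user might briefly see an older version of their cart, but writes keep being accepted.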

Best for: Social feeds, likes, view counters, analytics, caches—anything where being slightly stale is acceptable.
Trade-off: Conflict resolution complexity. You need strategies for reconciling diverged data.
Other AP systems: Cassandra (used at Instagram, and at Discord before its move to ScyllaDB), CouchDB, Riak.
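
The reconciliation step can be sketched as follows. This is an illustrative example, not Amazon's implementation: the version layout (`clock`, `items`, `ts`) and the function names `compare`, `merge_union`, and `merge_lww` are assumptions made for the example. It compares two vector clocks, merges concurrent cart versions by set union, and shows what "last writer wins" would do with the same input.

```python
def compare(a, b):
    """Compare vector clocks: 'before', 'after', 'equal', or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


def merge_union(v1, v2):
    """Keep the newer version, or union the items if neither saw the other."""
    order = compare(v1["clock"], v2["clock"])
    if order in ("after", "equal"):
        return v1
    if order == "before":
        return v2
    # Concurrent writes: take the per-node maximum of both clocks.
    nodes = set(v1["clock"]) | set(v2["clock"])
    clock = {n: max(v1["clock"].get(n, 0), v2["clock"].get(n, 0)) for n in nodes}
    return {
        "clock": clock,
        "items": v1["items"] | v2["items"],
        "ts": max(v1["ts"], v2["ts"]),
    }


def merge_lww(v1, v2):
    """Last writer wins: keep whichever version has the later timestamp."""
    return v1 if v1["ts"] >= v2["ts"] else v2


# Two sides of a partition each accepted a different cart update.
side_a = {"clock": {"n1": 2, "n2": 0}, "items": {"book", "pen"}, "ts": 10.0}
side_b = {"clock": {"n1": 1, "n2": 1}, "items": {"book", "lamp"}, "ts": 11.0}

print(compare(side_a["clock"], side_b["clock"]))
print(sorted(merge_union(side_a, side_b)["items"]))
print(sorted(merge_lww(side_a, side_b)["items"]))
```

The union merge keeps every added item, while last writer wins silently drops the pen that was added on the losing side. That is the conflict-resolution complexity named in the trade-off: the right strategy depends on what the data means.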

Common CAP Misconceptions

"Pick any 2 of 3": Misleading. In modern distributed systems, P is not optional. You're really choosing between C and A during partitions.
"CAP applies during normal operation": Wrong. CAP specifically describes behavior during partition events. When the network is healthy, you can have both consistency and availability.
"Eventual consistency = no consistency": Wrong. Eventually consistent systems do converge to a consistent state; they just don't guarantee when. Twitter's follower count might lag by a few seconds, but it eventually catches up.

Practical Consistency Tools

Real systems tune the consistency/availability balance with techniques like:

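Quorums: Read from R replicas, write to W replicas out of N total. If R + W > N, every read set overlaps every write set in at least one replica, so a read can always find the latest acknowledged write. That overlap is the basis for strong consistency, though concurrent writes and relaxed ("sloppy") quorums can still weaken the guarantee. Cassandra lets you configure the consistency level per query.
Leader-based replication: All writes go through one leader node, and followers replicate from it, often asynchronously. The built-in replication in PostgreSQL and MySQL follows this model and is asynchronous by default.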
Bounded staleness: "Serve data that is at most 5 seconds old." Azure Cosmos DB offers this as a configurable consistency level.
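
The quorum rule is a pigeonhole argument, and a short sketch can check it by brute force. The function names here are hypothetical. `quorums_overlap` applies the R + W > N rule, and `overlap_by_enumeration` verifies it by testing every possible read set against every possible write set.

```python
from itertools import combinations


def quorums_overlap(n, r, w):
    """Rule of thumb: read and write quorums intersect when R + W > N."""
    return r + w > n


def overlap_by_enumeration(n, r, w):
    """Check every read set against every write set for a shared replica."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


# N=3 with R=2, W=2: any two replicas share at least one with any other two.
print(quorums_overlap(3, 2, 2), overlap_by_enumeration(3, 2, 2))

# N=3 with R=1, W=1: a read can land on a replica the write never touched.
print(quorums_overlap(3, 1, 1), overlap_by_enumeration(3, 1, 1))
```

Lowering R or W makes requests faster and more tolerant of unreachable replicas, at the cost of reads that may miss the latest write.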

PACELC: The More Complete Picture

PACELC extends CAP: if there is a Partition, choose A or C; Else (no partition), choose Latency or Consistency. This explains real-world behavior better:

DynamoDB: PA/EL (Available during partition, Low latency otherwise)
Google Spanner: PC/EC (Consistent always, even at latency cost)
Cassandra: PA/EL (Available and fast, eventually consistent)
CockroachDB: PC/EC (Strong consistency, higher latency for cross-region writes)

[!TIP] In interviews, describe the partition explicitly (e.g., "Region A cannot reach Region B") and then explain what the user sees in each region under CP vs AP. Bonus points for mentioning PACELC and explaining behavior outside partition events.
