6. SQL vs NoSQL

[!NOTE] Choosing a database is a high-leverage decision. You can change a load balancer in a day; migrating primary data stores can take months of dual writes, backfills, and careful cutovers. Choose wisely—this decision will haunt (or help) you for years.

Start with Access Patterns, Not Hype

Before picking "SQL vs NoSQL," write down your top reads and top writes:

What are the 3 most important queries your app runs?
What must be strongly consistent (can never be wrong)?
What can be eventually consistent (slightly stale is fine)?
Do you need full-text search, time-series analytics, or graph traversal?

The biggest mistake junior engineers make is picking a database based on popularity ("MongoDB is trendy!") instead of access patterns.

SQL (Relational Databases)

Examples: PostgreSQL, MySQL, Oracle, SQL Server.

Relational databases store data in normalized tables with defined schemas and strong integrity constraints. SQL shines when correctness and complex querying matter.

Real-World Example: Stripe

Stripe processes hundreds of billions of dollars in payments annually. They run on PostgreSQL because financial transactions demand ACID guarantees. When you transfer $1,000, the debit from Account A and credit to Account B must either both succeed or both fail—there is zero tolerance for partial updates.

What You Get with SQL

ACID transactions: Multi-row correctness (payments, inventory, accounting).
Joins and rich querying: Relationships, aggregations, window functions.
Constraints: Unique keys, foreign keys, NOT NULL checks—guardrails against bad data.
Mature tooling: Decades of optimization, indexing strategies, and battle-tested ORMs.

Scaling SQL (Common Patterns)

Read replicas: Scale reads by replicating a primary to followers. Instagram uses dozens of PostgreSQL read replicas for their feed.
Connection pooling: PgBouncer protects the DB from connection exhaustion during traffic spikes.
Partitioning: Split large tables by time or tenant. Shopify partitions merchant data to keep indexes small.
Caching: Cache hot reads in Redis—protects the DB from spikes.
Sharding: Last resort. Vitess (used by YouTube) makes MySQL sharding manageable.

Where SQL Gets Painful

Write bottlenecks: A single primary (leader) limits write throughput.
Cross-shard operations: Joins and transactions across shards become expensive or impossible.
Schema migrations at scale: Altering a 500GB table with zero downtime requires tools like pg_repack or gh-ost.

NoSQL (A Family of Datastores)

NoSQL is not one database—it is an entire category of datastores, each optimized for specific access patterns.

NoSQL Families (with Real-World Users)

Key-Value (Redis, DynamoDB): Lightning-fast get/put by key. Amazon built DynamoDB to power their shopping cart—it handles millions of requests per second during Prime Day.
Document (MongoDB, Firestore): JSON-like documents with flexible schemas. Great for content management and mobile app backends.
Wide-Column (Cassandra, ScyllaDB): Extreme write throughput. Discord migrated from MongoDB to Cassandra (then ScyllaDB) to store billions of chat messages.
Graph (Neo4j): Efficient relationship traversal. LinkedIn uses graph databases to power "People You May Know" across 900+ million members.
Search (Elasticsearch): Full-text search. Wikipedia uses Elasticsearch to search 60+ million articles in 300+ languages.

Consistency and Indexing Tradeoffs

Consistency models: Many NoSQL systems are eventually consistent. Your app must handle retries and idempotency.
Secondary indexes: Can be limited or expensive—design around the primary key first.
Denormalization: Store data in query-friendly shapes and keep them updated via events or jobs.

Polyglot Persistence (How Uber Does It)

Most production systems use multiple databases, each for what it does best:

MySQL (Vitess): System of record for trips, riders, drivers.
Redis: Caching, session management, geospatial indexing for nearby drivers.
Cassandra: Billions of GPS trace points with massive write throughput.
Elasticsearch: Powering search across restaurants (Uber Eats).
Apache Pinot: Real-time analytics dashboards for operations teams.

A Practical Decision Checklist

Multi-entity transactions and strong constraints? PostgreSQL or MySQL.
Massive scale with simple key lookups? DynamoDB or Redis.
Rapidly evolving nested objects? MongoDB or Firestore.
Extreme write volume with predictable queries? Cassandra or ScyllaDB.
Full-text search? Add Elasticsearch alongside your primary DB.
Relationship-heavy traversals? Neo4j or Amazon Neptune.

[!IMPORTANT] Pick the database that makes your top 3 reads and top 3 writes simple. Then design everything else (indexes, partitions, caches, queues) around those access patterns. If you start with the wrong database, no amount of caching or indexing can fix it.