> [!NOTE]
> Traditional applications store the current state of data. Event Sourcing stores every change that led to the current state. Instead of "Account balance is $500," you store: "Deposited $1000, Withdrew $300, Withdrew $200." The current state is derived by replaying events. This gives you a complete audit trail, the ability to rebuild state at any point in time, and powerful debugging capabilities.
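A minimal sketch of that idea in Python: the current state is a fold over the event log. The event classes and `apply` function here are illustrative, not from any particular framework.

```python
from dataclasses import dataclass

# Illustrative event types; real events would also carry an event ID,
# timestamp, and aggregate version.
@dataclass
class Deposited:
    amount: int

@dataclass
class Withdrew:
    amount: int

def apply(balance: int, event) -> int:
    """Pure transition function: current state + event -> next state."""
    if isinstance(event, Deposited):
        return balance + event.amount
    if isinstance(event, Withdrew):
        return balance - event.amount
    return balance

# The event log from the note above.
events = [Deposited(1000), Withdrew(300), Withdrew(200)]

# Derive the current state by replaying (folding over) the log.
balance = 0
for event in events:
    balance = apply(balance, event)

print(balance)  # 500
```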
CRUD vs Event Sourcing
| Aspect | Traditional CRUD | Event Sourcing |
|---|---|---|
| Storage | Current state only (UPDATE overwrites) | Append-only log of events |
| History | Lost (unless you add audit tables manually) | Complete history by default |
| Replay | Impossible | Replay all events to rebuild state at any point |
| Schema changes | Migrate existing rows | Add event versioning; old events remain immutable |
| Complexity | Low | Higher (event replay, snapshotting, versioning) |
| Debugging | "Why is this value wrong?" is hard to answer | Replay events to see exactly what happened |
How Event Sourcing Works
- Command received: "Transfer $200 from Account A to Account B."
- Validate: Check business rules (sufficient balance, account active).
- Emit events: Append immutable events to the event store:
  Event 1: {type: "MoneyDebited", account: "A", amount: 200, timestamp: "..."}
  Event 2: {type: "MoneyCredited", account: "B", amount: 200, timestamp: "..."}
- Update read model: A projector reads events and updates a denormalized read view (e.g., an account balance table).
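A rough sketch of the validate-and-emit steps in Python, with an in-memory list standing in for the event store; the function and exception names are hypothetical:

```python
import time

class InsufficientBalance(Exception):
    pass

def handle_transfer(event_store: list, balances: dict,
                    source: str, target: str, amount: int) -> list:
    """Validate the command against current (derived) state, then
    append immutable events. Past events are never modified."""
    # Step 2: validate business rules.
    if balances.get(source, 0) < amount:
        raise InsufficientBalance(f"{source} holds only {balances.get(source, 0)}")

    # Step 3: emit events describing what happened (facts, not mutations).
    now = time.time()
    new_events = [
        {"type": "MoneyDebited", "account": source, "amount": amount, "timestamp": now},
        {"type": "MoneyCredited", "account": target, "amount": amount, "timestamp": now},
    ]
    event_store.extend(new_events)  # append-only
    return new_events

# Usage: account A's balance of $1000 was derived from earlier events.
store: list = []
handle_transfer(store, {"A": 1000}, "A", "B", 200)
```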
CQRS: Command Query Responsibility Segregation
CQRS separates the write model (how you accept and validate changes) from the read model (how you serve queries). This allows you to optimize each independently:
    Write Path:
    Client → Command → Validate → Event Store (append-only)
                                       │
                                       ▼
                                 Projector (async)
                                       │
                                       ▼
    Read Path:                   Read Database (denormalized)
    Client → Query → Read DB → Response
Why this is powerful:
- The write side can be a simple append-only log (fast writes).
- The read side can be a heavily denormalized, pre-computed view optimized for specific queries (fast reads).
- You can have multiple read models for different query patterns (SQL for reports, Elasticsearch for search, Redis for real-time dashboards).
> [!IMPORTANT]
> CQRS introduces eventual consistency between the write and read sides. There is a delay (usually milliseconds) between when an event is written and when the read model reflects it. Your application must handle this — for example, after a user submits an order, redirect to a "processing" page instead of immediately showing the order as confirmed.
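On the read side, a projector is a fold over events into a denormalized view. A minimal in-memory sketch, with a dict standing in for the read database:

```python
# A projector folds events into a denormalized read view; a dict stands
# in here for the read database (Redis, a SQL table, etc.).
read_model: dict[str, int] = {}

def project(event: dict) -> None:
    """Apply a single event to the read view. Real projectors should be
    idempotent, because events may be redelivered."""
    account = event["account"]
    delta = event["amount"] if event["type"] == "MoneyCredited" else -event["amount"]
    read_model[account] = read_model.get(account, 0) + delta

# Replay the two events from the transfer example above.
for event in [
    {"type": "MoneyDebited", "account": "A", "amount": 200},
    {"type": "MoneyCredited", "account": "B", "amount": 200},
]:
    project(event)

print(read_model)  # {'A': -200, 'B': 200}
```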
Kafka Architecture Deep-Dive
Apache Kafka is the most popular event streaming platform and the backbone of many Event Sourcing implementations. Understanding its internals is essential:
Core Concepts
- Topic: A named feed of messages (e.g., "user-signups", "order-events").
- Partition: A topic is split into partitions for parallelism. Each partition is an ordered, append-only log.
- Offset: Each message within a partition has a unique sequential offset. Consumers track their position using offsets.
- Consumer Group: A group of consumers that jointly consume a topic. Each partition is consumed by exactly one consumer in the group.
Topic: "orders" (3 partitions)
Partition 0: [msg0, msg1, msg2, msg3, ...] ← Consumer A
Partition 1: [msg0, msg1, msg2, ...] ← Consumer B
Partition 2: [msg0, msg1, ...] ← Consumer C
Consumer Group "order-service": A, B, C
Why Kafka Uses Partitions
- Parallelism: More partitions = more consumers = higher throughput.
- Ordering: Messages within a partition are strictly ordered. Messages across partitions have no guaranteed order, so events that must stay ordered share a partition key (see the sketch after this list).
- Scaling: LinkedIn's Kafka cluster processes 7 trillion messages per day across thousands of partitions.
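A sketch of keyed production with the same assumed confluent-kafka client: messages sharing a key hash to the same partition, so per-key order is preserved.

```python
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

# All events for order 42 share a key, so they hash to the same partition
# and reach the consumer in the order they were produced.
for event in ("OrderPlaced", "OrderPaid", "OrderShipped"):
    producer.produce("orders", key="order-42", value=event)

producer.flush()  # block until the broker acknowledges all messages
```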
Exactly-Once Semantics
Kafka supports exactly-once semantics (EOS) via idempotent producers and transactional writes. The producer assigns a sequence number to each message; the broker deduplicates. For consume-transform-produce pipelines, Kafka transactions ensure that the read offset commit and the output write are atomic.
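A hedged sketch of such a pipeline using confluent-kafka transactions (the topic names and the transformation are made up; the configuration keys are standard Kafka client settings):

```python
from confluent_kafka import Consumer, Producer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "transform-service",
    "enable.auto.commit": False,         # offsets are committed inside the transaction
    "isolation.level": "read_committed", # never read aborted transactional writes
})
consumer.subscribe(["orders"])

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "transactional.id": "transform-service-1",  # stable ID enables zombie fencing
})
producer.init_transactions()

while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error():
        continue
    producer.begin_transaction()
    producer.produce("orders-transformed", value=msg.value().upper())
    # Commit the consumed offset and the produced message atomically:
    # downstream consumers see both effects or neither.
    producer.send_offsets_to_transaction(
        consumer.position(consumer.assignment()),
        consumer.consumer_group_metadata(),
    )
    producer.commit_transaction()
```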
Real-World Usage
LMAX Exchange
LMAX is a financial exchange that processes 6 million transactions per second with sub-millisecond latency. They use Event Sourcing with a single-threaded business logic processor fed by the Disruptor, their high-performance inter-thread messaging framework. Every order, trade, and cancellation is an immutable event, and they can replay the entire market state from any point in history for auditing.
Walmart
Walmart uses Event Sourcing to track inventory across 4,700+ stores. Instead of storing "Item X has 47 units," they store every receipt, shipment, return, and shelf-stocking event. This provides a complete audit trail and allows them to detect inventory discrepancies by replaying events.
Event Sourcing + CQRS: Complete Architecture
    Command Side (Write)                    Query Side (Read)
    ═══════════════════                     ═════════════════
    [Client] ──→ [Command Handler]          [Client] ──→ [Query Handler]
                       │                                      │
                       ▼                                      ▼
              [Validate & Apply]                      [Read Model DB]
                       │                               (Denormalized)
                       ▼                                      ↑
                [Event Store]                                 │
               (Append-only) ──→ [Event Bus] ──→ [Projection Builder]
                                   (Kafka)       (Builds read models
                                                  from events)
Key insight: The write side and read side can use completely different databases. Write: append-only event store (PostgreSQL, EventStoreDB). Read: denormalized views in Redis, Elasticsearch, or a read-optimized database. This separation lets you optimize each side independently.
Kafka Consumer Groups: Parallel Event Processing
Topic: "order-events" (6 partitions)
Consumer Group "payment-service":
Consumer A: partitions 0, 1 (processes payment events)
Consumer B: partitions 2, 3 (processes payment events)
Consumer C: partitions 4, 5 (processes payment events)
Consumer Group "analytics-service":
Consumer D: partitions 0, 1, 2 (builds analytics)
Consumer E: partitions 3, 4, 5 (builds analytics)
Each group processes ALL events independently.
Within a group, each partition is handled by exactly one consumer.
Adding consumers = more parallelism (up to # of partitions).
When to Use Event Sourcing
| Good Fit | Bad Fit |
|---|---|
| Financial systems (audit trail required) | Simple CRUD apps |
| Collaborative editing (Google Docs, Figma) | Systems with mostly reads, few writes |
| Systems needing time-travel queries | Simple e-commerce product catalogs |
| Complex domains with many state transitions | Prototypes or MVPs |
| Replay/debugging of production issues | Systems where event ordering is hard |
Common Mistakes
- ❌ Using Event Sourcing for simple CRUD apps — the overhead is not justified for a basic blog or settings page.
- ❌ Forgetting snapshots — replaying millions of events to rebuild state is slow. Snapshot every 100-1000 events (see the sketch after this list).
- ❌ Treating events as mutable — events are facts that happened. Never update or delete. Emit compensating events instead.
- ❌ Ignoring event versioning — as your system evolves, event schemas change. Use versioned events (v1, v2) and upcasters.
- ❌ Building too many projections — each projection adds complexity and eventual consistency lag. Start with 1-2 read models.
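To make the snapshotting point concrete, here is a minimal in-memory sketch (the interval, event shapes, and stores are all illustrative): every N events the derived state is persisted, and a rebuild replays only the events after the latest snapshot.

```python
# Minimal in-memory snapshotting sketch: persist derived state every
# SNAPSHOT_INTERVAL events so a rebuild replays only the tail of the log.
SNAPSHOT_INTERVAL = 1000

event_log = [1] * 5000            # 5000 illustrative "deposited $1" events
snapshots: dict[int, int] = {}    # version -> balance at that version

# Writer side: take a snapshot as events are appended.
balance = 0
for version, amount in enumerate(event_log, start=1):
    balance += amount
    if version % SNAPSHOT_INTERVAL == 0:
        snapshots[version] = balance

def rebuild() -> int:
    """Rebuild current state from the latest snapshot plus the tail of
    the log, instead of replaying all 5000 events from version 0."""
    version = max(snapshots, default=0)
    state = snapshots.get(version, 0)
    for amount in event_log[version:]:  # replay only events after the snapshot
        state += amount
    return state

assert rebuild() == sum(event_log)  # same state as a full replay
```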
> [!TIP]
> Key Takeaways:
> • Event Sourcing: store changes, not state. Complete audit trail and replay capability.
> • CQRS: separate write model (commands → events) from read model (projections → fast queries).
> • Eventual consistency between write and read sides is the tradeoff.
> • Kafka: partitioned, ordered logs. Consumer groups for parallel processing. Exactly-once via idempotent producers and transactions.
> • Use Event Sourcing for domains where history matters (finance, inventory, collaboration). Avoid it for simple CRUD.