Learning Outcome
After this lesson, you should be able to split a stream into lower and upper halves with two balanced heaps.
Problem Statement
Design a data structure that supports addNum and findMedian for a stream of integers.
| Input | Output | Why |
|---|---|---|
add 1, add 2, findMedian, add 3, findMedian | 1.5, then 2 | After [1,2], median is average 1.5. After [1,2,3], median is 2. |
Brute Force Approach
Store all numbers and sort after every insertion or median query. This makes repeated operations slow.
Optimized Approach
Use a max-heap for the lower half and a min-heap for the upper half. Rebalance so sizes differ by at most one.
Exact Pseudocode
addNum(x):
add x to lower half
move largest lower value to upper half
if upper half is larger:
move smallest upper value to lower half
findMedian():
if lower has more values:
return lower top
return average of both tops
Reference Code
import heapq
class MedianFinder:
def __init__(self):
self.left = []
self.right = []
def addNum(self, num):
heapq.heappush(self.left, -num)
heapq.heappush(self.right, -heapq.heappop(self.left))
if len(self.right) > len(self.left):
heapq.heappush(self.left, -heapq.heappop(self.right))
def findMedian(self):
if len(self.left) > len(self.right):
return -self.left[0]
return (-self.left[0] + self.right[0]) / 2Sample Dry Run
| Step | State | Result |
|---|---|---|
| Add 1 | left=[1], right=[] | median is 1 |
| Add 2 | left=[1], right=[2] | median is 1.5 |
| Add 3 | left=[2,1], right=[3] | median is 2 |
| Invariant | left top <= right top | tops define median |
Complexity
| Measure | Value | Reason |
|---|---|---|
| Time | O(log n) add, O(1) median | Each add performs heap push and pop operations. |
| Space | O(n) | All stream values are stored across the two heaps. |
Edge Cases
- Even count returns average of both heap tops.
- Odd count returns the top of the larger heap.
- Heap sizes should differ by at most one.
Interview Checklist
- Keep lower half in a max-heap.
- Keep upper half in a min-heap.
- Rebalance after every insertion.
FAQs
Why two heaps?
They keep quick access to the largest lower-half value and smallest upper-half value.
Why rebalance?
The median depends on the middle one or two values, so heap sizes must stay close.
What is the core pattern?
Two heaps for streaming median.