Find Median from Data Stream hard
Problem Statement
Design a data structure that supports the following two operations:
void addNum(int num)
: Add a number to the data structure.double findMedian()
:Return the median of all elements so far.
For the sake of simplicity and clarity, assume that all input numbers are unique.
Example 1:
MedianFinder medianFinder = new MedianFinder();
medianFinder.addNum(1); // arr = [1]
medianFinder.addNum(2); // arr = [1, 2]
medianFinder.findMedian(); // return 1.5 (1 + 2) / 2 = 1.5
medianFinder.addNum(3); // arr = [1, 2, 3]
medianFinder.findMedian(); // return 2.0
Example 2:
MedianFinder medianFinder = new MedianFinder();
medianFinder.addNum(1); // arr = [1]
medianFinder.addNum(2); // arr = [1, 2]
medianFinder.addNum(3); // arr = [1, 2, 3]
medianFinder.addNum(4); // arr = [1,2,3,4]
medianFinder.findMedian();// return 2.5
Steps:
The most efficient approach to solve this problem involves using two heaps: a min-heap and a max-heap.
-
Initialization: Create a min-heap (
min_heap
) to store the larger half of the numbers and a max-heap (max_heap
) to store the smaller half of the numbers. -
addNum(num)
:- If
max_heap
is empty ornum
is less than or equal to the root ofmax_heap
, addnum
tomax_heap
. - Otherwise, add
num
tomin_heap
. - Balance the heaps: Ensure that the difference in sizes between the two heaps is at most 1. If the size difference is greater than 1, move the root of the larger heap to the smaller heap.
- If
-
findMedian()
:- If the sizes of
max_heap
andmin_heap
are equal, the median is the average of their roots. - Otherwise, the median is the root of the larger heap.
- If the sizes of
Explanation:
By using two heaps, we maintain a sorted structure implicitly. The max-heap always holds the smaller half of the elements, and the min-heap holds the larger half. This allows us to find the median efficiently in O(1) time after the heaps are balanced. The balancing step ensures that the median can always be calculated from the roots of the heaps.
Code:
import heapq
class MedianFinder:
def __init__(self):
# max_heap stores the smaller half of numbers
self.max_heap = []
# min_heap stores the larger half of numbers
self.min_heap = []
def addNum(self, num: int) -> None:
heapq.heappush(self.max_heap, -num) # Use negative to simulate max-heap
# Balance heaps
if self.max_heap and self.min_heap and -self.max_heap[0] > self.min_heap[0]:
val = -heapq.heappop(self.max_heap)
heapq.heappush(self.min_heap, val)
if len(self.max_heap) > len(self.min_heap) + 1:
val = -heapq.heappop(self.max_heap)
heapq.heappush(self.min_heap, val)
elif len(self.min_heap) > len(self.max_heap):
val = heapq.heappop(self.min_heap)
heapq.heappush(self.max_heap, -val)
def findMedian(self) -> float:
if len(self.max_heap) == len(self.min_heap):
return (-self.max_heap[0] + self.min_heap[0]) / 2.0
else:
return -self.max_heap[0]
Complexity:
-
Time Complexity:
addNum(num)
: O(log n), where n is the number of elements added. Heap operations take logarithmic time.findMedian()
: O(1) – Accessing the root of a heap is constant time.
-
Space Complexity: O(n), as we store all the numbers in the heaps.