Find Median from Data Stream hard

Problem Statement

Design a data structure that supports the following two operations:

  • void addNum(int num): Add a number to the data structure.
  • double findMedian() :Return the median of all elements so far.

For the sake of simplicity and clarity, assume that all input numbers are unique.

Example 1:

MedianFinder medianFinder = new MedianFinder();
medianFinder.addNum(1);    // arr = [1]
medianFinder.addNum(2);    // arr = [1, 2]
medianFinder.findMedian(); // return 1.5 (1 + 2) / 2 = 1.5
medianFinder.addNum(3);    // arr = [1, 2, 3]
medianFinder.findMedian(); // return 2.0

Example 2:

MedianFinder medianFinder = new MedianFinder();
medianFinder.addNum(1);    // arr = [1]
medianFinder.addNum(2);    // arr = [1, 2]
medianFinder.addNum(3);    // arr = [1, 2, 3]
medianFinder.addNum(4);   // arr = [1,2,3,4]
medianFinder.findMedian();// return 2.5

Steps:

The most efficient approach to solve this problem involves using two heaps: a min-heap and a max-heap.

  1. Initialization: Create a min-heap (min_heap) to store the larger half of the numbers and a max-heap (max_heap) to store the smaller half of the numbers.

  2. addNum(num):

    • If max_heap is empty or num is less than or equal to the root of max_heap, add num to max_heap.
    • Otherwise, add num to min_heap.
    • Balance the heaps: Ensure that the difference in sizes between the two heaps is at most 1. If the size difference is greater than 1, move the root of the larger heap to the smaller heap.
  3. findMedian():

    • If the sizes of max_heap and min_heap are equal, the median is the average of their roots.
    • Otherwise, the median is the root of the larger heap.

Explanation:

By using two heaps, we maintain a sorted structure implicitly. The max-heap always holds the smaller half of the elements, and the min-heap holds the larger half. This allows us to find the median efficiently in O(1) time after the heaps are balanced. The balancing step ensures that the median can always be calculated from the roots of the heaps.

Code:

import heapq

class MedianFinder:
    def __init__(self):
        # max_heap stores the smaller half of numbers
        self.max_heap = []  
        # min_heap stores the larger half of numbers
        self.min_heap = []

    def addNum(self, num: int) -> None:
        heapq.heappush(self.max_heap, -num)  # Use negative to simulate max-heap

        # Balance heaps
        if self.max_heap and self.min_heap and -self.max_heap[0] > self.min_heap[0]:
            val = -heapq.heappop(self.max_heap)
            heapq.heappush(self.min_heap, val)

        if len(self.max_heap) > len(self.min_heap) + 1:
            val = -heapq.heappop(self.max_heap)
            heapq.heappush(self.min_heap, val)
        elif len(self.min_heap) > len(self.max_heap):
            val = heapq.heappop(self.min_heap)
            heapq.heappush(self.max_heap, -val)

    def findMedian(self) -> float:
        if len(self.max_heap) == len(self.min_heap):
            return (-self.max_heap[0] + self.min_heap[0]) / 2.0
        else:
            return -self.max_heap[0]

Complexity:

  • Time Complexity:

    • addNum(num): O(log n), where n is the number of elements added. Heap operations take logarithmic time.
    • findMedian(): O(1) – Accessing the root of a heap is constant time.
  • Space Complexity: O(n), as we store all the numbers in the heaps.