Find Median from Data Stream hard

Problem Statement

Design a data structure that supports the following two operations:

  • void addNum(int num): Add a integer number from the data stream to the data structure.
  • double findMedian(): Return the median of all elements so far in the data structure.

Example 1

MedianFinder medianFinder = new MedianFinder();
medianFinder.addNum(1);    // arr = [1]
medianFinder.addNum(2);    // arr = [1, 2]
medianFinder.findMedian(); // return 1.5 (because the median is (1 + 2) / 2 = 1.5)
medianFinder.addNum(3);    // arr = [1, 2, 3]
medianFinder.findMedian(); // return 2.0

Example 2

MedianFinder medianFinder = new MedianFinder();
medianFinder.addNum(5);    // arr = [5]
medianFinder.addNum(2);    // arr = [5,2]
medianFinder.findMedian(); // return 3.5 (because the median is (2 + 5) / 2 = 3.5)

Steps

  1. Data Structure: We will use two heaps: a max-heap small to store the smaller half of the numbers and a min-heap large to store the larger half. small will always have at most one more element than large if the total number of elements is odd.

  2. addNum(int num):

    • If small is empty or num is less than or equal to the top of small, add num to small.
    • Otherwise, add num to large.
    • Balance the heaps: If the size difference between small and large is greater than 1, move the top element from the larger heap to the smaller heap.
  3. findMedian():

    • If the total number of elements is odd, the median is the top of small.
    • If the total number of elements is even, the median is the average of the tops of small and large.

Explanation

Using two heaps allows us to efficiently maintain the sorted order of the elements without explicitly sorting the entire array each time. The max-heap (small) tracks the smaller half, and the min-heap (large) tracks the larger half. By maintaining a balanced state between these two heaps, we can quickly find the median.

Code

#include <queue>

class MedianFinder {
public:
    priority_queue<int> small; // max heap
    priority_queue<int, vector<int>, greater<int>> large; // min heap

    MedianFinder() {}

    void addNum(int num) {
        small.push(num);
        large.push(small.top());
        small.pop();

        if (small.size() < large.size()) {
            small.push(large.top());
            large.pop();
        }
    }

    double findMedian() {
        if (small.size() > large.size()) {
            return (double)small.top();
        } else {
            return (double)(small.top() + large.top()) / 2.0;
        }
    }
};

Complexity

  • Time Complexity:

    • addNum: O(log n), where n is the number of elements added. Heap operations take logarithmic time.
    • findMedian: O(1). Accessing the top of a heap is constant time.
  • Space Complexity: O(n), as we store all the numbers in the heaps.