Finding the Median from a Data Stream: A Detailed Explanation
In this post, we will uncover the solution for very popular coding interview question categories as ‘Hard’ on Leetcode.
Problem Statement
The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value, and the median is the mean of the two middle values.
- For example, for
arr = [2,3,4]
, the median is3
. - For example, for
arr = [2,3]
, the median is(2 + 3) / 2 = 2.5
.
Implement the MedianFinder class:
MedianFinder()
initializes theMedianFinder
object.void addNum(int num)
adds the integernum
from the data stream to the data structure.double findMedian()
returns the median of all elements so far. Answers within10-5
of the actual answer will be accepted.
Understanding the Problem
Imagine you’re receiving a stream of integers. At any given point, you might be asked to determine the median of all the numbers received so far. This challenge might sound simple initially, but consider the implications. As the data is streaming in, your list of numbers is continuously growing. A naive approach, like sorting the list each time we need to find the median, would be highly inefficient.
The Optimal Approach
Instead of sorting every time, we can use two priority queues (or heaps): a max heap to represent the lower half and a min heap for the upper half. By ensuring that these heaps are balanced in size, we can quickly determine the median by looking at the tops of these heaps.
Here’s a breakdown of the strategy:
- Max Heap (
low
): This heap will store the smaller half of the numbers. By nature of being a max heap, the largest number in the lower half will always be at the root. - Min Heap (
high
): This heap will store the larger half of the numbers. As a min heap, the smallest number in the upper half will always be at the root.
By balancing these heaps (keeping their sizes either equal or the max heap having one more element), the median can be found as follows:
- If the number of total elements is odd, the median is the top of the max heap.
- If even, the median is the average of the tops of the max and min heaps.
Java Solution
import java.util.PriorityQueue;
public class MedianFinder {
// Max heap to store the lower half
private PriorityQueue<Integer> low = new PriorityQueue<>((a, b) -> b - a);
// Min heap to store the upper half
private PriorityQueue<Integer> high = new PriorityQueue<>();
/** Initialize your data structure here. */
public MedianFinder() { }
// Adds a number into the data structure
public void addNum(int num) {
// Add the number to the max heap
low.offer(num);
// Move the largest number of the lower half to the upper half
high.offer(low.poll());
// Balance the heaps, ensuring the max heap (low) has the same count or one more element than the min heap (high)
if (low.size() < high.size()) {
low.offer(high.poll());
}
}
// Returns the median of the current data stream
public double findMedian() {
// If even, return average of tops of both heaps
if (low.size() == high.size()) {
return (low.peek() + high.peek()) / 2.0;
}
// If odd, return top of max heap (low)
return low.peek();
}
}
How Does the Code Work?
When adding a number:
- We initially offer it to the
low
max heap. - We then move the largest number in the
low
heap to thehigh
min heap. - Finally, we balance the heaps to maintain our invariant.
When finding the median:
- If the total number of elements is even (i.e., both heaps are of the same size), the median is the average of the tops of both heaps.
- If odd, the median is the top of the
low
heap, which will have the extra element.
Now, Let’s break down the time and space complexities:
1. Time Complexity:
a) addNum(int num)
:
- We offer the number to one of the heaps, which has a time complexity of O(logn) where nn is the number of elements in the heap.
- Then, in the worst case, we might have to re-balance by moving an element from one heap to another, which is another O(logn) operation.
Since the heaps essentially remain half the size of the total number of elements received so far, the overall time complexity for the addNum
method remains O(logn).
b) findMedian()
:
- Accessing the top element of a heap (using
peek()
) is a O(1)operation. - We access the top of both heaps at most, so the time complexity remains O(1) for
findMedian
.
2. Space Complexity:
- We are using two heaps to store the numbers. In the worst case (when the number of elements is even), both heaps combined will store all the numbers. Therefore, the space complexity is O(n), where nn is the number of elements added so far.
To summarize:
- The time complexity of
addNum
is O(logn). - The time complexity of
findMedian
is O(1). - The space complexity for the entire data structure is O(n).
Follow me if you found this post useful, more to come!
Further Learning at DSAGuide.com
For an in-depth exploration of data structures, algorithms, and coding interview solutions, I invite you to visit DSAGuide.com. Elevate your understanding with our detailed guides on essential data structures, algorithms and coding interview problem solving.