Finding the Median from a Data Stream: A Detailed Explanation

In this post, we will uncover the solution for very popular coding interview question categories as ‘Hard’ on Leetcode.

Tech Sauce
4 min readAug 22, 2023

Problem Statement

The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value, and the median is the mean of the two middle values.

  • For example, for arr = [2,3,4], the median is 3.
  • For example, for arr = [2,3], the median is (2 + 3) / 2 = 2.5.

Implement the MedianFinder class:

  • MedianFinder() initializes the MedianFinder object.
  • void addNum(int num) adds the integer num from the data stream to the data structure.
  • double findMedian() returns the median of all elements so far. Answers within 10-5 of the actual answer will be accepted.

Understanding the Problem

Imagine you’re receiving a stream of integers. At any given point, you might be asked to determine the median of all the numbers received so far. This challenge might sound simple initially, but consider the implications. As the data is streaming in, your list of numbers is continuously growing. A naive approach, like sorting the list each time we need to find the median, would be highly inefficient.

The Optimal Approach

Instead of sorting every time, we can use two priority queues (or heaps): a max heap to represent the lower half and a min heap for the upper half. By ensuring that these heaps are balanced in size, we can quickly determine the median by looking at the tops of these heaps.

Here’s a breakdown of the strategy:

  1. Max Heap (low): This heap will store the smaller half of the numbers. By nature of being a max heap, the largest number in the lower half will always be at the root.
  2. Min Heap (high): This heap will store the larger half of the numbers. As a min heap, the smallest number in the upper half will always be at the root.

By balancing these heaps (keeping their sizes either equal or the max heap having one more element), the median can be found as follows:

  • If the number of total elements is odd, the median is the top of the max heap.
  • If even, the median is the average of the tops of the max and min heaps.

Java Solution

import java.util.PriorityQueue;

public class MedianFinder {

// Max heap to store the lower half
private PriorityQueue<Integer> low = new PriorityQueue<>((a, b) -> b - a);

// Min heap to store the upper half
private PriorityQueue<Integer> high = new PriorityQueue<>();

/** Initialize your data structure here. */
public MedianFinder() { }

// Adds a number into the data structure
public void addNum(int num) {
// Add the number to the max heap
low.offer(num);

// Move the largest number of the lower half to the upper half
high.offer(low.poll());

// Balance the heaps, ensuring the max heap (low) has the same count or one more element than the min heap (high)
if (low.size() < high.size()) {
low.offer(high.poll());
}
}

// Returns the median of the current data stream
public double findMedian() {
// If even, return average of tops of both heaps
if (low.size() == high.size()) {
return (low.peek() + high.peek()) / 2.0;
}

// If odd, return top of max heap (low)
return low.peek();
}
}

How Does the Code Work?

When adding a number:

  1. We initially offer it to the low max heap.
  2. We then move the largest number in the low heap to the high min heap.
  3. Finally, we balance the heaps to maintain our invariant.

When finding the median:

  1. If the total number of elements is even (i.e., both heaps are of the same size), the median is the average of the tops of both heaps.
  2. If odd, the median is the top of the low heap, which will have the extra element.

Now, Let’s break down the time and space complexities:

1. Time Complexity:

a) addNum(int num):

  • We offer the number to one of the heaps, which has a time complexity of O(log⁡n) where nn is the number of elements in the heap.
  • Then, in the worst case, we might have to re-balance by moving an element from one heap to another, which is another O(log⁡n) operation.

Since the heaps essentially remain half the size of the total number of elements received so far, the overall time complexity for the addNum method remains O(log⁡n).

b) findMedian():

  • Accessing the top element of a heap (using peek()) is a O(1)operation.
  • We access the top of both heaps at most, so the time complexity remains O(1) for findMedian.

2. Space Complexity:

  • We are using two heaps to store the numbers. In the worst case (when the number of elements is even), both heaps combined will store all the numbers. Therefore, the space complexity is O(n), where nn is the number of elements added so far.

To summarize:

  • The time complexity of addNum is O(log⁡n).
  • The time complexity of findMedian is O(1).
  • The space complexity for the entire data structure is O(n).

Follow me if you found this post useful, more to come!

Further Learning at DSAGuide.com
For an in-depth exploration of data structures, algorithms, and coding interview solutions, I invite you to visit DSAGuide.com. Elevate your understanding with our detailed guides on essential data structures, algorithms and coding interview problem solving.

--

--

Tech Sauce
Tech Sauce

Written by Tech Sauce

Everything Software Engineering ...

No responses yet