Skip to content

Median of data stream

get the median of a data stream

最简单的做法是插入排序,但每次插入耗时\(O(N)\)。下面的两种方法均为\(O(\log N)\)的插入耗时。(查询耗时均为\(O(1)\)

双堆

很巧妙的做法,插入时可能要进行堆间的元素交换。

class MedianFinder {
public:
    /** initialize your data structure here. */
    priority_queue<int> mx;
    priority_queue<int, vector<int>, greater<int>> mn;

    MedianFinder() {
    }

    void addNum(int num) {
        // logn
        if (mx.size() == mn.size()) {
            if (!mn.empty() && num > mn.top()) {
                mx.push(mn.top()); mn.pop();
                mn.push(num);
            } else {
                mx.push(num);
            }
        } else {
            if (num < mx.top()) {
                mn.push(mx.top()); mx.pop();
                mx.push(num);
            } else {
                mn.push(num);
            }
        }
    }

    double findMedian() {
        if (mn.size() == mx.size()) {
            return (mn.top() + mx.top()) * 0.5;
        } else {
            return mx.top();
        }
    }
};

multiset维护中点迭代器

multiset本身就是红黑树,具有\(O(\log N)\)的插入耗时,关键在于如何实现\(O(1)\)的查询耗时。

class MedianFinder {
    multiset<int> data;
    multiset<int>::iterator mid;

public:
    MedianFinder() : mid(data.end()) {}

    void addNum(int num)
    {
        int n = data.size();
        data.insert(num);

        if (!n)                                 // first element inserted
            mid = data.begin();
        else if (num < *mid)                    // median is decreased
            mid = (n & 1 ? mid : prev(mid));
        else                                    // median is increased
            mid = (n & 1 ? next(mid) : mid);
    }

    double findMedian()
    {
         int n = data.size();
        return (*mid + *next(mid, n % 2 - 1)) * 0.5;
    }
};