LC4: Median of Two Sorted Arrays

Problem

Leetcode 4: Median of Two Sorted Arrays

There are two sorted arrays nums1 and nums2 of size $m$ and $n$ respectively.

Find the median of the two sorted arrays.

You may assume nums1 and nums2 cannot be both empty.

Solution

Linear Time

首先, 拿到这道求两个有序数组中位数的题, 我们先有一个baseline, 无论是每个数组用两个指针移动去比大小, 还是建一个新数组去把两个有序数组比完大小存进去然后直接 len(array)/2 一步解千愁, 至少我们有一个worst case是 $O(m+n)$ 的解决方案.

sticker

Logarithmic Time I

当然, 我们不能这么没有出息对吧! 接下来, 我们就要考虑能否在一个$O(log(n))$的时间内解决这个问题. 既然是要一个对数的时间复杂度, 那我们就有了基本思想, 至少是固定倍数的去简化这个问题:

  1. 分别获取两个数组的中位数进行比较.

  2. 分别截取中位数大的数组的左半部分和中位数小的数组的左半部分.

  • 一个问题是边界点的控制: 显然, 对于奇数个数的数组, 我们需要包含中位数; 对于偶数个数的数组, 它的两个中位数也都不能舍弃.

  • 第二个问题是: 我们并不能直接直接把两个数组的一半部分都舍弃掉, 否则在两个数组个数不等的情况下破坏了我们剩余子问题的中位数位置. 所以我们每次只能舍弃 $2 * min(m/2 - 1, n/2 - 1)$ 这么多的元素.

  1. 然后我们就把这个问题递归的分解为数量级几乎是原有一半的子问题.

这样, 如上算法的时间复杂度就是对数级别, 只是系数大了一点.

sticker

Logarithmic Time II

如果进一步深化我们对问题的理解, 暂时性的扩大题目的要求, 而不是拘泥于去求中位数, 我们就能写出更好的子命题. 大家也许会想到这个与 Top K 问题有点相似, 区别就是后者是基于随机无序的数组, 而此题的要求是在两个有序数组的合集中求 Top K. 下面的kth函数, 就通过递归在$O(log(m + n))$内解决了这个问题.

def findMedianSortedArrays(self, A, B):
    l = len(A) + len(B)
    if l % 2 == 1:
        return self.kth(A, B, l // 2)
    else:
        return (self.kth(A, B, l // 2) + self.kth(A, B, l // 2 - 1)) / 2

def kth(self, a, b, k):
    if not a:
        return b[k]
    if not b:
        return a[k]
    ia, ib = len(a) // 2, len(b) // 2
    ma, mb = a[ia], b[ib]

    if ia + ib < k:
        if ma > mb:
            return self.kth(a, b[ib + 1:], k - ib - 1)
        else:
            return self.kth(a[ia + 1:], b, k - ia - 1)
    else:
        if ma > mb:
            return self.kth(a[:ia], b, k)
        else:
            return self.kth(a, b[:ib], k)

sticker

Logarithmic Time III

MissMary提出了基于 Binary Search 的 $O(log(min(m, n)))$ 的想法. 进一步归纳问题, 其实就是要我们把两个数组分成两个如下部分.

      left_part          |        right_part
A[0], A[1], ..., A[i-1]  |  A[i], A[i+1], ..., A[m-1]
B[0], B[1], ..., B[j-1]  |  B[j], B[j+1], ..., B[n-1]

s.t. len(left_part) == len(right_part)
     max(left_part) <= min(right_part)

而这两个条件等价于

s.t. i + j == m - i + n - j (or: m - i + n - j + 1) => j = (m + n + 1)/2 - i
     B[j-1] <= A[i] and A[i-1] <= B[j]

知道了ij的关系之后, 我们就是要找出这样的i满足上述条件.

<1> Set imin = 0, imax = m, then start searching in [imin, imax]

<2> Set i = (imin + imax)/2, j = (m + n + 1)/2 - i

<3> Now we have len(left_part)==len(right_part). And there are only 3 situations
     that we may encounter:
    <a> B[j-1] <= A[i] and A[i-1] <= B[j]
        Means we have found the object `i`, so stop searching.
    <b> B[j-1] > A[i]
        Means A[i] is too small. We must `ajust` i to get `B[j-1] <= A[i]`.
        Can we `increase` i?
            Yes. Because when i is increased, j will be decreased.
            So B[j-1] is decreased and A[i] is increased, and `B[j-1] <= A[i]` may
            be satisfied.
        Can we `decrease` i?
            `No!` Because when i is decreased, j will be increased.
            So B[j-1] is increased and A[i] is decreased, and B[j-1] <= A[i] will
            be never satisfied.
        So we must `increase` i. That is, we must ajust the searching range to
        [i+1, imax]. So, set imin = i+1, and goto <2>.
    <c> A[i-1] > B[j]
        Means A[i-1] is too big. And we must `decrease` i to get `A[i-1]<=B[j]`.
        That is, we must ajust the searching range to [imin, i-1].
        So, set imax = i-1, and goto <2>.

这样, 我们就有了如下的代码:

def findMedianSortedArrays(self, nums1: List[int], nums2: List[int]) -> float:
    if len(nums1) > len(nums2):
        nums1, nums2 = nums2, nums1
    m = len(nums1)
    n = len(nums2)
    imin = 0
    imax = m
    median = 0
    i = 0
    j = 0

    while(imin <= imax):
        i = int((imin + imax) / 2)
        j = int((n + m + 1) / 2 - i)

        if i < m and j > 0 and nums2[j-1] > nums1[i]:
            imin = i + 1
        elif i > 0 and j < n and nums2[j] < nums1[i-1]:
            imax = i - 1
        else:
            if i == 0:
                median = nums2[j-1]
            elif j == 0:
                median = nums1[i-1]
            else:
                median = max(nums1[i-1], nums2[j-1])
            break
    if (n + m) % 2 == 1:
        return median

    if i == m:
        return (median + nums2[j]) / 2
    if j == n:
        return (median + nums1[i]) / 2

    return (median + min(nums1[i], nums2[j])) / 2

Reference

Intuitive Python O(log (m+n)) solution, by kth smallest in the two sorted arrays

Share my O(log(min(m,n)) solution with explanation

Yang Li
Yang Li
@zjzsliyang