Sorted Array Merge#


Contents#


Given two sorted arrays, a merged sorted array containing the elements within the inputs can be constructed using \(\Theta(m + n)\) time and space where \(m\) and \(n\) are the sizes of the two input arrays respectively. This can be done by creating a pointer called the head into each array. The values of the heads are compared and the smallest one is placed into the output array and then advanced. Let’s call this operation sorted-merge(A, B).

 INPUT    an array A[1:n] of n values sorted in ascending order and an array B[1:m] of m sorted values in ascending order
OUTPUT    an array C[1:n + m] containing all n + m values of A and B sorted in ascending order

SORTED_ARRAY_MERGE(A, B, n, m)
 1    n     = len(A)
 2    idx_A = 1
 3    A[n]  = N   // place a sentinel value N larger than all values in B at the end of array A
 4
 5    m     = len(B)
 6    idx_B = 1
 7    B[n]  = M   // place a sentinel value M larger than all values in A at the end of array B
 8
 9    C           // empty array of length n + m
10
11    for i = 1 to n + m
12        if A[idx_A] < B[idx_B]
13            C[i] = A[idx_A]
14            idx_A++
15        else
16            C[i] = B[idx_B]
17            idx_B++
18    return C

Given a sorted array, there is a smallest element in the array at the head/front of the array.

Each iteration compares the heads of the arrays, copies the lesser of the two values into the next position in the output array, and advances the head of the array that contained the lesser value to the next position.

Example#

// INITIALIZATION

   |
A: 1 2 3 3  4
B: 2 4 6 8 10
   |

C:

// i = 1

     |
A: 1 2 3 3  4
B: 2 4 6 8 10
   |

C: 1

// i = 2

     |
A: 1 2 3 3  4
B: 2 4 6 8 10
     |

C: 1 2

// i = 3

       |
A: 1 2 3 3  4
B: 2 4 6 8 10
     |

C: 1 2 2

Correctness#

\(\textbf{Invariant}\)

1. \(1 \le i \le n + m + 1\) (i.e., \(i\) ranges from \(1\) to one greater than \(n + m\))
2. subarray \(C[1:i - 1]\) contains the \(i - 1\) smallest values of arrays \(A\) and \(B\) sorted in ascending order
3. \(A[\texttt{idx}_A]\) and \(B[\texttt{idx}_B]\) are the smallest values of subarrays \(A[\texttt{idx}_A:n]\) and \(B[\texttt{idx}_B:m]\) respectively not already in \(C[1:i - 1]\)

\(\textbf{Initialization}\)

\(C\) is empty.

Since \(\texttt{idx}_A = 1\) and \(\texttt{idx}_B = 1\) and the arrays \(A\) and \(B\) are sorted in ascending order, \(A[\texttt{idx}_A]\) and \(B[\texttt{idx}_B]\) are the smallest values of arrays \(A\) and \(B\) respectively.

\(\textbf{Maintenance}\)

It must be shown that if both the \(\texttt{condition}\) is true and the \(\texttt{invariant}\) is true after iteration \(i\) then the \(\texttt{invariant}\) is true after iteration \(i' = i + 1\).

Assume that the \(\texttt{condition}\)

\(1 \le i \lt n + m + 1\)

and the \(\texttt{invariant}\)

1. \(1 \le i \le n + m + 1\)
2. \(C[1:i - 1]\) contains the \(i - 1\) smallest values of arrays \(A\) and \(B\) sorted in ascending order
3. \(A[\texttt{idx}_A]\) and \(B[\texttt{idx}_B]\) are the smallest values of subarrays \(A[\texttt{idx}_A:n]\) and \(B[\texttt{idx}_B:m]\) respectively not already in \(C[1:i - 1]\)

are true after iteration \(i\).

It must be shown that the \(\texttt{invariant}\) is true after iteration \(i' = i + 1\).

1. \(1 \le \texttt{idx}_A', \texttt{idx}_B' \le i' \le n + m + 1\)
2. \(C[1:i' - 1]\) contains the \(i'\) smallest values of \(A\) and \(B\) sorted in ascending order
3. \(A[\texttt{idx}_A']\) and \(B[\texttt{idx}_B']\) are the smallest values of subarrays \(A[\texttt{idx}_A':n]\) and \(B[\texttt{idx}_B':m]\) respectively not already in \(C[1:i' - 1]\)

\(\texttt{CASE}\quad A[\texttt{idx}_A] \lt B[\texttt{idx}_B]\)

\(i' = i + 1\)

\(C[i'] = A[\texttt{idx}_A]\)

\(\texttt{idx}_A' = \texttt{idx}_A + 1\)

\(\texttt{idx}_B' = \texttt{idx}_B\)

\( \begin{aligned} & 1 \le \texttt{idx}_A, && \texttt{idx}_B \le i && \lt n + m \\ \iff & 1 \le \texttt{idx}_A + 1, && \texttt{idx}_B \le i + 1 && \le n + m \\ \iff & 1 \le \texttt{idx}_A', && \texttt{idx}_B' \le i' && \le n + m \\ \end {aligned} \)

Since \(C[1:i]\) contains the smallest \(i\) elements in \(A\) and \(B\) sorted in ascending order and \(C[i'] = A[\texttt{idx}_A]\) is the smallest element in \(A\) and \(B\) not already in \(C\), \(C[1:i']\) contains the smallest \(i'\) elements in \(A\) and \(B\) sorted in ascending order and \(A[\texttt{idx}_A']\) and \(B[\texttt{idx}_B']\) are the smallest values of subarrays \(A[\texttt{idx}_A':n]\) and \(B[\texttt{idx}_B':m]\) respectively not already in \(C[1:i']\).

\(\texttt{CASE}\quad A[\texttt{idx}_A] \ge B[\texttt{idx}_B]\)

By symmetry.

After \(i + 1\) iterations, the lesser value of \(A[idx_A]\) and \(B[idx_B]\) is copied over to \(C\) and the respective \(idx\) is incremented (the head of the array is advanced). If we say without loss of generality that \(A[idx_A]\) is the lesser value then it becomes \(A[idx_A + 1]\). Since the arrays are sorted and the smallest element of \(A\) was just copied over to \(C\), \(A[idx_A + 1]\) is now the smallest element of \(A\). \(C\) now contains the smallest \(i + 1\) elements from \(A\) and \(B\).

\(\textbf{Postcondition}\)

It must be shown that if both the \(\texttt{condition}\) is false and the \(\texttt{invariant}\) is true then the \(\texttt{postcondition}\) is true.

Assume that the \(\texttt{condition}\) is false:

\(\lnot(i \lt n + m + 1) \implies i \not\lt n + m + 1 \implies i \ge n + m + 1\)

Assume the \(\texttt{invariant}\) is true:

1. \(1 \le \texttt{idx}_A, \texttt{idx}_B \le i \le n + m + 1\)
2. \(C[1:i - 1]\) contains the \(i - 1\) smallest values of arrays \(A\) and \(B\) sorted in ascending order
3. \(A[\texttt{idx}_A]\) and \(B[\texttt{idx}_B]\) are the smallest values of subarrays \(A[\texttt{idx}_A:n]\) and \(B[\texttt{idx}_B:m]\) respectively not already in \(C[1:i - 1]\)

Show that the \(\texttt{postcondition}\) is true.

1. \(i \ge n + m + 1 \land i \le n + m + 1 \iff i = n + m + 1\)
2. \(C[1:i - 1] = C[1:(n + m + 1) - 1]\) contains the \((n + m + 1) - 1\) smallest values of arrays \(A\) and \(B\) sorted in ascending order
3. \(N = A[n + 1]\) and \(M = B[m + 1]\) are the smallest values of subarrays \(A[\texttt{idx}_A:n]\) and \(B[\texttt{idx}_B:m]\) respectively not already in \(C[1:(n + m + 1) - 1]\)

\(\textbf{Termination}\)

It must be shown that there is some value \(\texttt{DF}_0\) past which \(\texttt{loop variant}\) strictly decreases

The quantity is \(n + m - i\) nonnegative. \(\texttt{DF}_0 = 0\)

Regardless of the case, \(i\) is incremented on every iteration.

\(n + m - (i + 1) \lt n + m - i\)


Complexity#

SORTED_ARRAY_MERGE(A, B, n, m)
11    for i = 1 to n + m
12        if A[idx_A] < B[idx_B]
13            C[i] = A[idx_A]
14            idx_A++
15        else
16            C[i] = B[idx_B]
17            idx_B++
18    return C

Time#

\(T(n, m) = \Theta(n + m)\)

If we make the assumption that \(n \ge m\) then we have

\(T(n) = O(n + n) = O(n)\)

Space#

\(S(n) = \Theta(n + m)\) (the size of the memory allocation required for the output array)

If we make the assumption that \(n \ge m\) then we have

\(S(n) = O(n + n) = O(n)\)


Implementations#

C#

/*    merge_two-sorted-arrays.c (also known as "sorted-array merge")
 */

#include <stdio.h> // printf


/**************************************************
 *
 *          auxiliary
 *
 **************************************************/
void print_arr (int A[], int n, char *str) {
  printf("%s: ", str);
  for (int i = 0; i < n; i++)
    printf("%d ", A[i]);
  printf("\n");
}

void swap (int *a, int *b) {
  int t = *a;
     *a = *b;
     *b =  t;
}


/**************************************************
 *
 *          SORTED ARRAY MERGE
 *
 **************************************************/
void sorted_array_merge (int A[], int B[], int C[], int n, int m) {
  int idx_A = 0;
  int idx_B = 0;

  for (int i = 0; i < n + m; i++) {
    // if A is not exhausted and either B is exhaused or the elem of A is less than the elem of B
    if (idx_A < n && (idx_B >= m || A[idx_A] < B[idx_B]))
      C[i] = A[idx_A++];
    // if A is exhausted or if the elem of A is not less than the elem of B
    else
      C[i] = B[idx_B++];
  }
}


/**************************************************
 *
 *          driver
 *
 **************************************************/
int main (void) {
  int A[] = {1, 2, 3, 3,  4};
  int B[] = {2, 4, 6, 8, 10};
  int   n = sizeof(A) / sizeof(A[0]);
  int   m = sizeof(B) / sizeof(B[0]);
  int C[n + m];

  print_arr(A, n,     "A");
  print_arr(B, m,     "B");
  sorted_array_merge(A, B, C, n, m);
  print_arr(C, n + m, "C");

  return 0;
}

Python#

# sorted_array_merge.py

def sorted_array_merge (A, B, C, n, m):
  idx_A, idx_B = 0, 0

  for _ in range(n + m):
    if idx_A < n and (idx_B >= m or A[idx_A] < B[idx_B]):
      C.append(A[idx_A])
      idx_A += 1
    else:
      C.append(B[idx_B])
      idx_B += 1

A = [1, 2, 3, 3,  4]
n = len(A)
B = [2, 4, 6, 8, 10]
m = len(B)
C = []

print(A)
print(B)
sorted_array_merge(A, B, C, n, m)
print(C)

Acknowledgments#