Naive Selection: k-th smallest element

Contents

Naive Selection: \(k\)-th smallest element#


Contents#


Approaches to the selection problem

1. The \(k\)-th smallest element will be the element at index \(k\) in an array that is sorted in ascending order. Sort the array in \(\Theta(n \log n)\) time and then lookup the \(k\)-th element. This approach does some unnecessary work. We don’t need the entire array to be sorted, just the first \(k\) elements. For example, if we had an array of \(1000\) elements and we wanted to find the third smallest one, following this approach we would spend a lot of time sorting irrelevant elements.

2. An improved approach sorts only the necessary elements and follows from the observation that finding the minimum element in the array can be thought of as a special case of the selection problem where \(k = 0\). In this case a simple linear time algorithm can be expressed as follows.

 INPUT    an unsorted array A
OUTPUT    the index of the smallest element

FIND_MIN (A, n)
1    curr_min = A[0]
2    curr_idx = 0
3
4    for i = 1 to n
5        if A[i] < curr_min
6            curr_min = A[i]
7            curr_idx = i
8
9    return curr_idx

Following this algorithm, we have the smallest element. We can find the \(k\)-th smallest element by calling this algorithm \(k\) times, skipping over previously found values. A simple way to perform this skipping is to create a header pointer to the front of the array and then swap the located minimum value with the value located at the header pointer, then advance the header pointer.

 INPUT    an unsorted array A and a pointer to the "head"
OUTPUT    the index of the smallest element

FIND_MIN (A, n)
1    curr_min = A[head]
2    curr_idx = head
3
4    for i = head to n
5        if A[i] < curr_min
6            curr_min = A[i]
7            curr_idx = i
8
9    return curr_idx

 INPUT    an unsorted array A and an integer k
OUTPUT    the value of the k-th smallest element

SELECT (A, k)
1    for i = 0 to k
2        idx = FIND_MIN(A, i)
3        swap(A[idx], A[i])
4
5    return A[k]

If we were to run this algorithm with \(k = n\) then this algorithm would just be selection sort. Following this algorithm, the input array will be partially sorted (i.e., the first \(k\) elements will be sorted) and we will have the \(k\)-th smallest element.

Complexity

FIND_MIN requires \(\Theta(n)\) time and it is called \(k\) times, so the total time complexity is \(\Theta(kn)\).

We can better than this (at least in expectation) by using a partial quicksort rather than a partial selection sort, which is what leads to quickselect.