Quickselect#
Introduction#
Quickselect is based on the idea of replacing the partial selection sort in the naive selection algorithm, which finds the \(k\)-th minimum element, with a partial quicksort. Like quicksort, quickselect uses a partition function to select a pivot element, swapping lesser elements to its left and greater elements to its right.
Quicksort#
Quicksort recursively sorts both sub arrays created by the partition function; quickselect, however, only needs to recurse into one of them. After the partition function returns, the pivot element is in its final sorted position. If the returned pivot index is less than \(k\), we examine the right sub array; if it is greater than \(k\), we examine the left sub array; and if it equals \(k\), we have found the \(k\)-th element and can stop.
INPUT an array A, the index of the first element of A, the index of the last element of A
OUTPUT nothing (in-place)
QUICKSORT (A, stt, end)
  if stt >= end                    // zero or one element: already sorted
    return

  pvt = partition(A, stt, end)

  QUICKSORT(A, stt, pvt - 1)
  QUICKSORT(A, pvt + 1, end)
INPUT an array A, the index of the first element of A, the index of the last element of A
OUTPUT the index of the pivot element
partition (A, stt, end)
  pvt = choose_pivot(A, stt, end)  // select a pivot element and return its index
  swap(A[pvt], A[end])             // move the pivot element to the end of the array
  pivot_value = A[end]             // save the pivot's value; it now lives at A[end]

  j = stt
  for i = stt to end - 1
    if A[i] <= pivot_value         // compare against the saved value, not A[pvt]
      swap(A[j], A[i])
      j++

  swap(A[j], A[end])               // j points at the first element larger than the pivot, so swap them
  return j                         // the pivot's final sorted index
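As a concrete reference, the two procedures above can be sketched in Python as follows. This is a minimal sketch, not the course's implementation: `choose_pivot` is assumed to pick a uniformly random index, and the pivot's value is saved before the swap so that later comparisons use the correct value.

```python
import random

def partition(A, stt, end):
    """Partition A[stt..end] in place and return the pivot's final index."""
    pvt = random.randint(stt, end)    # choose_pivot: a random index (assumption)
    A[pvt], A[end] = A[end], A[pvt]   # move the pivot element to the end
    pivot_value = A[end]              # the pivot now lives at A[end]
    j = stt
    for i in range(stt, end):         # scan every element except the pivot
        if A[i] <= pivot_value:
            A[j], A[i] = A[i], A[j]
            j += 1
    A[j], A[end] = A[end], A[j]       # drop the pivot into its sorted slot
    return j

def quicksort(A, stt, end):
    """Sort A[stt..end] in place by recursing on both sub arrays."""
    if stt >= end:                    # zero or one element: nothing to do
        return
    pvt = partition(A, stt, end)
    quicksort(A, stt, pvt - 1)
    quicksort(A, pvt + 1, end)
```

Note the `stt >= end` guard: `pvt - 1` can fall below `stt` when the pivot lands at the left edge of the range, so the base case must also cover empty sub arrays.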
Quickselect#
INPUT an array A, the index of the first element of the array, the index of the last element of the array, an integer k
OUTPUT the k-th smallest element
QUICKSELECT (A, stt, end, k)
  if stt == end  // base case: an array of just one element
    return A[stt]

  pvt = partition(A, stt, end)

  if k == pvt
    return A[k]
  else if k < pvt
    return QUICKSELECT(A, stt, pvt - 1, k)
  else
    return QUICKSELECT(A, pvt + 1, end, k)
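The quickselect pseudocode maps to Python almost line for line. This sketch assumes 0-based indices (so `k` is an index, with `k = 0` selecting the minimum) and repeats the partition helper so the example is self-contained; as above, the random pivot choice is an assumption.

```python
import random

def partition(A, stt, end):
    """Partition A[stt..end] in place and return the pivot's final index."""
    pvt = random.randint(stt, end)    # choose_pivot: a random index (assumption)
    A[pvt], A[end] = A[end], A[pvt]   # move the pivot element to the end
    pivot_value = A[end]
    j = stt
    for i in range(stt, end):
        if A[i] <= pivot_value:
            A[j], A[i] = A[i], A[j]
            j += 1
    A[j], A[end] = A[end], A[j]       # drop the pivot into its sorted slot
    return j

def quickselect(A, stt, end, k):
    """Return the k-th smallest element of A[stt..end] (k is a 0-based index)."""
    if stt == end:                    # base case: an array of just one element
        return A[stt]
    pvt = partition(A, stt, end)
    if k == pvt:
        return A[k]
    elif k < pvt:
        return quickselect(A, stt, pvt - 1, k)
    else:
        return quickselect(A, pvt + 1, end, k)
```

Because `k` is kept as an index into the whole array, the recursive call on the right sub array needs no rank adjustment; the `pvt + 1` offset handles it implicitly, as the correctness proof below notes.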
Correctness
Proof by induction
Base case \(n = 1\)
When the array consists of a single element the element is the smallest element which is returned by the algorithm.
Inductive step
Assume that the algorithm is correct for all array sizes \(1 \le n \lt x\). It must be shown that the algorithm is correct for \(n = x\). Depending on where the \(k\)-th element falls relative to the pivot, there are three cases.
1. the \(k\)-th element is in the left sub array
This means that the size of the left sub array must be larger than \(k\), and that the \(k\)-th element of the whole array is the same as the \(k\)-th element of the left sub array. The size of the left sub array must also be less than \(x\): even in the worst case it is at most \(x - 1\), because the pivot itself is not included in either sub array, so each call to partition reduces the total size of the sub arrays by at least \(1\). By the inductive hypothesis the algorithm is correct for all arrays of size \(n \lt x\), so it correctly finds the \(k\)-th element of the left sub array. Thus the algorithm is correct in this case.
2. the \(k\)-th element is the pivot
This case is trivial. The partition function ensures that the pivot element is in its appropriate sorted order. Thus if the pivot is equal to \(k\) we have found the \(k\)-th element and returned it.
3. the \(k\)-th element is in the right sub array
This case is similar to the case of the left sub array. If the \(k\)-th element is in the right sub array then this is analogous to calling quickselect on the right sub array (which will be correct due to the inductive hypothesis). However, the \(k\) value will need to be different here as the \(k\)-th element in the entire array is actually the \((k - \ell - 1)\)-th element of the right sub array where \(\ell\) is the size of the left sub array. The detail is handled implicitly by the algorithm described above because of the \(\text{pivot} + 1\) starting offset in the recursive call to quickselect.
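A quick sanity check of this index arithmetic, using 0-based indices and a made-up array (the pivot position here is assumed for illustration, not computed by partition):

```python
# After partition, the pivot sits at its final sorted index, so a fully
# sorted array exhibits the same element positions.
A = sorted([7, 2, 9, 4, 8, 1, 6])    # [1, 2, 4, 6, 7, 8, 9]
pvt = 3                              # suppose the pivot ended up at index 3
left, right = A[:pvt], A[pvt + 1:]   # sub arrays on either side of the pivot
ell = len(left)                      # ell == pvt with 0-based indices
k = 5                                # a rank that falls in the right sub array
assert k > pvt
assert A[k] == right[k - ell - 1]    # k-th overall == (k - ell - 1)-th of right
```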
Thus we’ve proven the base case and shown that if the algorithm is correct for all array sizes less than \(x\) then it is also correct for arrays of size \(x\), so correctness follows by induction on \(x\).
Complexity
The complexity analysis of quickselect is similar to that of quicksort to the extent that the selection of the pivot plays a major role in its complexity.
In the best case, the selected pivot is exactly the median, in which case the array is split evenly, resulting in
\(T(n) = T(n/2) + \Theta(n) = \Theta(n)\)
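Unrolling this recurrence makes the linear bound explicit: each level of recursion halves the remaining work, so the partition costs form a geometric series (with \(c\) the constant hidden in \(\Theta(n)\)).

\(T(n) = cn + c\tfrac{n}{2} + c\tfrac{n}{4} + \cdots \le cn \sum_{i=0}^{\infty} 2^{-i} = 2cn = \Theta(n)\)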
Unlike quicksort, quickselect must only solve one of the two sub problems and so the total complexity is dominated by the partition function.
In the worst case, the selected pivot is always the smallest (or largest) element of the current sub array, so each call eliminates only the pivot itself and the complexity is
\(T(n) = T(n - 1) + \Theta(n) = \Theta(n^2)\)
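Unrolling this recurrence, each call removes only one element from consideration, so the partition costs form an arithmetic series:

\(T(n) = cn + c(n-1) + c(n-2) + \cdots + c = c\,\tfrac{n(n+1)}{2} = \Theta(n^2)\)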
In general, if \(\ell\) is the size of the left sub array and \(r\) is the size of the right sub array, then the worst-case recurrence is
\(T(n) = \max\{ T(r), T(\ell) \} + \Theta(n) + P(n)\), where \(P(n)\) is the time complexity of the pivot-selection algorithm used.
Performance considerations
The time complexity of quickselect therefore depends on the choice of pivot: a pivot that splits the array into roughly balanced sub arrays keeps the expected running time linear, while a pivot that repeatedly lands at an extreme of the sub array drives the running time toward \(\Theta(n^2)\).
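To see how much the pivot matters in practice, here is a hypothetical experiment (my own construction, not from the course notes) that counts comparisons while selecting the maximum of an already sorted array, once with a first-element pivot and once with a random pivot:

```python
import random

def quickselect(A, stt, end, k, choose_pivot, counter):
    """k-th smallest of A[stt..end]; counter[0] accumulates comparisons."""
    if stt == end:
        return A[stt]
    pvt = choose_pivot(A, stt, end)
    A[pvt], A[end] = A[end], A[pvt]
    pivot_value = A[end]
    j = stt
    for i in range(stt, end):
        counter[0] += 1               # one comparison per scanned element
        if A[i] <= pivot_value:
            A[j], A[i] = A[i], A[j]
            j += 1
    A[j], A[end] = A[end], A[j]
    if k == j:
        return A[k]
    elif k < j:
        return quickselect(A, stt, j - 1, k, choose_pivot, counter)
    return quickselect(A, j + 1, end, k, choose_pivot, counter)

random.seed(0)                        # fixed seed so the run is repeatable
n = 500
first, rand_ = [0], [0]
quickselect(list(range(n)), 0, n - 1, n - 1, lambda A, s, e: s, first)
quickselect(list(range(n)), 0, n - 1, n - 1,
            lambda A, s, e: random.randint(s, e), rand_)
# first[0] is exactly n(n-1)/2; the random-pivot count is far smaller
```

On sorted input the first-element pivot realizes the \(T(n) = T(n - 1) + \Theta(n)\) recurrence above, while a random pivot keeps the expected work linear.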
Acknowledgments#
Rumbaugh, Douglas B. (Summer 2024). CMPSC 465: Data Structures and Algorithms. Penn State University.