Sorting (with attitude)
This post takes a look at a number of different sorting techniques: heap sort, merge sort, quick sort and quick select, counting sort, radix sort, and bucket sort.
The notes below come from the Algorithms with Attitude YouTube channel, specifically the Sorting playlist, which comprises the following videos: Heap Sort, Merge Sort: Top-Down and Bottom-Up, Quick Sort and Quick Select, Runtime Analysis for Quick Sort and Quick Select, Lower Bounds for Comparison Based Sorting: Decision Trees, and Linear Time Sorting: Counting Sort, Radix Sort, and Bucket Sort.
There are several sections in this post, accessible in the table of contents on the right, but most of the sections have been listed below for ease of reference and navigation.
- Heap sort
- Merge sort - top-down and bottom-up
  - Merging lists
  - Algorithm concept
  - Standard recursive merge sort (top-down)
  - Analysis
  - Optimizations
  - Naive parallelization
  - Unwinding the recursion
  - Bottom-up merge sort (iterative)
  - Comparison to top-down (8 elements)
  - Comparison to top-down (13 elements)
  - History lesson
  - A different optimization (bottom-up)
  - Bottom-up implementation (optimized)
  - Timsort ideas
- Quick sort and quick select
  - Introduction
  - Quick sort concept
  - Quick sort pseudocode (1)
  - Partitioning (CLRS/Lomuto partition)
  - Worst case performance
  - Quick sort pseudocode (2)
  - Quick sort pseudocode (3)
  - Expected performance
  - Randomized quick sort
  - Expected performance (continued)
  - Partitioning with duplicates
  - Optimizations
  - Naive parallelization
  - Selection
  - Quick select
  - Implementations
- Runtime analysis for quick sort and quick select
- Lower bounds for comparison based sorting - decision trees
- Linear time sorting - counting sort, radix sort, and bucket sort
- LeetCode implementations
Heap sort
There is another post dedicated to binary heaps, where heap sort is discussed in the general context of heaps as a whole. The content below for heap sort has been excerpted from the linked post for the sake of completeness.
Overview
Suppose we have the heapify and deleteMax operations as well as the buildHeap operation. Heap sort is then quite easy: first, build a max-heap; then, delete all its values:
- buildHeap()
- deleteMax() n times
Done.
buildHeap() phase
For a bit more detail, we start with the buildHeap operation. We assume we're given an array of values to be sorted, and we use the space of that array to store the heap itself. One of the nice properties of heap sort is that it is an in-place algorithm — it only needs a fixed amount of memory beyond what is used to store the data being sorted.
deleteMax() phase
Next, we delete the max value. We swap the root with the last leaf node (bottom level to the far right) and then delete the last leaf node — the new root value then has to be compared to its children and possibly sifted down the tree to ensure the properties of the whole heap and all sub-heaps are respected. We conclude the sifting down when what was previously the last leaf node is now either a leaf or the root of its own sub-heap.
We continue to call deleteMax until the entire array is empty. As indicated above, note that when deleteMax is called, we don't just overwrite the value to be deleted — we swap it with the last leaf of the heap, and continue as explained above. What happens is that we delete more and more items, and the heap gets smaller and smaller. But all of the values are still stored in the array, until finally, the heap is gone.
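The two phases described above can be sketched in Python as follows (a minimal sketch; sift_down is my own name for the heapify-down helper, not necessarily the video's):

```python
def heap_sort(a):
    n = len(a)
    # buildHeap phase: sift down each internal node, last to first
    for i in range(n // 2 - 1, -1, -1):
        sift_down(a, i, n)
    # deleteMax phase: swap max with last leaf, shrink heap, sift down
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end)

def sift_down(a, i, size):
    # restore the max-heap property for the sub-heap rooted at i
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < size and a[left] > a[largest]:
            largest = left
        if right < size and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest
```

Note that the heap lives entirely inside the input array, so the sort is in-place.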
Analysis
- buildHeap: O(n)
- deleteMax n times: O(log n) each, O(n log n) total
For our analysis, we had a buildHeap operation, which was O(n) time, and then n deleteMax operations, which are worst-case O(log n) time each. That's O(n log n) total time.
Note that we would still have an O(n log n) algorithm even if we ran n insertions to start the algorithm instead of the linear-time buildHeap. If we want to be more precise, we can better bound the total number of comparisons:
- buildHeap: at most 2n comparisons (CLRS)
- deleteMax n times: at most 2 lg n comparisons each (CLRS)
Note that the above are not asymptotic results. The 2 term above comes from the fact that to move down one level in heapify usually takes 2 comparisons, but that can be improved by optimizing the heapify method.
In all, heap sort gives us a worst-case in-place sorting algorithm that is not stable. It's supposedly slower in practice than a good quicksort.
Merge sort - top-down and bottom-up
Merging lists
The idea for merge sort is that if we have two lists of numbers that are already sorted, then we can merge those two lists into one longer sorted list in linear total time. If each list is sorted smallest to largest, then we can find the overall smallest by just comparing the smallest of each list. The next smallest is the smallest of the lists remaining after we take that first element out. For equal values, we take the one from the left list as the smaller one (even though it really doesn't make a difference) — its value started in a smaller index; thus, taking it in case of a tie, we'll keep it in a smaller index. That will make merge sort a stable sorting routine — it won't flip the relative order of equal values.
Here's an example run:

Merging two lists is pretty easy and takes linear time: O(n).
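As a quick sketch (assuming two already-sorted Python lists; merge_lists is a hypothetical helper name, not from the video):

```python
def merge_lists(left, right):
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # ties take from the left list, keeping the merge stable
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # one list is exhausted; the other's remainder is already sorted
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

Each element is looked at once, which is where the linear bound comes from.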
Algorithm concept
MergeSort(list):
- What if the first half of the list was already sorted?
- What if the second half of the list was already sorted?
- Merge the two sorted lists into one.
If we had a list and broke it in half and each half was magically sorted for us, then we could merge those two halves to make a complete sorted list:

Excluding that magic part, that's a really simple algorithm. So where did those two sorted lists come from? Starting with the two unsorted halves of the original list, we sort them recursively with merge sort:

Recursion is the magic. As a base case for the recursion, if we're given a list with one item (or none), then it's sorted:
MergeSort(array A)
if A.length < 2: return
Copy the first half of A to new array B
MergeSort(B)
Copy the second half of A to new array C
MergeSort(C)
merge B and C back into A
The pseudocode above can be implemented in Python as follows:
Click "Run Code" to see the output here
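A minimal runnable sketch of that pseudocode (my own variable names; I've assumed the left half gets the extra element for odd lengths, matching the tie-breaking convention used in this post):

```python
def merge_sort(A):
    # base case: an empty or single-element list is already sorted
    if len(A) < 2:
        return
    mid = (len(A) + 1) // 2      # left half gets the extra element for odd lengths
    B = A[:mid]                  # copy the first half of A to a new array B
    C = A[mid:]                  # copy the second half of A to a new array C
    merge_sort(B)
    merge_sort(C)
    # merge B and C back into A
    i = j = k = 0
    while i < len(B) and j < len(C):
        if B[i] <= C[j]:         # ties take from B, keeping the sort stable
            A[k] = B[i]
            i += 1
        else:
            A[k] = C[j]
            j += 1
        k += 1
    while i < len(B):            # copy any remaining elements of B
        A[k] = B[i]
        i += 1
        k += 1
    while j < len(C):            # copy any remaining elements of C
        A[k] = C[j]
        j += 1
        k += 1
```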
Here's a concise breakdown of the time and space complexity of the approach above:
- Time: The algorithm recursively divides the source array, A, into halves, B and C, leading to a logarithmic number of levels, O(log n), as shown in the animation above. Each level involves merging the entire array, which takes linear time, O(n). Hence, the total time complexity is O(n log n).
- Space: New arrays, B and C, are created to hold halves of the original array, A, for each recursive call. Each level of the recursion tree, as demonstrated in the animation, uses O(n) space; since there are O(log n) levels in the recursion tree, this means the total space complexity is O(n log n).
Standard recursive merge sort (top-down)
While we could copy the array halves out before they're sorted and then sort them, as shown above and in the animation, that implementation uses more space than is needed. It's not too bad, maybe twice what we'd usually need, until we get into implementations that run in parallel, which can be much worse.
Instead, it's more efficient to only copy the data out of the original array after it's sorted, immediately merging back into the original array; that is, only copy lists after they are sorted, not beforehand:
- Use original array to store unsorted values waiting to be sorted
- Copy sorted values to another array just before the merge back into the original space
- Less space (especially for naive parallelized version)
MergeSort(A[], start, end)
if(end <= start) return
mid = (end + start) / 2
MergeSort(A[], start, mid)
MergeSort(A[], mid + 1, end)
copy A[start...mid] to B[]
copy A[mid + 1...end] to C[]
merge B[] and C[] back into A[start...end]
We can have start and end indices as parameters in our recursive calls, specifying which part of the original array we are currently sorting. So, to sort the array from the start to the end, we recursively sort its first half, from the start index to the middle index. And that will first sort its own first half, the first quarter of the whole array, and so on; we return to sort the full second half of the array after the first half is complete.
Each of the merge operations will copy out two sorted lists, and then merge back into the original array, overwriting it. We'll call this the standard "unoptimized" version of top-down merge sort (it's better than the previous approach, but there are still several improvements we can make). We assume here that we're just comparing integer values in an array. More generally we might be sorting records or objects by some key value, but the basic idea is the same. If we're working in a language where we have to explicitly free up memory instead of using garbage collection, we'd need to free up the extra space used for merging the lists once we finish with them.
In the animation of this algorithm below, it's easy to see that we use linear extra space during the algorithm instead of O(n log n) space as before:

The pseudocode above can be implemented in Python as follows:
Click "Run Code" to see the output here
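A runnable sketch of this standard version (both sorted halves are copied out just before the merge; variable names are mine):

```python
def merge_sort(A, start, end):
    if end <= start:                   # empty or single-element range
        return
    mid = (end + start) // 2
    merge_sort(A, start, mid)
    merge_sort(A, mid + 1, end)
    B = A[start:mid + 1]               # copy sorted A[start...mid] to B
    C = A[mid + 1:end + 1]             # copy sorted A[mid+1...end] to C
    # merge B and C back into A[start...end]
    i = j = 0
    k = start
    while i < len(B) and j < len(C):
        if B[i] <= C[j]:               # ties take from B, keeping the sort stable
            A[k] = B[i]
            i += 1
        else:
            A[k] = C[j]
            j += 1
        k += 1
    while i < len(B):
        A[k] = B[i]
        i += 1
        k += 1
    while j < len(C):
        A[k] = C[j]
        j += 1
        k += 1
```

Unlike the earlier version, the copies happen after the recursive calls, so only one merge's worth of extra space is live on each recursion path.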
Summary of time and space complexity for the approach above:
- Time: The algorithm above again recursively divides the source array into halves (O(log n) levels in the recursion tree) and merges all elements at each level for a total runtime of O(n log n).
- Space: The total extra space used by temporary arrays B and C, at any point in time, is proportional to n, the number of elements in the source array, A; hence, the overall space complexity is O(n).
Analysis
Concise summaries of the time/space analysis for each previous approach were provided at the end of each approach's section, but it's helpful to consider the analysis of merge sort from a more formal point of view. It's worth noting that merge sort is probably one of the most analyzed sorting algorithms there is, and its recurrence relation is likely the most frequently analyzed recurrence relation ever.
To run the algorithm on n items, we recursively sort the first ⌈n/2⌉ items, then recursively sort the remaining ⌊n/2⌋ items, and then we merge them. The merge takes linear time. We get the following recurrence relation:

T(n) = T(⌈n/2⌉) + T(⌊n/2⌋) + Θ(n)

Solving this recurrence relation shows that merge sort takes time Θ(n log n).
As an arbitrary tie-breaker, we'll use the convention CLRS uses, namely that if there are an odd number of items, then the first or left half will get the extra one (e.g., if the entire array has 13 elements then the first half will range from index 0 through index 6, (end + start) // 2 = (0 + 12) // 2 = 6, thus containing a total of 7 items).
In summary:
- Runtime analysis: T(n) = T(⌈n/2⌉) + T(⌊n/2⌋) + Θ(n) = Θ(n log n)
- Tie breakers:
- Important: for equal key values, the lower indexed (left) value goes first (makes algorithm stable)
- Arbitrary: for an odd number of elements, the left half gets the extra one
Optimizations
Optimization 1
Only copy the first sorted list, and leave the second list in place:
- Decreases copies by about 25%
- Doesn't change number of comparisons
Recall the animation for the standard recursive merge sort algorithm:

Before the merge, the entire sorted array is copied elsewhere, and it looks kind of like one operation, but it isn't. We're copying the entire array into another position, which takes time linear in the size of that array.
Instead of copying out both the left and the right arrays, we can copy out the left array and just leave the right array in-place. If we fill in the sorted values from small to large, then by the time we need to overwrite anything in the right array, that value will already have been copied elsewhere. Unoptimized, a merge of two size-n/2 lists takes n read and n write operations to copy the arrays elsewhere and n more to copy them back into the array during the merge. There are also between n/2 and n - 1 comparisons made during the merge. Here, for this first optimization, we get rid of the n/2 read and n/2 write operations for one of the initial list copies (the right one). It won't change the asymptotic runtime; it will just make it run a little more quickly, getting rid of a quarter of our copy operations.
This new optimization of only copying the left array and leaving the right one in-place can be visualized as follows:

We can modify our code from the standard implementation to implement this optimization as follows:
Click "Run Code" to see the output here
Optimization 2
Once the first list is merged, the remainder of the second list is already in-place
- Decrease copies (amount depends on data)
- Decrease comparisons (amount depends on data)
This second optimization is easier to see after we've made the first optimization. Once all elements have been copied from the left list into the combined list, the remaining elements from the right list are already in-place so we don't have to loop through the remaining elements from the right list to copy them into the merged list — they're already there.
That was true even without the first optimization. In the standard version, we would have been copying values from the right list (i.e., list C) to just overwrite the same values in the original list (i.e., list A). But the second optimization is easier to notice here, where instead of just overwriting data (i.e., list A) with the same data from somewhere else (i.e., list C), we'd be overwriting the data with itself (i.e., list A) from the exact same position (i.e., also list A).
We can explicitly see this if we look at the code provided for implementing the first optimization:
# merge B and C back into A
def merge(A, start, end, B):
    i = 0                          # start index of B
    j = (end + start) // 2 + 1    # start index of C (in A)
    k = start
    # compare smallest elements before each merge
    while i < len(B) and j < end + 1:
        if B[i] <= A[j]:
            A[k] = B[i]
            i += 1
        else:
            A[k] = A[j]
            j += 1
        k += 1
    # merge any remaining elements of B into A
    while i < len(B):
        A[k] = B[i]
        i += 1
        k += 1
    # merge any remaining elements of C into A
    while j < end + 1:
        A[k] = A[j]
        j += 1
        k += 1

def merge_sort(A, start, end):
    # base case (empty or single-element list is already sorted)
    if end <= start:
        return
    mid = (end + start) // 2
    merge_sort(A, start, mid)     # recursively sort first half of A
    merge_sort(A, mid + 1, end)   # recursively sort second half of A
    B = A[start:mid+1]            # copy sorted first half to B[]
    merge(A, start, end, B)       # merge sorted values B[] and C[] back into A[start...end]
All we have to do to implement this second optimization is remove the final loop above (the one that merges any remaining elements of C into A):
Click "Run Code" to see the output here
Optimization 3
Only allocate working space once
- Decrease allocations and deallocations
Instead of allocating or deallocating or garbage collecting space for the merges over and over again, we know it's going to take linear total space, so we can just allocate a single array and use it for all of the needed copies. Anytime we need to copy a value from the original array, just copy it into the same indexed position of the extra array. That spot is reserved for data from the same index. We'll see why using that position is nice in a moment.
Click "Run Code" to see the output here
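One way to sketch this single-allocation approach is to allocate the working array once in a wrapper and thread it through the recursion (the merge_sort/_sort split and the names are my own; optimizations 1 and 2 are kept):

```python
def merge_sort(A):
    temp = [None] * len(A)         # single working array, allocated once
    _sort(A, 0, len(A) - 1, temp)

def _sort(A, start, end, temp):
    if end <= start:
        return
    mid = (end + start) // 2
    _sort(A, start, mid, temp)
    _sort(A, mid + 1, end, temp)
    # copy the sorted left half into the same indices of temp
    for i in range(start, mid + 1):
        temp[i] = A[i]
    i, j, k = start, mid + 1, start
    while i < mid + 1 and j < end + 1:
        if temp[i] <= A[j]:
            A[k] = temp[i]
            i += 1
        else:
            A[k] = A[j]
            j += 1
        k += 1
    while i < mid + 1:             # remaining left-half elements
        A[k] = temp[i]
        i += 1
        k += 1
    # remaining right-half elements are already in place
```

Copying A[i] into temp[i] (the same index) is what makes a single shared working array safe: each recursive call only ever touches its own slice of temp.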
Optimization 4
Before any copy prior to merge, test if lists are already ordered:
- Potentially decrease copies and comparisons
- In worst case, adds O(n) comparisons to the runtime
- In best case, algorithm now takes O(n)
Another optimization sort of overlaps the idea of the first two, although it can be put into place independently from them. Sometimes we might run into the case that the two lists we're going to merge are already sorted; that is, the largest element from the leftmost list is no bigger than the smallest element from the rightmost list. In that case, we don't need to copy the lists out and merge them — they're already forming a sorted list when combined. So we can check for that before we perform the merge.
This optimization has a bit of a tradeoff. For each of the merges that takes place in the algorithm, n - 1 of them for a sort of n items, we're adding an extra comparison to see if the merge is needed. If the elements are random, then we can avoid approximately half of the one-item list merges. For merges of two two-item lists, we can avoid approximately 1/6 of the merges. Those avoided merges should pay for the extra comparisons. But without really careful analysis, or better yet, testing, it's hard to be sure just how much we'll gain for random inputs from this optimization.
But even if we don't gain much on random inputs, it gets us something in some special cases. Our basic merge sort, even with the first three optimizations, will always take time proportional to n log n: best case, worst case, every case. With this fourth optimization, if the input is already sorted or even almost sorted, then merge sort can take linear time to run, improving the asymptotic best-case performance of the algorithm. This is a way we can help those special cases, and it shouldn't hurt the random case. In the worst case, it will add just O(n) comparisons to the algorithm, which isn't that big of a deal for an O(n log n) algorithm anyway.
Here's an animation that shows the third and fourth optimizations at work:

The code needed to implement this final optimization to top-down merge sort is as simple as adding the following if statement before attempting to merge:
# ...
if A[mid] > A[mid + 1]:
# ...
That is, the first list goes from A[start] to A[mid], inclusive, and the second list goes from A[mid + 1] to A[end], inclusive. If A[mid] is greater than A[mid + 1], then this means the two lists do need to be merged; otherwise, no merging is necessary. This gives us our final implementation of top-down merge sort:
Click "Run Code" to see the output here
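Putting it all together, a sketch of the final top-down version with optimizations 1 through 4 (the merge_sort/_sort naming is mine):

```python
def merge_sort(A):
    temp = [None] * len(A)            # optimization 3: allocate working space once
    _sort(A, 0, len(A) - 1, temp)

def _sort(A, start, end, temp):
    if end <= start:
        return
    mid = (end + start) // 2
    _sort(A, start, mid, temp)
    _sort(A, mid + 1, end, temp)
    if A[mid] <= A[mid + 1]:          # optimization 4: already in order, skip the merge
        return
    for i in range(start, mid + 1):   # optimization 1: copy out only the left list
        temp[i] = A[i]
    i, j, k = start, mid + 1, start
    while i < mid + 1 and j < end + 1:
        if temp[i] <= A[j]:
            A[k] = temp[i]
            i += 1
        else:
            A[k] = A[j]
            j += 1
        k += 1
    while i < mid + 1:
        A[k] = temp[i]
        i += 1
        k += 1
    # optimization 2: remaining right-list elements are already in place
```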
Optimization summary
- Optimization 0: Only copy lists after they are sorted, not beforehand (less space, especially for naive parallelized version)
- Optimization 1: Only copy the first sorted list, leave the second sorted list in-place (decreases copies by 25%)
- Optimization 2: Once first list is merged, the remainder of the second list is already in-place and doesn't need to be copied (decrease copies, amount depends on data)
- Optimization 3: Only allocate working space once (decrease allocations and deallocations)
- Optimization 4: Before any copy prior to merge, test if lists are already ordered (decreases copies, amount depends on data, gives best case runtime)
Naive parallelization
There's an almost trivial way to partially parallelize this algorithm.
0th version
Once we say that we're going to separately sort the first and second half of the array, we can do those in parallel. Below, the animation shows all of the sub-calls for the zeroth version of the unoptimized algorithm happening simultaneously:

Each of the merge operations is done by one processor, and the last merge operation takes linear time all by itself. The two merges at the level below that are each done in parallel, so that level takes only half as much time. Each level takes half as much time as the level above it. So it takes linear total time to do order n log n work for the algorithm, using up to n/2 processors at a time.
The animation above actually shows the problem with our original approach of copying out unsorted values and then recursively sorting them before the merge. It takes a lot of space. Each recursive level uses linear space to hold unsorted data, but it needs to wait for deeper recursive calls to complete in order to overwrite that data. That version would need O(n log n) space. So we shouldn't copy the data into new space until after it's been sorted and is ready to merge (like our standard recursive model).
But what about the four optimizations we made from that model? All four of them work in parallel too.
4th version
Use the original array as our unsorted data instead of copying it out. Don't copy the second of the two sorted arrays to be merged. When the left array is done merging, don't re-copy the right array. Just allocate one array for space (ever). And check to see if the two arrays are already in order before merging. We can do all of those in parallel. The problem there is, if we animate it, so much is going on at once it's hard to see what's happening:

But at least it happens quickly! It will take n/2 processors, order n runtime, and order n space. For perfectly sorted data, it would take logarithmic time, but generally we'd almost always expect it to take linear time.
But there are ways to parallelize the merge itself, and a fully parallel merge sort can run in something like logarithmic time, but we won't get into that in this post.
Unwinding the recursion
Let's now take a look at what happens within the recursive calls with a smaller example of just eight items:

Consider the comparisons and merges done throughout the entire algorithm. We'll use the 0th version of the algorithm to visualize this (because it doesn't erase anything, it shows a nice trace of everything the algorithm does, so we can see which lists were merged to make which list):

At the very bottom of the recursion, we see the unsorted values:

We just interpret each element as a sorted list of length 1, the base case of the recursion. One level up, we see that we will quickly have a sorted list of the first two items, to be merged with the sorted list of the second two items. Eventually, a sorted list of the third pair of items gets merged with a sorted list of the last two items:
merge 17 with -22: compare (17, -22)
merge 15 with 20: compare (15, 20)
merge -22, 17 with 15, 20: compare (-22, 15), (17, 15), (17, 20)
merge 33 with 25: compare (33, 25)
merge 22 with 19: compare (22, 19)
merge 25, 33 with 19, 22: compare (25, 19), (25, 22)
merge -22, 15, 17, 20 with 19, 22, 25, 33:
compare (-22, 19), (15, 19), (17, 19), (20, 19), (20, 22)
Bottom-up merge sort (iterative)
Another version of merge sort, bottom-up (or iterative), doesn't use recursion. Instead, it just starts with an interpretation of the list as a bunch of length 1 sorted lists, and it merges the first two of those, and the next two, and the next two, etc., until it's passed through the entire array once, merging sorted lists of one item each into sorted lists of two items each. Then it goes through the entire array again, this time merging sorted lists of size two into lists of size four. It keeps going and each pass through the array it merges lists that are twice as large as they were for the previous pass.
That's the idea of bottom-up merge sort. Starting with an unsorted set of numbers, interpreted as sorted lists of size 1 each, pass over the list, merging pairs of lists, leaving us with lists of size 2 each, and then again leaving lists of size 4 each. And continue that until we have one list of all n items.
Let's consider again the example we used above:

Bottom-up merge sort sorts the array above in the following manner:

The following list of merges and comparisons can be helpful to track with the animation above:
merge 17 with -22: compare (17, -22)
merge 15 with 20: compare (15, 20)
merge 33 with 25: compare (33, 25)
merge 22 with 19: compare (22, 19)
merge -22, 17 with 15, 20: compare (-22, 15), (17, 15), (17, 20)
merge 25, 33 with 19, 22: compare (25, 19), (25, 22)
merge -22, 15, 17, 20 with 19, 22, 25, 33:
compare (-22, 19), (15, 19), (17, 19), (20, 19), (20, 22)
We'll compare the bottom-up approach to the top-down approach in the next two sections, but the bottom-up implementation is included towards the end.
Comparison to top-down (8 elements)
The example above is somewhat of a special case because n = 8, which is a perfect power of 2. It turns out that when we're sorting a list comprised of a number of items that is a perfect power of 2, the comparisons done by top-down and bottom-up merge sort will exactly match. For example, we can place side-by-side the bottom-up comparisons we just did in the example above (left) with the comparisons made in the unwinding the recursion section (right):
merge 17 with -22: compare (17, -22)
merge 15 with 20: compare (15, 20)
merge 33 with 25: compare (33, 25)
merge 22 with 19: compare (22, 19)
merge -22, 17 with 15, 20: compare (-22, 15), (17, 15), (17, 20)
merge 25, 33 with 19, 22: compare (25, 19), (25, 22)
merge -22, 15, 17, 20 with 19, 22, 25, 33:
compare (-22, 19), (15, 19), (17, 19), (20, 19), (20, 22)
merge 17 with -22: compare (17, -22)
merge 15 with 20: compare (15, 20)
merge -22, 17 with 15, 20: compare (-22, 15), (17, 15), (17, 20)
merge 33 with 25: compare (33, 25)
merge 22 with 19: compare (22, 19)
merge 25, 33 with 19, 22: compare (25, 19), (25, 22)
merge -22, 15, 17, 20 with 19, 22, 25, 33:
compare (-22, 19), (15, 19), (17, 19), (20, 19), (20, 22)
As can be seen above, the order can change, but the actual comparisons will be the same ones. That won't be the case if we look at this example with 13 items, which is clearly not a perfect power of 2.
Comparison to top-down (13 elements)
Let's now consider an example array with 13 items, a number that is very clearly not a power of 2:

To sort 13 items bottom-up, we start in the same way: compare the first two items, then the next two, and continue until we have sorted lists of size 2 each, where the last item has nothing to merge with so it just gets left as a size-1 list. On the next pass, we start merging those longer lists. The first two lists of length 2 get merged into a list of length 4. Then the next two lists of length 2 get merged into a list of length 4. Then another. But that last item is still on its own:

After our kth pass through the numbers merging lists, we'll be left with a bunch of sorted lists that are all of length 2^k. Except the last list! After the first pass, that last list will be smaller than all other lists (unless we started with a number of items that is a perfect power of 2). Above, we paused the process after the second pass, where we had three lists of length 4 each and then one list of length 1:

The next pass will be the first pass where we have an even number of lists, so that last list will finally get merged with something, specifically the length-4 list to its left. In our final pass, we'll merge a length-8 list on the left with a length-5 list on the right for an overall length-13 sorted list:

Note that in the top-down version of merge sort we never merge lists of such different sizes. The lists always need to be within 1 of each other in size. But in the bottom-up case, the last list can be very different in size while all the other lists are exactly the same size. Also note that the bottom-up version can make really different comparisons from the top-down version:
1st pass:
merge 17 with -22
merge 15 with 20
merge -50 with 25
merge 22 with 19
merge 10 with 40
merge 44 with 50
2nd pass
merge -22, 17 with 15, 20
merge -50, 25 with 19, 22
merge 10, 40 with 44, 50
3rd pass
merge -22, 15, 17, 20 with -50, 19, 22, 25
merge 10, 40, 44, 50 with 25
4th pass
merge -50, -22, 15, 17, 19, 20, 22, 25 with 10, 25, 40, 44, 50
merge 17 with -22
merge 15 with 20
merge -22, 17 with 15, 20
merge -50 with 25
merge -50, 25 with 22
merge -22, 15, 17, 20 with -50, 22, 25
merge 19 with 10
merge 10, 19 with 40
merge 44 with 50
merge 44, 50 with 25
merge 10, 19, 40 with 25, 44, 50
merge -50, -22, 15, 17, 20, 22, 25 with 10, 19, 25, 40, 44, 50
Unlike when we sorted 8 items, the comparisons above aren't just the same comparisons in a different order. For example, in the bottom-up example, 19 never gets compared to 10, but that would be the first comparison done to recursively sort the second half of the elements in the top-down version.
On the other hand, the bottom-up version will compare 22 and 19 as its fourth comparison. But because they're in separate halves of the original array, in the top-down version the 20 from the left will be compared to the 19 on the right, and the 22 won't ever get compared to the 19.
The two versions give different sets of comparisons even though they are asymptotically the same. The bottom-up version makes O(log n) passes through all the data making merges, linear time for each pass, so it also takes O(n log n) time. Comparisons are different because the lists being merged are different. We can take a look at individual comparisons if we really want to make sure we understand all of the details.
History lesson
Why would we want the bottom-up version instead of the top-down version? Some people like to avoid recursion, and some older languages might not even allow it, but today either approach will work in most situations. That may not always have been the case, though.
In the past, the standard way to store lots of data was on reel-to-reel tapes. Those tapes didn't really have random access. They worked best just reading data sequentially from the beginning of the tape. Recursive merge sort doesn't work so well on those. It jumps around a lot, but if we had our data split between two tapes, with half of the lists on one tape and half of the lists on the other tape, then we could play both of those tapes from the start, reading off the first item from each. We could merge those two lists into a list on a third tape, and then we could merge the next list from the two tapes on to a fourth tape. After going through all of the lists on the first two tapes, alternating which tape we merged them on to, we'd now have all of our data in half as many lists, split between third and fourth tapes. Next, we'd take those two tapes and merge their data back onto the first two tapes. This allows us to merge while only having to rewind the tapes a logarithmic number of times.
People used to have to worry about stuff like minimizing tape rewinds. More recently, external memory merge sort makes a special attempt to minimize how many times we need to access a disk when we have too much data to fit in main memory.
Let's recap:
- Data was stored on reel-to-reel tapes. Random access was poor. Sequential access was better.
- For merge sort:
- Store half of the lists on one tape, half of the lists on a second tape.
- Merge the first list from each tape onto a third tape, and the second list from each tape onto a fourth.
- Alternate which tape you merge onto between the 3rd and 4th tapes.
- After completing a pass through all data, now you have all data on the 3rd and 4th tapes; merge them onto tapes 1 and 2.
A different optimization (bottom-up)
Let's take a look at one optimization for the bottom-up version, different than those we used for the top-down recursive version. Maybe we want to just allocate a second array once, but this time we won't copy sorted values out and then merge them back; instead, we'll just use the sorted lists and merge them onto the other array. Then use the sorted lists in that second array and merge them back into the original array. Go back and forth between the two. If we happen to need an odd number of total passes, then we'll need to copy data back into the original array when we finish, but we get rid of all of the copy operations right before any merge for the entire algorithm.
We can visualize this different sort of optimization that mimics details found in the history lesson section:

The algorithm can be implemented in Python as follows:
Click "Run Code" to see the output here
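A sketch of this back-and-forth (ping-pong) version (naming is mine; note the final copy back only happens after an odd number of passes):

```python
def merge_sort(A):
    n = len(A)
    src, dst = A, [None] * n          # two arrays; allocate the second only once
    size = 1
    while size < n:
        # one pass: merge runs of length `size` from src into dst
        for start in range(0, n, 2 * size):
            mid = min(start + size, n)        # start of the right run
            end = min(start + 2 * size, n)    # one past the end of the right run
            i, j, k = start, mid, start
            while i < mid and j < end:
                if src[i] <= src[j]:          # ties from the left keep it stable
                    dst[k] = src[i]
                    i += 1
                else:
                    dst[k] = src[j]
                    j += 1
                k += 1
            while i < mid:
                dst[k] = src[i]
                i += 1
                k += 1
            while j < end:
                dst[k] = src[j]
                j += 1
                k += 1
        src, dst = dst, src           # ping-pong: next pass merges back the other way
        size *= 2
    if src is not A:                  # odd number of passes: copy the result back
        A[:] = src
```

All of the per-merge copy operations are gone; the only possible extra copy is the single pass at the very end.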
Bottom-up implementation (optimized)
Recalling details from the section on bottom-up merge sort, along with the different optimizations discussed for top-down merge sort, what would a decent implementation of bottom-up merge sort actually look like in code? There won't be much of a change in the actual merging of sublists:
# merge left list and right list back into A
def merge(A, start, mid, end, temp):
    for i in range(start, mid + 1):  # copy left subarray into temp
        temp[i] = A[i]
    i = start      # index into temp (left subarray)
    j = mid + 1    # index into right subarray in A
    k = start      # index into merged array in A
    # merge temp (left subarray) and A (right subarray) back into A
    while i < mid + 1 and j < end + 1:
        if temp[i] <= A[j]:
            A[k] = temp[i]
            i += 1
        else:
            A[k] = A[j]
            j += 1
        k += 1
    # merge remaining elements from temp back into A
    while i < mid + 1:
        A[k] = temp[i]
        i += 1
        k += 1
    # remaining elements of right list in A are already in-place (no need for merging)

def merge_sort(A, ...):
    # ???
Specifically, the left sublist will range from start to mid, inclusive; furthermore, the left sublist is copied into another list, temp, while the right sublist, which ranges from mid + 1 to end, inclusive, remains in-place. If we recall the various optimizations, then we see the code outline above hints at optimizations 0 through 3:
- Optimization 0: Only copy lists after they are sorted, not beforehand (less space, especially for naive parallelized version)
- Optimization 1: Only copy the first sorted list, leave the second sorted list in-place (decreases copies by 25%)
- Optimization 2: Once first list is merged, the remainder of the second list is already in-place and doesn't need to be copied (decrease copies, amount depends on data)
- Optimization 3: Only allocate working space once (decrease allocations and deallocations)
- Optimization 4: Before any copy prior to merge, test if lists are already ordered (decreases copies, amount depends on data, gives best case runtime)
Optimization 4 should be simple to implement within the body of the merge_sort function. The trickier stuff comes down to effectively handling the mechanics. For example, each pass through the array needs to involve merging lists twice the size of the previous pass (except the base case, which just merges lists of size 1), and we also need to effectively handle cases where the size of the input array is not a perfect power of 2 (e.g., see the example where the input array had 13 items). Let's handle these concerns separately.
Doubling the size of lists to be merged on each pass
Let `size` denote the size or length of the lists to be merged for a given pass over all items. The bottom-up implementation starts from the bottom (i.e., the base cases, which are trivially sorted lists of length 1 each) and works up (i.e., to the entire input array of size `n`). Thus, it makes sense that we should initialize `size = 1` to indicate we are starting with the base case. When should we stop? Maybe it's easier to think about when we should continue: while `size < n` (this means whatever list we've built up so far by sorting and merging is not yet complete). We can outline the body of the merge_sort function as follows:
```
def merge_sort(A):
    n = len(A)
    temp = [0] * n  # allocate working space only once
    size = 1        # size of sublists to sort and merge
    while size < n:
        for start in range(0, n, 2 * size):
            mid = ?
            end = ?
            ? merge(A, start, mid, end, temp) ?
        size *= 2
```
- Line 4 indicates our `size = 1` initialization.
- Line 5 indicates we're going to continue attempting to sort and merge lists until `size >= n`, at which point we should stop.
- Line 6 indicates the `start` position for the left list of whatever left/right list pair we consider for sorting and merging. The `start` position should begin at `0` for the very first left list, and `start` should clearly never go beyond the last index of the input array (i.e., `start` can go up to, but not include, `n`). Importantly, the `2 * size` argument to `range` specifies the index where `start` should be for the left list of each new left/right pair of lists (in Python, the third argument to `range` specifies the step size by which the first argument should grow). For example, when `size = 1`, the step size is `2 * size = 2 * 1 = 2`, which means we have the following `start` values (i.e., first endpoint of each left list for pairs of left/right lists): `0, 2, 4, 6, ...`. In general, the following illustration shows the structure of how left/right pairs of sublists are arranged. The cautionary note under the last pair of left/right sublists will be discussed and resolved in the next section, but for now it's helpful to simply note we will need to deal with cases where a left sublist exists and a right sublist does not exist, indicating we should not attempt a merge.
- Line 10 indicates we should double the size of sublists we're attempting to merge on each full pass.
Given the details outlined above, how should mid and end be calculated (lines 7 and 8, respectively)? Since these variables denote array index values, they need to be computed so as to ensure each left and right sublist has a length of size (except possibly the last right sublist).
- `mid`: If the left sublist starts at index `start` and ends at index `mid`, inclusive, then we must have `mid - start + 1 == size` to accurately capture the fact that `mid` is included and the entire sublist is of length `size`. For example, if we had `start = x` and we're about to try to merge sublists of size 4, then we have `size = 4`, but going from `start = x` to `start + size = x + 4`, inclusive, would be a mistake because then we'd have five items, not four, specifically at the following indexes: `x`, `x + 1`, `x + 2`, `x + 3`, `x + 4`. We need to subtract out the last one. Consequently, we should ultimately have `mid = start + (size - 1)`.
- `end`: The right sublist begins at `mid + 1` and ends at `end`. If we're not dealing with the very last right sublist, then all other right sublists should have a length of `size`, which can be obtained just like above: `end - (mid + 1) + 1 == size`. That is, we should have `end = (mid + 1) + (size - 1)`. If we are dealing with the very last right sublist, then note that the `end` value should not exceed the last index of the input array, `n - 1`. Hence, we should really have `end = min((mid + 1) + (size - 1), n - 1)`.
Handling edge cases concerning the last sublist(s)
Our work in the previous section gives us the following outline of the merge_sort function:
```
def merge_sort(A):
    n = len(A)
    temp = [0] * n  # allocate working space only once
    size = 1        # size of sublists to sort and merge
    while size < n:
        for start in range(0, n, 2 * size):
            mid = start + (size - 1)
            end = min((mid + 1) + (size - 1), n - 1)
            ? merge(A, start, mid, end, temp) ?
        size *= 2
```
Our final concern is highlighted above: how and when should we actually attempt to merge sublists? When they're not sorted, of course! We can easily go ahead and implement the fourth and final optimization from the top-down approach:
```python
# ...
if A[mid] > A[mid + 1]:
    merge(A, start, mid, end, temp)
# ...
```
Recall that mid is the right endpoint of the left sublist, and mid + 1 is the left endpoint of the right sublist. If A[mid] > A[mid + 1], then it must be the case that the current pair of sublists we're considering are not in order. We need to sort and merge them.
But what if A[mid + 1] doesn't actually exist? Consider again the example of the array input with 13 items:

On the first pass, where size = 1, note that start takes on the following values: 0, 2, 4, 6, 8, 10, 12. Notably, the last start value corresponds to the last element in the array: A[12] = 25. We have the following:
- `start = 12`
- `mid = start + (size - 1) = 12 + (1 - 1) = 12`
- `end = min((mid + 1) + (size - 1), n - 1) = min(12 + 1 + 1 - 1, 13 - 1) = min(13, 12) = 12`
This is a problem. Even checking the condition A[mid] > A[mid + 1] would throw an error because, in the case of the example array pictured above, A[mid + 1] = A[12 + 1] = A[13], which does not exist. We'd get an index out of bounds error. As previously mentioned and illustrated via animation, when there's nothing for the last sublist to be sorted and/or merged with (i.e., the right sublist does not exist), we simply skip it because it's trivially sorted:

Only when the last left sublist can be merged with another sublist do we actually attempt a sort and merge (even though the lists may be of very different sizes):

When does it make sense to attempt a sort and merge between two lists? When we actually have two lists! The condition A[mid] > A[mid + 1] is simply an optimization, but the condition mid < end should be checked first before the optimization and certainly before any attempted merge; that is, if mid >= end, then we don't actually have a left list and a right list, only a left list. This observation lets us complete the body of the merge_sort function:
```python
def merge_sort(A):
    n = len(A)
    temp = [0] * n  # allocate working space only once
    size = 1        # size of sublists to sort and merge
    while size < n:
        for start in range(0, n, 2 * size):
            mid = start + (size - 1)
            end = min((mid + 1) + (size - 1), n - 1)
            # `mid < end` ensures we have two lists for sorting and merging;
            # `A[mid] > A[mid + 1]` ensures we don't sort and merge the two lists unnecessarily (optimization 4)
            if mid < end and A[mid] > A[mid + 1]:
                merge(A, start, mid, end, temp)
        size *= 2
```
Implementation
Using the observations detailed above, we can stitch the merge and merge_sort functions together to complete a decently optimized bottom-up implementation of merge sort:
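Here are the merge and merge_sort pieces developed above, stitched together into one runnable whole:

```python
def merge(A, start, mid, end, temp):
    """Merge sorted A[start..mid] and A[mid+1..end] back into A, copying only the left half."""
    for i in range(start, mid + 1):  # copy left subarray into temp
        temp[i] = A[i]
    i = start    # index into temp (left subarray)
    j = mid + 1  # index into right subarray in A
    k = start    # index into merged array in A
    while i < mid + 1 and j < end + 1:
        if temp[i] <= A[j]:
            A[k] = temp[i]
            i += 1
        else:
            A[k] = A[j]
            j += 1
        k += 1
    while i < mid + 1:  # remaining elements from temp
        A[k] = temp[i]
        i += 1
        k += 1
    # remaining elements of the right list are already in place

def merge_sort(A):
    n = len(A)
    temp = [0] * n  # allocate working space only once
    size = 1        # size of sublists to sort and merge
    while size < n:
        for start in range(0, n, 2 * size):
            mid = start + (size - 1)
            end = min((mid + 1) + (size - 1), n - 1)
            if mid < end and A[mid] > A[mid + 1]:  # two lists exist and are out of order
                merge(A, start, mid, end, temp)
        size *= 2
```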
Timsort ideas
There are, of course, other optimizations that have not been discussed. For instance, insertion sort can sort a small number of elements faster than merge sort, so we could use it to handle merge sort's base cases. A sort called Timsort does that and adds other optimizations, some of which are extensions of the optimizations we've already seen. It aims to use less time and less space, even if it isn't asymptotically different for most cases.
Recap:
- Use insertion sort for bigger base case (it works better than merge sort on a small number of elements)
- Timsort:
  - Also use large increasing/decreasing runs for base case
  - Copy only part of left or right array before merge (whichever is smaller)
  - Optimize when many items come from one list in a row
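The first recap point can be sketched as follows. The cutoff value and function names here are illustrative, and real Timsort involves much more than this:

```python
CUTOFF = 16  # hypothetical threshold; real libraries tune this empirically

def insertion_sort(A, start, end):
    """Sort A[start..end] in place."""
    for i in range(start + 1, end + 1):
        key = A[i]
        j = i - 1
        while j >= start and A[j] > key:
            A[j + 1] = A[j]
            j -= 1
        A[j + 1] = key

def merge_sort_hybrid(A, start=0, end=None, temp=None):
    if end is None:
        end = len(A) - 1
        temp = [0] * len(A)
    if end - start + 1 <= CUTOFF:
        insertion_sort(A, start, end)  # bigger base case handled by insertion sort
        return
    mid = (start + end) // 2
    merge_sort_hybrid(A, start, mid, temp)
    merge_sort_hybrid(A, mid + 1, end, temp)
    # merge A[start..mid] and A[mid+1..end], copying only the left half into temp
    temp[start:mid + 1] = A[start:mid + 1]
    i, j, k = start, mid + 1, start
    while i <= mid and j <= end:
        if temp[i] <= A[j]:
            A[k] = temp[i]
            i += 1
        else:
            A[k] = A[j]
            j += 1
        k += 1
    while i <= mid:
        A[k] = temp[i]
        i += 1
        k += 1
```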
Quick sort and quick select
Introduction
Quicksort is a recursive algorithm that sorts data in expected $O(n \log n)$ time with excellent real-world performance. It was introduced by Tony Hoare in 1960, and then Robert Sedgewick put out a 300-page dissertation analyzing the Knuth out of it in 1975.
Quick sort concept
Especially if we already understand recursion, the idea behind quick sort is really simple. First, separate the items into one group with small values and another group with large values. Next, sort each of those two groups separately. The initial splitting of the set into two parts is called partitioning the set. Traditionally, we partition the set by picking an element from the array, called the pivot (or bound), and compare every element against it.
For example, suppose we have the following array, and we choose the last element, 25, to be the pivot:
[40, 41, 17, -22, 25, 55, -18, 35, 10, 25, 33, 19, 44, 51, 25]
Let's identify the elements that are larger than the pivot and those that are smaller than or equal to it:
For the sake of clarity, let's color-code the pivot, the values strictly larger than the pivot, and the values less than or equal to the pivot:
Ideally, we could just move the elements smaller than or equal to the pivot to the lower index positions of the array and then move the elements larger than the pivot to the higher index positions of the array, leaving the pivot between those blocks in its correct, final location:
This process can be animated in the following way:

Next, we just need to recursively sort the small and large sets/blocks, which are completely independent from each other, so they don't need to interact:

Let's recap what we have so far for the concept of quick sort:
- Separate elements into two groups: small keys, large keys
- Recursively sort the group with small keys, then the group with large keys
- Use one element (the bound, or pivot) to separate small vs. large groups
Quick sort pseudocode (1)
To flesh out a few more details: like a lot of recursive programs, after the user calls the high-level sorting routine, that routine calls recursive quick sort, which uses a couple of extra parameters to mark the start and the end of the part of the array we're currently sorting. When we make our recursive calls, those parameters allow us to specify which of the two partitions we want to recursively sort:
```
QuickSort(A[], start, end)
    if(start >= end) return
    pivotIndex = Partition(A, start, end)  # Partition returns an index
    QuickSort(A, start, pivotIndex - 1)
    QuickSort(A, pivotIndex + 1, end)
```
The other big detail to flesh out is how we actually partition.
Partitioning (CLRS/Lomuto partition)
We'll start with the version in CLRS [24] (by Nico Lomuto). For the part of the array we're currently sorting, it uses the last element as its pivot and compares it against the other elements in the block being sorted, left to right. It will separate the array into two blocks. The first block gets elements that are smaller than or equal to the pivot, and the second block gets elements that are larger than the pivot. While it's running (i.e., while the Partition routine is executing), there's a third block, the elements we haven't looked at yet.
Technically, as CLRS notes, as the Partition procedure runs, each element falls into exactly one of four regions, some of which may be empty:
- Elements known to be smaller than or equal to the pivot (first block described above)
- Elements known to be greater than the pivot (second block described above)
- Elements not yet placed into either side of the partition because their comparative value to the pivot is not yet known (third block described above)
- The pivot itself
So when we start, the third region above (i.e., the third block) contains everything except the pivot:

So if the value we see is larger than the pivot, then it's a really simple case: we just extend the block of bigger elements past the one we just saw. This is the case for the first two elements above, namely 40 and 41. These elements are both greater than the pivot, 25; thus, we extend the block of larger elements:

If, however, the new value is smaller than or equal to the pivot, then we need to make room for the small partition to grow. We do that by swapping the new value with the first item in the partition of larger elements. Above, this means swapping 17 with 40 because
- 17 is smaller than or equal to the pivot (i.e., 25), and
- 40 is the first element of the partition of larger elements
We can see this as follows:

This actually happens for the next two elements as well, -22 and 25, because they are both smaller than or equal to our pivot:

The single swap we see when we encounter an element smaller than or equal to the pivot essentially slides the partition or "window" of larger elements over by one. But notice that this changes the relative order of elements in the large block. We can easily see this if we revisit the last three swaps in our example:

Note how the relative order of the elements within the large block changes after each of these swaps.
Shuffling those items isn't a big problem, but this means that quick sort is not a stable sort: if two elements have the same key, then quick sort might switch which one comes first.
After the partition completes, the pivot never needs to be moved again:

It has its location in the final sorted array. All of the elements to its left are less than or equal to it, and all of the elements to its right are strictly larger than it.
We can recursively sort the block of small items with smaller indices and then the block of larger items with larger indices. The following animation shows the rest of the sort:

Let's recap our partitioning strategy so far:
- Pivot is rightmost element
- Compare pivot against elements, left to right
- Block of small elements, then large elements, then untested elements
- Unstable sort: elements with equal keys might switch order
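The scheme recapped above can be rendered directly in Python, combining the earlier QuickSort pseudocode with the Lomuto partition just described:

```python
def partition(A, start, end):
    """Lomuto partition: pivot is A[end]; returns the pivot's final index."""
    pivot = A[end]
    i = start  # next slot for an element <= pivot
    for j in range(start, end):
        if A[j] <= pivot:
            A[i], A[j] = A[j], A[i]
            i += 1
    A[i], A[end] = A[end], A[i]  # place the pivot between the two blocks
    return i

def quick_sort(A, start=0, end=None):
    if end is None:
        end = len(A) - 1
    if start >= end:
        return
    p = partition(A, start, end)
    quick_sort(A, start, p - 1)
    quick_sort(A, p + 1, end)
```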
Worst case performance
If our data has almost all distinct values, in random order, then quicksort has good expected runtime. But rewatch the last part of the animation above — as soon as we finish recursively sorting the small items with smaller indices and move on to the larger items with larger indices, we see a case where quick sort does not work very well:

If the data is already sorted, then each time we pivot, we pick the largest item, compare it against all of the other items, and then sort again on all of the items except that large pivot. In the worst case, if we keep picking really bad pivots, then quick sort takes $\Theta(n^2)$ time.
Excluding the recursion, it's really good on space: it just needs a constant number of indices and room to swap elements. But if we consider the program call stack, the same worst case example above looks really bad. If we keep picking the largest element as the pivot and we always make our first recursive call on that first partition with almost all of the elements from our current subproblem, then we can end up needing linear depth for our program call stack.
Quick sort pseudocode (2)
To fix that, make the first recursive call on whichever partition has fewer elements:
```
QuickSort(A[], start, end)
    if(start >= end) return
    pivotIndex = Partition(A, start, end)  # partition returns an index
    if(pivotIndex - start < end - pivotIndex)  # first partition has fewer elements
        QuickSort(A, start, pivotIndex - 1)
        QuickSort(A, pivotIndex + 1, end)
    else  # second partition has fewer elements
        QuickSort(A, pivotIndex + 1, end)
        QuickSort(A, start, pivotIndex - 1)
```
This will "procrastinate" the majority of our work until later, letting us empty the small subproblem from our program stack before we get to our real work.
Assuming that our compiler optimizes tail recursion, that will reduce our worst-case program stack to logarithmic depth.
Quick sort pseudocode (3)
Even if our compiler doesn't optimize tail recursion, we can. First, we manually make the recursive call to the subproblem with fewer elements. Next, because that second recursive call was the last thing that we needed to do, we can unwind the recursion, getting rid of the second recursion by putting the entire procedure inside of a loop:
```
QuickSort(A[], start, end)
    while (start < end)
        pivotIndex = Partition(A, start, end)  # partition returns an index
        if(pivotIndex - start < end - pivotIndex)
            QuickSort(A, start, pivotIndex - 1)
            start = pivotIndex + 1
        else
            QuickSort(A, pivotIndex + 1, end)
            end = pivotIndex - 1
```
This will improve the worst-case space requirements of the algorithm. If you're not quite sure what tail recursion is (a Stack Overflow post might be helpful too), then don't worry about it too much. Just know the pseudocode versions discussed above are iterations on improving the space requirement for quick sort, not the time requirement.
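A Python sketch of that transformation (Python does not optimize tail calls, so we do it ourselves; the Lomuto partition from earlier is repeated here to keep the sketch self-contained):

```python
def partition(A, start, end):
    """Lomuto partition: pivot is A[end]; returns the pivot's final index."""
    pivot, i = A[end], start
    for j in range(start, end):
        if A[j] <= pivot:
            A[i], A[j] = A[j], A[i]
            i += 1
    A[i], A[end] = A[end], A[i]
    return i

def quick_sort_loop(A, start=0, end=None):
    if end is None:
        end = len(A) - 1
    while start < end:
        p = partition(A, start, end)
        if p - start < end - p:
            quick_sort_loop(A, start, p - 1)  # recurse on the smaller block...
            start = p + 1                     # ...loop on the larger one
        else:
            quick_sort_loop(A, p + 1, end)
            end = p - 1
```

Because every recursive call handles the block with fewer elements, the stack depth stays logarithmic even on already-sorted input.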
Let's recap what we've got so far for the worst case:
- Time: $\Theta(n^2)$
- Space: $O(1)$ references of $O(\log n)$ bits each, excluding the program stack, but the program stack grows to $\Theta(n)$ depth: $\Theta(n)$ space in the uniform cost model, $\Theta(n \log n)$ in the logarithmic cost model.
- Make first recursive call on group with fewer elements: worst-case program stack depth $\Theta(\log n)$; $\Theta(\log n)$ space in the uniform cost model, $\Theta(\log^2 n)$ in the logarithmic cost model.
Expected performance
Above, we talked about worst-case performance, which is pretty bad. But what about average-case performance? Well, average over what exactly? What inputs do we expect to be given, and what order are they in? Generally, we don't know ahead of time.
To analyze expected behavior, first we'll assume the array doesn't have too many duplicate items, ideally none like in the following example array:

But the items above are almost perfectly sorted, which we previously saw was our worst case.
Randomized quick sort
So we need one minor change in the algorithm. Pick a random element from the block we're sorting as our pivot. We can use the same Partition code we used previously, but first swap a random pivot from the block we're sorting to the back of that block, where the Partition code uses it as a pivot.
For example, with the array above, when we first start, "the block we're sorting" is the entire array; hence, we pick any element at random in the entire array, say 12 in the example above, and swap it with the item at the back of the block, 59, and then carry on with our swaps as usual:

Now we need to recursively sort the lower block and then the higher block. Starting with the lower block, we pick an element at random from [-23, -18, -15, 9, -10], say -18, and we swap that element with -10:

The entire array gets sorted in this manner. The full animation of the sort is shown below, where the pivot used in each block sort is random:

Let's recap what we've discussed so far about randomized quick sort:
- Pick a random element from the partition as a pivot
- Swap that pivot into the last position of the block
- Run the rest of the partitioning just like before
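The randomization step can be sketched as follows: swap a random element into the last slot, then partition exactly as before.

```python
import random

def randomized_partition(A, start, end):
    # pick a random pivot and swap it into the last position,
    # then run the usual Lomuto partition
    r = random.randint(start, end)
    A[r], A[end] = A[end], A[r]
    pivot, i = A[end], start
    for j in range(start, end):
        if A[j] <= pivot:
            A[i], A[j] = A[j], A[i]
            i += 1
    A[i], A[end] = A[end], A[i]
    return i

def randomized_quick_sort(A, start=0, end=None):
    if end is None:
        end = len(A) - 1
    if start >= end:
        return
    p = randomized_partition(A, start, end)
    randomized_quick_sort(A, start, p - 1)
    randomized_quick_sort(A, p + 1, end)
```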
Expected performance (continued)
Now, regardless of the initial order of the elements, we can talk about the expected performance of the algorithm over those random pivot choices. The analysis gets pretty rough, which is why it's broken out into its own entire section. But the expected runtime is $O(n \log n)$ with a logarithmic-depth program stack. It's considered to be an in-place algorithm since it doesn't use a lot of extra space. CLRS calls this "randomized quick sort" and, in general, it's probably a good idea in real life to randomize our pivots.
Let's recap:
- Assumption: no (or few) duplicate values
- $O(n \log n)$ expected runtime
- $O(\log n)$ expected stack depth
Partitioning with duplicates
What if we don't want to make the assumption that there aren't many duplicates? Randomized or not, if everything has the same key, then we end up in a worst-case situation:

Even if we had some special-case code to check if all keys are equal, inputs where almost everything is equal, except a constant number of smaller values, will still take $\Theta(n^2)$ expected time. There will also be expected linear stack depth unless we call the subproblem with fewer elements first (e.g., the second and third pseudocodes discussed previously).
If we're really worried about duplicates, then we can modify the Partition procedure to partition into three blocks, so that after partitioning, we have items less than the pivot, items equal to the pivot, and then items greater than the pivot.
There's a trade-off here. Maybe for elements that are less than the pivot we still have only one comparison, but now we need a three-way rotation instead of a swap (this is remarked on in greater detail in the implementation of a more optimized version of randomized quick sort), and for other elements we have an extra comparison to check for equality. The payoff is that when we make our recursive calls, we get to ignore the entire block of elements equal to the pivot. Our previous worst-case input, where we had lots of duplicate values, took $\Theta(n^2)$ time, but having many duplicate values is now a best-case input, which would result in a linear runtime.
The animation below illustrates all of this (for the sake of clarity, the choice of pivot has not been randomized):

When our pivot is randomized, if one element is repeated a lot, then we have a higher chance to pick it and get rid of all of those values from our recursive subproblems.
Let's recap:
- Many duplicate values force $\Theta(n^2)$ expected runtime
- Instead of two blocks ($\leq$ pivot, $>$ pivot), partition into three blocks ($<$ pivot, $=$ pivot, $>$ pivot)
- $\Theta(n \log n)$ expected and best-case runtime if there are no duplicates
- Duplicates decrease the runtime, $\Theta(n)$ best-case runtime
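One common way to realize the three-block idea is a Dutch national flag style partition; the following is a sketch of that scheme (my choice of mechanics, not necessarily the exact one from the video):

```python
def quick_sort_3way(A, start=0, end=None):
    """Quick sort with three-way partitioning (< pivot, == pivot, > pivot)."""
    if end is None:
        end = len(A) - 1
    if start >= end:
        return
    pivot = A[end]
    # invariant: A[start:lt] < pivot, A[lt:i] == pivot, A[gt+1:end+1] > pivot
    lt, i, gt = start, start, end
    while i <= gt:
        if A[i] < pivot:
            A[lt], A[i] = A[i], A[lt]
            lt += 1
            i += 1
        elif A[i] > pivot:
            A[gt], A[i] = A[i], A[gt]
            gt -= 1  # don't advance i: the swapped-in element is untested
        else:
            i += 1
    quick_sort_3way(A, start, lt - 1)  # elements equal to the pivot are excluded
    quick_sort_3way(A, gt + 1, end)
```

With many duplicates, the middle block swallows most of the array, so the recursive subproblems shrink dramatically.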
Optimizations
Hoare's partition
Let's now talk about some optimizations, starting with a different way to partition, that's much closer to Hoare's original method from somewhere around 1960. To switch things up, we'll use the middle element as the pivot value instead of the end:

That value tends to work well if the data is almost sorted when we start. We just copy its value and compare against that copy without needing to swap it to the last position.
This strategy will also start a block of small elements on the left, comparing the pivot value against values and expanding that block until it runs into something that is greater than or equal to the pivot:

The change is that next it starts the block of large elements on the right side of the block, comparing against elements with high indices, expanding that block until it runs into something smaller than or equal to the pivot:

After both of those progressions have stopped, we know that the lower index references an element not smaller than the pivot, and the larger index references an element not larger than the pivot. So we swap those two items, extending the block on each side:

Then, we continue again from the lower index, repeating all of that until those two blocks meet:

This way of doing things can be summarized as follows:
- Use pivot from middle (not required)
- Partition of small elements on left (extend this partition until we run into a large element)
- Partition of large elements on right (extend this partition until we run into a small element)
- When both partitions are blocked, swap those elements and continue
Two things to note here. The first issue is that after we finish partitioning, the pivot might not be in its final location. It gets absorbed or swapped into one of the two blocks. It might not be on the boundary between the two blocks.
For example, this happened above in our example, where the pivot ended up in the right block and not on the boundary between the two blocks:

What this means is that when we make our two recursive calls, those two calls, combined, include all of the elements of our current block — they don't get to exclude the pivot.
The second issue is that whether we are looking at the block on the left or the block on the right, if we have a value equal to the pivot, then that value will stop that block from progressing:

And it will be swapped into the other block:

That can make us do some weird things, like above, where we swap one pivot-equal value for another. We can see the rest of the sort play out in the animation below:

The observation of doing weird things like swapping one pivot-equal value for another stands in sharp contrast to the intuition we gained in our previous approaches; specifically, our intuition previously was to let equal values stay in whatever block we were trying to extend, not stopping on those values, not swapping them, and that would let our partition finish as quickly as possible.
That was also the approach suggested by Hoare in one of his early papers. But why is that seemingly more optimized approach a mistake? Why stop each block on values equal to the pivot? Isn't it good to avoid having extra swaps? If we again consider extreme cases with one value duplicated for almost the entire array, it could be easy to just extend the partition from one side to include them all — that might finish partitioning without any swaps at all, but it can also bring us back to the worst-case performance of our first partition when we didn't worry about duplicates:

Instead, if we stop extending each block when we hit an equal value, then that extreme duplicate case works really nicely. The duplicate values should get divided nicely between the two partitions, and we don't end up with worst-case performance for the algorithm. We avoid that worst-case performance without any other changes needed.
It seems a little weird to call this an "optimization" over the other partition when this one came first (historically). But what makes it faster? Both partitions use about the same number of comparisons, but the Hoare partition uses a lot fewer swaps. The Lomuto partition will basically require one swap for each element less than or equal to the pivot. We expect about $n/2$ swaps to partition $n$ elements. Each swap in the Hoare partition causes both blocks to get bigger, expanding towards each other, so the most we can get is $n/2$ swaps, but randomly distributed distinct values will give about $n/6$ expected swaps, only one-third as many.
So if the Hoare partition came first, and it runs faster than the Lomuto partition on random data, because it needs fewer swaps, and it works better in the case of duplicate data, then why is the other partition (i.e., Lomuto) the one that's most commonly taught?
The Hoare partition is really easy to mess up, especially the edge cases (e.g., maybe because we don't automatically remove the pivot itself from the subproblems we call). The code might seem to work well for large problems with a random pivot, but when the subproblems get smaller, then it's easy for those edge cases to bite us. Even if our code is correct, it's harder to prove it.
Let's recap some of the observations above about the Hoare partition:
- Pivot gets included in one partition
- Values equal to the pivot can be split between partitions (no special block needed for duplicates)
- Random pivot on distinct values requires about $n/6$ expected swaps (Lomuto expects about $n/2$)
- Harder to code, teach, debug, and prove correct
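With those caveats in mind, here is one rendering of a Hoare-style partition, following the commonly cited scheme that uses the middle element's value as the pivot; note that the first recursive call includes index j, since the pivot is not excluded from the subproblems:

```python
def hoare_partition(A, start, end):
    """Hoare-style partition using the middle element's value as the pivot."""
    pivot = A[(start + end) // 2]
    i, j = start - 1, end + 1
    while True:
        i += 1
        while A[i] < pivot:  # stop on values >= pivot, including equals
            i += 1
        j -= 1
        while A[j] > pivot:  # stop on values <= pivot, including equals
            j -= 1
        if i >= j:
            return j  # the pivot may end up inside either block
        A[i], A[j] = A[j], A[i]

def quick_sort_hoare(A, start=0, end=None):
    if end is None:
        end = len(A) - 1
    if start >= end:
        return
    j = hoare_partition(A, start, end)
    quick_sort_hoare(A, start, j)  # j is included: the pivot isn't excluded
    quick_sort_hoare(A, j + 1, end)
```

Stopping both scans on pivot-equal values is what splits duplicates between the two blocks, as discussed above.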
Thinking about quick sort (pedagogical simulation)
Many of the things discussed above are relatively low-level implementation details for partitions that don't really play into the big-picture idea of quick sort. For David's class, if he wants students to simulate the default vanilla quick sort, then he lets students use the Lomuto partition, but he instructs them to pretend it's stable.
That is, pick the last element as the pivot, imagine that it gets compared to all of the other elements, left to right, and then the small elements slide left, the large elements slide right, and the pivot moves to its proper location, all without changing the relative order of elements within each group:

We then recurse on the small elements, then the large elements:

That makes it really easy to simulate quick sort and to think about its recursion as a whole, while ignoring the lower-level details of how swaps change the ordering of elements within the blocks for some particular partitioning algorithm.
Pivot picking
Beyond trying to decrease the number of swaps, like Hoare's partition does, we really do want to pick a good pivot. Randomizing the choice of pivot is good, but it still leaves some reasonable chance of picking a bad pivot. And when we do pick a bad pivot, one of our subproblems will be almost as large as the one we started with.
So if we're partitioning 10,000 items, there's a 2% chance that one of our partitions is really big, 9,900 items or more. There's a 20% chance that we get a partition with at least 9,000 items. Maybe for that many items it's worth just a little more time to try to avoid a bad pivot.
If we pick three random elements and then use the median of them as our pivot, then the odds of picking a bad pivot fall by a factor of over 33 for that extreme case. It helps us avoid really bad pivot picks. For large blocks, it's probably worth that minor overhead. Even without randomizing, consider finding the median from among the first, last, and middle element of the block. If the block is in random order, then that should work well. And if the block is somewhat sorted, then that middle element will be a good pivot.
Let's recap:
- Probability of picking a "bad" pivot for large $n$:
- 2% chance of picking a pivot in top or bottom 1%
- 20% chance of picking a pivot in top or bottom 10%
- If we pick three random elements and take median as pivot:
- 0.0596% chance of picking a pivot in top or bottom 1%
- 5.6% chance of picking a pivot in top or bottom 10%
- For non-randomized version, consider using the median of the first, last, and middle items from a block
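The median-of-three idea can be sketched as a small helper (the function name is my own); the chosen pivot would then be swapped to wherever the partition scheme expects it:

```python
def median_of_three_index(A, start, end):
    """Return the index (start, middle, or end) holding the median of those three values."""
    mid = (start + end) // 2
    a, b, c = A[start], A[mid], A[end]
    if (b <= a <= c) or (c <= a <= b):
        return start
    if (a <= b <= c) or (c <= b <= a):
        return mid
    return end

# before a Lomuto-style partition, move the chosen pivot to the back:
#   m = median_of_three_index(A, start, end)
#   A[m], A[end] = A[end], A[m]
```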
Insertion sort (small subproblems)
For small enough subproblems, insertion sort is faster than quick sort. Similar to what was mentioned for merge sort, once the block we are sorting in quick sort becomes small enough, we can just use insertion sort, or we could leave those small problems unsorted and run insertion sort on the entire array after the rest of quick sort finishes. Caching issues supposedly make the first approach faster, though.
Recap:
- Insertion sort runs faster than quick sort on small inputs
- Similar to merge sort — we can use it for small subproblems
- Unlike merge sort, we could instead skip the small subproblems and call insertion sort once after we are done with quick sort
- Insertion sort works well on "almost sorted" arrays
- The first approach may work better due to caching issues
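The first approach from the recap can be sketched like this (the cutoff value is illustrative, and the Lomuto partition is inlined to keep the sketch self-contained):

```python
CUTOFF = 16  # illustrative threshold

def insertion_sort_range(A, start, end):
    """Sort A[start..end] in place."""
    for i in range(start + 1, end + 1):
        key, j = A[i], i - 1
        while j >= start and A[j] > key:
            A[j + 1] = A[j]
            j -= 1
        A[j + 1] = key

def quick_sort_cutoff(A, start=0, end=None):
    if end is None:
        end = len(A) - 1
    if end - start + 1 <= CUTOFF:
        insertion_sort_range(A, start, end)  # sort small blocks immediately
        return
    pivot, i = A[end], start  # Lomuto partition
    for j in range(start, end):
        if A[j] <= pivot:
            A[i], A[j] = A[j], A[i]
            i += 1
    A[i], A[end] = A[end], A[i]
    quick_sort_cutoff(A, start, i - 1)
    quick_sort_cutoff(A, i + 1, end)
```

The second approach would instead return without sorting small blocks and finish with one insertion sort pass over the whole array.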
Multiple pivots
We can also try using two or three pivots to partition into more than two subproblems. Sedgewick looked at this in the 70s, and it didn't look like it would outperform using just a single pivot. But new analysis and experiments since 2009 have shown that two or three pivots might be faster, and a dual-pivot scheme was actually used to sort primitives in Java 7. Some papers say it's due to fewer swaps. Others think it's due to cache efficiencies. And others think it may be architecture dependent.
Recap:
- 2 or 3 Pivots
- 3 or 4 Subproblems
- Sedgewick, 1975: not worth it
- Yaroslavskiy, 2009: dual pivot is faster (experimentally)
- Papers in the 2010s...
- Fewer swaps?
- Caching efficiency?
- Architecture dependent?
Naive parallelization
We should also mention parallelism. When we have two recursive subproblems to handle, if we have multiple processors, then that's a really simple place we could run those two recursive calls in parallel, similar to what we saw in the naive parallel merge sort.
Simple parallelism would take quick sort down to expected linear time and logarithmic depth in the stack, but its worst-case performance wouldn't improve over a regular quick sort.
There is a more complex way to parallelize the partitioning, giving a polylogarithmic parallel quicksort.
Recap: Naive quick sort parallelization would give expected
- O(n) runtime
- O(log n) stack depth
- O(log n) space in the uniform cost model
- O(log n * log n) space in the logarithmic cost model
by just running subproblems in parallel, but there is also a faster, more complicated parallel quick sort that parallelizes the partitioning.
Selection
Let's now cover a different problem that is so highly related to quick sort that it almost seems wrong not to consider it: selection, solved by quick select.
Quick select is an algorithm for selection: given a set of numbers, it's easy to pick the maximum item or minimum item in linear time. But what if we want to select the median element or the kth smallest element?
One possible approach would be to sort the numbers first. And then just take the correct one after they're sorted.
To recap the simple selection problem:
- Examples:
- Find the maximum element
- Find the minimum element
- Find the median element
- Find the kth smallest element
- Possible approach:
- Sort the elements
- Take the correct one
Quick select
Imagine we take the sorting approach mentioned above. Furthermore, suppose we use quick sort as our sorting algorithm of choice. Below, imagine that we're looking for the 10th smallest element:

The 10th smallest element in the example array above, 35, is highlighted for the sake of clarity.
For such a small example input array, we can also just present the sorted array to confirm this before illustrating how quick sort figures into things:
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
[-45, -18, 10, 17, 19, 25, 25, 25, 33, 35, 40, 41, 44, 51, 55]
Of course, in general, we don't know ahead of time what the kth smallest value will be. It just helps to have the element highlighted in order to make the animations clearer.
We use the regular version of quick sort, and we begin sorting the numbers by using the last element as the pivot, 25. Let's see how things work after the first partition:

It's business as usual: all values smaller than or equal to the pivot end up in the partition to the left of the pivot, and all values greater than the pivot end up in the partition to the right of the pivot. After we finish this first partitioning, quick sort expects us to recursively sort that partition of values less than or equal to the pivot, 25. But why bother with them? We know that there are 8 elements with value 25 or less:
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
[17, 25, -45, -18, 19, 10, 25, 25, 51, 35, 40, 33, 44, 55, 41]
But we're looking for the 10th smallest value! So we should just ignore that entire left partition of smaller (or equal) items and just look at the items in the partition of larger items. Above, we can see 35 is the second smallest element in the partition of larger items but still the 10th smallest overall.
Let's now recursively look in the partition of larger items:

As we see above, when we step into that block of larger items (i.e., positions 9 through 15 after the first partition), we pick the last element of the partition, 41, as our pivot, and we partition around that pivot, leaving us with the following arrangement of items after the partitioning:
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
[17, 25, -45, -18, 19, 10, 25, 25, 35, 40, 33, 41, 44, 55, 51]
We see that 41 is the fourth smallest element from the block we just partitioned and the 12th smallest overall. Thus, the element we're looking for must come from the partition of values smaller than 41, namely 35, 40, and 33. We thus recurse on that partition of smaller values, choosing 33 as our pivot:

We see that our pivot, 33, ends up being the smallest element from the block we just partitioned and the 9th smallest overall:
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
[17, 25, -45, -18, 19, 10, 25, 25, 33, 40, 35, 41, 44, 55, 51]
The element we're looking for must come from the partition of values larger than 33, namely 40 and 35. We recurse on that partition of larger values, choosing 35 as our pivot:

After the partitioning, we see that our pivot, 35, ends up being the smallest element from the block we just partitioned, and it's the 10th smallest overall:
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
[17, 25, -45, -18, 19, 10, 25, 25, 33, 35, 40, 41, 44, 55, 51]
This is what we wanted. We can finally return its value as our answer. Note how the final array above isn't close to being sorted. We have 17, 25, -45, ... at the start, and that is okay. The goal of quick select isn't to sort the array. It's to select some value in expected linear time, and quick sort's partitioning scheme is well suited to the task.
Let's recap: if we're using quick sort, and our pivot is
- less than the element we are looking for: ignore the partition of smaller elements
- larger than the element we are looking for: ignore the partition of bigger elements
- the element we are looking for: stop looking
Quick select pseudocode (1)
The pseudocode for quick select looks a lot like that for quick sort:
QuickSelect(A[], start, end, rank)
    if(start == end)                          # only one candidate left, so rank == start
        return A[start]
    pivotIndex = Partition(A, start, end)     # Partition returns an index
    if(pivotIndex == rank)
        return A[pivotIndex]
    if(pivotIndex < rank)
        return QuickSelect(A, pivotIndex + 1, end, rank)   # desired value resides in partition of larger values
    else
        return QuickSelect(A, start, pivotIndex - 1, rank) # desired value resides in partition of smaller values
But note that QuickSelect above has only one recursive call (i.e., into the partition we should search to find our desired value). Furthermore, QuickSelect needs one more parameter than QuickSort, namely rank, which specifies the position our desired element would occupy if the array were fully sorted (so the initial call for the kth smallest element of A[1..n] is QuickSelect(A, 1, n, k)). The desired element always lies inside the current block, so start <= rank <= end holds throughout; in particular, when start == end, we must have rank == start, and we can simply return A[start].
After partitioning, we'll need at most one recursive call, to either the smaller or larger partition, unless we get lucky: if the rank of the pivot equals the rank we seek, then we stop searching.
Quick select pseudocode (2)
Also, notice above that because there's only one recursive call, and it's tail recursion, we can use the same technique we used on quick sort to get rid of the tail recursion, giving us a version of quick select with no recursion:
QuickSelect(A[], start, end, rank)
    while(start <= end)
        if(start == end)                      # only one candidate left, so rank == start
            return A[start]
        pivotIndex = Partition(A, start, end) # Partition returns an index
        if(pivotIndex == rank)
            return A[pivotIndex]
        if(pivotIndex < rank)
            start = pivotIndex + 1            # desired value resides in partition of larger values
        else
            end = pivotIndex - 1              # desired value resides in partition of smaller values
Conclusion
All of the quick sort partitioning algorithm choices are still present in both pseudocode implementations of quick select. We probably want to pick our pivot randomly (unlike what's presented in the pseudocode).
Implementations
Quick sort pseudocode (1)
If we revisit the quick sort pseudocode (1) section, and then fill out the details for the Partition procedure (it's helpful to use [24] for this purpose), then we end up with pseudocode that looks something like the following:
Partition(A[], start, end)
    pivot = A[end]                   # the pivot
    i = start - 1                    # highest index into the low side (empty at first)
    for j = start to end - 1         # process each element other than the pivot
        if A[j] <= pivot             # does element A[j] belong on the low side?
            i = i + 1                # index of a new slot in the low side
            exchange A[i] with A[j]  # place element A[j] there
    exchange A[i + 1] with A[end]    # pivot goes just to the right of the low side
    return i + 1                     # new index of the pivot

QuickSort(A[], start, end)
    if(start < end)
        pivotIndex = Partition(A, start, end)  # partition the subarray around the pivot, which ends up in A[pivotIndex]
        QuickSort(A, start, pivotIndex - 1)    # recursively sort the low side
        QuickSort(A, pivotIndex + 1, end)      # recursively sort the high side
The pseudocode above can be implemented almost exactly as above in Python:
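As a sketch, a direct Python translation of the Lomuto pseudocode (names lowercased to be idiomatic; this is my rendering, not the post's embedded snippet):

```python
def partition(A, start, end):
    pivot = A[end]                        # the pivot
    i = start - 1                         # highest index into the low side
    for j in range(start, end):           # process each element other than the pivot
        if A[j] <= pivot:                 # does A[j] belong on the low side?
            i += 1                        # new slot in the low side
            A[i], A[j] = A[j], A[i]       # place A[j] there
    A[i + 1], A[end] = A[end], A[i + 1]   # pivot goes just right of the low side
    return i + 1                          # new index of the pivot

def quicksort(A, start, end):
    if start < end:
        pivot_index = partition(A, start, end)
        quicksort(A, start, pivot_index - 1)   # recursively sort the low side
        quicksort(A, pivot_index + 1, end)     # recursively sort the high side
```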
Quick sort pseudocode (2)
If we revisit the quick sort pseudocode (2) section, where we made a first attempt at reducing the space requirement by first recursing on the smaller partition, then we get pseudocode that looks like the following, where the Partition procedure remains the same (i.e., the space requirement has now been improved a little bit, but the time requirement, largely dictated by the untouched Partition procedure, remains the same):
Partition(A[], start, end)
    pivot = A[end]                   # the pivot
    i = start - 1                    # highest index into the low side (empty at first)
    for j = start to end - 1         # process each element other than the pivot
        if A[j] <= pivot             # does element A[j] belong on the low side?
            i = i + 1                # index of a new slot in the low side
            exchange A[i] with A[j]  # place element A[j] there
    exchange A[i + 1] with A[end]    # pivot goes just to the right of the low side
    return i + 1                     # new index of the pivot

QuickSort(A[], start, end)
    if(start >= end) return
    pivotIndex = Partition(A, start, end)      # partition returns an index
    if(pivotIndex - start < end - pivotIndex)  # first partition has fewer elements
        QuickSort(A, start, pivotIndex - 1)
        QuickSort(A, pivotIndex + 1, end)
    else                                       # second partition has fewer elements
        QuickSort(A, pivotIndex + 1, end)
        QuickSort(A, start, pivotIndex - 1)
We can implement the pseudocode above in Python as follows:
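A sketch of this smaller-partition-first version in Python (again my own rendering of the pseudocode above):

```python
def partition(A, start, end):
    # Lomuto partition around the last element (unchanged).
    pivot = A[end]
    i = start - 1
    for j in range(start, end):
        if A[j] <= pivot:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[end] = A[end], A[i + 1]
    return i + 1

def quicksort(A, start, end):
    if start >= end:
        return
    p = partition(A, start, end)
    if p - start < end - p:          # low side is smaller: recurse on it first
        quicksort(A, start, p - 1)
        quicksort(A, p + 1, end)
    else:                            # high side is smaller: recurse on it first
        quicksort(A, p + 1, end)
        quicksort(A, start, p - 1)
```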
Quick sort pseudocode (3)
If we revisit the quick sort pseudocode (3) section, where we tried to further reduce the space requirement by eliminating the recursion ourselves, then we end up with pseudocode that looks something like the following (as before, the Partition procedure remains the same, meaning we're improving our space requirement but not our time requirement):
Partition(A[], start, end)
    pivot = A[end]                   # the pivot
    i = start - 1                    # highest index into the low side (empty at first)
    for j = start to end - 1         # process each element other than the pivot
        if A[j] <= pivot             # does element A[j] belong on the low side?
            i = i + 1                # index of a new slot in the low side
            exchange A[i] with A[j]  # place element A[j] there
    exchange A[i + 1] with A[end]    # pivot goes just to the right of the low side
    return i + 1                     # new index of the pivot

QuickSort(A[], start, end)
    while(start < end)
        pivotIndex = Partition(A, start, end)  # partition returns an index
        if(pivotIndex - start < end - pivotIndex)
            QuickSort(A, start, pivotIndex - 1)
            start = pivotIndex + 1
        else
            QuickSort(A, pivotIndex + 1, end)
            end = pivotIndex - 1
We can implement the pseudocode above in Python as follows:
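A sketch of the loop-based version in Python, with the recursive call reserved for the smaller partition:

```python
def partition(A, start, end):
    # Lomuto partition around the last element (unchanged).
    pivot = A[end]
    i = start - 1
    for j in range(start, end):
        if A[j] <= pivot:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[end] = A[end], A[i + 1]
    return i + 1

def quicksort(A, start, end):
    while start < end:
        p = partition(A, start, end)
        if p - start < end - p:
            quicksort(A, start, p - 1)   # recurse only on the smaller (low) side
            start = p + 1                # loop on the larger side
        else:
            quicksort(A, p + 1, end)     # recurse only on the smaller (high) side
            end = p - 1
```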
Randomized quick sort
If we revisit the randomized quick sort section, as well as CLRS [24], then we see choosing a random pivot at the outset results in updating our pseudocode in the following way:
RandomizedPartition(A[], start, end)
    i = Random(start, end)
    exchange A[end] with A[i]
    return Partition(A, start, end)

Partition(A[], start, end)
    pivot = A[end]                   # the pivot
    i = start - 1                    # highest index into the low side (empty at first)
    for j = start to end - 1         # process each element other than the pivot
        if A[j] <= pivot             # does element A[j] belong on the low side?
            i = i + 1                # index of a new slot in the low side
            exchange A[i] with A[j]  # place element A[j] there
    exchange A[i + 1] with A[end]    # pivot goes just to the right of the low side
    return i + 1                     # new index of the pivot

QuickSort(A[], start, end)
    if(start < end)
        pivotIndex = RandomizedPartition(A, start, end)  # partition the subarray around the pivot, which ends up in A[pivotIndex]
        QuickSort(A, start, pivotIndex - 1)              # recursively sort the low side
        QuickSort(A, pivotIndex + 1, end)                # recursively sort the high side
Note that the Partition procedure has been left unchanged, but we now call the new RandomizedPartition procedure within QuickSort. It's worth noting that the QuickSort layout in the pseudocode above reverts to the first quick sort pseudocode (i.e., no attempts at space optimization).
We can implement the pseudocode above in Python by using the randint method from Python's random module (i.e., random.randint(a, b) returns a random integer N such that a <= N <= b):
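A sketch in Python, using random.randint for the pivot choice as described:

```python
import random

def partition(A, start, end):
    # Lomuto partition around the last element (unchanged).
    pivot = A[end]
    i = start - 1
    for j in range(start, end):
        if A[j] <= pivot:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[end] = A[end], A[i + 1]
    return i + 1

def randomized_partition(A, start, end):
    i = random.randint(start, end)   # uniform pivot choice from A[start..end]
    A[end], A[i] = A[i], A[end]      # move it into the usual pivot slot
    return partition(A, start, end)

def quicksort(A, start, end):
    if start < end:
        p = randomized_partition(A, start, end)
        quicksort(A, start, p - 1)
        quicksort(A, p + 1, end)
```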
Randomized quick sort (optimization with three partitions)
TBD
Quick sort with Hoare's partition
The following pseudocode is adapted from CLRS [24] and shows the original partitioning algorithm due to C. A. R. Hoare:
HoarePartition(A, start, end)
    pivot = A[start]
    i = start - 1
    j = end + 1
    while TRUE
        repeat
            j = j - 1
        until A[j] <= pivot
        repeat
            i = i + 1
        until A[i] >= pivot
        if i < j
            exchange A[i] with A[j]
        else return j
The pseudocode above may be implemented in Python in the following way (the point of emphasis is the partition function, not the quicksort function, which may be implemented to use tail recursion or not):
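A Python sketch of Hoare partitioning; note that with this scheme the pivot is not fixed in place, so quick sort recurses on A[start..q] and A[q+1..end]:

```python
def hoare_partition(A, start, end):
    pivot = A[start]                 # first element is the pivot
    i, j = start - 1, end + 1
    while True:
        j -= 1
        while A[j] > pivot:          # scan right-to-left for something <= pivot
            j -= 1
        i += 1
        while A[i] < pivot:          # scan left-to-right for something >= pivot
            i += 1
        if i < j:
            A[i], A[j] = A[j], A[i]
        else:
            return j                 # A[start..j] <= A[j+1..end]

def quicksort(A, start, end):
    if start < end:
        q = hoare_partition(A, start, end)
        quicksort(A, start, q)       # pivot is not fixed in place,
        quicksort(A, q + 1, end)     # so the low side includes index q
```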
Quick select (randomized)
If we look at the pseudocode in the first section on quick select, and we follow the advice to randomize our pivots, then we end up with pseudocode that looks something like the following:
RandomizedPartition(A[], start, end)
    i = Random(start, end)
    exchange A[end] with A[i]
    return Partition(A, start, end)

Partition(A[], start, end)
    pivot = A[end]                   # the pivot
    i = start - 1                    # highest index into the low side (empty at first)
    for j = start to end - 1         # process each element other than the pivot
        if A[j] <= pivot             # does element A[j] belong on the low side?
            i = i + 1                # index of a new slot in the low side
            exchange A[i] with A[j]  # place element A[j] there
    exchange A[i + 1] with A[end]    # pivot goes just to the right of the low side
    return i + 1                     # new index of the pivot

QuickSelect(A[], start, end, rank)
    if(start == end)                                 # only one candidate left, so rank == start
        return A[start]
    pivotIndex = RandomizedPartition(A, start, end)  # RandomizedPartition returns an index
    if(pivotIndex == rank)
        return A[pivotIndex]
    if(pivotIndex < rank)
        return QuickSelect(A, pivotIndex + 1, end, rank)   # desired value resides in partition of larger values
    else
        return QuickSelect(A, start, pivotIndex - 1, rank) # desired value resides in partition of smaller values
The pseudocode above can be implemented in Python in the following way:
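A Python sketch; note it uses 0-based ranks (the rank is the index the answer would occupy in the sorted array), matching Python indexing:

```python
import random

def partition(A, start, end):
    # Lomuto partition around the last element (unchanged).
    pivot = A[end]
    i = start - 1
    for j in range(start, end):
        if A[j] <= pivot:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[end] = A[end], A[i + 1]
    return i + 1

def randomized_partition(A, start, end):
    i = random.randint(start, end)
    A[end], A[i] = A[i], A[end]
    return partition(A, start, end)

def quickselect(A, start, end, rank):
    # rank is the index the answer would have if A were fully sorted (0-based).
    if start == end:                 # only one candidate left, so rank == start
        return A[start]
    p = randomized_partition(A, start, end)
    if p == rank:
        return A[p]
    if p < rank:                     # answer lies among the larger values
        return quickselect(A, p + 1, end, rank)
    return quickselect(A, start, p - 1, rank)   # among the smaller values
```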
Quick select (randomized with tail recursion)
If we look at the pseudocode in the second section on quick select, and we follow the advice to randomize our pivots, then we end up with pseudocode that looks something like the following:
RandomizedPartition(A[], start, end)
    i = Random(start, end)
    exchange A[end] with A[i]
    return Partition(A, start, end)

Partition(A[], start, end)
    pivot = A[end]                   # the pivot
    i = start - 1                    # highest index into the low side (empty at first)
    for j = start to end - 1         # process each element other than the pivot
        if A[j] <= pivot             # does element A[j] belong on the low side?
            i = i + 1                # index of a new slot in the low side
            exchange A[i] with A[j]  # place element A[j] there
    exchange A[i + 1] with A[end]    # pivot goes just to the right of the low side
    return i + 1                     # new index of the pivot

QuickSelect(A[], start, end, rank)
    while(start <= end)
        if(start == end)                                # only one candidate left, so rank == start
            return A[start]
        pivotIndex = RandomizedPartition(A, start, end) # RandomizedPartition returns an index
        if(pivotIndex == rank)
            return A[pivotIndex]
        if(pivotIndex < rank)
            start = pivotIndex + 1   # desired value resides in partition of larger values
        else
            end = pivotIndex - 1     # desired value resides in partition of smaller values
The pseudocode above can be implemented in Python in the following way:
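A Python sketch of the loop-based version, again with 0-based ranks:

```python
import random

def partition(A, start, end):
    # Lomuto partition around the last element (unchanged).
    pivot = A[end]
    i = start - 1
    for j in range(start, end):
        if A[j] <= pivot:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[end] = A[end], A[i + 1]
    return i + 1

def randomized_partition(A, start, end):
    i = random.randint(start, end)
    A[end], A[i] = A[i], A[end]
    return partition(A, start, end)

def quickselect(A, start, end, rank):
    # rank is the index the answer would have if A were fully sorted (0-based).
    while start <= end:
        if start == end:             # only one candidate left, so rank == start
            return A[start]
        p = randomized_partition(A, start, end)
        if p == rank:
            return A[p]
        if p < rank:
            start = p + 1            # answer lies among the larger values
        else:
            end = p - 1              # answer lies among the smaller values
```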
Runtime analysis for quick sort and quick select
Introduction
We'll start by looking at the intuition behind and formal analysis of quick select. Then we'll move on to quick sort.
Quick select analysis
In quick select, when we partition the array, it takes linear time, and afterwards we have one partition comprised of elements smaller than the pivot and one partition comprised of elements larger than the pivot. If one partition has 90% of the original set, and the number we're looking for is in that partition, it seems like we got a bit unlucky. We'd hope to do better than that.
What happens if we get unlucky like that every time? That is, if our pivot only eliminates 10% of the elements for each pass, then we get a simple recurrence relation: T(n) = T(9n/10) + Θ(n).
Any of the three analytical methods we saw previously can be used to prove that this recurrence relation is linear.
We could guess that it's linear and use the substitution method, expand it using the recursion tree method, or use the master method:
- Substitution method:
- Inductive hypothesis: for m < n, T(m) <= 10cm
- T(n) <= 10c(9n/10) + cn = 10cn, true for any constant c bounding the partition cost
- Recursion tree:
- Level costs form a geometric series: cn(1 + 9/10 + (9/10)^2 + ...) <= 10cn
- Master Theorem:
- T(n) = T(9n/10) + Θ(n): a = 1, b = 10/9, log_b a = 0, f(n) = n = Ω(n^ε), case 3, so T(n) = Θ(n)
Even if we're most pessimistic and imagine that we only get rid of 1% of the set each time we partition, changing our recurrence to T(n) = T(99n/100) + Θ(n):
- Substitution method:
- Inductive hypothesis: for m < n, T(m) <= 100cm
- T(n) <= 100c(99n/100) + cn = 100cn, true for any constant c bounding the partition cost
- Recursion tree:
- Level costs form a geometric series: cn(1 + 99/100 + (99/100)^2 + ...) <= 100cn
- Master Theorem:
- T(n) = T(99n/100) + Θ(n): a = 1, b = 100/99, f(n) = n = Ω(n^ε), case 3, so T(n) = Θ(n)
Or maybe we only get rid of 1% of the set for every 5 times we partition, changing our recurrence to T(n) = T(99n/100) + 5cn:
- Substitution method:
- Inductive hypothesis: for m < n, T(m) <= 500cm
- T(n) <= 500c(99n/100) + 5cn = 500cn, true for any constant c bounding one round of partitioning
- Recursion tree:
- Level costs form a geometric series: 5cn(1 + 99/100 + (99/100)^2 + ...) <= 500cn
- Master Theorem:
- T(n) = T(99n/100) + Θ(n), so as before, case 3 gives T(n) = Θ(n), just with a larger constant
We still get a linear runtime. It's just that the constants get worse, as seen above. That's a good intuition, but what if we want to make it a bit more formal and accurate?
Complexity of precision
We're assuming that we're looking for the kth smallest item from n distinct values when we perform quick select, where the 1st smallest item is the smallest item, the nth smallest item is the largest item, and all of the items are distinct.
Our runtime might then depend not only on n but also on k. Can we write down an accurate recurrence relation for that? It takes linear time to partition, and for any i from 1 to n, there's an equal probability of picking the ith smallest item from the array as our pivot. If i is less than k, then we need to recursively search the n - i elements with rank larger than i for the (k - i)th smallest element. If i equals k, then we're done. And if i is greater than k, then we need to search the i - 1 elements with rank less than i for the kth smallest. This gives us the following recurrence:

T(n, k) = Θ(n) + (1/n) [ Σ_{i=1}^{k-1} T(n - i, k - i) + Σ_{i=k+1}^{n} T(i - 1, k) ]
Maybe the formulation above is a bit too precise. It looks complex enough that we really don't want to solve it. We'll leave it as an exercise for the reader. Ha.
Formal analysis
Instead, we'll make some pessimistic assumptions that aren't quite as clumsy as the ones we started with. Imagine we're running the algorithm against an adversary that gets to change the rank we're seeking after each time we partition, as long as it doesn't change it to something we've already discarded; it can force us to always have to search the bigger partition (i.e., the adversary can tell us whether the rank is larger or smaller than each pivot we pick):
- Pick a random pivot from distinct items
- Adversary won't let us finish until there is only one item left
- Adversary always picks the larger partition for us to recursively search
Thus, given n distinct items, we still assume that we pick a random item as our pivot, but unless there's just one item, we never get lucky and pick it, and we always have to recursively step into whichever partition has more elements. If our pivot has rank less than n/2, then we need to search everything that's larger than it, and if the pivot has rank larger than n/2, then we need to search everything smaller:

T(n) = cn + (1/n) Σ_{i=1}^{n} T(max(i - 1, n - i))

We can simplify the repeated terms (each block size from about n/2 up to n - 1 appears roughly twice in that sum):

T(n) <= cn + (2/n) Σ_{i=n/2}^{n-1} T(i)

The recurrence relation above is pessimistic but also more realistic than our first pessimistic recurrence relation. It allows for the possibility that we get a really bad partition, like picking the absolute maximum value element when we're looking for something else, but that possibility is weighted by a realistic probability.
So how do we solve that recurrence relation? Well, this is a recurrence relation where we should be glad to have a guess to start. Guessing that it's linear, we use the substitution method, plug in for the arithmetic series, and see that linear works.
Specifically, the inductive hypothesis is that for m < n, T(m) <= dm, and we have

T(n) <= cn + (2/n) Σ_{i=n/2}^{n-1} di = cn + (2d/n)((n/2)(3n/2 - 1)/2) <= cn + (3/4)dn <= dn

which is true for d >= 4c.
The math above makes the simplifying assumption that is even, but the result holds for odd too.
Quick sort analysis
Let's start with the same intuitive approach we used for quick select. If we break exactly in the middle, we get a well-known recurrence relation, T(n) = 2T(n/2) + Θ(n), that gives us an O(n log n) runtime.
But what if we go back to our assumption that we get a pivot 90% of the way through the block? We can use the substitution method to show that that gives an O(n log n) relation:
- What if we pick the 90% pivot each pass? T(n) = T(9n/10) + T(n/10) + Θ(n)
- Substitution method:
- Inductive hypothesis: for m < n, T(m) <= cm log m
- T(n) <= c(9n/10) log(9n/10) + c(n/10) log(n/10) + an <= cn log n, true for a large enough constant c
But already that's pretty ugly looking. We could also prove that with the Akra-Bazzi extension of the Master Theorem.
Original approach
If we want more precision, then, unlike quick select, we only have one parameter here. So let's again go back and write out the recurrence relation, assuming that we pick the qth smallest element as our pivot. And we'll assume that we're using the Lomuto partition on n distinct elements. We'll have two recursive calls, one on the q - 1 elements smaller than the pivot and one on the n - q elements larger than the pivot:

T(n) = Θ(n) + (1/n) Σ_{q=1}^{n} (T(q - 1) + T(n - q))
Not very pretty. The above was the analysis in the first edition of CLRS. A much more graceful analysis has since been used.
Below is a really small array:

And instead of asking how long it takes to run, let's ask a much more limited question: if we run randomized quick sort, what's the probability that 3 will get compared to 17? If we happen to pick 9 as our first pivot, then it will separate 3 and 17 into different partitions, and then during the entire rest of the sort they won't get compared:

Similarly, if we had chosen 7, 8, or 15, then 3 and 17 also wouldn't ever get compared. Only if we choose 3 or 17 as our first pivot will they get compared to each other. Because 3 and 17 are the minimum and maximum items, we will know after picking just one pivot if those values will get compared. If one of them is chosen first, then they get compared. Or if one of the four values between them gets chosen first, then they don't get compared.
There's a 2/6 = 1/3 chance they get compared because there are four values between them.
Let's recap — for randomized quick sort on the small example array above, what is the probability that 3 gets compared to 17?
- If we pick 9 as our first pivot, then 3 and 17 will never be compared.
- Similarly, if 7, 8, or 15 is our first pivot, then 3 and 17 will never be compared.
- If we pick 3 or 17 as our first pivot, then 3 and 17 will be compared.
- Four items between the two extreme values:
- 4/6 = 2/3 chance they do not get compared.
- 2/6 = 1/3 chance they get compared.
Now let's consider a bigger array:

To simplify it, the array holds integers 1 through 20, inclusive. What's the probability of comparing the number 2 against the number 7 during quick sort? To make it a bit easier to see what's going on, we'll add a second view of the same values but in sorted order:

Like before, there are four values between 2 and 7 in the sorted order, but now we have a lot more values that we can pick as a pivot. If we happen to pick 11 as our first pivot, then it gets compared to everything, and both 2 and 7 fall into the partition of smaller elements. Unlike before, we picked our first pivot, but we still don't know if they're ever going to get compared to each other or not:

The array after the first partition is as follows:

Stepping into that partition on the left, if we pick 1 as our next pivot, then it gets compared to the numbers 2 to 10, but both 7 and 2 are larger than the pivot, 1, so they both fall into the same partition again:

The partition containing 7 and 2 after this second partitioning still needs to be sorted, and we still don't know if they are going to be compared:

As long as we keep picking pivots either larger than 7 or smaller than 2, then 7 and 2 will both fall into the same partition, and we won't know yet if we're going to compare them. But, at some point, we'll pick a pivot of 2, 7, or something between them. If we pick 2 or 7 before anything between them, then they get compared to each other, and if we pick something between them first, then they get separated into different partitions, and they won't ever be compared (below, 7 is picked):

The result after this third partitioning, where we use 7 as the pivot, shows how 7 and 2 get compared and are then separated thereafter:

Even though everything above looks different than the first small array with just six values, the probability that 2 and 7 get compared is still 1/3. The added values larger than 7, or smaller than 2, don't change that probability of comparing those two items. The important thing is that there are four values between them.
Let's recap our observations from using this bigger array — what is the probability that (non-extreme) elements, 2 and 7, get compared?
- For a pivot larger than 7, both fall into the same partition, maybe compared later.
- For a pivot smaller than 2, both fall into the same partition, maybe compared later.
- Eventually, 2, 7, or something between them will be chosen as a pivot.
- If 2 or 7 is chosen before any value x with 2 < x < 7, they get compared.
- If some x with 2 < x < 7 is chosen before 2 or 7, they do not get compared.
- Four items between the two chosen values:
- 4/6 = 2/3 chance they do not get compared.
- 2/6 = 1/3 chance they get compared.
More generally, we can calculate the probability that the ith smallest value is ever compared to the jth smallest value. If we want to know how many comparisons take place during the entire algorithm, then we can sum that up over all possible i and j values. That might look a little complex, but not compared to the original recurrence relation we were considering.
We manipulate our sum to get the inner summation to be the harmonic series, and it works out to O(n log n) comparisons. Because quick sort takes time proportional to the number of comparisons it performs, this analysis gives us an O(n log n) expected runtime for quick sort.
In summary:
- For i < j, there are j - i + 1 elements from the ith to the jth smallest
- Probability the ith smallest element is compared to the jth smallest element: 2 / (j - i + 1)
- Over all pairs, the expected number of comparisons is given by the following:

Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} 2 / (j - i + 1) = O(n log n)
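As a quick numeric sanity check (a throwaway experiment, not from the original source), we can evaluate that double sum directly and see that it behaves like Θ(n log n):

```python
import math

def expected_comparisons(n):
    # Sum of 2 / (j - i + 1) over all pairs 1 <= i < j <= n.
    return sum(2.0 / (j - i + 1)
               for i in range(1, n)
               for j in range(i + 1, n + 1))

n = 1000
exact = expected_comparisons(n)   # roughly 2(n + 1)H_n - 4n
bound = 2 * n * math.log(n)       # 2n ln n, an upper envelope for this n
```

For n = 1000 the exact sum comes out a bit below 2n ln n, consistent with the Θ(n log n) claim.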
Lower bounds for comparison based sorting - decision trees
Introduction
This section will be about lower bounds for comparison-based sorting, using decision trees. We should have already seen some of the following basic comparison-based sorting algorithms:
- Bubble sort
- Selection sort
- Insertion sort
- Merge sort
- Heap sort
- Quick sort
We'll use insertion sort the most. We'll see what a comparison-based sort is, and we'll specifically look at how it acts on 3 elements. We'll also introduce decision trees and how they relate back to the algorithm, and then show trees for each of the comparison-based sorts listed above. We use those trees to make a lower bound argument for sorting 3 items, and then generalize that to prove a lower bound for sorting n items.
Sorting (n² to n log n to ?)
Let's say that the first sorting algorithm we learn is insertion sort. It seems great at first. But then we learn merge sort, and we realize that insertion sort isn't that great. Insertion sort's O(n²) worst-case runtime doesn't look so good once we know that we can do better and sort the same array in O(n log n) time using merge sort. But is O(n log n) good? Well, can we do better?
Insertion sort
Here's the insertion sort algorithm in pseudocode:
InsertionSort(array A)
    for(i = 1 to A.length - 1)
        tmp = A[i]
        j = i
        while(j > 0 && tmp < A[j - 1])
            A[j] = A[j - 1]
            j--
        A[j] = tmp
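As a reference point, the pseudocode translates into Python along these lines (a sketch):

```python
def insertion_sort(A):
    # Insert A[i] into the already-sorted prefix A[0..i-1], shifting larger
    # elements one slot to the right to make room.
    for i in range(1, len(A)):
        tmp = A[i]
        j = i
        while j > 0 and tmp < A[j - 1]:
            A[j] = A[j - 1]
            j -= 1
        A[j] = tmp
    return A
```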
Let's consider running the algorithm above on an array with just three items:
InsertionSort(array A[0..2])
    for(i = 1 to 2)
        tmp = A[i]
        j = i
        while(j > 0 && tmp < A[j - 1])
            A[j] = A[j - 1]
            j--
        A[j] = tmp
If the code above is running, then the outcomes of the comparisons between elements from A tell us everything there is to know about what is happening in the code. So, if the first comparison comes out true (i.e., line 5), then the first two items of the array will get swapped; otherwise, they won't get swapped. Either way, we won't enter that while loop a second time, until we are in the second pass of the outer for loop.
An ordered list of true/false outcomes for the comparisons tells us everything that there is to know about what happens in the code, even without a list of what was compared. The first comparison is between the first two elements, and the second comparison is between the third element and which of the first two elements is smaller, assuming they are different. If that second comparison comes out false, then there isn't a third comparison. If the first two comparisons are false, then the numbers given to us were already in increasing order.
This is what we're talking about when we say a comparison-based sorting algorithm. The important, data-driven forks in the code happen because two different elements are compared against each other, and the outcome of that comparison determines what will happen next in the code. Insertion, bubble, selection, heap, merge, and quick sorts are all comparison-based sorting algorithms. And there are more. They are general sorting algorithms that can sort any inputs as long as input values can be compared to each other.
With just 3 items, we know that we go through that outer loop exactly twice. Let's get rid of that loop:
InsertionSort(array A[0..2])
    i = 1
    tmp = A[i]
    j = i
    while(j > 0 && tmp < A[j - 1])
        A[j] = A[j - 1]
        j--
    A[j] = tmp
    i = 2
    tmp = A[i]
    j = i
    while(j > 0 && tmp < A[j - 1])
        A[j] = A[j - 1]
        j--
    A[j] = tmp
For the comparisons inside the while loop, what happens depends on the outcome of those comparisons. Let's get rid of those loops too, while keeping the same comparisons between the same elements in the same order, but the rest of the code looks different:
InsertionSort(array A[0..2])
    x = A[0], y = A[1], z = A[2]
    if(x <= y)
        if(y <= z)
            A[0] = x, A[1] = y, A[2] = z   # x <= y <= z
        else
            if(x <= z)
                A[0] = x, A[1] = z, A[2] = y   # x <= z < y
            else
                A[0] = z, A[1] = x, A[2] = y   # z < x <= y
    else
        if(x <= z)
            A[0] = y, A[1] = x, A[2] = z   # y < x <= z
        else
            if(y <= z)
                A[0] = y, A[1] = z, A[2] = x   # y <= z <= x
            else
                A[0] = z, A[1] = y, A[2] = x   # z < y < x
The code above is ugly, but we could write it once we know if/else statements, without even knowing loops. It has the original comparisons, and they will be executed in the same order as the original code, starting with the first two items. But all comparisons are now written explicitly.
If we consider running insertion sort on [1, 3, 2], then we can see the three comparisons the code will perform:
InsertionSort(array A = [1, 3, 2])
    x = 1, y = 3, z = 2
    if(1 <= 3)
        if(3 <= 2)
            A[0] = x, A[1] = y, A[2] = z   # x <= y <= z
        else
            if(1 <= 2)
                A[0] = 1, A[1] = 2, A[2] = 3   # 1 <= 2 < 3
            else
                A[0] = z, A[1] = x, A[2] = y   # z < x <= y
    else
        if(x <= z)
            A[0] = y, A[1] = x, A[2] = z   # y < x <= z
        else
            if(y <= z)
                A[0] = y, A[1] = z, A[2] = x   # y <= z <= x
            else
                A[0] = z, A[1] = y, A[2] = x   # z < y < x
Like any comparison-based sort, it's the relative sizes of the inputs that are important, not the absolute values. So, insertion sort will go through the same steps to sort [11, 13, 12] as it does for [1, 3, 2]. The problem here is that this is just a horrific way to see what's happening. So let's introduce decision trees.
Decision tree for insertion sort
Below is a decision tree for insertion sort on 3 items:

It might look like some kind of data structure, but it isn't. It's just a more visual representation of which elements get compared by the code. The non-leaf nodes of the decision tree represent the comparisons, and the two children for a node represent the two possible outcomes of its comparison. The leaves represent final outcomes for the program.
So we can again look at what happens on input [1, 3, 2], and see which leaf we get to:

The path above, from the decision tree root to a leaf, represents running the algorithm on some particular set of data. Each inner node on the path represents a comparison that the algorithm executes, in that order. The length of the path is the number of comparisons the algorithm executes for that input.
It's easy to see above that while [1, 3, 2] takes three comparisons, other inputs, like [2, 1, 3], would only take two.
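We can confirm those path lengths directly by instrumenting insertion sort with a comparison counter. Below is a small Python sketch (the helper name is ours, used only for illustration), run over every permutation of three distinct items:

```python
from itertools import permutations

def insertion_sort_count(a):
    """Sort a copy of a, returning (sorted list, number of comparisons)."""
    a = list(a)
    comparisons = 0
    for i in range(1, len(a)):
        tmp = a[i]
        j = i
        while j > 0:
            comparisons += 1              # the tmp < a[j - 1] comparison
            if not (tmp < a[j - 1]):
                break
            a[j] = a[j - 1]
            j -= 1
        a[j] = tmp
    return a, comparisons

for p in permutations([1, 2, 3]):
    print(p, insertion_sort_count(p)[1])  # counts: 2, 3, 2, 3, 3, 3
```

Each printed count is the length of that input's root-to-leaf path in the decision tree: two inputs need only two comparisons, the other four need three.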
Other decision trees
Different sorting algorithms have different decision trees. We just saw the decision tree for insertion sort:

Let's look at the decision tree for bubble sort:

We see from the decision tree above that bubble sort always performs three comparisons, although for two possible inputs, the last comparison doesn't tell us anything we didn't already know:

Let's consider now the decision tree for selection sort:

Selection sort is only broadly defined in [24], but for our implementation (loop from the front, select the minimum element, swap it to the front), we have another occurrence of a comparison that doesn't tell us anything we didn't already know:

We also get this weird 7th outcome: if the first two elements are equal but larger than the third, the output is sorted, but it unstably puts the second item before the first:

Weird stuff isn't exclusive to quadratic sorts. Heap sort has two of those useless comparisons and a bunch of unstable outputs:

We hit the strange leaf node with two labels, x <= z < y and x < z = y, if the first value is strictly less than the second and the third is somewhere between those two; but if all three are equal, then we go to a completely different node (the far right leaf node, with label y <= z <= x).
We wouldn't choose to make redundant comparisons if we were making a decision tree for 3 items from scratch, but to write code that cleanly works on any number of inputs using natural operations on a heap, some comparisons can be duplicated.
Let's now consider the decision tree for merge sort:

The tree above is the decision tree for both top-down and bottom-up merge sort. Those are different algorithms, with different trees for 5 or more elements, but for 4 or fewer, they have the same trees. Meanwhile, for just 2 items, if we are only doing one comparison, then there are pretty much only 2 possible trees, depending on whether we are stable on equal items or not.
Finally, let's look at the decision tree for quick sort:

We can see its instability in the output above (leaf node on bottom with label x < y <= x).
Bound for 3 items
Let's recall all of the decision trees remarked on above:

Some of these trees always take 3 comparisons, while others sometimes use 2 and other times use 3. The real question is whether or not we can make a decision tree with at most 2 comparisons. And the answer is no.
Each of the trees above has at least 6 leaf nodes, because if we give the program 3 distinct values, like 1, 2, and 3, then there are 6 possible permutations that we can use to order the input. Each comparison has at most 2 children, for the 2 possible outcomes of the comparison. If we try to make a binary tree with at most 2 comparisons, then it gives us a tree with height at most 2, with at most 4 leaves. To get 6 permutations into 4 leaves, at least 2 permutations must go to one leaf, and the algorithm will do the exact same thing for both of those permutations. If we reorder different permutations in the same way, such as "keep the first element first but swap the other two," at most one of those two permutations can be sorted afterwards. So every permutation needs to get to its own leaf. It boils down to the problem that 6 is not less than or equal to 4. So no matter what our comparison-based sorting algorithm is, for 3 items, its decision tree needs height at least 3, and some inputs will require 3 comparisons.
Let's recap these seemingly dense observations — to sort 3 items:
- 3 distinct inputs have 3! = 6 permutations
- At most 2 comparisons: at most 2^2 = 4 possible different outcomes
- No two permutations can be sorted by getting to the same outcome leaf
- (proof left as an exercise)
- With height 2, each permutation cannot have its own outcome.
- We need at least 3 comparisons (worst case) to sort 3 items.
Bound for n items
Finally the finale: we don't want to sort just 3 items, we want to sort n items. Great. Assume that we have some comparison-based sorting algorithm for sorting n items. Instead of just 6 permutations, n items can come in n! permutations, so the decision tree for the algorithm will have to have at least n! leaves. If the decision tree's height is h, then it will have at most 2^h leaves. So, 2^h has to be at least n!, and h has to be at least order log2(n!), which is order n log n.
If the tree has height order n log n, that means that, in the worst case, the decision tree makes order n log n comparisons, so the algorithm takes at least n log n time; that is, no matter how clever it might be, any comparison-based sorting algorithm will take at least order n log n runtime in the worst case.
That's it. That's the whole argument. It doesn't prove that n log n comparisons are sufficient, it just proves n log n as a lower bound. Of course, we know that order n log n comparisons are sufficient, because merge sort and heap sort both achieve that bound. Consequently, we won't find a general sorting algorithm asymptotically faster than them.
Let's recap — how about a height h tree to sort n items:
- n distinct inputs have n! permutations
- At most h comparisons: at most 2^h possible different outcomes
- Need h large enough to make 2^h >= n!
- Need h >= log2(n!), which is order n log n
- Every comparison-based sorting algorithm needs order n log n worst case time
- This doesn't prove that order n log n comparisons are sufficient
- Merge sort and heap sort prove order n log n comparisons are sufficient
- Comparison-based sorts cannot asymptotically beat merge or heap sort
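The recap above can be checked numerically. Below is a small Python sketch (the function name is ours) computing the information-theoretic bound ceil(log2(n!)) on worst-case comparisons:

```python
from math import ceil, factorial, log2

def comparison_lower_bound(n):
    """Minimum worst-case comparisons for any comparison-based sort of n items:
    the binary decision tree needs at least n! leaves, so height >= log2(n!)."""
    return ceil(log2(factorial(n)))

for n in range(2, 6):
    print(n, comparison_lower_bound(n))   # (2, 1), (3, 3), (4, 5), (5, 7)
```

For n = 3 this gives 3, matching the 3-item argument; note that this is only a lower bound, not a claim that exactly that many comparisons always suffice.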
Linear time sorting - counting sort, radix sort, and bucket sort
Introduction
At this point, we should have seen a few general-purpose comparison-based sorting algorithms, for example merge sort, heap sort, and quick sort. We should have also seen an argument for why comparison-based sorting algorithms cannot beat order n log n worst-case runtime.
The three algorithms we're about to discuss, namely counting sort, radix sort, and bucket sort, run faster, but they aren't comparison-based.
To use the non-comparison-based algorithms referenced above, we need to know something more about the keys we will be sorting. Exploiting that extra knowledge lets us sort more quickly, but it doesn't work in all cases like a general, comparison-based sort does.
We will start with counting sort, followed by radix sort, which frequently uses counting sort. Bucket sort is third, and we will also see how the ideas of bucket sort and counting sort kind of overlap, even if they are different algorithms.
Counting sort
Counting sort applies when the keys come from a small set of integers, ranging from 0 up to some constant (i.e., input from a small range of integers, 0 through k).
For example, consider the following array of integers, where integer values range from 0 to 4, inclusive:

For the simple case above, we start by just counting the number of each key that we have. Assuming that we index arrays from 0, we can just use each key as an index into an array of counters and increment the corresponding counter:

As shown above, once we finish counting the number of keys for each integer, we already have an idea of what the final sorted array will look like:
That is, the final sorted array starts with five items of key 0, one item of key 1, one item of key 2, four items of key 3, and finally four items of key 4.
We will make a new array for our sorted list:

If we just wanted a list of keys in sorted order, then we could use the counter information to fill the empty array above: just loop through the counters, and write that many of each key: write five 0s, then one 1, one 2, four 3s, and four 4s, and then we're done:

The problem with that approach is that even if we have integer keys, we generally aren't sorting integers; we're sorting more complex things, or objects, which happen to have integer key values. We don't want to write down the number 0 in that first slot — we actually want to have a reference to the object with key value 0.
Let's go back to right after we finished counting:


The rest of the algorithm is a way to do those reference copies cleanly and efficiently. Now, for the counting array above, the 5 just stands for the number 5, a count of how many items have key 0. It wasn't copied from anywhere — it was just incremented from the value 4. Using the counting sort version in CLRS [24], the next step is to just replace each counter with the sum of all counters up to that one, by iterating over the counting array, left to right, and adding the previous value to each count:

The transformations above leave us with the following counting array:
Now, that value 7 in the 2 slot means that there are 7 values in the array of 2 or less, and we know that they will go into indices 0 to 6, in the sorted array. The elements with key value 3 will end up in indices 7 to 10. Each position holds one index place past where the last object with that value will go in the array. This off-by-one nature is because we're assuming that we index arrays from 0. If, instead, we indexed from 1, then the number in each key slot would match the last position.
So the last part of the algorithm goes through the original, unsorted array, in reverse order:

It looks up the key, in this case 0 (at index position 14), and looks at the array of counters. It sees a 5 in the count array, so it knows that object with key 0, the last one, belongs in position 5 - 1 = 4. In the animation below, when we see the key moving from the original, unsorted array, that is a copying of the actual reference, it isn't just the number. So the algorithm decrements the 0 counter from the counts array (from 5 to 4), and then uses that value as an index where the reference gets copied:

That decrement is important. The highest indexed key 0 object went into index 4, but there are more objects with key 0, and we don't want to put them in the same place. When we place that last 0 key object, we modified the index so that now we are ready to place the second highest indexed 0 key object.
We continue with the previous item from the original array, with key value 4, copy it into the sorted array:

And then we run into another object with key 0. It's a good thing we decremented our 0 counter because now we don't overwrite that first object we copied down:

Great. We continue on, through all of the original objects, looking up their keys, using that key to look up and decrement the appropriate slot in the counter array, which now stores an index, and using it to give a position to copy the object in the sorted array.

Everything slots in there nicely, and when we finish, now that the counters have all been decremented, they hold the first index of each key type:

The first 0 key object is in index 0, the first 3 key object is in index 7, and so on.
Let's recap what we've discussed of counting sort so far (assuming we have an input from a small range of integers, 0 to k):
- Loop through items, incrementing appropriate counter for each
- Loop through the counters, adding the previous value to each, to get prefix sums of the counts
- Loop through items in reverse. For each item:
- Use item key to index the count array
- Decrement that count array value
- Use decremented value as array index to copy item into sorted array
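The recap above translates almost directly into code. Below is a Python sketch (the function and parameter names are ours), sorting (key, payload) pairs so that stability is visible:

```python
def counting_sort(items, key, k):
    """Stable counting sort of items whose key(item) is an int in 0..k."""
    counts = [0] * (k + 1)
    for item in items:                 # count each key
        counts[key(item)] += 1
    for i in range(1, k + 1):          # prefix sums: counts[v] = number of
        counts[i] += counts[i - 1]     # items with key <= v
    output = [None] * len(items)
    for item in reversed(items):       # reverse pass keeps the sort stable
        counts[key(item)] -= 1
        output[counts[key(item)]] = item
    return output

pairs = [(3, 'a'), (0, 'b'), (3, 'c'), (1, 'd'), (0, 'e')]
print(counting_sort(pairs, key=lambda p: p[0], k=3))
# [(0, 'b'), (0, 'e'), (1, 'd'), (3, 'a'), (3, 'c')]
```

The reverse pass in the last loop is exactly what keeps ties in their original order: 'b' stays before 'e', and 'a' stays before 'c'.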
Stability
Here's a question: why did we walk through the original array in reverse order? If we had walked through it left to right, but had done the same counter lookup, decrement, and copy to that position, that last array would still be sorted. This brings us to the concept of stability in sorting.
We say that a sort is stable if, for two objects with the same key value, they don't swap orders. So, if there is a tie, we don't get one thing jumping in front of something that it was behind. The original order is the tie-breaker.
A different version of counting sort doesn't need the third array (the separate sorted array we build), so it uses less extra space: only space linear in the number of counters, since we just overwrite the original array. But that version of counting sort isn't stable. We prefer our sorts to be stable; it's not required, but we'll see shortly why it's preferred.
By traveling through the unsorted array from the back to the front, while decrementing the counters, the sort is stable. If we went front to back, our sort would be anti-stable: everything would always flip order in case of a tie. Now, it wouldn't be too hard to make a variant where, before we started copying values, we get the counter array to look like it did above, namely
where each counter holds the index of the first position of objects with that key. If we did that, then we could go through the original array, left to right, incrementing counters as we use them, instead of going right to left, decrementing them before we use them. But the decrementing version probably has one less step of manipulation of those counters, so it's marginally faster.
Counting sort performance
Great. So what's the runtime? Well, we allocate and initialize k + 1 counters if the numbers go from 0 to k. That takes time linear in k. We step through the unsorted array in both the first and third steps, and that takes time linear in the size of the array. And we accumulate all of those counters in the second step, which takes time linear in the range again. Thus, it takes time linear in the size of the input plus the range: O(n + k) time and space. If the range is a constant, that is linear in the input size. Even if the range is, say, linear in the input size, so we are sorting n records with values 0 to 2n, or 5n, it still takes time (and space) linear in the size of the input.
While we can say that counting sort takes linear time, if the range is much bigger than the size of our input, then it starts to get a bit impractical. Thus, if we're sorting n numbers with values up to n^2, it still officially takes time and space linear in the input size plus range, but that's a lot of space. We know some O(n log n) sorts, with pretty small hidden constant terms, and n has to be pretty big before log n starts comparing to n, so we really don't want to use counting sort in practice if the range is that big.
Radix sort
Radix sort helps us extend the limiting range encountered in counting sort a bit more practically. It's a pretty old sorting technique. How old? Something like radix sort was used to sort punched cards well over 100 years ago. The intuition for radix sort is that we can run counting sort on each digit of the input separately.
Suppose we're sorting numbers from 0 to 999:

Then we could sort them one digit at a time. Each digit just goes from 0 to 9, so counting sort would work great.
Most significant digit first
We'll start with the "most significant digit first" version, and we'll see the least significant digit version from CLRS [24] after that.
If we are first sorting by the 100s digit, then we will need to pad numbers less than 100 with a 0 in front. That is, suppose we take a look at the values in indices 0 through 5 in the array pictured previously:
What would the key value be for the first element when we're sorting first by the 100s digit? If the number is less than 100, it doesn't even have a 100s digit. Thus, we need to pad it with a 0. If we pad values appropriately, then we can highlight the key value as the first digit for each number (i.e., the 100s digit):
For the sample above, our count array would look as follows:
Returning to our original array, we first note the value contained at each index:

Then we note the corresponding key value based off the digit in the 100s place for each value (just like above):

We use counting sort as we did previously, counting the number of times each key value occurs:

Once we've made a full pass for the frequency count, we adjust the counts array (so we can use decrements to construct the stable sorted array of items by the 100s place):

Once the counts array is adjusted, as above, then we can construct the sorted array by the 100s place:

The outcome is a new list of values, where all values are sorted by the 100s digit, lowest to highest:

The following table summarizes our findings nicely from our first use of counting sort on the hundreds digit:
| Hundreds digit | Values | Count |
|---|---|---|
| 0 | 19, 57 | 2 |
| 1 | 158, 122, 166 | 3 |
| 2 | 223 | 1 |
| 3 | 339 | 1 |
| 4 | 438, 470, 468, 443, 413, 444 | 6 |
| 5 | 523 | 1 |
| 6 | 626, 621 | 2 |
| 7 | 730, 773, 717, 711 | 4 |
| 8 | 808, 849, 804, 854, 840 | 5 |
| 9 | | 0 |
To recap, we run counting sort on the hundreds digit, and after that finishes, we have all of the numbers, in blocks, sorted by their hundreds digit (as shown in the picture above). We would then want to sort each of those blocks independently by their tens digit.
That's a good amount of bookkeeping. Probably the easiest way to do it is to use recursion to organize everything. This is kind of like quick sort, except here we might have 10 recursive blocks to sort instead of just 2. So, on the numbers just obtained by our first use of counting sort, we would sort the first two numbers under 100, 19 and 57, by their tens digit. And then the next three numbers in the 100s, 158, 122, 166. And eventually five numbers in the 800s: 808, 849, 804, 854, 840. There aren't any 900s here, so we could skip that call, and any of the recursive calls it would have made.
Each recursive call to sort by the tens digit will then make recursive calls to sort by the units digit, though we can just sort by insertion sort once a subproblem is small enough.
This version is really helpful if we are sorting lexicographically, which is a generalization of alphabetical or dictionary order for strings. If we are dealing with strings, then they're not really digits, but we can map each character to an integer, use counting sort to sort by first character, and then recursively sort strings that start with the same character. For strings, we don't need to fill in leading values to buffer them to be the same length. This approach relates to radix tree and trie data structures.
Let's recap our radix sort approach using the most significant digit first:
- Sort by most significant digit with linear time counting sort (pad front with 0's if a number has fewer digits)
- Recursively sort each value's block by the next most significant digit
- Optimization: once recursive blocks get small enough, use insertion sort
- Works best for lexicographical ordering
- Can sort by characters instead of digits
- No padding needed to make strings equal lengths
- Related to radix trees and tries
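The recursive, most-significant-digit-first approach can be sketched in Python as follows (a sketch under assumptions: bucket lists stand in for an in-place counting sort pass, and Python's built-in stable `sorted` stands in for the small-block insertion sort cutoff; all names are ours):

```python
def msd_radix_sort(nums, digits):
    """Sort non-negative ints with at most `digits` digits, most significant first."""
    def digit(x, d):
        # d = 0 is the (zero-padded) most significant digit
        return (x // 10 ** (digits - 1 - d)) % 10

    def sort(block, d):
        if len(block) <= 1 or d == digits:
            return block
        if len(block) <= 8:
            return sorted(block)          # stand-in for the insertion sort cutoff
        buckets = [[] for _ in range(10)]
        for x in block:                   # stable bucket pass on digit d
            buckets[digit(x, d)].append(x)
        result = []
        for b in buckets:                 # recursively sort each digit's block
            result.extend(sort(b, d + 1))
        return result

    return sort(list(nums), 0)

print(msd_radix_sort([19, 158, 57, 849, 122], 3))   # [19, 57, 122, 158, 849]
```

Empty buckets (like the 900s block in the example) recurse on an empty list and return immediately, which is the "skip that call" optimization mentioned above.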
Least significant digit first
If we are sorting numbers (i.e., instead of strings), then the other choice is to actually sort by the least significant digit first, and then not worry about separating blocks for subproblems.
We run counting sort, and we use the units digit as our key:

After making a first pass with counting sort, using the units digit as our key, we have the list sorted by units digit:

But the difference from before is that now we take that sorted list, ignore the fact that we sorted it already (by the units digit), and then sort the entire list again by the tens place; that is, we reset the count array

and then also reset the key value for each number to reflect its digit in the tens place:

We then proceed just as we did before (counting occurrence of each key, incrementing count array, then copying to sorted array by decrementing):

We then do the same thing for the 100s digit:

The outcome is then the final sorted array:

In the animations above, we reuse the sorted array after the first pass instead of allocating it fresh each time. Why bother sorting by the units digit and then the tens digit just to re-sort it all? Well, 700 anything is bigger than 500 anything, so ignoring lower digits is fine. The only time the other digits come into play is if two values have the same 100s digit. If we're sorting two numbers in the 500s, then we want 580s to come before 590s. But we just finished running counting sort on the tens digit, so all 80s were before all 90s, and counting sort is stable, so it will leave 580s before 590s.
For 793 and 799 in our example array, the units digit sort put the 3 before the 9, and both the 10s and 100s digit sorts are stable, so 793 will end up before 799.
It's important that counting sort is stable.
Let's recap what we've seen so far for this version of counting sort (i.e., using the least significant digit first):
- Sort list by least significant digit with linear time (stable) counting sort
- Sort entire list again by next least significant digit
- Repeat on increasingly significant digits until we run out of digits
- Stability: equal keys will be sorted by less significant (previously sorted) digits
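The least-significant-digit version above can be sketched in a few lines of Python (names ours), running one stable counting sort pass per digit:

```python
def lsd_radix_sort(nums, digits):
    """LSD radix sort: repeated stable counting sort, least significant digit first."""
    a = list(nums)
    for d in range(digits):               # d = 0 is the units digit
        counts = [0] * 10
        for x in a:
            counts[(x // 10 ** d) % 10] += 1
        for v in range(1, 10):
            counts[v] += counts[v - 1]    # prefix sums
        out = [0] * len(a)
        for x in reversed(a):             # reverse pass keeps each pass stable
            v = (x // 10 ** d) % 10
            counts[v] -= 1
            out[counts[v]] = x
        a = out
    return a

print(lsd_radix_sort([170, 45, 75, 90, 802, 24, 2, 66], 3))
# [2, 24, 45, 66, 75, 90, 170, 802]
```

Because each pass is stable, ties on a higher digit remain ordered by all previously sorted lower digits, which is exactly the argument made above.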
We don't have to treat numbers as decimal, base 10 numbers when we run radix sort. We can treat a number as having base r, or radix r, with d digits. We can do d counting sort passes over the numbers, sorting values 0 to r - 1 on each pass. Note that r doesn't even have to be a constant: if we are sorting numbers 0 to n^2 - 1, then we can treat those numbers as 2-digit base n numbers, and do 2 passes over them to sort, in linear total time. That example shows how radix sort expands the class of inputs that we can sort in linear time, but it's still going to be best at handling something like limited integers or strings, so it's still not as general a sort as, say, quick sort. It also uses extra space compared to quick sort, which could limit its performance for large enough inputs.
In short:
- Can use any radix (base): 2, 10, 16, characters, etc.
- For d passes over base-r digits, takes O(d(n + r)) time
- Can even use base n
Bucket sort
This finally brings us to our next linear time sort: bucket sort. Bucket sort doesn't assume that the inputs are integers, but it does assume that they are uniformly distributed. So, in its most basic form, we assume that all of the inputs are randomly and uniformly chosen real numbers over some range.
In the animations below, we'll assume that the numbers are distributed from 0 up to 100 (the following numbers are floating point numbers, and there are twenty of them in total):

So what we do is: if we have n items to sort (n = 20 above), then we break our range into n different equal-sized parts, and we treat each part as a bucket for any numbers in that subrange:

We'll use linked lists to store the elements in each bucket:

Thus, based on the value, we can just jump right to the correct bucket, and append that entry onto the end of that linked list in constant time per entry:

When that finishes, how should the buckets look? Because the numbers were uniformly distributed, if we have a lot of items, then we should expect over 1/3 of the buckets to be empty, over 1/3 of the buckets to have just one item, and almost 1/5 of the buckets to have two items. Over 90% of the items are in buckets of size 3 or less.
More generally, as n grows large, we expect approximately:
- n/e buckets with 0 items
- n/e buckets with 1 item
- n/(2e) buckets with 2 items
- n/(6e) buckets with 3 items
- n/(j! * e) buckets with j (constant) items
Said another way, as n grows large, we expect approximately:
- Over 36% of items in buckets with 1 item
- Over 73% of items in buckets with 2 or fewer items
- Over 91% of items in buckets with 3 or fewer items
- Over 98% of items in buckets with 4 or fewer items
- Over 99.5% of items in buckets with 5 or fewer items
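Those percentages come from the fact that, with n items thrown uniformly into n buckets, a bucket's size is approximately Poisson distributed with mean 1, so about a 1/(e * j!) fraction of the buckets hold j items. A quick Python check of both lists above:

```python
from math import exp, factorial

# P(bucket has j items) ~ e^-1 / j! as n grows large
p = [exp(-1) / factorial(j) for j in range(6)]
print([round(100 * x, 1) for x in p])        # [36.8, 36.8, 18.4, 6.1, 1.5, 0.3]

# Fraction of *items* in buckets of size <= j: sum of j' * P(j') for j' <= j
# (the expected number of items per bucket is 1, so no extra normalization).
items = [j * p[j] for j in range(6)]
cumulative = [round(100 * sum(items[:j + 1]), 1) for j in range(6)]
print(cumulative)                            # [0.0, 36.8, 73.6, 92.0, 98.1, 99.6]
```

The cumulative list matches the "over 36% / 73% / 91% / 98% / 99.5%" figures quoted above.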
Returning to our consideration of what happens once each item is placed in its appropriate bucket, it should be clear that each bucket is unsorted, but everything in one bucket goes before anything in the next bucket. Thus, we can sort the buckets independently with insertion sort. Even though insertion sort is officially a quadratic sort, it's really efficient for sorting small sets. So we use insertion sort to sort every bucket (below, the first notable change in order occurs for the bucket with label 55):

A precise performance analysis may be found in CLRS [24], but it shouldn't be a shocking outcome that the expected sum of the squares of the bucket sizes is linear, since almost all of the items are in buckets of small constant size, and the expected number of buckets of each size just keeps decreasing as the size gets bigger.
Once we have each bucket sorted individually, we can just go through them and copy them back into our original array. Below, we show the linked lists being concatenated to each other instead:

A third option is to combine the lists without sorting each one individually, and then run insertion sort on the combined list, which is almost sorted. Because the combined list is almost sorted, insertion sort will finish the job quickly.
Each of the options above takes expected linear total time.
Quick recap of bucket sort:
- Useful when data is randomly chosen from a uniform distribution
- To sort n values, break the range into n equal "buckets"
- Use a value to compute the index of its bucket in constant time
- Sort each bucket with insertion sort, or sort once after appending the buckets
- O(n) space and O(n) expected run time total
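A Python sketch of the whole procedure (the names and the [0, 100) range are assumptions taken from the example above; Python lists stand in for the linked-list buckets), sorting each bucket with insertion sort and concatenating:

```python
import random

def insertion_sort(a):
    """In-place insertion sort; returns a for convenience."""
    for i in range(1, len(a)):
        tmp, j = a[i], i
        while j > 0 and tmp < a[j - 1]:
            a[j] = a[j - 1]
            j -= 1
        a[j] = tmp
    return a

def bucket_sort(values, lo=0.0, hi=100.0):
    """Bucket sort for values assumed uniform over [lo, hi)."""
    n = len(values)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]
    for v in values:
        idx = min(int((v - lo) * n / (hi - lo)), n - 1)  # constant-time bucket index
        buckets[idx].append(v)
    result = []
    for b in buckets:               # sort each (expected tiny) bucket, concatenate
        result.extend(insertion_sort(b))
    return result

random.seed(42)
data = [random.uniform(0, 100) for _ in range(20)]
print(bucket_sort(data) == sorted(data))   # True
```

The `min(..., n - 1)` clamp just guards against a value equal to `hi` landing one past the last bucket.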
Broken lower bound?
So, it is fine to say that these sorts run in linear time, but we should reconcile that with the lower bound on comparison-based sorts. So what happens here? When we look at a number, in constant time we can figure out which bucket it goes into. There are n buckets, but in constant time we can pick one of them, because we use the numeric value to find an index key for the bucket (in constant time, bucket sort has "n-way forking"). Unlike comparison-based sorts, that isn't a two-way fork, less than or greater than. In constant time, we fork n ways. With a decision tree using two-way forks, to get to the n! possible permutations of n items, the tree would need order n log n depth to have enough leaves for all of the permutations. But with an n-way fork, log base n of n! is order n, only giving us a linear lower bound on the runtime. We get a logarithmic speed up in the lower bound. Of course, we can't really sort without looking at the data, so we always expect a linear lower bound to sort with just one processor.
Counting sort also uses a bigger fork to beat the lower bound, and, of course, radix sort uses counting sort.
To summarize:
- In constant time, bucket sort has n-way forking
- Counting sort on values 0 to k uses (k + 1)-way forking
- Only (k + 1)^n possible inputs, not n! distinct orderings
Bucket sort for counting sort input
One last thing to mention: if we have an input that looks like counting sort applies, then we can still use bucket sort, with one bucket per key value. Remember, for counting sort, we know the values are just in some fixed range of integers, but it doesn't make any assumptions about distribution like bucket sort does. So now we have all these unsorted buckets, but everything in one bucket goes before anything in the next bucket, and because all entries in a bucket have the same key, we don't need to sort each bucket afterwards. Each bucket is already sorted with just one value. Even if we do run insertion sort, it runs in linear time if all inputs have the same key anyway. So everything finishes in linear time.
We can see all of this unfold in the animation below:

To recap:
- One bucket per key value
- Items might be distributed unevenly
- No need to sort each bucket, all values for a bucket have the same key
- O(n + k) run time (even for the worst case) for key range 0 to k
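This bucket-per-key variant is only a few lines. A Python sketch (names ours); since appending to a bucket preserves the original order, the result is stable with no per-bucket sort at all:

```python
def bucket_sort_by_key(items, key, k):
    """One bucket per integer key 0..k; stable, O(n + k) time and space."""
    buckets = [[] for _ in range(k + 1)]
    for item in items:
        buckets[key(item)].append(item)   # appending preserves original order
    return [item for b in buckets for item in b]

print(bucket_sort_by_key([(2, 'a'), (0, 'b'), (2, 'c'), (1, 'd')],
                         key=lambda p: p[0], k=2))
# [(0, 'b'), (1, 'd'), (2, 'a'), (2, 'c')]
```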
LeetCode implementations
The code in each of the following playgrounds can be used to successfully pass/solve the following LeetCode problem: LC 912. Sort an Array.
Merge sort
Top-down merge sort:
Click "Run Code" to see the output here
Bottom-up merge sort:
Click "Run Code" to see the output here
