Asymptotic notation and analysis, recurrence relations, and random background math topics (with attitude)
This post explores asymptotic notation and analysis, namely big-$O$ notation and a variety of related properties. This sets the stage for discussing recurrence relations and ultimately the Master Theorem for solving the recurrence relations that describe the time complexity of recursive algorithms. Finally, we take a look at a few isolated math topics that will prove fruitful for future analyses.
The notes below come from the Algorithms with Attitude YouTube channel, specifically the following playlists: Asymptotic Notation and Analysis, Recurrence Relations, and Random Background Math Topics.
There are several sections in this post, accessible in the table of contents on the right, but most of the sections have been listed below for ease of reference and navigation.
- Asymptotic notation and analysis
- Definitions
- Runtime
- Simple categories
- Goals
- Big-O definition
- Formal definition of big-O
- Example 1
- Example 2
- Simplification of functions
- Precision of use
- Example 2 revisited
- Going from big-O to little-o
- Little-o definition
- Example 2 for little o
- Five asymptotic sets
- Five asymptotic sets (definitions)
- Historical notation
- Analogy
- Example 3
- Proofs, properties, and pictures
- Runtime of simple programs and loops
- Usage of asymptotic notation
- Definitions
- Recurrence relations
- Math topics
- Epilogue
Asymptotic notation and analysis
Definitions
Runtime
For a given program, we'd like to know how runtime will change compared to the input size:
But there are some issues here:
- What machine are you running the program on?
- What compiler are you using?
- How much precision is needed?
These issues make a precise discussion concerning runtime incredibly difficult unless we introduce some simplifications. Asymptotic notation is going to be an incredibly useful simplification. It will also make it possible for us to talk about the efficiency of algorithms or pseudocode, where we might count clock cycles, total operations, or just a few specific operations.
Simple categories
If we wanted to discuss a function that performs a linear number of operations, then it's clearly linear. So is another linear function with a different constant factor:
But what if we had a small non-linear term:
In this case, it looks like, beyond some point, the red function is sandwiched between the other two linear functions: the blue function (upper) and the green function (lower). If we go even further out, then we'll see that our new function really is "linear-ish":
It looks very close to the linear function.
Goals
Asymptotic notation lets us group runtimes into easy-to-discuss sets. It becomes possible, logical, but most importantly, meaningful to describe a runtime as "linear-ish" or as growing like some reference function. Asymptotic notation gives us a concise, precisely defined language for making just such statements. It also makes it possible for us to ignore small $n$-values (like the small $n$-values in the previous figures).
Big-O definition
The notation $O(g(n))$ is used to define the set of functions which grow no faster than $g(n)$. We're going to ignore multiplicative constants (e.g., $n$ vs. $5n$ ... that's like changing to a machine that's 5 times faster; everything on that machine is 5 times faster, but we still want to know, for a particular program, how its runtime is going to scale with input size). We can also ignore small values of $n$. Generally speaking, we benchmark difficult cases (i.e., really large inputs), not easy ones. No one brags that their algorithm calculates a maximum really quickly over two or three elements.
Formal definition of big-O
If we're interested in defining what $O(g(n))$ means in a formal manner, then we can do so as follows:
To use this definition constructively, we just ask if that rule holds for some particular function $f(n)$, in which case that function is in the set $O(g(n))$:
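In the style of the CLRS definitions referenced later in these notes, the set and the membership test can be written as:

$$O(g(n)) = \{\, f(n) : \exists\, c > 0,\ n_0 > 0 \text{ such that } \forall\, n \geq n_0,\ 0 \leq f(n) \leq c \cdot g(n) \,\}$$

$$f(n) \in O(g(n)) \iff \exists\, c > 0,\ n_0 > 0 : \forall\, n \geq n_0,\ 0 \leq f(n) \leq c \cdot g(n)$$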
The condition above that $f(n)$ be non-negative (i.e., $0 \leq f(n)$) is often excluded because program runtimes are assumed to not be negative!
Both notational statements above may benefit from the use of parentheses. For the uninitiated, if some of the symbols don't look familiar, then referencing the logic symbols may be helpful. For the sake of consistency with the video, we'll stick with the video's notation used throughout the examples and proofs.
Example 1
Let's start with a simple example:
We have , and we want to prove . We'll start with our definition:
For this example, we'll temporarily ignore :
We'll come back to it later. Next, we fill in specifics for our functions using the definition above: and .
Now let's see if our rule holds. Let :
It's clearly true that for . Hence, we could actually have :
We thus see that .
Example 2
Next, we consider the not-perfectly-linear function :
But we still want to show that , where and . We again start with our definition:
Now we plug in our particular and :
Suppose we let :
After a bit of tinkering and algebra, suppose we choose :
Then we need to show that when :
Basically, if , then note that and ; that is, . Hence, .
Why is ? Since we know that is positive, we can rearrange the original inequality as , which simplifies to . Squaring both sides, we get , which is exactly the condition that .
The deduction above is quite simple — it's easy to imagine examples where much more complicated mathematical expressions could be involved, and we might need to rely on advanced mathematics to work out a suitable, convincing proof.
Simplification of functions
What does working out the proof above actually do for us? Let's see:
- (potentially helpful)
- (less helpful)
- (weaker statement ... we prefer the first one because smaller upper bounds are stronger statements)
Precision of use
The above shows there are a couple of different ways that asymptotic notation is used. For HeapSort, for example, most texts would say HeapSort uses $O(n \lg n)$ comparisons. But some texts (e.g., Sedgewick, Knuth, some research papers, etc.) are more precise and would say HeapSort uses no more than $2n \lg n + O(n)$ comparisons; that is, take the set of functions that is $O(n)$ and, to each of those functions in the set, add exactly $2n \lg n$ to get a new set. And the maximum number of comparisons used by HeapSort is no bigger than some function in that set.
Here, with this second usage, only the smaller-order terms are brushed into the asymptotic notation. This allows for comparison between different algorithms with the same order of growth, and it helps us think about optimization because, in real life, multiplicative factors do of course matter. In this usage, the big-$O$ notation has the same meaning, but it's used differently. It's more sophisticated and informative, but also more mathematically challenging.
The first way of using $O$-notation is more widespread.
Example 2 revisited
Going back to our second example:
We've seen that , and we mentioned that too,
This may seem obvious, but exploring it will lead us to another useful asymptotic definition. We put up our definition of big-$O$:
And we fill in the specifics for our functions and :
For this proof, we'll choose and :
We should really prove that the values chosen above for and actually work, but a picture will do for now:
Going from big-O to little-o
For the example above, let's assume we've shown the bound with our chosen constant. How important was our choice of $c$? What would have happened if, instead, we chose a smaller constant? Remember, $c$ does not need to be an integer ... just a positive constant. If we chose a smaller constant, then, cheating by using pictures instead of actually proving things, it seems that for large enough $n$ the bound still holds:
We can see this in the picture below:
And it turns out it doesn't matter what constant we pick here. For any $c > 0$, we can find an $n_0$ such that $f(n) \leq c \cdot g(n)$ when $n \geq n_0$. Hence, the bound holds for every positive constant. The observation we just made is going to be the basis for another asymptotic definition, so-called "little $o$", where $f(n)$ is in $o(g(n))$ if $g$'s growth strictly dominates the growth of $f$.
Little-o definition
So we have the following definition for big-$O$:
And now the following definition for little-$o$:
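Side by side, the only change from big-$O$ is which quantifier the constant $c$ gets:

$$f(n) \in O(g(n)) \iff \exists\, c > 0,\ n_0 > 0 : \forall\, n \geq n_0,\ f(n) \leq c \cdot g(n)$$

$$f(n) \in o(g(n)) \iff \forall\, c > 0,\ \exists\, n_0 > 0 : \forall\, n \geq n_0,\ f(n) \leq c \cdot g(n)$$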
Example 2 for little o
If we fill in the specifics of our functions as before for this new definition, we get the following:
If we simplify the above for when $n$ is at least 1, then we have the following:
We can actually calculate a value for $n_0$ in terms of $c$ that will do. Hence, if $n$ is at least that value, then the needed inequality holds, and so $f(n) \in o(n^2)$.
Five asymptotic sets
If we extend the little-$o$ notation above, then we can obtain a little-$\omega$ (little omega) notation as well, to come up with five asymptotic sets in total:
- $o(g(n))$: strict upper bound
- $O(g(n))$: upper bound
- $\Theta(g(n))$: upper and lower bound
- $\Omega(g(n))$: lower bound
- $\omega(g(n))$: strict lower bound
Some textbooks will introduce even more asymptotic notations.
Five asymptotic sets (definitions)
We can define the previous asymptotic sets formally as follows:
- $f(n) \in o(g(n))$: $\forall\, c > 0,\ \exists\, n_0 > 0 : \forall\, n \geq n_0,\ f(n) \leq c \cdot g(n)$
- $f(n) \in O(g(n))$: $\exists\, c > 0,\ n_0 > 0 : \forall\, n \geq n_0,\ f(n) \leq c \cdot g(n)$
- $f(n) \in \Theta(g(n))$: $f(n) \in O(g(n))$ and $f(n) \in \Omega(g(n))$
- $f(n) \in \Omega(g(n))$: $\exists\, c > 0,\ n_0 > 0 : \forall\, n \geq n_0,\ f(n) \geq c \cdot g(n)$
- $f(n) \in \omega(g(n))$: $\forall\, c > 0,\ \exists\, n_0 > 0 : \forall\, n \geq n_0,\ f(n) \geq c \cdot g(n)$
We've already seen the first two definitions above for upper bounds. The last two are symmetric to them, except they're lower bounds. So for those definitions we have $\geq$ instead of $\leq$.
The little-$o$ and big-$\Omega$ cases are where it's especially important to remember that $c$ can be a value less than 1. If we forget that, then in the little-$o$ case, our proof will probably be wrong or incomplete. And in the big-$\Omega$ case, we might not be able to get our proof to go through at all.
Historical notation
Historically, set notation such as $f(n) \in O(g(n))$ is not used. Instead, we'd write $f(n) = O(g(n))$, and we'd read this as, "$f(n)$ is big-$O$ of $g(n)$." So we can use that in our previous formal definitions if we'd like.
It's worth noting that all of the other growths can be defined by just using the big-$O$ definition. For example, $f(n) \in \Omega(g(n))$ exactly when $g(n) \in O(f(n))$. But those definitions probably aren't as constructive as the ones listed above in terms of helping us prove membership.
Lots of books have slight differences in how the different asymptotic sets are defined, but most are defining the same sets, especially if we only consider increasing positive functions. The definitions above are slight modifications of the ones given in CLRS. But they actually define the same sets as CLRS, even if the definitions are slightly altered. For example, in [21], for little-$o$, the definition uses $<$ instead of $\leq$, the only difference being that the inequality is strict. But they define the same set.
Analogy
The reason the asymptotic sets above are put in the order they are in is because there's some analogy here:
If $f(n) \in o(g(n))$, then $f(n) \in O(g(n))$, but it's not in any of the other sets.
If $f(n) \in \Theta(g(n))$, then $f(n) \in O(g(n))$ and $f(n) \in \Omega(g(n))$, but $f(n) \notin o(g(n))$ and $f(n) \notin \omega(g(n))$.
The relations remarked on above look kind of like inequalities:
- $o$ is sort of like $<$
- $O$ is sort of like $\leq$
- $\Theta$ is sort of like $=$
- $\Omega$ is sort of like $\geq$
- $\omega$ is sort of like $>$
This explains why there are naturally five asymptotic sets.
Example 3
For the last example, we'll look at the $\Theta$ bound in the context of some new functions:
We want to consider a function and show what it grows like. Let's prove our $\Theta$ bound by proving both upper and lower bounds (we'll give some $c$ and $n_0$ values that work, but we should verify that the given values actually do):
- because for , ,
- because of the preceding and succeeding lines
- because for , ,
We can also add in a faster-growing and a slower-growing function if we want to see all five asymptotic definitions in action:
We have the following for :
- because for , ,
- because for , ,
- because for , ,
- because of the preceding and succeeding lines
- because for , ,
- because for , ,
- because for , , .
It may be worth working through the details to verify everything above.
To repeat one point, now that we've seen a concrete example: if a function grows like another function (i.e., it's big-$\Theta$ of it), then it cannot be little-$o$ or little-$\omega$ of that function:
Proofs, properties, and pictures
Proofs about functions
One kind of task that students are often asked to do is to prove that one function's growth is bounded by another function's growth. The goal is to get students more familiar with the asymptotic definitions and formal proofs, and to build an intuition for how quickly different functions grow.
So let's start with the almost trivial example involving and :
Our goal will be to show that . For this, we'll go directly to the definition:
We'll see the following:
So the definition holds. Done. Obviously, with simple enough problems, the proof is very easy when there aren't any algebraic hangups with using the definition directly. But what if we take the functions and :
The function with base grows doubly exponentially slow, but the exponent, , grows large but not too quickly. But, in all, we don't have a great intuition for and its growth. Just plugging it into the definition above for little- doesn't seem like it's going to give us much in the way of a fruitful approach for comparing the growth of and .
This is where a lot of students get stuck and don't know where to go. On the surface, we may not even have a clue as to which one grows faster:
We may just look at the picture above of the graphs and decide we're good. For now, we'll just put this problem aside.
Proofs about properties
Students are also often asked to prove properties about asymptotic growth. So let's say we're given two facts: one function is little-$o$ of another, and, for any positive constant, the dominating function eventually grows larger than that constant for all sufficiently large $n$. We want to show that what we're given implies a third relationship. More formally:
We won't tease out all the little details of the proof, but it really begins with working with the definition of little-$o$:
After a little bit of algebra, we also use the fact that the dominating function grows past any constant. Combining these different facts and picking a big enough $n_0$-value so that all of the facts apply (in the bullet points below), we get the little-$o$ definition of what we're trying to prove, and we're done (last bullet point below).
For arbitrary :
- Let , find .
- Pick: . Then, for , :
- and . Use below:
Why are proofs like these assigned? They help one gain familiarity with proofs and definitions, but the main thing is that this is part of the science in "computer science". Also, it might actually be useful ... but not seeing any concrete functions can lead one to think that exercises like the one above are just way too abstract, and that the resulting theorems and properties aren't too interesting or useful.
"Proof" by picture
But then one runs into a problem like the following and just toils away:
It's tempting to just make a plot of the functions like the one above and call it a "proof". But it definitely is not a proof. A plot can give us an idea, but we need more than that to be precise and fully confident in our result.
Proofs about functions using properties
The plot above turns out to be misleading. Shock! To find the relationship between the functions being plotted, let's go back to the property we had before:
If we have and , then taking the logarithm of each function gives us
Suddenly things become much easier. They each have a term, but while is a constant, we know that grows. It may grow extremely slowly, but the point is that it grows. It's not constant. It eventually grows to infinity. Since our property specified above holds, we have that
implies
Above, everything has been simplified so much that we can even see the "crossing point":
So passes when . But, of course, no one would actually plot the graph of these two functions that far out on a whim.
Of course, we could argue the $n$-value above is so big that it doesn't matter. And maybe it doesn't matter for some real-life applications. But the bottom line is that we should be firmly convinced that a picture is not "proof" of anything. Use the definitions and use the properties.
Runtime of simple programs and loops
Non-loop example
Let's start with a really simple program comprised of only a few lines:
findRoot(a, b, c) {
tmp = b * b
tmp = tmp - 4 * a * c
tmp = sqrt(tmp)
tmp = tmp - b
tmp = tmp / (2 * a)
return tmp
}
Each line takes a fixed amount of time. That doesn't mean that the lines take the same amount of time as each other. The subtraction line (line 5) is probably quicker than the line with both subtraction and multiplication (line 3), but if we're just trying to get the asymptotic runtime of the program, then we only need to know that each line individually has some fixed, constant amount of time it can take. We then run all of the lines and see that the program takes a constant amount of time total.
We usually assume we have fixed-size values (e.g., 32-bit or 64-bit). Then we can say that simple arithmetic operations take constant time, but what about the square root operation? That function might actually loop through several iterations of simpler arithmetic, but with fixed-size values, we can still say that it takes constant time; that is, if the size of the numbers being processed is fixed and bounded, such as 32-bit or 64-bit, then even if the square root function involves multiple iterations or steps internally, the total number of operations is bounded by a constant due to the fixed size of the inputs. If we were to work with arbitrarily large numbers (e.g., arbitrary-precision arithmetic), then the complexity of the square root function would depend on the number of bits, making it a non-constant-time operation; in that case, the time complexity would depend on the number of bits and on the method used to calculate the square root.
For more complex functions though, we might need to know or calculate their runtime.
Max - simple loop
Consider the following simple program:
replaceMax(A[1...n], val) {
index = findMaxIndex(A)
tmp = A[index]
A[index] = val
return tmp
}
It takes an array, A, and replaces the maximum value in that array with some new value and returns the old max. We need to be careful of line 2, where we use the findMaxIndex method. How long it takes to run depends on how large A is. That may seem very obvious in this context, but when using programming language library functions like indexOf for Java's ArrayList, people sometimes forget.
The program above takes a constant amount of time plus the amount of time needed to run the findMaxIndex method:
findMaxIndex(A[1...n]) {
tmp = 1
for (i = 2 to n)
if (A[tmp] < A[i])
tmp = i
return tmp
}
How long does the program above take? If we exclude the loop, it looks great. It takes some constant amount of time. What constant? We need to allocate some variables, assign some values, etc. For the sake of having something specific, let's say the non-loop part of the program has a constant runtime of 7. Whatever value we choose won't be important in the end because we'll see that it can be brushed into our asymptotic notation. Returning to our program, we next have our loop. The loop iterates $n - 1$ times, and to get the runtime of the loop, we have to add up the loop's time over all those iterations. Because we have an upper bound of $n$ iterations, we can use that to simplify the math. For each iteration, we have an increment, a couple of comparisons, maybe an assignment or two; again, we have a fixed number of operations. Let's call that 9.
Max - upper bound
At this point, if we let $T(n)$ represent the runtime of our program, we're looking at something like the following for an upper bound on the runtime of our program:
With just a bit of manipulation, we can see that we can get a linear upper bound on $T(n)$ using our big-$O$ notation:
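Using the constants 7 and 9 from above, the algebra goes something like this:

$$T(n) \leq 7 + 9(n - 1) = 9n - 2 \leq 16n \quad \text{for } n \geq 1,$$

so $T(n) \in O(n)$ with, say, $c = 16$ and $n_0 = 1$.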
The constants chosen above really weren't all that important. Let's demonstrate this. We could do the same calculations with any constants:
Max - lower bound
We can do the same thing for a lower bound:
Here, even if we ignore every part of the program that isn't in the loop, and assume each loop iteration takes just a single unit of time, we can still get a lower bound on the runtime:
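For instance, charging just one unit of time per loop iteration gives a sketch like:

$$T(n) \geq n - 1 \geq \frac{n}{2} \quad \text{for } n \geq 2,$$

so $T(n) \in \Omega(n)$ with $c = \frac{1}{2}$ and $n_0 = 2$.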
Because the upper and lower bounds are both linear for the replaceMax program, we've proven that the runtime is $\Theta(n)$:
Bubble sort - nested loops
Of course, we could look at more complex single loops where each iteration through the loop takes a different amount of time, but we can also see an example of that in the outer loop of Bubblesort's nested loops:
Bubblesort(A[1...n]) {
for (i = 1 to n)
for (j = n downto i + 1)
if (A[j] < A[j - 1])
swap(A[j], A[j - 1])
}
While Bubblesort's performance is not the most incredible, its code is quite nice for analysis. So let's drop the constant term outside of the loops because we know it isn't going to dominate the runtime. We run through $n$ iterations of the outer loop, and for each iteration there's an inner loop, and we need its runtime. That will be another sum over the iterations of the inner loop. And the time of each inner-loop iteration is ... some constant like 7 again:
Bubble sort - upper bound
While each iteration of the outer loop can have a different runtime, we sum over all of them, chug through some algebra, and then get a runtime of no more than a constant times $n^2$:
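With the per-iteration constant of 7 from above, and folding the outer loop's own constant work into that 7, the sum works out roughly as follows:

$$T(n) \leq \sum_{i=1}^{n} \sum_{j=i+1}^{n} 7 = 7 \cdot \frac{n(n-1)}{2} \leq \frac{7}{2} n^2.$$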
Note how this meshes with our big-$O$ definition:
Depending on how tight we want to get the analysis, we could have made some simplifications. Instead of working out the algebra perfectly like we did above, we could have used more inequalities earlier on in the chain of simplifications:
These new simplifications still result in a runtime upper bound of $O(n^2)$. The constant is pretty arbitrary because the 7 was obviously rather arbitrary.
Bubble sort - comparisons
On the other hand, if we want to count some key operation like the comparison, or maybe the swap, then we could try to calculate that number more precisely. We can get rid of all occurrences of the arbitrary constant 7 and say that Bubblesort takes no more than $\frac{n^2}{2}$ comparisons:
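Counting one comparison per inner-loop iteration:

$$\sum_{i=1}^{n} \sum_{j=i+1}^{n} 1 = \frac{n(n-1)}{2} = \frac{n^2}{2} - \frac{n}{2} \leq \frac{n^2}{2}.$$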
And we can make the same "aggressive" simplifications as before:
The $\frac{n^2}{2}$ term in the previous set of calculations really is more informative, though, because that somewhat arbitrary 7 is gone and now the 1/2 means something. In either case, noting that the algorithm's runtime is proportional to the number of comparisons, we still get a big-$O$ upper bound of $O(n^2)$ for the runtime.
Bubble sort - lower bound
For our lower bound, first recall the definition of big-$\Omega$ we're trying to satisfy:
We use similar math compared to what we did for the upper bound:
But this simplification from before doesn't quite work. We can't make the inside loop larger when proving a lower bound. Working the math out yields the lower-order term of $-\frac{n}{2}$, which is kind of a pain because now we have that the count is bigger than $\frac{n^2}{2}$ minus something; but for our definition, we want it to just be bigger than a constant times $n^2$. We can't simply ignore the smaller term of $\frac{n}{2}$.
So we could pick a suitably small constant and an $n_0$ that works and be on our merry way:
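For instance, with a constant of $\frac{1}{4}$, the needed inequality holds once $n$ is at least 2:

$$\frac{n^2}{2} - \frac{n}{2} \geq \frac{1}{4} n^2 \iff \frac{n^2}{4} \geq \frac{n}{2} \iff n \geq 2.$$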
If we wanted to use a bigger constant, then we would need to use a larger $n_0$:
We could actually use any constant less than $\frac{1}{2}$. If we wanted to use a more sophisticated analysis, then we could say the number of comparisons is at least one half of $n^2$ minus some function that grows no faster than linearly in $n$:
That's a lower bound on the number of comparisons by taking an upper bound on that negative function.
Combining the results above for the upper and lower bounds of Bubblesort, we have a $\Theta(n^2)$ runtime:
Usage of asymptotic notation
For algorithm analysis, we generally talk about one or more of the following:
- Best case
- Average case
- Worst case
Analyzing for the best case is usually too optimistic to be helpful. For instance, the best case in a linear search is that the first element you look at is what you need. This is constant time. That's not helpful. On the other hand, we might consider what happens when the search fails (i.e., we never found our target and we had to loop through the entire list to be certain). Even in the best case, when the search fails, that search is going to take linear time. The best case for a program like merge sort or heap sort might be interesting, but it still probably isn't useful.
Analyzing for the average case is good if we either know what distribution of the data will be (which might be hard) or we have a randomized algorithm and we take the average over our own random choices. Especially for randomized algorithms, analysis of the average case can be really useful, but sometimes the math can get really difficult.
Analyzing for the worst case is probably the most common because it's pessimistic. Whatever we come up with ... can't be worse than that! And even if we're very pessimistic, if we can prove good bounds, then we will have a strong statement. Oftentimes worst case analysis is a lot easier than average case analysis in terms of the math involved. And for a lot of algorithms, the worst case and average case order of growth may very well be the same.
For example, considering merge sort over its different inputs, the best, worst, and average case runtime is order $n \lg n$. The worst case analysis is good enough to tell us most of the story. But for quicksort it tells us a lot less, because quicksort's worst case performance is worse than the overwhelming majority of its runs.
So we see that which type of analysis we might use depends on which algorithm we're discussing. Besides talking about algorithm performance, we can also talk about problem complexity. So let's consider the problem of sorting, specifically using some comparison-based sorting algorithms:
- InsertionSort: worst case sorting is an $O(n^2)$ problem.
- QuickSort: worst case sorting is an $O(n^2)$ problem.
- MergeSort: worst case sorting is an $O(n \lg n)$ problem.
Different algorithms with different runtimes can solve the comparison-based sorting problem, some being faster, others being slower. But we can show that any comparison-based sorting algorithm must use at least order $n \lg n$ comparisons; that is, comparison-based sorting takes $\Omega(n \lg n)$ comparisons in the worst case. This goes beyond saying that algorithms we've already discovered need that many comparisons; it even holds for algorithms nobody has thought up yet. Any comparison-based sort will need order $n \lg n$ comparisons in the worst case.
MergeSort, which is a comparison-based sorting algorithm, can sort using $O(n \lg n)$ comparisons. That means that comparison-based sorting is a $\Theta(n \lg n)$ comparison problem. It requires that many comparisons from any algorithm, and the bound can be achieved. It doesn't matter that worse algorithms are out there like BubbleSort.
Recurrence relations
Recursion and recurrence relations
Fibonacci
Let's start with the famous Fibonacci sequence:
F_0 = 0
F_1 = 1
F_{i > 1} = F_{i - 1} + F_{i - 2}
Fibonacci(n) {
if (n < 2)
return n;
return Fibonacci(n - 1) + Fibonacci(n - 2);
}
The above is a terrible recursive program to calculate the $n$th Fibonacci number. Assuming the numbers stay small enough to add in constant time, and ignoring the cost of the recursive calls themselves, it takes constant time to calculate:

$$T(n) = T(n-1) + T(n-2) + \Theta(1)$$
But we have those recursive calls. The relation above is known as a recurrence relation. Recurrence relations will pop up whenever we want to analyze the runtime of programs that involve some sort of recursive call. To be complete, we can give runtimes for the base cases as well: $T(0) = \Theta(1)$ and $T(1) = \Theta(1)$. But generally, the base case is just going to be that the program takes some constant amount of time to run for some fixed-size inputs.
In this example, using the Fibonacci sequence, the runtime actually grows like the Fibonacci sequence itself since the recurrence relation basically is the Fibonacci sequence.
BowlaNachos
But we can imagine many scenarios where this would not be the case:
B_0 = 0
B_1 = 1
B_{i > 1} = B_{i - 1} x B_{i - 2}
BowlaNachos(n) {
if (n < 2)
return n;
return BowlaNachos(n - 1) x BowlaNachos(n - 2);
}
What will the runtime of the program above be? We would have the following:
Notably, the runtime still grows like the Fibonacci sequence! The program still has two recursive calls to problems with the same size arguments as the Fibonacci program (i.e., $n - 1$ and $n - 2$). So its runtime would be basically the same as that of the Fibonacci program.
But the sequence isn't so interesting because it's 0 everywhere except at B_1 = 1.
Recursive linked list search
Let's move to a slightly more useful program. Imagine we had an unsorted linked list of different values, and we want to search those values for one value in particular, val, and we'll just return true or false depending on whether it's in the list or not. If the list has n items, then we could test the first item and then recursively search the rest of the list:
list.search(val) {
if (list.head == nil)
return false;
return list.head.search(val);
}
node.search(val) { # T(n) <=
if (node.val == val)
return true; # Θ(1) +
if (node.next == nil)
return false; # Θ(1) +
return node.next.search(val); # T(n - 1)
}
The check at the current location (line 8) takes constant time. If we don't find the value, then we have a recursive call to the next item in the list, which has one fewer item in its remaining list (line 12).
This gives us another recurrence relation:

$$T(n) \leq T(n-1) + \Theta(1)$$

Note the "$\leq$" sign above. For the program above, if we find the value, then we just stop and there wouldn't be any more recursive calls. Here, we can only say the program above has a constant-time lower bound.
Merge sort
Let's now look at merge sort:
mergeSort(A[], start, end) { # T(n) =
if (start >= end) {
return; # Θ(1) +
}
mid = (start + end) / 2; # Θ(1) +
mergeSort(A, start, mid); # T(ceil(n / 2)) +
mergeSort(A, mid + 1, end); # T(floor(n / 2)) +
merge(A, start, mid, end); # Θ(n)
}
merge(A, start, mid, end)
Θ(n) merging code goes here
For the program above, we recursively sort the first half of the array, then the second half, then merge the two sorted halves together to sort the entire array. The merge and all of the other non-recursive work take linear total time. And the two recursive calls are each on about half the array, so we get a recurrence like the following:

$$T(n) = T\left(\left\lceil \tfrac{n}{2} \right\rceil\right) + T\left(\left\lfloor \tfrac{n}{2} \right\rfloor\right) + \Theta(n)$$
We see that the floors and ceilings are a bit ugly, and it'd be much nicer to not have to worry about them.
Justifying rounding
We can sometimes pretend that all divisions happen nicely, and we'll give somewhat of a justification here for why we can do that.
Consider the expression
as opposed to
- If $n$ is a power of 2, then the two versions above are equivalent, and we get an order $n \lg n$ solution (subsequent sections will show how to solve recurrence relations).
- If $n$ is not a power of 2, then imagine solving for the surrounding perfect powers of 2. It turns out that both of those solutions, the one that's too big and the one that's too small, grow like $n \lg n$.
- If $T(n)$ is sandwiched between two functions that both grow like $n \lg n$, then $T(n)$ is itself $\Theta(n \lg n)$.
More generally, for increasing functions (which effectively are all functions that model runtime), any recurrence should be bounded between the cases of rounding everything down versus rounding everything up. Hence, for
where $T(n)$ is assumed to be an increasing function, we have
If the two cases of "rounding everything up" and "rounding everything down" give the same order of growth, then it seems like it should be safe to round.
When using the substitution method to solve recurrences, we'll adhere to the following guidelines:
- assume all divisions happen without remainder
- solve for
- check that
- rounding should not have mattered (the under- and over-estimates will both use an inductive step with the same order of growth)
- everything above works for a huge group of functions, at least constant through exponential, that is, $\Theta(1)$ through $c^n$ for constant $c$. It doesn't happen to be true for some extremely fast-growing functions, but once we get to those runtimes, rounding won't be our biggest issue.
Substitution method
Linked list search
Let's start with the recurrence for searching a linked list. Given
we want to prove that $T(n) \in O(n)$ using the big-$O$ definition:
The basic technique of the substitution method is to just guess the answer and then try to prove that your guess is correct using induction. In the case above, we guess that the relation is $O(n)$. Inductively, we'll assume the bound holds for smaller arguments (e.g., $T(n-1) \leq c(n-1)$)
and try to prove that this implies the relation holds for $T(n)$, the total runtime of the algorithm.
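With the simplified recurrence $T(n) \leq T(n-1) + 1$ and the inductive assumption $T(n-1) \leq c(n-1)$, the step looks like:

$$T(n) \leq T(n-1) + 1 \leq c(n-1) + 1 = cn - c + 1 \leq cn \quad \text{whenever } c \geq 1.$$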
We see the above is true for a ton of $c$-values. Let's just pick one. It's true for $c = 1$ (the $n_0$ is not important in this case). By our definition of big-$O$, we've proven our recurrence is $O(n)$.
Simplifications
To be fair, we did make use of some simplifications in the calculations above. The first few have to do with the constants we used. For the base case of the recurrence, we never actually referred to it explicitly. From what we proved previously, we implicitly assumed the base cases satisfy the bound. But do they? We'll come back to $n_0$ in a bit, but right now let's consider what happens if the base case doesn't take exactly one unit of time.
Similarly, we modified the actual recurrence and assumed that the $\Theta(1)$ term was 1. What if it was some other constant? Here, we're trying to solve an equation for time, but we haven't defined what 1 unit of time is anyway. We can ignore these issues by "renormalizing" our time units. Note that the 1 in $T(n-1)$ doesn't get renormalized: that 1 is not a time measurement, it's just a number. Also, the merge sort recurrence we'll see next has a multiplier of two. We can't renormalize that out of the equation; it's also unitless. But these other changes we make are okay.
If we're trying to get a perfect solution to the recurrence, then we need the exact base cases; otherwise, realizing that for a fixed-size input our program should take a fixed amount of time, we can define our units in terms of that fixed time for any small-sized inputs we want. Also, sometimes we get some math where we need an $n_0$, but otherwise we just use it to let us ignore some stuff we want to ignore.
So even if we want to search an empty list with no items, that would take some positive time. We can't actually get a runtime of zero no matter how we renormalize. But we just ignore that and say we don't care about zero; we only care about $n$-values of at least 1. We can ignore small values using the $n_0$ in our big-$O$ definition.
To recap:
- Renormalizing issues:
- Base case:
- ? ?
- What if ?
- Fixed term:
- Relation: .
- Assumed: .
- What if ?
- 1 new unit = old units.
- new units old units.
- old units new units.
- Base case:
- We tend to ignore $n_0$ in the inductive step.
- We use it when we want to ignore some low-end items.
Merge sort upper bound
Next, we want to prove that merge sort grows no faster than $n \lg n$. Given
we want to prove $T(n) \in O(n \lg n)$ using the big-$O$ definition:
Just like before, we inductively assume
and plug in and knock out some algebra:
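With the (rounded) recurrence $T(n) = 2T(n/2) + n$ and the assumption $T(n/2) \leq c \cdot \frac{n}{2} \lg \frac{n}{2}$, a sketch of the algebra:

$$T(n) \leq 2c \cdot \frac{n}{2} \lg \frac{n}{2} + n = cn \lg n - cn + n \leq cn \lg n \quad \text{whenever } c \geq 1.$$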
True for $c = 1$ (the $n_0$ is not important in this case).
Guess too big
Let's not get into where the guesses came from just yet for the proof above, but let's explore what happens when we guess incorrectly. What if we guess too big? That is, instead of guessing $n \lg n$, suppose we guessed $n^2$. Then we'd want to prove $T(n) \in O(n^2)$ using the big-$O$ notation:
We can try to prove it inductively by assuming
Then let's see what happens:
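Plugging the $n^2$ guess into the same recurrence, something like the following happens:

$$T(n) \leq 2c \left(\frac{n}{2}\right)^2 + n = \frac{c}{2} n^2 + n \leq c n^2 \quad \text{whenever } n \geq \frac{2}{c}.$$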
So it's true for any $c$, as long as $n$ is big enough. So in this case we didn't actually need to calculate an exact $n_0$ value. The smaller the $c$-value is, the larger the $n_0$-value will be. So if we pick $c = \frac{1}{100}$, then $n_0 = 200$. We use the $n_0$ term to assume that $n$ is at least 200. And if $n$ is less than 200, then we use the fact that the program takes a fixed time for a fixed number of values, renormalize to the time that's needed, and assume that everything holds for our base cases there.
What we've really just proven is the stronger statement that $T(n)$ is $o(n^2)$, since we've shown the above holds for any $c$ so long as $n$ is big enough.
Guess too small
What happens if we guess too small? Suppose we're given the same merge sort recurrence and we try to prove that $T(n) \in O(n)$ using the big-$O$ definition:
We inductively assume
Now we can try to plug in and knock out some algebra:
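With the $O(n)$ guess, the algebra would go something like:

$$T(n) = 2T\left(\frac{n}{2}\right) + n \leq 2c \cdot \frac{n}{2} + n = (c + 1)n,$$

and $(c+1)n$ is never at most $cn$, no matter how large we make $c$.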
The proof fails. It doesn't prove anything. But in this case we can make some simple changes to our calculations and get a different proof.
Given
let's prove that $T(n) \in \omega(n)$ using the little-$\omega$ definition:
Let's prove it inductively. Assume
Plug and play:
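The same algebra, with the inequality flipped, looks like:

$$T(n) = 2T\left(\frac{n}{2}\right) + n \geq 2c \cdot \frac{n}{2} + n = (c + 1)n \geq cn.$$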
We see it holds for any $c$.
The successful proof above for little-$\omega$ is the same as the failed proof for big-$O$, except the $\leq$ was changed to a $\geq$. So if $T(n)$ is $\omega(n)$, then $T(n)$ grows strictly faster than $n$, so it definitely cannot be $O(n)$. This conclusively shows the previous proof had to fail because we were trying to prove something that was false.
Tight lower bound proof
To be complete, let's give a proof of the tight lower bound. Given
we want to prove that $T(n) \in \Omega(n \lg n)$ using the big-$\Omega$ definition:
We will prove it inductively. Assume
We get the following
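A sketch of the algebra, assuming $T(n/2) \geq c \cdot \frac{n}{2} \lg \frac{n}{2}$:

$$T(n) \geq 2c \cdot \frac{n}{2} \lg \frac{n}{2} + n = cn \lg n - cn + n \geq cn \lg n \quad \text{whenever } c \leq 1.$$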
True for $c = 1$. We have $T(n) \in \Theta(n \lg n)$.
Failed proof
Let's try to learn something from a failed proof for another recurrence. Given
we want to prove that the solution is $O(n)$ using the big-$O$ definition:
We go through our normal steps by trying to prove this inductively. Let's assume
We get the following:
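To make the failure concrete, suppose (just as an illustrative stand-in, since any recurrence with this general shape behaves the same way) the recurrence were $T(n) = 2T(n/2) + 1$ and we assumed $T(n/2) \leq c \cdot \frac{n}{2}$:

$$T(n) \leq 2c \cdot \frac{n}{2} + 1 = cn + 1,$$

and $cn + 1$ is never at most $cn$.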
This is obviously never true for any $c$. What does a failed proof imply, if anything? Does it imply that the thing we are trying to prove is false (i.e., that the recurrence isn't linear)? No. A failed proof doesn't really mean anything. Of course, if something isn't true, then our proof of that false thing had better fail! But not being able to prove something with a faulty proof doesn't mean the thing you are trying to prove is false.
Realistically, a failed proof is an opportunity to reflect. Maybe we should jump to a bigger function with a higher order of growth than the one we guessed; those proofs will go through easily.
Lower order terms
But what if we just go a tiny bit bigger? Given
let's try to prove the following using the big-$O$ definition: