
Asymptotic notation and analysis, recurrence relations, and random background math topics (with attitude)

Daniel Farlow
Software Engineer

This post explores asymptotic notation and analysis, namely big-$O$ notation and a variety of related properties. This sets the stage for discussing recurrence relations and ultimately the Master Theorem for analyzing the time complexity of algorithms described by recurrence relations. Finally, we take a look at a few isolated math topics that will prove fruitful for future analyses.

Attribution

The notes below come from the Algorithms with Attitude YouTube channel, specifically the following playlists: Asymptotic Notation and Analysis, Recurrence Relations, and Random Background Math Topics.

There are several sections in this post, accessible in the table of contents on the right, but most of the sections have been listed below for ease of reference and navigation.

Asymptotic notation and analysis

Definitions

Runtime

For a given program, we'd like to know how runtime will change compared to the input size:

But there are some issues here:

  • What machine are you running the program on?
  • What compiler are you using?
  • How much precision is needed?

These issues make a precise discussion concerning runtime incredibly difficult unless we introduce some simplifications. Asymptotic notation is going to be an incredibly useful simplification. It will also make it possible for us to talk about the efficiency of algorithms or pseudocode. There, we might count clock cycles, total operations, or just a few specific operations.

Simple categories

If we wanted to discuss a function like $5n$ operations, then it's clearly linear. So is $2n + 20$:

But what if we had a small non-linear term:

In this case, it looks like at about $n\geq 8$ the red function is sandwiched between the other two linear functions: the blue function (upper) and green function (lower). If we go even further out, then we'll see that our new function really is "linear-ish":

It looks very close to the $2n + 20$ linear function.

Goals

Asymptotic notation lets us group runtimes into easy-to-discuss sets. It becomes possible, logical, but most importantly, meaningful to describe a runtime as "linear-ish" or "the runtime grows like $n^2$". Asymptotic notation gives us a concise, precisely defined language for making just such statements. It also makes it possible for us to ignore small $n$-values (like $n$-values less than $8$ in the previous figures).

Big-O definition

The notation $O(g(n))$ is used to define the set of functions which grow no faster than $g(n)$. We're going to ignore multiplicative constants (e.g., $5n^2$ vs. $n^2$ ... that's like changing to a machine that's 5 times faster — everything on that machine is 5 times faster, but we still want to know for a particular program how its runtime is going to scale with input size). We can also ignore small values of $n$. Generally speaking, we benchmark difficult cases (i.e., really large inputs), not easy ones. No one brags that their algorithm calculates a maximum really quickly over two or three elements.

Formal definition of big-O

If we're interested in defining what $O(g(n))$ means in a formal manner, then we can do so as follows:

$$O(g(n))\equiv \{ f(n)\mid \exists c > 0, n_0\ni\forall n\geq n_0, 0\leq f(n)\leq c\cdot g(n) \}$$

To use this definition constructively, we just ask if that rule holds for some particular function $f(n)$, in which case that function $f(n)$ is in the set $O(g(n))$:

$$f(n) \in O(g(n))\Leftrightarrow \exists c > 0, n_0\ni\forall n\geq n_0, 0\leq f(n)\leq c\cdot g(n)$$

The condition above that $f$ be non-negative (i.e., $\textcolor{red}{0\leq f(n)}$) is often excluded because program runtimes are assumed to not be negative!

Notational options

Both notational statements above may benefit from the use of parentheses. For the uninitiated, if some of the symbols don't look familiar, then referencing the logic symbols may be helpful. For the sake of consistency with the video, we'll stick with the video's notation used throughout the examples and proofs.

Example 1

Let's start with a simple example:

We have $f(n) = 5n$, and we want to prove $5n\in O(n)$. We'll start with our definition:

$$f(n)\in O(g(n))\Leftrightarrow \exists c > 0, n_0\ni\forall n\geq n_0, f(n)\leq c\cdot g(n)$$

For this example, we'll temporarily ignore $n_0$:

$$f(n)\in O(g(n))\Leftrightarrow \exists c > 0, \textcolor{gray}{n_0\ni\forall n\geq n_0}, f(n)\leq c\cdot g(n)$$

We'll come back to it later. Next, we fill in specifics for our functions using the definition above: $f(n) = 5n$ and $g(n) = n$.

$$5n\in O(n)\Leftrightarrow \exists c > 0, \textcolor{gray}{n_0\ni\forall n\geq n_0}, 5n\leq c\cdot n$$

Now let's see if our rule holds. Let $c = 6$:

$$5n\in O(n)\Leftrightarrow \exists\textcolor{red}{c(=6)}, \textcolor{gray}{n_0\ni\forall n\geq n_0}, 5n\leq \textcolor{red}{c}\cdot n = \textcolor{red}{6}\cdot n$$

It's clearly true that $5n\leq 6n$ for $n\geq 0$. Hence, we could actually have $n_0 = 0$:

$$5n\in O(n)\Leftrightarrow \exists\textcolor{red}{c(=6)}, n_0\ni\forall n\geq \textcolor{red}{0}, 5n\leq \textcolor{red}{c}\cdot n = \textcolor{red}{6}\cdot n$$

We thus see that $5n\in O(n)$.
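
As a quick sanity check (not a proof), here's a small Python snippet of my own that spot-checks the witness $c = 6$, $n_0 = 0$ over a range of $n$-values:

```python
# Spot-check (not a proof): verify f(n) = 5n <= c * g(n) = 6n for sampled n >= n_0 = 0.
c, n0 = 6, 0
f = lambda n: 5 * n
g = lambda n: n

assert all(f(n) <= c * g(n) for n in range(n0, 10_000))
print("5n <= 6n held for every sampled n >= 0")
```

Checking finitely many values can only ever suggest the bound, of course; the algebra above is what actually proves it.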

Example 2

Next, we consider the not-perfectly-linear function $f(n) = 2n + 6\sqrt{n} + 30$:

But we still want to show that $f(n)\in O(g(n))$, where $f(n) = 2n + 6\sqrt{n} + 30$ and $g(n) = n$. We again start with our definition:

$$f(n) \in O(g(n)) \Leftrightarrow \exists c>0, n_0\ni\forall n\geq n_0, f(n)\leq c\cdot g(n)$$

Now we plug in our particular $f(n)$ and $g(n)$:

$$2n + 6\sqrt{n} + 30 \in O(n) \Leftrightarrow \exists c>0, n_0\ni\forall n\geq n_0, 2n + 6\sqrt{n} + 30\leq c\cdot n$$

Suppose we let $c = 4$:

$$2n + 6\sqrt{n} + 30 \in O(n) \Leftrightarrow \exists\textcolor{red}{c(=4)}, n_0\ni\forall n\geq n_0, 2n + 6\sqrt{n} + 30\leq \textcolor{red}{c}\cdot n = \textcolor{red}{4}\cdot n$$

After a bit of tinkering and algebra, suppose we choose $n_0 = 36$:

$$2n + 6\sqrt{n} + 30 \in O(n) \Leftrightarrow \exists c(=4), \textcolor{red}{n_0(=36)\ni\forall n\geq n_0}, 2n + 6\sqrt{n} + 30\leq 4\cdot n$$

Then we need to show that $2n + 6\sqrt{n} + 30\leq 4n$ when $n\geq 36$:

$$6\sqrt{n}+30\leq 2n?\quad n\geq 36\Rightarrow 6\sqrt{n}+30\leq n+n = 2n\Rightarrow 2n+6\sqrt{n}+30\leq 4\cdot n$$

Basically, if $n\geq 36$, then note that $6\sqrt{n}\leq n$ and $30\leq n$; that is, $2n + 6\sqrt{n} + 30\leq 2n + n + n = 4n$. Hence, $f(n)\in O(g(n))$.

Why is $6\sqrt{n}\leq n$? Since we know that $n$ is positive, we can rearrange the original inequality as $6\leq\frac{n}{\sqrt{n}}$, which simplifies to $6\leq\sqrt{n}$. Squaring both sides, we get $36\leq n$, which is exactly the condition that $n\geq 36$.

The deduction above is quite simple — it's easy to imagine examples where much more complicated mathematical expressions could be involved, and we might need to rely on advanced mathematics to work out a suitable, convincing proof.

Simplification of functions

What does working out the proof above actually do for us? Let's see:

  • 2n+6n+30O(n)2n + 6\sqrt{n} + 30\in O(n) (potentially helpful)
  • nO(2n+6n+30)n\in O(2n + 6\sqrt{n} + 30) (less helpful)
  • 2n+6n+30O(n2)2n + 6\sqrt{n} + 30\in O(n^2) (weaker statement ... we prefer the first one because smaller upper bounds are stronger statements)

Precision of use

The above shows there are a couple of different ways that asymptotic notation is used. For HeapSort, for example, most texts would say HeapSort uses $O(n\lg n)$ comparisons. But some texts (e.g., Sedgewick, Knuth, some research papers, etc.) are more precise and would say HeapSort uses no more than $2n\lg n + O(n)$ comparisons; that is, take the set of functions that is $O(n)$ and, to each of those functions in the set, add exactly $2n\lg n$ to get a new set. And the maximum number of comparisons used by HeapSort is no bigger than some function in that set.

Here, with what we've done so far, only the smaller-order terms are brushed into the asymptotic notation. This allows for comparison between different algorithms of the same order of growth, like $n\lg n$, and helps us in thinking about optimization because in real life, of course, multiplicative factors do matter. In this usage, the big-$O$ notation has the same meaning, but it's used differently. It's more sophisticated and informative but also more mathematically challenging.

The first way of using $O$-notation is more widespread.

Example 2 revisited

Going back to our second example:

We've seen that $f(n)\in O(n)$, and we mentioned that $f(n)\in O(n^2)$ too.

This may seem obvious, but exploring it will lead us to another useful asymptotic definition. We put up our definition of big-$O$:

$$f(n) \in O(g(n)) \Leftrightarrow \exists c>0, n_0\ni\forall n\geq n_0, f(n)\leq c\cdot g(n)$$

And we fill in the specifics for our functions $f(n)$ and $g(n)$:

$$2n + 6\sqrt{n} + 30 \in O(n^2) \Leftrightarrow \exists c>0, n_0\ni\forall n\geq n_0, 2n + 6\sqrt{n} + 30\leq c\cdot n^2$$

For this proof, we'll choose $c = 1$ and $n_0 = 8$:

$$2n + 6\sqrt{n} + 30 \in O(n^2) \Leftarrow \textcolor{red}{\forall n\geq n_0 = 8, 2n + 6\sqrt{n} + 30\leq c(=1)\cdot n^2}$$

We should really prove that the values chosen above for $c$ and $n_0$ actually work, but a picture will do for now:

Going from big-O to little-o

For the example above, let's assume we've shown $f(n)\in O(n^2)$. How important was our choice of $c=1$? What would have happened if, instead, we chose $c=\frac{1}{2}$? Remember, $c$ does not need to be an integer ... just a positive constant. If we chose $c=\frac{1}{2}$, then cheating by using pictures instead of actually proving things, it seems that for $n\geq 13$ we have $f(n)\leq\frac{1}{2}\cdot n^2$:

$$2n + 6\sqrt{n} + 30 \in O(n^2) \Leftarrow \textcolor{lime}{\forall n\geq n_0 = 13, 2n + 6\sqrt{n} + 30\leq c(=1/2)\cdot n^2}$$

We can see this in the picture below:

And it turns out it doesn't matter what constant we pick here. For any $c > 0$, we can find $n_0$ such that $f(n)\leq c\cdot n^2$ when $n\geq n_0$. Hence, $2n+6\sqrt{n}+30\in O(n^2)$. The observation we just made is going to be the basis for another asymptotic definition, so-called "little $o$", where the function $f(n)\in o(g(n))$ if $g(n)$'s growth strictly dominates the growth of $f(n)$.

Little-o definition

So we have the following definition for big-$O$:

$$f(n) \in O(g(n)) \Leftrightarrow \exists c>0, n_0\ni\forall n\geq n_0, f(n)\leq c\cdot g(n)$$

And now the following definition for little-$o$:

$$f(n) \in o(g(n)) \Leftrightarrow \forall c>0, \exists n_0\ni\forall n\geq n_0, f(n)\leq c\cdot g(n)$$

Example 2 for little o

If we fill in the specifics of our functions as before for this new definition, we get the following:

$$2n+6\sqrt{n}+30 \in o(n^2) \Leftrightarrow \forall c>0, \exists n_0\ni\forall n\geq n_0, 2n+6\sqrt{n}+30\leq c\cdot n^2$$

If we simplify the above for when $n$ is at least 1, then we have the following for $n\geq 1$:

$$2n+6\sqrt{n}+30\leq 8n + 30\leq 40n$$

We can actually calculate a value for $n_0$. The value $n_0 = 40+40/c$ will do. Hence, if $n\geq n_0=40+40/c$, then $40n\leq cn^2$. Hence $2n+6\sqrt{n}+30\in o(n^2)$.
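
To make the "for every $c$" quantifier concrete, here's a small brute-force experiment in Python (my own illustration, not from the video) that, for a few choices of $c$, finds the first $n$ at which $2n + 6\sqrt{n} + 30 \leq c\cdot n^2$ starts to hold:

```python
import math

def f(n):
    return 2 * n + 6 * math.sqrt(n) + 30

# For each c, find (by brute force) the first n where f(n) <= c * n^2.
# A finite search is only suggestive, but it matches the little-o intuition:
# no matter how small c gets, some n_0 eventually works.
for c in [1, 1 / 2, 1 / 10, 1 / 100]:
    n = 1
    while f(n) > c * n * n:
        n += 1
    print(f"c = {c}: first n with f(n) <= c * n^2 is {n}")
```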

Five asymptotic sets

If we extend the little-$o$ notation above, then we can obtain a little-$\omega$ (little omega) notation as well to come up with five asymptotic sets in total:

  • $f(n) \in o(g(n))$: strict upper bound
  • $f(n) \in O(g(n))$: upper bound
  • $f(n) \in \Theta(g(n))$: upper and lower bound
  • $f(n) \in \Omega(g(n))$: lower bound
  • $f(n) \in \omega(g(n))$: strict lower bound

Some textbooks will introduce even more asymptotic notations.

Five asymptotic sets (definitions)

We can define the previous asymptotic sets formally as follows:

  • $f(n) = o(g(n)) \Leftrightarrow \forall c > 0, \exists n_0\ni\forall n\geq n_0, 0\leq f(n)\leq c\cdot g(n)$
  • $f(n) = O(g(n)) \Leftrightarrow \exists c > 0, n_0\ni\forall n\geq n_0, 0\leq f(n)\leq c\cdot g(n)$
  • $f(n) = \Theta(g(n)) \Leftrightarrow f(n)\in O(g(n))$ and $f(n)\in\Omega(g(n))$
  • $f(n) = \Omega(g(n)) \Leftrightarrow \exists c > 0, n_0\ni\forall n\geq n_0, 0\leq c\cdot g(n)\leq f(n)$
  • $f(n) = \omega(g(n)) \Leftrightarrow \forall c > 0, \exists n_0\ni\forall n\geq n_0, 0\leq c\cdot g(n)\leq f(n)$

We've already seen the first two definitions above for upper bounds. The last two are symmetric to them except they're lower bounds. So for those definitions we have $f(n)\geq c\cdot g(n)$ instead of $f(n)\leq c\cdot g(n)$.

The little-$o$ and big-$\Omega$ cases are where it's especially important to remember that $c$ can be a value less than $1$. If we forget that, then in the little-$o$ case, our proof will probably be wrong or incomplete. And in the big-$\Omega$ case, we might not be able to get our proof to go through at all.

Historical notation

Historically, set notation such as that for $3n\in O(n^2)$ is not used. Instead, we'd write $3n=O(n^2)$, and we'd read this as, "$3n$ is big-$O$ of $n^2$." So we can use that in our previous formal definitions if we'd like.

It's worth noting that all of the other growths can be defined by just using the big-$O$ definition, like $f(n)\in \Omega(g(n))\Leftrightarrow g(n)\in O(f(n))$. But those definitions probably aren't as constructive as the ones listed above in terms of helping us prove membership.

Lots of books have slight differences in how the different asymptotic sets are defined, but most are defining the same sets, especially if we only consider increasing positive functions. The definitions above are slight modifications of the ones given in CLRS. But they actually define the same sets as CLRS, even if the definitions are slightly altered. For example, in [21], for little-$o$, the definition uses $0\leq f(n) < c\cdot g(n)$ instead of $0\leq f(n)\leq c\cdot g(n)$, the only difference being that the inequality is strict (i.e., $<$ instead of $\leq$). But they define the same set.

Analogy

The reason the asymptotic sets above are put in the order they are in is because there's some analogy here:

  • $f(n) \in o(g(n))$
  • $f(n) \in O(g(n))$
  • $f(n) \in \Theta(g(n))$
  • $f(n) \in \Omega(g(n))$
  • $f(n) \in \omega(g(n))$

If $f(n)\in o(g(n))$, then $f(n)\in O(g(n))$, but it's not in any of the other sets.

If $f(n)\in\Theta(g(n))$, then $f(n)\in O(g(n))$ and $f(n)\in \Omega(g(n))$ but $f(n)\not\in o(g(n))$ and $f(n)\not\in \omega(g(n))$.

The relations remarked on above look kind of like inequalities:

  • $f(n) \in o(g(n))$ is sort of like $f(n) < g(n)$
  • $f(n) \in O(g(n))$ is sort of like $f(n)\leq g(n)$
  • $f(n) \in \Theta(g(n))$ is sort of like $f(n) = g(n)$
  • $f(n) \in \Omega(g(n))$ is sort of like $f(n)\geq g(n)$
  • $f(n) \in \omega(g(n))$ is sort of like $f(n) > g(n)$

This explains why there are naturally five asymptotic sets.

Example 3

For the last example, we'll look at the $\Theta$ bound in the context of some new functions:

We want to consider a function that grows like $n^2$. Let's prove our $\Theta$ bound by proving both upper and lower bounds (we'll give some $c$ and $n_0$ values that work, but we should verify that the given values work):

  • $f(n) = O(n^2)$ because for $\textcolor{red}{c=2}$, $\textcolor{red}{n\geq n_0 = 2}$, $f(n)\leq \textcolor{red}{2}\cdot n^2$
  • $f(n) = \Theta(n^2)$ because preceding and succeeding lines
  • $f(n) = \Omega(n^2)$ because for $\textcolor{lime}{c=\frac{1}{2}}$, $\textcolor{lime}{n\geq n_0=9}$, $\textcolor{lime}{\frac{1}{2}}\cdot n^2\leq f(n)$

We can also add in a faster and slower growing function if we want to see all five asymptotic definitions in action:

We have the following for $f(n)=n^2-6n+15$:

  • $f(n) = o(n^3)$ because for $\textcolor{red}{c>0}$, $\textcolor{red}{n\geq n_0=3+1/c}$, $f(n)\leq\textcolor{red}{c}\cdot n^3$
  • $f(n) = O(n^3)$ because for $\textcolor{red}{c=1}$, $\textcolor{red}{n\geq n_0 = 2}$, $f(n)\leq\textcolor{red}{1}\cdot n^3$
  • $f(n) = O(n^2)$ because for $\textcolor{fuchsia}{c=2}$, $\textcolor{fuchsia}{n\geq n_0=2}$, $f(n)\leq\textcolor{fuchsia}{2}\cdot n^2$
  • $f(n) = \Theta(n^2)$ because preceding and succeeding lines
  • $f(n) = \Omega(n^2)$ because for $\textcolor{purple}{c=\frac{1}{2}}$, $\textcolor{purple}{n\geq n_0=9}$, $\textcolor{purple}{\frac{1}{2}}\cdot n^2\leq f(n)$
  • $f(n) = \Omega(n)$ because for $\textcolor{lime}{c=1}$, $\textcolor{lime}{n\geq n_0=0}$, $\textcolor{lime}{1}\cdot n\leq f(n)$
  • $f(n) = \omega(n)$ because for $\textcolor{lime}{c > 0}$, $\textcolor{lime}{n\geq n_0=6 + c}$, $\textcolor{lime}{c}\cdot n\leq f(n)$.

It may be worth working through the details to verify everything above.
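
Here's a quick brute-force check of the fixed-$c$ witnesses above in Python (my own illustration). It only samples finitely many $n$, so it supplements the algebra rather than replacing it:

```python
def f(n):
    return n * n - 6 * n + 15

# (claim, c, n_0, predicate) for the fixed-c bounds listed above.
# The o(n^3) and omega(n) claims quantify over every c > 0, so they
# can't be checked with a single (c, n_0) pair.
checks = [
    ("f = O(n^3)",     1,   2, lambda n, c: f(n) <= c * n**3),
    ("f = O(n^2)",     2,   2, lambda n, c: f(n) <= c * n**2),
    ("f = Omega(n^2)", 0.5, 9, lambda n, c: c * n**2 <= f(n)),
    ("f = Omega(n)",   1,   0, lambda n, c: c * n <= f(n)),
]

for name, c, n0, pred in checks:
    ok = all(pred(n, c) for n in range(n0, 5000))
    print(f"{name}: holds for every sampled n >= {n0}? {ok}")
```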

To repeat one point, but with a concrete example, if we grow like $\Theta(n^2)$, then we cannot be little-$o$ of $n^2$ or little-$\omega$ of $n^2$:

  • $f(n) = o(n^3)$
  • $f(n) = O(n^3)$
  • $f(n) = O(n^2)\;\textcolor{red}{\text{but}}\; f(n)\neq o(n^2)$
  • $f(n) = \Theta(n^2)$
  • $f(n) = \Omega(n^2)\;\textcolor{red}{\text{but}}\; f(n)\neq \omega(n^2)$
  • $f(n) = \Omega(n)$
  • $f(n) = \omega(n)$

Proofs, properties, and pictures

Proofs about functions

One kind of task that students are often asked to do is to prove that one function's growth is bounded by another function's growth. The goal is to get students more familiar with the asymptotic definitions and formal proofs, and to build intuition for how quickly different functions grow.

So let's start with the almost trivial example involving $f(n) = n^2$ and $g(n)=n^3$:

Our goal will be to show that $n^2 = o(n^3)$. For this, we'll go directly to the definition:

$$f(n) = o(g(n)) \Leftrightarrow \forall c > 0, \exists n_0\ni\forall n\geq n_0, 0\leq f(n)\leq c\cdot g(n)$$

We'll see the following:

$$n^2\leq c\cdot n^3\Leftrightarrow n\geq 1/c = n_0\Rightarrow n^2=o(n^3)$$

So the definition holds. Done. Obviously, with simple enough problems, the proof is very easy when there aren't any algebraic hangups with using the definition directly. But what if we take the functions $f(n) = n^3$ and $g(n) = (\lg\lg n)^{\lg n}$:

The base of $g$, namely $\lg\lg n$, grows extremely slowly (doubly logarithmically), while the exponent, $\lg n$, grows without bound but not too quickly. But, in all, we don't have a great intuition for $(\lg\lg n)^{\lg n}$ and its growth. Just plugging it into the definition above for little-$o$ doesn't seem like it's going to give us much in the way of a fruitful approach for comparing the growth of $f$ and $g$.

This is where a lot of students get stuck and don't know where to go. On the surface, we may not even have a clue as to which one grows faster:

$$\begin{align*} n^3\leq c\cdot (\lg\lg n)^{\lg n} &\Leftrightarrow \; ???\\ (\lg\lg n)^{\lg n}\leq c\cdot n^3 &\Leftrightarrow \; ??? \end{align*}$$

We may just look at the picture above of the graphs and decide we're good. For now, we'll just put this problem aside.

Proofs about properties

Students are also often asked to prove properties about asymptotic growth. So let's say we're given that $\lg f(n) = o(\lg g(n))$ and that, for any positive constant $c$ and sufficiently large $n$, $g(n)$ grows larger than the constant. We want to show that what we're given implies $f(n) = o(g(n))$. More formally:

$$\textcolor{red}{(\lg f(n) = o(\lg g(n)))} \land \textcolor{blue}{(\forall c, \exists m_0\ni\forall n\geq m_0, c\leq g(n))} \Rightarrow \textcolor{fuchsia}{f(n) = o(g(n))}$$

We won't tease out all the little details of the proof, but it really begins with working with the definition of little-$o$:

$$\begin{array}{ll} \textcolor{red}{\forall c > 0, \exists n_0\ni\forall n\geq n_0,} & \textcolor{red}{\lg f(n)\leq c\lg g(n)}\\ \forall c > 0, \exists n_0\ni\forall n\geq n_0, & 2^{\lg f(n)}\leq 2^{c\lg g(n)}\\ \forall c > 0, \exists n_0\ni\forall n\geq n_0, & f(n) = 2^{\lg f(n)}\leq 2^{c\lg g(n)} = (2^{\lg g(n)})^c = g^c(n)\\ c =\tfrac{1}{2}, \exists n_0\ni\forall n\geq n_0, & f(n)\leq g^{1/2}(n)\\ \exists n_0\ni\forall n\geq n_0, & f(n)\leq \sqrt{g(n)} \end{array}$$

After a little bit of algebra, we also use the fact that $g(n)$ grows past any constant. Combining these different facts and picking a big enough $n$-value so that all facts apply ($n'_0$ in the bullet point below), we get the little-$o$ definition of what we're trying to prove and we're done (last bullet point below).

For arbitrary $c' > 0$:

  • Let $\textcolor{blue}{c = 1/(c')^2}$, find $\textcolor{blue}{m_0\ni\forall n\geq m_0, 1/(c')^2\leq g(n)}$.
  • Pick: $n'_0 = \max(\textcolor{red}{n_0}, \textcolor{blue}{m_0})$. Then, for $c' > 0$, $\forall n\geq n'_0$:
  • $\textcolor{blue}{1\leq c'\sqrt{g(n)}}$ and $\textcolor{red}{f(n)\leq\sqrt{g(n)}}$. Use $c'$ below:
  • $\textcolor{fuchsia}{\forall c' > 0, \exists n'_0\ni\forall n\geq n'_0, f(n)} = \textcolor{blue}{1}\cdot\textcolor{red}{f(n)}\leq\textcolor{blue}{c'\sqrt{g(n)}}\cdot\textcolor{red}{\sqrt{g(n)}}=\textcolor{fuchsia}{c' g(n)}$

Why are proofs like these assigned? They help one gain familiarity with proofs and definitions, but the main thing is that this is part of the science in "computer science". Also, it might actually be useful ... but not seeing any concrete functions can lead one to think such exercises are just way too abstract, resulting in theorems and results that don't seem too interesting or useful.

"Proof" by picture

But then one runs into a problem like the following and just toils away:

It's tempting to just make a plot of the functions like the one above and call it a "proof". But it definitely is not a proof. A plot can give us an idea, but we need more than that to be precise and fully confident in our result.

Proofs about functions using properties

The plot above turns out to be misleading. Shock! To find the relationship between the functions being plotted, let's go back to the property we had before:

$$\textcolor{red}{(\lg f(n) = o(\lg g(n)))} \land \textcolor{blue}{(\forall c, \exists m_0\ni\forall n\geq m_0, c\leq g(n))} \Rightarrow \textcolor{fuchsia}{f(n) = o(g(n))}$$

If we have $f(n) = n^3$ and $g(n) = (\lg\lg n)^{\lg n}$, then taking the logarithm of each function gives us

$$\lg(f(n)) = 3\lg n,\qquad \lg(g(n)) = \lg n \lg\lg\lg n$$

Suddenly things become much easier. They each have a $\lg n$ term, but while $3$ is a constant, we know that $\lg\lg\lg n$ grows. It may grow extremely slowly, but the point is that it grows. It's not constant. It eventually grows to infinity. Since our property specified above holds, we have that

$$3\lg n = o(\lg n\lg\lg\lg n)$$

implies

$$n^3 = o((\lg\lg n)^{\lg n})$$

Above, everything has been simplified so much that we can even see the "crossing point":

$$3\lg n\leq \lg n\lg\lg\lg n \Leftrightarrow 3\leq \lg\lg\lg n \Leftrightarrow 2^{2^{2^3}} = 2^{256}\leq n$$

So $g(n) = (\lg\lg n)^{\lg n}$ passes $f(n) = n^3$ when $n\geq 2^{256}$. But, of course, no one would actually plot the graph of these two functions that far out on a whim.
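
Since $n = 2^{256}$ is far too large to plot or to evaluate the functions at directly, we can compare them in log space instead. Here's a small Python sketch of my own that compares $\lg f(n) = 3\lg n$ against $\lg g(n) = \lg n \cdot \lg\lg\lg n$ for $n = 2^k$ near the crossing point:

```python
import math

def lg(x):
    return math.log2(x)

# Work with exponents: for n = 2^k we have lg n = k, so
# lg f(n) = 3k and lg g(n) = k * lg(lg(k)), since lg lg lg n = lg(lg(lg n)) = lg(lg(k)).
for k in [128, 255, 256, 257, 512]:
    lg_f = 3 * k
    lg_g = k * lg(lg(k))
    print(f"n = 2^{k:>3}: lg f = {lg_f:8.2f}, lg g = {lg_g:8.2f}, g bigger? {lg_g > lg_f}")
```

At $k = 256$ (i.e., $n = 2^{256}$) the two logs coincide, matching the crossing point derived above; for larger $k$, $g$ pulls ahead.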

Of course, we could argue the $n$-value above is so big that it doesn't matter. And maybe it doesn't matter for some real-life applications. But the bottom line is that we should be firmly convinced that a picture is not "proof" of anything. Use the definitions and use the properties.

Runtime of simple programs and loops

Non-loop example

Let's start with a really simple program comprised of only a few lines:

```
findRoot(a, b, c) {
    tmp = b * b
    tmp = tmp - 4 * a * c
    tmp = sqrt(tmp)
    tmp = tmp - b
    tmp = tmp / (2 * a)
    return tmp
}
```

Each line takes a fixed amount of time. That doesn't mean that the lines take the same amount of time as each other. The subtraction line (line 5) is probably quicker than the line with both subtract and multiplication (line 3), but if we're just trying to get the asymptotic runtime of the program, then we only need to know that each line individually has some fixed, constant amount of time it can take. We then run all of the lines and see that the program takes a constant amount of time total.

We usually assume we have fixed-size values (e.g., 32-bit or 64-bit). Then we can say that simple arithmetic functions take constant time, but what about the square root operation? That function might actually loop through several iterations of simpler arithmetic, but with fixed-size values, we can still say that it takes constant time; that is, if the size of the inputs/numbers being processed is fixed and bounded, such as 32-bit or 64-bit, then even if the square root function involves multiple iterations or steps internally, the total number of operations is bounded by a constant due to the fixed size of the inputs. If we were to work with arbitrarily large numbers (e.g., arbitrary-precision arithmetic), then the complexity of the square root function would depend on the number of bits, making it a non-$O(1)$ operation; in that case, the time complexity could be closer to $O(\log n)$ depending on whatever method is used to calculate the square root.

For more complex functions though, we might need to know or calculate their runtime.

Max - simple loop

Consider the following simple program:

```
replaceMax(A[1...n], val) {
    index = findMaxIndex(A)
    tmp = A[index]
    A[index] = val
    return tmp
}
```

It takes an array, A, and replaces the maximum value in that array with some new value and returns the old max. We need to be careful of line 2, where we use the findMaxIndex method. How long it takes to run depends on how large A is. That may seem very obvious in this context, but when using programming language library functions like indexOf for Java's ArrayList, people sometimes forget.

The program above takes a constant amount of time plus the amount of time needed to run the findMaxIndex method:

```
findMaxIndex(A[1...n]) {
    tmp = 1
    for (i = 2 to n)
        if (A[tmp] < A[i])
            tmp = i
    return tmp
}
```

How long does the program above take? If we exclude the loop, it looks great. It takes some constant amount of time. What constant? We need to allocate some variables, assign some values, etc. For the sake of having something specific, let's say the program above has a constant runtime of 7. Whatever value we choose won't be important in the end because we'll see that it can be brushed into our asymptotic notation. Returning to our program, we next have our loop. The loop iterates $n - 1$ times, and to get the runtime of the loop, we have to add up the loop's time over all those iterations. Because we have an upper bound of $n$ iterations, we can use that to simplify the math. For each iteration, we have an increment, a couple of comparisons, maybe an assignment or two; again, we have a fixed number of operations. Let's call that 9.

Max - upper bound

At this point, if we let $T(n)$ represent the runtime of our program, we're looking at something like the following for an upper bound on the runtime of our program:

$$T(n)\leq 7 + \sum_{i=1}^n 9$$

With just a bit of manipulation, we can see that we can get a linear upper bound on $T(n)$ using our big-$O$ notation:

$$\begin{align*} T(n) &= O(g(n))\equiv\exists c > 0, n_0\ni\forall n\geq n_0, 0\leq T(n)\leq cg(n)\\ T(n) &\leq 7 + \sum_{i=1}^n 9\leq 9n + 7\leq 10n \quad(n\geq 7 = n_0) \end{align*}$$

The constants chosen above really weren't all that important. Let's demonstrate this. We could do the same calculations with any constants:

$$T(n)\leq a + \sum_{i=1}^n b\leq bn + a \leq (b + 1)n \quad(n\geq a = n_0)$$

Max - lower bound

We can do the same thing for a lower bound:

$$\begin{align*} T(n) &= \Omega(g(n))\equiv\exists c > 0, n_0\ni\forall n\geq n_0, 0\leq cg(n)\leq T(n)\\ T(n) &\leq 7 + \sum_{i=1}^n 9\leq 9n + 7\leq 10n \quad(n\geq 7 = n_0)\\ T(n) &\geq\sum_{i=2}^n 1 \end{align*}$$

Here, even if we ignore any of the program that isn't in the loop, and assume each iteration of the loop takes just a single unit of time, we can still get a lower bound on the runtime:

$$T(n)\geq \sum_{i=2}^n 1 = n - 1\geq\frac{1}{2}n \quad(n\geq 2 = n_0)$$

Because the upper and lower bounds are linear for the replaceMax program, we've proven that the runtime is $\Theta(n)$:

$$T(n) = O(n),\quad T(n) = \Omega(n),\quad T(n) = \Theta(n)$$
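
As an informal check on the linear bound, here's a quick Python rendition of findMaxIndex (my own translation of the pseudocode) that counts comparisons and shows the count growing linearly with the array size:

```python
def find_max_index(A):
    """Return (index of the maximum element, number of comparisons made)."""
    comparisons = 0
    tmp = 0                      # 0-indexed, unlike the 1-indexed pseudocode
    for i in range(1, len(A)):
        comparisons += 1
        if A[tmp] < A[i]:
            tmp = i
    return tmp, comparisons

for n in [10, 100, 1000]:
    _, comps = find_max_index(list(range(n)))
    print(f"n = {n:>4}: {comps} comparisons (n - 1 = {n - 1})")
```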

Bubble sort - nested loops

Of course, we could look at more complex single loops where each iteration through the loop takes a different amount of time, but we can also see an example of that in the outer loop of Bubblesort's nested loops:

```
Bubblesort(A[1...n]) {
    for (i = 1 to n)
        for (j = n downto i + 1)
            if (A[j] < A[j - 1])
                swap(A[j], A[j - 1])
}
```

While Bubblesort's performance is not the most incredible, its code is quite nice for analysis. So let's drop the constant term outside of the loops because we know it isn't going to dominate the runtime. We run through $n$ iterations of the outer loop, and for each iteration there's an inner loop, and we need its runtime. That will be another sum over the iterations of the inner loop. And the time of each inner loop iteration is ... some constant like 7 again:

$$T(n)\leq\sum_{i=1}^n\sum_{j=i+1}^n 7$$

Bubble sort - upper bound

While the outer loop can have a different runtime each iteration, we sum over all of them, chug through some algebra, and then get a runtime of no more than $\frac{7}{2}n^2$:

$$\begin{align*} T(n) &\leq\sum_{i=1}^n\sum_{j=i+1}^n 7\\ &= \sum_{i=1}^n 7[n - (i + 1) + 1]\\ &= \sum_{i=1}^n 7(n-i)\\ &= 7\sum_{i=1}^n n - 7\sum_{i=1}^n i\\ &= 7n^2 - 7\frac{n(n+1)}{2}\\ &=\frac{7}{2}n^2 - \frac{7}{2}n\\ &\leq\frac{7}{2}n^2\\ &= O(n^2) \end{align*}$$

Note how this meshes with our big-$O$ definition:

$$T(n) = O(g(n))\equiv\exists c > 0, n_0\ni\forall n\geq n_0, 0\leq T(n)\leq cg(n)$$

Depending on how tight we want to get the analysis, we could have made some simplifications. Instead of working out the algebra perfectly like we did above, we could have used more inequalities earlier on in the chain of simplifications:

$$T(n) \leq\sum_{i=1}^n\sum_{j=i+1}^n 7 \leq\sum_{i=1}^n\sum_{j=1}^n 7 = \sum_{i=1}^n 7n = 7n^2$$

These new simplifications still result in a runtime upper bound of $O(n^2)$. The constant is pretty arbitrary because 7 was obviously rather arbitrary.

Bubble sort - comparisons

On the other hand, if we want to count some key operations like the comparison or maybe the swap, then we could try to calculate that number more precisely. We can get rid of all occurrences of the arbitrary constant 7 and say that Bubblesort takes no more than $\frac{n^2}{2}$ comparisons:

$$\begin{align*} T(n) &\leq\sum_{i=1}^n\sum_{j=i+1}^n 1\\ &= \sum_{i=1}^n 1\cdot[n - (i + 1) + 1]\\ &= \sum_{i=1}^n 1\cdot(n-i)\\ &= \sum_{i=1}^n n - \sum_{i=1}^n i\\ &= n^2 - \frac{n(n+1)}{2}\\ &=\frac{1}{2}n^2 - \frac{1}{2}n\\ &\leq\frac{1}{2}n^2\\ &= O(n^2) \end{align*}$$

And we can make the same "aggressive" simplifications as before:

$$T(n) \leq\sum_{i=1}^n\sum_{j=i+1}^n 1 \leq\sum_{i=1}^n\sum_{j=1}^n 1 = \sum_{i=1}^n n = n^2$$

The $\frac{n^2}{2}$ term in the previous set of calculations really is more informative though because that somewhat arbitrary 7 is gone and now the $1/2$ means something. In either case, noting that the algorithm's runtime is proportional to the number of comparisons, we still get a big-$O$ upper bound of $O(n^2)$ for the runtime.
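
To see the $\frac{n^2}{2}$ figure in practice, here's a small Python port of the Bubblesort pseudocode (my own) that counts comparisons and compares the count to $n(n-1)/2$:

```python
import random

def bubblesort_comparisons(A):
    """Sort A in place and return the number of element comparisons made."""
    n = len(A)
    comparisons = 0
    for i in range(n):
        for j in range(n - 1, i, -1):
            comparisons += 1
            if A[j] < A[j - 1]:
                A[j], A[j - 1] = A[j - 1], A[j]
    return comparisons

for n in [10, 100, 1000]:
    comps = bubblesort_comparisons([random.random() for _ in range(n)])
    print(f"n = {n:>4}: {comps} comparisons, n(n-1)/2 = {n * (n - 1) // 2}")
```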

Bubble sort - lower bound

For our lower bound, first recall the definition of big-$\Omega$ we're trying to satisfy:

$$T(n)=\Omega(g(n))\equiv\exists c > 0, n_0\ni\forall n\geq n_0, 0\leq cg(n)\leq T(n)$$

We use similar math compared to what we did for the upper bound:

$$\begin{align*} T(n) &\textcolor{red}{\geq}\sum_{i=1}^n\sum_{j=i+1}^n 1\\ &= \sum_{i=1}^n 1\cdot[n - (i + 1) + 1]\\ &= \sum_{i=1}^n 1\cdot(n-i)\\ &= \sum_{i=1}^n n - \sum_{i=1}^n i\\ &= n^2 - \frac{n(n+1)}{2}\\ &=\frac{1}{2}n^2 - \frac{1}{2}n\\ &=\frac{n^2-n}{2} \end{align*}$$

But the "aggressive" simplification from before doesn't quite work here: we can't make the inside loop larger when proving a lower bound. Working the math out yields the lower-order term in $(n^2 - n)/2$, which is kind of a pain because now we have that $T(n)\geq (n^2 - n)/2$; that is, $T(n)$ is bigger than $n^2/2$ minus something, but for our definition we want to just be bigger than a constant times $n^2$. We can't just ignore the smaller term of $n$.

So we could pick a constant like $c=\frac{1}{4}$ and a positive $n_0$ term and be on our merry way:

$$T(n)\geq\frac{n^2 - n}{2}\geq\frac{1}{4}n^2 \quad(n\geq 2 = n_0)$$

If we wanted to use a bigger constant term, then we would need to use a larger $n_0$:

$$T(n)\geq\frac{n^2-n}{2}\geq\frac{1}{3}n^2 \quad(n\geq 3 = n_0)$$

We could actually use any constant less than $1/2$. If we wanted to use a more sophisticated analysis, then we could say the number of comparisons is at least one half of $n^2$ minus some function that grows no faster than linear in $n$:

$$T(n)\geq\frac{1}{2}n^2 - O(n)$$

That's a lower bound on the number of comparisons by taking an upper bound on that negative function.

Combining the results above for upper and lower bounds of Bubblesort, we have $T(n) = \Theta(n^2)$:

$$T(n) = O(n^2),\quad T(n) = \Omega(n^2),\quad T(n) = \Theta(n^2)$$

Usage of asymptotic notation

For algorithm analysis, we generally talk about one or more of the following:

  • Best case
  • Average case
  • Worst case

Analyzing for the best case is usually too optimistic to be helpful. For instance, the best case in a linear search is that the first element you look at is what you need. This is $\Theta(1)$. That's not helpful. On the other hand, we might consider what happens when the search fails (i.e., we never discovered our target and we had to loop through the entire list to be certain). Even in the best case, when the search fails, that search is going to take linear time. The best case for a program like merge sort or heap sort might be interesting, but it still probably isn't useful.

Analyzing for the average case is good if we either know what the distribution of the data will be (which might be hard) or we have a randomized algorithm and we take the average over our own random choices. Especially for randomized algorithms, analysis of the average case can be really useful, but sometimes the math can get really difficult.

Analyzing for the worst case is probably the most common because it's pessimistic. Whatever we come up with ... can't be worse than that! And even if we're very pessimistic, if we can prove good bounds, then we will have a strong statement. Oftentimes worst case analysis is a lot easier than average case analysis in terms of the math involved. And for a lot of algorithms, the worst case and average case order of growth may very well be the same.

For example, considering merge sort over $n$ different values, the best, worst, and average case runtime is order $n\lg n$. The worst case analysis is good enough to tell us most of the story. But in quicksort it tells us a lot less, because quicksort's worst case performance is worse than the overwhelming majority of its runs.

So we see that which type of analysis we might use depends on which algorithm we're discussing. Besides talking about algorithm performance, we can also talk about problem complexity. So let's consider the problem of sorting, specifically using some comparison-based sorting algorithms:

  • InsertionSort: worst case sorting is an $O(n^2)$ problem.
  • QuickSort: worst case sorting is an $O(n^2)$ problem.
  • MergeSort: worst case sorting is an $O(n\lg n)$ problem.

Different algorithms with different runtimes can solve the comparison-based sorting problem, some being faster, others being slower. But we can show that any comparison-based sorting algorithm must use at least order $n\lg n$ comparisons; that is, comparison-based sorting takes $\Omega(n\lg n)$ comparisons in the worst case. This goes beyond saying that algorithms we've already discovered need that many comparisons — it even holds for algorithms nobody has thought up yet. Any comparison-based sort will need order $n\lg n$ comparisons in the worst case.

MergeSort, which is a comparison-based sorting algorithm, can sort using $O(n\lg n)$ comparisons. That means that comparison-based sorting is a $\Theta(n\lg n)$ comparison problem. It requires that many comparisons from any algorithm, and the bound can be achieved. It doesn't matter that worse algorithms are out there like BubbleSort.

Recurrence relations

Recursion and recurrence relations

Fibonacci

Let's start with the famous Fibonacci sequence:

$$F_0 = 0,\qquad F_1 = 1,\qquad F_{i > 1} = F_{i - 1} + F_{i - 2}$$

```
Fibonacci(n) {
    if (n < 2)
        return n;
    return Fibonacci(n - 1) + Fibonacci(n - 2);
}
```

The above is a terrible recursive program to calculate the $n$th Fibonacci number. Assuming the numbers stay small enough to add in constant time, and ignoring the runtime of the recursive calls, it takes constant time to calculate:

$$T(n) = T(n - 1) + T(n - 2) + \Theta(1)$$

But we have those recursive calls. The relation above is known as a recurrence relation. Recurrence relations will pop up whenever we want to analyze the runtime of programs that involve some sort of recursive call. To be complete, we can give runtimes for the base cases as well: $T(0) = \Theta(1)$, $T(1) = \Theta(1)$. But generally, the base case is just going to be that the program takes some constant amount of time to run for some fixed size inputs.

In this example, using the Fibonacci sequence, the runtime actually grows like the Fibonacci sequence itself since the recurrence relation basically is the Fibonacci sequence.
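
A quick way to see this growth (my own illustration) is to count how many calls the naive program makes; that call count satisfies essentially the same recurrence and tracks the Fibonacci numbers themselves:

```python
def fib_calls(n):
    """Return (Fibonacci(n), number of calls made by the naive recursion)."""
    if n < 2:
        return n, 1
    a, calls_a = fib_calls(n - 1)
    b, calls_b = fib_calls(n - 2)
    return a + b, 1 + calls_a + calls_b

for n in range(5, 26, 5):
    value, calls = fib_calls(n)
    print(f"F_{n:<2} = {value:>6}, calls = {calls:>7}")
```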

BowlaNachos

But we can imagine many scenarios where this would not be the case:

$$B_0 = 0,\qquad B_1 = 1,\qquad B_{i > 1} = B_{i - 1}\times B_{i - 2}$$

```
BowlaNachos(n) {
    if (n < 2)
        return n;
    return BowlaNachos(n - 1) x BowlaNachos(n - 2);
}
```

What will the runtime of the program above be? We would have the following:

$$\begin{align*} T(n) &= T(n - 1) + T(n - 2) + \Theta(1)\\ T(0) &= \Theta(1),\; T(1) = \Theta(1) \end{align*}$$

Notably, the runtime still grows like the Fibonacci sequence! The program still has two recursive calls to problems with the same size arguments as the Fibonacci program (i.e., $n-1$ and $n-2$). So its runtime would be basically the same as that of the program for computing the Fibonacci numbers.

But the sequence isn't so interesting because it's 0 everywhere except at B_1 = 1.

Let's move to a slightly more useful program. Imagine we had an unsorted linked list of different values, and we want to search those values for one value in particular, val, and we'll just return true or false if it's in the list or not, respectively. If the list has n items, then we could test the first item and then recursively search the rest of the list:

```
list.search(val) {
    if (list.head == nil)
        return false;
    return list.head.search(val);
}

node.search(val) {                   # T(n) <=
    if (node.val == val)
        return true;                 # Θ(1) +
    if (node.next == nil)
        return false;                # Θ(1) +
    return node.next.search(val);    # T(n - 1)
}
```

The check at the current location (line 8) takes constant time. If we don't find the value, then we have a recursive call to the next item in the list, which has one fewer item in its remaining list (line 12).

This gives us another recurrence relation:

$$\begin{align*} T(n) &\leq T(n - 1) + \Theta(1)\\ T(0) &= \Theta(1),\; T(1) = \Theta(1) \end{align*}$$

Note the "$\leq$" sign above. For the program above, if we find the value, then we just stop and there wouldn't be any more recursive calls. Here, we can only say the program above has a constant time for its lower bound.

Merge sort

Let's now look at merge sort:

```
mergeSort(A[], start, end) {          # T(n) =
    if (start >= end) {
        return;                       # Θ(1) +
    }
    mid = (start + end) / 2;          # Θ(1) +
    mergeSort(A, start, mid);         # T(ceil(n / 2)) +
    mergeSort(A, mid + 1, end);       # T(floor(n / 2)) +
    merge(A, start, mid, end);        # Θ(n)
}

merge(A, start, mid, end)
    Θ(n) merging code goes here
```

For the program above, we recursively sort the first half of the array, then the second half, then merge the two sorted halves together to sort the entire array. The merge and all of the other non-recursive work take linear total time. And the two recursive calls are each to about half the array, so we get a recurrence like the following:

$$\begin{align*} T(n) &= T(\lceil n/2 \rceil) + T(\lfloor n/2\rfloor) + \Theta(n)\\ T(0) &= \Theta(1),\; T(1) = \Theta(1) \end{align*}$$

We see that the floors and ceilings are a bit ugly, and it'd be much nicer to not have to worry about them.
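
For reference, here's a compact runnable Python version of the pseudocode above (my own sketch; the pseudocode leaves the merge body out, so the merge below is just one standard way of filling it in):

```python
def merge_sort(A, start, end):
    """Sort A[start..end] in place (inclusive bounds, as in the pseudocode)."""
    if start >= end:
        return
    mid = (start + end) // 2
    merge_sort(A, start, mid)        # about ceil(n/2) elements
    merge_sort(A, mid + 1, end)      # about floor(n/2) elements
    merge(A, start, mid, end)        # Θ(n) work

def merge(A, start, mid, end):
    """Merge the sorted runs A[start..mid] and A[mid+1..end]."""
    left, right = A[start:mid + 1], A[mid + 1:end + 1]
    i = j = 0
    for k in range(start, end + 1):
        if j >= len(right) or (i < len(left) and left[i] <= right[j]):
            A[k] = left[i]
            i += 1
        else:
            A[k] = right[j]
            j += 1

A = [5, 3, 8, 1, 9, 2]
merge_sort(A, 0, len(A) - 1)
print(A)   # [1, 2, 3, 5, 8, 9]
```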

Justifying rounding

We can sometimes pretend that all divisions happen nicely, and we'll give somewhat of a justification here for why we can do that.

Consider the expression

$$T(n) = 4T(\lceil n/2 \rceil) + 4T(\lfloor n/2\rfloor) + \Theta(n)$$

as opposed to

$$T(n) = 8T(n/2) + \Theta(n)$$
  • If $n$ is a power of 2 (i.e., $n = 2^k$), then the two versions above are equivalent and we get $T(n) = \Theta(n^3)$ as a solution (subsequent sections will show how to solve recurrence relations).
  • If $n$ is not a power of 2, then imagine solving for the surrounding perfect powers of 2: $2^k < n < 2^{k+1}$. It turns out each of these functions, both the one that's too big and the one that's too small, grows like $n^3$:
    • $T(m = 2^k) \leq T(n)\leq T(2m),\qquad (m < n < 2m)$
    • $T(m) = \Theta(m^3)$, $T(2m) = \Theta((2m)^3) = \Theta(m^3)$
    • $n^3 / 8 < m^3 < 8n^3$, so $\Theta(m^3) = \Theta(n^3)$.
    • If $T(n)$ is between two functions that both grow like $\Theta(n^3)$, namely $T(m)$ and $T(2m)$, then $T(n)$ is $\Theta(n^3)$: $T(n) = \Theta(n^3)$.

More generally, for increasing functions (which effectively are all functions that model runtime), any recurrence should be bounded between the cases of rounding everything down versus rounding everything up. Hence, for

$$T(n) = T(\lceil n/2 \rceil) + T(\lfloor n/2 \rfloor) + \Theta(n)$$

where $T(n)$ is assumed to be an increasing function, we have

$$2T(\lfloor n/2 \rfloor) + \Theta(n) \textcolor{red}{\leq T(n)\leq} 2T(\lceil n/2 \rceil) + \Theta(n)$$

If the two cases of "rounding everything up" and "rounding everything down" give the same order of growth, then it seems like it should be safe to round.

When using the substitution method to solve recurrences, we'll adhere to the following guidelines:

  • assume all divisions happen without remainder
  • solve for $T(n)$
  • check that $T(n - 1) = \Theta(T(n))$
  • rounding should not have mattered (under- and over-estimates will both use an inductive step with the same order of growth); see the numeric check below
  • everything above happens over a huge group of functions, at least constant through exponential, that is, $\Theta(1)$ through $\Theta(X^n)$ for constant $X$. It doesn't happen to be true for some extremely fast-growing functions like $n!$, but once we get to those runtimes, rounding won't be our biggest issue.
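
Here's the numeric check referred to above (my own illustration): it evaluates the merge sort recurrence with its floors and ceilings intact and compares the result against $n\lg n$, the order of growth we get by pretending the divisions are exact:

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    """Merge sort recurrence with explicit floor/ceiling; the Θ(n) term is taken as n."""
    if n <= 1:
        return 1
    return T(math.ceil(n / 2)) + T(n // 2) + n

for n in [10, 100, 1000, 10**6]:
    print(f"n = {n:>8}: T(n) = {T(n):>10}, n lg n ~ {n * math.log2(n):>12.0f}")
```

The ratio $T(n) / (n\lg n)$ settles toward a constant, which is what "rounding doesn't change the order of growth" means in practice.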

Substitution method

Let's start with the recurrence for searching a linked list. Given

$$T(n)\leq T(n - 1) + 1$$

we want to prove that $T(n) = O(n)$ using the big-$O$ definition:

$$\exists c > 0, n_0\ni\forall n\geq n_0, T(n)\leq c\cdot n$$

The basic technique of the substitution method is to just guess the answer and then try to prove that your guess is correct using induction. In the case above, we guess the relation is $O(n)$. Inductively, we'll assume

$$\forall k < n, T(k)\leq c\cdot k$$

and try to prove that that implies the relation holds for $T(n)$, the total runtime of the algorithm.

$$\begin{align*} T(n) &\leq\textcolor{red}{T(n-1)} + 1 \\ &\leq\textcolor{red}{c\cdot (n - 1)} + 1 & \text{(by ind. hyp.; let $k=n-1$)}\\ &\leq cn - c + 1\\ &\stackrel{?}{\leq} c\cdot n\\ &\Leftrightarrow 1\leq c \end{align*}$$

We see the above is true for a ton of $c$-values. Let's just pick one. It's true for $c = 1$, $n\geq n_0=1$ (the $n_0$ is not important in this case). By our definition of big-$O$, we've proven our recurrence is $O(n)$.

Simplifications

To be fair, we did make use of some simplifications in the calculations above. The first few have to do with the constants we used. For the base case of the recurrence, we never actually referred to it explicitly. From what we proved previously, we had $T(1) = 1$ and $T(0) = 0$. But are they? We'll come back to $T(0)$ in a bit, but right now let's consider what happens if $T(1) = c_1$.

Similarly, we modified the actual recurrence and assumed that the $\Theta(1)$ term was 1. What if it was some other constant? Here, we're assuming we're trying to solve an equation for time, but we haven't defined what 1 unit of time is anyway. We can ignore these issues by "renormalizing" our time units. Note that the number 1 in $\Theta(1)$ doesn't get renormalized — that 1 is not a time unit number, it's just a number. Also, the merge sort recurrence we'll see next has a multiplier of two. We can't renormalize that out of the equation. It's also unitless. But these other changes we make are okay.

If we're trying to get a perfect solution to the recurrence, then we need the exact base cases; otherwise, realizing that for fixed size input our program should take a fixed time, we can define our units in terms of that fixed time for any small sized inputs we want. Also, sometimes we get some math where we need an $n_0$, but otherwise we just use it to let us ignore some stuff we want to ignore.

So even if we want to search an empty list with no items, that would take some positive time. We can't actually get $T(0) = 0$ no matter how we renormalize. But we just ignore that and say we don't care about zero, we only care about $n$-values at least 1. We can ignore small values in our big-$O$ definition.

To recap:

  • Renormalizing issues:
    • Base case:
      • $T(1) = 1$? $T(0) = 0$?
      • What if $T(1) = c_1$?
    • Fixed term:
      • Relation: $T(n)\leq T(n-1) + \textcolor{red}{\Theta(1)}$.
      • Assumed: $T(n)\leq T(n-1) + 1$.
      • What if $T(n)\leq T(n-1) + c_2$?
    • 1 new unit $= \max(c_1, c_2)$ old units.
    • $T(1)$ new units $\leq\max(c_1,c_2)$ old units.
    • $T(n)\leq T(n-1)+ c_2$ old units $\leq T(n-1) + 1$ new units.
  • We tend to ignore $n_0$ in the inductive step.
    • We use it when we want to ignore some low-end items.

Merge sort upper bound

Next, we want to prove that merge sort grows no faster than $n\lg n$. Given

$$T(n)\leq 2T(n/2) + n$$

we want to prove $T(n) = O(n\lg n)$ using the big-$O$ definition:

$$\exists c > 0, n_0\ni\forall n\geq n_0, T(n)\leq c\cdot n\lg n$$

Just like before, we inductively assume

$$\forall k < n, T(k)\leq c\cdot k\lg k$$

and plug in and knock out some algebra:

$$\begin{align*} T(n) &\leq 2\textcolor{red}{T(n/2)} + n\\ &\leq 2\textcolor{red}{c\cdot (n/2)\lg(n/2)} + n & \text{(by ind. hyp.; let $k=n/2$)}\\ &= cn((\lg n) - 1) + n\\ &= cn\lg n - cn + n\\ &\stackrel{?}{\leq} c\cdot n\lg n\\ &\Leftrightarrow n\leq cn \end{align*}$$

True for $c = 1$, $\forall n\geq n_0 = 1$ (the $n_0$ is not important in this case).
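
As a numeric sanity check of the bound (my own, not part of the video), we can evaluate the recurrence $T(n) = 2T(n/2) + n$ at powers of 2 (so the divisions are exact) and compare it against $n\lg n$:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    """Recurrence T(n) = 2T(n/2) + n with T(1) = 1, evaluated at powers of 2."""
    if n <= 1:
        return 1
    return 2 * T(n // 2) + n

for k in [4, 8, 12, 16, 20]:
    n = 2 ** k
    print(f"n = 2^{k:>2}: T(n) = {T(n):>10}, n lg n = {n * k:>10}")
```

For powers of 2 the recurrence works out to exactly $n\lg n + n$, so the $O(n\lg n)$ bound lines up with the proof above.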

Guess too big

Let's not get into where the guess for the proof above came from just yet, but let's explore what happens when we guess incorrectly. What if we guess too big? That is, instead of guessing $T(n) = O(n\lg n)$, suppose we guessed $T(n) = O(n^2)$. Then we'd want to prove $T(n) = O(n^2)$ using the big-$O$ definition:

$$\exists c > 0, n_0\ni\forall n\geq n_0, T(n)\leq c\cdot n^2$$

We can try to prove it inductively by assuming

$$\forall k < n, T(k)\leq c\cdot k^2$$

Then let's see what happens:

$$\begin{align*} T(n) &\leq 2\textcolor{red}{T(n/2)} + n\\ &\leq 2\textcolor{red}{c\cdot (n/2)^2} + n & \text{(by ind. hyp.; let $k=n/2$)}\\ &= cn^2/2 + n\\ &\stackrel{?}{\leq} c\cdot n^2\\ &\Leftrightarrow n\leq cn^2/2\\ &\Leftrightarrow 1\leq cn/2 \end{align*}$$

So it's true for any $c>0$, $\forall n\geq n_0=2/c$. So in this case we actually did need to calculate an $n_0$ value. The smaller the $c$-value is, the larger the $n_0$-value will be. So if we pick $c=1/100$, then $n_0 = 200$. We use the $n_0$ term to assume that $n$ is at least 200. And if $n/2$ is less than 200, then we use the fact that the program takes a fixed time for a fixed number of values, renormalize to that time that's needed, and assume that everything holds for our base cases there.

What we've really just proven is that $T(n) = o(n^2)$ since we've shown the above holds for any $c>0$ so long as $n\geq n_0=2/c$.

Guess too small

What happens if we guess too small? Suppose we're given $T(n)\leq 2T(n/2) + n$ and we try to prove that $T(n) = O(n)$ using the big-$O$ definition:

$$\exists c > 0, n_0\ni\forall n\geq n_0, T(n)\leq c\cdot n$$

We inductively assume

$$\forall k < n, T(k)\leq c\cdot k$$

Now we can try to plug in and knock out some algebra:

$$\begin{align*} T(n) &\leq 2\textcolor{red}{T(n/2)} + n\\ &\leq 2\textcolor{red}{c\cdot (n/2)} + n & \text{(by ind. hyp.; let $k=n/2$)}\\ &= cn + n\\ &\textcolor{red}{\stackrel{?}{\leq}} c\cdot n\\ &\Leftrightarrow n\leq 0 \end{align*}$$

The proof fails. It doesn't prove anything. But in this case we can make some simple changes to our calculations and get a different proof.

Given

$$T(n)\textcolor{red}{\geq} 2T(n/2) + n$$

let's prove that $T(n) = \omega(n)$ using the little-$\omega$ definition:

$$\forall c > 0,\exists n_0\ni\forall n\geq n_0, T(n)\geq c\cdot n$$

Let's prove it inductively. Assume

$$\forall k < n, T(k) \textcolor{red}{\geq} c\cdot k$$

Plug and play:

$$\begin{align*} T(n) &\textcolor{red}{\geq} 2T(n/2) + n\\ &\geq\textcolor{red}{2c\cdot (n/2)} + n & \text{(by ind. hyp.; let $k=n/2$)}\\ &= cn + n\\ &\textcolor{red}{\stackrel{?}{\geq}} c\cdot n\\ &\Leftrightarrow n\geq 0 \end{align*}$$

We see it holds for all $c$, so $T(n) = \omega(n)$.

The successful proof above for little-$\omega$ is the same as the failed proof for big-$O$ except the $\leq$ was changed to a $\geq$. So if $T(n)$ is $\omega(n)$, then $T(n)$ grows strictly faster than $n$, so $T(n)$ definitely cannot be $O(n)$. This conclusively shows the previous proof had to fail because we were trying to prove something that was false.

Tight lower bound proof

To be complete, let's give a proof of the tight lower bound. Given

$$T(n)\geq 2T(n/2) + n$$

we want to prove that $T(n) = \Omega(n\lg n)$ using the big-$\Omega$ definition:

$$\exists c > 0, n_0\ni\forall n\geq n_0, c\cdot n\lg n\leq T(n)$$

We will prove it inductively. Assume

$$\forall k < n, c\cdot k\lg k\leq T(k)$$

We get the following

$$\begin{align*} T(n) &\geq 2\textcolor{red}{T(n/2)} + n\\ &\geq 2\textcolor{red}{c\cdot (n/2)\lg(n/2)} + n & \text{(by ind. hyp.; let $k=n/2$)}\\ &= cn((\lg n) - 1) + n\\ &= cn\lg n - cn + n\\ &\stackrel{?}{\geq} c\cdot n\lg n\\ &\Leftrightarrow n\geq cn \end{align*}$$

True for $c=1$, $\forall n\geq n_0 = 1$. We have

$$T(n) = \Omega(n\lg n),\qquad T(n) = O(n\lg n),\qquad T(n) = \Theta(n\lg n)$$

Failed proof

Let's try to learn something from a failed proof for another recurrence. Given

$$T(n)\leq 2T(n/2) + 1$$

we want to prove that $T(n) = O(n)$ using the big-$O$ definition:

$$\exists c > 0, n_0\ni\forall n\geq n_0, T(n)\leq c\cdot n$$

We go through our normal steps by trying to prove this inductively. Let's assume

$$\forall k < n, T(k)\leq c\cdot k$$

We get the following:

$$\begin{align*} T(n) &\leq 2\textcolor{red}{T(n/2)} + 1\\ &\leq 2\textcolor{red}{c\cdot (n/2)} + 1\\ &= cn + 1\\ &\stackrel{?}{\leq} cn\\ &\Leftrightarrow 1\leq 0 \end{align*}$$

This is obviously never true for any $c > 0$. What does a failed proof imply, if anything? Does it imply that the thing we are trying to prove is false (i.e., that the recurrence isn't linear)? No. A failed proof doesn't really mean anything. Of course, if something isn't true, then our proof of that false thing had better fail! But not being able to prove something with a faulty proof doesn't mean that thing you are trying to prove is patently false.

Realistically, a failed proof is an opportunity to reflect. Maybe we should jump to a bigger function with a higher order of growth than $n$ such as $n\lg n$ or $n^2$. Those will go through easily.

Lower order terms

But what if we just go a tiny bit bigger? Given

$$T(n)\leq 2T(n/2) + 1$$

let's try to prove $T(n) = O(n + 1)$ using the big-$O$ definition:

c>0,n0nn0,T(n)