Complexity Primer

In this document, I will summarize some of the complexity work we have done this semester to help you study for the final exam.

What is complexity?

Complexity is a rough estimate of how long your program will run. The more complex a program, the longer it takes.
Often the running time depends on how much data your program is using. The easiest way to think about this is as the length of a list.
We use the variable N to refer to the number of elements in a list. All complexity values will be a function of N.

Basic types of complexity (in increasing order)

Constant time O(1): This is the best type of complexity. It means that your program takes the same amount of time no matter how large the data is.

How is this possible? Thanks to containers like sets and dictionaries, some operations can be constant time. Basic constant time operations are
```
>>> val in SetX
>>> val in DictionaryY
```
where SetX is a set of values and DictionaryY is a dictionary (the test checks its keys). Because each value in a set, and each key in a dictionary, is stored at a location computed from its hash, searching for a value is constant time:
```
To compute val in SetX:
    find hash for val (a simple math computation)
    check the memory location for hash (a single lookup)
```
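The two steps above can be sketched with a toy hash set. This is only an illustration under simplifying assumptions: the class name ToySet, its fixed number of buckets, and its methods are made up for this example, and Python's real set is implemented differently (among other things, it resizes as it grows, which keeps the buckets short).

```python
class ToySet(object):
    """A toy hash set for illustration only, not Python's actual set."""

    def __init__(self, num_buckets=8):
        # Each bucket is a small list of the values that hash to it.
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, val):
        # Step 1: find the hash for val (a simple math computation).
        index = hash(val) % len(self.buckets)
        # Step 2: go straight to that location (a single lookup).
        return self.buckets[index]

    def add(self, val):
        bucket = self._bucket(val)
        if val not in bucket:
            bucket.append(val)

    def __contains__(self, val):
        # Only one bucket is searched, never the whole collection.
        return val in self._bucket(val)


S = ToySet()
for v in [3, "cat", 42]:
    S.add(v)
print(42 in S)
print(99 in S)
```

As long as each bucket holds only a few values, the membership test looks at a small, fixed number of values no matter how many items the set contains.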
List operations that work at the end of the list, like append and pop(), have constant time cost. Inserting or deleting at an arbitrary position is more expensive, up to O(N), because every item after that position has to shift. Set and dictionary update operations are generally constant time.
Logarithmic time O(logN): This means that the run time of your program increases very slowly as your data gets larger. Log is usually base 2.

For example, if N is doubled, your run time will increase by a small constant. If N is quadrupled, it will increase by twice that constant.

O(logN) is very good, but it means that your program cannot read all the data to find the output. This is possible only in some special cases. For example, in binary search, since the data is sorted, we can keep eliminating half the data at each step. See Lecture 20 — Searching for more details.
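As a rough sketch of that halving idea (not necessarily the version from Lecture 20), the function below searches a sorted list and also counts how many items it looked at. The name binary_search and the step counter are additions made for this illustration.

```python
def binary_search(sorted_list, val):
    """Return (index, steps): the index of val, or None if it is absent,
    and the number of items examined along the way."""
    low = 0
    high = len(sorted_list) - 1
    steps = 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if sorted_list[mid] == val:
            return mid, steps
        elif sorted_list[mid] < val:
            low = mid + 1   # val can only be in the upper half
        else:
            high = mid - 1  # val can only be in the lower half
    return None, steps


data = list(range(64))  # already sorted
index, steps = binary_search(data, 63)
print("found at index %s after examining %d items" % (index, steps))
```

With N=64 the loop can run at most 7 times, since each pass discards at least half of the remaining items.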
Linear time O(N): This means that your program reads each item in the list, or a fixed fraction of them. For example, if you read roughly half the list (i.e. N/2 items), the cost is still proportional to N.

Think of linear time like this: If you double the size of your list, your program will take about twice the time. Linear time programs are still among the best.
- Programs that read each item in a list are all linear time:
```
>>> val in ListX
>>> ListX.count(val)
```
O(N LogN): This is slower than linear by an additional logarithmic factor. Basically, anything above linear means you are making multiple passes over the same data; here the number of passes is logarithmic. Again, log is base 2.

For example, suppose N=64 and your program takes 64*log(64) = 384 seconds (just assume for now). If you doubled your input to N=128, your program would be expected to take 128*log(128) = 896 seconds. That is more than double, but not by much.

We have seen O(N LogN) run time in sorting. The best sorting algorithms are all of this complexity. Our merge sort was in fact of this complexity. See Lecture 21 — Sorting for more details.
Quadratic time O(N^2): This is substantially worse than any of the previous types. Roughly, it means that you are making N passes over the data. The run time grows very quickly as N increases.

For example, suppose your program takes 64^2 = 4096 seconds for N=64. If we double the input size to 128 items, it will now take 128^2 = 16,384 seconds, four times as long. This is much worse.

Sometimes quadratic time is unavoidable, but often the containers we learned in this class (sets and dictionaries especially) can help reduce the complexity considerably.
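For example, here is a sketch of how a set can turn a quadratic program into a linear one. The task (checking a list for duplicates) and both function names are hypothetical, chosen only for illustration.

```python
def has_duplicates_quadratic(L):
    """Compare every pair of items: O(N^2)."""
    for i in range(len(L)):
        for j in range(i + 1, len(L)):
            if L[i] == L[j]:
                return True
    return False


def has_duplicates_linear(L):
    """Remember the items seen so far in a set: O(N)."""
    seen = set()
    for item in L:
        if item in seen:   # O(1) set lookup
            return True
        seen.add(item)     # O(1) set update
    return False


print(has_duplicates_quadratic([3, 1, 4, 1, 5]))
print(has_duplicates_linear([3, 1, 4, 1, 5]))
```

Both functions give the same answer, but the second makes a single pass over the list because each set operation is constant time.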
More complex programs: It is possible to have programs with worse complexity, like cubic O(N^3), etc. We hope that you can avoid writing programs with quadratic time complexity wherever possible, let alone worse.
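To compare these types side by side, the following sketch prints the rough number of steps each one implies for N=64 and N=128. The function name cost_table is made up for this example, and the counts ignore constant factors, so they are only meant to show how fast each type grows.

```python
import math


def cost_table(n):
    """Return (name, rough step count) pairs for a list of n items."""
    log_n = math.log(n, 2)  # log base 2
    return [
        ("O(1)", 1),
        ("O(logN)", log_n),
        ("O(N)", n),
        ("O(N logN)", n * log_n),
        ("O(N^2)", n * n),
    ]


for n in (64, 128):
    print("N = %d" % n)
    for name, steps in cost_table(n):
        print("  %-10s %8.0f" % (name, steps))
```

Doubling N adds one step to the logarithmic row, doubles the linear row, slightly more than doubles the N logN row, and quadruples the quadratic row.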

Notes

Even programs with the same complexity may differ in run time. For example, doing one or two passes over a list are both linear complexity, but of course it is better to do a single pass. The difference between these two programs is considered a constant factor difference (one program takes half the time of the other, for example).
Be careful: a small change may have a big impact on complexity:
```
>>> val in DictionaryY         ###this is O(1)
>>> val in DictionaryY.keys()  ###this is O(N)
```
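Why? Because in Python 2, which these notes use, DictionaryY.keys() first constructs a list of the keys in the dictionary, and then val is searched for in that list. (In Python 3, keys() returns a view backed by the dictionary, so that test is also constant time.)

By contrast, the keys of a dictionary are stored hashed like a set, so val in DictionaryY simply does a hash lookup.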
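One way to see the difference between searching a list of keys and doing a hash lookup is to count equality comparisons. The sketch below uses a made-up helper class, CountingKey, that counts every time it is compared. It is only an illustration, and the exact number of comparisons a dictionary makes is an implementation detail.

```python
class CountingKey(object):
    """A key that counts how many times it is compared for equality."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        CountingKey.comparisons += 1
        return self.value == other.value


keys = [CountingKey(i) for i in range(1000)]
D = dict((k, True) for k in keys)
target = CountingKey(999)  # equal to the last key, but a different object

CountingKey.comparisons = 0
found_in_list = target in keys   # searches the list item by item
list_comparisons = CountingKey.comparisons

CountingKey.comparisons = 0
found_in_dict = target in D      # hash lookup
dict_comparisons = CountingKey.comparisons

print("list: %d comparisons, dictionary: %d comparisons"
      % (list_comparisons, dict_comparisons))
```

The list search compares the target against all 1000 keys, while the dictionary lookup needs only a handful of comparisons (typically one), regardless of how many keys it holds.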

Assessing Complexity

We generally look at the complexity of big operations and loops.
Remember, a simple line like
```
if x in L:
```
already has complexity O(N) where N=len(L). So, identify the operations that are complex. For example, the following operations all have O(N) complexity:
```
min(L)
max(L)
L.count(x)
```
because you cannot really find the result of these without looking at all the values in L.
- If you wrote your own code to compute one of these, such as min, it would go something like:
```
minval = L[0]
for val in L:
    if val < minval:
        minval = val
```
which clearly has to go through all the items in L. Of course, Python's built-in min is actually implemented in C, so it is likely to be faster than yours even though both are O(N).
Remember that the sort function has complexity O(N logN):
```
L.sort()
```
Slicing a list reads the items in the slice and creates a new list with that many items:
```
L1 = L[:len(L)//2]   # // keeps the index an integer in Python 2 and 3
```
This has complexity O(N), as you go over N/2 items and create a new list with that many items.
When you have a loop:
1. first, look at the complexity of operations within the loop,
2. then, multiply that with how many times you repeat the loop.
For example:
```
for item in L:
    print L.count(item)
```
We know that L.count has complexity O(N), and we repeat it N times (once for each item in the list). So, this simple program has complexity O(N^2).
Often a program has multiple consecutive steps with different complexities. We generally report the highest complexity step. For example, if a program has two steps, the first O(N^2) and the second O(N), we would generally say this program has O(N^2) complexity instead of O(N^2+N). Technically both are fine, but most of the run time really depends on the more complex first step.

Time vs Space Complexity

The definition of complexity in this document really refers to the time complexity. There are other measures of complexity, like space complexity, which tells how much memory your program will require. We do not study this in this class, but it is an important part of programs as well.

Some Examples

We will conclude this document with some example programs and their analysis:

Some list programs:
```
def find_idx(L, val):
    for i in range(len(L)):
        if L[i] == val:
            return i
    return None
```
This function has complexity O(N) because you may need to search all the way to the end of the list to find (or fail to find) the item.

Also, note that in Python 2, range(len(L)) actually creates a list of N items, another O(N) operation. (In Python 3, range produces its values lazily instead.)
```
for i in range(len(L)-1):
    for j in range(i+1, len(L)):
        print L[i], L[j]
```
If you do the computation, this generates N(N-1)/2 pairs, which is roughly N^2/2. Hence, this operation has complexity O(N^2). Most double loops are of this complexity unless the inner loop has a fixed number of iterations:
```
for i in range(len(L)-1):
    for j in range(5):
        print L[i],
    print
```
The inner loop runs only 5 times for each pass of the outer loop, a constant amount of work. So this loop actually has O(N) complexity.
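Returning to the loop over pairs, the N^2/2 estimate can be checked by counting. This sketch uses a hypothetical helper, count_pairs, that runs the same double loop over indices but counts the pairs instead of printing them.

```python
def count_pairs(n):
    """Count the (i, j) pairs with i < j visited for a list of n items."""
    count = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            count += 1
    return count


for n in (4, 64, 128):
    print("N = %d: %d pairs, N*(N-1)/2 = %d"
          % (n, count_pairs(n), n * (n - 1) // 2))
```

Doubling N from 64 to 128 takes the count from 2016 to 8128, roughly four times as many pairs, which is the behavior expected of an O(N^2) loop.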

Some set and dictionary programs:

```
S = set(L)      ## O(N) step
for item in S:  ## Suppose M is #unique items: M counts of O(N) each, O(M*N)
    print L.count(item)
```

This program has complexity O(N+M*N), hence we will say it is O(M*N). Note that if M is fixed, then this is a linear operation. If M can be as large as N, it is a quadratic operation.
```
D = {}
for item in L:
    if item not in D:
        D[item] = 1
    else:
        D[item] += 1
```
Given all operations inside the loop are O(1) (constant time), the whole program has complexity O(N).