Runtime evaluation of algorithms
Required course material for the lesson
Powerpoint: Runtime evaluation of algorithms
Video: Runtime Analysis (watch this first)
Subjects covered
Runtime evaluation of algorithms
Big O notation
Big O notation
When you "calculate" the running time of a program or an algorithm you write your result in the Big O notation. The running time is the complexity of the algorithm. The hardest part about Big O is understanding the concept - performing the actual calculation is rather easy in most cases. Especially since there is no real calculation to do, but mostly just analysis of the code.
Some loosely explained theory
Any program performs operations while it is running. This could be adding a number to a variable, asking which of two numbers is bigger, or checking whether two strings are identical. You have been programming: essentially every statement is an operation.
Calculating Big O means estimating the number of operations. This is not the number of statements; some statements are executed several times in loops. Big O is closer to the accumulated number of times each statement is executed during a run of the program. While it is possible to calculate this number exactly for a given input (assuming randomness is not part of the program), doing so is unnecessary, rather uninteresting and possibly difficult.
What is needed is an estimate that explains how the program's running time scales with the size of the input; that is Big O. Very often - almost always - a program contains loops that loop over data. Let's take a program that calculates the average of a list/file of numbers as an example.
# pseudo code
open file
for each line in file
    sum = sum + number(line)
    increment(line_counter)
close file
print sum/line_counter
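For comparison, here is a minimal runnable Python sketch of the same idea (the file name "numbers.txt" and the one-number-per-line format are assumptions made for the sake of the example):

total = 0
line_counter = 0
with open("numbers.txt") as infile:   # opening the file: done once
    for line in infile:               # runs once per line in the file
        total += float(line)          # one addition per line
        line_counter += 1             # one increment per line
print(total / line_counter)           # done once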
The opening and closing of the file are each done once, and likewise the printing of the result. The summing is done once per number/line. This means the more numbers there are in the file, the more additions are made; it scales with the number of numbers. The line_counter is also incremented once per line, so it obviously scales in the same way as the summing.
To put this in Big O notation: O(1 + n + n + 1 + 1) = O(2n + 3).
This expression is still too detailed. The "small stuff" is eliminated, which means removing the constants; they say nothing about how the running time scales anyway.
The final Big O notation: O(n), where n is the number of numbers.
If you have a more complex expression like O(n*n + n), then the single n is thrown away because it is insignificant compared to n*n, so O(n*n) is the final result.
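To see why the constants can be dropped, here is a small illustrative calculation (not part of the original example): as n grows, the detailed count 2n + 3 behaves just like n, only scaled by a constant factor.

for n in (10, 1_000, 1_000_000):
    exact = 2 * n + 3             # the detailed operation count from above
    print(n, exact, exact / n)    # the ratio settles around the constant 2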
A collection of internet resources on Big O
Big O in plain English
A Beginner's Guide to Big O Notation
Big O Notation
Big-O Algorithm Complexity Cheat Sheet
Big O notation explained
How to calculate Big O
Big O notations - a youtube video
Wikipedia's entry on Big O is particularly useless. It is most likely correct, but impossible to understand. Don't go there for information unless you like pain.
If all you want to do is to measure how long your program takes to run in real time (seconds and minutes), then you should just use the time command in unix. Simply put it in front of your program.
# Normally you call your program like
./mypythonprogram.py fastafile

# To see how long it takes to execute, use
time ./mypythonprogram.py fastafile
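If you prefer to measure from inside the program instead, the standard library's time module can do something similar. A sketch, where the sum over a range simply stands in for whatever work you want to time:

import time

start = time.perf_counter()            # wall-clock time before the work
total = sum(range(10_000_000))         # stand-in for the work you want to time
elapsed = time.perf_counter() - start
print(f"Elapsed: {elapsed:.3f} seconds")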
How to calculate Big O in the project
When a program or algorithm is small, it is often trivial to perform this "calculation". In bigger pieces of software you need to be more systematic about it. Since we know that the loops are what matters in Big O, keeping track of the loops in the code as comments is a great way of being systematic. In this example, I have a list of numbers where all numbers must be multiplied in pairs and the products summed.
The code:
numbers = [2,3,5,7,11,13]
sum = 0
for i in range(len(numbers)-1):
    for j in range(i+1, len(numbers)):
        sum += numbers[i] * numbers[j]
print("Result:", sum)
I start by adding a comment about Big O to the inner loop. I discard the minor issues, such as the fact that the inner loop runs at most n-1 times and at least once; the worst case is what we use in these situations.
numbers = [2,3,5,7,11,13]
sum = 0
for i in range(len(numbers)-1):
    # O(n), where n is the number of numbers, worst case
    for j in range(i+1, len(numbers)):
        sum += numbers[i] * numbers[j]
print("Result:", sum)
Then I add a Big O comment to the outer loop. Since I already have the Big O for the inner loop, it is easy.
numbers = [2,3,5,7,11,13]
sum = 0
# O(n*n), where n is the number of numbers
for i in range(len(numbers)-1):
    # O(n), where n is the number of numbers, worst case
    for j in range(i+1, len(numbers)):
        sum += numbers[i] * numbers[j]
print("Result:", sum)
Now I have the result, O(n*n), easily argued and explained, and shown directly in the code. I should still add some reasoning in words, but the analysis in the code supports it.
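If you want an empirical check of the analysis, you can count how many times the innermost statement runs for growing n (a sketch, not part of the hand-in requirements):

for n in (10, 100, 1000):
    operations = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            operations += 1                          # count one inner-loop execution
    print(n, operations, operations / (n * n))       # ratio approaches 1/2, i.e. the count grows like n*n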
Exercises to be handed in
"Calculate" can mean many things here; Analyze, argue, explain, show, or even demonstrate. For some algorithms (not these) it even means "do a lot of math". The calculation must be convincing and explained. Just showing the result is insufficient.
- Calculate Big O of exercise 2 in Python Recap and Objects
- Calculate Big O of exercise 1 in Regular Expressions
- Calculate Big O of exercise 2 in Regular Expressions
- Calculate Big O of exercise 5 in Regular Expressions
- Calculate Big O of exercise 9 in Regular Expressions
- Calculate Big O of exercise 9 in Making Functions
- Calculate Big O of exercise 1 in Advanced Data Structures and New Data Types
- Calculate Big O of exercise 4 in Advanced Data Structures and New Data Types
- Calculate Big O of exercise 6 in Comprehension, Generators, Functions and Methods