Functions
Latest revision as of 11:53, 1 September 2025
Previous: List manipulation | Next: Simple pattern matching
Required course material for the lesson
Powerpoint: Functions
Resource: Example code - Functions
Subjects covered
- Functions: a function is both a way of hiding complexity and a way of reusing code.
- Arguments, scope of variables.
Exercises to be handed in
Any function you are required to make here needs a small program around it to show that it works - so you also make that as part of your hand-in.
Exercises 5 to 8 belong together as a group and should be solved in order. They are about reading, changing and writing fasta files with many entries. You will likely find exercise 5 especially hard.
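As an illustration of what "a function with a small program around it" could look like, here is a minimal sketch, assuming the course language is Python and using the Hamming distance from the first exercise. The names hamming_distance and main, and the test pairs, are chosen for this example only; how strictly to check the input is a design decision, not a requirement stated above.

```python
def hamming_distance(first, second):
    """Return the number of positions where two equal-length strings differ."""
    # Error handling: reject anything that is not a pair of strings.
    if not isinstance(first, str) or not isinstance(second, str):
        raise TypeError("both arguments must be strings")
    # The Hamming distance is only defined for strings of equal length.
    if len(first) != len(second):
        raise ValueError("the strings must have equal length")
    # Compare position by position and count the mismatches.
    return sum(a != b for a, b in zip(first, second))


def main():
    # Small program around the function, showing that it works,
    # including one case that is expected to fail.
    pairs = [("karolin", "kathrin"), ("GATTACA", "GATTACA"), ("abc", "ab")]
    for first, second in pairs:
        try:
            print(first, second, hamming_distance(first, second))
        except (TypeError, ValueError) as error:
            print(first, second, "error:", error)


if __name__ == "__main__":
    main()
```

Raising an exception inside the function and catching it in the surrounding program keeps the function reusable: the caller decides whether bad input should stop the program or just be reported.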
- The Hamming distance is the distance between two strings of equal length. Make a function which takes two strings as arguments and calculates the distance between them. Add error handling.
- Make a function fibonacci(no1, no2, count) which calculates the first count Fibonacci numbers based on no1 and no2 and returns them in a list. The next number in a Fibonacci sequence is the sum of the two previous numbers. Test it by printing the resulting list (one number per line) from fibonacci(0,1,20).
- Make a function that returns the unique elements of a list as a list. Try it on the accession numbers in ex5.acc, which contains 6461 unique accessions, but also make your own file with simple numbers.
- Make a function that calculates the standard deviation of a list of numbers. Try with the file ex1.dat, where you pool all the numbers in the 3 columns into one list and get the result 1.8355. You can either use the two-pass algorithm (iterating through all the numbers twice), which is clear from the formula, or the one-pass algorithm, which you can derive from the formula if you are up to the challenge.
- Look at the fasta file dna7.fsa. It contains several fasta entries. Reading more than one entry in a file is more complex. Make a program that mostly consists of one function fastaread(filename) which takes a filename as a parameter and returns 2 lists: the first list is the entry headers, the second list is the entry sequences (as single strings without whitespace). Add appropriate error handling to the function. Just print the headers and sequences out pairwise as proof of concept.
Hint: Reading and writing fasta files with many entries is a regular occurrence in bioinformatics (and at the exam), so be sure to get it right. Many people mistakenly believe that they should use a form of stateful parsing with a flag for this. Doing so confuses the issue, so abstain from it.
- Make a function fastawrite(filename, headers, sequences) which takes a filename, a list of headers and a corresponding list of sequences as parameters and writes a fasta file with that file name. Add appropriate error handling to the function. You can test your function using the file dna7.fsa and your previous function: reading dna7.fsa and then writing a new fasta file should make the files identical.
- Make a function that takes a DNA sequence (string) as parameter and returns the complement strand (reverse complement). Make the function generic and portable, i.e. not dependent on any external factors. Together with the functions you made in exercises 5 and 6, read the fasta entries in file dna7.fsa and write the complement strands to the new file revdna7.fsa. Add 'Complement strand' to each header.
- Make a function that calculates the GC-content of a DNA sequence. It takes a DNA sequence (string) as parameter and returns the GC percentage. Make the function generic and portable, i.e. not dependent on any external factors. Together with the functions you made in exercises 5 and 6, read the fasta entries in file dna7.fsa and write only the sequences that have a GC-content over 50% to the new file dna7GC.fsa.
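One possible shape for exercises 5 to 8 is sketched below, assuming Python. It reads entries without a flag: each header line opens a new entry, and every other non-empty line is appended to the most recent entry. The function names reverse_complement and gc_content, the width parameter (60 characters per sequence line) and the choice to keep the leading ">" in the headers are assumptions made for this sketch; the files will only come out identical if width matches the line length used in dna7.fsa.

```python
def fastaread(filename):
    """Read a multi-entry fasta file and return (headers, sequences)."""
    headers, chunks = [], []
    # open() raises OSError if the file cannot be read; the caller handles it.
    with open(filename, "r") as infile:
        for line in infile:
            line = line.strip()
            if line.startswith(">"):
                # A header starts a new entry with an empty list of pieces.
                headers.append(line)
                chunks.append([])
            elif line:
                if not chunks:
                    raise ValueError("sequence data before the first header")
                # Remove internal whitespace and add to the current entry.
                chunks[-1].append("".join(line.split()))
    sequences = ["".join(pieces) for pieces in chunks]
    return headers, sequences


def fastawrite(filename, headers, sequences, width=60):
    """Write headers and sequences as fasta, width characters per line."""
    if len(headers) != len(sequences):
        raise ValueError("headers and sequences differ in length")
    with open(filename, "w") as outfile:
        for header, sequence in zip(headers, sequences):
            outfile.write(header + "\n")
            for start in range(0, len(sequence), width):
                outfile.write(sequence[start:start + width] + "\n")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return dna.translate(table)[::-1]


def gc_content(dna):
    """Return the percentage of G and C in a DNA string."""
    if not dna:
        raise ValueError("empty sequence")
    upper = dna.upper()
    return 100 * (upper.count("G") + upper.count("C")) / len(upper)


def main():
    try:
        headers, sequences = fastaread("dna7.fsa")
        # Exercise 7: complement strands with an extended header.
        fastawrite("revdna7.fsa",
                   [header + " Complement strand" for header in headers],
                   [reverse_complement(seq) for seq in sequences])
        # Exercise 8: keep only entries with more than 50% GC.
        kept = [(h, s) for h, s in zip(headers, sequences)
                if s and gc_content(s) > 50]
        fastawrite("dna7GC.fsa", [h for h, s in kept], [s for h, s in kept])
    except (OSError, ValueError) as error:
        print("Could not process fasta file:", error)


if __name__ == "__main__":
    main()
```

Because reverse_complement and gc_content only look at their string argument, they do not depend on files or global variables, which is one reading of "generic and portable".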
Exercises for extra practice
- Make a function that calculates the factorial. Add some input control to the function to make sure you get positive integers when you ask for a number. Consider whether you want to raise an exception if the function gets invalid data, or whether you want to halt the program.
- Make the factorial function compute its result using recursion.
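The two factorial exercises can be sketched as follows, assuming Python. The helper check_number and the function names are hypothetical, and the sketch takes the "raise an exception" route rather than halting the program. It also accepts 0 (with 0! = 1), which is a choice made for this example and slightly wider than the "positive integers" asked for above.

```python
def check_number(number):
    """Raise an exception unless number is a non-negative integer."""
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(number, bool) or not isinstance(number, int):
        raise TypeError("factorial needs an integer")
    if number < 0:
        raise ValueError("factorial needs a non-negative integer")


def factorial(number):
    """Iterative factorial: multiply 2, 3, ..., number together."""
    check_number(number)
    result = 1
    for factor in range(2, number + 1):
        result *= factor
    return result


def factorial_recursive(number):
    """Recursive factorial: n! = n * (n-1)!, with 0! = 1! = 1."""
    check_number(number)
    if number <= 1:
        return 1
    return number * factorial_recursive(number - 1)


if __name__ == "__main__":
    for value in [0, 1, 5, -3, "five"]:
        try:
            print(value, factorial(value), factorial_recursive(value))
        except (TypeError, ValueError) as error:
            print(value, "error:", error)
```

Note that the recursive version is limited by Python's recursion depth, so very large inputs fail where the iterative version does not.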