Functions, namespace, memory management: Difference between revisions

Latest revision as of 13:10, 16 March 2026

Next: Comprehension, generators, iteration

Required course material for the lesson

Powerpoint: Functions - this will be short as it is a reminder.
Powerpoint: Namespace.
Powerpoint: Decorators
Video: Functions in Python
Video: Examples and identifying data types
Resource: Example code - Functions
Video: Live Coding

Subjects covered

Short about functions
Namespace in Python
Function decorators

Exercises to be handed in

The first two exercises are reused from earlier; the reason will become clear later.
You go back to solo git use to maintain your git skills. Commit every exercise to your private exercise repository.
For many of the exercises you need to write a small program that calls your function in order to test it.

Make a function fastaread(filename) which takes a filename as a parameter and returns 2 lists, first list is the headers, second list is the sequences (as single strings without whitespace). Add appropriate error handling to the function. You can test your function on the file dna7.fsa.
Make a function fastawrite(filename, headers, sequences) which takes a filename, a list of headers and a corresponding list of sequences as parameters and writes the fasta file. Add appropriate error handling to the function. You can test your function on the file dna7.fsa. If you first read the file with fastaread and then write a new file with fastawrite, and the two files are identical, you know you have done it right.
Make a function normalize that takes as argument a list of numbers. The function normalizes the numbers between 0 and 1 and returns a normalized list. Normalization in this context is a linear transformation - min-max rescaling.
Make a new normalize function based on the above. This time the function takes three arguments: the list, a min, and a max that the values should be scaled/normalized between. The default values for min and max are 0 and 1. Consider choosing other names than min and max, as those are built-in functions which you shadow (hide) if you reuse those names.
Make a program that reads the ex1.dat file and counts how many positive and negative numbers there are in each column. Display the result. Now use your latest normalize function to normalize the numbers in each column between -1 and 1, and then again count the numbers of positive and negative in each column. Display. All this in one program.
This is the first part of a larger program - you might want to also read the next 3 exercises to get the full picture. You know that column based files can use different delimiters. The typical example of this is a tab-separated file, where the tab separates the columns. Other classic delimiters are comma, colon, semicolon or the pipe sign. Now make a function, determineDelimiter, that takes a line as argument, investigates the line and determines whether the delimiter is tab, comma, colon, semicolon or pipe sign, checked in that order of preference, and returns a single character which is the delimiter. The delimiter must be present at least once in the line. If no delimiter can be found, return None.
This is the second part. Sometimes column based files use the first line as an identifying headline where each column gets a name that describes the data in the column. See the employee-data.csv file as an example. Make a function identifyColumn that takes three arguments, a delimiter, a headline and a column name, and returns the column number that the column name belongs to. Return None if the column cannot be identified.
This is the third part. You learned about input from command line and options last week. Now make a function parseCommand that analyzes the command line for input. Specifically the function should parse a command line that looks like this:
someprog.py [-c <positiveIntegerList>] [-n <nameList>] <filename>
The two lists are comma-separated with no spaces. It must return a (possibly empty) list of numbers and a (possibly empty) list of names and a file name. The two options are mutually exclusive, which means it is an error to use both. The function should return a dict with this content {'file':filename, '-c': positiveIntegerList, '-n': nameList}
This is the fourth and final part. Use your 3 functions to make a program namedcut.py that works a bit like the Unix cut command in the sense that it selects the columns to display. The program will use as input the employee-data.csv file. If you use the -c option, it will display the columns in the list in the given order. If you use the -n option, it will display the named columns in the list in the given order. No matter the original delimiter, the output should always be tab-separated. If a column based file is missing a headline, then using the -n option is an error. Note that you would benefit from making a usage function for easy error handling. If neither option -c nor option -n is used, then the program displays all columns. This effectively means you can use the program to change the delimiter from comma/colon/semicolon to tab. If you have trouble using the functions in your program, think about the tasks/steps that are required to successfully make the program. Also, test your program on the ex1.dat file, as the program is (should be) designed to work on that too. The number of Asians in the Ethnicity column is 403 in employee-data.csv.
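
The min-max rescaling asked for in the two normalize exercises can be sketched as below. This is only one possible shape, not a reference solution: the parameter names low and high are assumptions chosen to avoid shadowing the built-ins min and max, and raising ValueError for an empty list or a list of identical values is a design choice the exercise text does not prescribe.

```python
def normalize(numbers, low=0.0, high=1.0):
    """Linearly rescale numbers so the smallest maps to low and the largest to high.

    The names low and high are hypothetical; the exercise only asks for
    'a min and a max' with defaults 0 and 1.
    """
    if not numbers:
        raise ValueError("cannot normalize an empty list")
    if low >= high:
        raise ValueError("low must be smaller than high")
    smallest = min(numbers)
    largest = max(numbers)
    span = largest - smallest
    if span == 0:
        # Assumption: treat this as an error, since the rescaling divides by span.
        raise ValueError("all values are identical, rescaling is undefined")
    # Min-max rescaling: shift to 0, scale to the target width, shift to low.
    return [low + (x - smallest) * (high - low) / span for x in numbers]


print(normalize([2, 4, 6]))          # [0.0, 0.5, 1.0]
print(normalize([2, 4, 6], -1, 1))   # [-1.0, 0.0, 1.0]
```

Calling the function with only the list reproduces the behaviour of the first normalize exercise, so the three-argument version can replace it.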

Note that I added a description of the output in exercise 8, and changed the wording of 9, so there is no headline strip. Also, I have uploaded a new version of the employee-data.csv file, as there were some character encoding errors in it.
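
As a rough illustration of how the first two parts of the namedcut.py exercises fit together, here is a minimal sketch of determineDelimiter and identifyColumn. Several things are assumptions rather than requirements from the text: the column number is zero-based (the exercise does not say which base to use), the trailing newline is stripped from the headline before splitting, and the headline used in the example is invented apart from the Ethnicity column name.

```python
# Candidate delimiters in the order of preference given in the exercise text.
DELIMITERS = ("\t", ",", ":", ";", "|")


def determineDelimiter(line):
    """Return the first preferred delimiter that occurs in line, else None."""
    for delimiter in DELIMITERS:
        if delimiter in line:
            return delimiter
    return None


def identifyColumn(delimiter, headline, name):
    """Return the zero-based column number of name in headline, else None.

    Zero-based numbering is an assumption; adjust if columns count from 1.
    """
    columns = headline.rstrip("\n").split(delimiter)
    try:
        return columns.index(name)
    except ValueError:
        return None


# Hypothetical headline; only 'Ethnicity' is known from the exercise text.
headline = "Name,Ethnicity,Salary\n"
delimiter = determineDelimiter(headline)
print(repr(delimiter))                                   # ','
print(identifyColumn(delimiter, headline, "Ethnicity"))  # 1
```

Checking the delimiters in a fixed order means a line containing both tabs and commas is treated as tab-separated, which matches the preference order stated in the exercise.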

@@ Line 1: / Line 1: @@
 __NOTOC__
 {| width=500  style="font-size: 10px; float:right; margin-left: 10px; margin-top: -56px;"
-|Previous: [[More git]]
+|Previous: [[Collaborative git]]
 |Next: [[Comprehension, generators, iteration]]
 |}
@@ Line 7: / Line 7: @@
 Powerpoint: [https://teaching.healthtech.dtu.dk/material/22118/22118_05-Functions.ppt Functions] - this will be short as it is a reminder.<br>
 Powerpoint: [https://teaching.healthtech.dtu.dk/material/22118/22118_05-Namespace.ppt Namespace].<br>
-Powerpoint: [https://teaching.healthtech.dtu.dk/material/22118/22118_05-MemoryManagement.ppt Memory management].<br>
+Powerpoint: [https://teaching.healthtech.dtu.dk/material/22118/22118_07-Decorators.ppt Decorators]<br>
 Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=f509116f-48e5-445b-bf6e-af27012ba23f Functions in Python]<br>
 Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=21e9d4d0-2ba5-465c-9939-af27012b7a7e Examples and identifying data types]<br>
@@ Line 16: / Line 16: @@
 * Short about functions
 * Namespace in Python
-* Memory management
+* Function decorators
 == Exercises to be handed in ==
-# Make a function, that returns the relevant one-letter designation for the correct amino acid, when you give it a codon (3 bases). You can find a [[codon list]] here. If something invalid is given as input to the function, raise an exception. Maybe make a dict with codons. You should also have a bit of code apart from the function so you can test it.
+The 2 first exercises are re-use from earlier. It is for a good purpose as will be seen later.<br>
-# Make a function that calculates the factorial. Add some input control to the function to make sure you get positive integers, when you ask for a number. Consider if you want to raise an exception if the function gets invalid data or you want to halt the program.
+You go back to solo git use to maintain your git skills. Commit every exercise to your private exercise repository.<br>
-# Make a function '''fastaread(filename)''' which takes a filename as a parameter and returns 2 lists, first list is the headers, second list is the sequences (as single strings without whitespace). Add appropriate error handling to the function. You can test your function on the file ''dna7.fsa''.
+For many of the exercises you need to make a small program that uses your function in order to test it.
-# Make a function '''fastawrite(filename, headers, sequences)''' which takes a filename, a list of headers and a corresponding list of sequences as parameters and writes the fasta file. Add appropriate error handling to the function. You can test your function on the file ''dna7.fsa''.
+# Make a function '''fastaread(filename)''' which takes a filename as a parameter and returns 2 lists, first list is the headers, second list is the sequences (as single strings without whitespace). Add appropriate error handling to the function. You can test your function on the file ''dna7.fsa''.<br><br>
-# Make a function that take a DNA sequence (string) as parameter and return the complement strand (reverse complement). Make the function generic and portable, i.e. not dependent on any external factors. Together with the functions you made in exercise 3 & 4, read the fasta entries in file ''dna7.fsa'' and write the complement strands in the new file ''revdna7.fsa''. Add 'Complement strand' to each header.
+# Make a function '''fastawrite(filename, headers, sequences)''' which takes a filename, a list of headers and a corresponding list of sequences as parameters and writes the fasta file. Add appropriate error handling to the function. You can test your function on the file ''dna7.fsa''. If you first read the file with '''fastaread''' and then write a new file with '''fastawrite''', then if the files are identical, you know you have done right.<br><br>
-# Make a function that calculates the GC-content of a DNA sequence. It takes a DNA (string) as parameter and returns the GC percentage. Make the function generic and portable, i.e. not dependent on any external factors. Together with the functions you made in exercise 3 & 4, read the fasta entries in file ''dna7.fsa'' and write only the sequences which has a GC-content percentage over 50% in the new file ''dna7GC.fsa''.
+# Make a function '''normalize''' that takes as argument a list of numbers. The function normalizes the numbers between 0 and 1 and returns a normalized list. Normalization in this context is a linear transformation - [https://en.wikipedia.org/wiki/Feature_scaling#Rescaling_(min-max_normalization) min-max rescaling].<br><br>
-# <font color="#AA00FF">Make a function that calculates the standard deviation (1.8355) of a list of numbers.  Use it on the numbers in ''ex1.dat''. You can either do the two-pass algorithm (looking at all the numbers twice) which is clear from the formula or the one-pass algorithm.</font><br>[[File:StandardDeviation.gif]]
+# Make a new '''normalize''' function based on the above. This time the function takes three arguments; the list, a min, and a max that the values should be scaled/normalized between. The default values for min and max are 0 and 1. Maybe choose some other names for min and max, as those are built-in functions, you overwrite if you use those names.<br><br>
-# Make a function that returns the unique elements of a list as a list. Try it on the accession numbers in ''ex5.acc'', which contains 6461 unique accessions, but also make your own file with simple numbers.
+# Make a program that reads the ''ex1.dat'' file and counts how many positive and negative numbers there are in each column. Display the result. Now use your latest normalize function to normalize the numbers in each column between -1 and 1, and then again count the numbers of positive and negative in each column. Display. All this in one program.<br><br>
-# Make a function '''fibonacci(no1, no2, count)''' which calculates the first '''count''' [https://en.wikipedia.org/wiki/Fibonacci_number fibonacci numbers] based on '''no1''' and '''no2''' and returns them in a list. The next number in a fibonacci sequence is the sum of the two previous numbers. Test it with printing the resulting list (one number per line) from fibonacci(0,1,20).
+# This is the first part of a larger program - you might want to also read the next 3 exercises to get the full picture. You know that column based files can use different delimiters. The typical example of this is a tab-separated file, where the tab separates the columns. Other classic delimiters can be comma, colon, semicolon or the pipe sign. Now make a function, '''determineDelimiter''', that as argument takes a line, investigates the line and determines if the delimiter is tab, comma, colon, semicolon or pipe sign in the preferred order, and return a single char which is the delimiter. It is required that the delimiter is present at least once in the line. If no delimiter can be found, return None.<br><br>
-# The [https://en.wikipedia.org/wiki/Hamming_distance Hamming distance] is the distance between two strings of '''equal''' length. Make a function which takes two strings as arguments and calculates the distance between them. Add error handling.
+# This is the second part. Sometimes column based files use the first line as an identifying headline where each column gets a name that describes the data in the column. See the ''employee-data.csv'' file as an example. Make a function '''identifyColumn''', that takes three arguments, a delimiter, a headline and a column name and returns which column number the column name belongs to. Return None if a column cannot be identified.<br><br>
+# This is the third part. You learned about input from command line and options last week. Now make a function '''parseCommand''' that analyzes the command line for input. Specifically the function should parse a command line that looks like this:<br>'''someprog.py [-c <positiveIntegerList>] [-n <nameList>] <filename>'''<br> The two lists are comma-separated with no spaces. It must return a (possibly empty) list of numbers and a (possibly empty) list of names and a file name. The two options are mutually exclusive, which means it is an error to use both. The function should return a dict with this content {'file':filename, '-c': positiveIntegerList, '-n': nameList}<br><br>
+# This is the fourth and final part. Use your 3 functions to make a program '''namedcut.py''', that works a bit like the Unix cut command in the sense that it selects the columns to display. The program will use as input the ''employee-data.csv'' file. If you use the -c option, it will display the columns in the list in the given order. If you use the -n option it will display the named columns in the list in the given order. No matter the original delimiter the output should always be tab-separated. If a column based file is missing a headline then using the -n option is an error. Note that you would benefit from making a '''usage''' function for easy error handling. If neither option -c nor option -n are used, then the program displays all columns. This effectively means you can use the program to change the delimiter from comma/colon/semicolon to tab. If you have trouble using the functions in your program, think about the tasks/steps that are required to successfully make the program. Also, test your program on the ''ex1.dat'' file as the program is (should be) designed to work on that too. The number of Asians in the Ethnicity column is 403 in ''employee-data.csv''<br><br>
+Note that I added a description of the output in exercise 8, and changed the wording of 9, so there is no headline strip. Also, I have uploaded a new version of the employee-data.csv file, as there were some charset coding errors in it.
 == Exercises for extra practice ==
