Functions, namespace, memory management
Latest revision as of 20:48, 1 March 2026
Required course material for the lesson
Powerpoint: Functions - this will be short as it is a reminder.
Powerpoint: Namespace.
Powerpoint: Memory management.
Video: Functions in Python
Video: Examples and identifying data types
Resource: Example code - Functions
Video: Live Coding
Subjects covered
- A short recap of functions
- Namespace in Python
- Memory management
Exercises to be handed in
The first two exercises are reused from earlier lessons. This is for a good purpose, as you will see later.
You now go back to working solo with git to maintain your git skills. Commit every exercise to your private exercise repository.
For many of the exercises, you need to make a small program that uses your function in order to test it.
- Make a function fastaread(filename) which takes a filename as a parameter and returns two lists: the first list holds the headers, the second holds the sequences (each as a single string without whitespace). Add appropriate error handling to the function. You can test your function on the file dna7.fsa.
- Make a function fastawrite(filename, headers, sequences) which takes a filename, a list of headers and a corresponding list of sequences as parameters and writes the fasta file. Add appropriate error handling to the function. You can test your function on the file dna7.fsa. If you first read the file with fastaread and then write a new file with fastawrite, and the two files are identical, you know you have done it right.
- Make a function normalize that takes a list of numbers as its argument. The function normalizes the numbers to the range 0 to 1 and returns the normalized list. Normalization in this context is a linear transformation: min-max rescaling.
- Make a new normalize function based on the one above. This time the function takes three arguments: the list, a min, and a max that the values should be scaled/normalized between. The default values for min and max are 0 and 1.
- Make a program that reads the ex1.dat file and counts how many positive and negative numbers there are in each column. Display the result. Now use your latest normalize function to normalize the numbers in each column between -1 and 1, then count the positive and negative numbers in each column again and display the result. All this should happen in one program.
- This is the first part of a larger program, so you might want to read the next 3 exercises as well to get the full picture. You know that column-based files can use different delimiters. The typical example is a tab-separated file, where the tab separates the columns. Other classic delimiters are comma, colon, semicolon and the pipe sign. Now make a function, determineDelimiter, that takes a line as its argument, investigates the line and determines whether the delimiter is tab, comma, colon, semicolon or pipe sign, in that order of preference, and returns a single character which is the delimiter. The delimiter must be present at least once in the line. If no delimiter can be found, return None.
- This is the second part. Sometimes column-based files use the first line as an identifying headline, where each column gets a name that describes the data in the column. See the employee-data.csv file as an example. Make a function identifyColumn that takes three arguments (a delimiter, a headline and a column name) and returns the number of the column that the column name belongs to. Return None if the column cannot be identified.
- This is the third part. You learned about input from the command line and options last week. Now make a function parseCommand that analyzes the command line for input. Specifically, the function should parse a command line that looks like this:
someprog.py [-c <positiveIntegerList>] [-n <nameList>] <filename>
The two lists are comma-separated with no spaces. The function must return a (possibly empty) list of numbers, a (possibly empty) list of names, and a file name. The two options are mutually exclusive, which means it is an error to use both.
- This is the fourth and final part. Use your 3 functions to make a program namedcut.py that works a bit like the Unix cut command in the sense that it selects the columns to display. The program will use the employee-data.csv file as input. If you use the -c option, it will display the columns in the list in the given order, but stripped of the headline. If you use the -n option, it will display the named columns in the list in the given order, without stripping the headline. No matter the original delimiter, the output should always be tab-separated. If a column-based file is missing a headline, then using the -n option is an error. Note that you would benefit from making a usage function for easy error handling. If you have trouble using the functions in your program, think about the tasks/steps that are required to successfully make the program. Also, test your program on the ex1.dat file, as the program is (should be) designed to work on that too. As a check, the number of Asians in the Ethnicity column is 404 in employee-data.csv.
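To make two of the descriptions above more concrete, here is a minimal Python sketch of min-max rescaling with default bounds and of delimiter detection in a fixed order of preference. It is one possible reading of the exercise text, not a reference solution. The parameter names new_min and new_max are assumptions, and so is the choice to raise ValueError when all values in the list are equal, since the exercise does not say what should happen in that case.

```python
def normalize(numbers, new_min=0.0, new_max=1.0):
    """Linearly rescale numbers into [new_min, new_max] (min-max rescaling)."""
    if not numbers:
        return []
    low, high = min(numbers), max(numbers)
    if low == high:
        # Assumption: the exercise does not define this case, so we refuse
        # rather than divide by zero.
        raise ValueError("cannot normalize a list where all values are equal")
    scale = (new_max - new_min) / (high - low)
    # Shift so the smallest value maps to new_min, then stretch to the new range.
    return [new_min + (x - low) * scale for x in numbers]


def determineDelimiter(line):
    """Return the first delimiter found in line, tried in order of preference."""
    for delimiter in ("\t", ",", ":", ";", "|"):
        if delimiter in line:
            return delimiter
    return None  # no known delimiter is present in the line


if __name__ == "__main__":
    print(normalize([2, 4, 6]))
    print(normalize([2, 4, 6], -1, 1))
    print(repr(determineDelimiter("name;age|city")))
```

Note that determineDelimiter checks the candidates in order of preference rather than by how often they occur, so a line containing both a comma and a semicolon is reported as comma-separated.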