Lists/Sequences

From 22101

Required course material for the lesson

Powerpoint: Lists and Sequences
Video: The basics of lists in python
Video: Working with lists, modification

Subjects covered

Lists and sequences.
List methods for list manipulation:

  • changing an element
  • adding/removing an element
  • adding/removing more elements
  • adding/removing elements at specific places
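
As a rough illustration of the operations listed above, the following sketch applies each of them to a small list. The list contents and the variable names words and last are made up for this example; only the standard list methods and the del statement are used.

```python
# A small example list (contents are made up for illustration).
words = ["a", "b", "c"]

# Changing an element: assign to an index.
words[1] = "B"

# Adding one element: append() adds at the end, insert() at a given position.
words.append("d")
words.insert(0, "start")

# Adding several elements: extend() at the end, slice assignment anywhere.
words.extend(["e", "f"])
words[2:2] = ["x", "y"]      # inserts before index 2

# Removing one element: by value, by position, or with del.
words.remove("x")            # removes the first occurrence of the value
last = words.pop()           # removes and returns the last element
del words[0]                 # removes the element at index 0

# Removing several elements: delete a slice.
del words[1:3]

print(words)   # ['a', 'c', 'd', 'e']
print(last)    # f
```

Printing the list after each step is a simple way to see what every operation does.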

Exercises to be handed in

  1. Make a program that asks for words - one at a time - and saves them in a list (one word per element) until you write STOP. When STOP is entered, write the words in the order they were entered to a file called words.txt.
  2. Searching for accession numbers. In the file ex5.acc there are 6461 unique GenBank accession numbers (taken from the HU6800 DNA array chip). An inexperienced bioinformatician unfortunately fouled up the list, so many of the accession numbers appear more than once. Make a program that first reads the file into a list, then asks for an accession number, counts how many times it appears in the list, and displays the result.
  3. Different search for accession numbers. This program is similar to the one above. First read the file into a list, then keep asking for accession numbers and check whether each one is in the list (yes/no, not a count). Keep searching for accession numbers until STOP is entered. Hint: this takes two loops, one nested inside the other.
  4. You need to clean up the ex5.acc file. The first step is to sort the accession numbers alphabetically. You must program a sorting algorithm. There are many different algorithms for sorting, but let's pick a simple one - Bubble Sort.
    It goes like this. Read the accessions into a list as in the previous exercises. Go through the list looking at pairs of neighbouring accessions (at positions i and i+1). If a pair is in the wrong order, swap the two. Keep going through the list until you have made a complete pass without a single swap. Now the list is sorted, and you save it in the file sorted5.acc.
    Note: There is room for optimization in the described algorithm and it is in any case not the most efficient method. You are free to implement a different method.
  5. It is now time to find the unique accession numbers, so you only have one of each - no duplicates. Read the accessions from sorted5.acc into a list. Since the list is now sorted, the duplicates are "next" to each other, which makes them easy to find. Make a new list with the unique accessions from the old list, and save that list in the file clean5.acc. Check that you have 6461 accessions, one per line.
  6. In this exercise you must achieve the same result as in the previous exercise, but in a different way. Instead of making a new list with the unique accessions, keep the old list and remove the duplicates using del or pop. If you run into trouble, imagine your code executed on this list: [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
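
The pair-swapping pass described in exercise 4 and the in-place removal asked for in exercise 6 can be sketched on a short list of integers. This is one possible approach, not the only one; the function names bubble_sort and remove_adjacent_duplicates are invented for the example, and file reading and writing are left out.

```python
def bubble_sort(items):
    """Sort the list in place by repeatedly swapping out-of-order neighbours."""
    swapped = True
    while swapped:                    # stop after a full pass with no swap
        swapped = False
        for i in range(len(items) - 1):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True


def remove_adjacent_duplicates(items):
    """Delete repeated neighbours from an already sorted list, in place."""
    i = 0
    while i < len(items) - 1:
        if items[i] == items[i + 1]:
            del items[i + 1]          # stay at i: the next element may be another copy
        else:
            i += 1                    # move on only when the neighbours differ


numbers = [4, 2, 3, 4, 1, 3, 2, 4, 3, 4]
bubble_sort(numbers)
print(numbers)    # [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
remove_adjacent_duplicates(numbers)
print(numbers)    # [1, 2, 3, 4]
```

The removal uses a while loop rather than a for loop over a fixed range because the list shrinks during the run; the index only advances when nothing was deleted.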

Exercises for extra practice

  • Read two files with numbers into two lists. Files ex1_1.dat, ex1_2.dat & ex1_3.dat are good to use for this. Now add the numbers together pairwise: the first number in the first file with the first number in the second file, and so forth, i.e. add the columns of the two files row by row. You could also consider the numbers in each file to be a vector, and then you add the two vectors together. Save the output (a list of numbers, one number per line) in a new file - you decide the name.
  • Continuing previous exercise: Corrupt your input files a bit by changing some numbers to non-numbers, like words or empty lines. Make your program work on these files too, ignoring anything that cannot be read as a number - you can "pretend" that there is a zero on any line you cannot convert.
  • Continuing previous exercise: Make the program work even if the input files do not have the same length; just assume zero for the missing lines in the shorter file. Note that you do not know in advance which input is the shorter one.
  • Read a file with numbers into two lists. The first number in the file goes to the first list, the second number to the second list, the third number to the first list, and so forth - alternating numbers to alternating lists. Compute the sum of each of the two lists and display both sums. Files ex1_1.dat, ex1_2.dat & ex1_3.dat are good to use for this.
  • Now compute the same sums - without the lists :-)
  • Ask for file names and read the numbers in each file into a list. Continue to ask for file names and add the numbers to the list until you enter STOP. Now compute the sum of the numbers in the list and display the result. Files ex1_1.dat, ex1_2.dat & ex1_3.dat are good to use for this.
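
For the vector-addition exercises above, a minimal sketch of the row-wise sum is shown here on two lists of text lines instead of real files. It assumes unreadable lines count as zero and that the shorter input is padded with zeros. The names to_number and add_columns are invented for the example, and zip_longest from the standard itertools module is just one way to handle the uneven lengths; a plain loop over indices works as well.

```python
from itertools import zip_longest


def to_number(text):
    """Convert one line to a float; anything unreadable counts as zero."""
    try:
        return float(text)
    except ValueError:
        return 0.0


def add_columns(lines_a, lines_b):
    """Add two lists of text lines pairwise, padding the shorter one with zeros."""
    return [to_number(a) + to_number(b)
            for a, b in zip_longest(lines_a, lines_b, fillvalue="0")]


# Stand-ins for the lines read from two input files (made-up values).
first = ["1.5", "2", "oops", "4"]
second = ["10", "", "30"]
print(add_columns(first, second))   # [11.5, 2.0, 30.0, 4.0]
```

Because the padding is done inside the loop, the function does not need to know which of the two inputs is the shorter one.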