Simple pattern matching

From 22116
Jump to navigation Jump to search
Previous: Functions Next: Set techniques

Required course material for the lesson

Powerpoint: Simple pattern matching

Subjects covered

String pattern matching, stripping, replacement, translation.
Pattern matching is about 2 things: WHAT is the pattern and WHERE is the pattern.
It is important to AVOID matching the wrong place.

Exercises to be handed in

The first 4 exercises will all have to do with the files data1-4.gb, which are various Genbank entries of genes. First you should study the files, notice the structure of the data. In all exercises you will have to parse (read and find the wanted data) the files using pattern matching. Every exercise adds to the previous ones, so the final program can do a lot. Remember. Your program should be able to handle all files, but just one at a time.

  1. Extract the accession number, the definition and the organism (and print it).
  2. Extract and print all MEDLINE article numbers which are mentioned in the entries.
  3. Extract and print the translated gene (the amino acid sequence). Look for the line starting with /translation=. Generalize; An amino acid sequence can be short, i.e. only one line in the feature table, or long, i.e. more than one line in the feature table. Use stateful parsing.
  4. Extract and print the DNA (whole base sequence in the end of the file). Use stateful parsing.
  5. Your program already extracts the DNA. Make it also extract the positions of the coding sequence. Display them. You can recognize it by the line that starts with "     CDS     join("
  6. Now that you have the DNA and the positions in the DNA for the coding sequence, extract the coding DNA directly from the DNA, such that you can display the actual gene. Verify that you have done so correctly by thinking about the properties of coding DNA sequence.
  7. Find all titles and authors using stateful parsing. Put every author/title record (i.e. all authors/title in one string) in one list. Display all neatly with title on one line, authors on the next and a separator line.
  8. Returning all the way back to lesson 6, Stateful parsing, exercise 6. Given the content of today's lecture, how would you make this exercise now?

Exercises for extra practice

You will program a reverse polish notation calculator. This is somewhat complicated, so I will split up the job in a number of smaller exercises, where the one exercise builds on the previous. Shortly, reverse polish notation is a mathematical notation in which operators follow their operands, i.e. adding two numbers is written by "3 4 +" instead of "3 + 4". We will utilize a stack, which is simply a list, where elements are added to and removed from the bottom.

  • Let's start with making the overall structure; a loop that asks for input, which is one item/command at a time. Inside the loop, there can be a number of operations depending on the input. If you enter a number, it should be simply appended to your list (the stack, which is the word I will use in the rest of the exercises here). If you enter commands STOP, EXIT or QUIT, the loop should end and the program stop. If you enter something that the program does not understand it should write "Illegal command", but otherwise do nothing.
  • Add the following commands: PEEK, which displays the last number in the stack, and SHOW which will display the entire stack.
  • Add the standard math operations, +, -, * and /: These remove 2 numbers from the stack (the last number in stack is the first operand and the second last number is the second operand), perform the operation and appends the result back to the stack. You notice operand order matters for - and /. At this point we have a simple but functional calculator.
  • Add single number operations LOG and SQRT, which replaces the last number in the stack with the log/square root of the number.
  • Solidify your program by ensuring any operations first check if the required number of operands are on the stack before trying to perform the operation. Display "Not enough operands on the stack" and otherwise do nothing if the check fails. Hint: Maybe lists of zero, one and two operands commands could be useful.
  • Add the commands: DROP which simply removes the last number on the stack, =, which removes the last number on the stack and displays it, SWITCH, which switches the last two numbers on the stack, and DUP which duplicates the last number on the stack, i.e stack consists of 1, after DUP it consists of 1 1.
  • Add the commands: ABS, NEG which replaces the last number on the stack which the absolute/negative value of the number.
  • At this point you can probably work out some mathematical operations to add that require one or 2 operands by yourself. I can think of the SIN, COS & TAN operations, the INT taking the integer part of a number, EXPonential and SQUARE. Modulo operation with %. FACtorial. You could add constants like PI and E that simply puts an appropriate value on the stack.
  • Add stack control with CLEAR to empty the stack and extend the SWITCH to switch the numbers as two given position like "SWITCH 2 4" if given positions.