In today's exercise you shall implement two algorithms for aligning a sequence to a hidden Markov model: the Viterbi and posterior decoding algorithms.
First you must access the program templates for today's exercise.
Download the file HMM.tar.gz.
Unpack the file (using tar -xzvf HMM.tar.gz) and place the created HMM directory in the "Algo/code" directory.
Now the HMM directory should contain two Jupyter Notebook files:
viterbi.ipynb posterior_decoding.ipynb
Check that you have the needed data files ready
ls -ltr Algo/data/HMM
This directory should contain two files
casino.seq casino.seqlong
If not, you need to download the data directory.
Download the file HMM_data.tar.gz.
Place the file in the "Algo/data" directory and unpack it (using tar -xzvf HMM_data.tar.gz). Now you should have the HMM directory containing the two files.
Now go back to the HMM code directory.
The first program viterbi.ipynb implements the (guess what!) Viterbi decoding algorithm, and the second program posterior.ipynb implements the forward/backward posterior decoding algorithm.
Open the viterbi.ipynb program. Most of the code is (as usual) pre-written. Go through the code and make sure you understand the structure of the program. In particular, make sure you understand the "transition_matrix" and "emission_probs" variables and how they are indexed.
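As an illustration of what "how they are indexed" means, here is a minimal sketch of one possible layout for the two tables. The state order, the symbol order, the probability values (commonly used numbers for the "occasionally dishonest casino" example) and the helper name emission are all assumptions made for this sketch; the notebook's own layout and numbers take precedence.

```python
# Assumed layout, NOT taken from the notebook: rows follow the order of
# `states`, emission columns follow the order of `symbols`.
states = ["F", "L"]      # index 0 = fair die, index 1 = loaded die
symbols = "123456"       # die faces, in emission-column order

# transition_matrix[i][j] = P(next state is j | current state is i)
transition_matrix = [
    [0.95, 0.05],        # from F: stay fair, switch to loaded
    [0.10, 0.90],        # from L: switch to fair, stay loaded
]

# emission_probs[i][k] = P(observing symbols[k] | state i)
emission_probs = [
    [1 / 6] * 6,                        # fair die: uniform
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.5],     # loaded die: six is favoured
]


def emission(state, face):
    """Hypothetical helper: look up P(face | state) by name instead of index."""
    return emission_probs[states.index(state)][symbols.index(face)]


p_stay_loaded = transition_matrix[states.index("L")][states.index("L")]
p_six_loaded = emission("L", "6")
print(p_stay_loaded, p_six_loaded)
```

Each row of both tables is a probability distribution, so a quick check that every row sums to 1 is a cheap way to catch an indexing mix-up.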
You shall fill in the missing code (XXXX).
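For orientation, the recursion and traceback can be sketched as below. This is an independent log-space sketch, not the structure of the template: the dictionary-based model, the uniform start probabilities, the parameter values and the function name viterbi are assumptions chosen for illustration, so its output need not match the handout.

```python
import math

# Assumed two-state casino model (values are illustrative only).
states = ["F", "L"]
start = {"F": 0.5, "L": 0.5}
trans = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
emit = {
    "F": {c: 1 / 6 for c in "123456"},
    "L": {**{c: 0.1 for c in "12345"}, "6": 0.5},
}


def viterbi(seq):
    """Return the most probable state path for seq as a string of state names."""
    # Initialisation: log P(start in s) + log P(first symbol | s)
    scores = [{s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}]
    pointers = []  # pointers[i][s] = best predecessor of state s at position i + 1

    # Recursion: for every state keep only the best-scoring predecessor
    for x in seq[1:]:
        row, ptr = {}, {}
        for s in states:
            best = max(states, key=lambda r: scores[-1][r] + math.log(trans[r][s]))
            row[s] = scores[-1][best] + math.log(trans[best][s]) + math.log(emit[s][x])
            ptr[s] = best
        scores.append(row)
        pointers.append(ptr)

    # Traceback: start from the best final state and follow the pointers back
    path = [max(states, key=lambda s: scores[-1][s])]
    for ptr in reversed(pointers):
        path.append(ptr[path[-1]])
    return "".join(reversed(path))


print(viterbi("566611234"))
```

Working with sums of logarithms instead of products of probabilities avoids numerical underflow on long sequences.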
Next, run the Viterbi program on the sequence from today's handout exercise, "566611234". Can you reproduce the output?
Next, run the sequence "31245366664" and look at the output. How does it compare to casino.seq_vit?
Do the same for the sequence "34512331245366664666563266". How does the output compare to casino.seqlong_vit?
Download the code as a Python program, and add a command-line parser with the following option:
optional arguments:
  -h, --help         show this help message and exit
  -i INPUT_SEQUENCE  Input file
Then try the code with the two files "casino.seq" and "casino.seqlong" in the data/HMM directory.
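A parser with this option could be sketched with the standard argparse module as follows. The function names build_parser and read_sequence are hypothetical, and the assumption that the input file holds the sequence as plain text (possibly split over several lines) should be checked against the actual data files.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Viterbi decoding of a sequence")
    # dest="input_sequence" produces the INPUT_SEQUENCE placeholder in the help text
    parser.add_argument("-i", dest="input_sequence", help="Input file")
    return parser


def read_sequence(path):
    """Assumed file format: the sequence as plain text; lines are joined."""
    with open(path) as handle:
        return "".join(line.strip() for line in handle)


# An explicit argument list is parsed here for demonstration; a script would
# call parse_args() without arguments so that sys.argv is used instead.
args = build_parser().parse_args(["-i", "casino.seq"])
print(args.input_sequence)
```

argparse adds the -h/--help option automatically, so only -i has to be declared.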
Now open the posterior.ipynb program. Spend some time to make sure you understand the structure of the program. Fill in the missing code (XXXXXX).
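As a reminder of what the missing code has to compute, the forward and backward recursions and the posterior can be written as below. The notation is an assumption (a common textbook convention, not taken from the notebook): a_{kl} is the transition probability from state k to state l, e_l(x_i) is the emission probability of observation x_i in state l, L is the sequence length, and no explicit end state is modelled.

```latex
\begin{align}
f_l(i) &= e_l(x_i) \sum_k f_k(i-1)\, a_{kl} \\
b_k(i) &= \sum_l a_{kl}\, e_l(x_{i+1})\, b_l(i+1), \qquad b_k(L) = 1 \\
P(x)   &= \sum_k f_k(L) \\
P(\pi_i = k \mid x) &= \frac{f_k(i)\, b_k(i)}{P(x)}
\end{align}
```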
Test your posterior decoding code on the two input sequences from before, 31245366664 and 34512331245366664666563266.
How does the output compare to casino.seq_post and casino.seqlong_post?
Make sure you understand how the posterior decoding output for the same observation changes depending on the flanking observations.
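The effect of flanking observations can be explored with a small forward/backward sketch like the one below, which compares a six surrounded by other faces with a six surrounded by sixes. The model parameters, the uniform start probabilities, the two test sequences and the function name posterior_loaded are assumptions made for this illustration, not the template's code or data.

```python
# Assumed two-state casino model (values are illustrative only).
states = "FL"
start = {"F": 0.5, "L": 0.5}
trans = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
emit = {
    "F": {c: 1 / 6 for c in "123456"},
    "L": {**{c: 0.1 for c in "12345"}, "6": 0.5},
}


def posterior_loaded(seq):
    """Return P(state = L at position i | whole sequence) for every position i."""
    # Forward pass: fwd[i][s] = P(x_1..x_i, state at i is s)
    fwd = [{s: start[s] * emit[s][seq[0]] for s in states}]
    for x in seq[1:]:
        fwd.append({s: emit[s][x] * sum(fwd[-1][r] * trans[r][s] for r in states)
                    for s in states})

    # Backward pass: bwd[i][s] = P(x_{i+1}..x_L | state at i is s)
    bwd = [{s: 1.0 for s in states}]
    for x in reversed(seq[1:]):
        bwd.insert(0, {s: sum(trans[s][r] * emit[r][x] * bwd[0][r] for r in states)
                       for s in states})

    p_x = sum(fwd[-1].values())  # total probability of the sequence
    return [fwd[i]["L"] * bwd[i]["L"] / p_x for i in range(len(seq))]


isolated = posterior_loaded("1236123")[3]   # the six has non-six neighbours
flanked = posterior_loaded("6666666")[3]    # the six has six neighbours
print(f"six among non-sixes: P(L) = {isolated:.3f}")
print(f"six among sixes:     P(L) = {flanked:.3f}")
```

Plain probabilities are fine for sequences this short; for long sequences the forward and backward values underflow, and scaling or log-space arithmetic is needed.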
Download the code as a Python program, and add a command-line parser with the following options:
optional arguments:
  -h, --help         show this help message and exit
  -i INPUT_SEQUENCE  Input file
  -s STATE           State
Then test the program on the "casino.seq" and "casino.seqlong" files in the data/HMM directory.
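The additional -s option could be sketched as below. Interpreting STATE as an integer index into the list of states is an assumption for illustration, as is the state order; the exercise may intend a state name instead.

```python
import argparse

parser = argparse.ArgumentParser(description="Posterior decoding of a sequence")
parser.add_argument("-i", dest="input_sequence", help="Input file")
# Assumption: the state is selected by its integer index
parser.add_argument("-s", dest="state", type=int, help="State")

# An explicit argument list is parsed here for demonstration; a script would
# call parse_args() without arguments so that sys.argv is used instead.
args = parser.parse_args(["-i", "casino.seqlong", "-s", "1"])

states = ["F", "L"]  # hypothetical state order
print(args.input_sequence, states[args.state])
```

With type=int, argparse rejects a non-numeric value for -s with a usage error before any decoding starts.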
Now you are done. Remember to upload the two Python programs via DTU-Learn.