Pairwise alignment
Description
Aligning sequences is of great importance in bioinformatics. Many discoveries are based on finding sequences that align to each other, and the study of evolution and phylogeny relies heavily on sequence alignments. This project is about implementing a well-known algorithm for aligning two sequences, i.e. finding the optimal way of matching them against each other.
You must implement one of the following:
- Smith-Waterman alignment, where the goal is to find the best local alignment of the two input sequences, i.e. the highest-scoring alignment between a region of one sequence and a region of the other. It need not span either sequence in full.
- Needleman-Wunsch alignment, where the goal is to find the best global alignment of the two input sequences, i.e. the highest-scoring alignment that spans both sequences from end to end.
- Or both if you are cool :-)
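Both algorithms fill the same dynamic-programming table. Smith-Waterman additionally clamps every cell at zero and traces back from the highest-scoring cell instead of the bottom-right corner. Below is a minimal Python sketch of both, assuming a simple scoring scheme. The function name `align`, the `local` flag and the scores (match +2, mismatch -1, linear gap -2) are illustrative choices, not part of the assignment; a real solution would probably use a substitution matrix and possibly affine gap penalties.

```python
from typing import List, Tuple


def align(a: str, b: str, local: bool = False, match: int = 2,
          mismatch: int = -1, gap: int = -2) -> Tuple[int, str, str, int, int]:
    """Align a and b; return (score, aligned_a, aligned_b, start_a, start_b).

    local=False gives a Needleman-Wunsch global alignment, local=True a
    Smith-Waterman local alignment. start_a/start_b are 0-based offsets of
    the first aligned residue in a and b (always 0 for a global alignment).
    Scoring values are illustrative defaults, not prescribed by the assignment.
    """
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        # Simple match/mismatch scoring; a substitution matrix could go here.
        return match if x == y else mismatch

    # H[i][j] = best score of an alignment ending at a[i-1] and b[j-1].
    H: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # Global alignment: leading gaps are penalised.
        for i in range(1, n + 1):
            H[i][0] = i * gap
        for j in range(1, m + 1):
            H[0][j] = j * gap

    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score = max(H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                        H[i - 1][j] + gap,   # a[i-1] aligned to a gap in b
                        H[i][j - 1] + gap)   # b[j-1] aligned to a gap in a
            if local:
                score = max(score, 0)  # Smith-Waterman: restart instead of going negative
                if score > best:
                    best, best_pos = score, (i, j)
            H[i][j] = score

    # Traceback: from the bottom-right corner (global) or the best cell (local).
    i, j = best_pos if local else (n, m)
    total = H[i][j]
    out_a: List[str] = []
    out_b: List[str] = []
    while i > 0 or j > 0:
        if local and H[i][j] == 0:
            break  # the local alignment starts here
        if i > 0 and j > 0 and H[i][j] == H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return total, "".join(reversed(out_a)), "".join(reversed(out_b)), i, j
```

With these defaults, `align("ACGT", "AGT")` returns `(4, "ACGT", "A-GT", 0, 0)` and `align("TTACGTTT", "GGACGTGG", local=True)` returns `(8, "ACGT", "ACGT", 2, 2)`. When several paths give the same score, the traceback prefers a diagonal step, then a gap in the second sequence, then a gap in the first, so only one of the optimal alignments is reported. The full table uses memory proportional to the product of the sequence lengths.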
Input and output
The input is a FASTA file containing the two sequences to be aligned.
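Reading the input needs only a small FASTA parser. The sketch below assumes that each record starts with a `>` header line, that sequence lines may be wrapped, and that letters should be upper-cased. The helper names `parse_fasta` and `read_two_sequences` are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA-formatted lines into (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line.upper())  # a sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def read_two_sequences(path: str) -> Tuple[Tuple[str, str], Tuple[str, str]]:
    """Read a FASTA file and check that it holds exactly two sequences."""
    with open(path) as handle:
        records = parse_fasta(handle)
    if len(records) != 2:
        raise ValueError(f"expected 2 sequences in {path}, found {len(records)}")
    return records[0], records[1]
```

Taking an iterable of lines rather than a file name keeps `parse_fasta` easy to test without touching the file system.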
The output should be the best alignment, clearly indicating where it lies in each of the two input sequences (e.g. its start and end positions).
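One way to show where the alignment lies is to print it in fixed-width blocks, with 1-based start and end coordinates on every line and a `|` under identical residues. This is similar in spirit to BLAST's pairwise text output. The helper `format_alignment`, its `Seq1`/`Seq2` labels and the block width of 60 are assumptions for illustration.

```python
from typing import List


def format_alignment(top: str, bottom: str, start_top: int = 0,
                     start_bottom: int = 0, width: int = 60) -> str:
    """Render two aligned strings ('-' = gap) with 1-based coordinates.

    start_top/start_bottom are 0-based offsets of the first aligned residue
    in each original sequence (0 for a global alignment).
    """
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    lines: List[str] = []
    pos_t, pos_b = start_top, start_bottom  # residues consumed so far
    for k in range(0, len(top), width):
        t, b = top[k:k + width], bottom[k:k + width]
        # '|' marks identical residues; gaps and mismatches get a space.
        mid = "".join("|" if x == y and x != "-" else " " for x, y in zip(t, b))
        n_t = sum(c != "-" for c in t)  # residues of each sequence in this block
        n_b = sum(c != "-" for c in b)
        lines.append(f"Seq1 {pos_t + 1:>6} {t} {pos_t + n_t}")
        lines.append(f"     {'':>6} {mid}")
        lines.append(f"Seq2 {pos_b + 1:>6} {b} {pos_b + n_b}")
        lines.append("")
        pos_t += n_t
        pos_b += n_b
    return "\n".join(lines).rstrip("\n")
```

For example, `print(format_alignment("ACGT", "A-GT"))` reports residues 1 to 4 of the first sequence and 1 to 3 of the second. The FASTA headers could replace the `Seq1`/`Seq2` labels.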
Note: Pairwise alignment works for both DNA and protein sequences.
Examples of program execution:
align.py <fastafile>
align.py fastafile.fsa
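A possible command-line front end for the invocation above, using the standard-library `argparse` module. The `--mode` option is a hypothetical addition for solutions that implement both algorithms. The actual reading, aligning and printing is left as a placeholder.

```python
import argparse
import os
import sys
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        prog="align.py",
        description="Align the two sequences in a FASTA file.")
    parser.add_argument("fastafile", help="FASTA file with exactly two sequences")
    # Hypothetical option, only relevant if both algorithms are implemented.
    parser.add_argument("--mode", choices=["global", "local"], default="global",
                        help="global = Needleman-Wunsch, local = Smith-Waterman")
    return parser.parse_args(argv)


def main(argv: Optional[List[str]] = None) -> int:
    args = parse_args(argv)
    if not os.path.isfile(args.fastafile):
        print(f"align.py: cannot open {args.fastafile}", file=sys.stderr)
        return 1
    # Placeholder: read the two sequences, align them and print the result.
    print(f"Aligning the sequences in {args.fastafile} ({args.mode} alignment)")
    return 0
```

In the script itself, `main` would typically be called as `sys.exit(main())` under an `if __name__ == "__main__":` guard, so `align.py fastafile.fsa` runs a global alignment and `align.py fastafile.fsa --mode local` a local one.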
Details
Fasta file: Similar DNA sequences coding for insulin.
Wikipedia: Smith-Waterman alignment.
Wikipedia: Needleman-Wunsch alignment.
Google: book on alignment.
Note: Investigate substitution matrices such as BLOSUM (https://en.wikipedia.org/wiki/BLOSUM) and PAM (https://en.wikipedia.org/wiki/Point_accepted_mutation).
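A substitution matrix replaces the single match/mismatch score with a score for every pair of residues. BLOSUM and PAM are protein matrices. Recent Biopython versions can load published ones with `Bio.Align.substitution_matrices.load("BLOSUM62")`. To show the idea without external data, the sketch below builds a small DNA matrix whose values are made up for illustration: identities +2, transitions (A<->G, C<->T) -1, transversions -2, and a gap penalty of -3. The names `dna_matrix` and `score_alignment` are hypothetical.

```python
from typing import Dict, Tuple

Matrix = Dict[Tuple[str, str], int]

BASES = "ACGT"
PURINES = {"A", "G"}  # C and T are pyrimidines


def dna_matrix(identity: int = 2, transition: int = -1,
               transversion: int = -2) -> Matrix:
    """Build an illustrative DNA substitution matrix (values are made up)."""
    matrix: Matrix = {}
    for x in BASES:
        for y in BASES:
            if x == y:
                matrix[(x, y)] = identity
            elif (x in PURINES) == (y in PURINES):
                matrix[(x, y)] = transition    # purine<->purine or pyrimidine<->pyrimidine
            else:
                matrix[(x, y)] = transversion  # purine<->pyrimidine
    return matrix


def score_alignment(top: str, bottom: str, matrix: Matrix, gap: int = -3) -> int:
    """Score an existing alignment column by column; '-' marks a gap."""
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += gap
        else:
            total += matrix[(x, y)]
    return total
```

For example, `score_alignment("ACGT", "A-GA", dna_matrix())` gives 2 - 3 + 2 - 2 = -1. Inside a dynamic-programming aligner, the same lookup `matrix[(x, y)]` would simply take the place of the fixed match/mismatch score.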