Pairwise alignment

Description

Sequence alignment is a core technique in bioinformatics. Many discoveries rest on finding sequences that align to each other, and both evolutionary theory and phylogeny rely on sequence alignments. In this project you implement a well-known algorithm for aligning two sequences, i.e. finding the arrangement in which they match optimally under a given scoring scheme.

You must choose to implement either:

Smith-Waterman alignment, where the goal is to find the best local alignment of the two sequences given as input, i.e. the highest-scoring alignment between a region of one sequence and a region of the other.
Needleman-Wunsch alignment, where the goal is to find the best global alignment of the two sequences given as input, i.e. the optimal alignment that covers both sequences from end to end.
Or both if you are cool :-)
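
To make the difference between the two choices concrete, here is a minimal sketch of both dynamic-programming recurrences. It only computes the optimal score; the traceback that recovers the alignment itself is left out. The function names and the scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for illustration and are not prescribed by the project.

```python
def score_pair(a, b, match=1, mismatch=-1):
    """Score one aligned pair of residues (illustrative values only)."""
    return match if a == b else mismatch


def needleman_wunsch_score(s, t, gap=-2):
    """Best global score: both sequences are aligned end to end."""
    rows, cols = len(s) + 1, len(t) + 1
    H = [[0] * cols for _ in range(rows)]
    # In a global alignment, leading gaps cost the full gap penalty.
    for i in range(1, rows):
        H[i][0] = i * gap
    for j in range(1, cols):
        H[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(
                H[i - 1][j - 1] + score_pair(s[i - 1], t[j - 1]),  # pair residues
                H[i - 1][j] + gap,                                 # gap in t
                H[i][j - 1] + gap,                                 # gap in s
            )
    # The global score is read from the bottom-right cell.
    return H[-1][-1]


def smith_waterman_score(s, t, gap=-2):
    """Best local score: the best-matching regions of the two sequences."""
    rows, cols = len(s) + 1, len(t) + 1
    # First row and column stay 0: a local alignment may start anywhere.
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(
                0,                                                 # restart here
                H[i - 1][j - 1] + score_pair(s[i - 1], t[j - 1]),
                H[i - 1][j] + gap,
                H[i][j - 1] + gap,
            )
            # The local score is the largest cell anywhere in the matrix.
            best = max(best, H[i][j])
    return best
```

The two functions differ in three places: how the first row and column are initialised, whether a cell may be reset to 0, and where the final score is read. Recovering the alignment means remembering which of the candidates won in each cell and walking back from the final cell.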

Input and output

The input is a FASTA file containing the two sequences to be aligned.
The output should be the best alignment, with clear notation showing where it lies in each of the two input sequences.
Note: Pairwise alignment works for both DNA and protein sequences.
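
One possible way to handle the input and output is sketched below. `read_fasta` and `format_alignment` are hypothetical helper names, and the three-line layout with 1-based start and end positions is just one reasonable reading of "clear notation", not a required format.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence data may span several lines; normalise to upper case.
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def format_alignment(name_s, name_t, aligned_s, aligned_t, start_s, start_t):
    """Render two gapped strings with 1-based start and end positions."""
    # '-' marks a gap and does not consume a position in the sequence.
    end_s = start_s + len(aligned_s.replace("-", "")) - 1
    end_t = start_t + len(aligned_t.replace("-", "")) - 1
    # '|' under identical residues, a blank under mismatches and gaps.
    match_line = "".join(
        "|" if a == b and a != "-" else " "
        for a, b in zip(aligned_s, aligned_t)
    )
    width = max(len(name_s), len(name_t))
    return "\n".join([
        f"{name_s:<{width}} {start_s:>5} {aligned_s} {end_s}",
        f"{'':<{width}} {'':>5} {match_line}",
        f"{name_t:<{width}} {start_t:>5} {aligned_t} {end_t}",
    ])
```

Passing an open file handle to `read_fasta` works because a file object iterates over its lines. The program should probably also check that the file holds exactly two records before aligning.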

Examples of program execution:

align.py <fastafile>
align.py fastafile.fsa
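
A minimal sketch of command-line handling that matches the invocation above, using the standard `argparse` module. The `--mode` option is a hypothetical addition for those who implement both algorithms; the project text only requires the FASTA file argument.

```python
import argparse


def build_parser():
    """Build the argument parser for align.py (a sketch, not a fixed interface)."""
    parser = argparse.ArgumentParser(
        prog="align.py",
        description="Align the two sequences found in a FASTA file.",
    )
    # The only argument the project text asks for.
    parser.add_argument("fastafile", help="FASTA file with exactly two sequences")
    # Hypothetical option, not part of the project text.
    parser.add_argument(
        "--mode",
        choices=["local", "global"],
        default="local",
        help="local = Smith-Waterman, global = Needleman-Wunsch",
    )
    return parser
```

Calling `build_parser().parse_args()` without arguments reads the command line; passing a list such as `["fastafile.fsa"]` is convenient for testing.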

Details

FASTA file: Similar DNA sequences coding for insulin.
Wikipedia: Smith-Waterman alignment.
Wikipedia: Needleman-Wunsch alignment.
Google: book on alignment.
Note: Investigate substitution matrices, https://en.wikipedia.org/wiki/BLOSUM and https://en.wikipedia.org/wiki/Point_accepted_mutation
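
A substitution matrix replaces the single match/mismatch score with a value for every pair of residues. The sketch below shows the idea with a small, made-up DNA matrix that penalises transitions (A<->G, C<->T) less than transversions. The numbers are assumptions for illustration only and are not taken from BLOSUM or PAM; for protein sequences you would load one of those published matrices instead.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def build_dna_matrix(match=2, transition=-1, transversion=-2):
    """Build a toy DNA substitution matrix keyed by (residue, residue)."""
    matrix = {}
    for a in "ACGT":
        for b in "ACGT":
            if a == b:
                matrix[a, b] = match
            elif {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
                # Same chemical class: a transition.
                matrix[a, b] = transition
            else:
                # Purine paired with pyrimidine: a transversion.
                matrix[a, b] = transversion
    return matrix


def score_ungapped(s, t, matrix):
    """Sum the matrix scores of two equal-length, gap-free sequences."""
    if len(s) != len(t):
        raise ValueError("sequences must have the same length")
    return sum(matrix[a, b] for a, b in zip(s, t))
```

In an alignment program the lookup `matrix[a, b]` would take the place of the fixed match/mismatch score inside the dynamic-programming recurrence.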
