June2018

Algorithms in Bioinformatics - #36625

Information for participants

General Schedule

Lectures will be in the morning from 9.00 - 12.00, and exercises in the afternoon from 13.00 - 17.00.

The morning sessions will consist of lectures and small practical exercises introducing the different algorithms, and the afternoon sessions will consist of programming exercises where the algorithms will be implemented.

The main programming language will be C, and all program templates provided in the course will be written in C. Prior knowledge of C programming is NOT required. However, basic programming skills are required to follow the course.

PROGRAMS AND TOOLS

  • To do the exercises on our server, you must be able to connect to it using Secure Shell (SSH) and tunnel X through the connection. For the prerequisites, see Tools for SSH and X11 and How to install MobaXterm.
  • If you have problems logging in to the CBS server, see Login problems.

Course Programme

Please note that the programme is updated on a regular basis. Click the 'refresh' button once in a while to make sure that you have the most up-to-date information.

LITERATURE:

  • The course curriculum consists of review papers and selected chapters from Immunological Bioinformatics, Lund et al., MIT Press, 2005. All course material will be made available online during the course; see Course material.

Monday, 2. January

Introduction to course, UNIX and C-programming crash course 101

Morten Nielsen
  • BACKGROUND TEXTS
C Programming course by Tom Macke.
  • 9.00 - 9.15
Introduction to course. [PDF]
  • 9.15 - 9.35
Introduction to the immune system. [PDF]
  • 9.35 - 9.45
Performance measures. [PDF]
  • 9.45 - 10.00
Coffee break
  • 10.00 - 12.00
UNIX crash course. A UNIX/Linux crash course. Answers to part 3 (gawk).
  • 12.00 - 13.00
Lunch
  • 13.00 - 17.00
Doing it on your local machine. C-programming crash course 101.
  • First part (loops, data structures, subroutines, variables, input/output): C-programming crash course, part 1.
  • Second part (linked lists, dynamic memory allocation, pointers): C-programming crash course, part 2.
  • Answers to C-programming exercises part 1 and 2

Tuesday, 3. January

Your first C program, Weight matrix (PSSM) construction, and Psi-Blast

Morten Nielsen
  • BACKGROUND TEXTS
Immunological Bioinformatics. MIT Press. Chapter 4.
  • 9.00 - 9.15
Questions about yesterday's lectures/exercise. A few notes on sequence alignment. [PDF]
  • 9.15 - 10.30
Some notes on command line parsing. [PDF]. Development of a first C program that uses command line parsing. C-programming - Part 3. Your first C program. Answers to C-programming exercise - Part 2.
  • 10.30 - 10.45
Blosum scoring matrices. [PDF]
  • 10.45 - 12.00
Weight matrix construction. [PDF]. Handout. Estimation of pseudo counts. Answer.
  • 12.00 - 13.00
Lunch
  • 13.00 - 17.00
Implementation of PSSM construction from pre-aligned sequences including pseudo count correction for low counts and sequence clustering
PSSM construction and evaluation
PSSM answers

Wednesday, 4. January

Sequence alignment and Dynamic programming

Morten Nielsen
  • BACKGROUND TEXTS
Immunological Bioinformatics. MIT Press. Chapter 3.
A general method applicable to the search for similarities in the amino acid sequence of two proteins. S. B. Needleman and C. D. Wunsch. J. Mol. Biol. (1970), 48.
Identification of common molecular subsequences. T. F. Smith and M. S. Waterman. J. Mol. Biol. (1981), 147.
An improved algorithm for matching biological sequences. O. Gotoh. J. Mol. Biol. (1982), 162.
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Altschul SF et al. Nucleic Acids Res. 1997 Sep 1;25(17):3389-402.


  • 9.00 - 9.15
Questions about yesterday's lectures/exercises
  • 9.15 - 11.00
Sequence alignment. [PDF]. Handout (O3). Handout (O2). Handout answers.
  • 11.00 - 12.00
Blast alignment heuristics, Psi-Blast, and sequence profiles. [PDF]. Psi-Blast handout.
  • 12.00 - 13.00
Lunch
  • 13.00 - 17.00
Implementation of the Smith-Waterman dynamic programming algorithm. Matrix dumps from alignment programs (to be used for debugging).
Answers to sequence alignment exercise
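
As a warm-up for the afternoon exercise, the following is a minimal sketch of the Smith-Waterman recursion, not the course template. It returns only the best local alignment score and does no traceback. The function name sw_score and the scoring values (match +2, mismatch -1, linear gap -2) are assumptions chosen for illustration; an actual implementation would typically score with a Blosum matrix and may use the affine gap penalties described by Gotoh.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical scoring parameters, for illustration only. */
#define MATCH     2
#define MISMATCH -1
#define GAP      -2

static int max2(int x, int y) { return x > y ? x : y; }

/* Best local alignment score of a and b (Smith-Waterman, linear gap
   penalty). Only two rows of the DP matrix are kept in memory.
   Returns -1 if memory allocation fails. */
int sw_score(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    size_t n = strlen(a), m = strlen(b);
    int *prev = calloc(m + 1, sizeof *prev);   /* row i-1, zero-initialised */
    int *curr = calloc(m + 1, sizeof *curr);   /* row i */
    int best = 0;

    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return -1;
    }

    for (size_t i = 1; i <= n; i++) {
        curr[0] = 0;                            /* first column is always 0 */
        for (size_t j = 1; j <= m; j++) {
            int s = (a[i - 1] == b[j - 1]) ? MATCH : MISMATCH;
            int h = prev[j - 1] + s;            /* align a[i-1] with b[j-1] */
            h = max2(h, prev[j] + GAP);         /* gap in b */
            h = max2(h, curr[j - 1] + GAP);     /* gap in a */
            h = max2(h, 0);                     /* local alignment: never below 0 */
            curr[j] = h;
            best = max2(best, h);
        }
        int *tmp = prev; prev = curr; curr = tmp;  /* reuse the two rows */
    }

    free(prev);
    free(curr);
    return best;
}
```

With these assumed scores, sw_score("ACGT", "AGT") is 4 (the aligned pair GT, or equivalently A-GT with one gap). Keeping the full matrix instead of two rows is what makes traceback, and comparison against the matrix dumps, possible.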

Thursday, 5. January

Data redundancy reduction algorithms, optimization methods, and Gibbs sampling

Morten Nielsen
  • BACKGROUND TEXTS
Immunological Bioinformatics. MIT Press. Chapter 3.
Selection of representative protein data sets. U. Hobohm et al. Protein Science. 1992.
Improved prediction of MHC class I and class II epitopes using a novel Gibbs sampling approach. Bioinformatics. 2004 Jun 12;20(9):1388-97.
Simultaneous alignment and clustering of peptide data using a Gibbs sampling approach. Bioinformatics. 2012 Oct 24. [Epub ahead of print]
  • 9.00 - 9.15
Questions about yesterday's lectures
  • 9.15 - 10.00
Data redundancy reduction algorithms (Hobohm1 and Hobohm2). [PDF]
  • 10.00 - 10.45
Optimization procedures - Gradient descent, Monte Carlo. Optimization procedures. [PDF]. GD handout.
  • 10.45 - 11.00
Break
  • 11.00 - 12.00
Gibbs sampling and Gibbs clustering. Gibbs sampling. [PDF]
  • 12.00 - 13.00
Lunch
  • 13.00 - 17.00
Hobohm redundancy reduction algorithms. Answers to Hobohm programming exercise. Implementation of a Gibbs sampling algorithm for prediction of MHC class II binding. Answers.
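
The Hobohm1 idea can be sketched as follows: walk through a pre-sorted list and accept a sequence only if it is not similar to any sequence already accepted. This is a minimal illustration, not the course template. The names similar and hobohm1, the restriction to equal-length peptides, and the fraction-identity similarity test are all assumptions; a real implementation would usually decide similarity from an alignment score.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical similarity test: two equal-length peptides count as
   similar when the fraction of identical positions is >= threshold. */
static int similar(const char *a, const char *b, double threshold)
{
    size_t len = strlen(a), same = 0;

    if (len == 0 || len != strlen(b))
        return 0;
    for (size_t i = 0; i < len; i++)
        if (a[i] == b[i])
            same++;
    return (double)same / (double)len >= threshold;
}

/* Hobohm1-style selection over seqs[0..n-1], assumed to be sorted
   already (best candidates first). Sets keep[i] to 1 if seqs[i] is
   accepted and 0 otherwise, and returns the number accepted. */
size_t hobohm1(const char *const *seqs, size_t n, double threshold, int *keep)
{
    assert(seqs != NULL && keep != NULL);

    size_t accepted = 0;
    for (size_t i = 0; i < n; i++) {
        keep[i] = 1;
        /* Compare only against sequences that were accepted earlier. */
        for (size_t j = 0; j < i; j++) {
            if (keep[j] && similar(seqs[i], seqs[j], threshold)) {
                keep[i] = 0;
                break;
            }
        }
        accepted += (size_t)keep[i];
    }
    return accepted;
}
```

Because each candidate is compared only against the accepted list, the result depends on the input order, which is why the list is sorted before the loop starts.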

Friday, 6. January

Hidden Markov Models

Morten Nielsen
  • BACKGROUND TEXTS
Immunological Bioinformatics. MIT Press. Chapter 3.
Notes by Anders Krogh.
What is a hidden Markov model by S. R. Eddy.
  • 9.00 - 9.15
Questions about yesterday's lectures
  • 9.15 - 11.00
Hidden Markov models (with a break around 10.30). Viterbi decoding, Forward/Backward algorithm, Posterior decoding, Baum-Welch learning. HMM slides [PDF]. Viterbi Handout. Answers. Forward Handout. Answers.
  • 11.00 - 12.00
Profile Hidden Markov Models.
  • 12.00 - 13.00
Lunch
  • 13.00 - 17.00
Implementation of Viterbi and posterior decoding. Hidden Markov exercises. Answer to Hidden Markov exercises.

Monday, 9. January

Cross validation and training of data driven prediction methods. Stabilization matrix method (SMM)

Morten Nielsen


  • BACKGROUND TEXTS
Immunological Bioinformatics. MIT Press. Chapter 4.
Peters et al., Bioinformatics. 2003 Sep 22;19(14):1765-72. ONLY pages 1-2.
  • 9.00 - 9.15
Questions about yesterday's lectures/exercise
  • 9.15 - 9.45
Cross validation and training of data driven prediction methods. Cross-validation, overfitting and method evaluation. [PDF]
  • 9.45 - 10.15
Stabilization matrix method (SMM) background. [PDF]. SMM handout.
  • 10.15 - 10.30
Break
  • 10.30 - 12.00
Implementing and evaluating SMM algorithms using cross-validation
  • 12.00 - 13.00
Lunch
  • 13.00 - 17.00
Continuation of exercise. Answers.

Tuesday, 10. January

Artificial neural networks - I. Sequence encoding and feedforward algorithm

Morten Nielsen


  • BACKGROUND TEXTS
Immunological Bioinformatics. MIT Press. Chapter 4.
Background
Sequence encoding
Feed forward algorithm
Back-propagation and neural network training
  • 9.00 - 9.15
Questions about yesterday's lectures/exercise
  • 9.15 - 10.30
Artificial neural networks. [PDF]. Handout.
  • 10.30 - 10.40
Break
  • 10.40 - 12.00
Network training - backpropagation. Training of artificial neural networks. [PDF]. Handout.
  • 12.00 - 13.00
Lunch
  • 13.00 - 17.00
Implementation of sequence encoding and feed forward algorithm. Network part 1 answers. Implementation of back-propagation and neural network training. Network part 2 answers.

Wednesday, 11. January

Project work and an introduction to the Theano artificial neural network library

Morten Nielsen


  • BACKGROUND TEXTS
Immunological Bioinformatics. MIT Press. Chapter 4.
  • 9.00 - 9.15
Questions about yesterday's lectures/exercise
  • 9.15 - 10.00
Description of potential projects and formation of groups. Project suggestions and descriptions.
  • 10.00 - 11.00
NNAlign, alignment using ANNs. [PDF]. Trick for ANN training. [PDF]
  • 11.00 - 12.00
The Lasagne artificial neural network library (Vanessa Jurtz) [PDF].
  • 12.00 - 13.00
Lunch
  • 13.00 - 17.00
Using the Lasagne Python library to construct ANN models (Vanessa Jurtz)
Exercise.
Google doc to share hyperparameters.

Thursday 12. - Wednesday 18. January.

Project work
No lectures. Projects must be submitted (in PDF format) via CampusNet on Wednesday, 18. January, at 23.59 at the latest.

Friday, 20. January. 8.30-17.50

Project evaluation and Exam (in building 208 room 062, where we had the classes)
Program exam