Queueing System

From 22112
Revision as of 10:24, 17 June 2024 by WikiSysop (talk | contribs) (→‎Material for the lesson)

Material for the lesson

Video: Introduction to the Queueing System
Video: Submitting jobs
Video: Queue control and practical advice
Powerpoint: Queueing System
IUPAC nucleotide codes: Read before doing exercise
Video: Exercises

Exercises

The exercises must work with the human genome in the FASTA file human.fsa; you can develop and test your program against the small-scale humantest.fsa file. Copy jobscript-template.sh and use it as a template for your runs with the big file. You must use the Queueing System and qsub.

  1. Make (or reuse) a program that reads a FASTA file, finds the complement strand for each entry, and saves the result in a new file. Keep it simple. Make sure it works. Speed is not important in this step.
  2. Speed is still not important. Add this functionality to your program: Count the bases and unknowns in the entry and add the counts to the header line, like >seq01 A:3450 T:45665 C:34576 G:142345 N:5462
  3. You need to increase the performance of your program. Experiment with various ideas for increasing the speed. You got some ideas in the last lecture, and you should also be using your Python knowledge. Document your experiments with a line or two as comments in your program. This exercise is likely the one that takes the most time to complete and the least time to run.
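As a starting point for steps 1 and 2, here is a minimal Python sketch, not a reference solution. The function names (read_fasta, annotate_entry, complement_file) and the 60-character output line width are hypothetical choices. It also makes three assumptions that the exercise text does not settle: the strand is complemented but not reversed, the counts are taken on the original entry, and every symbol other than A, T, C and G is counted as N.

```python
# Complement table for the IUPAC nucleotide codes, upper and lower case.
_IUPAC = "ACGTMRWSYKVHDBN"
_COMPL = "TGCAKYWSRMBDHVN"
COMPLEMENT = str.maketrans(_IUPAC + _IUPAC.lower(), _COMPL + _COMPL.lower())


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def annotate_entry(header, sequence):
    """Return (header with base counts, complemented sequence)."""
    seq = sequence.upper()
    counts = {base: seq.count(base) for base in "ATCG"}
    # Assumption: anything that is not A, T, C or G counts as unknown (N).
    unknown = len(seq) - sum(counts.values())
    new_header = "{} A:{} T:{} C:{} G:{} N:{}".format(
        header, counts["A"], counts["T"], counts["C"], counts["G"], unknown
    )
    # Assumption: complement only, the sequence is not reversed.
    return new_header, sequence.translate(COMPLEMENT)


def complement_file(in_path, out_path, width=60):
    """Write the annotated, complemented entries of in_path to out_path."""
    with open(in_path) as infile, open(out_path, "w") as outfile:
        for header, seq in read_fasta(infile):
            new_header, comp = annotate_entry(header, seq)
            outfile.write(new_header + "\n")
            for i in range(0, len(comp), width):
                outfile.write(comp[i:i + width] + "\n")
```

Calling complement_file("humantest.fsa", "humantest_compl.fsa") would process the test file; the output filename is only an example. The sketch holds one whole entry in memory at a time, which keeps it simple but is one of the things worth questioning in step 3.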

I solved this problem in 222 seconds on Computerome. However, run times vary: another run with the same code took 527 seconds, because I/O from other users can really affect your time. A "slow" version took over 1000 seconds on Computerome. On my laptop with an SSD, I solved the problem in 100 seconds.
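Because run times vary this much, a single measurement says little about whether an idea helped. One possible way to compare two variants is to time each several times and keep the best result, as in the sketch below. The helper time_best_of and the two counting functions are hypothetical illustrations, and the test string is synthetic data, not the human genome.

```python
import time


def count_with_loop(seq):
    """Count bases one character at a time in a Python loop."""
    counts = {"A": 0, "T": 0, "C": 0, "G": 0, "N": 0}
    for base in seq:
        if base in "ATCG":
            counts[base] += 1
        else:
            counts["N"] += 1
    return counts


def count_with_builtin(seq):
    """Count bases with str.count, which scans the string in C."""
    counts = {base: seq.count(base) for base in "ATCG"}
    counts["N"] = len(seq) - sum(counts.values())
    return counts


def time_best_of(func, *args, repeats=3):
    """Return the shortest wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


# Synthetic test data, small enough to run in well under a second.
test_seq = "ACGTN" * 100_000
print("loop:   ", time_best_of(count_with_loop, test_seq))
print("builtin:", time_best_of(count_with_builtin, test_seq))
```

Taking the minimum rather than the mean reduces the influence of disturbances such as I/O from other users, although on a shared system the numbers will still fluctuate between runs.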