Queueing System

Material for the lesson

Powerpoint: Queueing System
IUPAC nucleotide codes: Read before doing exercise

Exercises

The exercises must work with the human genome in the FASTA file human.fsa; you can develop and test your program against the small-scale humantest.fsa file. You must use the Queueing System and sbatch.
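As a rough illustration of the sbatch requirement, here is a minimal Python sketch that builds a batch script and hands it to sbatch. The function names, the script name complement.py, the output file name and all resource values are hypothetical placeholders. Your cluster may also require options not shown here (for example a partition or account), so check the lecture material for the settings that apply.

```python
import subprocess
from pathlib import Path


def build_job_script(command, job_name="complement", time_limit="01:00:00",
                     memory="4G", cpus=1):
    """Return the text of a batch script that runs `command`.

    All default resource values are placeholders, not course requirements.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --time={time_limit}",
        f"#SBATCH --mem={memory}",
        f"#SBATCH --cpus-per-task={cpus}",
        # %j is replaced by the job ID, so each run gets its own log file.
        f"#SBATCH --output={job_name}.%j.out",
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


def submit(script_text, path="job.sh"):
    """Write the script to `path`, submit it with sbatch, return sbatch's reply."""
    Path(path).write_text(script_text)
    result = subprocess.run(["sbatch", path], capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()


# Hypothetical usage on a machine where sbatch is available:
# script = build_job_script("python3 complement.py humantest.fsa out.fsa")
# print(submit(script))
```

Writing the job script by hand in an editor and running sbatch on it from the shell works just as well; the sketch only shows which pieces such a script typically contains.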

1. Make (or reuse) a program that reads a FASTA file, finds the complement strand for each entry, and saves the result in a new file. Keep it simple. Make sure it works. Speed is not important in this step.
2. Speed is still not important. Add this functionality to your program: count the bases and unknowns in each entry and add the counts to the header line, like >seq01 A:3450 T:45665 C:34576 G:142345 N:5462
3. You need to increase the performance of your program. Experiment with various ideas for increasing the speed. You got some ideas in the last lecture, and you should also use your Python knowledge. Document your experiments with a line or two as comments in your program. This exercise is likely the one that takes the longest to complete and the shortest to run.
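One possible starting point for the first two exercises is sketched below. It is not a reference solution, and it rests on several assumptions: the complement is not reversed, every symbol other than A, T, C and G (in either case) is counted as an unknown under N, and output lines are wrapped at 60 characters. All function names are made up for this example.

```python
# Complement table covering the IUPAC nucleotide codes, upper and lower case.
_SRC = "ACGTRYKMBDHVSWN"
_DST = "TGCAYRMKVHDBSWN"
COMPLEMENT = str.maketrans(_SRC + _SRC.lower(), _DST + _DST.lower())


def read_fasta(handle):
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_bases(seq):
    """Count A, T, C and G case-insensitively; everything else goes under N."""
    upper = seq.upper()
    counts = {base: upper.count(base) for base in "ATCG"}
    counts["N"] = len(upper) - sum(counts.values())
    return counts


def complement_entry(header, seq, width=60):
    """Return one FASTA entry: header with counts, then the complement."""
    counts = count_bases(seq)
    tags = " ".join(f"{base}:{counts[base]}" for base in "ATCGN")
    comp = seq.translate(COMPLEMENT)
    lines = [f">{header} {tags}"]
    lines.extend(comp[i:i + width] for i in range(0, len(comp), width))
    return "\n".join(lines) + "\n"


def complement_fasta(in_handle, out_handle, width=60):
    """Read every entry from in_handle and write its complement to out_handle."""
    for header, seq in read_fasta(in_handle):
        out_handle.write(complement_entry(header, seq, width))


# Hypothetical usage:
# with open("humantest.fsa") as fin, open("complement.fsa", "w") as fout:
#     complement_fasta(fin, fout)
```

The sketch holds one whole entry in memory at a time, which is simple but not necessarily fast or memory-friendly for full chromosomes. That trade-off is the kind of thing the third exercise asks you to experiment with.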

I solved this problem in 222 seconds on a server; however, times vary: another run of the same code took 527 seconds. I/O from other users can really affect your time. A "slow" version took over 1000 seconds on a server. On my laptop with an SSD, I solved the problem in 100 seconds.
