What affects performance

From 22112
Latest revision as of 10:41, 17 June 2024


Material for the lesson

Video: File Systems
Video: Improve performance
Video: Random access in Python
Powerpoint: Efficiency
Resource: Example code - memoization
Video: What is expected in exercises

Exercises

1)
Make a program that indexes a fasta file, that is, finds all the positions in the file where a header starts and ends and where a sequence starts and ends, giving 4 numbers per entry.
The result should be printed like this (the first lines for human.fsa are shown):

0 71 72 253105767
253105768 253105839 253105840 499335927
499335928 499335999 499336000 700936484
700936485 700936556 700936557 894321354
894321355 894321426 894321427 1078885323

Use the *.fsa files for practice.
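A minimal line-based sketch of the indexer, assuming the convention the sample output suggests: each end value is the byte offset of the last byte (the final newline) of the header or sequence, so the next part starts at end + 1 (header 0-71 is followed by sequence start 72). Opening the file in binary mode makes `len(line)` an exact byte count, so the position can be tracked by adding up line lengths. The name `index_fasta` is only an illustrative choice.

```python
import sys

def index_fasta(path):
    """Return (header_start, header_end, seq_start, seq_end) tuples.

    Assumed convention (inferred from the sample output): every end value is
    the byte offset of that part's last byte, so the next part starts at end + 1.
    """
    entries = []
    pos = 0          # byte offset of the line about to be read
    current = None   # [header_start, header_end, seq_start] of the open entry
    with open(path, "rb") as fh:          # binary mode: len(line) counts bytes
        for line in fh:
            if line.startswith(b">"):
                if current is not None:
                    # the previous entry ends on the byte before this header
                    entries.append((*current, pos - 1))
                current = [pos, pos + len(line) - 1, pos + len(line)]
            pos += len(line)
    if current is not None:
        entries.append((*current, pos - 1))   # last entry ends at the last byte
    return entries

if __name__ == "__main__":
    for entry in index_fasta(sys.argv[1]):
        print(*entry)
```

Because every line passes through the Python loop, this version is a natural baseline to time against in exercise 3.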

2)
Make a program that takes a fasta file and the 4 numbers from above on the command line, and reads and prints that single entry.
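A sketch of the reading program, assuming the same end-offset convention as the sample output above (each end is the offset of the part's last byte). `seek()` jumps straight to the entry, so only the requested bytes are read no matter how large the file is. `read_entry` is a hypothetical name; the raw bytes are written to `sys.stdout.buffer`, so the sequence keeps its original line breaks and nothing needs decoding.

```python
import sys

def read_entry(path, header_start, header_end, seq_start, seq_end):
    """Return the raw bytes of one entry's header and sequence (inclusive ends)."""
    with open(path, "rb") as fh:
        fh.seek(header_start)                          # jump directly to the header
        header = fh.read(header_end - header_start + 1)
        fh.seek(seq_start)                             # then to the sequence
        sequence = fh.read(seq_end - seq_start + 1)
    return header, sequence

if __name__ == "__main__":
    if len(sys.argv) != 6:
        sys.exit(f"usage: python {sys.argv[0]} file.fsa header_start header_end seq_start seq_end")
    header, sequence = read_entry(sys.argv[1], *map(int, sys.argv[2:6]))
    sys.stdout.buffer.write(header + sequence)         # write bytes unchanged
```

With the second line of the sample output, the arguments would be `human.fsa 253105768 253105839 253105840 499335927`.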

3)
Think about speed. Add ways to measure performance: instead of the real profiling tools from lecture 3, time the code using Python's time module.
Make the programs from 1 and 2 faster. Program 1 can be tricky, because the fast ways of reading data do not give precise information about the file pointer (the current byte position in the file).
Reading the file in chunks may be a good approach.
Consider that some fasta files have small entries and others have large entries. What impact does that have on your method?
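One way to follow the chunk-reading hint, sketched under assumptions: the file is read in binary chunks of 1 MiB (an arbitrary size worth experimenting with), and header starts are found with `bytes.find(b"\n>")`, which scans in C instead of a Python loop. The byte before each chunk is prepended to the search so a newline/`>` pair split across a chunk boundary is not missed, and a header line cut off at a boundary is completed in the next chunk. It should produce the same numbers as a line-by-line version. On the small-versus-large question: here the Python-level work grows with the number of entries, while long sequence stretches are scanned inside `find`, so files with many small entries may gain less. `index_fasta_chunked` is a hypothetical name; the timing uses `time.perf_counter()` and goes to stderr so it does not mix with the index.

```python
import sys
import time

def index_fasta_chunked(path, chunk_size=1 << 20):
    """Index a fasta file by scanning large binary chunks.

    Returns (header_start, header_end, seq_start, seq_end) tuples, with each end
    being the offset of that part's last byte (assumed convention).
    """
    entries = []
    current = None   # [header_start, header_end]; header_end is None until its newline is seen
    offset = 0       # file offset of chunk[0]
    prev = b"\n"     # last byte of the previous chunk; "\n" lets a '>' at offset 0 count
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            # A header line may have been cut off at the previous chunk boundary.
            if current is not None and current[1] is None:
                nl = chunk.find(b"\n")
                if nl != -1:
                    current[1] = offset + nl
            # Prepend the previous byte so a "\n>" split across chunks is still found.
            search = prev + chunk
            k = search.find(b"\n>")
            while k != -1:
                start = offset + k            # the '>' is search[k + 1], i.e. chunk[k]
                if current is not None:
                    entries.append((current[0], current[1], current[1] + 1, start - 1))
                nl = chunk.find(b"\n", k + 1)  # end of the new header, if in this chunk
                current = [start, offset + nl if nl != -1 else None]
                k = search.find(b"\n>", k + 1)
            prev = chunk[-1:]
            offset += len(chunk)
    if current is not None:
        header_end = current[1] if current[1] is not None else offset - 1
        entries.append((current[0], header_end, header_end + 1, offset - 1))
    return entries

if __name__ == "__main__":
    t0 = time.perf_counter()
    index = index_fasta_chunked(sys.argv[1])
    elapsed = time.perf_counter() - t0
    for entry in index:
        print(*entry)
    print(f"indexed {len(index)} entries in {elapsed:.2f} s", file=sys.stderr)
```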

For the record, I could index human.fsa in 2.5 seconds on computerome and in 6.5 seconds on my laptop. I would have expected computerome to be slower.
There can be many reasons for the speed difference, but the main one here is that computerome has more memory, and therefore more file buffers. It had simply kept the file in memory, so when I repeated the test it only needed to read from disk the first time.
Another test on computerome gave 34.6 seconds for indexing when the file was not in the file buffers and 1.6 seconds when it was.
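To reproduce a cold-versus-warm comparison like this one, a sketch along the following lines may help. It asks the operating system to evict the file from its page cache with `os.posix_fadvise(..., os.POSIX_FADV_DONTNEED)` and then times several full reads with `time.perf_counter()`. This is only advisory and Unix-specific: the kernel may keep some pages cached, and where `os.posix_fadvise` is not available the first read is simply timed as it is. `read_all` stands in for whichever indexing program you want to measure; all helper names are hypothetical.

```python
import os
import sys
import time

def read_all(path, chunk_size=1 << 20):
    """Read the whole file in binary chunks; return the number of bytes read."""
    total = 0
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                return total
            total += len(chunk)

def drop_from_cache(path):
    """Ask the OS to evict the file from the page cache (advisory, Unix only)."""
    if not hasattr(os, "posix_fadvise"):
        return False
    fd = os.open(path, os.O_RDONLY)
    try:
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)  # length 0 = to end of file
        return True
    except OSError:
        return False
    finally:
        os.close(fd)

def time_reads(path, repeats=3):
    """Time one read right after a cache-drop request, then `repeats` more reads."""
    drop_from_cache(path)
    timings = []
    for _ in range(repeats + 1):
        t0 = time.perf_counter()
        read_all(path)
        timings.append(time.perf_counter() - t0)
    return timings

if __name__ == "__main__":
    first, *rest = time_reads(sys.argv[1])
    print(f"after cache drop: {first:.2f} s")
    for i, t in enumerate(rest, 1):
        print(f"repeat {i}: {t:.2f} s")
```

If the first timing is not clearly slower than the repeats, the file was probably still cached, which is exactly the effect described above.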