Denovo solution
Revision as of 15:49, 28 November 2024
Q1. Illumina
Q1A. The discarded file contains reads that are too short. The pair1 and pair2 files contain the read pairs where both reads passed trimming, and the singleton file contains reads whose mate was discarded.
Q2. Around 84
Q3. N = (M*L)/(L-K+1) = (84*99)/(99-15+1) = 97.84. Genome_size = T/N = (213721367+212523694)/97.84 ≈ 4.36 Mb
Q4. Mean = 259 ; SD = 11
Q5. It is higher: the N50 is 179846, compared with 92020 for the best assembly we found, at k=79.
Q6. 1
Q7. Repeat regions + misassemblies
Q8. Contaminations + misassemblies
Q9. Because we use the reference genome as the truth, it may be hard to distinguish a misassembly from true variation relative to the reference genome.
Q10. This is just visual, but it seems that a lot of the reference genome is covered by our assembly, so yes.
Q11. Very few, and K119.81 only maps partially. This could be a sequence that is in our strain but not in the reference genome, or a misassembly.
Q12. This is a region with a lot of repeats, which is also why we can't really assemble it. It is used by V. cholerae to integrate new genes into its genome.
Q13. The Nanopore assembly has only 2 contigs, and the PacBio assembly has 1!
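The genome size estimate in Q3 can be reproduced with a few lines of Python. This is a minimal sketch, not part of the exercise itself. It assumes M is the k-mer coverage peak (the value from Q2), L the read length, K the k-mer size and T the total number of sequenced bases; the function and parameter names are made up for illustration.

```python
def estimate_genome_size(kmer_peak, read_length, k, total_bases):
    """Estimate base coverage N and genome size from a k-mer coverage peak.

    Symbol meanings are assumptions (they are not defined in the answers):
    kmer_peak = M, read_length = L, k = K, total_bases = T.
    """
    # N = (M * L) / (L - K + 1): convert k-mer coverage to base coverage
    base_coverage = (kmer_peak * read_length) / (read_length - k + 1)
    # Genome_size = T / N
    genome_size = total_bases / base_coverage
    return base_coverage, genome_size


# Numbers taken from Q3
coverage, genome_size = estimate_genome_size(
    kmer_peak=84,
    read_length=99,
    k=15,
    total_bases=213721367 + 212523694,
)
print(f"Base coverage N: {coverage:.2f}")
print(f"Estimated genome size: {genome_size / 1e6:.2f} Mb")
```

Keeping the calculation in a function makes it easy to see how sensitive the estimate is to the peak read off the k-mer histogram, for example by trying 83 or 85 instead of 84.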
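The N50 values compared in Q5 are normally reported by the assembly statistics tool, but the definition is simple enough to sketch. The code below is an illustration using the common definition (the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly length). The function name and the toy contig lengths are made up and are not data from the exercise.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running_total = 0
    for length in lengths:
        running_total += length
        # The first contig that brings the running total to half of the
        # assembly length defines the N50.
        if running_total >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Toy example (hypothetical contig lengths, not from the assembly)
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))
```

A higher N50 means that more of the assembly sits in long contigs, which is why it is used in Q5 to compare assemblies.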