Postprocess exercise answers

From 22126

Q1 Running:

java -jar /home/ctools/picard_2.23.8/picard.jar MarkDuplicates -I /home/projects/22126_NGS/exercises/dupremoval/ERR016028_chr20_sort.bam -M  ERR016028_chr20_sort_markdup.metrics.txt -O ERR016028_chr20_sort_markdup.bam

The log should state:


Marking 9798 records as duplicates.

Note that this number is very low, but that is only because we use very little data so that the exercise runs quickly.
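Besides the log, MarkDuplicates writes a metrics file (the -M argument). As an optional aid, here is a minimal Python sketch that reads the first metrics table from that file. It assumes the usual Picard layout: a line starting with "## METRICS CLASS", then a tab-separated header line, then one row per library, terminated by a blank line. The function name parse_picard_metrics is hypothetical; check the layout against your own metrics file.

```python
def parse_picard_metrics(text: str) -> list[dict[str, str]]:
    """Return the rows of the first metrics table in a Picard metrics file.

    Assumed layout: '## METRICS CLASS ...' line, a tab-separated header
    line, then one tab-separated row per library, ended by a blank line.
    """
    lines = text.splitlines()
    rows: list[dict[str, str]] = []
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            header = lines[i + 1].split("\t")
            for row in lines[i + 2:]:
                if not row.strip():
                    break  # a blank line ends the metrics table
                rows.append(dict(zip(header, row.split("\t"))))
            break  # only the first table is needed; a histogram may follow
    return rows
```

For example, passing the contents of ERR016028_chr20_sort_markdup.metrics.txt and looking up the PERCENT_DUPLICATION column of the first row gives the duplication rate (as a string).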


Q2

They do not have the same sequence:

ERR016028.5947720  ACATGTGGCTAATTTTTTTTACTGTTGTGGAGAAAGGAGGAGGGAGAGGGGAGTCTCATTATCTTGCCCAGGCTAG
ERR016028.18808080 ACATGTGGCTAATTTTTTTTACTGTTGTGGAGAAAGGAGGAGGGAGAGGGGAGTCNCATTATCTTGCCCAGGCTAG

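Notice "TCTCA" vs. "TCNCA". Nevertheless, both reads have the same starting coordinate (45996739). Duplicate detection is based on alignment coordinates rather than identical sequence, so a single mismatch or N does not prevent a read from being flagged.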
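To illustrate why the sequences do not need to match, here is a deliberately simplified Python sketch of coordinate-based duplicate marking. It is not Picard's implementation: it treats reads as single-end, keys them on (chromosome, SAM POS, strand) instead of the unclipped 5' position and mate coordinates that Picard uses, and keeps the read with the highest sum of base qualities (similar in spirit to Picard's default scoring). The Read class and mark_duplicates function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Read:
    name: str
    chrom: str
    pos: int       # 1-based leftmost mapping position (SAM POS); simplification
    reverse: bool  # True if the read maps to the reverse strand
    quals: str     # Phred+33 encoded base qualities (SAM QUAL)


def mark_duplicates(reads: list[Read]) -> set[str]:
    """Return the names of reads that would be flagged as duplicates.

    Reads sharing chromosome, start position and strand form one group;
    the read with the highest sum of base qualities is kept and every
    other read in the group is reported as a duplicate. The sequence
    itself is never compared.
    """
    groups: dict[tuple[str, int, bool], list[Read]] = {}
    for read in reads:
        groups.setdefault((read.chrom, read.pos, read.reverse), []).append(read)

    duplicates: set[str] = set()
    for group in groups.values():
        best = max(group, key=lambda r: sum(ord(c) - 33 for c in r.quals))
        duplicates.update(r.name for r in group if r is not best)
    return duplicates
```

Two reads that start at the same coordinate on the same strand end up in the same group even if one of them contains an N, and only the lower-scoring one is reported.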

Q3

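ERR016028.18808080 is the read marked as a duplicate. It is the read whose flag (2nd field) changed from 163 to 1187. Since 1187 = 163 + 1024, and 1024 (0x400) is the bit meaning "read is PCR or optical duplicate", the change corresponds to marking it as a duplicate (see https://broadinstitute.github.io/picard/explain-flags.html).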
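The flag arithmetic can also be checked with a short Python sketch. The bit values below are the ones defined for the SAM FLAG field (the same ones the Picard page explains); the descriptions and the helper names decode_flag and is_duplicate are our own wording, not part of any library.

```python
# Bit values of the SAM FLAG field, with short descriptions.
SAM_FLAGS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails platform/vendor quality checks",
    0x400: "read is PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag: int) -> list[str]:
    """List the descriptions of every bit that is set in a SAM flag."""
    return [desc for bit, desc in SAM_FLAGS.items() if flag & bit]


def is_duplicate(flag: int) -> bool:
    """True if the duplicate bit (0x400 = 1024) is set."""
    return bool(flag & 0x400)
```

For example, decode_flag(163) returns ["read paired", "read mapped in proper pair", "mate reverse strand", "second in pair"], and decode_flag(1187) returns the same list plus "read is PCR or optical duplicate".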

Q4

The correct command is:

samtools merge 

If you choose:

samtools cat 

It will merely concatenate the files, so the records appear in the order file1, file2, file3, ... and the output will not necessarily be sorted by coordinate.

The full command should look something like this:

samtools merge -c --write-index HG00418_chr20_sort_markdup.bam ERR016028_chr20_sort_markdup.bam /home/projects/22126_NGS/exercises/dupremoval/ERR016025_chr20_sort_markdup.bam
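The difference between merge and cat can be shown with a toy Python sketch. Records here are hypothetical (position, read name) tuples on a single chromosome, and each input list is already sorted by position, as the markdup BAM files are. cat simply chains the inputs, while merge interleaves them with heapq.merge so the output stays sorted. This only mimics the ordering behaviour of the two samtools subcommands; it does not read BAM files.

```python
import heapq


def cat(*files):
    """Concatenate record lists in the order given (like samtools cat)."""
    return [record for f in files for record in f]


def merge(*files):
    """Merge position-sorted record lists into one sorted list (like samtools merge)."""
    return list(heapq.merge(*files, key=lambda record: record[0]))
```

With file1 = [(100, "a1"), (300, "a2")] and file2 = [(200, "b1"), (400, "b2")], cat gives positions 100, 300, 200, 400 (unsorted), whereas merge gives 100, 200, 300, 400.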

Q5

It is the RG tag, which stands for read group. You will see it at the end of each read line:

	RG:Z:ERR016025
	RG:Z:ERR016028

Reads tagged RG:Z:ERR016025 come from the file that was already stored in the exercise directory, while reads tagged RG:Z:ERR016028 come from the file you generated.
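To check how many reads came from each input, you can count the RG tags. Here is a minimal Python sketch that works on SAM text lines, for example the output of samtools view on the merged BAM. It relies on the SAM convention that the optional TAG:TYPE:VALUE fields start at the 12th tab-separated column; the function name count_read_groups is hypothetical.

```python
from collections import Counter


def count_read_groups(sam_lines):
    """Count alignment lines per read group ID (the value of RG:Z:)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines such as @HD, @SQ, @RG, @PG
        fields = line.rstrip("\n").split("\t")
        # Optional TAG:TYPE:VALUE fields start at column 12 (index 11).
        for tag in fields[11:]:
            if tag.startswith("RG:Z:"):
                counts[tag[len("RG:Z:"):]] += 1
                break
    return counts
```

Applied to the merged file, the result should contain two keys, ERR016025 and ERR016028, one per original BAM.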

Q6

Multiplexing

Q7

Demultiplexing