<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://teaching.healthtech.dtu.dk/22126/index.php?action=history&amp;feed=atom&amp;title=SNP_calling_exercise_part_1_answers</id>
	<title>SNP calling exercise part 1 answers - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://teaching.healthtech.dtu.dk/22126/index.php?action=history&amp;feed=atom&amp;title=SNP_calling_exercise_part_1_answers"/>
	<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22126/index.php?title=SNP_calling_exercise_part_1_answers&amp;action=history"/>
	<updated>2026-05-12T04:42:37Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.41.0</generator>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22126/index.php?title=SNP_calling_exercise_part_1_answers&amp;diff=150&amp;oldid=prev</id>
		<title>Gabre: Created page with &quot;&#039;&#039;&#039;Q1&#039;&#039;&#039;  First, running: &lt;pre&gt; tabix -f -p vcf NA24694.gvcf.gz &lt;/pre&gt; then &lt;pre&gt; gatk --java-options &quot;-Xmx10g&quot; HaplotypeCaller    -R /home/databases/references/human/GRCh38_full_analysis_set_plus_decoy_hla.fa    -I /home/projects/22126_NGS/exercises/snp_calling/NA24694.bam  -L chr20  -O NA24694.gvcf.gz  --dbsnp  /home/databases/databases/GRCh38/Homo_sapiens_assembly38.dbsnp138.vcf.gz  -ERC GVCF  &lt;/pre&gt;  &lt;pre&gt; gatk  GenotypeGVCFs -R /home/databases/references/human/GRCh3...&quot;</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22126/index.php?title=SNP_calling_exercise_part_1_answers&amp;diff=150&amp;oldid=prev"/>
		<updated>2024-12-15T15:27:53Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;&amp;#039;&amp;#039;&amp;#039;Q1&amp;#039;&amp;#039;&amp;#039;  First, running: &amp;lt;pre&amp;gt; tabix -f -p vcf NA24694.gvcf.gz &amp;lt;/pre&amp;gt; then &amp;lt;pre&amp;gt; gatk --java-options &amp;quot;-Xmx10g&amp;quot; HaplotypeCaller    -R /home/databases/references/human/GRCh38_full_analysis_set_plus_decoy_hla.fa    -I /home/projects/22126_NGS/exercises/snp_calling/NA24694.bam  -L chr20  -O NA24694.gvcf.gz  --dbsnp  /home/databases/databases/GRCh38/Homo_sapiens_assembly38.dbsnp138.vcf.gz  -ERC GVCF  &amp;lt;/pre&amp;gt;  &amp;lt;pre&amp;gt; gatk  GenotypeGVCFs -R /home/databases/references/human/GRCh3...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;&amp;#039;&amp;#039;&amp;#039;Q1&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
First, running:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tabix -f -p vcf NA24694.gvcf.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
then&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gatk --java-options &amp;quot;-Xmx10g&amp;quot; HaplotypeCaller    -R /home/databases/references/human/GRCh38_full_analysis_set_plus_decoy_hla.fa    -I /home/projects/22126_NGS/exercises/snp_calling/NA24694.bam  -L chr20  -O NA24694.gvcf.gz  --dbsnp  /home/databases/databases/GRCh38/Homo_sapiens_assembly38.dbsnp138.vcf.gz  -ERC GVCF &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gatk  GenotypeGVCFs -R /home/databases/references/human/GRCh38_full_analysis_set_plus_decoy_hla.fa -V NA24694.gvcf.gz  -L chr20 --dbsnp  /home/databases/databases/GRCh38/Homo_sapiens_assembly38.dbsnp138.vcf.gz -O NA24694.vcf.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
bcftools stats  NA24694.vcf.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Should give you:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
SN	0	number of SNPs:	75684&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
so 75684 SNPs.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Q2&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
First run:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tabix -p vcf NA24694.vcf.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tabix  NA24694.vcf.gz chr20:32000000-33000000 |wc -l&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
So 1290 variant sites.&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Q3&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
There are 2 ways to do this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
bcftools view -H --type=snps NA24694.vcf.gz  chr20:32000000-33000000 |wc -l &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tabix -h NA24694.vcf.gz chr20:32000000-33000000 |bcftools view -H --type=snps -  |wc -l&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Both will give you: 956.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Q4&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tabix NA24694.vcf.gz  chr20:32011209-32011209&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
You get:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
chr20	32011209	rs147652161	G	A	264.64	.	AC=1;AF=0.500;AN=2;BaseQRankSum=-3.010e-01;DB;DP=24;ExcessHet=3.0103;FS=3.949;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.00;QD=11.03;ReadPosRankSum=-3.580e-01;SOR=0.552	GT:AD:DP:GQ:PL	0/1:15,9:24:99:272,0,533&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The line above says that you probably have G and A and the site is heterozygous. G is the reference and A the alternative allele. The allele depth is 15Gs, 9As, the depth is 24, the genotype quality is 99, the PHRED genotype likelihoods are homo ref, hetero, homo alt: 272,0,533&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tabix NA24694.vcf.gz  chr20:32044279-32044279&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
You get:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
chr20	32044279	rs4525768	C	T	799.06	.	AC=2;AF=1.00;AN=2;DB;DP=21;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=28.99;SOR=0.990	GT:AD:DP:GQ:PL	1/1:0,21:21:63:813,63,0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The line above says that you are probably homozygous for T. C is the reference and T the alternative allele. The allele depth is 0Cs, 21Ts, the depth is 21, the genotype quality is 63, the PHRED genotype likelihoods are homo ref, hetero, homo alt: 813,63,0. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Q5&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Both are heterozygous sites however this is the better one:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
chr20	32974911	rs6088051	A	G	403.64	.	AC=1;AF=0.500;AN=2;BaseQRankSum=0.693;DB;DP=22;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=59.78;MQRankSum=-6.860e-01;QD=19.22;ReadPosRankSum=-1.703e+00;SOR=0.871	GT:AD:DP:GQ:PL	0/1:8,13:21:99:411,0,247&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and the worse one:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
chr20	64291638	rs369221086	C	T	30.64	.	AC=1;AF=0.500;AN=2;BaseQRankSum=-2.530e-01;DB;DP=6;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=28.18;MQRankSum=-9.670e-01;QD=5.11;ReadPosRankSum=-8.420e-01;SOR=0.693	GT:AD:DP:GQ:PL	0/1:4,2:6:38:38,0,114&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first site has a depth of 21, 8As and 13Gs. In the second case, the genotype quality is 38 and the first was 99. The second one only has 4Cs and 2 Ts for a total depth of 6 which is not sufficient to confidently call a heterozygous site. &lt;br /&gt;
&lt;br /&gt;
  &lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Q6&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Either use:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
bcftools view -H --types=snps  NA24694.vcf.gz chr20:32000000-33000000 |cut -f 3 |grep -v rs |wc -l&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
bcftools view -H --types=snps  NA24694.vcf.gz chr20:32000000-33000000 |cut -f 3 |grep &amp;quot;\.&amp;quot; |wc -l&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
you will get 17. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Q7&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
17 is very little compared to the number of SNPs 956. However, this is very expected. Given that the individual is Han Chinese, this ethnic group is very well represented in dbSNP.&lt;/div&gt;</summary>
		<author><name>Gabre</name></author>
	</entry>
</feed>