User contributions for WikiSysop
19 March 2024
- 16:48, 19 March 2024 diff hist 0 N File:Raw sampledepth.png No edit summary current
- 16:47, 19 March 2024 diff hist 0 N File:Raw richnessVSshannonZoom.png No edit summary current
- 16:47, 19 March 2024 diff hist 0 N File:Raw shannon.png No edit summary current
- 16:46, 19 March 2024 diff hist 0 N File:Raw richness.png No edit summary current
- 16:45, 19 March 2024 diff hist +10,393 N QuantitativeMetagenomics Created page with " <H3>Overview</H3> If you need to use metagenomics for your final project, we have a more thorough workflow that you will need to use https://teaching.healthtech.dtu.dk/22136/index.php/22136:Course_plan_autumn_2020 here. Since metagenomics data is often very large, it requires a lot of computational resources and time, we have cheated a little bit and prepared some data for you in advance! In this exercise we have done the assembly and counting across a cohort of..." current
- 16:44, 19 March 2024 diff hist 0 N File:DDseq 350OTU.png No edit summary current
- 16:44, 19 March 2024 diff hist 0 N File:DDseq 100OTU.png No edit summary current
- 16:44, 19 March 2024 diff hist 0 N File:PrincipalComponents.png No edit summary current
- 16:43, 19 March 2024 diff hist 0 N File:Rawkaijubar.png No edit summary current
- 16:43, 19 March 2024 diff hist +3,019 N Kaiju solution Created page with "<b> Q1: What is nr_euk? And how do the choice of database influence the results of Kaiju?</b> nr stands for non-redundant and indicates, that each entry is only found once within the database. BLASTS' nr database contains bacteria, archea and fungi. euk indicates that microbial eukaryotes and fungi are included. The database should be intended as the target of the search, so it is of importance that it contains the organisms you are searching for. <b> Q2: Explain the..." current
- 16:42, 19 March 2024 diff hist +21,585 N Kaiju exercise Created page with "<!-- =22136 Metagenomics and Microbiome Analysis Kaiju exercise= --> ==Introduction== [http://kaiju.binf.ku.dk/ Kaiju] is a protein-based sensitive taxonomic classification of high-throughput sequencing reads from metagenomic whole genome sequencing or metatranscriptomics experiments. <br> Kaiju translates metagenomic sequencing reads into the six possible reading frames and searches for maximum exact matches (MEMs) of amino acid sequences in a given database of annotat..." current
- 16:41, 19 March 2024 diff hist +2,972 N Longread exercise answers Created page with "'''Q1''' Counting all the lines minus the header gives us: <pre> zcat BGI_hg38_chr20.vcf.gz |grep -v "^#"|wc -l </pre> 1878 variants '''Q2''' We can try the following: <pre> zcat BGI_hg38_chr20.vcf.gz |grep -v "^#" |cut -f 10 |sed "s/:.*//g"|sort | uniq -c |sort -n </pre> <OL> <LI>grep -v "^#" grep -v: Inverts the match, i.e., selects lines that do not match the given pattern. "^#": The pattern to match lines starting with a hash (#). These lines are usually he..." current
- 16:41, 19 March 2024 diff hist +7,233 N Longread exercise Created page with "<H2>Overview</H2> First: <OL> <LI>Navigate to your home directory: <LI>Create a directory called "longread" <LI>Navigate to the directory you just created. </OL> We will phase some variants using [https://www.biorxiv.org/content/10.1101/085050v2 WhatsHap] (no not the messaging app). First, what is phasing? Phasing means that we determine which base is on the same chromosome as another base for neighboring variants. Let's consider a small example with just two varia..." current
- 16:39, 19 March 2024 diff hist 0 N File:Rnaseq fig3.png No edit summary current
- 16:39, 19 March 2024 diff hist 0 N File:Rnaseq fig2.png No edit summary current
- 16:39, 19 March 2024 diff hist 0 N File:Rnaseq fig1.png No edit summary current
- 16:37, 19 March 2024 diff hist +18,721 N Rnaseq exercise answers Created page with " <div class="page-content has-page-title"> <div id="overview-and-background" class="section level1"> <h1>Overview and background</h1> <div id="groups" class="section level2"> <h2>Groups</h2> <p>Please get into groups of 2-3. We don’t have enough computational power for all of you working alone. Please let the instructors know if you need help finding a group.</p> </div> <div id="assignment-notes" class="section level2"> <h2>Assignment notes</h2> <p>While some question..." current
- 16:37, 19 March 2024 diff hist +11,080 N Rnaseq exercise Created page with " <div class="page-content has-page-title"> <div id="overview-and-background" class="section level1"> <h1>Overview and background</h1> <div id="groups" class="section level2"> <h2>Groups</h2> <p>Please get into groups of 2-3. We don’t have enough computational power for all of you working alone. Please let the instructors know if you need help finding a group.</p> </div> <div id="assignment-notes" class="section level2"> <h2>Assignment notes</h2> <p>While some question..." current
- 16:36, 19 March 2024 diff hist +2,833 N Ancient DNA exercise answers Created page with "'''Q1''' the read length is about 100bp but the actual insert size is unknown. '''Q2''' very low, less than 1% '''Q3''' About 40bp. '''Q4''' About 25%. '''Q5''' As and Gs '''Q6''' The sample indeed looks ancient. If we did not see DNA fragmentation or damage it could be indicative of present-day human contamination. '''Q7''' <pre> wc -l world.fam wc -l world.bim </pre> 297 samples and 587772 SNPs. '''Q8''' <pre> cut -f2 world.sampleInfo.txt | tail -n +2..." current
- 16:35, 19 March 2024 diff hist +11,971 N Ancient DNA exercise Created page with "<H2>Overview</H2> Adapted from Martin Sikora. First: <OL> <LI>Navigate to your home directory: <LI>Create a directory called "adna" <LI>Navigate to the directory you just created. </OL> We will try to # Authenticate ancient DNA # do some basic population genetics <h2> Data authentication</h2> Authentication involves making sure that the DNA that you have extracted from my fossil and sequenced is indeed from the fossil and not some modern contaminant. A big differe..." current
- 16:35, 19 March 2024 diff hist +1,480 N Denovo solution Created page with "Q1. Illumina Q1A. discarded contains reads that are too short, pair1 and pair2 files contain the read pairs were both passed trimming and singleton are reads were one of the two pairs were discarded. Q2. Around 84 Q3. N = (M*L)/(L-K+1) = (84*99)/(99-15+1) = 97.84 Genome_size = T/N = (213721367+212523694)/97.84 = 4.35Mb Q4. Mean = 259 ; SD = 11 Q5. It is lower, this means that the actual kmer peak we found (unless you found one higher than 84) is higher (this would g..." current
- 16:34, 19 March 2024 diff hist +8 Denovo exercise No edit summary current
- 16:31, 19 March 2024 diff hist +20,412 N Denovo exercise Created page with "<H2>Overview</H2> First: <OL> <LI>Navigate to your home directory: <LI>Create a directory called "denovo" <LI>Navigate to the directory you just created. </OL> <p>In this exercise we will try to perform de novo assembly of Illumina paired-end reads. The data is from a <i>Vibrio cholerae</i> strain isolated in Nepal. You will try to: <OL> <LI>Run FastQC, adaptor and quality trimming reads (Optional - repeat of analysis you have already done in data pre-processing)..."
- 16:31, 19 March 2024 diff hist +7,265 N SNP calling exercise answers Created page with "'''Q1''' First, running: <pre> tabix -f -p vcf NA24694.gvcf.gz </pre> then <pre> gatk --java-options "-Xmx10g" HaplotypeCaller -R /home/databases/references/human/GRCh38_full_analysis_set_plus_decoy_hla.fa -I /home/projects/22126_NGS/exercises/snp_calling/NA24694.bam -L chr20 -O NA24694.gvcf.gz --dbsnp /home/databases/databases/GRCh38/Homo_sapiens_assembly38.dbsnp138.vcf.gz -ERC GVCF </pre> <pre> gatk GenotypeGVCFs -R /home/databases/references/human/GRCh3..." current
- 16:30, 19 March 2024 diff hist +16,677 N SNP calling exercise Created page with "<H2>Overview</H2> First: <OL> <LI>Navigate to your home directory: <LI>Create a directory called "variant_call" <LI>Navigate to the directory you just created. </OL> We will: <OL> <LI>Genotype some whole-genome sequencing data. <LI>Get acquainted with VCF files <LI>Soft filtering <LI>Hard filtering <LI> Annotation of variants </OL> ---- <H2>Genotyping</H2> We will genotype a chromosome from a BAM file that has been processed using the steps we detailed before. It i..." current
- 16:29, 19 March 2024 diff hist +1,834 N Postprocess exercise answers Created page with "'''Q1''' Running: <pre> java -jar /home/ctools/picard_2.23.8/picard.jar MarkDuplicates -I /home/projects/22126_NGS/exercises/dupremoval/ERR016028_chr20_sort.bam -M ERR016028_chr20_sort_markdup.metrics.txt -O ERR016028_chr20_sort_markdup.bam </pre> The log should state: <pre> Marking 9798 records as duplicates. </pre> Please note that this is very low but that is because we have very little data so that it runs faster. '''Q2''' They do not have the same sequence:..." current
- 16:29, 19 March 2024 diff hist +3,530 N Postprocess exercise Created page with "<H2>Overview</H2> First: <OL> <LI>Navigate to your home directory: <LI>Create a directory called "postalign" <LI>Navigate to the directory you just created. </OL> In this exercise, we will pre-process bam-files so they are ready for SNP calling. This is necessary to reduce the high number of potential false SNPs that will get called. You will try to: <OL> <LI>Mark read duplicates from the BAM-files <LI>Merge BAM files </OL> <H2>Duplicate removal</H2> <p>We are..." current
- 16:28, 19 March 2024 diff hist +4,533 N Alignment exercise answers Created page with "'''Q1:''' 3 possible ways: * The file with the smaller file size contains the trimmed reads. * Peek in the file and determine which file contains reads of uneven lengths * Use fastqc, to determine which file contains overrepresented adapter sequences '''Q2:''' 4 lines if you have added the RG '''Q3:''' 166782 '''Q4:''' 0, this means that the probability of being mismapped is one. This means that this read cannot be confidently assigned to this position. '''Q5:''' The..." current
- 16:27, 19 March 2024 diff hist +13,848 N Alignment exercise Created page with " <H2>Overview</H2> First: <OL> <LI>Navigate to your home directory: <LI>Create a directory called "align" <LI>Navigate to the directory you just created. </OL> We will try to align different types of NGS data. # <i>Pseudomonas alcaligenes</i> single-end Illumina reads # Human single-end paired-end Illumina reads <H2><i>P. aeruginosa</i> single-end Illumina reads</H2> <H3>Alignment using bwa mem</H3> <p> We will align some of the single-end reads that we trimmed f..." current
- 16:26, 19 March 2024 diff hist +5,351 N Data Preprocess exercise answers Created page with "'''Q1''' <pre> zcat /home/projects/22126_NGS/exercises/preprocess/ex1/SRR957824_1.fastq.gz|head -n 2 |tail -1 |wc -c 151 </pre> However, the answers is 150 as "wc" counts the end of line character '''Q2''' Running: <pre> fastqc -o . /home/projects/22126_NGS/exercises/preprocess/ex1/SRR957824_1.fastq.gz fastqc -o . /home/projects/22126_NGS/exercises/preprocess/ex1/SRR957868_1.fastq.gz </pre> SRR957824 is the worse run, the quality scores towards the end of the r..." current
- 16:26, 19 March 2024 diff hist +12,804 N Data Preprocess exercise Created page with " <H3>Overview</H3> First: <OL> <LI>Navigate to your home directory: <LI>Create a directory called "preprocess" <LI>Navigate to the directory you just created. </OL> We will try to pre-process several types of NGS data. # <i>Escherichia coli</i> single-end Illumina reads # <i>Pseudomonas aeruginosa</i> paired-end Illumina reads <HR> <h2><i>Escherichia coli</i> single-end Illumina reads</h2> <h3>Introduction</h3> <p> An outbreak of <i>E. coli</i> has occurred. Peop..." current
- 16:25, 19 March 2024 diff hist +317 N Data basics exercise answers Created page with "Answers: # S or L # X, I or J # X # Yes, quality scores picked from [<=>?@ABCDEFGHI], either very good quality (+33) or very poor (+64). # D = 68, 68-33 = 35, -> p[error] = 10^[-3.5] = 0.00031622776 = 1/3162 This really goes to show that having metadata from the sequencing run is essential for proper analysis." current
- 16:25, 19 March 2024 diff hist +3,384 N Data basics exercise Created page with " <p>This is a small exercise where we will try to identify the quality encoding of some reads.</p> <HR> <H3>Read quality encoding table</H3> We have seen that the fastq format encodes quality scores which represent the probability of an error. '''Beware''' because there are many different types of encoding for quality scores. The table below summarizes it. This table is adapted from Wikipedia article on [https://en.wikipedia.org/wiki/FASTQ_format FASTQ format]: <pre>..." current
- 16:24, 19 March 2024 diff hist +4,770 N Zip codes answers Created page with " Please note that in UNIX, there is more than one way to do things. '''Q1:''' <pre> zcat /home/projects/22126_NGS/exercises/unix/ZIP_CODES.csv.gz |awk 'BEGIN{FS=","}{if($5=="\"NY\""){print $0}}'|wc -l </pre> <ol> <li><code>zcat</code>: Decompresses the ZIP_CODES.csv.gz file and outputs its content.</li> <li><code>awk 'BEGIN{FS=","}</code>: Sets the field separator to a comma for the CSV file.</li> <li><code>if($5=="\"NY\""){print $0}</code>: Checks if the 5th field (st..." current
- 16:23, 19 March 2024 diff hist +2,402 N Zip codes Created page with " <H2>Extra fun with US zip codes</H2> <p>If you are 100% done with everything, you can have fun with the following exercise involving [https://en.wikipedia.org/wiki/ZIP_Code US zip codes]. This is mostly for people with previous Unix experience. </p> You will find the following file: <pre> /home/projects/22126_NGS/exercises/unix/ZIP_CODES.csv.gz </pre> No need to copy it or unzip it. You can view it with '''zcat''' or '''zless'''. csv stands for comma-separated val..." current
- 16:23, 19 March 2024 diff hist +1,452 N First look exercise answers Created page with " <H2> Solutions </H2> Illumina data: 1. <pre> cd </pre> 2. <pre> mkdir first_look/ </pre> 3. <pre> cp /data/shared/exercises/first_look/reads.fastq.gz . </pre> 4. <pre> zless -S reads.fastq.gz </pre> 5. <pre> zcat /data/shared/exercises/first_look/reads.fastq.gz |wc -l </pre> 1000 lines so 1000/4 250 sequences. 1. <pre> tar xvfz /data/shared/exercises/first_look/pairedReads.tar.gz </pre> 2. <pre> head ERR243038_1.fastq ERR243038_2.fastq </pre>..." current
- 16:22, 19 March 2024 diff hist +9,367 N First look exercise Created page with " <H2>Overview</H2> <p>In this exercise you will try to look at empirical NGS data. Additionally, you will try to use the '''screen''' command when using the shell. </p> <OL> <LI>Use standard UNIX commands to work with NGS data <LI>Use '''screen''' in shell </OL> <HR> <H2>First look at data</H2> <OL> <LI>Navigate to your home directory: <pre> cd </pre> '''cd''' without arguments will bring you back to your home directory. In our case, your home is: <pre> /home/people..." current
- 16:21, 19 March 2024 diff hist +2 Program 2024 →Course Program - January 2024 current
- 16:18, 19 March 2024 diff hist +5,409 N Unix answers Created page with " 1. Use a text editor to (nedit/gedit/komodo/textwrangler) to create a file mycommands.txt where you write all commands and observations you do in the following exercises. Use copy/paste to copy the commands. Note: There are more standard text editors than nedit, etc. Examples are emacs, xemacs, vi, vim, and pico. Make sure that we can easily see which exercise you attempt to solve. 2. First list the files in the directory. <pre> ls </pre> 3. Copy ex1.acc to myfile.ac..." current
- 16:17, 19 March 2024 diff hist +90 Logging on to pupil system No edit summary current
- 16:14, 19 March 2024 diff hist +4,741 N Logging on to pupil system Created page with " <HR> <H2>Overview</H2> In this exercise, we will prepare our computers to log on to our servers called "pupilX", where X is 1/2/3. These are small but reliable machines. Please read the instructions carefully. Please be aware: there is '''no''' backup. All of your data will be deleted when the class concludes. <H2>Are you physically at DTU?</H2> First, make sure you are connected to the internet. If you are on campus, you do not need to do anything extra, however,..."
- 15:59, 19 March 2024 diff hist +12,776 N Program 2024 Created page with " '''REMEMBER TO BRING A LAPTOP FOR EXERCISES''' Lectures will be in person in building [https://goo.gl/maps/k4wYkMjTJ2HLHuyN8 303A] in auditorium 44. Offline discussions will take place on Discord (https://discord.gg/7PKuKhKYQJ). Please register with your '''full name'''. Will use Discord for online classes and collaboration with your project partners. <!-- Lectures and exercises will take place on Discord (https://discord.gg/FBb2edFW). Please register with your ful..."
- 15:57, 19 March 2024 diff hist −70 22126 - Next Generation Sequencing Analysis No edit summary
- 15:56, 19 March 2024 diff hist 0 22126 - Next Generation Sequencing Analysis No edit summary
- 15:55, 19 March 2024 diff hist 0 22126 - Next Generation Sequencing Analysis No edit summary
- 15:54, 19 March 2024 diff hist +3,282 N 22126 - Next Generation Sequencing Analysis Created page with "<small>Introduction to Next-Generation Sequencing Analysis, 5 ECTS</small> <hr> <br> [http://kurser.dtu.dk/course/36626 DTU's Studies Handbook about #36626] [http://kurser.dtu.dk/course/36826 DTU's Studies Handbook about #36826]<be> Program 2024 Program 2023 Program 2022 Program 2021 Program 2020 Program 2019 The next course will be held in January 2024, the course runs every day for a three weeks period. The course consists of lectures, e..."
- 15:52, 19 March 2024 diff hist +43 N MediaWiki:Mainpage Created page with "22126 - Next Generation Sequencing Analysis" current
- 15:51, 19 March 2024 diff hist 0 N MediaWiki:Disclaimers Created blank page current
- 15:51, 19 March 2024 diff hist 0 N MediaWiki:Aboutsite Created blank page current
- 15:50, 19 March 2024 diff hist 0 N MediaWiki:Privacy Created blank page current