First look exercise answers
Latest revision as of 12:48, 20 November 2025
Solutions
First look at data
1. Navigate to home directory.
cd
2. Create directory first_look.
mkdir first_look
cd first_look
3. Copy FASTQ file.
cp /home/projects/22126_NGS/exercises/first_look/reads.fastq.gz .
4. Inspect reads.
zless -S reads.fastq.gz
5. Count number of reads (lines / 4).
zcat reads.fastq.gz | wc -l
If the result is 1000 lines, the file contains 1000 / 4 = 250 reads, since each FASTQ record spans four lines.
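The division by four can also be left to the shell. The following is a minimal sketch, not part of the original exercise: `count_reads` is a hypothetical helper name, and the two-read demo file is made up so the example runs anywhere. It uses `gzip -dc`, which behaves like `zcat` on Linux and also works on systems where `zcat` expects `.Z` files.

```shell
# Sketch: count reads in a gzipped FASTQ file as (number of lines) / 4.
# count_reads is a hypothetical helper name, not part of the exercise.
count_reads() {
    lines=$(gzip -dc "$1" | wc -l)
    echo $(( lines / 4 ))
}

# Build a tiny two-read FASTQ file in a temporary directory (demo data only).
demo_dir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | gzip > "$demo_dir/demo.fastq.gz"

count_reads "$demo_dir/demo.fastq.gz"   # prints 2
```

On the exercise file, `count_reads reads.fastq.gz` should report the same number as the manual calculation above, assuming every record occupies exactly four lines.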
---
Illumina data
1. Extract paired-end data.
tar xvfz /home/projects/22126_NGS/exercises/first_look/pairedReads.tar.gz
This creates:
- ERR243038_1.fastq
- ERR243038_2.fastq
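To check which files an archive contains before unpacking it, `tar` can list the contents with `t` in place of `x`. The sketch below builds a small demo archive so it can be run anywhere; the `demo_*` file names are made up for illustration and are not part of the exercise data.

```shell
# Create a small demo archive in a temporary directory (hypothetical file names).
demo_dir=$(mktemp -d)
printf '@r1/1\nACGT\n+\nIIII\n' > "$demo_dir/demo_1.fastq"
printf '@r1/2\nTTGA\n+\nIIII\n' > "$demo_dir/demo_2.fastq"
tar -czf "$demo_dir/demo.tar.gz" -C "$demo_dir" demo_1.fastq demo_2.fastq

# t = list contents, z = gzip-compressed, f = archive file; nothing is extracted.
tar -tzf "$demo_dir/demo.tar.gz"
```

Running `tar tzf` on `pairedReads.tar.gz` in the same way should list the two FASTQ files named above.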
2. Inspect the first read header in each file.
head ERR243038_1.fastq
head ERR243038_2.fastq
Extract first 10 header lines using grep:
grep '^@ERR243038' ERR243038_1.fastq | head
grep '^@ERR243038' ERR243038_2.fastq | head
Example output:
@ERR243038.1 HS4_09359:1:1101:1072:21612#33/1
@ERR243038.2 HS4_09359:1:1101:1076:69021#33/1
@ERR243038.3 HS4_09359:1:1101:1081:60568#33/1
...
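The grep pattern includes the run accession because `@` on its own is not a safe marker: `@` is also a valid quality character, so a quality line can start with it. As an alternative sketch, assuming every record is exactly four lines long, headers can be selected by position with `awk`. The demo file below is made up to show the difference.

```shell
# Demo FASTQ (hypothetical data): the second read's quality string begins with '@'.
demo_dir=$(mktemp -d)
printf '@r1/1\nACGT\n+\nIIII\n@r2/1\nTTGA\n+\n@III\n' > "$demo_dir/demo_1.fastq"

# Matching on '@' alone also counts the quality line: prints 3, not 2.
grep -c '^@' "$demo_dir/demo_1.fastq"

# Position-based selection: the header is line 1 of every 4-line record.
awk 'NR % 4 == 1' "$demo_dir/demo_1.fastq"
```

The `awk` command prints only the two true header lines, `@r1/1` and `@r2/1`.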
3. Remove trailing /1 and /2 using sed.
grep '^@ERR243038' ERR243038_1.fastq | sed 's:/1$::' > human_1.headers
grep '^@ERR243038' ERR243038_2.fastq | sed 's:/2$::' > human_2.headers
(Alternative version using a generic regex that removes the last two characters, whatever they are:
grep '^@ERR243038' ERR243038_1.fastq | sed 's/..$//' > human_1.headers
grep '^@ERR243038' ERR243038_2.fastq | sed 's/..$//' > human_2.headers
)
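To see what a given substitution actually removes, it can be tried on a single header line before being applied to a whole file. The sketch below uses the first header from the example output. Note that a pattern of one dot, `s/.$//`, strips only the final digit and leaves the slash behind, whereas removing the full `/1` suffix takes an explicit pattern or two dots.

```shell
header='@ERR243038.1 HS4_09359:1:1101:1072:21612#33/1'

# Explicit suffix; ':' is the sed delimiter, so '/' needs no escaping.
printf '%s\n' "$header" | sed 's:/1$::'    # ends in ...21612#33

# Generic: drop the last two characters.
printf '%s\n' "$header" | sed 's/..$//'    # ends in ...21612#33

# Generic, one dot only: drops the digit but keeps the slash.
printf '%s\n' "$header" | sed 's/.$//'     # ends in ...21612#33/
```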
4. Compare the results.
View first 10 lines of each:
head human_1.headers
head human_2.headers
Side-by-side:
paste human_1.headers human_2.headers | head
Ensure they match:
diff human_1.headers human_2.headers
If diff prints nothing, the two header lists are identical line for line, so the reads appear in the same order in both paired-end files.
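Empty output is easy to overlook, so the comparison can also be based on the exit status of `diff`, which is 0 when the files are identical. This is a minimal sketch: `headers_in_sync` is a hypothetical helper name, and the demo header files are made up so the example is self-contained.

```shell
# headers_in_sync is a hypothetical helper: it succeeds (exit status 0)
# only when the two files are identical.
headers_in_sync() {
    diff "$1" "$2" > /dev/null
}

# Demo header lists (made-up data): two identical files and one that differs.
demo_dir=$(mktemp -d)
printf '@r1 x#33\n@r2 x#33\n' > "$demo_dir/demo_1.headers"
cp "$demo_dir/demo_1.headers" "$demo_dir/demo_2.headers"
printf '@r1 x#33\n@r3 x#33\n' > "$demo_dir/demo_bad.headers"

if headers_in_sync "$demo_dir/demo_1.headers" "$demo_dir/demo_2.headers"; then
    echo "headers match"
else
    echo "headers differ"
fi
```

With the exercise files, the same check would be `headers_in_sync human_1.headers human_2.headers`.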