First look exercise answers
Solutions
First look at data
1. Navigate to home directory.
cd
2. Create the directory first_look and move into it.
mkdir first_look
cd first_look
3. Copy FASTQ file.
cp /home/projects/22126_NGS/exercises/first_look/reads.fastq.gz .
4. Inspect reads.
zless -S reads.fastq.gz
5. Count number of reads (lines / 4).
zcat reads.fastq.gz | wc -l
Each FASTQ record spans four lines, so if wc -l reports 1000 lines, the file contains 1000 / 4 = 250 reads.
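The division by four can also be done directly in the shell. The sketch below is a minimal example, assuming every record spans exactly four lines. It builds a tiny made-up file (demo_reads.fastq.gz is a hypothetical name, not part of the exercise) so that it runs anywhere, and it uses gzip -cd, which behaves like zcat on gzip files. For the exercise itself, substitute reads.fastq.gz.

```shell
# Build a tiny gzipped FASTQ with two made-up reads (illustrative data only).
tmpdir=$(mktemp -d)
printf '%s\n' \
  '@read1' 'ACGT' '+' 'IIII' \
  '@read2' 'TTGA' '+' 'IIII' | gzip > "$tmpdir/demo_reads.fastq.gz"

# Count the lines, then divide by 4 (assumes four lines per FASTQ record).
lines=$(gzip -cd "$tmpdir/demo_reads.fastq.gz" | wc -l)
reads=$(( lines / 4 ))
echo "$lines lines, $reads reads"
```

Shell arithmetic is integer-only, so a line count that is not a multiple of four would be silently rounded down; that would hint at a truncated file.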
---
Illumina data
1. Extract paired-end data.
tar xvfz /home/projects/22126_NGS/exercises/first_look/pairedReads.tar.gz
This creates:
- ERR243038_1.fastq
- ERR243038_2.fastq
2. Inspect the first read header in each file.
head ERR243038_1.fastq
head ERR243038_2.fastq
Extract the first 10 header lines using grep:
grep '^@ERR243038' ERR243038_1.fastq | head
grep '^@ERR243038' ERR243038_2.fastq | head
Example output:
@ERR243038.1 HS4_09359:1:1101:1072:21612#33/1
@ERR243038.2 HS4_09359:1:1101:1076:69021#33/1
@ERR243038.3 HS4_09359:1:1101:1081:60568#33/1
...
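The grep pattern is anchored on the run accession rather than on a bare '@' because '@' is also a valid quality character, so a quality line can start with it. As a hedged alternative, assuming strictly four-line records, the header is every fourth line starting from line 1 and can be selected with awk. The file name and reads below are made up for illustration.

```shell
# Made-up two-read FASTQ; the second quality line deliberately starts with '@'.
tmpdir=$(mktemp -d)
printf '%s\n' \
  '@ERR000000.1 demo/1' 'ACGT' '+' 'IIII' \
  '@ERR000000.2 demo/1' 'TTGA' '+' '@III' > "$tmpdir/demo_1.fastq"

# A bare '^@' pattern also matches the quality line and overcounts the headers.
naive=$(grep -c '^@' "$tmpdir/demo_1.fastq")

# Lines 1, 5, 9, ... are the headers when every record spans four lines.
headers=$(awk 'NR % 4 == 1' "$tmpdir/demo_1.fastq")

echo "lines matched by '^@': $naive"
echo "$headers"
```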
3. Remove trailing /1 and /2 using sed.
grep '^@ERR243038' ERR243038_1.fastq | sed 's:/1$::' > human_1.headers
grep '^@ERR243038' ERR243038_2.fastq | sed 's:/2$::' > human_2.headers
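(Alternative version using one generic regex for both files; the pattern /.$ removes the slash together with the single character after it:
grep '^@ERR243038' ERR243038_1.fastq | sed 's:/.$::' > human_1.headers
grep '^@ERR243038' ERR243038_2.fastq | sed 's:/.$::' > human_2.headers
)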
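To see what the substitution does, it can be tried on a single header first. The sketch below uses the first header from the example output; the matching /2 header is an assumption (same text with a different suffix), not copied from the data.

```shell
# First header from the example output; the /2 mate is assumed to differ only in its suffix.
h1='@ERR243038.1 HS4_09359:1:1101:1072:21612#33/1'
h2='@ERR243038.1 HS4_09359:1:1101:1072:21612#33/2'

# Strip the mate suffix. ':' is the sed delimiter here, so '/' needs no escaping.
s1=$(printf '%s\n' "$h1" | sed 's:/1$::')
s2=$(printf '%s\n' "$h2" | sed 's:/2$::')

echo "$s1"
echo "$s2"
```

The $ anchor restricts the match to the end of the line, so the rest of the header is left untouched.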
4. Compare the results.
View the first 10 lines of each:
head human_1.headers
head human_2.headers
Side-by-side:
paste human_1.headers human_2.headers | head
Ensure they match:
diff human_1.headers human_2.headers
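If diff prints nothing, the two header files are identical: every read in one file has its mate at the same position in the other, so the paired-end files are in sync.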
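The same check can be scripted through the exit status of diff, which is 0 when the files are identical and 1 when they differ. The sketch below is illustrative only: the header files and the check_sync helper are hypothetical names, and the data are made up.

```shell
# Made-up header files: b matches a, c holds the same headers in swapped order.
tmpdir=$(mktemp -d)
printf '%s\n' '@demo.1 x' '@demo.2 y' > "$tmpdir/a.headers"
printf '%s\n' '@demo.1 x' '@demo.2 y' > "$tmpdir/b.headers"
printf '%s\n' '@demo.2 y' '@demo.1 x' > "$tmpdir/c.headers"

# Hypothetical helper: report sync status based on the exit status of diff.
check_sync() {
  if diff "$1" "$2" > /dev/null; then
    echo "in sync"
  else
    echo "out of sync"
  fi
}

same=$(check_sync "$tmpdir/a.headers" "$tmpdir/b.headers")
swapped=$(check_sync "$tmpdir/a.headers" "$tmpdir/c.headers")
echo "identical order: $same"
echo "swapped order:   $swapped"
```

For the exercise, the helper would be called as check_sync human_1.headers human_2.headers.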