First look exercise answers

Solutions

1. Navigate to home directory.

cd

2. Create directory first_look.

mkdir first_look
cd first_look

3. Copy FASTQ file.

cp /home/projects/22126_NGS/exercises/first_look/reads.fastq.gz .

4. Inspect reads.

zless -S reads.fastq.gz

5. Count number of reads (lines / 4).

zcat reads.fastq.gz | wc -l

Each FASTQ record spans four lines, so divide the line count by 4. For example, if wc -l reports 1000 lines, the file holds 1000 / 4 = 250 reads.
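The division can also be done in the shell itself. The following is a minimal sketch, assuming every record occupies exactly four lines; the function name count_reads and the tiny demo file are made up for illustration, and gzip -dc is used as a portable equivalent of zcat.

```shell
# Hypothetical helper: number of reads in a gzipped FASTQ file,
# assuming exactly four lines per record.
count_reads() {
    lines=$(gzip -dc "$1" | wc -l)
    echo $(( lines / 4 ))
}

# Made-up two-read file so the sketch runs anywhere.
demo=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | gzip > "$demo/demo.fastq.gz"

count_reads "$demo/demo.fastq.gz"   # prints 2
```

On the exercise data the call would be count_reads reads.fastq.gz.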

---

1. Extract paired-end data.

tar xvfz /home/projects/22126_NGS/exercises/first_look/pairedReads.tar.gz

This creates the two FASTQ files used in the steps below: ERR243038_1.fastq and ERR243038_2.fastq.

2. Inspect the first read header in each file.

head ERR243038_1.fastq
head ERR243038_2.fastq

Extract first 10 header lines using grep:

grep '^@ERR243038' ERR243038_1.fastq | head
grep '^@ERR243038' ERR243038_2.fastq | head

Example output:

@ERR243038.1 HS4_09359:1:1101:1072:21612#33/1
@ERR243038.2 HS4_09359:1:1101:1076:69021#33/1
@ERR243038.3 HS4_09359:1:1101:1081:60568#33/1
...
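The grep pattern includes the run accession rather than just '^@' because a quality line can also begin with '@' (it is a valid quality character). The sketch below illustrates the difference on a made-up two-read file; the read names and the ERR0 prefix are hypothetical.

```shell
# Made-up two-read file; the second quality line starts with '@'.
demo=$(mktemp -d)
printf '@ERR0.1 a/1\nACGT\n+\nIIII\n@ERR0.2 b/1\nTTGA\n+\n@III\n' > "$demo/demo.fastq"

grep -c '^@' "$demo/demo.fastq"       # prints 3: also counts the quality line
grep -c '^@ERR0' "$demo/demo.fastq"   # prints 2: only the real headers
```

A quick sanity check on real data is that the header count equals the total line count divided by 4.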

3. Remove trailing /1 and /2 using sed.

grep '^@ERR243038' ERR243038_1.fastq | sed 's:/1$::' > human_1.headers
grep '^@ERR243038' ERR243038_2.fastq | sed 's:/2$::' > human_2.headers

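(Alternative version using a generic regex that removes the last two characters, whatever they are. Note that 's/.$//' would remove only one character and leave the trailing slash:

grep '^@ERR243038' ERR243038_1.fastq | sed 's/..$//' > human_1.headers
grep '^@ERR243038' ERR243038_2.fastq | sed 's/..$//' > human_2.headers

)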
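To see what each sed expression does, it can help to run it on a single header first. The sketch below uses the first header from the example output and compares three substitutions; the variable name header is only for illustration.

```shell
header='@ERR243038.1 HS4_09359:1:1101:1072:21612#33/1'

# Anchored pattern: removes only a literal /1 at the end of the line.
echo "$header" | sed 's:/1$::'   # @ERR243038.1 HS4_09359:1:1101:1072:21612#33

# Generic pattern: removes the last two characters, whatever they are.
echo "$header" | sed 's/..$//'   # @ERR243038.1 HS4_09359:1:1101:1072:21612#33

# A single dot removes one character only, so the slash stays.
echo "$header" | sed 's/.$//'    # @ERR243038.1 HS4_09359:1:1101:1072:21612#33/
```

The anchored form is the safer choice, since it leaves a header untouched if it does not end in the expected suffix.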

4. Compare the results.

View first 10 lines of each:

head human_1.headers
head human_2.headers

Side-by-side:

paste human_1.headers human_2.headers | head

Ensure they match:

diff human_1.headers human_2.headers

If diff prints nothing, the paired-end files are in perfect sync.
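The whole comparison can be wrapped into a small check. This is a sketch under stated assumptions, not the course's reference solution: it picks headers by position (every fourth line, starting at line 1) with awk instead of grep, assumes four-line records, and uses a hypothetical helper named headers plus two made-up demo files.

```shell
# Hypothetical helper: print read headers with the trailing /1 or /2 removed.
# Headers are taken by position, assuming exactly four lines per record.
headers() {
    awk 'NR % 4 == 1' "$1" | sed 's:/[12]$::'
}

# Made-up paired files with two reads each.
demo=$(mktemp -d)
printf '@x.1 a#33/1\nACGT\n+\nIIII\n@x.2 b#33/1\nTTGA\n+\n@III\n' > "$demo/r_1.fastq"
printf '@x.1 a#33/2\nCCGT\n+\nIIII\n@x.2 b#33/2\nGGGA\n+\nIIII\n' > "$demo/r_2.fastq"

headers "$demo/r_1.fastq" > "$demo/h1"
headers "$demo/r_2.fastq" > "$demo/h2"

# diff exits with status 0 only when the two header lists are identical.
if diff "$demo/h1" "$demo/h2" > /dev/null; then
    echo "in sync"
else
    echo "out of sync"
fi
```

For the exercise files, the same helper would be called on ERR243038_1.fastq and ERR243038_2.fastq.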