First look exercise

Overview

In this exercise you will explore real NGS data and learn how to use the screen command to keep long-running jobs alive after logout.

- Use standard UNIX commands to work with NGS data
- Use screen in the shell

First look at data

Navigate to your home directory:
```
cd
```
Running cd without arguments takes you to your home directory, which has the form:
```
/home/people/[STUDENT ID]
```
Create a directory called first_look and cd into it.
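One way to carry out this step is sketched below. It assumes you are already in your home directory; the -p option makes mkdir succeed even if the directory already exists.

```shell
# Create the exercise directory (no error if it already exists) and enter it
mkdir -p first_look
cd first_look

# Confirm the current location
pwd
```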

Copy the file reads.fastq.gz from:

/home/projects/22126_NGS/exercises/first_look/

Use zless to inspect the compressed FASTQ file:
```
zless -S reads.fastq.gz
```
A FASTQ read consists of exactly four lines:
1. Header line starting with @
2. Sequence line (A/C/G/T/N)
3. “+” line (may repeat the header)
4. Quality line (ASCII PHRED scores)
The -S option disables line wrapping so the sequence appears on one line.
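To make the four-line layout concrete, here is a small sketch on a made-up one-read FASTQ file (the read name, sequence, and qualities are invented for illustration). It assumes Phred+33 quality encoding, where the score is the ASCII code of the character minus 33; check how your own data is encoded before relying on this.

```shell
# Build a one-read FASTQ file in a temporary directory (made-up data)
tmpdir=$(mktemp -d)
printf '%s\n' '@read1' 'ACGTNACG' '+' 'IIII#III' > "$tmpdir/demo.fastq"

# Line 2 of every 4-line record is the sequence; print its length
read_length=$(awk 'NR % 4 == 2 { print length($0) }' "$tmpdir/demo.fastq")
echo "read length: $read_length"

# Line 4 holds the qualities; decode the first character,
# ASSUMING Phred+33 encoding (score = ASCII code - 33)
first_qual_char=$(sed -n '4p' "$tmpdir/demo.fastq" | cut -c1)
ascii_code=$(printf '%d' "'$first_qual_char")
phred=$((ascii_code - 33))
echo "first base quality: $phred"

# Remove the temporary directory
rm -r "$tmpdir"
```

For this demo record the read length is 8, and the quality character I decodes to a score of 40.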
Count the number of reads in the file.
Each read has 4 lines. Use wc to count lines:
```
zcat reads.fastq.gz | wc -l
```
Divide by 4 to get the number of reads.
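The division can also be done directly in the shell. The sketch below uses a tiny made-up gzipped FASTQ file so that it runs anywhere; with the real data you would point it at reads.fastq.gz instead. It uses gzip -dc, which behaves like zcat.

```shell
# Create a gzipped FASTQ file with two made-up reads
tmpdir=$(mktemp -d)
printf '%s\n' '@r1' 'ACGT' '+' 'IIII' '@r2' 'TTGA' '+' 'IIII' \
    | gzip > "$tmpdir/demo.fastq.gz"

# Count the lines of the decompressed stream, then divide by 4
n_lines=$(gzip -dc "$tmpdir/demo.fastq.gz" | wc -l)
n_reads=$((n_lines / 4))
echo "$n_reads reads"

# Remove the temporary directory
rm -r "$tmpdir"
```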

Illumina data

Copy pairedReads.tar.gz into your first_look directory from:
```
/home/projects/22126_NGS/exercises/first_look/
```
Unpack it:
```
tar xvfz pairedReads.tar.gz
```
Flags:
- x – extract
- v – verbose
- f – archive file follows
- z – gzip-compressed
If a file ends with .tar.bz2, use j instead of z.
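The flags can be tried safely on a throwaway archive. The following sketch builds a small archive from two made-up files (all file names here are hypothetical), lists its contents with t before extracting, and then unpacks it into a separate directory with -C.

```shell
# Work in a temporary directory with two made-up FASTQ files
tmpdir=$(mktemp -d)
printf '%s\n' '@r1/1' 'ACGT' '+' 'IIII' > "$tmpdir/demo_1.fastq"
printf '%s\n' '@r1/2' 'TGCA' '+' 'IIII' > "$tmpdir/demo_2.fastq"

# c = create, z = gzip, f = archive file; -C changes directory first
tar -czf "$tmpdir/demo.tar.gz" -C "$tmpdir" demo_1.fastq demo_2.fastq

# t = list the archive contents without extracting anything
listing=$(tar -tzf "$tmpdir/demo.tar.gz")
echo "$listing"

# x = extract, here into a separate output directory
mkdir "$tmpdir/out"
tar -xzf "$tmpdir/demo.tar.gz" -C "$tmpdir/out"
n_extracted=$(ls "$tmpdir/out" | wc -l)
echo "extracted files: $n_extracted"

# Remove the temporary directory
rm -r "$tmpdir"
```

Listing an archive before unpacking it is a cheap way to see which files it will write.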
You should now have two FASTQ files:
- ERR243038_1.fastq
- ERR243038_2.fastq
Inspect the first read in each file. The two headers should be identical except for the trailing 1 or 2, which indicates that they are paired-end reads sequenced from opposite ends of the same DNA fragment.
We now check whether the two files are “in sync.” We will:
1. extract all header lines from each file
2. remove the final /1 or /2-type suffix
3. write each set of normalized headers to a new file
4. compare the two files
Extract all header lines with grep. FASTQ headers begin with:
```
@ERR243038
```
Example:
```
grep '^@ERR243038' ERR243038_1.fastq | head
```
Try this for both FASTQ files and inspect the first 10 headers.

Remove the trailing 1 or 2 using sed. Examples of sed patterns:

sed 's/PATTERN/REPLACEMENT/' file
sed 's/PATTERN//' file              # remove PATTERN
sed 's/^PATTERN//' file             # remove PATTERN at line start
sed 's/PATTERN$//' file             # remove at line end
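
These patterns can be tried on a single string before running them on a whole file. The header below is made up for illustration and may not match the exact format of the real data. The last command uses | as the delimiter so that the / in the pattern needs no escaping.

```shell
# A made-up header line for experimenting
header='@ERR243038.1 demo/1'

# Replace the first match of a pattern
replaced=$(echo "$header" | sed 's/ERR/SRR/')
echo "$replaced"

# Remove a pattern anchored at the start of the line
no_at=$(echo "$header" | sed 's/^@//')
echo "$no_at"

# Remove a /1 or /2 suffix anchored at the end of the line
no_suffix=$(echo "$header" | sed 's|/[12]$||')
echo "$no_suffix"
```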

Apply sed to strip the last character from each header line. For example:

grep '^@ERR243038' ERR243038_1.fastq | sed 's/.$//' | head

Redirect the output into files:

grep '^@ERR243038' ERR243038_1.fastq | sed 's/.$//' > human_1.headers
grep '^@ERR243038' ERR243038_2.fastq | sed 's/.$//' > human_2.headers

Inspect the first 10 lines side-by-side with paste:
```
paste human_1.headers human_2.headers | head
```
Finally compare both files using diff:
```
diff human_1.headers human_2.headers
```
If diff prints nothing, the headers match line by line and the two paired files are in sync.
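
The whole check can also be scripted by using the exit status of diff, which is 0 when the files are identical. The sketch below runs on two made-up FASTQ files (hypothetical names and reads). Note that it swaps grep for awk and selects every fourth line starting at line 1. This is a different technique from the one above; it avoids accidentally matching a quality line that happens to start with @.

```shell
# Two made-up paired FASTQ files with two reads each
tmpdir=$(mktemp -d)
printf '%s\n' '@demo.1/1' 'ACGT' '+' 'IIII' '@demo.2/1' 'GGCA' '+' 'IIII' \
    > "$tmpdir/demo_1.fastq"
printf '%s\n' '@demo.1/2' 'TTAC' '+' 'IIII' '@demo.2/2' 'CCGT' '+' 'IIII' \
    > "$tmpdir/demo_2.fastq"

# Header lines are lines 1, 5, 9, ...; strip the last character of each
awk 'NR % 4 == 1' "$tmpdir/demo_1.fastq" | sed 's/.$//' > "$tmpdir/demo_1.headers"
awk 'NR % 4 == 1' "$tmpdir/demo_2.fastq" | sed 's/.$//' > "$tmpdir/demo_2.headers"

# diff exits with status 0 only if the two header files are identical
if diff "$tmpdir/demo_1.headers" "$tmpdir/demo_2.headers" > /dev/null; then
    sync_status="in sync"
else
    sync_status="out of sync"
fi
echo "The files are $sync_status"

# Remove the temporary directory
rm -r "$tmpdir"
```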

Use `screen` in the shell

NGS jobs often run for hours. If you log out or lose your network connection, the commands running in that login session are normally terminated. The screen program creates a persistent “virtual terminal” that keeps running even after you log out.

Benefits:

- Safe against connection drops
- Allows long-running jobs to continue after logout
- Can detach at work, reattach from home

Start screen:

screen

Press Enter to dismiss the welcome message.

Inside a screen session:

- All commands run normally
- Special commands begin with Ctrl-a

Try:

Ctrl-a ?

This opens the help screen. Press Enter to exit.

Run something simple, e.g.:

ls

Detach from the session:

Ctrl-a d

You’ll see:

[detached]

Reattach later:

screen -r

If your SSH session dies, simply reconnect and run screen -r to resume.
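
When several sessions are running, giving each one a name makes it easier to find the right one. The sketch below is non-interactive: it starts a named session already detached (-dmS), lists the sessions, and then closes the session again. The session name is hypothetical, and sleep 30 stands in for a real long-running job.

```shell
# Hypothetical session name, chosen for illustration
session=first_look_demo

if command -v screen > /dev/null 2>&1; then
    # -dmS NAME CMD: start a new session named NAME, already detached, running CMD
    screen -dmS "$session" sleep 30

    # List all of your sessions; the named one should appear here
    screen -ls

    # Send the quit command to the named session to close it
    screen -S "$session" -X quit
else
    echo "screen is not installed on this machine"
fi
```

In interactive use you would start the session with screen -S followed by a name, detach with Ctrl-a d, and reattach with screen -r followed by the same name.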
