Advanced UNIX and Pipes
This page covers standard input/output streams, redirection, pipes, file descriptors, and examples using Python scripts. These concepts are useful in NGS analysis, for example when chaining commands together, avoiding intermediate files, or streaming FASTQ/BAM data.
stdout, stdin, stderr
Every UNIX command starts with three standard data streams:
- stdin (file descriptor 0) – input
- stdout (file descriptor 1) – normal output
- stderr (file descriptor 2) – error messages
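The split between normal output and error messages can be sketched from inside a Python script. This is a minimal illustration, not part of the course scripts; the function name `report` and the sample values are hypothetical.

```python
import sys

def report(values, out=sys.stdout, err=sys.stderr):
    """Write valid integers to `out` and complaints to `err`."""
    for value in values:
        try:
            # Normal results go to stdout (file descriptor 1).
            print(int(value), file=out)
        except ValueError:
            # Diagnostics go to stderr (file descriptor 2).
            print(f"skipping {value!r}", file=err)

# Hypothetical sample input: two valid integers and one bad value.
report(["1", "two", "3"])
```

Because the two kinds of output travel on separate streams, a shell can send them to different places even though both normally appear in the same terminal.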
Redirecting these streams:
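- `>` – redirect stdout to a file (overwriting it)
- `2>` – redirect stderr to a file
- `<` – read stdin from a file
- `>>` – redirect stdout, appending to the file instead of overwriting it
- `|` – pipe the stdout of one command into the stdin of the next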
Examples:
ls > listing.txt
grep HUMAN ex1.acc 2> errors.log
wc -l < ex1.acc
ls | wc -l
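The same kind of redirection can be set up from Python with the standard `subprocess` module. The sketch below assumes a small stand-in child process (`CHILD`, hypothetical) that writes one line to each output stream. The file names `listing.txt` and `errors.log` are borrowed from the shell examples.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical child process: writes one line to stdout and one to stderr.
CHILD = [sys.executable, "-c",
         "import sys; print('data'); print('oops', file=sys.stderr)"]

def run_redirected(workdir):
    """Run CHILD with stdout and stderr redirected to separate files."""
    out_path = Path(workdir) / "listing.txt"
    err_path = Path(workdir) / "errors.log"
    # Similar in spirit to: command > listing.txt 2> errors.log
    with open(out_path, "w") as out, open(err_path, "w") as err:
        subprocess.run(CHILD, stdout=out, stderr=err, check=True)
    return out_path.read_text(), err_path.read_text()

with tempfile.TemporaryDirectory() as tmp:
    stdout_text, stderr_text = run_redirected(tmp)
    print("stdout file:", stdout_text.strip())
    print("stderr file:", stderr_text.strip())
```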
/dev/null (“black hole”)
Redirect output you want to ignore:
command > /dev/null
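When a command is launched from Python, `subprocess.DEVNULL` plays the role of `/dev/null`. A minimal sketch, with a hypothetical noisy child process (`NOISY`) standing in for a real tool:

```python
import subprocess
import sys

def run_quietly(cmd):
    """Run cmd, discarding its stdout (like `cmd > /dev/null`).

    Returns the exit status and whatever the command wrote to stderr.
    """
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.PIPE, text=True)
    return result.returncode, result.stderr

# Hypothetical child: chatters on stdout, warns on stderr.
NOISY = [sys.executable, "-c",
         "import sys; print('chatter'); print('warn', file=sys.stderr)"]

status, errors = run_quietly(NOISY)
print(status, errors.strip())
```

Only stdout is discarded here, so error messages remain available for inspection.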
Pipes
Pipes connect the output of one command to the input of another.
Simple examples:
grep HUMAN ex1.acc | sort | uniq -c
cut -f1 ex1.tot | sort | head
The commands in a pipeline run concurrently: each one starts processing data as soon as the previous command produces it, so no temporary files are needed.
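What the shell does for `a | b` can be sketched with two `subprocess.Popen` objects: the stdout of the first process becomes the stdin of the second. The two child commands below (`PRODUCER`, `COUNTER`) are hypothetical stand-ins for something like `ls | wc -l`.

```python
import subprocess
import sys

# Hypothetical stand-ins for `ls` and `wc -l`.
PRODUCER = [sys.executable, "-c", "for i in range(5): print('line', i)"]
COUNTER = [sys.executable, "-c",
           "import sys; print(sum(1 for _ in sys.stdin))"]

def count_lines_via_pipe():
    """Run PRODUCER | COUNTER and return the line count as an int."""
    producer = subprocess.Popen(PRODUCER, stdout=subprocess.PIPE)
    counter = subprocess.Popen(COUNTER, stdin=producer.stdout,
                               stdout=subprocess.PIPE, text=True)
    # Close our copy of the pipe so the producer is notified
    # if the counter exits early.
    producer.stdout.close()
    output, _ = counter.communicate()
    producer.wait()
    return int(output)

print(count_lines_via_pipe())
```

Both processes are started before either has finished, which is the concurrency that a shell pipeline provides.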
Using stdin/stdout in Python
Example program reading from stdin (saved as hello.py):
#!/usr/bin/python3
import sys

# Greet every line that arrives on standard input.
for line in sys.stdin:
    print("Hello", line.strip())
Run with:
echo "world" | python3 hello.py
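The same pattern scales to NGS data, because a stream is consumed one line at a time without loading the whole file. The sketch below counts FASTQ records under the simplifying assumption that every record is exactly four lines. The function name and demo reads are hypothetical.

```python
import io
import sys

def count_fastq_records(stream):
    """Count records in a FASTQ stream, assuming four lines per record."""
    # Iterating over the stream reads one line at a time.
    n_lines = sum(1 for _ in stream)
    if n_lines % 4:
        # Report problems on stderr so they do not mix with the result.
        print(f"warning: {n_lines} lines is not a multiple of 4",
              file=sys.stderr)
    return n_lines // 4

# Demo on an in-memory stream; in a pipeline, pass sys.stdin instead.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n")
print(count_fastq_records(demo))
```

To use it in a pipe, replace the demo with `print(count_fastq_records(sys.stdin))` and run something like `zcat reads.fastq.gz | python3 count_reads.py` (both file names are placeholders).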
File descriptors and process substitution
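Process substitution (a feature of shells such as bash and zsh, not of plain POSIX sh) runs a command and exposes its output under a file name, typically a path such as /dev/fd/63 backed by a pipe: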
diff <(sort file1) <(sort file2)
This is useful for tools that expect file names as arguments rather than reading from stdin.
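From the point of view of the tool, a process substitution is just another path to open. The sketch below is a hypothetical two-file comparison script. It reads each path once, front to back, which is what a pipe-backed path requires, since seeking is not possible on it.

```python
import tempfile
from pathlib import Path

def read_lines(path):
    """Read a path sequentially; works for regular files and pipe-backed paths."""
    with open(path) as handle:
        return [line.rstrip("\n") for line in handle]

def only_in_first(path_a, path_b):
    """Return lines present in path_a but not in path_b, preserving order."""
    seen = set(read_lines(path_b))
    return [line for line in read_lines(path_a) if line not in seen]

# Demo with ordinary temporary files (hypothetical contents).
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "a.txt"
    second = Path(tmp) / "b.txt"
    first.write_text("x\ny\nz\n")
    second.write_text("y\n")
    print(only_in_first(first, second))
```

Wrapped in a small command-line script that takes two paths from `sys.argv` (here called `compare.py`, a hypothetical name), it could be invoked as `python3 compare.py <(sort file1) <(sort file2)`.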
Example: Random name generator + greeting script
(Your cleaned-up versions go here.)
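A minimal sketch of what such a pair could look like, not the course's original scripts: one function emits random names, the other greets whatever names arrive on its input stream. The name pool and function names are hypothetical.

```python
import io
import random
import sys

NAMES = ["Ada", "Linus", "Grace", "Ken"]  # hypothetical name pool

def generate_names(count, seed=None):
    """Yield `count` random names; `seed` makes the choice reproducible."""
    rng = random.Random(seed)
    for _ in range(count):
        yield rng.choice(NAMES)

def greet(stream, out):
    """Write a greeting for every non-empty line read from `stream`."""
    for line in stream:
        name = line.strip()
        if name:
            print("Hello", name, file=out)

# Connect the two halves in memory; in a shell they would be two
# scripts joined by a pipe, reading sys.stdin and writing sys.stdout.
pipe = io.StringIO("".join(f"{name}\n" for name in generate_names(3, seed=1)))
greet(pipe, sys.stdout)
```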
Example: Integer generator + prime checker
(Place the simplified versions here.)
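One possible shape for this example, offered as a sketch and not as the course's own code: a generator that emits integers one per line, and a filter that keeps only the primes. Trial division is used for simplicity, and all names are hypothetical.

```python
def integers(start, stop):
    """Yield the integers in [start, stop) as text lines, like a generator script."""
    for n in range(start, stop):
        yield f"{n}\n"

def is_prime(n):
    """Trial division; adequate for small inputs."""
    if n < 2:
        return False
    if n < 4:
        return True          # 2 and 3
    if n % 2 == 0:
        return False
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True

def prime_filter(stream):
    """Yield the prime numbers found in a stream of text lines."""
    for line in stream:
        text = line.strip()
        if text and is_prime(int(text)):
            yield int(text)

# The generator feeds the filter lazily, one line at a time.
print(list(prime_filter(integers(1, 20))))
```

Split into two scripts that use `sys.stdout` and `sys.stdin`, the two halves could be joined by a shell pipe in the same way.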
Example: RSA key generator
(Optional section, for students who want the deeper CS example.)
Benchmarking
Using the `time` command:
time python3 script.py
Use this to compare a pipeline against the equivalent version that writes intermediate files.
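Inside a Python script, a rough equivalent of the wall-clock ("real") figure reported by `time` can be obtained with `time.perf_counter`. The helper name `timed` is hypothetical, and the summed range is only a placeholder workload.

```python
import time

def timed(func, *args, **kwargs):
    """Return (result, elapsed wall-clock seconds) for one call of func."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start

# Placeholder workload; substitute the step you want to measure.
result, seconds = timed(sum, range(1_000_000))
print(f"sum={result} took {seconds:.4f} s")
```

A single measurement is noisy, so repeating the call a few times and comparing the fastest runs gives a fairer picture.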
Summary
These advanced UNIX tools allow you to:
- stream large data instead of creating intermediate files
- chain tools together efficiently
- combine Python scripts and shell commands in one pipeline
- improve the performance of large NGS pipelines