Advanced UNIX and Pipes
This page covers standard input/output streams, redirection, pipes, file descriptors, and Python examples demonstrating how UNIX handles data flow. These concepts are extremely useful in NGS analysis, where tools are often chained together and large data files should be streamed instead of written to disk.
stdout, stdin, stderr
Every UNIX program interacts with three data streams:
- stdin (0) – standard input
- stdout (1) – standard output
- stderr (2) – standard error
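The numbers in parentheses are file descriptors. As a minimal sketch, the Python snippet below starts a child process that writes to descriptors 1 and 2 by number and shows that the two streams arrive separately. The names `CHILD` and `run_child` are made up for this illustration.

```python
import subprocess
import sys

# Child program: writes to the descriptors directly by number.
CHILD = r'''
import os
os.write(1, b"to stdout\n")   # descriptor 1 = stdout
os.write(2, b"to stderr\n")   # descriptor 2 = stderr
'''


def run_child():
    """Run the child and return what arrived on stdout and on stderr."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,  # collect both streams, each in its own buffer
        text=True,
    )
    return result.stdout, result.stderr


if __name__ == "__main__":
    out, err = run_child()
    print("stdout received:", repr(out))
    print("stderr received:", repr(err))
```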
Basic redirection:
command > out.txt       # redirect stdout to a file
command 2> errors.txt   # redirect stderr to a file
command < input.txt     # feed a file into stdin
command >> out.txt      # append stdout to a file
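For comparison, here is a rough Python sketch of the same four redirections using `subprocess`. The child program `CHILD` and the helper `run_redirected` are hypothetical stand-ins for `command`; the point is only that redirection means handing the program an already opened file in place of each stream.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for "command": upper-cases stdin to stdout, reports on stderr.
CHILD = (
    "import sys; "
    "sys.stdout.write(sys.stdin.read().upper()); "
    "print('done', file=sys.stderr)"
)


def run_redirected(in_path, out_path, err_path, append=False):
    """Roughly: command < in_path > out_path 2> err_path (">>" if append)."""
    out_mode = "a" if append else "w"
    with open(in_path) as src, open(out_path, out_mode) as out, \
            open(err_path, "w") as err:
        subprocess.run(
            [sys.executable, "-c", CHILD],
            stdin=src, stdout=out, stderr=err, check=True,
        )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, name)
                 for name in ("input.txt", "out.txt", "errors.txt")]
        with open(paths[0], "w") as handle:
            handle.write("acgt\n")
        run_redirected(*paths)               # like ">"
        run_redirected(*paths, append=True)  # like ">>"
        with open(paths[1]) as handle:
            print(handle.read(), end="")     # second run was appended
```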
Piping connects stdout of one command to stdin of another:
command1 | command2
Examples:
ls | wc -l                             # count files
grep HUMAN ex1.acc | sort | uniq -c
cut -f5 ex1.tot | sort -nr | head
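To see what the shell does for `command1 | command2`, the following Python sketch chains processes by connecting each one's stdout to the next one's stdin. The helper `run_pipeline` and the two child programs are assumptions made for this example, and the helper is only suitable for small inputs because it writes all input before reading any output.

```python
import subprocess
import sys


def run_pipeline(commands, input_text):
    """Run commands like `cmd1 | cmd2 | ...` and return the final stdout."""
    procs = []
    for index, command in enumerate(commands):
        # The first process reads from us; the rest read from their neighbour.
        stdin = subprocess.PIPE if index == 0 else procs[-1].stdout
        proc = subprocess.Popen(
            command, stdin=stdin, stdout=subprocess.PIPE, text=True
        )
        if index > 0:
            # The parent no longer needs its copy of the upstream read end.
            procs[-1].stdout.close()
        procs.append(proc)
    procs[0].stdin.write(input_text)
    procs[0].stdin.close()  # signals end of input to the first process
    output = procs[-1].stdout.read()
    for proc in procs:
        proc.wait()
    return output


# Hypothetical stand-ins for `grep HUMAN` and `wc -l`.
GREP_HUMAN = [sys.executable, "-c",
              "import sys\n"
              "for line in sys.stdin:\n"
              "    if 'HUMAN' in line:\n"
              "        sys.stdout.write(line)"]
COUNT_LINES = [sys.executable, "-c",
               "import sys; print(sum(1 for _ in sys.stdin))"]

if __name__ == "__main__":
    text = "P1_HUMAN\nP2_MOUSE\nP3_HUMAN\n"
    print(run_pipeline([GREP_HUMAN, COUNT_LINES], text).strip())
```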
/dev/null (the “black hole”)
If you want to discard output:
command > /dev/null
stderr is a separate stream, so it has to be discarded on its own:
command 2> /dev/null
Simple Pipe Examples
These illustrate the concepts used constantly in real NGS workflows.
zcat reads.fastq.gz | head
grep -v "^#" variants.vcf | wc -l
samtools view file.bam | awk '{print $3}' | sort | uniq -c
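The same streaming idea carries over to Python. As a sketch, the two hypothetical helpers below mirror the first two commands above: `head_gz` reads only the first lines of a gzip-compressed file without decompressing the rest, and `count_records` counts non-header lines from any line iterator (a file, or `sys.stdin` in a pipe).

```python
import gzip
import itertools


def head_gz(path, n=10):
    """Return the first n lines of a gzip file, like `zcat path | head`."""
    with gzip.open(path, "rt") as handle:
        # islice stops after n lines, so the rest is never decompressed.
        return list(itertools.islice(handle, n))


def count_records(lines):
    """Count lines not starting with '#', like `grep -v "^#" | wc -l`."""
    return sum(1 for line in lines if not line.startswith("#"))
```

Inside a pipe-friendly script, `count_records(sys.stdin)` would process the input one line at a time, so memory use stays flat regardless of file size.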
Using stdin/stdout in Python
Python scripts can read from stdin and write to stdout, making them pipe-friendly.
Example 1: Minimal stdin → stdout Python script
#!/usr/bin/python3
import sys

for line in sys.stdin:
    print("Hello", line.strip())
Run it with a pipe:
echo "world" | python3 hello.py
Example 2: Greeting names from stdin
Create `hello_world.py`:
#!/usr/bin/python3
import sys

def main():
    for line in sys.stdin:
        name = line.strip()
        if name:  # skip empty lines
            print(f"Hello World! {name}")

if __name__ == "__main__":
    main()
Run:
echo -e "Alice\nBob" | python3 hello_world.py
Example: Random Name Generator + Hello Script
This demonstrates connecting two Python scripts using pipes rather than temporary files.
random_name_generator.py
#!/usr/bin/python3
import random

names = [
    "Anders", "Niels", "Jens", "Poul", "Lars", "Morten", "Søren", "Thomas",
    "Peter", "Martin", "Henrik", "Jesper", "Frederik", "Kasper", "Rasmus",
    "Anne", "Maria", "Sofie", "Camilla", "Julie", "Eva", "Sara", "Ida"
]

# Print 10 random names (sampled without repetition)
for name in random.sample(names, 10):
    print(name)
hello_world_stdin.py
#!/usr/bin/python3
import sys

for line in sys.stdin:
    name = line.strip()
    print(f"Hello World! {name}")
Run both using a pipe
python3 random_name_generator.py | python3 hello_world_stdin.py
No temporary files needed.
stderr (Standard Error)
stderr is meant for status messages and diagnostics, so that they do not mix with the data on stdout.
In the example below (saved as `greet.py`), the greeting goes to stdout and the status message goes to stderr.
#!/usr/bin/python3
import sys

def main():
    for line in sys.stdin:
        name = line.strip()
        print(f"Hello World! {name}")                    # stdout
        print(f"Name greeted: {name}", file=sys.stderr)  # stderr

if __name__ == "__main__":
    main()
Redirect stderr:
echo -e "Alice\nBob" | python3 greet.py 2> status.txt
Now:
- stdout → terminal
- stderr → status.txt
Process Substitution (<(...))
Process substitution hands a command a file name (typically a path such as /dev/fd/63) whose contents are the output of another command, a kind of “fake temporary file”.
Example:
diff <(sort file1.txt) <(sort file2.txt)
No temporary files are created, yet diff sees two “files”.
Process substitution is available in bash, zsh, and ksh, but it is not part of POSIX sh.
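To make the mechanism less magical, here is a sketch of roughly what the shell does, written in Python and assuming a Linux- or macOS-style `/dev/fd` directory. It creates a pipe, then gives a child program the path `/dev/fd/N` as an ordinary file name argument. The names `CHILD` and `run_with_fd_path` are hypothetical.

```python
import os
import subprocess
import sys

# Child: opens the path in its first argument like any file and upper-cases it.
CHILD = "import sys; print(open(sys.argv[1]).read().upper(), end='')"


def run_with_fd_path(data: bytes) -> str:
    """Pass data to the child through a pipe that is named as a /dev/fd path."""
    read_fd, write_fd = os.pipe()
    try:
        os.write(write_fd, data)  # small payload, fits in the pipe buffer
    finally:
        os.close(write_fd)        # closing the write end lets the reader see EOF
    try:
        result = subprocess.run(
            [sys.executable, "-c", CHILD, f"/dev/fd/{read_fd}"],
            pass_fds=[read_fd],   # keep the read end open in the child
            capture_output=True, text=True, check=True,
        )
    finally:
        os.close(read_fd)
    return result.stdout


if __name__ == "__main__":
    print(run_with_fd_path(b"acgt\n"), end="")
```

The child never learns that its "file" is a pipe, which is why tools that insist on file name arguments, such as diff, work with process substitution.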
Real Example: Integer Generator + Prime Checker
random_int_generator.py
#!/usr/bin/python3
import argparse
import random

parser = argparse.ArgumentParser()
parser.add_argument("n", nargs="?", type=int, default=10)  # how many integers
parser.add_argument("--min", type=int, default=10)
parser.add_argument("--max", type=int, default=100)
args = parser.parse_args()

# Print n random integers between --min and --max (both inclusive)
for _ in range(args.n):
    print(random.randint(args.min, args.max))
prime_checker.py
#!/usr/bin/python3
import sys
import math

def is_prime(num):
    """Trial division up to the square root of num."""
    if num <= 1:
        return False
    for i in range(2, int(math.sqrt(num)) + 1):
        if num % i == 0:
            return False
    return True

# Read line by line so checking can start before the generator has finished
for line in sys.stdin:
    for token in line.split():
        n = int(token)
        if is_prime(n):
            print(n)
Run both together
python3 random_int_generator.py 20 --min 1 --max 200 | python3 prime_checker.py
This streams numbers directly to the checker.
RSA Example Using Process Substitution
This example feeds two pipelines, each producing primes, into an RSA key generator.
The concept:
python3 RSAcompute.py \
    <(python3 random_int_generator.py --min 1000000 --max 10000000 10000 | python3 prime_checker.py) \
    <(python3 random_int_generator.py --min 1000000 --max 10000000 10000 | python3 prime_checker.py)
Each `<(...)` block is passed to RSAcompute.py as a file name that it can open and read like an ordinary file.
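`RSAcompute.py` itself is not listed on this page. Purely as an illustration, a hypothetical stand-in (not the original script) could read one prime from each file argument and derive a textbook RSA key pair. The function names, the choice of the first prime in each file, and the public exponent 65537 are all assumptions of this sketch, and the result is not suitable for real cryptography.

```python
import math
import sys


def read_primes(path):
    """Return the integers found in the file at path, one per line."""
    with open(path) as handle:
        return [int(line) for line in handle if line.strip()]


def make_keypair(p, q, e=65537):
    """Return (n, e, d) for two distinct primes p and q (textbook RSA)."""
    if p == q:
        raise ValueError("p and q must differ")
    n = p * q
    phi = (p - 1) * (q - 1)
    if math.gcd(e, phi) != 1:
        raise ValueError("e is not coprime to phi; pick other primes")
    d = pow(e, -1, phi)  # modular inverse (needs Python 3.8 or newer)
    return n, e, d


# Only runs when exactly two file names are given on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    p = read_primes(sys.argv[1])[0]
    q = next(x for x in read_primes(sys.argv[2]) if x != p)
    n, e, d = make_keypair(p, q)
    print(f"public key:  (n={n}, e={e})")
    print(f"private key: (n={n}, d={d})")
```

Because the script only opens the paths it is given, it works the same way with real files and with `<(...)` arguments.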
Benchmarking with time
time python3 random_int_generator.py 5000000 | python3 prime_checker.py
Compare:
1. Using intermediate files
2. Using pipes
3. Using process substitution
Pipes are usually fastest because:
- no disk IO
- both programs run concurrently
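The comparison between intermediate files and pipes can also be scripted. The sketch below times both variants with `time.perf_counter`; the tiny `PRODUCER` and `CONSUMER` programs are placeholders invented for this example, standing in for the generator and checker scripts.

```python
import os
import subprocess
import sys
import tempfile
import time

# Placeholder programs: emit 20000 integers / count the multiples of 7.
PRODUCER = "for i in range(20000): print(i)"
CONSUMER = ("import sys; "
            "print(sum(1 for line in sys.stdin if int(line) % 7 == 0))")


def via_file():
    """Producer writes a temporary file, then the consumer reads it."""
    start = time.perf_counter()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "numbers.txt")
        with open(path, "w") as out:
            subprocess.run([sys.executable, "-c", PRODUCER],
                           stdout=out, check=True)
        with open(path) as src:
            result = subprocess.run([sys.executable, "-c", CONSUMER],
                                    stdin=src, capture_output=True,
                                    text=True, check=True)
    return result.stdout.strip(), time.perf_counter() - start


def via_pipe():
    """Producer and consumer run at the same time, connected by a pipe."""
    start = time.perf_counter()
    producer = subprocess.Popen([sys.executable, "-c", PRODUCER],
                                stdout=subprocess.PIPE)
    result = subprocess.run([sys.executable, "-c", CONSUMER],
                            stdin=producer.stdout, capture_output=True,
                            text=True, check=True)
    producer.stdout.close()
    producer.wait()
    return result.stdout.strip(), time.perf_counter() - start


if __name__ == "__main__":
    for label, func in (("file", via_file), ("pipe", via_pipe)):
        answer, seconds = func()
        print(f"{label}: result={answer} time={seconds:.3f}s")
```

With inputs this small the timings are dominated by interpreter start-up, so a meaningful comparison needs much larger inputs, such as the five million integers used above.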
Summary
Advanced UNIX concepts such as redirection, pipes, stderr handling, and process substitution are essential for:
- chaining tools in NGS pipelines
- avoiding large temporary files
- streaming FASTQ/BAM/VCF data efficiently
- mixing shell tools and Python scripts
- optimizing performance
For core UNIX navigation and file management, see Basic UNIX Notes.