Advanced UNIX and Pipes

Revision as of 10:03, 20 November 2025 by Mick

This page covers the standard input/output streams, redirection, pipes, file descriptors, and process substitution, with examples using Python scripts. These concepts are useful in NGS analysis, e.g. for chaining commands together, avoiding intermediate files, and streaming FASTQ/BAM data.

stdout, stdin, stderr

Every UNIX process starts with three standard data streams:

  • stdin (file descriptor 0) – input
  • stdout (file descriptor 1) – normal output
  • stderr (file descriptor 2) – error messages

Redirecting these streams:

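  • `>` – redirect stdout to a file (overwrites the file)
  • `2>` – redirect stderr to a file
  • `<` – read stdin from a file
  • `>>` – redirect stdout to a file, appending instead of overwriting
  • `|` – pipe the stdout of one command into the stdin of the next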

Examples:

ls > listing.txt
grep HUMAN ex1.acc 2> errors.log
wc -l < ex1.acc
ls | wc -l
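
The separation between stdout and stderr can also be observed from Python. The following is a minimal sketch, not part of the original course material: the child program `CHILD` and the helper `run_child` are hypothetical names chosen for illustration. It starts a child process that writes one line to each output stream and captures the two streams separately, which is why `>` alone leaves error messages on the terminal.

```python
import subprocess
import sys

# Hypothetical child program: one line on stdout, one line on stderr.
CHILD = 'import sys; print("result"); print("warning", file=sys.stderr)'


def run_child():
    """Run the child and capture its two output streams separately."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,  # stdout and stderr each get their own pipe
        text=True,            # decode bytes to str
        check=True,           # raise if the child exits with an error
    )
    return proc.stdout, proc.stderr


out, err = run_child()
print("stdout:", out.strip())
print("stderr:", err.strip())
```

Each message arrives on its own stream, so redirecting one of them does not affect the other.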

/dev/null (“black hole”)

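Redirect output you want to ignore to /dev/null, which discards everything written to it. The example below discards stdout only; add `2> /dev/null` to discard error messages as well: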

command > /dev/null
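
The same device can be opened like any other file. As a small sketch (the variable names are illustrative only), Python exposes its path as `os.devnull`; anything written to it disappears, and reading from it returns end-of-file immediately.

```python
import os

# os.devnull is "/dev/null" on UNIX-like systems.
with open(os.devnull, "w") as sink:
    sink.write("this text is discarded\n")

with open(os.devnull) as source:
    data = source.read()  # nothing to read: end-of-file straight away

print(repr(data))
```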

Pipes

Pipes connect the output of one command to the input of another.

Simple examples:

grep HUMAN ex1.acc | sort | uniq -c
cut -f1 ex1.tot | sort | head

The commands in a pipeline run concurrently: each one reads data as soon as the previous command produces it, and no temporary files need to be written.
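
To illustrate what the shell sets up for `sort | uniq -c`, here is a hedged sketch that builds the same two-stage pipeline by hand with Python's `subprocess` module. It assumes `sort` and `uniq` are available on the PATH; the function name `sort_and_count` is hypothetical, and the input is assumed small enough to be written in one go.

```python
import subprocess


def sort_and_count(text):
    """Rough equivalent of: printf '%s' "$text" | sort | uniq -c"""
    sorter = subprocess.Popen(
        ["sort"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    # The stdout of the first process becomes the stdin of the second.
    counter = subprocess.Popen(
        ["uniq", "-c"], stdin=sorter.stdout, stdout=subprocess.PIPE, text=True
    )
    sorter.stdout.close()     # only the counter should hold the read end
    sorter.stdin.write(text)  # feed the first stage
    sorter.stdin.close()      # end-of-file lets sort emit its output
    output, _ = counter.communicate()
    sorter.wait()
    return output


print(sort_and_count("HUMAN\nMOUSE\nHUMAN\n"))
```

Both processes are started before any data flows, which is the concurrency the shell provides automatically with `|`.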

Using stdin/stdout in Python

Example program reading from stdin:

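#!/usr/bin/env python3
# hello.py: greet every line read from standard input
import sys

for line in sys.stdin:
    # strip() removes the trailing newline and surrounding whitespace
    print("Hello", line.strip())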

Run with:

echo "world" | python3 hello.py

File descriptors and process substitution

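Process substitution, a feature of shells such as bash and zsh rather than of plain POSIX sh, runs a command and replaces `<(command)` with a file name (typically of the form /dev/fd/N) from which the command's output can be read: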

diff <(sort file1) <(sort file2)

This is useful for tools that expect filenames instead of reading from stdin, or that need more than one input stream, as diff does here.
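
The mechanism can be imitated in Python. This is a sketch under stated assumptions, not a description of any particular shell's implementation: it assumes a system that provides /dev/fd (Linux and macOS do) and a `cat` command, and `read_via_fd_path` is a hypothetical helper name. A pipe is created, data is written into it, and a child process is handed the path /dev/fd/N in place of a regular filename.

```python
import os
import subprocess


def read_via_fd_path(data):
    """Give a child a /dev/fd/N path backed by a pipe, similar to <(...)."""
    read_fd, write_fd = os.pipe()
    try:
        os.write(write_fd, data)  # small payload: fits in the pipe buffer
    finally:
        os.close(write_fd)        # closing the write end signals end-of-file
    try:
        # pass_fds keeps read_fd open in the child under the same number,
        # so the path /dev/fd/N is valid there too.
        result = subprocess.run(
            ["cat", f"/dev/fd/{read_fd}"],
            pass_fds=(read_fd,),
            capture_output=True,
            check=True,
        )
    finally:
        os.close(read_fd)
    return result.stdout


print(read_via_fd_path(b"streamed, not stored\n").decode(), end="")
```

The child opens what looks like a file, yet the bytes never touch the disk.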

Example: Random name generator + greeting script

(Your cleaned-up versions go here.)
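
The scripts for this example are not shown on this page. As a minimal sketch of the generator half only, with a made-up name list `NAMES` and a hypothetical function `random_names`, a script could print one random name per line so that the output can be piped into the greeting script from the section on stdin/stdout in Python.

```python
import random

# Hypothetical name list, for illustration only.
NAMES = ["Ada", "Grace", "Linus", "Margaret", "Dennis", "Ken"]


def random_names(count, seed=None):
    """Return `count` names drawn at random (with replacement) from NAMES."""
    rng = random.Random(seed)  # a fixed seed makes the output reproducible
    return [rng.choice(NAMES) for _ in range(count)]


# One name per line on stdout, ready to be piped into the next command.
for name in random_names(3):
    print(name)
```

Saved under an assumed filename such as names.py, it could be combined with hello.py as `python3 names.py | python3 hello.py`, producing one greeting per generated name.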

Example: Integer generator + prime checker

(Place the simplified versions here.)
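
The scripts for this example are likewise not shown here. The sketch below is one possible shape for the pair, using hypothetical function names (`integers`, `is_prime`, `primes_only`) and simple trial division. It also shows a common pipeline convention: results go to stdout, while complaints about bad input go to stderr so they do not contaminate the data stream.

```python
import sys


def integers(start, stop):
    """Generator side: yield the integers start..stop-1 as text lines."""
    for n in range(start, stop):
        yield f"{n}\n"


def is_prime(n):
    """Trial division; adequate for small n only."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3
    if n % 2 == 0:
        return False
    divisor = 3
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 2
    return True


def primes_only(lines, log=sys.stderr):
    """Checker side: keep the primes; report unparsable lines on `log`."""
    for line in lines:
        text = line.strip()
        try:
            value = int(text)
        except ValueError:
            print(f"skipping non-integer input: {text!r}", file=log)
            continue
        if is_prime(value):
            yield value


# In a real pipeline the checker would iterate over sys.stdin instead.
for prime in primes_only(integers(1, 20)):
    print(prime)
```

Split into two scripts, the generator would print its lines and the checker would call `primes_only(sys.stdin)`, so the two could be joined with `|`.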

Example: RSA key generator

(Optional section, for students who want the deeper CS example.)

Benchmarking

Using the `time` command:

time python3 script.py

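Use this to compare a pipeline against the equivalent commands that write intermediate files. `time` reports real (wall-clock) time as well as user and sys CPU time.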
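
When the comparison should be scripted, wall-clock time can also be measured from Python. This is a rough sketch, with the hypothetical helper `time_command` and a placeholder workload; it corresponds only to the "real" figure reported by `time`, not to the user and sys CPU times.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Return the wall-clock seconds taken by one command."""
    start = time.perf_counter()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


# Placeholder workload; substitute the command to be benchmarked.
elapsed = time_command([sys.executable, "-c", "print(sum(range(100000)))"])
print(f"real {elapsed:.3f}s")
```

Running each variant several times and comparing the spread gives a more reliable picture than a single measurement.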

Summary

These advanced UNIX tools allow:

  • streaming large data instead of creating intermediate files
  • chaining tools together efficiently
  • combining Python scripts and shell commands in one pipeline
  • reducing disk I/O and run time in large NGS pipelines