Zip codes answers

Please note that in UNIX, there is more than one way to do things.

Q1:

zcat /home/projects/22126_NGS/exercises/unix/ZIP_CODES.csv.gz  |awk 'BEGIN{FS=","}{if($5=="\"NY\""){print $0}}'|wc -l

zcat: Decompresses the ZIP_CODES.csv.gz file and outputs its content.
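awk 'BEGIN{FS=","}': Sets the field separator to a comma for the CSV file.
if($5=="\"NY\""){print $0}: Prints the entire line if the 5th field (state) is "NY". The values in the file are enclosed in double quotes, so the quotes are part of the comparison and have to be escaped as \".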
wc -l: Counts the number of lines, which corresponds to the number of ZIP codes in NY.
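To check the Q1 logic without the course file, the sketch below pipes a few made-up rows through the same awk | wc -l pipeline and through a variant in which awk does the counting itself. The rows follow an assumed column layout ("zip","latitude","longitude","city","state","county"), and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Made-up sample rows; the column layout is an assumption:
# "zip","latitude","longitude","city","state","county"
sample_rows() {
  printf '%s\n' \
    '"10001","+40.750000","-073.990000","NEW YORK","NY","NEW YORK"' \
    '"10002","+40.710000","-073.980000","NEW YORK","NY","NEW YORK"' \
    '"14201","+42.890000","-078.880000","BUFFALO","NY","ERIE"' \
    '"02108","+42.350000","-071.060000","BOSTON","MA","SUFFOLK"'
}

# Q1 as written: keep rows whose 5th field is "NY" (quotes included), then count lines.
q1_pipeline() {
  awk 'BEGIN{FS=","}{if($5=="\"NY\""){print $0}}' | wc -l
}

# Alternative: awk counts the matches itself; n + 0 prints 0 if nothing matched.
q1_awk_only() {
  awk -F',' '$5 == "\"NY\"" { n++ } END { print n + 0 }'
}

sample_rows | q1_pipeline   # 3
sample_rows | q1_awk_only   # 3

# On the course data (path from the answer above):
# zcat /home/projects/22126_NGS/exercises/unix/ZIP_CODES.csv.gz | q1_awk_only
```

Both functions print 3 for these rows. The awk-only version saves one process, which makes no practical difference here but shows another way to get the same count.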

Q2:

zcat /home/projects/22126_NGS/exercises/unix/ZIP_CODES.csv.gz|awk 'BEGIN{FS=","}{if($5=="\"NY\""){print $4}}' |sort |uniq -c  |sort -nr  |head -n 3

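awk and if($5=="\"NY\""){print $4}: Prints the 4th field (city) of every NY line, i.e. one city name per ZIP code.
sort: Sorts the city names so that identical names end up next to each other.
uniq -c: Collapses identical adjacent names and prefixes each with its count, i.e. the number of ZIP codes in that city.
sort -nr: Sorts by that count in descending numeric order.
head -n 3: Displays the 3 NY cities with the most ZIP codes.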
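The sketch below runs the Q2 pipeline on made-up rows (same assumed column layout: "zip","latitude","longitude","city","state","county") and adds a variant that counts cities in an awk associative array instead of sort | uniq -c. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Made-up sample rows; assumed layout "zip","latitude","longitude","city","state","county".
sample_rows() {
  printf '%s\n' \
    '"10001","+40.750000","-073.990000","NEW YORK","NY","NEW YORK"' \
    '"10002","+40.710000","-073.980000","NEW YORK","NY","NEW YORK"' \
    '"10003","+40.730000","-073.980000","NEW YORK","NY","NEW YORK"' \
    '"14201","+42.890000","-078.880000","BUFFALO","NY","ERIE"' \
    '"14202","+42.880000","-078.880000","BUFFALO","NY","ERIE"' \
    '"12401","+41.930000","-074.000000","KINGSTON","NY","ULSTER"' \
    '"02108","+42.350000","-071.060000","BOSTON","MA","SUFFOLK"' \
    '"02109","+42.360000","-071.050000","BOSTON","MA","SUFFOLK"' \
    '"02110","+42.360000","-071.050000","BOSTON","MA","SUFFOLK"' \
    '"02111","+42.350000","-071.060000","BOSTON","MA","SUFFOLK"'
}

# Q2 as written: NY city names -> ZIP codes per city -> the 3 largest counts.
q2_pipeline() {
  awk 'BEGIN{FS=","}{if($5=="\"NY\""){print $4}}' | sort | uniq -c | sort -nr | head -n 3
}

# Alternative: count per city in an awk array, then sort only the per-city totals.
q2_awk_count() {
  awk -F',' '$5 == "\"NY\"" { n[$4]++ } END { for (c in n) print n[c], c }' | sort -nr | head -n 3
}

sample_rows | q2_pipeline
sample_rows | q2_awk_count
```

Both list NEW YORK (3), BUFFALO (2) and KINGSTON (1); BOSTON has more rows but is dropped by the state filter. The array version avoids sorting the full list of names, since only one line per city reaches sort.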

Q3:

zcat /home/projects/22126_NGS/exercises/unix/ZIP_CODES.csv.gz |awk 'BEGIN{FS=","}{if($5=="\"NY\""){print $4}}' |sort |uniq -c  |awk '{if($1==1){print $0}}'  |wc -l

The first steps are the same as in Q2 and give the number of ZIP codes for each NY city.
awk '{if($1==1){print $0}}': Keeps only the cities whose count is 1, i.e. cities with a single ZIP code.
wc -l: Counts these cities.
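Here is a small check of the Q3 pipeline on made-up rows (assumed layout "zip","latitude","longitude","city","state","county"), next to a single-awk variant that counts the cities with exactly one ZIP code directly. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Made-up sample rows; the column layout is an assumption.
sample_rows() {
  printf '%s\n' \
    '"10001","+40.750000","-073.990000","NEW YORK","NY","NEW YORK"' \
    '"10002","+40.710000","-073.980000","NEW YORK","NY","NEW YORK"' \
    '"14201","+42.890000","-078.880000","BUFFALO","NY","ERIE"' \
    '"12401","+41.930000","-074.000000","KINGSTON","NY","ULSTER"' \
    '"14850","+42.440000","-076.500000","ITHACA","NY","TOMPKINS"' \
    '"02108","+42.350000","-071.060000","BOSTON","MA","SUFFOLK"'
}

# Q3 as written: ZIP codes per NY city, keep counts equal to 1, count those lines.
q3_pipeline() {
  awk 'BEGIN{FS=","}{if($5=="\"NY\""){print $4}}' | sort | uniq -c | awk '{if($1==1){print $0}}' | wc -l
}

# Alternative: tally cities in an array and count the entries equal to 1.
q3_awk_only() {
  awk -F',' '$5 == "\"NY\"" { n[$4]++ }
    END { for (c in n) if (n[c] == 1) k++; print k + 0 }'
}

sample_rows | q3_pipeline   # 3
sample_rows | q3_awk_only   # 3
```

Both print 3 (BUFFALO, KINGSTON and ITHACA). NEW YORK has two ZIP codes and BOSTON is not in NY, so neither is counted.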

Q4:

zcat /home/projects/22126_NGS/exercises/unix/ZIP_CODES.csv.gz|awk 'BEGIN{FS=","}{if($5=="\"NY\""){print $4}}' |sort |uniq -c  |awk '{print $1}' |sort -n |uniq -c

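The first steps are the same as in Q2 and give the number of ZIP codes for each NY city.
awk '{print $1}': Keeps only that per-city count.
sort -n | uniq -c: Sorts the counts and tallies how often each one occurs. In the output the first column is the number of cities and the second is the number of ZIP codes per city, so a line such as "5 2" means that 5 cities have exactly 2 ZIP codes.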
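The sketch below runs the Q4 pipeline on made-up rows (assumed layout "zip","latitude","longitude","city","state","county") and then labels the two output columns, which makes the histogram easier to read. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Made-up sample rows; the column layout is an assumption.
sample_rows() {
  printf '%s\n' \
    '"10001","+40.750000","-073.990000","NEW YORK","NY","NEW YORK"' \
    '"10002","+40.710000","-073.980000","NEW YORK","NY","NEW YORK"' \
    '"10003","+40.730000","-073.980000","NEW YORK","NY","NEW YORK"' \
    '"14201","+42.890000","-078.880000","BUFFALO","NY","ERIE"' \
    '"14202","+42.880000","-078.880000","BUFFALO","NY","ERIE"' \
    '"12401","+41.930000","-074.000000","KINGSTON","NY","ULSTER"' \
    '"14850","+42.440000","-076.500000","ITHACA","NY","TOMPKINS"' \
    '"02108","+42.350000","-071.060000","BOSTON","MA","SUFFOLK"'
}

# Q4 as written: per-city ZIP counts, then how many cities share each count.
q4_pipeline() {
  awk 'BEGIN{FS=","}{if($5=="\"NY\""){print $4}}' | sort | uniq -c | awk '{print $1}' | sort -n | uniq -c
}

# Column 1 = number of cities, column 2 = ZIP codes per city.
sample_rows | q4_pipeline |
  awk '{ printf "cities with %s ZIP code(s): %s\n", $2, $1 }'
```

For these rows the pipeline reports 2 cities with 1 ZIP code, 1 with 2 and 1 with 3.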

Q5:

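zcat /home/projects/22126_NGS/exercises/unix/ZIP_CODES.csv.gz | awk 'BEGIN{FS=","}{print $2"\t"$4}' |sed "s/+//g" |sed "s/\"//g"   |awk -F'\t' '{if($1>=39.4){print $2}}' |sort |uniq |wc -l

awk '{print $2"\t"$4}': Extracts the latitude (2nd field) and the city (4th field), separated by a tab.
sed "s/+//g" and sed "s/\"//g": Removes the plus signs and quotes, so that the latitude is a plain number.
awk -F'\t' '{if($1>=39.4){print $2}}': Prints the city name when the latitude is >= 39.4. The -F'\t' splits on the tab only; with awk's default whitespace splitting, a city such as NEW YORK would be cut down to NEW.
sort | uniq | wc -l: Counts the unique city names that meet the criterion (cities with the same name in different states are counted once).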
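The difference between splitting the latitude/city pairs on whitespace and on a tab is easy to see on a few made-up rows (assumed layout "zip","latitude","longitude","city","state","county"). The sketch runs the same front part of the Q5 pipeline and then filters it both ways; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Made-up sample rows; the column layout is an assumption.
sample_rows() {
  printf '%s\n' \
    '"10001","+40.750000","-073.990000","NEW YORK","NY","NEW YORK"' \
    '"10002","+40.710000","-073.980000","NEW YORK","NY","NEW YORK"' \
    '"06510","+41.310000","-072.920000","NEW HAVEN","CT","NEW HAVEN"' \
    '"14201","+42.890000","-078.880000","BUFFALO","NY","ERIE"' \
    '"21201","+39.290000","-076.620000","BALTIMORE","MD","BALTIMORE CITY"'
}

# Shared front part: latitude<TAB>city with '+' and '"' stripped.
lat_city() {
  awk 'BEGIN{FS=","}{print $2"\t"$4}' | sed "s/+//g" | sed "s/\"//g"
}

# Whitespace splitting: "NEW YORK" and "NEW HAVEN" both become "NEW".
q5_whitespace() {
  lat_city | awk '{if($1>=39.4){print $2}}' | sort | uniq | wc -l
}

# Tab splitting keeps multi-word city names intact.
q5_tab() {
  lat_city | awk -F'\t' '{if($1>=39.4){print $2}}' | sort | uniq | wc -l
}

sample_rows | q5_whitespace   # 2 (BUFFALO, NEW)
sample_rows | q5_tab          # 3 (BUFFALO, NEW HAVEN, NEW YORK)
```

BALTIMORE lies below 39.4 and is excluded in both cases; only the tab-separated version counts NEW YORK and NEW HAVEN as two different cities.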

Q6:

zcat /home/projects/22126_NGS/exercises/unix/ZIP_CODES.csv.gz| awk 'BEGIN{FS=","}{print $4"\t"$5}'   |sort |uniq -c |sed "s/\([0-9]\) /\1\t/g" | sort  -k 3,3 -k 1,1nr -t "`/bin/echo -e '\t'`"  | sort -u -k 3,3 -t "`/bin/echo -e '\t'`"

zcat: Decompresses the ZIP_CODES.csv.gz file and outputs its content.
awk 'BEGIN{FS=","}{print $4"\t"$5}': Sets the field separator to a comma and prints the 4th (city) and 5th (state) fields separated by a tab.
sort: Sorts the output alphabetically.
uniq -c: Counts unique occurrences of each city-state pair, i.e. the number of ZIP codes per city.
sed "s/\([0-9]\) /\1\t/g": Replaces the space between the count and the city-state pair with a tab, so that count, city and state become three tab-separated fields.
sort -k 3,3 -k 1,1nr -t "`/bin/echo -e '\t'`": Sorts first by state (3rd field), then by count (1st field) in descending numeric order, using a tab as the field separator. The backquoted /bin/echo -e '\t' only produces a tab character; in bash, $'\t' does the same.
sort -u -k 3,3 -t "`/bin/echo -e '\t'`": Keeps only the first line for each state, which, because of the previous sort, is the city with the most ZIP codes.
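The sketch below runs the Q6 pipeline on made-up rows (assumed layout "zip","latitude","longitude","city","state","county"). It stores the tab in a variable with printf instead of the backquoted /bin/echo call, and it assumes GNU sed and GNU sort, as found on typical Linux systems. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Made-up sample rows; the column layout is an assumption.
sample_rows() {
  printf '%s\n' \
    '"10001","+40.750000","-073.990000","NEW YORK","NY","NEW YORK"' \
    '"10002","+40.710000","-073.980000","NEW YORK","NY","NEW YORK"' \
    '"14201","+42.890000","-078.880000","BUFFALO","NY","ERIE"' \
    '"02108","+42.350000","-071.060000","BOSTON","MA","SUFFOLK"' \
    '"02109","+42.360000","-071.050000","BOSTON","MA","SUFFOLK"' \
    '"01970","+42.520000","-070.900000","SALEM","MA","ESSEX"'
}

tab=$(printf '\t')   # a literal tab, like "`/bin/echo -e '\t'`"

# Q6: ZIP codes per city-state pair, then the top city of each state.
q6_pipeline() {
  awk 'BEGIN{FS=","}{print $4"\t"$5}' | sort | uniq -c \
    | sed "s/\([0-9]\) /\1\t/g" \
    | sort -k 3,3 -k 1,1nr -t "$tab" \
    | sort -u -k 3,3 -t "$tab"
}

sample_rows | q6_pipeline
```

Only the BOSTON (MA) and NEW YORK (NY) lines are printed. The \t in the sed replacement is a GNU sed extension; other sed versions may need a literal tab there.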

Q7:

zcat /home/projects/22126_NGS/exercises/unix/ZIP_CODES.csv.gz | awk 'BEGIN{FS=","}{print $4}' |sort |uniq  |awk '{print substr($1,1,2)}' |grep "\"M" |wc -l

zcat: Decompresses the ZIP_CODES.csv.gz file and outputs its content.
awk 'BEGIN{FS=","}{print $4}': Sets the field separator to a comma and prints only the 4th field (city).
sort: Sorts the city names alphabetically.
uniq: Removes duplicate city names.
awk '{print substr($1,1,2)}': Prints the first two characters of each city name, i.e. the opening quote and the first letter. awk numbers string positions from 1; a start position of 0 is not portable, and depending on the awk implementation substr($1,0,2) can return just one character.
grep "\"M": Keeps the names whose first letter is "M" (the quote is part of the pattern).
wc -l: Counts the lines, giving the number of unique city names starting with "M".
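As another example of "more than one way", the sketch below compares the Q7 pipeline (with substr counting from position 1) against a shorter version that uses sort -u and grep -c. The rows are made up, the column layout ("zip","latitude","longitude","city","state","county") is assumed, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Made-up sample rows; the column layout is an assumption.
sample_rows() {
  printf '%s\n' \
    '"35004","+33.600000","-086.500000","MOODY","AL","ST. CLAIR"' \
    '"36601","+30.690000","-088.040000","MOBILE","AL","MOBILE"' \
    '"33101","+25.770000","-080.190000","MIAMI","FL","MIAMI-DADE"' \
    '"33102","+25.770000","-080.190000","MIAMI","FL","MIAMI-DADE"' \
    '"02108","+42.350000","-071.060000","BOSTON","MA","SUFFOLK"' \
    '"10001","+40.750000","-073.990000","NEW YORK","NY","NEW YORK"'
}

# Q7 with a 1-based substr: unique city names, first two characters, keep '"M'.
q7_pipeline() {
  awk 'BEGIN{FS=","}{print $4}' | sort | uniq | awk '{print substr($1,1,2)}' | grep "\"M" | wc -l
}

# Alternative: sort -u replaces sort | uniq, and grep -c counts names starting with "M.
q7_grep_count() {
  awk -F',' '{ print $4 }' | sort -u | grep -c '^"M'
}

sample_rows | q7_pipeline     # 3
sample_rows | q7_grep_count   # 3
```

Both print 3 (MIAMI, MOBILE and MOODY). BOSTON is not counted even though its state, MA, starts with M, because only the city field is examined.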
