Zip codes answers
Please note that in UNIX, there is more than one way to do things.
Q1:
zcat /home/projects/22126_NGS/exercises/unix/ZIP_CODES.csv.gz |awk 'BEGIN{FS=","}{if($5=="\"NY\""){print $0}}'|wc -l
- zcat: Decompresses the ZIP_CODES.csv.gz file and writes its content to standard output.
- awk with BEGIN{FS=","}: Sets the field separator to a comma for the CSV file.
- if($5=="\"NY\""){print $0}: Checks whether the 5th field (state) is "NY", quotes included, and prints the entire line.
- wc -l: Counts the lines, which gives the number of ZIP codes in NY.
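The reason the comparison carries escaped quotes can be checked on a few rows. The sketch below uses a small invented sample that is only assumed to follow the column layout of the exercise file; the rows, the temporary file, and the variable names are made up for illustration.

```shell
# Invented sample rows; assumed to follow the column layout of the exercise file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
"10001","+40.7","-74.0","ALPHA","NY"
"10002","+40.8","-74.1","ALPHA","NY"
"10003","+42.1","-75.0","BETA","NY"
"20001","+38.9","-77.0","GAMMA","DC"
EOF

# The quotes are part of the field, so the pattern has to include them.
with_quotes=$(awk 'BEGIN{FS=","}{if($5=="\"NY\""){print $0}}' "$sample" | wc -l | tr -d ' ')
# Comparing against a bare NY matches nothing.
without_quotes=$(awk 'BEGIN{FS=","}{if($5=="NY"){print $0}}' "$sample" | wc -l | tr -d ' ')

echo "with quotes: $with_quotes, without quotes: $without_quotes"
rm -f "$sample"
```

On this sample the quoted comparison finds 3 rows and the unquoted one finds 0.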
Q2:
zcat /home/projects/22126_NGS/exercises/unix/ZIP_CODES.csv.gz|awk 'BEGIN{FS=","}{if($5=="\"NY\""){print $4}}' |sort |uniq -c |sort -nr |head -n 3
- awk with if($5=="\"NY\""){print $4}: Selects the NY rows and prints the 4th field (city) of each.
- sort: Sorts the city names so that identical names are adjacent.
- uniq -c: Counts how many times each city occurs, i.e. how many ZIP codes it has.
- sort -nr: Sorts the counts in descending numeric order.
- head -n 3: Displays the 3 cities with the most ZIP codes.
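The sort before uniq -c matters because uniq only merges adjacent duplicates. A minimal sketch on invented one-letter values (not taken from the exercise file) shows the difference, followed by the same ranking idiom as above:

```shell
# Invented input: one value per line, deliberately unsorted.
values=$(printf '%s\n' B A B A A)

# uniq only merges adjacent duplicates, so unsorted input is split into 4 runs.
unsorted_runs=$(printf '%s\n' "$values" | uniq -c | wc -l | tr -d ' ')

# After sort, each distinct value forms a single run.
sorted_runs=$(printf '%s\n' "$values" | sort | uniq -c | wc -l | tr -d ' ')

# sort -nr puts the largest count first; awk trims the leading padding added by uniq.
top=$(printf '%s\n' "$values" | sort | uniq -c | sort -nr | head -n 1 | awk '{print $1, $2}')

echo "runs unsorted: $unsorted_runs, runs sorted: $sorted_runs, top: $top"
```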
Q3:
zcat /home/projects/22126_NGS/exercises/unix/ZIP_CODES.csv.gz |awk 'BEGIN{FS=","}{if($5=="\"NY\""){print $4}}' |sort |uniq -c |awk '{if($1==1){print $0}}' |wc -l
- The steps up to uniq -c are the same as in Q2 and give the number of ZIP codes for each city in NY.
- awk '{if($1==1){print $0}}': Keeps only the lines whose count is 1.
- wc -l: Counts these lines, i.e. the NY cities that have exactly one ZIP code.
Q4:
zcat /home/projects/22126_NGS/exercises/unix/ZIP_CODES.csv.gz|awk 'BEGIN{FS=","}{if($5=="\"NY\""){print $4}}' |sort |uniq -c |awk '{print $1}' |sort -n |uniq -c
- The initial steps are the same as in Q2 and give the number of ZIP codes for each city in NY.
- awk '{print $1}': Extracts the count and drops the name.
- sort -n | uniq -c: Sorts the counts and reports how often each count occurs.
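Applying uniq -c twice can be hard to picture, so here is a small sketch on invented labels (A, B, C, D are placeholders, not values from the file). The first uniq -c counts lines per label; the second counts how many labels share each count.

```shell
# Invented input: one label per line (A occurs 3 times, D twice, B and C once).
cities=$(printf '%s\n' A B A D C A D)

# First uniq -c: occurrences per label. Second uniq -c: labels per occurrence count.
histogram=$(printf '%s\n' "$cities" | sort | uniq -c | awk '{print $1}' | sort -n | uniq -c)
printf '%s\n' "$histogram"

# Compact form "count=number_of_labels" for easier reading.
summary=$(printf '%s\n' "$histogram" | awk '{print $2 "=" $1}' | paste -sd, -)
echo "$summary"
```

Here the summary reads 1=2,2=1,3=1: two labels occur once, one occurs twice, and one occurs three times.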
Q5:
zcat /home/projects/22126_NGS/exercises/unix/ZIP_CODES.csv.gz | awk 'BEGIN{FS=","}{print $2"\t"$4}' |sed "s/+//g" |sed "s/\"//g" |awk '{if($1>=39.4){print $2}}' |sort |uniq |wc -l
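- awk '{print $2"\t"$4}': Extracts the latitude (2nd field) and the 4th field, separated by a tab.
- sed "s/+//g" and sed "s/\"//g": Remove the plus signs and the quotes, so that awk can compare the latitude as a number.
- awk '{if($1>=39.4){print $2}}': Keeps the rows with latitude >= 39.4 and prints the second field. This awk splits on whitespace, so for a value containing spaces only the first word is printed.
- sort | uniq | wc -l: Counts the distinct values that meet the criterion.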
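The two sed steps matter because awk compares a quoted field as a string, not as a number. The sketch below illustrates this with a single invented latitude value; LC_ALL=C is set only to make the string comparison predictable.

```shell
# A latitude as it might appear in the file (invented value), quotes and plus sign included.
raw='"+40.7"'

# With the quotes in place awk compares strings, and a leading quote sorts before "3".
as_string=$(printf '%s\n' "$raw" | LC_ALL=C awk '{ r = ($1 >= 39.4); print r }')

# After stripping + and ", the field looks like a number and is compared numerically.
as_number=$(printf '%s\n' "$raw" | sed "s/+//g" | sed "s/\"//g" | LC_ALL=C awk '{ r = ($1 >= 39.4); print r }')

echo "quoted: $as_string, stripped: $as_number"
```

The quoted value fails the test (0) even though 40.7 is above 39.4; the stripped value passes (1).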
Q6:
zcat /home/projects/22126_NGS/exercises/unix/ZIP_CODES.csv.gz| awk 'BEGIN{FS=","}{print $4"\t"$5}' |sort |uniq -c |sed "s/\([0-9]\) /\1\t/g" | sort -k 3,3 -k 1,1nr -t "`/bin/echo -e '\t'`" | sort -u -k 3,3 -t "`/bin/echo -e '\t'`"
- zcat: Decompresses the ZIP_CODES.csv.gz file and writes its content to standard output.
- awk with BEGIN{FS=","} and {print $4"\t"$5}: Sets the field separator to a comma and prints the 4th (city) and 5th (state) fields separated by a tab.
- sort: Sorts the output alphabetically so that identical city-state pairs are adjacent.
- uniq -c: Counts the occurrences of each city-state pair.
- sed "s/\([0-9]\) /\1\t/g": Replaces the space between the count and the city-state pair with a tab, so that count, city, and state become three tab-separated fields.
- sort -k 3,3 -k 1,1nr -t "`/bin/echo -e '\t'`": Sorts first by state (3rd field), then by count (1st field) in reverse numeric order, using a tab as the field separator.
- sort -u -k 3,3 -t "`/bin/echo -e '\t'`": Keeps one line per state. Because of the previous sort, the line kept is the first one for each state, which is the city with the highest count.
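Since there is more than one way to do this, the "best row per state" step can also be written as a single awk pass instead of the two sort calls. This is an alternative sketch, not the method used in the answer above; the rows and variable names are invented, and ties go to the row seen first.

```shell
tab=$(printf '\t')

# Invented rows in the shape produced after the sed step: count, city, state.
counts=$(printf '%s\t%s\t%s\n' \
  2 ALPHA NY \
  5 BETA NY \
  1 GAMMA DC \
  3 DELTA DC)

# Remember the largest count seen so far for each state, then print one row per state.
best=$(printf '%s\n' "$counts" |
  awk -F '\t' '($1 + 0) > (max[$3] + 0) { max[$3] = $1; city[$3] = $2 }
               END { for (s in max) print max[s] "\t" city[s] "\t" s }' |
  sort -t "$tab" -k 3,3)

printf '%s\n' "$best"
```

The final sort only orders the output by state, because awk does not guarantee the order of a for-in loop.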
Q7:
zcat /home/projects/22126_NGS/exercises/unix/ZIP_CODES.csv.gz | awk 'BEGIN{FS=","}{print $4}' |sort |uniq |awk '{print substr($1,1,2)}' |grep "\"M" |wc -l
- zcat: Decompresses the ZIP_CODES.csv.gz file and writes its content to standard output.
- awk with BEGIN{FS=","} and {print $4}: Sets the field separator to a comma and prints only the 4th field (city).
- sort: Sorts the city names alphabetically.
- uniq: Removes duplicate city names.
- awk '{print substr($1,1,2)}': Prints the first two characters of each line, i.e. the opening quote and the first letter. String positions in awk start at 1; a start of 0 is handled differently by different awk implementations.
- grep "\"M": Keeps the lines where the opening quote is followed by "M".
- wc -l: Counts the lines, giving the number of distinct cities whose name starts with "M".
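As a quick check of the prefix logic, the sketch below runs on a few invented quoted values. It uses a 1-based start for substr, and it also shows a shorter variant that anchors grep at the start of the line instead of cutting a prefix first; that variant is an alternative, not the answer given above.

```shell
# Invented city fields, quoted as in the CSV; SALEM contains an M but does not start with one.
cities=$(printf '%s\n' '"MIAMI"' '"SALEM"' '"MADISON"' '"MIAMI"' '"NYACK"')

# Two-character prefix (opening quote plus first letter), using a 1-based start.
prefix=$(printf '%s\n' '"MIAMI"' | awk '{print substr($1,1,2)}')

# Distinct values whose first letter is M, matched with an anchored pattern.
m_count=$(printf '%s\n' "$cities" | sort | uniq | grep -c '^"M')

echo "prefix: $prefix, distinct M values: $m_count"
```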