Zip codes answers

From 22126
Revision as of 16:24, 19 March 2024 by WikiSysop

Please note that in UNIX, there is more than one way to do things.

Q1:

zcat /home/projects/22126_NGS/exercises/unix/ZIP_CODES.csv.gz  |awk 'BEGIN{FS=","}{if($5=="\"NY\""){print $0}}'|wc -l
  1. zcat: Decompresses the ZIP_CODES.csv.gz file and outputs its content.
  2. awk 'BEGIN{FS=","}: Sets the field separator to a comma for the CSV file.
  3. if($5=="\"NY\""){print $0}: Checks if the 5th field (state) is "NY" and prints the entire line.
  4. wc -l: Counts the number of lines, which corresponds to the number of ZIP codes in NY.
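
Since there is more than one way to do things, the count can also be kept inside awk instead of piping to wc -l. The following is only a minimal sketch on three made-up rows; the column layout (ZIP code, latitude, longitude, city, state) and the values are assumptions for illustration and are not taken from the real file.

```shell
# Made-up rows in the assumed layout: zip, latitude, longitude, city, state.
sample='"00501","+40.922326","-072.637078","HOLTSVILLE","NY"
"10001","+40.750742","-073.996530","NEW YORK","NY"
"06510","+41.308700","-072.927100","NEW HAVEN","CT"'

# Count matching rows in awk itself; n+0 prints 0 when nothing matches.
ny_count=$(printf '%s\n' "$sample" | awk -F, '$5=="\"NY\"" {n++} END {print n+0}')
echo "$ny_count"    # prints 2
```

On the real data, the printf would be replaced by the zcat command from the answer above.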

Q2:

zcat /home/projects/22126_NGS/exercises/unix/ZIP_CODES.csv.gz|awk 'BEGIN{FS=","}{if($5=="\"NY\""){print $4}}' |sort |uniq -c  |sort -nr  |head -n 3
  1. awk with if($5=="\"NY\""){print $4}: Prints the city (4th field) of every NY entry.
  2. sort: Sorts the city names so that identical names end up on adjacent lines.
  3. uniq -c: Collapses identical adjacent lines and prefixes each with its count, i.e. the number of ZIP code entries per city.
  4. sort -nr: Sorts by count in descending numeric order.
  5. head -n 3: Displays the three cities with the most entries.
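
The sort before uniq -c matters because uniq only compares adjacent lines. A small sketch on made-up values (the variable names and the list are hypothetical) shows the difference:

```shell
# Made-up values (city names here); the real input comes from the awk step.
values='ALBANY
NEW YORK
BUFFALO
NEW YORK
ALBANY
NEW YORK'

# Without sort, equal lines are not adjacent, so uniq -c counts every line as 1.
unsorted_max=$(printf '%s\n' "$values" | uniq -c | sort -nr | head -n 1 | awk '{print $1}')

# With sort first, equal lines are grouped and the counts are correct.
top=$(printf '%s\n' "$values" | sort | uniq -c | sort -nr | head -n 1)

echo "$unsorted_max"    # highest count without sorting first
echo "$top"             # most frequent value with its count
```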

Q3:

zcat /home/projects/22126_NGS/exercises/unix/ZIP_CODES.csv.gz |awk 'BEGIN{FS=","}{if($5=="\"NY\""){print $4}}' |sort |uniq -c  |awk '{if($1==1){print $0}}'  |wc -l
  1. Same steps as in Q2 to count the entries for each city in NY.
  2. awk '{if($1==1){print $0}}': Keeps only the lines whose count is exactly 1.
  3. wc -l: Counts the cities that occur exactly once.


Q4:


zcat /home/projects/22126_NGS/exercises/unix/ZIP_CODES.csv.gz|awk 'BEGIN{FS=","}{if($5=="\"NY\""){print $4}}' |sort |uniq -c  |awk '{print $1}' |sort -n |uniq -c
  1. Same initial steps as in Q2 to count the entries for each city in NY.
  2. awk '{print $1}': Extracts the count from each line.
  3. sort -n | uniq -c: Sorts the counts numerically and tallies how many cities share each count.
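
The "count of counts" idea can be tried on a made-up list. This is only a sketch; the single-letter values are placeholders, and the last awk is added here just to strip the padding that uniq -c puts in front of the numbers.

```shell
# Made-up values: A occurs 3 times, D twice, B and C once each.
values='A
A
A
B
C
D
D'

# First uniq -c: occurrences per value. Second uniq -c: how many values share each count.
histogram=$(printf '%s\n' "$values" | sort | uniq -c | awk '{print $1}' | sort -n | uniq -c | awk '{print $1, $2}')
echo "$histogram"
```

Each output line reads "number of values, count": here two values occur once, one occurs twice, and one occurs three times.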

Q5:

zcat /home/projects/22126_NGS/exercises/unix/ZIP_CODES.csv.gz | awk 'BEGIN{FS=","}{print $2"\t"$4}' |sed "s/+//g" |sed "s/\"//g"   |awk 'BEGIN{FS="\t"}{if($1>=39.4){print $2}}' |sort |uniq |wc -l
  1. awk 'BEGIN{FS=","}{print $2"\t"$4}': Extracts the latitude (2nd field) and the city (4th field), separated by a tab.
  2. sed "s/+//g" and sed "s/\"//g": Remove the plus signs and the quotes, so the latitude can be compared as a number.
  3. awk 'BEGIN{FS="\t"}{if($1>=39.4){print $2}}': Splits on the tab only, so multi-word city names stay intact, and prints the cities with latitude >= 39.4.
  4. sort | uniq | wc -l: Counts the distinct city names that meet the criterion.
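
By default awk splits fields on any whitespace, so a name such as NEW YORK would be cut to its first word unless the tab is given as the separator. A minimal sketch on made-up rows (layout and values are assumptions for illustration only):

```shell
# Made-up rows in the assumed layout: zip, latitude, longitude, city, state.
rows='"10001","+40.750742","-073.996530","NEW YORK","NY"
"33101","+25.779100","-080.197800","MIAMI","FL"
"06510","+41.308700","-072.927100","NEW HAVEN","CT"'

# Keep latitude and city, strip + and quotes, then split on the tab only.
north=$(printf '%s\n' "$rows" | awk -F, '{print $2"\t"$4}' | sed 's/+//g; s/"//g' | awk -F'\t' '$1 >= 39.4 {print $2}' | sort -u)
echo "$north"
```

Both multi-word names survive intact, and the row with latitude below 39.4 is dropped.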

Q6:

zcat /home/projects/22126_NGS/exercises/unix/ZIP_CODES.csv.gz| awk 'BEGIN{FS=","}{print $4"\t"$5}'   |sort |uniq -c |sed "s/\([0-9]\) /\1\t/g" | sort  -k 3,3 -k 1,1nr -t "`/bin/echo -e '\t'`"  | sort -u -k 3,3 -t "`/bin/echo -e '\t'`"
  1. zcat: Decompresses the ZIP_CODES.csv.gz file and outputs its content.
  2. awk 'BEGIN{FS=","}: Sets the field separator to a comma. Prints the 4th (city) and 5th (state) fields separated by a tab.
  3. sort: Sorts the output alphabetically.
  4. uniq -c: Counts unique occurrences of each city-state pair.
  5. sed "s/\([0-9]\) /\1\t/g": Replaces the space after the count with a tab, so that count, city, and state become three tab-separated fields.
  6. sort -k 3,3 -k 1,1nr -t "`/bin/echo -e '\t'`": Sorts the output first by state (3rd field), then by the count (1st field) in reverse numeric order, using a tab as the field separator.
  7. sort -u -k 3,3 -t "`/bin/echo -e '\t'`": Sorts by state and keeps only one line per state, namely the first one, which is the city with the highest count because of the previous sort.
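
Which line of an equal-key group sort -u keeps may depend on the sort implementation, so here is a sketch of an alternative last step that makes "first line per state" explicit with awk. The data and the variable names are made up for illustration.

```shell
tab=$(printf '\t')

# Made-up count/city/state triples, tab-separated as after the sed step.
# Sort by state, then by count (highest first); awk keeps the first line seen per state.
best=$(printf '%s\t%s\t%s\n' 3 ALBANY NY 5 'NEW YORK' NY 2 HARTFORD CT 4 'NEW HAVEN' CT \
  | sort -t "$tab" -k 3,3 -k 1,1nr \
  | awk -F'\t' '!seen[$3]++')
echo "$best"
```

The expression !seen[$3]++ is true only the first time a given state appears, so exactly one line per state is printed.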

Q7:

zcat /home/projects/22126_NGS/exercises/unix/ZIP_CODES.csv.gz | awk 'BEGIN{FS=","}{print $4}' |sort |uniq  |awk '{print substr($1,1,2)}' |grep "\"M" |wc -l
  1. zcat: Decompresses the ZIP_CODES.csv.gz file and outputs its content.
  2. awk 'BEGIN{FS=","}{print $4}': Sets the field separator to a comma and prints only the 4th field (city).
  3. sort: Sorts the city names alphabetically.
  4. uniq: Removes duplicate city names.
  5. awk '{print substr($1,1,2)}': Prints the first two characters of each name, i.e. the opening quote and the first letter. awk strings are indexed from 1; a start index of 0 is handled differently by different awk implementations.
  6. grep "\"M": Keeps the lines where that prefix is a quote followed by M.
  7. wc -l: Counts the remaining lines, giving the number of distinct city names starting with "M".
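
As another way to do it, the prefix test can be anchored in grep directly, which avoids substr altogether. A minimal sketch on made-up quoted names (the list and variable names are hypothetical):

```shell
# Made-up quoted names, including one duplicate.
cities='"MIAMI"
"MADISON"
"MIAMI"
"BOSTON"'

# substr with a start index of 1 returns the opening quote plus the first letter.
prefix=$(printf '%s\n' '"MIAMI"' | awk '{print substr($1, 1, 2)}')

# Alternative: deduplicate, then let grep count lines that start with a quote and M.
m_count=$(printf '%s\n' "$cities" | sort -u | grep -c '^"M')

echo "$prefix"
echo "$m_count"    # prints 2
```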