Example code - Regex


Files used in example

Database (text file) of people

Average height of certain people

In the file there are entries like

CPR: 230226-9781
First name: Anton
Last name: Gade
Height: 201
Weight: 65
Eye color: Black
Blood type: A+
Children: 081154-2786 120853-1151 050354-4664

The problem we want to solve is to find the average height of the people in the file who have a first name given by the user. This solution does not use regular expressions.
Since we need data from different lines, some kind of stateful parsing has to be implemented. There are several possibilities.

  • Assume a new (person) record always starts with the CPR field.
  • Assume that every record is separated from the next by an empty line.
  • Assume that the order of the fields is always the same.
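For comparison, the second assumption could be sketched roughly as below. This is only an illustration: the function name `read_records` and the sample text (including the second person) are made up for the example, and it assumes every field line has the form `keyword: value`.

```python
def read_records(lines):
    """Yield one dict per record, assuming records are separated by empty lines."""
    record = {}
    for line in lines:
        line = line.strip()
        if line == "":
            # An empty line ends the current record (if there is one)
            if record:
                yield record
                record = {}
            continue
        # Cut at the first colon: keyword before it, value after it
        keyword, _, value = line.partition(":")
        record[keyword] = value.strip()
    # Extra handling of the last record, needed when the file
    # does not end with an empty line
    if record:
        yield record


# Made-up sample data in the same layout as the database file
sample = """CPR: 230226-9781
First name: Anton
Height: 201

CPR: 081154-2786
First name: Bente
Height: 170
""".splitlines()

records = list(read_records(sample))
```

Note that this variant has to deal with the last record after the loop has ended.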

I like the first assumption. Notice how the program is easy to change to look for more keywords than just name and height in a record. Notice also how we store the data immediately when the last part in the record has been found. There is no "extra" manipulation of the last record.

#!/usr/bin/env python3
import sys

filename = "people.db"

# Ask for first name
name = input("What is the first name of the people we are looking for: ")

hit_count = 0
accumulated_height = 0
try:
    with open(filename, "r") as people_file:
        keyword_hits = 0
        for line in people_file:
            # Split at the first colon only; lines without a colon are ignored
            try:
                keyword, value = line.split(":", 1)
            except ValueError:
                keyword, value = '', ''
            if keyword == 'CPR':
                # A new record starts, forget partial data from the previous one
                keyword_hits = 0
            elif keyword == "First name":
                this_name = value.strip()
                keyword_hits += 1
            elif keyword == "Height":
                this_height = int(value.strip())
                keyword_hits += 1
            # Both name and height found: store the data immediately
            if keyword_hits == 2:
                keyword_hits = 0
                if this_name == name:
                    hit_count += 1
                    accumulated_height += this_height
except IOError as err:
    print("Cannot read file:", filename, "Reason:", str(err))
    sys.exit(1)

if hit_count == 0:
    print("There are no people in the file with the name:", name)
else:
    print("People named {} have the average height of {:.1f} cm".format(name, accumulated_height/hit_count))
    print("Number of occurrences:", hit_count)
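To illustrate the remark that the program is easy to change to look for more keywords, here is a minimal sketch of the same idea with the keywords passed in as a set. The function name `collect_records` and its parameters are hypothetical and not part of the program above; it still assumes that a new record always starts with the CPR field.

```python
def collect_records(lines, wanted):
    """Return a list of dicts holding the wanted keywords of each record.

    Assumes a new record always starts with the CPR field.
    """
    found = []
    current = {}
    for line in lines:
        keyword, sep, value = line.partition(":")
        if sep == "":
            continue  # not a "keyword: value" line
        if keyword == "CPR":
            current = {}  # new record, drop partial data
        elif keyword in wanted:
            current[keyword] = value.strip()
            if len(current) == len(wanted):
                # Last wanted part found, store the data immediately
                found.append(current)
                current = {}
    return found


# One record in the layout shown at the top of the page
sample = [
    "CPR: 230226-9781\n",
    "First name: Anton\n",
    "Last name: Gade\n",
    "Height: 201\n",
    "Weight: 65\n",
]
people = collect_records(sample, {"First name", "Height", "Weight"})
heights = [int(p["Height"]) for p in people if p["First name"] == "Anton"]
```

Adding another field then only means adding its keyword to the set.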

Average weight of certain people

The problem we want to solve is to find the average weight of the people in the file who have a last name given by the user. This solution uses regular expressions.

import re
import sys

filename = "people.db"

# Ask for last name
name = input("What is the last name of the people we are looking for: ")

hit_count = 0
accumulated_weight = 0
try:
    with open(filename, "r") as people_file:
        keyword_hits = 0
        for line in people_file:
            # A new record starts, forget partial data from the previous one
            if re.search(r"^CPR:", line):
                keyword_hits = 0
            # Raw strings keep \s and \d from being read as string escapes
            regex_obj = re.search(r"^Last name:\s*(.+)", line)
            if regex_obj is not None:
                this_name = regex_obj.group(1).strip()
                keyword_hits += 1
            regex_obj = re.search(r"^Weight:\s*(\d+)", line)
            if regex_obj is not None:
                this_weight = int(regex_obj.group(1))
                keyword_hits += 1
            # Both name and weight found: store the data immediately
            if keyword_hits == 2:
                keyword_hits = 0
                if this_name == name:
                    hit_count += 1
                    accumulated_weight += this_weight
except IOError as err:
    print("Cannot read file:", filename, "Reason:", str(err))
    sys.exit(1)

if hit_count == 0:
    print("There are no people in the file with the name:", name)
else:
    print("People named {} have the average weight of {:.1f} kg".format(name, accumulated_weight/hit_count))
    print("Number of occurrences:", hit_count)
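To see what the two capturing patterns pick out, they can be tried on a few lines in isolation. The sketch below is only an illustration: the variable names are made up, and compiling the patterns once with `re.compile` is an optional variation, not something the program requires.

```python
import re

# Compile once instead of on every line; raw strings keep \s and \d intact
last_name_re = re.compile(r"^Last name:\s*(.+)")
weight_re = re.compile(r"^Weight:\s*(\d+)")

sample = [
    "CPR: 230226-9781\n",
    "Last name: Gade\n",
    "Weight: 65\n",
    "Height: 201\n",
]

captured = []
for line in sample:
    for pattern in (last_name_re, weight_re):
        match = pattern.search(line)
        if match is not None:
            # group(1) is the part inside the parentheses of the pattern
            captured.append(match.group(1).strip())
```

Only the "Last name" and "Weight" lines match, so `captured` ends up holding the name and the weight as strings.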

As can be seen from the two examples, you can get far with pattern matching in Python without resorting to regular expressions. Regexes are great, but often you can solve your problem with simpler methods.
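The Children field gives one more small comparison. The sketch below extracts the CPR numbers both ways; the regular expression assumes that a CPR number always looks like the ones in the sample (six digits, a hyphen, four digits).

```python
import re

line = "Children: 081154-2786 120853-1151 050354-4664\n"

# Without regular expressions: cut at the first colon, then split on whitespace
keyword, _, value = line.partition(":")
children_plain = value.split()

# With a regular expression: find everything shaped like a CPR number
children_regex = re.findall(r"\d{6}-\d{4}", line)
```

Both approaches give the same list of three CPR numbers for this line. The regex version also checks the shape of each number, while the plain version accepts whatever is separated by whitespace.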