Example code - Regex
Files used in example
Database (text file) of people
Average height of certain people
In the file there are entries like this, with one field per line:
CPR: 230226-9781
First name: Anton
Last name: Gade
Height: 201
Weight: 65
Eye color: Black
Blood type: A+
Children: 081154-2786 120853-1151 050354-4664
The problem we want to solve is to find the average height of the people in the file who have a first name given by the user.
This solution will be done without using regular expressions.
Since we need data from different lines, some kind of stateful parsing has to be implemented. There are several possibilities.
- Assume a new (person) record always starts with the CPR field.
- Assume that every record is separated by a blank line.
- Assume that the order of the fields is always the same.
I like the first assumption. Notice how the program is easy to change to look for more keywords than just name and height in a record. Notice also how we store the data immediately when the last part in the record has been found. There is no "extra" manipulation of the last record.
#!/usr/bin/env python3
import sys

filename = "people.db"

# Ask for first name
name = input("What is the first name of the people we are looking for: ")
hit_count = 0
accumulated_height = 0
try:
    with open(filename, "r") as people_file:
        keyword_hits = 0
        for line in people_file:
            # Split "keyword: value"; lines that do not fit are ignored
            try:
                keyword, value = line.split(":")
            except ValueError:
                keyword, value = '', ''
            if keyword == 'CPR':
                # A new record starts, forget any partial matches
                keyword_hits = 0
            elif keyword == "First name":
                this_name = value.strip()
                keyword_hits += 1
            elif keyword == "Height":
                this_height = int(value.strip())
                keyword_hits += 1
            # Both keywords found: store the data for this record right away
            if keyword_hits == 2:
                keyword_hits = 0
                if this_name == name:
                    hit_count += 1
                    accumulated_height += this_height
except IOError as err:
    print("Can not read file:", filename, "Reason:", str(err))
    sys.exit(1)
if hit_count == 0:
    print("There are no people in the file with the name:", name)
else:
    print("People named {} have the average height of {:.1f} cm".format(name, accumulated_height/hit_count))
    print("Number of occurrences:", hit_count)
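For comparison, here is a minimal sketch of the second assumption from the list above, where records are separated by a blank line. The function names `parse_records` and `average_height` are made up for this illustration, and the second record in `sample` is invented test data, not taken from the real file.

```python
def parse_records(text):
    """Split the text into records on blank lines; return a list of dicts."""
    records = []
    for block in text.split("\n\n"):
        record = {}
        for line in block.splitlines():
            keyword, sep, value = line.partition(":")
            if sep:  # skip lines without a colon
                record[keyword.strip()] = value.strip()
        if record:
            records.append(record)
    return records


def average_height(records, first_name):
    """Average height of records with the given first name, or None."""
    heights = [int(r["Height"]) for r in records
               if r.get("First name") == first_name and "Height" in r]
    return sum(heights) / len(heights) if heights else None


# Hypothetical sample data; the second record is invented for the example
sample = """CPR: 230226-9781
First name: Anton
Last name: Gade
Height: 201

CPR: 010101-0001
First name: Anton
Last name: Hansen
Height: 181
"""

records = parse_records(sample)
print(average_height(records, "Anton"))
```

This version needs no counter for keyword hits, but it reads the whole text into memory and fails quietly if the blank lines are missing, which is one reason to prefer the CPR-based approach.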
Average weight of certain people
The problem we want to solve is to find the average weight of the people in the file who have a last name given by the user. This solution uses regular expressions.
import sys
import re

filename = "people.db"

# Ask for last name
name = input("What is the last name of the people we are looking for: ")
hit_count = 0
accumulated_weight = 0
try:
    with open(filename, "r") as people_file:
        keyword_hits = 0
        for line in people_file:
            # A new record starts, forget any partial matches
            if re.search(r"^CPR:", line):
                keyword_hits = 0
            # Raw strings keep \s and \d from being read as string escapes
            regex_obj = re.search(r"^Last name:\s*(.+)", line)
            if regex_obj is not None:
                this_name = regex_obj.group(1).strip()
                keyword_hits += 1
            regex_obj = re.search(r"^Weight:\s*(\d+)", line)
            if regex_obj is not None:
                this_weight = int(regex_obj.group(1))
                keyword_hits += 1
            # Both keywords found: store the data for this record right away
            if keyword_hits == 2:
                keyword_hits = 0
                if this_name == name:
                    hit_count += 1
                    accumulated_weight += this_weight
except IOError as err:
    print("Can not read file:", filename, "Reason:", str(err))
    sys.exit(1)
if hit_count == 0:
    print("There are no people in the file with the name:", name)
else:
    print("People named {} have the average weight of {:.1f} kg".format(name, accumulated_weight/hit_count))
    print("Number of occurrences:", hit_count)
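If more keywords are needed, one pattern per keyword becomes repetitive. A possible variation, sketched here as an assumption and not as part of the program above, is one compiled pattern with named groups that splits any "keyword: value" line. The names `FIELD` and `parse_field` are made up for this illustration.

```python
import re

# One pattern for every "keyword: value" line (hypothetical helper)
FIELD = re.compile(r"^(?P<keyword>[^:]+):\s*(?P<value>.*)$")


def parse_field(line):
    """Return (keyword, value) for a field line, or None if it does not match."""
    match = FIELD.search(line)
    if match is None:
        return None
    return match.group("keyword"), match.group("value").strip()


print(parse_field("Last name: Gade\n"))
print(parse_field("Weight: 65\n"))
print(parse_field("\n"))
```

The caller can then compare the returned keyword with "CPR", "Last name" or "Weight" just as the first program compares the result of `split`.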
As can be seen from the two examples, you can get far with pattern matching in Python without resorting to regular expressions. Regexes are great, but often you can solve your problem with simpler methods.
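To make the comparison concrete, the following sketch extracts the height from a single line in both ways. The function names `height_plain` and `height_regex` are made up for this illustration, and the test lines are chosen so that both approaches agree; on messier input (for example a unit after the number) they could differ.

```python
import re


def height_plain(line):
    """Return the height as an int using only string methods, or None."""
    keyword, sep, value = line.partition(":")
    value = value.strip()
    if sep and keyword == "Height" and value.isdecimal():
        return int(value)
    return None


def height_regex(line):
    """Return the height as an int using a regular expression, or None."""
    match = re.search(r"^Height:\s*(\d+)", line)
    return int(match.group(1)) if match else None


for line in ["Height: 201\n", "Weight: 65\n", "no colon here\n"]:
    print(repr(line), height_plain(line), height_regex(line))
```

For fixed "keyword: value" lines like these, the string methods are enough; the regex starts to pay off when the format varies.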