For Day 4, the problem was about validating passport entries in a file. Possible data fields in the passport are:

  • byr (Birth Year)
  • iyr (Issue Year)
  • eyr (Expiration Year)
  • hgt (Height)
  • hcl (Hair Color)
  • ecl (Eye Color)
  • pid (Passport ID)
  • cid (Country ID)

cid was an optional field, for the entire problem, as it turns out, so I’m not sure why it was even there to begin with.

Part one involved making sure that all of the required fields were present in a passport. This part was very easy to do, since it boils down to checking for the presence of certain character sequences in a string (misspellings or other formatting errors were not part of the problem):

fields = set(["eyr", "byr", "hcl", "pid", "ecl", "hgt", "iyr"])
def has_all_fields(passport):
    if all([x in passport for x in fields]):
        return True
    return False

The second part was a little more interesting. Various rules were laid out for the values associated with the different fields, and the second layer of validation involved checking the constraints. The first task was to retrieve the values from the fields. I chose to use regular expressions, because they’re fun:

byr_rgx = re.compile("(?<=byr:)([0-9]{4})")
iyr_rgx = re.compile("(?<=iyr:)([0-9]{4})")
eyr_rgx = re.compile("(?<=eyr:)([0-9]{4})")
hgt_rgx = re.compile("(?<=hgt:)([0-9]{2,3})(cm|in)")
hcl_rgx = re.compile("(?<=hcl:#)([a-f0-9]{6})")
ecl_rgx = re.compile("(?<=ecl:)(amb|blu|brn|gry|grn|hzl|oth)")
pid_rgx = re.compile("(?<=pid:)([0-9]{9})(?![0-9])")
cid_rgx = re.compile("(?<=cid:)([0-9]+)")

For each passport, the different regular expressions were used to extract the values, and then any additional constraints were applied. For example, with the expiration year validation, the regular expression would only match on values that container exactly four digits:

eyr_rgx = re.compile("(?<=eyr:)([0-9]{4})")

Let’s break down the expression being used here. (?<=eyr:) is a look-behind assertion. For an expression preceded by a look-behind to match, the look-behind must be matched as well, but it’s not part of what “matched”, so to speak.

[0-9] is a matching group. Anything in the brackets will match, for a singe character. The 0-9 is shorthand for 0123456789, or all of the digits. To use a matching group like this for more than one character, you would add a quantifier, which is done here, with the curly braces: {x} means “must match exactly x times”. So, the expression [0-9]{4} translates to “exactly four digits”. The entire expression would be expressed in English as “exactly four digits that are preceded by eyr:”.

No match, and the passport was rejected as invalid:

eyr = re.search(eyr_rgx, passport)
if eyr is None:
    return False

Next, the value had to satisfy $$2020 \le eyr \le 2030$$:

if not 2020 <= int(eyr.group()) <= 2030:
    return False

Most of the rules worked the same way, requiring a certain number of digits and then a range. Height, however, did not work this way, units were also included:

hgt_rgx = re.compile("(?<=hgt:)([0-9]{2,3})(cm|in)")

This regular expression can be read as “match at least two and at most three digits, and either cm or in, all preceded by hgt:”. The parentheses here are called “capture groups”, and can be used to separate parts of a match from each other. Note that parentheses aren’t capture groups when they’re part of look-behind or look-ahead assertions.

To validate the height:

hgt = re.search(hgt_rgx, passport)
if hgt is None:
    return None
hgt, unit = int(hgt.group(1)), hgt.group(2)

if unit == "cm" and not (150 <= hgt <= 193):
    return False
if unit == "in" and not (59 <= hgt <= 76):
    return False

If you want to suffer have some fun with regular expressions, you can try out Regex Crossword, a website where you can do crossword puzzles that are all regular expressions!

If you work with regular expressions a lot, you should find a regex tester that you like, and use it to help you write expressions. I like Regex101, it has support for Python, which is where I’m usually using regular expressions.

One thing to keep in mind is that there’s no real specification about what a regular expression engine has to provide. All of the expressions I wrote, for example, won’t with the <regex> functionality in the C++ standard library, because C++ doesn’t support look-behind or look-ahead assertions.