Practice Regular Expressions with Python Programs

A good way to practice regular expressions is to take some of the Python programs you used before and add Python regular expressions to give them more sophistication.

Consider the example line below from the mbox-short.txt file.

From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008

Look at and analyze the following Python regular expression, which extracts the email address.

re.findall(r'\S+@\S+', x)

Assume ‘x’ has been assigned to your example line. This pattern matches the ‘@’ character, then extends to the left and to the right until it encounters whitespace (‘\S+’). The ‘\S’ matches any non-whitespace character, and the ‘+’ means one or more occurrences of it.
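Here is a minimal runnable sketch of that expression applied to the example line. The pattern is written as a raw string (r'...') so the backslashes reach the re module unchanged; the second sample string is made up to show that findall() returns every match it finds.

```python
import re

# The example line from mbox-short.txt
x = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'

# \S+ grabs one or more non-whitespace characters on each side of '@'
emails = re.findall(r'\S+@\S+', x)
print(emails)  # ['stephen.marquard@uct.ac.za']

# Made-up line with two addresses: findall returns all matches in a list
several = re.findall(r'\S+@\S+', 'Sent to a@b.com and c@d.org')
print(several)  # ['a@b.com', 'c@d.org']
```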

You can fine-tune the regular expression further. The following will only extract email addresses from lines that start with ‘From ’ (note the trailing space):

re.findall(r'^From (\S+@\S+)', x)

Only the part inside the parentheses, (\S+@\S+), is returned in the list.
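A short sketch of how the ‘^From ’ anchor and the parentheses behave. The second sample line, which starts with ‘From:’ instead of ‘From ’, is chosen for illustration; the last call drops the parentheses to show that findall() then returns the whole match instead of just the group.

```python
import re

lines = [
    'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008',
    'From: stephen.marquard@uct.ac.za',  # colon, not a space, after 'From'
]

# With a capture group, findall returns only the text inside the parentheses
results = [re.findall(r'^From (\S+@\S+)', line) for line in lines]
print(results)  # [['stephen.marquard@uct.ac.za'], []]

# Without the group, the whole matched text (including 'From ') comes back
without_group = re.findall(r'^From \S+@\S+', lines[0])
print(without_group)  # ['From stephen.marquard@uct.ac.za']
```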

What if you only want to extract the domain from the example line?

Pictured below is the fundamental Python way (without regular expressions) of coding this program.

Extract Domain
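The pictured program is not reproduced in this text. As a hedged stand-in, here is one common non-regex approach: find the ‘@’, find the first space after it with str.find(), and slice out what lies between. The variable names are illustrative, not taken from the picture.

```python
# Hypothetical stand-in for the pictured program
data = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'

atpos = data.find('@')          # index of the '@' character
sppos = data.find(' ', atpos)   # index of the first space after the '@'
host = data[atpos + 1:sppos]    # slice between the '@' and that space
print(host)  # uct.ac.za
```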

You could also code this in a fundamental way using a double split pattern.

Double Split Pattern
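A minimal sketch of the double split pattern, assuming the address is always the second word on the line (true for the ‘From ’ lines in this example): split the line on whitespace, then split the address on ‘@’.

```python
line = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'

words = line.split()        # first split: on whitespace
email = words[1]            # assumes the address is the second word
pieces = email.split('@')   # second split: on the '@'
domain = pieces[1]
print(domain)  # uct.ac.za
```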

Coding this same program with a Python regular expression would result in the following:

re.findall('@([^ ]*)', x)
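In this pattern, [^ ] is a character class that matches any character except a space (a ‘^’ inside square brackets negates the class), and ‘*’ allows zero or more of them. The sketch below runs it on the example line; the second pattern, which also requires the line to start with ‘From ’, is an extra variant added for illustration.

```python
import re

x = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'

# Match a literal '@', then capture every following non-space character
domains = re.findall(r'@([^ ]*)', x)
print(domains)  # ['uct.ac.za']

# Variant: only lines starting with 'From '; .* skips ahead to the '@'
from_domains = re.findall(r'^From .*@([^ ]*)', x)
print(from_domains)  # ['uct.ac.za']
```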

Always refer to the Python regular expression guide for help with the meaning of the special characters. If you do not want a special character to have its special meaning, prepend it with a backslash. For example, ‘\$’ matches a real dollar sign rather than the end of a line.
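A small sketch of escaping, using a made-up sample string. Here ‘\$’ matches a literal dollar sign, and inside the square brackets the ‘.’ is already literal, so [0-9.]+ matches digits and periods. The second call leaves ‘$’ unescaped so it anchors the match to the end of the string, while ‘\.’ matches a real period.

```python
import re

text = 'We just received $10.00 for cookies.'

# \$ is a literal dollar sign; [0-9.]+ is one or more digits or periods
prices = re.findall(r'\$[0-9.]+', text)
print(prices)  # ['$10.00']

# Unescaped $ means "end of string"; \. is a literal period
ending = re.findall(r'cookies\.$', text)
print(ending)  # ['cookies.']
```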