How to Parse Email Data with Python

This post will examine how to parse email data with Python.

You will go step by step through the program

This program will read through mbox-short.txt file, count the occurrences of each email, and put that data in a database.

In the first line of code you will import a library you need to talk to the database.

The second line of code establishes a connection to the database file emaildb.sqlite.

The third line of code creates a cursor object that allows you to make commands to the database.

email db

Next, you will call the execute method on the cursor. This program will create a new table called “Counts” every time it runs. So, first it will drop the table if it exists, and then it will create a new table.

drop table

Notice how you are using SQL commands in Python to talk to the database.

The next several lines of code are similar to what you learned in the Python dictionary post. Here, you are parsing through the file and pulling out the email address.parse data


The next line of code is different. This line uses a technique called parameter substitution.

parameter sub

In this line, you are selecting a row from the table that match the email. The question mark after email serves as a placeholder for a value that will be substituted in. You are substituting in the current value of the email variable.

Next, you will use a “try” statement. You want to use “try” because your program will blow up if no rows were found that matched the email from the previous line. If the email was found, then you advance the count of that email.

In the “except” statement, you are saving the program from blowing up from the email not being found in the table. Rather, you insert this new email into the table, and start the count at 1.

Finally, the conn.commit() statement is very important, because it writes your new changes to the database.


The next thing you will do is run select statement to list the top 10 emails in descending order.

top 10 emails

Before closing out the program, you will loop through the rows and print the columns for each row. It is a good idea to convert your column fields to str type before printing.

print entries

Make an SQL Database

In this post you will make an SQL database.

If you have not done so already, you need to install an SQL database tool. See picture below.


SQLite Browser reads and writes a special type of file that is optimized for rapid retrieval, and storage of data.

Run SQLite Browser and create a new database. Once complete, your screen should display the following:

create database

Now, go ahead and create a table. Use the following command on the right side of the picture shown below (assume case sensitivity). This specifies a new table called “Users”, with two columns, and up to 128 length variable characters. This code also forms a contract with the table. A contract with what is allowed, and not allowed. Put this command in the execute SQL window.

create table

It is up to you regarding the terms of the contract, but you need to plan it ahead of time.

Next, add data (“records”) in your table as shown below.

add records

You can even view the SQL Log to view the code every time you add a new record.

Now, you can learn for SQL commands to insert, delete, and retrieve data.


You put the SQL commands in the Execute SQL window.

At any time you can hit the “Browse Data” button to see all your records.

Following is how you delete a row.

delete row


This will delete all rows that have the matching email address.

The update statement allows you to find certain records and change their values.

sql update

Finally, you can do a retrieval with the statement shown below.

retrieving records

The “SELECT *” command translate to “Select All”.

You can also sort your records.

sorting records

Below is a summary of all the SQL commands you have learned in this post.

SQL summary

This covers the basics of SQL databases. The real power will come later when you learn to exploit connections between tables.

Database Application SQLite and Python

This post will introduce the database application SQLite in relation to Python.

For large project, there are two main roles. One of those roles in the database administrator, who often consults with the developer (the other role). These are rather specialized jobs.

two roles

The administrator talks directly to the database, while the developer talks to indirectly, by way of the application. The following picture illustrates this split between roles, in large projects.

large project structure

However, for smaller projects, one person can handle both roles.

By handling both these roles you will:

  1. Use the database application SQLite to create tables.
  2. Write Python programs to retrieve, clean, and put cleaned data in the tables.
  3. Write another program to pull the cleaned data out and output a nice file.

For now, you will focus on the first step. You will learn how to create a database model, or contract.

database model

There are several common database systems. Oracle dominated this market, mainly because they were the first to embrace the relational model concept. However, there are good alternatives.

common database systems


In the context of Python, you will use SQLite. As it turns out, SQLite is quite popular.

SQLite is Popular

SQLite is fast, and is good for smaller amounts of data. Most importantly, the database application SQLite is embedded in Python.


Using Databases with Python – A Must Know

It is a good idea to understand the importance of using databases with Python while learning Python. If you are analyzing data, pulling data from over the network, then it makes sense to store that data in a database. You can then set up a process of pulling data from the database as you need it. Overall, it can speed up your workflow.

A good database system to learn using databases with Python is DB Browser for SQLite.

using databases with python

Relational databases comprise a whole sub-field of computer science. They are relevant because you can pull out an entry from huge amounts of data in a split-second. It would take you much longer if you had to read through the data.

relational databases

You can look no further than Oracle to understand the relevance of relational databases. The majority of their revenue comes from database products.

The underlying foundation of databases is rooted in mathematics. This is present in the terminology that experts use to describe databases.

database terminology

The idea behind databases is that you model data at a connection point.

database model

However, programmers tend to think of it in terms of rows and columns.

Typically, when you make a table, the first row becomes metadata for the table. You often use the first row to title what each column is for. Therefore, you can refer to this first row as the schema for the table. It sets the rules for each column with regard to what goes there, for example, a string, an integer, etc.

In the early 1960s, the database pioneers figured out ways to quickly retrieve data from random access memory, without having to go through the data sequentially. However, databases were very complex. As a result, a new component of internet architecture evolved called the database application. This allows you, the programmer, to talk with the database, by way of the database application. At this point, an industry standard was desired for the language for the API between a database and its application. The name of the language the industry agreed on was SQL (Structured Query Language).


SQL is a great language, but it depends on the data being clean . The nice thing about Python, is it can deal really well with unstructured data. So together, Python and SQL, you have a powerful combination.


Inheritance – Extending a Class

This post will introduce the concept of inheritance.

Inheritance is a very important concept for object orientation. It is the idea that you can write a class, and borrow stuff from another class. It can also be referred to as extending a class.


Look at the code below where you have the “PartyAnimal” class, and a new class “FootballFan”. “FootballFan” extends “PartyAnimal”. Thus, “FootballFan” has the member variable “x”, “name”, and “points”. Everything from “PartyAnimal” gets pulled into the new class, plus another method is added.

football fan class

You can see that “FootballFan” is a class which extends “PartyAnimal”. It has all the capabilities of “PartyAnimal” and more.

Object Life Cycle

This post will discuss object life cycle.

Know that the word “construct” always refers to object life cycle.

object lifecycle

The primary purpose of the constructor is to set up some instance variables to have the proper initial values when the object is created.

The constructor and destructor are optional. The constructor is typically used to set up variables. The destructor is seldom used.

Consider the following class from the last post, but with two new methods.

object example

Most of the methods with double underscores are for code that gets run when certain occurrences happen in the language.

The only time the thing gets destructed is at the end of the program.

The execution of this program is quite simple.

program execution

The constructor is a special block of statements when the object is created.

Now, consider having more than one instance.

many instances

In the following code, you have two instance variables, a constructor, but no destructor.

two instances

The above program will create two independent instances.

Use Python Reserved Word Class for Making Objects

This post will demonstrate use of the Python reserved word class.

Take a look at the following class called “PartyAnimal”.

class party animal

Class is a reserved word that you will use for making python objects. Class remembers the template for an object.

The above picture demonstrates a template for making objects.

The variable “x” will be part of every instance of this class.

After the variable, you have a method with a name (“party”) and parameter (“self”). By convention, “self” is always the first parameter. More parameters can follow.

You can then call the object, put it in the variable “an”. At this point, “an” is an object. You can do things to it based on the definition of the class. Essentially “” is a short form of “”.

If you run the code, it will output the following:

code output

The variable “self” acts as alias to “an”. It is like “self.x” is saying “x within self”.

find capability

Now, every string has all of the below methods.

string methods



Python Object Oriented Programming

This post will examine Python object oriented programming.

When you think of Python object oriented programming, you should think of it as an orchestration of objects using the capabilities that each object has.

object oriented programming

A function is a bit of code, but an object is a bit of code and data.

python object

Part of the goal in object oriented programming is to take a complex problem and break it into smaller parts. Then you can hide complexity in the smaller parts, which allows you to work on other parts without having to worry about the complexity. Essentially, you want a simple interface, that hides complexity.

In the end, your program begins a network of objects that you orchestrate to get the desired output.

object orchestration