Python Scatter Plot Example Using Matplotlib

A Python scatter plot example can be used as a reference to build another plot, or to remind us about the proper syntax.

Python scatter plot examples often use the Matplotlib library because it is arguably the most powerful Python library for data visualization. It is usually used in combination with the NumPy library.

Suppose you have two Python lists. One is a list of home prices, and the other represents the size of each home's living area. You want to use these lists to see whether there is a correlation between the two. Measuring that relationship calls for a simple linear regression analysis, but a scatter plot can quickly show whether the correlation looks strong or weak.

The Python Scatter Plot Example

The list for home prices is:

homeprice = [208500, 181500, 223500, 140000, 250000, 143000, 307000, 200000, 129900, 118000, 129500, 345000, 144000, 279500, 157000, 132000, 149000, 90000, 159000, 139000, 325300]

The list for living area size is:

livearea = [1710, 1262, 1786, 1717, 2198, 1362, 1694, 2090, 1774, 1077, 1040, 2324, 912, 1494, 1253, 854, 1004, 1296, 1114, 1339, 2376]

The next step is to import the libraries.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Inline comments can explain the remaining steps.

# Convert lists to numpy arrays
np_homeprice = np.array(homeprice)
np_livearea = np.array(livearea)

# Plot living area on the x-axis and sale price on the y-axis
plt.scatter(np_livearea, np_homeprice)

# Label the x and y axes
plt.xlabel('Living Area Square Footage')
plt.ylabel('Sale Price of Home')

# Give the plot a title
plt.title('Home Sale Price vs. Size of Living Area')

# Display the plot
plt.show()

Executing this code will result in a scatter plot.
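The scatter plot gives a visual impression of the correlation. To put a number on it, NumPy can compute the Pearson correlation coefficient and a least-squares line. This is a minimal sketch using np.corrcoef and np.polyfit on the same two lists; a fuller regression analysis (residuals, significance tests) would normally use a statistics library.

```python
import numpy as np

# The same two lists used for the scatter plot
homeprice = [208500, 181500, 223500, 140000, 250000, 143000, 307000, 200000, 129900, 118000, 129500,
             345000, 144000, 279500, 157000, 132000, 149000, 90000, 159000, 139000, 325300]
livearea = [1710, 1262, 1786, 1717, 2198, 1362, 1694, 2090, 1774, 1077, 1040,
            2324, 912, 1494, 1253, 854, 1004, 1296, 1114, 1339, 2376]

np_homeprice = np.array(homeprice)
np_livearea = np.array(livearea)

# Pearson correlation coefficient: the off-diagonal entry of the 2x2 correlation matrix
r = np.corrcoef(np_livearea, np_homeprice)[0, 1]

# Simple linear regression: fit price = slope * area + intercept by least squares
slope, intercept = np.polyfit(np_livearea, np_homeprice, 1)

print(f"Pearson r: {r:.3f}")
print(f"Fitted line: price = {slope:.2f} * area + {intercept:.2f}")
```

An r close to 1 indicates a strong positive correlation, while a value near 0 indicates a weak one.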

How to Parse Email Data with Python

This post will examine how to parse email data with Python.

You will go step by step through the program emaildb.py.

This program will read through the mbox-short.txt file, count how many times each email address appears, and store those counts in a database.

In the first line of code, you import the sqlite3 library, which you need to talk to the database.

The second line of code establishes a connection to the database file emaildb.sqlite.

The third line of code creates a cursor object, which lets you send commands to the database.

Next, you will call the execute method on the cursor. This program will create a new table called “Counts” every time it runs. So, first it will drop the table if it exists, and then it will create a new table.

Notice how you are using SQL commands in Python to talk to the database.
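The first few lines described above (the import, the connection, and the cursor) plus the drop-and-create step might look like this minimal sketch. It assumes the standard-library sqlite3 module and a Counts table with two columns, email and count (the column names are our assumption). It uses an in-memory database instead of emaildb.sqlite so it runs anywhere.

```python
import sqlite3

# The post connects to 'emaildb.sqlite'; ':memory:' keeps this sketch self-contained
conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# Start fresh on every run: drop the old table if it exists, then recreate it
cur.execute('DROP TABLE IF EXISTS Counts')
cur.execute('CREATE TABLE Counts (email TEXT, count INTEGER)')

# Confirm the table exists by asking SQLite's built-in catalog
cur.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
tables = [row[0] for row in cur.fetchall()]
print(tables)  # ['Counts']
```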

The next several lines of code are similar to what you learned in the Python dictionary post. Here, you are parsing through the file and pulling out the email address.

The next line of code is different. This line uses a technique called parameter substitution.

In this line, you are selecting the rows from the table that match the email. The question mark after email serves as a placeholder for a value that will be substituted in; here, that value is the current value of the email variable.

Next, you use a “try” statement. You need “try” because your program would crash if no rows matched the email from the previous line: fetchone() returns None, and trying to read a count out of None raises an error. If the email was found, then you advance the count for that email.

In the “except” block, you keep the program from crashing when the email is not in the table. Instead, you insert the new email into the table and start its count at 1.
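Here is a minimal sketch of the parsing loop together with the parameter substitution and the try/except logic. The sample lines and their addresses are hypothetical stand-ins for mbox-short.txt, and the rule that addresses come from lines starting with "From: " is an assumption, not something the post spells out.

```python
import sqlite3

# ':memory:' stands in for the post's 'emaildb.sqlite' so the sketch runs anywhere
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('DROP TABLE IF EXISTS Counts')
cur.execute('CREATE TABLE Counts (email TEXT, count INTEGER)')

# Hypothetical stand-in for lines read from mbox-short.txt
lines = [
    'From: alice@example.com',
    'Subject: hello',
    'From: bob@example.org',
    'From: alice@example.com',
]

for line in lines:
    # Assumption: sender addresses come from lines starting with "From: "
    if not line.startswith('From: '):
        continue
    email = line.split()[1]

    # Parameter substitution: SQLite fills the ? with the value of email
    cur.execute('SELECT count FROM Counts WHERE email = ?', (email,))
    try:
        # fetchone() returns None when no row matched, so [0] raises TypeError
        count = cur.fetchone()[0]
        cur.execute('UPDATE Counts SET count = ? WHERE email = ?', (count + 1, email))
    except TypeError:
        # First time this address appears: insert it with a count of 1
        cur.execute('INSERT INTO Counts (email, count) VALUES (?, 1)', (email,))

# Write the changes to the database
conn.commit()

counts = cur.execute('SELECT email, count FROM Counts ORDER BY email').fetchall()
print(counts)  # [('alice@example.com', 2), ('bob@example.org', 1)]
```

Catching TypeError specifically, rather than using a bare except, keeps unrelated errors from being silently treated as "email not found".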

Finally, the conn.commit() statement is very important, because it writes your new changes to the database.

Next, you run a SELECT statement to list the top 10 email addresses by count, in descending order.

Before closing out the program, you will loop through the rows and print the columns for each row. It is a good idea to convert your column fields to str type before printing.
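The final steps (committing, selecting the top 10, and printing each row with its columns converted to str) can be sketched like this. The twelve rows are hypothetical counts, chosen so that LIMIT 10 has something to cut off.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE Counts (email TEXT, count INTEGER)')

# Hypothetical counts: user1@example.com seen once, ..., user12@example.com seen 12 times
cur.executemany('INSERT INTO Counts (email, count) VALUES (?, ?)',
                [(f'user{i}@example.com', i) for i in range(1, 13)])
conn.commit()

# Top 10 addresses by count, highest first
sqlstr = 'SELECT email, count FROM Counts ORDER BY count DESC LIMIT 10'

top = []
for row in cur.execute(sqlstr):
    # Convert each column to str before printing
    print(str(row[0]), str(row[1]))
    top.append(row)

cur.close()
conn.close()
```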

Database Application SQLite and Python

This post will introduce the database application SQLite in relation to Python.

For large projects, there are two main roles: the database administrator and the developer. The administrator often consults with the developer, and both are rather specialized jobs.

The administrator talks directly to the database, while the developer talks to it indirectly, by way of the application. The following picture illustrates this split between roles in large projects.

However, for smaller projects, one person can handle both roles.

By handling both these roles you will:

  1. Use the database application SQLite to create tables.
  2. Write Python programs to retrieve, clean, and put cleaned data in the tables.
  3. Write another program to pull the cleaned data out and output a nice file.

For now, you will focus on the first step. You will learn how to create a database model, or contract.

There are several common database systems. Oracle has long dominated this market, in large part because it was among the first companies to commercialize the relational model. However, there are good alternatives.

In the context of Python, you will use SQLite. As it turns out, SQLite is quite popular.

SQLite is fast and works well for smaller amounts of data. Most importantly, SQLite is embedded in Python: the sqlite3 module in the standard library lets you use it without installing anything extra.
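Because sqlite3 ships with Python, a short script can create a table, add rows, and read them back with no setup. This is a minimal sketch; the Tracks table, its columns, and its rows are hypothetical.

```python
import sqlite3

# An in-memory database; a file name such as 'music.sqlite' would persist the data instead
conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# Hypothetical table for illustration
cur.execute('CREATE TABLE Tracks (title TEXT, plays INTEGER)')
cur.execute('INSERT INTO Tracks (title, plays) VALUES (?, ?)', ('Track A', 20))
cur.execute('INSERT INTO Tracks (title, plays) VALUES (?, ?)', ('Track B', 15))
conn.commit()

rows = cur.execute('SELECT title, plays FROM Tracks ORDER BY plays DESC').fetchall()
print(rows)  # [('Track A', 20), ('Track B', 15)]
conn.close()
```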

 

Using Databases with Python – A Must Know

It is a good idea to understand the importance of using databases with Python while learning Python. If you are analyzing data or pulling data over the network, it makes sense to store that data in a database. You can then pull data from the database as you need it, which can speed up your overall workflow.

A good tool for learning to use databases with Python is DB Browser for SQLite, a free graphical program for creating and browsing SQLite database files.

Relational databases comprise a whole sub-field of computer science. They matter because they let you pull a single entry out of a huge amount of data in a split second; reading through all of the data sequentially would take much longer.

You need look no further than Oracle to understand the relevance of relational databases: a large share of its revenue has historically come from database products.

The underlying foundation of relational databases is rooted in mathematics (set theory). This shows in the terminology experts use to describe databases, such as relations, tuples, and attributes.

The idea behind relational databases is that you model data as tables and the connections (relationships) between them.

However, programmers tend to think of it in terms of rows and columns: a relation is a table, a tuple is a row, and an attribute is a column.

Typically, when you picture a table, the first row is metadata: it titles what each column is for. You can refer to this as the schema for the table. It sets the rules for what goes in each column, for example, a string or an integer.
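In SQLite, you can ask a table for its schema directly. This minimal sketch uses a hypothetical Users table and reads its column metadata two ways: with PRAGMA table_info, and with the cursor's description attribute after a SELECT.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# The schema: column names plus the kind of value each column should hold
cur.execute('CREATE TABLE Users (name TEXT, age INTEGER)')
cur.execute('INSERT INTO Users (name, age) VALUES (?, ?)', ('Alice', 30))

# PRAGMA table_info returns one row per column: (cid, name, type, notnull, default, pk)
schema = [(col[1], col[2]) for col in cur.execute('PRAGMA table_info(Users)')]
print(schema)  # [('name', 'TEXT'), ('age', 'INTEGER')]

# After a SELECT, cursor.description also carries the column names as metadata
cur.execute('SELECT * FROM Users')
header = [d[0] for d in cur.description]
print(header)  # ['name', 'age']
conn.close()
```

Note that SQLite treats declared column types as loose "affinities" rather than strict rules unless the table is declared STRICT (SQLite 3.37 and later); most other database systems enforce column types.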

In the early 1960s, database pioneers figured out ways to retrieve data quickly from random-access storage, without having to go through the data sequentially. However, databases were very complex. As a result, a new component of software architecture evolved, called the database application. It lets you, the programmer, talk to the database by way of the application. An industry standard language for communicating between an application and a database was then needed, and the language the industry agreed on was SQL (Structured Query Language).

SQL is a great language, but it depends on the data being clean. The nice thing about Python is that it deals really well with messy, unstructured data. Together, Python and SQL make a powerful combination.

Python Object Oriented Programming

This post will examine Python object oriented programming.

When you think of Python object oriented programming, you should think of it as an orchestration of objects using the capabilities that each object has.

A function is a bit of code, but an object is a bit of code and data.
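A small illustration of that difference, using a hypothetical Rectangle class: the function needs its data passed in on every call, while the object carries its data around with the code that uses it.

```python
def area(width, height):
    # A function: just code; the data must be passed in each time
    return width * height


class Rectangle:
    # An object bundles data (width, height) with code that uses it (area)
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


r = Rectangle(3, 4)
print(area(3, 4))  # 12
print(r.area())    # 12
```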

Part of the goal in object oriented programming is to take a complex problem and break it into smaller parts. Then you can hide complexity in the smaller parts, which allows you to work on other parts without having to worry about that complexity. Essentially, you want a simple interface that hides complexity.

In the end, your program becomes a network of objects that you orchestrate to get the desired output.
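A minimal sketch of that orchestration, with three hypothetical classes. Parser and Tally each hide their own details behind one or two methods, and WordReport only talks to them through those simple interfaces.

```python
class Parser:
    """Hides the details of splitting raw text into words."""

    def words(self, text):
        return text.lower().split()


class Tally:
    """Hides the details of counting items."""

    def __init__(self):
        self._counts = {}

    def add(self, item):
        self._counts[item] = self._counts.get(item, 0) + 1

    def most_common(self):
        # Return the (item, count) pair with the highest count
        return max(self._counts.items(), key=lambda pair: pair[1])


class WordReport:
    """Orchestrates the other objects through their simple interfaces."""

    def __init__(self):
        self.parser = Parser()
        self.tally = Tally()

    def run(self, text):
        for word in self.parser.words(text):
            self.tally.add(word)
        return self.tally.most_common()


report = WordReport()
result = report.run('The cat and the hat')
print(result)  # ('the', 2)
```

If you later change how Tally stores its counts, WordReport does not need to change, because it never touches the internals.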

An Introduction to Python Objects

This post will introduce a discussion about Python objects.

As your programs grow in complexity, it is a good idea to gain an understanding of object-oriented programming. This post will not examine a new skill, but rather introduce terminology that you will need to know.

As your programs get more complex, you will need more complex data structures. Consider the example pictured below, where you construct a list of movies in which each item is a dictionary.

Coming up with shapes of data is part of solving programming problems. In the example above, it has been decided that each dictionary in the list of movies will be shaped a certain way. If each dictionary has the same shape, then you can write code that takes advantage of that consistency.

As you can see in the program above, you loop through the list and pull out the keys that you expect each dictionary to have.

In summary, the idea is to find ways to make data structures with consistency.
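A minimal sketch of the idea, with hypothetical movie data: because every dictionary in the list has the same keys, a single loop can handle all of them.

```python
# Hypothetical data: a list of movies, each one a dictionary with the same shape
movies = [
    {'title': 'Movie A', 'year': 1999, 'rating': 8.1},
    {'title': 'Movie B', 'year': 2005, 'rating': 7.4},
]

# Every dictionary has 'title', 'year', and 'rating', so one loop handles them all
lines = []
for movie in movies:
    lines.append(f"{movie['title']} ({movie['year']}): {movie['rating']}")

for line in lines:
    print(line)
```

If one dictionary were missing a key, the loop would raise a KeyError, which is exactly why a consistent shape matters.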

How to Talk to an Application Program Interface (“API”)

This post will focus on how to talk to an application program interface.

As you talk to APIs or web services, you have to understand how they think. You will need to read the set of rules for the API. The rules will tell you how to interface with the application.

There are two common choices for web service technologies: SOAP and REST. SOAP is considered difficult to work with, while REST is much easier.

A nice API to learn is the Google Maps Geocoding API. It is always a good idea to read the API documentation.

Run the program geojson.py, and enter “Ann Arbor, MI” for the user input. The program will return the following JSON object:

This JSON data results from entering the URL for “Ann Arbor, MI” in the Google Geocoding API.

The nice thing about a REST-based service is that you can take the URL and paste it into a browser. You work out how to put the URL together from the API documentation.

The URL will retrieve JSON that gives you lots of data about the location. You can parse it with the JSON library in Python.

The following picture shows the entire geojson.py program:

Running the program prompts the user for a location. This example showed the JSON results data for entering “Ann Arbor, MI”.

Notice the program imports the “urllib” library, which gives you the power to retrieve data from the internet. The “json” library gives you the power to parse the data that comes back.

The “serviceurl” is the base URL you get from reading the API documentation. The location the user types still has to be encoded before it is added to that URL, and Python can do this automatically: the line that calls the method “urllib.urlencode” is what encodes it. (In Python 3, this function lives at urllib.parse.urlencode.)
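A minimal Python 3 sketch of the encoding step. The endpoint below is the one Google documents for JSON geocoding results; a real request today also needs an API key parameter, which is left out here, and nothing is actually sent over the network.

```python
from urllib.parse import urlencode

# Base URL from the API documentation (a real request also needs a 'key' parameter)
serviceurl = 'https://maps.googleapis.com/maps/api/geocode/json?'
address = 'Ann Arbor, MI'

# urlencode turns the parameters into a safe query string: space -> '+', comma -> '%2C'
url = serviceurl + urlencode({'address': address})
print(url)  # https://maps.googleapis.com/maps/api/geocode/json?address=Ann+Arbor%2C+MI
```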

The “try” and “except” check whether the data that came back is bad. If it is, the program skips the rest of that pass through the loop (rather than breaking out of it), so the user is prompted to enter a new location.

The line “print json.dumps(js, indent=4)” dumps the JSON object into a string and prints it nicely with indentation. (That is Python 2 syntax; in Python 3, write print(json.dumps(js, indent=4)).)

The lines of code for “lat” and “lng” are a bit tricky. They dig through dictionaries nested inside dictionaries (with a list along the way) in the JSON object: results, then the first result, then geometry, then location.
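A minimal sketch of the parsing side, run against a hypothetical, heavily trimmed response in the shape the Geocoding API documents. The coordinates are illustrative values, not copied from a real request, and the failure check mirrors the try/except idea described above.

```python
import json

# Hypothetical, trimmed response; the coordinates are illustrative only
data = '''
{
  "results": [
    {
      "formatted_address": "Ann Arbor, MI, USA",
      "geometry": {"location": {"lat": 42.28, "lng": -83.74}}
    }
  ],
  "status": "OK"
}
'''

try:
    js = json.loads(data)
except json.JSONDecodeError:
    js = None

if not js or js.get('status') != 'OK':
    print('==== Failure To Retrieve ====')
else:
    # Dictionaries inside a list inside dictionaries: results -> [0] -> geometry -> location
    lat = js['results'][0]['geometry']['location']['lat']
    lng = js['results'][0]['geometry']['location']['lng']
    print('lat', lat, 'lng', lng)
    print(json.dumps(js, indent=4))  # pretty-print the whole object with indentation
```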

The data this API provides can be very valuable, so do not assume the API is always free.

 

JSON Serialization Format for Python

This post will examine the JSON serialization format (“JavaScript Object Notation”).

XML is good at representing things that may have elements nested within elements, like documents.

JSON is not so great at representing documents, but it is very good at representing many other types of data.

JSON is a cleaned-up version of JavaScript's constant (literal) syntax. In Python, the constant syntax for a list looks like this:

my_list = ['item1', 'item2', 'item3']

 

JavaScript uses arrays instead of lists, and objects instead of dictionaries, but these are just different means to the same end. Because JSON is a cleaned-up version of JavaScript syntax, it looks very similar to Python. Thus, if you already know Python, reading JSON should feel natural.

JSON was defined by Douglas Crockford. Once he published it, people quickly started using it, and today it is everywhere. Its purely organic growth is a testament to its usefulness.

JSON has two basic structures: an array and an object. Its best advantage is that these map directly onto the lists and dictionaries you tend to make in Python.

Look at the picture of some JSON below. It may seem familiar to you.

  • The data represents an object inside Python's triple-quote syntax (which technically makes it a string).
  • After the first curly bracket, you have a key / value pair followed by a comma.
  • The first key / value pair is “name” : “Chuck”.
  • In the second key / value pair, the value is a whole other object.
  • The key is “phone”, and its value is another object with two key / value pairs.

If you look at the whole outer thing, there are three keys: “name”, “phone”, and “email”.

This is the basic information about how you structure data, but the main thing you need to think about is how to de-serialize the data.

Like many other things, JSON support is built into Python, in the standard library's json module. This is why you start your code with:

import json

 

The next step is to de-serialize from string to internal Python data structure.

info = json.loads(data)

 

The name of the method “loads” means "load from string", and data is the string that you pass in as the argument.

The really nice part is that “info” is returned as an actual Python dictionary. You pull information out of this dictionary the same way you would from any other native Python dictionary. Thus, running this code will result in the following:
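A minimal reconstruction of this example, based on the bullet points above: an outer object with the keys "name", "phone", and "email", where "phone" holds an object with two key / value pairs. The phone and email values are placeholders, since the original picture is not reproduced here.

```python
import json

# Placeholder values for "phone" and "email"; "name" matches the description above
data = '''
{
  "name" : "Chuck",
  "phone" : {
    "type" : "intl",
    "number" : "+1 555 0100"
  },
  "email" : {
    "hide" : "yes"
  }
}'''

info = json.loads(data)   # de-serialize: str -> dict
print('Name:', info['name'])
print('Hide:', info['email']['hide'])
```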

JSON Representation of an Array

The array “input” starts with square brackets. This is the same as a list in Python. In this case, “input” is an array of two objects. The objects are inside curly brackets, and separated by a comma.

Examine the following declaration:

info = json.loads(input)

As you may have guessed, this will return a native Python list. As with any list, you can use a “for” loop to iterate through the list items.

Running this program should result in what you would expect.
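A minimal sketch of the array case, with hypothetical data (the post's data is in a picture). The string is named input_data here rather than input, to avoid shadowing Python's built-in input() function.

```python
import json

# Hypothetical JSON array holding two objects
input_data = '''
[
  {"id": "001", "x": "2", "name": "Alice"},
  {"id": "009", "x": "7", "name": "Bob"}
]'''

info = json.loads(input_data)   # a JSON array becomes a Python list
print('User count:', len(info))

for item in info:
    # Each item is a dictionary
    print('Name', item['name'])
    print('Id', item['id'])
```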

How to Parse XML with Python

This post will focus on how to parse XML with Python.

Fortunately, XML parsing is built into Python's standard library, which makes parsing XML fairly straightforward.

Open the file xml1.py. In this program, the XML data is stored as a string. Note that the string is written with triple quotes made of single-quote characters, because double quotes appear inside the XML. The newlines are part of the string.

At the beginning of your code, you should put the following import statement to pull in the XML parsing mechanism.

import xml.etree.ElementTree as ET

Below the data string, you see a line of code as follows:

tree = ET.fromstring(data)

The method “fromstring”, in the ElementTree library, takes the data string and parses it into an object. The object is given the name tree. Now, you can look at the underlying data inside the object.

Below is a screenshot that shows the result if you run this program.
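A minimal sketch in the spirit of xml1.py, since the program itself appears only as a picture. The element names and values are placeholders. The string uses triple single quotes because the XML contains double quotes.

```python
import xml.etree.ElementTree as ET

# Placeholder XML; triple single quotes because the XML itself contains double quotes
data = '''<person>
  <name>Alice</name>
  <phone type="intl">+1 555 0100</phone>
  <email hide="yes"/>
</person>'''

tree = ET.fromstring(data)  # parse the string; returns the root element, <person>
name = tree.find('name').text
hide = tree.find('email').get('hide')
print('Name:', name)
print('Attr:', hide)
```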

Next, look at the xml2.py program. This code will parse out the list of users.

In this program, the input gets converted to an object called stuff. A list of all the user elements inside users is then created; notice that a path is specified to find all the users. Next, the length of the list is printed, which tells you the number of users.

After you print the number of users, you can loop through your list of users and print the data you want.
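A minimal sketch in the spirit of xml2.py, with hypothetical user data. It uses ElementTree's findall with the path users/user to collect every user element, prints how many there are, and then loops through them.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; the original program is shown only as a picture
input_xml = '''<stuff>
  <users>
    <user x="2">
      <id>001</id>
      <name>Alice</name>
    </user>
    <user x="7">
      <id>009</id>
      <name>Bob</name>
    </user>
  </users>
</stuff>'''

stuff = ET.fromstring(input_xml)
lst = stuff.findall('users/user')  # path relative to the root <stuff> element
print('User count:', len(lst))

names = []
for item in lst:
    print('Name', item.find('name').text)
    print('Id', item.find('id').text)
    print('Attribute', item.get('x'))
    names.append(item.find('name').text)
```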

How XML Schema Validates XML

XML Schema is a way to describe what is and is not valid XML.

XML Schema is used for validation between applications. For example, suppose communication between an airline company's application and a hotel company's application suddenly breaks. Validating the exchanged XML against the schema shows on which side the mistake was made.

The picture below shows a sample document and its schema contract. You can see that the tags in the two match up. However, if the document used a tag name other than the one specified in the contract, it would fail validation.

If XML tags in the document agree with the schema, then the XML will get validated.

In essence, a schema formalizes the relationship between applications. There are many types of XML schema languages, but XSD from W3C tends to be the most common.

Look at the picture below for an example of XSD constraints. Constraints serve to lock-in the contract between applications.

You should also be familiar with the various XSD data types.

You need to understand the XSD date/time format, which follows ISO 8601 (for example, 2002-05-30T09:30:10Z), so that you will know how to sort it.

It is best practice not to change the date format: stick with this format when working with dates and times inside a computer.
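A short sketch of why the format matters for sorting, assuming timestamps written in the XSD dateTime style with a trailing Z for UTC. The timestamps are hypothetical. Because the fields run from most to least significant at fixed widths, plain string sorting already gives chronological order.

```python
from datetime import datetime

# Hypothetical timestamps in the YYYY-MM-DDThh:mm:ssZ style (all in UTC)
stamps = ['2002-05-30T09:30:10Z', '1999-12-31T23:59:59Z', '2002-05-30T08:00:00Z']

# Fixed-width, most-significant-first fields: string order is time order
by_string = sorted(stamps)

# Cross-check by parsing into datetime objects; 'Z' is swapped for '+00:00'
# because datetime.fromisoformat only accepts 'Z' from Python 3.11 onward
by_time = sorted(stamps, key=lambda s: datetime.fromisoformat(s.replace('Z', '+00:00')))

print(by_string == by_time)  # True
print(by_string[0])          # 1999-12-31T23:59:59Z
```

A format like 30/05/2002 would not sort correctly as a string, which is one reason to keep the standard format.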