How to Talk to an Application Program Interface (“API”)

This post will focus on how to talk to an application program interface.

As you talk to APIs or web services, you have to understand how they think. You will need to read the set of rules for the API. The rules will tell you how to interface with the application.


There are a couple of choices for web service technologies. SOAP is considered difficult to work with. It is much easier to work with REST.


A nice API to learn with is the Google Maps Geocoding API. It is always a good idea to read the API documentation.

Run the program, and enter “Ann Arbor, MI” for the user input. The program will return the following JSON object:

This JSON data results from entering the URL for “Ann Arbor, MI” in the Google Geocoding API.

The nice thing about a REST based service is you can take the URL and paste it in a browser. You derive how to put the URL together from the API documentation.

The URL will retrieve JSON that gives you lots of data about the location. You can parse it with the JSON library in Python.

The following picture shows the entire program:

longer code

Running the program prompts the user for a location. This example showed the JSON results data for entering “Ann Arbor, MI”.

Notice the program imports the “urllib” library, which gives you power to retrieve data on the internet. The “json” library gives you power to parse data that comes back.

The “serviceurl” is the one you get from reading the API documentation, but Python is able to encode it automatically. Look at the line which calls the method “urllib.urlencode”. This line of code is what encodes the URL.
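As a rough illustration of the encoding step, here is a minimal sketch written for Python 3, where the Python 2 call “urllib.urlencode” lives at “urllib.parse.urlencode”. The “serviceurl” value and the “address” parameter name are assumptions made for illustration, so check the API documentation for the current endpoint and parameters.

```python
import urllib.parse

# Assumed base URL, for illustration only; read the API docs for the real one.
serviceurl = "http://maps.googleapis.com/maps/api/geocode/json?"

address = "Ann Arbor, MI"

# urlencode turns a dictionary of parameters into a safe query string:
# spaces become "+" and the comma becomes "%2C".
query = urllib.parse.urlencode({"address": address})
url = serviceurl + query

print(url)
```

The point is that you never have to escape spaces or commas by hand. You pass plain text in, and the library produces a string that is safe to put in a URL.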

The “try” and “except” block checks whether the data that came back is bad. If it is, the program abandons that attempt and prompts the user to enter a new location.

The line “print json.dumps(js, indent=4)” will dump the JSON object into a string and print it out nicely with indentation.
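The two ideas above, guarding the parse and pretty-printing the result, can be sketched in Python 3 as follows. The helper name “parse_reply” and both sample replies are made up for this sketch and are not taken from the program in the post.

```python
import json

def parse_reply(data):
    """Return the parsed JSON, or None when the text is not valid JSON."""
    try:
        return json.loads(data)
    except ValueError:
        # json.loads raises a ValueError subclass when the text is malformed
        return None

good = parse_reply('{"status": "OK", "results": []}')
bad = parse_reply("<html>Service unavailable</html>")

print(bad)                          # the bad reply could not be parsed
print(json.dumps(good, indent=4))   # pretty-printed with 4-space indentation
```

Returning None for bad data lets the calling loop decide what to do next, such as asking the user for another location.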

The lines of code for “lat” and “lng” are a bit tricky. They index into dictionaries nested inside dictionaries in the parsed JSON object.
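To make the nesting concrete, here is a small Python 3 sketch that runs against a hand-written sample instead of a live reply. The sample is only shaped like a geocoding response; its field values are illustrative assumptions.

```python
import json

# Hand-written sample shaped like a geocoding reply; the values are illustrative.
data = '''
{
  "status": "OK",
  "results": [
    {
      "formatted_address": "Ann Arbor, MI, USA",
      "geometry": {
        "location": {"lat": 42.28, "lng": -83.74}
      }
    }
  ]
}
'''

js = json.loads(data)

# "results" is a list; its first item is a dictionary that holds more dictionaries.
location = js["results"][0]["geometry"]["location"]
lat = location["lat"]
lng = location["lng"]

print("lat", lat, "lng", lng)
```

Each pair of square brackets steps one level deeper, so reading the expression left to right mirrors reading the JSON from the outside in.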

The data this API provides can be very valuable, so do not assume the API is always free.


JSON Serialization Format for Python

This post will examine the JSON serialization format (“JavaScript Object Notation”).

XML is good at representing things that may have elements nested within elements, like documents.

JSON is not so great at representing documents, but it is very good at representing many other types of data.


JSON is a cleaned up version of the constant syntax of JavaScript. In Python, the constant syntax for a Python list looks like this:

my_list = ['item1', 'item2', 'item3']


JavaScript uses arrays, instead of lists, but these are just different means to the same end. Also JavaScript has objects, but Python has dictionaries. Because JSON is a cleaned up version of JavaScript, it actually looks very similar to Python. Thus, if you already know Python, it should be very natural to look at JSON.

JSON was defined by Douglas Crockford. Once he published it, people quickly started using it. JSON is now an entire industry within itself. Its pure organic growth is a testament to its usefulness.

JSON has two basic structures: an array and an object. Its best advantage is that in Python you tend to make lists and dictionaries, and JSON is a great way to represent those.
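A quick way to see the correspondence is to hand a Python list and a Python dictionary to the “json” library and look at what comes out. This is a minimal Python 3 sketch, and the dictionary contents are sample values.

```python
import json

my_list = ['item1', 'item2', 'item3']          # Python list
my_dict = {'name': 'Chuck', 'items': my_list}  # Python dictionary

as_array = json.dumps(my_list)    # becomes a JSON array
as_object = json.dumps(my_dict)   # becomes a JSON object

print(as_array)    # ["item1", "item2", "item3"]
print(as_object)   # {"name": "Chuck", "items": ["item1", "item2", "item3"]}
```

Apart from the double quotes, the JSON text is almost character for character what you would type in Python.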

Look at the picture of some JSON below. It may seem familiar to you.


  • The data represents an object inside the triple quote syntax (which technically makes it a string).
  • After the first curly bracket, you have a key / value pair followed by a comma.
  • The first key / value pair is “name” : “Chuck”.
  • In the second key / value pair, the value is a whole other object.
  • The key is “phone”, and its value is another object with two key / value pairs.

If you look at the outer object as a whole, there are three keys: “name”, “phone”, and “email”.

This is the basic information about how you structure data, but the main thing you need to think about is how to de-serialize the data.

Like many other things, JSON support is built into Python. This is why you start your code with:

import json


The next step is to de-serialize from string to internal Python data structure.

info = json.loads(data)


The method “loads” is saying load from string, and data is the string that you are passing in as the parameter.

The really nice part is that “info” is returned as an actual Python dictionary. You pull information out of this dictionary the same as you would any other native Python dictionary. Thus, running this code will result in the following:

Run Code
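Since the picture is not reproduced here, the following Python 3 sketch rebuilds a program along the lines described above. Only the shape of the data (the three keys, with “phone” holding two key / value pairs) follows the description; the values themselves are made up.

```python
import json

# Sample values are made up; only the shape follows the description above.
data = '''
{
  "name" : "Chuck",
  "phone" : {
    "type" : "intl",
    "number" : "+1 555 0100"
  },
  "email" : {
    "hide" : "yes"
  }
}'''

info = json.loads(data)   # string -> Python dictionary

print('Name:', info["name"])
print('Phone type:', info["phone"]["type"])
print('Keys:', list(info.keys()))
```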

JSON Representation of an Array


The array “input” starts with square brackets. This is the same as a list in Python. In this case, “input” is an array of two objects. The objects are inside curly brackets, and separated by a comma.

Examine the following declaration:

info = json.loads(input)

As you may have guessed, this will return a native Python list. As with any list, you can use a “for” loop to iterate through the list items.

Running this program should result in what you would expect.

Program Output
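A program like the one described could look roughly like this in Python 3. The two user records are sample data, and the variable is called “input_data” here (an assumption of this sketch) so that it does not hide Python's built-in “input” function.

```python
import json

# An array of two objects; the records are sample data.
input_data = '''
[
  { "id" : "001", "name" : "Chuck" },
  { "id" : "009", "name" : "Brent" }
]'''

info = json.loads(input_data)   # string -> Python list of dictionaries

print('User count:', len(info))
for item in info:
    print('Name', item['name'])
    print('Id', item['id'])
```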

How to Parse XML with Python

This post will focus on how to parse XML with Python.

Fortunately, XML support is built into Python, which makes parsing XML fairly straightforward.

In this program, the XML data presents itself as a string. Note that the string is delimited by triple quotes. Single quotes are used because double quotes are part of the XML. The new lines are part of the string.

At the beginning of your code, you should put the following import statement to pull in the XML parsing mechanism.

import xml.etree.ElementTree as ET

Below the data string, you see a line of code as follows:

tree = ET.fromstring(data)

The “fromstring” method in the ElementTree library takes the data string and turns it into an object. The object is given the name tree. Now you can look at the underlying data inside the object.

Below is a screenshot that shows the result if you run this program.
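For readers without the screenshot, here is a minimal Python 3 sketch of such a program. The XML document is a made-up sample; the “find”, “text”, and “get” calls are the standard ElementTree ways to reach a child tag, its text, and an attribute.

```python
import xml.etree.ElementTree as ET

# Made-up sample document, held in a triple-quoted string.
data = '''
<person>
  <name>Chuck</name>
  <phone type="intl">
    +1 555 0100
  </phone>
  <email hide="yes"/>
</person>'''

tree = ET.fromstring(data)   # string -> element object

print('Name:', tree.find('name').text)
print('Attr:', tree.find('email').get('hide'))
```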


Next, look at the program. This code will parse out the list of users.


In this program, the input gets converted to an object called stuff. A list of the user elements inside users is then created. Notice a path is specified to find all the users. Next, the length of the list is printed, which tells you the number of users.

After you print the number of users, you can loop through your list of users and print the data you want.
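The steps just described can be sketched in Python 3 like this. The XML is sample data, and the path “users/user” assumes the users are nested exactly as shown in this sample.

```python
import xml.etree.ElementTree as ET

# Sample data: a list of users wrapped in an outer <stuff> tag.
input_data = '''
<stuff>
  <users>
    <user x="2">
      <id>001</id>
      <name>Chuck</name>
    </user>
    <user x="7">
      <id>009</id>
      <name>Brent</name>
    </user>
  </users>
</stuff>'''

stuff = ET.fromstring(input_data)
lst = stuff.findall('users/user')   # path from the root down to each user
print('User count:', len(lst))

for item in lst:
    print('Name', item.find('name').text)
    print('Id', item.find('id').text)
    print('Attribute', item.get('x'))
```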

How XML Schema Validates XML

XML Schema is a way to describe what is valid or not valid XML.

XML Schema

XML Schema is used for validation between applications. For example, suppose communication between an airline company and a hotel company suddenly breaks. The XML schema is used to check which side made the mistake.

Pictured below is a sample document and schema contract. You can see that the tags in the two match up. However, if the document had a tag name different from the one specified in the contract, it would not be validated.

XML Validation
If XML tags in the document agree with the schema, then the XML will get validated.

In essence, a schema formalizes the relationship between applications. There are many types of XML schema languages, but XSD from W3C tends to be the most common.
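To get a feel for the “contract” idea, here is a deliberately simplified Python 3 sketch. It is not an XSD validator, because Python's standard library does not include one. It is only a hand-rolled check that the tags in a document match an agreed set, and the tag names and documents are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical "contract": the tags a <person> element must contain.
# This is a hand-rolled tag check, not an XSD validator.
REQUIRED_TAGS = {"lastname", "age", "dateborn"}

def matches_contract(xml_text):
    """Return True when <person> holds exactly the required child tags."""
    root = ET.fromstring(xml_text)
    if root.tag != "person":
        return False
    child_tags = {child.tag for child in root}
    return child_tags == REQUIRED_TAGS

good_doc = ("<person><lastname>Smith</lastname><age>17</age>"
            "<dateborn>2001-04-17</dateborn></person>")
bad_doc = ("<person><surname>Smith</surname><age>17</age>"
           "<dateborn>2001-04-17</dateborn></person>")

print(matches_contract(good_doc))   # True
print(matches_contract(bad_doc))    # False: <surname> is not in the contract
```

A real schema language such as XSD goes much further, checking data types, ordering, and how many times each tag may occur.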

Look at the picture below for an example of XSD constraints. Constraints serve to lock-in the contract between applications.

XSD Constraints

You should also be familiar with the various XSD data types.

XSD Data Types

You need to understand the date/time format, so that you will know how to sort it.

Date Format
It is best practice to not change the date format.

It is best to stick with this format when working with dates and time inside a computer.
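The picture is not reproduced here, so the sketch below assumes the format in question is the ISO 8601 style used by the XSD dateTime type: year, month, day, then the time, as in 2002-05-30T09:30:10Z. Under that assumption, this Python 3 example shows why the format is easy to sort. The timestamps are sample values.

```python
from datetime import datetime

# Sample timestamps, all in the same assumed format and time zone (Z = UTC).
stamps = [
    "2002-05-30T09:30:10Z",
    "2001-12-01T23:59:59Z",
    "2002-05-30T09:00:00Z",
]

# Largest unit first means plain text sorting is also chronological sorting.
by_text = sorted(stamps)

# Cross-check by parsing each stamp into a real datetime and sorting on that.
by_time = sorted(stamps, key=lambda s: datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ"))

print(by_text)
print(by_text == by_time)
```

Because the text order and the time order agree, you can sort these values without converting them first, as long as every value uses the same format and time zone.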

Use eXtensible Markup Language – XML

This post will examine when to use eXtensible Markup Language (XML).

XML stands for eXtensible Markup Language. Most programmers would probably prefer JSON, which is the other common wire formatting language, but XML does have advantages in certain circumstances.

XML is good for representing documents. For example, the file extensions of the newer Microsoft Word and PowerPoint formats (.docx and .pptx) end in “x”, which stands for XML.


XML is a textual representation of a tree structure with nodes. There are both simple and complex elements. Complex elements have tags within tags. Look at the picture below for an example.

XML Elements
This picture represents the difference between simple and complex elements.

Further, look at another picture for an illustration of more XML basics.

XML Basics
This picture color codes the basics of XML.

Indentation is used just for readability. In other words, white space is generally discarded.

In XML, unlike HTML, you make up the tag and attribute names to be useful in what you are describing.

XML Terminology

Indentation is often used to capture the nesting of elements.

For example:

  • In the picture below, the <a> tag has two child tags, <b> and <c>.
  • These tags are one level down from the root <a> tag. 
  • You could say <a> is the parent of <b> and <c>.
  • Also, <c> is the parent of <d> and <e>.
  • Text nodes and attribute nodes are considered children of the node itself.

XML as a tree

As a Python programmer, you could write code that traverses down tags, and pulls out information.
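As one possible way to do that, the Python 3 sketch below walks the same <a>, <b>, <c>, <d>, <e> tree described above and prints each tag indented by its depth. The text values X, Y, and Z and the helper name “walk” are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# <a> is the root; <b> and <c> are its children; <d> and <e> are children of <c>.
data = '<a><b>X</b><c><d>Y</d><e>Z</e></c></a>'

def walk(node, depth=0, found=None):
    """Visit node and its children, recording (depth, tag, text) tuples."""
    if found is None:
        found = []
    text = (node.text or '').strip()
    found.append((depth, node.tag, text))
    print('  ' * depth + node.tag, text)
    for child in node:              # iterating an element yields its child tags
        walk(child, depth + 1, found)
    return found

visited = walk(ET.fromstring(data))
```

The function calls itself for every child, so it handles trees of any depth without knowing the tag names in advance.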

Web Services for Data on the Web

This post will discuss common web services.

Rather than retrieve and parse HTML documents, web services are URLs designed specifically to hand you data back for your application.

Web Services

XML and JSON are the two formats commonly used by web services for the data going back and forth across the internet.

The problem is finding a way to send data that different programming languages can agree on. A Python dictionary, for example, is internally different from a Java hashmap, even though these data structures serve the same purpose. A “wire protocol” is how you send a Python data structure in a form that Java can also understand.

Wire Protocol
You send data across the net using a wire protocol.

The need for this wire protocol spawned two new terms.

Serialize is the act of taking an internal data structure, and creating a wire format.

De-Serialize is the act of taking the wire format and creating an internal data structure in a different language.
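Here is a minimal Python 3 sketch of both terms, using JSON as the wire format. For simplicity, both halves run in Python; in practice the receiving side could just as well be written in Java or any other language. The dictionary contents are sample values.

```python
import json

# Internal Python data structure (sample values for illustration).
person = {"name": "Chuck", "phone": "303-4456"}

wire = json.dumps(person)     # serialize: dictionary -> text for the wire
print(type(wire).__name__, wire)

restored = json.loads(wire)   # de-serialize: text -> dictionary again
print(restored == person)
```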

The wire protocol allows us to create sets of applications that work in different languages. Below is an example of the XML wire format.

XML Wire Format
This is an example of the XML wire format.

The next picture below is an example of the JSON wire format.

JSON Wire Format
This is an example of the JSON wire format.

XML and JSON are the two most common wire formats used for applications to exchange data.

BeautifulSoup Example as a Python Scraper

This post will give a BeautifulSoup example to demonstrate its usefulness as a Python scraper.

A problem you will encounter with HTML is that while the code might be technically correct, it could be edited in a very ugly fashion.

Even if you understand HTML, it can be hard to read if the code is ugly.

For example, there could be uneven indentations, inconsistent line spacing, or a host of other bad elements. A BeautifulSoup example will show how it can easily be used as a Python HTML parser.

Use BeautifulSoup as a Python scraper for HTML pages.

After you download BeautifulSoup, place the file in the same folder as your Python programs. You can download it here

The demonstrations in this post will show you how to use a BeautifulSoup example with Python 2, rather than Python 3. The concepts are very similar for both versions of Python, but installation is a bit different.

BeautifulSoup Example for Retrieving Web Pages

Thanks to BeautifulSoup, it is very easy to retrieve web pages, and print all the “href” attributes of the anchor tags. These are essentially the links that go to other web pages. The whole program to do this is shown in the picture below.

BeautifulSoup example as a Python scraper
This program takes user input of an HTML page, and prints all the anchor tags from that page.

The second line in the code pictured above is crucial because it imports all routines in the file.

The variable “html” (which could be called anything, but calling it html makes sense) holds a string consisting of the entire HTML page.

The variable “soup” becomes an object of parsed HTML data. You can then ask to retrieve certain things from this variable.

How to Print All Anchor Tags in an HTML Document

An anchor tag in HTML looks like <a> </a>, so by passing ‘a’ into the soup object you retrieve the anchor tags. The “href” attribute of each tag then gives you the web address of the page that the anchor tag links to.

This BeautifulSoup example demonstrates its power as a Python scraper, using the “urllib” and “BeautifulSoup” libraries to parse HTML.
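BeautifulSoup is a separate download, so the sketch below does not use it. Instead it shows the same idea, pulling the “href” out of every anchor tag in ugly HTML, with the “html.parser” module that ships with Python 3. The class name “AnchorCollector” and the sample page are made up for this example.

```python
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Collect the href attribute of every <a> tag that is fed in."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser hands over tag and attribute names in lowercase.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value is not None:
                    self.links.append(value)

# Deliberately messy HTML: odd indentation, mixed case, inconsistent spacing.
html = '''<html><body>
      <h1>Links</h1>
<p>See <A HREF="http://www.example.com/page1.htm">Page 1</A>
         and <a   href='http://www.example.com/page2.htm'>Page 2</a>.</p>
</body></html>'''

collector = AnchorCollector()
collector.feed(html)

for link in collector.links:
    print(link)
```

With BeautifulSoup the same job takes fewer lines, which is much of its appeal. This version only shows what is happening underneath.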


Making Sense of HTML Documents – Using Python

This post will focus on making sense of HTML documents that you retrieve from a web server – using Python.

Look at the example pictured below. It displays useful code to retrieve a web page, and print out the content.

You can see HTML tags in the document. These tags are rendered on a web page to give it structure. Learning HTML is a whole other topic.

However, what you will focus on here is parsing through the content using Python, and looking for certain elements within the content.

Retrieve HTML
The purple text is an HTML link to another web page.

In the above picture, look at the string highlighted in purple. This string represents a link to another web page.

You can create a loop that parses out these types of strings, puts each one in a “fhand” variable, and opens the page. This type of loop could continue until it opens and prints all the content on the internet.

Realistically, your computer would get drained of its memory long before your loop completed parsing through all the web links on the internet, but this concept outlines the beginning of a web crawler. A web crawler employs what is referred to as web scraping.
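The crawling loop can be sketched without touching the real internet. In the Python 3 example below, a dictionary stands in for the web: each hypothetical URL maps to the HTML that would be served at that address. The URLs, the page contents, and the function name “crawl” are all assumptions for illustration.

```python
import re
from collections import deque

# A pretend web: each hypothetical URL maps to the HTML "served" at it.
PAGES = {
    "http://www.example.com/page1.htm":
        '<p>Go to <a href="http://www.example.com/page2.htm">Page 2</a></p>',
    "http://www.example.com/page2.htm":
        '<p>Go to <a href="http://www.example.com/page3.htm">Page 3</a> '
        'or <a href="http://www.example.com/page1.htm">back</a></p>',
    "http://www.example.com/page3.htm": "<p>The end.</p>",
}

def crawl(start):
    """Follow links breadth-first and return the URLs in visiting order."""
    to_visit = deque([start])
    seen = []
    while to_visit:
        url = to_visit.popleft()
        if url in seen or url not in PAGES:
            continue                  # skip repeats and unknown pages
        seen.append(url)
        html = PAGES[url]             # a real crawler would fetch the page here
        for link in re.findall(r'href="(http[^"]+)"', html):
            to_visit.append(link)
    return seen

order = crawl("http://www.example.com/page1.htm")
print(order)
```

Keeping track of pages already seen is what stops the loop from going around in circles when two pages link to each other.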

Web Scraping

The Power of Web Scraping

Web scraping gives you great power. You are literally able to make a copy of the web, or part of it, given enough memory.

Some web servers employ shields, like a captcha for example, to ward off programs written in languages like Python from scraping their site. However, a determined Python program can sometimes get around these types of shields. On the other hand, some servers do not care if you scrape their pages.

Why Scrape HTML Documents

Why Web Scrape HTML Documents

You can see that there are many reasons why you may want to scrape the web. You could write Python code that checks for new apartments on Craigslist, for example. You could write Python code to pull social data.

Web scraping provides a way to pull data when there is no application program interface.

Some websites have rules regarding web scraping. Facebook, for example, does not allow it. Facebook does not display public data. You have to be logged in to see anything. So if you did try to scrape their site, your code would have to log you in first, and then Facebook could easily know it’s you scraping.

What next? See how this BeautifulSoup example makes it easy to scrape HTML.

Use Python for Web Scraping

This post will demonstrate how you write Python for web scraping.

Learning the HTTP protocol is fairly complex, but it is simple to apply in Python. The picture demonstrates how to make an HTTP request in Python.

HTTP Request

The line starting with “mysock.connect” is what pushes the socket out across the internet, and connects it to an endpoint.

It is crucial there is a server there to connect to, or else your code will crash right there at the third line. A crucial difference between connecting with a socket and reading a file is that you can both send and retrieve data with a socket.

Because you are using the HTTP protocol and you established the socket connection, it is your responsibility to make the first communication.

The line starting with “mysock.send” makes first communication with a GET request. Once you make the GET request, you can scrape the data you want.

The while loop receives data up to 512 characters at a time. If fewer than 512 characters arrive, you still receive them. The loop ends only when fewer than one character comes back, which means there is no more data to receive. Running this program should return the following data:

web scraping
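So that the send-then-receive pattern can be tried without an internet connection, the Python 3 sketch below replaces the remote web server with a tiny pretend server running in a thread. The “fake_server” function and “socket.socketpair” are stand-ins chosen for this example. A real program would create the socket and call “mysock.connect” with a host and port 80 instead.

```python
import socket
import threading

def fake_server(conn):
    """Pretend web server: read the request, send one canned reply, hang up."""
    conn.recv(1024)
    body = b"Hello from the pretend server\n"
    conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\n" + body)
    conn.close()

# socketpair gives two already-connected sockets; one plays the server side.
mysock, server_side = socket.socketpair()
server = threading.Thread(target=fake_server, args=(server_side,))
server.start()

# The client speaks first: a GET request followed by a blank line.
mysock.send(b"GET /page1.htm HTTP/1.0\r\n\r\n")

chunks = []
while True:
    data = mysock.recv(512)    # up to 512 bytes per call
    if len(data) < 1:          # nothing left: the other side closed
        break
    chunks.append(data)

mysock.close()
server.join()

response = b"".join(chunks).decode()
print(response)
```

The reply starts with the HTTP headers, then a blank line, then the document itself. That is the same layout you get back from a real web server.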

Make the HTTP Request Easier

You might agree that the previous example showed you that it is fairly simple to make an HTTP request with Python. Well, there is a library called “urllib” that makes it even easier.

urllib in Python

The urllib library works like an extra application layer that makes a URL seem like it is just a file.

You can see that using urllib is similar to using a handle to open and read a file.
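Here is a small Python 3 sketch of that file-like behavior. In Python 3 the function lives at “urllib.request.urlopen”. To keep the example runnable offline it opens a “data:” URL, which carries its content inside the URL itself. That choice is an assumption of this sketch, and a real program would pass an ordinary http:// address.

```python
import urllib.request

# A "data:" URL carries its content inside the URL itself, so nothing is fetched
# over the network; a real program would pass an http:// address instead.
url = "data:text/plain,line%20one%0Aline%20two%0A"

fhand = urllib.request.urlopen(url)   # looks and feels like opening a file

lines = []
for line in fhand:                    # read it line by line, like a file handle
    lines.append(line.decode().strip())
fhand.close()

print(lines)
```

The one difference from a text file is that the lines arrive as bytes, which is why each one is decoded before use.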


Make Your Python Socket Talk to the Internet

This post will show you how to make your Python socket talk to an application on another web server.

Once you establish a connection with your socket, you can use Python to browse web data. The most common protocol is HTTP (HyperText Transfer Protocol). HTTP is a set of rules to allow browsers to retrieve web documents from servers over the internet.

python socket
Use this code to establish a socket in Python.

Examining the URL

Look at the URL in your location field or address bar of your web browser. It can be broken down into three parts.

For example, consider the URL

  1. The first part is the “http”. This tells you what protocol is being used.
  2. The second part, “”, refers to the host you want to talk to.
  3. The last part, “page1.htm”, refers to the file you want to retrieve.
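Python can do this breakdown for you. The Python 3 sketch below uses “urllib.parse.urlparse” on a hypothetical URL. The host name www.example.com is made up for illustration, and only the file name “page1.htm” comes from the list above.

```python
from urllib.parse import urlparse

# Hypothetical URL, for illustration only.
url = "http://www.example.com/page1.htm"

parts = urlparse(url)

print("protocol:", parts.scheme)   # which protocol is being used
print("host:", parts.netloc)       # which host to talk to
print("file:", parts.path)         # which document to retrieve
```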

Every time you click on a link to get a new page on the internet, your browser initiates a request / response cycle to GET the new page. This, in a nutshell, is the act of surfing the web.

Web Surfing
The act of surfing…the web.