Python Tuple – Another Python Collection

There are lists, dictionaries, and tuples. These are all Python collections.

A Python tuple is like a list that cannot be changed. Where lists use square brackets, tuples use regular parentheses.

Many list methods are not available on a Python tuple, because tuples are immutable. For example, you cannot use sort, reverse, or append.

You can use the dir function to check what you can do with tuples, compared to lists.

[Figure: Python tuple versus list. Compare what you can do with a Python tuple versus a list.]
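As a rough sketch of that comparison, the snippet below uses dir to collect the public method names of each type and prints the ones that only lists have. It calls print() as a function, which assumes you are running Python 3, and the names list_methods, tuple_methods, and list_only are made up for this illustration.

```python
# Collect the public (non-underscore) attribute names of each type.
list_methods = set(name for name in dir(list) if not name.startswith('_'))
tuple_methods = set(name for name in dir(tuple) if not name.startswith('_'))

# Methods that lists offer but tuples do not (the ones that change the list).
list_only = sorted(list_methods - tuple_methods)
print(list_only)

# What is left for tuples are the read-only methods.
print(sorted(tuple_methods))
```

Every method that modifies a collection in place shows up only on the list side.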

Why would you use tuples if they are not as capable as lists? Because a tuple cannot change, Python can store it more simply, so it is more efficient and requires less memory than a list. Tuples are a good choice when you are creating a temporary collection.
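One way to check the memory claim yourself is sys.getsizeof, which reports the size of an object in bytes. This is only a sketch: the exact numbers depend on your Python version and platform, so the example compares the two sizes instead of quoting them. The variable names are illustrative, and print() is used as a function on the assumption that you are running Python 3.

```python
import sys

numbers_list = [1, 2, 3, 4, 5]
numbers_tuple = (1, 2, 3, 4, 5)

# Size of the container itself, in bytes (not counting the numbers inside).
list_size = sys.getsizeof(numbers_list)
tuple_size = sys.getsizeof(numbers_tuple)

print(list_size, tuple_size)
print(tuple_size < list_size)
```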

A nice thing about Python is that you can assign several variables at once by placing a tuple on both the left and right side of an assignment statement.

Please note, the left-hand side must contain variables. Also, you can omit the parentheses on the left-hand side.
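Here is a small sketch of tuple assignment. The values and variable names are invented for the example, and print() is called as a function on the assumption that you are running Python 3.

```python
# A tuple on both sides: x gets 4 and y gets 'fred' in a single statement.
(x, y) = (4, 'fred')
print(x, y)

# The parentheses are optional.
a, b = 1, 2

# The right-hand side is built first, so this swaps the two values.
a, b = b, a
print(a, b)
```

The swap on the last assignment is a common use of this feature, since it needs no temporary variable.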

If you remember, the ‘items’ method for a Python dictionary returns a list of (key, value) pairs. Each pair is a Python tuple, so you can use a tuple as the iteration variable to loop through a dictionary.

Another nice thing about tuples is they are comparable. Comparison operators work with tuples. The first element will be compared first. If they are equal, then Python will move to the next element. It stops when it finds elements that differ.
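The comparison rule can be sketched with a few made-up tuples. The variable names are only for illustration, and print() is called as a function on the assumption that you are running Python 3.

```python
# The first elements differ (0 < 5), so the remaining elements are never examined.
first_differs = (0, 1, 2) < (5, 1, 2)

# The first elements tie, so the second elements (1 < 3) decide.
second_decides = (0, 1, 2000000) < (0, 3, 4)

# Strings inside tuples are compared the same way, element by element.
names_compare = ('Jones', 'Sally') < ('Jones', 'Sam')

print(first_differs, second_decides, names_compare)
```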

Compare and Sort a Python Tuple

This ability to compare Python tuples is a nice feature, because things that can be compared can also be sorted. You can use the built-in sorted function to do this, like in the following example.

# Create a dictionary. A dictionary cannot be sorted.
>>> d = {'alpha': 5, 'charlie': 3, 'beta': 4}
# Assign to variable x a sorted list of tuples.
>>> x = sorted(d.items())
>>> x
[('alpha', 5), ('beta', 4), ('charlie', 3)]

Notice that the list is sorted by key, because the key is the first element of each tuple. You can loop through it to print in sorted key order.

>>> for k, v in sorted(d.items()):
...     print k, v
...
alpha 5
beta 4
charlie 3

Do you remember the program for finding the most common word? What if you want to find the five most common words? Rather than sorting by key, you will want to sort by value in descending order.

# Create a dictionary. A dictionary cannot be sorted.
>>> d = {'alpha': 5, 'charlie': 3, 'beta': 4}
# Create a temporary list.
>>> temp_list = list()
# Loop through the dictionary, but append the value first!
>>> for k, v in d.items():
...     temp_list.append((v, k))
...
>>> print temp_list
[(3, 'charlie'), (4, 'beta'), (5, 'alpha')]
# Sort by value, largest first.
>>> temp_list.sort(reverse=True)
>>> print temp_list
[(5, 'alpha'), (4, 'beta'), (3, 'charlie')]

The following is a program that will find the ten most common words in a text file.

[Figure: A Python program for finding the 10 most common words.]
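The sketch below is one way such a program could look; it is not the original listing. It assumes the text has already been read into a string, and the function name most_common_words, its limit parameter, and the sample sentence are all invented for the example. It calls print() as a function on the assumption that you are running Python 3.

```python
def most_common_words(text, limit=10):
    """Return up to `limit` (count, word) tuples, most frequent first."""
    counts = dict()
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1

    # Put the count first so the tuples sort by count.
    pairs = [(count, word) for word, count in counts.items()]
    pairs.sort(reverse=True)
    return pairs[:limit]


sample = 'the cat and the dog and the bird'
for count, word in most_common_words(sample):
    print(word, count)
```

To run it against a file, you could pass in the result of reading that file, for example the string returned by the read method of an open file handle.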

Once you become comfortable with this program, you can begin to look for ways to condense your code. A list comprehension builds a list dynamically in a single line.

# Start with the dictionary.
>>> d = {'alpha': 5, 'charlie': 3, 'beta': 4}
# Use list comprehension to make a dynamic list.
>>> print sorted([(v, k) for k, v in d.items()])
[(3, 'charlie'), (4, 'beta'), (5, 'alpha')]

The syntax inside the square brackets is the list comprehension. It dynamically creates a list of (v, k) pairs as it iterates through the key-value pairs in the dictionary. This syntax is rather dense, but you can use it as you become more comfortable programming in Python.

Python Dictionary – Powerful Data Collection

A collection, in Python, is like a piece of luggage that we can put things in. A variable is not a collection, because it stores only one value. Once a new value is assigned, the old value goes away. A Python dictionary, however, is considered a collection.

A Python dictionary allows us to store many things. It works like a variable that serves as an aggregate of many values.

The difference between a list and a dictionary is how the values are stored. A list is a linear collection, indexed by a value starting at zero. A Python dictionary is more like a bag of things. The things are not stored in any particular order, but each thing has its own label. We call the label a ‘key’, and the thing is its ‘value’.

Python dictionaries are considered the most powerful data collection in Python. Other programming languages call them by different names, such as associative arrays, hash maps, or property bags.

You can create a Python dictionary as follows:

>>> suitcase = dict()
>>> suitcase['socks'] = 5
>>> suitcase['shirts'] = 3
>>> suitcase['pants'] = 2
>>> print suitcase
{'socks': 5, 'shirts': 3, 'pants': 2}

The socks, shirts, and pants are the ‘keys’ and the quantities are their ‘values’.

>>> suitcase['shirts'] = suitcase['shirts'] + 1
>>> print suitcase['shirts']
4

That’s right! You just added to the value of shirts. However, unlike a Python list, a dictionary in the Python 2 used here does not preserve order. Therefore, when you print the contents of a dictionary, do not expect it to come out in the same order you added the ‘key’: ‘value’ pairs. (Python 3.7 and later do keep insertion order.)

You will get a traceback (a KeyError) if you reference a ‘key’ that is not in your dictionary. You can use the ‘in’ operator to check whether the ‘key’ exists.

>>> print 'underwear' in suitcase
False

You can make an empty dictionary using curly brackets.

>>>empty_dic = {}

A common use for Python dictionaries is counting how often we see something.

counts = dict()
names = ['bob', 'ted', 'bill', 'ted', 'bob']
for name in names:
    if name not in counts:
        counts[name] = 1
    else:
        counts[name] = counts[name] + 1
print counts

The above Python script should print {'bob': 2, 'ted': 2, 'bill': 1}, though the pairs may appear in a different order.

This pattern is so common that Python dictionaries have a built-in method called ‘get()’ that does it for us. For example, counts.get(name, 0) returns the value stored for name, but if name is not yet in the dictionary it returns the default of zero instead. It’s a very valuable method.

Using the ‘get()’ method, the above Python script can be condensed as follows:

counts = dict()
names = ['bob', 'ted', 'bill', 'ted', 'bob']
for name in names:
    counts[name] = counts.get(name, 0) + 1
print counts

The following script will count the occurrence of each word in a line of text.

counts = dict()
print 'Enter a line of text:'
line = raw_input('')

words = line.split()
print words

for word in words:
    counts[word] = counts.get(word, 0) + 1

print counts

Another common task is to use a definite loop on Python dictionaries.

for key in counts:
    print key, counts[key]

The key is the actual word, and counts[key] is how many times the word was counted.

You can retrieve lists of keys and values with other built-in methods, counts.keys() and counts.values(). There is also counts.items(), which returns both keys and values as a list of pairs. Each pair is a tuple. You can then loop through each key-value pair using two iteration variables.

for x, y in counts.items():
    print x, y

Note, x is the ‘key’ and y is the ‘value’.
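The keys, values, and items methods mentioned above can be tried out with a small sketch like this one. The dictionary contents are made up, sorted() is used only so that the output comes out in a predictable order, and print() is called as a function on the assumption that you are running Python 3.

```python
counts = {'bob': 2, 'ted': 2, 'bill': 1}

# sorted() turns each result into a list in a predictable order.
print(sorted(counts.keys()))
print(sorted(counts.values()))

# Each item is a (key, value) tuple.
print(sorted(counts.items()))
```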

Now you should be able to fully understand the following script. It prints the most used word in a text file, along with its count.

name = raw_input('Enter file:')
handle = open(name, 'r')
text = handle.read()
words = text.split()
counts = dict()

# Count how many times each word appears.
for word in words:
    counts[word] = counts.get(word, 0) + 1

# Find the word with the largest count.
bigcount = None
bigword = None
for word, count in counts.items():
    if bigcount is None or count > bigcount:
        bigword = word
        bigcount = count

print bigword, bigcount

The Python List – Delve Into Data Science

Knowing how to manipulate a Python List is where you can really delve into data science. A Python list has square brackets. It is a collection wherein we assign multiple values to one variable. It is important to know how to find certain values within your lists.

Lists do not have to be of a single value type. However, converting a list to a numpy array will coerce the list to a single data type. A Python List should be converted to a numpy array if, for example, you want to make a scatter plot.

Lists can exist inside of a list.
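A quick sketch of both points, mixed value types and a list inside a list. The values are invented for the example, and print() is called as a function on the assumption that you are running Python 3.

```python
# A list can mix value types, and one of its items can itself be a list.
mixed = ['red', 24, 98.6, [5, 6]]
print(mixed)

# The inner list counts as a single item of the outer list.
print(len(mixed))
```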

You can look up values in a list, similar to how you look up values in a string.

Remember the index operator from the lesson about Python strings?

>>> colors = ['blue', 'green', 'red']
>>> print colors[1]
green

Do not forget index values start at zero. That is why the above example returns ‘green’.

However, while strings are immutable, lists are mutable. This is a great feature of lists.

>>> lucky_numbers = [3, 21, 7, 68, 93]
>>> lucky_numbers[4] = 36
>>> print lucky_numbers
[3, 21, 7, 68, 36]

See! The value at index 4, the fifth item in the list, was changed.

You can use ‘len’ to know the length of a list.

>>> print len(lucky_numbers)
5

You can use the range function to generate the index values of a list.

>>> print range(len(lucky_numbers))
[0, 1, 2, 3, 4]

Now you know that lucky_numbers has five index values, 0 through 4. You might want to loop through the list while keeping track of the index.

for i in range(len(lucky_numbers)):
    number = lucky_numbers[i]
    print number

You can concatenate lists with the ‘+’ operator.
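A minimal sketch of concatenation, with made-up lists. It calls print() as a function on the assumption that you are running Python 3.

```python
a = [1, 2, 3]
b = [4, 5, 6]

# The + operator builds a new list; a and b are left unchanged.
c = a + b
print(c)
print(a)
```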

You can slice a Python list. Again, this is similar to strings.

>>> print lucky_numbers[0:3]
[3, 21, 7]

Python lists have many built-in methods that do useful things to your list.

For example, you can append to a list.

>>> things = list()
>>> things.append('food')
>>> things.append(5)
>>> print things
['food', 5]

Use ‘in’ and ‘not in’ to find out whether something is, or is not, in a list.

>>> 'book' in things
False
>>> 7 not in things
True

The ‘sort’ method will force a list to sort itself, alphabetically for example.

There are lots of great built-in functions for lists of numbers, such as max, min, and sum.
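The following sketch tries out sort along with max, min, and sum. The names and numbers are made up for the example, and print() is called as a function on the assumption that you are running Python 3.

```python
friends = ['Joseph', 'Glenn', 'Sally']

# sort() rearranges the list in place; it does not return a new list.
friends.sort()
print(friends)

nums = [3, 41, 12, 9, 74, 15]

# max, min, and sum are built-in functions that take the list as an argument.
print(max(nums), min(nums), sum(nums))
```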

You can loop through an input of numbers, and build a list.

[Figure: A list building example. How to loop through user input, and build a list with the input.]
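The sketch below shows one way such a loop could look; it is not the original listing. To keep it runnable without typing anything, the user input is stood in for by a list of strings, and the word 'done' as the signal to stop is an assumption, as is the function name build_number_list. It calls print() as a function on the assumption that you are running Python 3.

```python
def build_number_list(entries):
    """Collect numeric entries into a list until 'done' is seen."""
    numbers = list()
    for entry in entries:
        if entry == 'done':
            break
        numbers.append(float(entry))
    return numbers


# Stand-in for what a user might type, one entry per prompt.
typed = ['3', '9', '5', 'done']
numbers = build_number_list(typed)
print(numbers)
print(sum(numbers) / len(numbers))
```

In an interactive program you would replace the typed list with repeated prompts and append each answer the same way.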

A very powerful method is ‘split’. This allows us to split a string into a list of words.

>>> lyric = 'three little birds'
>>> words = lyric.split()
>>> print words
['three', 'little', 'birds']

Split treats a run of several spaces as a single delimiter. So, if a line has lots of space at the end, split will discard all that extra space. This is very convenient.
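You can see this behavior with a made-up line that has extra spaces in the middle and at the end. The sketch calls print() as a function on the assumption that you are running Python 3.

```python
line = 'A lot               of spaces   '

# With no argument, split ignores runs of whitespace and trailing whitespace.
words = line.split()
print(words)
print(len(words))
```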

Data could consist of no spaces, where every string is delimited by a comma, for example. You would pass the comma in as an argument to split.

>>> jibberish = 'heh,reh,vtv'
>>> jiblist = jibberish.split(',')
>>> print jiblist
['heh', 'reh', 'vtv']

Python Open Function – Read Text Files

Text files are stored in secondary memory, so you need a way to tell Python where to look for the text file you want it to read. This is what the built-in Python open function is for.

For example:

handle = open('readme.txt', 'r')

In this example we assigned the result to a variable named handle that we can use to manipulate the file. The first parameter inside the parentheses is the name of the file we want to open, and the second parameter, 'r', indicates read mode. The other option is 'w' for write mode, but for now we will use read mode. If you leave the mode out, the Python open function uses 'r' by default.

It is important to note that the variable handle is not the file itself. Rather, it is a mechanism to use the file. To keep things simple, you want your text file to be in the same folder as your Python code file.

Before moving forward, you have to understand there is a special character to use that indicates when a line ends.

The New Line Character

>>> print 'XY'
XY
>>> print 'X\nY'
X
Y

The \n character tells us we start on a new line.

>>> len('X\nY')
3

You see that the newline character counts as only one character, even though it is typed as two. Think of \n as syntax to encode a new line in a string.

You have to mentally picture the newline character at the end of every line in a text file. Text editors do not show this character when you are just looking at the text, but it is encoded there, and you have to know that.
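To see the newline characters for yourself, the sketch below writes a small file in 'w' mode, reads it back in 'r' mode, and prints the result with repr so the \n characters are visible. The file name and its contents are made up, the file is placed in a temporary folder (an assumption made here so nothing real is overwritten), and print() is called as a function on the assumption that you are running Python 3.

```python
import os
import tempfile

# Hypothetical file name, placed in a temporary folder.
path = os.path.join(tempfile.mkdtemp(), 'readme.txt')

# 'w' opens the file for writing and creates it if needed.
handle = open(path, 'w')
handle.write('first line\nsecond line\n')
handle.close()

# 'r' opens it for reading; leaving the mode out has the same effect.
handle = open(path, 'r')
text = handle.read()
handle.close()

# repr shows the newline characters instead of acting on them.
print(repr(text))
```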

The Efficiency of the Python Open Function

You can print every line in a file with three lines of Python code.

xfile = open('readme.txt')
for line in xfile:
    print line

You can count the lines in a file with five lines of Python code.

xfile = open('readme.txt')
count = 0
for line in xfile:
    count = count + 1
print 'Line Count=', count

Notice in these examples we used xfile instead of handle as our variable name. That is okay, it is just a variable name.

You can also read every character of a text file into one string. You would use the file's built-in read method.

xfile = open('readme.txt')
one_string = xfile.read()
print len(one_string)

This program will print the number of characters in your text file. In other words, it prints the length of one string.

You could print the first 20 characters.

print one_string[:20]

Select Only the Text You Want

Suppose you have a fairly large text file, and you want to print only the lines that start with the word ‘Time’.

xfile = open('readme.txt')
for line in xfile:
    line = line.rstrip()
    if line.startswith('Time'):
        print line

You might be wondering about the purpose of line = line.rstrip(). Each line read from a file ends with a newline character, and the print statement adds another one. The rstrip string method removes whitespace, including the newline, from the right side of the line so your program will not print a blank line after each line of text.

You can skip a line by using the continue statement. The following Python code will skip every line that does not start with ‘I’.

xfile = open('readme.txt')
for line in xfile:
    line = line.rstrip()
    if not line.startswith('I'):
        continue
    print line

You can use ‘in’ to select lines. The following Python code will only select lines containing the ‘$’ character:

xfile = open('readme.txt')
for line in xfile:
    line = line.rstrip()
    if '$' not in line:
        continue
    print line

How to Catch Bad User Input

Use this code to prompt a user for a file. This will catch a bad file if the user does not enter the file name correctly, or if the file does not exist.

xname = raw_input('Enter the file name: ')
try:
    xfile = open(xname)
except IOError:
    print 'File cannot be opened'
    exit()

The Python String – Parse It!

A Python string is a sequence of characters. You can use single or double quotes to delimit a string.

We can look inside a Python string with the index operator. Use the square brackets for this, []. You must know that the index value is an integer and always starts at zero.

>>> creature = 'monkey'
>>> print creature[0]
m

The output is m because the index value of a string always starts at zero. In other words, m is the character at index 0 of the string ‘monkey’.

>>> x = 4
>>> print creature[x-1]
k

An expression can exist inside the index operator. Got it? Good!

Sometimes you need to know the length of a string.

>>> length = len(creature)
>>> print length
6

That’s right! The string ‘monkey’ has six characters. Note that len is a built-in function. It’s already been written for us, we just have to apply it.

You can loop through strings. The following is a Python program that loops through a particular Python string.

food = 'pizza'
for letter in food:
    print letter

In the above program, the word letter is being used as an iteration variable.

You can write a Python program to count the occurrence of a letter in a string.

bigword = 'supercalifragilisticexpialidocious'
count = 0
for letter in bigword:
    if letter == 'i':
        count = count + 1
print count

Slicing the Python String

You can slice a Python string to get a substring.

>>> bigword = 'supercalifragilisticexpialidocious'
>>> slice = bigword[0:5]
>>> print slice
super

You can see that it will slice up to, but not including, the second index value. The character at index 5 is the letter c, but that is not included in our slice.

If you omit the first or second index value, the slice assumes the beginning or end of the string respectively.

>>> slice = bigword[:]
>>> print slice
supercalifragilisticexpialidocious
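Leaving out just one of the two index values works the same way. In this sketch the names start and rest are made up, and print() is called as a function on the assumption that you are running Python 3.

```python
bigword = 'supercalifragilisticexpialidocious'

# No first index: start from the beginning, stop before index 5.
start = bigword[:5]

# No second index: start at index 5 and run to the end.
rest = bigword[5:]

print(start)
print(rest)
```

The two pieces never overlap, so joining them with + gives back the original string.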

You can look for values.

>>> 'x' in bigword
True

Yes indeed! The letter x is in supercalifragilisticexpialidocious.

There is an extensive Python string library. These are built-in methods we can invoke on strings.

>>> greet = 'HOw aRe YOu?'
>>> print greet.lower()
how are you?

You can look for a character and know its index value.

>>> idea = 'Learn Python Programming Language'
>>> print idea.find('Python')
6

Correct! ‘Python’ starts at index 6 of the string. Remember, index values start at zero!

Stripping Whitespace

You need to know how to remove whitespace at the beginning or end of strings. The methods for this are lstrip, rstrip, and strip. They remove whitespace from the left, right, and both sides respectively.

>>> color = '    blue'
>>> print color
    blue
>>> print color.lstrip()
blue

See how that works? Great!
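All three methods can be compared side by side with a sketch like this. The string is made up, repr is used so that any remaining spaces are visible inside the quotes, and print() is called as a function on the assumption that you are running Python 3.

```python
color = '   blue   '

left_stripped = color.lstrip()    # removes spaces on the left only
right_stripped = color.rstrip()   # removes spaces on the right only
both_stripped = color.strip()     # removes spaces on both sides

print(repr(left_stripped))
print(repr(right_stripped))
print(repr(both_stripped))
```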

Sometimes you need to extract lines that begin with a certain string.

>>> line = 'Email message sent at….'
>>> line.startswith('Email')
True

These examples show how Python is really good at parsing data.