Python Dictionary – Powerful Data Collection

A collection, in Python, is like a piece of luggage that we can put things in. A variable is not a collection, because it stores only one value. Once a new value is assigned, the old value goes away. A Python dictionary, however, is considered a collection.

A Python dictionary allows us to store many things. It works like a variable that holds an aggregate of many values.

The difference between a list and a dictionary is how the values are stored and looked up. A list is a linear collection, indexed by position starting at zero. A Python dictionary is more like a bag of things. The things are not looked up by position; instead, each thing has its own label. We call the label a ‘key’, and the thing is its ‘value’.
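
To make the contrast concrete, here is a minimal sketch. The names packing_list and packed are made up for illustration.

```python
# A list is indexed by position, starting at zero.
packing_list = ['socks', 'shirts', 'pants']
first_item = packing_list[0]

# A dictionary is indexed by a label (the key) instead of a position.
packed = {'socks': 5, 'shirts': 3, 'pants': 2}
sock_count = packed['socks']

print(first_item)  # socks
print(sock_count)  # 5
```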

Python dictionaries are often considered the most powerful data collection in Python. Other programming languages call them by different names, such as associative arrays, hash maps, or property bags.

You can create a Python dictionary as follows:

>>> suitcase = dict()
>>> suitcase['socks'] = 5
>>> suitcase['shirts'] = 3
>>> suitcase['pants'] = 2
>>> print(suitcase)
{'socks': 5, 'shirts': 3, 'pants': 2}

The socks, shirts, and pants are the ‘keys’ and the quantities are their ‘values’.
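
The same dictionary can also be written in one step with curly brackets. The sketch below compares the two approaches; suitcase_literal is a name chosen here for illustration.

```python
# Build the dictionary one key at a time.
suitcase = dict()
suitcase['socks'] = 5
suitcase['shirts'] = 3
suitcase['pants'] = 2

# Build the same dictionary in one step with a literal.
suitcase_literal = {'socks': 5, 'shirts': 3, 'pants': 2}

print(suitcase == suitcase_literal)  # True
print(len(suitcase))                 # 3
```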

>>> suitcase['shirts'] = suitcase['shirts'] + 1
>>> print(suitcase['shirts'])
4

That’s right! You just added to the value of shirts. Note that a dictionary is designed for lookup by key, not by position. In Python 3.7 and later, a dictionary does preserve the order in which its keys were inserted, but in older versions (including Python 2) it does not. Therefore, if your code may run on an older version, do not expect the contents of a dictionary to print in the same order you added the ‘key’: ‘value’ pairs.
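
One practical consequence of looking things up by key rather than by position: two dictionaries holding the same pairs compare as equal no matter what order the pairs were added in, while two lists do not. A small sketch, with made-up names:

```python
# Two lists with the same items in a different order are not equal.
list_a = ['socks', 'shirts']
list_b = ['shirts', 'socks']
print(list_a == list_b)  # False

# Two dictionaries with the same pairs are equal, whatever order they were added in.
dict_a = {'socks': 5, 'shirts': 3}
dict_b = {'shirts': 3, 'socks': 5}
print(dict_a == dict_b)  # True
```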

You will get a traceback (a KeyError) if you reference a ‘key’ that is not in your dictionary. To avoid this, you can use the in operator to check whether the ‘key’ exists.

>>> print('underwear' in suitcase)
False
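
As a rough sketch of both situations, the snippet below first triggers the error and catches it, then avoids it by checking with in. The count variable is only there for illustration.

```python
suitcase = {'socks': 5, 'shirts': 3, 'pants': 2}

# Looking up a missing key raises KeyError.
try:
    count = suitcase['underwear']
except KeyError:
    count = None
    print('underwear is not in the suitcase')

# Checking with "in" first avoids the error.
if 'underwear' in suitcase:
    print(suitcase['underwear'])
else:
    print('nothing packed yet')
```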

You can make an empty dictionary using curly brackets.

>>> empty_dic = {}

A common use for Python dictionaries is counting how often we see something.

counts = dict()
names = ['bob', 'ted', 'bill', 'ted', 'bob']
for name in names:
    if name not in counts:
        # First time we see this name: start its count at 1.
        counts[name] = 1
    else:
        # Seen before: add one to the existing count.
        counts[name] = counts[name] + 1
print(counts)

The above Python script should print {'bob': 2, 'ted': 2, 'bill': 1}

This pattern is so common that Python dictionaries have a built-in method called ‘get()’ that handles it for us. For example, counts.get(name, 0) returns the value stored for name, but if name is not in the dictionary it returns the default of zero instead of raising an error. It’s a very valuable method.
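
Here is a small sketch of how get() behaves for a key that exists and for one that does not. The sample data is invented for illustration.

```python
counts = {'bob': 2, 'ted': 1}

# The key exists, so get() returns its stored value.
print(counts.get('bob', 0))    # 2

# The key is missing, so get() returns the default instead of raising KeyError.
print(counts.get('alice', 0))  # 0

# get() does not add the missing key to the dictionary.
print('alice' in counts)       # False
```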

Using the ‘get()’ method, the above Python script can be condensed as follows:

counts = dict()
names = ['bob', 'ted', 'bill', 'ted', 'bob']
for name in names:
    counts[name] = counts.get(name, 0) + 1
print(counts)

The following script will count the occurrence of each word in a line of text.

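counts = dict()
print('Enter a line of text:')
line = input('')

# Split the line into words on whitespace.
words = line.split()
print(words)

# Count each word, starting from zero for words not seen yet.
for word in words:
    counts[word] = counts.get(word, 0) + 1

print(counts)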

Another common task is to loop over a Python dictionary with a definite (for) loop, which visits each key in turn.

for key in counts:
    print(key, counts[key])

The key is the actual word, and counts[key] is how many times the word was counted.

You can retrieve the keys and values with other built-in methods: counts.keys() and counts.values(). There is also counts.items(), which returns both keys and values together, each pair as a tuple. (In Python 3 these methods return view objects rather than lists; wrap one in list() if you need an actual list.) You can then loop through each key-value pair using two iteration variables.

for x, y in counts.items():
    print(x, y)

Note, x is the ‘key’ and y is the ‘value’.
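
The sketch below shows what the three methods give back, assuming Python 3. It uses sorted() so the printed lists do not depend on how the dictionary happens to be ordered; the variable names are illustrative.

```python
counts = {'bob': 2, 'ted': 2, 'bill': 1}

# keys(), values(), and items() each return an iterable view; sorted() builds a list from it.
names = sorted(counts.keys())
totals = sorted(counts.values())
pairs = sorted(counts.items())

print(names)   # ['bill', 'bob', 'ted']
print(totals)  # [1, 2, 2]
print(pairs)   # [('bill', 1), ('bob', 2), ('ted', 2)]
```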

Now you should be able to fully understand the following script. It prints the most used word in a text file, along with its count.

name = input('Enter file: ')
# Read the whole file and split it into words.
with open(name, 'r') as handle:
    text = handle.read()
words = text.split()

# Count how often each word appears.
counts = dict()
for word in words:
    counts[word] = counts.get(word, 0) + 1

# Find the word with the largest count.
bigcount = None
bigword = None
for word, count in counts.items():
    if bigcount is None or count > bigcount:
        bigword = word
        bigcount = count

print(bigword, bigcount)
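
To try the same logic without a file, it can be wrapped in a function that takes a string. This is only a sketch: most_common_word and sample are hypothetical names introduced here, not part of the script above.

```python
def most_common_word(text):
    """Return (word, count) for the most frequent word, or (None, None) if text has no words."""
    counts = dict()
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1

    bigword = None
    bigcount = None
    for word, count in counts.items():
        # Strictly greater, so on a tie the word seen first in the loop is kept.
        if bigcount is None or count > bigcount:
            bigword = word
            bigcount = count
    return bigword, bigcount

sample = 'the cat and the hat and the bat'
print(most_common_word(sample))  # ('the', 3)
```

Note that split() separates on whitespace only, so ‘The’ and ‘the’, or ‘hat’ and ‘hat.’, would be counted as different words.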
