Lecture 18 — Dictionaries, Part 2¶
Overview¶
- More IMDB examples:
- Dictionaries of string/set pairs
- Converting dictionaries with one key to another
- Combining information from multiple dictionaries
- A different view of dictionaries: storing attribute/value pairs.
- Twitter example.
Recap of dictionaries¶
Dictionaries are like lists, except that the indices (keys) can be any hashable value, such as a string or a tuple, not just integers starting at 0.
The following two store the same values at the same indices:
>>> listoption = ['a','b','c']
>>> dictoption = {0:'a', 1: 'b', 2:'c'}
You would access them in the same way:
>>> listoption[1]
'b'
>>> dictoption[1]
'b'
You would update them in the same way:
>>> listoption[1] = 'd'
>>> dictoption[1] = 'd'
Of course the point of a dictionary is that keys can be anything:
>>> d = {'Gru':3, 'Margo':4}
>>> d['Gru']
3
This dictionary has strings as keys and integers as values. The values can be anything as well:
>>> d2 = {'Gru': set( [123,456] ), 'Margo': set( [456] ) }
>>> d2['Gru']
set([123, 456])
Note that since keys can be anything, to print or iterate through the values in a dictionary, you need something other than range. Well, it helps that keys() returns all the keys in a dictionary:
>>> d2.keys()
['Gru', 'Margo']
>>> for key in d2.keys():
...     print key, d2[key]
Gru set([123, 456])
Margo set([456])
These are really the most common operations on dictionaries: put a value, read a value, iterate through keys to get values.
The values() method converts a dictionary to a list by throwing away the keys. For example:
>>> d2.values()
[set([123, 456]), set([456])]
Exercises¶
Given the following dictionary of hobbies for people:
hobby = {'Gru':set(['Hiking','Cooking']), 'Edith':set(['Hiking','Board Games'])}
create a new dictionary that lists the people for each hobby:
{'Hiking': set(['Gru','Edith']), 'Cooking':set(['Gru']), 'Board Games':set(['Edith'])}
Write a program that uses a dictionary that associates integers (the keys) with sets of strings (the values) to find the number of movies in each year of the IMDB. Start from
imdb_file = raw_input("Enter the name of the IMDB file ==> ").strip()
years_and_movies = {}
for line in open(imdb_file):
    words = line.strip().split('|')
    movie_name = words[1].strip()
    year = int(words[2])
Write additional code that uses the years_and_movies dictionary to find the year that has the most movies.
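One possible way to approach the last two exercises is sketched below. To stay self-contained it reads from a short list of made-up lines rather than a file; the name|movie|year layout of those lines is an assumption based on the starter code, which only uses fields 1 and 2. The print calls are written so that the code runs under both Python 2.7 and Python 3.

```python
# Made-up sample lines standing in for the IMDB file (hypothetical data).
sample_lines = [
    "Hanks, Tom|Big|1988",
    "Perkins, Elizabeth|Big|1988",
    "Hanks, Tom|Apollo 13|1995",
    "Bacon, Kevin|Apollo 13|1995",
    "Bacon, Kevin|Footloose|1984",
    "Hanks, Tom|Toy Story|1995",
]

years_and_movies = {}
for line in sample_lines:
    words = line.strip().split('|')
    movie_name = words[1].strip()
    year = int(words[2])
    # Create the set the first time a year is seen, then add the movie.
    if year not in years_and_movies:
        years_and_movies[year] = set()
    years_and_movies[year].add(movie_name)

# Number of movies in each year; a set counts each title only once.
for year in sorted(years_and_movies.keys()):
    print("%d: %d" % (year, len(years_and_movies[year])))

# Year with the most movies (the earliest such year if there is a tie).
best_year = None
for year in sorted(years_and_movies.keys()):
    if (best_year is None or
            len(years_and_movies[year]) > len(years_and_movies[best_year])):
        best_year = year
print("Most movies: %d" % best_year)
```

Storing a set per year, rather than a bare count, means a movie that appears on several lines (once per actor) is still counted once.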
Dictionary for lookup¶
- Let us now use two different dictionaries to go between pieces of information:
- Movies: key: a movie, value: the set of actors in the movie
- Actors: key: an actor, value: the set of movies that the actor has appeared in
- Given an actor:
- Find all the movies she has starred in
- For each movie, find all the actors in that movie
- Play the degrees of Kevin Bacon game! If the actor is Kevin Bacon, then these are the actors with a Kevin Bacon degree of 1.
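The lookup steps above might be sketched as follows. The movies dictionary holds a few sample entries chosen for illustration, and the helper name costars is hypothetical; in class the dictionaries would be built from the IMDB file instead.

```python
# Hypothetical sample data: movie -> set of actors in that movie.
movies = {
    'Apollo 13': set(['Hanks, Tom', 'Bacon, Kevin', 'Paxton, Bill']),
    'Footloose': set(['Bacon, Kevin', 'Singer, Lori']),
    'Big': set(['Hanks, Tom', 'Perkins, Elizabeth']),
}

# Build the reverse dictionary: actor -> set of movies the actor is in.
actors = {}
for movie in movies:
    for actor in movies[movie]:
        if actor not in actors:
            actors[actor] = set()
        actors[actor].add(movie)

def costars(name):
    """Return everyone who shares at least one movie with name."""
    found = set()
    for movie in actors.get(name, set()):
        found |= movies[movie]      # union in the cast of each movie
    found.discard(name)             # an actor is not their own co-star
    return found

degree_one = costars('Bacon, Kevin')
print(sorted(degree_one))
```

Building actors from movies is also an instance of converting a dictionary with one kind of key into a dictionary with another.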
Attribute / Value Pairs¶
- We can use dictionaries to construct even more complicated data structures: dictionaries as values, lists of dictionaries, etc.
- Consider the problem of representing all the houses a real estate company is trying to sell.
- We could keep a list with information about each property, but a list of what?
- We will look at describing each house as a dictionary, with the keys being the “attributes”, and the values being, well, the values of the attributes.
- Examples include the listing reference number, the address, the number of bedrooms, the price, whether or not it has a pool, the style of the house, the age, etc.
- Some properties will not be known and therefore they will not be represented in the dictionary.
- We will work through a made-up example in class, producing a list of dictionaries. This list will be called houses.
- As an exercise, write code that finds all houses in our house list that have at least 4 bedrooms (attribute is bedrooms, value is an integer), a pool (attribute is pool, value is a string describing if the pool is above ground or below), for a price below $300,000 (attribute is price, value is an int).
- Overall, this is a simple Python implementation of the storage and access of information in a database.
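A minimal sketch of such a list of dictionaries, and of one way to approach the exercise, follows. Every listing is made up, and the key names reference and address are assumptions; only bedrooms, pool and price are named in the notes. It also assumes that a house without a pool simply has no pool key.

```python
# Made-up listings; every reference number, address and price is hypothetical.
houses = [
    {'reference': 1001, 'address': '12 Elm St', 'bedrooms': 4,
     'pool': 'above ground', 'price': 275000},
    {'reference': 1002, 'address': '5 Oak Ave', 'bedrooms': 5,
     'price': 290000},                     # no 'pool' key at all
    {'reference': 1003, 'address': '88 Pine Rd', 'bedrooms': 4,
     'pool': 'below ground', 'price': 340000},
    {'reference': 1004, 'address': '3 Birch Ln', 'pool': 'below ground',
     'price': 250000},                     # number of bedrooms unknown
]

matches = []
for house in houses:
    # Test that each attribute is present before using its value,
    # because unknown attributes are simply absent from the dictionary.
    if ('bedrooms' in house and house['bedrooms'] >= 4 and
            'pool' in house and
            'price' in house and house['price'] < 300000):
        matches.append(house)

for house in matches:
    print("%d %s" % (house['reference'], house['address']))
```

Checking with in first avoids a KeyError on houses whose attribute is unknown.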
Important Aside: Back to Copying and Aliasing¶
Exercise: what is the output of the following?
>>> d = dict()
>>> d[15] = 'hi'
>>> L = []
>>> L.append(d)
>>> d[20] = 'bye'
>>> L.append(d.copy())
>>> d[15] = 'hello'
>>> del d[20]
>>> L
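The result may surprise you, but it reflects the difference between making an alias to an object and making a copy of an object.
- An alias is a second name for the same object, so a change made through one name is visible through the other.
- A shallow copy, such as the one returned by d.copy(), is a new container that refers to the same values as the original.
- A deep copy, such as the one returned by copy.deepcopy(d), also copies the values, so nothing is shared.
Assignment between lists, between sets, and between dictionaries creates an alias, not a copy, and so does appending an object to a list!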
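The sketch below, using a made-up dictionary of sets, contrasts plain assignment with the dictionary's copy() method and with copy.deepcopy() from the standard copy module. The variable names are chosen for illustration only.

```python
import copy

original = {'Gru': set([123, 456])}
alias = original                  # a second name for the same dictionary
shallow = original.copy()         # new dictionary, same set object inside
deep = copy.deepcopy(original)    # new dictionary and a new set

original['Margo'] = set([456])    # changes the dictionary itself
original['Gru'].add(789)          # changes the set stored under 'Gru'

print(alias is original)          # True: both names refer to one object
print('Margo' in shallow)         # False: the copy has its own set of keys
print(789 in shallow['Gru'])      # True: the set is still shared
print(789 in deep['Gru'])         # False: the deep copy shares nothing
```

When the values are immutable, such as the strings in the exercise above, a copy made with copy() behaves as a fully independent dictionary.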
Accessing APIs¶
Many APIs (Application Programming Interfaces) return values as JSON strings which are actually easily loaded into Python objects.
We will demo accessing Twitter through an API and processing the returned JSON object.
Accessing Twitter requires two modules:
- oauth2 is open-source software for secure authorization, used by Twitter (and many others).
- simplejson is a Python tool to parse the JavaScript Object Notation (JSON) returned by Twitter.
Querying Twitter:
- Set up through oauth2 based on keys and secrets previously obtained.
- Search terms for the query are embedded in the URL.
A pair is returned, containing a dictionary of information about the process of generating the query result, and a string containing the query result itself.
simplejson is used to parse this query string into a dictionary with two entries:
- search_metadata, which is a dictionary of attributes
- statuses, which is a list of the actual tweets
Each tweet in the list is a dictionary; one of the entries in this dictionary is the actual text.
Overall, this is a complicated hierarchy of lists and dictionaries, with each dictionary storing attribute/value pairs.
- We will diagram it in class.
Once we understand the structure, we can write code to extract the information we want.
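As a rough illustration of that last step, the sketch below parses a tiny made-up JSON string shaped like the result described above. It uses the standard library json module in place of simplejson (simplejson.loads is called the same way), and it makes no network request. Apart from search_metadata and statuses, every field name, including text, user and screen_name, is an assumption made for the example.

```python
import json

# A tiny made-up response in the shape described above (hypothetical content).
response_text = '''
{
  "search_metadata": {"query": "python", "count": 2},
  "statuses": [
    {"id": 1, "text": "Dictionaries are handy",
     "user": {"screen_name": "gru"}},
    {"id": 2, "text": "Sets of movies per actor",
     "user": {"screen_name": "margo"}}
  ]
}
'''

result = json.loads(response_text)        # string -> nested dicts and lists

print(sorted(result.keys()))
tweet_texts = []
for tweet in result['statuses']:          # each tweet is a dictionary
    tweet_texts.append(tweet['text'])
    print("%s: %s" % (tweet['user']['screen_name'], tweet['text']))
```

Each level of indexing peels off one layer of the hierarchy: a dictionary, then a list, then a dictionary again.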
Dictionary Practice Problems¶
Create a dictionary to store the favorite colors of the following individuals
- Thomas prefers red
- Ashok prefers green
- Sandy prefers red
- Alison prefers orange
- Fei prefers green
- Natasha prefers blue
Then add some others of your own. Now, write code to change Fei’s preference to green and to remove Sandy’s preference from the dictionary.
Using the dictionary from the first problem, write code to find which color is most commonly preferred. Use a second dictionary, one that associates strings (representing the colors) with the counts. Output the most common color. If there are ties, output all tied colors.
Complete the fast, list solution to the movie counting problem based on sorting, as outlined at the start of the lecture notes.
Use a dictionary to determine which last names are most common in the IMDB data we have provided. Count individual people, not the movies they appear in. For example, 'Hanks, Tom' counts as one instance of the name 'Hanks' despite the fact that he is in many movies. Assume that the last name ends with the first ',' in the actual name. Start this problem by thinking about what the dictionary keys and values should be.
Which two individuals have the most movies in common? To solve this you will need to start from the dictionary that associates each individual with the set of movies s/he is involved in. Then you will need double for loops.
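For the last problem, the double for loop over pairs of people might look like the sketch below. The movies_of dictionary is a small hand-written sample and the variable names are hypothetical; with the real data it would be built from the IMDB file, and ties would need extra handling.

```python
# Hypothetical sample data: person -> set of movies they are involved in.
movies_of = {
    'Hanks, Tom': set(['Apollo 13', 'Toy Story', 'Toy Story 2']),
    'Bacon, Kevin': set(['Apollo 13', 'Footloose']),
    'Allen, Tim': set(['Toy Story', 'Toy Story 2', 'Galaxy Quest']),
    'Paxton, Bill': set(['Apollo 13', 'Twister']),
}

names = sorted(movies_of.keys())
best_pair = None
best_count = -1
for i in range(len(names)):
    for j in range(i + 1, len(names)):    # j > i so each pair is seen once
        common = movies_of[names[i]] & movies_of[names[j]]
        if len(common) > best_count:
            best_count = len(common)
            best_pair = (names[i], names[j])

print("%s and %s share %d movies" % (best_pair[0], best_pair[1], best_count))
```

Starting the inner loop at i + 1 avoids comparing a person with themselves and avoids visiting each pair twice.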
Summary¶
- Dictionaries of sets.
- Dictionaries where the keys are numbers.
- A variety of examples to extract information from the IMDB data set.
- Dictionaries as database — storing attribute / value pairs.
- Accessing Twitter information