Lecture 18 — Dictionaries, Part 2
===================================

Overview
--------

- More IMDB examples:

  - Dictionaries of string/set pairs
  - Converting dictionaries with one key to another
  - Combining information from multiple dictionaries

- A different view of dictionaries: storing attribute/value pairs.
- Twitter example.

Recap of dictionaries
---------------------

- Dictionaries are like lists, except that you can use almost anything
  for the indices (keys), not just numbers starting with 0.

- The following two behave the same way when indexed::

    >>> listoption = ['a','b','c']
    >>> dictoption = {0:'a', 1: 'b', 2:'c'}

- You would access them in the same way::

    >>> listoption[1]
    'b'
    >>> dictoption[1]
    'b'

- You would update them in the same way::

    >>> listoption[1] = 'd'
    >>> dictoption[1] = 'd'

- Of course, the point of a dictionary is that keys are not limited to
  integers: any immutable value, such as a string, a number, or a tuple
  of these, can be a key::

    >>> d = {'Gru':3, 'Margo':4}
    >>> d['Gru']
    3

- This dictionary has strings as keys and integers as values. The values
  can be anything at all, including sets::

    >>> d2 = {'Gru': set( [123,456] ), 'Margo': set( [456] ) }
    >>> d2['Gru']
    set([123, 456])

- Since keys need not be consecutive integers, you cannot use ``range``
  to print or iterate through the values in a dictionary. Instead, use
  ``keys()``, which returns all the keys in a dictionary::

    >>> d2.keys()
    ['Gru', 'Margo']
    >>> for key in d2.keys():
    ...     print key, d2[key]
    ...
    Gru set([123, 456])
    Margo set([456])

- These are the most common operations on dictionaries: put a value,
  read a value, and iterate through the keys to get the values.

- The ``values()`` method converts a dictionary to a list by throwing
  away the keys. For example::

    >>> d2.values()
    [set([123, 456]), set([456])]

Exercises
---------

#. Given the following dictionary of hobbies for people::

     hobby = {'Gru':set(['Hiking','Cooking']), 'Edith':set(['Hiking','Board Games'])}

   write code that creates a new dictionary listing the people for each
   hobby::

     {'Hiking': set(['Gru','Edith']), 'Cooking':set(['Gru']), 'Board Games':set(['Edith'])}

#. Write a program that uses a dictionary associating integers (the keys)
   with sets of strings (the values) to find the number of movies in each
   year of the IMDB data. Start from::

     imdb_file = raw_input("Enter the name of the IMDB file ==> ").strip()
     years_and_movies = {}
     for line in open(imdb_file):
         words = line.strip().split('|')
         movie_name = words[1].strip()
         year = int(words[2])

#. Write additional code that uses the ``years_and_movies`` dictionary to
   find the year that has the most movies.

Dictionary for lookup
---------------------

- Let us now use two different dictionaries to go between pieces of
  information:

  - Movies: the key is a movie, the value is the set of actors in it
  - Actors: the key is an actor, the value is the set of movies the
    actor appears in

- Given an actor:

  - Find all the movies she has starred in
  - For each movie, find the actors in such movies
  - Play the degree of Kevin Bacon game! If the actor is Kevin Bacon,
    the actors found this way (other than Kevin Bacon himself) have a
    Kevin Bacon degree of 1.

Attribute / Value Pairs
-----------------------

- We can use dictionaries to construct even more complicated data
  structures: dictionaries as values, lists of dictionaries, etc.
- Consider the problem of representing all the houses a real estate
  company is trying to sell.
- We could keep a list with information about each property, but a list
  of what?
- We will describe each house as a dictionary, with the keys being the
  “attributes” and the values being, well, the values of the attributes.
- Examples include the listing reference number, the address, the number
  of bedrooms, the price, whether or not it has a pool, the style of the
  house, the age, etc.
- Some attributes will not be known for a given house, and those
  attributes will simply not appear in that house's dictionary.
- We will work through a made-up example in class, producing a list of
  dictionaries. This list will be called ``houses``.
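A minimal sketch of what such a list might look like, assuming a few made-up houses: the addresses and prices, and the ``ref``, ``address`` and ``style`` attribute names, are hypothetical and not the in-class example. Because an unknown attribute is simply absent from a house's dictionary, the code checks for it before reading it.

```python
# Hypothetical listings (made up for illustration): each house is a
# dictionary of attribute/value pairs, and unknown attributes are left out.
houses = [
    {'ref': 1001, 'address': '12 Elm St', 'bedrooms': 3,
     'price': 250000, 'style': 'ranch'},
    {'ref': 1002, 'address': '8 Oak Ave', 'bedrooms': 5,
     'pool': 'below ground', 'price': 420000},
    {'ref': 1003, 'address': '3 Pine Rd', 'price': 199000},  # bedrooms unknown
]

# Option 1: test for the attribute with 'in' before reading it.
for house in houses:
    if 'bedrooms' in house:
        print('{0}: {1} bedrooms'.format(house['address'], house['bedrooms']))
    else:
        print('{0}: bedrooms unknown'.format(house['address']))

# Option 2: get() returns None (or a chosen default) instead of raising KeyError.
with_pool = [house['ref'] for house in houses if house.get('pool') is not None]
print(with_pool)
```

Running it prints one line per address (the third reporting the bedrooms as unknown) followed by ``[1002]``. Writing ``houses[2]['bedrooms']`` directly would instead raise a ``KeyError``.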
- As an **exercise**, write code that finds all houses in our house list
  that have at least 4 bedrooms (attribute is ``bedrooms``, value is an
  integer), a pool (attribute is ``pool``, value is a string describing
  whether the pool is above ground or below), and a price below $300,000
  (attribute is ``price``, value is an integer).
- Overall, this is a simple Python implementation of storing and
  accessing information in a *database*.

Important Aside: Back to Copying and Aliasing
---------------------------------------------

- **Exercise:** what is the output of the following? ::

    >>> d = dict()
    >>> d[15] = 'hi'
    >>> L = []
    >>> L.append(d)
    >>> d[20] = 'bye'
    >>> L.append(d.copy())
    >>> d[15] = 'hello'
    >>> del d[20]
    >>> L

- The result may surprise you, but it reflects the difference between
  making an alias to an object and making a copy of an object.

  - An *alias* is a second name for the same object: no copy is made, so
    a change made through one name is visible through the other.
  - A *shallow copy*, such as ``d.copy()``, is a new dictionary holding
    the same values as the original.
  - A *deep copy* (made with ``copy.deepcopy``) also copies the objects
    stored inside, so nothing is shared.

- Assignment between lists, between sets, and between dictionaries
  creates aliases, not copies!

Accessing APIs
--------------

- Many APIs (Application Programming Interfaces) return their results as
  JSON strings, which are easily loaded into Python lists and
  dictionaries.
- We will demo accessing Twitter through its API and processing the
  returned JSON object.
- Accessing Twitter requires two modules:

  - ``oauth2`` is an open-source Python library implementing OAuth, the
    secure authorization protocol used by Twitter (and many others).
  - ``simplejson`` is a Python module for parsing the JavaScript Object
    Notation (JSON) returned by Twitter.

- Querying Twitter:

  - Set up through ``oauth2`` based on keys and secrets previously
    obtained.
  - Search terms for the query are embedded in the URL.
  - A pair is returned, containing a dictionary of information about the
    process of generating the query result, and a string containing the
    query result itself.
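The query-result string in that pair is JSON text. A minimal sketch of parsing such a string, assuming the standard-library ``json`` module as a stand-in for ``simplejson`` (both provide ``loads``). The string is invented and far smaller than a real Twitter reply: only its ``search_metadata``/``statuses``/``text`` layout follows these notes, and the other field names (``count``, ``query``, ``id``) are hypothetical.

```python
# Assumption: the standard-library json module stands in for simplejson;
# simplejson.loads is used the same way.
import json

# A made-up query result string, shaped like a (much larger) Twitter reply.
query_result = '''
{
  "search_metadata": {"count": 2, "query": "python"},
  "statuses": [
    {"id": 1, "text": "Learning Python dictionaries"},
    {"id": 2, "text": "JSON maps onto dicts and lists"}
  ]
}
'''

data = json.loads(query_result)      # outer JSON object -> dict
metadata = data['search_metadata']   # nested JSON object -> dict
tweets = data['statuses']            # JSON array -> list of dicts

print(metadata['count'])
for tweet in tweets:
    print(tweet['text'])             # the actual text of each tweet
```

Each JSON object becomes a Python dictionary and each JSON array becomes a list, so reaching a tweet's text is ordinary indexing, e.g. ``data['statuses'][0]['text']``.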
- ``simplejson`` is used to *parse* this query string into a dictionary:

  - Two entries:

    - ``search_metadata``, whose value is a dictionary of attributes
    - ``statuses``, whose value is a list of the actual tweets

  - Each tweet in the list is a dictionary; one of the entries in this
    dictionary is the actual text.

- Overall, this is a complicated hierarchy of lists and dictionaries,
  with each dictionary storing attribute/value pairs.
- We will diagram it in class.
- Once we understand the structure, we can write code to extract the
  information we want.

Dictionary Practice Problems
----------------------------

#. Create a dictionary to store the favorite colors of the following
   individuals:

   - Thomas prefers red
   - Ashok prefers green
   - Sandy prefers red
   - Alison prefers orange
   - Fei prefers green
   - Natasha prefers blue

   Then add some others of your own. Now, write code to change Fei's
   preference from green to a different color and to remove Sandy's
   preference from the dictionary.

#. Using the dictionary from the first problem, write code to find which
   color is most commonly preferred. Use a second dictionary, one that
   associates strings (representing the colors) with the counts. Output
   the most common color. If there are ties, output all tied colors.

#. Complete the fast, list-based solution to the movie counting problem
   based on sorting, as outlined at the start of the lecture notes.

#. Use a dictionary to determine which last names are most common in the
   IMDB data we have provided. Count individual people, not the movies
   they appear in. For example, ``'Hanks, Tom'`` counts as one instance of
   the name ``'Hanks'`` despite the fact that he is in many movies. Assume
   that the last name ends at the first ``','`` in the actual name. Start
   this problem by thinking about what the dictionary keys and values
   should be.

#. Which two individuals have the most movies in common? To solve this,
   you will need to start from the dictionary that associates each
   individual with the set of movies s/he is involved in. Then you will
   need nested (double) ``for`` loops.

Summary
-------

- Dictionaries of sets.
- Dictionaries where the keys are numbers.
- A variety of examples to extract information from the IMDB data set.
- Dictionaries as databases: storing attribute/value pairs.
- Accessing Twitter information.