Lecture 17 — Dictionaries, Part 2
==================================

Overview
--------

- More IMDB examples:

  - Dictionaries of string/set pairs
  - Converting dictionaries from one kind of key to another
  - Combining information from multiple dictionaries

- A different view of dictionaries: storing attribute/value pairs.
- Twitter example.

Recap of dictionaries
---------------------

- Dictionaries are like lists, except that the indices (keys) can be almost
  anything (any hashable value, such as a string, number, or tuple), not just
  integers starting at 0.

- The following two store the same information::

    >>> listoption = ['a','b','c']
    >>> dictoption = {0:'a', 1: 'b', 2:'c'}

- You would access them in the same way::

    >>> listoption[1]
    'b'
    >>> dictoption[1]
    'b'

- You would update them in the same way::

    >>> listoption[1] = 'd'
    >>> dictoption[1] = 'd'

- Of course, the point of a dictionary is that keys need not be integers::

    >>> d = {'Gru':3, 'Margo':4}
    >>> d['Gru']
    3

- This dictionary has strings as keys and integers as values. The values can
  be anything as well::

    >>> d2 = {'Gru': set( [123,456] ), 'Margo': set( [456] ) }
    >>> d2['Gru']
    set([123, 456])

- Since keys are not consecutive integers, you cannot use ``range`` to print
  or iterate through the values in a dictionary. Instead, the ``keys()``
  method returns all the keys in a dictionary::

    >>> d2.keys()
    ['Gru', 'Margo']
    >>> for key in d2.keys():
    ...     print key, d2[key]
    Gru set([123, 456])
    Margo set([456])

- These are really the most common operations on dictionaries: put a value,
  read a value, iterate through keys to get values.

- The ``values()`` method converts a dictionary to a list by throwing away
  the keys. For example::

    >>> d2.values()
    [set([123, 456]), set([456])]

Exercises
---------

#. Given the following dictionary of hobbies for people::

       hobby = {'Gru':set(['Hiking','Cooking']), 'Edith':set(['Hiking','Board Games'])}

   create a new dictionary that lists the people for each hobby::

       {'Hiking': set(['Gru','Edith']), 'Cooking':set(['Gru']), 'Board Games':set(['Edith'])}

#.
   Write a program that uses a dictionary associating integers (the keys)
   with sets of strings (the values) to find the number of movies in each
   year of the IMDB data. Start from::

       imdb_file = raw_input("Enter the name of the IMDB file ==> ").strip()
       years_and_movies = {}
       for line in open(imdb_file):
           words = line.strip().split('|')
           movie_name = words[1].strip()
           year = int(words[2])

#. Write additional code that uses the ``years_and_movies`` dictionary to
   find the year that has the most movies.

Dictionary for lookup
---------------------

- Let us now use two different dictionaries to go between pieces of
  information:

  - Movies: key: movies, value: sets of actors in the movie
  - Actors: key: actors, value: sets of movies that the actor has appeared in

- Given an actor:

  - Find all the movies she has starred in
  - For each movie, find the actors in that movie

- Play the degree of Kevin Bacon game! If the actor is Kevin Bacon, then the
  actors found this way are those with a Kevin Bacon degree of 1.

Attribute / Value Pairs
-----------------------

- We can use dictionaries to construct even more complicated data
  structures: dictionaries as values, lists of dictionaries, etc.

- Consider the problem of representing all the houses a real estate company
  is trying to sell.

- We could keep a list with information about each property, but a list of
  what?

- We will look at describing each house as a dictionary, with the keys being
  the “attributes”, and the values being, well, the values of the attributes.

  - Examples include the listing reference number, the address, the number
    of bedrooms, the price, whether or not it has a pool, the style of the
    house, the age, etc.
  - Some properties will not be known and therefore they will not be
    represented in the dictionary.

- We will work through a made-up example in class, producing a list of
  dictionaries. This list will be called ``houses``.
- As an **exercise**, write code that finds all houses in our house list
  that have at least 4 bedrooms (attribute is ``bedrooms``, value is an
  integer), a pool (attribute is ``pool``, value is a string describing
  whether the pool is above ground or below), and a price below $300,000
  (attribute is ``price``, value is an int).

- Overall, this is a simple Python implementation of the storage and access
  of information in a *database*.

Important Aside: Back to Copying and Aliasing
---------------------------------------------

- **Exercise:** what is the output of the following? ::

    >>> d = dict()
    >>> d[15] = 'hi'
    >>> L = []
    >>> L.append(d)
    >>> d[20] = 'bye'
    >>> L.append(d.copy())
    >>> d[15] = 'hello'
    >>> del d[20]
    >>> L

- The result may surprise you, but it reflects the difference between making
  an alias to an object and making a copy of an object.

  - An alias is a second name for the same object. No copy is made, so a
    change made through one name is visible through the other.
  - ``d.copy()`` makes a *shallow copy*: a new dictionary that refers to the
    same key and value objects as the original.
  - A full copy, in which nested objects are copied as well, is known as a
    *deep copy* (see ``copy.deepcopy``).

- Assignment between lists, between sets, and between dictionaries creates
  aliases, not copies!

Accessing APIs
--------------

- Many APIs (Application Programming Interfaces) return values as JSON
  strings, which are easily loaded into Python objects, often involving
  dictionaries.

- The best way to understand the dictionary structure returned by an API is
  to seek documentation. If that fails, you can print the top-level keys and
  values to explore.

- Public APIs do not require authentication and are accessed as follows::

    # Python 2; in Python 3 the equivalent call is urllib.request.urlopen
    import urllib
    import json

    url = "enter your public url here"
    f = urllib.urlopen(url)
    rawcontent = f.read()
    content = json.loads(rawcontent)

- A few examples of public APIs are (both used in our image lab):

  #. **nominatim**, which gives you a geolocation bounding box for a given
     location. Let's see this for 'Troy, NY'::

         url = "http://nominatim.openstreetmap.org/"\
               "search?q=%s&format=json&polygon_geojson=1&addressdetails=0"\
               %('Troy, NY')

  #.
     **panoramio**, which returns the URLs of pictures taken within a given
     geolocation bounding box. The following is for 'Troy, NY', with the box
     obtained from the above call::

         url = "http://www.panoramio.com/map/get_panoramas.php?set=public&"\
               "from=0&to=5&minx=%s&miny=%s&maxx=%s&maxy=%s&size=medium&mapfilter=true" \
               %('-73.8517851','42.5684117','-73.5317851','42.8884117')

- Many sources require authentication with an API key through the
  ``oauth2`` authentication module. The overall method of access remains the
  same after authentication.

- Once we understand the structure, we can write code to extract the
  information we want.

Dictionary Practice Problems
----------------------------

#. Create a dictionary to store the favorite colors of the following
   individuals:

   - Thomas prefers red
   - Ashok prefers green
   - Sandy prefers red
   - Alison prefers orange
   - Fei prefers green
   - Natasha prefers blue

   Then add some others of your own. Now, write code to change Fei’s
   preference to green and to remove Sandy’s preference from the dictionary.

#. Using the dictionary from the first problem, write code to find which
   color is most commonly preferred. Use a second dictionary, one that
   associates strings (representing the colors) with the counts. Output the
   most common color. If there are ties, output all tied colors.

#. Complete the fast, list solution to the movie counting problem based on
   sorting, as outlined at the start of the lecture notes.

#. Use a dictionary to determine which last names are most common in the
   IMDB data we have provided. Count individual people, not the movies they
   appear in. For example, ``'Hanks, Tom'`` counts as one instance of the
   name ``'Hanks'`` despite the fact that he is in many movies. Assume that
   the last name ends at the first ``','`` in the actual name. Start this
   problem by thinking about what the dictionary keys and values should be.

#. Which two individuals have the most movies in common?
   To solve this you will need to start from the dictionary that associates
   each individual with the set of movies s/he is involved in. Then you will
   need nested ``for`` loops.

Summary
-------

- Dictionaries of sets.
- Dictionaries where the keys are numbers.
- A variety of examples to extract information from the IMDB data set.
- Dictionaries as a database: storing attribute/value pairs.
- Accessing information from public APIs.
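
Sketch: Two-Dictionary Lookup
-----------------------------

The lookup steps listed under *Dictionary for lookup* (actor to movies, then
movies to actors) can be sketched roughly as follows. This is only an
illustration under stated assumptions, not the code used in class: the movie
and actor names are made up, the helper name ``costars`` is hypothetical, and
the ``actors`` dictionary is built here by inverting ``movies`` rather than
read from the IMDB file. It is written to run under both Python 2 and
Python 3.

```python
# Hypothetical toy data: these movie and actor names are invented for
# illustration and do not come from the IMDB file.
movies = {
    'Movie A': set(['Actor 1', 'Actor 2']),
    'Movie B': set(['Actor 2', 'Actor 3']),
    'Movie C': set(['Actor 4']),
}

# Build the second dictionary (actor -> set of movies) by inverting the first.
actors = {}
for movie in movies:
    for actor in movies[movie]:
        if actor not in actors:
            actors[actor] = set()
        actors[actor].add(movie)


def costars(name, actors, movies):
    """Return the set of people who share at least one movie with name."""
    found = set()
    # Step 1: every movie this actor appears in (empty set if unknown).
    for movie in actors.get(name, set()):
        # Step 2: everyone in that movie.
        found |= movies[movie]
    # The actor is not his or her own co-star.
    found.discard(name)
    return found


degree_one = costars('Actor 2', actors, movies)
print(sorted(degree_one))
```

Using sets for the values means that a co-star who shares several movies with
the starting actor is still recorded only once. Applying ``costars`` again to
each member of ``degree_one`` would be one way to reach degree 2.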