Lecture 17 — Dictionaries, Part 2

Overview

  • More IMDB examples:
    • Dictionaries of string/set pairs
    • Converting a dictionary keyed one way into a dictionary keyed another way
    • Combining information from multiple dictionaries
  • A different view of dictionaries: storing attribute/value pairs.
  • Twitter example.

Recap of dictionaries

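  • Dictionaries are like lists, except that the indices (keys) can be almost anything, not just integers starting at 0. The one restriction is that a key must be hashable, which in practice means immutable: strings, numbers and tuples work, but lists and sets do not.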

  • The following two hold the same values under the same indices:

    >>> listoption = ['a','b','c']
    >>> dictoption = {0:'a', 1: 'b', 2:'c'}
    
  • You would access them in the same way:

    >>> listoption[1]
    'b'
    >>> dictoption[1]
    'b'
    
  • You would update them in the same way:

    >>> listoption[1] = 'd'
    >>> dictoption[1] = 'd'
    
  • Of course, the point of a dictionary is that keys do not have to be integers:

    >>> d = {'Gru':3, 'Margo':4}
    >>> d['Gru']
    3
    
  • This dictionary has strings as keys and integers as values. The values can be anything as well:

    >>> d2 = {'Gru': set( [123,456] ), 'Margo': set( [456] ) }
    >>> d2['Gru']
    set([123, 456])
    
  • Since keys need not be consecutive integers, you cannot use range to print or iterate through the values in a dictionary. Instead, the keys() method returns all the keys in a dictionary (in no particular order):

    >>> d2.keys()
    ['Gru', 'Margo']
    >>> for key in d2.keys():
    ...     print key, d2[key]
    
    Gru set([123, 456])
    Margo set([456])
    
  • These are really the most common operations on dictionaries: put a value, read a value, iterate through keys to get values.

  • The values() method converts a dictionary to a list of its values, throwing away the keys. For example:

    >>> d2.values()
    [set([123, 456]), set([456])]
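
  • A small sketch of the put / read / iterate operations above, together with one caveat about keys: they must be hashable, so a tuple works as a key but a list does not. The names and values here are made up for illustration, and the code is written so that it should run unchanged under Python 2 or 3 (hence the parenthesized print).

```python
# Keys must be hashable: strings, numbers and tuples work, lists do not.
d = {}
d['Gru'] = 3                 # put a value under a string key
d[(42.7, -73.7)] = 'Troy'    # a tuple is a valid key as well

try:
    d[['a', 'list']] = 1     # a list is mutable, so it cannot be a key
    list_key_ok = True
except TypeError:
    list_key_ok = False

# Read a value; 'in' tests for a key without risking a KeyError.
gru_value = d['Gru']
has_margo = 'Margo' in d

# Iterate through keys to get values (sorted by their text form so the
# order is predictable even though the keys have different types).
for key in sorted(d, key=str):
    print("%s -> %s" % (key, d[key]))
```

  • Testing with in before reading is a common way to avoid a KeyError when a key might be missing.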
    

Exercises

  1. Given the following dictionary of people's hobbies:

    hobby = {'Gru':set(['Hiking','Cooking']), 'Edith':set(['Hiking','Board Games'])}
    

    create a new dictionary that lists people for each hobby:

    {'Hiking': set(['Gru','Edith']), 'Cooking':set(['Gru']), 'Board Games':set(['Edith'])}
    
  2. Write a program that uses a dictionary associating integers (the keys) with sets of strings (the values) to find the number of movies in each year of the IMDB data. Start from:

    imdb_file = raw_input("Enter the name of the IMDB file ==> ").strip()
    years_and_movies = {}
    for line in open(imdb_file):
        words = line.strip().split('|')
        movie_name = words[1].strip()
        year = int(words[2])
    
  3. Write additional code that uses the years_and_movies dictionary to find the year that has the most movies.
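
  One possible shape for exercises 2 and 3 is sketched below. To keep it self-contained it reads from a short in-memory list instead of a file; the sample lines are illustrative stand-ins, and the 'name | movie | year' layout is an assumption inferred from the starter code. Try the exercises yourself before reading it.

```python
# Illustrative sample lines in the 'name | movie | year' layout
# implied by the starter code (an assumption, not the real file).
sample_lines = [
    "Hanks, Tom | Big | 1988",
    "Perkins, Elizabeth | Big | 1988",
    "Hanks, Tom | Apollo 13 | 1995",
    "Bacon, Kevin | Apollo 13 | 1995",
    "Bacon, Kevin | Footloose | 1984",
    "Paxton, Bill | Twister | 1996",
    "Allen, Tim | Toy Story | 1995",
]

years_and_movies = {}
for line in sample_lines:
    words = line.strip().split('|')
    movie_name = words[1].strip()
    year = int(words[2])
    # Create an empty set the first time a year is seen, then add the movie.
    if year not in years_and_movies:
        years_and_movies[year] = set()
    years_and_movies[year].add(movie_name)

# Number of distinct movies per year.
for year in sorted(years_and_movies):
    print("%d: %d" % (year, len(years_and_movies[year])))

# Year with the most movies (if several years tie, only one is reported).
best_year = max(years_and_movies, key=lambda y: len(years_and_movies[y]))
print("Most movies: %d" % best_year)
```

  Storing a set per year, instead of a plain count, means a movie that appears on several lines (once per actor) is only counted once.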

Dictionary for lookup

  • Let us now use two different dictionaries to go between pieces of information
    • Movies: key: movies, value: sets of actors in the movie
    • Actors: key: actors, value: sets of movies that the actor has appeared in
  • Given an actor:
    • Find all the movies she has starred in
    • For each of those movies, find the actors in it
  • Play the degrees of Kevin Bacon game! If the actor is Kevin Bacon, then the actors found this way are those with a Kevin Bacon degree of 1.
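
  • The two-dictionary lookup can be sketched as follows. The (actor, movie) pairs are a small illustrative sample, not the class data, and costars is a hypothetical helper name. The setdefault call is shorthand for "create an empty set if the key is missing, then return the set".

```python
# Illustrative (actor, movie) pairs standing in for the IMDB file.
pairs = [
    ("Bacon, Kevin", "Apollo 13"),
    ("Hanks, Tom", "Apollo 13"),
    ("Paxton, Bill", "Apollo 13"),
    ("Bacon, Kevin", "Footloose"),
    ("Singer, Lori", "Footloose"),
    ("Hanks, Tom", "Big"),
    ("Perkins, Elizabeth", "Big"),
]

movies = {}   # movie -> set of actors in it
actors = {}   # actor -> set of movies the actor appears in
for actor, movie in pairs:
    movies.setdefault(movie, set()).add(actor)
    actors.setdefault(actor, set()).add(movie)


def costars(name):
    """Return everyone who shares at least one movie with name."""
    found = set()
    for movie in actors.get(name, set()):
        found |= movies[movie]    # union in the cast of each movie
    found.discard(name)           # an actor is not their own co-star
    return found


degree_one = costars("Bacon, Kevin")
print(sorted(degree_one))
```

  • Applying costars again to each actor in degree_one, and removing anyone already seen, would give the actors with degree 2.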

Attribute / Value Pairs

  • We can use dictionaries to construct even more complicated data structures: dictionaries as values, lists of dictionaries, etc.
  • Consider the problem of representing all the houses a real estate company is trying to sell.
  • We could keep a list with information about each property, but a list of what?
  • We will look at describing each house as a dictionary, with the keys being the “attributes”, and the values being, well, the values of the attributes.
  • Examples include the listing reference number, the address, the number of bedrooms, the price, whether or not it has a pool, the style of the house, the age, etc.
    • Some properties will not be known and therefore they will not be represented in the dictionary.
  • We will work through a made-up example in class, producing a list of dictionaries. This list will be called houses.
  • As an exercise, write code that finds all houses in our houses list that have at least 4 bedrooms (attribute is bedrooms, value is an integer), have a pool (attribute is pool, value is a string describing whether the pool is above or below ground), and are priced below $300,000 (attribute is price, value is an integer).
  • Overall, this is a simple Python implementation of the storage and access of information in a database.
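
  • A minimal sketch of a houses list and the search described above. The listings are made up, and the ref attribute name is an assumption chosen for illustration; only bedrooms, pool and price come from the exercise statement.

```python
# Made-up listings. Attributes that are unknown are simply left out.
houses = [
    {'ref': 101, 'bedrooms': 4, 'pool': 'below ground', 'price': 285000},
    {'ref': 102, 'bedrooms': 5, 'price': 240000},               # no pool
    {'ref': 103, 'bedrooms': 3, 'pool': 'above ground', 'price': 199000},
    {'ref': 104, 'bedrooms': 4, 'pool': 'above ground', 'price': 310000},
    {'ref': 105, 'pool': 'below ground', 'price': 150000},      # bedrooms unknown
]

matches = []
for house in houses:
    # Test that each key is present before using its value, since a
    # missing attribute would otherwise raise a KeyError.
    if ('bedrooms' in house and house['bedrooms'] >= 4
            and 'pool' in house
            and 'price' in house and house['price'] < 300000):
        matches.append(house)

for house in matches:
    print("Listing %d: %d bedrooms, %s pool, $%d" %
          (house['ref'], house['bedrooms'], house['pool'], house['price']))
```

  • Because each house is its own dictionary, adding a new attribute to one listing does not require changing any of the others.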

Important Aside: Back to Copying and Aliasing

  • Exercise: what is the output of the following?

    >>> d = dict()
    >>> d[15] = 'hi'
    >>> L = []
    >>> L.append(d)
    >>> d[20] = 'bye'
    >>> L.append(d.copy())
    >>> d[15] = 'hello'
    >>> del d[20]
    >>> L
    
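  • The result may surprise you, but it reflects the difference between making an alias to an object and making a copy of it.

    • An alias is a second name for the same object; a change made through either name is visible through both.
    • d.copy() makes a shallow copy: a new dictionary, although any mutable values stored in it are still shared with the original.
    • A full (deep) copy, made with copy.deepcopy, duplicates the stored values as well.
  • Assignment between lists, between sets, and between dictionaries creates an alias, not a copy!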
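
  • A minimal sketch separating three cases in Python terms: plain assignment (an alias), dict.copy() (which the Python documentation describes as a shallow copy), and copy.deepcopy() (a deep copy). The dictionary contents are made up for illustration, and this is a different example from the exercise above, so the exercise is still worth working by hand.

```python
import copy

original = {'Gru': ['Hiking'], 'Margo': ['Reading']}

alias = original                 # a second name for the same dictionary
shallow = original.copy()        # new dictionary, but the lists inside are shared
deep = copy.deepcopy(original)   # new dictionary and new lists

original['Edith'] = ['Board Games']   # add a key to the original
original['Gru'].append('Cooking')     # change a list stored as a value

print(sorted(alias.keys()))     # the alias sees the new key
print(sorted(shallow.keys()))   # the shallow copy does not
print(shallow['Gru'])           # but it does see the change inside the shared list
print(deep['Gru'])              # the deep copy is unaffected by both changes
```

  • The is operator reports whether two names refer to the same object: alias is original is True, while shallow is original is False.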

Accessing APIs

  • Many APIs (Application Programming Interfaces) return values as JSON strings, which are easily loaded into Python objects, often dictionaries.

  • The best way to understand the dictionary structure returned by an API is to seek documentation. If that fails, you can print the top-level keys and values to explore.

  • Public APIs do not require authentication and are accessed as follows:

    import urllib
    import json
    
    url = "enter your public url here"
    f = urllib.urlopen(url)
    rawcontent = f.read()
    content = json.loads(rawcontent)
    
  • Two examples of public APIs (both used in our image lab):

    1. Nominatim, which gives you the geographic bounding box for a given location. Let’s see this for ‘Troy, NY’:

      url = "http://nominatim.openstreetmap.org/"\
            "search?q=%s&format=json&polygon_geojson=1&addressdetails=0"\
            %('Troy, NY')
      
    2. Panoramio, which returns the URLs of pictures taken within a given bounding box. The following is for ‘Troy, NY’, using the box obtained from the above call:

      url = "http://www.panoramio.com/map/get_panoramas.php?set=public&"\
            "from=0&to=5&minx=%s&miny=%s&maxx=%s&maxy=%s&size=medium&mapfilter=true" \
            %('-73.8517851','42.5684117','-73.5317851','42.8884117')
      
  • Many sources require authentication with an API key through the oauth2 authentication module, but the overall method of access remains the same after authentication.

  • Once we understand the structure, we can write code to extract the information we want.
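
  • To illustrate exploring a loaded JSON structure without making a network call, the sketch below parses a hand-written JSON string. The field names (count, photos, title, owner, and so on) are invented for this example and are not the actual format returned by either service above.

```python
import json

# A hand-written JSON string standing in for the text an API call might return.
rawcontent = '''
{"count": 2,
 "photos": [
    {"title": "Bridge at dusk", "width": 500, "owner": {"name": "sam"}},
    {"title": "River view", "width": 640, "owner": {"name": "lee"}}
 ]}
'''

content = json.loads(rawcontent)

# Print each top-level key with the type of its value to see the overall shape.
for key in sorted(content.keys()):
    print("%s: %s" % (key, type(content[key]).__name__))

# Once the shape is known, drill down to the pieces of interest.
titles = [photo['title'] for photo in content['photos']]
owners = [photo['owner']['name'] for photo in content['photos']]
print(titles)
print(owners)
```

  • JSON objects become dictionaries and JSON arrays become lists, so the same indexing and iteration used throughout this lecture applies at every level.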

Dictionary Practice Problems

  1. Create a dictionary to store the favorite colors of the following individuals

    • Thomas prefers red
    • Ashok prefers green
    • Sandy prefers red
    • Alison prefers orange
    • Fei prefers green
    • Natasha prefers blue

    Then add some others of your own. Now, write code to change Fei’s preference to green and to remove Sandy’s preference from the dictionary.

  2. Using the dictionary from the first problem, write code to find which color is most commonly preferred. Use a second dictionary, one that associates strings (representing the colors) with the counts. Output the most common color. If there are ties, output all tied colors.

  3. Complete the fast, list solution to the movie counting problem based on sorting, as outlined at the start of the lecture notes.

  4. Use a dictionary to determine which last names are most common in the IMDB data we have provided. Count individual people, not the movies they appear in. For example, 'Hanks, Tom' counts as one instance of the name 'Hanks' despite the fact that he is in many movies. Assume that the last name ends at the first ',' in the actual name. Start this problem by thinking about what the dictionary keys and values should be.

  5. Which two individuals have the most movies in common? To solve this you will need to start from the dictionary that associates each individual with the set of movies s/he is involved in. Then you will need nested for loops, one inside the other, to compare every pair of individuals.

Summary

  • Dictionaries of sets.
  • Dictionaries where the keys are numbers.
  • A variety of examples to extract information from the IMDB data set.
  • Dictionaries as a database: storing attribute / value pairs.
  • Accessing information from public APIs