Lecture 17 — Dictionaries, Part 2

Overview

  • More IMDB examples:
    • Dictionaries of string/set pairs
    • Converting dictionaries with one key to another
    • Combining information from multiple dictionaries
  • A different view of dictionaries: storing attribute/value pairs.
  • Twitter example.

Recap of dictionaries

  • Dictionaries are like lists, except that the indices (keys) can be almost any immutable value, such as strings or tuples, not just integers starting at 0.

  • The following two hold the same values at the same indices:

    >>> listoption = ['a','b','c']
    >>> dictoption = {0:'a', 1: 'b', 2:'c'}
    
  • You would access them in the same way:

    >>> listoption[1]
    'b'
    >>> dictoption[1]
    'b'
    
  • You would update them in the same way:

    >>> listoption[1] = 'd'
    >>> dictoption[1] = 'd'
    
  • Of course the point of a dictionary is that keys do not have to be integers:

    >>> d = {'Gru':3, 'Margo':4}
    >>> d['Gru']
    3
    
  • This dictionary has strings as keys and integers as values. The values can be anything as well:

    >>> d2 = {'Gru': set( [123,456] ), 'Margo': set( [456] ) }
    >>> d2['Gru']
    set([123, 456])
    
  • Since keys need not be consecutive integers, you cannot use range to print or iterate through the values in a dictionary. Instead, keys() returns all the keys in a dictionary (in no guaranteed order):

    >>> d2.keys()
    ['Gru', 'Margo']
    >>> for key in d2.keys():
    ...     print key, d2[key]
    
    Gru set([123, 456])
    Margo set([456])
    
  • These are really the most common operations on dictionaries: put a value, read a value, iterate through keys to get values.

  • The values() method converts a dictionary to a list by throwing away the keys. For example:

    >>> d2.values()
    [set([123, 456]), set([456])]
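
  • The three common operations (put, read, iterate) can be sketched in one place. This is a minimal illustration on made-up data; the names pairs and friends are hypothetical and not part of the lecture examples. It is written to behave the same under Python 2 and Python 3.

```python
# Made-up (person, friend) pairs, used only for illustration.
pairs = [('Gru', 'Margo'), ('Gru', 'Edith'), ('Margo', 'Edith'), ('Gru', 'Margo')]

friends = {}
for person, friend in pairs:
    # Put: create an empty set the first time a key is seen.
    if person not in friends:
        friends[person] = set()
    friends[person].add(friend)   # the set ignores the repeated pair

# Read: look up one key.
gru_count = len(friends['Gru'])

# Iterate: visit the keys in sorted order so the output is predictable.
for person in sorted(friends.keys()):
    print(person + ': ' + ', '.join(sorted(friends[person])))
```

    Testing "person not in friends" before adding avoids the KeyError that friends[person] would raise for a key that has not been stored yet.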
    

Exercises

  1. Given the following dictionary of hobbies for people:

    hobby = {'Gru':set(['Hiking','Cooking']), 'Edith':set(['Hiking','Board Games'])}
    

    write code that creates a new dictionary listing the people for each hobby:

    {'Hiking': set(['Gru','Edith']), 'Cooking':set(['Gru']), 'Board Games':set(['Edith'])}
    
  2. Write a program that uses a dictionary associating integers (the keys) with sets of strings (the values) to find the number of movies in each year of the IMDB data. Start from

    imdb_file = raw_input("Enter the name of the IMDB file ==> ").strip()
    years_and_movies = {}
    for line in open(imdb_file):
        words = line.strip().split('|')
        movie_name = words[1].strip()
        year = int(words[2])
    
  3. Write additional code that uses the years_and_movies dictionary to find the year that has the most movies.

Dictionary for lookup

  • Let us now use two different dictionaries to go between pieces of information:
    • Movies: key: movies, value: sets of actors in the movie
    • Actors: key: actors, value: sets of movies that the actor has appeared in
  • Given an actor:
    • Find all the movies she has starred in
    • For each of these movies, find the other actors in it
  • Play the degree of Kevin Bacon game! If the actor is Kevin Bacon, then the actors found this way have a Kevin Bacon degree of 1.
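
  • The two-dictionary lookup can be sketched as follows. This is only an illustration under stated assumptions: the movie and actor names other than Kevin Bacon are made up, and the helper costars is a hypothetical name, not code from the lecture. Real data would be read from the IMDB file.

```python
# Made-up data for illustration only.
movies = {
    'Movie A': set(['Kevin Bacon', 'Actor X']),
    'Movie B': set(['Kevin Bacon', 'Actor Y', 'Actor Z']),
    'Movie C': set(['Actor Z', 'Actor W']),
}

# Build the second dictionary: actor -> set of movies.
actors = {}
for movie in movies:
    for actor in movies[movie]:
        if actor not in actors:
            actors[actor] = set()
        actors[actor].add(movie)

def costars(name):
    """Return everyone who shares at least one movie with name."""
    found = set()
    for movie in actors.get(name, set()):
        found = found | movies[movie]   # union in the cast of this movie
    found.discard(name)                 # an actor is not their own co-star
    return found

degree_one = costars('Kevin Bacon')
print(sorted(degree_one))
```

    Calling costars again on each member of degree_one, and removing the people already seen, would give the actors of degree 2.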

Attribute / Value Pairs

  • We can use dictionaries to construct even more complicated data structures: dictionaries as values, lists of dictionaries, etc.
  • Consider the problem of representing all the houses a real estate company is trying to sell.
  • We could keep a list with information about each property, but a list of what?
  • We will look at describing each house as a dictionary, with the keys being the “attributes”, and the values being, well, the values of the attributes.
  • Examples include the listing reference number, the address, the number of bedrooms, the price, whether or not it has a pool, the style of the house, the age, etc.
    • Some properties will not be known and therefore they will not be represented in the dictionary.
  • We will work through a made-up example in class, producing a list of dictionaries. This list will be called houses.
  • As an exercise, write code that finds all houses in our house list that have at least 4 bedrooms (attribute is bedrooms, value is an integer), a pool (attribute is pool, value is a string describing whether the pool is above or below ground), and a price below $300,000 (attribute is price, value is an int).
  • Overall, this is a simple Python implementation of the storage and access of information in a database.
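
  • A minimal sketch of a list of attribute/value dictionaries, assuming made-up listings. The attribute names bedrooms, pool and price follow the notes; ref and address, and the helper cheaper_than, are hypothetical names chosen for illustration. It shows a simpler query than the exercise and how to cope with attributes that are missing.

```python
# Made-up listings; attributes that are not known are simply left out.
houses = [
    {'ref': 101, 'address': '12 Elm St', 'bedrooms': 3, 'price': 250000},
    {'ref': 102, 'address': '8 Oak Ave', 'bedrooms': 5, 'price': 410000,
     'pool': 'below ground'},
    {'ref': 103, 'address': '3 Pine Rd', 'bedrooms': 2},   # price unknown
]

def cheaper_than(listings, limit):
    """Return the listings whose price is known and below limit."""
    result = []
    for house in listings:
        # Test for the key first: house['price'] would raise KeyError
        # for a listing that has no price attribute.
        if 'price' in house and house['price'] < limit:
            result.append(house)
    return result

for house in cheaper_than(houses, 300000):
    print(house['ref'])
```

    The same "key in dictionary" test applies to any attribute that may be absent, such as pool.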

Important Aside: Back to Copying and Aliasing

  • Exercise: what is the output of the following?

    >>> d = dict()
    >>> d[15] = 'hi'
    >>> L = []
    >>> L.append(d)
    >>> d[20] = 'bye'
    >>> L.append(d.copy())
    >>> d[15] = 'hello'
    >>> del d[20]
    >>> L
    
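  • The result may surprise you, but it reflects the difference between making an alias to an object and making a copy of an object.

    • An alias is just another name for the same object, so a change made through one name is visible through the other.
    • d.copy() makes a new dictionary. It is a shallow copy: the dictionary itself is new, but any mutable values stored in it are still shared. A deep copy (copy.deepcopy) copies the stored values as well.
  • Assignment between lists, between sets, and between dictionaries, as well as appending one of them to a list, creates aliases, not copies!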
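
  • A small sketch, separate from the exercise above, of three ways to get a "second" dictionary. It uses the Python standard library's terms: plain assignment gives an alias, dict.copy() gives a shallow copy, and copy.deepcopy() gives a deep copy. The variable names are made up for illustration.

```python
import copy

original = {'tags': ['a', 'b']}

alias = original                  # same dictionary, second name
shallow = original.copy()         # new dictionary, same inner list
deep = copy.deepcopy(original)    # new dictionary, new inner list

original['count'] = 1             # add a key to the original dictionary
original['tags'].append('c')      # change the list stored inside it

print(alias is original)          # True: one object, two names
print(sorted(shallow.keys()))     # ['tags']: the new key is not in the copy
print(shallow['tags'])            # ['a', 'b', 'c']: the inner list is shared
print(deep['tags'])               # ['a', 'b']: fully independent
```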

Accessing APIs

  • Many APIs (Application Programming Interfaces) return values as JSON strings, which are easily loaded into Python objects (dictionaries and lists).

  • We will demo accessing Twitter through an API and processing the returned JSON object.

  • Accessing Twitter requires two modules:

    • oauth2 is an open-source module for secure authorization, which Twitter (and many others) require
    • simplejson is a Python tool to parse JavaScript Object Notation (JSON) returned by Twitter.
  • Querying Twitter:

    • Set up through oauth2 based on keys and secrets previously obtained.
    • Search terms for the query are embedded in the URL
  • A pair is returned, containing a dictionary of information about the process of generating the query result, and a string containing the query result itself.

  • simplejson is used to parse the query result string into a dictionary:

    • Two entries:

      • search_metadata, which is a dictionary of attributes
      • statuses, which is a list of the actual tweets
    • Each tweet in the list is a dictionary; one of the entries in this dictionary is the actual text.

  • Overall, this is a complicated hierarchy of lists and dictionaries, with each dictionary storing attribute/value pairs.

    • We will diagram it in class.
  • Once we understand the structure, we can write code to extract the information we want.
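
  • A minimal sketch of walking such a hierarchy, without contacting Twitter. The JSON string below is made up and heavily trimmed; only the top-level keys search_metadata and statuses come from the notes, while the fields inside them (query, count, text, user, screen_name) are assumptions for illustration. The sketch uses the standard library json module, whose loads function plays the same role as simplejson.loads.

```python
import json

# A made-up stand-in for the string a search might return.
result_string = '''
{
  "search_metadata": {"query": "python", "count": 2},
  "statuses": [
    {"text": "Dictionaries are great", "user": {"screen_name": "alice"}},
    {"text": "Learning about JSON", "user": {"screen_name": "bob"}}
  ]
}
'''

result = json.loads(result_string)   # JSON object -> Python dictionary

tweet_texts = []
for tweet in result['statuses']:     # a list of dictionaries
    tweet_texts.append(tweet['text'])

for text in tweet_texts:
    print(text)
```

    Nested lookups chain naturally: result['statuses'][0]['user']['screen_name'] reaches a dictionary inside a dictionary inside a list inside a dictionary.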

Dictionary Practice Problems

  1. Create a dictionary to store the favorite colors of the following individuals

    • Thomas prefers red
    • Ashok prefers green
    • Sandy prefers red
    • Alison prefers orange
    • Fei prefers green
    • Natasha prefers blue

    Then add some others of your own. Now, write code to change Fei’s preference to a different color and to remove Sandy’s preference from the dictionary.

  2. Using the dictionary from the first problem, write code to find which color is most commonly preferred. Use a second dictionary, one that associates strings (representing the colors) with the counts. Output the most common color. If there are ties, output all tied colors.

  3. Complete the fast, list solution to the movie counting problem based on sorting, as outlined at the start of the lecture notes.

  4. Use a dictionary to determine which last names are most common in the IMDB data we have provided. Count individual people, not the movies they appear in. For example, 'Hanks, Tom' counts as one instance of the name 'Hanks' despite the fact that he is in many movies. Assume that the last name ends with the first ',' in the actual name. Start this problem by thinking about what the dictionary keys and values should be.

  5. Which two individuals have the most movies in common? To solve this you will need to start from the dictionary that associates each individual with the set of movies s/he is involved in. Then you will need nested for loops, one inside the other, to compare every pair of individuals.

Summary

  • Dictionaries of sets.
  • Dictionaries where the keys are numbers.
  • A variety of examples to extract information from the IMDB data set.
  • Dictionaries as databases: storing attribute / value pairs.
  • Accessing Twitter information.