recitation 11: association lists & hash tables

----------------------------------------------------------
association list: (alist)  List of 2 element lists (key value)

(define (find-assoc key alist)
   (cond
    ((null? alist) #f)
    ((equal? key (caar alist)) (cadar alist))
    (else (find-assoc key (cdr alist)))))

(define (add-assoc key val alist)
  (cons (list key val) alist))

(define *phone_book*
  (list (list 'alice 6175551212)
	(list 'bob   6175555237)
	(list 'carol 6175556824)))

* write an expression to get bob's number

  (find-assoc 'bob *phone_book*) ==> 6175555237

* what happens if you look up dan's number?

  (find-assoc 'dan *phone_book*) ==> #f

* write code to add erin's number

  (define *phone_book* (add-assoc 'erin 6175554321 *phone_book*))

* if there are 50,000 phone numbers in *phone-book*, how
  long does it take to look up a person's number, on average?

  O(n)

* write code to add bob's new number

  (define *phone_book* (add-assoc 'bob 6175559999 *phone_book*))

* what happens to his old number?

  it's still there!

* write code to return the list of all the numbers ever stored 
  for a particular person, in chronological order

  (map (lambda (x) (cadr x)) (filter (lambda (x) (eq? (car x) 'bob)) *phone_book*)) 
    ==>  (6175559999 6175555237)

NOTE: redefining at the top level is 'sort-of' mutation, but we'll
learn REAL mutation on Tuesday & Wednesday.  

----------------------------------------------------------
vectors: (a.k.a. arrays, "efficient lists")

vectors are fixed length, retrieval in O(1) (it's O(n) for lists)

  (make-vector k)             k is length of vector, initialize entries to #f
  (vector-ref vector k)       returns the kth element of vector
  (vector-set! vector k obj)  mutates the kth element of the vector to point to obj

Convention: procedures that end in ! (pronounced "bang!") are mutators

(define *students* (make-vector 5))
*students*  ==>  #(#f #f #f #f #f)

(vector-set! *students* 0 'bob)
(vector-set! *students* 3 'alice)

*students*  ==>  #(bob #f #f alice #f)

(vector-ref *students* 3)  ==>  alice

* can you re-implement (ignoring efficiency) make-vector & vector-ref using cons? 

  yes, just make a list will all null entries and then use cdr to find
  the slot you want to change and set-car! (which we'll see again next
  week) to change the value.

----------------------------------------------------------
table data type implementations from lecture
fill in the order of growth in time for each operation below:

                    version 1, implemented            version 2, implemented 
                    with association lists         with vectors & hash functions
(make-empty-table)           O(1)                              O(n)
(put! key value)             O(1)                              O(1)
(get key)                    O(n)                              O(1)

NOTE: other than the efficiency of operations, all the functions have
the same type and we can use a table without knowing the implementation

----------------------------------------------------------
What's a hash table?  
  * table implementation with constant time access
  * implemented with a vector at the top level
  * a key is mapped to an slot in the vector by a hash function

What's a hash function?
  * simple function of one argument (the key) which returns an index 
    (a bucket or slot in the vector).  Ideally will "uniformly" distribute
    the keys throughout the range of legal index values (0 -> k-1).

What's a collision?  How do we deal with collisions?
  * when the hash function maps multiple keys to the same index
  * we resolve this by storing an association list at each slot in the vector

----------------------------------------------------------
hash table application: caller id

Given *phone-book* (an association list with 50,000 name/number pairings) we want 
to lookup the name matching a particular phone number.  Name lookup should be O(1) 
time expected, and the caller id system should use O(n) memory (n = 50,000).

(define *phone-book*
  (list (list 'alice 6175551212)
	(list 'bob   6175555237)
	(list 'carol 6175556824)))

(define *caller-id-length* 50000)
(define *caller-id-hash-table* (make-caller-id-hash-table *phone-book*))
(lookup-name 6175555237 *caller-id-hash-table*)   ==>   bob

* What's a good hash function for this application?
  What's a bad hash function for this application?
  Write (phone-number-hash-function number)

bad: always return 0, or return the area code, etc.
  these hash functions will result in lots of collisions

good: the last few digits (the "low order bits")
  we assume for phone numbers that these digits are essentially random
  and will result in very few collisions.
  
(define (phone-number-hash-function number) 
  (remainder number *caller-id-length*))

* Write (make-caller-id-hash-table phone-book)

(define (make-caller-id-hash-table phone-book)
  (make-caller-id-hash-table-helper phone-book (make-vector *caller-id-length*)))
  
(define (make-caller-id-hash-table-helper phone-book hash-table)
  (if (null? phone-book)
      hash-table
      (let* ((name (caar phone-book))
	     (number (cadar phone-book))
	     (index (phone-number-hash-function number)))
	(vector-set! hash-table index (add-assoc number name (vector-ref hash-table index)))
	(make-caller-id-hash-table-helper (cdr phone-book) hash-table))))

* Write (lookup-name number hash_table)

(define (lookup-name number caller-id)
  (find-assoc number (vector-ref caller-id (phone-number-hash-function number))))