CSCI 4150: Introduction to Artificial Intelligence, Fall 2005

Assignment 7 information

Errata and clarifications

Files for this assignment

Additional support code documentation

There are a few procedures in the support code that somehow didn't make it into the assignment handout.

Examples

Here are two transcripts of Scheme sessions that I ran to illustrate how you can run your code to learn transition probabilities, utilities, and rewards.

Webtester

Writeup details

For problem 5, you will learn a blackjack player two different ways and compare the results. This web page is a little long, but that is only because it gives detailed instructions on how to learn a player and then asks you to describe the details of what you did and analyze the results.

A few general things:

Here are the parts for problem 5:

  1. (written) Describe how your calc-initial-state and calc-new-state procedures turn the game state into a reinforcement learning state. I am interested, not in the details of how you do the calculation, but in how the reinforcement learning states you use correspond to game states. Make sure you say how many states you used, and which are terminal states.

    Give a brief explanation why you transformed the game state to a reinforcement learning state this way.
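
    As a purely illustrative sketch of what such a correspondence can look like, the code below collapses a hand total into a small integer state. The procedure names and the choice of states are hypothetical; this is not how calc-initial-state or calc-new-state must work, and it assumes hand totals of 4 through 21 plus a single "bust" state.

    ```scheme
    ;; Hypothetical mapping, not part of the support code.
    ;; Totals 4..21 become nonterminal states 0..17; any total
    ;; above 21 becomes the single terminal "bust" state 18.
    (define bust-state 18)

    (define (total->rl-state total)
      (if (> total 21)
          bust-state
          (- total 4)))

    ;; In this sketch the only terminal state is the bust state.
    (define (terminal-rl-state? s)
      (= s bust-state))
    ```

    This particular sketch uses 19 states, of which one is terminal. Your own mapping may well use more information from the game state than the hand total.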

  2. (electronic) In the first approach, you will first learn the world model, then learn the utilities, and finally evaluate how good your blackjack player is with those utilities. Here are the three steps you should follow.

    Please note that I want you to report the amount won/lost and the total amount wagered for each of these three steps in your writeup, so make sure you record this information!

    1. Learn the model of the world (i.e., transition probabilities and average rewards) but do not learn utilities in this step. You can do this by using the non-learning-procedure (from a7example.scm) as the learning procedure. Note that there is more than one way you can do this part.

      I suggest you save the tables to a file after this step:

        (save-tables "a7p5b-model.scm")
      
      so that you can try different things in part B-2 without doing this step again.

    2. Now, learn the utilities for the nonterminal states using this model of the world. Before learning, turn off the model updates with:
        (define enable-table-updates #f)
      
      This will keep the transition probabilities and average rewards from changing while you are learning the utilities.

      Learn utilities by playing blackjack with the following player:

        (define (td-player)
          (list "TD-player"
                (create-exploring-rl-strategy R+ Ne)
                (create-td-learning alpha-fn)))
      
      You will need to decide upon values for R+ and Ne and what your alpha-fn function should be. You will also have to figure out when to stop learning.
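
      To make the roles of these three choices concrete, here is a minimal sketch of a textbook-style exploration function, a decaying learning rate, and a single temporal-difference update. All of the names below are hypothetical and are not taken from the support code. In particular, I am assuming here that the learning rate is a function of the number of times a state has been visited and that no discounting is applied; check the handout for what create-td-learning actually expects of alpha-fn.

      ```scheme
      ;; Hypothetical helpers for illustration only.

      ;; Exploration function: report the optimistic value r-plus until
      ;; the choice has been tried at least ne times, then report the
      ;; learned utility estimate u.
      (define (exploration-value u n r-plus ne)
        (if (< n ne) r-plus u))

      ;; Learning rate that decays as the visit count n grows
      ;; (assumed argument). It is 1 on the first visit.
      (define (example-alpha-fn n)
        (/ 60. (+ 59 n)))

      ;; One TD update: move the estimate u-s toward (reward + u-next)
      ;; by a fraction alpha of the difference. No discount factor is
      ;; used in this sketch.
      (define (td-update u-s reward u-next n)
        (+ u-s
           (* (example-alpha-fn n)
              (- (+ reward u-next) u-s))))
      ```

      A larger R+ or Ne makes the player explore longer before it trusts its estimates, and a learning rate that shrinks with the visit count lets the utilities settle down as more hands are played.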

      Save the tables after this step and upload this file to the webtester.

        (save-tables "a7p5b-utilities.scm")
      

    3. Evaluate the performance of your player by running 10,000 hands using the following player:
        (define (utility-player)
          (list "Your name here"
                basic-rl-strategy
                non-learning-procedure))
        
      Make sure that you have disabled the table updates (as in the previous step).

  3. (electronic) In the second approach, you will learn the world model and the utilities simultaneously and then evaluate the performance of a blackjack player using your learned utilities.

    Please note that I want you to report the amount won/lost and the total amount wagered for each of these two steps in your writeup, so make sure you record this information!

    Here are the steps you should follow:

    1. After (re)initializing the tables (so they are initially empty), learn the model and utilities together by playing blackjack using the td-player shown above in step B-2. You do not have to use the same R+, Ne, and alpha-fn that you used in step B-2.

      Make sure you have reenabled table updates if you disabled them.

      Save the tables after this step and upload this file to the webtester.

        (save-tables "a7p5c-utilities.scm")
      

    2. Evaluate the performance of this player in the same manner you did for part B-3. (Also, make sure you disable the table updates while evaluating.)
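
      One simple way to put the two evaluations on the same footing is to divide the amount won/lost by the total amount wagered. The helper below is a hypothetical convenience, not part of the support code, and it assumes you pass in the two numbers you recorded by hand.

      ```scheme
      ;; Hypothetical helper: net winnings per unit wagered.
      ;; net-won is negative when the player lost money overall.
      ;; total-wagered is assumed to be greater than zero.
      (define (return-per-unit-wagered net-won total-wagered)
        (/ net-won (exact->inexact total-wagered)))
      ```

      For example, with made-up figures of 150 lost on 10,000 wagered, (return-per-unit-wagered -150 10000) comes to a loss of 1.5% of the amount wagered.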

  4. (written) For this part, you will describe some of the details of what you did in parts B and C and do a little analysis on your results. Here are the things you should cover: