(define print-debug-level 3)
;Value: print-debug-level

(run-learner threefour 1)
Epoch 1: 

0->south
0->north
4->west
4->west
0->south
0->south
0->east
4->south
0->east
1->south
1->east
2->west
1->south
1->north
1->west
0->east
0->east
1->west
0->south
0->north
4->east
4->east
7->east
8->south
8->south
8->east
9->south
5->south
2->west
1->north
1->west
0->west
0->north
4->east
4->north
4->west
4->east
4->north
4->east
4->south
0->east
1->east
2->north
5->north
9->east
Reached terminal state 10 in 45 steps; reward is 1.

state       north      south       east       west
    0:       -.03      -.032      -.039       -.02
    1:      -.033      -.026      -.026       -.03
    2:       -.02         0.         0.      -.033
    3:         0.         0.         0.         0.
    4:      -.033      -.033      -.039      -.034
    5:       -.02       -.02         0.         0.
    6:        -1.        -1.        -1.        -1.
    7:         0.         0.       -.02         0.
    8:         0.      -.026       -.02         0.
    9:         0.       -.02        .48         0.
   10:         1.         1.         1.         1.

state       north      south       east       west
    0:       -.03      -.032      -.039       -.02
    1:      -.033      -.026      -.026       -.03
    2:       -.02         0.         0.      -.033
    3:         0.         0.         0.         0.
    4:      -.033      -.033      -.039      -.034
    5:       -.02       -.02         0.         0.
    6:        -1.        -1.        -1.        -1.
    7:         0.         0.       -.02         0.
    8:         0.      -.026       -.02         0.
    9:         0.       -.02        .48         0.
   10:         1.         1.         1.         1.

Policy:

State  #: action   (Qmax)
-----------------------
State  0:   west   (-.02)
State  1:  south   (-.026)
State  2:  south   (0.)
State  3:  north   (0.)
State  4:  north   (-.033)
State  5:   east   (0.)
State  6:  north   (-1.)
State  7:  north   (0.)
State  8:  north   (0.)
State  9:   east   (.48)
State 10:  north   (1.)
;No value

(n 'print)
state       north      south       east       west
    0:         3.         4.         5.         1.
    1:         2.         2.         2.         3.
    2:         1.         0.         0.         2.
    3:         0.         0.         0.         0.
    4:         2.         2.         5.         3.
    5:         1.         1.         0.         0.
    6:         0.         0.         0.         0.
    7:         0.         0.         1.         0.
    8:         0.         2.         1.         0.
    9:         0.         1.         1.         0.
   10:         0.         0.         0.         0.
;No value


(define print-debug-level 2)
;Value: print-debug-level

(run-learner threefour 3)
Epoch 1: Reached terminal state 6 in 40 steps; reward is -1.

state       north      south       east       west
    0:       -.02      -.033      -.033      -.043
    1:       -.03      -.032       -.02      -.026
    2:       -.02         0.         0.         0.
    3:         0.         0.         0.         0.
    4:       -.02      -.034       -.04       -.03
    5:       -.52         0.         0.         0.
    6:        -1.        -1.        -1.        -1.
    7:         0.      -.026         0.         0.
    8:         0.         0.         0.         0.
    9:         0.         0.         0.         0.
   10:         1.         1.         1.         1.
Epoch 2: Reached terminal state 10 in 26 steps; reward is 1.

state       north      south       east       west
    0:      -.043      -.033      -.033      -.043
    1:       -.03      -.032       -.02      -.026
    2:       -.02         0.         0.         0.
    3:         0.         0.         0.         0.
    4:      -.042      -.048      -.051      -.043
    5:       -.52         0.         0.         0.
    6:        -1.        -1.        -1.        -1.
    7:         0.      -.038      -.026       -.02
    8:       -.02      -.033      -.026       -.02
    9:       -.02       .306         0.       -.02
   10:         1.         1.         1.         1.
Epoch 3: Reached terminal state 10 in 11 steps; reward is 1.

state       north      south       east       west
    0:      -.055      -.033      -.033      -.043
    1:       -.03      -.032       -.02      -.026
    2:       -.02         0.         0.         0.
    3:         0.         0.         0.         0.
    4:      -.042      -.048      -.051      -.043
    5:       -.52         0.         0.         0.
    6:        -1.        -1.        -1.        -1.
    7:       -.03      -.043      -.026      -.026
    8:       -.02       -.04       .046      -.035
    9:       -.02       .306        .48       -.02
   10:         1.         1.         1.         1.

state       north      south       east       west
    0:      -.055      -.033      -.033      -.043
    1:       -.03      -.032       -.02      -.026
    2:       -.02         0.         0.         0.
    3:         0.         0.         0.         0.
    4:      -.042      -.048      -.051      -.043
    5:       -.52         0.         0.         0.
    6:        -1.        -1.        -1.        -1.
    7:       -.03      -.043      -.026      -.026
    8:       -.02       -.04       .046      -.035
    9:       -.02       .306        .48       -.02
   10:         1.         1.         1.         1.

Policy:

State  #: action   (Qmax)
-----------------------
State  0:  south   (-.033)
State  1:   east   (-.02)
State  2:  south   (0.)
State  3:  north   (0.)
State  4:  north   (-.042)
State  5:  south   (0.)
State  6:  north   (-1.)
State  7:   east   (-.026)
State  8:   east   (.046)
State  9:   east   (.48)
State 10:  north   (1.)
;No value

(define print-debug-level 1)
;Value: print-debug-level

(run-learner threefour 10)
Epoch 1: Reached terminal state 6 in 7 steps; reward is -1.
Epoch 2: Reached terminal state 10 in 114 steps; reward is 1.
Epoch 3: Reached terminal state 6 in 12 steps; reward is -1.
Epoch 4: Reached terminal state 6 in 14 steps; reward is -1.
Epoch 5: Reached terminal state 6 in 14 steps; reward is -1.
Epoch 6: Reached terminal state 10 in 10 steps; reward is 1.
Epoch 7: Reached terminal state 6 in 12 steps; reward is -1.
Epoch 8: Reached terminal state 10 in 9 steps; reward is 1.
Epoch 9: Reached terminal state 6 in 6 steps; reward is -1.
Epoch 10: Reached terminal state 6 in 10 steps; reward is -1.

state       north      south       east       west
    0:      -.066      -.066      -.065      -.064
    1:      -.052      -.052      -.053      -.052
    2:      -.038      -.041      -.038      -.053
    3:       -.78         0.         0.       -.02
    4:      -.053       -.06       -.06      -.057
    5:      -.203      -.045       -.78      -.032
    6:        -1.        -1.        -1.        -1.
    7:      -.048      -.048       .033      -.048
    8:      -.049      -.036       .287      -.047
    9:       .119      -.013       .568       .079
   10:         1.         1.         1.         1.

Policy:

State  #: action   (Qmax)
-----------------------
State  0:   west   (-.064)
State  1:  north   (-.052)
State  2:   east   (-.038)
State  3:  south   (0.)
State  4:  north   (-.053)
State  5:   west   (-.032)
State  6:  north   (-1.)
State  7:   east   (.033)
State  8:   east   (.287)
State  9:   east   (.568)
State 10:  north   (1.)
;No value

(define print-debug-level 0)
;Value: print-debug-level

(run-epochs 990)
Epoch 25: 
Epoch 50: 
Epoch 75: 
Epoch 100: 
Epoch 125: 
Epoch 150: 
Epoch 175: 
Epoch 200: 
Epoch 225: 
Epoch 250: 
Epoch 275: 
Epoch 300: 
Epoch 325: 
Epoch 350: 
Epoch 375: 
Epoch 400: 
Epoch 425: 
Epoch 450: 
Epoch 475: 
Epoch 500: 
Epoch 525: 
Epoch 550: 
Epoch 575: 
Epoch 600: 
Epoch 625: 
Epoch 650: 
Epoch 675: 
Epoch 700: 
Epoch 725: 
Epoch 750: 
Epoch 775: 
Epoch 800: 
Epoch 825: 
Epoch 850: 
Epoch 875: 
Epoch 900: 
Epoch 925: 
Epoch 950: 
Epoch 975: 

state       north      south       east       west
    0:       .257      -.072      -.074      -.074
    1:       -.07       -.07       -.07       .108
    2:       .004      -.051      -.049      -.067
    3:        -.7      -.148      -.157      -.382
    4:       .454       -.06       -.06      -.057
    5:      -.064      -.222      -.866       .095
    6:        -1.        -1.        -1.        -1.
    7:      -.048      -.048       .614      -.048
    8:      -.049      -.036       .757      -.047
    9:       .255      -.013        .84       .128
   10:         1.         1.         1.         1.

Policy:

State  #: action   (Qmax)
-----------------------
State  0:  north   (.257)
State  1:   west   (.108)
State  2:  north   (.004)
State  3:  south   (-.148)
State  4:  north   (.454)
State  5:   west   (.095)
State  6:  north   (-1.)
State  7:   east   (.614)
State  8:   east   (.757)
State  9:   east   (.84)
State 10:  north   (1.)
;No value