(define print-debug-level 3)
;Value: print-debug-level

(run-learner threefour 1)
Epoch 1:
0->south 0->north 4->west 4->west 0->south 0->south 0->east 4->south 0->east 1->south
1->east 2->west 1->south 1->north 1->west 0->east 0->east 1->west 0->south 0->north
4->east 4->east 7->east 8->south 8->south 8->east 9->south 5->south 2->west 1->north
1->west 0->west 0->north 4->east 4->north 4->west 4->east 4->north 4->east 4->south
0->east 1->east 2->north 5->north 9->east
Reached terminal state 10 in 45 steps; reward is 1.

state   north   south   east    west
0:      -.03    -.032   -.039   -.02
1:      -.033   -.026   -.026   -.03
2:      -.02    0.      0.      -.033
3:      0.      0.      0.      0.
4:      -.033   -.033   -.039   -.034
5:      -.02    -.02    0.      0.
6:      -1.     -1.     -1.     -1.
7:      0.      0.      -.02    0.
8:      0.      -.026   -.02    0.
9:      0.      -.02    .48     0.
10:     1.      1.      1.      1.

state   north   south   east    west
0:      -.03    -.032   -.039   -.02
1:      -.033   -.026   -.026   -.03
2:      -.02    0.      0.      -.033
3:      0.      0.      0.      0.
4:      -.033   -.033   -.039   -.034
5:      -.02    -.02    0.      0.
6:      -1.     -1.     -1.     -1.
7:      0.      0.      -.02    0.
8:      0.      -.026   -.02    0.
9:      0.      -.02    .48     0.
10:     1.      1.      1.      1.

Policy:
State #: action (Qmax)
-----------------------
State 0: west (-.02)
State 1: south (-.026)
State 2: south (0.)
State 3: north (0.)
State 4: north (-.033)
State 5: east (0.)
State 6: north (-1.)
State 7: north (0.)
State 8: north (0.)
State 9: east (.48)
State 10: north (1.)
;No value

(n 'print)
state   north   south   east    west
0:      3.      4.      5.      1.
1:      2.      2.      2.      3.
2:      1.      0.      0.      2.
3:      0.      0.      0.      0.
4:      2.      2.      5.      3.
5:      1.      1.      0.      0.
6:      0.      0.      0.      0.
7:      0.      0.      1.      0.
8:      0.      2.      1.      0.
9:      0.      1.      1.      0.
10:     0.      0.      0.      0.
;No value

(define print-debug-level 2)
;Value: print-debug-level

(run-learner threefour 3)
Epoch 1: Reached terminal state 6 in 40 steps; reward is -1.

state   north   south   east    west
0:      -.02    -.033   -.033   -.043
1:      -.03    -.032   -.02    -.026
2:      -.02    0.      0.      0.
3:      0.      0.      0.      0.
4:      -.02    -.034   -.04    -.03
5:      -.52    0.      0.      0.
6:      -1.     -1.     -1.     -1.
7:      0.      -.026   0.      0.
8:      0.      0.      0.      0.
9:      0.      0.      0.      0.
10:     1.      1.      1.      1.

Epoch 2: Reached terminal state 10 in 26 steps; reward is 1.

state   north   south   east    west
0:      -.043   -.033   -.033   -.043
1:      -.03    -.032   -.02    -.026
2:      -.02    0.      0.      0.
3:      0.      0.      0.      0.
4:      -.042   -.048   -.051   -.043
5:      -.52    0.      0.      0.
6:      -1.     -1.     -1.     -1.
7:      0.      -.038   -.026   -.02
8:      -.02    -.033   -.026   -.02
9:      -.02    .306    0.      -.02
10:     1.      1.      1.      1.

Epoch 3: Reached terminal state 10 in 11 steps; reward is 1.

state   north   south   east    west
0:      -.055   -.033   -.033   -.043
1:      -.03    -.032   -.02    -.026
2:      -.02    0.      0.      0.
3:      0.      0.      0.      0.
4:      -.042   -.048   -.051   -.043
5:      -.52    0.      0.      0.
6:      -1.     -1.     -1.     -1.
7:      -.03    -.043   -.026   -.026
8:      -.02    -.04    .046    -.035
9:      -.02    .306    .48     -.02
10:     1.      1.      1.      1.

state   north   south   east    west
0:      -.055   -.033   -.033   -.043
1:      -.03    -.032   -.02    -.026
2:      -.02    0.      0.      0.
3:      0.      0.      0.      0.
4:      -.042   -.048   -.051   -.043
5:      -.52    0.      0.      0.
6:      -1.     -1.     -1.     -1.
7:      -.03    -.043   -.026   -.026
8:      -.02    -.04    .046    -.035
9:      -.02    .306    .48     -.02
10:     1.      1.      1.      1.

Policy:
State #: action (Qmax)
-----------------------
State 0: south (-.033)
State 1: east (-.02)
State 2: south (0.)
State 3: north (0.)
State 4: north (-.042)
State 5: south (0.)
State 6: north (-1.)
State 7: east (-.026)
State 8: east (.046)
State 9: east (.48)
State 10: north (1.)
;No value

(define print-debug-level 1)
;Value: print-debug-level

(run-learner threefour 10)
Epoch 1: Reached terminal state 6 in 7 steps; reward is -1.
Epoch 2: Reached terminal state 10 in 114 steps; reward is 1.
Epoch 3: Reached terminal state 6 in 12 steps; reward is -1.
Epoch 4: Reached terminal state 6 in 14 steps; reward is -1.
Epoch 5: Reached terminal state 6 in 14 steps; reward is -1.
Epoch 6: Reached terminal state 10 in 10 steps; reward is 1.
Epoch 7: Reached terminal state 6 in 12 steps; reward is -1.
Epoch 8: Reached terminal state 10 in 9 steps; reward is 1.
Epoch 9: Reached terminal state 6 in 6 steps; reward is -1.
Epoch 10: Reached terminal state 6 in 10 steps; reward is -1.

state   north   south   east    west
0:      -.066   -.066   -.065   -.064
1:      -.052   -.052   -.053   -.052
2:      -.038   -.041   -.038   -.053
3:      -.78    0.      0.      -.02
4:      -.053   -.06    -.06    -.057
5:      -.203   -.045   -.78    -.032
6:      -1.     -1.     -1.     -1.
7:      -.048   -.048   .033    -.048
8:      -.049   -.036   .287    -.047
9:      .119    -.013   .568    .079
10:     1.      1.      1.      1.

Policy:
State #: action (Qmax)
-----------------------
State 0: west (-.064)
State 1: north (-.052)
State 2: east (-.038)
State 3: south (0.)
State 4: north (-.053)
State 5: west (-.032)
State 6: north (-1.)
State 7: east (.033)
State 8: east (.287)
State 9: east (.568)
State 10: north (1.)
;No value

(define print-debug-level 0)
;Value: print-debug-level

(run-epochs 990)
Epoch 25: Epoch 50: Epoch 75: Epoch 100: Epoch 125: Epoch 150: Epoch 175: Epoch 200:
Epoch 225: Epoch 250: Epoch 275: Epoch 300: Epoch 325: Epoch 350: Epoch 375: Epoch 400:
Epoch 425: Epoch 450: Epoch 475: Epoch 500: Epoch 525: Epoch 550: Epoch 575: Epoch 600:
Epoch 625: Epoch 650: Epoch 675: Epoch 700: Epoch 725: Epoch 750: Epoch 775: Epoch 800:
Epoch 825: Epoch 850: Epoch 875: Epoch 900: Epoch 925: Epoch 950: Epoch 975:

state   north   south   east    west
0:      .257    -.072   -.074   -.074
1:      -.07    -.07    -.07    .108
2:      .004    -.051   -.049   -.067
3:      -.7     -.148   -.157   -.382
4:      .454    -.06    -.06    -.057
5:      -.064   -.222   -.866   .095
6:      -1.     -1.     -1.     -1.
7:      -.048   -.048   .614    -.048
8:      -.049   -.036   .757    -.047
9:      .255    -.013   .84     .128
10:     1.      1.      1.      1.

Policy:
State #: action (Qmax)
-----------------------
State 0: north (.257)
State 1: west (.108)
State 2: north (.004)
State 3: south (-.148)
State 4: north (.454)
State 5: west (.095)
State 6: north (-1.)
State 7: east (.614)
State 8: east (.757)
State 9: east (.84)
State 10: north (1.)
;No value
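
The Policy listing appears to be the greedy choice over each row of the Q table: for every state, the action with the largest Q value, reported together with that value. The Python sketch below recomputes it from the last table in the transcript. It is not the learner's own code. The names `ACTIONS`, `Q`, and `greedy_policy` are hypothetical, and breaking ties in favor of the first action in column order is an assumption. That assumption matches the all-equal rows here (states 6 and 10 both report north), but the printed values are rounded, so apparent ties in the earlier tables may not be real ties inside the learner.

```python
# Column order of the Q table as printed in the transcript.
ACTIONS = ("north", "south", "east", "west")

# Final Q table, copied from the last table in the transcript.
Q = {
    0: (0.257, -0.072, -0.074, -0.074),
    1: (-0.07, -0.07, -0.07, 0.108),
    2: (0.004, -0.051, -0.049, -0.067),
    3: (-0.7, -0.148, -0.157, -0.382),
    4: (0.454, -0.06, -0.06, -0.057),
    5: (-0.064, -0.222, -0.866, 0.095),
    6: (-1.0, -1.0, -1.0, -1.0),
    7: (-0.048, -0.048, 0.614, -0.048),
    8: (-0.049, -0.036, 0.757, -0.047),
    9: (0.255, -0.013, 0.84, 0.128),
    10: (1.0, 1.0, 1.0, 1.0),
}


def greedy_policy(q):
    """Map each state to (best action, Qmax).

    Assumption: ties go to the first action in ACTIONS order, because
    Python's max() returns the first maximal item it encounters.
    """
    policy = {}
    for state, values in q.items():
        best = max(range(len(ACTIONS)), key=lambda i: values[i])
        policy[state] = (ACTIONS[best], values[best])
    return policy


if __name__ == "__main__":
    for state, (action, qmax) in sorted(greedy_policy(Q).items()):
        print(f"State {state}: {action} ({qmax})")
```

Run on the final table, this yields the same action for every state as the last Policy listing in the transcript.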