```recitation 24: the halting problem & garbage collection

------------------------------------------------------------------------
* is everything computable?  what's the halting problem?  (from sicp ex 4.15)

* can you define a procedure (halts? <program> <input>) that returns
#t if (<program> <input>) halts (returns a value) and #f if it results
in an error or an infinite loop?

(define (run-forever) (run-forever))
(define (halt-dummy x) 'done)
(define (run-forever-dummy x) (run-forever))

(run-forever)                ==> <infinite loop>
(halts? halt-dummy 1)        ==> #t
(halts? run-forever-dummy 1) ==> #f

ok, once halts? has been defined, let's do:

(define (try p)
(if (halts? p p)
(run-forever)
'halted))

* what happens when we evaluate (try halt-dummy)?
==>  infinite loop

* what happens when we evaluate (try run-forever-dummy)?
==>  halted

* what happens when we evaluate (try try)?
Well, we need to guess whether (try try) halts or not.
case 1: (halts try try) ==>  #t  (it halts)
so (try try) runs forever, a contradiction

case 2: (halts try try) ==>  #f  (it doesn't halt)
so (try try) halts, also a contradition

thus the procedure halts? must not exist, and cannot be written.

------------------------------------------------------------------------
* Not everything sitting in memory is useful.  Garbage is anything
that cannot have any influence on the future computation.  Consider
the following four procedures.  When applied, will Scheme (with
garbage collection) run out of memory or go into an infinite loop?

(define (foo)
(foo))        ==>  infinite loop

(define (foo)
(let ((x (cons 1 2)))
(foo)))     ==>  infinite loop

(define (foo z)
(let ((x (cons 1 2)))
(foo x)))   ==>  infinite loop

(define (foo z)
(let ((x (cons 1 z)))
(foo x)))   ==>  error!  out of memory

------------------------------------------------------------------------
* Memory consists of many cells, each cell stores a number and a tag.
The tag is N for number, P for pointer or E for NULL.  Using this
implementation, (cons A B) does this:
1. Look for the next FREE location
2. Store A at the-cars[i]
3. Store B at the-cdrs[i]
4. Return Pi, a pointer to the new cons cell.

* Draw the box-and-pointer diagram starting from P5:

0   1   2   3   4   5   6   7   8   9
the-cars  N3  N4  P0  N3  N5  P2  N2  N3  P1  N4
the-cdrs  E0  E0  P4  P5  P0  P6  P5  P3  P3  N5

so cells 1, 3, 7, 8, & 9 are garbage!

------------------------------------------------------------------------
* Garbage Collection Technique 1:  Reference Counting

1. Attach a counter to each pair in memory.
2. When a new pointer is connected to that pair, increment the counter.
3. When a pointer is removed, decrement the counter.
4. Any cell with 0 counter is garbage.

* Consider the following example, and draw the box-and-pointer
diagrams below, keeping a "reference counter" with each cell.

(define a (list 1 2 3))
(set-cdr! a 4)
(set-cdr! a a)
(set! a 1)

+ reference counting is fast and incremental
- reference counting can't handle cyclical data structures!
? reference counting requires ~50% extra memory
(1 integer per cons cell)

------------------------------------------------------------------------
* Garbage Collection Technique 2:  Stop and Copy

1. Split memory in half (working memory and copy memory).
2. When out of working memory, stop computation and begin garbage collection.
1. Place scan and free pointers at the start of the copy memory.
2. Copy the Root to copy memory, incrementing free.  For each cell
copied, put a "Broken Heart" token in the car and the forwarding
3. Starting at the scan pointer, check the car and the cdr. If
either is a pointer, look at the location in working
memory. If it's already been copied (i.e. it has a "Broken
Heart"), update the reference. Otherwise, copy the location
and put the "Broken Heart" in the old location.
4. Repeat until scan = free.
5. Swap the roles of the working and copy memory.

* let's perform stop-and-copy on the following with Root = P5

working memory
0   1   2   3   4   5   6   7   8   9
the-cars  N3  N4  P0  N3  N5  P2  N2  N3  P1  N4
the-cdrs  E0  E0  P4  P5  P0  P6  P5  P3  P3  N5

copy memory
10  11  12  13  14  15  16  17  18  19
the-cars
the-cdrs

working memory
0   1   2   3   4   5   6   7   8   9
the-cars  bh  N4  bh  N3  bh  bh  bh  N3  P1  N4
the-cdrs  P13 E0  P11 P5  P14 P10 P12 P3  P3  N5

copy memory
10  11  12  13  14  15  16  17  18  19
the-cars  P11 P13 N2  N3  N5
the-cdrs  P12 P14 P10 E0  P11

- stop-and-copy requires a long pause in program execution
+ stop-and-copy can handle cyclical data structures!
- stop-and-copy requires ~100% extra memory
(can only use half your memory)
+ stop-and-copy runs fast if most of your memory is garbage
(it only touches the cells reachable from the root)
+ data is clustered together and memory is "de-fragmented"

------------------------------------------------------------------------
* Garbage Collection Technique 3:  Mark-Sweep

1. Add a mark bit to each location in memory.
2. Keep a free pointer to the head of the free list.
3. When memory runs out, stop computation, clear the mark bits and
begin garbage collection.
4. Mark
1. Start at the Root and follow the accessible structure
(keeping a stack of where you still need to go).
2. Mark every cell you visit.
3. Stop when you see a marked cell, so you don't go into a cycle.
5. Sweep
1. Start at the end of memory, and build a new free list.
2. If a cell is unmarked, then it's garbage, so hook it into
the free list.

* let's perform Mark-Sweep on the following with Root =  P5

0   1   2   3   4   5   6   7   8   9  10
the-cars  N3  N4  P0  N3  N5  P2  N2  N3  P1  N4  P5
the-cdrs  E0  E0  P4  P5  P0  P6  P5  P3  P3  N5  N7
marks

0   1   2   3   4   5   6   7   8   9  10
the-cars  N3  E0  P0  E0  N5  P2  N2  E0  E0  E0  E0
the-cdrs  E0  P3  P4  P7  P0  P6  P5  P8  P9  P10 E0
marks      1   0   1   0   1   1   1   0   0   0   0

free list ==> P1 (which points to P3, P7, P8, P9 & P10)

- mark-sweep requires a long pause in program execution
+ mark-sweep can handle cyclical data structures!
+ mark-sweep requires ~1-2% extra memory
(just one bit per cons cell)
- mark-sweep runs the same speed regardless of how much of
memory is garbage (it must touch all cells in the mark-phase,
and must link together all garbage cells into a free list.

------------------------------------------------------------------------
```