; This is the data from the class survey.
;
; There were 14 questions, some multiple choice and some with numerical
; answers. Please note that this is just a "data set" and is not in
; the proper format for "training data". This is discussed below.
;
; Below are the questions and possible answers from the survey. (I
; may not have the questions exactly the same here.) I did some
; editing of the data set as described in my comments below.
;
; color = Which (of the following) is your favorite color?
;   Answers: red, blue, green, yellow, plaid
;
;   Almost everyone chose one of the first three, so yellow and plaid
;   got turned into missing attributes (i.e., ?)
;
; OS = Which operating system do you use?
;   Answers: Linux, Windows, MacOS, Other
;
; IceCream = Which of the following is your favorite ice cream flavor?
;   Answers: Vanilla, Chocolate, Strawberry
;
; Vote = Did you vote in the last student election?
;   Answers: Yes, No
;
; Coffee = Do you drink coffee?
;   Answers: Yes, No
;
; Browser = Which Web browser do you use?
;   Answers: Mozilla, IE, Safari, Other
;
; Living = Where do you live?
;   Answers: Dorm, Apartment, Fraternity, Other
;
; Milage = How many miles do you have on your car?
;   Numeric answer (integers, range 0 to 223000)
;
;   Presumably the people who answered 0 don't have a car. One
;   person answered 1; I changed this to a 0.
;
; Pizza = How many slices of pizza did you eat last week?
;   Numeric answer (integers, range 0 to 16)
;
; Sleep = What is the average number of hours of sleep you get on
;         Sun-Thurs nights?
;   Numeric answer (reals, range 4.0 to 10.0)
;
;   This should have asked "per night", though most people
;   (presumably) gave that number. I changed a 40 to an 8.
;
; Bed = What time did you go to bed last night? (HH:MM)
;   Numeric answer (reals, range 0.0 to 5.75)
;
;   I converted the answer (in a time format) into the number of
;   hours (a real number) past midnight. No one reported going to
;   bed before midnight.
;
; CDs = How many audio CDs do you have, not including CD-Rs?
;   Numeric answer (integers, range 0 to 1000)
;
; MP3s = How many mp3s do you have?
;   Numeric answer (integers, range 0 to 10000)
;
; DiskSpace = How much free space (in megabytes) is on your hard drive?
;   Numeric answer (integers, range 264 to 200000)
;
;   I made a number of changes to these answers. Several people
;   apparently gave an answer in bytes or kilobytes (at least, I don't
;   think any students have a 1 petabyte drive on their computer). Any
;   answer that was above 200,000 I divided by 1000 repeatedly until it
;   was below that amount. Conversely, some people seem to have very
;   little space on their hard drives; probably they answered in
;   gigabytes. I took answers of 50 and under and multiplied them by
;   1000. I also rounded answers to the nearest integer.
;
;
; See below for code and information on turning this into a training data set.
;

(define class-survey-datanames
  '(color OS IceCream Vote Coffee Browser Living
    Milage Pizza Sleep Bed CDs MP3s DiskSpace))

(define class-survey-data
  '((Red Linux Chocolate No No Mozilla Dorm 0 2 6.0 2.50 30 875 746)
    (Red Windows Chocolate No No Mozilla Apartment 0 10 6.0 2.50 3 2056 61440)
    (Blue Windows Vanilla No No IE Fraternity 102345 0 7.0 3.50 12 2500 4300)
    (Blue MacOS Chocolate No No Safari Apartment 91000 5 5.0 2.00 90 1250 24860)
    (Green Linux Chocolate Yes No Mozilla Apartment 88000 4 7.0 2.50 20 2737 42670)
    (Green Windows Vanilla No Yes Mozilla Dorm 120000 16 9.0 0.58 75 2981 4980)
    (Blue Windows Chocolate No No Mozilla Dorm 160000 2 8.0 0.75 160 38 17400)
    (Blue Windows Chocolate Yes No IE Apartment 35000 10 7.0 0.00 20 500 20275)
    (Blue MacOS Vanilla Yes No Safari Apartment 5600 6 8.5 3.50 85 3300 20000)
    (Blue Windows Vanilla No Yes IE Dorm 0 3 5.0 3.50 0 1000 20000)
    (Blue Windows Chocolate No Yes IE Other 600 3 7.0 2.00 5 1000 30000)
    (Green Windows Strawberry No No Mozilla Dorm 89885 0 8.0 3.50 25 225 13210)
    (Blue Windows Strawberry No Yes Other Apartment 89630 4 6.0 2.00 7 0 10957)
    (Red Windows Vanilla No Yes IE Apartment 47000 4 4.0 5.00 50 7000 110000)
    (Green Windows Vanilla Yes No Mozilla Dorm 41000 12 6.0 4.00 30 2725 50000)
    (Red Linux Vanilla Yes No Mozilla Dorm 0 4 6.0 4.00 30 10000 8100)
    (Red Windows Chocolate Yes No IE Apartment 54453 3 8.0 0.00 2 1357 36100)
    (Blue Windows Strawberry No No IE Apartment 68000 2 10.0 0.00 50 679 22600)
    (Red Windows Vanilla No Yes Mozilla Apartment 0 0 7.0 1.00 1 1000 5816)
    (Blue Windows Vanilla No No IE Apartment 150000 0 5.0 5.00 200 6000 200000)
    (Red Windows Vanilla No Yes IE Apartment 147000 4 7.0 2.50 ? ? 34000)
    (Blue Windows Chocolate No Yes Other Apartment 1000 6 6.0 5.00 200 4000 300)
    (Blue Windows Chocolate No No IE Dorm 0 4 5.0 3.50 0 278 49357)
    (Red Windows Chocolate No Yes Mozilla Dorm 33000 6 6.0 4.00 6 1000 2500)
    (Blue Linux Vanilla No No Mozilla Other 189000 0 4.0 3.00 1000 12 1100)
    (Blue Linux Chocolate No Yes Other Apartment 69000 10 5.0 5.75 100 900 20000)
    (Blue Windows Vanilla No No IE Dorm 0 2 8.0 3.00 0 80 22100)
    (Green MacOS Vanilla Yes No Mozilla Apartment 135347 4 5.0 3.00 130 7500 62000)
    (? MacOS Vanilla No No Other Dorm 0 5 6.0 2.00 14 449 53750)
    (Green Windows Chocolate No Yes Mozilla Apartment 113000 15 7.0 0.50 200 1200 76000)
    (Blue Windows Chocolate Yes No IE Dorm 0 5 6.0 5.00 20 2000 1400)
    (Red Windows Chocolate Yes Yes IE Fraternity 0 4 7.0 4.17 80 3300 37400)
    (Green Windows Strawberry No No IE Fraternity 50000 5 6.0 3.50 0 1700 6440)
    (? Windows Chocolate Yes Yes Mozilla Apartment 75000 4 6.0 2.00 120 1328 5090)
    (Red Linux Chocolate Yes No Mozilla Dorm 89400 4 6.0 3.00 20 3300 36000)
    (Blue Windows Vanilla Yes Yes Mozilla Fraternity 34500 0 10.0 3.00 10 4396 264)
    (Red Windows Chocolate Yes No Mozilla Fraternity 10870 5 5.0 4.42 300 1000 60000)
    (Green Windows Vanilla Yes No IE Apartment 105000 3 8.0 1.50 100 4052 100000)
    (Red Windows Chocolate No Yes Other Apartment 80000 5 7.0 1.00 40 7500 100000)
    (Blue Windows Vanilla No No IE Dorm 90000 5 6.0 0.00 1 0 6963)
    (Green Windows Chocolate No No IE Apartment 0 0 6.0 1.00 20 2500 10200)
    (Blue Linux Vanilla No Yes Other Other 223000 4 7.0 0.50 200 3200 37975)))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
; Here are a few procedures to turn the above data set into a training
; data set.
;
; Basically, you have to select one of the attributes as the goal predicate.
;
; For example, if you want to create a data set where "color" is the
; goal predicate, you could do the following:
;
; (define class-color-data
;   (pick-goal-predicate class-survey-data class-survey-datanames 'color))
; (define class-color-names (delete 'color class-survey-datanames))
;

; Return a copy of lst with the element at (zero-based) index n removed.
(define (delete-nth lst n)
  (append (list-head lst n) (list-tail lst (+ n 1))))

; Turn each example into a two-element list: the value of the goal
; attribute, followed by the list of the remaining attribute values.
(define (pick-goal-predicate data names attribute)
  (let* ((pos (- (length names) (length (member attribute names))))
         (training-data
          (map (lambda (d) (list (list-ref d pos) (delete-nth d pos)))
               data)))
    ; remove any elements with ? as a goal predicate value
    ;
    ; btw, "delete-matching-items" is a built-in MIT Scheme procedure
    ; (not part of standard Scheme, however)
    (delete-matching-items training-data
                           (lambda (td) (equal? (first td) '?)))))

;
; The last 7 attributes have (continuous) numeric values. If you want
; to try this data set before you have the missing-learn-dtree or the
; discretization stuff written, you can just eliminate those
; attributes (which also removes the missing CDs and MP3s values)
; like this:
;
; (define class-survey-ddata
;   (map (lambda (d) (list-head d 7)) class-survey-data))
; (define class-survey-ddatanames
;   (list-head class-survey-datanames 7))
;
; Two examples still have ? for color; pick-goal-predicate removes them
; only when color is chosen as the goal predicate.
;
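;
; A minimal sketch, not part of the original code: keeping only the
; first 7 attributes, as in the example above, still leaves the
; examples whose color is ?. The hypothetical helper
; drop-missing-examples removes every example that contains a ?
; anywhere. It assumes "filter" is available (it is in MIT Scheme and
; in SRFI 1).
;
(define (drop-missing-examples data)
  ; keep only the examples in which no attribute value is the symbol ?
  (filter (lambda (d) (not (memq '? d))) data))
;
; For example, assuming class-survey-ddata has been defined as above:
;
; (define class-survey-cdata
;   (drop-missing-examples class-survey-ddata))
;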