; This is the data from the class survey.
;
; It also includes the data from last year's class, which took the
; same survey. Note that the first attribute in the dataset is the
; year the survey was taken.
;
; There were 14 questions, some multiple-choice and some with numeric
; answers. Please note that this is just a "data set" and is not in
; the proper format for "training data". This is discussed below.
;
; Below are the questions and possible answers from the survey. (The
; wording here may not match the survey exactly.) I did some editing
; of the data set, as described in my comments below. The numeric
; ranges listed below match the 2004 answers. Several 2005 answers
; fall outside them (for example, negative Bed values, MP3s above
; 10000, and DiskSpace values below 264 or above 200000).
;
; color = Which (of the following) is your favorite color?
;   Answers: red, blue, green, yellow, plaid
;
;   Almost everyone chose one of the first three, so yellow and plaid
;   were turned into missing values (i.e., ?).
;
; OS = Which operating system do you use?
;   Answers: Linux, Windows, MacOS, Other
;
; IceCream = Which of the following is your favorite ice cream flavor?
;   Answers: Vanilla, Chocolate, Strawberry
;
; Vote = Did you vote in the last student election?
;   Answers: Yes, No
;
; Coffee = Do you drink coffee?
;   Answers: Yes, No
;
; Browser = Which Web browser do you use?
;   Answers: Mozilla, IE, Safari, Other
;
; Living = Where do you live?
;   Answers: Dorm, Apartment, Fraternity, Other
;
; Milage = How many miles do you have on your car?
;   Numeric answer (integers, range 0 to 223000)
;
;   Presumably the people who answered 0 don't have a car. One
;   person answered 1; I changed this to a 0.
;
; Pizza = How many slices of pizza did you eat last week?
;   Numeric answer (integers, range 0 to 16)
;
; Sleep = What is the average number of hours of sleep you get on
;         Sun-Thurs nights?
;   Numeric answer (reals, range 4.0 to 10.0)
;
;   This should have asked "per night", though most people
;   (presumably) gave that number. I changed a 40 to an 8.
;
; Bed = What time did you go to bed last night? (HH:MM)
;   Numeric answer (reals, range 0.0 to 5.75)
;
;   I converted the answer (in a time format) into the number of
;   hours (a real number) past midnight. In the 2004 answers, no one
;   reported going to bed before midnight.
;
; CDs = How many audio CDs do you have, not including CD-Rs?
;   Numeric answer (integers, range 0 to 1000)
;
; MP3s = How many mp3s do you have?
;   Numeric answer (integers, range 0 to 10000)
;
; DiskSpace = How much free space (in megabytes) is on your hard drive?
;   Numeric answer (integers, range 264 to 200000)
;
;   I made a number of changes to these answers. Several people
;   apparently answered in bytes or kilobytes; at least, I don't think
;   any student has a 1-petabyte drive. I divided any answer above
;   200,000 by 1000 until it was at most 200,000. Conversely, some
;   people seem to have very little space on their hard drives;
;   probably they answered in gigabytes. I multiplied answers of 50
;   and under by 1000. I also rounded answers to the nearest integer.
;
;
; See below for code and information on turning this into a training
; data set.
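;
; A minimal sketch of the DiskSpace clean-up rules described above,
; assuming each raw answer is a non-negative exact integer. The name
; normalize-disk-space is hypothetical, and this is not necessarily
; the code that was used to edit the data. Answers above 200000 are
; divided by 1000 until they are at most 200000. Answers of 50 and
; under are multiplied by 1000. The result is then rounded. Scheme's
; round sends exact halves to the nearest even integer. A value above
; 200000 is still above 200 after one division, so the two rules
; never interact.
;
(define (normalize-disk-space mb)
  (let loop ((x mb))
    (cond ((> x 200000) (loop (/ x 1000)))   ; probably bytes or kilobytes
          ((<= x 50) (round (* x 1000)))     ; probably gigabytes
          (else (round x)))))
;
; Example: (normalize-disk-space 282624) => 283
;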
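;
; A minimal sketch of the Bed conversion described above, assuming
; each raw answer is a string such as "02:30". The name hhmm->hours is
; hypothetical. It returns the number of hours past midnight as a real
; number, so "02:30" gives 2.5. Treating 12:00 and later as the
; previous evening (a negative value, so "23:00" gives -1.0) is an
; assumption added for illustration. It is not part of the original
; edits.
;
(define (hhmm->hours str)
  (let loop ((i 0))
    (cond ((= i (string-length str))
           (error "hhmm->hours: expected HH:MM, got" str))
          ((char=? (string-ref str i) #\:)
           (let ((hours (+ (string->number (substring str 0 i))
                           (/ (string->number
                               (substring str (+ i 1) (string-length str)))
                              60.0))))
             ; Assumption: 12:00 or later is taken to mean the evening
             ; before, so it becomes a negative number of hours.
             (if (>= hours 12) (- hours 24) hours)))
          (else (loop (+ i 1))))))
;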
;

(define class-survey-datanames
  '(year color OS IceCream Vote Coffee Browser Living
    Milage Pizza Sleep Bed CDs MP3s DiskSpace))

(define class-survey-data
  '((2004 Red Linux Chocolate No No Mozilla Dorm 0 2 6.0 2.50 30 875 746)
    (2004 Red Windows Chocolate No No Mozilla Apartment 0 10 6.0 2.50 3 2056 61440)
    (2004 Blue Windows Vanilla No No IE Fraternity 102345 0 7.0 3.50 12 2500 4300)
    (2004 Blue MacOS Chocolate No No Safari Apartment 91000 5 5.0 2.00 90 1250 24860)
    (2004 Green Linux Chocolate Yes No Mozilla Apartment 88000 4 7.0 2.50 20 2737 42670)
    (2004 Green Windows Vanilla No Yes Mozilla Dorm 120000 16 9.0 0.58 75 2981 4980)
    (2004 Blue Windows Chocolate No No Mozilla Dorm 160000 2 8.0 0.75 160 38 17400)
    (2004 Blue Windows Chocolate Yes No IE Apartment 35000 10 7.0 0.00 20 500 20275)
    (2004 Blue MacOS Vanilla Yes No Safari Apartment 5600 6 8.5 3.50 85 3300 20000)
    (2004 Blue Windows Vanilla No Yes IE Dorm 0 3 5.0 3.50 0 1000 20000)
    (2004 Blue Windows Chocolate No Yes IE Other 600 3 7.0 2.00 5 1000 30000)
    (2004 Green Windows Strawberry No No Mozilla Dorm 89885 0 8.0 3.50 25 225 13210)
    (2004 Blue Windows Strawberry No Yes Other Apartment 89630 4 6.0 2.00 7 0 10957)
    (2004 Red Windows Vanilla No Yes IE Apartment 47000 4 4.0 5.00 50 7000 110000)
    (2004 Green Windows Vanilla Yes No Mozilla Dorm 41000 12 6.0 4.00 30 2725 50000)
    (2004 Red Linux Vanilla Yes No Mozilla Dorm 0 4 6.0 4.00 30 10000 8100)
    (2004 Red Windows Chocolate Yes No IE Apartment 54453 3 8.0 0.00 2 1357 36100)
    (2004 Blue Windows Strawberry No No IE Apartment 68000 2 10.0 0.00 50 679 22600)
    (2004 Red Windows Vanilla No Yes Mozilla Apartment 0 0 7.0 1.00 1 1000 5816)
    (2004 Blue Windows Vanilla No No IE Apartment 150000 0 5.0 5.00 200 6000 200000)
    (2004 Red Windows Vanilla No Yes IE Apartment 147000 4 7.0 2.50 ? ? 34000)
    (2004 Blue Windows Chocolate No Yes Other Apartment 1000 6 6.0 5.00 200 4000 300)
    (2004 Blue Windows Chocolate No No IE Dorm 0 4 5.0 3.50 0 278 49357)
    (2004 Red Windows Chocolate No Yes Mozilla Dorm 33000 6 6.0 4.00 6 1000 2500)
    (2004 Blue Linux Vanilla No No Mozilla Other 189000 0 4.0 3.00 1000 12 1100)
    (2004 Blue Linux Chocolate No Yes Other Apartment 69000 10 5.0 5.75 100 900 20000)
    (2004 Blue Windows Vanilla No No IE Dorm 0 2 8.0 3.00 0 80 22100)
    (2004 Green MacOS Vanilla Yes No Mozilla Apartment 135347 4 5.0 3.00 130 7500 62000)
    (2004 ? MacOS Vanilla No No Other Dorm 0 5 6.0 2.00 14 449 53750)
    (2004 Green Windows Chocolate No Yes Mozilla Apartment 113000 15 7.0 0.50 200 1200 76000)
    (2004 Blue Windows Chocolate Yes No IE Dorm 0 5 6.0 5.00 20 2000 1400)
    (2004 Red Windows Chocolate Yes Yes IE Fraternity 0 4 7.0 4.17 80 3300 37400)
    (2004 Green Windows Strawberry No No IE Fraternity 50000 5 6.0 3.50 0 1700 6440)
    (2004 ? Windows Chocolate Yes Yes Mozilla Apartment 75000 4 6.0 2.00 120 1328 5090)
    (2004 Red Linux Chocolate Yes No Mozilla Dorm 89400 4 6.0 3.00 20 3300 36000)
    (2004 Blue Windows Vanilla Yes Yes Mozilla Fraternity 34500 0 10.0 3.00 10 4396 264)
    (2004 Red Windows Chocolate Yes No Mozilla Fraternity 10870 5 5.0 4.42 300 1000 60000)
    (2004 Green Windows Vanilla Yes No IE Apartment 105000 3 8.0 1.50 100 4052 100000)
    (2004 Red Windows Chocolate No Yes Other Apartment 80000 5 7.0 1.00 40 7500 100000)
    (2004 Blue Windows Vanilla No No IE Dorm 90000 5 6.0 0.00 1 0 6963)
    (2004 Green Windows Chocolate No No IE Apartment 0 0 6.0 1.00 20 2500 10200)
    (2004 Blue Linux Vanilla No Yes Other Other 223000 4 7.0 0.50 200 3200 37975)
    (2005 Red Linux Vanilla No No Mozilla Dorm 89000 5 6.5 2.5 35 4000 70000)
    (2005 Green Windows ? No No Mozilla Dorm 0 0 7 2.0 30 700 33900)
    (2005 ? Linux Strawberry Yes Yes Mozilla Dorm 0 8 7 2.5 82 9000 1700)
    (2005 ? Windows Strawberry No No Mozilla Apartment 0 2 4 4.0 7 1738 167)
    (2005 Blue Windows Vanilla No Yes Mozilla Dorm 90000 2 6.5 2.0 25 1800 25000)
    (2005 Blue Windows Chocolate No Yes IE Apartment 107000 2 6.3 -1.0 30 15000 30000)
    (2005 Blue Windows Chocolate No Yes IE Apartment 205000 8 8 2.5 35 14000 5000)
    (2005 Blue Windows Chocolate No No Mozilla Dorm 50 4 7 2.5 10 100 11366)
    (2005 Blue Linux Vanilla No Yes Mozilla Apartment 65000 4 7 4.0 30 4000 266)
    (2005 Blue Windows Vanilla No Yes Mozilla Apartment ? 11 6.5 5.3 3 12483 21811)
    (2005 Blue Windows Vanilla No Yes Mozilla Dorm 123000 4 7.5 0.5 26 3297 96200)
    (2005 Blue Windows Chocolate Yes No Mozilla Apartment 172000 2 8 3.0 100 14000 41964)
    (2005 Green Linux Chocolate Yes No Other Dorm 0 1 5.5 3.0 31 500 40000)
    (2005 Red Windows Vanilla Yes No Mozilla Apartment 193000 5 7 4.0 20 100 31085)
    (2005 Red Windows Vanilla Yes Yes IE Dorm 0 0 4 2.5 25 3000 2000)
    (2005 Green Windows Vanilla Yes Yes Mozilla Apartment 0 5 7 -0.5 25 10000 50000)
    (2005 Blue Linux Vanilla No No Mozilla Apartment 0 0 8 2.0 18 ? 13600)
    (2005 Blue Windows Chocolate Yes Yes Mozilla Dorm 0 2 7 0.0 15 900 3000)
    (2005 Blue Windows Chocolate No Yes Mozilla Dorm 26000 4 6 -6.5 350 100 512)
    (2005 Red MacOS Vanilla No No Other Apartment 97000 2 8 2.5 10 230 236300)
    (2005 Blue Linux Vanilla Yes No Mozilla Dorm 55000 5 8 2.0 50 11846 282624)
    (2005 Green Windows Chocolate No No Mozilla Fraternity 100000 10 8 4.5 10 10000 20000)
    (2005 Blue Windows Vanilla Yes Yes Mozilla Fraternity 98000 2 6 3.5 112 2093 120000)
    (2005 Blue Windows Vanilla Yes No IE Dorm 115000 2 8 -1.0 22 1650 6730)
    (2005 Green Windows Vanilla No No Mozilla Apartment 49000 1 6 1.5 40 10500 10342)
    (2005 Blue Windows Chocolate Yes Yes Mozilla Dorm 90000 0 6 4.25 20 1302 16400)
    (2005 Blue Windows Strawberry No Yes Mozilla Other 80000 10 6 4.0 30 200 3113)
    (2005 Green Windows Strawberry No No Other Dorm 0 5 4.4 5.5 5 ? 833)
    (2005 Blue Windows Vanilla No No IE Apartment 160000 0 6 3.0 50 1615 2273)
    (2005 Blue MacOS Vanilla No No Safari Apartment 30000 4 10 5.0 0 3500 8360)
    (2005 Red Linux Chocolate No Yes Mozilla Apartment 170000 3 7 4.0 40 2000 2400)
    (2005 Blue Windows Vanilla No No Other Dorm 0 10 8 5.0 4 800 400)
    (2005 Red Windows Chocolate No No Mozilla Dorm 0 1 6 4.0 1 230 6800)))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
; Here are a few procedures to turn the above data set into a training
; data set.
;
; Basically, you have to select one of the attributes as the goal
; predicate.
;
; For example, if you want to create a data set where "color" is the
; goal predicate, you could do the following:
;
; (define class-color-data
;   (pick-goal-predicate class-survey-data class-survey-datanames 'color))
; (define class-color-names (delete 'color class-survey-datanames))
;

(define (delete-nth lst n)
  ; Return a copy of LST without its Nth element (0-based).
  (append (list-head lst n) (list-tail lst (+ n 1))))

(define (pick-goal-predicate data names attribute)
  (if (not (member attribute names))
      (error "pick-goal-predicate: unknown attribute" attribute))
  ; POS is the 0-based position of ATTRIBUTE in NAMES. Each example
  ; becomes a two-element list: (goal-value other-attribute-values).
  (let* ((pos (- (length names) (length (member attribute names))))
         (training-data
          (map (lambda (d) (list (list-ref d pos) (delete-nth d pos)))
               data)))
    ; Remove any elements with ? as a goal predicate value.
    ;
    ; BTW, "delete-matching-items" is a built-in MIT Scheme procedure
    ; (not part of standard Scheme, however).
    (delete-matching-items training-data
                           (lambda (td) (equal? (first td) '?)))))

;
; The last 7 attributes have (continuous) numeric values. If you want
; to try this data set before you have the missing-learn-dtree or the
; discretization stuff written, you can just eliminate those
; attributes by keeping the first 8 (year through Living):
;
; (define class-survey-ddata
;   (map (lambda (d) (list-head d 8)) class-survey-data))
; (define class-survey-ddatanames
;   (list-head class-survey-datanames 8))
;
; This does not remove examples that still have missing values (?)
; in the remaining attributes. pick-goal-predicate drops only the
; examples whose goal value is ?, so any others must be filtered out
; separately.
;
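
;
; pick-goal-predicate relies on MIT Scheme's list-head, first, and
; delete-matching-items. The following is a hedged sketch for other
; Scheme systems that expresses the same transformation with standard
; procedures only. portable-pick-goal-predicate is a hypothetical name.
; Like pick-goal-predicate, it returns (goal-value
; other-attribute-values) pairs and drops examples whose goal value
; is ?. It does this in a single pass, skipping those examples as it
; goes instead of mapping first and deleting afterwards.
;
(define (portable-pick-goal-predicate data names attribute)
  ; 0-based position of ITEM in LST, or #f if it does not occur.
  (define (index-of item lst i)
    (cond ((null? lst) #f)
          ((equal? item (car lst)) i)
          (else (index-of item (cdr lst) (+ i 1)))))
  ; Copy of LST without its Nth element (0-based).
  (define (without-nth lst n)
    (if (= n 0)
        (cdr lst)
        (cons (car lst) (without-nth (cdr lst) (- n 1)))))
  (let ((pos (index-of attribute names 0)))
    (if (not pos)
        (error "portable-pick-goal-predicate: unknown attribute" attribute))
    (let loop ((rest data) (acc '()))
      (cond ((null? rest) (reverse acc))
            ; Skip examples whose goal value is missing (?).
            ((equal? (list-ref (car rest) pos) '?)
             (loop (cdr rest) acc))
            (else
             (loop (cdr rest)
                   (cons (list (list-ref (car rest) pos)
                               (without-nth (car rest) pos))
                         acc)))))))
;
; Usage:
; (portable-pick-goal-predicate class-survey-data class-survey-datanames 'color)
;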
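
;
; The class-survey-ddata example above drops only the numeric
; attributes. The following is a minimal sketch of the other step:
; removing every example that still contains a ? anywhere.
; remove-missing-examples is a hypothetical name. It keeps the
; surviving examples in their original order.
;
(define (remove-missing-examples data)
  (let loop ((rest data) (acc '()))
    (cond ((null? rest) (reverse acc))
          ((memq '? (car rest)) (loop (cdr rest) acc))  ; contains ?: drop it
          (else (loop (cdr rest) (cons (car rest) acc))))))
;
; Usage: (remove-missing-examples class-survey-ddata)
;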