Contingency Table Analysis

Download the data file adult.data.txt. A description of the data and its attributes is available at the UCI Machine Learning Repository: Adult Dataset. The data is a selection from the 1994 Census, and it has 48842 instances over 14 attributes of categorical, real, and integer type.

Compute the contingency matrix for variables education and race, and compute the \(\chi^2\) statistic using your own function, i.e., write a function that takes as input two categorical column-vectors, and returns the \(\chi^2\) value and its p-value. At the 99% confidence level, are education and race dependent?
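
One possible starting point is sketched below. It is a minimal, standard-library-only illustration, not a reference solution. It uses the usual Pearson statistic \(\chi^2 = \sum_{i,j} (O_{ij} - E_{ij})^2 / E_{ij}\), where \(O_{ij}\) is the observed count in cell \((i,j)\), \(E_{ij} = r_i c_j / n\) is the count expected under independence, and the degrees of freedom are \((R-1)(C-1)\). The function names chi2_sf and chi2_test are hypothetical, and the p-value is obtained from a hand-written regularized incomplete gamma routine (series plus continued fraction), which is one assumption about what "your own function" permits.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with df > 0."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    # log of the common factor exp(-x/2) * (x/2)^a / Gamma(a)
    log_prefactor = -half_x + a * math.log(half_x) - math.lgamma(a)
    if half_x < a + 1.0:
        # Series for the regularized lower incomplete gamma; return 1 - P.
        term = total = 1.0 / a
        n = 0
        while abs(term) > abs(total) * 1e-15:
            n += 1
            term *= half_x / (a + n)
            total += term
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz) for the regularized upper incomplete gamma.
    tiny = 1e-300
    b = half_x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return h * math.exp(log_prefactor)


def chi2_test(x, y):
    """Return (chi-square statistic, p-value) for two equal-length categorical sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    observed = Counter(zip(x, y))   # contingency table as {(row, col): count}
    row_totals = Counter(x)
    col_totals = Counter(y)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    if df == 0:
        raise ValueError("each variable needs at least two categories")
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat, chi2_sf(stat, df)
```

To apply this to the assignment, read the education and race columns from adult.data.txt into two lists (assuming the file is comma-separated without a header row, and stripping the whitespace around each field) and pass them to chi2_test. At the 99% confidence level, independence is rejected when the returned p-value is below 0.01.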