Lesson 11.3 Contingency Tables and Chi-Square Test of Independence
Unit Summary 

Reading Assignment
An Introduction to Statistical Methods and Data Analysis, chapter 10.6.
How to Summarize Two Categorical Variables
So far in our course, we have only discussed measurements taken on one variable for each sampling unit; such data are called univariate. In this lesson, we consider measurements taken on two variables for each sampling unit, which are referred to as bivariate data. We cover bivariate categorical data in this lesson and bivariate quantitative data in Lesson 12.
We start with univariate count (categorical) data.
Univariate Count Data: Only one categorical variable is observed, and the data are summarized by a tally (the number of observations in each category).
There are three daily shifts in a production plant for tires. The tires will be classified as being produced in shift 1, shift 2, or shift 3.
         Shift 1   Shift 2   Shift 3
Count    231       153       116
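A tally like the one above is obtained from raw records by counting how often each category occurs. Here is a minimal Python sketch, assuming the raw data are stored as a list of shift labels. The eight records are hypothetical and are not the plant's data.

```python
from collections import Counter

# Hypothetical raw records: the shift that produced each of eight tires.
# These are illustrative only, not the plant's data.
records = ["Shift 1", "Shift 2", "Shift 1", "Shift 3",
           "Shift 1", "Shift 2", "Shift 3", "Shift 1"]

# A tally is the number of observations falling in each category.
tally = Counter(records)

for shift in sorted(tally):
    print(f"{shift}: {tally[shift]}")
print("Total:", sum(tally.values()))
```

Applied to all 500 tires, the same counting step would yield the row of counts shown in the table.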
Bivariate Count Data: Measurements are taken on two categorical variables, and the data can be summarized in a two-way table, also called an r × c contingency table, where r is the number of rows and c is the number of columns. A common question of interest is whether the two variables are independent.
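To see how an r × c table arises from raw bivariate data, here is a minimal Python sketch that cross-tabulates paired observations by counting each (row, column) combination. The six observations are hypothetical, chosen only to show the mechanics; a combination that never occurs gets a count of 0.

```python
from collections import Counter

# Hypothetical paired observations (row variable, column variable),
# one pair per sampling unit. Illustrative only, not data from the lesson.
pairs = [
    ("democrat", "favor"), ("republican", "opposed"), ("democrat", "favor"),
    ("republican", "indifferent"), ("democrat", "opposed"), ("republican", "opposed"),
]

row_levels = sorted({r for r, _ in pairs})   # r distinct row categories
col_levels = sorted({c for _, c in pairs})   # c distinct column categories
cell_counts = Counter(pairs)                 # missing combinations count as 0

# r x c table: table[i][j] = number of units in row level i and column level j
table = [[cell_counts[(r, c)] for c in col_levels] for r in row_levels]

print("r =", len(row_levels), "c =", len(col_levels))
print("", *col_levels, sep="\t")
for r, row in zip(row_levels, table):
    print(r, *row, sep="\t")
```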
Contingency Tables: Chi-Square Test of Independence
How do we test whether two categorical variables are independent? We use the chi-square test of independence.
Null Hypothesis: The two categorical variables are independent.
Alternative Hypothesis: The two categorical variables are dependent.
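Test statistic:
$$X^{2} = \sum_{\text{all cells}} \frac{(O - E)^{2}}{E}$$
where O represents the observed frequency in a cell and E is the expected frequency under the null hypothesis, computed by
$$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$
Compare the value of the test statistic to the critical value $\chi^{2}_{\alpha}$ of the chi-square distribution with $(r - 1)(c - 1)$ degrees of freedom, and reject the null hypothesis if $X^{2} > \chi^{2}_{\alpha}$.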
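The computation can be sketched in plain Python without any statistics library. The function name chi_square_independence is hypothetical; it returns the test statistic, the degrees of freedom, and the table of expected counts. Applied to the tax reform data in the example that follows, it should reproduce the Minitab value of 22.152 with 2 degrees of freedom.

```python
def chi_square_independence(observed):
    """Return (X^2, df, expected) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # E = (row total)(column total) / grand total, for every cell
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

    # X^2 = sum over all cells of (O - E)^2 / E
    x2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return x2, df, expected


# Political affiliation (rows: democrat, republican) by opinion on the
# tax reform bill (columns: favor, indifferent, opposed)
observed = [[138, 83, 64],
            [64, 67, 84]]
x2, df, expected = chi_square_independence(observed)
print(f"X^2 = {x2:.3f}, df = {df}")
for row in expected:
    print(" ".join(f"{e:7.2f}" for e in row))
```

For tables larger than 2 × 2, SciPy's scipy.stats.chi2_contingency should return the same statistic; for 2 × 2 tables it applies Yates' continuity correction by default, so its value would differ from the formula above.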
A random sample of 500 persons is questioned regarding their political affiliation and their opinion on a tax reform bill. Test whether political affiliation and opinion on the tax reform bill are dependent at the 5% level of significance. The observed contingency table is given below:
             favor      indifferent   opposed    total
democrat     138 ( )*    83 ( )        64 ( )    285
republican    64 ( )     67 ( )        84 ( )    215
total        202        150           148        500
* We usually put the expected counts inside these parentheses. They can be obtained easily from the Minitab output.
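As a worked check of the two formulas, take the democrat/favor cell (row 1, column 1). Its expected count and its contribution to the test statistic match the first entries of the Minitab output below, and the expected counts in the democrat row add back to the row total of 285.

```latex
E_{11} = \frac{(285)(202)}{500} = 115.14, \qquad
\frac{(O_{11} - E_{11})^{2}}{E_{11}} = \frac{(138 - 115.14)^{2}}{115.14} = \frac{522.5796}{115.14} \approx 4.539

E_{12} = \frac{(285)(150)}{500} = 85.50, \qquad
E_{13} = \frac{(285)(148)}{500} = 84.36, \qquad
115.14 + 85.50 + 84.36 = 285
```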
Minitab command:
A. To produce a contingency table (cross tabulation) and perform a chi-square analysis for two categorical variables:
Stat > Tables > Cross Tabulation
Select the variables you want to summarize by double-clicking on them. If one variable can be designated as explanatory and the other as response, it is customary to define the rows using the explanatory variable and the columns using the response variable.
To obtain the chi-square analysis, check the Chi-Square analysis box.
B. To perform a chi-square test of independence from summarized data:
Stat > Tables > Chi-Square Test
Chi-Square Test
Expected counts are printed below observed counts

         favor  indiffer  opposed  Total
    1      138        83       64    285
        115.14     85.50    84.36

    2       64        67       84    215
         86.86     64.50    63.64

Total      202       150      148    500

Chi-Sq = 4.539 + 0.073 + 4.914 +
         6.016 + 0.097 + 6.514 = 22.152
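DF = 2, P-Value = 0.000
Since the p-value is less than 0.05 (equivalently, $X^{2} = 22.152$ exceeds the critical value $\chi^{2}_{0.05} = 5.991$ for 2 degrees of freedom), we reject the null hypothesis and conclude that political affiliation and opinion on the tax reform bill are dependent.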
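Minitab's P-Value = 0.000 means the p-value is below 0.0005. When the degrees of freedom are even, the chi-square upper-tail probability has an exact closed form, P(chi-square with df degrees of freedom > x) = e^(-x/2) × sum over i = 0, ..., df/2 - 1 of (x/2)^i / i!, so both the p-value and, for df = 2, the critical value -2 ln(0.05) can be checked with the Python standard library alone. This is a sketch: the function name chi2_upper_tail_even_df is hypothetical and it refuses odd df. For general df, a library routine such as scipy.stats.chi2.sf (and scipy.stats.chi2.ppf for critical values) would be the usual choice.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x); exact only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    # Sum of (x/2)^i / i! for i = 0, ..., df/2 - 1, built term by term
    term, total = 1.0, 0.0
    for i in range(df // 2):
        if i > 0:
            term *= half / i
        total += term
    return math.exp(-half) * total

x2, df = 22.152, 2                 # statistic and df from the Minitab output
p_value = chi2_upper_tail_even_df(x2, df)
critical = -2 * math.log(0.05)     # solves e^(-c/2) = 0.05, i.e. df = 2, alpha = 0.05
print(f"p-value = {p_value:.6f}")
print(f"critical value = {critical:.3f}, reject H0: {x2 > critical}")
```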
Conditions for Using the Chi-Square Test
Exercise caution when there are small expected counts. Minitab reports the number of cells that have expected frequencies less than five. Some statisticians hesitate to use the chi-square test if more than 20% of the cells have expected frequencies below five, especially if the p-value is small and these cells contribute a large share of the total chi-square value.
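The 20% rule of thumb is easy to check before running the test. Below is a minimal Python sketch; the function name small_expected_cells is hypothetical, and the counts are the tire-inspection data from the example that follows (rows are shifts 1 to 3, columns are perfect, satisfactory, defective).

```python
def small_expected_cells(observed, threshold=5.0):
    """Count cells whose expected count under independence is below threshold."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count for every cell, flattened into one list
    expected = [rt * ct / n for rt in row_totals for ct in col_totals]
    n_small = sum(1 for e in expected if e < threshold)
    return n_small, n_small / len(expected)

observed = [[106, 124, 1],
            [67, 85, 1],
            [37, 72, 3]]
n_small, fraction = small_expected_cells(observed)
print(f"{n_small} cells with expected count < 5 ({fraction:.1%})")
if fraction > 0.20:
    print("More than 20% of cells are small: interpret the test with caution.")
```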
The operations manager of a company that manufactures tires wants to determine whether there are any differences in the quality of workmanship among the three daily shifts. She randomly selects 496 tires and carefully inspects them. Each tire is classified as perfect, satisfactory, or defective, and the shift that produced it is also recorded. The two categorical variables of interest are shift and condition of the tire produced. The data can be summarized in the accompanying two-way table. Do these data provide sufficient evidence at the 5% significance level to infer that there are differences in quality among the three shifts?
          Perfect   Satisfactory   Defective   Total
Shift 1   106 ( )   124 ( )        1 ( )       231
Shift 2    67 ( )    85 ( )        1 ( )       153
Shift 3    37 ( )    72 ( )        3 ( )       112
Total     210       281            5           496

Minitab output:
Chi-Square Test
Expected counts are printed below observed counts

         C1       C2       C3  Total
    1      106      124        1    231
         97.80   130.87     2.33

    2       67       85        1    153
         64.78    86.68     1.54

    3       37       72        3    112
         47.42    63.45     1.13

Total      210      281        5    496

Chi-Sq = 0.687 + 0.361 + 0.758 +
         0.076 + 0.033 + 0.191 +
         2.289 + 1.152 + 3.100 = 8.647
DF = 4, P-Value = 0.071
Note: 3 cells with expected counts less than 5.0.
In the above example, the result is not significant at the 5% significance level (p-value = 0.071 > 0.05). Even if the result were significant, we still could not trust it, because 3 of the 9 cells (33.3%) have expected counts below 5.0, which exceeds the 20% guideline.
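The p-value of 0.071 reported for the tire data can be checked by hand. For 4 degrees of freedom, the chi-square upper-tail probability has the closed form e^(-x/2)(1 + x/2), so:

```latex
P(\chi^{2}_{4} > 8.647) = e^{-8.647/2}\left(1 + \frac{8.647}{2}\right)
= e^{-4.3235}\,(5.3235) \approx 0.0706 > 0.05
```

Equivalently, the statistic 8.647 falls below the 5% critical value of 9.488 for 4 degrees of freedom, so the null hypothesis of independence is not rejected.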