Lesson 11.3 - Contingency Tables and Chi-Square Test of Independence

Unit Summary

  • How to Summarize Two Categorical Variables
  • Contingency Tables: Chi-square Test of Independence
  • Condition for Using the Chi-square Test

 

Reading Assignment
An Introduction to Statistical Methods and Data Analysis, chapter 10.6.

 

How to Summarize Two Categorical Variables

So far in this course, we have discussed only measurements taken on one variable for each sampling unit, which is called univariate data. In this lesson, we turn to measurements taken on two variables for each sampling unit, which is referred to as bivariate data. We cover bivariate categorical data in this lesson and bivariate quantitative data in Lesson 12.

We will first start with univariate count (categorical) data.

Univariate Count Data: Only one categorical variable and the data is presented by a tally.

There are three daily shifts in a production plant for tires. The tires will be classified as being produced in shift 1, shift 2, or shift 3.

         Shift 1   Shift 2   Shift 3
Count        231       153       116

Bivariate Count Data: Measurements are taken on two categorical variables, and the data can be summarized by a two-way table, also called an r × c contingency table, where r = number of rows and c = number of columns. One common question of interest is "Are the two variables independent?"

Contingency Tables: Chi-square Test of Independence

How do we test the independence of two categorical variables? We use the chi-square test of independence.

Null Hypothesis: The two categorical variables are independent.

Alternative Hypothesis: The two categorical variables are dependent.

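Test statistic:

$$X^2 = \sum \frac{(O - E)^2}{E}$$

where the sum runs over all cells of the table, O represents the observed frequency, and E is the expected frequency under the null hypothesis, computed by

$$E = \frac{\text{row total} \times \text{column total}}{\text{sample size}}$$

Compare the value of the test statistic to the critical value $\chi^2_\alpha$ with degrees of freedom = (r - 1)(c - 1), and reject the null hypothesis if $X^2 > \chi^2_\alpha$.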
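The calculation can be sketched in a few lines of Python. Each expected count is the row total times the column total divided by the sample size, and the statistic sums (O - E)^2 / E over all cells. The function names and the small 2 x 2 table are made up for illustration and are not part of the lesson or of Minitab.

```python
def expected_counts(observed):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Return (X^2, degrees of freedom) for an r x c table of counts."""
    expected = expected_counts(observed)
    x2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return x2, df


# Hypothetical 2 x 2 table, used only to exercise the functions.
toy_table = [[10, 20], [20, 10]]
toy_expected = expected_counts(toy_table)      # every cell is 15.0
toy_x2, toy_df = chi_square_statistic(toy_table)
print(toy_expected, round(toy_x2, 3), toy_df)
```

The statistic is then compared with the chi-square critical value for the returned degrees of freedom, which would come from a table or from software.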

A random sample of 500 persons is questioned regarding their political affiliation and their opinion on a tax reform bill. Test whether political affiliation and opinion on the tax reform bill are dependent at the 5% level of significance. The observed contingency table is given below:

              favor      indifferent   opposed    total
democrat      138 ( )*   83 ( )        64 ( )     285
republican    64 ( )     67 ( )        84 ( )     215
total         202        150           148        500

* We usually put expected counts inside of these parentheses. They can be easily obtained from Minitab output.

Minitab command:

A. To produce a contingency table (cross tabulation) and perform chi-square analysis for two categorical variables:

Stat > Tables > Cross Tabulation

Select the variables you want to summarize by double-clicking on them. If one variable can be designated as explanatory and the other response, it is customary to define the rows using the explanatory variable and the columns using the response variable.

If you want to have the result of chi-square analysis, then check the Chi-square analysis box.

B. To perform a chi-square test of independence from summarized data:

Stat > Tables > Chi-square Test

Chi-square Test
Expected counts are printed below observed counts

          favor  indiffer   opposed     Total
    1       138        83        64       285
         115.14     85.50     84.36

    2        64        67        84       215
          86.86     64.50     63.64

Total       202       150       148       500

Chi-Sq = 4.539 + 0.073 + 4.914 + 6.016 + 0.097 + 6.514 = 22.152
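DF = 2, P-Value = 0.000

Since the p-value is below 0.05 (equivalently, 22.152 exceeds the critical value of 5.991 for 2 degrees of freedom), we reject the null hypothesis and conclude that political affiliation and opinion on the tax reform bill are dependent.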
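As a cross-check on the Minitab output, the following sketch recomputes the expected counts and the test statistic for the tax reform table in plain Python. It is an illustration, not the lesson's method. The critical value and p-value lines rely on the fact that, for 2 degrees of freedom only, the upper-tail probability of the chi-square distribution is exp(-x/2); other degrees of freedom would need a table or a statistics library.

```python
import math

# Observed counts from the tax reform example.
observed = [
    [138, 83, 64],  # democrat:   favor, indifferent, opposed
    [64, 67, 84],   # republican: favor, indifferent, opposed
]

row_totals = [sum(row) for row in observed]        # 285, 215
col_totals = [sum(col) for col in zip(*observed)]  # 202, 150, 148
n = sum(row_totals)                                # 500

# Expected count for each cell: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Sum of (O - E)^2 / E over all six cells.
x2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (2 - 1)(3 - 1) = 2

# Closed forms below are valid for df = 2 only.
alpha = 0.05
critical_value = -2 * math.log(alpha)  # about 5.991
p_value = math.exp(-x2 / 2)

print([[round(e, 2) for e in row] for row in expected])
print(round(x2, 3), df, round(critical_value, 3), p_value)
print("reject H0" if x2 > critical_value else "fail to reject H0")
```

The p-value is on the order of 0.00002, which Minitab displays as 0.000 after rounding to three decimals.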

Condition for Using the Chi-square Test

Exercise caution when there are small expected counts. Minitab will give a count of the number of cells that have expected frequencies less than five. Some statisticians hesitate to use the chi-square test if more than 20% of the cells have expected frequencies below five, especially if the p-value is small and these cells give a large contribution to the total chi-square value.
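The small-expected-count check that Minitab reports can be sketched as below. The function name and its return values are illustrative choices, and the 5.0 threshold follows the rule of thumb stated above. It is applied here to the tax reform table from the earlier example.

```python
def small_expected_cells(observed, threshold=5.0):
    """Count cells whose expected count under independence is below threshold.

    Returns (number of small cells, fraction of small cells,
    smallest expected count).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [r * c / n for r in row_totals for c in col_totals]
    n_small = sum(1 for e in expected if e < threshold)
    return n_small, n_small / len(expected), min(expected)


# Tax reform table: rows are democrat and republican.
tax_table = [[138, 83, 64], [64, 67, 84]]
n_small, fraction_small, smallest = small_expected_cells(tax_table)
print(n_small, fraction_small, round(smallest, 2))
```

For this table no expected count falls below five (the smallest is 63.64), so the condition is comfortably met.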

The operations manager of a company that manufactures tires wants to determine whether there are any differences in the quality of workmanship among the three daily shifts. She randomly selects 496 tires and carefully inspects them. Each tire is either classified as perfect, satisfactory, or defective, and the shift that produced it is also recorded. The two categorical variables of interest are: shift and condition of the tire produced. The data can be summarized by the accompanying two-way table. Do these data provide sufficient evidence at the 5% significance level to infer that there are differences in quality among the three shifts?

           Perfect    Satisfactory   Defective   Total
Shift 1    106 ( )    124 ( )        1 ( )       231
Shift 2    67 ( )     85 ( )         1 ( )       153
Shift 3    37 ( )     72 ( )         3 ( )       112
Total      210        281            5           496

Minitab output:

Chi-square Test
Expected counts are printed below observed counts

             C1        C2        C3     Total
    1       106       124         1       231
          97.80    130.87      2.33

    2        67        85         1       153
          64.78     86.68      1.54

    3        37        72         3       112
          47.42     63.45      1.13

Total       210       281         5       496

Chi-Sq = 0.687 + 0.076 + 2.289 + 0.361 + 0.033 + 1.152 + 0.758 + 0.191 + 3.100 = 8.647
DF = 4, P-Value = 0.071

Note: 3 cells with expected counts less than 5.0.

In the above example, the result is not significant at the 5% significance level. Even if it had been significant, we still could not trust it, because 3 of the 9 cells (33.3%) have expected counts below 5.0.
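To see how much the sparse cells matter in the tire example, the sketch below recomputes each cell's contribution in plain Python and reports the share of the chi-square total that comes from cells with expected counts below 5.0. The variable names are illustrative. The p-value line uses the closed form exp(-x/2)(1 + x/2), which holds for 4 degrees of freedom only.

```python
import math

# Observed counts: rows are shifts 1-3; columns are
# perfect, satisfactory, defective.
observed = [
    [106, 124, 1],
    [67, 85, 1],
    [37, 72, 3],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)  # 496

# Pair every observed count with its expected count under independence.
cells = [
    (o, r * c / n)
    for row, r in zip(observed, row_totals)
    for o, c in zip(row, col_totals)
]
contributions = [(o - e) ** 2 / e for o, e in cells]
x2 = sum(contributions)

# Cells that violate the rule of thumb, and their share of the statistic.
is_small = [e < 5.0 for _, e in cells]
n_small = sum(is_small)
small_share = sum(c for c, s in zip(contributions, is_small) if s) / x2

# Upper-tail probability of the chi-square distribution, df = 4 only.
p_value = math.exp(-x2 / 2) * (1 + x2 / 2)

print(round(x2, 3), n_small, len(cells))
print(round(small_share, 3), round(p_value, 3))
```

The three defective-column cells account for a bit under half of the total statistic, which is the situation the caution above describes: a few cells with small expected counts driving much of the chi-square value.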


© 2007 The Pennsylvania State University. All rights reserved.