# Relationships Between Two Variables

### Comparing Two Categorical Variables

Some categorical variables exist naturally (e.g. a person’s race, political party affiliation, or class standing), while others are created by grouping a quantitative variable (e.g. taking height and creating the groups Short, Medium, and Tall). We analyze categorical data by recording the counts or percents of cases occurring in each category. Although you can compare several categorical variables, we will consider only the relationship between two such variables.
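As a small illustration of creating a categorical variable from a quantitative one, the sketch below groups heights into Short, Medium, and Tall and then records the count and percent in each category. The cutoffs (64 and 70 inches), the `height_group` helper, and the height values are all assumptions made up for this example; they do not come from the Class Survey data.

```python
from collections import Counter


def height_group(height_in):
    """Map a height in inches to a category (cutoffs are illustrative assumptions)."""
    if height_in < 64:
        return "Short"
    if height_in < 70:
        return "Medium"
    return "Tall"


# Made-up heights in inches, for illustration only.
heights = [61, 66, 72, 69, 63, 75, 67]

# Turn the quantitative variable into a categorical one.
groups = [height_group(h) for h in heights]

# Summarize the categorical variable by counts and percents per category.
counts = Counter(groups)
percents = {group: 100 * n / len(groups) for group, n in counts.items()}

for group in ("Short", "Medium", "Tall"):
    print(f"{group}: count={counts[group]}, percent={percents[group]:.1f}%")
```

Different cutoffs would produce different groups, which is why a grouped variable should always be reported together with the rule used to create it.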

#### Example

The Class Survey data set (CLASS_SURVEY.MTW or CLASS_SURVEY.XLS) consists of student responses to a survey given last semester in a Stat200 course. We can construct a two-way table showing the relationship between Smoke Cigarettes (row variable) and Gender (column variable) using either Minitab or SPSS.

To create a two-way table in Minitab:

1. Open the Class Survey data set.
2. From the menu bar select Stat > Tables > Cross Tabulation and Chi-Square
3. In the text box For Rows enter the variable Smoke Cigarettes and in the text box For Columns enter the variable Gender
4. Under Display be sure the box is checked for Counts (should be already checked as this is the default display in Minitab).
5. Click OK

To create a two-way table in SPSS:

1. Import the data set
2. From the menu bar select Analyze > Descriptive Statistics > Crosstabs
3. Click on variable Smoke Cigarettes and enter this in the Rows box.
4. Click on variable Gender and enter this in the Columns box.
5. Click OK

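This should result in a two-way table of counts like the following (the cell counts shown are those implied by the totals quoted in this example):

| Smoke Cigarettes | Female | Male | All |
|---|---|---|---|
| No | 120 | 89 | 209 |
| Yes | 7 | 10 | 17 |
| All | 127 | 99 | 226 |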

The marginal distribution along the bottom (the row All) gives the distribution of Gender alone (disregarding Smoke Cigarettes). The marginal distribution on the right (the column All) gives the distribution of Smoke Cigarettes alone (disregarding Gender). Since more females (127) than males (99) participated in the survey, we should report percentages instead of counts in order to compare the cigarette smoking behavior of females and males. These percentages give the conditional distribution of Smoke Cigarettes given Gender, which treats Gender as an explanatory variable (i.e. a variable that we use to explain what is happening with another variable). Each conditional percentage is calculated by taking the number of observations at a level of Smoke Cigarettes (No, Yes) within a level of Gender (Female, Male) and dividing by the total number of observations at that level of Gender. For example, the conditional percentage of No given Female is 120/127 = 94.5%.
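The arithmetic behind the marginal totals and the conditional percentages can be sketched in plain Python. Only the count of 120 (No, Female) and the totals 127, 99, and 209 are quoted in the text; the other three cell counts below are derived from those totals and should be read as an assumption, and the variable names are made up for this sketch.

```python
# Cell counts keyed by (Smoke Cigarettes, Gender).
# 120 is quoted in the text; 89, 7, and 10 are derived from the quoted
# totals (127 females, 99 males, 209 "No" responses) and are an assumption.
counts = {
    ("No", "Female"): 120, ("No", "Male"): 89,
    ("Yes", "Female"): 7, ("Yes", "Male"): 10,
}
smoke_levels = ["No", "Yes"]
genders = ["Female", "Male"]

# Marginal distribution of Gender (the bottom row "All").
gender_totals = {g: sum(counts[(s, g)] for s in smoke_levels) for g in genders}

# Marginal distribution of Smoke Cigarettes (the right-hand column "All").
smoke_totals = {s: sum(counts[(s, g)] for g in genders) for s in smoke_levels}

grand_total = sum(counts.values())

# Conditional percent of Smoke Cigarettes given Gender:
# cell count divided by the total for that gender.
pct_given_gender = {
    (s, g): 100 * counts[(s, g)] / gender_totals[g]
    for s in smoke_levels
    for g in genders
}

print("Gender totals:", gender_totals)
print("Smoke totals:", smoke_totals)
print("Grand total:", grand_total)
print(f"No given Female: {pct_given_gender[('No', 'Female')]:.1f}%")
```

Dividing by the gender total rather than the grand total is what makes the percentages comparable across groups of different sizes.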

We can calculate these conditional percentages using either Minitab or SPSS:

To calculate these conditional percentages using Minitab:

1. Open the Class Survey data set.
2. From the menu bar select Stat > Tables > Cross Tabulation and Chi-Square
3. In the text box For Rows enter the variable Smoke Cigarettes and in the text box For Columns enter the variable Gender
4. Under Display be sure the box is checked for Counts and also check the box for Column Percents.
5. Click OK

To calculate these conditional percentages using SPSS:

1. Import the data set
2. From the menu bar select Analyze > Descriptive Statistics > Crosstabs
3. Click on variable Smoke Cigarettes and enter this in the Rows box.
4. Click on variable Gender and enter this in the Columns box.
5. Click the tab labeled Cells and select Column under Percentages.
6. Click Continue
7. Click OK

This should result in the following two-way table with column percents:

Although you do not need the counts, having them visible helps show how the conditional probabilities of smoking behavior within gender are calculated. We can see from this display that the 94.49% conditional probability of No given that Gender is Female is found by dividing the number who are both No and Female (count of 120) by the number of Females (count of 127). The data under Cell Contents tells you what is displayed in each cell: the top value is Count and the bottom value is Percent of Column. Alternatively, we could compute the conditional probabilities of Gender given Smoke Cigarettes by calculating the Row Percents; for example, 120 divided by 209 gives 57.42%. This is interpreted as follows: of those who say they do not smoke, 57.42% are Female, which means that 42.58% are Male (found by 100% − 57.42%).
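To make the difference between column percents and row percents concrete, here is a minimal sketch with a hypothetical helper, `conditional_percents`, that divides each cell by the total of whichever variable is being conditioned on. As before, only the count of 120 and the totals 127, 99, and 209 appear in the text; the remaining cell counts are derived from those totals and are an assumption.

```python
def conditional_percents(counts, given):
    """Percent of each cell within the totals of the conditioning variable.

    counts maps (smoke, gender) pairs to cell counts.
    given=0 conditions on Smoke Cigarettes (row percents);
    given=1 conditions on Gender (column percents).
    """
    totals = {}
    for key, n in counts.items():
        totals[key[given]] = totals.get(key[given], 0) + n
    return {key: 100 * n / totals[key[given]] for key, n in counts.items()}


# 120 is quoted in the text; 89, 7, and 10 are derived from the quoted
# totals (127 females, 99 males, 209 "No" responses) and are an assumption.
counts = {
    ("No", "Female"): 120, ("No", "Male"): 89,
    ("Yes", "Female"): 7, ("Yes", "Male"): 10,
}

column_pct = conditional_percents(counts, given=1)  # Smoke given Gender
row_pct = conditional_percents(counts, given=0)     # Gender given Smoke

print(f"No given Female: {column_pct[('No', 'Female')]:.2f}%")
print(f"Female given No: {row_pct[('No', 'Female')]:.2f}%")
print(f"Male given No:   {row_pct[('No', 'Male')]:.2f}%")
```

The same cell count of 120 yields two different percentages depending on the denominator, so it is worth stating which variable is being conditioned on whenever a percentage is reported.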