### Constructing confidence intervals to estimate a population proportion

NOTE: The interval calculations below depend on the following assumptions being satisfied: np ≥ 10 and n(1-p) ≥ 10. If p is unknown, use the sample proportion in its place.

The goal is to estimate p = proportion with a particular trait or opinion in a population.

- Sample statistic = $\hat{p}$ (read "p-hat") = proportion of observed sample with the trait or opinion we’re studying.
- Standard error of $\hat{p}$ = $\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$, where *n* = sample size.
- Multiplier comes from this table:

| Confidence Level | Multiplier |
|---|---|
| .90 (90%) | 1.645 or 1.65 |
| .95 (95%) | 1.96, usually rounded to 2 |
| .98 (98%) | 2.33 |
| .99 (99%) | 2.58 |

The value of the **multiplier increases** as the **confidence level increases**. This leads to **wider** intervals for **higher confidence** levels. We are **more confident** of catching the **population** value when we use a **wider** interval.
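The multipliers in the table can be reproduced numerically. The sketch below assumes they are standard normal quantiles that leave half of the leftover probability in each tail, which agrees with the tabulated values; the function name `multiplier` is illustrative only.

```python
from statistics import NormalDist


def multiplier(confidence_level):
    """Return the standard normal multiplier for a two-sided interval.

    Assumes (1 - confidence_level) / 2 of the probability lies in each tail.
    """
    upper_cutoff = 1 - (1 - confidence_level) / 2
    return NormalDist().inv_cdf(upper_cutoff)


# Higher confidence levels give larger multipliers, hence wider intervals.
for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.2f} -> {multiplier(level):.3f}")
```

Rounded to two decimal places, the results match the table entries (1.96 for 95%, 2.33 for 98%, 2.58 for 99%).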

**Example**

In the year 2001 Youth Risk Behavior survey done by the U.S. Centers for Disease Control, 747 out of *n* = 1168 female 12*th* graders said they always use a seatbelt when driving.

**Goal**: Estimate the proportion always using a seatbelt when driving in the population of all U.S. 12*th* grade female drivers.

**Check assumption**: (1168) × (0.64) ≈ 747 and (1168) × (0.36) ≈ 421, both of which are at least 10.

Sample statistic is $\hat{p}$ = 747 / 1168 = .64

Standard error = $\sqrt{\dfrac{.64(1-.64)}{1168}}$ = .014

A 95% confidence interval estimate is .64 ± 2 (.014), which is .612 to .668

With 95% confidence, we estimate that between .612 (61.2%) and .668 (66.8%) of all 12*th* grade female drivers always wear their seatbelt when driving.
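The calculation above (sample statistic ± multiplier × standard error) can be sketched in a few lines. The function name `proportion_ci` and its arguments are hypothetical, chosen for illustration; the assumption check uses the sample proportion in place of the unknown p, as the note at the top of this section suggests.

```python
from math import sqrt


def proportion_ci(successes, n, multiplier):
    """Return (low, high) for a confidence interval on a population proportion."""
    p_hat = successes / n
    # Assumption check: n*p >= 10 and n*(1-p) >= 10, with p_hat standing in for p.
    if n * p_hat < 10 or n * (1 - p_hat) < 10:
        raise ValueError("need at least 10 observations with and without the trait")
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = multiplier * standard_error
    return p_hat - margin, p_hat + margin


# Seatbelt example: 747 of 1168 female 12th graders, multiplier 2 for 95%.
low, high = proportion_ci(747, 1168, multiplier=2)
print(f"{low:.3f} to {high:.3f}")
```

Because the code does not round the sample proportion or the standard error along the way, its endpoints can differ from the hand calculation in the third decimal place.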

Example Continued: For the seatbelt wearing example, a 99% confidence interval for the population proportion is

.64 ± 2.58 (.014), which is .64 ± .036, or .604 to .676.

With 99% confidence, we estimate that between .604 (60.4%) and .676 (67.6%) of all 12*th* grade female drivers always wear their seatbelt when driving.

Notice that the 99% confidence interval is slightly wider than the 95% confidence interval. In the same situation, the greater the confidence level, the wider the interval.

Notice also that only the value of the multiplier differed in the calculations of the 95% and 99% intervals.

#### Using Confidence Intervals to Compare Groups

A somewhat informal method for comparing two or more populations is to compare confidence intervals for the value of a parameter. If the confidence intervals do not overlap, it is reasonable to conclude that the parameter value differs for the two populations.

**Example**

In the Youth Risk Behavior survey, 677 out of *n* = 1356 12*th* grade males said they always wear a seatbelt. To begin, we’ll calculate a 95% confidence interval estimate of the population proportion.

**Check assumption**: (1356) × (0.499) ≈ 677 and (1356) × (0.501) ≈ 679, both of which are at least 10.

Sample statistic is $\hat{p}$ = 677 / 1356 = .499

Standard error = $\sqrt{\dfrac{.499(1-.499)}{1356}}$ = .0136

A 95% confidence interval estimate, calculated as Sample statistic ± multiplier × Standard Error, is

.499 ± 2 (.0136), or .472 to .526.

With 95% confidence, we estimate that between .472 (47.2%) and .526 (52.6%) of all 12*th* grade male drivers always wear their seatbelt when driving.

*Comparison and Conclusion*: For females, the 95% confidence interval estimate of the percent always wearing a seatbelt was found to be 61.2% to 66.8%, an interval that lies entirely above the interval for males (47.2% to 52.6%), so the two do not overlap. It’s reasonable to conclude that 12*th* grade males and females differ with regard to frequency of wearing a seatbelt when driving.
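The informal comparison can be expressed as an overlap check on the two intervals. This is only a sketch of the informal method described in this section, not a formal two-sample test; `proportion_ci` and `intervals_overlap` are hypothetical helper names, and the multiplier of 2 follows the rounding used in the examples.

```python
from math import sqrt


def proportion_ci(successes, n, multiplier=2.0):
    """Return (low, high) as sample proportion +/- multiplier * standard error."""
    p_hat = successes / n
    margin = multiplier * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def intervals_overlap(a, b):
    """Return True if intervals a and b share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


females = proportion_ci(747, 1168)  # female 12th graders
males = proportion_ci(677, 1356)    # male 12th graders

# No overlap suggests the population proportions differ.
print(intervals_overlap(females, males))
```

For the seatbelt data the check reports no overlap, in line with the conclusion above.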

#### Using Confidence Intervals to "test" how parameter value compares to a specified value

Values in a confidence interval are "acceptable" possibilities for the true population value. Values not in the confidence interval are not acceptable (reasonable) possibilities for the population value.

**Example**

The 95% confidence interval estimate of percent of 12*th* grade females who always wear a seatbelt is 61.2% to 66.8%. Any percent in this interval is an acceptable guess at the population value.

This has the consequence that it’s safe to say that a majority (more than 50%) of this population always wears their seatbelt (because all values 50% and below can be rejected as possibilities.)

If somebody claimed that 75% of all 12*th* grade females always used a seatbelt, we should reject that assertion. The value 75% is not within our confidence interval.
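The same reasoning can be written as a simple membership check. The sketch below takes the interval endpoints from the example; the function name `is_plausible` is illustrative, not a standard term.

```python
def is_plausible(claimed_value, interval):
    """Return True if the claimed value lies inside the confidence interval."""
    low, high = interval
    return low <= claimed_value <= high


# 95% interval for 12th grade females who always wear a seatbelt.
female_ci = (0.612, 0.668)

for claim in (0.50, 0.65, 0.75):
    verdict = "acceptable" if is_plausible(claim, female_ci) else "rejected"
    print(f"claim of {claim:.0%}: {verdict}")

# A majority is supported when the whole interval sits above 50%.
print(female_ci[0] > 0.50)
```

Claims of 50% and 75% fall outside the interval and are rejected, while 65% falls inside it and remains an acceptable possibility.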

#### Finding sample size for estimating a population proportion

When one begins a study to estimate a population parameter, they typically have an idea as to how confident they want to be in their results and within what degree of accuracy. This means they get started with a set level of confidence and margin of error. We can use these pieces to determine the minimum sample size needed to produce these results by using algebra to solve for n in our margin of error formula, $M = z^* \sqrt{\hat{p}(1-\hat{p})/n}$:

$$n = \frac{(z^*)^2\,\hat{p}(1-\hat{p})}{M^2}$$

where M is the margin of error, $z^*$ is the multiplier for the chosen confidence level, and $\hat{p}$ is the sample proportion.

**Conservative estimate:** If we have no preconceived idea of the sample proportion (e.g. from previous presidential attitude surveys), then a conservative approach (i.e. one guaranteeing the largest sample size calculation) is to use 0.5 for the sample proportion. For example, if we wanted to calculate a 95% confidence interval with a margin of error equal to 0.04, then a conservative sample size estimate would be:

$$n = \frac{(1.96)^2(0.5)(1-0.5)}{(0.04)^2} = 600.25$$

And since this is the *minimum* sample size and we cannot get 0.25 of a subject, we **round up**. This results in a sample size of 601.

**Estimate when proportion value is hypothesized:** If we have an idea of a proportion value, then we simply plug that value into the equation. Note that using 0.5 will always produce the largest sample size, because the product p(1-p) is largest when p = 0.5, and this is why it is called a conservative estimate.
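Both sample size estimates can be sketched with one small function. The name `min_sample_size` and its parameters are hypothetical; the default multiplier of 1.96 is the unrounded 95% value, and the hypothesized proportion of 0.64 is borrowed from the seatbelt example purely for illustration.

```python
from math import ceil


def min_sample_size(margin_of_error, multiplier=1.96, p=0.5):
    """Return the smallest whole sample size meeting the margin of error.

    Solves margin = multiplier * sqrt(p * (1 - p) / n) for n, then rounds up
    because a fraction of a subject cannot be sampled.
    """
    raw_n = multiplier ** 2 * p * (1 - p) / margin_of_error ** 2
    return ceil(raw_n)


# Conservative estimate: p = 0.5, 95% confidence, margin of error 0.04.
print(min_sample_size(0.04))

# Hypothesized proportion (illustrative assumption): p = 0.64.
print(min_sample_size(0.04, p=0.64))
```

The conservative call reproduces the sample size of 601 found above, and any other value of `p` yields a sample size no larger than that.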