Constructing confidence intervals to estimate a population proportion
NOTE: The interval calculations below for a proportion confidence interval depend on the following assumptions being satisfied: np ≥ 10 and n(1-p) ≥ 10. If p is unknown, use the sample proportion in its place.
The goal is to estimate p = proportion with a particular trait or opinion in a population.
- Sample statistic = $\hat{p}$ (read "p-hat") = proportion of observed sample with the trait or opinion we’re studying.
- Standard error of $\hat{p}$ = $\sqrt{\hat{p}(1-\hat{p})/n}$, where n = sample size.
- Multiplier comes from this table:
|Confidence level|Multiplier|
|.90 (90%)|1.645 or 1.65|
|.95 (95%)|1.96, usually rounded to 2|
The value of the multiplier increases as the confidence level increases. This leads to wider intervals for higher confidence levels. We are more confident of catching the population value when we use a wider interval.
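The multipliers in the table can be reproduced from the standard normal distribution. The following is a minimal sketch, assuming the multiplier is the standard normal quantile that leaves half of the leftover probability in each tail; the function name `multiplier` is illustrative and not part of the original notes.

```python
from statistics import NormalDist

def multiplier(confidence):
    """Return the z value such that the central `confidence` share
    of a standard normal distribution lies between -z and +z."""
    tail = (1 - confidence) / 2          # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)

# Higher confidence levels give larger multipliers, hence wider intervals.
for level in (0.90, 0.95, 0.99):
    print(f"{level:.2f}: {multiplier(level):.3f}")
```

The 99% value rounds to 2.58, the multiplier used for the 99% interval in the seatbelt example.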
In the 2001 Youth Risk Behavior survey done by the U.S. Centers for Disease Control, 747 out of n = 1168 female 12th graders said they always use a seatbelt when driving.
Goal: Estimate the proportion always using a seatbelt when driving in the population of all U.S. 12th grade female drivers. Check assumption: (1168)*(0.64) ≈ 747 and (1168)*(0.36) ≈ 421, both of which are at least 10.
Sample statistic is $\hat{p}$ = 747 / 1168 = .64
Standard error = $\sqrt{.64(1-.64)/1168}$ = .014
A 95% confidence interval estimate is .64 ± 2 (.014), which is .612 to .668
With 95% confidence, we estimate that between .612 (61.2%) and .668 (66.8%) of all 12th grade female drivers always wear their seatbelt when driving.
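The seatbelt calculation can be sketched in a few lines of Python. This is an illustrative sketch rather than part of the original notes: the helper name `proportion_ci` is hypothetical, and the multiplier of 2 follows the rounding used in the example.

```python
import math

def proportion_ci(successes, n, multiplier):
    """Return (p_hat, standard_error, lower, upper) for a sample proportion,
    using sample statistic ± multiplier × standard error."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = multiplier * se
    return p_hat, se, p_hat - margin, p_hat + margin

# 747 of 1168 female 12th graders said they always wear a seatbelt.
p_hat, se, lower, upper = proportion_ci(747, 1168, multiplier=2)
print(f"p-hat = {p_hat:.3f}, standard error = {se:.4f}")
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```

Because the code does not round p-hat to .64 before computing the interval, its endpoints can differ from the hand calculation in the third decimal place.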
Example Continued: For the seatbelt wearing example, a 99% confidence interval for the population proportion is
.64 ± 2.58 (.014), which is .64 ± .036, or .604 to .676.
With 99% confidence, we estimate that between .604 (60.4%) and .676 (67.6%) of all 12th grade female drivers always wear their seatbelt when driving.
Notice that the 99% confidence interval is slightly wider than the 95% confidence interval. In the same situation, the greater the confidence level, the wider the interval.
Notice also that only the value of the multiplier differed in the calculations of the 95% and 99% intervals.
Using Confidence Intervals to Compare Groups
A somewhat informal method for comparing two or more populations is to compare confidence intervals for the value of a parameter. If the confidence intervals do not overlap, it is reasonable to conclude that the parameter value differs for the two populations.
In the Youth Risk Behavior survey, 677 out of n = 1356 12th grade males said they always wear a seatbelt. To begin, we’ll calculate a 95% confidence interval estimate of the population proportion. Check assumption: (1356)*(0.499) = 677 and (1356)*(0.501) = 679 both of which are at least 10.
Sample statistic is $\hat{p}$ = 677 / 1356 = .499
Standard error = $\sqrt{.499(1-.499)/1356}$ = .0136
A 95% confidence interval estimate, calculated as Sample statistic ± multiplier × Standard Error, is
.499 ± 2 (.0136), or .472 to .526.
With 95% confidence, we estimate that between .472 (47.2%) and .526 (52.6%) of all 12th grade male drivers always wear their seatbelt when driving.
Comparison and Conclusion: For females, the 95% confidence interval estimate of the percent always wearing a seatbelt was found to be 61.2% to 66.8%, which does not overlap the 47.2% to 52.6% interval for males. It’s reasonable to conclude that 12th grade males and females differ with regard to frequency of wearing a seatbelt when driving.
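The informal overlap comparison can be expressed as a short sketch. The helper names `ci` and `intervals_overlap` are hypothetical, and the multiplier of 2 is the rounded 95% value used in the examples.

```python
import math

def ci(successes, n, multiplier=2):
    """Return (lower, upper) for a proportion confidence interval."""
    p_hat = successes / n
    margin = multiplier * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def intervals_overlap(a, b):
    """True if the intervals a = (low, high) and b = (low, high) share any value."""
    return a[0] <= b[1] and b[0] <= a[1]

females = ci(747, 1168)   # 12th grade females
males = ci(677, 1356)     # 12th grade males

if intervals_overlap(females, males):
    print("Intervals overlap: this informal comparison is inconclusive.")
else:
    print("Intervals do not overlap: reasonable to conclude the proportions differ.")
```

As the text notes, this is an informal method; the sketch only reports whether the two intervals share any values.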
Using Confidence Intervals to "test" how parameter value compares to a specified value
Values in a confidence interval are "acceptable" possibilities for the true population value. Values not in the confidence interval are not acceptable (reasonable) possibilities for the population value.
The 95% confidence interval estimate of percent of 12th grade females who always wear a seatbelt is 61.2% to 66.8%. Any percent in this interval is an acceptable guess at the population value.
This has the consequence that it’s safe to say that a majority (more than 50%) of this population always wears their seatbelt (because all values 50% and below can be rejected as possibilities.)
If somebody claimed that 75% of all 12th grade females always used a seatbelt, we should reject that assertion. The value 75% is not within our confidence interval.
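Checking a claimed value against an interval is a simple containment test. The sketch below assumes the rounded 95% interval for females from the example (.612 to .668); the function name `is_plausible` is hypothetical.

```python
def is_plausible(value, interval):
    """True if `value` lies inside the confidence interval (lower, upper)."""
    lower, upper = interval
    return lower <= value <= upper

females_95 = (0.612, 0.668)  # 95% interval from the seatbelt example

for claim in (0.50, 0.65, 0.75):
    verdict = "acceptable" if is_plausible(claim, females_95) else "rejected"
    print(f"claimed proportion {claim:.2f}: {verdict}")
```

Both 50% and 75% fall outside the interval, matching the two conclusions drawn in the text.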
Finding sample size for estimating a population proportion
When one begins a study to estimate a population parameter, they typically have an idea of how confident they want to be in their results and how accurate the estimate should be. This means they start with a set level of confidence and margin of error. We can use these pieces to determine the minimum sample size needed to produce these results by using algebra to solve for n in our margin of error formula, $M = z\sqrt{\hat{p}(1-\hat{p})/n}$:
$n = z^2 \hat{p}(1-\hat{p}) / M^2$
where M is the margin of error, z is the multiplier for the chosen confidence level, and $\hat{p}$ is the sample proportion.
Conservative estimate: If we have no preconceived idea of the sample proportion (e.g. from previous presidential attitude surveys), then a conservative approach (i.e. one guaranteeing the largest sample size calculation) is to use 0.5 for the sample proportion. For example, if we wanted to calculate a 95% confidence interval with a margin of error equal to 0.04, then a conservative sample size estimate would be:
$n = (1.96)^2 (0.5)(1-0.5) / (0.04)^2 = 600.25$
And since this is the minimum sample size and we cannot get 0.25 of a subject, we round up. This results in a sample size of 601.
Estimate when proportion value is hypothesized: If we have an idea of a proportion value, then we simply plug that value into the equation. Note that using 0.5 will always produce the largest sample size and this is why it is called a conservative estimate.
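The sample size calculation can be sketched as follows. This is an illustration under stated assumptions: the function name `sample_size` is hypothetical, the default multiplier of 1.96 corresponds to 95% confidence, and the hypothesized proportion of 0.3 is an arbitrary example value not taken from the notes.

```python
import math

def sample_size(margin, multiplier=1.96, p=0.5):
    """Minimum n so that multiplier * sqrt(p * (1 - p) / n) <= margin.
    The default p = 0.5 gives the conservative (largest) sample size."""
    n = multiplier ** 2 * p * (1 - p) / margin ** 2
    return math.ceil(n)  # round up: a fraction of a subject is not possible

conservative = sample_size(0.04)        # no prior idea of the proportion
hypothesized = sample_size(0.04, p=0.3) # assumed prior guess of 0.3

print("conservative n:", conservative)
print("n with p = 0.3:", hypothesized)
```

Any proportion other than 0.5 makes p(1-p) smaller than 0.25, so the conservative estimate is never smaller than an estimate based on a hypothesized value.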