Learning Objectives For This Lesson
Upon completion of this lesson, you should be able to:
 Perform tests of hypotheses of one proportion and one mean
 Properly identify if the situation involves a proportion or mean
 Understand the errors present in hypothesis testing
 Realize the limits associated with significance tests
 Understand the basic concept regarding power in tests of significance
Hypothesis Testing
Previously we used confidence intervals to estimate some unknown population parameter. For example, we constructed 1-proportion confidence intervals to estimate the true population proportion – this population proportion being the parameter of interest. We even went as far as comparing two intervals to see if they overlapped – if so, we concluded that there was no difference between the population proportions for the two groups – or checking whether an interval contained a specific parameter value.
Statistical Significance
A sample result is called statistically significant when the p-value for a test statistic is less than the level of significance, which for this class we will keep at 0.05. In other words, a result is statistically significant when we reject the null hypothesis.
Five Steps in a Hypothesis Test (Note: some texts will label these steps differently, but the premise is the same)
 Check any necessary assumptions and write null and alternative hypotheses.
 Calculate an appropriate test statistic.
 Determine a p-value associated with the test statistic.
 Decide between the null and alternative hypotheses.
 State a "real world" conclusion.
Now let’s try to tie together the concepts we discussed regarding Sampling and Probability to delve further into statistical inference with the use of hypothesis tests.
Two designs for producing data are sampling and experimentation, both of which should employ randomization. We have learned that randomization is advantageous because it controls bias. Now we will see another advantage: because chance governs our selection, we may make use of the laws of probability – the scientific study of random behavior – to draw conclusions about the entire population from which the units (e.g. students, machined parts, U.S. adults) originated. Again, this process is called statistical inference.
Previously we had defined population and sample and what we use to describe their values, but we will revisit these:
Parameter: a number that describes the population. It is fixed but rarely do we know its value. (e.g. the true proportion of PSU undergraduates that would date someone of a different race.)
Statistic: a number that describes the sample. This value is known but can vary from sample to sample. For instance, from the Class Survey data we may get one proportion of those who said they would date someone of a different race, but if that survey were given to another sample of PSU undergraduate students, do you really believe the proportion from that sample would be identical to ours?
EXAMPLES
1. A survey is carried out at a university to estimate the mean GPA of undergraduates living off campus in the current term. Population: all undergraduates at the university who live off campus; sample: those undergraduates surveyed; parameter: mean GPA of all undergraduates at that university living off campus; statistic: mean GPA of sampled undergraduates.
2. A balanced coin is flipped 100 times and the percentage of heads is 47%. Population: all coin flips; sample: the 100 coin flips; parameter: 50%, the percentage of all coin flips that would result in heads if the coin is balanced; statistic: 47%.
Hypothesis Testing for a Proportion
Ultimately we will measure statistics (e.g. sample proportions and sample means) and use them to draw conclusions about unknown parameters (e.g. population proportion and population mean). This process, using statistics to make judgments or decisions regarding population parameters, is called statistical inference.
Example 2 above produced a sample proportion of 47% heads, which is written:
p̂ [read "p-hat"] = 47/100 = 0.47
p̂ is called the sample proportion and remember it is a statistic (soon we will look at sample means, x̄). But how can p̂ be an accurate measure of p, the population parameter, when another sample of 100 coin flips could produce 53 heads? And for that matter, we only did 100 coin flips out of an uncountable possible total!
The fact that these samples will vary in repeated random sampling taken at the same time is referred to as sampling variability. The reason sampling variability is acceptable is that if we took many samples of 100 coin flips and calculated the proportion of heads in each sample, then constructed a histogram or boxplot of the sample proportions, the resulting shape would look normal (i.e. bell-shaped) with a mean of 50%.
[The reason we selected a simple coin flip as an example is that the concepts just discussed can be difficult to grasp, especially since earlier we mentioned that rarely is the population parameter value known. But most people accept that a coin will produce an equal number of heads as tails when flipped many times.]
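The sampling-variability idea above is easy to see by simulation; here is a short sketch (assuming numpy is available – it is not part of the lesson's software):

```python
# A sketch of sampling variability: take many samples of 100 fair-coin
# flips and record the sample proportion of heads in each.
import numpy as np

rng = np.random.default_rng(seed=1)
n_flips, n_samples = 100, 10_000

# Number of heads in each of 10,000 samples of 100 flips.
heads = rng.binomial(n=n_flips, p=0.5, size=n_samples)
sample_props = heads / n_flips

# A histogram of sample_props would look bell-shaped, centered near 0.5,
# with spread close to sqrt(0.5 * 0.5 / 100) = 0.05.
print(sample_props.mean())  # close to 0.50
print(sample_props.std())   # close to 0.05
```

The individual sample proportions wander, but their distribution is bell-shaped and centered at the true parameter, exactly as described above.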
A statistical hypothesis test is a procedure for deciding between two possible statements about a population. The phrase significance test means the same thing as the phrase "hypothesis test."
The two competing statements about a population are called the null hypothesis and the alternative hypothesis.
 A typical null hypothesis is a statement that two variables are not related. Other examples are statements that there is no difference between two groups (or treatments) or that there is no difference from an existing standard value.
 An alternative hypothesis is a statement that there is a relationship between two variables or there is a difference between two groups or there is a difference from a previous or existing standard.
NOTATION: The notation H_{o} represents a null hypothesis and H_{a} represents an alternative hypothesis. p_{o} is read as "p-naught" or "p-zero" and represents the null hypothesized value. Shortly, we will substitute μ_{o} for p_{o} when discussing a test of means.
H_{o}: p = p_{o}
H_{a}: p ≠ p_{o} or H_{a}: p > p_{o} or H_{a}: p < p_{o} [Remember, only select one H_{a}]
The first H_{a} is called a two-sided test since "not equal" implies that the true value could be either greater than or less than the test value, p_{o}. The other two H_{a} are referred to as one-sided tests since they restrict the conclusion to a specific side of p_{o}.
Example 3 – This is a test of a proportion:
A Tufts University study finds that 40% of 12th grade females feel they are overweight. Is this percent lower for college age females? Let p = proportion of college age females who feel they are overweight. Competing hypotheses are:
H_{o}: p = .40 (or greater) That is, no difference from the Tufts study finding.
H_{a}: p < .40 (the proportion feeling they are overweight is less for college age females).
Example 4 – This is a test of a mean:
Is there a difference between the mean amount that men and women study per week? Competing hypotheses are:
Null hypothesis: There is no difference between mean weekly hours of study for men and women, written in statistical language as μ_{1} = μ_{2}.
Alternative hypothesis: There is a difference between mean weekly hours of study for men and women, written in statistical language as μ_{1} ≠ μ_{2}. This notation is used since the study would consider two independent samples: one from women and another from men.
Test Statistic and p-value
 A test statistic is a summary of a sample that is in some way sensitive to differences between the null and alternative hypotheses.
 A p-value is the probability that the test statistic would "lean" as much (or more) toward the alternative hypothesis as it does if the real truth is the null hypothesis. That is, the p-value is the probability that the sample statistic would occur under the presumption that the null hypothesis is true.
A small p-value favors the alternative hypothesis. A small p-value means the observed data would not be very likely to occur if we believe the null hypothesis is true. So we believe our data and disbelieve the null hypothesis. An easy (hopefully!) way to grasp this is to consider the situation where a professor states that you are just a 70% student. You doubt this statement and want to show that you are better than a 70% student. If you took a random sample of 10 of your previous exams and calculated the mean percentage of these 10 tests, which mean would be less likely to occur if in fact you were a 70% student (the null hypothesis): a sample mean of 72% or one of 90%? Obviously the 90% would be less likely and therefore would have a small probability (i.e. p-value).
Using the p-value to Decide between the Hypotheses
 The significance level of a test is the border used for deciding between the null and alternative hypotheses.
 Decision Rule: We decide in favor of the alternative hypothesis when a p-value is less than or equal to the significance level. The most commonly used significance level is 0.05.
In general, the smaller the p-value, the stronger the evidence in favor of the alternative hypothesis.
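The decision rule above is mechanical enough to write down as a tiny helper; a sketch in Python (the function name is ours, not part of any statistics library):

```python
# A minimal sketch of the decision rule: compare the p-value to the
# significance level (kept at 0.05 in this class).
def decide(p_value, alpha=0.05):
    """Return the hypothesis-test decision for a given p-value."""
    if p_value <= alpha:
        return "reject H0 (statistically significant)"
    return "fail to reject H0"

print(decide(0.0044))  # reject H0 (statistically significant)
print(decide(0.12))    # fail to reject H0
```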
EXAMPLE 3 CONTINUED:
In a recent elementary statistics survey, the sample proportion (of women) saying they felt overweight was 37/129 = .287. Note that this leans toward the alternative hypothesis that the "true" proportion is less than .40. [Recall that the Tufts University study found that 40% of 12th grade females feel they are overweight. Is this percent lower for college age females?]
Step 1: Let p = proportion of college age females who feel they are overweight.
H_{o}: p = .40 (or greater) That is, no difference from the Tufts study finding.
H_{a}: p < .40 (the proportion feeling they are overweight is less for college age females).
Step 2:
If np_{o} ≥ 10 and n(1 – p_{o}) ≥ 10 then we can use the following Z-test statistic. Since both (129)(0.4) and (129)(0.6) are greater than 10 [or consider that the number of successes and failures, 37 and 92 respectively, are at least 10], we calculate the test statistic by:

z = (p̂ – p_{o}) / √(p_{o}(1 – p_{o})/n) = (0.287 – 0.40) / √((0.40)(0.60)/129) = –2.62
Note: In computing the Z-test statistic for a proportion we use the hypothesized value p_{o}, not the sample proportion p̂, in calculating the standard error! We do this because we "believe" the null hypothesis to be true until evidence says otherwise.
Step 3: The p-value can be found from the Standard Normal Table.
Calculating the p-value:
The method for finding the p-value is based on the alternative hypothesis:
2P(Z ≥ |z|) for H_{a}: p ≠ p_{o}
P(Z ≥ z) for H_{a}: p > p_{o}
P(Z ≤ z) for H_{a}: p < p_{o}
In our example we are using H_{a}: p < .40, so our p-value will be found from P(Z ≤ z) = P(Z ≤ –2.62), and from the Standard Normal Table this is equal to 0.0044.
Step 4: We compare the p-value to alpha, which we set at 0.05. Since 0.0044 is less than 0.05, we reject the null hypothesis and decide in favor of the alternative, H_{a}.
Step 5: We’d conclude that the percentage of college age females who felt they were overweight is less than 40%. [Note: we are assuming that our sample, since not random, is representative of all college age females.]
The p-value = .004 indicates that we should decide in favor of the alternative hypothesis. Thus we decide that less than 40% of college women think they are overweight.
The "Z-value" (–2.62) is the test statistic. It is a standardized score for the difference between the sample p̂ and the null hypothesis value p = .40. The p-value is the probability that the z-score would lean toward the alternative hypothesis as much as it does if the true population proportion really was p = .40.
Using Software to Perform a One Proportion Test Analysis Using Raw Data
To perform a one proportion test analysis in Minitab using raw data:
 Open Minitab data set Class_Survey.MTW
 Go to Stat > Basic Stat > 1 proportion
 Click the radio button for Samples in Columns (this is the default)
 Click the text box under this title (cursor should be in this box)
 Select from the variables list the variable Gender (be sure the variable Gender appears in the text box)
 Check the box for Perform Hypothesis Test and enter 0.5 (note that for Minitab versions earlier than 15 this test is found under the Options)
 Click Options and select the correct Alternative (e.g. not equal to)
 Check the box for Use Test and Interval Based on Normal Distribution (remember to verify this use by checking that the number of successes and failures are at least ten)
 Click OK twice
To perform a one proportion test analysis in SPSS using raw data:
 Import data Class_Survey.XLS into SPSS
 Since the variable Gender has text responses (i.e. Male, Female) we need to recode this variable into a numeric. We will use 1 to represent Male and 0 for Female.
 Go to Transform > Recode Into Different Variables
 Enter Gender into the Output Variable Window
 In the text box under Output Variable labeled Name: enter Male
 Click Change
 Click the button Old and New Values
 Under Old Value click Value and type in Male
 Under New Value enter in the Value text box the value 1
 Under Old → New click Add
 Repeat the Old and New Values steps above, this time typing in Female and 0
 Click Continue
 Click OK (you should now have a new column of ones and zeroes titled Male)
 Go to Analyze > Nonparametric Tests > Binomial
 Enter the variable Male into the text box for Test Variable List
 The Test Proportion value is defaulted at 0.5; if this is not correct then change
 Click OK
This should result in the following output:
Using Software to Perform a Summarized One Proportion Test Analysis
To perform a summarized one proportion test analysis in Minitab:
 Open Minitab without data
 Go to Stat > Basic Stat > 1 proportion
 Click the radio button for Summarized Data
 Enter 37 for Number of Events and 129 for Number of Trials
 Check the box for Perform Hypothesis Test and enter 0.4 (note that for Minitab versions earlier than 15 this test is found under the Options)
 Click Options and select the correct Alternative (e.g. less than)
 Check the box for Use Test and Interval Based on Normal Distribution (remember to verify this use by checking that the number of successes and failures are at least ten)
 Click OK twice
This should result in the following output:
To perform a summarized one proportion test analysis in SPSS:
 Open SPSS without data
 Enter in the first empty cell the number of successes, 37
 Enter in the cell below that one the number of failures, 92
 Click Data > Weight Cases
 Click the radio button Weight Cases By and enter in the text box the variable of interest from the variable list (should only be one variable, VAR00001, if you started with an empty data set)
 Click OK
 Go to Analyze > Nonparametric Tests > Binomial
 Enter the variable of interest into the Test Variable List
 Change the test proportion value to 0.4
 Click OK
 NOTE: SPSS does not provide a method based on the normal approximation (even though the notation in the output references a Z approximation). SPSS uses exact methods based on the binomial distribution. However, the hypothesis setup, decision rules, and conclusion use the same approach as when using normal approximation (i.e. z) techniques.
This should result in the following output:
Hypothesis Testing for a Mean
Quantitative Response Variables and Means
We usually summarize a quantitative variable by examining the mean value. We summarize categorical variables by considering the proportion (or percent) in each category. Thus we use the methods described in this handout when the response variable is quantitative. Again, examples of quantitative variables are height, weight, blood pressure, pulse rate, and so on.
Null and Alternative Hypotheses for a Mean
 For one population mean, a typical null hypothesis is H_{0}: population mean μ = a specified value. We'll actually give a number where it says "a specified value," and for paired data the null hypothesis would be H_{0}: μ_{d} = a specified value. Typically when considering differences this specified value is zero.
 The alternative hypothesis might be either one-sided (a specific direction of inequality is given) or two-sided (a not equal statement).
Test Statistics
The test statistic for examining hypotheses about one population mean:

t = (x̄ – μ_{0}) / (s/√n)

where x̄ = the observed sample mean, μ_{0} = the value specified in the null hypothesis, s = the standard deviation of the sample measurements, and n = the sample size (for paired data, the number of differences).
Notice that the top part of the statistic is the difference between the sample mean and the null hypothesis value. The bottom part of the calculation is the standard error of the mean.
It is a convention that a test using a t-statistic is called a t-test. That is, hypothesis tests using the above statistic would be referred to as a "1-sample t-test".
Finding the pvalue
Recall that a p-value is the probability that the test statistic would "lean" as much (or more) toward the alternative hypothesis as it does if the real truth is the null hypothesis.
When testing hypotheses about a mean or mean difference, a t-distribution is used to find the p-value. This distribution is a close cousin to the normal curve. t-distributions are indexed by a quantity called degrees of freedom, calculated as df = n – 1 for the situation involving a test of one mean or a test of mean difference.
The p-values for the t-distribution are found in your text, or a copy can be found at the following link: T-Table. To interpret the table, use the column under DF to find the correct degrees of freedom. Use the top row under Absolute Value of t-Statistic to locate your calculated t-value. Most likely you will not find an exact match for your t-value, so locate the range for your t-value: it will be either less than 1.28, between two t-statistics in the table, or greater than 3.00. Once you have located the range, find the corresponding p-value(s) associated with your range of t-statistics. This is the p-value you compare to an alpha of 0.05.
NOTE: the t-statistics increase from left to right, but the p-values decrease! So if your t-statistic is greater than 3.00, your p-value would be less than the corresponding p-value listed in the table.
Examples of reading the T-Table [recall degrees of freedom for a 1-sample t-test are equal to n − 1, one less than the sample size], where the p-value is read as p-value = P(T > t). NOTE: If this formula appears familiar, it should: it closely resembles the approach for finding probability values using the Standard Normal Table with z-values.
 If you had a sample of size 15, resulting in DF = 14, and t-value = 1.20, your t-value range would be less than 1.28, producing a p-value of p > 0.111. That is, P(T > 1.20) is greater than 0.111.
 If you had a sample of size 15, resulting in DF = 14, and t-value = 1.95, your t-value range would be from 1.80 to 2.00, producing a p-value of 0.033 < p < 0.047. That is, P(T > 1.95) is between 0.033 and 0.047.
 If you had a sample of size 15, resulting in DF = 14, and t-value = 3.20, your t-value range would be greater than 3.00, producing a p-value of p < 0.005. That is, P(T > 3.20) is less than 0.005.
NOTE: The increments for the degrees of freedom in the T-Table are not always 1. The column increases by 1 up to DF = 30, then the increments change. If your DF is not found in the table, just use the nearest DF. Also, note that the last row, "Infinite", displays the same p-values as those found in the Standard Normal Table. This is because as n increases, the t-distribution approaches the standard normal distribution.
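Software computes exact t-distribution p-values rather than table ranges. The three table-reading examples above can be reproduced with scipy (assumed available); t.sf(x, df) gives P(T > x):

```python
# Exact upper-tail t probabilities for the three table examples above.
from scipy.stats import t

df = 14  # a sample of size 15

print(t.sf(1.20, df))  # greater than 0.111 (t-value below 1.28)
print(t.sf(1.95, df))  # between 0.033 and 0.047
print(t.sf(3.20, df))  # less than 0.005
```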
Using Software to Perform a One Mean Test Analysis Using Raw Data
Example:
Students measure their pulse rates. Is the mean pulse rate for college age women equal to 72 (a longheld standard for average pulse rate)?
Null hypothesis: μ = 72
Alternative hypothesis: μ ≠72
Pulse rates for n = 35 women are available.
To perform a one mean hypothesis test in Minitab using raw data:
 Open data set
 Go to Stat > Basic Statistics > 1Sample t
 Click inside the text area for Samples in Columns, then select GPA from the variables list and move GPA into that text box
 Click the check box for Perform Hypothesis Test and enter the hypothesized value into the text box for Hypothesized Mean (e.g. 3.0)
 Click Options. Here you can select correct alternative hypothesis (default is not equal to  keep that for now)
 Click OK
 Click OK
This should result in the following output:
To perform a one mean hypothesis test in SPSS:
 Import data into SPSS
 Go to Analyze > Compare Means > OneSample T Test
 Select GPA and move GPA into the text box for Test Variable(s)
 Enter in the text box for Test Value the hypothesized value being tested (e.g. 3.0)
 Click OK
Special Note: SPSS performs all tests as two-sided. If interested in a one-sided alternative (e.g. "greater than"), we would have to divide the p-value in half.
This should result in the following output:
Using Software to Perform a Summarized One Mean Test Analysis
To perform a one mean hypothesis test in Minitab using summarized data:
 Open Minitab
 Go to Stat > Basic Statistics > 1Sample t
 Click the radio button for Summarized Data
 Enter the appropriate values (e.g. Sample Size: 35, Mean: 76.8, Standard Deviation: 11.62)
 Click the check box for Perform Hypothesis Test and enter the hypothesized value into the text box for Hypothesized Mean (e.g. 72)
 Click Options. Here you can select correct alternative hypothesis (default is not equal to  keep that for now)
 Click OK
 Click OK
This should result in the following output:
SPSS cannot perform a hypothesis test for a mean using summarized data.
INTERPRETATION:
The p-value is p = 0.019. This is below the .05 standard, so the result is statistically significant. This means we decide in favor of the alternative hypothesis. We're deciding that the population mean is not 72.
The test statistic is t = (x̄ – μ_{0}) / (s/√n) = 2.47.
Because this is a two-sided alternative hypothesis, the p-value is the combined area to the right of 2.47 and to the left of −2.47 in a t-distribution with 35 – 1 = 34 degrees of freedom.
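The summarized one-mean test can be sketched in Python with scipy (assumed available). Because the inputs below are rounded summary values, the t-value lands near 2.44 rather than the reported 2.47, which presumably came from less-rounded figures; the decision is unchanged:

```python
# One-sample t-test from (rounded) summary statistics.
from math import sqrt
from scipy.stats import t

n, x_bar, s, mu0 = 35, 76.8, 11.62, 72

t_stat = (x_bar - mu0) / (s / sqrt(n))
df = n - 1

# Two-sided p-value: area in both tails beyond |t|.
p_value = 2 * t.sf(abs(t_stat), df)

print(round(t_stat, 2))  # about 2.44
print(p_value < 0.05)    # True: reject H0, the mean differs from 72
```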
Example 2:
In the same "survey" there were n = 57 men. Is the mean pulse rate for college age men equal to 72?
Null hypothesis: μ = 72
Alternative hypothesis: μ ≠72
RESULTS:
INTERPRETATION:
The p-value is p = 0.236. This is not below the .05 standard, so we do not reject the null hypothesis. Thus it is possible that the true value of the population mean is 72. The 95% confidence interval suggests the mean could be anywhere between 67.78 and 73.06.
The test statistic is t = (x̄ – 72) / (s/√57) = −1.20.
The p-value is the combined probability that a t-value would be less than (to the left of) −1.20 or greater than (to the right of) +1.20.
Errors, Practicality and Power in Hypothesis Testing
Errors in Decision Making – Type I and Type II
How do we determine whether to reject the null hypothesis? It depends on the level of significance α, which is the probability of a Type I error.
What is Type I error and what is Type II error?
When doing hypothesis testing, two types of mistakes may be committed and we call them Type I error and Type II error.
                                  Reality
Decision                          H_{0} is true    H_{0} is false
Reject H_{0} and conclude H_{a}   Type I error     Correct
Do not reject H_{0}               Correct          Type II error
If we reject H_{0} when H_{0} is true, we commit a Type I error. The probability of a Type I error is denoted by alpha, α (as we already know, this is commonly 0.05).
If we accept H_{0} when H_{0} is false, we commit a Type II error. The probability of a Type II error is denoted by beta, β.
Our convention is to set up the hypotheses so that type I error is the more serious error.
Example 1: Mr. Orangejuice goes to trial, where he is being tried for the murder of his ex-wife.
We can put it in a hypothesis testing framework. The hypotheses being tested are:
 Mr. Orangejuice is guilty
 Mr. Orangejuice is not guilty
Set up the null and alternative hypotheses where rejecting the null hypothesis when the null hypothesis is true results in the worst scenario:
H_{0} : Not Guilty
H_{a} : Guilty
Here we put Mr. Orangejuice is not guilty in H_{0} since we consider false rejection of H_{0} a more serious error than failing to reject H_{0}. That is, finding an innocent person guilty is worse than finding a guilty man innocent.
Type I error is committed if we reject H_{0} when it is true. In other words, when Mr. Orangejuice is not guilty but found guilty.
α = probability( Type I error)
Type II error is committed if we accept H_{0} when it is false. In other words, when Mr. Orangejuice is guilty but found not guilty.
β = probability( Type II error)
Relation between α and β
Note that the smaller we specify the significance level α, the larger the probability β of accepting a false null hypothesis will be.
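What α means can be seen by simulation; a sketch (numpy assumed available) runs many tests in which H_{0} really is true and counts how often we wrongly reject it:

```python
# Simulate the Type I error rate: many two-sided tests of a fair coin.
import numpy as np
from math import sqrt

rng = np.random.default_rng(seed=2)
n, p0, n_sims = 100, 0.5, 10_000

# Sample proportions generated under H0 (the coin really is fair).
p_hats = rng.binomial(n, p0, size=n_sims) / n
z = (p_hats - p0) / sqrt(p0 * (1 - p0) / n)

# Two-sided test at the 0.05 level: reject when |z| >= 1.96.
type_i_rate = np.mean(np.abs(z) >= 1.96)
print(type_i_rate)  # near 0.05 (slightly above, since counts are discrete)
```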
Cautions About Significance Tests
 If a test fails to reject H_{o}, it does not necessarily mean that H_{o} is true – it just means we do not have compelling evidence to refute it. This is especially true for small sample sizes n. To grasp this, if you are familiar with the judicial system you will recall that when a judge/jury renders a decision the decision is "Not Guilty". They do not say "Innocent". This is because you are not necessarily innocent, just that you haven’t been proven guilty by the evidence, (i.e. statistics) presented!
 Our methods depend on a normal approximation. If the underlying distribution is not normal (e.g. heavily skewed, several outliers) and our sample size is not large enough to offset these problems (think of the Central Limit Theorem from Chapter 9) then our conclusions may be inaccurate.
Power of a Test
When the data indicate that one cannot reject the null hypothesis, does it mean that one can accept the null hypothesis? For example, when the p-value computed from the data is 0.12, one fails to reject the null hypothesis at α = 0.05. Can we say that the data support the null hypothesis?
Answer: When you perform hypothesis testing, you only set the size of the Type I error and guard against it. Thus, we can only present the strength of evidence against the null hypothesis. One can sidestep the concern about Type II error if the conclusion never mentions that the null hypothesis is accepted. When the null hypothesis cannot be rejected, there are two possible cases: 1) one can accept the null hypothesis; 2) the sample size is not large enough to either accept or reject the null hypothesis. To make the distinction, one has to check β. If β at a likely value of the parameter is small, then one accepts the null hypothesis. If β is large, then one cannot accept the null hypothesis.
The relationship between α and β:
If the sample size is fixed, then decreasing α will increase β. If one wants both to decrease, then one has to increase the sample size.
Power = the probability of correctly rejecting a false null hypothesis = 1 − β.
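A power calculation for a one-sided proportion test can be sketched with the normal approximation (scipy assumed available; the numbers are illustrative assumptions, not from the lesson): test H_{0}: p = 0.5 vs H_{a}: p < 0.5 with n = 100 and α = 0.05, when the truth is actually p = 0.4.

```python
# Power of a lower-tail one-proportion z-test (normal approximation).
from math import sqrt
from scipy.stats import norm

n, p0, p_true, alpha = 100, 0.5, 0.4, 0.05

# Rejection cutoff: the sample proportion below which we reject H0.
z_crit = norm.ppf(alpha)  # about -1.645 for a lower-tail test
cutoff = p0 + z_crit * sqrt(p0 * (1 - p0) / n)

# Power: the chance p-hat lands below the cutoff when p really is 0.4.
se_true = sqrt(p_true * (1 - p_true) / n)
power = norm.cdf((cutoff - p_true) / se_true)
beta = 1 - power  # the Type II error probability

print(round(power, 2))  # roughly 0.64, so beta is roughly 0.36
```

Note the trade-off described above: making α smaller moves the cutoff down, which lowers the power and raises β; only a larger sample size improves both.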