Unit Summary 

Reading Assignment
An Introduction to Statistical Methods and Data Analysis, chapter 6.2.
Sampling Distribution of the Differences Between the Two Sample Means for Independent Samples
The point estimate for μ_{1} - μ_{2} is x̄_{1} - x̄_{2}.
In order to find a confidence interval for μ_{1} - μ_{2} and perform hypothesis testing, we need to find the sampling distribution of x̄_{1} - x̄_{2}.
We can show that when the sample sizes are large or the populations are normal, and the samples are taken independently, x̄_{1} - x̄_{2} is (approximately) normal with mean μ_{1} - μ_{2} and standard deviation
\[ \sqrt{\frac{\sigma_{1}^{2}}{n_{1}} + \frac{\sigma_{2}^{2}}{n_{2}}} \]
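As a rough check on the standard deviation of the difference between two sample means, the following Python sketch compares the theoretical value with a simulated one. The population means, standard deviations, and sample sizes used here are made-up illustration values, not data from this lesson, and the function names are hypothetical.

```python
import math
import random

def sd_of_mean_difference(sigma1, n1, sigma2, n2):
    """Theoretical standard deviation of (sample mean 1 - sample mean 2)."""
    return math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

def simulate_sd(mu1, sigma1, n1, mu2, sigma2, n2, reps=20000, seed=1):
    """Estimate the same quantity by repeatedly drawing two independent normal samples."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        mean1 = sum(rng.gauss(mu1, sigma1) for _ in range(n1)) / n1
        mean2 = sum(rng.gauss(mu2, sigma2) for _ in range(n2)) / n2
        diffs.append(mean1 - mean2)
    center = sum(diffs) / reps
    # Sample standard deviation of the simulated differences
    return math.sqrt(sum((d - center) ** 2 for d in diffs) / (reps - 1))

# Illustration values only (assumed, not from the lesson)
theory = sd_of_mean_difference(2.0, 10, 3.0, 15)
simulated = simulate_sd(50.0, 2.0, 10, 48.0, 3.0, 15)
print(f"theoretical SD = {theory:.3f}, simulated SD = {simulated:.3f}")
```

The two printed values should agree closely, which is what the sampling distribution result predicts for independent samples.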
In most cases, σ_{1} and σ_{2} are unknown and have to be estimated. It seems natural to estimate σ_{1} by s_{1} and σ_{2} by s_{2}. However, when the sample sizes are small, these estimates may not be very accurate. If the standard deviations of the two populations are not very different, one may get a better estimate of the common standard deviation by pooling the data from both samples.
In view of this, there are two options for estimating the variances for the 2-sample t-test with independent samples:
 2-sample t-test using pooled variances
 2-sample t-test using separate variances
When to use which? When we are reasonably sure that the two populations have nearly equal variances, then we use the pooled variances test. Otherwise, we use the separate variances test.
2-Sample t-Procedures: Pooled Variances Versus Non-Pooled Variances
2-Sample (Independent Samples) t-Procedure Using Pooled Variances to Do Inferences for Two Population Means (Standard Deviations are Assumed Equal)
When we have good reason to believe that the standard deviation of population 1 is about the same as that of population 2, we can estimate the common standard deviation by pooling information from the samples from population 1 and population 2.
Let n_{1} be the sample size from population 1 and s_{1} be the standard deviation of the sample from population 1.
Let n_{2} be the sample size from population 2 and s_{2} be the standard deviation of the sample from population 2.
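Then the common standard deviation can be estimated by the pooled standard deviation:
\[ s_{p} = \sqrt{\frac{(n_{1}-1)s_{1}^{2} + (n_{2}-1)s_{2}^{2}}{n_{1}+n_{2}-2}} \]
The test statistic is:
\[ t^{*} = \frac{\bar{x}_{1} - \bar{x}_{2}}{s_{p}\sqrt{\frac{1}{n_{1}} + \frac{1}{n_{2}}}} \]
where x̄_{1} and x̄_{2} are the two sample means, with degrees of freedom equal to df = n_{1} + n_{2} - 2.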
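The pooled standard deviation and the pooled t-statistic can be sketched in Python from summary statistics as follows. The function names are hypothetical, and the numbers in the usage lines are made-up illustration values rather than data from this lesson.

```python
import math

def pooled_sd(n1, s1, n2, s2):
    """Pooled standard deviation from two sample sizes and sample standard deviations."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def pooled_t(n1, mean1, s1, n2, mean2, s2):
    """Pooled 2-sample t-statistic for H0: mu1 - mu2 = 0, with its degrees of freedom."""
    sp = pooled_sd(n1, s1, n2, s2)
    t = (mean1 - mean2) / (sp * math.sqrt(1 / n1 + 1 / n2))
    df = n1 + n2 - 2
    return t, df

# Illustration values only (assumed): two samples of 12 with equal spread
t_value, df = pooled_t(12, 5.0, 1.0, 12, 4.0, 1.0)
print(f"t = {t_value:.3f}, df = {df}")
```

When the two sample standard deviations are equal, the pooled standard deviation equals that common value, which makes a quick sanity check for the function.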
In a packing plant, a machine packs cartons with jars. It is supposed that a new machine will pack faster on the average than the machine currently used. To test that hypothesis, the times it takes each machine to pack ten cartons are recorded. The results, in seconds, are shown in the following table.
Packing times (seconds):
New machine: 42.1, 41.3, 42.4, 43.2, 41.8, 41.0, 41.8, 42.8, 42.3, 42.7
Old machine: 42.7, 43.8, 42.5, 43.1, 44.0, 43.6, 43.3, 43.5, 41.7, 44.1
New machine: x̄_{1} = 42.14, s_{1} = 0.683
Old machine: x̄_{2} = 43.23, s_{2} = 0.750
Do the data provide sufficient evidence to conclude that, on the average, the new machine packs faster? Perform the required hypothesis test at the 5% level of significance.
It is given that:
x̄_{1} = 42.14, s_{1} = 0.683
x̄_{2} = 43.23, s_{2} = 0.750
Assumption 1: Are these independent samples? Yes, since the samples from the two machines are not related.
Assumption 2: Are these large samples or normal populations? We have n_{1} < 30 and n_{2} < 30. We do not have large enough samples, and thus we need to check the normality assumption for both populations.
From the normal probability plots, we conclude that both samples may come from normal populations.
Assumption 3: Do the populations have equal variance? Yes, since s_{1} and s_{2} are not that different. We can thus proceed with the pooled t-test. (They are not that different, as s_{1}/s_{2} = 0.683/0.750 = 0.91 is quite close to 1. We will discuss this in more detail and quantify what is "close" in Lesson 11.)
Let μ_{1} denote the mean for the new machine and μ_{2} denote the mean for the old machine.
Step 1. H_{o}: μ_{1} - μ_{2} = 0, H_{a}: μ_{1} - μ_{2} < 0
Step 2. Significance level: α = 0.05.
Step 3. Compute the t-statistic:
\[ s_{p} = \sqrt{\frac{9(0.683)^{2} + 9(0.750)^{2}}{10 + 10 - 2}} = 0.717 \]
\[ t^{*} = \frac{42.14 - 43.23}{0.717\sqrt{\frac{1}{10} + \frac{1}{10}}} = -3.40 \]
Step 4. Critical value:
Left-tailed test
Critical value = -t_{α} = -t_{0.05}
Degrees of freedom = 10 + 10 - 2 = 18
-t_{0.05} = -1.734
Rejection region: t* < -1.734
Step 5. Check to see if the value of the test statistic falls in the rejection region and decide whether to reject H_{o}.
t* = -3.40 < -1.734
Reject H_{o} at α = 0.05.
Step 6. State the conclusion in words.
At the 5% level of significance, the data provide sufficient evidence that the new machine packs faster than the old machine on average.
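The six steps above can be retraced from the raw packing times with a short Python sketch. It uses only the standard library; the critical value 1.734 is taken from the text rather than computed, and the variable names are illustrative.

```python
import math
import statistics

# Packing times in seconds, from the table in the example
new_machine = [42.1, 41.3, 42.4, 43.2, 41.8, 41.0, 41.8, 42.8, 42.3, 42.7]
old_machine = [42.7, 43.8, 42.5, 43.1, 44.0, 43.6, 43.3, 43.5, 41.7, 44.1]

n1, n2 = len(new_machine), len(old_machine)
mean1, mean2 = statistics.mean(new_machine), statistics.mean(old_machine)
s1, s2 = statistics.stdev(new_machine), statistics.stdev(old_machine)

# Pooled standard deviation and pooled t-statistic
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
t_star = (mean1 - mean2) / (sp * math.sqrt(1 / n1 + 1 / n2))

# Left-tailed test: reject H0 when t* falls below -t_{0.05} with 18 df
critical_value = -1.734  # value quoted in the text, not computed here
reject_h0 = t_star < critical_value

print(f"pooled SD = {sp:.3f}, t* = {t_star:.2f}, reject H0: {reject_h0}")
```

The printed pooled standard deviation and t-statistic should match the hand calculation up to rounding.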
When one wants to estimate the difference between two population means from independent samples, one uses a t-interval. If the sample variances are not very different, one can use the pooled 2-sample t-interval.
Step 1. Find t_{α/2} with df = n_{1} + n_{2} - 2.
Step 2. The endpoints of the (1 - α)100% confidence interval for μ_{1} - μ_{2} are:
\[ \bar{x}_{1} - \bar{x}_{2} \pm t_{\alpha/2}\, s_{p}\sqrt{\frac{1}{n_{1}} + \frac{1}{n_{2}}} \]
where the degrees of freedom of t is n_{1} + n_{2} - 2.
Continuing from the previous example, give a 99% confidence interval for the difference between the mean time it takes the new machine to pack ten cartons and the mean time it takes the present machine to pack ten cartons.
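Step 1. α = 0.01, t_{α/2} = t_{0.005} = 2.878, where the degrees of freedom is 18.
Step 2.
\[ (42.14 - 43.23) \pm 2.878 \times 0.717\sqrt{\frac{1}{10} + \frac{1}{10}} = -1.09 \pm 0.92 \]
The 99% confidence interval is (-2.01, -0.17).
Interpret the above result:
We are 99% confident that μ_{1} - μ_{2} is between -2.01 and -0.17. In other words, the new machine takes on average between 0.17 and 2.01 seconds less than the present machine to pack ten cartons.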
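The interval calculation can be sketched in Python from the summary statistics. The function name is hypothetical, and the critical value t_{0.005} = 2.878 is passed in from the text rather than computed.

```python
import math

def pooled_t_interval(n1, mean1, s1, n2, mean2, s2, t_crit):
    """Pooled 2-sample t-interval for mu1 - mu2, given the critical value t_{alpha/2}."""
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    margin = t_crit * sp * math.sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    return diff - margin, diff + margin

# Summary statistics from the packing example; t_crit is the quoted t_{0.005} with 18 df
low, high = pooled_t_interval(10, 42.14, 0.683, 10, 43.23, 0.750, t_crit=2.878)
print(f"99% CI for mu1 - mu2: ({low:.2f}, {high:.2f})")
```

Because both endpoints are negative, the interval is consistent with the conclusion of the left-tailed test in the previous example.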
Using Minitab to perform a pooled t-procedure:
1. Stat > Basic Statistics > 2-sample t.
Note: When entering values into the Samples in different columns input boxes, Minitab always subtracts the Second value (column entered second) from the First value (column entered first).
2. Select the Assume equal variances checkbox.
The Minitab output for the packing time example is as follows:
Two sample T for new machine vs present machine
            N     Mean   StDev  SE Mean
new mach   10   42.140   0.683     0.22
present    10   43.230   0.750     0.24
99% CI for mu new mach - mu present: (-2.01, -0.17)
T-Test mu new mach = mu present (Vs <): T = -3.40  P = 0.0016  DF = 18
Both use Pooled StDev = 0.717
What to do if some of the assumptions are not satisfied:
A. What should we do if the assumption of independent samples is violated?
If the samples are not independent but paired, we can use the paired t-test.
B. What should we do if the sample sizes are not large and the populations are not normal?
We can use a nonparametric method to compare two samples, such as the Mann-Whitney procedure.
C. What should we do if the assumption of equal variances is violated?
We can use the separate variances 2-sample t-test.
2-Sample (Independent Samples) t-Procedure Using Separate Variances to Do Inferences for Two Population Means (Very Different Standard Deviations for the Two Samples)
Note: The formulas are provided in the following for your reference only. We can perform the separate variances test using Minitab.
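\[ t^{*} = \frac{\bar{x}_{1} - \bar{x}_{2}}{\sqrt{\frac{s_{1}^{2}}{n_{1}} + \frac{s_{2}^{2}}{n_{2}}}} \]
with
\[ df = \frac{(n_{1}-1)(n_{2}-1)}{(n_{2}-1)C^{2} + (1-C)^{2}(n_{1}-1)} \]
(round down to nearest integer) where
\[ C = \frac{s_{1}^{2}/n_{1}}{s_{1}^{2}/n_{1} + s_{2}^{2}/n_{2}} \]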
Using Minitab to perform a separate variances 2-sample t-procedure:
Stat > Basic Statistics > 2-sample t (leave the Assume equal variances checkbox unselected)
For some examples, one can use both the pooled t-procedure and the separate variances (non-pooled) t-procedure and obtain results that are close to each other. However, when the sample standard deviations are very different from each other and the sample sizes are different, the separate variances 2-sample t-procedure is more reliable.
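For readers who want to see the separate variances calculation outside Minitab, here is a minimal Python sketch of the test statistic and its approximate degrees of freedom (rounded down), written in the algebraically equivalent Welch-Satterthwaite form. The function name is hypothetical, and the numbers in the usage lines are made-up illustration values, not data from this lesson.

```python
import math

def separate_variances_t(n1, mean1, s1, n2, mean2, s2):
    """Separate variances (non-pooled) t-statistic for H0: mu1 - mu2 = 0 and its df."""
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Approximate degrees of freedom, rounded down to the nearest integer
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, math.floor(df)

# Illustration values only (assumed): unequal sample sizes and unequal spreads
t_value, df = separate_variances_t(10, 5.0, 2.0, 20, 4.0, 1.0)
print(f"t = {t_value:.3f}, df = {df}")
```

In this illustration the degrees of freedom come out well below n_{1} + n_{2} - 2 = 28, which is how the procedure accounts for the very different standard deviations.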
Independent random samples of 17 sophomores and 13 juniors attending a large university yield the following data on grade point averages:
Sophomores: 3.04, 2.92, 2.86, 1.71, 3.60, 3.49, 3.30, 2.28, 3.11, 2.88, 2.82, 2.13, 2.11, 3.03, 3.27, 2.60, 3.13
Juniors: 2.56, 3.47, 2.65, 2.77, 3.26, 3.00, 2.70, 3.20, 3.39, 3.00, 3.19, 2.58, 2.98
At the 5% significance level, do the data provide sufficient evidence to conclude that the mean GPAs of sophomores and juniors at the university differ?
Check assumption 1: Are these independent samples?
Yes, the students selected from the sophomores are not related to the students selected from the juniors.
Check assumption 2: Are these normal populations or large samples?
Since we don't have large samples from both populations, we need to check the normal probability plots of the two samples.
Now, we need to determine whether to use the pooled t-test or the non-pooled (separate variances) t-test.
To find the summary statistics for the two samples, we use the following Minitab command:
Stat > Basic Statistics > Display Descriptive Statistics
Descriptive Statistics
Variable     N    Mean     Median   TrMean   StDev
sophomor    17    2.840    2.920    2.865    0.520
juniors     13    2.9808   3.0000   2.9745   0.3093
Variable    Minimum   Maximum   Q1       Q3
sophomor    1.710     3.600     2.440    3.200
juniors     2.5600    3.4700    2.6750   3.2300
Note: The standard deviations are 0.520 and 0.3093 respectively; both the sample sizes and the standard deviations are quite different from each other.
We, therefore, decide to use a non-pooled t-test.
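Step 1. Set up the hypotheses, where μ_{1} is the mean GPA of sophomores and μ_{2} is the mean GPA of juniors:
H_{o}: μ_{1} - μ_{2} = 0
H_{a}: μ_{1} - μ_{2} ≠ 0
Step 2. Write down the significance level.
α = 0.05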
Step 3. Perform the 2-sample t-test in Minitab with the appropriate alternative hypothesis.
Note: The default for the 2-sample t-test in Minitab is the non-pooled one:
Two sample T for sophomores vs juniors
            N    Mean   StDev  SE Mean
sophomor   17   2.840   0.520     0.13
juniors    13   2.981   0.309    0.086
95% CI for mu sophomor - mu juniors: (-0.45, 0.173)
T-Test mu sophomor = mu juniors (Vs not =): T = -0.92  P = 0.36  DF = 26
Step 4. Find the p-value from the output.
p-value = 0.36
Step 5. Draw the conclusion using the p-value.
Since the p-value is larger than α = 0.05, we cannot reject the null hypothesis.
Step 6. State the conclusion in words.
At the 5% level of significance, the data do not provide sufficient evidence that the mean GPAs of sophomores and juniors at the university are different.
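The Minitab result can be cross-checked from the raw GPA data with a short Python sketch that uses only the standard library. Since the standard library has no t distribution, the sketch compares |t*| with a two-sided critical value instead of computing the p-value; the value 2.056 (t_{0.025} with 26 degrees of freedom) comes from a standard t-table and is an assumption not stated in the text.

```python
import math
import statistics

# GPA data from the example
sophomores = [3.04, 2.92, 2.86, 1.71, 3.60, 3.49, 3.30, 2.28, 3.11,
              2.88, 2.82, 2.13, 2.11, 3.03, 3.27, 2.60, 3.13]
juniors = [2.56, 3.47, 2.65, 2.77, 3.26, 3.00, 2.70, 3.20, 3.39,
           3.00, 3.19, 2.58, 2.98]

n1, n2 = len(sophomores), len(juniors)
v1 = statistics.variance(sophomores) / n1  # s1^2 / n1
v2 = statistics.variance(juniors) / n2     # s2^2 / n2

# Separate variances t-statistic and degrees of freedom (rounded down)
t_star = (statistics.mean(sophomores) - statistics.mean(juniors)) / math.sqrt(v1 + v2)
df = math.floor((v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1)))

# Two-tailed test at alpha = 0.05; critical value assumed from a t-table
t_crit = 2.056
reject_h0 = abs(t_star) > t_crit

print(f"t* = {t_star:.2f}, df = {df}, reject H0: {reject_h0}")
```

The statistic and degrees of freedom should agree with the Minitab output, and failing to reject at the critical value matches the conclusion drawn from the p-value.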
Continuing with the previous example, give a 95% confidence interval for the difference between the mean GPA of Sophomores and the mean GPA of Juniors.
Using Minitab:
95% CI for mu sophomor - mu juniors is:
(-0.45, 0.173)
Interpreting the above result:
We are 95% confident that the difference between the mean GPA of sophomores and the mean GPA of juniors (sophomores minus juniors) is between -0.45 and 0.173.
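The lesson obtains this interval from Minitab. As a hedged illustration, the sketch below reproduces it from the summary statistics using the usual separate variances form, the difference in sample means plus or minus t_{α/2} times sqrt(s_{1}^{2}/n_{1} + s_{2}^{2}/n_{2}). The function name is hypothetical, and the critical value 2.056 (t_{0.025} with 26 degrees of freedom) is an assumption taken from a standard t-table, not from the text.

```python
import math

def separate_variances_interval(n1, mean1, s1, n2, mean2, s2, t_crit):
    """Separate variances t-interval for mu1 - mu2, given the critical value t_{alpha/2}."""
    margin = t_crit * math.sqrt(s1**2 / n1 + s2**2 / n2)
    diff = mean1 - mean2
    return diff - margin, diff + margin

# Summary statistics from the Descriptive Statistics output; t_crit assumed from a t-table
low, high = separate_variances_interval(17, 2.840, 0.520, 13, 2.9808, 0.3093, t_crit=2.056)
print(f"95% CI for mu1 - mu2: ({low:.2f}, {high:.3f})")
```

Because the interval contains 0, it agrees with the two-tailed test, which did not reject the null hypothesis of equal mean GPAs.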
Remember: When entering values into the Samples in different columns input boxes, Minitab always subtracts the Second value (column entered second) from the First value (column entered first).