Two-Sample Hotelling's T-Square
This lesson is concerned with the two-sample Hotelling's T^{2} test. This test is the multivariate analog of the two-sample t-test in univariate statistics. In both cases, the two-sample test is used to compare two populations. The two populations may correspond to two different treatments within an experiment.
For example, in a completely randomized design, the two treatments are randomly assigned to the experimental units, and we would like to distinguish between the two treatments. Another situation occurs when the observations are taken from two distinct populations of sample units. In either case, there is no pairing of the observations, as there was when the paired Hotelling's T^{2} was applied.
Example: Swiss Bank Notes
An example of sampling from two distinct populations occurs with the Swiss Bank Notes.
Here, we have observations taken from two populations of 1,000-franc Swiss bank notes.
1. The first population is the population of Genuine Bank Notes
2. The second population is the population of Counterfeit Bank Notes
Although the note pictured above is a more recent issue, it shows the measurement locations used in this study. For both populations of bank notes, the following measurements were taken:
 Length of the note
 Width of the Left-Hand side of the note
 Width of the Right-Hand side of the note
 Width of the Bottom Margin
 Width of the Top Margin
 Diagonal Length of Printed Area
Objective: To determine if counterfeit notes can be distinguished from the genuine Swiss bank notes.
This is essential for the police if they wish to use these kinds of measurements to determine whether bank notes are genuine and to help solve counterfeiting crimes.
Before considering the multivariate case, we will first consider the univariate case.
Univariate Case
Suppose we have data from a single variable from two populations:
The data will be denoted in Population 1 as: X_{11},X_{12},...,X_{1n_{1}}
The data will be denoted in Population 2 as: X_{21},X_{22},...,X_{2n_{2}}
For both populations, the first subscript denotes which population the observation is from. The second subscript denotes which observation we are looking at within that population.
Here we will make the standard assumptions:
 The data from population i is sampled from a population with mean μ_{i}. This assumption simply means that there are no subpopulations to note.
 Homoskedasticity: The data from both populations have common variance σ^{2}
 Independence: The subjects from both populations are independently sampled.
 Normality: The data from both populations are normally distributed.
Here we consider testing the null hypothesis H_{o} : μ_{1} = μ_{2}, that both populations have equal population means, against the alternative hypothesis H_{a} : μ_{1} ≠ μ_{2}, that the means are not equal.
We define the sample mean for each population using the following expression:
$$\bar{x}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}$$
We let s^{2}_{i} denote the sample variance for the ith population, again calculated using the usual formula below:
$$s_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i}\left(X_{ij} - \bar{x}_i\right)^2$$
Assuming homoskedasticity, the sample variances s^{2}_{1} and s^{2}_{2} are both estimates of the common variance σ^{2}. A better estimate can be obtained, however, by pooling these two estimates. This is done by calculating the pooled variance given by the expression below:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
Here each sample variance is given a weight equal to its sample size less 1, so that the pooled variance is simply n_{1} - 1 times the variance of the first sample plus n_{2} - 1 times the variance of the second sample, divided by the total sample size minus 2.
Our test statistic is Student's t-statistic, which is calculated by dividing the difference in the sample means by the standard error of that difference. Here the standard error of that difference is the square root of the pooled variance times the sum of the inverses of the sample sizes, as shown below:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
Under the null hypothesis H_{o} of equality of the population means, this test statistic is t-distributed with n_{1} + n_{2} - 2 degrees of freedom.
We will reject H_{o} at level α if the absolute value of this test statistic exceeds the critical value from the t-table with n_{1} + n_{2} - 2 degrees of freedom evaluated at α over 2.
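As a minimal sketch of the univariate calculation just described, the pooled variance and the t-statistic can be computed in a few lines of Python. The function name pooled_t and the small data set are illustrative assumptions, not part of the bank notes study; the critical value would still be looked up from a t-table with n_{1} + n_{2} - 2 degrees of freedom.

```python
import math
from statistics import mean, variance


def pooled_t(x1, x2):
    """Return (t, df) for the equal-variance two-sample t-test."""
    n1, n2 = len(x1), len(x2)
    # Pooled variance: each sample variance is weighted by its sample size less 1.
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    # Standard error of the difference in the sample means.
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(x1) - mean(x2)) / se, n1 + n2 - 2


# Made-up illustrative data (hypothetical, not from the lesson).
t, df = pooled_t([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(t, df)
```

Here the sample variances are 1 and 4, so the pooled variance is 2.5 and the statistic has 4 degrees of freedom.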
All of this should be familiar to you from your introductory statistics course.
Next, let's consider the multivariate case...
Multivariate Case
In this case we replace the random variables X_{ij}, for the jth sample unit from the ith population, with random vectors X_{ij}. These vectors contain the observations on the p variables.
In our notation, we will have our two populations:
The data will be denoted in Population 1 as: X_{11},X_{12},...,X_{1n_{1}}
The data will be denoted in Population 2 as: X_{21},X_{22},...,X_{2n_{2}}
Here the vector X_{ij} represents all of the data for all of the variables for sample unit j from population i.
This vector contains elements X_{ijk}, where k runs from 1 to p, for the p different variables that we are observing. So, X_{ijk} is the observation for variable k of subject j from population i.
The assumptions here will be analogous to the assumptions in the univariate setting.
Assumptions:
 The data from population i is sampled from a population with mean vector μ_{i}. Again, this corresponds to the assumption that there are no subpopulations.
 Instead of assuming homoskedasticity, that is, a common variance, we now assume that the data from both populations have a common variance-covariance matrix Σ.
 Independence. The subjects from both populations are independently sampled.
 Normality. Both populations are normally distributed.
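Consider testing the null hypothesis that the two populations have identical population mean vectors against the general alternative that the mean vectors are not equal.
So here what we are testing is:
$$H_o\colon \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 \quad \text{against} \quad H_a\colon \boldsymbol{\mu}_1 \ne \boldsymbol{\mu}_2$$
Or, in other words:
$$H_o\colon \mu_{11} = \mu_{21},\ \mu_{12} = \mu_{22},\ \dots,\ \mu_{1p} = \mu_{2p}$$
The null hypothesis is satisfied only if the population means are identical for all of the variables.
The alternative is that at least one pair of these means is different. This is expressed below:
$$H_a\colon \mu_{1k} \ne \mu_{2k} \ \text{for at least one } k \in \{1, 2, \dots, p\}$$
The means could differ for only one variable, or for all of them.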
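To carry out the test, we define the sample mean vectors, calculated the same way as before, using data only from the ith population:
$$\bar{\mathbf{x}}_i = \frac{1}{n_i}\sum_{j=1}^{n_i}\mathbf{X}_{ij}$$
Similarly, using data only from the ith population, we define the sample variance-covariance matrices:
$$\mathbf{S}_i = \frac{1}{n_i - 1}\sum_{j=1}^{n_i}\left(\mathbf{X}_{ij} - \bar{\mathbf{x}}_i\right)\left(\mathbf{X}_{ij} - \bar{\mathbf{x}}_i\right)'$$
Under our assumption of homogeneous variance-covariance matrices, both S_{1} and S_{2} are estimators for the common variance-covariance matrix Σ. A better estimate can be obtained by pooling the two estimates using the expression below:
$$\mathbf{S}_p = \frac{(n_1 - 1)\mathbf{S}_1 + (n_2 - 1)\mathbf{S}_2}{n_1 + n_2 - 2}$$
Again, each sample variance-covariance matrix is weighted by its sample size minus 1.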
Two-sample Hotelling's T-Square
Now we are ready to define the two-sample Hotelling's T-square test statistic. As the expression below shows, it is computed from the difference in the sample mean vectors. It also involves the pooled variance-covariance matrix multiplied by the sum of the inverses of the sample sizes; the resulting matrix is then inverted.
$$T^2 = \left(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2\right)'\left\{\mathbf{S}_p\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right\}^{-1}\left(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2\right)$$
For large samples, this test statistic is approximately chi-square distributed with p degrees of freedom. However, as before, this approximation does not take into account the variation due to estimating the variance-covariance matrix. So, as before, we transform Hotelling's T^{2} statistic into an F-statistic using the following expression. Note that this is a function of the sample sizes of the two populations and the number of variables measured, p.
$$F = \frac{n_1 + n_2 - p - 1}{p\,(n_1 + n_2 - 2)}\,T^2 \sim F_{p,\;n_1 + n_2 - p - 1}$$
Under the null hypothesis, H_{o} : μ_{1} = μ_{2}, this F-statistic is F-distributed with p and n_{1} + n_{2} - p - 1 degrees of freedom. We reject H_{o} at level α if it exceeds the critical value from the F-table evaluated at α.
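The steps above (sample mean vectors, pooled variance-covariance matrix, T-square, and the F transformation) can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names (mean_vector, cov_matrix, solve, hotelling_two_sample) and the tiny data set are hypothetical, the denominator degrees of freedom are taken as n_{1} + n_{2} - p - 1, and the sketch does not guard against a singular pooled matrix. It is not the SAS program used in this lesson.

```python
def mean_vector(rows):
    """Component-wise sample mean of a list of equal-length observations."""
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(len(rows[0]))]


def cov_matrix(rows):
    """Sample variance-covariance matrix with divisor n - 1."""
    n, p = len(rows), len(rows[0])
    xbar = mean_vector(rows)
    return [[sum((r[k] - xbar[k]) * (r[m] - xbar[m]) for r in rows) / (n - 1)
             for m in range(p)] for k in range(p)]


def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with pivoting."""
    p = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(aug[i][c] * z[c] for c in range(i + 1, p))
        z[i] = (aug[i][p] - tail) / aug[i][i]
    return z


def hotelling_two_sample(x1, x2):
    """Return (T2, F, (df1, df2)) for the pooled two-sample test."""
    n1, n2, p = len(x1), len(x2), len(x1[0])
    s1, s2 = cov_matrix(x1), cov_matrix(x2)
    # Pooled matrix: each S_i is weighted by its sample size minus 1.
    sp = [[((n1 - 1) * s1[k][m] + (n2 - 1) * s2[k][m]) / (n1 + n2 - 2)
           for m in range(p)] for k in range(p)]
    scale = 1 / n1 + 1 / n2
    d = [u - v for u, v in zip(mean_vector(x1), mean_vector(x2))]
    # T2 = d' [S_p (1/n1 + 1/n2)]^{-1} d, computed by solving rather than inverting.
    z = solve([[v * scale for v in row] for row in sp], d)
    t2 = sum(di * zi for di, zi in zip(d, z))
    df2 = n1 + n2 - p - 1
    f = df2 / (p * (n1 + n2 - 2)) * t2
    return t2, f, (p, df2)


# Made-up data with p = 2 variables and 3 observations per group.
group1 = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0)]
group2 = [(4.0, 4.0), (5.0, 6.0), (6.0, 5.0)]
t2, f, df = hotelling_two_sample(group1, group2)
print(t2, f, df)
```

Solving the linear system instead of forming the inverse explicitly is a design choice for numerical convenience; the resulting T-square is the same quantity.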
Example: Swiss Bank Notes
The two-sample Hotelling's T^{2} test can be carried out on the Swiss Bank Notes data using the SAS program swiss10.sas.
The output is given in swiss10.lst.
At the top of the page you see that N1 is equal to 100, indicating that we have 100 bank notes in the first sample, in this case 100 real or genuine notes. This is followed by the sample mean vector for this population of notes. The sample mean vector is copied into the table below:
Below this appear the elements of the sample variance-covariance matrix for this first sample of notes, that is, for the real or genuine notes. Those numbers are also copied into the matrix that appears below:
Sample variance-covariance matrix for genuine notes:
The next item in the output is the information for the second sample. First appears the sample mean vector for the second population of bank notes. These results are copied into the table above. The sample variance-covariance matrix for the second sample of notes, the counterfeit notes, is given under S2 in the output and is also copied below:
Sample variance-covariance matrix for the counterfeit notes:
This is followed by the pooled variance-covariance matrix for the two samples of notes. For example, the pooled variance for the length is about 0.137. The pooled variance for the left-hand width is about 0.099. The pooled covariance between length and left-hand width is 0.045. You should be able to see where all of these numbers appear in the output. These results are also copied from the output and placed below:
The two-sample Hotelling's T^{2} statistic is 2412.45. The F-value is about 391.92 with 6 and 193 degrees of freedom. The 6 is the number of variables being measured, and the 193 comes from the fact that we have 100 notes in each sample for a total of 200: we subtract the number of variables to get 194, and then subtract 1 more to get 193. In this case the p-value is close to 0; we will report it as p < 0.0001.
In this case we can reject the null hypothesis that the mean vector for the counterfeit notes equals the mean vector for the genuine notes, giving the evidence as usual: (T^{2} = 2412.45; F = 391.92; d.f. = 6, 193; p < 0.0001)
Conclusion:
Our conclusion here is that the counterfeit notes can be distinguished from the genuine notes.
After concluding that the counterfeit notes can be distinguished from the genuine notes, the next step in our analysis is to determine on which variables they differ.
All we can conclude at this point is that the two types of notes differ on at least one variable. It could be just one variable, a combination of more than one variable, or potentially all of the variables.
To assess which variables these notes differ on, we will consider the (1 - α) × 100% confidence ellipse for the difference in the population mean vectors for the two populations of bank notes, μ_{1} - μ_{2}. This ellipse is given by the expression below:
$$\left(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2 - (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)\right)'\left\{\mathbf{S}_p\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right\}^{-1}\left(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2 - (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)\right) \le \frac{p\,(n_1 + n_2 - 2)}{n_1 + n_2 - p - 1}\,F_{p,\;n_1 + n_2 - p - 1,\;\alpha}$$
On the left-hand side of the expression we have a function of the differences between the sample mean vectors for the two populations and the pooled variance-covariance matrix S_{p}, as well as the two sample sizes n_{1} and n_{2}. On the right-hand side we have a function of the number of variables, p, the sample sizes n_{1} and n_{2}, and the critical value from the F-table with p and n_{1} + n_{2} - p - 1 degrees of freedom, evaluated at α.
To understand the geometry of this ellipse, let
λ_{1}, λ_{2},..., λ_{p}
denote the eigenvalues of the pooled variance-covariance matrix S_{p}, and let
e_{1}, e_{2},..., e_{p}
denote the corresponding eigenvectors. Then the k^{th} axis of this p-dimensional ellipse points in the direction specified by the k^{th} eigenvector, e_{k}.
It has a half-length given by the expression below:
$$\sqrt{\lambda_k}\,\sqrt{\frac{p\,(n_1 + n_2 - 2)}{n_1 + n_2 - p - 1}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)F_{p,\;n_1 + n_2 - p - 1,\;\alpha}}$$
Note, again, that this is a function of the number of variables, p, the sample sizes n_{1} and n_{2}, and the critical value from the F-table.
The (1 - α) × 100% confidence ellipse yields simultaneous (1 - α) × 100% confidence intervals for all linear combinations of the form given in the expression below:
$$c_1(\mu_{11} - \mu_{21}) + c_2(\mu_{12} - \mu_{22}) + \dots + c_p(\mu_{1p} - \mu_{2p}) = \sum_{k=1}^{p} c_k(\mu_{1k} - \mu_{2k}) = \mathbf{c}'(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)$$
So, these are all linear combinations, taken across variables, of the differences in the population means between the two populations. The expression is written out in longhand on the far left, simplified using a summation sign in the middle term, and given in vector notation in the final term on the right.
These confidence intervals are given by the expression below:
$$\sum_{k=1}^{p} c_k(\bar{x}_{1k} - \bar{x}_{2k}) \pm \sqrt{\frac{p\,(n_1 + n_2 - 2)}{n_1 + n_2 - p - 1}F_{p,\;n_1 + n_2 - p - 1,\;\alpha}}\;\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\sum_{k=1}^{p}\sum_{l=1}^{p} c_k c_l\, s^{(p)}_{kl}}$$
This involves the same linear combinations of the differences in the sample mean vectors between the two populations, plus or minus the first radical term, which contains the function of the sample sizes and the number of variables times the critical value from the F-table. The second radical term is the standard error of this particular linear combination.
Here, the terms s^{(p)}_{kl} denote the pooled covariances between variables k and l.
Interpretation: The interpretation of these confidence intervals is the same as with the one-sample Hotelling's T^{2} test. Here, we are (1 - α) × 100% confident that all of these intervals cover their respective linear combinations of the differences between the means of the two populations. In particular, we are also (1 - α) × 100% confident that all of the intervals for the individual variables cover their respective differences between the population means. For the k^{th} individual variable, we take the difference between the sample means for that variable, plus or minus the same radical term that we had in the previous expression, times the standard error of that difference, which involves the inverses of the sample sizes and the pooled variance for variable k:
$$\bar{x}_{1k} - \bar{x}_{2k} \pm \sqrt{\frac{p\,(n_1 + n_2 - 2)}{n_1 + n_2 - p - 1}F_{p,\;n_1 + n_2 - p - 1,\;\alpha}}\;\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right)s_k^2}$$
Here, s^{2}_{k} is the pooled variance for variable k. These intervals are called simultaneous confidence intervals.
Let's work through an example of their calculation using the Swiss Bank Notes data.
Example: Swiss Bank Notes
An example of the calculation of simultaneous confidence intervals using the Swiss Bank Notes data is given in the expression below:
$$\bar{x}_{1k} - \bar{x}_{2k} \pm \sqrt{\frac{p\,(2n - 2)}{2n - p - 1}F_{p,\;2n - p - 1,\;\alpha}}\;\sqrt{\frac{2 s_k^2}{n}}$$
$$214.969 - 214.823 \pm \sqrt{\frac{6 \times 198}{193} \times 2.14}\;\sqrt{\frac{2 \times 0.137}{100}} \quad\Rightarrow\quad (-0.044,\ 0.336)$$
Here we note that the sample sizes are both equal to 100, n = n_{1} = n_{2} = 100, so the formula simplifies as shown in the first line of the calculation above.
Let's look at the results for the difference in length, that is, the lengths of the genuine notes minus the lengths of the counterfeit notes.
The sample mean for the length of the genuine notes was 214.969. The sample mean for the length of the counterfeit notes was 214.823. We add and subtract the radical term. Here p is equal to 6 and n is equal to 100 for each set of bank notes. The critical value from the F-table, with in this case 6 and 193 degrees of freedom, was 2.14. The standard error of the difference in these sample means is given by the second radical, where we have 2 times the pooled variance for the length, which was 0.137 (from the pooled variance-covariance matrix), divided by n, which again is 100.
Carrying out the math, we end up with an interval that runs from -0.044 to 0.336, as shown above.
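The hand calculation for length can be checked with a short Python sketch that uses only the rounded values quoted in the text (means 214.969 and 214.823, pooled variance 0.137, critical value 2.14). The function name simultaneous_ci is a hypothetical helper, and because the inputs are rounded, the result should be expected to agree with the SAS output only to about three decimal places.

```python
import math


def simultaneous_ci(diff, pooled_var, n1, n2, p, f_crit):
    """Simultaneous confidence interval for one variable's mean difference."""
    # First radical: function of p, the sample sizes, and the F critical value.
    mult = math.sqrt(p * (n1 + n2 - 2) / (n1 + n2 - p - 1) * f_crit)
    # Second radical: standard error of the difference in sample means.
    se = math.sqrt((1 / n1 + 1 / n2) * pooled_var)
    return diff - mult * se, diff + mult * se


# Length: genuine minus counterfeit, with values quoted in the text.
lo, hi = simultaneous_ci(214.969 - 214.823, 0.137, 100, 100, 6, 2.14)
print(round(lo, 3), round(hi, 3))
```

The lower limit falls just below zero, which is consistent with the later observation that the simultaneous interval for length includes 0.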
The SAS program swiss11.sas can be used to compute the simultaneous confidence intervals for the 6 variables.
The results are listed in swiss11.lst.
The results are given in the columns for losim and upsim. Those entries are copied into the table below:
You need to be careful about where they appear in the output. Note that the variables are now sorted in alphabetical order; for example, length is the fourth line of the output. The interval for length can then be seen to be -0.044 to 0.336, as was obtained from the hand calculation previously. In any case, you should be able to find the lower and upper bounds of the simultaneous confidence intervals in the SAS output and see where they appear in the table above.
When interpreting these intervals, we need to see which intervals include 0, which fall entirely below 0, and which fall entirely above 0. The first thing we notice is that the interval for length includes 0. This suggests that we cannot distinguish between the lengths of the counterfeit and genuine bank notes. The intervals for both width measurements fall below 0.
Since these intervals are calculated by taking the genuine notes minus the counterfeit notes, this suggests that the counterfeit notes are larger on these variables, and we can conclude that the left and right margins of the counterfeit notes are wider than those of the genuine notes.
Similarly, we can conclude that the top and bottom margins of the counterfeit notes are also too large. Note, however, that the interval for the diagonal measurement falls entirely above 0. This suggests that the diagonal measurements of the counterfeit notes are smaller than those of the genuine notes.
Conclusions:
 Counterfeit notes are too wide on both the left and right margins.
 The top and bottom margins of the counterfeit notes are too large.
 The diagonal measurement of the counterfeit notes is smaller than that of the genuine notes.
 Cannot distinguish between the lengths of the counterfeit and genuine bank notes.
Profile Plots
Simultaneous confidence intervals may be plotted using swiss12.sas.
The results are shown in the plot of simultaneous confidence intervals, and from this plot it is easy to see how the two sets of bank notes differ. It is immediately obvious that the interval for length includes 0, that the intervals for bottom, left, right and top all fall below 0, and that the interval for the diagonal measurement falls above 0.
Since we are taking the genuine minus the counterfeit notes, this would suggest that both the left and right margins of the counterfeit notes are larger than those of the genuine notes. The bottom and top margins are also larger for the counterfeit notes than they are for the genuine notes. Finally, the diagonal measurements are smaller for the counterfeit notes than for the genuine notes.
As in the one-sample case, the simultaneous confidence intervals should be computed only when we are genuinely interested in linear combinations of the variables. If the only intervals of interest are those for the individual variables, with no linear combinations, then a better approach is to calculate the Bonferroni corrected confidence intervals, which are given in the expression below:
Bonferroni Corrected (1 - α) × 100% Confidence Intervals
$$\bar{x}_{1k} - \bar{x}_{2k} \pm t_{n_1 + n_2 - 2,\;\alpha/2p}\,\sqrt{s_k^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
These, again, involve the difference in the sample means for each of the variables, plus or minus the critical value from the t-table times the standard error of the difference in these sample means. The critical value comes from a t-distribution with n_{1} + n_{2} - 2 degrees of freedom, evaluated at α divided by 2p, where p is the number of variables to be analyzed. The radical term gives the standard error of the difference in the sample means and involves the pooled variance for the k^{th} variable times the sum of the inverses of the sample sizes.
For the length of the bank notes, that expression simplifies to the one that follows, since the sample sizes are identical. The average length of the genuine notes was 214.969, from which we subtract the average length of the counterfeit notes, 214.823. For the t-table, we use 2 times the sample size minus 2, or 198, degrees of freedom, evaluated at 0.05 divided by 2 times the number of variables, 6, or 0.05/12. The critical value turns out to be about 2.665. The standard error is obtained by taking the square root of 2 times the pooled variance, 0.137, divided by 100. Carrying out the math, we end up with an interval that runs from 0.006 to 0.286, as shown below:
$$214.969 - 214.823 \pm 2.665\,\sqrt{\frac{2 \times 0.137}{100}} \quad\Rightarrow\quad (0.006,\ 0.286)$$
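The Bonferroni calculation for length can be sketched in the same way. The helper name bonferroni_ci is hypothetical, the genuine mean length is taken as 214.969 (the value used in the earlier hand calculation), and the critical value 2.665 is the one quoted above rather than computed. With these rounded inputs the limits agree with the reported interval only to within rounding.

```python
import math


def bonferroni_ci(diff, pooled_var, n1, n2, t_crit):
    """Bonferroni-corrected interval for one variable's mean difference."""
    # Standard error: pooled variance times the sum of inverse sample sizes.
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return diff - t_crit * se, diff + t_crit * se


# Length: genuine minus counterfeit, t critical value at 0.05/12 with 198 d.f.
lo, hi = bonferroni_ci(214.969 - 214.823, 0.137, 100, 100, 2.665)
print(lo, hi)
```

Both limits are positive, so unlike the simultaneous interval, the Bonferroni interval for length excludes 0.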
These calculations can also be obtained from the SAS program swiss11.sas.
The output is given in swiss11.lst, in the columns for lobon and upbon.
Again, note that the variables are given in alphabetical order rather than in the original order of the data. In any case, you should be able to see where the numbers in the SAS output appear in the table below:
In summary, we have:
The intervals are interpreted in a way similar to before. Here we can see that:
Conclusions:
 Length: Genuine notes are longer than counterfeit notes.
 Leftwidth and Rightwidth: Counterfeit notes are too wide on both the left and right margins.
 Top and Bottom margins: The top and bottom margins of the counterfeit notes are too large.
 Diagonal measurement: The diagonal of the counterfeit notes is smaller than that of the genuine notes.
Profile Plots
Profile plots provide another useful graphical summary of the data. They are only meaningful if all variables have the same units of measurement. They are not meaningful if the variables have different units of measurement, for example, if some variables are measured in grams while others are measured in centimeters. In that case profile plots should not be constructed.
 In the traditional profile plot, the sample means for each group are plotted against the variables.
Plot 1 shows the profile plot for the Swiss bank notes data. In this plot the variables are listed on the x-axis and the means for each of the variables are given on the y-axis. The variable means for the fake notes are given by the circles, while the variable means for the real notes are given by the squares. These data points are then connected by straight line segments.
This plot was obtained using swiss13a.sas.
Returning to our plot, we note that the profiles for the two populations of bank notes are plotted right on top of one another, so this plot is not particularly useful in this example. It is not very informative for the Swiss bank notes because the variation among the variable means far exceeds the variation between the group means within a given variable.
 A better plot is obtained by subtracting off the government specifications before carrying out the analyses.
This plot can be obtained using swiss13b.sas.
The results can be found in Plot 2. From this plot we can see that the bottom and top margins of the counterfeit notes are larger than the corresponding means for the genuine notes. Likewise, the left and right margins are also wider for the counterfeit notes than for the genuine notes. However, the diagonal and length measurements for the counterfeit notes appear to be smaller than those for the genuine notes. Please note, however, that this plot does not show which results are significant. Significance would be assessed from the previous simultaneous or Bonferroni confidence intervals.
One of the things that we look for in these plots is whether the line segments joining the dots are parallel to one another. In this case, they do not appear to be even close to parallel for any pair of variables.
Profile Analysis
Profile Analysis is used to test the null hypothesis that these line segments are indeed parallel. If the variables have the same units of measurement, it may be appropriate to test the hypothesis that the line segments in the profile plot are parallel to one another. They might be expected to be parallel if all of the measurements for the counterfeit notes were consistently larger than the measurements for the genuine notes by some constant.
To test this null hypothesis we use the following procedure:
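Step 1: For each group, we create a new random vector Y_{ij} corresponding to the jth observation from population i. The elements of this vector are the differences between the values of successive variables, as shown below:
$$\mathbf{Y}_{ij} = \left(X_{ij2} - X_{ij1},\; X_{ij3} - X_{ij2},\; \dots,\; X_{ijp} - X_{ij,p-1}\right)'$$
Step 2: Apply the two-sample Hotelling's T-square to the data Y_{ij} to test the null hypothesis that the means of the Y_{ij}'s for population 1 are the same as the means of the Y_{ij}'s for population 2. In shorthand this reads as follows:
$$H_o\colon \boldsymbol{\mu}_{Y_1} = \boldsymbol{\mu}_{Y_2} \quad \text{against} \quad H_a\colon \boldsymbol{\mu}_{Y_1} \ne \boldsymbol{\mu}_{Y_2}$$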
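Step 1 is a simple data transformation, sketched below in Python. The function name successive_differences is hypothetical, and the direction of the differences (each variable minus the one before it) is an assumption; reversing the sign of every difference would not change the resulting T-square. The toy rows are made up for illustration.

```python
def successive_differences(rows):
    """Map each p-variable observation to its p - 1 successive differences."""
    return [[r[k + 1] - r[k] for k in range(len(r) - 1)] for r in rows]


# Two made-up observations on p = 3 variables.
y = successive_differences([[1, 3, 6], [2, 2, 5]])
print(y)

# Adding a constant to every variable of an observation leaves its
# differences unchanged, which is why parallel profiles imply equal means
# for the differenced data.
shifted = successive_differences([[11, 13, 16]])
print(shifted)
```

The transformed rows would then be passed to the two-sample Hotelling's T-square procedure with p - 1 variables in place of p.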
Testing for Parallel Profiles
The test for parallel profiles may be carried out using the SAS program swiss14.sas.
The results (swiss14.lst) yield a Hotelling's T^{2} of 2356.38 with a corresponding F-value of 461.76. Since there are 5 differences, we have 5 numerator degrees of freedom, and the denominator degrees of freedom are equal to the total number of observations, 200 (100 of each type), minus 5, minus 1, or 194. The p-value is very close to 0, indicating that we can reject the null hypothesis.
Conclusion: We reject the null hypothesis of parallel profiles between genuine and counterfeit notes (T^{2} = 2356.38; F = 461.76; d.f. = 5, 194; p < 0.0001).
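As an arithmetic check, the F-values reported in this lesson can be recovered from the corresponding T-square statistics. The sketch below assumes the transformation F = (n_{1} + n_{2} - p - 1) T^{2} / (p (n_{1} + n_{2} - 2)) with p and n_{1} + n_{2} - p - 1 degrees of freedom; the function name t2_to_f is hypothetical.

```python
def t2_to_f(t2, n1, n2, p):
    """Convert a two-sample Hotelling's T-square into (F, (df1, df2))."""
    df2 = n1 + n2 - p - 1
    f = df2 / (p * (n1 + n2 - 2)) * t2
    return f, (p, df2)


# Test of equal mean vectors on the 6 original variables.
f_means, df_means = t2_to_f(2412.45, 100, 100, 6)
# Test of parallel profiles on the 5 successive differences.
f_profile, df_profile = t2_to_f(2356.38, 100, 100, 5)
print(round(f_means, 2), df_means)
print(round(f_profile, 2), df_profile)
```

Both results agree with the reported F-values and degrees of freedom to two decimal places.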
Model Assumptions and Diagnostics
In carrying out any statistical analysis it is always important to consider the assumptions under which that analysis was carried out, and to assess whether those assumptions are satisfied for the data at hand.
Let's recall the four assumptions underlying the Hotelling's T^{2} test.
 The data from population i is sampled from a population with mean vector μ_{i}.
 The data from both populations have common variance-covariance matrix Σ.
 Independence. The subjects from both populations are independently sampled. (Note that this does not mean that the variables are independent of one another.)
 Normality. Both populations are multivariate normally distributed.
We now consider how to assess the validity of each of these assumptions.
Assumption 1: The data from population i is sampled from a population with mean vector μ_{i}.
 This assumption essentially means that there are no subpopulations with different population mean vectors.
 In our current example, this might be violated if the counterfeit notes were produced by more than one counterfeiter.
 Generally, if you have randomized experiments, this assumption is not of any concern. However, in the current application we would have to ask the police investigators whether more than one counterfeiter might be present.
Assumption 2: For now we will skip Assumption 2 and return to it at a later time.
Assumption 3: Independence
 This assumption says that the subjects for each population were independently sampled.
 This assumption may be violated for three different reasons:
 Clustered data: I can't think of any reason why we would encounter clustered data in the bank notes, except the possibility that the bank notes are produced in batches, so that notes sampled from the same batch are correlated with one another.
 Time-series data: If the notes are produced in some order over time, there might be some temporal correlation between notes. Notes produced at times close to one another may be more similar. This could result in temporal correlation, violating the assumptions of the analysis.
 Spatial data: If the data are collected over space, we may encounter some spatial correlation.
 Note: the results of Hotelling's T-square are not generally robust to violations of independence. What I mean by this is that the results of Hotelling's T^{2} will tend to be sensitive to violations of this assumption. Under dependence, the results for some observations are predictable from the results of other observations, usually adjacent observations. This predictability results in some redundancy in the data, reducing the effective sample size of the study. This redundancy, in a sense, means that we may not have as much data as we think we have. The consequence is that we will tend to reject the null hypothesis more often than we should if this assumption is violated.
Assumption 4: Multivariate Normality
To assess this assumption we can employ the following diagnostic procedures:
 Produce histograms for each variable. What we should look for is whether the variables show a symmetric distribution.
 Produce scatter plots for each pair of variables. Under multivariate normality, we should see an elliptical cloud of points.
 Produce a three-dimensional rotating scatter plot. Again, we should see an elliptical cloud of points.
Note that the Central Limit Theorem implies that the sample mean vectors are going to be approximately multivariate normally distributed regardless of the distribution of the original variables.
So, in general, Hotelling's T-square is not going to be sensitive to violations of this assumption.
Now let us return to assumption 2.
Assumption 2. The data from both populations have common variance-covariance matrix Σ.
This assumption may be assessed using Bartlett's Test.
Bartlett's Test
Suppose that the data from population i have variance-covariance matrix Σ_{i}, for population i = 1, 2. What we wish to do is to test the null hypothesis that Σ_{1} is equal to Σ_{2} against the general alternative that they are not equal, as shown below:
$$H_o\colon \Sigma_1 = \Sigma_2 \quad \text{against} \quad H_a\colon \Sigma_1 \ne \Sigma_2$$
Here, the alternative is that the variance-covariance matrices differ in at least one of their elements.
The test statistic for Bartlett's Test is given by L prime as shown below:
$$L' = c\left\{(n_1 + n_2 - 2)\log|\mathbf{S}_p| - (n_1 - 1)\log|\mathbf{S}_1| - (n_2 - 1)\log|\mathbf{S}_2|\right\}$$
This involves a finite population correction factor c, which is given below. Inside the brackets above, we see that it also involves the determinants of the sample variance-covariance matrices for the individual populations as well as the pooled sample variance-covariance matrix. So, what we have in the brackets is the total number of observations minus 2, times the log of the determinant of the pooled variance-covariance matrix, minus n_{1} - 1 times the log of the determinant of the sample variance-covariance matrix for the first population, minus n_{2} - 1 times the log of the determinant of the sample variance-covariance matrix for the second population. (Note that in this formula, the logs are all natural logs.)
The finite population correction factor, c, is given below:
$$c = 1 - \frac{2p^2 + 3p - 1}{6(p + 1)}\left\{\frac{1}{n_1 - 1} + \frac{1}{n_2 - 1} - \frac{1}{n_1 + n_2 - 2}\right\}$$
It is a function of the number of variables p, and the sample sizes n_{1} and n_{2}.
Under the null hypothesis, H_{o} : Σ_{1} = Σ_{2}, Bartlett's test statistic is approximately chi-square distributed with p(p+1) divided by 2 degrees of freedom.
We will reject H_{o} at level α if the test statistic exceeds the critical value from the chi-square table evaluated at level α.
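The pieces of Bartlett's test statistic can be sketched as follows, taking the natural logs of the determinants as inputs. The form of the correction factor c used here is the standard one for two groups and should be read as an assumption of this sketch, as should the function name bartlett_statistic and the made-up log-determinants; this is not the SAS computation itself.

```python
def bartlett_statistic(logdet_pooled, logdet_1, logdet_2, n1, n2, p):
    """Return (L', degrees of freedom, c) for two groups."""
    # Correction factor: a function of p and the sample sizes only.
    c = 1 - (2 * p * p + 3 * p - 1) / (6 * (p + 1)) * (
        1 / (n1 - 1) + 1 / (n2 - 1) - 1 / (n1 + n2 - 2))
    # Bracketed term built from the natural logs of the determinants.
    bracket = ((n1 + n2 - 2) * logdet_pooled
               - (n1 - 1) * logdet_1
               - (n2 - 1) * logdet_2)
    return c * bracket, p * (p + 1) // 2, c


# Made-up log-determinants: identical matrices give a statistic of zero.
stat, df, c = bartlett_statistic(-10.0, -10.0, -10.0, 100, 100, 6)
print(stat, df, c)
```

With 6 variables the statistic has 21 degrees of freedom, and for two samples of 100 the correction factor is close to 1, so it changes the statistic only slightly.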
Bartlett's Test may be carried out using the SAS program swiss15.sas.
The first page of the output in swiss15.lst just gives summary information.
At the top we can see that we have 200 observations on 6 variables and two classes, one for each population. DF Total, the total degrees of freedom, is the total number of observations minus 1, or 199. DF Within Classes is the total sample size minus 2, in this case 198.
The class level information is not particularly useful at this time, but it does tell us that we have 100 observations on each type of note.
The within covariance matrix information gives the size of the two variance-covariance matrices, which in this case are 6 by 6 matrices corresponding to the 6 variables in our analysis. It also gives the natural log of the determinant of each variance-covariance matrix. For the fake notes the natural log of the determinant of the covariance matrix is -10.79, for the real notes it is -11.2, and for the pooled matrix it is -10.3.
Under the null hypothesis that the variance-covariance matrices for the two populations are equal, the natural logs of the determinants should be approximately the same for the fake and the real notes.
The results of Bartlett's Test are at the bottom of page two of the output. Here we get a test statistic of 121.90 with 21 degrees of freedom, the 21 coming from p(p+1)/2 = 6 × 7 / 2 for the 6 variables. The p-value for the test is less than 0.0001, indicating that we reject the null hypothesis.
The conclusion here is that the two populations of bank notes have variance-covariance matrices that differ in at least one of their elements. This is backed up by the evidence given by the test statistic (L' = 121.899; d.f. = 21; p < 0.0001). Therefore, the assumption of homogeneous variance-covariance matrices is violated.
Notes:
 Although Hotelling's T2 test is robust to violations of the assumption of multivariate normality, the results of Bartlett's test are not. Bartlett's Test should not be used if there is any indication that the data are not multivariate normally distributed.
 In general, the two-sample Hotelling's T-square test is sensitive to violations of the assumption of homogeneity of variance-covariance matrices. This is especially the case when the sample sizes are unequal, i.e., n_{1} ≠ n_{2}. If the sample sizes are equal, the test is not very sensitive to this violation, and the ordinary two-sample Hotelling's T-square test can be used as usual.
Testing for Equality of Mean Vectors when Σ_{1} ≠ Σ_{2}
The following considers a test for equality of the population mean vectors when the assumption of homogeneity of variance-covariance matrices is violated.
Here we will consider the modified Hotelling's T2 test statistic given in the expression below:
T^{2} = (x̄_{1} - x̄_{2})' {S_{1}/n_{1} + S_{2}/n_{2}}^{-1} (x̄_{1} - x̄_{2})
Again, this is a function of the differences between the sample means for the two populations. But instead of being a function of the pooled variance-covariance matrix, the modified test statistic is written as a function of the sample variance-covariance matrix S_{1} for the first population and the sample variance-covariance matrix S_{2} for the second population. It is also a function of the sample sizes n_{1} and n_{2}.
For large samples, that is, if both samples are large, T^{2} is approximately chi-square distributed with p d.f. We will reject H_{o} : μ_{1} = μ_{2} at level α if T^{2} exceeds the critical value from the chi-square table with p d.f. evaluated at level α.
For small samples, we can calculate an F transformation as before using the formula below.
F = [(n_{1} + n_{2} - p - 1) / (p(n_{1} + n_{2} - 2))] T^{2}
This formula is a function of the sample sizes n_{1} and n_{2} and the number of variables p. Under the null hypothesis this will be approximately F-distributed with p and ν degrees of freedom, where 1 divided by ν is given by the formula below:
1/ν = Σ_{i=1}^{2} [1/(n_{i} - 1)] { (x̄_{1} - x̄_{2})' S_{T}^{-1} (S_{i}/n_{i}) S_{T}^{-1} (x̄_{1} - x̄_{2}) / T^{2} }^{2}
This involves summing, over the two samples of bank notes, a function of the number of observations in each sample, the difference in the sample mean vectors, the sample variance-covariance matrix for each of the individual samples, and a new matrix S_{T}, which is given by the expression below:
S_{T} = S_{1}/n_{1} + S_{2}/n_{2}
We will reject H_{o} : μ_{1} = μ_{2} at level α if the F-value exceeds the critical value from the F-table with p and ν degrees of freedom evaluated at level α.
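The steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the SAS implementation; the function names `solve` and `modified_hotelling` and the toy inputs are hypothetical and chosen for illustration only. It assumes the statistic has the form T^{2} = d' S_{T}^{-1} d with d = x̄_{1} - x̄_{2} and S_{T} = S_{1}/n_{1} + S_{2}/n_{2}, as described in the text.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def modified_hotelling(xbar1, xbar2, S1, S2, n1, n2):
    """Return (T2, F, nu) for the test that does not pool S1 and S2."""
    p = len(xbar1)
    d = [a - b for a, b in zip(xbar1, xbar2)]
    # S_T = S1/n1 + S2/n2
    ST = [[S1[i][j] / n1 + S2[i][j] / n2 for j in range(p)] for i in range(p)]
    w = solve(ST, d)  # w = S_T^{-1} d
    T2 = sum(di * wi for di, wi in zip(d, w))
    F = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * T2
    inv_nu = 0.0
    for S, n in ((S1, n1), (S2, n2)):
        # d' S_T^{-1} (S_i/n_i) S_T^{-1} d equals w' (S_i/n_i) w
        q = sum(w[i] * S[i][j] / n * w[j] for i in range(p) for j in range(p))
        inv_nu += (q / T2) ** 2 / (n - 1)
    return T2, F, 1.0 / inv_nu


# Toy input (made up, not the bank note data): p = 2, equal covariances.
T2, F, nu = modified_hotelling(
    [1.0, 2.0], [0.0, 0.0],
    [[2.0, 0.0], [0.0, 4.0]], [[2.0, 0.0], [0.0, 4.0]],
    10, 10,
)
print(T2, F, nu)
```

In this toy case the two covariance matrices and sample sizes are equal, so ν works out to n_{1} + n_{2} - 2 = 18. The F-value would then be compared with a critical value from the F-table with p and ν degrees of freedom, which this sketch does not compute.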
A reference for this particular test is given in: Seber, G.A.F. 1984. Multivariate Observations. Wiley, New York.
This modified Hotelling's T2 test can be carried out on the Swiss Bank Notes data using the SAS program swiss16.sas.
The output is given in the file swiss16.lst.
As before, we are given the sample sizes for each population, the sample mean vector for each population, and the sample variance-covariance matrix for each population. In the large-sample approximation we find that T^{2} is 2412.45 with 6 degrees of freedom, one for each of the 6 variables, and a p-value that is close to 0. Note that this value of Hotelling's T^{2} is identical to the value that we obtained for the unmodified test. This will always be the case if the sample sizes are equal to one another.
 Since n_{1} = n_{2}, the modified values for T^{2} and F are identical to the original unmodified values obtained under the assumption of homogeneous variance-covariance matrices.
 Using the large-sample approximation, our conclusions are the same as before. We find that the mean dimensions of the counterfeit notes do not match the mean dimensions of the genuine Swiss bank notes (T^{2} = 2412.45; d.f. = 6; p < 0.0001).
 Under the small-sample approximation, we also find that the mean dimensions of the counterfeit notes do not match the mean dimensions of the genuine Swiss bank notes (F = 391.92; d.f. = 6, 193; p < 0.0001).
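As a quick arithmetic check on the figures reported above, the following sketch converts the reported T^{2} to the F-value, assuming the transformation F = (n_{1} + n_{2} - p - 1) T^{2} / (p(n_{1} + n_{2} - 2)). The chi-square critical value 12.59 for 6 d.f. at α = 0.05 is the one quoted later in this lesson; the variable names are illustrative only.

```python
n1, n2, p = 100, 100, 6   # sample sizes and number of variables
T2 = 2412.45              # reported modified Hotelling's T-square
chi2_crit = 12.59         # chi-square critical value, 6 d.f., alpha = 0.05

# Small-sample F transformation of T-square
F = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * T2
print(round(F, 2))  # 391.92

# Large-sample decision rule: reject if T-square exceeds the critical value
reject_large_sample = T2 > chi2_crit
print(reject_large_sample)  # True
```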
Simultaneous (1 - α) × 100% Confidence Intervals
As before, the next step is to determine how these notes differ. This may be carried out using simultaneous (1 - α) × 100% confidence intervals.
For Large Samples: simultaneous (1 - α) × 100% confidence intervals may be calculated using the expression below:
x̄_{1k} - x̄_{2k} ± √(χ^{2}_{p,α}) √(s^{2}_{1k}/n_{1} + s^{2}_{2k}/n_{2})
This involves the difference in the sample means for the kth variable, plus or minus the square root of the critical value from the chi-square table times the square root of the sum of the sample variances divided by their respective sample sizes.
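For Small Samples: it is better to use the expression below:
x̄_{1k} - x̄_{2k} ± √([p(n_{1} + n_{2} - 2)/(n_{1} + n_{2} - p - 1)] F_{p,ν,α}) √(s^{2}_{1k}/n_{1} + s^{2}_{2k}/n_{2})
Here the chi-square critical value under the square root is replaced by the critical value from the F-table with p and ν degrees of freedom, times a function of the number of variables p and the sample sizes n_{1} and n_{2}.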
Example: Swiss Bank Notes
An example of the large-sample approximation for length is given by the hand calculation in the expression below:
(214.969 - 214.823) ± √12.59 × √(0.15024/100 + 0.12401/100)
Here the sample mean for the length of the genuine notes is 214.969, from which we subtract the sample mean for the length of the counterfeit notes, 214.823. The critical value for a chi-square distribution with 6 degrees of freedom evaluated at 0.05 is 12.59. The sample variance for the first population of genuine notes is 0.15024, which we divide by a sample size of 100. The sample variance for the second population of counterfeit notes is 0.12401, which we also divide by its sample size of 100. This yields a confidence interval that runs from -0.040 through 0.332.
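The hand calculation for length can be reproduced with a short sketch. The means, the genuine-note variance, and the critical value are taken from the text. The counterfeit-note variance is taken as 0.12401, an assumption chosen because it reproduces the reported interval; using 0.15024 for both populations would give a slightly wider interval.

```python
import math

xbar_genuine, xbar_counterfeit = 214.969, 214.823  # sample mean lengths
var_genuine = 0.15024       # sample variance, genuine notes
var_counterfeit = 0.12401   # assumed sample variance, counterfeit notes
n1 = n2 = 100
chi2_crit = 12.59           # chi-square critical value, 6 d.f., alpha = 0.05

diff = xbar_genuine - xbar_counterfeit
half_width = math.sqrt(chi2_crit) * math.sqrt(var_genuine / n1 + var_counterfeit / n2)
lower, upper = diff - half_width, diff + half_width
print(f"{lower:.3f}, {upper:.3f}")  # -0.040, 0.332
```

Because the interval contains zero, the mean lengths of the two kinds of notes are not distinguishable by this variable alone at the simultaneous 95% level.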
The results of these calculations for each of the variables are summarized in the table below. They are comparable to the results we obtained earlier under the assumption of homogeneity of variance-covariance matrices.
Variable          95% Confidence Interval
Length            -0.040, 0.332
Left Width        -0.515, -0.199
Right Width       -0.638, -0.308
Bottom Margin     -2.687, -1.763
Top Margin        -1.287, -0.643
Diagonal          1.813, 2.321
