Factor Analysis

Estimation of Factor Scores

Factor scores play a role similar to that of the principal components in the previous lesson. Just as we plotted the principal components against one another in scatter plots, we can do the same with the factor scores, as illustrated in the sketch below. We might also use the factor scores as explanatory variables, or even as the dependent variable, in subsequent analyses.
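
As a quick illustration, here is a minimal sketch of such a scatter plot. It assumes the estimated scores are already available in a NumPy array `scores` with one row per observation and one column per factor (the simulated array here is only a stand-in):

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in for estimated factor scores: n observations by m factors.
rng = np.random.default_rng(0)
scores = rng.standard_normal((100, 2))

# Plot the first factor score against the second, one point per subject.
plt.scatter(scores[:, 0], scores[:, 1])
plt.xlabel("Factor 1 score")
plt.ylabel("Factor 2 score")
plt.show()
```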

The methods for estimating factor scores depend on the method used to carry out the factor analysis. What we are after is the vector of common factors, f, for each observation. The idea remains that there are m unobserved factors underlying our model. We do not observe these factors directly, but if the model fits well, they can be estimated.

Therefore, given the factor model:

\[ Y_i = \mu + L f_i + \epsilon_i \]

we may wish to estimate the vectors of factor scores

\[ f_i = \begin{pmatrix} f_{i1} \\ f_{i2} \\ \vdots \\ f_{im} \end{pmatrix} \]

for each observation.

Methods

There are a number of different methods that can be used for estimating factor scores from the data. These include:

Ordinary Least Squares

By default, this is the method that SAS uses when the factor analysis is carried out by the principal component method. Unfortunately, the SAS documentation is vague about how these scores are computed, even though it is usually quite detailed about how results are derived.

Basically, we look at the difference between the j-th variable on the i-th subject, \(y_{ij}\), and its value under the factor model, \(\mu_j + l_{j1}f_{i1} + l_{j2}f_{i2} + \cdots + l_{jm}f_{im}\). Here the \(l_{jk}\) are the factor loadings and the \(f_{ik}\) are our unobserved common factors. The following is performed subject by subject.

So here, we wish to find the vector of common factors for subject i, \(\hat{f}_i\), by minimizing the sum of the squared residuals:

\[ \sum_{j=1}^{p}\left(y_{ij} - \mu_j - l_{j1}f_{i1} - l_{j2}f_{i2} - \cdots - l_{jm}f_{im}\right)^2 = \left(Y_i - \mu - L f_i\right)'\left(Y_i - \mu - L f_i\right) \]

This is like a least squares regression, except that in this case we already have estimates of the parameters (the factor loadings) but wish to estimate the explanatory common factors. In matrix notation, the solution is expressed as:

\[ \hat{f}_i = \left(L'L\right)^{-1}L'\left(Y_i - \mu\right) \]

In practice, we substitute our estimated factor loadings into this expression, as well as the sample mean for the data:

\[ \hat{f}_i = \left(\hat{L}'\hat{L}\right)^{-1}\hat{L}'\left(Y_i - \bar{y}\right) \]

Using the principal component method with the unrotated factor loadings, this yields:

\[ \hat{f}_i = \begin{pmatrix} \dfrac{1}{\sqrt{\hat{\lambda}_1}}\,\hat{e}_1'\left(Y_i - \bar{y}\right) \\ \vdots \\ \dfrac{1}{\sqrt{\hat{\lambda}_m}}\,\hat{e}_m'\left(Y_i - \bar{y}\right) \end{pmatrix} \]

where \(\hat{e}_1\) through \(\hat{e}_m\) are our first m eigenvectors of the sample variance-covariance matrix and \(\hat{\lambda}_1, \ldots, \hat{\lambda}_m\) are the corresponding eigenvalues.
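
As a concrete illustration, here is a minimal NumPy sketch of these principal-component-method scores. The function name and the choice to work with the covariance matrix (rather than the correlation matrix) are assumptions made for this example:

```python
import numpy as np

def pc_factor_scores(Y, m):
    """OLS factor scores under the principal component method.

    Y : (n, p) data matrix, one row per observation.
    m : number of common factors retained.
    """
    ybar = Y.mean(axis=0)                 # sample mean vector
    S = np.cov(Y, rowvar=False)           # sample variance-covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]     # re-sort to descending
    lam = eigvals[order][:m]              # first m eigenvalues
    E = eigvecs[:, order][:, :m]          # first m eigenvectors as columns

    # k-th score for subject i: e_k'(Y_i - ybar) / sqrt(lambda_k)
    return ((Y - ybar) @ E) / np.sqrt(lam)

# Example usage with simulated data.
rng = np.random.default_rng(42)
Y = rng.standard_normal((100, 5))
scores = pc_factor_scores(Y, m=2)  # (100, 2) array of factor scores
```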

Weighted Least Squares (Bartlett)

This alternative is similar to the ordinary least squares method. The only real difference is that the squared residuals are divided by the specific variances, as shown below. This gives more weight in the estimation to variables with low specific variances, that is, the variables for which the factor model fits the data best. The idea is that these variables carry more information about the true values of the factor scores.

Therefore, for the factor model:

\[ Y_i = \mu + L f_i + \epsilon_i \]

we want to find the \(\hat{f}_i\) that minimizes

\[ \sum_{j=1}^{p}\frac{\left(y_{ij} - \mu_j - l_{j1}f_{i1} - l_{j2}f_{i2} - \cdots - l_{jm}f_{im}\right)^2}{\psi_j} \]

The solution is given by the following expression, where \(\Psi\) is the diagonal matrix whose diagonal elements are equal to the specific variances:

\[ \hat{f}_i = \left(L'\Psi^{-1}L\right)^{-1}L'\Psi^{-1}\left(Y_i - \mu\right) \]

and can be estimated by substituting in the estimated factor loadings, specific variances, and sample mean:

\[ \hat{f}_i = \left(\hat{L}'\hat{\Psi}^{-1}\hat{L}\right)^{-1}\hat{L}'\hat{\Psi}^{-1}\left(Y_i - \bar{y}\right) \]
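
A minimal sketch of this estimator follows; it assumes the estimated loading matrix `L_hat` and specific variances `psi_hat` have already been obtained from a fitted factor model:

```python
import numpy as np

def bartlett_scores(Y, L_hat, psi_hat):
    """Weighted least squares (Bartlett) factor scores.

    Y       : (n, p) data matrix.
    L_hat   : (p, m) estimated factor loading matrix.
    psi_hat : (p,) estimated specific variances.
    """
    centered = Y - Y.mean(axis=0)         # Y_i - ybar for every subject
    Psi_inv_L = L_hat / psi_hat[:, None]  # Psi^{-1} L, using diagonality of Psi
    A = L_hat.T @ Psi_inv_L               # L' Psi^{-1} L            (m x m)
    B = Psi_inv_L.T @ centered.T          # L' Psi^{-1} (Y - ybar)'  (m x n)
    return np.linalg.solve(A, B).T        # (n, m) factor scores
```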

Regression Method

This method is used when the factor loadings have been estimated by maximum likelihood. It involves looking at the joint vector that contains the observed data for the i-th subject together with that subject's vector of common factors.

The joint distribution of the data \(Y_i\) and the factors \(f_i\) is multivariate normal, with

\[ \begin{pmatrix} Y_i \\ f_i \end{pmatrix} \sim N\left( \begin{pmatrix} \mu \\ 0 \end{pmatrix},\; \begin{pmatrix} LL' + \Psi & L \\ L' & I \end{pmatrix} \right) \]

Using this, together with the standard formula for the conditional mean of a multivariate normal distribution, we can calculate the conditional expectation of the common factor scores \(f_i\) given the data \(Y_i\):

\[ E\left(f_i \mid Y_i\right) = L'\left(LL' + \Psi\right)^{-1}\left(Y_i - \mu\right) \]

This suggests the following estimator, obtained by substituting in the estimates of L and \(\Psi\) along with the sample mean:

\[ \hat{f}_i = \hat{L}'\left(\hat{L}\hat{L}' + \hat{\Psi}\right)^{-1}\left(Y_i - \bar{y}\right) \]

A small fix is often applied to reduce the effects of a possibly incorrect determination of the number of factors: the sample variance-covariance matrix S is substituted for \(\hat{L}\hat{L}' + \hat{\Psi}\), giving

\[ \hat{f}_i = \hat{L}'S^{-1}\left(Y_i - \bar{y}\right) \]

This tends to give somewhat more stable results.
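
A minimal sketch of the regression-method scores, assuming `L_hat` and `psi_hat` come from a maximum likelihood fit; the `use_sample_cov` flag switches on the stabilizing substitution of S described above:

```python
import numpy as np

def regression_scores(Y, L_hat, psi_hat, use_sample_cov=True):
    """Regression-method factor scores.

    Y       : (n, p) data matrix.
    L_hat   : (p, m) maximum likelihood factor loading matrix.
    psi_hat : (p,) estimated specific variances.
    """
    centered = Y - Y.mean(axis=0)
    if use_sample_cov:
        M = np.cov(Y, rowvar=False)             # the stabilizing fix: use S
    else:
        M = L_hat @ L_hat.T + np.diag(psi_hat)  # model-implied covariance
    # f_hat_i = L_hat' M^{-1} (Y_i - ybar), computed for all subjects at once;
    # M is symmetric, so (M^{-1} centered')' L_hat gives the same result.
    return np.linalg.solve(M, centered.T).T @ L_hat
```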
