From this table we can see that most items have some correlation with each other, ranging from \(r=-0.382\) for Items 3 and 7 to \(r=0.514\) for Items 6 and 7. Due to the relatively high correlations among items, this would be a good candidate for factor analysis. Recall that the goal of factor analysis is to model the interrelationships between items with fewer (latent) variables. These interrelationships can be broken up into multiple components.
Since the goal of factor analysis is to model the interrelationships among items, we focus primarily on the variance and covariance rather than the mean. Factor analysis assumes that variance can be partitioned into two types of variance, common and unique.
The figure below shows how these concepts are related:
As a data analyst, your goal in a factor analysis is to reduce the number of variables needed to explain and interpret the results. This can be accomplished in two steps:
Factor extraction involves making a choice about the type of model as well as the number of factors to extract. Factor rotation comes after the factors are extracted, with the goal of achieving simple structure in order to improve interpretability.
There are two approaches to factor extraction, which stem from different approaches to variance partitioning: a) principal components analysis and b) common factor analysis.
Unlike factor analysis, principal components analysis or PCA makes the assumption that there is no unique variance: the total variance is equal to the common variance. Recall that variance can be partitioned into common and unique variance. If there is no unique variance then common variance takes up the total variance (see figure below). Additionally, if the total variance is 1, then the common variance is equal to the communality.
The goal of a PCA is to replicate the correlation matrix using a set of components that are fewer in number and linear combinations of the original set of items. Although the following analysis defeats the purpose of doing a PCA we will begin by extracting as many components as possible as a teaching exercise and so that we can decide on the optimal number of components to extract later.
First, go to Analyze – Dimension Reduction – Factor. Move all of the observed variables over to the Variables: box to be analyzed.
Under Extraction – Method, pick Principal components and make sure to Analyze the Correlation matrix. We also request the Unrotated factor solution and the Scree plot. Under Extract, choose Fixed number of factors, and under Factors to extract enter 8. We also bumped up the Maximum Iterations for Convergence to 100.
The equivalent SPSS syntax is shown below:
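A sketch of what the pasted syntax typically looks like (assuming the SAQ-8 items are named q01 through q08, as they are later in this seminar):

FACTOR
  /VARIABLES q01 q02 q03 q04 q05 q06 q07 q08
  /MISSING LISTWISE
  /ANALYSIS q01 q02 q03 q04 q05 q06 q07 q08
  /PRINT INITIAL EXTRACTION
  /PLOT EIGEN
  /CRITERIA FACTORS(8) ITERATE(100)
  /EXTRACTION PC
  /ROTATION NOROTATE
  /METHOD=CORRELATION.

The /CRITERIA subcommand carries both the fixed number of components and the increased iteration limit, and /PLOT EIGEN requests the scree plot.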
Before we get into the SPSS output, let’s understand a few things about eigenvalues and eigenvectors.
Eigenvalues represent the total amount of variance that can be explained by a given principal component. They can be positive or negative in theory, but in practice they explain variance, which is always positive.
Eigenvalues are also the sum of squared component loadings across all items for each component, which represent the amount of variance in each item that can be explained by the principal component.
Eigenvectors represent a weight for each eigenvalue. The eigenvector times the square root of the eigenvalue gives the component loadings, which can be interpreted as the correlation of each item with the principal component. For this particular PCA of the SAQ-8, the eigenvector weight associated with Item 1 on the first component is \(0.377\), and the eigenvalue of the first component is \(3.057\). We can calculate the loading of Item 1 on the first component as
$$(0.377)\sqrt{3.057}= 0.659.$$
In this case, we can say that the correlation of the first item with the first component is \(0.659\). Let’s now move on to the component matrix.
The component loadings can be interpreted as the correlation of each item with the component. Each item has a loading corresponding to each of the 8 components. For example, Item 1 is correlated \(0.659\) with the first component, \(0.136\) with the second component and \(-0.398\) with the third, and so on.
The square of each loading represents the proportion of variance (think of it as an \(R^2\) statistic) explained by a particular component. For Item 1, \((0.659)^2=0.434\), or \(43.4\%\) of its variance, is explained by the first component. Subsequently, \((0.136)^2 = 0.018\), or \(1.8\%\), of the variance in Item 1 is explained by the second component. The total variance explained by both components is thus \(43.4\%+1.8\%=45.2\%\). If you keep adding the squared loadings cumulatively across all 8 components, you will find that the sum is 1, or 100%. This is also known as the communality, and in a PCA the communality for each item is equal to the item's total variance, which is 1.
Component Matrix | ||||||||
Item | Component | |||||||
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | |
1 | 0.659 | 0.136 | -0.398 | 0.160 | -0.064 | 0.568 | -0.177 | 0.068 |
2 | -0.300 | 0.866 | -0.025 | 0.092 | -0.290 | -0.170 | -0.193 | -0.001 |
3 | -0.653 | 0.409 | 0.081 | 0.064 | 0.410 | 0.254 | 0.378 | 0.142 |
4 | 0.720 | 0.119 | -0.192 | 0.064 | -0.288 | -0.089 | 0.563 | -0.137 |
5 | 0.650 | 0.096 | -0.215 | 0.460 | 0.443 | -0.326 | -0.092 | -0.010 |
6 | 0.572 | 0.185 | 0.675 | 0.031 | 0.107 | 0.176 | -0.058 | -0.369 |
7 | 0.718 | 0.044 | 0.453 | -0.006 | -0.090 | -0.051 | 0.025 | 0.516 |
8 | 0.568 | 0.267 | -0.221 | -0.694 | 0.258 | -0.084 | -0.043 | -0.012 |
Extraction Method: Principal Component Analysis. | ||||||||
a. 8 components extracted. |
Summing the squared component loadings across the components (columns) gives you the communality estimates for each item, and summing each squared loading down the items (rows) gives you the eigenvalue for each component. For example, to obtain the first eigenvalue we calculate:
$$(0.659)^2 + (-0.300)^2 + (-0.653)^2 + (0.720)^2 + (0.650)^2 + (0.572)^2 + (0.718)^2 + (0.568)^2 = 3.057$$
You will get eight eigenvalues for eight components, which leads us to the next table.
Total Variance Explained in the 8-component PCA
Recall that the eigenvalue represents the total amount of variance that can be explained by a given principal component. Starting from the first component, each subsequent component is obtained by partialling out the previous components. Therefore the first component explains the most variance, and the last component explains the least. Looking at the Total Variance Explained table, you will see the total variance explained by each component. For example, the eigenvalue for Component 1 is \(3.057\), which is \(3.057/8 = 38.21\%\) of the total variance. Because we extracted the same number of components as the number of items, the Initial Eigenvalues column is the same as the Extraction Sums of Squared Loadings column.
Total Variance Explained | ||||||
Component | Initial Eigenvalues | Extraction Sums of Squared Loadings | ||||
Total | % of Variance | Cumulative % | Total | % of Variance | Cumulative % | |
1 | 3.057 | 38.206 | 38.206 | 3.057 | 38.206 | 38.206 |
2 | 1.067 | 13.336 | 51.543 | 1.067 | 13.336 | 51.543 |
3 | 0.958 | 11.980 | 63.523 | 0.958 | 11.980 | 63.523 |
4 | 0.736 | 9.205 | 72.728 | 0.736 | 9.205 | 72.728 |
5 | 0.622 | 7.770 | 80.498 | 0.622 | 7.770 | 80.498 |
6 | 0.571 | 7.135 | 87.632 | 0.571 | 7.135 | 87.632 |
7 | 0.543 | 6.788 | 94.420 | 0.543 | 6.788 | 94.420 |
8 | 0.446 | 5.580 | 100.000 | 0.446 | 5.580 | 100.000 |
Extraction Method: Principal Component Analysis. |
Since the goal of running a PCA is to reduce our set of variables down, it would be useful to have a criterion for selecting the optimal number of components, which is of course smaller than the total number of items. One criterion is to choose components that have eigenvalues greater than 1. In the Total Variance Explained table, we see that the first two components have an eigenvalue greater than 1. This can be confirmed by the Scree Plot, which plots the eigenvalue (total variance explained) against the component number. Recall that we checked the Scree Plot option under Extraction – Display, so the scree plot should be produced automatically.
The first component will always have the highest total variance and the last component will always have the least, but where do we see the largest drop? If you look at Component 2, you will see an “elbow” joint. This is the point beyond which it is perhaps not too beneficial to continue extracting components. There are some conflicting definitions of how to interpret the scree plot, but some say to take the number of components to the left of the “elbow”. Following this criterion we would pick only one component. A more subjective interpretation of the scree plot suggests that any number of components between 1 and 4 would be plausible, and further corroborative evidence would be helpful.
Some criteria say that the total variance explained by all components should be between 70% and 80%, which in this case would mean about four to five components. The authors of the book say that this may be untenable for social science research, where extracted factors usually explain only 50% to 60% of the variance. Picking the number of components is a bit of an art and requires input from the whole research team. Let’s suppose we talked to the principal investigator and she believes that the two-component solution makes sense for the study, so we will proceed with the analysis.
Running the two component PCA is just as easy as running the 8 component solution. The only difference is under Fixed number of factors – Factors to extract you enter 2.
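In the pasted syntax, the only line that changes from the eight-component sketch above is the /CRITERIA subcommand (a sketch, assuming the same item names):

  /CRITERIA FACTORS(2) ITERATE(100)

Everything else, including /EXTRACTION PC, stays the same.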
We will focus on the differences in the output between the eight- and two-component solutions. Under Total Variance Explained, the Extraction Sums of Squared Loadings column no longer mirrors the Initial Eigenvalues column in full: it contains only two rows, and the cumulative percent of variance explained goes up to only \(51.54\%\).
Total Variance Explained | ||||||
Component | Initial Eigenvalues | Extraction Sums of Squared Loadings | ||||
Total | % of Variance | Cumulative % | Total | % of Variance | Cumulative % | |
1 | 3.057 | 38.206 | 38.206 | 3.057 | 38.206 | 38.206 |
2 | 1.067 | 13.336 | 51.543 | 1.067 | 13.336 | 51.543 |
3 | 0.958 | 11.980 | 63.523 | |||
4 | 0.736 | 9.205 | 72.728 | |||
5 | 0.622 | 7.770 | 80.498 | |||
6 | 0.571 | 7.135 | 87.632 | |||
7 | 0.543 | 6.788 | 94.420 | |||
8 | 0.446 | 5.580 | 100.000 | |||
Extraction Method: Principal Component Analysis. |
Similarly, you will see that the Component Matrix has the same loadings as the eight-component solution but instead of eight columns it’s now two columns.
Component Matrix | ||
Item | Component | |
1 | 2 | |
1 | 0.659 | 0.136 |
2 | -0.300 | 0.866 |
3 | -0.653 | 0.409 |
4 | 0.720 | 0.119 |
5 | 0.650 | 0.096 |
6 | 0.572 | 0.185 |
7 | 0.718 | 0.044 |
8 | 0.568 | 0.267 |
Extraction Method: Principal Component Analysis. | ||
a. 2 components extracted. |
Again, we interpret Item 1 as having a correlation of 0.659 with Component 1. From glancing at the solution, we see that Item 4 has the highest correlation with Component 1 and Item 2 the lowest. Similarly, we see that Item 2 has the highest correlation with Component 2 and Item 7 the lowest.
True or False
1.T, 2.F (sum of squared loadings), 3. T
The communality is the sum of the squared component loadings up to the number of components you extract. In the SPSS output you will see a table of communalities.
Communalities | ||
Initial | Extraction | |
1 | 1.000 | 0.453 |
2 | 1.000 | 0.840 |
3 | 1.000 | 0.594 |
4 | 1.000 | 0.532 |
5 | 1.000 | 0.431 |
6 | 1.000 | 0.361 |
7 | 1.000 | 0.517 |
8 | 1.000 | 0.394 |
Extraction Method: Principal Component Analysis. |
PCA starts with 1 as the initial estimate of the communality for each item (since this is the total variance across all 8 components), and the Extraction column then gives the communality based only on the components actually retained. Notice that the Extraction column is smaller than the Initial column because we only extracted two components. As an exercise, let’s manually calculate the first communality from the Component Matrix. The first ordered pair is \((0.659,0.136)\), which represents the correlations of the first item with Component 1 and Component 2. Recall that squaring the loadings and summing across the components (columns) gives us the communality:
$$h^2_1 = (0.659)^2 + (0.136)^2 = 0.453$$
Going back to the Communalities table, if you sum down all 8 items (rows) of the Extraction column, you get \(4.123\). If you go back to the Total Variance Explained table and sum the first two eigenvalues, you also get \(3.057+1.067=4.124\). Is that surprising? Basically, summing the communalities across all items is the same as summing the eigenvalues across the extracted components.
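Spelled out with the rounded values from the Communalities table (small rounding differences are expected):

$$ 0.453 + 0.840 + 0.594 + 0.532 + 0.431 + 0.361 + 0.517 + 0.394 = 4.122 \approx 4.123. $$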
1. In a PCA, when would the communality for the Initial column be equal to the Extraction column?
Answer : When you run an 8-component PCA.
1. F, the eigenvalue is the total communality across all items for a single component, 2. T, 3. T, 4. F (you can only sum communalities across items, and sum eigenvalues across components, but if you do that they are equal).
The partitioning of variance is what differentiates a principal components analysis from what we call common factor analysis. Both methods try to reduce the dimensionality of the dataset down to fewer unobserved variables, but whereas PCA assumes that common variance takes up all of the total variance, common factor analysis assumes that total variance can be partitioned into common and unique variance. It is usually more reasonable to assume that you have not measured your set of items perfectly. The unobserved or latent variable that makes up common variance is called a factor, hence the name factor analysis. The other main difference between PCA and factor analysis lies in the goal of your analysis. If your goal is simply to reduce your variable list down to a linear combination of smaller components, then PCA is the way to go. However, if you believe there is some latent construct that defines the interrelationship among items, then factor analysis may be more appropriate. In this case, we assume that there is a construct called SPSS Anxiety that explains why you see correlations among all the items on the SAQ-8. We acknowledge, however, that SPSS Anxiety cannot explain all the shared variance among items in the SAQ, so we model the unique variance as well. Based on the results of the PCA, we will start with a two-factor extraction.
To run a factor analysis, use the same steps as running a PCA (Analyze – Dimension Reduction – Factor) except under Method choose Principal axis factoring. Note that we continue to set Maximum Iterations for Convergence at 100 and we will see why later.
Pasting the syntax into the SPSS Syntax Editor we get:
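A sketch of the pasted syntax for the two-factor principal axis factoring run (same assumed item names as before):

FACTOR
  /VARIABLES q01 q02 q03 q04 q05 q06 q07 q08
  /MISSING LISTWISE
  /ANALYSIS q01 q02 q03 q04 q05 q06 q07 q08
  /PRINT INITIAL EXTRACTION
  /PLOT EIGEN
  /CRITERIA FACTORS(2) ITERATE(100)
  /EXTRACTION PAF
  /ROTATION NOROTATE
  /METHOD=CORRELATION.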
Note the main difference is under /EXTRACTION we list PAF for Principal Axis Factoring instead of PC for Principal Components. We will get three tables of output, Communalities, Total Variance Explained and Factor Matrix. Let’s go over each of these and compare them to the PCA output.
Communalities | ||
Item | Initial | Extraction |
1 | 0.293 | 0.437 |
2 | 0.106 | 0.052 |
3 | 0.298 | 0.319 |
4 | 0.344 | 0.460 |
5 | 0.263 | 0.344 |
6 | 0.277 | 0.309 |
7 | 0.393 | 0.851 |
8 | 0.192 | 0.236 |
Extraction Method: Principal Axis Factoring. |
The most striking difference between this Communalities table and the one from the PCA is that the initial communality estimates are no longer 1. Recall that for a PCA, we assume the total variance is completely taken up by the common variance or communality, and therefore we pick 1 as our best initial guess. Principal axis factoring, instead of guessing 1 as the initial communality, uses the squared multiple correlation coefficient \(R^2\). To see this in action for Item 1, run a linear regression where Item 1 is the dependent variable and Items 2–8 are the independent variables. Go to Analyze – Regression – Linear and enter q01 under Dependent and q02 to q08 under Independent(s).
Pasting the syntax into the Syntax Editor gives us:
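A sketch of the pasted regression syntax (item names assumed as above):

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /DEPENDENT q01
  /METHOD=ENTER q02 q03 q04 q05 q06 q07 q08.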
The output we obtain from this analysis is
Model Summary | ||||
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
1 | .541 | 0.293 | 0.291 | 0.697 |
Note that the R Square of 0.293 matches the initial communality estimate for Item 1. We could run seven more linear regressions to get all eight communality estimates, but SPSS already does that for us. Like PCA, factor analysis also uses an iterative estimation process to obtain the final estimates under the Extraction column. Finally, summing all the rows of the Extraction column gives 3.00. This represents the total common variance shared among all items for a two-factor solution.
The next table we will look at is Total Variance Explained. Comparing this to the table from the PCA, we notice that the Initial Eigenvalues are exactly the same and include 8 rows, one for each “factor”. In fact, SPSS simply borrows this information from the PCA analysis, so the “factors” in the Initial Eigenvalues column are actually components. The main difference now is in the Extraction Sums of Squared Loadings. We notice that each value in the Extraction column is lower than the corresponding value in the Initial column. This is expected because we assume that total variance can be partitioned into common and unique variance, which means the common variance explained will be lower. Factor 1 explains 31.38% of the variance whereas Factor 2 explains 6.24% of the variance. Just as in PCA, the more factors you extract, the less variance is explained by each successive factor.
Total Variance Explained | ||||||
Factor | Initial Eigenvalues | Extraction Sums of Squared Loadings | ||||
Total | % of Variance | Cumulative % | Total | % of Variance | Cumulative % | |
1 | 3.057 | 38.206 | 38.206 | 2.511 | 31.382 | 31.382 |
2 | 1.067 | 13.336 | 51.543 | 0.499 | 6.238 | 37.621 |
3 | 0.958 | 11.980 | 63.523 | |||
4 | 0.736 | 9.205 | 72.728 | |||
5 | 0.622 | 7.770 | 80.498 | |||
6 | 0.571 | 7.135 | 87.632 | |||
7 | 0.543 | 6.788 | 94.420 | |||
8 | 0.446 | 5.580 | 100.000 | |||
Extraction Method: Principal Axis Factoring. |
A subtle note that may be easily overlooked is that when SPSS plots the scree plot or applies the eigenvalues-greater-than-1 criterion (Analyze – Dimension Reduction – Factor – Extraction), it bases them on the Initial solution and not the Extraction solution. This is important because the criterion assumes no unique variance, as in PCA, so it reflects the total variance explained, not accounting for specific or measurement error variance. Note that in the Extraction Sums of Squared Loadings column the second factor has a value less than 1, but it is still retained because the Initial value is 1.067. If you want to apply this kind of criterion to the common variance explained, you would need to do so yourself.
Answers: 1. When there is no unique variance (PCA assumes this whereas common factor analysis does not, so this is in theory and not in practice), 2. F, it uses the initial PCA solution and the eigenvalues assume no unique variance.
Factor Matrix | ||
Item | Factor | |
1 | 2 | |
1 | 0.588 | -0.303 |
2 | -0.227 | 0.020 |
3 | -0.557 | 0.094 |
4 | 0.652 | -0.189 |
5 | 0.560 | -0.174 |
6 | 0.498 | 0.247 |
7 | 0.771 | 0.506 |
8 | 0.470 | -0.124 |
Extraction Method: Principal Axis Factoring. | ||
a. 2 factors extracted. 79 iterations required. |
First note the annotation that 79 iterations were required. If we had simply used the default 25 iterations in SPSS, we would not have obtained an optimal solution. This is why in practice it’s always good to increase the maximum number of iterations. Now let’s get into the table itself. The elements of the Factor Matrix table are called loadings and represent the correlation of each item with the corresponding factor. Just as in PCA, squaring each loading and summing down the items (rows) gives the total variance explained by each factor. Note that they are no longer called eigenvalues as in PCA. Let’s calculate this for Factor 1:
$$(0.588)^2 + (-0.227)^2 + (-0.557)^2 + (0.652)^2 + (0.560)^2 + (0.498)^2 + (0.771)^2 + (0.470)^2 = 2.51$$
This number matches the first row under the Extraction column of the Total Variance Explained table. We can repeat this for Factor 2 and get matching results for the second row. Additionally, we can get the communality estimates by summing the squared loadings across the factors (columns) for each item. For example, for Item 1:
$$(0.588)^2 + (-0.303)^2 = 0.437$$
Note that these results match the value of the Communalities table for Item 1 under the Extraction column. This means that the sum of squared loadings across factors represents the communality estimates for each item.
To see the relationships among the three tables let’s first start from the Factor Matrix (or Component Matrix in PCA). We will use the term factor to represent components in PCA as well. These elements represent the correlation of the item with each factor. Now, square each element to obtain squared loadings or the proportion of variance explained by each factor for each item. Summing the squared loadings across factors you get the proportion of variance explained by all factors in the model. This is known as common variance or communality, hence the result is the Communalities table. Going back to the Factor Matrix, if you square the loadings and sum down the items you get Sums of Squared Loadings (in PAF) or eigenvalues (in PCA) for each factor. These now become elements of the Total Variance Explained table. Summing down the rows (i.e., summing down the factors) under the Extraction column we get \(2.511 + 0.499 = 3.01\) or the total (common) variance explained. In words, this is the total (common) variance explained by the two factor solution for all eight items. Equivalently, since the Communalities table represents the total common variance explained by both factors for each item, summing down the items in the Communalities table also gives you the total (common) variance explained, in this case
$$ 0.437 + 0.052 + 0.319 + 0.460 + 0.344 + 0.309 + 0.851 + 0.236 = 3.01 $$
which is the same result we obtained from the Total Variance Explained table. Here is a table that may help clarify what we’ve talked about:
In summary:
True or False (the following assumes a two-factor Principal Axis Factor solution with 8 items)
Answers: 1. T, 2. F, the sum of the squared elements across both factors, 3. T, 4. T, 5. F, sum all eigenvalues from the Extraction column of the Total Variance Explained table, 6. F, the total Sums of Squared Loadings represents only the total common variance excluding unique variance, 7. F, eigenvalues are only applicable for PCA.
Since this is a non-technical introduction to factor analysis, we won’t go into detail about the differences between Principal Axis Factoring (PAF) and Maximum Likelihood (ML). The main concept to know is that ML also assumes a common factor analysis using the \(R^2\) to obtain initial estimates of the communalities, but uses a different iterative process to obtain the extraction solution. To run a factor analysis using maximum likelihood estimation under Analyze – Dimension Reduction – Factor – Extraction – Method choose Maximum Likelihood.
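In the pasted syntax, the only change from the principal axis factoring sketch shown earlier is the extraction keyword (a sketch):

  /EXTRACTION ML

with the rest of the FACTOR command left unchanged.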
Although the initial communalities are the same between PAF and ML, the final extraction loadings will be different, which means you will have different Communalities, Total Variance Explained, and Factor Matrix tables (although the Initial columns will overlap). The other main difference is that you will obtain a Goodness-of-fit Test table, which gives you an absolute test of model fit. Non-significant values suggest a good-fitting model. Here the p-value is less than 0.05, so we reject the two-factor model.
Goodness-of-fit Test | ||
Chi-Square | df | Sig. |
198.617 | 13 | 0.000 |
In practice, you would obtain chi-square values for multiple factor analysis runs, which we tabulate below for 1 to 8 factors. The table shows the number of factors extracted (or attempted to extract) as well as the chi-square, degrees of freedom, p-value, and iterations needed to converge. Note that as you increase the number of factors, the chi-square value and degrees of freedom decrease, but the iterations needed and the p-value increase. Practically, you want to make sure the number of iterations you specify exceeds the iterations needed. Additionally, NS means no solution and N/A means not applicable. In SPSS, no solution is obtained when you run 5 to 7 factors because the degrees of freedom become negative (which cannot happen). The eight-factor solution is not even applicable in SPSS because it produces the warning “You cannot request as many factors as variables with any extraction method except PC. The number of factors will be reduced by one.” This means that if you try to extract an eight-factor solution for the SAQ-8, it will default back to the 7-factor solution. Now that we understand the table, let’s see if we can find the threshold at which the absolute fit indicates a good-fitting model. It looks like the p-value becomes non-significant at a 3-factor solution. Note that this differs from the eigenvalues-greater-than-1 criterion, which chose 2 factors, and from the percent-of-variance-explained criterion, by which you would choose 4 to 5 factors. We talk to the Principal Investigator and, at this point, we still prefer the two-factor solution. Note that there is no “right” answer in picking the best factor model, only what makes sense for your theory. We will talk about interpreting the factor loadings when we talk about factor rotation, which will further guide us in choosing the correct number of factors.
Number of Factors | Chi-square | Df | p-value | Iterations needed |
1 | 553.08 | 20 | <0.05 | 4 |
2 | 198.62 | 13 | < 0.05 | 39 |
3 | 13.81 | 7 | 0.055 | 57 |
4 | 1.386 | 2 | 0.5 | 168 |
5 | NS | -2 | NS | NS |
6 | NS | -5 | NS | NS |
7 | NS | -7 | NS | NS |
8 | N/A | N/A | N/A | N/A |
Answers: 1. T, 2. F, the two use the same starting communalities but a different estimation process to obtain extraction loadings, 3. F, only Maximum Likelihood gives you chi-square values, 4. F, you can extract as many components as items in PCA, but SPSS will only extract up to the total number of items minus 1, 5. F, greater than 0.05, 6. T, we are taking away degrees of freedom but extracting more factors.
As we mentioned before, the main difference between common factor analysis and principal components is that factor analysis assumes total variance can be partitioned into common and unique variance, whereas principal components assumes common variance takes up all of total variance (i.e., no unique variance). For both methods, when you assume total variance is 1, the common variance becomes the communality. The communality is unique to each item, so if you have 8 items, you will obtain 8 communalities; and it represents the common variance explained by the factors or components. However in the case of principal components, the communality is the total variance of each item, and summing all 8 communalities gives you the total variance across all items. In contrast, common factor analysis assumes that the communality is a portion of the total variance, so that summing up the communalities represents the total common variance and not the total variance. In summary, for PCA, total common variance is equal to total variance explained , which in turn is equal to the total variance, but in common factor analysis, total common variance is equal to total variance explained but does not equal total variance.
The following applies to the SAQ-8 when theoretically extracting 8 components or factors for 8 items:
Answers: 1. T, 2. F, the total variance for each item, 3. T, 4. F, communality is unique to each item (shared across components or factors), 5. T, 6. T.
After deciding on the number of factors to extract and which extraction method to use, the next step is to interpret the factor loadings. Factor rotations help us interpret factor loadings. There are two general types of rotations, orthogonal and oblique.
The goal of factor rotation is to improve the interpretability of the factor solution by reaching simple structure.
Without rotation, the first factor is the most general factor onto which most items load and explains the largest amount of variance. This may not be desired in all cases. Suppose you wanted to know how well a set of items load on each factor; simple structure helps us to achieve this.
The definition of simple structure is that, in a factor loading matrix: (1) each item (row) has at least one zero loading; (2) each factor (column) has at least as many zero loadings as there are factors (at least three here); (3) for every pair of factors (columns), several items have a zero loading on one factor but not the other; (4) for every pair of factors, a large proportion of items have zero loadings on both; and (5) for every pair of factors, only a small number of items have non-zero loadings on both.
The following table is an example of simple structure with three factors:
Item | Factor 1 | Factor 2 | Factor 3 |
1 | 0.8 | 0 | 0 |
2 | 0.8 | 0 | 0 |
3 | 0.8 | 0 | 0 |
4 | 0 | 0.8 | 0 |
5 | 0 | 0.8 | 0 |
6 | 0 | 0.8 | 0 |
7 | 0 | 0 | 0.8 |
8 | 0 | 0 | 0.8 |
Going down this checklist of criteria, you can verify why the example satisfies simple structure: each item has at least one zero loading, each factor has well over three zero loadings, for every pair of factors the items that load on one factor have zero loadings on the other, and no item has non-zero loadings on more than one factor.
An easier criterion from Pedhazur and Schmelkin (1991) states that each item should have a high loading on one and only one factor, and each factor should have high loadings for only a minority of the items.
For the following factor matrix, explain why it does not conform to simple structure using both the conventional and Pedhazur test.
Item | Factor 1 | Factor 2 | Factor 3 |
1 | 0.8 | 0 | 0.8 |
2 | 0.8 | 0 | 0.8 |
3 | 0.8 | 0 | 0 |
4 | 0.8 | 0 | 0 |
5 | 0 | 0.8 | 0.8 |
6 | 0 | 0.8 | 0.8 |
7 | 0 | 0.8 | 0.8 |
8 | 0 | 0.8 | 0 |
Solution: Using the conventional test, although Criteria 1 and 2 are satisfied (each row has at least one zero and each column has at least three zeros), Criterion 3 fails because, for Factors 2 and 3, only 3/8 rows have a zero on one factor and a non-zero loading on the other. Additionally, for Factors 2 and 3, only Items 5 through 7 have non-zero loadings on both, i.e., 3/8 rows have non-zero coefficients (failing Criteria 4 and 5 simultaneously). Using the Pedhazur criterion, Items 1, 2, 5, 6, and 7 have high loadings on two factors (failing the first criterion) and Factor 3 has high loadings on a majority, 5/8, of the items (failing the second criterion).
We know that the goal of factor rotation is to rotate the factor matrix so that it can approach simple structure in order to improve interpretability. Orthogonal rotation assumes that the factors are not correlated. The benefit of doing an orthogonal rotation is that loadings are simple correlations of items with factors, and standardized solutions can estimate unique contribution of each factor. The most common type of orthogonal rotation is Varimax rotation. We will walk through how to do this in SPSS.
The steps for running a two-factor Principal Axis Factoring are the same as before (Analyze – Dimension Reduction – Factor – Extraction), except that under Rotation – Method we check Varimax. Make sure under Display to check Rotated Solution and Loading plot(s), and under Maximum Iterations for Convergence enter 100.
Pasting the syntax into the SPSS editor you obtain:
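A sketch of the pasted syntax (again assuming items q01 through q08):

FACTOR
  /VARIABLES q01 q02 q03 q04 q05 q06 q07 q08
  /MISSING LISTWISE
  /ANALYSIS q01 q02 q03 q04 q05 q06 q07 q08
  /PRINT INITIAL EXTRACTION ROTATION
  /PLOT EIGEN ROTATION
  /CRITERIA FACTORS(2) ITERATE(100)
  /EXTRACTION PAF
  /ROTATION VARIMAX
  /METHOD=CORRELATION.

/PRINT ROTATION requests the rotated solution and /PLOT ROTATION requests the loading plot.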
Let’s first talk about what tables are the same or different from running a PAF with no rotation. First, we know that the unrotated factor matrix (Factor Matrix table) should be the same. Additionally, since the common variance explained by both factors should be the same, the Communalities table should be the same. The main difference is that we ran a rotation, so we should get the rotated solution (Rotated Factor Matrix) as well as the transformation used to obtain the rotation (Factor Transformation Matrix). Finally, although the total variance explained by all factors stays the same, the total variance explained by each factor will be different.
Rotated Factor Matrix | ||
Factor | ||
1 | 2 | |
1 | 0.646 | 0.139 |
2 | -0.188 | -0.129 |
3 | -0.490 | -0.281 |
4 | 0.624 | 0.268 |
5 | 0.544 | 0.221 |
6 | 0.229 | 0.507 |
7 | 0.275 | 0.881 |
8 | 0.442 | 0.202 |
Extraction Method: Principal Axis Factoring. Rotation Method: Varimax with Kaiser Normalization. | ||
a. Rotation converged in 3 iterations. |
The Rotated Factor Matrix table gives the factor loadings after rotation (in this case Varimax). Kaiser normalization is a method intended to stabilize solutions across samples: each item is given equal weight when performing the rotation, and after rotation the loadings are rescaled back to their proper size. The drawback is that if the communality is low for a particular item, Kaiser normalization will weight that item equally with the high-communality items. As such, Kaiser normalization is preferred when communalities are high across all items. You can turn off Kaiser normalization by specifying the NOKAISER keyword on the /CRITERIA subcommand, as sketched below.
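A minimal sketch of the change in the pasted syntax (NOKAISER is the standard /CRITERIA keyword for suppressing Kaiser normalization; the rest of the FACTOR command shown above is unchanged):

  /CRITERIA FACTORS(2) ITERATE(100) NOKAISER
  /ROTATION VARIMAX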
Here is what the Varimax-rotated loadings look like without Kaiser normalization. Compared to the rotated factor matrix with Kaiser normalization, the patterns look similar if you flip Factors 1 and 2; this may be an artifact of the rescaling. Another possible reason for the remaining differences may be the low communalities for Item 2 (0.052) and Item 8 (0.236), since Kaiser normalization weights these items equally with the other, higher-communality items.
Rotated Factor Matrix | ||
Factor | ||
1 | 2 | |
1 | 0.207 | 0.628 |
2 | -0.148 | -0.173 |
3 | -0.331 | -0.458 |
4 | 0.332 | 0.592 |
5 | 0.277 | 0.517 |
6 | 0.528 | 0.174 |
7 | 0.905 | 0.180 |
8 | 0.248 | 0.418 |
Extraction Method: Principal Axis Factoring. Rotation Method: Varimax without Kaiser Normalization. | ||
a. Rotation converged in 3 iterations. |
In the table above, consider the absolute loadings that are higher than 0.4 on each factor. We can see that Items 6 and 7 load highly onto Factor 1 and Items 1, 3, 4, 5, and 8 load highly onto Factor 2. Item 2 does not seem to load highly on any factor. Looking more closely at Item 6 “My friends are better at statistics than me” and Item 7 “Computers are useful only for playing games”, we don’t see a clear construct that defines the two. Item 2, “I don’t understand statistics”, may be too general an item and isn’t captured by SPSS Anxiety. It’s debatable at this point whether to retain a two-factor or one-factor solution; at the very minimum we should see whether Item 2 is a candidate for deletion.
The Factor Transformation Matrix tells us how the Factor Matrix was rotated. In SPSS, you will see a matrix with two rows and two columns because we have two factors.
Factor Transformation Matrix | ||
Factor | 1 | 2 |
1 | 0.773 | 0.635 |
2 | -0.635 | 0.773 |
Extraction Method: Principal Axis Factoring. Rotation Method: Varimax with Kaiser Normalization. |
How do we interpret this matrix? Well, we can see it as the way to move from the Factor Matrix to the Rotated Factor Matrix. From the Factor Matrix we know that the loading of Item 1 on Factor 1 is \(0.588\) and the loading of Item 1 on Factor 2 is \(-0.303\), which gives us the pair \((0.588,-0.303)\); but in the Rotated Factor Matrix the new pair is \((0.646,0.139)\). How do we obtain this new transformed pair of values? We can do what’s called matrix multiplication. The steps are essentially to start with one column of the Factor Transformation matrix, view it as another ordered pair and multiply matching ordered pairs. To get the first element, we can multiply the ordered pair in the Factor Matrix \((0.588,-0.303)\) with the matching ordered pair \((0.773,-0.635)\) in the first column of the Factor Transformation Matrix.
$$(0.588)(0.773)+(-0.303)(-0.635)=0.455+0.192=0.647.$$
To get the second element, we can multiply the ordered pair in the Factor Matrix \((0.588,-0.303)\) with the matching ordered pair \((0.635,0.773)\) from the second column of the Factor Transformation Matrix:
$$(0.588)(0.635)+(-0.303)(0.773)=0.373-0.234=0.139.$$
Voila! We have obtained the new transformed pair with some rounding error. The figure below summarizes the steps we used to perform the transformation
The Factor Transformation Matrix can also tell us the angle of rotation if we take the inverse cosine of the diagonal element. In this case, the angle of rotation is \(\cos^{-1}(0.773) = 39.4^{\circ}\). In the factor loading plot, you can see what that angle of rotation looks like, starting from \(0^{\circ}\) and rotating counterclockwise by \(39.4^{\circ}\). Notice that the newly rotated x- and y-axes are still at \(90^{\circ}\) from one another, hence the name orthogonal (in a non-orthogonal or oblique rotation, the new axes are no longer \(90^{\circ}\) apart). The points themselves do not move; they are simply re-expressed relative to the rotated axes.
The Total Variance Explained table contains the same columns as the PAF solution with no rotation, but adds another set of columns called “Rotation Sums of Squared Loadings”. This makes sense because if our rotated Factor Matrix is different, the square of the loadings should be different, and hence the Sum of Squared loadings will be different for each factor. However, if you sum the Sums of Squared Loadings across all factors for the Rotation solution,
$$ 1.701 + 1.309 = 3.01$$
and for the unrotated solution,
$$ 2.511 + 0.499 = 3.01,$$
you will see that the two sums are the same. This is because rotation does not change the total common variance. Looking at the Rotation Sums of Squared Loadings for Factor 1, it still has the largest total variance, but now that shared variance is split more evenly.
Total Variance Explained | |||
Factor | Rotation Sums of Squared Loadings | ||
Total | % of Variance | Cumulative % | |
1 | 1.701 | 21.258 | 21.258 |
2 | 1.309 | 16.363 | 37.621 |
Extraction Method: Principal Axis Factoring. |
Varimax is the most popular orthogonal rotation, but it is only one among several. The benefit of Varimax rotation is that it maximizes the variance of the loadings within each factor, exaggerating the differences between high and low loadings on a particular factor: higher loadings are made higher while lower loadings are made lower. This makes Varimax rotation good for achieving simple structure but not as good for detecting an overall factor, because it splits up the variance of major factors among lesser ones. Quartimax may be a better choice for detecting an overall factor; it maximizes the squared loadings so that each item loads most strongly onto a single factor.
Here is the output of the Total Variance Explained table juxtaposed side-by-side for Varimax versus Quartimax rotation.
Total Variance Explained | ||
Factor | Quartimax | Varimax |
Total | Total | |
1 | 2.381 | 1.701 |
2 | 0.629 | 1.309 |
Extraction Method: Principal Axis Factoring. |
You will see that whereas Varimax distributes the variances evenly across both factors, Quartimax tries to consolidate more variance into the first factor.
Equamax is a hybrid of Varimax and Quartimax, but because of this may behave erratically and according to Pett et al. (2003), is not generally recommended.
In oblique rotation, the factors are no longer orthogonal to each other (x and y axes are not \(90^{\circ}\) angles to each other). Like orthogonal rotation, the goal is rotation of the reference axes about the origin to achieve a simpler and more meaningful factor solution compared to the unrotated solution. In oblique rotation, you will see three unique tables in the SPSS output:
Suppose the Principal Investigator hypothesizes that the two factors are correlated, and wishes to test this assumption. Let’s proceed with one of the most common types of oblique rotations in SPSS, Direct Oblimin.
The steps for running a Direct Oblimin rotation are the same as before (Analyze – Dimension Reduction – Factor – Extraction), except that under Rotation – Method we check Direct Oblimin. The other parameter we have to put in is delta, which defaults to zero. Technically, when delta = 0, this is known as Direct Quartimin. Larger positive values for delta increase the correlation among factors. However, in general you don’t want the correlations to be too high, or else there is no reason to split your factors up. In fact, SPSS caps the delta value at 0.8 (the cap for negative values is -9999). Negative delta values may lead to orthogonal factor solutions. For the purposes of this analysis, we will leave delta = 0 and do a Direct Quartimin analysis.
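A sketch of the pasted syntax for this run (delta is passed through the /CRITERIA subcommand; item names assumed as before):

FACTOR
  /VARIABLES q01 q02 q03 q04 q05 q06 q07 q08
  /MISSING LISTWISE
  /ANALYSIS q01 q02 q03 q04 q05 q06 q07 q08
  /PRINT INITIAL EXTRACTION ROTATION
  /CRITERIA FACTORS(2) ITERATE(100) DELTA(0)
  /EXTRACTION PAF
  /ROTATION OBLIMIN
  /METHOD=CORRELATION.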
All the questions below pertain to Direct Oblimin in SPSS.
Answers: 1. T, 2. F, larger delta values, 3. F, delta leads to higher factor correlations, in general you don’t want factors to be too highly correlated
The factor pattern matrix contains partial standardized regression coefficients of each item on each factor. For example, \(0.740\) is the effect of Factor 1 on Item 1 controlling for Factor 2, and \(-0.137\) is the effect of Factor 2 on Item 1 controlling for Factor 1. Just as in orthogonal rotation, the square of the loadings represents the contribution of the factor to the variance of the item, but now excluding the overlap between correlated factors. Factor 1 uniquely contributes \((0.740)^2=0.405=40.5\%\) of the variance in Item 1 (controlling for Factor 2), and Factor 2 uniquely contributes \((-0.137)^2=0.019=1.9\%\) of the variance in Item 1 (controlling for Factor 1).
Pattern Matrix | ||
Factor | ||
1 | 2 | |
1 | 0.740 | -0.137 |
2 | -0.180 | -0.067 |
3 | -0.490 | -0.108 |
4 | 0.660 | 0.029 |
5 | 0.580 | 0.011 |
6 | 0.077 | 0.504 |
7 | -0.017 | 0.933 |
8 | 0.462 | 0.036 |
Extraction Method: Principal Axis Factoring. Rotation Method: Oblimin with Kaiser Normalization. | ||
a. Rotation converged in 5 iterations. |
The factor structure matrix contains the simple zero-order correlations of the items with each factor (it’s as if you ran a simple regression of the outcome on a single factor). For example, \(0.653\) is the simple correlation of Factor 1 with Item 1 and \(0.333\) is the simple correlation of Factor 2 with Item 1. The more correlated the factors, the bigger the difference between the pattern and structure matrices and the more difficult it is to interpret the factor loadings. From this we can see that Items 1, 3, 4, 5, and 8 load highly onto Factor 1 and Items 6 and 7 load highly onto Factor 2. Item 2 doesn’t seem to load well on either factor.
Additionally, we can look at the variance explained by each factor not controlling for the other factor. For example, Factor 1 contributes \((0.653)^2=0.426=42.6\%\) of the variance in Item 1, and Factor 2 contributes \((0.333)^2=0.11=11.0\%\) of the variance in Item 1. Notice that the contribution of Factor 2 is higher here (\(11\%\) vs. \(1.9\%\)) because in the Pattern Matrix we controlled for the effect of Factor 1, whereas in the Structure Matrix we did not.
Structure Matrix | ||
Factor | ||
1 | 2 | |
1 | 0.653 | 0.333 |
2 | -0.222 | -0.181 |
3 | -0.559 | -0.420 |
4 | 0.678 | 0.449 |
5 | 0.587 | 0.380 |
6 | 0.398 | 0.553 |
7 | 0.577 | 0.923 |
8 | 0.485 | 0.330 |
Extraction Method: Principal Axis Factoring. Rotation Method: Oblimin with Kaiser Normalization. |
Recall that the more correlated the factors, the more difference between pattern and structure matrix and the more difficult to interpret the factor loadings. In our case, Factor 1 and Factor 2 are pretty highly correlated, which is why there is such a big difference between the factor pattern and factor structure matrices.
Factor Correlation Matrix | ||
Factor | 1 | 2 |
1 | 1.000 | 0.636 |
2 | 0.636 | 1.000 |
Extraction Method: Principal Axis Factoring. Rotation Method: Oblimin with Kaiser Normalization. |
The difference between an orthogonal versus oblique rotation is that the factors in an oblique rotation are correlated. This means not only must we account for the angle of axis rotation \(\theta\), we have to account for the angle of correlation \(\phi\). The angle of axis rotation is defined as the angle between the rotated and unrotated axes (blue and black axes). From the Factor Correlation Matrix, we know that the correlation is \(0.636\), so the angle of correlation is \(cos^{-1}(0.636) = 50.5^{\circ}\), which is the angle between the two rotated axes (blue x and blue y-axis). The sum of rotations \(\theta\) and \(\phi\) is the total angle rotation. We are not given the angle of axis rotation, so we only know that the total angle rotation is \(\theta + \phi = \theta + 50.5^{\circ}\).
The structure matrix is in fact a derivative of the pattern matrix. If you multiply the pattern matrix by the factor correlation matrix, you will get back the factor structure matrix. Let’s take the example of the ordered pair \((0.740,-0.137)\) from the Pattern Matrix, which represents the partial correlation of Item 1 with Factors 1 and 2 respectively. Performing matrix multiplication for the first column of the Factor Correlation Matrix we get
$$ (0.740)(1) + (-0.137)(0.636) = 0.740 – 0.087 =0.652.$$
Similarly, we multiply the ordered factor pair with the second column of the Factor Correlation Matrix to get:
$$ (0.740)(0.636) + (-0.137)(1) = 0.471 -0.137 =0.333 $$
Looking at the first row of the Structure Matrix we get \((0.653,0.333)\) which matches our calculation! This neat fact can be depicted with the following figure:
As a quick aside, suppose that the factors are orthogonal, which means the factor correlation matrix has 1s on the diagonal and zeros on the off-diagonal. A quick calculation with the ordered pair \((0.740,-0.137)\) gives
$$ (0.740)(1) + (-0.137)(0) = 0.740$$
and similarly,
$$ (0.740)(0) + (-0.137)(1) = -0.137$$
and you get back the same ordered pair. This is called multiplying by the identity matrix (think of it as multiplying \(2*1 = 2\)).
Answers: 1. Decrease the delta values so that the correlation between factors approaches zero. 2. T, the correlations will become more orthogonal and hence the pattern and structure matrix will be closer.
The column Extraction Sums of Squared Loadings is the same as the unrotated solution, but we have an additional column known as Rotation Sums of Squared Loadings. SPSS says itself that “when factors are correlated, sums of squared loadings cannot be added to obtain total variance”. You will note that compared to the Extraction Sums of Squared Loadings, the Rotation Sums of Squared Loadings is only slightly lower for Factor 1 but much higher for Factor 2. This is because unlike orthogonal rotation, this is no longer the unique contribution of Factor 1 and Factor 2. How do we obtain the Rotation Sums of Squared Loadings? SPSS squares the Structure Matrix and sums down the items.
Total Variance Explained | ||||
Factor | Extraction Sums of Squared Loadings | Rotation Sums of Squared Loadings | ||
Total | % of Variance | Cumulative % | Total | |
1 | 2.511 | 31.382 | 31.382 | 2.318 |
2 | 0.499 | 6.238 | 37.621 | 1.931 |
Extraction Method: Principal Axis Factoring. | ||||
a. When factors are correlated, sums of squared loadings cannot be added to obtain a total variance. |
As a demonstration, let’s obtain the loadings from the Structure Matrix for Factor 1
$$ (0.653)^2 + (-0.222)^2 + (-0.559)^2 + (0.678)^2 + (0.587)^2 + (0.398)^2 + (0.577)^2 + (0.485)^2 = 2.318.$$
Note that \(2.318\) matches the Rotation Sums of Squared Loadings for the first factor. This means that the Rotation Sums of Squared Loadings represent the non-unique contribution of each factor to the total common variance, and summing these squared loadings across all factors can lead to estimates that are greater than the total common variance.
Finally, let’s conclude by interpreting the factor loadings more carefully. Let’s compare the Pattern Matrix and Structure Matrix tables side by side, and consider the absolute loadings that are higher than 0.4 on each factor. We see that the absolute loadings in the Pattern Matrix are in general higher for Factor 1 compared to the Structure Matrix and lower for Factor 2. This makes sense because the Pattern Matrix partials out the effect of the other factor. Looking at the Pattern Matrix, Items 1, 3, 4, 5, and 8 load highly on Factor 1, and Items 6 and 7 load highly on Factor 2. Looking at the Structure Matrix, Items 1, 3, 4, 5, 7, and 8 are highly loaded onto Factor 1 and Items 3, 4, and 7 load highly onto Factor 2. Item 2 doesn’t seem to load on any factor. The results of the two matrices are somewhat inconsistent, but this can be explained by the fact that in the Structure Matrix Items 3, 4, and 7 load onto both factors fairly evenly, whereas in the Pattern Matrix they do not. For this particular analysis, it seems to make more sense to interpret the Pattern Matrix because it’s clear that Factor 1 contributes uniquely to most items in the SAQ-8 and Factor 2 contributes common variance only to two items (Items 6 and 7). There is an argument here that perhaps Item 2 can be eliminated from our survey and the factors consolidated into one SPSS Anxiety factor. We talk to the Principal Investigator and we think it’s feasible to accept SPSS Anxiety as the single factor explaining the common variance in all the items, but we choose to remove Item 2, so that the SAQ-8 is now the SAQ-7.
Pattern Matrix | Structure Matrix | |||
Factor | Factor | |||
1 | 2 | 1 | 2 | |
1 | 0.740 | -0.137 | 0.653 | 0.333 |
2 | -0.180 | -0.067 | -0.222 | -0.181 |
3 | -0.490 | -0.108 | -0.559 | -0.420 |
4 | 0.660 | 0.029 | 0.678 | 0.449 |
5 | 0.580 | 0.011 | 0.587 | 0.380 |
6 | 0.077 | 0.504 | 0.398 | 0.553 |
7 | -0.017 | 0.933 | 0.577 | 0.923 |
8 | 0.462 | 0.036 | 0.485 | 0.330 |
Answers: 1. T, 2. F, represent the non -unique contribution (which means the total sum of squares can be greater than the total communality), 3. F, the Structure Matrix is obtained by multiplying the Pattern Matrix with the Factor Correlation Matrix, 4. T, it’s like multiplying a number by 1, you get the same number back, 5. F, this is true only for orthogonal rotations, the SPSS Communalities table in rotated factor solutions is based off of the unrotated solution, not the rotated solution.
As a special note, did we really achieve simple structure? Although rotation helps us achieve simple structure, if the interrelationships among the items do not themselves conform to simple structure, we can only modify our model. In this case we chose to remove Item 2 from our model.
Promax rotation begins with a Varimax (orthogonal) rotation and then raises the loadings to a power kappa, which really shrinks the small loadings. Promax also runs faster than Direct Oblimin; in our example Promax took 3 iterations while Direct Quartimin (Direct Oblimin with delta = 0) took 5 iterations.
Answers: 1. T.
Suppose the Principal Investigator is happy with the final factor analysis, which was the two-factor Direct Quartimin solution. She has a hypothesis that SPSS Anxiety and Attribution Bias predict student scores in an introductory statistics course, so she would like to use the factor scores as predictors in a new regression analysis. Since a factor is by nature unobserved, we need to first predict or generate plausible factor scores. In SPSS, there are three methods of factor score generation: Regression, Bartlett, and Anderson-Rubin.
In order to generate factor scores, run the same factor analysis model but click on Factor Scores (Analyze – Dimension Reduction – Factor – Factor Scores). Then check Save as variables, pick the Method and optionally check Display factor score coefficient matrix.
The code pasted in the SPSS Syntax Editor looks like this:
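A sketch of the pasted syntax for the Regression method, appended to the two-factor Direct Quartimin specification from above (item names assumed as before):

FACTOR
  /VARIABLES q01 q02 q03 q04 q05 q06 q07 q08
  /MISSING LISTWISE
  /ANALYSIS q01 q02 q03 q04 q05 q06 q07 q08
  /PRINT INITIAL EXTRACTION ROTATION FSCORE
  /CRITERIA FACTORS(2) ITERATE(100) DELTA(0)
  /EXTRACTION PAF
  /ROTATION OBLIMIN
  /SAVE REG(ALL)
  /METHOD=CORRELATION.

/SAVE REG(ALL) appends the regression-method factor scores to the active dataset, and /PRINT FSCORE requests the factor score coefficient matrix.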
Here we picked the Regression approach after fitting our two-factor Direct Quartimin solution. After generating the factor scores, SPSS will add two extra variables to the end of your variable list, which you can view via Data View. The figure below shows what this looks like for the first 5 participants, which SPSS calls FAC1_1 and FAC2_1 for the first and second factors. These are now ready to be entered in another analysis as predictors.
For those who want to understand how the scores are generated, we can refer to the Factor Score Coefficient Matrix. These are essentially the regression weights that SPSS uses to generate the scores. We know that the ordered pair of scores for the first participant is \(-0.880, -0.113\). We also know that the 8 scores for the first participant are \(2, 1, 4, 2, 2, 2, 3, 1\). However, what SPSS uses is actually the standardized scores, which can be easily obtained in SPSS by using Analyze – Descriptive Statistics – Descriptives – Save standardized values as variables. The standardized scores obtained are: \(-0.452, -0.733, 1.32, -0.829, -0.749, -0.2025, 0.069, -1.42\). Using the Factor Score Coefficient Matrix, we multiply the participant's standardized scores by the coefficients in each column. For the first factor:
$$ \begin{eqnarray} &(0.284)(-0.452) + (-0.048)(-0.733) + (-0.171)(1.32) + (0.274)(-0.829) \\ &+ (0.197)(-0.749) + (0.048)(-0.2025) + (0.174)(0.069) + (0.133)(-1.42) \\ &= -0.880, \end{eqnarray} $$
which matches FAC1_1Â for the first participant. You can continue this same procedure for the second factor to obtain FAC2_1.
Factor Score Coefficient Matrix | ||
Item | Factor | |
1 | 2 | |
1 | 0.284 | 0.005 |
2 | -0.048 | -0.019 |
3 | -0.171 | -0.045 |
4 | 0.274 | 0.045 |
5 | 0.197 | 0.036 |
6 | 0.048 | 0.095 |
7 | 0.174 | 0.814 |
8 | 0.133 | 0.028 |
Extraction Method: Principal Axis Factoring. Rotation Method: Oblimin with Kaiser Normalization. Factor Scores Method: Regression. |
The second table is the Factor Score Covariance Matrix,
Factor Score Covariance Matrix | ||
Factor | 1 | 2 |
1 | 1.897 | 1.895 |
2 | 1.895 | 1.990 |
Extraction Method: Principal Axis Factoring. Rotation Method: Oblimin with Kaiser Normalization. Factor Scores Method: Regression. |
This table can be interpreted as the covariance matrix of the factor scores, however it would only be equal to the raw covariance if the factors are orthogonal. For example, if we obtained the raw covariance matrix of the factor scores we would get
Correlations | |||
FAC1_1 | FAC2_1 | |
FAC1_1 | Covariance | 0.777 | 0.604 |
FAC2_1 | Covariance | 0.604 | 0.870 |
You will notice that these values are much lower. Let’s compare the same two tables but for Varimax rotation:
Factor Score Covariance Matrix | ||
Factor | 1 | 2 |
1 | 0.670 | 0.131 |
2 | 0.131 | 0.805 |
Extraction Method: Principal Axis Factoring. Rotation Method: Varimax with Kaiser Normalization. Factor Scores Method: Regression. |
If you compare these elements to the Covariance table below, you will notice they are the same.
Correlations | |||
FAC1_1 | FAC2_1 | |
FAC1_1 | Covariance | 0.670 | 0.131 |
FAC2_1 | Covariance | 0.131 | 0.805 |
Note with the Bartlett and Anderson-Rubin methods you will not obtain the Factor Score Covariance matrix.
Among the three methods, each has its pluses and minuses. The regression method maximizes the correlation (and hence validity) between the factor scores and the underlying factor but the scores can be somewhat biased. This means even if you have an orthogonal solution, you can still have correlated factor scores. For Bartlett’s method, the factor scores highly correlate with its own factor and not with others, and they are an unbiased estimate of the true factor score. Unbiased scores means that with repeated sampling of the factor scores, the average of the scores is equal to the average of the true factor score. The Anderson-Rubin method perfectly scales the factor scores so that the factor scores are uncorrelated with other factors and uncorrelated with other factor scores . Since Anderson-Rubin scores impose a correlation of zero between factor scores, it is not the best option to choose for oblique rotations. Additionally, Anderson-Rubin scores are biased.
In summary, if you do an orthogonal rotation, you can pick any of the three methods. For orthogonal rotations, use Bartlett if you want unbiased scores, use the Regression method if you want to maximize validity, and use Anderson-Rubin if you want the factor scores themselves to be uncorrelated with other factor scores. If you do oblique rotations, it’s preferable to stick with the Regression method. Do not use Anderson-Rubin for oblique rotations.
Answers: 1. T, 2. T, 3. T
Factor analysis
This introductory chapter discusses the purposes of factor analysis and dimension reduction, limitations of factor analysis, and common research questions associated with factor analysis. After the introductory overview of factor analysis, brief explanations are given for ten specific common questions related to factor methodology:
May I use factor analysis on sub-interval data?
How many dimensions are there in my data?
What are the best measures for my construct and how should I weigh them?
How do people in my sample cluster?
How do I use factor analysis in R to compare groups?
How do I know if my factors are really subfactors of a more comprehensive construct?
How may I use factor analysis to predict a dependent variable?
Can factor analysis help me understand the effect of outliers on my results?
How may I represent my factors spatially?
How can factor analysis be used to tell if I have common method bias?
Factor Analysis is a method for modeling observed variables, and their covariance structure, in terms of a smaller number of underlying unobservable (latent) "factors." The factors typically are viewed as broad concepts or ideas that may describe an observed phenomenon. For example, a basic desire of obtaining a certain social level might explain most consumption behavior. These unobserved factors are more interesting to the social scientist than the observed quantitative measurements.
Factor analysis is generally an exploratory/descriptive method that requires many subjective judgments. It is a widely used tool and often controversial because the models, methods, and subjectivity are so flexible that debates about interpretations can occur.
The method is similar to principal components although, as the textbook points out, factor analysis is more elaborate. In one sense, factor analysis is an inversion of principal components. In factor analysis, we model the observed variables as linear functions of the "factors." In principal components, we create new variables that are linear combinations of the observed variables. In both PCA and FA, the dimension of the data is reduced. Recall that in PCA, the interpretation of the principal components is often not very clean. A particular variable may, on occasion, contribute significantly to more than one of the components. Ideally, we like each variable to contribute significantly to only one component. A technique called factor rotation is employed toward that goal. Examples of fields where factor analysis is involved include physiology, health, intelligence, sociology, and sometimes ecology among others.
Collect all of the variables X 's into a vector \(\mathbf{X}\) for each individual subject. Let \(\mathbf{X_i}\) denote observable trait i. These are the data from each subject and are collected into a vector of traits.
\(\textbf{X} = \left(\begin{array}{c}X_1\\X_2\\\vdots\\X_p\end{array}\right) = \text{vector of traits}\)
This is a random vector, with a population mean. Assume that vector of traits \(\mathbf{X}\) is sampled from a population with population mean vector:
\(\boldsymbol{\mu} = \left(\begin{array}{c}\mu_1\\\mu_2\\\vdots\\\mu_p\end{array}\right) = \text{population mean vector}\)
Here, \(\mathrm { E } \left( X _ { i } \right) = \mu _ { i }\) denotes the population mean of variable i .
Consider m unobservable common factors \(f _ { 1 } , f _ { 2 } , \dots , f _ { m }\). The \(i^{th}\) common factor is \(f _ { i } \). Generally, m is going to be substantially less than p .
The common factors are also collected into a vector,
\(\mathbf{f} = \left(\begin{array}{c}f_1\\f_2\\\vdots\\f_m\end{array}\right) = \text{vector of common factors}\)
Our factor model can be thought of as a series of multiple regressions, predicting each of the observable variables \(X_{i}\) from the values of the unobservable common factors \(f_{i}\) :
\begin{align} X_1 & = \mu_1 + l_{11}f_1 + l_{12}f_2 + \dots + l_{1m}f_m + \epsilon_1\\ X_2 & = \mu_2 + l_{21}f_1 + l_{22}f_2 + \dots + l_{2m}f_m + \epsilon_2 \\ & \vdots \\ X_p & = \mu_p + l_{p1}f_1 + l_{p2}f_2 + \dots + l_{pm}f_m + \epsilon_p \end{align}
Here, the variable means \(\mu_{1}\) through \(\mu_{p}\) can be regarded as the intercept terms for the multiple regression models.
The regression coefficients \(l_{ij}\) (the partial slopes) for all of these multiple regressions are called factor loadings. Here, \(l_{ij}\) = loading of the \(i^{th}\) variable on the \(j^{th}\) factor. These are collected into a matrix as shown here:
\(\mathbf{L} = \left(\begin{array}{cccc}l_{11}& l_{12}& \dots & l_{1m}\\l_{21} & l_{22} & \dots & l_{2m}\\ \vdots & \vdots & & \vdots \\l_{p1} & l_{p2} & \dots & l_{pm}\end{array}\right) = \text{matrix of factor loadings}\)
And finally, the errors \(\varepsilon _{i}\) are called the specific factors. Here, \(\varepsilon _{i}\) = specific factor for variable i . The specific factors are also collected into a vector:
\(\boldsymbol{\epsilon} = \left(\begin{array}{c}\epsilon_1\\\epsilon_2\\\vdots\\\epsilon_p\end{array}\right) = \text{vector of specific factors}\)
In summary, the basic model is like a regression model. Each of our response variables X is predicted as a linear function of the unobserved common factors \(f_{1}\), \(f_{2}\) through \(f_{m}\). Thus, our explanatory variables are \(f_{1}\) , \(f_{2}\) through \(f_{m}\). We have m unobserved factors that control the variation in our data.
We will generally reduce this into matrix notation as shown in this form here:
\(\textbf{X} = \boldsymbol{\mu} + \textbf{Lf}+\boldsymbol{\epsilon}\)
The specific factors or random errors all have mean zero: \(E(\epsilon_i) = 0\); i = 1, 2, ... , p
The common factors, the f 's, also have mean zero: \(E(f_i) = 0\); i = 1, 2, ... , m
A consequence of these assumptions is that the mean response of the i th trait is \(\mu_i\). That is,
\(E(X_i) = \mu_i\)
The common factors have variance one: \(\text{var}(f_i) = 1\); i = 1, 2, ... , m
The common factors are uncorrelated with one another: \(\text{cov}(f_i, f_j) = 0\) for \(i \ne j\)
The specific factors are uncorrelated with one another: \(\text{cov}(\epsilon_i, \epsilon_j) = 0\) for \(i \ne j\)
The specific factors are uncorrelated with the common factors: \(\text{cov}(\epsilon_i, f_j) = 0\); i = 1, 2, ... , p; j = 1, 2, ... , m
These assumptions are necessary to estimate the parameters uniquely. An infinite number of equally well-fitting models with different parameter values may be obtained unless these assumptions are made.
Under this model the variance for the i th observed variable is equal to the sum of the squared loadings for that variable and the specific variance:
The variance of trait i is: \(\sigma^2_i = \text{var}(X_i) = \sum_{j=1}^{m}l^2_{ij}+\psi_i\)
This derivation is based on the previous assumptions. \(\sum_{j=1}^{m}l^2_{ij}\) is called the communality for variable i. Later on, we will see how this is a measure of how well the model performs for that particular variable. The larger the communality, the better the model performs for the i th variable.
The covariance between pairs of traits i and j is: \(\sigma_{ij}= \text{cov}(X_i, X_j) = \sum_{k=1}^{m}l_{ik}l_{jk}\)
The covariance between trait i and factor j is: \(\text{cov}(X_i, f_j) = l_{ij}\)
In matrix notation, our model for the variance-covariance matrix is expressed as shown below:
\(\Sigma = \mathbf{LL'} + \boldsymbol{\Psi}\)
This is the matrix of factor loadings times its transpose, plus a diagonal matrix containing the specific variances.
Here \(\boldsymbol{\Psi}\) equals:
\(\boldsymbol{\Psi} = \left(\begin{array}{cccc}\psi_1 & 0 & \dots & 0 \\ 0 & \psi_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \dots & \psi_p \end{array}\right)\)
A parsimonious (simplified) model for the variance-covariance matrix is obtained and used for estimation.
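As a small illustration of this structure, the sketch below builds a one-factor model with made-up loadings (all numbers are hypothetical) and checks that \(\mathbf{LL'} + \boldsymbol{\Psi}\) reproduces a valid correlation matrix.

```r
# Minimal sketch with invented loadings: p = 3 standardized variables, m = 1 factor.
L   <- matrix(c(0.9, 0.8, 0.7), ncol = 1)   # factor loadings
Psi <- diag(1 - rowSums(L^2))               # specific variances = 1 - communality
Sigma <- L %*% t(L) + Psi                   # implied variance-covariance (correlation) matrix
round(Sigma, 3)
# Off-diagonal entries equal l_i1 * l_j1, matching cov(X_i, X_j) = sum_k l_ik l_jk;
# each diagonal entry equals communality + specific variance = 1.
```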
The variance-covariance matrix \(\Sigma\) has \(p(p+1)/2\) unique elements, which the factor model approximates with the \(mp\) factor loadings in \(\mathbf{L}\) and the \(p\) specific variances in \(\boldsymbol{\Psi}\).
This means that there are \(mp + p\) parameters in the factor model for the variance-covariance matrix. Ideally, \(mp + p\) is substantially smaller than \(p(p+1)/2\). However, if m is too small, the \(mp + p\) parameters may not be adequate to describe \(\Sigma\). It may also be the case that this is not the right model and the data cannot be reduced to a linear combination of factors.
The factor model is not unique. Let \(\mathbf{T}\) denote any \(m \times m\) orthogonal matrix, so that
\(\mathbf{T'T = TT' = I} \)
We can write our factor model in matrix notation:
\(\textbf{X} = \boldsymbol{\mu} + \textbf{Lf}+ \boldsymbol{\epsilon} = \boldsymbol{\mu} + \mathbf{LTT'f}+ \boldsymbol{\epsilon} = \boldsymbol{\mu} + \mathbf{L^*f^*}+\boldsymbol{\epsilon}\)
Note that this does not change the calculation, because \(\mathbf{TT'}\) is the identity matrix and multiplying by the identity leaves a matrix unchanged. This results in an alternative factor model, where the relationship between the new factor loadings and the original factor loadings is:
\(\mathbf{L^*} = \textbf{LT}\)
and the relationship between the new common factors and the original common factors is:
\(\mathbf{f^*} = \textbf{T'f}\)
This gives a model that fits equally well. Moreover, because there are infinitely many orthogonal matrices, there are infinitely many alternative models. This model, as it turns out, satisfies all of the assumptions discussed earlier.
\(E(\mathbf{f^*}) = E(\textbf{T'f}) = \textbf{T'}E(\textbf{f}) = \mathbf{T'0} =\mathbf{0}\),
\(\text{var}(\mathbf{f^*}) = \text{var}(\mathbf{T'f}) = \mathbf{T'}\text{var}(\mathbf{f})\mathbf{T} = \mathbf{T'IT} = \mathbf{T'T} = \mathbf{I}\)
\(\text{cov}(\mathbf{f^*, \boldsymbol{\epsilon}}) = \text{cov}(\mathbf{T'f, \boldsymbol{\epsilon}}) = \mathbf{T'}\text{cov}(\mathbf{f, \boldsymbol{\epsilon}}) = \mathbf{T'0} = \mathbf{0}\)
So f* satisfies all of the assumptions, and hence f* is an equally valid collection of common factors. There is a certain apparent ambiguity to these models. This ambiguity is later used to justify a factor rotation to obtain a more parsimonious description of the data.
We consider two different methods to estimate the parameters of a factor model: the principal component method and the maximum likelihood method.
A third method, the principal factor method, is also available but not considered in this class.
Let \(X_i\) be a vector of observations for the \(i^{th}\) subject:
\(\mathbf{X_i} = \left(\begin{array}{c}X_{i1}\\ X_{i2}\\ \vdots \\ X_{ip}\end{array}\right)\)
\(\mathbf{S}\) denotes our sample variance-covariance matrix and is expressed as:
\(\textbf{S} = \dfrac{1}{n-1}\sum\limits_{i=1}^{n}\mathbf{(X_i - \bar{x})(X_i - \bar{x})'}\)
We have p eigenvalues for this variance-covariance matrix as well as corresponding eigenvectors for this matrix.
Eigenvalues of \(\mathbf{S}\):
\(\hat{\lambda}_1, \hat{\lambda}_2, \dots, \hat{\lambda}_p\)
Eigenvectors of \(\mathbf{S}\):
\(\hat{\mathbf{e}}_1, \hat{\mathbf{e}}_2, \dots, \hat{\mathbf{e}}_p\)
Recall that the variance-covariance matrix can be re-expressed in the following form as a function of the eigenvalues and the eigenvectors:
\(\Sigma = \sum_{i=1}^{p}\lambda_i \mathbf{e_{ie'_i}} \cong \sum_{i=1}^{m}\lambda_i \mathbf{e_{ie'_i}} = \left(\begin{array}{cccc}\sqrt{\lambda_1}\mathbf{e_1} & \sqrt{\lambda_2}\mathbf{e_2} & \dots & \sqrt{\lambda_m}\mathbf{e_m}\end{array}\right) \left(\begin{array}{c}\sqrt{\lambda_1}\mathbf{e'_1}\\ \sqrt{\lambda_2}\mathbf{e'_2}\\ \vdots\\ \sqrt{\lambda_m}\mathbf{e'_m}\end{array}\right) = \mathbf{LL'}\)
The idea behind the principal component method is to approximate this expression. Instead of summing from 1 to p , we now sum from 1 to m , ignoring the last p - m terms in the sum, and obtain the third expression. We can rewrite this as shown in the fourth expression, which is used to define the matrix of factor loadings \(\mathbf{L}\), yielding the final expression in matrix notation.
This yields the following estimator for the factor loadings:
\(\hat{l}_{ij} = \hat{e}_{ji}\sqrt{\hat{\lambda}_j}\)
This forms the matrix \(\mathbf{L}\) of factor loadings in the factor analysis, with \(\mathbf{L'}\) its transpose. To estimate the specific variances, recall that our factor model for the variance-covariance matrix is
\(\boldsymbol{\Sigma} = \mathbf{LL'} + \boldsymbol{\Psi}\)
in matrix notation. \(\Psi\) is now going to be equal to the variance-covariance matrix minus \(\mathbf{LL'}\).
\( \boldsymbol{\Psi} = \boldsymbol{\Sigma} - \mathbf{LL'}\)
This in turn suggests that the specific variances, the diagonal elements of \(\Psi\), are estimated with this expression:
\(\hat{\Psi}_i = s^2_i - \sum\limits_{j=1}^{m}\hat{\lambda}_j \hat{e}^2_{ji}\)
We take the sample variance for the i th variable and subtract the sum of the squared factor loadings (i.e., the communality).
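A compact sketch of these calculations in R, assuming a sample covariance (or correlation) matrix `S` is already available and `m` factors are retained:

```r
# Principal component method: loadings, communalities, and specific variances.
m   <- 3
eig <- eigen(S)                                             # eigenvalues/eigenvectors of S
L   <- eig$vectors[, 1:m] %*% diag(sqrt(eig$values[1:m]))   # l_ij = e_ji * sqrt(lambda_j)
h2  <- rowSums(L^2)                                         # communalities
psi <- diag(S) - h2                                         # specific variances
resid <- S - (L %*% t(L) + diag(psi))                       # residual matrix (diagonal is zero)
```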
Example 12-1: Places Rated
Let's revisit the Places Rated Example from Lesson 11. Recall that the Places Rated Almanac (Boyer and Savageau) rates 329 communities according to nine criteria: climate, housing, health, crime, transportation, education, the arts, recreation, and economics.
Except for housing and crime, the higher the score the better. For housing and crime, the lower the score the better.
Our objective here is to describe the relationships among the variables.
Before carrying out a factor analysis we need to determine m . How many common factors should be included in the model? This requires a determination of how many parameters will be involved.
For p = 9, the variance-covariance matrix \(\Sigma\) contains
\(\dfrac{p(p+1)}{2} = \dfrac{9 \times 10}{2} = 45\)
unique elements or entries. For a factor analysis with m factors, the number of parameters in the factor model is equal to
\(p(m+1) = 9(m+1)\)
Taking m = 4 gives 9(4+1) = 45 parameters in the factor model, which equals the number of original parameters, so there would be no dimension reduction. In this case, we will therefore select m = 3, yielding 9(3+1) = 36 parameters in the factor model and thus a dimension reduction in our analysis.
It is also common to look at the results of the principal components analysis. The output from Lesson 11.6 is below. The first three components explain 62% of the variation. We consider this to be sufficient for the current example and will base future analyses on three components.
Component | Eigenvalue | Proportion | Cumulative |
---|---|---|---|
1 | 3.2978 | 0.3664 | 0.3664 |
2 | 1.2136 | 0.1348 | 0.5013 |
3 | 1.1055 | 0.1228 | 0.6241 |
4 | 0.9073 | 0.1008 | 0.7249 |
5 | 0.8606 | 0.0956 | 0.8205 |
6 | 0.5622 | 0.0625 | 0.8830 |
7 | 0.4838 | 0.0538 | 0.9368 |
8 | 0.3181 | 0.0353 | 0.9721 |
9 | 0.2511 | 0.0279 | 1.0000 |
We need to select m so that a sufficient amount of variation in the data is explained. What is sufficient is, of course, subjective and depends on the example at hand.
Alternatively, often in social sciences, the underlying theory within the field of study indicates how many factors to expect. In psychology, for example, a circumplex model suggests that mood has two factors: positive affect and arousal. So a two-factor model may be considered for questionnaire data regarding the subjects' moods. In many respects, this is a better approach because then you are letting the science drive the statistics rather than the statistics drive the science! If you can, use your or a field expert's scientific understanding to determine how many factors should be included in your model.
The factor analysis is carried out using the program as shown below:
Download the SAS Program here: places2.sas
Initially, we will look at the factor loadings. The factor loadings are obtained by using this expression
\(\hat{e}_{i}\sqrt{ \hat{\lambda}_{i}}\)
These are summarized in the table below. The factor loadings are only recorded for the first three factors because we set m =3. We should also note that the factor loadings are the correlations between the factors and the variables. For example, the correlation between the Arts and the first factor is about 0.86. Similarly, the correlation between climate and that factor is only about 0.28.
Factor | | | |
---|---|---|---|
Variable | 1 | 2 | 3 |
Climate | 0.286 | 0.076 | 0.841 |
Housing | 0.698 | 0.153 | 0.084 |
Health | 0.744 | -0.410 | -0.020 |
Crime | 0.471 | 0.522 | 0.135 |
Transportation | 0.681 | -0.156 | -0.148 |
Education | 0.498 | -0.498 | -0.253 |
Arts | 0.861 | -0.115 | 0.011 |
Recreation | 0.642 | 0.322 | 0.044 |
Economics | 0.298 | 0.595 | -0.533 |
Interpreting factor loadings is similar to interpreting the coefficients for principal component analysis. We want to determine some inclusion criteria, which in many instances, may be somewhat arbitrary. In the above table, the values that we consider large are in boldface, using about .5 as the cutoff. The following statements are based on this criterion:
Factor 1 is correlated most strongly with Arts (0.861) and also correlated with Health, Housing, Transportation, Recreation, and to a lesser extent Crime and Education. You can say that the first factor is primarily a measure of these variables.
Similarly, Factor 2 is correlated most strongly with Crime, Education, and Economics. You can say that the second factor is primarily a measure of these variables.
Likewise, Factor 3 is correlated most strongly with Climate and Economics. You can say that the third factor is primarily a measure of these variables.
The interpretation above is very similar to that obtained in the standardized principal component analysis.
Example 12-1: continued....
The communalities for the \(i^{th}\) variable are computed by taking the sum of the squared loadings for that variable. This is expressed below:
\(\hat{h}^2_i = \sum\limits_{j=1}^{m}\hat{l}^2_{ij}\)
To understand the computation of communalities, recall the table of factor loadings:
Factor | |||
---|---|---|---|
Variable | 1 | 2 | 3 |
Climate | 0.287 | 0.076 | 0.841 |
Housing | 0.698 | 0.153 | 0.084 |
Health | 0.744 | -0.410 | -0.020 |
Crime | 0.471 | 0.522 | 0.135 |
Transportation | 0.681 | -0.156 | -0.148 |
Education | 0.498 | -0.498 | -0.253 |
Arts | 0.861 | -0.115 | 0.011 |
Recreation | 0.642 | 0.322 | 0.044 |
Economics | 0.298 | 0.595 | -0.533 |
Let's compute the communality for Climate, the first variable. We square the factor loadings for climate (given in bold-face in the table above), then add the results:
\(\hat{h}^2_1 = 0.28682^2 + 0.07560^2 + 0.84085^2 = 0.7950\)
The communalities of the 9 variables can be obtained from page 4 of the SAS output as shown below:
Final Communality Estimates: Total = 5.616885 | | | | | | | | |
---|---|---|---|---|---|---|---|---|
Climate | housing | health | crime | trans | educate | arts | recreate | econ |
0.79500707 | 0.51783185 | 0.72230182 | 0.51244913 | 0.50977159 | 0.56073895 | 0.75382091 | 0.51725940 | 0.72770402 |
The value 5.616885, located just above the individual communalities in the output, is the total communality.
In summary, the communalities are placed into a table:
Variable | Communality |
---|---|
Climate | 0.795 |
Housing | 0.518 |
Health | 0.722 |
Crime | 0.512 |
Transportation | 0.510 |
Education | 0.561 |
Arts | 0.754 |
Recreation | 0.517 |
Economics | 0.728 |
You can think of these values as multiple \(R^{2}\) values for regression models predicting the variables of interest from the 3 factors. The communality for a given variable can be interpreted as the proportion of variation in that variable explained by the three factors. In other words, if we perform multiple regression of climate against the three common factors, we obtain an \(R^{2} = 0.795\), indicating that about 79% of the variation in climate is explained by the factor model. The results suggest that the factor analysis does the best job of explaining variations in climate, the arts, economics, and health.
One assessment of how well this model performs can be obtained from the communalities. We want to see values that are close to one. This indicates that the model explains most of the variation for those variables. In this case, the model does better for some variables than it does for others. The model explains Climate the best and is not bad for other variables such as Economics, Health, and the Arts. However, for other variables such as Crime, Recreation, Transportation, and Housing the model does not do a good job, explaining only about half of the variation.
The sum of all communality values is the total communality value:
\(\sum\limits_{i=1}^{p}\hat{h}^2_i = \sum\limits_{i=1}^{m}\hat{\lambda}_i\)
Here, the total communality is 5.617. The proportion of the total variation explained by the three factors is
\(\dfrac{5.617}{9} = 0.624\)
This is the percentage of variation explained in our model. This could be considered an overall assessment of the performance of the model. However, this percentage is the same as the proportion of variation explained by the first three eigenvalues, obtained earlier. The individual communalities tell how well the model is working for the individual variables, and the total communality gives an overall assessment of performance. These are two different assessments.
Because the data are standardized, the variance for the standardized data is equal to one. The specific variances are computed by subtracting the communality from the variance as expressed below:
\(\hat{\Psi}_i = 1-\hat{h}^2_i\)
Recall that the data were standardized before analysis, so the variances of the standardized variables are all equal to one. For example, the specific variance for Climate is computed as follows:
\(\hat{\Psi}_1 = 1-0.795 = 0.205\)
The specific variances are found in the SAS output as the diagonal elements in the table on page 5 as seen below:
Climate | Housing | Health | crime | Trans | Educate | Arts | Recreate | Econ | |
---|---|---|---|---|---|---|---|---|---|
Climate | 0.20499 | -0.00924 | -0.01476 | -0.06027 | -0.03720 | 0.18537 | -0.07518 | -0.12475 | 0.21735 |
Housing | -0.00924 | 0.48217 | -0.02317 | -0.28063 | -0.12119 | -0.04803 | -0.07518 | -0.04032 | 0.04249 |
Health | -0.01476 | -0.02317 | 0.27770 | 0.05007 | -0.15480 | -0.11537 | -0.00929 | -0.09108 | 0.06527 |
Crime | -0.06027 | -0.28063 | 0.05007 | 0.48755 | 0.05497 | 0.11562 | 0.00009 | -0.18377 | -0.10288 |
Trans | -0.03720 | -0.12119 | -0.15480 | 0.05497 | 0.49023 | -0.14318 | -0.05439 | 0.01041 | -0.12641 |
Educate | 0.18537 | -0.04803 | -0.11537 | 0.11562 | -0.14318 | 0.43926 | -0.13515 | -0.05531 | 0.14197 |
Arts | -0.07518 | -0.07552 | -0.00929 | 0.00009 | -0.05439 | -0.13515 | 0.24618 | -0.01926 | -0.04687 |
Recreate | -0.12475 | -0.04032 | -0.09108 | -0.18377 | 0.01041 | -0.05531 | -0.01926 | 0.48274 | -0.18326 |
Econ | 0.21735 | 0.04249 | 0.06527 | -0.10288 | -0.12641 | 0.14197 | -0.04687 | -0.18326 | 0.27230 |
For example, the specific variance for housing is 0.482.
This model provides an approximation to the correlation matrix. We can assess the model's appropriateness with the residuals obtained from the following calculation:
\(s_{ij}- \sum\limits_{k=1}^{m}l_{ik}l_{jk}; i \ne j = 1, 2, \dots, p\)
This is basically the difference between R and LL', or the correlation between variables i and j minus the expected value under the model. Generally, these residuals should be as close to zero as possible. For example, the residual between Housing and Climate is -0.00924, which is pretty close to zero. However, some are not very good; the residual between Climate and Economics is 0.217. These values give an indication of how well the factor model fits the data.
One disadvantage of the principal component method is that it does not provide a test for lack of fit. We can examine these numbers and determine if we think they are small or close to zero, but we really do not have a test for this. Such a test is available for the maximum likelihood method.
Unlike the competing methods (e.g., maximum likelihood), the estimated factor loadings under the principal component method do not change as the number of factors is increased. However, the communalities and the specific variances do depend on the number of factors in the model. In general, as you increase the number of factors, the communalities increase toward one and the specific variances decrease toward zero.
The diagonal elements of the variance-covariance matrix \(\mathbf{S}\) (or \(\mathbf{R}\)) are equal to the diagonal elements of the model:
\(\mathbf{\hat{L}\hat{L}' + \mathbf{\hat{\Psi}}}\)
The off-diagonal elements are not exactly reproduced. This is in part due to variability in the data - just random chance. Therefore, we want to select the number of factors to make the off-diagonal elements of the residual matrix small:
\(\mathbf{S - (\hat{L}\hat{L}' + \hat{\Psi})}\)
Here, we have a trade-off between two conflicting desires. For a parsimonious model, we wish to select the number of factors m to be as small as possible, but for such a model, the residuals could be large. Conversely, by selecting m to be large, we may reduce the sizes of the residuals but at the cost of producing a more complex and less interpretable model (there are more factors to interpret).
Another result to note is that the sum of the squared elements of the residual matrix is no greater than the sum of the squared eigenvalues left out of the approximation:
\(\sum\limits_{j=m+1}^{p}\hat{\lambda}^2_j\)
Below are three common techniques used to determine the number of factors to extract:
Maximum likelihood estimation requires that the data be sampled from a multivariate normal distribution, which is a drawback of this method. Data are often collected on a Likert scale, especially in the social sciences, and because a Likert scale is discrete and bounded, such data cannot be normally distributed.
Using the Maximum Likelihood Estimation Method, we must assume that the data are independently sampled from a multivariate normal distribution with mean vector \(\mu\) and variance-covariance matrix of the form:
\(\boldsymbol{\Sigma} = \mathbf{LL' +\boldsymbol{\Psi}}\)
where \(\mathbf{L}\) is the matrix of factor loadings and \(\Psi\) is the diagonal matrix of specific variances.
We define additional notation: As usual, the data vectors for n subjects are represented as shown:
\(\mathbf{X_1},\mathbf{X_2}, \dots, \mathbf{X_n}\)
Maximum likelihood estimation involves estimating the mean, the matrix of factor loadings, and the specific variance.
The maximum likelihood estimator for the mean vector \(\mu\), the factor loadings \(\mathbf{L}\), and the specific variances \(\Psi\) are obtained by finding \(\hat{\mathbf{\mu}}\), \(\hat{\mathbf{L}}\), and \(\hat{\mathbf{\Psi}}\) that maximize the log-likelihood given by the following expression:
\(l(\mathbf{\mu, L, \Psi}) = - \dfrac{np}{2}\log{2\pi}- \dfrac{n}{2}\log{|\mathbf{LL' + \Psi}|} - \dfrac{1}{2}\sum_{i=1}^{n}\mathbf{(X_i-\mu)'(LL'+\Psi)^{-1}(X_i-\mu)}\)
The log of the joint probability distribution of the data is maximized. We want to find the values of the parameters, (\(\mu\), \(\mathbf{L}\), and \(\Psi\)), that are most compatible with what we see in the data. As was noted earlier the solutions for these factor models are not unique. Equivalent models can be obtained by rotation. If \(\mathbf{L'\Psi^{-1}L}\) is a diagonal matrix, then we may obtain a unique solution.
Computationally this process is complex. In general, there is no closed-form solution to this maximization problem so iterative methods are applied. Implementation of iterative methods can run into problems as we will see later.
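As an aside, this fit can be sketched in base R with `factanal()`, which performs the iterative maximum likelihood estimation on standardized variables; the data frame name `places` below is a hypothetical placeholder for the nine ratings.

```r
# Sketch: ML factor analysis of a hypothetical 'places' data frame.
fit <- factanal(places, factors = 3, rotation = "none")
fit$loadings       # estimated factor loadings L-hat
fit$uniquenesses   # estimated specific variances Psi-hat
fit$STATISTIC      # likelihood ratio statistic for H0: 3 factors are sufficient
fit$PVALUE         # corresponding p-value
```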
Example 12-2: Places Rated, continued
This method of factor analysis is carried out using the program shown below:
Download the SAS Program here: places3.sas
Here we have specified the Maximum Likelihood Method by setting method=ml. Again, we need to specify the number of factors.
You will notice that this program produces errors and does not complete the factor analysis. We will start out without the Heywood or priors options discussed below to see the error that occurs and how to remedy it.
For m = 3 factors, maximum likelihood estimation fails to converge. An examination of the records of each iteration reveals that the communality of the first variable (Climate) exceeds one during the iterations. Because a communality must lie between 0 and 1, this is the cause of the failure.
SAS provides a number of different fixes for this kind of error. Most fixes adjust the initial guess, or starting value, for the communalities.
The priors option is added within the proc factor line of code (proc factor method=ml nfactors=3 priors=smc;). If we begin with better starting values, we might have better luck at convergence. Unfortunately, in trying each of these options (including running the random option multiple times), we find that they are ineffective for our Places Rated data. The second option, the Heywood option, needs to be considered instead.
With the Heywood option, we start with the same values for the communalities, and at each iteration we obtain new values. The criterion is the quantity that we are trying to minimize in order to obtain our estimates; we can see that it decreases with each iteration of the algorithm.
Iteration | Criterion | Ridge | Change | Communalities |
---|---|---|---|---|
1 | 0.3291161 | 0.0000 | 0.2734 | 0.47254 0.40913 0.73500 0.22107 0.38516 0.26178 0.75125 0.46384 0.15271 |
2 | 0.2946707 | 0.0000 | 0.5275 | 1.00000 0.37872 0.75101 0.20469 0.36111 0.26155 0.75298 0.48979 0.11995 |
3 | 0.2877116 | 0.0000 | 0.0577 | 1.00000 0.41243 0.80868 0.22168 0.38551 0.26263 0.74546 0.53277 0.11601 |
4 | 0.2876330 | 0.0000 | 0.0055 | 1.00000 0.41336 0.81414 0.21647 0.38365 0.26471 0.74493 0.53724 0.11496 |
5 | 0.2876314 | 0.0000 | 0.0007 | 1.00000 0.41392 0.81466 0.21595 0.38346 0.26475 0.74458 0.53794 0.11442 |
You can see that in the second iteration, rather than report a communality greater than one, SAS replaces it with the value one and then proceeds as usual through the iterations.
After five iterations the algorithm converges, as indicated by the statement on the second page of the output. The algorithm converged to a solution in which the communality for Climate is equal to one.
Before we proceed, we would like to determine if the model adequately fits the data. The goodness-of-fit test in this case compares the variance-covariance matrix under a parsimonious model to the variance-covariance matrix without any restriction, i.e. under the assumption that the variances and covariances can take any values. The variance-covariance matrix under the assumed model can be expressed as:
\(\mathbf{\Sigma = LL' + \Psi}\)
\(\mathbf{L}\) is the matrix of factor loadings, and the diagonal elements of \(\boldsymbol{\Psi}\) are equal to the specific variances. This is a very specific structure for the variance-covariance matrix. A more general structure would allow those elements to take any value. To assess goodness-of-fit, we use the Bartlett-corrected likelihood ratio test statistic:
\(X^2 = \left(n-1-\frac{2p+4m+5}{6}\right)\log \frac{|\mathbf{\hat{L}\hat{L}'}+\mathbf{\hat{\Psi}}|}{|\hat{\mathbf{\Sigma}}|}\)
The test is a likelihood ratio test, where two likelihoods are compared, one under the parsimonious model and the other without any restrictions. The constant in the statistic is called the Bartlett correction. The log is the natural log. In the numerator, we have the determinant of the fitted factor model for the variance-covariance matrix, and below, we have a sample estimate of the variance-covariance matrix assuming no structure where:
\(\hat{\boldsymbol{\Sigma}} = \frac{n-1}{n}\mathbf{S}\)
and \(\mathbf{S}\) is the sample variance-covariance matrix. This is just another estimate of the variance-covariance matrix, which includes a small bias. If the factor model fits well, then these two determinants should be about the same and you will get a small value for \(X^2\). However, if the model does not fit well, then the determinants will be different and \(X^2\) will be large.
Under the null hypothesis that the factor model adequately describes the relationships among the variables,
\(X^2 \sim \chi^2_{\frac{(p-m)^2-p-m}{2}} \)
Under this null hypothesis, the test statistic has a chi-square distribution with an unusual set of degrees of freedom, as shown above. The degrees of freedom equal the difference in the number of unique parameters between the two models. We reject the null hypothesis that the factor model adequately describes the data if \(X^2\) exceeds the critical value from the chi-square table.
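The statistic can also be computed directly. The sketch below assumes that the estimated loadings `Lhat` (p x m), the vector of specific variances `Psihat`, the sample covariance matrix `S`, and the sample size `n` are available; these names are placeholders, not SAS output objects.

```r
# Bartlett-corrected likelihood ratio test of H0: m factors are sufficient.
p <- nrow(Lhat); m <- ncol(Lhat)
Sigma_hat <- (n - 1) / n * S                       # slightly biased estimate of Sigma
X2 <- (n - 1 - (2 * p + 4 * m + 5) / 6) *
      log(det(Lhat %*% t(Lhat) + diag(Psihat)) / det(Sigma_hat))
df <- ((p - m)^2 - p - m) / 2
pchisq(X2, df, lower.tail = FALSE)                 # reject H0 for small p-values
```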
Looking just past the iteration results, we have....
Test | DF | Chi-Square | Pr > ChiSq |
---|---|---|---|
\(H_{o}\colon\) No common factors | 36 | 839.4268 | < 0.0001 |
\(H_{A}\colon\) At least one common factor | |||
\(H_{o}\colon\) 3 Factors are sufficient | 12 | 92.6652 | < 0.0001 |
\(H_{A}\colon\) More Factors are needed |
For our Places Rated dataset, we find a significant lack of fit: \(X^2 = 92.67\), d.f. = 12, \(p < 0.0001\). We conclude that the relationships among the variables are not adequately described by the factor model. This suggests that we do not have the correct model.
The only remedy that we can apply in this case is to increase the number m of factors until an adequate fit is achieved. Note, however, that m must satisfy
\(p(m+1) \le \frac{p(p+1)}{2}\)
In the present example, this means that \(m \le 4\).
Let's return to the SAS program and change the "nfactors" value from 3 to 4:
Test | DF | Chi-Square | Pr > ChiSq |
---|---|---|---|
\(H_{o}\colon\) No common factors | 36 | 839.4268 | < 0.0001 |
\(H_{A}\colon\) At least one common factor | |||
\(H_{o}\colon\) 4 Factors are sufficient | 6 | 41.6867 | < 0.0001 |
\(H_{A}\colon\) More Factors are needed |
We find that the factor model with m = 4 does not fit the data adequately either: \(X^2 = 41.69\), d.f. = 6, \(p < 0.0001\). We cannot obtain a well-fitting factor model for these data and conclude that a factor model does not work for this particular dataset. There is something else going on here, perhaps some non-linearity. Whatever the case, it does not look like this yields a good-fitting factor model. The next step could be to drop variables from the data set to obtain a better-fitting model.
From our experience with the Places Rated data, it does not look like the factor model works well. There is no guarantee that any model will fit the data well.
The first motivation of factor analysis was to try to discern some underlying factors describing the data. The Maximum Likelihood Method failed to find such a model to describe the Places Rated data. The second motivation is still valid, which is to try to obtain a better interpretation of the data. In order to do this, let's take a look at the factor loadings obtained before from the principal component method.
Factor | | | |
---|---|---|---|
Variable | 1 | 2 | 3 |
Climate | 0.286 | 0.076 | 0.841 |
Housing | 0.698 | 0.153 | 0.084 |
Health | 0.744 | -0.410 | -0.020 |
Crime | 0.471 | 0.522 | 0.135 |
Transportation | 0.681 | -0.156 | -0.148 |
Education | 0.498 | -0.498 | -0.253 |
Arts | 0.861 | -0.115 | 0.011 |
Recreation | 0.642 | 0.322 | 0.044 |
Economics | 0.298 | 0.595 | -0.533 |
The problem with this analysis is that some of the variables are highlighted in more than one column. For instance, Education appears significant to Factor 1 AND Factor 2. The same is true for Economics in both Factors 2 AND 3. This does not provide a very clean, simple interpretation of the data. Ideally, each variable would appear as a significant contributor in one column.
In fact, the above table may indicate contradictory results. Looking at some of the observations, it is conceivable that we will find an observation that takes a high value on both Factors 1 and 2. If this occurs, a high value for Factor 1 suggests that the community has quality education, whereas a high value for Factor 2 suggests the opposite, that the community has poor education.
Factor rotation is motivated by the fact that factor models are not unique. Recall that the factor model for the data vector, \(\mathbf{X = \boldsymbol{\mu} + LF + \boldsymbol{\epsilon}}\), is a function of the mean \(\boldsymbol{\mu}\), plus a matrix of factor loadings times a vector of common factors, plus a vector of specific factors.
Moreover, we should note that this is equivalent to a rotated factor model, \(\mathbf{X = \boldsymbol{\mu} + L^*F^* + \boldsymbol{\epsilon}}\), where we have set \(\mathbf{L^* = LT}\) and \(\mathbf{f^* = T'f}\) for some orthogonal matrix \(\mathbf{T}\) where \(\mathbf{T'T = TT' = I}\). Note that there are an infinite number of possible orthogonal matrices, each corresponding to a particular factor rotation.
We plan to find an appropriate rotation, defined through an orthogonal matrix \(\mathbf{T}\) , that yields the most easily interpretable factors.
To understand this, consider a scatter plot of factor loadings. The orthogonal matrix \(\mathbf{T}\) rotates the axes of this plot. We wish to find a rotation such that each of the p variables has a high loading on only one factor.
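For an estimated loading matrix, this search can be automated. The sketch below uses base R's `varimax()` on a loading matrix `L` (p x m); `L` is assumed to be available from a previous step, and the check at the end illustrates that rotation leaves \(\mathbf{LL'}\), and hence the communalities, unchanged.

```r
# Sketch: varimax rotation of an estimated loading matrix L.
rot   <- varimax(L)           # from the 'stats' package (Kaiser normalization by default)
Lstar <- rot$loadings         # rotated loadings L* = L T
Tmat  <- rot$rotmat           # the orthogonal rotation matrix T
round(Lstar %*% t(Lstar) - L %*% t(L), 10)   # essentially zero: LL' is unchanged
```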
We will return to the program below to obtain a plot. In looking at the program, there are a number of options (marked in blue under proc factor) that we did not yet explain.
Download the SAS program here: places2.sas
One of the options above is labeled 'preplot'. We will use this to plot the values for factor 1 against factor 2.
In the output these values are plotted, with the loadings for factor 1 on the y-axis and the loadings for factor 2 on the x-axis. Each letter on the plot corresponds to a single variable. The first variable, Climate, labeled with the letter A, has a factor 1 loading of about 0.29 and a factor 2 loading of about 0.08. Similarly, the second variable, Housing, labeled with the letter B, has a factor 1 loading of about 0.7 and a factor 2 loading of about 0.15. SAS provides plots of the other combinations of factors: factor 1 against factor 3, as well as factor 2 against factor 3.
Three factors appear in this model so we might consider a three-dimensional plot of all three factors together.
The selection of the orthogonal matrix \(\mathbf{T}\) corresponds to a rotation of these axes. Think about rotating the axes about the origin; each rotation corresponds to an orthogonal matrix \(\mathbf{T}\). We want to rotate the axes to obtain a cleaner interpretation of the data. We would really like to define new coordinate systems so that when we rotate everything, the points fall close to the vertices (endpoints) of the new axes.
If we were only looking at two factors, then we would like to find each of the plotted points at the four tips (corresponding to all four directions) of the rotated axes. This is what rotation is about, taking the factor pattern plot and rotating the axes in such a way that the points fall close to the axes.
The varimax rotation selects \(\mathbf{T}\) to maximize a criterion equal to the sample variances of the standardized squared loadings for each factor, summed over the m factors.
Returning to the options of the factoring procedure (marked in blue):
"rotate," asks for factor rotation and we specified the Varimax rotation of our factor loadings.
"plot," asks for the same kind of plot that we just looked at for the rotated factors. The result of our rotation is a new factor pattern given below (page 11 of SAS output):
Here is a copy of page 10 from the SAS output:
At the top of page 10 of the output, above, we have our orthogonal matrix T .
The values of the rotated factor loadings are:
Factor | |||
---|---|---|---|
Variable | 1 | 2 | 3 |
Climate | 0.021 | 0.239 | |
Housing | 0.438 | 0.166 | |
Health | 0.127 | 0.137 | |
Crime | 0.031 | 0.139 | |
Transportation | 0.289 | -0.028 | |
Education | -0.094 | -0.117 | |
Arts | 0.432 | 0.150 | |
Recreation | 0.301 | 0.099 | |
Economics | -0.022 | -0.551 |
Let us now interpret the data based on the rotation. We highlighted the values that are large in magnitude and make the following interpretation.
This is just the pattern that exists in the data and no causal inferences should be made from this interpretation. It does not tell us why this pattern exists. It could very well be that there are other essential factors that are not seen at work here.
Let us look at the amount of variation explained by our factors under the rotated model and compare it to the original model. Consider the variance explained by each factor under the original analysis and the rotated factors:
Analysis | ||
---|---|---|
Factor | Original | Rotated |
1 | 3.2978 | 2.4798 |
2 | 1.2136 | 1.9835 |
3 | 1.1055 | 1.1536 |
Total | 5.6169 | 5.6169 |
The total amount of variation explained by the 3 factors remains the same. Rotations, among a fixed number of factors, do not change how much of the variation is explained by the model. The fit is equally good regardless of what rotation is used.
However, notice what happened to the first factor. We see a fairly large decrease in the amount of variation explained by the first factor. We obtained a cleaner interpretation of the data but it costs us something somewhere. The cost is that the variation explained by the first factor is distributed among the latter two factors, in this case mostly to the second factor.
The total amount of variation explained by the rotated factor model is the same, but the contributions are not the same from the individual factors. We gain a cleaner interpretation, but the first factor does not explain as much of the variation. However, this would not be considered a particularly large cost if we are still interested in these three factors.
Rotation cleans up the interpretation. Ideally, we should find that the numbers in each column are either far away from zero or close to zero. Numbers close to +1 or -1 or 0 in each column give the ideal or cleanest interpretation. If a rotation can achieve this goal, then that is wonderful. However, observed data are seldom this cooperative!
Nevertheless, recall that the objective is data interpretation. The success of the analysis can be judged by how well it helps you to make sense of your data. If the result gives you some insight into the pattern of variability in the data, even without being perfect, then the analysis was successful.
Factor scores are similar to the principal components in the previous lesson. Just as we plotted principal components against each other, a similar scatter plot of factor scores is also helpful. We also might use factor scores as explanatory variables in future analyses. It may even be of interest to use the factor score as the dependent variable in a future analysis.
The methods for estimating factor scores depend on the method used to estimate the factor model. The vectors of common factors f are of interest. There are m unobserved factors in our model and we would like to estimate those factors. Therefore, given the factor model:
\(\mathbf{Y_i = \boldsymbol{\mu} + Lf_i + \boldsymbol{\epsilon_i}}; i = 1,2,\dots, n,\)
we may wish to estimate the vectors of factor scores
\(\mathbf{f_1, f_2, \dots, f_n}\)
for each observation.
There are a number of different methods for estimating factor scores from the data. These include ordinary least squares, weighted least squares, and the regression method.
Ordinary least squares: By default, this is the method that SAS uses if you use the principal component method. The difference between the \(j^{th}\) variable on the \(i^{th}\) subject and its value under the factor model is computed. The \(\mathbf{L}\)'s are factor loadings and the f's are the unobserved common factors. The vector of common factors for subject i, \( \hat{\mathbf{f}}_i \), is found by minimizing the sum of the squared residuals:
\[\sum_{j=1}^{p}\epsilon^2_{ij} = \sum_{j=1}^{p}(y_{ij}-\mu_j-l_{j1}f_1 - l_{j2}f_2 - \dots - l_{jm}f_m)^2 = (\mathbf{Y_i - \boldsymbol{\mu} - Lf_i})'(\mathbf{Y_i - \boldsymbol{\mu} - Lf_i})\]
This is like a least squares regression, except in this case we already have estimates of the parameters (the factor loadings), but wish to estimate the explanatory common factors. In matrix notation the solution is expressed as:
\(\mathbf{\hat{f}_i = (L'L)^{-1}L'(Y_i-\boldsymbol{\mu})}\)
In practice, we substitute our estimated factor loadings into this expression as well as the sample mean for the data:
\(\mathbf{\hat{f}_i = \left(\hat{L}'\hat{L}\right)^{-1}\hat{L}'(Y_i-\bar{y})}\)
Using the principal component method with the unrotated factor loadings, this yields:
\[\mathbf{\hat{f}_i} = \left(\begin{array}{c} \frac{1}{\sqrt{\hat{\lambda}_1}}\mathbf{\hat{e}'_1(Y_i-\bar{y})}\\ \frac{1}{\sqrt{\hat{\lambda}_2}}\mathbf{\hat{e}'_2(Y_i-\bar{y})}\\ \vdots \\ \frac{1}{\sqrt{\hat{\lambda}_m}}\mathbf{\hat{e}'_m(Y_i-\bar{y})}\end{array}\right)\]
Here, \(\hat{\mathbf{e}}_1\) through \(\hat{\mathbf{e}}_m\) are the first m eigenvectors.
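A one-line sketch of this estimator in R, assuming the estimated loadings `Lhat`, the sample mean vector `ybar`, and a single observation vector `y_i` are available (all hypothetical names):

```r
# Ordinary least squares factor scores for a single observation y_i.
# For unrotated principal component loadings this reduces to e_j'(y_i - ybar) / sqrt(lambda_j).
f_ols <- solve(t(Lhat) %*% Lhat) %*% t(Lhat) %*% (y_i - ybar)
```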
Weighted least squares: The difference between WLS and OLS is that the squared residuals are divided by the specific variances, as shown below. This gives more weight, in this estimation, to variables with low specific variances. The factor model fits the data best for variables with low specific variances, so those variables should give us more information regarding the true values of the common factors.
Therefore, for the factor model:
\(\mathbf{Y_i = \boldsymbol{\mu} + Lf_i + \boldsymbol{\epsilon_i}}\)
we want to find \(\boldsymbol{f_i}\) that minimizes
\( \sum\limits_{j=1}^{p}\frac{\epsilon^2_{ij}}{\Psi_j} = \sum\limits_{j=1}^{p}\frac{(y_{ij}-\mu_j - l_{j1}f_1 - l_{j2}f_2 -\dots - l_{jm}f_m)^2}{\Psi_j} = \mathbf{(Y_i-\boldsymbol{\mu}-Lf_i)'\Psi^{-1}(Y_i-\boldsymbol{\mu}-Lf_i)}\)
The solution is given by this expression where \(\mathbf{\Psi}\) is the diagonal matrix whose diagonal elements are equal to the specific variances:
\(\mathbf{\hat{f}_i = (L'\Psi^{-1}L)^{-1}L'\Psi^{-1}(Y_i-\boldsymbol{\mu})}\)
and can be estimated by substituting the following:
\(\mathbf{\hat{f}_i = (\hat{L}'\hat{\Psi}^{-1}\hat{L})^{-1}\hat{L}'\hat{\Psi}^{-1}(Y_i-\bar{y})}\)
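The corresponding sketch for the weighted estimator (often called Bartlett scores), with `Psihat` the vector of estimated specific variances and `Lhat`, `y_i`, `ybar` as before:

```r
# Weighted least squares factor scores for a single observation y_i.
W     <- diag(1 / Psihat)                                    # weights = inverse specific variances
f_wls <- solve(t(Lhat) %*% W %*% Lhat) %*% t(Lhat) %*% W %*% (y_i - ybar)
```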
Regression method: This method is used with maximum likelihood estimates of the factor loadings. A vector of the observed data, supplemented by the vector of common factors for the i th subject, is considered.
The joint distribution of the data \(\boldsymbol{Y}_i\) and the factor \(\boldsymbol{f}_i\) is
\(\left(\begin{array}{c}\mathbf{Y_i} \\ \mathbf{f_i}\end{array}\right) \sim N \left[\left(\begin{array}{c}\mathbf{\boldsymbol{\mu}} \\ 0 \end{array}\right), \left(\begin{array}{cc}\mathbf{LL'+\Psi} & \mathbf{L} \\ \mathbf{L'} & \mathbf{I}\end{array}\right)\right]\)
Using this we can calculate the conditional expectation of the common factor score \(\boldsymbol{f}_i\) given the data \(\boldsymbol{Y}_i\) as expressed here:
\(E(\mathbf{f_i|Y_i}) = \mathbf{L'(LL'+\Psi)^{-1}(Y_i-\boldsymbol{\mu})}\)
This suggests the following estimator by substituting in the estimates for L and \(\mathbf{\Psi}\):
\(\mathbf{\hat{f}_i = \hat{L}'\left(\hat{L}\hat{L}'+\hat{\Psi}\right)^{-1}(Y_i-\bar{y})}\)
A small adjustment is often made to reduce the effect of a possibly incorrect determination of the number of factors; it tends to give somewhat more stable results. The adjusted estimator replaces \(\hat{\mathbf{L}}\hat{\mathbf{L}}' + \hat{\boldsymbol{\Psi}}\) with the sample variance-covariance matrix \(\mathbf{S}\):
\(\mathbf{\tilde{f}_i = \hat{L}'S^{-1}(Y_i-\bar{y})}\)
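A sketch of both forms of the regression estimator, using the same hypothetical `Lhat`, `Psihat`, `S`, `y_i`, and `ybar` as in the earlier snippets:

```r
# Regression-method factor scores for a single observation y_i.
f_reg   <- t(Lhat) %*% solve(Lhat %*% t(Lhat) + diag(Psihat)) %*% (y_i - ybar)
f_reg_S <- t(Lhat) %*% solve(S) %*% (y_i - ybar)   # stabilized version using S in place of LL' + Psi
```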
In this lesson we learned about the factor model and its assumptions; estimating factor loadings, communalities, and specific variances with the principal component and maximum likelihood methods; assessing model fit; factor rotation; and estimating factor scores.
Like many methods encountered by those studying psychology , factor analysis has a long history.
It was originally discussed by British psychologist Charles Spearman in the early 20th century and has gone on to be used not only in psychology but also in other fields that rely on statistical analyses.
But what is it, what are some real-world examples, and what are the different types? In this article, we'll answer all of those questions.
The primary goal of factor analysis is to distill a large data set into a working set of connections or factors. Dr. Jessie Borelli, PhD , who works at the University of California-Irvine, uses factor analysis in her work on attachment.
She is doing research that looks into how people perceive relationships and how they connect to one another. She gives the example of providing a hypothetical questionnaire with 100 items on it and using factor analysis to drill deeper into the data. "So, rather than looking at each individual item on its own I'd rather say, 'Is there any way in which these items kind of cluster together or go together so that I can... create units of analysis that are bigger than the individual items.'"
Factor analysis is looking to identify patterns where it is assumed that there are already connections between areas of the data.
One common example of a factor analysis is when you are taking something not easily quantifiable, like socio-economic status , and using it to group together highly correlated variables like income level and types of jobs.
Factor analysis isn't just used in psychology but also deployed in fields like sociology, business, and technology sector fields like machine learning.
There are two types of factor analysis that are most commonly used: exploratory factor analysis and confirmatory factor analysis.
In an exploratory analysis, you are being a little bit more open-minded as a researcher because you are using this type of analysis to provide some clarity in your data set that you haven't yet found. It's an approach that Borelli uses in her own research.
On the other hand, if you're using a confirmatory factor analysis you are using the assumptions or theoretical findings you have already identified to drive your statistical model.
Unlike in an exploratory factor analysis, where the relationships between factors and variables are more open, a confirmatory factor analysis requires you to select which variables you are testing for. In Borelli's words:
"When you do a confirmatory factor analysis, you kind of tell your analytic program what you think the data should look like, in terms of, 'I think it should have these two factors and this is the way I think it should look.'"
Let's take a look at the advantages and disadvantages of factor analysis.
A main advantage of a factor analysis is that it allows researchers to reduce a number of variables by combining them into a single factor.
When answering your research questions, it's a lot easier to be working with three variables than thirty, for example.
Disadvantages include that factor analysis relies on the quality of the data and may allow for different interpretations of the data. For example, during one study, Borelli found that after deploying a factor analysis, she was still left with results that didn't connect well with what had been found in hundreds of other studies.
Due to the nature of the sample being new and being more culturally diverse than others being explored, she used an exploratory factor analysis that left her with more questions than answers.
The goal of factor analysis in psychology is often to make connections that allow researchers to develop models with common factors in ways that might be hard or impossible to observe otherwise.
So, for example, intelligence is a difficult concept to directly observe. However, it can be inferred from factors that we can directly measure on specific tests.
Factor analysis has often been used in the field of psychology to help us better understand the structure of personality.
This is due to the multitude of factors researchers have to consider when it comes to understanding the concept of personality. This area of personality research is certainly not new, with easily findable research dating as far back as 1942 recognizing its power in personality research.
Britannica. Charles E. Spearman.
United States Environmental Protection Agency. Exploratory Data Analysis.
Flora DB, Curran PJ. An empirical evaluation of alternative methods of estimation for confirmatory factor analysis with ordinal data. Psychol Methods. 2004;9(4):466-491. doi:10.1037/1082-989X.9.4.466
Wolfle D. Factor analysis in the study of personality. The Journal of Abnormal and Social Psychology. 1942;37(3):393-397.
Introduction to confirmatory factor analysis.
Confirmatory Factor Analysis (CFA) is a sophisticated statistical technique used to verify the factor structure of a set of observed variables. It allows researchers to test the hypothesis that a relationship between observed variables and their underlying latent constructs exists. CFA is distinct from Exploratory Factor Analysis (EFA), where the structure of the data is not predefined and is instead determined through the analysis.
The primary goal of CFA is to confirm whether the data fits a hypothesized measurement model based on theory or prior research. This involves several critical steps:
1. Defining Constructs : The process begins by clearly defining the theoretical constructs. This stage often involves a pretest to evaluate the construct’s items and ensure they are well-defined and represent the concept accurately.
2. Developing the Measurement Model : In CFA, it is essential to establish the concept of unidimensionality, where each factor or construct is represented by multiple observed variables that are presumed to measure only that specific construct. Typically, a good practice involves having at least three items per construct.
3. Specifying the Model : Researchers must specify the number of factors and the pattern of loadings (which variables load on which factors). This specification is based on theoretical expectations or results from previous studies.
4. Assessing Model Fit : The validity of the measurement model is assessed by comparing the theoretical model with the actual data. This includes examining factor loadings (with a standard threshold of 0.7 or higher for adequate loadings), and fit indices such as Chi-square, Root Mean Square Error of Approximation (RMSEA), Goodness of Fit Index (GFI), and Comparative Fit Index (CFI).
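These steps can be sketched in R with the lavaan package (one common choice, assumed here rather than prescribed by this page). The indicator names x1 through x6, the two constructs, and the data frame `dat` are hypothetical placeholders.

```r
# Sketch of a two-factor CFA with hypothetical indicators x1-x6 in a data frame 'dat'.
library(lavaan)

model <- '
  construct1 =~ x1 + x2 + x3
  construct2 =~ x4 + x5 + x6
'
fit <- cfa(model, data = dat)
summary(fit, standardized = TRUE)            # inspect standardized loadings (0.7 or higher preferred)
fitMeasures(fit, c("chisq", "df", "pvalue", "rmsea", "gfi", "cfi"))
```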
Can the proposed five factors in a 20-question instrument be identified and validated through the specific items designed to measure them?
Do four specific survey questions reliably measure a single underlying factor?
Multivariate Normality : The data should follow a multivariate normal distribution.
Sample Size : Adequate sample size is crucial, generally n > 200, to ensure reliable results.
Model Specification : The model should be correctly specified a priori based on theoretical or empirical justification.
Random Sampling : Data must be collected from a random sample to generalize findings.
CFA is an essential tool in the toolkit of researchers aiming to validate the structure of their measurement instruments. It provides a rigorous method to ensure that the data aligns with expected theoretical constructs, enhancing the reliability and validity of subsequent analyses based on these measurements.
Confirmatory factor analysis (CFA) and statistical software:
Usually, statistical software like Intellectus Statistics, AMOS, LISREL, and SAS is used for confirmatory factor analysis. In AMOS, visual paths are drawn manually in the graphic window and the analysis is then run. In LISREL, confirmatory factor analysis can be performed graphically as well as from the menu. In SAS, confirmatory factor analysis is performed through the programming language.
Surveys can be a rich source of information, including not only factual questions but also questions about attitudes, behaviours, and activities. The results from a survey analysis can also provide more than just percentages, averages and crosstabulations.
Factor analysis is a statistical technique that combines questions that are related (correlated) into a smaller number of factors, to create more robust measures.
By combining questions or variables and using the resulting measures rather than analysing and reporting the questions individually, factor analysis is useful as a dimensionality-reduction technique (other dimension-reduction techniques include Principal Component Analysis [PCA], for example). And being based on correlations it can help to avoid some of the problems of collinearity that can arise in analyses. Furthermore, factors can often provide more meaningful results, by capturing overall, intrinsic characteristics and qualities, rather than individual, separate questions.
It is worth noting though that factor analysis can be used with many types of data, not just with survey responses. It can be used to analyse, for example, items bought in shops or supermarkets, time spent in different office areas (solo pods, meeting rooms, conference spaces, etc.), patient reported outcomes (PROs) (e.g., of pain or depression), and so on. Hence, factor analysis can not only help you to understand your students’, your customers’ or your employees’ attitudes and opinions, it can be used to help uncover their preferences and behaviours via transaction or office utilisation data, for example.
In this blog we show factor analysis in action.
A factor, sometimes called a latent trait or construct, is an intrinsic characteristic or quality. Factors are multi-faceted and difficult to measure directly; examples include qualities such as empathy, IQ, self-confidence, or ethos.
The theory of factor analysis is that these deeper level factors or latent traits underpin your actions and attitudes and also influence your responses to questions about these topics.
To illustrate factor analysis, we use some data from the Organisation for Economic Co-operation and Development's (OECD) Programme for International Student Assessment (PISA) as an example. The PISA study runs every 3 years, across many countries, and assesses the numeracy, literacy and science knowledge and skills of 15-year-old students. The OECD make PISA data available for secondary analyses.
PISA also includes a teacher questionnaire. The 2018 questionnaire asked teachers to answer the following set of questions:
Our example is based on the responses of teachers in the UK.
The first step in factor analysis is to calculate the correlations between each of the questions. As the responses are on a Likert scale (from 'strongly disagree' to 'strongly agree') and are ordered categorical (ordinal) data rather than on a continuous scale, we calculate polychoric correlations, which are appropriate for these sorts of data (unlike, for example, Pearson product-moment correlations).
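A minimal sketch of this step is shown below, using the polychoric() function in the psych package in R. The data frame name (teacher_items) and the response coding are assumptions for illustration, not the actual PISA file.

```r
# Sketch: polychoric correlations for ordinal Likert responses.
# 'teacher_items' is a placeholder data frame holding the 10 Likert items,
# coded as integers (e.g., 1 = strongly disagree ... 4 = strongly agree).
library(psych)

poly <- polychoric(teacher_items)

# The polychoric correlation matrix used as input to the factor analysis
round(poly$rho, 2)
```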
In the next step, the appropriate number of factors to be extracted is determined (ensuring, for example, that a sufficient proportion of the variation in the data is explained), and factor solutions are calculated based on the correlations (e.g., perhaps using the fa() function in the psych package in the statistical software R). Factor analysis groups together questions that are highly correlated to derive a smaller set of factors that retain a high proportion of the information in the original questions. We won't go into the detail of how this is done here (as it's not the focus of this post), but in our example we find that 2 factors are potentially a good solution.
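For readers who want to try this themselves, one common workflow in psych is sketched below: parallel analysis to suggest the number of factors, followed by extraction of a 2-factor solution. The settings shown (maximum likelihood extraction, oblimin rotation) are illustrative defaults, not necessarily the exact options used for the PISA example.

```r
# Sketch: choosing and extracting factors with the psych package.
library(psych)

# Parallel analysis: compares observed eigenvalues with those from random
# data to suggest how many factors to retain
fa.parallel(teacher_items, cor = "poly", fa = "fa")

# Extract a 2-factor solution from the polychoric correlations
fa_fit <- fa(teacher_items, nfactors = 2, cor = "poly",
             fm = "ml", rotate = "oblimin")
```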
A key aim of factor analysis is to obtain factors that are interpretable. To interpret the factors, we look at the “factor loadings” from the factor analysis output. Each factor has a set of factor loadings corresponding to the input questions. These are the correlations between each input question and the factor, the underlying latent construct.
We identify the questions that are strongly correlated with each of the underlying factors. Strong correlations are indicated by values close to +1 or to -1 (positively and negatively correlated, respectively); weaker correlations are indicated by values closer to zero.
In the example below, we see that the 1st, 2nd, 4th and 6th statements are strongly correlated with factor 1, and the 3rd, 5th and 7th statements are strongly correlated with factor 2. (These are indicated by the shaded cells.) We note, though, that statements 8, 9 and 10 also correlate with the factors, but to a lesser extent.
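If you are reproducing this in R with the psych object from the earlier sketch, a quick way to see this pattern is to print the loadings with small values suppressed; the cutoff value of 0.4 below is an illustrative choice, not a rule from this example.

```r
# Sketch: inspecting the factor loadings, suppressing small values so the
# pattern of strong correlations is easier to see.
# 'fa_fit' is the psych::fa() object from the previous step.
print(fa_fit$loadings, cutoff = 0.4, sort = TRUE)
```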
Factor 1: satisfaction with teaching.
The strong correlations between the statements with shaded cells and factor 1 indicate that teachers who agreed that “The advantages of being a teacher clearly outweigh the disadvantages” also tended to agree with the statement “If I could decide again, I would still choose to work as a teacher”. In addition, teachers who agreed with these first 2 statements also tended to disagree (indicated by the negative correlation) with the statements “I regret that I decided to become a teacher” and “I wonder whether it would have been better to choose another profession.”
The converse is also true; teachers who disagreed with the first two statements tended to agree with the latter two shaded statements.
Collectively these four statements provide a measure of teachers’ satisfaction with being a teacher: their satisfaction with their profession.
In factor 2, the strong correlations between the 3rd, 5th and 7th statements (with shaded cells) and the factor indicate that teachers who agreed that “I enjoy working at this school” also tended to agree with the statement “I would recommend my school as a good place to work”, and additionally tended to disagree (indicated by the negative correlation) with the statement “I would like to change to another school if that were possible”. Again, the converse is also true; teachers who disagreed with the first two statements tended to agree with the third.
Collectively these three statements provide a measure of teachers’ satisfaction with their particular school.
Ideally the correlations between statements and factors should show associations between each statement and only one of the factors (or neither of the factors). In the factor loading matrix above, the final statement (“All in All, I Am Satisfied with My Job”) is positively correlated with both Factor 1: teachers’ satisfaction with their profession, and Factor 2: teachers’ satisfaction with their school, though the correlations are weaker than for the shaded statements (and are described as moderate rather than strong). It is quite sensible in terms of interpretation that teachers’ overall job satisfaction is (positively) related to both their satisfaction with the profession and with their school. However, since the correlations associated with this statement are not strong, the factors may be improved by excluding this statement, and the other statements with small correlations, from the analysis, as they may be adding more noise than information.
Having identified and interpreted the factors, we can use the data and the factor solution to calculate factor scores: in this case use the teachers’ responses to calculate the ‘satisfaction with the profession’ and ‘satisfaction with their school’ measures for each teacher. (A weighted combination of the factor loadings multiplied by the corresponding question responses gives the factor score, measuring the relative magnitude of each factor (i.e., trait), for each teacher.)
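A sketch of this scoring step, continuing the earlier psych example, is shown below; the resulting columns are the two satisfaction measures, one pair of values per teacher. Object names carry over from the previous sketches and remain placeholders.

```r
# Sketch: computing factor scores for each teacher from the fitted solution.
# factor.scores() in psych combines the item responses with the loadings;
# the default is a regression-based weighting.
scores <- factor.scores(teacher_items, fa_fit)$scores

# One 'satisfaction with profession' and one 'satisfaction with school'
# score per teacher (column names follow the extraction method used)
head(scores)
```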
Teachers’ scores, whether they are high or low, or nearer the average, will reflect (because they are calculated from) their levels of agreement and disagreement and the strength of their opinions. And so, we have taken responses to (in this case) 10 categorical variables and created two scale measures (continuous variables), which provide more robust measures of teachers’ satisfaction than the survey questions individually.
While factor analysis is a technique in its own right, it is not usually the analysis outcome itself. The derived factors can be really useful when used in subsequent analyses. They can be used to compare or describe different groups of teachers, for example to answer hypotheses such as are older teachers more satisfied with their profession than younger teachers? They can be used in statistical models, for example to explore whether and how students’ outcomes vary according to their teachers’ levels of satisfaction or what are the drivers of teachers’ satisfaction? They can be used with cluster analysis to identify groups of teachers according to their characteristics. Similarly, in another context, with a customer or brand survey, for example, we could investigate whether customer satisfaction might be associated with a particular customer demographic, or cluster customers into different groups based on their attitudes, opinions, preferences, and shopping behaviours, to better understand your customer base and brand positioning, and target products and/or advertisements accordingly.
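To make this concrete, the sketch below shows two such follow-up analyses using the factor scores from the earlier sketches: a group comparison and a simple cluster analysis. The background data frame (teacher_info) and its age_group column are hypothetical, introduced only for illustration.

```r
# Sketch: using the factor scores in follow-up analyses.
# 'teacher_info' is a hypothetical data frame of background variables
# (e.g., an age_group column) for the same teachers, in the same row order.
colnames(scores) <- c("satisfaction_profession", "satisfaction_school")
teacher_data <- cbind(teacher_info, scores)

# Compare profession satisfaction across (hypothetical) age groups
aggregate(satisfaction_profession ~ age_group, data = teacher_data, FUN = mean)

# Simple k-means clustering of teachers on the two factor scores
clusters <- kmeans(scores, centers = 3)
table(clusters$cluster)
```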