
Factor Analysis Guide with an Example

By Jim Frost

What is Factor Analysis?

Factor analysis uses the correlation structure amongst observed variables to model a smaller number of unobserved, latent variables known as factors. Researchers use this statistical method when subject-area knowledge suggests that latent factors cause observable variables to covary. Use factor analysis to identify the hidden variables.

Analysts often refer to the observed variables as indicators because they literally indicate information about the factor. Factor analysis treats these indicators as linear combinations of the factors in the analysis plus an error. The procedure assesses how much of the variance each factor explains within the indicators. The idea is that the latent factors create commonalities in some of the observed variables.

For example, socioeconomic status (SES) is a factor you can’t measure directly. However, you can assess occupation, income, and education levels. These variables all relate to socioeconomic status. People with a particular socioeconomic status tend to have similar values for the observable variables. If the factor (SES) has a strong relationship with these indicators, then it accounts for a large portion of the variance in the indicators.

The illustration below shows how the four hidden factors in blue drive the measurable values in the yellow indicator tags.

Factor analysis illustration.

Researchers frequently use factor analysis in psychology, sociology, marketing, and machine learning.

Let’s dig deeper into the goals of factor analysis, critical methodology choices, and an example. This guide provides practical advice for performing factor analysis.

Analysis Goals

Factor analysis simplifies a complex dataset by taking a larger number of observed variables and reducing them to a smaller set of unobserved factors. Anytime you simplify something, you’re trading off exactness with ease of understanding. Ideally, you obtain a result where the simplification helps you better understand the underlying reality of the subject area. However, this process involves several methodological and interpretative judgment calls. Indeed, while the analysis identifies factors, it’s up to the researchers to name them! Consequently, analysts debate factor analysis results more often than other statistical analyses.

While all factor analysis aims to find latent factors, researchers use it for two primary goals. They either want to explore and discover the structure within a dataset or confirm the validity of existing hypotheses and measurement instruments.

Exploratory Factor Analysis (EFA)

Researchers use exploratory factor analysis (EFA) when they do not already have a good understanding of the factors present in a dataset. In this scenario, they use factor analysis to find the factors within a dataset containing many variables. Use this approach before forming hypotheses about the patterns in your dataset. In exploratory factor analysis, researchers are likely to use statistical output and graphs to help determine the number of factors to extract.

Exploratory factor analysis is most effective when multiple variables are related to each factor. During EFA, the researchers must decide how to conduct the analysis (e.g., number of factors, extraction method, and rotation) because there are no hypotheses or assessment instruments to guide them. Use the methodology that makes sense for your research.

For example, researchers can use EFA to create a scale, a set of questions measuring one factor. Exploratory factor analysis can find the survey items that load on certain constructs.

Confirmatory Factor Analysis (CFA)

Confirmatory factor analysis (CFA) is a more rigid process than EFA. Using this method, the researchers seek to confirm existing hypotheses developed by themselves or others. This process aims to confirm previous ideas, research, and measurement and assessment instruments. Consequently, the nature of what they want to verify will impose constraints on the analysis.

Before the factor analysis, the researchers must state their methodology including extraction method, number of factors, and type of rotation. They base these decisions on the nature of what they’re confirming. Afterwards, the researchers will determine whether the model’s goodness-of-fit and pattern of factor loadings match those predicted by the theory or assessment instruments.

In this vein, confirmatory factor analysis can help assess construct validity. The underlying constructs are the latent factors, while the items in the assessment instrument are the indicators. Similarly, it can also evaluate the validity of measurement systems. Does the tool measure the construct it claims to measure?

For example, researchers might want to confirm factors underlying the items in a personality inventory. Matching the inventory and its theories will impose methodological choices on the researchers, such as the number of factors.

We’ll get to an example factor analysis in short order, but first, let’s cover some key concepts and methodology choices you’ll need to know for the example.

Learn more about Validity and Construct Validity.

In this context, factors are broader concepts or constructs that researchers can’t measure directly. These deeper factors drive other observable variables. Consequently, researchers infer the properties of unobserved factors by measuring variables that correlate with the factor. In this manner, factor analysis lets researchers identify factors they can’t evaluate directly.

Psychologists frequently use factor analysis because many of the factors they study are inherently unobservable; they exist inside the human brain.

For example, depression is a condition inside the mind that researchers can’t directly observe. However, they can ask questions and make observations about different behaviors and attitudes. Depression is an invisible driver that affects many outcomes we can measure. Consequently, people with depression will tend to have more similar responses to those outcomes than those who are not depressed.

For similar reasons, factor analysis in psychology often identifies and evaluates other mental characteristics, such as intelligence, perseverance, and self-esteem. The researchers can see how a set of measurements load on these factors and others.

Method of Factor Extraction

The first methodology choice for factor analysis is the mathematical approach for extracting the factors from your dataset. The most common choices are maximum likelihood (ML), principal axis factoring (PAF), and principal components analysis (PCA).

You should use either ML or PAF most of the time.

Use ML when your data follow a normal distribution. In addition to extracting factor loadings, it also can perform hypothesis tests, construct confidence intervals, and calculate goodness-of-fit statistics.

Use PAF when your data violate multivariate normality. PAF doesn’t assume that your data follow any distribution, so you could also use it when they are normally distributed. However, this method can’t provide all the statistical measures that ML can.

PCA is the default method for factor analysis in some statistical software packages, but it isn’t a factor extraction method. It is a data reduction technique to find components. There are technical differences, but in a nutshell, factor analysis aims to reveal latent factors while PCA is only for data reduction. While calculating the components, PCA doesn’t assess the underlying commonalities that unobserved factors cause.

PCA gained popularity because it was a faster algorithm during a time of slower, more expensive computers. If you’re using PCA for factor analysis, do some research to be sure it’s the correct method for your study. Learn more about PCA in Principal Component Analysis Guide and Example.

There are other methods of factor extraction, but the factor analysis literature has not strongly shown that any of them are better than maximum likelihood or principal axis factoring.
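
To make the extraction choice concrete, here is a minimal sketch using Python’s factor_analyzer package (my choice for illustration; the post doesn’t prescribe software). Note that factor_analyzer doesn’t implement PAF under that name; its default "minres" method is a closely related least-squares approach that likewise avoids the normality assumption.

```python
# Minimal sketch: comparing extraction methods with factor_analyzer
# (pip install factor_analyzer). Assumes the post's dataset is saved locally.
import pandas as pd
from factor_analyzer import FactorAnalyzer

df = pd.read_csv("FactorAnalysis.csv")  # the CSV linked later in this post

# Maximum likelihood: appropriate when the indicators are roughly normal.
fa_ml = FactorAnalyzer(n_factors=5, method="ml", rotation=None).fit(df)

# Least squares ("minres"): no distributional assumption, similar in spirit to PAF.
fa_ls = FactorAnalyzer(n_factors=5, method="minres", rotation=None).fit(df)

print(pd.DataFrame(fa_ml.loadings_, index=df.columns).round(2))
print(pd.DataFrame(fa_ls.loadings_, index=df.columns).round(2))
```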

Number of Factors to Extract

You need to specify the number of factors to extract from your data except when using principal components analysis. The method for determining that number depends on whether you’re performing exploratory or confirmatory factor analysis.

Exploratory Factor Analysis

In EFA, researchers must specify the number of factors to retain. The maximum number of factors you can extract equals the number of variables in your dataset. However, you typically want to reduce the number of factors as much as possible while maximizing the total amount of variance the factors explain.

That’s the notion of a parsimonious model in statistics. When adding factors, there are diminishing returns. At some point, you’ll find that an additional factor doesn’t substantially increase the explained variance. That’s when adding factors needlessly complicates the model. Go with the simplest model that explains most of the variance.

Fortunately, a simple statistical tool known as a scree plot helps you manage this tradeoff.

Use your statistical software to produce a scree plot. Then look for the bend in the data where the curve flattens. The number of points before the bend is often the correct number of factors to extract.

The scree plot below relates to the factor analysis example later in this post. The graph displays the Eigenvalues by the number of factors. Eigenvalues relate to the amount of explained variance.

Scree plot that helps us decide the number of factors to extract.

The scree plot shows the bend in the curve occurring at factor 6. Consequently, we need to extract five factors. Those five explain most of the variance. Additional factors do not explain much more.

Some analysts and software use Eigenvalues > 1 to retain a factor. However, simulation studies have found that this tends to extract too many factors and that the scree plot method is better (Costello & Osborne, 2005).

Of course, as you explore your data and evaluate the results, you can use theory and subject-area knowledge to adjust the number of factors. The factors and their interpretations must fit the context of your study.
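
If your software doesn’t produce a scree plot directly, it’s easy to build one from the eigenvalues. Here is a minimal sketch in Python, again assuming the factor_analyzer package and the DataFrame df from the earlier sketch:

```python
# Minimal sketch: scree plot from the eigenvalues of the correlation matrix.
import matplotlib.pyplot as plt
from factor_analyzer import FactorAnalyzer

fa = FactorAnalyzer(rotation=None)
fa.fit(df)
eigenvalues, _ = fa.get_eigenvalues()  # first array: correlation-matrix eigenvalues

plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, marker="o")
plt.axhline(1, linestyle="--")  # the eigenvalue > 1 rule, shown for reference only
plt.xlabel("Factor number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()
# Look for the bend where the curve flattens; the points before it
# suggest the number of factors to extract.
```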

Confirmatory Factor Analysis

In CFA, researchers specify the number of factors to retain using existing theory or measurement instruments before performing the analysis. For example, if a measurement instrument purports to assess three constructs, then the factor analysis should extract three factors and see if the results match theory.

Factor Loadings

In factor analysis, the loadings describe the relationships between the factors and the observed variables. By evaluating the factor loadings, you can understand the strength of the relationship between each variable and the factor. Additionally, you can identify the observed variables corresponding to a specific factor.

Interpret loadings like correlation coefficients. Values range from -1 to +1. The sign indicates the direction of the relationship (positive or negative), while the absolute value indicates the strength. Stronger relationships have factor loadings closer to -1 and +1. Weaker relationships are close to zero.

Stronger relationships in the factor analysis context indicate that the factors explain much of the variance in the observed variables.

Related post: Correlation Coefficients

Factor Rotations

In factor analysis, the initial set of loadings is only one of an infinite number of possible solutions that describe the data equally well. Unfortunately, the initial answer is frequently difficult to interpret because each factor can contain middling loadings for many indicators. That makes the factors hard to label. You want to say that particular variables correlate strongly with a factor while most others do not correlate at all. A sharp contrast between high and low loadings makes that easier.

Rotating the factors addresses this problem by maximizing and minimizing the entire set of factor loadings. The goal is to produce a limited number of high loadings and many low loadings for each factor.

This combination lets you identify the relatively few indicators that strongly correlate with a factor and the larger number of variables that do not correlate with it. You can more easily determine what relates to a factor and what does not. This condition is what statisticians mean by simplifying factor analysis results and making them easier to interpret.

Graphical illustration

Let me show you how factor rotations work graphically using scatterplots.

Factor analysis starts by calculating the pattern of factor loadings. However, it picks an arbitrary set of axes by which to report them. Rotating the axes while leaving the data points unaltered keeps the original model and data pattern in place while producing more interpretable results.

To make this graphable in two dimensions, we’ll use two factors represented by the X and Y axes. On the scatterplot below, the six data points represent the observed variables, and the X and Y coordinates indicate their loadings for the two factors. Ideally, the dots fall right on an axis because that shows a high loading for that factor and a zero loading for the other.

Scatterplot of the initial factor loadings.

For the initial factor analysis solution on the scatterplot, the points contain a mixture of both X and Y coordinates and aren’t close to a factor’s axis. That makes the results difficult to interpret because the variables have middling loads on all the factors. Visually, they’re not clumped near axes, making it difficult to assign the variables to one.

Rotating the axes around the scatterplot increases or decreases the X and Y values while retaining the original pattern of data points. At the blue rotation on the graph below, you maximize one factor loading while minimizing the other for all data points. The result is that each variable loads highly on one factor but weakly on the other.

Scatterplot of rotated loadings in a factor analysis.

On the graph, all data points cluster close to one of the two factors on the blue rotated axes, making it easy to associate the observed variables with one factor.

Types of Rotations

Throughout these rotations, you work with the same data points and factor analysis model. The model fits the data for the rotated loadings equally as well as the initial loadings, but they’re easier to interpret. You’re using a different coordinate system to gain a different perspective of the same pattern of points.

There are two fundamental types of rotation in factor analysis, oblique and orthogonal.

Oblique rotations allow correlation amongst the factors, while orthogonal rotations assume they are entirely uncorrelated.

Graphically, orthogonal rotations enforce a 90° separation between axes, as shown in the example above, where the rotated axes form right angles.

Oblique rotations are not required to have axes forming right angles, as shown below for a different dataset.

Oblique rotation for a factor analysis.

Notice how the freedom for each axis to take any orientation allows them to fit the data more closely than when enforcing the 90° constraint. Consequently, oblique rotations can produce simpler structures than orthogonal rotations in some cases. However, these results can contain correlated factors.

Common rotation methods of each type:

Oblique: Promax, Oblimin, Direct Quartimin
Orthogonal: Varimax, Equimax, Quartimax

In practice, oblique rotations produce similar results as orthogonal rotations when the factors are uncorrelated in the real world. However, if you impose an orthogonal rotation on genuinely correlated factors, it can adversely affect the results. Despite the benefits of oblique rotations, analysts tend to use orthogonal rotations more frequently, which might be a mistake in some cases.

When choosing a rotation method in factor analysis, be sure it matches your underlying assumptions and subject-area knowledge about whether the factors are correlated.
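
As a concrete illustration, here is a minimal sketch that fits the same model with an orthogonal rotation (varimax) and an oblique one (promax) so you can compare the loading patterns. The factor_analyzer package and the DataFrame df are assumed, as before:

```python
# Minimal sketch: orthogonal vs. oblique rotation of the same solution.
from factor_analyzer import FactorAnalyzer

for rotation in ("varimax", "promax"):
    fa = FactorAnalyzer(n_factors=5, method="ml", rotation=rotation)
    fa.fit(df)
    print(rotation, "loadings:")
    print(fa.loadings_.round(2))

# After an oblique rotation, the factors themselves may correlate;
# factor_analyzer exposes the factor correlation matrix as `phi_`.
print(fa.phi_.round(2))
```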

Factor Analysis Example

Imagine that we are human resources researchers who want to understand the underlying factors for job candidates. We measured 12 variables and performed factor analysis to identify the latent factors. Download the CSV dataset: FactorAnalysis

The first step is to determine the number of factors to extract. Earlier in this post, I displayed the scree plot, which indicated we should extract five factors. If necessary, we can perform the analysis with a different number of factors later.

For the factor analysis, we’ll assume normality and use Maximum Likelihood to extract the factors. I’d prefer to use an oblique rotation, but my software only has orthogonal rotations. So, we’ll use Varimax. Let’s perform the analysis!

Interpreting the Results

Statistical output for the factor analysis example.

In the bottom right of the output, we see that the five factors account for 81.8% of the variance. The %Var row along the bottom shows how much of the variance each explains. The five factors are roughly equal, explaining between 13.5% and 19% of the variance. Learn about Variance.

The Communality column displays the proportion of the variance the five factors explain for each variable. Values closer to 1 are better. The five factors explain the most variance for Resume (0.989) and the least for Appearance (0.643).

In the factor analysis output, the circled loadings show which variables have high loadings for each factor. As shown in the table below, we can assign labels encompassing the properties of the highly loading variables for each factor.

Factor 1 (Relevant Background): Academic record, Potential, Experience
Factor 2 (Personal Characteristics): Confidence, Likeability, Appearance
Factor 3 (General Work Skills): Organization, Communication
Factor 4 (Writing Skills): Letter, Resume
Factor 5 (Overall Fit): Company Fit, Job Fit

In summary, these five factors explain a large proportion of the variance, and we can devise reasonable labels for each. These five latent factors drive the values of the 12 variables we measured.
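
For readers who want to reproduce something like this analysis outside of the software used here, below is a minimal sketch with Python’s factor_analyzer package; exact numbers will differ slightly across implementations.

```python
# Minimal sketch: the example analysis (ML extraction, varimax, 5 factors).
import pandas as pd
from factor_analyzer import FactorAnalyzer

df = pd.read_csv("FactorAnalysis.csv")  # the 12-variable dataset from this post

fa = FactorAnalyzer(n_factors=5, method="ml", rotation="varimax")
fa.fit(df)

print(pd.DataFrame(fa.loadings_, index=df.columns).round(3))    # rotated loadings
print(pd.Series(fa.get_communalities(), index=df.columns).round(3))  # communalities

# get_factor_variance() returns (SS loadings, proportion, cumulative) per factor.
# The post reports a cumulative 81.8% for five factors.
_, proportion, cumulative = fa.get_factor_variance()
print(proportion.round(3), cumulative.round(3))
```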

References

Abdi, Hervé (2003), “Factor Rotations in Factor Analyses,” in Lewis-Beck, M., Bryman, A., & Futing, T. (Eds.), Encyclopedia of Social Sciences Research Methods. Thousand Oaks, CA: Sage.

Browne, Michael W. (2001), “An Overview of Analytic Rotation in Exploratory Factor Analysis,” Multivariate Behavioral Research, 36(1), 111-150.

Costello, Anna B., & Osborne, Jason (2005), “Best Practices in Exploratory Factor Analysis: Four Recommendations for Getting the Most from Your Analysis,” Practical Assessment, Research, and Evaluation, 10, Article 7.


Reader Interactions


May 26, 2024 at 8:51 am

Good day Jim, I am running into trouble with the item analysis for the 5-point Likert scale that I am trying to create. My CFI is around 0.9 and TLI is around 0.8, which is good, but my RMSEA and SRMR have awful results: the RMSEA is around 0.1 and the SRMR is 0.2. This is a roadblock for me. I want to ask how I can improve my RMSEA and SRMR so that they reach the cutoff.

I hope this message reaches you. Thank you for taking the time to read and respond to my troubled question.


May 15, 2024 at 11:27 am

Good day, Sir Jim. I am currently trying to create a 5-point Likert scale that tries to measure National Identity Conformity in three ways: (1) Origin (e.g., Americans are born in/from America), (2) Culture (e.g., Americans are patriotic), and (3) Belief (e.g., Americans embrace being Americans).

In the process of establishing the scale’s validity, I was told to use Exploratory Factor Analysis, and I would like to ask which methods of extraction and rotation are best for ensuring that the inter-item validity of my scale is good. I would also like to understand how I can avoid or limit cross-loading.


May 15, 2024 at 3:13 pm

I discuss those issues in this post. I’d recommend PAF as the method of extraction because your Likert scale data won’t be normally distributed. Read the Method of Factor Extraction section for more information.

As for cross-loading, the method of rotation can help with that. The choice depends largely on subject-area knowledge and what works best for your data, so I can’t provide a suggested method. Read the Factor Rotations section for more information about that. For instance, if you get cross-loadings with orthogonal rotations, using an oblique rotation might help.

If factor rotation doesn’t sufficiently reduce cross-loading, you might need to rework your questions so they’re more distinct, remove problematic items, or increase your sample size (a larger sample can provide more stable factor solutions and clearer patterns of loadings). In this scenario, where changing rotations doesn’t help, you’ll need to determine whether the underlying issue is with your questions or with too small a sample size.

I hope that helps!


March 6, 2024 at 10:20 pm

What do negative loadings mean? How do I proceed further with these loadings?

March 6, 2024 at 10:44 pm

Loadings are like correlation coefficients and range from -1 to +1. More extreme positive and negative values indicate stronger relationships. Negative loadings indicate a negative relationship between the latent factors and observed variables. Highly negative values are as good as highly positive values. I discuss this in detail in the Factor Loadings section of this post.


March 6, 2024 at 10:10 am

Good day Jim,

The methodology seems loaded with opportunities for errors. So often we are being asked to translate a nebulous English word into some sort of mathematical descriptor. As an example, in the section labelled ‘Interpreting the Results’, what are we to make of the words ‘likeability’ or ‘self-confidence’? How can we possibly evaluate those things... and to three significant decimal places?

You Jim, understand and use statistical methods correctly. Yet, too often people who apply statistics fail to examine the language of their initial questions and end up doing poor analysis. Worse, many don’t understand the software they use.

On a more cheery note, keep up the great work. The world needs a thousand more of you.

March 6, 2024 at 5:08 pm

Thanks for the thoughtful comment. I agree with your concerns.

Ideally, all of those attributes are measured using validated measurement scales. The field of psychology is pretty good about that for terms that seem kind of squishy. For instance, they usually have thorough validation processes for personality traits, etc. However, your point is well taken, you need to be able to trust your data.

All statistical analyses depend on thorough subject-area knowledge, and that’s very true for factor analysis. You must have a solid theoretical understanding of these latent factors from extensive research before considering FA. Then FA can see if there’s evidence that they actually exist. But, I do agree with you that between the rotations and having to derive names to associate with the loadings, it can be a fairly subjective process.

Thanks so much for your kind words! I appreciate them because I do strive for accuracy.


March 2, 2024 at 8:44 pm

Sir, I want to know: after successfully identifying my 3 factors with the above method, I now want to run a regression on the data. How do I get a single value for each factor rather than these many values?


February 28, 2024 at 7:48 am

Hello, Thanks for your effort on this post, it really helped me a lot. I want your recommendation for my case if you don’t mind.

I’m working on my research and I have 5 independent variables and 1 dependent variable. I want to use a factor analysis method in order to know which variable contributes the most to the dependent variable.

Also, what kinds of data checks and preparations should I make before starting the analysis?

Thanks in advance for your consideration.

February 28, 2024 at 1:46 pm

Based on the information you provided, I don’t believe factor analysis is the correct analysis for you.

Factor analysis is primarily used for understanding the structure of a set of variables and for reducing data dimensions by identifying underlying latent factors. It’s particularly useful when you have a large number of observed variables and believe that they are influenced by a smaller number of unobserved factors.

Instead, it sounds like you have the IVs and DV and want to understand the relationships between them. For that, I recommend multiple regression. Learn more in my post about When to Use Regression. After you settle on a model, there are several ways to Identify the Most Important Variables in the Model.

In terms of checking assumptions, familiarize yourself with the Ordinary Least Squares Regression Assumptions. Least squares regression is the most common and is a good place to start.

Best of luck with your analysis!


December 1, 2023 at 1:01 pm

What would be the eigenvalue in EFA?


November 1, 2023 at 4:42 am

Hi Jim, this is an excellent yet succinct article on the topic. A very basic question, though: the dataset contains ordinal data. Is this ok? I’m a student in a Multivariate Statistics course, and as far as I’m aware, both PCA and common factor analysis dictate metric data. Or is it assumed that since the ordinal data has been coded into a range of 0-10, then the data is considered numeric and can be applied with PCA or CFA?

Sorry for the dumb question, and thank you.

November 1, 2023 at 8:00 pm

That’s a great question.

For the example in this post, we’re dealing with data on a 10 point scale where the differences between all points are equal. Consequently, we can treat discrete data as continuous data.

Now, to your question about ordinal data. You can use ordinal data with factor analysis; however, you might need to use specific methods.

For ordinal data, it’s often recommended to use polychoric correlations instead of Pearson correlations. Polychoric correlations estimate the correlation between two latent continuous variables that underlie the observed ordinal variables. This provides a more accurate correlation matrix for factor analysis of ordinal data.

I’ve also heard about categorical PCA and nonlinear factor analysis that use a monotonic transformation of ordinal data.

I hope that helps clarify it for you!


September 2, 2023 at 4:14 pm

Once we identify how much variability the factors contribute, what steps could we take from here to make predictions about variables?

September 2, 2023 at 6:53 pm

Hi Brittany,

Thanks for the great question! And thanks for your kind words in your other comment! 🙂

What you can do is calculate all the factor scores for each observation. Some software will do this for you as an option. Or, you can input values into the regression equations for the factor scores that are included in the output.

Then use these scores as the independent variables in regression analysis. From there, you can use the regression model to make predictions.

Ideally, you’d evaluate the regression model before making predictions and use cross validation to be sure that the model works for observations outside the dataset you used to fit the model.
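
Here is a minimal sketch of that workflow in Python, assuming the DataFrame df of indicators from the earlier sketches and a hypothetical outcome vector y, with factor_analyzer and scikit-learn:

```python
# Minimal sketch: factor scores as regression predictors.
from factor_analyzer import FactorAnalyzer
from sklearn.linear_model import LinearRegression

fa = FactorAnalyzer(n_factors=5, method="ml", rotation="varimax")
scores = fa.fit_transform(df)  # one column of factor scores per factor

model = LinearRegression().fit(scores, y)  # y is a hypothetical outcome
print(model.coef_, model.intercept_)
# Evaluate the model (and ideally cross-validate) before trusting predictions.
```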

September 2, 2023 at 4:13 pm

Wow! This was really helpful and structured very well for interpretation. Thank you!


October 6, 2022 at 10:55 am

I can imagine that Prof will have further explanations on this down the line at some point in the future. I’m waiting... Thanks, Prof Jim, for your usual intuitive manner of explaining concepts. Funsho


September 26, 2022 at 8:08 am

Thanks for a very comprehensive guide. I learnt a lot. In PCA, we usually extract the components and use them for predictive modeling. Is this the case with Factor Analysis as well? Can we use factors as predictors?

September 26, 2022 at 8:27 pm

I have not used factors as predictors, but I think it would be possible. However, PCA’s goal is to maximize data reduction. This process is particularly valuable when you have many variables, a low sample size, and/or collinearity between the predictors. Factor analysis also reduces the data, but that’s not its primary goal. Consequently, my sense is that PCA is better for predictive modeling, while factor analysis is better when you’re trying to understand the underlying factors (which you aren’t with PCA). But, again, I haven’t tried using factors in that way, nor have I compared the results to PCA. So, take that with a grain of salt!


Factor analysis and how it simplifies research findings.

There are many forms of data analysis used to report on and study survey data. Factor analysis is best when used to simplify complex data sets with many variables.

What is factor analysis?

Factor analysis is the practice of condensing many variables into just a few, so that your research data is easier to work with.

For example, a retail business trying to understand customer buying behaviours might consider variables such as ‘did the product meet your expectations?’, ‘how would you rate the value for money?’ and ‘did you find the product easily?’. Factor analysis can help condense these variables into a single factor, such as ‘customer purchase satisfaction’.

customer purchase satisfaction tree

The theory is that there are deeper factors driving the underlying concepts in your data, and that you can uncover and work with them instead of dealing with the lower-level variables that cascade from them. Know that these deeper concepts aren’t necessarily immediately obvious; they might represent traits or tendencies that are hard to measure, such as extraversion or IQ.

Factor analysis is also sometimes called “dimension reduction”: you can reduce the “dimensions” of your data into one or more “super-variables,” also known as unobserved variables or latent variables. This process involves creating a factor model and often yields a factor matrix that organizes the relationship between observed variables and the factors they’re associated with.

As with any kind of process that simplifies complexity, there is a trade-off between the accuracy of the data and how easy it is to work with. With factor analysis, the best solution is the one that yields a simplification that represents the true nature of your data, with minimum loss of precision. This often means finding a balance between achieving the variance explained by the model and using fewer factors to keep the model simple.

Factor analysis isn’t a single technique, but a family of statistical methods that can be used to identify the latent factors driving observable variables. Factor analysis is commonly used in market research, as well as other disciplines like technology, medicine, sociology, field biology, education, psychology and many more.

What is a factor?

In the context of factor analysis, a factor is a hidden or underlying variable that we infer from a set of directly measurable variables.

Take ‘customer purchase satisfaction’ as an example again. This isn’t a variable you can directly ask a customer to rate, but it can be determined from the responses to correlated questions like ‘did the product meet your expectations?’, ‘how would you rate the value for money?’ and ‘did you find the product easily?’.

While not directly observable, factors are essential for providing a clearer, more streamlined understanding of data. They enable us to capture the essence of our data’s complexity, making it simpler and more manageable to work with, and without losing lots of information.


Key concepts in factor analysis

These concepts are the foundational pillars that guide the application and interpretation of factor analysis.

Variance

Central to factor analysis, variance measures how much numerical values differ from the average. In factor analysis, you’re essentially trying to understand how underlying factors influence this variance among your variables. Some factors will explain more variance than others, meaning they more accurately represent the variables they consist of.

Eigenvalue

The eigenvalue expresses the amount of variance a factor explains. If a factor solution (unobserved or latent variables) has an eigenvalue of 1 or above, it indicates that a factor explains more variance than a single observed variable, which can be useful in reducing the number of variables in your analysis. Factors with eigenvalues less than 1 account for less variability than a single variable and are generally not included in the analysis.
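
To see where eigenvalues come from, here is a minimal sketch that computes them directly from the correlation matrix of a hypothetical pandas DataFrame of survey responses:

```python
# Minimal sketch: eigenvalues of the correlation matrix.
import numpy as np
import pandas as pd

df = pd.read_csv("survey.csv")                 # hypothetical survey data
corr = df.corr().to_numpy()                    # correlations between variables
eigenvalues = np.linalg.eigvalsh(corr)[::-1]   # sorted largest-first
print(eigenvalues)
# By Kaiser's rule, a factor with an eigenvalue above 1 explains more
# variance than any single observed variable does on its own.
```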

Factor score

A factor score is a numeric representation that tells us how strongly each variable from the original data is related to a specific factor. Also called the component score, it can help determine which variables are most influenced by each factor and are most important for each underlying concept.

Factor loading

Factor loading is the correlation coefficient for the variable and factor. Like the factor score, factor loadings give an indication of how much of the variance in an observed variable can be explained by the factor. High factor loadings (close to 1 or -1) mean the factor strongly influences the variable.

When to use factor analysis

Factor analysis is a powerful tool when you want to simplify complex data, find hidden patterns, and set the stage for deeper, more focused analysis.

It’s typically used when you’re dealing with a large number of interconnected variables, and you want to understand the underlying structure or patterns within this data. It’s particularly useful when you suspect that these observed variables could be influenced by some hidden factors.

For example, consider a business that has collected extensive customer feedback through surveys. The survey covers a wide range of questions about product quality, pricing, customer service and more. This huge volume of data can be overwhelming, and this is where factor analysis comes in. It can help condense these numerous variables into a few meaningful factors, such as ‘product satisfaction’, ‘customer service experience’ and ‘value for money’.

Factor analysis doesn’t operate in isolation; it’s often used as a stepping stone for further analysis. For example, once you’ve identified key factors through factor analysis, you might then proceed to a cluster analysis, a method that groups your customers based on their responses to these factors. The result is a clearer understanding of different customer segments, which can then guide targeted marketing and product development strategies.

By combining factor analysis with other methodologies, you can not only make sense of your data but also gain valuable insights to drive your business decisions.

Factor analysis assumptions

Factor analysis relies on several assumptions for accurate results. Violating these assumptions may lead to factors that are hard to interpret or misleading.

Linear relationships between variables

This ensures that changes in the values of your variables are consistent.

Sufficient variables for each factor

If only a few variables represent a factor, it might not be identified accurately.

Adequate sample size

The larger the ratio of cases (respondents, for instance) to variables, the more reliable the analysis.

No perfect multicollinearity and singularity

No variable is a perfect linear combination of other variables, and no variable is a duplicate of another.

Relevance of the variables

There should be some correlation between variables to make a factor analysis feasible.

assumptions for factor analysis

Types of factor analysis

There are two main factor analysis methods: exploratory and confirmatory. Here’s how they are used to add value to your research process.

Confirmatory factor analysis

In this type of analysis, the researcher starts out with a hypothesis about their data that they are looking to prove or disprove. Factor analysis will confirm (or not) where the latent variables are and how much variance they account for.

Principal component analysis (PCA) is a popular technique often used alongside confirmatory factor analysis, although, strictly speaking, it is a data reduction method rather than a true factor extraction method. Using it, the researcher will run the analysis to obtain multiple possible solutions that split their data among a number of factors. Items that load onto a single particular factor are more strongly related to one another and can be grouped together by the researcher using their conceptual knowledge or pre-existing research.

Using PCA will generate a range of solutions with different numbers of factors, from simplified 1-factor solutions to higher levels of complexity. However, the fewer factors employed, the less variance will be accounted for in the solution.

Exploratory factor analysis

As the name suggests, exploratory factor analysis is undertaken without a hypothesis in mind. It’s an investigatory process that helps researchers understand whether associations exist between the initial variables, and if so, where they lie and how they are grouped.

How to perform factor analysis: A step-by-step guide

Performing a factor analysis involves a series of steps, often facilitated by statistical software packages like SPSS, Stata and the R programming language. Here’s a simplified overview of the process.

how to perform factor analysis

Prepare your data

Start with a dataset where each row represents a case (for example, a survey respondent), and each column is a variable you’re interested in. Ensure your data meets the assumptions necessary for factor analysis.

Create an initial hypothesis

If you have a theory about the underlying factors and their relationships with your variables, make a note of this. This hypothesis can guide your analysis, but keep in mind that the beauty of factor analysis is its ability to uncover unexpected relationships.

Choose the type of factor analysis

The most common type is exploratory factor analysis, which is used when you’re not sure what to expect. If you have a specific hypothesis about the factors, you might use confirmatory factor analysis.

Form your correlation matrix

After you’ve chosen the type of factor analysis, you’ll need to create the correlation matrix of your variables. This matrix, which shows the correlation coefficients between each pair of variables, forms the basis for the extraction of factors. This is a key step in building your factor analysis model.

Decide on the extraction method

Principal component analysis is the most commonly used extraction method, although it is, strictly speaking, a data reduction technique. If your goal is to model latent factors, you might opt for principal axis factoring, a type of factor analysis that identifies factors based on shared variance.

Determine the number of factors

Various criteria can be used here, such as Kaiser’s criterion (eigenvalues greater than 1), the scree plot method or parallel analysis. The choice depends on your data and your goals.

Interpret and validate your results

Each factor will be associated with a set of your original variables, so label each factor based on how you interpret these associations. These labels should represent the underlying concept that ties the associated variables together.

Validation can be done through a variety of methods, like splitting your data in half and checking if both halves produce the same factors.
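
The steps above translate into only a few lines of code. Here is a minimal sketch in Python using the factor_analyzer package (an assumption; SPSS, Stata and R offer equivalents), with a hypothetical survey file:

```python
# Minimal sketch: suitability checks, extraction, rotation, and loadings.
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import (
    calculate_bartlett_sphericity,
    calculate_kmo,
)

df = pd.read_csv("survey.csv")  # hypothetical file: rows = respondents

# Suitability: Bartlett's test (want p < 0.05) and KMO (want roughly > 0.6).
chi_square, p_value = calculate_bartlett_sphericity(df)
_, kmo_total = calculate_kmo(df)
print(f"Bartlett p = {p_value:.4f}, overall KMO = {kmo_total:.2f}")

# Extract and rotate; the number of factors and rotation are judgment calls.
fa = FactorAnalyzer(n_factors=3, rotation="varimax")
fa.fit(df)
print(pd.DataFrame(fa.loadings_, index=df.columns).round(2))
```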

How factor analysis can help you

As well as giving you fewer variables to navigate, factor analysis can help you understand grouping and clustering in your input variables, since they’ll be grouped according to the latent variables.

Say you ask several questions all designed to explore different, but closely related, aspects of customer satisfaction:

  • How satisfied are you with our product?
  • Would you recommend our product to a friend or family member?
  • How likely are you to purchase our product in the future?

But you only want one variable to represent a customer satisfaction score. One option would be to average the three question responses. Another option would be to create a factor dependent variable. This can be done by running a principal component analysis (PCA) and keeping the first principal component (also known as a factor). The advantage of a PCA over an average is that it automatically weights each of the variables in the calculation.
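
Here is a minimal sketch of that second option in Python with scikit-learn, using hypothetical column names for the three satisfaction questions and the DataFrame df from the earlier sketch:

```python
# Minimal sketch: first principal component as a single satisfaction score.
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

items = df[["satisfied", "recommend", "repurchase"]]  # hypothetical columns
z = StandardScaler().fit_transform(items)             # put items on one scale
satisfaction = PCA(n_components=1).fit_transform(z).ravel()
# Unlike a plain average, the component weights each question by how much
# it contributes to the shared variance across the three items.
```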

Say you have a list of questions and you don’t know exactly which responses will move together and which will move differently; for example, purchase barriers of potential customers. The following are possible barriers to purchase:

  • Price is prohibitive
  • Overall implementation costs
  • We can’t reach a consensus in our organization
  • Product is not consistent with our business strategy
  • I need to develop an ROI, but cannot or have not
  • We are locked into a contract with another product
  • The product benefits don’t outweigh the cost
  • We have no reason to switch
  • Our IT department cannot support your product
  • We do not have sufficient technical resources
  • Your product does not have a feature we require
  • Other (please specify)

Factor analysis can uncover the trends of how these questions will move together. The following are loadings for 3 factors for each of the variables.

factor analysis data

Notice how each of the principal components has high weights for a subset of the variables. Weight is used interchangeably with loading, and a high weight indicates the variables that are most influential for each principal component. +0.30 is generally considered to be a heavy weight.

The first component displays heavy weights for variables related to cost, the second weights variables related to IT, and the third weights variables related to organizational factors. We can give our new super variables clever names.

factor analysis data 2

If we cluster the customers based on these three components, we can see some trends. Customers tend to be high in cost barriers or organizational barriers, but not both.

The red dots represent respondents who indicated they had higher organizational barriers; the green dots represent respondents who indicated they had higher cost barriers.

factor analysis graph

Considerations when using factor analysis

Factor analysis is a tool, and like any tool its effectiveness depends on how you use it. When employing factor analysis, it’s essential to keep a few key considerations in mind.

Oversimplification

While factor analysis is great for simplifying complex data sets, there’s a risk of oversimplification when grouping variables into factors. To avoid this, ensure the reduced factors still accurately represent the complexities of your variables.

Subjectivity

Interpreting the factors can sometimes be subjective and requires a good understanding of the variables and the context. Be mindful that multiple analysts may come up with different names for the same factor.

Supplementary techniques

Factor analysis is often just the first step. Consider how it fits into your broader research strategy and which other techniques you’ll use alongside it.

Examples of factor analysis studies

Factor analysis, including PCA, is often used in tandem with segmentation studies. It might be an intermediary step to reduce variables before using KMeans to make the segments.

Factor analysis provides simplicity after reducing variables. For long studies with large blocks of Matrix Likert scale questions, the number of variables can become unwieldy. Simplifying the data using factor analysis helps analysts focus and clarify the results, while also reducing the number of dimensions they’re clustering on.

Sample questions for factor analysis

Choosing exactly which questions to perform factor analysis on is both an art and a science. Choosing which variables to reduce takes some experimentation, patience and creativity. Factor analysis works well on Likert scale questions and Sum to 100 question types.

Factor analysis works well on matrix blocks of the following question genres:

Psychographics (Agree/Disagree):

  • I value family
  • I believe brand represents value

Behavioral (Agree/Disagree):

  • I purchase the cheapest option
  • I am a bargain shopper

Attitudinal (Agree/Disagree):

  • The economy is not improving
  • I am pleased with the product

Activity-Based (Agree/Disagree):

  • I love sports
  • I sometimes shop online during work hours

Behavioral and psychographic questions are especially suited for factor analysis.

Sample output reports

Factor analysis produces a score for each respondent on each factor (the table below shows these respondent-level scores, not the variable loadings). These scores can be used like other responses in the survey.

Respondent          Cost Barrier  IT Barrier  Org Barrier
R_3NWlKlhmlRM0Lgb   0.7           1.3         -0.9
R_Wp7FZE1ziZ9czSN   0.2          -0.4         -0.3
R_SJlfo8Lpb6XTHGh  -0.1           0.1          0.4
R_1Kegjs7Q3AL49wO  -0.1          -0.3         -0.2
R_1IY1urS9bmfIpbW   1.6           0.3         -0.3


Factor Analysis – Steps, Methods and Examples


Definition:

Factor analysis is a statistical technique that is used to identify the underlying structure of a relatively large set of variables and to explain these variables in terms of a smaller number of common underlying factors. It helps to investigate the latent relationships between observed variables.

Factor Analysis Steps

Here are the general steps involved in conducting a factor analysis:

1. Define the Research Objective:

Clearly specify the purpose of the factor analysis. Determine what you aim to achieve or understand through the analysis.

2. Data Collection:

Gather the data on the variables of interest. These variables should be measurable and related to the research objective. Ensure that you have a sufficient sample size for reliable results.

3. Assess Data Suitability:

Examine the suitability of the data for factor analysis. Check for the following aspects:

  • Sample size: Ensure that you have an adequate sample size to perform factor analysis reliably.
  • Missing values: Handle missing data appropriately, either by imputation or exclusion.
  • Variable characteristics: Verify that the variables are continuous or at least ordinal in nature. Categorical variables may require different analysis techniques.
  • Linearity: Assess whether the relationships among variables are linear.

4. Determine the Factor Analysis Technique:

There are different types of factor analysis techniques available, such as exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). Choose the appropriate technique based on your research objective and the nature of the data.

5. Perform Factor Analysis:

   a. Exploratory Factor Analysis (EFA):

  • Extract factors: Use factor extraction methods (e.g., principal component analysis or common factor analysis) to identify the initial set of factors.
  • Determine the number of factors: Decide on the number of factors to retain based on statistical criteria (e.g., eigenvalues, scree plot) and theoretical considerations.
  • Rotate factors: Apply factor rotation techniques (e.g., varimax, oblique) to simplify the factor structure and make it more interpretable.
  • Interpret factors: Analyze the factor loadings (correlations between variables and factors) to interpret the meaning of each factor.
  • Determine factor reliability: Assess the internal consistency or reliability of the factors using measures like Cronbach’s alpha (see the sketch after these steps).
  • Report results: Document the factor loadings, rotated component matrix, communalities, and any other relevant information.

   b. Confirmatory Factor Analysis (CFA):

  • Formulate a theoretical model: Specify the hypothesized relationships among variables and factors based on prior knowledge or theoretical considerations.
  • Define measurement model: Establish how each variable is related to the underlying factors by assigning factor loadings in the model.
  • Test the model: Use statistical techniques like maximum likelihood estimation or structural equation modeling to assess the goodness-of-fit between the observed data and the hypothesized model.
  • Modify the model: If the initial model does not fit the data adequately, revise the model by adding or removing paths, allowing for correlated errors, or other modifications to improve model fit.
  • Report results: Present the final measurement model, parameter estimates, fit indices (e.g., chi-square, RMSEA, CFI), and any modifications made.

6. Interpret and Validate the Factors:

Once you have identified the factors, interpret them based on the factor loadings, theoretical understanding, and research objectives. Validate the factors by examining their relationships with external criteria or by conducting further analyses if necessary.
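
For the reliability check mentioned in step 5a, Cronbach’s alpha is simple enough to compute by hand. Here is a minimal sketch with numpy and pandas, using a hypothetical data file and hypothetical item columns that load on one factor:

```python
# Minimal sketch: Cronbach's alpha for the items loading on one factor.
import numpy as np
import pandas as pd

df = pd.read_csv("survey.csv")              # hypothetical dataset
items = df[["q1", "q2", "q3"]].to_numpy()   # hypothetical indicators of a factor

k = items.shape[1]
sum_item_var = items.var(axis=0, ddof=1).sum()   # sum of the item variances
total_var = items.sum(axis=1).var(ddof=1)        # variance of the summed scale

alpha = (k / (k - 1)) * (1 - sum_item_var / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")  # values around 0.7+ are commonly accepted
```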

Types of Factor Analysis

Types of Factor Analysis are as follows:

Exploratory Factor Analysis (EFA)

EFA is used to explore the underlying structure of a set of observed variables without any preconceived assumptions about the number or nature of the factors. It aims to discover the number of factors and how the observed variables are related to those factors. EFA does not impose any restrictions on the factor structure and allows for cross-loadings of variables on multiple factors.

Confirmatory Factor Analysis (CFA)

CFA is used to test a pre-specified factor structure based on theoretical or conceptual assumptions. It aims to confirm whether the observed variables measure the latent factors as intended. CFA tests the fit of a hypothesized model and assesses how well the observed variables are associated with the expected factors. It is often used for validating measurement instruments or evaluating theoretical models.

Principal Component Analysis (PCA)

PCA is a dimensionality reduction technique that can be considered a form of factor analysis, although it has some differences. PCA aims to explain the maximum amount of variance in the observed variables using a smaller number of uncorrelated components. Unlike traditional factor analysis, PCA does not assume that the observed variables are caused by underlying factors but focuses solely on accounting for variance.

Common Factor Analysis

It assumes that the observed variables are influenced by common factors and unique factors (specific to each variable). It attempts to estimate the common factor structure by extracting the shared variance among the variables while also considering the unique variance of each variable.

Hierarchical Factor Analysis

Hierarchical factor analysis involves multiple levels of factors. It explores both higher-order and lower-order factors, aiming to capture the complex relationships among variables. Higher-order factors are based on the relationships among lower-order factors, which are in turn based on the relationships among observed variables.

Factor Analysis Formulas

Factor Analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors.

Here are some of the essential formulas and calculations used in factor analysis:

Correlation Matrix:

The first step in factor analysis is to create a correlation matrix, which calculates the correlation coefficients between pairs of variables.

Correlation coefficient (Pearson’s r) between variables X and Y is calculated as:

r(X,Y) = Σ[(xi − x̄)(yi − ȳ)] / [(n − 1) σx σy]

where: xi, yi are the data points, x̄ and ȳ are the means of X and Y respectively, σx and σy are the standard deviations of X and Y respectively, and n is the number of data points.

Extraction of Factors:

The extraction of factors from the correlation matrix is typically done by methods such as Principal Component Analysis (PCA) or other similar methods.

The formula used in PCA to calculate the principal components (factors) involves finding the eigenvalues and eigenvectors of the correlation matrix.

Let’s denote the correlation matrix as R. If λ is an eigenvalue of R, and v is the corresponding eigenvector, they satisfy the equation: Rv = λv

Factor Loadings:

Factor loadings are the correlations between the original variables and the factors. They can be calculated as the eigenvectors normalized by the square roots of their corresponding eigenvalues.

Communality and Specific Variance:

Communality of a variable is the proportion of variance in that variable explained by the factors. It can be calculated as the sum of squared factor loadings for that variable across all factors.

The specific variance of a variable is the proportion of variance in that variable not explained by the factors, and it’s calculated as 1 − Communality.

Factor Rotation: Factor rotation, such as Varimax or Promax, is used to make the output more interpretable. It doesn’t change the underlying relationships but affects the loadings of the variables on the factors.

For example, in the Varimax rotation, the objective is to minimize the variance of the squared loadings of a factor (column) on all the variables (rows) in a factor matrix, which leads to more high and low loadings, making the factor easier to interpret.
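
Putting the formulas above together, here is a minimal numpy sketch of extraction from the correlation matrix: eigendecomposition, loadings, communalities and specific variances. The data file and number of retained factors are hypothetical.

```python
# Minimal sketch: from correlation matrix to loadings and communalities.
import numpy as np
import pandas as pd

df = pd.read_csv("survey.csv")               # hypothetical dataset
R = df.corr().to_numpy()                     # correlation matrix

eigvals, eigvecs = np.linalg.eigh(R)         # solves Rv = lambda * v
order = np.argsort(eigvals)[::-1]            # sort largest-first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 3                                        # number of factors retained
loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])  # eigenvector * sqrt(eigenvalue)
communality = (loadings**2).sum(axis=1)      # variance explained per variable
specific_variance = 1 - communality
print(np.round(loadings, 2), np.round(communality, 2))
```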

Examples of Factor Analysis

Here are some real-time examples of factor analysis:

  • Psychological Research: In a study examining personality traits, researchers may use factor analysis to identify the underlying dimensions of personality by analyzing responses to various questionnaires or surveys. Factors such as extroversion, neuroticism, and conscientiousness can be derived from the analysis.
  • Market Research: In marketing, factor analysis can be used to understand consumers’ preferences and behaviors. For instance, by analyzing survey data related to product features, pricing, and brand perception, researchers can identify factors such as price sensitivity, brand loyalty, and product quality that influence consumer decision-making.
  • Finance and Economics: Factor analysis is widely used in portfolio management and asset pricing models. By analyzing historical market data, factors such as market returns, interest rates, inflation rates, and other economic indicators can be identified. These factors help in understanding and predicting investment returns and risk.
  • Social Sciences: Factor analysis is employed in social sciences to explore underlying constructs in complex datasets. For example, in education research, factor analysis can be used to identify dimensions such as academic achievement, socio-economic status, and parental involvement that contribute to student success.
  • Health Sciences: In medical research, factor analysis can be utilized to identify underlying factors related to health conditions, symptom clusters, or treatment outcomes. For instance, in a study on mental health, factor analysis can be used to identify underlying factors contributing to depression, anxiety, and stress.
  • Customer Satisfaction Surveys: Factor analysis can help businesses understand the key drivers of customer satisfaction. By analyzing survey responses related to various aspects of product or service experience, factors such as product quality, customer service, and pricing can be identified, enabling businesses to focus on areas that impact customer satisfaction the most.

Factor Analysis in Research: Example

Here’s an example of how factor analysis might be used in research:

Let’s say a psychologist is interested in the factors that contribute to overall wellbeing. They conduct a survey with 1000 participants, asking them to respond to 50 different questions relating to various aspects of their lives, including social relationships, physical health, mental health, job satisfaction, financial security, personal growth, and leisure activities.

Given the broad scope of these questions, the psychologist decides to use factor analysis to identify underlying factors that could explain the correlations among responses.

After conducting the factor analysis, the psychologist finds that the responses can be grouped into five factors:

  • Physical Wellbeing: Includes variables related to physical health, exercise, and diet.
  • Mental Wellbeing: Includes variables related to mental health, stress levels, and emotional balance.
  • Social Wellbeing: Includes variables related to social relationships, community involvement, and support from friends and family.
  • Professional Wellbeing: Includes variables related to job satisfaction, work-life balance, and career development.
  • Financial Wellbeing: Includes variables related to financial security, savings, and income.

By reducing the 50 individual questions to five underlying factors, the psychologist can more effectively analyze the data and draw conclusions about the major aspects of life that contribute to overall wellbeing.

In this way, factor analysis helps researchers understand complex relationships among many variables by grouping them into a smaller number of factors, simplifying the data analysis process, and facilitating the identification of patterns or structures within the data.

When to Use Factor Analysis

Here are some circumstances in which you might want to use factor analysis:

  • Data Reduction: If you have a large set of variables, you can use factor analysis to reduce them to a smaller set of factors. This helps in simplifying the data and making it easier to analyze.
  • Identification of Underlying Structures: Factor analysis can be used to identify underlying structures in a dataset that are not immediately apparent. This can help you understand complex relationships between variables.
  • Validation of Constructs: Factor analysis can be used to confirm whether a scale or measure truly reflects the construct it’s meant to measure. If all the items in a scale load highly on a single factor, that supports the construct validity of the scale.
  • Generating Hypotheses: By revealing the underlying structure of your variables, factor analysis can help to generate hypotheses for future research.
  • Survey Analysis: If you have a survey with many questions, factor analysis can help determine if there are underlying factors that explain response patterns.

Applications of Factor Analysis

Factor Analysis has a wide range of applications across various fields. Here are some of them:

  • Psychology: It’s often used in psychology to identify the underlying factors that explain different patterns of correlations among mental abilities. For instance, factor analysis has been used to identify personality traits (like the Big Five personality traits), intelligence structures (like Spearman’s g), or to validate the constructs of different psychological tests.
  • Market Research: In this field, factor analysis is used to identify the factors that influence purchasing behavior. By understanding these factors, businesses can tailor their products and marketing strategies to meet the needs of different customer groups.
  • Healthcare: In healthcare, factor analysis is used in a similar way to psychology, identifying underlying factors that might influence health outcomes. For instance, it could be used to identify lifestyle or behavioral factors that influence the risk of developing certain diseases.
  • Sociology: Sociologists use factor analysis to understand the structure of attitudes, beliefs, and behaviors in populations. For example, factor analysis might be used to understand the factors that contribute to social inequality.
  • Finance and Economics: In finance, factor analysis is used to identify the factors that drive financial markets or economic behavior. For instance, factor analysis can help understand the factors that influence stock prices or economic growth.
  • Education: In education, factor analysis is used to identify the factors that influence academic performance or attitudes towards learning. This could help in developing more effective teaching strategies.
  • Survey Analysis: Factor analysis is often used in survey research to reduce the number of items or to identify the underlying structure of the data.
  • Environment: In environmental studies, factor analysis can be used to identify the major sources of environmental pollution by analyzing the data on pollutants.

Advantages of Factor Analysis

Advantages of Factor Analysis are as follows:

  • Data Reduction: Factor analysis can simplify a large dataset by reducing the number of variables. This helps make the data easier to manage and analyze.
  • Structure Identification: It can identify underlying structures or patterns in a dataset that are not immediately apparent. This can provide insights into complex relationships between variables.
  • Construct Validation: Factor analysis can be used to validate whether a scale or measure accurately reflects the construct it’s intended to measure. This is important for ensuring the reliability and validity of measurement tools.
  • Hypothesis Generation: By revealing the underlying structure of your variables, factor analysis can help generate hypotheses for future research.
  • Versatility: Factor analysis can be used in various fields, including psychology, market research, healthcare, sociology, finance, education, and environmental studies.

Disadvantages of Factor Analysis

Disadvantages of Factor Analysis are as follows:

  • Subjectivity: The interpretation of the factors can sometimes be subjective, depending on how the data is perceived. Different researchers might interpret the factors differently, which can lead to different conclusions.
  • Assumptions: Factor analysis assumes that there’s some underlying structure in the dataset and that all variables are related. If these assumptions do not hold, factor analysis might not be the best tool for your analysis.
  • Large Sample Size Required: Factor analysis generally requires a large sample size to produce reliable results. This can be a limitation in studies where data collection is challenging or expensive.
  • Correlation, not Causation: Factor analysis identifies correlational relationships, not causal ones. It cannot prove that changes in one variable cause changes in another.
  • Complexity: The statistical concepts behind factor analysis can be difficult to understand and require expertise to implement correctly. Misuse or misunderstanding of the method can lead to incorrect conclusions.



Unveiling Patterns with Factor Analysis: Definition and Sample Questions


Factor analysis is a statistical technique that aids in reducing a complex dataset into simpler, more manageable components, unveiling patterns and relationships within the data. By identifying underlying structures, it provides valuable insights into the variables and how they relate to one another. This blog explores the definition, types, examples, and sample questions of factor analysis to understand its applications and benefits fully.


What is factor analysis?

Factor analysis, a dimension-reduction technique, is a statistical method used to reduce a large volume of data to a smaller, more manageable dataset. It works by identifying patterns and common characteristics in the data, ultimately reducing the dimensions or variables to make it easier to understand.

Types of factor analysis:

There are two types of factor analysis:

1. Exploratory Factor Analysis

Exploratory factor analysis is used when there is no prior understanding of the data’s structure. It aims to explore the relationships and patterns within the dataset, providing insights into how variables group together.

2. Confirmatory Factor Analysis

Confirmatory factor analysis is employed when researchers have a clear hypothesis or theory about the data’s structure. It verifies whether the proposed structure aligns with the actual data, validating preconceived notions.

Factor analysis helps in condensing variables and uncovering clusters of responses by measuring shared variances across the variables. It creates a new scale by eliminating unique variables that do not share variances with other variables. This technique is often used by survey researchers to simplify question responses into shorter ones.



How does factor analysis work?

Factor analysis helps you condense variables and uncover clusters of responses. Let us look at how it actually works with an example scenario:

Suppose you ask a couple of questions like the ones below, which cover similar ground on customer satisfaction:

  • How did you like our product?
  • Will you consider referring us to your friends and family? 

To evaluate the overall performance of the organization, it is convenient to have a single variable that represents the customer experience score. This can be done in two ways:

  • Take the average of the two questions
  • Run a PCA (Principal Component Analysis) and use the resulting component score as the variable

The PCA approach is generally more effective than simple averaging because it weights each variable by its contribution to the shared variance instead of treating all variables equally.
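As a rough illustration of the two options (a minimal Python sketch with made-up ratings; scikit-learn's PCA stands in for whatever tool you use):

    import numpy as np
    from sklearn.decomposition import PCA

    # Hypothetical 1-5 ratings for the two questions (rows = respondents).
    answers = np.array([[4, 5], [3, 3], [5, 5], [2, 3], [4, 4]], dtype=float)

    # Option 1: simple average of the two questions.
    avg_score = answers.mean(axis=1)

    # Option 2: first principal component, which weights each question by its
    # contribution to the shared variance instead of treating them equally.
    pca = PCA(n_components=1)
    cx_score = pca.fit_transform(answers).ravel()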

You may also want to ask customers several questions without knowing in advance which responses group together and which stand apart. Take, for example, the purchase barriers of your target customers. The reasons could include:

  • Price is prohibitive
  • Overall implementation costs
  • We can’t reach a consensus in our organization
  • Product is not consistent with our business strategy
  • I need to develop an ROI, but cannot or have not
  • We are locked into a contract with another product
  • The product benefits don’t outweigh the cost
  • We have no reason to switch
  • Our IT department cannot support your product
  • We do not have sufficient technical resources
  • Your product does not have a feature we require
  • Other (please specify)

Factor analysis helps you group these responses. Suppose the analysis sorts the barriers into three labeled groups (for example, Cost, IT, and Org barriers); a heat map of the loadings then shows clearly which barriers drive which responses and what they imply.

Clustering the responses on these three components also reveals customer segments, for example respondents who score high on the Cost barrier or the Org barrier but not on both.


Sample questions for factor analysis

Effective factor analysis involves formulating well-crafted questions that can capture meaningful insights. Here are sample questions categorized based on the types of factors analyzed:

Psychographics (Opinions)

These “Agree – Disagree” questions tell the opinions of the customers.

  • Do you agree that brand loyalty is important?
  • Do you prioritize quality over affordability?

Behavioral

These “Agree – Disagree” questions bring out the behavior of the customers.

  • Do you prefer purchasing the costliest option available?
  • Are you inclined to bargain while making purchases?

Attitudinal

These questions measure the attitudes of the customers in an “Agree – Disagree” manner.

  • How satisfied are you with the customer service provided?
  • Are you content with the product pricing?

Activity-based

These “Agree – Disagree” questions tell you what the customer usually does.

  • How frequently do you opt for online shopping?
  • How often do you visit zoos for leisure?


Sample Output Reports

Factor analysis generates output reports that summarize the identified factors and their respective contributions to the dataset. These reports typically include factor loadings, eigenvalues, and variance explained, offering valuable insights into the data’s structure and underlying patterns.

Response ID    Cost Barrier    IT Barrier    Org Barrier
Response_1     0.5             1.4           -0.6
Response_2     0.1             0.1            0.4
Response_3     -0.2            1.2           -0.2
Response_4     0.7             -0.4          -0.3
Response_5     2.7             -0.3          -0.3

Factor analysis is a powerful tool in the realm of data analysis, helping researchers comprehend complex data sets by uncovering inherent patterns and relationships. By asking the right questions and employing the appropriate type of factor analysis, businesses and researchers can unlock valuable insights that can drive informed decisions and strategies. Explore the potential of factor analysis in unraveling your data and propelling your research and business objectives.


SPSS Factor Analysis – Beginners Tutorial


Factor analysis examines which underlying factors are measured by a (large) number of observed variables. Such “underlying factors” are often variables that are difficult to measure such as IQ, depression or extraversion. For measuring these, we often try to write multiple questions that -at least partially- reflect such factors. The basic idea is illustrated below.

[Figure: factor model flowchart]

Now, if questions 1, 2 and 3 all measure numeric IQ, then the Pearson correlations among these items should be substantial: respondents with high numeric IQ will typically score high on all 3 questions, and vice versa.

The same reasoning goes for questions 4, 5 and 6: if they really measure “the same thing” they'll probably correlate highly.

However, questions 1 and 4 -measuring possibly unrelated traits- will not necessarily correlate. So if my factor model is correct, I could expect the correlations to follow a pattern as shown below.

[Figure: correlation matrix implied by the factor model]

Confirmatory Factor Analysis

Right, so after measuring questions 1 through 9 on a simple random sample of respondents, I computed this correlation matrix. Now I could ask my software if these correlations are likely, given my theoretical factor model. In this case, I'm trying to confirm a model by fitting it to my data. This is known as “confirmatory factor analysis”.

SPSS does not include confirmatory factor analysis but those who are interested could take a look at AMOS.

Exploratory Factor Analysis

But what if I don't have a clue which -or even how many- factors are represented by my data? Well, in this case, I'll ask my software to suggest some model given my correlation matrix. That is, I'll explore the data (hence, “exploratory factor analysis”). The simplest possible explanation of how it works is that the software tries to find groups of variables that are highly intercorrelated. Each such group probably represents an underlying common factor. There are different mathematical approaches to accomplishing this, but the most common one is principal component analysis (PCA). We'll walk you through with an example.

Research Questions and Data

A survey was held among 388 applicants for unemployment benefits. The data thus collected are in dole-survey.sav, part of which is shown below.

[Figure: variable view of the practice data]

The survey included 16 questions on client satisfaction. We think these measure a smaller number of underlying satisfaction factors but we've no clue about a model. So our research questions for this analysis are:

  • how many factors are measured by our 16 questions?
  • which questions measure similar factors?
  • which satisfaction aspects are represented by which factors?

Now let's first make sure we have an idea of what our data basically look like. We'll inspect the frequency distributions with corresponding bar charts for our 16 variables by running the syntax below.

[Figure: frequency table from the quick data check]

This very minimal data check gives us quite some important insights into our data:

  • All frequency distributions look plausible. We don't see anything weird in our data.


A somewhat annoying flaw here is that we don't see variable names for our bar charts in the output outline.

[Figure: SPSS SET OVARS not working for charts]

If we see something unusual in a chart, we don't easily see which variable to address. But in this example -fortunately- our charts all look fine.

So let's now set our missing values and run some quick descriptive statistics with the syntax below.

[Figure: missing values check output]

Note that none of our variables have many -more than some 10%- missing values. However, only 149 of our 388 respondents have zero missing values on the entire set of variables. This is very important to be aware of as we'll see in a minute.

We open the factor analysis dialog via Analyze → Dimension Reduction → Factor.

In the dialog that opens, we have a ton of options. For a “standard analysis”, we'll select the ones shown below. If you don't want to go through all dialogs, you can also replicate our analysis from the syntax below.

[Figure: SPSS FACTOR dialogs and the corresponding syntax]

Factor Analysis Output I - Total Variance Explained

Right. Now, with 16 input variables, PCA initially extracts 16 factors (or “components”). Each component has a quality score called an Eigenvalue. Only components with high Eigenvalues are likely to represent real underlying factors.

[Figure: eigenvalues and total variance explained table]

So what's a high Eigenvalue? A common rule of thumb is to select components whose Eigenvalues are at least 1. Applying this simple rule to the previous table answers our first research question: our 16 variables seem to measure 4 underlying factors.

This is because only our first 4 components have Eigenvalues of at least 1. The other components -having low quality scores- are not assumed to represent real traits underlying our 16 questions. Such components are considered “scree” as shown by the line chart below.

Factor Analysis Output II - Scree Plot

[Figure: scree plot of the eigenvalues]

A scree plot visualizes the Eigenvalues (quality scores) we just saw. Again, we see that the first 4 components have Eigenvalues over 1. We consider these “strong factors”. After that -component 5 and onwards- the Eigenvalues drop off dramatically. The sharp drop between components 1-4 and components 5-16 strongly suggests that 4 factors underlie our questions.
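If you are working outside SPSS, the same eigenvalue inspection can be sketched in Python (a generic illustration, not the SPSS output; it assumes R holds the 16×16 correlation matrix of the input variables):

    import numpy as np
    import matplotlib.pyplot as plt

    # Eigenvalues of the symmetric correlation matrix R, sorted descending.
    eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]

    # Kaiser rule of thumb: keep components with an eigenvalue of at least 1.
    n_factors = int(np.sum(eigenvalues >= 1))

    # Scree plot: look for the sharp drop ("elbow") after the strong factors.
    plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, marker="o")
    plt.axhline(1, linestyle="--")
    plt.xlabel("Component")
    plt.ylabel("Eigenvalue")
    plt.show()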

Factor Analysis Output III - Communalities

So to what extent do our 4 underlying factors account for the variance of our 16 input variables? This is answered by the r square values which -for some really dumb reason- are called communalities in factor analysis.

[Figure: communalities table]

Right. So if we predict v1 from our 4 components by multiple regression, we'll find r square = 0.596 -which is v1's communality. Variables having low communalities -say lower than 0.40- don't contribute much to measuring the underlying factors.

You could consider removing such variables from the analysis. But keep in mind that doing so changes all results. So you'll need to rerun the entire analysis with one variable omitted. And then perhaps rerun it again with another variable left out.

If the scree plot justifies it, you could also consider selecting an additional component. But don't do this if it renders the (rotated) factor loading matrix less interpretable.

Factor Analysis Output IV - Component Matrix

Thus far, we concluded that our 16 variables probably measure 4 underlying factors. But which items measure which factors? The component matrix shows the Pearson correlations between the items and the components. For some dumb reason, these correlations are called factor loadings.

[Figure: unrotated component matrix]

Ideally, we want each input variable to measure precisely one factor. Unfortunately, that's not the case here. For instance, v9 measures (correlates with) components 1 and 3. Worse even, v3 and v11 even measure components 1, 2 and 3 simultaneously. If a variable has more than 1 substantial factor loading, we call those cross loadings. And we don't like those. They complicate the interpretation of our factors.

The solution for this is rotation: we'll redistribute the factor loadings over the factors according to some mathematical rules that we'll leave to SPSS. This redefines what our factors represent. But that's ok. We hadn't looked into that yet anyway.

Now, there are different rotation methods, but the most common one is the varimax rotation, short for “variable maximization”. It tries to redistribute the factor loadings such that each variable measures precisely one factor -which is the ideal scenario for understanding our factors. And as we're about to see, our varimax rotation works perfectly for our data.

Factor Analysis Output V - Rotated Component Matrix

Our rotated component matrix (below) answers our second research question: “which variables measure which factors?”

[Figure: rotated component matrix]

Our last research question is: “what do our factors represent?” Technically, a factor (or component) represents whatever its variables have in common. Our rotated component matrix (above) shows that our first component is measured by

  • v17 - I know who can answer my questions on my unemployment benefit.
  • v16 - I've been told clearly how my application process will continue.
  • v13 - It's easy to find information regarding my unemployment benefit.
  • v2 - I received clear information about my unemployment benefit.
  • v9 - It's clear to me what my rights are.

Note that these variables all relate to the respondent receiving clear information. Therefore, we interpret component 1 as “clarity of information”. This is the underlying trait measured by v17, v16, v13, v2 and v9.

After interpreting all components in a similar fashion, we arrived at the following descriptions:

  • Component 1 - “Clarity of information”
  • Component 2 - “Decency and appropriateness”
  • Component 3 - “Helpfulness contact person”
  • Component 4 - “Reliability of agreements”

We'll set these as variable labels after actually adding the factor scores to our data.

Adding Factor Scores to Our Data

It's pretty common to add the actual factor scores to your data. They are often used as predictors in regression analysis or drivers in cluster analysis. SPSS FACTOR can add factor scores to your data but this is often a bad idea for 2 reasons:

  • factor scores will only be added for cases without missing values on any of the input variables. We saw that this holds for only 149 of our 388 cases;
  • factor scores are z-scores: their mean is 0 and their standard deviation is 1. This complicates their interpretation.

In many cases, a better idea is to compute factor scores as means over variables measuring similar factors. Such means tend to correlate almost perfectly with “real” factor scores but they don't suffer from the aforementioned problems. Note that you should only compute means over variables that have identical measurement scales.

It's also a good idea to inspect Cronbach’s alpha for each set of variables over which you'll compute a mean or a sum score. For our example, that would be 4 Cronbach's alphas for 4 factor scores but we'll skip that for now.
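A minimal Python/pandas sketch of this approach is shown below; df and the item list are assumptions based on the rotated component matrix above, and this is not the SPSS syntax the tutorial itself uses:

    import pandas as pd

    # df is assumed to hold the survey items; the item list follows
    # component 1 ("clarity of information") from the rotated matrix above.
    clarity_items = ["v17", "v16", "v13", "v2", "v9"]

    # Mean-based factor score: a case keeps its score even if a few items
    # are missing, unlike regression-based factor scores.
    df["clarity_of_information"] = df[clarity_items].mean(axis=1)

    def cronbach_alpha(items: pd.DataFrame) -> float:
        """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of the sum)."""
        k = items.shape[1]
        item_variances = items.var(axis=0, ddof=1).sum()
        total_variance = items.sum(axis=1).var(ddof=1)
        return k / (k - 1) * (1 - item_variances / total_variance)

    print(cronbach_alpha(df[clarity_items].dropna()))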

Computing and Labeling Factor Scores Syntax

[Figure: descriptive statistics for the computed factor scores]

This descriptives table shows how we interpreted our factors. Because we computed them as means, they have the same 1 - 7 scales as our input variables. This allows us to conclude that

  • “Decency and appropriateness” is rated best (roughly 5.0 out of 7 points) and
  • “Clarity of information” is rated worst (roughly 3.9 out of 7 points).


Factor Analysis: A Short Introduction, Part 1

by guest contributor

Why use factor analysis?


It allows researchers to investigate concepts they cannot measure directly. It does this by using a large number of variables to estimate a few interpretable underlying factors.

What is a factor?

The key concept of factor analysis is that multiple observed variables have similar patterns of responses because of their association with an underlying latent variable, the factor, which cannot easily be measured directly.

For example, people may respond similarly to questions about income, education, and occupation, which are all associated with the latent variable socioeconomic status.

In every factor analysis, there are as many factors as there are variables.  Each factor captures a certain amount of the overall variance in the observed variables, and the factors are always listed in order of how much variation they explain.

The eigenvalue is a measure of how much of the common variance of the observed variables a factor explains.  Any factor with an eigenvalue ≥ 1 explains more variance than a single observed variable.

So if the factor for socioeconomic status had an eigenvalue of 2.3 it would explain as much variance as 2.3 of the three variables.  This factor, which captures most of the variance in those three variables, could then be used in other analyses.

The factors that explain the least amount of variance are generally discarded.  Deciding how many factors are useful to retain will be the subject of another post.

What are factor loadings?

The factor loadings express the relationship of each variable to the underlying factor. Here is an example of the output of a simple factor analysis looking at indicators of wealth, with just six variables and two resulting factors.

Variable                                             Factor 1   Factor 2
Income                                               0.65       0.11
Education                                            0.59       0.25
Occupation                                           0.48       0.19
House value                                          0.38       0.60
Number of public parks in neighborhood               0.13       0.57
Number of violent crimes per year in neighborhood    0.23       0.55

The variable with the strongest association to the underlying latent variable, Factor 1, is income, with a factor loading of 0.65.

Since factor loadings can be interpreted like standardized regression coefficients, one could also say that the variable income has a correlation of 0.65 with Factor 1. Most research fields consider this a strong association for a factor analysis.

Two other variables, education and occupation, are also associated with Factor 1. Based on the variables loading highly onto Factor 1, we could call it “Individual socioeconomic status.”

House value, number of public parks, and number of violent crimes per year, however, have high factor loadings on the other factor, Factor 2. They seem to indicate the overall wealth within the neighborhood, so we may want to call Factor 2 “Neighborhood socioeconomic status.”

Notice that the variable house value also is marginally important in Factor 1 (loading = 0.38). This makes sense, since the value of a person’s house should be associated with his or her income.
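As a quick worked check (using the communality definition given earlier in this document): for house value, the sum of squared loadings across the two factors is 0.38² + 0.60² = 0.1444 + 0.36 ≈ 0.50, so the two factors together explain roughly half of the variance in house value.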

About the Author: Maike Rahn is a health scientist with a strong background in data analysis.   Maike has a Ph.D. in Nutrition from Cornell University.

factor analysis research question

Reader Interactions

' src=

August 29, 2021 at 12:11 am

this is what i was searching. the interpretation.

' src=

March 16, 2021 at 2:44 am

Thanks for posting the best information and the blog is very informative seku .

' src=

November 16, 2020 at 9:34 am

Nice explanation thanks for the good work

' src=

July 30, 2020 at 3:37 am

Explained nicely. Now the meaning of factor loading is clear. But, there is still a confusion. What is eigen value. If eigen value is greater than 1, so what does it mean???

' src=

May 29, 2020 at 1:51 am

Thank you so much for my first understanding on FA

' src=

May 27, 2020 at 2:36 pm

Very nice presentation. I have two questions: 1)on the SPSS output which of the analyses do you prefer-component, pattern or structure? and 2)how do you interpret negative sign loadings? Thanks so much. Tiffany

' src=

April 20, 2020 at 7:27 am

Hi, I am still confused about the factor analysis. If have 6 factors in my analysis table, is it necessary to reduce it to say only 2 factors only? Thanks

' src=

August 22, 2019 at 7:42 pm

Thank you sir for this explanation.my question here can I add principal component analysis and factor analysis to make an analysis?

' src=

June 23, 2019 at 12:16 pm

Dear, In my study,l have selected some municipalities with their different indicators viz. Demographic, education, amenities, health. Here,my quarries is -by which analysis I am going to confirm that the situation of this or that municipality are good or bad. Pls reply.

' src=

May 30, 2019 at 6:50 am

Helpful thank you for help

' src=

April 13, 2019 at 2:44 am

please help me

how many variables minimum we need to run factor analysis? I saw some researchers use at least 15. Is it the rule of thumb?

I have 3 varible and for evry vaible 150 observation can I use factor analysis?

' src=

April 1, 2019 at 11:11 am

Well Explained, I found it very helpful and useful as described in the easiest way to understand it. Thank u.

' src=

March 29, 2019 at 7:01 am

Very clear example and useful coverage to the FA concept

' src=

November 20, 2018 at 4:45 am

Dear Mr Rahn,

I would like to ask for your piece of advice on the following questions in relation to factor analysis: 1) How do you decide how many factors should be extracted? For instance, I have 44 variables in my survey and data is mainly categorical. 2) Do you conduct the factor analysis for all of variables at once or it is best to first prepare a bunch of variables and conduct the analysis. In my case, should I make like for instance 4 bunches of 11 variables and on a separate case run the factor analysis for each of the bunches. Does this mean that I should in advance make a descriptive statistic for each variable? 3) Once conducting a principle factor analysis for all variables, I see that the highest correlations have value 0,252 or 0,314 (in the correlation matrix). Does this mean that the model is insignificant?

Thank you in advance for your kind guidance.

Kind regards, Mariya Zheleva PhD student at Sofia University “St. Kliment Ohridski”, Bulgaria and at UVSQ in Paris, France

' src=

October 11, 2019 at 6:57 am

can someone respond to this question please.

I am facing the same problem

' src=

October 30, 2018 at 6:16 am

Easy to understand. thank you.

' src=

August 31, 2018 at 5:31 pm

Really nice summary! Precise and comprehensive! Much appreciated,

' src=

July 11, 2018 at 11:18 pm

easy to understand.thks

' src=

May 4, 2018 at 10:54 am

Clear, precise, simple to understand!

' src=

May 4, 2018 at 2:47 am

Hi, how are the factors obtained?

' src=

April 24, 2018 at 8:05 am

How you get factor 1 and Factor 2 ??

' src=

February 5, 2018 at 3:00 pm

You are happy evening I would like to ask you about your effective position on whether it is possible to use counting variables with factor analysis thanks Best wishes from IRAQ

' src=

February 14, 2018 at 11:23 am

It’s possible. The assumption is that all variables are normally distributed. Count variables are often skewed, but not always. So check your distributions.

' src=

January 19, 2018 at 6:17 am

Dear Maike,

thank you so much for your clear and useful explanation. I totally understand how to apply it well.

Best wishes from Germany

' src=

November 16, 2017 at 1:52 am

Thank you. It was easy to understand.

' src=

November 13, 2017 at 10:10 am

thanks a lot for the information

' src=

October 21, 2017 at 3:02 pm

The article states “In every factor analysis, there are the same number of factors as there are variables”. However the table used in the example shows 6 variables and 2 factors. Why are the two numbers not equal? Does “variable” have different meanings in the statement and the table?

Thanks in advance for any clarification.

January 29, 2018 at 12:18 pm

Mark, Because although there are as many factors as variables, they aren’t all useful. So part of the job of the data analyst is to decide how many factors are useful and therefore retained.

' src=

September 30, 2017 at 1:44 pm

This is a clear and straight forward explanation.

September 30, 2017 at 1:42 pm

This clear and straight forward explanation. Thank you

' src=

September 14, 2017 at 5:25 am

Thank you for the clear explanation!

' src=

September 4, 2017 at 7:50 am

Thanks for the simplicity and clear info 🙂

' src=

August 17, 2017 at 7:22 pm

Thanks. It was explained very well.

' src=

June 29, 2017 at 2:04 am

' src=

June 21, 2017 at 10:13 am

It is a well written article. If I understood correctly, we may use many questionnaire to assess some construct like Motivation. For this, I may include questions related to Work environment, Supervisor relationship, pay and other benefits, job satisfaction, training facilities etc., So there are five subcategories under which I have framed the questions. A factor analysis, if done properly should result at least in five factors. So, a factor analysis tries to stratify the questions included in the survey to homogeneous sub groups. Whether my understanding is correct?

' src=

May 30, 2017 at 9:59 am

commendable . best explanation so far

' src=

April 5, 2017 at 6:59 pm

so if i understood it well, the FA can be used to analyse a data on “barroriers” to effective communication. That is when i have about 20 factors of the barriers to analyse. Thank you

' src=

March 29, 2017 at 1:46 am

God Bless you. it was an interesting, simple and understandable. it was well written and to the point. helped me a lot

' src=

January 15, 2017 at 3:22 am

Thanks for your contribution of FA. It’s is helping but need a hypothesis to support it

' src=

October 16, 2016 at 3:58 am

Dr Maike Rahn, Thanks so much for the short explanation of what factor analysis is all about. I fully understand how to apply. I wish one day you read my piece of work. Kindest regards from Queenstown in Eastern Cape-South Africa

' src=

October 14, 2016 at 2:42 pm

Hey, could you please name 4 psychological tests based on factor analysis, such as 16 PF and NEO, any other tests that you have come across? Thanks.

' src=

September 29, 2016 at 6:27 pm

I have read several articles trying to explain factor analysis. This one is the easiest to understand because it is clear and concise.

' src=

July 26, 2016 at 3:07 am

Is it safe to say that factor analysis is the the analysis done in seeking the relationship of demographic and the variables (dependent, mediator, moderator) in the study? or Or is it the analysis done on every items under a construct? to see the loading among the items that represent the construct. Do help me as I still cant figure out what factor analysis is. Kindly assist. Many thanks.

October 14, 2016 at 11:47 am

Hi Mike, No, FA isn’t done to seek relationship between different variables in a relationship model.

Factor Analysis is a measurement model for an unmeasured variable (a construct). So it’s closer to your latter definition.

' src=

July 18, 2016 at 4:24 am

Thank you very much! The clearest explanation I ever read. Regards from Spain.

November 13, 2017 at 10:08 am

Thank you very much. I fully understand how to apply it.

' src=

July 17, 2016 at 12:34 pm

Thank you for easier explanation. It definitely will helpful for my next step of data analysis.

' src=

July 5, 2016 at 7:10 am

Excellent description, very helpful to build understanding of the topic.

' src=

June 29, 2016 at 5:10 am

Explained in the simplest way even a lay man can understand. Thanks a bunch.

' src=

June 26, 2016 at 1:57 pm

Simple and very clear explanation. It’s very clear for me now. Thank you.

' src=

June 18, 2016 at 6:20 pm

Very nice explained, as simple as lay mans language

' src=

June 6, 2016 at 3:08 am

I wish everything had such an easy to understand definition! Thank you

' src=

June 6, 2016 at 12:31 am

Very crisp, clear and concise explanation. Thanks a ton.

' src=

May 30, 2016 at 7:37 am

have been through many documents about factor analysis, yours is the most clear explanation. Thanks big time

' src=

May 19, 2016 at 4:02 pm

this is the best explanation that i have understand, keep on the standard Dr,,

' src=

May 19, 2016 at 9:56 am

I like it. kudos!

' src=

May 19, 2016 at 3:11 am

Very nice explanation of factor analysis. Keep up the nice work. A small request to you sir – please start small regular tutorials on statistics & data analysis.

' src=

April 14, 2016 at 5:42 pm

Just adding my thanks to the list so you keep the posts coming!

' src=

April 10, 2016 at 10:21 am

OMG ! As I have searched many of websites for factor analysis. This was the best and easiest explanation i found yet. Really helpful ! Great attempt ! Keep on doing social service !

' src=

March 11, 2016 at 6:44 pm

that is very nice explanation. you are so wonderful

' src=

March 9, 2016 at 6:56 am

Very lucid introduction on factors which would be useful to any novice to FA.

' src=

March 5, 2016 at 9:37 am

' src=

February 20, 2016 at 1:26 am

Simple but valuable explanation. Thanks.

' src=

December 30, 2015 at 2:08 pm

Thank you for your clear explanation of factor loading!

' src=

December 8, 2015 at 5:56 am

thanks for the introduction on factor analysis

' src=

November 27, 2015 at 11:28 pm

Excellent explanation of the basics, in my language there is a saying ( around 2000 years old) “Good teachings should have the quality of mothers milk,being good ,simple,digestable and sustaining) and I feel I have found it for Factor analysis. Keep up the good work!

' src=

September 30, 2015 at 7:40 am

Explained in one of the best ways possible!!! Helps you understand by just reading it once (quite the contrary for the definitions on the other websites)

' src=

September 29, 2015 at 4:57 am

Hi Maike, I have a survey with 15 q, 3 measure reading ability, 3 writing, 3 understanding, 3 measure monetary values and 3 measure literacy unrelated aspects. I am confused do I pick the read, write and understanding on the SPSS for factor analysis? how about the literacy unrelated q which are controls? Thanks for your help. Sat

' src=

September 19, 2015 at 6:27 am

Very simple and straight forward…Thanx

' src=

September 16, 2015 at 7:18 pm

Very clear explanation and useful examples. Thanks. I woudl liek to aks you somehting. I have a questionnaire of 52 items (I used it for Pilot Sutdy)and I have done FA obtaining 1O factors after reduction. I need to reduce the number of questions since 52 is too much and leave the most ‘powerful’ can I use the FA analysis to reduce the number of questions? Thank you

' src=

July 30, 2015 at 7:15 am

I would like to design a questionnaire using Likert scale that I can use for factor analysis. my challenge is should I mix positive statements and negative statements in my compilation of the questionnaire? e.g. Let us say I need to find out the view of a student if they have a negative attitude towards learning a subject. Should I say in my questionnaire, “I have a negative attitude towards Mathematics.” or I do not have a negative attitude towards Mathematics.”

' src=

June 2, 2015 at 8:36 pm

A very good work, thank you sir.

' src=

May 9, 2015 at 10:11 pm

It seems to me you have mixed up the difference between factor analysis and PCA (Principal Component Analysis). Where you talked about the amount of variance a factor captures and eigenvalue that measures that. it is principal components in PCA that tells you that because each principal component is orthogonal to the others and associated with an eigen-vector with a corresponding eigenvalue.

If not please let me know how eigenvalues of factors are calculated in factor anlysis

' src=

April 23, 2015 at 2:24 pm

Very simple and nice explainations

' src=

April 17, 2015 at 6:48 am

' src=

March 30, 2015 at 10:47 am

Thanks Doc This has been the most understandable explanation I have so far had. You mentioned something about your next post? about determination of number of factors. May you please also talk about factor analysis using R.

' src=

March 29, 2015 at 6:32 am

Good day to you. I have a question on factor analysis. I have a pool of 30 items for my construct, then I conducted the PCs, with nine items. After conducted the CFA, it only has three items. Does this acceptable ? Thank you.

' src=

March 26, 2015 at 10:31 am

Fantastic explanation!! Thank you

' src=

January 12, 2015 at 3:38 am

I have two kinds of questions: one with a 5-option response and another with a 7-option one. Can I run exploratory FA on both at the same time? When I run them with SPSS it lead to 8 factors that can explain 61% of the variance. But, mathematically, is it right?

' src=

December 31, 2014 at 11:28 am

Hi Rahn, Great Job.!!! How am I suppose to put citations to your web site?

' src=

December 11, 2014 at 10:16 pm

FACTOR ANALYSIS IS VERY USEFUL METHOD FOR ANALYSING SCIENTIFIC DATA PARTICULARLY FOR DATA RELATING TO BIOTECH AND FOOD TECNOLOGY AND ANIMAL BEHAVIOUR ALSO;Principal component analysis and exploratory factor analysis are both data reduction techniques — techniques to combine a group of correlated variables into fewer variables. You can then use those combination variables — indices or subscales — in other analyses.

' src=

September 8, 2015 at 1:33 am

Dear sir, I am a new research student please help me about ”Comparatively study on data reduction method between factor analysis and principal component analysis”. Kindly guide me about this I will waiting for your answer.

' src=

October 25, 2014 at 5:57 pm

I am grateful to have little idea on how to apply factor analysis. But stil sir! How would I enter data on exel spreat sheet and how will I start running the analysis? I am ph.D student and one of my objective of the study has to do with factor analysis. I have identify four factors with twenty three variable in question. Pls explain step by step for me. Thanks and best regard. Looking forward to hear from you sir.

' src=

October 24, 2014 at 3:15 pm

' src=

October 7, 2014 at 6:25 am

Thank you very much Dr. Rahn. I have struggled 13 months to understand Factor Analysis, and this has been the simple and very helpful. Thank you again.

' src=

September 24, 2014 at 12:00 pm

Dear Dr Thanks very much for you explanation on factor analysis, even those who beginners in statistics like me can follow your elaborations. its so illuminating. have gone through several text on factor analysis but could hardly capture the concept, Thanks

' src=

September 23, 2014 at 3:55 pm

As i am using Factor analysis by SPSS in my master research, i got five factors related to my research. At the end of the results by spss there is a 5*5 matrix ( 5 are the factors ). What does this matrix endicated for? in the beginning i thought it is a correlation matrix of the factors, but then i’ve been told no it isn’t ( without giving me what it is exactly). Can you help please? p.s ; welcome to everybodys’ answer.

' src=

August 3, 2014 at 2:42 am

This was simple and clear with commonsense.

' src=

July 21, 2014 at 7:40 am

very usefull an understandable explanation.saved lit if time bcoz if this easy explationation..thank you…sir mikhe…

' src=

July 18, 2014 at 7:14 am

Thanks a lot this made my life a lot easier in the PHD Thanks again!!

' src=

July 13, 2014 at 8:33 pm

Dr. Rahn- I’ve been trying all afternoon to understand a research article that used this method and this was the first explanation that has helped me. Thank you very much for posting it!

' src=

June 30, 2014 at 11:01 am

Thanks, this was great. simple and to the point. many thanks.

' src=

March 11, 2014 at 4:54 am

very simple and informative.

' src=

November 10, 2013 at 10:53 am

the first one is correct. the Factor is a linear combination of the original variable. Hence, your first formula, represents the required info.

' src=

September 17, 2013 at 10:14 pm

Dear Dr. Rahn,

I would like to hear your opinion if this method is valid:

I have used a PLS model and created an ‘factor’ (lets called it “Loyalty”). To make that factor I’ve used four variables and the factor loadings are the following:

s1 factorloading: 0,934 s2 factorloading: 0,886 s3 factorloading: 0,913 s4 factorloading: 0,937

Next I would like to estimate the loyalty of a respondent, who has the following values:

s1 = 3 s2 = 4 s3 = 4 s4 = 2

How can I emerge these values to one value and group each respondent into e.g. two groups (e.g. high loyalty, low loyalty)

I have an idea: I use this formular:

Sum of (factorloading (si) * values(si))

(0.934 * 3) + (0.886 * 4) + (0.913 * 4) * (0.937 * 2) = 11.872

or maybe this formular:

Sum of (factorloadings(si) / (sum of factorloadings(s1,s2,s3,s4)) * values(si)

((0.934/(0.934+0.886+0.913+0.937)) * 3) + ((0.886/ (0.934+0.886+0.913+0.937)) * 4 + ((0.913 * (0.934+0.886+0.913+0.937)) * 4 + ((0.937 * (0.934+0.886+0.913+0.937)) * 2) = 3.23 Using this formular in this example would give the respondent a value of:

which formular is the right one (if any), and if either of them are the right one, what is?

p.s. Anyone is welcome to answer this question 🙂

' src=

September 12, 2013 at 9:04 am

Very clear and useful description, also understandable for non-mathematicians, e.g. linguists. Many thanks for posting this!


August 17, 2013 at 7:33 pm

Hello Dr. Rahn

This was the best and easiest-to-understand explanation of factor analysis I have found. I will bookmark your page as a future reference. Thanks.


Exploratory Factor Analysis

Factor analysis is a method that aims to uncover structure in large sets of variables. If you have a data set with many variables, it is possible that some of them are interrelated, i.e., correlate with each other. These correlations are the basis of factor analysis.

The aim of factor analysis is to divide the variables into groups: to separate the variables that correlate strongly with each other from those that correlate less strongly.

What is a factor?

In factor analysis, a factor can be seen as a hidden variable that influences several actually observed variables.


Or, in other words, several variables are observable phenomena of fewer underlying factors.

In factor analysis, therefore, the variables that are highly correlated with each other are combined. It is assumed that this correlation is due to a non-measurable variable, which is called a factor.

Example Factor Analysis

Factor analysis can be used to answer the following questions:

  • What structure can be detected in the data?
  • How can the data be reduced to some factors?

The following table contains examples of content that show where factor analysis is used in different fields of expertise.

Field | Question | Variables | Possible factors
Psychology | Can different personality traits be grouped into personality types? | sociable, spontaneous, curious, nervous, aggressive, etc. | Neuroticism, Extraversion, Openness, Conscientiousness, Agreeableness
Business administration | How can different cost types be summarized into cost characteristics? | material costs, personnel costs, equipment costs, fixed costs, etc. | influenceability, urgency of coverage

Research questions for factor analysis

A possible research question might be: Can different personality traits such as outgoing, curious, sociable, or helpful be grouped into personality types such as conscientious, extraverted, or agreeable?

Exploratory Factor Analysis

You want to find out whether some of the characteristics outgoing, sociable, hard-working, dutiful, warm-hearted, or helpful correlate with each other and can be described by an underlying factor. To find out, you created a small survey with DATAtab.

You have interviewed 20 people and output the results to an Excel table. Here you can find the example data set for the principal component analysis, with which you can calculate the example directly online on DATAtab under Factor Analysis Calculator.

Factor loading, eigenvalue, communalities

The important terms, or characteristic values, in a factor analysis are the factor loading, the eigenvalue, and the communalities. With their help, you can see how strong the correlation between the individual variables and the factors is (the formulas after this list state the same definitions in symbols).

Factor loading

  • Correlation between a variable and a factor
  • Loading of a variable on a factor

Eigenvalue

  • The variance explained by a factor
  • Sum of the squared factor loadings of a factor

Communalities

  • Variance of a variable that is explained by all factors
  • Sum of the squared factor loadings of a variable
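In symbols (notation added here for clarity): writing \(a_{ij}\) for the loading of variable \(i\) on factor \(j\), the definitions above become

$$\lambda_j = \sum_{i} a_{ij}^2 \qquad \text{and} \qquad h_i^2 = \sum_{j} a_{ij}^2,$$

so an eigenvalue \(\lambda_j\) sums the squared loadings down one factor, and a communality \(h_i^2\) sums the squared loadings across all factors for one variable.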


Correlation Matrix

The first step in factor analysis is to calculate the correlation matrix. Starting from the correlation matrix, the so-called eigenvalue problem is solved, which is used to calculate the factors.

Correlation Matrix PCA

Factor Analysis and dimensionality

It is important to note, however, that factor analysis does not give a "clear" answer as to how many factors must be used and how these factors can then be interpreted.

There are two common methods to determine the number of required factors: the eigenvalue criterion (Kaiser criterion) and the scree test.

Eigenvalue criterion (Kaiser criterion)

To determine the number of factors (i.e., the dimensions) using the eigenvalue criterion, also called the Kaiser criterion, the eigenvalues of the individual factors are needed. Once these are calculated, all factors with eigenvalues greater than 1 are retained.

Scree test

To determine the number of factors with the help of the scree test, or scree plot, the eigenvalues are sorted by size and displayed as a line chart. The number of factors can be read off where the chart bends (the "elbow").

Furthermore, the table "Explained total variance" shows the variance explained by each individual factor as well as the cumulative variance.

Explained total variance PCA

Once the number of factors is determined, the communalities can be calculated. As written above, the communality indicates the variance of a variable that is explained by all factors. If, for example, three factors were selected, the communalities give the proportion of each variable's variance that can be described by these three factors.

Communalities

Component matrix

The component matrix indicates the factor loadings of the factors on the variables. Since the first factor explains most of the variance, the values for the first component, or factor, are the largest. With this form of representation, however, it is difficult to make a statement about the factors, so the matrix is rotated.

Component matrix PCA

Rotation Matrix

The computation of the component matrix has the consequence that many variables load highly on the first factor, which means the component matrix usually cannot be interpreted meaningfully. Therefore, this matrix is rotated. There are different procedures for this rotation, but the most common is the analytical Varimax rotation.

Varimax Rotation

The Varimax rotation analytically ensures that, for each factor, certain variables load as high as possible while the other variables load as low as possible. This is achieved when the variance of the factor loadings per factor is as high as possible.

Rotation Matrix (Varimax)

Here we can now see that "outgoing" and "sociable" load on Extraversion, "hard-working" and "dutiful" load on Conscientiousness, and "warm-hearted" and "helpful" load on Agreeableness.



A Practical Introduction to Factor Analysis: Exploratory Factor Analysis

This seminar is the first part of a two-part seminar that introduces central concepts in factor analysis. Part 1 focuses on exploratory factor analysis (EFA). Although the implementation is in SPSS, the ideas carry over to any software program. Part 2 introduces confirmatory factor analysis (CFA); please refer to A Practical Introduction to Factor Analysis: Confirmatory Factor Analysis.

I. Exploratory Factor Analysis

  • Motivating example: The SAQ
  • Pearson correlation formula

Partitioning the variance in factor analysis

  • principal components analysis
  • principal axis factoring
  • maximum likelihood

Simple Structure

  • Orthogonal rotation (Varimax)
  • Oblique (Direct Oblimin)
  • Generating factor scores


Introduction

Suppose you are conducting a survey and you want to know whether the items in the survey have similar patterns of responses: do these items "hang together" to create a construct? The basic assumption of factor analysis is that for a collection of observed variables there is a set of underlying variables called factors (fewer in number than the observed variables) that can explain the interrelationships among those variables. Let's say you conduct a survey and collect responses about people's anxiety about using SPSS. Do all these items actually measure what we call "SPSS Anxiety"?


Motivating Example: The SAQ (SPSS Anxiety Questionnaire)

Let's proceed with our hypothetical example of the survey, which Andy Field terms the SPSS Anxiety Questionnaire. For simplicity, we will use the so-called "SAQ-8", which consists of the first eight items in the SAQ. Click on the preceding hyperlinks to download the SPSS version of both files. The SAQ-8 consists of the following questions:

  • Statistics makes me cry
  • My friends will think I’m stupid for not being able to cope with SPSS
  • Standard deviations excite me
  • I dream that Pearson is attacking me with correlation coefficients
  • I don’t understand statistics
  • I have little experience of computers
  • All computers hate me
  • I have never been good at mathematics

Pearson Correlation of the SAQ-8

Let’s get the table of correlations in SPSS Analyze – Correlate – Bivariate:

Correlations
Item 1 2 3 4 5 6 7 8
1 1
2 -.099 1
3 -.337 .318 1
4 .436 -.112 -.380 1
5 .402 -.119 -.310 .401 1
6 .217 -.074 -.227 .278 .257 1
7 .305 -.159 -.382 .409 .339 .514 1
8 .331 -.050 -.259 .349 .269 .223 .297 1
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).

From this table we can see that most items have some correlation with each other, ranging from \(r=-0.382\) for Items 3 and 7 to \(r=0.514\) for Items 6 and 7. Due to the relatively high correlations among items, this would be a good candidate for factor analysis. Recall that the goal of factor analysis is to model the interrelationships between items with fewer (latent) variables. These interrelationships can be broken up into multiple components.

Since the goal of factor analysis is to model the interrelationships among items, we focus primarily on the variance and covariance rather than the mean. Factor analysis assumes that variance can be partitioned into two types of variance, common and unique:

  • Communality (also called \(h^2\)) is a definition of common variance that ranges between \(0 \) and \(1\). Values closer to 1 suggest that extracted factors explain more of the variance of an individual item.
  • Specific variance: variance that is specific to a particular item (e.g., Item 7 "All computers hate me" may have variance that is attributable to anxiety about computers in addition to anxiety about SPSS).
  • Error variance:  comes from errors of measurement and basically anything unexplained by common or specific variance (e.g., the person got a call from her babysitter that her two-year old son ate her favorite lipstick).

The figure below shows how these concepts are related:

[Figure: an item's total variance partitioned into common variance and unique variance (specific variance plus error variance)]

Performing Factor Analysis

As a data analyst, the goal of a factor analysis is to reduce the number of variables needed to explain and interpret the results. This can be accomplished in two steps:

  • factor extraction
  • factor rotation

Factor extraction involves making a choice about the type of model as well the number of factors to extract. Factor rotation comes after the factors are extracted, with the goal of achieving  simple structure  in order to improve interpretability.

Extracting Factors

There are two approaches to factor extraction, which stem from different approaches to variance partitioning: a) principal components analysis and b) common factor analysis.

Principal Components Analysis

Unlike factor analysis, principal components analysis (PCA) makes the assumption that there is no unique variance: the total variance is equal to the common variance. Recall that variance can be partitioned into common and unique variance; if there is no unique variance, then common variance takes up the total variance. Additionally, if the total variance is 1, then the common variance is equal to the communality.

Running a PCA with 8 components in SPSS

The goal of a PCA is to replicate the correlation matrix using a set of components that are fewer in number than, and linear combinations of, the original set of items. Although the following analysis defeats the purpose of doing a PCA, we will begin by extracting as many components as possible as a teaching exercise, so that we can decide on the optimal number of components to extract later.

First go to Analyze – Dimension Reduction – Factor. Move all the observed variables over to the Variables: box to be analyzed.


Under Extraction – Method, pick Principal components and make sure to Analyze the Correlation matrix. We also request the Unrotated factor solution and the Scree plot. Under Extract, choose Fixed number of factors, and under Factor to extract enter 8. We also bumped up the Maximum Iterations of Convergence to 100.


The equivalent SPSS syntax is shown below:
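What follows is a sketch of the pasted syntax for the dialog choices above, assuming the eight SAQ-8 items are named q01 through q08 (the names used in the regression example later in this seminar):

FACTOR
  /VARIABLES q01 q02 q03 q04 q05 q06 q07 q08
  /MISSING LISTWISE
  /ANALYSIS q01 q02 q03 q04 q05 q06 q07 q08
  /PRINT INITIAL EXTRACTION
  /PLOT EIGEN
  /CRITERIA FACTORS(8) ITERATE(100)
  /EXTRACTION PC
  /ROTATION NOROTATE
  /METHOD=CORRELATION.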

Eigenvalues and Eigenvectors

Before we get into the SPSS output, let’s understand a few things about eigenvalues and eigenvectors.

Eigenvalues represent the total amount of variance that can be explained by a given principal component.  They can be positive or negative in theory, but in practice they explain variance which is always positive.

  • If eigenvalues are greater than zero, then it’s a good sign.
  • Since variance cannot be negative, negative eigenvalues imply the model is ill-conditioned.
  • Eigenvalues close to zero imply there is item multicollinearity, since all the variance can be taken up by the first component.

Eigenvalues are also the sum of squared component loadings across all items for each component, which represent the amount of variance in each item that can be explained by the principal component.

Eigenvectors represent a weight for each eigenvalue. The eigenvector times the square root of the eigenvalue gives the component loadings, which can be interpreted as the correlation of each item with the principal component. For this particular PCA of the SAQ-8, the eigenvector value associated with Item 1 on the first component is \(0.377\), and the eigenvalue of the first component is \(3.057\). We can calculate the loading of Item 1 on the first component as

$$(0.377)\sqrt{3.057}= 0.659.$$

In this case, we can say that the correlation of the first item with the first component is \(0.659\). Let’s now move on to the component matrix.

Component Matrix

The components can be interpreted as the correlation of each item with the component. Each item has a loading corresponding to each of the 8 components. For example, Item 1 is correlated \(0.659\) with the first component, \(0.136\) with the second component and \(-0.398\) with the third, and so on.

The square of each loading represents the proportion of variance (think of it as an \(R^2\) statistic) explained by a particular component. For Item 1, \((0.659)^2=0.434\) or \(43.4\%\) of its variance is explained by the first component. Subsequently, \((0.136)^2 = 0.018\) or \(1.8\%\) of the variance in Item 1 is explained by the second component. The total variance explained by both components is thus \(43.4\%+1.8\%=45.2\%\). If you keep adding the squared loadings cumulatively across all eight components, the sum equals 1, or 100%. This is also known as the communality, and in a PCA the communality for each item is equal to the total variance.

Component Matrix
Item Component
1 2 3 4 5 6 7 8
1 0.659 0.136 -0.398 0.160 -0.064 0.568 -0.177 0.068
2 -0.300 0.866 -0.025 0.092 -0.290 -0.170 -0.193 -0.001
3 -0.653 0.409 0.081 0.064 0.410 0.254 0.378 0.142
4 0.720 0.119 -0.192 0.064 -0.288 -0.089 0.563 -0.137
5 0.650 0.096 -0.215 0.460 0.443 -0.326 -0.092 -0.010
6 0.572 0.185 0.675 0.031 0.107 0.176 -0.058 -0.369
7 0.718 0.044 0.453 -0.006 -0.090 -0.051 0.025 0.516
8 0.568 0.267 -0.221 -0.694 0.258 -0.084 -0.043 -0.012
Extraction Method: Principal Component Analysis.
a. 8 components extracted.

Summing the squared component loadings across the components (columns) gives you the communality estimates for each item, and summing each squared loading down the items (rows) gives you the eigenvalue for each component. For example, to obtain the first eigenvalue we calculate:

$$(0.659)^2 + (-0.300)^2 + (-0.653)^2 + (0.720)^2 + (0.650)^2 + (0.572)^2 + (0.718)^2 + (0.568)^2 = 3.057$$

You will get eight eigenvalues for eight components, which leads us to the next table.

Total Variance Explained in the 8-component PCA

Recall that the eigenvalue represents the total amount of variance that can be explained by a given principal component. Starting from the first component, each subsequent component is obtained from partialling out the previous component. Therefore the first component explains the most variance, and the last component explains the least. Looking at the Total Variance Explained table, you will see the total variance explained by each component. For example, Component 1 explains \(3.057\), or \(3.057/8 = 38.21\%\), of the total variance. Because we extracted the same number of components as the number of items, the Initial Eigenvalues column is the same as the Extraction Sums of Squared Loadings column.

Total Variance Explained
Component Initial Eigenvalues Extraction Sums of Squared Loadings
Total % of Variance Cumulative % Total % of Variance Cumulative %
1 3.057 38.206 38.206 3.057 38.206 38.206
2 1.067 13.336 51.543 1.067 13.336 51.543
3 0.958 11.980 63.523 0.958 11.980 63.523
4 0.736 9.205 72.728 0.736 9.205 72.728
5 0.622 7.770 80.498 0.622 7.770 80.498
6 0.571 7.135 87.632 0.571 7.135 87.632
7 0.543 6.788 94.420 0.543 6.788 94.420
8 0.446 5.580 100.000 0.446 5.580 100.000
Extraction Method: Principal Component Analysis.

Choosing the number of components to extract

Since the goal of running a PCA is to reduce our set of variables down, it would be useful to have a criterion for selecting the optimal number of components, which is of course smaller than the total number of items. One criterion is to choose components that have eigenvalues greater than 1. Under the Total Variance Explained table, we see that the first two components have an eigenvalue greater than 1. This can be confirmed by the Scree Plot, which plots the eigenvalue (total variance explained) by the component number. Recall that we checked the Scree Plot option under Extraction – Display, so the scree plot should be produced automatically.

[Scree plot: eigenvalues (total variance explained) plotted against component number]

The first component will always have the highest total variance and the last component will always have the least, but where do we see the largest drop? If you look at Component 2, you will see an "elbow" joint. This is the marking point where it's perhaps not too beneficial to continue further component extraction. There are some conflicting definitions of how to interpret the scree plot, but some say to take the number of components to the left of the "elbow". Following this criterion we would pick only one component. A more subjective interpretation of the scree plot suggests that any number of components between 1 and 4 would be plausible, and further corroborative evidence would be helpful.

Some criteria say that the total variance explained by all components should be between 70% and 80%, which in this case would mean about four to five components. The authors of the book say that this may be untenable for social science research, where extracted factors usually explain only 50% to 60% of the variance. Picking the number of components is a bit of an art and requires input from the whole research team. Let's suppose we talked to the principal investigator and she believes that the two-component solution makes sense for the study, so we will proceed with the analysis.

Running a PCA with 2 components in SPSS

Running the two component PCA is just as easy as running the 8 component solution. The only difference is under Fixed number of factors – Factors to extract you enter 2.


We will focus on the differences in the output between the eight- and two-component solutions. Under Total Variance Explained, we see that the Initial Eigenvalues no longer equal the Extraction Sums of Squared Loadings. The main difference is that there are now only two rows of extraction eigenvalues, and the cumulative percent variance goes up to \(51.54\%\).

Total Variance Explained
Component Initial Eigenvalues Extraction Sums of Squared Loadings
Total % of Variance Cumulative % Total % of Variance Cumulative %
1 3.057 38.206 38.206 3.057 38.206 38.206
2 1.067 13.336 51.543 1.067 13.336 51.543
3 0.958 11.980 63.523
4 0.736 9.205 72.728
5 0.622 7.770 80.498
6 0.571 7.135 87.632
7 0.543 6.788 94.420
8 0.446 5.580 100.000
Extraction Method: Principal Component Analysis.

Similarly, you will see that the Component Matrix has the same loadings as the eight-component solution but instead of eight columns it’s now two columns.

Component Matrix
Item Component
1 2
1 0.659 0.136
2 -0.300 0.866
3 -0.653 0.409
4 0.720 0.119
5 0.650 0.096
6 0.572 0.185
7 0.718 0.044
8 0.568 0.267
Extraction Method: Principal Component Analysis.
a. 2 components extracted.

Again, we interpret Item 1 as having a correlation of 0.659 with Component 1. From glancing at the solution, we see that Item 4 has the highest correlation with Component 1 and Item 2 the lowest. Similarly, we see that Item 2 has the highest correlation with Component 2 and Item 7 the lowest.

Quick check:

True or False

  • The elements of the Component Matrix are correlations of the item with each component.
  • The sum of the squared eigenvalues is the proportion of variance under Total Variance Explained.
  • The Component Matrix can be thought of as correlations and the Total Variance Explained table can be thought of as \(R^2\).

1.T, 2.F (sum of squared loadings), 3. T

Communalities of the 2-component PCA

The communality is the sum of the squared component loadings up to the number of components you extract. In the SPSS output you will see a table of communalities.

Communalities
Initial Extraction
1 1.000 0.453
2 1.000 0.840
3 1.000 0.594
4 1.000 0.532
5 1.000 0.431
6 1.000 0.361
7 1.000 0.517
8 1.000 0.394
Extraction Method: Principal Component Analysis.

Since PCA is an iterative estimation process, it starts with 1 as an initial estimate of the communality (since this is the total variance across all 8 components), and then proceeds with the analysis until a final communality is extracted. Notice that the Extraction column is smaller than the Initial column because we only extracted two components. As an exercise, let's manually calculate the first communality from the Component Matrix. The first ordered pair is \((0.659,0.136)\), which represents the correlation of the first item with Component 1 and Component 2. Recall that squaring the loadings and summing across the components (columns) gives us the communality:

$$h^2_1 = (0.659)^2 + (0.136)^2 = 0.453$$

Going back to the Communalities table, if you sum down all 8 items (rows) of the Extraction column, you get \(4.123\). If you go back to the Total Variance Explained table and sum the first two eigenvalues, you get \(3.057+1.067=4.124\) (the small difference is rounding). Is that surprising? Basically it's saying that summing the communalities across all items is the same as summing the eigenvalues across all components.

1. In a PCA, when would the communality for the Initial column be equal to the Extraction column?

Answer : When you run an 8-component PCA.

  • The eigenvalue represents the communality for each item.
  • For a single component, the sum of squared component loadings across all items represents the eigenvalue for that component.
  • The sum of eigenvalues for all the components is the total variance.
  • The sum of the communalities down the components is equal to the sum of eigenvalues down the items.

1. F, the eigenvalue is the total communality across all items for a single component, 2. T, 3. T, 4. F (you can only sum communalities across items, and sum eigenvalues across components, but if you do that they are equal).

Common Factor Analysis

The partitioning of variance differentiates a principal components analysis from what we call common factor analysis. Both methods try to reduce the dimensionality of the dataset down to fewer unobserved variables, but whereas PCA assumes that common variance takes up all of the total variance, common factor analysis assumes that total variance can be partitioned into common and unique variance. It is usually more reasonable to assume that you have not measured your set of items perfectly. The unobserved or latent variable that makes up common variance is called a factor, hence the name factor analysis. The other main difference between PCA and factor analysis lies in the goal of your analysis. If your goal is to simply reduce your variable list down into a linear combination of smaller components, then PCA is the way to go. However, if you believe there is some latent construct that defines the interrelationship among items, then factor analysis may be more appropriate. In this case, we assume that there is a construct called SPSS Anxiety that explains why you see a correlation among all the items on the SAQ-8. We acknowledge, however, that SPSS Anxiety cannot explain all the shared variance among items in the SAQ, so we model the unique variance as well. Based on the results of the PCA, we will start with a two-factor extraction.

Running a Common Factor Analysis with 2 factors in SPSS

To run a factor analysis, use the same steps as running a PCA (Analyze – Dimension Reduction – Factor) except under Method choose Principal axis factoring. Note that we continue to set Maximum Iterations for Convergence at 100 and we will see why later.


Pasting the syntax into the SPSS Syntax Editor we get:
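A sketch of the pasted syntax, under the same assumption that the items are named q01 through q08:

FACTOR
  /VARIABLES q01 q02 q03 q04 q05 q06 q07 q08
  /MISSING LISTWISE
  /ANALYSIS q01 q02 q03 q04 q05 q06 q07 q08
  /PRINT INITIAL EXTRACTION
  /CRITERIA FACTORS(2) ITERATE(100)
  /EXTRACTION PAF
  /ROTATION NOROTATE
  /METHOD=CORRELATION.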

Note the main difference is under /EXTRACTION we list PAF for Principal Axis Factoring instead of PC for Principal Components. We will get three tables of output, Communalities, Total Variance Explained and Factor Matrix. Let’s go over each of these and compare them to the PCA output.

Communalities of the 2-factor PAF

Communalities
Item Initial Extraction
1 0.293 0.437
2 0.106 0.052
3 0.298 0.319
4 0.344 0.460
5 0.263 0.344
6 0.277 0.309
7 0.393 0.851
8 0.192 0.236
Extraction Method: Principal Axis Factoring.

The most striking difference between this communalities table and the one from the PCA is that the initial communalities are no longer 1. Recall that for a PCA, we assume the total variance is completely taken up by the common variance, or communality, and therefore we pick 1 as our best initial guess. Principal axis factoring, instead of guessing 1 as the initial communality, uses the squared multiple correlation coefficient \(R^2\) of each item with all the other items. To see this in action for Item 1, run a linear regression where Item 1 is the dependent variable and Items 2-8 are independent variables. Go to Analyze – Regression – Linear and enter q01 under Dependent and q02 to q08 under Independent(s).


Pasting the syntax into the Syntax Editor gives us:
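A sketch of the pasted REGRESSION syntax for these dialog choices:

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT q01
  /METHOD=ENTER q02 q03 q04 q05 q06 q07 q08.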

The output we obtain from this analysis is

Model Summary
Model R R Square Adjusted R Square Std. Error of the Estimate
1 .541 0.293 0.291 0.697

Note that 0.293 matches the initial communality estimate for Item 1. We could run seven more linear regressions to get the remaining communality estimates, but SPSS already does that for us. Like PCA, factor analysis also uses an iterative estimation process to obtain the final estimates under the Extraction column. Finally, summing all the rows of the Extraction column gives about 3.01. This represents the total common variance shared among all items for a two-factor solution.

Total Variance Explained (2-factor PAF)

The next table we will look at is Total Variance Explained. Comparing this to the table from the PCA, we notice that the Initial Eigenvalues are exactly the same and include 8 rows, one for each "factor". In fact, SPSS simply borrows the information from the PCA analysis for use in the factor analysis, and the "factors" in the Initial Eigenvalues column are actually components. The main difference now is in the Extraction Sums of Squared Loadings. We notice that each corresponding row in the Extraction column is lower than in the Initial column. This is expected because we assume that total variance can be partitioned into common and unique variance, which means the common variance explained will be lower. Factor 1 explains 31.38% of the variance, whereas Factor 2 explains 6.24% of the variance. Just as in PCA, the more factors you extract, the less variance is explained by each successive factor.

Total Variance Explained
Factor Initial Eigenvalues Extraction Sums of Squared Loadings
Total % of Variance Cumulative % Total % of Variance Cumulative %
1 3.057 38.206 38.206 2.511 31.382 31.382
2 1.067 13.336 51.543 0.499 6.238 37.621
3 0.958 11.980 63.523
4 0.736 9.205 72.728
5 0.622 7.770 80.498
6 0.571 7.135 87.632
7 0.543 6.788 94.420
8 0.446 5.580 100.000
Extraction Method: Principal Axis Factoring.

A subtle note that may be easily overlooked is that when SPSS plots the scree plot or applies the Eigenvalues-greater-than-1 criterion (Analyze – Dimension Reduction – Factor – Extraction), it bases them on the Initial and not the Extraction solution. This is important because the criterion here assumes no unique variance, as in PCA, which means that this is the total variance explained, not accounting for specific or measurement error. Note that in the Extraction Sums of Squared Loadings column the second factor has an eigenvalue that is less than 1, but it is still retained because the Initial value is 1.067. If you want to use this criterion for the common variance explained, you would need to apply the criterion yourself.


  • In theory, when would the percent of variance in the Initial column ever equal the Extraction column?
  • True or False, in SPSS when you use the Principal Axis Factor method the scree plot uses the final factor analysis solution to plot the eigenvalues.

Answers: 1. When there is no unique variance (PCA assumes this whereas common factor analysis does not, so this is in theory and not in practice), 2. F, it uses the initial PCA solution and the eigenvalues assume no unique variance.

Factor Matrix (2-factor PAF)

Factor Matrix
Item Factor
1 2
1 0.588 -0.303
2 -0.227 0.020
3 -0.557 0.094
4 0.652 -0.189
5 0.560 -0.174
6 0.498 0.247
7 0.771 0.506
8 0.470 -0.124
Extraction Method: Principal Axis Factoring.
a. 2 factors extracted. 79 iterations required.

First note the annotation that 79 iterations were required. If we had simply used the default 25 iterations in SPSS, we would not have obtained an optimal solution. This is why in practice it’s always good to increase the maximum number of iterations. Now let’s get into the table itself. The elements of the Factor Matrix table are called loadings and represent the correlation of each item with the corresponding factor. Just as in PCA, squaring each loading and summing down the items (rows) gives the total variance explained by each factor. Note that they are no longer called eigenvalues as in PCA. Let’s calculate this for Factor 1:

$$(0.588)^2 +  (-0.227)^2 + (-0.557)^2 + (0.652)^2 + (0.560)^2 + (0.498)^2 + (0.771)^2 + (0.470)^2 = 2.51$$

This number matches the first row under the Extraction column of the Total Variance Explained table. We can repeat this for Factor 2 and get matching results for the second row. Additionally, we can get the communality estimates by summing the squared loadings across the factors (columns) for each item. For example, for Item 1:

$$(0.588)^2 +  (-0.303)^2 = 0.437$$

Note that these results match the value of the Communalities table for Item 1 under the Extraction column. This means that the sum of squared loadings across factors represents the communality estimates for each item.

The relationship between the three tables

To see the relationships among the three tables let’s first start from the Factor Matrix (or Component Matrix in PCA). We will use the term factor to represent components in PCA as well. These elements represent the correlation of the item with each factor. Now, square each element to obtain squared loadings or the proportion of variance explained by each factor for each item. Summing the squared loadings across factors you get the proportion of variance explained by all factors in the model. This is known as common variance or communality, hence the result is the Communalities table. Going back to the Factor Matrix, if you square the loadings and sum down the items you get Sums of Squared Loadings (in PAF) or eigenvalues (in PCA) for each factor. These now become elements of the Total Variance Explained table. Summing down the rows (i.e., summing down the factors) under the Extraction column we get \(2.511 + 0.499 = 3.01\) or the total (common) variance explained. In words, this is the total (common) variance explained by the two factor solution for all eight items. Equivalently, since the Communalities table represents the total common variance explained by both factors for each item, summing down the items in the Communalities table also gives you the total (common) variance explained, in this case

$$0.437 + 0.052 + 0.319 + 0.460 + 0.344 + 0.309 + 0.851 + 0.236 = 3.01$$

which is the same result we obtained from the Total Variance Explained table.

[Table: relationships among the Factor Matrix, Communalities, and Total Variance Explained tables]

In summary:

  • Squaring the elements in the Factor Matrix gives you the squared loadings
  • Summing the squared loadings of the Factor Matrix across the factors gives you the communality estimates for each item in the Extraction column of the Communalities table.
  • Summing the squared loadings of the Factor Matrix down the items gives you the Sums of Squared Loadings (PAF) or eigenvalue (PCA) for each factor across all items.
  • Summing the eigenvalues or Sums of Squared Loadings in the Total Variance Explained table gives you the total common variance explained.
  • Summing down all items of the Communalities table is the same as summing the eigenvalues or Sums of Squared Loadings down all factors under the Extraction column of the Total Variance Explained table.

True or False (the following assumes a two-factor Principal Axis Factor solution with 8 items)

  • The elements of the Factor Matrix represent correlations of each item with a factor.
  • Each squared element of Item 1 in the Factor Matrix represents the communality.
  • Summing the squared elements of the Factor Matrix down all 8 items within Factor 1 equals the first Sums of Squared Loading under the Extraction column of Total Variance Explained table.
  • Summing down all 8 items in the Extraction column of the Communalities table gives us the total common variance explained by both factors.
  • The total common variance explained is obtained by summing all Sums of Squared Loadings of the Initial column of the Total Variance Explained table
  • The total Sums of Squared Loadings in the Extraction column under the Total Variance Explained table represents the total variance which consists of total common variance plus unique variance.
  • In common factor analysis, the sum of squared loadings is the eigenvalue.

Answers: 1. T, 2. F, the sum of the squared elements across both factors, 3. T, 4. T, 5. F, sum all eigenvalues from the Extraction column of the Total Variance Explained table, 6. F, the total Sums of Squared Loadings represents only the total common variance excluding unique variance, 7. F, eigenvalues are only applicable for PCA.

Maximum Likelihood Estimation (2-factor ML)

Since this is a non-technical introduction to factor analysis, we won’t go into detail about the differences between Principal Axis Factoring (PAF) and Maximum Likelihood (ML). The main concept to know is that ML also assumes a common factor analysis using the \(R^2\) to obtain initial estimates of the communalities, but uses a different iterative process to obtain the extraction solution. To run a factor analysis using maximum likelihood estimation under Analyze – Dimension Reduction – Factor – Extraction – Method choose Maximum Likelihood.
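Equivalently, in syntax (a sketch; only the /EXTRACTION keyword changes from the PAF run):

FACTOR
  /VARIABLES q01 q02 q03 q04 q05 q06 q07 q08
  /PRINT INITIAL EXTRACTION
  /CRITERIA FACTORS(2) ITERATE(100)
  /EXTRACTION ML
  /ROTATION NOROTATE
  /METHOD=CORRELATION.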


Although the initial communalities are the same between PAF and ML, the final extraction loadings will be different, which means you will have different Communalities, Total Variance Explained, and Factor Matrix tables (although the Initial columns will overlap). The other main difference is that you will obtain a Goodness-of-fit Test table, which gives you an absolute test of model fit. Non-significant values suggest a good-fitting model. Here the p-value is less than 0.05, so we reject the two-factor model.

Goodness-of-fit Test
Chi-Square df Sig.
198.617 13 0.000

In practice, you would obtain chi-square values for multiple factor analysis runs, which we tabulate below for 1 to 8 factors. The table shows the number of factors extracted (or attempted), along with the chi-square, degrees of freedom, p-value, and iterations needed to converge. Note that as you increase the number of factors, the chi-square value and degrees of freedom decrease, while the iterations needed and the p-value increase. Practically, you want to make sure the number of iterations you specify exceeds the iterations needed. Additionally, NS means no solution and N/A means not applicable. In SPSS, no solution is obtained when you run 5 to 7 factors because the degrees of freedom become negative (which cannot happen). The eight-factor solution is not even applicable in SPSS because it will spew out a warning that "You cannot request as many factors as variables with any extraction method except PC. The number of factors will be reduced by one." This means that if you try to extract an eight-factor solution for the SAQ-8, it will default back to the 7-factor solution. Now that we understand the table, let's see if we can find the threshold at which the absolute fit indicates a good-fitting model. It looks like the p-value becomes non-significant at a 3-factor solution. Note that this differs from the eigenvalues-greater-than-1 criterion, which chose 2 factors, and from the percent-of-variance-explained criterion, by which you would choose 4-5 factors. We talk to the Principal Investigator, and at this point we still prefer the two-factor solution. Note that there is no "right" answer in picking the best factor model, only what makes sense for your theory. We will talk about interpreting the factor loadings when we talk about factor rotation, to further guide us in choosing the correct number of factors.

Number of Factors Chi-square df p-value Iterations needed
1 553.08 20 <0.05 4
2 198.62 13 <0.05 39
3 13.81 7 0.055 57
4 1.386 2 0.5 168
5 NS -2 NS NS
6 NS -5 NS NS
7 NS -7 NS NS
8 N/A N/A N/A N/A

  • The Initial column of the Communalities table for the Principal Axis Factoring and the Maximum Likelihood method are the same given the same analysis.
  • Since they are both factor analysis methods, Principal Axis Factoring and the Maximum Likelihood method will result in the same Factor Matrix.
  • In SPSS, both Principal Axis Factoring and Maximum Likelihood methods give chi-square goodness of fit tests.
  • You can extract as many factors as there are items when using ML or PAF.
  • When looking at the Goodness-of-fit Test table, a p -value less than 0.05 means the model is a good fitting model.
  • In the Goodness-of-fit Test table, the lower the degrees of freedom the more factors you are fitting.

Answers: 1. T, 2. F, the two use the same starting communalities but a different estimation process to obtain extraction loadings, 3. F, only Maximum Likelihood gives you chi-square values, 4. F, you can extract as many components as items in PCA, but SPSS will only extract up to the total number of items minus 1, 5. F, greater than 0.05, 6. T, we are taking away degrees of freedom but extracting more factors.

Comparing Common Factor Analysis versus Principal Components

As we mentioned before, the main difference between common factor analysis and principal components is that factor analysis assumes total variance can be partitioned into common and unique variance, whereas principal components assumes common variance takes up all of total variance (i.e., no unique variance). For both methods, when you assume total variance is 1, the common variance becomes the communality. The communality is unique to each item, so if you have 8 items, you will obtain 8 communalities; each represents the common variance explained by the factors or components. However, in the case of principal components, the communality is the total variance of each item, and summing all 8 communalities gives you the total variance across all items. In contrast, common factor analysis assumes that the communality is a portion of the total variance, so that summing up the communalities represents the total common variance, not the total variance. In summary, for PCA, total common variance is equal to total variance explained, which in turn is equal to the total variance; in common factor analysis, total common variance is equal to total variance explained, but does not equal total variance.


The following applies to the SAQ-8 when theoretically extracting 8 components or factors for 8 items:

  • For each item, when the total variance is 1, the common variance becomes the communality.
  • In principal components, each communality represents the total variance across all 8 items.
  • In common factor analysis, the communality represents the common variance for each item.
  • The communality is unique to each factor or component.
  • For both PCA and common factor analysis, the sum of the communalities represents the total variance explained.
  • For PCA, the total variance explained equals the total variance, but for common factor analysis it does not.

Answers: 1. T, 2. F, the total variance for each item, 3. T, 4. F, communality is unique to each item (shared across components or factors), 5. T, 6. T.

Rotation Methods

After deciding on the number of factors to extract and which analysis model to use, the next step is to interpret the factor loadings. Factor rotations help us interpret factor loadings. There are two general types of rotations, orthogonal and oblique:

  • orthogonal rotation assumes the factors are independent, or uncorrelated, with each other
  • oblique rotation allows the factors to be correlated with each other

The goal of factor rotation is to improve the interpretability of the factor solution by reaching simple structure. 

Simple structure

Without rotation, the first factor is the most general factor onto which most items load and explains the largest amount of variance. This may not be desired in all cases. Suppose you wanted to know how well a set of items load on each  factor; simple structure helps us to achieve this.

The definition of simple structure is that in a factor loading matrix:

  • Each row should contain at least one zero.
  • For m factors, each column should have at least m zeroes (e.g., three factors, at least 3 zeroes per factor).

For every pair of factors (columns),

  • there should be several items for which entries approach zero in one column but large loadings on the other.
  • a large proportion of items should have entries approaching zero.
  • only a small number of items have two non-zero entries.

The following table is an example of simple structure with three factors:

Item Factor 1 Factor 2 Factor 3
1 0.8 0 0
2 0.8 0 0
3 0.8 0 0
4 0 0.8 0
5 0 0.8 0
6 0 0.8 0
7 0 0 0.8
8 0 0 0.8

Let's go down the checklist of criteria to see why it satisfies simple structure:

  • each row contains at least one zero (exactly two in each row)
  • each column contains at least three zeros (since there are three factors)
  • for every pair of factors, most items have zero on one factor and non-zeros on the other factor (e.g., looking at Factors 1 and 2, Items 1 through 6 satisfy this requirement)
  • for every pair of factors, a large proportion of items have entries approaching zero (here, every item has a zero on at least one of the two factors)
  • for every pair of factors, none of the items have two non-zero entries

An easier criterion from Pedhazur and Schmelkin (1991) states that:

  • each item has high loadings on one factor only
  • each factor has high loadings for only some of the items.

For the following factor matrix, explain why it does not conform to simple structure using both the conventional and Pedhazur test.

Item Factor 1 Factor 2 Factor 3
1 0.8 0 0.8
2 0.8 0 0.8
3 0.8 0 0
4 0.8 0 0
5 0 0.8 0.8
6 0 0.8 0.8
7 0 0.8 0.8
8 0 0.8 0

Solution: Using the conventional test, although Criteria 1 and 2 are satisfied (each row has at least one zero, and each column has at least three zeroes), Criterion 3 fails because for Factors 2 and 3, only 3/8 rows have a zero on one factor and a non-zero on the other. Additionally, for Factors 2 and 3, only Items 5 through 7 have non-zero loadings on both, i.e., 3/8 rows have two non-zero coefficients (failing Criteria 4 and 5 simultaneously). Using the Pedhazur method, Items 1, 2, 5, 6, and 7 have high loadings on two factors (failing the first criterion), and Factor 3 has high loadings on a majority, or 5/8, of the items (failing the second criterion).

Orthogonal Rotation (2 factor PAF)

We know that the goal of factor rotation is to rotate the factor matrix so that it approaches simple structure, in order to improve interpretability. Orthogonal rotation assumes that the factors are not correlated. The benefit of doing an orthogonal rotation is that the loadings are simple correlations of items with factors, and standardized solutions can estimate the unique contribution of each factor. The most common type of orthogonal rotation is Varimax rotation. We will walk through how to do this in SPSS.

Running a two-factor solution (PAF) with Varimax rotation in SPSS

The steps to running a two-factor Principal Axis Factoring are the same as before (Analyze – Dimension Reduction – Factor – Extraction), except that under Rotation – Method we check Varimax. Make sure under Display to check Rotated Solution and Loading plot(s), and under Maximum Iterations for Convergence enter 100.


Pasting the syntax into the SPSS editor you obtain:
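A sketch of the pasted syntax for this rotated run (same assumed variable names as before):

FACTOR
  /VARIABLES q01 q02 q03 q04 q05 q06 q07 q08
  /PRINT INITIAL EXTRACTION ROTATION
  /PLOT ROTATION
  /CRITERIA FACTORS(2) ITERATE(100)
  /EXTRACTION PAF
  /ROTATION VARIMAX
  /METHOD=CORRELATION.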

Let’s first talk about what tables are the same or different from running a PAF with no rotation. First, we know that the unrotated factor matrix (Factor Matrix table) should be the same. Additionally, since the  common variance explained by both factors should be the same, the Communalities table should be the same. The main difference is that we ran a rotation, so we should get the rotated solution (Rotated Factor Matrix) as well as the transformation used to obtain the rotation (Factor Transformation Matrix). Finally, although the total variance explained by all factors stays the same, the total variance explained by  each  factor will be different.

Rotated Factor Matrix (2-factor PAF Varimax)

Rotated Factor Matrix
Factor
1 2
1 0.646 0.139
2 -0.188 -0.129
3 -0.490 -0.281
4 0.624 0.268
5 0.544 0.221
6 0.229 0.507
7 0.275 0.881
8 0.442 0.202
Extraction Method: Principal Axis Factoring. Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.

The Rotated Factor Matrix table tells us what the factor loadings look like after rotation (in this case Varimax). Kaiser normalization is a method to obtain stability of solutions across samples: each row of the loading matrix is rescaled before rotation so that equal weight is given to all items, and after rotation the loadings are rescaled back to their proper size. The only drawback is that if the communality is low for a particular item, Kaiser normalization will weight that item equally with the high-communality items. As such, Kaiser normalization is preferred when communalities are high across all items. You can turn off Kaiser normalization by specifying NOKAISER on the /CRITERIA subcommand.
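For example (a sketch, with the same assumed item names):

FACTOR
  /VARIABLES q01 q02 q03 q04 q05 q06 q07 q08
  /PRINT EXTRACTION ROTATION
  /CRITERIA FACTORS(2) ITERATE(100) NOKAISER
  /EXTRACTION PAF
  /ROTATION VARIMAX
  /METHOD=CORRELATION.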

Here is what the Varimax-rotated loadings look like without Kaiser normalization. Compared to the rotated factor matrix with Kaiser normalization, the patterns look similar if you flip Factors 1 and 2; this may be an artifact of the rescaling. Another possible reason for the stark differences may be the low communalities for Item 2 (0.052) and Item 8 (0.236), since Kaiser normalization weights these items equally with the other, high-communality items.

Rotated Factor Matrix
Factor
1 2
1 0.207 0.628
2 -0.148 -0.173
3 -0.331 -0.458
4 0.332 0.592
5 0.277 0.517
6 0.528 0.174
7 0.905 0.180
8 0.248 0.418
Extraction Method: Principal Axis Factoring. Rotation Method: Varimax without Kaiser Normalization.
a. Rotation converged in 3 iterations.

Interpreting the factor loadings (2-factor PAF Varimax)

In the table above, consider the absolute loadings that are higher than 0.4. We can see that Items 6 and 7 load highly onto Factor 1, and Items 1, 3, 4, 5, and 8 load highly onto Factor 2. Item 2 does not seem to load highly on any factor. Looking more closely at Items 6 and 7, both of which concern computers rather than statistics per se, we don't see a clear construct that defines the two. Item 2 may be too general an item and isn't captured by SPSS Anxiety. It's debatable at this point whether to retain a two-factor or one-factor solution; at the very minimum we should see if Item 2 is a candidate for deletion.

Factor Transformation Matrix and Factor Loading Plot (2-factor PAF Varimax)

The Factor Transformation Matrix tells us how the Factor Matrix was rotated. In SPSS, you will see a matrix with two rows and two columns because we have two factors.

Factor Transformation Matrix
Factor 1 2
1 0.773 0.635
2 -0.635 0.773
Extraction Method: Principal Axis Factoring. Rotation Method: Varimax with Kaiser Normalization.

How do we interpret this matrix? We can see it as the way to move from the Factor Matrix to the Rotated Factor Matrix. From the Factor Matrix we know that the loading of Item 1 on Factor 1 is \(0.588\) and the loading of Item 1 on Factor 2 is \(-0.303\), which gives us the pair \((0.588,-0.303)\); in the Rotated Factor Matrix the new pair is \((0.646,0.139)\). How do we obtain this new transformed pair of values? We can do what's called matrix multiplication: view each column of the Factor Transformation Matrix as an ordered pair and multiply matching elements. To get the first element, we multiply the ordered pair in the Factor Matrix \((0.588,-0.303)\) with the ordered pair \((0.773,-0.635)\) in the first column of the Factor Transformation Matrix:

$$(0.588)(0.773)+(-0.303)(-0.635)=0.455+0.192=0.647.$$

To get the second element, we multiply the ordered pair in the Factor Matrix \((0.588,-0.303)\) with the ordered pair \((0.635,0.773)\) from the second column of the Factor Transformation Matrix:

$$(0.588)(0.635)+(-0.303)(0.773)=0.373-0.234=0.139.$$

Voila! We have obtained the new transformed pair, up to some rounding error.

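In matrix notation (spelled out here for clarity), the two calculations above are a single row-by-matrix product:

$$\begin{pmatrix} 0.588 & -0.303 \end{pmatrix} \begin{pmatrix} 0.773 & 0.635 \\ -0.635 & 0.773 \end{pmatrix} = \begin{pmatrix} 0.647 & 0.139 \end{pmatrix}$$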

The Factor Transformation Matrix can also tell us the angle of rotation if we take the inverse cosine of the diagonal element. In this case, the angle of rotation is \(\cos^{-1}(0.773) = 39.4^{\circ}\). In the factor loading plot, you can see what that angle of rotation looks like, starting from \(0^{\circ}\) and rotating up in a counterclockwise direction by \(39.4^{\circ}\). Notice that the newly rotated x- and y-axes are still at \(90^{\circ}\) angles from one another, hence the name orthogonal (a non-orthogonal or oblique rotation means that the new axes are no longer \(90^{\circ}\) apart). The points do not move in relation to the axes but rotate with them.

[Factor loading plot: items on the unrotated and Varimax-rotated axes, rotated \(39.4^{\circ}\) counterclockwise]
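A detail worth noting (added here for clarity, using the values above): an orthogonal factor transformation matrix has exactly the form of a rotation matrix,

$$\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} = \begin{pmatrix} 0.773 & 0.635 \\ -0.635 & 0.773 \end{pmatrix} \quad \text{for } \theta = 39.4^{\circ},$$

which is why taking the inverse cosine of a diagonal element recovers the angle of rotation.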

Total Variance Explained (2-factor PAF Varimax)

The Total Variance Explained table contains the same columns as the PAF solution with no rotation, but adds another set of columns called “Rotation Sums of Squared Loadings”. This makes sense because if our rotated Factor Matrix is different, the square of the loadings should be different, and hence the Sum of Squared loadings will be different for each factor. However, if you sum the Sums of Squared Loadings across all factors for the Rotation solution,

$$ 1.701 + 1.309 = 3.01$$

and for the unrotated solution,

$$ 2.511 + 0.499 = 3.01,$$

you will see that the two sums are the same. This is because rotation does not change the total common variance. Looking at the Rotation Sums of Squared Loadings for Factor 1, it still has the largest total variance, but now that shared variance is split more evenly.

Total Variance Explained
Factor Rotation Sums of Squared Loadings
Total % of Variance Cumulative %
1 1.701 21.258 21.258
2 1.309 16.363 37.621
Extraction Method: Principal Axis Factoring.

Other Orthogonal Rotations

Varimax rotation is the most popular orthogonal rotation, but only one among several. The benefit of Varimax rotation is that it maximizes the variance of the loadings within each factor, maximizing the differences between high and low loadings on a particular factor: higher loadings are made higher and lower loadings are made lower. This makes Varimax rotation good for achieving simple structure, but not as good for detecting an overall factor, because it splits up the variance of major factors among lesser ones. Quartimax may be a better choice for detecting an overall factor: it maximizes the squared loadings so that each item loads most strongly onto a single factor.

Here is the output of the Total Variance Explained table juxtaposed side-by-side for Varimax versus Quartimax rotation.

Total Variance Explained
Factor Quartimax Varimax
Total Total
1 2.381 1.701
2 0.629 1.309
Extraction Method: Principal Axis Factoring.

You will see that whereas Varimax distributes the variances evenly across both factors, Quartimax tries to consolidate more variance into the first factor.

Equamax is a hybrid of Varimax and Quartimax, but because of this may behave erratically and according to Pett et al. (2003), is not generally recommended.

Oblique Rotation

In oblique rotation, the factors are no longer orthogonal to each other (x and y axes are not \(90^{\circ}\) angles to each other). Like orthogonal rotation, the goal is rotation of the reference axes about the origin to achieve a simpler and more meaningful factor solution compared to the unrotated solution. In oblique rotation, you will see three unique tables in the SPSS output:

  • factor pattern matrix contains partial standardized regression coefficients of each item with a particular factor
  • factor structure matrix contains simple zero order correlations of each item with a particular factor
  • factor correlation matrix is a matrix of intercorrelations among factors

Suppose the Principal Investigator hypothesizes that the two factors are correlated, and wishes to test this assumption. Let’s proceed with one of the most common types of oblique rotations in SPSS, Direct Oblimin.

Running a two-factor solution (PAF) with Direct Quartimin rotation in SPSS

The steps to running a Direct Oblimin rotation are the same as before (Analyze – Dimension Reduction – Factor – Extraction), except that under Rotation – Method we check Direct Oblimin. The other parameter we have to put in is delta, which defaults to zero. Technically, when delta = 0, this is known as Direct Quartimin. Larger positive values of delta increase the correlation among factors; however, in general you don't want the correlations to be too high, or else there is no reason to split your factors up. In fact, SPSS caps the delta value at 0.8 (the cap for negative values is -9999). Negative delta values may lead to orthogonal factor solutions. For the purposes of this analysis, we will leave our delta = 0 and do a Direct Quartimin analysis.
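In syntax, these choices correspond to something like the following sketch (delta is set via the DELTA keyword on /CRITERIA; item names assumed as before):

FACTOR
  /VARIABLES q01 q02 q03 q04 q05 q06 q07 q08
  /PRINT INITIAL EXTRACTION ROTATION
  /CRITERIA FACTORS(2) ITERATE(100) DELTA(0)
  /EXTRACTION PAF
  /ROTATION OBLIMIN
  /METHOD=CORRELATION.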


All the questions below pertain to Direct Oblimin in SPSS.

  • When selecting Direct Oblimin, delta = 0 is actually Direct Quartimin.
  • Smaller delta values will increase the correlations among factors.
  • You typically want your delta values to be as high as possible.

Answers: 1. T, 2. F, larger delta values, 3. F, higher delta values lead to higher factor correlations; in general you don't want factors to be too highly correlated

Factor Pattern Matrix (2-factor PAF Direct Quartimin)

The factor pattern matrix represents partial standardized regression coefficients of each item with a particular factor. For example, \(0.740\) is the effect of Factor 1 on Item 1 controlling for Factor 2, and \(-0.137\) is the effect of Factor 2 on Item 1 controlling for Factor 1. Just as in orthogonal rotation, the square of a loading represents the contribution of the factor to the variance of the item, but now excluding the overlap between correlated factors. Factor 1 uniquely contributes \((0.740)^2=0.405=40.5\%\) of the variance in Item 1 (controlling for Factor 2), and Factor 2 uniquely contributes \((-0.137)^2=0.019=1.9\%\) of the variance in Item 1 (controlling for Factor 1).

Pattern Matrix
Item   Factor 1   Factor 2
1       0.740     -0.137
2      -0.180     -0.067
3      -0.490     -0.108
4       0.660      0.029
5       0.580      0.011
6       0.077      0.504
7      -0.017      0.933
8       0.462      0.036
Extraction Method: Principal Axis Factoring. Rotation Method: Oblimin with Kaiser Normalization.
a. Rotation converged in 5 iterations.

Factor Structure Matrix (2-factor PAF Direct Quartimin)

The factor structure matrix contains the simple zero-order correlations of the items with each factor (as if you ran a simple regression of each item on a single factor). For example, \(0.653\) is the simple correlation of Factor 1 with Item 1, and \(0.333\) is the simple correlation of Factor 2 with Item 1. The more correlated the factors, the greater the difference between the pattern and structure matrices and the more difficult it is to interpret the factor loadings. From this we can see that Items 1, 3, 4, 5, and 8 load highly onto Factor 1 and Items 6 and 7 load highly onto Factor 2. Item 2 doesn't seem to load well on either factor.

Additionally, we can look at the variance explained by each factor without controlling for the other factor. For example, Factor 1 contributes \((0.653)^2=0.426=42.6\%\) of the variance in Item 1, and Factor 2 contributes \((0.333)^2=0.111=11.1\%\) of the variance in Item 1. Notice that Factor 2's contribution is higher here (\(11.1\%\) vs. \(1.9\%\)) because in the Pattern Matrix we controlled for the effect of Factor 1, whereas in the Structure Matrix we did not.

Structure Matrix
Item   Factor 1   Factor 2
1       0.653      0.333
2      -0.222     -0.181
3      -0.559     -0.420
4       0.678      0.449
5       0.587      0.380
6       0.398      0.553
7       0.577      0.923
8       0.485      0.330
Extraction Method: Principal Axis Factoring. Rotation Method: Oblimin with Kaiser Normalization.

Factor Correlation Matrix (2-factor PAF Direct Quartimin)

Recall that the more correlated the factors, the more difference between pattern and structure matrix and the more difficult to interpret the factor loadings. In our case, Factor 1 and Factor 2 are pretty highly correlated, which is why there is such a big difference between the factor pattern and factor structure matrices.

Factor Correlation Matrix
Factor 1 2
1 1.000 0.636
2 0.636 1.000
Extraction Method: Principal Axis Factoring. Rotation Method: Oblimin with Kaiser Normalization.

Factor plot

The difference between an orthogonal and an oblique rotation is that the factors in an oblique rotation are correlated. This means that not only must we account for the angle of axis rotation \(\theta\), we must also account for the angle of correlation \(\phi\). The angle of axis rotation is defined as the angle between the rotated and unrotated axes (blue and black axes). From the Factor Correlation Matrix, we know that the correlation is \(0.636\), so the angle of correlation is \(\cos^{-1}(0.636) = 50.5^{\circ}\), which is the angle between the two rotated axes (blue x-axis and blue y-axis). The sum of the rotations \(\theta\) and \(\phi\) is the total angle of rotation. We are not given the angle of axis rotation, so we only know that the total angle of rotation is \(\theta + \phi = \theta + 50.5^{\circ}\).

[Figure 19c: factor plot showing the unrotated axes, the obliquely rotated axes, and the angles \(\theta\) and \(\phi\)]

Relationship between the Pattern and Structure Matrix

The structure matrix is in fact derived from the pattern matrix: if you multiply the pattern matrix by the factor correlation matrix, you get back the factor structure matrix. Let's take the ordered pair \((0.740,-0.137)\) from the Pattern Matrix, which represents the partial standardized regression coefficients of Item 1 on Factors 1 and 2 respectively. Performing matrix multiplication with the first column of the Factor Correlation Matrix, we get

$$ (0.740)(1) + (-0.137)(0.636) = 0.740 - 0.087 = 0.653.$$

Similarly, we multiply the ordered pair by the second column of the Factor Correlation Matrix to get:

$$ (0.740)(0.636) + (-0.137)(1) = 0.471 - 0.137 = 0.334 $$

Looking at the first row of the Structure Matrix we get \((0.653, 0.333)\), which matches our calculation up to rounding of the reported loadings! This neat fact is depicted in the figure below:

[Figure 21: the Pattern Matrix multiplied by the Factor Correlation Matrix yields the Structure Matrix]
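As a check on this arithmetic, here is a small NumPy sketch that multiplies the full Pattern Matrix from above by the Factor Correlation Matrix and recovers the Structure Matrix (up to rounding of the reported loadings):

import numpy as np

pattern = np.array([[ 0.740, -0.137],
                    [-0.180, -0.067],
                    [-0.490, -0.108],
                    [ 0.660,  0.029],
                    [ 0.580,  0.011],
                    [ 0.077,  0.504],
                    [-0.017,  0.933],
                    [ 0.462,  0.036]])
phi = np.array([[1.000, 0.636],
                [0.636, 1.000]])   # factor correlation matrix

structure = pattern @ phi          # Structure = Pattern x Phi
print(np.round(structure, 3))      # first row: [ 0.653  0.334]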

As a quick aside, suppose that the factors are orthogonal, so that the factor correlation matrix has 1's on the diagonal and zeros on the off-diagonal. A quick calculation with the ordered pair \((0.740,-0.137)\) gives

$$ (0.740)(1) + (-0.137)(0) = 0.740$$

and similarly,

$$ (0.740)(0) + (-0.137)(1) = -0.137$$

and you get back the same ordered pair. This is just multiplication by the identity matrix (think of it as multiplying \(2 \times 1 = 2\)).

  • Without changing your data or model, how would you make the factor pattern matrices and factor structure matrices more aligned with each other?
  • True or False: when you decrease delta, the pattern and structure matrices will become closer to each other.

Answers: 1. Decrease delta so that the correlation between the factors approaches zero. 2. T, the factors become closer to orthogonal, and hence the pattern and structure matrices become closer to each other.

Total Variance Explained (2-factor PAF Direct Quartimin)

The column Extraction Sums of Squared Loadings is the same as in the unrotated solution, but we now have an additional column known as Rotation Sums of Squared Loadings. SPSS itself notes that “when factors are correlated, sums of squared loadings cannot be added to obtain a total variance.” You will note that, compared to the Extraction Sums of Squared Loadings, the Rotation Sums of Squared Loadings is only slightly lower for Factor 1 but much higher for Factor 2. This is because, unlike in orthogonal rotation, these are no longer the unique contributions of Factor 1 and Factor 2. How does SPSS obtain the Rotation Sums of Squared Loadings? It squares the Structure Matrix and sums down the items.

Total Variance Explained
          Extraction Sums of Squared Loadings         Rotation Sums of Squared Loadings (a)
Factor    Total    % of Variance    Cumulative %      Total
1         2.511    31.382           31.382            2.318
2         0.499     6.238           37.621            1.931
Extraction Method: Principal Axis Factoring.
a. When factors are correlated, sums of squared loadings cannot be added to obtain a total variance.

As a demonstration, let’s obtain the loadings from the Structure Matrix for Factor 1

$$ (0.653)^2 + (-0.222)^2 + (-0.559)^2 + (0.678)^2 + (0.587)^2 + (0.398)^2 + (0.577)^2 + (0.485)^2 = 2.318.$$

Note that \(2.318\) matches the Rotation Sums of Squared Loadings for the first factor. This means that the Rotation Sums of Squared Loadings represent the non-unique contribution of each factor to the total common variance; summing these squared loadings across all factors can therefore exceed the total variance.
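The same bookkeeping can be verified in a few lines of NumPy (a sketch; the values are the Structure Matrix loadings reported above):

import numpy as np

structure = np.array([[ 0.653,  0.333],
                      [-0.222, -0.181],
                      [-0.559, -0.420],
                      [ 0.678,  0.449],
                      [ 0.587,  0.380],
                      [ 0.398,  0.553],
                      [ 0.577,  0.923],
                      [ 0.485,  0.330]])
# squaring the structure loadings and summing down the items gives the
# Rotation Sums of Squared Loadings for each factor
print(np.round((structure ** 2).sum(axis=0), 3))  # approx. [2.319 1.933]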

Interpreting the factor loadings (2-factor PAF Direct Quartimin)

Finally, let’s conclude by interpreting the factors loadings more carefully. Let’s compare the Pattern Matrix and Structure Matrix tables side-by-side. First we highlight absolute loadings that are higher than 0.4 in blue for Factor 1 and in red for Factor 2. We see that the absolute loadings in the Pattern Matrix are in general higher in Factor 1 compared to the Structure Matrix and lower for Factor 2. This makes sense because the Pattern Matrix partials out the effect of the other factor. Looking at the Pattern Matrix, Items 1, 3, 4, 5, and 8 load highly on Factor 1, and Items 6 and 7 load highly on Factor 2. Looking at the Structure Matrix, Items 1, 3, 4, 5, 7 and 8 are highly loaded onto Factor 1 and Items 3, 4, and 7 load highly onto Factor 2. Item 2 doesn’t seem to load on any factor. The results of the two matrices are somewhat inconsistent but can be explained by the fact that in the Structure Matrix Items 3, 4 and 7 seem to load onto both factors evenly but not in the Pattern Matrix. For this particular analysis, it seems to make more sense to interpret the Pattern Matrix because it’s clear that Factor 1 contributes uniquely to most items in the SAQ-8 and Factor 2 contributes common variance only to two items (Items 6 and 7). There is an argument here that perhaps Item 2 can be eliminated from our survey and to consolidate the factors into one SPSS Anxiety factor. We talk to the Principal Investigator and we think it’s feasible to accept SPSS Anxiety as the single factor explaining the common variance in all the items, but we choose to remove Item 2, so that the SAQ-8 is now the SAQ-7.

        Pattern Matrix          Structure Matrix
Item    Factor 1   Factor 2     Factor 1   Factor 2
1        0.740     -0.137        0.653      0.333
2       -0.180     -0.067       -0.222     -0.181
3       -0.490     -0.108       -0.559     -0.420
4        0.660      0.029        0.678      0.449
5        0.580      0.011        0.587      0.380
6        0.077      0.504        0.398      0.553
7       -0.017      0.933        0.577      0.923
8        0.462      0.036        0.485      0.330
  • In oblique rotation, an element of the factor pattern matrix is the unique contribution of the factor to the item, whereas an element of the factor structure matrix is the non-unique contribution of the factor to the item.
  • In the Total Variance Explained table, the Rotation Sum of Squared Loadings represent the unique contribution of each factor to total common variance.
  • The Pattern Matrix can be obtained by multiplying the Structure Matrix with the Factor Correlation Matrix
  • If the factors are orthogonal, then the Pattern Matrix equals the Structure Matrix
  • In oblique rotations, the sum of squared loadings for each item across all factors is equal to the communality (in the SPSS Communalities table) for that item.

Answers: 1. T. 2. F, they represent the non-unique contribution (which means the total sum of squares can be greater than the total communality). 3. F, the Structure Matrix is obtained by multiplying the Pattern Matrix by the Factor Correlation Matrix. 4. T, it's like multiplying a number by 1; you get the same number back. 5. F, this is true only for orthogonal rotations; the SPSS Communalities table in rotated factor solutions is based on the unrotated solution, not the rotated solution.

As a special note, did we really achieve simple structure? Rotation helps us approach simple structure, but if the interrelationships among the items do not themselves conform to simple structure, we can only modify our model. In this case we chose to remove Item 2 from the model.

Promax Rotation

Promax rotation begins with a Varimax (orthogonal) rotation and then raises the loadings to a power of kappa, which sharply reduces the small loadings. Promax also runs faster than Direct Oblimin: in our example Promax took 3 iterations while Direct Quartimin (Direct Oblimin with delta = 0) took 5 iterations.

  • Varimax, Quartimax and Equamax are three types of orthogonal rotation and Direct Oblimin, Direct Quartimin and Promax are three types of oblique rotations.

Answers: 1. T.

Generating Factor Scores

Suppose the Principal Investigator is happy with the final factor analysis, the two-factor Direct Quartimin solution. She has a hypothesis that SPSS Anxiety and Attribution Bias predict student scores in an introductory statistics course and would like to use the factor scores as predictors in this new regression analysis. Since a factor is by nature unobserved, we first need to predict, or generate, plausible factor scores. In SPSS, there are three methods of factor score generation: Regression, Bartlett, and Anderson-Rubin.

Generating factor scores using the Regression Method in SPSS

In order to generate factor scores, run the same factor analysis model but click on Factor Scores (Analyze – Dimension Reduction – Factor – Factor Scores). Then check Save as variables, pick the Method and optionally check Display factor score coefficient matrix.

[Figure 25: SPSS Factor Scores dialog with Save as variables checked and the Regression method selected]

The code pasted into the SPSS Syntax Editor looks like this:
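A sketch of the FACTOR syntax for this model, assuming (as placeholders) that the eight SAQ items are named q01 through q08; your variable names will differ:

FACTOR
  /VARIABLES q01 q02 q03 q04 q05 q06 q07 q08
  /MISSING LISTWISE
  /PRINT INITIAL EXTRACTION ROTATION FSCORE
  /CRITERIA FACTORS(2)
  /EXTRACTION PAF
  /CRITERIA DELTA(0)
  /ROTATION OBLIMIN
  /SAVE REG(ALL)
  /METHOD=CORRELATION.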

Here we picked the Regression approach after fitting our two-factor Direct Quartimin solution. After generating the factor scores, SPSS adds two new variables to the end of your variable list, which you can view in Data View. The figure below shows the scores for the first 5 participants; SPSS names the new variables FAC1_1 and FAC2_1 for the first and second factors. These are now ready to be entered into another analysis as predictors.

[Figure 26: Data View showing the saved factor score variables FAC1_1 and FAC2_1 for the first five participants]

For those who want to understand how the scores are generated, we can refer to the Factor Score Coefficient Matrix. These are essentially the regression weights that SPSS uses to generate the scores. We know that the ordered pair of scores for the first participant is \((-0.880, -0.113)\), and that the 8 raw scores for the first participant are \(2, 1, 4, 2, 2, 2, 3, 1\). However, what SPSS actually uses is the standardized scores, which can be obtained in SPSS via Analyze – Descriptive Statistics – Descriptives – Save standardized values as variables. The standardized scores are \(-0.452, -0.733, 1.32, -0.829, -0.749, -0.2025, 0.069, -1.42\). Using the Factor Score Coefficient Matrix, we multiply the participant's standardized scores by the coefficients in each column. For the first factor:

$$ \begin{eqnarray} &(0.284)(-0.452) + (-0.048)(-0.733) + (-0.171)(1.32) + (0.274)(-0.829) \\ &+ (0.197)(-0.749) + (0.048)(-0.2025) + (0.174)(0.069) + (0.133)(-1.42) \\ &= -0.880, \end{eqnarray} $$

which matches FAC1_1  for the first participant. You can continue this same procedure for the second factor to obtain FAC2_1.

Factor Score Coefficient Matrix
Item Factor
1 2
1 0.284 0.005
2 -0.048 -0.019
3 -0.171 -0.045
4 0.274 0.045
5 0.197 0.036
6 0.048 0.095
7 0.174 0.814
8 0.133 0.028
Extraction Method: Principal Axis Factoring. Rotation Method: Oblimin with Kaiser Normalization. Factor Scores Method: Regression.
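Here is a NumPy sketch of that dot product, using the standardized scores and the coefficient columns from the table above (the small discrepancy in the second factor score comes from rounding in the reported coefficients):

import numpy as np

z = np.array([-0.452, -0.733, 1.32, -0.829, -0.749, -0.2025, 0.069, -1.42])
coef = np.array([[ 0.284,  0.005],
                 [-0.048, -0.019],
                 [-0.171, -0.045],
                 [ 0.274,  0.045],
                 [ 0.197,  0.036],
                 [ 0.048,  0.095],
                 [ 0.174,  0.814],
                 [ 0.133,  0.028]])
# factor scores = standardized item scores times the score coefficients
print(np.round(z @ coef, 3))   # approx. [-0.880 -0.115]; SPSS reports (-0.880, -0.113)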

The second table is the Factor Score Covariance Matrix,

Factor Score Covariance Matrix
Factor 1 2
1 1.897 1.895
2 1.895 1.990
Extraction Method: Principal Axis Factoring. Rotation Method: Oblimin with Kaiser Normalization. Factor Scores Method: Regression.

This table can be interpreted as the covariance matrix of the factor scores; however, it would only equal the raw covariance matrix of the saved scores if the factors were orthogonal. For example, if we obtain the raw covariance matrix of the saved factor scores we get

Covariances of the saved factor scores
          FAC1_1   FAC2_1
FAC1_1    0.777    0.604
FAC2_1    0.604    0.870

You will notice that these values are much lower. Let’s compare the same two tables but for Varimax rotation:

Factor Score Covariance Matrix
Factor 1 2
1 0.670 0.131
2 0.131 0.805
Extraction Method: Principal Axis Factoring. Rotation Method: Varimax with Kaiser Normalization. Factor Scores Method: Regression.

If you compare these elements to the Covariance table below, you will notice they are the same.

Covariances of the saved factor scores (Varimax)
          FAC1_1   FAC2_1
FAC1_1    0.670    0.131
FAC2_1    0.131    0.805

Note that with the Bartlett and Anderson-Rubin methods you will not obtain the Factor Score Covariance Matrix.

Regression, Bartlett and Anderson-Rubin compared

Each of the three methods has its pluses and minuses. The Regression method maximizes the correlation (and hence validity) between the factor scores and the underlying factor, but the scores can be somewhat biased; even with an orthogonal solution you can still obtain correlated factor scores. With Bartlett's method, the factor scores correlate highly with their own factor and not with the others, and they are unbiased estimates of the true factor scores. Unbiased means that with repeated sampling, the average of the estimated factor scores equals the average of the true factor scores. The Anderson-Rubin method scales the factor scores so that they are uncorrelated with the other factors and with the other factor scores. Since Anderson-Rubin imposes a correlation of zero between the factor scores, it is not the best option for oblique rotations; additionally, Anderson-Rubin scores are biased.

In summary, if you do an orthogonal rotation, you can pick any of the three methods: use Bartlett if you want unbiased scores, use the Regression method if you want to maximize validity, and use Anderson-Rubin if you want the factor scores themselves to be uncorrelated with each other. If you do oblique rotations, it's preferable to stick with the Regression method; do not use Anderson-Rubin for oblique rotations.

  • If you want the highest correlation of the factor score with the corresponding factor (i.e., highest validity), choose the regression method.
  • Bartlett scores are unbiased whereas Regression and Anderson-Rubin scores are biased.
  • Anderson-Rubin is appropriate for orthogonal but not for oblique rotation because factor scores will be uncorrelated with other factor scores.

Answers: 1. T, 2. T, 3. T


Factor analysis

This introductory chapter discusses the purposes of factor analysis and dimension reduction, the limitations of factor analysis, and common research questions associated with factor analysis. After the introductory overview, brief explanations are given for ten common questions related to factor methodology:

  • May I use factor analysis on sub-interval data?
  • How many dimensions are there in my data?
  • What are the best measures for my construct and how should I weight them?
  • How do people in my sample cluster?
  • How do I use factor analysis in R to compare groups?
  • How do I know if my factors are really subfactors of a more comprehensive construct?
  • How may I use factor analysis to predict a dependent variable?
  • Can factor analysis help me understand the effect of outliers on my results?
  • How may I represent my factors spatially?
  • How can factor analysis be used to tell if I have common method bias?


Lesson 12: Factor Analysis

Factor Analysis is a method for modeling observed variables, and their covariance structure, in terms of a smaller number of underlying unobservable (latent) “factors.” The factors typically are viewed as broad concepts or ideas that may describe an observed phenomenon. For example, a basic desire of obtaining a certain social level might explain most consumption behavior. These unobserved factors are more interesting to the social scientist than the observed quantitative measurements.

Factor analysis is generally an exploratory/descriptive method that requires many subjective judgments. It is a widely used tool and often controversial because the models, methods, and subjectivity are so flexible that debates about interpretations can occur.

The method is similar to principal components although, as the textbook points out, factor analysis is more elaborate. In one sense, factor analysis is an inversion of principal components. In factor analysis, we model the observed variables as linear functions of the “factors.” In principal components, we create new variables that are linear combinations of the observed variables.  In both PCA and FA, the dimension of the data is reduced. Recall that in PCA, the interpretation of the principal components is often not very clean. A particular variable may, on occasion, contribute significantly to more than one of the components. Ideally, we like each variable to contribute significantly to only one component. A technique called factor rotation is employed toward that goal. Examples of fields where factor analysis is involved include physiology, health, intelligence, sociology, and sometimes ecology among others.

Upon completion of this lesson, you should be able to:

  • Understand the terminology of factor analysis, including the interpretation of factor loadings, specific variances, and communalities;
  • Understand how to apply both principal component and maximum likelihood methods for estimating the parameters of a factor model;
  • Understand factor rotation, and interpret rotated factor loadings.

12.1 - Notations and Terminology

Collect all of the variables \(X_i\) into a vector \(\mathbf{X}\) for each individual subject, where \(X_i\) denotes observable trait i. These are the data from each subject and are collected into a vector of traits.

\(\textbf{X} = \left(\begin{array}{c}X_1\\X_2\\\vdots\\X_p\end{array}\right) = \text{vector of traits}\)

This is a random vector, with a population mean. Assume that vector of traits \(\mathbf{X}\) is sampled from a population with population mean vector:

\(\boldsymbol{\mu} = \left(\begin{array}{c}\mu_1\\\mu_2\\\vdots\\\mu_p\end{array}\right) = \text{population mean vector}\)

Here, \(\mathrm { E } \left( X _ { i } \right) = \mu _ { i }\) denotes the population mean of variable i .

Consider m unobservable common factors \(f _ { 1 } , f _ { 2 } , \dots , f _ { m }\). The \(i^{th}\) common factor is \(f _ { i } \). Generally, m is going to be substantially less than p .

The common factors are also collected into a vector,

\(\mathbf{f} = \left(\begin{array}{c}f_1\\f_2\\\vdots\\f_m\end{array}\right) = \text{vector of common factors}\)

Our factor model can be thought of as a series of multiple regressions, predicting each of the observable variables \(X_{i}\) from the values of the unobservable common factors \(f_{i}\) :

\begin{align} X_1 & =  \mu_1 + l_{11}f_1 + l_{12}f_2 + \dots + l_{1m}f_m + \epsilon_1\\ X_2 & =  \mu_2 + l_{21}f_1 + l_{22}f_2 + \dots + l_{2m}f_m + \epsilon_2 \\ &  \vdots \\ X_p & =  \mu_p + l_{p1}f_1 + l_{p2}f_2 + \dots + l_{pm}f_m + \epsilon_p \end{align}

Here, the variable means \(\mu_{1}\) through \(\mu_{p}\) can be regarded as the intercept terms for the multiple regression models.

The regression coefficients \(l_{ij}\) (the partial slopes) for all of these multiple regressions are called factor loadings. Here, \(l_{ij}\) = loading of the \(i^{th}\) variable on the \(j^{th}\) factor. These are collected into a matrix as shown here:

\(\mathbf{L} = \left(\begin{array}{cccc}l_{11}& l_{12}& \dots & l_{1m}\\l_{21} & l_{22} & \dots & l_{2m}\\ \vdots & \vdots & & \vdots \\l_{p1} & l_{p2} & \dots & l_{pm}\end{array}\right) = \text{matrix of factor loadings}\)

And finally, the errors \(\varepsilon _{i}\) are called the specific factors. Here, \(\varepsilon _{i}\) = specific factor for variable i . The specific factors are also collected into a vector:

\(\boldsymbol{\epsilon} = \left(\begin{array}{c}\epsilon_1\\\epsilon_2\\\vdots\\\epsilon_p\end{array}\right) = \text{vector of specific factors}\)

In summary, the basic model is like a regression model. Each of our response variables X is predicted as a linear function of the unobserved common factors \(f_{1}\), \(f_{2}\) through \(f_{m}\). Thus, our explanatory variables are \(f_{1}\) , \(f_{2}\) through \(f_{m}\). We have m unobserved factors that control the variation in our data.

We will generally reduce this into matrix notation as shown in this form here:

\(\textbf{X} = \boldsymbol{\mu} + \textbf{Lf}+\boldsymbol{\epsilon}\)

12.2 - Model Assumptions

The specific factors or random errors all have mean zero: \(E(\epsilon_i) = 0\); i = 1, 2, ... , p

The common factors, the f 's, also have mean zero: \(E(f_i) = 0\); i = 1, 2, ... , m

A consequence of these assumptions is that the mean response of the i th trait is \(\mu_i\). That is,

\(E(X_i) = \mu_i\)

The common factors have variance one: \(\text{var}(f_i) = 1\); i = 1, 2, ... , m  

Correlation

The common factors are uncorrelated with one another: \(\text{cov}(f_i, f_j) = 0\)   for i ≠ j

The specific factors are uncorrelated with one another: \(\text{cov}(\epsilon_i, \epsilon_j) = 0\)  for i ≠ j  

The specific factors are uncorrelated with the common factors: \(\text{cov}(\epsilon_i, f_j) = 0\);   i = 1, 2, ... , p; j = 1, 2, ... , m  

These assumptions are necessary to estimate the parameters uniquely. An infinite number of equally well-fitting models with different parameter values may be obtained unless these assumptions are made.

Under this model the variance for the i th observed variable is equal to the sum of the squared loadings for that variable and the specific variance:

The variance of trait i is: \(\sigma^2_i = \text{var}(X_i) = \sum_{j=1}^{m}l^2_{ij}+\psi_i\) 

This derivation is based on the previous assumptions. \(\sum_{j=1}^{m}l^2_{ij}\) is called the communality for variable i. Later on, we will see how this is a measure of how well the model performs for that particular variable. The larger the communality, the better the model performance for the i th variable.

The covariance between pairs of traits i and j is: \(\sigma_{ij}= \text{cov}(X_i, X_j) = \sum_{k=1}^{m}l_{ik}l_{jk}\) 

The covariance between trait i and factor j is: \(\text{cov}(X_i, f_j) = l_{ij}\)

In matrix notation, our model for the variance-covariance matrix is expressed as shown below:

\(\Sigma = \mathbf{LL'} + \boldsymbol{\Psi}\)

This is the matrix of factor loadings times its transpose, plus a diagonal matrix containing the specific variances.

Here \(\boldsymbol{\Psi}\) equals:

\(\boldsymbol{\Psi} = \left(\begin{array}{cccc}\psi_1 & 0 & \dots & 0 \\ 0 & \psi_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \dots & \psi_p \end{array}\right)\)

A parsimonious (simplified) model for the variance-covariance matrix is obtained and used for estimation.
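A quick simulation makes the decomposition \(\Sigma = \mathbf{LL'} + \boldsymbol{\Psi}\) tangible. The sketch below uses made-up loadings and specific variances for p = 3 variables and m = 1 factor, and checks that the sample covariance of the simulated data approximates \(\mathbf{LL'} + \boldsymbol{\Psi}\):

import numpy as np

rng = np.random.default_rng(1)
n = 100_000
L = np.array([[0.9], [0.8], [0.7]])       # hypothetical loadings (p = 3, m = 1)
psi = np.array([0.19, 0.36, 0.51])        # hypothetical specific variances
f = rng.normal(size=(n, 1))               # common factor: mean 0, variance 1
eps = rng.normal(scale=np.sqrt(psi), size=(n, 3))  # specific factors
X = f @ L.T + eps                         # factor model with mu = 0
print(np.round(np.cov(X, rowvar=False), 2))        # approx. LL' + diag(psi)
print(np.round(L @ L.T + np.diag(psi), 2))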

  • The model assumes that the data is a linear function of the common factors. However, because the common factors are not observable, we cannot check for linearity.

The variance-covariance matrix is going to have p ( p +1)/2 unique elements of \(\Sigma\) approximated by:

  • mp factor loadings in the matrix \(\mathbf{L}\), and
  • p specific variances

This means that the factor model has mp + p parameters. Ideally, mp + p is substantially smaller than p(p + 1)/2. However, if m is too small, the mp + p parameters may not be adequate to describe \(\Sigma\). It may also be the case that this is simply not the right model, and the data cannot be reduced to a linear combination of factors.

Let \(\mathbf{T}\) be any \(m \times m\) orthogonal matrix; that is,

\(\mathbf{T'T = TT' = I} \)

We can write our factor model in matrix notation:

\(\textbf{X} = \boldsymbol{\mu} + \textbf{Lf}+ \boldsymbol{\epsilon} = \boldsymbol{\mu} + \mathbf{LTT'f}+ \boldsymbol{\epsilon} = \boldsymbol{\mu} + \mathbf{L^*f^*}+\boldsymbol{\epsilon}\)

Note that this does not change the model, because \(\mathbf{TT'} = \mathbf{I}\) and multiplying by the identity matrix returns the original matrix. This yields an alternative factor model, where the relationship between the new factor loadings and the original factor loadings is:

\(\mathbf{L^*} = \textbf{LT}\)

and the relationship between the new common factors and the original common factors is:

\(\mathbf{f^*} = \textbf{T'f}\)

This gives a model that fits equally well. Moreover, because there is an infinite number of orthogonal matrices, then there is an infinite number of alternative models. This model, as it turns out, satisfies all of the assumptions discussed earlier.

\(E(\mathbf{f^*}) = E(\textbf{T'f}) = \textbf{T'}E(\textbf{f}) = \mathbf{T'0} =\mathbf{0}\),

\(\text{var}(\mathbf{f^*}) = \text{var}(\mathbf{T'f}) = \mathbf{T'}\text{var}(\mathbf{f})\mathbf{T} = \mathbf{T'IT} = \mathbf{T'T} = \mathbf{I}\)

\(\text{cov}(\mathbf{f^*, \boldsymbol{\epsilon}}) = \text{cov}(\mathbf{T'f, \boldsymbol{\epsilon}}) = \mathbf{T'}\text{cov}(\mathbf{f, \boldsymbol{\epsilon}}) = \mathbf{T'0} = \mathbf{0}\)

So f* satisfies all of the assumptions, and hence f* is an equally valid collection of common factors.  There is a certain apparent ambiguity to these models. This ambiguity is later used to justify a factor rotation to obtain a more parsimonious description of the data.
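The following NumPy sketch illustrates the invariance: rotating a made-up loading matrix by an orthogonal matrix \(\mathbf{T}\) leaves \(\mathbf{LL'}\), and hence the modeled covariance, unchanged.

import numpy as np

rng = np.random.default_rng(0)
L = rng.normal(size=(5, 2))                  # hypothetical loadings (p = 5, m = 2)
theta = 0.7                                  # any rotation angle
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # orthogonal: T'T = TT' = I
L_star = L @ T                               # rotated loadings
print(np.allclose(L @ L.T, L_star @ L_star.T))   # True: same fit either way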

12.3 - Principal Component Method

We consider two different methods to estimate the parameters of a factor model:

  • Principal Component Method
  • Maximum Likelihood Estimation

A third method, the principal factor method, is also available but not considered in this class.

Let \(X_i\) be a vector of observations for the \(i^{th}\) subject:

\(\mathbf{X_i} = \left(\begin{array}{c}X_{i1}\\ X_{i2}\\ \vdots \\ X_{ip}\end{array}\right)\)

\(\mathbf{S}\) denotes our sample variance-covariance matrix and is expressed as:

\(\textbf{S} = \dfrac{1}{n-1}\sum\limits_{i=1}^{n}\mathbf{(X_i - \bar{x})(X_i - \bar{x})'}\)

We have p eigenvalues for this variance-covariance matrix as well as corresponding eigenvectors for this matrix.

 Eigenvalues of \(\mathbf{S}\):

\(\hat{\lambda}_1, \hat{\lambda}_2, \dots, \hat{\lambda}_p\)

Eigenvectors of \(\mathbf{S}\):

\(\hat{\mathbf{e}}_1, \hat{\mathbf{e}}_2, \dots, \hat{\mathbf{e}}_p\)

Recall that the variance-covariance matrix can be re-expressed in the following form as a function of the eigenvalues and the eigenvectors:

Spectral Decomposition of \(\Sigma\)

\(\Sigma = \sum_{i=1}^{p}\lambda_i \mathbf{e}_i\mathbf{e}'_i \cong \sum_{i=1}^{m}\lambda_i \mathbf{e}_i\mathbf{e}'_i = \left(\begin{array}{cccc}\sqrt{\lambda_1}\mathbf{e}_1 & \sqrt{\lambda_2}\mathbf{e}_2 &  \dots &  \sqrt{\lambda_m}\mathbf{e}_m\end{array}\right)  \left(\begin{array}{c}\sqrt{\lambda_1}\mathbf{e}'_1\\ \sqrt{\lambda_2}\mathbf{e}'_2\\ \vdots\\ \sqrt{\lambda_m}\mathbf{e}'_m\end{array}\right) = \mathbf{LL'}\)

The idea behind the principal component method is to approximate this expression. Instead of summing from 1 to p , we now sum from 1 to m , ignoring the last p - m terms in the sum, and obtain the third expression. We can rewrite this as shown in the fourth expression, which is used to define the matrix of factor loadings \(\mathbf{L}\), yielding the final expression in matrix notation.

This yields the following estimator for the factor loadings:

\(\hat{l}_{ij} = \hat{e}_{ji}\sqrt{\hat{\lambda}_j}\)

This forms the matrix \(\mathbf{L}\) of factor loadings, and the fourth expression in the spectral decomposition above is \(\mathbf{L}\) multiplied by its transpose. To estimate the specific variances, recall that our factor model for the variance-covariance matrix is

\(\boldsymbol{\Sigma} = \mathbf{LL'} + \boldsymbol{\Psi}\)

in matrix notation. \(\Psi\) is now going to be equal to the variance-covariance matrix minus \(\mathbf{LL'}\).

\( \boldsymbol{\Psi} = \boldsymbol{\Sigma} - \mathbf{LL'}\)

This in turn suggests that the specific variances, the diagonal elements of \(\Psi\), are estimated with this expression:

\(\hat{\Psi}_i = s^2_i - \sum\limits_{j=1}^{m}\hat{\lambda}_j \hat{e}^2_{ji}\)

We take the sample variance for the i th variable and subtract the sum of the squared factor loadings (i.e., the communality).
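Putting the pieces together, here is a short NumPy sketch of the principal component method applied to a made-up 4-variable correlation matrix, keeping m = 2 factors:

import numpy as np

R = np.array([[1.0, 0.6, 0.5, 0.3],
              [0.6, 1.0, 0.4, 0.2],
              [0.5, 0.4, 1.0, 0.1],
              [0.3, 0.2, 0.1, 1.0]])        # hypothetical correlation matrix

eigval, eigvec = np.linalg.eigh(R)           # eigh returns ascending order
order = np.argsort(eigval)[::-1]             # re-sort to descending
eigval, eigvec = eigval[order], eigvec[:, order]

m = 2
L = eigvec[:, :m] * np.sqrt(eigval[:m])      # loadings: l_ij = e_ji * sqrt(lambda_j)
h2 = (L ** 2).sum(axis=1)                    # communalities
psi = 1.0 - h2                               # specific variances (unit diagonal)
residual = R - (L @ L.T + np.diag(psi))      # zero diagonal; off-diagonal = fit error
print(np.round(L, 3)); print(np.round(h2, 3)); print(np.round(residual, 3))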

12.4 - Example: Places Rated Data - Principal Component Method

Example 12-1: places rated.

Let's revisit the Places Rated Example from Lesson 11. Recall that the Places Rated Almanac (Boyer and Savageau) rates 329 communities according to nine criteria:

  • Climate and Terrain
  • Housing
  • Health Care & Environment
  • Crime
  • Transportation
  • Education
  • The Arts
  • Recreation
  • Economics

Except for housing and crime, the higher the score the better. For housing and crime, the lower the score the better.

Our objective here is to describe the relationships among the variables.

Before carrying out a factor analysis we need to determine m . How many common factors should be included in the model? This requires a determination of how many parameters will be involved.

For p = 9, the variance-covariance matrix \(\Sigma\) contains

\(\dfrac{p(p+1)}{2} = \dfrac{9 \times 10}{2} = 45\)

unique elements or entries. For a factor analysis with m factors, the number of parameters in the factor model is equal to

\(p(m+1) = 9(m+1)\)

Taking m = 4, we would have \(9(4+1) = 45\) parameters in the factor model, equal to the number of unique elements in \(\Sigma\); this would result in no dimension reduction. So in this case we select m = 3, yielding \(9(3+1) = 36\) parameters in the factor model and thus a dimension reduction in our analysis.

It is also common to look at the results of the principal components analysis. The output from Lesson 11.6 is below. The first three components explain 62% of the variation. We consider this to be sufficient for the current example and will base future analyses on three components.

Component Eigenvalue Proportion Cumulative
1 3.2978 0.3664 0.3664
2 1.2136 0.1348 0.5013
3 1.1055 0.1228 0.6241
4 0.9073 0.1008 0.7249
5 0.8606 0.0956 0.8205
6 0.5622 0.0625 0.8830
7 0.4838 0.0538 0.9368
8 0.3181 0.0353 0.9721
9 0.2511 0.0279 1.0000

We need to select m so that a sufficient amount of variation in the data is explained. What is sufficient is, of course, subjective and depends on the example at hand.

Alternatively, often in social sciences, the underlying theory within the field of study indicates how many factors to expect. In psychology, for example, a circumplex model suggests that mood has two factors: positive affect and arousal. So a two-factor model may be considered for questionnaire data regarding the subjects' moods. In many respects, this is a better approach because then you are letting the science drive the statistics rather than the statistics drive the science! If you can, use your or a field expert's scientific understanding to determine how many factors should be included in your model.


The factor analysis is carried out using the program as shown below:

Download the SAS Program here: places2.sas


Performing factor analysis (principal components extraction)

To perform factor analysis and obtain the communalities:

  • Open the ‘places_tf.csv’ data set in a new worksheet.
  • Calc > Calculator
  • Highlight and select ‘climate’ to move it to the Store result window.
  • In the Expression window, enter LOGTEN('climate') to apply the (base 10) log transformation to the climate variable.
  • Choose OK. The transformed values replace the originals in the worksheet under ‘climate’.
  • Repeat sub-steps 1) through 4) above for all variables housing through econ.
  • Stat > Multivariate > Factor Analysis
  • Highlight and select climate through econ to move all 9 variables to the Variables window.
  • Choose 3 for the number of factors to extract.
  • Choose Principal Components for the Method of Extraction.
  • Under Options, select Correlation as Matrix to Factor.
  • Under Graphs, select Scree Plot.
  • Choose OK and OK again. The numeric results are shown in the results area, along with the scree plot. The last column has the communality values.

Initially, we will look at the factor loadings. The factor loadings are obtained by using this expression

\(\hat{l}_{ij} = \hat{e}_{ji}\sqrt{\hat{\lambda}_j}\)

These are summarized in the table below. The factor loadings are only recorded for the first three factors because we set m = 3. We should also note that the factor loadings are the correlations between the factors and the variables. For example, the correlation between the Arts and the first factor is about 0.86. Similarly, the correlation between Climate and the first factor is only about 0.28.

  Factor
Variable          1        2        3
Climate           0.286    0.076    0.841
Housing           0.698    0.153    0.084
Health            0.744   -0.410   -0.020
Crime             0.471    0.522    0.135
Transportation    0.681   -0.156   -0.148
Education         0.498   -0.498   -0.253
Arts              0.861   -0.115    0.011
Recreation        0.642    0.322    0.044
Economics         0.298    0.595   -0.533

Interpreting factor loadings is similar to interpreting the coefficients for principal component analysis. We want to determine some inclusion criterion, which in many instances may be somewhat arbitrary. In the table above, we treat loadings with an absolute value above about 0.5 as large. The following statements are based on this criterion:

Factor 1 is correlated most strongly with Arts (0.861) and is also correlated with Health, Housing, Transportation, Recreation, and, to a lesser extent, Crime and Education. You can say that the first factor is primarily a measure of these variables.

Similarly, Factor 2 is correlated most strongly with Crime, Education, and Economics. You can say that the second factor is primarily a measure of these variables.

Likewise, Factor 3 is correlated most strongly with Climate and Economics. You can say that the third factor is primarily a measure of these variables.

The interpretation above is very similar to that obtained in the standardized principal component analysis.

12.5 - Communalities

Example 12-1: continued....

The communality for the \(i^{th}\) variable is computed by taking the sum of the squared loadings for that variable. This is expressed below:

\(\hat{h}^2_i = \sum\limits_{j=1}^{m}\hat{l}^2_{ij}\)

To understand the computation of communalities, recall the table of factor loadings:

  Factor
Variable          1        2        3
Climate           0.287    0.076    0.841
Housing           0.698    0.153    0.084
Health            0.744   -0.410   -0.020
Crime             0.471    0.522    0.135
Transportation    0.681   -0.156   -0.148
Education         0.498   -0.498   -0.253
Arts              0.861   -0.115    0.011
Recreation        0.642    0.322    0.044
Economics         0.298    0.595   -0.533

Let's compute the communality for Climate, the first variable. We square each of the three factor loadings for Climate, then add the results:

\(\hat{h}^2_1 = 0.28682^2 + 0.07560^2 + 0.84085^2 = 0.7950\)

The communalities of the 9 variables can be obtained from page 4 of the SAS output as shown below:

Final Communality Estimates: Total = 5.616885

Climate    Housing    Health     Crime      Trans      Educate    Arts       Recreate   Econ
0.79500707 0.51783185 0.72230182 0.51244913 0.50977159 0.56073895 0.75382091 0.51725940 0.72770402

The value 5.616885, located just above the individual communalities, is the total communality.


In summary, the communalities are placed into a table:

Variable Communality
Climate 0.795
Housing 0.518
Health 0.722
Crime 0.512
Transportation 0.510
Education 0.561
Arts 0.754
Recreation 0.517
Economics 0.728

You can think of these values as multiple \(R^{2}\) values for regression models predicting the variables of interest from the 3 factors. The communality for a given variable can be interpreted as the proportion of variation in that variable explained by the three factors. In other words, if we perform multiple regression of climate against the three common factors, we obtain an \(R^{2} = 0.795\), indicating that about 79% of the variation in climate is explained by the factor model. The results suggest that the factor analysis does the best job of explaining variations in climate, the arts, economics, and health.

One assessment of how well this model performs can be obtained from the communalities.  We want to see values that are close to one. This indicates that the model explains most of the variation for those variables. In this case, the model does better for some variables than it does for others. The model explains Climate the best and is not bad for other variables such as Economics, Health, and the Arts. However, for other variables such as Crime, Recreation, Transportation, and Housing the model does not do a good job, explaining only about half of the variation.

The sum of all communality values is the total communality value:

\(\sum\limits_{i=1}^{p}\hat{h}^2_i = \sum\limits_{i=1}^{m}\hat{\lambda}_i\)

Here, the total communality is 5.617. The proportion of the total variation explained by the three factors is

\(\dfrac{5.617}{9} = 0.624\)

This is the percentage of variation explained in our model. This could be considered an overall assessment of the performance of the model. However, this percentage is the same as the proportion of variation explained by the first three eigenvalues, obtained earlier. The individual communalities tell how well the model is working for the individual variables, and the total communality gives an overall assessment of performance. These are two different assessments.

Because the data were standardized before the analysis, the variances of the standardized variables are all equal to one. The specific variances are therefore computed by subtracting the communality from one, as expressed below:

\(\hat{\Psi}_i = 1-\hat{h}^2_i\)

For example, the specific variance for Climate is computed as follows:

\(\hat{\Psi}_1 = 1-0.795 = 0.205\)

The specific variances are found in the SAS output as the diagonal elements in the table on page 5 as seen below:

Residual Correlation with Uniqueness on the Diagonal

           Climate   Housing   Health    Crime     Trans     Educate   Arts      Recreate  Econ
Climate    0.20499  -0.00924  -0.01476  -0.06027  -0.03720   0.18537  -0.07518  -0.12475   0.21735
Housing   -0.00924   0.48217  -0.02317  -0.28063  -0.12119  -0.04803  -0.07552  -0.04032   0.04249
Health    -0.01476  -0.02317   0.27770   0.05007  -0.15480  -0.11537  -0.00929  -0.09108   0.06527
Crime     -0.06027  -0.28063   0.05007   0.48755   0.05497   0.11562   0.00009  -0.18377  -0.10288
Trans     -0.03720  -0.12119  -0.15480   0.05497   0.49023  -0.14318  -0.05439   0.01041  -0.12641
Educate    0.18537  -0.04803  -0.11537   0.11562  -0.14318   0.43926  -0.13515  -0.05531   0.14197
Arts      -0.07518  -0.07552  -0.00929   0.00009  -0.05439  -0.13515   0.24618  -0.01926  -0.04687
Recreate  -0.12475  -0.04032  -0.09108  -0.18377   0.01041  -0.05531  -0.01926   0.48274  -0.18326
Econ       0.21735   0.04249   0.06527  -0.10288  -0.12641   0.14197  -0.04687  -0.18326   0.27230

For example, the specific variance for housing is 0.482.

This model provides an approximation to the correlation matrix.  We can assess the model's appropriateness with the residuals obtained from the following calculation:

\(s_{ij}- \sum\limits_{k=1}^{m}l_{ik}l_{jk}; i \ne j = 1, 2, \dots, p\)

This is basically the difference between \(\mathbf{R}\) and \(\mathbf{LL'}\), i.e., the correlation between variables i and j minus the value expected under the model. Generally, these residuals should be as close to zero as possible. For example, the residual between Housing and Climate is -0.00924, which is pretty close to zero. However, some are not very good: the residual between Climate and Economics is 0.217. These values indicate how well the factor model fits the data.

One disadvantage of the principal component method is that it does not provide a test for lack of fit. We can examine these numbers and determine if we think they are small or close to zero, but we really do not have a test for this.  Such a test is available for the maximum likelihood method.

12.6 - Final Notes about the Principal Component Method

Unlike the competing methods, the estimated factor loadings under the principal component method do not change as the number of factors is increased. This is not true of the remaining methods (e.g., maximum likelihood). However, the communalities and the specific variances will depend on the number of factors in the model. In general, as you increase the number of factors, the communalities increase toward one and the specific variances will decrease toward zero.

The diagonal elements of the variance-covariance matrix \(\mathbf{S}\) (or \(\mathbf{R}\)) are equal to the diagonal elements of the model:

\(\mathbf{\hat{L}\hat{L}' + \mathbf{\hat{\Psi}}}\)

The off-diagonal elements are not exactly reproduced. This is in part due to variability in the data - just random chance. Therefore, we want to select the number of factors to make the off-diagonal elements of the residual matrix small:

\(\mathbf{S - (\hat{L}\hat{L}' + \hat{\Psi})}\)

Here, we have a trade-off between two conflicting desires. For a parsimonious model, we wish to select the number of factors m to be as small as possible, but for such a model, the residuals could be large. Conversely, by selecting m to be large, we may reduce the sizes of the residuals but at the cost of producing a more complex and less interpretable model (there are more factors to interpret).

Another result to note is that the sum of the squared elements of the residual matrix is no larger than the sum of the squared eigenvalues left out of the decomposition:

\(\sum\limits_{j=m+1}^{p}\hat{\lambda}^2_j\)

General Methods used in determining the number of Factors

Below are three common techniques used to determine the number of factors to extract:

  • Cumulative proportion of at least 0.80 (or 80% explained variance)
  • Eigenvalues of at least one
  • Scree plot is based on the "elbow" of the plot; that is, where the plot turns and begins to flatten out

12.7 - Maximum Likelihood Estimation Method

Maximum likelihood estimation requires that the data be sampled from a multivariate normal distribution, which is a drawback of this method. Data are often collected on a Likert scale, especially in the social sciences, and because a Likert scale is discrete and bounded, such data cannot be normally distributed.

Using the Maximum Likelihood Estimation Method, we must assume that the data are independently sampled from a multivariate normal distribution with mean vector \(\mu\) and variance-covariance matrix of the form:

\(\boldsymbol{\Sigma} = \mathbf{LL' +\boldsymbol{\Psi}}\)

where \(\mathbf{L}\) is the matrix of factor loadings and \(\Psi\) is the diagonal matrix of specific variances.

We define additional notation: As usual, the data vectors for n subjects are represented as shown:

\(\mathbf{X_1},\mathbf{X_2}, \dots, \mathbf{X_n}\)

Maximum likelihood estimation involves estimating the mean, the matrix of factor loadings, and the specific variance.

The maximum likelihood estimators for the mean vector \(\mu\), the factor loadings \(\mathbf{L}\), and the specific variances \(\Psi\) are obtained by finding \(\hat{\boldsymbol{\mu}}\), \(\hat{\mathbf{L}}\), and \(\hat{\boldsymbol{\Psi}}\) that maximize the log-likelihood given by the following expression:

\(l(\mathbf{\mu, L, \Psi}) = - \dfrac{np}{2}\log{2\pi}- \dfrac{n}{2}\log{|\mathbf{LL' + \Psi}|} - \dfrac{1}{2}\sum_{i=1}^{n}\mathbf{(X_i-\mu)'(LL'+\Psi)^{-1}(X_i-\mu)}\)

The log of the joint probability distribution of the data is maximized. We want to find the values of the parameters (\(\mu\), \(\mathbf{L}\), and \(\Psi\)) that are most compatible with what we see in the data. As was noted earlier, the solutions for these factor models are not unique; equivalent models can be obtained by rotation. If \(\mathbf{L'\Psi^{-1}L}\) is required to be a diagonal matrix, then we may obtain a unique solution.

Computationally this process is complex. In general, there is no closed-form solution to this maximization problem so iterative methods are applied. Implementation of iterative methods can run into problems as we will see later.

12.8 - Example: Places Rated Data

Example 12-2: places rated.

This method of factor analysis is carried out using the program shown below:

Download the SAS Program here: places3.sas

Here we have specified the Maximum Likelihood Method by setting method=ml. Again, we need to specify the number of factors.

You will notice that this program produces errors and does not complete the factor analysis. We will start out without the Heywood or priors options discussed below to see the error that occurs and how to remedy it.

For m = 3 factors, maximum likelihood estimation fails to converge. An examination of the records of each iteration reveals that the communality of the first variable (climate) exceeds one during the first iteration. Because a communality must lie between 0 and 1, this is the cause of the failure.

SAS provides a number of different fixes for this kind of error. Most fixes adjust the initial guess, or starting value, for the communalities.

  • priors=smc: Sets the prior communality of each variable proportional to the \(R^2\) of that variable with all other variables as an initial guess.
  • priors=asmc: As above, with an adjustment so that the sum of the communalities equals the sum of the maximum absolute correlations.
  • priors=max: Sets the prior communality of each variable to its maximum absolute correlation with any other variable.
  • priors=random: Sets the prior communality of each variable to a random number between 0 and 1.

These options are added within the proc factor line of code (e.g., proc factor method=ml nfactors=3 priors=smc;). If we begin with better starting values, we might have better luck at convergence. Unfortunately, trying each of these options (including running the random option multiple times), we find that they are ineffective for our Places Rated data. A second option needs to be considered.

  • Attempt adding the Heywood option to the procedure (proc factor method=ml nfactors=3 heywood;). This sets communalities greater than one back to one, allowing the iterations to proceed. In other words, if a communality value falls out of bounds, it is replaced by a value of one. This will always yield a solution, but frequently the solution will not adequately fit the data.

We start with the same values for the communalities, and at each iteration we obtain new values. The criterion is the value we are trying to minimize in order to obtain our estimates. We can see that the convergence criterion decreases with each iteration of the algorithm.

Iteration  Criterion   Ridge   Change   Communalities
1          0.3291161   0.0000  0.2734   0.47254 0.40913 0.73500 0.22107 0.38516 0.26178 0.75125 0.46384 0.15271
2          0.2946707   0.0000  0.5275   1.00000 0.37872 0.75101 0.20469 0.36111 0.26155 0.75298 0.48979 0.11995
3          0.2877116   0.0000  0.0577   1.00000 0.41243 0.80868 0.22168 0.38551 0.26263 0.74546 0.53277 0.11601
4          0.2876330   0.0000  0.0055   1.00000 0.41336 0.81414 0.21647 0.38365 0.26471 0.74493 0.53724 0.11496
5          0.2876314   0.0000  0.0007   1.00000 0.41392 0.81466 0.21595 0.38346 0.26475 0.74458 0.53794 0.11442

You can see that in the second iteration, rather than report a communality greater than one, SAS replaces it with the value one and then proceeds as usual through the iterations.

After five iterations the algorithm converges, as indicated by the statement on the second page of the output. The algorithm converged to a solution where the communality for Climate equals one.

To perform factor analysis using maximum likelihood

  • Choose Maximum Likelihood for the Method of Extraction.
  • Under Results, select All and MLE iterations , and choose OK .
  • Choose OK again . The numeric results are shown in the results area.

12.9 - Goodness-of-Fit

Before we proceed, we would like to determine if the model adequately fits the data. The goodness-of-fit test in this case compares the variance-covariance matrix under a parsimonious model to the variance-covariance matrix without any restriction, i.e. under the assumption that the variances and covariances can take any values. The variance-covariance matrix under the assumed model can be expressed as:

\(\mathbf{\Sigma = LL' + \Psi}\)

\(\mathbf{L}\) is the matrix of factor loadings, and the diagonal elements of \(\boldsymbol{\Psi}\) are equal to the specific variances. This is a very specific structure for the variance-covariance matrix; a more general structure would allow those elements to take any values. To assess goodness-of-fit, we use the Bartlett-corrected likelihood ratio test statistic:

\(X^2 = \left(n-1-\frac{2p+4m+5}{6}\right)\log \frac{|\mathbf{\hat{L}\hat{L}'}+\mathbf{\hat{\Psi}}|}{|\hat{\mathbf{\Sigma}}|}\)

The test is a likelihood ratio test, where two likelihoods are compared, one under the parsimonious model and the other without any restrictions. The constant in the statistic is called the Bartlett correction. The log is the natural log. In the numerator, we have the determinant of the fitted factor model for the variance-covariance matrix, and below, we have a sample estimate of the variance-covariance matrix assuming no structure where:

\(\hat{\boldsymbol{\Sigma}} = \frac{n-1}{n}\mathbf{S}\)

and \(\mathbf{S}\) is the sample variance-covariance matrix. This is just another estimate of the variance-covariance matrix, one that includes a small bias. If the factor model fits well, then these two determinants should be about the same and you will get a small value for \(X^2\). However, if the model does not fit well, then the determinants will be different and \(X^2\) will be large.

Under the null hypothesis that the factor model adequately describes the relationships among the variables,

\(\mathbf{X}^2 \sim \chi^2_{\frac{(p-m)^2-p-m}{2}} \)

Under the null hypothesis that the factor model adequately describes the data, this test statistic has a chi-square distribution with the unusual degrees of freedom shown above. The degrees of freedom are the difference in the number of unique parameters between the two models. We reject the null hypothesis that the factor model adequately describes the data if \(X^2\) exceeds the critical value from the chi-square table.
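As a quick check of the degrees of freedom and p-value that SAS reports below for m = 3 (a sketch using SciPy):

from scipy.stats import chi2

p, m = 9, 3
df = ((p - m) ** 2 - p - m) // 2      # ((9 - 3)^2 - 9 - 3) / 2 = 12
x2 = 92.6652                          # test statistic from the SAS output below
print(df, chi2.sf(x2, df))            # 12, p-value far below 0.0001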

Back to the Output...

Looking just past the iteration results, we have....

Significance Tests based on 329 Observations

Test DF Chi-Square Pr > ChiSq
\(H_{o}\colon\) No common factors 36 839.4268 < 0.0001
\(H_{A}\colon\) At least one common factor      
\(H_{o}\colon\) 3 Factors are sufficient 12 92.6652 < 0.0001
\(H_{A}\colon\) More Factors are needed      

For our Places Rated dataset, we find a significant lack of fit: \(X^2 = 92.67\), \(df = 12\), \(p < 0.0001\). We conclude that the relationships among the variables are not adequately described by the factor model. This suggests that we do not have the correct model.

The only remedy that we can apply in this case is to increase the number m of factors until an adequate fit is achieved. Note, however, that m must satisfy

\(p(m+1) \le \frac{p(p+1)}{2}\)

In the present example, this means that m ≤ 4.

Let's return to the SAS program and change the "nfactors" value from 3 to 4:

Significance Tests based on 329 Observations

Test DF Chi-Square Pr > ChiSq
\(H_{o}\colon\) No common factors 36 839.4268 < 0.0001
\(H_{A}\colon\) At least one common factor      
\(H_{o}\colon\) 4 Factors are sufficient 6 41.6867 < 0.0001
\(H_{A}\colon\) More Factors are needed      

We find that the factor model with m = 4 does not fit the data adequately either: \(X^2 = 41.69\), \(df = 6\), \(p < 0.0001\). We cannot properly fit a factor model to describe these data and conclude that a factor model does not work for this particular dataset. There may be something else going on here, perhaps some non-linearity. Whatever the case, this does not yield a good-fitting factor model. The next step could be to drop variables from the dataset to obtain a better-fitting model.

12.10 - Factor Rotations

From our experience with the Places Rated data, it does not look like the factor model works well. There is no guarantee that any model will fit the data well.

The first motivation of factor analysis was to try to discern some underlying factors describing the data. The Maximum Likelihood Method failed to find such a model to describe the Places Rated data. The second motivation is still valid, which is to try to obtain a better interpretation of the data. In order to do this, let's take a look at the factor loadings obtained before from the principal component method.

  Factor
Variable          1        2        3
Climate           0.286    0.076    0.841
Housing           0.698    0.153    0.084
Health            0.744   -0.410   -0.020
Crime             0.471    0.522    0.135
Transportation    0.681   -0.156   -0.148
Education         0.498   -0.498   -0.253
Arts              0.861   -0.115    0.011
Recreation        0.642    0.322    0.044
Economics         0.298    0.595   -0.533

The problem with this analysis is that some of the variables are highlighted in more than one column. For instance, Education appears significant to Factor 1 AND Factor 2. The same is true for Economics in both Factors 2 AND 3. This does not provide a very clean, simple interpretation of the data. Ideally, each variable would appear as a significant contributor in one column.

In fact, the above table may indicate contradictory results. Looking at some of the observations, it is conceivable that we will find an observation that takes a high value on both Factors 1 and 2. If this occurs, a high value for Factor 1 suggests that the community has quality education, whereas a high value for Factor 2 suggests the opposite, that the community has poor education.

Factor rotation is motivated by the fact that factor models are not unique. Recall that the factor model for the data vector, \(\mathbf{X = \boldsymbol{\mu} + LF + \boldsymbol{\epsilon}}\), is a function of the mean \(\boldsymbol{\mu}\), plus a matrix of factor loadings times a vector of common factors, plus a vector of specific factors.

Moreover, we should note that this is equivalent to a rotated factor model, \(\mathbf{X = \boldsymbol{\mu} + L^*F^* + \boldsymbol{\epsilon}}\), where we have set \(\mathbf{L^* = LT}\) and \(\mathbf{f^* = T'f}\) for some orthogonal matrix \(\mathbf{T}\) where \(\mathbf{T'T = TT' = I}\). Note that there are an infinite number of possible orthogonal matrices, each corresponding to a particular factor rotation.

We plan to find an appropriate rotation, defined through an orthogonal matrix \(\mathbf{T}\) , that yields the most easily interpretable factors.

To understand this, consider a scatter plot of factor loadings. The orthogonal matrix \(\mathbf{T}\) rotates the axes of this plot. We wish to find a rotation such that each of the p variables has a high loading on only one factor.

We will return to the program below to obtain a plot. In looking at the program, there are a number of options (marked in blue under proc factor) that we have not yet explained.

Download the SAS program here: places2.sas

One of the options above is labeled 'preplot'. We will use this to plot the loadings for factor 1 against the loadings for factor 2.

In the output these loadings are plotted, with the factor 1 loadings on the y-axis and the factor 2 loadings on the x-axis. Each letter on the plot corresponds to a single variable. For example, the second variable, labeled with the letter B, has a factor 1 loading of about 0.7 and a factor 2 loading of about 0.15. SAS also provides plots of the other combinations of factors: factor 1 against factor 3, and factor 2 against factor 3.

Three factors appear in this model so we might consider a three-dimensional plot of all three factors together.

Obtaining a scree plot and loading plot

To perform factor analysis with scree and loading plots:

  • Transform the variables if desired. This step is optional but is used in the steps below.
  • Choose OK. The transformed values replace the originals in the worksheet under 'climate'.
  • Stat > Multivariate > Factor Analysis.
  • Under Graphs, select Scree plot and Loading plot for first two factors.
  • Choose OK and OK again. The numeric results are shown in the results area, along with both the scree plot and the loading plot.

The selection of the orthogonal matrix \(\mathbf{T}\) corresponds to a rotation of these axes. Think about rotating the axes about the origin; each rotation corresponds to an orthogonal matrix \(\mathbf{T}\). We want to rotate the axes to obtain a cleaner interpretation of the data. Ideally, we would define the new coordinate system so that, after rotation, the points fall close to the vertices (endpoints) of the new axes.

If we were only looking at two factors, then we would like to find each of the plotted points at the four tips (corresponding to all four directions) of the rotated axes. This is what rotation is about, taking the factor pattern plot and rotating the axes in such a way that the points fall close to the axes.

12.11 - Varimax Rotation

Varimax rotation chooses the orthogonal matrix \(\mathbf{T}\) that maximizes the varimax criterion: the sum, over the m factors, of the sample variances of the squared standardized loadings, where each loading is standardized by dividing by \(\hat{h}_i\), the square root of the \(i^{th}\) communality. Writing \(\tilde{l}^*_{ij} = \hat{l}^*_{ij}/\hat{h}_i\), the criterion is

\[V = \frac{1}{p}\sum_{j=1}^{m}\left\{\sum_{i=1}^{p}\tilde{l}^{*4}_{ij} - \frac{1}{p}\left(\sum_{i=1}^{p}\tilde{l}^{*2}_{ij}\right)^2\right\}\]

Spreading the squared loadings apart in this way pushes each loading toward 0 or ±1, which is exactly the clean structure we are after.
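
For the curious, the criterion can be maximized directly. The following sketch implements the classic SVD-based varimax algorithm on an arbitrary loading matrix; it omits the communality scaling (Kaiser normalization) described above for brevity, and a real analysis would simply use the rotation built into SAS, Minitab, or R:

```python
# Sketch: SVD-based varimax rotation for a p x m loading matrix.
import numpy as np

def varimax(L, tol=1e-8, max_iter=500):
    """Return rotated loadings L* = LT and the orthogonal rotation T."""
    p, m = L.shape
    T = np.eye(m)
    d = 0.0
    for _ in range(max_iter):
        d_old = d
        B = L @ T
        # Gradient of the (unnormalized) varimax criterion
        G = L.T @ (B**3 - B @ np.diag((B**2).sum(axis=0)) / p)
        U, s, Vt = np.linalg.svd(G)
        T = U @ Vt
        d = s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return L @ T, T
```

Applied to the unrotated loadings above, this should give a pattern close to the SAS varimax solution, up to column order and sign; small differences can arise from the omitted Kaiser normalization.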

Returning to the options of the factoring procedure (marked in blue):

"rotate," asks for factor rotation and we specified the Varimax rotation of our factor loadings.

"plot," asks for the same kind of plot that we just looked at for the rotated factors. The result of our rotation is a new factor pattern given below (page 11 of SAS output):

Here is a copy of page 10 from the SAS output:

At the top of page 10 of the output, above, we have our orthogonal matrix T .

Using Varimax Rotation

To perform factor analysis with varimax rotation:

  • Choose Varimax for the Type of Rotation.
  • Under Graphs, select Loading plot for the first two factors.
  • Choose OK and OK again. The numeric results are shown in the results area, along with the loading plot.

The values of the rotated factor loadings are:

  Factor
Variable        1       2       3
Climate         0.021   0.239   0.859
Housing         0.438   0.547   0.166
Health          0.829   0.127   0.137
Crime           0.031   0.702   0.139
Transportation  0.652   0.289  -0.028
Education       0.734  -0.094  -0.117
Arts            0.738   0.432   0.150
Recreation      0.301   0.656   0.099
Economics      -0.022   0.651  -0.551

Let us now interpret the data based on this rotation. Focusing on the loadings that are large in magnitude, we make the following interpretation.

  • Factor 1: primarily a measure of Health, but also increases with increasing scores for Transportation, Education, and the Arts.
  • Factor 2: primarily a measure of Crime, Recreation, the Economy, and Housing.
  • Factor 3: primarily a measure of Climate alone.

This is just the pattern that exists in the data and no causal inferences should be made from this interpretation. It does not tell us why this pattern exists. It could very well be that there are other essential factors that are not seen at work here.

Let us look at the amount of variation explained by our factors under the rotated model and compare it to the original model. Consider the variance explained by each factor under the original analysis and the rotated factors:

  Analysis
Factor Original Rotated
1 3.2978 2.4798
2 1.2136 1.9835
3 1.1055 1.1536
Total 5.6169 5.6169

The total amount of variation explained by the 3 factors remains the same. Rotations, among a fixed number of factors, do not change how much of the variation is explained by the model. The fit is equally good regardless of what rotation is used.

However, notice what happened to the first factor. We see a fairly large decrease in the amount of variation it explains. We obtained a cleaner interpretation of the data, but it comes at a cost: the variation explained by the first factor is redistributed among the latter two factors, in this case mostly to the second factor.

The total amount of variation explained by the rotated factor model is the same, but the contributions are not the same from the individual factors. We gain a cleaner interpretation, but the first factor does not explain as much of the variation. However, this would not be considered a particularly large cost if we are still interested in these three factors.
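
As a check on these numbers, the variance explained by a factor is just the column sum of its squared loadings, and rotation leaves the total unchanged. A minimal sketch using the unrotated loadings tabulated earlier:

```python
# Variance explained per factor = column sums of squared loadings.
import numpy as np

L = np.array([
    [ 0.286,  0.076,  0.841],   # Climate
    [ 0.698,  0.153,  0.084],   # Housing
    [ 0.744, -0.410, -0.020],   # Health
    [ 0.471,  0.522,  0.135],   # Crime
    [ 0.681, -0.156, -0.148],   # Transportation
    [ 0.498, -0.498, -0.253],   # Education
    [ 0.861, -0.115,  0.011],   # Arts
    [ 0.642,  0.322,  0.044],   # Recreation
    [ 0.298,  0.595, -0.533],   # Economics
])
print((L**2).sum(axis=0))  # approx. 3.298, 1.213, 1.105, as in the table
print((L**2).sum())        # approx. 5.617: the total, unchanged by rotation
```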

Rotation cleans up the interpretation. Ideally, we should find that the numbers in each column are either far away from zero or close to zero. Numbers close to +1 or -1 or 0 in each column give the ideal or cleanest interpretation. If a rotation can achieve this goal, then that is wonderful. However, observed data are seldom this cooperative!

Nevertheless, recall that the objective is data interpretation. The success of the analysis can be judged by how well it helps you make sense of your data. If the result gives you some insight into the pattern of variability in the data, even without being perfect, then the analysis was successful.

12.12 - Estimation of Factor Scores

Factor scores are similar to the principal components in the previous lesson. Just as we plotted principal components against each other, a similar scatter plot of factor scores is also helpful. We also might use factor scores as explanatory variables in future analyses. It may even be of interest to use the factor score as the dependent variable in a future analysis.

The methods for estimating factor scores depend on the method used to carry out the factor analysis. The vectors of common factors \(\mathbf{f}\) are of interest; there are m unobserved factors in our model, and we would like to estimate them. Therefore, given the factor model:

\(\mathbf{Y_i = \boldsymbol{\mu} + Lf_i + \boldsymbol{\epsilon_i}}; i = 1,2,\dots, n,\)

we may wish to estimate the vectors of factor scores

\(\mathbf{f_1, f_2, \dots, f_n}\)

for each observation.

There are a number of different methods for estimating factor scores from the data. These include:

  • Ordinary least squares
  • Weighted least squares (Bartlett)
  • Regression method

Ordinary Least Squares

By default, this is the method that SAS uses if you use the principal component method. For each subject i, the difference between the \(j^{th}\) observed variable and its value under the factor model is computed, where the \(\mathbf{L}\)'s are the factor loadings and the f's are the unobserved common factors. The vector of common factors for subject i, \(\hat{\mathbf{f}}_i\), is found by minimizing the sum of the squared residuals:

\[\sum_{j=1}^{p}\epsilon^2_{ij} = \sum_{j=1}^{p}(y_{ij}-\mu_j-l_{j1}f_1 - l_{j2}f_2 - \dots - l_{jm}f_m)^2 = (\mathbf{Y_i - \boldsymbol{\mu} - Lf_i})'(\mathbf{Y_i - \boldsymbol{\mu} - Lf_i})\]

This is like a least squares regression, except in this case we already have estimates of the parameters (the factor loadings), but wish to estimate the explanatory common factors. In matrix notation the solution is expressed as:

\(\mathbf{\hat{f}_i = (L'L)^{-1}L'(Y_i-\boldsymbol{\mu})}\)

In practice, we substitute our estimated factor loadings into this expression as well as the sample mean for the data:

\(\mathbf{\hat{f}_i = \left(\hat{L}'\hat{L}\right)^{-1}\hat{L}'(Y_i-\bar{y})}\)

Using the principal component method with the unrotated factor loadings, this yields:

\[\mathbf{\hat{f}_i} = \left(\begin{array}{c} \frac{1}{\sqrt{\hat{\lambda}_1}}\mathbf{\hat{e}'_1(Y_i-\bar{y})}\\  \frac{1}{\sqrt{\hat{\lambda}_2}}\mathbf{\hat{e}'_2(Y_i-\bar{y})}\\ \vdots \\  \frac{1}{\sqrt{\hat{\lambda}_m}}\mathbf{\hat{e}'_m(Y_i-\bar{y})}\end{array}\right)\]

Here, \(\hat{\mathbf{e}}_1\) through \(\hat{\mathbf{e}}_m\) are the first m eigenvectors, and \(\hat{\lambda}_1\) through \(\hat{\lambda}_m\) the corresponding eigenvalues, of the sample variance-covariance (or correlation) matrix.
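
A minimal sketch of this computation, using made-up data and loadings purely for illustration:

```python
# OLS factor scores: f_hat_i = (L'L)^{-1} L'(y_i - ybar) for every observation.
import numpy as np

rng = np.random.default_rng(1)
n, p, m = 100, 9, 3
Y = rng.normal(size=(n, p))        # stand-in for the observed data
L = rng.normal(size=(p, m))        # stand-in for the estimated loadings
ybar = Y.mean(axis=0)

# lstsq solves min ||L f - (y_i - ybar)||^2 for each column, i.e. each subject.
F_hat, *_ = np.linalg.lstsq(L, (Y - ybar).T, rcond=None)
F_hat = F_hat.T                    # n x m matrix: one score vector per subject
print(F_hat.shape)                 # (100, 3)
```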

Weighted Least Squares (Bartlett)

The difference between WLS and OLS is that the squared residuals are divided by the specific variances, as shown below. This gives more weight, in this estimation, to variables that have low specific variances. The factor model fits the data best for variables with low specific variances, so those variables should give us more information regarding the true values of the common factors.

Therefore, for the factor model:

\(\mathbf{Y_i = \boldsymbol{\mu} + Lf_i + \boldsymbol{\epsilon_i}}\)

we want to find \(\boldsymbol{f_i}\) that minimizes

\( \sum\limits_{j=1}^{p}\frac{\epsilon^2_{ij}}{\Psi_j} = \sum\limits_{j=1}^{p}\frac{(y_{ij}-\mu_j - l_{j1}f_1 - l_{j2}f_2 -\dots - l_{jm}f_m)^2}{\Psi_j} = \mathbf{(Y_i-\boldsymbol{\mu}-Lf_i)'\Psi^{-1}(Y_i-\boldsymbol{\mu}-Lf_i)}\)

The solution is given by this expression where \(\mathbf{\Psi}\) is the diagonal matrix whose diagonal elements are equal to the specific variances:

\(\mathbf{\hat{f}_i = (L'\Psi^{-1}L)^{-1}L'\Psi^{-1}(Y_i-\boldsymbol{\mu})}\)

and can be estimated by substituting the following:

\(\mathbf{\hat{f}_i = (\hat{L}'\hat{\Psi}^{-1}\hat{L})^{-1}\hat{L}'\hat{\Psi}^{-1}(Y_i-\bar{y})}\)

Regression Method

This method is used when the factor loadings have been estimated by maximum likelihood. It considers, for the \(i^{th}\) subject, the vector of observed data together with the vector of common factors.

The joint distribution of the data \(\boldsymbol{Y}_i\) and the factor \(\boldsymbol{f}_i\) is

\(\left(\begin{array}{c}\mathbf{Y_i} \\ \mathbf{f_i}\end{array}\right) \sim N \left[\left(\begin{array}{c}\mathbf{\boldsymbol{\mu}} \\ 0 \end{array}\right), \left(\begin{array}{cc}\mathbf{LL'+\Psi} & \mathbf{L} \\ \mathbf{L'} & \mathbf{I}\end{array}\right)\right]\)

Using this we can calculate the conditional expectation of the common factor score \(\boldsymbol{f}_i\) given the data \(\boldsymbol{Y}_i\) as expressed here:

\(E(\mathbf{f_i|Y_i}) = \mathbf{L'(LL'+\Psi)^{-1}(Y_i-\boldsymbol{\mu})}\)

This suggests the following estimator by substituting in the estimates for L and \(\mathbf{\Psi}\):

\(\mathbf{\hat{f}_i = \hat{L}'\left(\hat{L}\hat{L}'+\hat{\Psi}\right)^{-1}(Y_i-\bar{y})}\)

A small fix is often applied to reduce the effects of an incorrect determination of the number of factors: the model-implied covariance matrix \(\hat{\mathbf{L}}\hat{\mathbf{L}}'+\hat{\boldsymbol{\Psi}}\) is replaced by the sample variance-covariance matrix \(\mathbf{S}\). This tends to give results that are a bit more stable.

\(\mathbf{\tilde{f}_i = \hat{L}'S^{-1}(Y_i-\bar{y})}\)
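
Both estimators are short matrix computations. The sketch below computes the Bartlett (WLS) and regression-method scores side by side, again with made-up inputs purely for illustration:

```python
# WLS (Bartlett) and regression-method factor scores.
import numpy as np

rng = np.random.default_rng(2)
n, p, m = 100, 9, 3
Y = rng.normal(size=(n, p))        # stand-in for the observed data
L = rng.normal(size=(p, m))        # stand-in for the estimated loadings
psi = rng.uniform(0.2, 1.0, p)     # specific variances (diagonal of Psi)
Yc = Y - Y.mean(axis=0)            # rows are (y_i - ybar)'

# Bartlett / WLS: f_hat = (L' Psi^{-1} L)^{-1} L' Psi^{-1} (y - ybar)
Lw = L / psi[:, None]              # Psi^{-1} L, since Psi is diagonal
F_wls = np.linalg.solve(L.T @ Lw, Lw.T @ Yc.T).T

# Regression method: f_hat = L' (L L' + Psi)^{-1} (y - ybar)
Sigma = L @ L.T + np.diag(psi)
F_reg = (L.T @ np.linalg.solve(Sigma, Yc.T)).T

print(F_wls.shape, F_reg.shape)    # (100, 3) (100, 3)
```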

12.13 - Summary

In this lesson we learned about:

  • The interpretation of factor loadings
  • The principal component and maximum likelihood methods for estimating factor loadings and specific variances
  • How communalities can be used to assess the adequacy of a factor model
  • A likelihood ratio test for the goodness-of-fit of a factor model
  • Factor rotation
  • Methods for estimating common factors

Understanding Factor Analysis in Psychology


What Is Factor Analysis and What Does It Do?


Like many methods encountered by those studying psychology, factor analysis has a long history. It was originally discussed by British psychologist Charles Spearman in the early 20th century and has gone on to be used not only in psychology but also in other fields that often rely on statistical analyses.

But what is it, what are some real-world examples, and what are the different types? In this article, we'll answer all of those questions.

The primary goal of factor analysis is to distill a large data set into a working set of connections or factors. Dr. Jessie Borelli, PhD, who works at the University of California-Irvine, uses factor analysis in her work on attachment.

She is doing research that looks into how people perceive relationships and how they connect to one another. She gives the example of providing a hypothetical questionnaire with 100 items on it and using factor analysis to drill deeper into the data: "Rather than looking at each individual item on its own I'd rather say, 'Is there any way in which these items kind of cluster together or go together so that I can... create units of analysis that are bigger than the individual items?'"

Factor analysis is looking to identify patterns where it is assumed that there are already connections between areas of the data.

An Example Where Factor Analysis Is Useful

One common example of factor analysis is taking something not easily quantifiable, like socio-economic status, and using it to group together highly correlated variables like income level and type of job.

Factor analysis isn't just used in psychology but also deployed in fields like sociology, business, and technology sector fields like machine learning.

Two types of factor analysis are most commonly referred to: exploratory factor analysis and confirmatory factor analysis.

  • Exploratory factor analysis: The goal of this analysis is to find general patterns in a set of data points.
  • Confirmatory factor analysis: The goal of this analysis is to test various hypothesized relationships among certain variables.

Exploratory Analysis

In an exploratory analysis, you are being a little more open-minded as a researcher, because you are using this type of analysis to find structure in your data set that you haven't yet identified. It's an approach that Borelli uses in her own research.

Confirmatory Factor Analysis

On the other hand, if you're using a confirmatory factor analysis you are using the assumptions or theoretical findings you have already identified to drive your statistical model.

Unlike in an exploratory factor analysis, where the relationships between factors and variables are more open, a confirmatory factor analysis requires you to select which variables you are testing for. In Borelli's words:

"When you do a confirmatory factor analysis, you kind of tell your analytic program what you think the data should look like, in terms of, 'I think it should have these two factors and this is the way I think it should look.'"

Let's take a look at the advantages and disadvantages of factor analysis.

A main advantage of factor analysis is that it allows researchers to reduce a large number of variables by combining them into a smaller number of factors.

You Can Analyze Fewer Data Points

When answering your research questions, it's a lot easier to be working with three variables than thirty, for example.

Disadvantages

Disadvantages include that factor analysis relies on the quality of the data and may allow for different interpretations of the same data. For example, during one study, Borelli found that after deploying a factor analysis, she was still left with results that didn't connect well with what had been found in hundreds of other studies. Because her sample was new and more culturally diverse than those previously explored, the exploratory factor analysis left her with more questions than answers.

The goal of factor analysis in psychology is often to make connections that allow researchers to model common factors that might otherwise be hard or impossible to observe.

So, for example, intelligence is a difficult concept to observe directly. However, it can be inferred from indicators that we can directly measure on specific tests.

Factor analysis has often been used in the field of psychology to help us better understand the structure of personality.

This is due to the multitude of factors researchers have to consider when it comes to understanding the concept of personality. This area of personality research is certainly not new, with easily findable research recognizing its power dating as far back as 1942.

Britannica. Charles E. Spearman.

United States Environmental Protection Agency. Exploratory Data Analysis.

Flora DB, Curran PJ. An empirical evaluation of alternative methods of estimation for confirmatory factor analysis with ordinal data. Psychol Methods. 2004;9(4):466-491. doi:10.1037/1082-989X.9.4.466

Wolfle D. Factor analysis in the study of personality. The Journal of Abnormal and Social Psychology. 1942;37(3):393-397.

By John Loeppky, a freelance journalist based in Regina, Saskatchewan, Canada, who has written about disability and health for outlets of all kinds.

Factor Analysis - Science method


Chuck A Arize

  • Transforming Data: Apply transformations (e.g., logarithmic) to normalize skewed distributions.
  • Robust Methods: Use robust statistical techniques that are less sensitive to outliers.
  • Outlier Detection: Identify and address outliers through methods like Z-scores or influence diagnostics.
  • Data Imputation: Replace outliers with estimated values or remove them if justified.

Megbaru Tesfaw Molla

  • Check Conceptual Overlap: Ensure both variables are conceptually related and should be on the same factor.
  • Review Factor Structure: Consider whether a combined factor is appropriate or if more factors should be extracted.
  • Refine Variables: Reassess and possibly revise the variables or their measurements.
  • Perform Rotation: Try different factor rotations (e.g., varimax, oblimin) to clarify the factor structure.

Mohammed Ausama Alkatib

  • Psychology: In psychology, factor analysis is a prevalent tool for investigating latent dimensions or factors that account for patterns of correlation among variables. It simplifies intricate data and aids in grasping the organization of psychological constructs.
  • Economics: Economists frequently employ factor analysis to dissect economic data and pinpoint the underlying factors shaping economic phenomena. It can illuminate the contributors to economic growth, for instance.
  • Environmental Science: Within environmental science, factor analysis serves to probe relationships among environmental variables and unveil hidden factors that impact environmental processes like pollution levels and climate patterns.
  • Marketing: Factor analysis is instrumental in marketing research, allowing researchers to comprehend consumer behavior and preferences by unearthing latent factors that influence buying decisions.
  • Biology: Biologists have harnessed factor analysis for scrutinizing extensive datasets, such as gene expression data, in order to identify patterns and the underlying factors steering biological processes.
  • Medicine and Healthcare: In the realm of healthcare, factor analysis aids in revealing concealed factors within healthcare data, offering insights into patient outcomes, risk factors for diseases, and the efficiency of healthcare systems.
  • Education: Educational researchers apply factor analysis to delve into factors that impact student performance, learning outcomes, and the effectiveness of educational programs.
  • Engineering: Engineers can utilize factor analysis to comprehend intricate relationships among various components within complex systems, facilitating optimization and troubleshooting.
  • Market Research: Beyond conventional marketing, factor analysis plays a valuable role in market research, encompassing areas such as product development and the analysis of brand perception.
  • Cognitive Science: Researchers in cognitive science harness factor analysis to delve into the underlying cognitive processes and factors that influence human perception and decision-making.


Confirmatory Factor Analysis (CFA): A Detailed Overview

Introduction to Confirmatory Factor Analysis

Confirmatory Factor Analysis (CFA) is a sophisticated statistical technique used to verify the factor structure of a set of observed variables. It allows researchers to test the hypothesis that a relationship between observed variables and their underlying latent constructs exists. CFA is distinct from Exploratory Factor Analysis (EFA), where the structure of the data is not predefined and is instead determined through the analysis.

Purpose and Procedure of CFA

The primary goal of CFA is to confirm whether the data fits a hypothesized measurement model based on theory or prior research. This involves several critical steps:

1. Defining Constructs: The process begins by clearly defining the theoretical constructs. This stage often involves a pretest to evaluate the construct's items and ensure they are well-defined and represent the concept accurately.

2. Developing the Measurement Model: In CFA, it is essential to establish the concept of unidimensionality, where each factor or construct is represented by multiple observed variables that are presumed to measure only that specific construct. Typically, a good practice involves having at least three items per construct.

3. Specifying the Model: Researchers must specify the number of factors and the pattern of loadings (which variables load on which factors). This specification is based on theoretical expectations or results from previous studies.

4. Assessing Model Fit: The validity of the measurement model is assessed by comparing the theoretical model with the actual data. This includes examining factor loadings (with a standard threshold of 0.7 or higher for adequate loadings), and fit indices such as Chi-square, Root Mean Square Error of Approximation (RMSEA), Goodness of Fit Index (GFI), and Comparative Fit Index (CFI).
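
As a sketch of what steps 3 and 4 can look like in code, here is a hypothetical two-factor model in lavaan-style syntax, assuming the third-party Python package semopy and made-up item names q1 to q6; random placeholder data stands in for real responses, so the estimates and fit indices are meaningless here:

```python
import numpy as np
import pandas as pd
import semopy  # one of several SEM packages; lavaan in R is a common alternative

rng = np.random.default_rng(3)
# Placeholder responses; a real CFA would use the observed item data.
df = pd.DataFrame(rng.normal(size=(300, 6)),
                  columns=["q1", "q2", "q3", "q4", "q5", "q6"])

# Step 3: specify which items load on which hypothesized construct.
desc = """
Satisfaction =~ q1 + q2 + q3
Commitment =~ q4 + q5 + q6
"""

model = semopy.Model(desc)
model.fit(df)
print(model.inspect())           # estimated loadings and variances
print(semopy.calc_stats(model))  # step 4: chi-square, RMSEA, CFI, etc.
```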

Key Questions CFA Addresses

Can the proposed five factors in a 20-question instrument be identified and validated through the specific items designed to measure them?

Do four specific survey questions reliably measure a single underlying factor?

Assumptions in CFA

Multivariate Normality: The data should follow a multivariate normal distribution.

Sample Size: Adequate sample size is crucial, generally n > 200, to ensure reliable results.

Model Specification: The model should be correctly specified a priori based on theoretical or empirical justification.

Random Sampling: Data must be collected from a random sample to generalize findings.


Key Terms and Concepts in CFA

  • Theory and Model: A theory is a systematic set of causal relationships explaining a phenomenon, while a model is a specified set of dependent relationships within that theory used for testing.
  • Path Analysis and Diagram: Path analysis is utilized to test structural equation models, with path diagrams visually representing the cause-effect relationships.
  • Endogenous and Exogenous Variables: Endogenous variables are outcomes within the model, influenced by other variables, while exogenous variables are predictors not influenced by other variables within the model.
  • Confirmatory Analysis and Cronbach's Alpha: Confirmatory analysis tests pre-specified relationships, and Cronbach's Alpha assesses the reliability of construct indicators.
  • Identification: This refers to the ability of the data to provide sufficient information to estimate the model. Models can be under-identified, exactly identified, or over-identified.
  • Goodness of Fit: This measures how well the model fits the observed data. Fit indices help in evaluating whether the model is acceptable.
CFA is an essential tool in the toolkit of researchers aiming to validate the structure of their measurement instruments. It provides a rigorous method to ensure that the data aligns with expected theoretical constructs, enhancing the reliability and validity of subsequent analyses based on these measurements.

Confirmatory factor analysis (CFA) and statistical software:

Usually, statistical software like Intellectus Statistics, AMOS, LISREL, or SAS is used for confirmatory factor analysis. In AMOS, visual paths are drawn manually on the graphic window and the analysis is performed. In LISREL, confirmatory factor analysis can be performed graphically as well as from the menu. In SAS, confirmatory factor analysis is performed through its programming language (e.g., PROC CALIS).


To Reference This Page:

Statistics Solutions. (2013). Confirmatory Factor Analysis . Retrieved from https://www.statisticssolutions.com/academic-solutions/resources/directory-of-statistical-analyses/confirmatory-factor-analysis/


Getting More from your Survey Questions with Factor Analysis

Surveys can be a rich source of information, covering not only factual questions but also questions about attitudes, behaviours, and activities. The results from a survey analysis can also provide more than just percentages, averages and crosstabulations.

Factor analysis is a statistical technique that combines questions that are related (correlated) into a smaller number of factors, to create more robust measures.

By combining questions or variables and using the resulting measures rather than analysing and reporting the questions individually, factor analysis is useful as a dimensionality-reduction technique (other dimension-reduction techniques include Principal Component Analysis [PCA], for example). And being based on correlations it can help to avoid some of the problems of collinearity that can arise in analyses. Furthermore, factors can often provide more meaningful results, by capturing overall, intrinsic characteristics and qualities, rather than individual, separate questions.

It is worth noting though that factor analysis can be used with many types of data, not just with survey responses. It can be used to analyse, for example, items bought in shops or supermarkets, time spent in different office areas (solo pods, meeting rooms, conference spaces, etc.), patient reported outcomes (PROs) (e.g., of pain or depression), and so on. Hence, factor analysis can not only help you to understand your students’, your customers’ or your employees’ attitudes and opinions, it can be used to help uncover their preferences and behaviours via transaction or office utilisation data, for example.

In this blog we show factor analysis in action.

What Is a Factor?

A factor, sometimes called a latent trait or construct, is an intrinsic characteristic or quality. Factors are multi-faceted and difficult to measure directly; examples are qualities like empathy, IQ, self-confidence, or ethos.

The theory of factor analysis is that these deeper level factors or latent traits underpin your actions and attitudes and also influence your responses to questions about these topics.

Example Data

To illustrate factor analysis, we use some data from the Organisation for Economic Co-operation and Development’s (OECD) Programme for International Student Assessment (PISA) as an example. The PISA study runs every 3 years, across many countries, and assesses the numeracy, literacy and science knowledge and skills of 15-year-old students. The OECD make PISA data available for secondary analyses.

PISA also includes a teacher questionnaire. The 2018 questionnaire asked teachers to answer the following set of questions:

(Figure: the ten questionnaire statements about satisfaction with teaching and with the school, each answered on a scale from 'strongly disagree' to 'strongly agree'.)

Our example is based on the responses of teachers in the UK.

Factor Analysis

The first step in factor analysis is to calculate the correlations between each of the questions. As the responses are on a Likert scale (from 'strongly disagree' to 'strongly agree') and are ordered categorical (ordinal) data rather than on a continuous scale, we calculate polychoric correlations, which are appropriate for these sorts of data (unlike, for example, Pearson product-moment correlations).

In the next step, the appropriate number of factors to be extracted is determined (ensuring, for example, that a sufficient proportion of the variation in the data is explained), and factor solutions are calculated based on the correlations (e.g., perhaps using the fa() function in the psych package in the statistical software R ). Factor analysis groups together questions that are highly correlated to derive a smaller set of factors that retain a high proportion of the information in the original questions. We won’t go into the detail of how this is done here (as it’s not the focus of this post), but in our example we find that 2 factors are potentially a good solution.
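
To give a flavour of the extraction step in code, here is a minimal sketch assuming the third-party Python package factor_analyzer (the post itself points to the fa() function in R's psych package); note that it uses ordinary Pearson correlations on placeholder data rather than the polychoric correlations discussed above:

```python
import numpy as np
from factor_analyzer import FactorAnalyzer

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 10))   # placeholder for the 10 item responses

fa = FactorAnalyzer(n_factors=2, rotation="varimax")
fa.fit(X)
print(fa.loadings_)              # 10 x 2 matrix of factor loadings
print(fa.get_factor_variance())  # variance explained by each factor
```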

A key aim of factor analysis is to obtain factors that are interpretable. To interpret the factors, we look at the “factor loadings” from the factor analysis output. Each factor has a set of factor loadings corresponding to the input questions. These are the correlations between each input question and the factor, the underlying latent construct.

We identify the questions that are strongly correlated with each of the underlying factors. Strong correlations are indicated by values close to +1 or to -1 (positively and negatively correlated, respectively), weaker correlations are values closer to zero.

In the example below, we see that the 1st, 2nd, 4th and 6th statements are strongly correlated with factor 1, and the 3rd, 5th and 7th statements are strongly correlated with factor 2. (These are indicated by the shaded cells.) We note, though, that statements 8, 9 and 10 also correlate with the factors, but to a lesser extent.

(Figure: the factor loading matrix for the ten statements, with the strong loadings shaded.)

Interpreting the Factors

Factor 1 – Satisfaction with Teaching

The strong correlations between the statements with shaded cells and factor 1 indicate that teachers who agreed that "The advantages of being a teacher clearly outweigh the disadvantages" also tended to agree with the statement "If I could decide again, I would still choose to work as a teacher". In addition, teachers who agreed with these first 2 statements also tended to disagree (indicated by the negative correlation) with the statements "I regret that I decided to become a teacher" and "I wonder whether it would have been better to choose another profession".

The converse is also true: teachers who disagreed with the first two statements tended to agree with the latter two shaded statements.

Collectively these four statements provide a measure of teachers’ satisfaction with being a teacher: their satisfaction with their profession.

Factor 2 – Satisfaction with Their School

In factor 2, the strong correlations between the 3rd, 5th and 7th statements (with shaded cells) and the factor indicate that teachers who agreed that "I enjoy working at this school" also tended to agree with the statement "I would recommend my school as a good place to work", and additionally tended to disagree (indicated by the negative correlation) with the statement "I would like to change to another school if that were possible". Again, the converse is also true: teachers who disagreed with the first two statements tended to agree with the third.

Collectively these three statements provide a measure of teachers’ satisfaction with their particular school.

All in All, I Am Satisfied with My Job

Ideally the correlations between statements and factors should show associations between each statement and only one of the factors (or neither of the factors). In the factor loading matrix above, the final statement ("All in All, I Am Satisfied with My Job") is positively correlated with both Factor 1, teachers' satisfaction with their profession, and Factor 2, teachers' satisfaction with their school, though the correlations are weaker than for the shaded statements (and are described as moderate rather than strong). It is quite sensible in terms of interpretation that teachers' overall job satisfaction is (positively) related to both their satisfaction with the profession and with their school. However, since the correlations associated with this statement are not strong, the factors may be improved by excluding this statement, and the other statements with small correlations, from the analysis, as they may be adding more noise than information.

What Next? Using the Factors

Having identified and interpreted the factors, we can use the data and the factor solution to calculate factor scores: in this case use the teachers’ responses to calculate the ‘satisfaction with the profession’ and ‘satisfaction with their school’ measures for each teacher. (A weighted combination of the factor loadings multiplied by the corresponding question responses gives the factor score, measuring the relative magnitude of each factor (i.e., trait), for each teacher.)

Teachers’ scores, whether they are high or low, or nearer the average, will reflect (because they are calculated from) their levels of agreement and disagreement and the strength of their opinions. And so, we have taken responses to (in this case) 10 categorical variables and created two scale measures (continuous variables), which provide more robust measures of teachers’ satisfaction than the survey questions individually.
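
If the hypothetical factor_analyzer sketch from earlier were used, the per-respondent factor scores would come from the fitted model's transform method (again on placeholder data):

```python
import numpy as np
from factor_analyzer import FactorAnalyzer

rng = np.random.default_rng(5)
X = rng.normal(size=(500, 10))  # placeholder for the 10 item responses

fa = FactorAnalyzer(n_factors=2, rotation="varimax")
fa.fit(X)
scores = fa.transform(X)        # 500 x 2: one score per factor per respondent
print(scores[:5])
```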

While factor analysis is a technique in its own right, it is not usually the analysis outcome itself. The derived factors can be really useful in subsequent analyses. They can be used to compare or describe different groups of teachers, for example to answer hypotheses such as: are older teachers more satisfied with their profession than younger teachers? They can be used in statistical models, for example to explore whether and how students' outcomes vary according to their teachers' levels of satisfaction, or what the drivers of teachers' satisfaction are. They can be used with cluster analysis to identify groups of teachers according to their characteristics.

Similarly, in another context, with a customer or brand survey, for example, we could investigate whether customer satisfaction might be associated with a particular customer demographic, or cluster customers into different groups based on their attitudes, opinions, preferences, and shopping behaviours, to better understand your customer base and brand positioning, and target products and/or advertisements accordingly.
