Health Measurement Scales: A practical guide to their development and use

David L. Streiner and Geoffrey R. Norman

Print publication date: 2008

Print ISBN-13: 9780199231881

Published to Oxford Scholarship Online: September 2009

DOI: 10.1093/acprof:oso/9780199231881.001.0001



Appendix C: A (very) brief introduction to factor analysis

Health Measurement Scales
Oxford University Press

Exploratory factor analysis

Assume you are developing a test to measure a person’s level of anxiety. After following the steps in Chapter 3 (Devising the items), you come up with the following 10 items:

  1. I often avoid high places.

  2. I worry a lot.

  3. My hands often get sweaty.

  4. I cross the street in order to avoid having to meet someone.

  5. I have difficulty concentrating.

  6. I can often feel my heart racing.

  7. I find myself pacing when I’m under stress.

  8. I frequently feel tense.

  9. When I’m under stress, I tend to get headaches.

  10. People tell me I have trouble letting go of an idea.

There are three hypotheses regarding the way these 10 items are related. At one extreme, the first hypothesis is that they are totally unrelated, and tap 10 different, uncorrelated aspects of anxiety. The second, at the other extreme, is that they are all highly correlated with each other. The third hypothesis is somewhere in the middle: that there are groups of items that cluster together, with each cluster tapping a different aspect of anxiety. A natural place to begin is to look at the correlation matrix. If all of the correlations were low, it would favour the first hypothesis; all high correlations would lead you to adopt the second; and groups of items that seem related to each other but uncorrelated with the other groups would support the last hypothesis. There are two problems, though, in simply examining a correlation matrix. First, it is unusual for correlations to be very close to 1.0 or to 0.0; most often, they fall in a more restricted range, making it more difficult to separate high correlations from low ones. Second, even with as few as 10 items, there are 45 unique correlations to examine; if there are 30 items, which is more common when we are developing a test, there will be 435 correlations to look at (the number of unique correlations is n(n – 1)/2), far more than we can comfortably make sense of. However, we can turn to factor analysis to help us.
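The counts quoted above follow directly from the formula n(n – 1)/2. A minimal sketch in Python makes the arithmetic explicit; the function name is ours, chosen only for illustration.

```python
def unique_correlations(n_items: int) -> int:
    """Number of distinct off-diagonal entries in an n-by-n correlation matrix."""
    return n_items * (n_items - 1) // 2


for n in (10, 30):
    # prints "10 45" and then "30 435"
    print(n, unique_correlations(n))
```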

What factor analysis does with the correlation matrix is, as the name implies, to derive factors, which are weighted combinations of all of the variables. The first two factors will look like:

$$
\begin{aligned}
F_1 &= w_{1,1}X_1 + w_{1,2}X_2 + \dots + w_{1,10}X_{10} \\
F_2 &= w_{2,1}X_1 + w_{2,2}X_2 + \dots + w_{2,10}X_{10}
\end{aligned}
$$

where the Fs are the factors, the Xs are the variables (in this case, the items), and the ws are weights. The first subscript for w indicates the factor number, and the second the variable, so that $w_{1,2}$ means the weight for Factor 1 and variable 2.

There are as many factors ‘extracted’ as there are variables, so that there would be 10 in this case. It may seem as if we have only complicated matters at this point, because we now have 10 factors, each of which is a weighted combination of the variables, rather than simply 10 variables. However, the factors are extracted following definite rules. The weights for the first factor are chosen so that it explains, or accounts for, the maximum amount of the variability (referred to as the variance) among the scores across all of the subjects. The second factor is derived so that it:

  (a) explains the maximum amount of the variance that remains (i.e. that is left unaccounted for after Factor 1 has been extracted); and

  (b) is uncorrelated with (the technical term is orthogonal to) the first factor.

Each remaining factor is derived using the same two rules: account for the maximum amount of remaining variance, and be orthogonal to the previous factors. In order to capture all of the variance completely, we would need all 10 factors. But we hope that the first few factors will adequately explain most (ideally, somewhere above 70 per cent or so) of the variance, so that we can safely ignore the remaining ones with little loss of information. There are a number of criteria that can be used to determine how many factors to retain; these are described in more detail in Norman and Streiner (2003, 2007). At this point, the computer will print a table called the factor loading matrix, where there will be one row for each variable, one column for each retained factor, and where the cells will contain the ws. These weights are called the factor loadings, and are the correlations between the variables and the factors. After we have done this initial factor extraction, we usually find that:

  (a) the majority of the items ‘load’ on the first factor;

  (b) a number of the items load on two or more factors;

  (c) most of the factor loadings are between 0.3 and 0.7; and

  (d) each factor after the first has some items that have positive weights and other items with negative weights.

Mathematically, there is nothing wrong with any of these, but they make the interpretation of the factors quite difficult.
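For readers who like to see the extraction rules at work, here is a minimal sketch in Python with NumPy. It assumes a principal-components style of extraction (an eigen-decomposition of the correlation matrix), which is only one of several methods that statistical packages offer, and it runs on simulated scores: the sample size, the three underlying traits, and the weights are all invented for the illustration and have nothing to do with the anxiety items.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 500 respondents, 10 items driven by 3 traits plus noise
n_subjects, n_items, n_traits = 500, 10, 3
traits = rng.normal(size=(n_subjects, n_traits))
true_weights = rng.uniform(0.3, 0.9, size=(n_traits, n_items))
scores = traits @ true_weights + rng.normal(size=(n_subjects, n_items))

# Standardize each item to a mean of 0 and a standard deviation of 1
z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
corr = np.corrcoef(z, rowvar=False)

# eigh returns eigenvalues in ascending order, so re-sort them largest first
eigenvalues, eigenvectors = np.linalg.eigh(corr)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Each factor is a weighted combination of all of the standardized items
factor_scores = z @ eigenvectors

# Loadings: the correlations between the items (rows) and the factors (columns)
loadings = eigenvectors * np.sqrt(eigenvalues)

# Proportion of the total variance accounted for, factor by factor
proportion = eigenvalues / n_items
print(np.round(proportion, 3))
print(np.round(np.cumsum(proportion), 3))
```

In this sketch, there are as many factors as items, each successive factor accounts for no more variance than the one before it, the factor scores are uncorrelated with one another, and the proportions sum to 1.0 only when all 10 factors are kept.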

To try to overcome these four problems, the factors are rotated. In the best of cases, this results in:

  (a) a more uniform distribution of the items among the factors that have been retained;

  (b) items loading on one and only one factor;

  (c) loadings that are closer to either 1.0 or 0.0; and

  (d) all of the significant loadings on a factor having the same sign.

A hypothetical example of a rotated factor matrix with three factors is seen in Table C.1.
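To give a feel for what rotation does, the sketch below rotates a two-factor loading matrix through a grid of angles and keeps the angle that maximizes the varimax criterion (the variance of the squared loadings, summed over factors). This brute-force search is our own simplification for the two-factor case, not the algorithm that statistical packages use, and the six-item loading matrix is invented for the illustration.

```python
import numpy as np


def varimax_criterion(loadings):
    """Sum, over factors, of the variance of the squared loadings."""
    return float(np.sum(np.var(loadings ** 2, axis=0)))


def rotation_matrix(angle):
    """Orthogonal 2-by-2 rotation through the given angle (in radians)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])


def rotate_two_factors(loadings, n_angles=3600):
    """Try a grid of angles in [0, 90) degrees and keep the best rotation."""
    angles = np.linspace(0.0, np.pi / 2, n_angles, endpoint=False)
    best = max(angles, key=lambda a: varimax_criterion(loadings @ rotation_matrix(a)))
    return loadings @ rotation_matrix(best)


# Hypothetical loadings: three items belong to one factor, three to the other
simple = np.array([[0.8, 0.0]] * 3 + [[0.0, 0.7]] * 3)

# Blur that structure on purpose so that every item loads on both factors
unrotated = simple @ rotation_matrix(np.radians(30))
rotated = rotate_two_factors(unrotated)

print(np.round(unrotated, 2))
print(np.round(rotated, 2))  # column order and signs are arbitrary
```

Because the rotation is orthogonal, the sum of the squared loadings for each item is the same before and after; only the way that variance is shared between the two factors changes.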

Table C.1 An example of a rotated factor loading matrix for three factors and 10 variables

[Table C.1: one row for each of the 10 items, columns for Factor 1, Factor 2, and Factor 3, and a bottom row giving the eigenvalue of each factor (2.847, 1.622, and 1.581). The individual loadings are not reproduced here.]

At the bottom of each column is a number called the eigenvalue, which is an index of the amount of variance accounted for by each factor. Its value is equal to the sum of the squares of all of the ws in the column, so for Factor 1, it is $(0.12^2 + 0.81^2 + \dots + 0.72^2)$. Because all of the variables have been standardized to have a mean of zero and a standard deviation (and hence, a variance) of 1.0, the total amount of variance in the data set is equal to the number of variables; in this case, 10. Consequently, Factor 1 accounts for 2.847/10 = 28.47 per cent of the variance, and the three factors together account for (2.847 + 1.622 + 1.581)/10 = 6.050/10 = 60.5 per cent of the variance; a bit low, but still acceptable.
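The same arithmetic can be checked in a few lines of Python. The three eigenvalues are the ones quoted in the text; the helper function and the variable names are ours.

```python
def eigenvalue_from_loadings(column):
    """Eigenvalue of a factor: the sum of the squared loadings in its column."""
    return sum(w ** 2 for w in column)


eigenvalues = [2.847, 1.622, 1.581]  # Factors 1 to 3, as quoted in the text
n_variables = 10                     # total variance of 10 standardized items

proportions = [ev / n_variables for ev in eigenvalues]
cumulative = sum(eigenvalues) / n_variables

print([round(p, 4) for p in proportions])  # [0.2847, 0.1622, 0.1581]
print(round(cumulative, 4))                # 0.605
```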

The items that load highest on Factor 1 are 2, 5, 8, and 10, which appear to tap the cognitive aspect of anxiety. Similarly, Factor 2, composed of items 1, 4, and 7, reflects the behavioural component; and Factor 3, with items 3, 6, and 9, measures the physiological part of anxiety. Note also that, although item 4 loads most heavily on Factor 2, its loading on Factor 1 is nearly as high. This factorially complex item may warrant rewording in a revised version.

This is the older, more traditional form of factor analysis, and is generally what is meant when people use the term. Because a new form of factor analysis was later introduced (described in the next section), a way had to be found to distinguish the two. Consequently, this is now referred to as exploratory factor analysis (EFA). This reflects the fact that we start with no a priori hypotheses about the correlations among the variables, and rely on the procedure to explore what relationships do exist. This example was also somewhat contrived, in that the rotated solution was easily interpreted, corresponded to existing theory about the nature of anxiety (Lang 1971), did not have too many factorially complex items, nor any items that did not load on any of the extracted factors. Reality is rarely so generous to us. More often, the results indicate that more items should be rewritten, others discarded, and there may be factors which defy explanation.

It cannot be emphasized too strongly that EFA should not be used with dichotomous items, and most likely not with ordered category (e.g. Likert scale) items, either. The reason is that the correlation between two dichotomous items is not a Pearson correlation (r), but rather a phi (φ) coefficient. Unlike r, which can assume any value between –1.0 and +1.0, φ is constrained by the marginal totals. Table C.2 shows the proportions of people responding True or False to each of two items. The value of φ is quite small, only 0.09. However, because the marginal distribution of Item A deviates so much from a 50:50 split, the maximum possible value of φ is 0.29 and, compared with this, 0.09 is no longer negligible. Thus, EFA groups items that are similar in terms of the proportions of people endorsing them, rather than their content (Ferguson 1941). Special programs exist to factor analyse dichotomous or ordinal level data, which begin with different types of coefficients, called tetrachoric correlations in the case of dichotomous items, and polychoric correlations for ordered categories.

Table C.2 Proportions of people answering True and False to two items

[Table C.2: cross-tabulation of the True and False responses to Question A and Question B. The cell proportions are not reproduced here.]
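The way the marginal totals cap φ can be seen in a short sketch. The formulas are the standard ones for a 2 × 2 table of proportions; the proportions themselves are invented for the illustration and are not the values from Table C.2.

```python
import math


def phi_coefficient(p_both, p_a, p_b):
    """Phi from the proportion answering True to both items (p_both) and the
    proportions answering True to each item on its own (p_a, p_b)."""
    return (p_both - p_a * p_b) / math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))


def phi_max(p_a, p_b):
    """Largest positive phi that the two marginal proportions allow."""
    lo, hi = sorted((p_a, p_b))
    return math.sqrt(lo * (1 - hi) / (hi * (1 - lo)))


# Hypothetical proportions: item A is endorsed by very few people
p_a, p_b, p_both = 0.05, 0.40, 0.03

phi = phi_coefficient(p_both, p_a, p_b)
phi_limit = phi_max(p_a, p_b)

print(round(phi, 2))              # roughly 0.09
print(round(phi_limit, 2))        # roughly 0.28
print(round(phi / phi_limit, 2))  # phi as a fraction of its ceiling
```

When both items are split 50:50, `phi_max` returns 1.0, and the ceiling drops as either item moves away from an even split.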

Confirmatory factor analysis

Confirmatory factor analysis (CFA) is a subset of a fairly advanced statistical technique called structural equation modeling (see Norman and Streiner 2003 for a basic introduction, and Norman and Streiner 2007 for a more complete one). Although it has been around for many years, the earlier statistical programs required a high degree of statistical sophistication. Within the past decade or so, though, a number of programs have appeared that have made the process considerably easier and available to more researchers.

We said in the previous section that EFA is a hypothesis generating technique, used when we do not know beforehand what relationships exist among the variables. Thus, while it can be used to evaluate construct validity, the support is relatively weak because no hypotheses are stated a priori. As the name implies, though, CFA is a hypothesis testing approach, used when we have some idea regarding which items belong on each factor. So, if we began with Lang’s conceptualization of the structure of anxiety, and specifically wrote items to measure each of the three components, it would be better to use CFA rather than EFA, and to identify which items should belong to which factor. At the simplest level, we can specify which items make up each factor. If our hypotheses are better developed, or we have additional information (as explained in the next paragraph), we can ‘constrain’ the loadings to be of a given magnitude; for example, that certain items will have a high loading, and others a moderate one.
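The simplest kind of specification, stating only which items belong on which factor, can be written down as a pattern of free and fixed loadings. The sketch below uses the item groupings from the anxiety example; the layout of the matrix is our own illustration, and how such a pattern is handed to a CFA program depends on the software and is not shown.

```python
import numpy as np

# Hypothesised assignment of the 10 anxiety items to Lang's three components
hypothesised = {
    "cognitive": [2, 5, 8, 10],
    "behavioural": [1, 4, 7],
    "physiological": [3, 6, 9],
}

n_items = 10
factors = list(hypothesised)

# pattern[i, j] == 1: the loading of item i + 1 on factor j is free to be estimated
# pattern[i, j] == 0: that loading is fixed at zero by the hypothesis
pattern = np.zeros((n_items, len(factors)), dtype=int)
for j, name in enumerate(factors):
    for item in hypothesised[name]:
        pattern[item - 1, j] = 1

print(factors)
print(pattern)
```

Each row contains a single 1, which is the formal statement of the hypothesis that every item loads on one and only one factor.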

The technique is extremely useful when we are trying to compare two different versions of a scale (e.g. an original and a translated version), or to see if two different groups (e.g. men and women) react similarly to the items. We would begin by running an EFA on the target version in order to determine the characteristics of the items. Testing for equivalence could then be done in a stepwise fashion.

First, we would simply specify that the items on the second version (or with the second group of people) load on the same factors as the original. If this proves to be the case, we can make the test for equivalence more stringent, by using the factor loadings from the original as trial loadings in the second. If this more tightly specified model continues to fit the data we have from the second sample, we can proceed to the final step, where we see if the variances of each item are equivalent across versions. If all three steps are passed successfully, we can be confident that both versions of the test or both groups are equivalent. Various ‘diagnostic tests’ can tell us which items were specified correctly and which do not fit the hypothesised model. However, unlike EFA, CFA will not reassign an ill-fitting item to a different factor.


Ferguson, G. A. (1941). The factorial interpretation of test difficulty. Psychometrika, 6, 323–9.

Lang, P. J. (1971). The application of psychophysiological methods. In Handbook of psychotherapy and behavior change (eds. S. Garfield and A. Bergin) pp. 75–125. Wiley, New York.

Norman, G. R. and Streiner, D. L. (2003). PDQ Statistics (3rd edn). B. C. Decker, Toronto.

Norman, G. R. and Streiner, D. L. (2007). Biostatistics: The bare essentials (3rd edn). B. C. Decker, Toronto.