Components and Processes
Abstract and Keywords
Social perception in general, and accuracy in particular, can be viewed as being composed of different components. Several different componential approaches to accuracy (Cronbach’s, 1955; Kenny’s, 1994; and Judd and Park’s, 1993) are described, reviewed, and critically evaluated. For example, if Roberto accurately predicts the weather to be warm and sunny tomorrow, is that because he is an inveterate optimist who usually predicts good things, because he usually predicts the weather in particular to be nice, or because he has based his prediction for tomorrow on the latest and most valid meteorological tools? Each of these reasons can be viewed as a “component” (one component is his tendency to predict good things, another is his tendency to predict uniquely good weather, the third is his knowledge of tomorrow’s weather in particular). This chapter concludes that although componential approaches provide important and useful information about the processes of social judgment and sources of accuracy and inaccuracy, the claim that one “must” assess components in order to assess accuracy—often made by advocates of componential approaches—is not justified. Several productive and instructive theoretical perspectives on accuracy that are not explicitly componential are reviewed. Although they do not “conflict” with componential approaches, they do demonstrate that one can productively study accuracy without performing an explicitly componential analysis. These include correlational approaches to accuracy (including an instructive subsection emphasizing the similarities between assessing social perceptual accuracy and assessing construct validity in the social sciences), Brunswik’s Lens Model, Funder’s Realistic Accuracy Model, and Dawes’ Improper Linear Models.
Nonetheless, this chapter also concludes that understanding componential approaches also contributes to a greater understanding of results even obtained from approaches that do not specifically perform componential analyses.
A stopped clock is right twice each day. That does not make it a good clock.
What does this have to do with social perception? More than it seems. Let’s return to my successful prediction (from Chapter 11) that Mike Piazza would hit a homerun when he came up to bat with the bases loaded. That makes me look like a pretty darn good baseball prognosticator, doesn’t it?
Not necessarily. Maybe I always predict that Mike will hit a homerun. Maybe I always predict everyone will hit a homerun. And it could even be worse than that: Maybe I always predict all baseball players will do great things, whether in the field, at bat, or on the base paths. (This would be logically absurd, because it would mean that I would predict both that Mike would hit a grand slam and that the pitcher would strike him out. A detailed discussion of people’s ability to hold mutually exclusive beliefs [see, e.g., Dawes, 2001], however, is beyond the scope of this book.)
Even though I might have happened to have been right that one time, I could not necessarily be considered a particularly astute judge of baseball. One could think of my prediction regarding Mike’s at bat as stemming from several sources or components: (1) my overall tendency to think well of baseball players; (2) my overall tendency to predict that batters will hit homeruns (over and above my general tendency to think well of players); (3) my overall tendency to think that Mike is a good hitter (over and above my general tendency to think well of players); and (4) my specific tendency to predict that Mike will hit a homerun (over and above my tendency to think well of players; to predict that they, in general, will hit homeruns; and to think well of Mike as a hitter in general—OK, I admit it, even I am getting dizzy at this point). Each component of my prediction can be accurate to some degree, and each contributes both to my prediction for Mike and my overall likelihood of being accurate (across lots of judgments or predictions).
This type of thinking inspired Cronbach’s (1955) (in?)famous review and at least two other more recent perspectives (Judd & Park, 1993; Kenny, 1994), all of which identified several processes contributing to social perception and which argued that accuracy needs to be separated into different components reflecting these different processes. This section describes each of these three componential approaches to the study of accuracy.
Cronbach’s Componential Approach
Cronbach’s (1955) analysis suggested that each perceiver’s judgment consisted of several components: elevation, differential elevation, stereotype accuracy, and differential accuracy. Each component is discussed next.
Elevation accuracy. Do I see other people through rose-colored glasses or am I a nasty cynical malcontent? Elevation refers to my general tendency to over- or underestimate people on the attributes being judged. It corresponds, in the baseball example, to my tendency to predict good (or bad) things for all players all the time.
Elevation accuracy addresses whether a perceiver rates targets, overall, more or less favorably than indicated by the criterion. It is the difference between (1) the average of all of a perceiver’s ratings of all targets across all judgments and (2) the average of all targets across all criteria (Kenny, 1994). Thus, there is a single elevation accuracy score for each perceiver’s judgments regarding targets.
Let’s say I was asked to predict several players’ (1) likelihood of getting a hit, (2) likelihood of getting a walk, and (3) likelihood of batting in a run with runners in scoring position.1 Elevation accuracy would be assessed, for example, by comparing my overall probability estimate, averaging over all players and all three judgments, to the actual overall probabilities, averaging over all players and all judgments. For example, Chris predicts the players on the San Francisco Giants to hit .300, get a walk once every 20 at bats (.05), and drive in a run 40% (.40) of the time with runners in scoring position. The average of these averages would be .250. If, on average, the Giants’ players actually hit .250, got a walk once every 25 at bats (.04), and drove in runners in scoring position 31% of the time, the overall average of these averages would be .200. Thus, Chris’s elevation accuracy score would be 0.05 and would indicate, in this particular case, that he generally expects these players to do better than they actually do.
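The arithmetic in Chris’s example reduces to a comparison of two grand means. A minimal sketch (the numbers are the hypothetical ones above, not real statistics):

```python
# Elevation accuracy: compare the grand mean of all judgments with the
# grand mean of all criterion values. Numbers follow Chris's hypothetical
# predictions for the Giants in the text.
predicted = [0.300, 0.05, 0.40]   # judged: hit, walk, RBI-with-RISP probabilities
actual    = [0.250, 0.04, 0.31]   # criterion: the same three probabilities

mean = lambda xs: sum(xs) / len(xs)
elevation_accuracy = mean(predicted) - mean(actual)
print(round(mean(predicted), 3))     # 0.25
print(round(mean(actual), 3))        # 0.2
print(round(elevation_accuracy, 3))  # 0.05 -> Chris generally overestimates
```

A single number summarizes the perceiver’s overall tendency, which is exactly why, as discussed next, it says nothing about which players or which judgments drive the overestimation.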
In a typical social perception case, elevation accuracy would represent the difference between a person’s average ratings across several targets on several characteristics (e.g., friendliness, intelligence, conscientiousness) and the actual average of all targets’ scores on criteria reflecting those characteristics (less difference, more elevation accuracy). When both the judgment and criterion are on purely subjective scales (e.g., a 1 to 7 scale going from “not at all” to “a great deal”), as in many social perception studies, elevation typically has little relevance to accuracy. Instead, it primarily reflects response bias: people’s tendency to give responses at higher or lower ends of the scale (e.g., Cronbach, 1955; Kenny, 1994). Furthermore, elevation accuracy cannot be readily assessed when judgment and criterion are on different scales (e.g., if I rate how good the players are on a 1 to 7 scale, elevation cannot be assessed if the criteria are percentages).
Differential elevation accuracy. Let’s say you rate Professor Smith as a better teacher than Professor Jones. Differential elevation refers to a perceiver’s tendency to rate one target higher than another, averaging over all ratings of each target. This can be considered a “target effect” in that it represents mean differences (averaging over all judgments) between targets.
Differential elevation accuracy addresses whether your differential perception of Smith and Jones is correct. It refers to the correspondence between (1) a perceiver’s ratings of each of several targets, averaging over all judgments (e.g., if there are two targets, there are two sets of averaged ratings; if there are three targets, there are three sets of averaged ratings; etc.), and (2) each target’s actual standing averaging over all criteria (e.g., if there are three criteria, each target’s “actual standing” is the average of his or her criterion scores). Differential elevation accuracy indicates how closely a perceiver’s ranking of targets on the traits or characteristics (overall) being judged corresponds to the targets’ ranking on the criteria (overall).2
Differential elevation accuracy answers the question of whether my perception of Mike Piazza as a better hitter (averaging over all hitting categories) than Jorge Posada is correct. That is, according to hitting criteria (batting average, homeruns, RBIs, etc.), is Mike actually a better hitter than Jorge? In a social perception case, differential elevation accuracy might indicate whether my belief that Alice is more competent (averaging over ratings of intelligence, responsibility, and social skill) than Susan is actually true. That is, averaging over the criteria for intelligence, responsibility, and social skill, is Alice actually more competent than Susan?
Stereotype accuracy. Does Fred believe that high self-esteem is more common than high intelligence among a group of targets? Does the perceiver rate some traits as being more common than others? The stereotype, in Cronbach’s system, is the tendency to see some traits as more common than others, averaging over all targets. It probably would have been better to call it a “trait effect,” because it represents people’s perceptions of the prevalence of each of several traits among a group of targets. Thus, if each of three targets is rated on each of five traits, there will be five trait effects, one for each trait.
Of the traits being rated, do people see those that are most and least common as actually being most and least common? Stereotype accuracy refers to the perceived versus actual relative prevalence or ranking of the traits, averaged across all targets (as such, it has nothing to do with what most people usually think of as social stereotypes regarding, e.g., race, class, sex, etc. and nothing to do with stereotype accuracy as discussed elsewhere in this book).
In the baseball example, stereotype accuracy might address the question: Does the perceiver realize that the probability of driving in a runner from scoring position is higher than the probability of getting a hit, which, in turn, is higher than the probability of getting a walk? In a social perception case, it might mean recognizing the relative prevalence of friendliness, intelligence, and conscientiousness among the targets being judged (e.g., perhaps all are friendly, some are high in intelligence, and only one is conscientious). Like elevation, however, if judgment and criterion are measured on purely subjective scales, stereotype accuracy scores would most likely primarily reflect response bias (which ends of the scale people tend to use when judging that particular attribute), rather than anything substantively related to accuracy.
Differential accuracy. After accounting for (by removing) elevation, differential elevation (target effect), and the stereotype (trait) effect, does the perceiver see Bill as more articulate but less moral than George? If so, this constitutes the perceiver’s differential in judgments about Bill and George. This can be considered a uniqueness component to social perception, because it reflects the perceiver’s specific judgments about the degree or level of a specific trait found in a particular target, rather than general tendencies to view the target’s characteristics or the particular trait as being more or less prevalent (Kenny, 1994). Thus, if there are three targets and five traits, there will be 15 uniqueness effects (one for each target × trait combination).
Differential accuracy represents people’s ability to rank order targets on each specific trait. Out of ten players, Piazza might have the second highest batting average; he might have the highest probability of knocking in a run with runners in scoring position; but he might only have the fifth highest probability of getting on base by a walk. How well people’s judgments or predictions correspond to this actual ranking would reflect differential accuracy. In a social perception case, differential accuracy would indicate how well a teacher’s belief that Louisa is smarter, wilder, and more ambitious than Kendra corresponds with their actual relative amounts of smarts, wildness, and ambition.
Cronbach’s components as ANOVA. For the statistically inclined, it may help to point out that Cronbach’s components, which appear highly complex and hard to follow in the original 1955 article, can be simplified into a two-way analysis of variance (Kenny, 1994). For the statistically uninitiated, ANOVA is a statistical technique commonly used in psychological experiments for determining how much each of two variables, independently and in combination with one another, predict or explain some outcome. In Cronbach’s system, the ANOVA factors are trait and target (for the statistically uninitiated, the constant below is simply the grand mean of all observations; the trait and target effects are deviations from the grand mean):
Judgment = Constant (elevation) + Target main effect (differential elevation) + Trait main effect (stereotype) + Target × Trait interaction (differential)
This equation only describes the components of the judgment. To assess accuracy, these components would need to be compared to the same component score on the criterion (which would be obtained by replacing “judgment” with “criterion” in the equation; everything else remains the same but would refer to target behavior or trait rather than perceiver judgment). For those of you interested in seeing Cronbach’s components broken out into all their gory detail, Kenny (1994, pp. 117–121) provides a relatively clear concrete example involving three perceivers, three targets, and three traits.
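For concreteness, the decomposition itself can be sketched in a few lines of code. The matrix of judgments below is hypothetical; as noted above, assessing accuracy would require applying the identical decomposition to a matching matrix of criterion scores and comparing the corresponding components.

```python
# A minimal sketch of Cronbach's components as a two-way decomposition of one
# perceiver's target x trait matrix of judgments (values are made up).
judgments = [
    [5.0, 3.0, 4.0],   # target 1 on traits A, B, C
    [6.0, 4.0, 5.0],   # target 2
    [4.0, 2.0, 6.0],   # target 3
]

n_targets, n_traits = len(judgments), len(judgments[0])
grand = sum(sum(row) for row in judgments) / (n_targets * n_traits)   # elevation
target_eff = [sum(row) / n_traits - grand for row in judgments]       # differential elevation
trait_eff = [sum(judgments[i][j] for i in range(n_targets)) / n_targets - grand
             for j in range(n_traits)]                                # stereotype
# Target x trait interaction: what remains after removing the other components.
interaction = [[judgments[i][j] - grand - target_eff[i] - trait_eff[j]
                for j in range(n_traits)] for i in range(n_targets)]  # differential

# The components reassemble the original judgment exactly:
recon = grand + target_eff[0] + trait_eff[0] + interaction[0][0]
print(abs(recon - judgments[0][0]) < 1e-9)   # True
```

Because the target and trait effects are deviations from the grand mean, each set sums to zero, which is what makes the four components separable in the first place.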
Kenny’s Social Relations Model
Kenny (e.g., Kenny, 1994; Kenny & Albright, 1987; Kenny & DePaulo, 1993; Kenny & LaVoie, 1984) has been a prolific researcher in the areas of accuracy and agreement, using a componential model that is related to, but different from, that proposed by Cronbach (1955; see, e.g., Kenny, 1994; Kenny & Albright, 1987, for detailed discussions of similarities and differences). Kenny’s social relations model (SRM) partitions social judgment into four factors: A constant (elevation), a perceiver effect, a target effect, and a perceiver × target interaction or uniqueness effect (plus error, which refers to random error in measurement of a judgment).
(p.198) The SRM differs from Cronbach’s components in several important ways. First, it is intended to be a broad and general model for assessing many different aspects of social perception, of which accuracy is only one. Second, research using SRM typically focuses on perceptions regarding one trait at a time, rather than the multiplicity of traits addressed by Cronbach’s components. Third, however, it also typically focuses on several perceivers, rather than the one perceiver (at a time) that was the focus of Cronbach’s analysis. Thus, SRM research might perform one analysis to find out how accurately Dave, Charles, and Bella perceive each other’s intelligence and another to find out how accurately they perceive each other’s friendliness. Whereas Cronbach partitioned the judgment into target, trait, and target × trait components, Kenny partitioned judgment into target, perceiver, and target × perceiver components.
Elevation accuracy. Is there a general tendency for people to see others as better or worse, overall, than they really are? SRM starts with an elevation score (constant) that is similar to Cronbach’s. It represents all perceivers’ average ratings of all targets on the trait. It can be thought of as a grand, overall mean rating, averaging over all perceivers and all targets.
Elevation accuracy refers to the extent to which this average rating corresponds to the average score of all targets on the criterion. Thus, there is only a single elevation accuracy score for any particular group of perceivers and targets. Furthermore, it can only be obtained if the judgment and criterion are measured in the same units (e.g., if both are on a 1 to 7 scale, it can be assessed; if one is on a 1 to 7 scale and the other a 1 to 100 scale, it cannot be assessed).
For example, consider a hypothetical group of little leaguers asked to predict each other’s batting averages. Overall (averaging across all predictions for every person in this group), they predict all other kids will bat .290. By the end of the year, however, the kids only bat .270. Therefore, .020 (.290 – .270) would be their elevation accuracy score (i.e., they generally overestimate how well the kids in the league hit).3
Perceiver accuracy. How well does a perceiver’s overall ratings of all targets correspond to what those targets (on average) are actually like when interacting with that perceiver? The perceiver effect, in SRM, refers to each perceiver’s average rating of all targets (after subtracting out the elevation score). Perceiver accuracy refers to how well each perceiver’s overall ratings (i.e., averaging over all targets) corresponds to the targets’ overall average on the criterion when interacting with that perceiver. There will, therefore, be one perceiver accuracy score for each perceiver.
In the little league example, consider Lillian, who is a good pitcher and knows it.4 Even though the rest of the league bats .270, they only bat .220 against her. She believes, however, that their average against her is .230. Does this mean that she underestimates just how good she is? Not necessarily. Remember that in this example, these little leaguers have a general tendency to predict that the other kids hit better than they really do (see the elevation example). Lillian believes the other kids are .060 worse against her than they are in general (.290 – .230 = .060), which actually underestimates how well the other kids hit against her (on the criterion, .270 – .220 = .050)—the kids only hit .050 worse against her, not .060 worse.
Whether the perceiver accuracy score is conceptually meaningful or a methodological nuisance, however, is often unclear. It could represent a genuine tendency on the part of the perceiver to consistently over- or underestimate targets. This would seem to be the case in the little league example, but even there the interpretation is ambiguous. This is because the perceiver accuracy score may also reflect response bias—the perceiver may merely tend to use the judgment scale differently than do other perceivers.
For example, Lillian may not believe that, overall, the average player hits .290 or .270. Perhaps she thinks the average player only hits .230. If this was the case, then her overall prediction of .230 would mean that she sees herself as only an average pitcher. Thus, the precise meaning of her prediction that other kids hit .230 against her is unclear. It may mean that she underestimates how well other kids hit against her, but it may also reflect differences between how Lillian and the other kids interpret and use the batting average scale.
Generalized (target) accuracy. How accurately, on average, is a particular target viewed by others? The target effect refers to each target’s mean rating averaged over all perceivers. Kenny (1994) refers to the correspondence of this mean rating with the target’s overall average on the criterion as generalized accuracy (I would prefer a label such as “generalized target accuracy” because it reflects how accurately, overall, a target is viewed by a group of perceivers, but I am following Kenny’s terminology here). Thus, there is a generalized accuracy score for each target (e.g., if there are three targets, there are three generalized accuracy scores).
Let’s say, on average, Lillian’s teammates believe she is a great hitter and predict that her batting average is .400, when, in fact, although she is good, she is not quite that good and her batting average is actually .350. Her teammates overestimate her batting skill by .050. This is almost, but not quite, her generalized accuracy score. That is because we have not yet subtracted out the elevation component. Remember, these kids overrate everyone by .020. If they overrated Lillian by only .020, they would simply be viewing the difference between her and the other kids as dead-on accurate. Thus, the .020 elevation effect needs to be subtracted out. Lillian’s generalized accuracy score would be .030, not .050, which would still mean the kids overestimate her hitting ability (compared to other kids), but not by quite as much as it first seemed.
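The little league arithmetic can be traced in code. The numbers are the hypothetical ones from the examples above, and the “drop” comparison is only a rough sketch of the perceiver-accuracy logic, not the full SRM computation:

```python
# Tracing the little league numbers from the text (all values hypothetical).
judged_overall, actual_overall = 0.290, 0.270
elevation = judged_overall - actual_overall           # everyone is overrated
print(round(elevation, 3))                            # 0.02

# Perceiver accuracy for Lillian as pitcher (sketch of the logic only):
# compare the judged drop against her with the actual drop against her.
judged_drop = judged_overall - 0.230                  # she thinks kids hit .060 worse
actual_drop = actual_overall - 0.220                  # they actually hit .050 worse
print(round(judged_drop, 3), round(actual_drop, 3))   # 0.06 0.05

# Generalized (target) accuracy for Lillian as hitter:
# the raw overestimate minus the shared elevation component.
raw_overestimate = 0.400 - 0.350
print(round(raw_overestimate - elevation, 3))         # 0.03
```

The last line makes the logic of the subtraction explicit: only the overestimation of Lillian beyond the group-wide overestimation counts against her teammates’ generalized accuracy.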
Dyadic accuracy. How accurately does the perceiver view that target’s behavior with that particular perceiver? Kenny (1994) refers to differences in how a target actually behaves (reality) with a particular perceiver (as compared to with other perceivers) and differences in how a perceiver judges (perception) a particular target (as compared to other targets) as uniqueness or relationship effects. Such effects reflect unique characteristics of the relationship of this particular perceiver with this particular target. The more closely these two relationship effects (perception and reality) correspond to one another (i.e., the more highly correlated they are), the higher the dyadic accuracy (accuracy within that particular pair, or “dyad”). Because the math begins to become laborious when computing this effect (see Kenny, 1994, for a clear and complete concrete example), I present a simplified conceptual example below.
Lillian’s team is ahead 5–4, with two out in the bottom of the sixth inning (little league games only go six innings), but the opposing team has runners on second and third. The current pitcher, Lillian’s teammate Joe, is obviously tired, and the opposing team’s best hitter, George, is coming up to bat. The coach brings in Lillian to pitch to George. Lillian could walk George, which would bring up the other team’s second best hitter. However, Lillian remembers that, even though George has already hit two homeruns this game, almost every time in the past when Lillian has pitched to George, she has gotten him to either strike out or pop out by pitching high, slow balls to him. Although George’s overall batting average is .500, he has gotten only 1 hit in 10 at bats (.100) against Lillian.5
Lillian never heard of Kenny’s SRM and does not go through anything remotely resembling the hairy componential computations required to estimate dyadic accuracy. Nonetheless, her understanding that, although George is generally a very good hitter, he is not very good against her, is, in essence, dyadic accuracy. So she decides not to walk him and pitches a slow, high ball, which George flails at and pops up behind home plate. The catcher makes the catch, the game is over, and Lillian’s team wins a tough one—all because of dyadic accuracy.
Kenny’s components as ANOVA. For the statistically inclined, it may help to point out that, like Cronbach’s components, Kenny’s components, which also often appear highly complex, can be simplified into a two-way analysis of variance (Kenny, 1994). Whereas Cronbach’s ANOVA factors are trait and target, Kenny’s are perceiver and target:
Judgment = Constant (elevation) + Perceiver main effect (perceiver effect) + Target main effect (target effect) + Perceiver × Target interaction (relationship effect)
To assess accuracy, each component would be compared to the parallel effect on the criterion (see Kenny, 1994, pp. 129–134, for a detailed presentation).
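As a sketch with made-up numbers (not from any real study), the same two-way decomposition can be applied to a judgment matrix and a criterion matrix, and the two relationship components correlated to index dyadic accuracy:

```python
# Kenny-style SRM decomposition (perceiver x target) applied to a judgment
# matrix and a criterion matrix; dyadic accuracy is then the correspondence
# between the two relationship (interaction) components. Illustrative values;
# a real round-robin design would also exclude self-ratings.
def decompose(m):
    """Return (constant, perceiver effects, target effects, relationship effects)."""
    n_p, n_t = len(m), len(m[0])
    grand = sum(sum(r) for r in m) / (n_p * n_t)
    p_eff = [sum(r) / n_t - grand for r in m]
    t_eff = [sum(m[i][j] for i in range(n_p)) / n_p - grand for j in range(n_t)]
    rel = [[m[i][j] - grand - p_eff[i] - t_eff[j] for j in range(n_t)]
           for i in range(n_p)]
    return grand, p_eff, t_eff, rel

judgment  = [[4.0, 6.0], [5.0, 3.0], [6.0, 5.0]]   # 3 perceivers x 2 targets
criterion = [[4.5, 5.5], [5.0, 3.5], [5.5, 5.0]]

_, _, _, rel_j = decompose(judgment)
_, _, _, rel_c = decompose(criterion)

# Dyadic accuracy: correlate the flattened relationship effects.
xs = [v for row in rel_j for v in row]
ys = [v for row in rel_c for v in row]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
denom = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
print(round(cov / denom, 2))   # 0.99: judged and actual relationship effects line up
```

In this toy case the unique pairings a perceiver sees track the unique pairings that actually exist, so dyadic accuracy is high; uncorrelated relationship effects would drive the index toward zero.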
Judd and Park’s Full Accuracy Design for Research on Stereotypes
Judd and Park (1993) developed the first componential model focusing on explaining sources of accuracy and inaccuracy in social stereotypes. They identified four main components of judgments regarding groups: elevation, perceiver group, target group, and attributes. Because Judd and Park’s (1993) componential model is not identical to those of Cronbach (1955) and Kenny (1994), I discuss it here. Because the three models are so similar, however, and because Judd and Park’s is even more mathematically complex than Cronbach’s or Kenny’s approaches, I present only a brief simplification of its main ideas.
Do people, in general, over- or underestimate others’ attributes? Elevation accuracy, as in the other componential models, is the overall difference between judgment and criterion, averaging over all perceivers, targets, and attributes. Because it involves averaging over all attributes, this component does not have much substantive meaning. In a study where people evaluate several groups’ intelligence, competitiveness, and social skill, the elevation component merely indicates that adding together all people’s ratings of all target groups and all three attributes produces a higher or lower number than obtained when adding together all target groups’ scores on all three attributes on the criterion.
Does one group tend to over- or underestimate others more than everyone else? The perceiver group effect is an overall tendency for one group of perceivers to over- or underestimate all the attributes (added together) in other groups (i.e., beyond the elevation effect).
Are all the attributes (added together) of one target group consistently over- or underestimated? The target group effect is an overall tendency for one group of targets to have all their attributes (added together) over- or underestimated (beyond the elevation effect).
Do people see other groups (in general) in a stereotypical manner? The attribute effect represents an overall tendency to over- or underestimate the prevalence of a particular type or class of attributes. When attributes are chosen for each of two groups so that attributes that are stereotypic for one group are counterstereotypic for the other, the attribute effect becomes a “stereotypicality” effect—the tendency to view groups as more or less stereotypic than they really are. A general tendency to overestimate stereotypical attributes and underestimate counterstereotypical attributes represents a general tendency (across target groups) for the stereotype to exaggerate real differences. A general tendency to underestimate stereotypical attributes and overestimate counterstereotypical attributes represents a general tendency (across target groups) for the stereotype to underestimate real differences.
Like the other componential approaches, Judd and Park’s (1993) full accuracy design was modeled after an analysis of variance—but with three ANOVA factors (perceiver group, target group, and attributes, plus all two-way and three-way interactions) rather than the two of Cronbach’s analysis and Kenny’s SRM. Although a full discussion of those factors is beyond the scope of this chapter, the three-way combination is particularly important to the study of stereotype inaccuracy because it tests for in-group bias. The Subject group × Target group × Attribute interaction tests whether stereotype exaggeration or underestimation (the attribute effect) is more likely to occur when (1) people from Group A judge people from Group B and people from Group B judge people from Group A than when (2) people from Group A judge people from Group A and people from Group B judge people from Group B. If, for example, stereotype exaggeration only occurs when people judge groups other than their own, one would have an in-group bias effect.
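The logic of that three-way test can be sketched with a toy example. The numbers below are entirely hypothetical; each cell is the judged-minus-actual stereotypicality score for one perceiver group judging one target group:

```python
# Toy sketch of the in-group bias test described above: is stereotype
# exaggeration larger for out-group than for in-group judgments?
# exaggeration[(perceiver_group, target_group)] = judged minus actual
# stereotypicality; positive values mean the judges exaggerate the target
# group's stereotypic attributes. All numbers are hypothetical.
exaggeration = {
    ("A", "A"): 0.1,  ("A", "B"): 0.9,   # Group A judging itself vs. Group B
    ("B", "B"): 0.2,  ("B", "A"): 0.8,   # Group B judging itself vs. Group A
}

out_group = (exaggeration[("A", "B")] + exaggeration[("B", "A")]) / 2
in_group  = (exaggeration[("A", "A")] + exaggeration[("B", "B")]) / 2
print(round(out_group - in_group, 2))   # 0.7: exaggeration occurs mostly for
                                        # out-groups, an in-group bias pattern
```

If the out-group and in-group averages had been similar, the same exaggeration would apply to everyone, including one’s own group, and the in-group bias interpretation would fail.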
“Must” Components Be Assessed in All Accuracy Research?
Ever since Cronbach’s (1955) review, researchers have been prone to emphasize the importance of assessing components, sometimes going as far as to claim or imply that components must be assessed in order to address accuracy questions. Even the most avid component proponents, however, agree that obtaining the type of data necessary to perform their recommended componential analyses is often extremely difficult. Although many types of research can be justifiably characterized as “difficult,” the componential approaches raised the difficulty bar to a new level, which helps explain why Cronbach’s review discouraged many researchers from addressing accuracy questions at all.
So, “must” all accuracy research assess components? Well, the answer depends on what this question means. If it means, “Must all accuracy researchers understand existing componential approaches in order to have better insights into the meaning of the results obtained from studies that do not explicitly assess components?” then my answer is “yes.” It certainly behooves all of us interested in accuracy research to have more, rather than fewer, insights into the potential sources of social perception and the processes leading to accurate or inaccurate judgments and, especially, of the limitations and potential artifacts that influence whatever index of accuracy we do use.
If, however, the question means, “Must all accuracy researchers perform componential analyses because otherwise their research will be completely meaningless or uninterpretable?” then my answer is an emphatic “no!” Here’s why.
Process versus accuracy, one more time. First, componential approaches provide one class of explanations for how a person arrived at an accurate or inaccurate judgment. Indeed, Cronbach (1955) titled his article “Processes Affecting Scores on ‘Understanding of Others’ and ‘Assumed Similarity’” (emphasis mine). Why? Because components provide information about the processes of judgment. And they do a good job of it.
How do I arrive at my prediction that Mike will hit a grand slam? Do I always predict that he will hit homeruns, or am I a particularly astute judge of Mike’s hitting? How do I arrive at my judgment that Bertha is extroverted? Does everyone say she is extroverted? Or perhaps I always say everyone is extroverted. And why do I think African Americans are less likely to complete high school than they really are? Do I underestimate every group’s likelihood of completing high school? Does everyone, including African Americans, underestimate African Americans’ likelihood of completing high school? Or am I ethnocentric, underestimating only African Americans’ success, and not my own group’s success, at completing high school?
Answers to these types of questions address the processes by which people arrive at accurate or inaccurate social judgments. So, components give valuable insights into process.
But process is irrelevant with respect to establishing the degree of (in)accuracy of some perception. If I say, “Mike is going to hit a homerun” and he does, this particular prediction is right. End of discussion regarding my degree of accuracy.
With respect to understanding how I arrived at that prediction, it would be valuable to estimate my elevation, stereotype accuracy, differential elevation, and differential accuracy (if you like Cronbach), or, if you prefer SRM, my elevation, my perceiver effect, Mike’s target effect, and our interaction effect. But if you want to determine whether my prediction is accurate, the only thing we need to do is figure out whether he hit the ball over the outfield fence, in fair territory.
If Vlad believes that Armenians are public parasites burdening the financial community with their constant need for charity, and Armenians actually make fewer demands on public charity than other groups (LaPiere, 1936), then Vlad overestimates the financial burden created by Armenians. Again, period, the end—at least “period, the end” with respect to establishing the inaccuracy of Vlad’s belief.
Interpreting that inaccuracy is another matter. As Ryan, Park, and Judd (1996) have pointed out, in the absence of their full accuracy design (a research design that permits assessments of all the components in their model), we cannot conclude, as did LaPiere (1936), that this means that Vlad is necessarily a raging anti-Armenian bigot. Perhaps Vlad overestimates every group’s, including his own group’s, need for charity. In that case, Vlad is not ethnocentric at all. People with nasty beliefs about all groups, including their own, may be, well, nasty people. But if their beliefs are equally nasty about all groups, including their own, they are not ethnocentric. So, Judd and Park’s (1993) full accuracy design would be extremely useful for providing some insight into why Vlad overestimates Armenians’ request for charity. But it is completely irrelevant with respect to establishing whether Vlad overestimates their requests for charity. That question can only be answered by comparing Vlad’s estimate of their need for charity to some criteria.
So, establishing (in)accuracy is a very different animal than explaining (in)accuracy. Establishing (in)accuracy merely involves comparing the perception (judgment, prediction, expectation, etc.) to the criteria. The more closely the perception corresponds with the criteria, the more accurate the perception.
No Reification of Components!
I think it is extremely tempting to reify componential approaches to accuracy. First, they are presented with a sort of heavy-handed statistical rigor that gives them a veneer of being more scientific than the rest of us statistically backward folks could ever aspire to. Second, they really do capture important, fundamental aspects of social perceptual judgment processes. Third, they successfully identify sources of bias or noise in judgments that few of us usually mean by accuracy. Thus, it is very tempting to view components as concrete fixtures on the social perceptual landscape. If they are there, then we should have to assess them, shouldn’t we?
Such absolutist positions regarding components (“Cronbach’s [or SRM] components must always be assessed” or “Accuracy can only be viewed componentially,” etc.) are, in my view, unjustified for several reasons. First, there is no one right way to divide up components of social perception. This should be clear from my brief review of Cronbach’s, Kenny’s, and Judd and Park’s componential approaches. They have important similarities, but, obviously, there are also differences among all three. Such differences are made salient when the three approaches appear side-by-side, as they do in Table 12–1. If there were any single “right” set of components that “must” be examined, and if components were actually hard-and-fast fixtures in the social perception landscape, there could not possibly be three different breakdowns of components, unless one breakdown is “right” and the other two are “wrong” or unless each were woefully incomplete.
Table 12–1 Componential Approaches to Social Judgment
Cronbach’s (1955) components:
Judgment of a person’s trait =
Constant (elevation) + Target main effect (differential elevation) +
Trait main effect (stereotype accuracy) + Target × Trait interaction (differential accuracy)
Social Relations Model Components (e.g., Kenny, 1994):
Judgment of a person’s trait =
Constant (elevation) + Perceiver main effect (perceiver effect) + Target main effect (target effect)
+ Perceiver × Target interaction (relationship effect)
Judd & Park’s (1993) Components for Research on Stereotype Accuracy:
Judgment of a group’s traits =
Constant + Perceiver group effect (pge) + Target group effect (tge) +
Attribute (stereotypic vs. counterstereotypic) effect (ae) + (pge × tge) + (pge × ae) + (tge × ae) + (pge × tge × ae)
Hypothetical componential approach combining Cronbach’s, Kenny’s, and Judd & Park’s approaches:
Judgment of a person’s trait =
Constant + Perceiver group effect + Target group effect + Attribute (stereotypic vs. counterstereotypic) main effect + Target main effect + Perceiver main effect + Individual trait effect + 15 Two-way interactions +
All 20 three-way interactions + All 11 four-way interactions +
All 4 five-way interactions + The six-way interaction^a
^a This model assumes all main effect terms are independent, which may not be the case. For example, the target group main effect may not be independent of the target main effect, and the attribute main effect may not be independent of the individual trait effect. In such a situation, there might be fewer interactions than displayed in this model. There would, however, still be literally dozens of such interactions. No researcher has ever advocated this model, including me.
If all were partially right but incomplete in that they failed to address components identified by other researchers, then a full componential model would need to assess all the components identified by all models. Such a model is presented at the bottom of Table 12–1. If components are “real” and “must” be assessed, then the only complete way to do it would be to assess the more than 50 components identified in this model. Such a model has never been recommended even by advocates of componential approaches and is not being recommended here. Indeed, it is so extreme as to border on absurd. But such an absurd model might be required if all components “must” be assessed.
The situation, however, is far more complex than even this hypothetical combined componential model suggests. There is a potentially infinite number of ways in which social perception could be broken down into components (see also Kruglanski, 1989). Attributes could be further broken down into a variety of types or subclasses (e.g., positive vs. negative; explanations vs. descriptions vs. predictions; behaviors vs. traits; and so on). Similarly, both perceiver and target groups could be broken down, not only by in-group and out-group, but by any of the infinite ways of identifying groups (culture, demographic characteristics, memberships in organizations, professional expertise, etc.).
This is not meant to suggest, however, that existing componential approaches are purely subjective and arbitrary and, therefore, can be ignored. But the choice of components will depend entirely on the types of processes one would be interested in studying and the types of response biases one would like to assess or eliminate. Different componential breakdowns serve different purposes and provide insights into different aspects and processes of social perception. Thus, understanding existing componential approaches would seem crucial to anyone studying accuracy to gain insights into how best to interpret their own or anyone else’s data addressing the degree of (in)accuracy in social perception.
Componential models may be most important when the criteria are self-reports and self-perceptions. Although Cronbach’s componential approach never generated much empirical research, Kenny’s and Judd and Park’s have, and much of that research has used target individuals’ or target groups’ self-perceptions as criteria (e.g., Kenny, 1994; Ryan, 1996). I am reluctant to use absolutes (e.g., “all” research on accuracy must be based on components), but I come awfully close when the criteria are self-perceptions, especially self-perceptions regarding traits, attitudes, or dispositions, rather than behaviors or other objective characteristics.
Self-perceptions of traits, for example, typically have no objective referent. How extroverted is someone who rates him- or herself “5” on a 1 to 7 scale with endpoints labeled “not at all” and “very”? It is hard to say because each choice is subjective, in that each rater imputes his or her own meaning to each scale point (e.g., Biernat, 1995). Such differences in subjective meanings cloud the assessment of accuracy. Componential approaches, however, are particularly good at identifying differences in subjective meaning and removing them from estimates of accuracy (this is often captured by the various elevation components). Thus, it is probably a good idea to use a componential approach, if possible, almost any time one uses self-perceptions as criteria.
Noncomponential Approaches to Assessing Accuracy
It has just been argued that componential analyses are not necessary for the assessment of accuracy. This argument, however, does not rest solely on a critical evaluation of the claim that “all accuracy research must perform a complex componential analysis.” Instead, much of the best evidence for the idea that componential analyses are not strictly necessary comes from the many noncomponential approaches to the study of accuracy that have made important and enduring contributions to understanding social perception. The next section, therefore, briefly reviews three of the most influential noncomponential approaches.
The term “noncomponential” here is potentially misleading, because it unfortunately implies that one can completely ignore the components issue. Even “noncomponential” approaches can themselves be considered to assess subsets of components in the various componential models, as shall be made explicit in the discussion that follows. Furthermore, componential and noncomponential models are not necessarily mutually exclusive or antagonistic; indeed, one can even take a componential approach to applying ideas from the first two of the noncomponential models described below (Kenny, West, Malloy, & Albright, 2006). Nonetheless, I use the term “noncomponential” to refer to approaches that assess accuracy without an explicit and intentional assessment of components. When, from a componential standpoint, such approaches only assess a subset of components in one or more of the componential models, this is pointed out explicitly below. What may be lost by not performing a full componential analysis is also explicitly discussed.
Most noncomponential approaches to assessing accuracy, or processes underlying accurate and inaccurate social perceptions, use Pearson’s correlations to assess the extent to which judgments correspond to criteria. In general, when judgments concern a single attribute, correlations between judgments and criteria capture Cronbach’s (1955, p. 191) differential accuracy correlation, which he described as: “. . . sensitivity to individual differences. . . . These are the only processes included in present measures of social perception which depend on J’s [perceivers’] sensitivity to the particular O [target].”6
The simplest and most typical correlation in accuracy research is that between a set of perceivers’ judgments or predictions regarding a single trait or attribute of a set of targets and the corresponding criterion values for those targets. For example, teachers may predict students’ achievement, interviewers may evaluate a set of interviewees, or perceivers may estimate the percentage of people belonging to various demographic groups that complete college. Such correlations automatically remove the elevation and stereotype accuracy components from the correspondence between perceivers’ judgments and the criterion. (A brief aside for the statistically inclined: This is because correlations reduce all data to deviations from the mean.) Thus, a simple correlation (between judgment and criterion) goes a long way toward eliminating many of the biases, artifacts, and problems in assessing accuracy first identified by Cronbach.
Of course, the correlation coefficient is not perfect. First, it removes or avoids, but does not directly assess, elevation and stereotype accuracy. Because correlations remove average differences between judgments and criterion, they cannot assess any consistent tendency to over- or underestimate targets (elevation, à la Cronbach). If it were important to assess those components in order to address some research question, one could not use the correlation to do so.
Second, correlations equate the variability of judgments and criterion. Therefore, they cannot assess whether perceivers consistently over- or underestimate target variability.
Because mean and variability differences between judgments and criteria probably often reflect response bias and/or scaling discrepancies between perceiver and criterion, these limitations do not greatly undermine the utility of correlations in assessing accuracy. I use the term “scaling discrepancies” here to refer to the idea that people may use scale points differently than they are used in the criterion. This would obviously be true if, for example, judgment and criterion are assessed in different metrics (e.g., subjective rating scale and percentages, respectively).
People, however, still might use the numbers in some scale differently than they are used for the criterion, even if the two are supposedly on the same scale. For example, let’s say Alfred estimates three people’s IQ scores as 40, 50, and 60, when they are really 115, 120, and 125. Although it is possible that Alfred believes all three of these fairly intelligent people are classifiably retarded, it is more likely that Alfred does not fully understand how IQ scores are scaled. He dramatically underestimates people’s IQ in absolute terms, but his estimates are also overly sensitive to actual variations in IQ (Alfred’s judgments go up 10 IQ points for every 5-point increase in actual IQ). But given his subjective IQ scale, the correlation between his judgments and actual IQ would be perfect (1.0), because (1) mean differences between judgments and criteria are irrelevant to computation of the correlation, (2) the correlation coefficient is computed after statistically equating the variability in judgment and criterion, and (3) his judgments move in (differently scaled) lockstep with targets’ actual IQ.
Thus, the correlation coefficient would yield a conclusion that Alfred is an excellent judge of people’s intelligence. Is the conclusion justified? As long as you keep in mind that what this really means is that “Alfred is very good at detecting differences in people’s intelligence, but does not tell us anything about either how Alfred uses the IQ scale or about whether he consistently over- or underestimates people’s intelligence,” it is perfectly justified.
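Alfred’s numbers make the arithmetic easy to verify. The following sketch (plain Python, using only the standard library; the function and variable names are mine, purely for illustration) computes the Pearson correlation from its definition:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of standard
    deviations. Mean differences and variability differences between the
    two variables drop out of the computation entirely."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Alfred's IQ judgments versus the targets' actual IQs
judgments = [40, 50, 60]
actual = [115, 120, 125]

r = pearson_r(judgments, actual)
mean_error = sum(j - a for j, a in zip(judgments, actual)) / len(judgments)

print(round(r, 4))   # 1.0: perfect differential accuracy
print(mean_error)    # -70.0: a huge elevation error the correlation never sees
```

The perfect correlation and the 70-point average underestimate coexist without contradiction, which is exactly the point: the correlation speaks only to sensitivity to differences among targets, not to elevation.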
Construct Validity and Accuracy
In Chapter 10, I argued that assessing accuracy was much like assessing the validity of many social science constructs. This is important here, because the correlation coefficient is so frequently used to establish the validity of some measure that it is often referred to as the “validity” or “validity coefficient” of some measure (e.g., Campbell & Stanley, 1963; Cook & Campbell, 1979; Dawes, 1979). In much the same manner, correlations can be used to establish the accuracy of social perception. Because establishing accuracy is in many ways so similar to establishing construct validity, I next briefly review some of the main ideas underlying social science approaches to construct validity.
Basic construct validity. An extended discussion of the richness and complexity of establishing validity of some measure, construct, theory, etc., is beyond the scope of this chapter (see, e.g., Campbell & Stanley, 1963; Cook & Campbell, 1979; Cronbach & Meehl, 1955, for such discussions). However, the basic ideas can be summarized by the duck test described in Chapter 11 (if it walks like a duck. . .).
How do we know that people even “have” self-esteem, intelligence, attitudes, personality, etc., if we cannot directly observe them? That is where the issues of constructs and construct validity come in. A construct is, in essence, a mini-theory regarding the existence of some phenomenon. It includes, at minimum, a definition of the phenomenon and some clear hypotheses regarding how that phenomenon should manifest. Construct validity refers to ways of demonstrating that some construct really does work as hypothesized and of ruling out alternative explanations.
This has all been very abstract so far. What does it mean to “show that a construct works as hypothesized”? Well, it could mean lots of things in different situations. Consider a construct regarding the existence of a psychological attribute, trait, or characteristic, such as intelligence, self-esteem, depression, ideology, etc. Establishing the validity of some method of measuring that psychological construct (e.g., self-report questionnaire, reaction time task, etc.) often might mean something like assessing the relationship of that measurement method with (1) other people’s agreement about a target’s possession of the attribute being measured, (2) target behavior that should reflect that attribute, and (3) target responses on some sort of standardized test. If all of these observed measures converge reasonably well, then we have probably done a pretty good job of establishing the validity (likely existence, probabilistic reality, etc.) of the underlying attribute and our means of measuring it.
Consider intelligence. Maria’s brilliance might lead at least some people to believe in her brilliance, it might lead her to engage in some highly intellectual activities, and it might lead her to receive high scores on standardized intelligence and achievement tests. Indeed, highly sophisticated statistical methods have existed for some time now for estimating the extent to which underlying, unmeasured attributes predict observed variables that are supposed to reflect those underlying attributes (e.g., Bollen, 1989; Jöreskog & Sörbom, 1983).
If, for example, everybody says Maria is really smart, and Maria spends her time reading Einstein’s original works, and she scores at the high end of a test that is supposed to measure intelligence, then we have probably fairly well established both the utility of the intelligence construct and the fact that she is pretty smart. (Obviously, we would need to do this for more than one person, and it would need to work just as well on the low and middle portions of the intelligence spectrum as on the high end, but I hope you get the idea.)
Consider extroversion. Let’s say everyone describes Ian as a wild and crazy guy, we find that he spends much of his spare time attending or holding parties, and he scores highly on a personality test assessing extroversion. It sure is beginning to look like extroversion may be a (probabilistically) real attribute, and Ian scores highly on it (as with intelligence, we would need to do this for more than one person and show that this works just as well for people low or average in extroversion).
Thus, intelligence and extroversion are both constructs, but, in the examples above, constructs that have been reasonably well-validated. As such, we now have license to treat them as real—with one big caveat. It is always possible that someday someone will come along with evidence that questions, challenges, or even successfully undermines the validity of either the construct or some previously well-established measure of the construct. Until that time, however, it is reasonable to act as if the construct is about as real as apple pie, and most researchers will treat it that way (if it looks like a duck. . .).7
Accuracy. Construct validity is very nice, but where is the accuracy? Just because scientists can, within the context of probabilistic realism, identify that Maria is smart or that Ian is extroverted through a variety of rigorous scientific methods does not necessarily mean that regular everyday walking around people are necessarily very accurate.
This is true. Establishing accuracy involves establishing correspondence between perception and reality, not just establishing that some attribute (intelligence, extroversion, tennis ability, etc.) can be successfully measured. Even statistical beginners, however, should know how to establish “correspondence” between two variables—just correlate them! If my beliefs that Maria is smarter than Ian but Ian is more outgoing than Maria are reasonably accurate, then those beliefs should correlate well (not necessarily perfectly!) with their behavior reflecting intelligence and extroversion, with others’ beliefs about their intelligence and extroversion, and with their scores on tests assessing intelligence and extroversion.
Thus, establishing accuracy in social perception for correlational approaches is highly similar to establishing construct validity. But this means that establishing accuracy is not much more difficult than establishing construct validity! Establishing construct validity is not easy—it is sufficiently complex that whole books have been written about it. Nonetheless, it suffers from none of the controversy, hand wringing, or intellectually imperious dismissiveness regarding supposed conceptual, political, or criterion “problems” one often finds in the literature criticizing accuracy (see Chapters 10 and 11 for reviews of such controversies).
Establishing accuracy, however, is a bit trickier than establishing construct validity for several reasons. First, self-fulfilling prophecy and bias have to be ruled out as explanations for the correspondence between belief and criteria; this is not easy, but neither is it inordinately difficult (and Chapter 10 discussed some of the many ways of doing so).
Second, establishing accuracy is a bit trickier than establishing construct validity because social perception, judgment, and expectations are themselves constructs! You cannot feel someone’s judgment or taste their expectations. Thus, all the issues involved in establishing construct validity kick in, not just when measuring targets’ attributes, but when measuring perceivers’ expectations and beliefs about others.
Accuracy, therefore, in this imperfect world where little can be known with certainty and observed measures are only probabilistically related to underlying attributes, is not usually best reflected by correlations between observable measures (e.g., a measure of perceiver expectations and a measure of target extrovertedness). Accuracy will often best be reflected by correlations between the underlying constructs representing the social perception and the criteria (for the statistically inclined, this can often be accomplished either by disattenuating correlations for unreliability or by using LISREL-type models—see, e.g., Bollen, 1989; Carmines & Zeller, 1979).
Thus, assessing relationships between underlying constructs for expectations and criteria will usually yield the best estimate of accuracy. This should not, however, be misinterpreted to mean that all accuracy research must necessarily assess relationships between underlying constructs rather than observed measures. Although doing so will usually provide the best assessment of accuracy, sometimes it may just not be possible. In such cases, more information regarding accuracy and inaccuracy would be provided by assessing correlations between observed measures of underlying attributes, judgments, or expectations, rather than by assessing nothing at all. Such correlations will tend to underestimate accuracy to the extent that the observed measures only imperfectly reflect the underlying attributes or expectations. This, of course, does not constitute any sort of immovable obstacle to or fatal flaw in accuracy research. It simply means that people may be more accurate than indicated by research that only assesses correlations between observed measures of expectations and observed measures of criteria.
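For readers who want the mechanics, disattenuating a correlation for unreliability is a one-line computation (Spearman’s classic correction for attenuation; see, e.g., Carmines & Zeller, 1979). A minimal sketch, with entirely hypothetical reliability values:

```python
import math

def disattenuate(r_observed, rel_x, rel_y):
    """Correction for attenuation: the estimated correlation between the
    underlying constructs equals the observed correlation divided by the
    square root of the product of the two measures' reliabilities."""
    return r_observed / math.sqrt(rel_x * rel_y)

# Hypothetical numbers: an observed judgment-criterion correlation of .40,
# with reliabilities of .70 (expectation measure) and .80 (criterion measure)
r_true = disattenuate(0.40, 0.70, 0.80)
print(round(r_true, 2))  # 0.53: accuracy looks higher once measurement error is removed
```

This illustrates the claim in the text: correlations between observed measures will tend to underestimate accuracy to the extent that those measures only imperfectly reflect the underlying expectations and attributes.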
This brief delving into the statistical and methodological arcana of construct validity and unmeasured variables was necessary to lay the foundation for understanding three of the main noncomponential approaches for assessing both degree of (in)accuracy and processes underlying (in)accuracy in social perception. All three are fundamentally based on the correlation between perception and criteria.
Brunswik’s Lens Model
Brunswik (1952) suggested that accurate perception of reality (both object and social) involves the use of cues probabilistically related to objective reality. He metaphorically called his approach the Lens Model to capture the idea that objective reality is never observed directly. Instead, cues related to objective reality must be observed and interpreted as relevant to some judgment—that is, objective cues are seen through the “lens” of subjective perception and judgment.
This does not necessarily mean that perception is a purely subjective phenomenon unrelated to objective reality. Indeed, one of the main purposes of the Lens Model is to provide a mechanism not only for assessing people’s degree of accuracy but also for understanding sources of both accuracy and inaccuracy in their judgments.
Figure 12–1 presents a simplified but general version of the Lens Model. It captures several main ideas. First, the circled “Psychological Attribute” represents some sort of psychological construct that cannot be directly observed (self-esteem, extroversion, intelligence, etc.). The Cues, shown in the middle of the figure, are directly observable or measurable phenomena. The arrows pointing from the Psychological Attribute to the Cues are labeled “Validity,” because they represent the extent to which the underlying attribute manifests itself in the observable Cues.
The rightmost circle represents perceivers’ judgments (or perceptions). The arrows going from the Cues to Judgments represent the extent to which the observable cues influence perceivers’ judgments (labeled “Cue Utilization”).
Thus, the Lens Model characterizes social perception as a two-step process: (1) observable manifestation of psychological attributes and (2) perceiver use of observable cues to arrive at judgments. Accuracy, therefore, is captured by the correlation between the psychological attribute and the judgments (the long, double-headed arrow in Figure 12–1). Correlations assess how well the judgments correspond to the attributes—i.e., accuracy.
The Lens Model is a noncomponential, correlational model for assessing both degree of accuracy and processes of social perception. Identifying cue validity and cue utilization focuses on a very different set of processes than those typically examined in componential models. As such, it provides different (not better or worse) types of insights into processes of social perception than do componential models.
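The Lens Model’s three kinds of correlations (cue validity, cue utilization, and overall accuracy) can be illustrated with a small simulation. This is purely a toy sketch of my own, not Brunswik’s formalism: an unobservable attribute leaks into three noisy cues, and the perceiver’s judgment is simply the average of those cues.

```python
import math
import random

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

random.seed(12)
n = 2000

# The unobservable psychological attribute, and three cues that each
# reflect it only probabilistically (attribute plus independent noise)
attribute = [random.gauss(0, 1) for _ in range(n)]
cues = [[a + random.gauss(0, 0.5) for _ in range(3)] for a in attribute]

# The perceiver never sees the attribute; the judgment is built from cues
judgment = [sum(c) / 3 for c in cues]

validity = pearson_r(attribute, [c[0] for c in cues])    # attribute -> cue
utilization = pearson_r([c[0] for c in cues], judgment)  # cue -> judgment
accuracy = pearson_r(attribute, judgment)                # the long double arrow

print(round(validity, 2), round(utilization, 2), round(accuracy, 2))
```

With valid cues that are actually used, accuracy comes out high; adding noise to the cues (lowering validity) or having the perceiver ignore or misweight them (lowering utilization) drags the attribute-judgment correlation down.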
The Realistic Accuracy Model
Funder’s (1995) Realistic Accuracy Model (RAM) draws on essentially the same set of fundamental assumptions I described under “probabilistic realism” in Chapter 11 (indeed, Funder’s perspective inspired much of that section) to create a model that could be viewed as an extension and elaboration of Brunswik’s Lens Model. Some of the main ideas of RAM are depicted in Figure 12–2.
As with the Lens Model, overall accuracy is typically assessed by the correlation of the underlying attribute with the perceiver’s judgment (represented by the large, curved, double arrow on the bottom of Figure 12–2). Four steps needed for perceivers to arrive at an accurate judgment are displayed in between the underlying attribute and the judgment.
First, the underlying attribute needs to create some sort of observable evidence relevant to that attribute (the cues, in the Lens Model). Dishonesty, for example, is not likely to be displayed in a large lecture hall (except perhaps during test time). Interest in the class is more likely than honesty to be displayed, for example, through high attendance levels, keeping up with class assignments, and/or class participation.
Second, the evidence needs to be at least hypothetically available to the perceiver. Whether or not a student has completed all assigned readings may rarely be directly observable to the class’s teacher. Attendance and participation, however, would be considerably more available.
Third, the perceiver has to detect that evidence. In a large lecture hall, which students participate is pretty obvious. However, detecting precisely which students do and do not attend regularly, out of a swarming mass of hundreds of students in the lecture hall, is considerably more difficult (unless attendance is somehow recorded).
Fourth, the perceiver has to actually use the detected evidence/cues for arriving at a judgment. If lecture hall teachers can’t remember which students regularly participate, it would be pretty difficult to use participation as a basis for judgment regarding interest in the class.
Although this is the heart of Funder’s (1995) RAM, applying the model might be considerably more complex. People have lots of underlying characteristics. Funder (1995) focused primarily on personality traits, but RAM seems applicable to all sorts of unobservable personal characteristics, including, for example, emotions, attitudes, motivation, etc. Furthermore, one attribute may create many cues (as is most obviously captured in the Lens Model), and some cues may reflect multiple attributes. Thus, one type of complexity involves the sheer number of possible interrelationships among attributes, cues, and judgments.
Like several of the componential models, RAM also considers how the perceiver, target, attribute, and evidence relate to accuracy (Funder, 1995, referred to these as Judge, Target, Trait, and Information, respectively, but I am sticking with my terminology here). But this is not to identify components. Instead, the purpose is to analytically identify how specific combinations of perceiver, target, attribute, and evidence might combine to influence accuracy.
For example, some perceivers may be particularly good (poor) at evaluating certain types of traits (e.g., clinical psychologists might be better than most of the rest of us at evaluating others’ mental health). Some perceivers might be particularly good (poor) at judging particular targets (e.g., close friends might be better judges of each other than are strangers). Some perceivers might be especially good (poor) at using or obtaining certain types of evidence (e.g., some people may be better than others at picking up on targets’ emotion-revealing nonverbal cues). Funder’s (1995) article goes into considerable length regarding how the various combinations of perceiver, target, attributes, and evidence may combine to influence accuracy.
Like the Lens Model, RAM assumes that relationships between underlying attributes, cues, and judgments are probabilistic. Like the Lens Model, overall accuracy is typically assessed via correlations, although discrepancy scores (between judgment and criterion) can be used, too (see Funder, 1987, 1995). RAM is particularly good at explaining why accuracy in person perception may often not be all that high. For the judgment to closely correspond to the criterion, that criterion needs to clearly manifest itself in ways that could be, and in fact are, detected by the perceiver, and then the perceiver needs to use that detected information (as well as not use information that is not relevant to the judgment). A breakdown at any step will dramatically undermine accuracy. Furthermore, by focusing on combinations of perceiver, target, attributes, and evidence, RAM is also particularly good at highlighting processes that may enhance or undermine accuracy.
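RAM’s sequential logic, in which a breakdown at any single step undermines accuracy, can be illustrated with a toy simulation. The four “step quality” parameters and all the numbers below are entirely hypothetical; this is a sketch of the logic, not Funder’s formal model.

```python
import math
import random

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def noisy_step(signal, quality, rng):
    """One step in the chain: the output correlates `quality` with its
    input; the remainder of its variance is independent noise."""
    return [quality * s + math.sqrt(1 - quality ** 2) * rng.gauss(0, 1)
            for s in signal]

def simulate(relevance, availability, detection, utilization, n=5000, seed=7):
    """Chain the four RAM steps; return the attribute-judgment correlation."""
    rng = random.Random(seed)
    attribute = [rng.gauss(0, 1) for _ in range(n)]
    signal = attribute
    for quality in (relevance, availability, detection, utilization):
        signal = noisy_step(signal, quality, rng)
    return pearson_r(attribute, signal)

# Four solid steps still yield only moderate accuracy (roughly .8**4 = .41)...
print(round(simulate(0.8, 0.8, 0.8, 0.8), 2))
# ...and a breakdown at any one step drags accuracy toward zero
print(round(simulate(0.8, 0.1, 0.8, 0.8), 2))
```

This is one way to see why RAM predicts that accuracy in person perception may often not be all that high: even four individually decent steps multiply out to a modest overall correlation.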
Dawes’ (1979) Improper Linear Models
Dawes (1979) made a very interesting discovery. In reviewing his own and others’ research on decision making, he discovered that (1) people tend to be very good at identifying the evidence or cues that are relevant to making some prediction, but (2) they are not very good at combining or integrating those cues. Thus, their overall predictive accuracy tends to be quite low. Note, however, that this is not because people are completely out to lunch (biased, error-prone, etc.). They are good at one part of the prediction task (identifying criteria for making a prediction) but poor at another part (putting those criteria together).
Consider admissions to graduate school in psychology. The criteria typically used for making admissions decisions seem appropriate: GRE scores (general intellectual ability), GPA (achievement at academic tasks over an extended period), and letters of recommendation (what experts in the field who are highly familiar with the applicant have to say about him or her). Nonetheless, Dawes (1979) found that the correlation of graduate admissions committee evaluations with later success in graduate school is typically quite low (.19).
If people are completely out to lunch, then they would not even use appropriate criteria—that is, the criteria they do use would not predict success in graduate school. However, if they are good at identifying the appropriate criteria but use them poorly, then the raw criteria themselves should do a much better job at predicting graduate success. This was indeed the case—the overall (multiple) correlation of the criteria themselves with graduate success was about .4.
What to do? It is unreasonable to expect admissions committees to compute complex statistical formulas in their heads or to create a formal statistical score for each applicant. Dawes provided an elegantly simple and even amusing answer. Identify the criteria, weight them all equally, and add. For example, GRE, GPA, and letters of recommendation might each be transformed onto a 1 to 10 scale.8 Priscilla, with good GREs, a high GPA, and excellent letters of recommendation, might receive weights of 7, 9, and 9, respectively, for a total score of 25. George, with high GREs, a good GPA, and good letters, might receive scores of 9, 7, and 7, for a total of 23. Priscilla would be ranked more highly than George.
This is different from a formal statistical model primarily because the weights for each predictor have been chosen in a less than optimal manner (many statistical prediction techniques, such as regression, identify how to weight the criteria in such a manner as to maximize their overall predictive validity). But here is the second amazingly elegant aspect of Dawes’ analysis: Equal-weight, easy-to-compute, improper linear models predict outcomes nearly as well as do formal statistical models! In the graduate admissions example, Dawes’ improper linear model correlated .38 with future success in graduate school. Dawes (1979) went on to show that a simple, improper linear model performed similarly well in predicting all sorts of outcomes, including choice of bullet type for a police department and a bank’s predictions regarding companies likely to go bankrupt.9
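Dawes’ second point—that equal weights predict nearly as well as optimally chosen weights—can be illustrated with a small simulation. The true weights (3, 2, 1) and the noise level below are arbitrary assumptions chosen for illustration; they are not Dawes’ data:

```python
import random
import statistics

def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (statistics.pstdev(xs) * statistics.pstdev(ys) * len(xs))

random.seed(0)
n = 5000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [random.gauss(0, 1) for _ in range(n)]

# Criterion built with *unequal* true weights plus substantial noise.
y = [3*a + 2*b + 1*c + random.gauss(0, 3.3) for a, b, c in zip(x1, x2, x3)]

optimal = [3*a + 2*b + 1*c for a, b, c in zip(x1, x2, x3)]  # true weights
equal = [a + b + c for a, b, c in zip(x1, x2, x3)]          # improper model

# The equal-weight composite correlates with the criterion nearly as
# strongly as the composite built from the true (optimal) weights.
print(round(corr(optimal, y), 2), round(corr(equal, y), 2))
```

Even though the improper model gets the weights quite wrong (1, 1, 1 instead of 3, 2, 1), its predictive validity lands within roughly 90% of the optimal model’s, which mirrors Dawes’ empirical finding.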
Dawes’ improper linear model is fundamentally different from the Lens Model and RAM. The Lens Model and RAM were specifically designed to assess degree of accuracy and processes underlying social perception. That is, they were meant to describe aspects of the social perception process. In contrast, Dawes’ model is primarily prescriptive (it suggests how people should go about making decisions and arriving at predictions).
Nonetheless, I have included it here for two reasons. First, Dawes’ (1979) conclusion that people are good at selecting criteria but not good at using them is descriptive. In RAM terms, it suggests that people often are good at detecting available and relevant cues but that they often do not utilize them well (in Lens Model terms, their cue utilization would be poor). Second, although Dawes did document that people were, on their own, not very good at arriving at accurate predictions, he also showed that the accuracy of their predictions could easily be improved. Identify the criteria, weight them equally, and then add!
Noncomponential Models: Final Comments
Correlational approaches to accuracy, including but not restricted to the Lens Model and RAM, do not oppose or contradict componential approaches. Indeed, it is quite possible to perform a Lens Model or RAM analysis via components, if one felt that would be useful or important (Kenny et al., 2006). Furthermore, simple correlations between criterion and judgment go a long way toward eliminating many of the often irrelevant artifacts and biases identified by Cronbach (1955) and Kenny (1994). Nonetheless, my point in presenting them here has not been to argue that they refute componential approaches. My point, instead, has only been to demonstrate that some very sophisticated and successful noncomponential models and approaches to accuracy have been developed. One will find no mention of components in Brunswik (1952) or Funder (1995), or in many other influential articles and books on accuracy (e.g., Ickes, 1997; Jussim, 1991; McCauley, Stitt, & Segal, 1980; Swim, 1994). Components are interesting and important, but claims that one must always assess components when studying accuracy are not justified.
I have just spent three chapters discussing accuracy but have described very little empirical research assessing people’s accuracy. How accurate are interpersonal expectations? Not answered (yet). Do teachers’ expectations predict student achievement primarily because of self-fulfilling prophecy or accuracy? Not answered yet. How accurate are people’s beliefs about demographic groups? Despite three chapters on accuracy, I have still not reviewed research assessing the actual (in)accuracy of social stereotypes.
Why not? For several reasons. First, assessing accuracy is a genuinely complex undertaking and is also theoretically and politically controversial. Therefore, I felt it was necessary to explore some of those complexities and controversies before describing the relevant research findings. Second, in my opinion, those complexities, although real, have often been characterized as “problems” or “difficulties,” and once characterized as such, have led many social scientists to despair at the prospect that accuracy can even be assessed (or to denigrate the value of attempting to do so).
Chapter 10 was necessary to review and critically evaluate many of the historical reasons for the demise of accuracy research in social psychology. In many cases, it contested the viability of many of the common criticisms of accuracy research, concluding that such criticisms were themselves often more flawed than accuracy research itself. In other cases, Chapter 10 concluded that even the most valid of those criticisms only warranted care and caution in interpreting accuracy research, rather than a wholesale dismissal of the entire accuracy endeavor.
Chapter 11 explored the crucial issue of identifying criteria for establishing accuracy. This chapter was necessary to (1) demonstrate that, although many social cognitive process–oriented researchers have suggested that identifying criteria for establishing accuracy is so difficult as to cast a significant cloud over the viability of accuracy research (e.g., Jones, 1985; Fiske, 1998; Stangor, 1995), the logic of establishing accuracy of social perception overlaps almost completely with the (minimally controversial) logic of establishing construct validity in the social sciences; and (2) identify myriad useful potential criteria for assessing the accuracy of social judgments.
The purposes of the current chapter have been to (1) present a simplified review of the ideas underlying the three main componential approaches to accuracy; (2) argue that, although componential approaches are valuable and important, not all research on accuracy must necessarily assess components; and (3) review some of the major noncomponential approaches to assessing accuracy and social perception processes.
Thus, Chapters 10, 11, and 12 were necessary to convey the scientific foundation on which accuracy research rests. There is, however, a third reason I have not yet reviewed much accuracy research. The best way to draw general conclusions about the relative roles of accuracy, self-fulfilling prophecy, and bias in interpersonal expectations is to perform research that assesses at least two of these three expectancy phenomena and, preferably, all three. Such research is in a much better position to place evidence regarding the power and pervasiveness of each phenomenon in context. Research that only assesses one phenomenon at a time, such as most of the research on self-fulfilling prophecies reviewed in Chapter 4 and the research on bias reviewed in Chapter 5, cannot possibly provide direct information about the relative roles of accuracy, self-fulfilling prophecy, and bias in interpersonal expectations.
Indeed, research focusing on only one phenomenon at a time is potentially extremely misleading. Consider 10 hypothetical studies all finding statistically significant evidence of perceptual bias. When considered together, but in isolation from research on accuracy, they might seem to support a conclusion emphasizing the almost universal presence of bias. After all, all 10 studies found evidence of bias. It sure looks like people are biased almost all the time. And from this conclusion, it is only a tiny step further to the conclusion that bias pervades social perception. (Although 10 studies demonstrating accuracy would be just as potentially problematic, the situation of 10 studies demonstrating bias more closely mirrors the reality of social psychological research and conclusions regarding bias—see Chapters 5, 9, and 10.)
But such a conclusion could not be justified by research that only examined bias. Research that does not test for accuracy cannot possibly provide any evidence of low or high accuracy. Thus, 10 studies demonstrating bias could not rule out the possibility that people tend to be far more accurate than biased in their social judgments. Combine the tendency to overinterpret significant evidence of bias with insufficient consideration of the effect sizes associated with bias (compare, e.g., the conclusions regarding the power and pervasiveness of both bias and self-fulfilling prophecy highlighted in Chapters 4 and 5 with the evidence regarding self-fulfilling prophecy and bias summarized in Table 6–1 in Chapter 6) and with the almost complete exile of accuracy research from social psychology from 1955 to 1985, and it becomes easy to understand much of the source of the social psychological emphasis on bias and self-fulfilling prophecy.
For the same reason, therefore, I will not present a chapter devoted to research that exclusively focuses on accuracy. I have no objection in principle to research focusing exclusively on accuracy (or on bias or on self-fulfilling prophecy). However, all three phenomena (bias, self-fulfilling prophecy, and accuracy) characterize interpersonal expectations. Furthermore, social psychology’s history of emphasizing self-fulfilling prophecy and bias relative to accuracy seems to result more from researchers’ emphasis on studying those phenomena than from actual empirical results demonstrating that self-fulfilling prophecy and bias are high relative to accuracy. On top of that, lines of research that only focus on one of these phenomena at a time can even be viewed not as “talking to each other” but as “talking past each other” (e.g., social psychological studies and reviews of interpersonal expectations have almost never cited educational research demonstrating high accuracy in teacher expectations). Therefore, in this book, I bring these separate lines of research together in order to provide a fuller, more balanced, and more valid perspective regarding the extent to which expectations create versus reflect social reality.
Just as research focusing exclusively on biases or errors implicitly conveys the idea that social perception is dominated by flaws, a section focusing exclusively on accuracy might implicitly convey the idea that social perception is nothing but accurate (a view that is also not supported by the evidence). Instead, the next chapter is titled “Accuracy and the Quest for the Powerful Self-Fulfilling Prophecy” in order to present a much more even-handed vision of the extent to which expectations are accurate versus create self-fulfilling prophecies or biases. Doing so, however, requires discussing these phenomena together, rather than in isolation from one another.
(1.) For those of you with some (but not a lot of) familiarity with baseball, having a player on second or third base is referred to as “runners in scoring position,” because a single will usually drive in a run. This includes having a runner on second, having a runner on third, having runners on second and third, having runners at first and second or at first and third, and having the bases loaded. For those of you completely unfamiliar with baseball, this will not help at all.
(2.) Cronbach’s components actually involve comparisons of ratings, not rankings. However, once you start removing components, what’s left is not really a rating, either. It is a sort of “rating adjusted for removal of other components.” Rankings capture most of what’s important from such “adjusted ratings.” That is, the main accuracy question is something like: “Does the rank order of adjusted judgments correspond to the rank order of adjusted criteria?” Therefore, I also refer to rankings when discussing stereotype accuracy and differential accuracy.
(3.) If you do not know anything about baseball, with a little help, this example should still be pretty easy to follow. “Hits” are good, at least if you are the hitter. Throughout a game, each player usually has several opportunities to try to get a hit. For each player, one can compute a batting average, which is simply the proportion of times that player has succeeded in getting a hit. For example, a player who had 3 hits in 10 chances would have an average of .300 (3/10).
(4.) For those of you unfamiliar with baseball: The pitcher is on the team opposing the batter. The pitcher’s job is to get the batter “out”—that is, not allow a hit.
(5.) For those of you uninitiated in baseball, here are the key elements of this situation. If George gets a hit, the runners on second and third will likely score, and his team will win the game 6–5. But if Lillian can get George “out,” the game will be over and Lillian’s team will win.
(6.) In this quote, Cronbach was referring not only to the differential accuracy correlation but also to the differential elevation correlation, which is the correlation between a perceiver’s judgment of a target averaged over all attributes and the target’s average score on the criteria representing those attributes. This, however, is irrelevant when only a single attribute is being judged.
(7.) In their practice of inferring the existence of unobserved phenomena from their theoretically predicted observed effects, social scientists are in some pretty good company. Physicists have never seen a neutron, proton, electron, or quark; cosmologists have not witnessed the big bang; and Darwin never witnessed speciation. Like psychological attributes, the existence of these unobserved phenomena is inferred from their effects.
(8.) For those of you with even an introductory familiarity with statistics, there is an even better alternative: Standardize all predictors, and then add. This weights all predictors equally by virtue of not only putting all variables on the same scale but also equating their variances. I thank Gretchen Chapman, Rutgers’ resident expert on decision making, for this suggestion.
Such procedures may work, in part, because they reduce reliance on salient, easy-to-use criteria. In college and graduate admissions, for example, I suspect that most programs rely very heavily on standardized test scores, such as the SATs, GREs, MCATs, LSATs, etc., not because someone has made the decision that these should be the main criteria (indeed, it is probably easier to find righteous proclamations denying that admission is based primarily on these criteria than to find defenses of the appropriateness of doing so), but because using such numbers is so easy (as compared to, e.g., letters of recommendation, personal statements, and even GPA). A Dawesian simple, improper linear model that includes standardized test scores as only one criterion among many goes a long way toward eliminating any tendency to overemphasize such scores in making admissions decisions.
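The standardize-and-add suggestion above can be sketched as follows; the raw applicant scores are invented purely for illustration:

```python
import statistics

def standardize(xs):
    """Convert raw scores to z-scores (mean 0, standard deviation 1)."""
    m, s = statistics.fmean(xs), statistics.pstdev(xs)
    return [(x - m) / s for x in xs]

# Hypothetical raw predictors for four applicants, on very different scales.
gre = [310, 325, 300, 330]
gpa = [3.2, 3.9, 3.5, 3.6]
letters = [6, 9, 7, 8]   # e.g., letters of recommendation rated 1-10

# Standardizing puts every predictor on the same scale with equal variance,
# so simply adding across the columns weights them all equally.
z_cols = [standardize(col) for col in (gre, gpa, letters)]
composite = [sum(zs) for zs in zip(*z_cols)]

best = max(range(len(composite)), key=composite.__getitem__)
print(best)  # applicant at index 1 has the highest composite
```

Because each standardized predictor has the same variance, no single criterion (such as a test score on a large numeric scale) can dominate the sum merely by virtue of its units.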
(9.) Dawes’ results were actually even more amazing, at least for the statistically inclined. The equal-weight improper linear model generally (i.e., not just in the graduate admissions example) predicted outcomes better than the split-sample cross-validated regression weights.