(p.123) Appendix I The General Inquirer (GI)
The version of the GI I use in this book tabulates the content of a text by mapping each of its words to 182 predetermined categories, comprising 11,790 words, designated by the Harvard IV-4 psychosociological and Lasswell value dictionaries.1 Once a corpus of texts is loaded into a computer folder, the GI can be run to scan the texts for every word that is assigned to a category; it produces a score for each category as a share of the total number of words in the text, which can then be compared with other texts. For instance, in a three-word text document “Word1, Word2, Word3” in which Word1 belongs in Category1, Word2 belongs in Category2, and Word3 does not belong in any category, the GI will report the scores Category1 = .33 and Category2 = .33.
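The scoring rule just described can be illustrated with a minimal sketch. It is not the GI's own code: the two-entry dictionary, the tokenizing rule, and the function name `category_scores` are all assumptions made for illustration, and the real GI lexicon is far larger.

```python
import re
from collections import Counter

# Hypothetical toy dictionary mapping a word to its categories.
# This stands in for the GI lexicon; it is not the real word list.
TOY_DICTIONARY = {
    "word1": {"Category1"},
    "word2": {"Category2"},
}


def category_scores(text, dictionary):
    """Return each category's share of the total word count in `text`."""
    # Assumed tokenizer: lowercase runs of letters, digits, and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return {}
    counts = Counter()
    for word in words:
        # A word outside the dictionary adds to the total but to no category.
        for category in dictionary.get(word, ()):
            counts[category] += 1
    return {category: n / len(words) for category, n in counts.items()}


scores = category_scores("Word1, Word2, Word3", TOY_DICTIONARY)
print({c: round(s, 2) for c, s in sorted(scores.items())})
# prints {'Category1': 0.33, 'Category2': 0.33}
```

Because every score is divided by the total word count, including words that fall in no category, scores from texts of different lengths remain comparable.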
Each category is formally a cluster of words, but analytically it gives us a reading of a specific dimension of meaning inherent in the text being examined, a dimension defined by the rule governing a word's membership in that category. For instance, the degree of negativity of a text can be measured by the Harvard IV-4 category Negativ, which is a predetermined list of 2,291 words (such as “abhor,” “condemn,” and “hatred”) that register negativity.2 I have chosen an externally derived dictionary to avoid inferential circularity (insofar as the rules governing any particular word's membership in a particular category were externally defined and validated), and because these categories were designed to capture social scientific concepts developed in established (p.124) scholarship. The definitions of the 10 GI categories used for this book can be found in appendix II.
The GI does make some simplifying assumptions, namely, that all words are equally weighted and that only basic root word analysis is performed (“great,” “greater,” and “greatest” are all recognized as the same word). That said, the GI does disambiguate different uses of the same word where meaning, which is what we are most concerned with, does change. For example, it distinguishes between four usages of “race”: as a contest, as moving quickly, as an indicator of a group of people of common descent, and as in the idiom “rat race.” Set up to recognize 11,790 words, the GI typically leaves fewer than 3 percent of the words in a text unmapped to any of its 182 categories, so there are rarely problems of uncounted data.3
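To make the root word assumption and the idea of uncounted words concrete, the following sketch folds inflected forms onto a root and then measures what share of a text is recognized. The suffix list, the three-word lexicon, and the names `to_root` and `coverage` are hypothetical; the GI's actual root word and disambiguation routines are not described here and are certainly more elaborate.

```python
import re

# Hypothetical lexicon of root words; the real GI recognizes 11,790 words.
TOY_LEXICON = {"great", "nation", "hatred"}


def to_root(word, lexicon):
    """Naive root lookup: try the word itself, then strip a common suffix."""
    if word in lexicon:
        return word
    for suffix in ("est", "er", "s"):
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and stem in lexicon:
            return stem
    return None  # the word is not recognized


def coverage(text, lexicon):
    """Fraction of the words in `text` that map to some lexicon entry."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    recognized = sum(1 for w in words if to_root(w, lexicon) is not None)
    return recognized / len(words)


roots = [to_root(w, TOY_LEXICON) for w in ("great", "greater", "greatest")]
print(roots)  # prints ['great', 'great', 'great']

share = coverage("Great nations, greater hatred, xyzzy", TOY_LEXICON)
print(share)  # prints 0.8
```

In this toy text one word in five goes unrecognized; the claim in the text is that, with the full lexicon, the unrecognized share is typically under 3 percent.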
The Rhetoric of President George W. Bush
To provide a sense of what quantitative content analysis alone can tell us, I will tell a brief illustrative story of George W. Bush's first year in office, using three GI categories. I collected every word published in the Weekly Compilation of Presidential Documents (N ≈ 1,075,748) from January 20, 2001, to January 19, 2002, and coded all of the relevant words in these documents according to three content analytic categories taken from the Harvard IV-4 psychosociological and Lasswell value dictionaries to produce a weekly time series.
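A weekly time series of this kind can be sketched as follows. The sample documents, the three-word category list, and the rule of counting weeks in seven-day blocks from January 20, 2001, are assumptions for illustration only; the week numbering used for the figures in this appendix may follow the publication schedule of the Weekly Compilation instead.

```python
import re
from collections import defaultdict
from datetime import date

START = date(2001, 1, 20)  # first day of the period described in the text


def week_index(day, start=START):
    """1-based week number, counted in 7-day blocks from `start` (assumed rule)."""
    return (day - start).days // 7 + 1


def weekly_series(documents, category_words):
    """Share of each week's words that belong to `category_words`."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for day, text in documents:
        words = re.findall(r"[a-z']+", text.lower())
        week = week_index(day)
        totals[week] += len(words)
        hits[week] += sum(1 for w in words if w in category_words)
    # Weeks with no words are skipped to avoid dividing by zero.
    return {w: hits[w] / totals[w] for w in sorted(totals) if totals[w]}


# Hypothetical category list and documents, not drawn from the actual corpus.
NEGATIVE_WORDS = {"abhor", "condemn", "hatred"}
SAMPLE_DOCUMENTS = [
    (date(2001, 1, 20), "we condemn hatred"),
    (date(2001, 1, 26), "a great nation"),
    (date(2001, 1, 27), "we abhor and condemn it"),
]

series = weekly_series(SAMPLE_DOCUMENTS, NEGATIVE_WORDS)
for week, score in series.items():
    print(week, round(score, 2))
```

Pooling all of a week's words before dividing, rather than averaging per-document scores, keeps a short statement from weighing as much as a long address.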
Imagine the quantitative content analyst isolated from the world on a remote island with no access either to the texts of Bush's speeches or to third-party (media) accounts of these speeches. All the analyst possesses is a series of content analytic data derived from Bush's speeches. Even in this isolated world, an analyst looking at these data cannot but suspect that something very significant occurred in the 35th week of President Bush's first year in office. As figures A.1 and A.2 show, the president suddenly became discernibly more negative and hostile in his rhetoric at week 35 and continued to be so for at least another 10 weeks.4 Despite the chaos of politics, these distinct patterns managed to emerge amid the deluge of presidential words.
Since we lived through the terrorist attacks of September 11, 2001, these charts are less interesting in confirming what we already expected than in showing that presidential rhetoric creates a permanent footprint of events and of presidential responses to events that researchers can fruitfully examine. We are accustomed to thinking that presidents, together with their wordsmiths, are masters of their rhetorical fortunes. But figures A.1 and A.2 reveal that (p.125)
Quantitative content analysis does not only reliably confirm what is obvious and expected. It can often uncover unexpected textual traits not (p.126)
(1.) Philip J. Stone, Dexter C. Dunphy, Marshall S. Smith, and Daniel M. Ogilvie, The General Inquirer: A Computer Approach to Content Analysis (Cambridge, MA: MIT Press, 1966); J. Zvi Namenwirth and Robert P. Weber, Dynamics of Culture (Boston: Allen and Unwin, 1987). See also http://www.wjh.harvard.edu/inquirer/3JMoreInfo.html.
(2.) The Harvard IV-4 and Lasswell value dictionary categories were developed to represent social science concepts introduced by Harold Lasswell, Talcott Parsons, David McClelland, and others. Specifically, the Negativ category was developed to capture the semantic language universals theorized in Charles Osgood, Universals of Language (Cambridge, MA: MIT Press, 1966).
(3.) Just the 300 most frequently used English words represent roughly 65 percent of all texts. See Edward B. Fry, Jacqueline E. Kress, and Dona Lee Fountoukidis, The Reading Teacher's Book of Lists, 5th ed. (New York: Wiley, 2006).
(4.) Hostile, like Negativ, is a category derived from one of the semantic language universals theorized in Osgood, Universals of Language. The category contains 833 words (such as “afflict,” “execute,” and “oppress”).
(5.) PowLoss is a Lasswell value dictionary category consisting of 109 words (such as “concede,” “loss,” and “overwhelm”). In the Lasswell value dictionary, the author divided language into four deference domains (power, rectitude, respect, affiliation) and four welfare domains (wealth, well-being, enlightenment, skill). See Namenwirth and Weber, Dynamics of Culture, 46–53. PowLoss is a subcategory within the first of these domains.