(p.469) Appendix A Our Approach to Epidemiologic Concepts and Methods
Throughout this book, we have sought to present epidemiologic concepts and methods in a way that is relevant and accessible to investigators studying psychiatric disorders, particularly those interested in causal hypotheses. In teaching epidemiology, we have found that methods are easiest to convey when they are tightly tied to a few central organizing principles. We therefore developed such organizing principles in the first parts of the book and presented epidemiologic methods with reference to these principles in the subsequent parts.
Although this book is about epidemiologic approaches, we integrated insights from many other disciplines. Influenced by our cross-disciplinary training in sociology (SS), psychiatry (ES), and epidemiology (SS and ES), we drew from the work of methodologists in epidemiology, psychology, sociology, human genetics, and other disciplines. These disciplines converge and overlap around the central theme of this book—uncovering causes of mental health problems.
We have emphasized that the investigation of causes demands a far broader scope than the study of risk factors. Nonetheless, a large part of the book concerns the designs used to study risk factors. This appendix explains how we arrived at our particular approach to the presentation of these designs. It is intended for readers already familiar with epidemiology. Researchers for whom this book is an introduction to epidemiologic methods may safely skip it or postpone its reading until it becomes germane.
Our Approach to Risk Factor Designs
Our approach to the risk factor designs can be viewed as an attempt to integrate epidemiologic methods as articulated by Rothman and Greenland (1998) into the validity typology presented by Shadish and co-workers (2002). Since the Shadish et al. typology is widely used by other disciplines, especially the social sciences, one advantage of this approach is that it facilitates linking epidemiology to other disciplines. However, such integration required that both Rothman and Greenland and Shadish et al. be adjusted, altered, and (p.470) adapted to make a coherent whole. Here we make it clear how much we have borrowed from these two sources, and we explain the ways in which we departed from them. Although many of the changes may seem subtle, we think that they have implications for decisions that researchers make in how they conduct their studies, analyze their data, and interpret their results.
Our departures from Shadish et al.’s validity typology derive from its application to the outcomes and designs of central concern to epidemiology. Epidemiologic outcomes are frequently dichotomous, as opposed to the continuous outcomes of psychological quasi-experiments. More centrally, viewing epidemiologic designs as proxies for quasi-experiments often underestimates their utility. For example, Shadish and colleagues’ discussion of case-control studies as a type of “quasi-experimental design that uses a control group but no pretest” leads to a more pessimistic assessment of this design than an epidemiologic perspective would suggest.
Our departures from Rothman and Greenland’s approach derive mainly from our viewing epidemiology’s central task as theory testing rather than effect estimation; on this point our perspective is more in line with Shadish et al. In addition, our reading of their work is refracted through our prior training in sociology (SS) and psychiatry (ES). This refraction has led us to some insights deriving directly from their work that they might wish to reject. Our understanding of the reasons behind our points of departure greatly benefited from comments of Sander Greenland on earlier drafts of the introductory chapters and this appendix.
Adaptations of Shadish et al. (2002)
We kept the essentials of the Shadish et al. (2002) validity scheme by organizing the tasks of epidemiologic research into a series of loosely hierarchical questions: (1) How large and reliable is the association between the exposure and the disease? (2) Is the association between the exposure and disease, as measured, plausibly causal? (3) Which constructs are reflected in the causal effect? (4) How generalizable is the effect over persons, times, and settings? Although the language is adapted for the epidemiologic context, these questions correspond closely to the Shadish et al. validity scheme.
We also kept, as an ongoing underlying theme, what Shadish et al. (2002, p. 16) refer to as a “fallibilist version of falsification.” This means that the researcher’s continual task is to “rule out” plausible alternatives to hypotheses while recognizing that each assessment is fallible. Increased understanding can be gained from this process, despite this fallibility.
The first adaptation we made was to label the first question in Shadish et al.’s validity scheme association rather than statistical conclusion validity. We did so to limit the introduction of unfamiliar terms. Association is familiar to epidemiologists and has the same meaning in epidemiology as in Shadish et al. We considered any factors that would decrease the precision of our estimate or cause biases toward the null value as threats to finding an association.
Second, we carved temporality out from internal validity for special attention. In quasi-experimental designs, the issue of temporality is of minor (p.471) concern, as Shadish et al. note, because in such designs the cause is manipulated and occurs before the outcome. In epidemiologic studies, temporal order is a concern so central and problematic that it warrants separate attention.
To accommodate this alteration, we employ the term sole plausible explanation to refer specifically to internal validity concerns other than temporality. This is in accord with Shadish et al.: “We use the term internal validity to refer to inferences about whether observed covariation between A and B reflects a causal relationship from A to B in the form in which the variables were manipulated or measured. To support such inference, the researcher must show that A precedes B in time, that A covaries with B (already covered under statistical conclusion validity) and that no other explanations for the relationship are plausible.” (Shadish et al. 2002, p. 53). Thus, temporality and sole plausibility are the two components of internal validity in Shadish et al. Together with association, they make up the three central components of causal identification (association, temporal order, and sole plausibility), which date back to John Stuart Mill (1843) and have been used in a similar way in epidemiology (e.g., Susser 1973; Susser 1991).
We did not use the term internal validity in this text. We wanted to avoid the implication that temporality is not a component of internal validity. In addition, internal validity has different and varied meanings in epidemiology that diverge from Shadish et al.’s definition.
Third, some adaptations were required to place particular threats to validity in the context of observational epidemiologic studies where, unlike quasi-experiments, the exposures are not manipulated. For example, it was not clear where to place nondifferential and differential misclassification. Since Shadish et al. did not focus on dichotomous variables, this type of problem was not discussed explicitly. In deciding how to categorize various threats, we gave priority to their consequences for answering the four questions of causal inference. We grouped all measurement problems that generally bias toward the null, including nondifferential misclassification, under problems of identifying an association, because they mask associations. Differential misclassification is mentioned twice: as causing difficulties for finding an association (when it causes bias toward the null) and as posing a plausible alternative explanation for an exposure-disease relationship (when the misclassification exaggerates the effect).
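To illustrate why nondifferential misclassification is grouped with threats to finding an association, the following minimal sketch computes the risk ratio one would expect to observe when a dichotomous exposure is misclassified with the same sensitivity and specificity regardless of disease status. The risks, exposure prevalence, sensitivity, and specificity are hypothetical values chosen only for the arithmetic, and the function name is ours.

```python
def observed_risk_ratio(risk_exposed, risk_unexposed, prevalence,
                        sensitivity, specificity):
    """Expected risk ratio when a dichotomous exposure is misclassified
    nondifferentially (same sensitivity and specificity whatever the
    disease status)."""
    # Population fractions by true exposure and classified exposure.
    true_pos = prevalence * sensitivity                 # exposed, classified exposed
    false_neg = prevalence * (1 - sensitivity)          # exposed, classified unexposed
    false_pos = (1 - prevalence) * (1 - specificity)    # unexposed, classified exposed
    true_neg = (1 - prevalence) * specificity           # unexposed, classified unexposed

    # Each classified group is a mixture of truly exposed and truly unexposed.
    risk_classified_exposed = (
        (true_pos * risk_exposed + false_pos * risk_unexposed)
        / (true_pos + false_pos)
    )
    risk_classified_unexposed = (
        (false_neg * risk_exposed + true_neg * risk_unexposed)
        / (false_neg + true_neg)
    )
    return risk_classified_exposed / risk_classified_unexposed


# Hypothetical inputs: true risks of 0.20 and 0.10, so the true risk ratio is 2.
true_rr = 0.20 / 0.10
attenuated_rr = observed_risk_ratio(0.20, 0.10, prevalence=0.30,
                                    sensitivity=0.80, specificity=0.90)
perfect_rr = observed_risk_ratio(0.20, 0.10, prevalence=0.30,
                                 sensitivity=1.0, specificity=1.0)
```

Under these assumed inputs the observed risk ratio falls between the null value of 1 and the true value of 2; that is, the association is masked rather than exaggerated.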
Similarly, it was not clear where to place certain other measurement problems. Disease detection bias could have been legitimately considered under either sole plausibility (one component of internal validity under Shadish et al.) or under construct validity. In Shadish et al. confounding is at the heart of both internal and construct validity. The distinction is that “internal validity confounds are forces that could have occurred in the absence of the treatment and could have caused some or all of the outcome observed” (Shadish et al. 2002, p. 95), whereas construct confounds are part of the treatment and could not have occurred in absence of the treatment.
In observational epidemiology, we study nonmanipulated exposures; consequently, this distinction is often blurred. In particular, disease detection bias is a confound that occurs because the measurement error in the disease is linked with the exposure. In this sense, it would not have occurred without (p.472) the exposure. On the other hand, the measurement error in the disease could have occurred from the same biases and caused the appearance of the disease even in the absence of the exposure.
Ultimately, we decided to include disease detection biases under sole plausibility for two reasons. First, disease detection bias is an alternative explanation for the relationship between the exposure and disease as they are measured. Second, we wanted to differentiate disease detection bias from construct validity, to highlight those aspects of construct validity that are relatively neglected in epidemiology and well developed in psychiatric epidemiology. In our view, the gap between our constructs and the operationalization of those constructs has not been given enough consideration in epidemiology. To draw attention to this issue, we gave centrality in construct validity to the identification of the active ingredient of the exposure and the factors that give rise to it. We discussed construct validity as the separation of the aspects of our measures that are essential to the meaning we had in mind, from those that are not.
It is important to distinguish our definition of construct validity from the notion of the resemblance of a measure to a gold standard. For example, we might use a DSM diagnosis of depression as the gold standard and some more easily applied symptom scale score as a proxy for it. The construct validity issue, however, is the extent to which the gold standard itself captures our concept of depression. This issue requires an iterative relationship among theory, research, and measurement. Probing the mechanisms in our studies refines the construct.
We think that these adaptations are within the spirit of Shadish et al. As they note, their list of specific threats to validity is “not divinely ordained.” “Threats are better identified from insiders’ knowledge than from abstract and non-local lists of threats,” they acknowledge (2002, p. 473). Indeed, this is why we prefer to present a few key principles rather than a long list of specific threats. The principles provide a guideline to think through potential threats to validity that may or may not apply in particular situations.
Adaptations of Rothman and Greenland (1998)
We find the concept of exchangeability, introduced to us through the work of Greenland and Robins (1986), extremely useful in presenting epidemiologic methods within an integrated framework. Exchangeability, rooted in the counterfactual, provides the logical connection between confounding, a central concept in epidemiology, and the logic of causal inference. Once introduced, the notion, like many well-thought-out ideas, seems quite obvious and clear. Confounding is the difference between the comparison in our study (i.e., the exposed and unexposed) and the comparison of true interest (i.e., the exposed and the counterfactual). The unexposed are a good proxy for the counterfactual when the disease risk in the unexposed represents what the disease risk in the exposed would have been had they not been exposed. Therefore, confounding is the imbalance in risk factors for disease, other than the exposure, between the exposed and the unexposed.
(p.473) This notion parallels the aspect of sole plausible explanation in Shadish et al.’s discussion of internal validity. Indeed, Shadish et al. explicitly ground internal validity in counterfactual thinking. However, with some exceptions (e.g., Lewis 1973), Shadish et al. and Rothman and Greenland invoke different literatures. Shadish et al. ground their presentation of counterfactual thinking in the work of Mackie (1974), brought into psychology by Meehl (1977). Rothman and Greenland depend more on the statistical literature (Rubin 1990). Shadish et al. explicitly set Rubin’s work in the background as being of critical importance to students of causal thinking but more relevant to the statistical than to the conceptual issues that are their main concern. Here our own perspective is more in line with Shadish et al. This emphasis leads to several differences between Rothman and Greenland’s and our application of the counterfactual.
In Rothman and Greenland’s model, and as explicated earlier in Greenland and Robins (1986), people in the population are divided into four “types”: susceptible positives and negatives, immune, and doomed. Susceptible positives and negatives are people who have the causal partners required for the exposure to have an effect either causing (susceptible positive) or preventing (susceptible negative) the disease. The immune are people who do not have a full complement of risk factors necessary to cause disease. The doomed are those who have a full complement of risk factors to cause disease from a sufficient cause other than one that includes the exposure of interest.
Among the exposed, disease occurs in the susceptible positives and the doomed. Among the unexposed, disease occurs in the susceptible negatives and the doomed. Therefore, for the exposed and unexposed to be exchangeable, the sum of the susceptible negative and doomed in the unexposed must equal the sum of the susceptible negative and doomed in the exposed. Under these conditions, the disease risk in the unexposed will represent the disease risk in the exposed had they not been exposed.
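The bookkeeping in this condition can be made concrete with a small sketch. The type proportions below are invented purely for illustration, and the class and function names are ours rather than Rothman and Greenland's; the sketch assumes that the four types are mutually exclusive and exhaustive within each group.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeMix:
    """Proportions of the four response types in one group (should sum to 1)."""
    susceptible_positive: float  # diseased only if exposed
    susceptible_negative: float  # diseased only if unexposed
    doomed: float                # diseased regardless of exposure
    immune: float                # never diseased

    def risk_if_exposed(self) -> float:
        return self.susceptible_positive + self.doomed

    def risk_if_unexposed(self) -> float:
        return self.susceptible_negative + self.doomed


def exchangeable(exposed: TypeMix, unexposed: TypeMix, tol: float = 1e-9) -> bool:
    """True when the unexposed group's risk equals the risk the exposed
    group would have had in the absence of exposure."""
    return abs(exposed.risk_if_unexposed() - unexposed.risk_if_unexposed()) < tol


# Hypothetical proportions, chosen only to illustrate the arithmetic.
exposed = TypeMix(0.10, 0.02, 0.08, 0.80)
balanced = TypeMix(0.05, 0.04, 0.06, 0.85)    # 0.04 + 0.06 equals 0.02 + 0.08
unbalanced = TypeMix(0.10, 0.02, 0.18, 0.70)  # more doomed among the unexposed

causal_rr = exposed.risk_if_exposed() / exposed.risk_if_unexposed()
observed_rr = exposed.risk_if_exposed() / unbalanced.risk_if_unexposed()
```

In this made-up example the balanced group is exchangeable with the exposed even though its mix of types differs, because only the sum of susceptible negatives and doomed matters. With the unbalanced group, the observed risk ratio departs from the causal one.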
Through the connection to the counterfactual, the fundamental meaning and effects of confounding are greatly clarified by their contributions. Confounding becomes a concept that has meaning that is dependent neither on criteria nor on statistical tests that only imperfectly detect its presence. This realization makes it clear that confounding is not about single risk factors for disease but about the relationship and balance among the totality of risk factors that differ between the exposed and unexposed. Rothman and Greenland’s approach offers a useful way to think through practical problems in murky situations where the imperfect rules may provide misleading direction.
We think exchangeability can be used more broadly, however, to include not only the unequal distribution of true causes of disease, but also the unequal distribution of methodological artifacts. For example, the maldistributions of follow-up time and disease ascertainment also pose threats to the validity of causal inference. They are alternative explanations for an apparent exposure-disease association. We find it most compelling to have a parsimonious model—one that unites all the ways in which an apparent association between an exposure and disease as measured in our data can be caused by (p.474) factors other than a causal relationship between the exposure and the disease.
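As a rough illustration of how a methodological artifact can by itself produce an apparent association, the sketch below gives the exposed and unexposed the same disease rate but different lengths of follow-up. The rate, the follow-up times, and the constant-rate (exponential) model are all assumptions made for the example.

```python
import math


def cumulative_risk(rate_per_year: float, years: float) -> float:
    """Proportion developing disease under a constant rate (exponential model)."""
    return 1 - math.exp(-rate_per_year * years)


rate = 0.02  # identical hypothetical rate in both groups: no causal effect

risk_exposed = cumulative_risk(rate, years=10)    # exposed followed longer
risk_unexposed = cumulative_risk(rate, years=5)   # unexposed followed for less time

apparent_risk_ratio = risk_exposed / risk_unexposed  # compares proportions diseased
rate_ratio = rate / rate                             # compares rates per person-year
```

Here the ratio of proportions diseased is well above 1 although the rate ratio is exactly 1, so the maldistribution of follow-up time is an alternative explanation for the apparent association.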
We think that this more expansive application is alluded to in Maldonado and Greenland (2002) and in Miettinen (1985). Nonetheless, Rothman and Greenland make it clear in their textbook (and in personal communication with Greenland) that their notion of exchangeability, and correspondingly confounding, applies only to true risk factors for disease (for reasons that will be explained). Therefore, to avoid confusion, we use a different term, full comparability, to refer to our expanded notion, and we reserve the term confounding to refer to a subtype of noncomparability that derives from differences between the exposed and unexposed on other causes of disease, so that our use of this term is more in line with Rothman and Greenland.
We prefer the unified concept of noncomparability because the maldistribution of true causes of disease and the maldistribution of artifacts cause the same types of problems—they provide alternative explanations for a relationship between the exposure and disease in our data. They all relate to Shadish et al.’s second question: given that there is an association between the exposure and the disease, is that association plausibly causal? To answer this question it is necessary to face all plausible explanations for the association other than the exposure of interest. Any factor that links the exposure as measured with the disease as measured is a candidate. Thus, the maldistribution of true causes of disease (i.e., confounders) and the maldistribution of artifacts (e.g., differences in follow-up time or disease detection) serve the same function in our schema. This notion ties the whole enterprise of causal inference more closely to the counterfactual and is true to Shadish et al.’s schema.
Why then do Rothman and Greenland separate out maldistribution between the exposed and the unexposed on true causes of disease from the maldistribution on methodological artifacts? We think that there are several reasons. First, Rothman and Greenland seem to group biases according to their “cure” rather than their harm. For example, confounding and selection bias are categorized separately because their harm cannot always be cured in the same manner. In their discussion of selection bias they state that “some forms of selection bias (selection confounding) can be controlled like confounding; other forms can be impossible to control without external information that is rarely (if ever) available” (Rothman and Greenland 1998, p. 355). However, selection bias and confounding cause the same harm (i.e., they cause the appearance of an association between the exposure and the disease that is not causal). For selection bias that can be cured in the same manner as confounding, the hybrid label selection confounding is applied. This approach is particularly awkward because selection causes the exposed and unexposed to differ on true risk factors for disease and, in this sense, leads to confounding.
The focus on the cures is also seen in the inclusion, under the rubric of confounders, of surrogates or proxies that are associated with causes of disease but are not themselves causes of disease (see note 1). Surrogates or proxies do not cause confounding. Nonetheless, their control may improve the validity of the estimate (p.475) of the effect by controlling for the effect of the causes for which they are a proxy.
A similar rationale lies behind the separation of differential misclassification from confounding. Rothman and Greenland note that even when the cause of differential misclassification can be identified and controlled like a third-variable confounder, the resulting adjusted effect estimate still suffers from the effect of nondifferential misclassification. Thus, the adjusted effect estimate is likely to be biased toward the null. Rothman and Greenland specifically admonish researchers against controlling for differential misclassification as they do confounders, with the warning that the adjusted effect estimate may be more biased (albeit toward the null) than the nonadjusted effect estimate (away from the null). This is an important point. However, in terms of addressing the questions that we pose to assess our causal ideas, biases toward and away from the null play different roles. Once we see an association, we then ask if it is causal. To provide a plausible alternative explanation for the exposure-disease relationship, the misclassification must bias away from the null. We can control for it and see what happens to the effect estimate. If the adjusted estimate is no longer appreciably different from the null value, differential misclassification is a plausible explanation for the relationship. This does not mean, however, that we should necessarily accept the adjusted estimate as the true estimate, or as more accurate than the unadjusted one. Rather, the analysis helps probe our causal idea and suggests that an identified alternative explanation is plausible.
We suspect that underlying this specific difference is a more general difference in perspective on the goal of epidemiologic studies. Rothman and Greenland state that the “overall goal of an epidemiologic study is accuracy in estimation: to estimate the value of the parameter that is the object of measurement with little error” (Rothman and Greenland 1998, p. 116). Thus, the goal is to have a true estimate of the exposure effect in the study. Our perspective, which we think we share with Shadish et al., is somewhat different. We consider the estimates in our data to be the imperfect tools we use to reach our goal of testing our theories of disease causation. What we want from our studies is evidence that will lead us to change the belief in our causal ideas. Of course we cannot do this with data that are not trustworthy, but the precise or true estimate of the effect is the means, not the end goal, of the study.
We see our effect estimates as ephemeral and context dependent. The risk ratio can change according to characteristics of persons, places, and settings. As Rothman and Greenland’s scheme makes abundantly clear, the risk ratio is dependent on the prevalence and distribution of the causal partners of the exposure. Again, it is critical for our tools to be as accurate as possible, but the tool and the goal are, for us, separate.
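The dependence of the risk ratio on context can be sketched numerically. The example below assumes, for simplicity, that there are no susceptible negatives, that the exposure causes disease only in people who carry its causal partners, and that carrying the partners is independent of being doomed by another sufficient cause. The prevalences are hypothetical and the function name is ours.

```python
def risk_ratio(partner_prevalence: float, doomed: float) -> float:
    """Risk ratio when the exposure causes disease only in people who carry
    its causal partners and are not already doomed by another sufficient
    cause. Assumes no susceptible negatives and that partner status is
    independent of being doomed."""
    risk_unexposed = doomed
    risk_exposed = doomed + (1 - doomed) * partner_prevalence
    return risk_exposed / risk_unexposed


# Same exposure, same biological effect, different hypothetical contexts.
for prevalence in (0.05, 0.20, 0.50):
    rr = risk_ratio(prevalence, doomed=0.10)
    print(f"partner prevalence {prevalence:.2f}: risk ratio = {rr:.2f}")
```

Under these assumptions the risk ratio rises as the causal partners become more common and falls as other sufficient causes become more common, even though the mechanism linking exposure to disease is unchanged.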
This distinction is reflected in a subtle difference in the definition of internal validity in Rothman and Greenland and in Shadish et al. For Rothman and Greenland, “Precision … corresponds to the reduction of random error” (p. 116), and “internal validity implies validity of inference for the source population of study subjects” (p. 118). For Shadish et al., statistical conclusion validity addresses the issue of inferences about the association between the (p.476) exposure and the disease, and internal validity is the extent to which causal attributions are correctly made for an exposure-disease association.
Precision in Rothman and Greenland encompasses only random measurement error. Internal validity includes all systematic errors, whether they serve to mask or exaggerate an association. For Shadish et al., statistical conclusion validity includes random measurement error, but also systematic errors that would bias toward the null. Internal validity, therefore, is only about alternative explanations for an association (other than the exposure causing the disease) that the data suggest. The underlying difference is whether the central concern in internal validity is validity of the effect estimate or the validity of the causal inference.
Our understanding of external validity, though generally consistent with both Rothman and Greenland and Shadish et al., is also influenced by the difference in goal between effect estimation and theory testing. We agree with Rothman and Greenland that external validity is not about the generalization of the study findings to some target population. As they state, “Scientific generalization amounts to moving from time- and place-specific observations to an abstract ‘universal’ hypothesis” (p. 124). However, we think that the way to reach this universal hypothesis is through the specification of the range of persons, places, and settings over which the risk factor has an effect. This specification amounts to identifying the causal partners of the risk factors. If a risk factor is shown to have a causal effect in any context, even a very limited idiosyncratic one, then it is a cause in the sense that if the context were recreated the disease would occur. We find the identification of contexts critically important both for public health and understanding of disease causation.
We think that organizational schemata are not about truth and falsehood, but about utility. As a framework for understanding the logic of risk factor designs, Shadish et al.’s validity scheme, we believe, meets this standard. It is a parsimonious tool for guiding us along the often concealed path toward informative studies. It has been successfully used in psychology, sociology, and intervention studies in public health, and we think that it can serve as a useful organizing scheme in psychiatric epidemiology as well. Nonetheless, the specific application to epidemiology makes the contribution of Rothman and Greenland essential. We think that their approach represents a clear distillation of epidemiologic principles that provides the localization for which Shadish et al. advocate. It is our hope that the integration of these two disciplinary traditions will aid researchers in improving the validity of their studies and ultimately in enriching their search for causes.
(1) This insight was provided by Ulka Campbell.