Empirical applications in economics often struggle with the question of how to accommodate (often binary) endogenous regressor(s) in a model aimed at capturing the relationship between the endogenous regressor(s) and an outcome variable. Problems of causal inference motivated by policy concerns involve ‘what if’ statements, and thus counterfactual outcomes. In this Appendix, we discuss the approaches followed in the present book.
A.1 THE TREATMENT CONTROL APPROACH
In some cases, it is possible to put the policy question in the treatment–control form typical of the experimental framework. The fact that the treatment is endogenous reflects the idea that the outcomes are jointly determined with the treatment status, or that there are variables related to both treatment status and outcomes. ‘Endogeneity’ makes it impossible to compare ‘treated’ and ‘non-treated’ individuals directly: no causal interpretation can be given to such a comparison, because the two groups differ irrespective of their treatment status.
A growing strand of applied economic literature has tried to identify causal effects of interventions from observational (i.e. non-experimental) studies using the conceptual framework of randomized experiments, and the so-called potential outcomes approach that allows causal questions to be translated into a statistical model.1 While it is possible to find some identification strategies for causal effects even in non-experimental settings, data alone do not suffice to identify treatment effects. Suitable assumptions, possibly based on prior information available to the researchers, are always needed.
Guarcello, Mealli, and Rosati (2003), on which we draw in Chapter 6, uses the potential outcomes approach to causal inference, based on the statistical work on randomized experiments by Fisher and Neyman, and extended by Rubin (see Holland 1986). In recent years, many economists have accepted and adopted this framework2 because of the clarity it brings to questions of causality. This approach defines a causal effect as the comparison of the potential outcomes on the same unit measured at the same time: Y(0) = the value of the outcome variable Y if the unit is exposed to treatment T = 0, and Y(1) = the value of Y if exposed to treatment T = 1. Only one of these two potential outcomes can be observed, yet causal effects are defined by their comparison, e.g. Y(1) − Y(0). Causal inference thus requires methods able to handle missing data. The focus of the analysis is usually the estimation of the average treatment effect ATE = E(Y(1) − Y(0)), or of the average treatment effect for subpopulations of individuals defined by the value of some variable, most notably the subpopulation of the treated individuals, ATT = E(Y(1) − Y(0) | T = 1).
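The distinction between the ATE, the ATT, and a naive comparison of observed outcomes can be illustrated with a small simulation. The data-generating process below is purely hypothetical (one confounder, a logistic assignment rule, and an effect that varies with the confounder); it is a sketch of the definitions, not of any model used in the book.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# A single confounder X drives both treatment take-up and the outcomes.
x = rng.normal(size=n)

# Potential outcomes Y(0) and Y(1); the unit-level effect is Y(1) - Y(0) = 1 + 0.5*X.
y0 = 1.0 + 0.5 * x + rng.normal(size=n)
y1 = y0 + 1.0 + 0.5 * x

# Endogenous assignment: units with high X are more likely to be treated.
t = (rng.uniform(size=n) < 1 / (1 + np.exp(-x))).astype(int)

# Only one potential outcome is ever observed.
y = np.where(t == 1, y1, y0)

ate = (y1 - y0).mean()                        # E[Y(1) - Y(0)] -- needs both outcomes
att = (y1 - y0)[t == 1].mean()                # E[Y(1) - Y(0) | T = 1]
naive = y[t == 1].mean() - y[t == 0].mean()   # comparison of observed means

print(f"ATE = {ate:.2f}, ATT = {att:.2f}, naive difference = {naive:.2f}")
```

Because the treated units have systematically higher X, the naive difference in observed means overstates even the ATT, which in turn exceeds the ATE.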
The assignment mechanism is a stochastic rule for assigning treatments to units and thereby for revealing Y(0) or Y(1) for each unit. This assignment mechanism can depend on other measurements, i.e. P(T = 1| Y(0), Y(1), X). If the assignment mechanism depends only on observed values, it is ignorable; if it depends on missing values, possibly even the missing potential outcomes, it is non-ignorable. Unconfoundedness is a special case of an ignorable assignment mechanism and holds when P(T = 1| Y(0), Y(1), X) = P(T = 1| X) and X is fully observed.
Unconfoundedness is similar to the so-called ‘selection on observables’ assumption (also known as exogeneity of treatment assignment), which states that the value of the regressor of interest is independent of the potential outcomes after accounting for a set of observable characteristics X. This is equivalent to assuming that exposure to treatment is random within the cells defined by the variables X. These assumptions are very strong, and their plausibility relies heavily on the amount and on the quality of the information on the individuals contained in X.
Under unconfoundedness one can identify the average treatment effect within subpopulations defined by the values of X:

E(Y(1) − Y(0) | X = x) = E(Y | T = 1, X = x) − E(Y | T = 0, X = x),

and the ATE (or the ATT) is then obtained by averaging these conditional contrasts over the appropriate distribution of X. Several estimation strategies exploit this result.
One such strategy is regression modelling. One usually starts by assuming a functional form for E(Y(t) | X = x), for example a linear one in a vector of functions of the covariates, E(Y(t) | X = x) = g(x)′βt. The estimates of the parameter vectors βt (t = 0, 1) are usually obtained by least squares or maximum likelihood methods. Unless some restrictions are imposed on the βt,3 causal effects can rarely be read directly off the parameter values (especially if the model is non-linear).
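As a sketch of this regression strategy, the following fits E(Y(t) | X = x) = g(x)′βt separately on the treated and control subsamples of simulated data (the data-generating process is hypothetical) and recovers the ATE and the ATT by averaging the imputed differences:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical data: one covariate, confounded assignment, effect 1 + 0.5*X.
x = rng.normal(size=n)
t = (rng.uniform(size=n) < 1 / (1 + np.exp(-x))).astype(int)
y = 1.0 + 0.5 * x + t * (1.0 + 0.5 * x) + rng.normal(size=n)

# g(x) = (1, x); fit E(Y(t) | X) = g(x)' beta_t by least squares, separately
# on the treated and the control subsamples.
G = np.column_stack([np.ones(n), x])
beta1, *_ = np.linalg.lstsq(G[t == 1], y[t == 1], rcond=None)
beta0, *_ = np.linalg.lstsq(G[t == 0], y[t == 0], rcond=None)

# Impute both potential outcomes for every unit, then average the differences.
diff = G @ beta1 - G @ beta0
ate_hat = diff.mean()                # regression estimate of the ATE
att_hat = diff[t == 1].mean()        # regression estimate of the ATT

print(f"ATE_hat = {ate_hat:.2f}, ATT_hat = {att_hat:.2f}")
```

Note that for units whose covariate values are rare in the other group, the imputation G @ beta relies purely on extrapolation from the fitted line, which is exactly the pitfall discussed next.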
Using regression models to ‘adjust’ or ‘control for’ pre-intervention covariates, while in principle a good strategy, has its pitfalls. For example, if there are many covariates, it can be difficult to find an appropriate specification. In addition, regression modelling obscures information on the distribution of covariates in the two treatment groups. In principle, one would like to compare individuals that have the same values for all the covariates: unless there is a substantial overlap of the covariates' distributions in the two groups, a regression model relies heavily on model specification (i.e. on extrapolation) for the estimation of treatment effects. It is therefore crucial to check the extent of the overlap between the two distributions, and the ‘region of common support’ of these distributions. When the number of covariates is large, this task is not an easy one. One approach is to reduce the problem to one dimension by using the propensity score, that is to say, the individual probability of receiving the treatment given the observed covariates, p(X) = P(T = 1| X). Under unconfoundedness the following results in fact hold (Rosenbaum and Rubin 1983a):
(i) T is independent of X given the propensity score p(X),
(ii) Y(0) and Y(1) are independent of T given the propensity score.
From (i) we can see that the propensity score has the so-called balancing property: observations with the same value of the propensity score have the same distribution of observable (and possibly unobservable) characteristics independently of treatment status. From (ii), we can see that exposure to treatment and control is random for a given value of the propensity score. These two properties allow us (a) to use the propensity score as a univariate summary of all the X when checking the overlap of the distributions of X, because it is enough to check the distribution of the propensity score in the two groups, and (b) to use the propensity score in the ATE (or ATT) estimation procedure as the single covariate that needs to be adjusted for, since adjusting for the propensity score automatically controls for all observed covariates (at least in large samples). In Chapter 6, the estimated propensity score serves purpose (a), to validate the regression results, and purpose (b), through the estimation of the ATT with a propensity-score-based matching algorithm.
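A minimal sketch of purposes (a) and (b), on simulated data with a single hypothetical covariate: the propensity score is estimated with a logit model (fitted here by a few Newton steps), the overlap of its distributions in the two groups is inspected, and the ATT is estimated by nearest-neighbour matching on the score with replacement. This illustrates the general idea, not the specific algorithm used in Chapter 6.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000

# Hypothetical data: one covariate, confounded assignment, constant effect of 1.
x = rng.normal(size=n)
t = (rng.uniform(size=n) < 1 / (1 + np.exp(-x))).astype(int)
y = 1.0 + 0.5 * x + t * 1.0 + rng.normal(size=n)

# Estimate the propensity score p(X) = P(T = 1 | X) with a logit model,
# fitted by Newton-Raphson (iteratively reweighted least squares).
G = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-G @ beta))
    w = p * (1 - p)
    beta += np.linalg.solve(G.T @ (w[:, None] * G), G.T @ (t - p))
pscore = 1 / (1 + np.exp(-G @ beta))

# (a) overlap check: both groups should span a common range of the score.
print(f"treated score range: {pscore[t == 1].min():.2f} - {pscore[t == 1].max():.2f}")
print(f"control score range: {pscore[t == 0].min():.2f} - {pscore[t == 0].max():.2f}")

# (b) nearest-neighbour matching on the score, with replacement, for the ATT:
# each treated unit is compared with the control unit closest in pscore.
ps_c, y_c = pscore[t == 0], y[t == 0]
matches = np.abs(pscore[t == 1][:, None] - ps_c[None, :]).argmin(axis=1)
att_hat = (y[t == 1] - y_c[matches]).mean()
print(f"ATT_hat = {att_hat:.2f}")
```

Matching on the single score sidesteps the need to match on all covariates simultaneously, which is the practical content of the balancing property.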
The analysis of the propensity score alone can be very informative, because it reveals the extent of the overlap between the treatment and comparison groups in terms of pre-intervention variables. The conclusion of this initial phase may be that treatment and control groups are too far apart to produce reliable estimates without heroic modelling assumptions. The propensity score itself must be estimated: if the treatment is binary, any model for binary dependent variables can be used, although the balancing property should guide the choice of specification, i.e. how the observed covariates enter the model. Some specification strategies are described in Becker and Ichino (2001) and Rubin (2002). Propensity score methods can also be extended to multiple treatments (Imbens 2000, Lechner 2001).
The assumption that the treatment assignment is ignorable, or even unconfounded, underlies much of the recent literature on the evaluation of economic policy interventions (Jalan and Ravallion 2003), so that one might have the impression that researchers no longer pay much attention to unobservables. The problem with analyses involving adjustments for unobserved covariates, such as Heckman-type corrections (Heckman and Hotz 1989), is that they tend to be quite subjective and very sensitive to distributional and functional-form assumptions, as shown in a series of theoretical and applied papers (LaLonde 1986, Dehejia and Wahba 1999, Copas and Li 1997). The adjustment for unobserved variables, moreover, relies strongly on the existence of valid instruments, i.e. on variables that are correlated with T but are otherwise independent of the potential outcomes. If such variables exist, they can be used as a source of exogenous variation to identify causal effects (Angrist and Imbens 1995, Angrist, Imbens, and Rubin 1996).
The validity of a variable as an instrument, i.e. the validity of the exclusion restrictions, cannot be directly tested, and in observational studies such variables are usually very hard to find, although there are some exceptions (see Angrist and Krueger 1999 for some examples). Thus, despite the strength of the unconfoundedness assumption, and the fact that it cannot be tested, it is very hard not to use it in observational studies. It is then crucial to adjust in the best possible way for all observed covariates, and propensity score methods can help achieve this. The issue of unobserved covariates should then be addressed using models for sensitivity analysis (e.g. Rosenbaum and Rubin 1983b) or non-parametric bounds for treatment effects (Manski 1990, Manski et al. 1992).
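To illustrate the bounds approach just mentioned: for an outcome bounded in [0, 1], worst-case bounds in the spirit of Manski (1990) are obtained by filling in the unobserved potential outcomes with their logical extremes. The sketch below uses hypothetical simulated data; note that, without further assumptions, the resulting interval always has width one.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000

# Hypothetical data: binary outcome (bounded in [0, 1]), confounded treatment.
x = rng.normal(size=n)
t = (rng.uniform(size=n) < 1 / (1 + np.exp(-x))).astype(int)
y = (rng.uniform(size=n) < 0.3 + 0.2 * t).astype(int)

p1 = t.mean()                 # P(T = 1)
ey1_obs = y[t == 1].mean()    # E[Y | T = 1]
ey0_obs = y[t == 0].mean()    # E[Y | T = 0]

# Worst-case bounds: the missing potential outcome is replaced by its
# logical extremes, 0 and 1, for the units on which it is unobserved.
ey1_lo = ey1_obs * p1               # E[Y(1)] if every untreated unit had Y(1) = 0
ey1_hi = ey1_obs * p1 + (1 - p1)    # ... if every untreated unit had Y(1) = 1
ey0_lo = ey0_obs * (1 - p1)
ey0_hi = ey0_obs * (1 - p1) + p1

ate_lo, ate_hi = ey1_lo - ey0_hi, ey1_hi - ey0_lo
print(f"no-assumption ATE bounds: [{ate_lo:.2f}, {ate_hi:.2f}]")
```

The interval always contains zero, which is why bounds of this kind are usually combined with additional, explicitly stated assumptions that tighten them.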
A.2 SENSITIVITY ANALYSIS
Many observational studies assume, implicitly or explicitly, that unobservables do not impinge on the effect of the exogenous variable of interest. This assumption (‘unconfoundedness’) lends itself to the criticism that it effectively rules out any role for the unobservable variables. Chapter 8 of this book reported the estimates in Guarcello, Mealli, and Rosati (2003) of the effects of uninsured shocks, and of credit rationing, on education and child labour. In order to check whether the results are robust, that paper applies the sensitivity analysis method proposed by Rosenbaum and Rubin (1983b), extended here to a multinomial outcome. That method allows the researcher to assess the sensitivity of the estimated causal effects to assumptions about an unobserved binary covariate associated with both the treatments and the response.
The unobservables are assumed to be summarized by a binary variable in order to simplify the analysis, although similar techniques could be used assuming other distributions for the unobservables. Note however that a Bernoulli distribution can be thought of as a discrete approximation to any distribution, and thus we believe that our distributional assumption will not severely restrict the generality of the results.
Suppose that the treatment assignment is not unconfounded given a set of observable variables X, i.e.

P(T = 1| Y(0), Y(1), X) ≠ P(T = 1| X),

but that unconfoundedness holds conditionally on X and an unobserved binary covariate U:

P(T = 1| Y(0), Y(1), X, U) = P(T = 1| X, U).
Since Y(0), Y(1), and T are conditionally independent given X and U, the joint distribution of (Y(t), T, X, U) for t = 0, 1 can be written as

P(Y(t), T, X, U) = P(Y(t) | X, U) P(T | X, U) P(U | X) P(X),

where P(T = 1| X, U) and the outcome probabilities are modelled as logit functions of X and U, with U entering the treatment equation through a coefficient α and the outcome equations through coefficients δti.
In these expressions, π represents the proportion of individuals with U = 0 in the population, and the distribution of U is assumed to be independent of X. This should render the sensitivity analysis more stringent since, if U were associated with X, controlling for X would capture at least some of the effects of the unobservables. The sensitivity parameter α captures the effect of U on treatment receipt (e.g. credit rationing), while the δti's are the effects of U on the outcome.
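The role of the sensitivity parameters can be illustrated by simulation. The sketch below uses a deliberately simplified linear setting with hypothetical parameter values (not those of Chapters 6 and 8): a binary U with P(U = 0) = π affects treatment receipt through α and the outcome through a single coefficient δ, and the treatment effect is then estimated adjusting for X only, as an analyst who does not observe U would do.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
pi = 0.5             # P(U = 0): share of the population with U = 0
true_effect = 1.0

def estimated_effect(alpha, delta):
    """Simulate data in which a binary unobservable U affects both treatment
    receipt (through alpha) and the outcome (through delta), then estimate
    the treatment effect adjusting for X only, since U is not observed."""
    x = rng.normal(size=n)
    u = (rng.uniform(size=n) < 1 - pi).astype(int)
    t = (rng.uniform(size=n) < 1 / (1 + np.exp(-(x + alpha * u)))).astype(int)
    y = 0.5 * x + delta * u + true_effect * t + rng.normal(size=n)
    # OLS of y on (1, x, t): the coefficient on t is the 'adjusted' estimate.
    G = np.column_stack([np.ones(n), x, t])
    coef, *_ = np.linalg.lstsq(G, y, rcond=None)
    return coef[2]

for alpha, delta in [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 2.0)]:
    print(f"alpha = {alpha}, delta = {delta}: estimate = {estimated_effect(alpha, delta):.2f}")
```

The estimate is unbiased whenever either α = 0 or δ = 0; the bias grows as both increase, which is exactly the pattern that a sensitivity analysis maps out over a grid of parameter values.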
By comparing the results obtained for different values of the sensitivity parameters with those obtained from the reference estimate, it is possible to assess the sensitivity of the conclusions reached with respect to the presence of unobservables. This was done in Chapters 6 and 8. The results of the sensitivity analysis carried out on the estimates presented in those chapters show that allowing for the presence of unobservables would not influence the results substantially.
(3) For example, imposing that the treatment effect is constant (i.e. excluding the interaction terms of the treatment with the other covariates).