Examining the Multilevel Model’s Error Covariance Structure
Abstract and Keywords
The previous chapters emphasized the fixed effects in the multilevel model for change. This chapter, in contrast, focuses on the model's random effects as embodied in its error covariance structure. Section 7.1 begins by reviewing the “standard” multilevel model for change, expressed in composite form. Section 7.2 examines this model's random effects, demonstrating that the composite error term is indeed both heteroscedastic and autocorrelated, as preferred for longitudinal data. Section 7.3 compares several alternative error covariance structures and provides strategies for choosing among them.
Keywords: multilevel model, individual change, composite model, error covariance matrix
Change begets change. Nothing propagates so fast.… The mine which Time has slowly dug beneath familiar objects is sprung in an instant, and what was rock before, becomes but sand and dust.
—Charles Dickens
In previous chapters, we often emphasized the fixed effects in the multilevel model for change. Doing so made great sense because the fixed effects typically provide the most direct answers to our research questions. In this chapter, in contrast, we focus on the model’s random effects as embodied in its error covariance structure. Doing so allows us both to describe the particular error covariance structure that the “standard” multilevel model for change invokes and to broaden its representation to other—sometimes more tenable—assumptions about its behavior.
We begin, in section 7.1, by reviewing the “standard” multilevel model for change, expressed in composite form. In section 7.2, we closely examine this model’s random effects, demonstrating that the composite error term is indeed both heteroscedastic and autocorrelated, as we would prefer for longitudinal data. But we also find that this error covariance structure may not be as general as we might like and, in some settings, alternatives may have greater appeal. This brings us to section 7.3, in which we compare several alternative error covariance structures and provide strategies for choosing among them.
7.1 The “Standard” Specification of the Multilevel Model for Change
Throughout this chapter, we use a small, time-structured data set first presented in Willett (1988). On each of four days, spaced exactly one week apart, 35 people completed an inventory that assesses their performance on an “opposites-naming” task.
Table 7.1: Ten cases from a person-level data set containing scores on an opposites-naming task across four occasions of measurement, obtained weekly, and a baseline measurement of COG, a measure of cognitive skill, obtained in the first week
ID   OPP1   OPP2   OPP3   OPP4   COG
01    205    217    268    302    137
02    219    243    279    302    123
03    142    212    250    289    129
04    206    230    248    273    125
05    190    220    229    220     81
06    165    205    207    263    110
07    170    182    214    268     99
08     96    131    159    213    113
09    138    156    197    200    104
10    216    252    274    298     96
We specify the “standard” multilevel model for change in the usual manner. For individual i on occasion j, we assume that opposites-naming score, Y_{ij}, is a linear function of TIME:

Y_{ij} = π_{0i} + π_{1i}TIME_{ij} + ε_{ij}   (7.1a)
To allow individual change trajectories to differ systematically across people, we posit a level-2 submodel in which the cognitive skill score (COG) is associated with both growth parameters:

π_{0i} = γ_{00} + γ_{01}(COG_i − \overline{COG}) + ζ_{0i}
π_{1i} = γ_{10} + γ_{11}(COG_i − \overline{COG}) + ζ_{1i}   (7.2a)
Table 7.2 presents the results of fitting this “standard” multilevel model for change to the opposites-naming data. Because we focus in this chapter on the model’s stochastic portion, we use restricted, not full, maximum likelihood (see section 4.3 for a comparison of methods). For an individual of average cognitive skill, initial level of opposites-naming skill is estimated to be 164.4 (p < .001); this average person’s weekly rate of linear change is estimated to be 27.0 (p < .001). Individuals whose cognitive skills differ by one point have an initial opposites-naming score that is 0.11 lower (although this decrement is not statistically significant, p = .82); their average weekly rate of linear change is 0.43 higher (p < .01). Even after including cognitive skill as a predictor of both initial status and change, we detect statistically significant level-2 residual variation in both initial status and rate of change.
Table 7.2: Change in opposites naming over a four-week period as a function of baseline cognitive skill (COG)
Parameter                                            Estimate

Fixed Effects
  Initial status, π_{0i}
    Intercept                          γ_{00}        164.37***
    (COG − \overline{COG})             γ_{01}         −0.11
  Rate of change, π_{1i}
    Intercept                          γ_{10}         26.96***
    (COG − \overline{COG})             γ_{11}          0.43**

Variance Components
  Level-1:
    Within-person variance             σ_ε^{2}       159.48***
  Level-2:
    Variance in ζ_{0i}                 σ_0^{2}      1236.41***
    Variance in ζ_{1i}                 σ_1^{2}       107.25***
    Covariance of ζ_{0i} and ζ_{1i}    σ_{01}       −178.23*

Goodness-of-fit
  Deviance                                          1260.3
  AIC                                               1268.3
  BIC                                               1274.5

~ p < .10; * p < .05; ** p < .01; *** p < .001.
Parameter estimates, approximate p-values, and goodness-of-fit statistics from fitting a standard multilevel model for change (n = 35).
Note: SAS PROC MIXED, Restricted ML.
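To make the fitted model concrete, the following sketch (illustrative Python, not from the original text) evaluates the fixed-effects portion of the fitted trajectory using the table 7.2 estimates. The sample mean of COG used for centering is a hypothetical placeholder; the text centers COG but does not report its mean.

```python
# Fixed-effects portion of the fitted "standard" model, using the
# REML estimates from table 7.2.
G00, G01, G10, G11 = 164.37, -0.11, 26.96, 0.43

# Hypothetical placeholder for the sample mean of COG used in centering.
COG_MEAN = 113.0

def predicted_opposites(time, cog):
    """Population-average opposites-naming score at TIME (weeks, 0-3)
    for a person with baseline cognitive skill `cog`."""
    c = cog - COG_MEAN
    return (G00 + G01 * c) + (G10 + G11 * c) * time

# A person of average cognitive skill starts at 164.37 and gains 26.96
# points per week; each extra point of COG raises the weekly gain by 0.43.
```
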
7.2 Using the Composite Model to Understand Assumptions about the Error Covariance Matrix
To understand the error covariance structure in the “standard” multilevel model for change, we move to the composite representation obtained by collapsing the level-2 submodels in equation 7.2a into the level-1 submodel in equation 7.1a:

Y_{ij} = [γ_{00} + γ_{10}TIME_{ij} + γ_{01}(COG_i − \overline{COG}) + γ_{11}(COG_i − \overline{COG})TIME_{ij}] + [ζ_{0i} + ζ_{1i}TIME_{ij} + ε_{ij}]   (7.4)
As in section 4.2, brackets distinguish the model’s structural and stochastic portions. Its structural portion contains our hypotheses about the way that opposites-naming skill changes with time and depends on baseline cognitive skill. Its stochastic portion contains the composite residual, which we now label r_{ij} for convenience. The value of the composite residual for individual i on occasion j is:

r_{ij} = ε_{ij} + ζ_{0i} + ζ_{1i}TIME_{ij}   (7.5)
This composite residual has several important properties, which we examine shortly. But first, let us simplify the composite model in equation 7.4 by substituting r_{ij} as defined in equation 7.5 into equation 7.4:

Y_{ij} = γ_{00} + γ_{10}TIME_{ij} + γ_{01}(COG_i − \overline{COG}) + γ_{11}(COG_i − \overline{COG})TIME_{ij} + r_{ij}   (7.6)
Because of the special nature of r_{ij}, our standard practice is to fit the model in equation 7.6 by GLS regression analysis, not OLS, making specific assumptions about the distribution of the residuals. But before doing so, let’s suppose for a hypothetical moment that we were willing to invoke the simpler OLS assumptions that all the r_{ij} are independent and normally distributed, with zero means and homoscedastic variance (σ^{2}, say). We could codify these simple distributional assumptions for all the residuals simultaneously in one grand statement:

(r_{11}, r_{12}, …, r_{14}, r_{21}, …, r_{n4})′ ~ N(0, σ^{2}I),   (7.7)

where 0 is a 4n × 1 vector of zeros and σ^{2}I is a 4n × 4n diagonal error covariance matrix with σ^{2} in every diagonal cell and zeros elsewhere.
While equation 7.7 may appear needlessly complex for describing the behavior of the residuals in an OLS analysis, it provides a convenient and generalizable form for codifying assumptions about residuals that we will find useful later. It says that the complete set of residuals in the analysis has a multivariate normal distribution. The statement has several important features:

• It contains a vector of random variables whose distribution is being specified. To the left of the “is distributed as” sign (“~”) is a column vector containing all the random variables whose distribution is being specified. This vector contains all the model’s residuals, which, for our data, run from the four residuals for person 1 (r_{11}, r_{12}, r_{13}, and r_{14}), to those for person 2 (r_{21}, r_{22}, r_{23}, r_{24}), and so on, through the four residuals for person n.

• It states the distribution type. Immediately after the ~, we stipulate that every element in the residual vector is normally distributed (“N”). Because the vector has many (“multi”) entries, the residuals have a multivariate normal distribution.

• It contains a vector of means. Also to the right of the ~, inside the parentheses and before the comma, is a vector of hypothesized means, one for each residual. All these elements are 0, reflecting our belief that the population mean of each residual is 0.

• It contains an error (or residual) covariance matrix. The last entry in equation 7.7 is the error covariance matrix, which contains our hypotheses about the residual variances and covariances. Under classical OLS assumptions, this matrix is diagonal—all elements are zero, except those along the main diagonal. The off-diagonal zero values represent the residual independence assumption, which stipulates that the residuals do not covary. Along the diagonal, all residuals have an identical population variance, σ^{2}. This is the residual homoscedasticity assumption.
Longitudinal data demand a richer specification, in which each person’s residuals may covary and have distinct variances, while residuals remain independent across people:

(r_{11}, r_{12}, …, r_{n4})′ ~ N(0, V), where V is block diagonal, with the 4 × 4 block

[ σ_{r_1}^{2}    σ_{r_1 r_2}   σ_{r_1 r_3}   σ_{r_1 r_4} ]
[ σ_{r_1 r_2}   σ_{r_2}^{2}    σ_{r_2 r_3}   σ_{r_2 r_4} ]
[ σ_{r_1 r_3}   σ_{r_2 r_3}   σ_{r_3}^{2}    σ_{r_3 r_4} ]
[ σ_{r_1 r_4}   σ_{r_2 r_4}   σ_{r_3 r_4}   σ_{r_4}^{2}  ]

repeated once per person down the main diagonal and zeros everywhere else.   (7.8)

This new distributional specification allows the residuals in the composite model to have a multivariate normal distribution with zero means and a block diagonal, not diagonal, error covariance structure. The term “block diagonal” means that all the matrix’s elements are zero, except those within the “blocks” arrayed along the diagonal, one per person. The zero elements outside the blocks indicate that each person’s residuals are independent of all others’—in other words, the residuals for person i have zero covariance with everyone else’s residuals. But the nonzero covariance parameters within each block allow the residuals to covary within person. In addition, the multiple distinct parameters along each block’s diagonal allow the variances of the within-person residuals to differ across occasions. These distinctions between the diagonal and the block diagonal error covariance matrices mark the fundamental difference between a cross-sectional and a longitudinal design.^{1}
Notice that the blocks of the error covariance matrix in equation 7.8 are identical across people. This homogeneity assumption says that, in an analysis of change, although the composite residuals may be heteroscedastic and dependent within people, the entire error structure is repeated identically across people—that is, everyone’s residuals are identically heteroscedastic and autocorrelated. This assumption is not strictly necessary; it can be tested and relaxed in limited ways (provided you have sufficient data). Yet we typically invoke it for practical reasons, as it dramatically improves the parsimony with which we can specify the model’s stochastic portion. Limiting the number of unique variance/covariance components in a hypothesized model also speeds the convergence of iterative model fitting. If we allowed each person in this study to possess a unique set of variance components, for example, we would need to estimate 10n variance components—6n more than the number of observations on the outcome in the person-period data set!
Adopting the homogeneity assumption allows us to express the distributional assumptions in equation 7.8 in more parsimonious terms by writing:

(r_{11}, r_{12}, …, r_{n4})′ ~ N(0, blockdiag(Σ_r, Σ_r, …, Σ_r)),   (7.9)

where the common within-person error covariance submatrix Σ_r appears once per person along the diagonal.
When you investigate random effects in the analysis of change, you anticipate that the composite residuals will have a multivariate distributional form like equation 7.8 or 7.9. As part of your analyses, you estimate the elements of this error covariance matrix, which means that—under the homogeneity assumption—you estimate the elements of the error covariance submatrix Σ_r in equation 7.10:

Σ_r =
[ σ_{r_1}^{2}    σ_{r_1 r_2}   σ_{r_1 r_3}   σ_{r_1 r_4} ]
[ σ_{r_1 r_2}   σ_{r_2}^{2}    σ_{r_2 r_3}   σ_{r_2 r_4} ]
[ σ_{r_1 r_3}   σ_{r_2 r_3}   σ_{r_3}^{2}    σ_{r_3 r_4} ]
[ σ_{r_1 r_4}   σ_{r_2 r_4}   σ_{r_3 r_4}   σ_{r_4}^{2}  ]   (7.10)
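The block-diagonal, person-by-person structure just described can be sketched in a few lines of Python; the within-person block entries below are placeholder values for illustration, not estimates from the text.

```python
# Build the full error covariance matrix by repeating one symmetric 4x4
# within-person block down the diagonal (homogeneity assumption), with
# zeros everywhere else (cross-person independence).

def block_diagonal(block, n_people):
    """Repeat a square within-person block n_people times down the diagonal."""
    k = len(block)
    size = k * n_people
    full = [[0.0] * size for _ in range(size)]
    for p in range(n_people):
        for i in range(k):
            for j in range(k):
                full[p * k + i][p * k + j] = block[i][j]
    return full

# Placeholder symmetric block: variances on the diagonal, covariances off it.
sigma_r = [[10.0,  4.0,  3.0,  2.0],
           [ 4.0, 12.0,  5.0,  3.0],
           [ 3.0,  5.0, 15.0,  6.0],
           [ 2.0,  3.0,  6.0, 20.0]]
full = block_diagonal(sigma_r, 35)   # 140 x 140 for n = 35 people
```

Residuals from different people never covary: every cell outside the 35 blocks is zero, while cells inside a block carry the shared within-person variances and covariances.
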
This specification of the Σ_r error covariance submatrix—and hence the shape of the full error covariance matrix—is very general. It contains a set of error variance and covariance parameters (four of the former and six of the latter, for the opposites-naming data), each of which can take on an appropriate value. But when you specify a particular multilevel model for change, you invoke specific assumptions about these values. Most important for our purposes here is that the “standard” multilevel model for change invokes a specific mathematical structure for the r_{ij}. As we show below, this model constrains the error covariance structure much more than that specified in equations 7.9 and 7.10.
What does the error covariance submatrix Σ_{r} of the “standard” multilevel model for change look like? When we presented this model earlier in the book, we focused on its ability to represent hypotheses about fixed effects. Does it also provide a reasonable covariance structure for the composite residuals? Fortunately, most of its behavior is exactly what you would hope and expect. First, because a weighted linear combination of normally distributed variables is also normally distributed, for example, each composite residual in equation 7.5 is also normally distributed, as specified in equation 7.9. Second, because the mean of a weighted linear combination of random variables is equal to an identically weighted linear combination of the means of those variables, the mean of the composite residual in equation 7.5 must also be zero, as specified in equation 7.9. Third, the error covariance matrix of the composite residuals is indeed block diagonal, as specified in equation 7.9. But fourth, in the standard multilevel model for change, the elements of the Σ_{r} error covariance blocks in equations 7.9 and 7.10 possess a powerful dependence on time. As this is both the most interesting—and potentially troublesome—aspect of the standard model, we delve into this feature in some detail below.
7.2.1 Variance of the Composite Residual
We begin by examining what the “standard” multilevel model for change hypothesizes about the composite residual’s variance. Straightforward algebraic manipulation of r_{ij} in equation 7.5 provides an equation for the diagonal elements of the error covariance submatrix Σ_r in equation 7.10, in terms of TIME and the model’s variance components. Under the standard multilevel model for change, the population variance of the composite residual at TIME t_j is:

σ_{r_j}^{2} = σ_ε^{2} + σ_0^{2} + 2σ_{01}t_j + σ_1^{2}t_j^{2}   (7.11)
Based on the algebraic representation in equation 7.11, what can we say about the general temporal dependence of composite residual variance in the “standard” multilevel model for change? We can gain insight into this question by completing the square in equation 7.11:

σ_{r_j}^{2} = [σ_ε^{2} + σ_0^{2} − (σ_{01}^{2}/σ_1^{2})] + σ_1^{2}[t_j + (σ_{01}/σ_1^{2})]^{2}   (7.12)

Composite residual variance is thus a quadratic function of time, with a single minimum at t_j = −σ_{01}/σ_1^{2}.
So, ask yourself! Does it make sense to assume, in real data—as the “standard” multilevel model for change does implicitly—that composite residual variance increases parabolically over time from a single minimum? For the “standard” model to make sense, and be applied in the real world, your answer must be yes. But are other patterns of heteroscedasticity possible (or likely)? In longitudinal data, might residual heteroscedasticity possess both a minimum and a maximum? Might there be even multiple minima and maxima? Might composite residual variance decline from a maximum, on either side of some fiducial time, rather than increasing from a minimum? Although compelling, none of these options is possible under the “standard” multilevel model for change.
Before concluding that the model we have spent so long developing is perhaps untenable because of the restriction it places on the error covariance matrix, let us quickly offer some observations that we hope will assuage your concerns. Although the “standard” multilevel model for change assumes that composite residual variance increases parabolically from a minimum with time, the temporal dependence of residual heteroscedasticity need not be markedly curved. The magnitude of the curvature depends intimately on the magnitude of the model’s variance/covariance components. If all three level-2 components—σ_0^{2}, σ_1^{2}, and σ_{01}—are near zero, for example, the error covariance matrix is actually close to homoscedastic, with common variance σ_ε^{2}. Or, if the level-2 residual slope variability, σ_1^{2}, and the residual initial status/slope covariance, σ_{01}, are near zero, composite residual variance will still be homoscedastic, but with common variance σ_ε^{2} + σ_0^{2}. In both cases, the “curvature” of the parabolic temporal dependence approaches zero and heteroscedasticity flattens.
In our own experience, these situations are common. The first occurs when the level-2 predictors “explain” most, or all, of the between-person variation in initial status and rate of change. The second occurs when the slopes of the change trajectories do not differ much across people—a common occurrence when study duration is short. Finally, as the sizes of the residual slope variance, σ_1^{2}, and the initial status/slope covariance, σ_{01}, differ relative to one another, the time at which minimum residual variance occurs can easily move beyond the temporal limits of the period of observation. When this happens, which is often, no minimum is evident within the period of observation, and the composite residual variance appears to either increase or decrease monotonically over the time period under study. We conclude from these special cases and the general temporal dependence of the residual variance that, while the composite residual variance is indeed functionally constrained in the “standard” multilevel model for change, it is also capable of adapting itself relatively smoothly to many common empirical situations. Nonetheless, in any analysis of change, it makes great sense to check the hypothesized structure of the error covariance matrix—whether obtained implicitly, by adopting the standard model, or not—against data, just as it is important to check the tenability of the hypothesized structure of the fixed effects. We illustrate the checking process in section 7.3.
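The two algebraic forms of the composite residual variance discussed in this section can be checked numerically. The sketch below (illustrative Python, using the REML variance-component estimates from table 7.2) evaluates the direct form σ_ε^{2} + σ_0^{2} + 2σ_{01}t + σ_1^{2}t^{2} and its completed square at each wave, and locates the parabola's minimum.

```python
# Composite residual variance under the "standard" model, evaluated two
# ways, using the REML estimates from table 7.2.
S_EPS = 159.48    # level-1 within-person variance
S_0 = 1236.41     # variance of zeta_0i
S_1 = 107.25      # variance of zeta_1i
S_01 = -178.23    # covariance of zeta_0i and zeta_1i

def var_direct(t):
    return S_EPS + S_0 + 2 * S_01 * t + S_1 * t * t

def var_completed_square(t):
    return (S_EPS + S_0 - S_01 ** 2 / S_1) + S_1 * (t + S_01 / S_1) ** 2

# The two forms agree at every wave (TIME = 0, 1, 2, 3)...
for t in (0, 1, 2, 3):
    assert abs(var_direct(t) - var_completed_square(t)) < 1e-6

# ...and the single minimum sits at t = -sigma_01 / sigma_1^2,
# about 1.66 weeks into this study.
t_min = -S_01 / S_1
assert var_direct(t_min) <= min(var_direct(t) for t in (0, 1, 2, 3))
```
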
7.2.2 Covariance of the Composite Residuals
We now examine the temporal dependence in the covariance of the composite residuals in the “standard” multilevel model for change. These covariances appear in the off-diagonal elements of the error covariance submatrix Σ_r in equation 7.10. Again, mathematical manipulation of the composite residual in equation 7.5 provides the covariance between composite residuals at times t_j and t_{j′}:

σ_{r_j r_{j′}} = σ_0^{2} + σ_{01}(t_j + t_{j′}) + σ_1^{2}t_j t_{j′}   (7.13)
The expression for the covariance between composite residuals in equation 7.13 and the estimated error covariance matrix in equation 7.14 allow us to make some general comments about the temporal dependence of the composite residual covariance in the “standard” multilevel model for change. The dependence is powerful, principally because the covariance contains the product of pairs of times (the third term in equation 7.13). This product dramatically affects the magnitude of the error covariance when time values are large. Special cases are also evident in equation 7.14: the magnitude of the error covariance depends on the magnitudes of the three level-2 variance components. If all three level-2 components are close to zero, the composite residual covariances will also be near zero and the error covariance matrix in equations 7.9 and 7.10 becomes diagonal (in addition to being homoscedastic, as described in section 7.2.1). Regular OLS assumptions then apply, even for longitudinal data. Similarly, if the level-2 residual slope variability, σ_1^{2}, and the residual initial status/slope covariance, σ_{01}, are both vanishingly small, then the composite residual covariance takes on a constant value, σ_0^{2}. In this case, the error covariance matrix is compound symmetric, with the following structure:

Σ_r =
[ σ_ε^{2} + σ_0^{2}   σ_0^{2}            σ_0^{2}            σ_0^{2}           ]
[ σ_0^{2}            σ_ε^{2} + σ_0^{2}   σ_0^{2}            σ_0^{2}           ]
[ σ_0^{2}            σ_0^{2}            σ_ε^{2} + σ_0^{2}   σ_0^{2}           ]
[ σ_0^{2}            σ_0^{2}            σ_0^{2}            σ_ε^{2} + σ_0^{2}  ]   (7.15)
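A short sketch can assemble the within-person error covariance block implied by the “standard” model from the table 7.2 estimates, with diagonal entries following the variance formula of section 7.2.1 and off-diagonal entries following the covariance formula above (illustrative Python, not from the text).

```python
# Fitted within-person error covariance block implied by the "standard"
# model, using the table 7.2 variance-component estimates and
# TIME = 0, 1, 2, 3.
S_EPS, S_0, S_1, S_01 = 159.48, 1236.41, 107.25, -178.23
TIMES = [0, 1, 2, 3]

def cov_r(tj, tk):
    """Cov(r_ij, r_ik); the tj == tk case adds the level-1 variance."""
    c = S_0 + S_01 * (tj + tk) + S_1 * tj * tk
    return c + S_EPS if tj == tk else c

sigma_r = [[cov_r(tj, tk) for tk in TIMES] for tj in TIMES]
# The block is symmetric, and its covariances depend on the actual
# times involved, not merely the lag between them:
assert sigma_r[0][1] == sigma_r[1][0]
assert sigma_r[0][1] != sigma_r[1][2]
```
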
7.2.3 Autocorrelation of the Composite Residuals
Finally, for descriptive purposes, we can also estimate the autocorrelations imposed among the composite residuals in the “standard” multilevel model for change. Applying the usual formula for computing a correlation coefficient from two variances and their covariance, we have:

ρ_{r_j r_{j′}} = σ_{r_j r_{j′}} / (σ_{r_j} σ_{r_{j′}})   (7.16)
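Combining the variance and covariance formulas from the preceding sections gives the implied autocorrelations. The sketch below (illustrative Python, table 7.2 estimates) confirms that every implied autocorrelation is a proper correlation, bounded by 1 in magnitude.

```python
import math

# Autocorrelation of the composite residuals: the covariance between two
# occasions scaled by the two standard deviations, using the table 7.2
# REML estimates.
S_EPS, S_0, S_1, S_01 = 159.48, 1236.41, 107.25, -178.23

def resid_var(t):
    return S_EPS + S_0 + 2 * S_01 * t + S_1 * t * t

def resid_cov(tj, tk):
    return S_0 + S_01 * (tj + tk) + S_1 * tj * tk

def resid_corr(tj, tk):
    return resid_cov(tj, tk) / math.sqrt(resid_var(tj) * resid_var(tk))

# Every implied autocorrelation lies between -1 and 1:
for tj in range(4):
    for tk in range(4):
        if tj != tk:
            assert -1.0 <= resid_corr(tj, tk) <= 1.0
```
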
7.3 Postulating an Alternative Error Covariance Structure
To postulate an appropriate multilevel model for change, any properties imposed on the model’s composite residual—either implicitly, by the assumptions of the model itself, or explicitly—must match those required by the data. In specifying the model’s stochastic portion, you should allow for heteroscedasticity and autocorrelation among the composite residuals. But what type of heteroscedasticity and autocorrelation makes the most sense? Is the composite residual, as specified by default in the “standard” multilevel model for change, uniformly appropriate? Do its random effects always have the properties required of real-world residuals in the study of change? If you can answer yes to these questions, the “standard” multilevel model for change makes sense. But to determine whether you can safely answer yes, it is wise to evaluate the credibility of some plausible alternative error covariance structures, as we do now.
Fortunately, it is easy to specify alternative covariance structures for the composite residual and determine analytically which specification—the “standard” or an alternative—fits best. You already possess the analytic tools and skills needed for this work. After hypothesizing alternative models—as we describe below—you can use familiar goodness-of-fit statistics (deviance, AIC, and BIC) to compare their performance. Each model will have identical fixed effects but a different error covariance structure. The main difficulty you will encounter is not the analysis itself but identifying which error structures to investigate from among the dizzying array of options.
Table 7.3 presents six particular error covariance structures that we find to be the most useful in longitudinal work: unstructured, compound symmetric, heterogeneous compound symmetric, autoregressive, heterogeneous autoregressive, and Toeplitz. The table also presents the results of fitting the multilevel model for change in equation 7.6 to the opposites-naming data, imposing each of the designated error structures, along with selected output from these analyses: goodness-of-fit statistics; parameter estimates for the variance components, with approximate p-values; and the fitted error covariance matrix of the composite residual, Σ̂_r. As in table 7.2, we fit these models with SAS PROC MIXED and restricted ML. Because each model has identical fixed effects, we could have used either full or restricted methods to compare them. We chose restricted methods because the resulting goodness-of-fit statistics then reflect only the fit of the model’s stochastic portion, which is our focus here.
You compare these models in the usual way. A smaller deviance statistic indicates better fit, but because an improvement generally requires additional parameters, you must either formally test the hypotheses (if the models are nested) or use AIC and BIC statistics. Both penalize the loglikelihood of the fitted model for the number of parameters estimated, with the BIC exacting a higher penalty for increased complexity. The smaller the AIC and BIC statistics, the better the model fits.
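The arithmetic linking deviance to AIC and BIC is simple enough to verify by hand. The sketch below (illustrative Python) assumes, consistently with the statistics reported in this chapter, that each criterion adds a penalty per estimated variance/covariance parameter and that BIC's penalty uses the log of the number of subjects (n = 35 here).

```python
import math

def aic(deviance, n_params):
    """AIC: deviance plus 2 per estimated variance/covariance parameter."""
    return deviance + 2 * n_params

def bic(deviance, n_params, n_subjects):
    """BIC: deviance plus log(n) per parameter, n = number of subjects."""
    return deviance + n_params * math.log(n_subjects)

# The "standard" model estimates 4 variance/covariance components, and its
# deviance is 1260.3 (table 7.2); these reproduce the reported AIC and BIC.
print(round(aic(1260.3, 4), 1))      # 1268.3
print(round(bic(1260.3, 4, 35), 1))  # 1274.5
```

The same arithmetic reproduces the unstructured model's reported statistics (deviance 1255.8 with 10 parameters gives AIC 1275.8 and BIC ≈ 1291.3), which makes the penalties' role in the comparisons below easy to see.
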
7.3.1 Unstructured Error Covariance Matrix
An unstructured error covariance matrix is exactly what you would anticipate from its name: it has a completely general structure, in which each element of Σ_r is free to take on its own value, estimated from the data.
Table 7.3: Selection of alternative error covariance matrices for use with the multilevel model for change in opposites naming, including goodness-of-fit statistics, variance component estimates, and fitted error covariance matrices
Columns: Description; Hypothesized error covariance structure, Σ_r; −2LL; AIC; BIC; Parameter; Estimate; Fitted error covariance matrix, Σ̂_r.
~p < .10; * p < .05; ** p < .01; *** p < .001.
Note: SAS PROC MIXED, Restricted ML.
The great appeal of an unstructured error covariance structure is that it places no restrictions on the structure of Σ_r. For a given set of fixed effects, its deviance statistic will always be the smallest of any error covariance structure. If you have just a few waves of data, this choice can be attractive. But if you have many waves, it can require an exorbitant number of parameters. For 20 waves, you would need 20 variance parameters and 190 covariance parameters—210 parameters in all—whereas the “standard” model requires only three variance components (σ_ε^{2}, σ_0^{2}, and σ_1^{2}) and one covariance component, σ_{01}.
In most analyses, a more parsimonious structure is desirable. Yet because the unstructured error covariance model always has the lowest deviance statistic, we usually begin exploratory comparisons here. For the oppositesnaming data, we find a deviance statistic of 1255.8 for this model, about 4.5 points less than that for the “standard” model. But this modest improvement uses up 10 degrees of freedom (as opposed to the 4 in the “standard” model). It should come as no surprise then that the AIC and BIC statistics, which both penalize us for overuse of unknown parameters, are larger under this assumption than they are under the “standard” multilevel model (1275.8 vs. 1268.3 for AIC; 1291.3 vs. 1274.5 for BIC). So, of the two potential error structures, we prefer the “standard” to the unstructured. The excessive size of BIC, in particular (it is 16.8 points larger!), suggests that we are “wasting” considerable degrees of freedom in choosing an unstructured form for Σ_{r}.
7.3.2 Compound Symmetric Error Covariance Matrix
A compound symmetric error covariance matrix requires just two parameters, labeled σ^{2} and σ_1^{2} in table 7.3. Under compound symmetry, the diagonal elements of Σ_r are homoscedastic (with variance σ^{2} + σ_1^{2}) on all occasions, and all pairs of residuals have a constant covariance, σ_1^{2}, regardless of the times with which they are associated.
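A compound symmetric block is easy to construct directly from its two parameters; the values below are placeholders for illustration, not the table 7.3 estimates.

```python
# Compound symmetric within-person block: constant variance on the
# diagonal, one constant covariance everywhere off it.

def compound_symmetric(var_within, cov_between, k=4):
    """k x k block with (var_within + cov_between) on the diagonal and
    cov_between in every off-diagonal cell."""
    return [[(var_within + cov_between) if i == j else cov_between
             for j in range(k)] for i in range(k)]

cs_block = compound_symmetric(160.0, 1200.0)  # placeholder parameter values
assert cs_block[0][0] == cs_block[3][3] == 1360.0
assert cs_block[0][3] == cs_block[1][2] == 1200.0
```
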
As we would expect, this model fits less well than the multilevel model with an unstructured Σ_r. But it also fits less well than the “standard” multilevel model. All three of its goodness-of-fit statistics are much larger: deviance is 26.7 points larger, AIC is 22.7 points larger, and BIC is 19.7 points larger. Interestingly, as specified in equation 7.15, a compound symmetric Σ_r is a special case of the “standard” model, when there is little or no residual variation (and hence no residual covariation) in the true slopes of the change trajectories across people. Since we know, from hypothesis tests in table 7.2, that the residual slope variability and covariability are not zero for these data, it comes as no surprise that compound symmetry is not an acceptable error covariance structure for these data. This form is most attractive, then, when you find little or no residual variance in slopes among the individual change trajectories.
7.3.3 Heterogeneous Compound Symmetric Error Covariance Matrix
The third error covariance matrix in table 7.3 is heterogeneous compound symmetric. In our example, this extension of the compound symmetric structure requires five parameters. Under heterogeneous compound symmetry, the diagonal elements of Σ_r are heteroscedastic (with variances σ_1^{2}, σ_2^{2}, σ_3^{2}, and σ_4^{2}, one per occasion, for these data). In addition, all pairs of errors have their own covariance (you can see this most easily in the fitted error covariance matrix in table 7.3). Specifically, these covariances are the products of the corresponding error standard deviations and a constant error autocorrelation parameter, labeled ρ, whose magnitude is always less than or equal to unity.
Based on the deviance statistics alone, a model with a heterogeneous compound symmetric Σ_r fits the opposites-naming data better than a compound symmetric model (1285.0 vs. 1287.0), but still not as well as the “standard” (1285.0 vs. 1260.3). So, too, the AIC and BIC statistics penalize the heterogeneous compound symmetry model for its additional parameters (AIC = 1295.0; BIC = 1302.7), over both the compound symmetric and the “standard.” We conclude that the heterogeneous compound symmetric model is probably less acceptable—for these data—than any other multilevel model fit so far.
7.3.4 Autoregressive Error Covariance Matrix
The fourth potential error covariance matrix in table 7.3 has an autoregressive (specifically, first-order autoregressive) structure. Many researchers are drawn to an autoregressive error structure because its “band-diagonal” shape seems appropriate for growth processes. When Σ_r is first-order autoregressive, the elements on its main diagonal are homoscedastic (with variance σ^{2}). In addition, pairs of errors have identical covariances in bands parallel to the leading diagonal (again, examine the fitted error covariance matrix in table 7.3). These covariances are the product of the residual variance, σ^{2}, and an error autocorrelation parameter, labeled ρ, whose magnitude is again always less than or equal to unity. Error variance σ^{2} is multiplied by ρ to provide the error covariances in the first band immediately below the leading diagonal, by ρ^{2} in the band beneath that, by ρ^{3} in the band beneath that, and so on. Thus, because the magnitude of ρ is always fractional, the error covariances in the bands of Σ_r decline the further you go from the leading diagonal. Although an autoregressive Σ_r “saves” considerable degrees of freedom—it requires only two parameters—its elements are tightly constrained: the common covariance in each band must be the same fixed fraction, ρ, of the common covariance in the band before it.
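The first-order autoregressive block can be sketched the same way; the parameter values below are placeholders for illustration.

```python
# First-order autoregressive within-person block: sigma^2 on the diagonal,
# sigma^2 * rho^|j - k| in the bands parallel to it.

def ar1(sigma2, rho, k=4):
    return [[sigma2 * rho ** abs(i - j) for j in range(k)] for i in range(k)]

ar_block = ar1(1300.0, 0.8)  # placeholder parameter values
# Each band is the same fixed fraction (rho) of the band before it:
assert abs(ar_block[0][2] / ar_block[0][1] - 0.8) < 1e-9
assert abs(ar_block[0][3] / ar_block[0][2] - 0.8) < 1e-9
```
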
Although the autoregressive model fits the oppositesnaming data reasonably well, its constraints on the variance components’ relative magnitudes prevent it from fitting as well as the “standard” multilevel model for change. Both the deviance statistic (1265.9) and AIC statistic (1269.9) are slightly larger than their peers in the “standard” multilevel model. On the other hand, the BIC statistic is slightly smaller in this model than it is in the “standard” (1273.0 for the former, 1274.5 for the latter), owing to the burden of additional parameters (4 vs. 2) in the “standard.” Interestingly, although it cannot compete in a world of deviance, the autoregressive model is superior to the unstructured model according to AIC and BIC (as you might expect, given the number of the unknown parameters required by each model, 2 vs. 10).
7.3.5 Heterogeneous Autoregressive Error Covariance Matrix
The heterogeneous autoregressive error structure is a relaxed version of the strict autoregressive structure just described. Its main diagonal elements are heteroscedastic (with variances σ_1^{2}, σ_2^{2}, σ_3^{2}, and σ_4^{2} for the four waves here). In addition, the bands of constant covariances between pairs of errors that appeared parallel to the main diagonal in the regular autoregressive model are now free to differ in magnitude along the bands (again, examine the fitted error covariance matrix in table 7.3). This is achieved by multiplying the same error autocorrelation parameter, ρ, that appeared above by the product of the relevant error standard deviations. Thus, the band-diagonal structure—with the magnitudes of the covariances declining across the bands, away from the main diagonal—is somewhat preserved, but loosened by the inclusion of additional variance components. A model with a heterogeneous autoregressive Σ_r spends additional degrees of freedom, but benefits from additional flexibility over its simpler sibling.
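The heterogeneous relaxation just described replaces the single σ with one standard deviation per occasion; again, the parameter values below are illustrative placeholders.

```python
# Heterogeneous first-order autoregressive block: each occasion gets its
# own standard deviation, and cell (j, k) is sd_j * sd_k * rho^|j - k|.

def hetero_ar1(sds, rho):
    k = len(sds)
    return [[sds[i] * sds[j] * rho ** abs(i - j) for j in range(k)]
            for i in range(k)]

har_block = hetero_ar1([36.0, 34.0, 35.0, 38.0], 0.8)  # placeholder values
assert har_block[0][0] == 36.0 ** 2            # heteroscedastic diagonal
assert abs(har_block[0][1] - 979.2) < 1e-6     # bands now vary in magnitude
```
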
As you might expect, the model with heterogeneous autoregressive error structure benefits in terms of the deviance statistic over the homogeneous autoregressive case, but can be penalized from the perspective of AIC and BIC. For these data, the model with heterogeneous autoregressive Σ_{r} fits less well than the “standard” multilevel model for change. Notice that, in the heterogeneous autoregressive model, the deviance statistic (1264.8), AIC (1274.8), and BIC (1282.6) are all larger than the equivalent statistics in the “standard.” As with its homogeneous sibling, although it cannot compete in terms of deviance, the heterogeneous autoregressive model is superior to the unstructured model according to both the AIC and BIC statistics.
7.3.6 Toeplitz Error Covariance Matrix
For the opposites-naming data, the Toeplitz error covariance structure represents a superior option. The Toeplitz structure has some of the characteristics of the autoregressive structure, in that it has bands of identical covariances arrayed parallel to the main diagonal. However, these elements are not forced to be an identical fraction of the elements in the prior band. Instead, their magnitudes within each band are determined by the data and are not constrained to stand in identical ratios to one another. For the opposites-naming data, we need four variance components to specify a Toeplitz structure (σ^{2}, σ_1, σ_2, and σ_3 in table 7.3), and so Σ_r is more flexible than the homogeneous autoregressive structure but more parsimonious than its heterogeneous sibling.
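A Toeplitz block differs from the autoregressive one only in how the bands are filled: each lag gets its own freely estimated value rather than a power of ρ. The values below are placeholders for illustration.

```python
# Toeplitz within-person block: covariances constant along each band, with
# one free parameter per lag.

def toeplitz(band_values):
    """band_values[0] is the common variance; band_values[d] is the
    common covariance at lag d."""
    k = len(band_values)
    return [[band_values[abs(i - j)] for j in range(k)] for i in range(k)]

toep_block = toeplitz([1300.0, 1000.0, 900.0, 850.0])  # placeholder values
# Constant within each band...
assert toep_block[0][1] == toep_block[1][2] == toep_block[2][3] == 1000.0
# ...but band-to-band ratios are free, unlike AR(1):
assert toep_block[0][2] / toep_block[0][1] != toep_block[0][3] / toep_block[0][2]
```
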
For these data, a Toeplitz error covariance structure fits better than the “standard” multilevel model for change and better than all other error covariance structures we have tested, regardless of which goodness-of-fit statistic you consult. The deviance statistic is 1258.1 (as opposed to 1260.3), AIC is 1266.1 (as opposed to 1268.3), and BIC is 1272.3 (as opposed to 1274.5). As we discuss below, however, these differences in goodness-of-fit are relatively small.
7.3.7 Does Choosing the “Correct” Error Covariance Structure Really Matter?
The error covariance structures presented in table 7.3 are but the beginning. Even though the Toeplitz structure appears marginally more successful than the implicit error covariance structure of the “standard” multilevel model for change, it is entirely possible that other error structures would be superior for these data. Such is the nature of all data analysis. In fitting these alternative models, we have refined our estimates of the variance components and have come to understand the model’s stochastic component better. We would argue that, for these data, the “standard” multilevel model for change performs well: its deviance, AIC, and BIC statistics are only marginally worse than those of the Toeplitz model. The difference in BIC statistics, 2.2 points, is so small that, adopting Raftery’s (1995) guidelines, we would conclude that there is only weak evidence that adoption of a Toeplitz error structure improves on the “standard” multilevel model.
If you focus exclusively on the deviance statistic, however, an unstructured error covariance matrix always leads to the best fit. This model will always fit better than the “standard” model, and better than any other model that is constrained in some way. The question is: How much do we sacrifice if we choose the unstructured model over these others? For these data, it costs 10 degrees of freedom to achieve this best fit for the model’s stochastic portion, five more degrees of freedom than any other error covariance structure considered here. Although some might argue that losing an additional handful of degrees of freedom is a small price to pay for optimal modeling of the error structure (a consequence of the fact that we have only four waves of data), in other settings, with larger panels of longitudinal data, few would reach this conclusion.
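The parameter count behind this trade-off is easy to verify: an unstructured matrix for k waves estimates every variance and every distinct covariance. The small helper below is illustrative, not part of any fitting routine.

```python
def unstructured_params(waves):
    """Free parameters in an unstructured error covariance matrix for
    `waves` measurement occasions: one variance per wave plus one
    covariance per distinct pair, i.e. waves * (waves + 1) / 2."""
    return waves * (waves + 1) // 2

four_wave_cost = unstructured_params(4)   # 10, as in the text
eight_wave_cost = unstructured_params(8)  # 36: the cost grows quadratically
```

With only four waves, 10 parameters may seem tolerable; with eight waves the unstructured matrix would demand 36, which is why few analysts reach the same conclusion for longer panels.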
Perhaps most important, consider how the choice of an error covariance structure affects our ability to address our research questions, especially given that it is the fixed effects, and not the variance components, that usually embody these questions. Some might say that refining the error covariance structure for the multilevel model for change is akin to rearranging the deck chairs on the Titanic: it rarely changes our parameter estimates in any fundamental way. Indeed, regardless of the error structure chosen, estimates of the fixed effects are unbiased and may not be affected much by choices made in the stochastic part of the model (provided that neither the data nor the error structure is idiosyncratic).
But refining our hypotheses about the error covariance structure does affect the precision of the estimated fixed effects and will therefore affect hypothesis testing and confidence interval construction. You can see this happening in table 7.4, which displays estimates of the fixed effects and their asymptotic standard errors for three multilevel models for change in opposites-naming: the “standard” model (from table 7.2) and models with Toeplitz and unstructured error covariance matrices (from table 7.3). Notice that the magnitudes of the estimated fixed effects differ little across the three models, whereas their standard errors are generally somewhat smaller under the Toeplitz and unstructured structures.
Table 7.4: Change in opposites-naming score over a four-week period, as a function of baseline IQ

                                                        Model with …
                                          Standard          Toeplitz          Unstructured
                                          error covariance  error covariance  error covariance
Parameter                                 structure         structure         structure
-------------------------------------------------------------------------------------------
Fixed Effects
 Initial status, π_{0i}
  Intercept                    γ_{00}     164.37***         165.10***         165.83***
                                          (6.206)           (5.923)           (5.952)
  (COG − \overline{COG})       γ_{01}     −0.11             −0.00             −0.07
                                          (0.504)           (0.481)           (0.483)
 Rate of change, π_{1i}
  Intercept                    γ_{10}     26.96***          26.895***         26.58***
                                          (1.994)           (1.943)           (1.926)
  (COG − \overline{COG})       γ_{11}     0.43**            0.44**            0.46**
                                          (0.162)           (0.158)           (0.156)
Goodness-of-fit
 Deviance                                 1260.3            1258.1            1255.8
 AIC                                      1268.3            1266.1            1275.8
 BIC                                      1274.5            1272.3            1291.3

~ p < .10; * p < .05; ** p < .01; *** p < .001.
Parameter estimates (standard errors), approximate p-values, and goodness-of-fit statistics after fitting a multilevel model for change with standard, Toeplitz, and unstructured error covariance structures (n = 35).
Note: SAS PROC MIXED, Restricted ML.