Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence

Judith D. Singer and John B. Willett

Print publication date: 2003

Print ISBN-13: 9780195152968

Published to Oxford Scholarship Online: September 2009

DOI: 10.1093/acprof:oso/9780195152968.001.0001



Chapter:
7 Examining the Multilevel Model’s Error Covariance Structure
Source:
Applied Longitudinal Data Analysis
Author(s):

Judith D. Singer

John B. Willett

Publisher:
Oxford University Press
DOI:10.1093/acprof:oso/9780195152968.003.0007

Abstract and Keywords

The previous chapters emphasized the fixed effects in the multilevel model for change. This chapter, in contrast, focuses on the model's random effects as embodied in its error covariance structure. Section 7.1 begins by reviewing the “standard” multilevel model for change, expressed in composite form. Section 7.2 examines this model's random effects, demonstrating that the composite error term is indeed both heteroscedastic and autocorrelated, as preferred for longitudinal data. Section 7.3 compares several alternative error covariance structures and provides strategies for choosing among them.

Keywords:   multilevel model, individual change, composite model, error covariance matrix

Change begets change. Nothing propagates so fast.… The mine which Time has slowly dug beneath familiar objects is sprung in an instant, and what was rock before, becomes but sand and dust.

—Charles Dickens

In previous chapters, we often emphasized the fixed effects in the multilevel model for change. Doing so made great sense because the fixed effects typically provide the most direct answers to our research questions. In this chapter, in contrast, we focus on the model’s random effects as embodied in its error covariance structure. Doing so allows us both to describe the particular error covariance structure that the “standard” multilevel model for change invokes and to broaden its representation to other, sometimes more tenable, assumptions about its behavior.

We begin, in section 7.1, by reviewing the “standard” multilevel model for change, expressed in composite form. In section 7.2, we closely examine this model’s random effects, demonstrating that the composite error term is indeed both heteroscedastic and autocorrelated, as we would prefer for longitudinal data. But we also find that this error covariance structure may not be as general as we might like and, in some settings, alternatives may have greater appeal. This brings us to section 7.3, in which we compare several alternative error covariance structures and provide strategies for choosing among them.

7.1 The “Standard” Specification of the Multilevel Model for Change

Throughout this chapter, we use a small, time-structured data set first presented in Willett (1988). On each of four days, spaced exactly one week apart, 35 people completed an inventory that assesses their performance

Table 7.1: Ten cases from a person-level data set containing scores on an opposites-naming task across four occasions of measurement, obtained weekly, and a baseline measurement of COG, a measure of cognitive skill, obtained in the first week

ID    OPP1   OPP2   OPP3   OPP4   COG
01     205    217    268    302   137
02     219    243    279    302   123
03     142    212    250    289   129
04     206    230    248    273   125
05     190    220    229    220    81
06     165    205    207    263   110
07     170    182    214    268    99
08      96    131    159    213   113
09     138    156    197    200   104
10     216    252    274    298    96

on a timed cognitive task called “opposites naming.” At wave 1, each person also completed a standardized instrument assessing general cognitive skill. Table 7.1 presents the first ten cases in the person-level data set (we use this format to conserve space), which includes values of: (a) OPP1, OPP2, OPP3, and OPP4—the individual’s opposites-naming score on each occasion; and (b) COG, the baseline cognitive skill score. The full person-period data set has 140 records, 4 per person. In what follows, we assume that any skill improvement over time results from practice, not cognitive development. That said, research interest centers on determining whether opposites-naming skill increases more rapidly with practice among individuals with stronger cognitive skills.
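The analyses that follow work in the person-period format. Here is a minimal sketch of the wide-to-long conversion in pandas, assuming the person-level data of table 7.1 sit in a file with the column names shown there (the file name and derived variable names are illustrative, not from the original source):

```python
import pandas as pd

# Person-level ("wide") data: one row per person, as in table 7.1.
# The file name below is assumed for illustration.
wide = pd.read_csv("opposites_wide.csv")  # columns: ID, OPP1, OPP2, OPP3, OPP4, COG

# Reshape to the person-period ("long") format: one row per person per wave.
long = wide.melt(id_vars=["ID", "COG"],
                 value_vars=["OPP1", "OPP2", "OPP3", "OPP4"],
                 var_name="WAVE", value_name="OPP")

# Code TIME as 0, 1, 2, 3 so that pi_0i keeps its initial-status interpretation.
long["TIME"] = long["WAVE"].str[-1].astype(int) - 1

# Center COG on its sample mean, as in the level-2 submodel below.
long["CCOG"] = long["COG"] - long["COG"].mean()

long = long.sort_values(["ID", "TIME"]).reset_index(drop=True)  # 35 x 4 = 140 rows
```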

We specify the “standard” multilevel model for change in the usual manner. For individual i on occasion j, we assume that opposites-naming score, $Y_{ij}$, is a linear function of TIME:

(7.1a) $$Y_{ij} = \pi_{0i} + \pi_{1i}\,TIME_j + \varepsilon_{ij}$$
where subscript i has been omitted from predictor TIME because the data are time-structured and
(7.1b) $$\varepsilon_{ij} \overset{iid}{\sim} N\!\left(0, \sigma^2_\varepsilon\right)$$
To allow the individual growth parameters to take on their usual interpretations for person i—π0i as the true initial level and π1i as the true weekly rate of change—we scale TIME so that the first measurement occasion is labeled 0 and the others are labeled 1, 2, and 3. In the “standard” model we also assume that the random effects εij are drawn from a univariate normal distribution with zero mean and unknown variance $\sigma^2_\varepsilon$. To further clarify the meaning of this assumption—our focus in this chapter—we add the notation “iid,” which declares that the errors are mutually independent, across occasions and persons, and identically distributed. We discuss the implications of this assumption in detail further below.

To allow individual change trajectories to differ systematically across people, we posit a level-2 submodel in which the cognitive skills score (COG) is associated with both growth parameters:

(7.2a) $$\pi_{0i} = \gamma_{00} + \gamma_{01}\left(COG_i - \overline{COG}\right) + \zeta_{0i} \qquad \pi_{1i} = \gamma_{10} + \gamma_{11}\left(COG_i - \overline{COG}\right) + \zeta_{1i}$$
where
(7.2b) $$\begin{bmatrix} \zeta_{0i}\\ \zeta_{1i} \end{bmatrix} \sim N\!\left( \begin{bmatrix} 0\\ 0 \end{bmatrix},\ \begin{bmatrix} \sigma^2_0 & \sigma_{01}\\ \sigma_{10} & \sigma^2_1 \end{bmatrix} \right)$$
To facilitate interpretation, we center the continuous predictor COG on its sample mean. The level-2 fixed effects capture the effect of cognitive skill on the average trajectories of change; the level-2 random effects, ζ0i and ζ1i, represent those parts of the level-2 outcomes that remain “unexplained” by cognitive skill. In the “standard” multilevel model, we assume that these random effects have zero means and are drawn independently across people. To allow for the possibility that, even after accounting for cognitive skill, the unpredicted portions of a person’s true intercept and true slope may be intertwined, we assume that each person draws both level-2 residuals simultaneously from a bivariate normal distribution with variances $\sigma^2_0$ and $\sigma^2_1$ and covariance σ01.
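To make the two-level specification concrete, the following simulation sketch generates data that obey equations 7.1a–7.2b. The parameter values are arbitrary placeholders chosen only for illustration, not estimates reported in this chapter:

```python
import numpy as np

rng = np.random.default_rng(0)
n, waves = 35, 4
time = np.arange(waves)                       # occasions coded 0, 1, 2, 3

# Placeholder fixed effects and variance components (illustrative values only).
g00, g01, g10, g11 = 165.0, 0.0, 27.0, 0.4
sigma2_eps = 160.0
tau = np.array([[1240.0, -178.0],             # [[sigma0^2, sigma01],
                [-178.0,  107.0]])            #  [sigma01,  sigma1^2]]

ccog = rng.normal(0.0, 12.0, size=n)          # centered COG scores (illustrative spread)

# Level-2: each person draws (zeta0, zeta1) from one bivariate normal distribution.
zeta = rng.multivariate_normal([0.0, 0.0], tau, size=n)
pi0 = g00 + g01 * ccog + zeta[:, 0]           # true initial status
pi1 = g10 + g11 * ccog + zeta[:, 1]           # true weekly rate of change

# Level-1: iid normal errors around each person's true change trajectory.
eps = rng.normal(0.0, np.sqrt(sigma2_eps), size=(n, waves))
Y = pi0[:, None] + pi1[:, None] * time + eps  # 35 x 4 matrix of simulated outcomes
```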

Table 7.2 presents the results of fitting this “standard” multilevel model for change to the opposites-naming data. Because we focus in this chapter on the model’s stochastic portion, we use restricted, not full, maximum likelihood (see section 4.3 for a comparison of methods). For an individual of average cognitive skill, initial level of opposites-naming skill is estimated to be 164.4 (p < .001); this average person’s weekly rate of linear change is estimated to be 27.0 (p < .001). Individuals whose cognitive skills differ by one point have an initial opposites-naming score that is 0.11 lower (although this decrement is not statistically significant, p = .82); their average weekly rate of linear change is 0.43 higher (p < .01). Even after including cognitive skill as a predictor of both initial status and change, we detect statistically significant level-2 residual variation in

Table 7.2: Change in opposites naming over a four-week period as a function of baseline IQ

Parameter                                                      Estimate

Fixed Effects
  Initial status, π0i
    Intercept                          γ00                      164.37***
    $(COG - \overline{COG})$           γ01                       −0.11
  Rate of change, π1i
    Intercept                          γ10                       26.96***
    $(COG - \overline{COG})$           γ11                        0.43**

Variance Components
  Level-1: Within-person variance      $\sigma^2_\varepsilon$   159.48***
  Level-2: Variance in ζ0i             $\sigma^2_0$            1236.41***
           Variance in ζ1i             $\sigma^2_1$             107.25***
           Covariance of ζ0i and ζ1i   σ01                     −178.23*

Goodness-of-fit
  Deviance                                                     1260.3
  AIC                                                          1268.3
  BIC                                                          1274.5

~ p < .10; * p < .05; ** p < .01; *** p < .001.

Parameter estimates, approximate p-values, and goodness-of-fit statistics from fitting a standard multilevel model for change (n = 35).

Note: SAS PROC MIXED, Restricted ML.

both initial status (1236.41, p < .001) and rate of change (107.25, p < .001). We also detect a statistically significant negative covariance (−178.2, p < .05) between the level-2 residuals, ζ0i and ζ1i, which suggests that, after controlling for cognitive skill, those with weaker initial opposites-naming skills improve at a faster rate, on average, than those with stronger initial skills. To interpret this estimate more easily, we compute the partial correlation between change and initial status to find:
$$\hat{\rho}_{01} = \frac{\hat{\sigma}_{01}}{\sqrt{\hat{\sigma}^2_0\,\hat{\sigma}^2_1}} = \frac{-178.23}{\sqrt{(1236.41)(107.25)}} = -0.49$$
Finally, the estimated level-1 residual variance, $\hat{\sigma}^2_\varepsilon$, is 159.5.

7.2 Using the Composite Model to Understand Assumptions about the Error Covariance Matrix

To understand the error covariance structure in the “standard” multilevel model for change, we move to the composite representation obtained by collapsing the level-2 submodels in equation 7.2a into the level-1 submodel in equation 7.1a:

(7.3) $$Y_{ij} = \left[\gamma_{00} + \gamma_{01}\left(COG_i - \overline{COG}\right) + \zeta_{0i}\right] + \left[\gamma_{10} + \gamma_{11}\left(COG_i - \overline{COG}\right) + \zeta_{1i}\right]TIME_j + \varepsilon_{ij}$$
Multiplying out and rearranging terms yields:
(7.4) $$Y_{ij} = \left[\gamma_{00} + \gamma_{10}TIME_j + \gamma_{01}\left(COG_i - \overline{COG}\right) + \gamma_{11}\left(COG_i - \overline{COG}\right)\times TIME_j\right] + \left[\varepsilon_{ij} + \zeta_{0i} + \zeta_{1i}TIME_j\right]$$
where random effects εij, ζ0i, and ζ1i retain the distributional assumptions of equations 7.1b and 7.2b.

As in section 4.2, brackets distinguish the model’s structural and stochastic portions. Its structural portion contains our hypotheses about the way that opposites-naming skill changes with time and depends on baseline cognitive skill. Its stochastic portion contains the composite residual, which we now label r, for convenience. The value of r for individual i on occasion j is:

(7.5) $$r_{ij} = \varepsilon_{ij} + \zeta_{0i} + \zeta_{1i}TIME_j$$
which is a weighted linear combination of the original three random effects from the level-1/level-2 specification (εij, ζ0i, and ζ1i, with constants 1, 1, and TIMEj acting as the weights). Our major focus in this chapter is on the statistical properties of $r_{ij}$.

But before examining these properties, let us simplify the composite model in equation 7.4 by substituting r ij as defined in equation 7.5 into equation 7.4:

(7.6) $$Y_{ij} = \gamma_{00} + \gamma_{10}TIME_j + \gamma_{01}\left(COG_i - \overline{COG}\right) + \gamma_{11}\left(COG_i - \overline{COG}\right)\times TIME_j + r_{ij}$$
The composite model now looks like a regular multiple regression model, with the “usual” error term replaced by “r.” This reinforces the notion, discussed in chapter 4, that you can conceptualize the multilevel analysis of change as a multiple regression analysis in the person-period data set, in which you regress the outcome on the main effects of TIME, a level-2 predictor (COG), and their statistical interaction.
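To illustrate just this conceptual point—the structural part of the composite model looks like a regression on TIME, centered COG, and their interaction—one could run OLS on the person-period data set. This is a didactic sketch only (it ignores the dependence carried by rij, so it is not the analysis in table 7.2); the variable names follow the reshaping sketch above:

```python
import statsmodels.formula.api as smf

# OLS regression of the outcome on TIME, centered COG, and their interaction,
# mirroring the structural portion of equation 7.6 (dependence in r_ij ignored).
ols_fit = smf.ols("OPP ~ TIME * CCOG", data=long).fit()
print(ols_fit.params)   # analogues of gamma00, gamma10, gamma01, gamma11
```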

Because of the special nature of “r,” our standard practice is to fit the model in equation 7.6 by GLS regression analysis, not OLS, making specific assumptions about the distribution of the residuals. But before doing so, let’s suppose for a hypothetical moment that we were willing to invoke the simpler OLS assumptions that all the rij are independent and normally distributed, with zero means and homoscedastic variance ($\sigma^2_r$, say). We could codify these simple distributional assumptions for all the residuals simultaneously in one grand statement:

(7.7) $$\begin{bmatrix} r_{11}\\ r_{12}\\ r_{13}\\ r_{14}\\ r_{21}\\ \vdots\\ r_{n4} \end{bmatrix} \sim N\left( \begin{bmatrix} 0\\ 0\\ 0\\ 0\\ 0\\ \vdots\\ 0 \end{bmatrix},\ \begin{bmatrix} \sigma^2_r & 0 & 0 & \cdots & 0\\ 0 & \sigma^2_r & 0 & \cdots & 0\\ 0 & 0 & \sigma^2_r & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & 0 & \cdots & \sigma^2_r \end{bmatrix} \right)$$
where, because we have four waves of data per person, we have four residuals (one per occasion) for each of the n sample members. With a different number of waves of data, we would simply adjust the dimensions of the component vectors and matrices.

While equation 7.7 may appear needlessly complex for describing the behavior of the residuals in an OLS analysis, it provides a convenient and generalizable form for codifying assumptions on residuals that we will find useful, later. It says that the complete set of residuals in the analysis has a multivariate normal distribution. The statement has several important features:

  • It contains a vector of random variables whose distribution is being specified. To the left of the “is distributed as” sign (“~”) is a column vector containing all the model’s residuals, which, for our data, run from the four residuals for person 1 (r11, r12, r13, and r14), to those for person 2 (r21, r22, r23, and r24), and so on, through the four residuals for person n.

  • It states the distribution type. Immediately after the ~, we stipulate that every element in the residual vector is normally distributed (“N”). Because the vector has many (“multi”) entries, the residuals have a multivariate normal distribution.

  • It contains a vector of means. Also to the right of the ~, inside the parentheses and before the comma, is a vector of hypothesized means, one for each residual. All these elements are 0, reflecting our belief that the population mean of each residual is 0.

  • It contains an error (or residual) covariance matrix. The last entry in equation 7.7 is the error covariance matrix, which contains our hypotheses about the residual variances and covariances. Under classical OLS assumptions, this matrix is diagonal—all elements are zero, except those along the main diagonal. The off-diagonal zero values represent the residual independence assumption, which stipulates that the residuals do not covary. Along the diagonal, all residuals have an identical population variance, $\sigma^2_r$. This is the residual homoscedasticity assumption.

The distributional statement in equation 7.7 is inappropriate for longitudinal data. Although we expect the composite residuals to be independent across people and normally distributed with zero means, within people we expect them to be heteroscedastic and correlated over time. We can write an error covariance matrix that reflects these new “longitudinal” assumptions as:
(7.8) $$\begin{bmatrix} r_{11}\\ r_{12}\\ r_{13}\\ r_{14}\\ \vdots\\ r_{n1}\\ r_{n2}\\ r_{n3}\\ r_{n4} \end{bmatrix} \sim N\left( \begin{bmatrix} 0\\ 0\\ 0\\ 0\\ \vdots\\ 0\\ 0\\ 0\\ 0 \end{bmatrix},\ \begin{bmatrix} \sigma^2_{r_1} & \sigma_{r_1r_2} & \sigma_{r_1r_3} & \sigma_{r_1r_4} & \cdots & 0 & 0 & 0 & 0\\ \sigma_{r_2r_1} & \sigma^2_{r_2} & \sigma_{r_2r_3} & \sigma_{r_2r_4} & \cdots & 0 & 0 & 0 & 0\\ \sigma_{r_3r_1} & \sigma_{r_3r_2} & \sigma^2_{r_3} & \sigma_{r_3r_4} & \cdots & 0 & 0 & 0 & 0\\ \sigma_{r_4r_1} & \sigma_{r_4r_2} & \sigma_{r_4r_3} & \sigma^2_{r_4} & \cdots & 0 & 0 & 0 & 0\\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots\\ 0 & 0 & 0 & 0 & \cdots & \sigma^2_{r_1} & \sigma_{r_1r_2} & \sigma_{r_1r_3} & \sigma_{r_1r_4}\\ 0 & 0 & 0 & 0 & \cdots & \sigma_{r_2r_1} & \sigma^2_{r_2} & \sigma_{r_2r_3} & \sigma_{r_2r_4}\\ 0 & 0 & 0 & 0 & \cdots & \sigma_{r_3r_1} & \sigma_{r_3r_2} & \sigma^2_{r_3} & \sigma_{r_3r_4}\\ 0 & 0 & 0 & 0 & \cdots & \sigma_{r_4r_1} & \sigma_{r_4r_2} & \sigma_{r_4r_3} & \sigma^2_{r_4} \end{bmatrix} \right)$$
where, again, the dimensions of the vectors and matrices reflect the design of the opposites-naming study.

The new distributional specification in equation 7.8 allows the residuals in the composite model to have a multivariate normal distribution with zero means and a block diagonal, not diagonal, error covariance structure. The term “block diagonal” means that all the matrix’s elements are zero, except those within the “blocks” arrayed along the diagonal, one per person. The zero elements outside the blocks indicate that each person’s residuals are independent of all others’—in other words, the residuals for person i have zero covariance with everyone else’s residuals. But the non-zero covariance parameters within each block allow the residuals to covary within person. In addition, the multiple distinct parameters along each block’s diagonal allow the variances of the within-person residuals to differ across occasions. These distinctions between the diagonal and the block diagonal error covariance matrices demarcate the fundamental difference between cross-sectional and longitudinal designs.1

Notice that the blocks of the error covariance matrix in equation 7.8 are identical across people. This homogeneity assumption says that, in an analysis of change, although the composite residuals may be heteroscedastic and dependent within people, the entire error structure is repeated identically across people—that is, everyone’s residuals are identically heteroscedastic and autocorrelated. This assumption is not absolutely necessary, as it can be tested and relaxed in limited ways (provided you have sufficient data). Yet we typically invoke it for practical reasons, as it improves dramatically the parsimony with which we can specify the model’s stochastic portion. Limiting the number of unique variance/covariance components in a hypothesized model improves the rapidity with which iterative model fitting converges. If we allowed each person in this study to possess a unique set of variance components, for example, we would need to estimate 10n variance components—6n more than the number of observations on the outcome in the person-period data set!
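Under the homogeneity assumption, the full error covariance matrix is nothing more than one person’s 4 × 4 block repeated down the diagonal. A small numpy sketch, using an arbitrary illustrative block:

```python
import numpy as np

n = 35                                    # people in the opposites-naming study
Sigma_r = np.array([                      # one person's 4 x 4 block (illustrative values)
    [1400.0, 1060.0,  880.0,  700.0],
    [1060.0, 1150.0,  915.0,  845.0],
    [ 880.0,  915.0, 1110.0,  990.0],
    [ 700.0,  845.0,  990.0, 1290.0]])

# Block-diagonal covariance for all 4n composite residuals:
# identical Sigma_r blocks on the diagonal, zeros everywhere else.
full_cov = np.kron(np.eye(n), Sigma_r)    # shape (140, 140)
print(full_cov.shape)
```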

Adopting the homogeneity assumption allows us to express the distributional assumptions in equation 7.8 in more parsimonious terms by writing:

(7.9) $$\begin{bmatrix} r_{11}\\ r_{12}\\ \vdots\\ r_{n3}\\ r_{n4} \end{bmatrix} \sim N\left( \begin{bmatrix} 0\\ 0\\ \vdots\\ 0\\ 0 \end{bmatrix},\ \begin{bmatrix} \boldsymbol{\Sigma}_r & \mathbf{0} & \cdots & \mathbf{0}\\ \mathbf{0} & \boldsymbol{\Sigma}_r & \cdots & \mathbf{0}\\ \vdots & \vdots & \ddots & \vdots\\ \mathbf{0} & \mathbf{0} & \cdots & \boldsymbol{\Sigma}_r \end{bmatrix} \right)$$
Equation 7.9 says that the complete vector of residuals r has a multivariate normal distribution with mean vector 0 and a block-diagonal error covariance matrix constituted from submatrices, Σr and 0, where:
(7.10) $$\boldsymbol{\Sigma}_r = \begin{bmatrix} \sigma^2_{r_1} & \sigma_{r_1r_2} & \sigma_{r_1r_3} & \sigma_{r_1r_4}\\ \sigma_{r_2r_1} & \sigma^2_{r_2} & \sigma_{r_2r_3} & \sigma_{r_2r_4}\\ \sigma_{r_3r_1} & \sigma_{r_3r_2} & \sigma^2_{r_3} & \sigma_{r_3r_4}\\ \sigma_{r_4r_1} & \sigma_{r_4r_2} & \sigma_{r_4r_3} & \sigma^2_{r_4} \end{bmatrix}$$
Again, the dimensions of Σr reflect the design of the opposites-naming study.

When you investigate random effects in the analysis of change, you anticipate that the composite residuals will have a multivariate distributional form like equation 7.8 or 7.9. As part of your analyses, you estimate the elements of this error covariance matrix, which means that—under the homogeneity assumption—you estimate the elements of the error covariance submatrix Σr, in equation 7.10.

This specification of the Σr error covariance submatrix—and hence the shape of the full error covariance matrix—is very general. It contains a set of error variance and covariance parameters (four of the former and six of the latter, for the opposites-naming data), each of which can take on an appropriate value. But when you specify a particular multilevel model for change, you invoke specific assumptions about these values. Most important for our purposes here is that the “standard” multilevel model for change invokes a specific mathematical structure for the $r_{ij}$. As we show below, this model constrains the error covariance structure much more than that specified in equations 7.9 and 7.10.

What does the error covariance submatrix Σr of the “standard” multilevel model for change look like? When we presented this model earlier in the book, we focused on its ability to represent hypotheses about fixed effects. Does it also provide a reasonable covariance structure for the composite residuals? Fortunately, most of its behavior is exactly what you would hope and expect. First, because a weighted linear combination of normally distributed variables is also normally distributed, for example, each composite residual in equation 7.5 is also normally distributed, as specified in equation 7.9. Second, because the mean of a weighted linear combination of random variables is equal to an identically weighted linear combination of the means of those variables, the mean of the composite residual in equation 7.5 must also be zero, as specified in equation 7.9. Third, the error covariance matrix of the composite residuals is indeed block diagonal, as specified in equation 7.9. But fourth, in the standard multilevel model for change, the elements of the Σr error covariance blocks in equations 7.9 and 7.10 possess a powerful dependence on time. As this is both the most interesting—and potentially troublesome—aspect of the standard model, we delve into this feature in some detail below.

7.2.1 Variance of the Composite Residual

We begin by examining what the “standard” multilevel model for change hypothesizes about the composite residual’s variance. Straightforward algebraic manipulation of r ij in equation 7.5 provides an equation for the diagonal elements of the error covariance submatrix Σr, in equation 7.10, for the standard multilevel model for change, in terms of TIME and the model’s variance components. Under the standard multilevel model for change, the population variance of the composite residual at TIME t j is:

(7.11) $$\operatorname{Var}(r_{ij}) = \sigma^2_\varepsilon + \sigma^2_0 + 2\sigma_{01}t_j + \sigma^2_1 t_j^2$$
We can use this equation to obtain estimates of composite residual variance on each occasion for the opposites-naming data. Substituting the four associated values of TIME (0, 1, 2, and 3) and estimates of the variance components from table 7.2 into equation 7.11, we have:
$$\hat{\sigma}^2_{r_1} = 159.48 + 1236.41 + 2(-178.23)(0) + 107.25(0)^2 = 1395.89$$
$$\hat{\sigma}^2_{r_2} = 159.48 + 1236.41 + 2(-178.23)(1) + 107.25(1)^2 = 1146.68$$
$$\hat{\sigma}^2_{r_3} = 159.48 + 1236.41 + 2(-178.23)(2) + 107.25(2)^2 = 1111.97$$
$$\hat{\sigma}^2_{r_4} = 159.48 + 1236.41 + 2(-178.23)(3) + 107.25(3)^2 = 1291.76$$
Rewriting the estimated error covariance submatrix $\hat{\boldsymbol{\Sigma}}_r$ in equation 7.10 with its diagonal entries replaced by their estimates, we have:
(7.12) $$\hat{\boldsymbol{\Sigma}}_r = \begin{bmatrix} 1395.89 & & & \\ & 1146.68 & & \\ & & 1111.97 & \\ & & & 1291.76 \end{bmatrix}$$
So, under the standard multilevel model for change, composite residual variance for the opposites-naming data differs across occasions, revealing anticipated heteroscedasticity. For the opposites-naming data, composite residual variance is greatest at the beginning and end of data collection and smaller in between. And, while not outrageously heteroscedastic, this situation is clearly beyond the bland homoscedasticity that we routinely assume for residuals in cross-sectional data.

Based on the algebraic representation in equation 7.11, what can we say about the general temporal dependence of composite residual variance in the “standard” multilevel model for change? We can gain insight into this question by completing the square in equation 7.11:

$$\operatorname{Var}(r_{ij}) = \sigma^2_\varepsilon + \sigma^2_1\left(t_j + \frac{\sigma_{01}}{\sigma^2_1}\right)^2 + \left(\sigma^2_0 - \frac{\sigma^2_{01}}{\sigma^2_1}\right)$$
Because $t_j$ appears in a term that is squared, this expression indicates that composite residual variance in the “standard” multilevel model for change has a quadratic dependence on time. It will be at its minimum at time $t_j = -\sigma_{01}/\sigma^2_1$ and will increase parabolically and symmetrically over time on either side of this minimum. For the opposites-naming data, we have:
$$\widehat{\operatorname{Var}}(r_{ij}) = 159.48 + 107.25\left(t_j - 1.66\right)^2 + 940.22$$
which tells us that, under the standard multilevel model for change, the composite residual variance has an estimated minimum of almost 1100, occurring about two-thirds of the way between the second and third measurement occasions in the case of the opposites-naming data.
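A quick numeric check of equation 7.11 and of the completed-square form, plugging in the restricted ML estimates from table 7.2:

```python
import numpy as np

# Variance component estimates from table 7.2
s2_eps, s2_0, s2_1, s01 = 159.48, 1236.41, 107.25, -178.23

t = np.arange(4)                                    # occasions 0, 1, 2, 3
var_r = s2_eps + s2_0 + 2 * s01 * t + s2_1 * t**2   # equation 7.11
print(var_r)                                        # approx. 1395.9, 1146.7, 1112.0, 1291.8

# Location and value of the minimum, from the completed-square form
t_min = -s01 / s2_1                                 # approx. 1.66
var_min = s2_eps + (s2_0 - s01**2 / s2_1)           # approx. 1099.7
print(t_min, var_min)
```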

So, ask yourself! Does it make sense to assume, in real data—as the “standard” multilevel model for change does implicitly—that composite residual variance increases parabolically over time from a single minimum? For the “standard” model to make sense, and be applied in the real world, your answer must be yes. But are other patterns of heteroscedasticity possible (or likely)? In longitudinal data, might residual heteroscedasticity possess both a minimum and a maximum? Might there be even multiple minima and maxima? Might composite residual variance decline from a maximum, on either side of some fiducial time, rather than increasing from a minimum? Although compelling, none of these options is possible under the “standard” multilevel model for change.

Before concluding that the model we have spent so long developing is perhaps untenable because of the restriction it places on the error covariance matrix, let us quickly offer some observations that we hope will assuage your concerns. Although the “standard” multilevel model for change assumes that composite residual variance increases parabolically from a minimum with time, the temporal dependence of residual heteroscedasticity need not be markedly curved. The magnitude of the curvature depends intimately on the magnitude of the model’s variance/covariance components. If all three level-2 components—$\sigma^2_0$, $\sigma^2_1$, and σ01—are near zero, for example, the error covariance matrix is actually close to homoscedastic, with common variance $\sigma^2_\varepsilon$. Or, if the level-2 residual slope variability, $\sigma^2_1$, and the residual initial status/slope covariance, σ01, are near zero, composite residual variance will still be homoscedastic, but with common variance $\sigma^2_\varepsilon + \sigma^2_0$. In both cases, the “curvature” of the parabolic temporal dependence approaches zero and heteroscedasticity flattens.

In our own experience, these situations are common. The first occurs when the level-2 predictors “explain” most, or all, of the between-person variation in initial status and rate of change. The second occurs when the slopes of the change trajectories do not differ much across people—a common occurrence when study duration is short. Finally, as the sizes of the residual slope variance, $\sigma^2_1$, and the initial status/slope covariance, σ01, differ relative to one another, the time at which minimum residual variance occurs can easily move beyond the temporal limits of the period of observation. When this happens, which is often, no minimum is evident within the period of observation, and the composite residual variance appears either to increase or to decrease monotonically over the time period under study. We conclude from these special cases and the general temporal dependence of the residual variance that, while the composite residual variance is indeed functionally constrained in the “standard” multilevel model for change, it is also capable of adapting itself relatively smoothly to many common empirical situations. Nonetheless, in any analysis of change, it makes great sense to check the hypothesized structure of the error covariance matrix—whether obtained implicitly, by adopting the standard model, or not—against data, just as it is important to check the tenability of the hypothesized structure of the fixed effects. We illustrate the checking process in section 7.3.

7.2.2 Covariance of the Composite Residuals

We now examine the temporal dependence in the covariance of the composite residuals in the “standard” multilevel model for change. These covariances appear in the off-diagonal elements of the error covariance submatrix Σr in equation 7.10. Again, mathematical manipulation of the composite residual in equation 7.5 provides the covariance between composite residuals at times $t_j$ and $t_{j'}$:

(7.13) $$\operatorname{Cov}(r_{ij}, r_{ij'}) = \sigma^2_0 + \sigma_{01}\left(t_j + t_{j'}\right) + \sigma^2_1 t_j t_{j'}$$
where all terms have their usual meanings. For the opposites-naming data, substitution of appropriate values for time and estimates of the variance components from table 7.2 lets us fill out the rest of $\hat{\boldsymbol{\Sigma}}_r$ in equation 7.12 with numerical values:
(7.14) $$\hat{\boldsymbol{\Sigma}}_r = \begin{bmatrix} 1395.89 & 1058.18 & 879.95 & 701.72\\ 1058.18 & 1146.68 & 916.22 & 845.24\\ 879.95 & 916.22 & 1111.97 & 988.76\\ 701.72 & 845.24 & 988.76 & 1291.76 \end{bmatrix}$$
Notice the somewhat imperfect “band diagonal” structure, in which the overall magnitude of the residual covariances tends to decline in diagonal “bands” the further you get from the main diagonal. The magnitude of the residual covariance is around 900 to 1050 in the band immediately below the main diagonal, between 840 and 880 in the band beneath that, and about 700 in the band beneath that. We often anticipate a band diagonal structure in longitudinal studies because we expect the strength of the correlation between pairs of residuals to decline as they become more temporally remote, within person.
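The same table 7.2 estimates reproduce the full fitted submatrix in equation 7.14—variances from equation 7.11 on the diagonal, covariances from equation 7.13 off it:

```python
import numpy as np

s2_eps, s2_0, s2_1, s01 = 159.48, 1236.41, 107.25, -178.23   # table 7.2 estimates
t = np.arange(4)

# Off-diagonal recipe: sigma0^2 + sigma01*(t_j + t_j') + sigma1^2 * t_j * t_j'
Sigma_hat = s2_0 + s01 * (t[:, None] + t[None, :]) + s2_1 * np.outer(t, t)
Sigma_hat[np.diag_indices(4)] += s2_eps    # adding sigma_eps^2 on the diagonal gives eq. 7.11
print(Sigma_hat.round(2))                  # matches equation 7.14
```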

The expression for the covariance between composite residuals in equation 7.13 and the estimated error covariance matrix in equation 7.14 allow us to make some general comments about the temporal dependence of the composite residual covariance in the “standard” multilevel model for change. The dependence is powerful, principally because the covariance contains the product of pairs of times (the third term in equation 7.13). This product dramatically affects the magnitude of the error covariance when time values are large. Special cases are also evident. As in equation 7.14, the magnitude of the error covariance depends on the magnitudes of the three level-2 variance components. If all three level-2 components are close to zero, the composite residual covariances will also be near zero and the error covariance matrix in equations 7.9 and 7.10 becomes diagonal (in addition to being homoscedastic, as described in section 7.2.1). Regular OLS assumptions then apply, even for longitudinal data. Similarly, if the level-2 residual slope variability, $\sigma^2_1$, and the residual initial status/slope covariance, σ01, are both vanishingly small, then the composite residual covariance takes on a constant value, $\sigma^2_0$. In this case, the error covariance matrix is compound symmetric, with the following structure:

(7.15) $$\boldsymbol{\Sigma}_r = \begin{bmatrix} \sigma^2_\varepsilon + \sigma^2_0 & \sigma^2_0 & \sigma^2_0 & \sigma^2_0\\ \sigma^2_0 & \sigma^2_\varepsilon + \sigma^2_0 & \sigma^2_0 & \sigma^2_0\\ \sigma^2_0 & \sigma^2_0 & \sigma^2_\varepsilon + \sigma^2_0 & \sigma^2_0\\ \sigma^2_0 & \sigma^2_0 & \sigma^2_0 & \sigma^2_\varepsilon + \sigma^2_0 \end{bmatrix}$$
Compound symmetric error covariance structures are particularly common in longitudinal data, especially if the slopes of the change trajectories do not differ much across people. Regardless of these special cases, however, the most sensible question to ask of your data is whether the error covariance structure that the “standard” multilevel model for change demands is realistic when applied to data in practice. The answer to this question will determine whether the standard model can be applied ubiquitously, a question we soon address in section 7.3.

7.2.3 Autocorrelation of the Composite Residuals

Finally, for descriptive purposes, we can also estimate the autocorrelations imposed among the composite residuals in the “standard” multilevel model for change. Applying the usual formula for computing a correlation coefficient from two variances and their covariance, we have:

$$\hat{\rho}\left(r_{ij}, r_{ij'}\right) = \frac{\hat{\sigma}_{r_j r_{j'}}}{\sqrt{\hat{\sigma}^2_{r_j}\,\hat{\sigma}^2_{r_{j'}}}}$$
which yields a composite residual autocorrelation matrix of:
$$\begin{bmatrix} 1 & 0.84 & 0.71 & 0.52\\ 0.84 & 1 & 0.81 & 0.69\\ 0.71 & 0.81 & 1 & 0.83\\ 0.52 & 0.69 & 0.83 & 1 \end{bmatrix}$$
The approximate band-diagonal substructure of the error covariance matrix in the “standard” model is even more apparent in the error correlation matrix. For observations separated by one week, the residual autocorrelation is about 0.8; for observations separated by two weeks, the residual autocorrelation is about 0.70; for observations separated by three weeks, the residual autocorrelation is about 0.5. These magnitudes, regardless of temporal placement, are considerably larger than the zero autocorrelation anticipated among residuals in an OLS analysis.
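Rescaling the fitted covariance matrix into the correlation metric takes one line; continuing from the previous sketch:

```python
import numpy as np

# Fitted error covariance matrix from equation 7.14 (rounded)
Sigma_hat = np.array([
    [1395.89, 1058.18,  879.95,  701.72],
    [1058.18, 1146.68,  916.22,  845.24],
    [ 879.95,  916.22, 1111.97,  988.76],
    [ 701.72,  845.24,  988.76, 1291.76]])

sd = np.sqrt(np.diag(Sigma_hat))        # composite residual standard deviations
corr = Sigma_hat / np.outer(sd, sd)     # each covariance divided by the product of SDs
print(corr.round(2))                    # roughly .8 one week apart, .7 two weeks, .5 three
```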

7.3 Postulating an Alternative Error Covariance Structure

To postulate an appropriate multilevel model for change, any properties imposed on the model’s composite residual—either implicitly by the assumptions of the model itself, or explicitly—must match those required by the data. In specifying the model’s stochastic portion, you should allow for heteroscedasticity and autocorrelation among the composite residuals. But what type of heteroscedasticity and autocorrelation makes the most sense? Is the composite residual, as specified by default in the “standard” multilevel model for change, uniformly appropriate? Do its random effects always have the properties required of real-world residuals in the study of change? If you can answer yes to these questions, the “standard” multilevel model for change makes sense. But to determine whether you can safely answer yes, it is wise to evaluate the credibility of some plausible alternative error covariance structures, as we do now.

Fortunately, it is easy to specify alternative covariance structures for the composite residual and determine analytically which specification—the “standard” or an alternative—fits best. You already possess the analytic tools and skills needed for this work. After hypothesizing alternative models—as we describe below—you can use familiar goodness of fit statistics (deviance, AIC, and BIC) to compare their performance. Each model will have identical fixed effects but a different error covariance structure. The main difficulty you will encounter is not doing the analysis itself but rather identifying the error structures to investigate from among the dizzying array of options.

Table 7.3 presents six particular error covariance structures that we find to be the most useful in longitudinal work: unstructured, compound symmetric, heterogeneous compound symmetric, autoregressive, heterogeneous autoregressive, and Toeplitz. The table also presents the results of fitting the multilevel model for change in equation 7.6 to the opposites-naming data, imposing each of the designated error structures, along with selected output from these analyses: goodness-of-fit statistics; parameter estimates for the variance components and approximate p-values; and the fitted error covariance matrix of the composite residual, $\hat{\boldsymbol{\Sigma}}_r$. As in table 7.2, we fit these models with SAS PROC MIXED and restricted ML. Because each has identical fixed effects, we could have used either full or restricted methods to compare models. We chose restricted methods because the obtained goodness-of-fit statistics then reflect the fit of only the model’s stochastic portion, which is our focus here.

You compare these models in the usual way. A smaller deviance statistic indicates better fit, but because an improvement generally requires additional parameters, you must either formally test the hypotheses (if the models are nested) or use AIC and BIC statistics. Both penalize the log-likelihood of the fitted model for the number of parameters estimated, with the BIC exacting a higher penalty for increased complexity. The smaller the AIC and BIC statistics, the better the model fits.
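Both criteria can be reproduced directly from the deviance and the number of variance/covariance parameters, q. Under restricted ML, the BIC reported by SAS PROC MIXED penalizes by the log of the number of subjects (here 35), which is the convention the sketch below assumes; the three calls reproduce the values quoted in the text for the “standard,” unstructured, and Toeplitz models:

```python
import math

def aic_bic(deviance, q, n_subjects=35):
    """AIC and BIC from a REML deviance and q variance/covariance parameters."""
    return deviance + 2 * q, deviance + q * math.log(n_subjects)

print(aic_bic(1260.3, 4))    # "standard" model -> (1268.3, ~1274.5)
print(aic_bic(1255.8, 10))   # unstructured     -> (1275.8, ~1291.3)
print(aic_bic(1258.1, 4))    # Toeplitz         -> (1266.1, ~1272.3)
```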

7.3.1 Unstructured Error Covariance Matrix

An unstructured error covariance matrix is exactly what you would anticipate from its name: it has a general structure, in which each element of Σr takes on the value that the data demand.

Table 7.3: Selection of alternative error covariance structures for use with the multilevel model for change in opposites naming, including goodness-of-fit statistics, variance component estimates, and fitted error covariance matrices

[For each hypothesized structure—unstructured, compound symmetric, heterogeneous compound symmetric, autoregressive, heterogeneous autoregressive, and Toeplitz—the table reports the hypothesized error covariance structure Σr, goodness-of-fit statistics (−2LL, AIC, and BIC), the variance component parameters and estimates, and the fitted error covariance matrix $\hat{\boldsymbol{\Sigma}}_r$.]

~ p < .10; * p < .05; ** p < .01; *** p < .001.

Note: SAS PROC MIXED, Restricted ML.

For the opposites-naming data, an unstructured error covariance matrix has 10 unknown parameters: 4 variances and 6 covariances. In table 7.3, we represent these parameters as $\sigma^2_1$, $\sigma^2_2$, $\sigma^2_3$, $\sigma^2_4$, $\sigma_{21}$, $\sigma_{31}$, $\sigma_{32}$, $\sigma_{41}$, $\sigma_{42}$, and $\sigma_{43}$. (Notice that in expressing the various error covariance matrices in table 7.3, we constantly reuse the same symbols—$\sigma^2$, $\sigma^2_1$, $\sigma^2_2$, $\sigma_{21}$, ρ, and so on. Use of the same symbol does not imply that we are estimating the same parameter. For example, we use the symbol $\sigma^2_1$ for two entirely different purposes in the unstructured and compound symmetric error structures, and each of these differs from its use in the level-2 submodel in equation 7.2b.)

The great appeal of an unstructured error covariance structure is that it places no restrictions on the structure of Σr. For a given set of fixed effects, its deviance statistic will always be the smallest of any error covariance structure. If you have just a few waves of data, this choice can be attractive. But if you have many waves, it can require an exorbitant number of parameters. For 20 waves, you would need 20 variance parameters and 190 covariance parameters—210 parameters in all—whereas the “standard” model requires only 3 variance components ($\sigma^2_\varepsilon$, $\sigma^2_0$, and $\sigma^2_1$) and one covariance component, σ01.

In most analyses, a more parsimonious structure is desirable. Yet because the unstructured error covariance model always has the lowest deviance statistic, we usually begin exploratory comparisons here. For the opposites-naming data, we find a deviance statistic of 1255.8 for this model, about 4.5 points less than that for the “standard” model. But this modest improvement uses up 10 degrees of freedom (as opposed to the 4 in the “standard” model). It should come as no surprise then that the AIC and BIC statistics, which both penalize us for overuse of unknown parameters, are larger under this assumption than they are under the “standard” multilevel model (1275.8 vs. 1268.3 for AIC; 1291.3 vs. 1274.5 for BIC). So, of the two potential error structures, we prefer the “standard” to the unstructured. The excessive size of BIC, in particular (it is 16.8 points larger!), suggests that we are “wasting” considerable degrees of freedom in choosing an unstructured form for Σr.

7.3.2 Compound Symmetric Error Covariance Matrix

A compound symmetric error covariance matrix requires just two parameters, labeled $\sigma^2$ and $\sigma^2_1$ in table 7.3. Under compound symmetry, the diagonal elements of Σr are homoscedastic (with variance $\sigma^2 + \sigma^2_1$) on all occasions, and all pairs of residuals have a constant covariance, regardless of the times with which they are associated.

As we would expect, this model fits less well than the multilevel model with an unstructured Σr. But it also fits less well than the “standard” multilevel model. All three of its goodness-of-fit statistics are much larger: deviance is 26.7 points larger, AIC is 22.7 points larger, and BIC is 19.7 points larger. Interestingly, as specified in equation 7.15, a compound symmetric Σr is a special case of the “standard” model, when there is little or no residual variation (and hence no residual covariation) in the true slopes of the change trajectories across people. Since we know, from hypothesis tests in table 7.2, that the residual slope variability and covariability are not zero for these data, it comes as no surprise that compound symmetry is not an acceptable error covariance structure for these data. This form is most attractive, then, when you find little or no residual variance in slopes among the individual change trajectories.

7.3.3 Heterogeneous Compound Symmetric Error Covariance Matrix

The third error covariance matrix in table 7.3 is heterogeneous compound symmetric. In our example, this extension of the compound symmetric structure requires five parameters. Under heterogeneous compound symmetry, the diagonal elements of Σr are heteroscedastic (with variances $\sigma^2_1$, $\sigma^2_2$, $\sigma^2_3$, and $\sigma^2_4$, one for each occasion in these data). In addition, all pairs of errors have their own covariance (you can see this most easily in the fitted error covariance matrix in table 7.3). Specifically, these covariances are the products of the corresponding error standard deviations and a constant error autocorrelation parameter, labeled ρ, whose magnitude is always less than or equal to unity.

Based on the deviance statistics alone, a model with a heterogeneously compound symmetric Σr fits the opposites-naming data better than a compound symmetric model (1285.0 vs. 1287.0), but still not as well as the “standard” (1285 vs. 1260.3). So, too, the AIC and BIC statistics penalize the heterogeneous compound symmetry model for its additional parameters (AIC = 1295.0; BIC = 1302.7), over both the compound symmetric and the “standard.” We conclude that the heterogeneous compound symmetric model is probably less acceptable—for these data—than any other multilevel model fit so far.

7.3.4 Autoregressive Error Covariance Matrix

The fourth potential error covariance matrix in table 7.3 has an autoregressive (actually, first-order autoregressive) structure. Many researchers are drawn to an autoregressive error structure because its “band-diagonal” shape seems appropriate for growth processes. When Σr is first-order autoregressive, the elements on the main diagonal of Σr are homoscedastic (with variance $\sigma^2$). In addition, pairs of errors have identical covariances in bands parallel to the leading diagonal (again, examine the fitted error covariance matrix in table 7.3). These covariances are the product of the residual variance, $\sigma^2$, and an error autocorrelation parameter, labeled ρ, whose magnitude is again always less than, or equal to, unity. Error variance $\sigma^2$ is multiplied by ρ to provide the error covariances in the first band immediately below the leading diagonal, by ρ² in the band beneath that, by ρ³ in the band beneath that, and so on. Thus, because the magnitude of ρ is always fractional, the error covariances in the bands of Σr decline, the further you go from the leading diagonal. Although an autoregressive Σr “saves” considerable degrees of freedom—it uses only two variance components—its elements are tightly constrained: the common covariance in each band must be exactly ρ times the common covariance in the band immediately closer to the diagonal.
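In symbols, the first-order autoregressive block is σ²ρ^|j−j′|. A two-line numpy sketch, with placeholder values for σ² and ρ (table 7.3’s estimates are not reproduced in this text):

```python
import numpy as np

sigma2, rho = 1200.0, 0.8                 # illustrative values only
j = np.arange(4)
ar1_block = sigma2 * rho ** np.abs(j[:, None] - j[None, :])   # sigma^2 * rho^|j - j'|
print(ar1_block.round(1))
```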

Although the autoregressive model fits the opposites-naming data reasonably well, its constraints on the variance components’ relative magnitudes prevent it from fitting as well as the “standard” multilevel model for change. Both the deviance statistic (1265.9) and the AIC statistic (1269.9) are slightly larger than their peers in the “standard” multilevel model. On the other hand, the BIC statistic is slightly smaller in this model than it is in the “standard” (1273.0 for the former, 1274.5 for the latter), owing to the burden of additional parameters (4 vs. 2) in the “standard.” Interestingly, although it cannot compete in terms of deviance, the autoregressive model is superior to the unstructured model according to AIC and BIC (as you might expect, given the number of unknown parameters required by each model, 2 vs. 10).

7.3.5 Heterogeneous Autoregressive Error Covariance Matrix

The heterogeneous autoregressive error structure is a relaxed version of the strict autoregressive structure just described. Its main diagonal elements are heteroscedastic (with variances $\sigma^2_1$, $\sigma^2_2$, $\sigma^2_3$, and $\sigma^2_4$ for the four waves here). In addition, the bands of constant covariances between pairs of errors that appeared parallel to the main diagonal in the regular autoregressive model are free to differ in magnitude along the bands (again, examine the fitted error covariance matrix in table 7.3). This is achieved by multiplying the same error autocorrelation parameter, ρ, that appeared above by the product of the relevant error standard deviations. Thus, the band diagonal structure—with the magnitudes of the covariances declining across the bands, away from the main diagonal—is somewhat preserved, but loosened by the inclusion of additional variance components. A model with heterogeneous autoregressive Σr spends additional degrees of freedom, but benefits from additional flexibility over its simpler sibling.

As you might expect, the model with heterogeneous autoregressive error structure benefits in terms of the deviance statistic over the homogeneous autoregressive case, but can be penalized from the perspective of AIC and BIC. For these data, the model with heterogeneous autoregressive Σr fits less well than the “standard” multilevel model for change. Notice that, in the heterogeneous autoregressive model, the deviance statistic (1264.8), AIC (1274.8), and BIC (1282.6) are all larger than the equivalent statistics in the “standard.” As with its homogeneous sibling, although it cannot compete in terms of deviance, the heterogeneous autoregressive model is superior to the unstructured model according to both the AIC and BIC statistics.

7.3.6 Toeplitz Error Covariance Matrix

For the opposites-naming data, the Toeplitz error covariance structure represents a superior option. The Toeplitz structure has some of the characteristics of the autoregressive structure, in that it has bands of identical covariances arrayed parallel to the main diagonal. However, these elements are not forced to be an identical fraction of the elements in the prior band. Instead, the common covariance in each band is determined by the data, and the bands are not constrained to stand in identical ratios to one another. For the opposites-naming data, we need four variance components to specify a Toeplitz structure ($\sigma^2$, $\sigma_1$, $\sigma_2$, and $\sigma_3$ in table 7.3), and so Σr is more flexible than the homogeneous autoregressive structure but more parsimonious than its heterogeneous sibling.
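A Toeplitz block is fully determined by its first column—the common variance followed by one covariance per band—and scipy provides a direct constructor. The numbers below are placeholders, since table 7.3’s Toeplitz estimates are not reproduced in the text:

```python
import numpy as np
from scipy.linalg import toeplitz

# First column: variance, then the lag-1, lag-2, and lag-3 band covariances (illustrative).
first_col = np.array([1250.0, 1000.0, 880.0, 720.0])
toeplitz_block = toeplitz(first_col)     # symmetric, constant along each diagonal band
print(toeplitz_block)
```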

For these data, a Toeplitz error covariance structure fits better than the “standard” multilevel model for change and better than all other error covariance structures we have tested, regardless of which goodness-of-fit statistic you consult. The deviance statistic is 1258.1 (as opposed to 1260.3), AIC is 1266.1 (as opposed to 1268.3), and BIC is 1272.3 (as opposed to 1274.5). As we discuss below, however, these differences in goodness-of-fit are relatively small.

7.3.7 Does Choosing the “Correct” Error Covariance Structure Really Matter?

The error covariance structures presented in table 7.3 are but the beginning. Even though the Toeplitz structure appears marginally more successful than the implicit error covariance structure of the “standard” multilevel model for change, it is entirely possible that there are other error structures that would be superior for these data. Such is the nature of all data analysis. In fitting these alternative models, we have refined our estimates of the variance components and have come to understand better the model’s stochastic component. We would argue that, for these data, the “standard” multilevel model for change performs well—its deviance, AIC, and BIC statistics are only marginally worse than those of the Toeplitz model. The difference in BIC statistics—2.2 points—is so small that, adopting Raftery’s (1995) guidelines, we would conclude that there is only weak evidence that adoption of a Toeplitz error structure improves on the “standard” multilevel model.

If you focus exclusively on the deviance statistic, however, an unstructured error covariance matrix always leads to the best fit. This model will always fit better than the “standard” model, and than any other model that is constrained in some way. The question is: How much do we sacrifice if we choose the unstructured model over these others? For these data, it cost 10 degrees of freedom to achieve this best fit for the model’s stochastic portion—five more degrees of freedom than any other error covariance structure considered here. Although some might argue that losing an additional handful of degrees of freedom is a small price to pay for optimal modeling of the error structure—a consequence of the fact that we have only four waves of data—in other settings, with larger panels of longitudinal data, few would reach this conclusion.

Perhaps most important, consider how choice of an error covariance structure affects our ability to address our research questions, especially given that it is the fixed effects—and not the variance components—that usually embody these questions. Some might say that refining the error covariance structure for the multilevel model for change is akin to rearranging the deck chairs on the Titanic—it rarely fundamentally changes our parameter estimates. Indeed, regardless of the error structure chosen, estimates of the fixed effects are unbiased and may not be affected much by choices made in the stochastic part of the model (providing that neither the data, nor the error structure, are idiosyncratic).

But refining our hypotheses about the error covariance structure does affect the precision of estimates of the fixed effects and will therefore impact hypothesis testing and confidence interval construction. You can see this happening in table 7.4, which displays estimates of the fixed effects and asymptotic standard errors for three multilevel models for change in opposites-naming: the “standard” model (from table 7.2) and models with a Toeplitz and unstructured error covariance matrix (from table 7.3). Notice that the magnitudes of the estimated fixed effects

Table 7.4: Change in opposites naming score over a four-week period, as a function of baseline IQ

                                                 Model with …
Parameter                            Standard error          Toeplitz error          Unstructured error
                                     covariance structure    covariance structure    covariance structure

Fixed Effects
  Initial status, π0i
    Intercept                 γ00      164.37***               165.10***               165.83***
                                       (6.206)                 (5.923)                 (5.952)
    $(COG - \overline{COG})$  γ01       −0.11                   −0.00                   −0.07
                                       (0.504)                 (0.481)                 (0.483)
  Rate of change, π1i
    Intercept                 γ10       26.96***                26.895***               26.58***
                                       (1.994)                 (1.943)                 (1.926)
    $(COG - \overline{COG})$  γ11        0.43**                  0.44**                  0.46**
                                       (0.162)                 (0.158)                 (0.156)

Goodness-of-fit
  Deviance                            1260.3                  1258.1                  1255.8
  AIC                                 1268.3                  1266.1                  1275.8
  BIC                                 1274.5                  1272.3                  1291.3

~ p < .10; * p < .05; ** p < .01; *** p < .001.

Parameter estimates (standard errors), approximate p-values, and goodness-of-fit statistics after fitting a multilevel model for change with standard, Toeplitz, and unstructured error covariance structures (n = 35).

Note: SAS PROC MIXED, Restricted ML.

are relatively similar (except, as you might expect, for γ01, which is not statistically significant anyway). But also notice that the respective asymptotic standard errors decline as the error covariance structure is better represented. The standard errors are generally smaller in the Toeplitz and unstructured models than in the “standard,” although differences between the Toeplitz and unstructured models are less unanimous. You should find it reassuring that—given the widespread application of the “standard” multilevel model for change—the differences in precision shown here are small and likely inconsequential. Of course, this conclusion is specific to these data; ensuing differences in precision may be greater in some data sets, depending on the design, the statistical model, the choices of error covariance structure, and the nature of the forces that bind the repeated observations together. To learn more about this topic, we refer interested readers to Van Leeuwen (1997); Goldstein, Healy, and Rasbash (1994); and Wolfinger (1993, 1996).

Notes:

(1.) Specifically, the identical blocks in the error covariance matrix in equation 7.8 require completely balanced data.