Statistical Analysis of Epidemiologic Data

Steve Selvin

Print publication date: 2004

Print ISBN-13: 9780195172805

Published to Oxford Scholarship Online: September 2009

DOI: 10.1093/acprof:oso/9780195172805.001.0001


PRINTED FROM OXFORD SCHOLARSHIP ONLINE (www.oxfordscholarship.com). (c) Copyright Oxford University Press, 2018. All Rights Reserved. Under the terms of the licence agreement, an individual user may print out a PDF of a single chapter of a monograph in OSO for personal use (for details see http://www.oxfordscholarship.com/page/privacy-policy). Subscriber: null; date: 17 August 2018

Survival Data: Proportional Hazards Model


Chapter:
13 SURVIVAL DATA: PROPORTIONAL HAZARDS MODEL
Source:
Statistical Analysis of Epidemiologic Data
Author(s):

Steve Selvin

Publisher:
Oxford University Press
DOI:10.1093/acprof:oso/9780195172805.003.13

Abstract and Keywords

The success of a model-based approach depends on choosing a model that accurately reflects the relationships within the data. This choice requires knowledge of the statistical properties of the model and a clear understanding of the phenomenon being investigated. One of the many useful models applied to survival data is the proportional hazards model. This chapter describes this model in simple terms, illustrating its properties and providing insight into the process of analyzing survival experience data using statistical modeling techniques.

Keywords:   survival data analysis, statistical models, survival curves, epidemiologic data analysis

It is rarely sufficient to demonstrate that one group of individuals has a significantly longer survival time than another. Pursuit of plausible explanations for the observed differences is an integral part of understanding survival experience. Survival time, like risk measured by a probability, is almost always affected by a number of interrelated factors. Factors such as age, severity of disease, past health status, and race/ethnicity provide additional information that likely improves the description of survival data. The investigation of the role of these explanatory variables usually requires employing a statistical model. Such a model formally relates a series of variables to an individual’s survival time. The analysis of the effect of these explanatory variables is conducted in much the same manner as the assessment of the risk variables in a logistic or Poisson regression model. Of course, a statistical model, at best, approximates the unknown underlying situation. Alternative approaches, however, are rarely possible without large amounts of data, making a statistical model a basic tool in the investigation of variables relevant to survival.

The success of a model-based approach depends on choosing a model that accurately reflects the relationships within the data. This choice requires, as always, knowledge of the statistical properties of the model and a clear understanding of the phenomenon under investigation. One of many useful models applied to survival data is the popular proportional hazards model. In the following, this sometimes complex model is described in simple terms, with the dual purpose of illustrating its properties and providing insight into the process of analyzing survival experience data using statistical modeling techniques. The multivariable analysis of survival data is a mathematically sophisticated topic, and a number of texts are completely devoted to the many aspects of the statistical analysis of failure time data (for example, [1], [2], and [3]). This chapter is a brief introduction to the application of one specific model.

SIMPLEST CASE

The simplest application of a proportional hazards model (sometimes called the “Cox” model, after the statistician D. R. Cox, who originated the analytic approach) involves the comparison of two groups made up of individuals with varying survival times, some of which may be censored. A small hypothetical data set of 12 subjects provides an introduction (Table 13–1). These data consist of two treatment groups (A and B), each with six individuals, and a total of nine complete and three incomplete survival times.

Before employing a proportional hazards model to compare the two treatments A and B, it is useful to apply the log-rank test to assess differences in survival times for this small data set. When the analysis involves simply comparing two groups, the log-rank approach is a special case of the proportional hazards model, as will be illustrated. Classifying the hypothetical data into nine 2 × 2 tables, one for each of the nine complete survival times, is the first step in the log-rank procedure (Table 13–2).

The same data are fully displayed in Figure 13–1. The Mantel-Haenszel chi-square test (log-rank test) for evaluating an association between two variables stratified into k strata (2 × 2 tables) is, once again (Chapters 7 and 12),

$$X^2 = \frac{\left[\sum a_i - \sum \hat{A}_i\right]^2}{\sum \text{variance}(a_i)},$$

with the sums taken over the k = 9 strata (the Total row of Table 13–2), yielding a p-value of 0.022. The data show evidence of a nonrandom difference in survival time associated with the two treatments, A and B.

Table 13–1. Hypothetical data

Treatment A: 5, 8, 12, 22+, 37, 41

Treatment B: 23, 40, 43, 51+, 53+, 62

Table 13–2. Hypothetical data displayed in nine 2 × 2 tables stratified by survival time: summary

| Time (t) | ai | ai + bi | ni | Âi | ai − Âi | Variance(ai) |
|---|---|---|---|---|---|---|
| 5 | 1 | 6 | 12 | 6/12 | 6/12 | 396/1584 = 0.250 |
| 8 | 1 | 5 | 11 | 5/11 | 6/11 | 300/1210 = 0.248 |
| 12 | 1 | 4 | 10 | 4/10 | 6/10 | 216/900 = 0.240 |
| 23 | 0 | 2 | 8 | 2/8 | −2/8 | 84/448 = 0.188 |
| 37 | 1 | 2 | 7 | 2/7 | 5/7 | 60/294 = 0.204 |
| 40 | 0 | 1 | 6 | 1/6 | −1/6 | 25/180 = 0.139 |
| 41 | 1 | 1 | 5 | 1/5 | 4/5 | 16/100 = 0.160 |
| 43 | 0 | 0 | 4 | 0/4 | 0 | 0/48 = 0.000 |
| 62 | 0 | 0 | 1 | 0/1 | 0 | |
| Total | 5 | | | 2.257 | 2.743 | 1.428 |

Aside: The expected value Âi and the estimated variance of the distribution of ai in the situation where no survival times are identical can be simplified over the previous more general expressions as

$$\hat{A}_i = \frac{a_i + b_i}{n_i} \quad \text{and} \quad \text{variance}(a_i) = \frac{(a_i + b_i)\,[\,n_i - (a_i + b_i)\,]}{n_i^2}.$$
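The stratified calculation summarized in Table 13–2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `log_rank` and the `(time, died)` data layout are hypothetical choices, and the simplified no-ties expressions from the aside are used. The totals it produces agree with the Total row of Table 13–2, and the resulting statistic is about 5.27, close to the value of 5.286 quoted for the log-rank test later in the chapter.

```python
# Hypothetical data from Table 13-1 as (time, died) pairs;
# died=False marks a censored survival time (printed with "+").
group_a = [(5, True), (8, True), (12, True), (22, False), (37, True), (41, True)]
group_b = [(23, True), (40, True), (43, True), (51, False), (53, False), (62, True)]


def log_rank(a, b):
    """Return (sum a_i, sum A_i, sum variance, chi-square), assuming no tied times."""
    death_times = sorted(t for t, died in a + b if died)
    observed = expected = variance = 0.0
    for t in death_times:
        at_risk_a = sum(1 for s, _ in a if s >= t)   # a_i + b_i
        at_risk_b = sum(1 for s, _ in b if s >= t)   # n_i - (a_i + b_i)
        n = at_risk_a + at_risk_b                    # n_i
        observed += sum(1 for s, died in a if died and s == t)  # a_i
        expected += at_risk_a / n                    # simplified expected value
        variance += at_risk_a * at_risk_b / n ** 2   # simplified variance
    chi_square = (observed - expected) ** 2 / variance
    return observed, expected, variance, chi_square


obs, expd, var, x2 = log_rank(group_a, group_b)
print(f"sum a_i = {obs:.0f}, sum A_i = {expd:.3f}, variance = {var:.3f}, X2 = {x2:.2f}")
```

The censored times (22+, 51+, 53+) never define a stratum, but they remain in the risk sets until they are censored, which is how the incomplete observations contribute information.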

In its simplest form, the proportional hazards model postulates that the hazard function associated with survival times for individuals in treatment group A is related to the hazard function associated with survival times for individuals in treatment group B by a multiplicative constant or, in symbols,

$$\lambda_B(t) = c\,\lambda_A(t),$$

where λA(t) and λB(t) represent hazard functions describing the survival associated with treatments A and B, respectively. Unlike the exponential model, these hazard rates can vary depending on time t. For example, both could increase with increasing survival time, but the ratio remains constant.

Figure 13–1. Full display of the hypothetical data by recorded times of death for group A and group B in a series of 2 × 2 tables.

The constant c can be estimated from the observed survival times. Continuing to use the hypothetical data, the estimated constant of proportionality for the two hazard rates associated with treatments A and B is ĉ = 0.168. The hazard rate for treatment B is then estimated to be about six times smaller than for treatment A for all times t [λA(t) = 5.962λB(t), or λB(t) = 0.168λA(t)]. The estimation of the parameter c, like the previous regression estimates, requires a computer algorithm.

In most cases, the collected survival data are too sparse to estimate reliably the hazard functions themselves. The estimate of the ratio c, however, provides a summary measure of the difference in survival experiences between two groups when the hazard functions are proportional. The entire data set (all 12 observations) is efficiently focused on the estimation of a single parameter.

A special property of the proportional hazards model is that the estimate of the proportionality constant (c) does not require the actual form of the hazard function to be specified. Thus, the comparison of the relative survival between the two groups can be efficiently summarized and statistically compared without knowledge or assumptions about the functional form of the hazard functions λA(t) or λB(t), as long as they are proportional. In addition, the estimation of the constant c makes use of both the complete and the incomplete observations in the sampled data. Not unlike the product-limit estimate of the survival probabilities, information from the censored observations is used to form an unbiased estimate of c when it is relevant and ignored when it is not.

In terms of survival curves, the property of proportionality of hazard functions translates to

$$S_B(t) = [S_A(t)]^{c},$$

where SA(t) and SB(t) represent the survival curves for treatments A and B. If one survival curve is a known standard or referent curve, then the other is directly related when the hazard functions are proportional. For example, for treatment A, if SA(t0) = 0.6 at a specific time t0, then $\hat{S}_B(t_0) = [S_A(t_0)]^{\hat{c}} = [0.6]^{0.168} = 0.918$ explicitly gives the probability of survival associated with treatment B in terms of the probability of survival associated with treatment A. The survival probabilities SB(t) will be greater than SA(t) when λB(t) is less than λA(t) (c < 1), which is not surprising because a smaller hazard rate implies a higher probability of survival. Thus, the estimated survival curve ŜA(t) will always be below the estimated survival curve ŜB(t) for all values of t because ĉ = 0.168 is less than one.
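A short numerical illustration of this power relationship follows. The function name `survival_b` and the extra survival probabilities 0.9 and 0.3 are hypothetical; only the value 0.6 and the estimate ĉ = 0.168 come from the text.

```python
def survival_b(survival_a, c):
    """Survival probability for treatment B implied by S_B(t) = [S_A(t)]**c."""
    return survival_a ** c


c_hat = 0.168  # estimated constant of proportionality from the text

for s_a in (0.9, 0.6, 0.3):
    s_b = survival_b(s_a, c_hat)
    print(f"S_A = {s_a:.1f}  ->  S_B = {s_b:.3f}")
```

For every survival probability strictly between 0 and 1, raising it to a power smaller than one increases it, which is why the treatment B curve lies above the treatment A curve when ĉ is less than one.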

Standard methods can be used to evaluate the effect of sampling variation on the estimate ĉ. To test the null hypothesis that the hazard functions are the same (H0: c = 1), one effective technique is to compare two log-likelihood statistics. First a log-likelihood value is estimated under the conditions that λB(t) = λA(t) and a second value under the conditions that λB(t) = cλA(t). For the example data, the two log-likelihood statistics are $L_{c=1} = 31.996$ when the compared hazard functions are the same and $L_{c \neq 1} = 27.148$ when they differ systematically. The log-likelihood values are produced as part of the estimation process. The difference $X^2 = L_{c=1} - L_{c \neq 1} = 4.849$ has an approximate chi-square distribution with one degree of freedom when the two groups differ by chance alone, producing a p-value of P(X² ≥ 4.849 | c = 1) = 0.028. The degrees of freedom are the difference in the number of parameters used to describe the two compared models. This comparison of log-likelihood statistics is essentially the same process used in logistic and Poisson regression analyses.
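The estimation step can be sketched for the 12 hypothetical observations. The code below assumes that the estimate comes from maximizing Cox's partial likelihood with no tied survival times, and that the quoted log-likelihood statistics are −2 times the log partial likelihood; neither detail is spelled out in the text, and the function name `partial_loglik` and the 0/1 group coding are hypothetical. Under those assumptions a few Newton-Raphson steps give values that agree with the figures quoted above (ĉ close to 0.168, statistics close to 31.996 and 27.148).

```python
import math

# (time, died, x): x = 0 for treatment A, x = 1 for treatment B (Table 13-1).
data = [(5, 1, 0), (8, 1, 0), (12, 1, 0), (22, 0, 0), (37, 1, 0), (41, 1, 0),
        (23, 1, 1), (40, 1, 1), (43, 1, 1), (51, 0, 1), (53, 0, 1), (62, 1, 1)]


def partial_loglik(beta):
    """Log partial likelihood with its first and second derivatives at beta = log(c)."""
    loglik = score = hessian = 0.0
    for t, died, x in data:
        if not died:
            continue  # censored times enter only through the risk sets
        risk = [xj for tj, _, xj in data if tj >= t]
        denom = sum(math.exp(beta * xj) for xj in risk)
        p_b = sum(math.exp(beta * xj) for xj in risk if xj == 1) / denom
        loglik += beta * x - math.log(denom)
        score += x - p_b
        hessian -= p_b * (1.0 - p_b)
    return loglik, score, hessian


beta = 0.0
for _ in range(25):  # Newton-Raphson iterations
    _, score, hessian = partial_loglik(beta)
    beta -= score / hessian

c_hat = math.exp(beta)
stat_null = -2.0 * partial_loglik(0.0)[0]    # hazards identical (c = 1)
stat_alt = -2.0 * partial_loglik(beta)[0]    # hazards proportional (c estimated)
x2 = stat_null - stat_alt
p_value = math.erfc(math.sqrt(x2 / 2.0))     # chi-square upper tail, 1 degree of freedom
variance_log_c = -1.0 / partial_loglik(beta)[2]

print(f"c-hat = {c_hat:.3f}, X2 = {x2:.3f}, p-value = {p_value:.3f}")
```

Only the ordering of the survival times and the composition of each risk set enter the calculation, which is why no form for the hazard functions needs to be specified.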

Alternatively, a statistical test of the estimate ĉ in terms of the log(ĉ) is equally effective. The test-statistic

$$z = \frac{\log(\hat{c}) - 0}{\sqrt{\text{variance}[\log(\hat{c})]}}$$

has an approximate standard normal distribution when c = 1 or log(c) = 0. For the hypothetical data, the logarithm of ĉ is log(0.168) = −1.785, with a variance[log(ĉ)] = 0.745 (estimated as part of the estimation process). Therefore, $z = -1.785/\sqrt{0.745} = -2.068$ and the corresponding p-value is 0.039. This again is statistical evidence that the survival experience is likely different between treatments A and B (c ≠ 1).
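The arithmetic of this test is easy to check. The sketch below takes the two quoted values as given and uses the standard library's complementary error function to obtain the two-sided normal tail area; the variable names are illustrative only.

```python
import math

log_c_hat = -1.785       # log(c-hat) quoted in the text
variance_log_c = 0.745   # estimated variance of log(c-hat) quoted in the text

# Test statistic for H0: log(c) = 0
z = (log_c_hat - 0.0) / math.sqrt(variance_log_c)

# Two-sided p-value from the standard normal distribution
p_value = math.erfc(abs(z) / math.sqrt(2.0))

print(f"z = {z:.3f}, p-value = {p_value:.3f}")
```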

Applied to the comparison of the two groups, the log-rank test always gives results similar to the difference between log-likelihood statistics, particularly when large sample sizes are involved. For example, the log-rank test-statistic X² = 5.286 (p-value = 0.022) is close to the log-likelihood test-statistic X² = 4.849 (p-value = 0.028) for the hypothetical data. The log-likelihood and log-rank approaches are essentially the same in all cases because the parameter c of the proportional hazards model is estimated by a process that also stratifies the survival data on the time of failure and combines information from each stratum to estimate the overall constant of proportionality. Also similar to the log-rank procedure, the proportional hazards model is nonparametric in the sense that no need exists to specify the form of the hazard functions or survival curves to assess the relative influence of two treatments and, as mentioned, produces an estimate of the parameter c that is not biased by the presence of censored data. In other words, the log-rank test in this two-sample case is a special case of the proportional hazards model [4].

The data described in Chapter 12 reflecting the efficacy of two treatments for acute myelogenous leukemia (AML) [4] can be analyzed based on the conjecture that the hazard functions associated with the two treatments, maintained (M) and nonmaintained (NM), are proportional. The estimated constant of proportionality is ĉ = 0.405. Therefore, λM(t) = ĉλNM(t) = 0.405λNM(t). The risk of a relapse (reflected by proportional hazard functions) in the maintained group of leukemia patients is estimated to be considerably less than the risk experienced by the subjects receiving no special chemotherapy (a 2.5-fold difference).

The difference in log-likelihood statistics produces $X^2 = L_{c=1} - L_{c \neq 1} = 85.796 - 82.500 = 3.296$, which has an approximate chi-square distribution with one degree of freedom and yields a p-value of P(X² ≥ 3.296 | no treatment effect) = 0.069. Assuming proportional hazard functions, the comparison of log-likelihood statistics produces borderline evidence of a systematic difference between these two treatments. This result is expectedly similar to the log-rank analysis of the same data (X² = 3.396 with a p-value = 0.065; Table 12–17).

To explore these data further, it is assumed that the survival experience in both nonmaintained and maintained groups is described by an exponential function (Chapter 12), providing a simple and direct comparison of the two groups. The exponential model produces an estimated hazard rate for each group, λ̂NM (nonmaintained) and λ̂M (maintained), with time measured in weeks, and a ratio based on these estimates, λ̂M/λ̂NM. The estimated hazards ratio based on the proportional hazards model is, again, ĉ = 0.405. The similarity of the two ratios implies that the hazard rates are not only proportional but also likely constant with respect to time. The resulting estimated survival curves are displayed in Figure 13–2.

THE PROPORTIONAL HAZARDS MODEL

A general additive proportional hazards model with k explanatory variables is expressed as

$$\lambda_i(t \mid x_{i1}, x_{i2}, \ldots, x_{ik}) = \lambda_0(t) \times c_i = \lambda_0(t)\, e^{b_1 x_{i1} + b_2 x_{i2} + \cdots + b_k x_{ik}}.$$

The value λi(t | xi1, xi2, …, xik) represents the hazard rate for the ith observation with a specific set of explanatory values at time t relative to λ0(t), an arbitrary “baseline” hazard rate also at time t. The value xij represents one of a series of k measures on each observation (jth variable, ith group or individual). The coefficient bj directly reflects the influence of the jth explanatory variable on the hazard rate. Like the odds ratio from the additive logistic model, the proportional hazards model dictates that the proportionality constant ci factors into a series of multiplicative components each associated with the influence of a specific explanatory variable. Specifically, the value ci can be written as

$$c_i = e^{b_1 x_{i1}} \times e^{b_2 x_{i2}} \times \cdots \times e^{b_k x_{ik}} = \prod_{j=1}^{k} e^{b_j x_{ij}}.$$

The relative contribution to survival time from each variable xij is reflected by $e^{b_j x_{ij}}$. The quantity $e^{b_j}$ is called the relative hazard for the jth explanatory variable.

Figure 13–2. Survival curves for the AML data for the maintained and nonmaintained groups.
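The multiplicative structure of the proportionality constant can be illustrated with a small sketch. The coefficients and explanatory values below are made up purely for illustration, and the function names are hypothetical.

```python
import math


def proportionality_constant(b, x):
    """c_i = exp(b_1*x_i1 + ... + b_k*x_ik)."""
    return math.exp(sum(bj * xj for bj, xj in zip(b, x)))


def relative_contributions(b, x):
    """Separate multiplicative components exp(b_j * x_ij), one per variable."""
    return [math.exp(bj * xj) for bj, xj in zip(b, x)]


b = [0.5, -0.2, 0.1]   # hypothetical coefficients b_1, b_2, b_3
x = [1.0, 3.0, 2.0]    # hypothetical explanatory values x_i1, x_i2, x_i3

c_i = proportionality_constant(b, x)
parts = relative_contributions(b, x)
print(c_i, parts, math.prod(parts))  # the product of the parts reproduces c_i
```

Because the components multiply, changing one explanatory variable rescales the hazard by the same factor whatever the values of the other variables, which is the sense in which the model is additive on the logarithmic scale.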

A common and useful practice is to “center” the x-variables so that the proportional hazards model becomes

$$\lambda_i(t \mid x_{i1}, x_{i2}, \ldots, x_{ik}) = \lambda_0(t)\, e^{b_1 (x_{i1} - \bar{x}_1) + b_2 (x_{i2} - \bar{x}_2) + \cdots + b_k (x_{ik} - \bar{x}_k)},$$

where $\bar{x}_j$ is the mean of the jth explanatory variable based on all n sampled observations. For this form of the model, the “baseline” hazard function λ0(t) becomes the hazard function when all x-variables are at their mean values (the “average” hazard function).

Two alternative forms of the additive proportional hazards model are

$$\log\left[\frac{\lambda_i(t \mid x_{i1}, \ldots, x_{ik})}{\lambda_0(t)}\right] = b_1 x_{i1} + b_2 x_{i2} + \cdots + b_k x_{ik}$$

and

$$\log[\lambda_i(t \mid x_{i1}, \ldots, x_{ik})] = \log[\lambda_0(t)] + b_1 x_{i1} + b_2 x_{i2} + \cdots + b_k x_{ik},$$

showing more directly the way the explanatory variables (xij) relate to the hazard function. The first expression indicates that the logarithm of the ratio of proportional hazard functions does not depend on time but only on the explanatory variables xij. The second expression shows the proportional hazards model in a form analogous to a multivariable linear regression model with an “intercept” term that depends on follow-up time separated from a weighted sum of explanatory variables that does not depend on follow-up time. In both cases, the role of the explanatory variables (xij) is determined by the bj-coefficients in much the same way as the coefficients from linear regression models in general.

The basic proportional hazards model requires that the explanatory variables do not change over time. The influence of time and the influence of the explanatory variable are separate components of the model, which is another way of saying that the components of the constant of proportionality ci are unrelated to time. The proportional hazards model, however, can be modified so that the explanatory variables also depend on time. A treatment could vary during the follow-up period or characteristics of individuals such as blood pressure or cholesterol levels could change over time. The analysis of survival time data in the presence of such time-dependent explanatory variables is discussed elsewhere ([1] or [2]).

The hazard function and the survival curve are related. As mentioned, high hazard rates lead to low survival probabilities and conversely. Formally,

$$S(t) = \exp\left[-\int_0^t \lambda(u)\,du\right].$$

When two hazard functions are proportional, then λi(t) = λ0(t)ci and

$$S_i(t) = \exp\left[-\int_0^t c_i\,\lambda_0(u)\,du\right] = [S_0(t)]^{c_i}.$$

In the special case where the survival times have an exponential distribution (i.e., $S(t) = e^{-\lambda t}$), then

$$S_i(t) = e^{-\lambda_i t} = e^{-c_i \lambda_0 t} = [S_0(t)]^{c_i}.$$
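The link between a hazard function and its survival curve can be checked numerically. The sketch below assumes the baseline hazard λ0(t) = 1/(100 − t), the function named in the caption of Figure 13–3, for which the integral has the closed form S0(t) = 1 − t/100. The trapezoidal integration and the function names are illustrative choices, not part of the original analysis.

```python
import math


def baseline_hazard(t):
    """Baseline hazard lambda_0(t) = 1 / (100 - t), defined for 0 <= t < 100."""
    return 1.0 / (100.0 - t)


def survival(t, c=1.0, steps=10_000):
    """S(t) = exp(-integral from 0 to t of c * lambda_0(u) du), trapezoidal rule."""
    h = t / steps
    total = 0.5 * (baseline_hazard(0.0) + baseline_hazard(t))
    total += sum(baseline_hazard(k * h) for k in range(1, steps))
    return math.exp(-c * h * total)


s0 = survival(40.0)           # baseline survival, analytically 1 - 40/100 = 0.6
s2 = survival(40.0, c=2.0)    # hazard doubled at every time point
print(f"S_0(40) = {s0:.4f}, S_c(40) = {s2:.4f}, [S_0(40)]^2 = {s0 ** 2:.4f}")
```

Doubling the hazard at every time point squares the survival probability, as the proportionality relation requires.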

The proportional hazards model is called a semiparametric model because it is made up of nonparametric and parametric components. The nonparametric property stems from the fact that the hazard functions are unspecified, and it is not necessary to describe this part of the model in a parametric form. The relationship between the explanatory variables and the survival times, however, is parametric. Specifically, the role of each explanatory variable is directly reflected by a parameter (bj), which is a fundamental element of the proportional hazards model.

PLOTTING SURVIVAL CURVES

When two hazard functions are proportional, the survival curves do not cross, as noted. Suppose that two groups (1 and 2) have proportional hazard functions where c = 2, then $S_2(t) = [S_1(t)]^2$. These two survival curves do not cross because S2(t) < S1(t) for all values of t, but it is usually not obvious from the plots of survival curves that the hazard functions are proportional (Fig. 13–3, top). The transformation log{−log[S(t)]} is useful. For proportional hazards functions, this transformation produces curves that are parallel and differ only because of the influence of the explanatory variables. Figure 13–3 (bottom left) shows the transformed functions of S1(t) and S2(t). When the curves representing the transformed values of S1(t) and S2(t) are parallel, the plot of one against the other is a single straight line (Fig. 13–3, bottom right). In general, $S_2(t) = [S_1(t)]^c$ requires log(−log[S2(t)]) − log(−log[S1(t)]) = log(c), because −log[S2(t)] = c × (−log[S1(t)]).
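A small numerical sketch shows why the transformed curves are parallel. It assumes the survival curve S1(t) = 1 − t/100, which corresponds to the baseline hazard 1/(100 − t) named in the caption of Figure 13–3, together with c = 2; the function name and the chosen time points are hypothetical.

```python
import math


def log_minus_log(s):
    """The transformation log(-log(S)) for a survival probability 0 < S < 1."""
    return math.log(-math.log(s))


c = 2.0
differences = []
for t in (10, 30, 50, 70, 90):
    s1 = 1.0 - t / 100.0   # survival curve for group 1
    s2 = s1 ** c           # group 2 under proportional hazards
    differences.append(log_minus_log(s2) - log_minus_log(s1))

print(differences)         # every entry equals log(c)
print(math.log(c))
```

The vertical distance between the two transformed curves is log(c) at every time point, so a roughly constant gap in a plot of estimated curves is consistent with proportional hazards.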

All mathematical functions that are proportional, by definition of proportional, do not cross. This fact can be applied to proportional hazard analysis as part of exploring the issues of whether the data support the use of the “Cox” model, because proportional hazards guarantees that the survival curves do not cross.

Simplifying transformations and the geometry of the survival curves are fundamental tools in the difficult task of deciding if the relationships within a data set are meaningfully represented by a proportional hazards model. The plot of a “log-log” transformation of the survival curves is a good place to start a goodness-of-fit evaluation. However, the fact that the survival curves do not cross when the hazard functions are proportional does not mean that the estimates of the survival curves do not cross. Estimates, subject to random variation, can fluctuate to an extent where plots cross even when the underlying hazard functions are proportional.

Figure 13–3. Comparisons of two hypothetical survival curves S1(t) and S2(t) based on λ0(t) = 1/(100 − t).

FOUR APPLICATIONS OF A PROPORTIONAL HAZARDS MODEL

Application I: CD4 Counts, Serum β 2-Microglobulin, and AIDS

A bivariate proportional hazards model addresses two fundamental questions: Do the two explanatory variables have independent influences? And if they do, what are their relative contributions to survival time? These two issues arise in the study of HIV-positive patients and the relationship of two predictors of AIDS: CD4 counts and serum β2-microglobulin levels. Past studies show levels of both measures correlate with progressively severe illness. A two-variable proportional hazards model describing the time to AIDS (“survival”) allows the evaluation of these two measures as prognostic tools for AIDS among HIV-infected individuals.

The proportional hazards model

$$\lambda_i(t \mid x_{1i}, x_{2i}) = \lambda_0(t)\, e^{b_1 (x_{1i} - 800) + b_2 (x_{2i} - 200) + b_3 (x_{1i} - 800)(x_{2i} - 200)}$$

is a first step in exploring the issue of the independence of effects. The symbol x1i represents the CD4 count, and x2i represents the β2-microglobulin level measured on the ith participant. The expression λ0(t) represents the hazard function for individuals with CD4 counts of 800 and levels of serum β2-microglobulin of 200. Estimates based on this statistical model allow evaluation of the degree of interaction indicated by the data. In a fashion almost identical to the bivariate logistic regression analysis, the key parameter b3 reflects the magnitude of the interaction between the explanatory variables x1 (CD4) and x2 (β2).

Data from the San Francisco Men’s Health Study [5] provide n = 348 seropositive homosexual or bisexual men who were interviewed every six months so that their time from HIV diagnosis to AIDS (in months) is known (time-to-aids = “survival time”). Among the 348 study participants, 219 converted to AIDS (complete observations), and 129 remained HIV-positive (censored observations) over 40 months of follow-up. The CD4 counts and β2-microglobulin levels were measured when HIV-positive participants entered the study. Applying a bivariate proportional hazards model to these data provides estimates of the three parameters b 1, b 2, and b 3 (Table 13–3).

The effect of the interaction can be assessed by postulating that the coefficient b3 is zero and, as usual, the test-statistic z = b̂3/SE(b̂3) = −0.604 is then viewed as a random observation sampled from an approximate standard normal distribution. The associated p-value is P(|Z| ≥ 0.604 | b3 = 0) = 0.546.

Alternatively, the interaction model can be contrasted with the additive model (b3 set to zero). The proportional hazards model becomes

$$\lambda_i(t \mid x_{1i}, x_{2i}) = \lambda_0(t)\, e^{b_1 (x_{1i} - 800) + b_2 (x_{2i} - 200)}.$$

Fitting this additive model, in which CD4 count and serum β2-microglobulin are required to be independent (no interaction) contributors to time-to-aids, produces estimates of b1 and b2 (Table 13–4).

The increase in log-likelihood statistics measures the effect of excluding the interaction term from the model. The chi-square test-statistic is X² = 1995.952 − 1995.591 = 0.361 (one degree of freedom), yielding a p-value = P(X² ≥ 0.361 | b3 = 0) = 0.548, showing again no evidence of an interaction between CD4 counts and β2-microglobulin levels.

Table 13–3. Estimated model coefficients for CD4 counts and β2-microglobulin levels from a proportional hazards analysis including an interaction term

| Term | Coefficient | SE | p-value | Hazard ratio |
|---|---|---|---|---|
| CD4 (b̂1) | −0.00149 | 0.00125 | < 0.001 | 0.999 |
| β2 (b̂2) | 0.00348 | 0.00038 | 0.016 | 1.003 |
| CD4 × β2 (b̂3) | −0.0000023 | 0.0000038 | 0.546 | 0.999 |

−2 log-likelihood = 1995.591; number of model parameters = 3.

Table 13–4. Estimated model coefficients for CD4 counts and β2-microglobulin levels from a proportional hazards analysis excluding the interaction term: an additive model

| Term | Coefficient | SE | p-value | Hazard ratio |
|---|---|---|---|---|
| CD4 (b̂1) | −0.00162 | 0.00033 | < 0.001 | 0.998 |
| β2 (b̂2) | 0.00404 | 0.00082 | < 0.001 | 1.004 |

−2 log-likelihood = 1995.952; number of model parameters = 2.
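Both assessments of the interaction term can be reproduced from the printed tables. The sketch below takes the interaction coefficient, its standard error, and the two −2 log-likelihood values from Tables 13–3 and 13–4; the helper names are hypothetical, and because the printed coefficient is rounded, the Wald statistic agrees with the quoted 0.604 only to about two decimal places.

```python
import math


def normal_two_sided_p(z):
    """Two-sided tail area of the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def chi_square_1df_p(x2):
    """Upper tail area of a chi-square distribution with one degree of freedom."""
    return math.erfc(math.sqrt(x2 / 2.0))


# Wald test of the interaction coefficient (Table 13-3)
b3, se_b3 = -0.0000023, 0.0000038
z = b3 / se_b3
p_wald = normal_two_sided_p(z)

# Comparison of -2 log-likelihood values: additive model minus interaction model
x2 = 1995.952 - 1995.591
p_lr = chi_square_1df_p(x2)

print(f"Wald: z = {z:.3f}, p-value = {p_wald:.3f}")
print(f"Likelihood ratio: X2 = {x2:.3f}, p-value = {p_lr:.3f}")
```

The two approaches give nearly identical p-values here, as expected when a single coefficient is tested in a sample of this size.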

Behaving as if CD4 counts and β2-microglobulin levels have independent influences, the additive model produces estimates of the separate effects of these two indicators of survival time. Estimated hazard ratios are given by

$$\text{hazard ratio} = e^{\hat{b}_1 (x_1 - 800) + \hat{b}_2 (x_2 - 200)} = e^{-0.00162 (x_1 - 800) + 0.00404 (x_2 - 200)},$$

where the comparison is relative to an individual with a CD4 count of 800 and β2-microglobulin level of 200 (Table 13–5).
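The entries of Table 13–5 can be approximated directly from the additive-model coefficients. The sketch below uses the rounded estimates printed in Table 13–4, so its results differ from the tabled hazard ratios in the third decimal place; the function and constant names are hypothetical.

```python
import math

B1_CD4 = -0.00162    # additive-model coefficient for CD4 count (Table 13-4)
B2_BETA2 = 0.00404   # additive-model coefficient for beta2-microglobulin (Table 13-4)


def hazard_ratio(cd4, beta2, ref_cd4=800.0, ref_beta2=200.0):
    """Hazard ratio relative to a CD4 count of 800 and a beta2 level of 200."""
    return math.exp(B1_CD4 * (cd4 - ref_cd4) + B2_BETA2 * (beta2 - ref_beta2))


for cd4 in (800, 500, 300):
    row = [round(hazard_ratio(cd4, beta2), 3) for beta2 in (200, 300, 400)]
    print(cd4, row)
```

Because the model is additive, each row of the table is a constant multiple of the first row, and each column is a constant multiple of the first column.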

A last issue concerns the confounding effects of these variables. Assessment of the confounding influence of a variable or variables using a proportional hazards model is not different in principle from the usual approach to assessing confounding. Two estimates are compared: one estimated from a model including and another estimated from a model excluding the confounding variable or variables. When the β2-microglobulin variable (x2) is deleted from the bivariate proportional hazards model, the estimate of the coefficient associated with the CD4 counts is somewhat different from the estimate when the influence of β2-microglobulin is retained in the model, where b̂1 = −0.00162 (about a 23% difference). Similarly, the confounding influence of CD4 counts on the β2-microglobulin coefficient can be seen from the proportional hazards model excluding CD4 counts, where the estimate also is somewhat different from the estimate from the bivariate model, where b̂2 = 0.00404 (about a 22% difference). This comparison illustrates a perhaps subtle distinction. Independence of variables (additivity) refers to the relationship between explanatory variables and the outcome, while confounding refers to the impact of a variable or variables on the relationship between another variable and its influence on the outcome.

Table 13–5. Estimated hazard ratios from the proportional hazards model for three levels of CD4 counts (x1) and three levels of β2-microglobulin (x2)

| CD4 (x1) | β2 = 200 | β2 = 300 | β2 = 400 |
|---|---|---|---|
| 800 | 1.000 | 1.498 | 2.245 |
| 500 | 1.626 | 2.433 | 3.646 |
| 300 | 2.243 | 3.361 | 5.037 |

Application II: Auer rods, WBC-Count, and Leukemia Survival

Data [6] collected to investigate the relationship between survival of patients with acute myelogenous leukemia, white blood cell (WBC) counts and a white cell morphologic characteristic can be explored using a proportional hazards model. The morphologic characteristic is the presence or absence of Auer rods, termed AG-positive and AG-negative. Thirty-three survival times (in weeks) and the white blood cell count for AG-positive and AG-negative leukemia patients are given in Table 13–6.

These survival times are complete (no censoring; all patients died), producing 17 AG-positive and 16 AG-negative observations. A simple correlation coefficient shows that the WBC count is related to survival time (correlation = −0.33), and the mean WBC counts appear to differ between AG-status groups. This association with both survival time and AG-status, as well as the generally recognized fact that WBC counts are a central factor in leukemia survival, implies that the WBC count should be part of the description of the relationship between AG-status and survival time. A proportional hazards model is one way to investigate the influence of AG-status (binary variable) on survival while accounting for the influence of differing WBC counts.

Table 13–6. Survival times and white blood cell counts for AG-positive and AG-negative acute myelogenous leukemia patients

| AG-positive: Patient | Weeks | WBC | AG-negative: Patient | Weeks | WBC |
|---|---|---|---|---|---|
| 1 | 65 | 2,300 | 18 | 65 | 3,000 |
| 2 | 156 | 750 | 19 | 17 | 4,000 |
| 3 | 100 | 4,300 | 20 | 7 | 1,500 |
| 4 | 134 | 2,600 | 21 | 16 | 9,000 |
| 5 | 16 | 6,000 | 22 | 22 | 5,300 |
| 6 | 108 | 10,500 | 23 | 3 | 10,000 |
| 7 | 121 | 10,000 | 24 | 4 | 19,000 |
| 8 | 4 | 17,000 | 25 | 2 | 27,000 |
| 9 | 39 | 5,400 | 26 | 3 | 28,000 |
| 10 | 143 | 7,000 | 27 | 8 | 31,000 |
| 11 | 56 | 9,400 | 28 | 4 | 26,000 |
| 12 | 26 | 32,000 | 29 | 3 | 21,000 |
| 13 | 22 | 35,000 | 30 | 30 | 79,000 |
| 14 | 1 | 100,000 | 31 | 4 | 100,000 |
| 15 | 1 | 100,000 | 32 | 43 | 100,000 |
| 16 | 5 | 52,000 | 33 | 56 | 4,400 |
| 17 | 65 | 100,000 | | | |

Plotting the survival curves (AG-positive and AG-negative; Fig. 13–4, top) and the “log-log” transformed survival curves show no evidence that the assumption of proportional hazards is unrealistic. The plot of the transformed survival functions, AG-positive against AG-negative, yields an essentially straight line (slope = 1), indicating a strong likelihood that the ratio of hazard functions is constant with respect to time (Fig. 13–4, bottom).

A proportional hazards model relating WBC count and AG-status to survival time (including the possibility of an interaction) is

$$\lambda_i(t \mid x_{i1}, x_{i2}) = \lambda_0(t) \times c_i = \lambda_0(t)\, e^{b_1 x_{i1} + b_2 x_{i2} + b_3 x_{i1} x_{i2}}.$$

The relationship between the AG-positive and AG-negative survival curves is then

$$S_i(t) = [S_0(t)]^{c_i},$$

where x1 is an indicator (0, 1) of AG-status {+, −} and x2 is the logarithm of the directly recorded WBC count. Again, the quantity represented by ci is the ratio between proportional hazard functions. The estimated model parameter b̂3 again indicates the degree of interaction of the logarithm of the WBC counts and AG-status variables (Table 13–7).

Figure 13–4. Goodness-of-fit proportional hazards model: estimated leukemia survival curves and log-log transformed curves by Auer rod status.

It is likely that the relationship between log-WBC count and survival pattern is not the same for the AG-positive and the AG-negative individuals. Specifically, the estimated coefficients b̂1 and b̂2 do not exclusively characterize the separate roles of the two explanatory variables, AG-status and log-WBC count. Wald’s test of the interaction coefficient b̂3 and the comparison of log-likelihood statistics (difference = X² = 154.468 − 150.681 = 3.787, degrees of freedom = 1) produce small p-values (both close to 0.05), indicating the definite possibility of a different relationship between log-WBC count and survival for each Auer rod status. The additive model (Table 13–7), therefore, is not likely a useful description of the impact of the two explanatory variables. The presence of interaction, as usual, limits the amount of summarization.
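The practical meaning of the interaction can be sketched from the full-model coefficients in Table 13–7. The code below assumes the model has the form λ0(t)exp(b1x1 + b2x2 + b3x1x2), with x1 the 0/1 AG-status indicator and x2 the log-WBC count as defined in the text; the function names are hypothetical. With an interaction present, the hazard ratio per unit of log-WBC depends on AG-status, and the hazard ratio between the two AG-status groups depends on the log-WBC level.

```python
import math

B1_AG = -5.040       # AG-status coefficient (Table 13-7, full model)
B2_LOG_WBC = 0.145   # log-WBC coefficient
B3_INTERACT = 0.531  # log-WBC x AG-status coefficient


def log_wbc_hazard_ratio(ag_indicator):
    """Hazard ratio per one-unit increase in log-WBC for a given AG indicator (0 or 1)."""
    return math.exp(B2_LOG_WBC + B3_INTERACT * ag_indicator)


def ag_hazard_ratio(log_wbc):
    """Hazard ratio comparing indicator value 1 with 0 at a fixed log-WBC value."""
    return math.exp(B1_AG + B3_INTERACT * log_wbc)


def chi_square_1df_p(x2):
    """Upper tail area of a chi-square distribution with one degree of freedom."""
    return math.erfc(math.sqrt(x2 / 2.0))


x2_stat = 154.468 - 150.681  # difference in -2 log-likelihood values
print(log_wbc_hazard_ratio(0), log_wbc_hazard_ratio(1))
print(f"X2 = {x2_stat:.3f}, p-value = {chi_square_1df_p(x2_stat):.3f}")
```

No single hazard ratio summarizes either variable, which is the sense in which the interaction limits the amount of summarization.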

Application III: Vital Capacity, Age, and Lung Cancer Survival

Preliminary observations of patients participating in a clinical trial [7] provide 131 lung cancer survival times, as well as the patients’ ages and vital capacities (explanatory variables). The data are divided into two groups based on each patient’s measured vital capacity. One group consists of 95 patients with “high” vital capacity ratios, and the other consists of 36 patients with “low” vital capacity ratios. The vital capacity groups, patients’ ages, and survival times (in days) are given in Tables 13–8 and 13–9.

Figure 13–5 (top) displays the product-limit estimated survival curves associated with these two vital capacity groups. The “high” vital capacity group appears to have better survival experience. Figure 13–5 (bottom) reflects a definite influence of age on the pattern of survival for these patients. The two vital capacity groups, each stratified by age (individuals less than or equal to 65 years old and individuals greater than 65 years old), produce distinctly separate survival patterns. A summary of mean survival times additionally indicates that age influences survival (Table 13–10).

Two expected issues arise when survival data are classified into subgroups. The choice of the bounds for the categories adds an arbitrary element to the analysis that influences the final interpretations. Of more importance, the number of

Table 13–7. Proportional hazards model: leukemia and WBC count

Term                   Symbol         Coefficient   SE      p-value   Hazard ratio

Full model
AG-status              $\hat{b}_1$    −5.040        2.131   0.018     0.007
log-WBC                $\hat{b}_2$    0.145         0.177   0.412     1.156
log-WBC × AG-status    $\hat{b}_3$    0.531         0.276   0.055     1.700
−2 log-likelihood = 150.681; number of model parameters = 3.

Additive model: interaction excluded
AG-status              $\hat{b}_1$    −1.069        0.423   0.013     0.343
log-WBC                $\hat{b}_2$    0.368         0.136   0.007     1.444
−2 log-likelihood = 154.468; number of model parameters = 2.

Table 13–8. Lung cancer survival data: “high” vital capacity

Time  Age     Time  Age     Time  Age     Time  Age

Complete
   0   74        1   74        1   63        3   78
   4   66        5   40        9   65       19   51
  21   73       30   62       36   68       39   50
  40   56       48   64       51   72       61   58
  89   64       90   41       90   69       92   76
 113   73      127   64      131   51      138   75
 139   56      143   50      159   60      168   74
 170   71      180   69      189   56      192   68
 201   64      212   58      223   70      229   76
 238   63      265   65      275   63      292   55
 317   65      322   55      350   54      357   73
 380   51

Censored
  62   66       75   44       77   60       81   38
  83   59       83   42       84   67       86   62
  88   53       92   59       98   55      104   62
 116   62      129   35      131   43      162   45
 167   56      173   54      178   63      179   69
 184   69      184   67      194   58      256   57
 263   46      269   63      338   47      344   52
 347   59      349   61      350   66      362   56
 362   60      364   63      364   58      364   58
 365   66      368   70      368   39      372   58
 388   59      388   68      400   64      524   59
 528   63      545   63      546   55      552   52
 555   57      558   63

Note: the first 45 survival times are complete (died within the study period) and the following 50 are censored.


Table 13–9. Lung cancer survival data: “low” vital capacity

Time  Age     Time  Age     Time  Age     Time  Age

Complete
   0   55        2   75        2   73        2   65
   6   61       17   74       22   51       23   66
  54   67       56   51       61   36       63   54
  64   54       69   70      146   53      155   47
 161   46      233   41      248   61      283   53

Censored
  47   56       73   55       86   48       89   65
  91   58      169   58      172   62      177   53
 183   48      188   52      194   67      266   53
 266   53      267   52      351   71      372   71

Note: the first 20 survival times are complete (died within the study period) and the following 16 are censored.

observations in subcategories are substantially reduced (for example, “low,” age > 65 contains nine individuals and only six complete observations). This reduction in sample size leads to increased sampling variation causing increased difficulty in interpretation. The lack of clarity in Figure 13–5 comes, to a large extent, from “small-sample-size” variation associated with the estimated age-specific survival curves.

The mean ages of the patients in the two groups differ (Tables 13–8 and 13–9), and age is undoubtedly related to survival, implying that adjustment (via a proportional hazards model) will help identify differences between the two vital capacity groups that are not attributable to differing influences from age. The assumption of proportional hazards allows adjustment for the influence of age and provides an improved evaluation of the “high” and “low” vital capacity groups, while dramatically decreasing the impact of random variation incurred by stratifying the data into four vital capacity and age categories. As with most regression models, large gains in efficiency are achieved because the entire data set (all 131 observations) is focused on the estimation of two summary parameters.

Application of an additive proportional hazards model to these lung cancer survival data requires making vital capacity and age components of the model, thus producing separate measures of the influence of each variable on survival time variability. That is, the overall ratio of hazard functions is factored into a component measuring the influence of vital capacity and a component measuring the influence of age on survival. Such an additive proportional hazards model describing the relationships of vital capacity and age to survival is

$$h_i(t) = h_0(t) \times e^{b_1 x_1 + b_2 x_2}$$

Figure 13–5. Estimates of the survival curves for two groups of lung cancer patients by vital capacity and age based on observed data.

where $x_1$ is a binary variable that identifies the “high” ($x_1$ = 0) or “low” ($x_1$ = 1) vital capacity group and $x_2$ represents the reported age. The parameter estimates and log-likelihood statistics associated with this model are, as always, the key to interpreting the analysis (Table 13–11).

Table 13–10. Summary: lung cancer mean survival time

Vital capacity   Age (years)   $n_i$   Complete   Censored   Mean    SE

“High”           ≤ 65          68      27         41         578.0   111.3
“High”           > 65          27      18         9          235.1   55.4
“Low”            ≤ 65          27      14         13         255.3   68.2
“Low”            > 65          9       6          3          180.7   73.8
Total                          131     65         66         376.9   46.7

Note: vital capacities “high” and “low” are in quotes as a reminder that these categories are arbitrarily chosen.


Table 13–11. Three proportional hazards models: lung cancer survival data

Term     Symbol         Coefficient   SE      p-value   Hazard ratio

Vital capacity group and age included
Group    $\hat{b}_1$    0.637         0.275   0.020     1.891
Age      $\hat{b}_2$    0.038         0.015   0.013     1.039
−2 log-likelihood = 549.646; number of model parameters = 2.

Age excluded
Group    $\hat{b}_1$    0.540         0.274   0.049     1.716
−2 log-likelihood = 556.093; number of model parameters = 1.

Vital capacity group excluded
Age      $\hat{b}_2$    0.034         0.016   0.027     1.035
−2 log-likelihood = 554.605; number of model parameters = 1.

Contrasting the additive bivariate model (group and age included) to the model with age excluded (reduced model) shows noticeable confounder bias. The coefficient associated with the group membership ($b_1$) decreases from 0.637 to 0.540 when age is excluded from the model; in terms of the relative hazard, the change is 1.891 to 1.716. Also, the statistical evaluation shows that age has a strong influence on the survival time. The increase in log-likelihood statistics, $X^2 = 556.093 - 549.646 = 6.447$, has an approximate chi-square distribution with one degree of freedom (the difference in the number of parameters needed to specify each of the two models) when age is not related to survival ($b_2$ = 0). The p-value is P($X^2$ ≥ 6.447 | $b_2$ = 0) = 0.011. Both the extent of the confounder bias and the expected association with survival time indicate that age plays a significant role in the survival of these patients and is an important component in a model designed to identify the influence of vital capacity on survival time.

The influence of the “high” versus “low” vital capacity is similarly assessed. The comparison of the log-likelihood statistics (bivariate versus reduced model) produces an independent statistical evaluation of the influence of the vital capacity classification (the difference between log-likelihood statistics is $X^2 = 554.605 - 549.646 = 4.959$). This increase has a chi-square distribution with one degree of freedom when vital capacity is unrelated to survival, yielding a p-value of P($X^2$ ≥ 4.959 | $b_1$ = 0) = 0.026. Like age, the classification of individuals by vital capacity is likely associated with survival time. The increase in the log-likelihood statistic, furthermore, cannot be attributed to influences of age because a measure of the independent age-effect is maintained in both the bivariate and reduced analyses ($x_2$ is included in both models).

Confidence intervals based on the estimated coefficients from the proportional hazards model are constructed in the usual way ($\hat{b}_j \pm 1.960 \times \mathrm{SE}(\hat{b}_j)$ is an approximate 95% confidence interval for the underlying coefficient). Confidence intervals for the relative hazard are then $(e^{\text{lower bound}}, e^{\text{upper bound}})$. The approximate 95% confidence interval based on the estimated vital capacity coefficient $\hat{b}_1 = 0.637$ is lower bound $= 0.637 - 1.960(0.275) = 0.098$ and upper bound $= 0.637 + 1.960(0.275) = 1.176$, and the corresponding confidence interval based on the estimated relative hazard of $e^{0.637} = 1.891$ is $(e^{0.098}, e^{1.176})$ or (1.103, 3.241). The analogous approximate 95% confidence interval based on the estimated relative hazard $e^{\hat{b}_2} = e^{0.038} = 1.039$ associated with the age variable is (1.009, 1.070).
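The interval arithmetic can be sketched in a few lines. This is an illustration only, assuming the coefficients and standard errors of the additive model in Table 13–11; the function name `hazard_ratio_ci` is hypothetical.

```python
import math

def hazard_ratio_ci(coef, se, z=1.960):
    """Approximate 95% interval for a coefficient and for its relative hazard.

    The coefficient interval is coef +/- z * se; exponentiating its bounds
    gives the interval for the relative hazard exp(coef).
    """
    lower, upper = coef - z * se, coef + z * se
    return (lower, upper), (math.exp(lower), math.exp(upper))

# Vital capacity group and age, additive model (Table 13-11)
group_coef_ci, group_hr_ci = hazard_ratio_ci(0.637, 0.275)
age_coef_ci, age_hr_ci = hazard_ratio_ci(0.038, 0.015)

print("group:", [round(v, 3) for v in group_coef_ci], [round(v, 3) for v in group_hr_ci])
print("age:  ", [round(v, 3) for v in age_coef_ci], [round(v, 3) for v in age_hr_ci])
```

Because the interval is built on the coefficient scale and then exponentiated, it is not symmetric about the estimated relative hazard.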

To explore further the lung cancer survival data, suppose that the “high” vital capacity group is chosen as a “baseline” survival function S 0(t). The product-limit estimated survival curve Ŝ 0(t) is given in Table 13–12 and displayed in Figure 13–6.
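The product-limit calculation behind the baseline curve can be sketched as follows. The survival times are transcribed from Table 13–8; the tie-handling rule (a patient censored at time t is still counted as at risk at t) is an assumption of this sketch, and estimates may differ from tabled values in the third decimal place.

```python
# Survival times (days) for the "high" vital capacity group, from Table 13-8.
complete = [0, 1, 1, 3, 4, 5, 9, 19, 21, 30, 36, 39, 40, 48, 51, 61,
            89, 90, 90, 92, 113, 127, 131, 138, 139, 143, 159, 168,
            170, 180, 189, 192, 201, 212, 223, 229, 238, 265, 275, 292,
            317, 322, 350, 357, 380]
censored = [62, 75, 77, 81, 83, 83, 84, 86, 88, 92, 98, 104,
            116, 129, 131, 162, 167, 173, 178, 179, 184, 184, 194, 256,
            263, 269, 338, 344, 347, 349, 350, 362, 362, 364, 364, 364,
            365, 368, 368, 372, 388, 388, 400, 524, 528, 545, 546, 552,
            555, 558]

def product_limit(complete, censored):
    """Product-limit (Kaplan-Meier) estimate at each distinct death time.

    Assumption: at tied times deaths precede censoring, so anyone with an
    observed time >= t is at risk at t.
    """
    all_times = complete + censored
    surv = 1.0
    curve = {}
    for t in sorted(set(complete)):
        at_risk = sum(1 for u in all_times if u >= t)
        deaths = complete.count(t)
        surv *= 1.0 - deaths / at_risk   # conditional survival past time t
        curve[t] = surv
    return curve

curve = product_limit(complete, censored)
for day in (1, 61, 90, 127):
    print(day, round(curve[day], 3))
```

Tied death times are handled in a single step here, so day 90 yields one value (the second of the two listed for that day in Table 13–12).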

The survival curves for the “high” and “low” vital capacity groups (ignoring age),

$$\hat{S}_i(t) = [\hat{S}_0(t)]^{e^{\hat{b}_1 x_1}},$$
are displayed in Figure 13–7 (top). To show the influence of age, the survival curves
$$\hat{S}_i(t) = [\hat{S}_0(t)]^{e^{\hat{b}_1 x_1 + \hat{b}_2 x_2}}$$

Table 13–12. Lung cancer survival data: survival curve (“high” capacity) Ŝ0(t)

Obs   Days   Ŝ0(t)      Obs   Days   Ŝ0(t)

 1      1    0.968      23    139    0.720
 2      3    0.958      24    143    0.707
 3      4    0.947      25    159    0.694
 4      5    0.937      26    168    0.680
 5      9    0.926      27    170    0.666
 6     19    0.916      28    180    0.652
 7     21    0.905      29    189    0.637
 8     30    0.895      30    192    0.622
 9     36    0.884      31    201    0.606
10     39    0.874      32    212    0.590
11     40    0.863      33    223    0.575
12     48    0.853      34    229    0.559
13     51    0.842      35    238    0.544
14     61    0.832      36    265    0.527
15     89    0.820      37    275    0.510
16     90    0.808      38    292    0.493
17     90    0.796      39    317    0.476
18     92    0.784      40    322    0.459
19    113    0.771      41    350    0.439
20    127    0.759      42    357    0.418
21    131    0.746      43    380    0.380
22    138    0.733


Figure 13–6. Product-limit estimated survival curve for lung cancer patients with “high” vital capacity, Ŝ 0(t).

are also displayed in Figure 13–7 (bottom) for ages 55 ($x_2$ = 55) and 75 ($x_2$ = 75). A model representing the relationship of age and group membership to survival time uses the data with great efficiency to produce a clear and easily interpreted picture of the survival pattern (Fig. 13–5 contrasted with Fig. 13–7). As usual, the cost is the risk that the model does not adequately represent the structure underlying the data. Of course, this concern can be at least partially addressed by evaluating the goodness-of-fit of the proportional hazards model.
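Under proportional hazards, a model-based survival curve is the baseline curve raised to the power of the hazard ratio. A minimal sketch of the “ignoring age” comparison, assuming a few baseline values from Table 13–12 and the group coefficient from the age-excluded model in Table 13–11 (the function name is illustrative):

```python
import math

def model_survival(s0, log_hazard_ratio):
    """Survival probability implied by proportional hazards: S(t) = S0(t) ** c,
    where c = exp(log_hazard_ratio) is the hazard ratio relative to baseline."""
    return s0 ** math.exp(log_hazard_ratio)

# Selected baseline ("high" vital capacity) values, Table 13-12
baseline = {139: 0.720, 238: 0.544, 380: 0.380}

# Group coefficient with age excluded, Table 13-11
b_group = 0.540

low_curve = {day: model_survival(s0, b_group) for day, s0 in baseline.items()}
for day in sorted(baseline):
    print(day, baseline[day], round(low_curve[day], 3))
```

Because the hazard ratio exceeds one, every model-based “low” group probability lies below the corresponding baseline value, and the two curves cannot cross.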

The ratio of two hazard functions summarizes the survival experience of two groups or individuals and is a particularly meaningful description of two proportional hazard functions. The estimated ratio of hazard rates from the proportional hazards model is analogous to ratios of average mortality or incidence rates except that these rates are instantaneous measures and are typically adjusted for the influence of other explanatory variables.

For the proportional hazards model, the difference in survival measured by the ratio of the two hazards functions is summarized by

$$\text{hazard ratio} = \frac{h(t \mid x_1, \ldots, x_k)}{h(t \mid x'_1, \ldots, x'_k)} = e^{\sum_j b_j (x_j - x'_j)}$$

Figure 13–7. Estimates of the survival curves for two groups of lung cancer patients by vital capacity and age based on a proportional hazards model.

where $x_j$ and $x'_j$ represent different levels of the jth explanatory variable. When two groups differ by only a single variable (identical for all but one variable, say $x_m$), then for $x_m \neq x'_m$ the hazard ratio is
$$\text{hazard ratio} = e^{b_m (x_m - x'_m)}$$
and identifies the impact of the different levels of a single explanatory variable ($x_m$ versus $x'_m$) as if the other variables were constant. In other words, the relative hazard associated with a specific variable represents an assessment of the influence on survival time as if the other k − 1 explanatory variables have equal values in the groups or the individuals compared. Adjusted coefficients are similarly interpreted in most additive multivariable regression models. A primary goal of multivariable analysis is the isolation and evaluation of individual effects. An additive proportional hazards model has exactly this property.

The ratio of hazard functions is analogous to an odds ratio estimated from an additive logistic model. The value $e^{\hat{b}_j}$ indicates the relative influence of the jth variable on survival independent of other explanatory variables, when an additive model represents the relationships between explanatory variables and the hazard rate. The relative hazard, for example, associated with vital capacity group membership is $e^{0.637} = 1.891$. That is, the hazard rate in the “low” vital capacity group is a little less than twice that of the “high” vital capacity group, regardless of the ages of the individuals compared. Similarly, the independent influence of age on the hazard rates for these lung cancer patients, regardless of group membership, is $e^{0.038(\text{age}_2 - \text{age}_1)}$ where $\text{age}_1$ and $\text{age}_2$ are compared. For example, for $\text{age}_1$ = 55 and $\text{age}_2$ = 75, the relative hazard is $e^{0.038(20)} = 2.138$. That is, the lung cancer patients age 75 are at about twice the risk (measured by comparing hazard rates) of patients age 55 within both vital capacity groups.

The additive nature of a proportional hazards model dictates that vital capacity group membership and age do not interact. Therefore, a 55-year-old member of the “high” vital capacity group compared to a 75-year-old member of the “low” vital capacity group yields an estimated hazard ratio of $e^{0.637 + 0.038(20)} = 1.891(2.138) = 4.043$, demonstrating a specific partitioning of the overall hazard ratio (4.043) into relative components (vital capacity group status = 1.891 and age = 2.138). Furthermore, these comparisons reflect the impact of vital capacity and age independent of the time of observation. Of course, these efficient and parsimonious summaries of the relationships between vital capacity, age, and survival time are properties of the model, and only when the model adequately represents the data do they reflect the parallel relationships among the observed variables.
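The partitioning of the overall hazard ratio can be verified directly. A short sketch, assuming the additive-model coefficients from Table 13–11; the dictionary keys and function name are illustrative:

```python
import math

coefs = {"group": 0.637, "age": 0.038}   # additive model, Table 13-11

def hazard_ratio(a, b, coefs):
    """Hazard ratio of profile a relative to profile b under an additive model:
    exp(sum of coefficient * difference in each explanatory variable)."""
    return math.exp(sum(coefs[k] * (a[k] - b[k]) for k in coefs))

low_75 = {"group": 1, "age": 75}    # "low" vital capacity, age 75
high_55 = {"group": 0, "age": 55}   # "high" vital capacity, age 55

total = hazard_ratio(low_75, high_55, coefs)
group_part = hazard_ratio({"group": 1, "age": 55}, high_55, coefs)   # group only
age_part = hazard_ratio({"group": 0, "age": 75}, high_55, coefs)     # age only

print(round(group_part, 3), round(age_part, 3), round(total, 3))
```

The product of the two single-variable ratios equals the overall ratio only because the model contains no interaction term.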

Application IV: Histologic Type, Treatment, and Lung Cancer Survival

A series of 137 patients with advanced lung cancer categorized by histologic type [8] provides an opportunity to apply a proportional hazards model that requires that distinct but non-numeric categories be taken into account. The data consist of individual survival times (days) classified by one of four lung cancer histologic types (squamous cell, small cell, adenocarcinoma, and large cell) and by a new or standard treatment (x 1 = 0 for the new treatment and x 1 = 1 for the standard treatment). Three other explanatory variables are also recorded for each patient: a general medical status index (x 2), months from diagnosis to the start of the study (x 3), and age (x 4). These data are given in Tables 13–13 and 13–14.

An analysis to determine whether or not a proportional hazards model is an accurate representation of the data is not presented. Statistical tests to assess the assumption of proportionality are part of several “package” computer programs

Table 13–13. Lung cancer by type: new treatment (x 1 = 0)

Squamous cell

Small cell

Adenocarcinoma

Large cell

Time

x 2

x 3

x 4

Time

x 2

x 3

x 4

Time

x 2

x 3

x 4

Time

x 2

x 3

x 4

999

90

12

54

25

30

2

69

24

40

2

60

52

60

4

45

122

80

6

60

103+

70

22

36

18

40

5

69

164

70

15

68

87+

80

3

48

21

20

4

71

83+

9

3

57

19

30

4

39

231+

50

8

52

13

30

2

62

31

80

3

39

53

60

12

66

242

50

1

70

87

60

2

60

51

60

5

62

15

30

5

63

991

70

7

50

2

40

36

44

90

60

22

50

43

60

11

49

111

70

3

62

20

30

9

54

52

60

3

43

340

80

10

64

1

20

21

65

7

20

11

66

73

60

3

70

133

75

1

65

587

60

3

58

24

60

8

49

8

50

5

66

111

60

5

64

389

90

2

62

99

70

3

72

36

70

8

61

231

70

18

67

33

30

6

64

8

80

2

68

48

10

4

81

378

80

4

65

25

20

36

63

99

85

4

62

7

40

4

58

49

30

3

37

357

70

13

58

61

70

2

71

140

70

3

63

467

90

2

64

95

70

1

61

186

90

3

60

201

80

28

52

80

50

17

71

84

80

4

62

1

50

7

35

51

30

87

59

19

50

10

42

30

70

11

63

29

40

8

67

45

40

3

69

44

60

13

70

25

70

2

6

80

40

4

63

283

90

2

51

15

50

13

40


Table 13–14. Lung cancer by type: standard treatment (x 1 = 1)

Squamous cell

Small cell

Adenocarcinoma

Large cell

Time

x 2

x 3

x 4

Time

x 2

x 3

x 4

Time

x 2

x 3

x 4

Time

x 2

x 3

x 4

72

60

7

69

30

60

3

61

8

20

19

61

177

50

16

66

411

60

5

64

384

60

9

42

92

70

10

60

162

80

5

62

228

60

3

38

4

40

2

35

35

40

6

62

216

50

15

52

126

60

9

63

54

80

4

63

117

80

2

38

553

70

2

47

118

70

11

65

13

60

4

56

132

80

5

50

278

60

12

63

10

20

5

49

123+

40

3

55

12

50

4

63

12

40

12

68

82

40

10

69

97+

60

5

67

162

80

5

64

260

80

5

45

110

80

29

68

153

60

14

63

3

30

3

43

200

80

12

41

314

50

18

43

59

30

2

65

95

80

4

34

156

70

2

60

100+

70

6

70

117

80

3

46

182+

90

2

62

42

60

4

81

16

30

4

53

143

90

8

60

8

40

58

63

151

50

12

69

105

80

11

66

144

30

4

63

22

60

4

68

103

80

5

38

25+

80

9

52

56

80

12

43

250

70

8

53

11

70

11

48

21

40

2

55

100

60

13

37

18

20

15

42

139

80

2

64

20

30

5

65

31

75

3

65

52

70

2

55

287

60

25

66

18

30

4

60

51

60

1

67

122

80

28

53

27

60

8

62

54

70

1

67

7

50

7

72

63

50

11

48

392

40

4

68

10

40

23

67

([9], for example), and plotting transformed survival curves can help indicate the adequacy or, particularly, the inadequacy of a statistical model. The relationship among survival curves, as mentioned, is one key to exploring the utility of a proportional hazards model.

To begin to understand the influence of the explanatory variables on survival time among the lung cancer patients, an additive proportional hazards model is proposed that includes all five explanatory variables. The proportional hazards model is fundamentally the same as the previous additive models, except that a design variable indicates the four histologic categories. That is, a three-component design variable ($z_1$, $z_2$, and $z_3$) identifies four histologic types: $z_1$ = 1 if the cancer type is small cell, with $z_2 = z_3 = 0$; $z_2$ = 1 if the cancer type is adenocarcinoma, with $z_1 = z_3 = 0$; $z_3$ = 1 if the cancer type is large cell, with $z_1 = z_2 = 0$; and squamous cell carcinoma is established as the baseline reference group by setting $z_1 = z_2 = z_3 = 0$. An additive proportional hazards model incorporating the five explanatory variables is

$$h_i(t) = h_0(t) \times e^{c_1 z_1 + c_2 z_2 + c_3 z_3 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4}.$$
The proportional hazards model applied to the lung cancer survival data under three sets of conditions produces the estimated parameters given in Table 13–15.
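The design-variable coding can be written out explicitly. The following sketch assumes the coding described above, with squamous cell as the reference category; the names `HISTOLOGY_CODES`, `design_variable`, and `histology_log_hazard` are illustrative, and the coefficients are the histology estimates from the reduced model in Table 13–15.

```python
# Three-component design variable (z1, z2, z3) for four histologic types.
HISTOLOGY_CODES = {
    "squamous cell":  (0, 0, 0),   # baseline reference group
    "small cell":     (1, 0, 0),
    "adenocarcinoma": (0, 1, 0),
    "large cell":     (0, 0, 1),
}

def design_variable(histologic_type):
    """Return (z1, z2, z3) for a histologic type."""
    return HISTOLOGY_CODES[histologic_type]

def histology_log_hazard(histologic_type, c):
    """Contribution c1*z1 + c2*z2 + c3*z3 of histologic type to the log-hazard."""
    z = design_variable(histologic_type)
    return sum(ci * zi for ci, zi in zip(c, z))

# Histology coefficients with months and age excluded (Table 13-15)
c_hat = (0.848, 1.134, 0.361)

for name in HISTOLOGY_CODES:
    print(name, design_variable(name), histology_log_hazard(name, c_hat))
```

With this coding each coefficient compares one histologic type with squamous cell carcinoma, so no ordering or spacing among the four categories is imposed.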

The comparison of log-likelihood values from the full model and the model with the months from diagnosis ($x_3$) and age ($x_4$) variables removed (reduced model) shows that these two variables add little to the description of the survival times of the lung cancer patients. Comparison of the respective log-likelihood statistics ($X^2$ = 918.101 − 916.335 = 1.766 with two degrees of freedom, yielding a p-value = P($X^2$ ≥ 1.766 | $b_3 = b_4 = 0$) = 0.414) produces no statistical evidence that these two explanatory variables are useful contributors to the study of the survival of these patients. The medical status index ($x_2$), however, appears worth including in the analysis (z = −5.952 with a p-value < 0.001). The same is true for histologic type. Excluding the cancer histologic type ($c_1 = c_2 = c_3 = 0$) substantially increases the log-likelihood statistic over the five-parameter model (Table 13–15), producing a likely nonrandom difference in log-likelihood values: $X^2$ = 936.722 − 918.101 = 18.621 with three degrees of freedom, yielding a p-value = P($X^2$ ≥ 18.621 | $c_1 = c_2 = c_3 = 0$) < 0.001. The treatment variable coefficient $\hat{b}_1 = 0.334$ indicates that treatment status (standard versus new) is marginally important in explaining the differences in survival times between these two groups of patients when adjusted for medical status and histologic type. The hazard rate associated with the standard treatment divided by the hazard rate associated with the new treatment is $e^{0.334} = 1.400$ (relative hazard). The value $z = \hat{b}_1 / \mathrm{SE}(\hat{b}_1) = 1.684$ produces a p-value of P(| Z | ≥ 1.684 | $b_1$ = 0) = 0.092 when the medical status index and the histologic types are maintained in the model.
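Comparisons of nested models that differ by more than one parameter need chi-square tail probabilities for several degrees of freedom. A self-contained sketch, using the closed-form tail for integer degrees of freedom and the −2 log-likelihood values from Table 13–15 (the function name `chi2_sf` is an illustrative choice, not a reference to any particular library):

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X^2 >= x) for a chi-square with integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus a finite correction series
    total, term = 0.0, math.sqrt(x)
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return (math.erfc(math.sqrt(half))
            + math.sqrt(2.0 / math.pi) * math.exp(-half) * total)

# Months and age removed: 2 degrees of freedom
p_months_age = chi2_sf(918.101 - 916.335, 2)
# Histologic type removed: 3 degrees of freedom
p_histology = chi2_sf(936.722 - 918.101, 3)

print(round(p_months_age, 3), p_histology < 0.001)
```

The degrees of freedom equal the number of coefficients set to zero in the reduced model, which is why the histology comparison uses three.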


Table 13–15. Proportional hazards model: histology-specific lung cancer survival data

Term             Symbol         Coefficient   SE      p-value   Hazard ratio

Full model
Small cell       $\hat{c}_1$    0.884         0.268   < 0.001   2.421
Adenocarcinoma   $\hat{c}_2$    1.170         0.296   < 0.001   3.223
Large cell       $\hat{c}_3$    0.372         0.280   0.184     1.450
Treatment        $\hat{b}_1$    0.385         0.205   0.061     1.470
Status           $\hat{b}_2$    −0.033        0.006   < 0.001   0.968
Months           $\hat{b}_3$    0.001         0.008   0.913     1.001
Age              $\hat{b}_4$    −0.012        0.009   0.188     0.988
−2 log-likelihood = 916.335; number of model parameters = 7.

Months and age excluded ($b_3 = b_4 = 0$)
Small cell       $\hat{c}_1$    0.848         0.264   < 0.001   2.334
Adenocarcinoma   $\hat{c}_2$    1.134         0.293   < 0.001   3.109
Large cell       $\hat{c}_3$    0.361         0.279   0.195     1.435
Treatment        $\hat{b}_1$    0.334         0.199   0.094     1.400
Status           $\hat{b}_2$    −0.031        0.005   < 0.001   0.970
−2 log-likelihood = 918.101; number of model parameters = 5.

Months, age and histology type excluded ($b_3 = b_4 = c_1 = c_2 = c_3 = 0$)
Treatment        $\hat{b}_1$    0.239         0.182   0.190     1.270
Status           $\hat{b}_2$    −0.033        0.005   < 0.001   0.967
−2 log-likelihood = 936.722; number of model parameters = 2.

To describe these data further, it is assumed that the hazard rates are at least approximately constant over the range of the follow-up period. This additional assumption yields a model in terms of survival probabilities as

$$S_k(t) = e^{-\lambda_k t} \quad \text{with} \quad \lambda_k = \lambda_0 \times e^{c_1 z_1 + c_2 z_2 + c_3 z_3 + b_1 x_1 + b_2 x_2}$$
for the specified histologies, medical status, and treatment. Under the constant hazard rate assumption, the expression for the proportional hazards model produces estimated mean survival times as $\hat{\mu}_k = 1/\hat{\lambda}_k = \hat{\mu}_0 \times e^{-(\hat{c}_1 z_1 + \hat{c}_2 z_2 + \hat{c}_3 z_3 + \hat{b}_1 x_1 + \hat{b}_2 x_2)}$, where the explanatory variables are measured as differences from the category that defines $\hat{\mu}_0$. The value $\hat{\mu}_k$ is the estimated mean survival time for a specific group or individual (denoted k) relative to an arbitrary value $\hat{\mu}_0$. The value $\hat{\mu}_0$ can be, for example, the mean survival time associated with a selected “baseline” category. Setting $\hat{\mu}_0$ to 138.57 days (the average survival time for patients with squamous cell carcinoma receiving the standard treatment with average level of medical status) yields estimated mean survival times $\hat{\mu}_k$ under selected conditions (Table 13–16).

Comparisons of these estimated mean survival times describe the relative influences of the histologic type and treatment (new versus standard) on the estimated

Table 13–16. Lung cancer mean survival times: both treatments and four histologic types for patients with average level of medical status

                       Squamous   Small   Adeno   Large

New ($x_1$ = 0)        193.52     82.88   62.26   134.88
Standard ($x_1$ = 1)   138.57     59.35   44.58   96.58

model: $\hat{\mu}_k = 138.57 \times e^{-[0.848 z_1 + 1.134 z_2 + 0.361 z_3 + 0.334 (x_1 - 1)]}$

days of survival. The estimated mean survival times represent an application of the estimated relative hazard ratios applied to a baseline value (138.57 days) under the assumption that the hazard rates are at least approximately constant and illustrate one of many ways a statistical model can be used to describe the issues under study.
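The entries of Table 13–16 can be reproduced from the baseline value and the estimated coefficients. This sketch assumes the reduced-model coefficients of Table 13–15, a constant hazard (so that mean survival time is the reciprocal of the hazard rate), and a baseline defined as squamous cell carcinoma with the standard treatment, which is why treatment enters as (x1 − 1); that centering is an inference from the tabled values rather than a formula quoted from the text.

```python
import math

MU0 = 138.57   # baseline mean survival (days): squamous cell, standard treatment
C_HAT = {"squamous": 0.0, "small": 0.848, "adeno": 1.134, "large": 0.361}
B_TREATMENT = 0.334   # x1 = 0 new treatment, x1 = 1 standard treatment

def mean_survival(histology, x1):
    """Estimated mean survival time under constant, proportional hazards.

    The hazard is multiplied by exp(log_ratio) relative to baseline, so the
    mean survival time is divided by the same factor.
    """
    log_ratio = C_HAT[histology] + B_TREATMENT * (x1 - 1)
    return MU0 * math.exp(-log_ratio)

table = {(h, x1): mean_survival(h, x1) for h in C_HAT for x1 in (0, 1)}
for x1, label in ((0, "New"), (1, "Standard")):
    print(label, [round(table[(h, x1)], 2) for h in C_HAT])
```

Every entry is the single baseline value rescaled by a hazard ratio, so the ratios between columns are identical in the two treatment rows.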

DEPENDENCY ON FOLLOW-UP TIME

A proportional hazards model does not require the relationship between follow-up time and survival to be specified in detail. The estimated relative hazard measures the separate influence of each explanatory variable on survival, free of any confounding influence of time as long as the hazard functions are proportional and the model is additive. It is instructive to describe a situation in which a dependency exists between follow-up time and a measure of survival. Two perspectives provide brief illustrations. A simple model illustrating the dependency of the odds ratio on follow-up time is presented, followed by a comparison of the logistic regression model (no influence of follow-up time) to the proportional hazard regression model (accounting for any influence of follow-up time) using the same data.

Odds Ratio Model

Consider two groups with constant hazard rates (exponential survival) given by $\lambda_1$ and $\lambda_2$. The survival probabilities associated with these two groups at a specific time are $S_1(t) = e^{-\lambda_1 t}$ and $S_2(t) = e^{-\lambda_2 t}$.

The odds ratio measuring the relative differences in survival for these two groups at time t is

$$or(t) = \frac{[1 - S_2(t)]/S_2(t)}{[1 - S_1(t)]/S_1(t)} = \frac{e^{\lambda_2 t} - 1}{e^{\lambda_1 t} - 1}.$$
To illustrate, Figure 13–8 displays or(t) for $\lambda_1$ = 0.1, 0.01, and 0.001 with a hazard rate ratio of $\lambda_2/\lambda_1 = 3$ (note the extremely different scales on the vertical axes).

As follow-up time increases, the odds ratio increases. This increase is large (very large) for hazard rates of 0.1 or above (top of Fig. 13–8). The structure of the odds ratio is such that it is forced to become large and difficult to interpret as follow-up time increases. However, for small and more typical hazard rates (in the neighborhood of 0.001), follow-up time has a less dramatic influence on the odds ratio (bottom of Fig. 13–8). For hazard rates in the range normally observed in human populations, the odds ratio, nevertheless, increases over time in an approximately linear pattern with a slope proportional to the difference between the two hazard rates. Specifically,

$$or(t) \approx \frac{\lambda_2}{\lambda_1}\left[1 + \frac{(\lambda_2 - \lambda_1)\,t}{2}\right]$$

Figure 13–8. Odds ratio plotted against time for two groups that experience exponential survival ($\lambda_2/\lambda_1 = 3$).

for $\lambda_i$ < 0.01. The dependency of the odds ratio measure of risk on follow-up time for constant hazard rates is relatively simple. Postulating constant hazard rates, nevertheless, shows that the magnitude of a measure of risk can depend on follow-up time. In this example, the relationship between group membership and disease outcome is a function of time, as well as of the hazard rates ($\lambda_1$, $\lambda_2$), which complicates the interpretation of the odds ratio as a measure of differences in survival, even in this simple case. In other words, risk measured by an odds ratio is confounded by follow-up time for exponentially distributed survival times, as long as $\lambda_1 \neq \lambda_2$. More complicated situations are easily envisioned.
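The dependence of the odds ratio on follow-up time is easy to tabulate. The sketch below takes the odds to be the odds of an event by time t under exponential survival (an assumption of this illustration) and compares the exact value with a first-order linear approximation; the function names are hypothetical.

```python
import math

def odds_ratio(t, lam1, lam2):
    """Odds ratio at time t for two exponential-survival groups.

    With S(t) = exp(-lam * t), the odds of an event by time t are
    (1 - S) / S = exp(lam * t) - 1, computed stably with expm1.
    """
    return math.expm1(lam2 * t) / math.expm1(lam1 * t)

def odds_ratio_linear(t, lam1, lam2):
    """First-order approximation for small hazard rates: linear in t."""
    return (lam2 / lam1) * (1.0 + (lam2 - lam1) * t / 2.0)

for lam1 in (0.1, 0.01, 0.001):
    lam2 = 3 * lam1   # hazard rate ratio of 3
    exact = [round(odds_ratio(t, lam1, lam2), 2) for t in (1, 10, 20)]
    approx = [round(odds_ratio_linear(t, lam1, lam2), 2) for t in (1, 10, 20)]
    print(lam1, exact, approx)
```

For the smallest hazard rates the two calculations nearly coincide, while for a hazard rate of 0.1 the exact odds ratio grows far faster than the linear approximation suggests.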

Western Collaborative Group Study Data

The prospectively collected data from the Western Collaborative Group Study (WCGS) include the time from admission to the study to the time of a coronary event or withdrawal, making it possible to calculate the follow-up times for 3154 participants (Appendix A). Among these study participants, 257 coronary events occurred, and the remaining 92% of the sample were either lost to follow-up (16%) or withdrawn from follow-up because they were disease-free (76% censored). A proportional hazards model applied to these data shows the influence of eight risk factors on CHD “survival” time (CHD-free time). Table 13–17 gives the estimated coefficients for both the additive proportional hazards model and the parallel additive logistic model (Table 8–13) applied to the same WCGS data.

Multiple logistic model estimates are derived from binary outcomes (CHD event or no CHD events) disregarding time of occurrence. That is, the logistic analysis does not use survival time information. For example, a coronary event that occurs early in a study is given the same weight as a later coronary event.

Table 13–17. A comparison of the proportional hazards model and the logistic model (WCGS data)

               “Cox” model            Logistic model
Factor         $\hat{b}_j$   SE       $\hat{b}_j$   SE

Age            0.063         0.011    0.065         0.012
Height         0.015         0.031    0.016         0.033
Weight         0.007         0.004    0.008         0.004
Systolic bp    0.014         0.006    0.018         0.006
Diastolic bp   0.008         0.010    −0.002        0.010
Cholesterol    0.009         0.001    0.011         0.002
Smoking        0.021         0.004    0.021         0.004
A/B            0.671         0.137    0.653         0.145

The proportional hazards model takes time of occurrence into account. If follow-up time is related to outcome, accounting for its influence provides a more efficient and sensitive measure of survival. The results, therefore, from a proportional hazards model differ from those of a logistic model, depending on the extent to which time influences the outcome.

The logistic and proportional hazards models when applied to the WCGS data produce essentially the same results for two reasons (Table 13–17). Coronary events occurred only among a small proportion (8%) of the study subjects (92% of the subjects were censored or lost), producing relatively little information on the follow-up time and CHD events. Also, the eight risk variables, measured only once at the beginning of the study, changed very little during the years of follow-up. For example, height did not change at all, and the smoking and behavior variables were essentially constant. Therefore, the explanatory variables are not strongly associated with time to a coronary event.