Psychiatric Epidemiology: Searching for the Causes of Mental Disorders

Ezra Susser, Sharon Schwartz, Alfredo Morabia, and Evelyn Bromet

Print publication date: 2006

Print ISBN-13: 9780195101812

Published to Oxford Scholarship Online: September 2009

DOI: 10.1093/acprof:oso/9780195101812.001.0001


Appendix B Application of Survival Analysis to Prenatal Determinants of Schizophrenia Example (PDSE) Data

Source: Psychiatric Epidemiology
Publisher: Oxford University Press

We apply the Kaplan-Meier method of survival analysis (Kaplan and Meier 1958) to the data of the Prenatal Determinants of Schizophrenia Example (PDSE) study. Recall that the PDSE is a birth cohort in which we examine exposure to high maternal body mass index as a risk factor for schizophrenia spectrum disorder in offspring (see tables 10.6–10.8). Cases are ascertained by a health plan treatment registry (and subsequently diagnosed). The cohort under observation is enumerated by a health plan membership registry.

We will use the Kaplan-Meier approach to calculate the risk for the exposed group in the PDSE. The exposed comprise 912 of the 12,090 cohort members. The same procedure could be applied to calculate the risk in the unexposed group, so that the two risks can be compared.

The procedures are outlined in a summary table, table B.1, and illustrated in more detail in a display table, table B.2. Throughout the following explanation, it is useful to keep in mind that risk and survival are connected. Over any given time interval, the survival probability is the complement of the disease risk. That is, the probability of surviving disease-free is equal to one minus the probability of developing disease. (Strictly speaking, these are conditional survival probabilities and conditional disease risks: conditional on having reached the given interval.)

Step 1. Divide the risk period into many small time intervals. We use time intervals equal to one day, because in the PDSE the health plan membership and treatment registry data are provided according to calendar date. For each one-day interval, a cohort at risk is defined by those who remain members of the health plan and disease-free at the beginning of the interval. For example, as shown in table B.2, there were 912 people at risk on the first day, 802 people at risk on day 1,332, and 346 people at risk on the last day. The cohort at risk declines across the intervals because some people drop out of the health plan and others develop the disease.

The cases for each one-day interval are those ascertained by the health plan on that day according to the treatment registry. As shown in table B.2, there were 13 days on which a case of schizophrenia spectrum disorder was ascertained. The one-day intervals are small enough that, in each interval, either no cases or only one case was ascertained.
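The definition of the cohort at risk in step 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Member` record, its field names, and the five members listed are hypothetical and are not PDSE data. The sketch also assumes that a member whose last day of follow-up is day d still counts as at risk on day d.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    # Hypothetical record layout; the values below are invented for illustration.
    last_day: int             # last day under observation (drop-out, case, or end of study)
    case_day: Optional[int]   # day on which a case was ascertained, or None


def at_risk(members: List[Member], day: int) -> int:
    """Count members still enrolled and disease-free at the beginning of `day`."""
    return sum(1 for m in members if m.last_day >= day)


def cases_on(members: List[Member], day: int) -> int:
    """Count cases ascertained on `day`."""
    return sum(1 for m in members if m.case_day == day)


members = [
    Member(last_day=10, case_day=None),  # followed to the end of a 10-day period
    Member(last_day=4, case_day=4),      # case ascertained on day 4
    Member(last_day=7, case_day=None),   # dropped out of the plan after day 7
    Member(last_day=10, case_day=None),
    Member(last_day=2, case_day=None),   # dropped out of the plan after day 2
]

for day in (1, 3, 4, 5, 8):
    print(day, at_risk(members, day), cases_on(members, day))
```

In this toy cohort the number at risk falls from 5 to 2 across the period, for the same two reasons given in the text: some members leave the plan and one develops the disease.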

Table B.1 Summary Table for Kaplan-Meier Survival Analysis: PDSE Exposed Group

| Step | Action |
| --- | --- |
| 1 | Divide total follow-up period into 6,209 one-day intervals. |
| 2A | Estimate the interval risk for each of the 13 informative intervals in which a case occurred. Interval risk = N cases / N cohort at risk. |
| 2B | Estimate interval survival probability for each of the 13 informative intervals in which a case occurred. Interval survival probability = 1 − interval risk. |
| 3A | Multiply the interval survival probabilities of the 13 informative intervals to obtain survival probability for total period. Cumulative survival probability for total period = 0.9988938 × 0.9987531 × … × 0.9974684 = 0.9799442. |
| 3B | Estimate risk for total period (6,209 days) = 1 − cumulative survival probability for total period = 1 − 0.9799442 = 0.0200558 ≈ 0.0201. |

Note: Disease risk computed from the informative intervals.

Step 2. Compute a result for each interval. In the Kaplan-Meier approach, the result we compute is a survival probability. We begin by calculating a risk for each one-day interval (step 2A in table B.1). The risk in the interval can be observed directly, since within the interval there is no attrition. The risk is simply the proportion of people in the cohort at the beginning of the interval who develop disease during the interval. We then calculate the survival probability as one minus this risk (step 2B in table B.1). For example, as shown in table B.2 for day 1,332, the risk is 1/802 = 0.0012469, and the survival probability is 1 − 0.0012469 = 0.9987531.
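The arithmetic of steps 2A and 2B can be checked with a minimal Python sketch. The function names are introduced here for illustration only; the counts for day 1,332 (1 case among 802 at risk) and day 5,550 (1 case among 395 at risk) are taken from table B.2.

```python
def interval_risk(cases: int, cohort_at_risk: int) -> float:
    """Step 2A: proportion of the cohort at risk who develop disease in the interval."""
    return cases / cohort_at_risk


def interval_survival(cases: int, cohort_at_risk: int) -> float:
    """Step 2B: one minus the interval risk."""
    return 1.0 - interval_risk(cases, cohort_at_risk)


# Day 1,332: one case among 802 people at risk.
print(f"{interval_risk(1, 802):.7f}")      # 0.0012469
print(f"{interval_survival(1, 802):.7f}")  # 0.9987531

# Day 5,550: one case among 395 people at risk.
print(f"{interval_survival(1, 395):.7f}")  # 0.9974684
```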

In the vast majority of one-day intervals, no case occurred, and the survival probability is one. These intervals are uninformative because an interval with a survival probability of one has no impact on the overall survival probability, as will be explained. So we need to compute the survival probability for only the 13 informative intervals in which a case did occur.
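That uninformative intervals can be set aside is easy to confirm numerically. The sketch below is an illustration rather than part of the original analysis: it takes only the first 1,332 days of follow-up, with the two case days in that stretch (days 138 and 1,332, cohort at risk 904 and 802 in table B.2), and compares the product over every day with the product over the two informative days alone. The variable names are assumptions made for the example.

```python
import math

# Cohort at risk on the two earliest case days, taken from table B.2.
case_days = {138: 904, 1332: 802}

# Interval survival probability for every day from 1 to 1,332:
# 1 - 1/n on a case day, and exactly 1 on every day without a case.
full_series = [
    1.0 - 1.0 / case_days[day] if day in case_days else 1.0
    for day in range(1, 1333)
]

# Interval survival probability for the informative days only.
informative_only = [1.0 - 1.0 / n for n in case_days.values()]

product_all = math.prod(full_series)
product_informative = math.prod(informative_only)

print(f"{product_all:.7f}")          # 0.9976483
print(f"{product_informative:.7f}")  # 0.9976483
```

Both products match the cumulative survival shown for day 1,332 in table B.2, because multiplying by one leaves a product unchanged.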

Step 3. Combine the results across the intervals. In the Kaplan-Meier approach, this step has two parts. To obtain the overall survival probability (step 3A in table B.1), we multiply the survival probabilities for the 13 informative intervals, as shown in table B.2. We disregard the survival probabilities for the other intervals, because an interval with a survival probability of one can have no impact on the result of the multiplication. To obtain the disease risk (step 3B in table B.1), we subtract the overall survival probability from the number one. Using this method, we estimate the risk of schizophrenia spectrum disorder in the exposed group to be 0.0201. This disease risk applies to the entire period (6,209 days), even though the last case occurred well before the end of the period.
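Steps 2 and 3 together can be reproduced from the 13 cohort-at-risk counts in table B.2. The following Python sketch is one way to do so, not the authors' own code; the variable names are assumptions, and each informative interval is taken to contain exactly one case, as stated in step 1.

```python
from itertools import accumulate
from operator import mul

# Cohort at risk on each of the 13 days with one case (table B.2), in day order.
cohort_at_risk = [904, 802, 798, 783, 751, 714, 701, 697, 696, 662, 499, 438, 395]

# Step 2: interval survival probability = 1 - (1 case / cohort at risk).
interval_survival = [1.0 - 1.0 / n for n in cohort_at_risk]

# Step 3A: running product gives the cumulative survival after each case day.
cumulative = list(accumulate(interval_survival, mul))
overall_survival = cumulative[-1]

# Step 3B: disease risk for the whole period is the complement of overall survival.
risk = 1.0 - overall_survival

print(f"overall survival = {overall_survival:.6f}")
print(f"risk = {risk:.4f}")  # risk = 0.0201
```

The intermediate values in `cumulative` should agree with the cumulative survival row of table B.2 up to rounding in the last displayed digit.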

Table B.2 Display Table for Kaplan-Meier Survival Analysis: PDSE Exposed Group

01/01/81

| One-day interval | Cases N | Cohort N | Interval survival probability | Cumulative survival |
| --- | --- | --- | --- | --- |
| Day 1 | 0 | 912 | 1.0000000 | 1.0000000 |
| 138 | 1 | 904 | 0.9988938 | 0.9988938 |
| 1,332 | 1 | 802 | 0.9987531 | 0.9976483 |
| 1,368 | 1 | 798 | 0.9987469 | 0.9963981 |
| 1,440 | 1 | 783 | 0.9987229 | 0.9951256 |
| 1,744 | 1 | 751 | 0.9986684 | 0.9938005 |
| 2,044 | 1 | 714 | 0.9985994 | 0.9924086 |
| 2,191 | 1 | 701 | 0.9985735 | 0.9909929 |
| 2,208 | 1 | 697 | 0.9985653 | 0.9895712 |
| 2,212 | 1 | 696 | 0.9985632 | 0.9881494 |
| 2,527 | 1 | 662 | 0.9984894 | 0.9866567 |
| 4,088 | 1 | 499 | 0.9979960 | 0.9846794 |
| 4,968 | 1 | 438 | 0.9977169 | 0.9824313 |
| 5,550 | 1 | 395 | 0.9974684 | 0.9799442 |
| Day 6,209 | 0 | 346 | 1.0000000 | 0.9799442 |

Note: Follow-up period divided into 6,209 one-day intervals. Displayed are the first and last day, and the 13 days on which cases occurred.

Double lines indicate intervals not shown in the table (e.g., days 2–137). For all days not shown, no case occurred, so that the interval risk is 0 and the interval survival probability is 1. The risk for a one-day interval = N cases / N cohort at risk (e.g., the interval risk for day 1,332 = 1/802 = 0.00124688). The survival probability for a one-day interval = 1 − interval risk (e.g., the interval survival probability for day 1,332 = 1 − 0.00124688 = 0.9987531). The cumulative survival probability on a given day = the product of interval survival probabilities up to and including the specified day (e.g., cumulative survival on day 139 = p(survived day 1) × p(survived day 2 | survived day 1) × … × p(survived day 138 | survived day 137) × p(survived day 139 | survived day 138) = 1 × 1 × … × 0.9988938 × 1). The cumulative survival probability at the end of the study (day 6,209) = 0.9799442. The risk for the entire period (6,209 days) = 1 − 0.9799442.
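The verbal definitions in this note can be written compactly as the usual Kaplan-Meier product. The notation is introduced here for convenience and does not appear in the original tables: d_j is the number of cases on day j, n_j is the cohort at risk at the beginning of day j, and Ŝ(t) is the cumulative survival probability through day t.

```latex
\hat{S}(t) = \prod_{j=1}^{t} \left( 1 - \frac{d_j}{n_j} \right),
\qquad
\widehat{\mathrm{risk}}(t) = 1 - \hat{S}(t)

% Days with d_j = 0 contribute a factor of 1, so only the 13 case days matter:
\hat{S}(6209) = \left(1 - \tfrac{1}{904}\right)\left(1 - \tfrac{1}{802}\right)\cdots\left(1 - \tfrac{1}{395}\right) \approx 0.9799442,
\qquad
\widehat{\mathrm{risk}}(6209) \approx 0.0201
```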