(p.478) Appendix B Application of Survival Analysis to Prenatal Determinants of Schizophrenia Example (PDSE) Data
We apply the Kaplan-Meier method of survival analysis (Kaplan and Meier 1958) to the data of the Prenatal Determinants of Schizophrenia Example (PDSE) study. Recall that the PDSE is a birth cohort in which we examine exposure to high maternal body mass index as a risk factor for schizophrenia spectrum disorder in offspring (see tables 10.6–10.8). Cases are ascertained by a health plan treatment registry (and subsequently diagnosed). The cohort under observation is enumerated by a health plan membership registry.
We will use the Kaplan-Meier approach to calculate the risk for the exposed group in the PDSE. The exposed comprise 912 of the 12,090 cohort members. The same procedure could be applied to the unexposed group, so that the two risks can be compared.
The procedures are outlined in a summary table, table B.1, and illustrated in more detail in a display table, table B.2. Throughout the following explanation, it is useful to keep in mind that risk and survival are connected. Over any given time interval, the survival probability is the complement of the disease risk. That is, the probability of surviving disease-free is equal to one minus the probability of developing disease. (Strictly speaking, these are conditional survival probabilities and conditional disease risks: conditional on having reached the given interval.)
Step 1. Divide the risk period into many small time intervals. We use time intervals equal to one day, because in the PDSE the health plan membership and treatment registry data are provided according to calendar date. For each one-day interval, a cohort at risk is defined by those who remain members of the health plan and disease-free at the beginning of the interval. For example, as shown in table B.2, there were 912 people at risk on the first day, 802 people at risk on day 1,332, and 346 people at risk on the last day. The cohort at risk declines across the intervals because some people drop out of the health plan and others develop the disease.
The cases for each one-day interval are those ascertained by the health plan on that day according to the treatment registry. As shown in table B.2, there were 13 days on which a case of schizophrenia spectrum disorder was ascertained. The one-day intervals are small enough that, in each interval, either no cases or only one case was ascertained.
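The definition of the cohort at risk in step 1 can be sketched in code. The following is a minimal illustration, not the PDSE procedure itself: the `Member` record, its field names, and the five toy members are hypothetical, and it assumes (as in the exposed group) that everyone is under observation from day 1.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    # Hypothetical record layout, for illustration only.
    last_member_day: int            # last day of health plan membership
    case_day: Optional[int] = None  # day a case was ascertained, if any


def cohort_at_risk(members: List[Member], day: int) -> int:
    """Count members still enrolled and disease-free at the start of `day`."""
    return sum(
        1
        for m in members
        if m.last_member_day >= day and (m.case_day is None or m.case_day >= day)
    )


def cases_on(members: List[Member], day: int) -> int:
    """Count cases ascertained on `day`."""
    return sum(1 for m in members if m.case_day == day)


# Toy cohort (invented data): two cases and two drop-outs over 10 days.
toy = [
    Member(last_member_day=10),
    Member(last_member_day=4),
    Member(last_member_day=10, case_day=3),
    Member(last_member_day=7),
    Member(last_member_day=10, case_day=6),
]

for day in range(1, 11):
    print(day, cohort_at_risk(toy, day), cases_on(toy, day))
```

In this toy cohort the count at risk falls from 5 on day 1 to 4 on day 4, because the member who became a case on day 3 is no longer disease-free, which mirrors the decline described above.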
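Step 2. Compute a result for each interval. In the Kaplan-Meier approach, the result we compute is a survival probability. We begin by calculating a risk for each one-day interval (step 2A in table B.1). The risk in the interval can be observed as the number of cases divided by the number in the cohort at risk; the interval survival probability is then one minus this risk (step 2B in table B.1).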
Table B.1 Summary Table for Kaplan-Meier Survival Analysis: PDSE Exposed Group

| Step Number | Action |
| --- | --- |
| 1 | Divide total follow-up period into 6,209 one-day intervals. |
| 2A | Estimate the interval risk for each of the 13 informative intervals in which a case occurred. Interval risk = N cases / N cohort at risk. |
| 2B | Estimate interval survival probability for each of the 13 informative intervals in which a case occurred. Interval survival probability = 1 − interval risk. |
| 3A | Multiply the interval survival probabilities of the 13 informative intervals to obtain survival probability for total period. Cumulative survival probability for total period = 0.9988938 × 0.9987531 × … × 0.9974684 = 0.9799442. |
| 3B | Estimate risk for total period (6,209 days) = 1 − cumulative survival probability for total period = 1 − 0.9799442 = 0.0200558 ≈ 0.0201. |

Note: Disease risk computed from the informative intervals.
In the vast majority of one-day intervals, no case occurred, and the survival probability is one. These intervals are uninformative because an interval with a survival probability of one has no impact on the overall survival probability, as will be explained. So we need to compute the survival probability for only the 13 informative intervals in which a case did occur.
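Step 2 can be reproduced for the 13 informative intervals with a few lines of code. This is a sketch: the days and cohort sizes are copied from table B.2, while the function and variable names are our own.

```python
# (day, cohort at risk) for the 13 informative days in table B.2.
# Exactly one case was ascertained on each of these days.
INFORMATIVE_DAYS = [
    (138, 904), (1332, 802), (1368, 798), (1440, 783), (1744, 751),
    (2044, 714), (2191, 701), (2208, 697), (2212, 696), (2527, 662),
    (4088, 499), (4968, 438), (5550, 395),
]


def interval_risk(cases: int, at_risk: int) -> float:
    """Step 2A: interval risk = N cases / N cohort at risk."""
    return cases / at_risk


def interval_survival(cases: int, at_risk: int) -> float:
    """Step 2B: interval survival probability = 1 - interval risk."""
    return 1.0 - interval_risk(cases, at_risk)


for day, at_risk in INFORMATIVE_DAYS:
    risk = interval_risk(1, at_risk)
    surv = interval_survival(1, at_risk)
    print(f"day {day:>5}: risk = {risk:.8f}, survival = {surv:.7f}")
```

A day with no case gives `interval_survival(0, n) == 1.0` for any cohort size `n`, which is why such days are uninformative.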
Step 3. Combine the results across the intervals. In the Kaplan-Meier approach, this step has two parts. To obtain the overall survival probability (step 3A in table B.1), we multiply the survival probabilities for the 13 informative intervals, as shown in table B.2. We disregard the survival probabilities for the other intervals, because an interval with a survival probability of one can have no impact on the result of the multiplication. To obtain the disease risk (step 3B in table B.1), we subtract the overall survival probability from the number one. Using this method, we estimate the risk of schizophrenia spectrum disorder in the exposed group to be 0.0201. This disease risk applies to the entire period (6,209 days), even though the last case occurred well before the end of the period.
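The arithmetic of step 3 can be checked with a short script. This is an illustrative sketch using the cohort sizes from table B.2; the variable names are ours, and `math.prod` requires Python 3.8 or later.

```python
import math

# Cohort at risk on each of the 13 days with one case (table B.2).
AT_RISK = [904, 802, 798, 783, 751, 714, 701, 697, 696, 662, 499, 438, 395]
TOTAL_DAYS = 6209

# Interval survival probability for each informative day: 1 - 1/n.
interval_survivals = [1 - 1 / n for n in AT_RISK]

# Step 3A: multiply the interval survival probabilities.
cumulative_survival = math.prod(interval_survivals)

# Step 3B: risk for the total period is the complement.
total_risk = 1 - cumulative_survival

# Including the uninformative days (survival probability 1) changes nothing.
padded = interval_survivals + [1.0] * (TOTAL_DAYS - len(AT_RISK))
same_result = math.prod(padded) == cumulative_survival

print(f"cumulative survival = {cumulative_survival:.5f}")
print(f"risk for total period = {total_risk:.4f}")
print(f"uninformative days leave the product unchanged: {same_result}")
```

The result should agree with table B.1 to rounding: a cumulative survival of about 0.97994 and a risk of about 0.0201.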
Table B.2 Display Table for Kaplan-Meier Survival Analysis: PDSE Exposed Group

| One-day interval | Day 1 (01/01/81) | 138 | 1,332 | 1,368 | 1,440 | 1,744 | 2,044 | 2,191 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Cases N | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Cohort N | 912 | 904 | 802 | 798 | 783 | 751 | 714 | 701 |
| Interval survival probability | 1.0000000 | 0.9988938 | 0.9987531 | 0.9987469 | 0.9987229 | 0.9986684 | 0.9985994 | 0.9985735 |
| Cumulative survival | 1.0000000 | 0.9988938 | 0.9976483 | 0.9963981 | 0.9951256 | 0.9938005 | 0.9924086 | 0.9909929 |

| One-day interval | 2,208 | 2,212 | 2,527 | 4,088 | 4,968 | 5,550 | Day 6,209 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Cases N | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
| Cohort N | 697 | 696 | 662 | 499 | 438 | 395 | 346 |
| Interval survival probability | 0.9985653 | 0.9985632 | 0.9984894 | 0.9979960 | 0.9977169 | 0.9974684 | 1.0000000 |
| Cumulative survival | 0.9895712 | 0.9881494 | 0.9866567 | 0.9846794 | 0.9824313 | 0.9799442 | 0.9799442 |

Note: Follow-up period divided into 6,209 one-day intervals. Displayed are the first and last days and the 13 days on which cases occurred.
In the printed table, double lines mark intervals that are not shown (e.g., days 2–137). For all days not shown, no case occurred, so the interval risk is 0 and the interval survival probability is 1. The risk for a one-day interval = N cases / N cohort at risk (e.g., the interval risk for day 1,332 = 1/802 = 0.00124688). The survival probability for a one-day interval = 1 − interval risk (e.g., the interval survival probability for day 1,332 = 1 − 0.00124688 = 0.9987531). The cumulative survival probability on a given day = the product of interval survival probabilities up to and including the specified day (e.g., cumulative survival on day 139 = p(survived day 1) × p(survived day 2 | survived day 1) × … × p(survived day 138 | survived day 137) × p(survived day 139 | survived day 138) = 1 × 1 × … × 0.9988938 × 1 = 0.9988938). The cumulative survival probability at the end of the study (day 6,209) = 0.9799442. The risk for the entire period (6,209 days) = 1 − 0.9799442 = 0.0200558.
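The calculation described in this note can be written compactly as a product over days. The notation is ours rather than the table's: $d_j$ is the number of cases on day $j$, $n_j$ is the cohort at risk at the start of day $j$, and $\hat{S}(t)$ is the cumulative survival probability through day $t$.

```latex
\hat{S}(t) = \prod_{j=1}^{t} \left( 1 - \frac{d_j}{n_j} \right),
\qquad
\widehat{\text{risk}}(t) = 1 - \hat{S}(t)

% Worked example for day 139: only day 138 has a case (d_{138} = 1, n_{138} = 904).
\hat{S}(139) = 1 \times \cdots \times \left( 1 - \frac{1}{904} \right) \times 1 = 0.9988938
```

Every day with $d_j = 0$ contributes a factor of 1, so the product over all 6,209 days reduces to the product over the 13 days on which a case occurred.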