In many studies, the outcome of interest is the time from an initial observation until the occurrence of some event of interest, e.g.
Time from transplant surgery until new organ failure
Time to death in a pancreatic cancer trial
Time to first sex
Time to menopause
Time to divorce
Time to receipt of bachelor’s degree
Typically, the event of interest is called a failure (even if it is a good thing). The time interval between a starting point and the failure is known as the survival time and is often represented by \(t\).
Certain aspects of survival data make data analysis particularly challenging.
I hope you do visit Aitutaki some day! The Cook Islands are really nice.
It is important to distinguish between study time and patient time.
The distribution of survival times is characterized by the survival function, represented by \(S(t)\). For a continuous random variable \(T\), \(S(t)=Pr(T>t),\) and \(S(t)\) represents the proportion of individuals who have not yet failed.
The graph of \(S(t)\) versus \(t\) is called a survival curve. The survival curve shows the proportion of survivors at any given time.
Note: sometimes the survival function is defined as \(S(t)=Pr(T \geq t)\).
Consider a small study with 10 patients.
Patient | Event Time | Event Type |
---|---|---|
1 | 4.5 | Death |
2 | 7.5 | Death |
3 | 8.5 | Censored |
4 | 11.5 | Death |
5 | 13.5 | Censored |
6 | 15.5 | Death |
7 | 16.5 | Death |
8 | 17.5 | Censored |
9 | 19.5 | Death |
10 | 21.5 | Censored |
How do we estimate the survival curve for these data?
Perhaps the most popular estimate of a survival curve is the Kaplan-Meier or product-limit estimate. This method is actually fairly intuitive.
\(\hat{S}(t)=\prod_{t_i \leq t} \left(1-\frac{d_{t_i}}{I_{t_i}}\right)\)
At each time \(t\), the probability of surviving is just \(1-Pr(failing)\). Before there are any failures in the data, our estimated \(\hat{S}(t)=1\). At the time of the first failure, this probability falls below 1 and is simply one minus the probability of failing at that time, or \(1-\frac{\# ~ failures}{\#~at ~ risk~ of~ failing}\).
After the first failure, things get more complicated. At the time of the second failure, you can calcuate \(1-\frac{\# ~ failures}{\# ~ at ~risk ~ of ~ failing}\), but this doesn’t provide the whole picture, as someone else has already died. In fact, this is the conditional probability of surviving now that you’ve made it past the time of the first failure.
\(\hat{S}(t)=\prod_{t_i \leq t} \left(1-\frac{d_{t_i}}{I_{t_i}}\right)\)
How do you then calculate the total (unconditional) probability of survival? That is just the product of the probability of surviving past the first failure times the conditional probability of surviving beyond the second failure given that you made it past the first, or …
\(Pr(\text{survive past first and second times})\) \(=Pr(\text{survive past 1st time}) \times\) \(Pr(\text{survive past 2nd time}\mid\text{survived past 1st time})\) \(=\left(1-\frac{\# ~ failures ~at ~failure~ time~ 1}{\# ~at~ risk ~of ~failing ~at ~failure ~time ~1}\right)\left(1-\frac{\# ~of ~ failures ~ at ~ failure ~ time~ 2}{\# ~ at ~ risk ~ of ~ failing ~ at ~ failure~ time~ 2}\right)\)
If someone is censored, they are no longer at risk of failing at the next failure time and are taken out of the calculation
\(\hat{S}(t)=\prod_{t_i \leq t} \left(1-\frac{d_{t_i}}{I_{t_i}}\right)\)
\(t\) | # Failed (\(d_t\)) | # Censored | # Left (\(I_{t+1}\)) | \(\hat{S}(t)\) |
---|---|---|---|---|
0.0 | 0 | 0 | ||
4.5 | 1 | 0 | ||
7.5 | 1 | 0 | ||
8.5 | 0 | 1 | ||
11.5 | 1 | 0 | ||
13.5 | 0 | 1 | ||
15.5 | 1 | 0 | ||
16.5 | 1 | 0 | ||
17.5 | 0 | 1 | ||
19.5 | 1 | 0 | ||
21.5 | 0 | 1 |
\(\hat{S}(t)=\prod_{t_i \leq t} \left(1-\frac{d_{t_i}}{I_{t_i}}\right)\)
\(t\) | # Failed (\(d_t\)) | # Censored | # Left (\(I_{t+1}\)) | \(\hat{S}(t)\) |
---|---|---|---|---|
0.0 | 0 | 0 | 10 | 1 |
4.5 | 1 | 0 | ||
7.5 | 1 | 0 | ||
8.5 | 0 | 1 | ||
11.5 | 1 | 0 | ||
13.5 | 0 | 1 | ||
15.5 | 1 | 0 | ||
16.5 | 1 | 0 | ||
17.5 | 0 | 1 | ||
19.5 | 1 | 0 | ||
21.5 | 0 | 1 |
\(\hat{S}(t)=\prod_{t_i \leq t} \left(1-\frac{d_{t_i}}{I_{t_i}}\right)\)
\(t\) | # Failed (\(d_t\)) | # Censored | # Left (\(I_{t+1}\)) | \(\hat{S}(t)\) |
---|---|---|---|---|
0.0 | 0 | 0 | 10 | 1 |
4.5 | 1 | 0 | 9 | 1-\(\frac{1}{10}\)=0.9 |
7.5 | 1 | 0 | ||
8.5 | 0 | 1 | ||
11.5 | 1 | 0 | ||
13.5 | 0 | 1 | ||
15.5 | 1 | 0 | ||
16.5 | 1 | 0 | ||
17.5 | 0 | 1 | ||
19.5 | 1 | 0 | ||
21.5 | 0 | 1 |
\(\hat{S}(t)=\prod_{t_i \leq t} \left(1-\frac{d_{t_i}}{I_{t_i}}\right)\)
\(t\) | # Failed (\(d_t\)) | # Censored | # Left (\(I_{t+1}\)) | \(\hat{S}(t)\) |
---|---|---|---|---|
0.0 | 0 | 0 | 10 | 1 |
4.5 | 1 | 0 | 9 | 0.9 |
7.5 | 1 | 0 | 8 | 0.9\(\times (1-\frac{1}{9})\)=0.8 |
8.5 | 0 | 1 | ||
11.5 | 1 | 0 | ||
13.5 | 0 | 1 | ||
15.5 | 1 | 0 | ||
16.5 | 1 | 0 | ||
17.5 | 0 | 1 | ||
19.5 | 1 | 0 | ||
21.5 | 0 | 1 |
\(\hat{S}(t)=\prod_{t_i \leq t} \left(1-\frac{d_{t_i}}{I_{t_i}}\right)\)
\(t\) | # Failed (\(d_t\)) | # Censored | # Left (\(I_{t+1}\)) | \(\hat{S}(t)\) |
---|---|---|---|---|
0.0 | 0 | 0 | 10 | 1 |
4.5 | 1 | 0 | 9 | 0.9 |
7.5 | 1 | 0 | 8 | 0.8 |
8.5 | 0 | 1 | 7 | 0.8\(\times (1-\frac{0}{8})\)=0.8 |
11.5 | 1 | 0 | ||
13.5 | 0 | 1 | ||
15.5 | 1 | 0 | ||
16.5 | 1 | 0 | ||
17.5 | 0 | 1 | ||
19.5 | 1 | 0 | ||
21.5 | 0 | 1 |
\(\hat{S}(t)=\prod_{t_i \leq t} \left(1-\frac{d_{t_i}}{I_{t_i}}\right)\)
\(t\) | # Failed (\(d_t\)) | # Censored | # Left (\(I_{t+1}\)) | \(\hat{S}(t)\) |
---|---|---|---|---|
0.0 | 0 | 0 | 10 | 1 |
4.5 | 1 | 0 | 9 | 0.9 |
7.5 | 1 | 0 | 8 | 0.8 |
8.5 | 0 | 1 | 7 | 0.8 |
11.5 | 1 | 0 | 6 | 0.8\(\times (1-\frac{1}{7})\)=0.69 |
13.5 | 0 | 1 | ||
15.5 | 1 | 0 | ||
16.5 | 1 | 0 | ||
17.5 | 0 | 1 | ||
19.5 | 1 | 0 | ||
21.5 | 0 | 1 |
\(\hat{S}(t)=\prod_{t_i \leq t} \left(1-\frac{d_{t_i}}{I_{t_i}}\right)\)
\(t\) | # Failed (\(d_t\)) | # Censored | # Left (\(I_{t+1}\)) | \(\hat{S}(t)\) |
---|---|---|---|---|
0.0 | 0 | 0 | 10 | 1 |
4.5 | 1 | 0 | 9 | 0.9 |
7.5 | 1 | 0 | 8 | 0.8 |
8.5 | 0 | 1 | 7 | 0.8 |
11.5 | 1 | 0 | 6 | 0.69 |
13.5 | 0 | 1 | 5 | 0.69 |
15.5 | 1 | 0 | ||
16.5 | 1 | 0 | ||
17.5 | 0 | 1 | ||
19.5 | 1 | 0 | ||
21.5 | 0 | 1 |
\(\hat{S}(t)=\prod_{t_i \leq t} \left(1-\frac{d_{t_i}}{I_{t_i}}\right)\)
\(t\) | # Failed (\(d_t\)) | # Censored | # Left (\(I_{t+1}\)) | \(\hat{S}(t)\) |
---|---|---|---|---|
0.0 | 0 | 0 | 10 | 1 |
4.5 | 1 | 0 | 9 | 0.9 |
7.5 | 1 | 0 | 8 | 0.8 |
8.5 | 0 | 1 | 7 | 0.8 |
11.5 | 1 | 0 | 6 | 0.69 |
13.5 | 0 | 1 | 5 | 0.69 |
15.5 | 1 | 0 | 4 | 0.69\(\times (1-\frac{1}{5})\)= 0.552 |
16.5 | 1 | 0 | ||
17.5 | 0 | 1 | ||
19.5 | 1 | 0 | ||
21.5 | 0 | 1 |
\(\hat{S}(t)=\prod_{t_i \leq t} \left(1-\frac{d_{t_i}}{I_{t_i}}\right)\)
\(t\) | # Failed (\(d_t\)) | # Censored | # Left (\(I_{t+1}\)) | \(\hat{S}(t)\) |
---|---|---|---|---|
0.0 | 0 | 0 | 10 | 1 |
4.5 | 1 | 0 | 9 | 0.9 |
7.5 | 1 | 0 | 8 | 0.8 |
8.5 | 0 | 1 | 7 | 0.8 |
11.5 | 1 | 0 | 6 | 0.69 |
13.5 | 0 | 1 | 5 | 0.69 |
15.5 | 1 | 0 | 4 | 0.552 |
16.5 | 1 | 0 | 3 | 0.552\(\times (1-\frac{1}{4})\)= 0.414 |
17.5 | 0 | 1 | ||
19.5 | 1 | 0 | ||
21.5 | 0 | 1 |
\(\hat{S}(t)=\prod_{t_i \leq t} \left(1-\frac{d_{t_i}}{I_{t_i}}\right)\)
\(t\) | # Failed (\(d_t\)) | # Censored | # Left (\(I_{t+1}\)) | \(\hat{S}(t)\) |
---|---|---|---|---|
0.0 | 0 | 0 | 10 | 1 |
4.5 | 1 | 0 | 9 | 0.9 |
7.5 | 1 | 0 | 8 | 0.8 |
8.5 | 0 | 1 | 7 | 0.8 |
11.5 | 1 | 0 | 6 | 0.69 |
13.5 | 0 | 1 | 5 | 0.69 |
15.5 | 1 | 0 | 4 | 0.552 |
16.5 | 1 | 0 | 3 | 0.414 |
17.5 | 0 | 1 | 2 | 0.414 |
19.5 | 1 | 0 | ||
21.5 | 0 | 1 |
\(\hat{S}(t)=\prod_{t_i \leq t} \left(1-\frac{d_{t_i}}{I_{t_i}}\right)\)
\(t\) | # Failed (\(d_t\)) | # Censored | # Left (\(I_{t+1}\)) | \(\hat{S}(t)\) |
---|---|---|---|---|
0.0 | 0 | 0 | 10 | 1 |
4.5 | 1 | 0 | 9 | 0.9 |
7.5 | 1 | 0 | 8 | 0.8 |
8.5 | 0 | 1 | 7 | 0.8 |
11.5 | 1 | 0 | 6 | 0.69 |
13.5 | 0 | 1 | 5 | 0.69 |
15.5 | 1 | 0 | 4 | 0.552 |
16.5 | 1 | 0 | 3 | 0.414 |
17.5 | 0 | 1 | 2 | 0.414 |
19.5 | 1 | 0 | 1 | 0.414\(\times (1-\frac{1}{2})\)=0.207 |
21.5 | 0 | 1 |
\(\hat{S}(t)=\prod_{t_i \leq t} \left(1-\frac{d_{t_i}}{I_{t_i}}\right)\)
\(t\) | # Failed (\(d_t\)) | # Censored | # Left (\(I_{t+1}\)) | \(\hat{S}(t)\) |
---|---|---|---|---|
0.0 | 0 | 0 | 10 | 1 |
4.5 | 1 | 0 | 9 | 0.9 |
7.5 | 1 | 0 | 8 | 0.8 |
8.5 | 0 | 1 | 7 | 0.8 |
11.5 | 1 | 0 | 6 | 0.69 |
13.5 | 0 | 1 | 5 | 0.69 |
15.5 | 1 | 0 | 4 | 0.552 |
16.5 | 1 | 0 | 3 | 0.414 |
17.5 | 0 | 1 | 2 | 0.414 |
19.5 | 1 | 0 | 1 | 0.207 |
21.5 | 0 | 1 | 0 | 0.207 |
What would \(\hat{S}(21.5)\) be if the last observation were a failure instead of censored?
In between failure times, the KM estimate does not change but is constant. This gives the estimated survival function its step-like appearance (we call this type of function a step function). If we have a very large data set, the KM estimate may look smooth if the steps are very small.
The primary object of inferential interest is generally the survival function, \(S(t)=Pr(T>t)\).
The cumulative distribution function F is defined as the complement of the survival function: \(F(t)=Pr(T \leq t)\). Then if F is differentiable, its derivative is the density function \(f(t)\).
The hazard function, often represented using \(\lambda(t)\), is the event rate at time \(t\) conditional on survival to time \(t\) (or later). It is defined as \(\lambda(t)=\underset{dt \rightarrow 0}{\text{lim}} \frac{Pr(t \leq T < t+dt)}{dt \cdot S(t)}=\frac{f(t)}{S(t)}\).
The shape and characteristics of the hazard and survival functions depend on the assumed probability distribution for the survival times.
Recall the exponential distribution, \(f(t)=\lambda e^{-\lambda t}\), for \(\lambda, t>0\), which has mean \(\frac{1}{\lambda}\) and variance \(\frac{1}{\lambda^2}\). The exponential survivor function \(S(t)=Pr(T>t)=\int_t^\infty \lambda e^{-\lambda t} dt=\frac{-\lambda e^{-\lambda t}}{\lambda}|_t^\infty=e^{-\lambda t}\), and so the exponential hazard \(h(t)\) (to avoid confusing notation here) is given by \(h(t)=\frac{f(t)}{S(t)}=\lambda \frac{e^{-\lambda t}}{e^{-\lambda t}}=\lambda\). The parameter \(\lambda\) is often called a scale parameter.
So under the exponential model, the hazard of the event of interest does not change with time. Let’s generate some data and look at the exponential pdf.
Here’s the exponential hazard for the same values of \(\lambda\).
The Weibull distribution is a generalization of the exponential distribution that allows greater flexibility in the distribution of survival times via inclusion of the shape parameter \(\alpha\).
The Weibull can be written \(f(t)=\alpha \lambda t^{\alpha-1} e^{-\lambda t^\alpha}\), for \(\lambda, \alpha, t>0\), which reduces to the exponential distribution when \(\alpha=1\). Its survivor function is \(S(t)=e^{-\lambda t^\alpha}\) and hazard function is \(\alpha \lambda t^{\alpha-1}\).
For the Weibull distribution, the hazard is increasing for \(\alpha>1\), decreasing for \(0<\alpha<1\), and constant (exponential) for \(\alpha=1\). What type of hazard makes sense for studying lifetimes of leaders?
In some applications, monotone hazard functions may not be realistic. In reliability engineering, you often have a bathtub-shaped hazard, with a decreasing rate of early failures, a constant failure rate during the main lifetime of the product, and an increasing failure rate as the product begins to wear out.
Perhaps the most popular survival model in public health and medicine is the Cox (1972) proportional hazards model, which is a semi-parametric model that makes no assumption about the shape of the baseline hazard function.
We are often interested in assessing the relationship between covariates or predictors of interest and survival. For example, in the Popes paper, the authors are interested in how the year (cohort) of birth is related to post-election survival and how the age at election is related to post-election survival.
The most popular class of models used in survival analysis is the class of proportional hazards models.
Under a proportional hazards model, the hazard of death at time \(t\) for individual \(i\) is given by \(h_i(t)=h_0(t)e^{\beta_1 x_{i1}+\beta_2 x_{i2} + \cdots + \beta_px_{ip}}\). We call \(h_0(t)\) the baseline hazard. In a Weibull model, we assume that \(h_0(t)=\lambda \alpha t^{\alpha-1}\), and in an exponential model, we assume \(h_0(t)=\lambda\).
Consider a simple model with one binary covariate \(x\) that takes value 1 for one group and 0 for the other. Then the hazard at time \(t\) for those in group 1 is \(h_0(t)e^\beta\), and the hazard at time \(t\) for those in group 0 is \(h_0(t)\). Because the hazard in group 1 is proportional to the hazard in group 0, this type of model is called a proportional hazards model.
\(e^\beta\) is called the hazard ratio. When \(\beta>0\), the hazard ratio\(>1\), and we expect those in group 1 to fail more quickly. When \(\beta<0\), the hazard ratio\(<1\), and we expect those in group 1 to be protected against failure.
Another large class of survival models is that of accelerated failure time (AFT) models. The Weibull model can be interpreted both as a proportional hazards model and as an AFT model. Section 3 of the Stander et al. paper illustrates some of the nice features of interpretation of parameter estimates in the Weibull AFT model (e.g., you can interpret a function of the parameters as the percentage increase in survival time for a unit increase in a predictor).