Survival Analysis

Survival Data

In many studies, the outcome of interest is the time from an initial observation until some event occurs, for example:

Time from transplant surgery until new organ failure
Time to death in a pancreatic cancer trial
Time to first sex
Time to menopause
Time to divorce
Time to receipt of bachelor’s degree

Typically, the event of interest is called a failure (even if it is a good thing). The time interval between a starting point and the failure is known as the survival time and is often represented by \(t\).

Survival Data

Certain aspects of survival data make data analysis particularly challenging.

Typically, not all individuals are observed until their times of failure:
- An organ transplant recipient may die in an automobile accident before the new organ fails
- A student may withdraw from the program to start a multibillion-dollar health company
- Not everyone gets divorced
- A pancreatic cancer patient may move to Aitutaki instead of undergoing further treatment

In such cases, the observation is said to be censored at the last point of contact with the individual.
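
A censored observation is commonly stored as a pair: the observed time and a flag saying whether the failure was actually seen. The sketch below illustrates this under stated assumptions; the function name `observe` and the numeric values are hypothetical and not part of the original material.

```python
import math

def observe(failure_time, last_contact_time):
    """Return (observed time, failed?) for one individual.

    The observed time is whichever comes first: the failure or the
    last point of contact. If contact ends first, the observation
    is censored and the flag is False.
    """
    failed = failure_time <= last_contact_time
    return (min(failure_time, last_contact_time), failed)

# Failure seen at time 5, before follow-up ended at time 12
seen = observe(5.0, 12.0)

# An individual who never fails (e.g. never divorces): censored at time 12
never = observe(math.inf, 12.0)

print(seen, never)
```

The censored record still carries information: the true survival time is known to exceed the observed time.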

Survival

I hope you do visit Aitutaki some day! The Cook Islands are really nice.

Study Time and Patient Time

It is important to distinguish between study time and patient time.

A study may start enrolling patients in September and continue until all 500 patients have been enrolled
This is likely to take months or years
Time is typically converted to patient time (time between enrollment and failure or censoring) before analysis
In the world leaders data, patient time is the time from birth to death; study time can be represented by year of birth and year of death
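
As a rough illustration of the conversion from study time to patient time, the sketch below subtracts each enrollment date from the date of failure or censoring. The records, dates, and the helper name `patient_time_days` are hypothetical.

```python
from datetime import date

# Hypothetical records: (enrollment date, date of failure or last contact, failed?)
records = [
    (date(2020, 9, 1), date(2021, 3, 1), True),
    (date(2021, 1, 15), date(2021, 7, 15), True),
    (date(2021, 6, 1), date(2021, 9, 1), False),  # censored
]

def patient_time_days(enrolled, ended):
    """Days between enrollment and failure or censoring."""
    return (ended - enrolled).days

# Calendar dates (study time) become durations (patient time)
patient_times = [
    (patient_time_days(enrolled, ended), failed)
    for enrolled, ended, failed in records
]

print(patient_times)
```

The first two patients enrolled months apart in study time, yet they have the same patient time, which is what the analysis uses.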

Survival Function

The distribution of survival times is characterized by the survival function, represented by \(S(t)\). For a continuous random variable \(T\), \(S(t)=Pr(T>t),\) and \(S(t)\) represents the proportion of individuals who have not yet failed by time \(t\).

The graph of \(S(t)\) versus \(t\) is called a survival curve. The survival curve shows the proportion of survivors at any given time.

Note: sometimes the survival function is defined as \(S(t)=Pr(T \geq t)\).
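
For a continuous \(T\) the two definitions agree, because \(Pr(T=t)=0\). With discrete or tied times they can differ, as the following sketch suggests. It assumes a small set of hypothetical failure times with no censoring; the function names are illustrative only.

```python
# Hypothetical failure times with no censoring (illustrative values only)
times = [2, 3, 3, 5, 8]

def surv_strict(t, times):
    """Proportion of individuals with T > t."""
    return sum(1 for u in times if u > t) / len(times)

def surv_inclusive(t, times):
    """Proportion of individuals with T >= t."""
    return sum(1 for u in times if u >= t) / len(times)

# At a time where failures occur, the two definitions disagree
print(surv_strict(3, times), surv_inclusive(3, times))

# Between failure times, they agree
print(surv_strict(4, times), surv_inclusive(4, times))
```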

Vaccination in Burkina Faso, 2004 BMJ

Figure: Survival of children by vaccination status

Estimating Survival Curves

Consider a small study with 10 patients.

Patient	Event Time	Event Type
1	4.5	Death
2	7.5	Death
3	8.5	Censored
4	11.5	Death
5	13.5	Censored
6	15.5	Death
7	16.5	Death
8	17.5	Censored
9	19.5	Death
10	21.5	Censored

How do we estimate the survival curve for these data?

Kaplan-Meier Estimate

Perhaps the most popular estimate of a survival curve is the Kaplan-Meier or product-limit estimate. This method is actually fairly intuitive.

\(I_t\): number at risk of failure at time \(t\) (i.e., those who have neither failed nor been censored before \(t\))
\(d_t\): number who fail at time \(t\)
\(q_t=\frac{d_t}{I_t}\): estimated conditional probability of failing at time \(t\), given survival up to \(t\)
\(S(t)\): cumulative probability of surviving beyond time \(t\), estimated as \(\hat{S}(t)=\prod_{t_i \leq t} \left(1-\frac{d_{t_i}}{I_{t_i}}\right)\), where the product runs over the distinct failure times \(t_i\) up to and including \(t\)
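
The product-limit formula can be applied by hand to the 10-patient study above. The following is a minimal sketch rather than a reference implementation: the function name `kaplan_meier` and the tuple layout are assumptions made for illustration, and exact fractions are used so each factor \(1-d_t/I_t\) stays visible.

```python
from fractions import Fraction

# (event time, failed?) for the 10 patients; False means censored
data = [
    (4.5, True), (7.5, True), (8.5, False), (11.5, True), (13.5, False),
    (15.5, True), (16.5, True), (17.5, False), (19.5, True), (21.5, False),
]

def kaplan_meier(observations):
    """Return [(t, I_t, d_t, S_hat(t))] at each distinct failure time."""
    failure_times = sorted({t for t, failed in observations if failed})
    surv = Fraction(1)
    rows = []
    for t in failure_times:
        # I_t: neither failed nor censored before t
        at_risk = sum(1 for u, _ in observations if u >= t)
        # d_t: failures occurring exactly at t
        deaths = sum(1 for u, failed in observations if failed and u == t)
        # Multiply in the conditional probability of surviving past t
        surv *= Fraction(at_risk - deaths, at_risk)
        rows.append((t, at_risk, deaths, surv))
    return rows

km_table = kaplan_meier(data)
for t, at_risk, deaths, surv in km_table:
    print(f"t={t:5.1f}  I_t={at_risk:2d}  d_t={deaths}  S_hat={float(surv):.4f}")
```

Censored patients never contribute a factor to the product, but they do leave the risk set: at \(t=11.5\) only 7 patients are at risk, because patient 3 was censored at 8.5. The estimate stays flat between failure times and drops only when a death occurs.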