This allows the plotting of adjusted curves for different groups, which are very similar to Kaplan-Meier curves, but instead show or predict the probability of survival in each group, while keeping the other covariates fixed at their mean values. Assumptions of the Cox PH model—other than assumptions that apply to all survival analyses, such as noninformative censoring described above—include a linear relationship between the covariates and the log-hazard, as well as the PH assumption.
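For reference, the two Cox-specific assumptions named above can be read directly off the usual form of the model. The notation below (baseline hazard $h_0$, covariates $x_j$, coefficients $\beta_j$) is the conventional one and is our assumption, not taken from the source text.

```latex
h(t \mid x_1, \dots, x_p) = h_0(t)\,\exp(\beta_1 x_1 + \dots + \beta_p x_p)
\quad\Longleftrightarrow\quad
\log h(t \mid x) = \log h_0(t) + \sum_{j=1}^{p} \beta_j x_j ,
\qquad
\mathrm{HR}_j = \exp(\beta_j).
```

The log-hazard is linear in the covariates, and the ratio of the hazards of two patients does not depend on $t$ because $h_0(t)$ cancels, which is the PH assumption.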

While the PH assumption is central to the Cox model, its actual importance is debated. While some authors stress the importance,25,26 others de-emphasize it and take the view that the HR can still be viewed as an average effect during the observation period when the assumption is violated. Several methods to check this and other assumptions have been suggested, and we refer to previous literature on the topic for a detailed overview.

The Cox PH model assumes the covariates to be time-independent; in other words, the values of each patient's variables (eg, gender and age at time of diagnosis) do not change over time.

Extensions of the Cox model are available that allow for covariates that vary over time (eg, blood pressure recordings at follow-up time points). Nonparametric and semiparametric methods are commonly used to analyze survival data in anesthesia, critical care, perioperative, and pain research. In their retrospective cohort study, Huang et al 29 sought to identify predictors of long-term survival in patients after lung cancer surgery.

The authors initially used multiple log-rank tests to identify covariates that are potentially related to survival. With this model, the authors identified 6 factors associated with either longer or shorter overall survival. For example, limited resection was associated with a higher hazard rate and hence shorter survival HR, 1.

Kaplan-Meier curves were presented for each of the 4 possible combinations (Figure 2), and a log-rank test was used for an unadjusted comparison of the survival curves. A multivariable model adjusting for confounders suggested that administration of both flurbiprofen axetil and dexamethasone was associated with prolonged overall survival when compared to use of neither, with an adjusted HR of 0.

In their randomized controlled trial, Wilson et al 31 studied whether dural puncture epidural (DPE), a technique where the dura is punctured but medication is not administered in the subarachnoid space, expedites analgesia in laboring patients compared to the conventional lumbar epidural (LE) technique.

The authors applied the Kaplan-Meier method to estimate median time to achieve adequate analgesia in each treatment group. A Cox PH regression model with treatment group as a sole independent variable was used to estimate the treatment effect. The estimated HR was 1. Here, in this randomized controlled trial, the purpose of the Cox PH model was to obtain an estimate of the treatment effect. Using data on patients who participated in 2 trials across 4 clinical sites for a follow-up analysis, Podolyak et al 30 studied effects of supplemental perioperative oxygen on long-term mortality in patients undergoing colorectal surgery.

The authors present survival curves using Kaplan-Meier estimates and use a Cox PH model, stratified by study and site to allow for separate baseline hazards for each study and site. This approach was presumably chosen as it allows for the estimation of an overall HR estimate and significance test across all study sites.

Parametric models assume a specific distribution of the survival times. Advantages of a parametric model include a higher efficiency (ie, greater power),14 which can be particularly useful with smaller sample sizes. Furthermore, a variety of parametric techniques can model survival times when the PH assumption is not met.

However, it can be quite challenging to identify the most appropriate data distribution, and parametric models have the drawback of providing misleading inferences if the distributional assumptions are not met. In contrast, the semiparametric Cox model is a safe and proven method without the need to specify a specific data distribution, 36 which is why this model is most common in analyzing survival data. For a more detailed discussion on parametric models, we refer to previously published literature on the topic.

## Time-To-Event Data Analysis

The previously described techniques are useful for studying the time until a specific event that occurs only once, terminates the observation of a patient, and occurs independently between patients. Recurrent event models can model the sequential occurrence of events over time. Competing risk models can accommodate multiple competing types of failure events, each of which terminates the observation of an individual. Commonly, researchers are interested in an event such as cancer recurrence, but death that occurs before the event of interest is a competing risk.

In this setting, the researcher can either model the time to the earliest of death or cancer recurrence or use special methods to model both events. Frailty models account for nonindependence of observations in clustered data (or for correlated failure times) by incorporating random effects. These models are analogous to mixed effect models for uncensored longitudinal and correlated data, as described in a recent tutorial in this series. The power of a method to analyze survival time data depends on the number of events rather than the total sample size.

First, the number of events needed to detect a minimum clinically important effect size, like a prespecified HR, with a preselected power and alpha level is computed. Depending on the planned data analysis method, different approaches for estimating the number of events have been proposed, including the Schoenfeld method for log-rank tests or PH models. Second, to calculate the total sample size, the proportion of patients who are expected to experience the event needs to be estimated.
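The two-step calculation described above can be sketched in a few lines. This is a minimal illustration of the Schoenfeld approximation for a two-sided test, not a substitute for dedicated sample size software; the function names, the default `alpha`, `power`, and 1:1 allocation are our assumptions.

```python
import math
from statistics import NormalDist


def schoenfeld_events(hr, alpha=0.05, power=0.80, alloc=0.5):
    """Approximate number of events needed to detect hazard ratio `hr`.

    Schoenfeld approximation for a two-sided log-rank test or PH model;
    `alloc` is the proportion of patients in one of the two groups.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return (z_alpha + z_beta) ** 2 / (alloc * (1 - alloc) * math.log(hr) ** 2)


def total_sample_size(n_events, event_prob):
    """Step 2: inflate the event count by the expected event proportion."""
    return math.ceil(n_events / event_prob)


events = schoenfeld_events(hr=0.7)               # step 1: required events
n_total = total_sample_size(math.ceil(events), event_prob=0.5)  # step 2
```

Because the required number of events depends only on the HR, alpha, and power, a low expected event proportion can inflate the total sample size considerably.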

Survival data are unique in that the research questions essentially involve a combination of whether the event has occurred in the observation period and when it has occurred.

## Survival Analysis with R

Censoring, or the incomplete observation of failure times, is common in these data, such that specific statistical methods are required for an appropriate analysis. The Kaplan-Meier method estimates the unadjusted probability of surviving beyond a certain time point, and a Kaplan-Meier curve is a useful graphical tool to display the estimated survival function. The log-rank test is commonly used to compare survival curves between different groups, but can only be used for a crude, unadjusted comparison. The Cox PH model is the most commonly used technique to assess the effect of factors, such as treatments, that simultaneously allows one to control for the effects of other covariates.

The exponentiated regression coefficients can be interpreted in terms of an HR. This semiparametric technique makes no assumptions about the distribution of the survival times. If the distribution can be appropriately identified and modeled, parametric techniques can alternatively be used. For special circumstances in which the standard techniques cannot be validly used, a variety of methods including recurrent events models, competing risks models, and frailty models are available.

The direction and magnitude of the bias from Models 1 to 3 depended on the scenario (Table 3). When the effect of the exposure on death was much stronger in disease-free than in diseased subjects and much stronger than on disease (Scenario 3), exposed subjects tended to die before developing the disease. In Models 1 to 3, the age at disease onset for these subjects was right-censored at their last visit, and they therefore did not contribute to any risk set between that visit and death.

By contrast, when the effect of the exposure on death was much stronger in diseased subjects (Scenario 4), exposed diseased subjects tended to die shortly after their disease onset, possibly before being diagnosed.

For all participants without follow-up, we had the date of death or the alive status at some point in time during the year follow-up, giving subjects for Model 4. Among the subjects, Among the subjects who were never diagnosed with dementia, The overall percentages of events are therefore close to those simulated in the scenarios with high event rates (Table 2).

The time elapsed between the last visit and death was longer than 10 years for subjects, which may be long enough to develop dementia. Because age-specific incidence of dementia and mortality rate depend on sex,13,15 separate analyses were conducted in men and women. Selected subject characteristics are described in Table 4.

As expected, the risk of dementia was lower in subjects with the primary school certificate in both men and women (Table 5). Cognitive reserve for highly educated people indeed leads to higher severity of the underlying brain lesions at the time of dementia onset, and thus to a shorter survival time thereafter.

Estimated effects of education level, high blood pressure and smoking status at baseline on the risks of dementia and death, based on different survival and illness-death models, PAQUID cohort, France. All models used age as the time-scale and included the three indicators of education level, high blood pressure and ever smoked, as well as birth year as a continuous covariate.

The illness-death model for interval-censored data, as compared with standard survival models, resulted in better estimates of the effects on disease of exposures that were associated with death, especially when the mortality rates were high. The superiority of the illness-death model was due to its ability to account for the probability of developing the disease between the last visit and death.

The direction and magnitude of the bias from the standard models depended on the exposure effects on disease and death in both diseased and disease-free subjects. As in most simulation studies, we investigated only a limited range of scenarios and imposed some restrictive assumptions. In particular, we imposed constant effects of only one time-independent covariate.

All analytical models which assumed proportional transition intensities were thus correctly specified. Yet, it would be of interest to investigate the impact of model misspecification. We also assumed that death was the only source of dependent censoring.

To account for this, the illness-death model for interval-censored data can be extended as described in Barrett et al. Further simulations would be necessary to investigate the advantage of this model as compared with our simpler but more flexible semi-parametric illness-death model for interval-censored data. Similarly, we assumed in both simulations and application that visit times were independent of the illness-death process.

This independence assumption is reasonable in the PAQUID study where the visits are scheduled in advance independently of any outcome. Finally, we used a Markov illness-death model and thus assumed that mortality rates in diseased subjects depended only on age and exposures, and not on disease duration. It would be of interest to investigate the impact of the violation of such assumption in further simulation studies.

## Survival Analysis and Interpretation of Time-to-Event Data : Anesthesia & Analgesia

When the disease status may be misclassified, hidden Markov models could be estimated using the msm R package 27 which also handles interval censoring and allows estimation of covariate effects on all transition intensities. However, constant or piecewise constant transition intensities have to be assumed as opposed to the SmoothHazard R package that allows smooth non-parametric estimation of transition intensities, as well as parametric estimation assuming Weibull distributions.

Multiple imputation for the age at dementia has been proposed as an alternative to penalized likelihood. Fully non-parametric estimation has also been proposed, 16 but to our knowledge it does not allow the incorporation of covariates in the model. If the time-to-onset of disease is interval censored and the exposure is not associated with death, one can use cause-specific hazards models using midpoint imputation for the time-to-onset of disease for diagnosed subjects, and censoring at the last disease-free visit before death or before end of follow-up for undiagnosed subjects.

If the time-to-onset of disease is interval censored and the exposure may be associated with death, then the illness-death model for interval-censored data should be considered in order to account for the possibility of developing the disease between last visit and death. The SmoothHazard R package allows estimation of the illness-death model for interval-censored data, providing estimates of exposure effects on disease and death in both diseased and disease-free subjects, as well as parametric or smooth non-parametric estimates of transition intensities.

For instance, in medical studies, data become censored when the trial observation period is shorter than the time to event. Other reasons for censoring include loss to follow-up and death due to an unrelated cause.

If censored observations are not present in a sample, the Kaplan-Meier estimator is equivalent to obtaining an empirical survival distribution. Suppose a homogeneous population has the survival function S(t), which represents the probability that an individual will be alive at time t, and the event of interest is development of an oral lesion. Moreover, consider a sample of n individuals from this population and that survival times are subject to right-censoring i.

The Kaplan-Meier estimator would be defined as [2]. The Kaplan-Meier estimator is a nonincreasing step function that changes only at the time of an event. If the largest observed time is censored, the estimated curve never reaches zero, and a consequence of this is that the mean lifetime cannot be estimated. A solution for this problem is to assume that the survival function is zero after the largest time, although this obviously results in a biased estimate. Alternatively, a better solution is to consider the median survival time [27]. The median survival is the smallest time at which the survival probability drops to 0.
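As an illustration of the estimator discussed above, the following sketch computes the standard product-limit form, in which the survival estimate is multiplied by (1 - d/n) at each event time, with d events among n subjects still at risk. The function names and the toy data are ours, and the median uses the usual threshold of 0.5.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : observed times (event or censoring)
    events : 1 if the event was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)      # subjects leaving the risk set at each time
    n_at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:                     # the curve only steps down at event times
            surv *= 1 - d / n_at_risk
            curve.append((t, surv))
        n_at_risk -= leaving[t]   # events and censorings both leave the risk set
    return curve


def median_survival(curve, level=0.5):
    """Smallest time at which S(t) is at or below `level`, else None."""
    for t, s in curve:
        if s <= level:
            return t
    return None                   # curve never reaches the level


# Toy data: six subjects, two of them censored (event flag 0).
curve = kaplan_meier([2, 3, 3, 5, 7, 8], [1, 1, 0, 1, 0, 1])
median = median_survival(curve)
```

When `median_survival` returns `None`, the curve has not dropped far enough for the median to be estimated from the data.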

If the survival curve does not drop to 0. The mean survival time is estimated as the area under the survival curve in the interval 0 to t_max [31].

In many practical situations, lifetime data may be interval-censored. In these situations, the time until the event of interest is not observed exactly. In such cases, the only information available for each individual is that their event time falls within an interval, and the exact time is unknown. The most basic approach for analyzing interval-censored survival data is nonparametric estimation of the survival function. This approach does not require any modeling assumptions, and thus the estimated curves can be interpreted in a similar manner to Kaplan-Meier curves for right-censoring.

This is usually the first analysis that is performed for survival time with interval censoring, and it can be the basis for further parametric or semiparametric analyses.

Here, we present an analogous Product-Limit estimator of the survival function for interval-censored data. This estimator was suggested by [15]. However, it has no closed form and is based on an iterative procedure. Step 4: Compute the updated Product-Limit estimator (1) by using quantities found in Steps 2 and 3.

Currently, some statistical software programs provide tools for analyzing interval-censored failure time data.

For each random sample generated, the survival function was estimated according to the three different scenarios for calculating lifetimes (IC, UL, and MP).
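To give a feel for the kind of iterative procedure involved, here is a simplified self-consistency (EM-type) sketch in the spirit of Turnbull's estimator. It is not the exact step sequence of the source, whose Steps 1 to 3 are not reproduced here. It assumes each observation is an interval `(left, right]` with `left < right`, uses `math.inf` for right-censored observations, and places candidate probability mass on the observed right endpoints; all names are ours.

```python
import math


def turnbull(intervals, tol=1e-8, max_iter=10_000):
    """Self-consistency estimate of probability masses for interval-censored data.

    intervals : list of (left, right] pairs with left < right;
                right = math.inf marks a right-censored observation.
    Returns (support, masses).
    """
    support = sorted({r for _, r in intervals})   # candidate mass points
    # alpha[i][j] = 1 if support point j is compatible with observation i
    alpha = [[1.0 if l < s <= r else 0.0 for s in support] for l, r in intervals]
    n, m = len(intervals), len(support)
    p = [1.0 / m] * m                             # start from uniform masses
    for _ in range(max_iter):
        new_p = [0.0] * m
        for row in alpha:
            denom = sum(a * q for a, q in zip(row, p))
            for j in range(m):
                # share of observation i attributed to support point j
                new_p[j] += row[j] * p[j] / denom / n
        converged = max(abs(a - b) for a, b in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return support, p


def survival_at(t, support, p):
    """Estimated S(t): total mass located strictly after t."""
    return sum(q for s, q in zip(support, p) if s > t)


support, p = turnbull([(0, 1), (0, 1), (1, 2), (2, math.inf)])
```

With non-overlapping intervals, as in this toy call, the procedure settles after one update; with overlapping intervals the masses are redistributed over several iterations, which is why no closed form exists.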

The survival function estimates were then compared with the true survival function. The goal of this simulation study was to quantify the error in the traditionally applied approaches i. Accordingly, smaller values of MAE correspond to a better estimate of survival function. Overall, the IC approach produced the lowest mean MAE value for the larger sample sizes, as expected. It was also observed that increases in sample size were associated with decreases in the mean of MAE, regardless of the approach used. However, as the censoring percentage and sample size increased, the IC approach exhibited the lowest MAE mean value, regardless of hospital visit interval.

When the analysis is restricted to the UL and MP approaches, the MP approach generally presented lower mean values of MAE, and it can be seen as the better of the two approaches that ignore the interval-censored nature of the data. In general, the IC approach performed better in all scenarios.

It is also worth mentioning that the range of each hospital visit interval was found to contribute to the magnitude of these differences, with greater differences observed as the hospital visit ranges increased.

To show the applicability of the interval-censoring mechanism in real data sets, we consider two studies that were previously conducted at the A. Camargo Cancer Center. The data sets are characterized by different sample sizes and distinct survival curves. In addition, we quantified how the estimates from the UL and MP approaches differ from those of the IC approach, taken as the reference. A prospective study of oral lesion development in children younger than 18 years after liver transplantation was performed at the A. Camargo Cancer Center. Researchers believe that oral lesions are a side effect of the immunosuppressive medicines that are administered following liver transplantation. Oral exams and oral care were performed by stomatology specialists during follow-up appointments.

Patients were initially observed every 1—2 months. As their recovery progressed, the interval between visits lengthened. Time until lesion diagnosis was defined as the period between the date of transplantation and the time to first observation of an oral lesion. The mean interval between the last two follow-up exams was approximately 2 months, while the maximum observation time was days. Next, we consider an explanatory variable for this data set to illustrate the methods, and to assess whether it is important to explain the time until oral lesion development.

These survival functions were compared with the log-rank test for interval-censored data [37], and a significant difference between the survival curves was observed. The corresponding estimated survival curves show that the younger patients had a better survival rate than the older patients, regardless of the analysis approach. In addition, use of the UL method for the older patient group tended to overestimate the survival curve, whereas application of the MP approach to the younger patient group showed a trend towards under- or overestimating the survival curve.

Meanwhile, the UL approach did trend towards overestimating the survival curve. Corresponding differences are observed in the MAE as well, and this results in greater difference from IC estimates in the UL approach, as expected. The data set was obtained at the A. Camargo Cancer Center and it included female patients affected by recurrence of ovarian cancer. In particular, patients who were diagnosed with high-grade serous carcinoma between and were included in our data set, and follow-up exams were conducted until The event of interest is recurrence of ovarian cancer.

The exact time of the start of cancer recurrence was not observed, although it was known that it occurred between the diagnosis examination and the preceding examination. Time to recurrence was defined as the period between the date of surgery to remove the primary cancer and the diagnosis of recurrence. The mean interval between the two last exams was 6 months and the maximum observation time was approximately 8 years.

While these survival functions are in close proximity at some of the time points, in general, the UL approach led to a higher survival rate than the IC and MP approaches. For example, the estimated probability of recurrence for ovarian cancer patients beyond 24 months is 0. In this scenario, if the interest is the point estimate, it is evident that the UL approach leads to a higher survival rate compared to the interval-censored approach.

Estimated survival functions obtained with the Kaplan-Meier and Turnbull methods for the ovarian cancer recurrence data set.

Interval-censored data are often encountered in medical applications.

However, many researchers do not take into account this mechanism when analyzing data. This may be because traditional methodologies are easier to apply and are well-known. As observed in the examples presented in this paper, when the usual methods for survival analysis are applied inappropriately, authors should be cautious regarding their conclusions.

It is also worth mentioning that assuming that an event of interest occurs at the end or at the midpoint of each interval might lead to an overestimate of survival rates, especially when there is a large interval between the diagnosis examination and the preceding exam.

We hope the analyses we have presented will help researchers better understand the implications of applying traditional survival analysis methods versus adequate methods when analyzing interval-censored data.

Ten-year follow-up of ovarian cancer patients after second-look laparotomy with negative findings. Obstet Gynecol.

Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Am Stat Assoc.

Mantel N. Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemother Rep.

Cox DR. Regression models and life-tables. J R Stat Soc B.

Response to neoadjuvant therapy and long-term survival in patients with triple-negative breast cancer. J Clin Oncol.

Gefitinib versus cisplatin plus docetaxel in patients with non-small-cell lung cancer harbouring mutations of the epidermal growth factor receptor (WJTOG): an open label, randomised phase 3 trial. Lancet Oncol.