# Required packages
library(tidyverse)
library(survey)
library(survival)
library(gtsummary)
library(survminer)
Adjusted survival curves
This page is under development. Changes should be expected.
Background
Unadjusted Kaplan-Meier curves from observational data might be biased because of confounding (Cole and Hernán (2004)). This vignette introduces the concept of inverse probability (IP) weighted survival curves.
Example
We consider a study population of 500 patients who received a medication. 60% of the patients were women. In men, a specific enzyme was measured with a probability of 30%, whereas in women have a 5 times higher chance of having the enzyme. Those with the enzyme were more likely to receive the medication (with a probability of 75%), compared to those without the enzyme (probability of 50%).
- Research question: What is the effect of the medication on death?
- Study design: Cohort study
- Outcome of interest: Time to death or end of follow-up
- Predictor of interest: Medication
- Confounders: Sex, enzyme
The used variables from the data are:
Variable | Definition | Coding |
---|---|---|
female | Sex | 1=Women, 0=Men |
enzyme | Measured enzyme | 1=Present, 0=Not present |
medi | Medication | 1=Received, 0=Not received |
death | Death | 1=Death, 0=Alive |
fup | Follow-up time | Non-negative number |
Knowledge from the crystal ball
Because we simulated the data, we know that medication has no effect on mortality.
Analysis strategy
IP weighting constructs weights which are equal to the probability of an individual’s treatment (here: receiving the medication) given observed covariates (here: sex and enzyme) and creates pseudopopulations in which observed covariates are independent of treatment (i.e. no confounding) (Hernán and Robins (2022)).
Descriptive table
The table below shows a descriptive summary of the study population, by medication.
%>%
data tbl_summary(by = medi)
Characteristic | 0, N = 1721 | 1, N = 3281 |
---|---|---|
fup | 8 (3, 17) | 6 (3, 14) |
death | 65 (38%) | 156 (48%) |
enzyme | 64 (37%) | 199 (61%) |
female | 102 (59%) | 210 (64%) |
1 Median (IQR); n (%) |
Unadjusted Kaplan-Meier curve and Cox-modelling
A naive (unadjusted) survival analysis of the data reveals the following Kaplan-Meier plot. We conclude that the medication has an effect on survival.
<- survfit(Surv(fup, death) ~ medi, data = data)
mod ggsurvplot(mod, data = data, palette = c("#CC0000", "black"), censor = FALSE)
<- coxph(Surv(fup, death) ~ medi, data = data)
mod_cox_unadjusted mod_cox_unadjusted
Call:
coxph(formula = Surv(fup, death) ~ medi, data = data)
coef exp(coef) se(coef) z p
medi 0.3345 1.3973 0.1479 2.262 0.0237
Likelihood ratio test=5.34 on 1 df, p=0.0209
n= 500, number of events= 221
An unadjusted Cox proportional hazard model shows that patients with medication have 1.4 higher hazard of death compared to those without medication.
<- coxph(Surv(fup, death) ~ medi + enzyme + female, data = data)
mod_cox_adjusted mod_cox_adjusted
Call:
coxph(formula = Surv(fup, death) ~ medi + enzyme + female, data = data)
coef exp(coef) se(coef) z p
medi 0.01920 1.01938 0.14993 0.128 0.898
enzyme 2.12737 8.39280 0.20308 10.475 <2e-16
female -0.03787 0.96284 0.16500 -0.230 0.818
Likelihood ratio test=179.2 on 3 df, p=< 2.2e-16
n= 500, number of events= 221
What happens if we adjust for enzym and sex? Then the effect of the medication on death vanishes (hazard ratio=1.02).
IPW modelling
An IPW modelling approach constructs treatment weights (here medication) given known covariates (here sex and enzyme) using a logistic regression model.
# IPW denominator
<- glm(medi ~ female + enzyme, data = data, family = binomial())
mod
$ipw <- NA
data# Probabilty of treatment
$ipw <- predict(mod, data = data, type = "response")
data# Probabilty of non-treatment
$ipw[data$medi==0] <- 1 - predict(mod, data = data, type = "response")[data$medi == 0] data
We construct stabilized weights, since they can provide narrower confidence intervals (Hernán and Robins (2022)).
# Stabilized weights
<- glm(medi ~ 1, data = data, family = binomial())
mod0 $ipw0 <- predict(mod0, data = data, type = "response")
data$ipw0[data$medi == 0] <- 1 - predict(mod0, data = data, type = "response")[data$medi == 0]
data$ipw <- data$ipw0 / data$ipw data
An IPW adjusted Kaplan-Meier curve reveals that medication has no effect on survival:
# Set survey design
<- svydesign(id = ~1, weights = ~ipw, data = data)
svy_design
# IPW adjusted Kaplan-Meier
<- svykm(Surv(fup, death) ~ medi, design = svy_design)
km_fit
<- data.frame(time = km_fit$`1`$time, surv = km_fit$`1`$surv, strata = "medi=1")
km_df <- bind_rows(km_df, data.frame(time=km_fit$`0`$time, surv=km_fit$`0`$surv, strata = "medi=0"))
km_df ggsurvplot_df(km_df, palette = c("#CC0000", "black"), censor=FALSE)
<- svycoxph(Surv(fup, death) ~ medi, design = svy_design)
mod_cox_ipw_adjusted summary(mod_cox_ipw_adjusted)
Independent Sampling design (with replacement)
svydesign(id = ~1, weights = ~ipw, data = data)
Call:
svycoxph(formula = Surv(fup, death) ~ medi, design = svy_design)
n= 500, number of events= 221
coef exp(coef) se(coef) robust se z Pr(>|z|)
medi -0.04057 0.96024 0.14165 0.14255 -0.285 0.776
exp(coef) exp(-coef) lower .95 upper .95
medi 0.9602 1.041 0.7262 1.27
Concordance= 0.494 (se = 0.019 )
Likelihood ratio test= NA on 1 df, p=NA
Wald test = 0.08 on 1 df, p=0.8
Score (logrank) test = NA on 1 df, p=NA
(Note: the likelihood ratio and score tests assume independence of
observations within a cluster, the Wald and robust score tests do not).
This is confirmed by an IPW adjusted Cox regression model (hazard ratio=0.96).
Conclusion
Unadjusted Kaplan-Meier curves from observational data might be biased because of confounding. IPW adjusted survival curves account for confounding by constructing weights which are proportional to the probability of treatment given known covariates. An advantage of IPW adjusted Kaplan-Meier curves is that they provide marginal survival estimates, in contrast to stratified plots (Cole and Hernán (2004)).
Data simulation
set.seed(1)
<- 500
n
# 60% percent women
<- rbinom(n, 1, 0.6)
female # Men has the enzyme with a prob of 0.3; women have
# a 5 times higher chance of having the enzym
<- ifelse(runif(n) < plogis(qlogis(0.3) + log(5) * female), 1, 0)
enzyme # Those with enzyme receive medication with prob 0.75
<- ifelse(runif(n) < plogis(log(3) * enzyme), 1, 0)
medi # Hazard of dying: HR=10 of those with enzyme; No effect for medication
<- 0.01 * exp(log(10) * enzyme)
haz # Time to death
<- -log(runif(n)) / haz
time_to_death # Censoring time: Mean duration 20 days
<- rweibull(n, 1, 20)
censored # Follow-up time
<- pmin(time_to_death, censored)
fup # Death
<- ifelse(time_to_death <= censored, 1, 0)
death # Data
<- data.frame(fup, death, medi, enzyme, female) data