get data
library(survival)
library(car)
g <- read.csv("GMPH_1K_final_2.csv", header = TRUE, sep = ",")
g %>% dim()
## [1] 1000 31
# g %>% head()
g$gender <- as.factor(g$gender)
g$copd <- as.factor(g$copd)
g$ethnicgroup <- as.factor(g$ethnicgroup)
levels(g$ethnicgroup) <- c(levels(g$ethnicgroup), "8")
g$ethnicgroup[is.na(g$ethnicgroup)] <- "8"
table(g$ethnicgroup, exclude = NULL)
##
## 1 2 3 9 8
## 889 17 34 17 43
# set quintile 0 to NAs, coxph will ignore these patients
g <- g %>% mutate(quintile = recode(quintile, "c('0')=NA"))
table(g$quintile, exclude = NULL)
##
## 1 2 3 4 5 <NA>
## 138 205 211 220 216 10
g$quintile <- as.factor(g$quintile)
what does NON-PH mean?
- example: gender
- if relation btw male vs female risk of death changes over time,
- it could mean that males have higher risk of death early on,
- then relation changes, and females have higher risk of death later on
- there is interaction btw gender and time
- { can you have interaction btw variable and OUTCOME??? }
- all you need to do is to add an interaction term btw gender and time {…}
- { need to find a dataset with actual interaction,
- and plot out KM curves with and without interaction and see for myself… }
- testing if interaction term is statistically significant is another way of testing PH assumption
- if interaction term not stat sig, then PH assumption is valid
- should get p-value and plots
- for mathematically reasons, cannot just include follow-up time itself,
- need to transform it with tt() time-transform function
# model with just gender
cox <- coxph(Surv(fu_time, death) ~ gender, data = g)
summary(cox)
## Call:
## coxph(formula = Surv(fu_time, death) ~ gender, data = g)
##
## n= 1000, number of events= 492
##
## coef exp(coef) se(coef) z Pr(>|z|)
## gender2 0.02571 1.02604 0.09082 0.283 0.777
##
## exp(coef) exp(-coef) lower .95 upper .95
## gender2 1.026 0.9746 0.8587 1.226
##
## Concordance= 0.497 (se = 0.013 )
## Likelihood ratio test= 0.08 on 1 df, p=0.8
## Wald test = 0.08 on 1 df, p=0.8
## Score (logrank) test = 0.08 on 1 df, p=0.8
# model with gender and gender * fu_time
cox_int <- coxph(Surv(fu_time, death) ~ gender + tt(gender), data = g)
summary(cox_int)
## Call:
## coxph(formula = Surv(fu_time, death) ~ gender + tt(gender), data = g)
##
## n= 1000, number of events= 492
##
## coef exp(coef) se(coef) z Pr(>|z|)
## gender2 0.6405 1.8974 0.9800 0.654 0.513
## tt(gender) -0.2003 0.8185 0.3182 -0.629 0.529
##
## exp(coef) exp(-coef) lower .95 upper .95
## gender2 1.8974 0.527 0.2779 12.953
## tt(gender) 0.8185 1.222 0.4387 1.527
##
## Concordance= 0.497 (se = 0.013 )
## Likelihood ratio test= 0.49 on 2 df, p=0.8
## Wald test = 0.48 on 2 df, p=0.8
## Score (logrank) test = 0.48 on 2 df, p=0.8
- p-value of 0.529 for tt(gender) is not statistically significant, good news,
- PH assumption is valid, no interaction between variable and time
- i.e. time does NOT affect gender’s impact on risk of death
scenario 2
- if p-value is low, but hazards looks proportional for most the the follow-up period
- can divide survival analysis into 2 time periods
- fit one model without interaction for when PH is valid
- fit another model with interaction for when PH is invalid
scenario 3
- stratify by variable
- e.g. separate models for male and females
- disadvantage = can no longer assess effect of gender on mortality
EOF