get data

library(survival)
library(car)
g <- read.csv("GMPH_1K_final_2.csv", header = TRUE, sep = ",")
g %>% dim()
## [1] 1000   31
# g %>% head()
g$gender <- as.factor(g$gender)
g$copd <- as.factor(g$copd)
g$ethnicgroup <- as.factor(g$ethnicgroup)
levels(g$ethnicgroup) <- c(levels(g$ethnicgroup), "8")
g$ethnicgroup[is.na(g$ethnicgroup)] <- "8"
table(g$ethnicgroup, exclude = NULL)
## 
##   1   2   3   9   8 
## 889  17  34  17  43
# set quintile 0 to NAs, coxph will ignore these patients
g <- g %>% mutate(quintile = recode(quintile, "c('0')=NA"))
table(g$quintile, exclude = NULL)
## 
##    1    2    3    4    5 <NA> 
##  138  205  211  220  216   10
g$quintile <- as.factor(g$quintile)

what does NON-PH mean?

  • example: gender
    • if relation btw male vs female risk of death changes over time,
      • it could mean that males have higher risk of death early on,
      • then relation changes, and females have higher risk of death later on
    • there is interaction btw gender and time
      • { can you have interaction btw variable and OUTCOME??? }
    • all you need to do is to add an interaction term btw gender and time {…}
      • { need to find a dataset with actual interaction,
      • and plot out KM curves with and without interaction and see for myself… }
  • testing if interaction term is statistically significant is another way of testing PH assumption
    • if interaction term not stat sig, then PH assumption is valid
    • should get p-value and plots
  • for mathematically reasons, cannot just include follow-up time itself,
    • need to transform it with tt() time-transform function
# model with just gender
cox <- coxph(Surv(fu_time, death) ~ gender, data = g)
summary(cox)
## Call:
## coxph(formula = Surv(fu_time, death) ~ gender, data = g)
## 
##   n= 1000, number of events= 492 
## 
##            coef exp(coef) se(coef)     z Pr(>|z|)
## gender2 0.02571   1.02604  0.09082 0.283    0.777
## 
##         exp(coef) exp(-coef) lower .95 upper .95
## gender2     1.026     0.9746    0.8587     1.226
## 
## Concordance= 0.497  (se = 0.013 )
## Likelihood ratio test= 0.08  on 1 df,   p=0.8
## Wald test            = 0.08  on 1 df,   p=0.8
## Score (logrank) test = 0.08  on 1 df,   p=0.8
# model with gender and gender * fu_time
cox_int <- coxph(Surv(fu_time, death) ~ gender + tt(gender), data = g)
summary(cox_int)
## Call:
## coxph(formula = Surv(fu_time, death) ~ gender + tt(gender), data = g)
## 
##   n= 1000, number of events= 492 
## 
##               coef exp(coef) se(coef)      z Pr(>|z|)
## gender2     0.6405    1.8974   0.9800  0.654    0.513
## tt(gender) -0.2003    0.8185   0.3182 -0.629    0.529
## 
##            exp(coef) exp(-coef) lower .95 upper .95
## gender2       1.8974      0.527    0.2779    12.953
## tt(gender)    0.8185      1.222    0.4387     1.527
## 
## Concordance= 0.497  (se = 0.013 )
## Likelihood ratio test= 0.49  on 2 df,   p=0.8
## Wald test            = 0.48  on 2 df,   p=0.8
## Score (logrank) test = 0.48  on 2 df,   p=0.8
  • p-value of 0.529 for tt(gender) is not statistically significant, good news,
  • PH assumption is valid, no interaction between variable and time
  • i.e. time does NOT affect gender’s impact on risk of death

scenario 2

  • if p-value is low, but hazards looks proportional for most the the follow-up period
  • can divide survival analysis into 2 time periods
    • fit one model without interaction for when PH is valid
    • fit another model with interaction for when PH is invalid

scenario 3

  • stratify by variable
  • e.g. separate models for male and females
  • disadvantage = can no longer assess effect of gender on mortality

EOF