library(survival)
library(dplyr)   # for the %>% pipe
g <- read.csv("GMPH_1K_final_2.csv", header = TRUE, sep = ",")
g %>% dim()
g$fu_time %>% typeof()
g$death %>% typeof()
## [1] 1000 31
## [1] "integer"
## [1] "integer"
run a simple Cox model with age as the only predictor
cox <- coxph(Surv(fu_time, death) ~ age, data = g)
summary(cox)
## Call:
## coxph(formula = Surv(fu_time, death) ~ age, data = g)
##
## n= 1000, number of events= 492
##
## coef exp(coef) se(coef) z Pr(>|z|)
## age 0.056005 1.057602 0.005193 10.78 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## age 1.058 0.9455 1.047 1.068
##
## Concordance= 0.651 (se = 0.013 )
## Likelihood ratio test= 138 on 1 df, p=<2e-16
## Wald test = 116.3 on 1 df, p=<2e-16
## Score (logrank) test = 115.7 on 1 df, p=<2e-16
- coef = log hazard ratio
- exp(coef) = hazard ratio (HR)
- an HR of 1.06 means each 1-year increase in age is associated with a 6% increase in the hazard (risk) of death
- older people have a higher risk of death than younger people (as expected)
- age is fitted as a continuous variable with a linear effect on the log hazard, so the estimated 6% increase per year is assumed constant across all ages
- se is small (only 1 predictor and lots of patients)
- p-value is tiny (age is a highly statistically significant predictor of the risk of death)
- concordance
  - similar to the c-statistic in logistic regression
  - the fraction of comparable pairs of patients in which the patient who died earlier was the one the model assigned the higher risk (i.e. the model ordered the pair correctly)
  - values above 0.5 are better than chance
- R^2 (not shown here; recent versions of the survival package no longer print it)
  - the proportion of variation in the outcome explained by the model
  - with only 1 predictor (age), neither concordance nor R^2 is very high
- likelihood ratio test / Wald test / score (log-rank) test
  - three tests of the null hypothesis that all coefficients are zero
  - a small p-value means the null is rejected
  - i.e. the model with age in it is significantly better than the null model (see the code sketch below)
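A minimal sketch checking these quantities from the fitted `cox` object above (`concordance()` and `anova()` come from the survival package; exact printed values depend on your survival version):

# hazard ratio per 1-year and per 10-year increase in age
exp(coef(cox)["age"])        # ~1.06: 6% higher hazard per extra year
exp(10 * coef(cox)["age"])   # the constant per-year HR compounds over 10 years
exp(confint(cox))            # 95% CI for the hazard ratio
concordance(cox)             # same concordance statistic as printed by summary(cox)
anova(cox)                   # likelihood ratio test against the null model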
practice: ethnic groups
library(survminer)
g$ethnicgroup <- factor(g$ethnicgroup)
cox_ethnicity <- coxph(Surv(fu_time, death) ~ ethnicgroup, data = g)
summary(cox_ethnicity)
## Call:
## coxph(formula = Surv(fu_time, death) ~ ethnicgroup, data = g)
##
## n= 957, number of events= 471
## (43 observations deleted due to missingness)
##
## coef exp(coef) se(coef) z Pr(>|z|)
## ethnicgroup2 -0.06428 0.93774 0.32000 -0.201 0.84078
## ethnicgroup3 -1.19586 0.30244 0.41108 -2.909 0.00362 **
## ethnicgroup9 0.07394 1.07674 0.35706 0.207 0.83596
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## ethnicgroup2 0.9377 1.0664 0.5008 1.7558
## ethnicgroup3 0.3024 3.3064 0.1351 0.6769
## ethnicgroup9 1.0767 0.9287 0.5348 2.1679
##
## Concordance= 0.516 (se = 0.006 )
## Likelihood ratio test= 12.99 on 3 df, p=0.005
## Wald test = 8.55 on 3 df, p=0.04
## Score (logrank) test = 9.61 on 3 df, p=0.02
- large standard errors and wide confidence intervals for the ethnic group coefficients
- to interpret the hazard ratios you need to know the reference category, which here is ethnicgroup 1 (white); see the relevel sketch after this list
- group 2 (black): hazard not significantly different from the reference
- group 3 (Indian): hazard significantly lower than the reference (HR 0.30)
- group 9: not significantly different
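A minimal sketch of changing the reference category; `g2` is a hypothetical copy so the later code, which keeps group 1 as the reference, is unaffected:

g2 <- g                                                  # hypothetical copy; g is left untouched
g2$ethnicgroup <- relevel(g2$ethnicgroup, ref = "3")     # make group 3 the reference level
coxph(Surv(fu_time, death) ~ ethnicgroup, data = g2)     # HRs are now relative to group 3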
43 patients had a missing ethnicity
- make an "unknown" ethnicity group (coded 8) and rerun
g$ethnicgroup %>% levels()
## [1] "1" "2" "3" "9"
levels(g$ethnicgroup) <- c(levels(g$ethnicgroup), "8")   # add a new level "8" for unknown ethnicity
g$ethnicgroup[is.na(g$ethnicgroup)] <- "8"               # recode the missing values to "8"
cox_ethnicity_noNAs <- coxph(Surv(fu_time, death) ~ ethnicgroup, data = g)
summary(cox_ethnicity_noNAs)
## Call:
## coxph(formula = Surv(fu_time, death) ~ ethnicgroup, data = g)
##
## n= 1000, number of events= 492
##
## coef exp(coef) se(coef) z Pr(>|z|)
## ethnicgroup2 -0.06573 0.93638 0.31999 -0.205 0.83725
## ethnicgroup3 -1.19368 0.30310 0.41107 -2.904 0.00369 **
## ethnicgroup9 0.08160 1.08502 0.35706 0.229 0.81923
## ethnicgroup8 -0.02353 0.97675 0.22363 -0.105 0.91621
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## ethnicgroup2 0.9364 1.0679 0.5001 1.7532
## ethnicgroup3 0.3031 3.2992 0.1354 0.6784
## ethnicgroup9 1.0850 0.9216 0.5389 2.1846
## ethnicgroup8 0.9767 1.0238 0.6301 1.5140
##
## Concordance= 0.518 (se = 0.008 )
## Likelihood ratio test= 12.95 on 4 df, p=0.01
## Wald test = 8.53 on 4 df, p=0.07
## Score (logrank) test = 9.58 on 4 df, p=0.05
- including the missing ethnicities as their own group did not change the results by much
- but with multiple predictors, missing values accumulate across variables, and dropping them can noticeably affect the results
- better to recode the NAs (as above) or impute them, and compare results before and after; a minimal comparison sketch follows
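A minimal sketch of that before/after comparison using the model fitted above; `cc` and `cox_cc` are hypothetical names for a complete-case refit:

cc <- subset(g, ethnicgroup != "8")            # hypothetical complete-case subset (drops the 43 recoded patients)
cc$ethnicgroup <- droplevels(cc$ethnicgroup)   # drop the now-empty "8" level before refitting
cox_cc <- coxph(Surv(fu_time, death) ~ ethnicgroup, data = cc)
# side-by-side log hazard ratios for the coefficients the two fits share
cbind(complete_case = coef(cox_cc),
      with_unknown  = coef(cox_ethnicity_noNAs)[names(coef(cox_cc))])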