drop variables with p-value > 0.05
# get names of only significant variables
sigVars <- summary(cox_paper)$coefficients %>%
  as.data.frame() %>%
  rownames_to_column("vars") %>%
  filter(`Pr(>|z|)` < 0.05) %>%
  pull(vars)  # character vector of coefficient names
sigVars
## [1] "age" "gender2" "pneumonia1"
## [4] "metastatic_cancer1" "cog_imp1" "los"
## [7] "prior_dnas"
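Note that sigVars holds coefficient names (gender2, pneumonia1), not column names, so it can't go straight into a formula. A minimal sketch of mapping them back, assuming each coefficient name starts with its column name. The prefix match is my own heuristic (it would misfire if one column name were a prefix of another), and coef_names, col_names, sig_cols and f_sig are names made up for this sketch, with values copied from the outputs in these notes:

```r
# coefficient names as printed in sigVars
coef_names <- c("age", "gender2", "pneumonia1", "metastatic_cancer1",
                "cog_imp1", "los", "prior_dnas")

# predictor columns of g_paper, copied from its str() output
col_names <- c("age", "gender", "ethnicgroup", "ihd", "valvular_disease",
               "pvd", "stroke", "copd", "pneumonia", "hypertension",
               "renal_disease", "cancer", "metastatic_cancer",
               "mental_health", "cog_imp", "los", "prior_dnas")

# keep a column if any significant coefficient name starts with it
# (heuristic only, not a feature of the survival package)
is_sig <- vapply(col_names,
                 function(v) any(startsWith(coef_names, v)),
                 logical(1))
sig_cols <- col_names[is_sig]

# build the formula from text; Surv() is only resolved when coxph() runs
f_sig <- as.formula(paste("Surv(fu_time, death) ~",
                          paste(sig_cols, collapse = " + ")))
f_sig
```

f_sig could then be passed on as coxph(f_sig, data = g_paper).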
new cox model with significant variables only
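- somehow the course lists age, gender, valvular_disease, pneumonia, metastatic_cancer and cog_imp as significant
- clearly wrong since the output from the first cox model is the same… (sigVars above has los and prior_dnas but not valvular_disease)
- the model below follows the course's list anyway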
g_paper %>% str()
## 'data.frame': 1000 obs. of 19 variables:
## $ age : int 90 74 83 79 94 89 63 86 72 82 ...
## $ gender : Factor w/ 2 levels "1","2": 2 1 2 1 2 1 1 2 2 2 ...
## $ ethnicgroup : Factor w/ 5 levels "1","2","3","9",..: 5 1 1 1 1 5 1 1 1 1 ...
## $ ihd : Factor w/ 2 levels "0","1": 1 2 1 2 1 1 1 1 1 1 ...
## $ valvular_disease : Factor w/ 2 levels "0","1": 2 2 1 1 1 2 1 1 1 1 ...
## $ pvd : Factor w/ 2 levels "0","1": 1 1 1 1 1 2 1 1 1 1 ...
## $ stroke : Factor w/ 2 levels "0","1": 1 1 1 2 1 1 1 1 2 1 ...
## $ copd : Factor w/ 2 levels "0","1": 1 1 1 2 1 1 1 1 2 1 ...
## $ pneumonia : Factor w/ 2 levels "0","1": 1 2 1 1 1 2 1 1 1 1 ...
## $ hypertension : Factor w/ 2 levels "0","1": 1 2 2 2 2 1 2 2 1 1 ...
## $ renal_disease : Factor w/ 2 levels "0","1": 1 2 1 1 1 1 1 1 1 1 ...
## $ cancer : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ metastatic_cancer: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ mental_health : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ cog_imp : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ los : int 2 10 3 1 17 47 3 12 2 2 ...
## $ prior_dnas : int 0 1 0 2 0 0 0 1 0 1 ...
## $ fu_time : int 416 648 466 441 371 47 656 12 530 551 ...
## $ death : int 0 0 0 0 0 0 0 0 0 0 ...
cox_sigVars <- coxph(
  Surv(fu_time, death) ~ age + gender + valvular_disease + pneumonia +
    metastatic_cancer + cog_imp,
  data = g_paper
)
HR_sigVars <- summary(cox_sigVars)$coefficients %>%
  as.data.frame() %>%
  rownames_to_column("vars") %>%
  select(vars, HR_sigVars = `exp(coef)`)  # exp(coef) is the hazard ratio
HR_sigVars
## vars HR_sigVars
## 1 age 1.059630
## 2 gender2 0.755440
## 3 valvular_disease1 1.271553
## 4 pneumonia1 1.565284
## 5 metastatic_cancer1 12.208870
## 6 cog_imp1 1.430736
compare HR of original cox model and model with only significant vars
inner_join(HR_allVars, HR_sigVars, by = "vars")
## vars HR_allVars HR_sigVars
## 1 age 1.0600833 1.059630
## 2 gender2 0.8057446 0.755440
## 3 pneumonia1 1.3528890 1.565284
## 4 metastatic_cancer1 8.9778235 12.208870
## 5 cog_imp1 1.3873881 1.430736
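To put a number on how much each HR moves between the two models, a rough sketch of the relative change. The values are typed in from the joined table above, and hr and pct_change are names made up for this sketch:

```r
# HRs copied from the inner_join() output
hr <- data.frame(
  vars       = c("age", "gender2", "pneumonia1", "metastatic_cancer1", "cog_imp1"),
  HR_allVars = c(1.0600833, 0.8057446, 1.3528890, 8.9778235, 1.3873881),
  HR_sigVars = c(1.059630, 0.755440, 1.565284, 12.208870, 1.430736)
)

# relative change of the reduced-model HR vs the full-model HR, in percent
hr$pct_change <- 100 * (hr$HR_sigVars - hr$HR_allVars) / hr$HR_allVars

# sort so the largest absolute change comes first
hr[order(-abs(hr$pct_change)), ]
```

This is only a descriptive comparison of point estimates. It says nothing about whether the change is statistically meaningful.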
- only the metastatic_cancer HR changed dramatically (8.98 to 12.21)
- should we add variables back to try to get the metastatic_cancer HR back to 8.98?
- depends on how the results are going to be used
- if you only care about finding significant predictors, then the HR doesn’t really matter
- { the course notes then go on to look at cog_imp while talking about metastatic_cancer… }
- { skipped assumptions testing, no new info }