week1_3plotKM_survdiff

what is heart failure

causes include: coronary heart disease, high blood pressure, arrythmia, alcoholism
symptoms: shortness of breath, tiredness, swollen ankles
high mortality and morbidity
high rate of hospital admission
readmission high, within 1-4 days of discharge
hospital admin database gives us data to work with

library(survival)

run survfit() and KMplot

# read in data
g <- read.csv("GMPH_1K_final_2.csv", header = TRUE, sep = ",")
g %>% dim()

# set variable types
g[,"fu_time"] %>% typeof()
fu_time <- g[,"fu_time"]; fu_time %>% head(20)
g[,"death"] %>% typeof()
death <- g[,"death"]; death %>% head(20)

## [1] 1000   31
## [1] "integer"
##  [1]  416  648  466  441  371   47  656   12  530  551 1059   90  538  289  334
## [16]    5   79  363  195   27
## [1] "integer"
##  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0

# survfit
km_fit <- survfit(Surv(fu_time, death) ~ 1) 
# ~ 1 plots KM estimated probability of survival over time for ALL patients, no comparisons made
plot(km_fit)

interpret the function

survfit()
- fits simple survival model,
- no predictor given, model has only the intercept (i.e. ~ 1)
Surv()
- 2 arguments: follow-up time of each patient + if they died
- death = 1, people who died by the end of their follow-up period
- death = 0, people still alive by end of their follow-up period, censored
survfit() produces KM estimates of “the probabilty of survival over time”,
- used by plot to produce KM curve

table of survival estimates

summary(km_fit, times = c(1:7, 30, 60, 90*(1:10)))

## Call: survfit(formula = Surv(fu_time, death) ~ 1)
## 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     1    992      12    0.988 0.00346        0.981        0.995
##     2    973       7    0.981 0.00435        0.972        0.989
##     3    963       5    0.976 0.00489        0.966        0.985
##     4    954       6    0.970 0.00546        0.959        0.980
##     5    945       5    0.964 0.00590        0.953        0.976
##     6    938       1    0.963 0.00598        0.952        0.975
##     7    933       1    0.962 0.00606        0.951        0.974
##    30    865      39    0.921 0.00865        0.905        0.939
##    60    809      28    0.891 0.01010        0.871        0.911
##    90    770      24    0.864 0.01117        0.843        0.887
##   180    698      43    0.815 0.01282        0.790        0.841
##   270    653      24    0.787 0.01363        0.760        0.814
##   360    619      21    0.761 0.01428        0.733        0.789
##   450    525      44    0.705 0.01554        0.675        0.736
##   540    429      47    0.639 0.01681        0.607        0.673
##   630    362      32    0.589 0.01765        0.556        0.625
##   720    266      43    0.514 0.01876        0.479        0.552
##   810    190      31    0.448 0.01979        0.411        0.488
##   900    126      26    0.378 0.02098        0.339        0.421

t=1, survival probability is high (99%)
t=900, survival probability is low (38%)
{ longer you stay in hospital the lower your survival prob?
or is this just natural decrease in surv prob over time??? }
see week1_4 for calculating survfit() output by hand

run survfit() with groups

g[,"gender"] %>% typeof()

## [1] "integer"

gender <- as.factor(g[,"gender"])
km_gender_fit <- survfit(Surv(fu_time, death) ~ gender)
plot(km_gender_fit)

compare survival using logRankTest

survdiff(Surv(fu_time, death) ~ gender, rho = 0)

## Call:
## survdiff(formula = Surv(fu_time, death) ~ gender, rho = 0)
## 
##            N Observed Expected (O-E)^2/E (O-E)^2/V
## gender=1 548      268      271    0.0365     0.082
## gender=2 452      224      221    0.0448     0.082
## 
##  Chisq= 0.1  on 1 degrees of freedom, p= 0.8

rho = 0 is default
yields log-rank (aka Mantel-Haenszel) test
log-rank test does NOT have proportional hazard assumption
{ is log-rank test non-parametric? }
Observed = weighted observed number of events in each group
Expected = weighted expected number of events in each group

# explore log-rank test
# ?survdiff
# tests if there's a difference btw 2 or more survival curves,
# using rho family of tests

g %>% filter(gender == 1) %>% summarise(., mean(fu_time)) # 460.8942

##   mean(fu_time)
## 1      460.8942

g %>% filter(gender == 2) %>% summarise(., mean(fu_time)) # 472.0841

##   mean(fu_time)
## 1      472.0841

log-rank test details

https://en.wikipedia.org/wiki/Logrank_test
compares survival distributions of 2 samples
nonparametric
good when data is right skewed and censored
logrank test statistic
- compares estimates of hazard function of 2 groups at EACH time point
- constructed by computing O and E number of events in one group at each time point,
  - then sum to obtain overall summary across all-time points where there was an event
- more details needed…
null hypothesis
- 2 groups have identical hazard functions (survival curves)
- H0: h_1(t) = h_2(t)
assumptions
- censoring is unrelated to hazard
- survival probability same regardless of when subject was recruited (early vs later in study)

exercise: compare survival of age groups

age_65+ vs otherwise

age_65plus <- ifelse(g[,"age"] >= 65, 1, 0)
table(age_65plus)

## age_65plus
##   0   1 
## 115 885

# table(g[, "age"], age_65plus)

km_age65plus_fit <- survfit(Surv(fu_time, death) ~ age_65plus)
plot(km_age65plus_fit)

survdiff(Surv(fu_time, death) ~ age_65plus, rho=0)

## Call:
## survdiff(formula = Surv(fu_time, death) ~ age_65plus, rho = 0)
## 
##                N Observed Expected (O-E)^2/E (O-E)^2/V
## age_65plus=0 115       18       67     35.85      41.7
## age_65plus=1 885      474      425      5.65      41.7
## 
##  Chisq= 41.7  on 1 degrees of freedom, p= 1e-10

week1_3plotKM_survdiff_logRankTest

Hippo

2020_05_25_12

what is heart failure

run survfit() and KMplot

interpret the function

table of survival estimates

run survfit() with groups

compare survival using logRankTest

log-rank test details

exercise: compare survival of age groups

EOF