Running MSGene

Sarah Urbut

2023-11-15

Loading and installing MSGene

devtools::install_github("surbut/MSGene")
#> Skipping install of 'MSGene' from a github remote, the SHA1 (8f7fb80e) has not changed since last install.
#>   Use `force = TRUE` to force installation
require(MSGene)
#> Loading required package: MSGene
require(data.table)
#> Loading required package: data.table
require(tidyverse)
#> Loading required package: tidyverse
#> ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
#> ✔ dplyr     1.1.3     ✔ readr     2.1.4
#> ✔ forcats   1.0.0     ✔ stringr   1.5.0
#> ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
#> ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
#> ✔ purrr     1.0.1     
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::between()     masks data.table::between()
#> ✖ dplyr::filter()      masks stats::filter()
#> ✖ dplyr::first()       masks data.table::first()
#> ✖ lubridate::hour()    masks data.table::hour()
#> ✖ lubridate::isoweek() masks data.table::isoweek()
#> ✖ dplyr::lag()         masks stats::lag()
#> ✖ dplyr::last()        masks data.table::last()
#> ✖ lubridate::mday()    masks data.table::mday()
#> ✖ lubridate::minute()  masks data.table::minute()
#> ✖ lubridate::month()   masks data.table::month()
#> ✖ lubridate::quarter() masks data.table::quarter()
#> ✖ lubridate::second()  masks data.table::second()
#> ✖ purrr::transpose()   masks data.table::transpose()
#> ✖ lubridate::wday()    masks data.table::wday()
#> ✖ lubridate::week()    masks data.table::week()
#> ✖ lubridate::yday()    masks data.table::yday()
#> ✖ lubridate::year()    masks data.table::year()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Vignette Info

First, we load our sample data frame. You will need to request a data frame with the following columns. We cannot share our UKB data but you can assembley your own variables using the ukbpheno packages

#>   identifier   cad.prs f.31.0.0 Cad_0_Any Dm_0_Any Ht_0_Any Death_censor_Any
#> 1          1 -0.232384        1         1        1        1                1
#> 2          2  0.419198        1         1        1        1                1
#> 3          3  0.640905        0         1        1        1                1
#> 4          4 -0.559320        1         1        1        1                2
#> 5          5 -0.373879        0         1        1        2                1
#> 6          6  1.131000        0         1        1        1                1
#>   HyperLip_0_Any Cad_0_censor_age Dm_0_censor_age Ht_0_censor_age
#> 1              2         81.84580        81.98345        81.63868
#> 2              1         56.33555        56.07028        56.20252
#> 3              1         81.41553        81.49883        83.22884
#> 4              1         76.85842        76.35119        76.03445
#> 5              1         65.50503        65.67987        60.79963
#> 6              2         60.73703        62.16309        61.55569
#>   Death_Censor_Age HyperLip_0_censor_age statin antihtn    statin_age htn_age
#> 1         82.54210              67.59576      1       0 69.35797 days NA days
#> 2         59.49485              56.12827      0       0       NA days NA days
#> 3         82.80712              81.81767      0       0       NA days NA days
#> 4         75.51662              77.21757      0       0       NA days NA days
#> 5         65.29198              67.33691      0       0       NA days NA days
#> 6         59.14340              61.65276      0       0       NA days NA days
#>   smoke
#> 1     0
#> 2     1
#> 3     0
#> 4     0
#> 5     0
#> 6     0

Set a training set

train=df2[1:(nrow(df2)*0.80),]

Parameter choice

ages=c(40:80)
nstates=c("Health", "Ht","HyperLip","Dm","Cad","death","Ht&HyperLip","HyperLip&Dm","Ht&Dm","Ht&HyperLip&Dm")

Fit our model

Now we’re ready to fit our model! Sit tight, this step takes between 60-90 seconds … but it’s worht it.


modelfit = fitfunc2(
  df_frame=data.table::data.table(train),
  ages = ages,
  nstates = nstates,
  mode = "binomial",
  covariates = "cad.prs+f.31.0.0")

Look at all that we’ve produced!

attributes(modelfit)
#> $names
#> [1] "events"       "rates"        "AR"           "model_list"   "yearsinstate"

You can see in this array we have the number of events, the absolute rate per year per transition, the number at risk per year and per transition, and the list of models over years and states.

Extract coefficients for health–>CAD transition

To compute our risk calculations, we need to extract the unsmoothed coefficients, we will feed through a smoother, and then use these coefficients to compute our product.

a = coefplotsmooth2(
    ages = ages,
    start = nstates[1],
    stop = "Cad",
    modelfit = modelfit,
    window_width = 20,
    span = 0.75,
    degree = 2
  )
#> Loading required package: reshape2
#> 
#> Attaching package: 'reshape2'
#> The following object is masked from 'package:tidyr':
#> 
#>     smiths
#> The following objects are masked from 'package:data.table':
#> 
#>     dcast, melt
healthy_coefs = a$custom_smooth

Check dimensions

## create sample matrix to test
intercept=1
sex=c(0,1)
cad.prs=c(1)
atrisk=expand.grid(intercept,cad.prs,sex) 


dim(atrisk)
#> [1] 2 3
dim(healthy_coefs)
#> [1] 41  3

Compute matrix interval risk for covariate profile two:

mso = compute_prediction_product_matrix(
coefmat = healthy_coefs,
atrisk = atrisk,
agepredinterval = c(40:80)
)
mso$PredictedIntervalrisk[2]
#> [1] 0.2523112

Now we compute the complement of the product of the annual surivivals.

mso$PredictedIntervalrisk
#> [1] 0.1066703 0.2523112

Now we compute the complement of the product of the annual surivivals augmented by treatment.

mso$Hazard_treated
#> [1] 0.08625041 0.20737248

You can also make cool plots examining the proportional occupancy of each state: