Installation

The risks package can be installed from CRAN:

install.packages("risks")

Development versions can be installed from GitHub:

remotes::install_github("stopsack/risks")

An example cohort study

We use a cohort of women with breast cancer, as used by Spiegelman and Hertzmark (Am J Epidemiol 2005) and Greenland (Am J Epidemiol 2004). The the categorical exposure is stage, the binary outcome is death, and the binary confounder is receptor.

library(risks)  # provides riskratio(), riskdiff(), postestimation functions
library(dplyr)  # For data handling
library(broom)  # For tidy() model summaries

data(breastcancer)  # Load sample data

breastcancer %>%  # Display the sample data
  group_by(receptor, stage) %>% 
  summarize(
    deaths = sum(death), 
    total = n(), 
    risk = deaths / total
  )
#> # A tibble: 6 × 5
#> # Groups:   receptor [2]
#>   receptor stage     deaths total   risk
#>   <chr>    <fct>      <dbl> <int>  <dbl>
#> 1 High     Stage I        5    55 0.0909
#> 2 High     Stage II      17    74 0.230 
#> 3 High     Stage III      9    15 0.6   
#> 4 Low      Stage I        2    12 0.167 
#> 5 Low      Stage II       9    22 0.409 
#> 6 Low      Stage III     12    14 0.857

Adjusted risk ratios

The risk of death is higher among women with higher-stage and hormone receptor-low cancers, which also tend to be of higher stage. Using risks models to obtain (possibly multivariable-adjusted) risk ratios or risk differences is similar to the standard code for logistic models in R. As customary in R, log(RR) is returned; see below for how to obtain exponentiated values. No options for model family or link need to be supplied:

fit_rr <- riskratio(
  formula = death ~ stage + receptor, 
  data = breastcancer
)
summary(fit_rr)
#> 
#> Risk ratio model, fitted via marginal standardization of a logistic model with delta method (margstd_delta).
#> Call:
#> stats::glm(formula = death ~ stage + receptor, family = binomial(link = "logit"), 
#>     data = breastcancer, start = "(no starting values)")
#> 
#> Coefficients: (3 not defined because of singularities)
#>                Estimate Std. Error z value Pr(>|z|)    
#> stageStage I     0.0000     0.0000     NaN      NaN    
#> stageStage II    0.8989     0.3875   2.320   0.0203 *  
#> stageStage III   1.8087     0.3783   4.781 1.75e-06 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 228.15  on 191  degrees of freedom
#> Residual deviance: 185.88  on 188  degrees of freedom
#> AIC: 193.88
#> 
#> Number of Fisher Scoring iterations: 4
#> 
#> Confidence intervals for coefficients: (delta method)
#>                    2.5 %   97.5 %
#> stageStage I   0.0000000 0.000000
#> stageStage II  0.1395299 1.658324
#> stageStage III 1.0671711 2.550242

Adjusted risk differences

To obtain risk differences, use riskdiff, which has the same syntax:

fit_rd <- riskdiff(
  formula = death ~ stage + receptor, 
  data = breastcancer
)
summary(fit_rd)
#> 
#> Risk difference model, fitted via marginal standardization of a logistic model with delta method (margstd_delta).
#> Call:
#> stats::glm(formula = death ~ stage + receptor, family = binomial(link = "logit"), 
#>     data = breastcancer, start = "(no starting values)")
#> 
#> Coefficients: (3 not defined because of singularities)
#>                Estimate Std. Error z value Pr(>|z|)    
#> stageStage I    0.00000    0.00000     NaN      NaN    
#> stageStage II   0.16303    0.05964   2.734  0.00626 ** 
#> stageStage III  0.57097    0.09962   5.732 9.95e-09 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 228.15  on 191  degrees of freedom
#> Residual deviance: 185.88  on 188  degrees of freedom
#> AIC: 193.88
#> 
#> Number of Fisher Scoring iterations: 4
#> 
#> Confidence intervals for coefficients: (delta method)
#>                     2.5 %    97.5 %
#> stageStage I   0.00000000 0.0000000
#> stageStage II  0.04614515 0.2799187
#> stageStage III 0.37571719 0.7662158

For example, the risk of death was 57 percentage points higher in women with stage III breast cancer compared to stage I (95% confidence interval, 38 to 77 percentage points), adjusting for hormone receptor status.

The model summary in riskratio() and riskdiff() includes to two additions compared to a regular glm() model:

In the first line of summary(), the type of risk regression model is displayed (in the example, “Risk ratio model, fitted as binomial model...”).
At the end of the output, confidence intervals for the model coefficients are printed.

Accessing model coefficients

The risks package provides an interface to broom::tidy(), which returns a data frame of all coefficients (risk differences in this example), their standard errors, z statistics, and confidence intervals.

tidy(fit_rd)
#> # A tibble: 3 × 8
#>   term           estimate std.error statistic   p.value conf.low conf.high model
#>   <chr>             <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl> <chr>
#> 1 stageStage I      0        0         NaN    NaN         0          0     marg…
#> 2 stageStage II     0.163    0.0596      2.73   6.26e-3   0.0461     0.280 marg…
#> 3 stageStage III    0.571    0.0996      5.73   9.95e-9   0.376      0.766 marg…

In accordance with glm() standards, coefficients for relative risks are shown on the logarithmic scale. Exponentiated coefficients for risk ratios are easily obtained via tidy(..., exponentiate = TRUE):

tidy(
  fit_rr, 
  exponentiate = TRUE
)
#> # A tibble: 3 × 8
#>   term           estimate std.error statistic   p.value conf.low conf.high model
#>   <chr>             <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl> <chr>
#> 1 stageStage I       1        0        NaN    NaN           1         1    marg…
#> 2 stageStage II      2.46     0.387      2.32   2.03e-2     1.15      5.25 marg…
#> 3 stageStage III     6.10     0.378      4.78   1.75e-6     2.91     12.8  marg…

For example, the risk of death was 6 times higher in women with stage III breast cancer compared to stage I (95% confidence interval, 3 to 13 times), adjusting for hormone receptor status.

Introduction to the risks package: Get Started

Installation

An example cohort study

Adjusted risk ratios

Adjusted risk differences

Accessing model coefficients

More post-estimation commands