The risks package can be installed from CRAN:
Development versions can be installed from GitHub:
We use a cohort of women with breast cancer, as used by Spiegelman
and Hertzmark (Am J
Epidemiol 2005) and Greenland (Am J Epidemiol
2004). The the categorical exposure is stage, the
binary outcome is death, and the binary confounder is
receptor.
library(risks)  # provides riskratio(), riskdiff(), postestimation functions
library(dplyr)  # For data handling
library(broom)  # For tidy() model summaries
data(breastcancer)  # Load sample data
breastcancer %>%  # Display the sample data
  group_by(receptor, stage) %>% 
  summarize(
    deaths = sum(death), 
    total = n(), 
    risk = deaths / total
  )
#> # A tibble: 6 × 5
#> # Groups:   receptor [2]
#>   receptor stage     deaths total   risk
#>   <chr>    <fct>      <dbl> <int>  <dbl>
#> 1 High     Stage I        5    55 0.0909
#> 2 High     Stage II      17    74 0.230 
#> 3 High     Stage III      9    15 0.6   
#> 4 Low      Stage I        2    12 0.167 
#> 5 Low      Stage II       9    22 0.409 
#> 6 Low      Stage III     12    14 0.857The risk of death is higher among women with higher-stage and hormone
receptor-low cancers, which also tend to be of higher stage. Using
risks models to obtain (possibly multivariable-adjusted)
risk ratios or risk differences is similar to the standard code for
logistic models in R. As customary in R, log(RR) is returned; see below
for how to obtain exponentiated values. No options for model
family or link need to be supplied:
fit_rr <- riskratio(
  formula = death ~ stage + receptor, 
  data = breastcancer
)
summary(fit_rr)
#> 
#> Risk ratio model, fitted via marginal standardization of a logistic model with delta method (margstd_delta).
#> Call:
#> stats::glm(formula = death ~ stage + receptor, family = binomial(link = "logit"), 
#>     data = breastcancer, start = "(no starting values)")
#> 
#> Coefficients: (3 not defined because of singularities)
#>                Estimate Std. Error z value Pr(>|z|)    
#> stageStage I     0.0000     0.0000     NaN      NaN    
#> stageStage II    0.8989     0.3875   2.320   0.0203 *  
#> stageStage III   1.8087     0.3783   4.781 1.75e-06 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 228.15  on 191  degrees of freedom
#> Residual deviance: 185.88  on 188  degrees of freedom
#> AIC: 193.88
#> 
#> Number of Fisher Scoring iterations: 4
#> 
#> Confidence intervals for coefficients: (delta method)
#>                    2.5 %   97.5 %
#> stageStage I   0.0000000 0.000000
#> stageStage II  0.1395299 1.658324
#> stageStage III 1.0671711 2.550242To obtain risk differences, use riskdiff, which has the
same syntax:
fit_rd <- riskdiff(
  formula = death ~ stage + receptor, 
  data = breastcancer
)
summary(fit_rd)
#> 
#> Risk difference model, fitted via marginal standardization of a logistic model with delta method (margstd_delta).
#> Call:
#> stats::glm(formula = death ~ stage + receptor, family = binomial(link = "logit"), 
#>     data = breastcancer, start = "(no starting values)")
#> 
#> Coefficients: (3 not defined because of singularities)
#>                Estimate Std. Error z value Pr(>|z|)    
#> stageStage I    0.00000    0.00000     NaN      NaN    
#> stageStage II   0.16303    0.05964   2.734  0.00626 ** 
#> stageStage III  0.57097    0.09962   5.732 9.95e-09 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 228.15  on 191  degrees of freedom
#> Residual deviance: 185.88  on 188  degrees of freedom
#> AIC: 193.88
#> 
#> Number of Fisher Scoring iterations: 4
#> 
#> Confidence intervals for coefficients: (delta method)
#>                     2.5 %    97.5 %
#> stageStage I   0.00000000 0.0000000
#> stageStage II  0.04614515 0.2799187
#> stageStage III 0.37571719 0.7662158For example, the risk of death was 57 percentage points higher in women with stage III breast cancer compared to stage I (95% confidence interval, 38 to 77 percentage points), adjusting for hormone receptor status.
The model summary in riskratio() and
riskdiff() includes to two additions compared to a regular
glm() model:
summary(), the type of risk
regression model is displayed (in the example,
“Risk ratio model, fitted as binomial model...”).The risks package provides an interface to
broom::tidy(), which returns a data frame of all
coefficients (risk differences in this example), their standard errors,
z statistics, and confidence intervals.
tidy(fit_rd)
#> # A tibble: 3 × 8
#>   term           estimate std.error statistic   p.value conf.low conf.high model
#>   <chr>             <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl> <chr>
#> 1 stageStage I      0        0         NaN    NaN         0          0     marg…
#> 2 stageStage II     0.163    0.0596      2.73   6.26e-3   0.0461     0.280 marg…
#> 3 stageStage III    0.571    0.0996      5.73   9.95e-9   0.376      0.766 marg…
In accordance with glm() standards, coefficients for
relative risks are shown on the logarithmic scale. Exponentiated
coefficients for risk ratios are easily obtained via
tidy(..., exponentiate = TRUE):
tidy(
  fit_rr, 
  exponentiate = TRUE
)
#> # A tibble: 3 × 8
#>   term           estimate std.error statistic   p.value conf.low conf.high model
#>   <chr>             <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl> <chr>
#> 1 stageStage I       1        0        NaN    NaN           1         1    marg…
#> 2 stageStage II      2.46     0.387      2.32   2.03e-2     1.15      5.25 marg…
#> 3 stageStage III     6.10     0.378      4.78   1.75e-6     2.91     12.8  marg…For example, the risk of death was 6 times higher in women with stage III breast cancer compared to stage I (95% confidence interval, 3 to 13 times), adjusting for hormone receptor status.
Typical R functions that build on regression models can further
process fitted risks models. In addition to
tidy(), examples are:
coef(fit) or coefficients(fit) return
model coefficients (i.e., RDs or log(RR)) as a numeric
vectorconfint(fit, level = 0.9) returns 90%
confidence intervals.predict(fit, type = "response") or
fitted.values(fit) return predicted probabilities of the
binary outcome.residuals(fit) returns model residuals.