---
title: "summaryTable"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{summaryTable}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  markdown: 
    wrap: 72
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  echo = TRUE,
  eval = TRUE,
  warning=FALSE,
  fig.height = 6,
  fig.width = 9,
  fig.align='center'
)
```
```{r color-function, echo = FALSE}
colorize <- function(text, color) {
  if (knitr::is_latex_output()) {
    sprintf("\\textcolor{%s}{%s}", color, text)
  } else if (knitr::is_html_output()) {
    sprintf("<span style='color: %s;'>%s</span>", color, text)
  } else text
}
```
```{r setup, echo = FALSE, message = FALSE, warning = FALSE}
library(dplyr)
library(tidyverse)
library(gtsummary)
library(summarySCI)
library(flextable)
```
The function `summaryTable()` produces a table with descriptive
statistics for continuous, categorical and dichotomous variables. It is
based on the function `gtsummary::tbl_summary()`, with several
enhancements and simplifications, such as:

-   Simplified syntax for easier and more intuitive use.
-   Display of missing values for categorical variables: the percentage
    of missing values can be shown (or not) next to the count.
-   Columns with the number of non-missing observations can be added
    for each group.

## Setup and data
To demonstrate the various functionalities of the function we will use
the dataset `survival::colon`.
```{r, message = FALSE}
library(survival)
data(cancer, package="survival")
colon1 <-  colon %>%
  group_by(id) %>%
  slice(1) %>% # Select the first row within each id group
  ungroup()
  
```
```{r, echo = FALSE}
# colon has two rows per patient, so count the rows of the de-duplicated data
n_patients <- nrow(colon1)
```
The dataset `colon` contains data of `r n_patients` patients from one of
the first successful trials of adjuvant chemotherapy for colon cancer.
For simplicity, we focus here on recurrence only, two treatment groups,
and four variables:

- the treatment group (`rx`),
- the sex (`Male`, renamed from `sex`),
- the age (`age`) and
- the extent of local spread (`extent`).

We also set about 10% of the values of `extent` to missing.
```{r}
set.seed(123)
colon2 <- colon1 %>%
  select(rx, sex, age, extent) %>%
  filter(rx != "Lev") %>%
  mutate(rx = if_else(rx == "Obs", "Control", as.character(rx)),
         extent = if_else(row_number() %in% sample(row_number(), size = round(0.1 * n())), NA, extent)) %>% 
  rename(Male = sex) %>% 
  mutate(extent = as.factor(extent))
```
```{r}
head(colon2)
```
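For readers who prefer to follow the preparation without the tidyverse,
the same steps can be sketched in base R. This is only an illustration:
the object names `colon_base`, `n_missing` and `idx_missing` are made up
here, and the rows set to missing are not guaranteed to match `colon2`
exactly.

```{r}
library(survival)

# One row per patient: keep the first row of each id
colon_base <- colon[!duplicated(colon$id), c("id", "rx", "sex", "age", "extent")]

# Keep two treatment groups and relabel "Obs" as "Control"
colon_base <- colon_base[colon_base$rx != "Lev", ]
colon_base$rx <- ifelse(colon_base$rx == "Obs", "Control",
                        as.character(colon_base$rx))

# Set about 10% of the extent values to missing
set.seed(123)
n_missing <- round(0.1 * nrow(colon_base))
idx_missing <- sample(seq_len(nrow(colon_base)), size = n_missing)
colon_base$extent[idx_missing] <- NA
colon_base$extent <- as.factor(colon_base$extent)

# Check the group sizes and the number of missing values
table(colon_base$rx)
sum(is.na(colon_base$extent))
```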
## Simple table
By default, the function produces a table with all variables present in
the dataset.
```{r}
summaryTable(data = colon2)
```
If only specific variables are to be included, they need to be passed 
to the argument `vars`. The argument `group` stratifies the summary 
statistics by the given variable. 
```{r}
summaryTable(data = colon2, 
             vars = c("Male", "age", "extent"), 
             group = "rx")
```
### Displayed name of variables
The displayed name of each variable is
-   the label if it exists in the dataset, or
-   the variable name if no label is present in the dataset (which is
    the case in our example).
In order to customize the displayed name, the argument `labels` can be
used. Please note that the labels need to be entered as a list, as shown below:
```{r}
summaryTable(data = colon2, 
             group = "rx",
             labels = list(age = "Age", extent = "Extent"))
```
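The first rule above refers to variable labels stored in the data. A
minimal sketch of how such a label can be attached in base R, assuming
that `summaryTable()` reads the `"label"` attribute of a column in the
same way as `gtsummary::tbl_summary()` does. The data frame `toy_labels`
is hypothetical.

```{r}
toy_labels <- data.frame(age = c(43, 63, 71), extent = c(3, 3, 2))

# Attach a label to a column; base R stores it as an attribute
attr(toy_labels$age, "label") <- "Age (years)"

# A labelled column returns its label, an unlabelled one returns NULL
attr(toy_labels$age, "label")
attr(toy_labels$extent, "label")
```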
## Adding number of observations 
The number of observations **which are not missing values** 
is added by default in a new column. This can be disabled 
by setting the argument `add_n` to `FALSE`.
```{r}
summaryTable(data = colon2, 
             group = "rx",
             labels = list(rx = "Arm", age = "Age", extent = "Extent"), 
             add_n = FALSE)
```
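The count reported in this column is the number of non-missing values
in each group. A base R sketch on made-up data shows the quantity
involved; `toy_n` and `n_non_missing` are hypothetical names and not
part of the package.

```{r}
toy_n <- data.frame(
  rx     = c("Control", "Control", "Control", "Lev+5FU", "Lev+5FU"),
  extent = c(3, NA, 2, 3, NA)
)

# Number of non-missing extent values in each treatment group
n_non_missing <- tapply(!is.na(toy_n$extent), toy_n$rx, sum)
n_non_missing
```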
## Overall column 
An "overall" column can be added by setting the argument `overall` to
`TRUE`.
```{r}
summaryTable(data = colon2, 
             group = "rx",
             overall = TRUE, 
             labels = list(age = "Age", extent = "Extent"))
```
## Variable types
The function `gtsummary::tbl_summary()` treats numeric variables with
fewer than 10 unique values as categorical by default. This is not the
case in `summaryTable()`: by default, all numeric variables are
considered continuous, unless their only unique values are 0 and 1, in
which case they are considered dichotomous. This can be changed by
setting the argument `continuous_as` to `"categorical"`.

For dichotomous variables, all levels are displayed by default. To show
only one row, use the argument `dichotomous_as = "dichotomous"`. The
level to display is specified using the argument
`value = list(variable ~ "level to show")`.
```{r}
summaryTable(data = colon2,
             group = "rx",
             vars = "Male",
             labels = list(age = "Age"), 
             dichotomous_as = "dichotomous", 
             value = list(Male ~ "1"),
             missing = FALSE)
```
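The default rule for numeric variables described in this section can be
sketched as a small helper. `guess_type()` is a hypothetical function
written for illustration only, not part of `summarySCI`; treating every
non-numeric column as categorical is an assumption, and the actual
implementation may differ in edge cases.

```{r}
guess_type <- function(x) {
  # Assumption: anything that is not numeric is treated as categorical
  if (!is.numeric(x)) {
    return("categorical")
  }
  values <- unique(x[!is.na(x)])
  # Numeric with exactly the two values 0 and 1: dichotomous
  if (length(values) == 2 && all(values %in% c(0, 1))) {
    "dichotomous"
  } else {
    "continuous"
  }
}

guess_type(c(0, 1, 1, NA))       # "dichotomous"
guess_type(c(1, 2, 3, 3))        # "continuous"
guess_type(factor(c("a", "b")))  # "categorical"
```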
By default, the function displays the median and range for continuous
variables. Other options are available through the argument
`stat_cont`.
### Statistic type
The statistics to be displayed can be chosen using the arguments
`stat_cont` (options: `"median_IQR"`, `"median_range"` (default),
`"mean_sd"`, `"mean_se"` and `"geomMean_sd"`) and `stat_cat` (options:
`"n_percent"` (default), `"n"` and `"n_N"`). 
```{r}
summaryTable(data = colon2, group = "rx", 
             stat_cont = "median_IQR", 
             stat_cat = "n_N",
             labels = list(age = "Age", extent = "Extent"))
```
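To make the `stat_cont` options more concrete, the quantities behind
them can be computed in base R on a small made-up vector (`x_toy` is
hypothetical). This is a sketch of the underlying statistics only: how
`summaryTable()` formats them, which quantile definition it uses, and
which spread measure accompanies the geometric mean are not shown here.

```{r}
x_toy <- c(2, 4, 4, 8, 16)

median(x_toy)                      # median
quantile(x_toy, c(0.25, 0.75))     # quartiles, as used for "median_IQR"
range(x_toy)                       # minimum and maximum, for "median_range"
mean(x_toy)                        # mean, for "mean_sd" and "mean_se"
sd(x_toy)                          # standard deviation, for "mean_sd"
sd(x_toy) / sqrt(length(x_toy))    # standard error of the mean, for "mean_se"
exp(mean(log(x_toy)))              # geometric mean, for "geomMean_sd"
```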
## Tests
By default, no p-values or confidence intervals (CI) are displayed.
p-values can be added by setting `test` to `TRUE`, and CIs by setting
`ci` to `TRUE`.

The default test is `wilcox.test` for continuous variables and
`fisher.test` for categorical variables. This can be changed with
`test_cont` and `test_cat`, respectively.

The default CI type is `wilcox.test` for continuous variables and
`wilson` for categorical variables. This can be changed with `ci_cont`
and `ci_cat`, respectively. 
```{r}
summaryTable(data = colon2, 
             group = "rx", 
             vars = c("age", "extent"), 
             stat_cont = "mean_sd", 
             test = TRUE,
             ci = TRUE,
             labels = list(age = "Age", extent = "Extent")
             )
```
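For orientation, the base R functions named in this section can be
called directly. The sketch below only illustrates what these tests and
intervals are; the exact arguments that `summaryTable()` passes
internally are not documented here, and the object `pat` is a
hypothetical name. The Wilson interval is obtained from `prop.test()`
without continuity correction.

```{r}
library(survival)

# One row per patient, restricted to two treatment groups
pat <- colon[!duplicated(colon$id) & colon$rx != "Lev", ]
pat$rx <- droplevels(pat$rx)

# Continuous variable: Wilcoxon rank-sum test between the two groups
wilcox.test(age ~ rx, data = pat)$p.value

# Categorical variable: Fisher's exact test on the contingency table
fisher.test(table(pat$sex, pat$rx))$p.value

# Wilson interval for the proportion of male patients
prop.test(sum(pat$sex == 1), nrow(pat), correct = FALSE)$conf.int

# Interval from wilcox.test(): confidence interval for the (pseudo)median
wilcox.test(pat$age, conf.int = TRUE)$conf.int
```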
## Missing values
By default, missing values are shown as a separate category. This can 
be disabled by setting `missing` to `FALSE`.

For `missing = TRUE`, the percentage is automatically added next to the 
number of missing values. This can be disabled by setting the argument
`missing_percent` to `FALSE`.
```{r}
summaryTable(data = colon2, 
             group = "rx", 
             vars = "extent", 
             test = TRUE,
             ci = TRUE,
             missing_percent = FALSE,
             labels = list(extent = "Extent")
             )
summaryTable(data = colon2, 
             group = "rx", 
             vars = "extent", 
             test = TRUE,
             ci = TRUE,
             missing_percent = TRUE,
             labels = list(extent = "Extent")
             )
```
The tables with and without missing values can also be put next to each
other by setting `missing_percent` to `"both"`.
```{r}
summaryTable(data = colon2, 
             group = "rx", 
             vars = "extent", 
             missing_percent = "both", 
             test = TRUE,
             labels = list(extent = "Extent")
             )
```
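The difference between the two displays comes down to the denominator
used for the percentages. A base R sketch on a made-up factor
(`extent_toy` is hypothetical); that `summaryTable()` uses exactly these
two denominators is an assumption based on the description above.

```{r}
extent_toy <- factor(c("2", "3", "3", "3", NA))

# Percentages with missing values counted in the denominator
prop.table(table(extent_toy, useNA = "ifany")) * 100

# Percentages among non-missing values only
prop.table(table(extent_toy)) * 100
```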
## Further customization
Digits can be customized with the arguments `digits_cont` and
`digits_cat`. The argument `as_flex_table` (default: `TRUE`)
converts the gtsummary object to a flextable object, which is better
suited for Word output. 
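A flextable object can be written to a Word file with
`flextable::save_as_docx()`. The sketch below uses a small stand-in
table built from `mtcars` so that it runs on its own; assuming
`summaryTable()` returns a flextable when `as_flex_table = TRUE`, its
result could be passed in the same way. The names `ft` and `out_file`
are hypothetical.

```{r}
library(flextable)

# Stand-in table; a summaryTable() result could be used here instead
ft <- flextable(head(mtcars[, 1:4]))

# Write the table to a Word document in a temporary location
out_file <- tempfile(fileext = ".docx")
save_as_docx(ft, path = out_file)
file.exists(out_file)
```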
## Next steps
The argument `type` will be introduced in a future release to enable
more fine-grained customization of the variable types.