| Title: | An Elegant Approach to Summarizing Clinical Data | 
| Version: | 0.1.0 | 
| Description: | Streamlines the analysis of clinical data by automatically selecting appropriate statistical descriptions and inference methods based on variable types. For method details see Motulsky H J (2016) https://www.graphpad.com/guides/prism/10/statistics/index.htm and d'Agostino R B (1971) <doi:10.1093/biomet/58.2.341>. | 
| License: | MIT + file LICENSE | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.3.2 | 
| Imports: | car, cli, dplyr, fBasics, glue, qqplotr, rlang, stats, stringr, tibble, tidyplots, tidyr | 
| Suggests: | knitr, rmarkdown | 
| VignetteBuilder: | knitr | 
| Depends: | R (≥ 4.1.0) | 
| NeedsCompilation: | no | 
| Packaged: | 2025-07-10 07:33:20 UTC; Lixiang | 
| Author: | Xiang Li [aut, cre] | 
| Maintainer: | Xiang Li <htqqdd@126.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2025-07-15 07:00:02 UTC | 
Add statistical test results to summary data
Description
Calculates and appends p-values with optional statistical details to a summary table based on variable types and group comparisons. Handles both continuous and categorical variables with appropriate statistical tests.
Usage
add_p(
  summary,
  digit = 3,
  asterisk = FALSE,
  add_method = FALSE,
  add_statistic_name = FALSE,
  add_statistic_value = FALSE
)
Arguments
summary | 
 A data frame that has been processed by add_summary(). | 
digit | 
 A numeric value that determines the number of decimal places. Accepts: 
  | 
asterisk | 
 Logical indicating whether to show asterisk significance markers.  | 
add_method | 
 Control parameter for display of statistical methods. Accepts: 
  | 
add_statistic_name | 
 Logical indicating whether to include test statistic names.  | 
add_statistic_value | 
 Logical indicating whether to include test statistic values.  | 
Value
A data frame merged with statistical test results, containing:
- Variable names
- Summary
- Formatted p-values
- Optional method names/codes
- Optional statistic names/values
Examples
# `summary` is a data frame processed by `add_var()` and `add_summary()`:
data <- add_var(iris, var = c("Sepal.Length", "Species"), group = "Species")
summary <- add_summary(data)
# Add statistical test results
result <- add_p(summary)
Add summary statistics to an add_var object
Description
This function generates summary statistics for variables from a data frame that has been processed by add_var(), with options to format outputs.
Usage
add_summary(
  data,
  add_overall = TRUE,
  continuous_format = NULL,
  norm_continuous_format = "{mean} ± {SD}",
  unnorm_continuous_format = "{median} ({Q1}, {Q3})",
  categorical_format = "{n} ({pct})",
  binary_show = "last",
  digit = 2
)
Arguments
data | 
 A data frame that has been processed by add_var(). | 
add_overall | 
 Logical indicating whether to include an "Overall" summary column.   | 
continuous_format | 
 Format string that overrides both the normal and non-normal continuous formats. Accepted placeholders are   | 
norm_continuous_format | 
 Format string for normally distributed continuous variables. Default is "{mean} ± {SD}". | 
unnorm_continuous_format | 
 Format string for non-normal continuous variables. Default is "{median} ({Q1}, {Q3})". | 
categorical_format | 
 Format string for categorical variables. Default is "{n} ({pct})". | 
binary_show | 
 Display option for binary variables: 
  | 
digit | 
 A numeric value that determines the number of decimal places. | 
Value
A data frame containing summary statistics with the following columns:
- variable: Variable name
- Overall (n=X): Summary statistics for all data, if add_overall = TRUE
- Group-specific columns named [group] (n=X) with summary statistics
Examples
# `data` is a data frame processed by `add_var()`:
data <- add_var(iris, var = c("Sepal.Length", "Species"), group = "Species")
# Add summary statistics
result <- add_summary(data, add_overall = TRUE)
result <- add_summary(data, continuous_format = "{mean}, ({SD})")
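The format strings above use brace placeholders such as {mean} and {SD}. As a rough illustration of how such a template could be filled in, here is a minimal base R sketch. The helper `fill_format()` is hypothetical and is not the package's implementation. It only handles the placeholders visible in the defaults shown in Usage, since the full list of accepted placeholders is not given here.

```r
# Hypothetical helper (not part of the package): replaces each {name}
# placeholder in `template` with the matching value, rounded to `digit`
# decimal places.
fill_format <- function(template, values, digit = 2) {
  out <- template
  for (name in names(values)) {
    formatted <- formatC(unname(values[[name]]), digits = digit, format = "f")
    out <- gsub(paste0("{", name, "}"), formatted, out, fixed = TRUE)
  }
  out
}

# Summary statistics for one continuous variable
x <- iris$Sepal.Length
stats <- list(
  mean   = mean(x),
  SD     = sd(x),
  median = median(x),
  Q1     = quantile(x, 0.25),
  Q3     = quantile(x, 0.75)
)

fill_format("{mean} ± {SD}", stats)          # "5.84 ± 0.83"
fill_format("{median} ({Q1}, {Q3})", stats)  # "5.80 (5.10, 6.40)"
```

The literal (fixed = TRUE) replacement keeps braces and parentheses in the template from being read as regular-expression syntax.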
Prepare variables for add_summary
Description
This function processes a dataset for statistical analysis by categorizing variables into continuous and categorical types. It automatically runs normality checks, equality-of-variance checks, and expected-frequency assumption checks.
Usage
add_var(data, var = NULL, group = "group", norm = "auto", center = "median")
Arguments
data | 
 A data frame containing the variables to analyze, with variables in columns and observations in rows.  | 
var | 
 A character vector of variable names to include. If   | 
group | 
 A character string specifying the grouping variable in data. | 
norm | 
 Control parameter for normality tests. Accepts: 
  | 
center | 
 A character string specifying the   | 
Value
A modified data frame with an attribute 'add_var' containing a list of categorized variables and their properties:
- var: List of categorized variables:
  - valid: All valid variable names after checks
  - continuous: Sublist of continuous variables (further divided by normality/equal variance)
  - categorical: Sublist of categorical variables (further divided by ordered/expected frequency)
- group: Grouping variable name
- overall_n: Total number of observations
- group_n: Observation counts per group
- group_nlevels: Number of groups
- group_levels: Group level names
- norm: Normality check method used
Examples
data <- add_var(iris, var = c("Sepal.Length", "Species"), group = "Species")
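The return value carries its metadata in an attribute rather than in extra columns. The base R sketch below shows that general pattern only. The list is built by hand from iris as an illustration, covers just the group-related fields named above, and makes no claim about the exact types add_var() stores.

```r
# Illustration only: attach group metadata to a data frame as an attribute.
# Field names follow the Value section; the stored types are assumptions.
df <- iris
group <- "Species"

attr(df, "add_var") <- list(
  group         = group,
  overall_n     = nrow(df),
  group_n       = table(df[[group]]),
  group_nlevels = nlevels(df[[group]]),
  group_levels  = levels(df[[group]])
)

# The data frame still behaves as usual; metadata is read back with attr()
meta <- attr(df, "add_var")
meta$overall_n      # 150
meta$group_levels   # "setosa" "versicolor" "virginica"
```

With a real add_var() result, the same `attr(data, "add_var")` call is how the stored list would be inspected.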
Test for Equality of Variances
Description
Performs Levene's test to assess equality of variances between groups.
Usage
equal_test(data, var, group, center = "median")
Arguments
data | 
 A data frame containing the variables to be tested.  | 
var | 
 A character string specifying the numeric variable in data. | 
group | 
 A character string specifying the grouping variable in data. | 
center | 
 A character string specifying the   | 
Value
Logical value:
- TRUE: Variances are equal (p-value greater than 0.05)
- FALSE: Variances are unequal, or an error occurred during testing
Methodology for Equality of Variances
Levene's test is the default method adopted in SPSS. The original Levene's test centers each group on its mean; this function defaults to center = "median", which gives a more robust test.
Examples
equal_test(iris, "Sepal.Length", "Species")
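For readers who want to see what a median-centered Levene's test computes (this variant is often called the Brown-Forsythe test), here is a minimal base R sketch. It is a one-way ANOVA on the absolute deviations from each group's median. The function name `levene_median()` is hypothetical, and unlike equal_test() the sketch does no error handling.

```r
# Hypothetical sketch: median-centered Levene's test in base R.
# Returns TRUE when the p-value is greater than 0.05.
levene_median <- function(data, var, group) {
  x <- data[[var]]
  g <- factor(data[[group]])
  # Absolute deviation of each observation from its group median
  deviation <- abs(x - ave(x, g, FUN = median))
  # One-way ANOVA on the deviations; the first row holds the group effect
  fit <- anova(lm(deviation ~ g))
  p_value <- fit[["Pr(>F)"]][1]
  p_value > 0.05
}

levene_median(iris, "Sepal.Length", "Species")
```

Centering on the median instead of the mean makes the deviations less sensitive to skewed data and outliers.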
Format p-values with significance markers
Description
Formats p-values as strings with specified precision and optional significance asterisks.
Usage
format_p(p, digit = 3, asterisk = FALSE)
Arguments
p | 
 A numeric p-value between 0 and 1.  | 
digit | 
 A numeric value that determines the number of decimal places. Accepts: 
  | 
asterisk | 
 Logical indicating whether to return significance asterisks.  | 
Value
A character string containing the formatted p-value or significance asterisks.
Examples
format_p(0.00009, 4)
format_p(0.03, 3)
format_p(0.02, asterisk = TRUE)
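The sketch below shows one plausible way to format a p-value in base R. Everything in it is an assumption for illustration: the name `format_p_sketch()`, the "<0.001"-style floor for very small values, and the conventional asterisk cut-offs (0.05, 0.01, 0.001). The actual output and thresholds of format_p() are not stated in this documentation.

```r
# Hypothetical sketch, not the package's format_p().
format_p_sketch <- function(p, digit = 3, asterisk = FALSE) {
  stopifnot(is.numeric(p), length(p) == 1, p >= 0, p <= 1)
  if (asterisk) {
    # Assumed conventional thresholds for significance markers
    if (p < 0.001) return("***")
    if (p < 0.01)  return("**")
    if (p < 0.05)  return("*")
    return("")
  }
  # Values below the smallest printable number are shown as "<threshold"
  threshold <- 10^(-digit)
  if (p < threshold) {
    return(paste0("<", formatC(threshold, digits = digit, format = "f")))
  }
  formatC(p, digits = digit, format = "f")
}

format_p_sketch(0.00009, 4)             # "<0.0001"
format_p_sketch(0.03, 3)                # "0.030"
format_p_sketch(0.02, asterisk = TRUE)  # "*"
```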
Perform normality test on a variable
Description
Conducts normality tests for a specified variable, optionally by group. Supports automatic testing and interactive visualization.
Usage
normal_test(data = NULL, var = NULL, group = NULL, norm = "auto")
Arguments
data | 
 A data frame containing the variables to be tested.  | 
var | 
 A character string specifying the numeric variable in data. | 
group | 
 A character string specifying the grouping variable in data. | 
norm | 
 Control parameter for test behavior. Accepts: 
  | 
Value
A logical value:
- TRUE: data are normally distributed
- FALSE: data are not normally distributed
Methodology for p-values
Automatically selects the test based on the sample size per group:
n < 3: Too small; the data are assumed to be non-normal
[3, 50]: Shapiro-Wilk test
(50, 1000]: D'Agostino chi-squared test, used instead of the Kolmogorov-Smirnov test
n > 1000: Shows p-values, draws Q-Q plots, and prompts for a decision
Examples
normal_test(iris, "Sepal.Length", "Species", norm = "auto")
normal_test(iris, "Sepal.Length", "Species", norm = TRUE)
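The size-based selection described in the methodology can be sketched as a small lookup in base R. The helper `choose_normality_test()` is hypothetical and only returns a label. It assumes that n = 3 falls under the Shapiro-Wilk branch, since n < 3 is the "too small" case. The Shapiro-Wilk p-values are computed with `stats::shapiro.test()`; the D'Agostino test is left out so that the example needs no extra packages.

```r
# Hypothetical helper: maps a group's sample size to the test described above.
choose_normality_test <- function(n) {
  if (n < 3) {
    "none (assumed non-normal)"
  } else if (n <= 50) {
    "Shapiro-Wilk"
  } else if (n <= 1000) {
    "D'Agostino"
  } else {
    "manual review (p-values and Q-Q plots)"
  }
}

# Pick a test for each group of iris, based on the group's size
groups <- split(iris$Sepal.Length, iris$Species)
chosen <- vapply(groups, function(x) choose_normality_test(length(x)), character(1))
chosen

# Each iris group has 50 observations, so Shapiro-Wilk applies to all three
p_values <- vapply(groups, function(x) shapiro.test(x)$p.value, numeric(1))
p_values
```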
Check Sample Size Adequacy for Chi-Squared Test
Description
This function determines if a contingency table meets the expected frequency assumptions for a valid chi-squared test. It categorizes the data into "not_small", "small", or "very_small" based on sample size and expected frequencies.
Usage
small_test(data, var, group)
Arguments
data | 
 A data frame containing the variables to be tested.  | 
var | 
 A character string specifying the factor variable in data. | 
group | 
 A character string specifying the grouping variable in data. | 
Value
A character string with one of three values:
- "not_small": Sample size greater than or equal to 40 and all expected frequencies greater than or equal to 5
- "small": Sample size greater than or equal to 40, all expected frequencies greater than or equal to 1, and at least one expected frequency <5; applies only to 2*2 contingency tables
- "very_small": All other conditions, including sample size <40 or any expected frequency <1
Examples
df <- data.frame(
  category = factor(c("A", "B", "A", "B")),
  group    = factor(c("X", "X", "Y", "Y"))
)
small_test(data = df, var = "category", group = "group")
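The three categories can be reproduced from a contingency table with a few lines of base R. The sketch below is a hypothetical re-statement of the rules listed under Value, not the package's code. The name `classify_table()` is an assumption, and expected frequencies are computed as (row total × column total) / n.

```r
# Hypothetical sketch of the classification rules listed under Value.
classify_table <- function(tab) {
  n <- sum(tab)
  # Expected frequency of each cell under independence
  expected <- outer(rowSums(tab), colSums(tab)) / n
  if (n >= 40 && all(expected >= 5)) {
    return("not_small")
  }
  # Reaching here with n >= 40 means at least one expected frequency is < 5
  if (n >= 40 && all(dim(tab) == 2) && all(expected >= 1)) {
    return("small")
  }
  "very_small"
}

df <- data.frame(
  category = factor(c("A", "B", "A", "B")),
  group    = factor(c("X", "X", "Y", "Y"))
)
classify_table(table(df$category, df$group))        # "very_small" (n = 4)

classify_table(matrix(c(20, 20, 20, 20), nrow = 2)) # "not_small"
classify_table(matrix(c(3, 7, 17, 23), nrow = 2))   # "small" (smallest expected = 4)
```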