Type: Package
Title: Convenient Functions for Exploratory Data Analysis
Version: 0.0.5
Description: A collection of convenient functions to facilitate common tasks in exploratory data analysis. Some common tasks include generating summary tables of variables, displaying tables as a 'flextable' or a 'kable' and showing distributions of variables using 'ggplot2'. Labels stating the source file with run time can be easily generated for annotation in tables and plots.
License: MIT + file LICENSE
Encoding: UTF-8
URL: https://soutomas.github.io/edar/, https://github.com/soutomas/edar/
BugReports: https://github.com/soutomas/edar/issues
RoxygenNote: 7.3.3
Imports: dplyr, flextable, grDevices, janitor, kableExtra, knitr, magrittr, patchwork, rlang, rstudioapi, scales, tidyr
Suggests: ggplot2, gt
Depends: R (>= 4.2.0)
NeedsCompilation: no
Packaged: 2025-10-24 16:23:16 UTC; tomas
Author: Tomas Sou [aut, cre]
Maintainer: Tomas Sou <tomas.sou@carexer.com>
Repository: CRAN
Date/Publication: 2025-10-29 19:50:16 UTC

edar: Convenient Functions for Exploratory Data Analysis

Description

A collection of convenient functions to facilitate common tasks in exploratory data analysis. Some common tasks include generating summary tables of variables, displaying tables as a 'flextable' or a 'kable' and showing distributions of variables using 'ggplot2'. Labels stating the source file with run time can be easily generated for annotation in tables and plots.

Author(s)

Maintainer: Tomas Sou <tomas.sou@carexer.com> (ORCID)

See Also

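Useful links:

https://soutomas.github.io/edar/
https://github.com/soutomas/edar/
Report bugs at https://github.com/soutomas/edar/issues
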
Copy files and rename with date

Description

Copy files to a destination folder and add the date, with an optional tag, to each file name.

Usage

fc(fpath, des = "", tag = "-")

Arguments

fpath

<chr> A vector of file paths of the source files to copy.

des

<chr> Destination folder.

tag

<chr> Tag to add to the file name.

Value

A logical vector indicating if the operation succeeded for each of the files.

Examples

## Not run: 
# Copy files to a temporary directory
tmp <- tempdir()
fc(c("f1.R","f2.R"),des=tmp)

## End(Not run)
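
The copy-and-rename step described above can be sketched in base R. This is only an illustration, not the implementation of fc(): the helper name copy_with_date, the date format, and the position of the tag and date within the file name are all assumptions.

```r
# Hypothetical helper (not part of edar): copy files into `des`, appending a
# tag and the current date to each file name. The naming scheme is assumed.
copy_with_date <- function(fpath, des = tempdir(), tag = "-") {
  stem  <- tools::file_path_sans_ext(basename(fpath))
  ext   <- tools::file_ext(fpath)
  stamp <- format(Sys.Date(), "%Y%m%d")
  # Re-attach the extension only when the source file has one
  new_name <- paste0(stem, tag, stamp, ifelse(nzchar(ext), paste0(".", ext), ""))
  # file.copy() returns one logical per file, like the Value described above
  file.copy(fpath, file.path(des, new_name), overwrite = FALSE)
}

# Usage with a throwaway source file and a fresh destination folder
src <- tempfile(fileext = ".R")
writeLines("x <- 1", src)
out_dir <- tempfile("copy_demo_")
dir.create(out_dir)
ok <- copy_with_date(src, des = out_dir)
list.files(out_dir)
```

Setting overwrite = FALSE means an existing dated copy is left untouched and reported as FALSE in the result.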

flextable wrapper

Description

Sugar function for default flextable output.

Usage

ft(d, fnote = NULL, ttl = NULL, sig = 8, dig = 2, src = 0, omit = "")

Arguments

d

<dfr> A data frame.

fnote

<chr> Footnote.

ttl

<chr> Title.

sig

<int> Number of significant digits to compute.

dig

<int> Number of decimal places to display.

src

<int> Either 1 or 2 to add a source label over 1 or 2 lines.

omit

<chr> Text to omit from the source label.

Value

A flextable object.

Examples

mtcars |> head() |> ft()
mtcars |> head() |> ft(src=1)
mtcars |> head() |> ft("Footnote")
mtcars |> head() |> ft("Footnote",src=1)
mtcars |> head() |> ft(sig=2,dig=1)
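
The difference between sig and dig can be illustrated in base R. The sketch below assumes that values are first rounded to sig significant digits and then displayed with dig decimal places; whether ft() applies exactly this order internally is not confirmed here.

```r
# Assumed order of operations: compute with `sig`, then display with `dig`
x <- c(123.456789, 0.000123456, 21)

computed  <- signif(x, digits = 2)                        # 2 significant digits
displayed <- formatC(computed, format = "f", digits = 1)  # 1 decimal place
displayed
```

Under this assumption, a small value such as 0.000123456 survives the significant-digit step but is shown as zero once only one decimal place is displayed.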

flextable default

Description

Sugar function to set flextable defaults. The arguments are passed to flextable::set_flextable_defaults().

Usage

ft_def(font = "Calibri Light", fsize = 10, pad = 3)

Arguments

font

<chr> Font family - for font.family.

fsize

<int> Font size (in points) - for font.size.

pad

<int> Padding space around text - for padding.

Value

A list containing previous default values.

See Also

flextable::set_flextable_defaults().

Examples

## Not run: 
ft_def()

## End(Not run)
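
Because the arguments are passed to flextable::set_flextable_defaults(), calling ft_def() with its default values presumably corresponds to the direct call sketched below. The mapping of font, fsize and pad to font.family, font.size and padding follows the argument descriptions above; the restore step is an addition for illustration.

```r
# Direct call to flextable, using the default values listed in Usage
old <- flextable::set_flextable_defaults(
  font.family = "Calibri Light",
  font.size   = 10,
  padding     = 3
)

# `old` holds the previous defaults, so they can be restored afterwards
do.call(flextable::set_flextable_defaults, old)
```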

Box plot wrapper for discrete covariates

Description

Sugar function to generate box plots for a chosen variable by discrete covariates. Orientation will follow the axis of the discrete variables. Numeric variables will be dropped, except the chosen variable to plot.

Usage

ggcov_box(d, var, cats, ...)

Arguments

d

<dfr> A data frame.

var

<var> A variable to plot as unquoted name.

cats

<var> Optional. A vector of selected discrete variables as unquoted names.

...

Other arguments to pass to ggplot2::geom_boxplot.

Value

A ggplot object of a box plot.

Examples

iris |> ggcov_box(Sepal.Length)
sleep |> ggcov_box(extra,group)
sleep |> ggcov_box(extra,"group") # character for `cats` will not break
d <- mtcars |> dplyr::mutate(cyl=factor(cyl),gear=factor(gear),vs=factor(vs))
d |> ggcov_box(mpg)
d |> ggcov_box(mpg,c("cyl","vs"))

Histogram wrapper for continuous covariates

Description

Sugar function to generate histograms for numeric variables in a dataset. Non-numeric variables will be dropped.

Usage

ggcov_hist(d, cols, bins = 30, ...)

Arguments

d

<dfr> A data frame.

cols

<var> Optional. A vector of selected columns as unquoted names.

bins

<int> Number of bins.

...

Other arguments to pass to ggplot2::geom_histogram.

Value

A ggplot object with histograms of numeric variables.

Examples

iris |> ggcov_hist()
iris |> ggcov_hist(c(Sepal.Width,Sepal.Length))
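
For readers who want to see the general idea, histograms of all numeric columns can be drawn with tidyr and ggplot2 as sketched below. This is an assumed approach (reshape to long format, then facet), not necessarily how ggcov_hist() is implemented.

```r
library(ggplot2)

# Keep numeric columns only, mirroring "Non-numeric variables will be dropped"
num <- iris[sapply(iris, is.numeric)]

# Reshape to long format: one row per (variable, value) pair
long <- tidyr::pivot_longer(
  num,
  cols      = tidyr::everything(),
  names_to  = "variable",
  values_to = "value"
)

# One histogram panel per variable, each with its own axis ranges
p <- ggplot(long, aes(value)) +
  geom_histogram(bins = 30) +
  facet_wrap(~variable, scales = "free")
p
```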

Violin plot wrapper for discrete covariates

Description

Sugar function to generate violin plots for a chosen variable by discrete covariates. Orientation will follow the axis of the discrete variables. Numeric variables will be dropped, except the chosen variable to plot.

Usage

ggcov_violin(d, var, cats, ...)

Arguments

d

<dfr> A data frame.

var

<var> A variable to plot as unquoted name.

cats

<var> Optional. A vector of selected discrete variables as unquoted names.

...

Other arguments to pass to ggplot2::geom_violin.

Value

A ggplot object with violin plots.

Examples

iris |> ggcov_violin(Sepal.Length)
sleep |> ggcov_violin(extra,group)
sleep |> ggcov_violin(extra,"group") # character for `cats` will not break
d <- mtcars |> dplyr::mutate(cyl=factor(cyl),gear=factor(gear),vs=factor(vs))
d |> ggcov_violin(mpg)
d |> ggcov_violin(mpg,c("cyl","vs"))

Add source label to a ggplot object

Description

Generate and add a source label with file path and run time to a ggplot object.

Usage

ggsrc(plt, span = 2, size = 8, col = "grey55", lab = NULL, omit = "")

Arguments

plt

A ggplot object.

span

<num> Number of lines: either 1 or 2.

size

<num> Text size.

col

<chr> Colour of the text.

lab

<chr> Custom label to use instead of the default.

omit

<chr> Text to omit from the label.

Value

A ggplot object with the added label.

Examples

# A source label can be easily added to a ggplot object.
library(ggplot2)
p <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
p |> ggsrc()
p |> ggsrc(lab="My label")

Generate hex colour codes for plotting

Description

Create a vector of hex colour codes for the desired number of colours. Colours are generated with grDevices::hcl by splitting the hue range [0,360].

Usage

hcln(n, show = FALSE)

Arguments

n

<int> Number of colours to output.

show

<lgl> TRUE to show the output colours.

Value

Hex colour codes that can be used for plotting.

Examples

hcln(6,FALSE)
hcln(4,TRUE)
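
The hue-splitting idea can be sketched directly with grDevices::hcl. The even spacing and the chroma and luminance values below are assumptions chosen for illustration; hcln() may use different settings.

```r
n <- 6

# Split [0, 360] into n hues; drop the last point because 360 equals 0
hues <- seq(0, 360, length.out = n + 1)[-(n + 1)]

# Chroma (c) and luminance (l) are assumed values, not taken from hcln()
cols <- grDevices::hcl(h = hues, c = 100, l = 65)
cols

# To preview the colours: barplot(rep(1, n), col = cols, axes = FALSE)
```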

kable wrapper

Description

Sugar function for default kable output.

Usage

kb(d, fnote = NULL, cap = NULL, sig = 8, dig = 2, src = 0, omit = "")

Arguments

d

<dfr> A data frame.

fnote

<chr> Footnote.

cap

<chr> Caption.

sig

<int> Number of significant digits to compute.

dig

<int> Number of decimal places to display.

src

<int> Either 1 or 2 to add a source label over 1 or 2 lines.

omit

<chr> Text to omit from the source label.

Value

A kable object.

Examples

mtcars |> head() |> kb()
mtcars |> head() |> kb(src=1)
mtcars |> head() |> kb("Footnote")
mtcars |> head() |> kb("Footnote",src=1)
mtcars |> head() |> kb(sig=2,dig=1)

Generate source label

Description

Generate a source label with file path and run time. In interactive sessions, the function uses rstudioapi to get the file path. It is designed to work in a script file in RStudio when running interactively. It returns an empty label if run directly in the console.

Usage

label_src(span = 2, omit = "", tz = TRUE)

Arguments

span

<int> Number of lines: either 1 or 2.

omit

<chr> Text to omit from the label.

tz

<lgl> FALSE to exclude time stamp.

Value

A label showing the source file path with a time stamp.

Examples

label_src(1)
label_src(tz=FALSE)
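
A label of this kind could be assembled roughly as follows. The helper name sketch_label, the "Source:" prefix, the time format and the line separator are all hypothetical; only the use of rstudioapi to find the file path comes from the description above.

```r
# Hypothetical helper (not part of edar): build a source label from the
# active RStudio document path and the current time.
sketch_label <- function(span = 2, tz = TRUE) {
  path <- ""
  # Query RStudio only when it is actually running
  if (requireNamespace("rstudioapi", quietly = TRUE) && rstudioapi::isAvailable()) {
    ctx <- rstudioapi::getSourceEditorContext()
    if (!is.null(ctx)) path <- ctx$path
  }
  stamp <- if (tz) format(Sys.time(), "%Y-%m-%d %H:%M:%S %Z") else NULL
  sep   <- if (span == 1) " " else "\n"   # one line or two
  paste(c(paste("Source:", path), stamp), collapse = sep)
}

lab <- sketch_label(span = 1)
lab
```

Outside RStudio the path stays empty in this sketch, so only the prefix and the time stamp remain.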

Generate time stamp label

Description

Generate a time stamp label of the current time.

Usage

label_tz(omit = "")

Arguments

omit

<chr> Text to omit from the label.

Value

A label with time stamp.

Examples

label_tz()

Summarise continuous variables by group

Description

Summarise all continuous variables by group. Non-numeric variables will be dropped.

Usage

summ_by(d, cols, ..., pct = c(0.25, 0.75), xname = "")

Arguments

d

<dfr> A data frame.

cols

<var> Optional. Select a vector of variables as unquoted names.

...

<var> Optional. Columns to group by as unquoted names.

pct

<num> A vector of two values giving the percentiles to compute.

xname

<chr> Characters to omit in output column names.

Value

A data frame of summarised variables.

Examples

iris |> summ_by()
iris |> summ_by(pct=c(0.1,0.9))
d <- mtcars |> dplyr::mutate(vs=factor(vs), am=factor(am))
d |> summ_by(mpg)
d |> summ_by(mpg,vs)
d |> summ_by(mpg,vs,am)
d |> summ_by(c(mpg,disp))
d |> summ_by(c(mpg,disp),vs)
d |> summ_by(c(mpg,disp),vs,xname="mpg_")
# grouping without column selection is possible but rarely useful in large datasets
d |> summ_by(,vs)
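
For comparison, a grouped summary of a single variable can be written out by hand with dplyr. The set of statistics and the output column names below are assumptions for illustration; the actual columns returned by summ_by() may differ.

```r
library(dplyr)

pct <- c(0.25, 0.75)   # same default percentiles as in Usage

out <- mtcars |>
  mutate(vs = factor(vs)) |>
  group_by(vs) |>
  summarise(
    n    = n(),
    mean = mean(mpg),
    sd   = sd(mpg),
    p_lo = unname(quantile(mpg, pct[1])),   # lower percentile
    p_hi = unname(quantile(mpg, pct[2])),   # upper percentile
    .groups = "drop"
  )
out
```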

Summarise categorical variables

Description

Summarise categorical variables. Numeric variables will be dropped.

Usage

summ_cat(d, pos)

Arguments

d

A data frame.

pos

<chr/int> Optional. Name or position of a variable to return.

Value

A list containing summaries for each categorical variable.

Examples

iris |> summ_cat()
sleep |> summ_cat()
sleep |> summ_cat("group")
sleep |> summ_cat(1)
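
The idea of one summary per categorical variable can be sketched in base R with table(). This is an assumed approach that returns plain counts; summ_cat() may report additional columns such as percentages.

```r
# Keep factor and character columns only, dropping numeric variables
cats <- Filter(function(x) is.factor(x) || is.character(x), sleep)

# One frequency table per remaining variable, returned as a named list
summ <- lapply(cats, function(x) {
  as.data.frame(table(x, dnn = "level"), responseName = "n")
})

summ$group   # select one variable by name
summ[[1]]    # or by position
```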