| Title: | Generalized Framework for Cross-Validation | 
| Version: | 1.0.7 | 
| Maintainer: | Jeremy Coyle <jeremyrcoyle@gmail.com> | 
| Description: | A general framework for the application of cross-validation schemes to particular functions. By allowing arbitrary lists of results, origami accommodates a range of cross-validation applications. This implementation was first described by Coyle and Hejazi (2018) <doi:10.21105/joss.00512>. | 
| Depends: | R (≥ 3.0.0) | 
| License: | GPL-3 | 
| URL: | https://tlverse.org/origami/ | 
| BugReports: | https://github.com/tlverse/origami/issues | 
| Encoding: | UTF-8 | 
| Imports: | abind, methods, data.table, assertthat, future, future.apply, listenv | 
| Suggests: | testthat, class, rmarkdown, knitr, stringr, glmnet, forecast, randomForest | 
| VignetteBuilder: | knitr | 
| RoxygenNote: | 7.2.1 | 
| NeedsCompilation: | no | 
| Packaged: | 2022-10-19 22:38:23 UTC; jrcoyle | 
| Author: | Jeremy Coyle | 
| Repository: | CRAN | 
| Date/Publication: | 2022-10-19 23:22:36 UTC | 
Check ID and Time Compatibility
Description
Check ID and Time Compatibility
Usage
check_id_and_time(id, time)
Arguments
| id | An optional vector of unique identifiers corresponding to the time vector. These can be used to subset the time vector. | 
| time | An optional vector of integers of time points observed for each subject in the sample. | 
Combine Results from Different Folds
Description
Applies combiners (functions that collapse across a list of
similarly structured results) to a list of such lists.
Usage
combine_results(results, combiners = NULL, smart_combiners = TRUE)
Arguments
| results | A  | 
| combiners | A  | 
| smart_combiners | A  | 
Details
You should rarely need to call this function directly, because cross_validate calls it automatically. The default combiners, which are guessed from the data type of each result, should work in most cases.
Value
A list of combined results.
See Also
Combiners
Description
Combiners are functions that collapse across a list of similarly structured results. These are standard idioms for combining lists of certain data types.
Usage
combiner_rbind(x)
combiner_c(x)
combiner_factor(x)
combiner_array(x)
Arguments
| x | A  | 
Value
A combined results object.
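Examples
A minimal sketch of two of the combiners, assuming the origami package is installed. The per-fold values below are made up for illustration; they stand in for what a cv_fun might return from each fold.

```r
library(origami)

# Per-fold numeric vectors, e.g. squared errors from two folds.
fold_errors <- list(c(0.5, 1.2), c(0.3, 0.9))
all_errors <- combiner_c(fold_errors)
length(all_errors)

# Per-fold one-row data frames, e.g. coefficients from two folds.
fold_coefs <- list(
  data.frame(intercept = 1.0, slope = 2.0),
  data.frame(intercept = 1.1, slope = 1.9)
)
all_coefs <- combiner_rbind(fold_coefs)
nrow(all_coefs)
```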
Main Cross-Validation Function
Description
Applies cv_fun to the folds using future_lapply and combines
the results across folds using combine_results.
Usage
cross_validate(
  cv_fun,
  folds,
  ...,
  use_future = TRUE,
  .combine = TRUE,
  .combine_control = list(),
  .old_results = NULL
)
Arguments
| cv_fun | A function that takes a 'fold' as its first argument and
returns a list of results from that fold. NOTE: the use of an argument
named 'X' is specifically disallowed in any input function for compliance
with the functions  | 
| folds | A list of folds to loop over generated using
 | 
| ... | Other arguments passed to  | 
| use_future | A  | 
| .combine | A  | 
| .combine_control | A  | 
| .old_results | A  | 
Value
A list of results, combined across folds.
Examples
###############################################################################
# This example explains how to use the cross_validate function naively.
###############################################################################
data(mtcars)
# resubstitution MSE
r <- lm(mpg ~ ., data = mtcars)
mean(resid(r)^2)
# function to calculate cross-validated squared error
cv_lm <- function(fold, data, reg_form) {
  # get name and index of outcome variable from regression formula
  out_var <- as.character(unlist(stringr::str_split(reg_form, " "))[1])
  out_var_ind <- as.numeric(which(colnames(data) == out_var))
  # split up data into training and validation sets
  train_data <- training(data)
  valid_data <- validation(data)
  # fit linear model on training set and predict on validation set
  mod <- lm(as.formula(reg_form), data = train_data)
  preds <- predict(mod, newdata = valid_data)
  # capture results to be returned as output
  out <- list(
    coef = data.frame(t(coef(mod))),
    SE = ((preds - valid_data[, out_var_ind])^2)
  )
  return(out)
}
# replicate the resubstitution estimate
resub <- make_folds(mtcars, fold_fun = folds_resubstitution)[[1]]
resub_results <- cv_lm(fold = resub, data = mtcars, reg_form = "mpg ~ .")
mean(resub_results$SE)
# cross-validated estimate
folds <- make_folds(mtcars)
cv_results <- cross_validate(
  cv_fun = cv_lm, folds = folds, data = mtcars,
  reg_form = "mpg ~ ."
)
mean(cv_results$SE)
###############################################################################
# This example explains how to use the cross_validate function with
# parallelization using the framework of the future package.
###############################################################################
suppressMessages(library(data.table))
library(future)
data(mtcars)
set.seed(1)
# make a lot of folds
folds <- make_folds(mtcars, fold_fun = folds_bootstrap, V = 1000)
# function to calculate cross-validated squared error for linear regression
cv_lm <- function(fold, data, reg_form) {
  # get name and index of outcome variable from regression formula
  out_var <- as.character(unlist(stringr::str_split(reg_form, " "))[1])
  out_var_ind <- as.numeric(which(colnames(data) == out_var))
  # split up data into training and validation sets
  train_data <- training(data)
  valid_data <- validation(data)
  # fit linear model on training set and predict on validation set
  mod <- lm(as.formula(reg_form), data = train_data)
  preds <- predict(mod, newdata = valid_data)
  # capture results to be returned as output
  out <- list(
    coef = data.frame(t(coef(mod))),
    SE = ((preds - valid_data[, out_var_ind])^2)
  )
  return(out)
}
plan(sequential)
time_seq <- system.time({
  results_seq <- cross_validate(
    cv_fun = cv_lm, folds = folds, data = mtcars,
    reg_form = "mpg ~ ."
  )
})
plan(multicore)
time_mc <- system.time({
  results_mc <- cross_validate(
    cv_fun = cv_lm, folds = folds, data = mtcars,
    reg_form = "mpg ~ ."
  )
})
if (availableCores() > 1) {
  time_mc["elapsed"] < 1.2 * time_seq["elapsed"]
}
Build a Fold Object from a Fold Vector
Description
For V-fold type cross-validation. This takes a fold vector (validation set IDs) and builds a fold object for fold v.
Usage
fold_from_foldvec(v, folds)
Arguments
| v | An identifier of the fold in which observations fall for cross-validation. | 
| folds | A vector of the fold status for each observation for cross-validation. | 
See Also
Other fold generation functions: 
fold_funs,
folds2foldvec(),
make_folds(),
make_repeated_folds()
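Examples
A minimal sketch, assuming the origami package is installed. The fold vector below is hypothetical: entry i gives the fold whose validation set contains observation i.

```r
library(origami)

# Hypothetical fold vector for six observations and three folds.
foldvec <- c(1, 2, 3, 1, 2, 3)

# Build the fold object for fold 2.
fold2 <- fold_from_foldvec(v = 2, folds = foldvec)

validation(fold = fold2) # indexes of the observations assigned to fold 2
training(fold = fold2)   # indexes of all remaining observations
```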
Cross-Validation Schemes
Description
These functions represent different cross-validation schemes that can be
used with origami. They should be used as options for the
fold_fun argument to make_folds, which will call the
requested function, specifying n based on its arguments, and pass any
remaining arguments (e.g. V or pvalidation) on.
Usage
folds_vfold(n, V = 10L)
folds_resubstitution(n)
folds_loo(n)
folds_montecarlo(n, V = 1000L, pvalidation = 0.2)
folds_bootstrap(n, V = 1000L)
folds_rolling_origin(n, first_window, validation_size, gap = 0L, batch = 1L)
folds_rolling_window(n, window_size, validation_size, gap = 0L, batch = 1L)
folds_rolling_origin_pooled(
  n,
  t,
  id = NULL,
  time = NULL,
  first_window,
  validation_size,
  gap = 0L,
  batch = 1L
)
folds_rolling_window_pooled(
  n,
  t,
  id = NULL,
  time = NULL,
  window_size,
  validation_size,
  gap = 0L,
  batch = 1L
)
folds_vfold_rolling_origin_pooled(
  n,
  t,
  id = NULL,
  time = NULL,
  V = 10L,
  first_window,
  validation_size,
  gap = 0L,
  batch = 1L
)
folds_vfold_rolling_window_pooled(
  n,
  t,
  id = NULL,
  time = NULL,
  V = 10L,
  window_size,
  validation_size,
  gap = 0L,
  batch = 1L
)
Arguments
| n | An integer indicating the number of observations. | 
| V | An integer indicating the number of folds. | 
| pvalidation | A  | 
| first_window | An integer indicating the number of observations in the first training sample. | 
| validation_size | An integer indicating the number of points in the validation samples; should be equal to the largest forecast horizon. | 
| gap | An integer indicating the number of points not included in the training or validation samples. The default is zero. | 
| batch | An integer indicating increases in the number of time points added to the training set in each iteration of cross-validation. Applicable for larger time-series. The default is one. | 
| window_size | An integer indicating the number of observations in each training sample. | 
| t | An integer indicating the total amount of time to consider per time-series sample. | 
| id | An optional vector of unique identifiers corresponding to the time vector. These can be used to subset the time vector. | 
| time | An optional vector of integers of time points observed for each subject in the sample. | 
Value
A list of Folds.
See Also
Other fold generation functions: 
fold_from_foldvec(),
folds2foldvec(),
make_folds(),
make_repeated_folds()
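Examples
A minimal sketch of a rolling-origin scheme, assuming the origami package is installed. The series length and window sizes are arbitrary values chosen for illustration.

```r
library(origami)

# Rolling-origin folds for a hypothetical series of 10 time points.
ro_folds <- make_folds(
  n = 10,
  fold_fun = folds_rolling_origin,
  first_window = 5, validation_size = 2, gap = 0, batch = 1
)

# Print the training and validation indexes of each fold.
for (f in ro_folds) {
  cat(
    "train:", training(fold = f),
    "| valid:", validation(fold = f), "\n"
  )
}
```

In each fold the validation indexes come after the training indexes, which is what makes the scheme suitable for forecasting problems.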
Fold Helpers
Description
Accessors and indexers for the different parts of a fold.
Usage
training(x = NULL, fold = NULL)
validation(x = NULL, fold = NULL)
fold_index(x = NULL, fold = NULL)
Arguments
| x | an object to be indexed by a training set, validation set, or fold index. If missing, the index itself will be returned. | 
| fold | Fold; the fold used to do the indexing. If missing,  | 
Value
The elements of x corresponding to the indexes, or the
indexes themselves if x is missing.
See Also
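Examples
A minimal sketch of the accessors, assuming the origami package is installed. It uses the built-in mtcars data and an arbitrary choice of four folds.

```r
library(origami)

set.seed(1)
folds <- make_folds(mtcars, V = 4)
fold <- folds[[1]]

# With x supplied, the helpers subset x by the fold's indexes.
train_data <- training(mtcars, fold)
valid_data <- validation(mtcars, fold)

# Without x, the helpers return the indexes themselves.
valid_idx <- validation(fold = fold)

nrow(train_data) + nrow(valid_data) # each row falls in exactly one set
fold_index(fold = fold)             # position of this fold in the list
```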
Build a Fold Vector from a Fold Object
Description
For V-fold type cross-validation. This takes a fold object and returns a fold
vector (containing the validation set IDs) for use with other tools like
cv.glmnet.
Usage
folds2foldvec(folds)
Arguments
| folds | A  | 
See Also
Other fold generation functions: 
fold_from_foldvec(),
fold_funs,
make_folds(),
make_repeated_folds()
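Examples
A minimal sketch, assuming the origami package is installed. The number of observations and folds are arbitrary.

```r
library(origami)

set.seed(1)
n <- 20
folds <- make_folds(n, V = 5)

# One entry per observation, giving the fold whose validation set contains it.
foldvec <- folds2foldvec(folds)
table(foldvec)

# The vector can then be handed to tools that accept fold IDs, for example
# the foldid argument of glmnet::cv.glmnet (not run here).
```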
Flexible Guessing and Mapping for Combining Data Types
Description
Maps data types into standard combiners that should be sensible.
Usage
guess_combiner(result)
Arguments
| result | A single result; flexibly accepts several object classes. | 
Value
A function to combine a list of such results.
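Examples
A minimal sketch, assuming the origami package is installed. The fold result below is hypothetical and mirrors the shape returned by the cv_lm example elsewhere in this manual.

```r
library(origami)

# A hypothetical result from a single fold.
fold_result <- list(
  coef = data.frame(intercept = 1.0, slope = 2.0),
  SE = c(0.5, 1.2)
)

# Guess one combiner per element of the result.
combiners <- lapply(fold_result, guess_combiner)
sapply(combiners, is.function)
```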
Convert ID Folds to Observation Folds
Description
This function converts folds that subset ids to folds that subset observations.
Usage
id_folds_to_folds(idfolds, cluster_ids)
Arguments
| idfolds | folds that subset ids | 
| cluster_ids | a vector of cluster ids indicating which observations are in which clusters | 
Fold
Description
Functions to make a fold. Current representation is a simple list.
Usage
make_fold(v, training_set, validation_set)
Arguments
| v | An integer index of folds in the larger scheme. | 
| training_set | An integer vector of indexes corresponding to the training set. | 
| validation_set | An integer vector of indexes corresponding to the validation set. | 
Value
A list containing these elements.
See Also
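Examples
A minimal sketch of building a fold by hand, assuming the origami package is installed. The split of ten observations into eight training and two validation indexes is arbitrary.

```r
library(origami)

# Hand-built fold: observations 1 to 8 train, 9 and 10 validate.
fold <- make_fold(v = 1, training_set = 1:8, validation_set = 9:10)

# Use the fold to subset a vector of ten values.
x <- letters[1:10]
training(x, fold)
validation(x, fold)
```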
Make List of Folds for Cross-Validation
Description
Generates a list of folds for a variety of cross-validation schemes.
Usage
make_folds(
  n = NULL,
  fold_fun = folds_vfold,
  cluster_ids = NULL,
  strata_ids = NULL,
  ...
)
Arguments
| n | - either an integer indicating the number of observations to
cross-validate over, or an object from which to guess the number of
observations; can also be computed from  | 
| fold_fun | - A function indicating the cross-validation scheme to use.
See  | 
| cluster_ids | - a vector of cluster ids. Clusters are treated as a unit – that is, all observations within a cluster are placed in either the training or validation set. | 
| strata_ids | - a vector of strata ids. Strata are balanced: insofar as possible the distribution in the sample should be the same as the distribution in the training and validation sets. | 
| ... | other arguments to be passed to  | 
Value
A list of folds objects. Each fold consists of a list with a
training index vector, a validation index vector, and a
fold_index (its order in the list of folds).
See Also
Other fold generation functions: 
fold_from_foldvec(),
fold_funs,
folds2foldvec(),
make_repeated_folds()
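Examples
A minimal sketch of clustered folds, assuming the origami package is installed. The cluster layout (12 observations in 4 clusters of 3) is made up for illustration.

```r
library(origami)

set.seed(1)
# Hypothetical data: 12 observations in 4 clusters of 3.
cluster_ids <- rep(1:4, each = 3)
folds <- make_folds(n = 12, cluster_ids = cluster_ids, V = 2)

# For each fold, list clusters that appear in both training and validation.
shared <- lapply(folds, function(f) {
  intersect(
    cluster_ids[training(fold = f)],
    cluster_ids[validation(fold = f)]
  )
})
lengths(shared)
```

Because clusters are treated as a unit, no cluster should appear on both sides of any fold.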
Repeated Cross-Validation
Description
Implementation of repeated window cross-validation: generates fold objects
for repeated cross-validation by making repeated calls to
make_folds and concatenating the results.
Usage
make_repeated_folds(repeats, ...)
Arguments
| repeats | An integer indicating the number of repeats. | 
| ... | Arguments passed to  | 
See Also
Other fold generation functions: 
fold_from_foldvec(),
fold_funs,
folds2foldvec(),
make_folds()
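Examples
A minimal sketch, assuming the origami package is installed. The number of repeats, observations, and folds are arbitrary.

```r
library(origami)

set.seed(1)
# Three repeats of 5-fold cross-validation over 20 observations;
# n and V are passed through to make_folds.
rep_folds <- make_repeated_folds(repeats = 3, n = 20, V = 5)
length(rep_folds)
```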
Wrap a Function in a Try Statement
Description
Function factory that generates versions of functions wrapped in try.
Usage
wrap_in_try(fun, ...)
Arguments
| fun | A  | 
| ... | Additional arguments passed to the previous argument  |
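Examples
A minimal sketch, assuming the origami package is installed. The name safe_log is hypothetical and used only for illustration.

```r
library(origami)

# Wrap a function that fails on non-numeric input.
safe_log <- wrap_in_try(function(x) log(x))

ok <- safe_log(10)   # succeeds and returns a number
bad <- safe_log("a") # fails, but the error is captured rather than thrown

inherits(bad, "try-error")
```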