This package adds resampling methods for the {mlr3} package framework suited for spatial, temporal and spatiotemporal data. These methods can help to reduce the influence of autocorrelation on performance estimates when performing cross-validation. While this article gives a rather technical introduction to the package, a more applied approach can be found in the mlr3book section on “Spatiotemporal Analysis”.
After loading the package via
library("mlr3spatiotempcv"), the spatiotemporal resampling
methods and example tasks provided by {mlr3spatiotempcv} are available
to the user alongside the default {mlr3} resampling methods and
tasks.
To make use of spatial resampling methods, an {mlr3} task that is
aware of its spatial characteristics needs to be created. Two
Task child classes exist in {mlr3spatiotempcv} for this
purpose:
- TaskClassifST
- TaskRegrST

To create one of these, you have multiple options:
- Construct the Task directly via $new(); this only works for data.table backends (!)
- Use the as_task_* converters (e.g. if your data is stored in an sf object)

We recommend the latter, as the as_task_* converters aim
to make task construction easier, e.g., by creating the
DataBackend (which is required to create a Task in {mlr3})
automatically and setting the crs and
coordinate_names fields. Let’s assume your (point) data is
stored in an sf object, which is a common scenario for
spatial analysis in R.
# create 'sf' object
data_sf = sf::st_as_sf(ecuador, coords = c("x", "y"), crs = 32717)
# create `TaskClassifST` from `sf` object
task = as_task_classif_st(data_sf, id = "ecuador_task", target = "slides", positive = "TRUE")

You can also use a plain data.frame. In this case,
crs and coordinate_names need to be passed
explicitly, as they cannot be inferred from the data the way they can
from an sf object:
task = as_task_classif_st(ecuador, id = "ecuador_task", target = "slides",
  positive = "TRUE", coordinate_names = c("x", "y"), crs = 32717)

The *ST task family prints a subset of the coordinates
by default:
print(task)
#> 
#> ── <TaskClassifST> (751x11) ────────────────────────────────────────────────────
#> • Target: slides
#> • Properties: twoclass
#> • Features (10):
#>   • dbl (10): carea, cslope, dem, distdeforest, distroad, distslidespast,
#>   hcurv, log.carea, slope, vcurv
#> * Coordinates:
#>             x       y
#>         <num>   <num>
#>   1: 712882.5 9560002
#>   2: 715232.5 9559582
#>   3: 715392.5 9560172
#>   4: 715042.5 9559312
#>   5: 715382.5 9560142
#>  ---                 
#> 747: 714472.5 9558482
#> 748: 713142.5 9560992
#> 749: 713322.5 9560562
#> 750: 715392.5 9557932
#> 751: 713802.5 9560862

All *ST tasks can be treated as their superclass
equivalents TaskClassif or TaskRegr in
subsequent {mlr3} modeling steps.
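
As a minimal sketch of what "treated as their superclass equivalents" means in practice, the following passes the bundled example task to the regular {mlr3} `resample()` workflow. The choice of learner (`classif.featureless`), resampling (`spcv_coords` with 4 folds) and measure (`classif.ce`) is an assumption made for illustration only, not a recommendation from the package authors.

```r
library(mlr3)
library(mlr3spatiotempcv)

# Example task shipped with {mlr3spatiotempcv}
task = tsk("ecuador")

# A TaskClassifST is also a TaskClassif
is_classif = inherits(task, "TaskClassif")
print(is_classif)

# Spatial fields that were set during task construction
print(task$coordinate_names)
print(task$crs)

# Standard {mlr3} workflow; only the resampling method is spatial.
# Learner, fold count and measure are illustrative choices.
learner = lrn("classif.featureless")
resampling = rsmp("spcv_coords", folds = 4)
rr = resample(task, learner, resampling)
score = rr$aggregate(msr("classif.ce"))
print(score)
```

Any other classification learner or measure from the {mlr3} ecosystem can be substituted without changing the rest of the code.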
In {mlr3}, dictionaries give an overview of the available methods. The following sections show which dictionaries get new entries when {mlr3spatiotempcv} is loaded.

Task types:
- TaskClassifST
- TaskRegrST
mlr_reflections$task_types
#> Key: <type>
#>            type          package             task        learner
#>          <char>           <char>           <char>         <char>
#> 1:      classif             mlr3      TaskClassif LearnerClassif
#> 2:   classif_st mlr3spatiotempcv    TaskClassifST LearnerClassif
#> 3:         regr             mlr3         TaskRegr    LearnerRegr
#> 4:      regr_st mlr3spatiotempcv       TaskRegrST    LearnerRegr
#> 5: unsupervised             mlr3 TaskUnsupervised        Learner
#>           prediction       prediction_data        measure
#>               <char>                <char>         <char>
#> 1: PredictionClassif PredictionDataClassif MeasureClassif
#> 2: PredictionClassif PredictionDataClassif MeasureClassif
#> 3:    PredictionRegr    PredictionDataRegr    MeasureRegr
#> 4:    PredictionRegr    PredictionDataRegr    MeasureRegr
#> 5:              <NA>                  <NA>           <NA>

Task column roles:
- coordinate
- space
- time
mlr_reflections$task_col_roles
#> $regr
#> [1] "feature"         "target"          "name"            "order"          
#> [5] "stratum"         "group"           "offset"          "weights_learner"
#> [9] "weights_measure"
#> 
#> $classif
#> [1] "feature"         "target"          "name"            "order"          
#> [5] "stratum"         "group"           "offset"          "weights_learner"
#> [9] "weights_measure"
#> 
#> $unsupervised
#> [1] "feature" "name"    "order"  
#> 
#> $classif_st
#>  [1] "feature"         "target"          "name"            "order"          
#>  [5] "stratum"         "group"           "offset"          "weights_learner"
#>  [9] "weights_measure" "coordinate"      "space"           "time"           
#> 
#> $regr_st
#>  [1] "feature"         "target"          "name"            "order"          
#>  [5] "stratum"         "group"           "offset"          "weights_learner"
#>  [9] "weights_measure" "coordinate"      "space"           "time"

Resampling methods:
- mlr_resamplings_spcv_block
- mlr_resamplings_spcv_buffer
- mlr_resamplings_spcv_coords
- mlr_resamplings_spcv_knndm
- mlr_resamplings_spcv_disc
- mlr_resamplings_spcv_tiles
- mlr_resamplings_spcv_env
- mlr_resamplings_sptcv_cstf

and their respective repeated versions. See
as.data.table(mlr_resamplings) for the full dictionary.
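
The dictionary can also be queried programmatically. The sketch below filters the keys of `mlr_resamplings` with a regular expression; the pattern `"spcv|sptcv"` is an assumption based on the method names listed above, and `spcv_coords` with 5 folds is an arbitrary example.

```r
library(mlr3)
library(mlr3spatiotempcv)

# Keys of all registered resampling methods whose name contains
# "spcv" or "sptcv" (this includes the "repeated_" variants)
spatial_keys = mlr_resamplings$keys("spcv|sptcv")
print(spatial_keys)

# Any key can be passed to rsmp() to construct the resampling object
resampling = rsmp("spcv_coords", folds = 5)
print(resampling)
```

Note that the dictionary keys omit the `mlr_resamplings_` prefix used in the help page names.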

Example tasks:
- tsk("ecuador") (spatial, classif)
- tsk("cookfarm_mlr3") (spatiotemporal, regr)

The following table lists all spatiotemporal methods implemented in
{mlr3spatiotempcv} (or {mlr3}), their upstream R package and scientific
references. All methods besides "spcv_buffer" also have a
corresponding “repeated” method.
| Category | (Package) Method Name | Reference | mlr3 Notation | 
|---|---|---|---|
| Buffering, spatial | (blockCV) Spatial Buffering | Valavi et al. (2018) | mlr_resamplings_spcv_buffer | 
| Buffering, spatial | (sperrorest) Spatial Disc | Brenning (2012) | mlr_resamplings_spcv_disc | 
| Blocking, spatial | (blockCV) Spatial Blocking | Valavi et al. (2018) | mlr_resamplings_spcv_block | 
| Blocking, spatial | (sperrorest) Spatial Tiles | Brenning (2012) | mlr_resamplings_spcv_tiles | 
| Clustering, spatial | (sperrorest) Spatial CV | Brenning (2012) | mlr_resamplings_spcv_coords | 
| Clustering, spatial | (CAST) KNNDM | Linnenbrink et al. (2023) | mlr_resamplings_spcv_knndm | 
| Clustering, feature-space | (blockCV) Environmental Blocking | Valavi et al. (2018) | mlr_resamplings_spcv_env | 
| Grouping, predefined indices | (mlr3) Predefined partitions | | mlr_resamplings_custom_cv |
| Grouping, spatiotemporal | (mlr3) via col_roles "group" | | mlr_resamplings_cv, Task$set_col_roles(<variable>, "group") |
| Grouping, spatiotemporal | (CAST) Leave-Location-and-Time-Out | Meyer et al. (2018) | mlr_resamplings_sptcv_cstf, Task$set_col_roles(<variable>, "space\|time") |
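
The grouping rows of the table rely on column roles rather than on a dedicated resampling class. As a rough sketch, the following assigns the "group" role to one column of the `cookfarm_mlr3` example task and checks that regular `rsmp("cv")` then keeps each group within a single test fold. The column name `"SOURCEID"` (assumed here to identify the measurement location) and the fold count are assumptions for illustration; substitute a column from your own data.

```r
library(mlr3)
library(mlr3spatiotempcv)

task = tsk("cookfarm_mlr3")

# Assumption: "SOURCEID" is a column of this task that identifies locations.
# Observations sharing a value are kept together during resampling.
task$set_col_roles("SOURCEID", roles = "group")

cv = rsmp("cv", folds = 3)
cv$instantiate(task)

# Collect the groups that occur in each test fold
groups = task$groups
test_groups = lapply(seq_len(cv$iters), function(i) {
  unique(groups$group[groups$row_id %in% cv$test_set(i)])
})

# No group should show up in more than one test fold
groups_disjoint = anyDuplicated(unlist(test_groups)) == 0
print(groups_disjoint)

# For mlr_resamplings_sptcv_cstf, the same set_col_roles() call is used
# with roles = "space" and/or roles = "time" instead of "group".
```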