---
title: "Step and Cadence Analysis"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Step and Cadence Analysis}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

GGIR does not have a built-in step detection algorithm, but it does facilitate the [embedding of an external function](https://wadpac.github.io/GGIR/articles/ExternalFunction.html) for event detection such as steps. Such algorithms typically count steps per epoch, where an epoch is usually a time interval of 5 seconds or longer. Steps and cadence per epoch alone are not of direct value for research. Therefore, GGIR facilitates the extraction of summary statistics, which are discussed below.


## Verisense algorithm

The only external algorithm for step detection currently available for GGIR is the Verisense algorithm, which was designed for data collected on the wrist. The algorithm has been described and evaluated in studies by [Maylor 2022](https://doi.org/10.3390/s22249984) and [Rowlands 2022](https://doi.org/10.1080/02640414.2022.2147134).

The original [Verisense algorithm code](https://github.com/ShimmerEngineering/Verisense-Toolbox/tree/master/Verisense_step_algorithm) is not actively maintained at the time of writing this documentation. An [improved copy of the Verisense algorithm code](https://github.com/wadpac/GGIR/blob/master/user-scripts/verisense_count_steps.R) with minor bug fixes is part of the GGIR GitHub repository.


### How to use the Verisense algorithm?

To use the algorithm, copy-paste the following code to the top of your R script and update the file path in the first line to the Verisense function file on your computer. A discussion of this code will follow further down.

```
source("C:/path_to_file/verisense_count_steps.R")
myfun = list(FUN = verisense_count_steps,
             parameters = c(4, 4, 20, -1.0, 4, 4, 0.01, 1.25),
             expected_sample_rate = 15,
             expected_unit = "g",
             colnames = "step_count",
             outputres = 1,
             minlength = 1,
             outputtype = "numeric",
             aggfunction = sum, # aggregate step by taking sum
             timestamp = FALSE,
             reporttype = "event",
             ilevels = c(0, 100, 250), # acceleration levels to be used
             clevels = c(0, 80, 100, 120), # cadence levels to be used
             qlevels = c(0.5, 0.9), # quantiles to be used
             ebout.dur = c(1, 5, 10), # event bout duration to extract
             ebout.th.cad = 30, # event bout threshold for cadence
             ebout.th.acc = 50, # event bout threshold for acceleration
             ebout.criter = 0.8, # event bout criteria (same as boutcriter)
             ebout.condition = "AND") # event bout logic (see below)
   
```

Note that `parameters = c(4, 4, 20, -1.0, 4, 4, 0.01, 1.25)` is based on Rowlands et al. "Stepping up with GGIR" from 2022.

Next, use GGIR as you normally do, but include `myfun = myfun` as input argument. If we rely on the GGIR default values for all other parameters and segment each day into 6 equal parts, this looks as follows:

```
GGIR(myfun = myfun,
     outputdir = "C:/myresults",
     datadir = "C:/mydata",
     qwindow = c(0, 4, 8, 12, 16, 20, 24))
```


## Time resolution for deriving statistics

### Per day or groups of days

All step and cadence statistics are derived as a summary per recording day and for the recording as a whole, and stored in **results/part2_dayeventsummary.csv** and **results/part2_eventsummary.csv**, respectively. The column name extensions **AD_**, **WD_**, **WE_**, **WWD_**, and **WWE_** inside the file **part2_eventsummary.csv** have the same meaning as elsewhere in the [GGIR part2 output](https://wadpac.github.io/GGIR/articles/chapter7_DescribingDataWithoutKnowingSleep.html?q=WWD#related-output), e.g. AD means aggregation across all recording days.

### Per day segment

When you use GGIR's [day segment analysis functionality](https://wadpac.github.io/GGIR/articles/TutorialDaySegmentAnalyses.html) the statistics are also derived per time segment within a day. The time segment for which each statistic is derived is clarified in the ending of a column name as **\_startHour-endHourhr**. For example, **\_0-14hr** indicates that it was extracted from data between midnight and 14:00.
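
As an illustration of this naming convention, the sketch below builds the column name endings that would be expected for the `qwindow` value used earlier. The helper `segment_suffix` is hypothetical and not part of GGIR; it only assumes that each pair of consecutive `qwindow` values defines one segment.

```r
# Hypothetical helper, not part of GGIR: derive the column name endings
# for the segments defined by consecutive qwindow values.
segment_suffix = function(qwindow) {
  starts = head(qwindow, -1)
  ends = tail(qwindow, -1)
  paste0("_", starts, "-", ends, "hr")
}

qwindow = c(0, 4, 8, 12, 16, 20, 24)
suffixes = segment_suffix(qwindow)
print(suffixes)
```

Such a suffix can then be used to select the columns of one segment, for example with `grepl("_0-4hr$", colnames(mydata))`, where `mydata` stands for a data frame you loaded yourself.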


## Step count and cadence summary statistics

Total step count is stored with **tot_step_count** in the column name. Cadence is derived in the unit 'steps per minute', even if the epoch size for step detection is not 1 minute. Cadence is abbreviated as **cad** in the column name. Mean cadence is indicated as **mn_cad**.
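
To make the unit conversion concrete, the following minimal sketch converts toy step counts per epoch to cadence in steps per minute. The 5 second epoch length and the step values are assumptions chosen for illustration only.

```r
# Convert step counts per epoch to cadence in steps per minute.
epoch_size_sec = 5                   # assumed epoch length in seconds
steps_per_epoch = c(0, 8, 9, 10, 0)  # toy step counts, one value per epoch

# Each epoch count is scaled by the number of epochs that fit in a minute.
cadence_spm = steps_per_epoch * (60 / epoch_size_sec)
tot_step_count = sum(steps_per_epoch)

print(cadence_spm)
print(tot_step_count)
```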

Note that [Chapter 6](https://wadpac.github.io/GGIR/articles/chapter6_DataImputation.html) discusses how acceleration metrics are imputed based on the mean of valid time points on other days of the recording. For steps and cadence this would be problematic, as the average of walking and non-walking is not informative. Therefore, steps are imputed with the median instead, which represents typical stepping behaviour at that time of the day.
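
The difference between the two approaches can be seen in a toy example. The values below are made up and represent the step count for the same epoch on five other days of a recording, where the person walked at that time on only one day.

```r
# Toy step counts for the same 5 second epoch on five other days.
steps_same_time_other_days = c(0, 0, 0, 9, 0)

# The mean gives a step count that was never observed on any day,
# while the median reflects the typical behaviour (no stepping).
imputed_mean = mean(steps_same_time_other_days)
imputed_median = median(steps_same_time_other_days)

print(imputed_mean)
print(imputed_median)
```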


### Stratified by acceleration and/or cadence level

#### Per acceleration level

As discussed in [Chapter 7](https://wadpac.github.io/GGIR/articles/chapter7_DescribingDataWithoutKnowingSleep.html) accelerometer data can be described in terms of acceleration levels. As such, we can also describe step count and cadence per acceleration level.

To control this aspect, include in your `myfun` object the item `ilevels` as a numeric vector with the acceleration values to define the acceleration levels, e.g. `ilevels = c(0, 50)`.

Output:

- Mean cadence for the acceleration range 0-50 m_g_ when working with acceleration metric `ENMO` will be stored with column name **mn_cad_acc0-50mg_ENMO**. Note: The step bout detection functionality automatically uses the acceleration metrics specified by the user. So, if you want to try it out with the ENMO, ENMOa, and MAD metrics, then that should work.

- Total step count for the acceleration range 0-50 m_g_ when working with acceleration metric `ENMO` will be stored with column name **tot_step_count_acc0-50mg_ENMO**. Here **step_count** is the value as provided via `myfun` parameter `colnames`.
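
The idea of stratifying by acceleration level can be sketched in base R as follows. This is not GGIR's internal code: the toy data, the extra level boundary at 100, and the choice to treat the lower boundary of each level as inclusive are all assumptions made for illustration.

```r
# Toy epoch level data (assumed 5 second epochs).
epoch_size_sec = 5
ENMO_mg = c(10, 20, 60, 80, 30, 70)
step_count = c(0, 2, 9, 10, 4, 8)
ilevels = c(0, 50, 100)  # assumed levels: 0-50 and 50-100 mg

cadence_spm = step_count * (60 / epoch_size_sec)

# Assign each epoch to an acceleration level (lower bound inclusive here).
level_labels = paste0(head(ilevels, -1), "-", tail(ilevels, -1))
acc_level = cut(ENMO_mg, breaks = ilevels, labels = level_labels, right = FALSE)

# Summarise steps and cadence per level and name the results
# following the column name pattern described above.
tot_step_count = sapply(split(step_count, acc_level), sum)
mn_cad = sapply(split(cadence_spm, acc_level), mean)
names(tot_step_count) = paste0("tot_step_count_acc", level_labels, "mg_ENMO")
names(mn_cad) = paste0("mn_cad_acc", level_labels, "mg_ENMO")

print(tot_step_count)
print(mn_cad)
```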


#### Per cadence level with absolute thresholds

Similarly, we can also describe the data based on cadence level and use absolute cadence thresholds. 

To control this aspect, include in your `myfun` object item `clevels` as a numeric vector with the cadence values that define the cadence levels, e.g. `clevels = c(0, 30, 120)`.

Output:

- Total step count for the cadence range 0-30spm will be stored with column name **tot_step_count_cad0-30spm**.
- Time spent in the cadence range 0-30spm will be stored with column name **dur_cad0-30spm**.
- Mean acceleration in the cadence range 0-30spm will be stored with column name **mn_ENMO_cad0-30spm**.
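
A minimal sketch of how these three statistics relate to epoch level data is given below. It is not GGIR's implementation: the toy data, the inclusive lower boundary of each level, and the expression of duration in minutes are assumptions.

```r
# Toy epoch level data (assumed 5 second epochs).
epoch_size_sec = 5
step_count = c(0, 1, 2, 8, 9, 0)
ENMO_mg = c(5, 20, 30, 90, 110, 10)
clevels = c(0, 30, 120)

cadence_spm = step_count * (60 / epoch_size_sec)

# Assign each epoch to a cadence level (lower bound inclusive here).
level_labels = paste0(head(clevels, -1), "-", tail(clevels, -1))
cad_level = cut(cadence_spm, breaks = clevels, labels = level_labels, right = FALSE)

# One row per cadence level: total steps, time spent, mean acceleration.
summary_cad = data.frame(
  tot_step_count = sapply(split(step_count, cad_level), sum),
  dur_min = sapply(split(step_count, cad_level), length) * epoch_size_sec / 60,
  mn_ENMO = sapply(split(ENMO_mg, cad_level), mean)
)
print(summary_cad)
```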

#### Per cadence level with relative thresholds

Instead of setting absolute thresholds to define cadence levels we can use percentiles.

To control this aspect, include in your `myfun` object the item `qlevels` as a numeric vector with fractions of 1 to define the quantiles, e.g. `qlevels = c(0.5, 0.9)`.

Output:

- Cadence value at each requested percentile of the cadence distribution, e.g. stored as `cad_p50` for `qlevels = 0.5` (50th percentile) and as `cad_p90` for `qlevels = 0.9` (90th percentile).
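
The sketch below shows how such percentiles can be derived with base R's `quantile` function. The cadence values are made up, and the use of R's default quantile type and of all epochs (including those without steps) are assumptions; GGIR may handle these details differently.

```r
# Toy cadence values in steps per minute, one value per epoch.
cadence_spm = c(0, 0, 12, 24, 60, 96, 100, 108, 110, 120)
qlevels = c(0.5, 0.9)

# Default quantile type of R; names follow the cad_p pattern described above.
cad_q = quantile(cadence_spm, probs = qlevels, names = FALSE)
names(cad_q) = paste0("cad_p", round(qlevels * 100))

print(cad_q)
```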


#### Per most and least active consecutive cadence time window

The same statistics can also be derived from the consecutive X hours with the least and most cadence per (segment of a) day.

To control this aspect, include in your `myfun` object the item `winhr`, e.g. `winhr = c(5, 10)` which works the same as [`winhr` in GGIR itself](https://wadpac.github.io/GGIR/articles/GGIRParameters.html#winhr).

Output:

- The X hour window with the least amount of cadence will be stored with column name **LX_cad_**, e.g. `L10_cad_`.
- The X hour window with the most amount of cadence will be stored with column name **MX_cad_**, e.g. `M5_cad_`.

This is then followed by:

- **mean_Y** where Y reflects the metric being averaged, which can be an acceleration metric, step count per epoch or the mean cadence. For example, "L10_cad_mean_ENMO_mg_0-24hr", "L10_cad_mean_step_count_mg_0-24hr", "L10_cad_meancad_0-24hr".
- **L10hr_** indicates the timing of the 10 hours with the least cadence.
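
The search for the least and most active window can be sketched as a sliding window over the day. The function `find_window` below is hypothetical and simplified: it works on toy hourly mean cadence values rather than on epoch level data, and it does not let the window wrap around midnight.

```r
# Toy data: mean cadence per hour for one day (24 values).
cad_per_hour = c(rep(0, 7), 40, 60, 20, 10, 10, 30, 10, 10, 20,
                 50, 80, 30, 5, 5, 0, 0, 0)

# Hypothetical helper: find the winhr consecutive hours with the
# lowest ("least") or highest ("most") mean cadence.
find_window = function(x, winhr, type = c("least", "most")) {
  type = match.arg(type)
  n_start = length(x) - winhr + 1
  win_mean = sapply(seq_len(n_start), function(i) mean(x[i:(i + winhr - 1)]))
  start = if (type == "least") which.min(win_mean) else which.max(win_mean)
  # start_hour is expressed as hour of the day, counting from 0
  list(start_hour = start - 1, mean_cad = win_mean[start])
}

L5 = find_window(cad_per_hour, winhr = 5, type = "least")
M5 = find_window(cad_per_hour, winhr = 5, type = "most")
print(L5)
print(M5)
```

When several windows share the same mean, this sketch returns the earliest one.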

### Walking bouts

Similar to the detection of physical activity bouts as discussed in [Chapter 11](https://wadpac.github.io/GGIR/articles/chapter11_DescribingDataCutPoints.html) we can detect bouts of stepping behaviour. In GGIR we define these based on duration, minimum cadence AND/OR minimum acceleration and the fraction of the bout for which these criteria need to be met.

The step bout detection functionality automatically uses all acceleration metrics specified by the user. So, if you want to try it out with the ENMO, ENMOa, and MAD metrics, then that should work.

To control the criteria for bout detection, include in your `myfun` object the items:

- `ebout.dur` a numeric vector of length 1 or larger with the minimum bout duration(s) of interest, e.g. `ebout.dur = c(1, 5, 10)`.
- `ebout.th.cad` a single number being the minimum cadence value in steps per minute, e.g. `ebout.th.cad = 30`.
- `ebout.th.acc` a single number being the minimum acceleration value in m_g_, e.g. `ebout.th.acc = 50`.
- `ebout.criter` a single number as a fraction of 1 being the fraction of a bout for which the inclusion criteria need to be met, identical to `boutcriter` in the context of physical activity bouts, e.g. `ebout.criter = 0.8`.
- `ebout.condition` whether both the cadence and the acceleration condition are required (`ebout.condition = "AND"`) or whether meeting either one of them is sufficient (`ebout.condition = "OR"`). If you do not want cadence or acceleration to be used in the equation then simply set `ebout.th.cad = 0` or `ebout.th.acc = 0`, respectively.
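
To illustrate how these items interact, the sketch below implements a simplified bout detector. It is not GGIR's implementation: the function `detect_bouts`, the toy data, and the rule that a bout window must start and end with an epoch that meets the criteria are assumptions made for illustration.

```r
# Hypothetical, simplified bout detection (not GGIR's own code).
detect_bouts = function(cadence, acc, epoch_size_sec, ebout.dur, ebout.th.cad,
                        ebout.th.acc, ebout.criter, ebout.condition = "AND") {
  meets_cad = cadence >= ebout.th.cad
  meets_acc = acc >= ebout.th.acc
  meets = if (ebout.condition == "AND") meets_cad & meets_acc else meets_cad | meets_acc
  win = ebout.dur * 60 / epoch_size_sec  # minimum bout length in epochs
  in_bout = rep(FALSE, length(meets))
  if (length(meets) >= win) {
    for (i in seq_len(length(meets) - win + 1)) {
      idx = i:(i + win - 1)
      # Window must start and end with a qualifying epoch, and the
      # fraction of qualifying epochs must reach ebout.criter.
      if (meets[i] && meets[i + win - 1] && mean(meets[idx]) >= ebout.criter) {
        in_bout[idx] = TRUE
      }
    }
  }
  in_bout
}

# Toy data: 20 epochs of 5 seconds with one short interruption in walking.
cadence_spm = c(rep(0, 4), rep(100, 5), 0, rep(100, 6), rep(0, 4))
ENMO_mg = c(rep(10, 4), rep(80, 5), 10, rep(80, 6), rep(10, 4))

in_bout = detect_bouts(cadence_spm, ENMO_mg, epoch_size_sec = 5, ebout.dur = 1,
                       ebout.th.cad = 30, ebout.th.acc = 50,
                       ebout.criter = 0.8, ebout.condition = "AND")

runs = rle(in_bout)
bout_number = sum(runs$values)
bout_totdur_min = sum(in_bout) * 5 / 60
print(which(in_bout))
print(c(bout_number = bout_number, bout_totdur_min = bout_totdur_min))
```

In this toy example the single interrupted epoch is tolerated because 11 of the 12 epochs in the window meet both thresholds, which exceeds the required fraction of 0.8.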

Output:

Column names related to detected bouts start with the word **Bout_**, e.g. "Bout_meandur_E5S_B10M80%_cadT30_AND_accT50_ENMO", "Bout_number_E5S_B5M80%_cadT30_AND_accT50_ENMO" and "Bout_totdur_E5S_B5M80%_cadT30_AND_accT50_ENMO".

- Average bout duration is indicated with **Bout_meandur_** in the column name.
- The total duration is indicated with **Bout_totdur_** in the column name.
- The number of bouts is indicated with **Bout_number_** in the column name.
- Underlying epoch size is indicated with **E5S** in the column name, where **5S** refers to 5-second epochs.
- Minimum bout duration is indicated with **B10M** in the column name, where **10M** refers to 10 minutes.
- The minimum fraction for which the bout conditions need to be met is indicated as a percentage in the column name, e.g. **80%**.
- The minimum cadence and acceleration values are presented in the column name as **cadT30** and **accT50_ENMO** when using thresholds of 30 and 50, respectively.
- The setting of parameter **ebout.condition** is directly copied into the variable name as **AND** or **OR**.
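
The naming scheme above can be summarised with a small helper that assembles a column name from the bout settings. The function `bout_colname` is hypothetical and only mirrors the pattern seen in the example column names; it can be handy for selecting columns from the output programmatically.

```r
# Hypothetical helper, not part of GGIR: assemble a bout column name
# from the settings, following the pattern of the examples above.
bout_colname = function(stat, epoch_size_sec, ebout.dur, ebout.criter,
                        ebout.th.cad, ebout.condition, ebout.th.acc, metric) {
  paste0("Bout_", stat,
         "_E", epoch_size_sec, "S",
         "_B", ebout.dur, "M", round(ebout.criter * 100), "%",
         "_cadT", ebout.th.cad,
         "_", ebout.condition,
         "_accT", ebout.th.acc, "_", metric)
}

name_meandur = bout_colname("meandur", epoch_size_sec = 5, ebout.dur = 10,
                            ebout.criter = 0.8, ebout.th.cad = 30,
                            ebout.condition = "AND", ebout.th.acc = 50,
                            metric = "ENMO")
print(name_meandur)
```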