| Title: | Random Forest with Multivariate Longitudinal Predictors | 
| Version: | 1.2.0 | 
| Description: | Based on random forest principle, 'DynForest' is able to include multiple longitudinal predictors to provide individual predictions. Longitudinal predictors are modeled through the random forest. The methodology is fully described for a survival outcome in: Devaux, Helmer, Genuer & Proust-Lima (2023) <doi:10.1177/09622802231206477>. | 
| Imports: | DescTools, cli, cmprsk, doParallel, doRNG, foreach, ggplot2, lcmm, methods, pbapply, pec, prodlim, stringr, survival, zoo | 
| Depends: | R (≥ 4.4.0) | 
| License: | LGPL (≥ 3) | 
| LazyData: | true | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.3.2 | 
| URL: | https://github.com/anthonydevaux/DynForest | 
| BugReports: | https://github.com/anthonydevaux/DynForest/issues | 
| Suggests: | knitr, rmarkdown | 
| VignetteBuilder: | knitr | 
| NeedsCompilation: | no | 
| Packaged: | 2024-10-23 09:22:36 UTC; antho | 
| Author: | Anthony Devaux | 
| Maintainer: | Anthony Devaux <anthony.devauxbarault@gmail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2024-10-23 10:50:02 UTC | 
Compute the grouped importance of variables (gVIMP) statistic
Description
Compute the grouped importance of variables (gVIMP) statistic
Usage
compute_gvimp(
  dynforest_obj,
  IBS.min = 0,
  IBS.max = NULL,
  group = NULL,
  ncores = NULL,
  seed = 1234
)
Arguments
| dynforest_obj | A dynforest object, as returned by dynforest() | 
| IBS.min | (Only with survival outcome) Minimal time to compute the Integrated Brier Score. Default value is set to 0. | 
| IBS.max | (Only with survival outcome) Maximal time to compute the Integrated Brier Score. Default value is set to the maximal time-to-event found. | 
| group | A list of groups with the name of the predictors assigned in each group | 
| ncores | Number of cores used to grow trees in parallel. Default value is the number of cores of the computer minus 1. | 
| seed | Seed to replicate results | 
Value
compute_gvimp() function returns a list with the following elements:
| Inputs | A list of 3 elements: Longitudinal, Numeric and Factor. Each element contains the names of the predictors | 
| group | A list of each group defined in the group argument | 
| gVIMP | A numeric vector containing the gVIMP for each group defined in the group argument | 
| tree_oob_err | A numeric vector containing the OOB error for each tree needed to compute the VIMP statistic | 
| IBS.range | A vector containing the IBS min and max | 
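The idea behind a grouped importance statistic can be illustrated with a generic permutation scheme: all predictors of a group are shuffled together and the resulting loss in prediction accuracy is recorded. The sketch below is only an illustration under stated assumptions. It uses a linear model and a held-out set instead of a random forest and OOB samples, and the names `grouped_importance` and `toy_gvimp` are hypothetical; it is not the computation performed by compute_gvimp().

```r
# Generic grouped permutation importance, illustrated with a linear model.
# Assumption: this is NOT the DynForest implementation; all names are hypothetical.
set.seed(1234)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), z1 = rnorm(n), z2 = rnorm(n))
dat$y <- 2 * dat$x1 - 1.5 * dat$x2 + rnorm(n, sd = 0.5)  # z1 and z2 are pure noise

train <- dat[1:100, ]
test  <- dat[101:200, ]
fit <- lm(y ~ x1 + x2 + z1 + z2, data = train)

# Mean squared prediction error on a data set
mse <- function(model, data) mean((data$y - predict(model, newdata = data))^2)

# Increase in error after jointly permuting all predictors of one group
grouped_importance <- function(model, data, vars) {
  permuted <- data
  idx <- sample(nrow(data))          # one permutation shared by the whole group
  permuted[vars] <- data[idx, vars]  # rows of the group are shuffled together
  mse(model, permuted) - mse(model, data)
}

# Same list structure as the group argument: one character vector per group
group <- list(group1 = c("x1", "x2"), group2 = c("z1", "z2"))
toy_gvimp <- sapply(group, function(v) grouped_importance(fit, test, v))
toy_gvimp
```

The informative group (group1) should show a clearly larger value than the noise group (group2), whose value should stay close to zero.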
Examples
data(pbc2)
# Get Gaussian distribution for longitudinal predictors
pbc2$serBilir <- log(pbc2$serBilir)
pbc2$SGOT <- log(pbc2$SGOT)
pbc2$albumin <- log(pbc2$albumin)
pbc2$alkaline <- log(pbc2$alkaline)
# Sample 100 subjects
set.seed(1234)
id <- unique(pbc2$id)
id_sample <- sample(id, 100)
id_row <- which(pbc2$id%in%id_sample)
pbc2_train <- pbc2[id_row,]
timeData_train <- pbc2_train[,c("id","time",
                                "serBilir","SGOT",
                                "albumin","alkaline")]
# Create object with longitudinal association for each predictor
timeVarModel <- list(serBilir = list(fixed = serBilir ~ time,
                                     random = ~ time),
                     SGOT = list(fixed = SGOT ~ time + I(time^2),
                                 random = ~ time + I(time^2)),
                     albumin = list(fixed = albumin ~ time,
                                    random = ~ time),
                     alkaline = list(fixed = alkaline ~ time,
                                     random = ~ time))
# Build fixed data
fixedData_train <- unique(pbc2_train[,c("id","age","drug","sex")])
# Build outcome data
Y <- list(type = "surv",
          Y = unique(pbc2_train[,c("id","years","event")]))
# Run dynforest function
res_dyn <- dynforest(timeData = timeData_train, fixedData = fixedData_train,
                     timeVar = "time", idVar = "id",
                     timeVarModel = timeVarModel, Y = Y,
                     ntree = 50, nodesize = 5, minsplit = 5,
                     cause = 2, ncores = 2, seed = 1234)
# Compute gVIMP statistic
res_dyn_gVIMP <- compute_gvimp(dynforest_obj = res_dyn,
                               group = list(group1 = c("serBilir","SGOT"),
                                            group2 = c("albumin","alkaline")),
                               ncores = 2, seed = 1234)
Compute the Out-Of-Bag error (OOB error)
Description
Compute the Out-Of-Bag error (OOB error)
Usage
compute_ooberror(dynforest_obj, IBS.min = 0, IBS.max = NULL, ncores = NULL)
Arguments
| dynforest_obj | A dynforest object, as returned by dynforest() | 
| IBS.min | (Only with survival outcome) Minimal time to compute the Integrated Brier Score. Default value is set to 0. | 
| IBS.max | (Only with survival outcome) Maximal time to compute the Integrated Brier Score. Default value is set to the maximal time-to-event found. | 
| ncores | Number of cores used to grow trees in parallel. Default value is the number of cores of the computer minus 1. | 
Value
compute_ooberror() function returns a list with the following elements:
| data | A list containing the data used to grow the trees | 
| rf | A table with each tree in column. Provide multiple characteristics about the tree building | 
| type | Outcome type | 
| times | A numeric vector containing the time-to-event for all subjects | 
| cause | Indicating the cause of interest | 
| causes | A numeric vector containing the causes indicator | 
| Inputs | A list of 3 elements: Longitudinal, Numeric and Factor. Each element contains the names of the predictors | 
| Longitudinal.model | A list of longitudinal markers containing the formula used for modeling in the random forest | 
| param | A list containing the hyperparameters | 
| oob.err | A numeric vector containing the OOB error for each subject | 
| oob.pred | Outcome prediction for all subjects | 
| IBS.range | A vector containing the IBS min and max | 
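To make the role of IBS.min and IBS.max concrete, the following sketch computes a Brier score on a time grid and integrates it over a chosen window with the trapezoidal rule. It is a simplified illustration: censoring is ignored (no inverse-probability weights), the data are simulated, and the helper `integrated_brier` is hypothetical rather than part of the package.

```r
# Toy integrated Brier score over [IBS.min, IBS.max].
# Assumptions: no censoring, simulated data, hypothetical helper names.
set.seed(1234)
n <- 50
event_time <- rexp(n, rate = 0.2)   # simulated event times
times <- seq(0, 10, by = 0.5)       # evaluation grid

# Predicted event probabilities P(T <= t): here the true exponential CDF,
# identical for every subject (one row per subject, one column per time)
pred <- matrix(pexp(times, rate = 0.2), nrow = n, ncol = length(times),
               byrow = TRUE)

# Brier score at each time: mean squared gap between status and prediction
brier <- sapply(seq_along(times), function(j) {
  mean((as.numeric(event_time <= times[j]) - pred[, j])^2)
})

# Trapezoidal integration of the Brier score, scaled by the window length
integrated_brier <- function(times, brier, IBS.min = 0, IBS.max = max(times)) {
  keep <- times >= IBS.min & times <= IBS.max
  t <- times[keep]
  b <- brier[keep]
  area <- sum(diff(t) * (head(b, -1) + tail(b, -1)) / 2)
  area / (max(t) - min(t))
}

ibs_full  <- integrated_brier(times, brier)
ibs_early <- integrated_brier(times, brier, IBS.min = 0, IBS.max = 4)
c(full = ibs_full, early = ibs_early)
```

Narrowing the window changes which part of the follow-up drives the error, which is why the range used is reported alongside the error in IBS.range.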
Examples
data(pbc2)
# Get Gaussian distribution for longitudinal predictors
pbc2$serBilir <- log(pbc2$serBilir)
pbc2$SGOT <- log(pbc2$SGOT)
pbc2$albumin <- log(pbc2$albumin)
pbc2$alkaline <- log(pbc2$alkaline)
# Sample 100 subjects
set.seed(1234)
id <- unique(pbc2$id)
id_sample <- sample(id, 100)
id_row <- which(pbc2$id%in%id_sample)
pbc2_train <- pbc2[id_row,]
timeData_train <- pbc2_train[,c("id","time",
                                "serBilir","SGOT",
                                "albumin","alkaline")]
# Create object with longitudinal association for each predictor
timeVarModel <- list(serBilir = list(fixed = serBilir ~ time,
                                     random = ~ time),
                     SGOT = list(fixed = SGOT ~ time + I(time^2),
                                 random = ~ time + I(time^2)),
                     albumin = list(fixed = albumin ~ time,
                                    random = ~ time),
                     alkaline = list(fixed = alkaline ~ time,
                                     random = ~ time))
# Build fixed data
fixedData_train <- unique(pbc2_train[,c("id","age","drug","sex")])
# Build outcome data
Y <- list(type = "surv",
          Y = unique(pbc2_train[,c("id","years","event")]))
# Run dynforest function
res_dyn <- dynforest(timeData = timeData_train, fixedData = fixedData_train,
                     timeVar = "time", idVar = "id",
                     timeVarModel = timeVarModel, Y = Y,
                     ntree = 50, nodesize = 5, minsplit = 5,
                     cause = 2, ncores = 2, seed = 1234)
# Compute OOB error
res_dyn_OOB <- compute_ooberror(dynforest_obj = res_dyn, ncores = 2)
Extract characteristics from the tree-building process
Description
Extract characteristics from the tree-building process
Usage
compute_vardepth(dynforest_obj)
Arguments
| dynforest_obj | A dynforest object, as returned by dynforest() | 
Value
compute_vardepth() function returns a list with the following elements:
| min_depth | A table providing for each feature in row: the average depth and the rank | 
| var_node_depth | A table providing for each tree in column the minimal depth for each feature in row. NA indicates that the feature was not used for the corresponding tree | 
| var_count | A table providing for each tree in column the number of times where the feature is used (in row). 0 value indicates that the feature was not used for the corresponding tree | 
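As a rough illustration of how the three tables relate, the sketch below derives an average minimal depth and a rank from a small matrix shaped like var_node_depth (features in rows, trees in columns, NA when a feature is unused). The matrix values and feature names are made up, and ranking by the NA-excluded average is an assumption about how such a summary could be built, not a statement about the package internals.

```r
# Hypothetical var_node_depth-like matrix: features in rows, trees in columns.
# NA means the feature was not used in that tree.
var_node_depth <- matrix(
  c(1,  2, NA,
    3, NA,  2,
    2,  1,  1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("featA", "featB", "featC"), paste0("tree", 1:3))
)

# Average minimal depth per feature, ignoring trees where it was not used
avg_depth <- rowMeans(var_node_depth, na.rm = TRUE)

# Smaller average depth means the feature tends to split closer to the root
min_depth <- data.frame(avg_depth = avg_depth,
                        rank = rank(avg_depth),
                        row.names = names(avg_depth))
min_depth <- min_depth[order(min_depth$rank), ]
min_depth
```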
Examples
data(pbc2)
# Get Gaussian distribution for longitudinal predictors
pbc2$serBilir <- log(pbc2$serBilir)
pbc2$SGOT <- log(pbc2$SGOT)
pbc2$albumin <- log(pbc2$albumin)
pbc2$alkaline <- log(pbc2$alkaline)
# Sample 100 subjects
set.seed(1234)
id <- unique(pbc2$id)
id_sample <- sample(id, 100)
id_row <- which(pbc2$id%in%id_sample)
pbc2_train <- pbc2[id_row,]
timeData_train <- pbc2_train[,c("id","time",
                                "serBilir","SGOT",
                                "albumin","alkaline")]
# Create object with longitudinal association for each predictor
timeVarModel <- list(serBilir = list(fixed = serBilir ~ time,
                                     random = ~ time),
                     SGOT = list(fixed = SGOT ~ time + I(time^2),
                                 random = ~ time + I(time^2)),
                     albumin = list(fixed = albumin ~ time,
                                    random = ~ time),
                     alkaline = list(fixed = alkaline ~ time,
                                     random = ~ time))
# Build fixed data
fixedData_train <- unique(pbc2_train[,c("id","age","drug","sex")])
# Build outcome data
Y <- list(type = "surv",
          Y = unique(pbc2_train[,c("id","years","event")]))
# Run dynforest function
res_dyn <- dynforest(timeData = timeData_train, fixedData = fixedData_train,
                     timeVar = "time", idVar = "id",
                     timeVarModel = timeVarModel, Y = Y,
                     ntree = 50, nodesize = 5, minsplit = 5,
                     cause = 2, ncores = 2, seed = 1234)
# Run compute_vardepth function
res_varDepth <- compute_vardepth(res_dyn)
Compute the importance of variables (VIMP) statistic
Description
Compute the importance of variables (VIMP) statistic
Usage
compute_vimp(
  dynforest_obj,
  IBS.min = 0,
  IBS.max = NULL,
  ncores = NULL,
  seed = 1234
)
Arguments
| dynforest_obj | A dynforest object, as returned by dynforest() | 
| IBS.min | (Only with survival outcome) Minimal time to compute the Integrated Brier Score. Default value is set to 0. | 
| IBS.max | (Only with survival outcome) Maximal time to compute the Integrated Brier Score. Default value is set to the maximal time-to-event found. | 
| ncores | Number of cores used to grow trees in parallel. Default value is the number of cores of the computer minus 1. | 
| seed | Seed to replicate results | 
Value
compute_vimp() function returns a list with the following elements:
| Inputs | A list of 3 elements: Longitudinal, Numeric and Factor. Each element contains the names of the predictors | 
| Importance | A list of 3 elements: Longitudinal, Numeric and Factor. Each element contains a numeric vector with the VIMP statistic of each predictor listed in the Inputs value | 
| tree_oob_err | A numeric vector containing the OOB error for each tree needed to compute the VIMP statistic | 
| IBS.range | A vector containing the IBS min and max | 
Examples
data(pbc2)
# Get Gaussian distribution for longitudinal predictors
pbc2$serBilir <- log(pbc2$serBilir)
pbc2$SGOT <- log(pbc2$SGOT)
pbc2$albumin <- log(pbc2$albumin)
pbc2$alkaline <- log(pbc2$alkaline)
# Sample 100 subjects
set.seed(1234)
id <- unique(pbc2$id)
id_sample <- sample(id, 100)
id_row <- which(pbc2$id%in%id_sample)
pbc2_train <- pbc2[id_row,]
timeData_train <- pbc2_train[,c("id","time",
                                "serBilir","SGOT",
                                "albumin","alkaline")]
# Create object with longitudinal association for each predictor
timeVarModel <- list(serBilir = list(fixed = serBilir ~ time,
                                     random = ~ time),
                     SGOT = list(fixed = SGOT ~ time + I(time^2),
                                 random = ~ time + I(time^2)),
                     albumin = list(fixed = albumin ~ time,
                                    random = ~ time),
                     alkaline = list(fixed = alkaline ~ time,
                                     random = ~ time))
# Build fixed data
fixedData_train <- unique(pbc2_train[,c("id","age","drug","sex")])
# Build outcome data
Y <- list(type = "surv",
          Y = unique(pbc2_train[,c("id","years","event")]))
# Run dynforest function
res_dyn <- dynforest(timeData = timeData_train, fixedData = fixedData_train,
                     timeVar = "time", idVar = "id",
                     timeVarModel = timeVarModel, Y = Y,
                     ntree = 50, nodesize = 5, minsplit = 5,
                     cause = 2, ncores = 2, seed = 1234)
# Compute VIMP statistic
res_dyn_VIMP <- compute_vimp(dynforest_obj = res_dyn, ncores = 2, seed = 1234)
data_simu1 dataset
Description
Simulated dataset 1 with continuous outcome
Format
Longitudinal dataset with 1200 rows and 13 columns for 200 subjects
- id
- Subject identifier 
- time
- Time measurement 
- cont_covar1
- Continuous time-fixed predictor 1 
- cont_covar2
- Continuous time-fixed predictor 2 
- bin_covar1
- Binary time-fixed predictor 1 
- bin_covar2
- Binary time-fixed predictor 2 
- marker1
- Continuous time-dependent predictor 1 
- marker2
- Continuous time-dependent predictor 2 
- marker3
- Continuous time-dependent predictor 3 
- marker4
- Continuous time-dependent predictor 4 
- marker5
- Continuous time-dependent predictor 5 
- marker6
- Continuous time-dependent predictor 6 
- Y_res
- Continuous outcome 
Examples
data(data_simu1)
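A dataset in this long format is typically split into a time-dependent part, a time-fixed part and an outcome before being passed to dynforest(). The sketch below shows that split on a made-up data frame that reuses a few of the column names listed above; the values are simulated, and the outcome type string "numeric" is an assumption for a continuous outcome.

```r
# Toy data frame mimicking a few data_simu1 columns (values are made up)
set.seed(1234)
toy <- data.frame(
  id          = rep(1:3, each = 4),
  time        = rep(0:3, times = 3),
  cont_covar1 = rep(rnorm(3), each = 4),     # constant within subject
  bin_covar1  = rep(c(0, 1, 0), each = 4),   # constant within subject
  marker1     = rnorm(12),                   # varies over time
  Y_res       = rep(rnorm(3), each = 4)      # one outcome value per subject
)

# Time-dependent predictors: one row per measurement
timeData <- toy[, c("id", "time", "marker1")]

# Time-fixed predictors: one row per subject, categorical ones as factor
fixedData <- unique(toy[, c("id", "cont_covar1", "bin_covar1")])
fixedData$bin_covar1 <- factor(fixedData$bin_covar1)

# Outcome list; type = "numeric" is assumed for a continuous outcome
Y <- list(type = "numeric", Y = unique(toy[, c("id", "Y_res")]))
```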
data_simu2 dataset
Description
Simulated dataset 2 with continuous outcome
Format
Longitudinal dataset with 1200 rows and 13 columns for 200 subjects
- id
- Subject identifier 
- time
- Time measurement 
- cont_covar1
- Continuous time-fixed predictor 1 
- cont_covar2
- Continuous time-fixed predictor 2 
- bin_covar1
- Binary time-fixed predictor 1 
- bin_covar2
- Binary time-fixed predictor 2 
- marker1
- Continuous time-dependent predictor 1 
- marker2
- Continuous time-dependent predictor 2 
- marker3
- Continuous time-dependent predictor 3 
- marker4
- Continuous time-dependent predictor 4 
- marker5
- Continuous time-dependent predictor 5 
- marker6
- Continuous time-dependent predictor 6 
- Y_res
- Continuous outcome 
Examples
data(data_simu2)
Random forest with multivariate longitudinal endogenous covariates
Description
Build a random forest using multivariate longitudinal endogenous covariates
Usage
dynforest(
  timeData = NULL,
  fixedData = NULL,
  idVar = NULL,
  timeVar = NULL,
  timeVarModel = NULL,
  Y = NULL,
  ntree = 200,
  mtry = NULL,
  nodesize = 1,
  minsplit = 2,
  cause = 1,
  nsplit_option = "quantile",
  ncores = NULL,
  seed = 1234,
  verbose = TRUE
)
Arguments
| timeData | A data.frame containing the id and time measurements variables and the time-dependent predictors. | 
| fixedData | A data.frame containing the id variable and the time-fixed predictors. Categorical variables should be characterized as factor. | 
| idVar | A character indicating the name of variable to identify the subjects | 
| timeVar | A character indicating the name of time variable | 
| timeVarModel | A list for each time-dependent predictors containing a list of formula for fixed and random part from the mixed model | 
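| Y | A list of output which should contain: type, which defines the nature of the outcome ("surv", "numeric" or "factor"), and Y, the output variable | 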
| ntree | Number of trees to grow. Default value set to 200. | 
| mtry | Number of candidate variables randomly drawn at each node of the trees. This parameter should be tuned by minimizing the OOB error. Default is defined as the square root of the number of predictors. | 
| nodesize | Minimal number of subjects required in both child nodes to split. Cannot be smaller than 1. | 
| minsplit | (Only with survival outcome) Minimal number of events required to split the node. Cannot be smaller than 2. | 
| cause | (Only with competing events) Number indicates the event of interest. | 
| nsplit_option | A character indicating how the values are chosen to build the two groups for the splitting rule (only for continuous predictors). Values are chosen using deciles (nsplit_option = "quantile") or randomly (nsplit_option = "sample"). Default value is "quantile". | 
| ncores | Number of cores used to grow trees in parallel. Default value is the number of cores of the computer minus 1. | 
| seed | Seed to replicate results | 
| verbose | A logical controlling the function progress. Default is TRUE. | 
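The decile-based choice of split values described for nsplit_option can be pictured with base R: the nine deciles of a continuous predictor serve as candidate thresholds, and each threshold defines two groups of subjects. This is a minimal sketch on simulated values with hypothetical object names, not the splitting code of the package.

```r
# Candidate thresholds taken at the deciles of a continuous predictor.
# Assumption: simulated values and hypothetical names, for illustration only.
set.seed(1234)
x <- rnorm(100)   # values of one continuous predictor within a node

# Nine deciles (10%, 20%, ..., 90%) used as candidate split values
candidates <- quantile(x, probs = seq(0.1, 0.9, by = 0.1), names = FALSE)

# Size of the two groups produced by each candidate threshold
group_sizes <- t(sapply(candidates, function(thr) {
  c(left = sum(x <= thr), right = sum(x > thr))
}))
group_sizes
```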
Details
The function currently supports survival (competing or single event), continuous or categorical outcome.
FUTURE IMPLEMENTATIONS:
- Continuous longitudinal outcome 
- Functional data analysis 
Value
dynforest function returns a list with the following elements:
| data | A list containing the data used to grow the trees | 
| rf | A table with each tree in column. Provide multiple characteristics about the tree building | 
| type | Outcome type | 
| times | A numeric vector containing the time-to-event for all subjects | 
| cause | Indicating the cause of interest | 
| causes | A numeric vector containing the causes indicator | 
| Inputs | A list of 3 elements: Longitudinal, Numeric and Factor. Each element contains the names of the predictors | 
| Longitudinal.model | A list of longitudinal markers containing the formula used for modeling in the random forest | 
| param | A list containing the hyperparameters | 
| comput.time | Computation time | 
Author(s)
Anthony Devaux (anthony.devauxbarault@gmail.com)
References
- Devaux A., Helmer C., Genuer R., Proust-Lima C. (2023). Random survival forests with multivariate longitudinal endogenous covariates. Statistical Methods in Medical Research. doi:10.1177/09622802231206477 
- Devaux A., Proust-Lima C., Genuer R. (2023). Random Forests for time-fixed and time-dependent predictors: The DynForest R package. arXiv doi:10.48550/arXiv.2302.02670 
See Also
summary.dynforest() compute_ooberror() compute_vimp() compute_gvimp() predict.dynforest() plot.dynforest()
Examples
data(pbc2)
# Get Gaussian distribution for longitudinal predictors
pbc2$serBilir <- log(pbc2$serBilir)
pbc2$SGOT <- log(pbc2$SGOT)
pbc2$albumin <- log(pbc2$albumin)
pbc2$alkaline <- log(pbc2$alkaline)
# Sample 100 subjects
set.seed(1234)
id <- unique(pbc2$id)
id_sample <- sample(id, 100)
id_row <- which(pbc2$id%in%id_sample)
pbc2_train <- pbc2[id_row,]
timeData_train <- pbc2_train[,c("id","time",
                                "serBilir","SGOT",
                                "albumin","alkaline")]
# Create object with longitudinal association for each predictor
timeVarModel <- list(serBilir = list(fixed = serBilir ~ time,
                                     random = ~ time),
                     SGOT = list(fixed = SGOT ~ time + I(time^2),
                                 random = ~ time + I(time^2)),
                     albumin = list(fixed = albumin ~ time,
                                    random = ~ time),
                     alkaline = list(fixed = alkaline ~ time,
                                     random = ~ time))
# Build fixed data
fixedData_train <- unique(pbc2_train[,c("id","age","drug","sex")])
# Build outcome data
Y <- list(type = "surv",
          Y = unique(pbc2_train[,c("id","years","event")]))
# Run dynforest function
res_dyn <- dynforest(timeData = timeData_train, fixedData = fixedData_train,
                     timeVar = "time", idVar = "id",
                     timeVarModel = timeVarModel, Y = Y,
                     ntree = 50, nodesize = 5, minsplit = 5,
                     cause = 2, ncores = 2, seed = 1234)
Extract split information for a tree chosen by the user
Description
Extract split information for a tree chosen by the user
Usage
get_tree(dynforest_obj, tree)
Arguments
| dynforest_obj | A dynforest object, as returned by dynforest() | 
| tree | Integer indicating the tree identifier | 
Value
A table sorted by the node/leaf identifier with each row representing a node/leaf. Each column provides information about the splits:
| type | The nature of the predictor (Longitudinal for longitudinal predictor, Numeric for continuous predictor or Factor for categorical predictor) if the node was split, Leaf otherwise | 
| var_split | The predictor used for the split defined by its order in timeData and fixedData | 
| feature | The feature used for the split defined by its position in random statistic | 
| threshold | The threshold used for the split (only with Longitudinal and Numeric). No information is returned for Factor | 
| N | The number of subjects in the node/leaf | 
| Nevent | The number of events of interest in the node/leaf (only with survival outcome) | 
| depth | The depth level of the node/leaf | 
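A table with these columns can be explored with ordinary data-frame operations. The sketch below builds a small made-up table with a subset of the documented columns and separates leaves from split nodes; the values are invented for illustration and do not come from a fitted forest.

```r
# Made-up table with a subset of the documented columns (illustration only)
tree_tab <- data.frame(
  type      = c("Longitudinal", "Numeric", "Leaf", "Leaf", "Leaf"),
  threshold = c(0.8, 52.1, NA, NA, NA),
  N         = c(100, 60, 40, 35, 25),
  Nevent    = c(30, 22, 8, 15, 7),
  depth     = c(1, 2, 2, 3, 3)
)

# Terminal nodes: every subject of the root ends up in exactly one leaf
leaves <- tree_tab[tree_tab$type == "Leaf", ]

# Number of splits by predictor type
splits_by_type <- table(tree_tab$type[tree_tab$type != "Leaf"])

leaves
splits_by_type
```

In this toy table the leaf counts add up to the root counts (100 subjects and 30 events), which is a quick consistency check when reading such output.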
Examples
data(pbc2)
# Get Gaussian distribution for longitudinal predictors
pbc2$serBilir <- log(pbc2$serBilir)
pbc2$SGOT <- log(pbc2$SGOT)
pbc2$albumin <- log(pbc2$albumin)
pbc2$alkaline <- log(pbc2$alkaline)
# Sample 100 subjects
set.seed(1234)
id <- unique(pbc2$id)
id_sample <- sample(id, 100)
id_row <- which(pbc2$id%in%id_sample)
pbc2_train <- pbc2[id_row,]
timeData_train <- pbc2_train[,c("id","time",
                                "serBilir","SGOT",
                                "albumin","alkaline")]
# Create object with longitudinal association for each predictor
timeVarModel <- list(serBilir = list(fixed = serBilir ~ time,
                                     random = ~ time),
                     SGOT = list(fixed = SGOT ~ time + I(time^2),
                                 random = ~ time + I(time^2)),
                     albumin = list(fixed = albumin ~ time,
                                    random = ~ time),
                     alkaline = list(fixed = alkaline ~ time,
                                     random = ~ time))
# Build fixed data
fixedData_train <- unique(pbc2_train[,c("id","age","drug","sex")])
# Build outcome data
Y <- list(type = "surv",
          Y = unique(pbc2_train[,c("id","years","event")]))
# Run dynforest function
res_dyn <- dynforest(timeData = timeData_train, fixedData = fixedData_train,
                     timeVar = "time", idVar = "id",
                     timeVarModel = timeVarModel, Y = Y,
                     ntree = 50, nodesize = 5, minsplit = 5,
                     cause = 2, ncores = 2, seed = 1234)
# Extract split information from tree 4
res_tree4 <- get_tree(dynforest_obj = res_dyn, tree = 4)
Extract nodes identifiers for a given tree
Description
Extract nodes identifiers for a given tree
Usage
get_treenodes(dynforest_obj, tree = NULL)
Arguments
| dynforest_obj | A dynforest object, as returned by dynforest() | 
| tree | Integer indicating the tree identifier | 
Value
Extract nodes identifiers for a given tree
Examples
data(pbc2)
# Get Gaussian distribution for longitudinal predictors
pbc2$serBilir <- log(pbc2$serBilir)
pbc2$SGOT <- log(pbc2$SGOT)
pbc2$albumin <- log(pbc2$albumin)
pbc2$alkaline <- log(pbc2$alkaline)
# Sample 100 subjects
set.seed(1234)
id <- unique(pbc2$id)
id_sample <- sample(id, 100)
id_row <- which(pbc2$id%in%id_sample)
pbc2_train <- pbc2[id_row,]
timeData_train <- pbc2_train[,c("id","time",
                                "serBilir","SGOT",
                                "albumin","alkaline")]
# Create object with longitudinal association for each predictor
timeVarModel <- list(serBilir = list(fixed = serBilir ~ time,
                                     random = ~ time),
                     SGOT = list(fixed = SGOT ~ time + I(time^2),
                                 random = ~ time + I(time^2)),
                     albumin = list(fixed = albumin ~ time,
                                    random = ~ time),
                     alkaline = list(fixed = alkaline ~ time,
                                     random = ~ time))
# Build fixed data
fixedData_train <- unique(pbc2_train[,c("id","age","drug","sex")])
# Build outcome data
Y <- list(type = "surv",
          Y = unique(pbc2_train[,c("id","years","event")]))
# Run dynforest function
res_dyn <- dynforest(timeData = timeData_train, fixedData = fixedData_train,
                     timeVar = "time", idVar = "id",
                     timeVarModel = timeVarModel, Y = Y,
                     ntree = 50, nodesize = 5, minsplit = 5,
                     cause = 2, ncores = 2, seed = 1234)
# Extract nodes identifiers for a given tree
get_treenodes(dynforest_obj = res_dyn, tree = 1)
pbc2 dataset
Description
pbc2 data from Mayo clinic
Format
Longitudinal dataset with 1945 rows and 19 columns for 312 patients
- id
- Patient identifier 
- time
- Time measurement 
- ascites
- Presence of ascites (Yes/No) 
- hepatomegaly
- Presence of hepatomegaly (Yes/No) 
- spiders
- Blood vessel malformations in the skin (Yes/No) 
- edema
- Edema levels (No edema/edema no diuretics/edema despite diuretics) 
- serBilir
- Level of serum bilirubin 
- serChol
- Level of serum cholesterol 
- albumin
- Level of albumin 
- alkaline
- Level of alkaline phosphatase 
- SGOT
- Level of aspartate aminotransferase 
- platelets
- Platelet count 
- prothrombin
- Prothrombin time 
- histologic
- Histologic stage of disease 
- drug
- Drug treatment (D-penicillamine/Placebo) 
- age
- Age at enrollment 
- sex
- Sex of patient 
- years
- Time-to-event in years 
- event
- Event indicator: 0 (alive), 1 (transplanted) and 2 (dead) 
Source
pbc2 dataset from the joineRML package
Examples
data(pbc2)
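The event coding above matters when a competing-risks forest is built with a cause of interest. The sketch below labels a made-up vector of event codes following the documented scheme and flags which observations count as the event of interest when cause = 2 (death), as in the examples of this manual. The vector is invented and the object names are hypothetical.

```r
# Made-up event codes following the documented scheme:
# 0 (alive), 1 (transplanted), 2 (dead)
event <- c(0, 2, 1, 2, 0, 0, 2)

# Readable labels for tabulation
event_lab <- factor(event, levels = 0:2,
                    labels = c("alive", "transplanted", "dead"))
table(event_lab)

# With cause = 2, death is the event of interest and
# transplantation acts as the competing event
cause <- 2
is_interest  <- event == cause
is_competing <- event != cause & event != 0
```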
Plot function in dynforest
Description
This function displays a plot of CIF for a given node and tree (for class dynforest), the most predictive variables with the minimal depth (for class dynforestvardepth), the variable importance (for class dynforestvimp) or the grouped variable importance (for class dynforestgvimp).
Usage
## S3 method for class 'dynforest'
plot(x, tree = NULL, nodes = NULL, id = NULL, max_tree = NULL, ...)
## S3 method for class 'dynforestvardepth'
plot(x, plot_level = c("predictor", "feature"), ...)
## S3 method for class 'dynforestvimp'
plot(x, PCT = FALSE, ordering = TRUE, ...)
## S3 method for class 'dynforestgvimp'
plot(x, PCT = FALSE, ...)
## S3 method for class 'dynforestpred'
plot(x, id = NULL, ...)
Arguments
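| x | Object inheriting from classes dynforest, dynforestvardepth, dynforestvimp, dynforestgvimp or dynforestpred | 
| tree | For dynforest class, integer indicating the tree identifier | 
| nodes | For dynforest class, identifiers of the nodes to plot | 
| id | For dynforest and dynforestpred classes, identifier of the subject(s) to plot | 
| max_tree | For dynforest class, integer indicating the number of trees to display when the id argument is used | 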
| ... | Optional parameters to be passed to the low level function | 
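| plot_level | For dynforestvardepth class, level at which the minimal depth is displayed: "predictor" or "feature" | 
| PCT | For dynforestvimp and dynforestgvimp classes, logical indicating whether the VIMP statistic is displayed as a percentage. Default is FALSE. | 
| ordering | For dynforestvimp class, logical indicating whether predictors are ordered by VIMP value. Default is TRUE. | 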
Value
plot() function displays: 
| With dynforestvardepth | the minimal depth for each predictor/feature | 
| With dynforestvimp | the VIMP for each predictor | 
| With dynforestgvimp | the grouped-VIMP for each given group | 
See Also
dynforest() compute_ooberror() compute_vimp() compute_gvimp() compute_vardepth()
Examples
data(pbc2)
# Get Gaussian distribution for longitudinal predictors
pbc2$serBilir <- log(pbc2$serBilir)
pbc2$SGOT <- log(pbc2$SGOT)
pbc2$albumin <- log(pbc2$albumin)
pbc2$alkaline <- log(pbc2$alkaline)
# Sample 100 subjects
set.seed(1234)
id <- unique(pbc2$id)
id_sample <- sample(id, 100)
id_row <- which(pbc2$id%in%id_sample)
pbc2_train <- pbc2[id_row,]
timeData_train <- pbc2_train[,c("id","time",
                                "serBilir","SGOT",
                                "albumin","alkaline")]
# Create object with longitudinal association for each predictor
timeVarModel <- list(serBilir = list(fixed = serBilir ~ time,
                                     random = ~ time),
                     SGOT = list(fixed = SGOT ~ time + I(time^2),
                                 random = ~ time + I(time^2)),
                     albumin = list(fixed = albumin ~ time,
                                    random = ~ time),
                     alkaline = list(fixed = alkaline ~ time,
                                     random = ~ time))
# Build fixed data
fixedData_train <- unique(pbc2_train[,c("id","age","drug","sex")])
# Build outcome data
Y <- list(type = "surv",
          Y = unique(pbc2_train[,c("id","years","event")]))
# Run dynforest function
res_dyn <- dynforest(timeData = timeData_train, fixedData = fixedData_train,
                     timeVar = "time", idVar = "id",
                     timeVarModel = timeVarModel, Y = Y,
                     ntree = 50, nodesize = 5, minsplit = 5,
                     cause = 2, ncores = 2, seed = 1234)
# Plot estimated CIF at nodes 17 and 32
plot(x = res_dyn, tree = 1, nodes = c(17,32))
# Run compute_vardepth function
res_varDepth <- compute_vardepth(res_dyn)
# Plot minimal depth
plot(x = res_varDepth, plot_level = "feature")
# Compute VIMP statistic
res_dyn_VIMP <- compute_vimp(dynforest_obj = res_dyn, ncores = 2)
# Plot VIMP
plot(x = res_dyn_VIMP, PCT = TRUE)
# Compute gVIMP statistic
res_dyn_gVIMP <- compute_gvimp(dynforest_obj = res_dyn,
                               group = list(group1 = c("serBilir","SGOT"),
                                            group2 = c("albumin","alkaline")),
                               ncores = 2)
# Plot gVIMP
plot(x = res_dyn_gVIMP, PCT = TRUE)
# Sample 5 subjects to predict the event
set.seed(123)
id_pred <- sample(id, 5)
# Create predictors objects
pbc2_pred <- pbc2[which(pbc2$id%in%id_pred),]
timeData_pred <- pbc2_pred[,c("id", "time", "serBilir", "SGOT", "albumin", "alkaline")]
fixedData_pred <- unique(pbc2_pred[,c("id","age","drug","sex")])
# Predict the CIF function for the new subjects with landmark time at 4 years
pred_dyn <- predict(object = res_dyn,
                    timeData = timeData_pred, fixedData = fixedData_pred,
                    idVar = "id", timeVar = "time",
                    t0 = 4)
# Plot predicted CIF for subjects 26 and 110
plot(x = pred_dyn, id = c(26, 110))
Prediction using dynamic random forests
Description
Prediction using dynamic random forests
Usage
## S3 method for class 'dynforest'
predict(
  object,
  timeData = NULL,
  fixedData = NULL,
  idVar,
  timeVar,
  t0 = NULL,
  ...
)
Arguments
| object | A dynforest object, as returned by dynforest() | 
| timeData | A data.frame containing the id and time measurements variables and the time-dependent predictors. | 
| fixedData | A data.frame containing the id variable and the time-fixed predictors. Non-continuous variables should be characterized as factor. | 
| idVar | A character indicating the name of variable to identify the subjects | 
| timeVar | A character indicating the name of time variable | 
| t0 | Landmark time | 
| ... | Optional parameters to be passed to the low level function | 
Value
Returns the predicted outcome for the new subjects: a matrix of probabilities of the event of interest in survival mode, the average value in regression mode and the most likely value in classification mode
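The landmark time t0 can be understood as the moment from which the prediction is made. Under the usual landmark convention, assumed here rather than taken from the package code, only measurements collected up to t0 inform the prediction. The sketch below applies that filter to a made-up longitudinal data frame; the column name `marker` and all values are hypothetical.

```r
# Made-up longitudinal measurements for two subjects
timeData_toy <- data.frame(
  id     = c(1, 1, 1, 2, 2, 2),
  time   = c(0, 2, 5, 1, 3.5, 6),
  marker = c(1.2, 1.5, 2.1, 0.7, 0.9, 1.4)
)

# Landmark time
t0 <- 4

# Assumed landmark convention: keep only the history observed up to t0
history <- timeData_toy[timeData_toy$time <= t0, ]
history

# Number of measurements available per subject at the landmark time
table(history$id)
```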
Examples
data(pbc2)
# Get Gaussian distribution for longitudinal predictors
pbc2$serBilir <- log(pbc2$serBilir)
pbc2$SGOT <- log(pbc2$SGOT)
pbc2$albumin <- log(pbc2$albumin)
pbc2$alkaline <- log(pbc2$alkaline)
# Sample 100 subjects
set.seed(1234)
id <- unique(pbc2$id)
id_sample <- sample(id, 100)
id_row <- which(pbc2$id%in%id_sample)
pbc2_train <- pbc2[id_row,]
timeData_train <- pbc2_train[,c("id","time",
                                "serBilir","SGOT",
                                "albumin","alkaline")]
# Create object with longitudinal association for each predictor
timeVarModel <- list(serBilir = list(fixed = serBilir ~ time,
                                     random = ~ time),
                     SGOT = list(fixed = SGOT ~ time + I(time^2),
                                 random = ~ time + I(time^2)),
                     albumin = list(fixed = albumin ~ time,
                                    random = ~ time),
                     alkaline = list(fixed = alkaline ~ time,
                                     random = ~ time))
# Build fixed data
fixedData_train <- unique(pbc2_train[,c("id","age","drug","sex")])
# Build outcome data
Y <- list(type = "surv",
          Y = unique(pbc2_train[,c("id","years","event")]))
# Run dynforest function
res_dyn <- dynforest(timeData = timeData_train, fixedData = fixedData_train,
                     timeVar = "time", idVar = "id",
                     timeVarModel = timeVarModel, Y = Y,
                     ntree = 50, nodesize = 5, minsplit = 5,
                     cause = 2, ncores = 2, seed = 1234)
# Sample 5 subjects to predict the event
set.seed(123)
id_pred <- sample(id, 5)
# Create predictors objects
pbc2_pred <- pbc2[which(pbc2$id%in%id_pred),]
timeData_pred <- pbc2_pred[,c("id", "time", "serBilir", "SGOT", "albumin", "alkaline")]
fixedData_pred <- unique(pbc2_pred[,c("id","age","drug","sex")])
# Predict the cumulative incidence function (CIF) for the new subjects with landmark time at 4 years
pred_dyn <- predict(object = res_dyn,
                    timeData = timeData_pred, fixedData = fixedData_pred,
                    idVar = "id", timeVar = "time",
                    t0 = 4)
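To make the role of the landmark time `t0` concrete, the sketch below shows in base R what restricting longitudinal data to the landmark window looks like. It assumes, as is usual in landmark prediction, that only measurements collected at or before `t0` inform the prediction; the data frame `toy_timeData` and its values are invented, and this is not the package's internal code.

```r
# Toy longitudinal data: invented values, same column layout as timeData_pred
toy_timeData <- data.frame(id = c(1, 1, 1, 2, 2, 3),
                           time = c(0, 2.5, 6, 1, 4, 5),
                           serBilir = c(0.3, 0.5, 0.9, 1.1, 1.4, 0.2))
t0 <- 4

# Keep only the measurements collected at or before the landmark time
toy_before_t0 <- toy_timeData[toy_timeData$time <= t0, ]

# Subjects with no measurement at or before t0 drop out of the truncated data
ids_lost <- setdiff(unique(toy_timeData$id), unique(toy_before_t0$id))

print(toy_before_t0)
print(ids_lost)
```

Running a check of this kind on the prediction data before calling predict() helps spot subjects who have no usable measurement at the chosen landmark time.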
Print function
Description
This function displays a brief summary regarding the trees (for class dynforest), a data frame with variable importance (for class dynforestvimp) or the grouped variable importance (for class dynforestgvimp). Print methods are also provided for classes dynforestvardepth, dynforestoob and dynforestpred.
Usage
## S3 method for class 'dynforest'
print(x, ...)
## S3 method for class 'dynforestvimp'
print(x, ...)
## S3 method for class 'dynforestgvimp'
print(x, ...)
## S3 method for class 'dynforestvardepth'
print(x, ...)
## S3 method for class 'dynforestoob'
print(x, ...)
## S3 method for class 'dynforestpred'
print(x, ...)
Arguments
| x | Object inheriting from classes dynforest, dynforestvimp, dynforestgvimp, dynforestvardepth, dynforestoob or dynforestpred | 
| ... | Optional parameters to be passed to the low level function | 
See Also
dynforest(), compute_ooberror(), compute_vimp(), compute_gvimp(), compute_vardepth(), predict.dynforest()
Examples
data(pbc2)
# Get Gaussian distribution for longitudinal predictors
pbc2$serBilir <- log(pbc2$serBilir)
pbc2$SGOT <- log(pbc2$SGOT)
pbc2$albumin <- log(pbc2$albumin)
pbc2$alkaline <- log(pbc2$alkaline)
# Sample 100 subjects
set.seed(1234)
id <- unique(pbc2$id)
id_sample <- sample(id, 100)
id_row <- which(pbc2$id%in%id_sample)
pbc2_train <- pbc2[id_row,]
timeData_train <- pbc2_train[,c("id","time",
                                "serBilir","SGOT",
                                "albumin","alkaline")]
# Create object with longitudinal association for each predictor
timeVarModel <- list(serBilir = list(fixed = serBilir ~ time,
                                     random = ~ time),
                     SGOT = list(fixed = SGOT ~ time + I(time^2),
                                 random = ~ time + I(time^2)),
                     albumin = list(fixed = albumin ~ time,
                                    random = ~ time),
                     alkaline = list(fixed = alkaline ~ time,
                                     random = ~ time))
# Build fixed data
fixedData_train <- unique(pbc2_train[,c("id","age","drug","sex")])
# Build outcome data
Y <- list(type = "surv",
          Y = unique(pbc2_train[,c("id","years","event")]))
# Run dynforest function
res_dyn <- dynforest(timeData = timeData_train, fixedData = fixedData_train,
                     timeVar = "time", idVar = "id",
                     timeVarModel = timeVarModel, Y = Y,
                     ntree = 50, nodesize = 5, minsplit = 5,
                     cause = 2, ncores = 2, seed = 1234)
# Print function
print(res_dyn)
# Compute VIMP statistic
res_dyn_VIMP <- compute_vimp(dynforest_obj = res_dyn, ncores = 2, seed = 1234)
# Print function
print(res_dyn_VIMP)
# Compute gVIMP statistic
res_dyn_gVIMP <- compute_gvimp(dynforest_obj = res_dyn,
                               group = list(group1 = c("serBilir","SGOT"),
                                            group2 = c("albumin","alkaline")),
                               ncores = 2, seed = 1234)
# Print function
print(res_dyn_gVIMP)
# Run compute_vardepth function
res_varDepth <- compute_vardepth(res_dyn)
# Print function
print(res_varDepth)
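Each print() call above reaches a different method because R dispatches on the class of its argument. The following base R sketch illustrates that mechanism with a hypothetical class `toyforest`; neither the class nor its fields belong to the package.

```r
# Hypothetical toy class, used only to illustrate S3 dispatch
toy_obj <- structure(list(ntree = 50, type = "surv"), class = "toyforest")

# print() looks up print.<class> from the class attribute of its argument
print.toyforest <- function(x, ...) {
  cat("Toy forest with", x$ntree, "trees; outcome type:", x$type, "\n")
  invisible(x)  # print methods conventionally return their input invisibly
}

# Capture what the method writes so it can be inspected
out <- capture.output(print(toy_obj))
print(out)
```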
Display the summary of dynforest
Description
Display the summary of dynforest
Usage
## S3 method for class 'dynforest'
summary(object, ...)
## S3 method for class 'dynforestoob'
summary(object, ...)
Arguments
| object | Object inheriting from class dynforest or dynforestoob | 
| ... | Optional parameters to be passed to the low level function | 
Value
Return some information about the random forest
See Also
Examples
data(pbc2)
# Get Gaussian distribution for longitudinal predictors
pbc2$serBilir <- log(pbc2$serBilir)
pbc2$SGOT <- log(pbc2$SGOT)
pbc2$albumin <- log(pbc2$albumin)
pbc2$alkaline <- log(pbc2$alkaline)
# Sample 100 subjects
set.seed(1234)
id <- unique(pbc2$id)
id_sample <- sample(id, 100)
id_row <- which(pbc2$id%in%id_sample)
pbc2_train <- pbc2[id_row,]
timeData_train <- pbc2_train[,c("id","time",
                                "serBilir","SGOT",
                                "albumin","alkaline")]
# Create object with longitudinal association for each predictor
timeVarModel <- list(serBilir = list(fixed = serBilir ~ time,
                                     random = ~ time),
                     SGOT = list(fixed = SGOT ~ time + I(time^2),
                                 random = ~ time + I(time^2)),
                     albumin = list(fixed = albumin ~ time,
                                    random = ~ time),
                     alkaline = list(fixed = alkaline ~ time,
                                     random = ~ time))
# Build fixed data
fixedData_train <- unique(pbc2_train[,c("id","age","drug","sex")])
# Build outcome data
Y <- list(type = "surv",
          Y = unique(pbc2_train[,c("id","years","event")]))
# Run dynforest function
res_dyn <- dynforest(timeData = timeData_train, fixedData = fixedData_train,
                     timeVar = "time", idVar = "id",
                     timeVarModel = timeVarModel, Y = Y,
                     ntree = 50, nodesize = 5, minsplit = 5,
                     cause = 2, ncores = 2, seed = 1234)
# Compute OOB error
res_dyn_OOB <- compute_ooberror(dynforest_obj = res_dyn, ncores = 2)
# dynforest summary
summary(object = res_dyn_OOB)