| Title: | Logit Leaf Model Classifier for Binary Classification | 
| Version: | 1.1.0 | 
| Date: | 2020-05-05 | 
| Author: | Arno De Caigny [aut, cre], Kristof Coussement [aut], Koen W. De Bock [aut] | 
| Maintainer: | Arno De Caigny <a.de-caigny@ieseg.fr> | 
| Description: | Fits the Logit Leaf Model, makes predictions and visualizes the output (De Caigny et al. (2018) <doi:10.1016/j.ejor.2018.02.009>). | 
| Depends: | R (≥ 4.0.0) | 
| License: | GPL (≥ 3) | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| RoxygenNote: | 7.1.0 | 
| Suggests: | mlbench | 
| Imports: | partykit, stats, stringr, RWeka, survey, reghelper, scales | 
| NeedsCompilation: | no | 
| Packaged: | 2020-05-08 06:13:51 UTC; zosia | 
| Repository: | CRAN | 
| Date/Publication: | 2020-05-08 06:30:03 UTC | 
Create Logit Leaf Model
Description
This function fits the logit leaf model. It takes a dataframe of numeric independent variables and a corresponding vector of dependent values as input. Two decision tree parameters can be set: the confidence threshold for pruning and the minimum number of observations per leaf.
Usage
llm(X, Y, threshold_pruning = 0.25, nbr_obs_leaf = 100)
Arguments
| X | Dataframe containing numerical independent variables. | 
| Y | Numerical vector of dependent variable. Currently only binary classification is supported. | 
| threshold_pruning | Set confidence threshold for pruning. Default 0.25. | 
| nbr_obs_leaf | The minimum number of observations in a leaf node. Default 100. | 
Value
An object of class logitleafmodel, which is a list with the following components:
| Segment Rules | The decision rules that define segments. Use  | 
| Coefficients | The segment specific logistic regression coefficients. Use  | 
| Full decision tree for segmentation | The raw decision tree. Use  | 
| Observations per segment | The number of observations in each segment. Use  | 
| Incidence of dependent per segment | The incidence of the dependent variable in each segment. Use  | 
Author(s)
Arno De Caigny, a.de-caigny@ieseg.fr, Kristof Coussement, k.coussement@ieseg.fr and Koen W. De Bock, kdebock@audencia.com
References
Arno De Caigny, Kristof Coussement, Koen W. De Bock, A New Hybrid Classification Algorithm for Customer Churn Prediction Based on Logistic Regression and Decision Trees, European Journal of Operational Research (2018), doi: 10.1016/j.ejor.2018.02.009.
See Also
predict.llm, table.llm.html, llm.cv
Examples
## Load PimaIndiansDiabetes dataset from mlbench package
if (requireNamespace("mlbench", quietly = TRUE)) {
  library("mlbench")
}
data("PimaIndiansDiabetes")
## Split in training and test (2/3 - 1/3)
idtrain <- sample(1:768, 512)
PimaTrain <- PimaIndiansDiabetes[idtrain, ]
Pimatest <- PimaIndiansDiabetes[-idtrain, ]
## Create the LLM
Pima.llm <- llm(X = PimaTrain[,-c(9)],Y = PimaTrain$diabetes,
 threshold_pruning = 0.25,nbr_obs_leaf = 100)
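The components listed under Value (segment rules, segment-specific coefficients, observations and incidence per segment) reflect a two-step idea: segment the data first, then fit one logistic regression per segment. The following base R sketch illustrates that idea only. It is not the package's implementation: a hand-written split on a simulated variable `x1` stands in for the decision tree, and all data and object names are assumptions made for illustration.

```r
## Conceptual sketch only: a hand-written split stands in for the decision tree
set.seed(42)
n <- 600
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))

## Simulate a binary outcome whose relation to x2 differs by segment
in_left <- X$x1 <= 0
lin <- ifelse(in_left, 1.5 * X$x2, -1.5 * X$x2)
Y <- rbinom(n, size = 1, prob = plogis(lin))

## Step 1 (segmentation): assign every observation to a segment
segment <- factor(ifelse(in_left, "x1 <= 0", "x1 > 0"))

## Step 2 (leaf models): fit one logistic regression per segment
dat <- cbind(X, Y = Y)
leaf_models <- lapply(split(dat, segment), function(d) {
  glm(Y ~ x1 + x2, data = d, family = binomial())
})

## Segment-specific coefficients, one column per segment
leaf_coefs <- sapply(leaf_models, coef)

## Number of observations and incidence of the dependent per segment
obs_per_segment <- table(segment)
incidence_per_segment <- tapply(Y, segment, mean)
```

In this simulation the coefficient of `x2` has opposite signs in the two segments, which a single logistic regression on the full data could not capture.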
Runs v-fold cross validation with LLM
Description
In v-fold cross validation, the data are divided into v subsets of approximately equal size. Subsequently, one of the v data parts is excluded while the remainder of the data is used to create a logitleafmodel object. Predictions are generated for the excluded data part. The process is repeated v times, so that every observation is left out exactly once.
Usage
llm.cv(X, Y, cv, threshold_pruning = 0.25, nbr_obs_leaf = 100)
Arguments
| X | Dataframe containing numerical independent variables. | 
| Y | Numerical vector of dependent variable. Currently only binary classification is supported. | 
| cv | An integer specifying the number of folds in the cross-validation. | 
| threshold_pruning | Set confidence threshold for pruning. Default 0.25. | 
| nbr_obs_leaf | The minimum number of observations in a leaf node. Default 100. | 
Value
An object of class llm.cv, which is a list with the following components:
| foldpred | a data frame with, per fold, predicted class membership probabilities for the left-out observations. | 
| pred | a data frame with predicted class membership probabilities. | 
| foldclass | a data frame with, per fold, predicted classes for the left-out observations. | 
| class | a data frame with the predicted classes. | 
| conf | the confusion matrix which compares the real versus the predicted class memberships based on the class object. | 
Author(s)
Arno De Caigny, a.de-caigny@ieseg.fr, Kristof Coussement, k.coussement@ieseg.fr and Koen W. De Bock, kdebock@audencia.com
References
Arno De Caigny, Kristof Coussement, Koen W. De Bock, A New Hybrid Classification Algorithm for Customer Churn Prediction Based on Logistic Regression and Decision Trees, European Journal of Operational Research (2018), doi: 10.1016/j.ejor.2018.02.009.
See Also
predict.llm, table.llm.html, llm
Examples
## Load PimaIndiansDiabetes dataset from mlbench package
if (requireNamespace("mlbench", quietly = TRUE)) {
  library("mlbench")
}
data("PimaIndiansDiabetes")
## Create the LLM with 5-cv
Pima.llm <- llm.cv(X = PimaIndiansDiabetes[,-c(9)],Y = PimaIndiansDiabetes$diabetes, cv=5,
 threshold_pruning = 0.25,nbr_obs_leaf = 100)
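The v-fold procedure described above can be sketched in base R. This is a minimal illustration, not the code behind `llm.cv`: a plain `glm` stands in for the logit leaf model so that the block runs without extra packages, the data are simulated, and the 0.5 cut-off used to derive classes is an assumption.

```r
set.seed(1)
n <- 200
dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, size = 1, prob = plogis(1.2 * dat$x))

v <- 5
## Randomly assign each observation to one of v folds of near-equal size
fold <- sample(rep(seq_len(v), length.out = n))

pred <- numeric(n)
for (k in seq_len(v)) {
  held_out <- fold == k
  ## Fit on the remaining v - 1 folds (plain glm stands in for the LLM)
  fit <- glm(y ~ x, data = dat[!held_out, ], family = binomial())
  ## Predict probabilities for the held-out fold only
  pred[held_out] <- predict(fit, newdata = dat[held_out, ], type = "response")
}

## Predicted classes at an assumed 0.5 cut-off, and the confusion matrix
pred_class <- factor(as.integer(pred > 0.5), levels = c(0, 1))
conf <- table(real = factor(dat$y, levels = c(0, 1)), predicted = pred_class)
```

Because each observation is predicted by a model that never saw it, `pred` and `conf` give an out-of-sample view of performance, which is the purpose of the `pred`, `class` and `conf` components listed under Value.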
Create Logit Leaf Model Prediction
Description
This function creates a prediction for an object of class logitleafmodel. It assumes a dataframe with numeric
values as input and an object of class logitleafmodel, which is the result of the llm function.
Currently only binary classification is supported.
Usage
## S3 method for class 'llm'
predict(object, X, ...)
Arguments
| object | An object of class logitleafmodel, as that created by the function llm. | 
| X | Dataframe containing numerical independent variables. | 
| ... | further arguments passed to or from other methods. | 
Value
Returns a dataframe containing a probability for every instance based on the LLM model. Optional row numbers can be added.
Author(s)
Arno De Caigny, a.de-caigny@ieseg.fr, Kristof Coussement, k.coussement@ieseg.fr and Koen W. De Bock, kdebock@audencia.com
References
Arno De Caigny, Kristof Coussement, Koen W. De Bock, A New Hybrid Classification Algorithm for Customer Churn Prediction Based on Logistic Regression and Decision Trees, European Journal of Operational Research (2018), doi: 10.1016/j.ejor.2018.02.009.
See Also
Examples
## Load PimaIndiansDiabetes dataset from mlbench package
if (requireNamespace("mlbench", quietly = TRUE)) {
  library("mlbench")
}
data("PimaIndiansDiabetes")
## Split in training and test (2/3 - 1/3)
idtrain <- sample(1:768, 512)
PimaTrain <- PimaIndiansDiabetes[idtrain, ]
Pimatest <- PimaIndiansDiabetes[-idtrain, ]
## Create the LLM
Pima.llm <- llm(X = PimaTrain[,-c(9)],Y = PimaTrain$diabetes,
 threshold_pruning = 0.25,nbr_obs_leaf = 100)
## Use the model on the test dataset to make a prediction
PimaPrediction <- predict.llm(object = Pima.llm, X = Pimatest[,-c(9)])
## Optionally add the dependent to calculate performance statistics such as AUC
# PimaPrediction <- cbind(PimaPrediction, "diabetes" = Pimatest[,"diabetes"])
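The last comment mentions performance statistics such as AUC. As a hedged sketch, the AUC can be computed in base R from a vector of predicted probabilities and the true labels with the rank-sum (Mann-Whitney) formula. The helper `auc_rank` and the toy vectors below are hypothetical and not part of the package. Since the column names of the prediction dataframe are not documented here, the probabilities are passed in as a plain vector.

```r
## Rank-based AUC: the probability that a randomly chosen positive case
## receives a higher score than a randomly chosen negative case
auc_rank <- function(score, label, positive) {
  is_pos <- label == positive
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  r <- rank(score)  # average ranks, so ties count as one half
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

## Toy example: 8 of the 9 positive/negative pairs are ordered correctly
score <- c(0.9, 0.8, 0.35, 0.6, 0.2, 0.1)
label <- factor(c("pos", "pos", "pos", "neg", "neg", "neg"))
auc_toy <- auc_rank(score, label, positive = "pos")  # 8/9
```

For the Pima example, `score` would be the probability column of the prediction and `label` would be `Pimatest$diabetes`.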
Create the HTML code for Logit Leaf Model visualization
Description
This function generates HTML code for a visualization of the logit leaf model based on the variable importance per variable category.
Usage
table.cat.llm.html(
  object,
  category_var_df,
  headertext = "The Logit Leaf Model",
  footertext = "A table footer comment",
  roundingnumbers = 2,
  methodvarimp = "Coef"
)
Arguments
| object | An object of class logitleafmodel, as that created by the function llm. | 
| category_var_df | Dataframe containing a column called "iv" with the independent variables and a column called "cat" with the name of the variable category associated with each iv. | 
| headertext | Header text for the table. | 
| footertext | Custom footer text for the table. | 
| roundingnumbers | An integer stating the number of decimals in the visualization. | 
| methodvarimp | Determines the method used to calculate the variable importance. There are 4 options: 1/ Variable coefficient ('Coef') 2/ Standardized beta ('Beta') 3/ Wald statistic ('Wald') 4/ Likelihood ratio test ('LRT') | 
Value
Generates HTML code for a visualization.
Author(s)
Arno De Caigny, a.de-caigny@ieseg.fr, Kristof Coussement, k.coussement@ieseg.fr and Koen W. De Bock, kdebock@audencia.com
References
Arno De Caigny, Kristof Coussement, Koen W. De Bock, A New Hybrid Classification Algorithm for Customer Churn Prediction Based on Logistic Regression and Decision Trees, European Journal of Operational Research (2018), doi: 10.1016/j.ejor.2018.02.009.
See Also
Examples
## Load PimaIndiansDiabetes dataset from mlbench package
if (requireNamespace("mlbench", quietly = TRUE)) {
  library("mlbench")
}
data("PimaIndiansDiabetes")
## Split in training and test (2/3 - 1/3)
idtrain <- c(sample(1:768,512))
PimaTrain <- PimaIndiansDiabetes[idtrain,]
Pimatest <- PimaIndiansDiabetes[-idtrain,]
## Create the LLM
Pima.llm <- llm(X = PimaTrain[,-c(9)],Y = PimaTrain$diabetes,
 threshold_pruning = 0.25,nbr_obs_leaf = 100)
## Define the variable categories (note: the categories are only created for demonstration)
var_cat_df <- as.data.frame(cbind(names(PimaTrain[,-c(9)]),
c("cat_a","cat_a","cat_a","cat_a","cat_b","cat_b","cat_b","cat_b")), stringsAsFactors = FALSE)
names(var_cat_df) <- c("iv", "cat")
## Generate the HTML code for the visualization
Pima.Viz <- table.cat.llm.html(object = Pima.llm,category_var_df= var_cat_df,
 headertext = "This is an example of the LLM model",
footertext = "Enjoy the package!")
## Optionally write it to your working directory
# write(Pima.Viz, "Visualization_LLM_on_PimaIndiansDiabetes.html")
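The four `methodvarimp` options name standard statistics of a logistic regression. The sketch below shows one common way to obtain each of them in base R for a single `glm` fit on simulated data. How the package computes and aggregates them per segment and per category is not shown in this manual, so this is an illustration of the statistics themselves rather than of the package's internals. All object names are assumptions.

```r
set.seed(7)
n <- 400
dat <- data.frame(x1 = rnorm(n, sd = 2), x2 = rnorm(n))
dat$y <- rbinom(n, size = 1, prob = plogis(0.5 * dat$x1 + 1.0 * dat$x2))

fit <- glm(y ~ x1 + x2, data = dat, family = binomial())

## 1/ 'Coef': raw coefficients (intercept dropped)
coef_imp <- coef(fit)[-1]

## 2/ 'Beta': refit after scaling each predictor to mean 0 and sd 1
dat_std <- transform(dat,
                     x1 = (x1 - mean(x1)) / sd(x1),
                     x2 = (x2 - mean(x2)) / sd(x2))
beta_imp <- coef(glm(y ~ x1 + x2, data = dat_std, family = binomial()))[-1]

## 3/ 'Wald': estimate divided by its standard error
wald_imp <- summary(fit)$coefficients[-1, "z value"]

## 4/ 'LRT': change in deviance when each term is dropped from the model
lrt_tab <- drop1(fit, test = "LRT")
lrt_imp <- setNames(lrt_tab$LRT[-1], rownames(lrt_tab)[-1])
```

Raw coefficients depend on the scale of each variable, while the other three measures do not, which is why `x1` (simulated with a larger spread) ranks differently under 'Coef' than under 'Beta'.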
Create the HTML code for Logit Leaf Model visualization
Description
This function generates HTML code for a visualization of the logit leaf model.
Usage
table.llm.html(
  object,
  headertext = "The Logit Leaf Model",
  footertext = "A table footer comment",
  roundingnumbers = 2
)
Arguments
| object | An object of class logitleafmodel, as that created by the function llm. | 
| headertext | Header text for the table. | 
| footertext | Custom footer text for the table. | 
| roundingnumbers | An integer stating the number of decimals in the visualization. | 
Value
Generates HTML code for a visualization.
Author(s)
Arno De Caigny, a.de-caigny@ieseg.fr, Kristof Coussement, k.coussement@ieseg.fr and Koen W. De Bock, kdebock@audencia.com
References
Arno De Caigny, Kristof Coussement, Koen W. De Bock, A New Hybrid Classification Algorithm for Customer Churn Prediction Based on Logistic Regression and Decision Trees, European Journal of Operational Research (2018), doi: 10.1016/j.ejor.2018.02.009.
See Also
Examples
## Load PimaIndiansDiabetes dataset from mlbench package
if (requireNamespace("mlbench", quietly = TRUE)) {
  library("mlbench")
}
data("PimaIndiansDiabetes")
## Split in training and test (2/3 - 1/3)
idtrain <- sample(1:768, 512)
PimaTrain <- PimaIndiansDiabetes[idtrain, ]
Pimatest <- PimaIndiansDiabetes[-idtrain, ]
## Create the LLM
Pima.llm <- llm(X = PimaTrain[,-c(9)],Y = PimaTrain$diabetes,
 threshold_pruning = 0.25,nbr_obs_leaf = 100)
## Generate the HTML code for the visualization
Pima.Viz <- table.llm.html(object = Pima.llm, headertext = "This is an example of the LLM model",
footertext = "Enjoy the package!")
## Optionally write it to your working directory
# write(Pima.Viz, "Visualization_LLM_on_PimaIndiansDiabetes.html")