| Type: | Package | 
| Title: | Optimal Trees Ensembles for Regression, Classification and Class Membership Probability Estimation | 
| Version: | 1.0.1 | 
| Date: | 2020-04-18 | 
| Author: | Zardad Khan, Asma Gul, Aris Perperoglou, Osama Mahmoud, Werner Adler, Miftahuddin and Berthold Lausen | 
| Maintainer: | Zardad Khan <zardadkhan@awkum.edu.pk> | 
| Description: | Functions for creating ensembles of optimal trees for regression, classification (Khan, Z., Gul, A., Perperoglou, A., Miftahuddin, M., Mahmoud, O., Adler, W., & Lausen, B. (2019) <doi:10.1007/s11634-019-00364-9>) and class membership probability estimation (Khan, Z., Gul, A., Mahmoud, O., Miftahuddin, M., Perperoglou, A., Adler, W., & Lausen, B. (2016) <doi:10.1007/978-3-319-25226-1_34>) are given. A few trees are selected from an initial set of trees grown by random forest for the ensemble on the basis of their individual and collective performance. Three different methods of tree selection for the case of classification are given. The prediction functions return estimates of the test responses and their class membership probabilities. Unexplained variations, error rates, confusion matrix, Brier scores, etc. are also returned for the test data. | 
| License: | GPL (≥ 3) | 
| Encoding: | UTF-8 | 
| Imports: | randomForest,stats | 
| LazyData: | true | 
| RoxygenNote: | 7.1.0 | 
| NeedsCompilation: | no | 
| Packaged: | 2020-04-20 09:32:39 UTC; ZKHAN | 
| Repository: | CRAN | 
| Date/Publication: | 2020-04-20 10:50:07 UTC | 
Optimal Trees Ensembles for Regression, Classification and Class Membership Probability Estimation
Description
Functions for creating ensembles of optimal trees for regression, classification and class membership probability estimation are given. A few trees are selected from an initial set of trees grown by random forest for the ensemble on the basis of their individual and collective performance. The prediction functions return estimates of the test responses/class labels and their class membership probabilities. Unexplained variations, error rates, confusion matrix, Brier scores, etc. for the test data are also returned. Three different methods for tree selection are given for the case of classification.
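As a rough illustration of selecting trees on individual and collective performance, the sketch below uses simulated per-tree probabilities rather than a real random forest. It ranks the trees by their individual Brier score, keeps the best fraction, and then adds trees one at a time only when the ensemble's Brier score improves. All names (`tree_prob`, `brier`, `top`, `selected`) and the fraction 0.2 are assumptions made for this illustration, and the same simulated data is reused for ranking and for selection to keep it short; this is not the package's implementation.

```r
# Simulated stand-in for tree predictions (hypothetical data, base R only)
set.seed(1)
n_val   <- 200                       # validation observations
n_trees <- 50                        # size of the initial set of trees
y_val   <- rbinom(n_val, 1, 0.5)     # true binary labels
noise   <- runif(n_trees, 0.2, 0.6)  # each "tree" gets its own noise level

# n_val x n_trees matrix of class-1 probabilities, clipped to [0, 1]
tree_prob <- sapply(noise, function(s) {
  pmin(pmax(y_val + rnorm(n_val, 0, s), 0), 1)
})

# Brier score for binary outcomes
brier <- function(p, y) mean((p - y)^2)

# Step 1: individual performance; keep the best 20% of the trees
indiv  <- apply(tree_prob, 2, brier, y = y_val)
ranked <- order(indiv)
top    <- ranked[seq_len(ceiling(0.2 * n_trees))]

# Step 2: collective performance; start from the best tree and keep a
# candidate only if it lowers the Brier score of the averaged ensemble
selected <- top[1]
best     <- brier(tree_prob[, selected], y_val)
for (k in top[-1]) {
  cand  <- rowMeans(tree_prob[, c(selected, k), drop = FALSE])
  score <- brier(cand, y_val)
  if (score < best) {
    selected <- c(selected, k)
    best     <- score
  }
}

length(selected)  # number of trees kept in the final ensemble
best              # Brier score of the selected ensemble
```

Because a tree is kept only when it improves the ensemble score, the final ensemble can never score worse than the single best tree on the data used for selection.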
Details
| Package: | OTE | 
| Type: | Package | 
| Version: | 1.0.1 | 
| Date: | 2020-04-18 | 
| License: | GPL-3 | 
Author(s)
Zardad Khan, Asma Gul, Aris Perperoglou, Osama Mahmoud, Werner Adler, Miftahuddin and Berthold Lausen
Maintainer: Zardad Khan <zardadkhan@awkum.edu.pk>
References
Khan, Z., Gul, A., Perperoglou, A., Miftahuddin, M., Mahmoud, O., Adler, W., & Lausen, B. (2019). Ensemble of optimal trees, random forest and random projection ensemble classification. Advances in Data Analysis and Classification, 1-20.
Exploring Relationships in Body Dimensions
Description
The Body data set consists of 507 observations on 24 predictor variables: age, weight, height and 21 body dimensions. The 507 observations are on individuals, 247 men and 260 women, mostly in their twenties and thirties, with a small number of older people. The class variable is gender, with two categories, male and female.
Usage
data(Body)
Format
A data frame with 507 observations recorded on the following 25 variables.
- Biacrom
- The diameter of Biacrom taken in centimeter. 
- Biiliac
- "Pelvic breadth" measured in centimeter. 
- Bitro
- Bitrochanteric whole diameter measured in centimeter. 
- ChestDp
- The depth of Chest of a person in centimeter between sternum and spine at nipple level. 
- ChestD
- The diameter of Chest of a person in centimeter at nipple level. 
- ElbowD
- The sum of diameters of two Elbows in centimeter. 
- WristD
- Sum of two Wrists diameters in centimeter. 
- KneeD
- The sum of the diameters of two Knees in centimeter. 
- AnkleD
- The sum of the diameters of two Ankles in centimeter. 
- ShoulderG
- The wideness of shoulder in centimeter. 
- ChestG
- The circumference of the chest in centimeter, taken at nipple line for males and just above breast tissue for females. 
- WaistG
- The circumference of Waist in centimeter taken as the average of contracted and relaxed positions at the narrowest part. 
- AbdG
- Girth of Abdomen in centimeter at umbilicus and iliac crest, where iliac crest is taken as a landmark. 
- HipG
- Girth of Hip in centimeter at level of bitrochanteric diameter. 
- ThighG
- Average of left and right Thigh girths in centimeter below gluteal fold. 
- BicepG
- Average of left and right Bicep girths in centimeter. 
- ForearmG
- Average of left and right Forearm girths, extended, palm up. 
- KneeG
- Average of left and right Knees girths over patella, slightly flexed position. 
- CalfG
- Average of right and left Calf maximum girths. 
- AnkleG
- Average of right and left Ankle minimum girths. 
- WristG
- Average of left and right minimum circumferences of Wrists. 
- Age
- Age in years 
- Weight
- Weight in kilogram 
- Height
- Height in centimeter 
- Gender
- Binary response with two categories; 1 - male, 0 - female 
Source
Heinz, G., Peterson, L.J., Johnson, R.W. and Kerk, C.J. (2003), “Exploring Relationships in Body Dimensions”, Journal of Statistics Education, 11.
References
Hurley, C. (2012), “gclus: Clustering Graphics”, R package version 1.3.1, https://CRAN.R-project.org/package=gclus.
Examples
data(Body)
str(Body)
Radial Velocity of Galaxy NGC7531
Description
This data set records the radial velocity of a spiral galaxy measured at 323 points in the area of sky that it covers. The positions of the measurements, which lie along seven slots crossing at the origin, are described by 4 variables.
Usage
data(Galaxy)
Format
A data frame with 324 observations recorded on the following 5 variables.
- east.west
- It is the east-west coordinate where east is taken as negative, west is taken as positive and origin, (0,0), is close to the center of galaxy. 
- north.south
- It is the north-south coordinate where south is taken as negative, north is taken as positive and origin, (0,0), is near the center of galaxy. 
- angle
- It is the angle, in degrees of counter-clockwise rotation from the horizontal, of the slot in which the observation lies. 
- radial.position
- It is the signed distance from the center, (0,0), which is signed as negative if the east-west coordinate is negative. 
- velocity
- This is the response variable denoting the radial velocity (km/sec) of the galaxy. 
Source
Buta, R. (1987), “The Structure and Dynamics of Ringed Galaxies, III: Surface Photometry and Kinematics of the Ringed Nonbarred Spiral NGC7531” The Astrophysical J. Supplement Ser. 64. 1–37.
Examples
data(Galaxy)
str(Galaxy)
Train the ensemble of optimal trees for classification.
Description
This function selects optimal trees for classification from a total of t.initial trees grown by random forest. Number of trees in the initial set, t.initial, is specified by the user. If not specified then the default t.initial = 1000 is used.
Usage
OTClass(XTraining, YTraining, method=c("oob+independent","oob","sub-sampling"),
p = 0.1,t.initial = NULL,nf = NULL, ns = NULL, info = TRUE)
Arguments
| XTraining | An  | 
| YTraining | A vector of length  | 
| method | Method used in the selection of optimal trees.  | 
| p | Percent of the best  | 
| t.initial | Size of the initial set of classification trees. | 
| nf | Number of features to be sampled for spliting the nodes of the trees. If equal to  | 
| ns | Node size: Minimal number of samples in the nodes. If equal to  | 
| info | If  | 
Details
Larger values of t.initial are recommended for better performance, as far as the available computational resources allow.
Value
A trained object consisting of the selected trees.
Note
Missing values must be dealt with before calling the function, as the current version cannot handle them.
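One simple way to deal with missing values beforehand is to drop incomplete rows with base R before splitting the data into predictors and response. The sketch below uses a small made-up data frame (`toy`, with columns `x1`, `x2` and `y`, all hypothetical); imputation may be preferable when many rows are affected.

```r
# Toy data with missing values (hypothetical, not part of the package)
toy <- data.frame(
  x1 = c(1.2, NA, 3.1, 4.0),
  x2 = c(0.5, 0.7, NA, 0.9),
  y  = factor(c(0, 1, 0, 1))
)

# Keep only rows with no missing value in any column
keep      <- complete.cases(toy)
toy_clean <- toy[keep, ]

# Predictors and response, ready to pass on as XTraining and YTraining
X <- toy_clean[, c("x1", "x2")]
Y <- toy_clean$y

anyNA(X)         # check that no missing values remain
nrow(toy_clean)  # number of rows kept
```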
Author(s)
Zardad Khan <zkhan@essex.ac.uk>
References
Khan, Z., Gul, A., Perperoglou, A., Miftahuddin, M., Mahmoud, O., Adler, W., & Lausen, B. (2019). Ensemble of optimal trees, random forest and random projection ensemble classification. Advances in Data Analysis and Classification, 1-20.
Liaw, A. and Wiener, M. (2002) “Classification and regression by random forest” R news. 2(3). 18–22.
See Also
Predict.OTClass, OTReg, OTProb
Examples
#load the data
  data(Body)
  data <- Body
#Divide the data into training and test parts
  set.seed(9123)
  n <- nrow(data)
  training <- sample(1:n,round(2*n/3))
  testing <- (1:n)[-training]
  X <- data[,1:24]
  Y <- data[,25]
#Train OTClass on the training data
  Opt.Trees <- OTClass(XTraining=X[training,],YTraining = Y[training],
  t.initial=200,method="oob+independent")
#Predict on test data
  Prediction <- Predict.OTClass(Opt.Trees, X[testing,],YTesting=Y[testing])
#Objects returned
  names(Prediction)
  Prediction$Confusion.Matrix
  Prediction$Estimated.Class
Train the ensemble of optimal trees for class membership probability estimation.
Description
This function selects optimal trees for class membership probability estimation from a total of t.initial trees grown by random forest. Number of trees in the initial set, t.initial, is specified by the user. If not specified then the default t.initial = 1000 is used. 
Usage
OTProb(XTraining, YTraining, p = 0.2, t.initial = NULL,
      nf = NULL, ns = NULL, info = TRUE)
Arguments
| XTraining | An  | 
| YTraining | A vector of length  | 
| p | Percent of the best  | 
| t.initial | Size of the initial set of probability estimation trees. | 
| nf | Number of features to be sampled for spliting the nodes of the trees. If equal to  | 
| ns | Node size: Minimal number of samples in the nodes. If equal to  | 
| info | If  | 
Details
Larger values of t.initial are recommended for better performance, as far as the available computational resources allow.
Value
A trained object consisting of the selected trees.
Note
Missing values must be dealt with before calling the function, as the current version cannot handle them.
Author(s)
Zardad Khan <zkhan@essex.ac.uk>
References
Khan, Z., Gul, A., Perperoglou, A., Miftahuddin, M., Mahmoud, O., Adler, W., & Lausen, B. (2019). Ensemble of optimal trees, random forest and random projection ensemble classification. Advances in Data Analysis and Classification, 1-20.
Liaw, A. and Wiener, M. (2002) “Classification and regression by random forest” R news. 2(3). 18–22.
See Also
Predict.OTProb, OTReg, OTClass
Examples
#load the data
  data(Body)
  data <- Body
  
#Divide the data into training and test parts
  set.seed(9123) 
  n <- nrow(data)
  training <- sample(1:n,round(2*n/3))
  testing <- (1:n)[-training]
  X <- data[,1:24]
  Y <- data[,25]
  
#Train OTProb on the training data
  Opt.Trees <- OTProb(XTraining=X[training,],YTraining = Y[training],t.initial=200)
  
#Predict on test data
  Prediction <- Predict.OTProb(Opt.Trees, X[testing,],YTesting=Y[testing])
  
#Objects returned
  names(Prediction)
  Prediction$Brier.Score
  Prediction$Estimated.Probabilities
Train the ensemble of optimal trees for regression.
Description
This function selects optimal trees for regression from a total of t.initial trees grown by random forest. Number of trees in the initial set, t.initial, is specified by the user. If not specified then the default t.initial = 1000 is used.
Usage
OTReg(XTraining, YTraining, p = 0.2, t.initial = NULL,
      nf = NULL, ns = NULL, info = TRUE)
Arguments
| XTraining | An  | 
| YTraining | A vector of length  | 
| p | Percent of the best  | 
| t.initial | Size of the initial set of regression trees. | 
| nf | Number of features to be sampled for spliting the nodes of the trees. If equal to  | 
| ns | Node size: Minimal number of samples in the nodes. If equal to  | 
| info | If  | 
Details
Larger values of t.initial are recommended for better performance, as far as the available computational resources allow.
Value
A trained object consisting of the selected trees for regression.
Note
Missing values must be dealt with before calling the function, as the current version cannot handle them.
Author(s)
Zardad Khan <zkhan@essex.ac.uk>
References
Khan, Z., Gul, A., Perperoglou, A., Miftahuddin, M., Mahmoud, O., Adler, W., & Lausen, B. (2019). Ensemble of optimal trees, random forest and random projection ensemble classification. Advances in Data Analysis and Classification, 1-20.
Liaw, A. and Wiener, M. (2002) “Classification and regression by random forest” R news. 2(3). 18–22.
See Also
Predict.OTReg, OTProb, OTClass
Examples
# Load the data
  data(Galaxy)
  data <- Galaxy
  
#Divide the data into training and test parts
  set.seed(9123) 
  n <- nrow(data)
  training <- sample(1:n,round(2*n/3))
  testing <- (1:n)[-training]
  X <- data[,1:4]
  Y <- data[,5]
  
#Train OTReg on the training data
  Opt.Trees <- OTReg(XTraining=X[training,],YTraining = Y[training],t.initial=200)
  
#Predict on test data
  Prediction <- Predict.OTReg(Opt.Trees, X[testing,],YTesting=Y[testing])
  
#Objects returned
  names(Prediction)
  Prediction$Unexp.Variations
  Prediction$Pr.Values
  Prediction$Trees.Used
Prediction function for the object returned by OTClass
Description
This function provides prediction for test data on the trained OTClass object for classification.
Usage
Predict.OTClass(Opt.Trees, XTesting, YTesting)
Arguments
| Opt.Trees | An object of class  | 
| XTesting | An  | 
| YTesting | Optional. A vector of length  | 
Value
A list with values
| Error.Rate | Error rate of the classifier for the observations in XTesting. | 
| Confusion.Matrix | Confusion matrix based on the estimated class labels and the true class labels. | 
| Estimated.Class | A vector of length  | 
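To make the relationship between the confusion matrix and the error rate concrete, the following base R sketch computes both from two short label vectors. The vectors `truth` and `pred` are made-up values for illustration, and the layout of the table (predicted classes in rows) is an assumption, not necessarily the layout the package returns.

```r
# Hypothetical true and predicted class labels (1 = male, 0 = female)
truth <- factor(c(1, 0, 1, 1, 0, 0, 1, 0), levels = c(0, 1))
pred  <- factor(c(1, 0, 0, 1, 0, 1, 1, 0), levels = c(0, 1))

# Confusion matrix: predicted classes in rows, true classes in columns
cm <- table(Predicted = pred, True = truth)

# Error rate: share of observations off the diagonal
err <- 1 - sum(diag(cm)) / sum(cm)

cm
err
```

The same error rate can be obtained directly as `mean(pred != truth)`.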
Author(s)
Zardad Khan <zkhan@essex.ac.uk>
References
Khan, Z., Gul, A., Perperoglou, A., Miftahuddin, M., Mahmoud, O., Adler, W., & Lausen, B. (2019). Ensemble of optimal trees, random forest and random projection ensemble classification. Advances in Data Analysis and Classification, 1-20.
Liaw, A. and Wiener, M. (2002) “Classification and regression by random forest” R news. 2(3). 18–22.
See Also
Examples
#load the data
  data(Body)
  data <- Body
#Divide the data into training and test parts
  set.seed(9123)
  n <- nrow(data)
  training <- sample(1:n,round(2*n/3))
  testing <- (1:n)[-training]
  X <- data[,1:24]
  Y <- data[,25]
#Train OTClass on the training data
  Opt.Trees <- OTClass(XTraining=X[training,],YTraining = Y[training],
  t.initial=200, method="oob+independent")
#Predict on test data
  Prediction <- Predict.OTClass(Opt.Trees, X[testing,],YTesting=Y[testing])
#Objects returned
  names(Prediction)
  Prediction$Confusion.Matrix
  Prediction$Estimated.Class
Prediction function for the object returned by OTProb
Description
This function provides prediction for test data on the trained OTProb object for class membership probability estimation. 
Usage
Predict.OTProb(Opt.Trees, XTesting, YTesting)
Arguments
| Opt.Trees | An object of class  | 
| XTesting | An  | 
| YTesting | Optional. A vector of length  | 
Value
A list with values
| Brier.Score | Brier Score based on the estimated probabilities and true class label in YTesting. | 
| Estimated.Probabilities | A vector of length  | 
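For a binary response, the Brier score is the mean squared difference between the estimated probability of class 1 and the observed 0/1 label; lower is better. A minimal base R sketch with made-up values (`y` and `p` are hypothetical):

```r
# Hypothetical observed labels and estimated class-1 probabilities
y <- c(1, 0, 1, 1, 0)
p <- c(0.9, 0.2, 0.6, 0.7, 0.1)

# Brier score: mean squared error of the probabilities
brier <- mean((p - y)^2)
brier
```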
Author(s)
Zardad Khan <zkhan@essex.ac.uk>
References
Khan, Z., Gul, A., Perperoglou, A., Miftahuddin, M., Mahmoud, O., Adler, W., & Lausen, B. (2019). Ensemble of optimal trees, random forest and random projection ensemble classification. Advances in Data Analysis and Classification, 1-20.
Liaw, A. and Wiener, M. (2002) “Classification and regression by random forest” R news. 2(3). 18–22.
See Also
Examples
#load the data
  data(Body)
  data <- Body
  
#Divide the data into training and test parts
  set.seed(9123) 
  n <- nrow(data)
  training <- sample(1:n,round(2*n/3))
  testing <- (1:n)[-training]
  X <- data[,1:24]
  Y <- data[,25]
  
#Train OTProb on the training data
  Opt.Trees <- OTProb(XTraining=X[training,],YTraining = Y[training],t.initial=200)
  
#Predict on test data
  Prediction <- Predict.OTProb(Opt.Trees, X[testing,],YTesting=Y[testing])
  
#Objects returned
  names(Prediction)
  Prediction$Brier.Score
  Prediction$Estimated.Probabilities
  
Prediction function for the object returned by OTReg
Description
This function provides prediction for test data on the trained OTReg object for the continuous response variable.
Usage
Predict.OTReg(Opt.Trees, XTesting, YTesting)
Arguments
| Opt.Trees | An object of class  | 
| XTesting | An  | 
| YTesting | Optional. A vector of length  | 
Value
A list with values
| Unexp.Variations | Unexplained variations based on estimated response and given response. | 
| Pr.Values | A vector of length  | 
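The manual does not spell out how the unexplained variation is computed. One common definition, assumed here purely for illustration, is the mean squared prediction error divided by the variance of the observed response, so that values near 0 indicate a good fit and values near 1 indicate predictions no better than the mean. The vectors `y` and `yhat` below are made up.

```r
# Hypothetical observed and predicted responses
y    <- c(10, 12, 9, 15, 14)
yhat <- c(11, 11, 10, 14, 13)

# Mean squared prediction error
mse <- mean((y - yhat)^2)

# Assumed definition: MSE relative to the sample variance of y
# (var() uses the n - 1 denominator)
unexplained <- mse / var(y)

mse
unexplained
```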
Author(s)
Zardad Khan <zkhan@essex.ac.uk>
References
Khan, Z., Gul, A., Perperoglou, A., Miftahuddin, M., Mahmoud, O., Adler, W., & Lausen, B. (2019). Ensemble of optimal trees, random forest and random projection ensemble classification. Advances in Data Analysis and Classification, 1-20.
Liaw, A. and Wiener, M. (2002) “Classification and regression by random forest” R news. 2(3). 18–22.
See Also
Examples
# Load the data
  data(Galaxy)
  data <- Galaxy
  
#Divide the data into training and test parts
  set.seed(9123) 
  n <- nrow(data)
  training <- sample(1:n,round(2*n/3))
  testing <- (1:n)[-training]
  X <- data[,1:4]
  Y <- data[,5]
  
#Train OTReg on the training data
  Opt.Trees <- OTReg(XTraining=X[training,],YTraining = Y[training],t.initial=200)
  
#Predict on test data
  Prediction <- Predict.OTReg(Opt.Trees, X[testing,],YTesting=Y[testing])
  
#Objects returned
  names(Prediction)
  Prediction$Unexp.Variations
  Prediction$Pr.Values
  Prediction$Trees.Used