| Type: | Package | 
| Title: | Collapses Levels, Computes Information Value and WoE | 
| Version: | 0.3.0 | 
| Author: | Krishanu Mukherjee | 
| Maintainer: | Krishanu Mukherjee <toton1181@gmail.com> | 
| Description: | Contains functions to help in selecting and exploring features ( or variables ) in binary classification problems. Provides functions to compute and display information value and weight of evidence (WoE) of the variables , and to convert numeric variables to categorical variables by binning. Functions are also provided to determine which levels ( or categories ) of a categorical variable can be collapsed (or combined ) based on their response rates. The functions provided only work for binary classification problems. | 
| License: | GPL-2 | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| RoxygenNote: | 7.1.0 | 
| Imports: | dplyr,lazyeval, ggplot2 | 
| Depends: | magrittr | 
| Suggests: | knitr, rmarkdown | 
| VignetteBuilder: | knitr | 
| NeedsCompilation: | no | 
| Packaged: | 2020-06-04 13:09:20 UTC; User | 
| Repository: | CRAN | 
| Date/Publication: | 2020-06-04 13:20:02 UTC | 
German Credit data set
Description
This data set classifies customers as "Good" or "Bad" as per their credit risks.This data set was contributed by Professor Dr. Hans Hofmann,and can be downloaded from the UCI Machine Learning Repository.
Usage
data("German_Credit")Format
A data frame with 1000 observations on the following 21 variables.
- Account_Balance
- a factor with levels - A11- A12- A13- A14
- Duration
- a numeric vector 
- Credit_History
- a factor with levels - A30- A31- A32- A33- A34
- Purpose
- a factor with levels - A40- A41- A410- A42- A43- A44- A45- A46- A48- A49
- Credit_Amount
- a numeric vector 
- Saving_Accounts_Bonds
- a factor with levels - A61- A62- A63- A64- A65
- Current_Employment_Length
- a factor with levels - A71- A72- A73- A74- A75
- Installment_Rate
- a numeric vector 
- MaritalStatusnGender
- a factor with levels - A91- A92- A93- A94
- Guarantors
- a factor with levels - A101- A102- A103
- ‘Duration in Current Address’
- a numeric vector 
- Valuable_Asset
- a factor with levels - A121- A122- A123- A124
- Age
- a numeric vector 
- Other_Credit
- a factor with levels - A141- A142- A143
- Housing
- a factor with levels - A151- A152- A153
- Existing_Credits
- a numeric vector 
- Job
- a factor with levels - A171- A172- A173- A174
- Dependents
- a numeric vector 
- Telephone
- a factor with levels - A191- A192
- ForeignWorker
- a factor with levels - A201- A202
- Good_Bad
- a numeric vector 
Source
https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data)
Examples
data(German_Credit)
str(German_Credit)
IVCalc
Description
This function displays the Information Values by the levels of an attribute This information is displayed for all attributes in the data set
Usage
IVCalc(dset, resp = "y", bins = 10, adjFactor = 0.5)
Arguments
| dset | The data frame containing the data set | 
| resp | A character respresenting the name of the binary outcome variable The binary outcome variable may be a factor with two levels or an integer (or numeric ) with two unique values | 
| bins | A number denoting the number of bins.Default value is 10 | 
| adjFactor | A number or a decimal denoting what is to be added to the number of responses (binary outcome variable is 1 ) or to the number of non responses (binary outcome variable is 0) if either is zero for any level of the attribute | 
Value
A list containing the tables of Information Values by levels for every attribute
Examples
# Load the German_Credit data set supplied with this package
data("German_Credit")
l<-list()
# Call the function as follows
l<-IVCalc(German_Credit,resp="Good_Bad",bins=10)
# Information Value for  the attribute Account_Balance in the German_Credit data
l$Account_Balance
IVCalc2
Description
This function displays the Information Values of all the attributes in the data set
Usage
IVCalc2(dset, resp = "y", bins = 10, adjFactor = 0.5)
Arguments
| dset | The data frame containing the data set | 
| resp | A character respresenting the name of the binary outcome variable The binary outcome variable may be a factor with two levels or an integer (or numeric ) with two unique values | 
| bins | A number denoting the number of bins.Default value is 10 | 
| adjFactor | A number or a decimal denoting what is to be added to the number of responses (binary outcome variable is 1 ) or to the number of non responses (binary outcome variable is 0) if either is zero for any level of the attribute | 
Value
A data frame containing the Information Values for every attribute
Examples
# Load the German_Credit data set supplied with this package
data("German_Credit")
d<-data.frame()
# Call the function as follows
d<-IVCalc2(German_Credit,resp="Good_Bad",bins=10)
# Information Value for all the attributes in the German_Credit data
d
displayIV
Description
This function displays the Information Values of the levels of an attribute.
Usage
displayIV(dset, col = "xyz", resp = "y", adjFactor = 0.5, bins = 10)
Arguments
| dset | The data frame containing the data set | 
| col | A character respresenting the name of the attribute . The attribute can either be numeric or categorical | 
| resp | A character respresenting the name of the binary outcome variable The binary outcome variable may be a factor with two levels or an integer (or numeric ) with two unique values | 
| adjFactor | A number or a decimal denoting what is to be added to the number of responses (binary outcome variable is 1 ) or to the number of non responses (binary outcome variable is 0) if either is zero for any level of the attribute | 
| bins | A number denoting the number of bins.Default value is 10 | 
Examples
# Load the German_Credit data set supplied with this package
data("German_Credit")
displayIV(German_Credit,col="Credit_History",resp="Good_Bad")
displayResponseRatebyLevels
Description
This function displays the response percents of the levels of an attribute.
Usage
displayResponseRatebyLevels(
  dset,
  col = "job",
  resp = "Good_Bad",
  bins = 10,
  adjFactor = 0.5
)
Arguments
| dset | The data frame containing the data set | 
| col | A character respresenting the name of the attribute . The attribute can either be numeric or categorical | 
| resp | A character respresenting the name of the binary outcome variable The binary outcome variable may be a factor with two levels or an integer (or numeric ) with two unique values | 
| bins | A number denoting the number of bins.Default value is 10 | 
| adjFactor | A number or a decimal denoting what is to be added to the number of responses (binary outcome variable is 1 ) or to the number of non responses (binary outcome variable is 0) if either is zero for any level of the attribute | 
Examples
# Load the German_Credit data set supplied with this package
data("German_Credit")
displayResponseRatebyLevels(German_Credit,col="Credit_History",resp="Good_Bad")
displayWOE
Description
This function displays the Weight of Evidence of the levels of an attribute.
Usage
displayWOE(dset, col = "xyz", resp = "y", adjFactor = 0.5, bins = 10)
Arguments
| dset | The data frame containing the data set | 
| col | A character respresenting the name of the attribute . The attribute can either be numeric or categorical | 
| resp | A character respresenting the name of the binary outcome variable The binary outcome variable may be a factor with two levels or an integer (or numeric ) with two unique values | 
| adjFactor | A number or a decimal denoting what is to be added to the number of responses (binary outcome variable is 1 ) or to the number of non responses (binary outcome variable is 0) if either is zero for any level of the attribute | 
| bins | A number denoting the number of bins.Default value is 10 | 
Examples
# Load the German_Credit data set supplied with this package
data("German_Credit")
displayWOE(German_Credit,col="Credit_History",resp="Good_Bad")
levelsCollapser
Description
This function displays the response rates by the levels of an attribute Levels with similar response rates may be combined
Usage
levelsCollapser(dset, resp = "y", bins = 10)
Arguments
| dset | The data frame containing the data set | 
| resp | A character respresenting the name of the binary outcome variable The binary outcome variable may be a factor with two levels or an integer (or numeric ) with two unique values | 
| bins | A number denoting the number of bins.Default value is 10 | 
Value
A list containing the tables of response rate by levels for every attribute
Examples
# Load the German_Credit data set supplied with this package
data("German_Credit")
# Create an empty list
l<-list()
# Call the function as follows
l<-levelsCollapser(German_Credit,resp="Good_Bad",bins=10)
# response rate by levels of the Account_Balance in the German_Credit data
l$Account_Balance
# Collapse levels with similar response percentages.
numericToCategorical
Description
This function categorizes a numerical variable by binning
Usage
numericToCategorical(dset, col = "job", resp = "y", bins = 10, adjFactor = 0.5)
Arguments
| dset | The data frame containing the data set | 
| col | A character respresenting the name of the numeric attribute which we want to categorize | 
| resp | A character respresenting the name of the binary outcome variable The binary outcome variable may be a factor with two levels or an integer (or numeric ) with two unique values | 
| bins | A number denoting the number of bins.Default value is 10 | 
| adjFactor | A number or a decimal denoting what is to be added to the number of responses (binary outcome variable is 1 ) or to the number of non responses (binary outcome variable is 0) if either is zero for any level of the attribute | 
Value
A list containing the categorized attribute,a table of Information Values for the levels of the categorized attribute,the Information Value for the entire attribute,a table showing the response rates of the levels of the categorized attribute
Examples
# Load the German_Credit data set supplied with this package
data("German_Credit")
# Create an empty list
l<-list()
# Call the function as follows.
#This will categorize the numeric variable Duration in the German_Credit dataset.
l<-numericToCategorical(German_Credit,col="Duration",resp="Good_Bad")
# To view the categorized variable
 l$categoricalVariable
 # To view the IV table of the levels of the categorized variable
 l$IVTable
 # To view the total IV value of the  categorized variable
 l$IV
 # To view the response rates of the levels of the categorized variable
 l$collapseLevels