| Title: | Generate Suggestions for Validation Rules | 
| Version: | 0.3.2 | 
| Description: | Generate suggestions for validation rules from a reference data set, which can be used as a starting point for domain specific rules to be checked with package 'validate'. | 
| License: | MIT + file LICENSE | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| RoxygenNote: | 7.2.3 | 
| Imports: | validate, whisker, rpart | 
| URL: | https://github.com/data-cleaning/validatesuggest | 
| BugReports: | https://github.com/data-cleaning/validatesuggest/issues | 
| Depends: | R (≥ 2.10) | 
| Suggests: | knitr, rmarkdown, tinytest | 
| VignetteBuilder: | knitr | 
| NeedsCompilation: | no | 
| Packaged: | 2023-10-06 10:24:39 UTC; edwin | 
| Author: | Edwin de Jonge | 
| Maintainer: | Edwin de Jonge <edwindjonge@gmail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2023-10-06 16:40:02 UTC | 
validatesuggest: Generate Suggestions for Validation Rules
Description
Generate suggestions for validation rules from a reference data set, which can be used as a starting point for domain specific rules to be checked with package 'validate'.
validatesuggest
The goal of validatesuggest is to generate suggestions for validation rules from a supplied dataset. These can be used as a starting point for a rule set and are to be adjusted by domain experts.
Author(s)
Maintainer: Edwin de Jonge <edwindjonge@gmail.com>
Authors:
- Olav ten Bosch 
See Also
Useful links:
- Report bugs at https://github.com/data-cleaning/validatesuggest/issues 
Car owners data set (fictitious).
Description
A constructed data set useful for detecting conditional dependencies.
Usage
car_owner
Format
A data frame with 200 rows and 4 variables. Each row is a person with:
- age: age of person
- driver_license: has a driver license; only persons older than 17 can have a license in this data set
- income: monthly income
- owns_car: only persons with a driver license and a monthly income > 1500 can own a car
- car_color: NA when there is no car
Examples
data("car_owner")
rules <- suggest_cond_rule(car_owner)
rules$rules
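Because the suggested rules are returned as a validate::validator() object, they can be checked against data with the validate package. The following is a minimal sketch, assuming both packages are installed; confront() and summary() come from validate, not from validatesuggest.

```r
library(validate)
library(validatesuggest)

data("car_owner")
rules <- suggest_cond_rule(car_owner)

# Confront the reference data with the rules that were suggested from it.
# Since the rules were derived from this data set, few or no failures
# are to be expected here.
out <- confront(car_owner, rules)
summary(out)
```

In practice a domain expert would first review and adjust the suggested rules and then confront new data with them, rather than the reference data itself.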
Suggest rules
Description
Suggests rules using the various suggestion checks.
Use the more specific suggest functions for more control.
Usage
suggest_rules(
  d,
  vars = names(d),
  domain_check = TRUE,
  range_check = TRUE,
  pos_check = TRUE,
  type_check = TRUE,
  na_check = TRUE,
  unique_check = TRUE,
  ratio_check = TRUE,
  conditional_rule = TRUE
)
suggest_all(
  d,
  vars = names(d),
  domain_check = TRUE,
  range_check = TRUE,
  pos_check = TRUE,
  type_check = TRUE,
  na_check = TRUE,
  unique_check = TRUE,
  ratio_check = TRUE,
  conditional_rule = TRUE
)
write_all_suggestions(
  d,
  vars = names(d),
  file = stdout(),
  domain_check = TRUE,
  range_check = TRUE,
  type_check = TRUE,
  pos_check = TRUE,
  na_check = TRUE,
  unique_check = TRUE,
  ratio_check = TRUE,
  conditional_rule = TRUE
)
Arguments
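| d | data frame used to generate the suggestions | 
| vars | names of the variables in d to use; defaults to all columns | 
| domain_check | if TRUE, include the domain check | 
| range_check | if TRUE, include the range check | 
| pos_check | if TRUE, include the positivity check | 
| type_check | if TRUE, include the type check | 
| na_check | if TRUE, include the completeness check | 
| unique_check | if TRUE, include the uniqueness check | 
| ratio_check | if TRUE, include the ratio check | 
| conditional_rule | if TRUE, include the conditional rules | 
| file | file to which the checks will be written. | 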
Value
Returns a validate::validator() object with the suggested rules.
write_all_suggestions writes the rules to file and invisibly returns a named list of ranges for each variable.
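As a hedged usage sketch of the signatures above: individual checks can be switched off through their logical arguments, and the suggestions can be written to a file for review. The variable `rule_file` is a hypothetical temporary path chosen for this illustration; passing a file path (rather than a connection) as `file` is an assumption.

```r
library(validatesuggest)

data("car_owner")

# Keep only the type, range and conditional suggestions;
# all other checks are switched off.
rules <- suggest_rules(
  car_owner,
  domain_check = FALSE,
  pos_check    = FALSE,
  na_check     = FALSE,
  unique_check = FALSE,
  ratio_check  = FALSE
)
rules

# Write the suggestions to a file so that a domain expert can edit them.
# `rule_file` is a hypothetical path used only in this example.
rule_file <- tempfile(fileext = ".R")
write_all_suggestions(car_owner, file = rule_file)
readLines(rule_file)
```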
task2 dataset
Description
Fictitious test data set from the European (ESSnet) project on validation, 2017.
Usage
task2
Format
- ID: ID
- Age: Age of person
- Married: Marital status
- Employed: Employed or not
- Working_hours: Working hours
References
European (ESSnet) project on validation 2017
Suggest a conditional rule
Description
Suggest a conditional rule based on an association rule. This function derives conditional rules from the non-existence of combinations of categories in pairs of variables. For each numerical variable, a logical variable is derived that tests for positivity. The function generates IF THEN rules based on two variables.
Usage
write_cond_rule(d, vars = names(d), file = stdout())
suggest_cond_rule(d, vars = names(d))
Arguments
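| d | data frame used to generate the suggestions | 
| vars | names of the variables in d to use; defaults to all columns | 
| file | file to which the checks will be written. | 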
Value
suggest_cond_rule returns a validate::validator() object with the suggested rules.
write_cond_rule invisibly returns a named list of ranges for each variable.
Examples
data(retailers, package="validate")
# will generate check for all columns in retailers that are
# complete.
suggest_na_check(retailers)
data("car_owner")
rules <- suggest_cond_rule(car_owner)
rules$rules
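The idea in the description (look for category combinations that never occur in a pair of variables, after turning numeric variables into positivity indicators) can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation; the data frame `d` and the form of the generated rule text are hypothetical.

```r
# Hypothetical toy data, constructed only for this illustration.
d <- data.frame(
  license  = c(TRUE, TRUE, FALSE, FALSE, TRUE),
  owns_car = c(TRUE, FALSE, FALSE, FALSE, TRUE),
  income   = c(2000, 1200, 0, 0, 3000)
)

# Derive a logical "is positive" variable for each numeric column.
is_num <- vapply(d, is.numeric, logical(1))
d[is_num] <- lapply(d[is_num], function(x) x > 0)

# For every pair of variables, collect the combinations that never occur.
pairs <- combn(names(d), 2, simplify = FALSE)
missing_combos <- do.call(rbind, lapply(pairs, function(p) {
  tab  <- as.data.frame(table(d[p]), stringsAsFactors = FALSE)
  zero <- tab[tab$Freq == 0, , drop = FALSE]
  if (nrow(zero) == 0) return(NULL)
  data.frame(var1 = p[1], value1 = zero[[1]],
             var2 = p[2], value2 = zero[[2]],
             stringsAsFactors = FALSE)
}))

# A combination that never occurs translates into an IF THEN rule.
rule_text <- with(missing_combos,
  sprintf("if (%s == %s) %s != %s", var1, value1, var2, value2))
rule_text
```

In this toy data nobody without a license owns a car, so one of the generated rules states that a person without a license does not own a car. Whether such a rule reflects a real constraint or only a coincidence in the sample is for a domain expert to decide.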
Suggest a domain check
Description
Suggest a domain check
Usage
write_domain_check(d, vars = names(d), only_positive = TRUE, file = stdout())
suggest_domain_check(d, vars = names(d), only_positive = TRUE)
Arguments
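| d | data frame used to generate the suggestions | 
| vars | names of the variables in d to use; defaults to all columns | 
| only_positive | if  | 
| file | file to which the checks will be written. | 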
Value
suggest_domain_check returns a validate::validator() object with the suggested rules.
write_domain_check invisibly returns a named list of checks for each variable.
Examples
data(SBS2000, package="validate")
# suggests domain checks for the variables in SBS2000
suggest_domain_check(SBS2000)
Suggest a check for completeness.
Description
Suggest a check for completeness.
Usage
write_na_check(d, vars = names(d), file = stdout())
suggest_na_check(d, vars = names(d))
Arguments
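| d | data frame used to generate the suggestions | 
| vars | names of the variables in d to use; defaults to all columns | 
| file | file to which the checks will be written. | 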
Value
suggest_na_check returns a validate::validator() object with the suggested rules.
write_na_check writes the rules to file and invisibly returns a named list of ranges for each variable.
Examples
data(retailers, package="validate")
# will generate check for all columns in retailers that are
# complete.
suggest_na_check(retailers)
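The selection step described in the comment above (only columns without missing values get a check) can be sketched in base R. This is an illustration, not the package's implementation; the data frame `d` and the `!is.na()` form of the rule text are assumptions made for the example.

```r
# Hypothetical toy data, constructed only for this illustration.
d <- data.frame(
  id       = 1:4,
  size     = c("a", NA, "b", "c"),
  staff    = c(10, 5, NA, 7),
  turnover = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Columns that are complete in the reference data.
complete <- names(d)[colSums(is.na(d)) == 0]

# One completeness rule per complete column (assumed rule form).
rule_text <- sprintf("!is.na(%s)", complete)
rule_text
```

Columns that already contain missing values in the reference data are skipped, since a completeness rule for them would fail on the very data it was derived from.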
Suggest a positivity check
Description
Suggest a positivity check
Usage
write_pos_check(d, vars = names(d), only_positive = TRUE, file = stdout())
suggest_pos_check(d, vars = names(d), only_positive = TRUE)
Arguments
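| d | data frame used to generate the suggestions | 
| vars | names of the variables in d to use; defaults to all columns | 
| only_positive | if  | 
| file | file to which the checks will be written. | 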
Value
suggest_pos_check returns a validate::validator() object with the suggested rules.
write_pos_check writes the rules to file and invisibly returns a named list of checks for each variable.
Examples
data(SBS2000, package="validate")
# suggests positivity checks for the variables in SBS2000
suggest_pos_check(SBS2000)
Suggest a range check
Description
Suggest a range check
Usage
write_range_check(d, vars = names(d), min = TRUE, max = FALSE, file = stdout())
suggest_range_check(d, vars = names(d), min = TRUE, max = FALSE)
Arguments
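| d | data frame used to generate the suggestions | 
| vars | names of the variables in d to use; defaults to all columns | 
| min | if TRUE, suggest a check on the minimum value | 
| max | if TRUE, suggest a check on the maximum value | 
| file | file to which the checks will be written. | 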
Value
suggest_range_check returns a validate::validator() object with the suggested rules.
write_range_check writes the rules to file and invisibly returns a named list of ranges for each variable.
Examples
data(SBS2000, package="validate")
suggest_range_check(SBS2000)
# checks the ranges of each variable
suggest_range_check(SBS2000[-1], min=TRUE, max=TRUE)
# checks only the maximum of turnover and other.rev
suggest_range_check(SBS2000, vars=c("turnover", "other.rev"), min=FALSE, max=TRUE)
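To make the role of the `min` and `max` arguments concrete, here is a base R sketch that derives lower and upper bounds from the observed range of each numeric variable. It is an illustration only, not the package's implementation; the data frame `d`, the flags `check_min` and `check_max`, and the rule text format are hypothetical.

```r
# Hypothetical toy data, constructed only for this illustration.
d <- data.frame(
  staff    = c(9, 3, 7, 2),
  turnover = c(100, 250, 80, 400),
  size     = c("s", "m", "s", "l"),
  stringsAsFactors = FALSE
)

# Hypothetical flags playing the role of the `min` and `max` arguments.
check_min <- TRUE
check_max <- TRUE

# Only numeric variables have a range.
num_vars <- names(d)[vapply(d, is.numeric, logical(1))]

rule_text <- unlist(lapply(num_vars, function(v) {
  r <- range(d[[v]], na.rm = TRUE)
  c(if (check_min) sprintf("%s >= %s", v, r[1]),
    if (check_max) sprintf("%s <= %s", v, r[2]))
}))
rule_text
```

Bounds taken straight from the observed range are usually too tight for new data, which is why such suggestions serve as a starting point to be widened or corrected by a domain expert.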
Suggest ratio checks
Description
Suggest ratio checks
Usage
write_ratio_check(
  d,
  vars = names(d),
  file = stdout(),
  lin_cor = 0.95,
  digits = 2
)
suggest_ratio_check(d, vars = names(d), lin_cor = 0.95, digits = 2)
Arguments
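| d | data frame used to generate the suggestions | 
| vars | names of the variables in d to use; defaults to all columns | 
| file | file to which the checks will be written. | 
| lin_cor | threshold for abs correlation to be included (details) | 
| digits | number of digits for rounding | 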
Value
suggest_ratio_check returns a validate::validator() object with the suggested rules.
write_ratio_check writes the rules to file and invisibly returns a named list of checks for each variable.
Examples
data(SBS2000, package="validate")
# generates upper and lower checks for the
# ratio of two variables if their correlation is
# bigger than `lin_cor`
suggest_ratio_check(SBS2000, lin_cor=0.98)
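The mechanism described in the example comment (lower and upper bounds on the ratio of two variables whose absolute correlation exceeds `lin_cor`) can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the data frame `d` and the rule text format are hypothetical, and the sketch rounds the lower bound down and the upper bound up so that the reference data always satisfies the bounds, which may differ from how the package applies `digits`.

```r
# Hypothetical toy data, constructed only for this illustration.
d <- data.frame(
  turnover = c(100, 200, 300, 400, 500),
  costs    = c( 82, 158, 246, 320, 410),
  staff    = c(  9,   3,   7,   2,   8)
)
lin_cor <- 0.95
digits  <- 2

# Pairs of variables whose absolute correlation exceeds the threshold.
cors  <- abs(cor(d))
pairs <- which(cors > lin_cor & upper.tri(cors), arr.ind = TRUE)

mult <- 10^digits
rule_text <- unlist(lapply(seq_len(nrow(pairs)), function(i) {
  num   <- rownames(cors)[pairs[i, "row"]]
  den   <- colnames(cors)[pairs[i, "col"]]
  ratio <- d[[num]] / d[[den]]
  # Round outwards so the observed ratios stay within the bounds.
  lower <- floor(min(ratio) * mult) / mult
  upper <- ceiling(max(ratio) * mult) / mult
  c(sprintf("%s >= %s * %s", num, lower, den),
    sprintf("%s <= %s * %s", num, upper, den))
}))
rule_text
```

Only turnover and costs are strongly correlated in this toy data, so a single pair of bounds is generated; staff is ignored because its correlation with the other variables is weak.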
Suggest a type check
Description
Suggest a type check
Usage
write_type_check(d, vars = names(d), file = stdout())
suggest_type_check(d, vars = names(d))
Arguments
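| d | data frame used to generate the suggestions | 
| vars | names of the variables in d to use; defaults to all columns | 
| file | file to which the checks will be written. | 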
Value
suggest_type_check returns a validate::validator() object with the suggested rules.
write_type_check writes the rules to file and invisibly returns a named list of types for each variable.
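A type check amounts to recording the type of each column in the reference data and turning it into a rule. The following base R sketch illustrates this idea; it is not the package's implementation, and both the data frame `d` and the `is.<type>()` rule form are assumptions made for the example.

```r
# Hypothetical toy data, constructed only for this illustration.
d <- data.frame(
  age    = c(30, 41),
  name   = c("a", "b"),
  member = c(TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Record the (first) class of each column.
types <- vapply(d, function(x) class(x)[1], character(1))

# One type rule per column, using the matching base R predicate.
rule_text <- sprintf("is.%s(%s)", types, names(types))
rule_text
```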
Suggest uniqueness checks
Description
Suggest uniqueness checks
Usage
write_unique_check(d, vars = names(d), file = stdout(), fraction = 0.95)
suggest_unique_check(d, vars = names(d), fraction = 0.95)
Arguments
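| d | data frame used to generate the suggestions | 
| vars | names of the variables in d to use; defaults to all columns | 
| file | file to which the checks will be written. | 
| fraction | if values in a column >  | 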
Value
suggest_unique_check returns a validate::validator() object with the suggested rules.
write_unique_check writes the rules to file and invisibly returns a named list of checks for each variable.