| Type: | Package | 
| Title: | Sample Design, Drawing & Data Analysis Using Data Frames | 
| Version: | 0.2.4 | 
| Author: | Michael Baldassaro | 
| Maintainer: | Michael Baldassaro <mbaldassaro@gmail.com> | 
| Description: | Determine sample sizes, draw samples, and conduct data analysis using data frames. It specifically enables you to determine simple random sample sizes, stratified sample sizes, and complex stratified sample sizes using a secondary variable such as population; draw simple random samples and stratified random samples from sampling data frames; identify observations that are missing from a random sample, missing by strata, or duplicated within a dataset; and perform data analysis, including proportions, margins of error, and upper and lower bounds for simple, stratified, and cluster sample designs. | 
| License: | MIT + file LICENSE | 
| URL: | https://github.com/mbaldassaro/sampler | 
| BugReports: | https://github.com/mbaldassaro/sampler/issues | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| Imports: | dplyr, tidyr, reshape, purrr | 
| RoxygenNote: | 6.0.1 | 
| NeedsCompilation: | no | 
| Packaged: | 2019-09-15 15:28:31 UTC; mbaldassaro | 
| Repository: | CRAN | 
| Date/Publication: | 2019-09-15 15:40:02 UTC | 
Albania 2017 Election Results by Polling Station
Description
Data set containing 2017 Albania election results by polling station published by the Central Election Commission and opened by the Coalition of Domestic Observers & Democracy International.
Usage
albania
Format
A data frame with 5362 rows and 45 variables
- qarku: district, 12 in total
- Q_ID: geocode for district
- bashkia: municipality, 61 in total
- BAS_ID: geocode for municipality
- zaz: election area zone, 90 in total
- njesiaAdministrative: village, 373 in total
- COM_ID: geocode for village
- qvKod: polling station identifier
- zgjedhes: number of total registered voters
- meshkuj: number of male registered voters
- femra: number of female registered voters
- totalSeats: number of seats contested by district
- vendndodhja: name of polling center containing polling stations
- ambienti: type of polling center, 5 in total
- totalVoters: number of total registered voters that cast ballots
- femVoters: number of female registered voters that cast ballots
- maleVoters: number of male registered voters that cast ballots
- unusedBallots: number of ballots not used
- damagedBallots: number of ballots damaged
- ballotsCast: number of total ballots cast
- invalidVotes: number of ballots cast that were invalidated
- validVotes: number of valid ballots cast
- lsi: number of ballots cast for LSI
- ps: number of ballots cast for PS
- pkd: number of ballots cast for PKD
- sfida: number of ballots cast for SFIDA
- pr: number of ballots cast for PR
- pd: number of ballots cast for PD
- pbdksh: number of ballots cast for PBDKSH
- adk: number of ballots cast for ADK
- psd: number of ballots cast for PSD
- ad: number of ballots cast for AD
- frd: number of ballots cast for FRD
- pds: number of ballots cast for PDS
- pdiu: number of ballots cast for PDIU
- aak: number of ballots cast for AAK
- mega: number of ballots cast for MEGA
- pksh: number of ballots cast for PKSH
- apd: number of ballots cast for APD
- libra: number of ballots cast for LIBRA
- psSeats: number of seats won by PS
- pdSeats: number of seats won by PD
- lsiSeats: number of seats won by LSI
- pdiuSeats: number of seats won by PDIU
- psdSeats: number of seats won by PSD
Source
https://albaniaelectiondata.herokuapp.com/
Calculate proportion and margin of error (unequal-sized cluster sample)
Description
Calculate proportion and margin of error (unequal-sized cluster sample)
Usage
cpro(df, numerator, denominator, ci = 95, na = "", N = 0)
Arguments
| df | object containing data frame on which to perform analysis | 
| numerator | variable in data frame for which you want to calculate proportion and margin of error | 
| denominator | variable in data frame containing population sizes of unequal clusters | 
| ci | (optional) confidence level for establishing a confidence interval using z-score (defaults to 95; restricted to 80, 85, 90, 95 or 99 as input) | 
| na | (optional) value that you want to filter and exclude (defaults to include everything) | 
| N | (optional) population universe (e.g. 10000, nrow(df)); if N is supplied, the margin of error is calculated using the finite population correction (fpc) | 
Value
Returns table of responses (n), proportions, margins of error, lower and upper bounds for a given variable in an unequal-sized cluster sample
References
[1] Survey Sampling, L. Kish, 1965, Equation 6.3.4
[2] Sampling Techniques, W.G. Cochran, 1977, Equation 3.34
Examples
alresults <- ssamp(albania, 890, qarku)
cpro(df=alresults, numerator=totalVoters, denominator=zgjedhes, ci=95)
cpro(df=alresults, numerator=pd, denominator=validVotes, ci=95, N=5361)
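For intuition, the ratio estimator commonly used for unequal-sized cluster samples fits in a few lines of base R. This is a minimal sketch, assuming each row is one sampled cluster, `y` holds the numerator counts, `x` holds the cluster sizes, and the textbook residual-based variance of a ratio applies; `cluster_prop` is a hypothetical name, and `cpro` may organize or compute its output differently.

```r
# Hypothetical helper (not part of the package): ratio estimate of a
# proportion from an unequal-sized cluster sample, one row per cluster.
cluster_prop <- function(y, x, ci = 95, N = NA) {
  a <- length(y)                       # number of sampled clusters
  r <- sum(y) / sum(x)                 # ratio estimate of the proportion
  f <- if (is.na(N)) 0 else a / N      # sampling fraction; 0 means no fpc
  # Residual-based variance of a ratio estimator, scaled by the fpc (1 - f)
  v <- (1 - f) * a / (a - 1) * sum((y - r * x)^2) / sum(x)^2
  z <- qnorm(1 - (1 - ci / 100) / 2)   # two-sided z-score, about 1.96 for 95
  moe <- z * sqrt(v)
  data.frame(prop = r, moe = moe, lower = r - moe, upper = r + moe)
}

# Toy data: three clusters of different sizes
cluster_prop(y = c(30, 55, 12), x = c(50, 100, 20), ci = 95)
```

In this sketch N counts clusters in the population, so passing N equal to the number of sampled clusters makes the fpc zero and the margin of error vanishes.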
Removes duplicate observations within collected data
Description
Removes duplicate observations within collected data
Usage
dedupe(df, col_name)
Arguments
| df | object containing data frame of collected data | 
| col_name | variable within data frame by which to filter for duplicate values | 
Value
Returns the collected data with duplicate observations removed, keeping one row per unique value of col_name
Examples
aldupe <- rsamp(df=albania, n=390, rep=TRUE)
dedupe(df=aldupe, col_name=qvKod)
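The same idea can be sketched in base R with `duplicated()`. This assumes the key is passed as a character column name (the package itself takes an unquoted variable); `dedupe_base` and the toy data are hypothetical.

```r
# Hypothetical base R equivalent of deduplicating on one key column:
# duplicated() flags second and later occurrences, so negating it keeps
# the first row for each key value.
dedupe_base <- function(df, col_name) {
  df[!duplicated(df[[col_name]]), , drop = FALSE]
}

toy <- data.frame(qvKod = c("A1", "B2", "A1", "C3"), votes = c(10, 20, 10, 30))
dedupe_base(toy, "qvKod")
```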
Identifies duplicate values within collected data
Description
Identifies duplicate values within collected data
Usage
dupe(df, col_name)
Arguments
| df | object containing data frame of collected data | 
| col_name | variable within data frame by which to filter for duplicate values | 
Value
Returns table of duplicate values within collected data
Examples
aldupe <- rsamp(df=albania, n=390, rep=TRUE)
dupe(df=aldupe, col_name=qvKod)
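A base R sketch of finding duplicates returns every row whose key occurs more than once. `dupe_base` is a hypothetical name, the key is assumed to be a character column name, and the package's own output table may be laid out differently (for example, as counts).

```r
# Hypothetical base R sketch: return every row whose key value appears more
# than once. Checking duplicated() from both ends flags all copies,
# including the first occurrence.
dupe_base <- function(df, col_name) {
  key <- df[[col_name]]
  df[duplicated(key) | duplicated(key, fromLast = TRUE), , drop = FALSE]
}

toy <- data.frame(qvKod = c("A1", "B2", "A1", "C3"), votes = c(10, 20, 10, 30))
dupe_base(toy, "qvKod")
```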
Albania 2017 CDO Election Observation Data Findings on Opening Process
Description
Data set containing 2017 Albania election observation findings on the polling station opening process, collected by the Coalition of Domestic Observers (CDO). CDO conducted a statistically-based observation (SBO) exercise, deploying observers to a random sample of polling stations for the 25 June 2017 Albanian elections. This is a subset of the observation data collected by CDO observers, containing the data used to perform statistical analysis.
Usage
opening
Format
A data frame with 524 rows and 19 variables
- qarku: district, 12 in total
- psID: polling station identifier
- votersList: number of registered voters at the polling station
- ballotPapers: number of ballot papers at the polling station
- pubPriv: type of polling station, public or private
- openTime: time when the polling station opened, in 30-minute ranges
- numKommish: number of commissioners present at the polling station
- secrecyOpen: whether the polling station enabled voters to cast ballots in secret; po (yes) or jo (no)
- movementOpen: whether the polling station provided sufficient space to vote; po or jo
- removeMatInside: whether campaign materials were removed from inside the polling station; po or jo
- removeMatOutside: whether campaign materials were removed from outside the polling station; po or jo
- pvComplete: whether commissioners completed the opening record checklist sheet; po or jo
- boxChecked: whether commissioners checked that the ballot box was empty before opening; po or jo
- boxSealed: whether commissioners sealed the ballot box to prevent ballot tampering; po or jo
- recordBox: whether commissioners recorded the seal number on the ballot box; po or jo
- centerMat: whether all election materials were available at the polling station; po or jo
- blindTools: whether the polling station was equipped for blind voters; po or jo
- disabledTools: whether the polling station was equipped for disabled voters; po (yes), jo (no) or pjeserisht (partially)
- overallOpen: overall assessment of the opening process; shummir (very good), mir (good), meprob (problematic) or shumprob (very problematic)
Source
https://ona.io/cdo/35080/216662
Determines sample size by strata using sub-units
Description
Determines sample size by strata using sub-units
Usage
psampcalc(df, n, strata, unit, over = 0)
Arguments
| df | object containing full sampling data frame (e.g. data) | 
| n | sample size (integer) or object containing sample size | 
| strata | variable in sampling data frame by which to stratify (e.g. region) | 
| unit | variable in sampling data frame containing sub-units (e.g. population) | 
| over | (optional) desired oversampling proportion (defaults to 0; takes value between 0 and 1 as input) | 
Value
Returns sample size per stratum based on sub-units (rounded up to nearest integer)
References
[1] Sampling Design & Analysis, S. Lohr, 1999, 4.4
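Allocation by sub-units can be sketched as proportional allocation in which each stratum's weight is its share of the sub-unit total (e.g. population) rather than its share of rows. This is a minimal sketch, assuming oversampling scales the total sample by (1 + over) before each stratum is rounded up; `unit_alloc` and the toy data are hypothetical.

```r
# Hypothetical sketch: split n across strata in proportion to each
# stratum's total of a sub-unit variable (e.g. population).
unit_alloc <- function(strata, unit, n, over = 0) {
  totals <- tapply(unit, strata, sum)          # sub-unit total per stratum
  raw <- n * (1 + over) * totals / sum(totals) # fractional allocation
  # Round to 8 decimals first so floating-point noise (e.g. 22.000000000000004)
  # is not pushed up to the next integer, then round up.
  ceiling(round(raw, 8))
}

unit_alloc(strata = c("N", "N", "S", "S", "S"),
           unit   = c(100, 300, 200, 200, 200), n = 50)   # N 20, S 30
```

Because each stratum is rounded up separately, the allocations can sum to slightly more than n.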
Identifies missing points between sample and collected data
Description
Identifies missing points between sample and collected data
Usage
rmissing(sampdf, colldf, col_name)
Arguments
| sampdf | object containing data frame of sample points | 
| colldf | object containing data frame of collected data | 
| col_name | common variable (i.e. key) in data frames by which to check for missing points | 
Value
Returns table of sample points missing from collected data
References
Simplified wrapper around dplyr::anti_join()
Examples
alsample <- rsamp(df=albania, 544)
alreceived <- rsamp(df=alsample, 390)
rmissing(sampdf=alsample, colldf=alreceived, col_name=qvKod)
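For a single key, the anti-join this function wraps can be sketched in base R with `%in%`. `missing_base` and the toy frames are hypothetical, and the key is assumed to be a character column name.

```r
# Hypothetical base R anti-join: rows of the sample whose key never appears
# in the collected data.
missing_base <- function(sampdf, colldf, col_name) {
  sampdf[!(sampdf[[col_name]] %in% colldf[[col_name]]), , drop = FALSE]
}

samp <- data.frame(qvKod = c("A1", "B2", "C3", "D4"))
coll <- data.frame(qvKod = c("B2", "D4"))
missing_base(samp, coll, "qvKod")   # rows for A1 and C3
```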
Calculate proportion and margin of error (simple random sample)
Description
Calculate proportion and margin of error (simple random sample)
Usage
rpro(df, col_name, ci = 95, na = "", N = 0)
Arguments
| df | object containing data frame on which to perform analysis (e.g. data) | 
| col_name | variable in data frame for which you want to calculate proportion and margin of error | 
| ci | (optional) confidence level for establishing a confidence interval using z-score (defaults to 95; restricted to 80, 85, 90, 95 or 99 as input) | 
| na | (optional) value that you want to filter and exclude (defaults to include everything) | 
| N | (optional) population universe (e.g. 10000, nrow(df)); if N is supplied, the margin of error is calculated using the finite population correction (fpc) | 
Value
Returns table of responses (n), proportions, margins of error, lower and upper bounds by factor for a given variable
References
[1] Sampling Design & Analysis, S. Lohr, 1999, Equation 2.15
Examples
rpro(df=opening, col_name=openTime, ci=95, na="n/a", N=5361)
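The per-response calculation can be sketched in base R. This assumes the textbook estimators, p(1 - p) / n without N and (1 - n/N) p(1 - p) / (n - 1) with N, and a z-score from `qnorm()` rather than a lookup table; `srs_prop` is hypothetical and the package's exact variance choice may differ.

```r
# Hypothetical sketch: proportion, margin of error and bounds for each
# response level of a categorical variable in a simple random sample.
srs_prop <- function(x, ci = 95, na = NULL, N = NA) {
  x <- x[!(x %in% na)]                  # drop the excluded value, if any
  n <- length(x)
  counts <- table(x)
  p <- as.numeric(counts) / n
  z <- qnorm(1 - (1 - ci / 100) / 2)    # two-sided z-score
  # Without N: p(1 - p) / n; with N: (1 - n/N) p(1 - p) / (n - 1)
  v <- if (is.na(N)) p * (1 - p) / n else (1 - n / N) * p * (1 - p) / (n - 1)
  moe <- z * sqrt(v)
  data.frame(response = names(counts), n = as.numeric(counts), prop = p,
             moe = moe, lower = p - moe, upper = p + moe)
}

srs_prop(c("po", "po", "jo", "n/a", "po"), na = "n/a")
```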
Draws simple random sample without replacement
Description
Draws simple random sample (without replacement unless rep = TRUE)
Usage
rsamp(df, n, over = 0, rep = FALSE)
Arguments
| df | object containing full sampling data frame (e.g. data) | 
| n | sample size (integer) or object containing sample size | 
| over | (optional) desired oversampling proportion (defaults to 0; takes value between 0 and 1 as input) | 
| rep | (optional) logical; if TRUE, rows are sampled with replacement (defaults to FALSE) | 
Value
Returns simple random sample without replacement
References
Simplified wrapper around dplyr::sample_n()
Examples
rsamp(albania, n=360, over=0.1, rep=FALSE)
size <- rsampcalc(nrow(albania), 3, 95, 0.5)
randomsample <- rsamp(albania, size)
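A base R sketch of the draw, assuming oversampling means drawing n * (1 + over) rows rounded up; `srs_draw` is hypothetical. With `rep = FALSE` no row can be drawn twice, while `rep = TRUE` allows repeats, which is presumably how the `dupe` and `dedupe` examples obtain duplicated rows.

```r
# Hypothetical sketch: draw n * (1 + over) rows (rounded up) at random.
srs_draw <- function(df, n, over = 0, rep = FALSE) {
  size <- ceiling(round(n * (1 + over), 8))   # guard against float noise
  df[sample.int(nrow(df), size, replace = rep), , drop = FALSE]
}

set.seed(1)
frame <- data.frame(id = 1:100)
s <- srs_draw(frame, n = 20, over = 0.1)
nrow(s)   # 22
```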
Determines random sample size
Description
Determines random sample size
Usage
rsampcalc(N, e, ci = 95, p = 0.5, over = 0)
Arguments
| N | population universe (e.g. 10000, nrow(df)) | 
| e | tolerable margin of error in percentage points (integer or decimal, e.g. 5, 2.5) | 
| ci | (optional) confidence level for establishing a confidence interval using z-score (defaults to 95; restricted to 80, 85, 90, 95 or 99 as input) | 
| p | (optional) anticipated response distribution (defaults to 0.5; takes value between 0 and 1 as input) | 
| over | (optional) desired oversampling proportion (defaults to 0; takes value between 0 and 1 as input) | 
Value
Returns appropriate sample size (rounded up to nearest integer)
References
[1] Sampling Design & Analysis, S. Lohr, 1999, Equation 2.17
Examples
rsampcalc(N=5361, e=3, ci=95, p=0.5, over=0.1)
rsampcalc(nrow(albania), 3)
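The usual finite-population sample size calculation computes n0 = z^2 p(1 - p) / e^2 (with e as a proportion) and then n = n0 / (1 + n0 / N). This is a minimal sketch, assuming `e` is given in percentage points and oversampling multiplies the corrected size before rounding up; `srs_size` is hypothetical. For N = 5361 and e = 3 it gives 890, which happens to match the sample size used in the `cpro` example.

```r
# Hypothetical sketch of a finite-population sample size calculation.
srs_size <- function(N, e, ci = 95, p = 0.5, over = 0) {
  z <- qnorm(1 - (1 - ci / 100) / 2)       # two-sided z-score
  n0 <- z^2 * p * (1 - p) / (e / 100)^2    # size for an infinite population
  n <- n0 / (1 + n0 / N)                   # finite population correction
  ceiling(n * (1 + over))                  # oversample, then round up
}

srs_size(N = 5361, e = 3)              # 890
srs_size(N = 5361, e = 3, over = 0.1)  # 979
```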
Identifies number of missing points by strata between sample and collected data
Description
Identifies number of missing points by strata between sample and collected data
Usage
smissing(sampdf, colldf, strata, col_name)
Arguments
| sampdf | object containing data frame of sample points | 
| colldf | object containing data frame of collected data | 
| strata | variable in both data frames by which to stratify | 
| col_name | common variable (i.e. key) in data frames by which to check for missing points | 
Value
Returns table of number of sample points by strata missing from collected data
References
Simplified wrapper around dplyr::anti_join()
Examples
alsample <- rsamp(df=albania, 544)
alreceived <- rsamp(df=alsample, 390)
smissing(sampdf=alsample, colldf=alreceived, strata=qarku, col_name=qvKod)
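Counting missing points per stratum amounts to an anti-join followed by a tally. This base R sketch assumes character column names for the stratum and key and keeps strata with nothing missing as zero counts; `missing_by_strata` and the toy data are hypothetical.

```r
# Hypothetical sketch: number of sample points missing from the collected
# data, per stratum (strata with nothing missing show 0).
missing_by_strata <- function(sampdf, colldf, strata, col_name) {
  gone <- !(sampdf[[col_name]] %in% colldf[[col_name]])
  table(factor(sampdf[[strata]][gone], levels = unique(sampdf[[strata]])))
}

samp <- data.frame(qarku = c("Tirane", "Tirane", "Durres", "Durres"),
                   qvKod = c("A1", "B2", "C3", "D4"))
coll <- data.frame(qarku = c("Tirane", "Durres", "Durres"),
                   qvKod = c("B2", "C3", "D4"))
missing_by_strata(samp, coll, "qarku", "qvKod")   # Tirane 1, Durres 0
```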
Calculate proportion and margin of error (stratified sample)
Description
Calculate proportion and margin of error (stratified sample)
Usage
spro(fulldf, sampdf, strata, col_name, ci = 95, na = "")
Arguments
| fulldf | object containing original data frame used to draw sample | 
| sampdf | object containing data frame on which to perform analysis | 
| strata | variable in both data frames by which to stratify | 
| col_name | variable in data frame for which you want to calculate proportion and margin of error | 
| ci | (optional) confidence level for establishing a confidence interval using z-score (defaults to 95; restricted to 80, 85, 90, 95 or 99 as input) | 
| na | (optional) value that you want to filter and exclude (defaults to include everything) | 
Value
Returns table of responses (n), proportions, margins of error, lower and upper bounds by factor for a given variable in a stratified sample
References
[1] Sampling Design & Analysis, S. Lohr, 1999, 4.6 & 4.7
Examples
spro(fulldf=albania, sampdf=opening, strata=qarku, col_name=openTime, ci=95, na="n/a")
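For a single response level, the stratified estimator weights each stratum's proportion by its share of the sampling frame, p = sum(W_h p_h), with variance sum((1 - n_h/N_h) W_h^2 p_h (1 - p_h) / (n_h - 1)). This is a minimal sketch, assuming every stratum has at least two sampled units; `strat_prop` and the toy data are hypothetical, and `spro` reports every response level rather than one.

```r
# Hypothetical sketch: stratified estimate of the share of responses equal
# to `value`, weighting each stratum by its share of the sampling frame.
strat_prop <- function(frame_strata, samp_strata, x, value, ci = 95) {
  Nh <- table(frame_strata)                      # population size per stratum
  g <- factor(samp_strata, levels = names(Nh))
  nh <- as.numeric(table(g))                     # sample size per stratum
  ph <- as.numeric(tapply(x == value, g, mean))  # within-stratum proportion
  Wh <- as.numeric(Nh) / sum(Nh)                 # stratum weight N_h / N
  p <- sum(Wh * ph)
  # Assumes every stratum has n_h >= 2 sampled units
  v <- sum((1 - nh / as.numeric(Nh)) * Wh^2 * ph * (1 - ph) / (nh - 1))
  moe <- qnorm(1 - (1 - ci / 100) / 2) * sqrt(v)
  c(prop = p, moe = moe, lower = p - moe, upper = p + moe)
}

frame_strata <- rep(c("A", "B"), times = c(60, 40))
samp_strata  <- c("A", "A", "A", "B", "B")
x            <- c("po", "po", "jo", "jo", "jo")
strat_prop(frame_strata, samp_strata, x, value = "po")   # prop = 0.4
```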
Draws stratified sample without replacement using proportional allocation
Description
Draws stratified sample without replacement using proportional allocation
Usage
ssamp(df, n, strata, over = 1)
Arguments
| df | object containing full sampling data frame (e.g. data) | 
| n | sample size (integer) or object containing sample size | 
| strata | variable in sampling data frame by which to stratify (e.g. region) | 
| over | (optional) desired oversampling proportion (defaults to 1, as shown in Usage; takes value between 0 and 1 as input) | 
Value
Returns stratified sample without replacement
Examples
ssamp(df=albania, n=360, strata=qarku, over=0.1)
size <- rsampcalc(nrow(albania), 3, 95, 0.5)
stratifiedsample <- ssamp(albania, size, qarku)
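Once per-stratum sizes are known (for example, from a proportional allocation like the one `ssampcalc` reports), the draw itself is one simple random sample per stratum. This base R sketch takes a hypothetical named vector `sizes`; `strat_draw` and the toy frame are also hypothetical.

```r
# Hypothetical sketch: draw a given number of rows from each stratum
# without replacement. `sizes` is a named vector of per-stratum sizes.
strat_draw <- function(df, strata, sizes) {
  parts <- lapply(names(sizes), function(h) {
    rows <- which(df[[strata]] == h)             # row indices in stratum h
    df[rows[sample.int(length(rows), sizes[[h]])], , drop = FALSE]
  })
  do.call(rbind, parts)
}

set.seed(42)
frame <- data.frame(id = 1:100, qarku = rep(c("A", "B"), times = c(60, 40)))
s <- strat_draw(frame, "qarku", sizes = c(A = 6, B = 4))
table(s$qarku)   # A 6, B 4
```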
Determines sample size by strata using proportional allocation
Description
Determines sample size by strata using proportional allocation
Usage
ssampcalc(df, n, strata, over = 0)
Arguments
| df | object containing sampling data frame (e.g. data) | 
| n | sample size (integer) or object containing sample size | 
| strata | variable in sampling data frame by which to stratify (e.g. region) | 
| over | (optional) desired oversampling proportion (defaults to 0; takes value between 0 and 1 as input) | 
Value
Returns proportional sample size per stratum (rounded up to nearest integer)
References
[1] Sampling Design & Analysis, S. Lohr, 1999, 4.4
Examples
ssampcalc(df=albania, n=544, strata=qarku, over=0.05)
size <- rsampcalc(nrow(albania), 3, 95, 0.5)
ssampcalc(albania, size, qarku)
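Proportional allocation gives each stratum n_h = n (1 + over) N_h / N, rounded up. This is a minimal sketch, assuming oversampling is applied before rounding; `prop_alloc` and the toy strata are hypothetical.

```r
# Hypothetical sketch: proportional allocation n_h = n (1 + over) N_h / N,
# rounded up per stratum.
prop_alloc <- function(strata, n, over = 0) {
  Nh <- table(strata)                    # frame size per stratum
  raw <- n * (1 + over) * Nh / sum(Nh)
  ceiling(round(raw, 8))                 # guard against float noise, round up
}

strata <- rep(c("A", "B"), times = c(60, 40))
prop_alloc(strata, n = 10)               # A 6, B 4
prop_alloc(strata, n = 10, over = 0.1)   # A 7, B 5
```

Rounding up each stratum is why the second call allocates 12 units rather than 11.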