| Title: | Use Raw Vectors to Minimize Memory Consumption of Factors | 
| Version: | 0.1.0 | 
| Description: | Uses raw vectors to minimize memory consumption of categorical variables with fewer than 256 unique values. Useful for analysis of large datasets involving variables such as age, years, states, countries, or education levels. | 
| License: | GPL-2 | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.2.0 | 
| Imports: | utils | 
| Suggests: | data.table, tinytest | 
| NeedsCompilation: | yes | 
| Packaged: | 2023-11-17 05:59:45 UTC; hughp | 
| Author: | Hugh Parsonage [aut, cre] | 
| Maintainer: | Hugh Parsonage <hugh.parsonage@gmail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2023-11-17 08:50:06 UTC | 
Aggregating helpers
Description
Aggregating helpers
Usage
count_by256(DT, by = NULL, count_col = "N")
Arguments
| DT | A data.table. | 
| by | (string) A column of DT. | 
| count_col | (string) The name of the column in the result containing the counts. | 
Value
For:
- count_by256: A tally of by.
Factors of fewer than 256 elements
Description
Whereas base R's factors are based on 32-bit integer vectors,
factor256 uses 8-bit raw vectors to minimize its memory footprint.
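The size difference between the two storage types can be checked in base R alone. The following is a minimal sketch that compares plain integer and raw vectors of the same length; it does not call factor256, and the variable names are illustrative only.

```r
# Compare the memory used by an integer vector (4 bytes per element)
# with a raw vector (1 byte per element) of the same length.
n <- 1e6
ints <- integer(n)  # the storage type underlying base R factors
raws <- raw(n)      # the storage type used by factor256

size_int <- as.numeric(utils::object.size(ints))  # about 4 MB
size_raw <- as.numeric(utils::object.size(raws))  # about 1 MB

# Ratio of the two sizes; close to 4 for long vectors
size_int / size_raw
```

The ratio falls slightly short of 4 because every R vector carries a small fixed header in addition to its data.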
Usage
factor256(x, levels = NULL)
recompose256(f)
relevel256(x, levels)
## S3 method for class 'factor256'
levels(x)
is.factor256(x)
isntSorted256(x, strictly = FALSE)
as_factor(x)
factor256_in(x, tbl)
factor256_notin(x, tbl)
factor256_ein(x, tbl)
factor256_enotin(x, tbl)
tabulate256(f)
rank256(x)
order256(x)
unique256(x)
tabulate256_levels(x, nmax = NULL, dotInterval = 65535L)
Arguments
| x | An atomic vector with fewer than 256 unique elements. | 
| levels | An optional character vector of or representing the unique values of x. | 
| f | A raw vector of class factor256. | 
| strictly | If  | 
| tbl | The table of values to look up in x. | 
| nmax,dotInterval | ( | 
Value
factor256 is a class based on raw vectors.
Values in x absent from levels are mapped to 00.
In the following list, o is the result.
- factor256: A raw vector of class factor256.
- recompose256: The inverse operation of factor256.
- factor256_e?(not)?in: A logical vector the same length as f; o[i] = TRUE if f[i] is among the values of tbl when converted to factor256. _notin is the negation. The factor256_e variants will error if none of the values of tbl are present in f.
- tabulate256: Takes a raw vector and counts the number of times each element occurs within it. The result always has length 256; an element that is absent has value zero in the output.
- tabulate256_levels: Similar to tabulate256 but with optional arguments nmax, dotInterval.
- as_factor: Converts from factor256 to factor.
- order256: Same as order but supports raw vectors.
- rank256: Same as rank with ties.method = "first" but supports raw vectors.
- unique256: Unique elements of x.
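The encoding described above (levels stored as raw codes, absent values mapped to 00, and a length-256 tabulation) can be sketched in base R. The helpers encode256, decode256 and tabulate_raw are hypothetical names introduced only for this illustration; they are not part of the package and need not match its internals.

```r
# Hypothetical helpers for illustration only; not the package's implementation.

# Encode x as raw codes 01, 02, ... by position in `levels`.
# Values of x absent from `levels` are mapped to raw 00.
encode256 <- function(x, levels = sort(unique(x))) {
  stopifnot(length(levels) < 256L)
  structure(as.raw(match(x, levels, nomatch = 0L)), levels = levels)
}

# Decode raw codes back to the original values; raw 00 becomes NA.
decode256 <- function(f) {
  codes <- as.integer(f)
  codes[codes == 0L] <- NA_integer_
  attr(f, "levels")[codes]
}

# Count occurrences of each possible byte value.
# Element 1 counts raw 00 and element 256 counts raw ff.
tabulate_raw <- function(f) {
  tabulate(as.integer(f) + 1L, nbins = 256L)
}

x <- c("b", "a", "c", "a", "z")
f <- encode256(x, levels = c("a", "b", "c"))

as.integer(f)          # codes 2 1 3 1 0 ("z" is not a level)
decode256(f)           # "b" "a" "c" "a" NA
head(tabulate_raw(f))  # 1 2 1 1 0 0
```

Reserving 00 for absent values is what limits the number of usable levels to fewer than 256 in this sketch.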
Examples
f10 <- factor256(1:10)
fletters <- factor256(rep(letters, 1:26))
head(factor256_in(fletters, "g"))
head(tabulate256(fletters))
head(recompose256(fletters))
gletters <- factor256(rep(letters, 1:26), levels = letters[1:25])
tail(tabulate256(gletters))
tabulate256_levels(gletters, nmax = 5L, dotInterval = 1L)
Interlace raw vectors
Description
Some processes do not accept raw vectors so it can be necessary to convert our vectors to integers.
Usage
interlace256(w, x, y = NULL, z = NULL)
deinterlace256(u)
interlace256_columns(DT, new_colnames = 1L)
deinterlace256_columns(DT, new_colnames = 1L)
Arguments
| w,x,y,z | Raw vectors. A vector may be  | 
| u | An integer vector. | 
| DT | A data.table. | 
| new_colnames | A mechanism for producing the new columns. Currently only
 | 
Value
interlace256 returns an integer vector, compressing the raw vectors.
deinterlace256 is the inverse operation, returning a list of four raw vectors.
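The idea of compressing up to four raw vectors into one integer vector can be sketched in base R with readBin and writeBin. The helpers pack4 and unpack4 are hypothetical names for this illustration. The byte order chosen here (little-endian, with w as the lowest byte) and the treatment of a missing vector as all zeros are assumptions, and may differ from what interlace256 actually does.

```r
# Hypothetical helpers for illustration only; not the package's implementation.

# Pack four raw vectors into one integer vector, one byte per input.
# Assumption: a NULL input is treated as a vector of raw 00.
pack4 <- function(w, x, y = NULL, z = NULL) {
  n <- length(w)
  if (is.null(y)) y <- raw(n)
  if (is.null(z)) z <- raw(n)
  stopifnot(length(x) == n, length(y) == n, length(z) == n)
  # rbind() gives a 4 x n raw matrix; as.vector() reads it column by
  # column, yielding w[1], x[1], y[1], z[1], w[2], ...
  bytes <- as.vector(rbind(w, x, y, z))
  readBin(bytes, what = "integer", n = n, size = 4L, endian = "little")
}

# Inverse: split each integer back into its four bytes.
unpack4 <- function(u) {
  bytes <- writeBin(u, raw(), size = 4L, endian = "little")
  m <- matrix(bytes, nrow = 4L)
  list(m[1L, ], m[2L, ], m[3L, ], m[4L, ])
}

w <- as.raw(c(1, 2, 255))
x <- as.raw(c(0, 1, 0))
u <- pack4(w, x)
u                # 1 258 255
unpack4(u)[[1]]  # 01 02 ff
```

One caveat of this sketch: the byte pattern 00 00 00 80 corresponds to NA_integer_ in R, so a packed value with that pattern prints as NA even though its bytes still round-trip through unpack4.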
setkey for raw columns
Description
setkey for raw columns
Usage
setkeyv256(DT, cols)
Arguments
| DT | A data.table. | 
| cols | Column names as in data.table::setkeyv. | 
Value
Same as data.table::setkeyv except that raw cols will be
converted to factors (as data.table does not allow raw keys).