* `antiMode` for the least common element.
* `Cminmax()` function with OpenMP on Clang19.
* Updated array reduction logic for `RAW` inputs to improve compatibility and reliability.
* `is_seq` (internal function in `finp`): fixed undefined behaviour (UB) when `length(table) == 0`.
* `STRING_PTR` changed to `STRING_PTR_RO`, as required by new CRAN policies.
* `is_constant` does not inherit `data.table` multithreading.
* `and3s(rr == 0L)` works for raw `rr`.
* `abs_diff` contains a `which.max` option (`option = 3`).
* C function prototypes: `f()` is now `f(void)`.
* `logical3`: when passed an expression with non-numeric components, it no longer skips them as if empty.
* `abs_diff` for non-allocating versions of `abs(x - y)`.
* `character2integer` for a faster version of `as.integer(gsub("[^0-9]", "", x))`.
* `Comma`, relatedly, for `prettyNum(x, big.mark = ",")`.
* `coalesce0` as a convenience function, equivalent to `coalesce(x, 0)` but with a `0` of the correct type.
* `diam` and `thinner` for direct versions of `diff(minmax(x))`.
* `every_int32` returns a vector of every integer.
* `ModeC` for the most common element of integer vectors.
* `unique_fmatch` and `uniqueN_fmatch` for distinct elements.
* `and3s` and friends now use different logic and perform internal logical operations on raw (`char`) vectors.
* `allNA`, equivalent to `all(is.na(x))`.
* `NA` results.
* `minmax` accepts raw input, treating it as unsigned characters.
* The `LOGICAL` C API has been absorbed.
* Functions are now written in C to reduce install time and package size.
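Several entries above define a function by its base-R equivalent. The sketch below writes those equivalences out as plain R so results can be cross-checked. The `*_ref` helpers are hypothetical, are not part of hutilscpp, and say nothing about how the package's C code works; reading "correct type of 0" as "same `typeof()` as `x`" is an assumption.

```r
# Hypothetical base-R reference versions of behaviour described in the entries
# above. They are slow, obvious implementations meant only for cross-checking.

# allNA(x): equivalent to all(is.na(x))
allNA_ref <- function(x) all(is.na(x))

# minmax(x): c(min(x), max(x)); raw bytes are read as unsigned values 0..255
minmax_ref <- function(x) {
  if (is.raw(x)) x <- as.integer(x)
  c(min(x), max(x))
}

# diam(x): the spread of x, i.e. diff(minmax(x))
diam_ref <- function(x) diff(minmax_ref(x))

# character2integer(x): keep only the digits, then convert
character2integer_ref <- function(x) as.integer(gsub("[^0-9]", "", x))

# coalesce0(x): replace NA with a zero of the same type as x
# (assumption: "correct type" means typeof(x), so integers stay integer)
coalesce0_ref <- function(x) {
  x[is.na(x)] <- vector(typeof(x), 1L)  # 0L for integer, 0 for double
  x
}

character2integer_ref(c("1,234", "$56"))
#> [1] 1234   56
coalesce0_ref(c(1L, NA))
#> [1] 1 0
```

Writing `x[is.na(x)] <- 0` on an integer vector would instead promote it to double, which appears to be the kind of type slip the `coalesce0` entry guards against.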
* `Implies` for logical implication.
* `divisible2` to test evenness of numbers.
* `fmatchp`, `finp`: experimental parallel hashing functions.
* `is_sorted` and `isntSorted` for assertions about sorted atomic vectors.
* `minmax`, a multithreaded function for `c(min(x), max(x))`.
* `which_first`, introduced in version 0.5.0, caused by an over-reliance on compiler optimization (#20).
* `pminV` no longer accepts non-numeric input.
* `do_` functions have been removed entirely.
* `pmax0(x, in_place = TRUE)` now returns early, rather than checking the vector twice.
* `sum_isna` now reflects `sum(is.na(x))` when `x` contains `NaN`.
* `sum_isna` diverts ALTREP vectors to `anyNA` for performance and to avoid problems when passed to C++.
* `which_last`, like `which_first` but searching from the last index.
* `divisible` and `divisible16` for returning divisibility.
* `count_logical` for fast tabulation of logical vectors.
* `and3s`, `or3s`: parallelized and separated versions of `&`.
* `sum_and3s` and `sum_or3s`, the sums of the above logical vectors.
* `whichs`, an alternative implementation of `which` that separates the input.
* `which_firstNA` and `which_lastNA` for the first/last position of missing values.
* `which_first` accepts the argument `use.which.max` for better performance on inputs known to be short.
* `is_constant` now accepts `nThread` for multithreaded checking of constant vectors, and is much faster in general, even in single-thread mode.
* `sum_isna` now accepts `nThread` for multithreaded accumulation of missing-value counts.
* `are_even` can be slightly faster on integers if ignoring `NA`, handles large doubles (like `1e10`), and accepts `nThread`.
* `is_safe2int(x)` now tolerates `NaN` input. Thanks to CRAN clang-UBSAN.
* `which_first(x == y)` now works properly when `length(y) == length(x)`.
* `xor2`, a faster version of `xor`:

```r
set.seed(1)
library(hutils)     # for samp()
library(hutilscpp)  # for xor2()
# Helper: run bench::mark() and keep only the timing and memory columns
bench__mark <- function(...) {
  dplyr::select(bench::mark(..., min_iterations = 12),
                expression, median, `itr/sec`, mem_alloc, n_gc)
}
# x and y all FALSE
x <- y <- logical(1e9)
bench__mark(xor(x, y), xor2(x, y))
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 x 5
#>   expression   median `itr/sec` mem_alloc  n_gc
#>   <chr>      <bch:tm>     <dbl> <bch:byt> <dbl>
#> 1 xor(x, y)    7.956s     0.126  14.901GB    16
#> 2 xor2(x, y)   1.652s     0.530   3.725GB     3
# x all TRUE, y all FALSE
x <- !y
bench__mark(xor(x, y), xor2(x, y))
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 x 5
#>   expression   median `itr/sec` mem_alloc  n_gc
#>   <chr>      <bch:tm>     <dbl> <bch:byt> <dbl>
#> 1 xor(x, y)    8.227s     0.121  14.901GB    13
#> 2 xor2(x, y)   1.983s     0.460   3.725GB     3
# Random TRUE/FALSE
x <- samp(c(TRUE, FALSE), 1e9)
y <- samp(c(TRUE, FALSE), 1e9)
bench__mark(xor(x, y), xor2(x, y))
#> # A tibble: 2 x 5
#>   expression   median `itr/sec` mem_alloc  n_gc
#>   <chr>      <bch:tm>     <dbl> <bch:byt> <dbl>
#> 1 xor(x, y)   20.276s    0.0493  14.901GB    11
#> 2 xor2(x, y)   1.971s    0.506    3.725GB     3
# x also contains NA
x <- samp(c(TRUE, FALSE, NA), 1e9)
y <- samp(c(TRUE, FALSE), 1e9)
bench__mark(xor(x, y), xor2(x, y))
#> # A tibble: 2 x 5
#>   expression   median `itr/sec` mem_alloc  n_gc
#>   <chr>      <bch:tm>     <dbl> <bch:byt> <dbl>
#> 1 xor(x, y)   25.063s    0.0399  14.901GB     2
#> 2 xor2(x, y)   4.524s    0.221    3.725GB     3
```

Created on 2019-08-25 by the reprex package (v0.3.0)
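When checking a faster replacement such as `xor2` or `sum_isna` against base R, the edge cases that matter are missing values. A minimal base-R sketch of the reference results, with input vectors made up for illustration: for logical inputs `xor()` returns `NA` wherever either argument is `NA`, and `sum(is.na(x))` counts `NaN` as missing because `is.na(NaN)` is `TRUE`, which is the behaviour the `sum_isna` entry refers to. A drop-in replacement would presumably be expected to agree with these results.

```r
# Reference results from base R for checking a faster xor2() / sum_isna().
x <- c(TRUE, TRUE, FALSE, FALSE, NA, TRUE)
y <- c(TRUE, FALSE, TRUE, FALSE, TRUE, NA)

# For logical inputs base::xor() computes (x | y) & !(x & y),
# so an NA in either operand gives NA
xor(x, y)
#> [1] FALSE  TRUE  TRUE FALSE    NA    NA

# is.na() is TRUE for both NA and NaN, so NaN is counted as missing
z <- c(1, NA, NaN, 4)
sum(is.na(z))
#> [1] 2
```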
* Added a `NEWS.md` file to track changes to the package.
* `which_first(x == y)` now supports logical `x` without returning arcane error messages.
* `is_constant`, for testing whether an atomic vector is constant, and `isntConstant` for the first value that differs.
* `is_sorted` and `isntSorted` (currently private), similarly.
* `and3`, `or3` for ternary and/or, enabling vectorized short-circuiting.
* `sum_isna` for counting `NA` values.
* `pminC` now handles integer inputs without coercing to double.
* `pmaxC(x, a)` accepts integer `a` when `x` is of type double.
* `pmax0` and `pmin0` perform much better, especially when `x` is known and marked as sorted, but also due to a better algorithm using absolute value.

```r
set.seed(1)
# Attach the namespace to reach the internal do_* functions
# (these were later removed from the package)
attach(asNamespace("hutilscpp"))
#> The following object is masked from package:base:
#> 
#>     isFALSE
# Helper: run bench::mark() and keep only the timing and memory columns
bench__mark <- function(...) {
  dplyr::select(bench::mark(..., min_iterations = 12),
                expression, median, `itr/sec`, mem_alloc, n_gc)
}
# Positive doubles
x <- rep_len(rlnorm(1e6, 7, 2), 1e9)
bench__mark(do_pmaxC_dbl(x, 0), do_pmax0_abs_dbl(x))
#> # A tibble: 2 x 5
#>   expression              median `itr/sec`     mem_alloc  n_gc
#>   <chr>                 <bch:tm>     <dbl>     <bch:byt> <dbl>
#> 1 do_pmaxC_dbl(x, 0)  2428.139ms     0.405 3618205.211KB     4
#> 2 do_pmax0_abs_dbl(x)  777.362ms     1.28        6.539KB     0
# Shift down by 1, so some elements become negative
x <- x - 1
bench__mark(do_pmaxC_dbl(x, 0), do_pmax0_abs_dbl(x))
#> # A tibble: 2 x 5
#>   expression            median `itr/sec` mem_alloc  n_gc
#>   <chr>               <bch:tm>     <dbl> <bch:byt> <dbl>
#> 1 do_pmaxC_dbl(x, 0)    2.394s     0.410   3.451GB     4
#> 2 do_pmax0_abs_dbl(x)   2.590s     0.386   3.451GB     4
# Sorted input
x <- sort(x)
bench__mark(do_pmaxC_dbl(x, 0), do_pmax0_radix_sorted_dbl(x))
#> # A tibble: 2 x 5
#>   expression                     median `itr/sec` mem_alloc  n_gc
#>   <chr>                        <bch:tm>     <dbl> <bch:byt> <dbl>
#> 1 do_pmaxC_dbl(x, 0)             3.593s     0.313   6.901GB     5
#> 2 do_pmax0_radix_sorted_dbl(x)   2.306s     0.437   3.451GB     4
# The same three cases for integers
x <- rep_len(as.integer(rlnorm(1e6, 7, 2)), 1e9)
bench__mark(do_pmaxC_int(x, 0L), do_pmax0_abs_int(x))
#> # A tibble: 2 x 5
#>   expression              median `itr/sec`     mem_alloc  n_gc
#>   <chr>                 <bch:tm>     <dbl>     <bch:byt> <dbl>
#> 1 do_pmaxC_int(x, 0L) 2041.515ms     0.490 3906256.727KB     3
#> 2 do_pmax0_abs_int(x)  405.266ms     2.45        6.539KB     0
x <- x - 1L
bench__mark(do_pmaxC_int(x, 0L), do_pmax0_abs_int(x))
#> # A tibble: 2 x 5
#>   expression            median `itr/sec` mem_alloc  n_gc
#>   <chr>               <bch:tm>     <dbl> <bch:byt> <dbl>
#> 1 do_pmaxC_int(x, 0L)   1.449s     0.686   3.725GB     2
#> 2 do_pmax0_abs_int(x)   1.766s     0.577   3.725GB     1
x <- sort(x)
bench__mark(do_pmaxC_int(x, 0L), do_pmax0_radix_sorted_int(x))
#> # A tibble: 2 x 5
#>   expression                     median `itr/sec` mem_alloc  n_gc
#>   <chr>                        <bch:tm>     <dbl> <bch:byt> <dbl>
#> 1 do_pmaxC_int(x, 0L)            1.751s     0.568   7.451GB     2
#> 2 do_pmax0_radix_sorted_int(x)   1.404s     0.827   3.725GB     1
```

Created on 2019-08-10 by the reprex package (v0.3.0)
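The `pmax0`/`pmin0` entry mentions two speed-ups, an algorithm based on absolute value and a shortcut when `x` is known to be sorted, without spelling either out. The following is only a base-R sketch of ideas that fit that description: the identity max(x, 0) = (x + |x|) / 2, and, for an ascending vector, zeroing just the negative prefix located with `findInterval()`. The `*_ref` helpers are hypothetical and are not the package's functions; its C code may work differently.

```r
# Hypothetical reference helpers; the package's C code may use a different
# algorithm. For finite doubles:
#   max(x, 0) == (x + abs(x)) / 2    and    min(x, 0) == (x - abs(x)) / 2
pmax0_abs_ref <- function(x) (x + abs(x)) / 2
pmin0_abs_ref <- function(x) (x - abs(x)) / 2

# If x is already sorted ascending (and has no NA), only the leading block of
# negative values needs changing: count them and overwrite that prefix with 0.
pmax0_sorted_ref <- function(x) {
  n_neg <- findInterval(0, x, left.open = TRUE)  # number of elements < 0
  x[seq_len(n_neg)] <- 0
  x
}

pmax0_abs_ref(c(-2, 0, 3))
#> [1] 0 0 3
pmax0_sorted_ref(c(-5, -1, 2, 7))
#> [1] 0 0 2 7
```

For integer vectors the identity needs care: `x + abs(x)` equals `2 * x` for positive `x`, which overflows (R returns `NA` with a warning) once `x` exceeds half of `.Machine$integer.max`.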