Hi Dirk, sorry if this is less effort than you'd expect for an issue. I just wanted to flag something reported to collapse (#648) that affects both my hash functions written in C and your hash functions:
library(collapse)
#> Warning: package 'collapse' was built under R version 4.3.3
#> collapse 2.0.17, see ?`collapse-package` or ?`collapse-documentation`
x = round(rnorm(100))
unique(x) # R
#> [1] 1 0 2 -1 -2
funique(x) # My hash function in C
#> [1] 1 0 0 2 -1 -2
funique(x, sort = TRUE) # Rcpp::sugar::sort_unique()
#> [1] -2 -1 0 0 1 2
# More explicit proof
collapse:::sortuniqueCpp(x)
#> [1] -2 -1 0 0 1 2
# The solution
y = x + 0L
funique(y)
#> [1] 1 0 2 -1 -2
collapse:::sortuniqueCpp(y)
#> [1] -2 -1 0 1 2
Created on 2024-10-31 with reprex v2.0.2
In words: R functions like round() can produce both negative and positive zeros (-0 and 0), which compare equal but have different bit patterns, so their hashes differ. A fairly efficient remedy is to add an integer zero (x + 0L), which turns -0 into 0; it makes my already very efficient C hash about 3% slower. I'm considering rolling this out, but of course I cannot control your code, so I'm just passing it along as food for thought.