Rcpp sugar indexHash (unique, sort_unique, etc.) distinguish signed and unsigned zeros #1340

Open
@SebKrantz

Hi Dirk, sorry if this is not quite as much effort as you expect for an issue. I just wanted to flag something reported to collapse (#648) that affects both my hash functions written in C and your hash functions:

library(collapse)
#> Warning: package 'collapse' was built under R version 4.3.3
#> collapse 2.0.17, see ?`collapse-package` or ?`collapse-documentation`

x = round(rnorm(100))
unique(x)               # R
#> [1]  1  0  2 -1 -2
funique(x)              # My hash function in C
#> [1]  1  0  0  2 -1 -2
funique(x, sort = TRUE) # Rcpp::sugar::sort_unique()
#> [1] -2 -1  0  0  1  2
# More explicit proof
collapse:::sortuniqueCpp(x)
#> [1] -2 -1  0  0  1  2

# The solution: adding an integer zero turns every -0 into 0
y = x + 0L

funique(y)
#> [1]  1  0  2 -1 -2
collapse:::sortuniqueCpp(y)
#> [1] -2 -1  0  1  2

Created on 2024-10-31 with reprex v2.0.2

In words: R functions like round() can return negative zero (-0) as well as positive zero, and although the two compare equal, their hashes differ. A quite efficient remedy is to add an integer zero, which turns -0 into 0 (it makes my very efficient C hash about 3% slower). I'm considering rolling this out, but of course I cannot control your code, so I'm just passing it on as food for thought.
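To make the mechanism concrete, here is a minimal standalone C++ sketch. It is not the Rcpp or collapse implementation. The helper names bits_of, normalize_zero and unique_by_bits are made up for illustration, and the sketch assumes IEEE 754 doubles with the default round-to-nearest mode and no -ffast-math. It shows that -0.0 and 0.0 compare equal yet have different bit patterns, so a unique that keys on the raw bits keeps both unless the zero is normalized first:

```cpp
#include <cstdint>
#include <cstring>
#include <unordered_set>
#include <vector>

// Raw bit pattern of a double, i.e. what a bitwise hash would see.
std::uint64_t bits_of(double v) {
    std::uint64_t b;
    std::memcpy(&b, &v, sizeof b);
    return b;
}

// Adding +0.0 maps -0.0 to +0.0 (assuming round-to-nearest) and leaves
// every other finite or infinite value unchanged.
double normalize_zero(double v) {
    return v + 0.0;
}

// Hypothetical helper: order-preserving unique keyed on the bit pattern.
// With normalize = false, -0.0 and 0.0 are treated as distinct values.
std::vector<double> unique_by_bits(const std::vector<double>& x, bool normalize) {
    std::unordered_set<std::uint64_t> seen;
    std::vector<double> out;
    for (double v : x) {
        if (normalize) v = normalize_zero(v);
        // insert() reports whether the bit pattern was new
        if (seen.insert(bits_of(v)).second) out.push_back(v);
    }
    return out;
}
```

Calling unique_by_bits on a vector that contains both -0.0 and 0.0 keeps two zeros with normalize = false and one with normalize = true, which mirrors the funique(x) versus funique(y) output in the reprex.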
