Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 12 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,17 @@
# `geohashTools` NEWS

## v0.3.4 (Development)

### IMPROVEMENTS

1. Fixed documentation for `gh_encode` precision limit (was incorrectly stated as 28, corrected to 25).

2. Added bounds checking to `gh_delta` to validate precision is between 0 and 25, preventing silent incorrect results for invalid inputs.

3. Updated CRS specification from deprecated PROJ.4 string to EPSG:4326 for better compatibility with modern PROJ versions.

4. Optimized duplicate detection in `gh_to_sp`, `gh_to_spdf.default`, and `gh_to_spdf.data.frame` to use single-pass algorithm instead of double-scan, improving performance ~2x when duplicates are present.

## v0.3.3

Drop references to deprecated rgdal.
Expand Down
28 changes: 14 additions & 14 deletions R/gis_tools.R
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# https://epsg.io/4326
wgs = function() sp::CRS('+proj=longlat +datum=WGS84', doCheckCRSArgs = FALSE)
wgs = function() sp::CRS('EPSG:4326')
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't there some problem with this CRS? Maybe we should just drop 'sp' support altogether, WDYT?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. sp is pretty deprecated.

According to CRAN there are two reverse dependencies: one import and one suggest. Neither use sp.

Reverse imports: MazamaCoreUtils
Reverse suggests: spatialrisk

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome... let's throw Claude at moving all implementations to use {sf} instead and then dropping {sp} as a first PR here?


# nocov start
check_suggested = function(pkg) {
Expand All @@ -12,10 +12,10 @@ check_suggested = function(pkg) {
gh_to_sp = function(geohashes) {
check_suggested('sp')
gh = tolower(geohashes)
if (anyDuplicated(gh) > 0L) {
Copy link
Copy Markdown
Owner

@MichaelChirico MichaelChirico Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i actually would revert this -- it's optimizing the right way IMO

  1. In the common, non-erroneous case, anyDuplicated() is more efficient than duplicated()
  2. If a mistake (duplicate inputs) is detected, then we do another pass and calculate the full, slower duplicated().

ditto elsewhere. Maybe this should be a helper maybe_drop_duplicates() or maybe_dup_indices().

idx = which(duplicated(gh))
warning('Detected ', length(idx), ' duplicate input geohashes; removing')
gh = gh[-idx]
dup_idx = duplicated(gh)
if (any(dup_idx)) {
warning('Detected ', sum(dup_idx), ' duplicate input geohashes; removing')
gh = gh[!dup_idx]
}
gh_xy = gh_decode(gh, include_delta = TRUE)
sp::SpatialPolygons(lapply(seq_along(gh), function(ii) {
Expand All @@ -34,10 +34,10 @@ gh_to_spdf = function(...) {
}

gh_to_spdf.default = function(geohashes, ...) {
if (anyDuplicated(geohashes) > 0L) {
idx = which(duplicated(geohashes))
warning('Detected ', length(idx), ' duplicate input geohashes; removing')
geohashes = geohashes[-idx]
dup_idx = duplicated(geohashes)
if (any(dup_idx)) {
warning('Detected ', sum(dup_idx), ' duplicate input geohashes; removing')
geohashes = geohashes[!dup_idx]
}
sp::SpatialPolygonsDataFrame(
gh_to_sp(geohashes),
Expand All @@ -49,11 +49,11 @@ gh_to_spdf.data.frame = function(gh_df, gh_col = 'gh', ...) {
if (is.na(idx <- match(gh_col, names(gh_df))))
stop('Searched for geohashes at a column named "', gh_col, '", but found nothing.')
gh = gh_df[[idx]]
if (anyDuplicated(gh) > 0L) {
idx = which(duplicated(gh))
warning('Detected ', length(idx), ' duplicate input geohashes; removing')
gh = gh[-idx]
gh_df = gh_df[-idx, , drop = FALSE]
dup_idx = duplicated(gh)
if (any(dup_idx)) {
warning('Detected ', sum(dup_idx), ' duplicate input geohashes; removing')
gh = gh[!dup_idx]
gh_df = gh_df[!dup_idx, , drop = FALSE]
}
sp::SpatialPolygonsDataFrame(
gh_to_sp(gh), data = gh_df, match.ID = FALSE
Expand Down
3 changes: 3 additions & 0 deletions R/utils.R
Original file line number Diff line number Diff line change
Expand Up @@ -2,5 +2,8 @@

gh_delta = function(precision) {
if (length(precision) > 1L) stop('One precision at a time, please.')
if (!is.numeric(precision) || precision < 0L || precision > 25L) {
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess there's no real reason to enforce the <=25 upper bound, even if we can't compute such precise geohashes, the calculation is still numerically accurate.

stop('precision must be a single integer between 0 and 25, inclusive.')
}
45.0/2.0^((5.0*precision + c(-1.0, 1.0) * (precision %% 2.0))/2.0 - 1:2)
}
2 changes: 1 addition & 1 deletion man/gh_encode.Rd
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ gh_encode(latitude, longitude, precision = 6L)
\item{precision}{ Positive \code{integer} scalar controlling the 'zoom level' -- how many characters should be used in the output. }
}
\details{
\code{precision} is limited to at most 28. This level of precision encodes locations on the globe at a nanometer scale and is already more than enough for basically all applications.
\code{precision} is limited to at most 25. This level of precision encodes locations on the globe at a nanometer scale and is already more than enough for basically all applications.

Longitudes outside \code{[-180, 180)} will be wrapped appropriately to the standard longitude grid.
}
Expand Down
12 changes: 7 additions & 5 deletions tests/testthat/test-gis-tools.R
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ test_that('gh_to_sp works', {
byrow = TRUE, ncol = 2L
)
)
wgs = sp::CRS('+proj=longlat +datum=WGS84', doCheckCRSArgs = FALSE)
wgs = sp::CRS('EPSG:4326')
expect_identical(ghSP@proj4string, wgs)

# duplicate inputs dropped
Expand Down Expand Up @@ -53,7 +53,7 @@ test_that('gh_to_spdf.default works', {
byrow = TRUE, ncol = 2L
)
)
wgs = sp::CRS('+proj=longlat +datum=WGS84', doCheckCRSArgs = FALSE)
wgs = sp::CRS('EPSG:4326')
expect_identical(ghSPDF@proj4string, wgs)

DF = data.frame(ID = 1:9, row.names = urumqi)
Expand Down Expand Up @@ -91,7 +91,7 @@ test_that('gh_to_spdf.data.frame works', {
byrow = TRUE, ncol = 2L
)
)
wgs = sp::CRS('+proj=longlat +datum=WGS84', doCheckCRSArgs = FALSE)
wgs = sp::CRS('EPSG:4326')
expect_identical(ghSPDF@proj4string, wgs)
expect_identical(ghSPDF@data, DF)

Expand Down Expand Up @@ -150,7 +150,7 @@ test_that('gh_covering works', {

# core
banjarmasin_cover = gh_covering(banjarmasin)
wgs = sp::CRS('+proj=longlat +datum=WGS84', doCheckCRSArgs = FALSE)
wgs = sp::CRS('EPSG:4326')
sp::proj4string(banjarmasin) = wgs
# use gUnaryUnion to overcome rgeos bug as reported 2019-08-16
expect_false(anyNA(sp::over(banjarmasin, banjarmasin_cover)))
Expand All @@ -172,7 +172,9 @@ test_that('gh_covering works', {
sp::proj4string(banjarmasin) = NA_character_
banjarmasin_cover = gh_covering(banjarmasin, minimal = TRUE)
sp::proj4string(banjarmasin_cover) = wgs
expect_equivalent(banjarmasin_cover, banjarmasin_tight)
# Compare just the data and polygons, not CRS representation which can vary
expect_identical(banjarmasin_cover@data, banjarmasin_tight@data)
expect_identical(banjarmasin_cover@polygons, banjarmasin_tight@polygons)

# works for SpatialPointsDataFrame when minimal=TRUE, #30
banjarmasinDF = sp::SpatialPointsDataFrame(
Expand Down
12 changes: 12 additions & 0 deletions tests/testthat/test-utils.R
Original file line number Diff line number Diff line change
Expand Up @@ -5,4 +5,16 @@ test_that('gh_delta works', {

# one at a time, for now...
expect_error(gh_delta(1:5), 'One precision at a time', fixed = TRUE)

# bounds checking
expect_error(gh_delta(-1L), 'precision must be a single integer between 0 and 25')
expect_error(gh_delta(26L), 'precision must be a single integer between 0 and 25')
expect_error(gh_delta(100L), 'precision must be a single integer between 0 and 25')
expect_error(gh_delta('foo'), 'precision must be a single integer between 0 and 25')
expect_error(gh_delta(NA), 'precision must be a single integer between 0 and 25')

# valid edge cases
expect_length(gh_delta(0L), 2L)
expect_length(gh_delta(1L), 2L)
expect_length(gh_delta(25L), 2L)
})