Skip to content

Gh fill#41

Open
gvanweerden wants to merge 8 commits into
MichaelChirico:masterfrom
gvanweerden:gh_fill
Open

Gh fill#41
gvanweerden wants to merge 8 commits into
MichaelChirico:masterfrom
gvanweerden:gh_fill

Conversation

@gvanweerden
Copy link
Copy Markdown

Made the following changes:

  • Got rid of whitespace in function
  • Removed roxygen comments and replaced with .Rd file
  • Simplified code using any()
  • Added test in testthat
  • Added comparison .Rmd for testing base vs data.table cartesian join

"9ws", "9wt", "9wu", "9wv", "9ww", "9wx",
"9wy", "9wz"))
# Test vector of inputs
expect_equal(gh_fill(c("9w", "9x"), 3L),
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might expect the output here to be

list(
  c("9w0", ..., "9wz"),
  c("9x0", ..., "9xz")
)

That would make it much more natural to map inputs to outputs:

fill( x[j] ) == out[[jj]] instead of currently fill( x[j] ) == out[1:32 + (32 * (jj - 1L))]

WDYT?

We might also offer a simplify= (naming?) argument to toggle unlist() the version I propose.

Comment thread R/gh_fill.R
@@ -0,0 +1,14 @@
gh_fill <- function(geohashes, precision) {
if (length(unique(nchar(geohashes))) > 1) {
stop("Input Geohashes must all have the same precision level.")
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there no use case for taking e.g. c("0", "11") as input and wanting all of the level-4 precisions as output?

Comment thread R/gh_fill.R
stop("Invalid Geohash; Valid characters: [0123456789bcdefghjkmnpqrstuvwxyz](any case)")
}
new_levels <- precision - nchar(geohashes[1])
base32 <-
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would cache this to .geohash at build time since it's likely to come up in various places:

.global = new.env(parent = emptyenv())

.geohash$base32 <- ...

Comment thread R/gh_fill.R
if (length(unique(nchar(geohashes))) > 1) {
stop("Input Geohashes must all have the same precision level.")
}
if (any(grepl("['ailoAILO]", geohashes))) {
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this allows non-ASCII characters on input, for example.

I would do if (!all(grepl(<base32>, geohashes))) instead, and use ignore.case=TRUE.

PS why '?

Comment thread expandgrid_vs_CJ.Rmd

```{r benchmark test}
microbenchmark(gh_fill_dt(test_vector, 6L))
microbenchmark(gh_fill_base(test_vector, 6L))
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my machine the speedup is large but not massive (30%):

Unit: milliseconds
                          expr      min        lq      mean    median       uq
   gh_fill_dt(test_vector, 6L) 497.8866  723.1318  797.3614  793.1192  871.561
 gh_fill_base(test_vector, 6L) 814.1263 1031.9389 1225.3436 1160.6757 1457.280
      max neval
 1176.892   100
 1788.559   100

Anyway, data.table is not complex as far as dependencies go... I lean towards allowing it.

@MichaelChirico
Copy link
Copy Markdown
Owner

WDYT about designing this as gh_zoom() instead? And also allowing precision < nchar(geohashes) (easier to implement obviously)?

gh_fill() naming doesn't feel perfect to me, since nothing's being "filled", exactly. "zooming" feels closer to what we're doing.

Comment thread R/gh_fill.R
@@ -0,0 +1,14 @@
gh_fill <- function(geohashes, precision) {
if (length(unique(nchar(geohashes))) > 1) {
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See overall comment... if we only want to allow "zooming in", we should add a check that precision >= nchar(geohashes).

Comment thread R/gh_fill.R
@@ -0,0 +1,14 @@
gh_fill <- function(geohashes, precision) {
Copy link
Copy Markdown
Owner

@MichaelChirico MichaelChirico Jul 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also consider some edge cases:

geohashes = character()
geohashes = NA_character_
geohashes = c("0", NA_character_)
geohashes = "0"; precision = 1L

What should be the behavior here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants