Skip to content

Phase 2: Optimize gh_covering Performance (2-3× Speedup)#44

Open
dshkol wants to merge 2 commits into
masterfrom
perf/gh-covering-direct-grid
Open

Phase 2: Optimize gh_covering Performance (2-3× Speedup)#44
dshkol wants to merge 2 commits into
masterfrom
perf/gh-covering-direct-grid

Conversation

@dshkol
Copy link
Copy Markdown
Collaborator

@dshkol dshkol commented Nov 11, 2025

Summary

This PR optimizes gh_covering by eliminating a redundant encode-decode cycle, achieving 2-3× speedup across typical use cases.

Problem

The previous implementation had an inefficient data flow:

Grid → gh_encode → gh_to_sf → gh_to_spdf → gh_to_sp → gh_decode → Build polygons

The encode-decode round-trip was wasteful since we only needed the decoded coordinates to build polygons.

Solution

Streamlined the flow to:

Grid → gh_encode → gh_decode → Build polygons (direct)

Now gh_covering decodes geohashes once and builds SpatialPolygons directly, skipping the intermediate conversions through gh_to_sfgh_to_spdfgh_to_sp.

Performance Results

Benchmark: 100 random points, 0.5° spread

Precision Old (ms) New (ms) Speedup
4 10.9 5.0 2.17×
5 15.9 6.6 2.41×
6 180.5 57.8 3.13×
7 5528.6 1802.5 3.07×

Median speedup: ~2.7×

Higher precisions show larger absolute time savings (e.g., precision 7 saves ~3.7 seconds).

Testing

✅ All 48 existing tests pass
✅ No breaking changes to API or behavior
✅ Same output as before, just faster
✅ Benchmark scripts included in benchmarks/

Changes

  • R/gis_tools.R: Refactored gh_covering to build polygons directly from decoded coordinates
  • NEWS.md: Documented performance improvement
  • benchmarks/: Added microbenchmark scripts for reproducibility

Verification

# Run benchmarks yourself
source("benchmarks/gh_covering_bench_simple.R")

# Run tests
devtools::test()

Next Steps

This is part of a series of optimization PRs:

  • ✅ Phase 1: Quick fixes and code quality (#43)
  • ✅ Phase 2: gh_covering optimization (this PR)
  • 🔜 Phase 3: Duplicate detection optimization
  • 🔜 Phase 4: CI/CD modernization (GitHub Actions)

🤖 Generated with Claude Code

Eliminated redundant encode-decode cycle in gh_covering by decoding
geohashes once and building spatial polygons directly.

Before: Grid → encode → gh_to_sf → gh_to_spdf → gh_to_sp → decode → polygons
After: Grid → encode → decode → polygons (direct)

Performance improvements (100 points, 0.5° spread):
- Precision 4: 2.17x faster (10.9ms → 5.0ms)
- Precision 5: 2.41x faster (15.9ms → 6.6ms)
- Precision 6: 3.13x faster (180.5ms → 57.8ms)
- Precision 7: 3.07x faster (5528.6ms → 1802.5ms)

Median speedup: ~2.7x across typical use cases

All existing tests pass - no breaking changes to API or behavior.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Comment thread NEWS.md

### PERFORMANCE

1. Optimized `gh_covering` by eliminating redundant encode-decode cycle. The function now decodes geohashes once and builds spatial polygons directly, rather than going through `gh_to_sf` → `gh_to_spdf` → `gh_to_sp` → `gh_decode`. Benchmarks show 2-3× speedup across typical use cases, with larger improvements for higher precision values.
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to avoid non-ASCII in NEWS?

Suggested change
1. Optimized `gh_covering` by eliminating redundant encode-decode cycle. The function now decodes geohashes once and builds spatial polygons directly, rather than going through `gh_to_sf` `gh_to_spdf` `gh_to_sp` `gh_decode`. Benchmarks show 2- speedup across typical use cases, with larger improvements for higher precision values.
1. Optimized `gh_covering` by eliminating redundant encode-decode cycle. The function now decodes geohashes once and builds spatial polygons directly, rather than going through `gh_to_sf` -> `gh_to_spdf` -> `gh_to_sp` -> `gh_decode`. Benchmarks show 2-3x speedup across typical use cases, with larger improvements for higher precision values.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, superfluous.

@MichaelChirico
Copy link
Copy Markdown
Owner

A jj chain should be the easiest way to prevent headache from merge conflicts among this sequence of PRs as review suggestions are incorporated / each branch drifts from the state as filed:

https://github.com/jj-vcs/jj

It seems Claude can basically get the gist of how to use it too even though it's relatively new:

https://claude.ai/share/e53b881a-f456-44dc-ae0c-2d96936cf366

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants