Skip to content

feat: convert directly between C and Go arrays#85

Merged
jogly merged 4 commits intouber:masterfrom
justinhwang:justinhwang/unsafe
Jun 6, 2025
Merged

feat: convert directly between C and Go arrays#85
jogly merged 4 commits intouber:masterfrom
justinhwang:justinhwang/unsafe

Conversation

@justinhwang
Copy link
Copy Markdown
Collaborator

@justinhwang justinhwang commented Jun 6, 2025

Save a few allocations by converting directly between C and Go arrays.

goos: darwin
goarch: arm64
pkg: github.com/uber/h3-go/v4
cpu: Apple M3 Max
BenchmarkToString-16          	62702066	        17.13 ns/op	      16 B/op	       1 allocs/op
BenchmarkFromString-16        	42018031	        27.86 ns/op	       0 B/op	       0 allocs/op
BenchmarkCellToLatLng-16      	 6184140	       192.3 ns/op	      16 B/op	       1 allocs/op
BenchmarkLatLngToCell-16      	 3529909	       340.2 ns/op	      24 B/op	       2 allocs/op
BenchmarkCellToBoundary-16    	 1628898	       737.0 ns/op	     336 B/op	       2 allocs/op
BenchmarkGridDisk-16          	  331419	      3601 ns/op	    5376 B/op	       2 allocs/op
BenchmarkGridRing-16          	 1606585	       746.3 ns/op	     960 B/op	       2 allocs/op
PASS
goos: darwin
goarch: arm64
pkg: github.com/uber/h3-go/v4
cpu: Apple M3 Max
BenchmarkToString-16          	70314662	        16.96 ns/op	      16 B/op	       1 allocs/op
BenchmarkFromString-16        	40283664	        29.25 ns/op	       0 B/op	       0 allocs/op
BenchmarkCellToLatLng-16      	 6232748	       191.7 ns/op	      16 B/op	       1 allocs/op
BenchmarkLatLngToCell-16      	 3487287	       339.9 ns/op	      24 B/op	       2 allocs/op
BenchmarkCellToBoundary-16    	 1642731	       729.9 ns/op	     336 B/op	       2 allocs/op
BenchmarkGridDisk-16          	  342846	      3496 ns/op	    2688 B/op	       1 allocs/op
BenchmarkGridRing-16          	 1686889	       712.3 ns/op	     480 B/op	       1 allocs/op
PASS
name               old alloc/op   new alloc/op   delta
GridDisk-16          5.38kB ± 0%    2.69kB ± 0%   ~     (p=1.000 n=1+1)
GridRing-16            960B ± 0%      480B ± 0%   ~     (p=1.000 n=1+1)

name               old allocs/op  new allocs/op  delta
GridDisk-16            2.00 ± 0%      1.00 ± 0%   ~     (p=1.000 n=1+1)
GridRing-16            2.00 ± 0%      1.00 ± 0%   ~     (p=1.000 n=1+1)

The delta here though is about even though.

@justinhwang
Copy link
Copy Markdown
Collaborator Author

@jogly how do you feel about this one? The optimization wasn't as significant as I thought it would be

@coveralls
Copy link
Copy Markdown

coveralls commented Jun 6, 2025

Pull Request Test Coverage Report for Build 15497052950

Details

  • 6 of 6 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.01%) to 98.963%

Totals Coverage Status
Change from base Build 15496336957: -0.01%
Covered Lines: 668
Relevant Lines: 675

💛 - Coveralls

@zachcoleman
Copy link
Copy Markdown
Collaborator

These benchmarks are kinda meh anyways as they call the function on the same input over and over enabling a lot of cache behavior. Would be curious the results on randomized lat/lngs.

@dfellis
Copy link
Copy Markdown
Collaborator

dfellis commented Jun 6, 2025

Or testing polygonToCells with a large geofence and a fine resolution? The largest benchmark in here (afaict, since I'm not super familiar with the Go bindings) only allocates for 331 cells, or about 2.58KB, for the gridDisk benchmark.

The benchmark does show that this change cuts the temporary memory allocation exactly in half to only very slightly above the memory needed to store the cells (down from 5.38KB to 2.69KB), so that does still feel like a solid win on memory overhead, which could be proven with a monstrous polygon that spits out at least tens of thousands of cells? At a large enough size the extra memcpy time might be measurable, as well, to demonstrate a perf boost.

But basically I wouldn't reject this PR because a server working with a lot of data simultaneously could be under a lot of memory pressure that it doesn't have to. I think it's great. :)

@justinhwang
Copy link
Copy Markdown
Collaborator Author

Or testing polygonToCells with a large geofence and a fine resolution? The largest benchmark in here (afaict, since I'm not super familiar with the Go bindings) only allocates for 331 cells, or about 2.58KB, for the gridDisk benchmark.

Yeah there's a preexisting benchmark wherein we test with 15 resolution which was timing out - let me retry that with a lower resolution and then see what the benchmark differences are.

func BenchmarkPolyfill(b *testing.B) {
	for n := 0; n < b.N; n++ {
		cells, _ = PolygonToCells(validGeoPolygonHoles, 15)
	}
}

@justinhwang
Copy link
Copy Markdown
Collaborator Author

Yeah there's a preexisting benchmark wherein we test with 15 resolution which was timing out - let me retry that with a lower resolution and then see what the benchmark differences are.

Not sure I was thinking of that properly, a finer resolution would mean more Cells which would mean more likely to time out? Let me try as-is with longer timeout and less retries

@jogly
Copy link
Copy Markdown
Collaborator

jogly commented Jun 6, 2025

I read this PR after I made this comment https://github.com/uber/h3-go/pull/84/files?w=1#r2132501753 haha. Because you're calling cellsFromC per origin cell there, that function would get the most benefit from this optimization for a large number of origins. And of course @dfellis is correct as well for large polygons, this has big savings on memory pressure, which is what was on my mind when I wrote that comment.

@justinhwang
Copy link
Copy Markdown
Collaborator Author

@jogly I couldn't get the BenchmarkPolyfill to run locally even with b.N set to 1 --

go test -bench=. -benchmem -benchtime=1x

have you had any success running that benchmark?

@jogly
Copy link
Copy Markdown
Collaborator

jogly commented Jun 6, 2025

@jogly I couldn't get the BenchmarkPolyfill to run locally even with b.N set to 1 --

go test -bench=. -benchmem -benchtime=1x

have you had any success running that benchmark?

Ran a couple tests and 15 is definitely too fine of a resolution, it requires 9GB of contiguous RAM without this change. 13 is a good number for the polygon being tested, and definitely shows the memory required is halved. I think the polygon being passed to this benchmark changed for a test a while back, and we never ran the benchmarks after. I also added a benchmark for your last PR and this changes halves that memory requirement as well, as expected. Great optimization!

jogly
jogly previously approved these changes Jun 6, 2025
@jogly jogly changed the title feat: Convert directly between C and Go arrays feat: convert directly between C and Go arrays Jun 6, 2025
@jogly jogly merged commit 41962bc into uber:master Jun 6, 2025
5 checks passed
@justinhwang justinhwang deleted the justinhwang/unsafe branch June 6, 2025 18:20
@dfellis
Copy link
Copy Markdown
Collaborator

dfellis commented Jun 6, 2025

Screenshot From 2025-06-06 13-20-27
obama-awarding-himself-jpeg-505931254

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants