@patrickbr patrickbr commented Dec 22, 2025

So far, this includes several successful experiments I did over the last week, with the overall goal of speeding up the distance calculation between very large multi-geometries (MULTI* and COLLECTION).

1.) Store the current upper distance bound across multi geometry parts, which allows many candidates to be discarded early via a simple bounding box distance calculation

2.) Improve the initial upper bound by storing, for each multigeometry, the rightmost point during parsing

3.) Tighten the padding for the meter distance distortion on the Web Mercator projected plane by only considering the local distortion. For example, if we know that both geometries are on the same "band" around the globe, the global distortion factor has to be considered only initially, for computing the Euclidean distance upper bound. For the subsequent sweep padding it is then enough to consider the relative distortion between the geometries (which is nearly 1 if the latitudes are close).

4.) Update libspatialjoin to a new experimental version which computes tighter initial upper distance bounds at the start (by doing a simple probing of 4 points) and which also updates the sweep padding dynamically during the sweep.

5.) On the side, fix a very subtle bug in which the distortion for the candidate sweep padding was computed based on the original geometry, although it must be computed based on the padded geometry. This caused some candidates to be lost at very large distances.
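
For illustration, the idea behind 1.) and 2.) can be sketched roughly as follows. This is a minimal, simplified sketch and not the code from this PR or from libspatialjoin: all names (`Part`, `makePart`, `boxDist`, `multiDist`) are hypothetical, parts are reduced to plain point sets, and distances are plain Euclidean distances on the projected plane (no distortion handling). The running upper bound is shared across all part pairs, and any pair whose bounding box distance already reaches that bound is skipped without an exact computation.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

struct Point { double x, y; };
struct Box { double minX, minY, maxX, maxY; };

// Hypothetical part of a multi-geometry, simplified to a point set plus box.
struct Part {
  std::vector<Point> pts;
  Box box;
};

// Build a part and its bounding box (assumption: done once, e.g. at parse time).
Part makePart(std::vector<Point> pts) {
  const double inf = std::numeric_limits<double>::infinity();
  Box b{inf, inf, -inf, -inf};
  for (const Point& p : pts) {
    b.minX = std::min(b.minX, p.x);
    b.minY = std::min(b.minY, p.y);
    b.maxX = std::max(b.maxX, p.x);
    b.maxY = std::max(b.maxY, p.y);
  }
  return Part{std::move(pts), b};
}

double dist(const Point& a, const Point& b) {
  return std::hypot(a.x - b.x, a.y - b.y);
}

// Lower bound on the distance between anything inside a and anything inside b.
double boxDist(const Box& a, const Box& b) {
  double dx = std::max({0.0, a.minX - b.maxX, b.minX - a.maxX});
  double dy = std::max({0.0, a.minY - b.maxY, b.minY - a.maxY});
  return std::hypot(dx, dy);
}

struct Result {
  double dist;              // minimum distance found
  std::size_t exactChecks;  // number of part pairs not pruned by the box test
};

// initialUpper must be a distance actually attained by some point pair
// (e.g. between one stored reference point per geometry), or infinity.
Result multiDist(const std::vector<Part>& a, const std::vector<Part>& b,
                 double initialUpper) {
  double upper = initialUpper;
  std::size_t checks = 0;
  for (const Part& pa : a) {
    for (const Part& pb : b) {
      // The box distance is a lower bound, so this pair cannot improve upper.
      if (boxDist(pa.box, pb.box) >= upper) continue;
      ++checks;
      for (const Point& p : pa.pts)
        for (const Point& q : pb.pts) upper = std::min(upper, dist(p, q));
    }
  }
  return Result{upper, checks};
}
```

In this sketch, seeding `initialUpper` with the distance between one stored point of each geometry is valid because the distance between any two concrete points can never be smaller than the true minimum distance. The better that seed, the more part pairs are rejected by the box test alone.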

…distance calculation between multigeometries: store the current upper distance bound across multi geometry parts, which allows many candidates to be discarded early via a simple bounding box distance calculation; improve the initial upper bound by storing, for each multigeometry, the rightmost point during parsing; make the padding for the meter distance distortion on the Web Mercator projected plane tighter by only considering the local distortion (e.g. if we know that both geometries are on the same "band" around the globe, it is enough to consider the global distortion factor initially for computing the Euclidean distance upper bound, but for the padding it is then enough to only consider the relative distortion between the geometries, which is then nearly 1); fix a very subtle bug in which the distortion for the candidate sweep padding was computed based on the *original* geometry, but it must be computed based on the *padded* geometry