Commit bfc5c28

Krishna Pai authored and facebook-github-bot committed
feat: Add support for SpatialJoinFuzzer (facebookincubator#15455)
Summary: This change adds support for SpatialJoinFuzzer. Spatial Join is sufficiently different from default Join fuzzing that it seemed better to keep it separate rather than shoehorn it into our default join fuzzer. SpatialJoin has more limited capabilities than regular joins. It does not support:

* Multiple partition strategies
* Grouped execution
* Spilling
* Flipped join order (build/probe swap)

At the moment the fuzzer only generates simple ``Values -> SpatialJoin`` plans.

Limitations
===========

Apart from this, the current iteration of the spatial fuzzer has the following limitations, which I will try to address in later iterations:

1. Limited support for radius-based spatial joins (used only in the ST_Distance predicate case).
2. No support for complex filter expressions beyond spatial predicates.
3. No support for TableScan.
4. No support for output column reordering/selection (uses all columns).

Another thing to note is that it currently supports neither Presto nor DuckDB as a source of truth. Support for adding Presto as a reference query runner will come in future iterations.

Reviewed By: jagill

Differential Revision: D86709638
1 parent 0fefb6c commit bfc5c28

6 files changed, +818 -0 lines changed


velox/docs/develop/testing.rst

Lines changed: 1 addition & 0 deletions
@@ -11,5 +11,6 @@ Testing Tools
 testing/join-fuzzer
 testing/memory-arbitration-fuzzer
 testing/row-number-fuzzer
+testing/spatial-join-fuzzer
 testing/writer-fuzzer
 testing/spark-query-runner.rst
Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
====================
Spatial Join Fuzzer
====================

Overview
========

The Spatial Join Fuzzer tests the correctness of the SpatialJoin operator by generating random geometry data and spatial join plans. It verifies that SpatialJoin produces the same results as NestedLoopJoin for equivalent queries.


Supported Features
==================

Join Types
----------

The fuzzer tests the two join types supported by SpatialJoin (as defined in ``SpatialJoinNode::isSupported()``):

* **INNER** - Only matching rows from both sides
* **LEFT** - All rows from the left side, plus matching rows from the right side (nulls where there is no match)
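
As a rough illustration of the difference between the two join types, the sketch below implements a naive nested-loop join over integer row IDs with the join condition passed in as a callable, assuming the left input is the probe side. The names ``Row``, ``JoinedRow``, ``JoinType``, and ``nestedLoopJoin`` are hypothetical and are not Velox APIs.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical row type: an id standing in for a full row.
struct Row {
  int id;
};

// One output row; `build` is empty when a LEFT join emits an unmatched probe row.
struct JoinedRow {
  int probe;
  std::optional<int> build;
};

enum class JoinType { kInner, kLeft };

// Naive nested-loop join: evaluates `predicate` on every (probe, build) pair.
std::vector<JoinedRow> nestedLoopJoin(
    const std::vector<Row>& probe,
    const std::vector<Row>& build,
    const std::function<bool(const Row&, const Row&)>& predicate,
    JoinType joinType) {
  std::vector<JoinedRow> out;
  for (const auto& p : probe) {
    bool matched = false;
    for (const auto& b : build) {
      if (predicate(p, b)) {
        out.push_back({p.id, b.id});
        matched = true;
      }
    }
    // A LEFT join keeps unmatched probe rows, padding the build side with null.
    if (!matched && joinType == JoinType::kLeft) {
      out.push_back({p.id, std::nullopt});
    }
  }
  return out;
}
```

With an INNER join an unmatched probe row simply disappears; with a LEFT join it survives once with a null build side.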

Spatial Predicates
------------------

The fuzzer tests these spatial predicates:

* ``ST_Intersects(geometry1, geometry2)`` - Tests whether the geometries intersect
* ``ST_Contains(geometry1, geometry2)`` - Tests whether ``geometry1`` contains ``geometry2``
* ``ST_Within(geometry1, geometry2)`` - Tests whether ``geometry1`` is within ``geometry2``
* ``ST_Distance(geometry1, geometry2) < threshold`` - Tests whether the distance between the geometries is below a threshold
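
For intuition about what these predicates compute, here is a minimal sketch for points and simple polygons in planar coordinates. It is not Velox's implementation: containment uses the standard ray-casting test and ignores points lying exactly on a polygon boundary, ``ST_Within(a, b)`` is expressed as ``ST_Contains(b, a)``, and ``ST_Intersects`` is omitted. All names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point {
  double x;
  double y;
};

// Simple polygon given as its vertices in order; ring closure is implicit.
using Polygon = std::vector<Point>;

// Ray casting: count crossings of a horizontal ray from `p` towards +x.
// Points exactly on the boundary are not handled specially.
bool polygonContainsPoint(const Polygon& poly, const Point& p) {
  assert(poly.size() >= 3);
  bool inside = false;
  for (std::size_t i = 0, j = poly.size() - 1; i < poly.size(); j = i++) {
    const Point& a = poly[i];
    const Point& b = poly[j];
    // Edge (a, b) straddles the ray's y; this also guarantees a.y != b.y.
    if ((a.y > p.y) != (b.y > p.y)) {
      double xAtY = a.x + (p.y - a.y) * (b.x - a.x) / (b.y - a.y);
      if (p.x < xAtY) {
        inside = !inside;
      }
    }
  }
  return inside;
}

// ST_Within(a, b) is equivalent to ST_Contains(b, a).
bool pointWithinPolygon(const Point& p, const Polygon& poly) {
  return polygonContainsPoint(poly, p);
}

// Distance predicate between two points: ST_Distance(a, b) < threshold.
bool pointsWithinDistance(const Point& a, const Point& b, double threshold) {
  return std::hypot(a.x - b.x, a.y - b.y) < threshold;
}
```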

Geometry Types
--------------

The fuzzer generates Well-Known Text (WKT) strings for three geometry types:

* **POINT** - A single coordinate pair (e.g., ``POINT (10.5 20.3)``)
* **POLYGON** - A closed shape defined by a ring of vertices
* **LINESTRING** - A line segment between two points
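
The three WKT shapes can be produced with ordinary string formatting. This sketch assumes one decimal place per coordinate (as in the ``POINT (10.5 20.3)`` example) and closes the polygon ring by repeating the first vertex, which WKT requires. The helper names are hypothetical, and the fuzzer's own generators may format coordinates differently.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using Coord = std::pair<double, double>;

// Writes "x y" with one decimal place, e.g. "10.5 20.3".
void appendCoord(std::ostringstream& out, const Coord& c) {
  out << std::fixed << std::setprecision(1) << c.first << ' ' << c.second;
}

std::string pointWkt(const Coord& c) {
  std::ostringstream out;
  out << "POINT (";
  appendCoord(out, c);
  out << ')';
  return out.str();
}

// A two-point LINESTRING, i.e. a single line segment.
std::string lineStringWkt(const Coord& from, const Coord& to) {
  std::ostringstream out;
  out << "LINESTRING (";
  appendCoord(out, from);
  out << ", ";
  appendCoord(out, to);
  out << ')';
  return out.str();
}

// WKT polygon rings must be closed, so the first vertex is repeated at the end.
std::string polygonWkt(const std::vector<Coord>& vertices) {
  assert(vertices.size() >= 3);
  std::ostringstream out;
  out << "POLYGON ((";
  for (std::size_t i = 0; i <= vertices.size(); ++i) {
    if (i > 0) {
      out << ", ";
    }
    appendCoord(out, vertices[i % vertices.size()]);
  }
  out << "))";
  return out.str();
}
```

For example, ``polygonWkt({{0, 0}, {4, 0}, {4, 3}})`` yields ``POLYGON ((0.0 0.0, 4.0 0.0, 4.0 3.0, 0.0 0.0))``.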

Distribution Patterns
---------------------

Geometries are generated using three distribution patterns:

* **Uniform** - Geometries uniformly distributed in space (0-1000 range)
* **Clustered** - Geometries grouped into 5 regions to test overlap scenarios
* **Sparse** - Geometries widely spread (0-2000 range) with a low probability of overlap


Implementation Details
======================


Geometry Generation
-------------------

Geometries are generated using ``AbstractInputGenerator`` subclasses:

* ``PointInputGenerator`` - Generates POINT WKT strings
* ``PolygonInputGenerator`` - Generates POLYGON WKT strings
* ``LineStringInputGenerator`` - Generates LINESTRING WKT strings

Each generator implements the ``generate(vector_size_t index)`` method, which produces the geometry string for a row according to the selected distribution pattern. The patterns for points are:

**Uniform Distribution**::

    x = random(0, 1000)
    y = random(0, 1000)
    POINT (x y)

**Clustered Distribution**::

    cluster = row % 5              // 5 clusters
    centerX = cluster * 200 + 100
    centerY = cluster * 200 + 100  // cluster centers lie on the diagonal
    x = centerX + random(-50, 50)
    y = centerY + random(-50, 50)
    POINT (x y)

**Sparse Distribution**::

    x = random(0, 2000)  // larger range
    y = random(0, 2000)
    POINT (x y)
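
The three point distributions above translate fairly directly into C++. This is a standalone sketch: ``PointGenerator`` and ``Distribution`` are hypothetical names rather than the fuzzer's ``PointInputGenerator``, ``vector_size_t`` is assumed to be a 32-bit signed integer, and ``random(a, b)`` is assumed to mean a uniform real draw from ``[a, b)`` using a seeded ``std::mt19937``.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <sstream>
#include <string>

// Assumed to mirror Velox's 32-bit row index type.
using vector_size_t = int32_t;

enum class Distribution { kUniform, kClustered, kSparse };

// Hypothetical stand-in for a point generator; not Velox's PointInputGenerator.
class PointGenerator {
 public:
  PointGenerator(Distribution distribution, uint32_t seed)
      : distribution_(distribution), rng_(seed) {}

  // Returns POINT WKT for row `index`, following the pseudocode above.
  std::string generate(vector_size_t index) {
    double x = 0;
    double y = 0;
    switch (distribution_) {
      case Distribution::kUniform:
        x = uniform(0, 1000);
        y = uniform(0, 1000);
        break;
      case Distribution::kClustered: {
        int cluster = index % 5;  // 5 clusters along the diagonal
        double center = cluster * 200 + 100;
        x = center + uniform(-50, 50);
        y = center + uniform(-50, 50);
        break;
      }
      case Distribution::kSparse:
        x = uniform(0, 2000);  // larger range, fewer overlaps
        y = uniform(0, 2000);
        break;
    }
    std::ostringstream out;
    out << "POINT (" << x << ' ' << y << ')';
    return out.str();
  }

 private:
  // Uniform real draw from [lo, hi) using the generator's seeded engine.
  double uniform(double lo, double hi) {
    return std::uniform_real_distribution<double>(lo, hi)(rng_);
  }

  Distribution distribution_;
  std::mt19937 rng_;
};
```

Because the engine is seeded explicitly, the same seed reproduces the same sequence of geometries.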
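
Data Matching Strategy
----------------------

Independently generated geometries may rarely satisfy a spatial predicate (for example, two random points almost never coincide), so to ensure that some matches occur during joins:

* The build side copies ~30% of its geometries from the probe side.
* There is a 10% chance of an empty build side, to test that edge case.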
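
One way to realize this strategy is sketched below, assuming each build row independently copies a random probe geometry with probability 0.3 and the whole build side is empty with probability 0.1; the fuzzer may apply these ratios differently. ``makeBuildSide`` and the ``makeFresh`` callback are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>
#include <string>
#include <vector>

// Builds the build-side geometry column from the probe side.
// `makeFresh` is a hypothetical callback producing a new random geometry.
std::vector<std::string> makeBuildSide(
    const std::vector<std::string>& probe,
    std::size_t numRows,
    const std::function<std::string()>& makeFresh,
    std::mt19937& rng) {
  std::bernoulli_distribution emptyBuild(0.1);
  if (emptyBuild(rng)) {
    return {};  // ~10% of calls exercise the empty-build edge case
  }
  std::bernoulli_distribution copyFromProbe(0.3);
  std::vector<std::string> build;
  build.reserve(numRows);
  for (std::size_t i = 0; i < numRows; ++i) {
    if (!probe.empty() && copyFromProbe(rng)) {
      // Reuse a probe geometry so that some pairs are guaranteed to match.
      std::uniform_int_distribution<std::size_t> pick(0, probe.size() - 1);
      build.push_back(probe[pick(rng)]);
    } else {
      build.push_back(makeFresh());
    }
  }
  return build;
}
```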

Verification
------------

The fuzzer compares the results of two equivalent plans:

1. **SpatialJoin plan** - Uses the specialized SpatialJoin operator.
2. **NestedLoopJoin plan** - Uses NestedLoopJoin with the same spatial predicate as the join filter.

The results must match exactly, which validates that SpatialJoin implements the spatial predicates correctly.
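
Since the two plans are not guaranteed to return rows in the same order, comparing their results amounts to checking that both contain the same rows with the same multiplicities. A minimal sketch, assuming each row has already been serialized to a string; Velox has its own result-comparison utilities, which this does not reproduce.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Order-insensitive comparison: true iff both results contain the same rows
// with the same multiplicities. Takes copies so the inputs can be sorted.
bool sameMultiset(std::vector<std::string> a, std::vector<std::string> b) {
  if (a.size() != b.size()) {
    return false;
  }
  std::sort(a.begin(), a.end());
  std::sort(b.begin(), b.end());
  return a == b;
}
```

Sorting makes duplicate rows line up, so a plan that drops one copy of a duplicated match is still detected.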

Key Differences from JoinFuzzer
================================

Join Conditions
---------------

Regular joins use simple equality predicates on key columns, whereas spatial joins use a spatial predicate::

    // Regular join
    probe.id = build.id

    // Spatial join
    ST_Intersects(probe_geom, build_geom)

Spatial joins therefore use **function call expressions** as join conditions rather than simple column references.

velox/exec/fuzzer/CMakeLists.txt

Lines changed: 13 additions & 0 deletions
@@ -146,6 +146,19 @@ target_link_libraries(
   velox_expression_test_utility
 )
 
+# Spatial Join Fuzzer.
+add_executable(velox_spatial_join_fuzzer SpatialJoinFuzzerRunner.cpp SpatialJoinFuzzer.cpp)
+
+target_link_libraries(
+  velox_spatial_join_fuzzer
+  velox_type
+  velox_vector_fuzzer
+  velox_fuzzer_util
+  velox_exec_test_lib
+  velox_expression_test_utility
+  velox_vector_test_lib
+)
+
 add_library(velox_writer_fuzzer WriterFuzzer.cpp)
 
 target_link_libraries(
