Commit 1132bab

RFC for geospatial indexing
1 parent eec38de commit 1132bab

1 file changed: +116 -0 lines changed

rfc/geospatial.md

Lines changed: 116 additions & 0 deletions
@@ -0,0 +1,116 @@
---
RFC: 87
Status: Proposed
---

# Title

Geospatial Indexing

## Abstract

This proposal introduces a new field type, `GEO`, into the Valkey Search module. The new type enables efficient indexing and querying of geospatial data, supporting operations such as finding keys within a specified radius or bounding box. The index is built on Boost.Geometry's R-tree implementation, which keeps spatial lookups efficient as the dataset grows.

## Motivation

Efficient geospatial querying is essential for applications that handle location-based data, such as mapping services, ride-sharing platforms, and geographic information systems (GIS). Integrating geospatial indexing into Valkey Search will allow developers to perform complex location-based queries directly within Valkey. This reduces the need for external processing, which improves performance and simplifies application architecture.

## Terminology

In the context of this RFC:

- **Key**: A Valkey key. Depending on context, the term refers either to the name of the key or to the contents of a HASH or JSON key.
- **Field**: A component of an index. Each field has a type and a path.
- **Index**: A collection of fields and field-indexes. The object created by the `FT.CREATE` command.
- **Field-Index**: A data structure associated with a field that accelerates the search operators for that field's type.
- **Geospatial Data**: Data that represents the geographic location and shape of objects.
- **R-tree**: A tree data structure used for spatial access methods, i.e., for indexing multi-dimensional information such as geographical coordinates.

## Design

### Field Type

A new field type, `GEO`, is introduced to represent geospatial data. This field type indexes geographic coordinates (latitude and longitude) and supports efficient spatial queries.

### Indexing

The `GEO` field uses Boost.Geometry's R-tree implementation for spatial indexing. The R-tree is a self-balancing data structure that groups nearby entries under bounding rectangles, so a search can skip every subtree whose rectangle does not overlap the query region. This minimizes the number of nodes traversed during searches.
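
The pruning idea behind an R-tree can be illustrated with a deliberately simplified, single-level sketch. This is not Boost.Geometry's implementation or API: the `Box` and `Leaf` classes, the `search` function, and the `place:*` keys are all hypothetical names chosen for illustration, and a real R-tree nests these bounding boxes over several levels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned minimum bounding rectangle in (lon, lat) degrees."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def intersects(self, other: "Box") -> bool:
        # Two rectangles overlap only if they overlap on both axes.
        return (self.min_lon <= other.max_lon and other.min_lon <= self.max_lon
                and self.min_lat <= other.max_lat and other.min_lat <= self.max_lat)

    def contains_point(self, lon: float, lat: float) -> bool:
        return (self.min_lon <= lon <= self.max_lon
                and self.min_lat <= lat <= self.max_lat)


@dataclass
class Leaf:
    """A leaf node: a bounding box plus the (key, lon, lat) entries inside it."""
    box: Box
    entries: list


def search(leaves, query):
    """Return the matching keys and the number of leaves actually scanned."""
    hits, scanned = [], 0
    for leaf in leaves:
        if not leaf.box.intersects(query):
            continue  # the whole node is skipped without touching its entries
        scanned += 1
        hits.extend(key for key, lon, lat in leaf.entries
                    if query.contains_point(lon, lat))
    return hits, scanned


# Hypothetical sample data: two leaves far apart from each other.
leaves = [
    Leaf(Box(-75.0, 40.0, -73.0, 41.0), [("place:nyc", -74.0060, 40.7128)]),
    Leaf(Box(-1.0, 51.0, 1.0, 52.0), [("place:london", -0.1276, 51.5072)]),
]
hits, scanned = search(leaves, Box(-74.5, 40.5, -73.5, 41.0))
```

Only the first leaf overlaps the query rectangle, so the entries of the second leaf are never examined.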

### Packing Algorithms

Boost.Geometry's R-tree supports several node-balancing algorithms (called packing algorithms in this RFC), each with different trade-offs. An optional parameter, `OPTIMIZED_FOR`, is proposed so that users can choose the algorithm that best fits their workload:

- `SEARCH`: Uses the **R\*-tree algorithm**, which minimizes overlap between nodes to improve search performance. Queries are typically faster, at the cost of somewhat slower insertions and deletions.
- `MUTATION`: Uses the **quadratic split algorithm**, which favors fast insertions, deletions, and updates. Searches may be somewhat slower because nodes overlap more.

The default value for `OPTIMIZED_FOR` is `SEARCH`.

This option lets users tune each index for the balance of reads and writes in their application.

### Querying

The `GEO` field supports the following query operations:

- **Radius Search**: Finds keys within a specified radius from a given point.
- **Bounding Box Search**: Finds keys within a specified rectangular area.
- **Nearest Neighbor Search**: Finds the nearest keys to a given point.

These operations allow efficient retrieval of geospatial data based on proximity and spatial relationships.
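
As a reference for what a nearest neighbor search returns, the following brute-force sketch ranks keys by great-circle distance. It is only a semantic baseline, not the R-tree-backed search this RFC proposes. The function names, the sample keys, and the choice of the haversine formula on a spherical Earth are assumptions made for illustration; the RFC does not specify a distance model.

```python
import heapq
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest(points, lat, lon, k):
    """points maps key -> (lat, lon). Return the k closest keys, nearest first."""
    return heapq.nsmallest(
        k, points, key=lambda name: haversine_m(lat, lon, *points[name])
    )


# Hypothetical sample data.
points = {
    "place:london": (51.5072, -0.1276),
    "place:boston": (42.3601, -71.0589),
    "place:nyc": (40.7128, -74.0060),
}
# Two nearest keys to a point in Newark, NJ.
closest = nearest(points, 40.7357, -74.1724, 2)
```

An index-backed implementation would return the same ordering while examining far fewer candidates.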

## Query Language Extensions

To support geospatial queries, the query language is extended with new predicates:

- `GEO_RADIUS(lat, lon, radius)`: Matches keys within the given radius (in meters) from the specified latitude and longitude.
- `GEO_BOUNDING_BOX(min_lat, min_lon, max_lat, max_lon)`: Matches keys within the specified bounding box.

These predicates can be combined with existing query constructs to perform complex geospatial queries.
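
The intended meaning of the two predicates can be sketched as plain functions over a point. Several details here are assumptions rather than part of the proposal: distances use the haversine formula on a spherical Earth, boundaries are inclusive, the bounding box does not cross the antimeridian, and the helper names `geo_radius` and `geo_bounding_box` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def geo_radius(lat, lon, radius_m):
    """Predicate: is a point within radius_m meters of (lat, lon)? (inclusive, assumed)"""
    return lambda p_lat, p_lon: haversine_m(lat, lon, p_lat, p_lon) <= radius_m


def geo_bounding_box(min_lat, min_lon, max_lat, max_lon):
    """Predicate: is a point inside the box? Assumes no antimeridian crossing."""
    return lambda p_lat, p_lon: (min_lat <= p_lat <= max_lat
                                 and min_lon <= p_lon <= max_lon)


within_5km = geo_radius(40.7128, -74.0060, 5000)
in_box = geo_bounding_box(40.0, -75.0, 41.0, -73.0)

# Combining predicates: a point must satisfy both to match.
brooklyn_bridge = (40.7061, -73.9969)
matches_both = within_5km(*brooklyn_bridge) and in_box(*brooklyn_bridge)
```

A radius query can also be answered in two steps: first filter by a bounding box that encloses the circle, which an R-tree handles natively, then apply the exact distance test to the survivors.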

## Commands

### FT.CREATE

The `FT.CREATE` command is extended to support the `GEO` field type and the optional optimization parameter:

```
FT.CREATE <index_name> ON <data_type> PREFIX <prefix_count> <prefix> SCHEMA <field_name> GEO [OPTIMIZED_FOR SEARCH|MUTATION]
```

Example:

```
FT.CREATE places_idx ON HASH PREFIX 1 "place:" SCHEMA location GEO OPTIMIZED_FOR SEARCH
```
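
The handling of the optional clause, including the `SEARCH` default, might look like the following sketch. The function `parse_geo_options` is hypothetical, and treating keywords as case-insensitive is an assumption; the RFC does not define parsing rules.

```python
VALID_MODES = ("SEARCH", "MUTATION")
DEFAULT_MODE = "SEARCH"  # default stated in the RFC


def parse_geo_options(tokens):
    """Parse the tokens that follow `GEO` in a SCHEMA clause and return the mode.

    Hypothetical helper: keywords are matched case-insensitively (an assumption).
    """
    if not tokens:
        return DEFAULT_MODE
    if len(tokens) != 2 or tokens[0].upper() != "OPTIMIZED_FOR":
        raise ValueError(f"unexpected GEO options: {tokens!r}")
    mode = tokens[1].upper()
    if mode not in VALID_MODES:
        raise ValueError(f"OPTIMIZED_FOR must be one of {VALID_MODES}, got {tokens[1]!r}")
    return mode


default_mode = parse_geo_options([])
explicit_mode = parse_geo_options(["OPTIMIZED_FOR", "MUTATION"])
```

Rejecting unknown values at index creation time keeps a typo from silently selecting the default algorithm.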

### FT.SEARCH

The `FT.SEARCH` command is extended to support geospatial queries:

```
FT.SEARCH <index_name> <query> [GEO_RADIUS <lat> <lon> <radius>] [GEO_BOUNDING_BOX <min_lat> <min_lon> <max_lat> <max_lon>]
```

Example (all keys within 5000 meters of latitude 40.7128, longitude -74.0060):

```
FT.SEARCH places_idx "*" GEO_RADIUS 40.7128 -74.0060 5000
```

## Implementation Details

- **Boost.Geometry Integration**: Utilizes the R-tree implementation from Boost.Geometry for efficient spatial indexing.
- **Packing Algorithms**: The `OPTIMIZED_FOR` option selects an algorithm tuned either for search performance or for data modifications.
- **Data Storage**: Geospatial data is stored in fields designated as `GEO` type, with the indexing mechanism parsing these fields to extract coordinates.
- **Performance Considerations**: The design aims to keep insertions, deletions, and queries efficient, even with large datasets.
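
The coordinate-extraction step could be sketched as below. The RFC does not define how coordinates are encoded in a field, so the `"lat,lon"` string format and the function name `parse_coordinates` are purely hypothetical. The range checks use the usual latitude and longitude bounds in degrees.

```python
def parse_coordinates(raw):
    """Parse a hypothetical "lat,lon" field value into a (lat, lon) tuple of floats.

    Raises ValueError for malformed input or out-of-range coordinates.
    """
    parts = raw.split(",")
    if len(parts) != 2:
        raise ValueError(f"expected 'lat,lon', got {raw!r}")
    lat, lon = float(parts[0]), float(parts[1])  # float() tolerates surrounding spaces
    # Chained comparisons are False for NaN, so NaN is rejected here too.
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return lat, lon


coords = parse_coordinates("40.7128, -74.0060")
```

Validating at indexing time means a malformed value can be skipped or reported once, rather than surfacing later as a confusing query result.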

## Backward Compatibility

The introduction of the `GEO` field type and associated commands is backward compatible. Existing functionality remains unchanged, and the new features are additive.

## Open Questions

- **Coordinate Reference Systems**: Should multiple coordinate reference systems be supported, or should the system standardize on WGS 84?
- **3D Geospatial Data**: Should three-dimensional data (latitude, longitude, altitude) be supported in the future?
- **Advanced Spatial Queries**: Should additional spatial predicates (e.g., intersects, contains) be supported beyond radius and bounding box searches?

Feedback and discussions are welcome to refine this proposal.
