| 1 | +--- |
| 2 | +RFC: 87 |
| 3 | +Status: Proposed |
| 4 | +--- |
| 5 | + |
| 6 | +# Title |
| 7 | + |
| 8 | +Geospatial Indexing |
| 9 | + |
| 10 | +## Abstract |
| 11 | + |
| 12 | +This proposal introduces a new field type, `GEO`, into the Valkey Search module. This enhancement enables efficient indexing and querying of geospatial data, supporting operations such as finding keys within a specified radius or bounding box. The geospatial indexing leverages Boost.Geometry's R-tree implementation to ensure high performance and scalability. |
| 13 | + |
| 14 | +## Motivation |
| 15 | + |
| 16 | +Efficient geospatial querying is essential for applications that handle location-based data, such as mapping services, ride-sharing platforms, and geographic information systems (GIS). Integrating geospatial indexing into Valkey Search will allow developers to perform complex location-based queries directly within Valkey, reducing the need for external processing and thereby improving performance and simplifying application architecture. |
| 17 | + |
| 18 | +## Terminology |
| 19 | + |
| 20 | +In the context of this RFC: |
| 21 | + |
| 22 | +- **Key**: A Valkey key, which could be the name of the key or the contents of a HASH or JSON key. |
| 23 | +- **Field**: A component of an index. Each field has a type and a path. |
| 24 | +- **Index**: A collection of fields and field-indexes. The object created by the `FT.CREATE` command. |
| 25 | +- **Field-Index**: A data structure associated with a field that accelerates the operation of search operators for this field type. |
| 26 | +- **Geospatial Data**: Data that represents the geographic location and shape of objects. |
| 27 | +- **R-tree**: A tree data structure used for spatial access methods, i.e., for indexing multi-dimensional information such as geographical coordinates. |
| 28 | + |
| 29 | +## Design |
| 30 | + |
| 31 | +### Field Type |
| 32 | + |
| 33 | +A new field type, `GEO`, is introduced to represent geospatial data. This field type indexes geographic coordinates (latitude and longitude) and supports efficient spatial queries. |
| 34 | + |
| 35 | +### Indexing |
| 36 | + |
| 37 | +The `GEO` field utilizes Boost.Geometry's R-tree implementation for spatial indexing. The R-tree is a self-balanced data structure that organizes spatial data in a way that minimizes the number of nodes traversed during searches, optimizing query performance. |
| 38 | + |
| 39 | +### Balancing Algorithms |
| 40 | + |
| 41 | +Boost.Geometry's R-tree supports several node-balancing algorithms (linear, quadratic, and R\*), each with its own trade-offs between insertion cost and query cost. To expose this choice, an optional parameter `OPTIMIZED_FOR` is proposed so that users can select the algorithm best suited to their workload: |
| 42 | + |
| 43 | +- `SEARCH`: Utilizes the **R\*-tree algorithm**, optimized for minimizing overlap between nodes and enhancing search performance. It typically offers significant speedup in query operations at the cost of slightly slower insertions and deletions. |
| 44 | +- `MUTATION`: Employs the **Quadratic split algorithm**, optimized for rapid insertions, deletions, and updates. It significantly speeds up data mutation operations but may result in slightly slower search performance due to increased node overlap. |
| 45 | + |
| 46 | +The default value for `OPTIMIZED_FOR` is `SEARCH`. |
| 47 | + |
| 48 | +By selecting different algorithms, this option provides flexibility to optimize performance based on specific application needs. |
| 49 | + |
| 50 | +### Querying |
| 51 | + |
| 52 | +The `GEO` field supports the following query operations: |
| 53 | + |
| 54 | +- **Radius Search**: Finds keys within a specified radius from a given point. |
| 55 | +- **Bounding Box Search**: Finds keys within a specified rectangular area. |
| 56 | +- **Nearest Neighbor Search**: Finds the nearest keys to a given point. |
| 57 | + |
| 58 | +These operations allow efficient retrieval of geospatial data based on proximity and spatial relationships. |
| 59 | + |
| 60 | +## Query Language Extensions |
| 61 | + |
| 62 | +To support geospatial queries, the query language is extended with new predicates: |
| 63 | + |
| 64 | +- `GEO_RADIUS(lat, lon, radius)`: Matches keys within the given radius (in meters) from the specified latitude and longitude. |
| 65 | +- `GEO_BOUNDING_BOX(min_lat, min_lon, max_lat, max_lon)`: Matches keys within the specified bounding box. |
| 66 | + |
| 67 | +These predicates can be combined with existing query constructs to perform complex geospatial queries. |
| 68 | + |
| 69 | +## Commands |
| 70 | + |
| 71 | +### FT.CREATE |
| 72 | + |
| 73 | +The `FT.CREATE` command is extended to support the `GEO` field type and the optional optimization parameter: |
| 74 | + |
| 75 | +``` |
| 76 | +FT.CREATE <index_name> ON <data_type> PREFIX <prefix_count> <prefix> SCHEMA <field_name> GEO [OPTIMIZED_FOR SEARCH|MUTATION] |
| 77 | +``` |
| 78 | + |
| 79 | +Example: |
| 80 | + |
| 81 | +``` |
| 82 | +FT.CREATE places_idx ON HASH PREFIX 1 "place:" SCHEMA location GEO OPTIMIZED_FOR SEARCH |
| 83 | +``` |
| 84 | + |
| 85 | +### FT.SEARCH |
| 86 | + |
| 87 | +The `FT.SEARCH` command is extended to support geospatial queries: |
| 88 | + |
| 89 | +``` |
| 90 | +FT.SEARCH <index_name> <query> [GEO_RADIUS <lat> <lon> <radius>] [GEO_BOUNDING_BOX <min_lat> <min_lon> <max_lat> <max_lon>] |
| 91 | +``` |
| 92 | + |
| 93 | +Example: |
| 94 | + |
| 95 | +``` |
| 96 | +FT.SEARCH places_idx "*" GEO_RADIUS 40.7128 -74.0060 5000 |
| 97 | +``` |
| 98 | + |
| 99 | +## Implementation Details |
| 100 | + |
| 101 | +- **Boost.Geometry Integration**: Utilizes the R-tree implementation from Boost.Geometry for efficient spatial indexing. |
| 102 | +- **Balancing Algorithms**: The `OPTIMIZED_FOR` parameter selects an R-tree balancing algorithm tuned either for search performance (`SEARCH`) or for data modifications (`MUTATION`). |
| 103 | +- **Data Storage**: Geospatial data is stored in fields designated as `GEO` type, with the indexing mechanism parsing these fields to extract coordinates. |
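| 104 | +- **Performance Considerations**: R-tree insertions, deletions, and queries are expected to scale roughly logarithmically with the number of indexed points on average, which should keep these operations efficient even with large datasets. |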
| 105 | + |
| 106 | +## Backward Compatibility |
| 107 | + |
| 108 | +The introduction of the `GEO` field type and the associated extensions to `FT.CREATE` and `FT.SEARCH` is backward compatible. Existing functionality remains unchanged, and the new features are purely additive. |
| 109 | + |
| 110 | +## Open Questions |
| 111 | + |
| 112 | +- **Coordinate Reference Systems**: Should multiple coordinate reference systems be supported, or should the system standardize on WGS 84? |
| 113 | +- **3D Geospatial Data**: Should three-dimensional data (latitude, longitude, altitude) be supported in the future? |
| 114 | +- **Advanced Spatial Queries**: Should additional spatial predicates (e.g., intersects, contains) be supported beyond radius and bounding box searches? |
| 115 | + |
| 116 | +Feedback and discussions are welcome to refine this proposal. |
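
To make the intended semantics of the `GEO_RADIUS` and `GEO_BOUNDING_BOX` predicates easier to discuss, here is a minimal reference sketch in Python. It is not the proposed implementation: it scans every point linearly instead of using an R-tree. It also rests on several assumptions the RFC does not state: that distances are great-circle distances on a spherical Earth (haversine formula), that both predicates are inclusive at the boundary, and that a bounding box never crosses the antimeridian. The helper names (`haversine_m`, `geo_radius`, `geo_bounding_box`) and the sample keys are hypothetical.

```python
import math

# Assumption: spherical Earth with the mean radius in meters.
# The RFC leaves the distance model (and CRS) as an open question.
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def geo_radius(points, lat, lon, radius_m):
    """Keys whose point lies within radius_m meters of (lat, lon), inclusive."""
    return [key for key, (plat, plon) in points.items()
            if haversine_m(lat, lon, plat, plon) <= radius_m]


def geo_bounding_box(points, min_lat, min_lon, max_lat, max_lon):
    """Keys whose point lies inside the box, inclusive.

    Assumes the box does not cross the antimeridian.
    """
    return [key for key, (plat, plon) in points.items()
            if min_lat <= plat <= max_lat and min_lon <= plon <= max_lon]


# Hypothetical sample data: key -> (latitude, longitude)
points = {
    "place:1": (40.7128, -74.0060),  # lower Manhattan
    "place:2": (40.7484, -73.9857),  # midtown Manhattan
    "place:3": (40.6413, -73.7781),  # JFK airport
}

nearby = geo_radius(points, 40.7128, -74.0060, 5000)
in_box = geo_bounding_box(points, 40.70, -74.02, 40.76, -73.97)
```

With this sample data, the radius query mirrors the `FT.SEARCH places_idx "*" GEO_RADIUS 40.7128 -74.0060 5000` example from the RFC. An R-tree based implementation would be expected to return the same key sets while visiting far fewer candidates, typically by first filtering on a bounding rectangle around the query circle and then applying the exact distance check.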