Commit bc1989f

publish a new blog: RTREE (#503)
1 parent 551f9b9 commit bc1989f

1 file changed

+213
-0
lines changed

@@ -0,0 +1,213 @@
1+
---
2+
id: unlock-geo-vector-search-with-geometry-fields-and-rtree-index-in-milvus.md
3+
title: >
4+
Spatial Meets Semantic: Unlock Geo-Vector Search with Geometry Fields and RTREE Index in Milvus
5+
author: Cai Zhang
6+
date: 2025-12-08
7+
cover: assets.zilliz.com/rtree_cover_53c424f967.png
8+
tag: Engineering
9+
recommend: false
10+
publishToMedium: true
11+
tags: Milvus, vector database
12+
meta_keywords: Milvus 2.6, Geometry field, RTREE index, Geo-Vector Search
13+
meta_title: >
14+
Milvus Geometry Field and RTREE Index for Geo-Vector Search
15+
desc: Learn how Milvus 2.6 unifies vector search with geospatial indexing using Geometry fields and the RTREE index, enabling accurate, location-aware AI retrieval.
16+
origin: https://milvus.io/blog/unlock-geo-vector-search-with-geometry-fields-and-rtree-index-in-milvus.md
17+
---
18+
19+
As modern systems grow more intelligent, geolocation data has become essential to applications such as AI-driven recommendations, smart dispatching, and autonomous driving.
20+
21+
For example, when you order food on platforms like DoorDash or Uber Eats, the system considers much more than the distance between you and the restaurant. It also weighs restaurant ratings, courier locations, traffic conditions, and even your personal preference embeddings. In autonomous driving, vehicles must perform path planning, obstacle detection, and scene-level semantic understanding, often within just a few milliseconds.
22+
23+
All of this depends on the ability to efficiently index and retrieve geospatial data.
24+
25+
Traditionally, geographic data and vector data lived in two separate systems:
26+
27+
- Geospatial systems store coordinates and spatial relationships (latitude, longitude, polygon regions, etc.).
28+
29+
- Vector databases store the semantic embeddings generated by AI models and run similarity search over them.
30+
31+
This separation complicates architecture, slows queries, and makes it difficult for applications to perform spatial and semantic reasoning at the same time.
32+
33+
[Milvus 2.6](https://milvus.io/docs/release_notes.md#v264) addresses this problem by introducing the [Geometry Field](https://milvus.io/docs/geometry-field.md), which allows vector similarity search to be combined directly with spatial constraints. This enables use cases such as:
34+
35+
- Location-Based Services (LBS): “find similar POIs within this city block”
36+
37+
- Multi‑modal search: “retrieve similar photos within 1km of this point”
38+
39+
- Maps & logistics: “assets inside a region” or “routes intersecting a path”
40+
41+
Paired with the new [RTREE index](https://milvus.io/docs/rtree.md)—a tree-based structure optimized for spatial filtering—Milvus now supports efficient geospatial operators like `st_contains`, `st_within`, and `st_dwithin` alongside high-dimensional vector search. Together, they make spatially aware intelligent retrieval not just possible, but practical.
42+
43+
In this post, we’ll walk through how the Geometry Field and RTREE index work, and how they combine with vector similarity search to enable real-world, spatial-semantic applications.
44+
45+
46+
## What Is a Geometry Field?
47+
48+
A **Geometry field** is a schema-defined data type (`DataType.GEOMETRY`) in Milvus used to store geometric data. Unlike systems that handle only raw coordinates, Milvus supports a range of spatial structures—including **Point**, **LineString**, and **Polygon**.
49+
50+
This makes it possible to represent real-world concepts such as restaurant locations (Point), delivery zones (Polygon), or autonomous-vehicle trajectories (LineString), all within the same database that stores semantic vectors. In other words, Milvus becomes a unified system for both _where_ something is and _what it means_.
51+
52+
Geometry values are stored using the [Well-Known Text (WKT)](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry) format, a human-readable standard for inserting and querying geometric data. This simplifies data ingestion and querying because WKT strings can be inserted directly into a Milvus record. For example:
53+
54+
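```python
# Placeholder embedding; in practice this comes from your embedding model.
vector = [0.1, 0.2, 0.3, 0.4]

data = [
    {
        "id": 1,
        # WKT point written as (x y), i.e. (longitude latitude)
        "geo": "POINT(116.4074 39.9042)",
        "vector": vector,
    }
]
```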
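
Because WKT is plain text, geometry values for a record can be assembled with ordinary string formatting. The helpers below are a minimal sketch of that idea. The function names `point_wkt`, `linestring_wkt`, and `polygon_wkt` are hypothetical and are not part of any Milvus SDK. The sketch assumes coordinates are given as (x, y) pairs, which for geographic data means longitude first, then latitude.

```python
def _coords(points):
    """Format (x, y) pairs as 'x y, x y, ...'."""
    return ", ".join(f"{x} {y}" for x, y in points)


def point_wkt(lon, lat):
    """WKT for a single location, e.g. a restaurant."""
    return f"POINT({lon} {lat})"


def linestring_wkt(points):
    """WKT for a path, e.g. a vehicle trajectory."""
    return f"LINESTRING({_coords(points)})"


def polygon_wkt(ring):
    """WKT for a region, e.g. a delivery zone.

    WKT polygons need a closed ring, so the first vertex is
    repeated at the end when the caller has not done so.
    """
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    return f"POLYGON(({_coords(ring)}))"


record_geo = point_wkt(116.4074, 39.9042)
zone_geo = polygon_wkt([(0, 0), (1, 0), (1, 1)])
```

Closing the ring inside `polygon_wkt` is a convenience choice for this sketch; a stricter helper could raise an error on an open ring instead.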
63+
64+
## What Is the RTREE Index and How Does It Work?
65+
66+
With the Geometry data type in place, Milvus also needs an efficient way to filter spatial objects. It does this with a two-stage spatial filtering pipeline:
67+
68+
- **Coarse filtering:** Quickly narrows down candidates using spatial indexes such as RTREE.
69+
70+
- **Fine filtering:** Applies exact geometry checks on the candidates that remain, ensuring correctness at boundaries.
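
The difference between the two stages can be illustrated with a circular query region. The sketch below is a simplified illustration, not Milvus's implementation: it assumes planar coordinates, uses a plain list scan for the coarse stage where Milvus would use the RTREE index, and the names `circle_mbr`, `in_box`, and `two_stage_filter` are hypothetical.

```python
import math


def circle_mbr(cx, cy, r):
    """Bounding box (min_x, min_y, max_x, max_y) of a circle."""
    return (cx - r, cy - r, cx + r, cy + r)


def in_box(box, x, y):
    min_x, min_y, max_x, max_y = box
    return min_x <= x <= max_x and min_y <= y <= max_y


def two_stage_filter(points, cx, cy, r):
    box = circle_mbr(cx, cy, r)
    # Coarse: a cheap rectangle test. It can admit false positives
    # near the corners of the box, which lie outside the circle.
    candidates = [p for p in points if in_box(box, *p)]
    # Fine: the exact distance test, run on the survivors only.
    exact = [p for p in candidates
             if math.hypot(p[0] - cx, p[1] - cy) <= r]
    return candidates, exact


points = [(0.5, 0.5), (0.9, 0.9), (3.0, 3.0)]
candidates, exact = two_stage_filter(points, 0.0, 0.0, 1.0)
```

Here `(0.9, 0.9)` passes the coarse stage because it lies inside the bounding box, but the fine stage rejects it because it is farther than the radius from the center.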
71+
72+
At the core of this process is **RTREE (Rectangle Tree)**, a spatial indexing structure designed for multidimensional geometric data. RTREE accelerates spatial queries by organizing geometric objects hierarchically.
73+
74+
**Phase 1: Build the index**
75+
76+
**1. Create leaf nodes:** For each geometry object, calculate its **Minimum Bounding Rectangle (MBR)**—the smallest rectangle that fully contains the object—and store it as a leaf node.
77+
78+
**2. Group into larger boxes:** Cluster nearby leaf nodes and wrap each group inside a new MBR, producing internal nodes.
79+
80+
**3. Add the root node:** Create a root node whose MBR covers all internal groups, forming a height-balanced tree structure.
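
The three build steps above can be sketched in a few lines. This is a deliberately naive illustration under stated assumptions: geometries are given as lists of (x, y) vertices, nearby leaves are grouped by sorting on the MBR's center x-coordinate, and the tree always has exactly three levels. Production R-trees, including the Boost.Geometry one, use more sophisticated insertion and packing strategies. The names `mbr`, `union`, and `build_tree` are hypothetical.

```python
def mbr(points):
    """Minimum bounding rectangle as (min_x, min_y, max_x, max_y)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def union(boxes):
    """Smallest rectangle that covers every box in the list."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def build_tree(geometries, fanout=2):
    # Step 1: one leaf per geometry, keyed by its MBR.
    leaves = [{"mbr": mbr(g), "geometry": g} for g in geometries]
    # Step 2: put spatial neighbours next to each other (here: sort by
    # the MBR's center x) and wrap each group in a parent MBR.
    leaves.sort(key=lambda n: (n["mbr"][0] + n["mbr"][2]) / 2)
    groups = [leaves[i:i + fanout] for i in range(0, len(leaves), fanout)]
    internal = [{"mbr": union([n["mbr"] for n in g]), "children": g}
                for g in groups]
    # Step 3: a root whose MBR covers all internal nodes.
    return {"mbr": union([n["mbr"] for n in internal]),
            "children": internal}


geometries = [
    [(0, 0), (1, 1)],      # a short LineString
    [(2, 0), (3, 2)],
    [(10, 10), (12, 11)],
    [(14, 13)],            # a Point
]
tree = build_tree(geometries)
```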
81+
82+
![](https://assets.zilliz.com/RTREE_Index_11b5d09e07.png)
83+
84+
**Phase 2: Accelerate queries**
85+
86+
**1. Form the query MBR:** Calculate the MBR for the geometry used in your query.
87+
88+
**2. Prune branches:** Starting from the root, compare the query MBR with the MBR of each internal node. Skip any branch whose MBR does not intersect the query MBR.
89+
90+
**3. Collect candidates:** Descend into intersecting branches and gather the candidate leaf nodes.
91+
92+
**4. Perform exact matching:** For each candidate, run the spatial predicate to get precise results.
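
The query steps above can be sketched as a recursive search over a small hand-built tree. The tree layout, the node names, and the `search` function are all hypothetical and exist only for illustration. The query is a circle in planar coordinates, so the exact predicate is a distance check.

```python
import math


def intersects(a, b):
    """True if boxes (min_x, min_y, max_x, max_y) share any point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(node, query_box, exact_match, visited):
    if not intersects(node["mbr"], query_box):
        return []  # Step 2: prune, nothing below this node can match.
    visited.append(node["name"])
    if "children" not in node:
        # Step 4: run the exact predicate on the candidate leaf.
        return [node["name"]] if exact_match(node["geometry"]) else []
    hits = []
    for child in node["children"]:  # Step 3: descend and collect.
        hits.extend(search(child, query_box, exact_match, visited))
    return hits


def leaf(name, x, y):
    return {"name": name, "mbr": (x, y, x, y), "geometry": (x, y)}


tree = {"name": "root", "mbr": (0, 0, 14, 13), "children": [
    {"name": "west", "mbr": (0, 0, 3, 2),
     "children": [leaf("a", 0, 0), leaf("b", 3, 2), leaf("e", 2, 2)]},
    {"name": "east", "mbr": (10, 10, 14, 13),
     "children": [leaf("c", 10, 10), leaf("d", 14, 13)]},
]}

center, radius = (2.0, 1.0), 1.2
# Step 1: the MBR of the circular query region.
query_box = (center[0] - radius, center[1] - radius,
             center[0] + radius, center[1] + radius)
visited = []
hits = search(
    tree, query_box,
    lambda p: math.hypot(p[0] - center[0], p[1] - center[1]) <= radius,
    visited,
)
```

The whole `east` branch is skipped without inspecting its leaves. Leaf `b` survives the MBR check but fails the exact distance test, which is why the final matching step is still needed.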
93+
94+
95+
### Why RTREE Is Fast
96+
97+
RTREE delivers strong performance in spatial filtering because of several key design features:
98+
99+
- **Every node stores an MBR:** Each node approximates the area of all geometries in its subtree. This makes it easy to decide whether a branch should be explored during a query.
100+
101+
- **Fast pruning:** Only subtrees whose MBR intersects the query region are explored. Irrelevant areas are ignored entirely.
102+
103+
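- **Scales with data size:** When bounding rectangles overlap little, RTREE typically answers spatial searches in about **O(log N)** time, so queries stay fast as the dataset grows. Heavy overlap between MBRs forces more branches to be visited and weakens this bound.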
104+
105+
- **Boost.Geometry implementation:** Milvus builds its RTREE index using [Boost.Geometry](https://www.boost.org/library/latest/geometry/), a widely used C++ library that provides optimized geometry algorithms and a thread-safe RTREE implementation suitable for concurrent workloads.
106+
107+
108+
### Supported geometry operators
109+
110+
Milvus provides a set of spatial operators that allow you to filter and retrieve entities based on geometric relationships. These operators are essential for workloads that need to understand how objects relate to one another in space.
111+
112+
The following table lists the [geometry operators](https://milvus.io/docs/geometry-operators.md) currently available in Milvus.
113+
114+
115+
116+
|       **Operator**       |                                                 **Description**                                                  |
| :----------------------: | :--------------------------------------------------------------------------------------------------------------: |
| **st_intersects(A, B)**  | Returns TRUE if geometries A and B share at least one common point.                                              |
| **st_contains(A, B)**    | Returns TRUE if geometry A completely contains geometry B (excluding the boundary).                              |
| **st_within(A, B)**      | Returns TRUE if geometry A is completely contained within geometry B. Equivalent to st_contains(B, A).          |
| **st_covers(A, B)**      | Returns TRUE if geometry A covers geometry B (including the boundary).                                           |
| **st_touches(A, B)**     | Returns TRUE if geometries A and B touch at their boundaries but do not intersect internally.                    |
| **st_equals(A, B)**      | Returns TRUE if geometries A and B are spatially identical.                                                      |
| **st_overlaps(A, B)**    | Returns TRUE if geometries A and B partially overlap and neither fully contains the other.                       |
| **st_dwithin(A, B, d)**  | Returns TRUE if the distance between A and B is less than _d_.                                                   |
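
The boundary handling in the table is easiest to see on a small case. The sketch below is not Milvus code. It mimics the described semantics for one narrow situation, an axis-aligned rectangle A and a point B in planar coordinates, and the function names are chosen here only to mirror the operators.

```python
def covers(rect, pt):
    """Point is inside the rectangle or on its boundary."""
    min_x, min_y, max_x, max_y = rect
    x, y = pt
    return min_x <= x <= max_x and min_y <= y <= max_y


def contains(rect, pt):
    """Point is strictly inside the rectangle (boundary excluded)."""
    min_x, min_y, max_x, max_y = rect
    x, y = pt
    return min_x < x < max_x and min_y < y < max_y


def touches(rect, pt):
    """Point lies on the boundary but not in the interior."""
    return covers(rect, pt) and not contains(rect, pt)


def intersects(rect, pt):
    """Point and rectangle share at least one point."""
    return covers(rect, pt)


zone = (0, 0, 4, 4)
inside, on_edge, outside = (2, 2), (4, 2), (5, 5)
```

For the point on the edge, `contains` is false while `covers`, `touches`, and `intersects` are all true, which is the distinction the boundary notes in the table describe.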
126+
127+
128+
### How to Combine Geolocation Index and Vector Index
129+
130+
With Geometry support and the RTREE index, Milvus can combine geospatial filtering with vector similarity search in a single workflow. The process works in two steps:
131+
132+
**1. Filter by location using RTREE:** Milvus first uses the RTREE index to narrow the search to entities within the specified geographic range (e.g., “within 2 km”).
133+
134+
**2. Rank by semantics using vector search:** From the remaining candidates, the vector index selects the Top-N most similar results based on embedding similarity.
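
The two steps can be mimicked with a brute-force sketch in plain Python. This is only an illustration of the filter-then-rank logic, not how Milvus executes the query: Milvus uses the RTREE index and a vector index where this sketch uses list scans. The entity layout, the haversine distance, cosine similarity as the metric, and the name `geo_vector_search` are all assumptions made for the example.

```python
import math


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    earth_radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def geo_vector_search(entities, center, radius_m, query_vec, top_n):
    # Step 1: keep only entities inside the geographic range.
    nearby = [e for e in entities
              if haversine_m(*center, *e["lonlat"]) < radius_m]
    # Step 2: rank the survivors by embedding similarity.
    nearby.sort(key=lambda e: cosine(query_vec, e["vector"]), reverse=True)
    return [e["id"] for e in nearby[:top_n]]


entities = [
    {"id": 1, "lonlat": (116.4080, 39.9050), "vector": [1.0, 0.0]},
    {"id": 2, "lonlat": (116.4174, 39.9042), "vector": [0.2, 0.9]},
    # Most similar vector, but far outside the search radius.
    {"id": 3, "lonlat": (121.4737, 31.2304), "vector": [0.0, 1.0]},
]
result = geo_vector_search(
    entities, center=(116.4074, 39.9042), radius_m=2000,
    query_vec=[0.0, 1.0], top_n=2,
)
```

Entity 3 has the closest embedding but never reaches the ranking step, because the spatial filter removes it first.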
135+
136+
![](https://assets.zilliz.com/Geometry_R_Tree_f1d88fc252.png)
137+
138+
## Real-World Applications of Geo-Vector Retrieval
139+
140+
### 1. Delivery Services: Smarter, Location-Aware Recommendations
141+
142+
Platforms such as DoorDash or Uber Eats handle hundreds of millions of requests each day. The moment a user opens the app, the system must determine—based on the user’s location, time of day, taste preferences, estimated delivery times, real-time traffic, and courier availability—which restaurants or couriers are the best match _right now_.
143+
144+
Traditionally, this requires querying a geospatial database and a separate recommendation engine, followed by multiple rounds of filtering and re-ranking. With the Geometry field and the RTREE index, Milvus greatly simplifies this workflow:
145+
146+
- **Unified storage** — Restaurant coordinates, courier locations, and user preference embeddings all live in one system.
147+
148+
- **Joint retrieval** — First apply a spatial filter (e.g., _restaurants within 3 km_), then use vector search to rank by similarity, taste preference, or quality.
149+
150+
- **Dynamic decision-making** — Combine real-time courier distribution and traffic signals to quickly assign the nearest, most suitable courier.
151+
152+
This unified approach allows the platform to perform spatial and semantic reasoning in a single query. For example, when a user searches “curry rice,” Milvus retrieves restaurants that are semantically relevant _and_ prioritizes those that are nearby, deliver quickly, and match the user’s historical taste profile.
153+
154+
155+
### 2. Autonomous Driving: More Intelligent Decisions
156+
157+
In autonomous driving, geospatial indexing is fundamental to perception, localization, and decision-making. Vehicles must continuously align themselves to high-definition maps, detect obstacles, and plan safe trajectories—all within just a few milliseconds.
158+
159+
With Milvus, the Geometry type and RTREE index can store and query rich spatial structures such as:
160+
161+
- **Road boundaries** (LineString)
162+
163+
- **Traffic regulation zones** (Polygon)
164+
165+
- **Detected obstacles** (Point)
166+
167+
These structures can be indexed efficiently, allowing geospatial data to take part directly in the AI decision loop. For example, an autonomous vehicle can quickly determine whether its current coordinates fall within a specific lane or intersect with a restricted area, simply through an RTREE spatial predicate.
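
A containment check of this kind ultimately reduces to an exact point-in-polygon test on the candidates that survive the index lookup. The sketch below shows the classic ray-casting test as one way to do that. It is an illustration under stated assumptions, not the algorithm Milvus uses internally: coordinates are planar, the polygon is a simple ring without holes, and points lying exactly on the boundary are not treated specially.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: count how often a ray going right from (x, y)
    crosses the polygon's edges. An odd count means 'inside'."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the ray's height can cross it.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical restricted zone, given as its corner vertices.
restricted_zone = [(0, 0), (4, 0), (4, 3), (0, 3)]
vehicle_inside = point_in_polygon(2, 1, restricted_zone)
vehicle_outside = point_in_polygon(5, 1, restricted_zone)
```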
168+
169+
When combined with vector embeddings generated by the perception system—such as scene embeddings that capture the current driving environment—Milvus can support more advanced queries, like retrieving historical driving scenarios similar to the current one within a 50-meter radius. This helps models interpret the environment faster and make better decisions.
170+
171+
172+
## Conclusion
173+
174+
Geolocation is more than latitude and longitude—it is a valuable source of semantic information that tells us where things happen, how they relate to their surroundings, and what context they belong to.
175+
176+
In Zilliz’s next-generation database, vector data and geospatial information are gradually coming together as a unified foundation. This enables:
177+
178+
- Joint retrieval across vectors, geospatial data, and time
179+
180+
- Spatially aware recommendation systems
181+
182+
- Multimodal search for location-based services (LBS)
183+
184+
In the future, AI will not only understand _what_ content means, but also where it applies and when it matters most.
185+
186+
For more information about the Geometry Field and the RTREE index, check the documentation below:
187+
188+
- [Geometry Field | Milvus Documentation](https://milvus.io/docs/geometry-field.md)
189+
190+
- [RTREE | Milvus Documentation](https://milvus.io/docs/rtree.md)
191+
192+
Have questions or want a deep dive on any feature of the latest Milvus? Join our [Discord channel](https://discord.com/invite/8uyFbECzPX) or file issues on [GitHub](https://github.com/milvus-io/milvus). You can also book a 20-minute one-on-one session to get insights, guidance, and answers to your questions through [Milvus Office Hours](https://milvus.io/blog/join-milvus-office-hours-to-get-support-from-vectordb-experts.md).
193+
194+
195+
## Learn More about Milvus 2.6 Features
196+
197+
- [Introducing Milvus 2.6: Affordable Vector Search at Billion Scale](https://milvus.io/blog/introduce-milvus-2-6-built-for-scale-designed-to-reduce-costs.md)
198+
199+
- [Introducing the Embedding Function: How Milvus 2.6 Streamlines Vectorization and Semantic Search](https://milvus.io/blog/data-in-and-data-out-in-milvus-2-6.md)
200+
201+
- [JSON Shredding in Milvus: 88.9x Faster JSON Filtering with Flexibility](https://milvus.io/blog/json-shredding-in-milvus-faster-json-filtering-with-flexibility.md)
202+
203+
- [Unlocking True Entity-Level Retrieval: New Array-of-Structs and MAX_SIM Capabilities in Milvus](https://milvus.io/blog/unlocking-true-entity-level-retrieval-new-array-of-structs-and-max-sim-capabilities-in-milvus.md)
204+
205+
- [MinHash LSH in Milvus: The Secret Weapon for Fighting Duplicates in LLM Training Data](https://milvus.io/blog/minhash-lsh-in-milvus-the-secret-weapon-for-fighting-duplicates-in-llm-training-data.md)
206+
207+
- [Bring Vector Compression to the Extreme: How Milvus Serves 3× More Queries with RaBitQ](https://milvus.io/blog/bring-vector-compression-to-the-extreme-how-milvus-serves-3%C3%97-more-queries-with-rabitq.md)
208+
209+
- [Benchmarks Lie — Vector DBs Deserve a Real Test ](https://milvus.io/blog/benchmarks-lie-vector-dbs-deserve-a-real-test.md)
210+
211+
- [We Replaced Kafka/Pulsar with a Woodpecker for Milvus](https://milvus.io/blog/we-replaced-kafka-pulsar-with-a-woodpecker-for-milvus.md)
212+
213+
- [Vector Search in the Real World: How to Filter Efficiently Without Killing Recall](https://milvus.io/blog/how-to-filter-efficiently-without-killing-recall.md)
