Commit 56ae185

committed
Covering index RFC mango index selection notes
1 parent 240b912 commit 56ae185

1 file changed

+66
-10
lines changed

src/docs/rfcs/018-mango-covering-json-index.md

+66-10
@@ -87,20 +87,75 @@ This would take place within `mango_view_cursor.erl`. The key functions
8787
involved are the shard-level `view_cb/2`, the streaming result handler at the
8888
coordinator end (`handle_message/2`) and the `execute/3` function.
8989

90+
## Mango JSON index selection
91+
92+
A Mango JSON index is implemented as a view with a complex key. The first field
93+
in the index is the first entry in the complex key, the second field is the
94+
second entry and so on. Even indexes with one field use a complex key with length
95+
`1`.
96+
97+
When choosing a JSON index to use for a query, there are a couple of things that
98+
are important to covering indexes.
99+
100+
Firstly, note there are certain predicate operators that can be used with an
101+
index, currently: `$lt`, `$lte`, `$eq`, `$gte` and `$gt`. These can easily be
102+
converted to key operations within a key ordered index. For an index to be
103+
chosen for a query, the first key within the index's complex key MUST be used
104+
with a predicate operator that can be converted into an operation on the index.
105+
106+
Secondly, a quirk of Mango indexes is that for a document to be included in an
107+
index it must contain all of the index's indexed fields. Documents without all
108+
the fields will not be included. This means that when we are choosing an index
109+
for a query, we must further choose an index where the predicates within the
110+
`selector` imply `$exists=true` for all fields in the index's key. Without that,
111+
we will have incomplete results.
112+
113+
Why is this? Let's look at an index with these fields:
114+
115+
```json
116+
["age", "name"]
117+
```
118+
119+
Now we index two documents. The first document is included in the index while the second is not (because it doesn't include `name`):
120+
121+
122+
```json
123+
{"_id": "foo", "age": 39, "name": "mike"}
124+
125+
{"_id": "bar", "age": 39, "pet": "cat"}
126+
```
127+
128+
The `selector` `{"age": {"$gt": 30}}` should return both documents. However, if
129+
we use the index above, we'd miss out `bar` because it's not in the index.
130+
Therefore we can't use the index.
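
The example above can be replayed with a small simulation. This is a minimal sketch in Python, not Mango's Erlang implementation: `index_rows` is a hypothetical helper that mimics only the rule that a document needs every indexed field to get a row, and it uses Python's list ordering rather than CouchDB's view collation.

```python
def index_rows(docs, index_fields):
    """Build (complex_key, doc_id) rows, skipping documents that lack any indexed field."""
    rows = []
    for doc in docs:
        if all(field in doc for field in index_fields):
            key = [doc[field] for field in index_fields]
            rows.append((key, doc["_id"]))
    return sorted(rows)


docs = [
    {"_id": "foo", "age": 39, "name": "mike"},
    {"_id": "bar", "age": 39, "pet": "cat"},
]

rows = index_rows(docs, ["age", "name"])

# Answering {"age": {"$gt": 30}} from the index only sees indexed documents.
from_index = [doc_id for key, doc_id in rows if key[0] > 30]

# Answering the same selector by scanning every document finds both.
full_scan = [d["_id"] for d in docs if "age" in d and d["age"] > 30]

print(from_index)  # ['foo']
print(full_scan)   # ['foo', 'bar']
```

The index-backed result is missing `bar`, which is the incomplete result set the text warns about.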
131+
132+
On the other hand, the `selector` `{"age": {"$gt": 30}, "name":
133+
{"$exists": true}}` requires that the `name` field exist so the index can be used
134+
because the query predicates can only match documents containing both `age` and
135+
`name`, just like the index. In both cases, note the predicate `"age": {"$gt":
136+
30}` implies `"age": {"$exists": true}`.
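
The two selection rules in this section can be sketched as a usability check. This is an illustrative Python sketch under simplifying assumptions, not the logic in Mango itself: it handles only flat selectors whose values are either a literal (shorthand for `$eq`) or an object of operators, and the names `RANGE_OPERATORS`, `uses_range_operator`, `implies_exists` and `index_is_usable` are hypothetical.

```python
# Operators the text lists as convertible to key operations on the index.
RANGE_OPERATORS = {"$lt", "$lte", "$eq", "$gte", "$gt"}


def uses_range_operator(predicate):
    """True if the predicate can be turned into an operation on the index key."""
    if not isinstance(predicate, dict):
        return True  # {"age": 39} is shorthand for {"age": {"$eq": 39}}
    return any(op in RANGE_OPERATORS for op in predicate)


def implies_exists(predicate):
    """True if the predicate can only match documents that contain the field."""
    if isinstance(predicate, dict) and predicate.get("$exists") is True:
        return True
    return uses_range_operator(predicate)


def index_is_usable(index_fields, selector):
    """Apply both rules: first key is range-queried, every key is implied to exist."""
    if not index_fields:
        return False
    first = index_fields[0]
    if first not in selector or not uses_range_operator(selector[first]):
        return False
    return all(
        field in selector and implies_exists(selector[field])
        for field in index_fields
    )


index = ["age", "name"]
print(index_is_usable(index, {"age": {"$gt": 30}}))  # False
print(index_is_usable(index, {"age": {"$gt": 30}, "name": {"$exists": True}}))  # True
```

The first selector is rejected because nothing in it implies that `name` exists; the second is accepted because every indexed field is implied to exist and `age` carries a range operator.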
137+
90138
## Phase 1: handle keys only covering indexes
91139

92140
Within `execute/3` we will need to decide whether the view should be requested
93141
to include documents. If the index is covering, this will not be required and
94142
so the `include_docs` argument to the view fabric call will be `false`. We'll
95143
need to add a helper function to return whether the index is covering.
96144

97-
When selecting an index, we'll need to be careful of some subtleties. We will
98-
need to ensure that only fields in the `selector` and not `fields` are used when
99-
choosing an index. This is because we require all keys in the index to be fields
100-
within the selector -- with predicates implying `$exists=true` -- due to the
101-
fact that only documents that include _all_ fields in the index are added to the
102-
index. Therefore, if the selector doesn't imply all fields in the index's keys
103-
exist, then using that index risks returning an incomplete result set.
145+
When selecting an index, we'll need to ensure that only fields in the `selector`
146+
and not `fields` are used when choosing an index. This is because the `selector`
147+
must imply that every field in the index's key exists, per [Mango JSON index
148+
selection](#mango-json-index-selection). In contrast, `fields` is only applied
149+
after we generate the result set, and none of the field names in `fields` need
150+
to exist in result documents.
151+
152+
As an example, an index `["age", "name"]` would still require the `selector` to
153+
imply `$exists=true` for both `age` and `name` even if the `fields` were just
154+
`["age"]` in order that correct results be returned.
155+
156+
Of note, this means that if an index is unusable pre-covering-index support, it
157+
will continue to be unusable after this implementation: whether an index covers
158+
a query is only used to prefer one already usable index over another.
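
The "prefer, never enable" rule can be sketched as a ranking step over indexes that have already passed the usability check. This Python sketch is illustrative only. It assumes that a keys-only index covers a query when every name in `fields` and every top-level field in the `selector` is one of the index's keys, and that a query without `fields` needs whole documents. The names `is_covering` and `choose_index` are hypothetical.

```python
def is_covering(index_fields, selector, fields):
    """Assumed Phase 1 rule: all projected and selected fields are index keys."""
    if not fields:
        return False  # no projection, so whole documents must be fetched
    needed = set(fields) | set(selector)
    return needed <= set(index_fields)


def choose_index(usable_indexes, selector, fields):
    """Pick from already-usable indexes, preferring one that covers the query."""
    # sorted() is stable, so the original preference order is kept within each group.
    ranked = sorted(
        usable_indexes,
        key=lambda index_fields: not is_covering(index_fields, selector, fields),
    )
    return ranked[0] if ranked else None


selector = {"age": {"$gt": 30}, "name": {"$exists": True}}
usable = [["age"], ["age", "name"]]

print(choose_index(usable, selector, ["name"]))  # ['age', 'name']
print(choose_index(usable, selector, None))      # ['age']
```

Because `choose_index` only reorders its input, an index that was filtered out as unusable can never be selected, whether or not it would cover the query.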
104159

105160
Within `view_cb/2`, we'll need to know whether an index is covering. Without
106161
that, `view_cb/2` will interpret the lack of included documents as an indicator
@@ -160,9 +215,10 @@ We'll then need to update the Mango cursor methods mentioned above to take
160215
account of the values within the covering index code.
161216

162217
One thing to be careful about is again index selection. We will still need all
163-
index keys to be present in the selector as above so need differentiate between
164-
the fields in index's keys and values when selecting an index to ensure we
165-
retain the correct behaviour.
218+
index keys to be present in the `selector` as above so need to differentiate
219+
between the fields in the index's keys and values when selecting an index to ensure
220+
we retain the correct behaviour per [Mango JSON index
221+
selection](#mango-json-index-selection).
166222

167223
## Mixed versions during cluster upgrades
168224
