-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(mango): keys-only covering indexes #4512
Conversation
e3d8c48
to
3e55cc4
Compare
@pgj those performance numbers look very promising! I think this looks mostly good from an implementation perspective, though I'll leave others to comment on the Erlang style. One thing that's missing though is handling
|
@willholley you are right, the statistics counters are not yet updated. The reason for that is twofold: @mikerhodes has started to work on this earlier independently and I was not sure of his status until recently, then I wanted to have this piece out first and work on the counters in a separate pull request. They have their own trouble with mixed-version clusters and I did not want to complicate this change further with those parts. There are further pull requests planned along the lines of the corresponding RFC anyway. |
@@ -12,6 +12,8 @@ | |||
|
|||
-define(MANGO_ERROR(R), throw({mango_error, ?MODULE, R})). | |||
|
|||
-type maybe(A) :: A | undefined. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is a bit as A can be undefined also. I'd would expect Dialyzer should emit some kind of warning here but perhaps our Dialyzer options are not configured properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not see warnings from Dialyzer with regard to this definition. Though I have run it only with the standard options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you run Dialyzer? Every time I run it either shows no output or crashes in a warning in config or a different application.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initialized the ~/.dialyzer_plt
file by running the standard command:
$ dialyzer --build_plt --apps erts kernel stdlib mnesia
Creating PLT /Users/gaborpali/.dialyzer_plt ...
Unknown functions:
compile:file/2 (c.erl:385:10)
compile:forms/2 (escript.erl:204:10)
compile:noenv_forms/2 (erl_abstract_code.erl:10:9)
compile:noenv_forms/2 (qlc_pt.erl:444:14)
compile:output_generated/1 (c.erl:444:10)
crypto:crypto_one_time/5 (beam_lib.erl:987:11)
crypto:start/0 (beam_lib.erl:1023:10)
crypto:strong_rand_bytes/1 (net_kernel.erl:1918:37)
Unknown types:
compile:option/0 (c.erl:92:19)
compile:option/0 (erl_expand_records.erl:41:26)
compile:option/0 (erl_lint.erl:50:47)
compile:option/0 (qlc.erl:541:32)
compile:option/0 (qlc_pt.erl:73:32)
done in 0m21.58s
done (passed successfully)
Then I was able to run Dialyzer only on the Mango portion of the code base:
$ dialyzer src/mango/ebin
Checking whether the PLT /Users/gaborpali/.dialyzer_plt is up-to-date... yes
Proceeding with analysis...
mango_epi.erl:15:2: Callback info about the couch_epi_plugin behaviour is not available
Unknown functions:
chttpd:chunked_response_buffer_size/0 (src/mango_httpd.erl:295:21)
chttpd:close_delayed_json_object/4 (src/mango_httpd.erl:276:19)
chttpd:end_delayed_json_response/1 (src/mango_httpd.erl:288:5)
chttpd:error_info/1 (src/mango_crud.erl:143:33)
chttpd:json_body_obj/1 (src/mango_httpd.erl:127:50)
chttpd:qs/1 (src/mango_httpd.erl:59:9)
chttpd:qs_value/3 (src/mango_httpd.erl:241:16)
chttpd:send_delayed_chunk/2 (src/mango_httpd.erl:287:19)
chttpd:send_error/2 (src/mango_httpd.erl:41:21)
chttpd:send_error/4 (src/mango_httpd.erl:43:21)
chttpd:send_json/2 (src/mango_httpd.erl:185:13)
chttpd:send_method_not_allowed/2 (src/mango_httpd.erl:192:5)
chttpd:start_delayed_json_response/4 (src/mango_httpd.erl:272:5)
chttpd:validate_ctype/2 (src/mango_httpd.erl:126:5)
config:get/3 (src/mango_idx_text.erl:399:5)
config:get_integer/3 (src/mango_cursor.erl:215:17)
config:get_integer/3 (src/mango_cursor_text.erl:310:5)
config:get_integer/3 (src/mango_opts.erl:360:5)
couch_db:get_user_ctx/1 (src/mango_crud.erl:114:25)
couch_db:get_user_ctx/1 (src/mango_idx_text.erl:384:30)
couch_db:name/1 (src/mango_cursor_text.erl:52:28)
couch_db:name/1 (src/mango_idx.erl:277:5)
couch_db:name/1 (src/mango_idx_text.erl:383:14)
couch_db:set_user_ctx/2 (src/mango_httpd.erl:220:19)
couch_doc:from_json_obj/1 (src/mango_crud.erl:63:27)
couch_doc:from_json_obj/1 (src/mango_doc.erl:46:11)
couch_doc:rev_to_str/1 (src/mango_crud.erl:137:15)
couch_doc:to_json_obj/2 (src/mango_cursor_view.erl:627:27)
couch_ejson_compare:less/2 (src/mango_json.erl:47:5)
couch_epi:register_service/2 (src/mango_sup.erl:23:33)
couch_log:error/2 (src/mango_cursor_view.erl:457:13)
couch_log:error/2 (src/mango_idx_text.erl:219:13)
couch_log:error/2 (src/mango_idx_view.erl:245:13)
couch_log:error/2 (src/mango_native_proc.erl:71:17)
couch_log:error/2 (src/mango_util.erl:141:13)
couch_log:warning/2 (src/mango_idx_text.erl:390:13)
couch_mrview_util:get_extra/3 (src/mango_cursor_view.erl:136:29)
couch_mrview_util:set_extra/3 (src/mango_cursor_view.erl:571:15)
couch_partition:validate_partition/1 (src/mango_opts.erl:311:5)
couch_stats:increment_counter/1 (src/mango_cursor.erl:164:5)
couch_stats:increment_counter/1 (src/mango_cursor_text.erl:183:5)
couch_stats:increment_counter/1 (src/mango_cursor_view.erl:381:13)
couch_stats:increment_counter/1 (src/mango_execution_stats.erl:57:5)
couch_stats:increment_counter/1 (src/mango_selector.erl:53:5)
couch_stats:update_histogram/2 (src/mango_execution_stats.erl:76:5)
couch_util:decodeBase64Url/1 (src/mango_json_bookmark.erl:55:35)
couch_util:encodeBase64Url/1 (src/mango_json_bookmark.erl:47:5)
couch_util:get_value/2 (src/mango_crud.erl:127:10)
couch_util:get_value/2 (src/mango_cursor_special.erl:40:16)
couch_util:get_value/2 (src/mango_cursor_text.erl:247:10)
couch_util:get_value/2 (src/mango_cursor_view.erl:95:16)
couch_util:get_value/2 (src/mango_execution_stats.erl:78:10)
couch_util:get_value/2 (src/mango_httpd.erl:255:18)
couch_util:get_value/2 (src/mango_idx.erl:353:17)
couch_util:get_value/2 (src/mango_idx_text.erl:385:14)
couch_util:get_value/2 (src/mango_native_proc.erl:115:16)
couch_util:get_value/3 (src/mango_cursor_special.erl:39:14)
couch_util:get_value/3 (src/mango_cursor_text.erl:57:14)
couch_util:get_value/3 (src/mango_cursor_view.erl:94:14)
couch_util:get_value/3 (src/mango_httpd.erl:72:14)
couch_util:get_value/3 (src/mango_idx.erl:404:10)
couch_util:get_value/3 (src/mango_idx_text.erl:113:23)
couch_util:get_value/3 (src/mango_native_proc.erl:154:13)
couch_util:json_decode/1 (src/mango_idx.erl:137:13)
couch_util:json_encode/1 (src/mango_error.erl:55:45)
couch_util:json_encode/1 (src/mango_httpd.erl:280:19)
couch_util:json_encode/1 (src/mango_idx.erl:137:26)
couch_util:set_value/3 (src/mango_cursor_view.erl:593:21)
couch_uuids:new/0 (src/mango_crud.erl:67:46)
couch_uuids:new/0 (src/mango_doc.erl:49:26)
crypto:hash/2 (src/mango_idx.erl:327:11)
ddoc_cache:open/2 (src/mango_idx.erl:54:21)
dreyfus:available/0 (src/mango_idx.erl:178:14)
dreyfus:available/0 (src/mango_native_proc.erl:312:14)
dreyfus_bookmark:pack/1 (src/mango_cursor_text.erl:127:22)
dreyfus_bookmark:unpack/2 (src/mango_cursor_text.erl:280:21)
dreyfus_bookmark:update/3 (src/mango_cursor_text.erl:264:13)
dreyfus_fabric:get_json_docs/2 (src/mango_cursor_text.erl:320:20)
dreyfus_fabric_search:go/4 (src/mango_cursor_text.erl:159:10)
fabric:all_docs/5 (src/mango_cursor_view.erl:218:25)
fabric:query_view/7 (src/mango_cursor_view.erl:224:25)
fabric:update_docs/3 (src/mango_crud.erl:37:10)
fabric_util:is_partitioned/1 (src/mango_idx.erl:77:11)
mochiglobal:get/1 (src/mango_util.erl:343:10)
mochiglobal:put/2 (src/mango_util.erl:346:18)
rexi:ping/0 (src/mango_cursor_view.erl:429:13)
rexi:reply/1 (src/mango_cursor_view.erl:392:5)
rexi:stream2/1 (src/mango_cursor_view.erl:387:10)
rexi:stream_last/1 (src/mango_cursor_view.erl:389:10)
Unknown types:
couch_att:att/0 (/Users/gaborpali/projects/github/couchdb/src/couch/include/couch_db.hrl:119:19)
done in 0m0.46s
done (warnings were emitted)
Of course, it complains about the unknown functions. And it could even crash if I tried to add further ebin
s where there is something wrong.
However, this version works reasonably well (from the root of the CouchDB git clone):
$ dialyzer `find src -name ebin -depth 2`
[..]
mango_crud.erl:40:9: The pattern
{'accepted', Results0} can never match the type
{'aborted', _}
mango_cursor_view.erl:628:48: The call mango_cursor_view:match_and_extract_doc
(Doc :: {[any(), ...]},
Selector :: any(),
any()) breaks the contract
(Doc, Selector, Fields) -> Result
when
Doc :: doc(),
Selector :: selector(),
Fields :: maybe(fields()),
Result :: {'match', term()} | {'no_match', 'undefined'}
[..]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect. Thanks for taking the time to describe it. It would be great if one day could run these as part of CI and use type specs more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I admit that it is not how make dialyze
would do that. Ideally, make dialyze
should be fixed to work like this. It uses rebar
to create the PLT files and something in the sources makes it fail. I think that might be Elixir. Anyhow, I am a strong endorser of types and one day I will fix this somehow :-)
0e8262d
to
25e40bc
Compare
As a performance improvement, shorten the gap between Mango queries and the underlying map-reduce views: try to serve requests without pulling documents from the primary data set, i.e. run the query with `include_docs` set to `false` when there is a chance that it can be "covered" by the chosen index. The rows in the results are then built from the information stored there. Extend the response on the `_explain` endpoint to show information in the `covered` Boolean attribute about the query would be covered by the index or not. Remarks: - This should be a transparent optimization, without any semantical effect on the queries. - Because the main purpose of indexes is to store keys and the document identifiers, the change will only work in cases when the selected fields overlap with those. The chance of being covered could be increased by adding more non-key fields to the index, but that is not in scope here.
beef3b4
to
5dade2c
Compare
This is required to make index selection work better with covering indexes. The `$exists` operator prescribes the presence of the given field so that if an index has the field, it shall be considered because it implies true. Without this change, it will not happen, but covering indexes can work if the index is manually picked.
Ideally, the effect of this function should be applied at a single spot of the code. When building the base options, covering index information should be left blank to make it consistent with the rest of the parameters.
Covering indexes shall provide all the fields that the selector may contain, otherwise the derived documents would get dropped on the "match and extract" phase even if they were matching. Extend the integration tests to check this case as well.
5dade2c
to
cb9a3bf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 well done, @pgj
Sorry it took a while to review it.
Thanks @nickva for taking the time and energy to get familiar with Mango and providing these useful comments! |
This pull request aims to implement the first phase of the covering indexes work ("keys-only covering indexes") as it was described in the corresponding RFC document by @mikerhodes, see #4410 and #4413 for the details.
Ideally, end users should not notice any change in the behavior but speed-ups for certain kind of queries. Performance results obtained by k6 on my MacBook Pro 2021 M1 Max, with 10,000 documents, compared against
c5a7809f
with 3 nodes:min=0.59ms max=498.29ms p(95)=313.02ms p(98)=357.44ms p(99)=390.75ms
min=0.53ms max=127.26ms p(95)=33.15ms p(98)=37.53ms p(99)=39.99ms
min=19.68ms max=893.79ms p(95)=674.84ms p(98)=739.12ms p(99)=774.02ms
min=1.95ms max=163.22ms p(95)=50.13ms p(98)=57.11ms p(99)=63.38ms
min=0.70ms max=3786.70ms p(95)=2733.86ms p(98)=3265.91ms p(99)=3452.13ms
min=0.72ms max=3590.26ms p(95)=2631.55ms p(98)=2777.04ms p(99)=2879.50ms
Please find the details of the performance evaluation in
pgj/couchdb-k6/mango/couchdb_4482
.Unit test coverage increased significantly, see before:
and then after:
Tasks:
src/docs
folder