quickwit: add tag_fields on CounterID, drop positions on raw text #877
Open
alexey-milovidov wants to merge 1 commit into add-quickwit-entry from
`tag_fields: [CounterID]` writes per-split CounterID values into the metastore, so the searcher can prune whole splits before opening them. Queries 37-43 all filter on `CounterID = 62`, so they benefit directly. This is the closest analogue here to Elasticsearch's `index.sort` early termination.

`record: basic` on every `tokenizer: raw` text field skips storing freqs and positions in the postings. Phrase queries can never run against single-term raw fields, so that data was dead weight.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
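For orientation, here is a minimal sketch of where the two settings sit in a Quickwit index config. It assumes the usual Quickwit layout, with `tag_fields` declared under `doc_mapping` and `record` set per text field. The `CounterID` type and the `URL` field are illustrative assumptions, not copied from this PR's diff; `URL` stands in for any one of the raw-tokenized text fields.

```yaml
# Sketch of an excerpt from quickwit/index_config.yaml; other fields omitted.
doc_mapping:
  field_mappings:
    - name: CounterID
      type: i64            # assumed type, not taken from the diff
    - name: URL            # hypothetical stand-in for one raw-tokenized text field
      type: text
      tokenizer: raw       # the whole value is indexed as a single term
      record: basic        # postings keep doc IDs only: no term freqs, no positions
  # Per-split CounterID values are written to the metastore, so the searcher
  # can skip splits that cannot contain CounterID = 62.
  tag_fields: [CounterID]
```

Every other raw-tokenized text field would get the same `record: basic` line; `tag_fields` appears once for the whole mapping.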
Summary
Two index-level changes to `quickwit/index_config.yaml`, keeping the rest of the benchmark setup identical.

- `tag_fields: [CounterID]`: Q37-Q43 all filter on `CounterID = 62`. Tagging the field writes the per-split `CounterID` values into the metastore, so the searcher can prune whole splits before opening them. This is the closest analogue we get to Elasticsearch's `index.sort` early termination on the same column. Quickwit/Tantivy has no real multi-column doc sort to match the full ES `sort.field: [CounterID, EventDate, UserID, EventTime, WatchID]`, so this picks up just the `CounterID` dimension.
- `record: basic` on every `tokenizer: raw` text field (28 fields). Tantivy defaults text postings to `WithFreqsAndPositions`, but raw-tokenized fields only ever hold one term per document. Phrase queries can't run against them, so freqs and positions are dead weight in the index.

Validated against the running v0.9.0-nightly server (the same image `benchmark.sh` uses): the `tag_fields` and `record: basic` settings round-trip cleanly through the index-create API.

Test plan
- `bash benchmark.sh` end-to-end on a fresh machine
- `tag_fields` benefit
- `record: basic`

🤖 Generated with Claude Code