@@ -3,3 +3,25 @@ summary: Add binary doc value compression with variable doc count blocks
33area : Mapping
44type : feature
55issues : []
6+ highlight :
7+ title : Add binary doc value compression with variable doc count blocks
8+ body : " Add compression for binary doc values using Zstd and blocks with a\n variable\
9+ \ number of values.\n\n Block-wise LZ4 was previously added to Lucene in\n [LUCENE-9211](https://issues.apache.org/jira/browse/LUCENE-9211).\
10+ \ This\n was subsequently removed in\n [LUCENE-9378](https://issues.apache.org/jira/browse/LUCENE-9378)\
11+ \ due to\n query performance issues.\n\n We investigated adding the original\
12+ \ Lucene implementation to\n ES in https://github.com/elastic/elasticsearch/pull/112416\
13+ \ and\n https://github.com/elastic/elasticsearch/pull/105301. This approach\n stores\
14+ \ a constant number of values per block (specifically 32 values).\n This is nice\
15+ \ because it makes it very easy to map a given value index\n (e.g. docId for dense\
16+ \ values) to the block containing it with `blockId =\n docId / 32`. Unfortunately,\
17+ \ if values are very large we cannot reduce\n the number of values per block and\
18+ \ (de)compressing a block could cause\n an OOM. Also, because of this risk,\
19+ \ we have to keep the number of\n values lower than ideal.\n\n This PR instead stores\
20+ \ a variable number of documents per block. It\n stores a minimum of 1 document\
21+ \ per block and stops adding values when\n the size of a block exceeds a threshold.\
22+ \ Like the previous version, it\n stores an array of addresses for the start of each\
23+ \ block. Additionally, it\n stores a parallel array with the value index at the\
24+ \ start of each\n block. When looking up a given value index, if it is not in the\
25+ \ current\n block, we binary search the array of value index starts to find the\n \
26+ blockId containing the value. We then look up the address of that block."
27+ notable : true
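
The variable-size block layout and lookup described in the highlight body can be sketched roughly as follows. This is an illustrative sketch only, not the Elasticsearch implementation: the names `build_blocks`, `find_block`, and `max_block_bytes` are hypothetical, compression is omitted, and block addresses are computed over uncompressed bytes, whereas a real writer would record offsets of the compressed blocks.

```python
from bisect import bisect_right


def build_blocks(values, max_block_bytes):
    """Group values into blocks of variable length.

    Each block holds at least one value and is closed as soon as its
    accumulated size exceeds max_block_bytes (hypothetical threshold).
    Returns two parallel arrays: the start address of each block and the
    index of the first value stored in each block.
    """
    block_addresses = []    # byte offset where each block starts (uncompressed here)
    block_first_index = []  # value index at the start of each block
    offset = 0
    block_size = 0
    block_open = False
    for i, value in enumerate(values):
        if not block_open:
            # Start a new block at the current offset.
            block_addresses.append(offset)
            block_first_index.append(i)
            block_size = 0
            block_open = True
        block_size += len(value)
        offset += len(value)
        if block_size > max_block_bytes:
            # Threshold exceeded: stop adding values to this block.
            block_open = False
    return block_addresses, block_first_index


def find_block(block_first_index, value_index):
    """Binary search for the block whose range contains value_index."""
    return bisect_right(block_first_index, value_index) - 1


values = [b"aaaa", b"bb", b"cccccccc", b"d", b"e", b"ffff"]
addresses, first_index = build_blocks(values, max_block_bytes=4)
block_id = find_block(first_index, 4)
block_address = addresses[block_id]
```

With fixed-size blocks the equivalent lookup is the single division `blockId = docId / 32`. The variable-size layout trades that for a binary search over the block start indices, which is only needed when the requested value is not in the block that is currently loaded.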