Commit 80c14a3

Update docs/changelog/137139.yaml
1 parent 1209e78 commit 80c14a3

1 file changed: 22 additions, 0 deletions
docs/changelog/137139.yaml

Lines changed: 22 additions & 0 deletions
@@ -3,3 +3,25 @@ summary: Add binary doc value compression with variable doc count blocks
 area: Mapping
 type: feature
 issues: []
+highlight:
+  title: Add binary doc value compression with variable doc count blocks
+  body: "Add compression for binary doc values using Zstd and blocks with a\nvariable\
+    \ number of values.\n\nBlock-wise LZ4 was previously added to Lucene in\n[LUCENE-9211](https://issues.apache.org/jira/browse/LUCENE-9211).\
+    \ This\nwas subsequently removed in\n[LUCENE-9378](https://issues.apache.org/jira/browse/LUCENE-9378)\
+    \ due to\nquery performance issues.\n\nWe investigated adding the original\
+    \ Lucene implementation to\nES in https://github.com/elastic/elasticsearch/pull/112416\
+    \ and\nhttps://github.com/elastic/elasticsearch/pull/105301. This approach\nstores\
+    \ a constant number of values per block (specifically 32 values).\nThis is nice\
+    \ because it makes it very easy to map a given value index\n(e.g. docId for dense\
+    \ values) to the block containing it with `blockId =\ndocId / 32`. Unfortunately,\
+    \ if values are very large, we cannot reduce\nthe number of values per block, and\
+    \ (de)compressing a block could cause\nan OOM. Also, since this is a concern,\
+    \ we have to keep the number of\nvalues lower than ideal.\n\nThis PR instead stores\
+    \ a variable number of documents per block. It\nstores a minimum of 1 document\
+    \ per block and stops adding values when\nthe size of a block exceeds a threshold.\
+    \ Like the previous version, it\nstores an array of addresses for the start of each\
+    \ block. Additionally, it\nstores a parallel array with the value index at the\
+    \ start of each\nblock. When looking up a given value index, if it is not in the\
+    \ current\nblock, we binary search the array of value index starts to find the\n\
+    blockId containing the value. We then look up the address of the block."
+  notable: true
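
The block layout described in the changelog body can be illustrated with a small sketch. This is not the Elasticsearch implementation: the function names (`build_blocks`, `lookup`), the length-prefixed block encoding, and the 64-byte threshold are hypothetical, and `zlib` stands in for Zstd so the example runs with only the Python standard library. It also skips the "current block" cache and decompresses on every lookup.

```python
import struct
import zlib  # stand-in for Zstd (assumption; keeps the sketch stdlib-only)
from bisect import bisect_right


def _encode_block(values):
    # Hypothetical block encoding: 4-byte length prefix per value, then compress.
    raw = b"".join(struct.pack(">I", len(v)) + v for v in values)
    return zlib.compress(raw)


def _decode_block(payload):
    raw = zlib.decompress(payload)
    values, pos = [], 0
    while pos < len(raw):
        (length,) = struct.unpack_from(">I", raw, pos)
        pos += 4
        values.append(raw[pos:pos + length])
        pos += length
    return values


def build_blocks(values, threshold=64):
    """Pack values into variable-sized blocks.

    Every block holds at least one value and is closed as soon as its
    uncompressed size exceeds `threshold` bytes (a made-up default).
    """
    data = bytearray()
    addresses = []      # byte offset where each compressed block starts
    start_indices = []  # value index of the first value in each block
    pending, pending_size = [], 0
    for index, value in enumerate(values):
        if not pending:
            start_indices.append(index)
        pending.append(value)
        pending_size += len(value)
        if pending_size > threshold:
            addresses.append(len(data))
            data += _encode_block(pending)
            pending, pending_size = [], 0
    if pending:  # flush the final, possibly undersized block
        addresses.append(len(data))
        data += _encode_block(pending)
    addresses.append(len(data))  # sentinel: end offset of the last block
    return bytes(data), addresses, start_indices


def lookup(data, addresses, start_indices, value_index):
    # Binary search the block starts for the block containing value_index,
    # then use the parallel address array to locate the compressed bytes.
    block_id = bisect_right(start_indices, value_index) - 1
    payload = data[addresses[block_id]:addresses[block_id + 1]]
    return _decode_block(payload)[value_index - start_indices[block_id]]
```

With a fixed 32 values per block, the lookup would reduce to `block_id = value_index // 32`; here the binary search over `start_indices` replaces that division, which is the cost of letting large values close a block early.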
