Update docs/changelog/137139.yaml

parkertimmins · parkertimmins · commit 80c14a3e533e · 2025-11-03T13:30:30.000-06:00
diff --git a/docs/changelog/137139.yaml b/docs/changelog/137139.yaml
@@ -3,3 +3,25 @@ summary: Add binary doc value compression with variable doc count blocks
 area: Mapping
 type: feature
 issues: []
+highlight:
+  title: Add binary doc value compression with variable doc count blocks
+  body: "Add compression for binary doc values using Zstd and blocks with a\nvariable\
+    \ number of values.\n\nBlock-wise LZ4 was previously added to Lucene in\n[LUCENE-9211](https://issues.apache.org/jira/browse/LUCENE-9211).\
+    \ This\nwas subsequently removed in\n[LUCENE-9378](https://issues.apache.org/jira/browse/LUCENE-9378)\
+    \ due to\nquery performance issues.\n\nWe investigated adding the original\
+    \ Lucene implementation to\nES in https://github.com/elastic/elasticsearch/pull/112416\
+    \ and\nhttps://github.com/elastic/elasticsearch/pull/105301. This approach\nstores\
+    \ a constant number of values per block (specifically 32 values).\nThis is nice\
+    \ because it makes it very easy to map a given value index\n(e.g. docId for dense\
+    \ values) to the block containing it with `blockId =\ndocId / 32`. Unfortunately,\
+    \ if values are very large, we cannot reduce\nthe number of values per block, and\
+    \ (de)compressing a block could cause\nan OOM. Because of this risk, we also have to keep\
+    \ the number of values per block lower than ideal.\n\nThis PR instead stores\
+    \ a variable number of documents per block. It\nstores a minimum of 1 document\
+    \ per block and stops adding values when\nthe size of a block exceeds a threshold.\
+    \ Like the previous version, it\nstores an array of addresses for the start of each\
+    \ block. Additionally, it\nstores a parallel array with the value index at the\
+    \ start of each\nblock. When looking up a given value index, if it is not in the\
+    \ current\nblock, we binary search the array of value index starts to find the\n\
+    blockId containing the value, and then look up the address of that block."
+  notable: true
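The block lookup described in the highlight body can be sketched in Java as follows. This is a minimal, hypothetical illustration and not the Elasticsearch implementation. The class `VariableBlockIndex`, its fields `blockStartValueIndex` and `blockAddress`, and the cached `currentBlock` are names assumed for this example, and compression is left out entirely. It shows two parallel arrays (block start value index and block address), a check against the current block, and a binary search fallback. With the earlier fixed 32-value blocks, the search would reduce to `valueIndex / 32`.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: maps a value index to the variable-size block holding it.
 * Names and layout are illustrative assumptions, not the Elasticsearch code.
 */
public class VariableBlockIndex {
    // Value index of the first value in each block; assumed strictly increasing and starting at 0.
    private final long[] blockStartValueIndex;
    // Address where each block starts in the data (parallel to blockStartValueIndex).
    private final long[] blockAddress;
    // Total number of values across all blocks.
    private final long numValues;
    // Block used by the previous lookup; -1 means nothing cached yet.
    private int currentBlock = -1;

    public VariableBlockIndex(long[] blockStartValueIndex, long[] blockAddress, long numValues) {
        if (blockStartValueIndex.length != blockAddress.length) {
            throw new IllegalArgumentException("parallel arrays must have the same length");
        }
        this.blockStartValueIndex = blockStartValueIndex;
        this.blockAddress = blockAddress;
        this.numValues = numValues;
    }

    /** Returns the id of the block containing valueIndex. */
    public int blockIdFor(long valueIndex) {
        if (valueIndex < 0 || valueIndex >= numValues) {
            throw new IndexOutOfBoundsException("valueIndex " + valueIndex);
        }
        // Fast path: the value is still in the block used by the previous lookup.
        if (currentBlock >= 0 && inBlock(currentBlock, valueIndex)) {
            return currentBlock;
        }
        // Binary search the block start indices.
        int pos = Arrays.binarySearch(blockStartValueIndex, valueIndex);
        // Exact hit: valueIndex is the first value of block pos.
        // Miss: binarySearch returns -(insertionPoint) - 1, and the containing block is insertionPoint - 1.
        currentBlock = pos >= 0 ? pos : -pos - 2;
        return currentBlock;
    }

    /** Returns the start address of the block containing valueIndex. */
    public long blockAddressFor(long valueIndex) {
        return blockAddress[blockIdFor(valueIndex)];
    }

    // A block covers [its start, next block's start), or [its start, numValues) for the last block.
    private boolean inBlock(int block, long valueIndex) {
        long start = blockStartValueIndex[block];
        long end = block + 1 < blockStartValueIndex.length ? blockStartValueIndex[block + 1] : numValues;
        return valueIndex >= start && valueIndex < end;
    }

    public static void main(String[] args) {
        // Three blocks holding values [0,5), [5,6) (a single large value), and [6,20).
        VariableBlockIndex index = new VariableBlockIndex(
            new long[] {0, 5, 6}, new long[] {0L, 4096L, 70000L}, 20);
        System.out.println(index.blockIdFor(3));      // 0
        System.out.println(index.blockIdFor(5));      // 1
        System.out.println(index.blockIdFor(19));     // 2
        System.out.println(index.blockAddressFor(6)); // 70000
    }
}
```

In this sketch, reading values in increasing order mostly hits the current-block check. Only a move into a different block pays for the O(log n) search over the block starts. A block can hold a single value, so one very large value gets its own block instead of forcing a small fixed block size on every block.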