
Conversation

@before-Sunrise (Contributor) commented Oct 27, 2025

Why I'm doing:

  1. Optimize the implementation of late materialization in the internal table scan, so that selected rows in the same page can be handled in a single function call.
  2. Support reading predicate columns via late materialization: when there are many predicate columns and the earlier predicates filter out most of the data, this reduces IO and memory copies.
  3. Since predicate columns no longer need to be read all at once, we can reorder them so that the predicate with lower selectivity executes first. This is done by sampling data and checking the selectivity of every predicate.
  4. If the first predicate column is a string column, push the string predicate down to the page level, so we don't need to read a big string into the column and then filter it out when it doesn't satisfy the predicate. This is one implementation of zero-copy, since our current zero-copy doesn't support the string type.
  5. Optimize predicate evaluation speed for string_col != "" by checking only the offset column (a minimal sketch follows this list).
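
A minimal sketch of the idea behind item 5, assuming the common binary-column layout where row i's bytes live in [offsets[i], offsets[i+1]); the function and names below are illustrative, not the actual StarRocks code:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Evaluate `string_col != ""` from the offsets array alone: a row is non-empty
// exactly when it occupies at least one byte, so the string bytes are never read.
void evaluate_not_empty(const std::vector<uint32_t>& offsets,
                        std::vector<uint8_t>* selection) {
    const size_t num_rows = offsets.size() - 1;
    selection->resize(num_rows);
    for (size_t i = 0; i < num_rows; ++i) {
        (*selection)[i] = offsets[i + 1] > offsets[i] ? 1 : 0;
    }
}
```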

What I'm doing:

Fixes #issue

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This PR needs user documentation (for new or modified features or behaviors)
  • I have added documentation for my new feature or new function
  • This is a backport PR

Bugfix cherry-pick branch check:

  • I have checked the version labels for the target branches this PR will be auto-backported to
    • 4.0
    • 3.5
    • 3.4
    • 3.3

Note

Implements late materialization for predicate columns with efficient page-level filtering and zero-copy string handling.

  • New utility: column/append_with_mask.{h,cpp} to append rows by selection mask; replaces custom logic in SortedStreamingAggregator and simplifies selector handling (a simplified sketch of the idea follows this list)
  • BinaryColumn refactor: zero-copy via ContainerResource, new get_immutable_bytes()/_data_base(), and lazy materialization; switch call sites across hashing, CSV/Parquet writers, string funcs, join, sort, split, etc.
  • Predicate pushdown & late materialization: add next_batch_with_filter, read_by_rowids, reserve_col, and support_push_down_predicate() to page decoders (binary_plain/dict/bitshuffle), ParsedPage, and ScalarColumnIterator; new compound_and_predicates_evaluate to combine predicates; optimize binary_col != "" by checking offsets
  • Scan plumbing: propagate enable_predicate_col_late_materialize through OLAP/Lake readers and RowsetOptions; add PredicateTree::has_or_predicate() and sampling config tigger_sample_selectivity
  • APIs: Chunk::append_column overload by ColumnId; numerous sites updated to use get_immutable_bytes() and reserve properly
  • Minor cleanups/perf tweaks in string operations, logging, and case conversions
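
As referenced in the first bullet, appending rows by a selection mask boils down to the simplified sketch below; the function name and types are illustrative stand-ins, not the PR's actual column/append_with_mask interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Append to `dst` every row of `src` whose mask entry is non-zero, in a single
// pass, instead of scattering per-row copies across call sites.
template <typename T>
void append_selected(std::vector<T>& dst, const std::vector<T>& src,
                     const std::vector<uint8_t>& mask) {
    for (size_t i = 0; i < src.size(); ++i) {
        if (mask[i]) {
            dst.push_back(src[i]);
        }
    }
}
```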

Written by Cursor Bugbot for commit 024808e. This will update automatically on new commits.

@alvin-celerdata (Contributor)

@cursor review

@mergify bot (Contributor) commented Oct 29, 2025

🧪 CI Insights

Here's what we observed from your CI run for d8089ab.

🟢 All jobs passed!

But CI Insights is watching 👀

@alvin-celerdata (Contributor)

@cursor review

@cursor bot left a comment

✅ Bugbot reviewed your changes and found no bugs!


@alvin-celerdata (Contributor)

@cursor review

@alvin-celerdata (Contributor)

@cursor review

@alvin-celerdata (Contributor)

@cursor review

@alvin-celerdata (Contributor)

@cursor review

@alvin-celerdata (Contributor)

@cursor review

@cursor bot left a comment

✅ Bugbot reviewed your changes and found no bugs!


@alvin-celerdata (Contributor)

@cursor review

@cursor bot left a comment

✅ Bugbot reviewed your changes and found no bugs!


@alvin-celerdata (Contributor)

@cursor review

class SegmentIterator final : public ChunkIterator {
public:
SegmentIterator(std::shared_ptr<Segment> segment, Schema _schema, SegmentReadOptions options);
SegmentIterator(std::shared_ptr<Segment> segment, Schema _schema, const SegmentReadOptions& options);
Collaborator

The segment iterator now has dual code paths for normal vs late materialization, making it harder to maintain and debug.

Consider refactoring into a strategy pattern:

class ScanStrategy {
public:
    virtual ~ScanStrategy() = default;
    virtual Status read(Chunk* chunk, std::vector<rowid_t>* rowids, size_t n) = 0;
    virtual Status seek(ordinal_t pos) = 0;
};

class NormalScanStrategy : public ScanStrategy { /* ... */ };
class LateMaterializationStrategy : public ScanStrategy { /* ... */ };

CONF_mInt64(max_lookup_batch_request, "8");
// if the first predicate column's selectivity bigger than this, trigger sample,
// which means if the selectivity is very good already, we don't need to sample
CONF_mDouble(tigger_sample_selectivity, "0.2");
Collaborator

typo. trigger_sample_selectivity

this config name is unclear; please give it a clear and meaningful name.

// column-expr-predicate doesn't support [begin, end] interface
ASSIGN_OR_RETURN(next_start, _filter_by_compound_and_predicates(
chunk, rowid, chunk_start, next_start, first_column_id,
non_expr_column_predicate_map.at(first_column_id), current_columns));
Collaborator

we have already checked non_expr_column_predicate_map.contains(first_column_id), so we could use [] instead of at

return count;
}

Status read_by_rowds(Column* column, const rowid_t* rowids, size_t* count) override {
Collaborator

read_by_rowids

@github-actions

[BE Incremental Coverage Report]

fail : 296 / 446 (66.37%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
| --- | --- | --- | --- | --- |
| 🔵 src/storage/lake/tablet_reader.cpp | 0 | 1 | 00.00% | [346] |
| 🔵 src/exprs/agg/group_concat.h | 0 | 2 | 00.00% | [216, 266] |
| 🔵 src/storage/push_utils.cpp | 0 | 1 | 00.00% | [152] |
| 🔵 src/storage/rowset/dictcode_column_iterator.h | 0 | 2 | 00.00% | [50, 51] |
| 🔵 src/exec/sorted_streaming_aggregator.cpp | 0 | 6 | 00.00% | [310, 324, 326, 329, 330, 442] |
| 🔵 src/exec/tablet_scanner.cpp | 0 | 1 | 00.00% | [150] |
| 🔵 src/column/binary_column.h | 0 | 22 | 00.00% | [70, 71, 72, 75, 76, 77, 78, 87, 88, 89, 118, 120, 121, 122, 125, 153, 154, 155, 166, 169, 270, 304] |
| 🔵 src/storage/rowset/page_decoder.h | 0 | 6 | 00.00% | [97, 98, 101, 102, 118, 119] |
| 🔵 src/storage/rowset/binary_dict_page.cpp | 1 | 4 | 25.00% | [272, 278, 279] |
| 🔵 src/storage/rowset/binary_plain_page.h | 2 | 8 | 25.00% | [221, 239, 240, 241, 242, 243] |
| 🔵 src/storage/rowset/scalar_column_iterator.cpp | 7 | 18 | 38.89% | [331, 339, 350, 351, 352, 355, 356, 357, 365, 366, 367] |
| 🔵 src/util/slice.h | 4 | 10 | 40.00% | [242, 243, 254, 255, 258, 259] |
| 🔵 src/storage/rowset/binary_dict_page.h | 2 | 3 | 66.67% | [136] |
| 🔵 src/storage/rowset/segment_iterator.cpp | 151 | 208 | 72.60% | [432, 433, 434, 435, 436, 437, 531, 536, 537, 612, 615, 619, 633, 652, 653, 654, 655, 657, 658, 659, 661, 662, 663, 665, 666, 670, 874, 1312, 1317, 1318, 1320, 1384, 1385, 1391, 1400, 1414, 1415, 1416, 1417, 1418, 1449, 1450, 1451, 1452, 1453, 1456, 1457, 1458, 1459, 1460, 1461, 1462, 1463, 1465, 1466, 1467, 1469] |
| 🔵 src/storage/rowset/parsed_page.cpp | 53 | 69 | 76.81% | [264, 265, 266, 268, 269, 270, 271, 273, 274, 277, 278, 279, 281, 361, 374, 404] |
| 🔵 src/column/binary_column.cpp | 43 | 51 | 84.31% | [99, 402, 408, 546, 565, 768, 769, 770] |
| 🔵 src/exprs/string_functions.cpp | 6 | 7 | 85.71% | [1605] |
| 🔵 src/storage/rowset/binary_plain_page.cpp | 2 | 2 | 100.00% | [] |
| 🔵 src/exprs/split.cpp | 3 | 3 | 100.00% | [] |
| 🔵 src/formats/parquet/level_builder.cpp | 1 | 1 | 100.00% | [] |
| 🔵 src/storage/lake/rowset.cpp | 1 | 1 | 100.00% | [] |
| 🔵 src/serde/column_array_serde.cpp | 1 | 1 | 100.00% | [] |
| 🔵 src/formats/csv/varbinary_converter.cpp | 1 | 1 | 100.00% | [] |
| 🔵 src/exprs/like_predicate.cpp | 1 | 1 | 100.00% | [] |
| 🔵 src/storage/rowset/scalar_column_iterator.h | 2 | 2 | 100.00% | [] |
| 🔵 src/storage/rowset/rowid_column_iterator.h | 11 | 11 | 100.00% | [] |
| 🔵 src/storage/tablet_reader.cpp | 1 | 1 | 100.00% | [] |
| 🔵 src/storage/rowset/rowset.cpp | 1 | 1 | 100.00% | [] |
| 🔵 src/formats/csv/string_converter.cpp | 2 | 2 | 100.00% | [] |

@alvin-celerdata (Contributor)

@cursor review

}

// No decoder available yet, conservatively return false
return false;

Plain-encoded columns missing predicate push-down check

The support_push_down_predicate function's comment says "First check dict decoder, then check current page decoder" but the implementation only checks _dict_decoder and returns false when it's null. For plain-encoded VARCHAR columns where _dict_decoder is nullptr, this incorrectly disables predicate push-down optimization even though BinaryPlainPageDecoder::support_push_down_predicate() returns true for VARCHAR and next_batch_with_filter is fully implemented. The function should also check the page decoder when the dict decoder is not available.
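
A hedged sketch of the suggested fix; the free-function form and the decoder parameter names below are stand-ins taken from the description above, not verified against the actual source:

```cpp
// Illustrative only: consult the dict decoder first, then fall back to the
// current page's decoder instead of conservatively returning false.
bool support_push_down_predicate(const PageDecoder* dict_decoder,
                                 const PageDecoder* page_decoder) {
    if (dict_decoder != nullptr) {
        return dict_decoder->support_push_down_predicate();
    }
    if (page_decoder != nullptr) {
        // Plain-encoded pages (e.g. BinaryPlainPageDecoder for VARCHAR) can
        // still evaluate predicates at the page level.
        return page_decoder->support_push_down_predicate();
    }
    // No decoder available at all: conservatively return false.
    return false;
}
```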


@github-actions

[FE Incremental Coverage Report]

pass : 6 / 7 (85.71%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
| --- | --- | --- | --- | --- |
| 🔵 com/starrocks/qe/SessionVariable.java | 4 | 5 | 80.00% | [5705] |
| 🔵 com/starrocks/sql/optimizer/rule/tree/AddIndexOnlyPredicateRule.java | 1 | 1 | 100.00% | [] |
| 🔵 com/starrocks/statistic/StatisticUtils.java | 1 | 1 | 100.00% | [] |

@alvin-celerdata (Contributor)

@cursor review

}
size_t total = *count;
size_t read_count = 0;
[[maybe_unused]] CppType data[total];

Variable Length Array risks stack overflow

The code uses a Variable Length Array (VLA) CppType data[total] where total can be as large as the chunk size (4096 by default). For types like int128_t (16 bytes), this allocates 64KB on the stack, risking stack overflow. VLAs are also not standard C++ (they're a GCC extension). The [[maybe_unused]] attribute is incorrect since the data array is actively used in the loop and passed to append_numbers. This should use heap allocation (e.g., std::vector<CppType> or std::unique_ptr<CppType[]>).
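
A minimal sketch of the suggested remedy, keeping the `CppType`/`total` context from the snippet above: replace the stack VLA with heap-backed storage.

```cpp
// Illustrative fix only: std::vector allocates on the heap, so a large `total`
// or a wide CppType (e.g. int128_t) no longer risks overflowing the stack.
std::vector<CppType> data(total);
size_t read_count = 0;
// ... decode each requested rowid into data[read_count++] and pass
// data.data() / read_count to the column append as before ...
```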


@alvin-celerdata (Contributor)

@cursor review

}
size_t total = *count;
size_t read_count = 0;
[[maybe_unused]] CppType data[total];

Stack-allocated VLA may overflow for large row counts

The new read_by_rowids function uses a Variable Length Array (VLA) CppType data[total] which allocates memory on the stack based on the runtime count parameter. VLAs are a non-standard C++ extension and can cause stack overflow when total is large. For a page with 16K rows of int32 type, this allocates 64KB on the stack, and larger data types or deeper call stacks could trigger a crash. Using a std::vector or heap allocation would be safer.


@github-actions

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)
