
feat: Dynamic streaming hash table min reduction based on CPU cache #61326

Open
HappenLee wants to merge 2 commits into apache:master from HappenLee:coverage

Conversation

@HappenLee
Contributor

  • Add get_cache_size() and get_cache_line_size() methods to CpuInfo
  • Add StreamingHtMinReductionEntry struct and get_streaming_ht_min_reduction() method
  • Replace static STREAMING_HT_MIN_REDUCTION config with dynamic calculation based on L2/L3 cache size
  • Update include paths in streaming_aggregation_operator.cpp and distinct_streaming_aggregation_operator.cpp
  • Change namespace from doris to doris::pipeline in streaming_aggregation_operator.cpp

This change enables better adaptation to different hardware environments by dynamically calculating hash table expansion thresholds based on actual CPU cache sizes.

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

Copilot AI review requested due to automatic review settings March 14, 2026 04:16
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally with the specific error message) and how it was fixed.
  2. Which behaviors were modified: what the previous behavior was, what it is now, why it was changed, and what impacts it may have.
  3. What features were added and why.
  4. Which code was refactored and why.
  5. Which functions were optimized and what the difference is before and after the optimization.

@HappenLee
Contributor Author

run buildall

Copilot AI left a comment

Pull request overview

This PR aims to make streaming aggregation hash table expansion thresholds adapt to the actual hardware by deriving the “min reduction” table from CPU L2/L3 cache sizes (instead of using a fixed constant table).

Changes:

  • Add cache size / cache line size accessors to CpuInfo.
  • Add a dynamic get_streaming_ht_min_reduction() table builder based on detected L2/L3 cache sizes.
  • Update streaming aggregation operators to use the dynamic reduction table (and adjust includes/namespace).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

Reviewed files:

  • be/src/util/cpu_info.h: adds cache query helpers and a dynamically built streaming HT min-reduction table.
  • be/src/exec/operator/streaming_aggregation_operator.cpp: switches to the CpuInfo-based reduction table; modifies includes and namespace.
  • be/src/exec/operator/distinct_streaming_aggregation_operator.cpp: switches to the CpuInfo-based reduction table and updates includes accordingly.


Comment on lines +25 to +31
#include "common/cast_set.h"
#include "common/compiler_util.h" // IWYU pragma: keep
#include "exec/operator/operator.h"
#include "exec/operator/streaming_agg_min_reduction.h"
#include "exprs/aggregate/aggregate_function_simple_factory.h"
#include "exprs/vectorized_agg_fn.h"
#include "exprs/vslot_ref.h"
#include "pipeline/exec/operator.h"
#include "util/cpu_info.h"
#include "vec/aggregate_functions/aggregate_function_simple_factory.h"
#include "vec/exprs/vectorized_agg_fn.h"
#include "vec/exprs/vslot_ref.h"
Comment on lines 38 to 45
namespace doris::pipeline {

using StreamingHtMinReductionEntry = doris::CpuInfo::StreamingHtMinReductionEntry;
static const std::vector<StreamingHtMinReductionEntry>& STREAMING_HT_MIN_REDUCTION =
        doris::CpuInfo::get_streaming_ht_min_reduction();
static const size_t STREAMING_HT_MIN_REDUCTION_SIZE = STREAMING_HT_MIN_REDUCTION.size();

StreamingAggLocalState::StreamingAggLocalState(RuntimeState* state, OperatorXBase* parent)
Comment on lines 202 to 205
const auto* reduction = _is_single_backend
        ? SINGLE_BE_STREAMING_HT_MIN_REDUCTION
        : STREAMING_HT_MIN_REDUCTION;

Comment on lines +99 to +106
const auto* reduction = _is_single_backend
        ? SINGLE_BE_STREAMING_HT_MIN_REDUCTION
        : STREAMING_HT_MIN_REDUCTION;

// Find the appropriate reduction factor in our table for the current hash table sizes.
int cache_level = 0;
while (cache_level + 1 < STREAMING_HT_MIN_REDUCTION_SIZE &&
       ht_mem >= reduction[cache_level + 1].min_ht_mem) {
Comment on lines +151 to +163
static long get_cache_size(CacheLevel level) {
    long cache_sizes[NUM_CACHE_LEVELS];
    long cache_line_sizes[NUM_CACHE_LEVELS];
    _get_cache_info(cache_sizes, cache_line_sizes);
    return cache_sizes[level];
}

static long get_cache_line_size(CacheLevel level) {
    long cache_sizes[NUM_CACHE_LEVELS];
    long cache_line_sizes[NUM_CACHE_LEVELS];
    _get_cache_info(cache_sizes, cache_line_sizes);
    return cache_line_sizes[level];
}
Comment on lines +171 to +196
static std::vector<StreamingHtMinReductionEntry> entries;
static bool initialized = false;

if (!initialized) {
    long l2_cache_size = CpuInfo::get_cache_size(CpuInfo::L2_CACHE);
    long l3_cache_size = CpuInfo::get_cache_size(CpuInfo::L3_CACHE);

    entries.push_back({.min_ht_mem = 0, .streaming_ht_min_reduction = 0.0});

    if (l2_cache_size > 256 * 1024) {
        entries.push_back({.min_ht_mem = l2_cache_size / 4, .streaming_ht_min_reduction = 1.1});
    } else {
        entries.push_back({.min_ht_mem = 256 * 1024, .streaming_ht_min_reduction = 1.1});
    }

    if (l3_cache_size > 4 * 1024 * 1024) {
        entries.push_back({.min_ht_mem = l3_cache_size / 2, .streaming_ht_min_reduction = 2.0});
    } else {
        entries.push_back({.min_ht_mem = 16 * 1024 * 1024, .streaming_ht_min_reduction = 2.0});
    }

    initialized = true;
}
Comment on lines +165 to +171
struct StreamingHtMinReductionEntry {
    long min_ht_mem;
    double streaming_ht_min_reduction;
};

static const std::vector<StreamingHtMinReductionEntry>& get_streaming_ht_min_reduction() {
    static std::vector<StreamingHtMinReductionEntry> entries;
@HappenLee
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 26872 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 31dbf14326a74419c1c157c6d83744889c73e047, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17608	4633	4330	4330
q2
q3	10639	838	528	528
q4	4676	363	255	255
q5	7551	1198	1022	1022
q6	174	175	146	146
q7	794	849	675	675
q8	9286	1481	1325	1325
q9	4890	4798	4768	4768
q10	6238	1906	1662	1662
q11	468	261	244	244
q12	762	574	467	467
q13	18070	2974	2204	2204
q14	238	234	218	218
q15
q16	754	737	671	671
q17	730	847	447	447
q18	5982	5418	5234	5234
q19	1105	1004	600	600
q20	546	518	374	374
q21	4416	1871	1401	1401
q22	552	345	301	301
Total cold run time: 95479 ms
Total hot run time: 26872 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4790	4535	4530	4530
q2
q3	3909	4334	3802	3802
q4	873	1201	765	765
q5	4113	4439	4387	4387
q6	190	178	144	144
q7	1775	1646	1530	1530
q8	2523	2721	2588	2588
q9	7805	7354	7402	7354
q10	3790	4065	3692	3692
q11	523	457	424	424
q12	563	595	456	456
q13	2717	3169	2411	2411
q14	378	316	280	280
q15
q16	737	765	710	710
q17	1214	1374	1387	1374
q18	7139	6787	6488	6488
q19	914	876	866	866
q20	2124	2228	2043	2043
q21	3939	3458	3275	3275
q22	479	452	380	380
Total cold run time: 50495 ms
Total hot run time: 47499 ms

@doris-robot

TPC-DS: Total hot run time: 167377 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 31dbf14326a74419c1c157c6d83744889c73e047, data reload: false

query5	4322	632	516	516
query6	326	233	206	206
query7	4221	463	261	261
query8	339	235	234	234
query9	8668	2719	2699	2699
query10	518	366	327	327
query11	7064	5131	4905	4905
query12	184	123	120	120
query13	1253	435	357	357
query14	5652	3660	3499	3499
query14_1	2786	2767	2764	2764
query15	200	197	170	170
query16	962	474	449	449
query17	856	686	577	577
query18	2429	447	340	340
query19	206	207	202	202
query20	131	127	125	125
query21	202	131	109	109
query22	13268	13426	13158	13158
query23	15983	15473	15567	15473
query23_1	15914	15654	15694	15654
query24	7409	1712	1282	1282
query24_1	1268	1263	1293	1263
query25	582	518	463	463
query26	1821	310	155	155
query27	2815	484	306	306
query28	4518	1876	1867	1867
query29	846	578	488	488
query30	308	227	193	193
query31	1011	958	885	885
query32	85	76	71	71
query33	540	348	298	298
query34	905	889	519	519
query35	644	673	614	614
query36	1073	1153	984	984
query37	135	93	88	88
query38	2951	3008	2860	2860
query39	861	833	807	807
query39_1	795	800	794	794
query40	229	155	142	142
query41	70	66	63	63
query42	264	256	262	256
query43	239	245	221	221
query44	
query45	198	194	188	188
query46	885	982	602	602
query47	2658	2190	2024	2024
query48	324	339	235	235
query49	658	473	385	385
query50	691	281	217	217
query51	4145	4058	4048	4048
query52	262	268	258	258
query53	295	345	298	298
query54	308	277	284	277
query55	95	88	84	84
query56	323	328	325	325
query57	1920	1764	1480	1480
query58	293	282	284	282
query59	2797	2988	2744	2744
query60	356	352	380	352
query61	146	141	145	141
query62	639	592	529	529
query63	317	277	271	271
query64	5033	1242	965	965
query65	
query66	1463	455	357	357
query67	24286	24379	24248	24248
query68	
query69	414	317	286	286
query70	964	960	961	960
query71	342	307	307	307
query72	2728	2675	2422	2422
query73	540	538	314	314
query74	9650	9578	9411	9411
query75	2847	2738	2475	2475
query76	2342	1033	666	666
query77	363	369	302	302
query78	10887	11134	10473	10473
query79	1100	773	568	568
query80	698	619	548	548
query81	497	262	229	229
query82	1326	152	123	123
query83	399	258	241	241
query84	301	119	96	96
query85	839	530	436	436
query86	370	304	283	283
query87	3201	3170	3016	3016
query88	3520	2654	2656	2654
query89	420	378	345	345
query90	1942	181	175	175
query91	163	159	134	134
query92	81	73	68	68
query93	886	869	499	499
query94	459	315	301	301
query95	584	338	314	314
query96	650	514	227	227
query97	2464	2446	2384	2384
query98	235	221	226	221
query99	1031	985	929	929
Total cold run time: 249445 ms
Total hot run time: 167377 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.66% (19726/37461)
Line Coverage 36.24% (184275/508464)
Region Coverage 32.40% (142398/439556)
Branch Coverage 33.57% (62199/185263)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.48% (26209/36666)
Line Coverage 54.23% (274810/506742)
Region Coverage 51.35% (227800/443604)
Branch Coverage 52.85% (98174/185745)

