[improvement](statistics)Improve analyze partition column and key column corner case. #48757

Open: wants to merge 1 commit into base: master

Jibing-Li
Contributor

What problem does this PR solve?

This PR improves two corner cases:

  1. When picking tablets for sample analyze, skip the very large tablets.
  2. When sample analyze on a partition column would select more than 1,000,000,000 rows, we switch to analyzing a randomly chosen subset of partitions. In this case, we don't want to choose the oldest partition, which holds all the historical data.
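The two corner cases can be sketched roughly as follows. This is an illustrative Python sketch, not the Doris FE code from this PR: `Tablet`, `Partition`, `pick_sample_tablets`, `pick_partitions`, and `skew_factor` are hypothetical names, the rule that "very large" means several times the median row count is an assumption, and so is the convention that the first partition in the list is the oldest. Only the 1,000,000,000-row threshold comes from the description above.

```python
import random
from dataclasses import dataclass
from statistics import median

# Row threshold quoted in the PR description.
MAX_SAMPLE_ROWS = 1_000_000_000


@dataclass(frozen=True)
class Tablet:
    tablet_id: int
    row_count: int


@dataclass(frozen=True)
class Partition:
    name: str
    row_count: int


def pick_sample_tablets(tablets, target_rows, skew_factor=5.0):
    """Pick tablets until target_rows is reached, skipping oversized ones.

    Assumption: "very large" means row_count > skew_factor * median row count.
    """
    if not tablets:
        return []
    limit = skew_factor * median(t.row_count for t in tablets)
    picked, total = [], 0
    for tablet in tablets:
        if total >= target_rows:
            break
        if tablet.row_count > limit:
            continue  # corner case 1: skip the very large tablet
        picked.append(tablet)
        total += tablet.row_count
    return picked


def pick_partitions(partitions, selected_rows, sample_count, rng):
    """Choose partitions to analyze.

    Assumption: partitions[0] is the oldest partition holding the history data.
    """
    if selected_rows <= MAX_SAMPLE_ROWS:
        return list(partitions)  # below the threshold: keep every partition
    # corner case 2: random subset that never includes the oldest partition
    candidates = list(partitions[1:])
    if not candidates:
        return list(partitions)  # only one partition exists, nothing to exclude
    return rng.sample(candidates, min(sample_count, len(candidates)))
```

Passing the random generator in as `rng` keeps the partition choice reproducible in tests; a real implementation would likely take its thresholds from configuration rather than constants.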

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Jibing-Li marked this pull request as ready for review March 6, 2025 09:02
@Jibing-Li
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 32492 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f33e898f8658b2e6202e9570bf737ac3f987e472, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17624	5198	5184	5184
q2	2053	317	179	179
q3	10592	1286	778	778
q4	10219	1041	543	543
q5	7532	2429	2310	2310
q6	190	175	135	135
q7	895	745	600	600
q8	9904	1281	1087	1087
q9	4888	4578	4697	4578
q10	6822	2296	1894	1894
q11	493	277	258	258
q12	350	351	224	224
q13	17801	3722	3079	3079
q14	227	220	206	206
q15	546	516	476	476
q16	628	614	596	596
q17	596	846	337	337
q18	6847	6328	6289	6289
q19	1817	944	547	547
q20	325	320	200	200
q21	2813	2246	2004	2004
q22	1053	995	988	988
Total cold run time: 104215 ms
Total hot run time: 32492 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	5336	5124	5117	5117
q2	238	333	239	239
q3	2151	2646	2247	2247
q4	1470	1829	1387	1387
q5	4269	4112	4133	4112
q6	207	165	124	124
q7	1931	1952	1777	1777
q8	2600	2625	2540	2540
q9	7262	7334	7160	7160
q10	3030	3229	2798	2798
q11	600	508	515	508
q12	706	738	623	623
q13	3413	3966	3271	3271
q14	278	304	269	269
q15	536	488	491	488
q16	628	677	634	634
q17	1144	1653	1326	1326
q18	7684	7714	7536	7536
q19	848	846	921	846
q20	1967	2027	1887	1887
q21	5433	4928	4833	4833
q22	1115	1063	1071	1063
Total cold run time: 52846 ms
Total hot run time: 50785 ms

@doris-robot

TPC-DS: Total hot run time: 191217 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f33e898f8658b2e6202e9570bf737ac3f987e472, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1429	1052	993	993
query2	6221	1895	1855	1855
query3	11146	4491	4564	4491
query4	54186	24464	22998	22998
query5	5329	530	489	489
query6	375	203	198	198
query7	5167	499	288	288
query8	326	244	237	237
query9	6610	2573	2550	2550
query10	447	309	258	258
query11	15191	15080	14813	14813
query12	157	108	108	108
query13	1177	523	400	400
query14	10514	6465	6576	6465
query15	202	198	191	191
query16	7081	694	490	490
query17	1104	716	586	586
query18	1530	420	330	330
query19	206	202	173	173
query20	135	130	125	125
query21	215	168	115	115
query22	4449	4552	4257	4257
query23	34010	33477	33398	33398
query24	5964	2432	2411	2411
query25	443	462	411	411
query26	739	279	154	154
query27	2104	478	341	341
query28	2816	2442	2428	2428
query29	591	563	434	434
query30	277	232	193	193
query31	874	866	822	822
query32	73	60	65	60
query33	468	378	309	309
query34	770	954	511	511
query35	826	864	774	774
query36	925	1004	897	897
query37	123	105	74	74
query38	4354	4365	4175	4175
query39	1499	1445	1437	1437
query40	213	113	100	100
query41	63	50	50	50
query42	135	110	108	108
query43	524	514	459	459
query44	1321	801	791	791
query45	185	175	164	164
query46	872	1065	659	659
query47	1826	1843	1772	1772
query48	383	427	325	325
query49	729	528	456	456
query50	712	742	428	428
query51	4241	4259	4292	4259
query52	114	111	99	99
query53	241	278	196	196
query54	494	492	414	414
query55	87	85	82	82
query56	301	272	272	272
query57	1156	1190	1147	1147
query58	280	236	248	236
query59	2794	2825	2670	2670
query60	283	273	272	272
query61	136	120	116	116
query62	731	739	690	690
query63	234	186	192	186
query64	2268	1054	681	681
query65	4524	4449	4373	4373
query66	750	385	294	294
query67	15797	15431	15327	15327
query68	7385	880	503	503
query69	532	285	274	274
query70	1164	1069	1105	1069
query71	481	298	283	283
query72	5971	3584	3729	3584
query73	1202	748	337	337
query74	9032	9173	8665	8665
query75	3812	3165	2672	2672
query76	4167	1190	743	743
query77	578	371	283	283
query78	10139	10043	9315	9315
query79	2519	826	600	600
query80	740	511	454	454
query81	475	255	216	216
query82	480	127	95	95
query83	280	219	159	159
query84	287	96	73	73
query85	798	357	319	319
query86	399	262	264	262
query87	4391	4551	4398	4398
query88	3451	2220	2207	2207
query89	406	307	291	291
query90	1906	206	208	206
query91	149	136	109	109
query92	76	58	60	58
query93	1740	1039	573	573
query94	660	424	295	295
query95	342	266	257	257
query96	481	569	271	271
query97	3323	3402	3273	3273
query98	221	205	203	203
query99	1433	1390	1305	1305
Total cold run time: 300467 ms
Total hot run time: 191217 ms

@doris-robot

ClickBench: Total hot run time: 31.53 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f33e898f8658b2e6202e9570bf737ac3f987e472, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.03	0.03	0.03
query2	0.08	0.03	0.03
query3	0.24	0.07	0.06
query4	1.62	0.11	0.10
query5	0.55	0.56	0.55
query6	1.20	0.72	0.72
query7	0.03	0.02	0.02
query8	0.04	0.03	0.03
query9	0.57	0.51	0.54
query10	0.57	0.58	0.57
query11	0.15	0.10	0.10
query12	0.14	0.11	0.12
query13	0.61	0.60	0.60
query14	2.80	2.69	2.80
query15	0.94	0.84	0.85
query16	0.39	0.37	0.37
query17	1.04	1.02	1.01
query18	0.21	0.20	0.20
query19	1.86	1.82	1.97
query20	0.01	0.01	0.01
query21	15.35	0.87	0.54
query22	0.77	1.08	0.61
query23	15.05	1.40	0.61
query24	6.99	1.72	1.42
query25	0.54	0.32	0.13
query26	0.53	0.16	0.15
query27	0.06	0.05	0.05
query28	10.16	0.84	0.43
query29	12.61	4.04	3.40
query30	0.25	0.10	0.07
query31	2.81	0.58	0.38
query32	3.23	0.55	0.46
query33	2.97	3.04	3.03
query34	15.76	5.14	4.54
query35	4.58	4.55	4.58
query36	0.68	0.49	0.48
query37	0.09	0.06	0.07
query38	0.05	0.04	0.04
query39	0.03	0.02	0.02
query40	0.17	0.13	0.12
query41	0.08	0.02	0.03
query42	0.04	0.03	0.02
query43	0.03	0.03	0.03
Total cold run time: 105.91 s
Total hot run time: 31.53 s

@morrySnow added the usercase, p0_b, dev/2.1.x, and dev/3.0.x labels Mar 7, 2025