[fix](iceberg)Use the correct schema for query #48838

Draft: wants to merge 17 commits into master

wuwenchi
Contributor

@wuwenchi commented Mar 9, 2025

What problem does this PR solve?

Problem Summary:

When a query specifies a snapshot, the schema corresponding to that snapshot should be used for parsing; otherwise, the latest schema should be used.
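The selection rule above can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the code in this PR: `TableMeta`, `SnapshotInfo`, `SchemaInfo` and `SchemaResolver` are hypothetical stand-ins. In the Iceberg Java API the same information comes from `Table.schema()` (the current schema), `Table.schemas()` (schemas by ID) and `Snapshot.schemaId()`, which may return `null` when the schema ID was not recorded; falling back to the latest schema in that case is an assumption of this sketch.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified stand-ins for Iceberg table metadata.
record SchemaInfo(int schemaId, String columns) {}

record SnapshotInfo(long snapshotId, Integer schemaId) {}

record TableMeta(int currentSchemaId,
                 Map<Integer, SchemaInfo> schemas,
                 Map<Long, SnapshotInfo> snapshots) {}

public class SchemaResolver {
    // Returns the schema used to parse a query, given an optional snapshot ID.
    static SchemaInfo resolve(TableMeta table, Optional<Long> snapshotId) {
        if (snapshotId.isEmpty()) {
            // No snapshot in the query: parse with the latest schema.
            return table.schemas().get(table.currentSchemaId());
        }
        SnapshotInfo snapshot = table.snapshots().get(snapshotId.get());
        if (snapshot == null) {
            throw new IllegalArgumentException("Unknown snapshot: " + snapshotId.get());
        }
        // Snapshot in the query: parse with the schema recorded for that snapshot.
        // Assumption: fall back to the latest schema if none was recorded.
        Integer id = snapshot.schemaId();
        return table.schemas().get(id != null ? id : table.currentSchemaId());
    }

    public static void main(String[] args) {
        TableMeta table = new TableMeta(
                1,
                Map.of(0, new SchemaInfo(0, "id, name"),
                       1, new SchemaInfo(1, "id, name, age")),
                Map.of(100L, new SnapshotInfo(100L, 0),
                       200L, new SnapshotInfo(200L, 1)));
        System.out.println(resolve(table, Optional.empty()).schemaId());  // 1
        System.out.println(resolve(table, Optional.of(100L)).schemaId()); // 0
    }
}
```

A query without a snapshot gets the latest schema (ID 1), while time travel to snapshot 100 is parsed with the older schema (ID 0) that was current when that snapshot was written.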

  1. For the HMS catalog type, the executor pool must also be initialized.
  2. When no snapshot is specified, the latest schema is used.
  3. When a snapshot is specified, the schema corresponding to that snapshot is used.
  4. When generating a scan node, save the schema information in the node instead of fetching it from the cache again, so that a later cache refresh cannot change the schema the query uses.
  5. Add the latest schema ID to IcebergSnapshotCacheValue, because the same snapshot may be read with two different schemas.
  6. When refreshing the schema, refresh all schemas of the related tables.
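Items 4 to 6 can be illustrated with a small cache model. This is a hedged sketch, not Doris code: `SchemaKey`, `SnapshotValue`, `ScanNode` and the cache methods are hypothetical names. It keys cached schemas by (table, schema ID), stores the schema ID next to the snapshot ID (in Iceberg, a schema change commits new table metadata without creating a new snapshot, which is one way the same snapshot can end up paired with two schemas), copies the schema into the scan node at planning time, and drops every cached schema version of a table on refresh.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class SchemaCacheSketch {
    // Hypothetical cache key: one entry per (table, schema ID) pair.
    record SchemaKey(String table, long schemaId) {}

    // Hypothetical snapshot cache value that also carries the schema ID, because
    // the same snapshot can be read with different schemas.
    record SnapshotValue(long snapshotId, long schemaId) {}

    // Hypothetical scan node that owns a copy of the schema it was planned with.
    record ScanNode(String table, long snapshotId, List<String> columns) {}

    private final Map<SchemaKey, List<String>> schemaCache = new ConcurrentHashMap<>();

    List<String> getSchema(String table, long schemaId,
                           Function<SchemaKey, List<String>> loader) {
        return schemaCache.computeIfAbsent(new SchemaKey(table, schemaId), loader);
    }

    // Planning reads the cache once and stores the result in the node; later
    // stages use node.columns() instead of going back to the cache.
    ScanNode planScan(String table, SnapshotValue snapshot,
                      Function<SchemaKey, List<String>> loader) {
        List<String> columns = getSchema(table, snapshot.schemaId(), loader);
        return new ScanNode(table, snapshot.snapshotId(), List.copyOf(columns));
    }

    // Refreshing a table drops every cached schema version of it, not only the latest.
    void refreshTable(String table) {
        schemaCache.keySet().removeIf(key -> key.table().equals(table));
    }

    int cachedEntries() {
        return schemaCache.size();
    }

    public static void main(String[] args) {
        SchemaCacheSketch cache = new SchemaCacheSketch();
        Function<SchemaKey, List<String>> loader = key ->
                key.schemaId() == 0 ? List.of("id", "name") : List.of("id", "name", "age");

        // The same snapshot (100) read with two different schemas.
        ScanNode oldScan = cache.planScan("db.t", new SnapshotValue(100L, 0L), loader);
        ScanNode newScan = cache.planScan("db.t", new SnapshotValue(100L, 1L), loader);
        System.out.println(cache.cachedEntries()); // 2

        cache.refreshTable("db.t");
        System.out.println(cache.cachedEntries()); // 0
        System.out.println(oldScan.columns());     // [id, name]
        System.out.println(newScan.columns());     // [id, name, age]
    }
}
```

Both scan nodes keep their columns after `refreshTable` empties the cache, so a refresh between planning and execution does not change the schema a query was planned with.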

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merges this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally including the specific error message), and how it was fixed.
  2. Which behaviors were modified: what the previous behavior was, what it is now, why it was changed, and what impact it may have.
  3. What features were added, and why.
  4. Which code was refactored, and why.
  5. Which functions were optimized and what is the difference before and after the optimization?

@wuwenchi marked this pull request as draft on March 9, 2025 03:43
@wuwenchi
Contributor Author

wuwenchi commented Mar 9, 2025

run buildall

@wuwenchi
Contributor Author

wuwenchi commented Mar 9, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 32443 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e6c3e306ff73553cd070f918d395bf816bc4337c, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17595	5200	5089	5089
q2	2041	290	173	173
q3	10446	1283	711	711
q4	10287	1070	545	545
q5	8684	2332	2355	2332
q6	215	168	136	136
q7	913	759	603	603
q8	9312	1283	1089	1089
q9	4988	4825	4747	4747
q10	6798	2321	1886	1886
q11	468	280	261	261
q12	350	348	214	214
q13	17764	3702	3075	3075
q14	227	221	204	204
q15	544	503	498	498
q16	645	632	585	585
q17	589	861	346	346
q18	7036	6608	6274	6274
q19	1272	946	560	560
q20	323	330	193	193
q21	2877	2146	1953	1953
q22	1046	1016	969	969
Total cold run time: 104420 ms
Total hot run time: 32443 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	5156	5134	5149	5134
q2	237	329	238	238
q3	2168	2704	2296	2296
q4	1453	1821	1417	1417
q5	4277	4105	4186	4105
q6	213	160	125	125
q7	1973	1970	1774	1774
q8	2659	2612	2581	2581
q9	7157	7181	7170	7170
q10	3009	3185	2747	2747
q11	582	514	504	504
q12	690	809	592	592
q13	3471	3816	3304	3304
q14	273	294	291	291
q15	540	507	499	499
q16	630	695	642	642
q17	1132	1562	1335	1335
q18	7828	7766	7483	7483
q19	865	835	875	835
q20	1956	2016	1855	1855
q21	5590	5024	4705	4705
q22	1096	1004	972	972
Total cold run time: 52955 ms
Total hot run time: 50604 ms
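Judging from the numbers, each row lists a cold run followed by two hot runs, and the last column is the faster of the two hot runs: the Round 1 totals (104420 ms cold, 32443 ms hot) are exactly the sums of the first and last timing columns. The sketch below recomputes them from the rows; `BenchTotals` and its methods are hypothetical and are not part of the Doris tpch-tools scripts.

```java
import java.util.List;

public class BenchTotals {
    // One report row: a cold run followed by two hot runs, in milliseconds.
    record Row(String query, long cold, long hot1, long hot2) {
        // The last column of the report: the faster of the two hot runs.
        long bestHot() {
            return Math.min(hot1, hot2);
        }
    }

    static long totalCold(List<Row> rows) {
        return rows.stream().mapToLong(Row::cold).sum();
    }

    static long totalHot(List<Row> rows) {
        return rows.stream().mapToLong(Row::bestHot).sum();
    }

    public static void main(String[] args) {
        // TPC-H Round 1 rows copied from the report (the derived last column is omitted).
        List<Row> round1 = List.of(
                new Row("q1", 17595, 5200, 5089), new Row("q2", 2041, 290, 173),
                new Row("q3", 10446, 1283, 711), new Row("q4", 10287, 1070, 545),
                new Row("q5", 8684, 2332, 2355), new Row("q6", 215, 168, 136),
                new Row("q7", 913, 759, 603), new Row("q8", 9312, 1283, 1089),
                new Row("q9", 4988, 4825, 4747), new Row("q10", 6798, 2321, 1886),
                new Row("q11", 468, 280, 261), new Row("q12", 350, 348, 214),
                new Row("q13", 17764, 3702, 3075), new Row("q14", 227, 221, 204),
                new Row("q15", 544, 503, 498), new Row("q16", 645, 632, 585),
                new Row("q17", 589, 861, 346), new Row("q18", 7036, 6608, 6274),
                new Row("q19", 1272, 946, 560), new Row("q20", 323, 330, 193),
                new Row("q21", 2877, 2146, 1953), new Row("q22", 1046, 1016, 969));
        System.out.println("Total cold run time: " + totalCold(round1) + " ms"); // 104420 ms
        System.out.println("Total hot run time: " + totalHot(round1) + " ms");   // 32443 ms
    }
}
```

The same rule reproduces the Round 2 hot total (50604 ms), and the ClickBench totals follow the same pattern in seconds.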

@doris-robot

TPC-DS: Total hot run time: 186224 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e6c3e306ff73553cd070f918d395bf816bc4337c, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	965	378	399	378
query2	6528	1918	1898	1898
query3	6796	226	216	216
query4	26428	23544	23415	23415
query5	4312	641	474	474
query6	296	191	186	186
query7	4595	512	294	294
query8	300	244	248	244
query9	8598	2578	2541	2541
query10	483	334	261	261
query11	15820	15092	14910	14910
query12	162	111	109	109
query13	1660	546	412	412
query14	9688	6432	6456	6432
query15	211	195	173	173
query16	7235	607	462	462
query17	1224	736	567	567
query18	1960	418	307	307
query19	198	205	165	165
query20	120	118	114	114
query21	212	122	106	106
query22	4293	4429	4096	4096
query23	34038	33246	33269	33246
query24	7792	2393	2474	2393
query25	577	445	390	390
query26	1229	271	161	161
query27	2206	489	331	331
query28	4011	2441	2377	2377
query29	758	560	427	427
query30	280	212	194	194
query31	962	844	786	786
query32	78	65	64	64
query33	555	357	302	302
query34	795	870	503	503
query35	805	819	748	748
query36	959	976	880	880
query37	120	100	77	77
query38	4183	4151	4114	4114
query39	1463	1410	1373	1373
query40	225	116	108	108
query41	69	53	52	52
query42	118	105	108	105
query43	486	512	485	485
query44	1337	817	794	794
query45	176	171	163	163
query46	852	1049	647	647
query47	1749	1818	1690	1690
query48	376	423	293	293
query49	788	494	416	416
query50	695	742	418	418
query51	4199	4197	4195	4195
query52	104	105	96	96
query53	231	263	190	190
query54	488	492	419	419
query55	85	80	83	80
query56	314	282	260	260
query57	1111	1099	1056	1056
query58	254	237	237	237
query59	2583	2704	2468	2468
query60	277	262	253	253
query61	119	116	138	116
query62	813	760	679	679
query63	247	194	197	194
query64	4417	998	671	671
query65	4411	4373	4342	4342
query66	1134	409	299	299
query67	15752	15472	15488	15472
query68	8041	880	521	521
query69	468	296	275	275
query70	1215	1149	1030	1030
query71	481	294	277	277
query72	5593	3576	3760	3576
query73	757	734	348	348
query74	9016	9205	9027	9027
query75	3785	3214	2765	2765
query76	3700	1178	760	760
query77	785	376	295	295
query78	9996	10035	9403	9403
query79	2806	827	588	588
query80	632	521	448	448
query81	492	261	220	220
query82	683	128	99	99
query83	183	181	164	164
query84	241	101	72	72
query85	781	356	305	305
query86	381	304	275	275
query87	4629	4624	4337	4337
query88	3892	2233	2257	2233
query89	389	319	291	291
query90	1860	230	223	223
query91	140	144	111	111
query92	76	62	58	58
query93	2021	1068	588	588
query94	660	425	308	308
query95	363	275	266	266
query96	483	605	278	278
query97	3290	3454	3302	3302
query98	239	205	202	202
query99	1375	1386	1255	1255
Total cold run time: 275758 ms
Total hot run time: 186224 ms

@doris-robot

ClickBench: Total hot run time: 31.66 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit e6c3e306ff73553cd070f918d395bf816bc4337c, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.03	0.03	0.03
query2	0.07	0.04	0.03
query3	0.24	0.07	0.06
query4	1.62	0.10	0.11
query5	0.58	0.55	0.56
query6	1.18	0.72	0.71
query7	0.03	0.02	0.02
query8	0.04	0.03	0.03
query9	0.59	0.51	0.51
query10	0.56	0.62	0.58
query11	0.15	0.11	0.10
query12	0.14	0.11	0.11
query13	0.63	0.61	0.60
query14	2.68	2.67	2.82
query15	0.93	0.86	0.86
query16	0.38	0.38	0.38
query17	1.03	1.00	1.04
query18	0.21	0.20	0.20
query19	1.97	1.81	1.97
query20	0.02	0.01	0.01
query21	15.36	0.94	0.54
query22	0.77	1.27	0.65
query23	14.86	1.37	0.67
query24	6.42	1.73	1.79
query25	0.51	0.32	0.07
query26	0.55	0.15	0.15
query27	0.05	0.05	0.05
query28	10.11	0.81	0.42
query29	12.55	3.98	3.30
query30	0.26	0.10	0.06
query31	2.81	0.58	0.38
query32	3.23	0.55	0.46
query33	2.96	2.99	3.09
query34	15.70	5.11	4.51
query35	4.50	4.52	4.51
query36	0.66	0.49	0.48
query37	0.09	0.07	0.06
query38	0.05	0.04	0.04
query39	0.03	0.03	0.02
query40	0.17	0.13	0.14
query41	0.07	0.03	0.03
query42	0.03	0.02	0.03
query43	0.03	0.03	0.03
Total cold run time: 104.85 s
Total hot run time: 31.66 s
