[SPARK-42746][SQL][FOLLOWUP] Fixing potential flakiness in ListAgg golden files #50357

mihailom-db · 2025-03-23T15:29:16Z

What changes were proposed in this pull request?

The original PR (#50338) fixed some of the flakiness, but more tests could still be flaky. This PR fixes those as well.

Why are the changes needed?

Golden file tests should not rely on buffer ordering. Doing so can make the tests flaky, and we need to fix it so that we do not waste resources.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Test-only change.

Was this patch authored or co-authored using generative AI tooling?

No.

beliefer · 2025-03-24T02:35:08Z

sql/core/src/test/resources/sql-tests/inputs/listagg-collations.sql

@@ -1,13 +1,13 @@
 -- Test cases with utf8_binary
 SELECT listagg(c1) WITHIN GROUP (ORDER BY c1 COLLATE utf8_binary) FROM (VALUES ('a'), ('A'), ('b'), ('B')) AS t(c1);
-SELECT listagg(DISTINCT c1 COLLATE utf8_binary) FROM (VALUES ('a'), ('A'), ('b'), ('B')) AS t(c1);
+WITH t(c1) AS (SELECT listagg(DISTINCT col1 COLLATE utf8_binary) FROM (VALUES ('a'), ('A'), ('b'), ('B'))) SELECT len(c1), regexp_count(c1, 'a'), regexp_count(c1, 'b'), regexp_count(c1, 'A'), regexp_count(c1, 'B') FROM t;


Could you explain why you introduce this?

In general, tests have no guarantee that the values in a column come back in the order they were inserted unless the query includes an ORDER BY clause. This can make tests flaky and force golden files to be regenerated, so this PR only makes sure we do not have to rerun CI for potentially flaky tests. Also, it is better to prevent flakiness up front than to leave it for detection later.
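To illustrate why the rewritten query is stable, here is a minimal Python sketch (not Spark code) of the same idea. It assumes that `listagg(DISTINCT ...)` concatenates the four values without a delimiter and may emit them in any order; the `fingerprint` helper is a hypothetical stand-in for the `len(c1)` and `regexp_count(c1, ...)` columns in the new test.

```python
import itertools
import re


def fingerprint(s: str) -> tuple:
    """Order-insensitive summary: total length plus a count per expected value.

    Mirrors len(c1), regexp_count(c1, 'a'), regexp_count(c1, 'b'),
    regexp_count(c1, 'A'), regexp_count(c1, 'B') from the rewritten query.
    """
    return (len(s),) + tuple(len(re.findall(ch, s)) for ch in ("a", "b", "A", "B"))


values = ["a", "A", "b", "B"]

# Every concatenation order the aggregate could produce (assumption: no delimiter).
candidates = {"".join(p) for p in itertools.permutations(values)}

# The raw strings differ, but the fingerprint is identical for all of them.
fingerprints = {fingerprint(c) for c in candidates}
```

Comparing the raw string would give 24 possible expected outputs, while the fingerprint collapses them to a single row, which is what a golden file can safely record.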

I think we do not need this one if listagg is deterministic.

Spark aggregation is not deterministic by nature: the shuffle reader fetches shuffle blocks in a random order, and they can also arrive in a random order.
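The point about shuffle block arrival can be sketched with a toy model in Python. This is not how Spark implements aggregation; it only assumes that each partition contributes a partial buffer and that the final result concatenates buffers in whatever order they arrive. The partition layout and the `merge_in_order` helper are hypothetical.

```python
import itertools


def merge_in_order(buffers):
    """Toy merge: concatenate partial buffers in the order they arrive."""
    merged = []
    for buf in buffers:
        merged.extend(buf)
    return "".join(merged)


# Hypothetical partial aggregation buffers, one per partition.
partitions = [["a", "A"], ["b"], ["B"]]

# Enumerate every possible arrival order instead of sampling randomly,
# so the sketch itself stays deterministic.
unordered_results = {
    merge_in_order(order) for order in itertools.permutations(partitions)
}

# Sorting the values before concatenation (what an explicit ORDER BY provides)
# removes the dependence on arrival order.
ordered_results = {"".join(sorted(r)) for r in unordered_results}
```

In this model the unordered merge yields a different string for each arrival order, while the sorted variant always yields the same one. That is why a test needs either an explicit ordering or an order-insensitive assertion.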

cloud-fan · 2025-03-24T11:27:41Z

thanks, merging to master/4.0!

…lden files

### What changes were proposed in this pull request?
Original PR (#50338) fixed some of the flakiness, but there were more tests that could potentially be flaky. This PR is fixing these issues.

### Why are the changes needed?
We should not rely in golden files tests on buffer ordering. This could lead to flakiness in tests and we need to fix it, so that we do not waste resources.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Test only change.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #50357 from mihailom-db/listaggfollowup2.

Authored-by: Mihailo Milosevic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit e4b6e9a)
Signed-off-by: Wenchen Fan <[email protected]>


LuciferYang · 2025-03-26T11:31:03Z

@mihailom-db The daily test for branch-4.0 still failed yesterday, and by the time it ran the current pull request should already have been merged.

https://github.com/apache/spark/actions/runs/14059194162/job/39365632364

mihailom-db · 2025-03-26T11:51:08Z

It seems the maxRows field for CTEs was not backported to 4.0, so the ported golden file was wrong. A simple regeneration will fix it. Opening a PR soon.

Fix more flakiness

dc3b203

github-actions bot added the SQL label Mar 23, 2025

mihailom-db changed the title ~~[SPARK-42746][SQL] Fixing potential flakiness in ListAgg golden files~~ [FOLLOW-UP][SPARK-42746][SQL] Fixing potential flakiness in ListAgg golden files Mar 23, 2025

Regen files

63cfa31

beliefer changed the title ~~[FOLLOW-UP][SPARK-42746][SQL] Fixing potential flakiness in ListAgg golden files~~ [SPARK-42746][SQL][FOLLOWUP] Fixing potential flakiness in ListAgg golden files Mar 24, 2025

beliefer reviewed Mar 24, 2025


cloud-fan closed this in e4b6e9a Mar 24, 2025
