[MINOR] Add tests for auto keygen for immutable and mutable workflow #13889

linliu-code · 2025-09-13T13:52:49Z

Change Logs

At least 20 iterations of operations have been added, either immutable or mutable.
Clean is verified.
Archive is verified.
Clustering is verified.

Impact

Verifies that auto keygen works as expected when there are tens of immutable or mutable operations with table services enabled.

Risk level (write none, low medium or high below)

None.

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

The config description must be updated if new configs are added or the default value of the configs are changed.
Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the instruction to make changes to the website.

Contributor's checklist

Read through contributor's guide
Change Logs and Impact were stated clearly
Adequate tests were added if applicable
CI passed

nsivabalan · 2025-09-13T16:37:53Z

hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestAutoKeyGenForSQL.scala

+    // Validate: table services are triggered.
+    assertFalse(metaClient.getActiveTimeline.getCleanerTimeline.getInstants.isEmpty)
+    assertFalse(metaClient.getArchivedTimeline.getInstants.isEmpty)
+    assertFalse(metaClient.getActiveTimeline.getCompletedReplaceTimeline.getInstants.isEmpty)


Can we add validation for the data?
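
One way to read this request, as a minimal sketch: keep the expected rows in the test next to the SQL statements and compare them with what the table returns, rather than asserting only on timelines. The `Trip` case class and the hard-coded `actual` rows are placeholders invented for illustration; in the real test the actual rows would presumably be collected from a query against `$tableName`.

```scala
object DataValidationSketch {
  // Hypothetical row shape; the real test's columns may differ.
  final case class Trip(ts: Long, rider: String, fare: Double)

  // Returns (rows missing from the table, rows that should not be there).
  def diff(expected: Set[Trip], actual: Set[Trip]): (Set[Trip], Set[Trip]) =
    (expected -- actual, actual -- expected)

  def main(args: Array[String]): Unit = {
    val expected = Set(Trip(1L, "rider-A", 10.0), Trip(2L, "rider-B", 20.0))
    // Stand-in for rows read back from the table.
    val actual = Set(Trip(2L, "rider-B", 20.0), Trip(1L, "rider-A", 10.0))
    val (missing, unexpected) = diff(expected, actual)
    assert(missing.isEmpty && unexpected.isEmpty,
      s"missing=$missing unexpected=$unexpected")
  }
}
```

A set comparison ignores duplicate rows, so it complements a row-count assertion and does not replace it.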

nsivabalan · 2025-09-13T16:41:03Z

hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestAutoKeyGenForSQL.scala

+
+    for (i <- 0 until 10) {
+      val ts: Long = 1695115999911L + i + 1
+      val rider: String = s"rider-${'A' + new Random().nextInt(8)}"


Can we initialize Random once for the test and use it everywhere?
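
A minimal sketch of what a single shared generator could look like, assuming `scala.util.Random`; the object name, the seed and the `nextRider` helper are made up for illustration. It also converts the sum back to a `Char`: in Scala, `'A' + n` with an `Int` `n` is an `Int`, so interpolating it directly yields a number and not a letter.

```scala
import scala.util.Random

object SharedRandomSketch {
  // One seeded generator for the whole test, so a failing run can be reproduced.
  val rng = new Random(42L)

  // Convert back to Char so the suffix is a letter in A..H.
  def nextRider(): String = s"rider-${('A' + rng.nextInt(8)).toChar}"

  def main(args: Array[String]): Unit = {
    val riders = (0 until 10).map(_ => nextRider())
    riders.foreach(println)
  }
}
```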

nsivabalan · 2025-09-13T16:42:05Z

hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestAutoKeyGenForSQL.scala

+
+    // Validate: data integrity
+    val noRecords = spark.sql(s"SELECT * FROM $tableName").count()
+    assertEquals(16, noRecords)


Can we declare a set for the list of riders to update and delete, just to ensure the random does not produce the same rider records again?
That way we know for sure that we are going to update or delete two different rider entries.
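
As a sketch of the idea, the riders to touch can be drawn without replacement from a fixed pool, which guarantees they differ. The pool `rider-A` to `rider-H` and the helper name `pickDistinct` are assumptions for illustration, not taken from the patch.

```scala
import scala.util.Random

object DistinctRidersSketch {
  // Assumed pool of riders; the real test may use a different set.
  val allRiders: Seq[String] = ('A' to 'H').map(c => s"rider-$c")

  // Shuffle the pool and take the first n, so no rider is picked twice.
  def pickDistinct(rng: Random, n: Int): Seq[String] =
    rng.shuffle(allRiders).take(n)

  def main(args: Array[String]): Unit = {
    val rng = new Random(7L)
    val Seq(toUpdate, toDelete) = pickDistinct(rng, 2)
    assert(toUpdate != toDelete)
    println(s"update=$toUpdate delete=$toDelete")
  }
}
```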

nsivabalan · 2025-09-13T16:43:03Z

...-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestAutoKeyGeneration.scala

+    assertTrue(metaClient.getTableConfig.getRecordKeyFields.isEmpty)
+    // Validate all records are unique.
+    val numRecords = spark.read.format("hudi").load(basePath).count()
+    assertEquals(45, numRecords)


Can we ascertain that the data is intact?

nsivabalan · 2025-09-13T16:44:10Z

...-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestAutoKeyGeneration.scala

+    assertFalse(metaClient.getArchivedTimeline.getInstants.isEmpty)
+    assertFalse(metaClient.getActiveTimeline.getCompletedReplaceTimeline.getInstants.isEmpty)
+    if (tableType == HoodieTableType.MERGE_ON_READ) {
+      assertFalse(metaClient.getActiveTimeline.getCommitsAndCompactionTimeline.empty())


We should be able to ascertain the exact number of compaction commits.
Same for the clean commits and replace commits above, instead of just checking for ~isEmpty.
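
To illustrate the difference between a non-empty check and an exact count, here is a small sketch over a simplified timeline model. The `Instant` case class, the action labels and the sample timeline are stand-ins invented for this example, not Hudi's classes or real test output. In the actual tests the counts would come from the timelines already used above; `HoodieTimeline` exposes a `countInstants()` method, as far as I know.

```scala
object TimelineCountSketch {
  // Simplified stand-in for a completed timeline instant; not Hudi's class.
  final case class Instant(timestamp: String, action: String)

  // Number of instants per action.
  def countByAction(instants: Seq[Instant]): Map[String, Int] =
    instants.groupBy(_.action).map { case (action, xs) => action -> xs.size }

  def main(args: Array[String]): Unit = {
    // Made-up timeline, for illustration only.
    val timeline = Seq(
      Instant("001", "deltacommit"),
      Instant("002", "deltacommit"),
      Instant("003", "commit"),
      Instant("004", "clean"),
      Instant("005", "replacecommit"))
    val counts = countByAction(timeline)
    // Exact expectations catch a table service that ran too often or too rarely.
    assert(counts.getOrElse("clean", 0) == 1)
    assert(counts.getOrElse("replacecommit", 0) == 1)
    assert(counts.getOrElse("commit", 0) == 1)
  }
}
```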

nsivabalan · 2025-09-13T16:44:24Z

...-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestAutoKeyGeneration.scala

+    assertFalse(metaClient.getArchivedTimeline.empty())
+    assertFalse(metaClient.getActiveTimeline.getCompletedReplaceTimeline.empty())
+    if (tableType == HoodieTableType.MERGE_ON_READ) {
+      assertFalse(metaClient.getActiveTimeline.getCommitsAndCompactionTimeline.empty())


We should be able to ascertain the exact number of compaction commits.
Same for the clean commits and replace commits above, instead of just checking for ~isEmpty.

nsivabalan · 2025-09-13T16:44:36Z

hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestAutoKeyGenForSQL.scala

+    assertFalse(metaClient.getArchivedTimeline.getInstants.isEmpty)
+    assertFalse(metaClient.getActiveTimeline.getCompletedReplaceTimeline.getInstants.isEmpty)
+    if (tableType == HoodieTableType.MERGE_ON_READ) {
+      assertFalse(metaClient.getActiveTimeline.getCommitsAndCompactionTimeline.empty())


We should be able to ascertain the exact number of compaction commits.
Same for the clean commits and replace commits above, instead of just checking for ~isEmpty.

nsivabalan · 2025-09-13T16:45:16Z

...atasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestStructuredStreaming.scala

+  @ParameterizedTest
+  @EnumSource(value = classOf[HoodieTableType])
+  def testStructuredStreamingWithAutoKeyGen(tableType: HoodieTableType): Unit = {
+    val (sourcePath, destPath) = initStreamingSourceAndDestPath("source", "dest")


Can we try a mix of bulk insert and insert operations?

nsivabalan · 2025-09-13T16:46:15Z

...utilities/src/test/java/org/apache/hudi/utilities/deltastreamer/TestHoodieDeltaStreamer.java


+  @ParameterizedTest
+  @EnumSource(HoodieTableType.class)
+  void testDeltaSyncWithAutoKeyGenAndImmutableOperations(HoodieTableType tableType) throws Exception {


Can we try a mix of bulk insert, insert and upsert?
Even though we might set upsert, for auto key gen, Hudi should automatically switch it to insert.

Let's validate that from the commit metadata as well.
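
The check being asked for could be shaped roughly as follows. This is only a sketch: `CommitInfo`, its fields and the operation strings are hypothetical stand-ins for whatever the test reads from the commit metadata.

```scala
object OperationTypeSketch {
  // Hypothetical summary of one commit's metadata; not Hudi's class.
  final case class CommitInfo(instant: String, requestedOp: String, recordedOp: String)

  // Commits whose recorded operation is still "upsert".
  def stillUpsert(commits: Seq[CommitInfo]): Seq[CommitInfo] =
    commits.filter(_.recordedOp.equalsIgnoreCase("upsert"))

  def main(args: Array[String]): Unit = {
    // Made-up commits: the writer asked for a mix of operations.
    val commits = Seq(
      CommitInfo("001", "bulk_insert", "bulk_insert"),
      CommitInfo("002", "insert", "insert"),
      CommitInfo("003", "upsert", "insert"))
    val offenders = stillUpsert(commits)
    assert(offenders.isEmpty, s"unexpected upsert commits: $offenders")
  }
}
```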

hudi-bot · 2025-09-13T17:05:26Z

CI report:

b81f6d4 Azure: FAILURE

Bot commands

@hudi-bot supports the following commands:

@hudi-bot run azure re-run the last Azure build

nsivabalan · 2025-09-20T14:42:41Z

Hey @linliu-code: this patch is still waiting for you to address the feedback.
@jonvex: once Lin has addressed the feedback, can you take this home?

github-actions bot added the size:L PR with lines of changes in (300, 1000] label Sep 13, 2025

linliu-code force-pushed the add_function_tests_for_auto_keygen branch from 2624dd8 to 7d8b75a Compare September 13, 2025 15:55

nsivabalan reviewed Sep 13, 2025


Test auto key gen with table services

b81f6d4

linliu-code force-pushed the add_function_tests_for_auto_keygen branch from 7d8b75a to b81f6d4 Compare September 13, 2025 16:59

nsivabalan added the release-1.1.0 label Sep 19, 2025
