linliu-code

Change Logs

  1. At least 20 iterations of operations, either immutable or mutable, have been added.
  2. Clean is verified.
  3. Archive is verified.
  4. Clustering is verified.

Impact

Verifies that auto keygen works as expected when there are tens of immutable or mutable operations with table services enabled.

Risk level (write none, low, medium or high below)

None.

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:L PR with lines of changes in (300, 1000] label Sep 13, 2025
@linliu-code linliu-code force-pushed the add_function_tests_for_auto_keygen branch from 2624dd8 to 7d8b75a Compare September 13, 2025 15:55
// Validate: table services are triggered.
assertFalse(metaClient.getActiveTimeline.getCleanerTimeline.getInstants.isEmpty)
assertFalse(metaClient.getArchivedTimeline.getInstants.isEmpty)
assertFalse(metaClient.getActiveTimeline.getCompletedReplaceTimeline.getInstants.isEmpty)

Can we add validation for the data?


for (i <- 0 until 10) {
val ts: Long = 1695115999911L + i + 1
val rider: String = s"rider-${'A' + new Random().nextInt(8)}"

Can we initialize Random once for the test and reuse it everywhere?
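
A minimal sketch of what this suggestion could look like, assuming a fixed seed is acceptable for the test. The names `RiderGen` and `nextRider`, the seed value, and the choice of `scala.util.Random` are hypothetical and not taken from the PR. The sketch also converts the sum back to a `Char`, because in Scala `'A' + n` evaluates to an `Int` and would otherwise be interpolated as a number.

```scala
import scala.util.Random

object RiderGen {
  // One shared, seeded generator for the whole test.
  // The seed value is an arbitrary assumption; any fixed seed makes runs reproducible.
  val random = new Random(42L)

  // Returns one of "rider-A" .. "rider-H".
  // `'A' + n` is an Int in Scala, so convert back to Char before interpolating.
  def nextRider(): String = {
    val letter = ('A' + random.nextInt(8)).toChar
    s"rider-$letter"
  }
}
```

Every call site would then use `RiderGen.nextRider()` instead of constructing a new `Random` per iteration.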


// Validate: data integrity
val noRecords = spark.sql(s"SELECT * FROM $tableName").count()
assertEquals(16, noRecords)

Can we declare a set of riders to update and delete, to ensure that Random does not produce the same rider twice?
That way we know for sure that we are going to update or delete two different rider entries.
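
One possible way to guarantee distinct riders is sketched below, assuming the rider names range over `rider-A` to `rider-H` as implied by the `nextInt(8)` call quoted earlier in this review. `DistinctRiders` and `pick` are hypothetical helper names, not code from the PR.

```scala
import scala.util.Random

object DistinctRiders {
  // Candidate rider names "rider-A" .. "rider-H" (range assumed from the reviewed snippet).
  val allRiders: Seq[String] = ('A' to 'H').map(c => s"rider-$c")

  // Shuffle once and take disjoint slices, so the update and delete sets can never overlap.
  def pick(random: Random, numUpdates: Int, numDeletes: Int): (Set[String], Set[String]) = {
    require(numUpdates + numDeletes <= allRiders.size, "not enough distinct riders")
    val shuffled = random.shuffle(allRiders)
    val toUpdate = shuffled.take(numUpdates).toSet
    val toDelete = shuffled.slice(numUpdates, numUpdates + numDeletes).toSet
    (toUpdate, toDelete)
  }
}
```

With the riders fixed up front, the expected record count after the update and delete steps can be computed from the two sets rather than hard-coded.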

assertTrue(metaClient.getTableConfig.getRecordKeyFields.isEmpty)
// Validate all records are unique.
val numRecords = spark.read.format("hudi").load(basePath).count()
assertEquals(45, numRecords)

Can we ascertain that the data is intact?

assertFalse(metaClient.getArchivedTimeline.getInstants.isEmpty)
assertFalse(metaClient.getActiveTimeline.getCompletedReplaceTimeline.getInstants.isEmpty)
if (tableType == HoodieTableType.MERGE_ON_READ) {
assertFalse(metaClient.getActiveTimeline.getCommitsAndCompactionTimeline.empty())

We should be able to ascertain the exact number of compaction commits.
Same for the clean commits and replace commits above, instead of just checking that the timeline is not empty.

assertFalse(metaClient.getArchivedTimeline.empty())
assertFalse(metaClient.getActiveTimeline.getCompletedReplaceTimeline.empty())
if (tableType == HoodieTableType.MERGE_ON_READ) {
assertFalse(metaClient.getActiveTimeline.getCommitsAndCompactionTimeline.empty())

We should be able to ascertain the exact number of compaction commits.
Same for the clean commits and replace commits above, instead of just checking that the timeline is not empty.

assertFalse(metaClient.getArchivedTimeline.getInstants.isEmpty)
assertFalse(metaClient.getActiveTimeline.getCompletedReplaceTimeline.getInstants.isEmpty)
if (tableType == HoodieTableType.MERGE_ON_READ) {
assertFalse(metaClient.getActiveTimeline.getCommitsAndCompactionTimeline.empty())

We should be able to ascertain the exact number of compaction commits.
Same for the clean commits and replace commits above, instead of just checking that the timeline is not empty.

@ParameterizedTest
@EnumSource(value = classOf[HoodieTableType])
def testStructuredStreamingWithAutoKeyGen(tableType: HoodieTableType): Unit = {
val (sourcePath, destPath) = initStreamingSourceAndDestPath("source", "dest")

Can we try a mix of bulk insert and insert operations?


@ParameterizedTest
@EnumSource(HoodieTableType.class)
void testDeltaSyncWithAutoKeyGenAndImmutableOperations(HoodieTableType tableType) throws Exception {

Can we try a mix of bulk insert, insert and upsert?
Even though we might set upsert, for auto keygen Hudi should automatically switch it to insert.

Let's validate that from the commit metadata as well.

@linliu-code linliu-code force-pushed the add_function_tests_for_auto_keygen branch from 7d8b75a to b81f6d4 Compare September 13, 2025 16:59

@nsivabalan

Hey @linliu-code: this patch is still waiting for you to address the feedback.
@jonvex: once Lin has addressed the feedback, can you take this home?

Labels

release-1.1.0 size:L PR with lines of changes in (300, 1000]
