Skip to content

Conversation

AndreKurait
Copy link
Member

@AndreKurait AndreKurait commented Mar 25, 2025

Description

Fix RFS Shutdown logic during exception cases

Set kafka tests as isolated

Remove deprecated usage of KafkaContainer in favor of ConfluentKafkaContainer

Issues Resolved

MIGRATIONS-2461
MIGRATIONS-2460

Testing

GHA

Check List

  • New functionality includes testing
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Andre Kurait <[email protected]>
@AndreKurait AndreKurait changed the title Flaky tests Fix RFS Shutdown logic during exception cases and set kafka tests as isolated Mar 25, 2025
Copy link
Member

@peternied peternied left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤞 All my comments are optional, really would like to see this get our CI unblocked.

Comment on lines +450 to +451
if (successorWorkItemIds.size() == 1 && workItemId.equals(successorWorkItemIds.get(0))) {
log.atWarn().setMessage("No real progress was made for work item: {}. Will retry with larger timeout").addArgument(workItemId).log();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a strange case, it seems like the getSuccessorWorkItemIds should error out internally before returning up to this level. Can we rework this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getSuccessorWorkItemIds does error out, but this should really be a warn instead of an error.

The case here is that the lease is just long enough to send one request to the target cluster successfully, getSuccessorWorkItemIds does throw.

With the new try catch, this would be caught, but this isn't an "Error" case, more of a Warn which is why we shouldn't rely on that exception in getSuccessorWorkItemIds

Comment on lines +463 to 468
} else {
log.atWarn().setMessage("No progress cursor to create successor work items from. This can happen when" +
"downloading and unpacking shard takes longer than the lease").log();
log.atWarn().setMessage("Skipping creation of successor work item to retry the existing one with more time")
.log();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Can we invert the flow of control and return if the precondition fails right away?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we'd have only one 'level' of if/elseif/else blocks for each function, makes it much cleaner to read.

@AndreKurait AndreKurait marked this pull request as ready for review March 25, 2025 18:55
@AndreKurait AndreKurait merged commit 0faa2da into opensearch-project:main Mar 25, 2025
54 checks passed
Copy link

codecov bot commented Mar 25, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 0.00%. Comparing base (823efa2) to head (23954bc).
Report is 6 commits behind head on main.

Additional details and impacted files
@@     Coverage Diff      @@
##   main   #1385   +/-   ##
============================
============================

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants