Fix RFS Shutdown logic during exception cases and set kafka tests as isolated #1385

AndreKurait · 2025-03-25T18:15:43Z

Description

Fix RFS Shutdown logic during exception cases

Set kafka tests as isolated

Remove deprecated usage of KafkaContainer in favor of ConfluentKafkaContainer

Issues Resolved

MIGRATIONS-2461
MIGRATIONS-2460

Testing

GHA

Check List

New functionality includes testing
Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Andre Kurait <[email protected]>

peternied

🤞 All my comments are optional; I'd really like to see this get our CI unblocked.

peternied · 2025-03-25T18:42:47Z

DocumentsFromSnapshotMigration/src/main/java/org/opensearch/migrations/RfsMigrateDocuments.java

+                if (successorWorkItemIds.size() == 1 && workItemId.equals(successorWorkItemIds.get(0))) {
+                    log.atWarn().setMessage("No real progress was made for work item: {}. Will retry with larger timeout").addArgument(workItemId).log();


This is a strange case; it seems like getSuccessorWorkItemIds should error out internally before returning up to this level. Can we rework this?

AndreKurait · 2025-03-25

getSuccessorWorkItemIds does error out, but this should really be a warning instead of an error.

The case here is that the lease is just long enough to send one request to the target cluster successfully; in that situation getSuccessorWorkItemIds does throw.

With the new try/catch, this would be caught, but this isn't an "Error" case; it is more of a "Warn", which is why we shouldn't rely on that exception in getSuccessorWorkItemIds.
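
To make the distinction above concrete, here is a minimal sketch of the explicit check, separate from whatever getSuccessorWorkItemIds throws. Only `workItemId` and `successorWorkItemIds` come from the diff; the `SuccessorCheck` class, the `Outcome` enum, and the sample IDs are hypothetical, and the real code logs through its own logger rather than returning a value.

```java
import java.util.List;

/** Hypothetical helper; not part of RfsMigrateDocuments. */
final class SuccessorCheck {
    /** Illustrative outcomes; the real code logs at warn level instead. */
    enum Outcome { NO_REAL_PROGRESS, PROGRESS_MADE }

    /**
     * Same condition as the diff: the only successor is the work item itself,
     * so the run did not advance and should be retried, not treated as an error.
     */
    static Outcome classify(String workItemId, List<String> successorWorkItemIds) {
        if (successorWorkItemIds.size() == 1 && workItemId.equals(successorWorkItemIds.get(0))) {
            return Outcome.NO_REAL_PROGRESS;
        }
        return Outcome.PROGRESS_MADE;
    }
}
```

With made-up IDs, `SuccessorCheck.classify("item-A", List.of("item-A"))` reports no real progress, while a list containing a different successor reports progress. Keeping the check explicit lets the caller pick the log level without depending on an exception path.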

peternied · 2025-03-25T18:44:39Z

DocumentsFromSnapshotMigration/src/main/java/org/opensearch/migrations/RfsMigrateDocuments.java

+            } else {
+                log.atWarn().setMessage("No progress cursor to create successor work items from. This can happen when " +
+                        "downloading and unpacking shard takes longer than the lease").log();
+                log.atWarn().setMessage("Skipping creation of successor work item to retry the existing one with more time")
+                        .log();
            }


Nit: Can we invert the flow of control and return if the precondition fails right away?

Ideally we'd have only one 'level' of if/elseif/else blocks for each function, makes it much cleaner to read.
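
A rough sketch of the inverted flow being suggested, assuming the progress cursor can be modeled as an `Optional`. The class, the method, the `progressCursor` parameter, and the returned strings are all hypothetical stand-ins for the real logging and work-item creation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical illustration of a guard clause; not the actual RFS code. */
final class GuardClauseSketch {
    /** Returns the actions taken, as strings, so the control flow is easy to inspect. */
    static List<String> planNextSteps(Optional<Integer> progressCursor) {
        List<String> actions = new ArrayList<>();

        // Guard clause: handle the failed precondition first and return right away.
        if (progressCursor.isEmpty()) {
            actions.add("WARN: No progress cursor to create successor work items from");
            actions.add("WARN: Skipping creation of successor work item");
            return actions;
        }

        // Main path stays at a single nesting level, with no else branch.
        actions.add("CREATE successor work item from cursor " + progressCursor.get());
        return actions;
    }
}
```

The warning branch exits early, so the successor-creation logic no longer needs to sit inside an if/else pair.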

codecov · 2025-03-25T18:56:12Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 0.00%. Comparing base (823efa2) to head (23954bc).
Report is 6 commits behind head on main.


Fix logic in RFS when run exits without real progress

f6b7298

Signed-off-by: Andre Kurait <[email protected]>

AndreKurait had a problem deploying to migrations-cicd March 25, 2025 18:16 — with GitHub Actions Failure

Mark kafka tests as isolated

ac40c38

Signed-off-by: Andre Kurait <[email protected]>

AndreKurait force-pushed the FlakyTests branch from ac5ffbc to ac40c38 Compare March 25, 2025 18:28

AndreKurait temporarily deployed to migrations-cicd March 25, 2025 18:28 — with GitHub Actions Inactive

AndreKurait changed the title ~~Flaky tests~~ Fix RFS Shutdown logic during exception cases and set kafka tests as isolated Mar 25, 2025

Extend timeout on KafkaTrafficCaptureSourceTest

23954bc

Signed-off-by: Andre Kurait <[email protected]>

AndreKurait temporarily deployed to migrations-cicd March 25, 2025 18:34 — with GitHub Actions Inactive

peternied approved these changes Mar 25, 2025

AndreKurait marked this pull request as ready for review March 25, 2025 18:55

AndreKurait requested review from chelma, gregschohn, jugal-chauhan, lewijacn, mikaylathompson and sumobrian as code owners March 25, 2025 18:55

AndreKurait merged commit 0faa2da into opensearch-project:main Mar 25, 2025
54 checks passed
