Fix flaky tests for issue 19422 to remove Thread.sleep #20262

liuguoqingfz · 2025-12-16T18:07:18Z

Description

Fix two flaky tests, org.opensearch.telemetry.metrics.TelemetryMetricsEnabledSanityIT.testGauge and org.opensearch.telemetry.metrics.TelemetryMetricsEnabledSanityIT.testGaugeWithValueAndTagSupplier. After gaugeCloseable.close(), an in-flight or already-scheduled collection can still call the valueProvider one more time, so the max jumps from 2.0 to 3.0 and the "no change" assertEquals() fails intermittently.
Use assertBusy instead of Thread.sleep(2200)
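
To illustrate the difference between a fixed sleep and a condition-based wait, here is a minimal, self-contained sketch. It does not use the OpenSearch test framework: `awaitCondition` is a hypothetical stand-in for assertBusy, and the background thread is a made-up substitute for the gauge collector. It only shows the polling idea, under those assumptions.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: awaitCondition is a stand-in for assertBusy, not the OpenSearch implementation.
final class PollingWaitSketch {

    // Sleeps without exposing the checked InterruptedException to callers.
    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    // Polls the condition with a growing back-off until it holds or the timeout elapses.
    static boolean awaitCondition(BooleanSupplier condition, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        long pauseMillis = 1;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            pause(pauseMillis);
            pauseMillis = Math.min(pauseMillis * 2, 100);
        }
    }

    // Simulates a value that a background "collector" publishes after an unknown delay.
    static double publishThenObserve(long publishDelayMillis) {
        AtomicReference<Double> exported = new AtomicReference<>(0.0);
        Thread collector = new Thread(() -> {
            pause(publishDelayMillis);
            exported.set(2.0);
        });
        collector.start();
        // Wait on the observable state instead of guessing how long the publish takes.
        boolean reached = awaitCondition(() -> exported.get() >= 2.0, 5_000);
        if (!reached) {
            throw new AssertionError("value was never published");
        }
        return exported.get();
    }

    public static void main(String[] args) {
        System.out.println(publishThenObserve(50));
    }
}
```

The wait returns as soon as the condition holds, so the test neither fails when publishing is slow nor wastes time when it is fast.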

Related Issues

Resolves #19422

Check List

Functionality includes testing.
API changes companion pull request created, if applicable.
Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

Tests
- Improved telemetry metrics test reliability by replacing fixed delays with dynamic waits tied to observable data state.
- Enhanced test verification to detect in-flight metric operations post-closure, ensuring proper metric publishing behavior.

coderabbitai · 2025-12-16T18:07:41Z

Walkthrough

Refactored TelemetryMetricsEnabledSanityIT test class to eliminate timing-based flakiness by replacing fixed sleeps with assertBusy waits, simplifying lambda expressions, and introducing in-flight publish stabilization checks using AtomicReference to detect post-close gauge value mutations.

Changes

Cohort / File(s)	Change Summary
Flaky test stabilization `plugins/telemetry-otel/src/internalClusterTest/java/org/opensearch/telemetry/metrics/TelemetryMetricsEnabledSanityIT.java`	Replaced deterministic `Thread.sleep()` calls with `assertBusy`-based waits for gauge publishing synchronization. Simplified value provider lambdas to concise single-expression forms. Added in-flight publish stabilization check comparing successive gauge values after close to detect lingering mutations. Maintained pre-close assertions for published observations and metric attributes while removing post-close attribute checks.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Understanding the in-flight stabilization logic and AtomicReference usage pattern
Verifying assertBusy replacements provide equivalent or improved timing safety
Ensuring pre-close assertion completeness adequately covers metric behavior

Suggested labels

bug, flaky-test

Suggested reviewers

cwperks
gbbafna
sachinpkale
owaiskazi19
dbwiddis

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely summarizes the main change: fixing flaky tests by removing Thread.sleep calls for issue 19422.
Description check	✅ Passed	The description is comprehensive and follows the template, with a clear explanation of the root cause, the specific tests affected, and the solution implemented.
Linked Issues check	✅ Passed	The code changes directly address the flaky test issues reported in #19422 by replacing Thread.sleep with assertBusy and introducing in-flight publish stabilization checks.
Out of Scope Changes check	✅ Passed	All changes are scoped to fixing the two identified flaky tests in TelemetryMetricsEnabledSanityIT with no unrelated modifications.

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (1)

plugins/telemetry-otel/src/internalClusterTest/java/org/opensearch/telemetry/metrics/TelemetryMetricsEnabledSanityIT.java (1)

174-187: Stability check correctly applied, with same style note as testGauge.

This stability check mirrors the implementation in testGauge (lines 143-154) and correctly addresses the flaky test root cause. The same import inconsistency applies here—prefer importing AtomicReference rather than using the fully-qualified name. See the earlier comment on lines 143-154 for the suggested refactor.

🧹 Nitpick comments (1)

plugins/telemetry-otel/src/internalClusterTest/java/org/opensearch/telemetry/metrics/TelemetryMetricsEnabledSanityIT.java (1)
143-154: Stability check logic is correct, but address import inconsistency.

The stability detection approach correctly handles the race condition described in the PR: allowing in-flight gauge collections to complete after close(), then verifying the value eventually stops changing. The use of Double.compare for floating-point comparison is appropriate.

However, AtomicReference is used with a fully-qualified name while AtomicInteger is imported (line 27). For consistency, add an import for AtomicReference.

Apply this diff to add the import at the top of the file:
 import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.atomic.AtomicReference;
 import java.util.function.Supplier;
Then simplify the declaration:
-        final java.util.concurrent.atomic.AtomicReference<Double> lastSeen = new java.util.concurrent.atomic.AtomicReference<>(
+        final AtomicReference<Double> lastSeen = new AtomicReference<>(
             getMaxObservableGaugeValue(exporter, metricName)
         );
Optional: Extract duplicated stability check logic.

The stability verification logic is duplicated in both testGauge and testGaugeWithValueAndTagSupplier. Consider extracting it to a helper method like:
private void assertGaugeStabilityAfterClose(InMemorySingletonMetricsExporter exporter, String metricName) throws Exception {
    final AtomicReference<Double> lastSeen = new AtomicReference<>(
        getMaxObservableGaugeValue(exporter, metricName)
    );
    assertBusy(() -> {
        double now = getMaxObservableGaugeValue(exporter, metricName);
        double prev = lastSeen.get();
        if (Double.compare(prev, now) != 0) {
            lastSeen.set(now);
            fail("Gauge value changed after close (in-flight publish still running): prev=" + prev + ", now=" + now);
        }
    });
}
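
As a side note on the Double.compare remark above, the following small sketch shows where Double.compare and the == operator disagree. The class and method names are hypothetical and exist only for this illustration; the behavior shown is standard Java semantics for NaN and signed zero.

```java
// Hypothetical illustration of Double.compare versus == for double values.
final class DoubleCompareSketch {

    // Equality as defined by Double.compare: NaN equals NaN, and 0.0 differs from -0.0.
    static boolean sameByCompare(double a, double b) {
        return Double.compare(a, b) == 0;
    }

    // Equality as defined by the == operator: NaN never equals anything, and 0.0 equals -0.0.
    static boolean sameByOperator(double a, double b) {
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(sameByCompare(Double.NaN, Double.NaN));
        System.out.println(sameByOperator(Double.NaN, Double.NaN));
        System.out.println(sameByCompare(0.0, -0.0));
        System.out.println(sameByOperator(0.0, -0.0));
    }
}
```

For ordinary gauge values such as 2.0 and 3.0 both forms agree, so the choice mainly matters if a gauge could ever report NaN.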

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 799fb9b and e77e425.

📒 Files selected for processing (1)

plugins/telemetry-otel/src/internalClusterTest/java/org/opensearch/telemetry/metrics/TelemetryMetricsEnabledSanityIT.java (2 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)

GitHub Check: gradle-check
GitHub Check: Analyze (java)
GitHub Check: detect-breaking-change
GitHub Check: assemble (25, ubuntu-24.04-arm)
GitHub Check: assemble (25, windows-latest)
GitHub Check: assemble (21, ubuntu-24.04-arm)
GitHub Check: assemble (25, ubuntu-latest)
GitHub Check: assemble (21, macos-15)
GitHub Check: assemble (21, windows-latest)
GitHub Check: assemble (21, ubuntu-latest)

🔇 Additional comments (4)

plugins/telemetry-otel/src/internalClusterTest/java/org/opensearch/telemetry/metrics/TelemetryMetricsEnabledSanityIT.java (4)

136-136: LGTM: Lambda simplification improves readability.

The simplified lambda expression is more idiomatic and maintains identical behavior.

140-141: Good replacement of Thread.sleep with assertBusy.

Using assertBusy to wait for observable gauge values eliminates the timing-based flakiness of fixed sleeps. This approach correctly synchronizes on actual published data.

163-163: LGTM: Lambda simplification improves readability.

Consistent with the simplification on line 136, this change improves readability while preserving behavior.

167-172: Excellent use of assertBusy for both value and attribute verification.

The pre-close assertions now properly wait for observable conditions: both the gauge value reaching the threshold and the metric attributes containing the expected tag. This eliminates race conditions that could occur with fixed sleeps.

github-actions · 2025-12-16T18:54:59Z

❌ Gradle check result for e77e425: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

andrross · 2025-12-16T22:57:39Z

...ternalClusterTest/java/org/opensearch/telemetry/metrics/TelemetryMetricsEnabledSanityIT.java

-        Thread.sleep(1200);
-        assertEquals(observableGaugeValueAfterStop, getMaxObservableGaugeValue(exporter, metricName), 0.0);
-
+        // After close, allow any in-flight collection to finish, but ensure the value eventually becomes stable.


How is this going to wait for in-flight collections to finish? The code is going to invoke getMaxObservableGaugeValue(exporter, metricName) twice in very quick succession, and assuming it returns the same value both times it will then pass.
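
One possible way to address this concern, sketched here as an assumption rather than as what the PR does, is to require that the value stay unchanged for a quiet window (for example, longer than one collection interval) before treating it as stable. The helper name `awaitStable` and its parameters are hypothetical, and the sketch is independent of the OpenSearch test framework.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.DoubleSupplier;

// Hypothetical sketch: waits until a reading has not changed for a full quiet window.
final class QuietWindowSketch {

    // Returns the value once it has been unchanged for quietMillis; fails after timeoutMillis.
    static double awaitStable(DoubleSupplier read, long quietMillis, long timeoutMillis) {
        long start = System.nanoTime();
        long deadline = start + timeoutMillis * 1_000_000L;
        double last = read.getAsDouble();
        long lastChange = start;
        while (true) {
            long now = System.nanoTime();
            double current = read.getAsDouble();
            if (Double.compare(current, last) != 0) {
                // A late publish arrived: restart the quiet window from this point.
                last = current;
                lastChange = now;
            } else if (now - lastChange >= quietMillis * 1_000_000L) {
                return last;
            }
            if (now >= deadline) {
                throw new AssertionError("value did not stabilize; last=" + last);
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
    }

    // Demo source: the first reads return 1.0, 2.0, 3.0 (simulated late publishes), then stay at 3.0.
    static double demo() {
        AtomicInteger reads = new AtomicInteger();
        return awaitStable(() -> Math.min(reads.incrementAndGet(), 3), 50, 2_000);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Unlike two reads in quick succession, this only passes after the value has been observed unchanged across the whole window, at the cost of always waiting at least that long.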

don’t assume the value freezes immediately after close(). Instead, wa…

e77e425

…it until the exported max value becomes stable using assertBusy. Signed-off-by: Joe Liu <[email protected]>

liuguoqingfz requested a review from a team as a code owner December 16, 2025 18:07

github-actions bot added the >test-failure, autocut, and flaky-test labels Dec 16, 2025

coderabbitai bot reviewed Dec 16, 2025

andrross reviewed Dec 16, 2025
