
@liuguoqingfz
Contributor

@liuguoqingfz liuguoqingfz commented Dec 16, 2025

Description

Fix two flaky tests, org.opensearch.telemetry.metrics.TelemetryMetricsEnabledSanityIT.testGauge and org.opensearch.telemetry.metrics.TelemetryMetricsEnabledSanityIT.testGaugeWithValueAndTagSupplier. After gaugeCloseable.close() returns, an in-flight or already-scheduled collection can still call the valueProvider one more time, so the exported max jumps from 2.0 to 3.0 and the "no change" assertEquals() fails intermittently.
Use assertBusy instead of Thread.sleep(2200).
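
The race described above can be modeled in isolation. The following is only a sketch with hypothetical names (InFlightCollectionDemo, exportedMax, closeReturned); it does not use the OpenSearch or OpenTelemetry classes and makes no claim about how they schedule collections. It assumes that close() prevents future collections but does not cancel one that has already started, and it uses latches to force that ordering deterministically.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.DoubleAccumulator;

/** Hypothetical model of a gauge collection that is still in flight when close() returns. */
final class InFlightCollectionDemo {

    /** Returns {max observed when close() returned, max after the in-flight collection finished}. */
    static double[] run() {
        AtomicInteger providerCalls = new AtomicInteger();               // stands in for the valueProvider
        DoubleAccumulator exportedMax = new DoubleAccumulator(Math::max, 0.0);
        CountDownLatch collectionStarted = new CountDownLatch(1);
        CountDownLatch closeReturned = new CountDownLatch(1);
        ExecutorService collector = Executors.newSingleThreadExecutor();
        try {
            // Two collections complete before close(): the exported max reaches 2.0.
            for (int i = 0; i < 2; i++) {
                collector.submit(() -> exportedMax.accumulate(providerCalls.incrementAndGet())).get();
            }
            // A third collection starts before close() but reads the provider only afterwards.
            Future<?> inFlight = collector.submit(() -> {
                collectionStarted.countDown();
                try {
                    closeReturned.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                exportedMax.accumulate(providerCalls.incrementAndGet());
            });
            collectionStarted.await();
            // Modeled close(): nothing new is scheduled, the running collection is left alone.
            double maxAtClose = exportedMax.get();
            closeReturned.countDown();
            inFlight.get(5, TimeUnit.SECONDS);
            return new double[] { maxAtClose, exportedMax.get() };
        } catch (Exception e) {
            throw new IllegalStateException("demo did not complete", e);
        } finally {
            collector.shutdownNow();
        }
    }

    public static void main(String[] args) {
        double[] result = run();
        System.out.println("max at close = " + result[0] + ", max after in-flight collection = " + result[1]);
    }
}
```

In this model the value read immediately after close() differs from the value read once the pending collection completes, which is the same 2.0 to 3.0 jump that an immediate "no change" assertion trips over.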

Related Issues

Resolves #19422

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

  • Tests
    • Improved telemetry metrics test reliability by replacing fixed delays with dynamic waits tied to observable data state.
    • Enhanced test verification to detect in-flight metric operations post-closure, ensuring proper metric publishing behavior.


…it until the exported max value becomes stable using assertBusy.

Signed-off-by: Joe Liu <[email protected]>
@liuguoqingfz liuguoqingfz requested a review from a team as a code owner December 16, 2025 18:07
@github-actions github-actions bot added >test-failure Test failure from CI, local build, etc. autocut flaky-test Random test failure that succeeds on second run labels Dec 16, 2025
@coderabbitai
Contributor

coderabbitai bot commented Dec 16, 2025

Walkthrough

Refactored TelemetryMetricsEnabledSanityIT test class to eliminate timing-based flakiness by replacing fixed sleeps with assertBusy waits, simplifying lambda expressions, and introducing in-flight publish stabilization checks using AtomicReference to detect post-close gauge value mutations.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Flaky test stabilization: plugins/telemetry-otel/src/internalClusterTest/java/org/opensearch/telemetry/metrics/TelemetryMetricsEnabledSanityIT.java | Replaced deterministic Thread.sleep() calls with assertBusy-based waits for gauge publishing synchronization. Simplified value provider lambdas to concise single-expression forms. Added in-flight publish stabilization check comparing successive gauge values after close to detect lingering mutations. Maintained pre-close assertions for published observations and metric attributes while removing post-close attribute checks. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Understanding the in-flight stabilization logic and AtomicReference usage pattern
  • Verifying assertBusy replacements provide equivalent or improved timing safety
  • Ensuring pre-close assertion completeness adequately covers metric behavior

Suggested labels

bug, flaky-test

Suggested reviewers

  • cwperks
  • gbbafna
  • sachinpkale
  • owaiskazi19
  • dbwiddis

Poem

🐰 Sleeps were flakey, waits were slow,
Now assertBusy lets the metrics flow!
AtomicRef guards the gauge so tight,
No more random failures in the night!
Telemetry tests now stable as can be,
Hopping forward with reliability! 🎉

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: fixing flaky tests by removing Thread.sleep calls for issue 19422. |
| Description check | ✅ Passed | The description is comprehensive and follows the template, with a clear explanation of the root cause, the specific tests affected, and the solution implemented. |
| Linked Issues check | ✅ Passed | The code changes directly address the flaky test issues reported in #19422 by replacing Thread.sleep with assertBusy and introducing in-flight publish stabilization checks. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to fixing the two identified flaky tests in TelemetryMetricsEnabledSanityIT with no unrelated modifications. |

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
plugins/telemetry-otel/src/internalClusterTest/java/org/opensearch/telemetry/metrics/TelemetryMetricsEnabledSanityIT.java (1)

174-187: Stability check correctly applied, with same style note as testGauge.

This stability check mirrors the implementation in testGauge (lines 143-154) and correctly addresses the flaky test root cause. The same import inconsistency applies here—prefer importing AtomicReference rather than using the fully-qualified name. See the earlier comment on lines 143-154 for the suggested refactor.

🧹 Nitpick comments (1)
plugins/telemetry-otel/src/internalClusterTest/java/org/opensearch/telemetry/metrics/TelemetryMetricsEnabledSanityIT.java (1)

143-154: Stability check logic is correct, but address import inconsistency.

The stability detection approach correctly handles the race condition described in the PR: allowing in-flight gauge collections to complete after close(), then verifying the value eventually stops changing. The use of Double.compare for floating-point comparison is appropriate.
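
For context on the Double.compare remark, here is a small illustration of how it differs from the == operator on primitive doubles. The class and method names are hypothetical and are not part of the test under review.

```java
/** Hypothetical illustration of Double.compare versus == for gauge readings. */
final class DoubleCompareDemo {

    /** True when both readings are equal under Double.compare's total ordering. */
    static boolean sameReading(double prev, double now) {
        return Double.compare(prev, now) == 0;
    }

    public static void main(String[] args) {
        System.out.println("2.0 vs 2.0:  " + sameReading(2.0, 2.0));
        System.out.println("2.0 vs 3.0:  " + sameReading(2.0, 3.0));
        // NaN is never == NaN, but Double.compare treats NaN as equal to itself.
        System.out.println("NaN vs NaN:  " + sameReading(Double.NaN, Double.NaN));
        // 0.0 == -0.0 is true, but Double.compare orders -0.0 before 0.0.
        System.out.println("0.0 vs -0.0: " + sameReading(0.0, -0.0));
    }
}
```

For ordinary finite readings such as 2.0 and 3.0, both forms of comparison give the same answer; the difference only shows up for NaN and signed zero.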

However, AtomicReference is used with a fully-qualified name while AtomicInteger is imported (line 27). For consistency, add an import for AtomicReference.

Apply this diff to add the import at the top of the file:

 import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.atomic.AtomicReference;
 import java.util.function.Supplier;

Then simplify the declaration:

-        final java.util.concurrent.atomic.AtomicReference<Double> lastSeen = new java.util.concurrent.atomic.AtomicReference<>(
+        final AtomicReference<Double> lastSeen = new AtomicReference<>(
             getMaxObservableGaugeValue(exporter, metricName)
         );

Optional: Extract duplicated stability check logic.

The stability verification logic is duplicated in both testGauge and testGaugeWithValueAndTagSupplier. Consider extracting it to a helper method like:

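// Declares `throws Exception` because assertBusy in OpenSearchTestCase is declared to throw it.
private void assertGaugeStabilityAfterClose(InMemorySingletonMetricsExporter exporter, String metricName) throws Exception {
    // Baseline read taken right after close().
    final AtomicReference<Double> lastSeen = new AtomicReference<>(
        getMaxObservableGaugeValue(exporter, metricName)
    );
    assertBusy(() -> {
        double now = getMaxObservableGaugeValue(exporter, metricName);
        double prev = lastSeen.get();
        if (Double.compare(prev, now) != 0) {
            // Remember the new value so the next retry compares against it.
            lastSeen.set(now);
            fail("Gauge value changed after close (in-flight publish still running): prev=" + prev + ", now=" + now);
        }
        // Passes as soon as two consecutive reads return the same value.
    });
}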
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 799fb9b and e77e425.

📒 Files selected for processing (1)
  • plugins/telemetry-otel/src/internalClusterTest/java/org/opensearch/telemetry/metrics/TelemetryMetricsEnabledSanityIT.java (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: gradle-check
  • GitHub Check: Analyze (java)
  • GitHub Check: detect-breaking-change
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: assemble (21, macos-15)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: assemble (21, ubuntu-latest)
🔇 Additional comments (4)
plugins/telemetry-otel/src/internalClusterTest/java/org/opensearch/telemetry/metrics/TelemetryMetricsEnabledSanityIT.java (4)

136-136: LGTM: Lambda simplification improves readability.

The simplified lambda expression is more idiomatic and maintains identical behavior.


140-141: Good replacement of Thread.sleep with assertBusy.

Using assertBusy to wait for observable gauge values eliminates the timing-based flakiness of fixed sleeps. This approach correctly synchronizes on actual published data.
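
To illustrate why a retrying wait is less timing-sensitive than a fixed sleep, here is a simplified stand-in for an assertBusy-style helper. It is a sketch only: BusyWait and retryUntilPasses are hypothetical names, and the real OpenSearch helper may differ in its backoff, default timeout, and error handling.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an assertBusy-style helper; not the OpenSearch implementation. */
final class BusyWait {

    /** Re-runs the check until it stops throwing AssertionError or the timeout elapses. */
    static void retryUntilPasses(Runnable check, long timeoutMillis, long pollMillis) {
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            try {
                check.run();
                return; // the assertion held, so stop waiting immediately
            } catch (AssertionError e) {
                if (System.nanoTime() - deadline >= 0) {
                    throw e; // out of time: surface the last failure
                }
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new AssertionError("interrupted while waiting", ie);
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger published = new AtomicInteger();
        ScheduledExecutorService collector = Executors.newSingleThreadScheduledExecutor();
        // Simulated periodic publisher; the 50 ms interval is an arbitrary choice for the demo.
        collector.scheduleAtFixedRate(published::incrementAndGet, 50, 50, TimeUnit.MILLISECONDS);
        try {
            retryUntilPasses(() -> {
                if (published.get() < 2) {
                    throw new AssertionError("expected at least 2 publishes, saw " + published.get());
                }
            }, 5_000, 10);
            System.out.println("condition met after " + published.get() + " publishes");
        } finally {
            collector.shutdownNow();
        }
    }
}
```

The wait returns as soon as the condition holds and only fails after the full timeout, so a slow CI machine delays the test instead of breaking it.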


163-163: LGTM: Lambda simplification improves readability.

Consistent with the simplification on line 136, this change improves readability while preserving behavior.


167-172: Excellent use of assertBusy for both value and attribute verification.

The pre-close assertions now properly wait for observable conditions: both the gauge value reaching the threshold and the metric attributes containing the expected tag. This eliminates race conditions that could occur with fixed sleeps.

@github-actions
Contributor

❌ Gradle check result for e77e425: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Thread.sleep(1200);
assertEquals(observableGaugeValueAfterStop, getMaxObservableGaugeValue(exporter, metricName), 0.0);

// After close, allow any in-flight collection to finish, but ensure the value eventually becomes stable.
Member


How is this going to wait for in-flight collections to finish? The code is going to invoke getMaxObservableGaugeValue(exporter, metricName) twice in very quick succession, and assuming it returns the same value both times it will then pass.
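
One possible way to address this concern, sketched under assumptions: require the value to stay unchanged across several polls that are spaced apart in time, instead of accepting the first pair of equal reads. StabilityCheck and awaitStable are hypothetical names, the reader is a plain DoubleSupplier standing in for getMaxObservableGaugeValue(exporter, metricName), and the sketch does not verify that the poll interval is at least as long as one collection interval, which it would need to be for the check to be meaningful.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.DoubleSupplier;

/** Hypothetical helper: waits until a reading stays unchanged across spaced-out polls. */
final class StabilityCheck {

    /**
     * Polls the reader every pollMillis until it has returned the same value for
     * requiredStableReads consecutive reads, then returns that value.
     * Throws AssertionError if the value has not settled before the timeout.
     */
    static double awaitStable(DoubleSupplier reader, int requiredStableReads, long pollMillis, long timeoutMillis) {
        if (requiredStableReads < 1) {
            throw new IllegalArgumentException("requiredStableReads must be at least 1");
        }
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        double last = reader.getAsDouble();
        int stableReads = 1;
        while (stableReads < requiredStableReads) {
            if (System.nanoTime() - deadline >= 0) {
                throw new AssertionError("value did not stabilize; last=" + last);
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new AssertionError("interrupted while waiting for a stable value", e);
            }
            double now = reader.getAsDouble();
            if (Double.compare(last, now) == 0) {
                stableReads++;       // same value again: one step closer to "stable"
            } else {
                last = now;          // value moved: restart the count from this reading
                stableReads = 1;
            }
        }
        return last;
    }
}
```

Unlike two back-to-back reads, this gives a late publish at least (requiredStableReads - 1) poll intervals to land before the value is declared stable.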


Labels

autocut flaky-test Random test failure that succeeds on second run >test-failure Test failure from CI, local build, etc.


Development

Successfully merging this pull request may close these issues.

[AUTOCUT] Gradle Check Flaky Test Report for TelemetryMetricsEnabledSanityIT

2 participants