Skip to content

Conversation

@liuguoqingfz
Copy link
Contributor

@liuguoqingfz liuguoqingfz commented Dec 16, 2025

Description

Fix a flaky test where the flake happening because DirectoryReader.open(directory) can rarely observe no committed segments yet, so ir.leaves() is empty and get(0) throws.

Related Issues

Resolves #19845

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

  • Tests
    • Refactored aggregation test framework with improved resource lifecycle management and stronger validation mechanisms. Enhanced infrastructure includes optimized reader handling, more robust pre-condition checks, and better resource cleanup patterns. Maintains comprehensive test coverage while strengthening overall test reliability.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 16, 2025

Walkthrough

This change refactors the MultiTermsAggregatorTests test class to fix flaky test failures by improving resource management. It converts to nested try-with-resources for Directory and RandomIndexWriter, switches to a near-real-time reader via iw.getReader() instead of DirectoryReader.open(), adds a precondition assertion, and adjusts test logic accordingly while preserving test scenarios.

Changes

Cohort / File(s) Change Summary
Flaky test resource management refactoring
server/src/test/java/org/opensearch/search/aggregations/startree/MultiTermsAggregatorTests.java
Restructures test setup with nested try-with-resources for Directory and RandomIndexWriter lifecycle management. Replaces DirectoryReader.open(directory) with iw.getReader() for near-real-time reader behavior. Adds commit() after forceMerge(1). Introduces precondition assertion on non-empty leaves. Updates all subsequent test logic to operate on NRT reader context with proper leave handling.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Key areas requiring attention:
    • Verify that switching from DirectoryReader.open() to iw.getReader() (near-real-time reader) correctly addresses the root cause of flakiness
    • Confirm that the commit() call after forceMerge(1) and the updated reader/context handling preserve expected behavior for all test scenarios (2-term, 3-term, filtered, and sub-aggregation cases)
    • Validate that the precondition assertion on non-empty leaves is appropriate and doesn't mask issues
    • Ensure proper resource cleanup with nested try-with-resources in all execution paths

Suggested labels

bug, flaky-test

Suggested reviewers

  • cwperks
  • dbwiddis
  • mch2
  • msfroh
  • andrross
  • sachinpkale

Poem

🐰 A flaky test once caused us woe,
But try-with-resources stole the show!
With NRT readers held ever near,
Resources clean and futures clear—
The Star Tree aggregates without fear! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the primary change: fixing a flaky test for issue 19845 as described in the description.
Description check ✅ Passed The description adequately explains the root cause of the flaky test and references the related issue (#19845), though the checklist items remain unchecked.
Linked Issues check ✅ Passed The code changes directly address the flaky test issue by refactoring reader initialization to use NRT reader and adding precondition checks for non-empty leaves, resolving the race condition reported in #19845.
Out of Scope Changes check ✅ Passed All changes are scoped to the MultiTermsAggregatorTests test file and directly address the flaky test issue without introducing unrelated modifications.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e798353 and e042001.

📒 Files selected for processing (1)
  • server/src/test/java/org/opensearch/search/aggregations/startree/MultiTermsAggregatorTests.java (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-02T22:44:14.799Z
Learnt from: prudhvigodithi
Repo: opensearch-project/OpenSearch PR: 20112
File: server/src/internalClusterTest/java/org/opensearch/search/slice/SearchSliceIT.java:73-81
Timestamp: 2025-12-02T22:44:14.799Z
Learning: In OpenSearch integration tests extending OpenSearchIntegTestCase, using `LuceneTestCase.SuppressCodecs("*")` triggers special handling that selects a random production codec from the CODECS array, while `SuppressCodecs("Asserting")` or other specific codec suppressions still allow Lucene's default codec randomization which may include the asserting codec. Use `SuppressCodecs("*")` when you need to completely avoid asserting codecs (e.g., for cross-thread StoredFieldsReader usage) while maintaining production codec test coverage.

Applied to files:

  • server/src/test/java/org/opensearch/search/aggregations/startree/MultiTermsAggregatorTests.java
🧬 Code graph analysis (1)
server/src/test/java/org/opensearch/search/aggregations/startree/MultiTermsAggregatorTests.java (1)
server/src/main/java/org/opensearch/common/lucene/Lucene.java (1)
  • Lucene (114-992)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: gradle-check
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: Analyze (java)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: detect-breaking-change
🔇 Additional comments (4)
server/src/test/java/org/opensearch/search/aggregations/startree/MultiTermsAggregatorTests.java (4)

94-98: LGTM! Proper resource management with try-with-resources.

The nested try-with-resources pattern ensures that RandomIndexWriter is closed before Directory, which is the correct ordering for cleanup.


107-109: LGTM! Explicit commit after forceMerge.

Calling commit() after forceMerge(1) ensures the merged segment is persisted and visible, which is important for the star-tree structure to be properly built.


111-117: LGTM! Core fix for the flaky test.

Using iw.getReader() (NRT reader) instead of DirectoryReader.open(directory) guarantees visibility of indexed documents, avoiding the race condition where DirectoryReader.open might not see committed segments. The defensive assertion on line 113 provides a clear diagnostic if the precondition fails.


119-176: LGTM! Test logic preserved with proper resource lifecycle.

The test scenarios (2-term, 3-term, filtered, and sub-aggregation cases) are preserved, and the resource lifecycle is correctly managed with properly nested try-with-resources blocks.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Contributor

✅ Gradle check result for e042001: SUCCESS

@codecov
Copy link

codecov bot commented Dec 16, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.19%. Comparing base (930ae74) to head (e042001).
⚠️ Report is 7 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #20255      +/-   ##
============================================
- Coverage     73.20%   73.19%   -0.02%     
+ Complexity    71745    71742       -3     
============================================
  Files          5795     5795              
  Lines        328304   328304              
  Branches      47283    47283              
============================================
- Hits         240334   240292      -42     
- Misses        68663    68731      +68     
+ Partials      19307    19281      -26     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@andrross
Copy link
Member

@sandeshkr419 Can you take a look at this?

Copy link
Member

@sandeshkr419 sandeshkr419 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If NRT reader/writers are not available, are we still expecting the test to fail?


// Force merge to ensure the star-tree structure is built
iw.forceMerge(1);
iw.close();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, close() performs an implicit commit() unless LiveIndexWriterConfig.commitOnClose is set to false. Can you check if we are deciding this randomly - in that case we can set this to be true.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

autocut flaky-test Random test failure that succeeds on second run skip-changelog >test-failure Test failure from CI, local build, etc.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[AUTOCUT] Gradle Check Flaky Test Report for MultiTermsAggregatorTests

3 participants