Fix flaky tests #31270
Pull request overview
This PR fixes flaky tests by introducing a mechanism to reset the node ID counter used by the test actor runtime. The changes add a new static method to reset the global NextNodeId counter to ensure tests start with predictable node IDs.
Key Changes:
- Added ResetFirstNodeId() static method to TTestActorRuntimeBase for resetting the node ID counter
- Modified the CheckV1IsBlocked test to call this reset method before creating a test server
- Added explicit node_id: 1 in the test's YAML configuration to match the reset state
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| ydb/library/actors/testlib/test_runtime.h | Declares the new static ResetFirstNodeId() method |
| ydb/library/actors/testlib/test_runtime.cpp | Implements ResetFirstNodeId() to reset the static NextNodeId variable to 1 |
| ydb/services/config/bsconfig_ut.cpp | Adds reset call at test start and explicit node_id configuration to ensure deterministic behavior |
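To illustrate why a process-wide counter makes node IDs depend on test order, and what the reset buys, here is a minimal sketch. TMiniRuntime is a hypothetical, simplified stand-in and not the real TTestActorRuntimeBase; the way the constructor advances the counter is an assumption made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of a test runtime that takes its node ids
// from a counter shared by every instance in the process.
class TMiniRuntime {
public:
    explicit TMiniRuntime(uint32_t nodeCount = 1)
        : FirstNodeId(NextNodeId)  // snapshot of the shared counter
        , NodeCount(nodeCount)
    {
        // Assumption for this sketch: each runtime reserves a contiguous
        // range, so later runtimes start after this one's nodes.
        NextNodeId += nodeCount;
    }

    uint32_t GetNodeId(uint32_t index = 0) const {
        assert(index < NodeCount);
        return FirstNodeId + index;
    }

    // Mirrors the method added in this PR: put the shared counter back to 1.
    static void ResetFirstNodeId() {
        NextNodeId = 1;
    }

private:
    static uint32_t NextNodeId;  // shared across all instances
    const uint32_t FirstNodeId;  // fixed per instance at construction
    const uint32_t NodeCount;
};

uint32_t TMiniRuntime::NextNodeId = 1;
```

In this model, a test that hardcodes node_id: 1 only passes if it is the first to construct a runtime in the process, or if it resets the counter before constructing one.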
    void AddLocalService(const TActorId& actorId, TActorSetupCmd cmd, ui32 nodeIndex = 0);
    virtual void Initialize();
    ui32 GetNodeId(ui32 index = 0) const;
    ui32 GetNodeCount() const;
Copilot AI, Dec 24, 2025
The ResetFirstNodeId method lacks documentation explaining its purpose, when it should be used, and why it's necessary. Given that this modifies global state and affects test isolation, it would be helpful to document that this method should be called at the beginning of tests that require a predictable starting node ID, and explain the implications of resetting this shared state.
Suggested change:
    - ui32 GetNodeCount() const;
    + ui32 GetNodeCount() const;
    + /**
    +  * Reset the starting node id used for subsequent test runtimes.
    +  *
    +  * This method modifies shared/global state that affects how node ids are
    +  * assigned to test runtime instances. It is intended to be called at the
    +  * beginning of tests that require a predictable first node id (for example,
    +  * when tests assert on concrete node id values or rely on deterministic
    +  * allocation order).
    +  *
    +  * Because this resets shared state, it can affect other tests running in the
    +  * same process. It should therefore be used in a controlled manner, ideally
    +  * as part of test setup before any test runtimes are constructed, and not
    +  * from concurrently executing tests.
    +  */
        return NodeCount;
    }

    void TTestActorRuntimeBase::ResetFirstNodeId() {
Copilot AI, Dec 24, 2025
The ResetFirstNodeId method modifies a static variable without any synchronization mechanism. Since NextNodeId is accessed and modified in other methods (such as the constructor and InitNodes), this could lead to race conditions if tests run in parallel or if multiple runtime instances are created concurrently. Consider adding a mutex to protect access to NextNodeId, similar to how other shared state is protected with TGuard<TMutex> guard(Mutex) in methods like AllocateLocalId.
Suggested change:
    - void TTestActorRuntimeBase::ResetFirstNodeId() {
    + void TTestActorRuntimeBase::ResetFirstNodeId() {
    +     TGuard<TMutex> guard(Mutex);
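One caveat worth checking before applying this suggestion: a static member function has no object, so it can only lock a mutex that is itself static or global. If Mutex is a per-instance member (not confirmed here), the one-line change would not compile as written. A rough sketch of the static-mutex variant follows, using std::mutex in place of the codebase's TMutex/TGuard; TGuardedNodeIds and its method names are hypothetical.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical sketch: a process-wide node id counter whose reads and
// writes all go through one static mutex.
class TGuardedNodeIds {
public:
    // Reserve `count` consecutive ids and return the first one.
    static uint32_t Allocate(uint32_t count) {
        std::lock_guard<std::mutex> guard(Lock);
        const uint32_t first = NextNodeId;
        NextNodeId += count;
        return first;
    }

    // Put the counter back to its initial value.
    static void Reset() {
        std::lock_guard<std::mutex> guard(Lock);
        NextNodeId = 1;
    }

private:
    static std::mutex Lock;      // static, so static methods can lock it
    static uint32_t NextNodeId;  // shared by every caller in the process
};

std::mutex TGuardedNodeIds::Lock;
uint32_t TGuardedNodeIds::NextNodeId = 1;
```

The lock removes the data race on the counter but not the ordering problem: a reset issued while another test still holds a runtime can hand out ids that overlap with that runtime's range, so the reset still belongs in single-threaded test setup.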
    ui32 GetNodeCount() const;
    static void ResetFirstNodeId();
Copilot AI, Dec 24, 2025
The method name ResetFirstNodeId is misleading because it actually resets NextNodeId, not FirstNodeId. FirstNodeId is an instance variable that stores the starting node ID for a runtime instance, while NextNodeId is a static variable that tracks the next available node ID across all instances. Consider renaming this method to ResetNextNodeId to accurately reflect what it does.
Suggested change:
    - ui32 GetNodeCount() const;
    - static void ResetFirstNodeId();
    + ui32 GetNodeCount() const;
    + // Deprecated: use ResetNextNodeId instead.
    + static void ResetFirstNodeId();
    + static void ResetNextNodeId() { ResetFirstNodeId(); }
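The distinction behind the rename can be seen in a small model: resetting the shared counter changes what the next runtime receives, but leaves the FirstNodeId already captured by an existing instance untouched. TRenamedRuntime and GetFirstNodeId are hypothetical names for this sketch; the two reset methods follow the suggestion above.

```cpp
#include <cstdint>

// Hypothetical model with one node per runtime, keeping both method names
// as in the suggested change.
class TRenamedRuntime {
public:
    TRenamedRuntime()
        : FirstNodeId(NextNodeId++)  // capture, then advance the shared counter
    {}

    uint32_t GetFirstNodeId() const { return FirstNodeId; }

    // Deprecated: use ResetNextNodeId instead.
    static void ResetFirstNodeId() { NextNodeId = 1; }

    // New name that says what is actually reset; forwards to the old one.
    static void ResetNextNodeId() { ResetFirstNodeId(); }

private:
    static uint32_t NextNodeId;  // static: next id to hand out
    const uint32_t FirstNodeId;  // per instance: never changes after construction
};

uint32_t TRenamedRuntime::NextNodeId = 1;
```

Forwarding from the new name to the old one keeps existing callers compiling; once they are migrated, the direction could be flipped and the old name removed.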
Force-pushed from 24d465a to 0c1ac74
Changelog entry
Fix flaky tests
Changelog category
Description for reviewers
...