Fix flakky tests #31270

mregrock · 2025-12-24T13:44:42Z

Changelog entry

Fix flakky tests

Changelog category

Not for changelog (changelog entry is not required)

Description for reviewers

...

ydbot · 2025-12-24T13:46:55Z

Run Extra Tests

Run additional tests for this PR. You can customize:

Test Size: small, medium, large (default: all)
Test Targets: any directory path (default: ydb/)
Sanitizers: ASAN, MSAN, TSAN
Coredumps: enable for debugging (default: off)
Additional args: custom ya make arguments

github-actions · 2025-12-24T13:47:43Z

⚪ 2025-12-24 13:47:43 UTC Pre-commit check linux-x86_64-relwithdebinfo for 56a7af3 has started.
⚪ 2025-12-24 13:47:49 UTC Artifacts will be uploaded here
⚪ 2025-12-24 13:49:12 UTC ya make is running...
⚫ 2025-12-24 13:49:58 UTC Check cancelled

github-actions · 2025-12-24T13:47:45Z

⚪ 2025-12-24 13:47:44 UTC Pre-commit check linux-x86_64-release-asan for 56a7af3 has started.
⚪ 2025-12-24 13:47:53 UTC Artifacts will be uploaded here
⚪ 2025-12-24 13:49:13 UTC ya make is running...
⚫ 2025-12-24 13:49:58 UTC Check cancelled

github-actions · 2025-12-24T13:48:04Z

🟢 2025-12-24 13:48:03 UTC The validation of the Pull Request description is successful.

Copilot

Pull request overview

This PR fixes flaky tests by introducing a mechanism to reset the node ID counter used by the test actor runtime. The changes add a new static method to reset the global NextNodeId counter to ensure tests start with predictable node IDs.

Key Changes:

Added ResetFirstNodeId() static method to TTestActorRuntimeBase for resetting the node ID counter
Modified the CheckV1IsBlocked test to call this reset method before creating a test server
Added explicit node_id: 1 in the test's YAML configuration to match the reset state

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
ydb/library/actors/testlib/test_runtime.h	Declares the new static `ResetFirstNodeId()` method
ydb/library/actors/testlib/test_runtime.cpp	Implements `ResetFirstNodeId()` to reset the static `NextNodeId` variable to 1
ydb/services/config/bsconfig_ut.cpp	Adds reset call at test start and explicit node_id configuration to ensure deterministic behavior

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-12-24T13:48:12Z

ydb/library/actors/testlib/test_runtime.h

        void AddLocalService(const TActorId& actorId, TActorSetupCmd cmd, ui32 nodeIndex = 0);
        virtual void Initialize();
        ui32 GetNodeId(ui32 index = 0) const;
        ui32 GetNodeCount() const;


The ResetFirstNodeId method lacks documentation explaining its purpose, when it should be used, and why it's necessary. Given that this modifies global state and affects test isolation, it would be helpful to document that this method should be called at the beginning of tests that require a predictable starting node ID, and explain the implications of resetting this shared state.

Suggested change

ui32 GetNodeCount() const;

ui32 GetNodeCount() const;

/**

* Reset the starting node id used for subsequent test runtimes.

*

* This method modifies shared/global state that affects how node ids are

* assigned to test runtime instances. It is intended to be called at the

* beginning of tests that require a predictable first node id (for example,

* when tests assert on concrete node id values or rely on deterministic

* allocation order).

*

* Because this resets shared state, it can affect other tests running in the

* same process. It should therefore be used in a controlled manner, ideally

* as part of test setup before any test runtimes are constructed, and not

* from concurrently executing tests.

*/

Copilot · 2025-12-24T13:48:12Z

ydb/library/actors/testlib/test_runtime.cpp

        return NodeCount;
    }

+    void TTestActorRuntimeBase::ResetFirstNodeId() {


The ResetFirstNodeId method modifies a static variable without any synchronization mechanism. Since NextNodeId is accessed and modified in other methods (such as in the constructor and InitNodes), this could lead to race conditions if tests are run in parallel or if multiple runtime instances are created concurrently. Consider adding a mutex to protect access to NextNodeId, similar to how other shared state is protected with TGuard guard(Mutex) in methods like AllocateLocalId.

Suggested change

void TTestActorRuntimeBase::ResetFirstNodeId() {

void TTestActorRuntimeBase::ResetFirstNodeId() {

TGuard<TMutex> guard(Mutex);

Copilot · 2025-12-24T13:48:13Z

ydb/library/actors/testlib/test_runtime.h

        ui32 GetNodeCount() const;
+        static void ResetFirstNodeId();


The method name ResetFirstNodeId is misleading because it actually resets NextNodeId, not FirstNodeId. FirstNodeId is an instance variable that stores the starting node ID for a runtime instance, while NextNodeId is a static variable that tracks the next available node ID across all instances. Consider renaming this method to ResetNextNodeId to accurately reflect what it does.

Suggested change

ui32 GetNodeCount() const;

static void ResetFirstNodeId();

ui32 GetNodeCount() const;

// Deprecated: use ResetNextNodeId instead.

static void ResetFirstNodeId();

static void ResetNextNodeId() { ResetFirstNodeId(); }

github-actions · 2025-12-24T13:53:44Z

⚪ 2025-12-24 13:53:43 UTC Pre-commit check linux-x86_64-release-asan for c4732c7 has started.
⚪ 2025-12-24 13:54:21 UTC Artifacts will be uploaded here
⚪ 2025-12-24 13:56:30 UTC ya make is running...
🟡 2025-12-24 14:44:16 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet

Ya make output | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
11614	11565	0	42	4	3

🟢 2025-12-24 14:44:24 UTC Build successful.
🟢 2025-12-24 14:44:59 UTC ydbd size 3.8 GiB changed* by -176 Bytes, which is <= 0 Bytes vs main: OK

ydbd size dash	main: `1182915`	merge: `c4732c7`	diff	diff %
ydbd size	4 087 569 592 Bytes	4 087 569 416 Bytes	-176 Bytes	-0.000%
ydbd stripped size	1 530 591 704 Bytes	1 530 591 576 Bytes	-128 Bytes	-0.000%

^{*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation}

github-actions · 2025-12-24T13:54:54Z

⚪ 2025-12-24 13:54:54 UTC Pre-commit check linux-x86_64-relwithdebinfo for c4732c7 has started.
⚪ 2025-12-24 13:55:11 UTC Artifacts will be uploaded here
⚪ 2025-12-24 13:57:18 UTC ya make is running...
🟡 2025-12-24 15:19:43 UTC Some tests failed, follow the links below. Going to retry failed tests...

Details

Ya make output | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
30158	28284	0	15	1670	189

⚪ 2025-12-24 15:19:57 UTC ya make is running... (failed tests rerun, try 2)
🟡 2025-12-24 15:49:20 UTC Some tests failed, follow the links below. Going to retry failed tests...

Details

Ya make output | Test bloat | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
1798 (only retried tests)	1792	0	1	0	5

⚪ 2025-12-24 15:49:27 UTC ya make is running... (failed tests rerun, try 3)
🔴 2025-12-24 15:50:44 UTC Some tests failed, follow the links below.

Ya make output | Test bloat | Test bloat | Test bloat

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED^?
180 (only retried tests)	174	0	1	0	5

🟢 2025-12-24 15:50:51 UTC Build successful.
🟢 2025-12-24 15:51:12 UTC ydbd size 2.3 GiB changed* by -32 Bytes, which is <= 0 Bytes vs main: OK

ydbd size dash	main: `1182915`	merge: `c4732c7`	diff	diff %
ydbd size	2 487 005 848 Bytes	2 487 005 816 Bytes	-32 Bytes	-0.000%
ydbd stripped size	529 416 960 Bytes	529 416 960 Bytes	0 Bytes	0.000%

^{*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation}

Copilot AI review requested due to automatic review settings December 24, 2025 13:44

github-actions bot added the not-for-changelog label Dec 24, 2025

Copilot started reviewing on behalf of mregrock December 24, 2025 13:46 View session

Copilot AI reviewed Dec 24, 2025

View reviewed changes

Fix flakky tests

0c1ac74

mregrock force-pushed the fix_flakky_bsconfig branch from 24d465a to 0c1ac74 Compare December 24, 2025 13:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix flakky tests #31270

Fix flakky tests #31270

mregrock commented Dec 24, 2025

Uh oh!

ydbot commented Dec 24, 2025

Uh oh!

github-actions bot commented Dec 24, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Dec 24, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Dec 24, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Dec 24, 2025

Uh oh!

Copilot AI Dec 24, 2025

Uh oh!

Copilot AI Dec 24, 2025

Uh oh!

github-actions bot commented Dec 24, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Dec 24, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

-        ui32 GetNodeCount() const;
+        ui32 GetNodeCount() const;
+        /**
+         * Reset the starting node id used for subsequent test runtimes.
+         *
+         * This method modifies shared/global state that affects how node ids are
+         * assigned to test runtime instances. It is intended to be called at the
+         * beginning of tests that require a predictable first node id (for example,
+         * when tests assert on concrete node id values or rely on deterministic
+         * allocation order).
+         *
+         * Because this resets shared state, it can affect other tests running in the
+         * same process. It should therefore be used in a controlled manner, ideally
+         * as part of test setup before any test runtimes are constructed, and not
+         * from concurrently executing tests.
+         */

	void TTestActorRuntimeBase::ResetFirstNodeId() {
	void TTestActorRuntimeBase::ResetFirstNodeId() {
	TGuard<TMutex> guard(Mutex);

Fix flakky tests #31270

Are you sure you want to change the base?

Fix flakky tests #31270

Conversation

mregrock commented Dec 24, 2025

Changelog entry

Changelog category

Description for reviewers

Uh oh!

ydbot commented Dec 24, 2025

Run Extra Tests

Uh oh!

github-actions bot commented Dec 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Dec 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Dec 24, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Dec 24, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 24, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 24, 2025

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Dec 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Dec 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions bot commented Dec 24, 2025 •

edited

Loading

github-actions bot commented Dec 24, 2025 •

edited

Loading

github-actions bot commented Dec 24, 2025 •

edited

Loading

github-actions bot commented Dec 24, 2025 •

edited

Loading