[23088] Solve Discovery Server race conditions (backport #5780) #5807

Open: wants to merge 1 commit into base: 2.14.x


mergify[bot]
Contributor

@mergify mergify bot commented May 12, 2025

Description

This PR attempts to fix a couple of race conditions detected in scenarios where a large number of clients are present:

  • PDP and EDP messages are stored in two separate queues that are not swapped atomically. As a result, a data r/w could be processed in one server routine iteration while the data P it depends on, although already received, is left for the next iteration.
  • When a data UP is received (or manually inserted because a participant is dropped), the participant's entry stays in the internal participants map with its change updated, and it is deleted only after the data UP has been acked by all participants. A new data P received before that point is not processed correctly, so the entry is still deleted from the participants map once the data UP is acked. Any data r/w processed afterwards then fails and prints one of these errors: "Reader/Writer has no associated participant. Skipping" or "Matching unexisting participant from reader/writer".
  • When a reader/writer is inserted into the database, the check that the corresponding participant is present is done after insertion instead of as a first step. Even when the participant is present, it should also be checked that it is alive, aborting the insertion otherwise.
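
For readers unfamiliar with the server routine, the three points above can be illustrated with a minimal, self-contained sketch. It is not the Fast DDS implementation: `MiniDiscoveryDb`, `push_pdp`, `push_edp`, `process` and `insert_endpoint` are hypothetical names invented for this example, and GUIDs are reduced to plain strings. It only shows the general technique: swap both queues under a single lock, let a later data P revive an entry marked by a data UP, and validate the participant before inserting an endpoint.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical, simplified model. None of these names belong to the Fast DDS API.
class MiniDiscoveryDb
{
public:

    using Guid = std::string;

    // Queue a participant announcement: alive == true models a data P,
    // alive == false models a data UP.
    void push_pdp(const Guid& participant, bool alive)
    {
        std::lock_guard<std::mutex> lock(queues_mutex_);
        pdp_incoming_.push_back({participant, alive});
    }

    // Queue a reader/writer announcement (data r/w) for a given participant.
    void push_edp(const Guid& endpoint, const Guid& participant)
    {
        std::lock_guard<std::mutex> lock(queues_mutex_);
        edp_incoming_.push_back({endpoint, participant});
    }

    // One routine iteration. Both queues are taken under the same lock, so an
    // endpoint can never be picked up without the participant data that was
    // queued before it. Returns the number of rejected endpoints.
    std::size_t process()
    {
        std::deque<std::pair<Guid, bool>> pdp;
        std::deque<std::pair<Guid, Guid>> edp;
        {
            std::lock_guard<std::mutex> lock(queues_mutex_);
            pdp.swap(pdp_incoming_);
            edp.swap(edp_incoming_);
        }

        // Participants first. A data P arriving after a data UP revives the entry.
        for (const auto& p : pdp)
        {
            participants_alive_[p.first] = p.second;
        }

        std::size_t rejected = 0;
        for (const auto& e : edp)
        {
            if (!insert_endpoint(e.first, e.second))
            {
                ++rejected;
            }
        }
        return rejected;
    }

    bool has_endpoint(const Guid& endpoint) const
    {
        return endpoints_.count(endpoint) != 0;
    }

private:

    // Check the participant BEFORE touching the endpoint map: it must be
    // present and alive, otherwise the insertion is aborted.
    bool insert_endpoint(const Guid& endpoint, const Guid& participant)
    {
        auto it = participants_alive_.find(participant);
        if (it == participants_alive_.end() || !it->second)
        {
            return false;
        }
        endpoints_[endpoint] = participant;
        return true;
    }

    std::mutex queues_mutex_;
    std::deque<std::pair<Guid, bool>> pdp_incoming_;
    std::deque<std::pair<Guid, Guid>> edp_incoming_;
    std::map<Guid, bool> participants_alive_;  // participant -> alive flag
    std::map<Guid, Guid> endpoints_;           // endpoint -> owning participant
};
```

In this toy model, an endpoint whose participant is unknown or has been marked as not alive is rejected without leaving a partial entry behind, and it is accepted again once a fresh data P for that participant has been processed.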

@Mergifyio backport 3.1.x 2.14.x 2.10.x

Contributor Checklist

  • Commit messages follow the project guidelines.

  • The code follows the style guidelines of this project.

  • Tests that thoroughly check the new feature have been added/Regression tests checking the bug and its fix have been added; the added tests pass locally

  • N/A Any new/modified methods have been properly documented using Doxygen.

  • N/A Any new configuration API has an equivalent XML API (with the corresponding XSD extension)

  • Changes are backport compatible: they do NOT break ABI nor change library core behavior.

  • Changes are API compatible.

  • N/A New feature has been added to the versions.md file (if applicable).

  • N/A New feature has been documented/Current behavior is correctly described in the documentation.

  • Applicable backports have been included in the description.

Reviewer Checklist

  • The PR has a milestone assigned.
  • The title and description correctly express the PR's purpose.
  • Check contributor checklist is correct.
  • N/A If this is a critical bug fix, backports to the critical-only supported branches have been requested.
  • Check CI results: changes do not issue any warning.
  • Check CI results: failing tests are unrelated to the changes.

This is an automatic backport of pull request #5780 done by [Mergify](https://mergify.com).

* Refs #23088: Test reconnection when removing participant

Signed-off-by: cferreiragonz <[email protected]>

* Refs #23088: Solve EDP-PDP queues race condition

Signed-off-by: Juan Lopez Fernandez <[email protected]>

* Refs #23088: Solve data UP + data P race condition

Signed-off-by: Juan Lopez Fernandez <[email protected]>

* Refs #23088: Abort writer/reader processing if associated participant not alive

Signed-off-by: Juan Lopez Fernandez <[email protected]>

* Refs #23088: Apply suggestions

Signed-off-by: Juan Lopez Fernandez <[email protected]>

* Refs #23088: Release change when writer/reader insertion in DB failed

Signed-off-by: Juan Lopez Fernandez <[email protected]>

* Refs #23088: Match servers after change update

Signed-off-by: Juan Lopez Fernandez <[email protected]>

---------

Signed-off-by: cferreiragonz <[email protected]>
Signed-off-by: Juan Lopez Fernandez <[email protected]>
Co-authored-by: cferreiragonz <[email protected]>
(cherry picked from commit ec666f7)

# Conflicts:
#	src/cpp/rtps/builtin/discovery/database/DiscoveryDataBase.cpp
#	test/blackbox/common/BlackboxTestsDiscovery.cpp
Contributor Author

mergify bot commented May 12, 2025

Cherry-pick of ec666f7 has failed:

On branch mergify/bp/2.14.x/pr-5780
Your branch is up to date with 'origin/2.14.x'.

You are currently cherry-picking commit ec666f72.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   src/cpp/rtps/builtin/discovery/database/DiscoveryDataBase.hpp
	modified:   src/cpp/rtps/builtin/discovery/participant/PDPServer.cpp

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   src/cpp/rtps/builtin/discovery/database/DiscoveryDataBase.cpp
	both modified:   test/blackbox/common/BlackboxTestsDiscovery.cpp

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

@mergify mergify bot added the conflicts Backport PR wich git cherry pick failed label May 12, 2025
@Mario-DL Mario-DL added this to the v2.14.5 milestone May 14, 2025