[23088] Solve Discovery Server race conditions (backport #5780) #5806

Merged
Mario-DL merged 1 commit into 3.1.x from mergify/bp/3.1.x/pr-5780
May 19, 2025

Conversation

mergify[bot]
Contributor

@mergify mergify bot commented May 12, 2025

Description

This PR attempts to fix a couple of race conditions detected in scenarios where a large number of clients are present:

  • PDP and EDP messages are stored in two queues that are not swapped atomically. This can lead to situations such as processing a data r/w whose data P has already been received but will not be processed until the next iteration of the server routine.
  • When a data UP is received (or manually inserted because a participant was dropped), the participant stays in the internal participants map with its change updated. The item is deleted only after the data UP has been acked by all participants. If a new data P is received before that happens, it is not processed correctly. As a result, the item is deleted from the participants map once the data UP is acked, and any data r/w processed afterwards fails and prints one of these errors: "Reader/Writer has no associated participant. Skipping" or "Matching unexisting participant from reader/writer".
  • When a reader/writer is inserted into the database, the check for the corresponding participant is done after the insertion instead of as a first step. Moreover, even if the participant is present, it should also be checked that it is alive, aborting the insertion otherwise.
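
The three points above can be illustrated with a small, self-contained sketch. It is not the Fast DDS code: `MiniDiscoveryDb`, `PdpMsg`, `EdpMsg` and every member name are hypothetical stand-ins, and the real `DiscoveryDataBase` is considerably more involved. The sketch only shows the general idea: take both queues inside one critical section, let a new data P revive an entry previously marked by a data UP, and check that the participant exists and is alive before inserting an endpoint.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical message types, for illustration only.
struct PdpMsg
{
    std::string participant;  // participant identifier
    bool dispose;             // true models a data UP, false a data P
};

struct EdpMsg
{
    std::string participant;  // owning participant identifier
    std::string endpoint;     // reader/writer identifier
};

class MiniDiscoveryDb
{
public:

    // Producer side: listener threads enqueue incoming messages.
    void push_pdp(PdpMsg msg)
    {
        std::lock_guard<std::mutex> guard(mtx_);
        pdp_in_.push_back(std::move(msg));
    }

    void push_edp(EdpMsg msg)
    {
        std::lock_guard<std::mutex> guard(mtx_);
        edp_in_.push_back(std::move(msg));
    }

    // One routine iteration. Returns how many endpoints were accepted.
    std::size_t process()
    {
        std::vector<PdpMsg> pdp;
        std::vector<EdpMsg> edp;
        {
            // Both queues are taken under the same lock, so the snapshot
            // cannot contain a data r/w while its data P is left behind.
            std::lock_guard<std::mutex> guard(mtx_);
            pdp.swap(pdp_in_);
            edp.swap(edp_in_);
        }

        // Participants first. A data UP only marks the entry as not alive;
        // a later data P for the same participant revives it.
        for (const PdpMsg& msg : pdp)
        {
            alive_[msg.participant] = !msg.dispose;
        }

        std::size_t accepted = 0;
        for (const EdpMsg& msg : edp)
        {
            if (insert_endpoint(msg))
            {
                ++accepted;
            }
        }
        return accepted;
    }

    std::size_t endpoint_count() const
    {
        return endpoints_.size();
    }

private:

    // The participant check is the first step: abort before touching the
    // endpoint storage if the participant is unknown or not alive.
    bool insert_endpoint(const EdpMsg& msg)
    {
        auto it = alive_.find(msg.participant);
        if (it == alive_.end() || !it->second)
        {
            return false;
        }
        endpoints_.push_back(msg);
        return true;
    }

    std::mutex mtx_;
    std::vector<PdpMsg> pdp_in_;
    std::vector<EdpMsg> edp_in_;
    std::map<std::string, bool> alive_;  // participant -> alive flag
    std::vector<EdpMsg> endpoints_;
};
```

In this sketch an endpoint whose participant is missing or not alive is simply dropped; what the actual fix does with such a change (the commit list mentions releasing it when the insertion fails) is outside the scope of the example.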

@Mergifyio backport 3.1.x 2.14.x 2.10.x

Contributor Checklist

  • Commit messages follow the project guidelines.

  • The code follows the style guidelines of this project.

  • Tests that thoroughly check the new feature have been added/Regression tests checking the bug and its fix have been added; the added tests pass locally

  • N/A Any new/modified methods have been properly documented using Doxygen.

  • N/A Any new configuration API has an equivalent XML API (with the corresponding XSD extension)

  • Changes are backport compatible: they do NOT break ABI nor change library core behavior.

  • Changes are API compatible.

  • N/A New feature has been added to the versions.md file (if applicable).

  • N/A New feature has been documented/Current behavior is correctly described in the documentation.

  • Applicable backports have been included in the description.

Reviewer Checklist

  • The PR has a milestone assigned.
  • The title and description correctly express the PR's purpose.
  • Check contributor checklist is correct.
  • N/A If this is a critical bug fix, backports to the critical-only supported branches have been requested.
  • Check CI results: changes do not issue any warning.
  • Check CI results: failing tests are unrelated to the changes.

This is an automatic backport of pull request #5780 done by [Mergify](https://mergify.com).

@mergify mergify bot added the conflicts Backport PR wich git cherry pick failed label May 12, 2025
Contributor Author

mergify bot commented May 12, 2025

Cherry-pick of ec666f7 has failed:

On branch mergify/bp/3.1.x/pr-5780
Your branch is up to date with 'origin/3.1.x'.

You are currently cherry-picking commit ec666f72.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   src/cpp/rtps/builtin/discovery/database/DiscoveryDataBase.cpp
	modified:   src/cpp/rtps/builtin/discovery/database/DiscoveryDataBase.hpp
	modified:   src/cpp/rtps/builtin/discovery/participant/PDPServer.cpp

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   test/blackbox/common/BlackboxTestsDiscovery.cpp

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

@Mario-DL Mario-DL added this to the v3.1.3 milestone May 14, 2025
* Refs #23088: Test reconnection when removing participant

Signed-off-by: cferreiragonz <[email protected]>

* Refs #23088: Solve EDP-PDP queues race condition

Signed-off-by: Juan Lopez Fernandez <[email protected]>

* Refs #23088: Solve data UP + data P race condition

Signed-off-by: Juan Lopez Fernandez <[email protected]>

* Refs #23088: Abort writer/reader processing if associated participant not alive

Signed-off-by: Juan Lopez Fernandez <[email protected]>

* Refs #23088: Apply suggestions

Signed-off-by: Juan Lopez Fernandez <[email protected]>

* Refs #23088: Release change when writer/reader insertion in DB failed

Signed-off-by: Juan Lopez Fernandez <[email protected]>

* Refs #23088: Match servers after change update

Signed-off-by: Juan Lopez Fernandez <[email protected]>

---------

Signed-off-by: cferreiragonz <[email protected]>
Signed-off-by: Juan Lopez Fernandez <[email protected]>
Co-authored-by: cferreiragonz <[email protected]>
@Mario-DL Mario-DL force-pushed the mergify/bp/3.1.x/pr-5780 branch from 9a40b50 to c489c7d Compare May 16, 2025 06:02
@Mario-DL Mario-DL requested review from Mario-DL and removed request for Mario-DL May 16, 2025 06:02
@Mario-DL Mario-DL removed the conflicts Backport PR wich git cherry pick failed label May 16, 2025
@Mario-DL Mario-DL self-requested a review May 16, 2025 06:03
@github-actions github-actions bot added the ci-pending PR which CI is running label May 16, 2025
@Mario-DL Mario-DL merged commit dd7e20a into 3.1.x May 19, 2025
21 checks passed
@Mario-DL Mario-DL deleted the mergify/bp/3.1.x/pr-5780 branch May 19, 2025 10:43