Skip to content

fix: autoware_test_manager:Fix flaky No subscriber for <topic> failure#1181

Open
liuXinGangChina wants to merge 2 commits into
autowarefoundation:mainfrom
liuXinGangChina:fix-issue-1177
Open

fix: autoware_test_manager:Fix flaky No subscriber for <topic> failure#1181
liuXinGangChina wants to merge 2 commits into
autowarefoundation:mainfrom
liuXinGangChina:fix-issue-1177

Conversation

@liuXinGangChina

Copy link
Copy Markdown
Contributor

Description

Fix flaky No subscriber for <topic> failures in autoware_planning_test_manager CI tests.

Two causes are addressed:

1. Race condition in publishToTargetNode()

autoware_test_utils.hpp:565-571 publishes a message and then immediately checks count_subscribers() once. DDS graph discovery is asynchronous — when the subscription is created moments earlier by subscribeOutput(), the node's graph cache may not yet reflect it. On loaded CI runners discovery can lag, causing a false-negative No subscriber exception. Publishing before confirming the subscription also means the message is dropped under volatile QoS.

Fix: replace the single count_subscribers() check with a spin/wait loop (5s timeout, 100ms interval) that actively drives DDS discovery, and publish only after the subscription is confirmed.

2. No RAII guard for rclcpp::init/rclcpp::shutdown

Each TEST() body manually calls rclcpp::init() and rclcpp::shutdown(). When the exception from (1) is thrown, shutdown() is skipped, leaving the ROS context initialized. All subsequent tests then fail with context is already initialized.

Fix: introduce a PlanningTestManagerTest fixture with SetUp/TearDown managing the ROS context lifecycle, and convert TEST()TEST_F().

Related links

Parent Issue:

How was this PR tested?

Built and tested in official core development Docker images:

Image Build 10× test loop
ghcr.io/autowarefoundation/autoware:core-devel-jazzy Pass 10/10 Pass
colcon build --packages-up-to autoware_planning_test_manager
./build/autoware_planning_test_manager/test_autoware_planning_test_manager

Notes for reviewers

autoware_test_utils is a shared testing library. Fixing publishToTargetNode() in the helper eliminates the same race for all downstream packages that use it.

Interface changes

None.

Effects on system behavior

None.

@github-actions

github-actions Bot commented Jun 12, 2026

Copy link
Copy Markdown

Thank you for contributing to the Autoware project!

🚧 If your pull request is in progress, switch it to draft mode.

Please ensure:

@liuXinGangChina liuXinGangChina added the run:build-and-test-differential Mark to enable build-and-test-differential workflow. (used-by-ci) label Jun 12, 2026
@codecov

codecov Bot commented Jun 12, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 37.50000% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.98%. Comparing base (704659a) to head (cb74e3c).

Files with missing lines Patch % Lines
...nclude/autoware_test_utils/autoware_test_utils.hpp 37.50% 3 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1181      +/-   ##
==========================================
+ Coverage   60.38%   61.98%   +1.60%     
==========================================
  Files         422      416       -6     
  Lines       25149    24442     -707     
  Branches    11911    11824      -87     
==========================================
- Hits        15185    15150      -35     
+ Misses       6999     6684     -315     
+ Partials     2965     2608     -357     
Flag Coverage Δ *Carryforward flag
daily-humble 66.97% <ø> (-0.01%) ⬇️ Carriedforward from 704659a
daily-jazzy 67.00% <ø> (-0.01%) ⬇️ Carriedforward from 704659a
differential-humble 53.26% <37.50%> (-2.67%) ⬇️
differential-jazzy 53.45% <37.50%> (-2.43%) ⬇️
total 50.38% <ø> (-0.01%) ⬇️ Carriedforward from 704659a
total-humble 66.97% <ø> (-0.01%) ⬇️ Carriedforward from 704659a
total-humble-agnocast 66.74% <ø> (-0.01%) ⬇️ Carriedforward from 704659a
total-jazzy 67.00% <ø> (-0.01%) ⬇️ Carriedforward from 704659a
total-jazzy-agnocast 66.79% <ø> (-0.01%) ⬇️ Carriedforward from 704659a

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@liuXinGangChina liuXinGangChina force-pushed the fix-issue-1177 branch 2 times, most recently from 2ed08b2 to cb324c0 Compare June 16, 2026 03:00
@mitsudome-r mitsudome-r requested a review from Copilot June 16, 2026 05:56

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reduces CI flakiness in autoware_planning_test_manager tests by making topic discovery/subscriber matching more robust before publishing, and by ensuring ROS context lifecycle is handled reliably even when a test fails.

Changes:

  • Add a bounded spin/wait loop in publishToTargetNode() to wait for subscriber discovery before publishing.
  • Introduce a PlanningTestManagerTest gtest fixture that manages rclcpp::init()/rclcpp::shutdown() via SetUp()/TearDown().
  • Convert affected tests from TEST() to TEST_F() to use the fixture.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
testing/autoware_test_utils/include/autoware_test_utils/autoware_test_utils.hpp Waits for subscriber discovery before publishing to avoid DDS discovery races and dropped messages under volatile QoS.
testing/autoware_planning_test_manager/test/test_planning_test_manager.cpp Uses a fixture to ensure ROS context shutdown happens even when a test fails mid-body.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +573 to +580
while (rclcpp::ok() && (std::chrono::steady_clock::now() - start) < kTimeout) {
if (target_node->count_subscribers(topic_name) > 0) {
break;
}
rclcpp::spin_some(test_node);
rclcpp::spin_some(target_node);
std::this_thread::sleep_for(kWaitInterval);
}
@liuXinGangChina liuXinGangChina force-pushed the fix-issue-1177 branch 2 times, most recently from 3180766 to af211b1 Compare June 16, 2026 07:47
liuXinGangChina and others added 2 commits June 17, 2026 09:52
…s in autoware_planning_test_manager CI tests (closes autowarefoundation#1177)

Signed-off-by: liuXinGangChina <lxg19892021@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component:test code in testing folder of core repo run:build-and-test-differential Mark to enable build-and-test-differential workflow. (used-by-ci)

Projects

Status: To Triage

Development

Successfully merging this pull request may close these issues.

3 participants