refactor(agnocastlib): Refactor epoll processing and clean up hardcoded event-specific logic#1245
Force-pushed from f8c0de0 to f37d50c.
Previously, when the set of events managed by epoll changed, each Executor was told to call `prepare_epoll_impl()` by setting the global atomic variable `need_epoll_updates` to true. This implementation would get in the way of upcoming refactoring aimed at extracting event-specific processing from `agnocast_epoll.hpp` and `.cpp`.

This commit removes the global variable and introduces an alternative notification mechanism. The current implementation only supports broadcasting notifications to all Executors, which leaves some performance issues unresolved, but it establishes a per-Executor (1-to-1) tracking structure. This lays the groundwork for targeted 1-to-1 notifications in the future.

Relates to: #969

Signed-off-by: Takumi Jin <primenumber_2_3_5@yahoo.co.jp>
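A minimal sketch of the per-Executor tracking idea, assuming each tracker owns an atomic flag that the dispatcher sets and the Executor clears with `exchange` (edge-triggered). The class names echo this PR, but the member layout, the initial `true` state, the `id()`/`size()` helpers, and unregistration through a back-pointer in the destructor are assumptions for illustration, not agnocastlib's actual code. The `notify(int)` overload mirrors the dispatcher code reviewed later in this PR.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <utility>

// Hypothetical sketch; names follow the PR, but layout and semantics are assumptions.
struct TrackerContext
{
  // Assumption: start flagged so a new Executor prepares epoll on its first spin.
  std::atomic<bool> need_update{true};
};

class EpollUpdateDispatcher;

class EpollUpdateTracker
{
public:
  EpollUpdateTracker(EpollUpdateDispatcher * dispatcher, int id, std::shared_ptr<TrackerContext> ctx)
  : dispatcher_(dispatcher), id_(id), ctx_(std::move(ctx))
  {
  }
  EpollUpdateTracker(EpollUpdateTracker && other) noexcept
  : dispatcher_(std::exchange(other.dispatcher_, nullptr)), id_(other.id_), ctx_(std::move(other.ctx_))
  {
  }
  EpollUpdateTracker(const EpollUpdateTracker &) = delete;
  EpollUpdateTracker & operator=(const EpollUpdateTracker &) = delete;
  EpollUpdateTracker & operator=(EpollUpdateTracker &&) = delete;
  ~EpollUpdateTracker();  // unregisters from the dispatcher

  int id() const { return id_; }

  // Edge-triggered: reports a pending update once, then clears the flag.
  bool need_update() { return ctx_->need_update.exchange(false, std::memory_order_acq_rel); }

private:
  EpollUpdateDispatcher * dispatcher_;  // assumed to outlive this tracker
  int id_;
  std::shared_ptr<TrackerContext> ctx_;
};

class EpollUpdateDispatcher
{
public:
  EpollUpdateTracker create_tracker()
  {
    const int id = next_id_.fetch_add(1, std::memory_order_relaxed);
    auto ctx = std::make_shared<TrackerContext>();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      trackers_.emplace(id, ctx);
    }
    return EpollUpdateTracker(this, id, ctx);
  }

  // Broadcast: flag every registered tracker.
  void notify_all()
  {
    std::lock_guard<std::mutex> lock(mutex_);
    for (auto & entry : trackers_) {
      entry.second->need_update.store(true, std::memory_order_release);
    }
  }

  // Targeted: flag a single tracker, ignoring unknown ids.
  void notify(int id)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = trackers_.find(id);
    if (it != trackers_.end()) {
      it->second->need_update.store(true, std::memory_order_release);
    }
  }

  void unregister(int id)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    trackers_.erase(id);
  }

  std::size_t size()
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return trackers_.size();
  }

private:
  std::atomic<int> next_id_{0};
  std::mutex mutex_;
  std::unordered_map<int, std::shared_ptr<TrackerContext>> trackers_;
};

EpollUpdateTracker::~EpollUpdateTracker()
{
  if (dispatcher_ != nullptr) {
    dispatcher_->unregister(id_);
  }
}
```

Each Executor would hold one tracker and check `need_update()` in its spin loop. Because every tracker has its own flag, clearing it in one Executor does not affect any other Executor's pending notification.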
Previously, all event-specific logic was hardcoded in `agnocast_epoll.cpp` and `.hpp`. This caused dependency issues and made it difficult to add new event types. This commit refactors the code by categorizing events and moving their implementations into separate source files.

Key changes include:

- Change the `epoll_data` format from `u32` to `u64` so it can hold both the event kind and a local identifier.
- Introduce `EpollManager` to manage events and dispatch those received from the `Epoll` class.
- Introduce `EpollEventSource` as an abstract base class for event-specific handlers.
- Replace raw uses of the `epoll_fd_` file descriptor with the encapsulated `Epoll` class.
- Move implementation details from headers to `.cpp` files.

Relates to: #969

Signed-off-by: Takumi Jin <primenumber_2_3_5@yahoo.co.jp>
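The commit does not say how the 64 bits are split. One plausible layout, assumed here, puts the event kind in the upper 32 bits and the per-kind local identifier in the lower 32 bits; `EpollEventKind` and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical event kinds; the PR's actual enumeration may differ.
enum class EpollEventKind : uint32_t { Subscription = 0, Timer = 1, Shutdown = 2 };

// Pack the kind into the high 32 bits and the local id into the low 32 bits.
constexpr uint64_t pack_epoll_data(EpollEventKind kind, uint32_t local_id)
{
  return (static_cast<uint64_t>(kind) << 32) | local_id;
}

constexpr EpollEventKind unpack_kind(uint64_t data)
{
  return static_cast<EpollEventKind>(static_cast<uint32_t>(data >> 32));
}

constexpr uint32_t unpack_local_id(uint64_t data)
{
  return static_cast<uint32_t>(data & 0xFFFFFFFFu);
}

// Round-trip checks evaluated at compile time.
static_assert(unpack_kind(pack_epoll_data(EpollEventKind::Timer, 42)) == EpollEventKind::Timer);
static_assert(unpack_local_id(pack_epoll_data(EpollEventKind::Timer, 42)) == 42);
```

A fixed-width id field like this bounds how large a local identifier may grow, which is the kind of limit the `MAX_TIMER_ID` and `MAX_CALLBACK_INFO_ID` constants in the file summary would plausibly guard.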
Force-pushed from dd979fe to 0194636.
Pull request overview
This PR refactors agnocastlib’s epoll integration to remove the global “polling for updates” flag (`need_epoll_updates`), introduce per-executor update tracking, and decouple event-type-specific epoll handling into dedicated event-source classes.
Changes:
- Replaced the global epoll-update flag with an `EpollUpdateDispatcher`/`EpollUpdateTracker` mechanism so each executor can independently decide when to re-prepare epoll.
- Introduced an `EpollManager` + `EpollEventSource` abstraction, plus an `Epoll` wrapper that packs `(event_type, local_id)` into `epoll_event.data.u64`.
- Moved subscription/timer epoll prepare/handle logic out of `agnocast_epoll.cpp` into `SubscriptionEventSource` and `TimerEventSource`.
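The event-source pattern can be sketched with a real `epoll` instance: each registered fd carries a packed `(kind, local_id)` in `epoll_event.data.u64`, and the manager routes each ready event to the source registered for its kind. This is a simplified, hypothetical model (Linux-only, with `eventfd` standing in for message queues and timers); `EventKind`, `CounterEventSource`, `set_source()`, `add()`, and `wait_and_dispatch()` are illustrative names, not agnocastlib's API.

```cpp
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical event kinds; Count is only used to size the source table.
enum class EventKind : uint32_t { Counter = 0, Count };

// Base class for event-specific handlers (mirrors the EpollEventSource idea).
class EpollEventSource
{
public:
  virtual ~EpollEventSource() = default;
  // Receives the local id unpacked from epoll_event.data.u64.
  virtual void handle(uint32_t local_id) = 0;
};

class EpollManager
{
public:
  EpollManager() : epfd_(epoll_create1(EPOLL_CLOEXEC))
  {
    if (epfd_ < 0) throw std::runtime_error("epoll_create1 failed");
  }
  ~EpollManager() { close(epfd_); }
  EpollManager(const EpollManager &) = delete;
  EpollManager & operator=(const EpollManager &) = delete;

  void set_source(EventKind kind, std::unique_ptr<EpollEventSource> source)
  {
    sources_[static_cast<std::size_t>(kind)] = std::move(source);
  }

  // Register fd with the kind in the high 32 bits and the local id in the low 32 bits.
  void add(int fd, EventKind kind, uint32_t local_id)
  {
    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.u64 = (static_cast<uint64_t>(kind) << 32) | local_id;
    if (epoll_ctl(epfd_, EPOLL_CTL_ADD, fd, &ev) < 0) throw std::runtime_error("epoll_ctl failed");
  }

  // Wait once and route each ready event to the source registered for its kind.
  int wait_and_dispatch(int timeout_ms)
  {
    std::array<epoll_event, 16> events{};
    const int n = epoll_wait(epfd_, events.data(), static_cast<int>(events.size()), timeout_ms);
    for (int i = 0; i < n; ++i) {
      const uint64_t data = events[i].data.u64;
      const auto kind = static_cast<std::size_t>(data >> 32);
      if (kind < sources_.size() && sources_[kind]) {
        sources_[kind]->handle(static_cast<uint32_t>(data));
      }
    }
    return n;
  }

private:
  int epfd_;
  std::array<std::unique_ptr<EpollEventSource>, static_cast<std::size_t>(EventKind::Count)> sources_;
};

// Example source: drains one eventfd per local id and remembers which id fired last.
class CounterEventSource : public EpollEventSource
{
public:
  explicit CounterEventSource(std::vector<int> fds) : fds_(std::move(fds)) {}
  void handle(uint32_t local_id) override
  {
    uint64_t value = 0;
    if (read(fds_.at(local_id), &value, sizeof(value)) == static_cast<ssize_t>(sizeof(value))) {
      last_id = local_id;
    }
  }
  int64_t last_id = -1;

private:
  std::vector<int> fds_;
};
```

In a design like this, adding a new event type means adding an enum value and an `EpollEventSource` subclass, without touching the manager's dispatch loop.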
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/agnocastlib/test/unit/test_mocked_agnocast.cpp | Updates ID overflow tests to use new max-ID constants. |
| src/agnocastlib/src/node/agnocast_only_single_threaded_executor.cpp | Switches epoll update triggering to epoll_update_tracker_ + epoll_manager_. |
| src/agnocastlib/src/node/agnocast_only_multi_threaded_executor.cpp | Same as above for the MT executor variant. |
| src/agnocastlib/src/node/agnocast_only_executor.cpp | Replaces raw epoll fd usage with EpollManager and registers shutdown via manager. |
| src/agnocastlib/src/agnocast_timer_info.cpp | Replaces global update flag with dispatcher notification; adds TimerEventSource. |
| src/agnocastlib/src/agnocast_single_threaded_executor.cpp | Uses per-executor update tracker instead of global flag. |
| src/agnocastlib/src/agnocast_multi_threaded_executor.cpp | Uses per-executor update tracker instead of global flag. |
| src/agnocastlib/src/agnocast_executor.cpp | Introduces EpollManager and per-executor tracker; delegates prepare/wait to manager. |
| src/agnocastlib/src/agnocast_epoll.cpp | Replaces monolithic event handling with EpollManager dispatch over event sources. |
| src/agnocastlib/src/agnocast_epoll_update_dispatcher.cpp | Adds dispatcher/tracker implementation. |
| src/agnocastlib/src/agnocast_epoll_event.cpp | Adds Epoll wrapper implementation for add/remove/wait with packed event data. |
| src/agnocastlib/src/agnocast_callback_info.cpp | Adds SubscriptionEventSource prepare/handle and new callback-id max check. |
| src/agnocastlib/include/agnocast/node/agnocast_only_executor.hpp | Stores EpollManager + EpollUpdateTracker instead of raw epoll fd. |
| src/agnocastlib/include/agnocast/agnocast_timer_info.hpp | Adds MAX_TIMER_ID and declares TimerEventSource. |
| src/agnocastlib/include/agnocast/agnocast_executor.hpp | Stores EpollManager + EpollUpdateTracker instead of raw epoll fd. |
| src/agnocastlib/include/agnocast/agnocast_epoll.hpp | Defines event-source abstraction and EpollManager. |
| src/agnocastlib/include/agnocast/agnocast_epoll_update_dispatcher.hpp | Declares dispatcher/tracker API. |
| src/agnocastlib/include/agnocast/agnocast_epoll_event.hpp | Declares packed epoll event encoding + Epoll wrapper API. |
| src/agnocastlib/include/agnocast/agnocast_callback_info.hpp | Adds MAX_CALLBACK_INFO_ID, dispatcher notification, and declares SubscriptionEventSource. |
| src/agnocastlib/CMakeLists.txt | Adds new source files to the shared library build. |
Comments suppressed due to low confidence (1)
src/agnocastlib/include/agnocast/agnocast_epoll.hpp:10
`agnocast_epoll.hpp` uses `std::function`, `std::array`, and `std::unique_ptr` but does not include the corresponding standard headers. This makes the header non-self-contained and can cause compilation failures depending on include order. Add the missing includes (e.g., `<functional>`, `<array>`, `<memory>`) in this header.

```cpp
#include "agnocast/agnocast_epoll_event.hpp"
#include <rclcpp/callback_group.hpp>
#include <atomic>
#include <mutex>
#include <shared_mutex>
#include <vector>
```
Coverage Report (jazzy)
Coverage Report (humble)
Signed-off-by: Takumi Jin <primenumber_2_3_5@yahoo.co.jp>
Signed-off-by: Takumi Jin <primenumber_2_3_5@yahoo.co.jp>
Pull request overview
Copilot reviewed 20 out of 20 changed files in this pull request and generated 8 comments.
```diff
 while (spinning_.load()) {
-  if (need_epoll_updates.load()) {
+  if (epoll_update_tracker_.need_update()) {
     add_callback_groups_from_nodes_associated_to_executor();
-    agnocast::prepare_epoll_impl(
-      epoll_fd_, my_pid_, ready_agnocast_executables_mutex_, ready_agnocast_executables_,
-      [this](const rclcpp::CallbackGroup::SharedPtr & group) {
-        return is_callback_group_associated(group);
-      });
+    epoll_manager_->prepare_epoll([this](const rclcpp::CallbackGroup::SharedPtr & group) {
+      return is_callback_group_associated(group);
+    });
   }
```
`epoll_update_tracker_.need_update()` is edge-triggered (it clears the flag via `exchange`). If `prepare_epoll()` skips registrations because `is_callback_group_associated(group)` is false (e.g., a subscription/timer created before the node/callback group is added/associated), the corresponding `need_epoll_update` flags stay true, but this loop won’t call `prepare_epoll()` again unless some unrelated `notify_all()` happens. This can leave fds permanently unregistered in this executor’s epoll set. Consider re-notifying (or keeping this executor’s tracker flagged) while there are pending `need_epoll_update` entries, and/or triggering an epoll-update notification from `add_node`/callback-group association changes.
Fixed in f92d376. In this commit, I updated `AgnocastOnlyExecutor` to send an epoll update notification at the end of methods like `add_node()` and `add_callback_group()`. This fixes the issue you pointed out.
Previously, the Executor did not trigger an epoll update inside `add_node()`, `remove_node()`, `add_callback_group()`, or `remove_callback_group()`. This risked missing epoll update notifications in the following scenario:

1. A Node instance is created.
2. An Entity is added to the Node's CallbackGroup (an epoll update notification is sent here).
3. The Executor receives the notification and processes the epoll update. However, because the Node is not yet associated with the Executor, it skips the epoll registration for that Entity.
4. `add_node()` is called to associate the Node with the Executor. No epoll update is triggered here, so the Entity stays unregistered.

To fix this bug, we now trigger an epoll update notification whenever the Nodes or CallbackGroups managed by the Executor change.

Signed-off-by: Takumi Jin <primenumber_2_3_5@yahoo.co.jp>
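The missed-notification scenario can be replayed with a small single-threaded model, assuming an edge-triggered tracker flag and a per-entity `need_epoll_update` flag. All names here are hypothetical. `run_scenario(false)` follows steps 1 to 4 without the fix and leaves the entity unregistered, while `run_scenario(true)` notifies from `add_node()`, so the next spin iteration registers it.

```cpp
#include <atomic>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of the race; not agnocastlib's actual types.
struct Entity
{
  std::string group;              // callback group the entity belongs to
  bool need_epoll_update = true;  // set when the entity is created
  bool registered = false;        // true once added to this executor's epoll set
};

struct ModelExecutor
{
  std::atomic<bool> tracker_flag{false};  // stands in for the per-executor tracker
  std::set<std::string> associated_groups;
  bool notify_on_add_node;                // toggles the fix

  explicit ModelExecutor(bool fix) : notify_on_add_node(fix) {}

  void add_node(const std::string & group)
  {
    associated_groups.insert(group);
    if (notify_on_add_node) {
      tracker_flag.store(true);  // the fix: association changes trigger an update
    }
  }

  // One spin iteration: prepare epoll only when notified (edge-triggered).
  void spin_once(std::vector<Entity> & entities)
  {
    if (!tracker_flag.exchange(false)) {
      return;
    }
    for (auto & e : entities) {
      if (!e.need_epoll_update) continue;
      if (associated_groups.count(e.group) == 0) continue;  // skipped; flag stays set
      e.registered = true;
      e.need_epoll_update = false;
    }
  }
};

// Replays steps 1-4 and reports whether the entity ends up registered.
bool run_scenario(bool fix)
{
  ModelExecutor exec(fix);
  std::vector<Entity> entities;
  entities.push_back(Entity{"group_a"});  // steps 1-2: entity created...
  exec.tracker_flag.store(true);          // ...and a notification is sent
  exec.spin_once(entities);               // step 3: not associated yet, registration skipped
  exec.add_node("group_a");               // step 4: node associated with the executor
  exec.spin_once(entities);               // next spin iteration
  return entities[0].registered;
}
```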
Signed-off-by: Takumi Jin <primenumber_2_3_5@yahoo.co.jp>
Force-pushed from 3471353 to 547f94e.
Signed-off-by: Takumi Jin <primenumber_2_3_5@yahoo.co.jp>
Signed-off-by: Takumi Jin <primenumber_2_3_5@yahoo.co.jp>
Pull request overview
Copilot reviewed 20 out of 20 changed files in this pull request and generated 5 comments.
```cpp
    mq_receive(callback_info.mqdes, reinterpret_cast<char *>(&mq_msg), sizeof(mq_msg), nullptr);
  if (ret < 0) {
    if (errno != EAGAIN) {
      RCLCPP_ERROR(logger, "mq_receive failed: %s", strerror(errno));
```
`SubscriptionEventSource::handle()` logs `mq_receive failed` without identifying which subscription/topic caused the failure. This loses debugging context compared to the prior implementation (topic name + `subscriber_id`). Consider including `callback_info.topic_name` and `callback_info.subscriber_id` in the error output so operational logs remain actionable.
```diff
-RCLCPP_ERROR(logger, "mq_receive failed: %s", strerror(errno));
+RCLCPP_ERROR(
+  logger, "mq_receive failed for topic '%s' subscriber_id %u: %s",
+  callback_info.topic_name.c_str(), callback_info.subscriber_id, strerror(errno));
```
```cpp
#include "agnocast/agnocast.hpp"
#include "agnocast/agnocast_epoll_event.hpp"
#include "agnocast/agnocast_mq.hpp"
```
`agnocast_epoll.cpp` includes `agnocast/agnocast_mq.hpp`, but this file no longer uses any MQ symbols after the refactor. Please remove the unused include to avoid unnecessary coupling and reduce rebuild scope when MQ headers change.
```diff
-#include "agnocast/agnocast_mq.hpp"
```
| struct AgnocastExecutable; | ||
|
|
`struct AgnocastExecutable;` is forward-declared here but not used anywhere in this translation unit after the refactor. Dropping the unused forward declaration will keep the file tidy and avoid confusion about hidden dependencies.
```diff
-struct AgnocastExecutable;
```
```cpp
#include "agnocast/agnocast_tracepoint_wrapper.h"
#include "rclcpp/rclcpp.hpp"
#include "rclcpp/version.h"
#include "sys/epoll.h"
```
`agnocast_executor.cpp` still includes `sys/epoll.h`, but `AgnocastExecutor` no longer calls `epoll_create1`/`epoll_ctl` directly after switching to `EpollManager`. Consider removing this now-unused include to reduce dependencies and rebuild scope.
```diff
-#include "sys/epoll.h"
```
```cpp
EpollUpdateTracker EpollUpdateDispatcher::create_tracker()
{
  int new_id = next_tracker_id_.fetch_add(1, std::memory_order_relaxed);

  auto context = std::make_shared<TrackerContext>();

  {
    std::lock_guard<std::mutex> lock(mutex_);
    trackers_.emplace(new_id, context);
  }

  return {new_id, context};
}

void EpollUpdateDispatcher::notify_all()
{
  std::lock_guard<std::mutex> lock(mutex_);
  for (auto & [id, context] : trackers_) {
    context->need_update.store(true, std::memory_order_release);
  }
}

void EpollUpdateDispatcher::notify(int tracker_id)
{
  std::lock_guard<std::mutex> lock(mutex_);
  auto it = trackers_.find(tracker_id);
  if (it != trackers_.end()) {
    it->second->need_update.store(true, std::memory_order_release);
  }
}
```
`EpollUpdateDispatcher`/`EpollUpdateTracker` introduce new concurrency-critical behavior (per-executor edge-triggered update flags), but there are no unit tests validating the expected semantics (initial state, `notify` vs. `notify_all`, flag reset via `need_update()`, and unregistration on destruction). Adding a small gtest for these basics would help prevent regressions in this refactor.
Description
This PR refactors `agnocast_epoll.cpp` and `.hpp` to eliminate tight dependencies, improve code extensibility, and optimize how Executors handle epoll updates.

Key Changes & Motivations

Replaced Global Polling with Individual Notifications

- Removed the `need_epoll_updates` global variable and replaced the polling approach with a new mechanism that notifies each Executor individually when an update is needed.

Decoupled Event Handling

- Moved the event-specific logic out of `agnocast_epoll.cpp` and `.hpp` and into respective, event-specific files.

Future-proofing

Related links
close #969
How was this PR tested?
- `bash scripts/test/e2e_test_1to1.bash` (required)
- `bash scripts/test/e2e_test_2to2.bash` (required)
- `bash scripts/test/run_requires_kernel_module_tests.bash` (required)

Notes for reviewers
Version Update Label (Required)
Please add exactly one of the following labels to this PR:
- `need-major-update`: User API breaking changes
- `need-minor-update`: Internal API breaking changes (heaphook/kmod/agnocastlib compatibility)
- `need-patch-update`: Bug fixes and other changes

Important notes:
- For `need-major-update` or `need-minor-update`, please include this in the PR title as well. Example: `fix(foo)[needs major version update]: bar` or `feat(baz)[needs minor version update]: qux`
- `run-build-test` label. The PR can only be merged after the build tests pass.

See CONTRIBUTING.md for detailed versioning rules.