[gpio_controllers] Fix crash on NaN state values and sanitize inputs#2103
Ishan1923 wants to merge 16 commits into ros-controls:master
Signed-off-by: Ishan1923 <ecdev4ishan@gmail.com>
Signed-off-by: Ishan1923 <ecdev4ishan@gmail.com>
Signed-off-by: Ishan1923 <ecdev4ishan@gmail.com>
Any news on this? Thanks in advance for all the effort @Ishan1923
Hi @JavierIntermodalicsKion, thanks for checking in! The PR is fully ready and tested on my end. Currently, I am just waiting for a maintainer to approve the CI workflows (since I am a new contributor, they don't run automatically) and for a code review. Fingers crossed we can get this merged soon!
Signed-off-by: Ishan1923 <ecdev4ishan@gmail.com>
@Juliaj Thanks for the review! I have addressed your comments:
(Note: I encountered a local dependency race condition with the parameter generation during build, but the code changes themselves are verified. I was just trying to re-verify my changes by rebuilding.)
christophfroehlich left a comment
I don't think that this fixes the reported problem. The exception happens in
state_interfaces_map_.at(interface_name).get().get_optional() with the default template parameter (which is double), but only on Jazzy:
On Rolling, we auto-cast to double with a single warning per interface
https://github.com/ros-controls/ros2_control/blob/b59bd9c48067ad989caf4a719afc529b8f61045b/hardware_interface/include/hardware_interface/handle.hpp#L339-L354
On Jazzy, we throw an exception instead
https://github.com/ros-controls/ros2_control/blob/43d058f1fae58a2cecd5a708e7cce61f554a9bb0/hardware_interface/include/hardware_interface/handle.hpp#L361-L374
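To make the difference described above concrete, here is a minimal sketch of the two behaviours. `TypedHandle`, `get_optional_strict` and `get_optional_casting` are hypothetical names made up for illustration; this is not the real `hardware_interface::Handle`, only a model of "throw on type mismatch" versus "cast to double".

```cpp
#include <optional>
#include <stdexcept>
#include <variant>

// Hypothetical stand-in for a state handle; NOT the real hardware_interface::Handle.
class TypedHandle
{
public:
  explicit TypedHandle(std::variant<double, bool> value) : value_(value) {}

  // Jazzy-like behaviour (assumption): asking for the wrong type throws.
  template <typename T = double>
  std::optional<T> get_optional_strict() const
  {
    if (!std::holds_alternative<T>(value_))
    {
      throw std::runtime_error("handle does not hold the requested type");
    }
    return std::get<T>(value_);
  }

  // Rolling-like behaviour (assumption): non-double values are cast to double.
  std::optional<double> get_optional_casting() const
  {
    return std::visit([](auto v) { return static_cast<double>(v); }, value_);
  }

private:
  std::variant<double, bool> value_;
};
```

With a bool-backed handle, the strict getter with its default `double` template argument throws, while the casting getter yields 1.0 for `true`. That is why a controller that always asks for `double` only crashes in the strict case.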
Thanks @christophfroehlich, I wasn't aware of this difference between Jazzy and Rolling. At a high level, how should this be handled?
Currently: either skip non-double interfaces, or do explicit casting (which needs the correct template argument for the get_optional() method). I'd vote for auto-casting of interfaces to make these custom controllers more reusable. This is an always-recurring discussion with @saikishor ;)
I wouldn't vote for this approach. IMO the controller should handle them. If you know you cannot deal with certain types, fail the activation or do something. If we have arrays in the future, it gets complicated quickly. Instead, I would add a check for which types are and can be supported by the controller. If this casting is only needed for publishing the data, that is where the issue lies. Add a standalone function that does the casting and let's see how many ros2_controllers will need it. I understand that GPIO will need it, as it is a more generic controller for the interfaces.
Signed-off-by: Ishan1923 <ecdev4ishan@gmail.com>
Thanks for the clarification on the Rolling vs. Jazzy versions. I apologize for the oversight.
Regarding @saikishor's point on future complexity (e.g., handling arrays): the current sequential approach solves the immediate crash efficiently without unnecessary lookups. I was unsure about arrays and other data types, as the current problem only concerns the bool/double issue. Ready for review.
christophfroehlich left a comment
Please make the logic explicitly dependent on the data type using
state_interfaces_map_.at(interface_name).get().get_data_type()
Furthermore, please install pre-commit, activate it for this repo (pre-commit install) and reformat your changes.
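A rough sketch of what "explicitly dependent on the data type" could look like. `DataType`, `FakeStateInterface` and `read_as_double` are hypothetical stand-ins invented for this example; only the idea of branching on `get_data_type()` comes from the review comment.

```cpp
#include <optional>
#include <variant>

// Hypothetical mirror of the handle's data type tag (assumption for this sketch).
enum class DataType { DOUBLE, BOOL, UNKNOWN };

// Hypothetical minimal state interface: a type tag plus the stored value.
struct FakeStateInterface
{
  DataType type;
  std::variant<std::monostate, double, bool> value;

  DataType get_data_type() const { return type; }
};

// Branch on the declared data type and only read the value with the matching type.
// Unsupported types yield std::nullopt so the caller can skip or warn.
std::optional<double> read_as_double(const FakeStateInterface & iface)
{
  switch (iface.get_data_type())
  {
    case DataType::DOUBLE:
      return std::get<double>(iface.value);
    case DataType::BOOL:
      return std::get<bool>(iface.value) ? 1.0 : 0.0;
    default:
      return std::nullopt;
  }
}
```

Returning an empty optional for unknown types keeps the decision (skip, warn, or fail activation) with the caller, which seems closer to the "controller should handle them" position in this thread.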
Signed-off-by: Ishan1923 <ecdev4ishan@gmail.com>
Local tests passed successfully. Ready for re-review!
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #2103 +/- ##
==========================================
- Coverage 84.77% 84.76% -0.01%
==========================================
Files 153 153
Lines 15236 15260 +24
Branches 1322 1324 +2
==========================================
+ Hits 12916 12935 +19
- Misses 1838 1842 +4
- Partials 482 483 +1
christophfroehlich left a comment
This is going in the right direction :)
Two things:
- ONCE macro does not work like this
- compiling on humble fails with
[ RUN ] GpioCommandControllerTestSuite.UpdateBoolGpioInterfaces
[INFO] [1770019124.709374624] [test_gpio_command_controller]: configure successful
[INFO] [1770019124.709482175] [test_gpio_command_controller]: activate successful
terminate called after throwing an instance of 'std::logic_error'
what(): basic_string::_M_construct null not valid
>>>
I have added an unordered_set for storing NaN interfaces and used RCLCPP_INFO. To log only once, an if statement checks whether the interface is already in the set, which is an O(1) lookup on average. I removed the type.to_string() call in the log message, which was causing the basic_string::_M_construct null crash. Local tests (test_gpio_command_controller) are now passing successfully.
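For readers following along, the "log once per interface" idea can be sketched roughly as below. `OncePerInterface` and `first_time` are hypothetical names for illustration, not the code in this PR; the sketch only shows the set-membership part and leaves the actual logging call out.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical helper: remembers which interface names have already been reported.
class OncePerInterface
{
public:
  // Returns true only the first time a given interface name is seen.
  bool first_time(const std::string & interface_name)
  {
    // insert() returns {iterator, bool}; the bool is true only if the name was newly added.
    return seen_.insert(interface_name).second;
  }

private:
  std::unordered_set<std::string> seen_;
};
```

A caller would wrap its log statement in `if (tracker.first_time(name)) { ... }`, so a single `insert` does both the lookup and the bookkeeping.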
idk why, but the ros2_control stack including this patch fails in the compatibility-build on humble:
[6193](https://github.com/ros-controls/ros2_controllers/actions/runs/21919550699/job/63317000871?pr=2103#step:6:16206)
[ RUN ] GpioCommandControllerTestSuite.UpdateBoolGpioInterfaces
[INFO] [1770849228.540379897] [test_gpio_command_controller]: configure successful
[INFO] [1770849228.540464415] [test_gpio_command_controller]: activate successful
terminate called after throwing an instance of 'std::logic_error'
what(): basic_string::_M_construct null not valid
This job uses this repos file, but builds it on humble distro.
Can you have a look please?
I looked into it; I think this is most likely related to an ABI break. I added an unordered_set nan_interfaces_ to keep a record of the NaN interfaces, so the CI must be linking my changes against an old compiled library, which would cause a memory layout mismatch or a One Definition Rule (ODR) violation. I propose that nan_interfaces_ be removed from the header file and introduced as a local variable in the cpp file, or that we use DEBUG logging or RCLCPP_WARN_THROTTLE in the cpp file. Pushing the fix shortly.
Signed-off-by: Ishan1923 <ecdev4ishan@gmail.com>
…923/ros2_controllers into fix/gpio-issue-1970-clean
It seems that this issue in the compatibility build was introduced somewhere else, as it is failing everywhere in the CI jobs now. Not sure why this happens suddenly and only in the compatibility builds. I'll come back once we have fixed them.
Thanks for the update! I was worried I broke the whole stack!
any updates?
Thank you to everyone for the effort on this, it is much appreciated! I have tried this fix and noticed that this PR modifies the Maybe the
bool success {false};
if (type == hardware_interface::HandleDataType::DOUBLE) {
  success = interface.set_value<double>(command_value);
} else if (type == hardware_interface::HandleDataType::BOOL) {
  // Assuming some convert_to_bool logic is available
  success = interface.set_value<bool>(convert_to_bool(command_value));
} else {
  RCLCPP_WARN_THROTTLE(
    get_node()->get_logger(), *get_node()->get_clock(), 10000,
    "Interface '%s' has unsupported type. Only 'double' and 'bool' are supported.",
    interface_name.c_str());
}
Disclaimer: I'm still finding my way around ros2_control, so I might be missing something in the bigger picture... But please let me know if I can be of any help.
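The snippet above assumes a `convert_to_bool` helper exists. One possible shape for it is sketched here; the name comes from the comment, but the body is purely an assumption (NaN and 0.0 map to `false`, anything else to `true`), and a threshold such as 0.5 would be an equally plausible choice.

```cpp
#include <cmath>

// Hypothetical helper for the command path: map a double command onto a bool.
// Assumption: NaN (no command) and 0.0 mean "off", every other value means "on".
inline bool convert_to_bool(double command_value)
{
  if (std::isnan(command_value))
  {
    return false;
  }
  return command_value != 0.0;
}
```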
@christophfroehlich @johanubbink You're absolutely right; I missed
This PR attempts to address Issue #1970 ("Boolean data type signals are not supported by GPIO Controller"). Users were encountering crashes and exceptions when reading states, likely due to uninitialized values or type mismatches.
I have introduced a sanitize_double helper function to handle state readings.
Approach & Reasoning:
Instead of a hard cast to bool, this function detects NaN (uninitialized memory) or invalid values and defaults them safely to 0.0. This stops the crashes while preserving the original double value if valid.
This supports both use cases:
I have also updated the ReproduceBadCastCrash test case to EXPECT_NO_THROW to verify that these edge cases are now handled gracefully.
Resolves #1970
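As an illustration of the approach described above, a helper in the spirit of sanitize_double might look like the following. The signature and the choice to treat infinities as invalid are assumptions made for this sketch, not necessarily what the PR implements.

```cpp
#include <cmath>

// Sketch of a sanitize_double-style helper; signature and exact rules are assumptions.
// Valid values pass through unchanged, NaN and (by assumption) +/- infinity become 0.0.
inline double sanitize_double(double value)
{
  // std::isfinite is false for NaN and for +/- infinity.
  return std::isfinite(value) ? value : 0.0;
}
```

Note that this only sanitizes values that are already doubles. As discussed in the review, it does not by itself avoid the type-mismatch exception raised when a bool interface is read as double.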