[gpio_controllers] Fix crash on NaN state values and sanitize inputs #2103

Open
Ishan1923 wants to merge 16 commits into ros-controls:master from Ishan1923:fix/gpio-issue-1970-clean

@Ishan1923
Contributor

This PR attempts to address Issue #1970 ("Boolean data type signals are not supported by GPIO Controller"). Users were encountering crashes and exceptions when reading states, likely due to uninitialized values or type mismatches.

I have introduced a sanitize_double helper function to handle state readings.

Approach & Reasoning:
Instead of a hard cast to bool, this function detects NaN (uninitialized memory) or invalid values and defaults them safely to 0.0. This stops the crashes while preserving the original double value if valid.

This supports both use cases:

  • Boolean logic: Works as expected (Values like 0.0 and 1.0 pass through).
  • Analog logic: Remains supported (Values like 3.1 or 5.0 are preserved).
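
A minimal sketch of what such a helper could look like. The name sanitize_double comes from the description above, but the body is an assumption, not the code in this PR. In particular, treating infinities as "invalid values" is a guess.

```cpp
#include <cmath>

// Hypothetical sketch of the helper described above; the PR's actual
// implementation may differ.
// Returns the value unchanged when it is finite, and 0.0 for NaN or +/-inf
// (treating infinities as invalid is an assumption).
inline double sanitize_double(double value)
{
  return std::isfinite(value) ? value : 0.0;
}
```

With this sketch, 0.0, 1.0, 3.1 and 5.0 all pass through unchanged, while a NaN reading becomes 0.0.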

I have also updated the ReproduceBadCastCrash test case to EXPECT_NO_THROW to verify that these edge cases are now handled gracefully.

Resolves #1970

Signed-off-by: Ishan1923 <ecdev4ishan@gmail.com>
Signed-off-by: Ishan1923 <ecdev4ishan@gmail.com>
Signed-off-by: Ishan1923 <ecdev4ishan@gmail.com>
@JavierIntermodalicsKion

Any news on this? Thanks in advance for all the effort @Ishan1923

@Ishan1923
Contributor Author

Any news on this? Thanks in advance for all the effort @Ishan1923

Hi @JavierIntermodalicsKion, thanks for checking in! The PR is fully ready and tested on my end. Currently, I am just waiting for a maintainer to approve the CI workflows (since I am a new contributor, they don't run automatically) and for a code review. Fingers crossed we can get this merged soon!

Signed-off-by: Ishan1923 <ecdev4ishan@gmail.com>
@Ishan1923
Contributor Author

@Juliaj Thanks for the review! I have addressed your comments:

  • Removed the temporary comments.
  • Renamed the test case to UpdateBoolGpioInterfaces.
  • Verified pre-commit passes.

(Note: I encountered a local dependency race condition with the parameter generation during build, but the code changes themselves are verified. I was just trying to re-verify my changes by rebuilding.)

@Ishan1923 Ishan1923 requested a review from Juliaj January 21, 2026 20:12
Contributor

@Juliaj Juliaj left a comment

LGTM

Member

@christophfroehlich christophfroehlich left a comment

I don't think that this fixes the reported problem. The exception happens in
state_interfaces_map_.at(interface_name).get().get_optional() with the default template parameter (which is double), but only on Jazzy:

On rolling, we auto-cast to double and emit a single warning per interface:
https://github.com/ros-controls/ros2_control/blob/b59bd9c48067ad989caf4a719afc529b8f61045b/hardware_interface/include/hardware_interface/handle.hpp#L339-L354

On Jazzy, we throw an exception instead:
https://github.com/ros-controls/ros2_control/blob/43d058f1fae58a2cecd5a708e7cce61f554a9bb0/hardware_interface/include/hardware_interface/handle.hpp#L361-L374
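
To make the difference concrete, here is a small mock of the two behaviours. It is not the hardware_interface code behind the links above: MockValue, read_strict and read_with_autocast are made-up names, and the warning that rolling prints is left out.

```cpp
#include <optional>
#include <stdexcept>
#include <variant>

// Stand-in for a state interface that stores either a double or a bool.
// This is NOT the hardware_interface::Handle class, only a mock.
using MockValue = std::variant<double, bool>;

// Strict read: asking for a double from a bool-typed value throws
// (the behaviour described for Jazzy).
inline std::optional<double> read_strict(const MockValue & value)
{
  if (!std::holds_alternative<double>(value)) {
    throw std::runtime_error("interface does not hold a double");
  }
  return std::get<double>(value);
}

// Lenient read: a bool-typed value is cast to 0.0 / 1.0
// (the behaviour described for rolling; the warning is omitted here).
inline std::optional<double> read_with_autocast(const MockValue & value)
{
  if (std::holds_alternative<bool>(value)) {
    return std::get<bool>(value) ? 1.0 : 0.0;
  }
  return std::get<double>(value);
}
```

A controller that always asks for a double would work with the lenient read and terminate with an uncaught exception under the strict one, which matches the report being specific to Jazzy.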

@Juliaj
Contributor

Juliaj commented Jan 30, 2026

I don't think that this fixes the reported problem. The exception happens in
state_interfaces_map_.at(interface_name).get().get_optional() with the default template parameter (which is double), but only on Jazzy:

Thanks @christophfroehlich, I wasn't aware of this difference between Jazzy and rolling. At a high level, how should this be handled?

@christophfroehlich
Member

christophfroehlich commented Jan 30, 2026

I don't think that this fixes the reported problem. The exception happens in
state_interfaces_map_.at(interface_name).get().get_optional() with the default template parameter (which is double), but only on Jazzy:

Thanks @christophfroehlich, I wasn't aware of this difference between Jazzy and rolling. At a high level, how should this be handled?

Currently: Either skip non-double interfaces, or do explicit casting (needs correct template of the get_optional() method)

I'd vote for auto-casting of interfaces to make these custom controllers more reusable. This is an always recurring discussion with @saikishor ;)

@saikishor
Member

I wouldn't vote for this approach. IMO the controller should handle them. If you know you cannot deal with certain types, fail the activation or do something. If we have arrays in the future, it gets complicated easily.

I would say instead: add a check for which types are and can be supported by the controller. If this casting is only needed for publishing the data, that's where the issue is. Add a standalone function that does the casting, and let's see how many ros2_controllers will need it.

I understand that GPIO will need it as it is a more generic controller for the interfaces.

Signed-off-by: Ishan1923 <ecdev4ishan@gmail.com>
@Ishan1923
Contributor Author

Thanks for the clarification on the Rolling vs. Jazzy versions. I apologize for the oversight.
I have updated apply_state_value in gpio_command_controller.cpp with the following changes:

  • Sequential logic: The code now checks for a double interface first. If that fails, it checks for a bool interface (casting it to 0.0/1.0).
  • Error handling: If the interface is neither double nor bool, it logs a single debug message stating that the data could not be retrieved.
  • Tests: Moved the new tests to test_gpio_command_controller.cpp.
  • Build: Linked the parameters library.

Regarding @saikishor's point on future complexity (e.g., handling arrays): the current sequential approach fixes the immediate crash without unnecessary lookups. I was unsure how to handle arrays and other data types, since this PR only addresses the bool/double issue.

Ready for review.

@Ishan1923 Ishan1923 requested a review from Juliaj February 1, 2026 09:04
Member

@christophfroehlich christophfroehlich left a comment

Please make the logic explicitly dependent on the data type using
state_interfaces_map_.at(interface_name).get().get_data_type()

Furthermore, please install pre-commit, activate it for this repo (pre-commit install) and reformat your changes.

Signed-off-by: Ishan1923 <ecdev4ishan@gmail.com>
@Ishan1923
Contributor Author

  • Explicit Type Checking: I now check interface.get_data_type() against hardware_interface::HandleDataType::DOUBLE and BOOL before attempting to read.
  • Safety: It no longer guesses/casts sequentially. If a type is unsupported (not double/bool), it logs a specific debug message and skips it.
  • Formatting: pre-commit has been run on the codebase.
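
The explicit dispatch described in the bullets above can be sketched roughly as follows. This is a self-contained mock, not the code in gpio_command_controller.cpp: MockDataType stands in for hardware_interface::HandleDataType (which may have other members), and MockStateInterface and state_as_double are hypothetical names.

```cpp
#include <optional>

// Mock of the data-type tag; the real tag is
// hardware_interface::HandleDataType.
enum class MockDataType { DOUBLE, BOOL, OTHER };

// Mock interface: a type tag plus storage for the two supported types.
struct MockStateInterface
{
  MockDataType type;
  double double_value = 0.0;
  bool bool_value = false;
};

// Branch on the declared type before reading, never by trial and error.
// Returns std::nullopt for unsupported types so the caller can log and skip.
inline std::optional<double> state_as_double(const MockStateInterface & iface)
{
  switch (iface.type) {
    case MockDataType::DOUBLE:
      return iface.double_value;
    case MockDataType::BOOL:
      return iface.bool_value ? 1.0 : 0.0;
    default:
      return std::nullopt;
  }
}
```

Returning an empty optional for unsupported types keeps the "log and skip" decision in the caller instead of hiding it inside the conversion.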

Local tests passed successfully. Ready for re-review!

@codecov

codecov bot commented Feb 2, 2026

Codecov Report

❌ Patch coverage is 75.86207% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.76%. Comparing base (3e66d39) to head (c5e14b6).

Files with missing lines | Patch % | Lines
gpio_controllers/src/gpio_command_controller.cpp | 46.15% | 5 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2103      +/-   ##
==========================================
- Coverage   84.77%   84.76%   -0.01%     
==========================================
  Files         153      153              
  Lines       15236    15260      +24     
  Branches     1322     1324       +2     
==========================================
+ Hits        12916    12935      +19     
- Misses       1838     1842       +4     
- Partials      482      483       +1     
Flag | Coverage Δ
unittests | 84.76% <75.86%> (-0.01%) ⬇️

Files with missing lines | Coverage Δ
..._controllers/test/test_gpio_command_controller.cpp | 99.04% <100.00%> (+0.05%) ⬆️
gpio_controllers/src/gpio_command_controller.cpp | 81.57% <46.15%> (-1.94%) ⬇️

Member

@christophfroehlich christophfroehlich left a comment

This is going in the right direction :)

Two things:

  • ONCE macro does not work like this
  • compiling on humble fails with
    [ RUN      ] GpioCommandControllerTestSuite.UpdateBoolGpioInterfaces
    [INFO] [1770019124.709374624] [test_gpio_command_controller]: configure successful
    [INFO] [1770019124.709482175] [test_gpio_command_controller]: activate successful
    terminate called after throwing an instance of 'std::logic_error'
      what():  basic_string::_M_construct null not valid
  >>>

@Ishan1923
Copy link
Copy Markdown
Contributor Author

I have added an unordered_set that stores the NaN interfaces and switched to RCLCPP_INFO. To log only once per interface, an if statement checks whether the interface is already in the set, which is an O(1) lookup on average.
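
A rough sketch of that "once per interface" bookkeeping, independent of rclcpp. OncePerInterface and first_time are hypothetical names, not what the PR uses. As far as I understand, the *_ONCE logging macros keep one flag per call site, so inside a loop over interfaces they would only fire for the first interface, which is why a set of names is used instead.

```cpp
#include <string>
#include <unordered_set>

// Tracks which interface names have already produced a log line.
class OncePerInterface
{
public:
  // Returns true the first time a name is seen, false afterwards.
  bool first_time(const std::string & interface_name)
  {
    // insert() reports whether the element was newly added.
    return seen_.insert(interface_name).second;
  }

private:
  std::unordered_set<std::string> seen_;
};
```

The caller would wrap the log statement in `if (tracker.first_time(name)) { ... }`, so each interface is reported once no matter how often update() runs.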

I removed the type.to_string() call in the log message, which was causing the "basic_string::_M_construct null not valid" crash.

Local tests (test_gpio_command_controller) are now passing successfully.

Member

@christophfroehlich christophfroehlich left a comment

LGTM, thank you!

@christophfroehlich christophfroehlich added the backport-jazzy (Triggers PR backport to ROS 2 jazzy.) and backport-kilted (Triggers PR backport to ROS 2 kilted.) labels Feb 10, 2026
Member

@christophfroehlich christophfroehlich left a comment

idk why, but the ros2_control stack including this patch fails in the compatibility-build on humble:

[6193](https://github.com/ros-controls/ros2_controllers/actions/runs/21919550699/job/63317000871?pr=2103#step:6:16206)
    [ RUN      ] GpioCommandControllerTestSuite.UpdateBoolGpioInterfaces
    [INFO] [1770849228.540379897] [test_gpio_command_controller]: configure successful
    [INFO] [1770849228.540464415] [test_gpio_command_controller]: activate successful
    terminate called after throwing an instance of 'std::logic_error'
      what():  basic_string::_M_construct null not valid

This job uses this repos file, but builds it on humble distro.

Can you have a look please?

@Ishan1923
Contributor Author

idk why, but the ros2_control stack including this patch fails in the compatibility-build on humble:

[6193](https://github.com/ros-controls/ros2_controllers/actions/runs/21919550699/job/63317000871?pr=2103#step:6:16206)
    [ RUN      ] GpioCommandControllerTestSuite.UpdateBoolGpioInterfaces
    [INFO] [1770849228.540379897] [test_gpio_command_controller]: configure successful
    [INFO] [1770849228.540464415] [test_gpio_command_controller]: activate successful
    terminate called after throwing an instance of 'std::logic_error'
      what():  basic_string::_M_construct null not valid

This job uses this repos file, but builds it on humble distro.

Can you have a look please?

I looked into it. I think this is most likely related to an ABI break: I added the unordered_set nan_interfaces_ to keep track of the interfaces, and the CI must be linking my changes against the old compiled library, which would cause a memory layout mismatch or a violation of the ODR (One Definition Rule).

I propose removing nan_interfaces_ from the header file and introducing it as a local variable in the cpp file; alternatively, we can use DEBUG or RCLCPP_WARN_THROTTLE in the cpp file. Pushing the fix shortly.
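
To illustrate the layout concern only (whether it is the actual cause of the CI failure is not established here), adding a data member in a header changes the size of the class. ControllerV1 and ControllerV2 are hypothetical stand-ins, not the real controller class.

```cpp
#include <string>
#include <unordered_set>

// Layout the already-compiled library was built against (hypothetical).
struct ControllerV1
{
  double last_value;
};

// Same class after a new data member is added in the header (hypothetical).
struct ControllerV2
{
  double last_value;
  std::unordered_set<std::string> nan_interfaces;
};

// The object size changes, so code compiled against V1 and code compiled
// against V2 disagree about the layout if they are mixed in one process.
static_assert(sizeof(ControllerV2) > sizeof(ControllerV1), "layout differs");
```

Keeping the set in the cpp file leaves the class layout in the header untouched.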

@christophfroehlich
Member

It seems that this issue in the compatibility build was introduced somewhere else, as it is failing everywhere in the CI jobs now. Not sure why this happens suddenly and only in the compatibility builds. I'll come back once we have fixed them.

@Ishan1923
Contributor Author

Thanks for the update! I was worried I broke the whole stack!
I'll hold off on any further changes until the CI issues are resolved. Thanks!

@Ishan1923
Contributor Author

It seems that this issue in the compatibility build was introduced somewhere else, as it is failing everywhere in the CI jobs now. Not sure why this happens suddenly and only in the compatibility builds. I'll come back once we have fixed them.

any updates?

@johanubbink

Thank you to everyone for the effort on this, it is much appreciated!

I have tried this fix and noticed that this PR modifies the apply_state_value function to support both bool and double. However, the apply_command function is not updated. With the current implementation, the controller seems to fail if the command interface is a bool, because it always tries to write a double (at least on my ros2_control version, 4.43.0). I think the original issue #1970 also mentioned this?

Maybe the apply_command function can be updated along the lines of:

    bool success {false};
    if (type == hardware_interface::HandleDataType::DOUBLE) {
      success = interface.set_value<double>(command_value);
    } else if (type == hardware_interface::HandleDataType::BOOL) {
      // Assuming some convert_to_bool logic is available
      success = interface.set_value<bool>(convert_to_bool(command_value));
    } else {
      RCLCPP_WARN_THROTTLE(
        get_node()->get_logger(), *get_node()->get_clock(), 10000,
        "Interface '%s' has unsupported type. Only 'double' and 'bool' are supported.",
        interface_name.c_str());
    }

Disclaimer: I'm still finding my way around ros2_control, so I might be missing something in the bigger picture. But please let me know if I can be of any help.
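
The snippet above assumes a convert_to_bool helper without defining it. One possible definition, purely as a sketch: it treats NaN as false and any other nonzero command as true. Both choices are assumptions and nothing in this thread fixes them.

```cpp
#include <cmath>

// Hypothetical helper: the thread does not define convert_to_bool.
// Treats NaN as false and any other nonzero command as true.
inline bool convert_to_bool(double command_value)
{
  if (std::isnan(command_value)) {
    return false;
  }
  return command_value != 0.0;
}
```

A threshold such as `command_value > 0.5` would be an equally plausible rule; which one fits depends on how the hardware interprets the command.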

@Ishan1923
Contributor Author

@christophfroehlich @johanubbink You're absolutely right; I missed the apply_command side of the fix, which was a big blunder. I'll update the PR to handle bool interfaces in apply_command as well, along with a test case to cover it. Sorry for the oversight, will push the fix shortly.

Labels

backport-jazzy Triggers PR backport to ROS 2 jazzy. backport-kilted Triggers PR backport to ROS 2 kilted.

Development

Successfully merging this pull request may close these issues.

Boolean data type signals are not supported by GPIO Controller

6 participants