Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(nebula_hw_interfaces): handling http get post request exception #264

Merged
merged 16 commits into from
Feb 17, 2025

Conversation

shmpwk
Copy link
Contributor

@shmpwk shmpwk commented Feb 10, 2025

PR Type

  • Bug Fix

Related Links

Bug found slack thread:
https://star4.slack.com/archives/CEV8XMJBV/p1739142264251479?thread_ts=1739139390.099319&cid=CEV8XMJBV

Description

Added error handling for HTTP GET/POST requests in VelodyneHwInterface to address system crashes caused by unhandled exceptions during HTTP communication.

The system would crash with "Interrupted system call" errors when exceptions occurred during HTTP requests to the Velodyne sensor:

1739140782.9432542 [component_container_mt-52] ^[[0m[INFO 1739140768.990953850] [rclcpp] signal_handler(): "signal_handler(signum=2)" at (/home/autoware/pilot-auto.xx1/src/rclcpp/rclcpp/src/rclcpp/signal_handler.cpp:L71)^[[0m
1739140782.9433613 [component_container_mt-52] terminate called after throwing an instance of 'boost::wrapexcept<boost::system::system_error>'
1739140782.9435017 [component_container_mt-52]   what():  Interrupted system call
1739140782.9435987 [component_container_mt-52] *** Aborted at 1739140768 (unix time) try "date -d @1739140768" if you are using GNU date ***
1739140782.9436829 [component_container_mt-52] PC: @                0x0 (unknown)
1739140782.9437885 [component_container_mt-52] *** SIGABRT (@0x3e8000009fa) received by PID 2554 (TID 0x7f1d91b9d680) from PID 2554; stack trace: ***
1739140782.9438906 [component_container_mt-52]     @     0x7f1d926634d6 google::(anonymous namespace)::FailureSignalHandler()
1739140782.9440007 [component_container_mt-52]     @     0x7f1d91f13520 (unknown)
1739140782.9441085 [component_container_mt-52]     @     0x7f1d91f679fc pthread_kill
1739140782.9441876 [component_container_mt-52]     @     0x7f1d91f13476 raise
1739140782.9442828 [component_container_mt-52]     @     0x7f1d91ef97f3 abort
1739140782.9443274 [component_container_mt-52]     @     0x7f1d921bcb9e (unknown)
1739140782.9444058 [component_container_mt-52]     @     0x7f1d921c820c (unknown)
1739140782.9444652 [component_container_mt-52]     @     0x7f1d921c8277 std::terminate()
1739140782.9445279 [component_container_mt-52]     @     0x7f1d921c84d8 __cxa_throw
1739140782.9446065 [component_container_mt-52]     @     0x7f1d8c07f055 boost::throw_exception<>()
1739140782.9446507 [component_container_mt-52]     @     0x7f1d8c07eb7c _ZN7drivers10tcp_driver10HttpClient5writeENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEN5boost5beast4http4verbES7_.cold
1739140782.9446948 [component_container_mt-52]     @     0x7f1d8c0abc4a drivers::tcp_driver::HttpClient::get()
1739140782.9447391 [component_container_mt-52]     @     0x7f1d8c0cef88 drivers::tcp_driver::HttpClientDriver::get()
1739140782.9447834 [component_container_mt-52]     @     0x7f1d8e1f992d nebula::drivers::VelodyneHwInterface::http_get_request()
1739140782.9448268 [component_container_mt-52]     @     0x7f1d8e1f9b69 nebula::drivers::VelodyneHwInterface::get_snapshot()
1739140782.9448709 [component_container_mt-52]     @     0x7f1d847c0eb4 nebula::ros::VelodyneHwMonitorWrapper::on_velodyne_snapshot_timer()
1739140782.9449153 [component_container_mt-52]     @     0x7f1d847c12d4 _ZN6rclcpp12GenericTimerIZN6nebula3ros24VelodyneHwMonitorWrapper31initialize_velodyne_diagnosticsEvEUlvE_LPv0EE16execute_callbackEv
1739140782.9449596 [component_container_mt-52]     @     0x7f1d924e0ffe rclcpp::Executor::execute_any_executable()
1739140782.9450040 [component_container_mt-52]     @     0x7f1d924e7432 rclcpp::executors::MultiThreadedExecutor::run()
1739140782.9450479 [component_container_mt-52]     @     0x7f1d924e7808 rclcpp::executors::MultiThreadedExecutor::spin()
1739140782.9450917 [component_container_mt-52]     @     0x55f2b35b9771 main
1739140782.9451358 [component_container_mt-52]     @     0x7f1d91efad90 (unknown)
1739140782.9451797 [component_container_mt-52]     @     0x7f1d91efae40 __libc_start_main
1739140782.9452233 [component_container_mt-52]     @     0x55f2b35b9a95 _start

Review Procedure

Remarks

I'm not sure if returning Status::HTTP_CONNECTION_ERROR is ok for the exception.
Please confirm 🙏 @mojomex

Pre-Review Checklist for the PR Author

PR Author should check the checkboxes below when creating the PR.

  • Assign PR to reviewer

Checklist for the PR Reviewer

Reviewers should check the checkboxes below before approval.

  • Commits are properly organized and messages are according to the guideline
  • (Optional) Unit tests have been written for new behavior
  • PR title describes the changes

Post-Review Checklist for the PR Author

PR Author should check the checkboxes below before merging.

  • All open points are addressed and tracked via issues or tickets

CI Checks

  • Build and test for PR: Required to pass before the merge.

@shmpwk shmpwk requested a review from mojomex February 10, 2025 01:18
@shmpwk shmpwk changed the title fix(nebula_velodyne_hx_interface): handling http get post request exception fix(nebula_velodyne_hw_interface): handling http get post request exception Feb 10, 2025
@shmpwk shmpwk changed the title fix(nebula_velodyne_hw_interface): handling http get post request exception fix(nebula_hw_interfaces): handling http get post request exception Feb 10, 2025
@shmpwk shmpwk requested a review from drwnz February 10, 2025 08:19
@mojomex
Copy link
Collaborator

mojomex commented Feb 12, 2025

Thank you for the PR!
As it is currently, the code does not compile (Status is not std::string). I would suggest using nebula::util::expected<std::string, Status>.

Also, this change would just ignore the error, without any error handling, which would not be safe for production.

Ultimately, we should probably have a retry mechanism for temporary network failures, and graceful degradation (working, but possibly without some features) in case we cannot fix the issue by retrying. What do you think?

@shmpwk
Copy link
Contributor Author

shmpwk commented Feb 12, 2025

@mojomex

Thank you for your suggestion.

As it is currently, the code does not compile (Status is not std::string). I would suggest using nebula::util::expected<std::string, Status>.

Thank you, I will fix them.

Also, this change would just ignore the error, without any error handling, which would not be safe for production.
Ultimately, we should probably have a retry mechanism for temporary network failures, and graceful degradation (working, but possibly without some features) in case we cannot fix the issue by retrying. What do you think?

That makes sense to me.
But the current implementation is unsafer because it happens to die anytime.
Actually, jpntaxi often dies because of this, and we cannot drive in Odaiba now.
We need the quick solution for now.

I will submit modified code tonight.

Signed-off-by: Shumpei Wakabayashi <[email protected]>
@shmpwk shmpwk marked this pull request as draft February 12, 2025 17:14
pre-commit-ci bot and others added 4 commits February 12, 2025 17:19
Signed-off-by: Shumpei Wakabayashi <[email protected]>
Signed-off-by: Shumpei Wakabayashi <[email protected]>
Copy link

codecov bot commented Feb 13, 2025

Codecov Report

Attention: Patch coverage is 0% with 88 lines in your changes missing coverage. Please review.

Project coverage is 28.11%. Comparing base (44e6c88) to head (0908bce).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
...a_velodyne_hw_interfaces/velodyne_hw_interface.cpp 0.00% 66 Missing ⚠️
...a_hw_interfaces_velodyne/velodyne_hw_interface.hpp 0.00% 12 Missing ⚠️
nebula_ros/src/velodyne/hw_monitor_wrapper.cpp 0.00% 10 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #264      +/-   ##
==========================================
- Coverage   28.89%   28.11%   -0.78%     
==========================================
  Files         105      109       +4     
  Lines        9465     9506      +41     
  Branches     3115     2617     -498     
==========================================
- Hits         2735     2673      -62     
- Misses       6240     6360     +120     
+ Partials      490      473      -17     
Flag Coverage Δ
differential 28.11% <0.00%> (?)
total ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@shmpwk shmpwk marked this pull request as ready for review February 13, 2025 00:09
Copy link
Collaborator

@mojomex mojomex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to keep you waiting.

Thanks for implementing the retry logic, it looks good to me.
I left two requests for changes, mainly targeted at maintainability/ code deduplication 🙇

shmpwk and others added 2 commits February 17, 2025 19:15
@shmpwk shmpwk marked this pull request as draft February 17, 2025 10:20
shmpwk and others added 3 commits February 17, 2025 19:46
Signed-off-by: Shumpei Wakabayashi <[email protected]>
Signed-off-by: Shumpei Wakabayashi <[email protected]>
@shmpwk shmpwk marked this pull request as ready for review February 17, 2025 11:01
shmpwk and others added 2 commits February 17, 2025 20:23
Copy link
Collaborator

@mojomex mojomex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your changes! I've changed the other places that directly return HTTP_CONNECTION_ERROR to rt.error() as well for consistency 🙇

Copy link
Collaborator

@mojomex mojomex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed a couple of spots 🙇

@mojomex mojomex merged commit 1953495 into tier4:main Feb 17, 2025
9 of 11 checks passed
@shmpwk
Copy link
Contributor Author

shmpwk commented Feb 17, 2025

@mojomex Thank you for everything 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants