

qza36 commented Aug 11, 2025

A race condition between the 20 Hz visualization timer and the registration thread caused a segmentation fault when a loop closure attempt failed.

The root cause was that the `need_lc_cloud_vis_update_` flag was set to `true` immediately after performing registration, but before checking whether the registration was valid. The asynchronous visualization timer would then read this flag, attempt to access data from the failed registration (which contained null pointers for clouds like `FinalAlignedCloud`), and crash.

This commit resolves the issue through a two-pronged approach:

1.  **Corrected Logic:** The `need_lc_cloud_vis_update_` flag is now only set to `true` inside the `if (reg_output.is_valid_)` block, ensuring that the visualization is only triggered for successful loop closures.

2.  **Defensive Programming:** Added null pointer checks within the `visualizeLoopClosureClouds` function itself. This makes the visualizer more robust and prevents crashes even if similar logic errors occur in the future.
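
A minimal, self-contained C++ sketch of the two fixes described above. Only `need_lc_cloud_vis_update_`, `is_valid_` and `visualizeLoopClosureClouds` come from the description; `Cloud`, the cloud fields of `RegOutput`, `PoseGraphManagerSketch`, `handleRegistration` and the mutex are hypothetical stand-ins, not the project's actual code, and publishing is replaced by a counter.

```cpp
#include <initializer_list>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the point cloud type used by the node.
struct Cloud {
  std::vector<float> xyz;
};
using CloudPtr = std::shared_ptr<const Cloud>;

// Hypothetical registration result; only `is_valid_` is taken from the PR text.
struct RegOutput {
  bool is_valid_ = false;
  CloudPtr src_cloud;
  CloudPtr tgt_cloud;
  CloudPtr final_aligned_cloud;  // may be null when registration fails
};

class PoseGraphManagerSketch {
 public:
  // Registration thread: called after every loop closure attempt.
  void handleRegistration(const RegOutput& reg_output) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (reg_output.is_valid_) {
      // Fix 1: store the result and raise the flag only for a valid registration.
      last_reg_output_ = reg_output;
      need_lc_cloud_vis_update_ = true;
    }
  }

  // Visualization timer: returns how many clouds would have been published.
  int visualizeLoopClosureClouds() {
    RegOutput snapshot;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (!need_lc_cloud_vis_update_) return 0;
      need_lc_cloud_vis_update_ = false;
      snapshot = last_reg_output_;  // copies the shared_ptrs, keeping the clouds alive
    }
    int published = 0;
    for (const CloudPtr& cloud :
         {snapshot.src_cloud, snapshot.tgt_cloud, snapshot.final_aligned_cloud}) {
      // Fix 2: skip missing clouds instead of dereferencing a null pointer.
      if (!cloud) continue;
      ++published;  // placeholder for converting *cloud to a ROS message and publishing it
    }
    return published;
  }

 private:
  std::mutex mutex_;
  RegOutput last_reg_output_;
  bool need_lc_cloud_vis_update_ = false;
};
```

Storing the result and raising the flag inside the same locked, validity-gated block means the timer never sees the flag without a complete, valid result to go with it; the null checks in the visualizer are a second line of defense if that invariant is ever broken.
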
LimHyungTae (Member) commented Aug 11, 2025

Hmmm, interesting, but my intention is to show the clouds regardless of whether the registration fails. Could you elaborate on that?

Q1. Which data did you use?
Q2. Which error message did you get?


> **Defensive Programming:** Added null pointer checks within the `visualizeLoopClosureClouds` function itself. This makes the visualizer more robust and prevents crashes even if similar logic errors occur in the future.

Where's this part?
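
If, as the maintainer intends, the clouds should be shown even when registration fails, null checks alone can make that safe without gating the flag on `is_valid_`. The sketch below is hypothetical throughout (`LoopClosureClouds`, `cloudsToDraw` and the `accepted`/`rejected` labels are illustrative names, not the project's API); it only decides which clouds to draw rather than publishing anything.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins, not the project's real types.
struct Cloud {
  std::vector<float> xyz;
};
using CloudPtr = std::shared_ptr<const Cloud>;

struct LoopClosureClouds {
  bool is_valid = false;
  CloudPtr source;
  CloudPtr target;
  CloudPtr final_aligned;  // null when the registration failed
};

// Decide which clouds to draw for one loop closure attempt, successful or not.
// Missing clouds are skipped instead of dereferenced, and each label carries
// the outcome so rejected attempts could be styled differently in the viewer.
std::vector<std::string> cloudsToDraw(const LoopClosureClouds& lc) {
  const std::string status = lc.is_valid ? "accepted" : "rejected";
  std::vector<std::string> out;
  if (lc.source) out.push_back("source/" + status);
  if (lc.target) out.push_back("target/" + status);
  if (lc.final_aligned) out.push_back("aligned/" + status);
  return out;
}
```

With this design a failed attempt still shows its source and target clouds, while the missing aligned cloud is simply left out.
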

qza36 (Author) commented Aug 12, 2025

Hi LimHyungTae,

Thanks again for the guidance. I've gathered the information you requested.

Q1. Which data did you use?

I was using a custom rosbag recorded with a Livox Mid-360 sensor in an underground parking garage. The SLAM pipeline consists of FAST-LIO for odometry and kiss-matcher-sam for loop closure detection and pose graph optimization.

The crash occurs after the node has been running for a while.

Q2. Which error message did you get?

The node crashes without printing much to the ROS log, but the process exit code (-11, i.e. terminated by signal 11, SIGSEGV) indicates a segmentation fault. Here is the final output before the crash:

```text
[kiss_matcher_sam-1] [INFO] [1754919878.302443062] [kiss_matcher_sam]: Execute coarse-to-fine alignment: # src = 6903, # tgt = 8870
[kiss_matcher_sam-1] [WARN] [1754919878.739171586] [kiss_matcher_sam]: # final inliers: 0 < 10
[kiss_matcher_sam-1] [WARN] [1754919878.739450914] [kiss_matcher_sam]: LC rejected. KISS-Matcher failed
[kiss_matcher_sam-1] [INFO] [1754919878.739483468] [kiss_matcher_sam]: Reg: 439.9 msec
[ERROR] [kiss_matcher_sam-1]: process has died [pid 219890, exit code -11, cmd '/home/uagentscr000/perception_ws/install/kiss_matcher_ros/lib/kiss_matcher_ros/kiss_matcher_sam --ros-args -r __node:=kiss_matcher_sam -r __ns:=/ --params-file /tmp/launch_params_g4b6pmjl --params-file /tmp/launch_params_leqfeqb_ --params-file /home/uagentscr000/perception_ws/install/kiss_matcher_ros/share/kiss_matcher_ros/config/slam_config.yaml -r /cloud:=/cloud_registered -r /odom:=/Odometry'].
```
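
The `exit code -11` above is consistent with the SIGSEGV that GDB reports in the next step: Python's `subprocess` module reports a child killed by signal N as return code -N, and ROS 2 launch appears to report it the same way. The POSIX-only sketch below (the function names are hypothetical) forks a child that raises SIGSEGV and converts its `waitpid` status using that convention.

```cpp
#include <cerrno>
#include <optional>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Convert a waitpid() status into a launcher-style return code:
// a normal exit yields the exit status, death by signal N yields -N.
int launcherStyleReturnCode(int wait_status) {
  if (WIFSIGNALED(wait_status)) return -WTERMSIG(wait_status);
  if (WIFEXITED(wait_status)) return WEXITSTATUS(wait_status);
  return 0;  // not reached with a plain blocking waitpid(); kept for completeness
}

// Fork a child that raises `sig` with its default action and report how it died.
// Returns std::nullopt if fork() or waitpid() fails.
std::optional<int> returnCodeOfChildRaising(int sig) {
  const pid_t pid = fork();
  if (pid < 0) return std::nullopt;
  if (pid == 0) {
    // Child: restore the default (fatal) action and make sure `sig` is not blocked.
    signal(sig, SIG_DFL);
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, sig);
    sigprocmask(SIG_UNBLOCK, &set, nullptr);
    raise(sig);
    _exit(0);  // only reached if the signal did not terminate the child
  }
  int status = 0;
  pid_t r;
  do {
    r = waitpid(pid, &status, 0);  // retry if interrupted by a signal
  } while (r < 0 && errno == EINTR);
  if (r != pid) return std::nullopt;
  return launcherStyleReturnCode(status);
}
```

On Linux, `returnCodeOfChildRaising(SIGSEGV)` yields -11, matching the launch log.
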

I then ran the node under GDB to locate the crash more precisely, by adding a `launch-prefix` to the node entry in the launch file:

```yaml
- node:
    namespace: $(var namespace)
    pkg: kiss_matcher_ros
    exec: kiss_matcher_sam
    name: kiss_matcher_sam
    output: screen
    on_exit: shutdown
    launch-prefix: "gdb -ex run --args"
```

The GDB output confirms a SIGSEGV (segmentation fault) and shows that it occurs inside `PoseGraphManager::visualizeLoopClosureClouds()`:

```text
[gdb-1] [New Thread 0x7fff8d7fa640 (LWP 249679)]
[gdb-1] [WARN] [1754920396.145454249] [kiss_matcher_sam]: # final inliers: 0 < 10
[gdb-1] [WARN] [1754920396.145726885] [kiss_matcher_sam]: LC rejected. KISS-Matcher failed
[gdb-1] [INFO] [1754920396.145748516] [kiss_matcher_sam]: Reg: 433.2 msec
[gdb-1]
[gdb-1] Thread 17 "kiss_matcher_sa" received signal SIGSEGV, Segmentation fault.
[gdb-1] [Switching to Thread 0x7fffd4ff9640 (LWP 240244)]
[gdb-1] 0x00005555555a1a65 in PoseGraphManager::visualizeLoopClosureClouds() ()
```

Regarding the "Defensive Programming" part:

[screenshot]

Thanks for your time and for maintaining this great project.

LimHyungTae (Member) commented:

Oh, what I mean is that you don't seem to have pushed the changes shown in the screenshot.
