Required Info:
- Operating System:
- Installation type:
- ROS Version
- Version or commit hash:
- humble_lifecycle in asynchronous mode
- Laser unit:
Expected behavior
SLAM map and TF buffer is correctly reset after each training episode.
Actual behavior
After many episodes of a DRL training run, I'm seeing more and more messages like
[INFO] [1774890341.733643040] [slam_toolbox]: Message Filter dropping message: frame ‘laser’ at time 0.715 for reason ‘the timestamp on the message is earlier than all the data in the transform cache’
and when performing lifecycle transitions to clear and reset the map and TF buffer, there comes a point where this error occurs, right after “Deactivate” is printed to the screen:
[ERROR] [async_slam_toolbox_node-1]: process has died [pid 189171, exit code -11, cmd '/ros2_ws/install/slam_toolbox/lib/slam_toolbox/async_slam_toolbox_node --ros-args -r __node:=slam_toolbox -r __ns:=/ --params-file create3_ws/src/irobot_create_common/irobot_create_common_bringup/config/mapper_params_online_async.yaml --params-file /tmp/launch_params_l997fk7t'].
Additional information
At first, I thought it might be due to [INFO] messages shown above, but then it occurred to me that it might be because the calls to the lifecycle transition service are asynchronous, meaning the next one was being called before the previous one had finished. That’s why I designed this function to perform each transition only after the previous one has finished:
def nodoa_kudeatu(self, nodoa, transition_id, next_step=None):
"""
Lanza una transición de lifecycle de forma asíncrona.
No bloquea, no hace spin, no usa sleep.
Cuando termina, ejecuta next_step() si se pasa.
"""
self.get_logger().info(f'/{nodoa}/change_state → {transition_id}')
client = self.create_client(ChangeState, f'/{nodoa}/change_state')
# Si el servicio aún no está disponible, reintentar más tarde
if not client.wait_for_service(timeout_sec=0.5):
self.get_logger().warn(f'{nodoa}/change_state no disponible, reintentando...')
self.create_timer(0.5, lambda: self.nodoa_kudeatu(nodoa, transition_id, next_step))
return
request = ChangeState.Request()
request.transition.id = transition_id
future = client.call_async(request)
def done_cb(fut):
if fut.result() is not None:
self.get_logger().info(f'Transición OK: {nodoa} → {transition_id}')
else:
self.get_logger().error(f'Error en transición: {nodoa} → {transition_id}')
if next_step is not None:
next_step()
future.add_done_callback(done_cb)
The toolbox is run with the following parameters in asynchronous mode:
slam_toolbox:
ros__parameters:
# Plugin params
solver_plugin: solver_plugins::CeresSolver
ceres_linear_solver: SPARSE_NORMAL_CHOLESKY
ceres_preconditioner: SCHUR_JACOBI
ceres_trust_strategy: LEVENBERG_MARQUARDT
ceres_dogleg_type: TRADITIONAL_DOGLEG
ceres_loss_function: None
# ROS Parameters
odom_frame: odom
map_frame: map
base_frame: base_footprint
scan_topic: scan
# restamp_tf: true
use_map_saver: true
mode: mapping #localization
# if you'd like to immediately start continuing a map at a given pose
# or at the dock, but they are mutually exclusive, if pose is given
# will use pose
#map_file_name: /home/raul/create3_ws/my_map_serial
# map_start_pose: [0.0, 0.0, 0.0]
# map_start_at_dock: true
debug_logging: false
throttle_scans: 1
transform_publish_period: 0.0 #0.02 #if 0 never publishes odometry
map_update_interval: 1.0
resolution: 0.05
min_laser_range: 0.1 #for rastering images
max_laser_range: 12.0 #for rastering images
minimum_time_interval: 0.5 # 0.5; tiempo mínimo que debe pasar entre dos actualizaciones del SLAM usando scans del láser.
transform_timeout: 0.02
tf_buffer_duration: 30.
stack_size_to_use: 40000000 #// program needs a larger stack size to serialize large maps
enable_interactive_mode: true
scan_queue_size: 1
# General Parameters
use_scan_matching: true
use_scan_barycenter: true
minimum_travel_distance: 0.2 #0.5
minimum_travel_heading: 0.2 #0.5
scan_buffer_size: 10
scan_buffer_maximum_scan_distance: 10.0
link_match_minimum_response_fine: 0.1
link_scan_maximum_distance: 1.5
loop_search_maximum_distance: 3.0
do_loop_closing: true
loop_match_minimum_chain_size: 10
loop_match_maximum_variance_coarse: 3.0
loop_match_minimum_response_coarse: 0.35
loop_match_minimum_response_fine: 0.45
# Correlation Parameters - Correlation Parameters
correlation_search_space_dimension: 0.5
correlation_search_space_resolution: 0.01
correlation_search_space_smear_deviation: 0.1
# Correlation Parameters - Loop Closure Parameters
loop_search_space_dimension: 8.0
loop_search_space_resolution: 0.05
loop_search_space_smear_deviation: 0.03
# Scan Matcher Parameters
distance_variance_penalty: 0.5
angle_variance_penalty: 1.0
fine_search_angle_offset: 0.00349
coarse_search_angle_offset: 0.349
coarse_angle_resolution: 0.0349
minimum_angle_penalty: 0.9
minimum_distance_penalty: 0.5
use_response_expansion: true
min_pass_through: 2
occupancy_threshold: 0.1
However, I keep getting the same error, and almost always after a similar percentage of the workout has elapsed. Thus, I can't figure out what's going on or get a clearer picture of how to fix it.
Required Info:
Expected behavior
SLAM map and TF buffer is correctly reset after each training episode.
Actual behavior
After many episodes of a DRL training run, I'm seeing more and more messages like
[INFO] [1774890341.733643040] [slam_toolbox]: Message Filter dropping message: frame ‘laser’ at time 0.715 for reason ‘the timestamp on the message is earlier than all the data in the transform cache’and when performing lifecycle transitions to clear and reset the map and TF buffer, there comes a point where this error occurs, right after “Deactivate” is printed to the screen:
[ERROR] [async_slam_toolbox_node-1]: process has died [pid 189171, exit code -11, cmd '/ros2_ws/install/slam_toolbox/lib/slam_toolbox/async_slam_toolbox_node --ros-args -r __node:=slam_toolbox -r __ns:=/ --params-file create3_ws/src/irobot_create_common/irobot_create_common_bringup/config/mapper_params_online_async.yaml --params-file /tmp/launch_params_l997fk7t'].Additional information
At first, I thought it might be due to
[INFO]messages shown above, but then it occurred to me that it might be because the calls to the lifecycle transition service are asynchronous, meaning the next one was being called before the previous one had finished. That’s why I designed this function to perform each transition only after the previous one has finished:The toolbox is run with the following parameters in asynchronous mode:
However, I keep getting the same error, and almost always after a similar percentage of the workout has elapsed. Thus, I can't figure out what's going on or get a clearer picture of how to fix it.