Checklist
Description
The autoware_behavior_path_planner node sometimes crashes when a new planning goal is published. In everyday operations we hit this issue roughly once a day. It happens at random and is very disruptive for operators, because the only way to recover is to restart the whole stack.
Expected behavior
The autoware_behavior_path_planner node does not crash after a new planning goal is published.
Actual behavior
The autoware_behavior_path_planner node sometimes crashes when a new planning goal is published.
Logs of the crash:
[component_container_mt-62] terminate called after throwing an instance of 'rclcpp::exceptions::RCLError'
[component_container_mt-62] what(): failed to add guard condition to wait set: guard condition implementation is invalid, at ./src/rcl/guard_condition.c:172, at ./src/rcl/wait.c:460
[component_container_mt-62] *** Aborted at 1775119751 (unix time) try "date -d @1775119751" if you are using GNU date ***
[component_container_mt-62] PC: @ 0x0 (unknown)
[component_container_mt-62] *** SIGABRT (@0x3e800000383) received by PID 899 (TID 0x7ed64f7fe640) from PID 899; stack trace: ***
[component_container_mt-62] @ 0x7ed65f7644d6 google::(anonymous namespace)::FailureSignalHandler()
[component_container_mt-62] @ 0x7ed6622df520 (unknown)
[component_container_mt-62] @ 0x7ed6623339fc pthread_kill
[component_container_mt-62] @ 0x7ed6622df476 raise
[component_container_mt-62] @ 0x7ed6622c57f3 abort
[component_container_mt-62] @ 0x7ed66258ab9e (unknown)
[component_container_mt-62] @ 0x7ed66259620c (unknown)
[component_container_mt-62] @ 0x7ed662596277 std::terminate()
[component_container_mt-62] @ 0x7ed6625961fe std::rethrow_exception()
[component_container_mt-62] @ 0x7ed66282e519 rclcpp::exceptions::throw_from_rcl_error()
[component_container_mt-62] @ 0x7ed66283231c rclcpp::detail::add_guard_condition_to_rcl_wait_set()
[component_container_mt-62] @ 0x7ed662844e27 (unknown)
[component_container_mt-62] @ 0x7ed662831b05 rclcpp::Executor::wait_for_work()
[component_container_mt-62] @ 0x7ed662834883 rclcpp::Executor::get_next_executable()
[component_container_mt-62] @ 0x7ed66283b7e2 rclcpp::executors::MultiThreadedExecutor::run()
[component_container_mt-62] @ 0x7ed6625c4253 (unknown)
[component_container_mt-62] @ 0x7ed662331ac3 (unknown)
[component_container_mt-62] @ 0x7ed6623c38c0 (unknown)
[ERROR] [component_container_mt-62]: process has died [pid 899, exit code -6, cmd '/opt/ros/humble/lib/rclcpp_components/component_container_mt --ros-args -r __node:=behavior_planning_container -r __ns:=/planning/scenario_planning/lane_driving/behavior_planning -p use_sim_time:=False -p wheel_radius:=0.305 -p wheel_width:=0.185 -p wheel_base:=2.755 -p wheel_tread:=1.469 -p front_overhang:=0.721 -p rear_overhang:=0.666 -p left_overhang:=0.128 -p right_overhang:=0.128 -p vehicle_height:=2.44 -p max_steer_angle:=0.7 -p use_sim_time:=False'].
Steps to reproduce
- Launch the planning_simulator.
- Set the vehicle initial pose.
- Launch the script that publishes planning goals (the planning_goal_abuse_runner tool below).
- Wait until the behavior_path_planner dies. It always happens eventually.
Limiting the Docker container's CPU resources (for example with docker update --cpus) makes the crash easier to reproduce.
The planning_goal_abuse_runner test tool:
#!/usr/bin/env python3
"""
Standalone runner for the planning goal abuse scenario.

Procedure:
1. Wait for behavior_path_planner and mission_planner to come online.
2. Publish the vehicle initial pose once.
3. Continuously publish planning goals at random intervals.
4. Monitor required nodes; exit(1) if any node disappears (crash detected).
"""

import random
import sys
import threading
import time

import rclpy
from geometry_msgs.msg import PoseStamped, PoseWithCovarianceStamped

STARTUP_TIMEOUT = 120.0   # seconds to wait for required nodes after launch
CHECK_INTERVAL = 0.2      # seconds between liveness checks
GOAL_MIN_INTERVAL = 0.25  # minimum random delay between goal publications (s)
GOAL_MAX_INTERVAL = 1.5   # maximum random delay between goal publications (s)

INITIAL_POSE_TOPIC = "/initialpose"
GOAL_TOPIC = "/planning/mission_planning/goal"
MAP_FRAME = "map"

INITIAL_POSE = dict(
    x=219.064,
    y=579.993,
    z=20.311,
    ox=0.0,
    oy=0.0,
    oz=0.838,
    ow=0.545,
)

GOAL_POSE = dict(
    x=133.534,
    y=778.934,
    z=0.0,
    ox=0.0,
    oy=0.0,
    oz=0.831,
    ow=0.556,
)

REQUIRED_NODES = [
    "behavior_path_planner",
    "mission_planner",
]


def _wait_for_nodes(node, node_names, timeout):
    """Block until every name in *node_names* appears in the ROS 2 graph."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        alive = node.get_node_names()
        missing = [n for n in node_names if n not in alive]
        if not missing:
            node.get_logger().info(f"All required nodes online: {node_names}")
            return
        node.get_logger().info(
            f"Waiting for nodes: {missing}",
            throttle_duration_sec=5.0,
        )
        time.sleep(1.0)
    alive = node.get_node_names()
    missing = [n for n in node_names if n not in alive]
    raise TimeoutError(
        f"Required nodes not available after {timeout} s. Missing: {missing}"
    )


def _publish_initial_pose(node, pub):
    msg = PoseWithCovarianceStamped()
    msg.header.stamp = node.get_clock().now().to_msg()
    msg.header.frame_id = MAP_FRAME
    msg.pose.pose.position.x = INITIAL_POSE["x"]
    msg.pose.pose.position.y = INITIAL_POSE["y"]
    msg.pose.pose.position.z = INITIAL_POSE["z"]
    msg.pose.pose.orientation.x = INITIAL_POSE["ox"]
    msg.pose.pose.orientation.y = INITIAL_POSE["oy"]
    msg.pose.pose.orientation.z = INITIAL_POSE["oz"]
    msg.pose.pose.orientation.w = INITIAL_POSE["ow"]
    msg.pose.covariance = [0.0] * 36
    msg.pose.covariance[0] = 0.25      # x variance
    msg.pose.covariance[7] = 0.25      # y variance
    msg.pose.covariance[35] = 0.06854  # yaw variance
    pub.publish(msg)
    node.get_logger().info("Published initial pose.")


def _goal_abuse_loop(node, pub, stop_event):
    """Publish planning goals at random intervals until *stop_event* is set."""
    while not stop_event.is_set():
        msg = PoseStamped()
        msg.header.stamp = node.get_clock().now().to_msg()
        msg.header.frame_id = MAP_FRAME
        msg.pose.position.x = GOAL_POSE["x"]
        msg.pose.position.y = GOAL_POSE["y"]
        msg.pose.position.z = GOAL_POSE["z"]
        msg.pose.orientation.x = GOAL_POSE["ox"]
        msg.pose.orientation.y = GOAL_POSE["oy"]
        msg.pose.orientation.z = GOAL_POSE["oz"]
        msg.pose.orientation.w = GOAL_POSE["ow"]
        interval = random.uniform(GOAL_MIN_INTERVAL, GOAL_MAX_INTERVAL)
        node.get_logger().info(
            f"Publishing planning goal (next in {interval:.2f} s)..."
        )
        pub.publish(msg)
        stop_event.wait(timeout=interval)


def main():
    rclpy.init()
    node = rclpy.create_node("planning_goal_abuse_runner")
    initial_pose_pub = node.create_publisher(
        PoseWithCovarianceStamped, INITIAL_POSE_TOPIC, 1
    )
    goal_pub = node.create_publisher(PoseStamped, GOAL_TOPIC, 10)
    spin_thread = threading.Thread(
        target=rclpy.spin, args=(node,), daemon=True
    )
    spin_thread.start()

    exit_code = 0
    stop_event = threading.Event()
    abuse_thread = None
    try:
        # Step 1 — wait for the planning stack.
        _wait_for_nodes(node, REQUIRED_NODES, timeout=STARTUP_TIMEOUT)

        # Step 2 — publish initial pose once and let the system settle.
        _publish_initial_pose(node, initial_pose_pub)
        time.sleep(1.0)

        # Step 3 — start goal abuse in a background thread.
        abuse_thread = threading.Thread(
            target=_goal_abuse_loop,
            args=(node, goal_pub, stop_event),
            daemon=True,
        )
        abuse_thread.start()

        # Step 4 — monitor required nodes; exit if any disappear.
        node.get_logger().info(f"Monitoring {REQUIRED_NODES} for crashes...")
        monitor_start = time.time()
        while True:
            alive = node.get_node_names()
            missing = [n for n in REQUIRED_NODES if n not in alive]
            if missing:
                uptime = time.time() - monitor_start
                node.get_logger().error(
                    f"Node(s) disappeared (crash detected): {missing} "
                    f"after {uptime:.1f} s of monitoring"
                )
                exit_code = 1
                break
            time.sleep(CHECK_INTERVAL)
    except KeyboardInterrupt:
        node.get_logger().info("Interrupted by user.")
    except Exception as e:
        node.get_logger().error(f"Fatal error: {e}")
        exit_code = 1
    finally:
        stop_event.set()
        if abuse_thread is not None:
            abuse_thread.join(timeout=GOAL_MAX_INTERVAL + 1.0)
        node.destroy_node()
        rclpy.shutdown()
    sys.exit(exit_code)


if __name__ == "__main__":
    main()
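Run the tool in a separate terminal once the planning simulator is up. It publishes the initial pose, then keeps publishing goals at random 0.25–1.5 s intervals, and exits with code 1 as soon as behavior_path_planner or mission_planner disappears from the ROS graph.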
Versions
- Ubuntu 22.04
- ROS 2 Humble
- Autoware version: 0.48, 0.49, 0.50
Possible causes
The crash is caused by the destruction of the lane_parking_timer_cb_group_ and freespace_parking_timer_cb_group_ callback groups. When a new planning goal is published, the GoalPlannerModule instance is deleted together with its callback groups. However, the node's executor still holds weak pointers to these callback groups and may access them at any moment. The callback groups are supposed to stay alive and available for the whole time the node and its executor are running.
The crash happens when the executor tries to lock a timer's callback group through such a weak pointer, fails, and throws an exception. Because the exception is never caught, it takes down the whole executor and, with it, the behavior_path_planner.
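To make the failure mode concrete, here is a minimal, self-contained C++ sketch of this lifetime pattern. It uses plain std::shared_ptr/std::weak_ptr stand-ins rather than real rclcpp types (the Executor and CallbackGroup below are hypothetical), but the sequence is the same: the executor only weakly references the group, the module's destruction invalidates it, and the next wait on the executor throws.

#include <iostream>
#include <memory>
#include <stdexcept>

// Hypothetical stand-ins for the rclcpp types, for illustration only.
struct CallbackGroup {};

struct Executor {
  // The executor tracks callback groups without owning them, so it only
  // holds a weak reference.
  std::weak_ptr<CallbackGroup> group;

  void wait_for_work() {
    // Rebuilding the wait set requires the group to still be alive.
    if (group.lock()) {
      std::cout << "callback group alive, added to wait set\n";
      return;
    }
    // In rclcpp this surfaces as "failed to add guard condition to wait
    // set" and the uncaught RCLError aborts the whole process.
    throw std::runtime_error("failed to add guard condition to wait set");
  }
};

int main() {
  Executor executor;
  {
    // The GoalPlannerModule creates and owns its callback group...
    auto module_group = std::make_shared<CallbackGroup>();
    executor.group = module_group;
    executor.wait_for_work();  // fine while the module is alive
  }  // ...and destroys it when a new goal replaces the module.

  // The next executor iteration finds an expired weak pointer and throws;
  // left uncaught, just like in the real executor, so the process aborts.
  executor.wait_for_work();
  return 0;
}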
Additional context
A callback group in ROS 2 should never be deleted while the node is still running.
There is a quick hack that works around the crash: make freespace_parking_timer_cb_group_ and lane_parking_timer_cb_group_ static class members, so they outlive individual GoalPlannerModule instances. This fixed the issue for me.
Changes in the goal_planner_module.hpp:
static rclcpp::CallbackGroup::SharedPtr lane_parking_timer_cb_group_;
static rclcpp::CallbackGroup::SharedPtr freespace_parking_timer_cb_group_;
Changes in the goal_planner_module.cpp:
rclcpp::CallbackGroup::SharedPtr GoalPlannerModule::lane_parking_timer_cb_group_ = nullptr;
rclcpp::CallbackGroup::SharedPtr GoalPlannerModule::freespace_parking_timer_cb_group_ = nullptr;
// --------------------------------------------------------------------------
// Create each group only once; later GoalPlannerModule instances reuse it,
// so the group is never destroyed while the node is running.
if (!lane_parking_timer_cb_group_) {
  lane_parking_timer_cb_group_ =
    node.create_callback_group(rclcpp::CallbackGroupType::MutuallyExclusive);
}
// --------------------------------------------------------------------------
if (!freespace_parking_timer_cb_group_) {
  freespace_parking_timer_cb_group_ =
    node.create_callback_group(rclcpp::CallbackGroupType::MutuallyExclusive);
}
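Note the trade-off: as static members the two callback groups are shared by every GoalPlannerModule instance in the process and live until it exits, which is exactly what the executor's lifetime requirement demands here. A cleaner long-term fix would presumably be to own the groups at a level that outlives the module (for example, in the planner node itself), but the static members are the smallest change that stops the crash.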