Restrict node/callback explosion #711

@youtalk

Checklist

  • I've read the contribution guidelines.
  • I've searched other issues and no duplicate issues were found.
  • I've agreed with the maintainers that I can plan this task.

Description

Autoware Core runs an excessive number of ROS 2 contexts (processes hosting nodes) and callbacks, which causes startup time to grow rapidly with node count and adds baseline CPU overhead. With 35+ nodes but only one component container in use, the system suffers DDS discovery storms every 8 seconds and high baseline CPU usage. This matches Anti-pattern 4 in the system performance documentation.

Purpose

The purpose of this task is to:

  1. Reduce the number of separate processes through component containers
  2. Consolidate artificially split micronodes into cohesive functional units
  3. Cut startup time (target: 50% reduction)
  4. Lower baseline CPU usage from node/context overhead

Possible approaches

Approach 1: Component Containers (Quick Win)

<!-- Before: Three separate processes -->
<node pkg="autoware_ground_filter" exec="ground_filter_node"/>
<node pkg="autoware_euclidean_cluster" exec="euclidean_cluster_node"/>
<node pkg="autoware_object_converter" exec="object_converter_node"/>

<!-- After: Single container process. Plugin values are illustrative; use each package's registered component class name -->
<node_container pkg="rclcpp_components" exec="component_container_mt" name="perception_container" namespace="">
  <composable_node pkg="autoware_ground_filter" plugin="GroundFilter" name="ground_filter"/>
  <composable_node pkg="autoware_euclidean_cluster" plugin="EuclideanCluster" name="euclidean_cluster"/>
  <composable_node pkg="autoware_object_converter" plugin="ObjectConverter" name="object_converter"/>
</node_container>

Approach 2: Merge Micronodes into Functions

// Before: Separate node (and its own process) for a simple transformation
class Twist2AccelNode : public rclcpp::Node {
  // Entire node just for: accel = derivative(twist)
};

// After: The same computation as a member function of a larger node
class LocalizationNode : public rclcpp::Node {
  // derivative() stands in for the differentiation logic of the former node
  Accel computeAccel(const Twist & twist) {
    return derivative(twist);
  }
};

Approach 3: Shared Executors

// Share a single executor across multiple nodes in the same process.
// node1..node3 are assumed to be existing std::shared_ptr<rclcpp::Node> instances.
rclcpp::executors::MultiThreadedExecutor executor;
executor.add_node(node1);
executor.add_node(node2);
executor.add_node(node3);
executor.spin();  // Blocks; one executor thread pool services the callbacks of all three nodes

Definition of done

Immediate Actions

  • Identify top 10 node groups for containerization
  • Create component versions of perception pipeline nodes
  • Measure baseline startup time and process count

Short-term Implementation

  • Containerize perception pipeline (ground_filter → cluster → converter)
  • Containerize sensing pipeline (filters and converters)
  • Containerize localization auxiliary nodes (twist2accel, stop_filter)
  • Reduce process count from ~35 to ~20

Medium-term Optimization

  • Merge transformation nodes into parent nodes
  • Implement shared executors for related components
  • Reduce process count to <15
  • Achieve 50% startup time reduction

Long-term Architecture

  • Establish clear node boundary guidelines
  • Create "function-first" design patterns
  • Implement systematic performance monitoring
  • Document component architecture best practices

Metadata

Labels

status:stale (Inactive or outdated issues; auto-assigned)