Description
I’m having trouble understanding what Zenoh configuration is currently being used for tests running on ci.ros2.org and build.ros2.org. I hope this discussion will help clarify the situation and define the desired configuration.
Context
With the default configuration, rmw_zenoh doesn't use UDP multicast scouting. At startup, every ROS Context attempts to connect to a router and, unless ZENOH_ROUTER_CHECK_ATTEMPTS=-1 is set, logs this message if the connection fails:
Unable to connect to a Zenoh router. Have you started a router with 'ros2 run rmw_zenoh_cpp rmw_zenohd'?
In my view, tests should run with a configuration as close as possible to the default user setup (i.e., with a router), so that they better reflect the conditions users encounter with the default configuration.
History
As of March 2025, there was no solution to run a Zenoh router in the build farm. To ensure CI tests could run before the Kilted code freeze, these environment variables were defined for the CI:
export ZENOH_CONFIG_OVERRIDE='scouting/multicast/enabled=true'
export ZENOH_ROUTER_CHECK_ATTEMPTS=-1
With this setup, when a test runs multiple Nodes, they discover each other via UDP multicast. However, there’s a risk of interference with other tests running in parallel, either on the same host/container or another host/container.
Note that using different ROS_DOMAIN_ID values would prevent Nodes from exchanging ROS messages, because different key expressions would be used. However, the Nodes would still discover each other, establish TCP connections, and exchange discovery information. Depending on the host load, this extra processing could cause timeouts in some tests.
Later, #583 added an rmw_test_fixture operation to start the Zenoh router.
When called, this operation creates a Zenoh Session configured as a router listening on tcp/127.0.0.1:0 (port 0, so the OS assigns a free port) with multicast scouting disabled. It retrieves the effective listening endpoint, including the assigned port number, and sets the environment variable ZENOH_CONFIG_OVERRIDE="connect/endpoints=[<router_locator>]". Any test process forked from the process calling this function should therefore connect to the router.
Then #855 changed this function to update the existing ZENOH_CONFIG_OVERRIDE and restore it afterwards, instead of overwriting it.
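To make the update-and-restore behavior concrete, here is a minimal C++ sketch of how a fixture could handle the environment variable. It is not the rmw_zenoh implementation: `with_router_endpoint` and `ScopedConfigOverride` are hypothetical names, and the sketch assumes that ZENOH_CONFIG_OVERRIDE holds `;`-separated `key=value` entries and that `connect/endpoints` takes a JSON5 array of quoted locator strings. It relies on POSIX `setenv`/`unsetenv`.

```cpp
#include <cstdlib>
#include <optional>
#include <stdlib.h>  // POSIX setenv / unsetenv
#include <string>

// Hypothetical helper: appends a connect/endpoints entry for the router to an
// existing override string. Assumes entries are separated by ';'.
std::string with_router_endpoint(const std::string & existing, const std::string & locator)
{
  const std::string entry = "connect/endpoints=[\"" + locator + "\"]";
  if (existing.empty()) {
    return entry;
  }
  return existing + ";" + entry;
}

// Hypothetical RAII guard: while alive, ZENOH_CONFIG_OVERRIDE contains the
// previous value (if any) plus the router endpoint. On destruction, the
// previous value is restored, or the variable is unset if it was not set.
class ScopedConfigOverride
{
public:
  explicit ScopedConfigOverride(const std::string & locator)
  {
    if (const char * prev = std::getenv(kVar)) {
      previous_ = std::string(prev);
    }
    const std::string value = with_router_endpoint(previous_.value_or(""), locator);
    setenv(kVar, value.c_str(), 1);  // 1 = overwrite the current value
  }

  ~ScopedConfigOverride()
  {
    if (previous_) {
      setenv(kVar, previous_->c_str(), 1);
    } else {
      unsetenv(kVar);
    }
  }

  // Copying would restore the variable twice, so forbid it.
  ScopedConfigOverride(const ScopedConfigOverride &) = delete;
  ScopedConfigOverride & operator=(const ScopedConfigOverride &) = delete;

private:
  static constexpr const char * kVar = "ZENOH_CONFIG_OVERRIDE";
  std::optional<std::string> previous_;
};
```

With this design, a value set by the CI such as ZENOH_CONFIG_OVERRIDE='scouting/multicast/enabled=true' would be kept, with the router endpoint appended, while the guard is alive, and returned unchanged once it is destroyed. The sketch does not deduplicate keys, so an existing connect/endpoints entry would remain alongside the new one.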
Questions:
Are there any multi-process tests running in ci.ros2.org or build.ros2.org that do not call rmw_test_fixture at startup?
If we can confirm that all tests call rmw_test_fixture, could we remove the ZENOH_CONFIG_OVERRIDE='scouting/multicast/enabled=true' environment variable everywhere, as was already done in ros2/ros_buildfarm_config#348?
If some tests do not use rmw_test_fixture, how should we start the router for those cases?