Description
Expected Behavior
wait_for_service should block only until the service is discovered, and then return.
Current Behavior
When the service and the client are created at the same time, wait_for_service hangs indefinitely.
This doesn't happen when using multicast.
Even while one client is hanging in wait_for_service, I can start another instance of the same client and it gets a response.
Steps to Reproduce
- Start the service and the client at the same time.
The issue only appears when the service and the client are started at the same time. We use systemd to run ROS 2 nodes on an NVIDIA Jetson inside a ros:galactic container, so this bug occurs quite frequently on our system.
I attached a script, reproduce-wait-for-service-hang.sh, that reproduces the bug quite reliably, along with a YouTube video and a Docker image.
Running this Docker image reproduces the bug:
docker run -ti --rm amfernus/reproduce-wait-for-service-hang:latest
System information
- Host: Arch Linux (Linux ilya.linux 5.15.0-zen1-1-zen #1 ZEN SMP PREEMPT Thu, 04 Nov 2021 00:40:01 +0000 x86_64 GNU/Linux)
- Fast-RTPS 2.3.4
- OS: Ubuntu 20.04 ros:galactic docker
- Network interfaces:
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 78:24:af:33:69:9a brd ff:ff:ff:ff:ff:ff
altname enp0s25
inet 10.100.102.3/24 brd 10.100.102.255 scope global dynamic noprefixroute eno1
valid_lft 2330sec preferred_lft 2330sec
inet6 fe80::e723:a57:e61d:90c4/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
link/ether 52:54:00:6b:01:42 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:b8:3e:40:4a brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
inet6 fe80::42:b8ff:fe3e:404a/64 scope link
valid_lft forever preferred_lft forever
- ROS2: Galactic
Additional resources
- bash script to reproduce the issue
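#!/bin/bash
# Start a local discovery server with id 0 and remember its PID.
/opt/ros/galactic/bin/fast-discovery-server -i 0 & PID_SERVER1=$!
# Trap Ctrl-C and call ctrl_c().
trap ctrl_c INT
function ctrl_c() {
    # Stop the background processes before exiting (PID1 may not be set yet).
    kill "$PID_SERVER1" $PID1 2>/dev/null
    exit 0
}
# Give the discovery server a moment to come up.
sleep 1
while true; do
    echo "starting add_two_ints_server"
    ROS_DISCOVERY_SERVER="127.0.0.1:11811" RMW_IMPLEMENTATION=rmw_fastrtps_cpp bash -c /opt/ros/galactic/lib/demo_nodes_cpp/add_two_ints_server & PID1=$!
    echo "making requests with add_two_ints_client"
    # The client runs in the foreground; when the bug triggers, this call never returns.
    ROS_DISCOVERY_SERVER="127.0.0.1:11811" RMW_IMPLEMENTATION=rmw_fastrtps_cpp bash -c /opt/ros/galactic/lib/demo_nodes_cpp/add_two_ints_client
    echo "killing add_two_ints_server"
    kill $PID1
done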
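
With the script above, a hang is only noticed because the loop stops printing. As a possible convenience, here is a minimal sketch of a helper that puts a deadline on the client call so the hang is reported explicitly. The function name run_with_deadline and the 30-second limit are my own assumptions and are not part of the attached script; the sketch relies only on the coreutils `timeout` command, which exits with status 124 when the deadline expires.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original reproduction script):
# run a command under a deadline and report whether it hung.
# Returns 0 if the command finished in time (whatever its own exit status),
# and 1 if `timeout` had to kill it.
run_with_deadline() {
    local seconds="$1"
    shift
    timeout "$seconds" "$@"
    local status=$?
    # coreutils `timeout` exits with 124 when the deadline expired.
    if [ "$status" -eq 124 ]; then
        echo "HANG: '$*' did not finish within ${seconds}s"
        return 1
    fi
    return 0
}

# Possible use inside the loop, replacing the plain client call
# (the 30 s limit is an arbitrary assumption):
#   run_with_deadline 30 env ROS_DISCOVERY_SERVER="127.0.0.1:11811" \
#       RMW_IMPLEMENTATION=rmw_fastrtps_cpp \
#       /opt/ros/galactic/lib/demo_nodes_cpp/add_two_ints_client || break
```

Breaking out of the loop on the first reported hang leaves the add_two_ints_server and the discovery server running, so the stuck state can still be inspected from another shell.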