Description
I ran an iPerf session with UDP traffic proxied through Envoy. The iPerf report showed a small number of dropped UDP datagrams (around 300); however, the 'downstream_rx_datagram_dropped' stat reported around 250 million dropped packets. This seems related to another recently opened issue: #35142
I looked through the code, and I think the bug is here: envoy/source/common/network/io_socket_handle_impl.cc, lines 518 to 529 at commit 0bf518c.
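For context on what that code path handles: on Linux, a socket with the SO_RXQ_OVFL option enabled receives, alongside each datagram, a control message (cmsg) holding an unsigned 32-bit count of packets dropped by the socket since its creation (socket(7)). The sketch below shows one way to read that value from recvmsg() ancillary data. It is not Envoy's code. The function names enableDropCounter and droppedCountFromControl are hypothetical, and it assumes (without confirming it from the linked lines) that this is the counter Envoy reads there.

```cpp
#include <sys/socket.h>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical helper: asks the kernel to attach the socket's drop counter to
// received datagrams. Linux-only: SO_RXQ_OVFL is not available on every platform.
bool enableDropCounter(int fd) {
  const int on = 1;
  return setsockopt(fd, SOL_SOCKET, SO_RXQ_OVFL, &on, sizeof(on)) == 0;
}

// Hypothetical helper: scans the ancillary data filled in by recvmsg() and
// returns the SO_RXQ_OVFL value if present. Note the value is cumulative
// (drops since the socket was created), not a per-datagram count.
std::optional<uint32_t> droppedCountFromControl(msghdr* msg) {
  assert(msg != nullptr);
  for (cmsghdr* cmsg = CMSG_FIRSTHDR(msg); cmsg != nullptr; cmsg = CMSG_NXTHDR(msg, cmsg)) {
    if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SO_RXQ_OVFL &&
        cmsg->cmsg_len >= CMSG_LEN(sizeof(uint32_t))) {
      uint32_t dropped = 0;
      // memcpy avoids an unaligned read from the control buffer.
      std::memcpy(&dropped, CMSG_DATA(cmsg), sizeof(dropped));
      return dropped;
    }
  }
  return std::nullopt;  // No drop counter attached to this datagram.
}
```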
Since the dropped-packet count in the ancillary data returned with each read from the receive buffer is cumulative, once packets have been dropped from the UDP socket's receive buffer, every subsequent cmsg header will report them again, and the code above adds that reported value on every read.
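A toy model of the arithmetic, assuming (as a simplification, not something measured here) that after roughly 300 drops every later read carries the same cumulative count of 300. Summing that count on each read, which is the pattern described above, reaches about 250 million after roughly 833,000 reads, while the kernel only ever reported 300 drops. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Buggy pattern: add the cumulative drop counter carried with every datagram
// to the stat, so the same old drops are counted once per read.
uint64_t statWhenSummingCumulative(const std::vector<uint32_t>& counter_per_read) {
  uint64_t stat = 0;
  for (uint32_t cumulative : counter_per_read) {
    stat += cumulative;
  }
  return stat;
}

// What the kernel actually reported: the last cumulative value seen
// (32-bit wraparound is ignored here for simplicity).
uint32_t dropsActuallyReported(const std::vector<uint32_t>& counter_per_read) {
  return counter_per_read.empty() ? 0 : counter_per_read.back();
}
```

For example, 833,334 reads that each carry a count of 300 give statWhenSummingCumulative() = 250,000,200, while dropsActuallyReported() stays at 300.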
This matches the logs I collected:
They show no new packet drops; instead, the same previously dropped packets are reported again and again. Note: the "maybe_dropped: " log line was added locally to debug this.
I assume the fix would be to check, in the code section above, whether the dropped-packet value from the cmsg differs from the number stored in *output.dropped_packets_, and only then increase that number by the delta.
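A minimal sketch of the fix proposed above, written as a standalone helper rather than as a patch to io_socket_handle_impl.cc. The class name DropCounterTracker and its observe() method are hypothetical, and the wraparound handling is an assumption added here because the kernel counter is only 32 bits wide.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: turns the cumulative per-socket drop counter carried in
// each cmsg into increments for a stat, so repeated reports of old drops add nothing.
class DropCounterTracker {
public:
  // Returns the number of drops not yet accounted for.
  uint32_t observe(uint32_t cumulative) {
    if (cumulative == last_) {
      return 0;  // Same value as last time: previously dropped packets, not new ones.
    }
    // Unsigned subtraction is modulo 2^32, so the delta is also correct if the
    // kernel's 32-bit counter wrapped around (once) since the last observation.
    const uint32_t delta = static_cast<uint32_t>(cumulative - last_);
    last_ = cumulative;
    return delta;
  }

private:
  uint32_t last_ = 0;  // Last cumulative value seen; the kernel counter starts at 0.
};
```

With one tracker per socket (the kernel counter is per socket), reads reporting 100, 300, 300, 300, ... add 100 and then 200 to the stat and nothing afterwards, for a total of 300.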
cc @danzh2010: please share your thoughts. I haven't had much time to look into fixing this, so I wanted to get a pre-review of the suggested fix.