udp stats: downstream_rx_datagram_dropped wrong metric calculation #38431

Closed
@ohadvano

I ran an iPerf session with UDP traffic proxied through Envoy. The iPerf report showed a small number of dropped UDP datagrams (say 300); however, the stat 'downstream_rx_datagram_dropped' reported around 250 million dropped packets. This correlates with another recently opened issue: #35142

I looked through the code, and I think the bug is here:

    if (output.dropped_packets_ != nullptr) {
      msghdr& hdr = mmsg_hdr[0].msg_hdr;
      if (hdr.msg_controllen > 0) {
        struct cmsghdr* cmsg;
        for (cmsg = CMSG_FIRSTHDR(&hdr); cmsg != nullptr; cmsg = CMSG_NXTHDR(&hdr, cmsg)) {
          absl::optional<uint32_t> maybe_dropped = maybeGetPacketsDroppedFromHeader(*cmsg);
          if (maybe_dropped) {
            *output.dropped_packets_ += *maybe_dropped;
          }
        }
      }
    }

The drop count carried in the ancillary data when reading from the receive buffer is cumulative. So once any packets have been dropped from the receive buffer associated with the UDP socket, every subsequent read will see a non-zero dropped-packets value in the cmsg header, and the code above adds that full value to the stat each time.
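To illustrate the overcounting, here is a minimal sketch of what happens when a cumulative value is added to a running total on every read. `naiveTotal` is a hypothetical helper written for this illustration only; it is not Envoy code, and the input values are made up.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Envoy): mimics the accumulation in the
// snippet above, where each value read from a cmsg is added to the stat as-is.
uint64_t naiveTotal(const std::vector<uint32_t>& cumulative_values_seen) {
  uint64_t total = 0;
  for (uint32_t value : cumulative_values_seen) {
    // Each value already includes all earlier drops, so adding it again
    // re-counts them.
    total += value;
  }
  return total;
}
```

For example, if 300 packets were dropped once and four later reads each report the cumulative value 300, `naiveTotal({300, 300, 300, 300})` yields 1200 even though only 300 packets were actually dropped.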

This correlated with logs I collected:

[Image: screenshot of the collected logs]

There weren't really any new packet drops here; rather, the same (previously) dropped packets were being reported again and again. Note: the "maybe_dropped: " log line was added by me to debug this.

I assume the fix would be to check, in the code section above, whether the dropped-packets value from the cmsg differs from the number stored in *output.dropped_packets_, and only then increase that number by the delta.
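A rough sketch of the delta idea, as a variant of the suggestion above: instead of comparing against *output.dropped_packets_ directly, it remembers the last cumulative value seen for the socket. `DropCounterTracker` and `update` are hypothetical names for illustration, not existing Envoy APIs, and the sketch assumes the cmsg value is a 32-bit counter that only grows (apart from wrapping around).

```cpp
#include <cstdint>

// Hypothetical per-socket tracker (not part of Envoy): converts the cumulative
// drop counter read from the cmsg into the number of newly dropped packets.
class DropCounterTracker {
public:
  // Returns how many packets were dropped since the previous call.
  uint32_t update(uint32_t cumulative_from_cmsg) {
    // Unsigned subtraction stays correct if the 32-bit counter wraps around.
    const uint32_t delta = cumulative_from_cmsg - last_seen_;
    last_seen_ = cumulative_from_cmsg;
    return delta;
  }

private:
  uint32_t last_seen_ = 0; // Assumes the counter starts at 0 for a new socket.
};
```

With this, the accumulation line would add `tracker.update(*maybe_dropped)` instead of `*maybe_dropped`, so repeated reads that report the same cumulative value contribute 0 to the stat. Where such a tracker would live (for example, next to the socket) is left open here.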

cc @danzh2010 please share your thoughts. I haven't had much time to look into fixing this, so I wanted to get a pre-review of the suggested fix.
