Skip to content

Conversation

@mozillazg
Copy link
Contributor

@mozillazg mozillazg commented Jan 11, 2025

before:

  1. attach a tc program to block send icmp to 1.1.1.1
  2. run sudo ./ktcpdump -v -i '__dev_queue_xmit+*' -d /usr/lib/debug/boot/vmlinux-`uname -r` 'dst host 1.1.1.1 and icmp'
  3. run ping 1.1.1.1
  4. ktcpdump will raise failed to parse ringbuf skbdata and killed by system due to oom.

after:

without failed to parse ringbuf skbdata, no longer killed due to oom.

Copy link
Owner

@jschwinger233 jschwinger233 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened another pr #7 to fix the same issue.

The root cause is bpf didn't get has_mac correctly: __dev_queue_xmit doesn't reset skb->mac_header until https://elixir.bootlin.com/linux/v6.8/source/net/core/dev.c#L4267 is executed. The new fix takes that into consideration by comparing skb->network_header to skb->mac_header to make sure we won't read uninitialized skb->mac_header (0xffff).

Also I want to explain why I split skb_data from event: skb_data may have variable length (skb->data_end - skb->data), always transmitting a fixed length from bpf kernel to userspace is no doubt a waste of resource. The design can be seen as HTTP header Content-Length followed by payload of variable length, I believe this is a way more efficient to pass data.

Feel free to ask questions and review #7. Appreciate your contribution, let me treat you to a dinner next time we meet 😋

@mozillazg
Copy link
Contributor Author

@jschwinger233 got it. thanks for your explain.

@mozillazg mozillazg closed this Jan 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants