Skip to content

Latest commit

 

History

History
172 lines (119 loc) · 6.13 KB

File metadata and controls

172 lines (119 loc) · 6.13 KB

BPF Trace (bpftrace)

NOTE: retina bpftrace is an experimental feature. The flags and behavior may change in future versions.

The retina bpftrace command allows you to trace network issues on a Kubernetes node in real-time using eBPF/bpftrace.

This is useful for debugging connectivity problems such as:

  • Packet drops (with reason codes)
  • TCP RST events (connection resets)
  • Socket errors (ECONNREFUSED, ETIMEDOUT, etc.)
  • TCP retransmissions (packet loss indicators)

Getting Started

Trace network issues on a node:

# Basic usage - trace all network issues on a node
kubectl retina bpftrace <node-name>

# With custom duration
kubectl retina bpftrace <node-name> --duration 60s

# Filter by IP address
kubectl retina bpftrace <node-name> --ip 10.224.0.5

# Filter by CIDR
kubectl retina bpftrace <node-name> --cidr 10.224.0.0/16

# Output as JSON (for parsing)
kubectl retina bpftrace <node-name> -o json

# Trace only specific event types
kubectl retina bpftrace <node-name> --drops --rst

# Specify custom timeout for trace pod startup
kubectl retina bpftrace <node-name> --startup-timeout 120s

Run kubectl retina bpftrace -h for full documentation and examples.

Event Types

The bpftrace command captures several types of network events:

DROP - Packet Drops

Captures packets dropped by the kernel with reason codes. Common reasons include:

Code Name Description
0 NOT_SPECIFIED Unspecified reason
1 NO_SOCKET No listening socket
3 TCP_CSUM TCP checksum error
6 NETFILTER_DROP Dropped by NetworkPolicy/iptables
8 IP_CSUM IP checksum error

The full list of drop reasons is kernel-version specific and is printed at the start of each trace.

RST_SENT / RST_RECV - TCP Reset Events

Captures TCP RST packets sent or received. These indicate:

  • Connection refused (no service listening)
  • Connection reset by peer
  • Firewall rejecting connections

SOCK_ERR - Socket Errors

Captures socket-level errors reported to applications:

Code Name Description
104 ECONNRESET Connection reset by peer
110 ETIMEDOUT Connection timed out
111 ECONNREFUSED Connection refused
113 EHOSTUNREACH No route to host

RETRANS - TCP Retransmissions

Captures TCP segment retransmissions, which indicate:

  • Packet loss in the network
  • Network congestion
  • Slow or unresponsive peers

The reason code shows the TCP state during retransmission.

Output Format

Table Format (default)

TIME         TYPE       REASON             PROBE              SRC -> DST
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
18:28:12     SOCK_ERR   ECONNREFUSED       inet_sk_error_report 127.0.0.1:35779  ->  127.0.0.1:9999 
18:28:27     DROP       6                  kfree_skb          10.224.0.60:41929  ->  10.224.0.39:80   
18:28:28     RETRANS    2                  tcp_retransmit_skb 10.224.0.60:41929  ->  10.224.0.39:80   
18:28:33     RST_SENT   -                  tcp_send_reset     10.224.0.47:38470  ->  20.161.216.95:443  

JSON Format (-o json)

{"time":"18:28:12","type":"SOCK_ERR","reason_code":111,"probe":"inet_sk_error_report","src_ip":"127.0.0.1","src_port":35779,"dst_ip":"127.0.0.1","dst_port":9999}
{"time":"18:28:27","type":"DROP","reason_code":6,"probe":"kfree_skb","src_ip":"10.224.0.60","src_port":41929,"dst_ip":"10.224.0.39","dst_port":80}

Flags

Flag Type Default Description
--duration duration 0 Duration to run the trace (0 = until Ctrl-C)
--startup-timeout duration 30s Timeout for trace pod startup
--ip string "" Filter events by IP address (matches src or dst)
--cidr string "" Filter events by CIDR (matches src or dst)
--drops bool false Enable only packet drop events
--rst bool false Enable only TCP RST events
--errors bool false Enable only socket error events
--retransmits bool false Enable only retransmit events
--all bool false Enable all event types (default when no event flags specified)
-o, --output string table Output format: table or json
--retina-shell-image-repo string (default) Override the retina-shell image repository
--retina-shell-image-version string (default) Override the retina-shell image version

Example: Debugging NetworkPolicy Drops

When pods can't communicate due to NetworkPolicy:

# Start tracing on the node where the destination pod runs
kubectl retina bpftrace aks-nodepool1-12345678-vmss000000 --drops --duration 60s

# In another terminal, attempt the connection
kubectl exec -it client-pod -- curl http://server-service:80

You'll see output like:

18:14:41     DROP       6      kfree_skb          10.224.0.34:33061  ->  10.224.0.49:80   
18:14:42     RETRANS    2      tcp_retransmit_skb 10.224.0.34:33061  ->  10.224.0.49:80   
18:14:42     DROP       6      kfree_skb          10.224.0.34:33061  ->  10.224.0.49:80   

The DROP with reason code 6 (NETFILTER_DROP) confirms NetworkPolicy is blocking traffic.

Example: Debugging Connection Refused

When connecting to a service that's not listening:

kubectl retina bpftrace node-name --rst --errors --ip 10.224.0.5
18:19:05     RST_RECV   -      tcp_receive_reset  10.224.0.10:35267  ->  10.224.0.5:8080 
18:19:05     SOCK_ERR   ECONNREFUSED inet_sk_error_report 10.224.0.10:35267  ->  10.224.0.5:8080 

This shows the TCP RST and corresponding socket error, indicating no service is listening on port 8080.

Requirements

  • Linux nodes (Windows not supported)
  • Kernel with BTF support (5.x+ recommended)

Limitations

  • IPv6 filtering not currently supported
  • Cilium CNI: DROP events won't capture Cilium policy drops (Cilium uses eBPF datapath, not netfilter/kfree_skb)