Skip to content

bpf: bpf_dbg: split pcap_next_pkt() validation/advance, fix off-by-one in cmd_select#11987

Open
kernel-patches-daemon-bpf[bot] wants to merge 1 commit intobpf-next_basefrom
series/1087457=>bpf-next
Open

bpf: bpf_dbg: split pcap_next_pkt() validation/advance, fix off-by-one in cmd_select#11987
kernel-patches-daemon-bpf[bot] wants to merge 1 commit intobpf-next_basefrom
series/1087457=>bpf-next

Conversation

@kernel-patches-daemon-bpf
Copy link
Copy Markdown

Pull request for series with
subject: bpf: bpf_dbg: split pcap_next_pkt() validation/advance, fix off-by-one in cmd_select
version: 3
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=1087457

@kernel-patches-daemon-bpf
Copy link
Copy Markdown
Author

Upstream branch: 9f5b3ff
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1087457
version: 3

@kernel-patches-review-bot
Copy link
Copy Markdown

commit 2494c4d7b2e123f92bef39b48c8b9385b2917090
Author: Hasan Basbunar <basbunarhasan@gmail.com>

bpf: bpf_dbg: split pcap_next_pkt() validation/advance, fix off-by-one in cmd_select

bpf_dbg's interactive 'select <N>' command advances the pcap cursor
one packet too many. This fix splits pcap_next_pkt() into a pure
validator (pcap_curr_pkt_valid()) and a validate-advance-validate
combinator, and adjusts boundary checks to correctly handle packets
ending at the file boundary.

Fixes: fd981e3c321a ("filter: bpf_dbg: add minimal bpf debugger")

> diff --git a/tools/bpf/bpf_dbg.c b/tools/bpf/bpf_dbg.c
> --- a/tools/bpf/bpf_dbg.c
> +++ b/tools/bpf/bpf_dbg.c
> @@ -918,21 +918,30 @@ static struct pcap_pkthdr *pcap_curr_pkt(void)
>  	return (void *) pcap_ptr_va_curr;
>  }
>
> -static bool pcap_next_pkt(void)
> +static bool pcap_curr_pkt_valid(void)
>  {
>  	struct pcap_pkthdr *hdr = pcap_curr_pkt();
>
>  	if (pcap_ptr_va_curr + sizeof(*hdr) -
> -	    pcap_ptr_va_start >= pcap_map_size)
> +	    pcap_ptr_va_start > pcap_map_size)
>  		return false;
>  	if (hdr->caplen == 0 || hdr->len == 0 || hdr->caplen > hdr->len)
>  		return false;
>  	if (pcap_ptr_va_curr + sizeof(*hdr) + hdr->caplen -
> -	    pcap_ptr_va_start >= pcap_map_size)
> +	    pcap_ptr_va_start > pcap_map_size)
>  		return false;
> +	return true;
> +}
>
> +static bool pcap_next_pkt(void)
> +{
> +	struct pcap_pkthdr *hdr;
> +
> +	if (!pcap_curr_pkt_valid())
> +		return false;
> +	hdr = pcap_curr_pkt();
>  	pcap_ptr_va_curr += (sizeof(*hdr) + hdr->caplen);
> -	return true;
> +	return pcap_curr_pkt_valid();
>  }
>
>  static void pcap_reset_pkt(void)
> @@ -1143,7 +1152,7 @@ static int cmd_select(char *num)
>
>  	for (i = 0; i < which && (have_next = pcap_next_pkt()); i++)
>  		/* noop */;
> -	if (!have_next || pcap_curr_pkt() == NULL) {
> +	if (!have_next || !pcap_curr_pkt_valid()) {

The commit message asserts that after this fix, 'select K shows packet K
for all K in 1..N' and 'select N+1 correctly errors with no packet #N+1
available!'. Does the actual code achieve this behavior given that the
loop still initializes i to 0?

With the new pcap_next_pkt() semantics (validate-current, advance,
validate-new), trace through 'select 1' on a file with >=2 packets:

  Iteration 1 (i=0):
    pcap_next_pkt() validates pkt1 (ok)
    advances cursor to pkt2
    validates pkt2 (ok)
    returns true
    i becomes 1

  Loop exit (i=1, i<1 is false)

  Cursor is now on pkt2
  pcap_curr_pkt_valid() returns true
  Function returns CMD_OK

The user requested packet 1, but subsequent 'step' or 'run' operates on
pkt2. This reproduces the exact defect the commit claims to fix: 'Before
this fix, select 1 shows packet 2's payload.'

Similarly, trace 'select 3' on a file with exactly 3 packets:

  Iteration 1 (i=0): advances pkt1 -> pkt2, validates pkt2 (ok), i=1
  Iteration 2 (i=1): advances pkt2 -> pkt3, validates pkt3 (ok), i=2
  Iteration 3 (i=2): pkt3 valid, advances past pkt3 (cursor at EOF),
                     post-advance pcap_curr_pkt_valid() is false,
                     pcap_next_pkt() returns false, have_next=false

  Loop exits with have_next=false
  Code hits 'no packet #3 available!' even though packet 3 exists

The commit message itself identifies the correct fix: "Simply changing the
loop init to 'i = 1' (so it advances N-1 times) fixes the user-visible
symptom". That change was not applied in the diff. The loop still uses
'i = 0', which causes it to advance N times instead of N-1 times, landing
on packet N+1 when the user requested packet N.

The v3 patch discussion on lore.kernel.org/bpf addressed this by changing
the loop from 'for (i = 0; i < which ...)' to 'for (i = 1; i < which ...)'
to properly implement 1-based indexing. The current code has the
pcap_curr_pkt_valid() refactoring but is missing the loop initialization
fix.

Reference: https://lore.kernel.org/bpf/20260429123543.61559-1-basbunarhasan@gmail.com/

>  		rl_printf("no packet #%u available!\n", which);
>  		pcap_reset_pkt();
>  		return CMD_ERR;



AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: bpf: bpf_dbg: split pcap_next_pkt() validation/advance, fix off-by-one in cmd_select
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25109638113

@kernel-patches-daemon-bpf
Copy link
Copy Markdown
Author

@kernel-patches-daemon-bpf
Copy link
Copy Markdown
Author

Upstream branch: 2ca6723
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1087457
version: 3

@kernel-patches-daemon-bpf
Copy link
Copy Markdown
Author

Upstream branch: 2ca6723
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1087457
version: 3

@kernel-patches-daemon-bpf
Copy link
Copy Markdown
Author

Upstream branch: 2ca6723
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1087457
version: 3

…e in cmd_select

bpf_dbg's interactive 'select <N>' command, documented in the file
header ("select 3 (run etc will start from the 3rd packet in the
pcap)") to use 1-based packet indexing, advances the pcap cursor one
packet too many. The loop in cmd_select():

	pcap_reset_pkt();         /* cursor on packet 1 */
	for (i = 0; i < which && (have_next = pcap_next_pkt()); i++)
		/* noop */;

calls pcap_next_pkt() N times to reach packet N, but pcap_next_pkt()
validates the packet at the cursor and then advances past it. After
N calls the cursor is on packet N+1, so 'select 3' positions on
packet 4, 'select 4' on packet 5, etc.

Simply changing the loop init to 'i = 1' (so it advances N-1 times)
fixes the user-visible symptom but leaves the final landed-on packet
unvalidated, and combined with pcap_next_pkt()'s '>=' boundary
checks, mis-handles the boundary cases on the last and just-past-the-
last packet. As pointed out by the Sashiko AI review on v1 and v2,
this surfaces in two ways:

  1. On a perfect pcap (no trailing bytes after the last packet),
     pcap_next_pkt()'s '>= pcap_map_size' rejects packets whose body
     ends exactly at the file boundary, so 'select N' on an N-packet
     file errors as "no packet #N available" even though the packet
     is fully in-bounds.

  2. On a truncated pcap (filehdr + a few stray bytes that happen to
     pass try_load_pcap()'s 'pcap_map_size > sizeof(filehdr)' guard
     but not enough to contain a full pkthdr), 'select 1' returns
     CMD_OK without ever validating the header, and a subsequent
     'step' or 'run' dereferences pcap_curr_pkt()->caplen past the
     mapped region.

Fix all three issues by splitting pcap_next_pkt() into a pure
validator (pcap_curr_pkt_valid()) and a validate-advance-validate
combinator. The boundary check now uses '>' instead of '>=', so a
packet whose body ends exactly at pcap_map_size is correctly accepted.
pcap_next_pkt() returns true only when both the current packet was
valid and, after advancing, the new cursor position is also valid.
This means the do-while in cmd_run() exits cleanly after the last
packet (no past-end dereference), and cmd_select() can call
pcap_curr_pkt_valid() after the loop to bounds-check the final
packet.

Reproduction (deterministic, no kernel needed): build bpf_dbg from
the unmodified tree, synthesize a pcap with N>=2 packets each with a
distinct payload byte, and drive 'select 1 / step 1 / quit'. Before
this fix, 'select 1' shows packet 2's payload. After this fix,
'select K' shows packet K for all K in 1..N, 'select N+1' correctly
errors with "no packet #N+1 available!", and 'select 1' on a pcap
truncated to filehdr + 1 byte also correctly errors.

Cloudflare's downstream mirror at github.com/cloudflare/bpftools
carries the same defect.

Fixes: fd981e3 ("filter: bpf_dbg: add minimal bpf debugger")
Signed-off-by: Hasan Basbunar <basbunarhasan@gmail.com>
@kernel-patches-daemon-bpf kernel-patches-daemon-bpf Bot force-pushed the series/1087457=>bpf-next branch from c9176f1 to a4e19ca Compare May 2, 2026 16:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants