Add eBPF ISA v4 instructions #7982

niooss-ledger · 2025-04-04T13:49:29Z

Hello,

In 2023, the eBPF instruction set was modified to add several instructions related to signed operations (load with sign-extension, signed division, etc.), in "version 4".

Here are some references about this change:

https://pchaigno.github.io/bpf/2021/10/20/ebpf-instruction-sets.html (a blog post about eBPF instruction set extensions)
https://lore.kernel.org/bpf/[email protected]/ (documentation sent to Linux Kernel mailing list)
https://www.rfc-editor.org/rfc/rfc9669.html#name-sign-extension-load-operati (IETF's BPF Instruction Set Architecture standard defined the new instructions)
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/core.c?h=v6.14#n1859 (implementation of signed division and remainder in Linux kernel. This shows that 32-bit signed DIV and signed MOD are zero-extending the result in DST)
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/core.c?h=v6.14#n2135 (implementation of signed memory load in Linux kernel)
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1f9a1ea821ff25353a0e80d971e7958cd55b47a3 (commit which added signed memory load instructions in Linux kernel)
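To illustrate the zero-extension point made in the references above, here is a minimal C sketch of 32-bit signed division and remainder. The function names `sdiv32` and `smod32` are hypothetical, and the special cases (division by zero, `INT32_MIN` divided by -1) follow my reading of RFC 9669 rather than the kernel code or this PR's SLEIGH.

```c
#include <stdint.h>

/* Hypothetical model of 32-bit SDIV: operands are the low 32 bits of each
 * register, and the 32-bit result is zero-extended into the 64-bit dst.
 * The unsigned-to-signed casts assume two's complement (true for GCC/clang). */
static uint64_t sdiv32(uint64_t dst, uint64_t src)
{
    int32_t a = (int32_t)(uint32_t)dst;
    int32_t b = (int32_t)(uint32_t)src;
    int32_t q;

    if (b == 0)
        q = 0;                      /* division by zero yields 0 */
    else if (a == INT32_MIN && b == -1)
        q = INT32_MIN;              /* avoid the overflowing case */
    else
        q = a / b;                  /* C division truncates toward zero */

    return (uint64_t)(uint32_t)q;   /* zero-extend, do not sign-extend */
}

/* Hypothetical model of 32-bit SMOD with the same zero-extension. */
static uint64_t smod32(uint64_t dst, uint64_t src)
{
    int32_t a = (int32_t)(uint32_t)dst;
    int32_t b = (int32_t)(uint32_t)src;
    int32_t r;

    if (b == 0)
        return (uint32_t)dst;       /* low half unchanged, upper half zeroed */
    if (b == -1)
        r = 0;                      /* x % -1 is always 0; avoids overflow */
    else
        r = a % b;                  /* remainder takes the sign of the dividend */

    return (uint64_t)(uint32_t)r;
}
```

With these definitions, dividing -6 by 2 gives 0x00000000FFFFFFFD in the destination, not 0xFFFFFFFFFFFFFFFD.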

This can be tested with a recent enough version of clang and LLVM (this works with clang 19.1.4 on Alpine 3.21).
For example, for signed memory load instructions:

signed int sext_8bit(signed char x) {
    return x;
}

produces:

$ clang -O0 -target bpf -mcpu=v4 -c test.c -o test.ebpf
$ llvm-objdump -rd test.ebpf
...
0000000000000000 <sext_8bit>:
       0:  73 1a ff ff 00 00 00 00  *(u8 *)(r10 - 0x1) = r1
       1:  91 a1 ff ff 00 00 00 00  r1 = *(s8 *)(r10 - 0x1)
       2:  bc 10 00 00 00 00 00 00  w0 = w1
       3:  95 00 00 00 00 00 00 00  exit

(The second instruction is a signed memory load)
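The bytes of that second instruction can be decoded by hand. The sketch below assumes the little-endian encoding (which is what the dump above shows) and uses hypothetical names (`struct bpf_fields`, `decode_le`, `load_s8`); it is not code from this PR.

```c
#include <stdint.h>
#include <string.h>

/* Fields of one 8-byte eBPF instruction (hypothetical struct name). */
struct bpf_fields {
    uint8_t opcode;
    uint8_t dst_reg;
    uint8_t src_reg;
    int16_t offset;
    int32_t imm;
};

/* Decode a little-endian instruction: opcode, then a register byte with
 * dst in the low nibble and src in the high nibble, then a 16-bit offset
 * and a 32-bit immediate. */
static struct bpf_fields decode_le(const uint8_t b[8])
{
    struct bpf_fields f;
    uint16_t off = (uint16_t)(b[2] | (b[3] << 8));
    uint32_t imm = (uint32_t)b[4] | ((uint32_t)b[5] << 8) |
                   ((uint32_t)b[6] << 16) | ((uint32_t)b[7] << 24);

    f.opcode = b[0];
    f.dst_reg = b[1] & 0x0f;
    f.src_reg = b[1] >> 4;
    memcpy(&f.offset, &off, sizeof off);   /* reinterpret as signed */
    memcpy(&f.imm, &imm, sizeof imm);
    return f;
}

/* Model of the s8 load: read one byte and sign-extend it to 64 bits. */
static uint64_t load_s8(const uint8_t *addr)
{
    int8_t v;
    memcpy(&v, addr, 1);
    return (uint64_t)(int64_t)v;
}
```

For `91 a1 ff ff 00 00 00 00`, this gives opcode 0x91 (0x80 MEMSX mode | 0x10 byte size | 0x01 LDX class), dst_reg 1, src_reg 10 and offset -1, which matches `r1 = *(s8 *)(r10 - 0x1)`.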

The MOVS instruction (sign-extending register MOV) uses the offset field to encode the conversion (whether the source register is treated as a signed 8-bit, 16-bit or 32-bit integer). There is no single agreed mnemonic for these instructions:

They are all named MOVS in the proposal https://lore.kernel.org/bpf/[email protected]/
LLVM and Linux disassemblers only display pseudo-code (r0 = (s8)r1)
RFC 9669 (https://datatracker.ietf.org/doc/rfc9669/) uses MOVSX for all instructions.
GCC uses MOVS for all instructions: https://github.com/gcc-mirror/gcc/blob/releases/gcc-14.1.0/gcc/config/bpf/bpf.md?plain=1#L326-L365

To make the disassembled code clearer, I decode such instructions with a size suffix: MOVSB, MOVSH, MOVSW. This deviation is my own choice; if you prefer to stick with what GCC does (MOVS) or what the IETF's RFC standardized (MOVSX), I can change this.
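As an illustration of the naming scheme described above, the mapping from the offset field to a suffixed mnemonic could look like the following C sketch. The function name `movs_mnemonic` and the `is_64bit` flag are hypothetical; the PR itself implements this in Ghidra's SLEIGH, not in C. The sketch assumes, per RFC 9669, that offset 32 is only valid for the 64-bit ALU class.

```c
#include <stddef.h>
#include <stdint.h>

/* Map the offset field of a register MOV to a mnemonic.
 * Returns NULL for encodings this sketch does not recognize. */
static const char *movs_mnemonic(int16_t offset, int is_64bit)
{
    switch (offset) {
    case 0:
        return "MOV";                       /* plain move, no sign extension */
    case 8:
        return "MOVSB";                     /* source treated as signed 8-bit */
    case 16:
        return "MOVSH";                     /* source treated as signed 16-bit */
    case 32:
        return is_64bit ? "MOVSW" : NULL;   /* s32 source: 64-bit class only */
    default:
        return NULL;
    }
}
```

Under this scheme, the pseudo-code `r0 = (s8)r1` shown by LLVM would be displayed as a MOVSB.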

To test the new instructions, I wrote a C program with several small functions and compiled it with clang -O0 -target bpf -mcpu=v4 -c and clang -O2 -target bpf -mcpu=v4 -c on Alpine 3.21. This archive contains the source code and the 2 compiled programs: ebpf_v4_signed_op.zip.

For information, eBPF ISA v4 contains other new instructions (for example, the 32-bit JA instruction). I chose to restrict the scope of this Pull Request to the signed operations only, to make it easier to review. Please let me know if you prefer a single Pull Request with all instructions from ISA v4.

GhidorahRex · 2025-04-04T14:40:43Z

According to the first blog post, v4 adds 7 new instructions: the signed operations, an unconditional jump, and a byte-swapping operation. If there are only two additional instructions, I think we can add those to this PR.

In 2023, the eBPF instruction set was modified to add several instructions related to signed operations (load with sign-extension, signed division, etc.), a 32-bit jump instruction and some byte-swap instructions. This became version 4 of the eBPF ISA.

Here are some references about this change:

- https://pchaigno.github.io/bpf/2021/10/20/ebpf-instruction-sets.html (a blog post about eBPF instruction set extensions)
- https://lore.kernel.org/bpf/[email protected]/ (documentation sent to Linux Kernel mailing list)
- https://www.rfc-editor.org/rfc/rfc9669.html#name-sign-extension-load-operati (IETF's BPF Instruction Set Architecture standard defined the new instructions)
- https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/core.c?h=v6.14#n1859 (implementation of signed division and remainder in Linux kernel. This shows that 32-bit signed DIV and signed MOD are zero-extending the result in DST)
- https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/core.c?h=v6.14#n2135 (implementation of signed memory load in Linux kernel)
- https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1f9a1ea821ff25353a0e80d971e7958cd55b47a3 (commit which added signed memory load instructions in Linux kernel)

This can be tested with a recent enough version of clang and LLVM (this works with clang 19.1.4 on Alpine 3.21).
For example, for signed memory load instructions:

signed int sext_8bit(signed char x) {
    return x;
}

produces:

$ clang -O0 -target bpf -mcpu=v4 -c test.c -o test.ebpf
$ llvm-objdump -rd test.ebpf
...
0000000000000000 <sext_8bit>:
       0:  73 1a ff ff 00 00 00 00  *(u8 *)(r10 - 0x1) = r1
       1:  91 a1 ff ff 00 00 00 00  r1 = *(s8 *)(r10 - 0x1)
       2:  bc 10 00 00 00 00 00 00  w0 = w1
       3:  95 00 00 00 00 00 00 00  exit

(The second instruction is a signed memory load)

The MOVS instruction (sign-extending register MOV) uses the offset field to encode the conversion (whether the source register is treated as a signed 8-bit, 16-bit or 32-bit integer). There is no single agreed mnemonic for these instructions:

- They are all named MOVS in the proposal https://lore.kernel.org/bpf/[email protected]/
- LLVM and Linux disassemblers only display pseudo-code (`r0 = (s8)r1`)
- RFC 9669 (https://datatracker.ietf.org/doc/rfc9669/) uses MOVSX for all instructions.
- GCC uses MOVS for all instructions: https://github.com/gcc-mirror/gcc/blob/releases/gcc-14.1.0/gcc/config/bpf/bpf.md?plain=1#L326-L365

To make the disassembled code clearer, I decode such instructions with a size suffix: MOVSB, MOVSH, MOVSW.

The decoding of the 32-bit JA, BSWAP16, BSWAP32 and BSWAP64 instructions is straightforward.
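For readers unfamiliar with the two remaining instruction groups, here is a rough C model of their semantics. The names `ja32_target` and `bswap_width` are hypothetical, and the encoding details in the comments (opcode 0x06 for the 32-bit JA, opcode 0xd7 with imm 16/32/64 for BSWAP) reflect my reading of RFC 9669 and should be checked against it.

```c
#include <stdint.h>

/* 32-bit JA (assumed opcode 0x06, JMP32 class): the jump distance comes
 * from the 32-bit imm field instead of the 16-bit offset field. It is
 * counted in 8-byte instruction slots, relative to the next instruction. */
static int64_t ja32_target(int64_t pc, int32_t imm)
{
    return pc + 1 + (int64_t)imm;
}

/* BSWAP (assumed opcode 0xd7, imm = 16, 32 or 64): unconditionally reverse
 * the low width/8 bytes of the value; any higher bits of the result are zero. */
static uint64_t bswap_width(uint64_t v, int width)
{
    uint64_t out = 0;

    for (int i = 0; i < width / 8; i++) {
        out = (out << 8) | (v & 0xff);  /* move the lowest byte to the top */
        v >>= 8;
    }
    return out;
}
```

Unlike the older LE/BE conversions, the swap here does not depend on the endianness of the host.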

niooss-ledger · 2025-04-04T16:02:32Z

Thanks for your quick reply! I added the 32-bit-offset jump and byte-swap instructions. I tested that code using __builtin_bswap... is decoded correctly, using this C program:

unsigned short do_bswap16(unsigned short x) {
    return __builtin_bswap16(x);
}
unsigned int do_bswap32(unsigned int x) {
    return __builtin_bswap32(x);
}
unsigned long do_bswap64(unsigned long x) {
    return __builtin_bswap64(x);
}

By the way, I believe the LE16, LE32, BE16 and BE32 instructions produce incorrect p-code semantics, because the bits of the source value are not masked enough (for example, :LE16 ... { dst=((dst) >> 8) | ((dst) << 8); } is missing something that restricts the result to 16 bits). But this seems out of the scope of this PR (one could also argue that the precise semantics of LE16 are different, as they depend on the endianness of the host running the eBPF program). So I did not touch these instructions.

EDITED TO ADD: I opened another Pull Request to tackle this other issue: #7985
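The masking concern raised in this comment can be reproduced in plain C. The sketch below only models the quoted expression on a 64-bit value; the function names are hypothetical, and it says nothing about what the fix in #7985 actually looks like.

```c
#include <stdint.h>

/* The expression quoted above, applied to a 64-bit register value:
 * bits above bit 15 of the input leak into the result. */
static uint64_t swap16_unmasked(uint64_t dst)
{
    return (dst >> 8) | (dst << 8);
}

/* A masked variant: only the two low bytes take part, so the result
 * always fits in 16 bits. */
static uint64_t swap16_masked(uint64_t dst)
{
    return ((dst >> 8) & 0xff) | ((dst & 0xff) << 8);
}
```

For an input such as 0x11223344, the masked variant returns 0x4433, while the unmasked one returns a value wider than 16 bits.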

ryanmkurtz assigned GhidorahRex on Apr 4, 2025

ryanmkurtz added the "Status: Triage" and "Feature: Processor/eBPF" labels on Apr 4, 2025

niooss-ledger force-pushed the ebpf-add-v4-signed-extension branch from 3ae65b8 to 9346fe2 on April 4, 2025 15:50

niooss-ledger changed the title from "Add eBPF v4 signed extension" to "Add eBPF ISA v4 instructions" on Apr 4, 2025

niooss-ledger force-pushed the ebpf-add-v4-signed-extension branch from 9346fe2 to ed8b5cc on April 4, 2025 15:55

GhidorahRex added the "Status: Prioritize" label and removed the "Status: Triage" label on Apr 4, 2025

GhidorahRex added the "Status: Internal" label and removed the "Status: Prioritize" label on Apr 15, 2025
