Add eBPF ISA v4 instructions #7982

Open
wants to merge 1 commit into
base: master

niooss-ledger
Contributor

Hello,

In 2023, the eBPF instruction set was modified to add several instructions related to signed operations (load with sign-extension, signed division, etc.), in "version 4".

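The references for this change (a blog post, the documentation sent to the Linux kernel mailing list, RFC 9669 and the Linux kernel implementation) are listed in the commit message below.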

This can be tested with a recent enough version of clang and LLVM (this works with clang 19.1.4 on Alpine 3.21).
For example for signed memory load instructions:

    signed int sext_8bit(signed char x) {
        return x;
    }

produces:

    $ clang -O0 -target bpf -mcpu=v4 -c test.c -o test.ebpf
    $ llvm-objdump -rd test.ebpf
    ...
    0000000000000000 <sext_8bit>:
           0:  73 1a ff ff 00 00 00 00  *(u8 *)(r10 - 0x1) = r1
           1:  91 a1 ff ff 00 00 00 00  r1 = *(s8 *)(r10 - 0x1)
           2:  bc 10 00 00 00 00 00 00  w0 = w1
           3:  95 00 00 00 00 00 00 00  exit

(The second instruction is a signed memory load)
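
As a minimal sketch of what distinguishes the signed load from a regular one, the two C helpers below model an 8-bit load into a 64-bit register with zero-extension and with sign-extension. The helper names `load_u8` and `load_s8` are hypothetical and only serve the illustration; they are not part of the PR.

```c
#include <stdint.h>
#include <string.h>

/* Zero-extending 8-bit load: models r = *(u8 *)(addr). */
static uint64_t load_u8(const void *addr)
{
    uint8_t v;
    memcpy(&v, addr, sizeof v);
    return (uint64_t)v;
}

/* Sign-extending 8-bit load: models the v4 form r = *(s8 *)(addr). */
static uint64_t load_s8(const void *addr)
{
    int8_t v;
    memcpy(&v, addr, sizeof v);
    /* Widen as a signed value first, then reinterpret as a 64-bit register. */
    return (uint64_t)(int64_t)v;
}
```

For a byte holding 0xff, `load_u8` yields 0xff while `load_s8` yields 0xffffffffffffffff (that is, -1), which is the behavior the `*(s8 *)` pseudo-code in the listing expresses.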

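Instruction MOVS (Sign extend register MOV) uses the offset field to encode the conversion (whether the source register is to be considered as a signed 8-bit, 16-bit or 32-bit integer). The mnemonic for these instructions is not settled: the original proposal and GCC name them all MOVS, RFC 9669 uses MOVSX for all of them, and the LLVM and Linux disassemblers only display pseudo-code (`r0 = (s8)r1`).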

To make the disassembled code clearer, this PR decodes such instructions with a size suffix: MOVSB, MOVSH, MOVSW. This deviation is my own choice; if you prefer to stick with what GCC does (MOVS) or what the IETF's RFC standardized (MOVSX), I can change this.
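
The mapping from the offset field to a suffixed mnemonic can be sketched in C as follows. This is only an illustration under stated assumptions: `movs_mnemonic` and `sign_extend` are hypothetical helper names, not code from the PR, and the accepted offsets (8, 16, 32) follow the description above. As far as I read RFC 9669, the 32-bit ALU form only allows offsets 8 and 16, which this sketch does not enforce.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: choose a size-suffixed mnemonic from the offset
 * field of a sign-extending register move. Returns NULL when the offset
 * is not one of the widths described in the PR.
 */
static const char *movs_mnemonic(int16_t offset)
{
    switch (offset) {
    case 8:  return "MOVSB";
    case 16: return "MOVSH";
    case 32: return "MOVSW";
    default: return NULL;
    }
}

/*
 * Sign-extend the low `bits` bits of `value` to 64 bits.
 * Only meaningful for bits in {8, 16, 32}; uses unsigned arithmetic
 * so that no implementation-defined conversion is involved.
 */
static uint64_t sign_extend(uint64_t value, unsigned bits)
{
    uint64_t mask = (1ULL << bits) - 1;
    uint64_t sign = 1ULL << (bits - 1);
    return ((value & mask) ^ sign) - sign;
}
```

With these helpers, an instruction whose offset is 8 would be displayed as MOVSB, and its 64-bit semantics would be `dst = sign_extend(src, 8)`.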

To test the new instructions, I wrote a C program with several small functions and compiled it with clang -O0 -target bpf -mcpu=v4 -c and clang -O2 -target bpf -mcpu=v4 -c on Alpine 3.21. This archive contains the source code and the 2 compiled programs: ebpf_v4_signed_op.zip.

For information, eBPF ISA v4 contains other new instructions (for example, the 32-bit JA instruction). I chose to restrict the scope of this Pull Request to the signed operations only, to make reviewing it easier. Please let me know if you prefer a single Pull Request with all instructions from ISA v4.

@GhidorahRex
Collaborator

According to the first blog post, v4 adds 7 new instructions: the signed operations, an unconditional jump, and a byte-swapping operation. If there are only two additional instructions, I think we can add those to this PR.

@niooss-ledger niooss-ledger force-pushed the ebpf-add-v4-signed-extension branch from 3ae65b8 to 9346fe2 Compare April 4, 2025 15:50
@niooss-ledger niooss-ledger changed the title Add eBPF v4 signed extension Add eBPF ISA v4 instructions Apr 4, 2025
In 2023, the eBPF instruction set was modified to add several
instructions related to signed operations (load with sign-extension,
signed division, etc.), a 32-bit jump instruction and some byte-swap
instructions. This became version 4 of eBPF ISA.

Here are some references about this change:

- https://pchaigno.github.io/bpf/2021/10/20/ebpf-instruction-sets.html
  (a blog post about eBPF instruction set extensions)
- https://lore.kernel.org/bpf/[email protected]/
  (documentation sent to Linux Kernel mailing list)
- https://www.rfc-editor.org/rfc/rfc9669.html#name-sign-extension-load-operati
  (IETF's BPF Instruction Set Architecture standard defined the new
  instructions)
- https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/core.c?h=v6.14#n1859
  (implementation of signed division and remainder in Linux kernel.
  This shows that 32-bit signed DIV and signed MOD are zero-extending
  the result in DST)
- https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/core.c?h=v6.14#n2135
  (implementation of signed memory load in Linux kernel)
- https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1f9a1ea821ff25353a0e80d971e7958cd55b47a3
  (commit which added signed memory load instructions in Linux kernel)
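
The remark above about 32-bit signed DIV and MOD zero-extending their result can be sketched in C as follows. The function names are hypothetical, and the handling of the edge cases (division by zero, INT32_MIN divided by -1) follows my reading of RFC 9669; treat those branches as assumptions rather than as a statement of what the kernel or this patch does.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret 32 bits as a signed value without relying on cast behavior. */
static int32_t as_s32(uint32_t v)
{
    int32_t r;
    memcpy(&r, &v, sizeof r);
    return r;
}

/* 32-bit signed division: operate on the low halves, zero-extend the result. */
static uint64_t sdiv32(uint64_t dst, uint64_t src)
{
    int32_t a = as_s32((uint32_t)dst);
    int32_t b = as_s32((uint32_t)src);
    int32_t q;

    if (b == 0)
        q = 0;                      /* assumption: division by zero gives 0 */
    else if (a == INT32_MIN && b == -1)
        q = INT32_MIN;              /* assumption: overflow case gives INT32_MIN */
    else
        q = a / b;                  /* C division truncates toward zero */
    return (uint64_t)(uint32_t)q;   /* upper 32 bits are cleared */
}

/* 32-bit signed remainder with the same zero-extension of the result. */
static uint64_t smod32(uint64_t dst, uint64_t src)
{
    int32_t a = as_s32((uint32_t)dst);
    int32_t b = as_s32((uint32_t)src);
    int32_t r;

    if (b == 0)
        r = a;                      /* assumption: low half of dst is kept */
    else if (a == INT32_MIN && b == -1)
        r = 0;                      /* assumption: overflow case gives 0 */
    else
        r = a % b;
    return (uint64_t)(uint32_t)r;
}
```

For example, dividing -7 (0xfffffff9 in the low half) by 2 gives -3, stored as 0x00000000fffffffd and not as 0xfffffffffffffffd.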

This can be tested with a recent enough version of clang and LLVM (this
works with clang 19.1.4 on Alpine 3.21).
For example for signed memory load instructions:

    signed int sext_8bit(signed char x) {
        return x;
    }

produces:

    $ clang -O0 -target bpf -mcpu=v4 -c test.c -o test.ebpf
    $ llvm-objdump -rd test.ebpf
    ...
    0000000000000000 <sext_8bit>:
           0:  73 1a ff ff 00 00 00 00  *(u8 *)(r10 - 0x1) = r1
           1:  91 a1 ff ff 00 00 00 00  r1 = *(s8 *)(r10 - 0x1)
           2:  bc 10 00 00 00 00 00 00  w0 = w1
           3:  95 00 00 00 00 00 00 00  exit

(The second instruction is a signed memory load)

Instruction MOVS (Sign extend register MOV) uses offset to encode the
conversion (whether the source register is to be considered as signed
8-bit, 16-bit or 32-bit integer). The mnemonic for these instructions is
quite unclear:

- They are all named MOVS in the proposal
  https://lore.kernel.org/bpf/[email protected]/
- LLVM and Linux disassemblers only display pseudo-code (`r0 = (s8)r1`)
- RFC 9669 (https://datatracker.ietf.org/doc/rfc9669/) uses MOVSX for
  all instructions.
- GCC uses MOVS for all instructions:
  https://github.com/gcc-mirror/gcc/blob/releases/gcc-14.1.0/gcc/config/bpf/bpf.md?plain=1#L326-L365

To make the disassembled code clearer, decode such instructions with a
size suffix: MOVSB, MOVSH, MOVSW.

The decoding of instructions 32-bit JA, BSWAP16, BSWAP32 and BSWAP64 is
straightforward.
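
A minimal C sketch of the semantics of these instructions, under the assumption that the byte-swap results are zero-extended to 64 bits and that the 32-bit JA target is counted in 8-byte instruction slots relative to the next instruction. The function names are hypothetical and are not taken from the patch.

```c
#include <stdint.h>

/* BSWAP16: swap the two low bytes, clear everything above bit 15. */
static uint64_t ebpf_bswap16(uint64_t dst)
{
    uint64_t v = dst & 0xffff;
    return ((v >> 8) | (v << 8)) & 0xffff;
}

/* BSWAP32: reverse the four low bytes, clear the upper 32 bits. */
static uint64_t ebpf_bswap32(uint64_t dst)
{
    uint32_t v = (uint32_t)dst;
    return (v >> 24) | ((v >> 8) & 0xff00) | ((v << 8) & 0xff0000) | (v << 24);
}

/* BSWAP64: reverse all eight bytes. */
static uint64_t ebpf_bswap64(uint64_t dst)
{
    uint64_t out = 0;
    for (int i = 0; i < 8; i++) {
        out = (out << 8) | (dst & 0xff);
        dst >>= 8;
    }
    return out;
}

/*
 * 32-bit JA: the jump distance comes from the 32-bit immediate instead
 * of the 16-bit offset field. `pc` is the index of the jump instruction.
 */
static int64_t ja32_target(int64_t pc, int32_t imm)
{
    return pc + 1 + (int64_t)imm;
}
```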
@niooss-ledger niooss-ledger force-pushed the ebpf-add-v4-signed-extension branch from 9346fe2 to ed8b5cc Compare April 4, 2025 15:55
@niooss-ledger
Contributor Author

niooss-ledger commented Apr 4, 2025

Thanks for your quick reply! I added the 32-bit-offset jump and byte-swap instructions. I tested that code using the __builtin_bswap... builtins is decoded correctly, using this C program:

    unsigned short do_bswap16(unsigned short x) {
        return __builtin_bswap16(x);
    }
    unsigned int do_bswap32(unsigned int x) {
        return __builtin_bswap32(x);
    }
    unsigned long do_bswap64(unsigned long x) {
        return __builtin_bswap64(x);
    }

By the way, I believe instructions LE16, LE32, BE16 and BE32 produce incorrect p-code semantics, because the bits of the source value are not masked enough (for example, :LE16 ... { dst=((dst) >> 8) | ((dst) << 8); } is missing something that restricts the result to 16 bits). But this seems to be out of the scope of this PR (one could also argue that the precise semantics of LE16 are different, as they depend on the endianness of the host running the eBPF program). So I did not touch these instructions.

EDITED TO ADD: I opened another Pull Request to tackle this other issue: #7985
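
The masking concern can be reproduced in plain C. The sketch below transcribes the quoted expression on a 64-bit value and compares it with a variant that restricts the result to 16 bits. Both function names are hypothetical, and the masked variant is only one possible fix, not necessarily the one chosen in #7985.

```c
#include <stdint.h>

/* Direct transcription of dst = (dst >> 8) | (dst << 8) on 64 bits. */
static uint64_t swap16_unmasked(uint64_t dst)
{
    return (dst >> 8) | (dst << 8);
}

/* Same swap, but each half is masked so only bits 0..15 survive. */
static uint64_t swap16_masked(uint64_t dst)
{
    return ((dst >> 8) & 0xff) | ((dst << 8) & 0xff00);
}
```

For the input 0x1234, the unmasked form yields 0x123412 because the shifted-left copy keeps its upper byte, whereas the masked form yields the expected 0x3412.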

@GhidorahRex GhidorahRex added Status: Prioritize This is currently being prioritized and removed Status: Triage Information is being gathered labels Apr 4, 2025
@GhidorahRex GhidorahRex added Status: Internal This is being tracked internally by the Ghidra team and removed Status: Prioritize This is currently being prioritized labels Apr 15, 2025
Labels
Feature: Processor/eBPF Status: Internal This is being tracked internally by the Ghidra team