|
| 1 | + |
| 2 | +==================== |
| 3 | +eBPF Instruction Set |
| 4 | +==================== |
| 5 | + |
| 6 | +Registers and calling convention |
| 7 | +================================ |
| 8 | + |
| 9 | +eBPF has 10 general purpose registers and a read-only frame pointer register, |
| 10 | +all of which are 64-bits wide. |
| 11 | + |
| 12 | +The eBPF calling convention is defined as: |
| 13 | + |
| 14 | + * R0: return value from function calls, and exit value for eBPF programs |
| 15 | + * R1 - R5: arguments for function calls |
| 16 | + * R6 - R9: callee saved registers that function calls will preserve |
| 17 | + * R10: read-only frame pointer to access stack |
| 18 | + |
| 19 | +R0 - R5 are scratch registers and eBPF programs need to spill/fill them if |
| 20 | +necessary across calls. |
| 21 | + |
| 22 | +Instruction encoding |
| 23 | +==================== |
| 24 | + |
| 25 | +eBPF has two instruction encodings: |
| 26 | + |
| 27 | + * the basic instruction encoding, which uses 64 bits to encode an instruction |
| 28 | + * the wide instruction encoding, which appends a second 64-bit immediate value |
| 29 | + (imm64) after the basic instruction for a total of 128 bits. |
| 30 | + |
| 31 | +The basic instruction encoding looks as follows: |
| 32 | + |
| 33 | + ============= ======= =============== ==================== ============ |
| 34 | + 32 bits (MSB) 16 bits 4 bits          4 bits               8 bits (LSB) |
| 35 | + ============= ======= =============== ==================== ============ |
| 36 | + immediate     offset  source register destination register opcode       |
| 37 | + ============= ======= =============== ==================== ============ |
| 38 | + |
| 39 | +Note that most instructions do not use all of the fields. |
| 40 | +Unused fields shall be cleared to zero. |
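The field layout above can be sketched as a pair of pack/unpack helpers. This is a minimal illustration that treats the instruction as a single 64-bit integer with the opcode in the least significant byte; the struct and function names are hypothetical, and how that integer is laid out in memory depends on the host byte order.

```c
#include <stdint.h>

/* Illustrative container for the five fields; the names are assumptions. */
struct insn_fields {
    uint8_t opcode;  /* bits 0-7 (LSB) */
    uint8_t dst_reg; /* bits 8-11 */
    uint8_t src_reg; /* bits 12-15 */
    int16_t off;     /* bits 16-31 */
    int32_t imm;     /* bits 32-63 (MSB) */
};

/* Pack the fields into one 64-bit value, following the table order. */
static uint64_t insn_pack(struct insn_fields f)
{
    return (uint64_t)f.opcode
         | ((uint64_t)(f.dst_reg & 0x0f) << 8)
         | ((uint64_t)(f.src_reg & 0x0f) << 12)
         | ((uint64_t)(uint16_t)f.off << 16)
         | ((uint64_t)(uint32_t)f.imm << 32);
}

/* Split a 64-bit value back into its fields. */
static struct insn_fields insn_unpack(uint64_t raw)
{
    struct insn_fields f;

    f.opcode  = (uint8_t)(raw & 0xff);
    f.dst_reg = (uint8_t)((raw >> 8) & 0x0f);
    f.src_reg = (uint8_t)((raw >> 12) & 0x0f);
    f.off     = (int16_t)(uint16_t)(raw >> 16);
    f.imm     = (int32_t)(uint32_t)(raw >> 32);
    return f;
}
```

Fields that an instruction does not use are simply left at zero before packing.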
| 41 | + |
| 42 | +Instruction classes |
| 43 | +------------------- |
| 44 | + |
| 45 | +The three least significant bits of the 'opcode' field store the instruction class: |
| 46 | + |
| 47 | + ========= ===== =============================== |
| 48 | + class     value description |
| 49 | + ========= ===== =============================== |
| 50 | + BPF_LD    0x00  non-standard load operations |
| 51 | + BPF_LDX   0x01  load into register operations |
| 52 | + BPF_ST    0x02  store from immediate operations |
| 53 | + BPF_STX   0x03  store from register operations |
| 54 | + BPF_ALU   0x04  32-bit arithmetic operations |
| 55 | + BPF_JMP   0x05  64-bit jump operations |
| 56 | + BPF_JMP32 0x06  32-bit jump operations |
| 57 | + BPF_ALU64 0x07  64-bit arithmetic operations |
| 58 | + ========= ===== =============================== |
| 59 | + |
| 60 | +Arithmetic and jump instructions |
| 61 | +================================ |
| 62 | + |
| 63 | +For arithmetic and jump instructions (BPF_ALU, BPF_ALU64, BPF_JMP and |
| 64 | +BPF_JMP32), the 8-bit 'opcode' field is divided into three parts: |
| 65 | + |
| 66 | + ============== ====== ================= |
| 67 | + 4 bits (MSB)   1 bit  3 bits (LSB) |
| 68 | + ============== ====== ================= |
| 69 | + operation code source instruction class |
| 70 | + ============== ====== ================= |
| 71 | + |
| 72 | +The 4th bit encodes the source operand: |
| 73 | + |
| 74 | + ====== ===== ======================================== |
| 75 | + source value description |
| 76 | + ====== ===== ======================================== |
| 77 | + BPF_K  0x00  use 32-bit immediate as source operand |
| 78 | + BPF_X  0x08  use 'src_reg' register as source operand |
| 79 | + ====== ===== ======================================== |
| 80 | + |
| 81 | +The four most significant bits store the operation code. |
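As a rough sketch, the three parts of the opcode can be extracted with simple masks. The constant values are taken from the tables in this document; the helper names ``op_class``, ``op_source`` and ``op_code`` are hypothetical.

```c
#include <stdint.h>

/* Values copied from the class, source and operation code tables. */
enum { BPF_ALU = 0x04, BPF_ALU64 = 0x07 };
enum { BPF_K = 0x00, BPF_X = 0x08 };
enum { BPF_ADD = 0x00, BPF_MOV = 0xb0 };

/* Three least significant bits: instruction class. */
static uint8_t op_class(uint8_t opcode)
{
    return opcode & 0x07;
}

/* Fourth bit: source operand selector (BPF_K or BPF_X). */
static uint8_t op_source(uint8_t opcode)
{
    return opcode & 0x08;
}

/* Four most significant bits: operation code. */
static uint8_t op_code(uint8_t opcode)
{
    return opcode & 0xf0;
}
```

For example, ``BPF_MOV | BPF_K | BPF_ALU`` yields the opcode byte 0xb4, which the helpers split back into its three components.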
| 82 | + |
| 83 | + |
| 84 | +Arithmetic instructions |
| 85 | +----------------------- |
| 86 | + |
| 87 | +BPF_ALU uses 32-bit wide operands while BPF_ALU64 uses 64-bit wide operands for |
| 88 | +otherwise identical operations. |
| 89 | +The code field encodes the operation as below: |
| 90 | + |
| 91 | + ======== ===== ================================================= |
| 92 | + code     value description |
| 93 | + ======== ===== ================================================= |
| 94 | + BPF_ADD  0x00  dst += src |
| 95 | + BPF_SUB  0x10  dst -= src |
| 96 | + BPF_MUL  0x20  dst \*= src |
| 97 | + BPF_DIV  0x30  dst /= src |
| 98 | + BPF_OR   0x40  dst \|= src |
| 99 | + BPF_AND  0x50  dst &= src |
| 100 | + BPF_LSH  0x60  dst <<= src |
| 101 | + BPF_RSH  0x70  dst >>= src |
| 102 | + BPF_NEG  0x80  dst = -dst |
| 103 | + BPF_MOD  0x90  dst %= src |
| 104 | + BPF_XOR  0xa0  dst ^= src |
| 105 | + BPF_MOV  0xb0  dst = src |
| 106 | + BPF_ARSH 0xc0  sign extending shift right |
| 107 | + BPF_END  0xd0  byte swap operations (see separate section below) |
| 108 | + ======== ===== ================================================= |
| 109 | + |
| 110 | +BPF_ADD | BPF_X | BPF_ALU means:: |
| 111 | + |
| 112 | + dst_reg = (u32) dst_reg + (u32) src_reg |
| 113 | + |
| 114 | +BPF_ADD | BPF_X | BPF_ALU64 means:: |
| 115 | + |
| 116 | + dst_reg = dst_reg + src_reg |
| 117 | + |
| 118 | +BPF_XOR | BPF_K | BPF_ALU means:: |
| 119 | + |
| 120 | + dst_reg = (u32) dst_reg ^ (u32) imm32 |
| 121 | + |
| 122 | +BPF_XOR | BPF_K | BPF_ALU64 means:: |
| 123 | + |
| 124 | + dst_reg = dst_reg ^ imm32 |
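The difference between the 32-bit and 64-bit forms can be sketched in plain C. The function names are hypothetical. Two details are assumptions not spelled out in the text above: the 32-bit result is zero-extended into the 64-bit destination register, and the 32-bit immediate is sign-extended before a 64-bit operation.

```c
#include <stdint.h>

/* BPF_ADD | BPF_X | BPF_ALU: operands truncated to 32 bits,
 * result zero-extended to 64 bits (assumption, see lead-in). */
static uint64_t alu32_add(uint64_t dst, uint64_t src)
{
    return (uint32_t)((uint32_t)dst + (uint32_t)src);
}

/* BPF_ADD | BPF_X | BPF_ALU64: full 64-bit wrapping addition. */
static uint64_t alu64_add(uint64_t dst, uint64_t src)
{
    return dst + src;
}

/* BPF_XOR | BPF_K | BPF_ALU: 32-bit xor with the immediate. */
static uint64_t alu32_xor_imm(uint64_t dst, int32_t imm)
{
    return (uint32_t)dst ^ (uint32_t)imm;
}

/* BPF_XOR | BPF_K | BPF_ALU64: the immediate is sign-extended
 * to 64 bits first (assumption, see lead-in). */
static uint64_t alu64_xor_imm(uint64_t dst, int32_t imm)
{
    return dst ^ (uint64_t)(int64_t)imm;
}
```

With these helpers, adding 1 to 0xffffffff wraps to 0 in the 32-bit form but gives 0x100000000 in the 64-bit form.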
| 125 | + |
| 126 | + |
| 127 | +Byte swap instructions |
| 128 | +---------------------- |
| 129 | + |
| 130 | +The byte swap instructions use an instruction class of ``BPF_ALU`` and a 4-bit |
| 131 | +code field of ``BPF_END``. |
| 132 | + |
| 133 | +The byte swap instructions operate on the destination register |
| 134 | +only and do not use a separate source register or immediate value. |
| 135 | + |
| 136 | +The 1-bit source operand field in the opcode is used to select what byte |
| 137 | +order the operation converts from or to: |
| 138 | + |
| 139 | + ========= ===== ================================================= |
| 140 | + source    value description |
| 141 | + ========= ===== ================================================= |
| 142 | + BPF_TO_LE 0x00  convert between host byte order and little endian |
| 143 | + BPF_TO_BE 0x08  convert between host byte order and big endian |
| 144 | + ========= ===== ================================================= |
| 145 | + |
| 146 | +The imm field encodes the width of the swap operations. The following widths |
| 147 | +are supported: 16, 32 and 64. |
| 148 | + |
| 149 | +Examples: |
| 150 | + |
| 151 | +``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16 means:: |
| 152 | + |
| 153 | + dst_reg = htole16(dst_reg) |
| 154 | + |
| 155 | +``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means:: |
| 156 | + |
| 157 | + dst_reg = htobe64(dst_reg) |
| 158 | + |
| 159 | +``BPF_FROM_LE`` and ``BPF_FROM_BE`` exist as aliases for ``BPF_TO_LE`` and |
| 160 | +``BPF_TO_BE`` respectively. |
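A portable sketch of the byte swap semantics follows. It avoids ``<endian.h>`` and probes the host byte order at run time instead. The function names are hypothetical, and truncating the result to the selected width (with the upper bits cleared) is an assumption rather than something stated above.

```c
#include <stdint.h>

/* Returns 1 on a little endian host, 0 on a big endian host. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;

    return *(const uint8_t *)&probe == 1;
}

/* Reverse the byte order of the low `width` bits (16, 32 or 64). */
static uint64_t swap_bytes(uint64_t v, unsigned width)
{
    uint64_t out = 0;

    for (unsigned i = 0; i < width; i += 8)
        out = (out << 8) | ((v >> i) & 0xff);
    return out;
}

/* Keep only the low `width` bits of the register value. */
static uint64_t truncate_to(uint64_t v, unsigned width)
{
    return width == 64 ? v : v & ((1ULL << width) - 1);
}

/* BPF_ALU | BPF_TO_LE | BPF_END with imm = width */
static uint64_t end_to_le(uint64_t dst, unsigned width)
{
    dst = truncate_to(dst, width);
    return host_is_little_endian() ? dst : swap_bytes(dst, width);
}

/* BPF_ALU | BPF_TO_BE | BPF_END with imm = width */
static uint64_t end_to_be(uint64_t dst, unsigned width)
{
    dst = truncate_to(dst, width);
    return host_is_little_endian() ? swap_bytes(dst, width) : dst;
}
```

On any given host exactly one of the two conversions actually reorders bytes; the other only truncates.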
| 161 | + |
| 162 | + |
| 163 | +Jump instructions |
| 164 | +----------------- |
| 165 | + |
| 166 | +BPF_JMP32 uses 32-bit wide operands while BPF_JMP uses 64-bit wide operands for |
| 167 | +otherwise identical operations. |
| 168 | +The code field encodes the operation as below: |
| 169 | + |
| 170 | + ======== ===== ========================= ============ |
| 171 | + code     value description               notes |
| 172 | + ======== ===== ========================= ============ |
| 173 | + BPF_JA   0x00  PC += off                 BPF_JMP only |
| 174 | + BPF_JEQ  0x10  PC += off if dst == src |
| 175 | + BPF_JGT  0x20  PC += off if dst > src    unsigned |
| 176 | + BPF_JGE  0x30  PC += off if dst >= src   unsigned |
| 177 | + BPF_JSET 0x40  PC += off if dst & src |
| 178 | + BPF_JNE  0x50  PC += off if dst != src |
| 179 | + BPF_JSGT 0x60  PC += off if dst > src    signed |
| 180 | + BPF_JSGE 0x70  PC += off if dst >= src   signed |
| 181 | + BPF_CALL 0x80  function call |
| 182 | + BPF_EXIT 0x90  function / program return BPF_JMP only |
| 183 | + BPF_JLT  0xa0  PC += off if dst < src    unsigned |
| 184 | + BPF_JLE  0xb0  PC += off if dst <= src   unsigned |
| 185 | + BPF_JSLT 0xc0  PC += off if dst < src    signed |
| 186 | + BPF_JSLE 0xd0  PC += off if dst <= src   signed |
| 187 | + ======== ===== ========================= ============ |
| 188 | + |
| 189 | +The eBPF program needs to store the return value into register R0 before doing a |
| 190 | +BPF_EXIT. |
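The signed/unsigned and 32-bit/64-bit distinctions in the jump table can be illustrated with a small condition evaluator. This is only a sketch: ``jmp_taken`` is a hypothetical helper that reports whether ``PC += off`` would happen, and it leaves out ``BPF_JA``, ``BPF_CALL`` and ``BPF_EXIT``.

```c
#include <stdint.h>

/* Operation codes copied from the jump table. */
enum {
    BPF_JEQ = 0x10, BPF_JGT = 0x20, BPF_JGE = 0x30, BPF_JSET = 0x40,
    BPF_JNE = 0x50, BPF_JSGT = 0x60, BPF_JSGE = 0x70, BPF_JLT = 0xa0,
    BPF_JLE = 0xb0, BPF_JSLT = 0xc0, BPF_JSLE = 0xd0,
};

/* Returns 1 if the branch is taken, 0 if not, -1 for other codes.
 * is_jmp32 selects BPF_JMP32 (32-bit operands) over BPF_JMP. */
static int jmp_taken(uint8_t code, int is_jmp32, uint64_t dst, uint64_t src)
{
    int64_t sdst, ssrc;

    if (is_jmp32) {
        dst = (uint32_t)dst;
        src = (uint32_t)src;
        sdst = (int32_t)(uint32_t)dst;
        ssrc = (int32_t)(uint32_t)src;
    } else {
        sdst = (int64_t)dst;
        ssrc = (int64_t)src;
    }

    switch (code) {
    case BPF_JEQ:  return dst == src;
    case BPF_JNE:  return dst != src;
    case BPF_JSET: return (dst & src) != 0;
    case BPF_JGT:  return dst > src;
    case BPF_JGE:  return dst >= src;
    case BPF_JLT:  return dst < src;
    case BPF_JLE:  return dst <= src;
    case BPF_JSGT: return sdst > ssrc;
    case BPF_JSGE: return sdst >= ssrc;
    case BPF_JSLT: return sdst < ssrc;
    case BPF_JSLE: return sdst <= ssrc;
    default:       return -1;
    }
}
```

For instance, a register holding all ones is greater than 1 under ``BPF_JGT`` but not under ``BPF_JSGT``, where it is interpreted as -1.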
| 191 | + |
| 192 | + |
| 193 | +Load and store instructions |
| 194 | +=========================== |
| 195 | + |
| 196 | +For load and store instructions (BPF_LD, BPF_LDX, BPF_ST and BPF_STX), the |
| 197 | +8-bit 'opcode' field is divided as: |
| 198 | + |
| 199 | + ============ ====== ================= |
| 200 | + 3 bits (MSB) 2 bits 3 bits (LSB) |
| 201 | + ============ ====== ================= |
| 202 | + mode         size   instruction class |
| 203 | + ============ ====== ================= |
| 204 | + |
| 205 | +The size modifier is one of: |
| 206 | + |
| 207 | + ============= ===== ===================== |
| 208 | + size modifier value description |
| 209 | + ============= ===== ===================== |
| 210 | + BPF_W         0x00  word (4 bytes) |
| 211 | + BPF_H         0x08  half word (2 bytes) |
| 212 | + BPF_B         0x10  byte |
| 213 | + BPF_DW        0x18  double word (8 bytes) |
| 214 | + ============= ===== ===================== |
| 215 | + |
| 216 | +The mode modifier is one of: |
| 217 | + |
| 218 | + ============= ===== ==================================== |
| 219 | + mode modifier value description |
| 220 | + ============= ===== ==================================== |
| 221 | + BPF_IMM       0x00  64-bit immediate instructions |
| 222 | + BPF_ABS       0x20  legacy BPF packet access (absolute) |
| 223 | + BPF_IND       0x40  legacy BPF packet access (indirect) |
| 224 | + BPF_MEM       0x60  regular load and store operations |
| 225 | + BPF_ATOMIC    0xc0  atomic operations |
| 226 | + ============= ===== ==================================== |
| 227 | + |
| 228 | + |
| 229 | +Regular load and store operations |
| 230 | +--------------------------------- |
| 231 | + |
| 232 | +The ``BPF_MEM`` mode modifier is used to encode regular load and store |
| 233 | +instructions that transfer data between a register and memory. |
| 234 | + |
| 235 | +``BPF_MEM | <size> | BPF_STX`` means:: |
| 236 | + |
| 237 | + *(size *) (dst_reg + off) = src_reg |
| 238 | + |
| 239 | +``BPF_MEM | <size> | BPF_ST`` means:: |
| 240 | + |
| 241 | + *(size *) (dst_reg + off) = imm32 |
| 242 | + |
| 243 | +``BPF_MEM | <size> | BPF_LDX`` means:: |
| 244 | + |
| 245 | + dst_reg = *(size *) (src_reg + off) |
| 246 | + |
| 247 | +Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``. |
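The load and store forms can be modelled over an ordinary byte buffer. In this sketch the register that holds the address is represented by a C pointer, the size is given in bytes rather than as a ``BPF_*`` modifier, and the helper names are hypothetical. Zero-extension of loads narrower than 64 bits is an assumption not stated above.

```c
#include <stdint.h>
#include <string.h>

/* BPF_MEM | <size> | BPF_STX: *(size *) (dst_reg + off) = src_reg
 * Only the low `size` bytes of the value are written. */
static void mem_store(uint8_t *dst_reg, int16_t off, unsigned size, uint64_t val)
{
    uint8_t b = (uint8_t)val;
    uint16_t h = (uint16_t)val;
    uint32_t w = (uint32_t)val;

    switch (size) {
    case 1: memcpy(dst_reg + off, &b, 1); break;
    case 2: memcpy(dst_reg + off, &h, 2); break;
    case 4: memcpy(dst_reg + off, &w, 4); break;
    case 8: memcpy(dst_reg + off, &val, 8); break;
    }
}

/* BPF_MEM | <size> | BPF_LDX: dst_reg = *(size *) (src_reg + off)
 * The loaded value is zero-extended (assumption, see lead-in). */
static uint64_t mem_load(const uint8_t *src_reg, int16_t off, unsigned size)
{
    uint8_t b;
    uint16_t h;
    uint32_t w;
    uint64_t dw;

    switch (size) {
    case 1: memcpy(&b, src_reg + off, 1); return b;
    case 2: memcpy(&h, src_reg + off, 2); return h;
    case 4: memcpy(&w, src_reg + off, 4); return w;
    case 8: memcpy(&dw, src_reg + off, 8); return dw;
    default: return 0;
    }
}
```

A typical use mirrors stack access through R10: point at the end of a buffer and use negative offsets, for example ``mem_store(fp, -8, 8, value)`` followed by ``mem_load(fp, -8, 8)``.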
| 248 | + |
| 249 | +Atomic operations |
| 250 | +----------------- |
| 251 | + |
| 252 | +Atomic operations are operations that operate on memory and cannot be |
| 253 | +interrupted or corrupted by other accesses to the same memory region |
| 254 | +by other eBPF programs or means outside of this specification. |
| 255 | + |
| 256 | +All atomic operations supported by eBPF are encoded as store operations |
| 257 | +that use the ``BPF_ATOMIC`` mode modifier as follows: |
| 258 | + |
| 259 | + * ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations |
| 260 | + * ``BPF_ATOMIC | BPF_DW | BPF_STX`` for 64-bit operations |
| 261 | + * 8-bit and 16-bit wide atomic operations are not supported. |
| 262 | + |
| 263 | +The imm field is used to encode the actual atomic operation. |
| 264 | +Simple atomic operations use a subset of the values defined to encode |
| 265 | +arithmetic operations in the imm field to encode the atomic operation: |
| 266 | + |
| 267 | + ======== ===== =========== |
| 268 | + imm      value description |
| 269 | + ======== ===== =========== |
| 270 | + BPF_ADD  0x00  atomic add |
| 271 | + BPF_OR   0x40  atomic or |
| 272 | + BPF_AND  0x50  atomic and |
| 273 | + BPF_XOR  0xa0  atomic xor |
| 274 | + ======== ===== =========== |
| 275 | + |
| 276 | + |
| 277 | +``BPF_ATOMIC | BPF_W | BPF_STX`` with imm = BPF_ADD means:: |
| 278 | + |
| 279 | + *(u32 *)(dst_reg + off16) += src_reg |
| 280 | + |
| 281 | +``BPF_ATOMIC | BPF_DW | BPF_STX`` with imm = BPF_ADD means:: |
| 282 | + |
| 283 | + *(u64 *)(dst_reg + off16) += src_reg |
| 284 | + |
| 285 | +``BPF_XADD`` is a deprecated name for ``BPF_ATOMIC | BPF_ADD``. |
| 286 | + |
| 287 | +In addition to the simple atomic operations, there is also a modifier and |
| 288 | +two complex atomic operations: |
| 289 | + |
| 290 | + =========== ================ =========================== |
| 291 | + imm         value            description |
| 292 | + =========== ================ =========================== |
| 293 | + BPF_FETCH   0x01             modifier: return old value |
| 294 | + BPF_XCHG    0xe0 | BPF_FETCH atomic exchange |
| 295 | + BPF_CMPXCHG 0xf0 | BPF_FETCH atomic compare and exchange |
| 296 | + =========== ================ =========================== |
| 297 | + |
| 298 | +The ``BPF_FETCH`` modifier is optional for simple atomic operations, and |
| 299 | +always set for the complex atomic operations. If the ``BPF_FETCH`` flag |
| 300 | +is set, then the operation also overwrites ``src_reg`` with the value that |
| 301 | +was in memory before it was modified. |
| 302 | + |
| 303 | +The ``BPF_XCHG`` operation atomically exchanges ``src_reg`` with the value |
| 304 | +addressed by ``dst_reg + off``. |
| 305 | + |
| 306 | +The ``BPF_CMPXCHG`` operation atomically compares the value addressed by |
| 307 | +``dst_reg + off`` with ``R0``. If they match, the value addressed by |
| 308 | +``dst_reg + off`` is replaced with ``src_reg``. In either case, the |
| 309 | +value that was at ``dst_reg + off`` before the operation is zero-extended |
| 310 | +and loaded back to ``R0``. |
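The register effects of ``BPF_FETCH``, ``BPF_XCHG`` and ``BPF_CMPXCHG`` can be approximated with C11 ``<stdatomic.h>``. This is a sketch of the 64-bit forms only; the function names are hypothetical, registers are modelled as plain ``uint64_t`` variables, and the default sequentially consistent ordering of the C11 calls is an assumption that the text above does not specify.

```c
#include <stdatomic.h>
#include <stdint.h>

/* BPF_ADD | BPF_FETCH: add src_reg to memory and overwrite
 * src_reg with the value that was in memory before. */
static void sketch_fetch_add(_Atomic uint64_t *addr, uint64_t *src_reg)
{
    *src_reg = atomic_fetch_add(addr, *src_reg);
}

/* BPF_XCHG: swap src_reg with the value in memory. */
static void sketch_xchg(_Atomic uint64_t *addr, uint64_t *src_reg)
{
    *src_reg = atomic_exchange(addr, *src_reg);
}

/* BPF_CMPXCHG: if memory equals R0, store src_reg there.
 * In either case R0 receives the old memory value. */
static void sketch_cmpxchg(_Atomic uint64_t *addr, uint64_t *r0, uint64_t src_reg)
{
    uint64_t expected = *r0;

    /* On failure, C11 writes the current memory value into `expected`;
     * on success, `expected` already equals the old value. */
    atomic_compare_exchange_strong(addr, &expected, src_reg);
    *r0 = expected;
}
```

After a failed compare and exchange, memory is unchanged and R0 holds the value that caused the mismatch, which lets a caller retry in a loop.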
| 311 | + |
| 312 | +Clang can generate atomic instructions by default when ``-mcpu=v3`` is |
| 313 | +enabled. If a lower version for ``-mcpu`` is set, the only atomic instruction |
| 314 | +Clang can generate is ``BPF_ADD`` *without* ``BPF_FETCH``. If you need to enable |
| 315 | +the atomics features, while keeping a lower ``-mcpu`` version, you can use |
| 316 | +``-Xclang -target-feature -Xclang +alu32``. |
| 317 | + |
| 318 | +64-bit immediate instructions |
| 319 | +----------------------------- |
| 320 | + |
| 321 | +Instructions with the ``BPF_IMM`` mode modifier use the wide instruction |
| 322 | +encoding for an extra imm64 value. |
| 323 | + |
| 324 | +There is currently only one such instruction. |
| 325 | + |
| 326 | +``BPF_LD | BPF_DW | BPF_IMM`` means:: |
| 327 | + |
| 328 | + dst_reg = imm64 |
| 329 | + |
| 330 | + |
| 331 | +Legacy BPF Packet access instructions |
| 332 | +------------------------------------- |
| 333 | + |
| 334 | +eBPF has special instructions for access to packet data that have been |
| 335 | +carried over from classic BPF to retain the performance of legacy socket |
| 336 | +filters running in the eBPF interpreter. |
| 337 | + |
| 338 | +The instructions come in two forms: ``BPF_ABS | <size> | BPF_LD`` and |
| 339 | +``BPF_IND | <size> | BPF_LD``. |
| 340 | + |
| 341 | +These instructions are used to access packet data and can only be used when |
| 342 | +the program context is a pointer to a networking packet. ``BPF_ABS`` |
| 343 | +accesses packet data at an absolute offset specified by the immediate data |
| 344 | +and ``BPF_IND`` accesses packet data at an offset that includes the value of |
| 345 | +a register in addition to the immediate data. |
| 346 | + |
| 347 | +These instructions have seven implicit operands: |
| 348 | + |
| 349 | + * Register R6 is an implicit input that must contain a pointer to a |
| 350 | + struct sk_buff. |
| 351 | + * Register R0 is an implicit output which contains the data fetched from |
| 352 | + the packet. |
| 353 | + * Registers R1-R5 are scratch registers that are clobbered after a call to |
| 354 | + ``BPF_ABS | BPF_LD`` or ``BPF_IND | BPF_LD`` instructions. |
| 355 | + |
| 356 | +These instructions have an implicit program exit condition as well. If an |
| 357 | +eBPF program tries to access data beyond the packet boundary, program |
| 358 | +execution is aborted. |
| 359 | + |
| 360 | +``BPF_ABS | BPF_W | BPF_LD`` means:: |
| 361 | + |
| 362 | + R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + imm32)) |
| 363 | + |
| 364 | +``BPF_IND | BPF_W | BPF_LD`` means:: |
| 365 | + |
| 366 | + R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32)) |
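The bounds check and byte order conversion of ``BPF_ABS | BPF_W | BPF_LD`` can be sketched as follows. The sketch uses a plain byte buffer and length in place of ``struct sk_buff``, and a hypothetical helper ``ld_abs_w`` that returns -1 where the real instruction would abort the program.

```c
#include <stddef.h>
#include <stdint.h>

/* Load a 32-bit word in network byte order from data[imm..imm+3].
 * Returns 0 and sets *r0 on success, or -1 if the access would go
 * beyond the packet boundary (the implicit exit condition). */
static int ld_abs_w(const uint8_t *data, size_t len, uint32_t imm, uint64_t *r0)
{
    if (imm > len || len - imm < 4)
        return -1;

    /* Assembling the bytes most significant first is equivalent to
     * ntohl() applied to the raw 32-bit load. */
    *r0 = ((uint32_t)data[imm] << 24)
        | ((uint32_t)data[imm + 1] << 16)
        | ((uint32_t)data[imm + 2] << 8)
        |  (uint32_t)data[imm + 3];
    return 0;
}
```

The ``BPF_IND`` form would add the value of ``src_reg`` to ``imm`` before the same bounds check.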