|
| 1 | +% LLVM-BLEACH(1) bleach user manual | Version 0.1.0 |
| 2 | +% Alexander Romanov |
| 3 | +% July 19, 2025 |
| 4 | + |
| 5 | +Highly configurable lifter to LLVM IR. |
| 6 | + |
| 7 | +# SYNOPSIS |
| 8 | + |
| 9 | +```sh |
| 10 | +llvm-bleach MIR_FILE --instructions=YAML_PATH [OTHER OPTIONS] |
| 11 | +``` |
| 12 | + |
| 13 | +# DESCRIPTION |
| 14 | + |
| 15 | +llvm-bleach lifts LLVM MIR code to target-independent LLVM IR that can be |
| 16 | +analyzed, modified or recompiled for any other architecture. Input LLVM MIR can |
| 17 | +be produced from an ELF file with the **mctomir** tool. llvm-bleach follows the |
| 18 | +philosophy that a lifter should be as generic as possible and contain no |
| 19 | +architecture-specific code, because such code greatly increases the cost of |
| 20 | +maintenance and makes supporting new source architectures more difficult. |
| 21 | +Instead, bleach uses target-independent concepts and algorithms. Target |
| 22 | +information comes from the LLVM backend and from the input architecture description. |
| 23 | + |
| 24 | +# OPTIONS |
| 25 | + |
| 26 | +**--help** |
| 27 | + |
| 28 | +``` |
| 29 | +Display available options. |
| 30 | +``` |
| 31 | + |
| 32 | +**-o \<PATH>** |
| 33 | + |
| 34 | +``` |
| 35 | +Output file for LLVM IR code. |
| 36 | +``` |
| 37 | + |
| 38 | +**--instructions=\<path>** |
| 39 | + |
| 40 | +``` |
| 41 | +File containing the YAML description of the architecture to lift from, |
| 42 | +e.g. a description of the RISC-V subset that is used in the source program. |
| 43 | +``` |
| 44 | + |
| 45 | +**--noinline-instr** |
| 46 | + |
| 47 | +``` |
| 48 | +Do not inline function calls generated for each instruction. |
| 49 | +``` |
| 50 | + |
| 51 | +**--stack-size=\<NUMBER>** |
| 52 | + |
| 53 | +``` |
| 54 | +The size of the inline stack (in bytes). The inline stack is a memory region |
| 55 | +that simulates the program stack of the source program. Default is 8000. |
| 56 | +Set this to a number high enough for your program: stack overflows are |
| 57 | +not currently diagnosed. |
| 58 | +``` |
| 59 | + |
| 60 | +**--state-struct-file=\<PATH>** |
| 61 | + |
| 62 | +``` |
| 63 | +Path of the file to save the state C struct definition in. The state struct |
| 64 | +is used by lifted LLVM IR code to store current values of source program |
| 65 | +registers as well as the inline stack. The state struct is necessary to |
| 66 | +recompile the lifted program for another architecture. Bleach generates no |
| 67 | +struct definition by default. |
| 68 | +``` |
| 69 | + |
| 70 | +# EXAMPLES |
| 71 | + |
| 72 | +1. Lift MIR code from ./myfunc.mir file using instruction description from |
| 73 | + ./riscv.yaml file. Print lifted LLVM IR on stdout. |
| 74 | + |
| 75 | + ```sh |
| 76 | + llvm-bleach ./myfunc.mir --instructions riscv.yaml -o - |
| 77 | + ``` |
| 78 | + |
| 79 | +1. Same, but also save the state struct definition to `./state.h`. |
| 80 | + |
| 81 | + ```sh |
| 82 | + llvm-bleach ./myfunc.mir --instructions riscv.yaml --state-struct-file=./state.h -o - |
| 83 | + ``` |
| 84 | + |
| 85 | +1. Lift MIR code from ./myfunc.mir file using instruction description from |
| 86 | + ./instrs.yaml file. Write lifted LLVM IR to ./myfunc.ll. |
| 87 | + |
| 88 | + ```sh |
| 89 | + llvm-bleach ./myfunc.mir --instructions instrs.yaml -o ./myfunc.ll |
| 90 | + ``` |
| 91 | + |
| 92 | +1. Same, but increase the inline stack size to 16000 bytes. |
| 93 | + |
| 94 | + ```sh |
| 95 | + llvm-bleach ./myfunc.mir --instructions instrs.yaml -o ./myfunc.ll --stack-size 16000 |
| 96 | + ``` |
| 97 | + |
| 98 | +# YAML ARCHITECTURE DESCRIPTION |
| 99 | + |
| 100 | +The description of the source program architecture consists of several |
| 101 | +top-level keys. |
| 102 | + |
| 103 | +- **register-classes** |
| 104 | + |
| 105 | + A mapping from custom register class names (used only as member names in |
| 106 | + the state struct definition) to regular expressions matching the register |
| 107 | + names to put in this class. Register names are the same as in the LLVM backend |
| 108 | + for the selected architecture (e.g. X0-X31 for RISC-V or EAX, EDX, R16-R31 for X86). |
| 109 | + |
| 110 | + Example for RISC-V with I and F extensions: |
| 111 | + |
| 112 | +```yaml |
| 113 | + --- |
| 114 | + register-classes: |
| 115 | + GPR: "X[0-9][0-9]?" |
| 116 | + FPR: "F[0-9][0-9]?_F" |
| 117 | + ... |
| 118 | +``` |
| 119 | + |
| 120 | +``` |
| 121 | +Entries are required for all used register classes but for no others. E.g. if |
| 122 | +you are lifting a program compiled for rv64imf that does not use any |
| 123 | +floating point registers, you don't have to specify the FPR register class, |
| 124 | +though it is not forbidden. |
| 125 | +``` |
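To get a feel for which register names a class pattern covers, the check below may help. It is only a sketch: POSIX extended regular expressions stand in for whatever engine llvm-bleach uses, whole-name matching is an assumption, and `matches_class` and `count_matching_x_registers` are hypothetical helper names that are not part of the tool.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Returns 1 when `name` matches `pattern` in full, 0 otherwise.
 * Assumption: a class pattern must match the whole register name,
 * so the pattern is wrapped in ^( ... )$ before compiling. */
int matches_class(const char *pattern, const char *name) {
  char anchored[128];
  regex_t re;
  int rc;

  snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
  rc = regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB);
  assert(rc == 0 && "pattern failed to compile");
  rc = regexec(&re, name, 0, NULL, 0);
  regfree(&re);
  return rc == 0;
}

/* Counts how many of the names X0..X31 match `pattern`. */
int count_matching_x_registers(const char *pattern) {
  int count = 0;
  for (int i = 0; i < 32; ++i) {
    char name[8];
    snprintf(name, sizeof name, "X%d", i);
    count += matches_class(pattern, name);
  }
  return count;
}
```

With the `GPR` pattern from the example, all 32 names X0 to X31 match under these assumptions, while a name such as `F10_F` matches only the `FPR` pattern.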
| 126 | + |
| 127 | +- **instructions** |
| 128 | + |
| 129 | + List of instruction definitions. Each instruction definition consists of an |
| 130 | + LLVM IR function with the same name as the instruction in the LLVM backend. |
| 131 | + For most instructions every input operand becomes an argument of this |
| 132 | + function and the destination operand becomes the return value. |
| 133 | + |
| 134 | + Example for RISC-V ADD and SUB instructions: |
| 135 | + |
| 136 | +```yaml |
| 137 | +--- |
| 138 | +instructions: |
| 139 | + - ADD: |
| 140 | + func: | |
| 141 | + define i\xlen\ @ADD(i\xlen\ noundef signext %0, i\xlen\ noundef signext %1) { |
| 142 | + %3 = add i\xlen\ %1, %0 |
| 143 | + ret i\xlen\ %3 |
| 144 | + } |
| 145 | + - SUB: |
| 146 | + func: | |
| 147 | + define i\xlen\ @SUB(i\xlen\ noundef signext %0, i\xlen\ noundef signext %1) { |
| 148 | + %3 = sub i\xlen\ %0, %1 |
| 149 | + ret i\xlen\ %3 |
| 150 | + } |
| 151 | +... |
| 152 | +``` |
| 153 | + |
| 154 | +``` |
| 155 | +This list should contain entries for all instructions that are used in the |
| 156 | +source program but no other instructions are required. |
| 157 | +``` |
| 158 | + |
| 159 | +- **constant-registers** |
| 160 | + |
| 161 | + This key is used to specify constant registers of source architecture |
| 162 | + (e.g. X0 for RISC-V or XZR for AArch64). It consists of mappings from |
| 163 | + constant register names to their values. |
| 164 | + |
| 165 | + Example for XZR AArch64 register: |
| 166 | + |
| 167 | +```yaml |
| 168 | +--- |
| 169 | +constant-registers: |
| 170 | + XZR: 0x0000000000000000 |
| 171 | +... |
| 172 | +``` |
| 173 | + |
| 174 | +- **stack-pointer** |
| 175 | + |
| 176 | + Put the name of the stack pointer register under this key. It is used to detect |
| 177 | + that a memory instruction accesses the program stack and not the heap. |
| 178 | + |
| 179 | + Example for AArch64: |
| 180 | + |
| 181 | +```yaml |
| 182 | +--- |
| 183 | +stack-pointer: SP |
| 184 | +... |
| 185 | +``` |
| 186 | + |
| 187 | +``` |
| 188 | +Example for RISC-V: |
| 189 | +``` |
| 190 | + |
| 191 | +```yaml |
| 192 | +--- |
| 193 | +stack-pointer: X2 |
| 194 | +... |
| 195 | +``` |
| 196 | + |
| 197 | +- **extern-functions** |
| 198 | + |
| 199 | + A list of global **declarations** (functions or global variables) that |
| 200 | + are used by any of the specified instructions. External declarations are a way |
| 201 | + to insert callbacks into your program, emulate CSRs or capture memory |
| 202 | + accesses. You use these declarations in your instruction descriptions, e.g. |
| 203 | + call an external `load_double_word` function from a load instruction description |
| 204 | + or insert a callback that logs register values in every instruction. |
| 205 | + |
| 206 | + Example for AArch64 with declarations of `store_dw` and `load_dw` functions |
| 207 | + as well as the global `NZCV` flags register: |
| 208 | + |
| 209 | +```yaml |
| 210 | +--- |
| 211 | +extern-functions: |
| 212 | + - "@NZCV = external global i64" |
| 213 | + - "declare void @store_dw(i64 %val, i64 %addr)" |
| 214 | + - "declare i64 @load_dw(i64 %addr)" |
| 215 | + |
| 216 | +instructions: |
| 217 | + - LDRXui: |
| 218 | + func: | |
| 219 | + define i64 @LDRXui(i64 %base, i64 %off) { |
| 220 | + %addr = add i64 %base, %off |
| 221 | + %val = call i64 @load_dw(i64 %addr) |
| 222 | + ret i64 %val |
| 223 | + } |
| 224 | +... |
| 225 | +``` |
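The declarations above have to be satisfied by definitions when the lifted program is linked. One way this might look on the host side is sketched below in C. The flat `guest_mem` byte array, its size, and the bounds checks are assumptions made for illustration, not something llvm-bleach prescribes; only the three symbol names and their types come from the YAML example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: guest memory is modelled as a flat byte array and guest
 * addresses are used directly as offsets into it. */
enum { GUEST_MEM_SIZE = 1 << 16 };
static uint8_t guest_mem[GUEST_MEM_SIZE];

/* Satisfies "@NZCV = external global i64". */
int64_t NZCV;

/* Satisfies "declare void @store_dw(i64 %val, i64 %addr)". */
void store_dw(int64_t val, int64_t addr) {
  assert(addr >= 0 && (uint64_t)addr + sizeof val <= GUEST_MEM_SIZE);
  memcpy(&guest_mem[addr], &val, sizeof val);
}

/* Satisfies "declare i64 @load_dw(i64 %addr)". */
int64_t load_dw(int64_t addr) {
  int64_t val;
  assert(addr >= 0 && (uint64_t)addr + sizeof val <= GUEST_MEM_SIZE);
  memcpy(&val, &guest_mem[addr], sizeof val);
  return val;
}
```

Because every memory access goes through these two functions, they are also a natural place to add logging or tracing of guest loads and stores.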
| 226 | + |
| 227 | +# STATE STRUCT DEFINITION |
| 228 | + |
| 229 | +Here's an example of a generated state struct definition for an architecture |
| 230 | +with a stack size of 8000 bytes and the following register classes: |
| 231 | + |
| 232 | +Register classes: |
| 233 | + |
| 234 | +```yaml |
| 235 | +--- |
| 236 | +register-classes: |
| 237 | + GPR: "X[0-9][0-9]?" |
| 238 | +... |
| 239 | +``` |
| 240 | + |
| 241 | +Generated state struct definition: |
| 242 | + |
| 243 | +```C |
| 244 | +#include <stdint.h> |
| 245 | +struct register_state { |
| 246 | + int64_t GPR[32]; |
| 247 | + int64_t stack[1000]; |
| 248 | +}; |
| 249 | +``` |
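The layout of this struct can be checked at compile time. The sketch below repeats the generated definition and asserts that the 1000 eight-byte `stack` slots add up to the 8000-byte inline stack and that `stack` directly follows the 32 `GPR` entries. `reset_state` is a hypothetical helper added for illustration; how lifted code receives the struct is not covered here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copied from the generated definition above. */
struct register_state {
  int64_t GPR[32];
  int64_t stack[1000];
};

/* 1000 slots of 8 bytes each give the 8000-byte inline stack. */
_Static_assert(sizeof(((struct register_state *)0)->stack) == 8000,
               "inline stack should be 8000 bytes");

/* The stack array starts right after the 32 general purpose registers. */
_Static_assert(offsetof(struct register_state, stack) == 32 * sizeof(int64_t),
               "stack should follow GPR without padding");

/* Hypothetical helper: clear all registers and the inline stack. */
void reset_state(struct register_state *st) {
  memset(st, 0, sizeof *st);
}
```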