Commit f702ab8

feat: add man pages
1 parent 20a4ca8 commit f702ab8

9 files changed

Lines changed: 396 additions & 6 deletions

CMakeLists.txt

Lines changed: 7 additions & 0 deletions
@@ -116,6 +116,13 @@ add_subdirectory(include)
116116
add_subdirectory(lib)
117117
add_subdirectory(tools)
118118

119+
find_program(PANDOC_PROGRAM pandoc HINTS ${PANDOC_PATH})
120+
if (PANDOC_PROGRAM)
121+
add_subdirectory(docs)
122+
else()
123+
message(WARNING "pandoc executable not found in PATH or PANDOC_PATH variable. Skipping documentation generation")
124+
endif()
125+
119126
string(REGEX MATCH "^([0-9]+)\.([0-9]+)\.([0-9]+)"
120127
LLVM_BLEACH_SEMVER_PARSE_MATCH ${LLVM_BLEACH_VERSION})
121128

README.md

Lines changed: 6 additions & 5 deletions
@@ -42,8 +42,9 @@ cmake -S . -B build
4242
cmake --build build
4343
```
4444

45-
### Run:
46-
47-
```sh
48-
build/bin/llvm-bleach test/tools/llvm-bleach/inputs/foo.mir --instructions test/tools/llvm-bleach/inputs/addsub.yaml
49-
```
45+
### Guides
46+
llvm-bleach currently comes with documentation for all three distributed tools:
47+
* [llvm-bleach](./docs/bleach.md) - LLVM MIR to LLVM IR lifter
48+
* [mctomir](./docs/mctomir.md) - machine code to MIR lifter
49+
* [bleach-config-gen](./docs/bleach-config-gen.md) - helper tool to generate
50+
architecture configs for llvm-bleach from available templates.

default.nix

Lines changed: 3 additions & 0 deletions
@@ -12,6 +12,7 @@
1212
yq,
1313
ruby,
1414
gtest,
15+
pandoc,
1516
llvmPackages,
1617
llvmLib,
1718
...
@@ -32,12 +33,14 @@ stdenv.mkDerivation {
3233
./cmake
3334
./include
3435
./test
36+
./docs
3537
./version.json
3638
];
3739
};
3840
nativeBuildInputs = [
3941
cmake
4042
ninja
43+
pandoc
4144
];
4245
buildInputs = [
4346
llvmLib

docs/CMakeLists.txt

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,25 @@
1+
message("PANDOC: ${PANDOC_PROGRAM}")
2+
3+
function(generate_docs input type)
4+
set(single_value_args OUTPUT)
5+
set(multi_value_args)
6+
cmake_parse_arguments(ARG "" "${single_value_args}" "${multi_value_args}"
7+
${ARGN})
8+
set(INPUT_FILE ${CMAKE_CURRENT_SOURCE_DIR}/${input}.md)
9+
add_custom_command(
10+
OUTPUT ${ARG_OUTPUT}
11+
COMMAND ${PANDOC_PROGRAM} -s -t ${type} ${INPUT_FILE} -o ${ARG_OUTPUT}
12+
DEPENDS ${INPUT_FILE}
13+
COMMENT "(Re-)building documentation from \"${input}\"")
14+
set(PANDOC_TARGET_NAME docs-${input}-${type})
15+
add_custom_target(
16+
${PANDOC_TARGET_NAME} ALL
17+
COMMAND
18+
DEPENDS ${ARG_OUTPUT})
19+
install(FILES ${ARG_OUTPUT} DESTINATION ${CMAKE_INSTALL_MANDIR}/man1)
20+
endfunction()
21+
22+
generate_docs(bleach man OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/man1/llvm-bleach.1)
23+
generate_docs(mctomir man OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/man1/mctomir.1)
24+
generate_docs(bleach-config-gen man OUTPUT
25+
${CMAKE_CURRENT_BINARY_DIR}/man1/bleach-config-gen.1)

docs/bleach-config-gen.md

Lines changed: 50 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,50 @@
1+
% BLEACH-CONFIG-GEN(1) bleach-config-gen user manual | Version 0.1.0
2+
% Alexander Romanov
3+
% July 19, 2025
4+
5+
Config generator tool for llvm-bleach
6+
7+
# SYNOPSIS
8+
9+
```sh
10+
bleach-config-gen [OPTIONS]
11+
```
12+
13+
# DESCRIPTION
14+
15+
llvm-bleach comes with a predefined set of architecture specifications to use
16+
(currently only a small subset of RISC-V and AArch64).
17+
bleach-config-gen is a simple tool that generates a bleach config file from provided
18+
templates.
19+
20+
# OPTIONS
21+
22+
**--help**
23+
24+
```
25+
Display available options.
26+
```
27+
28+
**-o, --output \<PATH>**
29+
30+
```
31+
Output file for the generated config.
32+
```
33+
34+
**-t, --template \<FILE>**
35+
36+
```
37+
A template YAML file to generate the config from.
38+
```
39+
40+
**-d, --template-dir \<DIR>**
41+
42+
```
43+
A path to a directory with template files. Can be used instead of --template.
44+
```
45+
46+
**--march \<ARCH>**
47+
48+
```
49+
Architecture to generate config for.
50+
```

docs/bleach.md

Lines changed: 249 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,249 @@
1+
% LLVM-BLEACH(1) bleach user manual | Version 0.1.0
2+
% Alexander Romanov
3+
% July 19, 2025
4+
5+
Highly configurable lifter to LLVM IR.
6+
7+
# SYNOPSIS
8+
9+
```sh
10+
llvm-bleach MIR_FILE --instructions=YAML_PATH [OTHER OPTIONS]
11+
```
12+
13+
# DESCRIPTION
14+
15+
llvm-bleach is a tool to lift LLVM MIR code to target-independent LLVM IR to
16+
analyze, modify or recompile to any other architecture. Input LLVM MIR can be
17+
lifted from an ELF file with the **mctomir** tool. llvm-bleach is written with the
18+
philosophy that a lifter should be as generic as possible and contain no
19+
architecture-specific code, as it greatly increases the cost of maintenance and makes
20+
supporting new source architectures more difficult. Instead bleach uses
21+
target-independent concepts and algorithms. Target description is received from
22+
the LLVM backend as well as from the input architecture description.
23+
24+
# OPTIONS
25+
26+
**--help**
27+
28+
```
29+
Display available options.
30+
```
31+
32+
**-o \<PATH>**
33+
34+
```
35+
Output file for LLVM IR code.
36+
```
37+
38+
**--instructions=\<path>**
39+
40+
```
41+
File containing YAML description of the architecture to lift from.
42+
E.g. a description of the RISC-V subset that is used in the source program.
43+
```
44+
45+
**--noinline-instr**
46+
47+
```
48+
Do not inline function calls generated for each instruction.
49+
```
50+
51+
**--stack-size=\<NUMBER>**
52+
53+
```
54+
The size of inline stack (in bytes) to use. Inline stack is a memory region
55+
that is used to simulate program stack of the source program. Default is
56+
8000. Set this to a number high enough for your program. Stack overflows are
57+
not currently diagnosed.
58+
```
59+
60+
**--state-struct-file=\<PATH>**
61+
62+
```
63+
Path of the file to save state C struct definition in. State struct is a
64+
structure that is used by lifted LLVM IR code to store current values of source
65+
program registers as well as inline stack. State struct is necessary to
66+
recompile the lifted program for another architecture. Bleach generates no struct
67+
definition by default.
68+
```
69+
70+
# EXAMPLES
71+
72+
1. Lift MIR code from ./myfunc.mir file using instruction description from
73+
./riscv.yaml file. Print lifted LLVM IR on stdout.
74+
75+
```sh
76+
llvm-bleach ./myfunc.mir --instructions riscv.yaml -o -
77+
```
78+
79+
1. Same but also save state struct definition to `./state.h`
80+
81+
```sh
82+
llvm-bleach ./myfunc.mir --instructions riscv.yaml --state-struct-file=./state.h -o -
83+
```
84+
85+
1. Lift MIR code from ./myfunc.mir file using instruction description from
86+
./instrs.yaml file. Write lifted LLVM IR to ./myfunc.ll
87+
88+
```sh
89+
llvm-bleach ./myfunc.mir --instructions instrs.yaml -o ./myfunc.ll
90+
```
91+
92+
1. Same but increase inline stack size to 16000 bytes
93+
94+
```sh
95+
llvm-bleach ./myfunc.mir --instructions instrs.yaml -o ./myfunc.ll --stack-size 16000
96+
```
97+
98+
# YAML ARCHITECTURE DESCRIPTION
99+
100+
The description of the source program architecture consists of several
101+
top-level keys.
102+
103+
- **register-classes**
104+
105+
A mapping from custom register class names (used only as member names in the
106+
state struct definition) to regular expressions matching register names to
107+
put in this class. Register names are the same as in the LLVM backend for the
108+
selected architecture (e.g. X0-X31 for RISC-V or EAX, EDX, R16-R31 for X86).
109+
110+
Example for RISC-V with I and F extensions:
111+
112+
```yaml
113+
---
114+
register-classes:
115+
GPR: "X[0-9][0-9]?"
116+
FPR: "F[0-9][0-9]?_F"
117+
...
118+
```
119+
120+
```
121+
Entries are required for all used register classes but for no others. E.g. if
122+
you are lifting a program compiled for rv64imf but the program does not use any
123+
floating-point registers, you don't have to specify the FPR register class.
124+
Specifying it anyway is not forbidden.
125+
```
126+
127+
- **instructions**
128+
129+
List of instruction definitions. Each instruction definition consists of an
130+
LLVM IR function with the same name as the instruction in the LLVM backend.
131+
For most instructions every input operand becomes an argument of this
132+
function and the destination operand becomes the return value.
133+
134+
Example for RISC-V ADD and SUB instructions:
135+
136+
```yaml
137+
---
138+
instructions:
139+
- ADD:
140+
func: |
141+
define i\xlen\ @ADD(i\xlen\ noundef signext %0, i\xlen\ noundef signext %1) {
142+
%3 = add i\xlen\ %1, %0
143+
ret i\xlen\ %3
144+
}
145+
- SUB:
146+
func: |
147+
define i\xlen\ @SUB(i\xlen\ noundef signext %0, i\xlen\ noundef signext %1) {
148+
%3 = sub i\xlen\ %0, %1
149+
ret i\xlen\ %3
150+
}
151+
...
152+
```
153+
154+
```
155+
This list should contain entries for all instructions that are used in the
156+
source program; no other instructions are required.
157+
```
158+
159+
- **constant-registers**
160+
161+
This key is used to specify constant registers of the source architecture
162+
(e.g. X0 for RISC-V or XZR for AArch64). It consists of mappings from
163+
constant register names to their values.
164+
165+
Example for the AArch64 XZR register:
166+
167+
```yaml
168+
---
169+
constant-registers:
170+
XZR: 0x0000000000000000
171+
...
172+
```
173+
174+
- **stack-pointer**
175+
176+
Put the name of the stack pointer register under this key. It is used to detect
177+
that a memory instruction works with the program stack and not the heap.
178+
179+
Example for AArch64:
180+
181+
```yaml
182+
---
183+
stack-pointer: SP
184+
...
185+
```
186+
187+
```
188+
Example for RISC-V:
189+
```
190+
191+
```yaml
192+
---
193+
stack-pointer: X2
194+
...
195+
```
196+
197+
- **extern-functions**
198+
199+
A list of global **declarations** (functions or global variables) that
200+
are used by any of the specified instructions. External declarations are a way
201+
to insert callbacks into your program, emulate CSRs or capture memory
202+
accesses. You use these declarations in your instruction descriptions. E.g.
203+
call an external `load_double_word` function from a load instruction description
204+
or insert a callback that logs register values in every instruction.
205+
206+
Example for AArch64 with declarations of `store_dw` and `load_dw` functions
207+
as well as global `NZCV` flags register:
208+
209+
```yaml
210+
---
211+
extern-functions:
212+
- "@NZCV = external global i64"
213+
- "declare void @store_dw(i64 %val, i64 %addr)"
214+
- "declare i64 @load_dw(i64 %addr)"
215+
216+
instructions:
217+
- LDRXui:
218+
func: |
219+
define i64 @LDRXui(i64 %base, i64 %off) {
220+
%addr = add i64 %base, %off
221+
%val = call i64 @load_dw(i64 %addr)
222+
ret i64 %val
223+
}
224+
...
225+
```
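
When the lifted IR is linked into a host program, something has to define the symbols declared under `extern-functions`. Below is a minimal sketch of what such definitions might look like in C. It assumes that `i64` corresponds to `int64_t` and that the lifted code passes addresses the host can dereference directly; both are assumptions for illustration, not documented llvm-bleach behavior.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Counterpart of "@NZCV = external global i64". */
int64_t NZCV = 0;

/* Counterpart of "declare void @store_dw(i64 %val, i64 %addr)".
 * Assumption: addr is a valid, writable host address. */
void store_dw(int64_t val, int64_t addr) {
  assert(addr != 0);
  /* memcpy avoids alignment and aliasing problems. */
  memcpy((void *)(uintptr_t)addr, &val, sizeof val);
}

/* Counterpart of "declare i64 @load_dw(i64 %addr)".
 * Assumption: addr is a valid, readable host address. */
int64_t load_dw(int64_t addr) {
  int64_t val;
  assert(addr != 0);
  memcpy(&val, (const void *)(uintptr_t)addr, sizeof val);
  return val;
}
```

A host that wants to trace or sandbox memory accesses could add logging or bounds checks inside these two functions instead of dereferencing the address directly.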
226+
227+
# STATE STRUCT DEFINITION
228+
229+
Here's an example of a generated state struct definition for an architecture with a
230+
stack size equal to 8000 bytes and the following register classes:
231+
232+
Register classes:
233+
234+
```yaml
235+
---
236+
register-classes:
237+
GPR: "X[0-9][0-9]?"
238+
...
239+
```
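
To check which register names a class regex would pick up, the pattern can be tried against candidate names. The sketch below uses POSIX `regex.h` and counts how many of the names `X0`, `X1`, ... match. The helper `count_matching_registers` is hypothetical, and the full-string anchoring with `^(...)$` is an assumption; how llvm-bleach itself applies the pattern is not stated here.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: returns how many of the names X0..X<count-1>
 * fully match `pattern`, or -1 if the pattern does not compile. */
int count_matching_registers(const char *pattern, int count) {
  char anchored[128];
  char name[16];
  regex_t re;
  int matches = 0;

  assert(pattern != NULL);
  /* Assumption: the whole register name has to match the pattern. */
  snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
  if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;

  for (int i = 0; i < count; ++i) {
    snprintf(name, sizeof name, "X%d", i);
    if (regexec(&re, name, 0, NULL, 0) == 0)
      ++matches;
  }
  regfree(&re);
  return matches;
}
```

With the `GPR` pattern above and the 32 RISC-V integer register names, every name matches, which is consistent with a 32-element `GPR` array in the state struct.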
240+
241+
Generated state struct definition:
242+
243+
```C
244+
#include <stdint.h>
245+
struct register_state {
246+
int64_t GPR[32];
247+
int64_t stack[1000];
248+
};
249+
```
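
A host program that links against lifted code needs to prepare this struct before running it. The following is a rough sketch under several assumptions: `GPR[i]` is taken to hold register `Xi`, the helpers `state_init` and `state_set_arg` are hypothetical names, and arguments follow the RISC-V convention of starting at `X10` (`a0`). How a lifted function actually receives the struct is not covered here, so the call itself is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct register_state {
  int64_t GPR[32];
  int64_t stack[1000];
};

/* 1000 elements of 8 bytes each give the 8000-byte inline stack. */
_Static_assert(sizeof(((struct register_state *)0)->stack) == 8000,
               "inline stack is expected to be 8000 bytes");

/* Hypothetical helper: zero all registers and the inline stack. */
void state_init(struct register_state *st) {
  assert(st != NULL);
  memset(st, 0, sizeof *st);
}

/* Hypothetical helper: place the index-th integer argument in a0..a7.
 * Assumption: GPR[i] corresponds to Xi, so a0 lives in GPR[10]. */
void state_set_arg(struct register_state *st, int index, int64_t value) {
  assert(st != NULL);
  assert(index >= 0 && index < 8);
  st->GPR[10 + index] = value;
}
```

If `--stack-size` is changed, the `stack` array length in the generated header is expected to change with it, so the static assertion would need the matching byte count.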
