Commit 1c0bb5c

add llvm value note, part 1
1 parent 893d443 commit 1c0bb5c

8 files changed, 287 additions and 2 deletions

.gitignore

+2-1
@@ -13,4 +13,5 @@ _build_rtd
 .cache
 .python-version
 dist
-tags
+tags
+task/

png/image_llvm_ir_layout.png

350 KB

png/image_llvm_value-1.png

295 KB

png/image_llvm_value-2.png

303 KB

png/image_llvm_value.png

199 KB

source/llvm/index.rst

+1
@@ -13,6 +13,7 @@ Welcome to llvm-learning-notes's documentation!
 file_into_mem
 llvm_ir
 vtable
+llvm_value

 Indices and tables
 ==================

source/llvm/llvm_ir.md

+11-1
@@ -13,11 +13,13 @@
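
 ## Building on Prior Work

+- [The official LLVM IR introduction video](https://www.youtube.com/watch?v=m8G_S5LwlTo&t=249s&ab_channel=LLVM), highly recommended
+
 - [LLVM IR入门指南 (LLVM IR Tutorial)](https://evian-zhang.github.io/llvm-ir-tutorial/index.html): probably the most fundamental and best Chinese-language introduction to LLVM IR, well suited to beginners. Rather than walking through every IR instruction from the start, the author describes the overall concepts of LLVM IR from a computer-architecture perspective, especially the data representation covered in chapter 3.

 - [Mapping High Level Constructs to LLVM IR](https://mapping-high-level-constructs-to-llvm-ir.readthedocs.io/en/latest/index.html): covers LLVM IR in great detail and shows how some C++ features such as classes, constructors, and vtables are lowered. Its IR version is somewhat dated, so my later examples use LLVM 17.

-- [LLVM Language Reference Manual](https://llvm.org/docs/LangRef.html#runtime-preemption-specifiers): for the details, the official encyclopedia is still the place to go
+- [LLVM Language Reference Manual](https://llvm.org/docs/LangRef.html#runtime-preemption-specifiers): for the detailed language reference of each IR construct, the official encyclopedia is still the place to go

 - [A Complete Guide to LLVM for Programming Language Creators](https://mukulrathi.com/create-your-own-programming-language/llvm-ir-cpp-api-tutorial/): this post shows how to create LLVM IR through the API
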
@@ -26,6 +28,12 @@
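
 The basic concepts involved in compilation come down to symbols, symbol tables, the type system, and data layout. Thinking about the design of LLVM IR from these angles can deepen your understanding.

+Layout of LLVM IR:
+
+![alt text](../../png/image_llvm_ir_layout.png)
+
+Each part above is covered in detail in the official video; I recommend watching it first.
+
 ### Symbols and Symbol Tables

 - A symbol usually refers to the name of a variable, function, type, or other identifier used in a program.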
@@ -161,6 +169,8 @@ entry:

 ### vtable in llvm ir

+The example below shows the IR for simple single inheritance. The IR for multiple inheritance is more complex, and I cover it in detail in a dedicated post.
+
 Example: https://compiler-explorer.com/z/afv6GPdPb

 ```c++

source/llvm/llvm_value.md

+273
@@ -0,0 +1,273 @@
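# Value, User, and Use in LLVM


The LLVM source uses the base classes `Value`, `User`, and `Use` to represent LLVM IR and the def-use (or user-usee) relationships between IR entities.

## Prerequisites

- What is LLVM IR? You first need the basic concepts and design of LLVM IR. The [official LLVM IR introduction video](https://www.youtube.com/watch?v=m8G_S5LwlTo&t=249s&ab_channel=LLVM) is a good place to start, and I also wrote a [note](https://zhuanlan.zhihu.com/p/685467026) about it.

- Why does LLVM IR need SSA? Apart from memory accessed through alloca/store/load, LLVM IR is in SSA form. When the SSA-form IR is created, the def-use information between SSA values is built along with it. For a detailed analysis, see:
  - [How do compilers such as LLVM compute def-use chains while constructing SSA-form IR?](https://www.zhihu.com/question/41999500/answer/93243408)
  - [Advantages of SSA](https://blog.csdn.net/dashuniuniu/article/details/52189814)

## Building on Prior Work

These are excellent existing articles:
- [LLVM Value, User, and Use explained through the source code](https://zhuanlan.zhihu.com/p/666016704)
- [LLVM notes (16): IR basics in detail (1), underlying class](https://www.cnblogs.com/Five100Miles/p/14083814.html)

This note summarizes and supplements those articles to deepen my own understanding. Please point out anything that is wrong. I will not repeat what they have already covered.

Source version: LLVM 17.0.6

## Overview

-

## Everything is a Value

Let's look at the definition of `llvm::Value`:
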
```c++
/// LLVM Value Representation
///
/// This is a very important LLVM class. It is the base class of all values
/// computed by a program that may be used as operands to other values. Value is
/// the super class of other important classes such as Instruction and Function.
/// All Values have a Type. Type is not a subclass of Value. Some values can
/// have a name and they belong to some Module. Setting the name on the Value
/// automatically updates the module's symbol table.
///
/// Every value has a "use list" that keeps track of which other Values are
/// using this Value. A Value can also have an arbitrary number of ValueHandle
/// objects that watch it and listen to RAUW and Destroy events. See
/// llvm/IR/ValueHandle.h for details.
class Value {
  Type *VTy;
  Use *UseList;
  const unsigned char SubclassID;   // Subclass identifier (for isa/dyn_cast)
  unsigned char HasValueHandle : 1; // Has a ValueHandle pointing to this?
  unsigned short SubclassData;
  // ...
};
```
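From the comment on `Value` we can already learn the following:
- `Value` is the base class in LLVM. The most frequently used `Instruction`, as well as `Function`, `BasicBlock`, and so on, are all Values. The [diagram below](https://llvm.org/doxygen/classllvm_1_1Value.html) shows the classes that inherit from `Value`.
- Every `Value` has a type, and a `Value` that has a name is automatically registered in the symbol table of its `Module`.
- Each `Value` holds a `UseList` pointer of type `Use *` that tracks the other values using this `Value`. The `Use` class is covered in detail later.
- Another important member is `SubclassID`, a const value that identifies the concrete subclass of this `Value`. It is what `isa<>` and `dyn_cast<>` check. For details, see one of the online introductions to LLVM-style RTTI.

![alt text](../../png/image_llvm_value.png)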
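
To make the `SubclassID` point concrete, here is a minimal, self-contained sketch of how a tag stored in the base class can drive `isa<>`/`dyn_cast<>`-style checks. It does not use LLVM headers: `MiniValue`, `MiniFunction`, `MiniInstruction`, `mini_isa`, `mini_dyn_cast`, and `demoRtti` are hypothetical names made up for this note, and the real LLVM templates are considerably more general.

```cpp
#include <cassert>

// Hypothetical miniature of LLVM-style RTTI; none of these names exist in LLVM.
class MiniValue {
public:
  enum Kind : unsigned char { FunctionKind, InstructionKind };
  Kind getKind() const { return SubclassID; }

protected:
  explicit MiniValue(Kind K) : SubclassID(K) {}

private:
  const Kind SubclassID; // fixed at construction, in the spirit of Value::SubclassID
};

class MiniFunction : public MiniValue {
public:
  MiniFunction() : MiniValue(FunctionKind) {}
  static bool classof(const MiniValue *V) { return V->getKind() == FunctionKind; }
};

class MiniInstruction : public MiniValue {
public:
  MiniInstruction() : MiniValue(InstructionKind) {}
  static bool classof(const MiniValue *V) { return V->getKind() == InstructionKind; }
};

// isa-style check: ask the target class whether the stored tag matches.
template <typename To> bool mini_isa(const MiniValue *V) { return To::classof(V); }

// dyn_cast-style conversion: nullptr when the tag does not match.
template <typename To> To *mini_dyn_cast(MiniValue *V) {
  return mini_isa<To>(V) ? static_cast<To *>(V) : nullptr;
}

// Returns 1 when the check accepts the right class and rejects the wrong one.
int demoRtti() {
  MiniInstruction Inst;
  MiniValue *V = &Inst;
  assert(V->getKind() == MiniValue::InstructionKind);
  bool IsInst = mini_isa<MiniInstruction>(V);
  bool NotFunc = mini_dyn_cast<MiniFunction>(V) == nullptr;
  return (IsInst && NotFunc) ? 1 : 0;
}
```

No virtual functions are involved: the check is a plain comparison on a one-byte tag, which is one reason a small `SubclassID` field is enough.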
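
Next comes the main topic, the `User` and `Use` classes. After reading the articles above you may already understand the overall design, or you may still be a little confused.

Don't worry. Below I go through the details and the key takeaways.

## A First Look at User and Use

First, understand why LLVM has the `User` and `Use` classes: the User-Usee relationships between instructions (and the relationships between basic blocks) are established at the moment an Instruction is created. Some compilers build the IR first and establish these relationships with a later traversal. This is the elegant part of LLVM's design, and also why it is this complex o(* ̄︶ ̄*)o

With this goal in mind, consider:
- What are the User and the Usee of an Instruction? See Figure 1.
- How does LLVM create an instruction and establish its User-Usee relationships?
- How do you get from a User to all of its Usees, or in reverse from a Usee to all of its Users? The traversal works in both directions, see Figure 2.

Figure 1:
![Figure 1](../../png/image_llvm_value-2.png)

Figure 2:
![Figure 2](../../png/image_llvm_value-1.png)
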
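### Memory Layout of an Instruction

The author of [LLVM Value, User, and Use explained through the source code](https://zhuanlan.zhihu.com/p/666016704) describes the creation and memory layout of an Instruction in great detail in one chapter, so I will not repeat it here. In summary:

- Inheritance chain of Instruction: `Instruction <-- User <-- Value`. The role of the `User` class here is to govern the memory layout of the User (a Value) and its Uses, in other words to establish the User(Value) -> Usee links.
- Creating an Instruction creates one User and several Uses (operands) in a single fixed block of memory. The memory is allocated and initialized through a customized `operator new` combined with placement new (the `MemoryBuffer` that LLVM uses to read files, introduced in an earlier note, works the same way).
- The benefit of this design is that a User can reach its n-th operand by simple pointer arithmetic on `Use *`. There is no list to maintain, so `User` itself stays lean: with co-allocated operands it does not even store a pointer to its Uses, which saves space.

There are two layouts (`P` in the diagrams is a `Use`):
- a) A fixed number of Uses: the `User::allocateFixedOperandUser` method
- b) A variable, possibly large, number of Uses: the `User::allocHungoffUses` method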

```
Layout a) is modelled by prepending the User object by the Use[] array.
...---.---.---.---.-------...
  | P | P | P | P | User
'''---'---'---'---'-------'''
```

```
Layout b) is modelled by pointing at the Use[] array.

.-------.------...
| Use** | User
'-------'------'''
    |
    v
    .---.---.---.---...
    | P | P | P | P |
    '---'---'---'---'''

```
https://www.llvm.org/docs/ProgrammersManual.html#the-core-llvm-class-hierarchy-reference
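
As a rough illustration of layout a), the sketch below allocates the operand slots and the owning object in one block and recovers the slots by pointer arithmetic. It is a simplified model, not LLVM's code: `MiniUse`, `MiniUser`, `create`, `destroy`, and `demoLayout` are hypothetical names, the slot stores an `int` instead of a `Value *`, and a static factory stands in for the overloaded `operator new` that LLVM uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for a Use slot; a real Use stores a Value* and list links.
struct MiniUse {
  int Val = 0;
};

class MiniUser {
  unsigned NumOps;
  explicit MiniUser(unsigned N) : NumOps(N) {}

public:
  // One allocation: N slots first, then the MiniUser object right behind them.
  static MiniUser *create(unsigned N) {
    void *Raw = std::malloc(sizeof(MiniUse) * N + sizeof(MiniUser));
    if (!Raw)
      throw std::bad_alloc();
    MiniUse *Begin = static_cast<MiniUse *>(Raw);
    for (unsigned I = 0; I != N; ++I)
      new (Begin + I) MiniUse();        // placement new: construct each slot in place
    return new (Begin + N) MiniUser(N); // the object starts where the slots end
  }

  // Both types are trivially destructible, so releasing the block is enough.
  static void destroy(MiniUser *U) { std::free(U->getOperandList()); }

  // No stored pointer: step back NumOps slots from 'this'.
  MiniUse *getOperandList() {
    return reinterpret_cast<MiniUse *>(this) - NumOps;
  }
  MiniUse &getOperand(unsigned I) {
    assert(I < NumOps && "getOperand() out of range!");
    return getOperandList()[I];
  }
};

// The object must be suitably aligned when placed directly after the slots.
static_assert(sizeof(MiniUse) % alignof(MiniUser) == 0, "layout assumption");

// Returns 42 if the two operand slots sit directly in front of the object.
int demoLayout() {
  MiniUser *Add = MiniUser::create(2);
  Add->getOperand(0).Val = 40;
  Add->getOperand(1).Val = 2;
  int Sum = Add->getOperand(0).Val + Add->getOperand(1).Val;
  bool Adjacent = reinterpret_cast<char *>(Add->getOperandList() + 2) ==
                  reinterpret_cast<char *>(Add);
  MiniUser::destroy(Add);
  return Adjacent ? Sum : -1;
}
```

Layout b) would instead store a single pointer in front of the object and keep the slot array in a separate allocation, which is what allows the operand count to change later.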

### User --> Use

Below is the definition of `User` together with some of its important functions:

```C++
class User : public Value {

  LLVM_ATTRIBUTE_ALWAYS_INLINE static void *
  allocateFixedOperandUser(size_t, unsigned, unsigned);

protected:
  /// Allocate a User with an operand pointer co-allocated.
  ///
  /// This is used for subclasses which need to allocate a variable number
  /// of operands, ie, 'hung off uses'.
  void *operator new(size_t Size);

  /// Allocate a User with the operands co-allocated.
  ///
  /// This is used for subclasses which have a fixed number of operands.
  void *operator new(size_t Size, unsigned Us);

  /// Allocate a User with the operands co-allocated.  If DescBytes is non-zero
  /// then allocate an additional DescBytes bytes before the operands. These
  /// bytes can be accessed by calling getDescriptor.
  ///
  /// DescBytes needs to be divisible by sizeof(void *).  The allocated
  /// descriptor, if any, is aligned to sizeof(void *) bytes.
  ///
  /// This is used for subclasses which have a fixed number of operands.
  void *operator new(size_t Size, unsigned Us, unsigned DescBytes);

  template <int Idx> Use &Op() {
    return OpFrom<Idx>(this);
  }
  template <int Idx> const Use &Op() const {
    return OpFrom<Idx>(this);
  }

private:
  const Use *getHungOffOperands() const {
    return *(reinterpret_cast<const Use *const *>(this) - 1);
  }

  Use *&getHungOffOperands() { return *(reinterpret_cast<Use **>(this) - 1); }

  const Use *getIntrusiveOperands() const {
    return reinterpret_cast<const Use *>(this) - NumUserOperands;
  }

public:
  const Use *getOperandList() const {
    return HasHungOffUses ? getHungOffOperands() : getIntrusiveOperands();
  }

  Value *getOperand(unsigned i) const {
    assert(i < NumUserOperands && "getOperand() out of range!");
    return getOperandList()[i];
  }
  Use &getOperandUse(unsigned i) {
    assert(i < NumUserOperands && "getOperandUse() out of range!");
    return getOperandList()[i];
  }

  unsigned getNumOperands() const { return NumUserOperands; }


  // Methods for support type inquiry through isa, cast, and dyn_cast:
  static bool classof(const Value *V) {
    return isa<Instruction>(V) || isa<Constant>(V);
  }
};
```
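
How do you get from a User to its Usees?

The principle is simple. Because the memory for the Uses sits at a fixed position relative to the User, an operand is found by taking the address of the first Use and adding the index offset. The following functions implement this:

- the `getOperand` function
- the `Op<>()` functions (built on `OperandTraits` template specializations, with the index passed as a compile-time constant)

Here is more on how `Op<-1>()` works (a negative index counts from the end). It is implemented through template specialization of `OperandTraits`. Since the index is a template argument, the choice between `op_begin` and `op_end` and the offset are fixed at compile time. Note that the snippet below contains no range check of its own. Because `Op<>()` is `protected`, it can only be called from inside subclasses, and each subclass has to supply its own `OperandTraits` specialization. The code:
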
```c++
template <int Idx> Use &Op() {
  return OpFrom<Idx>(this);
}

template <int Idx, typename U> static Use &OpFrom(const U *that) {
  return Idx < 0
    ? OperandTraits<U>::op_end(const_cast<U*>(that))[Idx]
    : OperandTraits<U>::op_begin(const_cast<U*>(that))[Idx];
}

template <class>
struct OperandTraits;

template <>
struct OperandTraits<BinaryOperator> :
  public FixedNumOperandTraits<BinaryOperator, 2> {
};

template <typename SubClass, unsigned ARITY>
struct FixedNumOperandTraits {
  static Use *op_begin(SubClass* U) {
    static_assert(
        !std::is_polymorphic<SubClass>::value,
        "adding virtual methods to subclasses of User breaks use lists");
    return reinterpret_cast<Use*>(U) - ARITY;
  }
  static Use *op_end(SubClass* U) {
    return reinterpret_cast<Use*>(U);
  }
  static unsigned operands(const User*) {
    return ARITY;
  }
};
```
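
The negative-index trick can be tried in isolation. The following is a small sketch under my own assumptions, not LLVM code: `MiniOps` and `demoOp` are hypothetical names, the slots are plain `int`s, and the `static_assert` range check is an addition of this sketch that the excerpt above does not have.

```cpp
#include <cassert>

// Hypothetical fixed-arity operand holder; not an LLVM class.
template <unsigned ARITY> struct MiniOps {
  int Slots[ARITY] = {};

  int *op_begin() { return Slots; }
  int *op_end() { return Slots + ARITY; }

  // Idx >= 0 counts forward from op_begin(), Idx < 0 counts back from op_end().
  template <int Idx> int &Op() {
    // Compile-time range check, added only for this sketch.
    static_assert(Idx < static_cast<int>(ARITY) &&
                      Idx >= -static_cast<int>(ARITY),
                  "operand index out of range");
    return Idx < 0 ? op_end()[Idx] : op_begin()[Idx];
  }
};

// Writes through Op<0>() and Op<-1>(), then reads the last slot back via Op<1>().
int demoOp() {
  MiniOps<2> Bin;
  Bin.Op<0>() = 7;
  Bin.Op<-1>() = 9; // last operand, the same slot as Op<1>()
  return Bin.Op<0>() * 10 + Bin.Op<1>();
}
```

With two operands, `Op<-1>()` and `Op<1>()` name the same slot, so `demoOp()` yields 79.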

## What the Use Class Does

Let's first look at the definition of the `Use` class:

```c++
/// A Use represents the edge between a Value definition and its users.
///
/// This is notionally a two-dimensional linked list. It supports traversing
/// all of the uses for a particular value definition. It also supports jumping
/// directly to the used value when we arrive from the User's operands, and
/// jumping directly to the User when we arrive from the Value's uses.
class Use {
  // ...
private:
  Value *Val = nullptr;
  Use *Next = nullptr;
  Use **Prev = nullptr;
  User *Parent = nullptr;

  void addToList(Use **List) {
    Next = *List;
    if (Next)
      Next->Prev = &Next;
    Prev = List;
    *Prev = this;
  }

  void removeFromList() {
    *Prev = Next;
    if (Next)
      Next->Prev = Prev;
  }
};
```
- I think of a `Use` as a slot: the Uses (operands) created for a User (Instruction) occupy a fixed block of memory, and the Value placed in each slot can be replaced at any time.
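
The slot idea and the use-list bookkeeping can be sketched without LLVM. In the model below, `MiniValue`, `MiniUse`, `set`, `getNumUses`, and `demoUseList` are hypothetical names chosen for this note. The `Parent` pointer is left out, and `set` is my simplified guess at what assigning a value into a slot has to do, built from the `addToList`/`removeFromList` logic shown above.

```cpp
#include <cassert>

struct MiniUse;

// Hypothetical value with an intrusive list of the slots that reference it.
struct MiniValue {
  MiniUse *UseList = nullptr; // head of the list
  unsigned getNumUses() const;
};

// Hypothetical operand slot: an edge from a user to the value it holds.
struct MiniUse {
  MiniValue *Val = nullptr;
  MiniUse *Next = nullptr;
  MiniUse **Prev = nullptr; // address of the pointer that points at this slot

  // Put V into the slot, unlinking from the previous value's list first.
  void set(MiniValue *V) {
    if (Val)
      removeFromList();
    Val = V;
    if (V)
      addToList(&V->UseList);
  }

private:
  void addToList(MiniUse **List) { // push this slot at the front of the list
    Next = *List;
    if (Next)
      Next->Prev = &Next;
    Prev = List;
    *Prev = this;
  }

  void removeFromList() { // O(1) unlink, no search through the list needed
    *Prev = Next;
    if (Next)
      Next->Prev = Prev;
  }
};

unsigned MiniValue::getNumUses() const {
  unsigned N = 0;
  for (const MiniUse *U = UseList; U; U = U->Next)
    ++N;
  return N;
}

// Returns uses(A) * 10 + uses(B) after one of three slots moves from A to B.
int demoUseList() {
  MiniValue A, B;
  MiniUse Slots[3];
  for (MiniUse &S : Slots)
    S.set(&A);       // three slots reference A
  Slots[1].set(&B);  // replace the value held in the middle slot
  return static_cast<int>(A.getNumUses() * 10 + B.getNumUses());
}
```

Because `Prev` points at the previous link's `Next` field (or at the list head), removing a slot needs no special case for the first element. `demoUseList()` ends with two uses of `A` and one of `B`, so it returns 21.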

## References

- http://www.cs.toronto.edu/~pekhimenko/courses/cscd70-w18/docs/Tutorial%202%20-%20Intro%20to%20LLVM%20(Cont).pdf
