|
| 1 | +# Value, User, and Use in LLVM |
| 2 | + |
| 3 | + |
| 4 | +The LLVM source code uses the base classes `Value`, `User`, and `Use` to represent LLVM IR and the def-use (or user-usee) relationships between IR values. |
| 5 | + |
| 6 | +## Prerequisites |
| 7 | + |
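| 8 | +- What is LLVM IR? Start with the basic concepts and design of LLVM IR. The [official LLVM talk introducing LLVM IR](https://www.youtube.com/watch?v=m8G_S5LwlTo&t=249s&ab_channel=LLVM) is a good entry point, and I have also written some [notes](https://zhuanlan.zhihu.com/p/685467026) on it. |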
| 9 | + |
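| 10 | + - Why does LLVM IR need SSA? Apart from memory accessed through alloca/store/load, LLVM IR is in SSA form, and the def-use information between SSA values is built at the same time as the IR itself. The following articles explain this in detail: |
| 11 | + - [How do compilers such as LLVM compute def-use chains while constructing SSA-form IR?](https://www.zhihu.com/question/41999500/answer/93243408) |
| 12 | + - [Advantages of SSA](https://blog.csdn.net/dashuniuniu/article/details/52189814) |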
| 13 | + |
| 14 | +## Building on prior work |
| 15 | + |
| 16 | +Here are some excellent existing articles: |
| 17 | + - [An accessible walkthrough of the LLVM Value, User, and Use source code](https://zhuanlan.zhihu.com/p/666016704) |
| 18 | + - [LLVM Notes (16) - IR Basics in Detail (1): underlying class](https://www.cnblogs.com/Five100Miles/p/14083814.html) |
| 19 | + |
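| 20 | +This post summarizes and supplements the articles above to deepen my own understanding. Please point out anything that is wrong. I will not repeat what those articles already cover. |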
| 21 | + |
| 22 | +Source version: LLVM 17.0.6 |
| 23 | + |
| 24 | +## Overview |
| 25 | + |
| 26 | +- |
| 27 | + |
| 28 | +## Everything is a Value |
| 29 | + |
| 30 | +Let's look at the definition of `llvm::Value`: |
| 31 | + |
| 32 | +```c++ |
| 33 | +/// LLVM Value Representation |
| 34 | +/// |
| 35 | +/// This is a very important LLVM class. It is the base class of all values |
| 36 | +/// computed by a program that may be used as operands to other values. Value is |
| 37 | +/// the super class of other important classes such as Instruction and Function. |
| 38 | +/// All Values have a Type. Type is not a subclass of Value. Some values can |
| 39 | +/// have a name and they belong to some Module. Setting the name on the Value |
| 40 | +/// automatically updates the module's symbol table. |
| 41 | +/// |
| 42 | +/// Every value has a "use list" that keeps track of which other Values are |
| 43 | +/// using this Value. A Value can also have an arbitrary number of ValueHandle |
| 44 | +/// objects that watch it and listen to RAUW and Destroy events. See |
| 45 | +/// llvm/IR/ValueHandle.h for details. |
| 46 | +class Value { |
| 47 | + Type *VTy; |
| 48 | + Use *UseList; |
| 49 | + const unsigned char SubclassID; // Subclass identifier (for isa/dyn_cast) |
| 50 | + unsigned char HasValueHandle : 1; // Has a ValueHandle pointing to this? |
| 51 | + unsigned short SubclassData; |
| 52 | + ... |
| 53 | +}; |
| 54 | +``` |
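| 55 | +From the comment on `Value`, we can learn the following: |
| 56 | + - `Value` is the base class in LLVM. The most commonly used `Instruction`, as well as `Function`, `BasicBlock`, and so on, are all Values. The [diagram below](https://llvm.org/doxygen/classllvm_1_1Value.html) shows the classes derived from `Value`. |
| 57 | + - Every `Value` has a type, and setting a name on a `Value` automatically registers it in the `Module`'s symbol table. |
| 58 | + - A `UseList` pointer of type `Use *` keeps track of the other values that use this `Value`. The `Use` class is covered in detail later. |
| 59 | + - Another important member is `SubclassID`, a const value that identifies the concrete subclass of this `Value`. It drives `isa<>` and `dyn_cast<>`. For details, see online introductions to LLVM-style RTTI. |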
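To make the `SubclassID` point concrete, here is a minimal sketch of the idea behind LLVM-style RTTI. It is not LLVM's code: `ToyValue`, `ToyInstruction`, `ToyFunction`, `toy_isa`, and `toy_dyn_cast` are names I made up for illustration, and the real `isa<>`/`dyn_cast<>` templates are considerably more general.

```cpp
#include <cassert>

// Toy base class: the kind tag plays the role of Value::SubclassID.
class ToyValue {
public:
  enum Kind : unsigned char { InstructionKind, FunctionKind };
  explicit ToyValue(Kind K) : SubclassID(K) {}
  Kind getKind() const { return SubclassID; }

private:
  const Kind SubclassID; // fixed at construction time
};

class ToyInstruction : public ToyValue {
public:
  ToyInstruction() : ToyValue(InstructionKind) {}
  // Each subclass answers "is this ToyValue one of mine?" from the tag alone.
  static bool classof(const ToyValue *V) {
    return V->getKind() == InstructionKind;
  }
};

class ToyFunction : public ToyValue {
public:
  ToyFunction() : ToyValue(FunctionKind) {}
  static bool classof(const ToyValue *V) {
    return V->getKind() == FunctionKind;
  }
};

// Simplified stand-ins for isa<> and dyn_cast<> (hypothetical helpers).
template <typename To> bool toy_isa(const ToyValue *V) {
  return To::classof(V);
}

template <typename To> To *toy_dyn_cast(ToyValue *V) {
  return toy_isa<To>(V) ? static_cast<To *>(V) : nullptr;
}
```

With this in place, `toy_dyn_cast<ToyInstruction>(V)` yields a typed pointer when the tag matches and `nullptr` otherwise, without any virtual functions or compiler RTTI.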
| 60 | +
|
| 61 | +  |
| 62 | +
|
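| 63 | + Next we focus on the User and Use classes. After reading the articles above, you may already understand the overall design, or you may still find it a bit hazy. |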
| 64 | +
|
| 65 | +Don't worry: below I walk through the details and summarize the key points. |
| 66 | +
|
| 67 | + ## A first look at the User and Use classes |
| 68 | +
|
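| 69 | +First, we need to understand why LLVM has the User and Use classes: they establish the User-Usee relationships between instructions (and between basic blocks) at the moment an Instruction is created. Some compilers instead create the IR first and then build these relationships in a separate traversal. This is the ingenuity of LLVM's design, and also the reason it is this complex o(* ̄︶ ̄*)o |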
| 70 | +
|
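| 71 | +With this goal in mind, consider the following questions: |
| 72 | + - What are the User and the Usee of an Instruction? See Figure 1 below. |
| 73 | + - How does LLVM create an instruction and establish its User-Usee relationships at the same time? |
| 74 | + - How do we find all Usees from a User, or conversely all Users from a Usee? The traversal is bidirectional; see Figure 2 below. |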
| 75 | +
|
| 76 | +Figure 1: |
| 77 | + ![Figure 0](../.vuepress/public/images/llvm-use-2.png) |
| 78 | +
|
| 79 | +Figure 2: |
| 80 | + ![Figure 1](../.vuepress/public/images/llvm-use-3.png) |
| 81 | +
|
| 82 | +### Memory layout of an Instruction |
| 83 | +
|
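| 84 | +One chapter of [An accessible walkthrough of the LLVM Value, User, and Use source code](https://zhuanlan.zhihu.com/p/666016704) already describes the creation and memory layout of an Instruction in great detail, so I will not repeat it here. In summary: |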
| 85 | +
|
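| 86 | + - Inheritance chain of Instruction: `Instruction <-- User <-- Value` (`Instruction` derives from `User`, which derives from `Value`). The role of the User class here is to govern the memory layout of the User (itself a Value) and its Uses, or in other words to establish the User(Value)->Usee chain. |
| 87 | + - Creating an Instruction allocates one User and its Uses (operands) as a single fixed block of memory. The memory is allocated and initialized through a custom `operator new` together with `placement new` (the MemoryBuffer that LLVM uses to read files, covered earlier, uses the same approach). |
| 88 | + - The benefit of this design is that a User can reach its n-th operand simply by computing an offset from a `Use*`. No linked list has to be maintained, which is why `User` looks so clean: in the fixed-operand layout it does not even store a pointer to its Uses, which also saves space. |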
| 89 | +
|
| 90 | +There are two layouts (P here denotes a Use): |
| 91 | + - a) A fixed number of Uses: the `User::allocateFixedOperandUser` method |
| 92 | + - b) A variable number of Uses (hung-off uses): the `User::allocHungoffUses` method |
| 93 | +
|
| 94 | +``` |
| 95 | +Layout a) is modelled by prepending the User object by the Use[] array. |
| 96 | +...---.---.---.---.-------... |
| 97 | + | P | P | P | P | User |
| 98 | +'''---'---'---'---'-------''' |
| 99 | +``` |
| 100 | +
|
| 101 | +``` |
| 102 | +Layout b) is modelled by pointing at the Use[] array. |
| 103 | + |
| 104 | +.-------.------... |
| 105 | +| Use** | User |
| 106 | +'-------'------''' |
| 107 | + | |
| 108 | + v |
| 109 | + .---.---.---.---... |
| 110 | + | P | P | P | P | |
| 111 | + '---'---'---'---''' |
| 112 | + |
| 113 | +``` |
| 114 | +Source: https://www.llvm.org/docs/ProgrammersManual.html#the-core-llvm-class-hierarchy-reference |
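Layout a) can be imitated in a few lines of standalone C++. The sketch below is only an approximation under my own assumptions: `ToyUse`, `ToyUser`, `create`, `destroy`, and `demoCoAllocatedOperands` are hypothetical names, the slot holds a plain `int` instead of a `Value *`, and memory comes from `malloc`. It shows a class-specific `operator new` that reserves the operand slots in front of the object, so the object can find them by pointer arithmetic without storing a pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Stand-in for llvm::Use; holds an int instead of a Value pointer.
struct ToyUse {
  int Val = 0;
};

class ToyUser {
public:
  // Allocate the operand slots and the object as one block.
  static ToyUser *create(unsigned NumOps) {
    return new (NumOps) ToyUser(NumOps);
  }

  static void destroy(ToyUser *U) {
    ToyUse *Slots = U->op_begin(); // start of the original malloc block
    U->~ToyUser();
    std::free(Slots);
  }

  // The slots sit directly in front of `this`, as in layout a).
  ToyUse *op_begin() {
    return reinterpret_cast<ToyUse *>(this) - NumOperands;
  }

  ToyUse &getOperandUse(unsigned I) {
    assert(I < NumOperands && "operand index out of range");
    return op_begin()[I];
  }

  unsigned getNumOperands() const { return NumOperands; }

private:
  explicit ToyUser(unsigned N) : NumOperands(N) {}
  ~ToyUser() = default;

  // Custom operator new: one block holding [ToyUse x NumOps][ToyUser].
  void *operator new(std::size_t Size, unsigned NumOps) {
    void *Raw = std::malloc(sizeof(ToyUse) * NumOps + Size);
    if (!Raw)
      throw std::bad_alloc();
    ToyUse *Slots = static_cast<ToyUse *>(Raw);
    for (unsigned I = 0; I != NumOps; ++I)
      new (Slots + I) ToyUse(); // placement new initializes each slot
    return Slots + NumOps;      // the object itself starts after the slots
  }

  // Matching placement delete, used only if the constructor throws.
  void operator delete(void *Obj, unsigned NumOps) {
    std::free(static_cast<ToyUse *>(Obj) - NumOps);
  }

  unsigned NumOperands;
};

// The slot array must leave the object correctly aligned.
static_assert(sizeof(ToyUse) % alignof(ToyUser) == 0,
              "operand slots would misalign the object");

// Fills two operand slots and returns their sum.
int demoCoAllocatedOperands() {
  ToyUser *U = ToyUser::create(2);
  U->getOperandUse(0).Val = 10;
  U->getOperandUse(1).Val = 32;
  int Sum = U->getOperandUse(0).Val + U->getOperandUse(1).Val;
  ToyUser::destroy(U);
  return Sum;
}
```

Layout b) differs only in what sits in front of the object: a single pointer to a separately allocated slot array, which is what lets the number of operands change later.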
| 115 | +
|
| 116 | +### User --> Use |
| 117 | +
|
| 118 | +Below are the definition of `User` and some of its important functions: |
| 119 | +
|
| 120 | + ```C++ |
| 121 | + class User : public Value { |
| 122 | +
|
| 123 | + LLVM_ATTRIBUTE_ALWAYS_INLINE static void * |
| 124 | + allocateFixedOperandUser(size_t, unsigned, unsigned); |
| 125 | +
|
| 126 | +protected: |
| 127 | + /// Allocate a User with an operand pointer co-allocated. |
| 128 | + /// |
| 129 | + /// This is used for subclasses which need to allocate a variable number |
| 130 | + /// of operands, ie, 'hung off uses'. |
| 131 | + void *operator new(size_t Size); |
| 132 | +
|
| 133 | + /// Allocate a User with the operands co-allocated. |
| 134 | + /// |
| 135 | + /// This is used for subclasses which have a fixed number of operands. |
| 136 | + void *operator new(size_t Size, unsigned Us); |
| 137 | +
|
| 138 | + /// Allocate a User with the operands co-allocated. If DescBytes is non-zero |
| 139 | + /// then allocate an additional DescBytes bytes before the operands. These |
| 140 | + /// bytes can be accessed by calling getDescriptor. |
| 141 | + /// |
| 142 | + /// DescBytes needs to be divisible by sizeof(void *). The allocated |
| 143 | + /// descriptor, if any, is aligned to sizeof(void *) bytes. |
| 144 | + /// |
| 145 | + /// This is used for subclasses which have a fixed number of operands. |
| 146 | + void *operator new(size_t Size, unsigned Us, unsigned DescBytes); |
| 147 | +
|
| 148 | + template <int Idx> Use &Op() { |
| 149 | + return OpFrom<Idx>(this); |
| 150 | + } |
| 151 | + template <int Idx> const Use &Op() const { |
| 152 | + return OpFrom<Idx>(this); |
| 153 | + } |
| 154 | +
|
| 155 | +private: |
| 156 | + const Use *getHungOffOperands() const { |
| 157 | + return *(reinterpret_cast<const Use *const *>(this) - 1); |
| 158 | + } |
| 159 | +
|
| 160 | + Use *&getHungOffOperands() { return *(reinterpret_cast<Use **>(this) - 1); } |
| 161 | +
|
| 162 | + const Use *getIntrusiveOperands() const { |
| 163 | + return reinterpret_cast<const Use *>(this) - NumUserOperands; |
| 164 | + } |
| 165 | +
|
| 166 | +public: |
| 167 | + const Use *getOperandList() const { |
| 168 | + return HasHungOffUses ? getHungOffOperands() : getIntrusiveOperands(); |
| 169 | + } |
| 170 | +
|
| 171 | + Value *getOperand(unsigned i) const { |
| 172 | + assert(i < NumUserOperands && "getOperand() out of range!"); |
| 173 | + return getOperandList()[i]; |
| 174 | + } |
| 175 | + Use &getOperandUse(unsigned i) { |
| 176 | + assert(i < NumUserOperands && "getOperandUse() out of range!"); |
| 177 | + return getOperandList()[i]; |
| 178 | + } |
| 179 | +
|
| 180 | + unsigned getNumOperands() const { return NumUserOperands; } |
| 181 | +
|
| 182 | +
|
| 183 | + // Methods for support type inquiry through isa, cast, and dyn_cast: |
| 184 | + static bool classof(const Value *V) { |
| 185 | + return isa<Instruction>(V) || isa<Constant>(V); |
| 186 | + } |
| 187 | +}; |
| 188 | + ``` |
| 189 | + |
| 190 | + How do we find a Usee from a User? |
| 191 | + |
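| 192 | +The principle is simple. Because the memory for the Uses is laid out at a fixed position, we take the address of the first Use and add the offset for the operand index. The following functions implement this: |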
| 193 | + |
| 194 | + - `getOperand` |
| 195 | + - `Op<>()` (implemented with template specialization, which makes the operand index a compile-time constant that can be checked statically) |
| 196 | + |
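| 197 | +A note on `Op<-1>()` (a negative index counts from the end): it is implemented through template specialization of `OperandTraits`, which has the advantage that the index range can be checked at compile time. However, because `Op<>()` is protected, every subclass has to provide its own specialization. The code is as follows: |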
| 198 | + |
| 199 | +```c++ |
| 200 | + template <int Idx> Use &Op() { |
| 201 | + return OpFrom<Idx>(this); |
| 202 | + } |
| 203 | + |
| 204 | +template <int Idx, typename U> static Use &OpFrom(const U *that) { |
| 205 | + return Idx < 0 |
| 206 | + ? OperandTraits<U>::op_end(const_cast<U*>(that))[Idx] |
| 207 | + : OperandTraits<U>::op_begin(const_cast<U*>(that))[Idx]; |
| 208 | + } |
| 209 | + |
| 210 | +template <class> |
| 211 | +struct OperandTraits; |
| 212 | + |
| 213 | +template <> |
| 214 | +struct OperandTraits<BinaryOperator> : |
| 215 | + public FixedNumOperandTraits<BinaryOperator, 2> { |
| 216 | +}; |
| 217 | + |
| 218 | +template <typename SubClass, unsigned ARITY> |
| 219 | +struct FixedNumOperandTraits { |
| 220 | + static Use *op_begin(SubClass* U) { |
| 221 | + static_assert( |
| 222 | + !std::is_polymorphic<SubClass>::value, |
| 223 | + "adding virtual methods to subclasses of User breaks use lists"); |
| 224 | + return reinterpret_cast<Use*>(U) - ARITY; |
| 225 | + } |
| 226 | + static Use *op_end(SubClass* U) { |
| 227 | + return reinterpret_cast<Use*>(U); |
| 228 | + } |
| 229 | + static unsigned operands(const User*) { |
| 230 | + return ARITY; |
| 231 | + } |
| 232 | +}; |
| 233 | +``` |
| 234 | +
|
| 235 | + ## The role of the Use class |
| 236 | +
|
| 237 | +Let's first look at the definition of the Use class: |
| 238 | +
|
| 239 | +```c++ |
| 240 | +/// A Use represents the edge between a Value definition and its users. |
| 241 | +/// |
| 242 | +/// This is notionally a two-dimensional linked list. It supports traversing |
| 243 | +/// all of the uses for a particular value definition. It also supports jumping |
| 244 | +/// directly to the used value when we arrive from the User's operands, and |
| 245 | +/// jumping directly to the User when we arrive from the Value's uses. |
| 246 | +class Use { |
| 247 | +... |
| 248 | +private: |
| 249 | + Value *Val = nullptr; |
| 250 | + Use *Next = nullptr; |
| 251 | + Use **Prev = nullptr; |
| 252 | + User *Parent = nullptr; |
| 253 | +
|
| 254 | + void addToList(Use **List) { |
| 255 | + Next = *List; |
| 256 | + if (Next) |
| 257 | + Next->Prev = &Next; |
| 258 | + Prev = List; |
| 259 | + *Prev = this; |
| 260 | + } |
| 261 | +
|
| 262 | + void removeFromList() { |
| 263 | + *Prev = Next; |
| 264 | + if (Next) |
| 265 | + Next->Prev = Prev; |
| 266 | + } |
| 267 | +}; |
| 268 | +``` |
| 269 | + - I think of a Use as a slot: the Uses (operands) created for a User (Instruction) form a fixed block of memory, and the Value placed in each slot can be replaced at any time. |
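The slot idea and the `addToList`/`removeFromList` pair above can be exercised with a small standalone model. This is a sketch under my own assumptions, not LLVM's implementation: `ToyValue`, `ToyUse`, `UserId` (a stand-in for `Use::Parent`), `users`, and `demoUseList` are hypothetical names, while the two list functions mirror the quoted code.

```cpp
#include <cassert>
#include <vector>

struct ToyValue;

// A slot owned by some user; it can be re-pointed at a different value.
struct ToyUse {
  ToyValue *Val = nullptr;
  ToyUse *Next = nullptr;
  ToyUse **Prev = nullptr; // address of whichever pointer points at us
  int UserId = -1;         // stand-in for Use::Parent

  // Re-point the slot: unlink from the old value's list, link into the new one.
  void set(ToyValue *V);

private:
  void addToList(ToyUse **List) {
    Next = *List;
    if (Next)
      Next->Prev = &Next;
    Prev = List;
    *Prev = this;
  }

  void removeFromList() {
    *Prev = Next;
    if (Next)
      Next->Prev = Prev;
  }
};

struct ToyValue {
  ToyUse *UseList = nullptr; // head of the list of slots referring to us

  // Usee -> Users: walk the use list and collect the owning user ids.
  std::vector<int> users() const {
    std::vector<int> Ids;
    for (const ToyUse *U = UseList; U; U = U->Next)
      Ids.push_back(U->UserId);
    return Ids;
  }
};

inline void ToyUse::set(ToyValue *V) {
  if (Val)
    removeFromList();
  Val = V;
  if (V)
    addToList(&V->UseList);
}

// Two users refer to the same value; returns that value's user ids.
std::vector<int> demoUseList() {
  ToyValue A;
  ToyUse S0, S1;
  S0.UserId = 1;
  S1.UserId = 2;
  S0.set(&A);
  S1.set(&A); // new uses are pushed at the head of A's list
  return A.users();
}
```

Because `Prev` is a `ToyUse **` pointing either at the list head or at the previous node's `Next`, unlinking needs no special case for the first element. The slot itself gives User -> Usee in one step (`Val`), and the list gives Usee -> Users.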
| 270 | + |
| 271 | + ## References |
| 272 | + |
| 273 | + - http://www.cs.toronto.edu/~pekhimenko/courses/cscd70-w18/docs/Tutorial%202%20-%20Intro%20to%20LLVM%20(Cont).pdf |