Commit 475a559

more poly stuff
1 parent b7f7d5b commit 475a559

11 files changed

+435
-0
lines changed

lectures/08_classic_loop_ops.qmd

Lines changed: 331 additions & 0 deletions
@@ -0,0 +1,331 @@
---
execute:
  echo: true
format:
  html: default
  revealjs:
    chalkboard: true
    code-fold: true
    code-line-numbers: true
    echo: true
    mathjax: true
    output-file: revealjs_08_classic_loop_ops.qmd
    scrollable: true
    slideNumber: c/t
sidebar: false
title: classic loop optimizations

---

Loop optimizations are important because

1) typically there is a regular access pattern
1) the body of a loop gets repeated
1) compilers often assume each loop runs about 10 times, so code at nesting depth $depth$ is estimated to execute $10^{depth}$ times

What are classic loop optimizations?

1) Loop Invariant Code Motion
1) Induction Variable Recognition
1) Strength Reduction
1) Linear Test Replacement
1) Loop Unrolling

Less classic loop optimizations

1) Scalar Replacement
1) Loop Interchange
1) Loop Fusion
1) Loop Distribution (also known as Fission)
1) Loop Skewing
1) Loop Reversal

First recall natural loops

1) a strongly connected region in the CFG
1) one entry point, the header, which dominates all the nodes in the loop

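As a minimal sketch of how a natural loop can be found, the Python below takes a CFG as a successor map, computes dominator sets with the usual iterative fixed point, and collects the loop body for each back edge (an edge whose target dominates its source). The function names and the small example graph are assumptions made for illustration.

```python
def dominators(succ, entry):
    """Iterative dominator sets: dom[n] = {n} | intersection of dom[p] over preds p."""
    nodes = set(succ)
    preds = {n: set() for n in nodes}
    for n, outs in succ.items():
        for s in outs:
            preds[s].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            ps = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*ps) if ps else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom, preds


def natural_loops(succ, entry):
    """Map each loop header to the set of blocks in its natural loop."""
    dom, preds = dominators(succ, entry)
    loops = {}
    for tail, outs in succ.items():
        for head in outs:
            if head in dom[tail]:           # back edge: head dominates tail
                body = loops.setdefault(head, {head})
                work = [tail]
                while work:                 # walk predecessors back to the header
                    n = work.pop()
                    if n not in body:
                        body.add(n)
                        work.extend(preds[n])
    return loops


# Hypothetical CFG: entry -> header -> body -> header, and header -> exit
cfg = {"entry": ["header"], "header": ["body", "exit"], "body": ["header"], "exit": []}
loops = natural_loops(cfg, "entry")
```

Here the only back edge is body -> header, so the single natural loop has header `header` and contains the blocks `header` and `body`.
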
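Definition: an instruction d = op a, b is loop invariant if each operand (a and b) is one of

1) a constant, or
1) defined outside the loop, or
1) defined by an instruction that is itself loop invariant

In SSA form, a loop-invariant instruction can be moved into the pre-header, because the value it writes is never rewritten, and the values that it depends on come (directly or through other invariant instructions) from outside the loop. This assumes the instruction has no side effects and cannot fault, since in the pre-header it runs even when the loop body would not have run it.
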
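The definition above is recursive, so it is natural to compute it as a fixed point. The following is a rough sketch over a toy SSA loop body, where each instruction is a hypothetical `(dest, op, args)` tuple, integer arguments stand for constants, and only the ops listed in `PURE_OPS` are treated as movable. None of these names come from a real compiler.

```python
PURE_OPS = {"add", "sub", "mul"}  # assumed side-effect-free ops in this toy IR


def loop_invariant(loop_instrs):
    """Return the dests of loop-invariant instructions in an SSA loop body."""
    defined_in_loop = {dest for dest, _, _ in loop_instrs}
    invariant = set()
    changed = True
    while changed:                      # repeat until no new instruction is marked
        changed = False
        for dest, op, args in loop_instrs:
            if dest in invariant or op not in PURE_OPS:
                continue
            if all(isinstance(a, int)              # a constant
                   or a not in defined_in_loop     # defined outside the loop
                   or a in invariant               # itself loop invariant
                   for a in args):
                invariant.add(dest)
                changed = True
    return invariant


# Hypothetical loop body: n and base are defined before the loop
loop = [
    ("i1", "phi", ["i0", "i2"]),
    ("t1", "mul", ["n", 4]),
    ("t2", "add", ["t1", "base"]),
    ("t3", "add", ["t2", "i1"]),   # depends on the phi, so not invariant
    ("t4", "load", ["t2"]),        # invariant address, but not a pure op
    ("i2", "add", ["i1", 1]),
]
invariant = loop_invariant(loop)
```

In this example only `t1` and `t2` are marked. `t4` has an invariant operand but is skipped because a load may depend on stores inside the loop.
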
Conditions under which moving an instruction d = a ⊕ b is OK

```
L0: d = 0
preheader:
L1: i = i + 1
    d = a ⊕ b
    = d              // some use of d
    if (i<N) goto L1
L2: x = d
```

Can move d: the loop body always runs at least once.

```
L0: d = 0
preheader:
L1: if (i>=N) goto L2
    i = i + 1
    d = a ⊕ b
    = d
    goto L1
L2: x = d
```

No good: d is used after the loop, and it would not be changed if the loop executes zero times.

```
L0: d = 0
preheader:
L1: i = i + 1
    d = a ⊕ b
    = d
    d = 0
    = d
    if (i<N) goto L1
L2:
```

No good: d is reassigned in the loop, so after hoisting the first use of d would see 0 from the second iteration on.

```
L0: d = 0
preheader:
L1: = d
    i = i + 1
    d = a ⊕ b
    = d
    if (i<N) goto L1
L2: x = d
```

No good: d is used in the loop before it is defined, so the first iteration must still see d = 0.

Conditions without SSA

1) the instruction dominates all the loop exits where d is still live
1) d is only defined once in the loop
1) d is not live before the instruction (d is not live-out of the pre-header)

In SSA

1) if d is live in some block after the loop, then the definition of d has to dominate that block
2) holds automatically: every SSA name has exactly one definition
3) holds automatically: a use before the definition would refer to a different SSA name (through a phi)

Suppose the loop might run zero times. Hoisting the invariant assignment out of the first loop below gives the second loop, where the assignment always executes:

```
while (e) {
  j = loopinv // may never execute
  S
}

j = loopinv // always executes
while (e) {
  S
}
```

To keep the original behavior, the first loop can instead be converted into

```
if (e) {
  j = loopinv // may never execute
  while (e) {
    S
  }
}
```

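A small simulation can make the difference visible. The three hypothetical functions below model the original loop, the naive hoist, and the guarded hoist; `j` starts at a sentinel value of -1 so we can tell whether the assignment ran. The guarded version uses a do-while shape for the inner loop, which is one common way to avoid testing the condition twice and is an assumption beyond the slide.

```python
def original(n, a, b, j=-1):
    i = 0
    while i < n:
        j = a + b          # loop invariant, but runs only if the loop does
        i += 1
    return j


def naive_hoist(n, a, b, j=-1):
    j = a + b              # now runs even when n == 0
    i = 0
    while i < n:
        i += 1
    return j


def guarded_hoist(n, a, b, j=-1):
    i = 0
    if i < n:              # guard: same test as the loop condition
        j = a + b          # runs only if the loop will run at least once
        while True:        # do-while form of the loop
            i += 1
            if not (i < n):
                break
    return j
```

With `n = 0`, `original` and `guarded_hoist` both leave `j` at -1, while `naive_hoist` overwrites it.
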
## induction variable elimination

```
for (int i = 0; i < 100; ++i) {
  f(a[i])
}
```

The address of a[i] is calculated as &a[0] + 4 * i in every loop iteration (assuming 4-byte elements), but the values at successive steps only differ by 4

1) a_i = &a[0] before the loop
1) a_i = a_i + 4 (add the stride) in every iteration
1) the only remaining use of i is the test i < 100, which could become a_i < &a[0] + 4*100 (the right-hand side is loop invariant)

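The three steps above can be checked with a short simulation that models addresses as plain integers. `ELEM_SIZE`, the base address, and both function names are assumptions for illustration only.

```python
ELEM_SIZE = 4  # assumed element size in bytes


def addresses_with_multiply(base, n):
    """Address of a[i] computed as base + 4*i on every iteration."""
    out = []
    i = 0
    while i < n:
        out.append(base + ELEM_SIZE * i)
        i += 1
    return out


def addresses_strength_reduced(base, n):
    """Same addresses, using a running pointer and a rewritten loop test."""
    out = []
    a_i = base                        # a_i = &a[0] before the loop
    limit = base + ELEM_SIZE * n      # loop invariant, computed once
    while a_i < limit:                # replaces the test i < n
        out.append(a_i)
        a_i += ELEM_SIZE              # add the stride
    return out
```

Both versions visit the same sequence of addresses, but the second has no multiply in the loop and no longer needs `i`.
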
Steps

Step 1: find basic induction variables

i = i + e, where e is loop invariant

What does this look like in SSA?

```
loop header:
  i1 = phi(i0, i2)
loop body:
  i2 = i1 + e
```

The update may also go through more than one instruction:

```
loop header:
  i1 = phi(i0, i2)
loop body:
  a0 = i1 + e
  i2 = a0 + e1
```

For each instruction d = c ± (loop invariant), see if it lies on a cycle (a strongly connected component) of SSA edges that goes through the header phi and only has adds and subtracts of loop-invariant expressions.

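One simple way to look for such a cycle, sketched below under strong assumptions, is to start at the value a header phi receives along the back edge and walk backwards through adds and subtracts of invariant operands until the phi is reached again. The data layout (`phis`, `body`, `invariant`) is a made-up toy representation, and the sketch handles only a single chain rather than general strongly connected components.

```python
def basic_induction_vars(phis, body, invariant):
    """phis: {name: (init, back_edge_value)}; body: {dest: (op, x, y)}.
    Returns {phi_name: [(op, step), ...]} for phis updated only by
    adds/subtracts of loop-invariant operands."""
    def is_inv(v):
        return isinstance(v, int) or v in invariant

    result = {}
    for phi, (_init, back) in phis.items():
        steps, cur, seen = [], back, set()
        while cur in body and cur not in seen:
            seen.add(cur)
            op, x, y = body[cur]
            if op not in ("add", "sub"):
                break
            if is_inv(y):                      # cur = x +/- invariant
                steps.append((op, y))
                cur = x
            elif op == "add" and is_inv(x):    # cur = invariant + y
                steps.append((op, x))
                cur = y
            else:
                break
        if cur == phi and steps:               # the chain closed the cycle
            result[phi] = steps
    return result


# The two-instruction update from the slide, plus a sum that is not an IV
phis = {"i1": ("i0", "i2"), "s1": ("s0", "s2")}
body = {
    "a0": ("add", "i1", "e"),
    "i2": ("add", "a0", "e1"),
    "t": ("mul", "i1", 4),
    "s2": ("add", "s1", "t"),   # t varies, so s is not a basic IV
}
ivs = basic_induction_vars(phis, body, invariant={"e", "e1"})
```

For this input only `i1` is reported, with the steps `e1` and `e` collected in the order they are met walking backwards.
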
Step 2: find auxiliary induction variables

j = basic_ind * loop_inv + loop_inv

```
for (int i = 0; i < n; i++) {
  j = 2*i + 1;     // Y: linear in i
  k = -i;          // Y: linear in i
  l = 2*i*i + 1;   // N: quadratic in i
  c = c + 5;       // Y*: c is itself a basic induction variable
}
```

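To make the Y/N classification concrete, here is a small sketch that tracks each value as a pair `(a, b)` meaning `a*i + b`, and gives up (returns `None`) when a product would introduce an `i*i` term. The helper names are invented for this example, and it covers only values derived from `i` (so not the `c = c + 5` case).

```python
I = (1, 0)                 # the basic induction variable i, as 1*i + 0


def const(c):
    return (0, c)          # a loop-invariant constant, as 0*i + c


def add(x, y):
    if x is None or y is None:
        return None
    return (x[0] + y[0], x[1] + y[1])


def mul(x, y):
    if x is None or y is None:
        return None
    if x[0] != 0 and y[0] != 0:
        return None        # i * i is not linear in i
    # (a1*i + b1) * (a2*i + b2) with a1*a2 == 0
    return (x[0] * y[1] + y[0] * x[1], x[1] * y[1])


j = add(mul(const(2), I), const(1))            # 2*i + 1
k = mul(const(-1), I)                          # -i
l = add(mul(mul(const(2), I), I), const(1))    # 2*i*i + 1
```

`j` and `k` come out as linear forms, so they are auxiliary induction variables; `l` is rejected.
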
Step 3: replace auxiliary (derived) induction variables by new variables without the multiply

Step 4: if the only remaining use of the induction variable is the termination test, change the test to use the new variable

```
sum = 0
for (i = 1; i <= 100; i++) {
  sum = sum + a[i - 1]
}
```

In SSA form:

```
   sum0 = 0
   i0 = 1
L: sum1 = phi(sum0, sum2)
   i1 = phi(i0, i2)
   t10 = i1 - 1
   t20 = t10 * 4
   t30 = t20 + &a
   t40 = load t30
   sum2 = sum1 + t40
   i2 = i1 + 1
   if (i2 <= 100) goto L
```

1) i is a basic induction variable
1) t10 is an aux induction variable
1) t20 is an aux induction variable
1) t30 is an aux induction variable

t30 has a use in the load

t30 = t20 + &a ==> t10 * 4 + &a ==> (i1 - 1) * 4 + &a

t30 = 4 * i1 + &a - 4

After step 3, a new variable t51 carries the address and the multiply is gone:

```
   sum0 = 0
   i0 = 1
   t50 = &a          // initial value: 4 * i0 + &a - 4 with i0 = 1
L: sum1 = phi(sum0, sum2)
   i1 = phi(i0, i2)
   t51 = phi(t50, t52)
   //t10 = i1 - 1
   //t20 = t10 * 4
   //t30 = t20 + &a
   t40 = load t51
   sum2 = sum1 + t40
   i2 = i1 + 1
   t52 = t51 + 4
   if (i2 <= 100) goto L
```

After step 4, the test uses the new variable (t52 = 4 * i2 + &a - 4, so i2 <= 100 becomes t52 <= 396 + &a) and i is dead:

```
   sum0 = 0
   //i0 = 1
   t50 = &a          // initial value: 4 * i0 + &a - 4 with i0 = 1
L: sum1 = phi(sum0, sum2)
   //i1 = phi(i0, i2)
   t51 = phi(t50, t52)
   //t10 = i1 - 1
   //t20 = t10 * 4
   //t30 = t20 + &a
   t40 = load t51
   sum2 = sum1 + t40
   //i2 = i1 + 1
   t52 = t51 + 4
   if (t52 <= 396 + &a) goto L
```

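As a sanity check on the rewritten loop, the sketch below simulates both versions over a toy memory model in which `&a` is an arbitrary integer `base` and each element takes 4 bytes. The `load` helper and the chosen base address are assumptions for illustration only.

```python
def load(a, base, addr):
    """Toy memory: element k of a lives at address base + 4*k."""
    offset = addr - base
    assert offset % 4 == 0 and 0 <= offset < 4 * len(a)
    return a[offset // 4]


def sum_with_multiply(a, base=1000):
    total, i = 0, 1
    while True:                        # bottom-tested loop, as in the SSA code
        t10 = i - 1
        t20 = t10 * 4
        t30 = t20 + base
        total += load(a, base, t30)
        i += 1
        if not (i <= 100):
            break
    return total


def sum_strength_reduced(a, base=1000):
    total = 0
    t5 = base                          # 4*i0 + &a - 4 with i0 = 1
    limit = 396 + base                 # loop invariant
    while True:
        total += load(a, base, t5)
        t5 += 4                        # replaces the multiply by 4
        if not (t5 <= limit):          # replaces the test on i
            break
    return total


data = list(range(100))
```

Both functions read `a[0]` through `a[99]` exactly once, so they return the same sum for any 100-element list.
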
## loop un-switching

```
for (int i = 0; i < 100; ++i) {
  if (c) { // c is loop invariant
    f(i)
  } else {
    g(i)
  }
}
```

Look for special patterns like this one and replace them with

```
if (c) { // c is loop invariant
  for (int i = 0; i < 100; ++i) {
    f(i)
  }
} else {
  for (int i = 0; i < 100; ++i) {
    g(i)
  }
}
```

This is often done before vectorization

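A quick way to convince ourselves the two forms agree is to record the calls each one makes. In this sketch `f` and `g` are stand-ins that append to a log, and the helper names are hypothetical.

```python
def switched(c, n, f, g):
    for i in range(n):
        if c:              # invariant test evaluated on every iteration
            f(i)
        else:
            g(i)


def unswitched(c, n, f, g):
    if c:                  # invariant test evaluated once
        for i in range(n):
            f(i)
    else:
        for i in range(n):
            g(i)


def trace(loop, c, n=5):
    """Run a loop variant and return the sequence of calls it made."""
    log = []
    loop(c, n, lambda i: log.append(("f", i)), lambda i: log.append(("g", i)))
    return log
```

The price of removing the branch from the loop body is code size: the loop is duplicated once per outcome of the test.
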
## loop fusion

```
for (i = 0; i < 100; i++) {
  s0: b[i] = f(a[i])
}
for (i = 0; i < 100; i++) {
  s1: c[i] = f(b[i])
}
```

1) When is it legal to do this?
1) When can we get rid of the b array?

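For this particular pair of loops, a hedged sketch of the fused form is shown below. It relies on two assumptions: `s1` in iteration `i` reads only `b[i]`, which `s0` wrote in the same iteration, and `b` is not used after the loops, so a scalar can replace it. The function `f` here is a made-up pure function.

```python
def f(x):
    return 2 * x + 1       # hypothetical pure function


def separate_loops(a):
    n = len(a)
    b = [0] * n
    c = [0] * n
    for i in range(n):
        b[i] = f(a[i])     # s0
    for i in range(n):
        c[i] = f(b[i])     # s1
    return c


def fused_loop(a):
    n = len(a)
    c = [0] * n
    for i in range(n):
        t = f(a[i])        # s0, with b[i] replaced by a scalar
        c[i] = f(t)        # s1
    return c
```

If `s1` read `b[i + 1]` instead, the fused loop would see a value that `s0` had not written yet, and fusion in this form would not be legal.
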
There is also an optimization that goes the other way: loop distribution (fission) splits a loop so that each statement becomes a separate loop, in case some of them could then run as vector operations.

These sorts of loop optimizations would make good projects
