
Commit b7f7d5b

committed
add lcm loops stuff
1 parent 35e740f commit b7f7d5b

24 files changed

+295
-382
lines changed

lectures/05b_licm.qmd

Lines changed: 295 additions & 48 deletions
@@ -13,65 +13,33 @@ format:
1313
scrollable: true
1414
slideNumber: c/t
1515
sidebar: false
16-
title: loop invariant code motion
16+
title: classic loop optimizations
1717

1818
---
1919

20-
## licm
2120

22-
Loop invariant code motion recognizes computations in loop that produce the same value on each iteration and moves them out of the loop.
2321

24-
## A very common case for this is matrix addressing
22+
Loop optimizations are important because
2523

26-
a[i,j] might expand to to $i*4*\operatorname{stride_{a}} + j *4$
24+
1. typically there is a regular access pattern
25+
1. the body of a loop gets repeated
26+
1. compilers often assume a loop body executes $10^{depth}$ times, where depth is the loop nesting depth
2727

28-
```
29-
for j
30-
a[i,j] = f(a[i,j+1])
31-
```
3228

33-
turns into
29+
## What are classic loop optimizations?
3430

31+
1. Loop Invariant Code Motion
32+
1. Induction Variable Recognition
33+
1. Strength Reduction
34+
1. Linear Test Replacement
35+
1. Loop Unrolling
3536

36-
```
37-
a =
38-
b =
39-
result = 0
40-
for (){
41-
result += a*b
42-
}
43-
```
44-
45-
## when is a computation loop invariant
46-
47-
f(a,b)
37+
## Loop Invariant Code Motion
4838

49-
1. a,b are constants
50-
1. a,b are defined outside the loop,
51-
1. a,b are loop invariant
39+
recall natural loops
5240

53-
if in ssa there is only one reaching def
54-
55-
for non ssa, we need
56-
57-
1. all reaching defs of a,b defined outside of the loop
58-
1. only one def of a,b reaches the computation and that def is loop invariant
59-
60-
## when is a computation loop invariant
61-
62-
```
63-
a defined outside of the loop (need one definition or ssa )
64-
65-
b = a*2
66-
67-
d = 5 + b is loop invariant if b
68-
```
69-
70-
In ssa a variable is loop invariant if it does not have a phi function at the header of the
71-
loop, or it is defined by a phi function and all the inputs are loop constants
72-
73-
74-
SSA If we find a loop-invariant computation in SSA form, then we just move it out of the loop to a block before the loop. When moving a (side effect-free) SSA loop-invariant computation to a previous position, nothing can go wrong, because the value it computes cannot be overwritten later and the values it depends on cannot have been changed
41+
1. strongly connected region in the cfg
42+
1. one entry point (dominates all the nodes in the loop)
7543

7644

7745

@@ -229,6 +197,45 @@ posttest --> next
229197
:::
230198

231199

200+
201+
202+
203+
In SSA, a variable is loop invariant if it does not have a phi function at the header of the
204+
loop, or it is defined by a phi function and all the inputs are loop constants
205+
206+
207+
In SSA form, if we find a loop-invariant computation, we just move it out of the loop to a block before the loop. Moving a (side effect-free) SSA loop-invariant computation to an earlier position cannot go wrong, because the value it computes cannot be overwritten later and the values it depends on cannot have been changed.
208+
209+
210+
211+
## licm
212+
213+
Loop invariant code motion recognizes computations in a loop that produce the same value on each iteration and moves them out of the loop.
214+
215+
## A very common case for this is matrix addressing
216+
217+
a[i,j] might expand to $i \cdot 4 \cdot \operatorname{stride}_{a} + j \cdot 4$
218+
219+
```
220+
for j
221+
a[i,j] = f(a[i,j+1])
222+
```
223+
224+
turns into
225+
226+
227+
```
228+
a =
229+
b =
230+
result = 0
231+
for (){
232+
result += a*b
233+
}
234+
```
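
As a minimal sketch of the matrix-addressing case, assume a row-major `int` array with a hypothetical `stride` parameter (elements per row). The row offset `i * stride` does not change in the `j` loop, so it can be hoisted by hand. The function names are illustrative, and `+ 1` is a stand-in for the unspecified `f`.

```c
#include <stddef.h>

/* Before: the row offset i * stride is recomputed on every iteration. */
void shift_row_naive(int *a, size_t stride, size_t i, size_t n) {
    for (size_t j = 0; j + 1 < n; j++) {
        a[i * stride + j] = a[i * stride + j + 1] + 1;
    }
}

/* After: the invariant row address is computed once, before the loop. */
void shift_row_hoisted(int *a, size_t stride, size_t i, size_t n) {
    int *row = a + i * stride;   /* loop invariant in the j loop */
    for (size_t j = 0; j + 1 < n; j++) {
        row[j] = row[j + 1] + 1;
    }
}
```

Both functions leave the array in the same state; only the per-iteration address arithmetic differs.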
235+
236+
237+
238+
232239
## when is it ok to move a computation
233240

234241
1. no side effects - cannot move alloc 10 outside of loop;
@@ -303,4 +310,244 @@ iterate to convergence
303310
we want to process inner loops first
304311

305312
1. add all the pre-headers
306-
1. walk the dominator tree in reverse post order - saving all the loop headers
313+
1. walk the dominator tree in reverse post order - saving all the loop headers
314+
315+
316+
317+
## def of loop invariant for an instruction d = op a,b
318+
319+
1. a,b are constants or,
320+
1. a,b defined outside the loop
321+
1. a,b are loop invariants
322+
323+
In SSA form, if we find a loop-invariant instruction, we can always move it into the pre-header, because the value it writes is never rewritten and the values that it depends on come from outside the loop.
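
The three-part definition can be sketched as a fixed-point marking pass. This is a simplified illustration, not a full implementation: the `struct instr` encoding, the `OUTSIDE` marker, and the function name are all hypothetical, header phis are conservatively treated as never invariant, and side effects are not modeled.

```c
#include <stdbool.h>

#define OUTSIDE (-1)   /* operand is a constant or defined outside the loop */

/* Hypothetical instruction d = op a, b inside a loop body. Each operand is
   either OUTSIDE or the index of the in-loop instruction that defines it. */
struct instr {
    int a, b;        /* operand definitions */
    bool is_phi;     /* phi at the loop header: treated as varying */
};

/* Mark invariant[i] for every instruction whose operands are all constants,
   outside definitions, or already-marked invariants. Iterates until nothing
   changes and returns the number of invariant instructions found. */
int mark_invariant(const struct instr *body, int n, bool *invariant) {
    int count = 0;
    for (int i = 0; i < n; i++) invariant[i] = false;
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < n; i++) {
            if (invariant[i] || body[i].is_phi) continue;
            bool a_ok = body[i].a == OUTSIDE || invariant[body[i].a];
            bool b_ok = body[i].b == OUTSIDE || invariant[body[i].b];
            if (a_ok && b_ok) {
                invariant[i] = true;
                changed = true;
                count++;
            }
        }
    }
    return count;
}
```

For a body such as `i1 = phi(i0, i2); b = a * 2; d = 5 + b; i2 = i1 + 1` with `a` defined outside the loop, this marks `b` and `d` and leaves the two `i` instructions alone.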
324+
325+
326+
327+
## test at the bottom - loop always executes at least once
328+
```{mermaid}
329+
%%{init: {"flowchart": {"htmlLabels": false}} }%%
330+
331+
graph TD
332+
l0["l0:"]
333+
pre["preheader"]
334+
l1["l1: i = i +1"]
335+
d1["d1 = a ⊕ b"]
336+
d2[" = d1"]
337+
l0-->pre
338+
pre --> l1
339+
l1--> d1
340+
d1--> d2
341+
d3["(i < N) goto L1"]
342+
d2--> d3
343+
d3--> l1
344+
d3--> Next
345+
```
346+
can move d1 into the preheader
347+
348+
## test at the top
349+
350+
351+
```{mermaid}
352+
%%{init: {"flowchart": {"htmlLabels": false}} }%%
353+
graph TD
354+
l0["l0:d = 0"]
355+
pre["preheader"]
356+
l1["if (i>=N) goto L2 \nl1: i = i +1"]
357+
l0-->pre
358+
pre --> l1
359+
d1["d1 = a ⊕ b"]
360+
d2[" = d1"]
361+
l1--> d1
362+
d1--> d2
363+
d2--> l1
364+
l1--> l2
365+
l2--> next
366+
367+
```
368+
369+
we can always convert a loop with the test at the top into
370+
371+
```
372+
373+
loop test at top
374+
375+
376+
if (test) {
377+
preheader
378+
loop test at bottom
379+
}
380+
```
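
A hedged C sketch of this conversion follows; the function names and the `a * b` body are illustrative assumptions. The guard `if` makes sure the preheader code runs only when the loop would have run at least once.

```c
/* Test at the top: a * b sits behind the loop test, so placing it in a
   preheader would compute it even when the loop runs zero times. */
long sum_top_tested(long a, long b, int n) {
    long result = 0;
    int i = 0;
    while (i < n) {
        long d = a * b;      /* loop invariant */
        result += d;
        i++;
    }
    return result;
}

/* Converted: one guard test, then a preheader, then a bottom-tested loop. */
long sum_inverted(long a, long b, int n) {
    long result = 0;
    int i = 0;
    if (i < n) {
        long d = a * b;      /* preheader: runs only if the loop runs */
        do {
            result += d;
            i++;
        } while (i < n);
    }
    return result;
}
```

The two functions return the same value for every `n`, including `n <= 0`.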
381+
382+
383+
384+
## induction variable elimination
385+
386+
```
387+
for (int i = 0; i < 100; ++i){
388+
f(a[i])
389+
}
390+
```
391+
392+
The address of a[i] is calculated as &a[0] + 4 * i in every loop iteration, but the values at successive steps differ only by 4
393+
394+
We want to change the multiply and add to an add
395+
396+
## Transformation
397+
398+
1. a_i = &a[0] before the loop
399+
1. a_i = a_i + 4 (add the stride) in every iteration
400+
1. the only remaining use of i is the test i < 100, which could become a_i < &a[0] + 4*100 (which is loop invariant)
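
The three steps above can be sketched in C with a pointer standing in for `a_i`. This is an illustration under assumptions: summing the elements replaces the unspecified `f`, and the function names are made up. C pointer arithmetic already scales by the element size, so `++a_i` is the "add 4" step for a 4-byte `int`.

```c
/* Before: each a[i] implies the address computation &a[0] + 4 * i. */
int sum_indexed(const int *a) {
    int sum = 0;
    for (int i = 0; i < 100; ++i) {
        sum += a[i];
    }
    return sum;
}

/* After: a_i starts at &a[0] and advances by one element per iteration;
   the test compares against a loop-invariant end address, so i is gone. */
int sum_pointer(const int *a) {
    int sum = 0;
    const int *end = a + 100;   /* &a[0] + 100 elements, loop invariant */
    for (const int *a_i = a; a_i < end; ++a_i) {
        sum += *a_i;
    }
    return sum;
}
```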
401+
402+
403+
## step 1
404+
405+
1. find basic induction variables
406+
i = i + e, where e is loop invariant
407+
408+
what does this look like in ssa
409+
410+
411+
for each instruction d = c +- loop invariant
412+
see if there is a strongly connected graph in the ssa edges that only has adds and subtracts of loop invariant expressions
413+
414+
## Step 2 find auxiliary induction variables
415+
416+
j = basic_ind * loop_inv + loop_inv
417+
418+
```
419+
for (int i = 0; i < n; i++) {
420+
j = 2*i + 1; // Y
421+
k = -i; // Y
422+
l = 2*i*i + 1; // N
423+
c = c + 5; // Y*
424+
}
425+
```
426+
427+
## step 3
428+
429+
replace auxiliary (derived) induction variables by new variables without the multiply
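
A small sketch of this replacement, using the `j = 2*i + 1` and `k = -i` cases from step 2. The names `j2` and `k2` are hypothetical new variables, and accumulating into `total` is only there to give the loop an observable result.

```c
/* Before: j and k are recomputed from i on every iteration
   (a multiply by 2 and a multiply by -1). */
long aux_with_multiply(int n) {
    long total = 0;
    for (int i = 0; i < n; i++) {
        int j = 2 * i + 1;
        int k = -i;
        total += j + k;
    }
    return total;
}

/* After: j2 and k2 start at the values for i = 0 and are bumped by a
   constant step each iteration, so no multiply remains in the loop. */
long aux_strength_reduced(int n) {
    long total = 0;
    int j2 = 1;    /* 2*0 + 1 */
    int k2 = 0;    /* -0 */
    for (int i = 0; i < n; i++) {
        total += j2 + k2;
        j2 += 2;   /* i grows by 1, so 2*i + 1 grows by 2 */
        k2 -= 1;   /* i grows by 1, so -i shrinks by 1 */
    }
    return total;
}
```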
430+
431+
## step 4
432+
433+
if the only remaining use of the induction variable is the termination test, change the test to use the new variable
434+
435+
```
436+
sum = 0
437+
for (i = 1; i <= 100; i++) {
438+
sum = sum + a[i -1]
439+
}
440+
```
441+
442+
in SSA form:
443+
444+
```
445+
sum0 = 0
446+
i0 = 1
447+
L: sum1 = phi(sum0, sum2)
448+
i1 = phi(i0, i2)
449+
t10 = i1 -1
450+
t20 = t10 * 4
451+
t30 = t20 + &a
452+
t40 = load t30
453+
sum2 = sum1 + t40
454+
i2 = i1 + 1
455+
if (i2 <= 100) goto L
456+
```
457+
458+
1) i is a basic induction variable
459+
1) t10 is an aux induction variable
460+
1) t20 is an aux induction variable
461+
1) t30 is an aux induction variable
462+
463+
t30 has a use in the load
464+
465+
t30 = t20 + &a ==> t10 * 4 + &a ==> (i1 - 1) * 4 + &a
466+
467+
t30 = 4 * i1 + &a - 4
468+
469+
470+
```
471+
sum0 = 0
472+
i0 = 1
473+
t50 = &a // initial value: 4*i0 + &a - 4 with i0 = 1
474+
L: sum1 = phi(sum0, sum2)
475+
i1 = phi(i0, i2)
476+
t51 = phi(t50, t52)
477+
//t10 = i1 -1
478+
//t20 = t10 * 4
479+
//t30 = t20 + &a
480+
t40 = load t51
481+
sum2 = sum1 + t40
482+
i2 = i1 + 1
483+
t52 = t51 + 4
484+
if (i2 <= 100) goto L
485+
```
486+
487+
488+
489+
```
490+
sum0 = 0
491+
i0 = 1
492+
t50 = &a // initial value: 4*i0 + &a - 4 with i0 = 1
493+
L: sum1 = phi(sum0, sum2)
494+
// i1 = phi(i0, i2)
495+
t51 = phi(t50, t52)
496+
//t10 = i1 -1
497+
//t20 = t10 * 4
498+
//t30 = t20 + &a
499+
t40 = load t51
500+
sum2 = sum1 + t40
501+
//i2 = i1 + 1
502+
t52 = t51 + 4
503+
if (t52 <= 396 + &a) goto L
504+
```
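
The end result can be checked with a C sketch that mirrors the SSA names. It rests on stated assumptions: a 4-byte `int` (enforced by the static assertion), a 100-element array, and byte-addressed arithmetic done through `const char *`. The variable `t5` plays the role of the phi-merged address; its first value is `4*i0 + &a - 4` with `i0 = 1`, which is `&a` itself.

```c
_Static_assert(sizeof(int) == 4, "sketch assumes 4-byte int");

/* Original loop: sums a[0] .. a[99] using i = 1 .. 100. */
int sum_original(const int *a) {
    int sum = 0;
    for (int i = 1; i <= 100; i++) {
        sum = sum + a[i - 1];
    }
    return sum;
}

/* After strength reduction and linear test replacement: i is gone. */
int sum_reduced(const int *a) {
    int sum = 0;
    const char *t5 = (const char *)a;          /* address of a[0] */
    const char *last = (const char *)a + 396;  /* loop-invariant bound */
    do {
        sum = sum + *(const int *)t5;          /* the load */
        t5 = t5 + 4;                           /* add the stride */
    } while (t5 <= last);                      /* replaces i2 <= 100 */
    return sum;
}
```

The bottom test works on the already-incremented address, matching the SSA form, so the last load is at byte offset 396.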
505+
506+
507+
## loop un-switching
508+
509+
```
510+
for (int i = 0 ; i < 100; ++i){
511+
if (c) { // c is loop invariant
512+
f(i)
513+
} else {
514+
g(i)
515+
}
516+
}
517+
```
518+
519+
look for special patterns and replace
520+
521+
522+
```
523+
if (c) { // c is loop invariant
524+
for (int i = 0 ; i < 100; ++i){
525+
f(i)
526+
}
527+
}else {
528+
for (int i = 0 ; i < 100; ++i){
529+
g(i)
530+
}
531+
}
532+
```
533+
534+
This is often done before vectorization
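
A compilable sketch of the same transformation, assuming hypothetical bodies for `f` and `g` and accumulating their results so the two versions can be compared:

```c
/* f and g are hypothetical stand-ins for the two loop bodies. */
static int f(int i) { return 2 * i; }
static int g(int i) { return i + 1; }

/* Before: the loop-invariant condition c is tested on every iteration. */
int total_switched(int c) {
    int total = 0;
    for (int i = 0; i < 100; ++i) {
        if (c) {
            total += f(i);
        } else {
            total += g(i);
        }
    }
    return total;
}

/* After unswitching: c is tested once, and each copy of the loop has a
   body with no conditional in it. */
int total_unswitched(int c) {
    int total = 0;
    if (c) {
        for (int i = 0; i < 100; ++i) {
            total += f(i);
        }
    } else {
        for (int i = 0; i < 100; ++i) {
            total += g(i);
        }
    }
    return total;
}
```

The cost is code size: the loop is duplicated once per invariant condition that gets unswitched.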
535+
536+
537+
538+
## loop fusion
539+
```
540+
for (i = 0; i < 100 ; i++){
541+
s0: b[i] = f(a[i])
542+
}
543+
for (i = 0; i < 100 ; i++){
544+
s1: c[i] = f(b[i])
545+
}
546+
```
547+
548+
1) when is it legal to do this?
549+
1) When can we get rid of the b array?
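
One possible answer to both questions, as a sketch under assumptions: `f` is a hypothetical pure function, the arrays do not overlap, and nothing after the loops reads `b`. Here `s1` in iteration `i` reads only the `b[i]` that `s0` wrote in the same iteration, so fusing does not reorder any dependent pair, and `b` can shrink to a scalar.

```c
/* f is a hypothetical pure function standing in for the source's f. */
static int f(int x) { return 3 * x + 1; }

/* Before: two loops, with b holding the intermediate results. */
void two_loops(const int *a, int *b, int *c) {
    for (int i = 0; i < 100; i++) {
        b[i] = f(a[i]);          /* s0 */
    }
    for (int i = 0; i < 100; i++) {
        c[i] = f(b[i]);          /* s1 */
    }
}

/* After fusion: both statements share one loop body, and the array b is
   replaced by the scalar b_i because no other code reads it. */
void fused(const int *a, int *c) {
    for (int i = 0; i < 100; i++) {
        int b_i = f(a[i]);       /* s0, with b[i] shrunk to a scalar */
        c[i] = f(b_i);           /* s1 */
    }
}
```

If `s1` instead read `b[i + 1]`, the fused loop would read a value that `s0` had not yet written, and fusion would not be legal as written.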
550+
551+
There is also an optimization that goes the other way
552+
(loop fission): split a loop so that each statement becomes a separate loop, in case the resulting loops can run as vector operations
553+
