Loop invariant code motion recognizes computations in a loop that produce the same value on each iteration and moves them out of the loop.
23
21
24
-
## A very common case for this is matrix addressing
22
+
Loop optimizations are important because
25
23
26
-
a[i,j] might expand to $i*4*\operatorname{stride}_{a} + j*4$
24
+
1. typically there is a regular access pattern
25
+
1. the body of a loop gets repeated
26
+
1. compilers often assume a loop body executes $10^{depth}$ times, where depth is the loop nesting depth
27
27
28
-
```
29
-
for j
30
-
a[i,j] = f(a[i,j+1])
31
-
```
32
28
33
-
turns into
29
+
## What are classic loop optimizations?
34
30
31
+
1. Loop Invariant Code Motion
32
+
1. Induction Variable Recognition
33
+
1. Strength Reduction
34
+
1. Linear Test Replacement
35
+
1. Loop Unrolling
35
36
36
-
```
37
-
a =
38
-
b =
39
-
result = 0
40
-
for (){
41
-
result += a*b
42
-
}
43
-
```
44
-
45
-
## when is a computation loop invariant
46
-
47
-
f(a,b)
37
+
## Loop Invariant Code Motion
48
38
49
-
1. a,b are constants
50
-
1. a,b are defined outside the loop,
51
-
1. a,b are loop invariant
39
+
recall natural loops
52
40
53
-
if in ssa there is only one reaching def
54
-
55
-
for non ssa, we need
56
-
57
-
1. all reaching defs of a,b defined outside of the loop
58
-
1. only one def of a,b reaches the computation and that def is loop invariant
59
-
60
-
## when is a computation loop invariant
61
-
62
-
```
63
-
a defined outside of the loop (need one definition or ssa )
64
-
65
-
b = a*2
66
-
67
-
d = 5 + b is loop invariant if b
68
-
```
69
-
70
-
In ssa a variable is loop invariant if it does not have a phi function at the header of the
71
-
loop, or it is defined by a phi function and all the inputs are loop constants
72
-
73
-
74
-
If we find a loop-invariant computation in SSA form, we can simply move it out of the loop to a block before the loop. Moving a side-effect-free SSA loop-invariant computation to an earlier position is safe, because the value it computes cannot be overwritten later and the values it depends on cannot have changed.
41
+
1. strongly connected region in the CFG
42
+
1. one entry point (dominates all the nodes in the loop)
75
43
76
44
77
45
@@ -229,6 +197,45 @@ posttest --> next
229
197
:::
230
198
231
199
200
+
201
+
202
+
203
+
In ssa a variable is loop invariant if it does not have a phi function at the header of the
204
+
loop, or it is defined by a phi function and all the inputs are loop constants
205
+
206
+
207
+
If we find a loop-invariant computation in SSA form, we can simply move it out of the loop to a block before the loop. Moving a side-effect-free SSA loop-invariant computation to an earlier position is safe, because the value it computes cannot be overwritten later and the values it depends on cannot have changed.
208
+
209
+
210
+
211
+
## LICM
212
+
213
+
Loop invariant code motion recognizes computations in a loop that produce the same value on each iteration and moves them out of the loop.
214
+
215
+
## A very common case for this is matrix addressing
216
+
217
+
a[i,j] might expand to $i*4*\operatorname{stride}_{a} + j*4$
218
+
219
+
```
220
+
for j
221
+
a[i,j] = f(a[i,j+1])
222
+
```
223
+
224
+
turns into
225
+
226
+
227
+
```
228
+
a =
229
+
b =
230
+
result = 0
231
+
for (){
232
+
result += a*b
233
+
}
234
+
```
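As a rough sketch of the matrix-addressing case, the two functions below compute the same byte offsets for one row, once with the full expression inside the loop and once with the row term hoisted. The 4-byte element size comes from the formula above; the function names and the `n_cols` parameter are hypothetical and exist only for this illustration.

```python
ELEM_SIZE = 4  # assumed element size in bytes, as in the formula above


def offsets_naive(i, n_cols, stride_a):
    """Recompute the whole address expression on every iteration of the j loop."""
    out = []
    for j in range(n_cols):
        out.append(i * ELEM_SIZE * stride_a + j * ELEM_SIZE)
    return out


def offsets_hoisted(i, n_cols, stride_a):
    """Hoist the part that does not depend on j out of the loop."""
    row_base = i * ELEM_SIZE * stride_a  # loop invariant: computed once
    out = []
    for j in range(n_cols):
        out.append(row_base + j * ELEM_SIZE)
    return out
```

Both versions return the same list; only the amount of work inside the loop changes.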
235
+
236
+
237
+
238
+
232
239
## when is it ok to move a computation
233
240
234
241
1. no side effects - cannot move alloc 10 outside of loop;
@@ -303,4 +310,244 @@ iterate to convergence
303
310
we want to process inner loops first
304
311
305
312
1. add all the pre-headers
306
-
1. walk the dominator tree in reverse post order - saving all the loop headers
313
+
1. walk the dominator tree in reverse post order - saving all the loop headers
314
+
315
+
316
+
317
+
## def of loop invariant for an instruction d = op a,b
318
+
319
+
1. a,b are constants, or
320
+
1. a,b are defined outside the loop, or
321
+
1. a,b are themselves loop invariant
322
+
323
+
In SSA form, if we find a loop-invariant instruction with no side effects, we can always move it into the pre-header, because the value it writes is never rewritten and the values that it depends on come from outside the loop.
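The definition above can be sketched as a small fixed-point computation. This is a minimal illustration rather than a real compiler pass: the `(dest, op, args)` tuple encoding, the function name `find_loop_invariants`, and the assumption that every non-phi op is free of side effects are all invented for this example.

```python
def find_loop_invariants(loop_instrs, defined_outside):
    """Return the names of loop-invariant instructions.

    loop_instrs: list of (dest, op, args) tuples in SSA form, where each
                 arg is either a variable name (str) or an int constant.
    defined_outside: set of names defined before the loop.
    """
    invariant = set()
    changed = True
    while changed:  # iterate to convergence
        changed = False
        for dest, op, args in loop_instrs:
            if dest in invariant or op == "phi":
                continue  # phis in the loop are never treated as invariant here
            if all(
                isinstance(a, int) or a in defined_outside or a in invariant
                for a in args
            ):
                invariant.add(dest)
                changed = True
    return invariant


# Hypothetical loop body: b = a*2 and d = 5 + b are invariant, t is not.
loop = [
    ("i1", "phi", ["i0", "i2"]),
    ("b", "mul", ["a", 2]),
    ("d", "add", [5, "b"]),
    ("t", "mul", ["i1", 4]),
    ("i2", "add", ["i1", 1]),
]
hoistable = find_loop_invariants(loop, {"a", "i0"})
```

Here `d` only becomes invariant after `b` has been marked, which is why the loop repeats until nothing changes.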
324
+
325
+
326
+
327
+
## test at the bottom - loop always executes at least once
328
+
```{mermaid}
329
+
%%{init: {"flowchart": {"htmlLabels": false}} }%%
330
+
331
+
graph TD
332
+
l0["l0:"]
333
+
pre["preheader"]
334
+
l1["l1: i = i +1"]
335
+
d1["d1 = a ⊕ b"]
336
+
d2[" = d1"]
337
+
l0-->pre
338
+
pre --> l1
339
+
l1--> d1
340
+
d1--> d2
341
+
d3["(i < N) goto L1"]
342
+
d2--> d3
343
+
d3--> l1
344
+
d3--> Next
345
+
```
346
+
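d1 can be moved to the preheader: the body always executes at least once, so the hoisted computation would have run anyway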
347
+
348
+
## test at the top
349
+
350
+
351
+
```{mermaid}
352
+
%%{init: {"flowchart": {"htmlLabels": false}} }%%
353
+
graph TD
354
+
l0["l0:d = 0"]
355
+
pre["preheader"]
356
+
l1["if (i>=N) goto L2 \nl1: i = i +1"]
357
+
l0-->pre
358
+
pre --> l1
359
+
d1["d1 = a ⊕ b"]
360
+
d2[" = d1"]
361
+
l1--> d1
362
+
d1--> d2
363
+
d2--> l1
364
+
l1--> l2
365
+
l2--> next
366
+
367
+
```
368
+
369
+
a loop with the test at the top can always be converted into a guarded loop with the test at the bottom:
370
+
371
+
```
372
+
373
+
loop test at top
374
+
375
+
376
+
if (test) {
377
+
preheader
378
+
loop test at bottom
379
+
}
380
+
```
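A small sketch of this conversion, assuming a simple counting loop. Python has no do-while, so `while True` with a `break` at the end stands in for the bottom-tested loop; the function names are hypothetical.

```python
def sum_test_at_top(n):
    """Original form: the test runs before every iteration."""
    i, total = 0, 0
    while i < n:
        total += i
        i += 1
    return total


def sum_test_at_bottom(n):
    """Converted form: one guard, then a loop that tests at the bottom."""
    i, total = 0, 0
    if i < n:  # guard: same condition as the original loop test
        # preheader: hoisted loop-invariant code would be placed here,
        # and it only runs if the loop body runs at least once
        while True:
            total += i
            i += 1
            if not (i < n):  # test at the bottom
                break
    return total
```

Because the preheader sits inside the guard, code hoisted into it is never executed when the original loop would have run zero times.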
381
+
382
+
383
+
384
+
## induction variable elimination
385
+
386
+
```
387
+
for (int i = 0; i < 100; ++i){
388
+
f(a[i])
389
+
}
390
+
```
391
+
392
+
the address of a[i] is calculated as &a[0] + 4 * i in every loop iteration, but the addresses in consecutive iterations differ only by 4
393
+
394
+
We want to replace the multiply and add with a single add
395
+
396
+
## Transformation
397
+
398
+
1. a_i = &a[0] before the loop
399
+
1. a_i = a_i + 4 (add the stride) in every iteration
400
+
1. the only remaining use of i is the test i < 100, which could become a_i < &a[0] + 4*100 (which is loop invariant)
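The three steps above can be sketched by modelling addresses as plain integers. This is only an illustration: `base` stands in for `&a[0]`, the 4-byte element size is taken from the text, and both function names are hypothetical.

```python
ELEM_SIZE = 4  # bytes per element, as in the text


def addresses_with_multiply(base, n):
    """Compute &a[0] + 4 * i from scratch in every iteration."""
    out = []
    for i in range(n):
        out.append(base + ELEM_SIZE * i)
    return out


def addresses_with_add(base, n):
    """Keep a running address a_i and add the stride each iteration."""
    out = []
    a_i = base                    # a_i = &a[0] before the loop
    end = base + ELEM_SIZE * n    # loop-invariant bound replacing i < n
    while a_i < end:
        out.append(a_i)
        a_i += ELEM_SIZE          # add the stride
    return out
```

The second version no longer mentions `i` at all: the multiply is gone and the loop test compares the running address against an invariant bound.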
401
+
402
+
403
+
## step 1
404
+
405
+
1. find basic induction variables
406
+
i = i + e, where e is loop invariant
407
+
408
+
what does this look like in SSA?
409
+
410
+
411
+
for each instruction d = c +- loop invariant
412
+
see if there is a strongly connected component in the SSA graph that contains only the phi at the loop header plus adds and subtracts of loop-invariant expressions
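A minimal sketch of this search, restricted to the simplest cycle: a header phi `i1 = phi(i0, i2)` whose back-edge input is `i2 = i1 ± e` with `e` loop invariant. Longer cycles through several adds are not handled. The `(dest, op, args)` encoding and the function names are assumptions made for this example.

```python
def is_invariant(x, invariant_names):
    """Constants and names known to be loop invariant."""
    return isinstance(x, int) or x in invariant_names


def find_basic_ivs(loop_instrs, invariant_names):
    """Map each basic induction variable (the phi) to its (op, step)."""
    defs = {dest: (op, args) for dest, op, args in loop_instrs}
    basic = {}
    for dest, (op, args) in defs.items():
        if op != "phi":
            continue
        for arg in args:
            if arg not in defs:
                continue  # initial value, defined outside the loop
            uop, uargs = defs[arg]
            if uop not in ("add", "sub") or len(uargs) != 2:
                continue
            x, y = uargs
            if uop == "add" and y == dest:
                x, y = y, x  # addition commutes, subtraction does not
            if x == dest and is_invariant(y, invariant_names):
                basic[dest] = (uop, y)
    return basic


# Hypothetical encoding of a summation loop over an array a.
loop = [
    ("sum1", "phi", ["sum0", "sum2"]),
    ("i1", "phi", ["i0", "i2"]),
    ("t10", "sub", ["i1", 1]),
    ("t20", "mul", ["t10", 4]),
    ("t30", "add", ["t20", "&a"]),
    ("t40", "load", ["t30"]),
    ("sum2", "add", ["sum1", "t40"]),
    ("i2", "add", ["i1", 1]),
]
basic_ivs = find_basic_ivs(loop, {"&a"})
```

`sum1` is rejected because its update adds the loaded value `t40`, which is not loop invariant.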
413
+
414
+
## Step 2 find auxiliary induction variables
415
+
416
+
j = basic_ind * loop_inv1 + loop_inv2
417
+
418
+
```
419
+
for (int i = 0; i < n; i++) {
420
+
j = 2*i + 1; // Y: linear in i
421
+
k = -i; // Y: linear in i
422
+
l = 2*i*i + 1; // N: quadratic in i
423
+
c = c + 5; // Y*: c is itself a basic induction variable
424
+
}
425
+
```
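One informal way to see why the Y and N cases differ is to check numerically whether a variable changes by the same amount on every iteration. This is a sanity check on sample values rather than a compiler analysis, and the helper name `constant_step` is hypothetical.

```python
def constant_step(f, n):
    """Return the per-iteration step of f(i) if it is constant over 0..n, else None."""
    diffs = {f(i + 1) - f(i) for i in range(n)}
    return diffs.pop() if len(diffs) == 1 else None


step_j = constant_step(lambda i: 2 * i + 1, 10)      # j = 2*i + 1
step_k = constant_step(lambda i: -i, 10)             # k = -i
step_l = constant_step(lambda i: 2 * i * i + 1, 10)  # l = 2*i*i + 1
```

A constant step means the multiply can be replaced by adding that step each iteration; for `l` the step itself changes with `i`, so a single add is not enough.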
426
+
427
+
## step 3
428
+
429
+
replace auxiliary (derived) induction variables with new variables that are updated by an add instead of a multiply
430
+
431
+
## step 4
432
+
433
+
if the only remaining use of the induction variable is the termination test, change the test to use the new variable
434
+
435
+
```
436
+
sum = 0
437
+
for (i = 1; i < 100; i++) {
438
+
sum = sum + a[i -1]
439
+
}
440
+
```
441
+
442
+
in SSA form:
443
+
444
+
```
445
+
sum0 = 0
446
+
i0 = 1
447
+
L: sum1 = phi(sum0, sum2)
448
+
i1 = phi(i0, i2)
449
+
t10 = i1 -1
450
+
t20 = t10 * 4
451
+
t30 = t20 + &a
452
+
t40 = load t30
453
+
sum2 = sum1 + t40
454
+
i2 = i1 + 1
455
+
if (i2 < 100) goto L
456
+
```
457
+
458
+
1) i is a basic induction variable
459
+
1) t10 is an aux induction variable
460
+
1) t20 is an aux induction variable
461
+
1) t30 is an aux induction variable
462
+
463
+
t30 has a use in the load
464
+
465
+
t30 = t20 + &a ==> t10 * 4 + &a ==> (i1 - 1) * 4 + &a
466
+
467
+
t30 = 4 * i1 + &a - 4
468
+
469
+
470
+
```
471
+
sum0 = 0
472
+
i0 = 1
473
+
t50 = &a // initial value: 4*i0 + &a - 4 with i0 = 1
474
+
L: sum1 = phi(sum0, sum2)
475
+
i1 = phi(i0, i2)
476
+
t51 = phi(t50, t52)
477
+
//t10 = i1 -1
478
+
//t20 = t10 * 4
479
+
//t30 = t20 + &a
480
+
t40 = load t51
481
+
sum2 = sum1 + t40
482
+
i2 = i1 + 1
483
+
t52 = t51 + 4
484
+
if (i2 < 100) goto L
485
+
```
486
+
487
+
488
+
489
+
```
490
+
sum0 = 0
491
+
i0 = 1
492
+
t50 = &a // initial value: 4*i0 + &a - 4 with i0 = 1
493
+
L: sum1 = phi(sum0, sum2)
494
+
// i1 = phi(i0, i2)
495
+
t51 = phi(t50, t52)
496
+
//t10 = i1 -1
497
+
//t20 = t10 * 4
498
+
//t30 = t20 + &a
499
+
t40 = load t51
500
+
sum2 = sum1 + t40
501
+
//i2 = i1 + 1
502
+
t52 = t51 + 4
503
+
if (t52 < 396 + &a) goto L
504
+
```
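As a sanity check on the transformation, the sketch below runs the source loop (`i` from 1 while `i < 100`, reading `a[i - 1]`) next to a pointer-stepping version with no `i` at all. Memory is modelled as a dict from byte address to value, and the pointer starts at the value of `4*i + &a - 4` for `i = 1`, which is `&a`. The names `original_sum`, `reduced_sum`, `memory`, and `base` are hypothetical.

```python
def original_sum(a):
    """The source loop: sum a[i - 1] for i = 1 .. 99."""
    total = 0
    for i in range(1, 100):
        total += a[i - 1]
    return total


def reduced_sum(memory, base):
    """Strength-reduced loop with linear test replacement.

    memory: dict mapping byte address -> value
    base:   address of a[0], standing in for &a
    """
    total = 0
    t5 = base           # 4*i + &a - 4 evaluated at i = 1
    limit = base + 396  # loop invariant: 4*100 + &a - 4
    while True:         # test at the bottom, as in the SSA code
        total += memory[t5]  # load through the running address
        t5 += 4              # step by the element size
        if not (t5 < limit):
            break
    return total


a = list(range(100))
base = 1000
memory = {base + 4 * k: a[k] for k in range(100)}
```

Both loops read `a[0]` through `a[98]`, so they agree on any array of at least 99 elements.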
505
+
506
+
507
+
## loop un-switching
508
+
509
+
```
510
+
for (int i = 0 ; i < 100; ++i){
511
+
if (c) { // c is loop invariant
512
+
f(i)
513
+
} else {
514
+
g(i)
515
+
}
516
+
}
517
+
```
518
+
519
+
look for special patterns and replace
520
+
521
+
522
+
```
523
+
if (c) { // c is loop invariant
524
+
for (int i = 0 ; i < 100; ++i){
525
+
f(i)
526
+
}
527
+
}else {
528
+
for (int i = 0 ; i < 100; ++i){
529
+
g(i)
530
+
}
531
+
}
532
+
```
533
+
534
+
This is often done before vectorization
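A short sketch of un-switching, assuming `f` and `g` are arbitrary functions and `c` does not change inside the loop. The function names are hypothetical.

```python
def run_switched(c, n, f, g):
    """Original form: the invariant condition is tested on every iteration."""
    out = []
    for i in range(n):
        if c:
            out.append(f(i))
        else:
            out.append(g(i))
    return out


def run_unswitched(c, n, f, g):
    """Un-switched form: the condition is tested once, outside two loops."""
    out = []
    if c:
        for i in range(n):
            out.append(f(i))
    else:
        for i in range(n):
            out.append(g(i))
    return out
```

Each resulting loop has a straight-line body, at the cost of duplicating the loop.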
535
+
536
+
537
+
538
+
## loop fusion
539
+
```
540
+
for (i = 0; i < 100 ; ++i){
541
+
s0: b[i] = f(a[i])
542
+
}
543
+
for (i = 0; i < 100 ; ++i){
544
+
s1: c[i] = f(b[i])
545
+
}
546
+
```
547
+
548
+
1) when is it legal to do this?
549
+
1) When can we get rid of the b array?
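A sketch of one answer to both questions for this particular pair of loops. Fusion is safe here because s1 in iteration `i` reads only `b[i]`, which s0 produced in the same iteration. The second function additionally assumes that nothing after the loops reads `b`, so the array can shrink to a scalar. The function names are hypothetical.

```python
def unfused(a, f):
    """Two separate loops with an intermediate array b."""
    n = len(a)
    b = [None] * n
    c = [None] * n
    for i in range(n):
        b[i] = f(a[i])  # s0
    for i in range(n):
        c[i] = f(b[i])  # s1
    return c


def fused(a, f):
    """One loop; b[i] becomes a scalar because it is not used afterwards."""
    n = len(a)
    c = [None] * n
    for i in range(n):
        b_i = f(a[i])   # s0, with the array element replaced by a scalar
        c[i] = f(b_i)   # s1
    return c
```

If s1 instead read `b[i + 1]`, the fused loop would read an element that s0 has not written yet, and the transformation would not be legal.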
550
+
551
+
There is also an optimization that goes the other way
552
+
split a loop so that each statement becomes a separate loop, in case the resulting loops can be run as vector operations