Reorder scheduling order of MBBs #993
Force-pushed from ca5bced to 78f97ca.
; loop: inner loop %for.body55.i82 and outer loop body %for.body.i68.
;
; RUN: llc -mtriple=aie2ps -verify-machineinstrs < %s | FileCheck %s
};
// Phase 1: (single-block) loops first, so they aren't constrained by
// their epilogues.
nit: Loops will not be constrained anyway. The real reason is that the loop schedule should be taken into account by the prologue and epilogue.
// post-order. post_order(E) ends with E itself, so E is scheduled AFTER
// everything reachable from it (its successors are already done -> precise
// inter-block latency for E) and BEFORE its non-loop predecessors. The
// latter is what keeps MaxLatencyFinder in precise mode for SWP prologues
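The post-order property this comment relies on can be checked with a minimal sketch. It uses a hypothetical adjacency-list CFG (`CFG`, `postOrder`) instead of LLVM's `post_order` iterator. The DFS appends a block only after all of its not-yet-visited successors, so the start block comes out last, after every block reachable from it, even when the CFG contains cycles.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical CFG: Succs[i] lists the successor indices of block i.
using CFG = std::vector<std::vector<std::size_t>>;

// DFS post-order from Entry: a block is appended only after all of its
// unvisited successors, so Entry (the DFS root) is the last element and
// follows every block reachable from it. The Visited check handles cycles.
std::vector<std::size_t> postOrder(const CFG &Succs, std::size_t Entry) {
  assert(Entry < Succs.size() && "entry block must exist");
  std::vector<std::size_t> Order;
  std::vector<bool> Visited(Succs.size(), false);
  std::function<void(std::size_t)> Visit = [&](std::size_t B) {
    Visited[B] = true;
    for (std::size_t S : Succs[B])
      if (!Visited[S])
        Visit(S);
    Order.push_back(B);
  };
  Visit(Entry);
  return Order;
}
```

For `{{1, 2}, {2}, {}}` starting at block 0, this yields `2, 1, 0`: block 0 is scheduled only after both of its successors.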
I think prologues are irrelevant, except that a prologue is frequently a successor of the epilogue. The goal is to put the epilogue in precise mode.
I have this feeling as well.
This could be:
- insert loops
- insert non-epilogues
- insert epilogues.
We could also insert all blocks and reorder them with a custom comparison function: Loop < Non-epilogue < Epilogue.
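A minimal sketch of the comparator-based variant. The `Block` record and the `NonEpilogue`/`Epilogue` enumerators are hypothetical stand-ins for the pass's real types (only `BlockType::Loop` appears in the patch). Starting from the function's layout order and using `std::stable_sort` keeps blocks of the same kind in layout order, so the result is deterministic as well.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical classification; the real pass decides what counts as an
// epilogue on its own.
enum class BlockType { Loop, NonEpilogue, Epilogue };

struct Block {
  std::string Name; // stands in for the MachineBasicBlock *
  BlockType Kind;
};

// Rank used by the comparator: Loop < Non-epilogue < Epilogue.
static int kindRank(BlockType K) {
  switch (K) {
  case BlockType::Loop:
    return 0;
  case BlockType::NonEpilogue:
    return 1;
  case BlockType::Epilogue:
    return 2;
  }
  return 3; // not reached for valid enumerators
}

// Takes the blocks in layout order and stable-sorts them by rank, so ties
// keep their layout order.
std::vector<Block> scheduleOrder(std::vector<Block> Layout) {
  std::stable_sort(Layout.begin(), Layout.end(),
                   [](const Block &A, const Block &B) {
                     return kindRank(A.Kind) < kindRank(B.Kind);
                   });
  return Layout;
}
```

For a layout `a` (Epilogue), `b` (Loop), `c` (NonEpilogue), `d` (Loop), `e` (Epilogue), this yields `b, d, c, a, e`, so each epilogue is scheduled after its non-epilogue successors.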
for (auto &[MBB, BS] : Blocks) {
  if (BS.Kind == BlockType::Loop) {
    MBBSequence.push_back(MBB);
    Push(MBB);
We push loops in pointer order, which is not deterministic. This isn't performance-critical, so we can iterate over MF->Blocks and look up the block state for each one.
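A sketch of the deterministic alternative, assuming `Blocks` is a map keyed by block pointer whose iteration order depends on addresses. All names here are hypothetical stand-ins: block names instead of `MachineBasicBlock *`, a `std::vector` for the function's block layout, and a `std::unordered_map` (unspecified iteration order) for the block-state map.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

enum class BlockType { Loop, NonEpilogue, Epilogue };

struct BlockState {
  BlockType Kind;
};

// Walks the function layout and looks each block up in the state map,
// instead of iterating the map itself. The result therefore follows layout
// order and does not depend on hashing or on block addresses.
std::vector<std::string>
loopsInLayoutOrder(const std::vector<std::string> &Layout,
                   const std::unordered_map<std::string, BlockState> &Blocks) {
  std::vector<std::string> Seq;
  for (const std::string &MBB : Layout) {
    auto It = Blocks.find(MBB);
    if (It != Blocks.end() && It->second.Kind == BlockType::Loop)
      Seq.push_back(MBB);
  }
  return Seq;
}
```

The extra lookup per block is a hash probe, which matches the point that this path is not performance-critical.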
;
; RUN: llc -mtriple=aie2ps -verify-machineinstrs < %s | FileCheck %s
source_filename = "Work/aie/0_0/src/0_0.cc"
You can remove this metadata.
target triple = "aie2ps-none-unknown-elf"
$_Z6conv2dILh1EL5act_t0EaaaLb0ELb1EEvPT1_PT2_PT3_R13conv2d_paramsS6_S6_ = comdat any
; Function Attrs: mustprogress
nit: rename the function.
!18 = !{!8, !10, i64 84}
!19 = !{!8, !10, i64 120}
!20 = !{!21}
!21 = distinct !{!21, !22, !"_Z18conv2d_generic_optILh1EL5act_t0EL10out_mode_t0EaaaEvPiS2_S2_S2_S2_R13conv2d_params: %input"}
nit: rename this. See end-to-end tests.
; CHECK-NEXT: padda [p1], m0; movs dn4, r27; mov m5, r19 // Delay Slot 4
; CHECK-NEXT: vlda.ups.2x cml0, s0, upssign1, [p1, #0]; movs dc4, r5; mov m0, r21 // Delay Slot 3
; CHECK-NEXT: padda [p1], m7; paddb.3d [p0], d0; padds [p6], m5 // Delay Slot 2
; CHECK-NEXT: vlda.ups.2x cmh0, s0, upssign1, [p1, #0]; paddb.3d [p6], d1; movx srssign0, #0; mov r5, dc4 // Delay Slot 1
This is flying high :-)
This looks good! Some nits and suggestions!
Force-pushed from d94d0e7 to 789b8c6.
Force-pushed from 789b8c6 to f147f3e.
Provide accurate scoreboard information for epilogues by scheduling their prologues beforehand.
Awaiting QoR.