Martien.physreg liveranges #747
base: aie-public
| return true; | ||
| // Check if Reg1 is a sub-register of Reg2 | ||
| for (MCSubRegIterator SubRegIt(Reg2, TRI, /*IncludeSelf=*/false); |
We could think about using RegUnits then we do not have to iterate through sub and super registers.
solved in a trailing commit.
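To illustrate the RegUnits suggestion: if every register is described by the set of register units it covers, an overlap check becomes a plain set intersection, with no walk over sub- or super-registers. The sketch below is a standalone toy model, not LLVM code; `RegUnitTable`, `regsOverlapByUnits`, `makeToyTable` and the register names are all made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy model: each register name maps to the sorted list of
// register units it covers. This is not LLVM's real data structure.
using RegUnitTable = std::map<std::string, std::vector<unsigned>>;

// Two registers overlap iff their sorted unit lists share an element.
bool regsOverlapByUnits(const RegUnitTable &Units, const std::string &A,
                        const std::string &B) {
  const std::vector<unsigned> &UA = Units.at(A);
  const std::vector<unsigned> &UB = Units.at(B);
  assert(std::is_sorted(UA.begin(), UA.end()) &&
         std::is_sorted(UB.begin(), UB.end()));
  auto IA = UA.begin();
  auto IB = UB.begin();
  // Linear merge-style walk over both sorted lists.
  while (IA != UA.end() && IB != UB.end()) {
    if (*IA == *IB)
      return true;
    if (*IA < *IB)
      ++IA;
    else
      ++IB;
  }
  return false;
}

// Made-up register file: x0 consists of two halves, x1 is independent.
RegUnitTable makeToyTable() {
  return {{"x0", {0, 1}}, {"x0_lo", {0}}, {"x0_hi", {1}}, {"x1", {2, 3}}};
}
```

In this model a sub-register overlaps its super-register because they share a unit, while the two halves of `x0` do not overlap each other.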
cf979b2 to cc8fca8
Also for virtual registers. For architectures where latencies can go negative, this has an impact on RecMII.
This abstracts the live ranges to be used by PostRegAlloc.
This module analyses live ranges of physical registers that can be safely reallocated in a basic block. It supplies facilities to rewrite to virtual registers and to restore the original allocation.
This module produces an EventSchedule from the instructions and their issue cycles. The event schedule contains the read and write events of the virtual registers occurring in the instructions, ordered along the timeline of processor pipeline stages. From the EventSchedule, the modulo live ranges for a particular II can be constructed. These represent the lanes of each register that are live at a particular point.
This is a dedicated register allocator for use by the postpipeliner.
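As a rough illustration of how linear-time live intervals could be folded into modulo live ranges for a given II, here is a standalone sketch. It is not the PR's implementation: `LaneBits`, `LiveInterval` and `foldModulo` are hypothetical names, a plain 64-bit integer stands in for a lane mask, and the intervals are assumed to have already been derived from the read and write events.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a lane mask: one bit per lane.
using LaneBits = std::uint64_t;

// Hypothetical interval derived from write/read events: the lanes in
// Lanes are live in cycles [Begin, End).
struct LiveInterval {
  int Begin;
  int End;
  LaneBits Lanes;
};

// Fold linear-time intervals of one register into II modulo slots.
// Slot T collects every lane that is live in some cycle C with C mod II == T.
std::vector<LaneBits> foldModulo(const std::vector<LiveInterval> &Intervals,
                                 int II) {
  assert(II > 0);
  std::vector<LaneBits> LanesByOffset(II, 0);
  for (const LiveInterval &LI : Intervals) {
    for (int C = LI.Begin; C < LI.End; ++C) {
      // Normalize so that negative cycles also map into [0, II).
      int Slot = ((C % II) + II) % II;
      LanesByOffset[Slot] |= LI.Lanes;
    }
  }
  return LanesByOffset;
}
```

Note that this sketch only accumulates lanes per slot. It does not detect an interval longer than II wrapping around and overlapping itself, which a real allocator would have to treat as a conflict.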
cc8fca8 to 33264be
| /// register. There may be multiple current definitions for a register with | ||
| /// disjunct lanemasks. | ||
| VReg2SUnitMultiMap CurrentVRegDefs; | ||
| VReg2SUnitOperIdxMultiMap CurrentVRegDefs; |
This was asymmetric between Uses and Defs. We need the operand index of the outstanding defs to compute operand latencies.
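A minimal sketch of why the operand index is useful on the def side: when a use is matched against the outstanding defs of a register, each matching record identifies both the defining node and the defining operand, which is what an operand-latency lookup would need. This is a toy model with hypothetical names (`DefRecord`, `ToyVRegDefMap`, `defsReaching`), not the actual `VReg2SUnitOperIdxMultiMap`.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for a lane mask: one bit per lane.
using LaneBits = std::uint64_t;

// Hypothetical record of an outstanding definition: which scheduling
// unit wrote the register, through which operand, and which lanes.
struct DefRecord {
  unsigned SUnitNum;
  unsigned OperIdx;
  LaneBits Lanes;
};

// Several defs with disjoint lanes may be outstanding per virtual register.
using ToyVRegDefMap = std::multimap<unsigned, DefRecord>;

// Collect the outstanding defs of VReg whose lanes overlap the lanes read
// by a use. Each result carries the operand index of the def.
std::vector<DefRecord> defsReaching(const ToyVRegDefMap &Defs, unsigned VReg,
                                    LaneBits UseLanes) {
  std::vector<DefRecord> Result;
  auto Range = Defs.equal_range(VReg);
  for (auto It = Range.first; It != Range.second; ++It)
    if (It->second.Lanes & UseLanes)
      Result.push_back(It->second);
  return Result;
}
```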
| // Use TRI's regsOverlap which handles both physical and virtual registers, | ||
| // including subregisters and lane masks | ||
| return TRI->regsOverlap(SrcReg, DstReg); |
I guess this was only needed transiently, but it looks really good.
| PostSWP->isPostPipelineCandidate(*TheBlock)) | ||
| staticallyMaterializeMultiSlotInstructions(*TheBlock, HR); | ||
| PostSWP->isPostPipelineCandidate(*TheBlock)) { | ||
| staticallyMaterializeMultiSlotInstructions(*TheBlock, HR, MaterializeAll); |
Would have been nice to be able to skip the scheduler before postpipelining. Sadly, the scheduler sometimes makes better decisions.
| for (int T = 0; T < II; ++T) { | ||
| LaneBitmask Mask = LanesByOffset[T]; | ||
| if (Mask.any()) { | ||
| // Show a simple indicator - could be enhanced to show actual lanes |
Indeed. Full lanemasks are bulky though.
| // Score based on total lane usage (which already incorporates duration) | ||
| // Add a small bonus for max lanes to prioritize wide registers | ||
| unsigned Score = TotalLanes * 10 + MaxLanes; |
This score determines the order in which the virtregs are going to be allocated. I think this could be improved, or we could use a few different scoring functions. It's good enough for our motivating unittest though.
A "loop through strategies" is forthcoming.
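For reference, the scoring formula from the snippet can be exercised in isolation as follows. The struct `VRegUsage` and the helpers `allocationScore` and `allocationOrder` are hypothetical, and the sketch assumes that higher scores are allocated first, which the diff excerpt does not show.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical per-register summary of lane usage over the modulo slots.
struct VRegUsage {
  unsigned VReg;
  unsigned TotalLanes; // lanes summed over all slots
  unsigned MaxLanes;   // widest use in any single slot
};

// Same formula as in the reviewed snippet.
unsigned allocationScore(const VRegUsage &U) {
  return U.TotalLanes * 10 + U.MaxLanes;
}

// Assumption: registers with the highest score are allocated first.
// stable_sort keeps the input order for equal scores.
std::vector<unsigned> allocationOrder(std::vector<VRegUsage> Usage) {
  std::stable_sort(Usage.begin(), Usage.end(),
                   [](const VRegUsage &A, const VRegUsage &B) {
                     return allocationScore(A) > allocationScore(B);
                   });
  std::vector<unsigned> Order;
  for (const VRegUsage &U : Usage)
    Order.push_back(U.VReg);
  return Order;
}
```

Swapping in a different scoring function only requires changing `allocationScore`, which is roughly what trying several strategies in a loop would amount to in this toy setting.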
| const TargetRegisterClass *RC = MRI.getRegClass(VReg); | ||
| // Get available physical registers from RegTracker | ||
| const auto &AvailableRegs = RegTracker.getAvailablePhysRegs(); |
This references a temporary object. Will be fixed in a future push.
In that same push, we compute the available registers only once and pass it down.
| bool AIEPostRegAlloc::tryAllocate( | ||
| const DenseMap<unsigned, AIE::LaneMaskVector> &LiveLanesByVReg, int II, | ||
| const RegLiveRangeTracker &RegTracker, const TargetRegisterInfo &TRI, | ||
| const MachineRegisterInfo &MRI, AllocState &State, bool UseBestFit) { |
BestFit is a hallucination of AI. It will be removed.
| static cl::opt<bool> TestRegDefUseTracker( | ||
| "aie-test-regdefuse-tracker", cl::Hidden, cl::init(false), | ||
| cl::desc("[AIE] TEST MODE: Run RegDefUseTracker analysis on all loops " | ||
| "(for testing only)")); |
This is accommodating a dump for the early stages of live range analysis.
| void BlockState::restorePipelining() { | ||
| // Restore to the original allocation of the virtual registers | ||
| RegTracker->restoreOriginalPhysRegs(); |
These registers were used by the scheduler whose result we're going to use as a fallback.
We sandwich graph construction and pipelining between a rewrite to virtual registers and a restoration of the original registers. The sandwich is a no-op if pipelining fails. If pipelining succeeds, it has found and applied a valid register allocation for the virtual registers that were introduced. The virtualization is controlled by an option that defaults to false, which keeps existing references the same.
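One way to read the sandwich is sketched below on a toy model: rewrite to virtual registers, attempt pipelining, and put the original registers back if the attempt fails. All names here (`Assignment`, `ToyRegTracker`, `pipelineWithVirtualRegs`) are hypothetical, apart from `restoreOriginalPhysRegs`, which borrows its name from the diff but not its signature. The control flow is an interpretation of the description, not the PR's code.

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical toy model of a block: operand name -> register name.
using Assignment = std::map<std::string, std::string>;

// Hypothetical tracker that remembers the original allocation.
class ToyRegTracker {
  Assignment Original;

public:
  // Save the current allocation and rewrite every operand to a fresh
  // virtual register name.
  void virtualize(Assignment &Regs) {
    Original = Regs;
    unsigned N = 0;
    for (auto &KV : Regs)
      KV.second = "%v" + std::to_string(N++);
  }

  // Put back the allocation that was saved by virtualize().
  void restoreOriginalPhysRegs(Assignment &Regs) const { Regs = Original; }
};

// Sandwich: virtualize, try to pipeline, restore on failure.
// TryPipeline is expected to assign registers itself when it succeeds.
bool pipelineWithVirtualRegs(
    Assignment &Regs, const std::function<bool(Assignment &)> &TryPipeline) {
  ToyRegTracker Tracker;
  Tracker.virtualize(Regs);
  if (TryPipeline(Regs))
    return true; // keep the allocation found by the pipeliner
  Tracker.restoreOriginalPhysRegs(Regs);
  return false; // the block looks exactly as it did before
}
```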
7930abc to dcc908c
| switch (updateFixPoint(BS)) { | ||
| const auto Stage = updateFixPoint(BS); | ||
| BS.FixPoint.Stage = Stage; | ||
| switch (Stage) { |
I will separate this out. It's just cosmetics to return a state and only save it in the top call.
| BS.FixPoint.PipelinerMode = firstPipelinerMode(); | ||
| if (BS.FixPoint.PipelinerMode != PostPipelinerMode::None) { | ||
| return SchedulingStage::Pipelining; | ||
| } |
This looks a bit weird: we have been pipelining and are trying to restore to the first allowed pipelinermode for the next II. This should be invariant, so I don't think we can get None here. Perhaps assert.
| // For virtual mode, re-analyze and virtualize | ||
| if (FixPoint.PipelinerMode == PostPipelinerMode::Virtual) { | ||
| // RegTracker might not exist if we have multiple regions |
Someone missed that we can't do physical mode either if we have more than one region.
I would hope that RegTracker is always there for a SWP candidate.
This is a POC of register allocation during postpipelining.
We add