@martien-de-jong martien-de-jong commented Dec 31, 2025

This is a POC of register allocation during postpipelining.

We add

  • RegDefUseTracker, which takes care of virtualizing safe live ranges,
  • SchedulingInterpreter, which computes register live ranges based on scheduled pipeline timing,
  • PostRegAlloc, which uses the two to allocate the virtual registers after postpipelining.

return true;

// Check if Reg1 is a sub-register of Reg2
for (MCSubRegIterator SubRegIt(Reg2, TRI, /*IncludeSelf=*/false);
Collaborator

We could consider using RegUnits; then we would not have to iterate through sub- and super-registers.

Collaborator Author

Solved in a trailing commit.
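
To illustrate the RegUnits suggestion in this thread, here is a minimal sketch of an overlap check based on register units. It uses a toy model, not the LLVM API: `ToyRegInfo`, `RegId`, `UnitList`, and `makeExampleRegInfo` are hypothetical names, and the register file is invented for the example. The idea is that two registers overlap exactly when they share a unit, so no walk over sub- or super-registers is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

// Hypothetical toy model: a register is an unsigned ID that maps to the
// sorted list of register units it covers. This is not the LLVM API.
using RegId = unsigned;
using UnitList = std::vector<unsigned>;

struct ToyRegInfo {
  std::map<RegId, UnitList> Units; // each list must be sorted ascending

  // Two registers overlap iff they share at least one register unit.
  bool regsOverlap(RegId A, RegId B) const {
    const UnitList &UA = Units.at(A);
    const UnitList &UB = Units.at(B);
    assert(std::is_sorted(UA.begin(), UA.end()) &&
           std::is_sorted(UB.begin(), UB.end()));
    // Linear merge over the two sorted unit lists.
    auto IA = UA.begin(), IB = UB.begin();
    while (IA != UA.end() && IB != UB.end()) {
      if (*IA == *IB)
        return true;
      if (*IA < *IB)
        ++IA;
      else
        ++IB;
    }
    return false;
  }
};

// Invented example: register 0 is a wide register made of two halves
// (registers 1 and 2); register 3 is unrelated.
inline ToyRegInfo makeExampleRegInfo() {
  ToyRegInfo TRI;
  TRI.Units[0] = {0, 1}; // wide register covers units 0 and 1
  TRI.Units[1] = {0};    // low half
  TRI.Units[2] = {1};    // high half
  TRI.Units[3] = {2};    // unrelated register
  return TRI;
}
```

In this model the wide register overlaps both halves, while the two halves do not overlap each other.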

@martien-de-jong martien-de-jong force-pushed the martien.physreg-liveranges branch from cf979b2 to cc8fca8 Compare January 6, 2026 17:12
Martien de Jong added 7 commits January 8, 2026 10:45
Also for virtual registers. For architectures where latencies can go
negative, this has an impact on RecMII.
This is abstracting the live ranges to be used by PostRegAlloc
This module analyses live ranges of physical registers that can be
safely reallocated in a basic block.

It supplies facilities to rewrite to virtual registers and to restore
the original allocation.
This module produces an EventSchedule from the instructions and their
issue cycle. The event schedule contains the read and write events of
the virtual registers occurring in the instructions, ordered in the processor
pipeline stage timeline. From the EventSchedule, the modulo live ranges for a
particular II can be constructed. These represent the lanes of each register
that are live at a particular point.
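
As a rough illustration of how modulo live ranges could be derived for a given II, the sketch below folds linear live intervals into II slots and ORs the live lanes per slot. It is an assumption-laden toy, not the code in this PR: `LaneMask` is a plain 64-bit integer standing in for a lane bitmask, and `LiveInterval`, `moduloSlot`, and `foldModulo` are hypothetical names. Only `LanesByOffset` is borrowed from the diff.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using LaneMask = std::uint64_t; // hypothetical stand-in for a lane bitmask

// One live interval of a register in the linear schedule: the given lanes
// are live in cycles [Start, End).
struct LiveInterval {
  int Start;
  int End;
  LaneMask Lanes;
};

// Non-negative remainder, so that negative cycles fold correctly.
inline int moduloSlot(int Cycle, int II) {
  int R = Cycle % II;
  return R < 0 ? R + II : R;
}

// Fold all intervals of one register into II slots. Slot T holds the union
// of lanes that are live in any cycle C with C mod II == T.
inline std::vector<LaneMask>
foldModulo(const std::vector<LiveInterval> &Intervals, int II) {
  assert(II > 0 && "initiation interval must be positive");
  std::vector<LaneMask> LanesByOffset(II, 0);
  for (const LiveInterval &LI : Intervals)
    for (int C = LI.Start; C < LI.End; ++C)
      LanesByOffset[moduloSlot(C, II)] |= LI.Lanes;
  return LanesByOffset;
}
```

Note that the union hides one case: an interval longer than II wraps onto itself, which a real allocator would have to detect separately because the register would then clash with its own value from the next iteration.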
This is a dedicated register allocator for use by the postpipeliner
@martien-de-jong martien-de-jong force-pushed the martien.physreg-liveranges branch from cc8fca8 to 33264be Compare January 8, 2026 10:48
@martien-de-jong martien-de-jong marked this pull request as ready for review January 8, 2026 11:13
/// register. There may be multiple current definitions for a register with
/// disjunct lanemasks.
VReg2SUnitMultiMap CurrentVRegDefs;
VReg2SUnitOperIdxMultiMap CurrentVRegDefs;
Collaborator Author

This was asymmetric between Uses and Defs. We need the operand index of the outstanding defs to compute operand latencies.


// Use TRI's regsOverlap which handles both physical and virtual registers,
// including subregisters and lane masks
return TRI->regsOverlap(SrcReg, DstReg);
Collaborator Author

I guess this was only needed transiently, but it looks really good.

PostSWP->isPostPipelineCandidate(*TheBlock))
staticallyMaterializeMultiSlotInstructions(*TheBlock, HR);
PostSWP->isPostPipelineCandidate(*TheBlock)) {
staticallyMaterializeMultiSlotInstructions(*TheBlock, HR, MaterializeAll);
Collaborator Author

Would have been nice to be able to skip the scheduler before postpipelining. Sadly, the scheduler sometimes makes better decisions.

for (int T = 0; T < II; ++T) {
LaneBitmask Mask = LanesByOffset[T];
if (Mask.any()) {
// Show a simple indicator - could be enhanced to show actual lanes
Collaborator Author

Indeed. Full lanemasks are bulky though.


// Score based on total lane usage (which already incorporates duration)
// Add a small bonus for max lanes to prioritize wide registers
unsigned Score = TotalLanes * 10 + MaxLanes;
Collaborator Author

This score determines the order in which the virtregs are going to be allocated. I think this could be improved, or we could use a few different scoring functions. It's good enough for our motivating unit test though.

Collaborator Author

A "loop through strategies" is forthcoming.
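
For readers following the scoring discussion, here is a small self-contained sketch of how such a score could drive the allocation order. The formula `TotalLanes * 10 + MaxLanes` is taken from the diff; everything else is an assumption: `VRegUsage`, `popcount64`, `score`, and `allocationOrder` are hypothetical names, `TotalLanes` is taken to be the sum of live lane counts over all offsets, and the highest score is assumed to be allocated first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical per-vreg summary: live lanes for each offset in [0, II).
struct VRegUsage {
  unsigned VReg;
  std::vector<std::uint64_t> LanesByOffset;
};

// Number of set bits in a lane mask.
inline unsigned popcount64(std::uint64_t M) {
  unsigned N = 0;
  while (M) {
    M &= M - 1; // clear the lowest set bit
    ++N;
  }
  return N;
}

// Score from the diff: total lane usage dominates, max width breaks ties.
inline unsigned score(const VRegUsage &U) {
  unsigned TotalLanes = 0, MaxLanes = 0;
  for (std::uint64_t M : U.LanesByOffset) {
    unsigned N = popcount64(M);
    TotalLanes += N;
    MaxLanes = std::max(MaxLanes, N);
  }
  assert(TotalLanes <= (std::numeric_limits<unsigned>::max() - MaxLanes) / 10 &&
         "score would overflow");
  return TotalLanes * 10 + MaxLanes;
}

// Assumption: virtual registers with the highest score are allocated first.
// stable_sort keeps the input order among equal scores.
inline std::vector<unsigned> allocationOrder(std::vector<VRegUsage> Usages) {
  std::stable_sort(Usages.begin(), Usages.end(),
                   [](const VRegUsage &A, const VRegUsage &B) {
                     return score(A) > score(B);
                   });
  std::vector<unsigned> Order;
  for (const VRegUsage &U : Usages)
    Order.push_back(U.VReg);
  return Order;
}
```

Swapping in a different scoring function then only means replacing `score`, which is roughly what a loop over strategies would need.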

const TargetRegisterClass *RC = MRI.getRegClass(VReg);

// Get available physical registers from RegTracker
const auto &AvailableRegs = RegTracker.getAvailablePhysRegs();
Collaborator Author

This references a temporary object. It will be fixed in a future push.

Collaborator Author

In that same push, we compute the available registers only once and pass it down.
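
The "compute once and pass it down" shape described here can be sketched as follows. This is a toy under stated assumptions, not the fix that was pushed: `ToyTracker`, `computeAvailablePhysRegs`, `pickRegister`, and `allocateAll` are hypothetical names, and the model pretends that every virtual register conflicts with every other one. The point is only that the caller owns the available-register set as a local value and hands it to the per-register step by const reference, so nothing refers to a temporary.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using PhysReg = unsigned;

// Hypothetical tracker that knows all registers and the reserved ones.
struct ToyTracker {
  std::vector<PhysReg> AllRegs;
  std::vector<PhysReg> ReservedRegs;

  // Returns by value: every call builds a fresh vector.
  std::vector<PhysReg> computeAvailablePhysRegs() const {
    std::vector<PhysReg> Result;
    for (PhysReg R : AllRegs)
      if (std::find(ReservedRegs.begin(), ReservedRegs.end(), R) ==
          ReservedRegs.end())
        Result.push_back(R);
    return Result;
  }
};

// Per-register step: takes the precomputed set by const reference and never
// queries the tracker. Picks the first available register not yet used.
inline bool pickRegister(const std::vector<PhysReg> &Available,
                         std::vector<PhysReg> &Used, PhysReg &Out) {
  for (PhysReg R : Available) {
    if (std::find(Used.begin(), Used.end(), R) == Used.end()) {
      Used.push_back(R);
      Out = R;
      return true;
    }
  }
  return false;
}

// Top-level step: computes the set once, owns it, and passes it down.
// Returns an empty vector if allocation fails.
inline std::vector<PhysReg> allocateAll(const ToyTracker &Tracker,
                                        unsigned NumVRegs) {
  const std::vector<PhysReg> Available = Tracker.computeAvailablePhysRegs();
  std::vector<PhysReg> Used, Assignment;
  for (unsigned I = 0; I < NumVRegs; ++I) {
    PhysReg R = 0;
    if (!pickRegister(Available, Used, R))
      return {};
    Assignment.push_back(R);
  }
  assert(Assignment.size() == NumVRegs);
  return Assignment;
}
```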

bool AIEPostRegAlloc::tryAllocate(
const DenseMap<unsigned, AIE::LaneMaskVector> &LiveLanesByVReg, int II,
const RegLiveRangeTracker &RegTracker, const TargetRegisterInfo &TRI,
const MachineRegisterInfo &MRI, AllocState &State, bool UseBestFit) {
Collaborator Author

@martien-de-jong martien-de-jong Jan 8, 2026

BestFit is a hallucination of AI. It will be removed.

static cl::opt<bool> TestRegDefUseTracker(
"aie-test-regdefuse-tracker", cl::Hidden, cl::init(false),
cl::desc("[AIE] TEST MODE: Run RegDefUseTracker analysis on all loops "
"(for testing only)"));
Collaborator Author

This is accommodating a dump for the early stages of live range analysis.


void BlockState::restorePipelining() {
// Restore to the original allocation of the virtual registers
RegTracker->restoreOriginalPhysRegs();
Collaborator Author

These registers were used by the scheduler whose result we're going to use as a fallback.

We sandwich graph construction and pipelining between a rewrite to virtual
registers and a restoration of those. This is a no-op if pipelining fails.
If it succeeds, it has found and applied a valid register allocation for
the virtual registers that were introduced.

The virtualization is controlled by an option that defaults to false,
which keeps the existing references unchanged.
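
One way to picture the sandwich described in this commit message is the sketch below. It is an interpretation, not the PR's implementation: `ToyBlock`, `ToyRegTracker`, `FirstVirtual`, and `pipelineWithVirtualRegs` are hypothetical, and only the name `restoreOriginalPhysRegs` is borrowed from the diff. Operands are rewritten to virtual registers, a pipelining callback runs, and the original physical registers are put back if the callback reports failure.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <vector>

using Reg = unsigned;
constexpr Reg FirstVirtual = 1000; // hypothetical: IDs >= 1000 are virtual

// Toy basic block: just the register operands, in program order.
struct ToyBlock {
  std::vector<Reg> Operands;
};

class ToyRegTracker {
  ToyBlock &Block;
  std::vector<Reg> Original;
  bool Virtualized = false;

public:
  explicit ToyRegTracker(ToyBlock &B) : Block(B) {}

  // Rewrite every physical register to a fresh virtual register, keeping
  // equal physical registers mapped to the same virtual register.
  void virtualize() {
    assert(!Virtualized && "already virtualized");
    Original = Block.Operands;
    std::map<Reg, Reg> Map;
    for (Reg &R : Block.Operands) {
      auto It = Map.find(R);
      if (It == Map.end())
        It = Map.emplace(R, static_cast<Reg>(FirstVirtual + Map.size())).first;
      R = It->second;
    }
    Virtualized = true;
  }

  // Put the original physical registers back.
  void restoreOriginalPhysRegs() {
    if (!Virtualized)
      return;
    Block.Operands = Original;
    Virtualized = false;
  }
};

// Sandwich: virtualize, try to pipeline, restore on failure. On success the
// callback is assumed to have applied its own valid allocation.
inline bool
pipelineWithVirtualRegs(ToyBlock &Block,
                        const std::function<bool(ToyBlock &)> &TryPipeline) {
  ToyRegTracker Tracker(Block);
  Tracker.virtualize();
  if (TryPipeline(Block))
    return true;
  Tracker.restoreOriginalPhysRegs();
  return false;
}
```

With a callback that always fails, the block ends up with exactly the operands it started with, which matches the no-op behaviour described above.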
@martien-de-jong martien-de-jong force-pushed the martien.physreg-liveranges branch from 7930abc to dcc908c Compare January 13, 2026 12:49
switch (updateFixPoint(BS)) {
const auto Stage = updateFixPoint(BS);
BS.FixPoint.Stage = Stage;
switch (Stage) {
Collaborator Author

I will separate this out. It's just cosmetics to return a state and only save it in the top call.

BS.FixPoint.PipelinerMode = firstPipelinerMode();
if (BS.FixPoint.PipelinerMode != PostPipelinerMode::None) {
return SchedulingStage::Pipelining;
}
Collaborator Author

@martien-de-jong martien-de-jong Jan 13, 2026

This looks a bit weird: we have been pipelining and are trying to restore to the first allowed pipelinermode for the next II. This should be invariant, so I don't think we can get None here. Perhaps assert.


// For virtual mode, re-analyze and virtualize
if (FixPoint.PipelinerMode == PostPipelinerMode::Virtual) {
// RegTracker might not exist if we have multiple regions
Collaborator Author

Someone missed that we can't do physical mode either if we have more than one region.
I would hope that RegTracker is always there for a SWP candidate.
