
[pull] master from phate:master #530

Merged
pull[bot] merged 2 commits into EECS-NTNU:master from phate:master
Oct 29, 2025

pull bot commented Oct 29, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


phate and others added 2 commits October 29, 2025 14:42
Replaces the Common Node Elimination (CNE) pass with a slightly
different approach.

Each congruence set has a "leader" output. Once an output has become a
leader, it never becomes a follower. When marking simple nodes, we look,
just as before, at the leftmost input and compare the node to all other
simple nodes whose leftmost input is congruent with that input's origin.
If no congruent leader is found, the node becomes its own leader (more
precisely: each of its outputs becomes its own leader). A leader always
comes earlier in the top-down traversal order than all of its followers.
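
A rough sketch of the simple-node marking described above, written in
Python for brevity rather than in the project's own language. All names
here (`Output`, `SimpleNode`, `mark_simple_nodes`, `by_leftmost`) are
hypothetical and not taken from the code base. As a simplification, the
sketch only compares a node against earlier leader nodes, on the
assumption that followers are congruent with their leaders anyway.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Output:
    node: str       # name of the producing node (or region argument)
    index: int = 0  # which output of that node


@dataclass
class SimpleNode:
    name: str
    op: str
    inputs: list        # origins (Output) of the inputs, leftmost first
    num_outputs: int = 1


def mark_simple_nodes(arguments, nodes):
    """Map every output to its leader; `nodes` must be in top-down order."""
    leader = {arg: arg for arg in arguments}  # arguments lead themselves
    by_leftmost = {}  # leader of leftmost input -> earlier leader nodes

    for node in nodes:
        in_leaders = tuple(leader[origin] for origin in node.inputs)
        key = in_leaders[0] if in_leaders else None
        candidates = by_leftmost.setdefault(key, [])

        # Look for an earlier leader with the same operation and
        # pairwise congruent inputs.
        match = None
        for cand, cand_leaders in candidates:
            if (cand.op == node.op
                    and cand.num_outputs == node.num_outputs
                    and cand_leaders == in_leaders):
                match = cand
                break

        for i in range(node.num_outputs):
            out = Output(node.name, i)
            # Follow the matching leader, or become our own leader.
            leader[out] = Output(match.name, i) if match else out
        if match is None:
            candidates.append((node, in_leaders))
    return leader
```

Because nodes are visited top-down and only register themselves when no
match exists, a leader in this sketch always precedes its followers.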

When marking thetas, we speculatively use the origins of the loop
variable inputs to partition the loop variables, and then mark the theta
subregion. Once the subregion has been marked, it might turn out that
some loop variable results were not congruent after all. In that case,
the loop variables are partitioned further and the subregion is marked
again.
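
The speculate-then-refine loop for thetas can be illustrated with a toy
model. This is only a sketch under strong assumptions: each loop
variable's result is a small expression tree over the loop arguments,
"marking the subregion" is reduced to canonicalizing those trees, and
`canon` and `partition_loop_vars` are made-up names.

```python
def canon(expr, leader):
    """Rewrite ("arg", i) leaves to refer to the leader of loop variable i."""
    if not isinstance(expr, tuple):
        return expr  # plain constant payload
    if expr[0] == "arg":
        return ("arg", leader[expr[1]])
    return (expr[0],) + tuple(canon(child, leader) for child in expr[1:])


def partition_loop_vars(input_classes, results):
    """Return (leader index per loop variable, number of marking rounds).

    input_classes[i]: congruence class of the origin of loop input i.
    results[i]: expression tree computing the result of loop variable i.
    """
    # Speculation: loop variables with congruent input origins are congruent.
    first = {}
    leader = [first.setdefault(c, i) for i, c in enumerate(input_classes)]

    rounds = 0
    while True:
        rounds += 1  # one (re-)marking of the subregion
        first = {}
        refined = []
        for i in range(len(results)):
            # Stay together only if we were together before AND the
            # results look congruent under the current speculation.
            key = (leader[i], canon(results[i], leader))
            refined.append(first.setdefault(key, i))
        if refined == leader:
            return leader, rounds
        leader = refined  # partition was split: mark again


one, two = ("const", 1), ("const", 2)
results = [
    ("add", ("arg", 2), one),
    ("add", ("arg", 3), one),
    ("add", ("arg", 2), one),
    ("add", ("arg", 3), two),
]
leaders, rounds = partition_loop_vars(["z", "z", "z", "z"], results)
print(leaders, rounds)
```

In this example all four loop variables start in one set. The first
round splits off variable 3, which in turn forces variable 1 (which
depends on variable 3) out of the set in the next round. The smallest
index in each set stays the leader, so a leader never turns into a
follower when a set is split.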

A fun trick here is that the second time a node is visited, it already
knows which leader it found last time, so it first checks whether it is
still congruent with that old leader. If it is, it does not need to
look at any other nodes. Only if some partitioning has happened since
last time, causing the node to no longer be congruent with its old
leader, will it start looking for other nodes that it may be congruent
with.
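
A small sketch of that revisit fast path, counting congruence checks to
show the saving. The function `find_leader` and the `signature` table
are hypothetical stand-ins, and the early return for a node that was
already its own leader is my reading of the "a leader never becomes a
follower" rule above.

```python
def find_leader(node, candidates, is_congruent, previous_leader=None):
    """Return (leader, number of congruence checks performed)."""
    if previous_leader == node:
        return node, 0  # leaders never become followers

    checks = 0
    if previous_leader is not None:
        # Fast path: is the leader from the last visit still valid?
        checks += 1
        if is_congruent(node, previous_leader):
            return previous_leader, checks

    # Slow path: scan the other candidate leaders.
    for cand in candidates:
        if cand == previous_leader:
            continue  # already rejected above
        checks += 1
        if is_congruent(node, cand):
            return cand, checks
    return node, checks  # no congruent leader: become our own leader


# Toy congruence test: same operation and same input leaders.
signature = {
    "n0": ("add", ("a", "b")),
    "n1": ("mul", ("a", "b")),
    "n2": ("sub", ("a", "b")),
    "n3": ("sub", ("a", "b")),
}


def congruent(x, y):
    return signature[x] == signature[y]


leaders_so_far = ["n0", "n1", "n2"]
first_visit = find_leader("n3", leaders_so_far, congruent)
second_visit = find_leader("n3", leaders_so_far, congruent, previous_leader="n2")

# Simulate a partition split that changes the inputs of n3.
signature["n3"] = ("sub", ("a", "c"))
after_split = find_leader("n3", leaders_so_far, congruent, previous_leader="n2")
```

The first visit needs three checks to find `n2`, the unchanged revisit
needs one, and only the revisit after the split falls back to scanning.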

A runtime comparison is shown in the plot below:

<img width="873" height="722" alt="image"
src="https://github.com/user-attachments/assets/e37797ae-de8b-4dfb-bed3-6e30496b9df8"
/>

In both cases we apply all the tricks in RegionAwareModRef and then
time CNE afterwards.
The axes show billions of nanoseconds, i.e., seconds. The slowest file
thus goes from ~400 seconds to 12 seconds.
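
For reference, the unit conversion and the implied speedup for the
slowest file work out as below. The inputs are the approximate values
quoted above, not exact measurements, and `ns_to_seconds` is just a
helper for this note.

```python
NS_PER_SECOND = 1_000_000_000


def ns_to_seconds(nanoseconds):
    """Convert a duration in nanoseconds to seconds."""
    return nanoseconds / NS_PER_SECOND


old_cne = ns_to_seconds(400 * NS_PER_SECOND)  # roughly 400 on the axis
new_cne = ns_to_seconds(12 * NS_PER_SECOND)   # roughly 12 on the axis
speedup = old_cne / new_cne                   # a bit over 33x
```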

I do want to look into what makes xdisp.c take 12 seconds, btw.

I also want to see what happens to the graphs that took 12 000 seconds
with the old CNE.

pull bot locked and limited conversation to collaborators Oct 29, 2025
pull bot added the ⤵️ pull label Oct 29, 2025
pull bot merged commit e4ff307 into EECS-NTNU:master Oct 29, 2025