[opt] Add AND subset (absorption) rewrite #3692
Conversation
This seems likely to cause similar problems to the sgtz one: although the expressions are logically equivalent, the rewritten form is more difficult for our analyses to see through, so it should only be applied with caution. Furthermore, since our delay & area models are all node-based, this would add additional error to the estimates they produce. This may be the sort of transform we want to do post-scheduling, or perhaps immediately before scheduling, but even there I'm not sure this is actually better in any way we care about.
@allight I see what you mean. This is probably an aside, but one thing that /could/ be of interest for a transform like this is that a bunch of optimizations are gated on single-use, and this rewrite reduces the number of uses in an equivalent form. For folks who are less sensitive to delay modeling wanting monolithic operations, would you recommend a different pass pipeline that just shares some subset of the passes? The general problem is that some optimization opportunities can only be uncovered by further decomposition. I see logic benchmarks that should optimize to a constant, but we currently can't recognize that they can because, as you say, we might choose to tip-toe around rewrites we "could do, but don't" to keep the delay model happy by preserving larger ops. I put several rewrite rules up for review at the same time, as different PRs, that I saw were necessary to get some samples toward the constant form. I think "XLS as term rewriting optimizer" can sometimes be at odds with "delay modeler preserving big operations", but for cases where you need a bunch of rewrites to discover that something is a constant, applying the various optimizations is also what gets you an accurate delay estimate of the final constant form. LMK your thoughts on this!
@allight [Also note this particular optimization may not have been as critical as others, even though it's where our discussion is happening :-). If it's helpful I can try to share a small subgraph where a rewrite was clearly important in the future, but I think even then the general question of optimizing vs. delay-model estimation will remain; we'll just have some clearer case studies as motivation.]
This is an interesting discussion that I think @scampanoni and @ericastor (when he gets back from vacation) should also take a look at.

Creating a custom pass pipeline would certainly be one way to customize the behavior of XLS. Depending on what you want to do, it might or might not be the best solution, but it is certainly a good way to experiment with different tradeoffs, which we use a lot for our own work. You can either add a new bazel toolchain you point your build at, or pass the custom pipeline protobuf to opt (assuming all the passes you want are in the pass_registry).

We have lately been putting a lot more effort into using abstract evaluators of various stripes and other similar analytical models to avoid, or at least reduce, our reliance on bit/logic-level incremental lowering, since that gives us better control over the tradeoffs between the scheduler, area, timing, and other considerations. This approach has been showing good success and can help us perform large rewrites in a way that is locally understandable.

If you can give us an example program where this (and the other cls') rewrites reduce something not visible to our normal analyses, that would be really useful. We do have some ideas of how we could do better constant recognition (e.g. 10.1017/S1471068424000140), and having motivating examples where we fail would be helpful.

For what it's worth, I tested this against the larger benchmark suite and this specific cl does not seem to cause any issues, so I'd be happy to merge it (assuming @scampanoni doesn't object). I do still worry a bit about scheduler issues, but this seems minor enough to be unlikely to cause a problem. The SGTZ one, though, definitely inhibits our range analyses in ways that cause regressions.
@allight Sounds good, I think I should be able to provide motivating subgraphs in the future -- understood why the sgtz one would be trickier to decompose too (and that, at a minimum, it would want to decompose very late). A discussion would be good at some point for sure, but I'm happy to try to draw the connections more clearly so we also have something more concrete to discuss for the next set of opts.
This is also useful as an optimization because it reduces the use count for the absorbed operand and we have a bunch of things that trigger off of single-use.
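For reference, the identity this rewrite relies on is the Boolean absorption law, x & (x | y) == x, which holds bitwise for any widths. A minimal Python sketch (names are illustrative, not from the XLS codebase) that checks it exhaustively over 4-bit operands:

```python
def absorb(x: int, y: int) -> int:
    """AND-absorption: x & (x | y) simplifies to just x."""
    return x & (x | y)

# Exhaustively verify the identity for all 4-bit operand pairs.
for x in range(16):
    for y in range(16):
        assert absorb(x, y) == x
```

After the rewrite, the `and` and `or` nodes disappear and only `x` remains, which is what drops the use count of the absorbed operand.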
From the block comment: