
Conversation

@cdleary (Collaborator) commented Jan 16, 2026

This is also useful as an optimization because it reduces the use count of the absorbed operand, and we have a bunch of optimizations that trigger only on single-use values.

From the block comment:

  // Pattern: AND subset check (mask subset).
  //
  // This is also commonly described as an AND "absorption" rewrite: if y has no
  // bits set outside x, then `y & x == y`. The rewritten form makes that "no
  // bits set outside x" condition explicit as `(y & ~x) == 0`.
  //
  // Why "subset":
  // Think of a bitvector as the set of bit positions where it has a 1-bit.
  // Then eq(and(x, y), y) holds exactly when y's set is a subset of x's
  // set; i.e., y has no bits set outside the 1-bits of x.
  //
  // For same-width bitvectors, the following are equivalent:
  // - `eq(and(x, y), y)` means: every 1-bit in y is also a 1-bit in x
  //
  // - `eq(and(not(x), y), 0)` means: y has no 1-bits in positions where x has a
  //   0-bit (equivalently, and(not(x), y) has no 1-bits at all).
  //
  // So we can rewrite:
  // `eq(and(x, y), y)` <=> `eq(and(not(x), y), 0)`
  // `eq(and(x, y), x)` <=> `eq(and(not(y), x), 0)`
  //
  // This generalizes to n-ary and:
  //
  // `eq(and(a, b, ..., t), t)` <=> `eq(and(not(and(a, b, ...)), t), 0)`
  //
  // And similarly for `ne`.
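As a sanity check, the equivalences in the block comment can be verified exhaustively over small bitvectors. The sketch below is a plain Python model of the rewrites (not XLS code); the width `W = 4` is an arbitrary choice.

```python
from itertools import product

W = 4                 # bitvector width (arbitrary, small enough to enumerate)
MASK = (1 << W) - 1   # all-ones mask for width W

def bnot(v):
    """Bitwise NOT, truncated to W bits."""
    return ~v & MASK

# Binary form: eq(and(x, y), y)  <=>  eq(and(not(x), y), 0)
for x, y in product(range(1 << W), repeat=2):
    lhs = (x & y) == y
    rhs = (bnot(x) & y) == 0
    assert lhs == rhs, (x, y)

# N-ary form: eq(and(a, b, t), t)  <=>  eq(and(not(and(a, b)), t), 0)
for a, b, t in product(range(1 << W), repeat=3):
    lhs = (a & b & t) == t
    rhs = (bnot(a & b) & t) == 0
    assert lhs == rhs, (a, b, t)

print("all", (1 << W) ** 2, "binary and", (1 << W) ** 3, "n-ary cases agree")
```

The `ne` variants follow by negating both sides of each equivalence.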

@cdleary cdleary marked this pull request as ready for review January 16, 2026 04:50
@allight (Contributor) commented Jan 20, 2026

This seems likely to cause problems similar to the sgtz one: although the expressions are logically equivalent, this form is harder for our analyses to see through, so it should only be transformed to with caution. Furthermore, since our delay and area models are all node-based, this would add additional error to the estimates they produce. This may be the sort of transform we want to do post-scheduling, or perhaps immediately before scheduling, but even there I'm not sure it's actually better in any way we care about.

@cdleary (Collaborator, Author) commented Jan 21, 2026

@allight I see what you mean. This is probably an aside, but one thing that /could/ be of interest for a transform like this is that a bunch of optimizations are gated on single-use, and this reduces the number of uses in an equivalent form.

Do you think, for folks who are less sensitive to delay modeling's preference for monolithic operations, you'd recommend a different pass pipeline that shares some subset of the passes? The general problem is that some optimization opportunities can only be uncovered by further decomposition. I see logic benchmarks that should optimize to a constant, but we currently can't recognize that because, as you say, we tip-toe around rewrites we "could do, but don't" to keep the delay model happy with larger ops. I put several rewrite rules up for review at the same time, as separate PRs, that I saw were necessary to get some samples toward the constant form. I think "XLS as term-rewriting optimizer" can sometimes be at odds with "delay modeling that preserves big operations", but in cases where a chain of rewrites is needed to discover that something is a constant, adding those rewrites is also what gets you to an accurate delay estimate in the end.

LMK your thoughts on this!

@cdleary (Collaborator, Author) commented Jan 21, 2026

@allight [Also note this particular optimization may not be as critical as others, even though it's where our discussion is happening :-). If it's helpful, I can try to share a small subgraph where a rewrite was clearly important. But even then, the general question of optimization vs. delay-model estimation will remain; we'll just have some clearer case studies as motivation.]

@allight (Contributor) commented Jan 21, 2026

This is an interesting discussion that I think @scampanoni and @ericastor (when he gets back from vacation) should also take a look at.

Creating a custom pass pipeline would certainly be one way to customize XLS's behavior. Depending on what you want to do, it may or may not be the best solution, but it is certainly a good way to experiment with different tradeoffs, which we do a lot for our own work. You can either add a new bazel toolchain that you point your build at, or pass a custom pipeline protobuf to opt (assuming all the passes you want are in the pass_registry).

We have lately been putting a lot more effort into using abstract evaluators of various stripes and other similar analytical models to avoid or at least reduce our reliance on bit/logic-level incremental lowering since it gives us better control over the tradeoffs between the scheduler, area, timing, and other considerations. This approach has been showing good success and can help us perform large rewrites in a way that is understandable locally.

If you can give us an example program where this (and the other CLs') rewrites can expose something not visible to our normal analyses, that would be really useful. We do have some ideas for how we could do better constant recognition (e.g. 10.1017/S1471068424000140), and having motivating examples where we fail would be helpful.

For what it's worth, I tested this against the larger benchmark suite and this specific CL does not seem to cause any issues, so I'd be happy to merge it (assuming @scampanoni doesn't object). I do still worry a bit about scheduler issues, but this seems minor enough that it's unlikely to cause a problem. The sgtz one, though, definitely inhibits our range analyses in ways that cause regressions.

@mikex-oss mikex-oss requested a review from scampanoni January 21, 2026 21:20
@cdleary (Collaborator, Author) commented Jan 22, 2026

@allight sounds good; I think I should be able to provide motivating subgraphs in the future. Understood why the sgtz one would be trickier to decompose, too (and that at a minimum it would want to be decomposed very late).

Discussion would be good at some point for sure, but I'm happy to try to draw the connections more clearly first, so we also have something more concrete to discuss for the next set of opts.

