Skip to content

Conversation

@fgdorais
Copy link
Contributor

@fgdorais fgdorais commented Oct 31, 2025

This PR adds non-branching runtime implementations for operations for Bool.

  • strictAnd the non-branching version of &&.
  • strictOr the non-branching version of ||.
  • strictXor the non-branching version of ^^.
  • Bool.toNat now simply converts an untagged value to a tagged value.
  • The decision procedures for < and <= now compile to the corresponding comparison in C.

Closes RFC #10835

@fgdorais fgdorais changed the title feat: add non-branching boolean operators feat: add non-branching boolean operations Oct 31, 2025
@github-actions github-actions bot added the toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN label Oct 31, 2025
@leanprover-community-bot
Copy link
Collaborator

leanprover-community-bot commented Oct 31, 2025

Mathlib CI status (docs):

  • ❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase c41cb64ca7cd86a0519d009532bad87bc87e9d2a --onto 705084d9ba258dec704af428d7ce2444525f9ffb. You can force Mathlib CI using the force-mathlib-ci label. (2025-10-31 09:22:15)
  • ❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase c41cb64ca7cd86a0519d009532bad87bc87e9d2a --onto 1ce05b2a179eac8a902995c0caddc609c0169c49. You can force Mathlib CI using the force-mathlib-ci label. (2025-10-31 14:21:28)
  • ❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase 2c2fcff4f8ce2048d8f2e5e4be4d966cf8f093d9 --onto 838be605acfeeb39186a9af2f56376f2a7fe2246. You can force Mathlib CI using the force-mathlib-ci label. (2025-11-11 16:33:28)

@leanprover-bot
Copy link
Collaborator

leanprover-bot commented Oct 31, 2025

Reference manual CI status:

  • ❗ Reference manual CI will not be attempted unless your PR branches off the nightly-with-manual branch. Try git rebase c41cb64ca7cd86a0519d009532bad87bc87e9d2a --onto d3dda9f6d4428a906c096067ecb75e432afc4615. You can force reference manual CI using the force-manual-ci label. (2025-10-31 09:22:17)
  • ❗ Reference manual CI will not be attempted unless your PR branches off the nightly-with-manual branch. Try git rebase 2c2fcff4f8ce2048d8f2e5e4be4d966cf8f093d9 --onto d3dda9f6d4428a906c096067ecb75e432afc4615. You can force reference manual CI using the force-manual-ci label. (2025-11-11 16:33:30)

@fgdorais
Copy link
Contributor Author

I thought about making decidable equality non-branching as well. But that had unexpected effect that _ = true comparisons did not get optimized out in IR. I then tried to make BEq non-branching but that gave me a couple of unexpected issues as well. (Perhaps it just needs a stage0 update but I didn't dig too deep.)

@fgdorais fgdorais marked this pull request as ready for review October 31, 2025 10:32
@fgdorais fgdorais requested a review from kim-em as a code owner October 31, 2025 10:32
@fgdorais
Copy link
Contributor Author

Probably needs benching.

@TwoFX TwoFX added the changelog-library Library label Oct 31, 2025
@Rob23oba
Copy link
Contributor

I'm not sure this is a good idea; doesn't this mostly make some boolean functions opaque to the compiler? Especially Bool.not feels like it could have a bad impact since now:

match !a with
| true => x
| false => y

won't be optimized to

match a with
| false => x
| true => y

by the Lean compiler itself.
Also, shouldn't the C compiler be good enough to figure out to use non-branching variants?

@fgdorais
Copy link
Contributor Author

I agree about Bool.not. I wonder if it should be macro_inline too. I'll add a separate Bool.lnot for the non-branching version.

Also, shouldn't the C compiler be good enough to figure out to use non-branching variants?

No, this is not a valid optimization in C. It's only possible in Lean because Bool has only two values.

@Rob23oba
Copy link
Contributor

Rob23oba commented Nov 11, 2025

I wonder if we should provide

set_option linter.unusedVariables.funArgs false in
@[inline, always_inline]
unsafe def Bool.toUInt8Impl (b : Bool) : UInt8 := unsafeCast b

@[implemented_by Bool.toUInt8Impl]
def Bool.toUInt8 (b : Bool) : UInt8 :=
  if b then 1 else 0

set_option linter.unusedVariables.funArgs false in
@[inline, always_inline]
unsafe def Bool.ofUInt8Impl (x : UInt8) (h : x &&& 1 = x) : Bool := unsafeCast x

@[implemented_by Bool.ofUInt8Impl]
def Bool.ofUInt8 (x : UInt8) (h : x &&& 1 = x) : Bool :=
  x != 0

and then implement operations like Bool.land as Bool.ofUInt8 (a.toUInt8 &&& b.toUInt8) (by decide +revert) instead of relying on an implementation in C.

@fgdorais
Copy link
Contributor Author

Note that Bool.toUInt8 is already implemented correctly. I think adding Bool.ofUInt8Unsafe would be enough by adding unsafe strictAndImpl/strictOrImpl/strictXorImpl. That way the kernel would still see strictAnd the same as &&, etc.

That would require moving strictAnd and strictOr out of Init.Core. I'm not sure that's a good idea.

@Rob23oba
Copy link
Contributor

Yeah, I also thought further about the implications for the compiler and I think it would be easier for the compiler to optimize with strictAnd x 1 rather than UInt8.land x 1, so it's probably best to keep it as it is now. Also, am I right to assume that strictNot is missing because you can do strictXor x true?

@fgdorais
Copy link
Contributor Author

Also, am I right to assume that strictNot is missing because you can do strictXor x true?

Yes, it didn't seem that useful to add strictNot.

@leanprover-bot leanprover-bot added the P-medium We may work on this issue if we find the time label Nov 13, 2025
@hargoniX
Copy link
Contributor

hargoniX commented Nov 17, 2025

What is the ultimate guarantee you want here? Do you want to provide a guarantee that both sides get evaluated at runtime to enable people writing things like guaranteed-time code? If so, that is basically impossible with C. LLVM will of course liberally decide what it wants to do with a | and if it can see that certain operations are pure it will change the program to make use of that.

@fgdorais
Copy link
Contributor Author

No, my goal is to allow boolean operations to compile to branching-free bitwise ops. The C semantics require || and && to behave as branching code and indeed usually results in such. (Lean's definition of ^^ as != has a similar effect.) I certainly hope that LLVM will see that it can ultimately optimize StrictOr x false to just x, which is permitted because of static inline.

@Rob23oba
Copy link
Contributor

Rob23oba commented Nov 17, 2025

Well, the compiler can still compile a && b to (a != 0) & (b != 0) where a != 0 is just a few (non-branching) instructions (e.g. test + setne on x86). Similarly, the compiler can optimize a || 0 at least to a != 0.

@fgdorais
Copy link
Contributor Author

fgdorais commented Nov 17, 2025

As far as performance is concerned. Here is a quick bench that is basically extracted from one of my use cases. (Update: don't use this, see BenchBool instead.)

instance : Zero Bool := ⟨false⟩
instance : One Bool := ⟨true⟩

abbrev Matrix α m n := Vector (Vector α n) m

def mmul1 (a : Matrix Bool m n) (x : Vector Bool n) : Vector Bool m :=
  Vector.ofFn fun i => Fin.foldl (n := n) (init := false) fun acc j => acc ^^ a[i][j] && x[j]

def mmul2 (a : Matrix Bool m n) (x : Vector Bool n) : Vector Bool m :=
  Vector.ofFn fun i => Fin.foldl (n := n) (init := false) fun acc j => strictXor acc <| strictAnd a[i][j] x[j]

/-- randomly generated 16×100 matrix -/
def a : Matrix Bool 16 100 := #v[
    #v[0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1],
    #v[0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1],
    #v[1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1],
    #v[0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1],
    #v[0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0],
    #v[0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0],
    #v[0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1],
    #v[1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
    #v[0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0],
    #v[1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1],
    #v[0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0],
    #v[1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0],
    #v[0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1],
    #v[0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1],
    #v[1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0],
    #v[0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0],
  ]

/-- randomly generated 100 vector -/
def x : Vector Bool 100 := #v[1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0]

def main : IO UInt32 := do
  let mut r : Vector Bool 16 := .ofFn fun _ => 0
  let mut s : Vector Bool 16 := .ofFn fun _ => 0

  let start ← IO.monoNanosNow
  r := mmul1 a x
  let stop ← IO.monoNanosNow
  IO.println s!"mmul1: {stop - start}ns"

  let start ← IO.monoNanosNow
  s := mmul2 a x
  let stop ← IO.monoNanosNow
  IO.println s!"mmul2: {stop - start}ns"

  -- just to make sure `r` and `s` are used
  return ((Vector.zip r s).all fun (x, y) => x ^^ y).toUInt32

I ran this on a linux/x86_64 box and mmul2 is consistently faster and shows much less variance, as expected. These tests are small only because Lean can't digest the notation for larger arrays. I think changing the random inputs to all false will still show an improvement even though that doesn't really test branch prediction.

@fgdorais
Copy link
Contributor Author

fgdorais commented Nov 18, 2025

I deleted the previous post since the code had a major error. I have new code ready but there is currently a github issue that prevents me from uploading the new code to BenchBool.

Update: github issue is now resolved, new code uploaded!

@fgdorais
Copy link
Contributor Author

fgdorais commented Nov 18, 2025

The new BenchBool binary takes two natural number arguments. The first is the size for which to perform the multiplication of a size x size matrix with a size vector. The second argument is an optional random seed. If omitted, the matrix and vector will be filled with false. If submitted then this will be used as a seed for mkStdGen.

On my Linux/x86_64 box for size 10000:

  • With no randomization the timings for mmul1 and mmul2 are consistently the same, around 110ms, with mmul1 often having the upper hand.
  • With randomization, mmul1 is consistently much slower than mmul2, about 350ms to 120ms on average with very little noise.

On my Darwin/arm64 box for size 10000, I see similar results. Interestingly, mmul1 data is very noisy without randomization but still comparable with mmul2 on average. With randomization, mmul2 outperforms mmul1 about 450ms to 190ms. Again, mmul2 performance is similar with and without randomization.

Conclusion:

  • mmul2 uses non-branching operations and the timings are consistent with or without randomization.
  • mmul1 suffers greatly from branch mispredictions but is competitive with mmul2 when branches are perfectly predictible.
  • Overall, mmul2 is significantly better than mmul1 by a factor of at least 2 on both architectures for randomized data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog-library Library P-medium We may work on this issue if we find the time toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants