Skip to content

Conversation

@hargoniX
Copy link
Contributor

@hargoniX hargoniX commented Dec 2, 2025

This PR enables the specializer to also recursively specialize in some non trivial higher order situations.

The main motivation for this change is the upcoming changes to do notation by sgraf. In there he uses combinators such as

@[specialize, expose]
def List.newForIn {α β γ} (l : List α) (b : β) (kcons : α → (β → γ) → β → γ) (knil : β → γ) : γ :=
  match l with
  | []     => knil b
  | a :: l => kcons a (l.newForIn · kcons knil) b

in programs such as

def testing :=
  let x := 42;
  List.newForIn (β := Nat) (γ := Id Nat)
    [1,2,3]
    x
    (fun i kcontinue s =>
      let x := s;
      List.newForIn
        [i:10].toList x
        (fun j kcontinue s =>
          let x := s;
          let x := x + i + j;
          kcontinue x)
        kcontinue)
    pure

inspecting this IR right before we get to the specializer in the current compiler we get:

[Compiler.eagerLambdaLifting] size: 22
    def testing : Nat :=
      fun _f.1 _y.2 : Nat :=
        return _y.2;
      let x := 42;
      let _x.3 := 1;
      fun _f.4 i kcontinue s : Nat :=
        fun _f.5 j kcontinue s : Nat :=
          let _x.6 := Nat.add s i;
          let x := Nat.add _x.6 j;
          let _x.7 := kcontinue x;
          return _x.7;
        let _x.8 := 10;
        let _x.9 := Nat.sub _x.8 i;
        let _x.10 := Nat.add _x.9 _x.3;
        let _x.11 := 1;
        let _x.12 := Nat.sub _x.10 _x.11;
        let _x.13 := Nat.mul _x.3 _x.12;
        let _x.14 := Nat.add i _x.13;
        let _x.15 := @List.nil _;
        let _x.16 := List.range'TR.go _x.3 _x.12 _x.14 _x.15;
        let _x.17 := @List.newForIn _ _ _ _x.16 s _f.5 kcontinue;
        return _x.17;
      let _x.18 := 2;
      let _x.19 := 3;
      let _x.20 := @List.nil _;
      let _x.21 := @List.cons _ _x.19 _x.20;
      let _x.22 := @List.cons _ _x.18 _x.21;
      let _x.23 := @List.cons _ _x.3 _x.22;
      let _x.24 := @List.newForIn _ _ _ _x.23 x _f.4 _f.1;
      return _x.24 

Here the kcontinue higher order functions pose a special challenge because they delay the discovery of new specialization opportunities. Inspecting the IR after the current specializer (and a cleanup simp step) we get functions that look as follows:

 [simp] size: 7
      def List.newForIn._at_.testing.spec_0 i kcontinue l b : Nat :=
        cases l : Nat
        | List.nil =>
          let _x.1 := kcontinue b;
          return _x.1
        | List.cons head.2 tail.3 =>
          let _x.4 := Nat.add b i;
          let x := Nat.add _x.4 head.2;
          let _x.5 := List.newForIn._at_.testing.spec_0 i kcontinue tail.3 x;
          return _x.5 
  [simp] size: 14
      def List.newForIn._at_.List.newForIn._at_.testing.spec_1.spec_1 _x.1 l b : Nat :=
        cases l : Nat
        | List.nil =>
          return b
        | List.cons head.2 tail.3 =>
          fun _f.4 x.5 : Nat :=
            let _x.6 := List.newForIn._at_.List.newForIn._at_.testing.spec_1.spec_1 _x.1 tail.3 x.5;
            return _x.6;
          let _x.7 := 10;
          let _x.8 := Nat.sub _x.7 head.2;
          let _x.9 := Nat.add _x.8 _x.1;
          let _x.10 := 1;
          let _x.11 := Nat.sub _x.9 _x.10;
          let _x.12 := Nat.mul _x.1 _x.11;
          let _x.13 := Nat.add head.2 _x.12;
          let _x.14 := @List.nil _;
          let _x.15 := List.range'TR.go _x.1 _x.11 _x.13 _x.14;
          let _x.16 := List.newForIn._at_.testing.spec_0 head.2 _f.4 _x.15 b;
          return _x.16

Observe that the specializer decided to abstract over kcontinue instead of specializing further recursively. Thus this tight loop is now going through an indirect call.

This PR now changes the specializer somewhat fundamentally to handle situations like this. The most notable change is going to a fixpoint loop of:

  1. Specialize all current declarations in the worklist
  2. If a declaration
    • succeeded in specializing run the simplifier on it and put it back onto the worklist
    • if it didn't don't put it back onto the worklist anymore
  3. Put all newly generated specialisations on the worklist
  4. Recompute fixed parameters for the current SCC
  5. Repeat until the worklist is empty

Furthermore, declarations that were already specialized:

  • only consider fixedHO parameters for specialization, in order to avoid termination issues with repeated specialization and abstraction of type class parameters under binders
  • recursively specialized declarations only allow specialization if at least one of their fixedHO arguments is not a parameter itself. The reason for allowing this in first generation specialization is that we refrain from specializing inside the body of a declaration marked as @[specialize]. Thus we need to specialize them even if their arguments don't actually contain anything of interest in order to ensure that type classes etc. are correctly cleaned up within their bodies.

Closes #10924

@hargoniX hargoniX added the changelog-compiler Compiler, runtime, and FFI label Dec 2, 2025
@hargoniX
Copy link
Contributor Author

hargoniX commented Dec 2, 2025

!radar

@leanprover-radar
Copy link

leanprover-radar commented Dec 2, 2025

Benchmark results for fd5b87d against 0e83422 are in! @hargoniX

Major changes (72)
  • Init.Data.BitVec.Lemmas re-elab//instructions: -11.1G (-1.3%)
  • 🟥 Init.Data.BitVec.Lemmas//instructions: +11.6G (+7.8%)
  • 🟥 Init.Data.List.Basic re-elab//instructions: +9.5G (+5.8%)
  • 🟥 Init.Data.List.Sublist async//instructions: +889.8M (+7.4%)
  • 🟥 Init.Data.List.Sublist re-elab -j4 (watchdog rss)//instructions: +6.6G (+3.5%)
  • 🟥 Init.Data.List.Sublist re-elab -j4//instructions: +18.6G (+5.4%)
  • 🟥 Init.Prelude async//instructions: +858.2M (+7.0%)
  • 🟥 Std.Data.DHashMap.Internal.RawLemmas//instructions: +19.6G (+8.2%)
  • 🟥 Std.Data.Internal.List.Associative//instructions: +6.5G (+7.7%)
  • 🟥 big_beq//instructions: +640.5M (+5.6%)
  • 🟥 big_beq_rec//instructions: +1.5G (+7.4%)
  • 🟥 big_deceq//instructions: +136.3M (+2.6%)
  • 🟥 big_deceq_rec//instructions: +283.6M (+4.0%)
  • 🟥 big_do//instructions: +1.5G (+5.1%)
  • 🟥 big_match//instructions: +667.8M (+5.6%)
  • 🟥 big_match_partial//instructions: +1.1G (+6.5%)
  • 🟥 big_omega.lean MT//instructions: +1.9G (+7.2%)
  • 🟥 big_omega.lean//instructions: +1.8G (+7.2%)
  • 🟥 binarytrees.st//instructions: +6.8G (+11.6%)
  • 🟥 binarytrees//instructions: +6.8G (+11.5%)
  • 🟥 build//instructions: +41.0G (+0.3%)
  • 🟥 bv_decide_inequality.lean//instructions: +3.1G (+2.8%)
  • 🟥 bv_decide_large_aig.lean//instructions: +1.5G (+3.4%)
  • 🟥 bv_decide_mod//instructions: +7.4G (+3.3%)
  • 🟥 bv_decide_mul//instructions: +1.7G (+3.8%)
  • 🟥 bv_decide_realworld//instructions: +916.9M (+3.6%)
  • 🟥 bv_decide_rewriter.lean//instructions: +1.2G (+6.6%)
  • channel.lean//instructions: -718.5M (-1.8%)
  • 🟥 const_fold//instructions: +809.2M (+10.8%)
  • 🟥 deriv//instructions: +430.6M (+6.1%)
  • 🟥 grind_bitvec2.lean//instructions: +15.3G (+7.9%)
  • 🟥 grind_list2.lean//instructions: +4.7G (+7.6%)
  • 🟥 grind_ring_5.lean//instructions: +610.2M (+6.7%)
  • 🟥 hashmap.lean//instructions: +282.4M (+7.8%)
  • 🟥 identifier auto-completion//instructions: +4.7G (+6.1%)
  • 🟥 ilean roundtrip//instructions: +1.4G (+5.5%)
  • 🟥 iterators (elab)//instructions: +216.8M (+6.0%)
  • 🟥 iterators (interpreted)//instructions: +1.1G (+3.3%)
  • 🟥 lake build clean//instructions: +42.9G (+2.0%)
  • 🟥 lake build no-op//instructions: +223.8M (+3.7%)
  • 🟥 lake config elab//instructions: +89.1M (+2.8%)
  • lake config import//instructions: -27.4M (-1.7%)
  • lake config tree//instructions: -25.3M (-1.5%)
  • lake env//instructions: -29.9M (-1.8%)
  • 🟥 lake startup//instructions: +4.1M (+1.6%)
  • 🟥 language server startup with ileans//instructions: +1.2G (+5.5%)
  • 🟥 language server startup//instructions: +10.3M (+2.2%)
  • 🟥 liasolver//instructions: +261.6M (+6.6%)
  • 🟥 mut_rec_wf//instructions: +2.2G (+7.1%)
  • 🟥 nat_repr//instructions: +131.0M (+3.9%)
  • 🟥 omega_stress.lean async//instructions: +322.7M (+6.3%)
  • 🟥 parser//instructions: +3.2G (+6.4%)
  • 🟥 phashmap.lean//instructions: +544.3M (+5.6%)
  • 🟥 qsort//instructions: +1.1G (+7.0%)
  • 🟥 rbmap//instructions: +1.2G (+8.6%)
  • 🟥 rbmap_1//instructions: +1.0G (+7.3%)
  • 🟥 rbmap_10//instructions: +1.2G (+8.5%)
  • 🟥 rbmap_fbip//instructions: +666.6M (+9.7%)
  • 🟥 rbmap_library//instructions: +615.2M (+4.7%)
  • 🟥 reduceMatch//instructions: +1.1G (+6.2%)
  • 🟥 riscv-ast.lean//instructions: +10.0G (+7.7%)
  • 🟥 simp_arith1//instructions: +115.9M (+3.6%)
  • 🟥 simp_bubblesort_256//instructions: +1.1G (+7.5%)
  • 🟥 simp_congr//instructions: +1.4G (+7.4%)
  • 🟥 simp_local//instructions: +4.8G (+8.5%)
  • 🟥 simp_subexpr//instructions: +1.1G (+7.6%)
  • 🟥 tests/bench/ interpreted//instructions: +7.8G (+3.6%)
  • tests/compiler//sum binary sizes: -20.8M (-1.1%)
  • 🟥 treemap.lean//instructions: +527.5M (+2.4%)
  • 🟥 unionfind//instructions: +1.7G (+7.2%)
  • 🟥 workspaceSymbols with new ranges//instructions: +53.5M (+7.2%)
  • 🟥 workspaceSymbols//instructions: +1.4G (+5.5%)
Minor changes (3)
  • 🟥 iterators (compiled)//instructions: +4.0M (+0.8%)
  • 🟥 sigma iterator//instructions: +3.0G (+7.5%)
  • size/libleanshared.so//bytes: -514kiB (-0.3%)

@hargoniX
Copy link
Contributor Author

hargoniX commented Dec 2, 2025

!bench

@leanprover-radar
Copy link

leanprover-radar commented Dec 2, 2025

Benchmark results for 4a93cec against edcef51 are in! @Garmelon @hargoniX

Major changes (2)
  • 🟥 build//instructions: +41.7G (+0.3%)
  • 🟥 channel.lean//instructions: +1.8G (+4.7%)
Minor changes (2)
  • 🟥 iterators (elab)//instructions: +37.7M (+1.0%)
  • size/libleanshared.so//bytes: -486kiB (-0.2%)

@hargoniX
Copy link
Contributor Author

hargoniX commented Dec 2, 2025

!bench

@leanprover-radar
Copy link

leanprover-radar commented Dec 2, 2025

Benchmark results for 28d874c against edcef51 are in! @hargoniX

Major changes (1)
  • 🟥 build//instructions: +42.7G (+0.3%)
Minor changes (3)
  • 🟥 Init.Data.BitVec.Lemmas re-elab//instructions: +4.3G (+0.5%)
  • 🟥 channel.lean//instructions: +268.8M (+0.7%)
  • 🟥 iterators (elab)//instructions: +37.7M (+1.0%)

@github-actions github-actions bot added the toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN label Dec 3, 2025
@leanprover-community-bot
Copy link
Collaborator

leanprover-community-bot commented Dec 3, 2025

Mathlib CI status (docs):

  • ❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase edcef51434804e6432c02f42ef364cb13f9149b2 --onto 5bd331e85d9d110a29fb3367dbb21854010ffcbd. You can force Mathlib CI using the force-mathlib-ci label. (2025-12-03 13:32:42)
  • ✅ Mathlib branch lean-pr-testing-11479 has successfully built against this PR. (2025-12-05 11:42:10) View Log
  • ✅ Mathlib branch lean-pr-testing-11479 has successfully built against this PR. (2025-12-08 12:49:20) View Log

@leanprover-bot
Copy link
Collaborator

leanprover-bot commented Dec 3, 2025

Reference manual CI status:

  • ❗ Reference manual CI will not be attempted unless your PR branches off the nightly-with-manual branch. Try git rebase edcef51434804e6432c02f42ef364cb13f9149b2 --onto d3dda9f6d4428a906c096067ecb75e432afc4615. You can force reference manual CI using the force-manual-ci label. (2025-12-03 13:32:43)
  • ✅ Reference manual branch lean-pr-testing-11479 has successfully built against this PR. (2025-12-05 10:48:51) View Log
  • 🟡 Reference manual branch lean-pr-testing-11479 build against this PR didn't complete normally. (2025-12-05 10:49:55) View Log
  • ✅ Reference manual branch lean-pr-testing-11479 has successfully built against this PR. (2025-12-08 11:52:29) View Log
  • 🟡 Reference manual branch lean-pr-testing-11479 build against this PR didn't complete normally. (2025-12-08 11:53:32) View Log

@hargoniX
Copy link
Contributor Author

hargoniX commented Dec 3, 2025

!bench

@leanprover-radar
Copy link

leanprover-radar commented Dec 3, 2025

Benchmark results for 91927db against edcef51 are in! @hargoniX

Major changes (2)
  • 🟥 build//instructions: +50.0G (+0.4%)
  • 🟥 channel.lean//instructions: +835.2M (+2.2%)
Minor changes (2)
  • 🟥 iterators (elab)//instructions: +37.7M (+1.0%)
  • 🟥 size/all/.olean//bytes: +963kiB (+0.3%)

@hargoniX hargoniX marked this pull request as ready for review December 5, 2025 09:50
@hargoniX hargoniX requested a review from leodemoura as a code owner December 5, 2025 09:50
leanprover-community-mathlib4-bot added a commit to leanprover-community/batteries that referenced this pull request Dec 5, 2025
leanprover-community-mathlib4-bot added a commit to leanprover-community/mathlib4-nightly-testing that referenced this pull request Dec 5, 2025
leanprover-bot added a commit to leanprover/reference-manual that referenced this pull request Dec 5, 2025
@leanprover-bot leanprover-bot added the builds-manual CI has verified that the Lean Language Reference builds against this PR label Dec 5, 2025
@leanprover-community-bot leanprover-community-bot added the builds-mathlib CI has verified that Mathlib builds against this PR label Dec 5, 2025
leanprover-community-mathlib4-bot added a commit to leanprover-community/batteries that referenced this pull request Dec 8, 2025
leanprover-community-mathlib4-bot added a commit to leanprover-community/mathlib4-nightly-testing that referenced this pull request Dec 8, 2025
leanprover-bot added a commit to leanprover/reference-manual that referenced this pull request Dec 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

builds-manual CI has verified that the Lean Language Reference builds against this PR builds-mathlib CI has verified that Mathlib builds against this PR changelog-compiler Compiler, runtime, and FFI toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN

Projects

None yet

Development

Successfully merging this pull request may close these issues.

The specialization pass does not specialize for oversaturating arguments

5 participants