[llvm-project] Backport upstream version of lazy template loading #17722

devajithvs · 2025-02-14T12:38:19Z

This Pull request:

Use the upstream version of our downstream patch: root-project/llvm-project@a11e943

Ran a few benchmark tests locally; ctests are slightly slower on my machine (could just be noise, nothing concerning).

Will update LLVM tags later once everything is running and good.

This will allow better testing of this patch and decouple it from future LLVM 20 updates.

Changes or fixes:

Checklist:

tested changes locally
updated the docs (if necessary)

This PR fixes #

github-actions · 2025-02-14T16:00:08Z

Test Results

19 files 19 suites 5d 6h 15m 50s ⏱️
2 715 tests 2 714 ✅ 0 💤 1 ❌
49 872 runs 49 869 ✅ 0 💤 3 ❌

For more details on these failures, see this check.

Results for commit deb3e7a.

♻️ This comment has been updated with latest results.

devajithvs · 2025-02-24T13:16:51Z

Results of
/usr/bin/time --verbose bin/root -l -b -q -e 'std::vector<int> vec = {1, 2, 3, 4, 5};'

This PR:

Command exited with non-zero status 255
        Command being timed: "bin/root -l -b -q -e std::vector<int> vec = {1, 2, 3, 4, 5};"
        User time (seconds): 0.12
        System time (seconds): 0.07
        Percent of CPU this job got: 100%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.20
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 321168
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 17848
        Voluntary context switches: 118
        Involuntary context switches: 4
        Swaps: 0
        File system inputs: 0
        File system outputs: 8
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 255

Master:

Command exited with non-zero status 255
        Command being timed: "bin/root -l -b -q -e std::vector<int> vec = {1, 2, 3, 4, 5};"
        User time (seconds): 0.12
        System time (seconds): 0.06
        Percent of CPU this job got: 100%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.18
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 228812
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 15272
        Voluntary context switches: 119
        Involuntary context switches: 4
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 255

Without upstream/downstream patch:

Command exited with non-zero status 255
        Command being timed: "bin/root -l -b -q -e std::vector<int> vec = {1, 2, 3, 4, 5};"
        User time (seconds): 0.18
        System time (seconds): 0.08
        Percent of CPU this job got: 100%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.27
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 375760
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 22146
        Voluntary context switches: 117
        Involuntary context switches: 4
        Swaps: 0
        File system inputs: 0
        File system outputs: 8
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 255

vgvassilev · 2025-02-24T13:36:07Z

That’s a bummer, we screwed up something badly upstream…

vgvassilev · 2025-02-24T15:27:34Z

@ChuanqiXu9, do you see something obvious that went wrong here? In our current workflows we deserialize 100 MB more with our upstream patch...

core/dictgen/src/Scanner.cxx

hahnjo · 2025-02-25T10:30:39Z

Repeating the test of #14495 (comment) with hsimple.C shows that (on my machine) master uses 290MB, reverting our downstream version of D41416 uses 425MB, and this PR with the upstream patch uses 414MB. Unfortunately, the performance of the patch on the ROOT side was not tracked during upstream review / changes...

hahnjo · 2025-03-07T13:01:13Z

Hi @ChuanqiXu9 @vgvassilev, I had another look at the patch and found four places where things seem wrong. At least two of them are needed for ROOT, and with the last commit I pushed to this PR memory usage returns to the same level as master in my measurements.

1. Handling of inner template arguments in TemplateArgumentHasher: While debugging I noticed that all template arguments that were instantiations of std::pair hashed to the same value, regardless of their own template arguments. This is quite bad for the STL, where I think maps, for example, are internally implemented using nodes of std::pair<Key, Value>. This can be fixed in the same manner as ODRHash handles it:

diff --git a/interpreter/llvm-project/clang/lib/Serialization/TemplateArgumentHasher.cpp b/interpreter/llvm-project/clang/lib/Serialization/TemplateArgumentHasher.cpp
index fb62454643..52c3e5ed1d 100644
--- a/interpreter/llvm-project/clang/lib/Serialization/TemplateArgumentHasher.cpp
+++ b/interpreter/llvm-project/clang/lib/Serialization/TemplateArgumentHasher.cpp
@@ -196,6 +196,21 @@ void TemplateArgumentHasher::AddDecl(const Decl *D) {
   }
 
   AddDeclarationName(ND->getDeclName());
+
+  // If this was a specialization we should take into account its template
+  // arguments. This helps to reduce collisions coming when visiting template
+  // specialization types (eg. when processing type template arguments).
+  ArrayRef<TemplateArgument> Args;
+  if (auto *CTSD = dyn_cast<ClassTemplateSpecializationDecl>(D))
+    Args = CTSD->getTemplateArgs().asArray();
+  else if (auto *VTSD = dyn_cast<VarTemplateSpecializationDecl>(D))
+    Args = VTSD->getTemplateArgs().asArray();
+  else if (auto *FD = dyn_cast<FunctionDecl>(D))
+    if (FD->getTemplateSpecializationArgs())
+      Args = FD->getTemplateSpecializationArgs()->asArray();
+
+  for (auto &TA : Args)
+    AddTemplateArgument(TA);
 }
 
 void TemplateArgumentHasher::AddQualType(QualType T) {
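
To make the collision concrete, here is a minimal sketch in plain C++. It is not Clang code: `ToyDecl`, `combine`, `hashNameOnly` and `hashWithArgs` are hypothetical names, and `std::hash` stands in for whatever hashing the real `TemplateArgumentHasher` uses. It only assumes that a specialization can be modelled as a name plus a list of argument declarations.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a (possibly specialized) declaration.
struct ToyDecl {
  std::string Name;
  std::vector<ToyDecl> TemplateArgs; // empty for non-template declarations
};

// Simple hash combiner (assumption: any reasonable mixer would do here).
inline std::size_t combine(std::size_t Seed, std::size_t Value) {
  return Seed ^ (Value + 0x9e3779b9u + (Seed << 6) + (Seed >> 2));
}

// Hashes only the declaration name, so every pair<...> gets the same value.
std::size_t hashNameOnly(const ToyDecl &D) {
  return std::hash<std::string>{}(D.Name);
}

// Also folds in the template arguments, recursively.
std::size_t hashWithArgs(const ToyDecl &D) {
  std::size_t H = std::hash<std::string>{}(D.Name);
  for (const ToyDecl &Arg : D.TemplateArgs)
    H = combine(H, hashWithArgs(Arg));
  return H;
}
```

With this model, `pair<int, int>` and `pair<float, double>` built as `ToyDecl` values hash identically under `hashNameOnly`, while `hashWithArgs` separates them (barring an unlucky collision).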

2. In ASTReader::CompleteRedeclChain there is a comment saying "For partitial specialization, load all the specializations for safety." which, after fixing 1., is the remaining reason why the upstream version is loading many more templates. The fix I applied is:

diff --git a/interpreter/llvm-project/clang/lib/Serialization/ASTReader.cpp b/interpreter/llvm-project/clang/lib/Serialization/ASTReader.cpp
index 3d325774ba..d65f329ae8 100644
--- a/interpreter/llvm-project/clang/lib/Serialization/ASTReader.cpp
+++ b/interpreter/llvm-project/clang/lib/Serialization/ASTReader.cpp
@@ -7723,14 +7723,8 @@ void ASTReader::CompleteRedeclChain(const Decl *D) {
     }
   }
 
-  if (Template) {
-    // For partitial specialization, load all the specializations for safety.
-    if (isa<ClassTemplatePartialSpecializationDecl,
-            VarTemplatePartialSpecializationDecl>(D))
-      Template->loadLazySpecializationsImpl();
-    else
-      Template->loadLazySpecializationsImpl(Args);
-  }
+  if (Template)
+    Template->loadLazySpecializationsImpl(Args);
 }
 
 CXXCtorInitializer **

basically reverting what was done as part of the original D41416. Do you remember why that change was done during the upstream review process?
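
The difference between the two call forms in the diff above can be pictured with a toy model. This is a sketch under stated assumptions, not the Clang implementation: `ToyTemplate`, `OnDisk` and `Loaded` are hypothetical, and the model simply assumes that serialized specializations are keyed by a hash of their template argument list.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <vector>

// Toy stand-in for one template's table of serialized specializations:
// hash of the template argument list -> ID of a specialization on disk.
struct ToyTemplate {
  std::multimap<std::size_t, int> OnDisk;
  std::vector<int> Loaded;

  // With a hash, deserialize only the matching entries.
  // Without one, deserialize everything (the "for safety" path).
  void loadLazySpecializations(
      std::optional<std::size_t> ArgsHash = std::nullopt) {
    for (auto It = OnDisk.begin(); It != OnDisk.end();) {
      if (!ArgsHash || It->first == *ArgsHash) {
        Loaded.push_back(It->second);
        It = OnDisk.erase(It);
      } else {
        ++It;
      }
    }
  }
};
```

In this model, a lookup with a hash leaves unrelated specializations on disk, whereas the argument-less call pulls in the whole table, which is the kind of extra deserialization the memory numbers in this thread point to.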

3. There's a similar comment in RedeclarableTemplateDecl::loadLazySpecializationsImpl about loading all specializations, which also seems dubious to me.
4. TemplateArgumentHasher has logic to "bail out". While I agree that this is sound to do, it doesn't make sense to me to return a fixed value if one of the internal parts cannot be hashed. Shouldn't we just be able to skip that part and carry on with the hashing process? If that causes a collision then so be it, but at least we are not causing collisions for any template argument list that has a part we cannot hash.
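
The trade-off in the last point can be sketched with a toy hasher. Everything here is hypothetical (`Part`, `combine`, `hashBailOut`, `hashSkip`); an empty `std::optional` models a part the hasher cannot handle, and the fixed value `0` is an assumption chosen for illustration, not the value Clang uses.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// std::nullopt models a part of the argument list that cannot be hashed.
using Part = std::optional<std::string>;

inline std::size_t combine(std::size_t Seed, std::size_t Value) {
  return Seed ^ (Value + 0x9e3779b9u + (Seed << 6) + (Seed >> 2));
}

// Strategy A: a single unhashable part collapses the whole list
// to one fixed value.
std::size_t hashBailOut(const std::vector<Part> &Parts) {
  std::size_t H = 0;
  for (const Part &P : Parts) {
    if (!P)
      return 0; // fixed value shared by every list that bails out
    H = combine(H, std::hash<std::string>{}(*P));
  }
  return H;
}

// Strategy B: skip the unhashable part and keep hashing the rest.
std::size_t hashSkip(const std::vector<Part> &Parts) {
  std::size_t H = 0;
  for (const Part &P : Parts)
    if (P)
      H = combine(H, std::hash<std::string>{}(*P));
  return H;
}
```

Under strategy A, the lists `{"int", <unhashable>}` and `{"double", <unhashable>}` land in the same bucket; under strategy B they are still told apart by their hashable parts. Either way the hash only narrows down candidates, so a collision costs extra loading rather than correctness.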

I pushed my (suspected) solutions for the first two points to this PR, and all ROOT tests seem to pass. I can submit patches for any of the above points upstream if you want, in the way that is most convenient (with a single PR, maybe that's easier to test?). We should only note that the changes cannot be backported for LLVM 20 because the hashing changes cause existing on-disk modules to break.

vgvassilev · 2025-03-07T13:59:31Z

@hahnjo, great job. That part was not clear to me either but since the patch departed quite far from the original state I decided I must be missing something. If we get to the same levels of memory and runtime consumption I can propose the next two steps:

1. Pipe this through cmssw (cc: @smuzaffar)
2. Open a PR against LLVM and we can ask Google to run it on their infrastructure. If that works we land it and you become famous ;)

hahnjo · 2025-03-07T14:12:23Z

Ok, but with which changes? The PR currently has 1&2, but I think we need to get agreement on which modifications we actually want. And then we should first test ourselves (in llvm-project) before inviting external tests, otherwise that's just a waste of resources.

vgvassilev · 2025-03-07T14:15:36Z

> Ok, but with which changes? The PR currently has 1&2, but I think we need to get agreement on which modifications we actually want. And then we should first test ourselves (in llvm-project) before inviting external tests, otherwise that's just a waste of resources.

Personally I want the patch as is if it gets us the memory footprint of what we have in the master. I do not see a reason to split it causing more confusion upstream.

hahnjo · 2025-03-28T13:19:07Z

This seems to look good on our side. @smuzaffar would it be possible to run this through CMSSW testing? Thanks in advance!

smuzaffar · 2025-03-28T15:16:58Z

> This seems to look good on our side. @smuzaffar would it be possible to run this through CMSSW testing? Thanks in advance!

cmssw tests are running via cms-sw#221

smuzaffar · 2025-04-01T08:27:00Z

cmssw tests passed cms-sw#221 (comment)

hahnjo · 2025-04-07T14:53:38Z

Latest numbers:

`master`

$ /usr/bin/time --verbose bin/root -l -b -q -e 'std::vector<int> vec = {1, 2, 3, 4, 5};'

Command exited with non-zero status 255
        Command being timed: "bin/root -l -b -q -e std::vector<int> vec = {1, 2, 3, 4, 5};"
        User time (seconds): 0.10
        System time (seconds): 0.06
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.16
        Maximum resident set size (kbytes): 212928

`devajithvs:dev.lazytemplate`

$ /usr/bin/time --verbose bin/root -l -b -q -e 'std::vector<int> vec = {1, 2, 3, 4, 5};'

Command exited with non-zero status 255
        Command being timed: "bin/root -l -b -q -e std::vector<int> vec = {1, 2, 3, 4, 5};"
        User time (seconds): 0.09
        System time (seconds): 0.06
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.15
        Maximum resident set size (kbytes): 214500

vgvassilev · 2025-04-07T15:13:57Z

Slightly worse in memory but tolerable.

vgvassilev

LGTM!

devajithvs self-assigned this Feb 14, 2025

devajithvs added the clean build Ask CI to do non-incremental build on PR label Feb 14, 2025

devajithvs force-pushed the dev.lazytemplate branch from 6c26e5f to 72afea0 Compare February 14, 2025 13:00

devajithvs marked this pull request as ready for review February 24, 2025 13:17

devajithvs requested review from bellenot, vgvassilev, dpiparo and pcanal as code owners February 24, 2025 13:17

devajithvs requested a review from hahnjo February 24, 2025 13:17

vgvassilev reviewed Feb 24, 2025

hahnjo force-pushed the dev.lazytemplate branch 2 times, most recently from 6ab17c5 to de0d97c Compare March 6, 2025 21:14

hahnjo mentioned this pull request Mar 26, 2025

[Serialization] Fix lazy template loading llvm/llvm-project#133057

Draft

hahnjo force-pushed the dev.lazytemplate branch 2 times, most recently from 77dcc83 to f4e61d3 Compare March 26, 2025 12:49

hahnjo closed this Mar 26, 2025

hahnjo reopened this Mar 26, 2025

hahnjo closed this Mar 27, 2025

hahnjo reopened this Mar 27, 2025

hahnjo force-pushed the dev.lazytemplate branch 2 times, most recently from 599e306 to 1b65dcd Compare March 28, 2025 10:20

dpiparo approved these changes Mar 31, 2025

vgvassilev and others added 5 commits April 7, 2025 14:16

TraverseDecl might deserialize

2a0bc6a

[llvm-project] Revert our version of D41416

a4b7450

[llvm-project] Backport upstream version of lazy template loading

950f33e

llvm/llvm-project@20e9049 Update Cling's ExternalASTSourceWrapper to forward the two new LoadExternalSpecializations functions.

[llvm-project] Fix lazy template loading

7f96d50

Backport upstream PR llvm/llvm-project#133057

[llvm-project] Bump to tag ROOT-llvm18-20250407-01

deb3e7a

hahnjo force-pushed the dev.lazytemplate branch from 1b65dcd to deb3e7a Compare April 7, 2025 12:27

vgvassilev approved these changes Apr 7, 2025

hahnjo merged commit 7f390ce into root-project:master Apr 8, 2025
19 of 24 checks passed

couet mentioned this pull request Apr 16, 2025

[skip-ci][doc] force full build of root #18420

Merged
