Integrated Distributed ThinLTO (DTLTO): Design Overview #126654

Open · wants to merge 15 commits into main

Conversation

@bd1976bris (Collaborator) commented Feb 11, 2025

Initial DTLTO Support

This PR introduces initial support for Integrated Distributed ThinLTO (DTLTO). DTLTO was previously discussed in an RFC and during the LTO roundtable discussion at the October US 2024 LLVM conference. PlayStation has offered this feature as a proprietary technology for some time, and we would like to see support in LLVM.

Overview of DTLTO

DTLTO enables the distribution of backend ThinLTO compilations via external distribution systems, such as Incredibuild. Existing support for distributing ThinLTO compilations typically involves separate thin-link (--thinlto-index-only), backend compilation, and link steps coordinated by a modern build system, such as Bazel. This "Bazel-style" distributed ThinLTO requires a modern build system because it must handle the dynamic dependencies specified in the summary index file shards. However, adopting a modern build system can be prohibitive for users with established build infrastructure.

In contrast, DTLTO manages distribution within LLVM during the traditional link step. This approach means that DTLTO is usable with any build process that supports in-process ThinLTO.
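
As a rough usage sketch (using the option spellings discussed later in this PR; the exact invocation is illustrative, and the clang driver can forward these options to the linker):

  ld.lld main.o second.o -o a.out \
      --thinlto-distributor=/path/to/distributor.py \
      --thinlto-remote-compiler=/path/to/clang

Here main.o and second.o are ThinLTO bitcode objects. The linker performs the thin-link as usual, hands the backend compilation jobs to the named distributor process, and then links the resulting native objects, all within the one linker invocation.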

Documentation and Resources

Features of This Initial Commit

This commit provides a minimal but functional implementation of DTLTO, which will be expanded in subsequent commits. The current implementation includes:

  • COFF and ELF support.
  • Support for bitcode members in thin archives.
  • Basic support for distributing backend ThinLTO compilations.
  • A JSON interface that allows new distribution systems to be supported without modifying LLVM.

The goal of this initial commit is to demonstrate what will be required to support DTLTO while providing useful minimal functionality. Hopefully, having a concrete PR will facilitate discussion and review of the feature.
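
To give a flavor of the JSON interface, the sketch below shows the general shape of a job description that a distributor process might consume: a common command-line template plus, for each backend compilation, its arguments, inputs (module and summary index shard), and expected outputs. The field names and layout are illustrative only, loosely inferred from the code under review (the "args" template and the <stem>.<task>.<uid>.native.o output naming); they are not the final schema:

  {
    "common": {
      "args": ["/path/to/clang", "-c", "-x", "ir"]
    },
    "jobs": [
      {
        "args": ["main.bc", "-fthinlto-index=main.bc.thinlto.bc",
                 "-o", "main.1.abc123.native.o"],
        "inputs": ["main.bc", "main.bc.thinlto.bc"],
        "outputs": ["main.1.abc123.native.o"]
      }
    ]
  }

A distributor is free to execute these jobs however it likes (locally, or via a system such as Incredibuild or sn-dbs.py), which is what allows new distribution systems to be supported without modifying LLVM.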

Performance

We have access to a large farm of computers on Windows. For a link of clang.exe on a modest Windows development machine (AMD64, 16 cores, 64 GB RAM), DTLTO (via sn-dbs.py) was approximately four times as fast as multi-threaded in-process ThinLTO.

To estimate the overhead from DTLTO vs in-process ThinLTO, we measured the difference in the time taken to link Clang with in-process ThinLTO using one thread per core, and DTLTO using one local process per core. On both Windows and Linux the overhead was approximately 6%.

Note that, to facilitate review, this PR elides performance optimizations where possible.

Planned Future Enhancements

The following features will be addressed in future commits:

  • Support for the ThinLTO cache.
  • Support for (non-thin) archives/libraries containing bitcode members.
  • Support for more LTO configuration states, e.g., basic block sections.
  • Performance improvements. For example, improving the performance of temporary file removal.

Discussion Points

  • Feature Name: The DTLTO name could potentially cause confusion with the existing Bazel-style distributed ThinLTO. At the LLVM roundtable discussion no one objected to the name, but we remain open to suggestions.
  • Backend Compilation Configuration: Currently, Clang is invoked to perform the backend compilations, and a minimal set of options is added to the Clang command line to ensure that the codegen is reasonable (for the testing we have done so far). However, it would be good to find a scalable way of matching the code-generation state of the invoked external tool to the code-generation state that an in-process ThinLTO backend would use.
  • Clang Error Handling: There is some precedent for compiler drivers handling options that only apply to specific linkers. Should Clang emit an error if DTLTO options are used and the linker isn't LLD?

Other approaches

We have experimented with other approaches for implementing DTLTO. In particular, we have explored:

  • Not using a new ThinLTO backend.
  • Various ways to handle (non-thin) archives.
  • Use of dynamic library plugins instead of processes.

We have prepared another branch to demonstrate some of these ideas: integrated-DTLTO-no-backend

List of Child PRs:

(I intend to update this section as new PRs are filed.)

Core LLVM functionality: PR #127749

@llvmbot added labels on Feb 11, 2025: clang, clang:driver, lld, lld:COFF, lld:ELF, lld:MachO, lld:wasm, platform:windows, LTO, llvm:support, llvm:transforms

github-actions bot commented Feb 11, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@bd1976bris (Collaborator, Author)

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

I believe this failure is OK, as I have followed the (non-standard) formatting that the code for the other ThinLTO backends uses.

@bd1976bris (Collaborator, Author)

Some high-level comments:

  • Should we just call it distributedLTO, or DLTO? It feels like we can drop the "thin" part for less typing, and from the user's point of view, using the ThinLTO infrastructure is just an implementation detail.

Thanks! I'll discuss this and get back to you tomorrow. It's worth noting that both DTLTO and DLTO have the potential to be confusing, given the existence of the current "Bazel-style" (--thinlto-index-only) distribution for ThinLTO.

  • Are you going to implement some ways to speed up archives, or will that come as follow-ups? During previous discussion at the dev meeting, there were a few suggestions for how to do it without extracting archives.

We plan to do that as follow-up work. We will eventually want to support archives: the point of this feature is to make it easy for a user to enable distribution, so we don't want restrictions such as "archives are not supported". I think it would be best to start with an RFC-like discussion thread; we will start preparing that.

Thanks to everyone who contributed ideas at the roundtable.

@bd1976bris (Collaborator, Author)

Some high-level comments:

  • Should we just call it distributedLTO, or DLTO? It feels like we can drop the "thin" part for less typing, and from the user's point of view, using the ThinLTO infrastructure is just an implementation detail.

Apologies for the slow reply. We would like to keep the "Thin"/"T" in the name. This is clearer, as we currently only plan to distribute the ThinLTO part and have no plans to distribute the FullLTO part. Making the name more general could confuse users, who might then expect that in, e.g., a mixed Thin+Full link, the Full bitcode part would also be distributed.

@cachemeifyoucan (Collaborator) commented Feb 20, 2025

This is clearer, as we currently only plan to distribute the ThinLTO part and have no plans to distribute the FullLTO part. Making the name more general could confuse users, who might then expect that in, e.g., a mixed Thin+Full link, the Full bitcode part would also be distributed.

I don't have a strong opinion on this, but I have basically the same concerns with completely opposite conclusions. To me, "distributed ThinLTO" makes you think there is a distributed full LTO, while just calling it "distributed LTO" would eliminate that confusion. Distributed LTO is by nature based on ThinLTO infrastructure, but that doesn't need to be exposed. Wouldn't it be cleaner for the LTO option to be Full/Thin/Distributed? You can still keep DTLTO, for DisTributed LTO.

@bd1976bris (Collaborator, Author)

I don't have a strong opinion on this, but I have basically the same concerns with completely opposite conclusions. To me, "distributed ThinLTO" makes you think there is a distributed full LTO, while just calling it "distributed LTO" would eliminate that confusion. Distributed LTO is by nature based on ThinLTO infrastructure, but that doesn't need to be exposed.

Accepted. I think it might be worth appealing to authority here. I wonder whether @MaskRay or @teresajohnson have an opinion?

Wouldn't it be cleaner for the LTO option to be Full/Thin/Distributed?

Sorry, I don't entirely understand this bit; could you expand on it? Are you envisioning an interface like:
clang -flto -> FullLTO
clang -flto=thin -> ThinLTO
clang -flto=distributed -> DTLTO

You can still keep DTLTO, for DisTributed LTO.

:)

@teresajohnson (Contributor)

I don't have a strong opinion on this, but I have basically the same concerns with completely opposite conclusions. To me, "distributed ThinLTO" makes you think there is a distributed full LTO, while just calling it "distributed LTO" would eliminate that confusion. Distributed LTO is by nature based on ThinLTO infrastructure, but that doesn't need to be exposed.

Accepted. I think it might be worth appealing to authority here. I wonder whether @MaskRay or @teresajohnson have an opinion?

The naming is to distinguish "distributed" vs. "in-process" ThinLTO. And now, with this, I guess there is "integrated" distributed ThinLTO vs. "non-integrated" distributed ThinLTO. But they are all flavors of ThinLTO.

(Also, there is support in LLVM for parallel codegen for Full LTO, so I suppose it isn't impossible to distribute that.)

Wouldn't it be cleaner for the LTO option to be Full/Thin/Distributed?

I don't love that idea, because Distributed is not separate from ThinLTO; it is just a sub-type, i.e., the mechanism for invoking the backends.

Sorry, I don't entirely understand this bit; could you expand on it? Are you envisioning an interface like: clang -flto -> FullLTO, clang -flto=thin -> ThinLTO, clang -flto=distributed -> DTLTO?

You can still keep DTLTO, for DisTributed LTO.

:)

I'm fine with DTLTO as a shorthand for "integrated distributed ThinLTO".

BTW, thanks for sending the LLVM patch; I will review it tonight or, more likely, tomorrow.

@romanova-ekaterina (Contributor)

I don't have a strong opinion on this, but I have basically the same concerns with completely opposite conclusions. To me, "distributed ThinLTO" makes you think there is a distributed full LTO, while just calling it "distributed LTO" would eliminate that confusion. Distributed LTO is by nature based on ThinLTO infrastructure, but that doesn't need to be exposed.

Accepted. I think it might be worth appealing to authority here. I wonder whether @MaskRay or @teresajohnson have an opinion?

Wouldn't it be cleaner for the LTO option to be Full/Thin/Distributed?

Sorry, I don't entirely understand this bit; could you expand on it? Are you envisioning an interface like: clang -flto -> FullLTO, clang -flto=thin -> ThinLTO, clang -flto=distributed -> DTLTO?

You can still keep DTLTO, for DisTributed LTO.

:)

We prefer to keep the DTLTO name. One reason is to distinguish our "integrated" distributed approach from the "non-integrated" distributed approach that has been supported for a while. Another reason is to avoid confusion: we have been using this acronym for a while, referring to it in RFCs and in a couple of lightning talks at developer conferences. When we had a roundtable at the last LLVM developers' conference, we brought up the naming topic one more time, and we all agreed to keep the DTLTO name.

@romanova-ekaterina (Contributor)

I'm fine with DTLTO as a shorthand for "integrated distributed ThinLTO".

Great! I'm glad that you support this acronym too. :)

BTW, thanks for sending the LLVM patch; I will review it tonight or, more likely, tomorrow.

Teresa, when reviewing, could you please focus on the design/idea rather than doing a full-fledged code review? In a day or two we will submit another PR with a "no-backend" DTLTO implementation. We are putting the final touches on that PR now.

The no-backend DTLTO implementation has some important benefits/advantages. So, at this time, it will be most important to understand both designs (i.e., the current implementation with a DTLTO backend that Ben submitted, and the alternative "no DTLTO backend" implementation that we will submit a couple of days from now), rather than focusing on implementation details/nitpicks of this particular PR.

I will do my best to explain the differences between the two designs at the time of submission.

Hopefully, this will help us choose the best design to use upstream, or potentially produce a hybrid solution that takes the best ideas from both designs.

@teresajohnson (Contributor)

I'm fine with DTLTO as a shorthand for "integrated distributed ThinLTO".

Great! I'm glad that you support this acronym too. :)

BTW, thanks for sending the LLVM patch; I will review it tonight or, more likely, tomorrow.

Teresa, when reviewing, could you please focus on the design/idea rather than doing a full-fledged code review? In a day or two we will submit another PR with a "no-backend" DTLTO implementation. We are putting the final touches on that PR now.

The no-backend DTLTO implementation has some important benefits/advantages. So, at this time, it will be most important to understand both designs (i.e., the current implementation with a DTLTO backend that Ben submitted, and the alternative "no DTLTO backend" implementation that we will submit a couple of days from now), rather than focusing on implementation details/nitpicks of this particular PR.

I will do my best to explain the differences between the two designs at the time of submission.

Hopefully, this will help us choose the best design to use upstream, or potentially produce a hybrid solution that takes the best ideas from both designs.

Thanks for the heads-up. So I should not do a detailed code review of PR #127749? Is there more info on what you mean by "no-backend DTLTO"?

@romanova-ekaterina (Contributor)

  • Are you going to implement some ways to speed up archives, or will that come as follow-ups? During previous discussion at the dev meeting, there were a few suggestions for how to do it without extracting archives.

Definitely.

The main idea of the integrated distributed ThinLTO approach is to make it seamless for the user to switch from "in-process" ThinLTO to distributed ThinLTO by simply changing (or adding) one option on the command line. No other changes should be required in the user's makefiles/build scripts. Ease of use will ensure acceptance of this DTLTO project by a wider audience.

So, yes, archives will be supported.

The current PR (with a DTLTO backend) doesn't support regular archives, but there will be a requirement to support them later. Adding support for regular archives now would increase the size of the initial PR, which is already huge. Note: archive support cannot be done inside the backend, because the module_id needs to be changed (so it can only be done before control reaches the LTO frontend). As a result, for this current PR, archive support would have to be done outside the DTLTO backend.

In a couple of days, we will submit an alternative PR with a "no DTLTO backend" implementation. Though at first sight it seems logical to use a separate DTLTO backend, we think that a "no DTLTO backend" implementation will be simpler, cleaner, and, more importantly, will leave a solid foundation for important performance enhancements that we are planning to add in the future. These performance enhancements would be impossible to implement within a DTLTO backend.

The "no DTLTO backend" PR that we will present a few days from now will support archives from the very beginning. The initial design is as follows: ONLY the archive members that are part of the imports will be saved to disk; all other members will NOT be saved/unpacked. Note: this is a "poor man's" solution to keep the size of the initial PR smaller. The obvious downside of this approach is that every link invocation will save the necessary archive members to disk, so they will be written again and again on each new invocation and will not be reused. Saved archive members will not be shared between different DTLTO processes that use the same archives.

Our downstream DTLTO implementation at Sony is different: we convert regular archives into thin archives and save them in the %TEMP% directory. If the thin archive members are determined to be "not stale", they will be reused by different DTLTO processes that use the same archives.

We will discuss upstream in the future what the best way to handle archives would be. But our current thinking is that it will be a hybrid of these two solutions: (a) save to disk only the members that are needed; (b) share unpacked archive members between different DTLTO invocations, assuming that they are not "stale". We already have a patch for this.

@romanova-ekaterina (Contributor)

Thanks for the heads-up. So I should not do a detailed code review of PR #127749? Is there more info on what you mean by "no-backend DTLTO"?

Yes, I think for now it will be better to understand the design of the current code review that Ben submitted, but not to do a detailed code review just yet. In a few days we will propose a new PR with a different implementation of integrated DTLTO, but without the use of a "DTLTO backend". That's why we call it internally "no-backend DTLTO".

Once the design ideas of both PRs are well understood, we can discuss and decide upstream which solution is better.

I know it looks a bit strange that the same company submits two different solutions upstream, but we couldn't come to a conclusion internally, so it was decided to discuss this issue externally (upstream).

Though it might require additional effort to understand both designs, discussing two different solutions upstream with experts (notably you) will lead to choosing the best one (or eventually lead us to a hybrid of both solutions).

sys::path::append(ObjFilePath, sys::path::stem(ModulePath) + "." +
                               itostr(Task) + "." + UID + ".native.o");

Job &J = Jobs[Task - 1]; /* Task 0 is reserved */
@MaskRay (Member):

IIUC, the - 1 here should be - RegularLTO.ParallelCodeGenParallelismLevel.

@bd1976bris (Collaborator, Author):

Yes, I broke this for more than one LTO partition whilst hastily simplifying things. Thanks! Fixed now.


// Common command line template.
JOS.attributeArray("args", [&]() {
  JOS.value(RemoteOptTool);
@MaskRay (Member) commented Feb 23, 2025:

I was confused by the name --thinlto-remote-opt-tool=. This is actually clang, not opt. Perhaps thinlto-remote-compiler, thinlto-remote-compile-tool (long), or thinlto-remote-cc (shortest)?

+        Args.MakeArgString("--thinlto-remote-opt-tool=" +
+                           Twine(ToolChain.getDriver().getClangProgramPath())));

@bd1976bris (Collaborator, Author):

Would you accept thinlto-remote-codegen-tool? This is so that the option name doesn't imply that the invoked tool is a compiler. One of the possibilities that we might want to explore is invoking LLD to do the backend compilations on the remote machines. Please see: #126654 (comment).

@bd1976bris (Collaborator, Author):

After discussion on the linked comment thread I have adopted your suggestion of --thinlto-remote-compiler.

Twine(ToolChain.getDriver().getClangProgramPath())));

for (auto A : Args.getAllArgValues(options::OPT_Xthinlto_distributor_EQ))
CmdArgs.push_back(Args.MakeArgString("-mllvm=-thinlto-distributor-arg=" + A));
@MaskRay (Member):

I am curious why we don't use a regular linker option. In lld, you could use getStrings to read a list option, e.g.

ctx.arg.passPlugins = args::getStrings(args, OPT_load_pass_plugins)

However, introducing a new linker option requires changes to all LLD ports and llvm-lto. Therefore, perhaps make --thinlto-remote-opt-tool a cl::opt as well?

@bd1976bris (Collaborator, Author) commented Feb 25, 2025:

As you already worked out, I used cl::opt options to minimize the changes to the LLD ports and llvm-lto. The syntax is verbose, but LLD is usually invoked via the compiler driver, so the verbose syntax is not exposed. I'm happy to use a cl::opt for this. However, I would like to retain the LLD option for the COFF port, where LLD is often invoked directly. Does that sound OK?

@bd1976bris (Collaborator, Author):

Having looked at this again, I think your suggestion of making --thinlto-remote-opt-tool a cl::opt makes sense. I have updated the code to match. Please take a look.


// Compute a thin archive member full file path.
static std::string
computeThinArchiveMemberFullPath(const StringRef modulePath,
@MaskRay (Member):

I wonder if the two new functions can be combined. Perhaps name the new one dtltoGetMemberPathIfThinArchive. What if it is a non-thin archive? Shall we create a temporary file?

if (!ctx.arg.dtltoDistributor.empty() && !archiveName.empty())
  path = dtltoGetMemberPathIfThinArchive(...)

@bd1976bris (Collaborator, Author) commented Feb 25, 2025:

I wonder if the two new functions can be combined. Perhaps name the new one dtltoGetMemberPathIfThinArchive.

if (!ctx.arg.dtltoDistributor.empty() && !archiveName.empty())
  path = dtltoGetMemberPathIfThinArchive(...)

I believe the simplification suggestions are viable. I also think that I can remove the this->mb = mbref; line (and its accompanying extensive comment), since that was only necessary for emitting empty summary index files, which we don't need for DTLTO.

I'm also not comfortable with isThinArchive relying on checking the magic number at the start of the archive file to determine whether it is a thin archive. I'm considering recording the archive type somewhere (perhaps in InputFile) so that this check becomes unnecessary. I'll test these improvements and put them up tomorrow.

What if it is a non-thin archive? Shall we create a temporary file?

Non-thin archives with bitcode members are important for PlayStation. A reasonable implementation would be to emit a temporary file into a cache (since libraries are often external to a project and rarely modified) and use that temporary file for DTLTO. Currently, the buffer identifier assigned here is eventually used as the ModulePath stored in the summary index. However, at this point we don't know which lazy bitcode files will actually be used by the link. Therefore, we would need to either:

  • Write out temporary files for all bitcode members of non-thin archives now, using the temporary file path as the buffer identifier, or
  • Write out temporary files later, when a lazy bitcode file is used and added to the link. In this case, we would need to swap the identifier for LTO, or add some remapping information to the LTO link, so that the temporary file path is used for loading the bitcode file instead of the identifier assigned here.

At the LLVM conference there were also suggestions that we could teach Clang and the distribution systems to handle archives natively, perhaps even understanding identifiers such as foo.a(bar.o at 40).

It would be beneficial to implement a mechanism that could also be used by the existing Bazel-style distribution in case, in the future, there is a need to support bitcode members in archives for that.

For the initial commit, however, I would prefer to support thin archives only. Non-thin archive support can be added in a later commit.

@bd1976bris (Collaborator, Author):

I have simplified the code as we discussed. I also record which archives are thin when files are added to the link in LinkerDriver::addFile and then pass this information into the BitcodeFile constructor. This removes the hack where I was reopening the archive files and checking the magic bytes to determine if they were thin. The implementation is much improved now. Thanks.

@MaskRay (Member):

Now there are quite a few changes to Driver and InputFiles, which actually makes me more nervous. Non-DTLTO use cases now incur the overhead (not that they can't, but it's not necessary to introduce these changes for the linker option feature).

These changes might turn out to be unneeded when non-thin archives are supported.

It would be nice to restore the previous version, just improving the local code, and check the thin magic only when --thinlto-distributor= is specified.

@bd1976bris (Collaborator, Author):

Done. Thanks.

Specifies the file to execute as a distributor process.
If specified, ThinLTO backend compilations will be distributed.

- ``--thinlto-remote-opt-tool=<path>``
@MaskRay (Member):

Perhaps --thinlto-remote-compiler or --thinlto-remote-compile-tool. It's unlikely we will use the internal tool named opt.

@bd1976bris (Collaborator, Author):

Updated code to use thinlto-remote-compiler.

@teresajohnson (Contributor)

Thanks for the heads-up. So I should not do a detailed code review of PR #127749? Is there more info on what you mean by "no-backend DTLTO"?

Yes, I think for now it will be better to understand the design of the current code review that Ben submitted, but not to do a detailed code review just yet. In a few days we will propose a new PR with a different implementation of integrated DTLTO, but without the use of a "DTLTO backend". That's why we call it internally "no-backend DTLTO".

By "DTLTO backend" do you mean the new CGThinBackend class? Since that's a big part of the LLVM piece, I've held off on reviewing for now, since I'd like to compare the two approaches first and I'm not sure how much is changing.

Commits pushed:

- Remove linker version.

- Make AddBuffer a default parameter for LTO::run() to minimize changes
  at call sites.

- Format code with clang-format, even if it deviates slightly from the
  surrounding style. I decided that CI passing for the PRs was more
  important than matching the existing style.

- Order distributor command-line options correctly.

- Remove dead code in llvm-lto2.cpp.

- Add docstrings to Python distributors.

- Update the Python mock.py distributor to validate JSON.

- Organize LLVM DTLTO tests by moving them into a dedicated "dtlto"
  directory, tidying existing tests and adding new ones.

- Implement fix for more than one LTO partition (spotted by MaskRay).
- Combine new functions as suggested by MaskRay in review.

- Remove code that worked around a crash that can no longer occur now
  that the implementation is structured as a new ThinLTO backend.

- Record whether an archive was thin or not and pass this information
  into the BitcodeFile constructor. This replaces code that reopened the
  archive file and checked the archive magic.

@romanova-ekaterina (Contributor) commented Feb 26, 2025

Thanks for the heads-up. So I should not do a detailed code review of PR #127749? Is there more info on what you mean by "no-backend DTLTO"?

Actually, please review whatever you would like to at this point, Teresa. I don't want to get in the way of hearing what you think; we're keen to have your input. I just wanted to point out that, since another branch is coming, you may wish to wait until it arrives if you think a side-by-side comparison would be a good way of doing things. To clarify: that other branch won't be put up as a pull request, but we can decide how to proceed here if the general design shown in that branch is preferred. I also said that it would appear in a few days, but that really depends on the results of some more internal review. We're working hard on it!

@romanova-ekaterina (Contributor)

In a couple of days, we will submit an alternative PR with a "no DTLTO backend" implementation. Though at first sight it seems logical to use a separate DTLTO backend, we think that a "no DTLTO backend" implementation will be simpler, cleaner, and, more importantly, will leave a solid foundation for important performance enhancements that we are planning to add in the future. These performance enhancements would be impossible to implement within a DTLTO backend.

Actually, we don't have any data to determine whether or not these ideas for performance enhancements translate into real-world gains. So let's perhaps revisit them in subsequent PRs, assuming we get something landed here for starters.

@bd1976bris (Collaborator, Author)

Hi reviewers! Thanks for the feedback here. I wanted to draw attention to something that I mentioned briefly in the description: the possibility of a plugin interface, as opposed to invoking an external process that consumes JSON.

There are some theoretical advantages to plugins. For example, if distribution systems exist or arise that allow data to be passed back and forth between LLVM and a distribution system using memory buffers instead of files, a plugin could perhaps do that more efficiently. However, we haven't yet done anything to quantify how much better this would be versus implicitly leaning on, e.g., memory-mapped files and the OS file cache. The distribution systems we are motivated to support by customer demand don't have such capabilities at this time.

Does anyone have any opinions on this?
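
For context on the process-based approach: even a fully functional local-only distributor is tiny. A minimal sketch in Python, assuming a job file shaped like the illustrative JSON in the description above (the real schema is defined by this PR):

  #!/usr/bin/env python3
  # Minimal local-only distributor sketch. The field names ("common",
  # "jobs", "args") follow the illustrative JSON in the description
  # above and are assumptions for this sketch, not the final schema.
  import json
  import subprocess
  import sys

  def main():
      # The linker invokes the distributor with the path to the JSON
      # job description file as the final argument.
      with open(sys.argv[-1]) as f:
          desc = json.load(f)
      common = desc.get("common", {}).get("args", [])
      for job in desc.get("jobs", []):
          # A real distribution system would dispatch these jobs to
          # remote machines; here each backend compilation simply runs
          # as a local process.
          subprocess.check_call(common + job.get("args", []))

  if __name__ == "__main__":
      main()

A plugin interface could avoid the round-trip through the filesystem, but, as noted above, the systems we currently target communicate via files anyway.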

@bd1976bris (Collaborator, Author)

Reviewers have suggested that smaller PRs be created to facilitate detailed reviews, while keeping this one around to provide an overall picture and to facilitate discussion of the high-level design (see this ongoing discussion). I have created PR #127749 to collect detailed review comments for the parts related to LLVM and have added those who commented here as reviewers. I will do my best to keep this PR up to date with suggestions from "child" PRs such as PR #127749.

Note that I have renamed this PR to indicate that it provides an overview of DTLTO, to avoid confusion.

Thanks again for all your input!

@bd1976bris bd1976bris changed the title Integrated Distributed ThinLTO (DTLTO): Initial support Integrated Distributed ThinLTO (DTLTO): Design Overview Mar 5, 2025
@romanova-ekaterina (Contributor) commented Mar 16, 2025

Hi reviewers! Thanks for the feedback here. I wanted to draw attention to something that I mentioned briefly in the description: the possibility of a plugin interface, as opposed to invoking an external process that consumes JSON.

There are some theoretical advantages to plugins. For example, if distribution systems exist or arise that allow data to be passed back and forth between LLVM and a distribution system using memory buffers instead of files, a plugin could perhaps do that more efficiently. However, we haven't yet done anything to quantify how much better this would be versus implicitly leaning on, e.g., memory-mapped files and the OS file cache. The distribution systems we are motivated to support by customer demand don't have such capabilities at this time.

Does anyone have any opinions on this?

The "no-backend DTLTO" branch is ready. Please have a look and let us know what you think: https://github.com/romanova-ekaterina/llvm-project/pull/new/kromanova/main/integrated-DTLTO-no-backend

This comment, #127749 (comment), has more details about the differences between the "out-of-process (DTLTO) backend" branch and the "no-backend DTLTO" branch.

@romanova-ekaterina (Contributor) commented Mar 16, 2025

Thanks for the heads-up. So I should not do a detailed code review of PR #127749? Is there more info on what you mean by "no-backend DTLTO"?

Actually, please review whatever you would like to at this point, Teresa. I don't want to get in the way of hearing what you think; we're keen to have your input. I just wanted to point out that, since another branch is coming, you may wish to wait until it arrives if you think a side-by-side comparison would be a good way of doing things. To clarify: that other branch won't be put up as a pull request, but we can decide how to proceed here if the general design shown in that branch is preferred. I also said that it would appear in a few days, but that really depends on the results of some more internal review. We're working hard on it!

The "no-backend DTLTO" branch is ready. Please have a look and let us know what you think: https://github.com/romanova-ekaterina/llvm-project/pull/new/kromanova/main/integrated-DTLTO-no-backend

This comment, #127749 (comment), has more details about the differences between the "out-of-process (DTLTO) backend" branch and the "no-backend DTLTO" branch.
