Skip to content

[Windows] Add support for emitting PGO/LTO magic strings in the Windows PE debug directory #114260

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

mikolaj-pirog
Copy link
Contributor

This PR adds support for putting magic strings indicating PGO/LTO in the debug directory of Windows PE files. This is to make clang, lld behave as MSVC compiler in this regard.

This needs a little background: MSVC compiler puts magic strings ("PGI" for instrumentation, "PGU" for binary built using instrumented data, "LTCG" for LTO builds) in the debug directory of Windows PE files. You can see these strings by using dumpbin utility (dumpbin /HEADERS a.exe) on the files built with MSVC and PGO/LTO on Windows, see the screenshots for dumpbin showing debug dir of a binary file, built with PGO/LTO.
1
2
3-1

This PR "ports" this behavior to clang and lld, so compiling and linking using lld with -fprofile-generate will result in Windows PE file with "PGI" string in debug dir; for -fprofile-use, -fprofile-sample-use this will result in "PGU" string; for -flto this will result in "LTCG" string in the debug dir.

The implementation of that is split between lld linker and emission of COFF object files. The linker puts the magic strings in the debug directory; the problem is it needs to know when to do that -- lld isn't aware of instrumentation, or building based on profiling info. It knows only about linking of LTOed binary. Naturally, lld has to somehow know that it needs to put "PGI"/ "PGU" string -- I have solved this by putting a special sections (".pgi", ".pgu") in the COFF object files, when they are compiled with instrumentation, or with data from instrumentation. lld checks for these sections, and based on their presence, puts the magic string in the debug directory of Windows PE file.

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added clang Clang issues not falling into any other category lld clang:codegen IR generation bugs: mangling, exceptions, etc. mc Machine (object) code lld:COFF platform:windows labels Oct 30, 2024
@llvmbot
Copy link
Member

llvmbot commented Oct 30, 2024

@llvm/pr-subscribers-lld
@llvm/pr-subscribers-lld-coff
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-mc

Author: Mikołaj Piróg (mikolaj-pirog)

Changes

This PR adds support for putting magic strings indicating PGO/LTO in the debug directory of Windows PE files. This is to make clang, lld behave as MSVC compiler in this regard.

This needs a little background: MSVC compiler puts magic strings ("PGI" for instrumentation, "PGU" for binary built using instrumented data, "LTCG" for LTO builds) in the debug directory of Windows PE files. You can see these strings by using dumpbin utility (dumpbin /HEADERS a.exe) on the files built with MSVC and PGO/LTO on Windows, see the screenshots for dumpbin showing debug dir of a binary file, built with PGO/LTO.
<img width="263" alt="1" src="https://github.com/user-attachments/assets/66a53564-72ae-4ccf-9c11-e4dd2c283ada">
<img width="258" alt="2" src="https://github.com/user-attachments/assets/1f5983fe-9c04-459e-8508-33a515b83455">
3-1

This PR "ports" this behavior to clang and lld, so compiling and linking using lld with -fprofile-generate will result in Windows PE file with "PGI" string in debug dir; for -fprofile-use, -fprofile-sample-use this will result in "PGU" string; for -flto this will result in "LTCG" string in the debug dir.

The implementation of that is split between lld linker and emission of COFF object files. The linker puts the magic strings in the debug directory; the problem is it needs to know when to do that -- lld isn't aware of instrumentation, or building based on profiling info. It knows only about linking of LTOed binary. Naturally, lld has to somehow know that it needs to put "PGI"/ "PGU" string -- I have solved this by putting a special sections (".pgi", ".pgu") in the COFF object files, when they are compiled with instrumentation, or with data from instrumentation. lld checks for these sections, and based on their presence, puts the magic string in the debug directory of Windows PE file.


Full diff: https://github.com/llvm/llvm-project/pull/114260.diff

9 Files Affected:

  • (modified) clang/lib/CodeGen/BackendUtil.cpp (+3)
  • (added) clang/test/CodeGen/debug-dir-win-pe-ltcg-string.c (+13)
  • (added) clang/test/CodeGen/debug-dir-win-pe-pgi-string.c (+14)
  • (added) clang/test/CodeGen/debug-dir-win-pe-pgu-string.c (+18)
  • (modified) lld/COFF/Writer.cpp (+55-1)
  • (added) lld/test/COFF/debug_dir_magic_strings_from_section_pgi.s (+17)
  • (added) lld/test/COFF/debug_dir_magic_strings_from_section_pgu.s (+17)
  • (modified) llvm/include/llvm/MC/MCTargetOptions.h (+2-1)
  • (modified) llvm/lib/MC/WinCOFFObjectWriter.cpp (+13)
diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp
index f018130807519d..fcf3dc25d95fc0 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -525,6 +525,9 @@ static bool initTargetOptions(DiagnosticsEngine &Diags,
   Options.MCOptions.PPCUseFullRegisterNames =
       CodeGenOpts.PPCUseFullRegisterNames;
   Options.MisExpect = CodeGenOpts.MisExpect;
+  Options.MCOptions.PgoInstrumentation = CodeGenOpts.getProfileInstr() > 0;
+  Options.MCOptions.PgoUse =
+      CodeGenOpts.getProfileUse() > 0 || !CodeGenOpts.SampleProfileFile.empty();
 
   return true;
 }
diff --git a/clang/test/CodeGen/debug-dir-win-pe-ltcg-string.c b/clang/test/CodeGen/debug-dir-win-pe-ltcg-string.c
new file mode 100644
index 00000000000000..a121ab8c9acc45
--- /dev/null
+++ b/clang/test/CodeGen/debug-dir-win-pe-ltcg-string.c
@@ -0,0 +1,13 @@
+// This test checks if Window PE file compiled with -flto option contains a magic 
+// string "LTCG" to indicate LTO compilation.
+
+// REQUIRES: system-windows
+
+// RUN: %clang --target=x86_64-pc-windows-msvc -flto -fuse-ld=lld %s -o %t.exe
+// RUN: dumpbin /HEADERS %t.exe | FileCheck %s
+// CHECK: {{.*}}LTCG{{.*}}
+
+int main(void) {
+
+	return 0;
+}
diff --git a/clang/test/CodeGen/debug-dir-win-pe-pgi-string.c b/clang/test/CodeGen/debug-dir-win-pe-pgi-string.c
new file mode 100644
index 00000000000000..7f1e9e35aaf120
--- /dev/null
+++ b/clang/test/CodeGen/debug-dir-win-pe-pgi-string.c
@@ -0,0 +1,14 @@
+// This test checks if Windows PE file compiled with
+// -fprofile-generate has magic string "PGI" to indicate so.
+
+
+// REQUIRES: system-windows
+
+// RUN: %clang --target=x86_64-pc-windows-msvc -fprofile-generate -fuse-ld=lld %s -o %t.exe
+// RUN: dumpbin /HEADERS %t.exe | FileCheck --check-prefix=CHECK2 %s
+// CHECK2: {{.*}}PGI{{.*}}
+
+int main(void) {
+
+	return 0;
+}
diff --git a/clang/test/CodeGen/debug-dir-win-pe-pgu-string.c b/clang/test/CodeGen/debug-dir-win-pe-pgu-string.c
new file mode 100644
index 00000000000000..12c63425aee0f5
--- /dev/null
+++ b/clang/test/CodeGen/debug-dir-win-pe-pgu-string.c
@@ -0,0 +1,18 @@
+// This test checks if Windows PE file contains a "PGU" string to indicate that
+// it was compiled using profiling data.
+
+// REQUIRES: system-windows
+
+// RUN: %clang --target=x86_64-pc-windows-msvc -fprofile-instr-generate="%profdata" -fuse-ld=lld %s -o %t.exe
+// RUN: %t.exe
+// RUN: llvm-profdata merge -output=%code.profdata %profdata
+// RUN: %clang --target=x86_64-pc-windows-msvc -fprofile-use=%code.profdata -fuse-ld=lld %s -o %t.exe
+// RUN: dumpbin /HEADERS %t.exe | FileCheck %s
+
+// CHECK: {{.*}}PGU{{.*}}
+
+int main(void) {
+
+	return 0;
+}
+	
diff --git a/lld/COFF/Writer.cpp b/lld/COFF/Writer.cpp
index 71ee5ce4685553..0ce62ad21c4634 100644
--- a/lld/COFF/Writer.cpp
+++ b/lld/COFF/Writer.cpp
@@ -77,6 +77,12 @@ static unsigned char dosProgram[] = {
 static_assert(sizeof(dosProgram) % 8 == 0,
               "DOSProgram size must be multiple of 8");
 
+static char ltcg[] = "LTCG";
+static char pgi[] = "PGI";
+static char pgu[] = "PGU";
+static char pgiSectionName[] = ".pgi";
+static char pguSectionName[] = ".pgu";
+
 static const int dosStubSize = sizeof(dos_header) + sizeof(dosProgram);
 static_assert(dosStubSize % 8 == 0, "DOSStub size must be multiple of 8");
 
@@ -179,6 +185,23 @@ class ExtendedDllCharacteristicsChunk : public NonSectionChunk {
   uint32_t characteristics = 0;
 };
 
+class DebugDirStringChunk : public NonSectionChunk {
+public:
+  DebugDirStringChunk(std::string str) : str(str.begin(), str.end()) {
+    while (this->str.size() % 4 != 0)
+      this->str.push_back(0);
+  }
+  size_t getSize() const override { return str.size(); }
+
+  void writeTo(uint8_t *b) const override {
+    char *p = reinterpret_cast<char *>(b);
+    auto strReverse = str;
+    std::reverse(strReverse.begin(), strReverse.end());
+    memcpy(p, strReverse.data(), strReverse.size());
+  }
+  std::vector<char> str;
+};
+
 // PartialSection represents a group of chunks that contribute to an
 // OutputSection. Collating a collection of PartialSections of same name and
 // characteristics constitutes the OutputSection.
@@ -1165,6 +1188,23 @@ void Writer::createMiscChunks() {
   llvm::TimeTraceScope timeScope("Misc chunks");
   Configuration *config = &ctx.config;
 
+  auto searchForPgoMagicSection = [this](char sectionName[]) {
+    for (auto *obj : ctx.objFileInstances) {
+      for (auto &chunk : obj->getChunks()) {
+        if (chunk->kind() == Chunk::SectionKind &&
+            chunk->getSectionName() == sectionName) {
+          return true;
+        }
+      }
+    }
+    return false;
+  };
+
+  bool writePgi = searchForPgoMagicSection(pgiSectionName);
+  bool writePgu = !writePgi && searchForPgoMagicSection(pguSectionName);
+  bool writeLTO = ctx.bitcodeFileInstances.size();
+
+
   for (MergeChunk *p : ctx.mergeChunkInstances) {
     if (p) {
       p->finalizeContents();
@@ -1181,7 +1221,7 @@ void Writer::createMiscChunks() {
   // Create Debug Information Chunks
   debugInfoSec = config->mingw ? buildidSec : rdataSec;
   if (config->buildIDHash != BuildIDHash::None || config->debug ||
-      config->repro || config->cetCompat) {
+      config->repro || config->cetCompat || writePgi || writePgu || writeLTO) {
     debugDirectory =
         make<DebugDirectoryChunk>(ctx, debugRecords, config->repro);
     debugDirectory->setAlignment(4);
@@ -1206,6 +1246,20 @@ void Writer::createMiscChunks() {
                                   IMAGE_DLL_CHARACTERISTICS_EX_CET_COMPAT));
   }
 
+
+  if (writeLTO) {
+    debugRecords.emplace_back(COFF::IMAGE_DEBUG_TYPE_POGO,
+                              make<DebugDirStringChunk>(ltcg));
+  }
+
+  if (writePgi) {
+    debugRecords.emplace_back(COFF::IMAGE_DEBUG_TYPE_POGO,
+                              make<DebugDirStringChunk>(pgi));
+  } else if (writePgu) {
+    debugRecords.emplace_back(COFF::IMAGE_DEBUG_TYPE_POGO,
+                              make<DebugDirStringChunk>(pgu));
+  }
+
   // Align and add each chunk referenced by the debug data directory.
   for (std::pair<COFF::DebugType, Chunk *> r : debugRecords) {
     r.second->setAlignment(4);
diff --git a/lld/test/COFF/debug_dir_magic_strings_from_section_pgi.s b/lld/test/COFF/debug_dir_magic_strings_from_section_pgi.s
new file mode 100644
index 00000000000000..b1782dade39042
--- /dev/null
+++ b/lld/test/COFF/debug_dir_magic_strings_from_section_pgi.s
@@ -0,0 +1,17 @@
+// This test checks if lld puts magic string "PGI" when an object files contains
+// .pgi section.
+
+// REQUIRES: system-windows
+
+// RUN: llvm-mc -filetype=obj -triple=x86_64-pc-windows-msvc %s -o %t.main.obj
+
+// RUN: lld-link -out:%t.exe %t.main.obj  -entry:entry -subsystem:console -debug:symtab
+// RUN: dumpbin /HEADERS %t.exe
+// CHECK: PGI
+
+#--- main.s
+.section .pgi
+.global entry
+entry:
+  movl %edx, %edx
+
diff --git a/lld/test/COFF/debug_dir_magic_strings_from_section_pgu.s b/lld/test/COFF/debug_dir_magic_strings_from_section_pgu.s
new file mode 100644
index 00000000000000..341f88d25bbaad
--- /dev/null
+++ b/lld/test/COFF/debug_dir_magic_strings_from_section_pgu.s
@@ -0,0 +1,17 @@
+// This test checks if lld puts magic string "PGU" when an object files contains
+// .pgu section.
+
+// REQUIRES: system-windows
+
+// RUN: llvm-mc -filetype=obj -triple=x86_64-pc-windows-msvc %s -o %t.main.obj
+
+// RUN: lld-link -out:%t.exe %t.main.obj  -entry:entry -subsystem:console -debug:symtab
+// RUN: dumpbin /HEADERS %t.exe
+// CHECK: PGU
+
+#--- main.s
+.section .pgu
+.global entry
+entry:
+  movl %edx, %edx
+
diff --git a/llvm/include/llvm/MC/MCTargetOptions.h b/llvm/include/llvm/MC/MCTargetOptions.h
index 7b0d81faf73d2d..5e6a58a36615b1 100644
--- a/llvm/include/llvm/MC/MCTargetOptions.h
+++ b/llvm/include/llvm/MC/MCTargetOptions.h
@@ -1,5 +1,4 @@
 //===- MCTargetOptions.h - MC Target Options --------------------*- C++ -*-===//
-//
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
@@ -112,6 +111,8 @@ class MCTargetOptions {
   // Whether or not to use full register names on PowerPC.
   bool PPCUseFullRegisterNames : 1;
 
+  bool PgoInstrumentation = false;
+  bool PgoUse = false;
   MCTargetOptions();
 
   /// getABIName - If this returns a non-empty string this represents the
diff --git a/llvm/lib/MC/WinCOFFObjectWriter.cpp b/llvm/lib/MC/WinCOFFObjectWriter.cpp
index 62f53423126ea9..e413a2d3e48b9e 100644
--- a/llvm/lib/MC/WinCOFFObjectWriter.cpp
+++ b/llvm/lib/MC/WinCOFFObjectWriter.cpp
@@ -28,6 +28,7 @@
 #include "llvm/MC/MCSectionCOFF.h"
 #include "llvm/MC/MCSymbol.h"
 #include "llvm/MC/MCSymbolCOFF.h"
+#include "llvm/MC/MCTargetOptions.h"
 #include "llvm/MC/MCValue.h"
 #include "llvm/MC/MCWinCOFFObjectWriter.h"
 #include "llvm/MC/StringTableBuilder.h"
@@ -981,6 +982,18 @@ static std::time_t getTime() {
 uint64_t WinCOFFWriter::writeObject(MCAssembler &Asm) {
   uint64_t StartOffset = W.OS.tell();
 
+  const auto *Options = Asm.getContext().getTargetOptions();
+
+  if (Options && Options->PgoInstrumentation) {
+    auto *Section = Asm.getContext().getCOFFSection(".pgi", 0);
+    defineSection(Asm, *Section);
+  }
+
+  if (Options && Options->PgoUse) {
+    auto *Section = Asm.getContext().getCOFFSection(".pgu", 0);
+    defineSection(Asm, *Section);
+  }
+
   if (Sections.size() > INT32_MAX)
     report_fatal_error(
         "PE COFF object files can't have more than 2147483647 sections");

Copy link
Collaborator

@tru tru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. It's always good to be compatible with MSVC as much as possible. Some comments and questions. We also need some more people that know more COFF to weigh in.

@tru tru requested review from aganea, mstorsjo and compnerd October 30, 2024 19:23
@efriedma-quic
Copy link
Collaborator

What mechanism does MSVC compiler use to indicate PGO to the linker for non-LTO builds? We probably want to do the same thing if possible, since people can mix compilers/linkers.

Copy link

github-actions bot commented Oct 30, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

}
}
return false;
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to do this? Why can we not just emit the magic content with COMDAT and let /debug handle the preservation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand you correctly, you suggest creating a section holding the debug dir during COFF file emission, with expectation that the linker will preserve it, right? But then we would have to update the "Debug" field of the of the "Optional Header Data Directories" of Windows PE file to point to the debug directory, which I believe would more or less be the same to the current solution, we would have to iterate over all section of all object files to update the "Debug" entry

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, my thinking was that if we have content that is guaranteed to be folded into the debug data directory, the directory will be emitted. As such, the linker will link the directory in the header and emit that. This would avoid the need to iterate all the sections, it would simply force the emission of the debug directory without /debug being passed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how this content is guaranteed to be folded into the debug data directory. I can create a debug dir entry in the COFF with COMDAT, and this will be folded, but then I still have to adjust pointers in the optional header Debug field to point to debug dir, and to specific entries in the debug dir AddressOfRawData field. This has to be done during the linking phase, because this structures don't exist in the COFF files

@aganea
Copy link
Member

aganea commented Oct 31, 2024

I am just curious what is the purpose of this? Is it just for feature parity? Can you also investigate if MSVC emits other special things/sections in the binary in LTCG/PGO builds?

@mikolaj-pirog
Copy link
Contributor Author

What mechanism does MSVC compiler use to indicate PGO to the linker for non-LTO builds? We probably want to do the same thing if possible, since people can mix compilers/linkers.

I believe they simply pass every PGO/LTO flag to the linker. This is not the case of clang -- the linker (be that ld, lld or anything other) doesn't know it's linking PGOed files. Regarding mix/matching the compiler/linkers -- I just tested that's it sort of possible. This patch obviously works only when using lld linker. It's possible to link clang object files (with PGO) with MSVC linker. I tried simply passing a flag indicating LTO to MSVC linker (while linking a clang object file without LTO) -- it complained that /LTCG is unnecessary, but put the magic string regardless. I will investigate further, if this trick also works with PGO and I will try to incorporate it to this patch (simply pass some options in the driver when using MSVC linker). This also suggests that MSVC linker simply checks for CLI options being present to put a magic string.

@mikolaj-pirog
Copy link
Contributor Author

I am just curious what is the purpose of this? Is it just for feature parity? Can you also investigate if MSVC emits other special things/sections in the binary in LTCG/PGO builds?

Yes, feature parity is the reason this patch exists. I haven't noticed any other special things MSVC does when doing PGO/LTO; then again, I wasn't looking for them. I can dig a little more to see if they are any, but I would like to do it after this patch is finished

@mikolaj-pirog
Copy link
Contributor Author

I would like to benchmark lld after this change, since I have added a loop that goes through every section of every object file. Could someone point in the direction of a good benchmark for that? I was thinking I can benchmark on the linking of clang or any other big project as a reference

@aganea
Copy link
Member

aganea commented Nov 5, 2024

I would like to benchmark lld after this change, since I have added a loop that goes through every section of every object file. Could someone point in the direction of a good benchmark for that? I was thinking I can benchmark on the linking of clang or any other big project as a reference

I think benchmarking clang.exe itself would be a good testbed. Build the toolchain once in Release (first stage) then in another build folder, build it a second time (second stage), but using clang-cl.exe and lld-link.exe from the first build folder. Use ninja not MSBuild. Once the second stage has completed, delete clang.exe from output folder and pass ninja clang -v -d keeprsp on the command-line. That will show the LLD command line which you can re-run and profile. You can also use lld-link ... --time-trace and add a more specific llvm::TimeTraceScope to enclose the code that parses all the sections.

If you have trouble building all this I can provide more detailed instructions, please let me know.

@tru
Copy link
Collaborator

tru commented Nov 5, 2024

I can recommend the hyperfine tool when benchmarking. It runs the same command multiple times and does all the fancy things to make sure you are measuring correctly.

@mikolaj-pirog
Copy link
Contributor Author

I would like to benchmark lld after this change, since I have added a loop that goes through every section of every object file. Could someone point in the direction of a good benchmark for that? I was thinking I can benchmark on the linking of clang or any other big project as a reference

I think benchmarking clang.exe itself would be a good testbed. Build the toolchain once in Release (first stage) then in another build folder, build it a second time (second stage), but using clang-cl.exe and lld-link.exe from the first build folder. Use ninja not MSBuild. Once the second stage has completed, delete clang.exe from output folder and pass ninja clang -v -d keeprsp on the command-line. That will show the LLD command line which you can re-run and profile. You can also use lld-link ... --time-trace and add a more specific llvm::TimeTraceScope to enclose the code that parses all the sections.

If you have trouble building all this I can provide more detailed instructions, please let me know.

Thanks for the suggestions. I have roughly benchmarked this and this change basically doesn't have an impact on lld performance. The overall runtime is essentially the same (around ~2s for RelWithDebInfo build). I have benchmarked under VTune and the function in question take too little time for VTune to record any valuable data (reporting they take 0.0s in both cases). The function that calls createMiscChunks (where my change resides), Writer::run(), doesn't appear in the ~70 most expensive function, and it also calls a bunch of other stuff on top of createMiscChunks. I have used the --time-trace functionality to measure more accurately: a script runs the linking x times, saving the trace file and greps the timer for the whole Writer::run() function. Here are the results (main first, my change second):
image
Screenshot 2024-11-06 154022

So, the whole function Write::run() function is slower by 10ms (comparing best vs best, worst vs worst). I think this change introduces miniscule slowdown when compared to the whole linker machinery. I don't think it's necessary to measure under hyperfine or any other benchmarking tool, given the results presented (even if my results are off by 4x, they are still miniscule).

@mstorsjo
Copy link
Member

mstorsjo commented Nov 7, 2024

FWIW, no further objections from me on this one, but others may want to have another look (potentially also CC @MaskRay for the clang tests).

@mikolaj-pirog
Copy link
Contributor Author

Gently pinging @aganea @MaskRay

@mikolaj-pirog
Copy link
Contributor Author

Gently pinging @aganea @MaskRay @HaohaiWen. If there are no objections to this patch, I would like to merge it

@@ -112,6 +112,8 @@ class MCTargetOptions {
// Whether or not to use full register names on PowerPC.
bool PPCUseFullRegisterNames : 1;

bool PgoInstrumentation = false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Target options like this don't play well with (thin)LTO , because they don't carry over naturally from the frontend compilation step to the backend compilation step, which LTO separates. Is there an existing global named metadata flag you can look for instead to control this debug info setting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I am aware, there isn't any global metadata flag I could fetch from within MC. Could you elaborate a bit more when the current solution would cause problems? I am not that familiar with LTO inner workings

@mikolaj-pirog
Copy link
Contributor Author

Thanks to the review above mentioning LTO, I realized that this solution has a problem with LTO, namely using LTO + PGO/PGU, will only emit the LTCG string in the binary. This is because right now the magic sections are only written in the COFF object file emission. I will also add the emission of this sections in the AsmPrinter, so they should appear in the bitcode files, and using LTO + PGO/PGU will emit both the "LTCG" and "PGO/PGU" strings

@@ -77,6 +77,12 @@ static unsigned char dosProgram[] = {
static_assert(sizeof(dosProgram) % 8 == 0,
"DOSProgram size must be multiple of 8");

static char ltcg[] = "LTCG";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

constexpr char ...[]

constexpr/const make this internal linkage, making static unneeded

@MaskRay
Copy link
Member

MaskRay commented Nov 30, 2024

As I'm not a Windows developer, I defer to other reviewers' expertise on MSVC's PGO/LTO feature.

However, to be honest, I'm unsure about the value of porting the strings given the large feature differences between Clang and MSVC on PGO and LTO.
clang -flto -c output is LLVM bitcode files, which are distinguishable on their own.
With LLVM LTO, you can have 1 bitcode file and 99 object files, or 99 bitcode files and 1 object file.
I am not sure identifying that LTO happens is very useful.

There are many PGO flavors. If we add this, there could be some inconsistency everytime someone adds a new flavor of PGO and does not port this piece of code.

@aganea
Copy link
Member

aganea commented Nov 30, 2024

I had a more in-depth look at this. Overall I don't agree with the whole direction of this patch. I don't think it's wise for LLD to emit debug records/coffgrp that do not match what MSVC is generating. We can't just blindly emit a record with the string of the debug type (ie. PGI/PGU/LTCG). These strings are headers for opaque/undocumented structures that Microsoft is generating for its tooling (debugger). Unless we're willing to understand and document those structures, so that LLD can emit them proper, I wouldn't emit these coffgrp at all. Also as a reference: https://www.hexacorn.com/blog/2015/07/30/gctl-debug-section-in-windows-10-binaries/

@@ -1206,6 +1245,19 @@ void Writer::createMiscChunks() {
IMAGE_DLL_CHARACTERISTICS_EX_CET_COMPAT));
}

if (writeLTO) {
debugRecords.emplace_back(COFF::IMAGE_DEBUG_TYPE_POGO,
make<DebugDirStringChunk>(ltcg));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn’t mean to discourage you @mikolaj-pirog. But if you could come up with a proper structure here for IMAGE_DEBUG_TYPE_POGO, I think the PR would be acceptable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn’t mean to discourage you @mikolaj-pirog. But if you could come up with a proper structure here for IMAGE_DEBUG_TYPE_POGO, I think the PR would be acceptable.

No worry, you didn't discourage me, I appreciate each piece of feedback :) Just to be clear, would this patch be accepted if I manage to make lld emit the appropriate structure (like MSVC does) for the PGO/PGU/LTCG?

@mikolaj-pirog
Copy link
Contributor Author

As I'm not a Windows developer, I defer to other reviewers' expertise on MSVC's PGO/LTO feature.

However, to be honest, I'm unsure about the value of porting the strings given the large feature differences between Clang and MSVC on PGO and LTO. clang -flto -c output is LLVM bitcode files, which are distinguishable on their own. With LLVM LTO, you can have 1 bitcode file and 99 object files, or 99 bitcode files and 1 object file. I am not sure identifying that LTO happens is very useful.

There are many PGO flavors. If we add this, there could be some inconsistency everytime someone adds a new flavor of PGO and does not port this piece of code.

I agree that it's a little awkward to port this behavior to clang, since msvc does pgo/lto differently; the lto is enabled by default; it's impossible to do pgo without lto. The value of this patch, as I have seen when creating it, is the feature parity with msvc, accepting the awkwardness.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category lld:COFF lld mc Machine (object) code platform:windows
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants