Skip to content

[SYCL] Add clang-linker-wrapper changes to call clang-sycl-linker for SYCL offloads #135683

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Apr 17, 2025

Conversation

asudarsa
Copy link
Contributor

@asudarsa asudarsa commented Apr 14, 2025

This PR is one of the many PRs in the SYCL upstreaming effort focusing on device code linking during the SYCL offload compilation process. RFC: https://discourse.llvm.org/t/rfc-offloading-design-for-sycl-offload-kind-and-spir-targets/74088

Approved PRs so far:

  1. [Clang][SYCL] Introduce clang-sycl-linker to link SYCL offloading device code (Part 1 of many) - Link
  2. [clang-sycl-linker] Replace llvm-link with API calls - Link
  3. [SYCL][SPIR-V Backend][clang-sycl-linker] Add SPIR-V backend support inside clang-sycl-linker - Link

This PR adds SYCL device code linking support to clang-linker-wrapper.

Summary for this PR

Device code linking happens inside clang-linker-wrapper. In the current implementation, clang-linker-wrapper does the following:

  1. Extracts device code. Input_1, Input_2,.....
  2. Group device code according to target devices Inputs[triple_1] = ....
    Inputs[triple_2] = ....
  3. For each group, i.e. Inputs[triple_i], a. Gather all the offload kinds found inside those inputs in ActiveOffloadKinds b. Link all images inside Inputs[triple_i] by calling clang --target=triple_i .... c. Create a copy of that linked image for each offload kind and add it to Output[Kind] list.

In SYCL compilation flow, there is a deviation in Step 3b. We call device code splitting inside the 'clang --target=triple_i ....' call and the output is now a 'packaged' file containing multiple device images. This deviation requires us to capture the OffloadKind during the linking stage and pass it along to the linking function (clang), so that clang can be called with a unique option '--sycl-link' that will help us to call 'clang-sycl-linker' under the hood (clang-sycl-linker will do SYCL specific linking).

Our current objective is to implement an end-to-end SYCL offloading flow and get it working. We will eventually merge our approach with the community flow.

Thanks

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' llvm:binary-utilities labels Apr 14, 2025
@llvmbot
Copy link
Member

llvmbot commented Apr 14, 2025

@llvm/pr-subscribers-llvm-binary-utilities

@llvm/pr-subscribers-clang-driver

Author: Arvind Sudarsanam (asudarsa)

Changes

Device code linking happens inside clang-linker-wrapper. In the current implementation, clang-linker-wrapper does the following:

  1. Extracts device code. Input_1, Input_2,.....
  2. Group device code according to target devices Inputs[triple_1] = ....
    Inputs[triple_2] = ....
  3. For each group, i.e. Inputs[triple_i], a. Gather all the offload kinds found inside those inputs in ActiveOffloadKinds b. Link all images inside Inputs[triple_i] by calling clang --target=triple_i .... c. Create a copy of that linked image for each offload kind and add it to Output[Kind] list.

In SYCL compilation flow, there is a deviation in Step 3b. We call device code splitting inside the 'clang --target=triple_i ....' call and the output is now a 'packaged' file containing multiple device images. This deviation requires us to capture the OffloadKind during the linking stage and pass it along to the linking function (clang), so that clang can be called with a unique option '--sycl-link' that will help us to call 'clang-sycl-linker' under the hood (clang-sycl-linker will do SYCL specific linking).

Our current objective is to implement an end-to-end SYCL offloading flow and get it working. We will eventually merge our approach with the community flow.


Full diff: https://github.com/llvm/llvm-project/pull/135683.diff

6 Files Affected:

  • (modified) clang/docs/ClangOffloadPackager.rst (+2)
  • (modified) clang/test/Driver/linker-wrapper.c (+10)
  • (modified) clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp (+67-4)
  • (modified) clang/tools/clang-sycl-linker/ClangSYCLLinker.cpp (+65-19)
  • (modified) llvm/include/llvm/Object/OffloadBinary.h (+1)
  • (modified) llvm/lib/Object/OffloadBinary.cpp (+3)
diff --git a/clang/docs/ClangOffloadPackager.rst b/clang/docs/ClangOffloadPackager.rst
index 2b985e260e302..481069b5e4235 100644
--- a/clang/docs/ClangOffloadPackager.rst
+++ b/clang/docs/ClangOffloadPackager.rst
@@ -112,6 +112,8 @@ the following values for the :ref:`offload kind<table-offload_kind>` and the
     +------------+-------+---------------------------------------+
     | OFK_HIP    | 0x03  | The producer was HIP                  |
     +------------+-------+---------------------------------------+
+    | OFK_SYCL   | 0x04  | The producer was SYCL                 |
+    +------------+-------+---------------------------------------+
 
 The flags are used to signify certain conditions, such as the presence of
 debugging information or whether or not LTO was used. The string entry table is
diff --git a/clang/test/Driver/linker-wrapper.c b/clang/test/Driver/linker-wrapper.c
index 0c77c2b34216a..921c9a11d2d32 100644
--- a/clang/test/Driver/linker-wrapper.c
+++ b/clang/test/Driver/linker-wrapper.c
@@ -2,6 +2,7 @@
 // REQUIRES: x86-registered-target
 // REQUIRES: nvptx-registered-target
 // REQUIRES: amdgpu-registered-target
+// REQUIRES: spirv-registered-target
 
 // An externally visible variable so static libraries extract.
 __attribute__((visibility("protected"), used)) int x;
@@ -9,6 +10,7 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.elf.o
 // RUN: %clang -cc1 %s -triple nvptx64-nvidia-cuda -emit-llvm-bc -o %t.nvptx.bc
 // RUN: %clang -cc1 %s -triple amdgcn-amd-amdhsa -emit-llvm-bc -o %t.amdgpu.bc
+// RUN: %clang -cc1 %s -triple spirv64-unknown-unknown -emit-llvm-bc -o %t.spirv.bc
 
 // RUN: clang-offload-packager -o %t.out \
 // RUN:   --image=file=%t.elf.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70 \
@@ -49,6 +51,14 @@ __attribute__((visibility("protected"), used)) int x;
 
 // AMDGPU-LTO-TEMPS: clang{{.*}} --target=amdgcn-amd-amdhsa -mcpu=gfx1030 -flto {{.*}}-save-temps
 
+// RUN: clang-offload-packager -o %t.out \
+// RUN:   --image=file=%t.spirv.bc,kind=sycl,triple=spirv64-unknown-unknown,arch=generic
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o -fembed-offload-object=%t.out
+// RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
+// RUN:   --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=SPIRV-LINK
+
+// SPIRV-LINK: clang{{.*}} -o {{.*}}.img --target=spirv64-unknown-unknown {{.*}}.o --sycl-link -Xlinker -triple=spirv64-unknown-unknown -Xlinker -arch=
+
 // RUN: clang-offload-packager -o %t.out \
 // RUN:   --image=file=%t.elf.o,kind=openmp,triple=x86_64-unknown-linux-gnu \
 // RUN:   --image=file=%t.elf.o,kind=openmp,triple=x86_64-unknown-linux-gnu
diff --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
index 52d922abbcaec..7c09ce8c9e59d 100644
--- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
+++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
@@ -464,7 +464,8 @@ fatbinary(ArrayRef<std::pair<StringRef, StringRef>> InputFiles,
 } // namespace amdgcn
 
 namespace generic {
-Expected<StringRef> clang(ArrayRef<StringRef> InputFiles, const ArgList &Args) {
+Expected<StringRef> clang(ArrayRef<StringRef> InputFiles, const ArgList &Args,
+                          bool HasSYCLOffloadKind = false) {
   llvm::TimeTraceScope TimeScope("Clang");
   // Use `clang` to invoke the appropriate device tools.
   Expected<std::string> ClangPath =
@@ -554,6 +555,17 @@ Expected<StringRef> clang(ArrayRef<StringRef> InputFiles, const ArgList &Args) {
   if (Args.hasArg(OPT_embed_bitcode))
     CmdArgs.push_back("-Wl,--lto-emit-llvm");
 
+  // For linking device code with the SYCL offload kind, special handling is
+  // required. Passing --sycl-link to clang results in a call to
+  // clang-sycl-linker. Additional linker flags required by clang-sycl-linker
+  // will be communicated via the -Xlinker option.
+  if (HasSYCLOffloadKind) {
+    CmdArgs.push_back("--sycl-link");
+    CmdArgs.append(
+        {"-Xlinker", Args.MakeArgString("-triple=" + Triple.getTriple())});
+    CmdArgs.append({"-Xlinker", Args.MakeArgString("-arch=" + Arch)});
+  }
+
   for (StringRef Arg : Args.getAllArgValues(OPT_linker_arg_EQ))
     CmdArgs.append({"-Xlinker", Args.MakeArgString(Arg)});
   for (StringRef Arg : Args.getAllArgValues(OPT_compiler_arg_EQ))
@@ -567,7 +579,8 @@ Expected<StringRef> clang(ArrayRef<StringRef> InputFiles, const ArgList &Args) {
 } // namespace generic
 
 Expected<StringRef> linkDevice(ArrayRef<StringRef> InputFiles,
-                               const ArgList &Args) {
+                               const ArgList &Args,
+                               bool HasSYCLOffloadKind = false) {
   const llvm::Triple Triple(Args.getLastArgValue(OPT_triple_EQ));
   switch (Triple.getArch()) {
   case Triple::nvptx:
@@ -582,7 +595,7 @@ Expected<StringRef> linkDevice(ArrayRef<StringRef> InputFiles,
   case Triple::spirv64:
   case Triple::systemz:
   case Triple::loongarch64:
-    return generic::clang(InputFiles, Args);
+    return generic::clang(InputFiles, Args, HasSYCLOffloadKind);
   default:
     return createStringError(Triple.getArchName() +
                              " linking is not supported");
@@ -924,9 +937,20 @@ Expected<SmallVector<StringRef>> linkAndWrapDeviceFiles(
     auto LinkerArgs = getLinkerArgs(Input, BaseArgs);
 
     DenseSet<OffloadKind> ActiveOffloadKinds;
-    for (const auto &File : Input)
+    // Currently, SYCL device code linking process differs from generic device
+    // code linking.
+    // TODO: Remove check for offload kind, once SYCL device code linking is
+    // aligned with generic linking.
+    bool HasSYCLOffloadKind = false;
+    bool HasNonSYCLOffloadKind = false;
+    for (const auto &File : Input) {
       if (File.getBinary()->getOffloadKind() != OFK_None)
         ActiveOffloadKinds.insert(File.getBinary()->getOffloadKind());
+      if (File.getBinary()->getOffloadKind() == OFK_SYCL)
+        HasSYCLOffloadKind = true;
+      else
+        HasNonSYCLOffloadKind = true;
+    }
 
     // Write any remaining device inputs to an output file.
     SmallVector<StringRef> InputFiles;
@@ -937,6 +961,37 @@ Expected<SmallVector<StringRef>> linkAndWrapDeviceFiles(
       InputFiles.emplace_back(*FileNameOrErr);
     }
 
+    if (HasSYCLOffloadKind) {
+      // Link the remaining device files using the device linker.
+      auto OutputOrErr = linkDevice(InputFiles, LinkerArgs, HasSYCLOffloadKind);
+      if (!OutputOrErr)
+        return OutputOrErr.takeError();
+      // Output is a packaged object of device images. Unpackage the images and
+      // copy them to Images[Kind]
+      ErrorOr<std::unique_ptr<MemoryBuffer>> BufferOrErr =
+          MemoryBuffer::getFileOrSTDIN(*OutputOrErr);
+      if (std::error_code EC = BufferOrErr.getError())
+        return createFileError(*OutputOrErr, EC);
+
+      MemoryBufferRef Buffer = **BufferOrErr;
+      SmallVector<OffloadFile> Binaries;
+      if (Error Err = extractOffloadBinaries(Buffer, Binaries))
+        return std::move(Err);
+      for (auto &OffloadFile : Binaries) {
+        auto TheBinary = OffloadFile.getBinary();
+        OffloadingImage TheImage{};
+        TheImage.TheImageKind = TheBinary->getImageKind();
+        TheImage.TheOffloadKind = TheBinary->getOffloadKind();
+        TheImage.StringData["triple"] = TheBinary->getTriple();
+        TheImage.StringData["arch"] = TheBinary->getArch();
+        TheImage.Image = MemoryBuffer::getMemBufferCopy(TheBinary->getImage());
+        Images[OFK_SYCL].emplace_back(std::move(TheImage));
+      }
+    }
+
+    if (!HasNonSYCLOffloadKind)
+      return Error::success();
+
     // Link the remaining device files using the device linker.
     auto OutputOrErr = linkDevice(InputFiles, LinkerArgs);
     if (!OutputOrErr)
@@ -944,6 +999,9 @@ Expected<SmallVector<StringRef>> linkAndWrapDeviceFiles(
 
     // Store the offloading image for each linked output file.
     for (OffloadKind Kind : ActiveOffloadKinds) {
+      // For SYCL, Offloading images were created inside clang-sycl-linker
+      if (Kind == OFK_SYCL)
+        continue;
       llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> FileOrErr =
           llvm::MemoryBuffer::getFileOrSTDIN(*OutputOrErr);
       if (std::error_code EC = FileOrErr.getError()) {
@@ -986,6 +1044,11 @@ Expected<SmallVector<StringRef>> linkAndWrapDeviceFiles(
              A.StringData["arch"] > B.StringData["arch"] ||
              A.TheOffloadKind < B.TheOffloadKind;
     });
+    if (Kind == OFK_SYCL) {
+      // TODO: Update once SYCL offload wrapping logic is available.
+      reportError(
+          createStringError("SYCL offload wrapping logic is not available"));
+    }
     auto BundledImagesOrErr = bundleLinkedOutput(Input, Args, Kind);
     if (!BundledImagesOrErr)
       return BundledImagesOrErr.takeError();
diff --git a/clang/tools/clang-sycl-linker/ClangSYCLLinker.cpp b/clang/tools/clang-sycl-linker/ClangSYCLLinker.cpp
index c640deddc9e74..7c0b5de6ecb13 100644
--- a/clang/tools/clang-sycl-linker/ClangSYCLLinker.cpp
+++ b/clang/tools/clang-sycl-linker/ClangSYCLLinker.cpp
@@ -70,6 +70,8 @@ static StringRef OutputFile;
 /// Directory to dump SPIR-V IR if requested by user.
 static SmallString<128> SPIRVDumpDir;
 
+using OffloadingImage = OffloadBinary::OffloadingImage;
+
 static void printVersion(raw_ostream &OS) {
   OS << clang::getClangToolFullVersion("clang-sycl-linker") << '\n';
 }
@@ -168,10 +170,10 @@ Expected<SmallVector<std::string>> getInput(const ArgList &Args) {
 /// are LLVM IR bitcode files.
 // TODO: Support SPIR-V IR files.
 Expected<std::unique_ptr<Module>> getBitcodeModule(StringRef File,
-                                                   LLVMContext &C) {
+                                                   LLVMContext &Ctx) {
   SMDiagnostic Err;
 
-  auto M = getLazyIRFileModule(File, Err, C);
+  auto M = getLazyIRFileModule(File, Err, Ctx);
   if (M)
     return std::move(M);
   return createStringError(Err.getMessage());
@@ -211,16 +213,16 @@ Expected<SmallVector<std::string>> getSYCLDeviceLibs(const ArgList &Args) {
 /// 3. Link all the images gathered in Step 2 with the output of Step 1 using
 /// linkInModule API. LinkOnlyNeeded flag is used.
 Expected<StringRef> linkDeviceCode(ArrayRef<std::string> InputFiles,
-                                   const ArgList &Args, LLVMContext &C) {
+                                   const ArgList &Args, LLVMContext &Ctx) {
   llvm::TimeTraceScope TimeScope("SYCL link device code");
 
   assert(InputFiles.size() && "No inputs to link");
 
-  auto LinkerOutput = std::make_unique<Module>("sycl-device-link", C);
+  auto LinkerOutput = std::make_unique<Module>("sycl-device-link", Ctx);
   Linker L(*LinkerOutput);
   // Link SYCL device input files.
   for (auto &File : InputFiles) {
-    auto ModOrErr = getBitcodeModule(File, C);
+    auto ModOrErr = getBitcodeModule(File, Ctx);
     if (!ModOrErr)
       return ModOrErr.takeError();
     if (L.linkInModule(std::move(*ModOrErr)))
@@ -235,7 +237,7 @@ Expected<StringRef> linkDeviceCode(ArrayRef<std::string> InputFiles,
   // Link in SYCL device library files.
   const llvm::Triple Triple(Args.getLastArgValue(OPT_triple_EQ));
   for (auto &File : *SYCLDeviceLibFiles) {
-    auto LibMod = getBitcodeModule(File, C);
+    auto LibMod = getBitcodeModule(File, Ctx);
     if (!LibMod)
       return LibMod.takeError();
     if ((*LibMod)->getTargetTriple() == Triple) {
@@ -278,18 +280,18 @@ Expected<StringRef> linkDeviceCode(ArrayRef<std::string> InputFiles,
 /// Converts 'File' from LLVM bitcode to SPIR-V format using SPIR-V backend.
 /// 'Args' encompasses all arguments required for linking device code and will
 /// be parsed to generate options required to be passed into the backend.
-static Expected<StringRef> runSPIRVCodeGen(StringRef File, const ArgList &Args,
-                                           LLVMContext &C) {
+static Error runSPIRVCodeGen(StringRef File, const ArgList &Args,
+                             StringRef OutputFile, LLVMContext &Ctx) {
   llvm::TimeTraceScope TimeScope("SPIR-V code generation");
 
   // Parse input module.
-  SMDiagnostic Err;
-  std::unique_ptr<Module> M = parseIRFile(File, Err, C);
+  SMDiagnostic E;
+  std::unique_ptr<Module> M = parseIRFile(File, E, Ctx);
   if (!M)
-    return createStringError(Err.getMessage());
+    return createStringError(E.getMessage());
 
   if (Error Err = M->materializeAll())
-    return std::move(Err);
+    return Err;
 
   Triple TargetTriple(Args.getLastArgValue(OPT_triple_EQ));
   M->setTargetTriple(TargetTriple);
@@ -333,7 +335,7 @@ static Expected<StringRef> runSPIRVCodeGen(StringRef File, const ArgList &Args,
     errs() << formatv("SPIR-V Backend: input: {0}, output: {1}\n", File,
                       OutputFile);
 
-  return OutputFile;
+  return Error::success();
 }
 
 /// Performs the following steps:
@@ -342,17 +344,61 @@ static Expected<StringRef> runSPIRVCodeGen(StringRef File, const ArgList &Args,
 Error runSYCLLink(ArrayRef<std::string> Files, const ArgList &Args) {
   llvm::TimeTraceScope TimeScope("SYCL device linking");
 
-  LLVMContext C;
+  LLVMContext Ctx;
 
   // Link all input bitcode files and SYCL device library files, if any.
-  auto LinkedFile = linkDeviceCode(Files, Args, C);
+  auto LinkedFile = linkDeviceCode(Files, Args, Ctx);
   if (!LinkedFile)
     reportError(LinkedFile.takeError());
 
+  // TODO: SYCL post link functionality involves device code splitting and will
+  // result in multiple bitcode codes.
+  // The following lines are placeholders to represent multiple files and will
+  // be refactored once SYCL post link support is available.
+  SmallVector<std::string> SplitModules;
+  SplitModules.emplace_back(*LinkedFile);
+
   // SPIR-V code generation step.
-  auto SPVFile = runSPIRVCodeGen(*LinkedFile, Args, C);
-  if (!SPVFile)
-    return SPVFile.takeError();
+  for (size_t I = 0, E = SplitModules.size(); I != E; ++I) {
+    auto Stem = OutputFile.rsplit('.').first;
+    std::string SPVFile(Stem);
+    SPVFile.append("_" + utostr(I) + ".spv");
+    auto Err = runSPIRVCodeGen(SplitModules[I], Args, SPVFile, Ctx);
+    if (Err)
+      return std::move(Err);
+    SplitModules[I] = SPVFile;
+  }
+
+  // Write the final output into file.
+  int FD = -1;
+  if (std::error_code EC = sys::fs::openFileForWrite(OutputFile, FD))
+    return errorCodeToError(EC);
+  llvm::raw_fd_ostream FS(FD, /*shouldClose=*/true);
+
+  for (size_t I = 0, E = SplitModules.size(); I != E; ++I) {
+    auto File = SplitModules[I];
+    llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> FileOrErr =
+        llvm::MemoryBuffer::getFileOrSTDIN(File);
+    if (std::error_code EC = FileOrErr.getError()) {
+      if (DryRun)
+        FileOrErr = MemoryBuffer::getMemBuffer("");
+      else
+        return createFileError(File, EC);
+    }
+    OffloadingImage TheImage{};
+    TheImage.TheImageKind = IMG_Object;
+    TheImage.TheOffloadKind = OFK_SYCL;
+    TheImage.StringData["triple"] =
+        Args.MakeArgString(Args.getLastArgValue(OPT_triple_EQ));
+    TheImage.StringData["arch"] =
+        Args.MakeArgString(Args.getLastArgValue(OPT_arch_EQ));
+    TheImage.Image = std::move(*FileOrErr);
+
+    llvm::SmallString<0> Buffer = OffloadBinary::write(TheImage);
+    if (Buffer.size() % OffloadBinary::getAlignment() != 0)
+      return createStringError("Offload binary has invalid size alignment");
+    FS << Buffer;
+  }
   return Error::success();
 }
 
@@ -394,7 +440,7 @@ int main(int argc, char **argv) {
   DryRun = Args.hasArg(OPT_dry_run);
   SaveTemps = Args.hasArg(OPT_save_temps);
 
-  OutputFile = "a.spv";
+  OutputFile = "a.out";
   if (Args.hasArg(OPT_o))
     OutputFile = Args.getLastArgValue(OPT_o);
 
diff --git a/llvm/include/llvm/Object/OffloadBinary.h b/llvm/include/llvm/Object/OffloadBinary.h
index c02aec8d956ed..b6e615740d72c 100644
--- a/llvm/include/llvm/Object/OffloadBinary.h
+++ b/llvm/include/llvm/Object/OffloadBinary.h
@@ -35,6 +35,7 @@ enum OffloadKind : uint16_t {
   OFK_OpenMP,
   OFK_Cuda,
   OFK_HIP,
+  OFK_SYCL,
   OFK_LAST,
 };
 
diff --git a/llvm/lib/Object/OffloadBinary.cpp b/llvm/lib/Object/OffloadBinary.cpp
index 56687c9acb653..3fff6b6a09e08 100644
--- a/llvm/lib/Object/OffloadBinary.cpp
+++ b/llvm/lib/Object/OffloadBinary.cpp
@@ -301,6 +301,7 @@ OffloadKind object::getOffloadKind(StringRef Name) {
       .Case("openmp", OFK_OpenMP)
       .Case("cuda", OFK_Cuda)
       .Case("hip", OFK_HIP)
+      .Case("sycl", OFK_SYCL)
       .Default(OFK_None);
 }
 
@@ -312,6 +313,8 @@ StringRef object::getOffloadKindName(OffloadKind Kind) {
     return "cuda";
   case OFK_HIP:
     return "hip";
+  case OFK_SYCL:
+    return "sycl";
   default:
     return "none";
   }

@llvmbot
Copy link
Member

llvmbot commented Apr 14, 2025

@llvm/pr-subscribers-clang

Author: Arvind Sudarsanam (asudarsa)

Changes

Device code linking happens inside clang-linker-wrapper. In the current implementation, clang-linker-wrapper does the following:

  1. Extracts device code. Input_1, Input_2,.....
  2. Group device code according to target devices Inputs[triple_1] = ....
    Inputs[triple_2] = ....
  3. For each group, i.e. Inputs[triple_i], a. Gather all the offload kinds found inside those inputs in ActiveOffloadKinds b. Link all images inside Inputs[triple_i] by calling clang --target=triple_i .... c. Create a copy of that linked image for each offload kind and add it to Output[Kind] list.

In SYCL compilation flow, there is a deviation in Step 3b. We call device code splitting inside the 'clang --target=triple_i ....' call and the output is now a 'packaged' file containing multiple device images. This deviation requires us to capture the OffloadKind during the linking stage and pass it along to the linking function (clang), so that clang can be called with a unique option '--sycl-link' that will help us to call 'clang-sycl-linker' under the hood (clang-sycl-linker will do SYCL specific linking).

Our current objective is to implement an end-to-end SYCL offloading flow and get it working. We will eventually merge our approach with the community flow.


Full diff: https://github.com/llvm/llvm-project/pull/135683.diff

6 Files Affected:

  • (modified) clang/docs/ClangOffloadPackager.rst (+2)
  • (modified) clang/test/Driver/linker-wrapper.c (+10)
  • (modified) clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp (+67-4)
  • (modified) clang/tools/clang-sycl-linker/ClangSYCLLinker.cpp (+65-19)
  • (modified) llvm/include/llvm/Object/OffloadBinary.h (+1)
  • (modified) llvm/lib/Object/OffloadBinary.cpp (+3)
diff --git a/clang/docs/ClangOffloadPackager.rst b/clang/docs/ClangOffloadPackager.rst
index 2b985e260e302..481069b5e4235 100644
--- a/clang/docs/ClangOffloadPackager.rst
+++ b/clang/docs/ClangOffloadPackager.rst
@@ -112,6 +112,8 @@ the following values for the :ref:`offload kind<table-offload_kind>` and the
     +------------+-------+---------------------------------------+
     | OFK_HIP    | 0x03  | The producer was HIP                  |
     +------------+-------+---------------------------------------+
+    | OFK_SYCL   | 0x04  | The producer was SYCL                 |
+    +------------+-------+---------------------------------------+
 
 The flags are used to signify certain conditions, such as the presence of
 debugging information or whether or not LTO was used. The string entry table is
diff --git a/clang/test/Driver/linker-wrapper.c b/clang/test/Driver/linker-wrapper.c
index 0c77c2b34216a..921c9a11d2d32 100644
--- a/clang/test/Driver/linker-wrapper.c
+++ b/clang/test/Driver/linker-wrapper.c
@@ -2,6 +2,7 @@
 // REQUIRES: x86-registered-target
 // REQUIRES: nvptx-registered-target
 // REQUIRES: amdgpu-registered-target
+// REQUIRES: spirv-registered-target
 
 // An externally visible variable so static libraries extract.
 __attribute__((visibility("protected"), used)) int x;
@@ -9,6 +10,7 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.elf.o
 // RUN: %clang -cc1 %s -triple nvptx64-nvidia-cuda -emit-llvm-bc -o %t.nvptx.bc
 // RUN: %clang -cc1 %s -triple amdgcn-amd-amdhsa -emit-llvm-bc -o %t.amdgpu.bc
+// RUN: %clang -cc1 %s -triple spirv64-unknown-unknown -emit-llvm-bc -o %t.spirv.bc
 
 // RUN: clang-offload-packager -o %t.out \
 // RUN:   --image=file=%t.elf.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70 \
@@ -49,6 +51,14 @@ __attribute__((visibility("protected"), used)) int x;
 
 // AMDGPU-LTO-TEMPS: clang{{.*}} --target=amdgcn-amd-amdhsa -mcpu=gfx1030 -flto {{.*}}-save-temps
 
+// RUN: clang-offload-packager -o %t.out \
+// RUN:   --image=file=%t.spirv.bc,kind=sycl,triple=spirv64-unknown-unknown,arch=generic
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o -fembed-offload-object=%t.out
+// RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
+// RUN:   --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=SPIRV-LINK
+
+// SPIRV-LINK: clang{{.*}} -o {{.*}}.img --target=spirv64-unknown-unknown {{.*}}.o --sycl-link -Xlinker -triple=spirv64-unknown-unknown -Xlinker -arch=
+
 // RUN: clang-offload-packager -o %t.out \
 // RUN:   --image=file=%t.elf.o,kind=openmp,triple=x86_64-unknown-linux-gnu \
 // RUN:   --image=file=%t.elf.o,kind=openmp,triple=x86_64-unknown-linux-gnu
diff --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
index 52d922abbcaec..7c09ce8c9e59d 100644
--- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
+++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
@@ -464,7 +464,8 @@ fatbinary(ArrayRef<std::pair<StringRef, StringRef>> InputFiles,
 } // namespace amdgcn
 
 namespace generic {
-Expected<StringRef> clang(ArrayRef<StringRef> InputFiles, const ArgList &Args) {
+Expected<StringRef> clang(ArrayRef<StringRef> InputFiles, const ArgList &Args,
+                          bool HasSYCLOffloadKind = false) {
   llvm::TimeTraceScope TimeScope("Clang");
   // Use `clang` to invoke the appropriate device tools.
   Expected<std::string> ClangPath =
@@ -554,6 +555,17 @@ Expected<StringRef> clang(ArrayRef<StringRef> InputFiles, const ArgList &Args) {
   if (Args.hasArg(OPT_embed_bitcode))
     CmdArgs.push_back("-Wl,--lto-emit-llvm");
 
+  // For linking device code with the SYCL offload kind, special handling is
+  // required. Passing --sycl-link to clang results in a call to
+  // clang-sycl-linker. Additional linker flags required by clang-sycl-linker
+  // will be communicated via the -Xlinker option.
+  if (HasSYCLOffloadKind) {
+    CmdArgs.push_back("--sycl-link");
+    CmdArgs.append(
+        {"-Xlinker", Args.MakeArgString("-triple=" + Triple.getTriple())});
+    CmdArgs.append({"-Xlinker", Args.MakeArgString("-arch=" + Arch)});
+  }
+
   for (StringRef Arg : Args.getAllArgValues(OPT_linker_arg_EQ))
     CmdArgs.append({"-Xlinker", Args.MakeArgString(Arg)});
   for (StringRef Arg : Args.getAllArgValues(OPT_compiler_arg_EQ))
@@ -567,7 +579,8 @@ Expected<StringRef> clang(ArrayRef<StringRef> InputFiles, const ArgList &Args) {
 } // namespace generic
 
 Expected<StringRef> linkDevice(ArrayRef<StringRef> InputFiles,
-                               const ArgList &Args) {
+                               const ArgList &Args,
+                               bool HasSYCLOffloadKind = false) {
   const llvm::Triple Triple(Args.getLastArgValue(OPT_triple_EQ));
   switch (Triple.getArch()) {
   case Triple::nvptx:
@@ -582,7 +595,7 @@ Expected<StringRef> linkDevice(ArrayRef<StringRef> InputFiles,
   case Triple::spirv64:
   case Triple::systemz:
   case Triple::loongarch64:
-    return generic::clang(InputFiles, Args);
+    return generic::clang(InputFiles, Args, HasSYCLOffloadKind);
   default:
     return createStringError(Triple.getArchName() +
                              " linking is not supported");
@@ -924,9 +937,20 @@ Expected<SmallVector<StringRef>> linkAndWrapDeviceFiles(
     auto LinkerArgs = getLinkerArgs(Input, BaseArgs);
 
     DenseSet<OffloadKind> ActiveOffloadKinds;
-    for (const auto &File : Input)
+    // Currently, SYCL device code linking process differs from generic device
+    // code linking.
+    // TODO: Remove check for offload kind, once SYCL device code linking is
+    // aligned with generic linking.
+    bool HasSYCLOffloadKind = false;
+    bool HasNonSYCLOffloadKind = false;
+    for (const auto &File : Input) {
       if (File.getBinary()->getOffloadKind() != OFK_None)
         ActiveOffloadKinds.insert(File.getBinary()->getOffloadKind());
+      if (File.getBinary()->getOffloadKind() == OFK_SYCL)
+        HasSYCLOffloadKind = true;
+      else
+        HasNonSYCLOffloadKind = true;
+    }
 
     // Write any remaining device inputs to an output file.
     SmallVector<StringRef> InputFiles;
@@ -937,6 +961,37 @@ Expected<SmallVector<StringRef>> linkAndWrapDeviceFiles(
       InputFiles.emplace_back(*FileNameOrErr);
     }
 
+    if (HasSYCLOffloadKind) {
+      // Link the remaining device files using the device linker.
+      auto OutputOrErr = linkDevice(InputFiles, LinkerArgs, HasSYCLOffloadKind);
+      if (!OutputOrErr)
+        return OutputOrErr.takeError();
+      // Output is a packaged object of device images. Unpackage the images and
+      // copy them to Images[Kind]
+      ErrorOr<std::unique_ptr<MemoryBuffer>> BufferOrErr =
+          MemoryBuffer::getFileOrSTDIN(*OutputOrErr);
+      if (std::error_code EC = BufferOrErr.getError())
+        return createFileError(*OutputOrErr, EC);
+
+      MemoryBufferRef Buffer = **BufferOrErr;
+      SmallVector<OffloadFile> Binaries;
+      if (Error Err = extractOffloadBinaries(Buffer, Binaries))
+        return std::move(Err);
+      for (auto &OffloadFile : Binaries) {
+        auto TheBinary = OffloadFile.getBinary();
+        OffloadingImage TheImage{};
+        TheImage.TheImageKind = TheBinary->getImageKind();
+        TheImage.TheOffloadKind = TheBinary->getOffloadKind();
+        TheImage.StringData["triple"] = TheBinary->getTriple();
+        TheImage.StringData["arch"] = TheBinary->getArch();
+        TheImage.Image = MemoryBuffer::getMemBufferCopy(TheBinary->getImage());
+        Images[OFK_SYCL].emplace_back(std::move(TheImage));
+      }
+    }
+
+    if (!HasNonSYCLOffloadKind)
+      return Error::success();
+
     // Link the remaining device files using the device linker.
     auto OutputOrErr = linkDevice(InputFiles, LinkerArgs);
     if (!OutputOrErr)
@@ -944,6 +999,9 @@ Expected<SmallVector<StringRef>> linkAndWrapDeviceFiles(
 
     // Store the offloading image for each linked output file.
     for (OffloadKind Kind : ActiveOffloadKinds) {
+      // For SYCL, Offloading images were created inside clang-sycl-linker
+      if (Kind == OFK_SYCL)
+        continue;
       llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> FileOrErr =
           llvm::MemoryBuffer::getFileOrSTDIN(*OutputOrErr);
       if (std::error_code EC = FileOrErr.getError()) {
@@ -986,6 +1044,11 @@ Expected<SmallVector<StringRef>> linkAndWrapDeviceFiles(
              A.StringData["arch"] > B.StringData["arch"] ||
              A.TheOffloadKind < B.TheOffloadKind;
     });
+    if (Kind == OFK_SYCL) {
+      // TODO: Update once SYCL offload wrapping logic is available.
+      reportError(
+          createStringError("SYCL offload wrapping logic is not available"));
+    }
     auto BundledImagesOrErr = bundleLinkedOutput(Input, Args, Kind);
     if (!BundledImagesOrErr)
       return BundledImagesOrErr.takeError();
diff --git a/clang/tools/clang-sycl-linker/ClangSYCLLinker.cpp b/clang/tools/clang-sycl-linker/ClangSYCLLinker.cpp
index c640deddc9e74..7c0b5de6ecb13 100644
--- a/clang/tools/clang-sycl-linker/ClangSYCLLinker.cpp
+++ b/clang/tools/clang-sycl-linker/ClangSYCLLinker.cpp
@@ -70,6 +70,8 @@ static StringRef OutputFile;
 /// Directory to dump SPIR-V IR if requested by user.
 static SmallString<128> SPIRVDumpDir;
 
+using OffloadingImage = OffloadBinary::OffloadingImage;
+
 static void printVersion(raw_ostream &OS) {
   OS << clang::getClangToolFullVersion("clang-sycl-linker") << '\n';
 }
@@ -168,10 +170,10 @@ Expected<SmallVector<std::string>> getInput(const ArgList &Args) {
 /// are LLVM IR bitcode files.
 // TODO: Support SPIR-V IR files.
 Expected<std::unique_ptr<Module>> getBitcodeModule(StringRef File,
-                                                   LLVMContext &C) {
+                                                   LLVMContext &Ctx) {
   SMDiagnostic Err;
 
-  auto M = getLazyIRFileModule(File, Err, C);
+  auto M = getLazyIRFileModule(File, Err, Ctx);
   if (M)
     return std::move(M);
   return createStringError(Err.getMessage());
@@ -211,16 +213,16 @@ Expected<SmallVector<std::string>> getSYCLDeviceLibs(const ArgList &Args) {
 /// 3. Link all the images gathered in Step 2 with the output of Step 1 using
 /// linkInModule API. LinkOnlyNeeded flag is used.
 Expected<StringRef> linkDeviceCode(ArrayRef<std::string> InputFiles,
-                                   const ArgList &Args, LLVMContext &C) {
+                                   const ArgList &Args, LLVMContext &Ctx) {
   llvm::TimeTraceScope TimeScope("SYCL link device code");
 
   assert(InputFiles.size() && "No inputs to link");
 
-  auto LinkerOutput = std::make_unique<Module>("sycl-device-link", C);
+  auto LinkerOutput = std::make_unique<Module>("sycl-device-link", Ctx);
   Linker L(*LinkerOutput);
   // Link SYCL device input files.
   for (auto &File : InputFiles) {
-    auto ModOrErr = getBitcodeModule(File, C);
+    auto ModOrErr = getBitcodeModule(File, Ctx);
     if (!ModOrErr)
       return ModOrErr.takeError();
     if (L.linkInModule(std::move(*ModOrErr)))
@@ -235,7 +237,7 @@ Expected<StringRef> linkDeviceCode(ArrayRef<std::string> InputFiles,
   // Link in SYCL device library files.
   const llvm::Triple Triple(Args.getLastArgValue(OPT_triple_EQ));
   for (auto &File : *SYCLDeviceLibFiles) {
-    auto LibMod = getBitcodeModule(File, C);
+    auto LibMod = getBitcodeModule(File, Ctx);
     if (!LibMod)
       return LibMod.takeError();
     if ((*LibMod)->getTargetTriple() == Triple) {
@@ -278,18 +280,18 @@ Expected<StringRef> linkDeviceCode(ArrayRef<std::string> InputFiles,
 /// Converts 'File' from LLVM bitcode to SPIR-V format using SPIR-V backend.
 /// 'Args' encompasses all arguments required for linking device code and will
 /// be parsed to generate options required to be passed into the backend.
-static Expected<StringRef> runSPIRVCodeGen(StringRef File, const ArgList &Args,
-                                           LLVMContext &C) {
+static Error runSPIRVCodeGen(StringRef File, const ArgList &Args,
+                             StringRef OutputFile, LLVMContext &Ctx) {
   llvm::TimeTraceScope TimeScope("SPIR-V code generation");
 
   // Parse input module.
-  SMDiagnostic Err;
-  std::unique_ptr<Module> M = parseIRFile(File, Err, C);
+  SMDiagnostic E;
+  std::unique_ptr<Module> M = parseIRFile(File, E, Ctx);
   if (!M)
-    return createStringError(Err.getMessage());
+    return createStringError(E.getMessage());
 
   if (Error Err = M->materializeAll())
-    return std::move(Err);
+    return Err;
 
   Triple TargetTriple(Args.getLastArgValue(OPT_triple_EQ));
   M->setTargetTriple(TargetTriple);
@@ -333,7 +335,7 @@ static Expected<StringRef> runSPIRVCodeGen(StringRef File, const ArgList &Args,
     errs() << formatv("SPIR-V Backend: input: {0}, output: {1}\n", File,
                       OutputFile);
 
-  return OutputFile;
+  return Error::success();
 }
 
 /// Performs the following steps:
@@ -342,17 +344,61 @@ static Expected<StringRef> runSPIRVCodeGen(StringRef File, const ArgList &Args,
 Error runSYCLLink(ArrayRef<std::string> Files, const ArgList &Args) {
   llvm::TimeTraceScope TimeScope("SYCL device linking");
 
-  LLVMContext C;
+  LLVMContext Ctx;
 
   // Link all input bitcode files and SYCL device library files, if any.
-  auto LinkedFile = linkDeviceCode(Files, Args, C);
+  auto LinkedFile = linkDeviceCode(Files, Args, Ctx);
   if (!LinkedFile)
     reportError(LinkedFile.takeError());
 
+  // TODO: SYCL post link functionality involves device code splitting and will
+  // result in multiple bitcode codes.
+  // The following lines are placeholders to represent multiple files and will
+  // be refactored once SYCL post link support is available.
+  SmallVector<std::string> SplitModules;
+  SplitModules.emplace_back(*LinkedFile);
+
   // SPIR-V code generation step.
-  auto SPVFile = runSPIRVCodeGen(*LinkedFile, Args, C);
-  if (!SPVFile)
-    return SPVFile.takeError();
+  for (size_t I = 0, E = SplitModules.size(); I != E; ++I) {
+    auto Stem = OutputFile.rsplit('.').first;
+    std::string SPVFile(Stem);
+    SPVFile.append("_" + utostr(I) + ".spv");
+    auto Err = runSPIRVCodeGen(SplitModules[I], Args, SPVFile, Ctx);
+    if (Err)
+      return std::move(Err);
+    SplitModules[I] = SPVFile;
+  }
+
+  // Write the final output into file.
+  int FD = -1;
+  if (std::error_code EC = sys::fs::openFileForWrite(OutputFile, FD))
+    return errorCodeToError(EC);
+  llvm::raw_fd_ostream FS(FD, /*shouldClose=*/true);
+
+  for (size_t I = 0, E = SplitModules.size(); I != E; ++I) {
+    auto File = SplitModules[I];
+    llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> FileOrErr =
+        llvm::MemoryBuffer::getFileOrSTDIN(File);
+    if (std::error_code EC = FileOrErr.getError()) {
+      if (DryRun)
+        FileOrErr = MemoryBuffer::getMemBuffer("");
+      else
+        return createFileError(File, EC);
+    }
+    OffloadingImage TheImage{};
+    TheImage.TheImageKind = IMG_Object;
+    TheImage.TheOffloadKind = OFK_SYCL;
+    TheImage.StringData["triple"] =
+        Args.MakeArgString(Args.getLastArgValue(OPT_triple_EQ));
+    TheImage.StringData["arch"] =
+        Args.MakeArgString(Args.getLastArgValue(OPT_arch_EQ));
+    TheImage.Image = std::move(*FileOrErr);
+
+    llvm::SmallString<0> Buffer = OffloadBinary::write(TheImage);
+    if (Buffer.size() % OffloadBinary::getAlignment() != 0)
+      return createStringError("Offload binary has invalid size alignment");
+    FS << Buffer;
+  }
   return Error::success();
 }
 
@@ -394,7 +440,7 @@ int main(int argc, char **argv) {
   DryRun = Args.hasArg(OPT_dry_run);
   SaveTemps = Args.hasArg(OPT_save_temps);
 
-  OutputFile = "a.spv";
+  OutputFile = "a.out";
   if (Args.hasArg(OPT_o))
     OutputFile = Args.getLastArgValue(OPT_o);
 
diff --git a/llvm/include/llvm/Object/OffloadBinary.h b/llvm/include/llvm/Object/OffloadBinary.h
index c02aec8d956ed..b6e615740d72c 100644
--- a/llvm/include/llvm/Object/OffloadBinary.h
+++ b/llvm/include/llvm/Object/OffloadBinary.h
@@ -35,6 +35,7 @@ enum OffloadKind : uint16_t {
   OFK_OpenMP,
   OFK_Cuda,
   OFK_HIP,
+  OFK_SYCL,
   OFK_LAST,
 };
 
diff --git a/llvm/lib/Object/OffloadBinary.cpp b/llvm/lib/Object/OffloadBinary.cpp
index 56687c9acb653..3fff6b6a09e08 100644
--- a/llvm/lib/Object/OffloadBinary.cpp
+++ b/llvm/lib/Object/OffloadBinary.cpp
@@ -301,6 +301,7 @@ OffloadKind object::getOffloadKind(StringRef Name) {
       .Case("openmp", OFK_OpenMP)
       .Case("cuda", OFK_Cuda)
       .Case("hip", OFK_HIP)
+      .Case("sycl", OFK_SYCL)
       .Default(OFK_None);
 }
 
@@ -312,6 +313,8 @@ StringRef object::getOffloadKindName(OffloadKind Kind) {
     return "cuda";
   case OFK_HIP:
     return "hip";
+  case OFK_SYCL:
+    return "sycl";
   default:
     return "none";
   }

@sarnex sarnex requested review from jhuber6 and sarnex and removed request for jhuber6 April 14, 2025 21:29
@asudarsa
Copy link
Contributor Author

asudarsa commented Apr 14, 2025

Hi @jhuber6

Thanks in advance for looking into this.

There are a couple of TODO comments in this PR:

  1. Add sycl post link functionality (including device code splitting) - This PR is under review here - [offload][SYCL] Add SYCL Module splitting. #131347
  2. Add SYCL offload wrapping logic - We are working on submitting this PR. There is interdependency between current PR and the offload wrapping effort. One option is to merge the two efforts. However, I am hoping to reduce the complexity of each PR. Hope, that is agreeable.

I expect these TODOs to be wrapped up very soon.

Thanks

@@ -464,7 +464,8 @@ fatbinary(ArrayRef<std::pair<StringRef, StringRef>> InputFiles,
} // namespace amdgcn

namespace generic {
Expected<StringRef> clang(ArrayRef<StringRef> InputFiles, const ArgList &Args) {
Expected<StringRef> clang(ArrayRef<StringRef> InputFiles, const ArgList &Args,
bool HasSYCLOffloadKind = false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should make this a bitfield and check if it contains SYCL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this change makes things better. Making OffloadKind enums to have power-of-two values and then using a bitfield here helps to avoid these flags and also helps to make the ActiveOffloadKinds support better. However, I would prefer to make these changes in a subsequent PR so that we can have a set of simpler commits. Please let me know if this is agreeable or you would prefer to make the change in this PR.

Thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then precommit it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah..that's a great idea. let me do that.

Thanks

SmallVector<OffloadFile> Binaries;
if (Error Err = extractOffloadBinaries(Buffer, Binaries))
return std::move(Err);
for (auto &OffloadFile : Binaries) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't this be made common with the other use? It looks directly copied.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are subtle changes here. In the other use site, we look at input filenames provided to clang-linker-wrapper and collect two lists of OffloadFiles - object files and archived files. In this new use site, we look at the output file returned by clang-sycl-linker and try to collect a list of OffloadingImages corresponding to the list of object files generated in clang-sycl-linker.
I tried merging the two use sites. it seems to make the code more complicated than necessary.
Please let me know if I am missing something here.

Thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really can't make this an if-else thing? It would be great to have it common, and now that we can just check for SYCL it should be easy to verify. Also this should probably make it clear that mixed SYCL / HIP or anything else does not work.

@@ -35,6 +35,7 @@ enum OffloadKind : uint16_t {
OFK_OpenMP,
OFK_Cuda,
OFK_HIP,
OFK_SYCL,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should assign specific values for these and make them powers of two apart so we can use them like a bitfield. Clang does that already.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned earlier (#135683 (comment)), it might be better to address this change in a subsequent PR. Hope that's agreeable.

Thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As proposed by @jhuber6 earlier, I will submit a separate PR to make changes to offload kind representation.
And then update this PR after that gets merged.

Thanks

@asudarsa
Copy link
Contributor Author

Hi @jhuber6

Thanks so much for feedback so far. I have added a separate PR (#135809) for improving OffloadKind implementation. Kindly take a look.

Thanks

@asudarsa asudarsa force-pushed the use_clang_sycl_linker_upstream branch from a6e0d66 to 67e994d Compare April 16, 2025 17:23
@asudarsa
Copy link
Contributor Author

Hi @jhuber6

#135809 has been merged.
I have refactored this change on top of that.
Kindly take a look whenever convenient.

Sincerely

SmallVector<OffloadFile> Binaries;
if (Error Err = extractOffloadBinaries(Buffer, Binaries))
return std::move(Err);
for (auto &OffloadFile : Binaries) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really can't make this an if-else thing? It would be great to have it common, and now that we can just check for SYCL it should be easy to verify. Also this should probably make it clear that mixed SYCL / HIP or anything else does not work.

@@ -168,10 +170,10 @@ Expected<SmallVector<std::string>> getInput(const ArgList &Args) {
/// are LLVM IR bitcode files.
// TODO: Support SPIR-V IR files.
Expected<std::unique_ptr<Module>> getBitcodeModule(StringRef File,
LLVMContext &C) {
LLVMContext &Ctx) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I was changing C to Ctx in a 'related' code change and I decided to do it everywhere. Just for better readability. Hope its' ok.

Thanks

@@ -988,6 +1038,11 @@ Expected<SmallVector<StringRef>> linkAndWrapDeviceFiles(
A.StringData["arch"] > B.StringData["arch"] ||
A.TheOffloadKind < B.TheOffloadKind;
});
if (Kind == OFK_SYCL) {
// TODO: Update once SYCL offload wrapping logic is available.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really need to make the offload wrapping a standalone tool so users can do it themselves. Normally I'd suggest landing that first, but since the tests for it rely on the linker wrapper, kind of hard. I guess it's fine to just let this error for now.

@asudarsa asudarsa marked this pull request as draft April 16, 2025 18:32
@asudarsa
Copy link
Contributor Author

Marking this as a draft to attempt refactoring of clang-linker-wrapper changes based on offline discussions with @jhuber6

Thanks much

@asudarsa asudarsa marked this pull request as ready for review April 17, 2025 00:33
@asudarsa
Copy link
Contributor Author

Hi @jhuber6

I added a constraint that Images of SYCL offloading kind cannot be linked with images of other kind. This is a valid constraint in the current state of SYCL upstreaming effort. We will aim to remove this constraint soon.

This makes the clang-linker-wrapper code changes manageable.

Hope this is an agreeable resolution.

Thanks for driving me towards this.

… SYCL offloads

Device code linking happens inside clang-linker-wrapper. In the current
implementation, clang-linker-wrapper does the following:

1. Extracts device code. Input_1, Input_2,.....
2. Group device code according to target devices
Inputs[triple_1] = ....
Inputs[triple_2] = ....
3. For each group, i.e. Inputs[triple_i],
   a. Gather all the offload kinds found inside those inputs in
      ActiveOffloadKinds
   b. Link all images inside Inputs[triple_i] by calling
      clang --target=triple_i ....
   c. Create a copy of that linked image for each offload kind and add it to
      Output[Kind] list.

In SYCL compilation flow, there is a deviation in Step 3b. We call device code
splitting inside the 'clang --target=triple_i ....' call and the output is now
a 'packaged' file containing multiple device images. This deviation requires us
to capture the OffloadKind during the linking stage and pass it along to the
linking function (clang), so that clang can be called with a unique option
'--sycl-link' that will help us to call 'clang-sycl-linker' under the hood
(clang-sycl-linker will do SYCL specific linking).

Our current objective is to implement an end-to-end SYCL offloading flow and
get it working. We will eventually merge our approach with the community flow.

Signed-off-by: Arvind Sudarsanam <[email protected]>
Signed-off-by: Arvind Sudarsanam <[email protected]>
Signed-off-by: Arvind Sudarsanam <[email protected]>
Signed-off-by: Arvind Sudarsanam <[email protected]>
Signed-off-by: Arvind Sudarsanam <[email protected]>
@asudarsa asudarsa force-pushed the use_clang_sycl_linker_upstream branch from 9afd0b3 to 35ad4bf Compare April 17, 2025 13:43
Copy link
Member

@sarnex sarnex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm just a nit!

@sarnex sarnex merged commit 52e10e6 into llvm:main Apr 17, 2025
12 checks passed
@kazutakahirata
Copy link
Contributor

@sarnex I've landed 3133c95 to fix a warning from this PR. Thanks!

@sarnex
Copy link
Member

sarnex commented Apr 17, 2025

Thanks!

@asudarsa
Copy link
Contributor Author

@sarnex I've landed 3133c95 to fix a warning from this PR. Thanks!

Thanks so much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang Clang issues not falling into any other category llvm:binary-utilities
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants