[WIP][Driver][SYCL] Enable --offload-arch support for SYCL offloading. #2

srividya-sundaram · 2025-03-07T23:51:27Z

This patch enables Clang to support SYCL offloading to several device architectures, such as Intel CPUs, Intel GPUs, and NVIDIA and AMD GPUs, using the --offload-arch= option.

With --offload-arch=, users can specify an offloading device architecture for SYCL (in addition to CUDA, HIP, or OpenMP offloading).

The target triple strings are deduced from the arch values passed via --offload-arch=, and the corresponding toolchains are constructed.

Example:

clang++ --offload-new-driver -fsycl --offload-arch=bdw        // Offload SYCL code to Intel GPU
clang++ --offload-new-driver -fsycl --offload-arch=broadwell  // Offload SYCL code to Intel CPU
clang++ --offload-new-driver -fsycl --offload-arch=sm_80      // Offload SYCL code to NVIDIA GPU
clang++ --offload-new-driver -fsycl --offload-arch=gfx700     // Offload SYCL code to AMD GPU
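
For readers following along, the arch-to-triple deduction described above could be pictured roughly as below. This is only a standalone sketch, not the code in this patch: the helper name `deduceSYCLTriple`, the tiny arch lists, and the exact triple spellings (`spir64_gen`, `spir64_x86_64`, `nvptx64-nvidia-cuda`, `amdgcn-amd-amdhsa`) are assumptions made for illustration, and the real driver works on `llvm::Triple` rather than plain strings.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>
#include <unordered_set>

// Hypothetical helper: map an --offload-arch value to a device triple string.
// The arch lists and triple spellings below are illustrative assumptions.
std::optional<std::string> deduceSYCLTriple(std::string_view Arch) {
  assert(!Arch.empty() && "expected a non-empty --offload-arch value");
  static const std::unordered_set<std::string_view> IntelGPUs = {"bdw"};
  static const std::unordered_set<std::string_view> IntelCPUs = {"broadwell"};

  if (IntelGPUs.count(Arch))
    return std::string("spir64_gen");          // Intel GPU AOT target
  if (IntelCPUs.count(Arch))
    return std::string("spir64_x86_64");       // Intel CPU AOT target
  if (Arch.rfind("sm_", 0) == 0)               // NVIDIA archs start with sm_
    return std::string("nvptx64-nvidia-cuda");
  if (Arch.rfind("gfx", 0) == 0)               // AMD archs start with gfx
    return std::string("amdgcn-amd-amdhsa");
  return std::nullopt;                         // Unknown arch: caller diagnoses
}
```

An unknown value returns `std::nullopt`, which is where a diagnostic such as the one added in DiagnosticDriverKinds.td would presumably be emitted.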

srividya-sundaram · 2025-03-08T00:38:11Z

clang/include/clang/Basic/Cuda.h

@@ -106,6 +106,8 @@ enum class OffloadArch {
  GFX90a,
  GFX90c,
  GFX9_4_GENERIC,
+  GFX940,
+  GFX941,  


TODO: Add code to enable SYCL offloading to these AMD GPU targets.

These targets have been removed upstream by the community, so we should follow suit and not add them in our upstream efforts.

srividya-sundaram · 2025-03-08T00:39:00Z

clang/lib/Driver/Driver.cpp

+      return false;
+    if(SYCLTriple.isSPIRAOT())
+      return false;
+  }
  // Check current set of triples to see if the default has already been set.
  for (const auto &SYCLTriple : SYCLTriples) {
    if (SYCLTriple.getSubArch() == llvm::Triple::NoSubArch &&
        SYCLTriple.isSPIROrSPIRV())


TODO: Update this check to SYCLTriple.isSPIRV()

sarnex · 2025-03-10T18:54:36Z

Sorry is this the upstream version of Mike's internal PR?

mdtoguchi · 2025-03-10T19:00:11Z

Sorry is this the upstream version of Mike's internal PR?

@sarnex, no, the internal PR is for enabling --offload-arch for OpenMP with the new model.

sarnex · 2025-03-10T19:02:12Z

Got it, thx

srividya-sundaram · 2025-03-10T19:58:33Z

Sorry is this the upstream version of Mike's internal PR?

This PR enables SYCL offloading to Intel and third-party GPUs and CPUs in community Clang (llvm-project) and is a follow-up to llvm#117268.

bader · 2025-03-10T20:57:05Z

clang++ --offload-new-driver -fsycl --offload-arch=bdw // Offload SYCL code to Intel GPU
clang++ --offload-new-driver -fsycl --offload-arch=broadwell // Offload SYCL code to Intel CPU

To be honest, I found these names very confusing. As a user, it's hard to understand the difference between bdw and broadwell. Ideally, we should have code names (or some other identifiers) that separate GPU architectures from CPU architectures. To me, bdw sounds like an abbreviation of a CPU microarchitecture.

Do you think it's possible to add an igpu- prefix to the official GPU architecture name to clearly separate it from the CPU architectures (e.g. --offload-arch=igpu-bdw)?
Another option is to add the GPU generation code name: bdw-gen8.
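
To make the naming proposal concrete, here is a rough sketch of how a prefix-based scheme could be classified. It is a sketch under assumptions: `igpu-` is only the prefix proposed in this comment and is not an accepted spelling, and `ArchKind` and `classifyOffloadArch` are hypothetical names that do not exist in Clang.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical classification of --offload-arch values by vendor prefix.
enum class ArchKind { IntelGPU, NvidiaGPU, AmdGPU, Other };

static bool startsWith(std::string_view S, std::string_view Prefix) {
  return S.substr(0, Prefix.size()) == Prefix;
}

ArchKind classifyOffloadArch(std::string_view Arch) {
  assert(!Arch.empty() && "expected a non-empty arch value");
  if (startsWith(Arch, "igpu-")) // proposed Intel GPU prefix (assumption)
    return ArchKind::IntelGPU;
  if (startsWith(Arch, "sm_"))   // NVIDIA, e.g. sm_80
    return ArchKind::NvidiaGPU;
  if (startsWith(Arch, "gfx"))   // AMD, e.g. gfx700
    return ArchKind::AmdGPU;
  return ArchKind::Other;        // e.g. CPU names such as broadwell
}
```

With such a prefix, `igpu-bdw` and `broadwell` can no longer be mistaken for each other, which is the confusion raised above.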

sarnex

initial pass, thanks!

clang/include/clang/Basic/DiagnosticDriverKinds.td

sarnex · 2025-03-10T20:53:43Z

clang/include/clang/Driver/Driver.h

+  /// Vector of Macros that need to be added to the Host compilation in a
+  /// SYCL based offloading scenario.  These macros are gathered during
+  /// construction of the device compilations.
+  mutable std::vector<std::string> SYCLTargetMacroArgs;


i took a quick look and didn't see openmp doing the same thing, is it possible to do what the other offloading languages do?

AFAIC, SYCL target macros are generated during SYCL device compilation and are passed to both the SYCL device and host compilation steps.
Example:

clang -### -fsycl --offload-arch=bdw -nogpulib test.cpp
clang-20 -cc1 "-fsycl-is-device" -D__SYCL_TARGET_INTEL_GPU_BDW__ .....
clang-20 -cc1 "-fsycl-is-host" "-D__SYCL_TARGET_INTEL_GPU_BDW__" .......

This is the existing behavior ported from intel/llvm.

These macros indicate that we are compiling for a specific target. Typically, they are set by the LLVM backend. We don't have an LLVM backend for the SPIR target, so we added them in the driver.
Given that we now have the SPIR-V backend, we could consider moving this logic there, but I'm not sure how the community will respond to adding Intel-specific macros to the SPIR-V backend.
Maybe we should consider adding an Intel flavor of SPIR-V, as the AMD folks did. They added some macros here: https://github.com/llvm/llvm-project/pull/89796/files#diff-4c95416395669c428da9b77967b9f70792b863f8b32906bbc39692ec0988ab9fR93-R95

It seems like we should have reasonable success if we add a 'limited' number of macros. How many do we expect to add?

Thanks

We have unique values for each GPU target, so each GPU we potentially support would have an associated macro. Given the number we currently have here:

llvm-project/clang/lib/Driver/ToolChains/SYCL.cpp

Line 137 in 2f71c26

SmallString<64> clang::driver::getGenDeviceMacro(StringRef DeviceName) {

the number isn't small. We would have those, and we also have one generic __SYCL_TARGET_INTEL_X86_64__ for CPU. Additionally, the driver is emitting all of the __SYCL_ANY_DEVICE_HAS_*__ and __SYCL_ALL_DEVICES_HAVE_*__ macros. Definitely not 'limited' by any means.
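
To show why the count grows with every supported device, here is a rough sketch of one macro per GPU arch. It assumes the macro is simply the upper-cased device name wrapped in `__SYCL_TARGET_INTEL_GPU_..._`, which matches the `bdw` example earlier in this thread; the real `getGenDeviceMacro` may well use a lookup table instead, and `makeIntelGPUTargetMacro` is a hypothetical name.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical: builds e.g. "__SYCL_TARGET_INTEL_GPU_BDW__" from "bdw".
// Assumes a plain upper-casing rule, which the actual driver may not follow.
std::string makeIntelGPUTargetMacro(std::string_view DeviceName) {
  assert(!DeviceName.empty() && "expected a device name");
  std::string Macro = "__SYCL_TARGET_INTEL_GPU_";
  for (char C : DeviceName)
    Macro += static_cast<char>(std::toupper(static_cast<unsigned char>(C)));
  Macro += "__";
  return Macro;
}
```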

sarnex · 2025-03-10T20:54:12Z

clang/lib/Driver/Driver.cpp

  if (llvm::is_contained(SYCLAlias, TargetArch)) {
    llvm::Triple TargetTriple;
    TargetTriple.setArchName(TargetArch);
+    // Return the full SYCL target triple string for NVidia GPU targets.


nit: can you explain why we need to do that in the comment?

sarnex · 2025-03-10T20:54:52Z

clang/lib/Driver/Driver.cpp

@@ -846,16 +850,25 @@ static llvm::Triple getSYCLDeviceTriple(StringRef TargetArch) {

 static bool addSYCLDefaultTriple(Compilation &C,
                                 SmallVectorImpl<llvm::Triple> &SYCLTriples) {
+
+  llvm::Triple DefaultTriple = getSYCLDeviceTriple(
+      C.getDefaultToolChain().getTriple().isArch32Bit() ? "spirv32"


i've never seen spirv32 used, does it actually work? can IGC compile it?

Good question. This was originally introduced by @mdtoguchi in this PR

intel/llvm has spir64 as the default target when -fsycl is passed.

Way back, when we still supported 32-bit targets with the DPC++ compiler, the equivalent usage for the device compilation was spir, and this is the natural extension of that when moving to spirv-based arch values. If we need to restrict the targets for IGC, it would be good to do it now.

This has nothing to do with the IGC. IGC compiles SPIR-V. The right question is: "Does LLVM-SPIRV-Translator support LLVM with spirv32 target triple?" According to my understanding, the answer to this question is "Yes".

I mentioned IGC because of the data type size differences, but yeah the translator behavior is important too. If the translator/SPIR-V backend work with it, we should support it.

AFAIK, 32-bit targets were important for some platforms like Android and/or Windows. I don't think SYCL is enabled on such platforms though.

Got it, thanks.

sarnex · 2025-03-10T20:56:13Z

clang/lib/Driver/Driver.cpp

@@ -1066,19 +1079,119 @@ void Driver::CreateOffloadingDeviceToolChains(Compilation &C,
  // -ffreestanding cannot be used with -fsycl
  argSYCLIncompatible(options::OPT_ffreestanding);

+  // Map of SYCL target triple strings to their corresponding target archs.
+  // Example: spir64_x86_64 --> SKYLAKEAVX512


nit: can the example have multiple arches? that should be possible given the map value is a set right?
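
For what it's worth, a map-of-sets shape does allow several arches per triple. The sketch below uses standard containers so it is self-contained; `TripleToArchs` and `recordArch` are hypothetical names, and the patch itself presumably uses LLVM container types rather than `std::map` and `std::set`.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for the driver's triple -> archs map.
using TripleToArchs = std::map<std::string, std::set<std::string>>;

// Record that Arch was requested for Triple; duplicates collapse in the set.
void recordArch(TripleToArchs &Map, const std::string &Triple,
                const std::string &Arch) {
  assert(!Triple.empty() && !Arch.empty());
  Map[Triple].insert(Arch);
}
```

Recording both `broadwell` and `westmere` under `spir64_x86_64` would leave one key with a two-element set, which is the kind of multi-arch example the comment asks for.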

sarnex · 2025-03-10T21:03:00Z

clang/lib/Driver/ToolChains/Clang.cpp

+          }
+        } else if (Triple.getSubArch() == llvm::Triple::SPIRSubArch_x86_64)
+          Macro = "-D__SYCL_TARGET_INTEL_X86_64__";
+        if (Macro.size()) {


nit:

Suggested change

-        if (Macro.size()) {
+        if (!Macro.empty()) {

sarnex · 2025-03-10T21:03:21Z

clang/lib/Driver/ToolChains/Cuda.cpp

-          DeviceOffloadingKind == Action::OFK_Cuda) &&
-         "Only OpenMP or CUDA offloading kinds are supported for NVIDIA GPUs.");
+          DeviceOffloadingKind == Action::OFK_Cuda || DeviceOffloadingKind == Action::OFK_SYCL) &&
+         "Only OpenMP or CUDA or SYCL offloading kinds are supported for NVIDIA GPUs.");


Suggested change

-         "Only OpenMP or CUDA or SYCL offloading kinds are supported for NVIDIA GPUs.");
+         "Only OpenMP, CUDA, and SYCL offloading kinds are supported for NVIDIA GPUs.");

sarnex · 2025-03-10T21:03:54Z

clang/lib/Driver/ToolChains/SYCL.cpp

+    {"westmere", SYCLSupportedIntelArchs::WESTMERE},
+    {"sandybridge", SYCLSupportedIntelArchs::SANDYBRIDGE},
+    {"ivybridge", SYCLSupportedIntelArchs::IVYBRIDGE},
+    {"broadwell", SYCLSupportedIntelArchs::BROADWELL},


I wonder if we could automatically generate this with tablegen. Unfortunately, I'm not good at tablegen :)

sarnex · 2025-03-10T21:05:09Z

clang/lib/Driver/ToolChains/SYCL.h

+// Check if the user provided value for --offload-arch is a valid
+// SYCL supported Intel AOT target.
+SYCLSupportedIntelArchs
+StringToOffloadArchSYCL(llvm::StringRef ArchNameAsString);


Since we don't use the namespace qualifier in the later function:

Suggested change

-StringToOffloadArchSYCL(llvm::StringRef ArchNameAsString);
+StringToOffloadArchSYCL(StringRef ArchNameAsString);

sarnex · 2025-03-10T21:05:49Z

llvm/lib/TargetParser/Triple.cpp

@@ -797,6 +798,16 @@ static Triple::SubArchType parseSubArch(StringRef SubArchName) {
  if (SubArchName == "arm64ec")
    return Triple::AArch64SubArch_arm64ec;

+  if (SubArchName.starts_with("spir")) {
+    StringRef SubArch(SubArchName);
+    if (SubArch.consume_front("spir64_") || SubArch.consume_front("spir_")) {


we should consider if we also want to support spirv64_gen

@bader Opinion?

This leads me to a separate question: given that we should always have an arch value when targeting a GPU, can we get away with using only spirv64 as the triple? The tools would then handle the proper target behavior when they encounter the arch value, rather than relying on the triple arch/subarch. This transition from the old to the new model can also provide a clean break from using spir64.

Additional work would need to be done internally for existing usage of spir64 and spir64_gen on build command lines and convert those accordingly.

Yeah this is a good question, my understanding is AMD/NVIDIA don't support JIT at all, so they don't have to deal with AOT vs JIT, so we are the first ones.

I think this is a good topic for a discussion with a larger group, maybe the DPCPP technical forum?

BTW what will the arch be for JIT?

Recent changes in the community have introduced a 'default' arch value, which would be used for JIT. In the packager, representation would be something like --image=file=file.bc,triple=spirv64-unknown-unknown,arch=generic,kind=sycl. Similarly for AOT, it would look like --image=file=file.bc,triple=spirv64-unknown-unknown,arch=bdw,kind=sycl. Of course, the arch value here is TBD.

got it, then i definitely like the idea of triple being only spirv64, and the arch specifying the device/specific hw.

got it, then i definitely like the idea of triple being only spirv64, and the arch specifying the device/specific hw.

Maybe the 'arch' can just be empty for JIT. Why do we need 'generic'?

Thanks

generic is the default value that is now filled for arch= for the packager. This was introduced here: llvm#126655
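
To summarize the single-triple idea from this thread, here is a small sketch that builds the packager argument in the form quoted above, with `generic` as the JIT default and a specific arch for AOT. `makeImageArg` is a hypothetical helper, and the AOT arch spelling is still TBD as noted earlier.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build an --image= argument for the offload packager.
// The triple is fixed to spirv64-unknown-unknown; only arch varies.
std::string makeImageArg(const std::string &File,
                         const std::string &Arch = "generic") {
  assert(!File.empty() && "expected an input file name");
  return "--image=file=" + File +
         ",triple=spirv64-unknown-unknown,arch=" + Arch + ",kind=sycl";
}
```

`makeImageArg("file.bc")` gives the JIT form and `makeImageArg("file.bc", "bdw")` gives the AOT form; only the arch field differs between the two.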

asudarsa · 2025-04-04T15:13:24Z

clang++ --offload-new-driver -fsycl --offload-arch=bdw // Offload SYCL code to Intel GPU
clang++ --offload-new-driver -fsycl --offload-arch=broadwell // Offload SYCL code to Intel CPU

To be honest, I found these names very confusing. As a user, it's hard to understand the difference between bdw and broadwell. Ideally, we should have code names (or some other identifiers) that separate GPU architectures from CPU architectures. To me, bdw sounds like an abbreviation of a CPU microarchitecture.

Do you think it's possible to add an igpu- prefix to the official GPU architecture name to clearly separate it from the CPU architectures (e.g. --offload-arch=igpu-bdw)? Another option is to add the GPU generation code name: bdw-gen8.

+1 on @bader's suggestion. Looking at other arches, we see that sm_* is used for NVIDIA backends and GFX* is used for AMD. It would be nice to have an Intel-specific prefix like 'igpu-'.

One other quick pointer: In a related PR, we are proposing to move OffloadingArch into its own header file. Please take a look: llvm#133194

Thanks

asudarsa · 2025-04-04T15:17:58Z

clang/include/clang/Basic/DiagnosticDriverKinds.td

@@ -843,4 +843,14 @@ def warn_missing_include_dirs : Warning<

 def err_drv_malformed_warning_suppression_mapping : Error<
  "failed to process suppression mapping file '%0': %1">;
+
+def err_drv_sycl_offload_arch_missing_value : Error<
+  "must pass in an explicit cpu or gpu architecture to '--offload-arch'">;


Suggested change

-  "must pass in an explicit cpu or gpu architecture to '--offload-arch'">;
+  "must pass in a valid cpu or gpu architecture string to '--offload-arch'">;

asudarsa · 2025-04-04T15:45:31Z

clang/include/clang/Driver/Driver.h

+  }
+
+  /// getSYCLTargetMacroArgs - return the previously gathered macro target args.
+  llvm::ArrayRef<std::string> getSYCLTargetMacroArgs() const {


Why not just call this getSYCLTargetMacros? Also, why do we need to include '-D' in the macro name? Can we leave it out here and add it only when the macro is placed into a compilation command?
Just a nit....
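
Roughly what I have in mind, as a sketch only: keep the bare macro names in the driver and prepend `-D` at the point where the cc1 arguments are assembled. `toDefineArgs` is a hypothetical helper that does not exist in the driver today.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical: turn bare macro names into -D arguments at the point where
// the compilation command is built, so the stored names stay prefix-free.
std::vector<std::string> toDefineArgs(const std::vector<std::string> &Macros) {
  std::vector<std::string> Args;
  Args.reserve(Macros.size());
  for (const std::string &M : Macros) {
    assert(M.rfind("-D", 0) != 0 && "macro names are stored without -D");
    Args.push_back("-D" + M);
  }
  return Args;
}
```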

asudarsa · 2025-04-04T15:48:47Z

clang/lib/Driver/Driver.cpp

@@ -833,10 +833,14 @@ Driver::OpenMPRuntimeKind Driver::getOpenMPRuntime(const ArgList &Args) const {

 static llvm::Triple getSYCLDeviceTriple(StringRef TargetArch) {
  SmallVector<StringRef, 5> SYCLAlias = {"spir", "spir64", "spirv", "spirv32",
-                                         "spirv64"};
+                                         "spirv64", "spir64_x86_64",


It would be less confusing to have just spirv, spirv32, and spirv64 (for Intel targets) here. Also, why is AMD missing here?

Thanks

asudarsa · 2025-04-04T15:59:12Z

clang/lib/Driver/Driver.cpp

@@ -1066,19 +1079,119 @@ void Driver::CreateOffloadingDeviceToolChains(Compilation &C,
  // -ffreestanding cannot be used with -fsycl
  argSYCLIncompatible(options::OPT_ffreestanding);

+  // Map of SYCL target triple strings to their corresponding target archs.


I would prefer to have an implementation where we have a single triple for all Intel targets (spirv64/spirv32/spirv) and then architecture is specified solely by --offload-arch. I think we can get rid of spirv64_x86_64 and spirv_gen.

Thanks

asudarsa · 2025-04-04T16:00:20Z

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp

@@ -46,8 +46,9 @@ void AMDGPUOpenMPToolChain::addClangTargetOptions(
    Action::OffloadKind DeviceOffloadingKind) const {
  HostTC.addClangTargetOptions(DriverArgs, CC1Args, DeviceOffloadingKind);

-  assert(DeviceOffloadingKind == Action::OFK_OpenMP &&
-         "Only OpenMP offloading kinds are supported.");
+  assert((DeviceOffloadingKind == Action::OFK_OpenMP ||


Are supported in what scenario?

asudarsa · 2025-04-04T16:01:58Z

clang/lib/Driver/ToolChains/Clang.cpp

@@ -5220,13 +5231,45 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
      // Set O2 optimization level by default
      if (!Args.getLastArg(options::OPT_O_Group))
        CmdArgs.push_back("-O2");
+      // Add any predefined macros associated with intel_gpu* type targets


This comment (with intel_gpu*) might need an update.

asudarsa · 2025-04-04T16:02:43Z

clang/lib/Driver/ToolChains/Clang.cpp

@@ -5220,13 +5231,45 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
      // Set O2 optimization level by default
      if (!Args.getLastArg(options::OPT_O_Group))
        CmdArgs.push_back("-O2");
+      // Add any predefined macros associated with intel_gpu* type targets
+      // passed in with -fsycl-targets


Do we intend to use -fsycl-targets for passing target information?

Thanks

asudarsa · 2025-04-04T16:04:26Z

clang/lib/Driver/ToolChains/Clang.cpp

+      // addition to the host compilation. There is no dependence connection
+      // between device and host where we should be able to use the offloading
+      // arch to add the macro to the host compile.


Nit: Might need a rewrite.

asudarsa · 2025-04-04T16:06:34Z

clang/lib/Driver/ToolChains/SYCL.cpp

@@ -15,6 +15,222 @@ using namespace clang::driver::tools;
 using namespace clang;
 using namespace llvm::opt;

+// Struct that relates an AOT target value with


I think this should be merged with upstream logic. Please see https://github.com/llvm/llvm-project/pull/133194/files

Thanks

asudarsa · 2025-04-04T17:04:42Z

clang/test/Driver/sycl-offload-arch-intel-cpu.cpp

+
+// SYCL AOT compilation to Intel CPUs using --offload-arch
+
+// RUN: %clangxx -### --offload-new-driver -fsycl --offload-arch=broadwell %s 2>&1 | \


Do we need the --offload-new-driver flag here? I thought it was turned on by default in the upstream SYCL compilation flow.

Thanks

asudarsa

Overall looks good.
Three comments:

1. I think all the offloadarch-related logic should be moved to cuda.hpp (or to an offloadarch.hpp that sits alongside cuda.hpp, so that all offloadarch-related logic can live there).
2. I think the triple should be just spirv64/spirv32/spirv for Intel targets, and arch should be used to specify AOT targets (it can be empty for JIT).
3. Please use an Intel-specific prefix for Intel archs (e.g. igpu-bdw).

Also added a few minor comments.

Thanks

mdtoguchi · 2025-05-30T18:32:24Z

I didn't see it mentioned, but llvm#137070 has been merged that enables the common location for all offload arch values.

[Driver][SYCL] Enable --offload-arch support for SYCL offloading.

2f71c26

srividya-sundaram commented Mar 8, 2025

srividya-sundaram marked this pull request as ready for review March 10, 2025 17:18

srividya-sundaram changed the title ~~[Driver][SYCL] Enable --offload-arch support for SYCL offloading.~~ [WIP][Driver][SYCL] Enable --offload-arch support for SYCL offloading. Mar 10, 2025

srividya-sundaram marked this pull request as draft March 10, 2025 17:21

srividya-sundaram requested review from sarnex and mdtoguchi March 10, 2025 17:32

srividya-sundaram requested a review from bader March 10, 2025 19:51

sarnex reviewed Mar 10, 2025

bader mentioned this pull request Mar 13, 2025

[SYCL] Add support AOT compilation for Intel GPUs in clang-sycl-linker jzc/llvm-project#1

Open

asudarsa reviewed Apr 4, 2025

asudarsa suggested changes Apr 4, 2025
