-
Notifications
You must be signed in to change notification settings - Fork 0
[WIP][Driver][SYCL] Enable --offload-arch support for SYCL offloading. #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@@ -106,6 +106,8 @@ enum class OffloadArch { | |||
GFX90a, | |||
GFX90c, | |||
GFX9_4_GENERIC, | |||
GFX940, | |||
GFX941, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: Add code to enable SYCL offloading to these AMD GPU targets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These targets have been removed in community, we should follow and not add them in our upstream efforts.
return false; | ||
if(SYCLTriple.isSPIRAOT()) | ||
return false; | ||
} | ||
// Check current set of triples to see if the default has already been set. | ||
for (const auto &SYCLTriple : SYCLTriples) { | ||
if (SYCLTriple.getSubArch() == llvm::Triple::NoSubArch && | ||
SYCLTriple.isSPIROrSPIRV()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: Update this check to SYCLTriple.isSPIRV()
Sorry is this the upstream version of Mike's internal PR? |
@sarnex , no the internal PR is for OpenMP enabling of |
Got it, thx |
This PR is for enabling SYCL offloading to Intel and third-party GPUs and CPUs in the community clang(llvm-project) and a follow up to llvm#117268 |
To be honest, I found these names very confusing. As a user, it's hard to understand the difference between Do you think it's possible to add |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
initial pass, thanks!
/// Vector of Macros that need to be added to the Host compilation in a | ||
/// SYCL based offloading scenario. These macros are gathered during | ||
/// construction of the device compilations. | ||
mutable std::vector<std::string> SYCLTargetMacroArgs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i took a quick look and didn't see openmp doing the same thing, is it possible to do what the other offloading languages do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIC, SYCL target macros are generated during SYCL device compilation and they are passed to both SYCL device and host compilation step.
Example:
clang -### -fsycl --offload-arch=bdw -nogpulib test.cpp
clang-20 -cc1 "-fsycl-is-device" -D__SYCL_TARGET_INTEL_GPU_BDW__ .....
clang-20 -cc1 "-fsycl-is-host" "-D__SYCL_TARGET_INTEL_GPU_BDW__" .......
This is the existing behavior ported from intel/llvm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These macros mean that we compile for specific target. Typically, they are set by the LLVM backend. We don't have LLVM backend for SPIR target, so we added them in the driver.
Considering that we have SPIR-V backend, we can consider moving this logic to SPIR-V backend, but I'm not sure how community will respond to adding INTEL specific macros to SPIR-V backend.
Maybe we should consider adding Intel's flavor to SPIR-V as AMD folks. They added some macro here: https://github.com/llvm/llvm-project/pull/89796/files#diff-4c95416395669c428da9b77967b9f70792b863f8b32906bbc39692ec0988ab9fR93-R95
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we should have some reasonable success if we add 'limited' number of macros. How many do we expect to add?
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have unique values for each GPU target. So each potential GPU we support, we would have a macro that is associated with it. Given the number we currently have here:
SmallString<64> clang::driver::getGenDeviceMacro(StringRef DeviceName) { |
__SYCL_TARGET_INTEL_X86_64__
for CPU. Additionally, the driver is emitting all of the __SYCL_ANY_DEVICE_HAS_*__
and __SYCL_ALL_DEVICES_HAVE_*__
macros. Definitely not 'limited' by any means.
@@ -846,16 +850,25 @@ static llvm::Triple getSYCLDeviceTriple(StringRef TargetArch) { | |||
|
|||
static bool addSYCLDefaultTriple(Compilation &C, | |||
SmallVectorImpl<llvm::Triple> &SYCLTriples) { | |||
|
|||
llvm::Triple DefaultTriple = getSYCLDeviceTriple( | |||
C.getDefaultToolChain().getTriple().isArch32Bit() ? "spirv32" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i've never seen spirv32
used, does it actually work? can IGC compile it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. This was originally introduced by @mdtoguchi in this PR
intel/llvm has spir64 as the default target when -fsycl
is passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Way back when - when we still supported 32-bit targets with the DPC++ compiler, the equivalent usage for the device compilation was spir
, and this is the natural extension of that when moving to spirv based arch values. If we need to restrict the targets for IGC, it would be good to do it now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has nothing to do with the IGC. IGC compiles SPIR-V. The right question is: "Does LLVM-SPIRV-Translator support LLVM with spirv32
target triple?" According to my understanding, the answer to this question is "Yes".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mentioned IGC because of the data type size differences, but yeah the translator behavior is important too. If the translator/SPIR-V backend work with it, we should support it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, 32-bit targets were important for some platforms like Android and/or Windows. I don't think SYCL is enabled on such platforms though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks.
@@ -1066,19 +1079,119 @@ void Driver::CreateOffloadingDeviceToolChains(Compilation &C, | |||
// -ffreestanding cannot be used with -fsycl | |||
argSYCLIncompatible(options::OPT_ffreestanding); | |||
|
|||
// Map of SYCL target triple strings to their corresponding target archs. | |||
// Example: spir64_x86_64 --> SKYLAKEAVX512 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can the example have multiple arches? that should be possible given the map value is a set right?
} | ||
} else if (Triple.getSubArch() == llvm::Triple::SPIRSubArch_x86_64) | ||
Macro = "-D__SYCL_TARGET_INTEL_X86_64__"; | ||
if (Macro.size()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
if (Macro.size()) { | |
if (!Macro.empty()) { |
DeviceOffloadingKind == Action::OFK_Cuda) && | ||
"Only OpenMP or CUDA offloading kinds are supported for NVIDIA GPUs."); | ||
DeviceOffloadingKind == Action::OFK_Cuda || DeviceOffloadingKind == Action::OFK_SYCL) && | ||
"Only OpenMP or CUDA or SYCL offloading kinds are supported for NVIDIA GPUs."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Only OpenMP or CUDA or SYCL offloading kinds are supported for NVIDIA GPUs."); | |
"Only OpenMP, CUDA, and SYCL offloading kinds are supported for NVIDIA GPUs."); |
{"westmere", SYCLSupportedIntelArchs::WESTMERE}, | ||
{"sandybridge", SYCLSupportedIntelArchs::SANDYBRIDGE}, | ||
{"ivybridge", SYCLSupportedIntelArchs::IVYBRIDGE}, | ||
{"broadwell", SYCLSupportedIntelArchs::BROADWELL}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we could automatically generate this with tablegen. Unfortunately, I'm not good at tablegen :)
// Check if the user provided value for --offload-arch is a valid | ||
// SYCL supported Intel AOT target. | ||
SYCLSupportedIntelArchs | ||
StringToOffloadArchSYCL(llvm::StringRef ArchNameAsString); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we dont use the namespace in the later fcn
StringToOffloadArchSYCL(llvm::StringRef ArchNameAsString); | |
StringToOffloadArchSYCL(StringRef ArchNameAsString); |
@@ -797,6 +798,16 @@ static Triple::SubArchType parseSubArch(StringRef SubArchName) { | |||
if (SubArchName == "arm64ec") | |||
return Triple::AArch64SubArch_arm64ec; | |||
|
|||
if (SubArchName.starts_with("spir")) { | |||
StringRef SubArch(SubArchName); | |||
if (SubArch.consume_front("spir64_") || SubArch.consume_front("spir_")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should consider if we also want to support spirv64_gen
@bader Opinion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This leads me to a separate question - given we should aways have an arch value when targeting GPU, can we get away with only using spirv64
as the triple? This would have the tools handle the proper target behavior when the arch value is encountered and not rely on the triple arch/subarch. This transition from old to new model can also provide a clean break from using spir64
.
Additional work would need to be done internally for existing usage of spir64
and spir64_gen
on build command lines and convert those accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is a good question, my understanding is AMD/NVIDIA don't support JIT at all, so they don't have to deal with AOT vs JIT, so we are the first ones.
I think this is a good topic for a discussion with a larger group, maybe the DPCPP technical forum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW what will the arch be for JIT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recent changes in the community have introduced a 'default' arch value, which would be used for JIT. In the packager, representation would be something like --image=file=file.bc,triple=spirv64-unknown-unknown,arch=generic,kind=sycl
. Similarly for AOT, it would look like --image=file=file.bc,triple=spirv64-unknown-unknown,arch=bdw,kind=sycl
. Of course, the arch
value here is TBD.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it, then i definitely like the idea of triple being only spirv64
, and the arch specifying the device/specific hw.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it, then i definitely like the idea of triple being only
spirv64
, and the arch specifying the device/specific hw.
May be the 'arch' can be just empty for JIT. Why do we need 'generic'?
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generic
is the default value that is now filled for arch=
for the packager. This was introduced here: llvm#126655
+1 on @bader suggestion. Looking at other arches, we see that sm_* is used for NVidia backends, GFX* is used for AMD. it will be nice to have a 'Intel' prefix like 'igpu-'. One other quick pointer: In a related PR, we are proposing to move OffloadingArch into its own header file. Please take a look: llvm#133194 Thanks |
@@ -843,4 +843,14 @@ def warn_missing_include_dirs : Warning< | |||
|
|||
def err_drv_malformed_warning_suppression_mapping : Error< | |||
"failed to process suppression mapping file '%0': %1">; | |||
|
|||
def err_drv_sycl_offload_arch_missing_value : Error< | |||
"must pass in an explicit cpu or gpu architecture to '--offload-arch'">; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"must pass in an explicit cpu or gpu architecture to '--offload-arch'">; | |
"must pass in a valid cpu or gpu architecture string to '--offload-arch'">; |
} | ||
|
||
/// getSYCLTargetMacroArgs - return the previously gathered macro target args. | ||
llvm::ArrayRef<std::string> getSYCLTargetMacroArgs() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just call this getSYCLTargetMacros? Also, why do we need to include '-D' in the macro name? Can we just not add it when including it into a compilation command?
Just a nit....
@@ -833,10 +833,14 @@ Driver::OpenMPRuntimeKind Driver::getOpenMPRuntime(const ArgList &Args) const { | |||
|
|||
static llvm::Triple getSYCLDeviceTriple(StringRef TargetArch) { | |||
SmallVector<StringRef, 5> SYCLAlias = {"spir", "spir64", "spirv", "spirv32", | |||
"spirv64"}; | |||
"spirv64", "spir64_x86_64", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be less confusing to just have spirv, spirv32, spirv64 (for intel targets) here. Also, why is AMD missing here?
Thanks
@@ -1066,19 +1079,119 @@ void Driver::CreateOffloadingDeviceToolChains(Compilation &C, | |||
// -ffreestanding cannot be used with -fsycl | |||
argSYCLIncompatible(options::OPT_ffreestanding); | |||
|
|||
// Map of SYCL target triple strings to their corresponding target archs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to have an implementation where we have a single triple for all Intel targets (spirv64/spirv32/spirv) and then architecture is specified solely by --offload-arch. I think we can get rid of spirv64_x86_64 and spirv_gen.
Thanks
@@ -46,8 +46,9 @@ void AMDGPUOpenMPToolChain::addClangTargetOptions( | |||
Action::OffloadKind DeviceOffloadingKind) const { | |||
HostTC.addClangTargetOptions(DriverArgs, CC1Args, DeviceOffloadingKind); | |||
|
|||
assert(DeviceOffloadingKind == Action::OFK_OpenMP && | |||
"Only OpenMP offloading kinds are supported."); | |||
assert((DeviceOffloadingKind == Action::OFK_OpenMP || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are supported in what scenario?
@@ -5220,13 +5231,45 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA, | |||
// Set O2 optimization level by default | |||
if (!Args.getLastArg(options::OPT_O_Group)) | |||
CmdArgs.push_back("-O2"); | |||
// Add any predefined macros associated with intel_gpu* type targets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment (with intel_gpu*) might need an update.
@@ -5220,13 +5231,45 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA, | |||
// Set O2 optimization level by default | |||
if (!Args.getLastArg(options::OPT_O_Group)) | |||
CmdArgs.push_back("-O2"); | |||
// Add any predefined macros associated with intel_gpu* type targets | |||
// passed in with -fsycl-targets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we intend to use -fsycl-targets for passing target information?
Thanks
// addition to the host compilation. There is no dependence connection | ||
// between device and host where we should be able to use the offloading | ||
// arch to add the macro to the host compile. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Might need a rewrite.
@@ -15,6 +15,222 @@ using namespace clang::driver::tools; | |||
using namespace clang; | |||
using namespace llvm::opt; | |||
|
|||
// Struct that relates an AOT target value with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be merged with upstream logic. Please see https://github.com/llvm/llvm-project/pull/133194/files
Thanks
|
||
// SYCL AOT compilation to Intel CPUs using --offload-arch | ||
|
||
// RUN: %clangxx -### --offload-new-driver -fsycl --offload-arch=broadwell %s 2>&1 | \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the --offload-new-driver flag here? I thought it was turned on by default in the upstream SYCL compilation flow.
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good.
Three comments:
- I think all the offloadarch related logic should be moved to cuda.hpp (or a offloadarch.hpp which can reside in parallel with cuda.hpp and we can move all offloadarch related logic there)
- I think the triple should be just spirv64/spirv32/spirv for Intel targets and arch should be used for specifying AOT targets (can be empty for JIT).
- Please use intel specific prefix for Intel archs (e.g. igpu-bdw).
Also added a few minor comments.
Thanks
This patch enables Clang to support SYCL offloading to several different architectures such as Intel CPUs, Intel GPUs, NVidia and AMD GPUs using the
offload-arch=
option.Using
offload-arch=
option, users can specify an offloading device architecture for SYCL ( in addition to CUDA, HIP or OpenMP offloading).The target triple strings are deduced from the arch values passed via
offload-arch=
option and the corresponding toolchains are constructed.Example: