-
Notifications
You must be signed in to change notification settings - Fork 0
[WIP][Driver][SYCL] Enable --offload-arch support for SYCL offloading. #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -843,4 +843,14 @@ def warn_missing_include_dirs : Warning< | |||||
|
||||||
def err_drv_malformed_warning_suppression_mapping : Error< | ||||||
"failed to process suppression mapping file '%0': %1">; | ||||||
|
||||||
def err_drv_sycl_offload_arch_missing_value : Error< | ||||||
"must pass in an explicit cpu or gpu architecture to '--offload-arch'">; | ||||||
srividya-sundaram marked this conversation as resolved.
Show resolved
Hide resolved
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
||||||
def err_drv_invalid_sycl_target : Error<"SYCL target is invalid: '%0'">; | ||||||
|
||||||
def warn_drv_sycl_offload_target_duplicate : Warning< | ||||||
"SYCL offloading target '%0' is similar to target '%1' already specified; " | ||||||
"will be ignored">, InGroup<SyclTarget>; | ||||||
|
||||||
} |
Original file line number | Diff line number | Diff line change | ||
---|---|---|---|---|
|
@@ -846,6 +846,23 @@ class Driver { | |||
/// Compute the default -fmodule-cache-path. | ||||
/// \return True if the system provides a default cache directory. | ||||
static bool getDefaultModuleCachePath(SmallVectorImpl<char> &Result); | ||||
|
||||
/// Vector of Macros that need to be added to the Host compilation in a | ||||
/// SYCL based offloading scenario. These macros are gathered during | ||||
/// construction of the device compilations. | ||||
mutable std::vector<std::string> SYCLTargetMacroArgs; | ||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. i took a quick look and didn't see openmp doing the same thing, is it possible to do what the other offloading languages do? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. AFAIC, SYCL target macros are generated during SYCL device compilation and they are passed to both SYCL device and host compilation step.
This is the existing behavior ported from intel/llvm. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These macros mean that we compile for specific target. Typically, they are set by the LLVM backend. We don't have LLVM backend for SPIR target, so we added them in the driver. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It seems like we should have some reasonable success if we add 'limited' number of macros. How many do we expect to add? Thanks There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We have unique values for each GPU target. So each potential GPU we support, we would have a macro that is associated with it. Given the number we currently have here:
__SYCL_TARGET_INTEL_X86_64__ for CPU. Additionally, the driver is emitting all of the __SYCL_ANY_DEVICE_HAS_*__ and __SYCL_ALL_DEVICES_HAVE_*__ macros. Definitely not 'limited' by any means.
|
||||
|
||||
/// addSYCLTargetMacroArg - Add the given macro to the vector of args to be | ||||
/// added to the host compilation step. | ||||
void addSYCLTargetMacroArg(const llvm::opt::ArgList &Args, | ||||
StringRef Macro) const { | ||||
SYCLTargetMacroArgs.push_back(Args.MakeArgString(Macro)); | ||||
} | ||||
|
||||
/// getSYCLTargetMacroArgs - return the previously gathered macro target args. | ||||
llvm::ArrayRef<std::string> getSYCLTargetMacroArgs() const { | ||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why not just call this getSYCLTargetMacros? Also, why do we need to include '-D' in the macro name? Can we just not add it when including it into a compilation command? |
||||
return SYCLTargetMacroArgs; | ||||
} | ||||
}; | ||||
|
||||
/// \return True if the last defined optimization level is -Ofast. | ||||
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -833,10 +833,14 @@ Driver::OpenMPRuntimeKind Driver::getOpenMPRuntime(const ArgList &Args) const { | |||||
|
||||||
static llvm::Triple getSYCLDeviceTriple(StringRef TargetArch) { | ||||||
SmallVector<StringRef, 5> SYCLAlias = {"spir", "spir64", "spirv", "spirv32", | ||||||
"spirv64"}; | ||||||
"spirv64", "spir64_x86_64", | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It will be less confusing to just have spirv, spirv32, spirv64 (for intel targets) here. Also, why is AMD missing here? Thanks |
||||||
"spir64_gen", "nvptx64"}; | ||||||
if (llvm::is_contained(SYCLAlias, TargetArch)) { | ||||||
llvm::Triple TargetTriple; | ||||||
TargetTriple.setArchName(TargetArch); | ||||||
// Return the full SYCL target triple string for NVidia GPU targets. | ||||||
srividya-sundaram marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
if (TargetTriple.getArch() == llvm::Triple::nvptx64) | ||||||
return llvm::Triple("nvptx64-nvidia-cuda"); | ||||||
TargetTriple.setVendor(llvm::Triple::UnknownVendor); | ||||||
TargetTriple.setOS(llvm::Triple::UnknownOS); | ||||||
return TargetTriple; | ||||||
|
@@ -846,16 +850,25 @@ static llvm::Triple getSYCLDeviceTriple(StringRef TargetArch) { | |||||
|
||||||
static bool addSYCLDefaultTriple(Compilation &C, | ||||||
SmallVectorImpl<llvm::Triple> &SYCLTriples) { | ||||||
|
||||||
llvm::Triple DefaultTriple = getSYCLDeviceTriple( | ||||||
C.getDefaultToolChain().getTriple().isArch32Bit() ? "spirv32" | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. i've never seen There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good question. This was originally introduced by @mdtoguchi in this PR intel/llvm has spir64 as the default target when There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Way back when - when we still supported 32-bit targets with the DPC++ compiler, the equivalent usage for the device compilation was There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This has nothing to do with the IGC. IGC compiles SPIR-V. The right question is: "Does LLVM-SPIRV-Translator support LLVM with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I mentioned IGC because of the data type size differences, but yeah the translator behavior is important too. If the translator/SPIR-V backend work with it, we should support it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. AFAIK, 32-bit targets were important for some platforms like Android and/or Windows. I don't think SYCL is enabled on such platforms though. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Got it, thanks. |
||||||
: "spirv64"); | ||||||
for (const auto &SYCLTriple : SYCLTriples) { | ||||||
if (SYCLTriple == DefaultTriple) | ||||||
return false; | ||||||
// If we encounter a known non-spir* target, do not add the default triple. | ||||||
if (SYCLTriple.isNVPTX() || SYCLTriple.isAMDGCN()) | ||||||
return false; | ||||||
if(SYCLTriple.isSPIRAOT()) | ||||||
return false; | ||||||
} | ||||||
// Check current set of triples to see if the default has already been set. | ||||||
for (const auto &SYCLTriple : SYCLTriples) { | ||||||
if (SYCLTriple.getSubArch() == llvm::Triple::NoSubArch && | ||||||
SYCLTriple.isSPIROrSPIRV()) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. TODO: Update this check to |
||||||
return false; | ||||||
} | ||||||
// Add the default triple as it was not found. | ||||||
llvm::Triple DefaultTriple = getSYCLDeviceTriple( | ||||||
C.getDefaultToolChain().getTriple().isArch32Bit() ? "spirv32" | ||||||
: "spirv64"); | ||||||
SYCLTriples.insert(SYCLTriples.begin(), DefaultTriple); | ||||||
return true; | ||||||
} | ||||||
|
@@ -1066,19 +1079,119 @@ void Driver::CreateOffloadingDeviceToolChains(Compilation &C, | |||||
// -ffreestanding cannot be used with -fsycl | ||||||
argSYCLIncompatible(options::OPT_ffreestanding); | ||||||
|
||||||
// Map of SYCL target triple strings to their corresponding target archs. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would prefer to have an implementation where we have a single triple for all Intel targets (spirv64/spirv32/spirv) and then architecture is specified solely by --offload-arch. I think we can get rid of spirv64_x86_64 and spirv_gen. Thanks |
||||||
// Example: spir64_x86_64 --> SKYLAKEAVX512 | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: can the example have multiple arches? that should be possible given the map value is a set right? |
||||||
llvm::StringMap<llvm::DenseSet<StringRef>> DerivedArchs; | ||||||
llvm::StringMap<StringRef> FoundNormalizedTriples; | ||||||
llvm::SmallVector<llvm::Triple, 4> UniqueSYCLTriplesVec; | ||||||
|
||||||
// StringSet to contain SYCL target triples. | ||||||
llvm::StringSet<> SYCLTriples; | ||||||
// If the user specified --offload-arch, deduce the offloading | ||||||
// target triple(s) from the set of architecture(s). | ||||||
// Create a toolchain for each valid triple. | ||||||
// We do not support SYCL offloading if any of the inputs is a | ||||||
// .cu (for CUDA type) or .hip (for HIP type) file. | ||||||
if (IsSYCL) { | ||||||
addSYCLDefaultTriple(C, UniqueSYCLTriplesVec); | ||||||
if(C.getInputArgs().hasArg(options::OPT_offload_arch_EQ) && !IsHIP && | ||||||
!IsCuda) { | ||||||
|
||||||
const ToolChain *HostTC = C.getSingleOffloadToolChain<Action::OFK_Host>(); | ||||||
auto AMDTriple = getHIPOffloadTargetTriple(*this, C.getInputArgs()); | ||||||
auto NVPTXTriple = getNVIDIAOffloadTargetTriple(*this, C.getInputArgs(), | ||||||
HostTC->getTriple()); | ||||||
|
||||||
// Attempt to deduce the offloading triple from the set of architectures. | ||||||
// We need to temporarily create these toolchains so that we can access | ||||||
// tools for inferring architectures. | ||||||
llvm::DenseSet<StringRef> Archs; | ||||||
if (NVPTXTriple) { | ||||||
auto TempTC = std::make_unique<toolchains::CudaToolChain>( | ||||||
*this, *NVPTXTriple, *HostTC, C.getInputArgs()); | ||||||
for (StringRef Arch : | ||||||
getOffloadArchs(C, C.getArgs(), Action::OFK_SYCL, &*TempTC, true)) | ||||||
Archs.insert(Arch); | ||||||
} | ||||||
if (AMDTriple) { | ||||||
auto TempTC = std::make_unique<toolchains::AMDGPUOpenMPToolChain>( | ||||||
*this, *AMDTriple, *HostTC, C.getInputArgs()); | ||||||
for (StringRef Arch : | ||||||
getOffloadArchs(C, C.getArgs(), Action::OFK_SYCL, &*TempTC, true)) | ||||||
Archs.insert(Arch); | ||||||
} | ||||||
if (!AMDTriple && !NVPTXTriple) { | ||||||
for (StringRef Arch : | ||||||
getOffloadArchs(C, C.getArgs(), Action::OFK_SYCL, nullptr, true)) | ||||||
Archs.insert(Arch); | ||||||
} | ||||||
for (StringRef Arch : Archs) { | ||||||
if (NVPTXTriple && IsSYCLSupportedNVidiaGPUArch(StringToOffloadArch( | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. could we have a |
||||||
getProcessorFromTargetID(*NVPTXTriple, Arch)))) { | ||||||
DerivedArchs[NVPTXTriple->getTriple()].insert(Arch); | ||||||
} else if (AMDTriple && | ||||||
IsSYCLSupportedAMDGPUArch(StringToOffloadArch( | ||||||
getProcessorFromTargetID(*AMDTriple, Arch)))) { | ||||||
DerivedArchs[AMDTriple->getTriple()].insert(Arch); | ||||||
} else if (IsSYCLSupportedIntelCPUArch(StringToOffloadArchSYCL(Arch))) { | ||||||
DerivedArchs[getSYCLDeviceTriple("spir64_x86_64").getTriple()].insert( | ||||||
Arch); | ||||||
} else if (IsSYCLSupportedIntelGPUArch(StringToOffloadArchSYCL(Arch))) { | ||||||
StringRef IntelGPUArch; | ||||||
// For Intel Graphics AOT target, valid values for '--offload-arch' | ||||||
// are mapped to valid device names accepted by OCLOC (the Intel GPU AOT | ||||||
// compiler) via the '-device' option. The mapIntelGPUArchName | ||||||
// function maps the accepted values for '--offload-arch' to enable SYCL | ||||||
// offloading to Intel GPUs and the corresponding '-device' value passed | ||||||
// to OCLOC. | ||||||
IntelGPUArch = mapIntelGPUArchName(Arch).data(); | ||||||
DerivedArchs[getSYCLDeviceTriple("spir64_gen").getTriple()].insert( | ||||||
IntelGPUArch); | ||||||
} else { | ||||||
Diag(clang::diag::err_drv_invalid_sycl_target) << Arch; | ||||||
return; | ||||||
} | ||||||
} | ||||||
// Emit an error if architecture value is not provided | ||||||
// to --offload-arch. | ||||||
if (Archs.empty()) { | ||||||
Diag(clang::diag::err_drv_sycl_offload_arch_missing_value); | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we can only get here for AOT right? or is the idea we will start doing |
||||||
return; | ||||||
} | ||||||
|
||||||
for (const auto &TripleAndArchs : DerivedArchs) | ||||||
SYCLTriples.insert(TripleAndArchs.first()); | ||||||
|
||||||
for (const auto &Val : SYCLTriples) { | ||||||
llvm::Triple SYCLTargetTriple(getSYCLDeviceTriple(Val.getKey())); | ||||||
std::string NormalizedName = SYCLTargetTriple.normalize(); | ||||||
|
||||||
// Make sure we don't have a duplicate triple. | ||||||
auto Duplicate = FoundNormalizedTriples.find(NormalizedName); | ||||||
if (Duplicate != FoundNormalizedTriples.end()) { | ||||||
Diag(clang::diag::warn_drv_sycl_offload_target_duplicate) | ||||||
<< Val.getKey() << Duplicate->second; | ||||||
continue; | ||||||
} | ||||||
|
||||||
// Store the current triple so that we can check for duplicates in the | ||||||
// following iterations. | ||||||
FoundNormalizedTriples[NormalizedName] = Val.getKey(); | ||||||
UniqueSYCLTriplesVec.push_back(SYCLTargetTriple); | ||||||
} | ||||||
|
||||||
addSYCLDefaultTriple(C, UniqueSYCLTriplesVec); | ||||||
} else | ||||||
addSYCLDefaultTriple(C, UniqueSYCLTriplesVec); | ||||||
|
||||||
// We'll need to use the SYCL and host triples as the key into | ||||||
// getOffloadingDeviceToolChain, because the device toolchains we're | ||||||
// getOffloadToolChain, because the device toolchains we're | ||||||
// going to create will depend on both. | ||||||
const ToolChain *HostTC = C.getSingleOffloadToolChain<Action::OFK_Host>(); | ||||||
for (const auto &TT : UniqueSYCLTriplesVec) { | ||||||
auto SYCLTC = &getOffloadToolChain(C.getInputArgs(), Action::OFK_SYCL, TT, | ||||||
HostTC->getTriple()); | ||||||
C.addOffloadDeviceToolChain(SYCLTC, Action::OFK_SYCL); | ||||||
if (DerivedArchs.contains(TT.getTriple())) | ||||||
KnownArchs[SYCLTC] = DerivedArchs[TT.getTriple()]; | ||||||
} | ||||||
} | ||||||
|
||||||
|
@@ -6596,7 +6709,7 @@ const ToolChain &Driver::getOffloadToolChain( | |||||
if (Kind == Action::OFK_HIP) | ||||||
TC = std::make_unique<toolchains::HIPAMDToolChain>(*this, Target, | ||||||
*HostTC, Args); | ||||||
else if (Kind == Action::OFK_OpenMP) | ||||||
else if ((Kind == Action::OFK_OpenMP) || (Kind == Action::OFK_SYCL)) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit:
Suggested change
|
||||||
TC = std::make_unique<toolchains::AMDGPUOpenMPToolChain>(*this, Target, | ||||||
*HostTC, Args); | ||||||
break; | ||||||
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -46,8 +46,9 @@ void AMDGPUOpenMPToolChain::addClangTargetOptions( | |||||
Action::OffloadKind DeviceOffloadingKind) const { | ||||||
HostTC.addClangTargetOptions(DriverArgs, CC1Args, DeviceOffloadingKind); | ||||||
|
||||||
assert(DeviceOffloadingKind == Action::OFK_OpenMP && | ||||||
"Only OpenMP offloading kinds are supported."); | ||||||
assert((DeviceOffloadingKind == Action::OFK_OpenMP || | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are supported in what scenario? |
||||||
DeviceOffloadingKind == Action::OFK_SYCL) && | ||||||
"Only OpenMP or SYCL offloading kinds are supported."); | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
||||||
if (!DriverArgs.hasFlag(options::OPT_offloadlib, options::OPT_no_offloadlib, | ||||||
true)) | ||||||
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -5091,6 +5091,7 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA, | |||||
InputInfoList HostOffloadingInputs; | ||||||
const InputInfo *CudaDeviceInput = nullptr; | ||||||
const InputInfo *OpenMPDeviceInput = nullptr; | ||||||
const InputInfo *SYCLDeviceInput = nullptr; | ||||||
for (const InputInfo &I : Inputs) { | ||||||
if (&I == &Input || I.getType() == types::TY_Nothing) { | ||||||
// This is the primary input or contains nothing. | ||||||
|
@@ -5108,13 +5109,15 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA, | |||||
CudaDeviceInput = &I; | ||||||
} else if (IsOpenMPDevice && !OpenMPDeviceInput) { | ||||||
OpenMPDeviceInput = &I; | ||||||
} else if (IsSYCL && !SYCLDeviceInput) { | ||||||
SYCLDeviceInput = &I; | ||||||
} else { | ||||||
llvm_unreachable("unexpectedly given multiple inputs"); | ||||||
} | ||||||
} | ||||||
|
||||||
const llvm::Triple *AuxTriple = | ||||||
(IsCuda || IsHIP) ? TC.getAuxTriple() : nullptr; | ||||||
(IsSYCL || IsCuda || IsHIP) ? TC.getAuxTriple() : nullptr; | ||||||
bool IsWindowsMSVC = RawTriple.isWindowsMSVCEnvironment(); | ||||||
bool IsUEFI = RawTriple.isUEFI(); | ||||||
bool IsIAMCU = RawTriple.isOSIAMCU(); | ||||||
|
@@ -5208,6 +5211,14 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA, | |||||
|
||||||
if (IsSYCL) { | ||||||
if (IsSYCLDevice) { | ||||||
if (Triple.isNVPTX()) { | ||||||
StringRef GPUArchName = JA.getOffloadingArch(); | ||||||
// TODO: Once default arch is moved to at least SM_53, empty arch should | ||||||
// also result in the flag added. | ||||||
if (!GPUArchName.empty() && | ||||||
StringToOffloadArch(GPUArchName) >= OffloadArch::SM_53) | ||||||
CmdArgs.push_back("-fnative-half-type"); | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do you mind adding more detail on why we need this flag? do we do this already in intel/llvm? |
||||||
} | ||||||
// Host triple is needed when doing SYCL device compilations. | ||||||
llvm::Triple AuxT = C.getDefaultToolChain().getTriple(); | ||||||
std::string NormalizedTriple = AuxT.normalize(); | ||||||
|
@@ -5220,13 +5231,45 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA, | |||||
// Set O2 optimization level by default | ||||||
if (!Args.getLastArg(options::OPT_O_Group)) | ||||||
CmdArgs.push_back("-O2"); | ||||||
// Add any predefined macros associated with intel_gpu* type targets | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This comment (with intel_gpu*) might need an update. |
||||||
// passed in with -fsycl-targets | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we intend to use -fsycl-targets for passing target information? Thanks |
||||||
// TODO: Macros are populated during device compilations and saved for | ||||||
// addition to the host compilation. There is no dependence connection | ||||||
// between device and host where we should be able to use the offloading | ||||||
// arch to add the macro to the host compile. | ||||||
Comment on lines
+5237
to
+5239
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit: Might need a rewrite. |
||||||
auto addTargetMacros = [&](const llvm::Triple &Triple) { | ||||||
if (!Triple.isSPIR() && !Triple.isNVPTX() && !Triple.isAMDGCN()) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we probably want to use |
||||||
return; | ||||||
SmallString<64> Macro; | ||||||
if ((Triple.isSPIR() && | ||||||
Triple.getSubArch() == llvm::Triple::SPIRSubArch_gen) || | ||||||
Triple.isNVPTX() || Triple.isAMDGCN()) { | ||||||
Comment on lines
+5244
to
+5246
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. An |
||||||
StringRef Device = JA.getOffloadingArch(); | ||||||
if (!Device.empty() && | ||||||
!clang::driver::getGenDeviceMacro(Device).empty()) { | ||||||
Macro = "-D"; | ||||||
Macro += clang::driver::getGenDeviceMacro(Device); | ||||||
} | ||||||
} else if (Triple.getSubArch() == llvm::Triple::SPIRSubArch_x86_64) | ||||||
Macro = "-D__SYCL_TARGET_INTEL_X86_64__"; | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: we could move this case into |
||||||
if (Macro.size()) { | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit:
Suggested change
|
||||||
CmdArgs.push_back(Args.MakeArgString(Macro)); | ||||||
D.addSYCLTargetMacroArg(Args, Macro); | ||||||
} | ||||||
}; | ||||||
addTargetMacros(RawTriple); | ||||||
} else { | ||||||
// Add any options that are needed specific to SYCL offload while | ||||||
// performing the host side compilation. | ||||||
|
||||||
// Let the front-end host compilation flow know about SYCL offload | ||||||
// compilation. | ||||||
CmdArgs.push_back("-fsycl-is-host"); | ||||||
|
||||||
// Add the SYCL target macro arguments that were generated during the | ||||||
// device compilation step. | ||||||
for (auto &Macro : D.getSYCLTargetMacroArgs()) | ||||||
CmdArgs.push_back(Args.MakeArgString(Macro)); | ||||||
} | ||||||
|
||||||
// Set options for both host and device. | ||||||
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -848,8 +848,8 @@ void CudaToolChain::addClangTargetOptions( | |||||
|
||||||
StringRef GpuArch = DriverArgs.getLastArgValue(options::OPT_march_EQ); | ||||||
assert((DeviceOffloadingKind == Action::OFK_OpenMP || | ||||||
DeviceOffloadingKind == Action::OFK_Cuda) && | ||||||
"Only OpenMP or CUDA offloading kinds are supported for NVIDIA GPUs."); | ||||||
DeviceOffloadingKind == Action::OFK_Cuda || DeviceOffloadingKind == Action::OFK_SYCL) && | ||||||
"Only OpenMP or CUDA or SYCL offloading kinds are supported for NVIDIA GPUs."); | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
||||||
CC1Args.append({"-fcuda-is-device", "-mllvm", | ||||||
"-enable-memcpyopt-without-libcalls", | ||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: Add code to enable SYCL offloading to these AMD GPU targets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These targets have been removed in community, we should follow and not add them in our upstream efforts.