Skip to content

[WIP][Driver][SYCL] Enable --offload-arch support for SYCL offloading. #2

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions clang/include/clang/Basic/Cuda.h
Original file line number Diff line number Diff line change
Expand Up @@ -106,6 +106,8 @@ enum class OffloadArch {
GFX90a,
GFX90c,
GFX9_4_GENERIC,
GFX940,
GFX941,
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Add code to enable SYCL offloading to these AMD GPU targets.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These targets have been removed in community, we should follow and not add them in our upstream efforts.

GFX942,
GFX950,
GFX10_1_GENERIC,
Expand Down
10 changes: 10 additions & 0 deletions clang/include/clang/Basic/DiagnosticDriverKinds.td
Original file line number Diff line number Diff line change
Expand Up @@ -843,4 +843,14 @@ def warn_missing_include_dirs : Warning<

def err_drv_malformed_warning_suppression_mapping : Error<
"failed to process suppression mapping file '%0': %1">;

def err_drv_sycl_offload_arch_missing_value : Error<
"must pass in an explicit cpu or gpu architecture to '--offload-arch'">;
Copy link

@asudarsa asudarsa Apr 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"must pass in an explicit cpu or gpu architecture to '--offload-arch'">;
"must pass in a valid cpu or gpu architecture string to '--offload-arch'">;


def err_drv_invalid_sycl_target : Error<"SYCL target is invalid: '%0'">;

def warn_drv_sycl_offload_target_duplicate : Warning<
"SYCL offloading target '%0' is similar to target '%1' already specified; "
"will be ignored">, InGroup<SyclTarget>;

}
4 changes: 4 additions & 0 deletions clang/include/clang/Basic/DiagnosticGroups.td
Original file line number Diff line number Diff line change
Expand Up @@ -1628,3 +1628,7 @@ def ExplicitSpecializationStorageClass : DiagGroup<"explicit-specialization-stor

// A warning for options that enable a feature that is not yet complete
def ExperimentalOption : DiagGroup<"experimental-option">;

// SYCL Warnings
def SyclTarget : DiagGroup<"sycl-target">;

17 changes: 17 additions & 0 deletions clang/include/clang/Driver/Driver.h
Original file line number Diff line number Diff line change
Expand Up @@ -846,6 +846,23 @@ class Driver {
/// Compute the default -fmodule-cache-path.
/// \return True if the system provides a default cache directory.
static bool getDefaultModuleCachePath(SmallVectorImpl<char> &Result);

/// Vector of Macros that need to be added to the Host compilation in a
/// SYCL based offloading scenario. These macros are gathered during
/// construction of the device compilations.
mutable std::vector<std::string> SYCLTargetMacroArgs;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i took a quick look and didn't see openmp doing the same thing, is it possible to do what the other offloading languages do?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIC, SYCL target macros are generated during SYCL device compilation and they are passed to both SYCL device and host compilation step.
Example:

clang -###  -fsycl --offload-arch=bdw -nogpulib test.cpp

clang-20   -cc1  "-fsycl-is-device"  -D__SYCL_TARGET_INTEL_GPU_BDW__   .....
clang-20 -cc1    "-fsycl-is-host" "-D__SYCL_TARGET_INTEL_GPU_BDW__"   .......

This is the existing behavior ported from intel/llvm.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These macros mean that we compile for specific target. Typically, they are set by the LLVM backend. We don't have LLVM backend for SPIR target, so we added them in the driver.
Considering that we have SPIR-V backend, we can consider moving this logic to SPIR-V backend, but I'm not sure how community will respond to adding INTEL specific macros to SPIR-V backend.
Maybe we should consider adding Intel's flavor to SPIR-V as AMD folks. They added some macro here: https://github.com/llvm/llvm-project/pull/89796/files#diff-4c95416395669c428da9b77967b9f70792b863f8b32906bbc39692ec0988ab9fR93-R95

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we should have some reasonable success if we add 'limited' number of macros. How many do we expect to add?

Thanks

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have unique values for each GPU target. So each potential GPU we support, we would have a macro that is associated with it. Given the number we currently have here:

SmallString<64> clang::driver::getGenDeviceMacro(StringRef DeviceName) {
the number isn't small. We would have those, and we also have one generic __SYCL_TARGET_INTEL_X86_64__ for CPU. Additionally, the driver is emitting all of the __SYCL_ANY_DEVICE_HAS_*__ and __SYCL_ALL_DEVICES_HAVE_*__ macros. Definitely not 'limited' by any means.


/// addSYCLTargetMacroArg - Add the given macro to the vector of args to be
/// added to the host compilation step.
void addSYCLTargetMacroArg(const llvm::opt::ArgList &Args,
StringRef Macro) const {
SYCLTargetMacroArgs.push_back(Args.MakeArgString(Macro));
}

/// getSYCLTargetMacroArgs - return the previously gathered macro target args.
llvm::ArrayRef<std::string> getSYCLTargetMacroArgs() const {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just call this getSYCLTargetMacros? Also, why do we need to include '-D' in the macro name? Can we just not add it when including it into a compilation command?
Just a nit....

return SYCLTargetMacroArgs;
}
};

/// \return True if the last defined optimization level is -Ofast.
Expand Down
2 changes: 2 additions & 0 deletions clang/lib/Basic/Cuda.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -124,6 +124,8 @@ static const OffloadArchToStringMap arch_names[] = {
GFX(90a), // gfx90a
GFX(90c), // gfx90c
{OffloadArch::GFX9_4_GENERIC, "gfx9-4-generic", "compute_amdgcn"},
GFX(940), // gfx940
GFX(941), // gfx941
GFX(942), // gfx942
GFX(950), // gfx950
{OffloadArch::GFX10_1_GENERIC, "gfx10-1-generic", "compute_amdgcn"},
Expand Down
131 changes: 122 additions & 9 deletions clang/lib/Driver/Driver.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -833,10 +833,14 @@ Driver::OpenMPRuntimeKind Driver::getOpenMPRuntime(const ArgList &Args) const {

static llvm::Triple getSYCLDeviceTriple(StringRef TargetArch) {
SmallVector<StringRef, 5> SYCLAlias = {"spir", "spir64", "spirv", "spirv32",
"spirv64"};
"spirv64", "spir64_x86_64",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be less confusing to just have spirv, spirv32, spirv64 (for intel targets) here. Also, why is AMD missing here?

Thanks

"spir64_gen", "nvptx64"};
if (llvm::is_contained(SYCLAlias, TargetArch)) {
llvm::Triple TargetTriple;
TargetTriple.setArchName(TargetArch);
// Return the full SYCL target triple string for NVidia GPU targets.
if (TargetTriple.getArch() == llvm::Triple::nvptx64)
return llvm::Triple("nvptx64-nvidia-cuda");
TargetTriple.setVendor(llvm::Triple::UnknownVendor);
TargetTriple.setOS(llvm::Triple::UnknownOS);
return TargetTriple;
Expand All @@ -846,16 +850,25 @@ static llvm::Triple getSYCLDeviceTriple(StringRef TargetArch) {

static bool addSYCLDefaultTriple(Compilation &C,
SmallVectorImpl<llvm::Triple> &SYCLTriples) {

llvm::Triple DefaultTriple = getSYCLDeviceTriple(
C.getDefaultToolChain().getTriple().isArch32Bit() ? "spirv32"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i've never seen spirv32 used, does it actually work? can IGC compile it?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. This was originally introduced by @mdtoguchi in this PR

intel/llvm has spir64 as the default target when -fsycl is passed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Way back when - when we still supported 32-bit targets with the DPC++ compiler, the equivalent usage for the device compilation was spir, and this is the natural extension of that when moving to spirv based arch values. If we need to restrict the targets for IGC, it would be good to do it now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has nothing to do with the IGC. IGC compiles SPIR-V. The right question is: "Does LLVM-SPIRV-Translator support LLVM with spirv32 target triple?" According to my understanding, the answer to this question is "Yes".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mentioned IGC because of the data type size differences, but yeah the translator behavior is important too. If the translator/SPIR-V backend work with it, we should support it.

Copy link
Collaborator

@bader bader Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, 32-bit targets were important for some platforms like Android and/or Windows. I don't think SYCL is enabled on such platforms though.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks.

: "spirv64");
for (const auto &SYCLTriple : SYCLTriples) {
if (SYCLTriple == DefaultTriple)
return false;
// If we encounter a known non-spir* target, do not add the default triple.
if (SYCLTriple.isNVPTX() || SYCLTriple.isAMDGCN())
return false;
if(SYCLTriple.isSPIRAOT())
return false;
}
// Check current set of triples to see if the default has already been set.
for (const auto &SYCLTriple : SYCLTriples) {
if (SYCLTriple.getSubArch() == llvm::Triple::NoSubArch &&
SYCLTriple.isSPIROrSPIRV())
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Update this check to SYCLTriple.isSPIRV()

return false;
}
// Add the default triple as it was not found.
llvm::Triple DefaultTriple = getSYCLDeviceTriple(
C.getDefaultToolChain().getTriple().isArch32Bit() ? "spirv32"
: "spirv64");
SYCLTriples.insert(SYCLTriples.begin(), DefaultTriple);
return true;
}
Expand Down Expand Up @@ -1066,19 +1079,119 @@ void Driver::CreateOffloadingDeviceToolChains(Compilation &C,
// -ffreestanding cannot be used with -fsycl
argSYCLIncompatible(options::OPT_ffreestanding);

// Map of SYCL target triple strings to their corresponding target archs.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to have an implementation where we have a single triple for all Intel targets (spirv64/spirv32/spirv) and then architecture is specified solely by --offload-arch. I think we can get rid of spirv64_x86_64 and spirv_gen.

Thanks

// Example: spir64_x86_64 --> SKYLAKEAVX512
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can the example have multiple arches? that should be possible given the map value is a set right?

llvm::StringMap<llvm::DenseSet<StringRef>> DerivedArchs;
llvm::StringMap<StringRef> FoundNormalizedTriples;
llvm::SmallVector<llvm::Triple, 4> UniqueSYCLTriplesVec;

// StringSet to contain SYCL target triples.
llvm::StringSet<> SYCLTriples;
// If the user specified --offload-arch, deduce the offloading
// target triple(s) from the set of architecture(s).
// Create a toolchain for each valid triple.
// We do not support SYCL offloading if any of the inputs is a
// .cu (for CUDA type) or .hip (for HIP type) file.
if (IsSYCL) {
addSYCLDefaultTriple(C, UniqueSYCLTriplesVec);
if(C.getInputArgs().hasArg(options::OPT_offload_arch_EQ) && !IsHIP &&
!IsCuda) {

const ToolChain *HostTC = C.getSingleOffloadToolChain<Action::OFK_Host>();
auto AMDTriple = getHIPOffloadTargetTriple(*this, C.getInputArgs());
auto NVPTXTriple = getNVIDIAOffloadTargetTriple(*this, C.getInputArgs(),
HostTC->getTriple());

// Attempt to deduce the offloading triple from the set of architectures.
// We need to temporarily create these toolchains so that we can access
// tools for inferring architectures.
llvm::DenseSet<StringRef> Archs;
if (NVPTXTriple) {
auto TempTC = std::make_unique<toolchains::CudaToolChain>(
*this, *NVPTXTriple, *HostTC, C.getInputArgs());
for (StringRef Arch :
getOffloadArchs(C, C.getArgs(), Action::OFK_SYCL, &*TempTC, true))
Archs.insert(Arch);
}
if (AMDTriple) {
auto TempTC = std::make_unique<toolchains::AMDGPUOpenMPToolChain>(
*this, *AMDTriple, *HostTC, C.getInputArgs());
for (StringRef Arch :
getOffloadArchs(C, C.getArgs(), Action::OFK_SYCL, &*TempTC, true))
Archs.insert(Arch);
}
if (!AMDTriple && !NVPTXTriple) {
for (StringRef Arch :
getOffloadArchs(C, C.getArgs(), Action::OFK_SYCL, nullptr, true))
Archs.insert(Arch);
}
for (StringRef Arch : Archs) {
if (NVPTXTriple && IsSYCLSupportedNVidiaGPUArch(StringToOffloadArch(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we have a getOffloadArch single function like is done for AMDGPU/NVPTX and call that here?

getProcessorFromTargetID(*NVPTXTriple, Arch)))) {
DerivedArchs[NVPTXTriple->getTriple()].insert(Arch);
} else if (AMDTriple &&
IsSYCLSupportedAMDGPUArch(StringToOffloadArch(
getProcessorFromTargetID(*AMDTriple, Arch)))) {
DerivedArchs[AMDTriple->getTriple()].insert(Arch);
} else if (IsSYCLSupportedIntelCPUArch(StringToOffloadArchSYCL(Arch))) {
DerivedArchs[getSYCLDeviceTriple("spir64_x86_64").getTriple()].insert(
Arch);
} else if (IsSYCLSupportedIntelGPUArch(StringToOffloadArchSYCL(Arch))) {
StringRef IntelGPUArch;
// For Intel Graphics AOT target, valid values for '--offload-arch'
// are mapped to valid device names accepted by OCLOC (the Intel GPU AOT
// compiler) via the '-device' option. The mapIntelGPUArchName
// function maps the accepted values for '--offload-arch' to enable SYCL
// offloading to Intel GPUs and the corresponding '-device' value passed
// to OCLOC.
IntelGPUArch = mapIntelGPUArchName(Arch).data();
DerivedArchs[getSYCLDeviceTriple("spir64_gen").getTriple()].insert(
IntelGPUArch);
} else {
Diag(clang::diag::err_drv_invalid_sycl_target) << Arch;
return;
}
}
// Emit an error if architecture value is not provided
// to --offload-arch.
if (Archs.empty()) {
Diag(clang::diag::err_drv_sycl_offload_arch_missing_value);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can only get here for AOT right? or is the idea we will start doing -fsycl --offload-targets=spirv64 for JIT instead of -fsycl -fsycl-targets=spirv64

return;
}

for (const auto &TripleAndArchs : DerivedArchs)
SYCLTriples.insert(TripleAndArchs.first());

for (const auto &Val : SYCLTriples) {
llvm::Triple SYCLTargetTriple(getSYCLDeviceTriple(Val.getKey()));
std::string NormalizedName = SYCLTargetTriple.normalize();

// Make sure we don't have a duplicate triple.
auto Duplicate = FoundNormalizedTriples.find(NormalizedName);
if (Duplicate != FoundNormalizedTriples.end()) {
Diag(clang::diag::warn_drv_sycl_offload_target_duplicate)
<< Val.getKey() << Duplicate->second;
continue;
}

// Store the current triple so that we can check for duplicates in the
// following iterations.
FoundNormalizedTriples[NormalizedName] = Val.getKey();
UniqueSYCLTriplesVec.push_back(SYCLTargetTriple);
}

addSYCLDefaultTriple(C, UniqueSYCLTriplesVec);
} else
addSYCLDefaultTriple(C, UniqueSYCLTriplesVec);

// We'll need to use the SYCL and host triples as the key into
// getOffloadingDeviceToolChain, because the device toolchains we're
// getOffloadToolChain, because the device toolchains we're
// going to create will depend on both.
const ToolChain *HostTC = C.getSingleOffloadToolChain<Action::OFK_Host>();
for (const auto &TT : UniqueSYCLTriplesVec) {
auto SYCLTC = &getOffloadToolChain(C.getInputArgs(), Action::OFK_SYCL, TT,
HostTC->getTriple());
C.addOffloadDeviceToolChain(SYCLTC, Action::OFK_SYCL);
if (DerivedArchs.contains(TT.getTriple()))
KnownArchs[SYCLTC] = DerivedArchs[TT.getTriple()];
}
}

Expand Down Expand Up @@ -6596,7 +6709,7 @@ const ToolChain &Driver::getOffloadToolChain(
if (Kind == Action::OFK_HIP)
TC = std::make_unique<toolchains::HIPAMDToolChain>(*this, Target,
*HostTC, Args);
else if (Kind == Action::OFK_OpenMP)
else if ((Kind == Action::OFK_OpenMP) || (Kind == Action::OFK_SYCL))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
else if ((Kind == Action::OFK_OpenMP) || (Kind == Action::OFK_SYCL))
else if (Kind == Action::OFK_OpenMP || Kind == Action::OFK_SYCL)

TC = std::make_unique<toolchains::AMDGPUOpenMPToolChain>(*this, Target,
*HostTC, Args);
break;
Expand Down
5 changes: 3 additions & 2 deletions clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -46,8 +46,9 @@ void AMDGPUOpenMPToolChain::addClangTargetOptions(
Action::OffloadKind DeviceOffloadingKind) const {
HostTC.addClangTargetOptions(DriverArgs, CC1Args, DeviceOffloadingKind);

assert(DeviceOffloadingKind == Action::OFK_OpenMP &&
"Only OpenMP offloading kinds are supported.");
assert((DeviceOffloadingKind == Action::OFK_OpenMP ||
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are supported in what scenario?

DeviceOffloadingKind == Action::OFK_SYCL) &&
"Only OpenMP or SYCL offloading kinds are supported.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Only OpenMP or SYCL offloading kinds are supported.");
"Only OpenMP and SYCL offloading kinds are supported.");


if (!DriverArgs.hasFlag(options::OPT_offloadlib, options::OPT_no_offloadlib,
true))
Expand Down
45 changes: 44 additions & 1 deletion clang/lib/Driver/ToolChains/Clang.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -5091,6 +5091,7 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
InputInfoList HostOffloadingInputs;
const InputInfo *CudaDeviceInput = nullptr;
const InputInfo *OpenMPDeviceInput = nullptr;
const InputInfo *SYCLDeviceInput = nullptr;
for (const InputInfo &I : Inputs) {
if (&I == &Input || I.getType() == types::TY_Nothing) {
// This is the primary input or contains nothing.
Expand All @@ -5108,13 +5109,15 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
CudaDeviceInput = &I;
} else if (IsOpenMPDevice && !OpenMPDeviceInput) {
OpenMPDeviceInput = &I;
} else if (IsSYCL && !SYCLDeviceInput) {
SYCLDeviceInput = &I;
} else {
llvm_unreachable("unexpectedly given multiple inputs");
}
}

const llvm::Triple *AuxTriple =
(IsCuda || IsHIP) ? TC.getAuxTriple() : nullptr;
(IsSYCL || IsCuda || IsHIP) ? TC.getAuxTriple() : nullptr;
bool IsWindowsMSVC = RawTriple.isWindowsMSVCEnvironment();
bool IsUEFI = RawTriple.isUEFI();
bool IsIAMCU = RawTriple.isOSIAMCU();
Expand Down Expand Up @@ -5208,6 +5211,14 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,

if (IsSYCL) {
if (IsSYCLDevice) {
if (Triple.isNVPTX()) {
StringRef GPUArchName = JA.getOffloadingArch();
// TODO: Once default arch is moved to at least SM_53, empty arch should
// also result in the flag added.
if (!GPUArchName.empty() &&
StringToOffloadArch(GPUArchName) >= OffloadArch::SM_53)
CmdArgs.push_back("-fnative-half-type");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mind adding more detail on why we need this flag? do we do this already in intel/llvm?

}
// Host triple is needed when doing SYCL device compilations.
llvm::Triple AuxT = C.getDefaultToolChain().getTriple();
std::string NormalizedTriple = AuxT.normalize();
Expand All @@ -5220,13 +5231,45 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
// Set O2 optimization level by default
if (!Args.getLastArg(options::OPT_O_Group))
CmdArgs.push_back("-O2");
// Add any predefined macros associated with intel_gpu* type targets
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment (with intel_gpu*) might need an update.

// passed in with -fsycl-targets
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we intend to use -fsycl-targets for passing target information?

Thanks

// TODO: Macros are populated during device compilations and saved for
// addition to the host compilation. There is no dependence connection
// between device and host where we should be able to use the offloading
// arch to add the macro to the host compile.
Comment on lines +5237 to +5239
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Might need a rewrite.

auto addTargetMacros = [&](const llvm::Triple &Triple) {
if (!Triple.isSPIR() && !Triple.isNVPTX() && !Triple.isAMDGCN())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we probably want to use isSPIROrSPIRV everywhere, we are having some early-stage talks of maybe stoping using spir64 because we don't actually generate SPIR 1.0, we generate SPIR-V, so it would be a lot easier if spirv(32,64) already worked :)

return;
SmallString<64> Macro;
if ((Triple.isSPIR() &&
Triple.getSubArch() == llvm::Triple::SPIRSubArch_gen) ||
Triple.isNVPTX() || Triple.isAMDGCN()) {
Comment on lines +5244 to +5246
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An IsSYCLAOT helper function might be useful for this kind of check

StringRef Device = JA.getOffloadingArch();
if (!Device.empty() &&
!clang::driver::getGenDeviceMacro(Device).empty()) {
Macro = "-D";
Macro += clang::driver::getGenDeviceMacro(Device);
}
} else if (Triple.getSubArch() == llvm::Triple::SPIRSubArch_x86_64)
Macro = "-D__SYCL_TARGET_INTEL_X86_64__";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we could move this case into getGenDeviceMacro

if (Macro.size()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
if (Macro.size()) {
if (!Macro.empty()) {

CmdArgs.push_back(Args.MakeArgString(Macro));
D.addSYCLTargetMacroArg(Args, Macro);
}
};
addTargetMacros(RawTriple);
} else {
// Add any options that are needed specific to SYCL offload while
// performing the host side compilation.

// Let the front-end host compilation flow know about SYCL offload
// compilation.
CmdArgs.push_back("-fsycl-is-host");

// Add the SYCL target macro arguments that were generated during the
// device compilation step.
for (auto &Macro : D.getSYCLTargetMacroArgs())
CmdArgs.push_back(Args.MakeArgString(Macro));
}

// Set options for both host and device.
Expand Down
4 changes: 2 additions & 2 deletions clang/lib/Driver/ToolChains/Cuda.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -848,8 +848,8 @@ void CudaToolChain::addClangTargetOptions(

StringRef GpuArch = DriverArgs.getLastArgValue(options::OPT_march_EQ);
assert((DeviceOffloadingKind == Action::OFK_OpenMP ||
DeviceOffloadingKind == Action::OFK_Cuda) &&
"Only OpenMP or CUDA offloading kinds are supported for NVIDIA GPUs.");
DeviceOffloadingKind == Action::OFK_Cuda || DeviceOffloadingKind == Action::OFK_SYCL) &&
"Only OpenMP or CUDA or SYCL offloading kinds are supported for NVIDIA GPUs.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Only OpenMP or CUDA or SYCL offloading kinds are supported for NVIDIA GPUs.");
"Only OpenMP, CUDA, and SYCL offloading kinds are supported for NVIDIA GPUs.");


CC1Args.append({"-fcuda-is-device", "-mllvm",
"-enable-memcpyopt-without-libcalls",
Expand Down
Loading