Open
Description
As the title suggests, the current implementation of simdOptimizedAlignment does not impose an upper limit on the alignment it returns. As a result, large fields may fail to allocate even though the device has sufficient memory available, because the requested alignment runs into the alignment restrictions of the target device.
This error first occurred on an Intel GPU (since the SYCL implementation actually utilizes simdOptimizedAlignment):
template <class T_value, auto N>
struct DataContainer {
T_value value[N];
};
constexpr auto size=static_cast<uint32_t>(1<<15); // == 128 KiB of uint32_t data; the threshold at which the allocation fails is device specific
auto bufAccContainer=onHost::alloc<DataContainer<uint32_t, size>>(device, Vec{1u}); // allocate buffer memory using the DataContainer
auto bufAccExtent=onHost::alloc<uint32_t>(device, Vec{size}); // allocate buffer memory using an extent
std::cout<< " adr using DataContainer struct = " << bufAccContainer.data() << "\n"; //<-- this returns NULL
std::cout<< " adr using memory with extent = " << bufAccExtent.data() << "\n"; //<-- this returns a valid device address

Recommended changes:
- Clamp the value returned by simdOptimizedAlignment to a reasonable maximum (some multiple of alignOf(T)) for large T. Additional edge-case handling could be implemented in the Alloc::operator() method of each backend individually.
- Apply simdOptimizedAlignment across all backends during memory allocation (there is already a TODO in unifiedCudaHip/Device.hpp indicating this).
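
To illustrate the first recommendation, here is a minimal sketch of what such a clamp could look like. It does not reproduce the real simdOptimizedAlignment; `clampAlignment` and `kMaxSimdAlignment` are hypothetical names, and the cap of 256 bytes is an arbitrary assumption that a real implementation would probably choose per backend or per device.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical upper bound in bytes (an assumption, not a value from the library).
inline constexpr std::size_t kMaxSimdAlignment = 256;

// Hypothetical helper: clamp a requested alignment so that it never drops
// below alignof(T) and never exceeds the cap (unless alignof(T) itself is larger).
// If `requested`, `cap`, and alignof(T) are powers of two, so is the result.
template <class T>
constexpr std::size_t clampAlignment(std::size_t requested, std::size_t cap = kMaxSimdAlignment)
{
    constexpr std::size_t natural = alignof(T);
    // Keep the upper bound at least as large as the natural alignment.
    std::size_t const upper = std::max(natural, cap);
    return std::clamp(requested, natural, upper);
}

// Same container as in the reproducer above.
template <class T_value, std::size_t N>
struct DataContainer
{
    T_value value[N];
};

using Big = DataContainer<std::uint32_t, (1u << 15)>;

// A request as large as the whole 128 KiB struct is reduced to the cap.
static_assert(clampAlignment<Big>(sizeof(Big)) == kMaxSimdAlignment);
// Small requests pass through unchanged.
static_assert(clampAlignment<std::uint32_t>(16) == 16);
// Requests below the natural alignment are raised to alignof(T).
static_assert(clampAlignment<std::uint64_t>(1) == alignof(std::uint64_t));
```

Under these assumptions, the DataContainer case from the reproducer would request at most 256 bytes of alignment, regardless of how large the struct is. Device-specific limits could still be handled in each backend's Alloc::operator().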