simdOptimizedAlignment is unbounded - breaking memory allocation #407

@TimHanel00

Description

As the title suggests, the current implementation of simdOptimizedAlignment does not impose an upper limit on the alignment it returns. As a result, we might not be able to allocate large fields even though there is sufficient memory available on the device, because the requested alignment runs into the alignment restrictions of the target device.
This error first occurred on an Intel GPU, since the SYCL implementation actually uses simdOptimizedAlignment:

template <class T_value, auto N>
struct DataContainer {
    T_value value[N];
};
    // 1 << 15 uint32_t elements == 128 KiB; the threshold at which the allocation fails is device specific
    constexpr auto size = static_cast<uint32_t>(1 << 15);

    auto bufAccContainer = onHost::alloc<DataContainer<uint32_t, size>>(device, Vec{1u}); // one element whose type is the whole 128 KiB DataContainer
    auto bufAccExtent = onHost::alloc<uint32_t>(device, Vec{size}); // same amount of memory, expressed as an extent of uint32_t

    std::cout << " adr using DataContainer struct = " << bufAccContainer.data() << "\n"; // <-- this returns NULL
    std::cout << " adr using memory with extent = " << bufAccExtent.data() << "\n"; // <-- this returns a valid device address

Recommended changes:

  1. Clamp the value returned by simdOptimizedAlignment to a reasonable maximum (some multiple of alignof(T)) for large T. Additional edge-case handling could be implemented in the Alloc::operator() method of each backend individually.
  2. Apply simdOptimizedAlignment across all backends during memory allocation (there is already a TODO in unifiedCudaHip/Device.hpp indicating this).
