This repository was archived by the owner on May 23, 2024. It is now read-only.

parallel_reduce with native simd type => seg fault in OpenMP #34

@pkestene

Hello,

I was trying to test a parallel_reduce (sum) using one of the native simd types and hit a seg fault that seems to be caused by wrong memory alignment of the value returned by HostThreadTeamData::pool_reduce_local().

To illustrate this, I've updated avx.hpp to provide operator+= (used in the reduce join operation) and used the custom reducer provided below.

// custom reducer for simd type (here avx)
template <class T, class Space>
struct SimdReducer {
 public:

  using simd_t = simd::simd<float,simd::simd_abi::native>;
  //using simd_t = simd::simd<T,simd::simd_abi::pack<4>>;

  using simd_storage_t = simd_t::simd_storage_t;

  // Required
  using reducer = SimdReducer<T, Space>;
  using value_type = simd_t;
  using value_type_storage = simd_storage_t;
  using result_view_type = Kokkos::View<value_type, Space, Kokkos::MemoryUnmanaged>;

 private:
  result_view_type value;

 public:
  KOKKOS_INLINE_FUNCTION
  SimdReducer(value_type& value_) : value(&value_) {}

  // Required
  KOKKOS_INLINE_FUNCTION
  void join(value_type& dest, const value_type& src) const {
    dest += src;
  }

  KOKKOS_INLINE_FUNCTION
  void join(volatile value_type& dest, const volatile value_type& src) const {
    dest += src;
  }

  KOKKOS_INLINE_FUNCTION
  void init(value_type& val) const {
    printf("before init %p\n", static_cast<void*>(&val)); // %p expects a void*
    val = simd_t(0.0); // seg fault here
    printf("after init\n");
  }

  KOKKOS_INLINE_FUNCTION
  value_type& reference() const { return *value.data(); }

  KOKKOS_INLINE_FUNCTION
  result_view_type view() const { return value; }

  KOKKOS_INLINE_FUNCTION
  bool references_scalar() const { return true; }
};
  • A parallel_reduce with this reducer works fine with the Serial device, but segfaults with the OpenMP device, regardless of the number of threads.
  • If I change the simd type to e.g. simd_abi::pack<4>, the crash disappears and everything works fine.
  • When compiling for AVX, simd<float,simd::simd_abi::native> is 32 bytes. When I print, in the reducer's init, the address of the reference value coming from the call to pool_reduce_local() (in HostThreadTeamData), that address is only 16-byte aligned, whereas I think it should be 32-byte aligned. I think this explains the seg fault.
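The alignment observation in the last bullet can be checked in isolation. The following is a minimal sketch, not code from the library: `FakeSimd`, `is_aligned_for` and `address_at_offset` are hypothetical names, and `FakeSimd` is only a stand-in with the same 32-byte size and alignment as an AVX float vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 32-byte AVX value (8 floats); not the library's type.
struct alignas(32) FakeSimd {
  float lanes[8];
};

// Returns true if p meets the alignment requirement of T.
template <class T>
bool is_aligned_for(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Returns an address `bytes` past the start of a 32-byte-aligned static buffer.
inline const void* address_at_offset(std::size_t bytes) {
  alignas(32) static unsigned char buffer[64];
  assert(bytes < sizeof(buffer));
  return buffer + bytes;
}
```

With this, `is_aligned_for<FakeSimd>(address_at_offset(16))` is false: an address that is only 16-byte aligned, like the one printed in init, does not satisfy a 32-byte alignment requirement, and an aligned AVX store to it can fault.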

I may be wrong, but I think alignment needs to be controlled inside HostThreadTeamData so that the returned pointer is properly aligned.
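As an illustration of what "controlling alignment" could mean, here is a rough sketch that rounds a scratch pointer up to the alignment of the value type using the standard `std::align`. This is an assumption about one possible approach, not how HostThreadTeamData is implemented; `Wide32` and `aligned_slot` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical 32-byte, 32-byte-aligned value type (stand-in for an AVX simd value).
struct alignas(32) Wide32 {
  float lanes[8];
};

// Hypothetical helper: returns the first address in [scratch, scratch + bytes)
// that is suitably aligned for T and leaves room for one T, or nullptr if
// no such address exists in the range.
template <class T>
T* aligned_slot(void* scratch, std::size_t bytes) {
  void* p = scratch;
  std::size_t space = bytes;
  // std::align bumps p up to the next multiple of alignof(T) if sizeof(T)
  // still fits in the remaining space; otherwise it returns nullptr.
  return static_cast<T*>(std::align(alignof(T), sizeof(T), p, space));
}
```

Given a scratch region that starts at a 16-byte-aligned address, `aligned_slot<Wide32>` skips up to 16 bytes of padding and returns a 32-byte-aligned slot, at the cost of requiring the scratch region to be slightly larger than `sizeof(T)`.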
