AoSoA Boolean Data Corruption on GPU with Large Particle Counts #842

@Sichao25

Hello, I am learning to use AoSoA as the data structure for particle analysis. When a bool is among the member types, some data becomes corrupted once the particle count exceeds a certain threshold.

In the reproducer below, I create and initialize an AoSoA, then access it to update some particle data. One member is a bool that serves as a mask. With 80 million particles or more, some mask values change from true to false. Changing the bool to an integer type resolves the error. I did not observe the issue on CPU with the same input size, though I cannot guarantee it is safe for larger inputs.

Are there any restrictions on using Boolean types on GPU, or am I making an obvious mistake in the code? Any suggestions or thoughts are appreciated.

The code:

#include <Cabana_Core.hpp>
#include <Kokkos_Core.hpp>
#include <cstdlib>   // std::atoi
#include <iostream>

using DataTypes = Cabana::MemberTypes<double[3],  // position
                                      double[3],  // computed position
                                      int,        // id
                                      float,      // ellipse_b
                                      float,      // angle
                                      bool>;      // mask

using MemorySpace = Kokkos::DefaultExecutionSpace::memory_space;
using AoSoA_t = Cabana::AoSoA<DataTypes, MemorySpace>;

void count(AoSoA_t& aosoa) {
  Kokkos::View<int*, MemorySpace> count("count", 1);
  auto mask = Cabana::slice<5>(aosoa, "mask");
  Kokkos::parallel_for("count_active", aosoa.size(), KOKKOS_LAMBDA(const int i) {
      if (mask(i)) {
          Kokkos::atomic_fetch_add(&count(0), 1);
      }
  });
  Kokkos::View<int*, Kokkos::HostSpace> active_count("active_count", 1);
  Kokkos::deep_copy(active_count, count);
  std::cout << "Number of active particles: " << active_count(0) << std::endl;
}

int main(int argc, char* argv[])
{
  Kokkos::initialize(argc, argv);

  {
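    // The particle count is a required command-line argument.
    if (argc < 2) {
      std::cerr << "Usage: " << argv[0] << " <num_particles>" << std::endl;
      Kokkos::finalize();
      return 1;
    }
    const int size = std::atoi(argv[1]);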

    AoSoA_t aosoa("ParticleData", size);
    auto positions = Cabana::slice<0>(aosoa, "position");
    auto computed = Cabana::slice<1>(aosoa, "computed");
    auto pid = Cabana::slice<2>(aosoa, "id");
    auto ellipse_b = Cabana::slice<3>(aosoa, "ellipse_b");
    auto angle = Cabana::slice<4>(aosoa, "angle");
    auto mask = Cabana::slice<5>(aosoa, "mask");

    std::cout << "Initializing AoSoA with " << size << " elements..." << std::endl;
    
    // Initialize elements using parallel_for
    Kokkos::parallel_for("initialize",
      Kokkos::RangePolicy<>(0, size),
      KOKKOS_LAMBDA(const int i) {
        positions(i, 0) = 1.0;      // x
        positions(i, 1) = 2.0;      // y
        positions(i, 2) = 3.0;      // z
        computed(i, 0) = 1.0;       // computed x
        computed(i, 1) = 2.0;       // computed y
        computed(i, 2) = 3.0;       // computed z
        pid(i) = i;            // id
        ellipse_b(i) = 0.5;   // ellipse b
        angle(i) = 0.25;      // angle
        mask(i) = true; // all active
      }
    );

    count(aosoa);
    std::cout << "\nModifying elements" << std::endl;
    const auto soa_len = AoSoA_t::vector_length;
    std::cout << "AoSoA vector length: " << soa_len << std::endl;
    Cabana::SimdPolicy<soa_len,Kokkos::DefaultExecutionSpace> simd_policy(0, size);
    Cabana::simd_parallel_for(simd_policy, KOKKOS_LAMBDA( const int soa, const int ptcl ) {
      if (mask.access(soa, ptcl)) {
        positions.access(soa, ptcl, 0) = 1.0;
        positions.access(soa, ptcl, 1) = 2.0;
        positions.access(soa, ptcl, 2) = 3.0;
      }
    });
    count(aosoa);
    std::cout << "capacity " << aosoa.capacity() << " size " << aosoa.size() << " numSoA " << aosoa.numSoA() << std::endl;
  }

  // Finalize Kokkos
  Kokkos::finalize();

  return 0;
}

The output with 80 million particles:

Initializing AoSoA with 80000000 elements...
Number of active particles: 80000000

Modifying elements
AoSoA vector length: 32
Number of active particles: 72806984
capacity 80000000 size 80000000 numSoA 2500000

Tested with:

gcc/12.3.0
mpich/4.1.1
cmake/3.26.3
cuda/12.1.1
