Description
Hello, I am learning to use AoSoA as the data structure for particle analysis. When one of the member types is a bool, some data becomes corrupted once the number of particles exceeds a certain threshold.
In the following reproducible script, I create and initialize the AoSoA, then access it to change some particle information. The member types include a bool that serves as a mask. With around 80 million particles or more, some masks change from true to false. Changing the bool to an integer resolves the error. I did not observe the same issue on the CPU with the same input size, though I cannot guarantee it is safe with larger inputs.
Are there any restrictions on using Boolean types on the GPU? Or am I making an obvious mistake in the code? Any suggestions or thoughts are appreciated.
The code:
#include <Cabana_Core.hpp>
#include <Kokkos_Core.hpp>
#include <cstdlib>   // std::atoi
#include <iostream>

using DataTypes = Cabana::MemberTypes<double[3], // position
                                      double[3], // computed position
                                      int,       // id
                                      float,     // ellipse b
                                      float,     // angle
                                      bool>;     // mask
using MemorySpace = Kokkos::DefaultExecutionSpace::memory_space;
using AoSoA_t = Cabana::AoSoA<DataTypes, MemorySpace>;

// Count the particles whose mask is true and print the result.
void count(AoSoA_t& aosoa) {
    Kokkos::View<int*, MemorySpace> count("count", 1);
    auto mask = Cabana::slice<5>(aosoa, "mask");
    Kokkos::parallel_for("count_active", aosoa.size(), KOKKOS_LAMBDA(const int i) {
        if (mask(i)) {
            Kokkos::atomic_fetch_add(&count(0), 1);
        }
    });
    Kokkos::View<int*, Kokkos::HostSpace> active_count("active_count", 1);
    Kokkos::deep_copy(active_count, count);
    std::cout << "Number of active particles: " << active_count(0) << std::endl;
}

int main(int argc, char* argv[])
{
    // The particle count is read from the first command-line argument.
    if (argc < 2) {
        std::cerr << "Usage: " << argv[0] << " <number of particles>" << std::endl;
        return 1;
    }
    Kokkos::initialize(argc, argv);
    {
        const int size = std::atoi(argv[1]);
        AoSoA_t aosoa("ParticleData", size);
        auto positions = Cabana::slice<0>(aosoa, "position");
        auto computed = Cabana::slice<1>(aosoa, "computed");
        auto pid = Cabana::slice<2>(aosoa, "id");
        auto ellipse_b = Cabana::slice<3>(aosoa, "ellipse_b");
        auto angle = Cabana::slice<4>(aosoa, "angle");
        auto mask = Cabana::slice<5>(aosoa, "mask");
        std::cout << "Initializing AoSoA with " << size << " elements..." << std::endl;
        // Initialize elements using parallel_for
        Kokkos::parallel_for("initialize",
            Kokkos::RangePolicy<>(0, size),
            KOKKOS_LAMBDA(const int i) {
                positions(i, 0) = 1.0; // x
                positions(i, 1) = 2.0; // y
                positions(i, 2) = 3.0; // z
                computed(i, 0) = 1.0; // vx
                computed(i, 1) = 2.0; // vy
                computed(i, 2) = 3.0; // vz
                pid(i) = i; // id
                ellipse_b(i) = 0.5; // ellipse b
                angle(i) = 0.25; // angle
                mask(i) = true; // all active
            }
        );
        count(aosoa);
        std::cout << "\nModifying elements" << std::endl;
        const auto soa_len = AoSoA_t::vector_length;
        std::cout << "AoSoA vector length: " << soa_len << std::endl;
        // Only positions are written here; the mask is read but never modified.
        Cabana::SimdPolicy<soa_len,Kokkos::DefaultExecutionSpace> simd_policy(0, size);
        Cabana::simd_parallel_for(simd_policy, KOKKOS_LAMBDA( const int soa, const int ptcl ) {
            if (mask.access(soa, ptcl)) {
                positions.access(soa, ptcl, 0) = 1.0;
                positions.access(soa, ptcl, 1) = 2.0;
                positions.access(soa, ptcl, 2) = 3.0;
            }
        });
        count(aosoa);
        std::cout << "capacity " << aosoa.capacity() << " size " << aosoa.size() << " numSoA " << aosoa.numSoA() << std::endl;
    }
    // Finalize Kokkos
    Kokkos::finalize();
    return 0;
}
The output with 80 million particles:
Initializing AoSoA with 80000000 elements...
Number of active particles: 80000000
Modifying elements
AoSoA vector length: 32
Number of active particles: 72806984
capacity 80000000 size 80000000 numSoA 2500000
Tested with:
gcc/12.3.0
mpich/4.1.1
cmake/3.26.3
cuda/12.1.1