Description
- It seems that LAPIS expects the default parallelization strategy; using an alternative option causes a lowering error.
Steps to reproduce:
gh gist clone https://gist.github.com/vmiheer/06c23a25e37e69f3de05c9d031e1512f lapis-bspmm
cmake --workflow --preset=default-kokkos
- An orthogonal question: the default value of the option appears to be "always parallel," which is the opposite of the default chosen by mlir-opt, "always serial." Always-serial is conservative but always produces correct code. The other options appear to act as directives rather than suggestions, so the generated code makes every loop parallel, even when some iterators are reduction iterators, which produces incorrect output due to data races (a minimal sketch of this failure mode follows the repro steps below). Since the compiler should be correct before it is performant, perhaps the default parallelization strategy should be serial, and either (a) the user or (b) a heuristic in the compiler could selectively parallelize some loops.
To reproduce, download and extract the zip:
OMP_NUM_THREADS=1 ./localbuild/dump_partitions.part_tensor -nh 2 -dh 4 -i banded_11_r2.coordinates.bin --ntimes 0 --local-only # avoid data race
# In another shell, with a Python environment that has dgl installed:
./a.py # The return code will be 0
for i in `seq 1 3`; do
# run 3 times so that the data race is very likely to show up
./localbuild/dump_partitions.part_tensor -nh 2 -dh 4 -i banded_11_r2.coordinates.bin --ntimes 0 --local-only # let omp do its thing
./a.py banded_11_r2.coordinates.bin 11 4 2 # The script dumps all tensors and the difference between expected and observed values
done
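
To make the failure mode concrete, here is a minimal, self-contained Kokkos sketch of what goes wrong when a reduction iterator is treated as a plain parallel iterator. This is illustrative only, with hypothetical names (racy_reduction.cpp, dot_racy, dot_ok); it is not LAPIS-generated code, and it assumes a Kokkos installation with a multi-threaded backend such as OpenMP.

// racy_reduction.cpp: illustrative only; build against Kokkos (e.g. the OpenMP
// backend) and run with several threads to see the race.
#include <Kokkos_Core.hpp>
#include <cstdio>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    const int n = 1 << 20;
    Kokkos::View<double*> a("a", n), x("x", n);
    Kokkos::deep_copy(a, 1.0);
    Kokkos::deep_copy(x, 1.0);

    // Wrong: the loop is a reduction, but it is emitted as a plain parallel
    // loop; the += on the shared accumulator is an unsynchronized
    // read-modify-write, i.e. a data race.
    Kokkos::View<double> racy("racy");
    Kokkos::parallel_for("dot_racy", n, KOKKOS_LAMBDA(const int i) {
      racy() += a(i) * x(i);
    });

    // Right: the reduction is expressed explicitly (keeping the loop serial
    // would also be correct, just slower).
    double ok = 0.0;
    Kokkos::parallel_reduce("dot_ok", n, KOKKOS_LAMBDA(const int i, double& lsum) {
      lsum += a(i) * x(i);
    }, ok);

    double racy_host = 0.0;
    Kokkos::deep_copy(racy_host, racy);
    // With more than one thread, racy_host typically comes out well below n.
    std::printf("racy = %.0f, reduce = %.0f, expected = %d\n", racy_host, ok, n);
  }
  Kokkos::finalize();
  return 0;
}

Running this with OMP_NUM_THREADS=1 makes the racy version agree with the correct one, which mirrors the --local-only single-threaded run in the repro above.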