Description
Describe the bug
With 2-qubit gates, the buffer passed to the Loop_DN function may not be properly aligned, which triggers an assertion failure.
To Reproduce
Steps to reproduce the behavior:
- Build the project with MPI
- Run the example with 2 processes:
mpirun -np 2 /opt/intel-qs/examples/bin/grover_4qubit.exe
- We get assertion failures:
grover_4qubit.exe: /root/intel-qs/src/highperfkernels.cpp:299: void Loop_DN(unsigned long, unsigned long, unsigned long, Type *, Type *, unsigned long, unsigned long, const qhipster::TinyMatrix<Type, 2U, 2U, 32U> &, bool, Timer *) [with Type = std::complex<double>]: Assertion `(UL(state1) % 256) == 0' failed.
grover_4qubit.exe: /root/intel-qs/src/highperfkernels.cpp:298: void Loop_DN(unsigned long, unsigned long, unsigned long, Type *, Type *, unsigned long, unsigned long, const qhipster::TinyMatrix<Type, 2U, 2U, 32U> &, bool, Timer *) [with Type = std::complex<double>]: Assertion `(UL(state0) % 256) == 0' failed.
Additional context
Another example also has this behavior:
mpirun -np 2 /opt/intel-qs/examples/bin/test_of_custom_gates.exe 4
It seems that single-qubit gates are fine and only two-qubit gates have this problem. In particular, the problem appeared in psig.ApplyCPhaseRotation() in the grover_4qubit example. I did some debugging and found that the pointer pointed to offset 0x80. I'm not sure whether this is a real bug or whether I'm just running it incorrectly.
When I run with 4 processes, the pointer points to offset 0x40. When I run with 8 processes, the problem disappears again.