Open
Description
This issue was found in #133, where the optimized versions of the mortar kernels fail the 3D tests on the remote GPU CI. The likely cause is the placement of the synchronization point in the optimized kernels: the point at which all threads in the grid are synchronized appears to be wrong. This should be fixed as soon as possible.
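To make the suspected failure mode concrete, here is a small toy model in plain Python. It is not the project's kernel code, and it does not confirm that this is the actual cause. All names (`run_fused`, `run_split`, `tmp`, `out`) are hypothetical. The model assumes a two-phase computation in which phase 2 reads a value that a neighbouring thread wrote in phase 1. On a real GPU, a block-level barrier (`__syncthreads()` in CUDA C++, `sync_threads()` in CUDA.jl) only synchronizes threads within one block, so a read that crosses a block boundary can see stale data unless the phases are separated by a grid-wide synchronization, for example by splitting them into two kernel launches.

```python
# Toy model (assumption, not the real mortar kernels): a 1D "grid" of blocks
# runs two phases; phase 2 reads the right neighbour's phase 1 result.

def run_fused(n_threads, block_size):
    """Single fused kernel with only a block-local barrier between phases.

    Blocks run one after another here, which is one schedule a GPU may choose.
    """
    tmp = [0] * n_threads  # intermediate buffer in global memory
    out = [0] * n_threads
    for start in range(0, n_threads, block_size):
        block = range(start, min(start + block_size, n_threads))
        for i in block:  # phase 1
            tmp[i] = i + 1
        # Block-local barrier: only this block's phase 1 is guaranteed done.
        for i in block:  # phase 2 (wraps around at the end of the grid)
            out[i] = tmp[i] + tmp[(i + 1) % n_threads]
    return out


def run_split(n_threads, block_size):
    """Two launches: the launch boundary acts as the grid-wide synchronization."""
    tmp = [0] * n_threads
    out = [0] * n_threads
    blocks = [range(s, min(s + block_size, n_threads))
              for s in range(0, n_threads, block_size)]
    for block in blocks:  # launch 1: phase 1 for every block
        for i in block:
            tmp[i] = i + 1
    for block in blocks:  # launch 2: phase 2 sees all phase 1 results
        for i in block:
            out[i] = tmp[i] + tmp[(i + 1) % n_threads]
    return out


fused = run_fused(8, 4)
split = run_split(8, 4)
# Indices where the fused version read a value that was not yet written.
mismatches = [i for i in range(8) if fused[i] != split[i]]
print(mismatches)
```

In this model only the thread at the block boundary (index 3) gets a wrong result, which fits a bug that shows up only in some test cases. Whether the 3D mortar kernels have the same kind of cross-block dependency still needs to be checked against the actual code.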