Code to compute the staggered quark-field outer product needed for the fermion force has been added to QUDA. However, the code still requires substantial polishing. Before the next release, we need to fix the following:
The multi-GPU code uses buffers that are initialized in the dslash routine. Therefore, the outer-product code has to be called together with the dslash code. This is not a problem in production runs, but it makes the outer-product code hard to debug in isolation.
Need to write proper CPU test code for the staggered outer products.
At the moment, the exterior outer-product kernel is called twice: once for the one-hop outer product and once for the three-hop outer product. These two calls should be merged into a single kernel call.
Need to fix the tuning.
Need to integrate the outer-product interface into the fermion-force interface functions. There's no good reason why these should be separate.
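As a starting point for the CPU test code, here is a minimal sketch of what a host reference could look like. It is not QUDA code: the names `ColorVector`, `ColorMatrix`, `accumulateOuterProduct` and `referenceOuterProducts` are hypothetical, the lattice is reduced to a periodic one-dimensional chain of sites, and the convention used (the hopped field is x, the local field y is conjugated, hops go in the forward direction) is an assumption that would have to be matched to the GPU kernels. It also computes the one-hop and three-hop products in the same pass over the sites, which is the kind of merging proposed for the exterior kernel.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;
// One staggered quark field value per site: a 3-component color vector.
using ColorVector = std::array<Complex, 3>;
// Outer product of two color vectors: a 3x3 color matrix.
using ColorMatrix = std::array<std::array<Complex, 3>, 3>;

// Adds x * y^dagger to m, i.e. m[a][b] += x[a] * conj(y[b]).
inline void accumulateOuterProduct(ColorMatrix &m, const ColorVector &x, const ColorVector &y)
{
  for (int a = 0; a < 3; ++a)
    for (int b = 0; b < 3; ++b) m[a][b] += x[a] * std::conj(y[b]);
}

// Hypothetical CPU reference on a periodic 1-d chain (assumed layout, not QUDA's).
// Both the one-hop and the three-hop outer products are filled in a single loop.
void referenceOuterProducts(const std::vector<ColorVector> &x, const std::vector<ColorVector> &y,
                            std::vector<ColorMatrix> &oneHop, std::vector<ColorMatrix> &threeHop)
{
  assert(x.size() == y.size());
  const std::size_t n = x.size();
  oneHop.assign(n, ColorMatrix{});   // value-initialized, so all entries start at zero
  threeHop.assign(n, ColorMatrix{});
  for (std::size_t i = 0; i < n; ++i) {
    accumulateOuterProduct(oneHop[i], x[(i + 1) % n], y[i]);
    accumulateOuterProduct(threeHop[i], x[(i + 3) % n], y[i]);
  }
}
```

A real test would loop over all four directions of the full lattice, respect the even/odd site ordering, and compare each resulting matrix against the GPU output within a precision-dependent tolerance.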