Use the batched QR decomposition from cuSOLVER (this needs a wrapper written for the strided form; see cuBLAS's strided getrf for reference).
Note that there are many variants of this, but the vanilla (two-step, separated) implementation is the most useful for RBBSi.
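
As a rough illustration of what a strided wrapper might involve, the sketch below expands a strided batch (one base pointer plus a fixed element stride between consecutive matrices) into the per-matrix pointer array that pointer-array batched routines such as cuBLAS's `cublas<t>geqrfBatched` take. The helper name `strided_to_pointer_array` and its parameters are hypothetical, and whether the final wrapper should follow this approach is an assumption, not something settled in this issue.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of any library): build the array of
// per-matrix pointers for a batch stored contiguously with a fixed stride.
//   base        - pointer to the first matrix in the batch
//   stride      - distance between consecutive matrices, in elements
//   batch_count - number of matrices in the batch
template <typename T>
std::vector<T*> strided_to_pointer_array(T* base, std::ptrdiff_t stride,
                                         int batch_count) {
    assert(batch_count >= 0);
    std::vector<T*> ptrs(static_cast<std::size_t>(batch_count));
    for (int i = 0; i < batch_count; ++i) {
        // Only addresses are computed here; nothing is dereferenced,
        // so base may also be a device pointer.
        ptrs[static_cast<std::size_t>(i)] = base + i * stride;
    }
    return ptrs;
}
```

On the GPU, the resulting pointer array would still have to be copied to device memory (for example with `cudaMemcpy`) before being handed to a batched routine, and the same expansion would be needed for the `tau` output of the factorization step.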