I have done some work on quantizing RoMa and deploying it on mobile platforms such as NVIDIA Jetson, but I found that the matrix inversion is the bottleneck. To keep precision, the inversion has to be computed in high precision (fp16 or even fp32), which is extremely slow on a mobile platform.
Have you ever tried other methods, such as direct regression, to obtain gp_posterior?
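For context, the GP posterior mean has the form K_* (K + σ²I)^{-1} y, and one common way to avoid forming an explicit inverse is to replace it with a Cholesky solve, which is typically faster and numerically more forgiving at reduced precision. A minimal sketch of this substitution (the function name and signature are hypothetical, not RoMa's actual API):

```python
import torch

def gp_posterior_mean(K_train, K_cross, y, noise=1e-4):
    """GP posterior mean without an explicit matrix inverse.

    Solves (K_train + noise * I) @ alpha = y via a Cholesky
    factorization instead of computing an inverse, then returns
    K_cross @ alpha. Avoiding the explicit inverse is both cheaper
    and more numerically stable, which matters at fp16.
    """
    n = K_train.shape[-1]
    eye = torch.eye(n, dtype=K_train.dtype, device=K_train.device)
    A = K_train + noise * eye
    L = torch.linalg.cholesky(A)        # A = L @ L.T (lower triangular)
    alpha = torch.cholesky_solve(y, L)  # alpha = A^{-1} @ y, inverse never formed
    return K_cross @ alpha
```

Whether the triangular solves quantize well enough on Jetson-class hardware is a separate question, but they at least avoid the worst conditioning issues of an explicit inverse.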