I found that the implementation of 'xa_nn_matXvec_f32xf32_f32' uses 'xtfloatx2'. How about changing it to use 'ae_f32x2' instead? Which one is faster?