Implement or extend LoongArch LASX/LSX (256-bit/128-bit SIMD) optimizations for mlkem (ML-KEM). Covers NTT, inverseNTT, nttMul/nttMulAcc, polyAdd/Sub, compress/decompress, CBD sampling, and rejUniform. Use when adding new LASX assembly functions, creating field_loong64.go dispatch stubs, translating AVX2 patterns to their LASX equivalents, or working with LoongArch vector register conventions (X0-X31 for LASX, V0-V31 for LSX).