ihb2032 commented Sep 25, 2025

This PR introduces RISC-V Vector intrinsic optimizations for the following element-wise matrix operations:

  • MatrixAdd
  • MatrixSub
  • MatrixMax
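For readers unfamiliar with the pattern, here is a minimal sketch of how an element-wise add can be strip-mined with RVV intrinsics. It is not the code from this PR: the function name `matrix_add_sketch` is hypothetical, the parameter list is inferred from the test log (strides counted in floats, each row holding `widthC4 * 4` contiguous floats), and the choice of LMUL=8 and the feature-macro guard are assumptions. A scalar fallback is included so the sketch also builds on targets without RVV.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the __riscv_-prefixed intrinsics are available from v0.11 on. */
#if defined(__riscv_v_intrinsic) && (__riscv_v_intrinsic >= 11000)
#include <riscv_vector.h>
#define SKETCH_HAVE_RVV 1
#else
#define SKETCH_HAVE_RVV 0
#endif

/* Hypothetical name. Strides are in floats; each row has widthC4 * 4 floats. */
void matrix_add_sketch(float *c, const float *a, const float *b,
                       size_t widthC4, size_t cStride, size_t aStride,
                       size_t bStride, size_t height) {
    assert(c != NULL && a != NULL && b != NULL);
    const size_t rowLen = widthC4 * 4;
    for (size_t y = 0; y < height; ++y) {
        const float *pa = a + y * aStride;
        const float *pb = b + y * bStride;
        float *pc = c + y * cStride;
        size_t n = rowLen;
#if SKETCH_HAVE_RVV
        /* Strip-mine the row: vsetvl returns how many elements this pass covers,
         * so no separate tail loop is needed. */
        while (n > 0) {
            size_t vl = __riscv_vsetvl_e32m8(n);
            vfloat32m8_t va = __riscv_vle32_v_f32m8(pa, vl);
            vfloat32m8_t vb = __riscv_vle32_v_f32m8(pb, vl);
            /* For Sub or Max, swap in __riscv_vfsub_vv_f32m8 or
             * __riscv_vfmax_vv_f32m8 here. */
            vfloat32m8_t vc = __riscv_vfadd_vv_f32m8(va, vb, vl);
            __riscv_vse32_v_f32m8(pc, vc, vl);
            pa += vl;
            pb += vl;
            pc += vl;
            n -= vl;
        }
#else
        /* Scalar fallback so the sketch builds on non-RVV targets. */
        for (size_t i = 0; i < n; ++i) {
            pc[i] = pa[i] + pb[i];
        }
#endif
    }
}
```

Because `vsetvl` clamps the vector length to the remaining element count, the same loop handles rows whose length is not a multiple of the hardware vector length.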

Performance Gains

The RVV implementation shows substantial speedups compared to the scalar version. Here are some highlights from the test results:

  • MatrixAdd: Achieved up to 6.36x speedup.
  • MatrixSub: Achieved up to 6.21x speedup.
  • MatrixMax: Achieved up to 13.48x speedup.

Testing Environment

The tests were conducted using the same environment as in PR #3779.

# test_matrix_sub
widthC4=1024, cStride=4096, aStride=4096, bStride=4096, height=1024
Scalar time : 0.0966 sec
RVV time : 0.0156 sec, Speedup: 6.21x
test widthC4=1024, cStride=4096, aStride=4096, bStride=4096, height=1024: PASSED

# test_matrix_max
widthC4=1024, cStride=4096, aStride=4096, bStride=4096, height=1024
Scalar time : 0.2053 sec
RVV time : 0.0152 sec, Speedup: 13.48x
test widthC4=1024, cStride=4096, aStride=4096, bStride=4096, height=1024: PASSED

# test_matrix_add
widthC4=1024, cStride=4096, aStride=4096, bStride=4096, height=1024
Scalar time : 0.0970 sec
RVV time : 0.0153 sec, Speedup: 6.36x
test widthC4=1024, cStride=4096, aStride=4096, bStride=4096, height=1024: PASSED
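Each log entry above reports a scalar time, an RVV time, their ratio, and a pass/fail check. The PR does not show how its test measures these, so the following is only a sketch of helpers that could produce output of that shape. All names (`matrix_op_fn`, `scalar_sub_ref`, `time_op_sec`, `outputs_match`, `speedup`) are hypothetical, and timing with `clock()` is an assumption.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical common signature for the kernels being compared. */
typedef void (*matrix_op_fn)(float *c, const float *a, const float *b,
                             size_t widthC4, size_t cStride, size_t aStride,
                             size_t bStride, size_t height);

/* Plain scalar subtraction used as the reference side of the comparison. */
void scalar_sub_ref(float *c, const float *a, const float *b,
                    size_t widthC4, size_t cStride, size_t aStride,
                    size_t bStride, size_t height) {
    for (size_t y = 0; y < height; ++y) {
        for (size_t i = 0; i < widthC4 * 4; ++i) {
            c[y * cStride + i] = a[y * aStride + i] - b[y * bStride + i];
        }
    }
}

/* CPU time in seconds for `reps` back-to-back calls of `fn`. */
double time_op_sec(matrix_op_fn fn, float *c, const float *a, const float *b,
                   size_t widthC4, size_t cStride, size_t aStride,
                   size_t bStride, size_t height, int reps) {
    clock_t t0 = clock();
    for (int r = 0; r < reps; ++r) {
        fn(c, a, b, widthC4, cStride, aStride, bStride, height);
    }
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

/* Returns 1 when every element pair differs by at most `tol`, else 0. */
int outputs_match(const float *x, const float *y, size_t n, float tol) {
    for (size_t i = 0; i < n; ++i) {
        float d = x[i] - y[i];
        if (d < 0.0f) d = -d;
        if (d > tol) return 0;
    }
    return 1;
}

/* Speedup as reported in the log: scalar time divided by candidate time. */
double speedup(double scalar_sec, double candidate_sec) {
    return candidate_sec > 0.0 ? scalar_sec / candidate_sec : 0.0;
}
```

A test would run the scalar and vectorized kernels on the same inputs, compare the two output buffers with `outputs_match`, and print `speedup(scalar_sec, rvv_sec)` alongside the two timings.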

…tions with RVV intrinsics

This commit replaces the scalar implementations of element-wise matrix addition, subtraction, and maximum functions with versions optimized using RISC-V Vector (RVV) intrinsics.
These changes significantly accelerate computation on supported hardware, with performance tests showing speedups of up to 13.48x for MatrixMax and over 6x for MatrixAdd and MatrixSub on large matrices.

Signed-off-by: lyd1992 <[email protected]>
Signed-off-by: ihb2032 <[email protected]>