
Conversation

@jayshah1819

Add optimized INT8 matrix-vector multiplication kernel

  • Added a new optimized kernel for INT8 matrix-vector multiplication with the following performance optimizations:

    • Vectorized memory loads for input vectors.
    • Each warp handles 4 rows at a time.
    • Unrolled dot product computation.
    • Warp-level reduction using shuffle instructions.
    • Supports batched matrices and vectors.
  • Kept the original kernel as a fallback.

  • Added helper functions for memory loads, warp reductions, and runtime FP32 → INT8 quantization.

  • Defined INT8 and INT4 quantization structures.
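For context, the runtime FP32 → INT8 quantization helper could look roughly like the host-side sketch below. This is illustrative only: the names (`Int8Quantized`, `quantize_fp32_to_int8`) are hypothetical, and it assumes symmetric per-tensor quantization with a single scale, which may differ from the scheme actually used in the PR.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical sketch of an INT8 quantization structure: quantized values
// plus the scale needed to dequantize (value ~= data[i] * scale).
struct Int8Quantized {
    std::vector<int8_t> data;
    float scale;
};

// Symmetric per-tensor quantization: map [-absmax, absmax] onto [-127, 127].
Int8Quantized quantize_fp32_to_int8(const std::vector<float>& src) {
    float absmax = 0.0f;
    for (float v : src) absmax = std::max(absmax, std::fabs(v));
    const float scale = absmax > 0.0f ? absmax / 127.0f : 1.0f;

    Int8Quantized out;
    out.scale = scale;
    out.data.reserve(src.size());
    for (float v : src) {
        // Round to nearest integer, then clamp to the INT8 symmetric range.
        long q = std::lround(v / scale);
        q = std::min(127L, std::max(-127L, q));
        out.data.push_back(static_cast<int8_t>(q));
    }
    return out;
}
```

On the GPU side, the same per-element round-and-clamp would typically run in a kernel so the FP32 weights never round-trip through the host; the struct above only shows the data layout the matvec kernel would consume.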

@jafioti
Collaborator

jafioti commented Sep 28, 2025

Does the new matvec kernel perform favorably compared to the existing one that was already there? What are the cases where we would need a fallback to the old one? Also, can you remove the binary files in the commit, and the tree.txt?

@jayshah1819
Author

Yes. I rewrote the matvec kernel in a separate .cu file with optimizations like vectorized loads and improved warp reductions. The new version should be roughly 1.5-2x faster than the original string-embedded kernel, though I haven't benchmarked it yet, so keeping both versions lets us compare them directly. The fallback is just for safety during testing; once the new kernel is benchmarked and stable, we can remove the original version (I forgot to mention that). I'll clean up the binary files and tree.txt from the commit.

@jafioti
Collaborator

jafioti commented Sep 30, 2025

awesome, yeah once we can verify the new one is faster, just remove the original one.

