RVV1.0 Packed INT8 Convolution #6763
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 617639afab
      const int i = ii * wp;
      const int* intptr = (const int*)bottom_blob + i * elempack;
    - __fp16* ptr = (__fp16*)top_blob + i * elempack;
    + __fp16* ptr = (__fp16*)top_blob + i * out_elempack;
Use the source pack offset for 1D dequantize chunks
When a 1D int32 blob is already packed and RVV changes the output packing, this chunk offset is no longer the flat element offset being dequantized. For example, with elempack == packn and out_elempack == packn_f16, the chunk starting at input element i should write at flat fp16 offset i * elempack, but this writes at i * out_elempack, leaving holes and eventually writing past the end of top_blob; with out_elempack == 1 it overlaps earlier output. The same flat offset used before this change (i * elempack) is still needed because size is counted in unpacked scalar elements.
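The offset argument above can be checked with a small simulation. This is a minimal sketch, not code from the PR: `simulate_chunk_writes` and `WriteCoverage` are hypothetical names, and the chunking (`wp` packed elements per chunk, `size` counted in scalars) is assumed from the quoted diff and the comment. It counts which flat output slots are never written, written more than once, or written past the end when each chunk is stored at `i * dst_pack`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical bookkeeping for the simulation below.
struct WriteCoverage
{
    int holes;         // flat output slots that are never written
    int overlaps;      // extra writes that land on an already written slot
    int out_of_bounds; // writes that would land past the end of the output
};

// Simulates the 1D chunk loop: w packed elements of elempack scalars each,
// processed in chunks of wp packed elements. Each chunk writes `size` scalars
// starting at flat offset i * dst_pack (assumed shape of the loop, see lead-in).
static WriteCoverage simulate_chunk_writes(int w, int elempack, int wp, int dst_pack)
{
    assert(w > 0 && elempack > 0 && wp > 0 && dst_pack > 0);

    const int total = w * elempack; // scalar count is the same before and after repacking
    std::vector<int> hits(total, 0);
    WriteCoverage c = {0, 0, 0};

    for (int i = 0; i < w; i += wp)
    {
        const int size = std::min(w - i, wp) * elempack; // counted in scalars
        const int dst = i * dst_pack;                    // flat output offset under test
        for (int k = 0; k < size; k++)
        {
            if (dst + k >= total)
                c.out_of_bounds++;
            else
                hits[dst + k]++;
        }
    }

    for (int h : hits)
    {
        if (h == 0) c.holes++;
        if (h > 1) c.overlaps += h - 1;
    }
    return c;
}
```

With `w = 8`, `elempack = 4`, `wp = 2`, passing `dst_pack = 4` (the source pack) covers every slot exactly once. Passing `dst_pack = 8` leaves holes and writes out of bounds, and `dst_pack = 1` overlaps earlier output, which matches the failure modes described in the comment.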
Codecov Report
Additional details and impacted files

Coverage Diff

| | master | #6763 | +/- |
|---|---|---|---|
| Coverage | 95.77% | 95.77% | |
| Files | 946 | 947 | +1 |
| Lines | 410747 | 411192 | +445 |
| Hits | 393380 | 393813 | +433 |
| Misses | 17367 | 17379 | +12 |
Pull request overview
This PR adds RVV int8 convolution support and packed dequantize/requantize paths for RISC-V, improving packed int8 inference performance in the RISC-V backend.
Changes:
- Adds RVV packed int8 convolution kernel packing and execution paths.
- Adds packed RVV dequantize-to-fp16 and requantize-to-int8 handling.
- Adjusts requantize tests for RISC-V packing behavior.
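For orientation, the per-element arithmetic behind the dequantize and requantize paths can be sketched in scalar form. This is a hedged reference sketch, not the PR's RVV kernels: the function names are hypothetical, activation handling is omitted, and saturating to the symmetric range [-127, 127] is an assumption about the int8 convention in use.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical scalar helper: clamp to the assumed symmetric int8 range,
// then round to nearest (halves round away from zero with std::lround).
static inline int8_t float_to_int8_sat(float v)
{
    const float clamped = std::min(127.f, std::max(-127.f, v));
    return (int8_t)std::lround(clamped);
}

// Dequantize one int32 accumulator to float: acc * scale + bias.
static inline float dequantize_one(int32_t acc, float scale, float bias)
{
    return (float)acc * scale + bias;
}

// Requantize one int32 accumulator to int8: dequantize with the input scale,
// then rescale with the output scale and saturate.
static inline int8_t requantize_one(int32_t acc, float scale_in, float bias, float scale_out)
{
    return float_to_int8_sat(dequantize_one(acc, scale_in, bias) * scale_out);
}
```

A packed vector path applies the same arithmetic to a full register of lanes at a time; the scalar form is mainly useful as a comparison baseline in tests.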
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/test_requantize.cpp | Updates RISC-V-specific requantize test coverage. |
| src/layer/riscv/requantize_riscv.cpp | Adds packed RVV requantization paths and output repacking. |
| src/layer/riscv/dequantize_riscv_zfh.cpp | Adds packed RVV int32-to-fp16 dequantization paths. |
| src/layer/riscv/convolution_riscv.h | Declares RVV int8 convolution pipeline and state. |
| src/layer/riscv/convolution_riscv.cpp | Wires int8 convolution to the RVV pipeline. |
| src/layer/riscv/convolution_packed_int8.h | Adds RVV packed int8 convolution kernel transform and execution. |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Changes:
Sidenotes:
Result (Tested on OrangePi RV2):
    loop_count = 4
    num_threads = 8
    powersave = 2
    gpu_device = -1
    cooling_down = 1
    squeezenet_int8  min = 849.99  max = 922.56  avg = 888.55
    mobilenet_int8  min = 1169.46  max = 1178.57  avg = 1173.94
    googlenet_int8  min = 2248.73  max = 2378.06  avg = 2284.60
    resnet18_int8  min = 2323.13  max = 2401.37  avg = 2354.69
    vgg16_int8  min = 33983.54  max = 34998.53  avg = 34539.23
    resnet50_int8  min = 6729.28  max = 6935.57  avg = 6865.15
    squeezenet_ssd_int8  min = 1412.29  max = 1479.31  avg = 1438.86
    mobilenet_ssd_int8  min = 3501.22  max = 3717.62  avg = 3609.66

    loop_count = 4
    num_threads = 8
    powersave = 2
    gpu_device = -1
    cooling_down = 1
    squeezenet_int8  min = 62.53  max = 62.88  avg = 62.68
    mobilenet_int8  min = 133.92  max = 135.37  avg = 134.51
    googlenet_int8  min = 201.48  max = 206.20  avg = 203.66
    resnet18_int8  min = 188.46  max = 192.25  avg = 190.28
    vgg16_int8  min = 1200.24  max = 1268.10  avg = 1226.08
    resnet50_int8  min = 324.67  max = 328.74  avg = 327.02
    squeezenet_ssd_int8  min = 287.93  max = 412.64  avg = 319.39
    mobilenet_ssd_int8  min = 260.59  max = 263.09  avg = 261.67