
RVV1.0 Packed INT8 Convolution #6763

Open

Deepdive543443 wants to merge 5 commits into Tencent:master from Deepdive543443:int8-conv-packed/PR

Deepdive543443 (Contributor) commented May 31, 2026

Changes:

  • Packed fp16 dequantize and packed requantize
  • Packed int8 convolution optimization, following the same two-pixel unrolling approach as the ARM implementation
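As a rough illustration of the two-pixel unrolling idea, here is a scalar C++ model (not the PR's RVV code): each weight loaded in the inner loop feeds two adjacent output pixels, so every weight load is amortized over two accumulators, with a tail loop for an odd pixel count. The function name, the 1x1-kernel simplification and the plain [channel][pixel] layouts are assumptions for illustration; in the actual packed kernel the weight would be a vector of output channels rather than a scalar.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of two-pixel unrolling for an int8 1x1 convolution.
// Assumed layouts: in is [inch][npix] int8, w is [outch][inch] int8,
// out is [outch][npix] int32. The real RVV kernel works on packed vectors instead.
void conv1x1_int8_2pix(const std::vector<int8_t>& in, const std::vector<int8_t>& w,
                       std::vector<int32_t>& out, int inch, int outch, int npix)
{
    assert(in.size() == (size_t)inch * npix && w.size() == (size_t)outch * inch);
    out.assign((size_t)outch * npix, 0);

    for (int p = 0; p < outch; p++)
    {
        const int8_t* kptr = w.data() + (size_t)p * inch;

        int j = 0;
        for (; j + 1 < npix; j += 2) // main loop: two output pixels per iteration
        {
            int32_t sum0 = 0;
            int32_t sum1 = 0;
            for (int q = 0; q < inch; q++)
            {
                const int32_t k = kptr[q];                // load the weight once...
                sum0 += k * in[(size_t)q * npix + j];     // ...use it for pixel j
                sum1 += k * in[(size_t)q * npix + j + 1]; // ...and for pixel j + 1
            }
            out[(size_t)p * npix + j] = sum0;
            out[(size_t)p * npix + j + 1] = sum1;
        }
        for (; j < npix; j++) // tail: at most one leftover pixel
        {
            int32_t sum = 0;
            for (int q = 0; q < inch; q++)
                sum += (int32_t)kptr[q] * in[(size_t)q * npix + j];
            out[(size_t)p * npix + j] = sum;
        }
    }
}
```

On real hardware the benefit comes from halving weight loads per multiply-accumulate; the result is identical to an unrolled-by-one loop.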

Sidenotes:

  • I also tried a (vlm8 x vlm4) transformed-kernel packing to fit more data into the vector registers, but it resulted in a performance regression, so I switched back to ARM's approach. The regression appears to be cache-related, but I'm not sure yet.
  • The RISC-V Shuffle Channels layer currently seems to produce incorrect results: I had to remove the RISC-V Shuffle Channels layer to get correct output from one of the networks I'm using. I will open a separate issue/PR for that later.

Results (tested on OrangePi RV2):

  • Before patch
loop_count = 4
num_threads = 8
powersave = 2
gpu_device = -1
cooling_down = 1
     squeezenet_int8  min =  849.99  max =  922.56  avg =  888.55
      mobilenet_int8  min = 1169.46  max = 1178.57  avg = 1173.94
      googlenet_int8  min = 2248.73  max = 2378.06  avg = 2284.60
       resnet18_int8  min = 2323.13  max = 2401.37  avg = 2354.69
          vgg16_int8  min = 33983.54  max = 34998.53  avg = 34539.23
       resnet50_int8  min = 6729.28  max = 6935.57  avg = 6865.15
 squeezenet_ssd_int8  min = 1412.29  max = 1479.31  avg = 1438.86
  mobilenet_ssd_int8  min = 3501.22  max = 3717.62  avg = 3609.66
  • Patched
loop_count = 4
num_threads = 8
powersave = 2
gpu_device = -1
cooling_down = 1
     squeezenet_int8  min =   62.53  max =   62.88  avg =   62.68
      mobilenet_int8  min =  133.92  max =  135.37  avg =  134.51
      googlenet_int8  min =  201.48  max =  206.20  avg =  203.66
       resnet18_int8  min =  188.46  max =  192.25  avg =  190.28
          vgg16_int8  min = 1200.24  max = 1268.10  avg = 1226.08
       resnet50_int8  min =  324.67  max =  328.74  avg =  327.02
 squeezenet_ssd_int8  min =  287.93  max =  412.64  avg =  319.39
  mobilenet_ssd_int8  min =  260.59  max =  263.09  avg =  261.67
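The per-model speedup is the ratio of the two avg columns. A small C++ helper that recomputes it from the numbers above (the struct and function names are only for illustration):

```cpp
#include <cassert>
#include <cstring>

// avg latencies (ms) copied from the "Before patch" and "Patched" runs above
struct BenchRow
{
    const char* model;
    double before_avg_ms;
    double after_avg_ms;
};

static const BenchRow kRows[] = {
    {"squeezenet_int8", 888.55, 62.68},
    {"mobilenet_int8", 1173.94, 134.51},
    {"googlenet_int8", 2284.60, 203.66},
    {"resnet18_int8", 2354.69, 190.28},
    {"vgg16_int8", 34539.23, 1226.08},
    {"resnet50_int8", 6865.15, 327.02},
    {"squeezenet_ssd_int8", 1438.86, 319.39},
    {"mobilenet_ssd_int8", 3609.66, 261.67},
};

// Speedup = average latency before the patch / average latency after it.
// Returns -1.0 for a model that is not in the table.
double speedup_for(const char* model)
{
    for (const BenchRow& r : kRows)
    {
        if (std::strcmp(r.model, model) == 0)
        {
            assert(r.after_avg_ms > 0.0);
            return r.before_avg_ms / r.after_avg_ms;
        }
    }
    return -1.0;
}
```

By this ratio the patch is roughly 14x faster on squeezenet_int8 and about 28x on vgg16_int8, while squeezenet_ssd_int8 gains about 4.5x and also shows the widest min/max spread in the patched run.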


chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 617639afab


 const int i = ii * wp;
 const int* intptr = (const int*)bottom_blob + i * elempack;
-__fp16* ptr = (__fp16*)top_blob + i * elempack;
+__fp16* ptr = (__fp16*)top_blob + i * out_elempack;

P2: Use the source pack offset for 1D dequantize chunks

When a 1D int32 blob is already packed and RVV changes the output packing, this chunk offset is no longer the flat element offset being dequantized. For example, with elempack == packn and out_elempack == packn_f16, the chunk starting at input element i should write at flat fp16 offset i * elempack, but this writes at i * out_elempack, leaving holes and eventually writing past the end of top_blob; with out_elempack == 1 it overlaps earlier output. The same flat offset used before this change (i * elempack) is still needed because size is counted in unpacked scalar elements.
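The offset arithmetic behind this comment can be checked with a small scalar model. The sketch assumes each chunk covers wp packs starting at pack index i = ii * wp (the loop around the quoted lines is not shown, so this shape is an assumption) and compares the flat range written with the i * elempack offset against the i * out_elempack one. The Range struct and chunk_output_range helper are hypothetical.

```cpp
#include <cassert>

// Hypothetical model of the chunked 1D dequantize loop from the snippet above.
// Assumption: chunk ii covers wp packs starting at pack index i = ii * wp and
// produces wp * elempack scalars. For a 1D blob, scalar k sits at flat offset k
// whatever the elempack, so input and output share the same flat offset.
struct Range
{
    int begin; // first flat scalar offset written
    int end;   // one past the last flat scalar offset written
};

// Flat output range written by chunk ii when the output pointer is set to
// top_blob + i * stride (stride = elempack is correct, out_elempack is the bug).
Range chunk_output_range(int ii, int wp, int elempack, int stride)
{
    assert(wp > 0 && elempack > 0 && stride > 0);
    const int i = ii * wp;
    const int begin = i * stride;
    return Range{begin, begin + wp * elempack};
}
```

With elempack = 4 and out_elempack = 8 (for example, int32 and fp16 lanes at a 128-bit VLEN) and two packs per chunk, chunk 1 lands at [16, 24) instead of [8, 16), and chunk 3 runs past a 32-element output; with out_elempack = 1, chunk 1 starts at 2 and overlaps chunk 0.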



codecov-commenter commented Jun 1, 2026

Codecov Report

❌ Patch coverage is 93.95405% with 50 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.77%. Comparing base (882f319) to head (6a3b5df).

File                                     Patch %   Missing lines
src/layer/riscv/requantize_riscv.cpp     77.82%    49 ⚠️
src/layer/riscv/convolution_riscv.cpp    98.43%    1 ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           master    #6763    +/-   ##
========================================
  Coverage   95.77%   95.77%            
========================================
  Files         946      947     +1     
  Lines      410747   411192   +445     
========================================
+ Hits       393380   393813   +433     
- Misses      17367    17379    +12     



Copilot AI left a comment


Pull request overview

This PR adds RVV int8 convolution support and packed dequantize/requantize paths for RISC-V, improving packed int8 inference performance in the RISC-V backend.

Changes:

  • Adds RVV packed int8 convolution kernel packing and execution paths.
  • Adds packed RVV dequantize-to-fp16 and requantize-to-int8 handling.
  • Adjusts requantize tests for RISC-V packing behavior.
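For cross-checking the packed requantize path, a plain scalar reference can help. The sketch below is a model, not ncnn's implementation: it assumes per-channel scale_in, scale_out and bias arrays, the ordering "multiply by scale_in, add bias, then multiply by scale_out", a round-and-saturate to [-127, 127], and it omits the activation step. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Round to nearest and saturate to [-127, 127] (assumed to mirror ncnn's scalar float2int8).
static int8_t float2int8_ref(float v)
{
    const int x = (int)std::round(v);
    return (int8_t)std::min(127, std::max(-127, x));
}

// Hypothetical scalar reference for per-channel requantize on a packed blob.
// Assumed layout: [channels / elempack][size][elempack], so lane k of pack group q
// belongs to channel q * elempack + k. Activation is omitted for brevity.
std::vector<int8_t> requantize_packed_ref(const std::vector<int32_t>& in, int channels, int size,
                                          int elempack, const std::vector<float>& scale_in,
                                          const std::vector<float>& scale_out,
                                          const std::vector<float>& bias)
{
    assert(channels % elempack == 0);
    assert(in.size() == (size_t)channels * size);
    std::vector<int8_t> out(in.size());
    for (int q = 0; q < channels / elempack; q++)
    {
        for (int i = 0; i < size; i++)
        {
            for (int k = 0; k < elempack; k++)
            {
                const int c = q * elempack + k;                           // channel of this lane
                const size_t idx = ((size_t)q * size + i) * elempack + k; // packed position
                const float v = in[idx] * scale_in[c] + bias[c];          // dequantize
                out[idx] = float2int8_ref(v * scale_out[c]);              // requantize
            }
        }
    }
    return out;
}
```

Because the reference indexes channels through the packed layout, the same inputs can be fed to it at elempack 1 and at the RVV pack size to compare against the vectorized output.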

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Summary per file:

File                                        Description
tests/test_requantize.cpp                   Updates RISC-V-specific requantize test coverage.
src/layer/riscv/requantize_riscv.cpp        Adds packed RVV requantization paths and output repacking.
src/layer/riscv/dequantize_riscv_zfh.cpp    Adds packed RVV int32-to-fp16 dequantization paths.
src/layer/riscv/convolution_riscv.h         Declares RVV int8 convolution pipeline and state.
src/layer/riscv/convolution_riscv.cpp       Wires int8 convolution to the RVV pipeline.
src/layer/riscv/convolution_packed_int8.h   Adds RVV packed int8 convolution kernel transform and execution.


Comment thread: src/layer/riscv/dequantize_riscv_zfh.cpp (outdated)
Comment thread: src/layer/riscv/convolution_riscv.cpp (outdated)
Comment thread: src/layer/riscv/requantize_riscv.cpp
Comment thread: src/layer/riscv/requantize_riscv.cpp
Deepdive543443 and others added 4 commits June 1, 2026 13:11
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
3 participants