RVV1.0 Packed INT8 Convolution by Deepdive543443 · Pull Request #6763 · Tencent/ncnn

Deepdive543443 · 2026-05-31T16:10:16Z

Changes:

Packed fp16 dequantize and packed requantize
Packed int8 convolution optimization, following the same double-pixel unrolling approach as the ARM backend
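
To illustrate what "double pixel unrolling" means here, the following is a minimal scalar sketch, not the RVV kernel from this PR. It assumes a plain int8 dot product per output pixel; the function name `dot_int8_unroll2` and the row-major layout are hypothetical choices for illustration only. The point is that each weight is loaded once and reused for two output pixels.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the PR): int8 dot products for `pixels`
// output positions. `input` holds pixels * k values in row-major order,
// `kernel` holds k weights, and the result holds one int32 sum per pixel.
std::vector<int32_t> dot_int8_unroll2(const std::vector<int8_t>& input,
                                      const std::vector<int8_t>& kernel,
                                      int pixels)
{
    const int k = (int)kernel.size();
    assert((int)input.size() == pixels * k);

    std::vector<int32_t> out(pixels, 0);

    int p = 0;
    // Main loop: two output pixels per iteration share every weight load.
    for (; p + 1 < pixels; p += 2)
    {
        const int8_t* a0 = input.data() + p * k;
        const int8_t* a1 = a0 + k;
        int32_t sum0 = 0;
        int32_t sum1 = 0;
        for (int i = 0; i < k; i++)
        {
            const int32_t w = kernel[i]; // loaded once, used twice
            sum0 += a0[i] * w;
            sum1 += a1[i] * w;
        }
        out[p] = sum0;
        out[p + 1] = sum1;
    }

    // Tail: a single leftover pixel when `pixels` is odd.
    for (; p < pixels; p++)
    {
        const int8_t* a0 = input.data() + p * k;
        int32_t sum0 = 0;
        for (int i = 0; i < k; i++)
        {
            sum0 += a0[i] * (int32_t)kernel[i];
        }
        out[p] = sum0;
    }

    return out;
}
```

In a vectorized kernel the same idea would apply to vector registers holding packed weights, which is where the register-pressure trade-off mentioned in the sidenotes comes from.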

Sidenotes:

A (vlm8 x vlm4) transformed kernel packing was also attempted to fit more data into the vector registers, but it ended up degrading performance, so I switched back to ARM's approach. The slowdown seems to be cache-related, but I'm unsure at this point.
The RISC-V Shuffle Channels layer seems to produce incorrect results at this point. I had to remove the RISC-V Shuffle layer to get correct results from one of the networks I'm using. I will follow up with an issue/PR for that later.

Result (Tested on OrangePi RV2):

Before patch

loop_count = 4
num_threads = 8
powersave = 2
gpu_device = -1
cooling_down = 1
     squeezenet_int8  min =  849.99  max =  922.56  avg =  888.55
      mobilenet_int8  min = 1169.46  max = 1178.57  avg = 1173.94
      googlenet_int8  min = 2248.73  max = 2378.06  avg = 2284.60
       resnet18_int8  min = 2323.13  max = 2401.37  avg = 2354.69
          vgg16_int8  min = 33983.54  max = 34998.53  avg = 34539.23
       resnet50_int8  min = 6729.28  max = 6935.57  avg = 6865.15
 squeezenet_ssd_int8  min = 1412.29  max = 1479.31  avg = 1438.86
  mobilenet_ssd_int8  min = 3501.22  max = 3717.62  avg = 3609.66

Patched

loop_count = 4
num_threads = 8
powersave = 2
gpu_device = -1
cooling_down = 1
     squeezenet_int8  min =   62.53  max =   62.88  avg =   62.68
      mobilenet_int8  min =  133.92  max =  135.37  avg =  134.51
      googlenet_int8  min =  201.48  max =  206.20  avg =  203.66
       resnet18_int8  min =  188.46  max =  192.25  avg =  190.28
          vgg16_int8  min = 1200.24  max = 1268.10  avg = 1226.08
       resnet50_int8  min =  324.67  max =  328.74  avg =  327.02
 squeezenet_ssd_int8  min =  287.93  max =  412.64  avg =  319.39
  mobilenet_ssd_int8  min =  260.59  max =  263.09  avg =  261.67
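
For a quick read of the two tables, the speedup per model can be derived from the `avg` columns. The sketch below only restates the averages reported above; the names `BenchRow`, `speedup`, and `print_speedups` are hypothetical helpers, not part of ncnn or its benchmark tool.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical record: one benchmark row with the avg values copied
// from the "Before patch" and "Patched" tables above.
struct BenchRow
{
    const char* name;
    double before_avg;
    double after_avg;
};

static const BenchRow kRows[] = {
    {"squeezenet_int8", 888.55, 62.68},
    {"mobilenet_int8", 1173.94, 134.51},
    {"googlenet_int8", 2284.60, 203.66},
    {"resnet18_int8", 2354.69, 190.28},
    {"vgg16_int8", 34539.23, 1226.08},
    {"resnet50_int8", 6865.15, 327.02},
    {"squeezenet_ssd_int8", 1438.86, 319.39},
    {"mobilenet_ssd_int8", 3609.66, 261.67},
};

// Speedup factor: how many times faster the patched run is.
double speedup(double before_avg, double after_avg)
{
    assert(before_avg > 0.0 && after_avg > 0.0);
    return before_avg / after_avg;
}

// Print one "name  N.NNx" line per model.
void print_speedups()
{
    for (const BenchRow& row : kRows)
    {
        std::printf("%20s  %.2fx\n", row.name, speedup(row.before_avg, row.after_avg));
    }
}
```

With these numbers the ratio is about 14x for squeezenet_int8 and about 28x for vgg16_int8, while squeezenet_ssd_int8 shows the smallest gain at roughly 4.5x.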

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 617639afab


chatgpt-codex-connector · 2026-05-31T16:13:12Z

            const int i = ii * wp;
            const int* intptr = (const int*)bottom_blob + i * elempack;
-            __fp16* ptr = (__fp16*)top_blob + i * elempack;
+            __fp16* ptr = (__fp16*)top_blob + i * out_elempack;


Use the source pack offset for 1D dequantize chunks

When a 1D int32 blob is already packed and RVV changes the output packing, this chunk offset is no longer the flat element offset being dequantized. For example, with elempack == packn and out_elempack == packn_f16, the chunk starting at input element i should write at flat fp16 offset i * elempack, but this writes at i * out_elempack, leaving holes and eventually writing past the end of top_blob; with out_elempack == 1 it overlaps earlier output. The same flat offset used before this change (i * elempack) is still needed because size is counted in unpacked scalar elements.
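
The offset problem described above can be reproduced with a small model. This is a simplified sketch under stated assumptions, not the code in `dequantize_riscv_zfh.cpp`: it assumes a 1D blob of `w` packed elements with `elempack` scalars each, processed in chunks of `wp` packed elements, and it only counts which flat output slots get written. `simulate_chunk_writes` and `ChunkCoverage` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical summary of how the chunked writes cover the flat output.
struct ChunkCoverage
{
    int holes;         // output slots never written
    int overlaps;      // output slots written more than once
    int out_of_bounds; // writes that land past the end of the output
};

// Simulate chunked writes where each chunk starts at `i * offset_pack`.
// Passing offset_pack == elempack models the flat offset; passing
// out_elempack models the offset flagged in the review comment.
ChunkCoverage simulate_chunk_writes(int w, int elempack, int wp, int offset_pack)
{
    assert(w > 0 && elempack > 0 && wp > 0 && offset_pack > 0);

    const int total = w * elempack; // flat scalar count of the blob
    std::vector<int> hits(total, 0);
    ChunkCoverage cov = {0, 0, 0};

    for (int i = 0; i < w; i += wp)
    {
        const int n = std::min(wp, w - i) * elempack; // scalars in this chunk
        const int start = i * offset_pack;
        for (int j = 0; j < n; j++)
        {
            const int idx = start + j;
            if (idx >= total)
                cov.out_of_bounds++;
            else
                hits[idx]++;
        }
    }

    for (int h : hits)
    {
        if (h == 0) cov.holes++;
        if (h > 1) cov.overlaps++;
    }
    return cov;
}
```

For example, with `w = 4`, `elempack = 4`, `wp = 2`, using `offset_pack = 4` covers every slot exactly once, `offset_pack = 8` leaves holes and writes past the end, and `offset_pack = 1` overlaps earlier output, matching the three cases in the comment.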


codecov-commenter · 2026-06-01T00:08:33Z

Codecov Report

❌ Patch coverage is 93.95405% with 50 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.77%. Comparing base (882f319) to head (6a3b5df).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/layer/riscv/requantize_riscv.cpp | 77.82% | 49 Missing ⚠️ |
| src/layer/riscv/convolution_riscv.cpp | 98.43% | 1 Missing ⚠️ |

Additional details and impacted files

@@           Coverage Diff            @@
##           master    #6763    +/-   ##
========================================
  Coverage   95.77%   95.77%            
========================================
  Files         946      947     +1     
  Lines      410747   411192   +445     
========================================
+ Hits       393380   393813   +433     
- Misses      17367    17379    +12


Copilot

Pull request overview

This PR adds RVV int8 convolution support and packed dequantize/requantize paths for RISC-V, improving packed int8 inference performance in the RISC-V backend.

Changes:

Adds RVV packed int8 convolution kernel packing and execution paths.
Adds packed RVV dequantize-to-fp16 and requantize-to-int8 handling.
Adjusts requantize tests for RISC-V packing behavior.
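
As a reference point for what the packed paths compute per element, here is a scalar sketch of dequantize and requantize. It is an assumption-laden illustration, not the PR's RVV code: it assumes the common scheme of scaling the int32 accumulator to float, adding a bias, rescaling, then rounding and saturating to [-127, 127], and it omits any activation step. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Dequantize one int32 accumulator to float (assumed: v * scale + bias).
inline float dequantize_one(int32_t v, float scale, float bias)
{
    return (float)v * scale + bias;
}

// Round to nearest and saturate to the symmetric int8 range [-127, 127].
// The symmetric range is an assumption of this sketch.
inline int8_t float_to_int8_sat(float v)
{
    const int r = (int)std::round(v);
    return (int8_t)std::min(127, std::max(-127, r));
}

// Requantize one int32 accumulator back to int8:
// int32 -> float (scale_in, bias) -> rescale (scale_out) -> int8.
inline int8_t requantize_one(int32_t v, float scale_in, float bias, float scale_out)
{
    const float f = dequantize_one(v, scale_in, bias);
    return float_to_int8_sat(f * scale_out);
}
```

A packed implementation would apply the same arithmetic to a whole vector of lanes at once, with per-channel scales loaded according to the element packing.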

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.


| File | Description |
|---|---|
| `tests/test_requantize.cpp` | Updates RISC-V-specific requantize test coverage. |
| `src/layer/riscv/requantize_riscv.cpp` | Adds packed RVV requantization paths and output repacking. |
| `src/layer/riscv/dequantize_riscv_zfh.cpp` | Adds packed RVV int32-to-fp16 dequantization paths. |
| `src/layer/riscv/convolution_riscv.h` | Declares RVV int8 convolution pipeline and state. |
| `src/layer/riscv/convolution_riscv.cpp` | Wires int8 convolution to the RVV pipeline. |
| `src/layer/riscv/convolution_packed_int8.h` | Adds RVV packed int8 convolution kernel transform and execution. |


RVV1.0 Packed Convolution

617639a

github-actions Bot added riscv test labels May 31, 2026

chatgpt-codex-connector Bot reviewed May 31, 2026


nihui requested a review from Copilot June 1, 2026 01:51

Copilot started reviewing on behalf of nihui June 1, 2026 01:51

Copilot AI reviewed Jun 1, 2026


Comment thread src/layer/riscv/dequantize_riscv_zfh.cpp Outdated

Comment thread src/layer/riscv/convolution_riscv.cpp Outdated

Comment thread src/layer/riscv/requantize_riscv.cpp

Comment thread src/layer/riscv/requantize_riscv.cpp

Deepdive543443 and others added 4 commits June 1, 2026 13:11

Apply missed parallelism

65f58c1

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Typo...

19ef642

Adjust packing

601700d

reduced lines of code, but I'm not sure this will cause performance r…

6a3b5df

…egression......

Deepdive543443 wants to merge 5 commits into Tencent:master from Deepdive543443:int8-conv-packed/PR


3 participants
