F16 variants - Update loads and stores to AVX2 - Group 6 #649

r-abishek · 2025-12-10T22:51:32Z

This PR replaces scalar loads/stores and scalar FP32 conversion with AVX2 intrinsics. It makes no additions or removals to the external user API.

Replaced SSE implementations with AVX2-based variants for the FP16 versions of six augmentations: Copy, Color Jitter, Crop, Gridmask, Ricap, and Crop and Patch.
Additionally, scalar loads/stores and FP32 conversion have been replaced with AVX2 vectorized intrinsics for Hue and Saturation.
Observed 28% to 58% performance gains for PKD3 to PLN3 variants with these changes.
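
For readers unfamiliar with the layout names, the following is a minimal scalar sketch of what a PKD3 to PLN3 conversion does: interleaved three-channel data (RGBRGB...) is split into three separate channel planes. The helper name `packedToPlanar3` and the use of `std::vector<std::uint16_t>` to hold raw FP16 bit patterns are assumptions for illustration only and are not taken from this PR; the actual kernels do this with vector intrinsics.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the PR): split packed 3-channel data
// (c0 c1 c2 c0 c1 c2 ...) into three planes. Elements are raw FP16 bit
// patterns stored as uint16_t, so no numeric conversion happens here.
void packedToPlanar3(const std::vector<std::uint16_t>& packed,
                     std::vector<std::uint16_t>& plane0,
                     std::vector<std::uint16_t>& plane1,
                     std::vector<std::uint16_t>& plane2)
{
    // A packed 3-channel buffer must hold a whole number of pixels.
    assert(packed.size() % 3 == 0);
    const std::size_t pixelCount = packed.size() / 3;

    plane0.resize(pixelCount);
    plane1.resize(pixelCount);
    plane2.resize(pixelCount);

    for (std::size_t i = 0; i < pixelCount; ++i)
    {
        plane0[i] = packed[3 * i + 0];
        plane1[i] = packed[3 * i + 1];
        plane2[i] = packed[3 * i + 2];
    }
}
```

Because each output plane is written contiguously while the input is read with a stride of three, this layout toggle is a natural candidate for wider vector loads and stores.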

Copilot

Pull request overview

This PR optimizes FP16 image processing operations by replacing scalar loads/stores with AVX2 vectorized intrinsics. The changes enable processing 24 elements at once with AVX2 (versus 12 with SSE), achieving 28-58% performance gains for PKD3 to PLN3 variants.

Key changes:

Replaced SSE implementations with AVX2 variants for six augmentations (Copy, Color Jitter, Crop, Gridmask, Ricap, Crop and Patch)
Direct FP16-to-FP32 AVX2 conversions eliminate intermediate scalar conversion loops
Unified vectorization constants via conditional compilation for AVX2/SSE paths
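
As an illustration of the direct FP16-to-FP32 conversion mentioned above, here is a minimal sketch, not the PR's actual code. It assumes FP16 data is available as raw `std::uint16_t` bit patterns and uses the F16C intrinsic `_mm256_cvtph_ps` to convert eight values per iteration when the compiler targets AVX2 and F16C (for example with `-mavx2 -mf16c`); otherwise it falls back to a scalar decoder. The function names `halfBitsToFloat` and `convertHalfToFloat` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__AVX2__) && defined(__F16C__)
#include <immintrin.h>
#endif

// Scalar reference: decode one IEEE 754 binary16 bit pattern to float.
inline float halfBitsToFloat(std::uint16_t h)
{
    const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
    const std::uint32_t exponent = (h >> 10) & 0x1Fu;
    const std::uint32_t mantissa = h & 0x3FFu;

    if (exponent == 0)
    {
        // Zero or subnormal: value = mantissa * 2^-24.
        const float magnitude = std::ldexp(static_cast<float>(mantissa), -24);
        return sign ? -magnitude : magnitude;
    }

    // Inf/NaN keep an all-ones exponent; normal values are re-biased
    // from the FP16 bias (15) to the FP32 bias (127), i.e. +112.
    const std::uint32_t bits = (exponent == 31)
        ? (sign | 0x7F800000u | (mantissa << 13))
        : (sign | ((exponent + 112u) << 23) | (mantissa << 13));
    float result;
    std::memcpy(&result, &bits, sizeof(result));
    return result;
}

// Hypothetical helper: convert `count` FP16 values to FP32.
void convertHalfToFloat(const std::uint16_t* src, float* dst, std::size_t count)
{
    assert((src != nullptr && dst != nullptr) || count == 0);
    std::size_t i = 0;
#if defined(__AVX2__) && defined(__F16C__)
    // Vector path: one 128-bit load holds eight FP16 values, which
    // _mm256_cvtph_ps widens to eight FP32 values in a 256-bit register.
    for (; i + 8 <= count; i += 8)
    {
        const __m128i halves =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        _mm256_storeu_ps(dst + i, _mm256_cvtph_ps(halves));
    }
#endif
    // Scalar tail (or the whole buffer when F16C is unavailable).
    for (; i < count; ++i)
        dst[i] = halfBitsToFloat(src[i]);
}
```

The scalar tail handles whatever the vector loop leaves over, so the function gives the same results whether or not the vector path is compiled in.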

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
src/modules/tensor/cpu/kernel/saturation.cpp	Added AVX2 fast path with direct F16-to-F32 SIMD operations for saturation adjustments across all layout combinations
src/modules/tensor/cpu/kernel/ricap.cpp	Implemented AVX2 vectorization for RICAP augmentation with unified vector increment handling
src/modules/tensor/cpu/kernel/hue.cpp	Added AVX2 support for hue adjustments with direct F16 conversions; fixed incorrect cast in scalar fallback path
src/modules/tensor/cpu/kernel/gridmask.cpp	Introduced AVX2 mask computation functions and integrated them across all gridmask layout paths
src/modules/tensor/cpu/kernel/crop_and_patch.cpp	Added AVX2 vectorization for crop-and-patch operations with unified alignment calculations
src/modules/tensor/cpu/kernel/crop.cpp	Implemented AVX2 fast path for crop operations across layout toggle scenarios
src/modules/tensor/cpu/kernel/copy.cpp	Added AVX2 support for copy operations with direct F16 SIMD loads/stores
src/modules/tensor/cpu/kernel/color_jitter.cpp	Implemented AVX2 color jitter computation with proper F16 boundary checking

Copilot · 2025-12-10T22:52:06Z

src/modules/tensor/cpu/kernel/gridmask.cpp

        else if ((srcDescPtr->c == 3) && (srcDescPtr->layout == RpptLayout::NHWC) && (dstDescPtr->layout == RpptLayout::NHWC))
        {
-            Rpp32u alignedLength = bufferLength & ~3;
+            // Rpp32u alignedLength = bufferLength & ~3;


Commented-out code should be removed rather than left in the codebase. The alignedLength calculation is now handled by the conditional compilation block above.

Suggested change

// Rpp32u alignedLength = bufferLength & ~3;
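
For context, the commented-out line rounds `bufferLength` down to a multiple of 4 with a bit mask, a trick that only works for power-of-two step sizes. The following is a minimal sketch of how a conditional-compilation block might pick the step size and compute the aligned length instead, assuming the 24-element (AVX2) and 12-element (SSE) steps quoted in the review overview. The names `kVectorIncrement` and `alignedLengthFor` are hypothetical and do not come from the PR.

```cpp
#include <cassert>
#include <cstdint>

// Assumed step sizes: 24 elements per iteration with AVX2, 12 with SSE.
#if defined(__AVX2__)
constexpr std::uint32_t kVectorIncrement = 24;
#else
constexpr std::uint32_t kVectorIncrement = 12;
#endif

// Round bufferLength down to a whole number of vector steps. Division is
// used because 12 and 24 are not powers of two, so a mask such as
// `bufferLength & ~3` cannot express this rounding.
inline std::uint32_t alignedLengthFor(std::uint32_t bufferLength,
                                      std::uint32_t increment)
{
    assert(increment != 0);
    return (bufferLength / increment) * increment;
}
```

A kernel would then run its vector loop over `[0, alignedLengthFor(bufferLength, kVectorIncrement))` and finish the remaining elements in a scalar loop.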

Copilot · 2025-12-10T22:52:06Z

src/modules/tensor/cpu/kernel/gridmask.cpp

+            // Rpp32u alignedLength = bufferLength & ~3;



Commented-out code should be removed rather than left in the codebase. The alignedLength calculation is now handled by the conditional compilation block above.

Suggested change

// Rpp32u alignedLength = bufferLength & ~3;

codecov · 2025-12-16T01:09:22Z

Codecov Report

❌ Patch coverage is 87.68769% with 41 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/modules/tensor/cpu/kernel/ricap.cpp	0.00%	37 Missing ⚠️
src/modules/tensor/cpu/kernel/crop_and_patch.cpp	90.70%	4 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop     #649      +/-   ##
===========================================
+ Coverage    88.16%   88.38%   +0.21%     
===========================================
  Files          195      195              
  Lines        82723    82420     -303     
===========================================
- Hits         72932    72839      -93     
+ Misses        9791     9581     -210

Files with missing lines	Coverage Δ
src/modules/tensor/cpu/kernel/color_jitter.cpp	`100.00% <100.00%> (ø)`
src/modules/tensor/cpu/kernel/copy.cpp	`97.37% <100.00%> (-0.08%)`	⬇️
src/modules/tensor/cpu/kernel/crop.cpp	`100.00% <100.00%> (ø)`
src/modules/tensor/cpu/kernel/gridmask.cpp	`100.00% <100.00%> (ø)`
src/modules/tensor/cpu/kernel/hue.cpp	`100.00% <100.00%> (ø)`
src/modules/tensor/cpu/kernel/saturation.cpp	`100.00% <100.00%> (ø)`
src/modules/tensor/cpu/kernel/crop_and_patch.cpp	`95.84% <90.70%> (+0.32%)`	⬆️
src/modules/tensor/cpu/kernel/ricap.cpp	`47.24% <0.00%> (+2.23%)`	⬆️

... and 5 files with indirect coverage changes


Srihari-mcw and others added 6 commits November 24, 2025 12:32

Changes for F16 Load/Store - Group 6

afabce9

Fix boundary pixel check for SIMD codes in F16 color jitter kernel

4cc1545

Cleanup the code

36d938b

Remove unnecessary brackets

5489184

F16 Load/Store Updates for Hue and Saturation

d97e33e

Merge pull request #535 from Srihari-mcw/f16_load_store_group_6

aa55a04

F16 load store group 6

r-abishek requested a review from Copilot December 10, 2025 22:51

r-abishek added the enhancement and ci:precheckin labels Dec 10, 2025

Copilot AI reviewed Dec 10, 2025

LakshmiKumar23 requested review from LakshmiKumar23 and rrawther December 10, 2025 23:37

LakshmiKumar23 assigned kiritigowda Dec 10, 2025

Srihari-mcw and others added 3 commits December 11, 2025 06:43

Remove unnecessary comments

0a39cff

Hue and Saturation SSE Updates

1d1e9b6

Merge pull request #551 from Srihari-mcw/f16_load_store_copilot_comment

d778ee2

Hue and Saturation SSE Updates - F16 Copilot Fix
