Skip to content

Conversation

@r-abishek
Copy link
Member

This PR includes replacement of scalar loads/stores and conversion to FP32, with AVX2 intrinsics - no additions or removals to external user API.

  • Replaced SSE implementations with AVX-based variants for the FP16 versions of six augmentations: Copy, Color Jitter, Crop, Gridmask, Ricap, Crop and Patch.
  • Scalar load/store and FP32 conversion have been replaced with AVX vectorized intrinsics for Hue and Saturation additionally.
  • Observe 28% - 58% gains for PKD3 to PLN3 variants with the changes made.

@r-abishek r-abishek requested a review from Copilot December 10, 2025 22:51
@r-abishek r-abishek added enhancement New feature or request ci:precheckin labels Dec 10, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes FP16 image processing operations by replacing scalar loads/stores with AVX2 vectorized intrinsics. The changes enable processing 24 elements at once with AVX2 (versus 12 with SSE), achieving 28-58% performance gains for PKD3 to PLN3 variants.

Key changes:

  • Replaced SSE implementations with AVX2 variants for six augmentations (Copy, Color Jitter, Crop, Gridmask, Ricap, Crop and Patch)
  • Direct FP16-to-FP32 AVX2 conversions eliminate intermediate scalar conversion loops
  • Unified vectorization constants via conditional compilation for AVX2/SSE paths

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/modules/tensor/cpu/kernel/saturation.cpp Added AVX2 fast path with direct F16-to-F32 SIMD operations for saturation adjustments across all layout combinations
src/modules/tensor/cpu/kernel/ricap.cpp Implemented AVX2 vectorization for RICAP augmentation with unified vector increment handling
src/modules/tensor/cpu/kernel/hue.cpp Added AVX2 support for hue adjustments with direct F16 conversions; fixed incorrect cast in scalar fallback path
src/modules/tensor/cpu/kernel/gridmask.cpp Introduced AVX2 mask computation functions and integrated them across all gridmask layout paths
src/modules/tensor/cpu/kernel/crop_and_patch.cpp Added AVX2 vectorization for crop-and-patch operations with unified alignment calculations
src/modules/tensor/cpu/kernel/crop.cpp Implemented AVX2 fast path for crop operations across layout toggle scenarios
src/modules/tensor/cpu/kernel/copy.cpp Added AVX2 support for copy operations with direct F16 SIMD loads/stores
src/modules/tensor/cpu/kernel/color_jitter.cpp Implemented AVX2 color jitter computation with proper F16 boundary checking

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

else if ((srcDescPtr->c == 3) && (srcDescPtr->layout == RpptLayout::NHWC) && (dstDescPtr->layout == RpptLayout::NHWC))
{
Rpp32u alignedLength = bufferLength & ~3;
// Rpp32u alignedLength = bufferLength & ~3;
Copy link

Copilot AI Dec 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented-out code should be removed rather than left in the codebase. The alignedLength calculation is now handled by the conditional compilation block above.

Suggested change
// Rpp32u alignedLength = bufferLength & ~3;

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved

Comment on lines 1255 to 1256
// Rpp32u alignedLength = bufferLength & ~3;

Copy link

Copilot AI Dec 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented-out code should be removed rather than left in the codebase. The alignedLength calculation is now handled by the conditional compilation block above.

Suggested change
// Rpp32u alignedLength = bufferLength & ~3;

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved

@codecov
Copy link

codecov bot commented Dec 16, 2025

Codecov Report

❌ Patch coverage is 87.68769% with 41 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/modules/tensor/cpu/kernel/ricap.cpp 0.00% 37 Missing ⚠️
src/modules/tensor/cpu/kernel/crop_and_patch.cpp 90.70% 4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #649      +/-   ##
===========================================
+ Coverage    88.16%   88.38%   +0.21%     
===========================================
  Files          195      195              
  Lines        82723    82420     -303     
===========================================
- Hits         72932    72839      -93     
+ Misses        9791     9581     -210     
Files with missing lines Coverage Δ
src/modules/tensor/cpu/kernel/color_jitter.cpp 100.00% <100.00%> (ø)
src/modules/tensor/cpu/kernel/copy.cpp 97.37% <100.00%> (-0.08%) ⬇️
src/modules/tensor/cpu/kernel/crop.cpp 100.00% <100.00%> (ø)
src/modules/tensor/cpu/kernel/gridmask.cpp 100.00% <100.00%> (ø)
src/modules/tensor/cpu/kernel/hue.cpp 100.00% <100.00%> (ø)
src/modules/tensor/cpu/kernel/saturation.cpp 100.00% <100.00%> (ø)
src/modules/tensor/cpu/kernel/crop_and_patch.cpp 95.84% <90.70%> (+0.32%) ⬆️
src/modules/tensor/cpu/kernel/ricap.cpp 47.24% <0.00%> (+2.23%) ⬆️

... and 5 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci:precheckin enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants