
Conversation

@r-abishek
Member

Adds multi-dimensional tensor test support:

  • Introduces support for testing multiple tensor dimensions in a single run via the num_dims_list parameter, replacing the single num_dims parameter.
  • Extends QA test support to U8 and F32 bit depths for all kernels in the misc test suite.
  • Minor fixes for QA matching inside individual kernels.

Contributor

Copilot AI left a comment


Pull request overview

This PR adds comprehensive support for multi-dimensional tensor testing across the test suite. Key changes include enabling simultaneous testing of multiple dimensions (2D, 3D, 4D) via a num_dims_list parameter, extending QA test coverage to U8 and F32 bit depths for all misc test kernels, and fixing various issues in QA validation and kernel implementations.

Key Changes:

  • Replaced single num_dims parameter with num_dims_list to support testing multiple dimensions in one run
  • Extended bit depth testing support for U8 and F32 across all misc kernels
  • Fixed QA comparison logic and various kernel-specific issues (normalize, concat, log operations)

Reviewed changes

Copilot reviewed 12 out of 57 changed files in this pull request and generated 6 comments.

Summary of changes per file:

  • utilities/test_suite/rpp_test_suite_misc.h: Updated path generation and data-reading functions to support multiple bit depths; fixed QA comparison logic; added 4D support for normalize/transpose operations
  • utilities/test_suite/rpp_test_suite_common.h: Added new bit depth test modes (I8_TO_F32, I16_TO_F32) for extended test coverage
  • utilities/test_suite/common.py: Added bit depth dictionary entries for the new I8_TO_F32 and I16_TO_F32 modes
  • utilities/test_suite/HOST/runMiscTests.py: Replaced the single num_dims parameter with num_dims_list; updated bit depth configurations for unit and performance tests
  • utilities/test_suite/HOST/Tensor_misc_host.cpp: Refactored memory allocation and data handling to support the new bit depths; improved function naming with bit depth suffixes
  • utilities/test_suite/HIP/runMiscTests.py: Mirrored the HOST changes for the HIP backend with num_dims_list support and bit depth updates
  • utilities/test_suite/HIP/Tensor_misc_hip.cpp: Updated device memory handling and QA comparison for multiple bit depths; fixed memory allocation sizing
  • src/modules/tensor/rppt_tensor_geometric_augmentations.cpp: Removed an unnecessary layout mismatch check in concat operations
  • src/modules/tensor/hip/kernel/normalize.cpp: Fixed shared memory sizing, improved stddev computation, added memory zero-initialization
  • src/modules/tensor/cpu/kernel/normalize.cpp: Removed extraneous braces; added pixel clamping for U8 output; fixed code formatting
  • src/modules/tensor/cpu/kernel/concat.cpp: Fixed stride usage bugs in 2D/3D concat operations affecting different axis masks
  • src/include/common/cpu/rpp_cpu_simd_math.hpp: Replaced a fast approximation with the standard math library for the inverse square root calculation
Comments suppressed due to low confidence (1)

utilities/test_suite/rpp_test_suite_misc.h:1

  • The ternary operator defaults to 1.0f when stdDev is zero, but this may not be the intended behavior for all normalization cases. Consider using 0.0f or adding a comment explaining why 1.0f is the correct default.
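
To make the guard concrete, here is a minimal self-contained sketch (not the library code; the function name and the 1.0f default are illustrative only) of a normalization step that falls back to a unit scale when stdDev is zero:

#include <cstdio>

// Sketch only: guard the inverse-stddev factor so a zero standard deviation
// does not divide by zero; 1.0f mirrors the default questioned above.
static float normalize_value(float x, float mean, float stdDev)
{
    float invStdDev = (stdDev != 0.0f) ? (1.0f / stdDev) : 1.0f;
    return (x - mean) * invStdDev;
}

int main()
{
    printf("%f\n", normalize_value(10.0f, 4.0f, 2.0f)); // scaled by 1/2 -> 3.0
    printf("%f\n", normalize_value(10.0f, 4.0f, 0.0f)); // falls back to scale 1.0 -> 6.0
    return 0;
}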

// fill roi based on mode and number of dimensions
Rpp32u *roiTensor = static_cast<Rpp32u *>(calloc(nDim * 2 * batchSize, sizeof(Rpp32u)));
-Rpp32u *roiTensorSecond ;
+Rpp32u *roiTensorSecond = nullptr;

Copilot AI Dec 3, 2025


roiTensorSecond is declared and initialized to nullptr, but later at line 97 it's used in fill_roi_values without being allocated memory for the CONCAT case. This will cause a segmentation fault.

Contributor


Before fill_roi_values() is invoked, a calloc call allocates and zero-initializes the buffer. This guarantees that the function always operates on valid, initialized memory, preventing uninitialized-access or out-of-bounds-related bugs. Resolved
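
A minimal sketch of that allocation pattern, with stand-in values for nDim, batchSize, and the CONCAT check (fill_roi_values itself is omitted, since its signature lives in the test suite):

#include <cstdlib>
#include <cstdint>

typedef uint32_t Rpp32u;  // stand-in typedef so the sketch compiles on its own

int main()
{
    Rpp32u nDim = 3, batchSize = 2;  // illustrative values
    bool isConcatCase = true;        // stand-in for (testCase == CONCAT)

    // Allocate and zero-initialize the second ROI buffer before any helper
    // (e.g. fill_roi_values in the test suite) reads or writes it.
    Rpp32u *roiTensorSecond = nullptr;
    if (isConcatCase)
        roiTensorSecond = static_cast<Rpp32u *>(calloc(nDim * 2 * batchSize, sizeof(Rpp32u)));

    // ... fill and consume the ROI values here ...

    free(roiTensorSecond);  // free(nullptr) is a no-op, so this is safe either way
    return 0;
}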

if(testCase == CONCAT)
{
    iBufferSizeSecondInBytes = iBufferSizeSecond * get_size_of_data_type(srcDescriptorPtrNDSecond->dataType);
    inputSecond = calloc(iBufferSizeSecond, get_size_of_data_type(srcDescriptorPtrNDSecond->dataType));

Copilot AI Dec 3, 2025


Note on the calloc arguments: per the C standard, calloc(count, size) takes the number of elements first and the element size second, so this call is actually in the standard order rather than reversed. For consistency with the other allocations in this file, which pass the byte count directly, consider calloc(iBufferSizeSecondInBytes, 1).

Suggested change
-inputSecond = calloc(iBufferSizeSecond, get_size_of_data_type(srcDescriptorPtrNDSecond->dataType));
+inputSecond = calloc(iBufferSizeSecondInBytes, 1);
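
For reference, a small standalone sketch of the two allocation forms discussed here (elementCount and elementSize are placeholders for iBufferSizeSecond and get_size_of_data_type(...)); both zero-initialize the same number of bytes:

#include <cstdio>
#include <cstdlib>

int main()
{
    size_t elementCount = 256;            // stand-in for iBufferSizeSecond
    size_t elementSize  = sizeof(float);  // stand-in for get_size_of_data_type(...)

    // calloc(count, size): element count first, element size second.
    void *bufA = calloc(elementCount, elementSize);
    // Same total allocation expressed directly in bytes, matching the
    // calloc(<sizeInBytes>, 1) style used for the other buffers in the test file.
    void *bufB = calloc(elementCount * elementSize, 1);

    printf("each call zero-initializes %zu bytes\n", elementCount * elementSize);
    free(bufA);
    free(bufB);
    return 0;
}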

Contributor


Resolved

-if (BitDepthTestMode == U8_TO_U8 || BitDepthTestMode == F16_TO_F16 || BitDepthTestMode == F32_TO_F32 || BitDepthTestMode == I8_TO_I8)
-    errorCodeCapture = rppt_transpose_gpu(d_input, srcDescriptorPtrND, d_output, dstDescriptorPtrND, permTensor, roiTensor, handle);
+if(BitDepthTestMode == U8_TO_U8 || BitDepthTestMode == F16_TO_F16 || BitDepthTestMode == F32_TO_F32 || BitDepthTestMode == I8_TO_I8)
+    rppt_transpose_gpu(d_input, srcDescriptorPtrND, d_output, dstDescriptorPtrND, permTensor, roiTensor, handle);

Copilot AI Dec 3, 2025


The return value from rppt_transpose_gpu is not captured in errorCodeCapture, unlike other function calls in this switch statement. This inconsistency could lead to errors being silently ignored.

Suggested change
-rppt_transpose_gpu(d_input, srcDescriptorPtrND, d_output, dstDescriptorPtrND, permTensor, roiTensor, handle);
+errorCodeCapture = rppt_transpose_gpu(d_input, srcDescriptorPtrND, d_output, dstDescriptorPtrND, permTensor, roiTensor, handle);

Contributor


Resolved

// Computes concatenation for 2D tensors (Supports Rpp32f and Rpp8u)
template <typename T, typename SIMD_LOAD, typename SIMD_STORE>
-void concat_2D_tensor(T *srcPtr1, T *srcPtr2, SIMD_LOAD simd_load, SIMD_STORE simd_store, RpptGenericDescPtr srcDescPtr, RpptGenericDescPtr srcDescPtr1, T *dstPtr, RpptGenericDescPtr dstDescPtr, Rpp32u *dims, Rpp32u *strides,Rpp32u *dims1, Rpp32u *strides1, Rpp32u axisMask)
+void concat_2D_tensor(T *srcPtr1, T *srcPtr2, SIMD_LOAD simd_load, SIMD_STORE simd_store, RpptGenericDescPtr srcDescPtr, RpptGenericDescPtr srcDescPtr1, T *dstPtr, RpptGenericDescPtr dstDescPtr, Rpp32u *dims, Rpp32u *strides1, Rpp32u *dims1, Rpp32u *strides2, Rpp32u axisMask)

Copilot AI Dec 3, 2025


The parameter names are confusing: 'dims' and 'strides1' appear to be for the first source, while 'dims1' and 'strides2' appear to be for the second source. Consider renaming to 'dims1/strides1' and 'dims2/strides2' for clarity.
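
To make the suggested naming concrete, a small self-contained sketch (not the RPP kernel; the types, helper name, and axis handling are simplified) of a 2D concat where each source carries its own dims/strides pair:

#include <cstdio>
#include <vector>

// Illustrative only: concatenate two row-major 2D buffers along a chosen axis,
// with each source carrying its own dims/strides, using the dims1/strides1 vs
// dims2/strides2 naming suggested above.
static void concat_2d(const float *src1, const unsigned *dims1, const unsigned *strides1,
                      const float *src2, const unsigned *dims2, const unsigned *strides2,
                      float *dst, unsigned axis)
{
    if (axis == 0)  // stack rows: src1 rows first, then src2 rows (column counts must match)
    {
        for (unsigned r = 0; r < dims1[0]; r++)
            for (unsigned c = 0; c < dims1[1]; c++)
                dst[r * dims1[1] + c] = src1[r * strides1[0] + c];
        for (unsigned r = 0; r < dims2[0]; r++)
            for (unsigned c = 0; c < dims2[1]; c++)
                dst[(dims1[0] + r) * dims2[1] + c] = src2[r * strides2[0] + c];
    }
    else  // axis == 1: widen each row with src1's slice followed by src2's slice
    {
        unsigned dstWidth = dims1[1] + dims2[1];
        for (unsigned r = 0; r < dims1[0]; r++)
        {
            for (unsigned c = 0; c < dims1[1]; c++)
                dst[r * dstWidth + c] = src1[r * strides1[0] + c];
            for (unsigned c = 0; c < dims2[1]; c++)
                dst[r * dstWidth + dims1[1] + c] = src2[r * strides2[0] + c];
        }
    }
}

int main()
{
    float a[4] = {1, 2, 3, 4}, b[4] = {5, 6, 7, 8};  // two 2x2 sources
    unsigned dims[2] = {2, 2}, strides[2] = {2, 1};  // row-major, contiguous
    std::vector<float> out(8);
    concat_2d(a, dims, strides, b, dims, strides, out.data(), 1);
    for (float v : out) printf("%g ", v);            // 1 2 5 6 3 4 7 8
    printf("\n");
    return 0;
}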

Contributor


Resolved

Comment on lines 290 to 291
Rpp8u *srcPtrRowTemp1 = srcPtrRow1;
Rpp8u *dstPtrRow = dstPtr;

Copilot AI Dec 3, 2025


This line declares srcPtrRowTemp1 but it's immediately reassigned in the for loop at line 294. The declaration at line 290 is redundant and should be removed.

Suggested change
-Rpp8u *srcPtrRowTemp1 = srcPtrRow1;
-Rpp8u *dstPtrRow = dstPtr;
+Rpp8u *dstPtrRow = dstPtr;

Contributor


Resolved

Comment on lines 868 to 872
inline Rpp32f rpp_math_inverse_square_root(Rpp32f x)
{
-    __m128 X = _mm_set_ss(x);
-    __m128 tmp = _mm_rsqrt_ss(X);
-    Rpp32f y = _mm_cvtss_f32(tmp);
-    return y * (1.5f - x * 0.5f * y * y);
+    //raw C version of scalar
+    return 1.0f / std::sqrt(x);
}
Copilot AI Dec 3, 2025


[nitpick] The comment 'raw C version of scalar' is unclear. This is actually C++ using std::sqrt. Consider clarifying: 'Standard library implementation using C++ std::sqrt' or removing the comment if it doesn't add value.

Contributor


Resolved
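
As a side note on the inverse-square-root change quoted above, a self-contained sketch (assuming an x86 target with SSE) comparing the removed _mm_rsqrt_ss approximation plus one Newton-Raphson step against the standard-library form that the PR keeps:

#include <cmath>
#include <cstdio>
#include <xmmintrin.h>

// Approximation that the PR removes: hardware reciprocal square root estimate
// refined by one Newton-Raphson step.
static float rsqrt_approx(float x)
{
    float y = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
    return y * (1.5f - x * 0.5f * y * y);
}

// Standard-library form that the PR keeps.
static float rsqrt_std(float x)
{
    return 1.0f / std::sqrt(x);
}

int main()
{
    const float samples[] = {0.25f, 2.0f, 10000.0f};
    for (float x : samples)
        printf("x = %10.2f  approx = %.9f  std = %.9f\n", x, rsqrt_approx(x), rsqrt_std(x));
    return 0;
}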

@codecov

codecov bot commented Dec 3, 2025

Codecov Report

❌ Patch coverage is 71.71717% with 28 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/modules/tensor/cpu/kernel/concat.cpp 65.75% 25 Missing ⚠️
src/include/common/cpu/rpp_cpu_simd_math.hpp 57.14% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #644      +/-   ##
===========================================
+ Coverage    88.16%   89.85%   +1.69%     
===========================================
  Files          195      195              
  Lines        82723    82687      -36     
===========================================
+ Hits         72932    74295    +1363     
+ Misses        9791     8392    -1399     
Files with missing lines Coverage Δ
src/modules/tensor/cpu/kernel/normalize.cpp 84.33% <100.00%> (+36.65%) ⬆️
src/modules/tensor/hip/kernel/normalize.cpp 99.20% <100.00%> (+64.36%) ⬆️
...les/tensor/rppt_tensor_geometric_augmentations.cpp 96.69% <ø> (+0.12%) ⬆️
src/include/common/cpu/rpp_cpu_simd_math.hpp 46.17% <57.14%> (-0.17%) ⬇️
src/modules/tensor/cpu/kernel/concat.cpp 85.25% <65.75%> (+42.19%) ⬆️

... and 12 files with indirect coverage changes
