|
| 1 | +# rustVX Implementation Analysis vs OpenVX CTS Conformance Issues |
| 2 | + |
| 3 | +## Repository: kiritigowda/rustVX |
| 4 | +## Analysis Date: 2025-06-05 |
| 5 | + |
| 6 | +--- |
| 7 | + |
| 8 | +## 1. HOG (Histogram of Oriented Gradients) Implementation |
| 9 | + |
| 10 | +### 1.1 vxHOGCellsNode Implementation |
| 11 | + |
| 12 | +**Location**: `openvx-core/src/vxu_impl.rs` - `vxu_hog_cells_impl` |
| 13 | + |
| 14 | +**Key Implementation Details**: |
| 15 | +- **Dimensions**: Uses 2D tensors for magnitudes `[num_cells_x, num_cells_y]` and 3D tensors for bins `[num_cells_x, num_cells_y, num_bins]` |
| 16 | +- **Data Types**: Hardcoded to `VX_TYPE_INT16` for both magnitudes and bins |
| 17 | +- **Cell calculation**: `num_cells_x = width / cell_w`, `num_cells_y = height / cell_h` |
| 18 | +- **Border handling**: Uses replicate border for Sobel gradient calculation |
| 19 | +- **Accumulation**: Accumulates directly into INT16 arrays with truncation at each pixel |
| 20 | + |
| 21 | +**Potential CTS Issues**: |
| 22 | +1. **Data Type Validation**: The code validates `mag_dtype != VX_TYPE_INT16` and `bins_dtype != VX_TYPE_INT16` - CTS may expect different type handling |
| 23 | +2. **Dimension Ordering**: Uses `[num_cells_x, num_cells_y]` layout - CTS may expect different dimension ordering |
| 24 | +3. **Magnitude Calculation**: Uses `sqrt(gx*gx + gy*gy)` - CTS may expect L1 norm or different magnitude calculation |
| 25 | +4. **Bin Indexing**: `(orientation * num_div_360).floor()` - CTS may expect rounding differences |
| 26 | + |
| 27 | +### 1.2 vxHOGFeaturesNode Implementation |
| 28 | + |
| 29 | +**Location**: `openvx-core/src/vxu_impl.rs` - `vxu_hog_features_impl` |
| 30 | + |
| 31 | +**Key Implementation Details**: |
| 32 | +- **Input**: Takes magnitudes (2D INT16), bins (3D INT16), and HOG params struct |
| 33 | +- **Output**: 3D INT16 tensor `[num_windows_w, num_windows_h, feature_dim]` |
| 34 | +- **Q7.8 Format**: Stores features as Q7.8 fixed-point (`value * 256.0`) |
| 35 | +- **L2-Hys Normalization**: Implements renormalization using truncated Q7.8 values |
| 36 | +- **Window calculation**: |
| 37 | + - `num_windows_w = (width - window_w) / window_stride + 1` |
| 38 | + - `blocks_per_window_w = (window_w - block_w) / block_stride + 1` |
| 39 | + |
| 40 | +**Potential CTS Issues**: |
| 41 | +1. **Feature Dimension Calculation**: May differ from CTS reference if cell/block arithmetic varies |
| 42 | +2. **Normalization**: L2-Hys uses truncation at Q7.8 conversion - CTS may expect different rounding |
| 43 | +3. **Feature Tensor Layout**: 3D tensor `[windows_w, windows_h, feature_dim]` - dimension ordering critical |
| 44 | +4. **Threshold Application**: `min(threshold)` applied before Q7.8 conversion |
| 45 | + |
| 46 | +--- |
| 47 | + |
| 48 | +## 2. BilateralFilter Implementation |
| 49 | + |
| 50 | +### 2.1 3D Tensor Handling |
| 51 | + |
| 52 | +**Location**: `openvx-core/src/vxu_impl.rs` - `vxu_bilateral_filter_impl_with_border` |
| 53 | + |
| 54 | +**Key Implementation Details**: |
| 55 | +- **Dimension Support**: Supports 2D and 3D tensors |
| 56 | +- **Data Types**: U8 (`VX_TYPE_UINT8`) and S16 (`VX_TYPE_INT16`) |
| 57 | +- **Dimension Layout**: |
| 58 | + - 2D: `[width, height]` |
| 59 | + - 3D: `[channels, width, height]` or `[width, height, channels]` |
| 60 | +- **Strides calculation**: Built from dimensions for multi-dimensional access |
| 61 | + |
| 62 | +**3D Tensor Logic**: |
| 63 | +```rust |
| 64 | +let (width, height) = if num_dims >= 2 { |
| 65 | + (dims[num_dims - 2] as i32, dims[num_dims - 1] as i32) |
| 66 | +} else { |
| 67 | + (dims[0] as i32, 1) |
| 68 | +}; |
| 69 | +let channels = if width * height > 0 { |
| 70 | + total_elements / (width * height) |
| 71 | +} else { 1 }; |
| 72 | +``` |
| 73 | + |
| 74 | +**Potential CTS Issues**: |
| 75 | +1. **Dimension Interpretation**: Last 2 dimensions treated as spatial (width, height) - CTS may expect different layout |
| 76 | +2. **Channel Handling**: Multi-channel processing may differ from CTS reference |
| 77 | +3. **Stride Calculation**: Row-major strides may not match CTS expectations for certain layouts |
| 78 | + |
| 79 | +### 2.2 Border Mode Handling |
| 80 | + |
| 81 | +**Key Implementation Details**: |
| 82 | +- **Supported modes**: Undefined, Constant, Replicate |
| 83 | +- **Constant border**: Uses `border.constant_value.U8` for U8, may not handle S16 constant values properly |
| 84 | +- **Undefined mode**: Skips border pixels (`low_x = radius`, `high_x = width - radius`) |
| 85 | +- **Replicate mode**: Clamps coordinates to image bounds |
| 86 | + |
| 87 | +**Border Implementation**: |
| 88 | +```rust |
| 89 | +let border_mode = match border { |
| 90 | + Some(b) => match b.mode { |
| 91 | + 0x0000C000 => BorderMode::Undefined, |
| 92 | + 0x0000C001 => BorderMode::Constant(val), |
| 93 | + 0x0000C002 => BorderMode::Replicate, |
| 94 | + _ => BorderMode::Undefined, |
| 95 | + }, |
| 96 | + None => BorderMode::Undefined, |
| 97 | +}; |
| 98 | +``` |
| 99 | + |
| 100 | +**Potential CTS Issues**: |
| 101 | +1. **S16 Constant Border**: Constant value reading only handles U8 via `b.constant_value.U8` - S16 constant borders may be broken |
| 102 | +2. **Border Pixel Skipping**: Undefined mode skips pixels but may leave them uninitialized |
| 103 | +3. **Border Mode Default**: Defaults to Undefined when not specified - CTS may expect different default |
| 104 | + |
| 105 | +### 2.3 Color Weight Tables |
| 106 | + |
| 107 | +**S16 Specific Logic**: |
| 108 | +- Calculates min/max values in tensor to determine range |
| 109 | +- Creates lookup table with `k_exp_num_bins = 4096 * channels` |
| 110 | +- Uses `scale_index_s16 = k_exp_num_bins / (max_val - min_val)` |
| 111 | + |
| 112 | +**Potential CTS Issues**: |
| 113 | +1. **Range Calculation**: Min/max scan over entire tensor may differ from CTS |
| 114 | +2. **Table Size**: 4096 bins per channel may not match CTS implementation |
| 115 | +3. **Edge Case**: When range is near zero, copies input to output - CTS may expect different behavior |
| 116 | + |
| 117 | +--- |
| 118 | + |
| 119 | +## 3. TensorConvertDepth Implementation |
| 120 | + |
| 121 | +### 3.1 Data Type Handling |
| 122 | + |
| 123 | +**Location**: `openvx-core/src/vxu_impl.rs` - `vxu_tensor_convert_depth_impl` |
| 124 | + |
| 125 | +**Key Implementation Details**: |
| 126 | +- **Input types supported**: INT16, UINT8, INT8 |
| 127 | +- **Output types supported**: INT16, UINT8, INT8 |
| 128 | +- **Formula**: `converted = (val - offset) * scale` where `scale = 1.0 / norm` |
| 129 | +- **INT16 handling**: Assumes Q7.8 format (`val / 256.0` for input, `converted * 256.0` for output) |
| 130 | + |
| 131 | +**Type Conversion Logic**: |
| 132 | +```rust |
| 133 | +let val = match in_dtype { |
| 134 | + VX_TYPE_INT16 => { |
| 135 | + let bytes = [in_data[i*2], in_data[i*2+1]]; |
| 136 | + i16::from_ne_bytes(bytes) as f64 / 256.0 // Q7.8 decode |
| 137 | + } |
| 138 | + VX_TYPE_UINT8 => in_data[i] as f64, |
| 139 | + VX_TYPE_INT8 => in_data[i] as i8 as f64, |
| 140 | + _ => return VX_ERROR_INVALID_PARAMETERS, |
| 141 | +}; |
| 142 | +``` |
| 143 | + |
| 144 | +**Potential CTS Issues**: |
| 145 | +1. **Q7.8 Assumption**: Hardcoded division/multiplication by 256 for INT16 - CTS may expect raw values |
| 146 | +2. **Policy Handling**: Wrap policy uses bitwise AND (`tmp as i64 & 0xFFFF`) - may differ from CTS overflow behavior |
| 147 | +3. **Saturation**: Saturate policy uses `clamp()` - CTS may expect different saturation bounds |
| 148 | +4. **Endianness**: Uses `from_ne_bytes()` - may differ on big-endian systems |
| 149 | + |
| 150 | +### 3.2 Output Type Handling |
| 151 | + |
| 152 | +**Output Type Logic**: |
| 153 | +```rust |
| 154 | +match out_dtype { |
| 155 | + VX_TYPE_INT16 => { |
| 156 | + let tmp = converted * 256.0; |
| 157 | + let clamped = if _policy == VX_CONVERT_POLICY_WRAP { |
| 158 | + (tmp as i64 & 0xFFFF) as i16 |
| 159 | + } else { |
| 160 | + tmp.clamp(i16::MIN as f64, i16::MAX as f64) as i16 |
| 161 | + }; |
| 162 | + } |
| 163 | + VX_TYPE_UINT8 => { /* similar */ } |
| 164 | + VX_TYPE_INT8 => { /* similar */ } |
| 165 | +} |
| 166 | +``` |
| 167 | + |
| 168 | +**Potential CTS Issues**: |
| 169 | +1. **INT16 Output**: Multiplies by 256 (Q7.8 encoding) - CTS may expect raw INT16 output |
| 170 | +2. **Sign Extension**: Wrap policy for INT16 uses `0xFFFF` mask - may not handle sign correctly |
| 171 | +3. **Float Precision**: Uses f64 for intermediate calculations - may differ from CTS fixed-point math |
| 172 | + |
| 173 | +--- |
| 174 | + |
| 175 | +## 4. WeightedAverage Kernel Registration |
| 176 | + |
| 177 | +### 4.1 Kernel Registration |
| 178 | + |
| 179 | +**Location**: |
| 180 | +- `openvx-vision/src/arithmetic.rs` - `WeightedAverageKernel` |
| 181 | +- `openvx-core/src/unified_c_api.rs` - C API wrapper |
| 182 | + |
| 183 | +**Key Implementation Details**: |
| 184 | +- **Kernel name**: `"org.khronos.openvx.weighted_average"` |
| 185 | +- **C API Registration**: Uses `create_node_with_params` with kernel name string |
| 186 | +- **Graph node creation**: `vxWeightedAverageNode` creates node via `get_kernel_by_name` |
| 187 | + |
| 188 | +**Kernel Registration in openvx-vision**: |
| 189 | +```rust |
| 190 | +impl KernelTrait for WeightedAverageKernel { |
| 191 | + fn get_name(&self) -> &str { |
| 192 | + "org.khronos.openvx.weighted_average" |
| 193 | + } |
| 194 | + fn get_enum(&self) -> VxKernel { |
| 195 | + VxKernel::WeightedAverage |
| 196 | + } |
| 197 | + // ... |
| 198 | +} |
| 199 | +``` |
| 200 | + |
| 201 | +**C API Registration**: |
| 202 | +```rust |
| 203 | +pub extern "C" fn vxWeightedAverageNode( |
| 204 | + graph: vx_graph, |
| 205 | + img1: vx_image, |
| 206 | + alpha: vx_scalar, |
| 207 | + img2: vx_image, |
| 208 | + output: vx_image, |
| 209 | +) -> vx_node { |
| 210 | + create_node_with_params( |
| 211 | + graph, |
| 212 | + "org.khronos.openvx.weighted_average", |
| 213 | + &[img1 as vx_reference, alpha as vx_reference, |
| 214 | + img2 as vx_reference, output as vx_reference], |
| 215 | + ) |
| 216 | +} |
| 217 | +``` |
| 218 | + |
| 219 | +**Potential CTS Issues**: |
| 220 | +1. **Kernel Name Mismatch**: CTS may expect different kernel name format |
| 221 | +2. **Parameter Count**: 4 parameters (src1, alpha, src2, dst) - CTS may validate parameter count strictly |
| 222 | +3. **Alpha Type**: Uses `vx_scalar` for alpha - CTS may expect specific scalar type validation |
| 223 | +4. **Kernel Enum**: `VX_KERNEL_WEIGHTED_AVERAGE = 0x40` - CTS may expect different enum value |
| 224 | + |
| 225 | +### 4.2 Implementation Formula |
| 226 | + |
| 227 | +**Formula**: `(src1 * alpha + src2 * (256 - alpha)) / 256` |
| 228 | + |
| 229 | +**Potential CTS Issues**: |
| 230 | +1. **Rounding**: Integer division truncates - CTS may expect rounding |
| 231 | +2. **Overflow**: Intermediate calculation may overflow - CTS may expect saturation |
| 232 | +3. **Alpha Range**: May not validate alpha is in [0, 256] range |
| 233 | + |
| 234 | +--- |
| 235 | + |
| 236 | +## 5. MinMaxLoc Implementation |
| 237 | + |
| 238 | +### 5.1 minCount/maxCount Types |
| 239 | + |
| 240 | +**Location**: `openvx-core/src/vxu_impl.rs` - `vxu_min_max_loc_impl` |
| 241 | + |
| 242 | +**Key Implementation Details**: |
| 243 | +- **minCount/maxCount**: Written as `u32` (32-bit unsigned) |
| 244 | +- **Type detection**: Uses `is_s16` to determine image type |
| 245 | +- **Value writing**: min_val/max_val written as native image type (u8 or i16) |
| 246 | + |
| 247 | +**Count Writing Logic**: |
| 248 | +```rust |
| 249 | +if !min_count_scalar.is_null() { |
| 250 | + let count = result.min_locs.len() as u32; |
| 251 | + crate::c_api_data::vxCopyScalarData( |
| 252 | + min_count_scalar, |
| 253 | + &count as *const u32 as *mut c_void, |
| 254 | + 0x11002, // VX_TYPE_UINT32 |
| 255 | + 0x0, |
| 256 | + ); |
| 257 | +} |
| 258 | +``` |
| 259 | + |
| 260 | +**Potential CTS Issues**: |
| 261 | +1. **Count Type**: Uses `u32` - CTS OpenVX spec may expect `vx_size` (usize/size_t) |
| 262 | +2. **Scalar Type**: Hardcoded `0x11002` (VX_TYPE_UINT32) - CTS may expect type checking against scalar's actual type |
| 263 | +3. **Null Handling**: Optional parameters checked with `is_null()` - CTS may pass invalid but non-null scalars |
| 264 | + |
| 265 | +### 5.2 Value Type Handling |
| 266 | + |
| 267 | +**Value Writing Logic**: |
| 268 | +```rust |
| 269 | +if !min_val_scalar.is_null() { |
| 270 | + if is_s16 { |
| 271 | + let v = result.min_val as i16; |
| 272 | + crate::c_api_data::vxCopyScalarData( |
| 273 | + min_val_scalar, &v as *const i16 as *mut c_void, 0x11002, 0x0); |
| 274 | + } else { |
| 275 | + let v = result.min_val as u8; |
| 276 | + crate::c_api_data::vxCopyScalarData( |
| 277 | + min_val_scalar, &v as *const u8 as *mut c_void, 0x11002, 0x0); |
| 278 | + } |
| 279 | +} |
| 280 | +``` |
| 281 | + |
| 282 | +**Potential CTS Issues**: |
| 283 | +1. **Type Code**: Uses `0x11002` for all types - CTS may expect type-specific codes |
| 284 | +2. **S16 Values**: Written as i16 - CTS may expect raw values or scaled values |
| 285 | +3. **U8 Values**: Written as u8 - CTS may expect different handling |
| 286 | + |
| 287 | +### 5.3 Location Array Handling |
| 288 | + |
| 289 | +**Location Writing**: |
| 290 | +- Uses `vxTruncateArray` followed by `vxAddArrayItems` |
| 291 | +- Converts to `vx_coordinates2d_t` {x, y} format |
| 292 | +- Respects array capacity |
| 293 | + |
| 294 | +**Potential CTS Issues**: |
| 295 | +1. **Array Type**: CTS may expect VX_TYPE_COORDINATES2D specifically |
| 296 | +2. **Order**: Locations written in discovery order - CTS may expect sorted order |
| 297 | +3. **Duplicates**: All locations with min/max value added - CTS may limit count |
| 298 | + |
| 299 | +--- |
| 300 | + |
| 301 | +## 6. CopyImagePatch Implementation |
| 302 | + |
| 303 | +### 6.1 Uniform Image Handling |
| 304 | + |
| 305 | +**Location**: `openvx-image/src/c_api.rs` - `vxCopyImagePatch` |
| 306 | + |
| 307 | +**Key Implementation Details**: |
| 308 | +- **No special uniform handling**: The code does NOT check for uniform images |
| 309 | +- **Direct memory access**: Copies from `img.data` regardless of uniform status |
| 310 | +- **Uniform image creation**: `vxCreateUniformImage` fills data with uniform value but doesn't set special flags |
| 311 | + |
| 312 | +**Relevant Code**: |
| 313 | +```rust |
| 314 | +pub extern "C" fn vxCopyImagePatch( |
| 315 | + image: vx_image, |
| 316 | + rect: *const vx_rectangle_t, |
| 317 | + plane_index: vx_uint32, |
| 318 | + user_addr: *const vx_imagepatch_addressing_t, |
| 319 | + user_ptr: *mut c_void, |
| 320 | + usage: vx_enum, |
| 321 | + mem_type: vx_enum, |
| 322 | + _flags: vx_uint32, |
| 323 | +) -> vx_status { |
| 324 | + // ... validation ... |
| 325 | + let img = unsafe { &mut *(image as *mut VxCImage) }; |
| 326 | + |
| 327 | + // Direct data access - no uniform check |
| 328 | + if let Ok(data) = img.data.read() { |
| 329 | + for y in 0..height { |
| 330 | + let src_start = offset + y * stride_y; |
| 331 | + // ... copy ... |
| 332 | + } |
| 333 | + } |
| 334 | +} |
| 335 | +``` |
| 336 | + |
| 337 | +**Potential CTS Issues**: |
| 338 | +1. **Uniform Image Optimization**: CTS may expect uniform images to return uniform value without actual memory read |
| 339 | +2. **Virtual Image Handling**: May not properly handle virtual images without backing |
| 340 | +3. **External Memory**: Has special handling for `is_external_memory` but may miss edge cases |
| 341 | + |
| 342 | +### 6.2 Image Format Handling |
| 343 | + |
| 344 | +**Format Support**: |
| 345 | +- Planar formats (IYUV, NV12, NV21): Plane index validation and offset calculation |
| 346 | +- Packed formats (YUYV, UYVY): Direct pixel access |
| 347 | +- RGB/RGBA: Standard interleaved access |
| 348 | + |
| 349 | +**Potential CTS Issues**: |
| 350 | +1. **Plane Validation**: `plane_index` validation may reject valid indices for certain formats |
| 351 | +2. **Region Clamping**: Width/height clamped to plane dimensions - CTS may expect error on oversized rects |
| 352 | +3. **Stride Handling**: Uses calculated strides - CTS may expect specific stride patterns |
| 353 | + |
| 354 | +--- |
| 355 | + |
| 356 | +## Summary of Critical CTS Breakage Risks |
| 357 | + |
| 358 | +| Kernel | Risk Level | Key Issue | |
| 359 | +|--------|------------|-----------| |
| 360 | +| HOGCells | **HIGH** | INT16 data type requirement may conflict with CTS U8 expectations | |
| 361 | +| HOGFeatures | **HIGH** | Q7.8 encoding and dimension ordering critical for CTS match | |
| 362 | +| BilateralFilter | **MEDIUM** | S16 constant border handling broken (only reads U8 constant_value) | |
| 363 | +| TensorConvertDepth | **HIGH** | Q7.8 hardcoded for INT16 - CTS may expect raw values | |
| 364 | +| WeightedAverage | **LOW** | Kernel name and enum registration appears correct | |
| 365 | +| MinMaxLoc | **MEDIUM** | minCount/maxCount as u32 may not match CTS vx_size expectation | |
| 366 | +| CopyImagePatch | **MEDIUM** | No uniform image optimization - CTS may expect special handling | |
| 367 | + |
| 368 | +--- |
| 369 | + |
| 370 | +## Recommendations for CTS Compliance |
| 371 | + |
| 372 | +1. **HOG**: Verify CTS expects INT16 vs U8 for magnitudes/bins |
| 373 | +2. **BilateralFilter**: Fix S16 constant border value reading |
| 374 | +3. **TensorConvertDepth**: Add option for raw INT16 (non-Q7.8) handling |
| 375 | +4. **MinMaxLoc**: Change count type to vx_size/usize instead of u32 |
| 376 | +5. **CopyImagePatch**: Add uniform image fast-path for CTS performance tests |
| 377 | +6. **General**: Add comprehensive type validation for all scalar outputs |
0 commit comments