feat(tensor): implement all TensorOp kernels (214/214 passing) #40

Merged
kiritigowda merged 2 commits into kiritigowda:main from simonCatBot:ev-tensor-controlflow on May 26, 2026

@simonCatBot

Collaborator

This PR implements the full TensorOp category for OpenVX Enhanced Vision conformance.

What's implemented

Tensor category (12/12 passing):

  • Fixed VX_TENSOR_* attribute constants to follow OpenVX spec pattern
  • Implemented vxMapTensorPatch, vxUnmapTensorPatch, vxSwapTensorHandle, vxCreateTensorFromHandle, vxCreateVirtualTensor
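The attribute-constant fix can be illustrated with a minimal Rust sketch. It assumes the Khronos vendor ID is 0 and `VX_TYPE_TENSOR` is 0x815, which is consistent with the 0x81500 series and the `(VX_TYPE_TENSOR << 8) | local_index` pattern named in the commit notes. The helper `vx_attribute` and the local indices shown are illustrative and may not match the names used in this crate.

```rust
// Assumed values: Khronos vendor id 0x000 and tensor type id 0x815.
const VX_ID_KHRONOS: u32 = 0x000;
const VX_TYPE_TENSOR: u32 = 0x815;

/// Hypothetical helper: builds an attribute id as
/// (vendor << 20) | (object_type << 8) | local_index.
const fn vx_attribute(vendor: u32, object_type: u32, local_index: u32) -> u32 {
    (vendor << 20) | (object_type << 8) | local_index
}

// With vendor id 0 this reduces to (VX_TYPE_TENSOR << 8) | local_index.
const VX_TENSOR_NUMBER_OF_DIMS: u32 = vx_attribute(VX_ID_KHRONOS, VX_TYPE_TENSOR, 0x0);
const VX_TENSOR_DIMS: u32 = vx_attribute(VX_ID_KHRONOS, VX_TYPE_TENSOR, 0x1);
const VX_TENSOR_DATA_TYPE: u32 = vx_attribute(VX_ID_KHRONOS, VX_TYPE_TENSOR, 0x2);
const VX_TENSOR_FIXED_POINT_POSITION: u32 = vx_attribute(VX_ID_KHRONOS, VX_TYPE_TENSOR, 0x3);
```

Deriving the values from one helper, rather than hard-coding each literal, makes it harder for a single constant to drift back to a 0x00-based value.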

TensorOp category (214/214 passing):

  • vxuTensorAdd / vxTensorAddNode — elementwise add
  • vxuTensorSubtract / vxTensorSubtractNode — elementwise subtract
  • vxuTensorMultiply / vxTensorMultiplyNode — elementwise multiply with scale
  • vxuTensorConvertDepth / vxTensorConvertDepthNode — depth conversion
  • vxuTensorTableLookup / vxTensorTableLookupNode — LUT lookup
  • vxuTensorTranspose / vxTensorTransposeNode — N-D transpose
  • vxuTensorMatrixMultiply / vxTensorMatrixMultiplyNode — 2D GEMM with optional C

Key bug fixes

  • vxCopyTensorPatch write path: fixed UB from immutable reference cast
  • Q78/S8 wrap behavior: use bit-level truncation instead of saturating casts
  • Matrix multiply layout: correctly handles CTS column-major dimension ordering
  • Graph dispatch: allows null C tensor for matrix multiply (optional param)
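The wrap-versus-saturate distinction can be sketched as follows. This is a minimal illustration, assuming the intermediate result is held in a wider `i32` (or an `f32`) before narrowing; the helper names are hypothetical and not taken from vxu_impl.rs. Note that in Rust a float-to-integer `as` cast saturates, which is why a direct `as i16` on a float result cannot produce wrap behavior.

```rust
/// Wrap policy: keep only the low 16 bits, then reinterpret them as signed.
fn wrap_to_i16(value: i32) -> i16 {
    (value & 0xFFFF) as u16 as i16
}

/// Wrap policy for 8-bit signed tensors.
fn wrap_to_i8(value: i32) -> i8 {
    (value & 0xFF) as u8 as i8
}

/// Saturate policy: clamp to the representable range before narrowing.
fn saturate_to_i16(value: i32) -> i16 {
    value.clamp(i16::MIN as i32, i16::MAX as i32) as i16
}

/// Wrapping a float result: widen to i32 first, then truncate the bits.
/// A direct `value as i16` would saturate instead of wrapping.
fn wrap_f32_to_i16(value: f32) -> i16 {
    wrap_to_i16(value as i32)
}
```

For example, 40000 saturates to 32767 but wraps to -25536, so the two policies give visibly different outputs once a sum or product leaves the 16-bit range.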

Test results

Category          Tests   Passing
Tensor            12      12
TensorOp          214     214
HogCells          11      11
HogFeatures       11      11
BilateralFilter   361     361
Total EV          609     609

CI changes

Added tensor-ops job to conformance.yml testing Tensor.* and TensorOp.* filters.

Note: remaining gaps are ControlFlow (186 tests), GraphEnhanced (14 tests), and GraphDelayTensor (3 tests).

Kiriti added 2 commits May 25, 2026 15:06
…nsorHandle, vxCreateTensorFromHandle, vxCreateVirtualTensor

- Fix tensor attribute constants (0x81500 series instead of 0x00)
- Implement vxMapTensorPatch with ROI offset and stride calculation
- Implement vxUnmapTensorPatch returning VX_SUCCESS
- Implement vxSwapTensorHandle copying data from new pointer
- Implement vxCreateTensorFromHandle copying external data
- Fix vxCreateVirtualTensor to extract context from graph properly
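The ROI offset and stride calculation mentioned for vxMapTensorPatch can be sketched as below. This is an illustration under stated assumptions, not the code in this PR: it assumes a densely packed tensor in which dims[0] is the fastest-varying dimension, and the function names `dense_strides` and `patch_offset` are hypothetical.

```rust
/// Byte strides for a densely packed tensor.
/// Assumption: dims[0] is the fastest-varying dimension.
fn dense_strides(dims: &[usize], elem_size: usize) -> Vec<usize> {
    let mut strides = Vec::with_capacity(dims.len());
    let mut step = elem_size;
    for &d in dims {
        strides.push(step); // stride of this dimension, in bytes
        step *= d;          // the next dimension steps over a full extent of this one
    }
    strides
}

/// Byte offset of the first element of a patch that starts at `view_start`.
fn patch_offset(view_start: &[usize], strides: &[usize]) -> usize {
    view_start.iter().zip(strides).map(|(s, st)| s * st).sum()
}
```

For a tensor with dims [4, 3, 2] and 2-byte elements, the strides come out as [2, 8, 24], and a patch starting at [1, 2, 1] begins 42 bytes into the buffer.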

Tensor category: 12/12 passing

Implement 7 tensor operation kernels in vxu_impl.rs:
- vxu_tensor_add_impl — elementwise add with Q78/U8/S8, wrap vs saturate
- vxu_tensor_subtract_impl — elementwise subtract
- vxu_tensor_multiply_impl — elementwise multiply with Q78 fixed-point,
  scale factor, rounding policy
- vxu_tensor_convert_depth_impl — depth conversion with norm/offset scalars
- vxu_tensor_table_lookup_impl — LUT lookup using ARRAYS registry
- vxu_tensor_transpose_impl — generic N-D transpose swapping two dimensions
- vxu_tensor_matrix_multiply_impl — 2D matrix multiply with transpose
  flags for A/B/C and optional C addition
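A generic N-D transpose that swaps two dimensions can be sketched roughly as follows. This is a minimal illustration, assuming a dense tensor with dims[0] as the fastest-varying dimension; `transpose_swap` is a hypothetical name and the real vxu_tensor_transpose_impl may be structured differently.

```rust
/// Transposes a dense tensor by swapping dimensions `dim_a` and `dim_b`.
/// Assumption: dims[0] is the fastest-varying dimension.
/// Returns the transposed data and the output dims, or None on invalid input.
fn transpose_swap<T: Copy>(
    input: &[T],
    dims: &[usize],
    dim_a: usize,
    dim_b: usize,
) -> Option<(Vec<T>, Vec<usize>)> {
    let rank = dims.len();
    if dim_a >= rank || dim_b >= rank || input.len() != dims.iter().product::<usize>() {
        return None;
    }
    let mut out_dims = dims.to_vec();
    out_dims.swap(dim_a, dim_b);

    // Element strides of the output tensor.
    let mut out_strides = vec![1usize; rank];
    for i in 1..rank {
        out_strides[i] = out_strides[i - 1] * out_dims[i - 1];
    }

    // Start from a copy so no default value is needed; every slot is overwritten.
    let mut output = input.to_vec();
    let mut coord = vec![0usize; rank];
    for (linear, &value) in input.iter().enumerate() {
        // Decode the linear index into input coordinates.
        let mut rest = linear;
        for i in 0..rank {
            coord[i] = rest % dims[i];
            rest /= dims[i];
        }
        // Swapping the two coordinates gives the position in the output.
        coord.swap(dim_a, dim_b);
        let dst: usize = coord.iter().zip(&out_strides).map(|(c, s)| c * s).sum();
        output[dst] = value;
    }
    Some((output, out_dims))
}
```

With dims [3, 2] (two rows of three elements), swapping dimensions 0 and 1 turns [1, 2, 3, 4, 5, 6] into [1, 4, 2, 5, 3, 6] with dims [2, 3].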

Fix critical bugs:
- vxCopyTensorPatch write path: use mutable HashMap access instead of
  immutable reference cast to avoid UB and silent write failures
- Q78/S8 wrap behavior: use bit-level truncation (value & mask) instead
  of Rust's saturating as i16/i8
- Tensor attribute constants: correct VX_TENSOR_* values to follow
  (VX_TYPE_TENSOR << 8) | local_index pattern
- Matrix multiply dimension validation: use CTS column-major layout where
  dims[0] is inner/fast dimension, dims[1] is outer/slow dimension
- Graph dispatch for tensor_matrix_multiply: allow null C tensor (optional)
- Graph execution: add exception for tensor_matrix_multiply param 2 being
  optional (param_id == 0 should not be an error)
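The dimension ordering and the optional C tensor can be sketched together. This is a simplified illustration, not the PR's implementation: it assumes `i32` elements, omits the transpose flags, and models the null C tensor as `Option`. Following the layout described above, dims[0] is the inner (fast) dimension, so a matrix with M rows and K columns has dims [K, M]. The name `matmul_optional_c` is hypothetical.

```rust
/// Computes out = A * B (+ C), where dims[0] is the inner/fast dimension.
/// A has dims [K, M], B has dims [N, K], C and the output have dims [N, M].
/// `c` is None when the optional C tensor is absent.
fn matmul_optional_c(
    a: &[i32],
    a_dims: [usize; 2],
    b: &[i32],
    b_dims: [usize; 2],
    c: Option<&[i32]>,
) -> Option<Vec<i32>> {
    let (k, m) = (a_dims[0], a_dims[1]);
    let (n, k_b) = (b_dims[0], b_dims[1]);
    // Validation: A's inner dimension must equal B's outer dimension.
    if k != k_b || a.len() != k * m || b.len() != n * k {
        return None;
    }
    if let Some(c) = c {
        if c.len() != n * m {
            return None;
        }
    }
    let mut out = vec![0i32; n * m];
    for row in 0..m {
        for col in 0..n {
            // Start from C's element when present, otherwise from zero.
            let mut acc = c.map_or(0, |c| c[row * n + col]);
            for i in 0..k {
                acc += a[row * k + i] * b[i * n + col];
            }
            out[row * n + col] = acc;
        }
    }
    Some(out)
}
```

Treating a missing C as a zero addend keeps one code path for both cases, which mirrors the dispatch change that stops rejecting a null parameter 2.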

Also add kernel registrations in c_api.rs (enums 0x41–0x47) and wire
up all tensor op dispatch in unified_c_api.rs.

CI: add tensor-ops job (Tensor + TensorOp categories) to conformance.yml.

@kiritigowda merged commit 84bb325 into kiritigowda:main on May 26, 2026
20 checks passed
@simonCatBot deleted the ev-tensor-controlflow branch on May 27, 2026 17:14
kiritigowda pushed a commit that referenced this pull request on May 28, 2026
Update the OpenVX 1.3.1 coverage plan to current state:

- Mark P2 (Base API + UDO, 10 funcs) as COMPLETE (#16, #18, #23, #24)
- Mark P3 (Enhanced Vision non-tensor, 14 funcs) as COMPLETE (#35, #36, #39)
- Mark P4 (Tensor kernels, 14 funcs) as COMPLETE (#40)
- Mark P5a (Control-flow nodes, 2 funcs) as COMPLETE (#41)
- Update conformance tally: 6,786 / 6,786 tests passing (100%)
- Add open issues status review (#38 stale, #20#22 should close)
- Update coverage trajectory: ~300/361 (~83%) implemented
- Refresh risks and tracking labels

Co-authored-by: Kiriti <kiriti@example.com>