
Conversation

@wsttiger
Collaborator

Add TensorRT Decoder Plugin for Quantum Error Correction

Overview

This PR introduces a TensorRT-based decoder plugin that uses NVIDIA TensorRT to accelerate neural-network inference for quantum error correction (QEC).

Key Features

  • TensorRT Integration: Full TensorRT runtime integration with support for both ONNX model loading and pre-built engine loading (see the engine-building sketch after this list)
  • Flexible Precision Support: Configurable precision modes (fp16, bf16, int8, fp8, tf32, best) with automatic hardware capability detection
  • Memory Management: Efficient CUDA memory allocation and stream-based execution
  • Parameter Validation: Comprehensive input validation with clear error messages
  • Python Utilities: ONNX to TensorRT engine conversion script for model preprocessing
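
For reference, the sketch below shows one way to turn an ONNX model into a serialized TensorRT engine with the public TensorRT C++ API (nvinfer1/nvonnxparser). It is illustrative only, not the plugin's actual implementation; error handling is omitted and the hard-coded fp16 flag stands in for the configurable precision modes.

#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <cstdio>
#include <memory>
#include <string>
#include <vector>

// Minimal TensorRT logger that surfaces warnings and errors.
struct SimpleLogger : nvinfer1::ILogger {
  void log(Severity severity, const char *msg) noexcept override {
    if (severity <= Severity::kWARNING)
      std::printf("[TRT] %s\n", msg);
  }
};

// Parse an ONNX file and build a serialized TensorRT engine in host memory.
std::vector<char> build_engine_from_onnx(const std::string &onnx_path) {
  SimpleLogger logger;
  auto builder = std::unique_ptr<nvinfer1::IBuilder>(
      nvinfer1::createInferBuilder(logger));
  auto network = std::unique_ptr<nvinfer1::INetworkDefinition>(
      builder->createNetworkV2(0));
  auto parser = std::unique_ptr<nvonnxparser::IParser>(
      nvonnxparser::createParser(*network, logger));
  parser->parseFromFile(onnx_path.c_str(),
                        static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));
  auto config = std::unique_ptr<nvinfer1::IBuilderConfig>(
      builder->createBuilderConfig());
  config->setFlag(nvinfer1::BuilderFlag::kFP16); // e.g. the "fp16" mode
  auto serialized = std::unique_ptr<nvinfer1::IHostMemory>(
      builder->buildSerializedNetwork(*network, *config));
  return {static_cast<char *>(serialized->data()),
          static_cast<char *>(serialized->data()) + serialized->size()};
}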

Technical Implementation

  • Core Decoder Class: trt_decoder implementing the decoder interface with TensorRT backend
  • Hardware Detection: Automatic GPU capability detection for optimal precision selection (see the sketch after this list)
  • Error Handling: Robust error handling with graceful fallbacks and informative error messages
  • Plugin Architecture: CMake-based plugin system with conditional TensorRT linking
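
As a rough illustration of capability-based selection (not the plugin's actual detection logic; the thresholds below are assumptions), the device's compute capability could be mapped to a default precision like this:

#include <cuda_runtime_api.h>
#include <string>

// Map the GPU's compute capability to a default precision mode.
// Thresholds are illustrative assumptions, not the plugin's real policy.
std::string default_precision_for_device(int device = 0) {
  cudaDeviceProp prop{};
  if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
    return "best"; // let TensorRT choose if the query fails
  const int cc = prop.major * 10 + prop.minor;
  if (cc >= 89) return "fp8";  // Ada / Hopper and newer
  if (cc >= 80) return "bf16"; // Ampere and newer
  if (cc >= 70) return "fp16"; // Volta and newer
  return "best";
}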

Files Added/Modified

  • libs/qec/include/cudaq/qec/trt_decoder_internal.h - Internal API declarations
  • libs/qec/lib/decoders/plugins/trt_decoder/trt_decoder.cpp - Main decoder implementation
  • libs/qec/lib/decoders/plugins/trt_decoder/CMakeLists.txt - Plugin build configuration
  • libs/qec/python/cudaq_qec/plugins/tensorrt_utils/build_engine_from_onnx.py - Python utility
  • libs/qec/unittests/test_trt_decoder.cpp - Comprehensive unit tests
  • Updated CMakeLists.txt files for integration

Testing

  • ✅ All 8 unit tests passing
  • Parameter validation tests (see the illustrative sketch after this list)
  • File loading utility tests
  • Edge case handling tests
  • Error condition testing
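
For a sense of the testing style, here is an illustrative GoogleTest-style sketch; the test name, the exception type, and the way H is constructed are assumptions for illustration and are not taken from test_trt_decoder.cpp.

#include <gtest/gtest.h>
#include <memory>
#include <stdexcept>
// plus the decoder headers from this PR, e.g. cudaq/qec/trt_decoder_internal.h

TEST(TrtDecoderTest, RejectsMissingModelPath) {
  // Neither "onnx_load_path" nor "engine_load_path" is provided, so
  // construction is expected to fail with an informative error.
  cudaqx::heterogeneous_map params;
  cudaqx::tensor<uint8_t> H({4, 8}); // placeholder parity-check matrix (assumed type)
  EXPECT_THROW(std::make_unique<trt_decoder>(H, params), std::runtime_error);
}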

Usage Example

// Load from ONNX model (H is the code's parity-check matrix)
cudaqx::heterogeneous_map params;
params.insert("onnx_load_path", "model.onnx");
params.insert("precision", "fp16");
auto decoder = std::make_unique<trt_decoder>(H, params);

// Or load pre-built engine
params.clear();
params.insert("engine_load_path", "model.trt");
auto decoder = std::make_unique<trt_decoder>(H, params);
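
The snippet below sketches what decoding might then look like; it assumes the existing cudaq-qec decoder interface (a decode() call taking a syndrome vector and returning a result object), which is not spelled out in this PR description.

// num_syndrome_bits is a placeholder for the syndrome length implied by H.
std::vector<cudaq::qec::float_t> syndrome(num_syndrome_bits, 0.0);
auto result = decoder->decode(syndrome);
// result.converged and result.result then hold the decoder's output (assumed fields).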

Dependencies

  • TensorRT 10.13.3.9+
  • CUDA 12.0+
  • NVIDIA GPU with appropriate compute capability

Performance Benefits

  • GPU-accelerated inference for QEC decoding
  • Optimized precision selection based on hardware capabilities
  • Efficient memory usage with CUDA streams
  • Reduced latency compared to CPU-based decoders

This implementation provides a production-ready TensorRT decoder plugin that can significantly accelerate quantum error correction workflows while maintaining compatibility with the existing CUDA-Q QEC framework.

@copy-pr-bot

copy-pr-bot bot commented Sep 29, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

- Add trt_decoder class implementing TensorRT-accelerated inference
- Support both ONNX model loading and pre-built engine loading
- Include precision configuration (fp16, bf16, int8, fp8, tf32, best)
- Add hardware platform detection for capability-based precision selection
- Implement CUDA memory management and stream-based execution
- Add Python utility script for ONNX to TensorRT engine conversion
- Update CMakeLists.txt to build TensorRT decoder plugin
- Add comprehensive parameter validation and error handling
Signed-off-by: Scott Thornton <[email protected]>
…some of the test cases (more to come)

Signed-off-by: Scott Thornton <[email protected]>
Signed-off-by: Scott Thornton <[email protected]>
@wsttiger
Collaborator Author

/ok to test b645807

@wsttiger
Collaborator Author

/ok to test 8e0ab06

Signed-off-by: Scott Thornton <[email protected]>
@wsttiger
Collaborator Author

/ok to test e9825b3

@wsttiger
Collaborator Author

/ok to test c2c61db

Signed-off-by: Scott Thornton <[email protected]>
Signed-off-by: Scott Thornton <[email protected]>
@wsttiger
Collaborator Author

/ok to test 663ba48

Signed-off-by: Scott Thornton <[email protected]>
Signed-off-by: Scott Thornton <[email protected]>
@wsttiger
Collaborator Author

/ok to test 32e8e64

wsttiger enabled auto-merge (squash) October 31, 2025 18:53
@wsttiger
Collaborator Author

/ok to test 30da7ce

@wsttiger
Collaborator Author

wsttiger commented Nov 1, 2025

/ok to test 5389216

@bmhowe23 (Collaborator) left a comment

Thanks for all the hard work on this @wsttiger! If anyone has any additional review comments, we can address them post-merge.

wsttiger merged commit 63455f2 into NVIDIA:main Nov 1, 2025
19 checks passed
  message(STATUS "TensorRT ONNX parser: ${TENSORRT_ONNX_LIBRARY}")
  target_compile_definitions(${MODULE_NAME} PRIVATE TENSORRT_AVAILABLE)
else()
  message(WARNING "TensorRT not found. Building decoder without TensorRT support.")
Collaborator

I don't think the build succeeds if TensorRT is not installed. At least it doesn't on my machine. Is the whole build supposed to fail if TRT is not found? I would advocate for making this a top-level CMake flag.

Collaborator

Is this resolved by #331?

Collaborator

Is this resolved by #331?

Sort of. #331 lets the user explicitly disable the TRT decoder at CMake time, but if it is left enabled (the default) and TensorRT is missing, the build still fails on missing include files, even though this warning message makes it sound like the decoder should simply be built without TensorRT support.

Collaborator

Got it. Makes sense now.

Collaborator

This may be addressed by #332. @wsttiger take a look and let me know what you think.

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
Collaborator

Is this .onnx file for CI testing only? Or are the users supposed to be downloading this file as well?

- name: Install TensorRT (arm64)
  if: matrix.platform == 'arm64'
  run: |
    apt-cache search tensorrt | awk '{print "Package: "$1"\nPin: version *+cuda13.0\nPin-Priority: 1001\n"}' | tee /etc/apt/preferences.d/tensorrt-cuda13.0.pref > /dev/null
Collaborator

This line is installing CUDA 13 regardless of what ${{matrix.cuda_version}} is, so it is installing it in our 12.6 images, too. I believe (?) this should not be installed for CUDA 12.6 because we are not supporting CUDA 12 + ARM for this, right?

Point of reference: https://github.com/NVIDIA/cudaqx/actions/runs/18989982159/job/54240883357#step:12:41 shows the CUDA 13 version being installed in AR CUDA 12.6. (I found this because our GitLab pipeline is broken for ARM right now, and I am still investigating.)

Collaborator

This may be addressed by #332. @wsttiger take a look and let me know what you think.
