
Conversation

@wsttiger
Collaborator

Add TensorRT Decoder Plugin for Quantum Error Correction

Overview

This PR introduces a TensorRT-based decoder plugin that uses NVIDIA TensorRT to accelerate neural-network inference for quantum error correction (QEC).

Key Features

  • TensorRT Integration: Full TensorRT runtime integration with support for both ONNX model loading and pre-built engine loading (see the engine-building sketch after this list)
  • Flexible Precision Support: Configurable precision modes (fp16, bf16, int8, fp8, tf32, best) with automatic hardware capability detection
  • Memory Management: Efficient CUDA memory allocation and stream-based execution
  • Parameter Validation: Comprehensive input validation with clear error messages
  • Python Utilities: ONNX to TensorRT engine conversion script for model preprocessing
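
For reference, the sketch below shows one way to turn an ONNX model into a serialized TensorRT engine with the public TensorRT C++ API (nvinfer1/nvonnxparser). It is illustrative only, not the plugin's actual implementation; error handling is omitted and the hard-coded fp16 flag stands in for the configurable precision modes.

#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <cstdio>
#include <memory>
#include <string>
#include <vector>

// Minimal TensorRT logger that surfaces warnings and errors.
struct SimpleLogger : nvinfer1::ILogger {
  void log(Severity severity, const char *msg) noexcept override {
    if (severity <= Severity::kWARNING)
      std::printf("[TRT] %s\n", msg);
  }
};

// Parse an ONNX file and build a serialized TensorRT engine in host memory.
std::vector<char> build_engine_from_onnx(const std::string &onnx_path) {
  SimpleLogger logger;
  auto builder = std::unique_ptr<nvinfer1::IBuilder>(
      nvinfer1::createInferBuilder(logger));
  auto network = std::unique_ptr<nvinfer1::INetworkDefinition>(
      builder->createNetworkV2(0));
  auto parser = std::unique_ptr<nvonnxparser::IParser>(
      nvonnxparser::createParser(*network, logger));
  parser->parseFromFile(onnx_path.c_str(),
                        static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));
  auto config = std::unique_ptr<nvinfer1::IBuilderConfig>(
      builder->createBuilderConfig());
  config->setFlag(nvinfer1::BuilderFlag::kFP16); // e.g. the "fp16" mode
  auto serialized = std::unique_ptr<nvinfer1::IHostMemory>(
      builder->buildSerializedNetwork(*network, *config));
  return {static_cast<char *>(serialized->data()),
          static_cast<char *>(serialized->data()) + serialized->size()};
}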

Technical Implementation

  • Core Decoder Class: trt_decoder implementing the decoder interface with TensorRT backend
  • Hardware Detection: Automatic GPU capability detection for optimal precision selection (see the sketch after this list)
  • Error Handling: Robust error handling with graceful fallbacks and informative error messages
  • Plugin Architecture: CMake-based plugin system with conditional TensorRT linking
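
As a rough illustration of capability-based selection (not the plugin's actual detection logic; the thresholds below are assumptions), the device's compute capability could be mapped to a default precision like this:

#include <cuda_runtime_api.h>
#include <string>

// Map the GPU's compute capability to a default precision mode.
// Thresholds are illustrative assumptions, not the plugin's real policy.
std::string default_precision_for_device(int device = 0) {
  cudaDeviceProp prop{};
  if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
    return "best"; // let TensorRT choose if the query fails
  const int cc = prop.major * 10 + prop.minor;
  if (cc >= 89) return "fp8";  // Ada / Hopper and newer
  if (cc >= 80) return "bf16"; // Ampere and newer
  if (cc >= 70) return "fp16"; // Volta and newer
  return "best";
}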

Files Added/Modified

  • libs/qec/include/cudaq/qec/trt_decoder_internal.h - Internal API declarations
  • libs/qec/lib/decoders/plugins/trt_decoder/trt_decoder.cpp - Main decoder implementation
  • libs/qec/lib/decoders/plugins/trt_decoder/CMakeLists.txt - Plugin build configuration
  • libs/qec/python/cudaq_qec/plugins/tensorrt_utils/build_engine_from_onnx.py - Python utility
  • libs/qec/unittests/test_trt_decoder.cpp - Comprehensive unit tests
  • Updated CMakeLists.txt files for integration

Testing

  • ✅ All 8 unit tests passing
  • Parameter validation tests (see the illustrative sketch after this list)
  • File loading utility tests
  • Edge case handling tests
  • Error condition testing
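
For a sense of the testing style, here is an illustrative GoogleTest-style sketch; the test name, the exception type, and the way H is constructed are assumptions for illustration and are not taken from test_trt_decoder.cpp.

#include <gtest/gtest.h>
#include <memory>
#include <stdexcept>
// plus the decoder headers from this PR, e.g. cudaq/qec/trt_decoder_internal.h

TEST(TrtDecoderTest, RejectsMissingModelPath) {
  // Neither "onnx_load_path" nor "engine_load_path" is provided, so
  // construction is expected to fail with an informative error.
  cudaqx::heterogeneous_map params;
  cudaqx::tensor<uint8_t> H({4, 8}); // placeholder parity-check matrix (assumed type)
  EXPECT_THROW(std::make_unique<trt_decoder>(H, params), std::runtime_error);
}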

Usage Example

// Load from ONNX model (H is the code's parity-check matrix)
cudaqx::heterogeneous_map params;
params.insert("onnx_load_path", "model.onnx");
params.insert("precision", "fp16");
auto decoder = std::make_unique<trt_decoder>(H, params);

// Or load pre-built engine
params.clear();
params.insert("engine_load_path", "model.trt");
auto decoder = std::make_unique<trt_decoder>(H, params);
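
The snippet below sketches what decoding might then look like; it assumes the existing cudaq-qec decoder interface (a decode() call taking a syndrome vector and returning a result object), which is not spelled out in this PR description.

// num_syndrome_bits is a placeholder for the syndrome length implied by H.
std::vector<cudaq::qec::float_t> syndrome(num_syndrome_bits, 0.0);
auto result = decoder->decode(syndrome);
// result.converged and result.result then hold the decoder's output (assumed fields).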

Dependencies

  • TensorRT 10.13.3.9+
  • CUDA 12.0+
  • NVIDIA GPU with appropriate compute capability

Performance Benefits

  • GPU-accelerated inference for QEC decoding
  • Optimized precision selection based on hardware capabilities
  • Efficient memory usage with CUDA streams
  • Reduced latency compared to CPU-based decoders

This implementation provides a production-ready TensorRT decoder plugin that can significantly accelerate quantum error correction workflows while maintaining compatibility with the existing CUDA-Q QEC framework.

@copy-pr-bot

copy-pr-bot bot commented Sep 29, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

- Add trt_decoder class implementing TensorRT-accelerated inference
- Support both ONNX model loading and pre-built engine loading
- Include precision configuration (fp16, bf16, int8, fp8, tf32, best)
- Add hardware platform detection for capability-based precision selection
- Implement CUDA memory management and stream-based execution
- Add Python utility script for ONNX to TensorRT engine conversion
- Update CMakeLists.txt to build TensorRT decoder plugin
- Add comprehensive parameter validation and error handling
Signed-off-by: Scott Thornton <[email protected]>
…some of the test cases (more to come)

Signed-off-by: Scott Thornton <[email protected]>
Signed-off-by: Scott Thornton <[email protected]>
@wsttiger
Collaborator Author

/ok to test b645807

@wsttiger
Collaborator Author

/ok to test 8e0ab06

Signed-off-by: Scott Thornton <[email protected]>
@wsttiger
Collaborator Author

/ok to test e9825b3

@wsttiger
Collaborator Author

/ok to test c2c61db

Signed-off-by: Scott Thornton <[email protected]>
Signed-off-by: Scott Thornton <[email protected]>
@wsttiger
Collaborator Author

/ok to test 663ba48

Signed-off-by: Scott Thornton <[email protected]>
Signed-off-by: Scott Thornton <[email protected]>
@wsttiger
Collaborator Author

/ok to test 32e8e64

wsttiger enabled auto-merge (squash) October 31, 2025 18:53
@wsttiger
Collaborator Author

/ok to test 30da7ce

@wsttiger
Collaborator Author

wsttiger commented Nov 1, 2025

/ok to test 5389216

@bmhowe23 (Collaborator) left a comment

Thanks for all the hard work on this @wsttiger! If anyone has any additional review comments, we can address them post-merge.

wsttiger merged commit 63455f2 into NVIDIA:main Nov 1, 2025
19 checks passed
  message(STATUS "TensorRT ONNX parser: ${TENSORRT_ONNX_LIBRARY}")
  target_compile_definitions(${MODULE_NAME} PRIVATE TENSORRT_AVAILABLE)
else()
  message(WARNING "TensorRT not found. Building decoder without TensorRT support.")
Collaborator

I don't think the build succeeds if TensorRT is not installed. At least it doesn't on my machine. Is the whole build supposed to fail if TRT is not found? I would advocate for making this a top-level CMake flag.

Collaborator

Is this resolved by #331?

Collaborator

Is this resolved by #331?

Sort of. #331 lets the user explicitly disable the TRT decoder at CMake time, but if it is left enabled (the default) and TensorRT is missing, the build still fails on missing include files, even though this warning message makes it sound like the decoder should simply be built without TensorRT support.

Collaborator

Got it. Makes sense now.

Collaborator

This may be addressed by #332. @wsttiger take a look and let me know what you think.

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
Collaborator

Is this .onnx file for CI testing only? Or are the users supposed to be downloading this file as well?

- name: Install TensorRT (arm64)
  if: matrix.platform == 'arm64'
  run: |
    apt-cache search tensorrt | awk '{print "Package: "$1"\nPin: version *+cuda13.0\nPin-Priority: 1001\n"}' | tee /etc/apt/preferences.d/tensorrt-cuda13.0.pref > /dev/null
Collaborator

This line is installing CUDA 13 regardless of what ${{matrix.cuda_version}} is, so it is installing it in our 12.6 images, too. I believe (?) this should not be installed for CUDA 12.6 because we are not supporting CUDA 12 + ARM for this, right?

Point of reference: https://github.com/NVIDIA/cudaqx/actions/runs/18989982159/job/54240883357#step:12:41 shows the CUDA 13 version being installed in AR CUDA 12.6. (I found this because our GitLab pipeline is broken for ARM right now, and I am still investigating.)

Collaborator

This may be addressed by #332. @wsttiger take a look and let me know what you think.
