Hi, when running on JetPack 6.2 (L4T 36.4.3), the build fails with the errors below:
Build YoloXP DLA loadable for fp16 and int8
&&&& RUNNING TensorRT.trtexec [TensorRT v100300] # /usr/src/tensorrt/bin/trtexec --onnx=data/model/modified_yolox-sPlus-T4-960x960-pseudo-finetune.onnx --fp16 --useDLACore=0 --buildDLAStandalone --saveEngine=data/loadable/yoloxp.fp16.fp16chwin.fp16chwout.standalone.bin --inputIOFormats=fp16:dla_linear --outputIOFormats=fp16:dla_linear
[08/20/2025-18:16:54] [I] === Model Options ===
[08/20/2025-18:16:54] [I] Format: ONNX
[08/20/2025-18:16:54] [I] Model: data/model/modified_yolox-sPlus-T4-960x960-pseudo-finetune.onnx
[08/20/2025-18:16:54] [I] Output:
[08/20/2025-18:16:54] [I] === Build Options ===
[08/20/2025-18:16:54] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default, tacticSharedMem: default
[08/20/2025-18:16:54] [I] avgTiming: 8
[08/20/2025-18:16:54] [I] Precision: FP32+FP16
[08/20/2025-18:16:54] [I] LayerPrecisions:
[08/20/2025-18:16:54] [I] Layer Device Types:
[08/20/2025-18:16:54] [I] Calibration:
[08/20/2025-18:16:54] [I] Refit: Disabled
[08/20/2025-18:16:54] [I] Strip weights: Disabled
[08/20/2025-18:16:54] [I] Version Compatible: Disabled
[08/20/2025-18:16:54] [I] ONNX Plugin InstanceNorm: Disabled
[08/20/2025-18:16:54] [I] TensorRT runtime: full
[08/20/2025-18:16:54] [I] Lean DLL Path:
[08/20/2025-18:16:54] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[08/20/2025-18:16:54] [I] Exclude Lean Runtime: Disabled
[08/20/2025-18:16:54] [I] Sparsity: Disabled
[08/20/2025-18:16:54] [I] Safe mode: Disabled
[08/20/2025-18:16:54] [I] Build DLA standalone loadable: Enabled
[08/20/2025-18:16:54] [I] Allow GPU fallback for DLA: Disabled
[08/20/2025-18:16:54] [I] DirectIO mode: Disabled
[08/20/2025-18:16:54] [I] Restricted mode: Disabled
[08/20/2025-18:16:54] [I] Skip inference: Enabled
[08/20/2025-18:16:54] [I] Save engine: data/loadable/yoloxp.fp16.fp16chwin.fp16chwout.standalone.bin
[08/20/2025-18:16:54] [I] Load engine:
[08/20/2025-18:16:54] [I] Profiling verbosity: 0
[08/20/2025-18:16:54] [I] Tactic sources: Using default tactic sources
[08/20/2025-18:16:54] [I] timingCacheMode: local
[08/20/2025-18:16:54] [I] timingCacheFile:
[08/20/2025-18:16:54] [I] Enable Compilation Cache: Enabled
[08/20/2025-18:16:54] [I] errorOnTimingCacheMiss: Disabled
[08/20/2025-18:16:54] [I] Preview Features: Use default preview flags.
[08/20/2025-18:16:54] [I] MaxAuxStreams: -1
[08/20/2025-18:16:54] [I] BuilderOptimizationLevel: -1
[08/20/2025-18:16:54] [I] Calibration Profile Index: 0
[08/20/2025-18:16:54] [I] Weight Streaming: Disabled
[08/20/2025-18:16:54] [I] Runtime Platform: Same As Build
[08/20/2025-18:16:54] [I] Debug Tensors:
[08/20/2025-18:16:54] [I] Input(s): fp16:+dla_linear
[08/20/2025-18:16:54] [I] Output(s): fp16:+dla_linear
[08/20/2025-18:16:54] [I] Input build shapes: model
[08/20/2025-18:16:54] [I] Input calibration shapes: model
[08/20/2025-18:16:54] [I] === System Options ===
[08/20/2025-18:16:54] [I] Device: 0
[08/20/2025-18:16:54] [I] DLACore: 0
[08/20/2025-18:16:54] [I] Plugins:
[08/20/2025-18:16:54] [I] setPluginsToSerialize:
[08/20/2025-18:16:54] [I] dynamicPlugins:
[08/20/2025-18:16:54] [I] ignoreParsedPluginLibs: 0
[08/20/2025-18:16:54] [I]
[08/20/2025-18:16:54] [I] === Inference Options ===
[08/20/2025-18:16:54] [I] Batch: Explicit
[08/20/2025-18:16:54] [I] Input inference shapes: model
[08/20/2025-18:16:54] [I] Iterations: 10
[08/20/2025-18:16:54] [I] Duration: 3s (+ 200ms warm up)
[08/20/2025-18:16:54] [I] Sleep time: 0ms
[08/20/2025-18:16:54] [I] Idle time: 0ms
[08/20/2025-18:16:54] [I] Inference Streams: 1
[08/20/2025-18:16:54] [I] ExposeDMA: Disabled
[08/20/2025-18:16:54] [I] Data transfers: Enabled
[08/20/2025-18:16:54] [I] Spin-wait: Disabled
[08/20/2025-18:16:54] [I] Multithreading: Disabled
[08/20/2025-18:16:54] [I] CUDA Graph: Disabled
[08/20/2025-18:16:54] [I] Separate profiling: Disabled
[08/20/2025-18:16:54] [I] Time Deserialize: Disabled
[08/20/2025-18:16:54] [I] Time Refit: Disabled
[08/20/2025-18:16:54] [I] NVTX verbosity: 0
[08/20/2025-18:16:54] [I] Persistent Cache Ratio: 0
[08/20/2025-18:16:54] [I] Optimization Profile Index: 0
[08/20/2025-18:16:54] [I] Weight Streaming Budget: 100.000000%
[08/20/2025-18:16:54] [I] Inputs:
[08/20/2025-18:16:54] [I] Debug Tensor Save Destinations:
[08/20/2025-18:16:54] [I] === Reporting Options ===
[08/20/2025-18:16:54] [I] Verbose: Disabled
[08/20/2025-18:16:54] [I] Averages: 10 inferences
[08/20/2025-18:16:54] [I] Percentiles: 90,95,99
[08/20/2025-18:16:54] [I] Dump refittable layers:Disabled
[08/20/2025-18:16:54] [I] Dump output: Disabled
[08/20/2025-18:16:54] [I] Profile: Disabled
[08/20/2025-18:16:54] [I] Export timing to JSON file:
[08/20/2025-18:16:54] [I] Export output to JSON file:
[08/20/2025-18:16:54] [I] Export profile to JSON file:
[08/20/2025-18:16:54] [I]
[08/20/2025-18:16:54] [I] === Device Information ===
[08/20/2025-18:16:54] [I] Available Devices:
[08/20/2025-18:16:54] [I] Device 0: "Orin" UUID: GPU-c04044dc-605b-52f5-a57f-2d2785097c6b
[08/20/2025-18:16:54] [I] Selected Device: Orin
[08/20/2025-18:16:54] [I] Selected Device ID: 0
[08/20/2025-18:16:54] [I] Selected Device UUID: GPU-c04044dc-605b-52f5-a57f-2d2785097c6b
[08/20/2025-18:16:54] [I] Compute Capability: 8.7
[08/20/2025-18:16:54] [I] SMs: 16
[08/20/2025-18:16:54] [I] Device Global Memory: 62842 MiB
[08/20/2025-18:16:54] [I] Shared Memory per SM: 164 KiB
[08/20/2025-18:16:54] [I] Memory Bus Width: 256 bits (ECC disabled)
[08/20/2025-18:16:54] [I] Application Compute Clock Rate: 1.3 GHz
[08/20/2025-18:16:54] [I] Application Memory Clock Rate: 1.3 GHz
[08/20/2025-18:16:54] [I]
[08/20/2025-18:16:54] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[08/20/2025-18:16:54] [I]
[08/20/2025-18:16:54] [I] TensorRT version: 10.3.0
[08/20/2025-18:16:54] [I] Loading standard plugins
[08/20/2025-18:16:54] [I] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 31, GPU 10245 (MiB)
[08/20/2025-18:16:56] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +928, GPU +753, now: CPU 1002, GPU 11042 (MiB)
[08/20/2025-18:16:56] [I] Start parsing network model.
[08/20/2025-18:16:56] [I] [TRT] ----------------------------------------------------------------
[08/20/2025-18:16:56] [I] [TRT] Input filename: data/model/modified_yolox-sPlus-T4-960x960-pseudo-finetune.onnx
[08/20/2025-18:16:56] [I] [TRT] ONNX IR version: 0.0.6
[08/20/2025-18:16:56] [I] [TRT] Opset version: 11
[08/20/2025-18:16:56] [I] [TRT] Producer name: pytorch
[08/20/2025-18:16:56] [I] [TRT] Producer version: 2.0.0
[08/20/2025-18:16:56] [I] [TRT] Domain:
[08/20/2025-18:16:56] [I] [TRT] Model version: 0
[08/20/2025-18:16:56] [I] [TRT] Doc string:
[08/20/2025-18:16:56] [I] [TRT] ----------------------------------------------------------------
[08/20/2025-18:16:56] [I] Finished parsing network model. Parse time: 0.0700865
[08/20/2025-18:17:03] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[08/20/2025-18:17:20] [I] [TRT] Total Host Persistent Memory: 112
[08/20/2025-18:17:20] [I] [TRT] Total Device Persistent Memory: 0
[08/20/2025-18:17:20] [I] [TRT] Total Scratch Memory: 0
[08/20/2025-18:17:20] [I] [TRT] Total Activation Memory: 0
[08/20/2025-18:17:20] [I] [TRT] Total Weights Memory: 0
[08/20/2025-18:17:20] [I] [TRT] Engine generation completed in 17.3973 seconds.
[08/20/2025-18:17:20] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 29 MiB, GPU 33 MiB
[08/20/2025-18:17:20] [I] [TRT] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 1587 MiB
[08/20/2025-18:17:20] [I] Engine built in 24.7202 sec.
[08/20/2025-18:17:20] [I] Created engine with size: 29.2488 MiB
[08/20/2025-18:17:20] [I] Skipped inference phase since --skipInference is added.
&&&& PASSED TensorRT.trtexec [TensorRT v100300] # /usr/src/tensorrt/bin/trtexec --onnx=data/model/modified_yolox-sPlus-T4-960x960-pseudo-finetune.onnx --fp16 --useDLACore=0 --buildDLAStandalone --saveEngine=data/loadable/yoloxp.fp16.fp16chwin.fp16chwout.standalone.bin --inputIOFormats=fp16:dla_linear --outputIOFormats=fp16:dla_linear
&&&& RUNNING TensorRT.trtexec [TensorRT v100300] # /usr/src/tensorrt/bin/trtexec --onnx=data/model/modified_yolox-sPlus-T4-960x960-pseudo-finetune.onnx --useDLACore=0 --buildDLAStandalone --saveEngine=data/loadable/yoloxp.int8.int8chwin.fp16chwout.standalone.bin --inputIOFormats=int8:dla_linear --outputIOFormats=fp16:dla_linear --int8 --fp16 --calib=data/model/yoloXP.cache --precisionConstraints=obey --layerPrecisions=/head/Concat_2:fp16,/head/Concat_1:fp16,/head/Concat:fp16
[08/20/2025-18:17:21] [I] === Model Options ===
[08/20/2025-18:17:21] [I] Format: ONNX
[08/20/2025-18:17:21] [I] Model: data/model/modified_yolox-sPlus-T4-960x960-pseudo-finetune.onnx
[08/20/2025-18:17:21] [I] Output:
[08/20/2025-18:17:21] [I] === Build Options ===
[08/20/2025-18:17:21] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default, tacticSharedMem: default
[08/20/2025-18:17:21] [I] avgTiming: 8
[08/20/2025-18:17:21] [I] Precision: FP32+FP16+INT8 (obey precision constraints)
[08/20/2025-18:17:21] [I] LayerPrecisions: /head/Concat:fp16,/head/Concat_1:fp16,/head/Concat_2:fp16
[08/20/2025-18:17:21] [I] Layer Device Types:
[08/20/2025-18:17:21] [I] Calibration: data/model/yoloXP.cache
[08/20/2025-18:17:21] [I] Refit: Disabled
[08/20/2025-18:17:21] [I] Strip weights: Disabled
[08/20/2025-18:17:21] [I] Version Compatible: Disabled
[08/20/2025-18:17:21] [I] ONNX Plugin InstanceNorm: Disabled
[08/20/2025-18:17:21] [I] TensorRT runtime: full
[08/20/2025-18:17:21] [I] Lean DLL Path:
[08/20/2025-18:17:21] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[08/20/2025-18:17:21] [I] Exclude Lean Runtime: Disabled
[08/20/2025-18:17:21] [I] Sparsity: Disabled
[08/20/2025-18:17:21] [I] Safe mode: Disabled
[08/20/2025-18:17:21] [I] Build DLA standalone loadable: Enabled
[08/20/2025-18:17:21] [I] Allow GPU fallback for DLA: Disabled
[08/20/2025-18:17:21] [I] DirectIO mode: Disabled
[08/20/2025-18:17:21] [I] Restricted mode: Disabled
[08/20/2025-18:17:21] [I] Skip inference: Enabled
[08/20/2025-18:17:21] [I] Save engine: data/loadable/yoloxp.int8.int8chwin.fp16chwout.standalone.bin
[08/20/2025-18:17:21] [I] Load engine:
[08/20/2025-18:17:21] [I] Profiling verbosity: 0
[08/20/2025-18:17:21] [I] Tactic sources: Using default tactic sources
[08/20/2025-18:17:21] [I] timingCacheMode: local
[08/20/2025-18:17:21] [I] timingCacheFile:
[08/20/2025-18:17:21] [I] Enable Compilation Cache: Enabled
[08/20/2025-18:17:21] [I] errorOnTimingCacheMiss: Disabled
[08/20/2025-18:17:21] [I] Preview Features: Use default preview flags.
[08/20/2025-18:17:21] [I] MaxAuxStreams: -1
[08/20/2025-18:17:21] [I] BuilderOptimizationLevel: -1
[08/20/2025-18:17:21] [I] Calibration Profile Index: 0
[08/20/2025-18:17:21] [I] Weight Streaming: Disabled
[08/20/2025-18:17:21] [I] Runtime Platform: Same As Build
[08/20/2025-18:17:21] [I] Debug Tensors:
[08/20/2025-18:17:21] [I] Input(s): int8:+dla_linear
[08/20/2025-18:17:21] [I] Output(s): fp16:+dla_linear
[08/20/2025-18:17:21] [I] Input build shapes: model
[08/20/2025-18:17:21] [I] Input calibration shapes: model
[08/20/2025-18:17:21] [I] === System Options ===
[08/20/2025-18:17:21] [I] Device: 0
[08/20/2025-18:17:21] [I] DLACore: 0
[08/20/2025-18:17:21] [I] Plugins:
[08/20/2025-18:17:21] [I] setPluginsToSerialize:
[08/20/2025-18:17:21] [I] dynamicPlugins:
[08/20/2025-18:17:21] [I] ignoreParsedPluginLibs: 0
[08/20/2025-18:17:21] [I]
[08/20/2025-18:17:21] [I] === Inference Options ===
[08/20/2025-18:17:21] [I] Batch: Explicit
[08/20/2025-18:17:21] [I] Input inference shapes: model
[08/20/2025-18:17:21] [I] Iterations: 10
[08/20/2025-18:17:21] [I] Duration: 3s (+ 200ms warm up)
[08/20/2025-18:17:21] [I] Sleep time: 0ms
[08/20/2025-18:17:21] [I] Idle time: 0ms
[08/20/2025-18:17:21] [I] Inference Streams: 1
[08/20/2025-18:17:21] [I] ExposeDMA: Disabled
[08/20/2025-18:17:21] [I] Data transfers: Enabled
[08/20/2025-18:17:21] [I] Spin-wait: Disabled
[08/20/2025-18:17:21] [I] Multithreading: Disabled
[08/20/2025-18:17:21] [I] CUDA Graph: Disabled
[08/20/2025-18:17:21] [I] Separate profiling: Disabled
[08/20/2025-18:17:21] [I] Time Deserialize: Disabled
[08/20/2025-18:17:21] [I] Time Refit: Disabled
[08/20/2025-18:17:21] [I] NVTX verbosity: 0
[08/20/2025-18:17:21] [I] Persistent Cache Ratio: 0
[08/20/2025-18:17:21] [I] Optimization Profile Index: 0
[08/20/2025-18:17:21] [I] Weight Streaming Budget: 100.000000%
[08/20/2025-18:17:21] [I] Inputs:
[08/20/2025-18:17:21] [I] Debug Tensor Save Destinations:
[08/20/2025-18:17:21] [I] === Reporting Options ===
[08/20/2025-18:17:21] [I] Verbose: Disabled
[08/20/2025-18:17:21] [I] Averages: 10 inferences
[08/20/2025-18:17:21] [I] Percentiles: 90,95,99
[08/20/2025-18:17:21] [I] Dump refittable layers:Disabled
[08/20/2025-18:17:21] [I] Dump output: Disabled
[08/20/2025-18:17:21] [I] Profile: Disabled
[08/20/2025-18:17:21] [I] Export timing to JSON file:
[08/20/2025-18:17:21] [I] Export output to JSON file:
[08/20/2025-18:17:21] [I] Export profile to JSON file:
[08/20/2025-18:17:21] [I]
[08/20/2025-18:17:21] [I] === Device Information ===
[08/20/2025-18:17:21] [I] Available Devices:
[08/20/2025-18:17:21] [I] Device 0: "Orin" UUID: GPU-c04044dc-605b-52f5-a57f-2d2785097c6b
[08/20/2025-18:17:21] [I] Selected Device: Orin
[08/20/2025-18:17:21] [I] Selected Device ID: 0
[08/20/2025-18:17:21] [I] Selected Device UUID: GPU-c04044dc-605b-52f5-a57f-2d2785097c6b
[08/20/2025-18:17:21] [I] Compute Capability: 8.7
[08/20/2025-18:17:21] [I] SMs: 16
[08/20/2025-18:17:21] [I] Device Global Memory: 62842 MiB
[08/20/2025-18:17:21] [I] Shared Memory per SM: 164 KiB
[08/20/2025-18:17:21] [I] Memory Bus Width: 256 bits (ECC disabled)
[08/20/2025-18:17:21] [I] Application Compute Clock Rate: 1.3 GHz
[08/20/2025-18:17:21] [I] Application Memory Clock Rate: 1.3 GHz
[08/20/2025-18:17:21] [I]
[08/20/2025-18:17:21] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[08/20/2025-18:17:21] [I]
[08/20/2025-18:17:21] [I] TensorRT version: 10.3.0
[08/20/2025-18:17:21] [I] Loading standard plugins
[08/20/2025-18:17:21] [I] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 31, GPU 10242 (MiB)
[08/20/2025-18:17:23] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +928, GPU +749, now: CPU 1002, GPU 11035 (MiB)
[08/20/2025-18:17:23] [I] Start parsing network model.
[08/20/2025-18:17:23] [I] [TRT] ----------------------------------------------------------------
[08/20/2025-18:17:23] [I] [TRT] Input filename: data/model/modified_yolox-sPlus-T4-960x960-pseudo-finetune.onnx
[08/20/2025-18:17:23] [I] [TRT] ONNX IR version: 0.0.6
[08/20/2025-18:17:23] [I] [TRT] Opset version: 11
[08/20/2025-18:17:23] [I] [TRT] Producer name: pytorch
[08/20/2025-18:17:23] [I] [TRT] Producer version: 2.0.0
[08/20/2025-18:17:23] [I] [TRT] Domain:
[08/20/2025-18:17:23] [I] [TRT] Model version: 0
[08/20/2025-18:17:23] [I] [TRT] Doc string:
[08/20/2025-18:17:23] [I] [TRT] ----------------------------------------------------------------
[08/20/2025-18:17:23] [I] Finished parsing network model. Parse time: 0.0698041
[08/20/2025-18:17:23] [I] Set layer /head/Concat to precision fp16
[08/20/2025-18:17:23] [I] Set layer /head/Concat_1 to precision fp16
[08/20/2025-18:17:23] [I] Set layer /head/Concat_2 to precision fp16
[08/20/2025-18:17:23] [E] Error[3]: IBuilderConfig::setFlag: Error Code 3: API Usage Error (Parameter check failed, condition: builderFlag != BuilderFlag::kPREFER_PRECISION_CONSTRAINTS || !flags[BuilderFlag::kOBEY_PRECISION_CONSTRAINTS]. kPREFER_PRECISION_CONSTRAINTS cannot be set if kOBEY_PRECISION_CONSTRAINTS is set.)
[08/20/2025-18:17:23] [I] [TRT] Calibration table does not match calibrator algorithm type.
[08/20/2025-18:17:23] [I] [TRT] Perform graph optimization on calibration graph.
[08/20/2025-18:17:23] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[08/20/2025-18:17:26] [I] [TRT] [GraphReduction] The approximate region cut reduction algorithm is called.
[08/20/2025-18:17:26] [I] [TRT] [GraphReduction] The approximate region cut reduction algorithm is called.
[08/20/2025-18:17:26] [I] [TRT] Detected 1 inputs and 12 output network tensors.
[08/20/2025-18:17:29] [I] [TRT] Total Host Persistent Memory: 529952
[08/20/2025-18:17:29] [I] [TRT] Total Device Persistent Memory: 3314176
[08/20/2025-18:17:29] [I] [TRT] Total Scratch Memory: 4608
[08/20/2025-18:17:29] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 298 steps to complete.
[08/20/2025-18:17:29] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 18.8033ms to assign 10 blocks to 298 nodes requiring 97689600 bytes.
[08/20/2025-18:17:29] [I] [TRT] Total Activation Memory: 97689600
[08/20/2025-18:17:29] [I] [TRT] Total Weights Memory: 90619392
[08/20/2025-18:17:29] [I] [TRT] Engine generation completed in 5.44279 seconds.
[08/20/2025-18:17:29] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +96, now: CPU 0, GPU 227 (MiB)
[08/20/2025-18:17:29] [I] [TRT] Starting Calibration.
[08/20/2025-18:17:29] [E] Error[3]: IExecutionContext::executeV2: Error Code 3: API Usage Error (Parameter check failed, condition: nullPtrAllowed. Tensor "images" is bound to nullptr, which is allowed only for an empty input tensor, shape tensor, or an output tensor associated with an IOuputAllocator.)
[08/20/2025-18:17:29] [E] Error[2]: [calibrator.cpp::calibrateEngine::1236] Error Code 2: Internal Error (Assertion context->executeV2(bindings.data()) failed. )
[08/20/2025-18:17:29] [E] Engine could not be created from network
[08/20/2025-18:17:29] [E] Building engine failed
[08/20/2025-18:17:29] [E] Failed to create engine from model or file.
[08/20/2025-18:17:29] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v100300] # /usr/src/tensorrt/bin/trtexec --onnx=data/model/modified_yolox-sPlus-T4-960x960-pseudo-finetune.onnx --useDLACore=0 --buildDLAStandalone --saveEngine=data/loadable/yoloxp.int8.int8chwin.fp16chwout.standalone.bin --inputIOFormats=int8:dla_linear --outputIOFormats=fp16:dla_linear --int8 --fp16 --calib=data/model/yoloXP.cache --precisionConstraints=obey --layerPrecisions=/head/Concat_2:fp16,/head/Concat_1:fp16,/head/Concat:fp16
By the way, I can run the example from https://github.com/NVIDIA-AI-IOT/cuDLA-samples.git on both JetPack 5.1.2 (L4T 35.4.1) and JetPack 6.2 (L4T 36.4.3).
Can you change your code so it works on both JetPack versions?
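For reference, the log shows two distinct failures in the INT8 build: the `IBuilderConfig::setFlag` error (TensorRT 10.x rejects setting `kPREFER_PRECISION_CONSTRAINTS` once `kOBEY_PRECISION_CONSTRAINTS` is already set), and the calibration abort ("Tensor \"images\" is bound to nullptr"), which the note "Calibration table does not match calibrator algorithm type" suggests is related to the TRT 8.x-era `yoloXP.cache` not being accepted by the TRT 10.3 calibrator. As an untested guess at working around the first error, the same command can be run with the constraint mode relaxed from `obey` to `prefer` (both are valid values of trtexec's `--precisionConstraints` option):

```shell
# Hypothetical variant of the failing INT8 command: identical flags,
# except --precisionConstraints=prefer instead of obey, so that only
# kPREFER_PRECISION_CONSTRAINTS is set on the builder config.
/usr/src/tensorrt/bin/trtexec \
  --onnx=data/model/modified_yolox-sPlus-T4-960x960-pseudo-finetune.onnx \
  --useDLACore=0 --buildDLAStandalone \
  --saveEngine=data/loadable/yoloxp.int8.int8chwin.fp16chwout.standalone.bin \
  --inputIOFormats=int8:dla_linear --outputIOFormats=fp16:dla_linear \
  --int8 --fp16 --calib=data/model/yoloXP.cache \
  --precisionConstraints=prefer \
  --layerPrecisions=/head/Concat_2:fp16,/head/Concat_1:fp16,/head/Concat:fp16
```

This does not address the calibration-cache mismatch, which may need the cache to be regenerated under TensorRT 10.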
[08/20/2025-18:17:23] [I] Finished parsing network model. Parse time: 0.0698041
[08/20/2025-18:17:23] [I] Set layer /head/Concat to precision fp16
[08/20/2025-18:17:23] [I] Set layer /head/Concat_1 to precision fp16
[08/20/2025-18:17:23] [I] Set layer /head/Concat_2 to precision fp16
[08/20/2025-18:17:23] [E] Error[3]: IBuilderConfig::setFlag: Error Code 3: API Usage Error (Parameter check failed, condition: builderFlag != BuilderFlag::kPREFER_PRECISION_CONSTRAINTS || !flags[BuilderFlag::kOBEY_PRECISION_CONSTRAINTS]. kPREFER_PRECISION_CONSTRAINTS cannot be set if kOBEY_PRECISION_CONSTRAINTS is set.)
[08/20/2025-18:17:23] [I] [TRT] Calibration table does not match calibrator algorithm type.
[08/20/2025-18:17:23] [I] [TRT] Perform graph optimization on calibration graph.
[08/20/2025-18:17:23] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[08/20/2025-18:17:26] [I] [TRT] [GraphReduction] The approximate region cut reduction algorithm is called.
[08/20/2025-18:17:26] [I] [TRT] [GraphReduction] The approximate region cut reduction algorithm is called.
[08/20/2025-18:17:26] [I] [TRT] Detected 1 inputs and 12 output network tensors.
[08/20/2025-18:17:29] [I] [TRT] Total Host Persistent Memory: 529952
[08/20/2025-18:17:29] [I] [TRT] Total Device Persistent Memory: 3314176
[08/20/2025-18:17:29] [I] [TRT] Total Scratch Memory: 4608
[08/20/2025-18:17:29] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 298 steps to complete.
[08/20/2025-18:17:29] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 18.8033ms to assign 10 blocks to 298 nodes requiring 97689600 bytes.
[08/20/2025-18:17:29] [I] [TRT] Total Activation Memory: 97689600
[08/20/2025-18:17:29] [I] [TRT] Total Weights Memory: 90619392
[08/20/2025-18:17:29] [I] [TRT] Engine generation completed in 5.44279 seconds.
[08/20/2025-18:17:29] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +96, now: CPU 0, GPU 227 (MiB)
[08/20/2025-18:17:29] [I] [TRT] Starting Calibration.
[08/20/2025-18:17:29] [E] Error[3]: IExecutionContext::executeV2: Error Code 3: API Usage Error (Parameter check failed, condition: nullPtrAllowed. Tensor "images" is bound to nullptr, which is allowed only for an empty input tensor, shape tensor, or an output tensor associated with an IOuputAllocator.)
[08/20/2025-18:17:29] [E] Error[2]: [calibrator.cpp::calibrateEngine::1236] Error Code 2: Internal Error (Assertion context->executeV2(bindings.data()) failed. )
[08/20/2025-18:17:29] [E] Engine could not be created from network
[08/20/2025-18:17:29] [E] Building engine failed
[08/20/2025-18:17:29] [E] Failed to create engine from model or file.
[08/20/2025-18:17:29] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v100300] # /usr/src/tensorrt/bin/trtexec --onnx=data/model/modified_yolox-sPlus-T4-960x960-pseudo-finetune.onnx --useDLACore=0 --buildDLAStandalone --saveEngine=data/loadable/yoloxp.int8.int8chwin.fp16chwout.standalone.bin --inputIOFormats=int8:dla_linear --outputIOFormats=fp16:dla_linear --int8 --fp16 --calib=data/model/yoloXP.cache --precisionConstraints=obey --layerPrecisions=/head/Concat_2:fp16,/head/Concat_1:fp16,/head/Concat:fp16
By the way, I can run the example from https://github.com/NVIDIA-AI-IOT/cuDLA-samples.git on both JetPack 5.1.2 (L4T 35.4.1) and JetPack 6.2 (L4T 36.4.3).
Can you change your code so it works on both JetPack versions?
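For what it's worth, the first error above (`kPREFER_PRECISION_CONSTRAINTS cannot be set if kOBEY_PRECISION_CONSTRAINTS is set`) suggests the TensorRT 10 build path on JetPack 6.x rejects a flag combination that TensorRT 8.x on JetPack 5.x accepted. A minimal sketch of one way a build script could branch on the installed TensorRT major version is below; the exact flag sets per branch are assumptions for illustration, not a verified fix, and `choose_flags` is a hypothetical helper, not part of cuDLA-samples:

```shell
# choose_flags: print the precision-related trtexec flags for a given
# TensorRT major version. The per-branch flag choice is an assumption.
choose_flags() {
  major="$1"
  if [ "$major" -ge 10 ]; then
    # TensorRT 10 (JetPack 6.x): avoid combining --precisionConstraints=obey
    # with the prefer-precision flag, which trips the check in the log above
    printf '%s' "--layerPrecisions=/head/Concat_2:fp16,/head/Concat_1:fp16,/head/Concat:fp16"
  else
    # TensorRT 8.x (JetPack 5.x): the original flag set builds cleanly
    printf '%s' "--precisionConstraints=obey --layerPrecisions=/head/Concat_2:fp16,/head/Concat_1:fp16,/head/Concat:fp16"
  fi
}

# Detect the installed major version via the tensorrt Python wheel if present;
# fall back to 0 so the legacy branch is taken when detection fails.
TRT_MAJOR="${TRT_MAJOR:-$(python3 -c 'import tensorrt as t; print(t.__version__.split(".")[0])' 2>/dev/null || echo 0)}"
echo "Selected flags for TRT major $TRT_MAJOR: $(choose_flags "$TRT_MAJOR")"
```

The chosen flags would then be appended to the same `trtexec --onnx=... --useDLACore=0 --buildDLAStandalone ...` invocation shown at the top of the log, keeping one script for both JetPack releases.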