Commit e9df7f9

* Upgrade presets for ONNX Runtime 1.25.1 (pull #1753)
1 parent 15b571e commit e9df7f9

38 files changed

Lines changed: 1619 additions & 882 deletions

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
 * Add new `SampleOnnxMNIST` in samples for TensorRT ([pull #1742](https://github.com/bytedeco/javacpp-presets/pull/1742))
 * Fix loading issues with `libomp.dylib` and `libiomp5.dylib` for DNNL and PyTorch on Mac
 * Include `model_package_loader.h` header file in presets for PyTorch ([issue #1729](https://github.com/bytedeco/javacpp-presets/issues/1729))
-* Upgrade presets for FFmpeg 8.1, OpenBLAS 0.3.32, CUDA 13.2.1, cuDNN 9.21.1.3, NCCL 2.30.4, nvCOMP 5.2.0.10, CPython 3.14.4, NumPy 2.4.4, SciPy 1.17.1, LLVM 22.1.1, PyTorch 2.11.0, TensorFlow Lite 2.21.0, TensorRT 10.16.1.11, Triton Inference Server 2.68.0, ONNX 1.21.0, ONNX Runtime 1.24.4 ([pull #1750](https://github.com/bytedeco/javacpp-presets/pull/1750)), and their dependencies
+* Upgrade presets for FFmpeg 8.1, OpenBLAS 0.3.32, CUDA 13.2.1, cuDNN 9.21.1.3, NCCL 2.30.4, nvCOMP 5.2.0.10, CPython 3.14.4, NumPy 2.4.4, SciPy 1.17.1, LLVM 22.1.1, PyTorch 2.11.0, TensorFlow Lite 2.21.0, TensorRT 10.16.1.11, Triton Inference Server 2.68.0, ONNX 1.21.0, ONNX Runtime 1.25.1 ([pull #1753](https://github.com/bytedeco/javacpp-presets/pull/1753)), and their dependencies
 * Compile classes with `parameters` bumping minimum requirements to Java SE 8 and Android 7.0 ([issue #1739](https://github.com/bytedeco/javacpp-presets/issues/1739))

 ### February 22, 2026 version 1.5.13

README.md

Lines changed: 1 addition & 1 deletion
@@ -234,7 +234,7 @@ Each child module in turn relies by default on the included [`cppbuild.sh` scrip
 * DepthAI 2.24.x https://github.com/luxonis/depthai-core
 * ONNX 1.20.x https://github.com/onnx/onnx
 * nGraph 0.26.0 https://github.com/NervanaSystems/ngraph
-* ONNX Runtime 1.24.x https://github.com/microsoft/onnxruntime
+* ONNX Runtime 1.25.x https://github.com/microsoft/onnxruntime
 * TVM 0.18.x https://github.com/apache/tvm
 * Bullet Physics SDK 3.25 https://pybullet.org
 * LiquidFun http://google.github.io/liquidfun/

onnxruntime/README.md

Lines changed: 3 additions & 3 deletions
@@ -9,7 +9,7 @@ Introduction
 ------------
 This directory contains the JavaCPP Presets module for:

-* ONNX Runtime 1.24.4 https://microsoft.github.io/onnxruntime/
+* ONNX Runtime 1.25.1 https://microsoft.github.io/onnxruntime/

 Please refer to the parent README.md file for more detailed information about the JavaCPP Presets.

@@ -46,14 +46,14 @@ We can use [Maven 3](http://maven.apache.org/) to download and install automatic
 <dependency>
 <groupId>org.bytedeco</groupId>
 <artifactId>onnxruntime-platform</artifactId>
-<version>1.24.4-1.5.14-SNAPSHOT</version>
+<version>1.25.1-1.5.14-SNAPSHOT</version>
 </dependency>

 <!-- Additional dependencies required to use CUDA and cuDNN -->
 <dependency>
 <groupId>org.bytedeco</groupId>
 <artifactId>onnxruntime-platform-gpu</artifactId>
-<version>1.24.4-1.5.14-SNAPSHOT</version>
+<version>1.25.1-1.5.14-SNAPSHOT</version>
 </dependency>

 <!-- Additional dependencies to use bundled CUDA and cuDNN -->

onnxruntime/cppbuild.sh

Lines changed: 13 additions & 6 deletions
@@ -12,6 +12,7 @@ export DNNL_FLAGS="--use_dnnl"
 export CMAKE_ARGS=
 export COREML_FLAGS=
 export OPENMP_FLAGS= # "--use_openmp"
+export TRAINING_FLAGS= # --enable_training_apis --enable_training_ops
 export CUDAFLAGS="-v"
 export CUDACXX="/usr/local/cuda/bin/nvcc"
 export CUDA_HOME="/usr/local/cuda"
@@ -25,7 +26,7 @@ if [[ "$EXTENSION" == *gpu ]]; then
 GPU_FLAGS="--use_cuda"
 fi

-ONNXRUNTIME=1.24.4
+ONNXRUNTIME=1.25.1

 mkdir -p "$PLATFORM$EXTENSION"
 cd "$PLATFORM$EXTENSION"
@@ -69,7 +70,7 @@ case $PLATFORM in
 ;;
 esac

-patch -Np1 < ../../../onnxruntime-cuda13.patch
+patch -Np1 < ../../../onnxruntime-cuda13.patch || true

 #if [[ -n "$ARCH_FLAGS" ]]; then
 # # build host version of protoc
@@ -107,6 +108,10 @@ sedinplace 's/Darwin|iOS/iOS/g' cmake/onnxruntime_providers_cpu.cmake cmake/onnx
 sedinplace 's/-fvisibility=hidden//g' cmake/CMakeLists.txt cmake/adjust_global_compile_flags.cmake cmake/onnxruntime_providers_cpu.cmake cmake/onnxruntime_providers.cmake
 sedinplace 's:/Yucuda_pch.h /FIcuda_pch.h::g' cmake/onnxruntime_providers_cuda.cmake cmake/onnxruntime_providers.cmake
 sedinplace 's/${PROJECT_SOURCE_DIR}\/external\/cub//g' cmake/onnxruntime_providers_cuda.cmake cmake/onnxruntime_providers.cmake
+sedinplace 's/-Xcompiler \/Zc:__cplusplus/-Xcompiler \/Zc:__cplusplus -Xcompiler \/Zc:preprocessor/g' cmake/onnxruntime_providers_cuda.cmake cmake/onnxruntime_providers_cuda_plugin.cmake
+sedinplace '/CXX>:\/permissive/a\
+"$<$<COMPILE_LANGUAGE:CXX>:/Zc:preprocessor>"
+' cmake/onnxruntime_providers_cuda.cmake cmake/onnxruntime_providers_cuda_plugin.cmake
 sedinplace 's/ONNXRUNTIME_PROVIDERS_SHARED)/ONNXRUNTIME_PROVIDERS_SHARED onnxruntime_providers_shared)/g' cmake/onnxruntime_providers_cpu.cmake cmake/onnxruntime_providers.cmake
 sedinplace 's/DNNL_TAG v.*)/DNNL_TAG v3.11)/g' cmake/external/dnnl.cmake
 sedinplace 's/DNNL_SHARED_LIB libdnnl.1.dylib/DNNL_SHARED_LIB libdnnl.2.dylib/g' cmake/external/dnnl.cmake
@@ -132,7 +137,7 @@ sedinplace '/cvtfp16Avx/d' cmake/onnxruntime_mlas.cmake
 sedinplace 's/MlasCastF16ToF32KernelAvx;/MlasCastF16ToF32KernelAvx2;/g' onnxruntime/core/mlas/lib/platform.cpp

 # compile for all CUDA archs instead of using PTX to reduce load time
-sedinplace 's/"60;70;75;80;86;89;90;100;120"/"75;80;90;100;120"/g' cmake/external/cuda_configuration.cmake
+sedinplace 's/75;80;86;89;90;100;120/75;80;90;100;120/g' cmake/external/cuda_configuration.cmake
 sedinplace 's/"all"/"50-real;60-real;70-real;80-real;90-real;100-real;120-real"/g' cmake/CMakeLists.txt
 sedinplace 's/-gencode=arch=compute_52,code=sm_52/-gencode arch=compute_50,code=sm_50 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90/g' cmake/CMakeLists.txt
 sedinplace '/-gencode=arch=compute_..,code=sm_../d' cmake/CMakeLists.txt
@@ -187,10 +192,12 @@ sedinplace 's/devicePtrs = allocarray/devicePtrs = (const OrtEpDevice**)allocarr
 sedinplace 's/UTFChars(javaNameStrings/UTFChars((jstring)javaNameStrings/g' java/src/main/native/ai_onnxruntime_OrtSession_SessionOptions.cpp
 sedinplace 's/initializers = allocarray/initializers = (const OrtValue**)allocarray/g' java/src/main/native/ai_onnxruntime_OrtSession_SessionOptions.cpp

+sedinplace 's/SoftMaxComputeHelper<T, TOut, true>(ctx->GetComputeStream()/SoftMaxComputeHelper<T, TOut, true>((CUstream_st*)ctx->GetComputeStream()->GetHandle()/g' orttraining/orttraining/training_ops/cuda/loss/softmax_cross_entropy_loss_impl.cc
+sedinplace 's/SoftMaxComputeHelper<T, T, true>(ctx->GetComputeStream()/SoftMaxComputeHelper<T, T, true>((CUstream_st*)ctx->GetComputeStream()->GetHandle()/g' orttraining/orttraining/training_ops/cuda/loss/softmaxcrossentropy_impl.cc
+sedinplace 's/PrepareCompute<TIndex>(context->GetComputeStream()/PrepareCompute<TIndex>(context->GetComputeStream()->GetHandle(), (CUstream_st*)context->GetComputeStream()->GetHandle()/g' orttraining/orttraining/training_ops/cuda/tensor/gather_nd_grad.cc
+
 which ctest3 &> /dev/null && CTEST="ctest3" || CTEST="ctest"
-for i in {1..2}; do
-"$PYTHON_BIN_PATH" tools/ci_build/build.py --build_dir ../build --config Release --parallel $MAKEJ --enable_training_apis --enable_training_ops --cmake_path "$CMAKE" --ctest_path "$CTEST" --build_shared_lib $ARCH_FLAGS $DNNL_FLAGS $COREML_FLAGS $OPENMP_FLAGS $GPU_FLAGS || sedinplace 's/5ea4d05e62d7f954a46b3213f9b2535bdd866803/51982be81bbe52572b54180454df11a3ece9a934/g' cmake/deps.txt
-done
+"$PYTHON_BIN_PATH" tools/ci_build/build.py --build_dir ../build --config Release --parallel $MAKEJ --cmake_path "$CMAKE" --ctest_path "$CTEST" --build_shared_lib $ARCH_FLAGS $DNNL_FLAGS $COREML_FLAGS $OPENMP_FLAGS $TRAINING_FLAGS $GPU_FLAGS

 # install headers and libraries in standard directories
 cp -r include/* ../include
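
Reviewer note on the three training-op `sedinplace` rewrites in `cppbuild.sh`: they change call sites from passing the stream wrapper itself to passing `(CUstream_st*)...->GetComputeStream()->GetHandle()`. The sketch below illustrates that pattern in isolation. It assumes that `GetHandle()` returns an untyped pointer; this is not confirmed by the diff. All `Demo*` names are hypothetical stand-ins, not ONNX Runtime or CUDA types.

```cpp
#include <cassert>

// Hypothetical stand-in for the opaque CUDA stream struct (the role of CUstream_st).
struct DemoStreamTag { int id; };
// Hypothetical stand-in for the typed handle a kernel launcher expects.
using DemoStreamHandle = DemoStreamTag*;

// Hypothetical wrapper that only exposes its native handle as void*.
class DemoStream {
 public:
  explicit DemoStream(void* handle) : handle_(handle) {}
  void* GetHandle() const { return handle_; }

 private:
  void* handle_;
};

// A helper that, like the patched call sites, needs the typed handle.
inline int LaunchOn(DemoStreamHandle stream) { return stream->id; }

// Call site after the rewrite: unwrap the handle, then cast it to the typed pointer.
inline int LaunchFromWrapper(const DemoStream* stream) {
  return LaunchOn(static_cast<DemoStreamTag*>(stream->GetHandle()));
}

// Small self-check showing the round trip through void*.
inline void SelfCheckStream() {
  DemoStreamTag raw{7};
  DemoStream wrapped(&raw);
  assert(LaunchFromWrapper(&wrapped) == 7);
}
```

The script uses a C-style cast because it patches text with `sed`; in hand-written code a `static_cast` from `void*` expresses the same conversion.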

onnxruntime/onnxruntime-cuda13.patch

Lines changed: 96 additions & 0 deletions
@@ -1,3 +1,99 @@
+From 712fbe0f6e491a2edd7388f99ea4124f25cda774 Mon Sep 17 00:00:00 2001
+From: "M. Chornyi" <99709299+mc-nv@users.noreply.github.com>
+Date: Fri, 1 May 2026 21:48:54 +0000
+Subject: [PATCH] Fix CUDA 13.2 (CUB 3.2.0) build failure: invalid C++ in
+device_transform.cuh
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+CUB 3.2.0 ships device_transform.cuh with an invalid template specialisation:
+struct ::cuda::proclaims_copyable_arguments<...> : ::cuda::std::true_type {};
+A globally-qualified class name in a specialisation is rejected by the compiler
+under -std=c++20. device_copy.cuh transitively pulls device_transform.cuh in
+via dispatch_copy_mdspan.cuh, so it fails for the same reason.
+
+Fix: two shadow stubs under onnxruntime/cub/device/, resolved first via -I
+ahead of the -isystem CUDA toolkit path.
+
+device_transform.cuh — re-emits the parts Thrust uses internally
+(cub::detail::__return_constant and the proclaims_copyable_arguments
+specialisation) with the specialisation written inside the cuda namespace
+so the class name is unqualified. cub::DeviceTransform is omitted.
+
+device_copy.cuh — empty stub. ORT does not use cub::DeviceCopy.
+
+cub.cuh is unchanged.
+---
+onnxruntime/cub/device/device_copy.cuh | 9 +++++
+onnxruntime/cub/device/device_transform.cuh | 42 +++++++++++++++++++++
+2 files changed, 51 insertions(+)
+create mode 100644 onnxruntime/cub/device/device_copy.cuh
+create mode 100644 onnxruntime/cub/device/device_transform.cuh
+
+diff --git a/onnxruntime/cub/device/device_copy.cuh b/onnxruntime/cub/device/device_copy.cuh
+new file mode 100644
+index 0000000000000..14e9f1772a3ef
+--- /dev/null
++++ b/onnxruntime/cub/device/device_copy.cuh
+@@ -0,0 +1,9 @@
++// Copyright (c) Microsoft Corporation. All rights reserved.
++// Licensed under the MIT License.
++
++// Shadow stub for <cub/device/device_copy.cuh>. The real header transitively
++// includes dispatch_copy_mdspan.cuh, which references cub::DeviceTransform — a
++// type our device_transform.cuh stub intentionally omits. ORT does not use
++// cub::DeviceCopy, so this empty stub is sufficient.
++
++#pragma once
+diff --git a/onnxruntime/cub/device/device_transform.cuh b/onnxruntime/cub/device/device_transform.cuh
+new file mode 100644
+index 0000000000000..378bd8f0b5be8
+--- /dev/null
++++ b/onnxruntime/cub/device/device_transform.cuh
+@@ -0,0 +1,42 @@
++// Copyright (c) Microsoft Corporation. All rights reserved.
++// Licensed under the MIT License.
++
++// Shadow stub for <cub/device/device_transform.cuh>. Resolved first via -I,
++// ahead of the -isystem CUDA toolkit path.
++//
++// CUB 3.2.0 (CUDA 13.2) ships an invalid template specialisation:
++// struct ::cuda::proclaims_copyable_arguments<...> : ::cuda::std::true_type {};
++// A globally-qualified class name in a specialisation is rejected by the compiler.
++// We re-emit the parts Thrust needs internally with the fixed syntax (the
++// specialisation is written inside the cuda namespace so the name is unqualified).
++// cub::DeviceTransform itself is not used by ORT and is intentionally omitted.
++
++#pragma once
++
++#include <cub/version.cuh>
++
++#if CUB_VERSION >= 300200
++
++#include <cub/device/dispatch/dispatch_transform.cuh> // cub::detail::transform::dispatch_t (Thrust)
++#include <cuda/__functional/address_stability.h> // cuda::proclaims_copyable_arguments primary
++
++CUB_NAMESPACE_BEGIN
++namespace detail
++{
++template <typename T>
++struct __return_constant
++{
++ T value;
++ template <typename... Args>
++ _CCCL_HOST_DEVICE T operator()(Args&&...) const { return value; }
++};
++} // namespace detail
++CUB_NAMESPACE_END
++
++_CCCL_BEGIN_NAMESPACE_CUDA
++template <typename T>
++struct proclaims_copyable_arguments<CUB_NS_QUALIFIER::detail::__return_constant<T>>
++ : ::cuda::std::true_type {};
++_CCCL_END_NAMESPACE_CUDA
++
++#endif // CUB_VERSION >= 300200
 diff --git a/orttraining/orttraining/training_ops/cuda/reduction/all_impl.cu b/orttraining/orttraining/training_ops/cuda/reduction/all_impl.cu
 index 638c7d6637..73063765d7 100644
 --- a/orttraining/orttraining/training_ops/cuda/reduction/all_impl.cu
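
Reviewer note on the `device_transform.cuh` stub: the patch description says the fix is to write the specialisation inside the owning namespace so the class name is unqualified. The standalone sketch below shows that shape using hypothetical `demo_cuda` and `demo_cub` namespaces in place of the real CCCL macros and headers. The form the patch description reports as rejected is kept only as a comment and is not compiled here.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

namespace demo_cuda {
// Primary template: by default a callable makes no claim about its arguments.
template <typename F>
struct proclaims_copyable_arguments : std::false_type {};
}  // namespace demo_cuda

namespace demo_cub {
namespace detail {
// Functor that ignores its arguments and returns a stored constant.
template <typename T>
struct return_constant {
  T value;
  template <typename... Args>
  T operator()(Args&&...) const { return value; }
};
}  // namespace detail
}  // namespace demo_cub

// Form described as rejected in the patch message (shown for contrast only):
//   template <typename T>
//   struct ::demo_cuda::proclaims_copyable_arguments<...> : std::true_type {};

// Form used by the stub: reopen the namespace and specialise with an unqualified name.
namespace demo_cuda {
template <typename T>
struct proclaims_copyable_arguments<demo_cub::detail::return_constant<T>>
    : std::true_type {};
}  // namespace demo_cuda

// Small self-check: the specialisation is selected only for return_constant<T>.
inline void SelfCheckSpecialisation() {
  using Functor = demo_cub::detail::return_constant<int>;
  assert(demo_cuda::proclaims_copyable_arguments<Functor>::value);
  assert(!demo_cuda::proclaims_copyable_arguments<int>::value);
}
```

Reopening the namespace keeps the specialisation's declared name identical to the primary template's name, which is the property the stub relies on.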

onnxruntime/platform/gpu/pom.xml

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@

 <groupId>org.bytedeco</groupId>
 <artifactId>onnxruntime-platform-gpu</artifactId>
-<version>1.24.4-${project.parent.version}</version>
+<version>1.25.1-${project.parent.version}</version>
 <name>JavaCPP Presets Platform GPU for ONNX Runtime</name>

 <properties>

onnxruntime/platform/pom.xml

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@

 <groupId>org.bytedeco</groupId>
 <artifactId>onnxruntime-platform</artifactId>
-<version>1.24.4-${project.parent.version}</version>
+<version>1.25.1-${project.parent.version}</version>
 <name>JavaCPP Presets Platform for ONNX Runtime</name>

 <properties>

onnxruntime/pom.xml

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@

 <groupId>org.bytedeco</groupId>
 <artifactId>onnxruntime</artifactId>
-<version>1.24.4-${project.parent.version}</version>
+<version>1.25.1-${project.parent.version}</version>
 <name>JavaCPP Presets for ONNX Runtime</name>

 <properties>

onnxruntime/samples/pom.xml

Lines changed: 2 additions & 2 deletions
@@ -12,14 +12,14 @@
 <dependency>
 <groupId>org.bytedeco</groupId>
 <artifactId>onnxruntime-platform</artifactId>
-<version>1.24.4-1.5.14-SNAPSHOT</version>
+<version>1.25.1-1.5.14-SNAPSHOT</version>
 </dependency>

 <!-- Additional dependencies required to use CUDA and cuDNN -->
 <dependency>
 <groupId>org.bytedeco</groupId>
 <artifactId>onnxruntime-platform-gpu</artifactId>
-<version>1.24.4-1.5.14-SNAPSHOT</version>
+<version>1.25.1-1.5.14-SNAPSHOT</version>
 </dependency>

 <!-- Additional dependencies to use bundled CUDA and cuDNN -->

onnxruntime/src/gen/java/org/bytedeco/onnxruntime/AllocatorImpl.java

Lines changed: 7 additions & 0 deletions
@@ -34,6 +34,7 @@ public class AllocatorImpl extends BaseAllocator {


 public native Pointer Alloc(@Cast("size_t") long size);
+public native Pointer Reserve(@Cast("size_t") long size);
 public native @ByVal MemoryAllocation GetAllocation(@Cast("size_t") long size);
 public native void Free(Pointer p);
 public native @ByVal @Cast("Ort::ConstMemoryInfo*") MemoryInfoImpl GetInfo();
@@ -43,4 +44,10 @@ public class AllocatorImpl extends BaseAllocator {
 * @return A pointer to a KeyValuePairs object that will be filled with the allocator statistics.
 */
 public native @ByVal KeyValuePairs GetStats();
+
+/** \brief Release unused memory held by the allocator.
+*
+* Calls the optional Shrink function pointer if available; does nothing otherwise.
+*/
+public native void Shrink();
 }
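
Reviewer note on the new `Shrink()` binding: its doc comment says it calls an optional function pointer if available and does nothing otherwise. A minimal C++ sketch of that "optional entry in a function table" pattern follows. `DemoAllocator`, `ReleaseCache`, and `ShrinkIfAvailable` are hypothetical names for illustration and do not reflect the actual `OrtAllocator` layout.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical C-style allocator table; not the real OrtAllocator definition.
struct DemoAllocator {
  std::size_t cached_bytes;
  // Optional entry: nullptr when the implementation does not support shrinking.
  void (*Shrink)(DemoAllocator* self);
};

// One possible Shrink implementation: drop everything held in the cache.
inline void ReleaseCache(DemoAllocator* self) { self->cached_bytes = 0; }

// Wrapper mirroring the documented behaviour: call Shrink when it is set,
// otherwise do nothing. Returns true only if the callback was invoked.
inline bool ShrinkIfAvailable(DemoAllocator& allocator) {
  if (allocator.Shrink == nullptr) {
    return false;
  }
  allocator.Shrink(&allocator);
  return true;
}

// Small self-check covering both the supported and unsupported cases.
inline void SelfCheckShrink() {
  DemoAllocator supported{1024, &ReleaseCache};
  assert(ShrinkIfAvailable(supported));
  assert(supported.cached_bytes == 0);

  DemoAllocator unsupported{512, nullptr};
  assert(!ShrinkIfAvailable(unsupported));
  assert(unsupported.cached_bytes == 512);
}
```

Under this pattern, callers of the Java `Shrink()` binding would not need to probe for support first, since the unsupported case is a no-op.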
