Closed
Labels: build (build issues; typically submitted using template)
Description
Describe the issue
Hi!
I'm an Arch Linux package maintainer. With the new 1.24.1 release, we run into a build failure in the CUDA backend. We use CUDA 13.1.1 with GCC 15.2.1. The failing call is in onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc (line 74); the problematic line was last modified in 562760a.
Urgency
No response
Target platform
CUDA 13.1.1
Build script
The build requires an unreleased Arch Linux build script, which can be polished and published if necessary.
Error / output
Building CXX object CMakeFiles/onnxruntime_providers_cuda.dir/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc.o
FAILED: [code=1] CMakeFiles/onnxruntime_providers_cuda.dir/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc.o
/usr/bin/g++ -DCOMPILE_HOPPER_TMA_GEMMS -DCPUINFO_SUPPORTED -DCPUINFO_SUPPORTED_PLATFORM=1 -DDISABLE_CUSPARSE_DEPRECATED -DDNNL_OPENMP -DEIGEN_MPL2_ONLY -DEIGEN_USE_THREADS -DENABLE_ATEN -DENABLE_CPU_FP16_TRAINING_OPS -DENABLE_CUDA_NHWC_OPS -DENABLE_DLPACK -DENABLE_FP4 -DENABLE_FP8 -DENABLE_STRIDED_TENSORS -DENABLE_TRAINING -DENABLE_TRAINING_APIS -DENABLE_TRAINING_CORE -DENABLE_TRAINING_OPS -DHAS_STRING_VIEW=1 -DONLY_C_LOCALE=0 -DONNX_ML=1 -DONNX_NAMESPACE=onnx -DONNX_USE_LITE_PROTO=1 -DORT_ENABLE_STREAM -DORT_USE_NCCL=1 -DPLATFORM_POSIX -DPROTOBUF_USE_DLLS -DUSE_CUDA=1 -DUSE_DNNL=1 -DUSE_FLASH_ATTENTION=1 -DUSE_MEMORY_EFFICIENT_ATTENTION=1 -DUSE_NCCL_P2P=1 -D_GNU_SOURCE -D__ONNX_NO_DOC_STRINGS -Donnxruntime_providers_cuda_EXPORTS -I/build/onnxruntime/src/onnxruntime-cuda/include/onnxruntime -I/build/onnxruntime/src/onnxruntime-cuda/include/onnxruntime/core/session -I/build/onnxruntime/src/onnxruntime-cuda/orttraining/orttraining/training_api/include -I/build/onnxruntime/src/onnxruntime-cuda/build/_deps/pytorch_cpuinfo-src/include -I/build/onnxruntime/src/onnxruntime-cuda/build -I/build/onnxruntime/src/onnxruntime-cuda/onnxruntime -I/build/onnxruntime/src/onnxruntime-cuda/orttraining -I/build/onnxruntime/src/onnxruntime-cuda/build/_deps/gsl-src/include -I/build/onnxruntime/src/onnxruntime-cuda/build/_deps/onnx-src -I/build/onnxruntime/src/onnxruntime-cuda/build/_deps/onnx-build -I/build/onnxruntime/src/onnxruntime-cuda/build/_deps/flatbuffers-src/include -I/build/onnxruntime/src/onnxruntime-cuda/build/_deps/cutlass-src/include -I/build/onnxruntime/src/onnxruntime-cuda/build/_deps/cutlass-src/examples -I/build/onnxruntime/src/onnxruntime-cuda/build/_deps/cutlass-src/tools/util/include -I/build/onnxruntime/src/onnxruntime-cuda/build/_deps/cudnn_frontend-src/include -I/build/onnxruntime/src/onnxruntime-cuda/build/_deps/eigen3-src -isystem /build/onnxruntime/src/onnxruntime-cuda/build/_deps/safeint-src -isystem /opt/cuda/targets/x86_64-linux/include/cccl -isystem 
/build/onnxruntime/src/onnxruntime-cuda/build/_deps/mp11-src/include -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=3 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -Wp,-D_GLIBCXX_ASSERTIONS -g -ffile-prefix-map=/build/onnxruntime/src=/usr/src/debug/onnxruntime -flto=auto -I/opt/cuda/targets/x86_64-linux/include -ffunction-sections -fdata-sections -std=gnu++17 -fPIC -Wno-deprecated-declarations -Wall -Wextra -Wno-deprecated-copy -Wno-dangling-reference -Wno-nonnull-compare -Wno-deprecated-literal-operator -Wno-interference-size -Wno-reorder -Wno-error=sign-compare -MD -MT CMakeFiles/onnxruntime_providers_cuda.dir/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc.o -MF CMakeFiles/onnxruntime_providers_cuda.dir/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc.o.d -o CMakeFiles/onnxruntime_providers_cuda.dir/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc.o -c /build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc
In file included from /opt/cuda/targets/x86_64-linux/include/cuda_fp4.h:347,
from /build/onnxruntime/src/onnxruntime-cuda/onnxruntime/core/providers/cuda/cuda_common.h:20,
from /build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc:8:
/opt/cuda/targets/x86_64-linux/include/cuda_fp4.hpp: In function ‘__nv_fp4_storage_t __nv_cvt_double_to_fp4(double, __nv_fp4_interpretation_t, cudaRoundMode)’:
/opt/cuda/targets/x86_64-linux/include/cuda_fp4.hpp:107:56: warning: unused parameter ‘fp4_interpretation’ [-Wunused-parameter]
107 | const __nv_fp4_interpretation_t fp4_interpretation,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~
/opt/cuda/targets/x86_64-linux/include/cuda_fp4.hpp: In function ‘__half_raw __nv_cvt_fp4_to_halfraw(__nv_fp4_storage_t, __nv_fp4_interpretation_t)’:
/opt/cuda/targets/x86_64-linux/include/cuda_fp4.hpp:401:57: warning: unused parameter ‘fp4_interpretation’ [-Wunused-parameter]
401 | const __nv_fp4_interpretation_t fp4_interpretation) {
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~
In file included from /build/onnxruntime/src/onnxruntime-cuda/include/onnxruntime/core/common/exceptions.h:13,
from /build/onnxruntime/src/onnxruntime-cuda/onnxruntime/core/common/safeint.h:6,
from /build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc:7:
/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc: In member function ‘onnxruntime::common::Status onnxruntime::contrib::cuda::ShardedMoE<T>::ComputeInternal(onnxruntime::OpKernelContext*) const’:
/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc:74:78: error: no matching function for call to ‘CheckInputs<onnxruntime::Tensor>(onnxruntime::contrib::MoEParameters&, const onnxruntime::Tensor*&, const onnxruntime::Tensor*&, const onnxruntime::Tensor*&, const onnxruntime::Tensor*&, std::nullptr_t, const onnxruntime::Tensor*&, const onnxruntime::Tensor*&, std::nullptr_t, const onnxruntime::Tensor*&, const onnxruntime::Tensor*&, std::nullptr_t, int, bool, int)’ [-Wtemplate-body]
74 | ORT_RETURN_IF_ERROR(::onnxruntime::contrib::moe_helper::CheckInputs<Tensor>(
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
75 | moe_params, input, router_probs,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
76 | fc1_experts_weights, fc1_experts_bias_optional, nullptr,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
77 | fc2_experts_weights, fc2_experts_bias_optional, nullptr,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
78 | fc3_experts_weights_optional, fc3_experts_bias_optional, nullptr,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
79 | 1, // no quantization so pack size is 1
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
80 | activation_type_ == ort_fastertransformer::ActivationType::SwiGLU,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
81 | 0)); // no block-wise quantization for sharded MoE
| ~~
/build/onnxruntime/src/onnxruntime-cuda/include/onnxruntime/core/common/common.h:255:21: note: in definition of macro ‘ORT_RETURN_IF_ERROR_SESSIONID’
255 | auto _status = (expr); \
| ^~~~
/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc:74:3: note: in expansion of macro ‘ORT_RETURN_IF_ERROR’
74 | ORT_RETURN_IF_ERROR(::onnxruntime::contrib::moe_helper::CheckInputs<Tensor>(
| ^~~~~~~~~~~~~~~~~~~
/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc:74:78: note: there is 1 candidate
74 | ORT_RETURN_IF_ERROR(::onnxruntime::contrib::moe_helper::CheckInputs<Tensor>(
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
75 | moe_params, input, router_probs,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
76 | fc1_experts_weights, fc1_experts_bias_optional, nullptr,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
77 | fc2_experts_weights, fc2_experts_bias_optional, nullptr,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
78 | fc3_experts_weights_optional, fc3_experts_bias_optional, nullptr,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
79 | 1, // no quantization so pack size is 1
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
80 | activation_type_ == ort_fastertransformer::ActivationType::SwiGLU,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
81 | 0)); // no block-wise quantization for sharded MoE
| ~~
/build/onnxruntime/src/onnxruntime-cuda/include/onnxruntime/core/common/common.h:255:21: note: in definition of macro ‘ORT_RETURN_IF_ERROR_SESSIONID’
255 | auto _status = (expr); \
| ^~~~
/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc:74:3: note: in expansion of macro ‘ORT_RETURN_IF_ERROR’
74 | ORT_RETURN_IF_ERROR(::onnxruntime::contrib::moe_helper::CheckInputs<Tensor>(
| ^~~~~~~~~~~~~~~~~~~
In file included from /build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/moe/moe_base.h:10,
from /build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.h:7,
from /build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc:10:
/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cpu/moe/moe_helper.h:39:8: note: candidate 1: ‘template<class Tensor> onnxruntime::common::Status onnxruntime::contrib::moe_helper::CheckInputs(onnxruntime::contrib::MoEParameters&, const Tensor*, const Tensor*, const Tensor*, const Tensor*, const Tensor*, const Tensor*, const Tensor*, const Tensor*, const Tensor*, const Tensor*, const Tensor*, const Tensor*, const Tensor*, const Tensor*, int64_t, bool, int64_t)’
39 | Status CheckInputs(MoEParameters& parameters,
| ^~~~~~~~~~~
/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cpu/moe/moe_helper.h:39:8: note: candidate expects 17 arguments, 15 provided
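To illustrate the class of error above (the helper expects 17 arguments, the call site passes 15), here is a minimal, self-contained sketch. It does not use the real onnxruntime signature: `Tensor`, `Params`, `CheckInputsSketch`, and `CallUpdatedHelper` are hypothetical stand-ins, and the assumption that the added parameters are optional pointers that can be passed as `nullptr` is mine, not something the log confirms.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins; these are NOT the real onnxruntime types.
struct Tensor {};

struct Params {
  int optional_present = 0;  // number of optional inputs that were supplied
};

// Hypothetical helper whose signature grew by one optional pointer
// (zero_points) after its callers were written.
template <typename T>
bool CheckInputsSketch(Params& params,
                       const T* input,
                       const T* weights,
                       const T* bias,         // optional
                       const T* scales,       // optional
                       const T* zero_points,  // optional, assumed newly added
                       int64_t pack_size) {
  if (input == nullptr || weights == nullptr || pack_size < 1) return false;
  params.optional_present =
      (bias != nullptr) + (scales != nullptr) + (zero_points != nullptr);
  return true;
}

// A caller updated to match the longer signature.
bool CallUpdatedHelper(Params& params, const Tensor* input,
                       const Tensor* weights, const Tensor* bias) {
  // The old call shape is one argument short and would fail to compile
  // with "candidate expects 7 arguments, 6 provided":
  //   CheckInputsSketch<Tensor>(params, input, weights, bias, nullptr, 1);
  return CheckInputsSketch<Tensor>(params, input, weights, bias,
                                   nullptr,  // scales: not quantized
                                   nullptr,  // zero_points: not quantized
                                   1);       // pack size
}
```

The explicit `<Tensor>` template argument matters here, as in the failing call: with it, each `nullptr` converts to `const Tensor*`. Which two arguments the real `CheckInputs` call is missing has to be read from `moe_helper.h` itself.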
Visual Studio Version
No response
GCC / Compiler Version
15.2.1