[Build] no matching function for call to CheckInputs<onnxruntime::Tensor> #27269

@tpkessler

Describe the issue

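Hi!
I'm an Arch Linux package maintainer. With the new 1.24.1 release, the build of the CUDA backend fails. We build with CUDA 13.1.1 and GCC 15.2.1. The failing call is `moe_helper::CheckInputs<Tensor>` in `contrib_ops/cuda/collective/sharded_moe.cc` (line 74): it passes 15 arguments, but the only candidate, declared in `contrib_ops/cpu/moe/moe_helper.h`, expects 17. The problematic line was last modified in 562760a.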

Urgency

No response

Target platform

CUDA 13.1.1

Build script

Requires an unreleased Arch Linux build script, which I can polish and publish if necessary.

Error / output

Building CXX object CMakeFiles/onnxruntime_providers_cuda.dir/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc.o
FAILED: [code=1] CMakeFiles/onnxruntime_providers_cuda.dir/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc.o 
/usr/bin/g++ -DCOMPILE_HOPPER_TMA_GEMMS -DCPUINFO_SUPPORTED -DCPUINFO_SUPPORTED_PLATFORM=1 -DDISABLE_CUSPARSE_DEPRECATED -DDNNL_OPENMP -DEIGEN_MPL2_ONLY -DEIGEN_USE_THREADS -DENABLE_ATEN -DENABLE_CPU_FP16_TRAINING_OPS -DENABLE_CUDA_NHWC_OPS -DENABLE_DLPACK -DENABLE_FP4 -DENABLE_FP8 -DENABLE_STRIDED_TENSORS -DENABLE_TRAINING -DENABLE_TRAINING_APIS -DENABLE_TRAINING_CORE -DENABLE_TRAINING_OPS -DHAS_STRING_VIEW=1 -DONLY_C_LOCALE=0 -DONNX_ML=1 -DONNX_NAMESPACE=onnx -DONNX_USE_LITE_PROTO=1 -DORT_ENABLE_STREAM -DORT_USE_NCCL=1 -DPLATFORM_POSIX -DPROTOBUF_USE_DLLS -DUSE_CUDA=1 -DUSE_DNNL=1 -DUSE_FLASH_ATTENTION=1 -DUSE_MEMORY_EFFICIENT_ATTENTION=1 -DUSE_NCCL_P2P=1 -D_GNU_SOURCE -D__ONNX_NO_DOC_STRINGS -Donnxruntime_providers_cuda_EXPORTS -I/build/onnxruntime/src/onnxruntime-cuda/include/onnxruntime -I/build/onnxruntime/src/onnxruntime-cuda/include/onnxruntime/core/session -I/build/onnxruntime/src/onnxruntime-cuda/orttraining/orttraining/training_api/include -I/build/onnxruntime/src/onnxruntime-cuda/build/_deps/pytorch_cpuinfo-src/include -I/build/onnxruntime/src/onnxruntime-cuda/build -I/build/onnxruntime/src/onnxruntime-cuda/onnxruntime -I/build/onnxruntime/src/onnxruntime-cuda/orttraining -I/build/onnxruntime/src/onnxruntime-cuda/build/_deps/gsl-src/include -I/build/onnxruntime/src/onnxruntime-cuda/build/_deps/onnx-src -I/build/onnxruntime/src/onnxruntime-cuda/build/_deps/onnx-build -I/build/onnxruntime/src/onnxruntime-cuda/build/_deps/flatbuffers-src/include -I/build/onnxruntime/src/onnxruntime-cuda/build/_deps/cutlass-src/include -I/build/onnxruntime/src/onnxruntime-cuda/build/_deps/cutlass-src/examples -I/build/onnxruntime/src/onnxruntime-cuda/build/_deps/cutlass-src/tools/util/include -I/build/onnxruntime/src/onnxruntime-cuda/build/_deps/cudnn_frontend-src/include -I/build/onnxruntime/src/onnxruntime-cuda/build/_deps/eigen3-src -isystem /build/onnxruntime/src/onnxruntime-cuda/build/_deps/safeint-src -isystem /opt/cuda/targets/x86_64-linux/include/cccl -isystem 
/build/onnxruntime/src/onnxruntime-cuda/build/_deps/mp11-src/include -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=3 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection         -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -Wp,-D_GLIBCXX_ASSERTIONS -g -ffile-prefix-map=/build/onnxruntime/src=/usr/src/debug/onnxruntime -flto=auto -I/opt/cuda/targets/x86_64-linux/include -ffunction-sections -fdata-sections -std=gnu++17 -fPIC -Wno-deprecated-declarations -Wall -Wextra -Wno-deprecated-copy -Wno-dangling-reference -Wno-nonnull-compare -Wno-deprecated-literal-operator -Wno-interference-size -Wno-reorder -Wno-error=sign-compare -MD -MT CMakeFiles/onnxruntime_providers_cuda.dir/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc.o -MF CMakeFiles/onnxruntime_providers_cuda.dir/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc.o.d -o CMakeFiles/onnxruntime_providers_cuda.dir/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc.o -c /build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc
In file included from /opt/cuda/targets/x86_64-linux/include/cuda_fp4.h:347,
                 from /build/onnxruntime/src/onnxruntime-cuda/onnxruntime/core/providers/cuda/cuda_common.h:20,
                 from /build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc:8:
/opt/cuda/targets/x86_64-linux/include/cuda_fp4.hpp: In function ‘__nv_fp4_storage_t __nv_cvt_double_to_fp4(double, __nv_fp4_interpretation_t, cudaRoundMode)’:
/opt/cuda/targets/x86_64-linux/include/cuda_fp4.hpp:107:56: warning: unused parameter ‘fp4_interpretation’ [-Wunused-parameter]
  107 |                        const __nv_fp4_interpretation_t fp4_interpretation,
      |                        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~
/opt/cuda/targets/x86_64-linux/include/cuda_fp4.hpp: In function ‘__half_raw __nv_cvt_fp4_to_halfraw(__nv_fp4_storage_t, __nv_fp4_interpretation_t)’:
/opt/cuda/targets/x86_64-linux/include/cuda_fp4.hpp:401:57: warning: unused parameter ‘fp4_interpretation’ [-Wunused-parameter]
  401 |                         const __nv_fp4_interpretation_t fp4_interpretation) {
      |                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~
In file included from /build/onnxruntime/src/onnxruntime-cuda/include/onnxruntime/core/common/exceptions.h:13,
                 from /build/onnxruntime/src/onnxruntime-cuda/onnxruntime/core/common/safeint.h:6,
                 from /build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc:7:
/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc: In member function ‘onnxruntime::common::Status onnxruntime::contrib::cuda::ShardedMoE<T>::ComputeInternal(onnxruntime::OpKernelContext*) const’:
/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc:74:78: error: no matching function for call to ‘CheckInputs<onnxruntime::Tensor>(onnxruntime::contrib::MoEParameters&, const onnxruntime::Tensor*&, const onnxruntime::Tensor*&, const onnxruntime::Tensor*&, const onnxruntime::Tensor*&, std::nullptr_t, const onnxruntime::Tensor*&, const onnxruntime::Tensor*&, std::nullptr_t, const onnxruntime::Tensor*&, const onnxruntime::Tensor*&, std::nullptr_t, int, bool, int)’ [-Wtemplate-body]
   74 |   ORT_RETURN_IF_ERROR(::onnxruntime::contrib::moe_helper::CheckInputs<Tensor>(
      |                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
   75 |       moe_params, input, router_probs,
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                        
   76 |       fc1_experts_weights, fc1_experts_bias_optional, nullptr,
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                
   77 |       fc2_experts_weights, fc2_experts_bias_optional, nullptr,
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                
   78 |       fc3_experts_weights_optional, fc3_experts_bias_optional, nullptr,
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~       
   79 |       1,  // no quantization so pack size is 1
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                
   80 |       activation_type_ == ort_fastertransformer::ActivationType::SwiGLU,
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~      
   81 |       0));  // no block-wise quantization for sharded MoE
      |       ~~                                                                      
/build/onnxruntime/src/onnxruntime-cuda/include/onnxruntime/core/common/common.h:255:21: note: in definition of macro ‘ORT_RETURN_IF_ERROR_SESSIONID’
  255 |     auto _status = (expr);                                                                                             \
      |                     ^~~~
/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc:74:3: note: in expansion of macro ‘ORT_RETURN_IF_ERROR’
   74 |   ORT_RETURN_IF_ERROR(::onnxruntime::contrib::moe_helper::CheckInputs<Tensor>(
      |   ^~~~~~~~~~~~~~~~~~~
/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc:74:78: note: there is 1 candidate
   74 |   ORT_RETURN_IF_ERROR(::onnxruntime::contrib::moe_helper::CheckInputs<Tensor>(
      |                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
   75 |       moe_params, input, router_probs,
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                        
   76 |       fc1_experts_weights, fc1_experts_bias_optional, nullptr,
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                
   77 |       fc2_experts_weights, fc2_experts_bias_optional, nullptr,
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                
   78 |       fc3_experts_weights_optional, fc3_experts_bias_optional, nullptr,
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~       
   79 |       1,  // no quantization so pack size is 1
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                
   80 |       activation_type_ == ort_fastertransformer::ActivationType::SwiGLU,
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~      
   81 |       0));  // no block-wise quantization for sharded MoE
      |       ~~                                                                      
/build/onnxruntime/src/onnxruntime-cuda/include/onnxruntime/core/common/common.h:255:21: note: in definition of macro ‘ORT_RETURN_IF_ERROR_SESSIONID’
  255 |     auto _status = (expr);                                                                                             \
      |                     ^~~~
/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc:74:3: note: in expansion of macro ‘ORT_RETURN_IF_ERROR’
   74 |   ORT_RETURN_IF_ERROR(::onnxruntime::contrib::moe_helper::CheckInputs<Tensor>(
      |   ^~~~~~~~~~~~~~~~~~~
In file included from /build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/moe/moe_base.h:10,
                 from /build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.h:7,
                 from /build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc:10:
/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cpu/moe/moe_helper.h:39:8: note: candidate 1: ‘template<class Tensor> onnxruntime::common::Status onnxruntime::contrib::moe_helper::CheckInputs(onnxruntime::contrib::MoEParameters&, const Tensor*, const Tensor*, const Tensor*, const Tensor*, const Tensor*, const Tensor*, const Tensor*, const Tensor*, const Tensor*, const Tensor*, const Tensor*, const Tensor*, const Tensor*, const Tensor*, int64_t, bool, int64_t)’
   39 | Status CheckInputs(MoEParameters& parameters,
      |        ^~~~~~~~~~~
/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cpu/moe/moe_helper.h:39:8: note: candidate expects 17 arguments, 15 provided

Visual Studio Version

No response

GCC / Compiler Version

15.2.1
