[Build] no matching function for call to CheckInputs<onnxruntime::Tensor> #27269

### Describe the issue

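Hi!
I'm an Arch Linux package maintainer. With the new 1.24.1 release, the build of the CUDA backend fails. We use CUDA 13.1.1 with GCC 15.2.1. The problematic line was last modified in 562760a567cdeb1acd05bac5ee370ce075c7a0bb. According to the compiler output below, `sharded_moe.cc:74` calls `moe_helper::CheckInputs<Tensor>` with 15 arguments, while the only candidate (declared at `moe_helper.h:39`) expects 17.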

### Urgency

_No response_

### Target platform

CUDA 13.1.1

### Build script

The build requires an unreleased Arch Linux build script, which can be polished and published if necessary.

### Error / output

```
Building CXX object CMakeFiles/onnxruntime_providers_cuda.dir/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc.o
FAILED: [code=1] CMakeFiles/onnxruntime_providers_cuda.dir/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc.o 
/usr/bin/g++ -DCOMPILE_HOPPER_TMA_GEMMS -DCPUINFO_SUPPORTED -DCPUINFO_SUPPORTED_PLATFORM=1 -DDISABLE_CUSPARSE_DEPRECATED -DDNNL_OPENMP -DEIGEN_MPL2_ONLY -DEIGEN_USE_THREADS -DENABLE_ATEN -DENABLE_CPU_FP16_TRAINING_OPS -DENABLE_CUDA_NHWC_OPS -DENABLE_DLPACK -DENABLE_FP4 -DENABLE_FP8 -DENABLE_STRIDED_TENSORS -DENABLE_TRAINING -DENABLE_TRAINING_APIS -DENABLE_TRAINING_CORE -DENABLE_TRAINING_OPS -DHAS_STRING_VIEW=1 -DONLY_C_LOCALE=0 -DONNX_ML=1 -DONNX_NAMESPACE=onnx -DONNX_USE_LITE_PROTO=1 -DORT_ENABLE_STREAM -DORT_USE_NCCL=1 -DPLATFORM_POSIX -DPROTOBUF_USE_DLLS -DUSE_CUDA=1 -DUSE_DNNL=1 -DUSE_FLASH_ATTENTION=1 -DUSE_MEMORY_EFFICIENT_ATTENTION=1 -DUSE_NCCL_P2P=1 -D_GNU_SOURCE -D__ONNX_NO_DOC_STRINGS -Donnxruntime_providers_cuda_EXPORTS -I/build/onnxruntime/src/onnxruntime-cuda/include/onnxruntime -I/build/onnxruntime/src/onnxruntime-cuda/include/onnxruntime/core/session -I/build/onnxruntime/src/onnxruntime-cuda/orttraining/orttraining/training_api/include -I/build/onnxruntime/src/onnxruntime-cuda/build/_deps/pytorch_cpuinfo-src/include -I/build/onnxruntime/src/onnxruntime-cuda/build -I/build/onnxruntime/src/onnxruntime-cuda/onnxruntime -I/build/onnxruntime/src/onnxruntime-cuda/orttraining -I/build/onnxruntime/src/onnxruntime-cuda/build/_deps/gsl-src/include -I/build/onnxruntime/src/onnxruntime-cuda/build/_deps/onnx-src -I/build/onnxruntime/src/onnxruntime-cuda/build/_deps/onnx-build -I/build/onnxruntime/src/onnxruntime-cuda/build/_deps/flatbuffers-src/include -I/build/onnxruntime/src/onnxruntime-cuda/build/_deps/cutlass-src/include -I/build/onnxruntime/src/onnxruntime-cuda/build/_deps/cutlass-src/examples -I/build/onnxruntime/src/onnxruntime-cuda/build/_deps/cutlass-src/tools/util/include -I/build/onnxruntime/src/onnxruntime-cuda/build/_deps/cudnn_frontend-src/include -I/build/onnxruntime/src/onnxruntime-cuda/build/_deps/eigen3-src -isystem /build/onnxruntime/src/onnxruntime-cuda/build/_deps/safeint-src -isystem /opt/cuda/targets/x86_64-linux/include/cccl -isystem 
/build/onnxruntime/src/onnxruntime-cuda/build/_deps/mp11-src/include -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=3 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection         -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -Wp,-D_GLIBCXX_ASSERTIONS -g -ffile-prefix-map=/build/onnxruntime/src=/usr/src/debug/onnxruntime -flto=auto -I/opt/cuda/targets/x86_64-linux/include -ffunction-sections -fdata-sections -std=gnu++17 -fPIC -Wno-deprecated-declarations -Wall -Wextra -Wno-deprecated-copy -Wno-dangling-reference -Wno-nonnull-compare -Wno-deprecated-literal-operator -Wno-interference-size -Wno-reorder -Wno-error=sign-compare -MD -MT CMakeFiles/onnxruntime_providers_cuda.dir/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc.o -MF CMakeFiles/onnxruntime_providers_cuda.dir/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc.o.d -o CMakeFiles/onnxruntime_providers_cuda.dir/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc.o -c /build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc
In file included from /opt/cuda/targets/x86_64-linux/include/cuda_fp4.h:347,
                 from /build/onnxruntime/src/onnxruntime-cuda/onnxruntime/core/providers/cuda/cuda_common.h:20,
                 from /build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc:8:
/opt/cuda/targets/x86_64-linux/include/cuda_fp4.hpp: In function ‘__nv_fp4_storage_t __nv_cvt_double_to_fp4(double, __nv_fp4_interpretation_t, cudaRoundMode)’:
/opt/cuda/targets/x86_64-linux/include/cuda_fp4.hpp:107:56: warning: unused parameter ‘fp4_interpretation’ [-Wunused-parameter]
  107 |                        const __nv_fp4_interpretation_t fp4_interpretation,
      |                        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~
/opt/cuda/targets/x86_64-linux/include/cuda_fp4.hpp: In function ‘__half_raw __nv_cvt_fp4_to_halfraw(__nv_fp4_storage_t, __nv_fp4_interpretation_t)’:
/opt/cuda/targets/x86_64-linux/include/cuda_fp4.hpp:401:57: warning: unused parameter ‘fp4_interpretation’ [-Wunused-parameter]
  401 |                         const __nv_fp4_interpretation_t fp4_interpretation) {
      |                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~
In file included from /build/onnxruntime/src/onnxruntime-cuda/include/onnxruntime/core/common/exceptions.h:13,
                 from /build/onnxruntime/src/onnxruntime-cuda/onnxruntime/core/common/safeint.h:6,
                 from /build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc:7:
/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc: In member function ‘onnxruntime::common::Status onnxruntime::contrib::cuda::ShardedMoE<T>::ComputeInternal(onnxruntime::OpKernelContext*) const’:
/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc:74:78: error: no matching function for call to ‘CheckInputs<onnxruntime::Tensor>(onnxruntime::contrib::MoEParameters&, const onnxruntime::Tensor*&, const onnxruntime::Tensor*&, const onnxruntime::Tensor*&, const onnxruntime::Tensor*&, std::nullptr_t, const onnxruntime::Tensor*&, const onnxruntime::Tensor*&, std::nullptr_t, const onnxruntime::Tensor*&, const onnxruntime::Tensor*&, std::nullptr_t, int, bool, int)’ [-Wtemplate-body]
   74 |   ORT_RETURN_IF_ERROR(::onnxruntime::contrib::moe_helper::CheckInputs<Tensor>(
      |                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
   75 |       moe_params, input, router_probs,
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                        
   76 |       fc1_experts_weights, fc1_experts_bias_optional, nullptr,
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                
   77 |       fc2_experts_weights, fc2_experts_bias_optional, nullptr,
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                
   78 |       fc3_experts_weights_optional, fc3_experts_bias_optional, nullptr,
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~       
   79 |       1,  // no quantization so pack size is 1
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                
   80 |       activation_type_ == ort_fastertransformer::ActivationType::SwiGLU,
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~      
   81 |       0));  // no block-wise quantization for sharded MoE
      |       ~~                                                                      
/build/onnxruntime/src/onnxruntime-cuda/include/onnxruntime/core/common/common.h:255:21: note: in definition of macro ‘ORT_RETURN_IF_ERROR_SESSIONID’
  255 |     auto _status = (expr);                                                                                             \
      |                     ^~~~
/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc:74:3: note: in expansion of macro ‘ORT_RETURN_IF_ERROR’
   74 |   ORT_RETURN_IF_ERROR(::onnxruntime::contrib::moe_helper::CheckInputs<Tensor>(
      |   ^~~~~~~~~~~~~~~~~~~
/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc:74:78: note: there is 1 candidate
   74 |   ORT_RETURN_IF_ERROR(::onnxruntime::contrib::moe_helper::CheckInputs<Tensor>(
      |                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
   75 |       moe_params, input, router_probs,
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                        
   76 |       fc1_experts_weights, fc1_experts_bias_optional, nullptr,
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                
   77 |       fc2_experts_weights, fc2_experts_bias_optional, nullptr,
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                
   78 |       fc3_experts_weights_optional, fc3_experts_bias_optional, nullptr,
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~       
   79 |       1,  // no quantization so pack size is 1
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                
   80 |       activation_type_ == ort_fastertransformer::ActivationType::SwiGLU,
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~      
   81 |       0));  // no block-wise quantization for sharded MoE
      |       ~~                                                                      
/build/onnxruntime/src/onnxruntime-cuda/include/onnxruntime/core/common/common.h:255:21: note: in definition of macro ‘ORT_RETURN_IF_ERROR_SESSIONID’
  255 |     auto _status = (expr);                                                                                             \
      |                     ^~~~
/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc:74:3: note: in expansion of macro ‘ORT_RETURN_IF_ERROR’
   74 |   ORT_RETURN_IF_ERROR(::onnxruntime::contrib::moe_helper::CheckInputs<Tensor>(
      |   ^~~~~~~~~~~~~~~~~~~
In file included from /build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/moe/moe_base.h:10,
                 from /build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.h:7,
                 from /build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cuda/collective/sharded_moe.cc:10:
/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cpu/moe/moe_helper.h:39:8: note: candidate 1: ‘template<class Tensor> onnxruntime::common::Status onnxruntime::contrib::moe_helper::CheckInputs(onnxruntime::contrib::MoEParameters&, const Tensor*, const Tensor*, const Tensor*, const Tensor*, const Tensor*, const Tensor*, const Tensor*, const Tensor*, const Tensor*, const Tensor*, const Tensor*, const Tensor*, const Tensor*, const Tensor*, int64_t, bool, int64_t)’
   39 | Status CheckInputs(MoEParameters& parameters,
      |        ^~~~~~~~~~~
/build/onnxruntime/src/onnxruntime-cuda/onnxruntime/contrib_ops/cpu/moe/moe_helper.h:39:8: note: candidate expects 17 arguments, 15 provided
```
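For readers skimming the log, the failure class can be reproduced in isolation. The following is only a minimal sketch, not the real onnxruntime code: `CheckInputsMock`, the stand-in `Tensor` and `MoEParameters` structs, and all parameter names are hypothetical. The mock takes 17 parameters, matching the total stated in the last note of the log; the split into one parameter struct, 13 tensor pointers, and three scalars is inferred from that total and from the trailing `int64_t, bool, int64_t`. Which arguments the real call site is missing, and where they belong, is not something the log shows.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the real onnxruntime types.
struct Tensor {};
struct MoEParameters {
  int tensors_provided = 0;
};

// Mock with 17 parameters: a parameter struct, 13 tensor pointers, then
// int64_t, bool, int64_t. All names here are invented for this sketch.
template <typename TensorT>
int CheckInputsMock(MoEParameters& params,
                    const TensorT* t0, const TensorT* t1, const TensorT* t2,
                    const TensorT* t3, const TensorT* t4, const TensorT* t5,
                    const TensorT* t6, const TensorT* t7, const TensorT* t8,
                    const TensorT* t9, const TensorT* t10, const TensorT* t11,
                    const TensorT* t12,
                    int64_t pack_size, bool is_fused_swiglu, int64_t block_size) {
  (void)pack_size;
  (void)is_fused_swiglu;
  (void)block_size;
  const TensorT* tensors[] = {t0, t1, t2, t3, t4, t5, t6,
                              t7, t8, t9, t10, t11, t12};
  // Count how many of the optional tensor inputs were actually supplied.
  for (const TensorT* t : tensors) {
    if (t != nullptr) ++params.tensors_provided;
  }
  return params.tensors_provided;
}

// With the template argument given explicitly, nullptr converts to
// const Tensor*, so unused inputs can be passed as nullptr. Removing any of
// the 17 arguments below yields the same kind of error as in the log:
// "candidate expects 17 arguments, N provided".
inline int CallWithSeventeenArguments() {
  Tensor input, router_probs, fc1_weights, fc2_weights;
  MoEParameters params;
  int provided = CheckInputsMock<Tensor>(
      params, &input, &router_probs,
      &fc1_weights, nullptr, nullptr,
      &fc2_weights, nullptr, nullptr,
      nullptr, nullptr, nullptr,
      nullptr, nullptr,
      1, false, 0);
  assert(provided == params.tensors_provided);
  return provided;
}
```

In this sketch the call compiles only when all 17 arguments are present, which suggests the call in `sharded_moe.cc` was not updated when the helper's signature changed; that reading is an assumption based on the log alone.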

### Visual Studio Version

_No response_

### GCC / Compiler Version

15.2.1
