[BUG] Cuda Softmax op when axis != rank - 1 #22554

Open
@kuramawzw2024

Describe the issue

When axis != rank - 1, the CUDA Softmax op transposes the input, calls the softmax kernel, and then transposes the result back:

    auto temp_input = Tensor::Create(X->DataType(), TensorShape(transposed_input_dims), alloc);

    // Perform the transpose
    ORT_RETURN_IF_ERROR(Transpose::DoTranspose(GetDeviceProp(),
                                               Stream(ctx),
                                               GetCublasHandle(ctx),
                                               permutation, *X, *temp_input));
    transposed_input = std::move(temp_input);

    // Allocate memory for the intermediate output
    intermediate_output = Tensor::Create(Y->DataType(), TensorShape(transposed_input_dims), alloc);
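
For reference, the transpose, softmax, transpose-back sequence can be sketched on the CPU for a 2-D input with axis 0. This is only a minimal illustration of the technique, not the ONNX Runtime kernel; the names SoftmaxLastAxis, Transpose2D and SoftmaxAxis0 are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Softmax over the last axis of a row-major [rows, cols] buffer, in place.
// Assumes cols > 0. Subtracts the row maximum for numerical stability.
void SoftmaxLastAxis(std::vector<float>& data, std::size_t rows, std::size_t cols) {
  for (std::size_t r = 0; r < rows; ++r) {
    float* row = data.data() + r * cols;
    const float max_val = *std::max_element(row, row + cols);
    float sum = 0.0f;
    for (std::size_t c = 0; c < cols; ++c) {
      row[c] = std::exp(row[c] - max_val);
      sum += row[c];
    }
    for (std::size_t c = 0; c < cols; ++c) row[c] /= sum;
  }
}

// Transposes a row-major [rows, cols] matrix into a new [cols, rows] buffer.
std::vector<float> Transpose2D(const std::vector<float>& in, std::size_t rows, std::size_t cols) {
  std::vector<float> out(in.size());
  for (std::size_t r = 0; r < rows; ++r)
    for (std::size_t c = 0; c < cols; ++c)
      out[c * rows + r] = in[r * cols + c];
  return out;
}

// Softmax along axis 0 of a [rows, cols] matrix (axis != rank - 1):
// move the softmax axis to the end, run the last-axis softmax, move it back.
std::vector<float> SoftmaxAxis0(const std::vector<float>& x, std::size_t rows, std::size_t cols) {
  std::vector<float> transposed = Transpose2D(x, rows, cols);  // temporary, [cols, rows]
  SoftmaxLastAxis(transposed, cols, rows);                     // softmax over old axis 0
  return Transpose2D(transposed, cols, rows);                  // back to [rows, cols]
}
```

The two temporaries here (the transposed input and the intermediate result) play the same role as temp_input and intermediate_output in the snippet above.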

temp_input and intermediate_output are allocated by the temp allocator but are bound to a null Stream.
So when a session is run from multiple threads, these buffers may be used by multiple threads (multiple streams) at the same time, which may produce wrong results.
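
To make the concern concrete, here is a toy model of a caching allocator. It is an assumption-laden sketch and not ONNX Runtime's arena: ToyStream, ToyBlock and ToyArena are hypothetical names, and the reuse rule (a freed block goes only to the stream recorded on it, or to anyone if no stream was recorded) is a simplification chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a compute stream; only its address matters here.
struct ToyStream {};

struct ToyBlock {
  bool in_use = false;
  const ToyStream* last_stream = nullptr;  // nullptr: no stream was recorded
};

// Toy caching arena. NOT ONNX Runtime's allocator, only a model of the hazard.
class ToyArena {
 public:
  explicit ToyArena(std::size_t block_count) : blocks_(block_count) {}

  // Hands out the first free block that is either untagged or tagged with the
  // requesting stream. Returns the block index, or -1 if no block qualifies.
  int Alloc(const ToyStream* stream) {
    for (std::size_t i = 0; i < blocks_.size(); ++i) {
      ToyBlock& b = blocks_[i];
      if (b.in_use) continue;
      if (b.last_stream != nullptr && b.last_stream != stream) continue;
      b.in_use = true;
      b.last_stream = stream;
      return static_cast<int>(i);
    }
    return -1;
  }

  // Returns a block to the pool on the host side. The stream tag is kept,
  // because device work queued on that stream may still be pending.
  void Free(int index) { blocks_[static_cast<std::size_t>(index)].in_use = false; }

 private:
  std::vector<ToyBlock> blocks_;
};
```

In this model, a block allocated with a stream tag and then freed is not handed to a request from a different stream, while a block allocated with a null tag and then freed is handed straight to the next null-tagged request. If two threads run on different streams, that second case means both may touch the same memory while earlier work is still in flight, which is the situation described above.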

To reproduce

Run inference on a single session from multiple threads.

Urgency

yes

Platform

Linux

OS Version

CentOS 7

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

v1.19

ONNX Runtime API

C++

Architecture

X86

Execution Provider

CUDA

Execution Provider Library Version

No response

Labels

ep:CUDA (issues related to the CUDA execution provider), stale (issues that have not been addressed in a while; categorized by a bot)
