Skip to content

[Bug] MIGraphX EP seeing HipMemcpy via onnxruntime::GPUDataTransfer::CopyTensor that break multi stream execution #16774

Open
@TedThemistokleous

Description

Describe the issue

Currently running through a set of parity tests found in

/onnxruntime/onnxruntime/test/python/transformers/

primarily test_parity_gelu.py and test_parity_layernorm.py

We're experiencing out of order memcopies that seem to occur during kernel execution on our Navi21 card.

Here's an example output when we use ROCm tracing tools to view the sequence of events (captured with our rocprof and then used perfetto/chrome://tracing to view the traces:

image

I'm able to trigger this case consistently and cut down the GELU test to only perform 2 test runs per kernel which fails always on the second. I found that when we run only 1 test, this out of order error never happens.

I've also noticed that if I increase the hidden layer size in the test_parity_gelu.py test, I can get a point (around 100x hidden layer size) that the tests always pass and we don't get an overlap.

I've cut down the test_parity_gelu.py on a seperate branch here to my ORT fork off mainline: https://github.com/TedThemistokleous/onnxruntime/tree/debug_parity_tests.

The behavior goes away entirely if we add a sync between every single kernel run, thus undoing multi stream execution

The reason I'm bring this up to Onnxruntime is that after a few weeks of debugging this (configuration, previous builds, etc) is that I've been unable to find a working stable point using the Navi21 card (gfx 1030)

From a recent stack trace using GDB with the test I've found the following around said hipMemcpy thats being called via
onnxruntime::GPUDataTransfer::CopyTensor

here's the stack trace I've mentioned.

Thread 1 "python3" hit Breakpoint 2, 0x00007fffa6d4a3f4 in hipMemcpy () from /usr/local/lib/python3.8/dist-packages/torch/lib/libamdhip64.so
(gdb) bt
#0  0x00007fffa6d4a3f4 in hipMemcpy () from /usr/local/lib/python3.8/dist-packages/torch/lib/libamdhip64.so
#1  0x00007ffe44fcb218 in onnxruntime::GPUDataTransfer::CopyTensor(onnxruntime::Tensor const&, onnxruntime::Tensor&) const ()
   from /usr/local/lib/python3.8/dist-packages/onnxruntime/capi/libonnxruntime_providers_migraphx.so
#2  0x00007ffd379b13a4 in onnxruntime::DataTransferManager::CopyTensor(onnxruntime::Tensor const&, onnxruntime::Tensor&) const ()
   from /usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
#3  0x00007ffd37a1d261 in onnxruntime::session_state_utils::DeserializeTensorProto(onnxruntime::Env const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, onnx::TensorProto const&, onnxruntime::MemBuffer const*, std::shared_ptr<onnxruntime::IAllocator> const&, std::shared_ptr<onnxruntime::IAllocator> const&, OrtValue&, onnxruntime::DataTransferManager const&, bool) () from /usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
#4  0x00007ffd37a22448 in onnxruntime::session_state_utils::SaveInitializedTensors(onnxruntime::Env const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, onnxruntime::GraphViewer const&, std::shared_ptr<onnxruntime::IAllocator> const&, onnxruntime::OrtValueNameIdxMap const&, std::vector<int, std::allocator<int> > const&, onnxruntime::ITensorAllocator&, std::function<onnxruntime::common::Status (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, OrtValue const&, onnxruntime::OrtCallback const&, bool, bool)> const&, onnxruntime::logging::Logger const&, onnxruntime::DataTransferManager const&, onnxruntime::ExecutionPlanBase const&, onnxruntime::SessionOptions const&, std::function<void (onnxruntime::ITensorAllocator&)> const&) () from /usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
#5  0x00007ffd37a17d94 in onnxruntime::SessionState::FinalizeSessionStateImpl(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, onnxruntime::KernelRegistryManager const&, onnxruntime::Node const*, onnxruntime::SessionOptions const&, bool, onnxruntime::InlinedHashMap<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned long, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, unsigned long> > >&, onnxruntime::InlinedHashMap<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, OrtMemoryInfo, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, OrtMemoryInfo> > > const&, bool) [clone .localalias] () from /usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
#6  0x00007ffd37a189aa in onnxruntime::SessionState::FinalizeSessionState(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, onnxruntime::KernelRegistryManager const&, bool, bool) () from /usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
#7  0x00007ffd372f7eda in onnxruntime::InferenceSession::Initialize() () from /usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
#8  0x00007ffd372b5cde in onnxruntime::python::InitializeSession(onnxruntime::InferenceSession*, std::function<void (onnxruntime::InferenceSession*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > > const&)>, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::allocator<std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > const&, std::unordered_set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) () from /usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
#9  0x00007ffd372b6367 in pybind11::cpp_function::initialize<onnxruntime::python::addObjectMethods(pybind11::module_&, std::function<void (onnxruntime::InferenceSession*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > > const&)>)::{lambda(onnxruntime::python::PyInferenceSession*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::allocator<std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > const&, std::unordered_set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)#52}, void, onnxruntime::python::PyInferenceSession*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::allocator<std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<ch--Type <RET> for more, q to quit, c to continue without paging--c
ar, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > const&, std::unordered_set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, pybind11::name, pybind11::is_method, pybind11::sibling, char [42]>(onnxruntime::python::addObjectMethods(pybind11::module_&, std::function<void (onnxruntime::InferenceSession*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > > const&)>)::{lambda(onnxruntime::python::PyInferenceSession*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::allocator<std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > const&, std::unordered_set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)#52}&&, void (*)(onnxruntime::python::PyInferenceSession*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::allocator<std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > const&, std::unordered_set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, char const (&) [42])::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call) () from /usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
#10 0x00007ffd3725213a in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) () from /usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
#11 0x00000000005f6939 in PyCFunction_Call ()
#12 0x00000000005f7506 in _PyObject_MakeTpCall ()
#13 0x000000000050b8d3 in ?? ()
#14 0x0000000000570556 in _PyEval_EvalFrameDefault ()
#15 0x00000000005697da in _PyEval_EvalCodeWithName ()
#16 0x00000000005f6ec3 in _PyFunction_Vectorcall ()
#17 0x000000000056b619 in _PyEval_EvalFrameDefault ()
#18 0x00000000005697da in _PyEval_EvalCodeWithName ()
#19 0x00000000005f6ec3 in _PyFunction_Vectorcall ()
#20 0x000000000059c427 in ?? ()
#21 0x00000000005f746f in _PyObject_MakeTpCall ()
#22 0x0000000000571019 in _PyEval_EvalFrameDefault ()
#23 0x00000000005697da in _PyEval_EvalCodeWithName ()
#24 0x00000000005f6ec3 in _PyFunction_Vectorcall ()
#25 0x000000000056c6d0 in _PyEval_EvalFrameDefault ()
#26 0x00000000005697da in _PyEval_EvalCodeWithName ()
#27 0x00000000005f6ec3 in _PyFunction_Vectorcall ()
#28 0x000000000056b4ed in _PyEval_EvalFrameDefault ()
#29 0x00000000005697da in _PyEval_EvalCodeWithName ()
#30 0x00000000005f6ec3 in _PyFunction_Vectorcall ()
#31 0x000000000056b4ed in _PyEval_EvalFrameDefault ()
#32 0x00000000005697da in _PyEval_EvalCodeWithName ()
#33 0x000000000050b1f0 in ?? ()
#34 0x000000000056c6d0 in _PyEval_EvalFrameDefault ()
#35 0x00000000005697da in _PyEval_EvalCodeWithName ()
#36 0x000000000050b1f0 in ?? ()
#37 0x000000000056c6d0 in _PyEval_EvalFrameDefault ()
#38 0x000000000050b07e in ?? ()
#39 0x000000000056b4ed in _PyEval_EvalFrameDefault ()
#40 0x00000000005f6ce6 in _PyFunction_Vectorcall ()
#41 0x000000000056b619 in _PyEval_EvalFrameDefault ()
#42 0x00000000005697da in _PyEval_EvalCodeWithName ()
#43 0x00000000005f6ec3 in _PyFunction_Vectorcall ()
#44 0x000000000050b17c in ?? ()
#45 0x00000000005f60b2 in PyObject_Call ()
#46 0x000000000056ccfc in _PyEval_EvalFrameDefault ()
#47 0x00000000005697da in _PyEval_EvalCodeWithName ()
#48 0x00000000005f6ec3 in _PyFunction_Vectorcall ()
#49 0x000000000059d21e in ?? ()
#50 0x00000000005f7506 in _PyObject_MakeTpCall ()
#51 0x0000000000570787 in _PyEval_EvalFrameDefault ()
#52 0x00000000005697da in _PyEval_EvalCodeWithName ()
#53 0x00000000005f6ec3 in _PyFunction_Vectorcall ()
#54 0x000000000050b17c in ?? ()
#55 0x00000000005f60b2 in PyObject_Call ()
#56 0x000000000056ccfc in _PyEval_EvalFrameDefault ()
#57 0x00000000005697da in _PyEval_EvalCodeWithName ()
#58 0x00000000005f6ec3 in _PyFunction_Vectorcall ()
#59 0x000000000059d21e in ?? ()
#60 0x00000000005f7506 in _PyObject_MakeTpCall ()
#61 0x0000000000570787 in _PyEval_EvalFrameDefault ()
#62 0x00000000005697da in _PyEval_EvalCodeWithName ()
#63 0x00000000005f6ec3 in _PyFunction_Vectorcall ()
#64 0x000000000050b17c in ?? ()
#65 0x00000000005f60b2 in PyObject_Call ()
#66 0x000000000056ccfc in _PyEval_EvalFrameDefault ()
#67 0x00000000005697da in _PyEval_EvalCodeWithName ()
#68 0x00000000005f6ec3 in _PyFunction_Vectorcall ()
#69 0x000000000059d21e in ?? ()
#70 0x00000000005f7506 in _PyObject_MakeTpCall ()
#71 0x0000000000570787 in _PyEval_EvalFrameDefault ()
#72 0x00000000005f6ce6 in _PyFunction_Vectorcall ()
#73 0x000000000056b619 in _PyEval_EvalFrameDefault ()
#74 0x00000000005f6ce6 in _PyFunction_Vectorcall ()
#75 0x000000000056b619 in _PyEval_EvalFrameDefault ()
#76 0x00000000005697da in _PyEval_EvalCodeWithName ()
#77 0x00000000005f6ec3 in _PyFunction_Vectorcall ()
#78 0x000000000059c427 in ?? ()
#79 0x00000000005f746f in _PyObject_MakeTpCall ()
#80 0x0000000000571019 in _PyEval_EvalFrameDefault ()
#81 0x00000000005697da in _PyEval_EvalCodeWithName ()
#82 0x000000000068e547 in PyEval_EvalCode ()
#83 0x000000000067dbf1 in ?? ()
#84 0x000000000067dc6f in ?? ()
#85 0x000000000067dd11 in ?? ()
#86 0x000000000067fe37 in PyRun_SimpleFileExFlags ()
#87 0x00000000006b7c82 in Py_RunMain ()
#88 0x00000000006b800d in Py_BytesMain ()
#89 0x00007ffff7df1083 in __libc_start_main (main=0x4ef140 <main>, argc=4, argv=0x7fffffffe548, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe538) at ../csu/libc-start.c:308
#90 0x00000000005fb85e in _start ()
(gdb)

Urgency

Urgent. Blocking builds of ROCm

Target platform

Navi21

Build script

set -e

ulimit -c unlimited

cd /onnxruntime
pip3 install -r requirements-dev.txt
# Add newer cmake to the path
export PATH="/opt/cmake/bin:$PATH"
export CXXFLAGS="-D__HIP_PLATFORM_AMD__=1 -w"
./build.sh --config Release  --cmake_extra_defines CMAKE_HIP_COMPILER=/opt/rocm/llvm/bin/clang++ --update --build --build_wheel --parallel --cmake_extra_defines ONNXRUNTIME_VERSION=$(cat ./VERSION_NUMBER) --skip_tests --rocm_home /opt/rocm --use_migraphx --migraphx_home /opt/rocm --rocm_version=`cat /opt/rocm/.info/version-dev` --allow_running_as_root

cd build/Linux/Release
#Add test launcher for onnxrt tests

echo 'InferenceSessionTests.CheckRunProfilerWithSessionOptions' >> ../../../tools/ci_build/github/pai/migraphx-excluded-tests.txt
echo 'InferenceSessionTests.CheckRunProfilerWithSessionOptions2' >> ../../../tools/ci_build/github/pai/migraphx-excluded-tests.txt
echo 'InferenceSessionTests.Test3LayerNestedSubgraph' >> ../../../tools/ci_build/github/pai/migraphx-excluded-tests.txt
echo 'InferenceSessionTests.Test2LayerNestedSubgraph' >> ../../../tools/ci_build/github/pai/migraphx-excluded-tests.txt
../../../tools/ci_build/github/pai/migraphx_test_launcher.sh || (gdb ./onnxruntime_test_all core -batch -ex bt && exit 1)

Error / output

Tests fail due to accuracy errors for test_parity_gelu.py and test_parity_layernorm.py

root@f12f6ee19192:/onnxruntime/onnxruntime/test/python/transformers# python3 test_parity_gelu.py --no_optimize

Testing: device=cuda, float16=False, optimized=False, batch_size=4, sequence_length=2, hidden_size=1, formula=1, fp32_gelu_op=True
/usr/local/lib/python3.8/dist-packages/onnx/mapping.py:27: DeprecationWarning: `np.object` is a deprecated alias for the builtin `object`. To silence this warning, use `object` by itself. Doing this will not modify any behavior and is safe. 
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  int(TensorProto.STRING): np.dtype(np.object)
====== Diagnostic Run torch.onnx.export version 2.1.0.dev20230706+rocm5.5 ======
verbose: False, log level: 40
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

exported: ./temp/gelu_1_fp32.onnx
[FAILED] Passed_cases=1/2; Max_diff=13.15656566619873; Diff_count=2
F
======================================================================
FAIL: test_cuda (__main__.TestGeluParity)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_parity_gelu.py", line 236, in test_cuda
    self.run_one(self.optimized, gpu, hidden_size=self.hidden_size, formula=i, verbose=self.verbose)
  File "test_parity_gelu.py", line 188, in run_one
    self.run_test(
  File "test_parity_gelu.py", line 184, in run_test
    self.assertTrue(num_failure == 0, "Failed: " + test_name)
AssertionError: False is not true : Failed: device=cuda, float16=False, optimized=False, batch_size=4, sequence_length=2, hidden_size=1, formula=1, fp32_gelu_op=True

----------------------------------------------------------------------
Ran 1 test in 1.470s

FAILED (failures=1)

For layernorm

[FAILED] Passed_cases=2/100; Max_diff=7.962882041931152; Diff_count=100
F
======================================================================
FAIL: test_cuda (__main__.TestLayerNormParity)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_parity_layernorm.py", line 307, in test_cuda
    self.run_one(self.optimized, gpu, hidden_size=self.hidden_size, run_extra_tests=True, verbose=self.verbose)
  File "test_parity_layernorm.py", line 239, in run_one
    self.run_test(
  File "test_parity_layernorm.py", line 233, in run_test
    self.assertTrue(num_failure == 0, "Failed: " + test_name)
AssertionError: False is not true : Failed: device=cuda, float16=False, optimized=False, batch_size=4, sequence_length=2, hidden_size=768, epsilon=1e-05, cast_fp16=True, cast_onnx_only=False, formula=0

----------------------------------------------------------------------
Ran 2 tests in 1.860s

FAILED (failures=1)

Visual Studio Version

No response

GCC / Compiler Version

No response

Metadata

Assignees

No one assigned

    Labels

    buildbuild issues; typically submitted using templateep:CUDAissues related to the CUDA execution providerep:MIGraphXissues related to AMD MI GraphX execution providerep:ROCmquestions/issues related to ROCm execution provider

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions