[Bug] MIGraphX EP sees hipMemcpy via onnxruntime::GPUDataTransfer::CopyTensor that breaks multi-stream execution #16774
Description
Describe the issue
We are currently running the parity tests in /onnxruntime/onnxruntime/test/python/transformers/, primarily test_parity_gelu.py and test_parity_layernorm.py.
We're experiencing out-of-order memory copies that appear to occur during kernel execution on our Navi21 card.
Here is an example of the sequence of events as seen with ROCm tracing tools (captured with rocprof and viewed in Perfetto / chrome://tracing):
I can trigger this consistently. I cut the GELU test down to only 2 test runs per kernel, and it always fails on the second run. When only 1 test is run, the out-of-order error never happens.
I've also noticed that if I increase the hidden layer size in test_parity_gelu.py, past a certain point (around 100x the hidden layer size) the tests always pass and no overlap occurs.
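To make "overlap" concrete: a copy overlaps a kernel when their execution intervals in the trace intersect. Overlap by itself is only a hazard when the copy and the kernel touch the same buffer without an ordering dependency between them. The sketch below flags candidate pairs from a list of timed events. It assumes the rocprof output has already been converted into the hypothetical TraceEvent records shown; rocprof's actual output fields are not assumed here, and the event names in the example are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEvent:
    """One timed GPU operation (hypothetical record, not rocprof's schema)."""
    name: str
    kind: str    # "kernel" or "copy"
    stream: int  # id of the queue/stream the op ran on
    start_ns: int
    end_ns: int


def overlapping_copies(events):
    """Return (copy, kernel) pairs whose execution intervals intersect.

    Half-open intervals [s1, e1) and [s2, e2) intersect iff s1 < e2 and s2 < e1,
    so a copy that starts exactly when a kernel ends is not reported.
    """
    copies = [e for e in events if e.kind == "copy"]
    kernels = [e for e in events if e.kind == "kernel"]
    return [
        (c, k)
        for c in copies
        for k in kernels
        if c.start_ns < k.end_ns and k.start_ns < c.end_ns
    ]


if __name__ == "__main__":
    # Hypothetical trace: one copy lands in the middle of the first kernel.
    trace = [
        TraceEvent("gelu_kernel", "kernel", stream=1, start_ns=100, end_ns=200),
        TraceEvent("hipMemcpy", "copy", stream=0, start_ns=150, end_ns=180),
        TraceEvent("layernorm_kernel", "kernel", stream=1, start_ns=300, end_ns=400),
    ]
    for copy, kernel in overlapping_copies(trace):
        print(f"{copy.name} (stream {copy.stream}) overlaps {kernel.name} (stream {kernel.stream})")
```

Comparing stream ids on each reported pair is a quick way to separate copies that share a stream with the kernel from copies issued on another stream.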
I've put the cut-down test_parity_gelu.py on a separate branch of my ORT fork, based on mainline: https://github.com/TedThemistokleous/onnxruntime/tree/debug_parity_tests.
The behavior goes away entirely if we add a sync after every kernel run, but that effectively undoes multi-stream execution.
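A CPU-side analogy of the usual alternative to syncing after every kernel: order only the dependent pair. In the sketch below, Python threads stand in for two streams and a threading.Event stands in for a GPU event (on HIP, recorded with hipEventRecord on the producing stream and waited on with hipStreamWaitEvent on the consuming one). This only illustrates the ordering idea; it is not a claim about how the MIGraphX EP schedules its copies or what it is missing.

```python
import threading
import time


def run_with_event_dependency():
    """Two "streams" ordered by an event instead of a global sync.

    The producer thread stands in for a stream running a kernel that writes buf;
    the consumer thread stands in for a copy on another stream that reads buf.
    Only the consumer waits, so unrelated work would not be serialized.
    """
    buf = [0.0] * 4
    done = threading.Event()  # plays the role of a recorded GPU event
    copied = []

    def producer():
        time.sleep(0.01)  # simulate kernel execution time
        for i in range(len(buf)):
            buf[i] = float(i + 1)
        done.set()  # "record the event" once the write is complete

    def consumer():
        done.wait()  # "wait on the event": orders only this dependent copy
        copied.extend(buf)  # the copy observes the kernel's result

    # Start the consumer first to show that it still waits for the producer.
    threads = [threading.Thread(target=consumer), threading.Thread(target=producer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return copied


if __name__ == "__main__":
    print(run_with_event_dependency())
```

Removing the done.wait() call lets the consumer read the buffer before the producer has written it, which is the CPU analogue of an unordered cross-stream copy.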
I'm raising this with ONNX Runtime because, after a few weeks of debugging (configurations, previous builds, etc.), I've been unable to find a stable working point on the Navi21 card (gfx1030).
Running the test under GDB, I found that the hipMemcpy in question is called via onnxruntime::GPUDataTransfer::CopyTensor. In this trace it happens during InferenceSession::Initialize, while initializers are saved (SaveInitializedTensors / DeserializeTensorProto). Here is the stack trace:
Thread 1 "python3" hit Breakpoint 2, 0x00007fffa6d4a3f4 in hipMemcpy () from /usr/local/lib/python3.8/dist-packages/torch/lib/libamdhip64.so
(gdb) bt
#0 0x00007fffa6d4a3f4 in hipMemcpy () from /usr/local/lib/python3.8/dist-packages/torch/lib/libamdhip64.so
#1 0x00007ffe44fcb218 in onnxruntime::GPUDataTransfer::CopyTensor(onnxruntime::Tensor const&, onnxruntime::Tensor&) const ()
from /usr/local/lib/python3.8/dist-packages/onnxruntime/capi/libonnxruntime_providers_migraphx.so
#2 0x00007ffd379b13a4 in onnxruntime::DataTransferManager::CopyTensor(onnxruntime::Tensor const&, onnxruntime::Tensor&) const ()
from /usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
#3 0x00007ffd37a1d261 in onnxruntime::session_state_utils::DeserializeTensorProto(onnxruntime::Env const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, onnx::TensorProto const&, onnxruntime::MemBuffer const*, std::shared_ptr<onnxruntime::IAllocator> const&, std::shared_ptr<onnxruntime::IAllocator> const&, OrtValue&, onnxruntime::DataTransferManager const&, bool) () from /usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
#4 0x00007ffd37a22448 in onnxruntime::session_state_utils::SaveInitializedTensors(onnxruntime::Env const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, onnxruntime::GraphViewer const&, std::shared_ptr<onnxruntime::IAllocator> const&, onnxruntime::OrtValueNameIdxMap const&, std::vector<int, std::allocator<int> > const&, onnxruntime::ITensorAllocator&, std::function<onnxruntime::common::Status (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, OrtValue const&, onnxruntime::OrtCallback const&, bool, bool)> const&, onnxruntime::logging::Logger const&, onnxruntime::DataTransferManager const&, onnxruntime::ExecutionPlanBase const&, onnxruntime::SessionOptions const&, std::function<void (onnxruntime::ITensorAllocator&)> const&) () from /usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
#5 0x00007ffd37a17d94 in onnxruntime::SessionState::FinalizeSessionStateImpl(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, onnxruntime::KernelRegistryManager const&, onnxruntime::Node const*, onnxruntime::SessionOptions const&, bool, onnxruntime::InlinedHashMap<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned long, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, unsigned long> > >&, onnxruntime::InlinedHashMap<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, OrtMemoryInfo, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, OrtMemoryInfo> > > const&, bool) [clone .localalias] () from /usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
#6 0x00007ffd37a189aa in onnxruntime::SessionState::FinalizeSessionState(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, onnxruntime::KernelRegistryManager const&, bool, bool) () from /usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
#7 0x00007ffd372f7eda in onnxruntime::InferenceSession::Initialize() () from /usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
#8 0x00007ffd372b5cde in onnxruntime::python::InitializeSession(onnxruntime::InferenceSession*, std::function<void (onnxruntime::InferenceSession*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > > const&)>, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, 
std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::allocator<std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > const&, std::unordered_set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) () from /usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
#9 0x00007ffd372b6367 in pybind11::cpp_function::initialize<onnxruntime::python::addObjectMethods(pybind11::module_&, std::function<void (onnxruntime::InferenceSession*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > > const&)>)::{lambda(onnxruntime::python::PyInferenceSession*, 
std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::allocator<std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > const&, std::unordered_set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)#52}, void, onnxruntime::python::PyInferenceSession*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> > > > const&, std::vector<std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::allocator<std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > const&, std::unordered_set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, pybind11::name, pybind11::is_method, pybind11::sibling, char [42]>(onnxruntime::python::addObjectMethods(pybind11::module_&, std::function<void (onnxruntime::InferenceSession*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, 
std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > > const&)>)::{lambda(onnxruntime::python::PyInferenceSession*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::allocator<std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > const&, std::unordered_set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)#52}&&, void (*)(onnxruntime::python::PyInferenceSession*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::allocator<std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > const&, 
std::unordered_set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, char const (&) [42])::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call) () from /usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
#10 0x00007ffd3725213a in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) () from /usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
#11 0x00000000005f6939 in PyCFunction_Call ()
#12 0x00000000005f7506 in _PyObject_MakeTpCall ()
#13 0x000000000050b8d3 in ?? ()
#14 0x0000000000570556 in _PyEval_EvalFrameDefault ()
#15 0x00000000005697da in _PyEval_EvalCodeWithName ()
#16 0x00000000005f6ec3 in _PyFunction_Vectorcall ()
#17 0x000000000056b619 in _PyEval_EvalFrameDefault ()
#18 0x00000000005697da in _PyEval_EvalCodeWithName ()
#19 0x00000000005f6ec3 in _PyFunction_Vectorcall ()
#20 0x000000000059c427 in ?? ()
#21 0x00000000005f746f in _PyObject_MakeTpCall ()
#22 0x0000000000571019 in _PyEval_EvalFrameDefault ()
#23 0x00000000005697da in _PyEval_EvalCodeWithName ()
#24 0x00000000005f6ec3 in _PyFunction_Vectorcall ()
#25 0x000000000056c6d0 in _PyEval_EvalFrameDefault ()
#26 0x00000000005697da in _PyEval_EvalCodeWithName ()
#27 0x00000000005f6ec3 in _PyFunction_Vectorcall ()
#28 0x000000000056b4ed in _PyEval_EvalFrameDefault ()
#29 0x00000000005697da in _PyEval_EvalCodeWithName ()
#30 0x00000000005f6ec3 in _PyFunction_Vectorcall ()
#31 0x000000000056b4ed in _PyEval_EvalFrameDefault ()
#32 0x00000000005697da in _PyEval_EvalCodeWithName ()
#33 0x000000000050b1f0 in ?? ()
#34 0x000000000056c6d0 in _PyEval_EvalFrameDefault ()
#35 0x00000000005697da in _PyEval_EvalCodeWithName ()
#36 0x000000000050b1f0 in ?? ()
#37 0x000000000056c6d0 in _PyEval_EvalFrameDefault ()
#38 0x000000000050b07e in ?? ()
#39 0x000000000056b4ed in _PyEval_EvalFrameDefault ()
#40 0x00000000005f6ce6 in _PyFunction_Vectorcall ()
#41 0x000000000056b619 in _PyEval_EvalFrameDefault ()
#42 0x00000000005697da in _PyEval_EvalCodeWithName ()
#43 0x00000000005f6ec3 in _PyFunction_Vectorcall ()
#44 0x000000000050b17c in ?? ()
#45 0x00000000005f60b2 in PyObject_Call ()
#46 0x000000000056ccfc in _PyEval_EvalFrameDefault ()
#47 0x00000000005697da in _PyEval_EvalCodeWithName ()
#48 0x00000000005f6ec3 in _PyFunction_Vectorcall ()
#49 0x000000000059d21e in ?? ()
#50 0x00000000005f7506 in _PyObject_MakeTpCall ()
#51 0x0000000000570787 in _PyEval_EvalFrameDefault ()
#52 0x00000000005697da in _PyEval_EvalCodeWithName ()
#53 0x00000000005f6ec3 in _PyFunction_Vectorcall ()
#54 0x000000000050b17c in ?? ()
#55 0x00000000005f60b2 in PyObject_Call ()
#56 0x000000000056ccfc in _PyEval_EvalFrameDefault ()
#57 0x00000000005697da in _PyEval_EvalCodeWithName ()
#58 0x00000000005f6ec3 in _PyFunction_Vectorcall ()
#59 0x000000000059d21e in ?? ()
#60 0x00000000005f7506 in _PyObject_MakeTpCall ()
#61 0x0000000000570787 in _PyEval_EvalFrameDefault ()
#62 0x00000000005697da in _PyEval_EvalCodeWithName ()
#63 0x00000000005f6ec3 in _PyFunction_Vectorcall ()
#64 0x000000000050b17c in ?? ()
#65 0x00000000005f60b2 in PyObject_Call ()
#66 0x000000000056ccfc in _PyEval_EvalFrameDefault ()
#67 0x00000000005697da in _PyEval_EvalCodeWithName ()
#68 0x00000000005f6ec3 in _PyFunction_Vectorcall ()
#69 0x000000000059d21e in ?? ()
#70 0x00000000005f7506 in _PyObject_MakeTpCall ()
#71 0x0000000000570787 in _PyEval_EvalFrameDefault ()
#72 0x00000000005f6ce6 in _PyFunction_Vectorcall ()
#73 0x000000000056b619 in _PyEval_EvalFrameDefault ()
#74 0x00000000005f6ce6 in _PyFunction_Vectorcall ()
#75 0x000000000056b619 in _PyEval_EvalFrameDefault ()
#76 0x00000000005697da in _PyEval_EvalCodeWithName ()
#77 0x00000000005f6ec3 in _PyFunction_Vectorcall ()
#78 0x000000000059c427 in ?? ()
#79 0x00000000005f746f in _PyObject_MakeTpCall ()
#80 0x0000000000571019 in _PyEval_EvalFrameDefault ()
#81 0x00000000005697da in _PyEval_EvalCodeWithName ()
#82 0x000000000068e547 in PyEval_EvalCode ()
#83 0x000000000067dbf1 in ?? ()
#84 0x000000000067dc6f in ?? ()
#85 0x000000000067dd11 in ?? ()
#86 0x000000000067fe37 in PyRun_SimpleFileExFlags ()
#87 0x00000000006b7c82 in Py_RunMain ()
#88 0x00000000006b800d in Py_BytesMain ()
#89 0x00007ffff7df1083 in __libc_start_main (main=0x4ef140 <main>, argc=4, argv=0x7fffffffe548, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe538) at ../csu/libc-start.c:308
#90 0x00000000005fb85e in _start ()
(gdb)
Urgency
Urgent. This is blocking ROCm builds.
Target platform
Navi21
Build script
set -e
ulimit -c unlimited
cd /onnxruntime
pip3 install -r requirements-dev.txt
# Add newer cmake to the path
export PATH="/opt/cmake/bin:$PATH"
export CXXFLAGS="-D__HIP_PLATFORM_AMD__=1 -w"
./build.sh --config Release --update --build --build_wheel --parallel --skip_tests \
  --cmake_extra_defines CMAKE_HIP_COMPILER=/opt/rocm/llvm/bin/clang++ \
  --cmake_extra_defines ONNXRUNTIME_VERSION="$(cat ./VERSION_NUMBER)" \
  --rocm_home /opt/rocm --rocm_version="$(cat /opt/rocm/.info/version-dev)" \
  --use_migraphx --migraphx_home /opt/rocm \
  --allow_running_as_root
cd build/Linux/Release
# Append tests to the MIGraphX exclusion list, then run the ORT test launcher (on failure, print a backtrace from the core dump)
echo 'InferenceSessionTests.CheckRunProfilerWithSessionOptions' >> ../../../tools/ci_build/github/pai/migraphx-excluded-tests.txt
echo 'InferenceSessionTests.CheckRunProfilerWithSessionOptions2' >> ../../../tools/ci_build/github/pai/migraphx-excluded-tests.txt
echo 'InferenceSessionTests.Test3LayerNestedSubgraph' >> ../../../tools/ci_build/github/pai/migraphx-excluded-tests.txt
echo 'InferenceSessionTests.Test2LayerNestedSubgraph' >> ../../../tools/ci_build/github/pai/migraphx-excluded-tests.txt
../../../tools/ci_build/github/pai/migraphx_test_launcher.sh || (gdb ./onnxruntime_test_all core -batch -ex bt && exit 1)
Error / output
test_parity_gelu.py and test_parity_layernorm.py fail with accuracy errors.
root@f12f6ee19192:/onnxruntime/onnxruntime/test/python/transformers# python3 test_parity_gelu.py --no_optimize
Testing: device=cuda, float16=False, optimized=False, batch_size=4, sequence_length=2, hidden_size=1, formula=1, fp32_gelu_op=True
/usr/local/lib/python3.8/dist-packages/onnx/mapping.py:27: DeprecationWarning: `np.object` is a deprecated alias for the builtin `object`. To silence this warning, use `object` by itself. Doing this will not modify any behavior and is safe.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
int(TensorProto.STRING): np.dtype(np.object)
====== Diagnostic Run torch.onnx.export version 2.1.0.dev20230706+rocm5.5 ======
verbose: False, log level: 40
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================
exported: ./temp/gelu_1_fp32.onnx
[FAILED] Passed_cases=1/2; Max_diff=13.15656566619873; Diff_count=2
F
======================================================================
FAIL: test_cuda (__main__.TestGeluParity)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test_parity_gelu.py", line 236, in test_cuda
self.run_one(self.optimized, gpu, hidden_size=self.hidden_size, formula=i, verbose=self.verbose)
File "test_parity_gelu.py", line 188, in run_one
self.run_test(
File "test_parity_gelu.py", line 184, in run_test
self.assertTrue(num_failure == 0, "Failed: " + test_name)
AssertionError: False is not true : Failed: device=cuda, float16=False, optimized=False, batch_size=4, sequence_length=2, hidden_size=1, formula=1, fp32_gelu_op=True
----------------------------------------------------------------------
Ran 1 test in 1.470s
FAILED (failures=1)
For LayerNorm (test_parity_layernorm.py):
[FAILED] Passed_cases=2/100; Max_diff=7.962882041931152; Diff_count=100
F
======================================================================
FAIL: test_cuda (__main__.TestLayerNormParity)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test_parity_layernorm.py", line 307, in test_cuda
self.run_one(self.optimized, gpu, hidden_size=self.hidden_size, run_extra_tests=True, verbose=self.verbose)
File "test_parity_layernorm.py", line 239, in run_one
self.run_test(
File "test_parity_layernorm.py", line 233, in run_test
self.assertTrue(num_failure == 0, "Failed: " + test_name)
AssertionError: False is not true : Failed: device=cuda, float16=False, optimized=False, batch_size=4, sequence_length=2, hidden_size=768, epsilon=1e-05, cast_fp16=True, cast_onnx_only=False, formula=0
----------------------------------------------------------------------
Ran 2 tests in 1.860s
FAILED (failures=1)
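The [FAILED] summary lines above report how many test cases passed, the largest absolute difference seen, and (apparently) how many cases differed at all; in the GELU run Diff_count=2 with only 1/2 passing, so a case can differ slightly and still pass. The hypothetical helper below sketches that bookkeeping under those assumptions, using an absolute tolerance only. The real helper in the ORT transformer test utilities may compute it differently (for example with numpy.allclose, which also applies a relative tolerance).

```python
def summarize_parity(reference_cases, candidate_cases, atol=1e-4):
    """Hypothetical helper: compare per-case outputs and build a summary line.

    Each case is a flat sequence of floats. A case passes when every element
    differs by at most atol (absolute tolerance only).
    """
    if len(reference_cases) != len(candidate_cases):
        raise ValueError("reference and candidate must have the same number of cases")
    max_diffs = []
    for ref, cand in zip(reference_cases, candidate_cases):
        if len(ref) != len(cand):
            raise ValueError("each case must have matching output lengths")
        max_diffs.append(max((abs(r - c) for r, c in zip(ref, cand)), default=0.0))
    passed = sum(1 for d in max_diffs if d <= atol)
    diff_count = sum(1 for d in max_diffs if d > 0)  # cases with any difference at all
    flag = "[OK]" if passed == len(max_diffs) else "[FAILED]"
    max_diff = max(max_diffs, default=0.0)
    return f"{flag} Passed_cases={passed}/{len(max_diffs)}; Max_diff={max_diff}; Diff_count={diff_count}"


if __name__ == "__main__":
    # Second case is off by 0.5, far above the tolerance.
    print(summarize_parity([[1.0, 2.0], [3.0]], [[1.0, 2.0], [3.5]]))
```

Under this reading, Passed_cases=2/100 with Diff_count=100 in the LayerNorm run means every case differed from PyTorch and almost all of them by more than the tolerance.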
Visual Studio Version
No response
GCC / Compiler Version
No response