forked from openbmc/pldm
-
Notifications
You must be signed in to change notification settings - Fork 1
Commit c0fd23c
Fix crash when handling socket closure early
When the MCTP daemon closes the socket, the event loop may still
deliver `EPOLLIN` alongside `POLLHUP`. Previously, the callback did not
explicitly check for `POLLHUP` before attempting to receive data via
`recvMsg()`.
This led to a scenario where `recvMsg()` was called on a socket that
had already been closed, causing undefined behavior or memory
corruption (e.g., during buffer allocation or cleanup). While
`recvMsg()` returned a failure code (`PLDM_REQUESTER_RECV_FAIL`), the
subsequent call to `io.get_event().exit(0)` would result in a crash due
to the corrupted state.
This change adds an early check for `POLLHUP` (and `POLLERR`) at the
beginning of the IO callback and exits the event loop immediately when
detected. This prevents any further I/O operations on a dead socket and
ensures a clean shutdown without triggering undefined behavior.
Here is the gdb backtrace & its corresponding source code[1]:
```
Core was generated by `/usr/bin/pldmd'.
Program terminated with signal SIGABRT, Aborted.
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=6, no_tid=<optimized out>) at pthread_kill.c:44
#1 0x763c9344 in __GI_raise (sig=sig@entry=6) at /usr/src/debug/glibc/2.39+git/sysdeps/posix/raise.c:26
#2 0x763b3f80 in __GI_abort () at abort.c:79
openbmc#3 0x764026e4 in __libc_message_impl (fmt=0x764e8050 "%s\n") at /usr/src/debug/glibc/2.39+git/sysdeps/posix/libc_fatal.c:132
openbmc#4 0x76419d8c in malloc_printerr (str=<optimized out>) at malloc.c:5772
openbmc#5 0x7641a950 in _int_free_create_chunk (av=0x20, av@entry=0x764f05fc <main_arena>, p=0x1000, p@entry=0x1c7f898, size=<optimized out>, size@entry=152, nextchunk=nextchunk@entry=0x1c7f930, nextsize=nextsize@entry=32) at malloc.c:4735
openbmc#6 0x7641bf94 in _int_free_merge_chunk (av=0x764f05fc <main_arena>, p=0x1c7f898, size=152) at malloc.c:4700
openbmc#7 0x7641c298 in _int_free (av=<optimized out>, p=<optimized out>, have_lock=<optimized out>, have_lock@entry=0) at malloc.c:4646
openbmc#8 0x7641ec20 in __GI___libc_free (mem=mem@entry=0x1c7f8a0) at malloc.c:3398
openbmc#9 0x7681a3b8 in source_free (s=0x1c7f8a0) at /usr/src/debug/systemd/255.4/src/libsystemd/sd-event/sd-event.c:1125
openbmc#10 0x7681e6d8 in event_source_free (s=<optimized out>) at /usr/src/debug/systemd/255.4/src/libsystemd/sd-event/sd-event.c:2590
openbmc#11 sd_event_source_unref (p=<optimized out>) at /usr/src/debug/systemd/255.4/src/libsystemd/sd-event/sd-event.c:2595
openbmc#12 0x767eb624 in sd_bus_detach_event (bus=<optimized out>) at /usr/src/debug/systemd/255.4/src/libsystemd/sd-bus/sd-bus.c:3887
openbmc#13 0x767e0888 in sd_bus_close (bus=0x1bb4950) at /usr/src/debug/systemd/255.4/src/libsystemd/sd-bus/sd-bus.c:1772
openbmc#14 sd_bus_close (bus=0x1bb4950) at /usr/src/debug/systemd/255.4/src/libsystemd/sd-bus/sd-bus.c:1759
openbmc#15 0x767eea30 in quit_callback (event=<optimized out>, userdata=0x1bb4950) at /usr/src/debug/systemd/255.4/src/libsystemd/sd-bus/sd-bus.c:3726
openbmc#16 0x76826288 in source_dispatch (s=s@entry=0x1c1d718) at /usr/src/debug/systemd/255.4/src/libsystemd/sd-event/sd-event.c:4227
openbmc#17 0x7682697c in dispatch_exit (e=0x1bb42c0) at /usr/src/debug/systemd/255.4/src/libsystemd/sd-event/sd-event.c:4349
openbmc#18 sd_event_dispatch (e=<optimized out>, e@entry=0x1bb42c0) at /usr/src/debug/systemd/255.4/src/libsystemd/sd-event/sd-event.c:4801
openbmc#19 0x768282c8 in sd_event_run (e=<optimized out>, timeout=<optimized out>) at /usr/src/debug/systemd/255.4/src/libsystemd/sd-event/sd-event.c:4869
openbmc#20 0x76828558 in sd_event_loop (e=<optimized out>) at /usr/src/debug/systemd/255.4/src/libsystemd/sd-event/sd-event.c:4891
openbmc#21 0x7678f514 in sdeventplus::internal::SdEventImpl::sd_event_loop (event=<optimized out>, this=<optimized out>) at /usr/src/debug/sdeventplus/0.1+git/src/sdeventplus/internal/sdevent.cpp:86
openbmc#22 sdeventplus::Event::loop (this=this@entry=0x7ed298e0) at /usr/src/debug/sdeventplus/0.1+git/src/sdeventplus/event.cpp:81
openbmc#23 0x004f904c in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/pldm/1.0+git/pldmd/pldmd.cpp:433
```
[1]: https://github.com/ibm-openbmc/pldm/blob/424d866fe97b20343e7733fcab462e0ea1035379/pldmd/pldmd.cpp#L433
```
Steps to reproduce:
Keep reboot bmc, sometimes the pldmd will generate the coredump.
Backtrace from the coredump:
(gdb) bt
#0 0x00001726 in ?? ()
#1 0x00551fc4 in std::__n4861::coroutine_handle<void>::resume (this=0x1721648) at /usr/include/c++/13.2.0/coroutine:135
#2 std::__n4861::coroutine_handle<void>::operator() (this=0x1721648) at /usr/include/c++/13.2.0/coroutine:133
openbmc#3 pldm::requester::SendRecvPldmMsg<pldm::requester::Handler<pldm::requester::Request> >::HandleResponse (this=0x1721648, eid=<optimized out>, response=<optimized out>,
length=<optimized out>) at /usr/src/debug/pldm/1.0+git/requester/handler.hpp:462
openbmc#4 0x00544454 in pldm::mctp_socket::Handler::initSocket(int, int, std::vector<unsigned char, std::allocator<unsigned char> > const&)::{lambda(sdeventplus::source::IO&, int, unsigned int)#1}::operator()(sdeventplus::source::IO&, int, unsigned int) () at /usr/include/function2/function2.hpp:1044
openbmc#5 0x76df0210 in fu2::abi_400::detail::type_erasure::tables::vtable<fu2::abi_400::detail::property<true, false, void (sdeventplus::source::IO&, int, unsigned int)> >::invoke<0u, fu2::abi_400::detail::type_erasure::data_accessor*, unsigned int const&, sdeventplus::source::IO&, int, unsigned int>(fu2::abi_400::detail::type_erasure::data_accessor*&&, unsigned int const&, sdeventplus::source::IO&, int&&, unsigned int&&) const (this=<optimized out>) at /usr/include/function2/function2.hpp:1046
openbmc#6 fu2::abi_400::detail::type_erasure::erasure<true, fu2::abi_400::detail::config<true, false, fu2::capacity_default>, fu2::abi_400::detail::property<true, false, void (sdeventplus::source::IO&, int, unsigned int)> >::invoke<0u, fu2::abi_400::detail::type_erasure::erasure<true, fu2::abi_400::detail::config<true, false, fu2::capacity_default>, fu2::abi_400::detail::property<true, false, void (sdeventplus::source::IO&, int, unsigned int)> >&, sdeventplus::source::IO&, int, unsigned int>(fu2::abi_400::detail::type_erasure::erasure<true, fu2::abi_400::detail::config<true, false, fu2::capacity_default>, fu2::abi_400::detail::property<true, false, void (sdeventplus::source::IO&, int, unsigned int)> >&, sdeventplus::source::IO&, int&&, unsigned int&&) (erasure=...) at /usr/include/function2/function2.hpp:1268
openbmc#7 fu2::abi_400::detail::type_erasure::invocation_table::operator_impl<0u, fu2::abi_400::detail::function<fu2::abi_400::detail::config<true, false, fu2::capacity_default>, fu2::abi_400::detail::property<true, false, void (sdeventplus::source::IO&, int, unsigned int)> >, void (sdeventplus::source::IO&, int, unsigned int)>::operator()(sdeventplus::source::IO&, int, unsigned int) (args#2=1, args#1=8, args#0=..., this=<optimized out>) at /usr/include/function2/function2.hpp:826
openbmc#8 std::__invoke_impl<void, fu2::abi_400::detail::function<fu2::abi_400::detail::config<true, false, fu2::capacity_default>, fu2::abi_400::detail::property<true, false, void (sdeventplus::source::IO&, int, unsigned int)> >&, sdeventplus::source::detail::IOData&, int&, unsigned int&>(std::__invoke_other, fu2::abi_400::detail::function<fu2::abi_400::detail::config<true, false, fu2::capacity_default>, fu2::abi_400::detail::property<true, false, void (sdeventplus::source::IO&, int, unsigned int)> >&, sdeventplus::source::detail::IOData&, int&, unsigned int&) (__f=...) at /usr/include/c++/13.2.0/bits/invoke.h:61
openbmc#9 std::__invoke<fu2::abi_400::detail::function<fu2::abi_400::detail::config<true, false, fu2::capacity_default>, fu2::abi_400::detail::property<true, false, void (sdeventplus::source::IO&, int, unsigned int)> >&, sdeventplus::source::detail::IOData&, int&, unsigned int&>(fu2::abi_400::detail::function<fu2::abi_400::detail::config<true, false, fu2::capacity_default>, fu2::abi_400::detail::property<true, false, void (sdeventplus::source::IO&, int, unsigned int)> >&, sdeventplus::source::detail::IOData&, int&, unsigned int&) (
__fn=...) at /usr/include/c++/13.2.0/bits/invoke.h:96
openbmc#10 std::invoke<fu2::abi_400::detail::function<fu2::abi_400::detail::config<true, false, fu2::capacity_default>, fu2::abi_400::detail::property<true, false, void (sdeventplus::source::IO&, int, unsigned int)> >&, sdeventplus::source::detail::IOData&, int&, unsigned int&>(fu2::abi_400::detail::function<fu2::abi_400::detail::config<true, false, fu2::capacity_default>, fu2::abi_400::detail::property<true, false, void (sdeventplus::source::IO&, int, unsigned int)> >&, sdeventplus::source::detail::IOData&, int&, unsigned int&) (
__fn=...) at /usr/include/c++/13.2.0/functional:113
openbmc#11 sdeventplus::source::Base::sourceCallback<fu2::abi_400::detail::function<fu2::abi_400::detail::config<true, false, fu2::capacity_default>, fu2::abi_400::detail::property<true, false, void (sdeventplus::source::IO&, int, unsigned int)> >, sdeventplus::source::detail::IOData, &sdeventplus::source::IO::get_callback, int&, unsigned int&>(char const*, sd_event_source*, void*, int&, unsigned int&) (name=0x76e01ed4 "ioCallback", userdata=<optimized out>) at /usr/src/debug/sdeventplus/0.1+git/src/sdeventplus/source/base.hpp:213
openbmc#12 sdeventplus::source::IO::ioCallback (source=source@entry=warning: could not convert 'sd_event_source' from the host encoding (UTF-8) to UTF-32.
This normally should not happen, please file a bug report.
0x172a3a8, fd=8, revents=1, userdata=<optimized out>)
at /usr/src/debug/sdeventplus/0.1+git/src/sdeventplus/source/io.cpp:89
openbmc#13 0x76e89528 in source_dispatch (s=0x172a3a8) at /usr/src/debug/systemd/255.4/src/libsystemd/sd-event/sd-event.c:4187
openbmc#14 0x76e89838 in sd_event_dispatch (e=<optimized out>, e@entry=0x17193b8) at /usr/src/debug/systemd/255.4/src/libsystemd/sd-event/sd-event.c:4808
openbmc#15 0x76e8b2c8 in sd_event_run (e=<optimized out>, timeout=<optimized out>) at /usr/src/debug/systemd/255.4/src/libsystemd/sd-event/sd-event.c:4869
openbmc#16 0x76e8b558 in sd_event_loop (e=<optimized out>) at /usr/src/debug/systemd/255.4/src/libsystemd/sd-event/sd-event.c:4891
openbmc#17 0x76df2514 in sdeventplus::internal::SdEventImpl::sd_event_loop (event=<optimized out>, this=<optimized out>)
at /usr/src/debug/sdeventplus/0.1+git/src/sdeventplus/internal/sdevent.cpp:89
openbmc#18 sdeventplus::Event::loop (this=this@entry=0x7eadc860) at /usr/src/debug/sdeventplus/0.1+git/src/sdeventplus/event.cpp:82
openbmc#19 0x005003e4 in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/pldm/1.0+git/pldmd/pldmd.cpp:370
https://gitlab-master.nvidia.com/dgx/bmc/pldm/-/blob/BFBMC-24.10_br/pldmd/pldmd.cpp?ref_type=heads#L370
```
Tested:
1. The firmware can be updated normally as before by PLDM.
2. No core dump generated by pldmd after keeping reboot BMC for 1000
cycles.
Change-Id: Idc5fb2cb849f2af7c12b67cdd509d81b2587c8ba
Signed-off-by: Manojkiran Eda <[email protected]>1 parent cb3862f commit c0fd23cCopy full SHA for c0fd23c
File tree
Expand file treeCollapse file tree
1 file changed
+7
-1
lines changedOpen diff view settings
Filter options
- pldmd
Expand file treeCollapse file tree
1 file changed
+7
-1
lines changedOpen diff view settings
Collapse file
pldmd/mctp_demux_handler.cpp
Copy file name to clipboardExpand all lines: pldmd/mctp_demux_handler.cpp+7-1Lines changed: 7 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
127 | 127 | | |
128 | 128 | | |
129 | 129 | | |
130 | | - | |
| 130 | + | |
| 131 | + | |
| 132 | + | |
| 133 | + | |
| 134 | + | |
| 135 | + | |
| 136 | + | |
131 | 137 | | |
132 | 138 | | |
133 | 139 | | |
| |||
0 commit comments