Description
After enough runs (7-9K, but in one instance, it was 42 times) on my fastest ARM machine - Mac mini M2, the tst-pin.cc
would hang like this:
(gdb) osv info threads --timers
id address name cpu status vruntime total_cpu_time location
1 (0x400040771040) reclaimer cpu0 waiting 4.78214e-25 833 condvar::wait(lockfree::mutex*, sched::timer*) at core/condvar.cc:53
2 (0x400040857040) page_pool_l2 cpu0 waiting 2.04507e-22 309708 memory::page_pool::l2::fill_thread() at core/mempool.cc:1525
3 (0x400040882040) itimer-real cpu? unstarted 0 0 ?? at arch/aarch64/entry.S:135
4 (0x4000408a4040) itimer-virt cpu? unstarted 0 0 ?? at arch/aarch64/entry.S:135
5 (0x4000408e7040) balancer0 cpu0 waiting 1.6479e-23 18625 ?? at arch/aarch64/entry.S:135
timers: 0.107022875
6 (0x400040909040) rcu0 cpu0 waiting 8.27181e-25 1417 osv::rcu::cpu_quiescent_state_thread::do_work() at core/rcu.cc:184
7 (0x40004092e040) page_pool_l1_0 cpu0 waiting 1.15598e-22 110247 memory::page_pool::l1::fill_thread() at core/mempool.cc:1410
8 (0x400040950040) percpu0 cpu0 waiting 2.1972e-25 375 workman::call_of_duty() at core/percpu-worker.cc:92
9 (0x400040973040) async_worker0 cpu0 waiting 2.45569e-25 416 async::async_worker::run() at core/async.cc:177
10 (0x400040996040) idle0 cpu0 running inf 17085277590378 std::_Function_handler<void(), sched::cpu::init_idle_thread()::<lambda()> >::_M_invoke(const std::_Any_data &) at /usr/include/c++/14/bits/std_function.h:290
11 (0x4000409b8040) idle1 cpu1 running inf 17086932350426 std::_Function_handler<void(), sched::cpu::init_idle_thread()::<lambda()> >::_M_invoke(const std::_Any_data &) at /usr/include/c++/14/bits/std_function.h:290
12 (0x400040a01040) idle2 cpu2 running inf 17086912847839 std::_Function_handler<void(), sched::cpu::init_idle_thread()::<lambda()> >::_M_invoke(const std::_Any_data &) at /usr/include/c++/14/bits/std_function.h:290
13 (0x400040a4a040) idle3 cpu3 running inf 17086897666638 std::_Function_handler<void(), sched::cpu::init_idle_thread()::<lambda()> >::_M_invoke(const std::_Any_data &) at /usr/include/c++/14/bits/std_function.h:290
14 (0x400040a93040) >init cpu0 waiting 6.15216e-24 708 std::_Function_handler<void(), sched::thread::reaper::reaper()::<lambda()> >::_M_invoke(const std::_Any_data &) at /usr/include/c++/14/bits/std_function.h:290
15 (0x400040aba040) rcu3 cpu3 waiting 6.44942e-24 1625 osv::rcu::cpu_quiescent_state_thread::do_work() at core/rcu.cc:184
16 (0x400040add040) thread taskq cpu0 waiting 6.29433e-24 958 synch_port::_msleep(void*, mtx*, int, char const*, int) at bsd/porting/synch.cc:102
17 (0x400040af5040) page_pool_l1_3 cpu3 waiting 6.3848e-24 1500 memory::page_pool::l1::fill_thread() at core/mempool.cc:1407
18 (0x400040b06040) callout cpu0 waiting 3.74299e-23 4791 _callout_thread() at bsd/porting/callout.cc:138
timers: 0.207748666
19 (0x400040b3a040) percpu3 cpu3 waiting 6.24263e-24 1250 workman::call_of_duty() at core/percpu-worker.cc:92
20 (0x400040b5d040) async_worker3 cpu3 waiting 6.17801e-24 1125 async::async_worker::run() at core/async.cc:177
21 (0x400040b84040) rcu1 cpu1 waiting 6.34603e-24 1417 osv::rcu::cpu_quiescent_state_thread::do_work() at core/rcu.cc:184
22 (0x400040bae040) page_pool_l1_1 cpu1 waiting 2.33508e-21 3944959 memory::page_pool::l1::fill_thread() at core/mempool.cc:1410
23 (0x400040bd4040) percpu1 cpu1 waiting 6.11338e-24 1000 workman::call_of_duty() at core/percpu-worker.cc:92
24 (0x400040bfc040) async_worker1 cpu1 waiting 6.16508e-24 1083 async::async_worker::run() at core/async.cc:177
25 (0x400040c2e040) rcu2 cpu2 waiting 5.82904e-24 500 osv::rcu::cpu_quiescent_state_thread::do_work() at core/rcu.cc:184
26 (0x400040c56040) page_pool_l1_2 cpu2 waiting 8.11542e-23 105832 memory::page_pool::l1::fill_thread() at core/mempool.cc:1410
27 (0x400040c78040) percpu2 cpu2 waiting 6.17801e-24 1083 workman::call_of_duty() at core/percpu-worker.cc:92
28 (0x400040d3f040) async_worker2 cpu2 waiting 6.21678e-24 1167 async::async_worker::run() at core/async.cc:177
29 (0x4000412a4040) netisr cpu0 waiting 1.28988e-23 1999 netisr_osv_thread_wrapper(void (*)(void*), void*) at bsd/sys/net/netisr1.cc:26
30 (0x400041355040) >init cpu0 waiting 3.01762e-21 5137084 sched::thread::join() at core/sched.cc:1565
joining on <optimized out>
31 (0x400041371040) virtio-net-rx cpu0 waiting 3.42685e-27 338039 virtio::virtio_driver::wait_for_queue(virtio::vring*, bool (virtio::vring::*)() const) at drivers/virtio.cc:159
32 (0x4000413a4040) virtio-tx-0 cpu0 waiting 2.9737e-30 500 osv::xmitter<virtio::net::txq, 4096u, std::function<bool ()>, boost::iterators::function_output_iterator<osv::xmitter_functor<virtio::net::txq> > >::poll_until() at include/osv/percpu_xmit.hh:401
33 (0x4000413d7040) virtio-tx-1 cpu1 waiting 1.05974e-29 1875 osv::xmitter<virtio::net::txq, 4096u, std::function<bool ()>, boost::iterators::function_output_iterator<o--Type <RET> for more, q to quit, c to continue without paging--c
sv::xmitter_functor<virtio::net::txq> > >::poll_until() at include/osv/percpu_xmit.hh:401
34 (0x40004140a040) virtio-tx-2 cpu2 waiting 1.15012e-29 2000 osv::xmitter<virtio::net::txq, 4096u, std::function<bool ()>, boost::iterators::function_output_iterator<osv::xmitter_functor<virtio::net::txq> > >::poll_until() at include/osv/percpu_xmit.hh:401
35 (0x40004143d040) virtio-tx-3 cpu3 waiting 6.33219e-30 1125 osv::xmitter<virtio::net::txq, 4096u, std::function<bool ()>, boost::iterators::function_output_iterator<osv::xmitter_functor<virtio::net::txq> > >::poll_until() at include/osv/percpu_xmit.hh:401
36 (0x40004146f040) virtio-blk cpu0 waiting 4.61412e-24 7958 virtio::virtio_driver::wait_for_queue(virtio::vring*, bool (virtio::vring::*)() const) at drivers/virtio.cc:159
37 (0x400041711040) virtio-rng cpu0 waiting 1.47342e-24 2499 condvar::wait(lockfree::mutex*, sched::timer*) at core/condvar.cc:53
38 (0x400041736040) pl011-input cpu0 waiting 1.82755e-23 30834 console::LineDiscipline::read_poll(console::console_driver*) at drivers/line-discipline.cc:86
39 (0x400041781040) rand_harvestq cpu0 waiting 8.65955e-25 1458 synch_port::_msleep(void*, mtx*, int, char const*, int) at bsd/porting/synch.cc:102
timers: 0.112718500
40 (0x4000417b1040) dhcp cpu0 waiting 6.4197e-23 108126 dhcp::dhcp_worker::dhcp_worker_fn() at core/dhcp.cc:830
41 (0x400041809040) /tests/tst-pin. cpu0 waiting 5.59126e-23 82792 main(int, char**) at /usr/include/c++/14/bits/chrono.h:573
timers: 0.022137000
44 (0x400041823040) >/tests/tst-pin cpu1 terminated 1.76604e-24 3000 ?? at arch/aarch64/entry.S:135
46 (0x400041845040) >/tests/tst-pin cpu0 waiting 8.02762e-20 100013125 condvar::wait(lockfree::mutex*, sched::timer*) at core/condvar.cc:53
Number of threads: 43
(gdb) p 'sched::cpus'[0]->timers._list._non_empty_buckets
$1 = std::bitset = {[35] = 1, [36] = 1, [38] = 1}
(gdb) p 'sched::cpus'[1]->timers._list._non_empty_buckets
$2 = std::bitset = {[36] = 1}
(gdb) p 'sched::cpus'[1]->timers._last
$3 = osv::clock::uptime time_point = { 17087496162125ns }
(gdb) p 'sched::cpus'[0]->timers._last
$4 = osv::clock::uptime time_point = { 18658549ns }
(gdb) p 'sched::cpus'[2]->timers._last
$5 = osv::clock::uptime time_point = { 17087495597291ns }
(gdb) p 'sched::cpus'[3]->timers._last
$6 = osv::clock::uptime time_point = { 17087495608666ns }
(gdb) p 'sched::cpus'[0]->timers._list._last
$7 = 13595583
(gdb) p 'sched::cpus'[0]->timers._list._next
$8 = 18658549
(gdb) osv thread 41
(gdb) bt
#0 reschedule_from_interrupt () at arch/aarch64/sched.S:72
#1 0x0000000fc0244d9c in sched::cpu::schedule () at core/sched.cc:238
#2 0x0000000fc0246bd8 in sched::thread::wait (this=0x0) at core/sched.cc:1425
#3 sched::thread::do_wait_until<sched::noninterruptible, sched::thread::dummy_lock, sched::thread::sleep_impl(sched::timer&)::<lambda()> > (mtx=<synthetic pointer>..., pred=...)
at include/osv/sched.hh:1250
#4 sched::thread::wait_until<sched::thread::sleep_impl(sched::timer&)::<lambda()> > (pred=...) at include/osv/sched.hh:1261
#5 sched::thread::sleep_impl (t=...) at core/sched.cc:1625
#6 0x0000100000002a74 in sched::thread::sleep<long, std::ratio<1l, 1000l> > (duration=duration@entry=std::chrono::duration = { 1ms })
at /home/wkozaczuk/projects/osv-master/include/osv/sched.hh:1450
#7 0x0000100000001884 in main (argc=<optimized out>, argv=<optimized out>) at /usr/include/c++/14/bits/chrono.h:573
#8 0x0000000fc0263988 in osv::application::run_main (this=0x600040edea10) at core/app.cc:467
#9 0x0000000fc0263b58 in operator() (__closure=0x0, app=<optimized out>) at core/app.cc:248
#10 _FUN () at core/app.cc:250
#11 0x0000000fc029a484 in operator() (__closure=<optimized out>) at libc/pthread.cc:119
#12 std::__invoke_impl<void, pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::<lambda()>&> (__f=...)
at /usr/include/c++/14/bits/invoke.h:61
#13 std::__invoke_r<void, pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::<lambda()>&> (__fn=...)
at /usr/include/c++/14/bits/invoke.h:150
#14 std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/14/bits/std_function.h:290
#15 0x0000000fc02494c4 in sched::thread::main (this=0x400041809040) at core/sched.cc:1419
#16 sched::thread_main_c (t=0x400041809040) at arch/aarch64/arch-switch.hh:189
#17 0x0000000fc0173788 in thread_main () at arch/aarch64/entry.S:135
(gdb) osv thread 44
(gdb) bt
#0 reschedule_from_interrupt () at arch/aarch64/sched.S:72
#1 0x0000000fc0247e5c in sched::cpu::schedule () at include/osv/sched.hh:1516
#2 sched::thread::complete (this=0x400040784738, this@entry=0x400041823040) at core/sched.cc:1518
#3 0x0000000fc02494cc in sched::thread_main_c (t=0x400041823040) at arch/aarch64/arch-switch.hh:190
#4 0x0000000fc0173788 in thread_main () at arch/aarch64/entry.S:135
(gdb) osv thread 46
(gdb) bt
#0 reschedule_from_interrupt () at arch/aarch64/sched.S:72
#1 0x0000000fc0244d9c in sched::cpu::schedule () at core/sched.cc:238
#2 0x0000000fc0246584 in sched::thread::wait (this=this@entry=0x400041845040) at core/sched.cc:1425
#3 0x0000000fc0226828 in sched::thread::do_wait_until<sched::noninterruptible, sched::thread::dummy_lock, waiter::wait(sched::timer*) const::{lambda()#1}>(sched::thread::dummy_lock&, waiter::wait(sched::timer*) const::{lambda()#1}) (mtx=<synthetic pointer>..., pred=...) at include/osv/sched.hh:1250
#4 sched::thread::wait_until<waiter::wait(sched::timer*) const::{lambda()#1}>(waiter::wait(sched::timer*) const::{lambda()#1}) (pred=...) at include/osv/sched.hh:1261
#5 waiter::wait (this=0x400040783040, tmr=0x400040784738) at include/osv/wait_record.hh:46
#6 condvar::wait (this=0x4000408e4cf0, user_mutex=0x600040770890, tmr=0x400040784738, tmr@entry=0x0) at core/condvar.cc:53
#7 0x00001000000028c0 in condvar::wait (this=<optimized out>, user_mutex=..., tmr=0x0) at /home/wkozaczuk/projects/osv-master/include/osv/condvar.h:226
#8 operator() (__closure=0x600040ed25a0) at /home/wkozaczuk/projects/osv-master/tests/tst-pin.cc:99
#9 std::__invoke_impl<void, main(int, char**)::<lambda()>&> (__f=...) at /usr/include/c++/14/bits/invoke.h:61
#10 std::__invoke_r<void, main(int, char**)::<lambda()>&> (__fn=...) at /usr/include/c++/14/bits/invoke.h:150
#11 std::_Function_handler<void(), main(int, char**)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/14/bits/std_function.h:290
#12 0x0000000fc02494c4 in sched::thread::main (this=0x400041845040) at core/sched.cc:1419
#13 sched::thread_main_c (t=0x400041845040) at arch/aarch64/arch-switch.hh:189
#14 0x0000000fc0173788 in thread_main () at arch/aarch64/entry.S:135
From what I could tell, the main thread 41 would get stuck in sleep(1ms)
- line 105, where thread 46 would wait on condvar. So the real issue here is thread 41 getting stuck instead of sleeping only for 1 ms.
On a different ARM machine (in the middle as far as the performance goes), the test would crash instead, after 54k runs, like this:
Booted up in 48.16 ms
Cmdline: /tests/tst-pin.so
Assertion failed: sched::preemptable() (arch/aarch64/mmu.cc: page_fault: 104)
[backtrace]
0x0000000fc00e0530 <__assert_fail+32>
0x0000000fc0170a1c <page_fault+396>
0x0000000fc01702ac <???+-1072233812>
0x0000000fc02492a0 <sched::timer_list::fired()+64>
0x0000000fc0171700 <interrupt_table::invoke_interrupt(unsigned int)+208>
0x0000000fc01718c4 <interrupt+192>
0x0000000fc0170700 <???+-1072232704>
0x0000000fc0248134 <sched::cpu::idle()+52>
0x0000000fc024815c <???+-1071349412>
0x0000000fc0249250 <thread_main_c+32>
0x0000000fc016f784 <???+-1072236668>
(gdb) bt
#0 processor::halt_no_interrupts () at arch/aarch64/processor.hh:56
#1 arch::halt_no_interrupts () at arch/aarch64/arch.hh:64
#2 osv::halt () at arch/aarch64/power.cc:24
#3 0x0000000fc00e0508 in abort (fmt=fmt@entry=0xfc04ad548 "Assertion failed: %s (%s: %s: %d)\n") at runtime.cc:146
#4 0x0000000fc00e0534 in __assert_fail (expr=<optimized out>, file=<optimized out>, line=<optimized out>, func=<optimized out>) at runtime.cc:154
#5 0x0000000fc0170a20 in page_fault (ef=0x4000409b6c30) at arch/aarch64/mmu.cc:104
#6 <signal handler called>
#7 0x0000000fc024c160 in boost::intrusive::list_node_traits<void*>::set_next (next=0x0, n=0x0) at /usr/include/boost/intrusive/detail/list_node.hpp:66
#8 boost::intrusive::circular_list_algorithms<boost::intrusive::list_node_traits<void*> >::unlink (this_node=<optimized out>)
at /usr/include/boost/intrusive/circular_list_algorithms.hpp:154
#9 boost::intrusive::list_impl<boost::intrusive::mhtraits<sched::timer_base, boost::intrusive::list_member_hook<>, &sched::timer_base::hook>, unsigned long, true, void>::pop_front_and_dispose<boost::intrusive::detail::null_disposer>(boost::intrusive::detail::null_disposer) (disposer=..., this=<optimized out>) at /usr/include/boost/intrusive/list.hpp:355
#10 boost::intrusive::list_impl<boost::intrusive::mhtraits<sched::timer_base, boost::intrusive::list_member_hook<>, &sched::timer_base::hook>, unsigned long, true, void>::pop_front() (
this=<optimized out>) at /usr/include/boost/intrusive/list.hpp:338
#11 timer_set<sched::timer_base, &sched::timer_base::hook, osv::clock::uptime>::expire (this=0x4000407950c8, now=osv::clock::uptime time_point = { 52220875ns })
at include/osv/timer-set.hh:180
#12 0x0000000fc02492a4 in sched::timer_list::fired (this=0x4000407950c0) at core/sched.cc:1668
#13 0x0000000fc0171704 in std::function<void ()>::operator()() const (this=<optimized out>) at /usr/include/c++/13/bits/std_function.h:591
#14 interrupt_table::invoke_interrupt (this=this@entry=0xfc07020c8 <idt>, id=id@entry=27) at arch/aarch64/exceptions.cc:146
#15 0x0000000fc01718c8 in interrupt (frame=<optimized out>) at arch/aarch64/exceptions.cc:172
#16 <signal handler called>
#17 sched::cpu::do_idle (this=0x400040794040) at /usr/include/boost/intrusive/detail/size_holder.hpp:36
#18 0x00004000409a7040 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) osv info threads
id address name cpu status vruntime total_cpu_time location
1 (0x400040782040) reclaimer cpu0 waiting 4.03251e-24 6416 condvar::wait(lockfree::mutex*, sched::timer*) at core/condvar.cc:53
2 (0x400040868040) page_pool_l2 cpu0 waiting 2.62303e-21 3791376 memory::page_pool::l2::fill_thread() at core/mempool.cc:1525
3 (0x400040893040) itimer-real cpu? unstarted 0 0 ?? at arch/aarch64/entry.S:135
4 (0x4000408b5040) itimer-virt cpu? unstarted 0 0 ?? at arch/aarch64/entry.S:135
5 (0x4000408f8040) balancer0 cpu0 waiting 6.82812e-23 98875 ?? at arch/aarch64/entry.S:135
6 (0x40004091a040) rcu0 cpu0 waiting 9.70645e-24 15459 osv::rcu::cpu_quiescent_state_thread::do_work() at core/rcu.cc:184
7 (0x40004093f040) page_pool_l1_0 cpu0 waiting 6.92777e-22 1013250 memory::page_pool::l1::fill_thread() at core/mempool.cc:1410
8 (0x400040961040) percpu0 cpu0 waiting 4.58827e-24 7292 workman::call_of_duty() at core/percpu-worker.cc:92
9 (0x400040984040) async_worker0 cpu0 waiting 4.95016e-24 7875 async::async_worker::run() at core/async.cc:177
10 (0x4000409a7040) idle0 cpu0 running inf 4190087 std::_Function_handler<void(), sched::cpu::init_idle_thread()::<lambda()> >::_M_invoke(const std::_Any_data &) at /usr/include/c++/13/bits/std_function.h:290
11 (0x4000409c9040) idle1 cpu1 running inf 26929586 std::_Function_handler<void(), sched::cpu::init_idle_thread()::<lambda()> >::_M_invoke(const std::_Any_data &) at /usr/include/c++/13/bits/std_function.h:290
12 (0x400040a12040) idle2 cpu2 running inf 16272078 std::_Function_handler<void(), sched::cpu::init_idle_thread()::<lambda()> >::_M_invoke(const std::_Any_data &) at /usr/include/c++/13/bits/std_function.h:290
13 (0x400040a5b040) idle3 cpu3 running inf 18641292 std::_Function_handler<void(), sched::cpu::init_idle_thread()::<lambda()> >::_M_invoke(const std::_Any_data &) at /usr/include/c++/13/bits/std_function.h:290
14 (0x400040aa4040) rcu3 cpu3 waiting 1.20846e-23 11666 osv::rcu::cpu_quiescent_state_thread::do_work() at core/rcu.cc:184
15 (0x400040ac9040) >init cpu0 waiting 1.2188e-23 9334 std::_Function_handler<void(), sched::thread::reaper::reaper()::<lambda()> >::_M_invoke(const std::_Any_data &) at /usr/include/c++/13/bits/std_function.h:290
16 (0x400040ada040) page_pool_l1_3 cpu3 waiting 1.22397e-23 11958 memory::page_pool::l1::fill_thread() at core/mempool.cc:1407
17 (0x400040b0e040) percpu3 cpu3 waiting 1.04819e-23 8750 workman::call_of_duty() at core/percpu-worker.cc:92
18 (0x400040b35040) async_worker3 cpu3 waiting 1.0469e-23 8750 async::async_worker::run() at core/async.cc:177
19 (0x400040b58040) thread taskq cpu0 waiting 1.64273e-23 16042 synch_port::_msleep(void*, mtx*, int, char const*, int) at bsd/porting/synch.cc:102
20 (0x400040b6e040) callout cpu0 waiting 5.55891e-23 25375 _callout_thread() at bsd/porting/callout.cc:138
21 (0x400040b92040) rcu1 cpu1 waiting 1.50314e-23 16917 osv::rcu::cpu_quiescent_state_thread::do_work() at core/rcu.cc:184
22 (0x400040bbd040) page_pool_l1_1 cpu1 waiting 1.25628e-23 11959 memory::page_pool::l1::fill_thread() at core/mempool.cc:1410
23 (0x400040be3040) percpu1 cpu1 waiting 9.97787e-24 7875 workman::call_of_duty() at core/percpu-worker.cc:92
24 (0x400040c09040) async_worker1 cpu1 waiting 1.01588e-23 8167 async::async_worker::run() at core/async.cc:177
25 (0x400040c40040) rcu2 cpu2 waiting 1.27696e-23 12833 osv::rcu::cpu_quiescent_state_thread::do_work() at core/rcu.cc:184
26 (0x400040c68040) page_pool_l1_2 cpu2 waiting 5.32834e-22 921376 memory::page_pool::l1::fill_thread() at core/mempool.cc:1410
27 (0x400040c8e040) percpu2 cpu2 waiting 1.08438e-23 8750 workman::call_of_duty() at core/percpu-worker.cc:92
28 (0x4000412b3040) async_worker2 cpu2 waiting 1.06758e-23 8458 async::async_worker::run() at core/async.cc:177
29 (0x400041315040) netisr cpu0 waiting 2.43243e-23 15750 netisr_osv_thread_wrapper(void (*)(void*), void*) at bsd/sys/net/netisr1.cc:26
30 (0x4000413c6040) >init cpu0 waiting 1.00402e-20 14702916 sched::thread::join() at core/sched.cc:1565
joining on <optimized out>
31 (0x4000413e2040) virtio-net-rx cpu0 waiting 2.46995e-28 25374 virtio::virtio_driver::wait_for_queue(virtio::vring*, bool (virtio::vring::*)() const) at drivers/virtio.cc:159
32 (0x400041415040) virtio-tx-0 cpu0 waiting 1.33768e-28 9333 osv::xmitter<virtio::net::txq, 4096u, std::function<bool ()>, boost::iterators::function_output_iterator<osv::xmitter_functor<virtio::net::txq> > >::poll_until() at include/osv/percpu_xmit.hh:401
33 (0x400041448040) virtio-tx-1 cpu1 waiting 1.50057e-28 24500 osv::xmitter<virtio::net::txq, 4096u, std::function<bool ()>, boost::iterators::function_output_iterator<osv::xmitter_functor<virtio::net::txq> > >::poll_until() at include/osv/percpu_xmit.hh:401
34 (0x40004147b040) virtio-tx-2 cpu2 waiting 1.10765e-28 18084 osv::xmitter<virtio::net::txq, 4096u, std::function<bool ()>, boost::iterators::function_output_iterator<osv::xmitter_functor<virtio::net::txq> > >::poll_until() at include/osv/percpu_xmit.hh:401
35 (0x4000414ae040) virtio-tx-3 cpu3 waiting 1.14384e-28 18666 osv::xmitter<virtio::net::txq, 4096u, std::function<bool ()>, boost::iterators::function_output_iterator<o--Type <RET> for more, q to quit, c to continue without paging--c
sv::xmitter_functor<virtio::net::txq> > >::poll_until() at include/osv/percpu_xmit.hh:401
36 (0x4000414e0040) virtio-blk cpu0 waiting 2.71548e-23 28582 virtio::virtio_driver::wait_for_queue(virtio::vring*, bool (virtio::vring::*)() const) at drivers/virtio.cc:159
37 (0x400041502040) virtio-rng cpu0 waiting 3.81666e-23 24208 condvar::wait(lockfree::mutex*, sched::timer*) at core/condvar.cc:53
38 (0x400041527040) pl011-input cpu0 waiting 1.70994e-23 14292 console::LineDiscipline::read_poll(console::console_driver*) at drivers/line-discipline.cc:86
39 (0x400041572040) rand_harvestq cpu0 waiting 1.50573e-23 11375 synch_port::_msleep(void*, mtx*, int, char const*, int) at bsd/porting/synch.cc:102
40 (0x4000415a2040) dhcp cpu0 waiting 1.72739e-22 234208 dhcp::dhcp_worker::dhcp_worker_fn() at core/dhcp.cc:830
41 (0x400041875040) /tests/tst-pin. cpu0 waking 7.98084e-23 104709 main(int, char**) at /usr/include/c++/13/bits/stl_vector.h:1126
43 (0x40004188f040) >/tests/tst-pin cpu1 terminated -4.11962e-22 2042 ?? at arch/aarch64/entry.S:135
Number of threads: 42
It seems like a corrupt boost intrusive list under one of the buckets in the osv::timer_set::expire()
- line 180.
In another instance on the same machine, it would crash after 15k runs, a little bit differently:
Cmdline: /tests/tst-pin.so
Assertion failed: !node_algorithms::inited(this->priv_value_traits().to_node_ptr(value)) (/usr/include/boost/intrusive/list.hpp: iterator_to: 1310)
[backtrace]
0x0000000fc00e0530 <__assert_fail+32>
0x0000000fc02442f4 <???+-1071365388>
0x0000000fc0247394 <sched::timer_base::cancel()+336>
0x0000000fc024c918 <sched::cpu::schedule_next_thread(bool, std::chrono::duration<long, std::ratio<1l, 1000000000l> >)+728>
0x0000000fc0220474 <???+-1071512460>
0x0000000fc0247bb4 <sched::thread::complete()+340>
0x0000000fc0249258 <thread_main_c+40>
0x0000000fc016f784 <???+-1072236668>
I am unsure if these crashes are related or different, but given that they happen when running tst-pin.cc
I put them under the same issue.
On my 3rd arm machine (the slowest one), the test would never hang or crash after 49K and ~90K runs.
I have not tried to run this test enough times on x86_64 machine, so who knows if it is arm specific.