Description
I am running TailBench http://tailbench.csail.mit.edu/ on a 12-core simulated system.
When running moses, the single-threaded and 2-thread runs simulate to completion. With 4 or more worker threads, an assertion fails in scheduler.h, an “ACCESS_INVALID_ADDRESS” error is reported, and the simulation then deadlocks. Other TailBench apps show similar problems once the thread count reaches 2 to 4. There may be hidden race conditions when simulating multi-threaded, syscall-intensive apps, as opposed to traditional benchmarks such as SPLASH-2/PARSEC.
I confirmed that the TailBench apps are thread-safe with pthreads on real servers, where they scale to 20+ threads, so the app implementation is not the cause.
It is also not caused by improper configuration settings (#97), thread overcommit in the simulated system (#44), a fake leave time that is too short for an overcommitted host machine (#15), or mismatched memory timing configuration (#25): I tested each of these and scaled up to 64 simulated cores, but the same problem persists.
I also tried different virtual memory configurations in the Linux kernel, and the “ACCESS_INVALID_ADDRESS” error appears to have nothing to do with address space exceptions.
Disabling sim.deadlockDetection, as suggested in #172, does not help either. The default 130-second deadlock detection timeout should be more than enough for our 4 threads, unlike the case in #26 with 1024 worker threads (and a lot of fake leaves).
Has anyone tried TailBench in Zsim before and/or encountered similar problems?
Jason