Deadlock in scheduler.h #231

Closed
@jasonzzzzzzz

I am running TailBench (http://tailbench.csail.mit.edu/) on a 12-core simulated system.

With moses, runs with 1 or 2 worker threads simulate to completion. With 4 or more worker threads, the simulation hits an assertion in scheduler.h, reports an “ACCESS_INVALID_ADDRESS” error, and then deadlocks. Other TailBench apps fail in a similar way once the thread count reaches 2 to 4. This suggests a hidden race condition when simulating multi-threaded, syscall-intensive apps, as opposed to traditional benchmarks such as SPLASH-2/PARSEC.
I confirmed that the TailBench apps are thread-safe with pthreads on real servers, where they scale to 20+ threads, so the app implementation is not the cause.
It is also not caused by improper configuration settings (#97), thread overcommit in the simulated system (#44), a fake leave time that is too short for an overcommitted host machine (#15), or mismatched memory timing configuration (#25): I tested each of these and scaled up to 64 simulated cores, but the same problem persists.
I also tried different virtual memory configurations in the Linux kernel, and the “ACCESS_INVALID_ADDRESS” error does not appear to be related to address space exceptions.
Disabling sim.deadlockDetection, as suggested in #172, does not help either. The default 130 seconds of deadlock detection should be more than enough for our 4 threads, unlike the case in #26 with 1024 worker threads (and a lot of fake leaves).
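
For reference, a minimal sketch of how that setting would look in the zsim config file, assuming the usual libconfig-style layout with a top-level sim group (all other keys are omitted here, and the exact placement is an assumption, not copied from my actual config):

```
// Sketch only: turn off the deadlock detection discussed in #172.
sim = {
    deadlockDetection = false;
};
```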

Has anyone run TailBench in zsim before, or encountered similar problems?

Jason
