Description
We use the distrib scheduler for our numa configuration, but default to nemesis everywhere else because nemesis performs slightly better. We used to default to sherwood for numa, but it had significant performance issues, so the distrib scheduler was created and tuned for us. While its performance is far better than sherwood's, we'd like to see if we can close the remaining performance gap with nemesis so that we can default to distrib everywhere.
A few weeks ago I ran distrib against nemesis on our nightly performance suite. You can see the results here. More recent results will be skewed because I have since added the hybrid spin/condwait scheme for nemesis in our copy of qthreads, so it probably makes sense to tune distrib performance after doing the hybrid spin/condwait work.
I'd imagine you'll want or need more info from us, and that this will be more iterative than some of the other feature requests, but I wanted to get an issue up as a placeholder.
Also note that we currently disable work stealing for distrib. It'd be nice to tune work stealing so that we could enable it by default without noticeably hurting performance for well-balanced workloads.
This is a relatively high-priority item for us, but nemesis is still serving us well for most of our configurations, so it's not blocking us on anything yet.