Description
[Imported from JIRA. Reported by Niklas Hambuechen as DP-108 on 2015-03-15 02:48:31]
Copying my posts from the #haskell-distributed IRC channel:
spawn seems to be very slow for me, even though I'm on localhost. When I call it in a loop, each spawn takes almost 50 ms. Why would it be so high? I can't use spawnAsync in my case, but why would a spawn on localhost take this long in the first place? My ethernet latency is 0.5 ms and localhost latency is 0.1 ms, so network latency can't explain it. CPU usage is low, too.
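To put a number on per-call latency like the ~50 ms above, a small timing helper can wrap whatever call is being measured. This is a minimal sketch: msPerCall is a hypothetical helper, and the 1 ms threadDelay in main is only a stand-in for the real spawn call.

```haskell
import Control.Concurrent (threadDelay)
import Control.Monad (replicateM_)
import Control.Monad.IO.Class (MonadIO, liftIO)
import GHC.Clock (getMonotonicTimeNSec)

-- Run an action n times (n must be positive) and return the mean
-- wall-clock time per call in milliseconds, using the monotonic clock.
msPerCall :: MonadIO m => Int -> m a -> m Double
msPerCall n act = do
  t0 <- liftIO getMonotonicTimeNSec
  replicateM_ n act
  t1 <- liftIO getMonotonicTimeNSec
  return (fromIntegral (t1 - t0) / fromIntegral n / 1e6)

main :: IO ()
main = do
  -- Stand-in for the real call: a 1 ms sleep gives a known lower bound.
  ms <- msPerCall 100 (threadDelay 1000)
  putStrLn ("mean time per call: " ++ show ms ++ " ms")
```

In the master, the same helper could wrap the real call, e.g. msPerCall 100 (spawn nid closure), assuming the monad spawn runs in has a MonadIO instance; nid and closure are placeholders here.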
I have a suspicion: running strace -f -c -w on the node onto which I spawn the processes (a slave using simplelocalnet), I see 179596 calls to the select syscall. That doesn't seem right given that I only do 100 spawns and nothing else; it works out to roughly 1800 select calls per spawn. Could it be that the master sends many small numbers, which the slave recvs one after the other? I think this is the only way to trigger so many selects, and I've seen that recvInt32 does exactly that (recv'ing 4 bytes at a time); it also appears in my profiling output.
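How reading one small field per recv multiplies the number of calls can be sketched with an in-memory stand-in for the socket. This is a minimal sketch, not network-transport-tcp's code: MockSocket, mockRecv and recvInt32Mock are hypothetical names, and it assumes each field is a big-endian 32-bit integer. It only counts calls; on a real socket, each call that finds no data ready is another chance to wait in select.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import Data.IORef
import Data.Int (Int32)
import Data.Word (Word8)

-- Hypothetical in-memory "socket": pending bytes plus a count of recv calls.
data MockSocket = MockSocket
  { msBytes :: IORef [Word8]
  , msCalls :: IORef Int
  }

newMock :: [Word8] -> IO MockSocket
newMock bs = MockSocket <$> newIORef bs <*> newIORef 0

-- Stand-in for recv: returns up to n pending bytes and counts one call.
mockRecv :: MockSocket -> Int -> IO [Word8]
mockRecv s n = do
  modifyIORef' (msCalls s) (+ 1)
  bs <- readIORef (msBytes s)
  let (chunk, rest) = splitAt n bs
  writeIORef (msBytes s) rest
  return chunk

-- Big-endian 32-bit encoding and decoding.
encodeInt32BE :: Int32 -> [Word8]
encodeInt32BE x = [fromIntegral (x `shiftR` k) | k <- [24, 16, 8, 0]]

decodeInt32BE :: [Word8] -> Int32
decodeInt32BE = foldl (\acc b -> (acc `shiftL` 8) .|. fromIntegral b) 0

-- One recv per 4-byte field: the pattern suspected above.
recvInt32Mock :: MockSocket -> IO Int32
recvInt32Mock s = decodeInt32BE <$> mockRecv s 4

-- Alternative: one recv for k fields, then split the buffer locally.
recvInt32sBuffered :: MockSocket -> Int -> IO [Int32]
recvInt32sBuffered s k = map decodeInt32BE . chunks4 <$> mockRecv s (4 * k)
  where
    chunks4 [] = []
    chunks4 xs = let (a, b) = splitAt 4 xs in a : chunks4 b

main :: IO ()
main = do
  let values = [1 .. 100] :: [Int32]
      wire = concatMap encodeInt32BE values
  perField <- newMock wire
  xs1 <- mapM (const (recvInt32Mock perField)) values
  calls1 <- readIORef (msCalls perField)
  buffered <- newMock wire
  xs2 <- recvInt32sBuffered buffered (length values)
  calls2 <- readIORef (msCalls buffered)
  print (xs1 == values, calls1) -- (True,100)
  print (xs2 == values, calls2) -- (True,1)
```

Both variants decode the same 100 values; the only difference is 100 calls versus 1.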
Further, the 50 ms that each spawn takes is suspiciously close to the 40 ms TCP delayed-ACK timeout on Linux (which I'm using), as described here: http://stackoverflow.com/a/2253620/263061.
I have found something different, though, that fixes the problem: setting +RTS -V0 on the slave reduces the time for each spawn to 3 ms. How can this have such a huge effect?
I get the same good results with +RTS -C0.001. But why? This flag sets the context switch interval; if shortening it helps this much, doesn't that mean there are other Haskell threads around that actually run and thus keep my recv from being rescheduled immediately? Suppose only one recv is running: when a context switch interrupts it, the scheduler should see that no other Haskell threads are runnable and go straight back into my recv. I can't see a reason why it would do anything other than my recv...
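Whether another runnable Haskell thread can keep a woken thread waiting until the next context switch can be probed without any networking. This is a minimal sketch under the assumption that an MVar handoff between two threads is a fair stand-in for the recv path: pingPongMs (a hypothetical name) measures the mean round trip between the main thread and an echo thread, optionally with a third thread that spins while allocating, so that it is runnable whenever the other two are blocked.

```haskell
import Control.Concurrent
import Control.Monad (forever, replicateM_)
import Data.IORef (modifyIORef', newIORef)
import GHC.Clock (getMonotonicTimeNSec)

-- Mean round-trip time in ms for n ping-pong exchanges between the calling
-- thread and an echo thread (n must be positive). With withBusy, a third
-- thread spins the whole time, standing in for other runnable Haskell threads.
pingPongMs :: Bool -> Int -> IO Double
pingPongMs withBusy n = do
  ping <- newEmptyMVar
  pong <- newEmptyMVar
  -- Echo thread: wakes on each ping and answers with a pong.
  echo <- forkIO (forever (takeMVar ping >>= putMVar pong))
  busy <-
    if withBusy
      then do
        ref <- newIORef (0 :: Int)
        -- modifyIORef' boxes a fresh Int every iteration, so the loop
        -- allocates and can be preempted by the context switch timer.
        fmap Just (forkIO (forever (modifyIORef' ref (+ 1))))
      else return Nothing
  t0 <- getMonotonicTimeNSec
  replicateM_ n (putMVar ping () >> takeMVar pong)
  t1 <- getMonotonicTimeNSec
  mapM_ killThread (echo : maybe [] (: []) busy)
  return (fromIntegral (t1 - t0) / fromIntegral n / 1e6)

main :: IO ()
main = do
  idle <- pingPongMs False 50
  busy <- pingPongMs True 50
  putStrLn ("mean round trip without busy thread: " ++ show idle ++ " ms")
  putStrLn ("mean round trip with busy thread:    " ++ show busy ++ " ms")
```

Compiling with -rtsopts and running under +RTS -C0.001, +RTS -C1 or +RTS -V0 (and, with -threaded, under -N2) would show whether the busy round trip follows the context switch interval the way the spawn timings do; no particular numbers are implied here.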
Also, setting +RTS -C to something very high does not make it slower than 50 ms per spawn: with +RTS -C1, a spawn does not take 1 second; it still takes 50 ms.
Setting +RTS -N2, -N3 or -N4 helps, too: I get down to 6 ms per spawn, compared to 50 ms with -N1.
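Because -N only takes effect in a program linked with -threaded, it can be worth confirming on the slave that the flag was actually picked up. A minimal sketch using only base; capabilityReport is a hypothetical helper name.

```haskell
import Control.Concurrent (getNumCapabilities, rtsSupportsBoundThreads)

-- Whether this binary runs on the threaded RTS, and how many capabilities
-- (the value set by +RTS -N) it currently has.
capabilityReport :: IO (Bool, Int)
capabilityReport = do
  caps <- getNumCapabilities
  return (rtsSupportsBoundThreads, caps)

main :: IO ()
main = do
  (threaded, caps) <- capabilityReport
  putStrLn ("threaded RTS: " ++ show threaded ++ ", capabilities: " ++ show caps)
```

For example, a slave built with ghc -threaded -rtsopts and started with +RTS -N2 should report True and 2.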
nh2: could it be that there are actually 2 recvs going on, but only one can be active at a time when I'm running with -N1, so the system toggles between them at the context switch interval -C, which defaults to 20 ms, and two of these switches make up the ~50 ms I'm seeing?