[DP-108] Investigate why spawn is slow

[Imported from JIRA. Reported by Niklas Hambuechen (@mail@nh2.me) as [DP-108](https://cloud-haskell.atlassian.net/browse/DP-108) on 2015-03-15 02:48:31]
Copying my posts from the #haskell-distributed IRC channel:

`spawn` seems to be very slow for me, even though I'm on localhost. Doing it in a loop gets me to almost 50 ms per spawn, why would it be so high? I can't use `spawnAsync` in my case, but why would a `spawn` on localhost take this long in the first place? My ethernet latency is 0.5ms and localhost latency is 0.1ms, so that can't be it. CPU is low too.
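For anyone trying to reproduce the per-spawn figure, here is a minimal timing sketch using only `base`. The name `meanSeconds` is made up for illustration, and the workload is a placeholder: in a real measurement the `pure ()` would be replaced by the blocking `spawn` call inside a `Process`, which is not shown here.

```haskell
import Control.Monad (replicateM_)
import GHC.Clock (getMonotonicTime)

-- | Run an action n times and return the mean wall-clock seconds per run.
-- (Hypothetical helper, not part of distributed-process.)
meanSeconds :: Int -> IO a -> IO Double
meanSeconds n act = do
  start <- getMonotonicTime
  replicateM_ n act
  end <- getMonotonicTime
  pure ((end - start) / fromIntegral n)

main :: IO ()
main = do
  -- Placeholder workload; substitute the operation under test.
  perCall <- meanSeconds 100 (pure ())
  putStrLn ("mean ms per call: " ++ show (perCall * 1000))
```

Using a monotonic clock avoids skew from wall-clock adjustments, and averaging over the loop matches the "doing it in a loop" measurement described above.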

I have a suspicion: using `strace -f -c -w` on the node onto which I `spawn` the processes (a slave using simplelocalnet), I see 179596 calls to the `select` syscall. That doesn't seem right given that I only do 100 spawns and nothing else. Could it be that the master is sending a lot of small numbers, which the slave `recv`s one after the other? I think this is the only way to trigger so many `select`s, and I've seen that `recvInt32` does exactly such a thing (recv'ing 4 bytes at a time), and it does appear in my profiling output.
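To illustrate why reading 4 bytes at a time could multiply the number of receive calls, here is a small in-memory model. It is only a sketch: `Conn`, `recvN`, `readFieldwise` and `readBulk` are hypothetical names, no real socket is involved, short reads are ignored, and whether each real `recv` leads to a `select` depends on the runtime's I/O manager.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import Data.IORef (IORef, newIORef, readIORef, modifyIORef', atomicModifyIORef')
import Data.Int (Int32)
import Data.Word (Word8)

-- | Hypothetical stand-in for a socket: remaining bytes plus a counter of
-- how many times "recv" was called.
data Conn = Conn { connBytes :: IORef [Word8], connCalls :: IORef Int }

newConn :: [Word8] -> IO Conn
newConn bs = Conn <$> newIORef bs <*> newIORef 0

-- | Model of one recv call: take up to n bytes and bump the call counter.
recvN :: Conn -> Int -> IO [Word8]
recvN c n = do
  modifyIORef' (connCalls c) (+ 1)
  atomicModifyIORef' (connBytes c) (\bs -> let (h, t) = splitAt n bs in (t, h))

-- | Big-endian encoding and decoding of a 32-bit integer.
encode32 :: Int32 -> [Word8]
encode32 x = [fromIntegral (x `shiftR` s) | s <- [24, 16, 8, 0]]

decode32 :: [Word8] -> Int32
decode32 = foldl (\acc b -> (acc `shiftL` 8) .|. fromIntegral b) 0

-- | Strategy A: one recv per 4-byte field.
readFieldwise :: Conn -> Int -> IO [Int32]
readFieldwise c k = mapM (const (decode32 <$> recvN c 4)) [1 .. k]

-- | Strategy B: one recv for the whole payload, decoded in memory.
readBulk :: Conn -> Int -> IO [Int32]
readBulk c k = map decode32 . chunks4 <$> recvN c (4 * k)

chunks4 :: [Word8] -> [[Word8]]
chunks4 [] = []
chunks4 bs = let (h, t) = splitAt 4 bs in h : chunks4 t

main :: IO ()
main = do
  let wire = concatMap encode32 ([1 .. 1000] :: [Int32])
  a <- newConn wire
  b <- newConn wire
  xs <- readFieldwise a 1000
  ys <- readBulk b 1000
  callsA <- readIORef (connCalls a)
  callsB <- readIORef (connCalls b)
  print (xs == ys, callsA, callsB)  -- prints (True,1000,1)
```

Both strategies decode the same values, but the field-wise reader issues one call per integer. If each such call can block and be rescheduled, the per-call cost adds up quickly.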

Further, the 50ms that each spawn takes are suspiciously close to the 40ms TCP ACK delay on Linux (I'm on Linux), as mentioned here: http://stackoverflow.com/a/2253620/263061.

I have found something different though that fixes the problem: setting +RTS -V0 on the slave reduces the time for each `spawn` to 3ms. How can it be that this has such a huge effect?

I can get the same good results with +RTS -C0.001. But why? This sets the context switch interval; if that has such a positive effect, doesn't that mean that there are other Haskell threads around that actually _run_ and thus stop my recv/recv from immediately being scheduled again? Assume there's only one recv that I'm running: when a context switch interrupt arrives and interrupts the recv, the scheduler should see that there are no other Haskell threads to run and immediately go back into my recv. I can't see a reason why it should do anything other than run my recv.

Also, setting +RTS -C to something very high does not make it slower than 50ms per spawn, e.g. setting +RTS -C1 does not make it take 1 second per spawn, it's still 50ms.

Setting +RTS -N2/-N3/-N4 helps, too: I get down to 6 ms, compared to the 50 ms for -N1.

Could it be that there are actually 2 recvs going on, but only one can be active at a time when I'm running with -N1, so the system toggles between them at the context switch interval -C, which defaults to 20ms, and two of these switches make up the ~50ms that I'm seeing?
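The two-switch hypothesis can be checked against the numbers reported earlier with a little arithmetic. This is only a back-of-the-envelope sketch: `predictedMs` is a hypothetical helper, and the model assumes each spawn waits for exactly two context-switch intervals and nothing else.

```haskell
-- | Predicted per-spawn latency in ms if each spawn waits for `switches`
-- context-switch intervals of `intervalMs` milliseconds each.
predictedMs :: Int -> Double -> Double
predictedMs switches intervalMs = fromIntegral switches * intervalMs

main :: IO ()
main = do
  print (predictedMs 2 20)    -- default -C (20 ms): prints 40.0
  print (predictedMs 2 1)     -- -C0.001 (1 ms):     prints 2.0
  print (predictedMs 2 1000)  -- -C1 (1000 ms):      prints 2000.0
```

The first two predictions are in the neighbourhood of the observed ~50 ms and ~3 ms. The third is not: with -C1 the measured time stayed at 50 ms rather than rising to about 2 seconds, so this simple model cannot be the whole explanation.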

