RPC/Serialization Overhead/Delay #1850
Hey @L3tum 👋
Hey @rustatian! Here's a small (2MB zipped) repro. I stripped it down from our existing project, so the configuration is (mostly) identical, and there aren't any differences in the versions employed, so it's as 1:1 as I can give you. There's a Dockerfile with two targets included, as well as a docker-compose.yaml which also starts webgrind. The whole thing should be runnable locally as well, though. If you run it either with the Dockerfile's Dev target or locally, you'll need to install the composer dependencies manually; a simple `composer install` should do it.

I've played around a bit with non-blocking IO. It's a sore topic for PHP, obviously, and I didn't rip everything out and use a framework like AMPHP for it, but I did implement two multi-socket RPC variants, one of which (NiceMultiRPC) pre-connects its sockets.

I tested the ideal number of sockets each would create and noticed that 5-10 sockets is apparently the sweet spot (for that test, anyway). NiceMultiRPC pre-connects the sockets and can scale to more sockets than that (I usually used 50). I guess past about 10 sockets, reusing an already-connected socket outweighs the delay of having to connect one on demand. Anyway, with these two RPC implementations I've managed to cut the test time down from 1ms to 0.07ms :)

FYI I've also added a …
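For context, the gist of a pre-connected pool like that is roughly the following. This is a minimal sketch, not the actual NiceMultiRPC code: the class and method names are made up, and it assumes the stock `Spiral\Goridge\RPC\RPC` client underneath.

```php
<?php

use Spiral\Goridge\RPC\RPC;
use Spiral\Goridge\RPC\RPCInterface;

// Hypothetical round-robin pool over N pre-connected Goridge RPC connections.
// The idea is to pay the tcp:// or unix:// connect cost once, up front,
// instead of once per call.
final class RoundRobinRpcPool
{
    /** @var RPCInterface[] */
    private array $connections = [];
    private int $next = 0;

    public function __construct(string $address, int $size = 10)
    {
        for ($i = 0; $i < $size; $i++) {
            // The usual "tcp://127.0.0.1:6001" / "unix://..." DSNs.
            $this->connections[] = RPC::create($address);
        }
    }

    public function call(string $method, mixed $payload): mixed
    {
        // Plain round-robin; a real implementation would also have to track
        // in-flight sockets and reconnect on failure.
        $rpc = $this->connections[$this->next];
        $this->next = ($this->next + 1) % \count($this->connections);

        return $rpc->call($method, $payload);
    }
}
```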

Heya, I'm mainly looking for prior work to see if anybody has actually measured this properly, because I'm doubting my own numbers.
I've been on the hunt for some performance issues and noticed that our Symfony `kernel.terminate` EventListener takes ~2ms in prod (so with OPcache, JIT, cache warmer and what not). However, we only collect metrics there and don't do anything else; I even checked, and there aren't any other listeners or hidden things executed.

Curious to see why it takes so long, I thought I'd "profile" (I use that term very loosely here) the `Metrics` class, since it sends off some RPC calls and does some serialization. I've made a basic `MetricsProfiler` that I inject with a `CompilerPass`. The `MetricsProfiler` is very simple, just the following for each method:
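A minimal sketch of what that decorator looks like, with illustrative names; it wraps Spiral's `MetricsInterface` and just times each delegated call (only `add()` and `observe()` are shown, the remaining methods follow the same pattern):

```php
<?php

use Psr\Log\LoggerInterface;
use Spiral\RoadRunner\Metrics\MetricsInterface;

// Illustrative profiling decorator: every call is delegated to the real
// metrics service and the elapsed wall-clock time is logged in milliseconds.
final class MetricsProfiler
{
    public function __construct(
        private MetricsInterface $inner,
        private LoggerInterface $logger,
    ) {
    }

    public function add(string $name, float $value, array $labels = []): void
    {
        $start = \hrtime(true);
        $this->inner->add($name, $value, $labels);
        $this->logger->info(sprintf('Add took %.4g ms', (\hrtime(true) - $start) / 1e6));
    }

    public function observe(string $name, float $value, array $labels = []): void
    {
        $start = \hrtime(true);
        $this->inner->observe($name, $value, $labels);
        $this->logger->info(sprintf('Observe took %.4g ms', (\hrtime(true) - $start) / 1e6));
    }

    // ...same pattern for the remaining MetricsInterface methods.
}
```

The `CompilerPass` presumably only needs to register something like this as a decorator around the original metrics service.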
The resulting log entries for a single request:

- Matched route "api"
- Add took 0.42 ms
- Add took 0.406 ms
- Add took 0.3086 ms
- INFO http http log {"status": 200, "method": "POST", "URI": "/", "remote_address": "192.168.16.1:51092", "read_bytes": 646, "write_bytes": 216, "start": "2024-02-01T17:26:14+0000", "elapsed": "32.5789ms"}
- === The following is basically all kernel.terminate EventListener stuff ===
- Add took 0.9934 ms
- Observe took 0.2744 ms
- Add took 0.5614 ms
- Add took 0.516 ms
- Add took 0.7812 ms
- Add took 0.3773 ms
- Add took 0.6843 ms
- Add took 0.5864 ms
- Add took 0.3434 ms
- Add took 0.7013 ms
- Add took 0.6131 ms
- Add took 0.2735 ms
- Add took 0.3319 ms
- Add took 0.3253 ms
- Add took 0.3438 ms
- Add took 0.4394 ms
- Add took 0.5912 ms
- Add took 0.4409 ms
- Add took 0.5453 ms
- Add took 0.5968 ms
- Observe took 0.4165 ms

Obviously, added together this is a bit more than 2ms, but it's also collected locally (on a frankly anemic laptop) without JIT (but with OPcache, `APP_ENV=prod`, `pool.debug=false`, and the works). Either way, this is entirely too long IMO, which is why I think something must be wrong on my end. But it's also the only way I can explain our issues with the `kernel.terminate` listener, because it does little else but this.

I've also run `xdebug.mode=profile` through this, and while I can't share the cachegrind file, here are the relevant screenshots from QCachegrind. If I understand its interface correctly, each "time unit" is 10 ns here, so if a call took 44000 "units" it'd be around 440000 ns, or 440 microseconds, or 0.4 ms, which supports my measurements above.
`RPC->call` has these Callees:

`RPC->decodeResponse` has these Callees:

Stepping into the Protobuf call stack confirms it's using the extension, no pure-PHP bullshit.
The worst seems to be the KV cache, though:

I'm not sure why that one is so slow in particular.
I've tried to look through the code but haven't found anything obviously amiss. There's some protobuf stuff I don't really know, but I do have the protobuf extension installed and loaded.
The gRPC extension is currently misbehaving, so it isn't loaded, but I also haven't seen any reference to RoadRunner needing it. The sockets extension is installed as well, though.
I really want to use the Metrics plugin, but this basically ruins our performance. One idea, if the RPC overhead is the issue, would be to batch-send the metrics, but I'm not sure how easily or quickly that could be done. It could also be that prometheus-go is just particularly slow, but that would still make the plugin a non-starter.
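For what it's worth, the PHP half of batch-sending could look something like the sketch below: a hypothetical wrapper (none of this exists in the SDK) that queues the operations so that a single `flush()` owns all of them; at that point they could be handed to a batch RPC endpoint if one ever exists, or fanned out over a pre-connected socket pool like the one above. As written, the flush still performs one RPC per queued entry.

```php
<?php

use Spiral\RoadRunner\Metrics\MetricsInterface;

// Hypothetical buffering wrapper: metric operations are queued during the
// request and replayed in one go from flush(), e.g. in kernel.terminate.
final class BufferedMetrics
{
    /** @var list<array{string, string, float, array}> */
    private array $queue = [];

    public function __construct(private MetricsInterface $inner)
    {
    }

    public function add(string $name, float $value, array $labels = []): void
    {
        $this->queue[] = ['add', $name, $value, $labels];
    }

    public function observe(string $name, float $value, array $labels = []): void
    {
        $this->queue[] = ['observe', $name, $value, $labels];
    }

    public function flush(): void
    {
        // Still one RPC round-trip per entry underneath; the batching win
        // would come from handing this whole queue to a single call.
        foreach ($this->queue as [$op, $name, $value, $labels]) {
            $this->inner->{$op}($name, $value, $labels);
        }

        $this->queue = [];
    }
}
```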