Description
We are adding metrics on our server to detect and tackle performance on some of our scripts, and we where having a strange performance lag on one part of our code where the code is only doing something like :
setTick(() => {
const data = GlobalState.data;
// do some stuff
GlobalState.data = newData;
await wait(1000);
});
Where there is a "low" player count (between 1 to 80 player), this code run normally under 5ms, however when there is more player it can go up to more than 100ms (even more than 250ms in one or two tick),
See the attached graph, (first one is the tick duration, second one is our player count)
At first we were believing there was some problems when setting the value to the callback since it will lock the data and need to update other players.
So we add a metrics around this call to mesure it
As you can see, the set in this call was always under 5 ms so this was not it, after looking more at the code and others events, we where having another tick that have the same problem, however this tick was only getting a value from an Entity state bag (and do nothing else except calculating some data and keeping it in memory)
As you can see both tick have (at some details) the exact same time in duration, after adding some more metrics around the GET_STATE_BAG_VALUE
native we see that it was this native that was inducing the lag :
When looking at the code in fivem, there is a Mutex when getting a state bag see https://github.com/citizenfx/fivem/blob/master/code/components/citizen-resources-core/src/StateBagComponent.cpp#L473
Since this problemes appear on the first time that we actually need a state bag and not after, we strongly believe that there is a lot of write pressure on this lock so it's is slow to release (maybe from another thread ?) when having a large player count.
At one point we were thinking that, since we have more people we may have more entity created (which create a state bag every time if i'm not wrong) but this was not the case as you can see here :
There seems to be no correlation between both events
Looking more at the code we also see there is a lot of time this mutex is write locked. Even when not updating the map, like here https://github.com/citizenfx/fivem/blob/master/code/components/citizen-resources-core/src/StateBagComponent.cpp#L484 (since it update only the targets i believe we can safely use another mutex here to relieve some pressure)
But i look at every place this mutex is lock and i don't see why there is this much pressure.
Our server is on 5848 build but after looking at the diff (i know there was some update on state bag some months ago), i don't think this will be resolved if we update.
Also all our graph are done on server side, i do believe there is the same problem in client but since it handles less state bag it should not appear so often.
Do you have more insights on this ?