This issue is going to touch on a few disparate topics which to me are interlinked enough to warrant being included in one issue.
Usage of redis
The caching primitive the service is using is IDistributedCache backed by redis (osu-server-replay-store/osu.Server.ReplayStore/Program.cs, lines 21 to 24 in f461cdf):

    builder.Services.AddStackExchangeRedisCache(options =>
    {
        options.Configuration = AppSettings.RedisHost;
    });

While we do have a production redis instance, there are significant reservations about using it for the purpose of this service. @peppy and @ThePooN can probably elaborate on that topic, but the gist of it from my understanding is that redis is not very reliable and has in fact had several instances of falling over, primarily due to running out of storage.
I understand @peppy has some ideas for alternatives to redis but I'll leave it to him to elaborate on that.
The cache period
Every replay that the service caches is cached for a full day (osu-server-replay-store/osu.Server.ReplayStore/ReplayStoreController.cs, lines 59 to 62 in f461cdf):

    new DistributedCacheEntryOptions
    {
        AbsoluteExpirationRelativeToNow = TimeSpan.FromDays(1),
    });

which seems to be way over the top IMO, but I guess it makes some degree of sense given that consumers of the “firehose” score API want to be using this. Will require cache hit rate monitoring. Maybe even a good idea to extract the cache duration to an envvar or something else that can be adjusted without having to apply source changes.
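To make the envvar idea concrete, here is a minimal sketch of how the duration could be read from the environment. The variable name `REPLAY_CACHE_MINUTES` and the `CacheSettings` class are hypothetical and do not exist in the service; the fallback matches the value that is hard-coded right now.

```csharp
using System;
using System.Globalization;

public static class CacheSettings
{
    // Hypothetical variable name; the service does not read this today.
    public const string DurationEnvVar = "REPLAY_CACHE_MINUTES";

    // Same value as the currently hard-coded TimeSpan.FromDays(1).
    public static readonly TimeSpan DefaultDuration = TimeSpan.FromDays(1);

    // Parses a whole number of minutes; anything missing, malformed,
    // zero or negative falls back to the default.
    public static TimeSpan ParseDuration(string? raw)
    {
        if (int.TryParse(raw, NumberStyles.Integer, CultureInfo.InvariantCulture, out int minutes) && minutes > 0)
            return TimeSpan.FromMinutes(minutes);

        return DefaultDuration;
    }

    // Read on every access, so a restart with a new value is enough to change it.
    public static TimeSpan ReplayCacheDuration =>
        ParseDuration(Environment.GetEnvironmentVariable(DurationEnvVar));
}
```

The controller would then presumably set `AbsoluteExpirationRelativeToNow = CacheSettings.ReplayCacheDuration` instead of the literal.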
Back-of-napkin estimate of storage required for a day of replays for lazer alone to be in memory would be (if I’m not misreading ddog metrics):
266 800 replays/24hr * 50 KB/replay (semi-educated guess) = 13 340 000 KB, or about 13.34 GB
so my bet is that 1 day is going to be too much, especially if this is to encompass stable as well (extrapolating from user numbers, you could be looking at about 5x as much storage).
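For anyone who wants to play with the numbers, the same estimate as a small helper. The class and method names are made up for illustration, the 50 KB/replay figure is still the same guess as above, and it assumes uploads are spread evenly over the day.

```csharp
using System;

public static class CacheSizeEstimate
{
    // Steady-state cache size in decimal gigabytes (1 GB = 1 000 000 KB),
    // assuming a constant upload rate and that every upload stays cached
    // for the full retention period.
    public static double EstimateGigabytes(double replaysPerDay, double kilobytesPerReplay, TimeSpan retention)
        => replaysPerDay * retention.TotalDays * kilobytesPerReplay / 1_000_000.0;
}
```

`CacheSizeEstimate.EstimateGigabytes(266_800, 50, TimeSpan.FromDays(1))` gives roughly 13.34, matching the figure above. Dropping the retention to `TimeSpan.FromHours(6)` would give roughly a quarter of that, about 3.3 GB.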
Replays only enter cache via the upload operation
The download operation will never put anything in cache itself, it will only query it. If the cache is missed, any replays fetched from S3 will not be stored into the cache, not even for a short time. The only thing that puts a replay in the cache is uploading the replay.
I could see this making sense because I don’t think very many people download replays, but maybe someone with cf analytics access is able to disprove this.
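To illustrate the difference, a toy model of the two behaviours. Nothing here is the actual service code: the dictionaries stand in for redis and S3, keying by a `long` score ID is an assumption, and the `populateOnMiss` flag is a hypothetical alternative, not something that exists. The hit/miss counters are only there to show the sort of thing cache hit rate monitoring would look at.

```csharp
using System;
using System.Collections.Generic;

public class ReplayStoreSketch
{
    // Stand-ins only: one dictionary plays the cache, the other plays S3.
    private readonly Dictionary<long, byte[]> cache = new();
    private readonly Dictionary<long, byte[]> s3 = new();

    public int CacheHits { get; private set; }
    public int CacheMisses { get; private set; }

    // Upload is the only path that writes to the cache.
    public void Upload(long scoreId, byte[] replay)
    {
        s3[scoreId] = replay;
        cache[scoreId] = replay;
    }

    // Download queries the cache and falls back to S3 on a miss.
    // populateOnMiss = false mirrors the behaviour described above;
    // true is the hypothetical alternative that stores the fetched replay.
    public byte[]? Download(long scoreId, bool populateOnMiss = false)
    {
        if (cache.TryGetValue(scoreId, out var cached))
        {
            CacheHits++;
            return cached;
        }

        CacheMisses++;

        if (!s3.TryGetValue(scoreId, out var stored))
            return null;

        if (populateOnMiss)
            cache[scoreId] = stored;

        return stored;
    }

    // Simulates the cache entry expiring.
    public void Evict(long scoreId) => cache.Remove(scoreId);
}
```

With the default behaviour, once an entry has expired every later download of that replay is a miss and goes to S3. With `populateOnMiss: true`, only the first download after expiry misses.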