Some initial exploration of the data from #196 suggests that registries have a way to "stick" in the cache. Some data points (aggregated over all deployed servers):
| Resource type | Total size (GB) | Total number of files | Average size (MB) |
| --- | --- | --- | --- |
| registries | 585 | 70500 | 8.3 |
| packages | 89 | 73800 | 1.2 |
| artifacts | 1000 | 26400 | 37.9 |
In particular, it seems wasteful to store so many registries, since they are quite large (about 7x the size of the average package). Registries are also a bit different in that all(?) requests want the latest one, so caching something like the 10 latest should be more than sufficient. This would free up space for packages and artifacts, which could potentially reduce network usage between package servers and storage servers.
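To make the "keep the 10 latest" idea concrete, here is a minimal sketch in Python (illustrative only, not the server's actual code). It assumes each cached registry can be described by a path, a modification time, and a size, and that modification time is a good enough proxy for "latest". The names `CachedFile` and `registries_to_evict` are hypothetical.

```python
from typing import List, NamedTuple


class CachedFile(NamedTuple):
    """Hypothetical description of one cached registry file."""
    path: str
    mtime: float  # last-modified time, seconds since the epoch
    size: int     # size in bytes


def registries_to_evict(files: List[CachedFile], keep: int = 10) -> List[CachedFile]:
    """Return every registry except the `keep` most recently modified ones."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    # Sort newest first, then everything past the first `keep` entries is evictable.
    newest_first = sorted(files, key=lambda f: f.mtime, reverse=True)
    return newest_first[keep:]
```

The function only selects candidates; actually deleting them (and updating whatever bookkeeping the cache keeps) is left to the caller.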
The main cause (I think) is that we now serve existing files directly from nginx, but these cache hits are not recorded on the Julia side, so all resources end up with `num_accessed = 1`.
Removing the registries from the equation as suggested above should help quite a bit, but it might also be worth trying to find some way to communicate the cache hits from nginx to Julia.
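One possible way to recover the hits, sketched in Python for illustration: count successful requests per resource path from the nginx access log. This assumes the log uses nginx's predefined `combined` format and that a logged `200` for a path corresponds to a file served from the cache; if requests proxied to Julia end up in the same log, they would need to be filtered out first. The function name `count_cache_hits` is hypothetical, and how the counts would be fed back into `num_accessed` is left open.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the request and status fields of nginx's "combined" log format,
# e.g.: "GET /artifact/abc HTTP/1.1" 200
REQUEST_RE = re.compile(r'"GET (?P<path>\S+) HTTP/[^"]*" (?P<status>\d{3}) ')


def count_cache_hits(log_lines: Iterable[str]) -> Counter:
    """Count successful (status 200) GET requests per request path."""
    hits: Counter = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        # Skip lines that do not parse and requests that did not succeed.
        if match and match.group("status") == "200":
            hits[match.group("path")] += 1
    return hits
```

Run periodically over new log lines, something like this could give per-resource access counts without putting Julia back in the request path.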