Skip to content

Rework the filecaching #198

@fredrikekre

Description

@fredrikekre

Some initial exploration of the data from #196 suggest that registries have a way to "stick" in the cache. Some datapoints (aggregation over all deployed servers):

Resource type Total size (GB) Total number of files Average size (MB)
registries 585 70500 8.3
packages 89 73800 1.2
artifacts 1000 26400 37.9

In particular it seems wasteful to store so many registries in particular since they are quite large (7x the size of the average package). Registries are also a bit different in that all(?) requests want the latest one, so caching something like the 10 latest should be more than sufficient. This would free up space for packages and artifacts instead, which could potentially reduce network usage between package servers and storage servers.

The major reason causing this (I think) is that we now serve existing files from nginx directly but this cache hit is not recorded on the julia resulting in all resources have num_accessed = 1.

Removing the registries from the equation as suggested above should help quite a bit, but might also be worth trying some way to communicate the cache hits from nginx to julia.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions