
[GCS] Feature req: Use CustomTime attribute to record when cached entry was last hit (for eviction/cleanup rules) #2318

Open
@whisperity


Hey!

We are using a Google Cloud Storage bucket through sccache to share a compilation cache across CI builds of LLVM. Unfortunately, it looks like none of the cloud provider implementations allow for keeping the size of the cache in check. That is only possible for local-disk caching, which we cannot use because the CI machines are ephemeral VMs.

In order to keep the costs of this shared cache reasonable, we implemented an Age lifecycle rule in GCS: if a cached entry is too old, GCS will deal with deleting it.
According to GCS, an Age rule is matched as:

Age is counted from when an object was uploaded to the current bucket.

This presents a problem with "death-waves": $n$ days after an initial CI run has populated the cache with about $6,000$ translation units, they all get deleted at once, and the subsequent CI run essentially has to do an almost full rebuild. The only cache hits at $\ge n$ days will be for the more recently modified content, which is often the smaller part of the files in the project.

It would be great if sccache with a cloud bucket could behave in the same LRU way as ccache does locally. Doing a full LRU is problematic without handling size limits, but it seems there is another lifecycle property that can be used, at least in the case of GCS: CustomTime.

LRU-like behaviour could be simulated by using the Days since custom time lifecycle rule (instead of Age), so that only files that have not had a cache hit for $n$ days are evicted from the bucket.
However, sccache does not populate this field at all:

[Screenshot: details view of a file in an sccache bucket, showing that Custom time is empty.]
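
For reference, the bucket-side half of this could look roughly like the sketch below. It builds a lifecycle configuration using the `daysSinceCustomTime` condition, in the JSON shape that GCS lifecycle files use. The helper name `custom_time_lifecycle` and the value of 30 days are made up for illustration. As far as I understand the GCS documentation, objects whose Custom time was never set do not match this condition, so it only helps once something populates the field.

```python
import json


def custom_time_lifecycle(days: int) -> dict:
    """Build a GCS lifecycle config that deletes objects whose
    customTime is at least `days` days in the past.

    `custom_time_lifecycle` is a hypothetical helper name, not part of
    sccache or any Google library.
    """
    return {
        "rule": [
            {
                "action": {"type": "Delete"},
                # Replaces the {"age": days} condition used by an Age rule.
                "condition": {"daysSinceCustomTime": days},
            }
        ]
    }


# Print the JSON so it can be saved to a file and applied to the bucket.
print(json.dumps(custom_time_lifecycle(30), indent=2))
```

The printed JSON can be saved to a file and applied to the bucket with the usual lifecycle tooling, in the same way as the Age rule we have now.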

It seems from the documentation that this is a simple timestamp field that could be populated through the API, so all we need is a simple additional HTTP request that tells Google to update this field to "current time" once the caching logic hits a file successfully.
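
To make the idea concrete, here is a minimal sketch of what such a request might look like against the GCS JSON API (an object metadata `PATCH` that sets `customTime`). It is written in Python purely for illustration and says nothing about how sccache would structure this in Rust. The function name `build_touch_request`, the bucket name, the object key, and the token are all hypothetical. The sketch only builds the request and does not send it.

```python
import json
from datetime import datetime, timezone
from urllib.parse import quote
from urllib.request import Request


def build_touch_request(bucket, object_name, access_token, now=None):
    """Build (but do not send) a PATCH request that sets an object's
    customTime to `now`, formatted as an RFC 3339 UTC timestamp.

    All parameter values are supplied by the caller; nothing here is
    taken from sccache itself.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    timestamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    # Object names must be fully URL-encoded, including any "/" in the key.
    url = (
        "https://storage.googleapis.com/storage/v1/b/"
        f"{quote(bucket, safe='')}/o/{quote(object_name, safe='')}"
    )
    body = json.dumps({"customTime": timestamp}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Example with made-up values; sending it would be
# urllib.request.urlopen(req) with a real OAuth token.
req = build_touch_request("my-sccache-bucket", "a/b/abcdef", "TOKEN")
print(req.get_method(), req.full_url)
```

Two caveats with this approach: the credentials used on a cache hit would need permission to update object metadata, not just to read objects, and GCS does not allow Custom time to be moved to an earlier value once set, so the timestamp should always be the current time.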


It seems like similar metadata options are available for other cloud providers, such as S3 or Azure, but I have no experience with them and do not know whether they could be used to control the cache's lifecycle as well as CustomTime can for GCS.
