
Action should have automatic cache busting mechanism, or more docs about cache busting #32

Open
@sureshjoshi

In the example projects, we have this handy piece of info:

# Note that named_caches and lmdb_store falls back to partial restore keys which
# may give a useful partial result that will save time over completely clean state,
# but will cause the cache entry to grow without bound over time.
# See https://www.pantsbuild.org/2.21/docs/using-pants/using-pants-in-ci for tips on how to periodically clean it up.
# Alternatively you change gha-cache-key to ignore old caches.

And then we have the suggestion to use this action, and instructions about manual usage and a cache nuke function: https://www.pantsbuild.org/2.21/docs/using-pants/using-pants-in-ci#directories-to-cache

The problem is that the partial restore key is so lenient, and the primary cache key so strict, that using the nuke function from the docs won't work most of the time.

To reduce the monotonically increasing cache usage, a user needs to either explicitly and manually change the cache key, or run a nuke function in a workflow run that ALSO changes the cache key and therefore triggers a cache save (e.g. a lockfile change, a pants.toml change, etc.).
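To make the growth mechanism concrete, here is a minimal Python model of how a strict primary key and a lenient prefix-style restore key interact. It is only a sketch, not the action's code: `CacheEntry`, `resolve`, and the key names are hypothetical, and the sizes are made up. It assumes an exact primary-key match is tried first and skips the save, and that otherwise the newest entry matching a restore-key prefix is restored and a new entry is saved at the end of the run.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CacheEntry:
    key: str
    created_at: int  # stand-in for a creation timestamp
    size_mb: int


def resolve(
    entries: List[CacheEntry], primary_key: str, restore_keys: List[str]
) -> Tuple[Optional[CacheEntry], bool]:
    """Return (restored_entry, will_save) under the assumed matching rules."""
    # Exact hit on the primary key: restore it and skip saving afterwards.
    for entry in entries:
        if entry.key == primary_key:
            return entry, False
    # Otherwise fall back to the newest entry matching a restore-key prefix.
    for prefix in restore_keys:
        candidates = [e for e in entries if e.key.startswith(prefix)]
        if candidates:
            return max(candidates, key=lambda e: e.created_at), True
    # Nothing matched: start clean and save under the primary key.
    return None, True


# Hypothetical keys: the suffix stands for a hash of lockfiles / pants.toml.
entries = [CacheEntry("pants-named-caches-aaa", created_at=1, size_mb=200)]

# Same inputs as before: exact hit, so a nuke during this run is never saved.
print(resolve(entries, "pants-named-caches-aaa", ["pants-named-caches-"]))

# Changed inputs: the old cache is restored via the prefix and saved again
# under the new key, carrying all of the old content along with it.
print(resolve(entries, "pants-named-caches-bbb", ["pants-named-caches-"]))
```

In this model the only runs that save anything are the ones that first restore an older, possibly oversized entry, which is why the nuke has to happen in exactly those runs.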


I used https://github.com/sureshjoshi/pants-plugins as a cache testing example:

[Screenshot: cache-not-busting]

With the second-last entry, in spite of removing almost all dependencies in that commit, we’re still pulling 220MB of cache - and that never gets cleared out. We have to explicitly bust the cache with a new cache key, and run everything from scratch to get the benefit.


Here is another example where I nuke the cache, but since the cache key doesn't change, the run reports "Cache hit occurred ... not saving cache" and the nuked state is never saved.

[Screenshot]


I had the idea of using the gh CLI to prematurely delete/expire caches, but since this would happen after the cache has been downloaded, it would require special treatment.

I think the most reasonable, practical answer is to add some more documentation to this Action (and probably pantsbuild.org), as well as having some sort of automatic nuke-check on cache saving.

This might require using the separate restore/save cache actions, if the cache action itself has no easy hook for telling whether the cache key being saved has been invalidated.

Essentially:

  • Run the action as normal
  • During the post-action hooks, check whether the cache key is new (e.g. was pants.toml or named-caches-hash modified?)
    • If not, do nothing
    • If so, run nuke_if_too_big $named_cache_dir $named_cache_limit_mb
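The steps above can be sketched in Python as follows. This is an illustration under assumptions, not a proposed implementation: `post_action_hook`, `dir_size_mb`, and the `primary_key` / `matched_key` parameters are hypothetical names, and "new cache key" is taken to mean that the key restored at the start of the run differs from the key that would be saved. `nuke_if_too_big` mirrors the name used above, but the size here is the sum of file sizes, which can differ from what `du` reports.

```python
import shutil
from pathlib import Path
from typing import Optional


def dir_size_mb(path: Path) -> float:
    """Total size of regular files under path, in MB (apparent size, not disk usage)."""
    total_bytes = sum(f.stat().st_size for f in path.rglob("*") if f.is_file())
    return total_bytes / (1024 * 1024)


def nuke_if_too_big(path: Path, limit_mb: float) -> bool:
    """Delete path if it exceeds limit_mb. Returns True if it was deleted."""
    if not path.is_dir():
        return False
    if dir_size_mb(path) > limit_mb:
        shutil.rmtree(path)
        return True
    return False


def post_action_hook(
    primary_key: str,
    matched_key: Optional[str],
    named_cache_dir: Path,
    named_cache_limit_mb: float,
) -> bool:
    """Run the nuke check only when this run is going to save a new cache entry."""
    # Exact hit: nothing will be saved, so nuking would have no lasting effect.
    if matched_key == primary_key:
        return False
    # New key (partial restore or clean start): trim before the save happens.
    return nuke_if_too_big(named_cache_dir, named_cache_limit_mb)
```

Tying the check to the key change means the expensive from-scratch rebuild only happens on runs that would upload a new entry anyway, and only when the restored directory is already over the limit.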
