
Smart cache cleaning: delete cached binaries that have been unused for >X days #114

@bradzacher

We've been using dotslash in production at Canva for about a year now and it's working great! But we're starting to hit one small issue with it -- cache growth.

Given the nature of the binaries we use dotslash for (binaries that are stored in the repo), once we "update" a binary, the old cached version is essentially never run again, because people very rarely go back to old commits.

In other words, the cache ends up mostly full of binaries that will never be used again, which just wastes disk space.

One obvious solution is to periodically run `dotslash -- clean` on every device. The downside is that this also deletes binaries that are still in use, forcing users to re-download them. On devboxes this isn't too bad thanks to fast networks, but on laptops it could mean minutes wasted re-downloading binaries that never should have been deleted!

A better solution would be a way to delete only the binaries that haven't been accessed within a given number of days.

It's hard to implement this in userland because you really need to work in a "cache structure aware" fashion. You can't do something naive like `find $(dotslash -- cache-dir) -atime +28 -delete`, because that could partially delete a cached binary. For example, if a cached tarball contains two executables and only one of them is used, that command would delete the unused executable and leave a broken entry in the dotslash cache.
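To make the failure mode concrete, here is a small self-contained demo on a throwaway directory (not a real dotslash cache). The `ab/cdef0123` entry layout and the `used-tool`/`unused-tool` names are hypothetical stand-ins. It compares the naive per-file selection with an entry-aware check that keeps an entry if any of its files was accessed recently.

```shell
#!/bin/bash
# Demo on a throwaway directory. The "<prefix>/<hash>" entry layout and the
# tool names are hypothetical stand-ins for a real dotslash cache entry.
set -euo pipefail

cache=$(mktemp -d)
entry="${cache}/ab/cdef0123"
mkdir -p "${entry}/bin"

# One entry holding two executables: used-tool was just created (atime now),
# unused-tool has its atime backdated to 2000-01-01.
echo 'used' > "${entry}/bin/used-tool"
echo 'unused' > "${entry}/bin/unused-tool"
touch -a -t 200001010000 "${entry}/bin/unused-tool"

# Naive per-file selection: picks unused-tool on its own, which would leave
# the entry half-deleted.
naive=$(find "${cache}" -type f -atime +28 -print)
echo "naive would delete: ${naive#"${cache}/"}"

# Entry-aware selection: the entry is stale only if none of its files were
# accessed within the window, so here the whole entry is kept.
if [[ -n $(find "${entry}" -type f -atime -28 -print -quit) ]]; then
  decision=keep
else
  decision=delete
fi
echo "entry-aware decision for ab/cdef0123: ${decision}"

rm -rf "${cache}"
```

Deciding per entry rather than per file is what keeps multi-executable artifacts intact; the open question is what counts as an "entry", which is exactly the implementation detail only dotslash knows for sure.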

You can go more advanced with the delete script, but then you're building a script that's intimately aware of dotslash's implementation details. It's doable (I've included a POC shell script below that might work), but without knowing all of those details it might just break dotslash instead!

So I'd love to request a feature like `dotslash -- clean 28`, which would delete every cached binary whose files all have an atime older than 28 days. Such a command could easily be scheduled to run regularly to trim the cache and prevent unbounded growth.

POC delete script
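```shell
#!/bin/bash
# POC: delete dotslash cache entries whose files have all gone unaccessed for
# at least MINIMUM_AGE_DAYS. Assumes every directory exactly two levels below
# the cache root is one complete cache entry (a guessed implementation detail).

set -euo pipefail

MINIMUM_AGE_DAYS=28

dotslash_cache_dir=$(dotslash -- cache-dir)

to_delete_dirs=()

# Read NUL-delimited paths so directory names containing spaces or newlines
# are handled correctly. Entries under the locks directory are skipped.
while IFS= read -r -d '' cache_dir; do
  # A single recently accessed file keeps the whole entry alive.
  recently_accessed_file=$(
    find "${cache_dir}" -type f -atime "-${MINIMUM_AGE_DAYS}" -print -quit
  )
  if [[ -z "${recently_accessed_file}" ]]; then
    to_delete_dirs+=("${cache_dir}")
  fi
done < <(
  find "${dotslash_cache_dir}" \
    -mindepth 2 -maxdepth 2 -type d \
    -not -path "${dotslash_cache_dir}/locks/*" \
    -print0
)

# Expanding an empty array under `set -u` fails on older bash, so bail early.
if (( ${#to_delete_dirs[@]} == 0 )); then
  echo "Nothing to delete."
  exit 0
fi

# du -csh "${to_delete_dirs[@]}"   # uncomment to preview the space reclaimed
# Cached entries may be read-only; restore owner write permission so rm can
# remove their contents.
chmod -R u+w "${to_delete_dirs[@]}"
rm -rf "${to_delete_dirs[@]}"
```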
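Both the POC above and the proposed `clean` flag assume the filesystem actually records access times. Filesystems mounted with `noatime` never update atime on reads, while Linux's default `relatime` still updates it at least once per 24 hours, which is plenty for a 28-day window. A minimal probe, assuming that a plain read is a reasonable proxy for running a cached binary, could look like this (the `atime-probe` directory name is arbitrary; pass the cache dir as `$1` to test the filesystem that actually holds the cache):

```shell
#!/bin/bash
# Probe whether reading a file updates its access time on the filesystem
# holding the given directory (defaults to $TMPDIR or /tmp).
set -euo pipefail

probe_dir=$(mktemp -d "${1:-${TMPDIR:-/tmp}}/atime-probe.XXXXXX")
probe_file="${probe_dir}/probe"

echo 'probe' > "${probe_file}"
# Backdate only the access time, then read the file once.
touch -a -t 200001010000 "${probe_file}"
cat "${probe_file}" > /dev/null

# If the atime is still more than a day old, the read was not recorded.
if [[ -n $(find "${probe_file}" -atime +1 -print) ]]; then
  atime_status="not updated (noatime-style mount?)"
else
  atime_status="updated"
fi
echo "atime on read: ${atime_status}"

rm -rf "${probe_dir}"
```

If the probe reports "not updated", any atime-based cleanup would treat every cached binary as stale, so it would be worth checking before scheduling such a job.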
