We've been using dotslash in production at Canva for about a year now and it's working great! But we're starting to hit one small issue with it -- cache growth.
Given the nature of the binaries we're using dotslash for -- binaries that are stored in the repo -- once we "update" a binary, the previously cached version is almost never run again, because people very rarely go back to old commits.
In other words the cache is mostly full of "never going to be used again" binaries -- which is bad.
One obvious solution is to just periodically run dotslash -- clean on every device. This has the obvious downside of also cleaning binaries that are still in use, forcing the user to re-download binaries they need. On devboxes this isn't that bad because they have high-speed networks, etc., but on laptops this could mean minutes of wasted time redownloading binaries that shouldn't have been deleted!
A better solution would be to be able to specify a last-accessed time to delete by.
It's hard to implement this in userland because you really need to work in a "cache structure aware" fashion. You can't do something naive like find $(dotslash -- cache-dir) -atime +28 -delete because that could partially delete a cached binary. E.g. if you cache a tarball'd binary that has two executables, but only one of them is used, then we'd delete the unused executable and break the dotslash cache.
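To make that failure mode concrete, here's a small sketch that reproduces it in a throwaway temp directory. The entry/bin layout and the file names are made up for illustration and are not dotslash's real cache structure; the fixed old timestamp just stands in for "not accessed in over 28 days".

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical stand-in for a single cache entry (NOT dotslash's real layout).
demo_dir=$(mktemp -d)
mkdir -p "${demo_dir}/entry/bin"
touch "${demo_dir}/entry/bin/used" "${demo_dir}/entry/bin/unused"

# Backdate only the access time of the executable that nobody runs.
touch -a -t 200001010000 "${demo_dir}/entry/bin/unused"

# The naive per-file cleanup deletes just the stale file...
find "${demo_dir}" -type f -atime +28 -delete

# ...so the entry is now only partially present.
remaining=$(find "${demo_dir}" -type f | wc -l | tr -d ' ')
echo "files left in entry: ${remaining}"

rm -rf "${demo_dir}"
```

Only the recently accessed file survives, so the entry is left half-populated, which is the state a structure-aware cleanup needs to avoid.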
You can go more advanced with the delete script but that involves building a script that's intimately aware of dotslash's implementation details. It's doable (I have a POC shell script which might work below) -- but without knowing all of dotslash's implementation details it might instead just break dotslash!
So I'd love to request a feature like dotslash -- clean 28 which would delete all cached binaries whose files all have an atime older than 28 days. Such a command could easily be scheduled to run regularly to trim the cache and prevent unbounded growth.
POC delete script
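#!/bin/bash
# POC: delete dotslash cache entries in which no file was accessed recently.
set -euo pipefail

MINIMUM_AGE_DAYS=28
dotslash_cache_dir=$(dotslash -- cache-dir)

to_delete_dirs=()
# Assumes each cached artifact lives in a directory two levels below the cache root.
# Paths are NUL-delimited so names containing whitespace are not split.
while IFS= read -r -d '' cache_dir; do
    # Keep the whole entry if any file in it was accessed within the window.
    recently_accessed_files=$(find "${cache_dir}" -type f -atime "-${MINIMUM_AGE_DAYS}" -print)
    if [[ -z "${recently_accessed_files}" ]]; then
        to_delete_dirs+=("${cache_dir}")
    fi
done < <(
    find "${dotslash_cache_dir}" \
        -mindepth 2 -maxdepth 2 -type d \
        -not -path "${dotslash_cache_dir}/locks/*" \
        -print0
)

# Guard: expanding an empty array under `set -u` fails on bash older than 4.4.
if (( ${#to_delete_dirs[@]} > 0 )); then
    # du -csh "${to_delete_dirs[@]}"
    rm -rf "${to_delete_dirs[@]}"
fi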