ci: bound seeded Go cache size and speed up disk cleanup #38048
The cache-seeder saved its caches with a restore-keys prefix fallback, so every go.sum change restored the previous cache and re-saved the union. Old module versions and stale build objects accumulated, growing the cache from ~3GB to ~7GB and exhausting runner disk (No space left on device). Drop restore-keys from the seeder save branches so each go.sum seeds a clean, bounded cache; PR runs keep restore-keys for warm-start fallback. Also delete the unused preinstalled toolchains in parallel and log free space before and after, to halve the cleanup time and make headroom visible. Refs: go-gitea#37974 Assisted-by: Claude:Opus-4.8
Temporary experiment: df shows ~89G free on / before any cleanup, so disable the toolchain deletion and run the pgsql shards on this action-only change to confirm db-tests still have ample disk headroom. Adds an end-of-job df to capture peak usage. Revert once measured. Refs: go-gitea#37974 Assisted-by: Claude:Opus-4.8
Disabling the deletion reproduced the go-gitea#37974 "No space left on device" failure on a disk-starved runner mid cache-restore, while sibling jobs on the common ~89G-free runners passed: the hosted fleet is heterogeneous and the deletion is the headroom that keeps the small-disk minority green. Keep the parallelized deletion and df logging; revert the db-test gate and end-of-job df scaffolding used for the experiment. Refs: go-gitea#37974 Assisted-by: Claude:Opus-4.8
Verdict: heterogeneous runner fleet. Most runners have ~89 GB free, but some seem to have less than 17 GB free, which leads to disk space failures. Disk space cleanup is kept for those cases.
Pull request overview
This PR aims to reduce GitHub Actions “No space left on device” CI flakes by preventing unbounded growth of the seeded Go caches and by speeding up/logging disk cleanup on runners before large cache restores.
Changes:
- Remove `restore-keys` from the cache-seeder save branches so each `go.sum` seeds a clean, bounded cache (PR runs still use `restore-keys` for warm-start fallback).
- Parallelize deletion of unused preinstalled toolchains and log `df -h /` before/after cleanup.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `.github/actions/go-cache/action.yml` | Stops cache-seeder from using `restore-keys` when saving caches to avoid cache “union” growth across `go.sum` changes. |
| `.github/actions/free-disk-space/action.yml` | Speeds disk cleanup by deleting multiple toolchain directories in parallel and adds before/after free-space logging. |
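The parallel deletion with before/after logging described above can be sketched roughly as follows. This is a minimal illustration, not the contents of `free-disk-space/action.yml`: the function names `log_free` and `parallel_rm` are hypothetical, and the demo runs on throwaway directories rather than real toolchain paths.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print a labeled free-space report for one mount point or path.
log_free() {
  local label=$1 mount=$2
  echo "== free space ${label} cleanup =="
  df -h "$mount"
}

# Remove every directory passed as an argument, one background job each,
# then block until all jobs have finished. A bare `wait` returns 0 even if
# a job failed, which suits best-effort cleanup.
parallel_rm() {
  local dir
  for dir in "$@"; do
    rm -rf -- "$dir" &
  done
  wait
}

# Demo on throwaway directories (hypothetical stand-ins for toolchain paths).
demo_root=$(mktemp -d)
mkdir -p "$demo_root/toolchain-a" "$demo_root/toolchain-b"
log_free before "$demo_root"
parallel_rm "$demo_root/toolchain-a" "$demo_root/toolchain-b"
log_free after "$demo_root"
```

Running the deletions as background jobs lets independent directory trees be removed concurrently, so the wall-clock time tends toward that of the slowest single deletion rather than the sum of all of them.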
Signed-off-by: silverwind <me@silverwind.io>
Reduces the CI cache growth and disk pressure behind the flaky `No space left on device` failures in #37974.

- `go-cache`: the cache-seeder saved with a `restore-keys` prefix fallback, so every `go.sum` change restored the previous cache and re-saved the union; old module versions and stale build objects accumulated (~3 GB → ~7 GB) and overflowed disk on smaller runners. Drop `restore-keys` from the seeder save branches so each `go.sum` seeds a clean, size-bounded cache. PR runs keep `restore-keys` for warm-start fallback.
- `free-disk-space`: delete the unused preinstalled toolchains in parallel (~86 s → ~54 s) and log `df -h /` before/after.

Measured during review: the hosted `ubuntu-latest` fleet is heterogeneous. Most runners have ~89 GB free on `/` (a full pgsql integration shard peaks at ~17 GB used), but a minority arrive nearly full and fail mid cache-restore. The toolchain deletion is the headroom that keeps those runners green, so it stays; the cache bound shrinks the footprint for every runner.

Authored with assistance from Claude (Opus 4.8).
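To make the headroom arithmetic concrete, a runner's free space can be compared against the ~17 GB peak quoted above. The sketch below is only a diagnostic illustration and is not part of this PR: `free_gib` and `has_headroom` are hypothetical helper names, and treating GB and GiB as interchangeable is a simplifying assumption.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the space available on a mount point, in whole GiB (rounded down).
# `df -Pk` emits POSIX-format output in 1024-byte blocks; column 4 is "Available".
free_gib() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1048576) }'
}

# Succeed if the mount point has at least the requested number of GiB free.
has_headroom() {
  local mount=$1 need_gib=$2
  [ "$(free_gib "$mount")" -ge "$need_gib" ]
}

# 17 is the peak usage of a pgsql shard quoted in the PR description.
if has_headroom / 17; then
  echo "ok: $(free_gib /) GiB free on /"
else
  echo "warning: only $(free_gib /) GiB free on /; a pgsql shard may not fit"
fi
```

On the common runners with ~89 GB free this check would pass before any cleanup; on the nearly full minority it would fail, which is the case the toolchain deletion covers.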