ci(l1): add cleanup step to daily snapsync workflow #6131
The Sepolia snapsync jobs failed on 2026-02-05 due to Kurtosis being unable to allocate Docker network IPs (172.16.0.0/22 subnet exhausted). This adds a cleanup step before each snapsync test that removes stale Kurtosis enclaves and prunes unused Docker networks and images.
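To confirm this kind of exhaustion on a runner, a diagnostic step along the following lines could list the existing Docker networks and the subnets they occupy. This is only a sketch and not part of the PR: the step name is hypothetical, and it assumes a Linux runner with GNU `xargs` available.

```yaml
# Hypothetical diagnostic step (not part of this PR)
- name: Show Docker networks and their subnets
  run: |
    # List all networks known to the Docker engine
    docker network ls
    # Print each network's name followed by its IPAM subnets;
    # xargs -r skips the inspect call if no network IDs are returned
    docker network ls -q | xargs -r docker network inspect \
      --format '{{.Name}}: {{range .IPAM.Config}}{{.Subnet}} {{end}}'
```

Running it before and after the cleanup step would show whether the prune actually freed any subnets.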
Greptile Summary
Added cleanup step to both `sync-lighthouse` and `sync-prysm` jobs.
Key Changes:
Minor Suggestions:
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| .github/workflows/daily_snapsync.yaml | Added cleanup step to remove stale Kurtosis enclaves and Docker resources before snapsync tests run |
Sequence Diagram
sequenceDiagram
participant GHA as GitHub Actions
participant Cleanup as Cleanup Step
participant Kurtosis as Kurtosis
participant Docker as Docker Engine
participant SnapSync as Snapsync Test
GHA->>Cleanup: Start cleanup before test
Cleanup->>Kurtosis: kurtosis clean -a
Note over Kurtosis: Remove all stale enclaves<br/>(fails gracefully with || true)
Kurtosis-->>Cleanup: Enclaves cleaned
Cleanup->>Docker: docker network prune -f
Note over Docker: Remove unused networks<br/>(force flag avoids prompt)
Docker-->>Cleanup: Networks pruned
Cleanup->>Docker: docker image prune -f
Note over Docker: Remove dangling images<br/>(force flag avoids prompt)
Docker-->>Cleanup: Images pruned
Cleanup->>GHA: Resources cleaned
GHA->>SnapSync: Run snapsync test
Note over SnapSync: Test runs with<br/>clean resources
SnapSync->>Docker: Create new networks/containers
SnapSync-->>GHA: Test completed
    run: |
      kurtosis clean -a || true
      docker network prune -f
      docker image prune -f
Consider adding `docker volume prune -f` to also clean up unused volumes, which can accumulate from Kurtosis enclaves.
    - name: Cleanup stale Docker and Kurtosis resources
      run: |
        kurtosis clean -a || true
        docker network prune -f
        docker image prune -f
Consider adding error logging to the cleanup step to help diagnose cleanup failures.
Suggested change (current):
    - name: Cleanup stale Docker and Kurtosis resources
      run: |
        kurtosis clean -a || true
        docker network prune -f
        docker image prune -f
Suggested change (proposed):
    - name: Cleanup stale Docker and Kurtosis resources
      run: |
        echo "Starting cleanup of stale resources..."
        kurtosis clean -a || echo "Warning: kurtosis clean failed"
        docker network prune -f || echo "Error: docker network prune failed"
        docker image prune -f || echo "Error: docker image prune failed"
        echo "Cleanup completed"
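A possible variation on this suggestion, sketched here rather than taken from the PR, uses GitHub Actions' `::warning::` workflow command so that failures appear as annotations on the run instead of only in the step log. It assumes, like the suggestion above, that a failed cleanup should not fail the job.

```yaml
# Sketch only: same commands as the suggestion, with failures surfaced as annotations
- name: Cleanup stale Docker and Kurtosis resources
  run: |
    echo "Starting cleanup of stale resources..."
    # Each "|| echo" keeps the step from failing and emits a warning annotation instead
    kurtosis clean -a || echo "::warning::kurtosis clean failed"
    docker network prune -f || echo "::warning::docker network prune failed"
    docker image prune -f || echo "::warning::docker image prune failed"
    echo "Cleanup completed"
```

Note that any `|| echo` form changes behavior relative to the original step, where a failing `docker network prune` or `docker image prune` would fail the job.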
Pull request overview
This PR adds cleanup steps to the daily snapsync workflow to prevent Docker network exhaustion issues that caused the 2026-02-05 Sepolia jobs to fail. The cleanup runs before each snapsync test to ensure clean resources are available.
Changes:
- Added pre-test cleanup steps to both sync-lighthouse and sync-prysm jobs
- Cleanup includes kurtosis enclave removal, Docker network pruning, and Docker image pruning
    - name: Cleanup stale Docker and Kurtosis resources
      run: |
        kurtosis clean -a || true
        docker network prune -f
        docker image prune -f
The cleanup logic is duplicated identically in both the sync-lighthouse and sync-prysm jobs. Consider extracting this into a reusable composite action (similar to the existing actions in .github/actions/) to improve maintainability and ensure consistency. This would make it easier to update the cleanup logic in the future if additional cleanup steps are needed.
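A minimal sketch of what such a composite action might look like follows. The directory name `cleanup-kurtosis-docker` and the `description` text are hypothetical and chosen only for illustration; the commands are the ones this PR adds.

```yaml
# .github/actions/cleanup-kurtosis-docker/action.yml (hypothetical path and name)
name: Cleanup stale Docker and Kurtosis resources
description: Remove stale Kurtosis enclaves and prune unused Docker networks and images
runs:
  using: composite
  steps:
    - name: Cleanup stale Docker and Kurtosis resources
      # Composite actions require an explicit shell on every run step
      shell: bash
      run: |
        kurtosis clean -a || true
        docker network prune -f
        docker image prune -f
```

Each job would then call it with `- uses: ./.github/actions/cleanup-kurtosis-docker`, placed after the checkout step so the local action is present on the runner.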
    kurtosis clean -a || true
    docker network prune -f
    docker image prune -f
Consider adding `docker container prune -f` to clean up stopped containers as well. While `docker image prune` only removes dangling images and `docker network prune` removes unused networks, stopped containers can also consume resources and may contribute to resource exhaustion issues. This is optional but could provide more thorough cleanup.
`kurtosis clean -a` removes enclaves, but the engine still holds onto network allocations. Stopping the engine releases the internal network pool, allowing `docker network prune` to actually free the IPs.
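Put together, the cleanup step with the engine stop might look like the sketch below. It assumes `kurtosis engine stop` is the command meant by "stopping the engine" and that the subsequent snapsync test starts the engine again on its own; neither is confirmed by the text above. The commented-out prune commands are the reviewers' optional suggestions, not part of the PR.

```yaml
# Sketch only: ordering and the engine-stop command are assumptions
- name: Cleanup stale Docker and Kurtosis resources
  run: |
    # Remove all enclaves; tolerate failure (for example, when no engine is running)
    kurtosis clean -a || true
    # Stop the engine so it releases the network allocations it still holds
    kurtosis engine stop || true
    # Optional (review suggestion): remove stopped containers first, since
    # networks still referenced by a container are not considered unused
    # docker container prune -f
    docker network prune -f
    docker image prune -f
    # Optional (review suggestion): remove unused volumes
    # docker volume prune -f
```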
Motivation
The daily snapsync workflow failed on 2026-02-05 with Sepolia jobs unable to start:
The Hoodi jobs succeeded but consumed Docker network resources that weren't fully released before the Sepolia jobs ran.
Description
Adds a cleanup step to both `sync-lighthouse` and `sync-prysm` jobs that runs before each snapsync test:
- `kurtosis clean -a` - removes all stale Kurtosis enclaves
- `docker network prune -f` - removes unused Docker networks
- `docker image prune -f` - removes dangling Docker images
This ensures clean resources are available for each test run and prevents accumulation of orphaned networks from previous runs.
How to Test
The workflow runs on PR when `daily_snapsync.yaml` is modified. The Hoodi network test will execute and verify the cleanup step works correctly.
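For reference, a path-filtered pull request trigger of this kind is typically declared as in the sketch below. The workflow's actual `on:` block is not shown in this PR, so this is an assumption about its shape; any other triggers it has, such as the daily schedule, are omitted.

```yaml
# Sketch of a path-filtered PR trigger; the real workflow's other triggers are omitted
on:
  pull_request:
    paths:
      - ".github/workflows/daily_snapsync.yaml"
```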