
ci(l1): add cleanup step to daily snapsync workflow#6131

Merged
ilitteri merged 3 commits into main from ci/snapsync-cleanup-step on Feb 18, 2026

ilitteri (Collaborator) commented Feb 5, 2026

Motivation

The daily snapsync workflow failed on 2026-02-05 with Sepolia jobs unable to start:

Failed to allocate IpAddr on subnet 172.16.0.0/22 - all taken.

The Hoodi jobs succeeded, but they consumed Docker network resources that were not fully released before the Sepolia jobs ran.

Description

Adds a cleanup step, which runs before each snapsync test, to both the sync-lighthouse and sync-prysm jobs:

  • kurtosis clean -a: removes all Kurtosis enclaves (-a includes running ones, not only stopped ones)
  • docker network prune -f: removes unused Docker networks
  • docker image prune -f: removes dangling Docker images

This ensures clean resources are available for each test run and prevents accumulation of orphaned networks from previous runs.
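The fragment below sketches where the step sits inside a job. Only the cleanup commands come from this PR. The `runs-on` label and the elided surrounding steps are assumptions standing in for the real workflow. Leftover networks from one job affected a later job, which suggests the jobs share a persistent (presumably self-hosted) runner rather than fresh VMs, so the label below reflects that guess.

```yaml
jobs:
  sync-lighthouse:
    runs-on: self-hosted # assumption; the real runner labels are not shown in this PR
    steps:
      # ... existing setup steps (elided) ...
      - name: Cleanup stale Docker and Kurtosis resources
        run: |
          # Remove all Kurtosis enclaves; "|| true" keeps a failure here from failing the step.
          kurtosis clean -a || true
          # -f skips the interactive confirmation prompt.
          docker network prune -f
          docker image prune -f
      # ... existing snapsync test steps (elided) run after the cleanup ...
  # sync-prysm receives the same cleanup step before its snapsync test.
```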

How to Test

The workflow runs on pull requests that modify daily_snapsync.yaml, so the Hoodi network test will run on this PR and exercise the new cleanup step.
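The trigger behaviour described above corresponds to a `pull_request` event filtered by `paths`. The sketch below is minimal and assumes the daily run uses a `schedule` trigger. The cron expression is a placeholder, not the workflow's real schedule.

```yaml
on:
  # Daily run; this cron time is a hypothetical placeholder.
  schedule:
    - cron: "0 0 * * *"
  # Also run on pull requests that modify this workflow file, so
  # workflow changes such as the cleanup step are exercised before merge.
  pull_request:
    paths:
      - ".github/workflows/daily_snapsync.yaml"
```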

The Sepolia snapsync jobs failed on 2026-02-05 due to Kurtosis being unable to allocate Docker network IPs (172.16.0.0/22 subnet exhausted). This adds a cleanup step before each snapsync test that removes stale Kurtosis enclaves and prunes unused Docker networks and images.
Copilot AI review requested due to automatic review settings February 5, 2026 14:58
@ilitteri ilitteri requested a review from a team as a code owner February 5, 2026 14:58
@github-actions github-actions Bot added the L1 Ethereum client label Feb 5, 2026

greptile-apps Bot commented Feb 5, 2026


Greptile Overview

Greptile Summary

Added cleanup step to both sync-lighthouse and sync-prysm jobs to prevent Docker network IP exhaustion. The cleanup runs before each snapsync test and removes stale Kurtosis enclaves, unused Docker networks, and dangling images.

Key Changes:

  • Cleanup step runs kurtosis clean -a with || true to handle failure gracefully
  • Prunes Docker networks and images with -f flag to avoid prompts
  • Identical cleanup logic added to both jobs for consistency

Minor Suggestions:

  • Consider adding Docker volume cleanup (docker volume prune -f) for more thorough resource cleanup
  • Adding logging could help diagnose future cleanup issues

Confidence Score: 4/5

  • Safe to merge with minimal risk - addresses IP exhaustion issue with standard cleanup commands
  • The change adds well-established Docker and Kurtosis cleanup commands that directly address the reported IP exhaustion issue. The || true on kurtosis clean ensures graceful failure handling. Commands are duplicated consistently across both jobs. Minor improvements suggested but not blocking.
  • No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| .github/workflows/daily_snapsync.yaml | Added cleanup step to remove stale Kurtosis enclaves and Docker resources before snapsync tests run |

Sequence Diagram

sequenceDiagram
    participant GHA as GitHub Actions
    participant Cleanup as Cleanup Step
    participant Kurtosis as Kurtosis
    participant Docker as Docker Engine
    participant SnapSync as Snapsync Test
    
    GHA->>Cleanup: Start cleanup before test
    Cleanup->>Kurtosis: kurtosis clean -a
    Note over Kurtosis: Remove all stale enclaves<br/>(fails gracefully with || true)
    Kurtosis-->>Cleanup: Enclaves cleaned
    
    Cleanup->>Docker: docker network prune -f
    Note over Docker: Remove unused networks<br/>(force flag avoids prompt)
    Docker-->>Cleanup: Networks pruned
    
    Cleanup->>Docker: docker image prune -f
    Note over Docker: Remove dangling images<br/>(force flag avoids prompt)
    Docker-->>Cleanup: Images pruned
    
    Cleanup->>GHA: Resources cleaned
    GHA->>SnapSync: Run snapsync test
    Note over SnapSync: Test runs with<br/>clean resources
    SnapSync->>Docker: Create new networks/containers
    SnapSync-->>GHA: Test completed

@greptile-apps greptile-apps Bot left a comment


1 file reviewed, 2 comments


        run: |
          kurtosis clean -a || true
          docker network prune -f
          docker image prune -f


Consider adding docker volume prune -f to also clean up unused volumes, which can accumulate from Kurtosis enclaves

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Path: .github/workflows/daily_snapsync.yaml, line 93
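If Greptile's volume suggestion were adopted, the step would gain one line, as sketched below. One caveat applies: since Docker Engine 23.0, `docker volume prune` removes only unused anonymous volumes by default, and `--all` is needed to include unused named volumes. Whether Kurtosis enclaves leave named or anonymous volumes behind is an assumption that would need checking on the runner.

```yaml
      - name: Cleanup stale Docker and Kurtosis resources
        run: |
          kurtosis clean -a || true
          docker network prune -f
          docker image prune -f
          # Removes unused anonymous volumes; on Docker Engine 23.0+,
          # add --all to also remove unused named volumes.
          docker volume prune -f
```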

Comment on lines +89 to +93
      - name: Cleanup stale Docker and Kurtosis resources
        run: |
          kurtosis clean -a || true
          docker network prune -f
          docker image prune -f


Consider adding error logging for the cleanup step to help diagnose if cleanup fails

Suggested change (replacing the step above):
      - name: Cleanup stale Docker and Kurtosis resources
        run: |
          echo "Starting cleanup of stale resources..."
          kurtosis clean -a || echo "Warning: kurtosis clean failed"
          docker network prune -f || echo "Error: docker network prune failed"
          docker image prune -f || echo "Error: docker image prune failed"
          echo "Cleanup completed"


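A variant of the logging suggestion could use GitHub Actions workflow commands (`::group::`, `::warning::`) so that failures surface as annotations on the run rather than only in plain log output. It assumes annotations are preferred over plain `echo`. The `|| ...` pattern matters because a `run` step without an explicit `shell` uses `bash -e` on Linux runners, so any failing command would otherwise abort the step.

```yaml
      - name: Cleanup stale Docker and Kurtosis resources
        run: |
          # Fold the cleanup output into a collapsible group in the job log.
          echo "::group::Cleanup stale resources"
          # On failure, emit a warning annotation instead of failing the step.
          kurtosis clean -a || echo "::warning::kurtosis clean failed"
          docker network prune -f || echo "::warning::docker network prune failed"
          docker image prune -f || echo "::warning::docker image prune failed"
          echo "::endgroup::"
```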

Copilot AI left a comment


Pull request overview

This PR adds cleanup steps to the daily snapsync workflow to prevent Docker network exhaustion issues that caused the 2026-02-05 Sepolia jobs to fail. The cleanup runs before each snapsync test to ensure clean resources are available.

Changes:

  • Added pre-test cleanup steps to both sync-lighthouse and sync-prysm jobs
  • Cleanup includes kurtosis enclave removal, Docker network pruning, and Docker image pruning


Comment on lines +89 to +93
      - name: Cleanup stale Docker and Kurtosis resources
        run: |
          kurtosis clean -a || true
          docker network prune -f
          docker image prune -f

Copilot AI Feb 5, 2026


The cleanup logic is duplicated identically in both the sync-lighthouse and sync-prysm jobs. Consider extracting this into a reusable composite action (similar to the existing actions in .github/actions/) to improve maintainability and ensure consistency. This would make it easier to update the cleanup logic in the future if additional cleanup steps are needed.

Copilot uses AI. Check for mistakes.
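The composite action Copilot suggests could look roughly like the sketch below. The directory name `.github/actions/cleanup-snapsync-resources` is hypothetical. The sketch also assumes the repository is checked out before a job references the action by local path.

```yaml
# .github/actions/cleanup-snapsync-resources/action.yml (hypothetical path)
name: Cleanup stale Docker and Kurtosis resources
description: Remove Kurtosis enclaves and prune unused Docker networks and dangling images
runs:
  using: composite
  steps:
    - name: Cleanup stale Docker and Kurtosis resources
      # Run steps inside a composite action must declare a shell explicitly.
      shell: bash
      run: |
        kurtosis clean -a || true
        docker network prune -f
        docker image prune -f
```

With such an action, each of the sync-lighthouse and sync-prysm jobs would replace its inline step with `uses: ./.github/actions/cleanup-snapsync-resources`.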
Comment on lines +91 to +93 (.github/workflows/daily_snapsync.yaml, outdated)
          kurtosis clean -a || true
          docker network prune -f
          docker image prune -f

Copilot AI Feb 5, 2026


Consider adding docker container prune -f to clean up stopped containers as well. While docker image prune only removes dangling images and docker network prune removes unused networks, stopped containers can also consume resources and may contribute to resource exhaustion issues. This is optional but could provide more thorough cleanup.

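If Copilot's container-pruning suggestion were adopted, a reasonable ordering would prune stopped containers first. That way, the network and image prunes that follow no longer see those containers as users of the resources they reference. For instance, `docker image prune` skips images still referenced by a stopped container. The sketch below assumes no stopped container on the runner needs to be kept.

```yaml
      - name: Cleanup stale Docker and Kurtosis resources
        run: |
          kurtosis clean -a || true
          # Remove stopped containers first so they no longer reference
          # the networks and images pruned below.
          docker container prune -f
          docker network prune -f
          docker image prune -f
```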
kurtosis clean -a removes enclaves, but the engine still holds onto network allocations. Stopping the engine releases the internal network pool, allowing docker network prune to actually free the IPs.
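Based on this commit message, the cleanup step presumably stops the Kurtosis engine between `kurtosis clean -a` and the Docker prunes. The merged diff is not shown here, so the sketch below is an approximation. It assumes that ordering and uses the `kurtosis engine stop` command.

```yaml
      - name: Cleanup stale Docker and Kurtosis resources
        run: |
          # Remove all enclaves, including running ones.
          kurtosis clean -a || true
          # Stop the engine so it releases its network allocations;
          # tolerate failure, e.g. if no engine is running.
          kurtosis engine stop || true
          # With the engine stopped, pruning can actually free the subnet IPs.
          docker network prune -f
          docker image prune -f
```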
@github-project-automation github-project-automation Bot moved this to In Review in ethrex_l1 Feb 10, 2026
@ilitteri ilitteri added this pull request to the merge queue Feb 18, 2026
@ilitteri ilitteri removed this pull request from the merge queue due to a manual request Feb 18, 2026
@ilitteri ilitteri enabled auto-merge February 18, 2026 14:47
@ilitteri ilitteri added this pull request to the merge queue Feb 18, 2026
Merged via the queue into main with commit a92e3e5 Feb 18, 2026
51 checks passed
@ilitteri ilitteri deleted the ci/snapsync-cleanup-step branch February 18, 2026 16:38
@github-project-automation github-project-automation Bot moved this from In Review to Done in ethrex_l1 Feb 18, 2026

Labels: L1 Ethereum client
Projects: ethrex_l1 (Status: Done)
Participants: 6