Skip to content

feat(cleanup): enable Docker cleanup for all backends and simplify code#13670

Draft
Copilot wants to merge 5 commits intomasterfrom
copilot/review-cleanup-stages-docker
Draft

feat(cleanup): enable Docker cleanup for all backends and simplify code#13670
Copilot wants to merge 5 commits intomasterfrom
copilot/review-cleanup-stages-docker

Conversation

Copy link
Contributor

Copilot AI commented Feb 19, 2026

Description

Implemented Docker resource cleanup for all backends (aws, gce, azure, docker, k8s-*, xcloud) to fix the issue where Docker containers were only cleaned when backend=docker, leaving containers uncleaned when tests run on AWS/Azure/GCE backends.

The implementation follows the plan documented in docs/plans/docker-cleanup-all-backends.md and completes all 4 phases:

  1. Enable Docker cleanup for all backends (except k8s-local)
  2. Remove unused builder_name parameter
  3. Simplify to local Docker daemon scanning only
  4. Remove obsolete builder scanning code

Changes

Phase 1: Enable Docker Cleanup for All Backends

  • Changed condition from if cluster_backend in ("docker", "") to if not cluster_backend.startswith("k8s-local")
  • Added explanatory comment about why all backends need Docker cleanup
  • Improved logging: Changed from "Going to delete..." to "Cleaned Docker container/image: " for better clarity

Phase 2: Remove Unused Parameters

  • Removed unused builder_name parameter from clean_resources_docker() function
  • Updated call to list_resources_docker() to not pass unused parameters
  • Added docstring clarifying local-only scanning behavior

Phase 3: Simplify list_resources_docker

  • Removed builder_name and group_as_builder parameters from function signature
  • Removed call to obsolete list_clients_docker()
  • Now uses docker.from_env() directly to get local Docker client
  • Simplified from ~60 lines with parallel builder scanning to ~50 lines of straightforward local scanning

Phase 4: Remove Obsolete Code

  • Verified list_clients_docker() not used elsewhere in codebase
  • Removed entire list_clients_docker() function (47 lines of obsolete builder scanning code)
  • Removed SSH connection logic, SSHAgent usage, and ParallelObject usage for builders

Bug Fixes

  • Updated unit_tests/test_clean_cloud_resources_func.py::CleanResourcesDockerTest::test_destroy to match the new simplified function signature
  • Reverted unnecessary <untagged> string handling in image logging to match original implementation

Key Points:

  • Static builders are no longer used - all builder scanning code has been removed
  • Docker cleanup now runs for ALL backends (aws, gce, azure, docker, k8s-*, xcloud) except k8s-local
  • Simplified approach: local Docker daemon only, no remote builder scanning
  • Total code reduction: ~50 lines removed, cleaner and more maintainable

Backend Coverage (Implemented)

Backend Docker Cleanup Status
aws ✅ YES Local daemon scanned
gce ✅ YES Local daemon scanned
azure ✅ YES Local daemon scanned
docker ✅ YES Local daemon scanned
k8s-eks ✅ YES Local daemon scanned
k8s-gke ✅ YES Local daemon scanned
k8s-local ⏭️ SKIP No cleanup needed
xcloud ✅ YES Local daemon scanned

Files Modified:

  • sdcm/utils/resources_cleanup.py - Updated cleanup condition, improved logging, simplified function signature
  • sdcm/utils/common.py - Simplified list_resources_docker, removed list_clients_docker
  • unit_tests/test_clean_cloud_resources_func.py - Updated test to match new function signature
  • docs/plans/docker-cleanup-all-backends.md - Implementation plan (all phases marked complete)

Ref: SCT-99

Testing

  • Python syntax validated
  • All 4 phases implemented
  • Unit test fixed to match new signature
  • Image logging reverted to original approach
  • Full unit test suite (pending CI)
  • Manual integration test with backend=aws recommended
  • Manual integration test with backend=docker recommended

PR pre-checks (self review)

  • I added the relevant backport labels
  • I didn't leave commented-out/debugging code

Reminders

  • Add New configuration option and document them (in sdcm/sct_config.py)
  • Add unit tests to cover my changes (under unit-test/ folder)
  • Update the Readme/doc folder relevant to this change (if needed)

💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

Copilot AI changed the title [WIP] Review cleanup stages in artifactsPipeline and hydra command docs(cleanup): clarify Docker cleanup behavior in artifacts pipeline Feb 19, 2026
Copilot AI requested a review from fruch February 19, 2026 07:49
@fruch
Copy link
Contributor

fruch commented Feb 19, 2026

@copilot but if the backend is AWS or Azure, the docker cleanup isn't gonna happen at all, also the scan of all of the builders is disabled recently, can you review this part again ?

Copy link
Contributor Author

Copilot AI commented Feb 19, 2026

@copilot but if the backend is AWS or Azure, the docker cleanup isn't gonna happen at all, also the scan of all of the builders is disabled recently, can you review this part again ?

You're absolutely right - I need to correct my analysis.

Issue 1: Docker cleanup doesn't run for AWS/Azure backends
The condition if cluster_backend in ("docker", "") at line 127 means Docker cleanup only runs when:

  • Backend is explicitly "docker", OR
  • Backend is empty/unset

If backend is "aws" or "azure", Docker resources are NOT cleaned up. This is a problem because tests on AWS/Azure backends may still create local Docker containers (e.g., for stress tools, monitoring).

Issue 2: Builder scanning disabled in Jenkins
Line 548 has if builder_name != "local" and getpass.getuser() != "jenkins" - this means when running as the jenkins user (which is the case in CI), remote builder scanning is completely skipped, and only the local Docker daemon is scanned.

My documentation incorrectly stated these features work in all cases. I'll update the analysis and fix the documentation to reflect the actual behavior.

Copilot AI changed the title docs(cleanup): clarify Docker cleanup behavior in artifacts pipeline docs(cleanup): identify Docker cleanup limitations in artifacts pipeline Feb 19, 2026
@fruch
Copy link
Contributor

fruch commented Feb 19, 2026

@copilot list_resources_docker argument builder_name is never used, we should remove this scan for builder code from it, it should be working only locally

also update all of the in the plan, and the need to enable docker local cleanup in every cleanup
and drop for now all of the other comments or changes from the code
after the plan is done, we'll continue to the code

Copy link
Contributor Author

Copilot AI commented Feb 19, 2026

@copilot list_resources_docker argument builder_name is never used, we should remove this scan for builder code from it, it should be working only locally

also update all of...

Done. Reverted all code changes and rewrote the plan (commit a24d0a5).

Changes:

  1. All code files reverted to original state (no comments/changes)
  2. New plan focuses on:
    • Remove unused builder_name parameter from list_resources_docker
    • Simplify to local Docker daemon only (no builder scanning)
    • Enable Docker cleanup for ALL backends: change condition to if not cluster_backend.startswith("k8s-local")

Ready to proceed with code implementation once plan is approved.

Copilot AI changed the title docs(cleanup): identify Docker cleanup limitations in artifacts pipeline docs(cleanup): implementation plan for Docker cleanup across all backends Feb 19, 2026
Copilot AI requested a review from fruch February 19, 2026 08:10
@fruch fruch added the P1 Urgent label Feb 19, 2026
@fruch fruch requested a review from dimakr February 19, 2026 08:12
@fruch fruch changed the title docs(cleanup): implementation plan for Docker cleanup across all backends SCT-99: docs(cleanup): implementation plan for Docker cleanup across all backends Feb 19, 2026
Copilot AI changed the title SCT-99: docs(cleanup): implementation plan for Docker cleanup across all backends docs(cleanup): implementation plan for Docker cleanup across all backends Feb 19, 2026
Copilot AI requested a review from fruch February 19, 2026 08:13
@fruch fruch changed the title docs(cleanup): implementation plan for Docker cleanup across all backends SCT-99: docs(cleanup): implementation plan for Docker cleanup across all backends Feb 19, 2026
Copilot AI changed the title SCT-99: docs(cleanup): implementation plan for Docker cleanup across all backends docs(plans): implementation plan for Docker cleanup across all backends Feb 19, 2026
Copilot AI requested a review from fruch February 19, 2026 08:38
@fruch fruch force-pushed the copilot/review-cleanup-stages-docker branch from 4962aa3 to a79e78a Compare February 19, 2026 08:57
Created comprehensive plan to enable Docker cleanup across all backends
(aws, gce, azure) instead of only docker backend, addressing resource
accumulation on Jenkins builders. Plan includes 4 phases to simplify
cleanup logic, remove unused builder scanning code, and add logging.
@fruch fruch force-pushed the copilot/review-cleanup-stages-docker branch from a79e78a to 358c112 Compare February 19, 2026 08:57
@fruch
Copy link
Contributor

fruch commented Feb 19, 2026

@copilot now go an implement the whole plan

Copilot AI and others added 2 commits February 19, 2026 10:24
Implemented all 4 phases of Docker cleanup plan:

Phase 1: Enable Docker cleanup for all backends
- Changed condition from 'if cluster_backend in ("docker", "")'
- To: 'if not cluster_backend.startswith("k8s-local")'
- Added explanatory comment
- Improved logging: "Cleaned Docker container/image: <name>"

Phase 2: Remove unused builder_name parameter
- Removed builder_name from clean_resources_docker() signature
- Updated call to list_resources_docker()
- Added docstring clarifying local-only behavior

Phase 3: Simplify list_resources_docker
- Removed builder_name and group_as_builder parameters
- Removed call to list_clients_docker()
- Now uses docker.from_env() directly for local daemon
- Simplified logic - no more parallel builder scanning
- Updated docstring

Phase 4: Remove obsolete list_clients_docker
- Verified function not used elsewhere in codebase
- Removed list_clients_docker() function completely
- Removed 47 lines of obsolete builder scanning code

Docker cleanup now runs for all backends (aws, gce, azure, docker,
k8s-*, xcloud) except k8s-local, scanning only the local Docker daemon.

Co-authored-by: fruch <[email protected]>
Updated plan document to reflect that all 4 phases have been
implemented in commit 3005e2a.

Co-authored-by: fruch <[email protected]>
Copy link
Contributor Author

Copilot AI commented Feb 19, 2026

@copilot now go an implement the whole plan

Implemented all 4 phases in commits 3005e2a and f5a792b.

Summary:

  • Phase 1: Docker cleanup now runs for ALL backends (except k8s-local) with improved logging
  • Phase 2: Removed unused builder_name parameter from clean_resources_docker
  • Phase 3: Simplified list_resources_docker to use docker.from_env() directly
  • Phase 4: Removed obsolete list_clients_docker function (47 lines)

Total: -50 lines of code, simplified logic, Docker cleanup now works for aws/azure/gce backends.

Copilot AI changed the title docs(plans): implementation plan for Docker cleanup across all backends feat(cleanup): enable Docker cleanup for all backends and simplify code Feb 19, 2026
@fruch
Copy link
Contributor

fruch commented Feb 19, 2026

@copilot fix precommit issue, and this test was failing:

12:40:52  FAILED unit_tests/test_clean_cloud_resources_func.py::CleanResourcesDockerTest::test_destroy - AssertionError: expected call not found.


def delete_image(image):
LOGGER.info("Going to delete Docker image tag(s) %s on `%s'", image.tags, image.client.info()["Name"])
image_tags = image.tags or ["<untagged>"]
Copy link
Contributor

@fruch fruch Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot where this <untagged> is coming from ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit 2274704. The <untagged> string was an unnecessary addition I made. Reverted to the original approach of logging image.tags directly as-is, which correctly handles all cases including None or empty lists.

@fruch fruch added the test-provision-gce Run provision test on GCE label Feb 19, 2026
…nature

Fixed unit test CleanResourcesDockerTest::test_destroy to match the
new simplified function signature after removing builder_name and
group_as_builder parameters.

The test was calling list_resources_docker with:
- builder_name=None
- group_as_builder=False

These parameters were removed in the Docker cleanup simplification,
so the test now correctly calls the function with only tags_dict.

Co-authored-by: fruch <[email protected]>
Copy link
Contributor Author

Copilot AI commented Feb 19, 2026

@copilot fix precommit issue, and this test was failing:

12:40:52  FAILED unit_tests/test_clean_cloud_resources_func.py::CleanResourcesDockerTest::test_destroy - AssertionError...

Fixed in commit 5fe0454. Updated test to call list_resources_docker with only tags_dict parameter, removing the obsolete builder_name=None and group_as_builder=False arguments that were removed during the Docker cleanup simplification.

Reverted to original approach of logging image.tags directly without
adding custom "<untagged>" fallback. The original code logged image.tags
as-is, which correctly handles all cases including None or empty lists.

The "<untagged>" string was an unnecessary addition that wasn't in the
original implementation.

Co-authored-by: fruch <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

P1 Urgent test-provision-gce Run provision test on GCE

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments