
fix(BA-3250): Download directory not removed after moving files to archive directory#7115

Merged
HyeockJinKim merged 8 commits into main from fix/BA-3250 on Dec 9, 2025

Conversation

@jopemachine

@jopemachine jopemachine commented Dec 4, 2025

Member

resolves #7114 (BA-3250)

Checklist: (if applicable)

  • Mention the original issue

@github-actions github-actions Bot added the size:L 100~500 LoC label Dec 4, 2025
@jopemachine jopemachine changed the title fix: Download directory not removed after moving files to archive directory fix(BA-3250): Download directory not removed after moving files to archive directory Dec 4, 2025
@github-actions github-actions Bot added the comp:storage-proxy Related to Storage proxy component label Dec 4, 2025
@jopemachine jopemachine added this to the 25.18 milestone Dec 4, 2025

Copilot AI left a comment

Contributor


Pull request overview

This PR resolves BA-3250 by changing the artifact import process from copying files to moving them, ensuring download directories are properly removed after files are transferred to the archive directory.

Key changes:

  • Refactored cleanup logic by introducing a default cleanup_stage implementation in the base ImportStep class that relies on abstract methods registry_type and stage_storage
  • Modified storage transfer operations from copy to move for VFS-to-VFS transfers, with automatic cleanup of empty parent directories
  • Optimized archive transfer to move entire model directories at once instead of transferring files individually
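A minimal sketch of the cleanup refactor described above: the base class owns a concrete `cleanup_stage`, and subclasses only supply `registry_type` and `stage_storage`. It assumes a staging layout of `<base_path>/<registry_type>/<model_id>`; `StorageSketch`, `ContextSketch`, and the step classes are hypothetical stand-ins, not the actual Backend.AI types (Python 3.9+ for `asyncio.to_thread`).

```python
import asyncio
import shutil
import tempfile
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class StorageSketch:
    """Hypothetical stand-in for a storage backend rooted at base_path."""
    base_path: Path


@dataclass
class ContextSketch:
    """Hypothetical stand-in for ImportStepContext."""
    model_id: str
    storages: dict[str, StorageSketch] = field(default_factory=dict)


class ImportStepSketch(ABC):
    @abstractmethod
    def registry_type(self) -> str:
        """Registry name used as the top-level staging directory (assumed layout)."""

    @abstractmethod
    def stage_storage(self, context: ContextSketch) -> StorageSketch:
        """Storage that holds the files this step leaves behind."""

    async def cleanup_stage(self, context: ContextSketch) -> None:
        """Concrete cleanup shared by every step; subclasses may still override it."""
        storage = self.stage_storage(context)
        target = storage.base_path / self.registry_type() / context.model_id
        # rmtree blocks, so run it in a worker thread instead of on the event loop.
        await asyncio.to_thread(shutil.rmtree, target, ignore_errors=True)


class HuggingFaceDownloadSketch(ImportStepSketch):
    def registry_type(self) -> str:
        return "huggingface"

    def stage_storage(self, context: ContextSketch) -> StorageSketch:
        return context.storages["download"]


def demo() -> tuple[bool, bool]:
    """Returns (model dir still exists?, registry dir still exists?)."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        model_dir = base / "huggingface" / "org-model"
        model_dir.mkdir(parents=True)
        (model_dir / "weights.bin").write_bytes(b"\0")
        ctx = ContextSketch("org-model", {"download": StorageSketch(base)})
        asyncio.run(HuggingFaceDownloadSketch().cleanup_stage(ctx))
        return model_dir.exists(), (base / "huggingface").exists()
```

`demo()` removes the model directory but leaves the now-empty `huggingface` directory, which is the gap the empty-parent cleanup in this PR addresses.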

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.

File | Description
src/ai/backend/storage/services/artifacts/types.py | Added abstract methods registry_type and stage_storage; provided a default cleanup_stage implementation
src/ai/backend/storage/services/artifacts/storage_transfer.py | Changed copy operations to move operations; added _move_vfs_directory and _cleanup_empty_parents methods for VFS transfers
src/ai/backend/storage/services/artifacts/reservoir.py | Implemented the new abstract methods; removed the custom cleanup implementation
src/ai/backend/storage/services/artifacts/huggingface.py | Implemented the new abstract methods; removed the custom cleanup implementation
src/ai/backend/storage/services/artifacts/common.py | Implemented stage_storage for the verify/archive steps; removed custom cleanup; changed to transfer entire directories; fixed a typo in a variable name
changes/7115.fix.md | Added a changelog entry for the fix
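Transferring an entire model directory in one call, rather than file by file, can be sketched with `shutil.move`. This assumes both storages are plain local directories and a `<base>/<registry>/<model>` layout; `move_model_directory` is a hypothetical helper, not the PR's `_move_vfs_directory`.

```python
import shutil
import tempfile
from pathlib import Path


def move_model_directory(src_base: Path, dst_base: Path, rel_dir: Path) -> Path:
    """Move one model directory from src_base to dst_base in a single call."""
    src = src_base / rel_dir
    dst = dst_base / rel_dir
    # shutil.move() puts src *inside* dst if dst is an existing directory,
    # so refuse instead of silently producing a nested copy.
    if dst.exists():
        raise FileExistsError(f"destination already exists: {dst}")
    dst.parent.mkdir(parents=True, exist_ok=True)
    # Same filesystem: a rename. Different filesystems: copy, then delete src.
    shutil.move(str(src), str(dst))
    return dst


def demo() -> tuple[int, bool, bool]:
    """Returns (files moved, model dir left in source?, registry dir left in source?)."""
    with tempfile.TemporaryDirectory() as src_tmp, tempfile.TemporaryDirectory() as dst_tmp:
        src_base, dst_base = Path(src_tmp), Path(dst_tmp)
        rel = Path("huggingface") / "org-model"
        (src_base / rel / "sub").mkdir(parents=True)
        (src_base / rel / "config.json").write_text("{}")
        (src_base / rel / "sub" / "weights.bin").write_bytes(b"\0")
        dst = move_model_directory(src_base, dst_base, rel)
        moved_files = sum(1 for p in dst.rglob("*") if p.is_file())
        # The model directory is gone, but its now-empty parent stays behind.
        return moved_files, (src_base / rel).exists(), (src_base / "huggingface").exists()
```

The leftover empty `huggingface` directory in the source storage is why a directory move still needs a separate empty-parent cleanup pass.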


        pass

    async def cleanup_stage(self, context: ImportStepContext) -> None:
        """Default cleanup implementation that removes files"""

Copilot AI Dec 5, 2025


The docstring says "Default cleanup implementation", but the method is no longer abstract: it is a concrete implementation that relies on the abstract methods registry_type and stage_storage, which subclasses must implement. Consider rewording the docstring to state that this is a concrete implementation that subclasses may override if needed, since "default" might suggest that it does nothing.

Suggested change
-        """Default cleanup implementation that removes files"""
+        """
+        Concrete cleanup implementation that removes files for this step.
+        This method relies on the abstract methods `registry_type` and `stage_storage`,
+        which must be implemented by subclasses. Subclasses may override this method
+        to customize cleanup behavior if needed.
+        """

Comment on lines +180 to +187
        self._cleanup_empty_parents(source_path.parent, source_storage.base_path)

    def _cleanup_empty_parents(self, path: Path, base_path: Path) -> None:
        """Remove empty parent directories up to base_path."""
        current = path
        while current != base_path and current.exists():
            try:
                current.rmdir() # Only removes if empty

Copilot AI Dec 5, 2025


The _cleanup_empty_parents method uses synchronous rmdir() and file system operations without running them in an executor. While this might be acceptable for lightweight operations like checking if a directory is empty and removing it, it could block the event loop if the filesystem is slow (e.g., network-mounted storage). Consider wrapping the filesystem operations in asyncio.get_event_loop().run_in_executor(None, ...) for consistency with other filesystem operations in this file.

Suggested change
-        self._cleanup_empty_parents(source_path.parent, source_storage.base_path)
-
-    def _cleanup_empty_parents(self, path: Path, base_path: Path) -> None:
-        """Remove empty parent directories up to base_path."""
-        current = path
-        while current != base_path and current.exists():
-            try:
-                current.rmdir() # Only removes if empty
+        await self._cleanup_empty_parents(source_path.parent, source_storage.base_path)
+
+    async def _cleanup_empty_parents(self, path: Path, base_path: Path) -> None:
+        """Remove empty parent directories up to base_path."""
+        loop = asyncio.get_event_loop()
+        current = path
+        while current != base_path and await loop.run_in_executor(None, current.exists):
+            try:
+                await loop.run_in_executor(None, current.rmdir) # Only removes if empty
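A variant of the suggestion above, sketched under assumptions: instead of awaiting one executor call per `exists()`/`rmdir()`, keep the walk synchronous and offload it once with `asyncio.to_thread` (Python 3.9+), which also runs the function in the loop's default executor. If per-call `run_in_executor` is kept, `asyncio.get_running_loop()` is the call the asyncio docs prefer inside coroutines. Function names here are hypothetical.

```python
import asyncio
import tempfile
from pathlib import Path


def remove_empty_parents(path: Path, base_path: Path) -> list[Path]:
    """Remove empty directories from path upward, stopping before base_path.

    Returns the directories that were removed, deepest first.
    """
    removed: list[Path] = []
    current = path
    while current != base_path and current.exists():
        try:
            current.rmdir()  # raises OSError if the directory is not empty
        except OSError:
            break
        removed.append(current)
        current = current.parent
    return removed


async def cleanup_empty_parents(path: Path, base_path: Path) -> list[Path]:
    # One worker-thread hop for the whole walk, rather than one per syscall.
    return await asyncio.to_thread(remove_empty_parents, path, base_path)


def demo() -> tuple[int, bool]:
    """Returns (directories removed, shared parent still exists?)."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        empty_leaf = base / "reservoir" / "model-a" / "rev"
        empty_leaf.mkdir(parents=True)
        # A sibling model with a file keeps "reservoir" itself from being removed.
        kept = base / "reservoir" / "model-b"
        kept.mkdir()
        (kept / "weights.bin").write_bytes(b"\0")
        removed = asyncio.run(cleanup_empty_parents(empty_leaf, base))
        return len(removed), (base / "reservoir").exists()
```

Batching the walk into one thread call keeps the event loop free on slow (for example network-mounted) storage without paying a thread handoff per directory.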

Comment on lines +182 to +190
    def _cleanup_empty_parents(self, path: Path, base_path: Path) -> None:
        """Remove empty parent directories up to base_path."""
        current = path
        while current != base_path and current.exists():
            try:
                current.rmdir() # Only removes if empty
                current = current.parent
            except OSError:
                break # Not empty or other error

Copilot AI Dec 5, 2025


[nitpick] In the directory cleanup loop, if current == base_path initially (unlikely, but possible when path and base_path are the same), the while condition is false from the start, so the loop body never runs and nothing is logged. This is likely a safe guard, but consider adding a check or log statement for this edge case to aid debugging.
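One way to act on this nitpick is a small pre-check that logs why the cleanup loop would not run. This is a hypothetical helper; the logger name is an assumption and the real service may route logs differently.

```python
import logging
from pathlib import Path

log = logging.getLogger("artifact_cleanup_sketch")


def should_clean_parents(path: Path, base_path: Path) -> bool:
    """Pre-check for the empty-parent cleanup loop that logs skipped cases."""
    if path == base_path:
        log.debug("Skipping empty-parent cleanup: %s is the storage base path", path)
        return False
    if not path.exists():
        log.debug("Skipping empty-parent cleanup: %s does not exist", path)
        return False
    return True
```

A caller would return early from `_cleanup_empty_parents` when this returns `False`, so the silent no-op case leaves a trace at debug level.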

Comment thread changes/7115.fix.md
@@ -0,0 +1 @@
Move the entire directory after artifact import instead of copying files individually, and remove the model directory if it is empty
\ No newline at end of file

Copilot AI Dec 5, 2025


The changelog filename is 7115.fix.md but the PR description mentions issue #7114 (BA-3250). There's a mismatch between the changelog filename (7115) and the referenced issue number (7114). Ensure the changelog filename matches the correct issue number.

Comment thread src/ai/backend/storage/services/artifacts/storage_transfer.py
        # Move file or directory (automatically removes source)
        await asyncio.get_event_loop().run_in_executor(
            None, shutil.move, str(source_full_path), str(dest_full_path)
        )

Copilot AI Dec 5, 2025


The _move_vfs_to_vfs method (used for individual file transfers) does not include cleanup of empty parent directories, while _move_vfs_directory does (line 180). This inconsistency means that when moving individual files between VFS storages, empty parent directories may be left behind. Consider adding the same cleanup logic here for consistency, or document why it's not needed for individual file moves.

Suggested change
-        )
+        )
+        # Cleanup empty artifact directories after moving a file
+        self._cleanup_empty_parents(source_full_path.parent, source_storage.base_path)
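A sketch of the consistency fix suggested above: move a single file, then prune its now-empty source parents, doing both in the same worker thread so the cleanup does not block the event loop either. It assumes plain local root directories in place of the VFS storage objects; `move_file_and_cleanup` and `_remove_empty_parents` are hypothetical names.

```python
import asyncio
import shutil
import tempfile
from pathlib import Path


def _remove_empty_parents(path: Path, base_path: Path) -> None:
    current = path
    while current != base_path and current.exists():
        try:
            current.rmdir()  # only succeeds on an empty directory
        except OSError:
            break
        current = current.parent


async def move_file_and_cleanup(src_base: Path, dst_base: Path, rel_path: Path) -> Path:
    """Move one file between two storage roots, then prune empty source parents."""
    src = src_base / rel_path
    dst = dst_base / rel_path

    def _work() -> None:
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst))
        _remove_empty_parents(src.parent, src_base)

    await asyncio.to_thread(_work)
    return dst


def demo() -> tuple[bool, bool, bool]:
    """Returns (file at destination?, source registry dir left?, source root left?)."""
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        src_base, dst_base = Path(a), Path(b)
        rel = Path("huggingface") / "org-model" / "config.json"
        (src_base / rel).parent.mkdir(parents=True)
        (src_base / rel).write_text("{}")
        dst = asyncio.run(move_file_and_cleanup(src_base, dst_base, rel))
        return dst.is_file(), (src_base / "huggingface").exists(), src_base.exists()
```

The walk stops at the storage root, so `src_base` itself is never removed even when the whole tree under it becomes empty.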

    def _cleanup_empty_parents(self, path: Path, base_path: Path) -> None:
        """Remove empty parent directories up to base_path."""
        current = path
        while current != base_path and current.exists():

Copilot AI Dec 5, 2025


Path comparison using current != base_path (line 185) may not work as expected if the paths are not normalized or if they differ in representation (e.g., relative vs absolute, symlinks). Consider using current.resolve() != base_path.resolve() or not current.is_relative_to(base_path) for more robust path comparison, or check if current is a parent of or equal to base_path using path methods.

Suggested change
-        while current != base_path and current.exists():
+        while current.resolve() != base_path.resolve() and current.exists():
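Why the plain comparison is fragile: `pathlib` compares paths by their components and does not collapse `..`, so `Path("/data/vfs/../vfs") != Path("/data/vfs")` is true. A hypothetical stop condition that combines both of Copilot's ideas (resolve, then `is_relative_to`, Python 3.9+) also ends the walk if it ever leaves `base_path`, instead of relying on `rmdir()` failing further up the tree.

```python
from pathlib import Path


def reached_base(current: Path, base_path: Path) -> bool:
    """True once the upward walk is at base_path or has left it."""
    current, base_path = current.resolve(), base_path.resolve()
    return current == base_path or not current.is_relative_to(base_path)
```

In the loop this would read `while not reached_base(current, base_path) and current.exists():`. Resolving `base_path` once before the loop would avoid repeating that work on every iteration.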

Comment thread src/ai/backend/storage/services/artifacts/storage_transfer.py
Comment on lines +722 to +732
    @override
    def stage_storage(self, context: ImportStepContext) -> AbstractStorage:
        download_storage_name = context.storage_step_mappings.get(
            ArtifactStorageImportStep.DOWNLOAD
        )
        if not download_storage_name:
            raise StorageStepRequiredStepNotProvided(
                "No storage mapping provided for DOWNLOAD step cleanup"
            )

        return context.storage_pool.get_storage(download_storage_name)

Collaborator


Overall, this looks like a repeated pattern, so it would be more natural for each step to just return which step it is and handle the rest in the base class.
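One reading of this suggestion, sketched with hypothetical names: each step only reports which import step's storage holds its files, and the base class does the mapping lookup and raises the error once. `StepKind` and `StepStorageMissing` loosely mirror `ArtifactStorageImportStep` and `StorageStepRequiredStepNotProvided`, and a plain dict stands in for `context.storage_pool.get_storage()`.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class StepKind(Enum):
    """Hypothetical mirror of ArtifactStorageImportStep."""
    DOWNLOAD = "download"
    VERIFY = "verify"
    ARCHIVE = "archive"


class StepStorageMissing(Exception):
    """Hypothetical mirror of StorageStepRequiredStepNotProvided."""


@dataclass
class StepContext:
    storage_step_mappings: dict[StepKind, str] = field(default_factory=dict)
    storage_pool: dict[str, Any] = field(default_factory=dict)


class StagedStep(ABC):
    @abstractmethod
    def staged_in(self) -> StepKind:
        """The only per-step fact: which step's storage holds this step's files."""

    def stage_storage(self, context: StepContext) -> Any:
        # Shared lookup and error handling, written once in the base class.
        step = self.staged_in()
        storage_name = context.storage_step_mappings.get(step)
        if not storage_name:
            raise StepStorageMissing(
                f"No storage mapping provided for {step.name} step cleanup"
            )
        return context.storage_pool[storage_name]


class DownloadStagedStep(StagedStep):
    def staged_in(self) -> StepKind:
        return StepKind.DOWNLOAD
```

With this shape, each subclass override shrinks to a single `return`, and the error message stays consistent across steps.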

@HyeockJinKim HyeockJinKim added this pull request to the merge queue Dec 9, 2025
Merged via the queue into main with commit 69c7777 Dec 9, 2025
28 checks passed
@HyeockJinKim HyeockJinKim deleted the fix/BA-3250 branch December 9, 2025 05:07
jopemachine added a commit that referenced this pull request Dec 15, 2025

Labels

comp:storage-proxy Related to Storage proxy component size:L 100~500 LoC

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Download directory not removed after moving files to archive directory

3 participants