
@Mahalaxmibejugam
Contributor

@Mahalaxmibejugam Mahalaxmibejugam commented Jan 12, 2026

This PR introduces an override for the rm method in ExtendedGcsFileSystem to provide native support for deleting files and explicit folder objects within Hierarchical Namespace (HNS) enabled GCS buckets.

  • The existing rm implementation in the base GCSFileSystem is designed to operate on file objects. In a non-HNS context, pseudo-directories (object prefixes) implicitly disappear only when all files under that prefix are deleted.
  • This override extends the rm functionality for HNS buckets. It reuses the core file deletion logic from the parent class and adds the capability to explicitly delete folder objects. This makes directory removal a first-class operation, ensuring that both files and the folder entities themselves are correctly removed.
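The difference between implied prefixes and explicit folder objects can be illustrated with a toy in-memory model. This is only a sketch: `FlatBucket`, `HnsBucket`, `dirs`, and `rm_files` are hypothetical names invented for illustration and are not part of gcsfs or this PR.

```python
class FlatBucket:
    """Toy model of a non-HNS bucket: directories are only implied by object names."""

    def __init__(self, objects):
        self.objects = set(objects)

    def dirs(self):
        # Every leading path segment of an object name is an implied pseudo-directory.
        found = set()
        for name in self.objects:
            parts = name.split("/")[:-1]
            for i in range(1, len(parts) + 1):
                found.add("/".join(parts[:i]))
        return found

    def rm_files(self, prefix):
        # Delete only file objects under the prefix (hypothetical helper).
        self.objects = {o for o in self.objects if not o.startswith(prefix + "/")}


class HnsBucket(FlatBucket):
    """Toy model of an HNS bucket: folders are stored as entities of their own."""

    def __init__(self, objects, folders):
        super().__init__(objects)
        self.folders = set(folders)

    def dirs(self):
        # Folders exist independently of the files they contain.
        return set(self.folders)


flat = FlatBucket(["data/a.txt", "data/sub/b.txt"])
flat.rm_files("data")
print(flat.dirs())  # the implied directories are gone once the files are gone

hns = HnsBucket(["data/a.txt", "data/sub/b.txt"], ["data", "data/sub"])
hns.rm_files("data")
print(sorted(hns.dirs()))  # the folder objects remain until deleted explicitly
```

In this model, deleting files is enough to make directories vanish from the flat bucket, while the HNS bucket still needs a separate folder-deletion step.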

Recursive Directory Deletion: When recursive=True is specified for a directory, the implementation first deletes all file objects within the directory structure in batches. It then proceeds to delete the now-empty folder objects, starting from the deepest sub-folders and working its way up.
Non-Recursive Directory Deletion: If rm is called on a non-empty directory without recursive=True, it raises an OSError, mimicking standard filesystem behavior and preventing accidental data loss.
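The two rules above can be sketched in plain Python. The helper names `order_folders_for_deletion` and `check_non_recursive_rm` are hypothetical, and the use of `errno.ENOTEMPTY` is an assumption; the PR description only says that an OSError is raised.

```python
import errno


def order_folders_for_deletion(folders):
    """Return folder paths sorted so that the deepest paths come first."""
    return sorted(
        (f.rstrip("/") for f in folders),
        key=lambda p: p.count("/"),  # depth = number of path separators
        reverse=True,
    )


def check_non_recursive_rm(path, children, recursive):
    """Raise OSError when a non-empty directory is removed without recursive=True."""
    if children and not recursive:
        # Assumption: ENOTEMPTY is used here only to mirror POSIX rmdir semantics.
        raise OSError(errno.ENOTEMPTY, "Directory not empty", path)


print(order_folders_for_deletion(["b/a", "b/a/x/y", "b/a/x"]))
```

Deleting in this order means every folder is already empty by the time it is removed, so no parent is ever deleted before its children.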

Cache Invalidation:

  1. File Deletion: Invalidates the cache for the direct parent and all its ancestors (the rm_file cache invalidation logic).
  2. Recursive Deletion: Updates the cache for the parent of the deleted directory, ensuring that subsequent listings correctly reflect the removal.
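A minimal sketch of the two invalidation strategies, assuming the cache is a plain dict mapping a directory path to a list of entry dicts with a `name` key. The function names are hypothetical, and dropping the cached listings below the deleted directory is an extra assumption that the description above does not state.

```python
def invalidate_ancestors(dircache, path):
    """Drop cached listings for the parent of `path` and every ancestor above it."""
    parent = path.rstrip("/")
    while "/" in parent:
        parent = parent.rsplit("/", 1)[0]
        dircache.pop(parent, None)


def drop_from_parent_listing(dircache, path):
    """Remove a deleted directory from its parent's cached listing."""
    path = path.rstrip("/")
    parent = path.rsplit("/", 1)[0] if "/" in path else ""
    listing = dircache.get(parent)
    if listing is not None:
        dircache[parent] = [e for e in listing if e["name"] != path]
    # Assumption: stale listings for the deleted directory and its subtree are also dropped.
    for key in [k for k in dircache if k == path or k.startswith(path + "/")]:
        del dircache[key]


cache = {
    "bucket": [{"name": "bucket/a"}, {"name": "bucket/z"}],
    "bucket/a": [{"name": "bucket/a/f"}],
}
drop_from_parent_listing(cache, "bucket/a")
print(cache)
```

The first helper trades precision for safety by forcing a fresh listing on the next access; the second keeps the parent's listing cached and edits it in place.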

Testing:
Unit and integration tests are added to cover the following scenarios:

  1. Deletion of single files.
  2. Recursive deletion of empty and non-empty directories.
  3. Failure modes, such as attempting to delete a non-empty directory non-recursively.
  4. Correct cache invalidation for both file and recursive directory deletions.
  5. Deletion of directories that contain placeholder objects.

@ankitaluthra1
Collaborator

/gcbrun

@ankitaluthra1
Collaborator

/gcbrun

    and "No such object" not in str(ex)
]
if errors:
    raise errors[0]
Collaborator


After collecting all the errors, why do we raise only the first one?

Contributor Author


I kept the behavior the same as the rm() implementation in GCSFileSystem for consistency. For regional buckets, it surfaces only the first error, so I did the same here to keep the behavior identical for customers.
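For readers following this thread, the pattern under discussion can be sketched as a standalone function. The name `raise_first_error` and the shape of `results` (a list holding `None` for successes and exception instances for failures) are assumptions made for illustration; only the "No such object" filter and the `raise errors[0]` step come from the snippet under review.

```python
def raise_first_error(results):
    """Ignore 'No such object' failures and raise the first remaining error, if any."""
    errors = [
        ex
        for ex in results
        if isinstance(ex, Exception) and "No such object" not in str(ex)
    ]
    if errors:
        # Later errors are dropped; only the first one reaches the caller.
        raise errors[0]


# An already-missing object is treated as successfully deleted.
raise_first_error([None, RuntimeError("No such object: bucket/a.txt")])

try:
    raise_first_error([None, ValueError("permission denied"), KeyError("later")])
except ValueError as exc:
    print("raised:", exc)
```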

@ankitaluthra1
Collaborator

/gcbrun

@ankitaluthra1 ankitaluthra1 merged commit e0c0a9f into fsspec:main Jan 23, 2026
7 checks passed
@Mahalaxmibejugam Mahalaxmibejugam deleted the hns-rm branch January 23, 2026 11:57
3 participants