The is_dir check is fairly expensive, but at least for S3 and Azure, when the entries were created as a result of the client's _list_dir method, you can tell for each entry whether it is a directory or a file and immediately set the result on the created CloudPath instance.
For example, for S3Client._list_dir you could write something like:
    paginator = self.client.get_paginator("list_objects_v2")
    for result in paginator.paginate(
        Bucket=cloud_path.bucket, Prefix=prefix, Delimiter="/", MaxKeys=1000
    ):
        # sub directory names
        for result_prefix in result.get("CommonPrefixes", []):
            path = S3Path(f"s3://{cloud_path.bucket}/{result_prefix.get('Prefix')}")
            path._is_dir = True
            yield path
        # files in the directory
        for result_key in result.get("Contents", []):
            path = S3Path(f"s3://{cloud_path.bucket}/{result_key.get('Key')}")
            path._is_dir = False
            yield path
and modify S3Path.is_dir:
    def is_dir(self) -> bool:
        # assumes self._is_dir defaults to None when the path is created
        if self._is_dir is None:
            self._is_dir = self.client._is_file_or_dir(self) == "dir"
        return self._is_dir
This makes a HUGE performance difference if you need to call is_dir on the entries returned from iterdir or glob (in my case, when implementing a file dialog that works for cloud paths).
Not sure if this particular implementation is the best way to do this, but something like this is needed.
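To make the expected saving concrete, here is a minimal, self-contained sketch of the same caching idea. FakeClient and FakePath are hypothetical stand-ins invented for illustration, not the real cloudpathlib classes; the lookups counter simulates the network round trip that _is_file_or_dir would cost.

```python
from typing import Dict, Iterator, Optional


class FakeClient:
    """Hypothetical in-memory client that counts expensive type lookups."""

    def __init__(self, entries: Dict[str, bool]) -> None:
        self.entries = entries  # key -> True for a directory, False for a file
        self.lookups = 0        # simulated network round trips

    def _is_file_or_dir(self, path: "FakePath") -> Optional[str]:
        # Stands in for the expensive remote check.
        self.lookups += 1
        is_dir = self.entries.get(path.key)
        if is_dir is None:
            return None
        return "dir" if is_dir else "file"

    def _list_dir(self) -> Iterator["FakePath"]:
        # The listing already knows each entry's type, so pass it along.
        for key, is_dir in self.entries.items():
            yield FakePath(key, self, is_dir=is_dir)


class FakePath:
    """Hypothetical path object with an optional pre-filled is_dir result."""

    def __init__(self, key: str, client: FakeClient,
                 is_dir: Optional[bool] = None) -> None:
        self.key = key
        self.client = client
        self._is_dir = is_dir  # None means "not known yet"

    def is_dir(self) -> bool:
        # Only ask the client when the listing did not provide the answer.
        if self._is_dir is None:
            self._is_dir = self.client._is_file_or_dir(self) == "dir"
        return self._is_dir


client = FakeClient({"data/": True, "a.txt": False})

# Paths produced by the listing answer is_dir() without any lookup.
listed_types = [p.is_dir() for p in client._list_dir()]
print(listed_types, client.lookups)

# A path constructed directly pays for one lookup, then reuses the result.
fresh = FakePath("data/", client)
fresh.is_dir()
fresh.is_dir()
print(client.lookups)
```

One trade-off of this design is that the stored value can go stale if the object is deleted or replaced remotely after the listing, so a real implementation might want a way to reset it.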