S3: Relying on load to check file existence and whether a path is a file/dir causes high error rate on S3 #181

Open
@fadikhou

Hello,

I recently started using the cloudpathlib library to upload entire directories to S3, and I noticed a high error rate on S3 (according to AWS monitoring), with the following errors:
botocore.errorfactory.NoSuchKey: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.
botocore.exceptions.ClientError: An error occurred (404) when calling the HeadObject operation: Not Found

After a deeper investigation, I found that these errors come from the upload_from method in cloudpath.py: before uploading each file, the library checks whether the destination exists and is a directory (if self.exists() and self.is_dir()):

def upload_from(
    self, source: Union[str, os.PathLike], force_overwrite_to_cloud: bool = False
) -> "CloudPath":
    """Upload a file or directory to the cloud path."""
    source = Path(source)

    if source.is_dir():
        for p in source.iterdir():
            (self / p.name).upload_from(p, force_overwrite_to_cloud=force_overwrite_to_cloud)

        return self

    else:
        if self.exists() and self.is_dir():  # <-- the check in question
            dst = self / source.name
        else:
            dst = self

        dst._upload_file_to_cloud(source, force_overwrite_to_cloud=force_overwrite_to_cloud)

        return dst

My question is:
Why do we need this check? It looks redundant here, because each recursive call targets a file key (self / p.name), so self.is_dir() always returns False.
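
To make the effect concrete, here is a minimal toy model of the behaviour described above. It is only a sketch and not the library's actual implementation: `FakeBucket`, `exists`, `is_dir`, `upload_with_check`, `upload_skipping_check` and `count_misses` are all hypothetical names, and the in-memory bucket simply counts lookups that would surface as 404 / NoSuchKey on a real store. It assumes that an existence check on a missing key costs one failed lookup.

```python
import tempfile
from pathlib import Path


class FakeBucket:
    """Hypothetical in-memory object store that counts failed lookups."""

    def __init__(self):
        self.objects = {}   # key -> bytes
        self.not_found = 0  # lookups that would show up as 404 / NoSuchKey

    def head(self, key):
        """Mimic a metadata lookup; a missing key is counted as a miss."""
        if key in self.objects:
            return True
        self.not_found += 1
        return False

    def has_prefix(self, key):
        """True if any object lives under `key/` (a 'directory' on a flat store)."""
        prefix = key.rstrip("/") + "/"
        return any(k.startswith(prefix) for k in self.objects)

    def put(self, key, data):
        self.objects[key] = data


def exists(bucket, key):
    # Assumption: existence is probed by loading the object first.
    return bucket.head(key) or bucket.has_prefix(key)


def is_dir(bucket, key):
    return not bucket.head(key) and bucket.has_prefix(key)


def upload_with_check(bucket, source, dst):
    """Same shape as the snippet above: probe the destination for every file."""
    if source.is_dir():
        for p in sorted(source.iterdir()):
            upload_with_check(bucket, p, f"{dst}/{p.name}")
        return dst
    if exists(bucket, dst) and is_dir(bucket, dst):
        dst = f"{dst}/{source.name}"
    bucket.put(dst, source.read_bytes())
    return dst


def upload_skipping_check(bucket, source, dst, _top_level=True):
    """Variant: probe only on the top-level call, not on recursive file uploads."""
    if source.is_dir():
        for p in sorted(source.iterdir()):
            upload_skipping_check(bucket, p, f"{dst}/{p.name}", _top_level=False)
        return dst
    if _top_level and exists(bucket, dst) and is_dir(bucket, dst):
        dst = f"{dst}/{source.name}"
    bucket.put(dst, source.read_bytes())
    return dst


def count_misses(upload, n_files=5):
    """Upload a temp directory of n_files files; return (misses, uploaded keys)."""
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(n_files):
            (Path(tmp) / f"file{i}.txt").write_text("data")
        bucket = FakeBucket()
        upload(bucket, Path(tmp), "backup")
        return bucket.not_found, sorted(bucket.objects)
```

In this model, uploading a directory of new files with `upload_with_check` records one failed lookup per file, while `upload_skipping_check` records none and writes the same keys. The variant is not a drop-in fix: it changes behaviour in the edge case where an existing "directory" prefix has the same name as a file being uploaded, so whether that trade-off is acceptable is a question for the maintainers.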

Labels: S3, bug
