[Discussion] Incremental S3 uploads in collectstatic via ETag comparison — is this in scope?

Hi, before investing time in an implementation I wanted to check whether this direction aligns with the project's goals.

## Problem

When running `collectstatic` with `S3Storage`, every file is re-uploaded on each run regardless of whether its content has changed. For projects with a non-trivial number of static assets, this adds 2–3 minutes or more to every deployment.

I verified this by reading the current `_save()` and `exists()` implementations: `_save()` calls `obj.upload_fileobj()` unconditionally, and `exists()` only checks whether the object is present, without retrieving or comparing its ETag.

## Historical context

`collectfast` historically solved this by comparing local MD5 hashes against S3 ETags and skipping unchanged files. However, that project is now archived (last release 2020, repository archived May 2025), leaving a gap in the ecosystem.

## Proposed direction

Add an opt-in behaviour to `S3Storage` that, before uploading a file, compares its MD5 hash with the ETag of the existing S3 object and skips the upload if they match. No new dependencies would be required since `boto3` already exposes the ETag via `head_object`.

Rough sketch:

```python
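import hashlib

from botocore.exceptions import ClientError


def _should_skip_upload(self, name, content):
    """Return True if the existing S3 object's ETag matches the local MD5."""
    try:
        obj = self.connection.meta.client.head_object(
            Bucket=self.bucket_name, Key=self._normalize_name(name)
        )
    except ClientError:
        # Object missing or lookup failed: fall back to uploading.
        return False
    # S3 returns the ETag wrapped in double quotes.
    etag = obj["ETag"].strip('"')
    content.seek(0)
    local_md5 = hashlib.md5(content.read()).hexdigest()
    # Rewind so the caller can still upload the file if needed.
    content.seek(0)
    return etag == local_md5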
```

This would be controlled by a new setting, e.g. `AWS_S3_SKIP_UNCHANGED`, defaulting to `False` so that the feature is opt-in and current behaviour is preserved.
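To make the opt-in wiring concrete, here is a minimal sketch of how such a flag might gate the upload. Everything in it is hypothetical: `FakeStorage` is a stand-in that records uploads instead of talking to S3, `SkipUnchangedMixin` and its `skip_unchanged` attribute are illustrative names (the attribute would be fed by `AWS_S3_SKIP_UNCHANGED`), and `DemoStorage` fakes the ETag comparison with a fixed set of names. It is not how `S3Storage` is implemented today.

```python
class FakeStorage:
    """Stand-in for a storage backend: records uploads instead of calling S3."""

    def __init__(self):
        self.uploaded = []

    def _save(self, name, content):
        self.uploaded.append(name)
        return name


class SkipUnchangedMixin:
    """Hypothetical mixin that skips the upload when the remote copy matches."""

    # Would be populated from the proposed AWS_S3_SKIP_UNCHANGED setting.
    skip_unchanged = False

    def _should_skip_upload(self, name, content):
        raise NotImplementedError

    def _save(self, name, content):
        if self.skip_unchanged and self._should_skip_upload(name, content):
            # Nothing is uploaded; report the existing object's name.
            return name
        return super()._save(name, content)


class DemoStorage(SkipUnchangedMixin, FakeStorage):
    """Pretends that a fixed set of names is already up to date remotely."""

    unchanged = {"css/site.css"}

    def _should_skip_upload(self, name, content):
        return name in self.unchanged


storage = DemoStorage()
storage.skip_unchanged = True
storage._save("css/site.css", None)  # skipped: treated as unchanged
storage._save("js/app.js", None)     # uploaded
```

With the flag left at its default of `False`, both calls would go through to the underlying `_save()`, which is the current behaviour.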

## Questions

1. Is this the kind of optimisation that belongs in `django-storages`, or is it intentionally out of scope?
2. If it fits, is `AWS_S3_SKIP_UNCHANGED` an acceptable API surface, or would you prefer a different approach (subclass, mixin, etc.)?
3. Are there edge cases I should be aware of — e.g. multipart uploads where the ETag is not a plain MD5?
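To illustrate the multipart concern in question 3, here is a small sketch using only `hashlib`. Both helper names are hypothetical. `multipart_etag` follows the commonly observed scheme for multipart uploads (MD5 of the concatenated per-part MD5 digests, suffixed with `-<part count>`); as far as I know this is not a documented guarantee, and the part size has to match whatever the uploader used. `is_plain_md5_etag` is only a format heuristic: objects encrypted with SSE-KMS or SSE-C can also carry ETags that are not an MD5 of the content.

```python
import hashlib


def is_plain_md5_etag(etag: str) -> bool:
    """Return True if the ETag looks like a bare 32-character hex MD5 digest."""
    etag = etag.strip('"').lower()
    return len(etag) == 32 and all(c in "0123456789abcdef" for c in etag)


def multipart_etag(data: bytes, part_size: int) -> str:
    """Compute the ETag commonly observed for multipart uploads (assumption)."""
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    # MD5 over the concatenated *binary* digests of each part.
    combined = b"".join(hashlib.md5(part).digest() for part in parts)
    return f"{hashlib.md5(combined).hexdigest()}-{len(parts)}"


single_part = hashlib.md5(b"body { color: red; }").hexdigest()
three_parts = multipart_etag(b"a" * 10, part_size=4)
```

A skip check could treat any ETag that fails `is_plain_md5_etag` as "unknown" and simply upload, which keeps the optimisation safe at the cost of re-uploading large files.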

Happy to submit a PR if the direction makes sense to the maintainers. I'm raising this as a discussion first to avoid wasted effort on both sides.

[Discussion] Incremental S3 uploads in collectstatic via ETag comparison — is this in scope? #1561

Problem

Historical context

Proposed direction

Questions

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Uh oh!

[Discussion] Incremental S3 uploads in collectstatic via ETag comparison — is this in scope? #1561

Description

Problem

Historical context

Proposed direction

Questions

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions