This is the consolidated developer reference for S3 storage in Transferts. The inline comments in the code are the source of truth for individual mechanisms — this doc gives you the map and the rationale, then points back to the code.
Audience: developers maintaining the backend. If you only want to operate a deployment (IAM, environment variables, dashboards), see the Operational gotchas section.
- The model — who talks to whom
- Cleanup — what triggers it and how the API handles it
- Daily orphan sweep
- Operational gotchas
- Tests and fixtures
Two top-level entities own files:
- TransferDraft: ephemeral upload session. No public-facing metadata (title, recipients…), just a basket of files-in-transit. Lives from the first file drop until the user clicks "Create link" (finalize) or walks away (abort / cleanup).
- Transfer: the live, shareable transfer. Has metadata, a public token, recipients, expiry. Public download endpoints serve files from here.
TransferFile is attached to exactly one of draft or transfer —
enforced by the DB-level transferfile_exactly_one_parent check
constraint (core/models.py, class Meta).
┌──────────────────┐                ┌──────────────────┐
│  TransferDraft   │    finalize    │     Transfer     │
│   (ephemeral)    │ ─────────────► │  (live, public)  │
└────────┬─────────┘                └─────────┬────────┘
         │ 1                                  │ 1
         │                                    │
         │ 0..N (files in flight)             │ 0..N (live files)
         ▼                                    ▼
┌──────────────────────────────────────────────────────┐
│  TransferFile (exactly one parent, DB constraint)    │
│  s3_key, upload_id, upload_completed_at              │
└────────────────────┬─────────────────────────────────┘
                     │
                     ▼
            ┌─────────────────┐
            │    S3 bucket    │
            │ (MPU + objects) │
            └─────────────────┘
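To make the exactly-one-parent rule concrete, here is a minimal sketch using the standard-library sqlite3 module. The table and column names are assumptions chosen for illustration; the real constraint is declared in core/models.py and may be expressed differently.

```python
import sqlite3

# Hypothetical, simplified schema: the real model lives in core/models.py.
SCHEMA = """
CREATE TABLE transfer_file (
    id          INTEGER PRIMARY KEY,
    s3_key      TEXT NOT NULL,
    draft_id    INTEGER,
    transfer_id INTEGER,
    -- Exactly one of the two parent columns must be set.
    CONSTRAINT transferfile_exactly_one_parent
        CHECK ((draft_id IS NULL) <> (transfer_id IS NULL))
);
"""


def try_insert(conn, draft_id, transfer_id):
    """Return True if the row is accepted, False if the check constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO transfer_file (s3_key, draft_id, transfer_id) VALUES (?, ?, ?)",
                ("files/example", draft_id, transfer_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

results = {
    "draft only": try_insert(conn, 1, None),
    "transfer only": try_insert(conn, None, 7),
    "both parents": try_insert(conn, 1, 7),
    "no parent": try_insert(conn, None, None),
}
print(results)
```

Only the first two inserts are accepted. A row with both parents or with none is rejected by the database itself, so no code path can bypass the rule.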
The promotion from draft to transfer is a single UPDATE in
finalize: the draft's TransferFile rows are reparented (draft=None, transfer=…) and the draft is deleted. The S3 keys do not embed the
parent id — only the file id — so they remain valid across the
reparenting (viewsets/draft.py::add_file, near the s3_key = … line).
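The reparenting step can be sketched with sqlite3 as well. This is an illustration under assumptions, not the code in finalize: the schema is simplified and the key layout files/<file id> is invented for the example (the real layout is defined next to the s3_key assignment in add_file).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transfer_file (
    id INTEGER PRIMARY KEY,
    s3_key TEXT NOT NULL,
    draft_id INTEGER,
    transfer_id INTEGER,
    CHECK ((draft_id IS NULL) <> (transfer_id IS NULL))
);
""")

# Hypothetical key layout: derived from the file id only, never from the parent id.
conn.executemany(
    "INSERT INTO transfer_file (id, s3_key, draft_id) VALUES (?, ?, ?)",
    [(file_id, f"files/{file_id}", 42) for file_id in (1, 2, 3)],
)


def keys(conn):
    """Return every s3_key, ordered by file id."""
    return [row[0] for row in conn.execute("SELECT s3_key FROM transfer_file ORDER BY id")]


def finalize(conn, draft_id, transfer_id):
    """Reparent every file of the draft in one UPDATE; return the number of rows moved."""
    with conn:
        cur = conn.execute(
            "UPDATE transfer_file SET draft_id = NULL, transfer_id = ? WHERE draft_id = ?",
            (transfer_id, draft_id),
        )
    return cur.rowcount


keys_before = keys(conn)
moved = finalize(conn, draft_id=42, transfer_id=7)
keys_after = keys(conn)
```

Because the keys never mention the parent, keys_before and keys_after are identical and nothing has to be copied or renamed in S3.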
add_file ─────────► CreateMultipartUpload ──► (MPU open, no parts yet)
                              │
      sign_part / upload_part_bytes (×N, one per chunk)
                              │
                              ▼
complete_upload ──► CompleteMultipartUpload ──► (MPU closed, object materialized)
                              │
                              ▼
                    upload_completed_at = now()
                              │
                              ▼
                    finalize → reparented to Transfer
                              │
                              ▼
                    expiry / deactivation
                              │
                              ▼
                    deactivate(reason) → PENDING_FILE_DELETION
                    (sets pending_deletion_at; object stays in S3)
                              │
                              ▼
                    delete_pending_transfer_files_task (deferred)
                    → DeleteObject → DEACTIVATED
Deletion is deferred: expiry and deactivation (manual or
first-download auto-archive) all funnel through Transfer.deactivate,
which only flips the status to PENDING_FILE_DELETION and stamps
pending_deletion_at = now + TRANSFER_PURGE_DELAY_HOURS. The bytes stay
in S3 until the scheduled delete_pending_transfer_files_task runs past
that deadline — the grace window lets recipients' in-flight downloads
finish before the object disappears.
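A minimal sketch of the deferred-deletion timing, in plain Python. Everything here is a stand-in: the 24-hour value for TRANSFER_PURGE_DELAY_HOURS, the ACTIVE status name, the deactivate(now) signature and the purge_due helper are assumptions made so the example runs on its own.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

TRANSFER_PURGE_DELAY_HOURS = 24  # assumed value; the real setting is deployment-specific


@dataclass
class Transfer:
    status: str = "ACTIVE"  # assumed name for the live state
    pending_deletion_at: Optional[datetime] = None

    def deactivate(self, now: datetime) -> None:
        # Only flips the status and stamps the deadline; no S3 call happens here.
        self.status = "PENDING_FILE_DELETION"
        self.pending_deletion_at = now + timedelta(hours=TRANSFER_PURGE_DELAY_HOURS)


def purge_due(transfers, now):
    """Stand-in for the deferred task: purge transfers whose deadline has passed."""
    purged = []
    for t in transfers:
        if t.status == "PENDING_FILE_DELETION" and t.pending_deletion_at <= now:
            # The real task would issue DeleteObject for each file at this point.
            t.status = "DEACTIVATED"
            purged.append(t)
    return purged


t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
transfer = Transfer()
transfer.deactivate(now=t0)

too_early = purge_due([transfer], now=t0 + timedelta(hours=1))
on_time = purge_due([transfer], now=t0 + timedelta(hours=TRANSFER_PURGE_DELAY_HOURS))
```

A run one hour after deactivation purges nothing; a run at or after the deadline moves the transfer to DEACTIVATED.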
Drafts and partial uploads leave S3 state behind. Cleanup is triggered both by user actions (abort, remove file, walking away) and by errors (DB or S3 failures mid-flight). This section walks through the service API used to handle them and the call sites that wire causes to cleanups.
All S3 calls go through core/services/s3.py. The module exposes two
tiers, and the choice between them depends on whether the caller can
react to a ClientError or just wants to keep going:
| Tier | Functions | Behaviour on ClientError | When to use |
|---|---|---|---|
| Bare | abort_multipart_upload, delete_object | Raises to caller | Caller surfaces or reacts to failure |
| Best-effort | best_effort_abort_multipart_uploads_from_files, best_effort_delete_objects_from_files | Logs and swallows per item | Sweeping over a list where one bad row mustn't stop the rest |
The _from_files suffix signals that the helper iterates a
queryset/list of TransferFile rows and pulls s3_key / upload_id
itself — call sites don't have to map.
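The difference between the two tiers can be sketched as follows. The function names match the table, but the signatures, the dict-shaped file rows, the returned list of failed keys and the FakeS3 client are assumptions for illustration; ClientError here is a local stand-in for botocore.exceptions.ClientError so the sketch runs without boto3.

```python
import logging

logger = logging.getLogger(__name__)


class ClientError(Exception):
    """Stand-in for botocore.exceptions.ClientError."""


class FakeS3:
    """Hypothetical client: delete_object fails for keys listed in `broken`."""

    def __init__(self, broken=()):
        self.broken = set(broken)
        self.deleted = []

    def delete_object(self, Bucket, Key):
        if Key in self.broken:
            raise ClientError(f"cannot delete {Key}")
        self.deleted.append(Key)


def delete_object(client, bucket, key):
    """Bare tier: let ClientError propagate so the caller can react."""
    client.delete_object(Bucket=bucket, Key=key)


def best_effort_delete_objects_from_files(client, bucket, files):
    """Best-effort tier: log and swallow per item, keep going; return the failed keys."""
    failed = []
    for f in files:
        try:
            delete_object(client, bucket, f["s3_key"])
        except ClientError:
            logger.warning("delete failed for %s", f["s3_key"], exc_info=True)
            failed.append(f["s3_key"])
    return failed


client = FakeS3(broken={"b"})
files = [{"s3_key": "a"}, {"s3_key": "b"}, {"s3_key": "c"}]
failed = best_effort_delete_objects_from_files(client, "bucket", files)
```

The failure on "b" is logged and the loop still reaches "c". Calling the bare delete_object on "b" directly would raise instead.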
Six places exercise this API. Each row tells you what triggers the cleanup, which tier is picked, and what the cleanup actually does:
| Site | Tier | Cause | Cleanup action |
|---|---|---|---|
| add_file rollback (viewsets/draft.py) | bare | DB save fails after CreateMultipartUpload | Aborts the just-opened MPU (logs if the abort itself fails); re-raises the original DB error to the client. |
| complete_upload cleanup (viewsets/draft.py) | best-effort | CompleteMultipartUpload rejected, or size mismatch | Aborts MPUs and deletes objects for every file of the draft; deletes the draft row; raises the S3/size error after the atomic block (see flag-then-raise). |
| remove_file (viewsets/draft.py) | best-effort | User removes a single file | Aborts the file's MPU and deletes its object; deletes the file row; logs any S3 error but does not surface it (the daily orphan sweep catches leftovers). |
| abort (viewsets/draft.py) | best-effort | User explicitly aborts the draft | Aborts MPUs and deletes objects for every file of the draft; deletes the draft row; returns 204. |
| cleanup_abandoned_drafts_task (tasks.py) | best-effort | Draft >24 h old with no finalize/abort | Per draft: aborts MPUs and deletes objects for each file; deletes the draft row. |
| import_drive_file_task cleanup (tasks.py) | bare (logged) | S3 connection drops mid-stream (Drive) | Aborts the file's MPU and deletes its object (idempotent); deletes the file row; the frontend poller notices the row disappeared and surfaces a generic error. |
complete_upload stamps upload_completed_at and clears upload_id.
finalize then reparents every file from the draft to a fresh
Transfer. The draft row is deleted.
complete_upload does not raise from inside its with transaction.atomic() block. Instead it stashes the failure in a local
error_detail and raises after the block exits. The reason is that
the cleanup branch calls draft.delete() — raising inside the atomic
block would roll that delete back, leaving the draft (and its in-flight
MPUs) lingering for the next daily orphan sweep
to catch. See the comment in
viewsets/draft.py::complete_upload near error_detail = None.
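The flag-then-raise pattern is easier to see in a toy model. The sketch below replaces Django's transaction.atomic() with a hand-rolled context manager that restores a snapshot when an exception escapes; FakeDB, UploadError and both function names are hypothetical and exist only for this example.

```python
from contextlib import contextmanager


class FakeDB:
    """Toy store with snapshot/rollback, standing in for a transactional database."""

    def __init__(self):
        self.drafts = {1}

    @contextmanager
    def atomic(self):
        snapshot = set(self.drafts)
        try:
            yield
        except Exception:
            self.drafts = snapshot  # roll back every change made inside the block
            raise


class UploadError(Exception):
    pass


def complete_raise_inside(db, draft_id):
    with db.atomic():
        db.drafts.discard(draft_id)         # cleanup branch: delete the draft
        raise UploadError("size mismatch")  # raising here undoes the delete


def complete_flag_then_raise(db, draft_id):
    error_detail = None
    with db.atomic():
        db.drafts.discard(draft_id)         # cleanup branch: delete the draft
        error_detail = "size mismatch"      # remember the failure, do not raise yet
    if error_detail is not None:
        raise UploadError(error_detail)     # raised once the block has exited cleanly


def run(fn):
    """Call fn on a fresh FakeDB and return the drafts that remain afterwards."""
    db = FakeDB()
    try:
        fn(db, 1)
    except UploadError:
        pass
    return db.drafts


leftover_inside = run(complete_raise_inside)
leftover_flagged = run(complete_flag_then_raise)
```

Both variants report the error to the caller, but only the flagged variant leaves the draft deleted.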
core/services/s3_sweep.py exposes run_orphan_sweep(...). It is
invoked from two places:
- clean_orphan_s3_objects management command: manual operator tool, dry-run by default (pass --apply to actually delete). Also supports --min-age and --prefix flags.
- sweep_orphan_s3_storage_task Celery beat task: runs daily, calls run_orphan_sweep(apply=True, min_age_hours=24, ...).
- Objects: list_objects_v2 paginated, cross-referenced against TransferFile.s3_key. Orphans are batched into delete_objects (max 1000 per S3 API call).
- MPUs: list_multipart_uploads paginated, keyed on (s3_key, upload_id) rather than upload_id alone, because the S3 API allows multiple MPUs per key and our cross-reference must be exact. Orphans are aborted one by one (no bulk MPU API).
Both passes apply the min-age cutoff (LastModified for objects,
Initiated for MPUs). A non-zero count from the daily task logs a
warning — that's the signal that one of the per-row cleanup paths
leaked.
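The cross-reference logic of the two passes can be sketched without touching S3. The helper names below are hypothetical and the strict "older than the cutoff" comparison is an assumption; the dict fields (Key, LastModified, UploadId, Initiated) follow the shape of the boto3 listing responses.

```python
from datetime import datetime, timedelta, timezone


def chunked(items, size=1000):
    """Yield lists of at most `size` items (a bulk delete accepts up to 1000 keys)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def find_orphan_objects(listed, known_keys, cutoff):
    """Keys that are older than the cutoff and match no known s3_key."""
    return [o["Key"] for o in listed
            if o["Key"] not in known_keys and o["LastModified"] < cutoff]


def find_orphan_mpus(listed, known_pairs, cutoff):
    """(key, upload_id) pairs older than the cutoff that match no known row."""
    return [(u["Key"], u["UploadId"]) for u in listed
            if (u["Key"], u["UploadId"]) not in known_pairs and u["Initiated"] < cutoff]


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
cutoff = now - timedelta(hours=24)
old, recent = now - timedelta(days=3), now - timedelta(minutes=5)

objects = [
    {"Key": "files/1", "LastModified": old},     # known: keep
    {"Key": "files/2", "LastModified": old},     # unknown and old: orphan
    {"Key": "files/3", "LastModified": recent},  # unknown but recent: skip
]
uploads = [
    {"Key": "files/4", "UploadId": "u1", "Initiated": old},  # known pair: keep
    {"Key": "files/4", "UploadId": "u2", "Initiated": old},  # same key, stale upload: orphan
]

orphan_objects = find_orphan_objects(objects, {"files/1"}, cutoff)
orphan_mpus = find_orphan_mpus(uploads, {("files/4", "u1")}, cutoff)
batches = list(chunked([f"k{i}" for i in range(2500)]))
```

The second upload on files/4 shows why the pair matters: matching on the key alone would treat the stale upload u2 as known and leave it open.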
add_file opens an MPU before saving the TransferFile row. There's a
tiny window between CreateMultipartUpload returning and the row being
saved where a sweep with min_age=0 would see the MPU as unknown and
abort it. 24 h is well past any legitimate upload (the per-file timeout
is much shorter), and actual orphans are still cleared by the first
daily run after they cross that age.
This section is for anyone deploying or debugging in production.
These are the permissions behind the boto3 calls the code makes; the IAM policy on the bucket user must allow all of them:
| Permission | Used by |
|---|---|
| s3:ListBucket | Sweep (object listing) |
| s3:GetObject | head_object_size, presigned downloads |
| s3:HeadObject | head_object_size (size-mismatch guard in complete_upload). On AWS there is no separate s3:HeadObject action; HeadObject is authorized by s3:GetObject. |
| s3:DeleteObject | Per-file cleanup |
| s3:DeleteObjects | Sweep (bulk delete batch). On AWS there is no separate s3:DeleteObjects action; the bulk call is authorized by s3:DeleteObject. |
| s3:PutObject | Implicitly required for direct PUT presigned URLs and upload_part |
| s3:AbortMultipartUpload | Cleanup paths and sweep |
| s3:ListMultipartUploadParts | Implicit on complete_multipart_upload |
| s3:ListBucketMultipartUploads | Sweep (MPU listing) |
Scaleway, AWS, MinIO and Garage all expose these under the same names (modulo the standard AWS naming).
A multipart upload with zero parts is still a valid entry in
list_multipart_uploads: it doesn't bill, but it shows up, and it never
appears in an object listing. That's why the sweep has a dedicated MPU
pass instead of relying on the object pass. It's also why
assert_bucket_empty (in tests) checks both.
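A minimal sketch of a "check both" helper, assuming a boto3-style client. It is not the assert_bucket_empty in tests/_s3_live.py: bucket_leftovers and FakeS3 are hypothetical names, and the sketch reads only the first page of each listing.

```python
def bucket_leftovers(client, bucket):
    """Return (object_keys, open_mpus) still present in the bucket.

    Reads only the first page of each listing, which is enough for small test buckets.
    """
    objects = client.list_objects_v2(Bucket=bucket).get("Contents", [])
    uploads = client.list_multipart_uploads(Bucket=bucket).get("Uploads", [])
    return (
        [o["Key"] for o in objects],
        [(u["Key"], u["UploadId"]) for u in uploads],
    )


class FakeS3:
    """Hypothetical stand-in: no objects, but one MPU opened and never completed."""

    def list_objects_v2(self, Bucket):
        return {"KeyCount": 0}  # the "Contents" field is absent when nothing matches

    def list_multipart_uploads(self, Bucket):
        return {"Uploads": [{"Key": "files/9", "UploadId": "u9"}]}


keys, mpus = bucket_leftovers(FakeS3(), "test-bucket")
```

An object-only check would call this bucket clean; the second listing is what reveals the dangling upload.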
S3 returns 204 for a DeleteObject on a non-existent key. Cleanup
paths can be (and are) called on rows whose object was never
materialized — no special-casing needed.
S3 tests use moto (in-process AWS
mock).
- tests/_s3_live.py: fixture and helpers. Notably assert_bucket_empty(bucket) checks both objects and MPUs (because of the empty-MPU caveat above).
- tests/conftest.py: live_s3_bucket fixture, function-scoped, sets up an isolated bucket per test.
- tests/test_s3_cleanup_drafts.py: abort / abandoned / remove paths.
- tests/test_s3_cleanup_transfer.py: expiry and deactivation paths.
- tests/test_s3_cleanup_tasks.py: the Celery tasks themselves.
- tests/test_s3_cleanup_structural.py: invariants that hold across paths (e.g. "no path leaves a draft with files but no MPUs").
- tests/test_clean_orphan_s3_command.py: the management command's thin-wrapper layer.
- tests/test_s3_live_smoke.py: global smoke test against a live bucket, run end-to-end.
moto hard-codes the Initiated timestamp of MPUs to 2010-11-10 in
its list_multipart_uploads response. To exercise the "skip recent
MPUs" branch of the sweep we need the moto MPU to register as newer
than the cutoff — so test_min_age_skips_recent_orphan_mpu passes
--min-age 1000000 (~114 years), placing the cutoff in 1912 so that
2010 reads as "recent". If you write a new test for the cutoff
comparison itself, copy that pattern.
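The arithmetic behind that trick, as a short sketch. It assumes --min-age is expressed in hours (as min_age_hours suggests) and that the sweep treats anything initiated at or after the cutoff as recent; cutoff_for is a hypothetical helper, not a function from the codebase.

```python
from datetime import datetime, timedelta, timezone

MOTO_INITIATED = datetime(2010, 11, 10, tzinfo=timezone.utc)  # date quoted above


def cutoff_for(min_age_hours, now):
    """An MPU is swept only if it was initiated before this instant."""
    return now - timedelta(hours=min_age_hours)


now = datetime.now(timezone.utc)
default_cutoff = cutoff_for(24, now)
huge_cutoff = cutoff_for(1_000_000, now)

years = 1_000_000 / (24 * 365.25)  # roughly 114 years
is_recent_by_default = MOTO_INITIATED >= default_cutoff
is_recent_with_huge_min_age = MOTO_INITIATED >= huge_cutoff
```

With the default 24 hours the moto timestamp is far older than the cutoff, so the MPU always looks sweepable. Only the oversized min-age pushes the cutoff back far enough for 2010 to count as recent.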