Open
Labels: bug (Something isn't working)
Description
Package version (if known): v13, latest
Describe the bug
When a multipart file upload to S3 is left unfinished long enough for the presigned URLs to expire and/or S3 lifecycle policies to clean up the multipart upload, Invenio remains unaware of this cleanup. The file stays marked as pending in the database with a stale upload ID.
As a result:
- Deleting the file from the UI fails.
- The draft becomes effectively stuck, and the user cannot proceed.
There is no recovery path exposed to the end user, and the system does not self-heal.
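Detecting this stale state only requires asking S3 whether the stored upload ID still exists. Below is a minimal sketch that assumes a boto3-style S3 client. The helper name `multipart_upload_exists` is hypothetical and is not an Invenio or boto3 function. The check relies on S3's `ListParts` call failing with the `NoSuchUpload` error code once an upload has been aborted or cleaned up.

```python
from typing import Any


def multipart_upload_exists(client: Any, bucket: str, key: str, upload_id: str) -> bool:
    """Return False if S3 no longer knows about the multipart upload.

    `client` is assumed to behave like a boto3 S3 client: `list_parts` raises
    botocore's ClientError, whose `.response["Error"]["Code"]` is
    "NoSuchUpload" once the upload has been aborted or cleaned up.
    """
    try:
        # One part is enough: we only care whether the upload ID is still valid.
        client.list_parts(Bucket=bucket, Key=key, UploadId=upload_id, MaxParts=1)
    except Exception as exc:  # botocore.exceptions.ClientError in practice
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code == "NoSuchUpload":
            return False
        raise  # other failures (permissions, network) do not prove the upload is gone
    return True
```

Other errors are re-raised deliberately. A transient network failure should not cause the file state to be reset.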
Steps to Reproduce
- Create a draft record.
- Start uploading a large file using multipart upload.
- Leave the upload unfinished long enough for:
  - presigned URLs to expire, and/or
  - S3 lifecycle rules to clean up failed multipart uploads.
- Return to the draft.
- Attempt to delete the file from the UI or via the API.
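To reproduce the lifecycle cleanup without relying on an existing policy, you can give a test bucket a rule that aborts incomplete multipart uploads. The sketch below only builds the configuration dictionary in the shape that boto3's `put_bucket_lifecycle_configuration` expects. The rule ID and bucket name are placeholders. AWS applies lifecycle actions asynchronously, so the cleanup may happen some time after the configured number of days.

```python
def abort_incomplete_mpu_rule(days: int = 1, prefix: str = "") -> dict:
    """Build an S3 lifecycle rule that aborts multipart uploads left
    incomplete for `days` days under `prefix` (empty prefix = whole bucket)."""
    if days < 1:
        raise ValueError("DaysAfterInitiation must be a positive integer")
    return {
        "ID": "abort-incomplete-multipart-uploads",  # placeholder rule ID
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
    }


lifecycle = {"Rules": [abort_incomplete_mpu_rule(days=1)]}

# Applying it with boto3 (not executed here). Note that this call REPLACES
# any lifecycle rules the bucket already has:
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-test-bucket", LifecycleConfiguration=lifecycle
# )
```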
Expected behavior
One of the following should happen:
- The file can be deleted cleanly even if the multipart upload no longer exists in S3.
- Or the system detects the missing multipart upload and resets the file state.
- Or the UI automatically refreshes upload URLs and allows the upload to resume.
- Or there is a supported administrative way to force-delete the file record.
The draft should never become permanently blocked.
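The first expected behavior, deleting cleanly even when S3 has already dropped the upload, amounts to treating `NoSuchUpload` as success when aborting. Below is a minimal sketch that assumes a boto3-style client. `PendingFile`, `abort_upload_if_present` and `delete_pending_file` are hypothetical names and are not Invenio APIs.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class PendingFile:
    """Hypothetical stand-in for a draft file record (not Invenio's model)."""
    s3_key: str                # object key of the file in the bucket
    upload_id: Optional[str]   # multipart upload ID stored when the upload started


def abort_upload_if_present(client: Any, bucket: str, key: str, upload_id: str) -> None:
    """Abort a multipart upload, treating an already-missing upload as success.

    `client` is assumed to behave like a boto3 S3 client.
    """
    try:
        client.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
    except Exception as exc:  # botocore.exceptions.ClientError in practice
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code != "NoSuchUpload":
            raise  # only "already gone" is safe to ignore


def delete_pending_file(client: Any, bucket: str, file: PendingFile) -> None:
    """Release the S3 side (if any), then clear the stale upload ID.

    A real implementation would also remove the database row afterwards.
    """
    if file.upload_id is not None:
        abort_upload_if_present(client, bucket, file.s3_key, file.upload_id)
    file.upload_id = None
```

If any other error is raised, the upload ID is left untouched, so the operation can be retried safely.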
Additional context
- This occurs when multipart uploads remain pending for days.
- S3 buckets may have lifecycle policies that automatically remove failed multipart uploads.
- Invenio continues to store and reuse an invalid upload ID.
- Local reproduction often “works” because uploads resume before URL expiration.
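A SigV4 presigned URL carries its own signing time (`X-Amz-Date`) and lifetime in seconds (`X-Amz-Expires`). A client could therefore detect expired part URLs before resuming and request fresh ones. Below is a minimal stdlib sketch. The helper names are hypothetical, and older SigV2-style URLs, which use an `Expires` timestamp instead, are not handled.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import parse_qs, urlparse


def presigned_url_expiry(url: str) -> datetime:
    """Return the UTC time at which a SigV4 presigned URL stops being valid."""
    query = parse_qs(urlparse(url).query)
    signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
    signed_at = signed_at.replace(tzinfo=timezone.utc)
    return signed_at + timedelta(seconds=int(query["X-Amz-Expires"][0]))


def presigned_url_expired(url: str, now: Optional[datetime] = None) -> bool:
    """True if the URL can no longer be used at `now` (defaults to current UTC time)."""
    now = now or datetime.now(timezone.utc)
    return now >= presigned_url_expiry(url)
```

Refreshing URLs only helps while the multipart upload still exists. Once a lifecycle rule has removed it, new URLs for the old upload ID will still fail.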
Workaround:
- Delete the draft and recreate it.
This is a lifecycle edge case but results in a hard user-facing failure with no recovery path.
file-checks
The file-checks task treats pending files that have no checksum as corrupted:
Traceback (most recent call last):
  File "/path-to-lib/python3.12/site-packages/invenio_files_rest/models.py", line 835, in verify_checksum
    real_checksum = self.storage(**kwargs).checksum(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path-to-lib/python3.12/site-packages/invenio_files_rest/storage/base.py", line 156, in checksum
    fp = self.open(mode="rb")
         ^^^^^^^^^^^^^^^^^^^^
  File "/path-to-lib/python3.12/site-packages/invenio_files_rest/storage/pyfs.py", line 60, in open
    return fs.open(path, mode=mode)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path-to-lib/python3.12/site-packages/fsspec/spec.py", line 1338, in open
    f = self._open(
        ^^^^^^^^^^^
  File "/path-to-lib/python3.12/site-packages/s3fs/core.py", line 720, in _open
    return S3File(
           ^^^^^^^
  File "/path-to-lib/python3.12/site-packages/s3fs/core.py", line 2257, in __init__
    super().__init__(
  File "/path-to-lib/python3.12/site-packages/fsspec/spec.py", line 1912, in __init__
    self.size = self.details["size"]
                ^^^^^^^^^^^^
  File "/path-to-lib/python3.12/site-packages/fsspec/spec.py", line 1925, in details
    self._details = self.fs.info(self.path)
                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/path-to-lib/python3.12/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path-to-lib/python3.12/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/path-to-lib/python3.12/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
                ^^^^^^^^^^
  File "/path-to-lib/python3.12/site-packages/s3fs/core.py", line 1492, in _info
    raise FileNotFoundError(path)
FileNotFoundError: placeholder/xx/xx/xxx-xxxx-xxxx-xxxx-xxxxxxxxx/data
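A file-checks pass could separate incomplete uploads from genuinely corrupted files, as in the minimal sketch below. `StoredFile`, `check_files` and the `verify` callable are hypothetical stand-ins for the real file model and for the checksum verification step shown in the traceback. They are not Invenio APIs.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class StoredFile:
    """Hypothetical file row: `checksum` is None while an upload is pending."""
    uri: str
    checksum: Optional[str]


def check_files(
    files: Iterable[StoredFile], verify: Callable[[StoredFile], bool]
) -> Tuple[List[str], List[str], List[str]]:
    """Split files into (ok, corrupted, skipped) URIs.

    `verify` is a caller-supplied callable that returns True when the stored
    checksum matches the object's content.
    """
    ok: List[str] = []
    corrupted: List[str] = []
    skipped: List[str] = []
    for f in files:
        if f.checksum is None:
            # Upload never completed: there is nothing to compare against.
            skipped.append(f.uri)
            continue
        try:
            if verify(f):
                ok.append(f.uri)
            else:
                corrupted.append(f.uri)
        except FileNotFoundError:
            # A committed file whose object is missing is a real integrity problem.
            corrupted.append(f.uri)
    return ok, corrupted, skipped
```

With this split, skipped files can be reported separately, for example as candidates for the stale-upload cleanup described above, without being counted as corruption.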