Make mirror_file fail if file object already exists (#7134) #7141
Conversation
Force-pushed from 2aa836e to 5ea6493
Codecov Report

Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##           develop    #7141     +/-   ##
===========================================
- Coverage    85.24%   85.22%   -0.02%
===========================================
  Files          152      152
  Lines        22060    22099      +39
===========================================
+ Hits         18804    18834      +30
- Misses        3256     3265       +9
If I understand this correctly, an attempt to mirror a file that already exists in the destination will cause S3 to return a 412 (due to IfNoneMatch='*'). However, there is no R assertion as described in the ticket description. Was it decided the 412 was sufficient?
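For reference, a minimal sketch of the conditional write under discussion, assuming a boto3 S3 client named s3; the helper name put_if_absent is hypothetical:

import botocore.exceptions

def put_if_absent(s3, bucket: str, key: str, body: bytes) -> bool:
    # IfNoneMatch='*' makes S3 reject the write if an object already
    # exists under this key; the rejection surfaces as a 412
    # PreconditionFailed client error.
    try:
        s3.put_object(Bucket=bucket, Key=key, Body=body, IfNoneMatch='*')
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] == 'PreconditionFailed':
            return False
        raise
    return True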
Also, the two parts of the reupload subtest both start by calling self._s3.delete_object(). Shouldn't this include a test where the file is not deleted and then reuploaded?
Force-pushed from 12ff57e to e5cc410
There are two objects, the file object and the info object. If the info object is present, the mirror service will skip trying to upload the file object, so all subtests after the first will skip the parts of the code we need coverage for. The two ways to circumvent this are to delete the info object or to patch the method that checks for it. For some reason I had concluded the former was preferable, but I can't remember my reasoning, so I switched back to the latter.
Approved.
src/azul/service/storage_service.py (outdated)

@@ -181,12 +198,16 @@ def upload(self,

     def _object_creation_kwargs(self, *,
                                 content_type: str | None = None,
-                                tagging: Tagging | None = None):
+                                tagging: Tagging | None = None,
+                                exists_okay: bool = True
Suggested change:
-exists_okay: bool = True
+overwrite: bool = True
and everywhere else.
src/azul/service/storage_service.py (outdated)

                                **kwargs)
        except botocore.exceptions.ClientError as e:
            error = e.response['Error']
            if error['Code'] == 'PreconditionFailed' and error['Condition'] == 'If-None-Match':
L
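Per the shorthand legend at the bottom of this page, L flags the preceding line as too long. A minimal sketch of one way the condition could be wrapped; the helper name _is_if_none_match_failure is hypothetical:

import botocore.exceptions

def _is_if_none_match_failure(e: botocore.exceptions.ClientError) -> bool:
    # Splitting the boolean expression across lines keeps each line
    # within the length limit flagged above.
    error = e.response['Error']
    return (error['Code'] == 'PreconditionFailed'
            and error.get('Condition') == 'If-None-Match')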
Force-pushed from e5cc410 to fc3d6c1
src/azul/service/storage_service.py (outdated)

            parts = [
                {
                    'PartNumber': index + 1,
                    'ETag': etag
                }
                for index, etag in enumerate(etags)
            ]
-           upload.complete(MultipartUpload={'Parts': parts})
+           upload.complete(MultipartUpload={'Parts': parts},
That same exception needs to be handled here as well.
Additionally, we don't want to create a big MP upload only to then realize at the end that the object already exists. The proper way to deal with this for large files is to check explicitly with HeadObject. I don't know if PutObject with If-None-Match fails early before processing the entire request body or after. Unless you can find documentation about this, please add the HeadObject for small files as well.
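A minimal sketch of the explicit existence check suggested here, assuming a boto3 S3 client named s3; the helper name object_exists is hypothetical and mirrors the 404 handling already present in storage_service.py:

import botocore.exceptions

def object_exists(s3, bucket: str, key: str) -> bool:
    # HeadObject is cheap and fails fast, so it can be issued before
    # starting a potentially large multipart upload.
    try:
        s3.head_object(Bucket=bucket, Key=key)
    except botocore.exceptions.ClientError as e:
        if int(e.response['Error']['Code']) == 404:
            return False
        raise
    return True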
Force-pushed from f36a492 to d3318e2
Subject: [PATCH] Clean-up mirroring fixture duplication
---
Index: test/indexer/test_mirror_controller.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/test/indexer/test_mirror_controller.py b/test/indexer/test_mirror_controller.py
--- a/test/indexer/test_mirror_controller.py (revision 1c0a00197edc274bb9804375a59b3ecf3a9dfe3a)
+++ b/test/indexer/test_mirror_controller.py (date 1748042321516)
@@ -78,12 +78,14 @@
         with self.subTest('mirror_file', corrupted=False, exists=False):
             self._test_mirror_file(file, file_message)
-        # Force reupload attempts even if the info object is present
-        with patch.object(MirrorService, 'info_exists', return_value=False):
-            with self.subTest('mirror_file', corrupted=True):
-                self._test_corrupted_download(file_message)
-            with self.subTest('mirror_file', corrupted=False, exists=True):
-                self._test_reuploaded_file(file_message)
+        self._s3.delete_object(Bucket=self.mirror_bucket,
+                               Key=self.mirror_controller.service.info_object_key(file))
+
+        with self.subTest('mirror_file', corrupted=True):
+            self._test_corrupted_download(file_message)
+
+        with self.subTest('mirror_file', corrupted=False, exists=True):
+            self._test_reuploaded_file(file_message)
     _file_contents = b'lorem ipsum dolor sit\n'
Index: src/azul/indexer/mirror_controller.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/azul/indexer/mirror_controller.py b/src/azul/indexer/mirror_controller.py
--- a/src/azul/indexer/mirror_controller.py (revision 1c0a00197edc274bb9804375a59b3ecf3a9dfe3a)
+++ b/src/azul/indexer/mirror_controller.py (date 1748042570717)
@@ -152,12 +152,12 @@
         deployment_is_stable = (config.deployment.is_stable
                                 and not config.deployment.is_unit_test
                                 and catalog not in config.integration_test_catalogs)
-        if self.service.info_exists(catalog, file):
+        if file_is_large and not deployment_is_stable:
+            log.info('Not mirroring file to save cost: %r', file)
+        elif self.service.info_exists(catalog, file):
             log.info('File is already mirrored, skipping upload: %r', file)
         elif self.service.file_exists(catalog, file):
             assert False, R('File object is already present', file)
-        elif file_is_large and not deployment_is_stable:
-            log.info('Not mirroring file to save cost: %r', file)
         else:
             # Ensure we test with multiple parts on lower deployments
             part_size = FilePart.default_size if deployment_is_stable else FilePart.min_size
Index: src/azul/service/storage_service.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/azul/service/storage_service.py b/src/azul/service/storage_service.py
--- a/src/azul/service/storage_service.py (revision 1c0a00197edc274bb9804375a59b3ecf3a9dfe3a)
+++ b/src/azul/service/storage_service.py (date 1748043612861)
@@ -77,6 +77,10 @@
     pass
+class StorageObjectExists(Exception):
+    pass
+
+
 class StorageService:

     def __init__(self, bucket_name: str | None = None):
@@ -94,7 +98,8 @@
                                  Key=object_key)
         except self._s3.exceptions.ClientError as e:
             if int(e.response['Error']['Code']) == 404:
-                raise StorageObjectNotFound
+                # REVIEW: separate commit
+                raise StorageObjectNotFound(object_key)
             else:
                 raise e
@@ -103,7 +108,8 @@
             response = self._s3.get_object(Bucket=self.bucket_name,
                                            Key=object_key)
         except self._s3.exceptions.NoSuchKey:
-            raise StorageObjectNotFound
+            # REVIEW: same commit as above
+            raise StorageObjectNotFound(object_key)
         else:
             return response['Body'].read()
@@ -309,7 +315,7 @@
         error = exception.response['Error']
         code, condition = error['Code'], error['Condition']
         if code == 'PreconditionFailed' and condition == 'If-None-Match':
-            assert False, R('Object exists', object_key)
+            raise StorageObjectExists(object_key)
         else:
             raise exception
The last two commits should either be squashed or made into a split commit. Please change the commit title "StorageService supports IfNoneMatch param" to "Optionally prevent StorageService from overwriting objects".
Force-pushed from d3318e2 to 19410cd
Security design review
Force-pushed from 19410cd to f0d2f2b
Connected issues: #7134
Checklist

Author
1 When the issue title describes a problem, the corresponding PR title is "Fix: " followed by the issue title.
Author (partiality)
Author (chains)
Author (reindex, API changes)
Author (upgrading deployments)
Author (hotfixes)
Author (before every review)
Peer reviewer (after approval)
System administrator (after approval)
Operator (before pushing the merge commit)
System administrator
Operator (chain shortening)
Operator (after pushing the merge commit)
Operator (reindex)
Operator

Shorthand for review comments
L: line is too long
W: line wrapping is wrong
Q: bad quotes
F: other formatting problem