Commit b71426a
push: skip already existing files in the s3 bucket on push
When we push data files (i.e. sha256-abcd) to the s3 bucket we currently do this unconditionally. But if the file is already in the bucket this is unnecessary, so try to "touch" it first and only upload if the file is not already there. This saves some time even on a fast connection (and bandwidth, of course). Note that we use "touch" here, rather than a plain existence check, because we want the object metadata to be updated so that any time-based expiration policies are still honored.
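The "touch" described above can be sketched as a small helper. This is a minimal sketch mirroring the commit's approach, not the commit's actual code: `s3c` is assumed to be a boto3 S3 client, and the helper name is illustrative. The commit checks only for error code "404"; depending on the boto3 version, a missing copy source may instead surface as "NoSuchKey", so the sketch accepts both.

```python
def object_exists(s3c, bucket, key):
    """Return True if bucket/key already exists, refreshing its metadata.

    The in-place self-copy ("touch") updates the object's metadata
    timestamps, so time-based expiration policies still see the object
    as recently written.
    """
    try:
        s3c.copy_object(
            Bucket=bucket,
            Key=key,
            CopySource={"Bucket": bucket, "Key": key},
            MetadataDirective="COPY",
        )
        return True
    except s3c.exceptions.ClientError as e:
        # Anything other than "object not found" is a real error.
        if e.response["Error"]["Code"] not in ("404", "NoSuchKey"):
            raise
        return False
```

The push loop can then call this helper before `upload_fileobj` and `continue` past entries that already exist, which is exactly what the diff below does inline.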
1 parent c833438 commit b71426a

File tree

1 file changed: +16 −2 lines
src/ctl/push.py

Lines changed: 16 additions & 2 deletions
@@ -47,14 +47,28 @@ def push_data_s3(self, storage, platform_id):
 
         for entry in entries:
             i_total += 1
+            key = f"data/{storage}/{path}/{entry}"
+
+            try:
+                s3c.copy_object(
+                    Bucket="rpmrepo-storage",
+                    Key=key,
+                    CopySource={"Bucket": "rpmrepo-storage", "Key": key},
+                    MetadataDirective="COPY",
+                )
+                print(f"[{i_total}/{n_total}] '{key}' (exists, skipping)")
+                continue
+            except s3c.exceptions.ClientError as e:
+                if e.response["Error"]["Code"] != "404":
+                    raise
 
-            print(f"[{i_total}/{n_total}] 'data/{storage}/{path}/{entry}'")
+            print(f"[{i_total}/{n_total}] '{key}'")
 
             with open(os.path.join(level, entry), "rb") as filp:
                 s3c.upload_fileobj(
                     filp,
                     "rpmrepo-storage",
-                    f"data/{storage}/{path}/{entry}",
+                    key,
                 )
 
     def push_snapshot_s3(self, snapshot_id, snapshot_suffix):
