-
Notifications
You must be signed in to change notification settings - Fork 9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pass down eTag as part of accessCondition to SDK #6025
base: trunk
Are you sure you want to change the base?
Conversation
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As i can understand, this is just for flow where via fs.create()
, file is created and eventually closed. And, as part of this pr, we are not aiming for calling putblock with etag.
i have one optimization to save getmetadata call for rename dst etag.
if (eTag != null) { | ||
AccessCondition accessCondition = new AccessCondition(); | ||
accessCondition.setIfMatch(eTag); | ||
openOutputStream(blob, accessCondition).close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can get etag here and pass that till the point where we want to rename. No need of metadata call is required.
blob.getBlob().getProperties().getEtag();
will help. why?
blob
here is CloudBlockBlobWrapper. blob.getBlob()
is CloudBlockBlob.
Now, in CloudBlockBlob.commitBlockListImpl.preProcessResponse
, etag and LMT will be updated in the CloudBlockBlob obj.
@@ -1242,7 +1242,10 @@ public void setEncodedKey(String anEncodedKey) { | |||
* when the stream is closed. | |||
*/ | |||
private void restoreKey() throws IOException { | |||
store.rename(getEncodedKey(), getKey()); | |||
String key = getKey(); | |||
FileMetadata existingMetadata = store.retrieveMetadata(key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets not call this call as discussed in the other comment around how to get etag.
+
By the emptySrc file was created, till this moment some other process would have made change to that path and hence etag would change. so, now if we read etag here, we will get the wrong etag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want the latest eTag on the path, not the initial eTag. Hence we need to fetch the latest eTag on that path. Simply passing the initial eTag won't help here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did the eTag setting in the earlier iteration by passing down the initially fetched eTag and it results in 412 error. Hence an additional HEAD call would be needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand from the code, there shouldn't be any write on the original file and on the close of outputstream on the tmpFile, it rename it to original file.
Is there any flow which would be writing on the original file before the rename.
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
Can we also handle deletes as well? They exhibit the same problem as uploads. |
Hi Alkis, can you please confirm is this issue is still been seen ? |
Yes this has been observed in the fleet. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the new overloaded methods in which etag is introduced as a parameter. Lets, specify if they are srcEtag or dstEtag. Reason being, from reading that particular method, it becomes confusing and would need reference check.
Also, not all flows of rename flows in the etag. Copy / delete happens there as well. Is etag checks not required there?
Also, as a prevention for bad-code add in future, can unit tests be added.
Thanks.
|
||
@Override | ||
public void rename(String srcKey, String dstKey, String eTag) throws IOException { | ||
rename(srcKey, dstKey, false, null, true, eTag); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets define what etag is this. is this etag on src or dst.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -435,6 +456,24 @@ public void startCopyFromBlob(CloudBlobWrapper sourceBlob, BlobRequestOptions op | |||
null, dstAccessCondition, options, opContext); | |||
} | |||
|
|||
@Override | |||
public void startCopyFromBlob(CloudBlobWrapper sourceBlob, BlobRequestOptions options, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets define what etag is this. Rename of as dstEtag would be good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
overwriteDestination | ||
? null | ||
: AccessCondition.generateIfNotExistsCondition(); | ||
if (dstAccessCondition != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better we check if etag == null. Agree sdk checks that before populating header. But SDK can get change in future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checked in the calling flow already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checked in which flow? etag is a field in the method, and can be null. I am suggesting we do not call SDK's setifMatch() if etag with us is null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
? null | ||
: AccessCondition.generateIfNotExistsCondition(); | ||
if (dstAccessCondition != null) { | ||
dstAccessCondition.setIfMatch(eTag); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the object will have both if-none-match:* and if-match:ETAG.
We should not populate when overwrite condition is on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If-None-Match * is for the PutBlockList temp file flow and If-Match eTag is for the copy blob flow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean that overwrite will be false in copy flow, and overwrite be true in putblocklist temp file flow?
This is a public method exposed and should be agnostic to what calls it. Hence, better if we can prevent both conditions coming into accessCondition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added as an if condition
throws StorageException { | ||
AccessCondition accessCondition1 = getLeaseCondition(lease); | ||
if (accessCondition1 != null && eTag != null) { | ||
accessCondition1.setIfMatch(eTag); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if lease condition is there, should etag check be there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that should not make a difference right, but can be discussed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reason why i feel if we have a leaseId, it wont lead to consistency issues:
- Once a blob is leased by a process, only that process can make change to that blob, since it has the leaseId.
- If we see that we had to retry although the first call would be taken out of order and before it would be taken, some other process has created a blob on that path, now if server takes the first call to process, it should fail with: "There is currently no lease on the blob.", 412, DELETE; or if we say that other process create blob with lease, it will fail with: "The lease ID specified did not match the lease ID for the blob.", 412, DELETE.
public void startCopyFromBlob(CloudBlobWrapper sourceBlob, BlobRequestOptions options, | ||
OperationContext opContext, boolean overwriteDestination, String eTag) | ||
throws StorageException, URISyntaxException { | ||
if (!overwriteDestination && backingStore.exists(convertUriToDecodedString(uri))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we need an overriden impl?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed and made it a single method
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets add test for the flow changed. Better having automated test than manually testing the flow. What you say?
one change needed; other thing look fine
overwriteDestination | ||
? null | ||
: AccessCondition.generateIfNotExistsCondition(); | ||
if (!overwriteDestination && destEtag != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be (overwriteDestination && destEtag != null)
reason being, if !overwriteDestination, we have an accesssCondition with if-none-match:*
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, corrected it
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
if (eTagCheck && eTag != null) { | ||
AccessCondition accessCondition = new AccessCondition(); | ||
accessCondition.setIfMatch(eTag); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this logic is incorrect. This method is creating an empty link file but we are creating the empty file conditionally if it matches the eTag
of the original file.
The logic for create
should be the following (pseudocode):
eTag = null
if (overwrite) {
eTag = readETag(dst)
storeEmptyLinkFile(tmp)
...
commitTmpFileTo(tmp, dst, eTag) // should do IfMatch(eTag)
} else {
storeEmptyLinkFile(tmp)
...
commitTmpFileTo(tmp, dst, null) // should do IfNoneMatch("*")
}
and in all cases, inside storeEmptyLinkFile
we should use IfNoneMatch("*")
because we know we do not want to overwrite the tmp file and we guarantee that it is unique by putting a UUID in its name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The storeEmptyLinkFile call is made for the original file and not the temp file. Hence if it is case of file overwrite it will have if-match eTag condition else if-none-match *. The addition of if-none-match * condition for temp file is done in storeFile method in AzureNativeFileSystemStore class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In line 1895 the blob being referenced is for the original key
This is a draft change for how we can include the ETag as part of the access condition when creating a file. The storeEmptyLinkFile method receives the ETag as a parameter, which we have obtained through the store.retrieveMetadata call, and added it as a parameter in the FileMetadata object.
The storeEmptyLinkFile method is present in file named AzureNativeFileSystemStore.java
Subsequently, the storeEmptyLinkFile method passes down this ETag by setting it within the accessCondition when invoking the SDK method openOutputStream.