-
Notifications
You must be signed in to change notification settings - Fork 402
API: copy object will clone object metadata when possible #9500
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
📚 Documentation preview at https://pr-9500.docs-lakefs-preview.io/ (Updated: 9/18/2025, 12:21:17 PM - Commit: 1594d8e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
IMO the documentation should be clear what this does and clear that there is some risk that the caller should consider
Make a shallow copy of an object, copy metadata only. Limited to the same repository and branch. Experimental.
Co-authored-by: guy-har <[email protected]>
Co-authored-by: guy-har <[email protected]>
This reverts commit 5184f17.
This reverts commit 275dc9b.
This reverts commit 42c69c3.
1594d8e
to
facd253
Compare
@guy-har @itaiad200 re-implement the solution to enable physical copy as fallback to prefered clone. |
adding the underlying object time check in case of copy of copy.
we now check grace time using stat, so we get a different error from s3 when object not found
@guy-har @itaiad200 ready to review; had to fix some tests and added support to update metadata. |
} else { | ||
dstEntry.Metadata = srcEntry.Metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you remove this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't like this code.
already copied the src entry into dst entry.
update dst metadata only if replaceSrcMetadata was true.
pkg/catalog/catalog.go
Outdated
if err != nil { | ||
return nil, err | ||
} | ||
if !props.LastModified.IsZero() && time.Since(props.LastModified) > c.CloneGracePeriod { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When will props.LastModified.IsZero()==true
?
Shouldn't we fail for that too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My assumption was that no last modified property means that the underlying storage has no support, meaning that we can't use gc on them - like in memory block adapter.
assert copy_stat.mtime >= obj_stat.mtime | ||
assert copy_stat.size_bytes == obj_stat.size_bytes | ||
assert copy_stat.checksum == obj_stat.checksum | ||
# do not check physical_address, as it can be a clone |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like can be a clone
. The test should either always clone or always copy and we should assert on that
verifyResponseOK(t, copyResp, err) | ||
|
||
// Verify the creation path, date and physical address are different | ||
// Verify the creation path, date and physical address are the same |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Date should be different
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
date can be the same or greater like copy (no change)
pkg/api/controller_test.go
Outdated
}) | ||
|
||
t.Run("committed", func(t *testing.T) { | ||
t.Run("committed_clone", func(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't clone just for when the source is in staging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no - we said that even if we test, there can be a commit in parallel so you may clone a staged object. as long as the grace period is ok; it should be the same.
if errors.Is(err, block.ErrDataNotFound) || errors.Is(err, graveler.ErrNotFound) { | ||
_ = o.EncodeError(w, req, err, gatewayErrors.Codes.ToAPIErr(gatewayErrors.ErrNoSuchKey)) | ||
return | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this part of the PR? Not sure how this is related
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we clone, we stat the source object - head request returns different error than copy api. we need to align that when the block adapter fail to find the object we return the right error in the gateway.
Co-authored-by: itaiad200 <[email protected]>
Co-authored-by: itaiad200 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved. Still wondering about the branch limitation though
pkg/catalog/catalog.go
Outdated
if srcRepo.Name != destRepository { | ||
return nil, fmt.Errorf("%w: not on the same repository", graveler.ErrCannotClone) | ||
} | ||
if srcRef != destBranch { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment still stands
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the code to support clone between branches
pkg/catalog/catalog.go
Outdated
if srcRepo.Name != destRepository { | ||
return nil, fmt.Errorf("%w: not on the same repository", graveler.ErrCannotClone) | ||
} | ||
if srcRef != destBranch { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry I missed this one! will remove this restriction it really doesn't matter for gc/lakefs.
Create a shallow copy (more efficient) of an object when requested.
This operation is limited to objects within the same repository and branch, subject to a configurable grace time.
When copying an object, the code will verify that the object’s creation time and the storage’s last update time fall within the grace time limit. If they don't, we use the standard copy behavior.
Closes #9499