@nopcoder (Contributor) commented Sep 9, 2025

Create a shallow copy (more efficient) of an object when requested.

This operation is limited to objects within the same repository and branch, subject to a configurable grace time.
When copying an object, the code will verify that the object’s creation time and the storage’s last update time fall within the grace time limit. If they don't, we use the standard copy behavior.

Closes #9499

@nopcoder requested a review from guy-har September 9, 2025 12:23
@nopcoder self-assigned this Sep 9, 2025
@nopcoder added the area/API and include-changelog labels Sep 9, 2025

github-actions bot commented Sep 9, 2025

📚 Documentation preview at https://pr-9500.docs-lakefs-preview.io/

(Updated: 9/18/2025, 12:21:17 PM - Commit: 1594d8e)

@guy-har (Contributor) left a comment

LGTM
IMO the documentation should be clear about what this does, and clear that there is some risk the caller should consider.

@nopcoder changed the title from "API: Add support to copy object logical mode" to "API: copy object will clone object metadata when possible" Sep 18, 2025
@nopcoder (Contributor, Author)

@guy-har @itaiad200 Re-implemented the solution to use a physical copy as a fallback to the preferred clone.
Cloning is limited to the same repository and branch, with no custom user metadata.

@nopcoder marked this pull request as draft September 18, 2025 15:32
@nopcoder marked this pull request as ready for review September 25, 2025 06:20
@nopcoder (Contributor, Author)

@guy-har @itaiad200 Ready for review. I had to fix some tests and added support for updating metadata.

Comment on lines -2805 to -2806
} else {
dstEntry.Metadata = srcEntry.Metadata
Contributor

Why did you remove this?

Contributor Author

We don't like this code.
The src entry has already been copied into the dst entry, so
dst metadata is updated only if replaceSrcMetadata is true.
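
A minimal sketch of the behavior described in this reply, assuming a simplified `Entry` type. `newDstEntry` and its parameters are hypothetical; only the names `replaceSrcMetadata`, `srcEntry`/`dstEntry`, and `Metadata` come from the thread.

```go
package main

import "fmt"

// Entry is a simplified stand-in for a catalog entry.
type Entry struct {
	Path     string
	Metadata map[string]string
}

// newDstEntry starts from a copy of src, so the source metadata is already
// present, and overrides the metadata only when a replacement was requested.
func newDstEntry(src Entry, dstPath string, replaceSrcMetadata bool, metadata map[string]string) Entry {
	dst := src // struct copy; the Metadata map itself is shared with src
	dst.Path = dstPath
	if replaceSrcMetadata {
		dst.Metadata = metadata
	}
	return dst
}

func main() {
	src := Entry{Path: "a.txt", Metadata: map[string]string{"owner": "alice"}}

	kept := newDstEntry(src, "b.txt", false, nil)
	replaced := newDstEntry(src, "c.txt", true, map[string]string{"owner": "bob"})

	fmt.Println(kept.Metadata["owner"], replaced.Metadata["owner"])
}
```

Because the struct copy already carries the metadata over, an explicit `else` branch that reassigns it would be redundant.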

if err != nil {
	return nil, err
}
if !props.LastModified.IsZero() && time.Since(props.LastModified) > c.CloneGracePeriod {
Contributor

When will props.LastModified.IsZero()==true?
Shouldn't we fail for that too?

Contributor Author

My assumption was that a missing last-modified property means the underlying storage does not support it, which in turn means we can't run GC on those objects, as with the in-memory block adapter.

assert copy_stat.mtime >= obj_stat.mtime
assert copy_stat.size_bytes == obj_stat.size_bytes
assert copy_stat.checksum == obj_stat.checksum
# do not check physical_address, as it can be a clone
Contributor

I don't like "can be a clone". The test should either always clone or always copy, and we should assert on that.

verifyResponseOK(t, copyResp, err)

- // Verify the creation path, date and physical address are different
+ // Verify the creation path, date and physical address are the same
Contributor

Date should be different

Contributor Author

The date can be the same or later, as with copy (no change).

})

- t.Run("committed", func(t *testing.T) {
+ t.Run("committed_clone", func(t *testing.T) {
Contributor

Isn't clone just for when the source is in staging?

Contributor Author

No. We said that even if we test for it, a commit can happen in parallel, so you may still clone a staged object. As long as the grace period is OK, it should be the same.

Comment on lines +119 to +122
if errors.Is(err, block.ErrDataNotFound) || errors.Is(err, graveler.ErrNotFound) {
	_ = o.EncodeError(w, req, err, gatewayErrors.Codes.ToAPIErr(gatewayErrors.ErrNoSuchKey))
	return
}
Contributor

Why is this part of the PR? Not sure how this is related

Contributor Author

When we clone, we stat the source object, and a HEAD request returns a different error than the copy API. We need to align this so that when the block adapter fails to find the object, the gateway returns the right error.

@nopcoder requested a review from itaiad200 September 26, 2025 08:39

@itaiad200 (Contributor) left a comment

Approved. I'm still wondering about the branch limitation, though.

if srcRepo.Name != destRepository {
	return nil, fmt.Errorf("%w: not on the same repository", graveler.ErrCannotClone)
}
if srcRef != destBranch {
Contributor

Comment still stands

@nopcoder (Contributor, Author) left a comment

Updated the code to support clone between branches

if srcRepo.Name != destRepository {
	return nil, fmt.Errorf("%w: not on the same repository", graveler.ErrCannotClone)
}
if srcRef != destBranch {
Contributor Author

Sorry, I missed this one! I will remove this restriction; it really doesn't matter for GC/lakeFS.

@nopcoder merged commit 3439971 into master Sep 27, 2025
41 checks passed
@nopcoder deleted the task/catalog-clone branch September 27, 2025 16:56

Development

Successfully merging this pull request may close these issues.

API: Add support to copy object logical mode