refactor: v2 linked clone #500
Conversation
ca3e8b0 to
4376b2c
Compare
952c40d to
2a245b6
Compare
|
This pull request is now in conflict. Could you fix it @shuo-wu? 🙏 |
9a934db to
e40549c
Compare
|
cc @davidcheng0922 @COLDTURNIP for review |
| func (r *Replica) recoverCloneReplicaInfo() { | ||
| r.isCloneReplica = false | ||
| r.cloneSourceReplicaName = "" | ||
| r.cloneEntrypointLvolName = "" | ||
|
|
||
| if len(r.ActiveChain) == 0 || r.ActiveChain[0] == nil || !IsCloneEntrypointLvol(r.ActiveChain[0].Name) { | ||
| return | ||
| } |
There was a problem hiding this comment.
I'm not sure if my understanding the restart-repair flow correctly:
-
syncCloneReplicaInfo() wants to repair the case where the clone root parent points to the source snapshot directly instead of the entrypoint.
-
After restart, constructActiveChainFromSnapshotLvolMap() leaves the chain base nil if the entrypoint lvol is already missing.
-
recoverCloneReplicaInfo() only recognizes a linked-clone replica when ActiveChain[0] is still the entrypoint.
-
That means isCloneReplica is no longer recovered in this case, and
syncCloneReplicaInfo() returns early before the repair logic runs.
If the entrypoint is already gone at restart time, will this state still be recoverable, or will the clone metadata be lost permanently?
There was a problem hiding this comment.
I see. Do you mean the case below?
- A clone replica somehow lost its ep lvol, and its root snap lvol was directly pointing to the src replica snap lvol
- A restart happened, and the spdk server tried to reconstruct the replica cache
- The new replica cache executed
constructfirst. And inconstruct(),recoverCloneReplicaInfoconsidered this replica as non-clone replica, because the parant of the root snap is src replica snap rather than ep lvol thenr.ActiveChain[0]is nil - The new replica cache then executed
syncCloneReplicaInfo, which returned in the beginning due to the checkif !r.isCloneReplica {, no chance to reach the repair logic at all!
This case is a little bit complicated, and your concern is fair.
I am not sure whether it's really worth repairing the clone replica for these cases. Do you think it's better to blindly mark all kinds of unexpected dst replicas as ERR?
cc @derekbit
There was a problem hiding this comment.
Fair, maybe it is too little chance to happen, maybe not worth to fix now
There was a problem hiding this comment.
We can create an improvement ticket for future improvement.
derekbit
left a comment
There was a problem hiding this comment.
One question: should we rename linked-clone to fast-clone?
e40549c to
903ee43
Compare
derekbit
left a comment
There was a problem hiding this comment.
LGTM.
The PR longhorn/types is not submitted.
Introduce naming conventions for clone entrypoint lvols. Entrypoints sit between a source replica's snapshot and clone replica heads. Add constants ReplicaCloneEntrypointLvolInfix and ReplicaCloneEntrypointTmpHeadSuffix, plus helper functions for constructing, parsing, and identifying entrypoint lvol names. Longhorn 12552 Signed-off-by: Shuo Wu <shuo.wu@suse.com>
Add CloneEntrypointInfo struct to track entrypoint lvol metadata and its associated clone replicas. Extend the Replica struct with fields for both source replicas (cloneEntrypointMap) and clone replicas (isCloneReplica, cloneSourceReplicaName, cloneEntrypointLvolName). Longhorn 12552 Signed-off-by: Shuo Wu <shuo.wu@suse.com>
Filter out clone entrypoint lvols and their tmp heads from the snapshot lvol map construction so they do not interfere with the replica's snapshot chain. Update constructActiveChainFromSnapshotLvolMap comments to reflect that clone entrypoints are handled externally. Longhorn 12552 Signed-off-by: Shuo Wu <shuo.wu@suse.com>
…struct Add recoverCloneEntrypointInfo to scan the bdev lvol map for entrypoint lvols belonging to this replica, populating cloneEntrypointMap from SPDK state. Resolve clone replica names via GetCloneReplicaNameFromEntrypointChildLvol. Add recoverCloneReplicaInfo to detect linked clones by checking whether the chain root has a clone entrypoint as parent. Both are called during construct() for recovery after restart. Longhorn 12552 Signed-off-by: Shuo Wu <shuo.wu@suse.com>
Rewrite snapshotLinkedCloneSrcStart to create clone entrypoint lvols between the source snapshot and clone heads, enabling multiple linked clones per snapshot with clean lifecycle management. Add createCloneEntrypoint (3-step: clone, snapshot, delete leftover). Record clone source info in SnapshotCloneDstStart for linked clones. Longhorn 12552 Signed-off-by: Shuo Wu <shuo.wu@suse.com>
Forbid replica deletion (with cleanup) while any clone entrypoint still has active clone replicas pointing to it. The guard runs early in Delete() before any SPDK cleanup to avoid side effects on a rejected delete. Forbid snapshot deletion while a valid clone entrypoint with active clone replicas is attached to that snapshot. Longhorn 12552 Signed-off-by: Shuo Wu <shuo.wu@suse.com>
Add syncCloneEntrypoints called from Sync() to refresh entrypoint state from SPDK, auto-cleanup entrypoints with no children, discover new entrypoints, and remove orphaned tmp-head lvols. Longhorn 12552 Signed-off-by: Shuo Wu <shuo.wu@suse.com>
Add syncCloneReplicaInfo to verify the clone replica's chain root parent matches the expected entrypoint during Sync. Handles three cases: parent matches (no-op), entrypoint removed and parent points to the source snapshot (recreate), or lineage corrupted (mark error). repairCloneEntrypoint delegates to the source replica via gRPC (CloneSrcStart/SrcFinish) so the source registers the entrypoint in its cloneEntrypointMap. Longhorn 12552 Signed-off-by: Shuo Wu <shuo.wu@suse.com>
…stry handling The clone entrypoint serves the same role as a backing image: a read-only chain base. Place it at ActiveChain[0] to unify chain semantics. - Include entrypoint lvols in replicaLvolFilter - Extend constructActiveChainFromSnapshotLvolMap to load entrypoints - Guard BackingImage assignment for actual backing images only - Lock ActiveChain[0] directly in linkHeadWithParent - Update repairCloneEntrypoint to set ActiveChain[0] after reparenting Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Shuo Wu <shuo.wu@suse.com>
ActiveChain[0] (backing image or clone entrypoint) shared its Children map across replicas, causing cross-replica mutation when linkHeadWithParent or removeLvolFromSnapshotLvolMapWithoutLock modified the map. - Replace additive Children insertion with full map replacement in constructActiveChainFromSnapshotLvolMap and repairCloneEntrypoint, keeping only the current replica's root lvol - Copy BackingImage.Snapshot per-replica in prepareHead with a fresh Children map instead of sharing the pointer Longhorn 12552 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Shuo Wu <shuo.wu@suse.com>
58ea611 to
2041fd2
Compare
|
Hello @shuo-wu |
Add clone metadata fields to the Replica protobuf message and populate them in ServiceReplicaToProtoReplica. New proto fields: - is_clone_replica (bool) - clone_source_replica_name (string) - clone_entrypoint_lvol_name (string) - clone_entrypoint_map (map<string, int32>) Longhorn 12552 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Shuo Wu <shuo.wu@suse.com>
Updates generated gRPC stubs (spdkrpc, imrpc) to include the new DstReplicaSrcReplicaPairMap field in EngineSnapshotCloneRequest. Updates go.mod/go.sum to reference the new types pseudo-version. Longhorn 12552 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Shuo Wu <shuo.wu@suse.com>
…rcReplicaPairMap Extend SnapshotClone to clone all N dst replicas simultaneously for linked-clone volumes, instead of requiring exactly 1 replica and relying on N-1 DST rebuilds afterward. Key design points: - Add isCloning flag: set for the duration of SnapshotClone (linked-clone mode only). Guards ReplicaAdd, snapshot operations, backup, and expansion from running concurrently with the clone. ValidateAndUpdate is also skipped while cloning to avoid false divergence errors. - snapshotCloneLinkedN: dispatches one goroutine per dst replica under the engine lock. Each goroutine calls SnapshotCloneDst (BdevLvolSetParent — a near-instantaneous metadata op) on its paired src replica. Results are aggregated; the engine lock is held throughout via wg.Wait(). - DstReplicaSrcReplicaPairMap: the manager (which already ran the scheduler) passes an explicit dst→src replica name map so the engine does not need to auto-detect co-location. Dst replicas absent from the map are marked ModeERR so the manager can schedule a rebuild for them later; the clone continues for all paired replicas. The entire operation fails only if no dst replica can be paired at all. - Backward compatibility: an empty DstReplicaSrcReplicaPairMap falls back to the original 1-replica deep-copy/linked-clone path so old managers continue to work. Longhorn 12552 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Shuo Wu <shuo.wu@suse.com>
…cycle Integration tests against a real SPDK target: - Linked-clone volume read/write and independent snapshots - Source replica/snapshot deletion blocked by active clone entrypoints - Recovery of entrypoint and clone info after restart - syncCloneEntrypoints auto-cleanup of childless entrypoints and orphaned tmp-head lvols (including correct SPDK state simulation via BdevLvolSnapshot to create a read-only ep + undeleted tmp-head) - Repair of broken clone lineage via syncCloneReplicaInfo - N-replica simultaneous linked-clone via DstReplicaSrcReplicaPairMap Unit tests (no SPDK required): - syncWithBdevLvolMap dispatch to correct sync path - syncCloneReplicaInfo cases 1/2/3 covering entrypoint lookup logic - Full Sync() execution paths (clone replica detection, error paths) Also refactor Sync() to delegate to syncWithBdevLvolMap for testability; spdkClient=nil skips validateAndUpdate so unit tests can exercise sync logic without a live SPDK connection. Longhorn 12552 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Shuo Wu <shuo.wu@suse.com>
2041fd2 to
4c87559
Compare
|
I just added a summary in the longhorn-manager PR longhorn/longhorn-manager#4732. I think that is pretty helpful for the whole feature's review. If you are interested, PTAL! |
|
@davidcheng0922 Do you have any concern? If no, it looks good to merge. |
No more, thank you. It looks pretty solid |
|
@mergify backport v1.12.x |
✅ Backports have been createdDetails
|
Which issue(s) this PR fixes:
Issue longhorn/longhorn#12552
What this PR does / why we need it:
Special notes for your reviewer:
We introduced a special kind of empty snapshot lvol called
entrypoint lvolfor src replicas. Then, all clone replicas use an entrypoint lvol as theirActiveChain[0], which is like a backing image. After that, clone replicas should be both readable and writable like normal replicas.Additional documentation or context
Notice Claude Opus 4.6 generates the codes based on the following prompt/plan. And the plan is mainly for simple reference only. Because there are several rounds of improvement after that.