Skip to content

refactor: v2 linked clone #500

Merged
derekbit merged 14 commits into
longhorn:masterfrom
shuo-wu:refactor/linked-clone-entrypoint
May 19, 2026
Merged

refactor: v2 linked clone #500
derekbit merged 14 commits into
longhorn:masterfrom
shuo-wu:refactor/linked-clone-entrypoint

Conversation

@shuo-wu
Copy link
Copy Markdown
Contributor

@shuo-wu shuo-wu commented Apr 30, 2026

Which issue(s) this PR fixes:

Issue longhorn/longhorn#12552

What this PR does / why we need it:

Special notes for your reviewer:

We introduced a special kind of empty snapshot lvol called entrypoint lvol for src replicas. Then, all clone replicas use an entrypoint lvol as their ActiveChain[0], which is like a backing image. After that, clone replicas should be both readable and writable like normal replicas.

Additional documentation or context

Notice Claude Opus 4.6 generates the codes based on the following prompt/plan. And the plan is mainly for simple reference only. Because there are several rounds of improvement after that.

First, refactor the existing linked-clone feature for repo `longhorn-spdk-engine` based on the descriptions below:
- The source replica:
  1. will prepare an entrypoint lvol (which is actually an empty snapshot lvol) from the requested snapshot
  2. record all entrypoints and all clone replicas pointing to it. Even for a src replica detected from an existing disk after restart, it should recover these info as well.
  3. avoid or ignore all entrypoints when checking or constructing its snapshot map 
  4. automatically clean up the entrypoint when there is no clone replica pointing to it
  5. forbid the replica deletion when cleanup is true, while there are still clone replicas pointing to it
  6. forbid the snapshot deletion when there is still a valid entrypoint pointing to it
  7. sync func should verify (and probably repair) all entrypoint lvols for the src replica, and linked-clone source information for clone replicas. 
- The clone replica itself:
  1. grows a new head lvol from the entrypoint lvol.
  2. record its entrypoint and the source replica. Even for the replica detected from an existing disk after restart, it should recover these info as well.
  3. sync func should verify linked-clone source information for clone replicas. And for case that the clone replica's root parent somehow becomes `the source snapshot directly`, the sync func should repair it.

@shuo-wu shuo-wu requested a review from a team April 30, 2026 03:31
@shuo-wu shuo-wu self-assigned this Apr 30, 2026
@shuo-wu shuo-wu force-pushed the refactor/linked-clone-entrypoint branch 3 times, most recently from ca3e8b0 to 4376b2c Compare May 1, 2026 01:06
@shuo-wu shuo-wu force-pushed the refactor/linked-clone-entrypoint branch 2 times, most recently from 952c40d to 2a245b6 Compare May 14, 2026 22:48
@mergify
Copy link
Copy Markdown

mergify Bot commented May 14, 2026

This pull request is now in conflict. Could you fix it @shuo-wu? 🙏

@shuo-wu shuo-wu force-pushed the refactor/linked-clone-entrypoint branch 8 times, most recently from 9a934db to e40549c Compare May 14, 2026 23:27
@derekbit
Copy link
Copy Markdown
Member

cc @davidcheng0922 @COLDTURNIP for review

Comment thread pkg/spdk/replica.go
Comment thread pkg/spdk/replica.go
Comment thread pkg/spdk/engine.go
Comment thread pkg/spdk/replica.go
Comment on lines +545 to +552
func (r *Replica) recoverCloneReplicaInfo() {
r.isCloneReplica = false
r.cloneSourceReplicaName = ""
r.cloneEntrypointLvolName = ""

if len(r.ActiveChain) == 0 || r.ActiveChain[0] == nil || !IsCloneEntrypointLvol(r.ActiveChain[0].Name) {
return
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if my understanding the restart-repair flow correctly:

  1. syncCloneReplicaInfo() wants to repair the case where the clone root parent points to the source snapshot directly instead of the entrypoint.

  2. After restart, constructActiveChainFromSnapshotLvolMap() leaves the chain base nil if the entrypoint lvol is already missing.

  3. recoverCloneReplicaInfo() only recognizes a linked-clone replica when ActiveChain[0] is still the entrypoint.

  4. That means isCloneReplica is no longer recovered in this case, and
    syncCloneReplicaInfo() returns early before the repair logic runs.

If the entrypoint is already gone at restart time, will this state still be recoverable, or will the clone metadata be lost permanently?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Do you mean the case below?

  1. A clone replica somehow lost its ep lvol, and its root snap lvol was directly pointing to the src replica snap lvol
  2. A restart happened, and the spdk server tried to reconstruct the replica cache
  3. The new replica cache executed construct first. And in construct(), recoverCloneReplicaInfo considered this replica as non-clone replica, because the parant of the root snap is src replica snap rather than ep lvol then r.ActiveChain[0] is nil
  4. The new replica cache then executed syncCloneReplicaInfo, which returned in the beginning due to the check if !r.isCloneReplica {, no chance to reach the repair logic at all!

This case is a little bit complicated, and your concern is fair.
I am not sure whether it's really worth repairing the clone replica for these cases. Do you think it's better to blindly mark all kinds of unexpected dst replicas as ERR?

cc @derekbit

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair, maybe it is too little chance to happen, maybe not worth to fix now

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can create an improvement ticket for future improvement.

Copy link
Copy Markdown
Member

@derekbit derekbit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question: should we rename linked-clone to fast-clone?

Comment thread pkg/spdk/engine.go
Comment thread pkg/spdk/engine.go Outdated
Comment thread pkg/spdk/engine.go
Comment thread pkg/spdk/engine.go
Comment thread pkg/spdk/engine.go Outdated
Comment thread pkg/spdk/replica.go
@shuo-wu shuo-wu force-pushed the refactor/linked-clone-entrypoint branch from e40549c to 903ee43 Compare May 18, 2026 03:53
Copy link
Copy Markdown
Member

@derekbit derekbit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

The PR longhorn/types is not submitted.

shuo-wu added 3 commits May 18, 2026 14:33
Introduce naming conventions for clone entrypoint lvols. Entrypoints
sit between a source replica's snapshot and clone replica heads.

Add constants ReplicaCloneEntrypointLvolInfix and
ReplicaCloneEntrypointTmpHeadSuffix, plus helper functions for
constructing, parsing, and identifying entrypoint lvol names.

Longhorn 12552

Signed-off-by: Shuo Wu <shuo.wu@suse.com>
Add CloneEntrypointInfo struct to track entrypoint lvol metadata and
its associated clone replicas. Extend the Replica struct with fields
for both source replicas (cloneEntrypointMap) and clone replicas
(isCloneReplica, cloneSourceReplicaName, cloneEntrypointLvolName).

Longhorn 12552

Signed-off-by: Shuo Wu <shuo.wu@suse.com>
Filter out clone entrypoint lvols and their tmp heads from the snapshot
lvol map construction so they do not interfere with the replica's
snapshot chain. Update constructActiveChainFromSnapshotLvolMap comments
to reflect that clone entrypoints are handled externally.

Longhorn 12552

Signed-off-by: Shuo Wu <shuo.wu@suse.com>
shuo-wu and others added 7 commits May 18, 2026 14:33
…struct

Add recoverCloneEntrypointInfo to scan the bdev lvol map for entrypoint
lvols belonging to this replica, populating cloneEntrypointMap from
SPDK state. Resolve clone replica names via
GetCloneReplicaNameFromEntrypointChildLvol.

Add recoverCloneReplicaInfo to detect linked clones by checking
whether the chain root has a clone entrypoint as parent. Both are
called during construct() for recovery after restart.

Longhorn 12552

Signed-off-by: Shuo Wu <shuo.wu@suse.com>
Rewrite snapshotLinkedCloneSrcStart to create clone entrypoint lvols
between the source snapshot and clone heads, enabling multiple linked
clones per snapshot with clean lifecycle management.

Add createCloneEntrypoint (3-step: clone, snapshot, delete leftover).
Record clone source info in SnapshotCloneDstStart for linked clones.

Longhorn 12552

Signed-off-by: Shuo Wu <shuo.wu@suse.com>
Forbid replica deletion (with cleanup) while any clone entrypoint
still has active clone replicas pointing to it. The guard runs early
in Delete() before any SPDK cleanup to avoid side effects on a
rejected delete.

Forbid snapshot deletion while a valid clone entrypoint with active
clone replicas is attached to that snapshot.

Longhorn 12552

Signed-off-by: Shuo Wu <shuo.wu@suse.com>
Add syncCloneEntrypoints called from Sync() to refresh entrypoint
state from SPDK, auto-cleanup entrypoints with no children, discover
new entrypoints, and remove orphaned tmp-head lvols.

Longhorn 12552

Signed-off-by: Shuo Wu <shuo.wu@suse.com>
Add syncCloneReplicaInfo to verify the clone replica's chain root
parent matches the expected entrypoint during Sync. Handles three
cases: parent matches (no-op), entrypoint removed and parent points
to the source snapshot (recreate), or lineage corrupted (mark error).

repairCloneEntrypoint delegates to the source replica via gRPC
(CloneSrcStart/SrcFinish) so the source registers the entrypoint in
its cloneEntrypointMap.

Longhorn 12552

Signed-off-by: Shuo Wu <shuo.wu@suse.com>
…stry handling

The clone entrypoint serves the same role as a backing image: a
read-only chain base. Place it at ActiveChain[0] to unify chain
semantics.

- Include entrypoint lvols in replicaLvolFilter
- Extend constructActiveChainFromSnapshotLvolMap to load entrypoints
- Guard BackingImage assignment for actual backing images only
- Lock ActiveChain[0] directly in linkHeadWithParent
- Update repairCloneEntrypoint to set ActiveChain[0] after reparenting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Shuo Wu <shuo.wu@suse.com>
ActiveChain[0] (backing image or clone entrypoint) shared its Children
map across replicas, causing cross-replica mutation when
linkHeadWithParent or removeLvolFromSnapshotLvolMapWithoutLock modified
the map.

- Replace additive Children insertion with full map replacement in
  constructActiveChainFromSnapshotLvolMap and repairCloneEntrypoint,
  keeping only the current replica's root lvol
- Copy BackingImage.Snapshot per-replica in prepareHead with a fresh
  Children map instead of sharing the pointer

Longhorn 12552

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Shuo Wu <shuo.wu@suse.com>
@shuo-wu shuo-wu force-pushed the refactor/linked-clone-entrypoint branch 4 times, most recently from 58ea611 to 2041fd2 Compare May 18, 2026 22:15
@derekbit
Copy link
Copy Markdown
Member

Hello @shuo-wu
longhorn/types PR is merged.

shuo-wu and others added 4 commits May 18, 2026 17:47
Add clone metadata fields to the Replica protobuf message and populate
them in ServiceReplicaToProtoReplica.

New proto fields:
- is_clone_replica (bool)
- clone_source_replica_name (string)
- clone_entrypoint_lvol_name (string)
- clone_entrypoint_map (map<string, int32>)

Longhorn 12552

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Shuo Wu <shuo.wu@suse.com>
Updates generated gRPC stubs (spdkrpc, imrpc) to include the new
DstReplicaSrcReplicaPairMap field in EngineSnapshotCloneRequest.
Updates go.mod/go.sum to reference the new types pseudo-version.

Longhorn 12552

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Shuo Wu <shuo.wu@suse.com>
…rcReplicaPairMap

Extend SnapshotClone to clone all N dst replicas simultaneously for
linked-clone volumes, instead of requiring exactly 1 replica and relying
on N-1 DST rebuilds afterward.

Key design points:

- Add isCloning flag: set for the duration of SnapshotClone (linked-clone
  mode only). Guards ReplicaAdd, snapshot operations, backup, and
  expansion from running concurrently with the clone. ValidateAndUpdate
  is also skipped while cloning to avoid false divergence errors.

- snapshotCloneLinkedN: dispatches one goroutine per dst replica under
  the engine lock. Each goroutine calls SnapshotCloneDst (BdevLvolSetParent
  — a near-instantaneous metadata op) on its paired src replica. Results
  are aggregated; the engine lock is held throughout via wg.Wait().

- DstReplicaSrcReplicaPairMap: the manager (which already ran the
  scheduler) passes an explicit dst→src replica name map so the engine
  does not need to auto-detect co-location. Dst replicas absent from the
  map are marked ModeERR so the manager can schedule a rebuild for them
  later; the clone continues for all paired replicas. The entire operation
  fails only if no dst replica can be paired at all.

- Backward compatibility: an empty DstReplicaSrcReplicaPairMap falls back
  to the original 1-replica deep-copy/linked-clone path so old managers
  continue to work.

Longhorn 12552

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Shuo Wu <shuo.wu@suse.com>
…cycle

Integration tests against a real SPDK target:
- Linked-clone volume read/write and independent snapshots
- Source replica/snapshot deletion blocked by active clone entrypoints
- Recovery of entrypoint and clone info after restart
- syncCloneEntrypoints auto-cleanup of childless entrypoints and
  orphaned tmp-head lvols (including correct SPDK state simulation
  via BdevLvolSnapshot to create a read-only ep + undeleted tmp-head)
- Repair of broken clone lineage via syncCloneReplicaInfo
- N-replica simultaneous linked-clone via DstReplicaSrcReplicaPairMap

Unit tests (no SPDK required):
- syncWithBdevLvolMap dispatch to correct sync path
- syncCloneReplicaInfo cases 1/2/3 covering entrypoint lookup logic
- Full Sync() execution paths (clone replica detection, error paths)

Also refactor Sync() to delegate to syncWithBdevLvolMap for testability;
spdkClient=nil skips validateAndUpdate so unit tests can exercise sync
logic without a live SPDK connection.

Longhorn 12552

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Shuo Wu <shuo.wu@suse.com>
@shuo-wu shuo-wu force-pushed the refactor/linked-clone-entrypoint branch from 2041fd2 to 4c87559 Compare May 19, 2026 00:48
@shuo-wu shuo-wu marked this pull request as ready for review May 19, 2026 01:19
@shuo-wu
Copy link
Copy Markdown
Contributor Author

shuo-wu commented May 19, 2026

I just added a summary in the longhorn-manager PR longhorn/longhorn-manager#4732. I think that is pretty helpful for the whole feature's review. If you are interested, PTAL!

@derekbit
Copy link
Copy Markdown
Member

@davidcheng0922 Do you have any concern? If no, it looks good to merge.

@davidcheng0922
Copy link
Copy Markdown
Contributor

@davidcheng0922 Do you have any concern? If no, it looks good to merge.

No more, thank you. It looks pretty solid

@derekbit derekbit merged commit 0325a68 into longhorn:master May 19, 2026
8 of 9 checks passed
@derekbit
Copy link
Copy Markdown
Member

@mergify backport v1.12.x

@mergify
Copy link
Copy Markdown

mergify Bot commented May 19, 2026

backport v1.12.x

✅ Backports have been created

Details

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants