feat(engine, replica): v2 volume expand #8022 #386
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #386 +/- ##
========================================
- Coverage 0.77% 0.67% -0.10%
========================================
Files 24 24
Lines 9866 11273 +1407
========================================
Hits 76 76
- Misses 9783 11190 +1407
Partials 7 7
// partial success
// for the fail one, we set it to ERR
aggregatedErr := map[string]string{}
for replicaName, err := range failedReplica {
	e.ReplicaStatusMap[replicaName].Mode = types.ModeERR
	aggregatedErr[replicaName] = err.Error()
}

e.log.WithFields(logrus.Fields{
	"engineName": e.Name,
	"volumeName": e.VolumeName,
	"failedReplicas": aggregatedErr,
}).Error("Some replicas failed to expand and have been marked as ERR")
Good idea to mark the replicas that failed to expand as failed.
Quick question: is it possible to revert the replica expansion?
Yes, the go-spdk-helper bdev-lvol resize command also supports shrinking.
But it would be dangerous: I think it directly removes the blobs without any data protection.
During the expansion we suspend I/O, so I think a revert would be OK, but I don't think we have to revert the size; instead we keep only the replicas that expanded successfully.
If a revert is needed for other RPC calls, we would probably need to take a snapshot first, which would make the revert more complicated.
it will be dangerous
Why is the operation dangerous?
By "dangerous" I mean that if we allow shrinking in some cases, I'm afraid the data will be erased and the filesystem will be broken.
If we can make sure the data is not in use, it might be OK.
02a8b79 to ac856a7
d5ab71e to 89173b4
Blocker: ublk expansion process
For ublk,
Interestingly, if I manually execute the commands in the same order
Tried methods:
In brief, my guess,
Currently, because ublk is not the priority, I will stop here and do more research in the future.
89173b4 to 159083a
159083a to e3152e4
This pull request is now in conflict. Could you fix it @davidcheng0922? 🙏
f0b8910 to dcf99a0
// Suspend IO if frontend is active
if e.Frontend == types.FrontendSPDKTCPBlockdev && e.Endpoint != "" {
	if err := e.initiator.Suspend(false, false); err != nil {
Just want to double-check: this function suspends the dm device, right?
Yes, it actually calls dmsetup suspend with noflush=false and nolockfs=false, that is, without the --noflush and --nolockfs options.
0dbf28f to 9d8a90d
8970b27 to 5736962
98f0ae2 to 7ca6e43
func (c *SPDKClient) ReplicaExpand(name string, size uint64) error {
	if name == "" {
		return fmt.Errorf("failed to delete SPDK replica: missing required parameter")
Suggested change
- return fmt.Errorf("failed to delete SPDK replica: missing required parameter")
+ return fmt.Errorf("failed to expand replica: missing required parameter")
7ca6e43 to 6b71171
6b71171 to 4206442
Signed-off-by: David Cheng <davidcheng0922@gmail.com>
- disconnect nvme target & stop expose bdev before deleting raid
Signed-off-by: David Cheng <davidcheng0922@gmail.com>
- Replace backoff with retry.RetryOnConflict for update retry
- Use handleNvmeTcpFrontend instead of reconnectNvmeTcpFrontend
- Remove unnecessary check in replica expand
- Refactor finishExpansion for clarity and separation of concerns
Signed-off-by: David Cheng <davidcheng0922@gmail.com>
…and polish log
Signed-off-by: David Cheng <davidcheng0922@gmail.com>
4206442 to f0e90a9
@davidcheng0922 Thank you. You can update the longhorn-instance-manager PR.
Which issue(s) this PR fixes:
longhorn/longhorn#8022
What this PR does / why we need it:
Support v2 volume expansion. Only the NVMe frontend is supported.
Special notes for your reviewer:
Additional documentation or context
v2-expansion.webm
v2-expansion-with-live-IO.webm