doc/content/xapi/storage/sxm/index.md
- [Design](#design)
- [SMAPIv1 migration](#smapiv1-migration)
  - [Preparation](#preparation)
  - [Establishing mirror](#establishing-mirror)
  - [Mirror](#mirror)
  - [Snapshot](#snapshot)
  - [Copy and compose](#copy-and-compose)
  - [Finish](#finish)
- [SMAPIv3 migration](#smapiv3-migration)
  - [Preparation](#preparation-1)
  - [Establishing mirror](#establishing-mirror-1)
  - [Limitations](#limitations)
  - [Finish](#finish-1)
- [Error Handling](#error-handling)
  - [Preparation (SMAPIv1 and SMAPIv3)](#preparation-smapiv1-and-smapiv3)
  - [Snapshot and mirror failure (SMAPIv1)](#snapshot-and-mirror-failure-smapiv1)
## SMAPIv3 migration

This section covers the mechanism of migrations *from* SRs using SMAPIv3 (to
SMAPIv1 or SMAPIv3). Although the core ideas are the same, SMAPIv3 has a rather
different mechanism for mirroring: 1. it no longer requires xapi to take a
snapshot of the VDI, since the mirror itself takes care of replicating the
existing data to the destination; 2. there is no longer any fd passing for
connection establishment; instead, proxies are used for connection setup.

### Preparation

The preparation work for SMAPIv3 is greatly simplified by the fact that the mirror
at the storage layer will copy the existing data in the VDI to the destination.
This means that a snapshot of the source VDI is no longer required, so we are left
with only one thing:

1. Create a VDI used for mirroring the data of the source VDI

For this reason, the implementation logic for SMAPIv3 preparation is also shorter,
as the complexity is now handled by the storage layer, which is where it is supposed
to be handled.

### Establishing mirror

The other significant difference is that the storage backend for SMAPIv3 `qemu-dp`
SRs no longer accepts fds, so xapi needs to proxy the data between the nbd client
and the nbd server.

SMAPIv3 provides the `Data.mirror uri domain remote` call, which takes three parameters:
`uri` for accessing the local disk, `domain` for the domain slice on which mirroring
should happen, and, most importantly for this design, a `remote` url which represents
the remote nbd server to which the blocks of data can be sent.
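
The shape of this call can be modelled as follows. This is an illustrative sketch, not the actual xapi-storage interface; the field values are hypothetical placeholders:

```python
# Illustrative model of the three parameters taken by SMAPIv3's Data.mirror.
# This is NOT the real interface definition, just a sketch of its shape.
from dataclasses import dataclass

@dataclass
class MirrorRequest:
    uri: str     # how qemu-dp accesses the local disk being mirrored
    domain: str  # the domain slice on which mirroring should happen
    remote: str  # url of the remote nbd server receiving the blocks

# Hypothetical example values:
req = MirrorRequest(
    uri="qcow2+file:///dev/sm/vdi-example",          # placeholder uri
    domain="0",                                       # placeholder domain slice
    remote="nbd+unix:///export1?socket=/var/run/nbdproxy/export/0",
)
print(req.remote)
```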

This function, when called by xapi and forwarded to the storage layer's qemu-dp
nbd client, will initiate an nbd connection to the nbd server pointed to by `remote`.
This works fine when the storage migration happens entirely within a local host,
where qemu-dp's nbd client and nbd server can communicate over unix domain sockets.
However, it does not work for inter-host migrations, as qemu-dp's nbd server is not
exposed publicly over the network (just like tapdisk's nbd server). Therefore a
proxying service on the source host is needed for forwarding the nbd connection
from the source host to the destination host, and it is the responsibility of
xapi to manage this proxy service.

The following diagram illustrates the mirroring process of a single VDI:

![mirror](sxm_mirror_v3.svg)

The first step for xapi is then to set up an nbd proxy thread that will be listening
on a local unix domain socket with path `/var/run/nbdproxy/export/<domain>`, where
domain is the `domain` parameter mentioned above in `Data.mirror`. The nbd proxy
thread will accept nbd connections (or rather any connections; it does not
speak or care about the nbd protocol at all) and send an http put request
to the remote xapi. The proxy itself will then forward the data exactly as it is
to the remote side through the http connection.

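Because the proxy never interprets the nbd protocol, its core is just a byte pump between the accepted local connection and the http connection to the remote xapi. The sketch below illustrates that pump with in-memory streams standing in for the sockets; the function name is an assumption, not xapi's actual code:

```python
# Minimal sketch of the byte-forwarding at the heart of the nbd proxy.
# The proxy copies raw bytes verbatim; it never parses nbd messages.
import io

def pump(src, dst, bufsize=65536):
    """Copy bytes from src to dst until EOF; return the number of bytes copied."""
    total = 0
    while True:
        chunk = src.read(bufsize)
        if not chunk:
            return total
        dst.write(chunk)
        total += len(chunk)

# Stand-ins for the unix domain socket and the http connection:
local_conn = io.BytesIO(b"nbd-handshake-and-block-data")
remote_http = io.BytesIO()
copied = pump(local_conn, remote_http)
print(copied, remote_http.getvalue() == local_conn.getvalue())
```

In the real service there would be one such pump per direction, running until either side closes the connection.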
Once the proxy is set up, xapi will call `Data.mirror`, which
will be forwarded to the xapi-storage-script and further forwarded to qemu-dp.
This call contains, among other parameters, the destination nbd server url (`remote`)
to be connected. In this case the destination nbd server is exactly the unix domain
socket on which the proxy thread is listening. Therefore the `remote` parameter
will be of the form `nbd+unix:///<export>?socket=<socket>`, where the export is
provided by the destination nbd server that represents the VDI prepared on the
destination host, and the socket is the path of the unix domain socket where the
proxy thread (which we just created) is listening.

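A small helper can make the shape of that url concrete. The helper name and the example export/socket values are hypothetical, used only to illustrate the `nbd+unix:///<export>?socket=<socket>` form described above:

```python
# Hypothetical helper showing the shape of the `remote` parameter passed
# to Data.mirror: nbd+unix:///<export>?socket=<socket>.
from urllib.parse import quote

def nbd_unix_url(export: str, socket_path: str) -> str:
    """Build an nbd-over-unix-socket url from an export name and socket path."""
    return f"nbd+unix:///{quote(export)}?socket={quote(socket_path, safe='/')}"

# Example values (illustrative only): the export names the VDI prepared on
# the destination; the socket is where the local proxy thread listens.
url = nbd_unix_url("mirror-vdi", "/var/run/nbdproxy/export/0")
print(url)  # nbd+unix:///mirror-vdi?socket=/var/run/nbdproxy/export/0
```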
When this connection is set up, the proxy process will talk to the remote xapi via
http requests, and on the remote side, an http handler will proxy this request to
the appropriate nbd server of either tapdisk or qemu-dp, using exactly the same
[import proxy](#copy-and-compose) as mentioned before.

Note that this proxying service is tightly integrated with outbound SXM of SMAPIv3
SRs. This is to make it simple to focus on the migration itself.

Although there is no need to explicitly copy the VDI anymore, we still need to
transfer the data and wait for it to finish. For this we use the `Data.stat` call
provided by the storage backend to query the status of the mirror, and wait for it
to finish as needed.

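The waiting logic amounts to a polling loop. The sketch below assumes a `Data.stat`-like callable returning a dict with a `complete` field; the function name, field name, and intervals are illustrative assumptions, not the real SMAPIv3 schema:

```python
# Sketch of the wait-for-sync loop over a Data.stat-like call.
# Field names and intervals are assumptions for illustration only.
import time

def wait_for_mirror_sync(stat, poll_interval=5.0, max_polls=720):
    """Poll `stat` until the mirror reports that the initial sync is complete."""
    for _ in range(max_polls):
        status = stat()
        if status.get("complete"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError("mirror did not complete in time")

# Fake stat that completes on the third poll:
responses = iter([{"complete": False}, {"complete": False}, {"complete": True}])
print(wait_for_mirror_sync(lambda: next(responses), poll_interval=0))
```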
#### Limitations

This way of establishing the connection simplifies the implementation of the
migration for SMAPIv3, but it also has limitations: one proxy is needed per live
VDI migration, which can potentially consume lots of resources in dom0. We should
measure the impact of this before switching to more resource-efficient alternatives,
such as WireGuard, which allows establishing a single connection between multiple
hosts.

### Finish

As there is no need to copy a VDI, there is also no need to compose or delete the
snapshot. The cleanup procedure therefore just involves destroying the datapath
that was used for receiving writes for the mirrored VDI.

## Error Handling
### Mirror failure (SMAPIv3)
The `Data.stat` call in SMAPIv3 returns a data structure that includes the current
progress of the mirror job, whether it has completed syncing the existing data, and
whether the mirror has failed. Similar to how it is done in SMAPIv1, we wait for
the sync to complete once we issue the `Data.mirror` call, by repeatedly polling
the status of the mirror using the `Data.stat` call. During this process, the status
of the mirror is also checked, and if a failure is detected, a `Migration_mirror_failure`
will be raised and then handled by the code in `storage_migrate.ml` by calling
`Storage_smapiv3_migrate.receive_cancel2`, which will clean up the mirror datapath
and destroy the mirror VDI, similar to what is done in SMAPIv1.
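
The failure path described above can be sketched as follows. The exception name matches the one in the text, but the `receive_cancel2` body, the `failed` field name, and the state dict are hypothetical stand-ins for the real OCaml implementation:

```python
# Illustrative sketch of the SMAPIv3 mirror-failure path: a detected
# failure raises Migration_mirror_failure, whose handler cancels the
# receive side, destroying the mirror datapath and the mirror VDI.
class Migration_mirror_failure(Exception):
    pass

def receive_cancel2(state):
    # Hypothetical cleanup mirroring the behaviour described in the text.
    state["datapath_destroyed"] = True
    state["mirror_vdi_destroyed"] = True

def check_mirror_status(status, state):
    """Raise (after cleanup) if a Data.stat-like status reports failure."""
    if status.get("failed"):
        try:
            raise Migration_mirror_failure("mirror reported failure")
        except Migration_mirror_failure:
            receive_cancel2(state)
            raise

state = {}
try:
    check_mirror_status({"failed": True}, state)
except Migration_mirror_failure:
    pass
print(state)  # both cleanup steps recorded
```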