Commit ef843a4

doc: Add doc for SMAPIv3 SXM
Signed-off-by: Vincent Liu <[email protected]>
1 parent dae69e0 commit ef843a4

2 files changed

+109
-3
lines changed

doc/content/xapi/storage/sxm/index.md

+105-3
@@ -10,12 +10,16 @@ Title: Storage migration
1010
- [Design](#design)
1111
- [SMAPIv1 migration](#smapiv1-migration)
1212
- [Preparation](#preparation)
13-
- [Establish mirror](#establish-mirror)
13+
- [Establishing mirror](#establishing-mirror)
1414
- [Mirror](#mirror)
1515
- [Snapshot](#snapshot)
1616
- [Copy and compose](#copy-and-compose)
1717
- [Finish](#finish)
1818
- [SMAPIv3 migration](#smapiv3-migration)
19+
- [Preparation](#preparation-1)
20+
- [Establishing mirror](#establishing-mirror-1)
21+
- [Limitations](#limitations)
22+
- [Finish](#finish-1)
1923
- [Error Handling](#error-handling)
2024
- [Preparation (SMAPIv1 and SMAPIv3)](#preparation-smapiv1-and-smapiv3)
2125
- [Snapshot and mirror failure (SMAPIv1)](#snapshot-and-mirror-failure-smapiv1)
@@ -251,7 +255,96 @@ be covered here.
251255

252256
## SMAPIv3 migration
253257

254-
More detail to come...
258+
This section covers the mechanism of migrations *from* SRs using SMAPIv3 (to
259+
SMAPIv1 or SMAPIv3). Although the core ideas are the same, SMAPIv3 has a rather
260+
different mechanism for mirroring: 1. it does not require xapi to take a snapshot
261+
of the VDI anymore, since the mirror itself will take care of replicating the
262+
existing data to the destination; 2. there is no fd passing for connection establishment
263+
anymore; instead, proxies are used for connection setup. We will cover more details
264+
on these below.
265+
266+
### Preparation
267+
268+
The preparation work for SMAPIv3 is greatly simplified by the fact that the mirror
269+
at the storage layer will copy the existing data in the VDI to the destination.
270+
This means that a snapshot of the source VDI is not required anymore. So we are left
271+
with only one thing:
272+
273+
1. Create a VDI used for mirroring the data of the source VDI
274+
275+
For this reason, the implementation logic for SMAPIv3 preparation is also shorter,
276+
as the complexity is now handled by the storage layer, which is where it is supposed
277+
to be handled.
278+
279+
### Establishing mirror
280+
281+
The other significant difference is that the storage backend for SMAPIv3 `qemu-dp`
282+
SRs no longer accepts fds, so xapi needs to proxy the data between the nbd client
283+
and nbd server.
284+
285+
SMAPIv3 provides the `Data.mirror uri domain remote` call, which takes three parameters:
286+
`uri` for accessing the local disk, `domain` for the domain slice on which mirroring
287+
should happen, and most importantly for this design, a `remote` url which represents
288+
the remote nbd server to which the blocks of data can be sent.
289+
290+
This call itself, when called by xapi and forwarded to the storage layer's qemu-dp
291+
nbd client, will initiate an nbd connection to the nbd server pointed to by `remote`.
292+
This works fine when the storage migration happens entirely within a local host,
293+
where qemu-dp's nbd client and nbd server can communicate over unix domain sockets.
294+
However, it does not work for inter-host migrations as qemu-dp's nbd server is not
295+
exposed publicly over the network (just like tapdisk's nbd server). Therefore a proxying
296+
service on the source host is needed for forwarding the nbd connection from the
297+
source host to the destination host. And it would be the responsibility of
298+
xapi to manage this proxy service.
299+
300+
The following diagram illustrates the mirroring process of a single VDI:
301+
302+
![sxm mirror](sxm-mirror-v3.svg)
303+
304+
The first step for xapi is then to set up an nbd proxy thread that will be listening
305+
on a local unix domain socket with path `/var/run/nbdproxy/export/<domain>` where
306+
domain is the `domain` parameter mentioned above in `Data.mirror`. The nbd proxy
307+
thread will accept nbd connections (or rather any connections, it does not
308+
speak or care about the nbd protocol at all) and send an HTTP PUT request
309+
to the remote xapi. The proxy itself will then forward the data exactly as it is
310+
to the remote side through the http connection.
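
The "forward the data exactly as it is" behaviour can be illustrated with a minimal sketch. This is not xapi's code (xapi is written in OCaml and wraps the stream in an HTTP PUT request, which is omitted here); `pump` and `demo` are hypothetical names, and in-process socket pairs stand in for the unix domain socket and the remote connection.

```python
import socket
import threading


def pump(src: socket.socket, dst: socket.socket, bufsize: int = 65536) -> int:
    """Copy bytes from src to dst unchanged until src reaches EOF.

    Returns the number of bytes forwarded. The payload is never parsed,
    matching the point above that the proxy does not speak nbd.
    """
    total = 0
    while True:
        chunk = src.recv(bufsize)
        if not chunk:  # peer closed its sending side
            break
        dst.sendall(chunk)
        total += len(chunk)
    # Propagate EOF so the receiver knows the stream has ended.
    dst.shutdown(socket.SHUT_WR)
    return total


def demo() -> bytes:
    """Push a payload through pump() using in-process socket pairs."""
    client, proxy_in = socket.socketpair()   # stands in for the nbd client side
    proxy_out, server = socket.socketpair()  # stands in for the remote side
    worker = threading.Thread(target=pump, args=(proxy_in, proxy_out))
    worker.start()
    client.sendall(b"opaque nbd bytes")
    client.shutdown(socket.SHUT_WR)
    received = b""
    while True:
        chunk = server.recv(4096)
        if not chunk:
            break
        received += chunk
    worker.join()
    for s in (client, proxy_in, proxy_out, server):
        s.close()
    return received
```

The sketch covers one direction only; a real proxy would presumably run a second pump in the opposite direction so that replies from the nbd server reach the client.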
311+
312+
Once the proxy is set up, xapi will call `Data.mirror`, which
313+
will be forwarded to the xapi-storage-script and is further forwarded to the qemu-dp.
314+
This call contains, among other parameters, the destination NBD server url (`remote`)
315+
to be connected. In this case the destination nbd server is exactly the domain
316+
socket on which the proxy thread is listening. Therefore the `remote` parameter
317+
will be of the form `nbd+unix:///<export>?socket=<socket>` where the export is provided
318+
by the destination nbd server that represents the VDI prepared on the destination
319+
host, and the socket will be the path of the unix domain socket where the proxy
320+
thread (which we just created) is listening.
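
As a rough illustration of how such a `remote` value could be assembled, here is a small sketch. The socket directory and the URI shape come from the text above; the function names are hypothetical, and the percent-encoding of the export name and socket path is an assumption rather than something this document specifies.

```python
from urllib.parse import quote

# Directory of the proxy's unix domain sockets, as given in the text above.
PROXY_SOCKET_DIR = "/var/run/nbdproxy/export"


def proxy_socket_path(domain: str) -> str:
    """Path of the socket the proxy thread listens on for this domain."""
    return f"{PROXY_SOCKET_DIR}/{domain}"


def remote_uri(export: str, domain: str) -> str:
    """Build a value of the form nbd+unix:///<export>?socket=<socket>.

    Assumption: special characters are percent-encoded; slashes in the
    socket path are left as-is.
    """
    socket_path = proxy_socket_path(domain)
    return (
        f"nbd+unix:///{quote(export, safe='')}"
        f"?socket={quote(socket_path, safe='/')}"
    )
```

For example, `remote_uri("vdi-export", "0")` yields `nbd+unix:///vdi-export?socket=/var/run/nbdproxy/export/0`, where `vdi-export` and `0` are made-up values.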
321+
322+
When this connection is set up, the proxy process will talk to the remote xapi via
323+
http requests, and on the remote side, an http handler will proxy this request to
324+
the appropriate nbd server of either tapdisk or qemu-dp, using exactly the same
325+
[import proxy](#copy-and-compose) as mentioned before.
326+
327+
Note that this proxying service is tightly integrated with outbound SXM of SMAPIv3
328+
SRs. This is to make it simple to focus on the migration itself.
329+
330+
Although there is no need to explicitly copy the VDI anymore, we still need to
331+
transfer the data and wait for it to finish. For this we use the `Data.stat` call provided
332+
by the storage backend to query the status of the mirror, and wait for it to finish
333+
as needed.
334+
335+
#### Limitations
336+
337+
This way of establishing the connection simplifies the implementation of the migration
338+
for SMAPIv3, but it also has limitations:
339+
340+
One proxy per live VDI migration is needed, which can potentially consume lots of resources in dom0, and we should measure the impact of this before we switch to using more resource-efficient ways such as WireGuard, which allows establishing a single connection between multiple hosts.
341+
342+
343+
### Finish
344+
345+
As there is no need to copy a VDI, there is also no need to compose or delete the
346+
snapshot. The cleanup procedure therefore just involves destroying the datapath
347+
that was used for receiving writes for the mirrored VDI.
255348

256349
## Error Handling
257350

@@ -314,7 +407,16 @@ are migrating from.
314407

315408
### Mirror failure (SMAPIv3)
316409

317-
To be filled...
410+
The `Data.stat` call in SMAPIv3 returns a data structure that includes the current
411+
progress of the mirror job, whether it has completed syncing the existing data and
412+
whether the mirror has failed. Similar to how it is done in SMAPIv1, we wait for
413+
the sync to complete once we issue the `Data.mirror` call, by repeatedly polling
414+
the status of the mirror using the `Data.stat` call. During this process, the status
415+
of the mirror is also checked and if a failure is detected, a `Migration_mirror_failure`
416+
will be raised and then handled by the code in `storage_migrate.ml` by calling
417+
`Storage_smapiv3_migrate.receive_cancel2`, which will clean up the mirror datapath
418+
and destroy the mirror VDI, similar to what is done in SMAPIv1.
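
The polling loop described above can be sketched as follows. This is an illustration, not the code in `storage_migrate.ml`: the `MirrorStatus` field names are assumptions about the shape of the `Data.stat` result, `MigrationMirrorFailure` is a Python stand-in for `Migration_mirror_failure`, and `stat` is any caller-supplied function that returns the current status.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class MirrorStatus:
    """Hypothetical shape of the Data.stat result; field names are assumed."""
    progress: float  # fraction of the existing data synced so far
    complete: bool   # True once the existing data has been synced
    failed: bool     # True if the mirror has failed


class MigrationMirrorFailure(Exception):
    """Stand-in for xapi's Migration_mirror_failure."""


def wait_for_sync(
    stat: Callable[[], MirrorStatus],
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> MirrorStatus:
    """Poll stat() until the mirror has synced, raising if it reports failure."""
    while True:
        status = stat()
        if status.failed:
            # The caller is expected to clean up the mirror datapath and VDI.
            raise MigrationMirrorFailure("mirror reported failure")
        if status.complete:
            return status
        sleep(interval)
```

The failure check comes before the completion check so that a mirror reporting both is treated as failed; whether the real backend can report both at once is not stated here.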
419+
318420

319421
### Copy failure (SMAPIv1)
320422

doc/content/xapi/storage/sxm/sxm-mirror-v3.svg

+4