Commit dae69e0

doc: Add doc on how SMAPIv1 SXM works
Signed-off-by: Vincent Liu <[email protected]>
1 parent e10a62b commit dae69e0

File tree

6 files changed

+139
-2
lines changed

doc/content/xapi/storage/sxm/index.md

+119-2
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,12 @@ Title: Storage migration
99
- [Thought experiments on an alternative design](#thought-experiments-on-an-alternative-design)
1010
- [Design](#design)
1111
- [SMAPIv1 migration](#smapiv1-migration)
12+
- [Preparation](#preparation)
13+
- [Establish mirror](#establish-mirror)
14+
- [Mirror](#mirror)
15+
- [Snapshot](#snapshot)
16+
- [Copy and compose](#copy-and-compose)
17+
- [Finish](#finish)
1218
- [SMAPIv3 migration](#smapiv3-migration)
1319
- [Error Handling](#error-handling)
1420
- [Preparation (SMAPIv1 and SMAPIv3)](#preparation-smapiv1-and-smapiv3)
@@ -122,10 +128,44 @@ it will be handled just as before.
122128

123129
## SMAPIv1 migration
124130

131+
This section is about migration from SMAPIv1 SRs to SMAPIv1 or SMAPIv3 SRs. Since
132+
the migration is driven by the source host, it is usually the source host that
133+
determines most of the logic during a storage migration.
134+
135+
First we take a look at an overview diagram of what happens during SMAPIv1 SXM.
136+
The diagram is labelled with S1, S2 ... which indicate the different stages of the migration.
137+
We will talk about each stage in more detail below.
138+
139+
![overview-v1](sxm-overview-v1.svg)
140+
141+
### Preparation
142+
143+
Before we can start the migration process, a number of preparations are needed
144+
for the mirror that follows. For SMAPIv1 this involves:
145+
146+
1. Create a new VDI (called leaf) that will be used as the receiving VDI for all the new writes
147+
2. Create a dummy snapshot of the VDI above to make sure it is a differencing disk and can be composed later on
148+
3. Create a VDI (called parent) that will be used to receive the existing content (of the snapshot)
149+
150+
Note that the leaf VDI needs to be attached and activated (to a non-existent `mirror_vm`)
151+
since it will later on accept writes to mirror what is written on the source host.
152+
153+
The parent VDI may be created in two different ways: 1. If there is a "similar VDI",
154+
clone it on the destination host and use it as the parent VDI; 2. If there is no
155+
such VDI, create a new blank VDI. The similarity here is defined by the distances
156+
between different VDIs in the VHD tree, which exploits the internal representation
157+
of the storage layer, hence we will not go into too much detail about this here.
158+
159+
Once these preparations are done, a `mirror_receive_result` data structure is then
160+
passed back to the source host that will contain all the necessary information about
161+
these new VDIs, etc.
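
The preparation steps above can be sketched as a small model. This is only an illustration, not xapi's code: `Vdi`, `ReceiveResult` and `prepare_destination` are hypothetical names, and the fields of the real `mirror_receive_result` may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Vdi:
    name: str
    cloned_from: Optional[str] = None   # set when the VDI is a clone of a "similar VDI"
    snapshot_of: Optional[str] = None   # set when the VDI is a snapshot
    attached: bool = False


@dataclass
class ReceiveResult:
    # Hypothetical stand-in for `mirror_receive_result`.
    leaf: Vdi
    dummy_snapshot: Vdi
    parent: Vdi


def prepare_destination(similar: Optional[Vdi] = None) -> ReceiveResult:
    # 1. The leaf receives all new (mirrored) writes, so it is attached up front.
    leaf = Vdi("leaf", attached=True)
    # 2. A dummy snapshot makes the leaf a differencing disk that can be composed later.
    dummy = Vdi("dummy-snapshot", snapshot_of=leaf.name)
    # 3. The parent receives the existing content: clone a similar VDI if one
    #    exists, otherwise start from a blank VDI.
    parent = Vdi("parent", cloned_from=similar.name) if similar else Vdi("parent")
    return ReceiveResult(leaf, dummy, parent)


blank = prepare_destination()
cloned = prepare_destination(similar=Vdi("existing-base"))
```

In this model the only difference between the two calls is where the parent comes from; the leaf and the dummy snapshot are created the same way in both cases.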
162+
163+
### Establish mirror
164+
125165
At a high level, mirror establishment for SMAPIv1 works as follows:
126166

127167
1. Take a snapshot of a VDI that is attached to VM1. This gives us an immutable
128-
copy of the current state of the VDI, with all the data until the point we took
168+
copy of the current state of the VDI, with all the data up until the point we took
129169
the snapshot. This is illustrated in the diagram as a VDI and its snapshot connecting
130170
to a shared parent, which stores the shared content for the snapshot and the writable
131171
VDI from which we took the snapshot (snapshot)
@@ -135,8 +175,79 @@ client VDI will also be written to the mirrored VDI on the remote host (mirror)
135175
4. Compose the mirror and the snapshot to form a single VDI
136176
5. Destroy the snapshot on the local host (cleanup)
137177

178+
#### Mirror
179+
180+
The mirroring process for SMAPIv1 is rather unconventional, so it is worth
181+
documenting how this works. Instead of a conventional client server architecture,
182+
where the source client connects to the destination server directly through the
183+
NBD protocol in tapdisk, the connection is established in xapi and then passed
184+
onto tapdisk.
185+
186+
The diagram below illustrates this process. First, xapi on the source host will
187+
initiate an http request to the remote xapi. This request contains the necessary
188+
information about the VDI to be mirrored, and the SR that contains it, etc. This
189+
information is then passed onto the http handler on the destination host (called
190+
`nbd_handler`) which then processes this information. Now the unusual step is that
191+
both the source and the destination xapi will pass this connection onto tapdisk,
192+
by sending the fd representing the socket connection to the tapdisk process. On
193+
the source this would be nbd client in the tapdisk process, and on the destination
194+
this would be the nbd server in the tapdisk process. After this step, we can consider
195+
a client-server connection is established between two tapdisks on the client and
196+
server, as if the tapdisk on the source host makes a request to the tapdisk on the
197+
destination host and initiates the connection. On the diagram, this is indicated
198+
by the dashed lines between the tapdisk processes. Logically, we can view this as
199+
xapi creates the connection, and then passes this connection down into tapdisk.
200+
201+
![mirror](sxm-mirror-v1.svg)
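
The fd-passing step can be illustrated with a minimal sketch, assuming a Unix host and Python 3.9+ (for `socket.send_fds` and `socket.recv_fds`). Everything runs in one process here, and socket pairs stand in for the xapi-to-xapi connection and for the xapi-to-tapdisk control channels. `hand_over` and `take_over` are hypothetical helper names, not xapi or tapdisk functions.

```python
import socket


def hand_over(control: socket.socket, conn: socket.socket) -> None:
    """'xapi' side: pass the fd of an established connection over a Unix socket."""
    socket.send_fds(control, [b"nbd"], [conn.fileno()])
    conn.close()  # the receiver gets its own reference to the same connection


def take_over(control: socket.socket) -> socket.socket:
    """'tapdisk' side: receive the fd and wrap it in a socket object again."""
    _msg, fds, _flags, _addr = socket.recv_fds(control, 16, 1)
    return socket.socket(fileno=fds[0])


# Stand-in for the connection set up between the source and destination xapi.
src_conn, dst_conn = socket.socketpair()

# Stand-ins for the local control channel between xapi and tapdisk on each host.
src_xapi, src_tapdisk = socket.socketpair()
dst_xapi, dst_tapdisk = socket.socketpair()

hand_over(src_xapi, src_conn)   # source xapi -> source tapdisk (nbd client)
hand_over(dst_xapi, dst_conn)   # destination xapi -> destination tapdisk (nbd server)

client = take_over(src_tapdisk)
server = take_over(dst_tapdisk)

# The two "tapdisks" now talk directly over a connection neither of them opened.
client.sendall(b"write block 7")
received = server.recv(1024)
```

After the hand-over, neither "xapi" holds the connection any more, which mirrors the idea that xapi creates the connection and then passes it down.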
202+
203+
#### Snapshot
204+
205+
The next step is to create a snapshot of the VDI. This is easily done as a
206+
`VDI.snapshot` operation. If the VDI was in VHD format, then internally this would
207+
create two children: one for the snapshot, which only contains the metadata
208+
information and tends to be small, the other for the writable VDI where all the
209+
new writes will go to. The shared base copy contains the shared blocks.
210+
211+
![snapshot](sxm-snapshot-v1.svg)
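
As a rough illustration of the resulting tree, the sketch below models a VHD-style chain in which reads fall through to the parent. It is a toy model under simplifying assumptions (blocks as dictionary entries, no on-disk format), and `Node` and `snapshot` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Node:
    name: str
    blocks: Dict[int, bytes] = field(default_factory=dict)
    parent: Optional["Node"] = None
    writable: bool = True

    def read(self, block: int) -> Optional[bytes]:
        # Walk up the chain until some node has the block allocated.
        node: Optional[Node] = self
        while node is not None:
            if block in node.blocks:
                return node.blocks[block]
            node = node.parent
        return None  # unallocated everywhere in the chain


def snapshot(vdi: Node) -> Tuple[Node, Node, Node]:
    """Turn `vdi` into a read-only base copy with two empty children."""
    base = Node("base-copy", vdi.blocks, vdi.parent, writable=False)
    snap = Node(vdi.name + "-snapshot", parent=base, writable=False)
    leaf = Node(vdi.name, parent=base)  # all new writes land here
    return base, snap, leaf


base, snap, leaf = snapshot(Node("vdi", {0: b"old"}))
leaf.blocks[0] = b"new"   # a write after the snapshot was taken
```

The snapshot still reads the old content through the shared base copy, while the writable child sees its own newer block.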
212+
213+
#### Copy and compose
214+
215+
Once the snapshot is created, we can then copy the snapshot from the source
216+
to the destination. This step is done by `sparse_dd` using the nbd protocol. This
217+
is also the step that takes the most time to complete.
218+
219+
`sparse_dd` is a process forked by xapi that does the copying of the disk blocks.
220+
`sparse_dd` can speak a number of protocols, including nbd. In this case, `sparse_dd`
221+
will initiate an http put request to the destination host, with a url of the form
222+
`<address>/services/SM/nbdproxy/<sr>/<vdi>`. This http request then
223+
gets handled by the http handler on the destination host B, which will then spawn
224+
a handler thread. This handler will find the
225+
"generic" nbd server[^2] of either tapdisk or qemu-dp, depending on the destination
226+
SR type, and then start proxying data between the http connection socket and the
227+
socket connected to the nbd server.
228+
229+
[^2]: The server is generic because it does not accept fd passing; I call the servers
230+
that do accept fd passing "special" nbd servers.
231+
232+
![sxm new copy](sxm-new-copy-v1.svg)
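
The proxying done by the handler thread can be sketched as two byte pumps, one per direction. This is a simplified illustration rather than the xapi handler: socket pairs stand in for the http connection and the nbd server socket, and `nbdproxy_url`, `pump`, `proxy` and `read_all` are hypothetical helper names. Only the URL shape comes from the text above; the address scheme is an assumption.

```python
import socket
import threading


def nbdproxy_url(address: str, sr: str, vdi: str) -> str:
    # URL shape taken from the description above.
    return f"{address}/services/SM/nbdproxy/{sr}/{vdi}"


def pump(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst until src reaches EOF, then half-close dst."""
    while True:
        chunk = src.recv(65536)
        if not chunk:
            break
        dst.sendall(chunk)
    dst.shutdown(socket.SHUT_WR)


def proxy(a: socket.socket, b: socket.socket) -> list:
    """Relay traffic in both directions between two connected sockets."""
    threads = [threading.Thread(target=pump, args=(a, b)),
               threading.Thread(target=pump, args=(b, a))]
    for t in threads:
        t.start()
    return threads


def read_all(sock: socket.socket) -> bytes:
    chunks = []
    while True:
        chunk = sock.recv(65536)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


# sparse_dd <-> http socket on one side, nbd server socket <-> nbd server on the other.
sparse_dd_end, http_sock = socket.socketpair()
nbd_sock, nbd_server_end = socket.socketpair()
threads = proxy(http_sock, nbd_sock)

sparse_dd_end.sendall(b"block data")
sparse_dd_end.shutdown(socket.SHUT_WR)
seen_by_server = read_all(nbd_server_end)

nbd_server_end.sendall(b"ack")
nbd_server_end.shutdown(socket.SHUT_WR)
seen_by_client = read_all(sparse_dd_end)

for t in threads:
    t.join()
```

Neither end needs to know that a proxy sits in the middle, which is why the same approach can work whether the nbd server behind it is tapdisk or qemu-dp.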
233+
234+
Once copying is done, the snapshot and mirrored VDI can then be composed into a
235+
single VDI.
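
To see why composing the copied snapshot with the mirrored writes yields the current state of the source disk, consider the toy model below. Disks are modelled as dictionaries from block number to data, and `compose` and `guest_write` are hypothetical names. It is a sketch of the idea only, not of how tapdisk or the VHD layer implement it.

```python
def compose(parent: dict, leaf: dict) -> dict:
    """Overlay the leaf's blocks on top of the parent's blocks; the leaf wins."""
    merged = dict(parent)
    merged.update(leaf)
    return merged


source = {0: b"a", 1: b"b", 2: b"c"}   # source VDI when mirroring starts
snapshot = dict(source)                # immutable point-in-time copy
dest_parent = {}                       # receives the copied snapshot
dest_leaf = {}                         # receives the mirrored writes


def guest_write(block: int, data: bytes) -> None:
    source[block] = data       # the write lands on the source VDI ...
    dest_leaf[block] = data    # ... and is mirrored to the destination leaf


# Copying the snapshot is interleaved with guest writes.
dest_parent[0] = snapshot[0]
guest_write(1, b"B")
dest_parent[1] = snapshot[1]
guest_write(3, b"d")
dest_parent[2] = snapshot[2]

result = compose(dest_parent, dest_leaf)
```

Because every write made after the snapshot is present in the leaf, the leaf always takes precedence over the older copied data for the same block.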
236+
237+
#### Finish
238+
239+
At this point the VDI is migrated to the new host! Mirror is still on at this point
240+
though because that will not be destroyed until the VM itself has been migrated
241+
as well. Some cleanups are done at this point, such as deleting the snapshot
242+
that is taken on the source, etc.
243+
244+
The end result looks like the following. Note that VM2 is drawn with a dashed line as it
245+
is not created yet. The next step would be to migrate VM1 itself to the
246+
destination as well, but this is part of the VM migration process and will not
247+
be covered here.
248+
249+
![final](sxm-final-v1.svg)
138250

139-
more detail to come...
140251

141252
## SMAPIv3 migration
142253

@@ -215,6 +326,12 @@ failure during copying.
215326

216327
## SMAPIv1 Migration implementation detail
217328

329+
The following doc refers to a [version](https://github.com/xapi-project/xen-api/blob/v24.37.0/ocaml/xapi/storage_migrate.ml)
330+
of xapi before 24.37, after which point this code structure underwent
331+
many changes as part of adding support for SMAPIv3 SXM. Therefore the following
332+
tutorial might be less relevant in terms of the implementation detail, although
333+
the general principle should remain the same.
334+
218335
```mermaid
219336
sequenceDiagram
220337
participant local_tapdisk as local tapdisk

doc/content/xapi/storage/sxm/sxm-final-v1.svg

+4

doc/content/xapi/storage/sxm/sxm-mirror-v1.svg

+4

doc/content/xapi/storage/sxm/sxm-new-copy-v1.svg

+4

doc/content/xapi/storage/sxm/sxm-overview-v1.svg

+4

doc/content/xapi/storage/sxm/sxm-snapshot-v1.svg

+4
