@@ -9,6 +9,12 @@ Title: Storage migration
- [Thought experiments on an alternative design](#thought-experiments-on-an-alternative-design)
- [Design](#design)
- [SMAPIv1 migration](#smapiv1-migration)
+  - [Preparation](#preparation)
+  - [Establishing mirror](#establishing-mirror)
+  - [Mirror](#mirror)
+  - [Snapshot](#snapshot)
+  - [Copy and compose](#copy-and-compose)
+  - [Finish](#finish)
- [SMAPIv3 migration](#smapiv3-migration)
- [Error Handling](#error-handling)
- [Preparation (SMAPIv1 and SMAPIv3)](#preparation-smapiv1-and-smapiv3)
@@ -122,10 +128,44 @@ it will be handled just as before.

## SMAPIv1 migration

+This section is about migration from SMAPIv1 SRs to SMAPIv1 or SMAPIv3 SRs. Since
+the migration is driven by the source host, it is usually the source host that
+determines most of the logic during a storage migration.
+
+First, let us take a look at an overview diagram of what happens during SMAPIv1 SXM:
+the diagram is labelled with S1, S2, ..., which indicate the different stages of the
+migration. We will discuss each stage in more detail below.
+
+![overview-v1](sxm-overview-v1.svg)
+
+### Preparation
+
+Before we can start the mirroring process, a number of preparation steps are
+needed on the destination to get ready for the mirror. For SMAPIv1 this involves:
+
+1. Create a new VDI (called leaf) that will be used as the receiving VDI for all the new writes
+2. Create a dummy snapshot of the VDI above to make sure it is a differencing disk and can be composed later on
+3. Create a VDI (called parent) that will be used to receive the existing content of the disk (the snapshot)
+
+Note that the leaf VDI needs to be attached and activated on the destination host (to a non-existent `mirror_vm`),
+since it will later on accept writes to mirror what is written on the source host.
+
+The parent VDI may be created in two different ways: 1. if there is a "similar VDI",
+clone it on the destination host and use it as the parent VDI; 2. if there is no
+such VDI, create a new blank VDI. The similarity here is defined by the distances
+between different VDIs in the VHD tree; since this exploits the internal representation
+of the storage layer, we will not go into too much detail about it here.
+
+Once these preparations are done, a `mirror_receive_result` data structure is
+passed back to the source host, containing all the necessary information about
+these new VDIs, etc.
+
+### Establishing mirror
+
At a high level, mirror establishment for SMAPIv1 works as follows:
1. Take a snapshot of a VDI that is attached to VM1. This gives us an immutable
-copy of the current state of the VDI, with all the data until the point we took
+copy of the current state of the VDI, with all the data up until the point we took
the snapshot. This is illustrated in the diagram as a VDI and its snapshot connecting
to a shared parent, which stores the shared content for the snapshot and the writable
VDI from which we took the snapshot (snapshot)
@@ -135,8 +175,79 @@ client VDI will also be written to the mirrored VDI on the remote host (mirror)
4. Compose the mirror and the snapshot to form a single VDI
5. Destroy the snapshot on the local host (cleanup)
+
+#### Mirror
+
+The mirroring process for SMAPIv1 is rather unconventional, so it is worth
+documenting how it works. Instead of a conventional client-server architecture,
+where the source client connects to the destination server directly through the
+NBD protocol in tapdisk, the connection is established in xapi and then passed
+on to tapdisk.
+
+The diagram below illustrates this process. First, xapi on the source host
+initiates an HTTP request to the remote xapi. This request contains the necessary
+information about the VDI to be mirrored, the SR that contains it, etc. This
+information is then passed on to the HTTP handler on the destination host (called
+`nbd_handler`), which processes it. Now the unusual step is that
+both the source and the destination xapi will pass this connection on to tapdisk,
+by sending the fd representing the socket connection to the tapdisk process. On
+the source this is the NBD client process of tapdisk, and on the destination
+it is the NBD server process of tapdisk. After this step, we can consider
+a client-server connection established between the two tapdisks, as if the
+tapdisk on the source host had made a request to the tapdisk on the
+destination host and initiated the connection. On the diagram, this is indicated
+by the dashed lines between the tapdisk processes. Logically, we can view this as
+xapi creating the connection and then passing it down into tapdisk.
+
+![mirror](sxm-mirror-v1.svg)
+
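The fd-passing step relies on the standard Unix `SCM_RIGHTS` mechanism: an open file descriptor is sent over a Unix-domain socket, and the receiver gets its own duplicate of that descriptor. The sketch below demonstrates the mechanism in Python for illustration only (xapi and tapdisk are not written in Python); the "xapi" and "tapdisk" roles here are simulated within one process.

```python
import socket

def pass_fd(via: socket.socket, fd: int) -> None:
    # send_fds wraps sendmsg() with an SCM_RIGHTS ancillary message,
    # which is how a connection can be handed down to another process.
    socket.send_fds(via, [b"fd"], [fd])

def receive_fd(via: socket.socket) -> int:
    # recv_fds returns (data, fds, msg_flags, address); the kernel has
    # already installed a duplicate of the sent fd in our fd table.
    _msg, fds, _flags, _addr = socket.recv_fds(via, 1024, 1)
    return fds[0]

# "xapi" holds a control channel to "tapdisk" (a Unix socket pair), plus
# an established connection standing in for the NBD connection.
xapi_side, tapdisk_side = socket.socketpair()
conn_source, conn_dest = socket.socketpair()  # the "NBD" connection

# xapi hands the destination end of the connection down to tapdisk.
pass_fd(xapi_side, conn_dest.fileno())
received = receive_fd(tapdisk_side)

# The source end writes; "tapdisk" reads through the received fd as if it
# had made the connection itself.
conn_source.sendall(b"hello from source tapdisk")
with socket.socket(fileno=received) as mirrored_conn:
    data = mirrored_conn.recv(1024)
print(data.decode())
```

This mirrors the logical picture above: the connection is created in one place and then passed down, after which the two ends behave like an ordinary client-server pair.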
+#### Snapshot
+
+The next step is to create a snapshot of the VDI. This is easily done as a
+`VDI.snapshot` operation. If the VDI is in VHD format, then internally this
+creates two children: one for the snapshot, which only contains the metadata
+and tends to be small, and one for the writable VDI where all the new writes
+will go. The shared base copy contains the shared blocks.
+
+![snapshot](sxm-snapshot-v1.svg)
+
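The resulting tree shape can be modelled with a toy structure (illustrative only, not the real VHD on-disk format): after the snapshot, the original node becomes a shared read-only base with two children.

```python
from dataclasses import dataclass
from typing import Optional

# Toy model of a VHD chain node; purely illustrative.
@dataclass
class Vhd:
    name: str
    parent: Optional["Vhd"] = None
    read_only: bool = False

def snapshot(vdi: Vhd) -> tuple:
    # The existing VDI becomes the shared base copy holding the shared
    # blocks; the snapshot child holds only metadata, while the new leaf
    # child receives all subsequent writes.
    vdi.read_only = True
    snap = Vhd(f"{vdi.name}.snap", parent=vdi)
    leaf = Vhd(f"{vdi.name}.leaf", parent=vdi)
    return snap, leaf

base = Vhd("vdi-1")
snap, leaf = snapshot(base)
print(snap.parent.name, leaf.parent.name, base.read_only)
```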
+#### Copy and compose
+
+Once the snapshot is created, we can copy it from the source to the
+destination. This step is done by `sparse_dd` using the NBD protocol, and it
+is also the step that takes the most time to complete.
+
+`sparse_dd` is a process forked by xapi that copies the disk blocks.
+`sparse_dd` supports a number of protocols, including NBD. In this case, `sparse_dd`
+will initiate an HTTP PUT request to the destination host, with a URL of the form
+`<address>/services/SM/nbdproxy/<sr>/<vdi>`. This HTTP request is
+handled by the HTTP handler on the destination host, which spawns
+a handler thread. This handler will find the
+"generic" NBD server[^2] of either tapdisk or qemu-dp, depending on the destination
+SR type, and then start proxying data between the HTTP connection socket and the
+socket connected to the NBD server.
+
+[^2]: The server is generic because it does not accept fd passing; the servers
+that do are what I call "special" NBD servers/fd receivers.
+
+![sxm new copy](sxm-new-copy-v1.svg)
+
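The proxying done by the handler thread is plain bidirectional byte copying between two sockets. Here is a minimal illustrative sketch in Python (the real handler is OCaml inside xapi; socket names here are stand-ins for the HTTP connection and the NBD server connection):

```python
import socket
import threading

def pump(src: socket.socket, dst: socket.socket) -> None:
    # Copy bytes from src to dst until src reaches EOF, then propagate
    # EOF by shutting down dst's write half.
    while chunk := src.recv(16384):
        dst.sendall(chunk)
    dst.shutdown(socket.SHUT_WR)

def proxy(client: socket.socket, server: socket.socket) -> None:
    # One thread per direction, like the handler thread proxying between
    # the HTTP connection socket and the NBD server socket.
    threads = [
        threading.Thread(target=pump, args=(client, server)),
        threading.Thread(target=pump, args=(server, client)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

# Demo wiring: http_client <-> proxy <-> nbd_server (all simulated locally).
http_client, http_handler = socket.socketpair()
nbd_server_conn, nbd_server = socket.socketpair()

t = threading.Thread(target=proxy, args=(http_handler, nbd_server_conn))
t.start()

http_client.sendall(b"NBD_WRITE payload")
http_client.shutdown(socket.SHUT_WR)
echoed = nbd_server.recv(1024)   # the "NBD server" sees the client's bytes
nbd_server.sendall(b"ACK")
nbd_server.shutdown(socket.SHUT_WR)
reply = http_client.recv(1024)   # ... and the reply flows back through
t.join()
print(echoed, reply)
```

The important property is that the proxy is protocol-agnostic: it never parses the NBD stream, it only shuttles bytes in both directions until each side closes.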
+Once copying is done, the snapshot and mirrored VDI can then be composed into a
+single VDI.
+
+#### Finish
+
+At this point the VDI is synchronised to the new host! The mirror is still active,
+though, because it will not be destroyed until the VM itself has been migrated
+as well. Some cleanups are done at this point, such as deleting the snapshot
+that was taken on the source, destroying the mirror datapath, etc.
+
+The end result looks like the following. Note that VM2 is drawn with a dashed
+line as it is not yet created. The next step would be to migrate VM1 itself to
+the destination as well, but this is part of the VM migration process and will
+not be covered here.
+
+![final](sxm-final-v1.svg)

-more detail to come...

## SMAPIv3 migration
@@ -168,10 +279,10 @@ helps separate the error handling logic into the `with` part of a `try with` blo
which is where they are supposed to be. Since we need to accommodate the existing
SMAPIv1 migration (which has more stages than SMAPIv3), the following stages are
introduced: preparation (v1, v3), snapshot (v1), mirror (v1, v3), copy (v1). Note that
-each stage also roughly corresponds to a helper function that is called within `MIRROR.start`,
+each stage also roughly corresponds to a helper function that is called within `Storage_migrate.start`,
which is the wrapper function that initiates storage migration. And each helper
function itself also has error handling logic within it as
-needed (e.g. see `Storage_smapiv1_migrate.receive_start) to deal with exceptions
+needed (e.g. see `Storage_smapiv1_migrate.receive_start`) to deal with exceptions
that happen within each helper function.

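The stage idea can be sketched abstractly: run the stages in order, and if one fails, undo the completed stages in reverse before re-raising. This is an illustrative Python sketch of the pattern only (xapi's actual error handling lives in OCaml `try ... with` blocks, and the stage and undo names here are invented):

```python
log = []

def run_migration(fail_at=None):
    """Run migration stages in order; on failure, undo completed stages
    in reverse order and re-raise the original error."""
    log.clear()
    undo_stack = []
    stages = [
        ("preparation", lambda: log.append("prepared")),
        ("snapshot",    lambda: log.append("snapshotted")),
        ("mirror",      lambda: log.append("mirroring")),
        ("copy",        lambda: log.append("copied")),
    ]
    for name, action in stages:
        try:
            if name == fail_at:          # simulate a failure in this stage
                raise IOError(f"{name} failed")
            action()
            undo_stack.append(name)
        except IOError:
            # Clean up only what was successfully set up, newest first.
            for done in reversed(undo_stack):
                log.append(f"undo {done}")
            raise

run_migration()                          # happy path: all four stages run
happy = list(log)
try:
    run_migration(fail_at="copy")        # failure during copying
except IOError:
    pass
print(happy, log)
```

A failure during copy undoes mirror, snapshot, and preparation, in that order, which is the behaviour the staged structure is meant to guarantee.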
### Preparation (SMAPIv1 and SMAPIv3)
@@ -215,6 +326,14 @@ failure during copying.

## SMAPIv1 Migration implementation detail

+{{% notice info %}}
+The following doc refers to a [version](https://github.com/xapi-project/xen-api/blob/v24.37.0/ocaml/xapi/storage_migrate.ml)
+of xapi before 24.37, after which point this code structure has undergone
+many changes as part of adding support for SMAPIv3 SXM. Therefore the following
+tutorial might be less relevant in terms of the implementation detail, although
+the general principle should remain the same.
+{{% /notice %}}
+
```mermaid
sequenceDiagram
participant local_tapdisk as local tapdisk