@@ -9,6 +9,12 @@ Title: Storage migration
- [Thought experiments on an alternative design](#thought-experiments-on-an-alternative-design)
- [Design](#design)
- [SMAPIv1 migration](#smapiv1-migration)
+ - [Preparation](#preparation)
+ - [Establish mirror](#establish-mirror)
+ - [Mirror](#mirror)
+ - [Snapshot](#snapshot)
+ - [Copy and compose](#copy-and-compose)
+ - [Finish](#finish)
- [SMAPIv3 migration](#smapiv3-migration)
- [Error Handling](#error-handling)
- [Preparation (SMAPIv1 and SMAPIv3)](#preparation-smapiv1-and-smapiv3)
@@ -122,10 +128,44 @@ it will be handled just as before.
## SMAPIv1 migration

+ This section covers migration from SMAPIv1 SRs to SMAPIv1 or SMAPIv3 SRs. Because
+ the migration is driven by the source host, it is usually the source host that
+ determines most of the logic during a storage migration.
+
+ The overview diagram below shows what happens during SMAPIv1 SXM. It is labelled
+ with S1, S2, ... to indicate the different stages of the migration, each of which
+ is described in more detail below.
+
+ ![overview-v1](sxm-overview-v1.svg)
+
+ ### Preparation
+
+ Before the migration process can start, some preparation is needed for the
+ mirror that follows. For SMAPIv1 this involves:
+
+ 1. Create a new VDI (called leaf) that will receive all the new writes
+ 2. Create a dummy snapshot of the VDI above to make sure it is a differencing disk and can be composed later on
+ 3. Create a VDI (called parent) that will receive the existing content (of the snapshot)
+
+ Note that the leaf VDI needs to be attached and activated (to a non-existent `mirror_vm`)
+ since it will later accept writes that mirror what is written on the source host.
+
+ The parent VDI may be created in one of two ways:
+
+ 1. If there is a "similar VDI", clone it on the destination host and use it as the parent VDI.
+ 2. If there is no such VDI, create a new blank VDI.
+
+ Similarity here is defined by the distance between VDIs in the VHD tree. This
+ exploits the internal representation of the storage layer, so we will not go
+ into much detail about it here.
+
+ Once these preparations are done, a `mirror_receive_result` data structure is
+ passed back to the source host, containing all the necessary information about
+ these new VDIs, etc.
+
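The preparation steps above can be sketched as a small model. This is only an illustration under stated assumptions: the `VDI` class, the `prepare_mirror` function and the fields of `MirrorReceiveResult` are hypothetical names chosen for this sketch, and the real `mirror_receive_result` structure may carry different information.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

_next_id = count(1)


@dataclass
class VDI:
    """Toy stand-in for a VDI record (hypothetical, not the xapi type)."""
    name: str
    snapshot_of: Optional[str] = None   # uuid of the VDI this is a snapshot of
    cloned_from: Optional[str] = None   # uuid of the "similar VDI", if any
    attached_to: Optional[str] = None   # placeholder VM the VDI is attached to
    uuid: str = field(default_factory=lambda: f"vdi-{next(_next_id)}")


@dataclass
class MirrorReceiveResult:
    """Hypothetical shape of what is passed back to the source host."""
    leaf: VDI
    dummy_snapshot: VDI
    parent: VDI


def prepare_mirror(similar: Optional[VDI] = None) -> MirrorReceiveResult:
    # 1. The leaf receives all new writes, so it is attached and activated
    #    against the placeholder mirror_vm.
    leaf = VDI("leaf", attached_to="mirror_vm")
    # 2. The dummy snapshot makes the leaf a differencing disk.
    dummy = VDI("dummy-snapshot", snapshot_of=leaf.uuid)
    # 3. The parent receives the existing content: clone a similar VDI
    #    when one exists, otherwise start from a blank VDI.
    if similar is not None:
        parent = VDI("parent", cloned_from=similar.uuid)
    else:
        parent = VDI("parent")
    return MirrorReceiveResult(leaf, dummy, parent)


blank = prepare_mirror()
similar_vdi = VDI("similar")
cloned = prepare_mirror(similar_vdi)
```

The sketch deliberately leaves out how similarity is computed, since that depends on the VHD tree internals mentioned above.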
+ ### Establish mirror
+
At a high level, mirror establishment for SMAPIv1 works as follows:

1. Take a snapshot of a VDI that is attached to VM1. This gives us an immutable
- copy of the current state of the VDI, with all the data until the point we took
+ copy of the current state of the VDI, with all the data up until the point we took
the snapshot. This is illustrated in the diagram as a VDI and its snapshot connecting
to a shared parent, which stores the shared content for the snapshot and the writable
VDI from which we took the snapshot (snapshot)
@@ -135,8 +175,79 @@ client VDI will also be written to the mirrored VDI on the remote host (mirror)
4. Compose the mirror and the snapshot to form a single VDI
5. Destroy the snapshot on the local host (cleanup)

+ #### Mirror
+
+ The mirroring process for SMAPIv1 is rather unconventional, so it is worth
+ documenting how it works. In a conventional client-server architecture, the
+ source client would connect to the destination server directly through the
+ NBD protocol in tapdisk. Here, instead, the connection is established in xapi
+ and then passed on to tapdisk.
+
+ The diagram below illustrates this process. First, xapi on the source host
+ initiates an HTTP request to the remote xapi. This request contains the necessary
+ information about the VDI to be mirrored, the SR that contains it, etc. This
+ information is passed to the HTTP handler on the destination host (called
+ `nbd_handler`), which then processes it. The unusual step is that
+ both the source and the destination xapi then pass this connection on to tapdisk
+ by sending the fd representing the socket connection to the tapdisk process. On
+ the source this is the NBD client in the tapdisk process, and on the destination
+ it is the NBD server in the tapdisk process. After this step, we can consider
+ a client-server connection to be established between the two tapdisks,
+ as if the tapdisk on the source host had made a request to the tapdisk on the
+ destination host and initiated the connection itself. On the diagram, this is indicated
+ by the dashed lines between the tapdisk processes. Logically, we can view this as
+ xapi creating the connection and then passing it down into tapdisk.
+
+ ![mirror](sxm-mirror-v1.svg)
+
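To make the fd hand-over more concrete, here is a minimal single-process sketch of passing an already connected socket to another party over a Unix domain socket. It assumes the hand-over uses the standard `SCM_RIGHTS` mechanism, which the text above does not confirm; `hand_over` and `take_over` are hypothetical names, and the two socket pairs merely stand in for the xapi-to-xapi connection and a xapi-to-tapdisk control channel. It needs Python 3.9 or later on a Unix system.

```python
import socket


def hand_over(control: socket.socket, conn: socket.socket) -> None:
    """Send the fd behind `conn` across `control` (an AF_UNIX socket)."""
    socket.send_fds(control, [b"nbd"], [conn.fileno()])
    # The sender's copy is no longer needed: the receiver gets its own
    # duplicate of the descriptor, so the connection stays open.
    conn.close()


def take_over(control: socket.socket) -> socket.socket:
    """Receive one fd from `control` and wrap it as a socket object."""
    _msg, fds, _flags, _addr = socket.recv_fds(control, 16, 1)
    return socket.socket(fileno=fds[0])


# Stand-in for the connection that xapi established with the remote host.
source_end, destination_end = socket.socketpair()
# Stand-in for a local control channel between xapi and tapdisk.
xapi_ctl, tapdisk_ctl = socket.socketpair()

hand_over(xapi_ctl, source_end)
tapdisk_conn = take_over(tapdisk_ctl)

# "tapdisk" now talks over a connection it never opened itself.
tapdisk_conn.sendall(b"write block 7")
received = destination_end.recv(64)

for s in (tapdisk_conn, destination_end, xapi_ctl, tapdisk_ctl):
    s.close()
```

The point of the sketch is the last step: the receiving side ends up using a connection that another process created, which is the same logical picture as xapi creating the connection and passing it down into tapdisk.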
+ #### Snapshot
+
+ The next step is to create a snapshot of the VDI. This is easily done as a
+ `VDI.snapshot` operation. If the VDI is in VHD format, then internally this
+ creates two children: one for the snapshot, which only contains metadata
+ and tends to be small, and the other for the writable VDI, where all the
+ new writes will go. The shared base copy contains the shared blocks.
+
+ ![snapshot](sxm-snapshot-v1.svg)
+
+ #### Copy and compose
+
+ Once the snapshot is created, we can copy it from the source
+ to the destination. This step is done by `sparse_dd` using the NBD protocol, and
+ it is the step that takes the most time to complete.
+
+ `sparse_dd` is a process forked by xapi that copies the disk blocks.
+ `sparse_dd` can speak a number of protocols, including NBD. In this case, `sparse_dd`
+ initiates an HTTP PUT request to the destination host, with a URL of the form
+ `<address>/services/SM/nbdproxy/<sr>/<vdi>`. This HTTP request is
+ handled by the HTTP handler on the destination host B, which spawns
+ a handler thread. This handler finds the
+ "generic" NBD server[^2] of either tapdisk or qemu-dp, depending on the destination
+ SR type, and then starts proxying data between the HTTP connection socket and the
+ socket connected to the NBD server.
+
+ [^2]: The server is generic because it does not accept fd passing; I call the ones that do
+ "special" NBD servers.
+
+ ![sxm new copy](sxm-new-copy-v1.svg)
+
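The proxying done by the handler thread can be pictured with the following sketch. It is not the xapi implementation: `nbdproxy_url` and `proxy` are hypothetical helpers, the address and identifiers are made-up values, and local socket pairs stand in for the HTTP connection and the connection to the NBD server.

```python
import select
import socket
import threading


def nbdproxy_url(address: str, sr: str, vdi: str) -> str:
    """Build a URL of the form described above."""
    return f"{address}/services/SM/nbdproxy/{sr}/{vdi}"


def proxy(a: socket.socket, b: socket.socket, bufsize: int = 65536) -> None:
    """Copy bytes in both directions until both sides have reached EOF."""
    peer = {a: b, b: a}
    open_sides = [a, b]
    while open_sides:
        readable, _, _ = select.select(open_sides, [], [])
        for s in readable:
            data = s.recv(bufsize)
            if data:
                peer[s].sendall(data)
            else:
                # EOF on one side: stop reading it and tell the other side
                # that no more data will arrive from this direction.
                open_sides.remove(s)
                try:
                    peer[s].shutdown(socket.SHUT_WR)
                except OSError:
                    pass


url = nbdproxy_url("https://destination.example", "sr-uuid", "vdi-uuid")

# Stand-ins: sparse_dd <-> HTTP handler, and handler <-> NBD server.
sparse_dd_end, http_side = socket.socketpair()
nbd_side, nbd_server_end = socket.socketpair()

worker = threading.Thread(target=proxy, args=(http_side, nbd_side))
worker.start()

sparse_dd_end.sendall(b"request")
forwarded = nbd_server_end.recv(64)
nbd_server_end.sendall(b"reply")
reply = sparse_dd_end.recv(64)

sparse_dd_end.close()
nbd_server_end.close()
worker.join(timeout=5)
finished = not worker.is_alive()
http_side.close()
nbd_side.close()
```

In this picture neither end knows about the proxy: `sparse_dd` only sees the HTTP connection, and the NBD server only sees an ordinary client socket.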
+ Once copying is done, the snapshot and the mirrored VDI can then be composed into a
+ single VDI.
+
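As a rough mental model of the compose step, the two VDIs can be treated as maps from block index to data, where blocks written to the mirror after the snapshot take precedence over the copied snapshot content. This block-level view is an assumption made for illustration; the `compose` function below is hypothetical and says nothing about how the storage layer actually performs the operation.

```python
from typing import Dict


def compose(parent: Dict[int, bytes], leaf: Dict[int, bytes]) -> Dict[int, bytes]:
    """Overlay the leaf (mirrored writes) on the parent (copied snapshot)."""
    return {**parent, **leaf}


# Content of the snapshot, copied to the parent VDI on the destination.
copied_snapshot = {0: b"A", 1: b"B", 2: b"C"}
# Writes made by the VM after the snapshot, received through the mirror.
mirrored_writes = {1: b"X"}

composed = compose(copied_snapshot, mirrored_writes)
```

Here block 1 comes from the mirror because it was rewritten after the snapshot was taken, while blocks 0 and 2 come from the copied snapshot.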
+ #### Finish
+
+ At this point the VDI has been migrated to the new host. The mirror is still active,
+ though, because it will not be destroyed until the VM itself has been migrated
+ as well. Some cleanup is done at this point, such as deleting the snapshot
+ that was taken on the source.
+
+ The end result looks like the following. Note that VM2 is drawn with a dashed line because it
+ has not been created yet. The next step would be to migrate VM1 itself to the
+ destination as well, but this is part of the VM migration process and will not
+ be covered here.
+
+ ![final](sxm-final-v1.svg)

- more detail to come...

## SMAPIv3 migration