vsock: Enable live migrations (snapshot-restore) by jemoreira · Pull Request #936 · rust-vmm/vhost-device

jemoreira · 2026-02-11T23:35:41Z

Summary of the PR

The vsock device only accesses virtqueues when they are ready.

The most common case where this matters is after receiving a VHOST_USER_GET_VRING_BASE message from the client (because of a suspend operation, for example): if one of the host-side applications or sister VMs sends us a message, it must be kept queued until the virtqueues are set to ready again with a kick.
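
A minimal sketch of the queuing behaviour described above. The `Ring` type and its method names are hypothetical and model the vring as a plain in-memory queue; they are not the PR's actual types.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a vring plus the backend's pending queue.
#[derive(Default)]
struct Ring {
    /// True once the ring has been kicked; false after GET_VRING_BASE.
    ready: bool,
    /// Messages from host applications or sibling VMs awaiting delivery.
    pending: VecDeque<Vec<u8>>,
    /// Messages actually handed to the guest (for illustration only).
    delivered: Vec<Vec<u8>>,
}

impl Ring {
    /// VHOST_USER_GET_VRING_BASE stops the ring (e.g. on suspend).
    fn on_get_vring_base(&mut self) {
        self.ready = false;
    }

    /// A kick makes the ring ready again and flushes queued messages.
    fn on_kick(&mut self) {
        self.ready = true;
        self.flush();
    }

    /// Queue a message; the ring is only touched while it is ready.
    fn send(&mut self, msg: Vec<u8>) {
        self.pending.push_back(msg);
        if self.ready {
            self.flush();
        }
    }

    /// Move every queued message to the guest, preserving order.
    fn flush(&mut self) {
        while let Some(msg) = self.pending.pop_front() {
            self.delivered.push(msg);
        }
    }
}
```

In this model a message sent between `on_get_vring_base` and the next `on_kick` stays in `pending` and is delivered, in order, once the kick arrives.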

Enable live migrations (snapshot-restore) of the vsock device.

The virtio spec mandates that when restoring from a snapshot (such as when the VM was migrated to a different host) the device must send a TRANSPORT RESET event via the event virtqueue to the driver. The driver in turn, upon receiving the event, must drop all existing connections while keeping all listeners, read the CID again from the device configuration space, and update the listeners with the (potentially different) CID.

On the device side, care must be taken to ensure no new connections are established until that transport reset event is handled; otherwise the guest will silently drop them and the peer may not know about it. Given that the driver must read the CID from the configuration space both during normal boot and after a reset, that read is the best signal for the device to "activate" and start processing packets from the host and sibling VMs. Feature negotiation also happens during both normal boot and restore, but in the case of restore it happens before the transport reset and therefore can't be used as a reliable signal to "activate" the device.

The device doesn't need to save/load any state for this. When loading a snapshot the device simply notes that it must send a TRANSPORT RESET event to the driver as soon as possible, which it then does when it receives a kick on the event vring. Regardless of whether the device is started to restore a previous VM state or brand new (the device actually doesn't know until set_device_state_fd is called), the device always starts in the "inactive" state, meaning it will drop any packets coming from any source. As mentioned before, the device "activates" once the driver has read the configuration space.

Unlike other VMMs, QEMU takes ownership of the event vring and handles sending of the TRANSPORT RESET event itself. This change attempts to handle both approaches by sending the TRANSPORT RESET event if the vring is kicked, but doesn't treat it as a precondition to activate the device, instead choosing to activate unconditionally once the driver reads the configuration space.

There is a race in this implementation: it occurs when the snapshot was taken before the guest driver read the configuration and the driver then reads it immediately upon restore, before the transport reset event was handled. While this could be mitigated by waiting for the reset event to be sent AND the config to be read AFTER the event was sent, QEMU's decision to take over the event queue makes that approach unviable. Hopefully, it is very unlikely that a snapshot will be taken before the driver is fully initialized, as that would be of very little value in practice.
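
The activation rules in this description can be modelled as a small state machine. This is only a sketch: `VsockActivation` and its methods are hypothetical names, and clearing the pending flag on a config read is an assumption based on the code comment quoted later in the review ("no transport reset will be pending after this").

```rust
/// Hypothetical model of the device's activation state.
#[derive(Debug, Default)]
struct VsockActivation {
    /// Set when the frontend asks the device to load a snapshot.
    transport_reset_pending: bool,
    /// Host and sibling packets are dropped until this becomes true.
    active: bool,
}

impl VsockActivation {
    /// The frontend requested a state load (restore path).
    fn on_load_state(&mut self) {
        self.transport_reset_pending = true;
    }

    /// The event vring was kicked. Returns true if a TRANSPORT RESET
    /// event should be sent to the driver now.
    fn on_event_queue_kick(&mut self) -> bool {
        // `take` returns the old value and leaves `false` behind.
        std::mem::take(&mut self.transport_reset_pending)
    }

    /// The driver finished reading the configuration space.
    /// Assumption: this also clears any pending transport reset.
    fn on_config_read(&mut self) {
        self.transport_reset_pending = false;
        self.active = true;
    }

    /// Whether packets from the host or sibling VMs are accepted.
    fn accepts_packets(&self) -> bool {
        self.active
    }
}
```

The race mentioned above corresponds to `on_config_read` running after `on_load_state` but before `on_event_queue_kick`: the device becomes active even though the driver has not yet processed a reset.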

jemoreira · 2026-02-12T00:07:23Z

Converted to draft to address the test failures

dorindabassey

Thank you for this PR, just left a few comments.

vhost-device-vsock/src/vhu_vsock.rs

jemoreira · 2026-02-26T22:16:41Z

Hello, just a friendly reminder that this PR is ready for review.

vhost-device-vsock/src/vhu_vsock.rs

stefano-garzarella · 2026-02-27T15:43:03Z

vhost-device-vsock/src/vhu_vsock.rs

+
 // Queue mask to select vrings.
-const QUEUE_MASK: u64 = 0b11;
+const QUEUE_MASK: u64 = 0b111;


I'm wondering if this should be controlled by a param in the CLI. IIRC in QEMU the event queue is handled by the vmm.

Did you test this with QEMU?
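
For context, the mask change in the hunk above can be read as follows. This is a sketch with hypothetical constant and function names, assuming each bit of the mask selects the vring with that index and using the virtio-vsock queue order (rx = 0, tx = 1, event = 2).

```rust
/// virtio-vsock virtqueue indices (names are illustrative).
const RX_QUEUE: u32 = 0;
const TX_QUEUE: u32 = 1;
const EVT_QUEUE: u32 = 2;

/// Returns true if `mask` selects the vring with index `queue`.
/// Indices of 64 or more can never be selected by a u64 mask.
fn mask_selects(mask: u64, queue: u32) -> bool {
    1u64.checked_shl(queue).map_or(false, |bit| mask & bit != 0)
}
```

Under that reading, `0b11` selects only the rx and tx queues, while `0b111` also hands the event queue to the backend.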

I didn't test with QEMU, only with CrosVm. I was very surprised to learn that QEMU takes ownership of the third queue. Luckily no cmdline parameter is needed, instead the device always activates after the config is read. This unfortunately deprives us of the ability to avoid a particular race condition (see the details in the commit message), but it should be very unlikely to trigger anyways.

I have now tested that the device works well with QEMU, however when I tried taking a snapshot and restoring it failed with an error saying that the device doesn't support it, even though it advertises the DEVICE_STATE protocol feature. I suspect the issue is in the qemu frontend, not in the backend.

I didn't test with QEMU, only with CrosVm. I was very surprised to learn that QEMU takes ownership of the third queue.

IIRC this was already that way for vhost-vsock, so when we introduced vhost-user-vsock we shared the same code. The VMM knows perfectly well when a snapshot/migration is starting, so why is it strange that QEMU does it?

Luckily no cmdline parameter is needed, instead the device always activates after the config is read. This unfortunately deprives us of the ability to avoid a particular race condition (see the details in the commit message), but it should be very unlikely to trigger anyways.

mmm, so why not have a cmdline parameter to enable this and avoid the race altogether?

I have now tested that the device works well with QEMU, however when I tried taking a snapshot and restoring it failed with an error saying that the device doesn't support it, even though it advertises the DEVICE_STATE protocol feature. I suspect the issue is in the qemu frontend, not in the backend.

yeah, the frontend needs to enable that in some way. Annoying...

why is it strange that QEMU does it?

Because the queue is called "event", not "transport_reset". If other events are added to the virtio-vsock spec then having qemu handling that queue will be a problem. It also seems to me that it doesn't match the intent of vhost-user, where reading and writing the queues is the responsibility of the backend, not the frontend; but that's just my (probably very uninformed) opinion.

mmm, so why not have a cmdline parameter to enable this and avoid the race altogether?

Because the probability of hitting that race is extremely low, possibly 0. I don't know for sure that it's valid to attempt to take a snapshot of an uninitialized driver.

On the other hand, having an extra flag that changes the behavior based on what the VMM does is a significant burden for the users. So far we know what QEMU and CrosVm do, but what about other VMMs? I doubt this behavior is documented anywhere in an accessible manner, so it's possible that the only way to know for sure is to look at the VMM's source code. The user could also just try it with and without the flag and see what happens, but for VMMs that share the event queue with the backend both combinations will work most of the time.

So between a very unlikely race and some usability issues I chose the former, but I'm fine either way. Just let me know your preference.

why is it strange that QEMU does it?

Because the queue is called "event", not "transport_reset". If other events are added to the virtio-vsock spec then having qemu handling that queue will be a problem. It also seems to me that it doesn't match the intent of vhost-user, where reading and writing the queues is the responsibility of the backend, not the frontend; but that's just my (probably very uninformed) opinion.

As I mentioned, this comes from the vhost (in-kernel) device implementation.

mmm, so why not have a cmdline parameter to enable this and avoid the race altogether?

Because the probability of hitting that race is extremely low, possibly 0. I don't know for sure that it's valid to attempt to take a snapshot of an uninitialized driver.

On the other hand, having an extra flag that changes the behavior based on what the VMM does is a significant burden for the users. So far we know what QEMU and CrosVm do, but what about other VMMs? I doubt this behavior is documented anywhere in an accessible manner, so it's possible that the only way to know for sure is to look at the VMM's source code. The user could also just try it with and without the flag and see what happens, but for VMMs that share the event queue with the backend both combinations will work most of the time.

So between a very unlikely race and some usability issues I chose the former, but I'm fine either way. Just let me know your preference.

I think we need to find another way, for example, can we discover if the event queue is offloaded to the device from the frontend or not (e.g. check if the device called set_vring_* etc.) and do this only if it's set up?

vhost-device-vsock/src/vhu_vsock_thread.rs

stefano-garzarella · 2026-02-27T15:49:05Z

vhost-device-vsock/src/vhu_vsock.rs

+            // The last byte of the config is read when the driver is initializing or after it has
+            // processed a transport reset event. Either way, no transport reset will be pending
+            // after this.


mmm, so if a driver decides not to read the configuration, the device will not be activated? Does this conform to the spec?

Also, why do we need to wait for this, and not for example the feature negotiation?

The spec says the driver needs to read the CID from the config space as part of device initialization process. When restoring from a snapshot, the spec says the driver MUST read the config space again because its CID may have changed and drop any existing connections, but keep any listeners, now associated to the new CID. The device informs the driver that a restore/migration occurred by sending the transport reset event.

Because the driver may eventually silently drop any existing connections the device should ensure no new connections are established until after the drop happens or it knows for sure it won't happen. The only common action that happens during a regular boot and a restore is the driver reading the config. The features are also read, but in the case of the restore this is done from the frontend side, not the driver, and happens before the transport reset, so it can't be used as the signal to activate.

I've updated the commit message with this information too.
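
A sketch of how "the driver has read the config" can be detected, in the spirit of the `offset + size == buf.len()` check in the PR. The `ConfigSpace` type is hypothetical, the configuration space is assumed to hold only the 64-bit little-endian guest CID, and the `size > 0` guard is an addition made for this example.

```rust
/// Hypothetical model of the vsock configuration space.
struct ConfigSpace {
    guest_cid: u64,
    /// Becomes true once the last byte of the config has been read.
    activated: bool,
}

impl ConfigSpace {
    /// Returns the requested bytes, or an empty vector when the range is
    /// out of bounds. Reading the last byte marks the device as activated.
    fn read(&mut self, offset: usize, size: usize) -> Vec<u8> {
        let buf = self.guest_cid.to_le_bytes();
        let end = match offset.checked_add(size) {
            Some(end) if end <= buf.len() => end,
            _ => return Vec::new(),
        };
        // Assumption for this sketch: zero-length reads never activate.
        if size > 0 && end == buf.len() {
            self.activated = true;
        }
        buf[offset..end].to_vec()
    }
}
```

A driver that reads the CID in two 4-byte halves activates the device only on the second read; a single 8-byte read activates it immediately.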

Thanks for the info, this helps the review a lot.

vhost-device-vsock/src/vhu_vsock.rs

vhost-device-vsock/src/vhu_vsock_thread.rs

stefano-garzarella · 2026-02-27T15:55:34Z

vhost-device-vsock/src/thread_backend.rs

                let cid_map = self.cid_map.read().unwrap();
                if cid_map.contains_key(&dst_cid) {
-                    let (sibling_raw_pkts_queue, sibling_groups_set, sibling_event_fd) =
+                    let (sibling_raw_pkts_queue_opt, sibling_groups_set, sibling_event_fd) =


Why do we need this change? Can it be split into another commit?

I hope these things are mentioned in the commit description.

This is to prevent packets from sibling VMs from being delivered to the guest driver before a potential transport reset event is processed (just like it's done for packets from the host). If any of these packets are allowed through, a connection could be established that the transport reset would destroy later, but the sibling VM won't know about this. I don't know if just dropping the packet is the best course of action, but that's what the code already does for packets addressed to unknown CIDs and I just wanted to treat this case as if the CID is not reachable yet.

In fact, my first option was to delay adding the whole tuple to cid_map until activation, but the presence of the CID in the map is also used to check something in or near the main function, so I opted for just this instead.

I can split it into its own commit, but it doesn't make a lot of sense on its own. Let me know if you still prefer it separately.
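
The idea of an optional sibling queue can be sketched like this. The types are simplified and hypothetical: the real `cid_map` sits behind a lock and stores a tuple with more fields, and the `Delivery` enum exists only to make the outcome visible.

```rust
use std::collections::{HashMap, VecDeque};

type RawPacket = Vec<u8>;

/// Simplified entry of the CID map; `None` means "not activated yet".
struct Sibling {
    raw_pkts_queue: Option<VecDeque<RawPacket>>,
}

#[derive(Debug, PartialEq)]
enum Delivery {
    Queued,
    DroppedUnknownCid,
    DroppedInactive,
}

/// Deliver a packet to a sibling VM, dropping it if the destination CID
/// is unknown or the destination device has not been activated.
fn deliver(cid_map: &mut HashMap<u64, Sibling>, dst_cid: u64, pkt: RawPacket) -> Delivery {
    match cid_map.get_mut(&dst_cid) {
        None => Delivery::DroppedUnknownCid,
        Some(Sibling { raw_pkts_queue: None }) => Delivery::DroppedInactive,
        Some(Sibling { raw_pkts_queue: Some(queue) }) => {
            queue.push_back(pkt);
            Delivery::Queued
        }
    }
}
```

Keeping the CID in the map while the queue is `None` lets other code still check whether the CID is registered, which matches the reason given above for not delaying the whole insertion.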

stefano-garzarella · 2026-02-27T15:58:08Z

vhost-device-vsock/src/vhu_vsock_thread.rs

+
+    /// Sends a TRANSPORT_RESET event to the guest driver. Returns true if it was able to send it,
+    /// false if there were no buffers available in the vring.
+    pub fn reset_transport(&mut self, vring: &VringRwLock, event_idx: bool) -> Result<bool> {


So, after a reset what will be the state of this device? Can it be re-used or does it need to be stopped?

I'm asking because I don't see any reset of connection etc.

So, yes please describe this a bit better in the commit description and please add a section in the documentation to explain how this live migration is supposed to work.

This isn't a "device reset", the spec calls it a "transport reset", but effectively it just tells the driver that all connections it knew about are gone. Those connections only existed on the driver side, as the device has just been started from scratch when this event is sent.

I've added more details to the commit message. By documentation, do you mean the README.md file? I'm not sure what to add there beyond the fact that the device supports VM migrations as this is usually handled by the VMM itself (not the user directly) and the processes/protocols are described in the virtio and vhost-user specs.
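
For readers unfamiliar with the event itself: in the virtio-vsock specification an event is a single little-endian 32-bit id, and `VIRTIO_VSOCK_EVENT_TRANSPORT_RESET` is id 0. The sketch below writes that event into a driver-provided buffer. `EventQueue` is a hypothetical stand-in for the event vring, and this `reset_transport` only mirrors the documented return convention (true if sent, false if no buffer was available), not the PR's implementation.

```rust
use std::collections::VecDeque;

/// Event id defined by the virtio-vsock specification.
const VIRTIO_VSOCK_EVENT_TRANSPORT_RESET: u32 = 0;

/// Hypothetical stand-in for the event vring.
#[derive(Default)]
struct EventQueue {
    /// Buffers the driver has made available to the device.
    available: VecDeque<Vec<u8>>,
    /// Buffers the device has filled and returned to the driver.
    used: Vec<Vec<u8>>,
}

/// Returns Ok(true) if the event was written, Ok(false) if the driver
/// has not provided any buffer, and Err if the buffer is too small.
fn reset_transport(queue: &mut EventQueue) -> Result<bool, String> {
    let mut buf = match queue.available.pop_front() {
        Some(buf) => buf,
        None => return Ok(false),
    };
    let event = VIRTIO_VSOCK_EVENT_TRANSPORT_RESET.to_le_bytes();
    if buf.len() < event.len() {
        // In this sketch an undersized buffer is simply discarded.
        return Err(format!("event buffer too small: {} bytes", buf.len()));
    }
    buf[..event.len()].copy_from_slice(&event);
    buf.truncate(event.len());
    queue.used.push(buf);
    Ok(true)
}
```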

This isn't a "device reset", the spec calls it a "transport reset", but effectively it just tells the driver that all connections it knew about are gone. Those connections only existed on the driver side, as the device has just been started from scratch when this event is sent.

I'm a bit lost here, especially where this is called.

So we have:

EVT_QUEUE_EVENT => {
    let reset_pending = &mut *self.transport_reset_pending.lock().unwrap();
    if *reset_pending {
        thread.reset_transport(vring_evt, evt_idx)?;
        *reset_pending = false;
    }
}

IIUC this means that it's called when the driver fills the event queue. Why this?
I mean, why would the guest fill up that queue again after a snapshot?

What happens if the guest fills the event queue before we set self.transport_reset_pending?

In QEMU we use it in a different way: when the migration is completed (post-load), we use one of the buffers that the driver already queued in the event_queue.

Here it is not clear to me at all why, after a snapshot/migration, a driver needs to fill the event queue again with free descriptors.
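
One detail of the quoted handler: it clears the flag regardless of the boolean returned by `reset_transport`. A possible variant, shown here only as a sketch with hypothetical names, keeps the reset pending until the event has actually been written. The `try_send` closure stands in for the call to `reset_transport`.

```rust
use std::sync::Mutex;

/// Handle a kick on the event queue. The pending flag is cleared only
/// when `try_send` reports that the event was really sent.
fn on_event_queue_kick<F>(pending: &Mutex<bool>, mut try_send: F) -> Result<(), String>
where
    F: FnMut() -> Result<bool, String>,
{
    let mut pending = pending.lock().unwrap();
    // `try_send` runs only while a reset is pending; errors propagate.
    if *pending && try_send()? {
        *pending = false;
    }
    Ok(())
}
```

With this variant, a kick that arrives while the event queue has no free buffers leaves the flag set, so a later kick can retry.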

I've added more details to the commit message. By documentation, do you mean the README.md file? I'm not sure what to add there beyond the fact that the device supports VM migrations as this is usually handled by the VMM itself (not the user directly) and the processes/protocols are described in the virtio and vhost-user specs.

Yep, a new section in the README.md would be nice, especially to explain how it is supposed to interact with the driver, because I'm still quite confused about it, so I'm pretty sure we will forget about this in the future.

Added the section in the README file.

vhost-device-vsock/src/vhu_vsock_thread.rs

The most common case where this matters is after receiving a VHOST_USER_GET_VRING_BASE message from the client: if one of the host-side applications or sister VMs sends us a message, it must be kept queued until the virtqueues are set to ready again with a kick.

Signed-off-by: Jorge E. Moreira <jemoreira@google.com>

stefano-garzarella

I'm still really confused, especially about why we do this on EVT_QUEUE_EVENT. So a section in the readme that better explains all the snapshot/migration support would be nice to have.

In addition, I think a parameter to enable this only with some VMMs would be better IMO, because I still think what QEMU is doing makes more sense.

Note: I'll be off and with limited internet access from March 7 to March 29, if others want to merge, I'm not against, but I'd like to have a section in the readme with a clear design on how this is going to work.

vhost-device-vsock/src/vhu_vsock_thread.rs

stefano-garzarella · 2026-03-05T10:10:41Z

vhost-device-vsock/src/vhu_vsock.rs


stefano-garzarella · 2026-03-05T10:11:32Z

vhost-device-vsock/src/vhu_vsock.rs

vhost-device-vsock/src/vhu_vsock_thread.rs

stefano-garzarella · 2026-03-05T10:21:38Z

vhost-device-vsock/src/vhu_vsock_thread.rs

stefano-garzarella · 2026-03-05T10:31:10Z

vhost-device-vsock/src/thread_backend.rs

In the commit message there is something to be fixed:
the device must send a TRANSPORT RESET event via the event virtqueue to the device. -> to the driver
The device in turn, upon receiving the event -> The driver in turn

The device doesn't need to save/load any state for this. When loading a snapshot the device simply notes that it must send a TRANSPORT RESET event to the driver as soon as possible, which it then does when it receives a kick on the event vring.

So to recap, this is unclear to me: why would the driver send a kick to the event vring?
Is that in the spec or in the implementation?

In the commit message there is something to be fixed

Oops! too many d-words :). Fixed it.

Why would the driver send a kick to the event vring?

I can't reply to the other comment on this same topic for some reason, so I'll reply here too:

I mean, why would the guest fill up that queue again after a snapshot?

What happens if the guest fills the event queue before we set self.transport_reset_pending?

The driver probably doesn't (necessarily) push more buffers or send a kick after a restore; the frontend does (or should, at least CrosVm does).

I'll admit I haven't looked at QEMU's source in depth, but I'd imagine that if the rx queue has buffers in it, a kick is sent to the device after restore to activate that queue; otherwise the device won't be able to send data to the driver until the driver pushes more buffers, which it has no reason to do since the ones it already pushed are just sitting there.

The backend could avoid waiting for the kick if it stored the queue state in its saved state, but there is still the question of when to send the buffer: Immediately after loading its state and replying to the "check" call? Wouldn't that be too soon, for example if it doesn't yet have call fd?

I went looking at the QEMU source code after all, or to be precise I asked Gemini to look for me. It seems QEMU does not in fact send that kick like crosvm does and instead depends on the driver to kick the queues when responding to the transport event OR on the device implementation to store the state of the queue. The virtio spec doesn't mention that the driver should kick the queues on transport reset though. In any case this approach seems to work well with the vhost backend, but the vhost-user backend is marked by qemu as unmigratable.

Other vhost-user devices, like vhost-user-blk appear to have a "kick right away" logic similar to what crosvm does for vhost-user-vsock. So, maybe it would be a good idea to add it in vhost-user-vsock too.

In the commit message there is something to be fixed

Oops! too many d-words :). Fixed it.

Why would the driver send a kick to the event vring?

I can't reply to the other comment on this same topic for some reason, so I'll reply here too:

I mean, why would the guest fill up that queue again after a snapshot?

What happens if the guest fills the event queue before we set self.transport_reset_pending?

The driver probably doesn't (necessarily) push more buffers or send a kick after a restore; the frontend does (or should, at least CrosVm does).

Sorry, why should a frontend inject a kick? This seems more like a hack to me.

A kick should mean: "there is something new in the avail ring, please process it". What new stuff is in the vring after that kick?

I'll admit I haven't looked at QEMU's source in depth, but I'd imagine that if the rx queue has buffers in it, a kick is sent to the device after restore to activate that queue; otherwise the device won't be able to send data to the driver until the driver pushes more buffers, which it has no reason to do since the ones it already pushed are just sitting there.

The device IMO should know that it is starting with a driver already initialized and should not wait for a kick, but start processing the queue right after starting.

That said, for a TX queue it may make sense, but we are not using it in that sense at all in this case, since no new buffers will be there. Also, we are not processing anything from the guest, so I'm still really confused about why a kick is needed if the device doesn't need to process anything from the guest; this seems more like a frontend -> backend notification. Again, this seems purely a hack, and should not be enabled by default IMHO.

The backend could avoid waiting for the kick if it stored the queue state in its saved state, but there is still the question of when to send the buffer: Immediately after loading its state and replying to the "check" call? Wouldn't that be too soon, for example if it doesn't yet have call fd?

What about VHOST_USER_SET_VRING_ENABLE event? (maybe we need to extend the

The [virtio spec](https://docs.oasis-open.org/virtio/virtio/v1.3/csd01/virtio-v1.3-csd01.html#x1-4950001) mandates that when restoring from a snapshot (such as when the VM was migrated to a different host) the device must send a TRANSPORT RESET event via the event virtqueue to the driver. The driver in turn, upon receiving the event, must drop all existing connections while keeping all listeners and read the CID again from the device configuration space and update the listeners with the (potentially different) CID.

On the device side care must be taken to ensure no new connections are established until that transport reset event is handled, otherwise the guest will silently drop them and the peer may not know about it. Given that the driver must read the CID from the configuration space during normal boot and after doing a reset, that's the best signal for the device to "activate" and start processing packets from the host and sibling VMs. Feature negotiation also happens during both normal boot and restore, but in the case of restore it happens before the transport reset and therefore can't be used as a reliable signal to "activate" the device.

The device doesn't need to save/load any state for this. When loading a snapshot the device simply notes that it must send a TRANSPORT RESET event to the driver as soon as possible, which it then does when it receives a kick on the event vring. Independently of whether the device is started to restore a previous VM state or brand new (the device actually doesn't know until `set_device_state_fd` is called), the device always starts in "inactive" state, meaning it will drop any packets coming from any source. As mentioned before, the device "activates" once the driver has read the configuration space.

Unlike other VMMs, QEMU takes ownership of the event vring and handles sending of the TRANSPORT RESET event itself. This change attempts to handle both approaches by sending the TRANSPORT RESET event if the vring is kicked, but doesn't treat it as a precondition to activate the device, instead choosing to activate unconditionally once the driver reads the configuration space.

There is a race in this implementation that occurs when the snapshot was taken before the guest driver read the configuration and then it reads immediately upon restore, before the transport reset event was handled. While this could be mitigated by waiting for the reset event to be sent AND the config to be read AFTER the event was sent, QEMU's decision to take over the event queue makes that approach unviable. Hopefully, it will be very unlikely a snapshot will be taken before the driver is fully initialized, as that would be of very little value in practice.

Signed-off-by: Jorge E. Moreira <jemoreira@google.com>

jemoreira · 2026-03-06T00:32:47Z

Note: I'll be off and with limited internet access from March 7 to March 29, if others want to merge, I'm not against, but I'd like to have a section in the readme with a clear design on how this is going to work.

I am also going to be on vacation during that same period, we can resume after the 29th if you're still not convinced about this. I tried to address every comment you left with the latest push, except the command line flag because I think everyone is just going to run it in "qemu mode", see that it works with other VMMs too and never use it.

stefano-garzarella · 2026-03-06T08:26:34Z

vhost-device-vsock/README.md


+## Live migration
+
+This device implementation advertises support for live migrations by offering the VHOST_USER_PROTOCOL_F_DEVICE_STATE protocol feature, however this doesn't work with Qemu yet as it marks its vsock frontend as "unmigratable". This feature does work with CrosVm and potentially other virtual machine managers.


Okay, for example, if this feature is not negotiated, we can skip all of this, no?

stefano-garzarella · 2026-03-06T08:30:02Z

vhost-device-vsock/README.md

+
+The state saving flow is trivial as the device doesn't save any state as mentioned.
+
+The state loading flow is a bit more complicated because the virtio-vsock spec mandates that the device must send a VIRTIO_VSOCK_EVENT_TRANSPORT_RESET event to the driver. During a restore the backend is started no differently than during a regular boot. When the frontend sends the VHOST_USER_SET_DEVICE_STATE_FD command with LOAD direction the backend doesn't load anything, but it takes note that a transport reset event needs to be sent to the driver via the event vring when possible. In order to make sure this event is sent when the queue is ready, the backend waits for the event queue to be kicked before sending the event. While these kicks usually come from the driver, this particular one is actually sent by the vhost-user frontend. This implementation depends on the frontend to kick all queues with pending buffers after a restore because the driver is unlikely to do so as it probably did it before the snapshot was taken.


In order to make sure this event is sent when the queue is ready, the backend waits for the event queue to be kicked before sending the event. While these kicks usually come from the driver, this particular one is actually sent by the vhost-user frontend.

This is the hack IMO. The kick means: "hey, I'm the driver and there is a new available buffer for you in the virtqueue", while here we are using it for something completely different that is not even in the spec. If this is what we want from the frontend (and I don't understand why we want this), we should put it into the spec and not guess here that we will receive that event.

stefano-garzarella · 2026-03-06T08:32:38Z

vhost-device-vsock/src/vhu_vsock.rs

            return Vec::new();
        }

+        if offset + size == buf.len() {


Instead of doing this, can we add a set_enabled callback to the VhostUserBackend trait and do this in that callback?

When would that set_enabled callback be called? This has to happen after the config is read because that's the only signal (according to the spec) the driver gives the device that the transport reset happened.

Also another question: why should a driver read the config space after a live migration?

For the driver it should be transparent, no?

Ah, okay, here we are acking that the driver reset it. I need to reread it; all of those assumptions are not clear to me and are also not in the VIRTIO spec at all.

Should we extend it?

stefano-garzarella · 2026-03-06T08:35:38Z

vhost-device-vsock/src/vhu_vsock.rs

+                let reset_pending = &mut *self.transport_reset_pending.lock().unwrap();
+                if *reset_pending {
+                    thread.reset_transport(vring_evt, evt_idx)?;
+                    *reset_pending = false;
+                }


Just to be clear, this should not be here IMO, but in some other place, because here we should handle events coming from the driver (or backends like unix socket, etc.).
This should be done in response to some vhost-user message, like the set_enabled, etc., not to a random kick injected by the frontend.

This is the only place where the backend implementation is given access to the vrings.

The spec clearly says that "the back-end must start a ring upon receiving a kick", so without this kick the vring will not be ready and any attempt to write to it will simply fail with the NoReady error.

I see, but maybe we need to find something else. Or put this behavior into the spec because we are really implementing something custom for crosvm. (I'll try to find something better when I'm back)

Also, why not do this check in any case? I mean for every event?

What is confusing me is that we are using a kick to do something else. If we need a notification mechanism between frontend and backend, we should add a new message, but we should not reuse the kick, which should only be used by the driver to notify the device.

stefano-garzarella · 2026-03-06T08:37:31Z

Note: I'll be off and with limited internet access from March 7 to March 29. If others want to merge, I'm not against it, but I'd like to have a section in the readme with a clear design on how this is going to work.

I am also going to be on vacation during that same period; we can resume after the 29th if you're still not convinced about this. I tried to address every comment you left with the latest push, except the command line flag, because I think everyone is just going to run it in "qemu mode", see that it works with other VMMs too, and never use the flag.

Agree on that, I pointed out what is unclear to me. I think we can find a solution, but I don't like the kick injection TBH and I think we should avoid that. Let's continue when we are back ;-)

Enjoy your time off!

jemoreira requested review from dorindabassey, epilys, stefano-garzarella, stsquad and vireshk as code owners February 11, 2026 23:35

jemoreira changed the title ~~Suspend resume~~ vsock: Enable live migrations (snapshot-restore) Feb 11, 2026

jemoreira force-pushed the suspend_resume branch from ef051b4 to ba8a093 Compare February 11, 2026 23:39

jemoreira marked this pull request as draft February 12, 2026 00:07

jemoreira force-pushed the suspend_resume branch 2 times, most recently from d8fce24 to 2bcfd05 Compare February 13, 2026 00:17

jemoreira marked this pull request as ready for review February 13, 2026 00:17

jemoreira force-pushed the suspend_resume branch from 2bcfd05 to 04ec2db Compare February 13, 2026 19:12

dorindabassey reviewed Feb 17, 2026

vhost-device-vsock/src/vhu_vsock.rs (outdated, resolved)

vhost-device-vsock/src/vhu_vsock.rs (resolved)

jemoreira force-pushed the suspend_resume branch from 04ec2db to f2d4a4a Compare February 18, 2026 23:24

jemoreira requested a review from dorindabassey February 18, 2026 23:39

stefano-garzarella reviewed Feb 27, 2026

vhost-device-vsock/src/vhu_vsock_thread.rs (resolved)

jemoreira force-pushed the suspend_resume branch from f2d4a4a to ed40b69 Compare March 4, 2026 01:41

jemoreira requested a review from stefano-garzarella March 4, 2026 19:55

stefano-garzarella reviewed Mar 5, 2026

jemoreira force-pushed the suspend_resume branch from ed40b69 to c6725ee Compare March 6, 2026 00:23

stefano-garzarella reviewed Mar 6, 2026


## Live migration

This device implementation advertises support for live migration by offering the VHOST_USER_PROTOCOL_F_DEVICE_STATE protocol feature. This doesn't work with QEMU yet, because QEMU marks its vsock frontend as "unmigratable". The feature does work with crosvm and potentially with other virtual machine managers.

The state saving flow is trivial because the device doesn't save any state.

The state loading flow is a bit more complicated because the virtio-vsock spec mandates that the device must send a VIRTIO_VSOCK_EVENT_TRANSPORT_RESET event to the driver. During a restore the backend is started no differently than during a regular boot. When the frontend sends the VHOST_USER_SET_DEVICE_STATE_FD command with LOAD direction, the backend doesn't load anything, but it takes note that a transport reset event needs to be sent to the driver via the event vring when possible. To make sure this event is sent only when the queue is ready, the backend waits for the event queue to be kicked before sending the event. While these kicks usually come from the driver, this particular one is actually sent by the vhost-user frontend. This implementation depends on the frontend to kick all queues with pending buffers after a restore, because the driver is unlikely to do so itself: it probably already kicked them before the snapshot was taken.
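
The loading flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this change: `MockEventVring`, `Backend`, `on_load_state`, and `on_event_queue_kick` are hypothetical names, and the vring is a mock rather than the real vring type. Only the event id value (0, from the virtio-vsock spec) and the `transport_reset_pending` flag name come from the spec and the diff respectively.

```rust
use std::sync::Mutex;

/// Event id defined by the virtio-vsock spec.
pub const VIRTIO_VSOCK_EVENT_TRANSPORT_RESET: u32 = 0;

#[derive(Debug, PartialEq)]
pub enum SendError {
    /// The ring has not been started by a kick yet.
    NotReady,
}

/// Hypothetical stand-in for the event vring.
#[derive(Default)]
pub struct MockEventVring {
    ready: bool,
    pub sent_events: Vec<u32>,
}

impl MockEventVring {
    /// A kick starts the ring, making it usable by the backend.
    pub fn kick(&mut self) {
        self.ready = true;
    }

    /// Writing to a ring that is not ready fails instead of queuing.
    pub fn send_event(&mut self, id: u32) -> Result<(), SendError> {
        if !self.ready {
            return Err(SendError::NotReady);
        }
        self.sent_events.push(id);
        Ok(())
    }
}

/// Hypothetical backend holding only the "reset pending" note.
pub struct Backend {
    transport_reset_pending: Mutex<bool>,
}

impl Backend {
    pub fn new() -> Self {
        Self { transport_reset_pending: Mutex::new(false) }
    }

    /// Models VHOST_USER_SET_DEVICE_STATE_FD with LOAD direction:
    /// nothing is loaded, the backend only notes that a reset is owed.
    pub fn on_load_state(&self) {
        *self.transport_reset_pending.lock().unwrap() = true;
    }

    pub fn reset_pending(&self) -> bool {
        *self.transport_reset_pending.lock().unwrap()
    }

    /// Models the event-queue kick handler: the ring becomes ready, and a
    /// pending transport reset is sent exactly once.
    pub fn on_event_queue_kick(&self, vring: &mut MockEventVring) -> Result<(), SendError> {
        vring.kick();
        let mut pending = self.transport_reset_pending.lock().unwrap();
        if *pending {
            vring.send_event(VIRTIO_VSOCK_EVENT_TRANSPORT_RESET)?;
            *pending = false;
        }
        Ok(())
    }
}
```

In this sketch the flag is cleared only after the event was written successfully, so a failed send leaves the reset pending for the next kick.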
