Add support for virtio device reset#5891
Conversation
firecracker-microvm#5891 Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #5891 +/- ##
==========================================
- Coverage 83.01% 82.80% -0.21%
==========================================
Files 277 277
Lines 30129 30221 +92
==========================================
+ Hits 25012 25025 +13
- Misses 5117 5196 +79
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
| for queue in self.queues_mut() { | ||
| *queue = Queue::new(queue.max_size); | ||
| } | ||
| self._reset() |
There was a problem hiding this comment.
should this be called before the rest? This is a semantic change from the old code where a failed reset left the device activated.
There was a problem hiding this comment.
It left it activated in Firecracker but set the state as FAILED so in theory the guest shouldn't continue using it. But does it matter much since we'll support reset for all devices except for vhost-user-block (for now)?
There was a problem hiding this comment.
deactivate(), set_acked_features(0), and the queue wipe all run before _reset() is consulted. For a backend that returns false (vhost-user-block), the VMM-side state is already mutated while the external backend was never told to stop
There was a problem hiding this comment.
Ok, I re-ordered the links and made reset() call _reset() first thing. However, obviously the device will end up in a FAILED state regardless.
| self.device_state.is_activated() | ||
| } | ||
|
|
||
| fn deactivate(&mut self) { |
There was a problem hiding this comment.
what about the rate limiters? are we assuming we want to preserve the state across reset?
There was a problem hiding this comment.
I think we must since otherwise guest can bypass rate limiter by resetting
firecracker-microvm#5891 Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
firecracker-microvm#5891 Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
firecracker-microvm#5891 Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
firecracker-microvm#5891 Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
firecracker-microvm#5891 Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
firecracker-microvm#5891 Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
firecracker-microvm#5891 Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
firecracker-microvm#5891 Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
firecracker-microvm#5891 Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
firecracker-microvm#5891 Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
| for queue in self.queues_mut() { | ||
| *queue = Queue::new(queue.max_size); | ||
| } | ||
| self._reset() |
There was a problem hiding this comment.
deactivate(), set_acked_features(0), and the queue wipe all run before _reset() is consulted. For a backend that returns false (vhost-user-block), the VMM-side state is already mutated while the external backend was never told to stop
| } | ||
|
|
||
| fn _reset(&mut self) -> bool { | ||
| false |
There was a problem hiding this comment.
Returning false here combined with the new MMIO path means any status=0 write after ACKNOWLEDGE permanently sets FAILED for vhost-user-block.
There was a problem hiding this comment.
As noted in the other comment, even before this PR we'd set the device to FAILED state if we got a reset after DRIVER_OK. Does it matter now that this can happen after ACKNOWLEDGE?
| } else if status == INIT { | ||
| { | ||
| let mut locked_device = self.device.lock().expect("Poisoned lock"); | ||
| if locked_device.is_activated() { |
There was a problem hiding this comment.
Was dropping the is_activated() guard intentional? Previously the device-level reset only ran when is_activated(). Now it runs whenever device_status != INIT. Combined with _reset() -> false backends this set FAILED for resets issued during early init
There was a problem hiding this comment.
It was intentional yes, as it was explicitly called out in the commit message. As far as I can tell the "activated" state is a Firecracker term rather than a virtio spec term. I don't think the spec says we must ignore resets before reaching a certain state such as DRIVER_OK. It's true that for vhost-user-block an early reset will lead to a FAILED state, but is this a problem? Already before this PR a reset after DRIVER_OK would lead to a FAILED state, does it matter that it now leads to a FAILED state even before DRIVER_OK?
| fn deactivate(&mut self); | ||
|
|
||
| /// Reset the device. Returns true on success, false otherwise. | ||
| /// It must not be overridden. |
There was a problem hiding this comment.
Should "must not be overridden" be enforced somehow?
There was a problem hiding this comment.
I felt the same, but unfortunately Rust doesn't seem to have a final keyword like C++. I'm open to suggestions though if you can think of a different way to do it.
The reset() trait method returns Option<(Arc<dyn VirtioInterrupt>, Vec<EventFd>)> for no apparent reason and the values are never used by any caller. Change the return value to a bool to signify success or failure. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Simplify the MMIO transport reset logic by always performing the MMIO transport reset unconditionally and calling device.reset() regardless of activation state. This requires updating the affected unit tests and implementing reset() for the DummyDevice. The unit test assumed that reset() is never supported (i.e. always returns false), but that is going to change soon so we want to make sure that resetting moves the device to the deactivated state. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
The reset success path locked self.device, then tried to lock it again via self.virtio_device().lock() to reset queues. This was a deadlock that was never triggered because no device previously implemented reset() (all returned false). Use the already-held guard instead. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Don't clear virtio_interrupt on reset. The interrupt object holds Arc references to the PCI device's MSI-X configuration which remains valid across reset. Clearing it will cause a panic at the unwrap() in the activation path when the guest re-probes the device after reset. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Reset device_feature_select and driver_feature_select to 0 on device reset, matching what the MMIO transport does with features_select and acked_features_select. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Add test_bus_device_reset_failure to verify that when device.reset() returns false, the FAILED bit is set in device_status. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Add a deactivate() method that sets the device state to Inactive. This will be used by the generic reset implementation in a subsequent patch. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Move the queue reset logic from the MMIO and PCI transport code into the default reset() implementation in the VirtioDevice trait. This is generic virtio state that should be reset for all devices, regardless of transport. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Complete the reset() implementation by deactivating the device and setting acked_features to 0. This must happen for all virtio backends. Introduce a _reset() method that is called at the beginning of reset() and can be overridden by virtio backends with backend specific reset code. Make all backends return false for now since none of them supports reset yet. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Add a clear() method to RxBuffers that resets all fields to their initial state. This will be used by the virtio-net reset implementation. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Implement the _reset() method for the virtio-net device, resetting the net specific state in-place. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Add a test that verifies virtio-net device reset works end-to-end by unbinding and rebinding the guest driver and checking the device remains functional. Suggested-by: Adam Jensen <adam@acj.sh> Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Implement the _reset() method for the virtio-block device, resetting the device-specific state in-place. For the async io_uring engine, replace the ring with a fresh one so that any in-flight operations are cancelled by the kernel when the old ring fd is closed. This only adds support for the Virtio backend for now and not VhostUser. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Add a test that verifies virtio-block device reset works end-to-end by unbinding and rebinding the guest driver for a scratch block device and checking the device remains functional. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Pmem does not have any backend specific state that needs resetting so implement the _reset() method by simply returning true. The rest of the state is handled by the generic reset(). Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Add a test that verifies virtio-pmem device reset works end-to-end by unbinding and rebinding the guest driver and checking the device remains functional. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Implement the _reset() method for the virtio-balloon device, resetting the balloon specific state in-place. Do not deflate the balloon on reset. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Add a test that verifies virtio-balloon device reset works end-to-end by unbinding and rebinding the guest driver and checking the device remains functional. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
The entropy device does not have any backend specific state that needs resetting so implement the _reset() method by simply returning true. The test_failed_reset_blocks_reinitialization() test used an entropy device assuming it doesn't support reset. Since it now does, adjust the test to check that the device is first deactivated after a reset and then the driver can perform the initialization sequence again. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Add a test that verifies virtio-rng device reset works end-to-end by unbinding and rebinding the guest driver and checking the device remains functional. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Implement the _reset() method for the virtio-vsock device, resetting the vsock specific state in-place. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Add a test that verifies virtio-vsock device reset works end-to-end by unbinding and rebinding the guest driver and checking the device remains functional. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
The virtio-mem device does not have any backend specific state that needs resetting so implement the _reset() method by simply returning true. Plugged memory regions remain intact. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Add a test that verifies virtio-mem device reset works end-to-end by unbinding and rebinding the guest driver and checking the device remains functional. Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
firecracker-microvm#5891 Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Add support for virtio device reset
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.PR Checklist
tools/devtool checkbuild --allto verify that the PR passesbuild checks on all supported architectures.
tools/devtool checkstyleto verify that the PR passes theautomated style checks.
how they are solving the problem in a clear and encompassing way.
in the PR.
CHANGELOG.md.Runbook for Firecracker API changes.
integration tests.
TODO.rust-vmm.