Viostor: Fix a potential memory corruption caused by writing SRB result #1326
Conversation
Can one of the admins verify this patch?
Hello, it is still a draft. I rebased the commit from our release branch, which meant applying some recent changes, including the clang formatting. So I would like to test that I did not break anything with the rebase. Of course, I would be happy if someone looks at my changes, since my solution has the drawback of limiting the number of SRBs being processed by the driver at once. A similar issue is also present in vioscsi, and I plan to port my changes to that driver as well.
Force-pushed from 25f6b31 to 8452185
…t status

When passing an SRB to the host, such as a read or write request, the buffers to be read/written are passed in the form of so-called scatter-gather lists (SG lists), containing the physical addresses and lengths of the buffers. The initial SG lists are passed to viostor (and vioscsi as well) by Storport as part of the SRB. Viostor/vioscsi enlarges each SG list by two small buffers:
- one to specify the command for the host,
- one to receive the result.

However, if an SRB is completed as part of a bus reset request to the driver, the memory to receive the SRB's result no longer belongs to the SRB (since the SRB was completed). The host, however, does not know that, so when it completes its work, it writes the result to a memory location no longer owned by the driver.

This commit is a remedy for such memory corruption. Memory for receiving the results of SRB commands is now part of the adapter extension. It seems that memory allocations during request processing (i.e. in the BuildIo callback) are not possible for miniports servicing boot disks, so the space reserved for receiving SRB results is limited. This means there is an upper limit on the number of SRBs being processed by the driver at once.

Signed-Off-By: Martin Drab <[email protected]>
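To make the approach a bit more concrete, here is a minimal, hedged sketch of what a fixed-size result pool kept in the adapter extension could look like. All names, the slot count, and the locking scheme are invented for illustration and do not come from the actual patch.

/* Illustrative sketch only -- names, sizes and locking are assumptions, not the actual patch. */
#define STATUS_SLOT_COUNT 512                    /* assumed upper bound on SRBs in flight */

typedef struct _STATUS_SLOT_POOL
{
    UCHAR         Status[STATUS_SLOT_COUNT];     /* one byte per request, written by the host */
    volatile LONG InUse[STATUS_SLOT_COUNT];      /* 0 = free, 1 = owned by an in-flight request */
} STATUS_SLOT_POOL, *PSTATUS_SLOT_POOL;

/* Grab a slot for a new request; -1 means the pool is exhausted and the
   request has to be deferred (e.g. completed with SRB_STATUS_BUSY). */
static LONG AllocStatusSlot(PSTATUS_SLOT_POOL Pool)
{
    LONG i;
    for (i = 0; i < STATUS_SLOT_COUNT; ++i)
    {
        if (InterlockedCompareExchange(&Pool->InUse[i], 1, 0) == 0)
        {
            return i;
        }
    }
    return -1;
}

/* Release the slot only once the host has written the status byte,
   not merely because Storport completed the SRB. */
static VOID FreeStatusSlot(PSTATUS_SLOT_POOL Pool, LONG Index)
{
    InterlockedExchange(&Pool->InUse[Index], 0);
}

Because the pool lives in the adapter extension, the host can always write the status byte into driver-owned memory, at the cost of the fixed upper bound on outstanding requests mentioned above.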
Force-pushed from 8452185 to a8d0d42
Just to explain the "Can one of the admins verify this patch?" comment. According to the new Red Hat policy, we have stopped running CI by default for PRs from external contributors. What does this mean? When you send a PR, the CI will not be triggered automatically. The comment "test this please" from an admin triggers a CI run.
test this please
Ah, I see, thank you for the explanation.
Thanks for sharing your patch, Martin. Just so we are on the same page, can you clarify / specify the following? You mentioned:
Can you please share what we call these in
It is our understanding that all StorPort memory allocations should only be requested following initialisation and before
Have you been able to determine what the upper bound is, whether mathematically or in testing? Have you undertaken any comparative performance testing with and without this patch?

We did see value in refactoring the several disparate SRB routines into

For example, instead of:

VioStorVQLock(DeviceExtension, MessageId, &LockHandle, FALSE);
...
VioStorVQUnlock(DeviceExtension, MessageId, &LockHandle, FALSE);

...you might use:

VioStorVQLock(DeviceExtension, QUEUE_TO_MESSAGE(QueueNumber), &LockHandle, FALSE);
...
VioStorVQUnlock(DeviceExtension, QUEUE_TO_MESSAGE(QueueNumber), &LockHandle, FALSE);

... and call

Can you please confirm the issue definitely exists in

Also, does this PR also attempt to fix the issue that PR #1295 was supposed to fix?
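For context, QUEUE_TO_MESSAGE is used here as the usual mapping between a virtqueue index and its MSI-X message number. A minimal sketch follows, assuming the common convention that message 0 is reserved for configuration/control interrupts; the exact definition in the driver may differ.

/* Sketch only -- the real macro in viostor/vioscsi may be defined differently. */
#define QUEUE_TO_MESSAGE(QueueId) ((QueueId) + 1) /* message 0 assumed reserved for config changes */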
Hello,
Look into
As far as I know, this does not break compatibility with StorPort, since the elements of the original SG list are not modified; they are just passed as part of another list.

NOTE: I looked into the

I have a patch ready for this as well and will add it to this PR shortly.
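As a rough illustration of the wrapping described above, the sketch below shows the general virtio-blk request layout: a request header, the Storport-provided data elements passed through unmodified, then a one-byte status written by the device. None of the type or helper names are from the actual driver; this is a hedged stand-in only.

/* Illustrative sketch of building the virtio SG list around the Storport one.
   Types and helpers here are simplified stand-ins, not the driver's real API. */
typedef struct _VIO_SG
{
    ULONGLONG PhysAddr;
    ULONG     Length;
} VIO_SG;

static ULONG BuildVirtioSgList(VIO_SG *vsg,              /* output array, assumed large enough */
                               ULONGLONG hdrPhys,        /* physical address of the request header */
                               ULONG hdrLen,
                               const VIO_SG *storportSg, /* elements taken from the Storport SGL */
                               ULONG storportCount,
                               ULONGLONG statusPhys)     /* physical address of the 1-byte status */
{
    ULONG i, n = 0;

    vsg[n].PhysAddr = hdrPhys;    /* out: command header read by the host */
    vsg[n].Length   = hdrLen;
    n++;

    for (i = 0; i < storportCount; i++)   /* data buffers, passed through unmodified */
    {
        vsg[n++] = storportSg[i];
    }

    vsg[n].PhysAddr = statusPhys; /* in: status byte written by the host on completion */
    vsg[n].Length   = 1;
    n++;

    return n; /* total descriptor count handed to the virtqueue */
}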
Basically, the main problem here is this:
Remove (2) and things should get much better. It can be done for

My PRs only deal with the potential memory corruption. The right way to fix the issue (in my opinion) would be to:
AFAIK, there is currently no way for viostor to signal a bus reset to the host, so a modification on the host side is required (or I am missing something).
Force-pushed from 28d796f to 7782fd4
When a pending request gets completed by the driver before it is handled by the host, it may happen that the host has not read the command header yet. This means a read from physical memory no longer owned by the driver. Moving the physical memory holding the command header into the Status table as well ensures that the physical memory is always owned by the driver when the host code reads from it.

Signed-Off-By: Martin Drab <[email protected]>
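Putting the two fixes together, a per-request slot in the adapter extension might conceptually look like the following. The structure and field names are hypothetical and only meant to illustrate that both the host-read header and the host-written status stay driver-owned for the lifetime of the adapter.

/* Hypothetical layout -- illustrates the idea, not the actual patch. */
typedef struct _VBR_SLOT
{
    /* Read by the host: request type, sector, etc. (virtio-blk request header). */
    UCHAR CommandHeader[16];

    /* Written by the host on completion (OK / I/O error / unsupported). */
    UCHAR Status;

    /* Non-zero while an in-flight request references this slot. */
    volatile LONG InUse;
} VBR_SLOT, *PVBR_SLOT;

typedef struct _ADAPTER_EXTENSION_SKETCH
{
    /* Lives as long as the adapter, so the host can never read from or write
       into memory that belonged to an already-completed SRB. */
    VBR_SLOT Slots[512];
} ADAPTER_EXTENSION_SKETCH;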
Force-pushed from 7782fd4 to e35ba95
OK, I also moved the physical memory used to transfer the command header (
test this please
@MartinDrab All the best,
@benyamin-codez @MartinDrab Thanks,
@MartinDrab @vrozenfe @wangyan0507

Thank you for your posts.

Martin, I just wanted to say we are on the same page regarding the wrapping of the StorPort-provided SG list.

Vadim, I agree it would indeed be helpful if you could share some of your IOMMU / DMAv3 work for the reasons you mentioned. There appear to be a few issues related to SG list mechanics at present, and I'm sure sharing your work would assist anyone scoping assessment or development work in this space.

Obviously we have a problem if StorPort destroys the SRB artefacts while we are still relying on data stored in the extension, including in the SG list. Having said that, if the SRB has been completed, then we shouldn't have any need for the SG list, whether this is the StorPort-generated addresses at

So, absent a solution in the SG list mechanics, if we were to focus on the SRB completion mechanics during reset instead, there are a couple of perhaps prudent observations:

Martin mentioned the

// viostor
BOOLEAN
VirtIoResetBus(IN PVOID DeviceExtension, IN ULONG PathId)
{
    UNREFERENCED_PARAMETER(PathId);
    PADAPTER_EXTENSION adaptExt;                     // <-- Orphan...?
    adaptExt = (PADAPTER_EXTENSION)DeviceExtension;  // <-- Orphan...?
    CompletePendingRequests(DeviceExtension);
    return TRUE;
}

// vioscsi
BOOLEAN
VioScsiResetBus(IN PVOID DeviceExtension, IN ULONG PathId)
{
    UNREFERENCED_PARAMETER(PathId);
    // CompletePendingRequestsOnReset(DeviceExtension); // <-- This line is perhaps missing
    return DeviceReset(DeviceExtension);
}

Here, I suggest that

My other observation, which I have mentioned elsewhere, is something common to both

I'm a couple of days away from running up my

Best regards,

P.S.: Just a further note wrt the mechanics of the
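For readers less familiar with what "completing pending requests" involves, here is a simplified, hedged sketch of the general pattern: walk a per-adapter list of outstanding requests and complete each SRB with a bus-reset status. The list layout and names below are hypothetical and do not reflect the actual CompletePendingRequests implementation.

/* Hedged sketch -- the list layout and names are hypothetical, not the driver's own. */
typedef struct _SRB_EXTENSION_SKETCH
{
    LIST_ENTRY          ListEntry; /* links the request into a per-adapter pending list */
    PSCSI_REQUEST_BLOCK Srb;       /* back-pointer to the owning SRB */
} SRB_EXTENSION_SKETCH, *PSRB_EXTENSION_SKETCH;

static VOID CompletePendingRequestsSketch(PVOID DeviceExtension, PLIST_ENTRY PendingList)
{
    while (!IsListEmpty(PendingList))
    {
        PLIST_ENTRY entry = RemoveHeadList(PendingList);
        PSRB_EXTENSION_SKETCH srbExt = CONTAINING_RECORD(entry, SRB_EXTENSION_SKETCH, ListEntry);
        PSCSI_REQUEST_BLOCK srb = srbExt->Srb;

        srb->SrbStatus = SRB_STATUS_BUS_RESET;        /* request aborted by the reset */
        StorPortNotification(RequestComplete, DeviceExtension, srb);
    }
}

Note that completing the SRB this way is exactly what makes the result buffer go out of scope while the host may still write to it, which is the corruption this PR targets.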
Well, I do not think this is the right solution (although the documentation states that you should complete pending SRBs on bus reset). Let's say, for example, a bus reset callback is invoked and there are 5 pending SRBs, so we complete them. Meanwhile, some of them may also get completed by the host but this information does not reach Storport (it gets informed about
Yes, but AFAIK QEMU's code related to viostor (virtio-blk.c) does not expect such a request (unlike vioscsi); thus, I am afraid we cannot achieve this only from the driver side. It would be great to have such a mechanism, since then the following scenario might be possible:
(at least I hope such a scenario would be possible).
Such behavior is an error if it happens after
I am slowly starting to believe that someone ran into the same/similar issues in the past and
I agree. This PR is not a reaction/solution to #907.
Thanks for the follow up, Martin.
So I think I missed the mark with this one. This is just setting the SRB status of the SRB that called for the reset, not the pending SRBs - unless it also completes itself... 8^O That would probably be bad, yes...? It's probably worth a check via trace, or failing that, in the debugger...
In

This all occurs under

It was my understanding that StorPort would then retry the transfer, but perhaps this is the host (the requesting process) instead. Regardless, at some point StorPort may report an exception to the host and the host may retry the request anyway. There could potentially be a problem if we time out (exceed 10s) and StorPort resumes even though we are still holding a spinlock. Although it would probably be a rare event to ever exceed the timeout, we should probably test without the

@wangyan0507, during pending request completion
Yes, I was just highlighting they seem to share a common factor, i.e. SRB operations under / following reset...
The solution does seem to have a bit of a work-around vibe, especially with the bugcheck in the mix too. It is possible our pending request completion mechanics might not be using the correct method... We should probably research this.

I think it is worth suggesting that part of the problem might also be that we aren't considering the multi-tier reset mechanics present in StorPort. Presently we just:

case SRB_FUNCTION_RESET_BUS:
case SRB_FUNCTION_RESET_DEVICE:
case SRB_FUNCTION_RESET_LOGICAL_UNIT:
{
    // none or some reset action
    return true;
}

So, e.g., when StorPort requests a bus or device reset, having a LUN reset occur instead might result in undefined behaviour, and this might be the case for other combinations also. This could be due to a legacy inheritance from the SCSI Port driver, as we see in

Anyway, if we keep poking around, I'm sure we'll work it out.
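To make the "multi-tier" point concrete, a hedged sketch of handling the three reset functions separately is shown below. The ResetLogicalUnit/ResetDevice/ResetBus helpers, and the srbStatus variable, are hypothetical placeholders rather than functions from the current drivers.

/* Sketch only -- distinguishes the reset tiers instead of collapsing them into one case.
   The Reset* helpers and srbStatus are hypothetical placeholders. */
case SRB_FUNCTION_RESET_LOGICAL_UNIT:
    /* Narrowest scope: reset only the addressed LUN and complete only its outstanding SRBs. */
    srbStatus = ResetLogicalUnit(DeviceExtension, Srb) ? SRB_STATUS_SUCCESS : SRB_STATUS_ERROR;
    break;

case SRB_FUNCTION_RESET_DEVICE:
    /* Target-level reset: affects all LUNs behind the target. */
    srbStatus = ResetDevice(DeviceExtension, Srb) ? SRB_STATUS_SUCCESS : SRB_STATUS_ERROR;
    break;

case SRB_FUNCTION_RESET_BUS:
    /* Widest scope: reset the whole bus/adapter path and complete everything pending. */
    srbStatus = ResetBus(DeviceExtension, Srb) ? SRB_STATUS_SUCCESS : SRB_STATUS_ERROR;
    break;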
Yes, we might need to consider this one further...

Best regards,
IOMMU/DMA v3 support for vioscsi - WIP

SER='-serial tcp:127.0.0.1:4445'

Can survive both configurations. Doesn't work on WS2016 and pre-1803 Win10s.

Regarding the viostor reset – I was experimenting with that in the past for my own research. In my opinion, it can be a challenging task. If we're talking about QEMU only, then resetting the Virtio queues might be enough. But in the case of FPGA IP-backed Virtio, we would probably need a complete HBA reset (and IIRC, we don't have an interface for that in QEMU or the Virtio-block spec).

However, if we're speaking about QEMU configurations, it's important to note that Virtio-blk and Virtio-scsi are two different devices. If you need a SCSI-like device, use virtio-scsi, while virtio-blk historically was a solution for a local image on host-attached storage. You shouldn't expect long delays (like 30 or 60 seconds) for such a configuration. Honestly, I'm still unsure if we need the bus reset logic for the virtio-blk driver, but I definitely don't know all the use cases for which the virtio-blk device is used.

In any case, please keep in mind that virtio-blk and virtio-scsi are distinct devices, and I personally see no reason to copy the same code or features from one driver to the other without considering the actual use cases.

All the best,

BTW, do you think it would be useful to set up an NG or chat room (like on Slack or something similar)?
Thanks, Vadim.
I think at the very least we should use a discussion, but I'm open to alternatives. A discussion for reset behaviour might be in order too.

Re

Have we tried any alternatives, such as, after completing pending requests, then calling

IIRC, an HBA reset starts by writing a zero to the status register, before acking and asserting the suitable driver.
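For reference, the sequence being described is essentially the generic VirtIO device reset handshake from the VirtIO specification. A minimal sketch follows, with hypothetical register accessors (WriteDeviceStatus/ReadDeviceStatus) standing in for the transport-specific I/O the real drivers use.

/* Hypothetical transport accessors, assumed to be provided elsewhere. */
VOID  WriteDeviceStatus(PVOID DeviceExtension, UCHAR Status);
UCHAR ReadDeviceStatus(PVOID DeviceExtension);

/* Illustrative sketch of the generic VirtIO reset handshake (device status field);
   this is not the driver's actual reset path. */
#define VIRTIO_STATUS_ACKNOWLEDGE  0x01
#define VIRTIO_STATUS_DRIVER       0x02

static VOID SketchResetVirtioDevice(PVOID DeviceExtension)
{
    /* 1. Writing zero to the status register tells the device to reset. */
    WriteDeviceStatus(DeviceExtension, 0);

    /* 2. For a modern (VirtIO 1.x) device, wait until the status reads back as zero. */
    while (ReadDeviceStatus(DeviceExtension) != 0)
    {
        /* brief stall */
    }

    /* 3. Re-acknowledge the device, then assert that a suitable driver is present. */
    WriteDeviceStatus(DeviceExtension, VIRTIO_STATUS_ACKNOWLEDGE);
    WriteDeviceStatus(DeviceExtension, VIRTIO_STATUS_ACKNOWLEDGE | VIRTIO_STATUS_DRIVER);

    /* 4. Feature negotiation, virtqueue re-initialisation and DRIVER_OK would follow. */
}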
With new feature sets and active development, e.g. IOThread VQ Mapping, and the existing bug fixes in the pipeline here, that could very well change...
I think the "without considering the actual use cases" part is really important here. Having said that, and given that both drivers are now Storport miniport drivers, a carefully managed consolidation of the code base between the two drivers would certainly make their differences more conspicuous and meaningful. This should not only improve RAS, TCO and manageability, but it has flow on effects too, e.g. writing automated QE code becomes more economical and manageable, amongst other advantages. It would likely also assist with migrating to other storage-based VirtIO devices, e.g. Best Regards, |
@benyamin-codez If I set the global timeout value to 60s and the sleep time to something greater than 60s (e.g. 100s), it will also cause this issue.

Best Regards,