Want CFS/NSNR and related NVMe Error Injection Capabilities #698

Open
@rmustacc

In the field, there are a number of ways that NVMe drives can fail. There are a few things that it would be great to simulate in Propolis. I'm going to call out a few of these here:

  • We would like the ability to put the drive into a CFS (Controller Fatal Status) state, whereby that bit is set and the device refuses to process additional I/O. There are two different reset behaviors we would like to have on the device:
    • On reset, the device continues to basically issue an NSNR until we format it. I realize that may be a little complicated, as it requires some persistent state to be kept and tracked. I think it'd be fine if this was only in-memory state.
    • On reset, the CFS is cleared and the device begins processing again. The former is the one we've seen more commonly on certain firmware revs, but this behavior seems useful to test as well.
  • A device starting up in a read-only media mode
  • Injecting asynchronous events that cover:
    • NVM subsystem reliability, which results in read-only media and related information in the SMART / health log. Particularly, bit 3 is the most relevant in the critical warning field, but others may be interesting.
    • Transient / persistent internal errors, the latter of which results in CFS being set.
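
To make the two reset behaviors concrete, here is a minimal sketch of the in-memory state that such an injection facility might track. Every name in it (FaultState, ResetBehavior, IoOutcome, and the methods) is hypothetical and does not correspond to existing Propolis code. It also assumes that a format issued while CFS is still set is refused, and it models the post-reset condition as a generic "not ready" outcome rather than a specific NVMe status code.

```rust
/// Hypothetical reset policy for an injected controller-fatal condition.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ResetBehavior {
    /// A reset clears CFS and the device resumes processing I/O.
    ClearOnReset,
    /// After a reset the device stays unusable until it is formatted.
    StickyUntilFormat,
}

/// Hypothetical result of submitting an I/O to the emulated device.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum IoOutcome {
    Accepted,
    /// CFS is set; the controller refuses to process the command.
    RefusedFatal,
    /// The controller was reset but still reports not-ready until a format.
    RefusedNotReady,
}

/// In-memory only: nothing here survives a restart of the emulator.
#[derive(Debug)]
struct FaultState {
    cfs: bool,
    needs_format: bool,
    behavior: ResetBehavior,
}

impl FaultState {
    fn new(behavior: ResetBehavior) -> Self {
        FaultState { cfs: false, needs_format: false, behavior }
    }

    /// Inject the fatal condition: set CFS and stop processing I/O.
    fn inject_fatal(&mut self) {
        self.cfs = true;
    }

    /// Controller reset. CFS is always cleared, but under the sticky
    /// policy the device remembers that it still needs a format.
    fn reset(&mut self) {
        if self.cfs && self.behavior == ResetBehavior::StickyUntilFormat {
            self.needs_format = true;
        }
        self.cfs = false;
    }

    /// Format the device. Returns false if the command is refused
    /// because CFS is still set (an assumption of this sketch).
    fn format(&mut self) -> bool {
        if self.cfs {
            return false;
        }
        self.needs_format = false;
        true
    }

    fn submit_io(&self) -> IoOutcome {
        if self.cfs {
            IoOutcome::RefusedFatal
        } else if self.needs_format {
            IoOutcome::RefusedNotReady
        } else {
            IoOutcome::Accepted
        }
    }
}
```

With StickyUntilFormat, the sequence inject_fatal, reset, submit_io yields RefusedNotReady until format succeeds. With ClearOnReset, the same sequence yields Accepted immediately after the reset.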

Obviously, these are all things we likely don't want to include in the product as a whole; however, the media read-only mode based on crucible going down to a single replica may be interesting.
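
As a rough illustration of the read-only idea, the sketch below maps a replica count onto the critical warning byte of the SMART / health log. The bit positions follow the NVMe specification (bit 2 is reliability degraded, bit 3 is media in read-only mode). The function names and the "one healthy replica or fewer means read-only" policy are assumptions made for illustration, not existing Propolis or crucible behavior.

```rust
/// Critical warning bit 2: NVM subsystem reliability has been degraded.
#[allow(dead_code)]
const CW_RELIABILITY_DEGRADED: u8 = 1 << 2;
/// Critical warning bit 3: media has been placed in read-only mode.
const CW_MEDIA_READ_ONLY: u8 = 1 << 3;

/// Hypothetical policy: report read-only media once the backend is
/// down to a single healthy replica (or none).
fn critical_warning_for(healthy_replicas: usize) -> u8 {
    if healthy_replicas <= 1 {
        CW_MEDIA_READ_ONLY
    } else {
        0
    }
}

/// Writes are allowed only while the read-only bit is clear.
fn writes_allowed(critical_warning: u8) -> bool {
    (critical_warning & CW_MEDIA_READ_ONLY) == 0
}
```

A guest reading the health log would then see bit 3 set once only one replica remains, and an emulated device could consult writes_allowed before accepting a write.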

Labels: enhancement (New feature or request), storage (Related to storage devices/backends), testing (Related to testing and/or the PHD test framework)
