Handle failed cache devices #3796

@jbaublitz

Currently, our plan is to have RAID cover data devices only. The cache is not included, and this leads to a problem. With RAID, failed devices could theoretically be removed, but that alone does not seem to justify adding RAID to the cache, which would otherwise not benefit much from it. We need another way to handle failed cache devices so that users don't end up unable to start their pool.

There are two separate cases to handle: a started pool and a stopped pool. For a started pool, adding functionality to remove the cache is straightforward.

The stopped pool case has a few more considerations. If the cache fails to be set up, we could simply remove the cache on the user's behalf so that the pool can still be set up, and the user can then re-add the cache. The other option, if we're willing to target only V2 pools for this functionality, would be to provide a way to remove the cache from stopped pools. Because the Stratis metadata is available in all cases in V2, we could target a specific metadata operation to remove the cache from a stopped pool. This would not be possible in V1.
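
To make the decision space concrete, here is a minimal Rust sketch that maps each case to the recovery options described in this issue. All names (`PoolState`, `MetadataVersion`, `CacheAction`, `recovery_options`) are hypothetical and do not correspond to anything in stratisd; the sketch only models which options would apply, not how the cache would actually be removed.

```rust
// Hypothetical sketch: none of these types or functions exist in stratisd.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PoolState {
    Started,
    Stopped,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MetadataVersion {
    V1,
    V2,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CacheAction {
    /// Started pool: an explicit "remove cache" operation.
    RemoveCache,
    /// Stopped pool: drop the cache automatically if it fails to set up,
    /// so the pool still starts and the user can re-add a cache later.
    DropCacheOnFailedSetup,
    /// Stopped V2 pool only: a targeted metadata operation that removes the cache.
    RemoveCacheViaMetadata,
}

/// Returns the recovery options for a failed cache device, following the
/// started/stopped split and the V1/V2 restriction proposed in this issue.
fn recovery_options(state: PoolState, version: MetadataVersion) -> Vec<CacheAction> {
    match state {
        PoolState::Started => vec![CacheAction::RemoveCache],
        PoolState::Stopped => {
            let mut options = vec![CacheAction::DropCacheOnFailedSetup];
            // Only V2 has the Stratis metadata available in all cases, so only
            // V2 could support removing the cache from a stopped pool directly.
            if version == MetadataVersion::V2 {
                options.push(CacheAction::RemoveCacheViaMetadata);
            }
            options
        }
    }
}

fn main() {
    for state in [PoolState::Started, PoolState::Stopped] {
        for version in [MetadataVersion::V1, MetadataVersion::V2] {
            println!("{:?} {:?}: {:?}", state, version, recovery_options(state, version));
        }
    }
}
```

The automatic drop-on-failure path works for both metadata versions, while the targeted metadata operation is gated on V2, which is the trade-off raised above.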

@drckeefe @mulkieran I'm open to your feedback here.