[RFE] Temporary cleaning mode override for the consumer of a BareMetalHost #2946

@pierrecregut

User Story

The setting is a multi-tenant deployment with two kinds of actors: consumers of BareMetalHosts, who hold a reference on a BareMetalHost (typically through a hostclaim), and an infrastructure administrator, who is responsible for the state of the BareMetalHosts given to consumers.
As a consumer, I want to control the level of cleaning applied when the BareMetalHost is deprovisioned while I still hold a reference on it. For example, I may want to disable cleaning (required for the implementation of capm3 node reuse with hostclaims).
As the administrator, I want to enforce a minimum level of cleaning as soon as the BareMetalHost is given back to the pool (the consumerRef is removed).

Detailed Description

There is only one field (automatedCleaningMode). If the consumer overwrites its value, the default intended by the administrator is lost.

Anything else you would like to add:
Review of solutions:

  1. Annotation for single override of the cleaning mode.
    The user can set an annotation (baremetal.metal3.io/cleaning). The value is the cleaning level for the next cleaning operation. If the node is available, the cleaning operation is performed immediately. If the node is provisioned, the cleaning (or lack thereof) will be performed during deprovisioning. The consumer (typically the hostclaim controller) is responsible for pushing the annotation at the right time. More details in [RFE] Annotation to re-trigger automated cleaning #2922.

  2. Cleaning mode override
    We introduce a new field in the spec of the BareMetalHost to specify how cleaning is performed while the consumerRef exists. To enforce that this field is cleared when the consumerRef is cleared, we either
    a. put it inside the consumerRef or
    b. put it in a consumerOverride field, with an admission constraint requiring the field to be nil unless a consumerRef exists.
    We also need a field in the status that records the last level of cleaning performed with the override. The reason is that the consumerRef of an unprovisioned BareMetalHost may be cleared. If we rely only on automated cleaning and the override specified no cleaning, the host will not be cleaned before it is handed to another user. As soon as the override disappears, if the BareMetalHost is available but the last cleaning performed (recorded in status) is weaker than the mode specified in automatedCleaningMode, a manual cleaning is performed immediately.

  3. Fully declarative cleaning
    We use the same fields as in the previous proposal. The main change is that we disable automated cleaning and always rely on the comparison between the cleaning status and the current mode. The current mode is either the value in the override if it exists, or the value of .spec.automatedCleaningMode. When the BareMetalHost is available and the current mode is set to a stronger value than the one recorded in status, a cleaning operation is performed.
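
The decision rule of solution 3 can be sketched as follows. This is only an illustration, not code from the baremetal-operator: the `Host` class, the `consumer_override` and `last_cleaning` fields, and the strength ordering in `CLEANING_RANK` (with `disabled` weaker than `metadata`) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed ordering: a higher rank means a stronger cleaning level.
CLEANING_RANK = {"disabled": 0, "metadata": 1}


@dataclass
class Host:
    automated_cleaning_mode: str      # mirrors .spec.automatedCleaningMode
    consumer_override: Optional[str]  # hypothetical override field, None if unset
    last_cleaning: str                # hypothetical status field: last level performed
    available: bool                   # True when the host is in the available state


def current_mode(host: Host) -> str:
    """Return the override if it exists, otherwise the administrator's mode."""
    if host.consumer_override is not None:
        return host.consumer_override
    return host.automated_cleaning_mode


def needs_cleaning(host: Host) -> bool:
    """Clean only an available host whose recorded level is weaker than the current mode."""
    if not host.available:
        return False
    return CLEANING_RANK[current_mode(host)] > CLEANING_RANK[host.last_cleaning]
```

With this rule, a host released with an override of `disabled` is not cleaned while the override is set, but is cleaned as soon as the override is removed and the administrator's mode is `metadata`.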

Solution 1 is very imperative in spirit, and without an additional cleaningStatus the end user cannot know the level of cleaning performed on an available host.
Solution 2 distinguishes a declarative behaviour while the override is defined (the cleaning status strictly follows the value specified in the override) from the legacy behaviour used when the override is nil.
The main drawback of solution 3 is that it changes the behaviour of automatedCleaningMode. As soon as the field is changed on an available BareMetalHost, a cleaning operation may be performed if the previous cleaning status was not good enough. If the BareMetalHost is provisioned (or provisioning), there is no change: the setting will only apply on the next deprovisioning operation. But the semantics of solution 3 are much cleaner and simpler to understand.

Solution 2.b (3.b) seems better than 2.a (3.a) because it does not modify the type of consumerRef (a corev1 ObjectReference).
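
A minimal sketch of the admission constraint needed by option b, assuming the spec is available as a plain dictionary. The function name `validate_override` and the error message are hypothetical; in practice the check would live in a validating webhook or an equivalent admission rule.

```python
from typing import Any, Dict, List


def validate_override(spec: Dict[str, Any]) -> List[str]:
    """Reject a spec that sets consumerOverride without a consumerRef.

    Both keys are read from the BareMetalHost spec; consumerOverride is the
    field proposed in this issue and does not exist in the current API.
    """
    errors: List[str] = []
    if spec.get("consumerOverride") is not None and spec.get("consumerRef") is None:
        errors.append("consumerOverride must be nil unless consumerRef is set")
    return errors
```

Because the constraint is evaluated on every update, removing the consumerRef forces the override to be removed in the same request, which is what guarantees that the administrator's mode applies again.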

/kind feature

Labels

kind/feature: Categorizes issue or PR as related to a new feature.
triage/accepted: Indicates an issue is ready to be actively worked on.
