HostClaim Implementation #2795
Replies: 12 comments 54 replies
-
[New Component] HostClaim Data Templater This is a summary of a requirement expressed by Didrik Koren and @lentzi90 on the metal3 Slack.
-
[Detail] Status of the HostClaim
-
[Tests] HardwareData e2e tests HostClaims rely heavily on the BareMetalHost/HardwareData separation to hide the BMH from the end user. The current e2e tests only look at the hardwareDetails in the status (potentially fed by an annotation on the resource). We should probably add e2e tests on HardwareData, and provide the HardwareData with inspection disabled in the HostClaim e2e tests.
-
[Task Breakdown]
-
Node Reuse We have completely changed the implementation to do all the work in the capm3 provider. When an m3machine associated with a hostclaim marked for node reuse is deleted, we do not delete the hostclaim. Instead, we clear its fields (image, userdata, etc.), turn it off, and remove the consumerRef (https://github.com/pierrecregut/cluster-api-provider-metal3/blob/hostclaim/baremetal/metal3machine_host_manager.go#L194-L217). When we create the new m3machine, we try to associate it with a hostclaim that has no consumerRef and the same node-reuse label (https://github.com/pierrecregut/cluster-api-provider-metal3/blob/hostclaim/baremetal/metal3machine_host_manager.go#L606C1-L633C3). If that fails, we create a new hostclaim. If it succeeds, we wait until the node is marked available (deprovisioning takes time) using an Available condition that reflects the BMH condition (#2888). This does not solve the issue that no label is removed when the deployment shrinks, but that problem already existed in the standard capm3 implementation without hostclaims. At least the cluster manager can delete the hostclaim, and we added an annotation to record the time when the consumerRef was removed (it may not be necessary: we could use the transition time of the Available condition, which should be true for such a HostClaim). That time can be used to garbage-collect hostclaims that have been unbound for too long. Comments are welcome on the choice of approach and its implementation (@lentzi90, original author of the node-reuse implementation).
-
Automated Cleaning Mode Now that we agree on the way to handle node reuse, a side question is how we handle cleaning. @dtantsur has rightfully pointed out that end users should not be able to do whatever they want with cleaning. A solution would be to say that, with hostclaims, cleaning is always metadata (forced on the BMH while removing the finalizer) when the hostclaim is deleted, and can be whatever the user wants (propagated from hostclaim to BMH during Associate/Update) when the hostclaim is merely deprovisioned. Usually this is disabled because of node reuse. @lentzi90, do you agree that there is no valid use case where somebody may want to disable cleaning when the hostclaim is deleted?
-
On a somewhat related note to https://github.com/orgs/metal3-io/discussions/2795#discussioncomment-15485815, we need to decide what we do about force-deleted HostClaims. We already have bad experience with users force-deleting BareMetalHosts, somebody will definitely try to "solve" their problems by
On top of that, in many of the cleaning suggestions, forced deletion of a host claim will leave the BMH with whatever
Any ideas here?
-
Today we touched upon the interaction between HostFirmwareSettings/HostFirmwareComponents and HostClaims. While I don't think we need to solve this problem in the first iteration of HostClaims, @jacob-anders and I have discussed it briefly; here are our thoughts:
-
Selection within a namespace for HostDeployPolicy They would like to have a minimal description of the BMH (BMC address and credentials), launch inspection, and label the BMH based on the result of the inspection. So far, so good. The problem is that all BMHs end up in the same namespace regardless of their characteristics. So they would like to be able to express in the HDP that customer A (i.e. namespace A, or a user with label user=A) can use the BMHs of this namespace with a size label of small or medium, but not large, because they are not a premium customer. This could be implemented as an additional selector in the HDP object, but the semantics are really tricky. The main issue is that we may now have several selectors, coming from all the HDP policies in the namespace of the BMH, that are satisfied by the user. Different HDPs tend to be interpreted as a disjunction of requirements (you need to satisfy one of them).
-
Hide the inventory Hardware teams do not want to share their inventory with customers, so there should be at least a mode where users do not need read access to HardwareData. This can easily be achieved by copying/synchronizing the HardwareData into the user's namespace and pointing the hardwareData field in the hostclaim status at this resource. The copy of the BMH metadata should also be applied to this copy.
-
Hello, I went through the blueprint and the discussion here, and this feature is very interesting for us. I would like to add some questions/remarks about it; not sure if this is the right place. Disclaimer:
We have seen the following "problems" that the current proposal seems to tackle:
Given the above points, it is very nice to see this proposal tackling them. We are concerned about one more aspect that we are not sure how closely tied to this proposal it is, specifically
In the current proposal, the
The idea I had while going through the proposal and discussion was: if
Side note: we could potentially have similar behavior if the hostClaim controller were part of BMO, but that would require a "golden" BMO that runs the hostClaim controller and some very sketchy RBACs. Also, having this as a "recognized" pattern would help, instead of us making very sketchy downstream changes. I'm not sure whether:
If something is unclear or more input is needed, let me know (I can also join the office hours to discuss there if needed). And also thanks a lot for all the work, as I mentioned above.
-
When an explicit HostDeployPolicy is missing, does it make sense to assume that HostClaims from a namespace can use BareMetalHosts in the same namespace?
-
As discussed during the Nov 2025 MTM, this is a discussion on the implementation of HostClaims covering at least: