Skip to content

Conversation

@Rozzii
Copy link
Member

@Rozzii Rozzii commented Jul 21, 2025

This commit:

  • Implements a new spec field of the BMH resource and implements related reconciliation logic in the bare metal host controller
  • Extends Ironic node update to servicing, preparing and deprovisioning states too
  • Renames getInstanceUpdateOpts to getProvisioningInstanceUpdateOptsForNode to provide a more explicit name for the function and make it consistent with other UpdateOptsForNode type functions

Notes:

  • The new spec field is optional and not using it should provide the same UX as older BMO versions
  • Kernel pre provisioning arguments can't be changed when a node transitions from inspected (available) state to provisioning state if Ironic is configured to fast-track deployments.

I think this PR counts as ✨ because although the communication between BMO and the provisioner does change and there could be a +3 increase in the number of update operations, the new functionality is optional.

Related design PR: metal3-io/metal3-docs#552

@metal3-io-bot metal3-io-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 21, 2025
@metal3-io-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign zaneb for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@metal3-io-bot metal3-io-bot requested review from elfosardo and honza July 21, 2025 12:50
@metal3-io-bot metal3-io-bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jul 21, 2025
@Rozzii Rozzii moved this to BMO WIP in Metal3 - Roadmap Jul 21, 2025
@Rozzii Rozzii added this to the BMO - v0.11 milestone Jul 21, 2025
@Rozzii
Copy link
Member Author

Rozzii commented Jul 21, 2025

/retest

@Rozzii
Copy link
Member Author

Rozzii commented Jul 23, 2025

/ok-to-test

@metal3-io-bot metal3-io-bot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Jul 23, 2025
@Rozzii Rozzii force-pushed the node_spec_kern_args branch from 88beeae to 5322972 Compare July 23, 2025 10:31
@Rozzii
Copy link
Member Author

Rozzii commented Jul 23, 2025

@dtantsur this is the initial idea based on the proposal, I am testing it manually , so far it worked I think I will add the e2e in a separate PR, but I think I need to add some unit testing still to this PR.

@Rozzii
Copy link
Member Author

Rozzii commented Jul 23, 2025

/cc @dtantsur

This commit:
 - Implements a new spec field of the BMH resource and implements
   related reconciliation logic in the bare metal host controller
 - Extends Ironic node update to servicing and preparing states too
 - Renames getInstanceUpdateOpts to getProvisioningInstanceUpdateOptsForNode
   to provide a more explicit name for the function and make it consistent
   with other UpdateOptsForNode type functions

Notes:
 - The new spec field is optional and not using it should provide the same UX
   as older BMO versions
 - Kernel pre provisioning arguments can't be changed when a node transitions
   from inspected (available) state to provisioning state if Ironic
   is configured to fast-track deployments.

Signed-off-by: Adam Rozman <[email protected]>
@Rozzii Rozzii force-pushed the node_spec_kern_args branch from 4516861 to 7ae19c9 Compare August 7, 2025 13:08
Rozzii added 2 commits August 12, 2025 16:49
This commit:

 - Combines the extra kernel params from both BMH and the PreprovisioningImage
   CRs.

Signed-off-by: Adam Rozman <[email protected]>
This commit:
 - Makes the deprovisioning process able to update the extra kernel parameter
 field of the ironic node's driver_info before the actual deprovisioning would
 start.

This feature can come handy when the user needs different kernel options during
e.g. automated node cleaning compared to inspection.

Signed-off-by: Adam Rozman <[email protected]>
@Rozzii
Copy link
Member Author

Rozzii commented Aug 13, 2025

I think the essential parts are there, I will push an e2e related PR as a followup but this is long enough as it is.

Commits will be squashed if folks are okay with the implementation, but I will keep them separate for the time being.

@Rozzii Rozzii changed the title WIP ✨ introduce preprovisioningKernelParams to BMH ✨ introduce preprovisioningKernelParams to BMH Aug 14, 2025
@metal3-io-bot metal3-io-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 14, 2025
@Rozzii
Copy link
Member Author

Rozzii commented Aug 14, 2025

/cc @fmuyassarov @lentzi90

@Rozzii
Copy link
Member Author

Rozzii commented Aug 14, 2025

/cc @kashifest
FYI

@kashifest
Copy link
Member

/cc @zaneb @dtantsur

@metal3-io-bot metal3-io-bot requested a review from zaneb August 14, 2025 19:37
@Rozzii Rozzii self-assigned this Aug 15, 2025
@zaneb
Copy link
Member

zaneb commented Aug 15, 2025

If I'm not mistaken, passing kargs to ironic only works when supplying a kernel+ramdisk (not a prebuilt ISO) for the IPA image.
Assuming that is the case, I would like to see the kargs also passed through to the PreprovisioningImage API so that they're available to anyone building their own ISO.

@Rozzii
Copy link
Member Author

Rozzii commented Aug 15, 2025

If I'm not mistaken, passing kargs to ironic only works when supplying a kernel+ramdisk (not a prebuilt ISO) for the IPA image. Assuming that is the case, I would like to see the kargs also passed through to the PreprovisioningImage API so that they're available to anyone building their own ISO.

It is available, this feature also integrates the kernel params passed via the PreprovisioningImage.
Idea is that if preprov image is linked to the BMH then BMO combines the kernel params from the preprov image and from the BMH both on a per node basis and it is checked when any of the LCM operations are initiated.

@zaneb
Copy link
Member

zaneb commented Aug 18, 2025

Note that the documentation for the ExtraKernelParams field in the PreprovisioningImage Status states that it "[o]nly makes sense for initrd images".

When booting an initrd+kernel image, ironic either has to put the kernel args into the iPXE config or (for virtualmedia) it builds a custom ISO containing the initrd+kernel+args. But when an ISO is supplied that is not built by ironic, there is no way for it to modify the kernel args. There's no standard place in an ISO where they are stored. So the only way to append to them is to pass them to the PreprovisioningImage controller to build into the ISO at the time it creates it.

@Rozzii
Copy link
Member Author

Rozzii commented Aug 18, 2025

Note that the documentation for the ExtraKernelParams field in the PreprovisioningImage Status states that it "[o]nly makes sense for initrd images".

When booting an initrd+kernel image, ironic either has to put the kernel args into the iPXE config or (for virtualmedia) it builds a custom ISO containing the initrd+kernel+args. But when an ISO is supplied that is not built by ironic, there is no way for it to modify the kernel args. There's no standard place in an ISO where they are stored. So the only way to append to them is to pass them to the PreprovisioningImage controller to build into the ISO at the time it creates it.

Yes that is correct and I agree with you, this feature takes into consideration what you have noted. The reason why I am concatenating the kernel params coming from BMH and PreprovImage is because that way folks who have customized ironics e.g. with custom deploy step that could in theory do whatever with the images would be supported also, so if someone would have a feature to repackage ISOs or would do some sort of magic in their own preprov image controller , they could also take advantage of the BMH preprovExtraKernelArgs param, for others it is just a noop.

@zaneb
Copy link
Member

zaneb commented Aug 19, 2025

AIUI this implementation correctly handles the case where the PreprovisioningImage's status.format is ImageFormatInitRD, but not the case where it is ImageFormatISO. It is up to the PreprovisioningImage controller to select the format from the list sent in spec.acceptFormats (for virtualmedia it contains both).

@kashifest
Copy link
Member

@zaneb @Rozzii planning to do .rc release this week and then minor release next week, do you think this feature can land in the upcoming release.

@Rozzii
Copy link
Member Author

Rozzii commented Sep 3, 2025

AIUI this implementation correctly handles the case where the PreprovisioningImage's status.format is ImageFormatInitRD, but not the case where it is ImageFormatISO. It is up to the PreprovisioningImage controller to select the format from the list sent in spec.acceptFormats (for virtualmedia it contains both).

Sorry for the late reply I was not working the last two weeks.

I don't think I am touching any IPA format related field or at least not implementing new control logic. What you have mentioned should be operating logic with this feature too.

The only thing I aim to do related to PreprovisioningImage CR is to concatenate the the kernel params provided by PreprovisioningImage CR and BMH CR.

In case of the virtualmedia pre-built ISOs the new logic would mean a NO-OP from Ironic side because after BMO updates the preprov kernel params related Ironic node API endpoint with the concatenated params Ironic will disregard it. Ironic will disregard preprov kernel params coming from API or from config file or env variable in case a pre-built IPA ISO is used regardless if it was provided by a controller or "manually".

I might be misunderstanding the problem, in any case my intention is to not get in the way of PreprovisioningImage controller.

IMO this feature could land, I propose to put this behind a feature gate, mark it as alpha or beta, I have not used feature gates so far so I have to look and ask around, please wait with the RC until I add the gate.

@Rozzii
Copy link
Member Author

Rozzii commented Sep 3, 2025

v0.11-rc.0 will go out without this anyhow so most likely this wont get merged to 0.11 but anyhow it would be nice to come to a consensus soon.

@kashifest
Copy link
Member

v0.11-rc.0 will go out without this anyhow so most likely this wont get merged to 0.11 but anyhow it would be nice to come to a consensus soon.

I would say lets take it in even after the .rc if we have a consensus before the actual minor

@zaneb
Copy link
Member

zaneb commented Sep 4, 2025

To make this feature work universally we need to add an extraKernelParams field to the Spec of PreprovisioningImage.

When a PreprovisioningImage is used:

  • BMH controller copies the preprovisioningExtraKernelParams from the BMH Spec to the PreprovisioningImage Spec
  • PPI controller passes the extraKernelParams from the Spec to the image provider in the ImageData struct
  • The default image provider copies the extraKernelParams input to the GeneratedImage output when the image type is initrd
  • BMH controller passes the extraKernelParams field from the PreprovisioningImage status to ironic

And when a PreprovisioningImage is not used, the BMH controller passes the preprovisioningExtraKernelParams from the BMH directly to ironic, the same as in the PR currently.

I don't think this is super complicated to implement. The PreprovisioningImage controller side of things is +4 lines of code. The BMH controller side is +1/-6 lines of code. Let's go ahead and do it now rather than set ourselves up for inconsistent behaviour between image types.

@Rozzii
Copy link
Member Author

Rozzii commented Sep 4, 2025

To make this feature work universally we need to add an extraKernelParams field to the Spec of PreprovisioningImage.

When a PreprovisioningImage is used:

* BMH controller copies the preprovisioningExtraKernelParams from the BMH Spec to the PreprovisioningImage Spec

* PPI controller passes the extraKernelParams from the Spec to the image provider in the [ImageData struct](https://github.com/metal3-io/baremetal-operator/blob/main/pkg/imageprovider/imageprovider.go#L11-L16)

* The [default image provider](https://github.com/metal3-io/baremetal-operator/blob/main/pkg/imageprovider/default.go) copies the extraKernelParams input to the GeneratedImage output when the image type is initrd

* BMH controller passes the extraKernelParams field from the PreprovisioningImage status to ironic

And when a PreprovisioningImage is not used, the BMH controller passes the preprovisioningExtraKernelParams from the BMH directly to ironic, the same as in the PR currently.

I don't think this is super complicated to implement. The PreprovisioningImage controller side of things is +4 lines of code. The BMH controller side is +1/-6 lines of code. Let's go ahead and do it now rather than set ourselves up for inconsistent behaviour between image types.

I understand your approach, I also don't think it is super complicated, I am not objecting to implementing this, if you are okay with BMH dictating extra kernel params to the PreprovisioningImage I am fine with it.

On the other hand I would like to clarify the intended use of the PreprovisioningImage controller because with the changes you are proposing I feel that we would provide a parallel way of specifying extraKernel params as there was such a way for PreprovImages already AFAIK. I am fine with that but I would like to make sure we are on the same page.

The reason why I have not yet implemented what you are asking for is that I have seen the extraKernelParams status field here https://doc.crds.dev/github.com/metal3-io/baremetal-operator/metal3.io/PreprovisioningImage/[email protected]#status-extraKernelParams and checked https://github.com/openshift/image-customization-controller .

So my assumption was that the desired behavior from UX/developer perspective was that the "external co-controller/image provider" implementation would (and they do) facilitate a way to specify extraKernelParams used during preprov "image building" and then it is reported by the co-controller/image provider into the status field of the PreprovImage CR and then BMH would read that status field. So in the current way of using the PreprovImage CR , it is the PreprovImage CR's status field that dictates extraKernelParams to BMH and that field is populated by external means. I have considered it redundant or potentially confusing for users to implement an extraKernelParams status field and a spec field the same time and to present the same information from to different "directions" that belongs to two different approaches of supplying this data.

That said, we can leave it to the external controller/image provider to reconcile the extra kernel params if it would be coming from a new spec field of the PreprovisioningImage CR if you say we should extend this "interface" between the image providers and BMH. I am not an active user of the PreprovisioningImage CR and related controllers so I was just worried that I would confuse users and devs or introduce silent incompatibilities. Many of the image providers are not even FOSS and they are used to implement custom preprov image handling soo it is hard for me to estimate the effect of the new spec field.

I also find the approach of BMH dictating extraKernelParams to the PreprovImage CR the more consistent and logical way same as you have explained, but I was deterred by the involvement of the external controllers and the fact that they are reporting external kernel params already.

Copy link
Member

@zaneb zaneb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this patch could be far, far simpler if we stopped doing the same work 2-3 times per reconcile.

HardwareProfile profile.Profile
RootDeviceHints *metal3api.RootDeviceHints
CustomDeploy *metal3api.CustomDeploy
PreprovisioningExtraKernelParams string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even from the naming alone here it doesn't make sense to me that ProvisionData would ever need to contain any data about PreProvisioning.

}
deployImageInfo[deployRamdiskKey] = hostImage.ImageURL
// Extra params from the kernel have been already posted to
// Ironic node API in the Register function, so extra kernel args in driver info need
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

er, this is the Register function.

driverOpts := clients.UpdateOptsData{
"kernel_append_params": fmtPreprovExtraKernParams(data.PreprovisioningExtraKernelParams),
}
updater.SetDriverInfoOpts(driverOpts, ironicNode)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to set this again when it was already set in Register(). (And we definitely shouldn't set it 4 different times for 4 different provisioning states we might be in.) We can get rid of most of the changes to the provisioner API as well (only the ManagementAccessData needs to change).

Copy link
Member Author

@Rozzii Rozzii Sep 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I update it in so many places because I wanted to make it changeable not used only for initial deployment but I want to allow changing the kernel parameters before the LCM operations. Before cleaning, after registration, before deployment before servicing etc.

And I know this is a limitation for pre-built IPA ISOs or LiveISO boots, but for dynamically built IPA ISO or PXE it is very useful for debugging.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"ilo_username": bmcCreds.Username,
"ilo_password": bmcCreds.Password,
"ilo_address": a.hostname,
"kernel_append_params": preProvExtraKernParams,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is not driver-specific (which certainly appears to be the case) then I don't feel like adding it to every driver is the right thing to do. You're already setting kernel_append_params later in the Register() call as part of setDeployParams(), there is no need to do it twice. So we can get rid of the change to the DriverInfo() API as well.

CurrentImage: getCurrentImage(info.host),
PreprovisioningImage: preprovImg,
PreprovisioningNetworkData: preprovisioningNetworkData,
PreprovisioningExtraKernelParams: r.retrievePreprovisioningExtraKernelParamsSpec(info, prov),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is actually the only function that we need to pass PreprovisioningExtraKernelParams to (see other comments), so once the others are removed we should pass preprovImg as a param here instead of calling getPreprovImage() a second time.

updater := clients.UpdateOptsBuilder(p.log)

deployImageInfo := setDeployImage(p.config, bmcAccess, data.PreprovisioningImage)
deployImageInfo := setDeployImage(p.config, bmcAccess, data.PreprovisioningImage, data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: either pass data.PreprovisioningImage and data.PreprovisioningExtraKernelParams or just data. Don't pass data and one subfield of it.

comExtraKernelParams += " " + data.PreprovisioningExtraKernelParams
}
if trimPreprovImgKParams != "" {
comExtraKernelParams += " " + hostImage.ExtraKernelParams
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you are adding the kargs supplied by the PreprovisioningImage to the ones passed in the BMH Spec.
But you already did that in the BMH controller (which is the correct place to do it - this shouldn't be provisioner-specific). So the ones from the PreprovisioningImage will be duplicated.

CurrentImage *metal3api.Image
PreprovisioningImage *PreprovisioningImage
PreprovisioningNetworkData string
PreprovisioningExtraKernelParams string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit unfortunate that we can't use the ExtraKernelParams field inside PreprovisioningImage.GeneratedImage, but when no image is set then PreprovisioningImage has to be nil, and I don't see a way around that.
I wonder if we should stop inheriting from the imageprovider.GeneratedFormat struct - which we were only really using for convenience - here though, so that we don't have two copies of the field with different values.

@zaneb
Copy link
Member

zaneb commented Sep 4, 2025

I have considered it redundant or potentially confusing for users to implement an extraKernelParams status field and a spec field the same time and to present the same information from to different "directions" that belongs to two different approaches of supplying this data.

Unfortunately the domain we are modelling is inherently confusing: for an ISO image the user-supplied params must be an input to the PreprovisioningImage so they can be built into the ISO, and for an initrd image the image-supplied params must be an output from the PreprovisioningImage so that ironic can add them to the PXE command line.

We could try to give them different names to disambiguate (but unfortunately the one that really needs to change is the one that already exists and can't be changed without breaking APIs). But the only other real choice we have to make is whether we add the user-supplied params to the image-supplied params inside the preprovisioningimage controller or inside the bmh controller. You could make an argument either way. I figured the API made more sense if the user-supplied params were always flowing through the PreprovisioningImage (and getting added to along the way). But if it were done in the BMH controller then no changes would be required to existing PreprovisioningImage controllers to get this to work, so there's a case for that too.

@Rozzii
Copy link
Member Author

Rozzii commented Sep 4, 2025

Unfortunately the domain we are modelling is inherently confusing: for an ISO image the user-supplied params must be an input to the PreprovisioningImage so they can be built into the ISO, and for an initrd image the image-supplied params must be an output from the PreprovisioningImage so that ironic can add them to the PXE command line.

Okay, I see this is a solid idea for the target workflow, it makes sense for me. I was initally surprised that the extraKernelParams spec was missing from the PreprovisioningImage for the exact same reason. Now that I understand the intended way of directing the data I will modify it accordingly to your proposal (or at least according to my understanding of your proposal).

I will get back with the changes, and big thanks for the very useful review!

I will reply to the comments related to implementation details, reconciliation, API etc over time, I will need to check why I did some stuff the way I did as I have implemented this some time ago.

@metal3-io-bot metal3-io-bot added the needs-rebase Indicates that a PR cannot be merged because it has merge conflicts with HEAD. label Sep 24, 2025
@metal3-io-bot
Copy link
Contributor

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-rebase Indicates that a PR cannot be merged because it has merge conflicts with HEAD. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

Status: BMO WIP

Development

Successfully merging this pull request may close these issues.

4 participants