Opt-out updated bootimage for GCP and AWS #93065

Open: wants to merge 13 commits into `main`.
49 changes: 38 additions & 11 deletions machine_configuration/mco-update-boot-images.adoc
@@ -6,19 +6,49 @@ include::_attributes/common-attributes.adoc[]

toc::[]

The Machine Config Operator (MCO) uses a boot image to start a {op-system-first} node. By default, {product-title} does not manage the boot image.
The Machine Config Operator (MCO) uses a boot image to scale up {op-system-first} nodes. Whether the MCO updates the boot image in your cluster depends on the platform that your cluster is built on.

This means that the boot image in your cluster is not updated along with your cluster. For example, if your cluster was originally created with {product-title} 4.12, the boot image that the cluster uses to create nodes is the same 4.12 version, even if your cluster is at a later version. If the cluster is later upgraded to 4.13 or later, new nodes continue to scale with the same 4.12 image.
* For {gcp-first}, starting with {product-title} 4.16, the MCO updates the boot image with each cluster update by default. New nodes scale up with the same version as the cluster.

This process could cause the following issues:
* For {aws-first}, starting with {product-title} 4.17, the MCO updates the boot image with each cluster update by default. New nodes scale up with the same version as the cluster.

* For all other platforms, by default the MCO does not update the boot image with each cluster update. For example, if your cluster was originally created with {product-title} 4.12, the boot image that the MCO uses to create nodes is the same 4.12 version, even if your cluster is at a later version. If the cluster is later upgraded to 4.13 or later, new nodes continue to scale with the same 4.12 image.

For {gcp-short} and {aws-short}, you can disable the default behavior, if needed. When the behavior is disabled, the boot image is no longer updated with the cluster.

[NOTE]
====
The ability to configure this behavior is available for only {gcp-short} and {aws-short} clusters. It is not supported for clusters managed by the {cluster-capi-operator}.
====

You might want to disable updated boot images if, for example, your machine set does not use the default user data secret, named `worker-user-data`, or if you have modified the `worker-user-data` secret. When the MCO updates boot images, it also updates the machine set to use a managed version of the secret. As a result, by using the updated boot images feature, you give up the ability to customize the secret that is stored in the machine set object.

However, using an older boot image could cause the following issues:

* Extra time to start nodes
* Certificate expiration issues
* Version skew issues

To avoid these issues, you can configure your cluster to update the boot image whenever you update your cluster. By modifying the `MachineConfiguration` object, you can enable this feature. Currently, the ability to update the boot image is available for only Google Cloud Platform (GCP) and Amazon Web Services (AWS) clusters. It is not supported for clusters managed by the {cluster-capi-operator}.
For information on how to disable the default behavior, see "Disabling updated boot images". You can re-enable the default behavior at any time after disabling it. For more information, see "Re-enabling updated boot images".

How the cluster behaves after you disable or re-enable the default behavior depends on when you made the change, as described in the following scenarios:

* If you disable the behavior before updating to a new {product-title} version:
** The boot image version used by the machine sets remains the same after the update.
** The boot image version on the existing nodes in your cluster also remains the same after the update.
** Any new nodes use that same version.

If you are not using the default user data secret, named `worker-user-data`, in your machine set, or you have modified the `worker-user-data` secret, you should not use managed boot image updates. This is because the Machine Config Operator (MCO) updates the machine set to use a managed version of the secret. By using the managed boot images feature, you are giving up the capability to customize the secret stored in the machine set object.
* If you disable the behavior after updating to a new {product-title} version:
** The boot image version used by the machine sets is converted to match the updated {product-title} version.
** The boot image version on the existing nodes in your cluster remains unchanged.
** Any new nodes use the updated {product-title} version when a machine set is scaled.
** If you update to a later {product-title} version, the boot image version in the machine sets remains at the current version and is not updated with the cluster.

* If you re-enable the behavior:
** The boot image version used by the machine sets is converted to the current version, if different.
** The boot image version on the nodes in your cluster matches the {product-title} version that was present when you disabled the behavior.
** The boot image version on existing nodes does not change.
** When you scale up nodes, the new nodes use the current {product-title} version in the cluster.

To view the current boot image used in your cluster, examine a machine set:

@@ -48,13 +78,10 @@ spec:
----
<1> This boot image is the same as the originally-installed {product-title} version, in this example {product-title} 4.12, regardless of the current version of the cluster. The way that the boot image is represented in the machine set depends on the platform, as the structure of the `providerSpec` field differs from platform to platform.

If the MCO updates the boot images in your cluster, the boot image referenced in your machine sets matches the current version of the cluster.
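To compare the boot image across all machine sets at once, a command similar to the following might help. This is a sketch that assumes a {gcp-short} cluster where the boot disk is the first entry in `providerSpec.value.disks`, as in the example machine set; on {aws-short}, the equivalent field is `providerSpec.value.ami.id`, so adjust the JSONPath expression for your platform:

[source,terminal]
----
# Assumes GCP; on AWS replace disks[0].image with ami.id
$ oc get machinesets -n openshift-machine-api \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.providerSpec.value.disks[0].image}{"\n"}{end}'
----

Each output line lists a machine set name followed by its boot image reference.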

include::modules/mco-update-boot-images-configuring.adoc[leveloffset=+1]

[role="_additional-resources"]
.Additional resources

* xref:../nodes/clusters/nodes-cluster-enabling-features.adoc#nodes-cluster-enabling[Enabling features using feature gates]
* xref:../machine_configuration/mco-update-boot-images.adoc#mco-update-boot-images-disable_machine-configs-configure[Disabling updated boot images]
* xref:../machine_configuration/mco-update-boot-images.adoc#mco-update-boot-images-configuring_machine-configs-configure[Re-enabling updated boot images]

include::modules/mco-update-boot-images-disable.adoc[leveloffset=+1]
include::modules/mco-update-boot-images-configuring.adoc[leveloffset=+1]
90 changes: 39 additions & 51 deletions modules/mco-update-boot-images-configuring.adoc
@@ -5,22 +5,35 @@

:_mod-docs-content-type: PROCEDURE
[id="mco-update-boot-images-configuring_{context}"]
= Configuring updated boot images
= Re-enabling updated boot images

By default, {product-title} does not manage the boot image. You can configure your cluster to update the boot image whenever you update your cluster by modifying the `MachineConfiguration` object.
For {gcp-first} and {aws-first} clusters, by default the Machine Config Operator (MCO) manages and updates the boot image in the machine sets in your cluster. This means that the MCO updates the boot images whenever you update your cluster.

Currently, the ability to update the boot image is available for only Google Cloud Platform (GCP) and Amazon Web Services (AWS) clusters. It is not supported for clusters managed by the {cluster-capi-operator}.
If you disabled this default behavior so that the boot images are not updated, you can revert to the default behavior by editing the `MachineConfiguration` object.

Re-enabling the default behavior after some nodes have been created does not affect existing nodes or machine sets. The machine sets retain the boot image version that was present when the feature was disabled. Any existing nodes retain their current boot image. However, the cluster boot image is updated again if the cluster is upgraded to a new {product-title} version in the future, and new nodes created after that point use the updated boot image.

// The following admonition is intended to address https://issues.redhat.com/browse/MCO-1604
[IMPORTANT]
====
If any of the machine sets for which you want to enable updated boot images uses a `*-user-data` secret based on Ignition version 2.2.0, the Machine Config Operator converts the Ignition version to 3.4.0 when you re-enable updated boot images. {product-title} versions 4.5 and lower use Ignition version 2.2.0. If this conversion fails, the MCO or your cluster could degrade. An error message that includes _err: converting ignition stub failed: failed to parse Ignition config_ is added to the output of the `oc get ClusterOperator machine-config` command.
In that case, use the following general steps to correct the problem:

. Disable updated boot images. For more information, see "Disabling updated boot images".
. Manually update the `*-user-data` secret to use Ignition version 3.2.0.
. Re-enable updated boot images as described in this section.
====
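To check which Ignition version a user data secret uses before you re-enable updated boot images, you might decode the secret. The following sketch assumes the default secret name `worker-user-data`, that the Ignition config is stored under the `userData` key, and that the `jq` utility is installed on your workstation. Substitute the name of your own `*-user-data` secret as needed:

[source,terminal]
----
# Assumes the Ignition config is stored under the userData key and that jq is installed
$ oc get secret worker-user-data -n openshift-machine-api \
  -o jsonpath='{.data.userData}' | base64 -d | jq -r '.ignition.version'
----

An output of `2.2.0` indicates a secret that the MCO must convert when you re-enable updated boot images.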

.Procedure

. Edit the `MachineConfiguration` object, named `cluster`, to enable the updating of boot images by running the following command:
. Edit the `MachineConfiguration` object, named `cluster`, to re-enable the default boot image update behavior for some or all of your machine sets:
+
[source,terminal]
----
$ oc edit MachineConfiguration cluster
----

* Optional: Configure the boot image update feature for all the machine sets:
* Optional: Re-enable the default behavior for all machine sets:
+
[source,yaml]
----
@@ -33,15 +46,17 @@ spec:
# ...
managedBootImages: <1>
machineManagers:
- resource: machinesets
apiGroup: machine.openshift.io
- apiGroup: machine.openshift.io <2>
resource: machinesets <3>
selection:
mode: All <2>
mode: All <4>
----
<1> Activates the boot image update feature.
<2> Specifies that all the machine sets in the cluster are to be updated.
<1> Configures the boot image update feature.
<2> Specifies the API group. This must be `machine.openshift.io`.
<3> Specifies the resource within the specified API group to apply the change. This must be `machinesets`.
<4> Specifies that the default behavior is re-enabled for all machine sets in the cluster.

* Optional: Configure the boot image update feature for specific machine sets:
* Optional: Re-enable the default behavior for specific machine sets:
+
[source,yaml]
----
@@ -54,62 +69,35 @@ spec:
# ...
managedBootImages: <1>
machineManagers:
- resource: machinesets
apiGroup: machine.openshift.io
- apiGroup: machine.openshift.io <2>
resource: machinesets <3>
selection:
mode: Partial
mode: Partial <4>
partial:
machineResourceSelector:
machineResourceSelector: <5>
matchLabels:
update-boot-image: "true" <2>
region: "east"
----
<1> Activates the boot image update feature.
<2> Specifies that any machine set with this label is to be updated.
<1> Configures the boot image update feature.
<2> Specifies the API group. This must be `machine.openshift.io`.
<3> Specifies the resource within the specified API group to apply the change. This must be `machinesets`.
<4> Specifies that the default behavior is re-enabled for specific machine sets.
<5> Specifies that any machine set with matching labels is to be updated. A positive match of the label selector re-enables the default behavior for that machine set.
+
[TIP]
====
If an appropriate label is not present on the machine set, add a key-value pair by running a command similar to the following:

----
$ oc label machineset.machine ci-ln-hmy310k-72292-5f87z-worker-a update-boot-image=true -n openshift-machine-api
$ oc label machineset.machine ci-ln-hmy310k-72292-5f87z-worker-a region=east -n openshift-machine-api
----
====
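Before you save the `MachineConfiguration` object, you can confirm which machine sets the selector matches by listing machine sets that carry the same label. This example reuses the `region=east` label from the previous example; replace it with the labels in your own `matchLabels` stanza:

[source,terminal]
----
# Lists only the machine sets that the Partial selector would match
$ oc get machinesets -n openshift-machine-api -l region=east
----

If the command returns no machine sets, the `Partial` selector does not match any machine set, and the default behavior is not re-enabled for any of them.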

.Verification

. View the current state of the boot image updates by viewing the machine configuration object:
+
[source,terminal]
----
$ oc get machineconfiguration cluster -n openshift-machine-api -o yaml
----
+
.Example machine set with the boot image reference
+
[source,yaml]
----
kind: MachineConfiguration
metadata:
name: cluster
# ...
status:
conditions:
- lastTransitionTime: "2024-09-09T13:51:37Z" <1>
message: Reconciled 1 of 2 MAPI MachineSets | Reconciled 0 of 0 CAPI MachineSets
| Reconciled 0 of 0 CAPI MachineDeployments
reason: BootImageUpdateConfigurationAdded
status: "True"
type: BootImageUpdateProgressing
- lastTransitionTime: "2024-09-09T13:51:37Z" <2>
message: 0 Degraded MAPI MachineSets | 0 Degraded CAPI MachineSets | 0 CAPI MachineDeployments
reason: BootImageUpdateConfigurationAdded
status: "False"
type: BootImageUpdateDegraded
----
<1> Status of the boot image update. {cluster-capi-operator} machine sets and machine deployments are not currently supported for boot image updates.
<2> Indicates if any boot image updates failed. If any of the updates fail, the Machine Config Operator is degraded. In this case, manual intervention might be required.
include::snippets/mco-update-boot-images-verification.adoc[]

. Get the boot image version by running the following command:
* Get the boot image version by running the following command:
+
[source,terminal]
----
110 changes: 104 additions & 6 deletions modules/mco-update-boot-images-disable.adoc
@@ -1,26 +1,56 @@
// Module included in the following assemblies:
//
// * machine_configuration/mco-update-boot-images.adoc
// * machine-configuration/mco-update-boot-images.adoc
// * nodes/nodes-nodes-managing.adoc

:_mod-docs-content-type: PROCEDURE
[id="mco-update-boot-images-disable_{context}"]
= Disabling updated boot images

To disable the updated boot image feature, edit the `MachineConfiguration` object so that the `machineManagers` field is an empty array.
For {gcp-first} and {aws-first} clusters, by default the Machine Config Operator (MCO) manages and updates the boot image in the machine sets in your cluster. This means that the MCO updates the boot images whenever you update your cluster.

If you disable this feature after some nodes have been created with the new boot image version, any existing nodes retain their current boot image. Turning off this feature does not rollback the nodes or machine sets to the originally-installed boot image. The machine sets retain the boot image version that was present when the feature was enabled and is not updated again when the cluster is upgraded to a new {product-title} version in the future.
You can disable this default behavior on some or all of your machine sets by editing the `MachineConfiguration` object. When the behavior is disabled, the MCO no longer manages the boot image in your cluster and no longer updates the boot image with each cluster update.

If you disable this feature after some nodes have been created with the new boot image version, any existing nodes retain their current boot image. Disabling this feature does not roll back the nodes or machine sets to the originally-installed boot image. The machine sets retain the boot image version that was present when you disabled the feature, and that boot image is not updated when the cluster is upgraded to a new {product-title} version in the future.

After disabling the default behavior on a {gcp-short} or {aws-short} cluster, you can re-enable it at any time. For more information, see "Re-enabling updated boot images".

.Procedure

. Disable updated boot images by editing the `MachineConfiguration` object:
. Edit the `MachineConfiguration` object to disable the default boot image update behavior for some or all of your machine sets:
+
[source,terminal]
----
$ oc edit MachineConfiguration cluster
----

. Make the `machineManagers` parameter an empty array:
* Optional: Disable the behavior for all machine sets:
+
[source,yaml]
----
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
metadata:
name: cluster
namespace: openshift-machine-config-operator
spec:
# ...
managedBootImages: <1>
machineManagers:
- apiGroup: machine.openshift.io <2>
resource: machinesets <3>
selection:
mode: None <4>
----
+
--
<1> Configures the boot image update feature.
<2> Specifies an API group. This must be `machine.openshift.io`.
<3> Specifies the resource within the specified API group to apply the change. This must be `machinesets`.
<4> Specifies that the default behavior is disabled for all machine sets in the cluster.
--
+
Alternatively, you can make the `machineManagers` parameter an empty array:
+
[source,yaml]
----
@@ -34,4 +64,72 @@ spec:
managedBootImages: <1>
machineManagers: []
----
<1> Remove the parameters listed under `machineManagers` and add the `[]` characters to disable boot image updates.
<1> Remove the parameters that are listed under `machineManagers` and add the `[]` characters to disable the updating of boot images.

* Optional: Disable the default behavior for specific machine sets:
+
[source,yaml]
----
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
metadata:
name: cluster
namespace: openshift-machine-config-operator
spec:
# ...
managedBootImages: <1>
machineManagers:
- apiGroup: machine.openshift.io <2>
resource: machinesets <3>
selection:
mode: Partial <4>
partial:
machineResourceSelector: <5>
matchLabels:
region: "east"
----
<1> Configures the boot image update feature.
<2> Specifies an API group. This must be `machine.openshift.io`.
<3> Specifies the resource within the specified API group to apply the change. This must be `machinesets`.
<4> Specifies that the default behavior is disabled for specific machine sets.
<5> Specifies the labels to match. A positive match of the label selector disables the default behavior for that machine set.
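If you prefer a non-interactive alternative to `oc edit`, for example in a script, a JSON merge patch similar to the following sketch sets the `None` mode for all machine sets. Because a merge patch replaces lists wholesale, this command overwrites any existing entries in the `machineManagers` list:

[source,terminal]
----
# Replaces the whole machineManagers list with a single entry that uses mode None
$ oc patch machineconfiguration cluster --type=merge \
  -p '{"spec":{"managedBootImages":{"machineManagers":[{"apiGroup":"machine.openshift.io","resource":"machinesets","selection":{"mode":"None"}}]}}}'
----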

.Verification

include::snippets/mco-update-boot-images-verification.adoc[]

* Get the boot image version by running the following command:
+
[source,terminal]
----
$ oc get machinesets <machineset_name> -n openshift-machine-api -o yaml
----
+
.Example machine set with the boot image reference
+
[source,yaml]
----
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
labels:
machine.openshift.io/cluster-api-cluster: ci-ln-77hmkpt-72292-d4pxp
update-boot-image: "true"
name: ci-ln-77hmkpt-72292-d4pxp-worker-a
namespace: openshift-machine-api
spec:
# ...
template:
# ...
spec:
# ...
providerSpec:
# ...
value:
disks:
- autoDelete: true
boot: true
image: projects/rhcos-cloud/global/images/rhcos-9-6-20250402-0-gcp-x86-64 <1>
# ...
----
<1> This boot image is the same as the current {product-title} version.
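To inspect only the `managedBootImages` configuration rather than the full object, you can print that field with a JSONPath expression. This optional check lets you confirm that the `selection.mode` value, or the empty `machineManagers` list, matches what you configured:

[source,terminal]
----
# Prints the spec.managedBootImages field rendered as JSON
$ oc get machineconfiguration cluster -o jsonpath='{.spec.managedBootImages}{"\n"}'
----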