diff --git a/enhancements/machine-config/manage-boot-images.md b/enhancements/machine-config/manage-boot-images.md index 5694455de5..b9b2bf0427 100644 --- a/enhancements/machine-config/manage-boot-images.md +++ b/enhancements/machine-config/manage-boot-images.md @@ -5,6 +5,7 @@ authors: reviewers: - "@yuqi-zhang" - "@mrunal" + - "@jlebon" - "@cgwalters, for rhcos context" - "@joelspeed, for machine-api context" - "@sdodson, for installer context" @@ -13,7 +14,7 @@ approvers: api-approvers: - "@joelspeed" creation-date: 2023-10-16 -last-updated: 2024-03-08 +last-updated: 2025-04-10 tracking-link: - https://issues.redhat.com/browse/MCO-589 see-also: @@ -27,13 +28,14 @@ superseded-by: ## Summary -This is a proposal to manage bootimages via the `Machine Config Operator`(MCO), leveraging some of the [pre-work](https://github.com/openshift/installer/pull/4760) done as a result of the discussion in [#201](https://github.com/openshift/enhancements/pull/201). This feature will only target standalone OCP installs. It will also be user opt-in and is planned to be released behind a feature gate. +This is a proposal to manage bootimages via the `Machine Config Operator`(MCO), leveraging some of the [pre-work](https://github.com/openshift/installer/pull/4760) done as a result of the discussion in [#201](https://github.com/openshift/enhancements/pull/201). This feature will only target standalone OCP installs. This is now released as an opt-in feature and will be rolled out on a per-platform basis (see projected roadmap). This will eventually be on by default, and the MCO will enforce an accepted skew and require non-platform managed bootimage updates to be acknowledged by the cluster admin. 
-For `MachineSet` managed clusters, the end goal is to create an automated mechanism that can: +For `MachineSet` managed clusters, the end goal is to create automated mechanisms that can: - update the boot images references in `MachineSets` to the latest in the payload image - ensure stub Ignition config referenced in each `Machinesets` is in spec 3 format +- ensure cluster is within acceptable skew to prevent scaling failures -For clusters that are not managed by `MachineSets`, the end goal is to create a document(KB or otherwise) that a cluster admin would follow to update their boot images. +For clusters that are not managed by `MachineSets`, the end goal is to create a document(KB or otherwise) that a cluster admin would follow to update their boot images to be compliant with the acceptable skew. In such cases, the admin will be expected to record their cluster's boot image in the skew enforcement API object. ## Motivation @@ -42,12 +44,18 @@ Currently, bootimage references are [stored](https://github.com/openshift/instal - Afterburn [[1](https://issues.redhat.com/browse/OCPBUGS-7559)],[[2](https://issues.redhat.com/browse/OCPBUGS-4769)] - podman [[1](https://issues.redhat.com/browse/OCPBUGS-9969)] - skopeo [[1](https://issues.redhat.com/browse/OCPBUGS-3621)] +- composefs [[1](https://github.com/openshift/os/issues/1678#issuecomment-2546310833)] +- sigstore GA [[1](https://issues.redhat.com/browse/OCPNODE-2619)],[[2](https://issues.redhat.com/browse/OCPBUGS-38809)] -Additionally, the stub Ignition config [referenced](https://github.com/openshift/installer/blob/1ca0848f0f8b2ca9758493afa26bf43ebcd70410/pkg/asset/machines/gcp/machines.go#L197) in the `MachineSet` is also not managed. This stub is used by the ignition binary in firstboot to auth and consume content from the `machine-config-server`(MCS). The content served includes the actual Ignition configuration and the target OCI format RHCOS image. 
The ignition binary now does first boot provisioning based on this, then hands off to the `machine-config-daemon`(MCD) first boot service to do the reboot into the target OCI format RHCOS image. +Additionally, the stub Ignition config [referenced](https://github.com/openshift/installer/blob/1ca0848f0f8b2ca9758493afa26bf43ebcd70410/pkg/asset/machines/gcp/machines.go#L197) in the `MachineSet` is also not managed. This stub is also generated by the installer, and it typically consists of an endpoint and a Certificate Authority(CA), which are used by the ignition binary in firstboot to auth and consume content from the `machine-config-server`(MCS). This CA is called the [`RootCA`](https://github.com/openshift/installer/blob/99b48742d2f89b4978fe14cb6fe842e283c0ce4d/pkg/asset/tls/root.go#L19-L20) by the installer, which is a bit of a misnomer, as it does not give actual root access to the cluster - it is only used to generate the MCS TLS cert pair. Going forward, this proposal will refer to these artifacts as the MCS CA & TLS cert. In 4.19, the MCO took ownership of [rotating the MCS CA & TLS cert](https://issues.redhat.com/browse/MCO-1208). -There has been [a previous effort](https://github.com/openshift/machine-config-operator/pull/1792) to manage the stub Ignition config. It was [reverted](https://github.com/openshift/machine-config-operator/pull/2126) and then [brought back](https://github.com/openshift/machine-config-operator/pull/2827#issuecomment-996156872) just for bare metal clusters. For other platforms, the `*-managed` stubs still get generated by the MCO, but are not injected into the `MachineSet`. The proposal plans to utilize these unused `*-managed` stubs, but it is important to note that this stub is generated(and synced) by the MCO and will ignore/override any user customizations to the original stub Ignition config.
This limitation will be mentioned in the documentation, and a later release will provide support for user customization of the stub, either via API or a workaround thorugh additional documentation. +The content served by the MCS includes the actual Ignition configuration and the target OCI format RHCOS image. The ignition binary now does first boot provisioning based on this, then hands off to the `machine-config-daemon`(MCD) first boot service to do the reboot into the target OCI format RHCOS image. -In certain long lived clusters, the MCS TLS cert contained within the above Ignition configuration may be out of date. Example issue [here](https://issues.redhat.com/browse/OCPBUGS-1817). While this has been partly solved [MCO-642](https://issues.redhat.com/browse/MCO-642) (which allows the user to manually rotate the cert) it would be very beneficial for the MCO to actively manage this TLS cert and take this concern away from the user. +Hence, it is critical that the Ignition binary in the boot image is able to process the stub Ignition config to make the initial MCS request and join the cluster. On clusters installed [prior to 4.6](https://docs.redhat.com/en/documentation/openshift_container_platform/4.6/html/release_notes/ocp-4-6-release-notes#ocp-4-6-ignition-spect-updated-v3), the stub Ignition would be of the spec 2 format, which cannot be used by the newer boot images. In very rare cases, some users also [customize](https://docs.redhat.com/en/documentation/openshift_container_platform/4.18/html/postinstallation_configuration/post-install-node-tasks#machine-node-custom-partition_post-install-node-tasks) the original stub Ignition config. Therefore, to make boot image updates as seamless as possible, there is a need to upgrade these stub Ignition configs to the spec 3 format, while preserving any customizations done to them at install time.
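

As a hedged illustration of that upgrade (endpoint and CA values are placeholders, not taken from a real cluster), the spec 2 stub's `ignition.config.append` directive maps to `ignition.config.merge` in spec 3, while the MCS endpoint and CA payload carry over unchanged:

```
# spec 2 stub (pre-4.6 installs), trimmed for illustration
{
  "ignition": {
    "version": "2.2.0",
    "config": {"append": [{"source": "https://<api-int-endpoint>:22623/config/worker"}]},
    "security": {"tls": {"certificateAuthorities": [{"source": "data:text/plain;charset=utf-8;base64,<MCS CA>"}]}}
  }
}

# spec 3 equivalent
{
  "ignition": {
    "version": "3.2.0",
    "config": {"merge": [{"source": "https://<api-int-endpoint>:22623/config/worker"}]},
    "security": {"tls": {"certificateAuthorities": [{"source": "data:text/plain;charset=utf-8;base64,<MCS CA>"}]}}
  }
}
```

Any install-time customizations (e.g. extra storage or files sections) would need to be translated alongside these fields.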
+ +> **_NOTE:_** There has been [a previous effort](https://github.com/openshift/machine-config-operator/pull/1792) to manage the stub Ignition config. It was [reverted](https://github.com/openshift/machine-config-operator/pull/2126) and then [brought back](https://github.com/openshift/machine-config-operator/pull/2827#issuecomment-996156872) just for bare metal clusters. For other platforms, the `*-managed` stubs still get generated by the MCO, but are not injected into the `MachineSet`. Since they are generated and synced by the MCO, any user customizations to them are ignored. This proposal originally used the `*-managed` stubs in its initial implementation, but with the [MCS CA & TLS management](https://issues.redhat.com/browse/MCO-1208) work in place, the upgrade path mentioned above is preferred. + +This is also a soft prerequisite for both dual-stream RHEL support in OpenShift and on-cluster layered builds. RPM-OSTree presently does a deploy-from-self to get a new-enough rpm-ostree to deploy image-based RHEL CoreOS systems, and we would like to avoid doing this for bootc if possible. We would also like to prevent RHEL8->RHEL10 direct updates once that is available for OpenShift. ### User Stories @@ -57,9 +65,9 @@ In certain long lived clusters, the MCS TLS cert contained within the above Igni ### Goals -The MCO will take over management of the boot image references and the stub Ignition configuration. The installer is still responsible for creating the `MachineSet` at cluster bring-up, but once cluster installation is complete the MCO will ensure that boot images are in sync with the latest payload. From the user standpoint, this should cause less compatibility issues as nodes will no longer need to pivot to a different version of RHCOS during node scaleup. +The MCO will take over management of the boot image references and the stub Ignition configuration. 
The installer is still responsible for creating the `MachineSet` at cluster bring-up, but once cluster installation is complete the MCO will ensure that boot images are in sync with the latest payload. From the user standpoint, this should cause fewer compatibility issues as nodes will no longer need to pivot to a substantially different version of RHCOS during node scaleup. -This should not interfere with existing workflows such as Hive and ArgoCD. As this is an opt-in mechanism, the cluster admin will be protected against such scenarios of accidental "reconciliation" and for additional safety, the MSBIC will also ensure that machinesets that have a valid OwnerReference will be excluded from boot image updates. +This should not interfere with existing workflows such as Hive and ArgoCD. As this is an opt-in mechanism, the cluster admin will be protected against such scenarios of accidental "reconciliation" and, for additional safety, the MSBIC will also ensure that machinesets that have a valid OwnerReference will be excluded from boot image updates. We will work with affected teams to transition them to the new workflow before we turn this feature on by default. ### Non-Goals @@ -74,29 +82,25 @@ __Overview__ - The `machine-config-controller`(MCC) pod will gain a new sub-controller `machine_set_boot_image_controller`(MSBIC) that monitors `MachineSet` changes and the `coreos-bootimages` [ConfigMap](https://github.com/openshift/installer/pull/4760) changes. - Before processing a MachineSet, the MSBIC will check if the following conditions are satisfied: - - `ManagedBootImages` feature gate is active - The cluster and/or the machineset is opted-in to boot image updates. This is done at the operator level, via the `MachineConfiguration` API object. - The `machineset` does not have a valid owner reference. Having a valid owner reference typically indicates that the `MachineSet` is managed by another workflow, and that updates to it are likely going to cause thrashing. 
- - The golden configmap is verified to be in sync with the current version of the MCO. The MCO will update("stamp") the golden configmap with version of the new MCO image after atleast 1 master node has succesfully completed an update to the new OCP image. This helps prevent `machinesets` being updated too soon at the end of a cluster upgrade, before the MCO itself has updated and has had a chance to roll out the new OCP image to the cluster. + - The golden configmap is verified to be in sync with the current version of the MCO. The MCO will update("stamp") the golden configmap with the version of the new MCO image after at least 1 master node has successfully completed an update to the new OCP image. This helps prevent `machinesets` being updated too soon at the end of a cluster upgrade, before the MCO itself has updated and has had a chance to roll out the new OCP image to the cluster. If any of the above checks fail, the MSBIC will exit out of the sync. - Based on platform and architecture type, the MSBIC will check if the boot images referenced in the `providerSpec` field of the `MachineSet` is the same as the one in the ConfigMap. Each platform(gcp, aws...and so on) does this differently, so this part of the implementation will have to be special cased. The ConfigMap is considered to be the golden set of bootimage values, i.e. they will never go out of date. If it is not a match, the `providerSpec` field is cloned and updated with the new boot image reference. -- Next, it will check if the stub secret referenced within the `providerSpec` field of the `MachineSet` is managed i.e. `worker-user-data-managed` and not `worker-user-data`. If it is unmanaged, the cloned `providerSpec` will be updated to reference the managed stub secret. This step is platform/arch agnostic. - +- Next, it will check if the stub Ignition secret referenced within the `providerSpec` field of the `MachineSet` is of the spec 3 format. 
If it is not of the spec 3 format, it will attempt to upgrade it to spec 3. - Finally, the MSBIC will attempt to patch the `MachineSet` if an update is required. #### Error & Alert Mechanism MSBIC sync failures may be caused by multiple reasons: -- The MSBIC notices an OwnerReference and is able to determine that updating the `MachineSet` will likely cause thrashing. This is considered a misconfiguration and in such cases, the user is expected to exclude this `MachineSet` from boot image management. - The `coreos-bootimages` ConfigMap is unavailable or in an incorrect format. This will likely happen if a user manually edits the ConfigMap, overriding the CVO. - The `coreos-bootimages` ConfigMap takes too long to be stamped by the MCO. This indicates that there are larger problems in the cluster such as an upgrade failure/timeout or an unrelated cluster failure. +- The stub Ignition referenced in the `MachineSet` could not be upgraded to the spec 3 format. This will only happen when a user has heavily customized their Ignition stub, which is quite rare (and potentially unsupported). Resolving this will require manual intervention, which will be explained in the documentation. - Patching the `MachineSet` fails. This indicates a temporary API server blip, or larger RBAC issues. An error condition will be applied on the operator level `MachineConfiguration` object when the sync failures of a given `MachineSet` exceed a threshold amount for a period of time. The condition will include information regarding the sync failures and the logs of the MSBIC can be checked for additional details. -In addition to this, a Prometheus alert will also be triggered by the MSBIC. This alert will list the misbehaving `MachineSet` and will be cleared automatically by the MSBIC if the sync is successfully completed later. - Note: In the future, patches to `MachineSets` will be prevented when they are not authoritative [#1465](https://github.com/openshift/enhancements/pull/1465). 
This will need to be accounted for within the logic of the MSBIC. ### Workflow Description @@ -111,27 +115,49 @@ Any form factor using the MCO and `MachineSets` will be impacted by this proposa - Standalone OpenShift: Yes, this is the main target form factor. - microshift: No, as it does [not](https://github.com/openshift/microshift/blob/main/docs/contributor/enabled_apis.md) use `MachineSets`. - Hypershift: No, Hypershift does not have this issue. -- Hive: Hive manages `MachineSets` via `MachinePools`. The MachinePool controller generates the `MachineSets` manifests (by invoking vendored installer code) which include the `providerSpec`. Once a `MachineSet` has been created on the spoke, the only things that will be reconciled on it are replicas, labels, and taints - [unless a backdoor is enabled](https://github.com/openshift/hive/blob/0d5507f91935701146f3615c990941f24bd42fe1/pkg/constants/constants.go#L518). If the `providerSpec` ever goes out of sync, a warning will be logged by the MachinePool controller but otherwise this discrepancy is ignored. In such cases, the MSBIC will not have any issue reconciling the `providerSpec` to the correct boot image. However, if the backdoor is enabled, both the MSBIC and the MachinePool Controller will attempt to reconcile the `providerSpec` field, causing churn. The Hive team will update the comment on the backdoor annotation to indicate that it is mutually exclusive with this feature. +- Hive: Hive manages `MachineSets` via `MachinePools`. The MachinePool controller generates the `MachineSets` manifests (by invoking vendored installer code) which include the `providerSpec`. Once a `MachineSet` has been created on the spoke, the only things that will be reconciled on it are replicas, labels, and taints - [unless a backdoor is enabled](https://github.com/openshift/hive/blob/0d5507f91935701146f3615c990941f24bd42fe1/pkg/constants/constants.go#L518). 
If the `providerSpec` ever goes out of sync, a warning will be logged by the MachinePool controller but otherwise this discrepancy is ignored. In such cases, the MSBIC will not have any issue reconciling the `providerSpec` to the correct boot image. However, if the backdoor is enabled, both the MSBIC and the MachinePool Controller will attempt to reconcile the `providerSpec` field, causing churn. The Hive team has [updated the comment](https://github.com/openshift/hive/pull/2596/files) on the backdoor annotation to indicate that it is mutually exclusive with this feature. ##### Supported platforms -The initial release(phase 0) will support GCP. In future releases, we will add in support for remaining platforms as we gain confidence in the functionality and understand the specific needs of those platforms. For platforms that cannot be supported, we aim to atleast provide documentation to perform the boot image updates manually. Here is an exhaustive list of all the platforms: +The initial release (phase 0) will support GCP. In future releases, we will add support for the remaining platforms as we gain confidence in the functionality and understand the specific needs of those platforms. For platforms that cannot be supported, we aim to at least provide documentation to perform the boot image updates manually. Here is an exhaustive list of all the platforms: -- gcp - aws - azure -- alibabacloud +- baremetal +- gcp +- ibmcloud - nutanix -- powervs - openstack +- powervs - vsphere -- baremetal -- libvirt -- ovirt -- ibmcloud This work will be tracked in [MCO-793](https://issues.redhat.com/browse/MCO-793). 
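
To illustrate why the boot image lookup must be special-cased per platform, here is a trimmed, hypothetical `MachineSet` fragment showing where a GCP boot image reference lives (the image name is a placeholder; on AWS, by contrast, the boot image is referenced via `providerSpec.value.ami.id`):

```
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
spec:
  template:
    spec:
      providerSpec:
        value:
          kind: GCPMachineProviderSpec
          disks:
            - boot: true
              image: projects/rhcos-cloud/global/images/<rhcos-boot-image-name>
          userDataSecret:
            name: worker-user-data
```

Each supported platform therefore needs its own logic to locate, compare, and patch this field.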
+ +##### Projected timeline + +This is a tentative timeline for managed platforms, subject to change: + +| Platform | TechPreview (opt-in) | GA (opt-in) | Default-On (opt-out) | +| -------- | ------- | ------- | ------- | +| gcp | [4.16](https://docs.redhat.com/en/documentation/openshift_container_platform/4.16/html-single/machine_configuration/index#mco-update-boot-images) |[4.17](https://docs.redhat.com/en/documentation/openshift_container_platform/4.17/html-single/machine_configuration/index#mco-update-boot-images) |4.19 | +| aws | [4.17](https://docs.redhat.com/en/documentation/openshift_container_platform/4.17/html-single/machine_configuration/index#mco-update-boot-images) |[4.18](https://docs.redhat.com/en/documentation/openshift_container_platform/4.18/html-single/machine_configuration/index#mco-update-boot-images) |4.19 | +| vsphere | 4.20 |4.21 |4.22 | +| azure | 4.20 |4.21 |4.22 | +| baremetal| 4.21 |4.22 |4.23 | +| openstack| 4.21 |4.22 |4.23 | +| nutanix | 4.22 |4.23 |4.24 | +| ibmcloud | 4.22 |4.23 |4.24 | +| non-managed | * |* |4.23* | + +For **non-managed*** cases, boot image updates will be user initiated and supported via documentation. Hence, this will not be guarded by a feature gate, and the 4.23 timeline described above is for documentation. For managed cases, depending on initial feedback received from default-on behavior, it may be viable for the GA stage to be opt-out as well. This will be evaluated on a platform by platform basis. + +The skew enforcement mechanism could be developed in parallel to the above timeline, in a platform agnostic manner: + +| | DevPreview | TechPreview | GA | +| -------- | ------- | ------- | ------- | +| skew management | 4.21 |4.22 |4.23* | + +Decoupling the default-on and skew enforcement mechanisms would help iron out any edge cases unique to a platform and would also aid in refining the skew enforcement workflow. 
It is important to note that while skew management could be developed independently, it should only be deemed ready for **GA*** after the most widely used platforms have reached the opt-out stage. This is to minimize user disruption as much as possible. + +##### Cluster API backed machinesets As the Cluster API move is impending(initial release in 4.16 and default-on release in 4.17), it is necessary that this enhancement plans for the changes required in an CAPI backed cluster. Here are a couple of sample YAMLs used in CAPI backed `Machinesets`, from the [official Openshift documentation](https://docs.openshift.com/container-platform/4.14/machine_management/capi-machine-management.html#capi-sample-yaml-files-gcp). @@ -207,8 +233,11 @@ Much of the existing design regarding architecture & platform detection, opt-in, #### Opt-in Mechanism This proposal introduces a new field in the MCO operator API, `ManagedBootImages` which encloses an array of `MachineManager` objects. A `MachineManager` object contains the resource type of the machine management object that is being opted-in, the API group of that object and a union discriminant object of the type `MachineManagerSelector`. This object `MachineManagerSelector` contains: -- The union discriminator, `Mode`, can be set to two values : All and Partial. -Partial: This is a set of label selectors that will be used by users to opt-in a custom selection of machine resources. When the Mode is set to Partial mode, all machinesets matched by this object would be considered enrolled for updates. In the first iteration of this API, this object will only allow for label matching with MachineResources. In the future, additional ways of filtering may be added with another label selector, e.g. namespace. For all other values of Mode, this selector object i +- The union discriminator, `Mode`, can be set to three values: All, Partial, and None. 
+ - **All**: All machine resources described by this resource/apiGroup type will be opted-in for boot image updates. In most cases, this effectively enables boot image updates for the whole cluster, unless there are multiple kinds of machine resources present in the cluster. + - **Partial**: This is a set of label selectors that will be used by users to opt-in a custom selection of machine resources. When the Mode is set to Partial mode, all machinesets matched by this object would be considered enrolled for updates. In the first iteration of this API, this object will only allow for label matching with MachineResources. In the future, additional ways of filtering may be added with another label selector, e.g. namespace. + - **None**: All machine resources described by this resource/apiGroup type will be excluded from boot image updates. In most cases, this effectively disables boot image updates for the whole cluster, unless there are multiple kinds of machine resources present in the cluster. + ``` type ManagedBootImages struct { @@ -248,6 +277,7 @@ type MachineManagerSelector struct { // Valid values are All and Partial. // All means that every resource matched by the machine manager will be updated. // Partial requires specified selector(s) and allows customisation of which resources matched by the machine manager will be updated. + // None means that every resource matched by the machine manager will NOT be updated. // +unionDiscriminator // +kubebuilder:validation:Required Mode MachineManagerSelectorMode `json:"mode"` @@ -266,7 +296,7 @@ type PartialSelector struct { } // MachineManagerSelectorMode is a string enum used in the MachineManagerSelector union discriminator. 
-// +kubebuilder:validation:Enum:="All";"Partial" +// +kubebuilder:validation:Enum:="All";"Partial";"None" type MachineManagerSelectorMode string const ( @@ -276,6 +306,9 @@ const ( // Partial represents a configuration mode that will register resources specified by the parent MachineManager only // if they match with the label selector. Partial MachineManagerSelectorMode = "Partial" + + // None represents a configuration mode that excludes all resources specified by the parent MachineManager from boot image updates. + None MachineManagerSelectorMode = "None" ) // MachineManagerManagedResourceType is a string enum used in the MachineManager type to describe the resource @@ -357,6 +390,131 @@ spec: name: "cluster" namespace: "default" ``` + +Alongside the implementation of default-on behavior, a Status field for ManagedBootImages is also planned. This would reflect the +current ManagedBootImages configuration and if unspecified, it will represent the current cluster defaults. +``` +type MachineConfigurationStatus struct { + ... + ... + + // managedBootImagesStatus reflects what the latest cluster-validated boot image configuration is + // and will be used by Machine Config Controller while performing boot image updates. + // +openshift:enable:FeatureGate=ManagedBootImages + // +optional + ManagedBootImagesStatus ManagedBootImages `json:"managedBootImagesStatus"` +} + +``` +Here are some examples to illustrate how this works. 
+ +Scenario: No admin configuration and the current release **does not** opt in by default: +``` +apiVersion: operator.openshift.io/v1 +kind: MachineConfiguration +spec: +status: + managedBootImagesStatus: + machineManagers: + - resource: machinesets + apiGroup: machine.openshift.io + selection: + mode: None +``` +Scenario: No admin configuration and the current release **does** opt in by default: +``` +apiVersion: operator.openshift.io/v1 +kind: MachineConfiguration +spec: +status: + managedBootImagesStatus: + machineManagers: + - resource: machinesets + apiGroup: machine.openshift.io + selection: + mode: All +``` +Regardless of the default-on behavior of the release, if the admin were to add a configuration, the status must reflect that in the next update. +``` +apiVersion: operator.openshift.io/v1 +kind: MachineConfiguration +spec: + managedBootImages: + machineManagers: + - resource: machinesets + apiGroup: machine.openshift.io + selection: + mode: Partial + partial: + machineResourceSelector: + matchLabels: {} +status: + managedBootImagesStatus: + machineManagers: + - resource: machinesets + apiGroup: machine.openshift.io + selection: + mode: Partial + partial: + machineResourceSelector: + matchLabels: {} +``` + +#### Skew Enforcement + +This would be introduced as a new knob in the `MachineConfiguration` Spec: +``` +type MachineConfigurationSpec struct { + ... + ... + // bootImageSkewEnforcement allows an admin to set the behavior of the boot image skew enforcement mechanism. + // +optional + BootImageSkewEnforcement SkewEnforcementSelector `json:"bootImageSkewEnforcement"` +} + +// +kubebuilder:validation:XValidation:rule="has(self.mode) && (self.mode == 'Automatic' || self.mode =='Manual') ? 
has(self.bootImageOCPVersion) : !has(self.bootImageOCPVersion)",message="BootImageOCPVersion is required when mode is Automatic or Manual, and forbidden otherwise" +// +union +type SkewEnforcementSelector struct { + // mode determines the underlying behavior of the skew enforcement mechanism. + // Valid values are Automatic, Manual, and Disabled. + // Automatic means that the MCO will store the OCP version associated with the last boot image update in the + // BootImageOCPVersion field. + // Manual means that the cluster admin is expected to perform manual boot image updates and store the OCP version + // associated with the last boot image update in the BootImageOCPVersion field. + // In Automatic and Manual mode, the MCO will prevent upgrades when the boot image skew exceeds the + // skew limit described by the release image. + // Disabled means that the MCO will permit upgrades when the boot image exceeds the skew limit + // described by the release image. This may affect the cluster's ability to scale. + // +unionDiscriminator + // +kubebuilder:validation:Required + Mode SkewEnforcementSelectorMode `json:"mode"` + + // bootImageOCPVersion provides a string which will be used to enforce the skew limit. + // Only permitted when mode is set to "Automatic" or "Manual". + // +kubebuilder:validation:XValidation:rule="self.matches('^[0-9]+\\.[0-9]+\\.[0-9]+$')",message="bootImageOCPVersion must be in a semver compatible format of x.y.z" + // +kubebuilder:validation:MaxLength:=8 + // +optional + BootImageOCPVersion string `json:"bootImageOCPVersion,omitempty"` +} + + +// SkewEnforcementSelectorMode is a string enum used to indicate the cluster's boot image skew enforcement mode. +// +kubebuilder:validation:Enum:="Automatic";"Manual";"Disabled" +type SkewEnforcementSelectorMode string + +const ( + // Automatic represents a configuration mode that allows automatic skew enforcement. 
+ Automatic SkewEnforcementSelectorMode = "Automatic" + + // Manual represents a configuration mode that allows manual skew enforcement. + Manual SkewEnforcementSelectorMode = "Manual" + + // Disabled represents a configuration mode that disables boot image skew enforcement. + Disabled SkewEnforcementSelectorMode = "Disabled" +) + +``` + #### Tracking boot image history Note: This section is just an idea for the moment and is considered out of scope. This CR will require thorough API review in a follow-up enhancement. @@ -472,24 +630,112 @@ status: The goal of this is to provide information about the "lineage" of a machine management resource to the user. The user can then manually restore their machine management resource to an earlier state if they wish to do so by following documentation. ### Implementation Details/Notes/Constraints [optional] +The reconciliation loop below is run on any `MachineSet` that is opted in for updates when any of the following in-cluster resources are added or updated: +1. A `MachineSet`'s providerSpec field. This is where a MachineSet's boot image references are stored. +2. The `coreos-bootimages` ConfigMap, which is the cluster's golden reference for boot images. This is typically updated by the CVO during an upgrade. +3. The singleton `MachineConfiguration` object called `cluster`. This is used to configure the Managed Boot Images feature. + + + +```mermaid +flowchart-elk TD; + Start((Start)) -->MachineSetOwnerCheck[Does the MachineSet have an OwnerReference?] + MachineSetOwnerCheck -->|Yes|Stop + MachineSetOwnerCheck -->|No| ConfigMapCheck[Has the coreos-bootimages ConfigMap been stamped by the MCO?] 
; + + ConfigMapCheck -->|Yes|ArchType[Determine arch type of MachineSet, e.g. x86_64, aarch64] ; + ConfigMapCheck -->|No| Wait((Poll coreos-bootimages ConfigMap until it has been stamped)); + Wait --> |ConfigMap has been stamped| ArchType + Wait --> |Timeout| Error + ArchType -->PlatformType[Determine platform type of MachineSet, e.g. gcp, aws, vsphere] ; + PlatformType -->ProviderSpec[Grab providerSpec from MachineSet, e.g. GCPProviderSpec, AWSProviderSpec etc] ; + + subgraph PlatformSpecific[Platform Specific] + ProviderSpec -->IgnitionCheck[Is stub Ignition referenced in ProviderSpec in spec 3 format?] ; + IgnitionCheck -->|Yes|CompareBootImage[Compare bootimage in ProviderSpec against the coreos-bootimages ConfigMap] ; + IgnitionCheck -->|No| IgnitionUpgrade[Attempt Ignition Upgrade]; + IgnitionUpgrade -->|Ignition Upgrade Successful| CompareBootImage; + end + + IgnitionUpgrade -->|Ignition Upgrade Failed| Error[Throw an error to the cluster admin]; + Error -->Stop[Stop]; + CompareBootImage -->|Mismatch| Patch[Patch MachineSet]; + CompareBootImage -->|Match| Stop[Stop]; + + Patch --> Stop((Stop)); +``` -![Sub Controller Flow](manage_boot_images_flow.jpg) - -![MachineSet Reconciliation Flow](manage_boot_images_reconcile_loop.jpg) - -The implementation has a GCP specific MVP here: +The implementation has a GCP-specific MVP that can be found here: - https://github.com/openshift/machine-config-operator/pull/4083 ### Risks and Mitigations -The biggest risk in this enhancement would be delivering a bad boot image. To mitigate this, we have outlined a revert option. +The biggest risk in this enhancement would be delivering a bad boot image. However, the chances of this are quite minimal, as it would imply that the [boot images](https://github.com/openshift/installer/blob/main/data/data/coreos/rhcos.json) used in the current version of OCP, which have been through several iterations of CI, are invalid. -How will security be reviewed and by whom? 
TBD +How will security be reviewed and by whom? This is a solution aimed at reducing usage of outdated artifacts and should not introduce any security concerns that do not currently exist. -How will UX be reviewed and by whom? TBD +How will UX be reviewed and by whom? The UX element involved include the user opt-in and opt-out, which is currently up for debate. +#### Enabling Boot Image Updates by default + +This will be done on a platform by platform basis. Some key benchmarks have to be met for a platform to be considered ready +for default on: +- Sufficent runtime(say, at least 1 release) has been accumulated while boot image updates has been GAed for this platform. +- Periodic tests have been added for this platform in CI and have met certain passing metrics. This should include Y stream upgrade e2es. +- Any teams that are affected by default-on +behavior have been notified and assisted with the transition. + +The default-on flow could look like this: +```mermaid +flowchart-elk LR; + Start((Start)) -->PlatformCheck[Does the cluster platform have boot image updates suppport?] + PlatformCheck -->|Yes|MachineConfigurationCheck[Does the cluster currently have a boot image update configuration?] + PlatformCheck -->|No| Stop ; + MachineConfigurationCheck -->|Yes|Stop + MachineConfigurationCheck -->|No| UpdateConfig["The MCO will inject a default configuration based on the platform, which could be opt-in or opt-out"] ; + UpdateConfig --> Stop((Stop)); +``` +Some points to note: +- For bookkeeping purposes, the MCO will annotate the `MachineConfiguration` object when opting in the cluster by default. +- This mechanism will be active on installs and upgrades. +- If the cluster admin wishes to opt-out of the feature, they have to do so by explicitly opting out the cluster via the API knob prior to the upgrade. +- If any of the MachineSets have an OwnerReference, it will be skipped for boot image updates. 
This will cause an alert/warning to the cluster admin, but it will no longer cause a degrade.
+
+
+### Enforcement of bootimage skew
+
+There should be some mechanism that alerts the user when a cluster's boot images are out of date. To allow for this, the release payload will gain a new field, which will store the OCP version of the minimum acceptable boot image for that release.
+
+Generally speaking, we would like to keep the boot image version aligned with the RHEL version we are shipping in the payload. For example, a 9.6 boot image will be allowed until 9.8 is shipped via RHCOS. We would like to keep this customizable, such that any major breaking changes outside of a RHEL major/minor change can still be enforced as a one-off.
+
+#### Enforcement options
+
+Some combination of the following mechanisms should be implemented to alert users, particularly in non-MachineSet-backed scaling environments. The options generally fall under proactive enforcement (require users to either update or acknowledge risk before upgrading to a new version) vs. reactive enforcement (only fail when a non-compliant boot image is being used to scale into the cluster).
+
+#### Proactive
+Add a new field in the `MachineConfiguration` object for configuration of the skew enforcement mechanism. More details about this can be found in the [API extensions section](#skew-enforcement). This field will store the OCP version of the cluster's current boot image and allows for easy comparison against the skew limit described in the release payload.
+ - For MachineSet-backed clusters, this would be updated by the MSBIC after it successfully updates boot images for all machine resources in the cluster.
+ - For non-MachineSet-backed clusters, this would be updated by the cluster admin to indicate the last manually updated boot image. The cluster admin would need to update this API object every few releases, when the RHEL minor on which the RHCOS container is built changes (e.g. 9.6->9.8).
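The proactive check above boils down to comparing the recorded boot image version against the payload's minimum. A minimal sketch in Go of that comparison (the function name, the standalone shape, and the plain `major.minor.patch` string format are illustrative assumptions, not the MCO's actual implementation):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// bootImageSkewOK reports whether a recorded boot image version (e.g. the
// value stored in the skew enforcement field) meets the minimum acceptable
// boot image version declared in the release payload. Both are assumed here
// to be plain "major.minor.patch" strings; this is a sketch only.
func bootImageSkewOK(current, minimum string) (bool, error) {
	parse := func(v string) ([3]int, error) {
		var out [3]int
		parts := strings.Split(v, ".")
		if len(parts) != 3 {
			return out, fmt.Errorf("invalid version %q", v)
		}
		for i, p := range parts {
			n, err := strconv.Atoi(p)
			if err != nil {
				return out, fmt.Errorf("invalid version %q: %w", v, err)
			}
			out[i] = n
		}
		return out, nil
	}
	cur, err := parse(current)
	if err != nil {
		return false, err
	}
	floor, err := parse(minimum)
	if err != nil {
		return false, err
	}
	// Compare major, then minor, then patch; equal versions are acceptable.
	for i := 0; i < 3; i++ {
		if cur[i] != floor[i] {
			return cur[i] > floor[i], nil
		}
	}
	return true, nil
}

func main() {
	ok, _ := bootImageSkewOK("4.14.0", "4.16.0")
	fmt.Println(ok) // false: recorded boot image is older than the minimum
	ok, _ = bootImageSkewOK("4.16.5", "4.16.0")
	fmt.Println(ok) // true: within the acceptable skew
}
```

In a real controller the result of such a check would feed the `Upgradeable=False` decision described below, with the two version strings coming from the `MachineConfiguration` object and the release payload metadata respectively.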
+
+The cluster admin may also choose to opt out of skew management via this field, acknowledging that their scaling ability may be limited.
+
+This object can then be monitored to enforce skew limits. If the skew is determined to be too large, the MCO can update its `ClusterOperator` object with an `Upgradeable=False` condition, along with remediation steps in the `Condition` message. This will signal to the CVO that the cluster is not suitable for an upgrade.
+
+To remediate, the cluster admin would then have to do one of the following:
+- Turn on boot image updates if it is a MachineSet-backed cluster.
+- Manually update the boot image and update the skew enforcement object if it is a non-MachineSet-backed cluster.
+- Opt out of skew enforcement altogether, giving up scaling ability.
+
+A potential problem here is that the way boot images are stored in the MachineSet is lossy. On certain platforms, there is no way to recover the boot image metadata from the MachineSet. This is most likely to happen the first time the MCO attempts to do skew enforcement on a cluster that has never had boot image updates. In such cases, the MCO will use the OCP version from install time to determine skew instead.
+
+#### Reactive
+1. Have the MCS reject new Ignition requests if the aforementioned object indicates that the cluster's boot images are out of date. The MCS would then signal to the cluster admin that scale-up is not available until the skew has been resolved. Raising the alarm from the MCS at the cluster level will help avoid additional noise for the cluster infra team, and make it apparent that the scaling failure was intentional. The MCS will also attempt to serve an Ignition config that writes a message to `/etc/issue` explaining that the boot image is too old, which will be visible from the node's console.
+2. Add a service to be shipped via RHCOS/MCO templates, which will check the incoming OS container image against the currently booted RHCOS version. This runs on firstboot right after the MCD pulls the new image, and will prevent the node from rebasing to the updated image if the drift is too large.
+
+RHEL major versions will no longer be cross-compatible, i.e. if you wish to have a RHEL10 MachineConfigPool, you must use a RHEL10 boot image.
+

### Drawbacks

TBD, based on the open questions below.
diff --git a/enhancements/machine-config/manage_boot_images_flow.jpg b/enhancements/machine-config/manage_boot_images_flow.jpg
deleted file mode 100644
index 6619a791d6..0000000000
Binary files a/enhancements/machine-config/manage_boot_images_flow.jpg and /dev/null differ
diff --git a/enhancements/machine-config/manage_boot_images_reconcile_loop.jpg b/enhancements/machine-config/manage_boot_images_reconcile_loop.jpg
deleted file mode 100644
index 14de6a73f7..0000000000
Binary files a/enhancements/machine-config/manage_boot_images_reconcile_loop.jpg and /dev/null differ