MCO-1504: Update bootimage management enhancement #1761
- For machineset backed clusters, this would be updated by the MSBIC after it successfully updates boot images.
- For non-machineset backed clusters, this would be updated by the cluster admin to indicate the last manually updated bootimage. The cluster admin would need to update this configmap every few releases, when the RHEL minor that the RHCOS container is built on changes (e.g. 9.6->9.8).

The cluster admin may also choose to opt out of skew management via this configmap, indicating that they will not need to scale nodes and thereby opting out of both skew enforcement and scaling functionality.
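Purely as an illustration of what such a tracking configmap could carry (the name and data key below are hypothetical, and the discussion later in this thread leans towards an API field instead), a minimal sketch in Go:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// buildSkewTrackerConfigMap sketches the proposed tracking configmap: the
// MSBIC (machineset backed clusters) or the cluster admin (non-machineset
// backed clusters) records the OCP version of the last boot image update.
// The configmap name and data key are hypothetical, not part of the proposal.
func buildSkewTrackerConfigMap(lastBootImageOCPVersion string) *corev1.ConfigMap {
	return &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "boot-image-skew-tracker",
			Namespace: "openshift-machine-config-operator",
		},
		Data: map[string]string{
			"lastBootImageOCPVersion": lastBootImageOCPVersion,
		},
	}
}

func main() {
	cm := buildSkewTrackerConfigMap("4.19.0")
	fmt.Printf("%s/%s -> %s\n", cm.Namespace, cm.Name, cm.Data["lastBootImageOCPVersion"])
}
```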
I thought of a couple alternate routes for the opt-out here:
- Deleting the configmap. This may add complexity to the MCO to "book keep" the creation/deletion of the configmap. It might be safer to use a field within the configmap to indicate opting out of the skew instead.
- Add a new cluster level "skew-enforcement" knob within the ManagedBootImages API field. I think it is important to keep this separate from the knob that selects machine resources for boot image updates, as using a single control for the "opt-in" and "skew" mechanisms may make things a bit confusing.
Happy to hear other ideas too!
We have done one-off configmaps for some features during upgrade (cgroup default for example) but I think this has too many contact points to make that management straightforward. I'd lean towards making it an explicit API field (or I guess annotation, like the opt out)
On the general approach, I think the Proactive approach is easier to maintain, albeit maybe annoying for some users who have to ack every few releases. But then again if they don't want to scale at all, they can just turn skew enforcement off (do we stop them from scaling altogether? or try on a best effort basis then?)
+1 on doing this via API as well.
maybe annoying for some users who have to ack every few releases.
Could you clarify this? My understanding was that if a cluster has opted out of skew enforcement, they wouldn't have to do that again. From the MCO POV, this means that:
- We no longer proactively degrade the cluster if the boot images are out of date.
- If they attempt scaling after that, and the skew is large enough, either of the reactive approaches should cover this scenario.
+1, let's avoid configmaps, they don't age well
I've added a draft for this in the API extension section, PTAL!
#### Reactive
1. Have the MCS reject new ignition requests if the aforementioned configmap indicates that the cluster's bootimages are out of date. The MCS could then signal to the cluster admin that scale-up is not available until the configmap has been reconciled.
2. Add a service, shipped via RHCOS/MCO templates, which will check the incoming OS container image against the currently booted RHCOS version. This runs on firstboot right after the MCD pulls the new image, and will prevent the node from rebasing to the updated image if the drift is too far.
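To make option (2) concrete, here is a rough sketch, assuming the node already knows both its booted version and the minimum version the incoming image can pivot from; how that minimum actually reaches the node is the open question in the thread below:

```go
package main

import (
	"fmt"

	"golang.org/x/mod/semver"
)

// allowFirstbootPivot sketches the check a firstboot service could run right
// after the MCD pulls the new OS image: refuse the rebase if the booted boot
// image is older than the minimum the incoming image supports pivoting from.
// Both inputs are assumed to be plain OCP-style versions; delivering the
// minimum to the node (Ignition, payload metadata, ...) is not specified here.
func allowFirstbootPivot(bootedVersion, minPivotVersion string) bool {
	// golang.org/x/mod/semver requires a leading "v".
	return semver.Compare("v"+bootedVersion, "v"+minPivotVersion) >= 0
}

func main() {
	fmt.Println(allowFirstbootPivot("4.16.0", "4.14.0")) // true: drift acceptable, rebase proceeds
	fmt.Println(allowFirstbootPivot("4.12.0", "4.14.0")) // false: drift too far, rebase blocked
}
```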
I left this in from #1698, in case I was missing something. How would the daemon know the "acceptable" skew during firstboot? I think we could potentially do this after the pivot and yell at the admin, but IMO the "reject join" approach would probably cover this case and never let the firstboot daemon get to pivot.
I guess we'd have to inject that information into the payload.
Also this would cover cases where the environment doesn't use the MCS
Ahh I might have misunderstood something here then, does the first boot daemon have access to the release payload? I thought all it had was the target `MachineConfig` when it does the first boot pivot.
Possibly Jerry meant "inject that information into the Ignition config"?
If we're going with the OCP versions to check skew, would this approach be viable? Wouldn't (2) be comparing something that could be checked at the cluster level?
RHEL major versions will no longer be cross-compatible, i.e. if you wish to have a RHEL10 machineconfigpool, you must use a RHEL10 bootimage.
cc @wking @yuqi-zhang
picking up the conversation from #1698:
From this point on, MCO will target RHEL 10 for new nodes scaling into this MC
I'll let Jerry weigh in here, but my read was that we aren't planning on doing any MCP specific enforcement. I think Jerry was implying this would be a result of the aforementioned enforcement methods.
Hmm, so, when we initially discussed RHEL 10, the focus was dual-stream, where you'd simultaneously have rhel9 and rhel10 based workers, and each type would have to boot from the same origin major for the bootimage. I think the original intention was to reduce potential 9->10 upgrade issues until RHEL 10 is more stable, but I could be wrong there (cc @sdodson)
Transitioning the cluster's base RHCOS nodes from 9->10 would be a different problem. I think we'd eventually have to have some cross compatibility there and allow rhel9 bootimages to work for at least 1 version where the shipped image is RHEL10
// skewEnforcement allows an admin to set behavior of the boot image skew enforcement mechanism.
// Enabled means that the MCO will degrade and prevent upgrades when the boot image skew is too large.
// Disabled means that the MCO will no longer degrade and will permit upgrades when the boot image skew is
// too large. This may also hinder the cluster's scaling ability.
Define too large? What are the potential pitfalls of "too large" of a skew?
By too large, I meant to say that it fails the skew guidance from the release image - I'll clarify the godoc to better describe this. The main pitfall is that scaling would most likely fail, i.e. the pivot to the release OS image isn't possible if your current boot image is below x. If scaling is a non-issue for the cluster in question, they could disable it and the cluster would be able to carry out upgrades again.
// +optional
SkewEnforcement SkewEnforcementSelectorMode `json:"skewEnforcement"`
If you were to make the enum values here represent the actual skew of the images, what might this look like?
Do you mean that we should make the skew configurable? My understanding was that it needed to be something constant for a release (defined in the `releaseImage`) and it could potentially change between releases, but not something an operator/admin would get to manually set.
I more mean, you have skew enforcement as Enabled or Disabled. What if skew enforcement were more like ReleaseRecommended and Disabled, would that make more sense, and allow for a future expansion where an admin could opt in and say, actually, I want SingleRelease skew, or DualRelease skew? Allowing them to set their own guidelines and override what is recommended by the release image itself
Hmm, what's the use case you're thinking about? I think the most likely scenario is that they want to override the skew check in the release image because they don't care about scaling. I'm not sure if they'd be interested in making the skew check tighter than we require.
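For reference, a sketch of what that expanded enum could look like; the draft in this PR only proposes Enabled/Disabled, so the extra values below are hypothetical, not part of the proposal:

```go
package v1

// SkewEnforcementSelectorMode sketches the expanded enum floated above. Only
// the Enabled/Disabled pair exists in the current draft; the other values are
// hypothetical examples of letting an admin pick a policy other than the one
// recommended by the release image.
type SkewEnforcementSelectorMode string

const (
	// Follow the skew limit recommended by the release image (the draft's "Enabled").
	SkewEnforcementReleaseRecommended SkewEnforcementSelectorMode = "ReleaseRecommended"
	// Hypothetical: allow boot images at most one release behind.
	SkewEnforcementSingleRelease SkewEnforcementSelectorMode = "SingleRelease"
	// Hypothetical: allow boot images at most two releases behind.
	SkewEnforcementDualRelease SkewEnforcementSelectorMode = "DualRelease"
	// Do not enforce skew; scaling may fail if boot images fall too far behind.
	SkewEnforcementDisabled SkewEnforcementSelectorMode = "Disabled"
)
```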
- Opt-out of skew enforcement altogether, giving up scaling ability.

#### Reactive
1. Have the MCS reject new ignition requests if the aforementioned configmap indicates that the cluster's bootimages are out of date. The MCS could then signal to the cluster admin that scale-up is not available until the configmap has been reconciled.
As someone who maintains the Machine API/Cluster API components, and would have to deal with the customers complaining that their machines can't scale up, I'm a hard no to this idea.
Ignition failures are hard to diagnose and we are constantly triaging them already, as people assume they are a failure in our ability to provision instances
I think this was proposed because pivoting was going to fail anyway and this was a way of warning the user. cc @yuqi-zhang if I'm missing something here!
There are 2 intents behind this idea:
- we don't force users to be up to date if they don't want scaling, so this is mostly a fallback error that would hopefully not be hit. If instead we want to say "we always require proactive user action", then we wouldn't need this fallback error
- the MCS failure will bubble up via the MCO's CO object so the MCO actually degrades alongside no nodes joining the cluster, instead of the "stuck in provisioned state" we have today (which the MCO would not surface), essentially loudly failing before we even get to the ignition stage
If it's incredibly obvious, e.g. from MCS logs, why it is not serving the ignition, then debugging this may become easier, but generally any "I scaled up and my node didn't join the cluster" issue goes to the cluster infra team and this behaviour sounds like it'll make this more common. I'd be keen to make sure we do all we can to avoid more noise for the cluster infra team.
A variation on the MCS rejection idea is to just serve an Ignition config that writes e.g. an `/etc/issue` with a message explaining that the bootimage is too old. But yeah, the MCS should also surface this up on the cluster-side so it's not only visible from the node's console.
Updated this section to include these suggestions.
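As a sketch of that variation, assuming the MCS would hand out a minimal Ignition v3 config rather than refusing the request outright (the path, message wording, and function names are illustrative, not what the MCS does today):

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// ignitionFile mirrors just the Ignition v3 "storage.files" fields needed here.
type ignitionFile struct {
	Path      string `json:"path"`
	Mode      int    `json:"mode"`
	Overwrite bool   `json:"overwrite"`
	Contents  struct {
		Source string `json:"source"`
	} `json:"contents"`
}

// bootimageTooOldConfig builds a stub Ignition config that only writes a
// console message, so a node booted from an out-of-date boot image fails
// loudly at the console instead of silently never joining the cluster.
func bootimageTooOldConfig(minVersion string) ([]byte, error) {
	msg := fmt.Sprintf("Boot image is older than the minimum supported version (%s); update boot images before scaling.\n", minVersion)
	var f ignitionFile
	f.Path = "/etc/issue"
	f.Mode = 0644 // serialized as decimal 420, as Ignition expects
	f.Overwrite = true
	f.Contents.Source = "data:," + url.PathEscape(msg)

	cfg := map[string]any{
		"ignition": map[string]string{"version": "3.4.0"},
		"storage":  map[string]any{"files": []ignitionFile{f}},
	}
	return json.MarshalIndent(cfg, "", "  ")
}

func main() {
	out, err := bootimageTooOldConfig("4.16.0")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```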
RHEL major versions will no longer be cross-compatible, i.e. if you wish to have a RHEL10 machineconfigpool, you must use a RHEL10 bootimage.
This feels like a breaking change, why now?
I understand there's lots changing about our boot images, but, is this a one off, or a constant issue going forward?
This specifically is for dual-stream support, where in some version of OCP (likely 4.20?) we will have a special RHEL-10 pool (design TBD), so workers in the same OCP version will run different RHEL majors.
We will eventually have to have a RHEL9->10 upgrade path, so, dual stream aside, generally speaking I think we'd need cross compatibility, so we should probably clarify this.
But we would never want a RHEL9->11 upgrade path; I think that would be the only breaking case.
Thanks a lot for working on this!
One thing that I think is implied but should probably be spelled out more is how skew comparison actually works. I.e. are we literally parsing RHCOS bootimage version strings and doing comparisons (in that case, recent versioning changes make that trickier)?
Or I think a saner approach is to compare OCP versions instead given that RHCOS bootimage versioning is not super meaningful to the rest of OCP. I.e. the skew policies would reference OCP versions and the coreos-bootimages configmap would reference the OCP version it's for?
In certain long lived clusters, the MCS TLS cert contained within the above Ignition configuration may be out of date. Example issue [here](https://issues.redhat.com/browse/OCPBUGS-1817). While this has been partly solved by [MCO-642](https://issues.redhat.com/browse/MCO-642) (which allows the user to manually rotate the cert) it would be very beneficial for the MCO to actively manage this TLS cert and take this concern away from the user.

**Note**: As of 4.19, the MCO supports [management of this TLS cert](https://issues.redhat.com/browse/MCO-1208). With this work in place, the MCO can now attempt to upgrade the stub Ignition config, instead of hardcoding to the `*-managed` stub as mentioned previously. This will help preserve any user customizations that were present in the stub Ignition config.
This sentence is confusing because two paragraphs above we say that the MCO will ignore user customizations in the stub and here we say that we can now preserve user customizations. Can we fold this sentence back into that paragraph and reword to reflect exactly what the strategy is?
Ack, I can reconcile this 👍
- `ManagedBootImages` feature gate is active
- The cluster and/or the machineset is opted-in to boot image updates. This is done at the operator level, via the `MachineConfiguration` API object.
- The `machineset` does not have a valid owner reference. Having a valid owner reference typically indicates that the `MachineSet` is managed by another workflow, and that updates to it are likely going to cause thrashing.
- The golden configmap is verified to be in sync with the current version of the MCO. The MCO will update ("stamp") the golden configmap with the version of the new MCO image after at least 1 master node has successfully completed an update to the new OCP image. This helps prevent `machinesets` being updated too soon at the end of a cluster upgrade, before the MCO itself has updated and has had a chance to roll out the new OCP image to the cluster.

If any of the above checks fail, the MSBIC will exit out of the sync.
- Based on platform and architecture type, the MSBIC will check if the boot images referenced in the `providerSpec` field of the `MachineSet` are the same as the one in the ConfigMap. Each platform (gcp, aws... and so on) does this differently, so this part of the implementation will have to be special cased. The ConfigMap is considered to be the golden set of bootimage values, i.e. they will never go out of date. If it is not a match, the `providerSpec` field is cloned and updated with the new boot image reference.
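As a rough illustration of that per-platform comparison, decoding the providerSpec generically and using an AWS-style `ami.id` field (the real MSBIC uses the typed provider configs and special-cases each platform, so the keys here are only illustrative):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// reconcileBootImage sketches the MSBIC comparison step: decode the
// MachineSet's providerSpec, compare its boot image reference against the
// golden value from the coreos-bootimages configmap, and return an updated
// copy only when they differ.
func reconcileBootImage(providerSpec []byte, goldenBootImage string) (updated []byte, changed bool, err error) {
	var spec map[string]any
	if err := json.Unmarshal(providerSpec, &spec); err != nil {
		return nil, false, err
	}
	ami, _ := spec["ami"].(map[string]any)
	if ami == nil {
		ami = map[string]any{}
		spec["ami"] = ami
	}
	if current, _ := ami["id"].(string); current == goldenBootImage {
		// Already in sync with the golden configmap; nothing to patch.
		return providerSpec, false, nil
	}
	ami["id"] = goldenBootImage
	out, err := json.Marshal(spec)
	if err != nil {
		return nil, false, err
	}
	return out, true, nil
}

func main() {
	old := []byte(`{"ami":{"id":"ami-0old"},"instanceType":"m6i.xlarge"}`)
	updated, changed, err := reconcileBootImage(old, "ami-0new")
	fmt.Println(changed, string(updated), err)
}
```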
Can't comment lower than this, but: should the MSBIC add an owner reference to itself on the MachineSet after updating it? (And obviously change the precondition checks above to check whether the MachineSet has either no owner, or the MSBIC as owner.)
Otherwise, other controllers might have the same logic and also update without taking ownership and you still get thrashing.
I hesitate to add an owner reference to a `MachineSet` because the MCO is only really taking ownership of one specific field inside it, the `providerSpec`. The remaining fields are still managed and updated by the MAPI operator. If someone else is causing thrashing in the `providerSpec` field, we hope to get it ironed out in the early stages of supporting that platform by socializing this feature and the EP.
cc @JoelSpeed in case you have anything to add here
I don't think adding an owner reference would prevent thrashing. Generally there's nothing that would exist within the cluster that would be updating the boot images, but many users have add-ons that might.
GitOps may be forcing a specific spec, and would revert any change the MSBIC makes.
Hive also forces the MachineSet spec, and would revert any change the MSBIC makes.
In the future, we plan to support MachineDeployments, which would own the MachineSets. In this case we would want the MSBIC to update the MachineDeployment instead of the MachineSet.
We are also working on migrating MAPI to CAPI. In the future, writes to the MAPI resource may be denied, and the writer is then expected to update the CAPI resource instead, we will need to discuss how this is going to work.
Out of interest, what object would you expect the owner reference to point to?
```mermaid
flowchart-elk TD;
Start((Start)) -->MachineSetOwnerCheck[Does the MachineSet have an OwnerReference?]
MachineSetOwnerCheck -->|Yes|Stop
```
(If we add an ownerReference for ourselves, I think this would require changing.)
Some points to note:
- For bookkeeping purposes, the MCO will annotate the `MachineConfiguration` object when opting in the cluster by default.
- This mechanism will be active on installs and upgrades.
Hmm, could it make sense to have different behaviours for new installs vs upgrades? So e.g. when we GA bootimage updates for a platform, we turn it on for new installs. For upgrades, we turn it on on the next release. This provides a natural "rollout" and gives us a higher chance of finding issues before it's on across the board.
Jerry and I went back and forth over this, but realistically delaying this by one release doesn't buy us much more information. The biggest failure point (at least for the currently GAed platforms) is having an un-upgradeable, user customized, spec 2 ignition stub, and any clusters that are born on 4.19 or later would always have a spec 3 stub - so we'd only get this sort of feedback on the "upgrade" cases.
A potential problem here is that the way boot images are stored in the machineset is lossy. On certain platforms, there is no way to recover the boot image metadata from the MachineSet. This is most likely to happen the first time the MCO attempts to do skew enforcement on a cluster that has never had boot image updates. In such cases, the MCO will default to the install time boot image, which can be recovered from the [aleph version](https://github.com/coreos/coreos-assembler/pull/768) of the control plane nodes.
Past the first update, can you clarify how the MSBIC knows which bootimage version is in a MachineSet? Will it add e.g. an annotation on the MachineSet when it patches it?
The way this relates to this line here is that I think rather than using the aleph of the control plane nodes, we could also just make the installer add the necessary annotation when it creates the MachineSet, right?
Clusters born from installers without that patch won't have the annotation, which implies it's at least older than the OCP release containing the patch. Clusters born from installers with it will have it available.
Past the first update, can you clarify how the MSBIC knows which bootimage version is in a MachineSet? Will it add e.g. an annotation on the MachineSet when it patches it?
Based on the current discussion from the EP, I think the consensus is leaning towards an API for skew management instead of the configmap, so the "boot image version" of the cluster would have to be a field within there, editable by the user (for the non-managed cases) and the boot image controller. In fact, if we go with the approach of using OCP versions from your comment, we might be able to skip the boot image metadata determination issue altogether? For example:
- When the boot image controller successfully completes a cluster wide boot image update, store the OCP version of the last update (retrievable from the boot images configmap) in the Skew Enforcement API.
- The skew enforcement mechanism would monitor this value in the API, and compare against the skew limit described by the current release image. If the value in the API is empty, we default to the OCP version at install time (I'm sure this is somewhere in the cluster).
Does that seem plausible, or am I missing something? 🤔
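A sketch of what that could look like, assuming the value lives in a hypothetical skew-enforcement API field and the install-time fallback is read from ClusterVersion's update history (both of those choices are the open questions here, not settled behaviour):

```go
package main

import (
	"fmt"

	configv1 "github.com/openshift/api/config/v1"
	"golang.org/x/mod/semver"
)

// bootImagesWithinSkew sketches the comparison flow proposed above:
// lastBootImageUpdate is the OCP version recorded by the boot image
// controller (or by the admin) in the hypothetical skew-enforcement API;
// minAcceptable is the skew limit carried by the release image. If no value
// has been recorded yet, fall back to the cluster's install-time version,
// taken from ClusterVersion's update history.
func bootImagesWithinSkew(lastBootImageUpdate string, cv *configv1.ClusterVersion, minAcceptable string) bool {
	effective := lastBootImageUpdate
	if effective == "" && len(cv.Status.History) > 0 {
		// History is ordered newest-first; the last entry is the install version.
		effective = cv.Status.History[len(cv.Status.History)-1].Version
	}
	return semver.Compare("v"+effective, "v"+minAcceptable) >= 0
}

func main() {
	cv := &configv1.ClusterVersion{}
	cv.Status.History = []configv1.UpdateHistory{
		{Version: "4.19.0"},
		{Version: "4.16.2"}, // install-time version
	}
	fmt.Println(bootImagesWithinSkew("", cv, "4.14.0")) // true: within skew
	fmt.Println(bootImagesWithinSkew("", cv, "4.18.0")) // false: would degrade / block scaling
}
```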
The skew enforcement mechanism would monitor this value in the API, and compare against the skew limit described by the current release image. If the value in the API is empty, we default to the OCP version at install time (I'm sure this is somewhere in the cluster).
When an admin manually has to update the boot image, how do they tell the cluster what version the new boot image is?
Addressed in #1761 (comment), PTAL
### Enforcement of bootimage skew

There should be some mechanism that will alert the user when a cluster's bootimages are out of date. To allow for this, the release payload will gain a new field, which will store the OCP version of the minimum acceptable boot image for that release.
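Purely illustrative of the shape such metadata could take once it is readable by the MCO (the key name is hypothetical, and how a field actually gets added to the release payload is the open question in the comment below):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// releaseBootImageInfo is a hypothetical view of the new release payload
// field described above: the minimum acceptable boot image, expressed as an
// OCP version, for the release being installed or upgraded to.
type releaseBootImageInfo struct {
	MinimumBootImageOCPVersion string `json:"minimumBootImageOCPVersion"`
}

func main() {
	// Stand-in for whatever the MCO would read out of the release payload.
	raw := []byte(`{"minimumBootImageOCPVersion":"4.16.0"}`)

	var info releaseBootImageInfo
	if err := json.Unmarshal(raw, &info); err != nil {
		panic(err)
	}
	fmt.Println("boot images older than", info.MinimumBootImageOCPVersion, "would trip skew enforcement")
}
```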
Question: I know this has been discussed before, but it is unclear to me how to add an additional field in the release image. Is there an API? Based on openshift docs, the release image is pulled by the installer from https://github.com/openshift/installer/blob/main/pkg/asset/releaseimage/default.go#L23, which I think is populated from ocp-build-data.
This is a follow-up update for #1496 and proposes a strategy for implementing an opt-out and skew enforcement mechanism for boot image updates. A lot of this is based on #1698 by @yuqi-zhang - Thanks, Jerry!
All comments and questions welcome. I have a few open questions for which I'll be leaving comments below.
cc @jlebon @wking
And sorta unrelated: I've also moved some of the older flowcharts to mermaid diagrams as they are more maintainable.