
enhancements/authentication: Pod Security Admission Enforcement #1747


Closed
ibihim wants to merge 15 commits from the ibihim/psa-enforcement branch

Conversation

@ibihim ibihim commented Jan 30, 2025

What

A proposal for how to safely start enforcing Pod Security Admission.

Why

It isn't as trivial as we had hoped in 4.11 :)

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 30, 2025
@openshift-ci openshift-ci bot requested a review from jan--f January 30, 2025 17:29
Contributor

openshift-ci bot commented Jan 30, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign dustymabe for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot requested a review from jwmatthews January 30, 2025 17:29
@ibihim ibihim force-pushed the ibihim/psa-enforcement branch from 12f022d to f95e7a9 Compare February 3, 2025 15:09
@ibihim ibihim changed the title [WIP] enhancements/authentication: Pod Security Admission Enforcement Config enhancements/authentication: Pod Security Admission Enforcement Config Feb 3, 2025
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 3, 2025
@ibihim ibihim changed the title enhancements/authentication: Pod Security Admission Enforcement Config enhancements/authentication: Pod Security Admission Enforcement Feb 3, 2025
Comment on lines 213 to 218
// TargetModeConditional indicates that the user is willing to let the cluster
// automatically enforce a stricter enforcement once there are no violating Namespaces.
// If violations exist, the cluster stays in its previous state until those are resolved.
// This allows a gradual move towards label and global config enforcement without
// immediately breaking workloads that are not yet compliant.
TargetModeConditional PSATargetMode = "Conditional"
Contributor

Is there a particular reason why we keep the cluster in the previous state? Is there another middle ground we can use?

The automatic enforcement thing is something that gives me pause and could be prone to some confusion. What if we had something like:

  • Privileged - no pod security restrictions
  • RestrictedExceptViolating - strictest possible enforcement, automatically exempting namespaces that would violate
  • Restricted - strictest possible configuration. Does not roll out if there are violating namespaces.

Also, do we already have an API mapping to the PodSecurity Admission Controller configuration? If not, would having an API surface that maps to that help us come up with something that isn't an all-or-nothing type transition?

Contributor Author
@ibihim ibihim Feb 4, 2025

It is an interesting idea, but how would such a thing work?

I mean, it would be pointless if violating namespaces were added dynamically to such a list, wouldn't it?
So would it be a list that is set initially and could only be reduced? What if there are like 100 namespaces out of 102 in there? Does it signal to the customer that they are "enforcing restricted" (false sense of security)?

Contributor Author
@ibihim ibihim Feb 4, 2025

Giving the customer the capability to exclude namespaces upfront in a central place (through an API that maps to the PodSecurity configuration) would be an interesting concept, but it would mean that the kube-apiserver needs to roll out a new version. It would be easier to label the particular namespace by hand as privileged.

Contributor

The mapping to the PodSecurity Admission configuration is linked in the enhancement:

https://github.com/openshift/cluster-kube-apiserver-operator/blob/218530fdea4e89b93bc6e136d8b5d8c3beacdd51/pkg/cmd/render/render.go#L350-L358.
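
For readers who want to see what that mapping produces, the upstream PodSecurity admission plugin configuration rendered there has roughly the following shape (illustrative values only, not the exact rendered output):

apiVersion: pod-security.admission.config.k8s.io/v1
kind: PodSecurityConfiguration
defaults:
  enforce: "privileged"        # global default applied to namespaces without an enforce label
  enforce-version: "latest"
  audit: "restricted"
  audit-version: "latest"
  warn: "restricted"
  warn-version: "latest"
exemptions:
  usernames: []
  runtimeClasses: []
  namespaces: []               # the knob the exemptions idea discussed here would populate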

Ah, it looks like we don't allow for any configuration from the customer side to control the granularity of this beyond labeling the namespaces. I do wonder if it would make sense to provide the customer more nuanced control through an API that maps to this with some of our own defaulting logic? Why don't we do this today?

It is an interesting idea, but how would such a thing work?

Haven't gone into detailed thinking of how, but my general line of thinking is instead of all-or-nothing, something like RestrictedExceptViolating could set the default enforcement to restricted with some kind of configuration to not enforce that mode on Namespaces that would violate the restricted enforcement mode. Have to spend some more time thinking on what approach would be best, but I think that could be:

  • configures the exemptions.namespaces in the PodSecurity admission plugin configuration with violating namespaces
  • controller adds label to the violating namespaces to specify that the namespace should be evaluated with the privileged mode.

I mean, it would be pointless if violating namespaces were added dynamically to such a list, wouldn't it?
So would it be a list that is set initially and could only be reduced? What if there are like 100 namespaces out of 102 in there?

From the motivation section of this enhancement, it sounded like we are pretty confident that the number of violating namespaces is generally pretty low. The scenario you point out is definitely something that could happen, but how likely is it?

Does it signal to the customer that they are "enforcing restricted" (false sense of security)?

Maybe, but I would expect that we should still have some mechanism in place to bring violating namespaces to the attention of cluster administrators. Is it better to be "secure by default, except for these cases we have made you aware of" or "not secure at all"?

Contributor Author

I think we discussed this in one of the recent Arch Calls, right?

It is nearly impossible for automated auditing tools to recognize, which would make us incompatible with some auditing tools. And that is an important thing for us to avoid.

The OCP Arch Call - 4th of February 2025

Contributor

I think that was specifically related to the exemptions piece. I believe a labeling/annotation approach would be auditable because it would be making an update call to the kube-apiserver to update the namespace labels/annotations.

Contributor Author

We could modify the enhancement such that the PSA label syncer enforces labels where it works and doesn't where it does not.

Currently, the proposal is that if not all of those namespaces would work, we don't proceed to label them at all and wait for those violations to be resolved.

Contributor
@everettraven everettraven left a comment

Did another pass focusing on the new API. Left a couple more ideas for how we may be able to get away from an "automated" transition thing.

The automated transitioning piece is still giving me pause, and if we can come up with a solution that achieves the same goal with a fully declarative approach, it might result in a better user and maintainer experience.

In addition to adjusting how the `OpenShiftPodSecurityAdmission` `FeatureGate` behaves, administrators need visibility and control throughout this transition.
A new API is necessary to provide this flexibility.

#### New API
Contributor

@sjenning asked in Slack which group this API would be added to. It would be good to include that information here.

Contributor Author

According to the very first pitch in the Arch Call and @JoelSpeed's comment, I would put it into config/v1alpha1. I think this is the right package for configs that should be modifiable by users.

// to "Restricted" on the kube-apiserver.
// This represents full enforcement, where both Namespace labels and the global config
// enforce Pod Security Admission restrictions.
EnforcementModeFull PSAEnforcementMode = "FullEnforcement"
Contributor

Is this just Restricted?

Contributor Author

It is the "LabelEnforcement" mode, which tells the PSA label syncer to set the enforce label, AND on top of that it sets the PodSecurity configuration of the kube-apiserver to restricted by default.

Should I revisit the comment above to make it more explicit?
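
To make the two pieces concrete, a rough sketch of what "FullEnforcement" would mean for a single namespace (the namespace name is hypothetical, and the label value shown is illustrative; the syncer derives the actual level from the namespace's SCC access):

# Namespace as the PSA label syncer might leave it, with the enforce label set.
apiVersion: v1
kind: Namespace
metadata:
  name: example-app                                  # hypothetical namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted   # set by the syncer
# In addition, the kube-apiserver's PodSecurity configuration switches its
# defaults.enforce to "restricted", so unlabeled namespaces are restricted too.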

Comment on lines 209 to 210
// in a fully privileged (pre-rollout) state, ignoring any label enforcement
// or global config changes.
Contributor

Would this mode actually ignore label enforcement?

If I am following https://kubernetes.io/docs/tasks/configure-pod-container/enforce-standards-admission-controller/#configure-the-admission-controller correctly, it seems like the defaults only apply when there are no labels. Is Privileged effectively the same as disabling the PodSecurity admission plugin?

Contributor Author

If TargetModePrivileged is set, the intent, based on what I tried to articulate, is that we effectively disable any restrictions, which means the cluster should behave as it did before the rollout.

In detail, we could argue about how we would like to do it. I would just revert everything, i.e. remove the enforce label set by the PSA label syncer and set the global config back to privileged.
Logically (based on the name) it would be more correct to set the enforce label to privileged. But as mentioned above, I would like to keep the permutations of possible states as minimal as possible.

Contributor

I'm trying to understand the expected behavior from a user perspective. If I, as a user, were to specify Privileged as the target mode, what happens? If in this state I add a PSa label to a namespace to enforce a different mode like restricted or baseline for that namespace, will it be enforced?

I think there is a distinct difference between "I don't want Pod Security at all" and "I want the most privileged Pod Security default, but still respect anything that would override the default" and I'm trying to make sure I understand which one of those this value is supposed to represent.

Contributor Author
@ibihim ibihim Feb 13, 2025

This value currently represents, currently as in this enhancement, the intent "I don't want Pod Security at all".
From a user perspective it might be more logical to have it actually mean "I want the most privileged Pod Security default"...

... until something goes wrong and they want "the previous state, before it started to fall apart".

I am fine either way.

Contributor

For the "I don't want Pod Security at all", maybe something like None is more reflective of that and in that case should disable the Pod Security admission plugin all together?

TargetModeConditional PSATargetMode = "Conditional"

// TargetModeRestricted indicates that the user wants the strictest possible
// enforcement, causing the cluster to ignore any existing violations and
Contributor

Instead of ignoring existing violations, should the cluster operator go degraded with any existing violations being the reason?

Maybe a knob for users to control this would be useful here to support use cases of:

  • Forcefully enable restricted PSA configuration
  • Enable restricted PSA configuration only if no violations (I realize this is what the Conditional enum type is for, but maybe that shouldn't be a separate option?)

Contributor

Note: I elaborated a bit more on my thinking here in #1747 (comment)

// PSATargetMode reflects the user’s chosen (“target”) enforcement level.
type PSATargetMode string

const (
Contributor

Should there be an option that is reflective of the LabelEnforcement enforcement mode that would be specified in the status?

Contributor Author

We could, but I mentioned it in the Non-Goals.

The reason is that I want to keep the permutations of possible states as minimal as possible.

Contributor

I don't know that this question relates to the non-goals you outlined. What target mode(s) might result in the LabelEnforcement enforcement mode being populated in the status?

Contributor Author

It is part of this non-goal.

I am not eager to maintain too many states. Either you are privileged, restricted, or progressing forward through "LabelOnly" to "FullEnforcement".

Comment on lines 242 to 253
type PSAEnforcementConfigSpec struct {
// targetMode is the user-selected Pod Security Admission enforcement level.
// Valid values are:
// - "Privileged": ensures the cluster runs with no restrictions
// - "Conditional": defers the decision to cluster-based evaluation
// - "Restricted": enforces the strictest Pod Security admission
//
// If this field is not set, it defaults to "Conditional".
//
// +kubebuilder:default=Conditional
TargetMode PSATargetMode `json:"targetMode"`
}
Contributor
@everettraven everettraven Feb 4, 2025

Continuing from my comment on the Restricted target mode, maybe instead of only being able to specify a single enum value this should be a discriminated union that enables more flexibility in configuration?

To further show my line of thinking:

To represent the Privileged option a user would still do:

targetMode: Privileged

To represent a forceful enablement of Restricted:

targetMode: Restricted
restricted:
  violationPolicy: Ignore
  # room for future configuration options

To represent an enablement of Restricted only if no violations (would go to degraded on failure to enable):

targetMode: Restricted
restricted:
  violationPolicy: Block

To facilitate the transition case that the proposed Conditional targetMode seems to be targeting, maybe the default can be something like:

targetMode: Restricted
restricted:
  violationPolicy: LabelMinimallyRestrictive

Where any violating namespaces encountered will be labeled with the appropriately minimally restrictive PSS mode (which might be privileged), prior to attempting to set the default enforced by the PodSecurity admission controller to Restricted.

Contributor Author
@ibihim ibihim Feb 13, 2025

It is slightly more expressive. Maybe something like:

targetMode:
  level: restricted
  violationPolicy: Block

or

targetMode: restricted
violationPolicy: Block

The default could be Block then, which would also work for targetMode: privileged 😄

With labelMinimallyRestrictive... the label syncer isn't able to determine this yet, and it changes its viewpoint. Currently it determines the future outlook based on the SA-based RBAC. Now it would need to have another level of concern: testing individual workloads against up to 2 levels (do you satisfy restricted, do you satisfy privileged?).

@ibihim ibihim force-pushed the ibihim/psa-enforcement branch from 5d1a078 to 930c8a4 Compare February 17, 2025 10:44
Contributor
@JoelSpeed JoelSpeed left a comment

A lot of my questions probably result from a lack of knowledge about PSA and how it actually works/what configurations exist today. Where can I learn more about the various options for its configuration so that I can better review this EP?

This enhancement introduces a **new cluster-scoped API** and changes to the relevant controllers to rollout [Pod Security Admission (PSA)](https://kubernetes.io/docs/concepts/security/pod-security-admission/) enforcement [in OpenShift](https://www.redhat.com/en/blog/pod-security-admission-in-openshift-4.11).
Enforcement means that the `PodSecurityAdmissionLabelSynchronizationController` sets the `pod-security.kubernetes.io/enforce` label on Namespaces, and the PodSecurityAdmission plugin enforces the `Restricted` [Pod Security Standard (PSS)](https://kubernetes.io/docs/concepts/security/pod-security-standards/) globally on Namespaces without any label.

The new API allows users to either enforce the `Restricted` PSS or maintain `Privileged` PSS for several releases. Eventually, all clusters will be required to use `Restricted` PSS.
Contributor

Eventually, all clusters will be required to use Restricted PSS.

How and why? Will there be no get-out-of-jail card at all eventually? Why is this not a cluster admin's choice?

Contributor Author
@ibihim ibihim Feb 20, 2025

Secure by default? 😄

We would like to nudge our users to move to the most secure setup at some point. @deads2k suggested that there could be a period where we tolerate being Privileged and start blocking upgrades after x releases.

There is no reason to run with a Privileged global setup. A user can set namespaces individually to match the required PSS, if not properly labeled by the PodSecurityAdmissionLabelSynchronizationController (which should only happen in the case that the SCCs are user-based).

Contributor Author

I will add a section to the Motivation.

Contributor

While I understand the motivation of "secure by default", I don't think we should totally prevent a cluster admin from choosing the security profile they desire for their cluster.

I think it is totally reasonable for us to have an opinionated and strongly encouraged default.

Although these numbers are now quite low, it is essential to avoid any scenario where users end up with failing workloads.

To ensure a safe transition, this proposal suggests that if a potential failure of workloads is being detected in release `n`, that the operator moves into `Upgradeable=false`.
The user would need to either resolve the potential failures or set the enforcing mode to `Privileged` for now in order to be able to upgrade.
Contributor

Or perhaps they label the violating namespaces as privileged and allow the rest of the cluster to be restricted?

Contributor Author

Yes, I will add this option.
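
For context, the escape hatch being discussed here is just the standard PSA enforce label set directly on the affected namespace; a minimal sketch (the namespace name is hypothetical):

apiVersion: v1
kind: Namespace
metadata:
  name: legacy-workloads
  labels:
    # Pin this one violating namespace to privileged while the rest of the
    # cluster moves to restricted enforcement.
    pod-security.kubernetes.io/enforce: privileged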

If a user encounters `status.violatingNamespaces` it is expected to:

- resolve the violating Namespaces in order to be able to `Upgrade` or
- set the `spec.enforcementMode=Privileged` and solve the violating Namespaces later.
Contributor

When a user defers this, how do we know that they will fix it later? What guidance can we provide them?

Contributor Author

Yes, we need to provide guidance to the customer; the question is what the most effective way would be. Blog entries? This enhancement?


Contributor

I would seek the guidance of the appropriate technical writer. I could imagine a combination of release notes and product docs/KCS

#### PodSecurityAdmissionLabelSynchronizationController

The [PodSecurityAdmissionLabelSynchronizationController (PSA label syncer)](https://github.com/openshift/cluster-policy-controller/blob/master/pkg/psalabelsyncer/podsecurity_label_sync_controller.go) must watch the `status.enforcementMode` and the `OpenShiftPodSecurityAdmission` `FeatureGate`.
If `spec.enforcementMode` is `Restricted` and the `FeatureGate` `OpenShiftPodSecurityAdmission` is enabled, the syncer will set the `pod-security.kubernetes.io/enforce` label.
Contributor

This label goes on each namespace? Or?

Contributor Author

On each namespace, except if it is:

  • a run-level 0 namespace (default, kube-system, ...),
  • an openshift-prefixed namespace, or
  • a namespace which has disabled the PSA label syncer (see the sketch below).
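
For reference, the opt-out in the last bullet is the label also listed further down in this thread; a minimal sketch of a namespace the syncer would leave alone (the name is hypothetical):

apiVersion: v1
kind: Namespace
metadata:
  name: my-special-namespace
  labels:
    # Tells the PSA label syncer not to manage PSA labels on this namespace.
    security.openshift.io/scc.podSecurityLabelSync: "false"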

Contributor Author

I will list those Namespaces.

Contributor

Clarifying this behaviour briefly in the EP would be useful for readers IMO


### Fresh Installs

Needs to be evaluated. The System Administrator needs to pre-configure the new API’s `spec.enforcementMode`, choosing whether the cluster will be `Privileged` or `Restricted` during a fresh install.
Contributor

Why not have the installer generate the correct (Restricted) configuration to be installed? Then most people won't care, but if someone really wanted to, they could pause at the manifests stage and edit the resource to Privileged instead

Contributor Author
@ibihim ibihim Feb 20, 2025

I think, if the default is Restricted, it would default to enforcement, right? The question that is left open for me: if a customer wants to default to Privileged on fresh installs, is there an option to do it? Is there documentation for that? Or is it common to let the SysAdmin change that config post-installation when setting up a cluster?

Contributor

They have this ability already, generally. Let's say we did nothing here: the cluster installs, the resource gets generated, and the enforcement field is empty. They can now go set that to privileged and continue with their cluster until we force them backwards.

Or, we can enable them to change this at install time. Which comes in two forms.

We add an explicit option to the install config, and plumb the value through.

Or, we generate the manifests and document that a modified manifest at the manifests stage could be used to change the behaviour for fresh clusters.

The first of the install time options requires the second to be implemented anyway.

Setting it from the installer in this manner will allow us then to have a distinction between clusters that were installed with PSA restricted, vs those that upgraded and had no opinion. I suspect there is some value in that 🤔
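
As a rough sketch of the second install-time option, a cluster admin could drop a modified manifest into the installer's manifests directory. The group/version, kind, and singleton name "cluster" below are assumptions based on the API discussion in this thread, not a settled shape:

# Hypothetical manifest placed at the installer's manifests stage to opt a
# fresh cluster out of enforcement.
apiVersion: config.openshift.io/v1alpha1
kind: PSAEnforcementConfig
metadata:
  name: cluster
spec:
  enforcementMode: Privileged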


While the root causes need to be identified in some cases, the result of identifying a violating Namespace is understood.

#### New SCC Annotation: `security.openshift.io/ValidatedSCCSubjectType`
Contributor Author

TODO: fix casing to match change

- List which namespaces aren't managed by the PSA label syncer
- Explain why we want to force restricted PSA enforcement eventually.
- Add a guide on how to handle violations.
@ibihim
Contributor Author

ibihim commented Feb 24, 2025

A lot of my questions probably result from a lack of knowledge about PSA and how it actually works/what configurations exist today. Where can I learn more about the various options for its configuration so that I can better review this EP?

@JoelSpeed, I tried to set as many links as possible, especially at the beginning, but it doesn't seem to be obvious, as you are not the first to ask 😄

Upstream resources:

Downstream resources:

  • Initial blog post from the authors of the PSA label syncer controller: blog post.
  • Integrating PSA into OpenShift: enhancement.
  • The enhancement for the PSA label syncer: enhancement.
  • OpenShift Docs for the user perspective: docs.
  • Some other Red Hat labeled blog post wrt PSA and SCCs: blog.

- Are prefixed with `openshift`,
- Have the label `security.openshift.io/scc.podSecurityLabelSync=false`.
- Have the `pod-security.kubernetes.io/enforce` label set manually.
- Are not a run-level zero Namespace:


Suggested change
- Are not a run-level zero Namespace:
- Are a run-level zero Namespace:

Member

Actually the label syncer doesn't ignore run-level zero (or one) namespaces explicitly; it just happens that the ones listed here are run-level 0.

Having said that, admission plugins (and also PSa) are not enabled for runlevel 0/1 namespaces, so any labels will be ineffective.

Contributor Author

I will correct the whole list to reflect this: https://github.com/openshift/enhancements/pull/1747/files#r1971301549


To ensure a safe transition, this proposal suggests that if a potential failure of workloads is being detected in release `n`, that the operator moves into `Upgradeable=false`.
The user would need to either resolve the potential failures, setting a higher PSS label for that Namespace or set the enforcing mode to `Privileged` for now in order to be able to upgrade.
`Privileged` will keep the cluster in the previous state, the non enforcing state.
Contributor

Might be worth elaborating on what the "non enforcing state" entails. Does this mean the default global enforcement is privileged but cluster admins can make certain namespaces more restrictive? Or does it mean that there is no pod security enforcement at all (i.e PodSecurity admission plugin disabled)?

Comment on lines 48 to 50
The temporary `Privileged` mode (opt-out from PSA enforcement) exists solely to facilitate a smooth transition. Once a vast majority of clusters have adapted their workloads to operate under `Restricted` PSS, maintaining the option to run in `Privileged` mode would undermine these security objectives.

OpenShift strives to offer the highest security standards. Enforcing PSS ensures that OpenShift is at least as secure as upstream Kubernetes and that OpenShift complies with upstreams security best practices.
Contributor

Doesn't upstream Kubernetes ship with privileged global enforcement by default? Even though it is strongly recommended to set a more restrictive policy, ultimately it's up to the cluster administrator.

Are there any use cases that customers may have that could warrant an option to set the cluster to a less restrictive global state than Restricted? Maybe there is a middle ground that allows cluster administrators to do this, but they explicitly acknowledge that they are moving to an explicitly unsupported state?

Something else to think about purely from an API change perspective - removing an allowed value for a field is going to be a breaking change to users and will likely require an explicit admin action when we want to remove it as an allowed value.

### New API

This API is used to support users to enforce PSA.
As this is a transitory process, this API will lose its usefulness once PSA enforcement isn't optional anymore.
Contributor

Similar to previous comments I've made, I think it might be worth keeping this API around to allow cluster admins flexibility in how they configure the PodSecurity admission plugin. I think it is fine for us to have opinionated and encouraged defaults.

Have we thought through different use cases that admins may have and why they may or may not want to be able to configure the global enforcement mode?

Contributor
@everettraven everettraven Feb 25, 2025

Additionally, I think the introduction of an API means we have to support it. If we don't want to support a new API long term, is there an alternative approach that does not use an API?

Contributor

David's suggestion is to use LegacyConfig, which would imply that the field is introduced and deprecated immediately, and can then go through the deprecation cycle.

Contributor Author
@ibihim ibihim Feb 26, 2025

I can't imagine a use case where someone wouldn't want to have the highest possible security setting on their production cluster.
If I imagine that I have a test cluster, this could be annoying to deal with if the legacy mode is gone. But it isn't like I couldn't add a mechanism to auto-label namespaces with another PSS.


#### PodSecurityAdmissionLabelSynchronizationController

The [PodSecurityAdmissionLabelSynchronizationController (PSA label syncer)](https://github.com/openshift/cluster-policy-controller/blob/master/pkg/psalabelsyncer/podsecurity_label_sync_controller.go) must watch the `status.enforcementMode` and the `OpenShiftPodSecurityAdmission` `FeatureGate`.
Contributor

Suggested change
The [PodSecurityAdmissionLabelSynchronizationController (PSA label syncer)](https://github.com/openshift/cluster-policy-controller/blob/master/pkg/psalabelsyncer/podsecurity_label_sync_controller.go) must watch the `status.enforcementMode` and the `OpenShiftPodSecurityAdmission` `FeatureGate`.
The [PodSecurityAdmissionLabelSynchronizationController (PSA label syncer)](https://github.com/openshift/cluster-policy-controller/blob/master/pkg/psalabelsyncer/podsecurity_label_sync_controller.go) must watch the `spec.enforcementMode` and the `OpenShiftPodSecurityAdmission` `FeatureGate`.

?

Contributor Author
@ibihim ibihim Feb 26, 2025

Well, I would assume, based on the current state of the enhancement, that spec indicates the intent and status is its actual state, aka the green-lighting of the intent. So the PSA label syncer would not care what the intent is, but what the PodSecurityReadinessController signals in the status.

In the Upgradeable=false scenario, this shouldn't be the case, as the cluster wouldn't upgrade if the spec doesn't reflect the doable state; and in the auto-legacy-mode-switching scenario, the "status" would be the spec and vice versa...

So I think you are correct.

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 17, 2025
@openshift-bot

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Apr 25, 2025
Contributor

openshift-ci bot commented Apr 25, 2025

@openshift-bot: Closed this PR.

In response to this:

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ibihim
Contributor Author

ibihim commented May 5, 2025

/reopen
remove-lifecycle rotten

@openshift-ci openshift-ci bot reopened this May 5, 2025
Contributor

openshift-ci bot commented May 5, 2025

@ibihim: Reopened this PR.

In response to this:

/reopen
remove-lifecycle rotten

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ibihim
Contributor Author

ibihim commented May 5, 2025

/remove-lifecycle rotten

@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label May 5, 2025
@ibihim ibihim force-pushed the ibihim/psa-enforcement branch 6 times, most recently from c935c2f to 9421702 Compare May 9, 2025 17:11
@ibihim ibihim force-pushed the ibihim/psa-enforcement branch from 9421702 to f3e0563 Compare May 9, 2025 17:17

## Open Questions

## Rerun the evaluation before enforcing
Contributor Author

It sounds reasonable to add alerts in release n and run the actual final evaluation right before potentially enforcing, as there could be a gap of months between release n and release n+1.

Contributor

openshift-ci bot commented May 12, 2025

@ibihim: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/markdownlint
Commit: 7aad97c
Required: true
Rerun command: /test markdownlint

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

// PSAEnforcementConfig is a config that supports the user in the PSA enforcement transition.
// The spec struct enables a user to stop the PSA enforcement, if necessary.
// The status struct supports the user in identifying obstacles in PSA enforcement.
type PSAEnforcementConfig struct {
Contributor

My understanding was to create a top level LegacyConfig that could be used again in the future for other similar scenarios. I wasn't party to any conversation where that was reversed at least

// PSAEnforcementConfigSpec is a configuration option that enables the customer to dictate the PSA enforcement outcome.
type PSAEnforcementConfigSpec struct {
// enforcementMode gives the user different options:
// - "" will be interpreted as "Restricted".
Contributor

In release N, this API doesn't actually do anything. We would want to reword this to be more like

Valid values are Restricted, Legacy and omitted.
Omitted means no opinion and the platform is left to choose a reasonable default, which is subject to change over time.
The current default is Legacy.

Then in the release where we enforce restricted, we would update that doc

// taken place, which is particularly important during upgrades to ensure the
// controller had an opportunity to check for violations before enforcing PSA.
// +optional
LastEvaluationTime metav1.Time `json:"lastEvaluationTime,omitempty"`
Contributor

This will tick, and cause regular updates. Do we need to keep a timestamp? Or can we use a condition like LastTransitionTime kind of thing?

Perhaps we want to track the last time we saw a new violating namespace?
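
A sketch of the condition-style alternative being suggested here, where the timestamp only moves when the observed state actually changes (the condition type, reason, and message are made up for illustration):

status:
  conditions:
  - type: ViolationsDetected
    status: "True"
    reason: ViolatingNamespacesFound
    message: "2 namespaces would violate the restricted profile"
    lastTransitionTime: "2025-05-12T10:00:00Z"   # only changes when the status flips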

If a user encounters `status.violatingNamespaces` it is expected to:

- resolve the violations in the Namespaces to be able to safely move to `Restricted` or
- set the `spec.enforcementMode=Legacy` and solve the violating Namespaces later.
Contributor

We aren't doing this automatically for them?


## Rerun the evaluation before enforcing

It could be better to run the evaluation before enforcing `Restricted` mode as the last run can be a couple of months old.
Contributor

Why would it be a couple of months old?

Comment on lines +381 to +384
- Release `n-1`:
- Backport the API.
- Release `n`:
- Enable the `PodSecurityReadinessController` to use the API by setting its `status` and, if violating Namespaces are found, also `spec.enforcementMode = Legacy`.
Contributor

Do these need to be separate? What's the benefit of having the API with no consumer in the previous release?

@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 14, 2025
@openshift-bot

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 21, 2025
@openshift-bot

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Jun 29, 2025
Contributor

openshift-ci bot commented Jun 29, 2025

@openshift-bot: Closed this PR.

In response to this:

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.
8 participants