docs: add RFC about CRD naming and policy lifecycle by dottorblaster · Pull Request #45 · rancher-sandbox/runtime-enforcer

dottorblaster · 2025-11-20T13:44:05Z

What this PR does / why we need it:
This PR turns the conversation about revisiting the CRDs into an RFC that we are going to implement progressively :-)

Which issue(s) this PR fixes
Issue #34

Andreagit97

thank you for this!

Andreagit97 · 2025-11-21T13:42:40Z

+
+Changes from the previous version:
+- The WorkloadSecurityPolicy was renamed into WorkloadPolicy
+- The WorkloadSecurityPolicyTuning was deleted and replaced by the status in the WorkloadPolicy resource.


Probably we can remove this one since it is no longer true

Andreagit97 · 2025-11-21T13:45:33Z

+apiVersion: security.rancher.io/v1alpha1
+kind: WorkloadPolicyProposal
+metadata:
+  name: workloadpolicyproposal-sample


I would highlight what the name should look like

Suggested change

-  name: workloadpolicyproposal-sample
+  name: deploy-pgsql-8646457455 # <workload_type>-<workload_name>

Andreagit97 · 2025-11-21T13:46:19Z

+apiVersion: security.rancher.io/v1alpha1
+kind: WorkloadPolicy
+metadata:
+  name: postgres-policy


Suggested change

-  name: postgres-policy
+  name: deploy-pgsql-8646457455 # <workload_type>-<workload_name>

Andreagit97 · 2025-11-21T13:47:50Z

+A workload is protected by a WorkloadPolicy through a podSelector, like in the current approach.
+As proposed in the previous version, we suggest the usage of a unique label security.rancher.io/policy, but we don’t enforce it by default since putting it in the spec.template would cause a rollout. 


Suggested change

-A workload is protected by a WorkloadPolicy through a podSelector, like in the current approach.
-As proposed in the previous version, we suggest the usage of a unique label security.rancher.io/policy, but we don’t enforce it by default since putting it in the spec.template would cause a rollout.
+A workload is protected by a WorkloadPolicy through a podSelector. We suggest the usage of a unique label security.rancher.io/policy, but we don’t enforce it by default since putting it in the spec.template would cause a rollout.

I would avoid referring to a previous version that exists only in the Google Doc

You're right, my bad

Andreagit97 · 2025-11-21T13:50:15Z

+A workload is protected by a WorkloadPolicy through a podSelector, like in the current approach.
+As proposed in the previous version, we suggest the usage of a unique label security.rancher.io/policy, but we don’t enforce it by default since putting it in the spec.template would cause a rollout. 
+
+So the difference with the previous version is that we simply leave users to choose their preferred approach. Having a dedicated label is still suggested, but not compulsory.


We can drop this sentence for the same reason: avoid referring to a previous version that exists only in the Google Doc.

Suggested change

-So the difference with the previous version is that we simply leave users to choose their preferred approach. Having a dedicated label is still suggested, but not compulsory.

Andreagit97 · 2025-11-21T13:50:57Z

+- Basic user -> use default k8s workload selectors -> everything works out of the box, no rollout required.
+- Advanced user (real production scenario) -> enforce a unique label on workloads and use this label as a selector -> a rollout could be required if the workload was initially created without the label
+
+Now that the label is no longer compulsory, we cannot rely on it to understand if a workload is covered or not; we should fall back to a kubectl plugin that scrapes the resources and helps the user to understand the situation (potential conflict, partial workload coverage,...). 


Suggested change

-Now that the label is no longer compulsory, we cannot rely on it to understand if a workload is covered or not; we should fall back to a kubectl plugin that scrapes the resources and helps the user to understand the situation (potential conflict, partial workload coverage,...).
+Since the label is not compulsory, we cannot rely on it to understand if a workload is covered or not; we should use a kubectl plugin that scrapes the resources and helps the user to understand the situation (potential conflict, partial workload coverage,...).

Andreagit97 · 2025-11-21T13:52:06Z

+
+Most of the time, a Redis/Tomcat/NodeJS container image is always going to behave in the same way. There could be some exceptions, we must take that scenario into account.
+
+SUSE is already distributing maintained container images through AppCo. It would make sense to tie our profiles to the container images, rather than thinking about the concept of “workload”.


This repo will be open source, so I'm not sure we want these details here

Write drunk, edit sober.

E. Hemingway

Andreagit97 · 2025-11-21T13:55:17Z

+    otel-collector:
+      imagePolicyRef: otel-collector # name of the ImagePolicy


If I recall correctly, we ended up with something like this to allow us to use different profiles for different rules. Today we just have executables, but tomorrow who knows

Suggested change

-    otel-collector:
-      imagePolicyRef: otel-collector # name of the ImagePolicy
+    otel-collector:
+      rules:
+        executables:
+          imagePolicyRef: otel-collector

Yes, this way we can inject different rulesets for different cases. Files will be implemented over my dead body, but still worth thinking about them

holyspectral

I think it looks good to me! Some comments.

holyspectral · 2025-11-21T17:23:50Z

+
+- The WorkloadPolicyProposal has an ownerReference that ties it back to the workload resource for which the behaviour was observed.
+- When the observed workload is deleted, the associated WorkloadPolicyProposal is deleted as well.
+- When we switch from a proposal to a real policy we delete the proposal and don’t recreate it again


An edge case: if we delete the real policy, should we recreate the policy proposal?

IMO, if we delete the real policy, I don't think we need to recreate the policy proposal.
The reason is that in monitor mode, if the policy is deleted, our controller will create a new proposal when it observes workload behavior. In protect mode, if the policy is deleted, the workload is no longer protected, so new behavior can be observed. Our controller will create a new proposal when it detects activity as well.

Yes, the policy proposal will be automatically recreated once the policy is deleted and we restart the learning phase. Do you want me to specify that?

I think it's useful to specify that, but in the part that talks about the WorkloadPolicy; here we're talking about the WorkloadPolicyProposal

holyspectral · 2025-11-21T17:28:29Z

+
+Most of the time, a Redis/Tomcat/NodeJS container image is always going to behave in the same way. There could be some exceptions, we must take that scenario into account.
+
+Vendors alreadu distribute maintained container images through their platforms. It would make sense to tie our profiles to the container images, rather than thinking about the concept of “workload”.


Suggested change

-Vendors alreadu distribute maintained container images through their platforms. It would make sense to tie our profiles to the container images, rather than thinking about the concept of “workload”.
+Vendors already distribute maintained container images through their platforms. It would make sense to tie our profiles to the container images, rather than thinking about the concept of “workload”.

kyledong-suse

Thank you so much for working on this RFC!
Generally LGTM. Just a couple of minor comments.

flavio

Overall LGTM, I left some comments and 👍

There are some sections that are missing, compared to the initial draft document we had:

- Transitioning from Learn to Monitor mode
- Transitioning from Monitor to Protect mode
- Transitioning from Protect to Monitor mode
- Transitioning from Protect to Learn mode: this is not in the document, and is actually something we wanted to add, but in a different place of the document. I think it would make sense to promote it to an h<x> section
- Removing a WorkloadPolicy

I think it would be worth having a section explaining that this has no impact on how we plan to integrate with Tetragon (the Tetragon Integration section of the original doc).

flavio · 2025-11-25T08:40:14Z

+apiVersion: security.rancher.io/v1alpha1
+kind: WorkloadPolicyProposal
+metadata:
+  name: deploy-pgsql-8646457455 # <workload_type>-<workload_name>


nit: the name of the deployment should not include numbers, since they might look like the random suffix associated with the underlying pods.

I've also changed the resource type to be a StatefulSet, which is a more realistic way to deploy a db.

Suggested change

-  name: deploy-pgsql-8646457455 # <workload_type>-<workload_name>
+metadata:
+  name: statefulsets-pgsql # <workload_type>-<workload_name>

flavio · 2025-11-25T08:40:49Z

+  - apiVersion: apps/v1
+    kind: Deployment
+    name: pgsql-8646457455


Suggested change

-  - apiVersion: apps/v1
-    kind: Deployment
-    name: pgsql-8646457455
+  - apiVersion: apps/v1
+    kind: StatefulSet
+    name: pgsql

flavio · 2025-11-25T08:41:22Z

+
+Notes on the behavior:
+
+- The WorkloadPolicyProposal has an ownerReference that ties it back to the workload resource for which the behaviour was observed.


Suggested change

-- The WorkloadPolicyProposal has an ownerReference that ties it back to the workload resource for which the behaviour was observed.
+- The WorkloadPolicyProposal has an `ownerReference` that ties it back to the workload resource for which the behaviour was observed.
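
For illustration, a minimal sketch of how such an owner reference could look on the proposal, assuming the StatefulSet example used elsewhere in this review. The `uid` value is a placeholder (a real one is assigned by the API server), and nothing here beyond `apiVersion`, `kind`, and the naming scheme is taken from the RFC itself.

```yaml
apiVersion: security.rancher.io/v1alpha1
kind: WorkloadPolicyProposal
metadata:
  name: statefulsets-pgsql # <workload_type>-<workload_name>
  ownerReferences:
    # The observed workload owns the proposal, so Kubernetes garbage
    # collection removes the proposal when the workload is deleted.
    - apiVersion: apps/v1
      kind: StatefulSet
      name: pgsql
      uid: 00000000-0000-0000-0000-000000000000 # placeholder, set from the live object
```

Because the owner is a namespaced resource, the proposal would have to live in the same namespace as the workload for this to work.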

flavio · 2025-11-25T08:44:56Z

+
+Changes compared to the current implementation:
+
+- The rules section has been replaced by rulesByContainer. This new field holds a map with the name of the containers as key, and the list of the container rules as value.


- The rules section has been replaced by `rulesByContainer`. This new field holds a map with the name of the containers as key, and the list of the container rules as value.
- The `WorkloadPolicy` does not have the label selector field to identify the pods to protect.
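
As a rough sketch of the shape described above: a `WorkloadPolicy` keyed by container name, with no selector in its spec. The `pgsql` container name, its executable path, and the inner `executables.allowed` layout are assumptions made for illustration; only `rulesByContainer`, the `otel-collector` container, and its executable path appear in the reviewed diff.

```yaml
apiVersion: security.rancher.io/v1alpha1
kind: WorkloadPolicy
metadata:
  name: statefulsets-pgsql
spec:
  # No selector here: the policy does not pick pods by itself.
  rulesByContainer:
    pgsql: # hypothetical container name
      executables:
        allowed: # assumed field layout
          - /usr/bin/postgres # hypothetical path
    otel-collector:
      executables:
        allowed: # assumed field layout
          - /usr/bin/otel-collector
```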

flavio · 2025-11-25T08:45:30Z

+           - /usr/bin/otel-collector
+```
+
+Changes compared to the current implementation:


I think we also dropped the label selector. We don't need it anymore, do we? If that's the case, please mention it

It is mentioned in the WorkloadPolicy section 👍

Maybe I'm missing something, but I was under the impression we wanted to keep the podSelector. There is also a section in this document stating

- Basic user -> use default k8s workload selectors -> everything works out of the box, no rollout required.
- Advanced user (real production scenario) -> enforce a unique label on workloads and use this label as a selector -> a rollout could be required if the workload was initially created without the label

During some conversations we opted to make the label mandatory for a first iteration.

Yes, that's the case. We decided to require the user to enter the "special" label to bind a policy to a Pod.

As for the WorkloadPolicyProposal, the example above is not showing the selector anymore, which I think is correct. We do not need that selector to be able to associate a Pod seen by our agent to the workload it belongs to.

If that's the case (we technically don't need that selector), can you add a line pointing out that the selector has been dropped from the CRD?

flavio · 2025-11-25T08:48:49Z

+- In case of workload rollout, the WorkloadPolicy remains unchanged. If it causes issues with the rollout, the user is in charge of rolling back to the previous version or destroying the policy
+
+## Binding a WorkloadPolicy
+A workload is protected by a WorkloadPolicy through a podSelector. We suggest the usage of a unique label security.rancher.io/policy, but we don’t enforce it by default since putting it in the spec.template would cause a rollout.


I think this section is wrong: it describes the previous proposal, which we then revisited during the call.

There's no podSelector inside of the WorkloadPolicy. The binding is done by adding the <special label>: <policy name> to the Pod definition.
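
A minimal sketch of that binding, assuming the special label is the `security.rancher.io/policy` key mentioned earlier in this review and that the policy is named `statefulsets-pgsql`. The `app` label, the service name, and the image are illustrative placeholders.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: pgsql
spec:
  serviceName: pgsql # placeholder
  selector:
    matchLabels:
      app: pgsql # placeholder workload label
  template:
    metadata:
      labels:
        app: pgsql
        # <special label>: <policy name>, assumed key from this thread
        security.rancher.io/policy: statefulsets-pgsql
    spec:
      containers:
        - name: pgsql
          image: postgres:17 # placeholder image
```

Adding the label under `spec.template` on an existing workload changes the pod template, which triggers the rollout discussed above.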

flavio

LGTM, thanks for all the changes you've applied

dottorblaster force-pushed the crd-lifecycle-rfc branch from b7ab12e to c0d46bd Compare November 20, 2025 16:40

dottorblaster changed the title ~~doc: add RFC about CRD naming and policy lifecycle~~ docs: add RFC about CRD naming and policy lifecycle Nov 20, 2025

docs: add RFC about CRD naming and policy lifecycle

8f690cd

dottorblaster force-pushed the crd-lifecycle-rfc branch from c0d46bd to 8f690cd Compare November 20, 2025 16:59

dottorblaster marked this pull request as ready for review November 20, 2025 17:13

dottorblaster mentioned this pull request Nov 20, 2025

Revisit current CRDs and user workflow #34

Closed

3 tasks

dottorblaster requested review from Andreagit97, flavio, holyspectral and kyledong-suse November 20, 2025 17:16

Andreagit97 reviewed Nov 21, 2025


fixup! docs: add RFC about CRD naming and policy lifecycle

216dd12

holyspectral reviewed Nov 21, 2025


kyledong-suse reviewed Nov 21, 2025


Comment thread docs/rfc/0004-crds-policy-lifecycle.md

fixup! docs: add RFC about CRD naming and policy lifecycle

66030c2

This was referenced Nov 24, 2025

[EPIC] WorkloadPolicyProposal rework #49

Closed

[EPIC] WorkloadPolicy rework #50

Closed

flavio requested changes Nov 25, 2025


dottorblaster mentioned this pull request Nov 25, 2025

Remove ClusterWorkloadSecurityPolicy from the codebase #57

Closed

holyspectral reviewed Nov 25, 2025


fixup! docs: add RFC about CRD naming and policy lifecycle

5b4c2f5

dottorblaster requested review from Andreagit97 and flavio November 25, 2025 16:37

flavio approved these changes Nov 26, 2025


flavio linked an issue Nov 26, 2025 that may be closed by this pull request

Revisit current CRDs and user workflow #34

Closed

3 tasks

dottorblaster merged commit e1b913d into main Nov 26, 2025
10 of 11 checks passed

dottorblaster deleted the crd-lifecycle-rfc branch November 26, 2025 09:43

kyledong-suse mentioned this pull request Dec 4, 2025

chore: remove obsolete ClusterWorkloadSecurityPolicy from the codebase #65

Merged

4 tasks

