docs: add RFC about CRD naming and policy lifecycle #45

Merged
dottorblaster merged 4 commits into main from crd-lifecycle-rfc on Nov 26, 2025

Conversation

@dottorblaster

Member

What this PR does / why we need it:
This turns the conversation about revisiting the CRDs into an RFC that we are going to implement progressively :-)

Which issue(s) this PR fixes
Issue #34

@dottorblaster dottorblaster changed the title from "doc: add RFC about CRD naming and policy lifecycle" to "docs: add RFC about CRD naming and policy lifecycle" on Nov 20, 2025

@Andreagit97 Andreagit97 left a comment

Collaborator

thank you for this!

Comment thread docs/rfc/0004-crds-policy-lifecycle.md Outdated

Changes from the previous version:
- The WorkloadSecurityPolicy was renamed into WorkloadPolicy
- The WorkloadSecurityPolicyTuning was deleted and replaced by the status in the WorkloadPolicy resource.

Collaborator

Probably we can remove this one since it is no longer true

Member Author

Whoopsie

Comment thread docs/rfc/0004-crds-policy-lifecycle.md Outdated
apiVersion: security.rancher.io/v1alpha1
kind: WorkloadPolicyProposal
metadata:
  name: workloadpolicyproposal-sample

Collaborator

I would highlight what the name should look like.

Suggested change
- name: workloadpolicyproposal-sample
+ name: deploy-pgsql-8646457455 # <workload_type>-<workload_name>

Comment thread docs/rfc/0004-crds-policy-lifecycle.md Outdated
apiVersion: security.rancher.io/v1alpha1
kind: WorkloadPolicy
metadata:
  name: postgres-policy

Collaborator

Suggested change
- name: postgres-policy
+ name: deploy-pgsql-8646457455 # <workload_type>-<workload_name>

Comment thread docs/rfc/0004-crds-policy-lifecycle.md Outdated
Comment on lines +141 to +142
A workload is protected by a WorkloadPolicy through a podSelector, like in the current approach.
As proposed in the previous version, we suggest the usage of a unique label security.rancher.io/policy, but we don’t enforce it by default since putting it in the spec.template would cause a rollout.

Collaborator

Suggested change
- A workload is protected by a WorkloadPolicy through a podSelector, like in the current approach.
- As proposed in the previous version, we suggest the usage of a unique label security.rancher.io/policy, but we don’t enforce it by default since putting it in the spec.template would cause a rollout.
+ A workload is protected by a WorkloadPolicy through a podSelector. We suggest the usage of a unique label security.rancher.io/policy, but we don’t enforce it by default since putting it in the spec.template would cause a rollout.

Collaborator

I would avoid referring to a previous version that exists only in the Google Doc.

Member Author

You're right, my bad

Comment thread docs/rfc/0004-crds-policy-lifecycle.md Outdated
A workload is protected by a WorkloadPolicy through a podSelector, like in the current approach.
As proposed in the previous version, we suggest the usage of a unique label security.rancher.io/policy, but we don’t enforce it by default since putting it in the spec.template would cause a rollout.

So the difference with the previous version is that we simply leave users to choose their preferred approach. Having a dedicated label is still suggested, but not compulsory.

Collaborator

We can drop this sentence for the same reason: avoid referring to a previous version that exists only in the Google Doc.

Suggested change
- So the difference with the previous version is that we simply leave users to choose their preferred approach. Having a dedicated label is still suggested, but not compulsory.

Comment thread docs/rfc/0004-crds-policy-lifecycle.md Outdated
- Basic user -> use default k8s workload selectors -> everything works out of the box, no rollout required.
- Advanced user (real production scenario) -> enforce a unique label on workloads and use this label as a selector -> a rollout could be required if the workload was initially created without the label

Now that the label is no longer compulsory, we cannot rely on it to understand if a workload is covered or not; we should fall back to a kubectl plugin that scrapes the resources and helps the user to understand the situation (potential conflict, partial workload coverage,...).

Collaborator

Suggested change
- Now that the label is no longer compulsory, we cannot rely on it to understand if a workload is covered or not; we should fall back to a kubectl plugin that scrapes the resources and helps the user to understand the situation (potential conflict, partial workload coverage,...).
+ Since the label is not compulsory, we cannot rely on it to understand if a workload is covered or not; we should use a kubectl plugin that scrapes the resources and helps the user to understand the situation (potential conflict, partial workload coverage,...).
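
To make the "potential conflict" and coverage checks mentioned in this thread more concrete, here is a minimal Python sketch of the analysis such a kubectl plugin might run. It assumes policies select pods through `matchLabels`-style selectors, as the draft text at this point describes; the function names and the sample data are hypothetical and are not part of the RFC or of any existing plugin.

```python
from typing import Dict, List, Tuple

Labels = Dict[str, str]


def selector_matches(match_labels: Labels, pod_labels: Labels) -> bool:
    """Return True when every key/value pair of the selector is set on the pod."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


def coverage_report(pods: Dict[str, Labels], policies: Dict[str, Labels]) -> Dict[str, List[str]]:
    """Map each pod name to the sorted names of the policies whose selector matches it."""
    return {
        pod_name: sorted(
            policy_name
            for policy_name, selector in policies.items()
            if selector_matches(selector, pod_labels)
        )
        for pod_name, pod_labels in pods.items()
    }


def classify(report: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Split the report into uncovered pods and pods selected by more than one policy."""
    uncovered = sorted(pod for pod, matched in report.items() if not matched)
    conflicts = sorted(pod for pod, matched in report.items() if len(matched) > 1)
    return uncovered, conflicts
```

A pod that appears in the second list is selected by several policies at once, which is the conflict case a user would need help spotting.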

Comment thread docs/rfc/0004-crds-policy-lifecycle.md Outdated

Most of the time, a Redis/Tomcat/NodeJS container image is always going to behave in the same way. There could be some exceptions, we must take that scenario into account.

SUSE is already distributing maintained container images through AppCo. It would make sense to tie our profiles to the container images, rather than thinking about the concept of “workload”.

Collaborator

This repo will be open source, so I'm not sure we want these details here

@dottorblaster dottorblaster Nov 21, 2025

Member Author

Write drunk, edit sober.

  • E. Hemingway

Comment thread docs/rfc/0004-crds-policy-lifecycle.md Outdated
Comment on lines +196 to +197
otel-collector:
  imagePolicyRef: otel-collector # name of the ImagePolicy

Collaborator

If I recall correctly, we ended up with something like this to allow us to use different profiles for different rules. Today we just have executables, but tomorrow who knows

Suggested change
- otel-collector:
-   imagePolicyRef: otel-collector # name of the ImagePolicy
+ otel-collector:
+   rules:
+     executables:
+       imagePolicyRef: otel-collector

Member Author

Yes, this way we can inject different rulesets for different cases. Files will be implemented over my dead body, but still worth thinking about them
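
As an illustration of why nesting the reference under a rule type helps, the following Python sketch looks up the `imagePolicyRef` for a given container and rule type. It assumes the nested shape from the suggested change above (`<container>` -> `rules` -> `<rule type>` -> `imagePolicyRef`); the helper name `image_policy_ref` and the `files` rule type are hypothetical.

```python
from typing import Dict, Optional


def image_policy_ref(
    rules_by_container: Dict[str, dict],
    container: str,
    rule_type: str = "executables",
) -> Optional[str]:
    """Return the ImagePolicy name referenced for one rule type of one container.

    Returns None when the container, the rule type, or the reference is absent.
    """
    container_entry = rules_by_container.get(container, {})
    rule_entry = container_entry.get("rules", {}).get(rule_type, {})
    return rule_entry.get("imagePolicyRef")


# Hypothetical data mirroring the suggested YAML.
spec = {
    "otel-collector": {
        "rules": {
            "executables": {"imagePolicyRef": "otel-collector"},
        },
    },
}
```

With this layout, a future rule type could point at a different ImagePolicy without changing how the `executables` entry is resolved.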

@holyspectral holyspectral left a comment

Collaborator

I think it looks good to me! Some comments.

Comment thread docs/rfc/0004-crds-policy-lifecycle.md Outdated

- The WorkloadPolicyProposal has an ownerReference that ties it back to the workload resource for which the behaviour was observed.
- When the observed workload is deleted, the associated WorkloadPolicyProposal is deleted as well.
- When we switch from a proposal to a real policy we delete the proposal and don’t recreate it again

Collaborator

An edge case: if we delete the real policy, should we recreate the policy proposal?

Collaborator

IMO, if we delete the real policy, we don't need to recreate the policy proposal explicitly.
The reason is that in monitor mode, if the policy is deleted, our controller will create a new proposal when it observes workload behavior. In protect mode, if the policy is deleted, the workload is no longer protected, so new behavior can be observed. Our controller will create a new proposal when it detects activity there as well.

Member Author

Yes, the policy proposal will be automatically recreated once the policy is deleted and we restart the learning phase. Do you want me to specify that?

Member

I think it's worth specifying, but in the section about the WorkloadPolicy; here we're talking about the WorkloadPolicyProposal.
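
The behaviour agreed on in this thread can be summarised as a small decision rule. The Python sketch below is only an illustration under the assumptions stated in the comments above (a proposal is never kept alongside a real policy, and a new one appears once behaviour is observed without a policy); the function name and its boolean inputs are hypothetical and do not describe the controller's actual code.

```python
def should_create_proposal(
    policy_exists: bool,
    proposal_exists: bool,
    behaviour_observed: bool,
) -> bool:
    """Decide whether a new WorkloadPolicyProposal would be created.

    A proposal is created only when workload behaviour is observed and the
    workload has neither a real policy nor an existing proposal.
    """
    return behaviour_observed and not policy_exists and not proposal_exists
```

Deleting the policy flips `policy_exists` to False, so the next observed activity yields a fresh proposal without any dedicated "recreate" step.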

Comment thread docs/rfc/0004-crds-policy-lifecycle.md Outdated

Most of the time, a Redis/Tomcat/NodeJS container image is always going to behave in the same way. There could be some exceptions, we must take that scenario into account.

Vendors alreadu distribute maintained container images through their platforms. It would make sense to tie our profiles to the container images, rather than thinking about the concept of “workload”.

Collaborator

Suggested change
- Vendors alreadu distribute maintained container images through their platforms. It would make sense to tie our profiles to the container images, rather than thinking about the concept of “workload”.
+ Vendors already distribute maintained container images through their platforms. It would make sense to tie our profiles to the container images, rather than thinking about the concept of “workload”.


@kyledong-suse kyledong-suse left a comment

Collaborator

Thank you so much for working on this RFC!
Generally LGTM. Just a couple of minor comments.


@flavio flavio left a comment

Member

Overall LGTM, I left some comments and 👍

There are some sections that are missing, compared to the initial draft document we had:

  • Transitioning from Learn to Monitor mode
  • Transitioning from Monitor to Protect mode
  • Transitioning from Protect to Monitor mode
  • Transitioning from Protect to Learn mode - this is not in the document, and is actually something we wanted to add, but in a different place of the document. I think it would make sense to promote that to a h<x> section
  • Removing a WorkloadPolicy
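
The mode transitions named in the list above can be written down as a small transition table. This Python sketch encodes only those four transitions; whether any other transition (for example Monitor to Learn) is allowed is not stated in this conversation, so the table is an assumption and the names are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    LEARN = "learn"
    MONITOR = "monitor"
    PROTECT = "protect"


# Only the transitions explicitly listed in the review comment.
LISTED_TRANSITIONS = {
    (Mode.LEARN, Mode.MONITOR),
    (Mode.MONITOR, Mode.PROTECT),
    (Mode.PROTECT, Mode.MONITOR),
    (Mode.PROTECT, Mode.LEARN),
}


def is_listed_transition(source: Mode, target: Mode) -> bool:
    """Return True when the source -> target transition is one of the listed ones."""
    return (source, target) in LISTED_TRANSITIONS
```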

I think it would be worth having a section explaining how this has no impact on how we plan to integrate with Tetragon (the Tetragon Integration section of the original doc).

Comment thread docs/rfc/0004-crds-policy-lifecycle.md Outdated
apiVersion: security.rancher.io/v1alpha1
kind: WorkloadPolicyProposal
metadata:
  name: deploy-pgsql-8646457455 # <workload_type>-<workload_name>

Member

nit: a deployment name does not usually include numbers like these; this one could be mistaken for the random name given to the underlying pods.

I've also changed the resource type to be a StatefulSet, which is a more realistic way to deploy a db.

Suggested change
- name: deploy-pgsql-8646457455 # <workload_type>-<workload_name>
+ metadata:
+   name: statefulsets-pgsql # <workload_type>-<workload_name>
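
The `<workload_type>-<workload_name>` convention from the suggestion above can be sketched as a tiny helper. This is only an illustration: the function name is hypothetical, and the exact spelling of the workload type prefix (`deploy`, `statefulsets`, ...) is whatever the RFC settles on. The length check reflects the 253-character limit Kubernetes applies to DNS-subdomain resource names.

```python
MAX_NAME_LENGTH = 253  # Kubernetes limit for DNS-subdomain resource names


def proposal_name(workload_type: str, workload_name: str) -> str:
    """Build a proposal name following <workload_type>-<workload_name>."""
    name = f"{workload_type}-{workload_name}".lower()
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError(f"name is {len(name)} characters, limit is {MAX_NAME_LENGTH}")
    return name
```

For the example in this thread, `proposal_name("statefulsets", "pgsql")` gives `statefulsets-pgsql`.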

Comment thread docs/rfc/0004-crds-policy-lifecycle.md Outdated
Comment on lines +62 to +64
- apiVersion: apps/v1
  kind: Deployment
  name: pgsql-8646457455

Member

Suggested change
- - apiVersion: apps/v1
-   kind: Deployment
-   name: pgsql-8646457455
+ - apiVersion: v1
+   kind: StatefulSet
+   name: pgsql

Comment thread docs/rfc/0004-crds-policy-lifecycle.md Outdated

Notes on the behavior:

- The WorkloadPolicyProposal has an ownerReference that ties it back to the workload resource for which the behaviour was observed.

Member

Suggested change
- - The WorkloadPolicyProposal has an ownerReference that ties it back to the workload resource for which the behaviour was observed.
+ - The WorkloadPolicyProposal has an `ownerReference` that ties it back to the workload resource for which the behaviour was observed.
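
To show what the `ownerReference` link implies for cleanup, here is a minimal Python sketch that builds an owner reference and finds the proposals that point at a deleted workload. The field names (`apiVersion`, `kind`, `name`, `uid`) are the standard Kubernetes owner reference fields; the helper names, the sample UID, and the simplified "delete when the owner is gone" rule are assumptions made for illustration, and real Kubernetes garbage collection handles this on the API server side.

```python
from typing import Dict, List


def owner_reference(api_version: str, kind: str, name: str, uid: str) -> Dict[str, str]:
    """Build the owner reference stored under metadata.ownerReferences."""
    return {"apiVersion": api_version, "kind": kind, "name": name, "uid": uid}


def proposals_owned_by(proposals: List[dict], owner_uid: str) -> List[str]:
    """Return the names of the proposals that reference the given owner UID."""
    return [
        proposal["metadata"]["name"]
        for proposal in proposals
        if any(
            ref["uid"] == owner_uid
            for ref in proposal["metadata"].get("ownerReferences", [])
        )
    ]


# Hypothetical objects; the UID is a placeholder, not a real value.
pgsql_owner = owner_reference("apps/v1", "StatefulSet", "pgsql", "uid-1234")
proposals = [
    {"metadata": {"name": "statefulsets-pgsql", "ownerReferences": [pgsql_owner]}},
    {"metadata": {"name": "deploy-web"}},
]
```

When the `pgsql` workload is deleted, `proposals_owned_by(proposals, "uid-1234")` lists the proposals that would go away with it.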

Comment thread docs/rfc/0004-crds-policy-lifecycle.md Outdated

Changes compared to the current implementation:

- The rules section has been replaced by rulesByContainer. This new field holds a map with the name of the containers as key, and the list of the container rules as value.

Member

- The rules section has been replaced by `rulesByContainer`. This new field holds a map with the name of the containers as key, and the list of the container rules as value.
- The `WorkloadPolicy` does not have the label selector field to identify the pods to protect.
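
A short Python sketch may help picture the `rulesByContainer` map described here: container names are keys and each value carries that container's rules. The inner shape used below (an `executables` list of allowed paths) and the choice to treat a container without an entry as having nothing allowed are assumptions for illustration only, not the RFC's schema.

```python
from typing import Dict, List


def is_executable_allowed(
    rules_by_container: Dict[str, Dict[str, List[str]]],
    container: str,
    executable: str,
) -> bool:
    """Check an executable path against the rules of a single container.

    Assumption: a container with no entry in the map has no allowed executables.
    """
    container_rules = rules_by_container.get(container)
    if container_rules is None:
        return False
    return executable in container_rules.get("executables", [])


# Hypothetical policy content keyed by container name.
rules_by_container = {
    "otel-collector": {"executables": ["/usr/bin/otel-collector"]},
}
```

Keying by container name means two containers in the same Pod can carry different allow-lists without interfering with each other.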

- /usr/bin/otel-collector
```

Changes compared to the current implementation:

Member

I think we also dropped the label selector. I think we don't need it anymore, do we? If that's the case, please mention it.

Member Author

It is mentioned in the WorkloadPolicy section 👍

Collaborator

Maybe I'm missing something, but I was under the impression we wanted to keep the podSelector. There is also a section in this document stating

- Basic user -> use default k8s workload selectors -> everything works out of the box, no rollout required.
- Advanced user (real production scenario) -> enforce a unique label on workloads and use this label as a selector -> a rollout could be required if the workload was initially created without the label

Member Author

During some conversations we opted to make the label mandatory for a first iteration.

Member

Yes, that's the case. We decided to require the user to enter the "special" label to bind a policy to a Pod.

As for the WorkloadPolicyProposal, the example above is not showing the selector anymore, which I think is correct. We do not need that selector to be able to associate a Pod seen by our agent to the workload it belongs to.

If that's the case (we technically don't need that selector), can you add a line pointing out that the selector has been dropped from the CRD?

Comment thread docs/rfc/0004-crds-policy-lifecycle.md Outdated
- In case of workload rollout, the WorkloadPolicy remains unchanged. If it causes issues with the rollout, the user is in charge of rolling back to the previous version or destroying the policy

## Binding a WorkloadPolicy
A workload is protected by a WorkloadPolicy through a podSelector. We suggest the usage of a unique label security.rancher.io/policy, but we don’t enforce it by default since putting it in the spec.template would cause a rollout.

Member

I think this section is wrong: it describes the previous proposal, which we then revisited during the call.

There's no podSelector inside of the WorkloadPolicy. The binding is done by adding the <special label>: <policy name> to the Pod definition.
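
The label-based binding described in this comment can be sketched in a few lines of Python. The comment only says `<special label>`; the key `security.rancher.io/policy` used below is taken from the draft text quoted earlier in this conversation and is an assumption here, as are the helper names and the sample pods.

```python
from typing import Dict, List, Optional

# Assumed label key, taken from the draft RFC text quoted in this thread.
POLICY_LABEL = "security.rancher.io/policy"


def bound_policy(pod_labels: Dict[str, str]) -> Optional[str]:
    """Return the name of the policy a Pod is bound to, or None if it is unbound."""
    return pod_labels.get(POLICY_LABEL)


def pods_bound_to(policy_name: str, pods: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the sorted names of the Pods whose label points at the given policy."""
    return sorted(
        pod_name
        for pod_name, labels in pods.items()
        if bound_policy(labels) == policy_name
    )
```

Because the Pod names its policy directly, a Pod can be bound to at most one policy, which avoids the overlapping-selector case altogether.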


@flavio flavio left a comment

Member

LGTM, thanks for all the changes you've applied

@flavio flavio linked an issue Nov 26, 2025 that may be closed by this pull request
3 tasks
@dottorblaster dottorblaster merged commit e1b913d into main Nov 26, 2025
10 of 11 checks passed
@dottorblaster dottorblaster deleted the crd-lifecycle-rfc branch November 26, 2025 09:43

Development

Successfully merging this pull request may close these issues.

Revisit current CRDs and user workflow

5 participants