docs: add RFC about CRD naming and policy lifecycle#45
Conversation
b7ab12e to
c0d46bd
Compare
c0d46bd to
8f690cd
Compare
Andreagit97
left a comment
There was a problem hiding this comment.
thank you for this!
|
|
||
| Changes from the previous version: | ||
| - The WorkloadSecurityPolicy was renamed into WorkloadPolicy | ||
| - The WorkloadSecurityPolicyTuning was deleted and replaced by the status in the WorkloadPolicy resource. |
There was a problem hiding this comment.
Probably we can remove this one since it is no longer true
| apiVersion: security.rancher.io/v1alpha1 | ||
| kind: WorkloadPolicyProposal | ||
| metadata: | ||
| name: workloadpolicyproposal-sample |
There was a problem hiding this comment.
I would highlight how the name should be
| name: workloadpolicyproposal-sample | |
| name: deploy-pgsql-8646457455 # <workload_type>-<workload_name> |
| apiVersion: security.rancher.io/v1alpha1 | ||
| kind: WorkloadPolicy | ||
| metadata: | ||
| name: postgres-policy |
There was a problem hiding this comment.
| name: postgres-policy | |
| name: deploy-pgsql-8646457455 # <workload_type>-<workload_name> |
| A workload is protected by a WorkloadPolicy through a podSelector, like in the current approach. | ||
| As proposed in the previous version, we suggest the usage of a unique label security.rancher.io/policy, but we don’t enforce it by default since putting it in the spec.template would cause a rollout. |
There was a problem hiding this comment.
| A workload is protected by a WorkloadPolicy through a podSelector, like in the current approach. | |
| As proposed in the previous version, we suggest the usage of a unique label security.rancher.io/policy, but we don’t enforce it by default since putting it in the spec.template would cause a rollout. | |
| A workload is protected by a WorkloadPolicy through a podSelector. We suggest the usage of a unique label security.rancher.io/policy, but we don’t enforce it by default since putting it in the spec.template would cause a rollout. |
There was a problem hiding this comment.
I would avoid reference to a previous version that is just in the google doc
There was a problem hiding this comment.
You're right, my bad
| A workload is protected by a WorkloadPolicy through a podSelector, like in the current approach. | ||
| As proposed in the previous version, we suggest the usage of a unique label security.rancher.io/policy, but we don’t enforce it by default since putting it in the spec.template would cause a rollout. | ||
|
|
||
| So the difference with the previous version is that we simply leave users to choose their preferred approach. Having a dedicated label is still suggested, but not compulsory. |
There was a problem hiding this comment.
We can avoid this phrase for the same reason " avoid reference to a previous version that is just in the google doc"
| So the difference with the previous version is that we simply leave users to choose their preferred approach. Having a dedicated label is still suggested, but not compulsory. |
| - Basic user -> use default k8s workload selectors -> everything works out of the box, no rollout required. | ||
| - Advanced user (real production scenario) -> enforce a unique label on workloads and use this label as a selector -> a rollout could be required if the workload was initially created without the label | ||
|
|
||
| Now that the label is no longer compulsory, we cannot rely on it to understand if a workload is covered or not; we should fall back to a kubectl plugin that scrapes the resources and helps the user to understand the situation (potential conflict, partial workload coverage,...). |
There was a problem hiding this comment.
| Now that the label is no longer compulsory, we cannot rely on it to understand if a workload is covered or not; we should fall back to a kubectl plugin that scrapes the resources and helps the user to understand the situation (potential conflict, partial workload coverage,...). | |
| Since the label is not compulsory, we cannot rely on it to understand if a workload is covered or not; we should use a kubectl plugin that scrapes the resources and helps the user to understand the situation (potential conflict, partial workload coverage,...). |
|
|
||
| Most of the time, a Redis/Tomcat/NodeJS container image is always going to behave in the same way. There could be some exceptions, we must take that scenario into account. | ||
|
|
||
| SUSE is already distributing maintained container images through AppCo. It would make sense to tie our profiles to the container images, rather than thinking about the concept of “workload”. |
There was a problem hiding this comment.
This repo will be open source, so I'm not sure we want these details here
There was a problem hiding this comment.
Write drunk, edit sober.
- E. Hemingway
| otel-collector: | ||
| imagePolicyRef: otel-collector # name of the ImagePolicy |
There was a problem hiding this comment.
If I recall correctly, we ended up with something like this to allow us to use different profiles for different rules. Today we just have executables, but tomorrow who knows
| otel-collector: | |
| imagePolicyRef: otel-collector # name of the ImagePolicy | |
| otel-collector: | |
| rules: | |
| executables: | |
| imagePolicyRef: otel-collector |
There was a problem hiding this comment.
Yes, this way we can inject different rulesets for different cases. Files will be implemented over my dead body, but still worth thinking about them
holyspectral
left a comment
There was a problem hiding this comment.
I think it looks good to me! Some comments.
|
|
||
| - The WorkloadPolicyProposal has an ownerReference that ties it back to the workload resource for which the behaviour was observed. | ||
| - When the observed workload is deleted, the associated WorkloadPolicyProposal is deleted as well. | ||
| - When we switch from a proposal to a real policy we delete the proposal and don’t recreate it again |
There was a problem hiding this comment.
A edge case: If we delete the real policy, should we recreate the policy proposal again?
There was a problem hiding this comment.
IMO, If we delete the real policy, I don't think we need to recreate the policy proposal again?
The reason is in monitor mode, if the policy is deleted, our controller will create a new proposal when it observes workload behavior. In protect mode, if the policy is deleted, the workload is no longer protected, so new behavior can be observed. Our controller will create a new proposal when it detects activity as well.
There was a problem hiding this comment.
Yes, the policy proposal will be automatically recreated once the policy is deleted and we restart the learning phase. Do you want me to specify that?
There was a problem hiding this comment.
I think it's useful to be specified, but when talking about the WorkloadPolicy, here we're talking about the WorkloadPolicyProposal
|
|
||
| Most of the time, a Redis/Tomcat/NodeJS container image is always going to behave in the same way. There could be some exceptions, we must take that scenario into account. | ||
|
|
||
| Vendors alreadu distribute maintained container images through their platforms. It would make sense to tie our profiles to the container images, rather than thinking about the concept of “workload”. |
There was a problem hiding this comment.
| Vendors alreadu distribute maintained container images through their platforms. It would make sense to tie our profiles to the container images, rather than thinking about the concept of “workload”. | |
| Vendors already distribute maintained container images through their platforms. It would make sense to tie our profiles to the container images, rather than thinking about the concept of “workload”. |
kyledong-suse
left a comment
There was a problem hiding this comment.
Thank you so much for working on this RFC!
Generally LGTM. Just a couple of minor comments.
flavio
left a comment
There was a problem hiding this comment.
Overall LGTM, I left some comments and 👍
There are some sections that are missing, compared to the initial draft document we had:
- Transitioning from Learn to Monitor mode
- Transitioning from Monitor to Protect mode
- Transitioning from Protect to Monitor mode
- Transitioning from Protect to Learn mode - this is not on the document, and is actually something we wanted to add but in a different place of the document. I think it would make sense to promote that to a
h<x>section - Removing a WorkloadPolicy
I think it would be worth to have a section explaining how this has no impact on how we plan to integrate with Tetragon (the Tetragon Integration section of the original doc).
| apiVersion: security.rancher.io/v1alpha1 | ||
| kind: WorkloadPolicyProposal | ||
| metadata: | ||
| name: deploy-pgsql-8646457455 # <workload_type>-<workload_name> |
There was a problem hiding this comment.
nit, then name of the deployment is not including numbers, this might seem similar to the random name associated with the underlying pods.
I've also changed the resource type to be a StatefulSet, which is a more realistic way to deploy a db.
| name: deploy-pgsql-8646457455 # <workload_type>-<workload_name> | |
| metadata: | |
| name: statefulsets-pgsql # <workload_type>-<workload_name> |
| - apiVersion: apps/v1 | ||
| kind: Deployment | ||
| name: pgsql-8646457455 |
There was a problem hiding this comment.
| - apiVersion: apps/v1 | |
| kind: Deployment | |
| name: pgsql-8646457455 | |
| - apiVersion: v1 | |
| kind: StatefulSet | |
| name: pgsql |
|
|
||
| Notes on the behavior: | ||
|
|
||
| - The WorkloadPolicyProposal has an ownerReference that ties it back to the workload resource for which the behaviour was observed. |
There was a problem hiding this comment.
| - The WorkloadPolicyProposal has an ownerReference that ties it back to the workload resource for which the behaviour was observed. | |
| - The WorkloadPolicyProposal has an `ownerReference` that ties it back to the workload resource for which the behaviour was observed. |
|
|
||
| Changes compared to the current implementation: | ||
|
|
||
| - The rules section has been replaced by rulesByContainer. This new field holds a map with the name of the containers as key, and the list of the container rules as value. |
There was a problem hiding this comment.
- The rules section has been replaced by `rulesByContainer`. This new field holds a map with the name of the containers as key, and the list of the container rules as value.
- The `WorkloadPolicy` does not have the label selector field to identify the pods to protect.
| - /usr/bin/otel-collector | ||
| ``` | ||
|
|
||
| Changes compared to the current implementation: |
There was a problem hiding this comment.
I think we also dropped the label selector. I think we don't need them anymore, isn't it? If that's the case, please mention that
There was a problem hiding this comment.
It is mentioned in the WorkloadPolicy section 👍
There was a problem hiding this comment.
Maybe I'm missing something, but I was under the impression we wanted to keep the podSelector. There is also a section in this document stating
- Basic user -> use default k8s workload selectors -> everything works out of the box, no rollout required.
- Advanced user (real production scenario) -> enforce a unique label on workloads and use this label as a selector -> a rollout could be required if the workload was initially created without the label
There was a problem hiding this comment.
During some conversations we opted to make the label mandatory for a first iteration.
There was a problem hiding this comment.
Yes, that's the case. We decided to require the user to enter the "special" label to bind a policy to a Pod.
As for the WorkloadPolicyProposal, the example above is not showing the selector anymore, which I think is correct. We do not need that selector to be able to associate a Pod seen by our agent to the workload it belongs to.
If that's the case (we technically don't need that selector), can you add a line pointing out that selector has been dropped from the CRD?
| - In case of workload rollout, the WorkloadPolicy remains unchanged. If it causes issues with the rollout, the user is in charge of rolling back to the previous version or destroying the policy | ||
|
|
||
| ## Binding a WorkloadPolicy | ||
| A workload is protected by a WorkloadPolicy through a podSelector. We suggest the usage of a unique label security.rancher.io/policy, but we don’t enforce it by default since putting it in the spec.template would cause a rollout. |
There was a problem hiding this comment.
I think this section is wrong, it's the last proposal that we then revisited during the call.
There's no podSelector inside of the WorkloadPolicy. The binding is done by adding the <special label>: <policy name> to the Pod definition.
flavio
left a comment
There was a problem hiding this comment.
LGTM, thanks for all the changes you've applied
What this PR does / why we need it:
Turning the conversation about the revisit of the CRD into an RFC that we are going to implement progressively :-)
Which issue(s) this PR fixes
Issue #34