feat: rework of the current POC#18

Open
Andreagit97 wants to merge 17 commits into
rancher-sandbox:mainfrom
Andreagit97:POC-rework

Conversation

@Andreagit97

@Andreagit97 Andreagit97 commented Jun 12, 2026

Collaborator

This PR is a possible first rework to support the Kubernetes NetworkPolicy syntax.
I would say we can still consider this repo a POC, since we need to iterate a lot more before reaching something production-ready.

The first commit makes it possible to disable the CNI watcher for quick debugging.
At the moment the design is partial: it only considers pod -> pod connections on the same node.
For pod -> pod traffic on the same node we receive 4 flow observations for each packet exchange.

[Image: Screenshot from 2026-06-10 16-55-44]

The initial idea is to keep only one of the 4 flows we receive: the egress observation between Deployment -> Deployment. This won't be enough to handle all cases, but it could be a good starting point.

To give a concrete example, let's consider this workload:

apiVersion: v1
kind: Service
metadata:
  name: http-service
spec:
  selector:
    app: http-server
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: http-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: http-server
  template:
    metadata:
      labels:
        app: http-server
    spec:
      containers:
      - name: http-server
        image: nginx:alpine
        ports:
        - containerPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: http-client
spec:
  replicas: 1
  selector:
    matchLabels:
      app: http-client
  template:
    metadata:
      labels:
        app: http-client
    spec:
      containers:
        - image: curlimages/curl:8.1.1
          name: curl-client
          command: ["sleep", "10d"]

These are the flows we obtain from OBI after a kubectl exec -n default deployment/http-client -- curl http://http-service:80

// client Pod-> kube-dns service
2026-06-10T14:30:19Z	INFO	flowcollector	parsed datapoint	{"attrs": {"client.port":"58373","direction":"request","dst.address":"10.96.0.10","dst.name":"kube-dns","dst.port":"53","iface.direction":"egress","k8s.dst.name":"kube-dns","k8s.dst.namespace":"kube-system","k8s.dst.owner.name":"kube-dns","k8s.dst.owner.type":"Service","k8s.dst.type":"Service","k8s.src.name":"http-client-6d87bb58d7-v7jfc","k8s.src.namespace":"default","k8s.src.node.ip":"172.18.0.2","k8s.src.node.name":"kind-control-plane","k8s.src.owner.name":"http-client","k8s.src.owner.type":"Deployment","k8s.src.type":"Pod","network.protocol.name":"domain","network.type":"ipv4","obi.ip":"172.18.0.2","server.port":"53","src.address":"10.0.0.245","src.name":"http-client-6d87bb58d7-v7jfc","src.port":"58373","transport":"UDP"}}

// client Pod -> coredns Pod
2026-06-10T14:30:19Z	INFO	flowcollector	parsed datapoint	{"attrs": {"client.port":"58373","direction":"request","dst.address":"10.0.0.57","dst.name":"coredns-7d764666f9-j9sjs","dst.port":"53","iface.direction":"egress","k8s.dst.name":"coredns-7d764666f9-j9sjs","k8s.dst.namespace":"kube-system","k8s.dst.node.ip":"172.18.0.2","k8s.dst.node.name":"kind-control-plane","k8s.dst.owner.name":"coredns","k8s.dst.owner.type":"Deployment","k8s.dst.type":"Pod","k8s.src.name":"http-client-6d87bb58d7-v7jfc","k8s.src.namespace":"default","k8s.src.node.ip":"172.18.0.2","k8s.src.node.name":"kind-control-plane","k8s.src.owner.name":"http-client","k8s.src.owner.type":"Deployment","k8s.src.type":"Pod","network.protocol.name":"domain","network.type":"ipv4","obi.ip":"172.18.0.2","server.port":"53","src.address":"10.0.0.245","src.name":"http-client-6d87bb58d7-v7jfc","src.port":"58373","transport":"UDP"}}

// coreDNS Pod -> client pod
2026-06-10T14:30:19Z	INFO	flowcollector	parsed datapoint	{"attrs": {"client.port":"58373","direction":"response","dst.address":"10.0.0.245","dst.name":"http-client-6d87bb58d7-v7jfc","dst.port":"58373","iface.direction":"ingress","k8s.dst.name":"http-client-6d87bb58d7-v7jfc","k8s.dst.namespace":"default","k8s.dst.node.ip":"172.18.0.2","k8s.dst.node.name":"kind-control-plane","k8s.dst.owner.name":"http-client","k8s.dst.owner.type":"Deployment","k8s.dst.type":"Pod","k8s.src.name":"coredns-7d764666f9-j9sjs","k8s.src.namespace":"kube-system","k8s.src.node.ip":"172.18.0.2","k8s.src.node.name":"kind-control-plane","k8s.src.owner.name":"coredns","k8s.src.owner.type":"Deployment","k8s.src.type":"Pod","network.protocol.name":"domain","network.type":"ipv4","obi.ip":"172.18.0.2","server.port":"53","src.address":"10.0.0.57","src.name":"coredns-7d764666f9-j9sjs","src.port":"53","transport":"UDP"}}

// kube-dns service -> client pod
2026-06-10T14:30:19Z	INFO	flowcollector	parsed datapoint	{"attrs": {"client.port":"58373","direction":"response","dst.address":"10.0.0.245","dst.name":"http-client-6d87bb58d7-v7jfc","dst.port":"58373","iface.direction":"ingress","k8s.dst.name":"http-client-6d87bb58d7-v7jfc","k8s.dst.namespace":"default","k8s.dst.node.ip":"172.18.0.2","k8s.dst.node.name":"kind-control-plane","k8s.dst.owner.name":"http-client","k8s.dst.owner.type":"Deployment","k8s.dst.type":"Pod","k8s.src.name":"kube-dns","k8s.src.namespace":"kube-system","k8s.src.owner.name":"kube-dns","k8s.src.owner.type":"Service","k8s.src.type":"Service","network.protocol.name":"domain","network.type":"ipv4","obi.ip":"172.18.0.2","server.port":"53","src.address":"10.96.0.10","src.name":"kube-dns","src.port":"53","transport":"UDP"}}

// client Pod -> server Service
2026-06-10T14:30:19Z	INFO	flowcollector	parsed datapoint	{"attrs": {"client.port":"35796","direction":"request","dst.address":"10.96.18.232","dst.name":"http-service","dst.port":"80","iface.direction":"egress","k8s.dst.name":"http-service","k8s.dst.namespace":"default","k8s.dst.owner.name":"http-service","k8s.dst.owner.type":"Service","k8s.dst.type":"Service","k8s.src.name":"http-client-6d87bb58d7-v7jfc","k8s.src.namespace":"default","k8s.src.node.ip":"172.18.0.2","k8s.src.node.name":"kind-control-plane","k8s.src.owner.name":"http-client","k8s.src.owner.type":"Deployment","k8s.src.type":"Pod","network.protocol.name":"www","network.type":"ipv4","obi.ip":"172.18.0.2","server.port":"80","src.address":"10.0.0.245","src.name":"http-client-6d87bb58d7-v7jfc","src.port":"35796","transport":"TCP"}}

// client Pod -> server Pod
2026-06-10T14:30:19Z	INFO	flowcollector	parsed datapoint	{"attrs": {"client.port":"35796","direction":"request","dst.address":"10.0.0.164","dst.name":"http-server-85d56547df-922sz","dst.port":"80","iface.direction":"egress","k8s.dst.name":"http-server-85d56547df-922sz","k8s.dst.namespace":"default","k8s.dst.node.ip":"172.18.0.2","k8s.dst.node.name":"kind-control-plane","k8s.dst.owner.name":"http-server","k8s.dst.owner.type":"Deployment","k8s.dst.type":"Pod","k8s.src.name":"http-client-6d87bb58d7-v7jfc","k8s.src.namespace":"default","k8s.src.node.ip":"172.18.0.2","k8s.src.node.name":"kind-control-plane","k8s.src.owner.name":"http-client","k8s.src.owner.type":"Deployment","k8s.src.type":"Pod","network.protocol.name":"www","network.type":"ipv4","obi.ip":"172.18.0.2","server.port":"80","src.address":"10.0.0.245","src.name":"http-client-6d87bb58d7-v7jfc","src.port":"35796","transport":"TCP"}}

// server Pod -> client Pod
2026-06-10T14:30:19Z	INFO	flowcollector	parsed datapoint	{"attrs": {"client.port":"35796","direction":"response","dst.address":"10.0.0.245","dst.name":"http-client-6d87bb58d7-v7jfc","dst.port":"35796","iface.direction":"ingress","k8s.dst.name":"http-client-6d87bb58d7-v7jfc","k8s.dst.namespace":"default","k8s.dst.node.ip":"172.18.0.2","k8s.dst.node.name":"kind-control-plane","k8s.dst.owner.name":"http-client","k8s.dst.owner.type":"Deployment","k8s.dst.type":"Pod","k8s.src.name":"http-server-85d56547df-922sz","k8s.src.namespace":"default","k8s.src.node.ip":"172.18.0.2","k8s.src.node.name":"kind-control-plane","k8s.src.owner.name":"http-server","k8s.src.owner.type":"Deployment","k8s.src.type":"Pod","network.protocol.name":"www","network.type":"ipv4","obi.ip":"172.18.0.2","server.port":"80","src.address":"10.0.0.164","src.name":"http-server-85d56547df-922sz","src.port":"80","transport":"TCP"}}

// Server service -> client pod
2026-06-10T14:30:19Z	INFO	flowcollector	parsed datapoint	{"attrs": {"client.port":"35796","direction":"response","dst.address":"10.0.0.245","dst.name":"http-client-6d87bb58d7-v7jfc","dst.port":"35796","iface.direction":"ingress","k8s.dst.name":"http-client-6d87bb58d7-v7jfc","k8s.dst.namespace":"default","k8s.dst.node.ip":"172.18.0.2","k8s.dst.node.name":"kind-control-plane","k8s.dst.owner.name":"http-client","k8s.dst.owner.type":"Deployment","k8s.dst.type":"Pod","k8s.src.name":"http-service","k8s.src.namespace":"default","k8s.src.owner.name":"http-service","k8s.src.owner.type":"Service","k8s.src.type":"Service","network.protocol.name":"www","network.type":"ipv4","obi.ip":"172.18.0.2","server.port":"80","src.address":"10.96.18.232","src.name":"http-service","src.port":"80","transport":"TCP"}}

These are the flows we would keep after the filters:

// client Pod -> coredns Pod
2026-06-10T14:30:19Z	INFO	flowcollector	parsed datapoint	{"attrs": {"client.port":"58373","direction":"request","dst.address":"10.0.0.57","dst.name":"coredns-7d764666f9-j9sjs","dst.port":"53","iface.direction":"egress","k8s.dst.name":"coredns-7d764666f9-j9sjs","k8s.dst.namespace":"kube-system","k8s.dst.node.ip":"172.18.0.2","k8s.dst.node.name":"kind-control-plane","k8s.dst.owner.name":"coredns","k8s.dst.owner.type":"Deployment","k8s.dst.type":"Pod","k8s.src.name":"http-client-6d87bb58d7-v7jfc","k8s.src.namespace":"default","k8s.src.node.ip":"172.18.0.2","k8s.src.node.name":"kind-control-plane","k8s.src.owner.name":"http-client","k8s.src.owner.type":"Deployment","k8s.src.type":"Pod","network.protocol.name":"domain","network.type":"ipv4","obi.ip":"172.18.0.2","server.port":"53","src.address":"10.0.0.245","src.name":"http-client-6d87bb58d7-v7jfc","src.port":"58373","transport":"UDP"}}

// client Pod -> server Pod
2026-06-10T14:30:19Z	INFO	flowcollector	parsed datapoint	{"attrs": {"client.port":"35796","direction":"request","dst.address":"10.0.0.164","dst.name":"http-server-85d56547df-922sz","dst.port":"80","iface.direction":"egress","k8s.dst.name":"http-server-85d56547df-922sz","k8s.dst.namespace":"default","k8s.dst.node.ip":"172.18.0.2","k8s.dst.node.name":"kind-control-plane","k8s.dst.owner.name":"http-server","k8s.dst.owner.type":"Deployment","k8s.dst.type":"Pod","k8s.src.name":"http-client-6d87bb58d7-v7jfc","k8s.src.namespace":"default","k8s.src.node.ip":"172.18.0.2","k8s.src.node.name":"kind-control-plane","k8s.src.owner.name":"http-client","k8s.src.owner.type":"Deployment","k8s.src.type":"Pod","network.protocol.name":"www","network.type":"ipv4","obi.ip":"172.18.0.2","server.port":"80","src.address":"10.0.0.245","src.name":"http-client-6d87bb58d7-v7jfc","src.port":"35796","transport":"TCP"}}

As you can see, we keep only the Deployment -> Deployment communication, without the Service in the middle. Unfortunately this is only an approximation, but for now it could be a good starting point.
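
The filter described above could be sketched roughly as follows. This is only an illustration inferred from the two flows that survive in the example, not the code in this PR: the `Flow` type and the `keepFlow` function are hypothetical names, and the exact conditions are an assumption. The attribute keys are the ones printed in the logs.

```go
package main

import "fmt"

// Flow is a hypothetical stand-in for one parsed datapoint: the flat
// attribute map printed in the flowcollector logs above.
type Flow map[string]string

// keepFlow reports whether a flow survives the filter. The rule is an
// assumption inferred from the two flows kept above, not the PR's code:
// an egress request whose endpoints are both Deployment-owned Pods.
func keepFlow(f Flow) bool {
	return f["iface.direction"] == "egress" &&
		f["direction"] == "request" &&
		f["k8s.src.type"] == "Pod" &&
		f["k8s.dst.type"] == "Pod" &&
		f["k8s.src.owner.type"] == "Deployment" &&
		f["k8s.dst.owner.type"] == "Deployment"
}

func main() {
	// Trimmed copies of two flows from the logs above.
	toService := Flow{
		"iface.direction": "egress", "direction": "request",
		"k8s.src.type": "Pod", "k8s.src.owner.type": "Deployment",
		"k8s.dst.type": "Service", "k8s.dst.owner.type": "Service",
	}
	toPod := Flow{
		"iface.direction": "egress", "direction": "request",
		"k8s.src.type": "Pod", "k8s.src.owner.type": "Deployment",
		"k8s.dst.type": "Pod", "k8s.dst.owner.type": "Deployment",
	}
	fmt.Println(keepFlow(toService), keepFlow(toPod)) // false true
}
```

Applied to the eight flows listed earlier, a predicate like this keeps exactly the client Pod -> coredns Pod and client Pod -> server Pod observations and drops the Service-addressed and response flows.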

In my local setup I obtain these policies:

- apiVersion: security.security.rancher.io/v1alpha1
  kind: NetworkPolicyProposal
  metadata:
    creationTimestamp: "2026-06-12T14:39:48Z"
    generation: 1
    name: deployment-http-client-egress
    namespace: default
    resourceVersion: "854"
    uid: 587e0d10-65e6-4ae5-a633-13d3905177e9
  spec:
    egress:
    - ports:
      - port: 80
        protocol: TCP
      to:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: default
        podSelector:
          matchLabels:
            app: http-server
    - ports:
      - port: 53
        protocol: UDP
      to:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: kube-system
        podSelector:
          matchLabels:
            k8s-app: kube-dns
    podSelector:
      matchLabels:
        app: http-client
    policyTypes:
    - Egress
- apiVersion: security.security.rancher.io/v1alpha1
  kind: NetworkPolicyProposal
  metadata:
    creationTimestamp: "2026-06-12T14:39:48Z"
    generation: 1
    name: deployment-http-server-ingress
    namespace: default
    resourceVersion: "855"
    uid: 6ff3b9a3-8c25-447e-a24c-4271ee59a7b6
  spec:
    ingress:
    - from:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: default
        podSelector:
          matchLabels:
            app: http-client
      ports:
      - port: 80
        protocol: TCP
    podSelector:
      matchLabels:
        app: http-server
    policyTypes:
    - Ingress
- apiVersion: security.security.rancher.io/v1alpha1
  kind: NetworkPolicyProposal
  metadata:
    creationTimestamp: "2026-06-12T14:39:48Z"
    generation: 1
    name: deployment-coredns-ingress
    namespace: kube-system
    resourceVersion: "856"
    uid: f8738263-1834-45c1-9c7b-acb930ed9bdd
  spec:
    ingress:
    - from:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: default
        podSelector:
          matchLabels:
            app: http-client
      ports:
      - port: 53
        protocol: UDP
    podSelector:
      matchLabels:
        k8s-app: kube-dns
    policyTypes:
    - Ingress
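
To make the mapping from a kept flow to a proposal rule more concrete, here is a minimal sketch. It assumes the destination pod labels have already been looked up elsewhere (the flow attributes only carry owner names), and it uses simplified local types rather than the real `networkingv1` ones. `Peer`, `Port`, `EgressRule` and `egressRuleFromFlow` are hypothetical names for illustration only.

```go
package main

import (
	"fmt"
	"strconv"
)

// namespaceNameLabel is the well-known label Kubernetes sets on every
// namespace; the proposals above select namespaces with it.
const namespaceNameLabel = "kubernetes.io/metadata.name"

// Peer, Port and EgressRule are simplified, hypothetical stand-ins for the
// corresponding NetworkPolicy fields.
type Peer struct {
	NamespaceLabels map[string]string
	PodLabels       map[string]string
}

type Port struct {
	Protocol string
	Port     int
}

type EgressRule struct {
	To    []Peer
	Ports []Port
}

// egressRuleFromFlow builds one egress rule from the attributes of a kept
// flow. dstPodLabels is assumed to come from a separate workload lookup.
func egressRuleFromFlow(attrs, dstPodLabels map[string]string) (EgressRule, error) {
	port, err := strconv.Atoi(attrs["dst.port"])
	if err != nil {
		return EgressRule{}, fmt.Errorf("invalid dst.port %q: %w", attrs["dst.port"], err)
	}
	return EgressRule{
		To: []Peer{{
			NamespaceLabels: map[string]string{namespaceNameLabel: attrs["k8s.dst.namespace"]},
			PodLabels:       dstPodLabels,
		}},
		Ports: []Port{{Protocol: attrs["transport"], Port: port}},
	}, nil
}

func main() {
	// Attributes taken from the client Pod -> server Pod flow above.
	attrs := map[string]string{"k8s.dst.namespace": "default", "dst.port": "80", "transport": "TCP"}
	rule, err := egressRuleFromFlow(attrs, map[string]string{"app": "http-server"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", rule)
}
```

The ingress side would mirror this with the source namespace and source pod labels as the peer.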

You can try the current implementation with

kind create cluster
tilt up # please disable the CNI watcher for now

kubectl apply -f <the workload above>
kubectl exec -n default deployment/http-client -- curl http://http-service:80

Signed-off-by: Andrea Terzolo <andrea.terzolo@suse.com>
@dottorblaster

dottorblaster commented Jun 15, 2026

Member

I like the direction very much! Some additional points:

  • The reflect.DeepEqual rule comparison during peers lookup is order-sensitive on the inner slices. Do we want to tackle this right now or do we just want to track that for later?
  • We had a CIDR fallback for external, extra-node egress traffic. Do we want to tackle this now, do you have something in mind or do we want to track it for the future?
  • When it comes to the linter feel free to add exceptions for now, we can revisit package names for example once this lands on main

Everything else I think is a problem for our future selves, and I think this is a sweet spot to start building on top. Thanks! ❤️

Signed-off-by: Andrea Terzolo <andrea.terzolo@suse.com>

@kyledong-suse kyledong-suse left a comment

Collaborator

Thank you so much for this rework! I think the direction is right, and it's really a very good start.

I tried it locally with the http-client → http-service workload and the generated proposals. I left a couple of comments. For the findEgressPeer / findIngressPeer dedupe, I think we can improve it later: we could dedupe by peer workload and merge all ports into one rule per peer workload.
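
A minimal sketch of the "one rule per peer workload" idea, to make the suggestion concrete. All type and function names here (`PeerKey`, `Observation`, `Rule`, `mergeByPeer`) are hypothetical and do not come from the PR; the real findEgressPeer / findIngressPeer code may be shaped differently.

```go
package main

import "fmt"

// PeerKey identifies a peer workload (hypothetical type).
type PeerKey struct {
	Namespace string
	Workload  string
}

// Port is a protocol/port pair observed towards a peer.
type Port struct {
	Protocol string
	Port     int
}

// Observation is one observed (peer, port) pair.
type Observation struct {
	Peer PeerKey
	Port Port
}

// Rule groups every port observed for a single peer workload.
type Rule struct {
	Peer  PeerKey
	Ports []Port
}

// mergeByPeer collapses observations into one rule per peer workload,
// dropping duplicate (peer, port) pairs. Rules keep first-seen order.
func mergeByPeer(obs []Observation) []Rule {
	index := map[PeerKey]int{}    // peer -> position in rules
	seen := map[Observation]bool{} // (peer, port) pairs already merged
	var rules []Rule
	for _, o := range obs {
		if seen[o] {
			continue
		}
		seen[o] = true
		i, ok := index[o.Peer]
		if !ok {
			i = len(rules)
			index[o.Peer] = i
			rules = append(rules, Rule{Peer: o.Peer})
		}
		rules[i].Ports = append(rules[i].Ports, o.Port)
	}
	return rules
}

func main() {
	dns := PeerKey{Namespace: "kube-system", Workload: "coredns"}
	web := PeerKey{Namespace: "default", Workload: "http-server"}
	rules := mergeByPeer([]Observation{
		{Peer: dns, Port: Port{"UDP", 53}},
		{Peer: web, Port: Port{"TCP", 80}},
		{Peer: dns, Port: Port{"TCP", 53}},
		{Peer: dns, Port: Port{"UDP", 53}}, // duplicate, ignored
	})
	fmt.Printf("%+v\n", rules)
}
```

Keying on the peer workload means a new port for a known peer extends an existing rule instead of creating a second rule for the same peer.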

We have some overlap, so I checked out your PR and added a couple of things from my branch on top of yours locally. I'll keep playing with it.

  • promotion for NetworkPolicyProposal → networkingv1.NetworkPolicy
  • in enforcement_controller reconciler, create/update/delete networkingv1.NetworkPolicy

Comment on lines -91 to -93
// lastObserved is when flows for this workload were last seen.
// +optional
LastObserved *metav1.Time `json:"lastObserved,omitempty"`

Collaborator

I'd suggest we keep this LastObserved field. It's mainly for debugging when traffic is unexpectedly blocked: we can check the LastObserved timestamp to verify whether the proposal was learned before or after the block.

Collaborator Author

yep that makes sense, thanks!

Collaborator Author

On second thought, I believe this could be dangerous. If we update our resource every time we see a flow associated with it, there is a risk that we will update the resource every time we scan it. I believe this is the same reason why we didn't add this field to the WorkloadPolicyProposal in the runtime-enforcer. Maybe we can come back to this when we feel it is really needed for debugging.

Collaborator

OK, I see your point. Yeah, I agree we remove it for now.

Comment thread internal/receiver/receiver.go Outdated
@Andreagit97

Collaborator Author

The reflect.DeepEqual rule comparison during peers lookup is order-sensitive on the inner slices. Do we want to tackle this right now or do we just want to track that for later?

I can try to tackle that now

We had a CIDR fallback for external, extra-node egress traffic. Do we want to tackle this now, do you have something in mind or do we want to track it for the future?

Yep, I believe we need to address it, but I wouldn't do it now. I would probably add a test with external traffic and then implement it.

When it comes to the linter feel free to add exceptions for now, we can revisit package names for example once this lands on main

I will take a look at what we need to fix.

Signed-off-by: Andrea Terzolo <andrea.terzolo@suse.com>
@Andreagit97 Andreagit97 force-pushed the POC-rework branch 2 times, most recently from a4e497e to 3b2cf16 Compare June 17, 2026 07:08
Signed-off-by: Andrea Terzolo <andrea.terzolo@suse.com>
we don't rely on `reflect.DeepEqual` anymore. For now we do a manual
comparison of the inner fields; maybe in the future we can improve the
solution using a hash key or a similar approach
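
To illustrate the order-sensitivity issue and one possible fix, here is a small sketch. It is not the comparison implemented in this commit: it swaps in a sort-then-compare approach on a hypothetical `Port` type, and `samePorts` / `sortedCopy` are illustrative names.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// Port is a simplified, hypothetical protocol/port pair.
type Port struct {
	Protocol string
	Port     int
}

// sortedCopy returns a sorted copy so the caller's slice is not mutated.
func sortedCopy(ports []Port) []Port {
	out := append([]Port(nil), ports...)
	sort.Slice(out, func(i, j int) bool {
		if out[i].Protocol != out[j].Protocol {
			return out[i].Protocol < out[j].Protocol
		}
		return out[i].Port < out[j].Port
	})
	return out
}

// samePorts reports whether a and b hold the same ports, ignoring order.
func samePorts(a, b []Port) bool {
	if len(a) != len(b) {
		return false
	}
	sa, sb := sortedCopy(a), sortedCopy(b)
	for i := range sa {
		if sa[i] != sb[i] {
			return false
		}
	}
	return true
}

func main() {
	a := []Port{{"TCP", 80}, {"UDP", 53}}
	b := []Port{{"UDP", 53}, {"TCP", 80}}
	// reflect.DeepEqual compares slices element by element, so order matters.
	fmt.Println(reflect.DeepEqual(a, b), samePorts(a, b)) // false true
}
```

Sorting copies costs an allocation per comparison; a canonical hash key per rule, as mentioned above, would avoid that at the price of maintaining the key.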

Signed-off-by: Andrea Terzolo <andrea.terzolo@suse.com>
@Andreagit97 Andreagit97 changed the title [WIP] rework of the current POC feat: rework of the current POC Jun 17, 2026
@Andreagit97 Andreagit97 marked this pull request as ready for review June 17, 2026 08:54
@Andreagit97 Andreagit97 linked an issue Jun 17, 2026 that may be closed by this pull request
for _, wk := range workloads {
if !topology.SupportedWorkloadTypes[wk.OwnerKind] {
continue
connections := ts.store.DrainFlows()

Collaborator

DrainFlows() clears the in-memory egress/ingress flows before reconcile. If reconcile fails, those workload connections will be lost. Should we consider moving the "Drain" to after reconcile has completed successfully?
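
One way to avoid losing flows on a failed reconcile, sketched under assumptions: the `Store` below is a hypothetical stand-in (the real store and the signature of `DrainFlows()` may differ), and `Restore` and `processOnce` are invented helpers. Note this is a variant of the suggestion above: it still drains first, but puts the batch back on failure, so flows recorded while reconcile is running are not dropped either.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Flow is a simplified, hypothetical observed connection.
type Flow struct {
	Src, Dst string
	Port     int
}

// Store is a hypothetical in-memory flow buffer.
type Store struct {
	mu    sync.Mutex
	flows []Flow
}

// Add records a newly observed flow.
func (s *Store) Add(f Flow) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.flows = append(s.flows, f)
}

// Len returns the number of buffered flows.
func (s *Store) Len() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.flows)
}

// DrainFlows returns all buffered flows and clears the buffer.
func (s *Store) DrainFlows() []Flow {
	s.mu.Lock()
	defer s.mu.Unlock()
	drained := s.flows
	s.flows = nil
	return drained
}

// Restore puts a drained batch back in front of any newer flows.
func (s *Store) Restore(batch []Flow) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.flows = append(append([]Flow(nil), batch...), s.flows...)
}

var errReconcile = errors.New("reconcile failed")

// processOnce drains the store and restores the batch if reconcile fails.
func processOnce(s *Store, reconcile func([]Flow) error) error {
	batch := s.DrainFlows()
	if err := reconcile(batch); err != nil {
		s.Restore(batch)
		return err
	}
	return nil
}

func main() {
	s := &Store{}
	s.Add(Flow{Src: "http-client", Dst: "http-server", Port: 80})
	err := processOnce(s, func([]Flow) error { return errReconcile })
	fmt.Println(err, s.Len()) // reconcile failed 1
	err = processOnce(s, func([]Flow) error { return nil })
	fmt.Println(err, s.Len()) // <nil> 0
}
```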

Comment on lines +70 to +72
if err == nil {
	// Policy already exists, do nothing.
	return ctrl.Result{}, nil

Collaborator

Do we want to add a TODO here, so that if the proposal spec changes we update the policy?
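
For reference, the decision such a TODO would eventually cover might look like the sketch below. `PolicySpec`, `Action` and `decide` are hypothetical names with a deliberately tiny spec; the real reconciler works on the API types and a client, which are left out here.

```go
package main

import (
	"fmt"
	"reflect"
)

// PolicySpec is a tiny, hypothetical stand-in for a policy spec.
type PolicySpec struct {
	PodSelector map[string]string
	Ports       []int
}

// Action is what the reconciler should do for one proposal.
type Action string

const (
	ActionCreate Action = "create"
	ActionUpdate Action = "update"
	ActionNone   Action = "none"
)

// decide compares the existing policy (nil if absent) with the spec
// derived from the proposal. reflect.DeepEqual is order-sensitive on
// slices, so the caveat raised earlier in this thread applies here too.
func decide(existing *PolicySpec, desired PolicySpec) Action {
	switch {
	case existing == nil:
		return ActionCreate
	case !reflect.DeepEqual(*existing, desired):
		return ActionUpdate
	default:
		return ActionNone
	}
}

func main() {
	desired := PolicySpec{PodSelector: map[string]string{"app": "http-server"}, Ports: []int{80}}
	stale := PolicySpec{PodSelector: map[string]string{"app": "http-server"}, Ports: []int{8080}}
	fmt.Println(decide(nil, desired), decide(&stale, desired), decide(&desired, desired)) // create update none
}
```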

Development

Successfully merging this pull request may close these issues.

bug: Policy proposal generation doesn't work properly

3 participants