
Conversation

@Andreagit97
Contributor

This PR introduces a Proof of Concept to address the issues discussed in #4191.
This approach attempts to solve the two main problems described in the issue:

  1. Decrease memory usage for each policy.
  2. Instead of deploying a new eBPF program for each policy, deploy a single program that can be shared by multiple policies.

The primary use case is deploying a distinct policy for each K8s workload where the sensors and filters are identical, but the specific values being enforced (e.g., a list of binaries) differ for each workload.

Warning

  • This PR is intended to demonstrate a potential design and start a discussion. It is not intended for a code review.
  • Only the significant parts of the logic needed to explain the concept have been implemented. It is not a complete, functioning solution.
  • Tests, comprehensive comments, and validation checks are entirely missing.
  • The poc/ directory in this branch contains sample YAML files and a README.md to help test and understand this approach.

Ideal Design Explanation

Let's start from the ideal solution we have in mind, and then see how it translates into the POC.
The proposed solution is based on two core concepts: "Templates" and "Bindings".

Template

A "template" is a TracingPolicy that specifies variables which can be populated at runtime, rather than being hardcoded at load time. Selectors within the policy reference these variables by name.

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "block-process-template"
spec:
  variables:
  - name: "targetExecPaths"
    type: "linux_binprm" # this could be used for extra validation but it's probably not strictly necessary
  kprobes:
  - call: "security_bprm_creds_for_exec"
    syscall: false
    args:
    - index: 0
      type: "linux_binprm"
    selectors:
    - matchArgs:
      - index: 0
        operator: "Equal"
        valuesFromVariable: "targetExecPaths"

When a template policy is deployed, it loads the necessary eBPF programs and maps, but it has no runtime effect because it lacks concrete values for its comparisons.

Binding

A "binding" is a new resource (e.g., TracingPolicyBinding) that provides concrete values for a template's variables and applies them to specific workloads.

apiVersion: cilium.io/v1alpha1
kind: TracingPolicyBinding
metadata:
  name: "block-process-template-values-1"
spec:
  policyTemplateRef:
    name: "block-process-template"
  podSelector:
    matchLabels:
      app: "my-app-1"
  bindings:
  - name: "targetExecPaths"
    values:
    - "/usr/bin/true"
    - "/usr/bin/ls"

The policy logic becomes active only when a TracingPolicyBinding is deployed. This action populates the template's eBPF maps with the specified values for the cgroups matching the podSelector.
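
As an illustration, here is a minimal user-space sketch (libbpf) of what applying a binding could translate to, assuming the map layout described in the POC section below; the helper name and signature are hypothetical:

#include <bpf/bpf.h> /* libbpf: bpf_map_update_elem() */
#include <stdint.h>

/* Hypothetical helper: what the agent could do when a binding is
 * deployed. Map names follow the POC below; everything else is
 * illustrative. */
int apply_binding(int cg_to_policy_fd, int pol_str_map_fd,
                  const uint64_t *cgroup_ids, int n_cgroups,
                  uint32_t policy_id, int values_inner_fd)
{
    int i, err;

    /* Activate the template for every cgroup matched by the
     * podSelector: cgroup_id -> policy_id. */
    for (i = 0; i < n_cgroups; i++) {
        err = bpf_map_update_elem(cg_to_policy_fd, &cgroup_ids[i],
                                  &policy_id, BPF_ANY);
        if (err)
            return err;
    }

    /* Install the binding's values: the outer map is a
     * BPF_MAP_TYPE_HASH_OF_MAPS, so the value is the fd of a
     * pre-populated inner hash set of strings. */
    return bpf_map_update_elem(pol_str_map_fd, &policy_id,
                               &values_inner_fd, BPF_ANY);
}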

POC Implementation

To minimize changes for this POC, we reuse the existing TracingPolicy resource and its OptionSpec to simulate both templates and bindings.
Template: A template is defined as a TracingPolicy using these options:

  - name: binding # Ideally "variable", but "binding" is used in the POC
    value: "targetExecPaths"
  - name: arg-type
    value: "linux_binprm"

Binding: A binding is also a TracingPolicy (which would ideally be a TracingPolicyBinding) that references the template and provides values. This POC currently supports only one binding.

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "block-process-template-values-1"
spec:
  podSelector:
    matchLabels:
      app: "my-deployment-1"
  options:
  - name: binding
    value: "targetExecPaths"
  - name: values
    value: "/usr/bin/nmap"
  - name: policy-template-ref
    value: "block-process-template"

Details

  • When the template TracingPolicy is deployed, the eBPF programs and maps are loaded.
  • A new BPF_MAP_TYPE_HASH, cg_to_policy_map, is introduced. It stores a mapping from cgroup_id -> policy_id, allowing us to look up a policy ID from a cgroup ID; this is the reverse of the current policy_filter_cgroup_maps (a BPF_MAP_TYPE_HASH_OF_MAPS). Both maps are sketched right after this list.
  • When a "binding" TracingPolicy is deployed:
    • It is assigned a new policy_id.
    • For all cgroups matching its podSelector, an entry (cgroup_id -> policy_id) is added to the cg_to_policy_map.
    • The binding's main job is to populate this map, thereby activating the template's logic for the targeted cgroups.
    • To store the values from the binding, new BPF_MAP_TYPE_HASH_OF_MAPS maps are used: pol_str_maps_*. This implementation is very specific to string/charbuf/filename types and the eq/neq operators, but the concept can be extended to other types/operators; more on this below.
    • These maps are keyed by the policy_id (obtained from cg_to_policy_map).
    • The value is a hash set of strings (the values from the binding), using the same 11-map-size-bucket technique as the existing string_maps_*.
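
A minimal kernel-side sketch of this lookup chain, showing a single size bucket; struct string_key, the map sizes, and the helper name are assumptions rather than Tetragon's actual definitions:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <stdbool.h>

struct string_key {
    __u32 len;
    char data[64]; /* hypothetical: one of the 11 size buckets */
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 32768);
    __type(key, __u64);   /* cgroup id */
    __type(value, __u32); /* policy id */
} cg_to_policy_map SEC(".maps");

/* Inner map: a hash set of the binding's string values. */
struct pol_str_inner {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1024);
    __type(key, struct string_key);
    __type(value, __u8);
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH_OF_MAPS);
    __uint(max_entries, 1024);
    __type(key, __u32); /* policy id */
    __array(values, struct pol_str_inner);
} pol_str_maps_0 SEC(".maps");

static inline bool binding_matches(struct string_key *arg)
{
    __u64 cgid = bpf_get_current_cgroup_id();
    __u32 *policy_id = bpf_map_lookup_elem(&cg_to_policy_map, &cgid);
    void *values;

    if (!policy_id)
        return false; /* no binding targets this cgroup */
    values = bpf_map_lookup_elem(&pol_str_maps_0, policy_id);
    if (!values)
        return false; /* template loaded, but no values bound yet */
    return bpf_map_lookup_elem(values, arg) != NULL;
}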

Note

A cgroup_id can only be associated with one policy_id (binding) at a time. A new binding for the same cgroup should either be rejected or overwrite the existing one. For example, binding cgroup1 to both policy_1 (values: /bin/ls) and policy_5 (values: /bin/cat) simultaneously is not logical.

Current Limitations & Hacks

  • The value-matching logic is currently limited to:
    • matchArgs / matchData filters
    • String / charbuf / filename types
    • eq / neq operators
  • Extending this to other types/operators would require different eBPF maps/approaches. We think we could also ship a v1 with only some operators/types supported, but the design of the API and eBPF program should be flexible enough to allow future extensions without breaking changes.
  • The same applies to multiple bindings per template: currently only one binding is supported, but the design should be extensible to multiple bindings without API changes. I'm not sure multi-binding support is really needed in practice, so I would avoid complicating the code until we have a real use case for it.
  • A hack is used to signal the eBPF program to use the new pol_str_maps_* instead of hardcoded values: we set vallen=8 in the selector_arg_filter. I have to admit I haven't verified this approach much; I don't think it's a sustainable solution, it just works for the POC. A cleaner alternative is sketched right after this list.
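
For illustration, a purely hypothetical alternative to the vallen=8 hack: a dedicated flags word in the selector's argument filter that states explicitly where values should be resolved from. The struct below is illustrative, not Tetragon's real selector layout:

#include <linux/types.h>
#include <stdbool.h>

/* Hypothetical flag: resolve values via pol_str_maps_* rather than
 * from the inline value blob. */
#define FILTER_FLAG_VALUES_FROM_BINDING (1U << 0)

struct arg_filter {
    __u32 index;  /* argument index */
    __u32 op;     /* eq, neq, ... */
    __u32 flags;  /* new: where to resolve values from */
    __u32 vallen; /* keeps its meaning: length of inline values */
};

static inline bool values_from_binding(const struct arg_filter *f)
{
    return f->flags & FILTER_FLAG_VALUES_FROM_BINDING;
}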

Summary & Goals

This design provides a path toward achieving the two goals of the issue:

  1. Single eBPF Program: A single, shared eBPF program can serve n policies (e.g., 512-1024 or more), as they all reference the same template. This drastically reduces the number of eBPF programs loaded in the kernel.
  2. Low Memory Overhead: The memory increase for each new policy (binding) is minimal. It's limited to new entries in cg_to_policy_map and the pol_str_maps_* (likely a few KB per policy, assuming non-massive value lists).

Signed-off-by: Andrea Terzolo <[email protected]>
@Andreagit97 Andreagit97 requested a review from a team as a code owner October 31, 2025 12:04
@Andreagit97 Andreagit97 marked this pull request as draft October 31, 2025 12:04
if (!policy_filter_check(config->policy_id))
return 0;
// todo: this should replace the policy filter check above
if (config->cgroup_filter && !get_policy_from_cgroup())
Contributor Author

This sounds like duplicate logic of the policy_filter_check() above. Instead of hardcoding a policy value inside the policy config and using it as a key for a global BPF_MAP_TYPE_HASH_OF_MAPS, we could just use a simple BPF_MAP_TYPE_HASH and configure at runtime the policy_id associated with the cgroups.

If we deploy a k8s-aware policy (without bindings) and assign it policy_id=4:

kind: TracingPolicy
metadata:
  name: "lseek-podfilter"
spec:
  podSelector:
    matchLabels:
      app: "lseek-test"
  kprobes:
  - call: "sys_lseek"
    syscall: true
    args:
    - index: 0
      type: "int"
    selectors:
    - matchArgs:
      - index: 0
        operator: "Equal"
        values:
        - "-1"
      matchActions:
      - action: Sigkill

We can translate the pod selectors into a simple hash map identical to the one introduced in this patch:

cgroup_id1 -> 4
cgroup_id2 -> 4

So instead of starting from the policy_id to get a HashMap like

cgroup_id1 -> 1
cgroup_id2 -> 1

we can jump straight into the hash map with the cgroup_id and recover the policy_id from there.
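
As a sketch, and assuming the cg_to_policy_map introduced in this PR, the entry check could look like this (names illustrative):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 32768);
    __type(key, __u64);   /* cgroup id */
    __type(value, __u32); /* policy id */
} cg_to_policy_map SEC(".maps");

/* Candidate replacement for policy_filter_check(): start from the
 * current cgroup instead of the hardcoded config->policy_id, and
 * recover the policy id with a single lookup. */
static inline __u32 get_policy_from_cgroup(void)
{
    __u64 cgid = bpf_get_current_cgroup_id();
    __u32 *id = bpf_map_lookup_elem(&cg_to_policy_map, &cgid);

    return id ? *id : 0; /* 0 == no policy bound to this cgroup */
}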
WDYT?

DEFINE_ARRAY_OF_STRING_MAPS(10)
#endif

#define POLICY_STR_OUTER_MAX_ENTRIES 1
Contributor Author

I saw some discussion in #1408 about having one single map instead of 11 different maps. We could also accept the performance loss and use just one shared map to simplify things.
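
A sketch of what the single-map variant could look like, assuming we fold the 11 size buckets into one fixed-size, zero-padded key (sizes illustrative):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define STRING_KEY_MAX 128

struct string_key {
    __u32 len;                 /* real string length */
    char data[STRING_KEY_MAX]; /* zero-padded beyond len, so hashing
                                * a padded key still gives exact
                                * equality semantics */
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 16384);
    __type(key, struct string_key);
    __type(value, __u8);
} string_values_map SEC(".maps");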

PolicyID uint32 `align:"policy_id"`
Flags uint32 `align:"flags"`
Pad uint32 `align:"pad"`
CgroupFilter uint32 `align:"cgroup_filter"`
Contributor Author

Just a hack to enable/disable the new logic.

"github.com/cilium/tetragon/pkg/labels"
)

type policy interface {
Contributor Author

For now, I created an interface to reuse as much code as I could from the existing policy filter. If we unify the logic on the eBPF side (https://github.com/cilium/tetragon/pull/4279/files#r2481227336), we can probably use a single policy type without an interface.


const templateValue = "*"

func checkTemplateValueIsValid(arg *v1alpha1.ArgSelector, op uint32, ty uint32) error {
Contributor Author

Explicitly highlight the supported use cases.

pi.policyStringHash = make([]*program.Map, 0, numSubMaps)

for stringMapIndex := range numSubMaps {
policyStrMap := program.MapBuilderPolicy(fmt.Sprintf("%s_%d", sensors.PolicyStringHashMapPrefix, stringMapIndex), prog)
Contributor Author

For now, I used MapBuilderPolicy to simplify the code, but if we want to support multiple bindings per policy in the future, this cannot be a map shared by all selectors of the policy.

@kkourt
Contributor

kkourt commented Nov 3, 2025

Thanks!

I've raised a point in the original issue, and I'm not sure if it's addressed here. What happens if the same workload is matched by multiple templates?

I'm guessing the answer is somewhere, and I'm probably missing it.
I think the best way to move forward with this is to write a CFP: https://github.com/cilium/design-cfps, so that we can discuss all the design options, the semantics of the CRDs or new primitives we introduce, as well as the implementation options.

@mtardy
Member

mtardy commented Nov 10, 2025

See this CFP: cilium/design-cfps#80
