WIP:DNM: Bpfman operator APIs review #2221

msherif1234 · 2025-03-06T22:29:58Z

Apis review for bpfman-operator in preparation for GA in OCP 4.19

Signed-off-by: Mohamed Mahmoud <[email protected]>

openshift-ci · 2025-03-06T22:30:03Z

Hello @msherif1234! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

everettraven

I had intended to get further on this pass, but got around to reviewing this later than I had planned. I'll plan to circle back around to this on Monday to do a more thorough pass.

That being said, a couple of my initial comments seem to be applicable throughout your API surface. More specifically, writing user readable GoDoc. I would highly recommend looking at https://github.com/openshift/enhancements/blob/master/dev-guide/api-conventions.md#write-user-readable-documentation-in-godoc and updating your GoDoc to follow these guidelines in general.

everettraven · 2025-03-07T20:28:31Z

bpfman-operator/apis/v1alpha1/bpf_application_state_types.go

+type BpfApplicationProgramState struct {
+	BpfProgramStateCommon `json:",inline"`
+
+	// Type specifies the bpf program type


A couple things here:

In OpenShift we generally recommend using the serialized name in the GoDoc because it matches a bit better with the name format that users of something like kubectl explain ... will see. Kubernetes prefers using the field name. Because this is an upstream project, we can't necessarily hold you strictly to the OpenShift conventions, but I would highly recommend deciding which one you'd like to go with and stay consistent.

This GoDoc is also not very helpful for users of tooling, like kubectl explain ..., that display the documentation generated from the GoDoc (which does not include the validation tags). I would recommend following the guidance outlined in https://github.com/openshift/enhancements/blob/master/dev-guide/api-conventions.md#write-user-readable-documentation-in-godoc to come up with a more useful description of the field for users.

At a brief glance, the second part of ^ applies to most, if not all, of the GoDoc across your APIs.

bpfman-operator/apis/v1alpha1/bpf_application_state_types.go

everettraven · 2025-03-07T20:53:48Z

bpfman-operator/apis/v1alpha1/bpf_application_state_types.go

+// BpfApplicationSpec defines the desired state of BpfApplication
+type BpfApplicationStateSpec struct {
+	// Node is the name of the node for this BpfApplicationStateSpec.
+	Node string `json:"node"`


What is the "node" in this context? Is it a Kubernetes Node? Are there any validations you can enforce to ensure a user cannot supply an invalid node name?

Also, this looks like it needs the +required marker as it appears to be a required field.

Node is the nodeName this CRD is not something the customers will be using to configure the feature, this object reflect the state of the application on the node that this instance is running on. the objects that are customer facing are in bpf_application_type.go for namespaces resource and in cluster_bpf_application_type.go for cluster scope resource

Why are you using an entire CRD to communicate state of resources instead of the status subresource of your existing CRD types?

status won't scale with large cluster so idea was to have nodeState object that reflect per prog type status what happened and overall/ node

Just to expand on that a little...

A single BpfApplication object can create a lot of state! For example, it might load N eBPF probe(s) into every pod on every node.

When reconciling a BpfApplicationCRD things get complicated since there is one controller running on every node. Each controller is watching for BpfApplicationCRD changes. As every instance of the controller completes its reconciliation it would write back state to that BpfApplicationCRD, which then triggers another update... creating a bit of an "update storm".

Aside from ☝️ it wasn't clear how to cleanly layout the State within an BpfApplicationCRD. Could it be map[string]State where the key is the k8s node name where the agent is running?

aside: I would be very happy to not have *StateCRDs

Aside from ☝️ it wasn't clear how to cleanly layout the State within an BpfApplicationCRD. Could it be map[string]State where the key is the k8s node name where the agent is running?

That is one way. You could use server-side apply as well to allow for multiple field managers.

Not saying this is better, but another way I could think of is that you have the BpfApplication CRD be a higher level configuration option for users to specify the configuration across multiple nodes. The BpfApplication controller could then create NodeBpfApplication resources that are managed by the controllers running on the nodes. The NodeBpfApplication would specify a single node so that the controllers running on each node can identify which ones they are responsible for reconciling.

I was thinking about this some more. If you are going to continue with this direction of separate state CRDs, you probably should not include a spec. Instead, have an empty spec and only ever populate the status with information for a user.

This would help prevent confusion from the user side that there are configurable options on this CRD.

If you are mostly replicating the declarative configuration they have specified...

Actually, we are mostly not replicating the declarative config. The user specifies the high-level intent in the BpfApplication CRD, and then the bpfman agent on each node applies that intent to the given node. For example, we have an option for the user to indicate that a program should be installed on the primary node interface. The agent running on the node figures out what that interface is, and reflects the name (e.g., eth0) in the state CRD. Or if there's a container selector, the agent finds the containers that match. Also, we need to maintain the program IDs and link IDs so that we can unload/remove them when needed. All these items are likely different on each node.

Regarding your comment about putting the per node info in the status field, we just want to confirm. Are recommending that we move all of the info currently in the Spec section to the Status section? Are there any downsides to maintaining a large amount of info in the Status section?

Yes, I would recommend that anything that is capturing state to convey to users goes into the status subresource. It would help prevent users from thinking that the resource is something they can modify. Generally users know not to mess with the status, but it is common practice that the spec is a user defined desired state.

AFAIK, there is no difference in maintaining the info in the status vs the spec. If it works for you today to store things in the spec, there is no reason I can think of as to why it wouldn't work just as well moved to the status.

everettraven · 2025-03-07T20:59:19Z

bpfman-operator/apis/v1alpha1/bpf_application_state_types.go

+	// The number of times the BpfApplicationState has been updated.  Set to 1
+	// when the object is created, then it is incremented prior to each update.
+	// This allows us to verify that the API server has the updated object prior
+	// to starting a new Reconcile operation.
+	UpdateCount int64 `json:"updateCount"`


Who/What sets this to 1 when the object is created? Why do you need this to verify that the API server has the updated object? Is using the metadata.generation, which increments on every spec change insufficient?

controllers are responsible of creating this object and updating it

everettraven · 2025-03-07T21:03:13Z

bpfman-operator/apis/v1alpha1/bpf_application_state_types.go

+// +kubebuilder:printcolumn:name="Node",type=string,JSONPath=".spec.node"
+// +kubebuilder:printcolumn:name="Status",type=string,JSONPath=`.status.conditions[0].reason`
+// +kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp"
+type BpfApplicationState struct {


I realized mid-review of some of the above fields that this seems to be a separate CRD for conveying the observed state of a BpfApplication to users. Is there a reason why you are using a separate CRD for this instead of utilizing the status subresource on the BpfApplication CRD itself?

Signed-off-by: Mohamed Mahmoud <[email protected]>

everettraven · 2025-03-11T20:09:51Z

bpfman-operator/apis/v1alpha1/bpf_application_state_types.go

+type BpfApplicationProgramState struct {
+	BpfProgramStateCommon `json:",inline"`
+
+	// type specifies the bpf program type


This GoDoc is not descriptive enough for an end-user that knows nothing about this API.

What is a "bpf program type"? How do I know which one I need to use? What values can I put here? Do I have to specify one? What constraints are there when I set a particular value?

When writing the GoDoc for each field it is important to ensure you are conveying enough information to your end users. The GoDoc for each field is what is used in the generated documentation that is shown to users when using something like oc explain ... to get more information on an API. The markers are not included in the generated documentation.

More specifically, your documentation should answer the questions:

What is the purpose of this field? What does setting it allow me to achieve?

How does setting this field interact with other fields?

What are the limitations of the field?

Is the field optional or required?

If the field is optional, what happens if I don't set it?

To get you started, I would expect the GoDoc for this field to read something like:

// type is a required field used to specify the type of the bpf program. // The type dictates where in the kernel the program is executed. // The Linux kernel may restrict or allow certain functionality depending on the program type. // // Allowed values are XDP, TC, TCX, Uprobe, and UretProbe. // // When set to XDP, the program can attach to network devices and will be called on every incoming packet // received by the network device. When using the XDP program type, the xdp field is required. // // When set to TC .... // ...

In general, these GoDoc guidelines apply to every field.

I get the spirit of the request and I'll make a pass beefing up the descriptions. But I don't think we can or should give a full eBPF primer in the descriptions. Is it worth providing links to further reading in the description?

I would accept links to further reading as long as there is a decent base description. I agree that you shouldn't give a full eBPF primer in the descriptions, but you should give at least a basic description of what each allowed value maps to. Think about the familiarity of your user base and include relevant information to answer the questions I shared above.

@everettraven I am in the process of going through all the descriptions for all the attributes and beefing up the content as you described above. In your example, you have the attribute type as all lower-case. When I display the CRD, the Description is under the attribute name, which is has the first letter of each word capitalized. Using my sample below, would you want attribute names in the Description to be interfaceSelector as shown below, or like the attribute name Interface Selector? I would think the latter. Is there a better way to display the content where it makes more sense to go with interfaceSelector?

$ kubectl describe crd clusterbpfapplications : Interface Selector: Description: interfaceSelector to determine the network interface (or interfaces) Max Properties: 1 Min Properties: 1 Properties: :

I found the answer: oc explain ...
I also found dev-guide/api-conventions.md and I'm reading through it.

Your example and the examples in the api-conventions.md have each new sentence on a different line and long sentences wrapping. In my terminal, the output seems to wrap at 80 characters and new lines in the comments are honored (see below). Is that just a setting in my terminal (looked through my terminal settings and don't see it) so making sure we don't need to manage line spacing in the GoDoc comments.

$ kubectl explain clusterbpfapplication.spec.programs GROUP: bpfman.io KIND: ClusterBpfApplication VERSION: v1alpha1 FIELD: programs <[]Object> DESCRIPTION: programs is the list of eBPF programs in the ClusterBpfApplication that should be loaded in kernel memory. eBPF programs in this list will be loaded on the system based the nodeSelector. Even if eBPF program is loaded in kernel memory, it cannot be triggered until an attachment point is provided. :

Generally we should try to respect the 80 characters, but this isn't something we have strictly enforced AFAIK. If you can put each sentence on a new line that is generally preferred. My example was just a general example, I wasn't expecting it to be taken literally.

@everettraven I have a bpfman-operator PR that made a pass through the ClusterBpfApplication CRD and updated all the field descriptions (see PR#419). I also cut and pasted kubectl explain for each of the fields to help visualize the changes in a gist:
https://gist.github.com/Billy99/b871e60f04944d4b03c9c0106d2c8a43

I am getting ready to propagate my description changes to the other CRDs so hoping you could take a look and see if this what you had in mind before I propagate.

everettraven · 2025-03-11T20:13:17Z

bpfman-operator/apis/v1alpha1/bpf_application_types.go

+	// program
+	Name string `json:"name"`
+
+	// type specifies the bpf program type


My prior comment on the BpfApplicationProgramState type field applies here.

bpfman-operator/apis/v1alpha1/bpf_application_types.go

everettraven · 2025-03-11T20:18:36Z

bpfman-operator/apis/v1alpha1/bpf_application_types.go

+	// loaded. The application can selectively choose which program(s) to run
+	// from this list based on the optional attach points provided.
+	// +kubebuilder:validation:MinItems:=1
+	Programs []BpfApplicationProgram `json:"programs,omitempty"`


Is this meant to be an optional field? What happens if I create a BpfApplication and specify no programs?

Generally for lists we recommend having an upper bound to the number of entries. Is there a reasonable upper bound that can be specified here?

Additionally, what other constraints exist here? Can I have multiple program entries with the same .name value? If not there should be an admission time check that prevents this, likely using a CEL expression. If yes, are there any constraints when specifying it multiple times?

The purpose of a BpfApplication is to ask bpfman to load and possibly attach eBpf programs. I don't see a use case for a BpfApplication CRD with no programs identified. Regarding the bound, I don't know what an acceptable bound would be. The same goes for links.

In that case, this should probably be a required field. For an upper bound, how many programs would you generally expect a user to specify? What number would surprise you as being a really high number?

Agreed, it should be required. We'll fix that.

I didn't want to arbitrarily limit it. However, we'll try to come up with some upper bounds on what's reasonable.

You can always increase it in the future if needed. You can never decrease it though.

If you truly can't enforce a reasonable upper bound then that is fine, but usually there is a reasonable one that can be set

everettraven · 2025-03-11T20:20:15Z

bpfman-operator/apis/v1alpha1/bpf_application_types.go

+	Spec   BpfApplicationSpec `json:"spec,omitempty"`
+	Status BpfAppStatus       `json:"status,omitempty"`


Both these fields should have godoc and either a +required or +optional marker

everettraven · 2025-03-11T20:20:39Z

bpfman-operator/apis/v1alpha1/bpf_application_types.go

+type BpfApplicationList struct {
+	metav1.TypeMeta `json:",inline"`
+	metav1.ListMeta `json:"metadata,omitempty"`
+	Items           []BpfApplication `json:"items"`


Field should have godoc and either a +required or +optional marker

everettraven · 2025-03-11T20:22:25Z

bpfman-operator/apis/v1alpha1/cluster_bpf_application_state_types.go

+// BpfApplicationSpec defines the desired state of BpfApplication
+type ClBpfApplicationStateSpec struct {
+	// node is the name of the node for this BpfApplicationStateSpec.
+	Node string `json:"node"`


Can the node name be any arbitrary string value? What validations can you put in place to ensure that invalid values are rejected at admission time?

this auto field with the cluster nodes name again state object is not customer facing this more of the state of the object on specific node

bpfman-operator/apis/v1alpha1/cluster_bpf_application_types.go

bpfman-operator/apis/v1alpha1/cluster_fentry_program_types.go

Signed-off-by: Mohamed Mahmoud <[email protected]>

everettraven · 2025-03-12T11:46:29Z

bpfman-operator/apis/v1alpha1/cluster_bpf_application_types.go

+	// loaded. The application can selectively choose which program(s) to run
+	// from this list based on the optional attach points provided.
+	// +kubebuilder:validation:MinItems:=1
+	Programs []ClBpfApplicationProgram `json:"programs,omitempty"`


Same questions as above here

bpfman-operator/apis/v1alpha1/cluster_fentry_program_types.go

everettraven · 2025-03-12T11:58:53Z

bpfman-operator/apis/v1alpha1/cluster_fentry_program_types.go

+	// +optional
+	// +kubebuilder:validation:MaxItems=1
+	// +kubebuilder:default:={}
+	Links []ClFentryAttachInfo `json:"links"`


Does this need to be a list? Will it ever be possible to specify multiple "links" for a FentryProgram? If I don't specify it, what is the default behavior?

Fentry and Fexit programs can only ever have one link. However, I made this a slice also so that the link syntax and semantics were consistent with the other program types. In this case, you can have a max of one link on the list.

For all program types, if a link does not exist in the list, it should not be attached, and if it was attached previously, but is no longer on the list, it will be detached. The same is true for Fentry/Fexit.

Gotcha. This should be explained in the documentation on the field so that users understand this limitation as well.

I would also recommend not making this a slice if it can never have more than a single link to make it clear to a user instead of them thinking that they can specify multiple due to it being a yaml list.

For example, when a user sees:

links: - ...

they may think they can add more values here and then encounter an admission failure.

Instead, if they saw:

link: field: value

it would be clear that this type only allows a single link. No workflow happens where a user attempts to add another link and encounters an admission failure.

Ultimately it is up to you if you would like to take this suggestion or not.

I initially implemented it the way you suggested. However, when we looked at it holistically, we thought it made more sense for the Fentry/Fexit "links" fields to be a slice just like all the other programs even though in these two cases, it can only hold one link. As a side effect, it helps make the controller code more common between program types. We'd prefer to keep it this way, but we can improve the comments to make it more clear.

bpfman-operator/apis/v1alpha1/cluster_fentry_program_types.go

bpfman-operator/apis/v1alpha1/cluster_bpf_application_types.go

bpfman-operator/apis/v1alpha1/cluster_kprobe_program_types.go

everettraven · 2025-03-12T12:14:13Z

bpfman-operator/apis/v1alpha1/cluster_kprobe_program_types.go

+	// are optional and may be udated after the bpf program has been loaded
+	// +optional
+	// +kubebuilder:default:={}
+	Links []ClKprobeAttachInfo `json:"links"`


Any reasonable upper bound to this? What happens if I don't specify any links?

no errors but on the same time nothing, I don't think we have way to define upper bound for this it will depends on the kernel how much it can handle b4 it complains

Is there any reason to specify this program type without providing at least one link? Re: upper bounds, even if it depends on the kernel is there a reasonable value you can specify here? Would it make sense for a user to be able to specify 10,000 links? 100,000?

While programs may be attached based on configuration, it’s often important for all eBPF programs used by a BPF-based application (like netobserv) to be loaded at the same time. This ensures they can share a common set of global variables and maps. In this use case, all programs are loaded together, but attachments occur selectively based on the application’s configuration.

I'm not familiar with eBPF, so I'm having trouble parsing how this relates to the questions I've asked in this thread.

Is there a number of links that you think would be fairly common for users to specify? What number would make you think that it is a really high number of links?

Also, can I duplicate links or do they need to be unique?

My answer was in response to why it makes sense to load a program without any links - they may not be needed initially, but may added/modified later based on configuration or other events.

The links need to be unique.

As with programs, we'll come up with an upper bound.

everettraven · 2025-03-12T12:15:07Z

bpfman-operator/apis/v1alpha1/cluster_tc_program_types.go

+	// are optional and may be udated after the bpf program has been loaded
+	// +optional
+	// +kubebuilder:default:={}
+	Links []ClTcAttachInfo `json:"links"`


Same questions as on other program type links.

bpfman-operator/apis/v1alpha1/cluster_tc_program_types.go

everettraven · 2025-03-12T12:40:38Z

bpfman-operator/apis/v1alpha1/cluster_tc_program_types.go

+
+	// Optional container pid to attach the tc program in.
+	// +optional
+	ContainerPid *int32 `json:"containerPid"`


What does it mean if there is no container pid specified?

bpfman-operator/apis/v1alpha1/cluster_tracepoint_program_types.go

everettraven · 2025-03-12T12:44:08Z

bpfman-operator/apis/v1alpha1/cluster_uprobe_program_types.go

+	// offset added to the address of the function for uprobe.
+	// +optional
+	// +kubebuilder:default:=0
+	Offset uint64 `json:"offset"`


is there a reasonable upper bound you can set here?

The combination of target, function, and offset is used to define the location of an instruction at which the user wants to invoke the uProbe program. So, the max offset would be the size of the largest allowable executable or library. The theoretical limit would depend on a number of factors including things like file format, OS, and file system. In any case, it would be pretty big, and I don't think it would be very helpful because in most cases it would need to be much smaller in practice. If the user provides an offset that is out of bounds for the given target, the eBPF verifier will return an error.

How would that error manifest to users?

The BpfApplication CRD would have Status == "Error", and it would list which BpfApplicationState CRD was in error.

The BpfApplicationState CRD in question would have Status == "Error"

The failed link would have LinkStatus == "AttachError" in the state CRD

What is the impact to user workloads in this state?

Other links that don't have errors will still be attached. It's up to the user application to decide what to do when this happens. Then, they may also need to figure out why what they are trying to do doesn't work.

eBPF is complicated, and the kernel isn't very helpful in telling you what's wrong when there's an error. So, we will also log what we can about the error, but often it's just EINVAL and the eBPF application developer would need to use their knowledge of eBPF and the Linux kernel to figure out what's wrong.

If you have a failed attachment, does the container it would have been attached to still function correctly?

I want to make sure I'm understanding correctly that outside of the ebpf program itself going into an error state, that a misconfiguration in the offset doesn't have any negative side effects to the actual workload.

If a misconfiguration here would cause workloads to be impacted we should generally try to prevent that configuration from attempting to be rolled out where we can. That might mean admission time validation like setting an arbitrary but generally safe maximum, or doing some runtime evaluations that fail fast prior to attempting to roll out the configuration.

To clarify, by "workload" i'm meaning the non-ebpf applications that are running on a cluster via things like Deployments

Got it. No, a failed attachment would not affect the container/workload.

We check what we can, but the loads and link attachments end up as a kernel syscall. The kernel then validates everything to make sure it's allowed and safe (e.g., https://docs.kernel.org/bpf/verifier.html). If a load or attach isn't valid, the kernel doesn't load or attach it and returns an error.

That said, I'll think about whether there's a reasonable limit we can use that's less than U64_MAX.

everettraven · 2025-03-12T12:44:37Z

bpfman-operator/apis/v1alpha1/cluster_uprobe_program_types.go

+	Offset uint64 `json:"offset"`
+
+	// target is the Library name or the absolute path to a binary or library.
+	Target string `json:"target"`


Any validations you can put in place to ensure a valid value is provided by a user at admission time?

bpfman-operator/apis/v1alpha1/cluster_xdp_program_types.go

everettraven · 2025-03-12T12:46:43Z

bpfman-operator/apis/v1alpha1/cluster_tc_program_types.go

+	// +optional
+	// +kubebuilder:validation:MaxItems=11
+	// +kubebuilder:default:={pipe,dispatcher_return}
+	ProceedOn []TcProceedOnValue `json:"proceedOn"`


Does each entry need to be unique?

u mean we don't duplicate entries in the above list ? if so yes

I think I can use +kubebuilder:validation:UniqueItems:=true to gurantee that will add it

oh that didn't go well

spec.validation.openAPIV3Schema.properties[spec].properties[programs].items.properties[tc].properties[links].items.properties[proceedOn].uniqueItems: Forbidden: uniqueItems cannot be set to true since the runtime complexity becomes quadratic]

any other alternatives ?

tried different options as discussed over slack with no luck
https://redhat-internal.slack.com/archives/CE4L0F143/p1741298311519169

I think the error you are seeing related to default may be because it is thinking you mean to literally set the string "Pipe,DispatcherReturn" instead of a slice containing 2 entries.

I think the default tag you are looking for is +kubebuilder:default:=[Pipe,DispatcherReturn] (it might need the quotations for the string values, not sure)

this is my default // +kubebuilder:default:={Pipe,DispatcherReturn} and its working fine with the [] initializaton

bpfman-operator/apis/v1alpha1/cluster_xdp_program_types.go

Signed-off-by: Mohamed Mahmoud <[email protected]>

…ess more comments Signed-off-by: Mohamed Mahmoud <[email protected]>

Signed-off-by: Mohamed Mahmoud <[email protected]>

everettraven · 2025-03-14T12:04:29Z

bpfman-operator/apis/v1alpha1/bpf_application_types.go

+type BpfApplicationProgram struct {
+	// name is the name of the function that is the entry point for the BPF
+	// program
+	// +kubebuilder:validation:Pattern="^[a-zA-Z][a-zA-Z0-9_]+."


General advice is to use CEL expressions to enforce regex matching now so that you can provide a more user-friendly message than returning a regex pattern to them on failed validations.

can u pls share more info on how to do so with cel ?

Something like

api/insights/v1alpha1/types_insights.go

Line 322 in 8a7efbf

// +kubebuilder:validation:XValidation:rule=`self.matches("^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")`,message=`resource must consist of only lowercase alphanumeric characters and hyphens`

bpfman-operator/apis/v1alpha1/cluster_bpf_application_state_types.go

Signed-off-by: Mohamed Mahmoud <[email protected]>

openshift-ci · 2025-03-20T19:46:04Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: msherif1234
Once this PR has been reviewed and has the lgtm label, please assign joelspeed for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Mohamed Mahmoud <[email protected]>

openshift-ci · 2025-04-11T21:59:06Z

@msherif1234: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/verify-deps	`e0efe84`	link	true	`/test verify-deps`
ci/prow/unit	`e0efe84`	link	true	`/test unit`
ci/prow/verify	`e0efe84`	link	true	`/test verify`
ci/prow/unit	`e0efe84`	link	true	`/test unit`
ci/prow/verify-deps	`e0efe84`	link	true	`/test verify-deps`
ci/prow/verify-client-go	`e0efe84`	link	true	`/test verify-client-go`
ci/prow/verify	`e0efe84`	link	true	`/test verify`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

JoelSpeed

This API is huge!

I think we might want to tackle this in smaller pieces, and focus on one area, iterate on that, and apply the lessons learned then to other areas of the API rather than trying to review and correct the whole thing in one go.

I'd suggest that we also make sure to fix the linter issues. The linters enforce standards that help the API to become complete and have the validations that we would want to see for a comprehensive API.

The more you can self help on fixing the linter problems, the easier it becomes for us to tackle the trickier problems of API review.

For example, adding a +optional or +required as per your requirements to every field would then help us working out what does and doesn't serialize well at the moment which the current structures, and where we might need additional validations

JoelSpeed · 2025-04-24T16:12:44Z

bpfman-operator/apis/v1alpha1/bpf_application_state_types.go

+// +kubebuilder:validation:XValidation:rule="has(self.type) && self.type == 'UProbe' ?  has(self.uprobe) : !has(self.uprobe)",message="uprobe configuration is required when type is uprobe, and forbidden otherwise"
+// +kubebuilder:validation:XValidation:rule="has(self.type) && self.type == 'URetProbe' ?  has(self.uretprobe) : !has(self.uretprobe)",message="uretprobe configuration is required when type is uretprobe, and forbidden otherwise"
+type BpfApplicationProgramState struct {
+	BpfProgramStateCommon `json:",inline"`


Generally we recommend to not add other fields to unions, and have just the discriminator and the members.

It's been found to be confusing in the past when we have mixed members and non members

JoelSpeed · 2025-04-24T16:14:16Z

bpfman-operator/apis/v1alpha1/bpf_application_state_types.go

+	// programs is a list of bpf programs contained in the parent application.
+	// It is a map from the bpf program name to BpfApplicationProgramState
+	// elements.
+	Programs []BpfApplicationProgramState `json:"programs,omitempty"`


Missing SSATags, needs +listType with a value of atomic or map

JoelSpeed · 2025-04-24T16:15:53Z

bpfman-operator/apis/v1alpha1/cluster_bpf_application_state_types.go

+	// kretprobe defines the desired state of the application's KprobePrograms.
+	// +unionMember
+	// +optional
+	KRetProbe *ClKretprobeProgramInfoState `json:"kretprobe,omitempty"`


Why not kRetProbe for the json value?

JoelSpeed · 2025-04-24T16:18:04Z

bpfman-operator/apis/v1alpha1/cluster_fentry_program_types.go

+	// Whether the program should be attached to the function.
+	// +optional
+	// +kubebuilder:validation:MaxItems=1
+	// +kubebuilder:default:={}


This default is not correct, you are defaulting to the empty object, but this is a list

Why would you want to default to the empty object?

Given you don't have omitempty, defaults have no effect anyway for Go clients

JoelSpeed · 2025-04-24T16:19:12Z

bpfman-operator/apis/v1alpha1/cluster_fexit_program_types.go

+
+// ClFexitProgramInfo defines the Fexit program details
+type ClFexitProgramInfo struct {
+	ClFexitLoadInfo `json:",inline"`


Why are you creating so many inlines? These are very uncommon in Kube APIs, what's the motivation?

JoelSpeed · 2025-04-24T16:20:14Z

bpfman-operator/apis/v1alpha1/cluster_kprobe_program_types.go

+
+	// offset added to the address of the function for kprobe.
+	// The offset must be zero for kretprobes.
+	// TODO: Add a webhook to enforce kretprobe offset=0.


You can likely achieve this validation with CEL validations

This commit includes changes based on multiple review comments. Signed-off-by: Andre Fredette <[email protected]>

anfredette · 2025-04-30T22:25:10Z

@everettraven, Mohamed has left Red Hat, and we still have work to do on this PR. Can someone give other members of the bpfman team permission to update this pr? Thanks, Andre

JoelSpeed · 2025-05-01T08:41:33Z

@anfredette Since the PR came from Mohamed's own repository, he is the only one who can give that access. You could either contact him and ask him to make you a collaborator on his fork of the repo, or you'll have to copy the branch and make a new PR

anfredette · 2025-05-01T11:59:19Z

I’ll see if I can get in contact with Mohamed. Thanks, Andre

…

On Thu, May 1, 2025 at 4:42 AM Joel Speed ***@***.***> wrote: *JoelSpeed* left a comment (openshift/api#2221) <#2221 (comment)> @anfredette <https://github.com/anfredette> Since the PR came from Mohamed's own repository, he is the only one who can give that access. You could either contact him and ask him to make you a collaborator on his fork of the repo, or you'll have to copy the branch and make a new PR — Reply to this email directly, view it on GitHub <#2221 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAHY7X6MBTB5E347RG4PQR324HM5HAVCNFSM6AAAAABYP2NEL2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDQNBUGM4TANZQGE> . You are receiving this because you were mentioned.Message ID: ***@***.***>

anfredette · 2025-05-01T12:04:06Z

@msherif1234 can you add me and @frobware as collaborators on this pr?

msherif1234 · 2025-05-09T09:25:59Z

Done

WIP:DNM: Bpfman operator APIs review

f107af4

Signed-off-by: Mohamed Mahmoud <[email protected]>

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 6, 2025

openshift-ci bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Mar 6, 2025

openshift-ci bot requested review from JoelSpeed and knobunc March 6, 2025 22:30

everettraven reviewed Mar 7, 2025

View reviewed changes

Initial attempt to address userDoc, enum fmt

78534fc

Signed-off-by: Mohamed Mahmoud <[email protected]>

everettraven reviewed Mar 11, 2025

View reviewed changes

bpfman-operator/apis/v1alpha1/bpf_application_types.go Show resolved Hide resolved

everettraven reviewed Mar 11, 2025

View reviewed changes

bpfman-operator/apis/v1alpha1/cluster_bpf_application_types.go Outdated Show resolved Hide resolved

everettraven reviewed Mar 11, 2025

View reviewed changes

bpfman-operator/apis/v1alpha1/cluster_fentry_program_types.go Show resolved Hide resolved

everettraven reviewed Mar 11, 2025

View reviewed changes

bpfman-operator/apis/v1alpha1/cluster_fentry_program_types.go Outdated Show resolved Hide resolved

use enum for fentry and fexit attach

89fb437

Signed-off-by: Mohamed Mahmoud <[email protected]>