Skip to content

WIP:DNM: Bpfman operator APIs review #2221

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 15 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 11 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
136 changes: 136 additions & 0 deletions bpfman-operator/apis/v1alpha1/bpf_application_state_types.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,136 @@
/*
Copyright 2023 The bpfman Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package v1alpha1

import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
metav1types "k8s.io/apimachinery/pkg/types"
"sigs.k8s.io/controller-runtime/pkg/client"
)

// BpfApplicationProgramState defines the desired state of BpfApplication
// +union
// +kubebuilder:validation:XValidation:rule="has(self.type) && self.type == 'XDP' ? has(self.xdp) : !has(self.xdp)",message="xdp configuration is required when type is xdp, and forbidden otherwise"
// +kubebuilder:validation:XValidation:rule="has(self.type) && self.type == 'TC' ? has(self.tc) : !has(self.tc)",message="tc configuration is required when type is tc, and forbidden otherwise"
// +kubebuilder:validation:XValidation:rule="has(self.type) && self.type == 'TCX' ? has(self.tcx) : !has(self.tcx)",message="tcx configuration is required when type is TCX, and forbidden otherwise"
// +kubebuilder:validation:XValidation:rule="has(self.type) && self.type == 'UProbe' ? has(self.uprobe) : !has(self.uprobe)",message="uprobe configuration is required when type is uprobe, and forbidden otherwise"
// +kubebuilder:validation:XValidation:rule="has(self.type) && self.type == 'URetProbe' ? has(self.uretprobe) : !has(self.uretprobe)",message="uretprobe configuration is required when type is uretprobe, and forbidden otherwise"
type BpfApplicationProgramState struct {
BpfProgramStateCommon `json:",inline"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally we recommend to not add other fields to unions, and have just the discriminator and the members.

It's been found to be confusing in the past when we have mixed members and non members


// type specifies the bpf program type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This GoDoc is not descriptive enough for an end-user that knows nothing about this API.

What is a "bpf program type"? How do I know which one I need to use? What values can I put here? Do I have to specify one? What constraints are there when I set a particular value?

When writing the GoDoc for each field it is important to ensure you are conveying enough information to your end users. The GoDoc for each field is what is used in the generated documentation that is shown to users when using something like oc explain ... to get more information on an API. The markers are not included in the generated documentation.

More specifically, your documentation should answer the questions:

  • What is the purpose of this field? What does setting it allow me to achieve?
  • How does setting this field interact with other fields?
  • What are the limitations of the field?
  • Is the field optional or required?
    • If the field is optional, what happens if I don't set it?

To get you started, I would expect the GoDoc for this field to read something like:

// type is a required field used to specify the type of the bpf program.
// The type dictates where in the kernel the program is executed.
// The Linux kernel may restrict or allow certain functionality depending on the program type.
// 
// Allowed values are XDP, TC, TCX, Uprobe, and UretProbe.
//
// When set to XDP, the program can attach to network devices and will be called on every incoming packet
// received by the network device. When using the XDP program type, the xdp field is required.
//
// When set to TC ....
// ...

In general, these GoDoc guidelines apply to every field.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the spirit of the request and I'll make a pass beefing up the descriptions. But I don't think we can or should give a full eBPF primer in the descriptions. Is it worth providing links to further reading in the description?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would accept links to further reading as long as there is a decent base description. I agree that you shouldn't give a full eBPF primer in the descriptions, but you should give at least a basic description of what each allowed value maps to. Think about the familiarity of your user base and include relevant information to answer the questions I shared above.

Copy link
Contributor

@Billy99 Billy99 Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@everettraven I am in the process of going through all the descriptions for all the attributes and beefing up the content as you described above. In your example, you have the attribute type as all lower-case. When I display the CRD, the Description is under the attribute name, which is has the first letter of each word capitalized. Using my sample below, would you want attribute names in the Description to be interfaceSelector as shown below, or like the attribute name Interface Selector? I would think the latter. Is there a better way to display the content where it makes more sense to go with interfaceSelector?

 $ kubectl describe crd clusterbpfapplications
 :
      Interface Selector:
          Description:     interfaceSelector to determine the network interface (or interfaces)
          Max Properties:  1
          Min Properties:  1
          Properties:
:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the answer: oc explain ...
I also found dev-guide/api-conventions.md and I'm reading through it.

Your example and the examples in the api-conventions.md have each new sentence on a different line and long sentences wrapping. In my terminal, the output seems to wrap at 80 characters and new lines in the comments are honored (see below). Is that just a setting in my terminal (looked through my terminal settings and don't see it) so making sure we don't need to manage line spacing in the GoDoc comments.

$ kubectl explain clusterbpfapplication.spec.programs
GROUP:      bpfman.io
KIND:       ClusterBpfApplication
VERSION:    v1alpha1

FIELD: programs <[]Object>


DESCRIPTION:
    programs is the list of eBPF programs in the ClusterBpfApplication that
    should be
    loaded in kernel memory.
    eBPF programs in this list will be loaded on the system based the
    nodeSelector.
    Even if eBPF program is loaded in kernel memory, it cannot be triggered
    until an
    attachment point is provided.
    :

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally we should try to respect the 80 characters, but this isn't something we have strictly enforced AFAIK. If you can put each sentence on a new line that is generally preferred. My example was just a general example, I wasn't expecting it to be taken literally.

Copy link
Contributor

@Billy99 Billy99 Mar 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@everettraven I have a bpfman-operator PR that made a pass through the ClusterBpfApplication CRD and updated all the field descriptions (see PR#419). I also cut and pasted kubectl explain for each of the fields to help visualize the changes in a gist:
https://gist.github.com/Billy99/b871e60f04944d4b03c9c0106d2c8a43

I am getting ready to propagate my description changes to the other CRDs so hoping you could take a look and see if this what you had in mind before I propagate.

// +unionDiscriminator
// +required
// +kubebuilder:validation:Enum:="XDP";"TC";"TCX";"Uprobe";"UretProbe"
Type EBPFProgType `json:"type"`

// xdp defines the desired state of the application's XdpPrograms.
// +unionMember
// +optional
XDP *XdpProgramInfoState `json:"xdp,omitempty"`

// tc defines the desired state of the application's TcPrograms.
// +unionMember
// +optional
TC *TcProgramInfoState `json:"tc,omitempty"`

// tcx defines the desired state of the application's TcxPrograms.
// +unionMember
// +optional
TCX *TcxProgramInfoState `json:"tcx,omitempty"`

// uprobe defines the desired state of the application's UprobePrograms.
// +unionMember
// +optional
UProbe *UprobeProgramInfoState `json:"uprobe,omitempty"`

// uretprobe defines the desired state of the application's UretprobePrograms.
// +unionMember
// +optional
URetProbe *UprobeProgramInfoState `json:"uretprobe,omitempty"`
}

// BpfApplicationSpec defines the desired state of BpfApplication
type BpfApplicationStateSpec struct {
// node is the name of the node for this BpfApplicationStateSpec.
Node string `json:"node"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the "node" in this context? Is it a Kubernetes Node? Are there any validations you can enforce to ensure a user cannot supply an invalid node name?

Also, this looks like it needs the +required marker as it appears to be a required field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Node is the nodeName this CRD is not something the customers will be using to configure the feature, this object reflect the state of the application on the node that this instance is running on. the objects that are customer facing are in bpf_application_type.go for namespaces resource and in cluster_bpf_application_type.go for cluster scope resource

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you using an entire CRD to communicate state of resources instead of the status subresource of your existing CRD types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

status won't scale with large cluster so idea was to have nodeState object that reflect per prog type status what happened and overall/ node

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to expand on that a little...

  1. A single BpfApplication object can create a lot of state! For example, it might load N eBPF probe(s) into every pod on every node.
  2. When reconciling a BpfApplicationCRD things get complicated since there is one controller running on every node. Each controller is watching for BpfApplicationCRD changes. As every instance of the controller completes its reconciliation it would write back state to that BpfApplicationCRD, which then triggers another update... creating a bit of an "update storm".
  3. Aside from ☝️ it wasn't clear how to cleanly layout the State within an BpfApplicationCRD. Could it be map[string]State where the key is the k8s node name where the agent is running?

aside: I would be very happy to not have *StateCRDs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from ☝️ it wasn't clear how to cleanly layout the State within an BpfApplicationCRD. Could it be map[string]State where the key is the k8s node name where the agent is running?

That is one way. You could use server-side apply as well to allow for multiple field managers.

Not saying this is better, but another way I could think of is that you have the BpfApplication CRD be a higher level configuration option for users to specify the configuration across multiple nodes. The BpfApplication controller could then create NodeBpfApplication resources that are managed by the controllers running on the nodes. The NodeBpfApplication would specify a single node so that the controllers running on each node can identify which ones they are responsible for reconciling.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about this some more. If you are going to continue with this direction of separate state CRDs, you probably should not include a spec. Instead, have an empty spec and only ever populate the status with information for a user.

This would help prevent confusion from the user side that there are configurable options on this CRD.

Copy link

@anfredette anfredette Mar 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are mostly replicating the declarative configuration they have specified...

Actually, we are mostly not replicating the declarative config. The user specifies the high-level intent in the BpfApplication CRD, and then the bpfman agent on each node applies that intent to the given node. For example, we have an option for the user to indicate that a program should be installed on the primary node interface. The agent running on the node figures out what that interface is, and reflects the name (e.g., eth0) in the state CRD. Or if there's a container selector, the agent finds the containers that match. Also, we need to maintain the program IDs and link IDs so that we can unload/remove them when needed. All these items are likely different on each node.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding your comment about putting the per node info in the status field, we just want to confirm. Are recommending that we move all of the info currently in the Spec section to the Status section? Are there any downsides to maintaining a large amount of info in the Status section?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I would recommend that anything that is capturing state to convey to users goes into the status subresource. It would help prevent users from thinking that the resource is something they can modify. Generally users know not to mess with the status, but it is common practice that the spec is a user defined desired state.

AFAIK, there is no difference in maintaining the info in the status vs the spec. If it works for you today to store things in the spec, there is no reason I can think of as to why it wouldn't work just as well moved to the status.

// updateCount is the number of times the BpfApplicationState has been updated. Set to 1
// when the object is created, then it is incremented prior to each update.
// This allows us to verify that the API server has the updated object prior
// to starting a new Reconcile operation.
UpdateCount int64 `json:"updateCount"`
// appLoadStatus reflects the status of loading the bpf application on the
// given node.
AppLoadStatus AppLoadStatus `json:"appLoadStatus"`
// programs is a list of bpf programs contained in the parent application.
// It is a map from the bpf program name to BpfApplicationProgramState
// elements.
Programs []BpfApplicationProgramState `json:"programs,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing SSATags, needs +listType with a value of atomic or map

}

// +genclient
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:resource:scope=Namespaced

// BpfApplicationState contains the per-node state of a BpfApplication.
// +kubebuilder:printcolumn:name="Node",type=string,JSONPath=".spec.node"
// +kubebuilder:printcolumn:name="Status",type=string,JSONPath=`.status.conditions[0].reason`
// +kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp"
type BpfApplicationState struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized mid-review of some of the above fields that this seems to be a separate CRD for conveying the observed state of a BpfApplication to users. Is there a reason why you are using a separate CRD for this instead of utilizing the status subresource on the BpfApplication CRD itself?

metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`

Spec BpfApplicationStateSpec `json:"spec,omitempty"`
Status BpfAppStatus `json:"status,omitempty"`
}

// +kubebuilder:object:root=true
// BpfApplicationStateList contains a list of BpfApplicationState objects
type BpfApplicationStateList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []BpfApplicationState `json:"items"`
}

func (an BpfApplicationState) GetName() string {
return an.Name
}

func (an BpfApplicationState) GetUID() metav1types.UID {
return an.UID
}

func (an BpfApplicationState) GetAnnotations() map[string]string {
return an.Annotations
}

func (an BpfApplicationState) GetLabels() map[string]string {
return an.Labels
}

func (an BpfApplicationState) GetStatus() *BpfAppStatus {
return &an.Status
}

func (an BpfApplicationState) GetClientObject() client.Object {
return &an
}

func (anl BpfApplicationStateList) GetItems() []BpfApplicationState {
return anl.Items
}
104 changes: 104 additions & 0 deletions bpfman-operator/apis/v1alpha1/bpf_application_types.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,104 @@
/*
Copyright 2024 The bpfman Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package v1alpha1

import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// BpfApplicationProgram defines the desired state of BpfApplication
// +union
// +kubebuilder:validation:XValidation:rule="has(self.type) && self.type == 'XDP' ? has(self.xdp) : !has(self.xdp)",message="xdp configuration is required when type is xdp, and forbidden otherwise"
// +kubebuilder:validation:XValidation:rule="has(self.type) && self.type == 'TC' ? has(self.tc) : !has(self.tc)",message="tc configuration is required when type is tc, and forbidden otherwise"
// +kubebuilder:validation:XValidation:rule="has(self.type) && self.type == 'TCX' ? has(self.tcx) : !has(self.tcx)",message="tcx configuration is required when type is TCX, and forbidden otherwise"
// +kubebuilder:validation:XValidation:rule="has(self.type) && self.type == 'UProbe' ? has(self.uprobe) : !has(self.uprobe)",message="uprobe configuration is required when type is uprobe, and forbidden otherwise"
// +kubebuilder:validation:XValidation:rule="has(self.type) && self.type == 'URetProbe' ? has(self.uretprobe) : !has(self.uretprobe)",message="uretprobe configuration is required when type is uretprobe, and forbidden otherwise"
type BpfApplicationProgram struct {
// name is the name of the function that is the entry point for the BPF
// program
// +kubebuilder:validation:Pattern="^[a-zA-Z][a-zA-Z0-9_]+."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General advice is to use CEL expressions to enforce regex matching now so that you can provide a more user-friendly message than returning a regex pattern to them on failed validations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can u pls share more info on how to do so with cel ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like

// +kubebuilder:validation:XValidation:rule=`self.matches("^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")`,message=`resource must consist of only lowercase alphanumeric characters and hyphens`

// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=64
Name string `json:"name"`

// type specifies the bpf program type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My prior comment on the BpfApplicationProgramState type field applies here.

// +unionDiscriminator
// +required
// +kubebuilder:validation:Enum:="XDP";"TC";"TCX";"Uprobe";"UretProbe"
Type EBPFProgType `json:"type"`

// xdp defines the desired state of the application's XdpPrograms.
// +unionMember
// +optional
XDP *XdpProgramInfo `json:"xdp,omitempty"`

// tc defines the desired state of the application's TcPrograms.
// +unionMember
// +optional
TC *TcProgramInfo `json:"tc,omitempty"`

// tcx defines the desired state of the application's TcxPrograms.
// +unionMember
// +optional
TCX *TcxProgramInfo `json:"tcx,omitempty"`

// uprobe defines the desired state of the application's UprobePrograms.
// +unionMember
// +optional
UProbe *UprobeProgramInfo `json:"uprobe,omitempty"`

// uretprobe defines the desired state of the application's UretprobePrograms.
// +unionMember
// +optional
URetProbe *UprobeProgramInfo `json:"uretprobe,omitempty"`
}

// BpfApplicationSpec defines the desired state of BpfApplication
type BpfApplicationSpec struct {
BpfAppCommon `json:",inline"`

// programs is the list of bpf programs in the BpfApplication that should be
// loaded. The application can selectively choose which program(s) to run
// from this list based on the optional attach points provided.
// +kubebuilder:validation:MinItems:=1
Programs []BpfApplicationProgram `json:"programs,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this meant to be an optional field? What happens if I create a BpfApplication and specify no programs?

Copy link
Contributor

@everettraven everettraven Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally for lists we recommend having an upper bound to the number of entries. Is there a reasonable upper bound that can be specified here?

Additionally, what other constraints exist here? Can I have multiple program entries with the same .name value? If not there should be an admission time check that prevents this, likely using a CEL expression. If yes, are there any constraints when specifying it multiple times?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of a BpfApplication is to ask bpfman to load and possibly attach eBpf programs. I don't see a use case for a BpfApplication CRD with no programs identified. Regarding the bound, I don't know what an acceptable bound would be. The same goes for links.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, this should probably be a required field. For an upper bound, how many programs would you generally expect a user to specify? What number would surprise you as being a really high number?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, it should be required. We'll fix that.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to arbitrarily limit it. However, we'll try to come up with some upper bounds on what's reasonable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can always increase it in the future if needed. You can never decrease it though.

If you truly can't enforce a reasonable upper bound then that is fine, but usually there is a reasonable one that can be set

}

// +genclient
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:resource:scope=Namespaced

// BpfApplication is the Schema for the bpfapplications API
// +kubebuilder:printcolumn:name="NodeSelector",type=string,JSONPath=`.spec.nodeselector`
// +kubebuilder:printcolumn:name="Status",type=string,JSONPath=`.status.conditions[0].reason`
// +kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp"
type BpfApplication struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`

Spec BpfApplicationSpec `json:"spec,omitempty"`
Status BpfAppStatus `json:"status,omitempty"`
Comment on lines +191 to +192
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both these fields should have godoc and either a +required or +optional marker

}

// +kubebuilder:object:root=true
// BpfApplicationList contains a list of BpfApplications
type BpfApplicationList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []BpfApplication `json:"items"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Field should have godoc and either a +required or +optional marker

}
Loading