Skip to content

WIP:DNM: Bpfman operator APIs review #2221

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 15 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
135 changes: 135 additions & 0 deletions bpfman-operator/apis/v1alpha1/bpf_application_state_types.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,135 @@
/*
Copyright 2023 The bpfman Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package v1alpha1

import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
metav1types "k8s.io/apimachinery/pkg/types"
"sigs.k8s.io/controller-runtime/pkg/client"
)

// BpfApplicationProgramState defines the desired state of BpfApplication
// +union
// +kubebuilder:validation:XValidation:rule="has(self.type) && self.type == 'xdp' ? has(self.xdpInfo) : !has(self.xdpInfo)",message="xdpInfo configuration is required when type is xdp, and forbidden otherwise"
// +kubebuilder:validation:XValidation:rule="has(self.type) && self.type == 'tc' ? has(self.tcInfo) : !has(self.tcInfo)",message="tcInfo configuration is required when type is tc, and forbidden otherwise"
// +kubebuilder:validation:XValidation:rule="has(self.type) && self.type == 'tcx' ? has(self.tcxInfo) : !has(self.tcxInfo)",message="tcx configuration is required when type is TCtcxX, and forbidden otherwise"
// +kubebuilder:validation:XValidation:rule="has(self.type) && self.type == 'uprobe' ? has(self.uprobeInfo) : !has(self.uprobeInfo)",message="uprobe configuration is required when type is uprobe, and forbidden otherwise"
type BpfApplicationProgramState struct {
BpfProgramStateCommon `json:",inline"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally we recommend to not add other fields to unions, and have just the discriminator and the members.

It's been found to be confusing in the past when we have mixed members and non members


// Type specifies the bpf program type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple things here:

In OpenShift we generally recommend using the serialized name in the GoDoc because it matches a bit better with the name format that users of something like kubectl explain ... will see. Kubernetes prefers using the field name. Because this is an upstream project, we can't necessarily hold you strictly to the OpenShift conventions, but I would highly recommend deciding which one you'd like to go with and stay consistent.

This GoDoc is also not very helpful for users of tooling, like kubectl explain ..., that display the documentation generated from the GoDoc (which does not include the validation tags). I would recommend following the guidance outlined in https://github.com/openshift/enhancements/blob/master/dev-guide/api-conventions.md#write-user-readable-documentation-in-godoc to come up with a more useful description of the field for users.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a brief glance, the second part of ^ applies to most, if not all, of the GoDoc across your APIs.

// +unionDiscriminator
// +kubebuilder:validation:Required
// +kubebuilder:validation:Enum:="xdp";"tc";"tcx";"uprobe";"uretprobe"
Type EBPFProgType `json:"type,omitempty"`

// xdp defines the desired state of the application's XdpPrograms.
// +unionMember
// +optional
XDPInfo *XdpProgramInfoState `json:"xdpInfo,omitempty"`

// tc defines the desired state of the application's TcPrograms.
// +unionMember
// +optional
TCInfo *TcProgramInfoState `json:"tcInfo,omitempty"`

// tcx defines the desired state of the application's TcxPrograms.
// +unionMember
// +optional
TCXInfo *TcxProgramInfoState `json:"tcxInfo,omitempty"`

// uprobe defines the desired state of the application's UprobePrograms.
// +unionMember
// +optional
UprobeInfo *UprobeProgramInfoState `json:"uprobeInfo,omitempty"`

// uretprobe defines the desired state of the application's UretprobePrograms.
// +unionMember
// +optional
UretprobeInfo *UprobeProgramInfoState `json:"uretprobeInfo,omitempty"`
}

// BpfApplicationSpec defines the desired state of BpfApplication
type BpfApplicationStateSpec struct {
// Node is the name of the node for this BpfApplicationStateSpec.
Node string `json:"node"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the "node" in this context? Is it a Kubernetes Node? Are there any validations you can enforce to ensure a user cannot supply an invalid node name?

Also, this looks like it needs the +required marker as it appears to be a required field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Node is the nodeName this CRD is not something the customers will be using to configure the feature, this object reflect the state of the application on the node that this instance is running on. the objects that are customer facing are in bpf_application_type.go for namespaces resource and in cluster_bpf_application_type.go for cluster scope resource

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you using an entire CRD to communicate state of resources instead of the status subresource of your existing CRD types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

status won't scale with large cluster so idea was to have nodeState object that reflect per prog type status what happened and overall/ node

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to expand on that a little...

  1. A single BpfApplication object can create a lot of state! For example, it might load N eBPF probe(s) into every pod on every node.
  2. When reconciling a BpfApplicationCRD things get complicated since there is one controller running on every node. Each controller is watching for BpfApplicationCRD changes. As every instance of the controller completes its reconciliation it would write back state to that BpfApplicationCRD, which then triggers another update... creating a bit of an "update storm".
  3. Aside from ☝️ it wasn't clear how to cleanly layout the State within an BpfApplicationCRD. Could it be map[string]State where the key is the k8s node name where the agent is running?

aside: I would be very happy to not have *StateCRDs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from ☝️ it wasn't clear how to cleanly layout the State within an BpfApplicationCRD. Could it be map[string]State where the key is the k8s node name where the agent is running?

That is one way. You could use server-side apply as well to allow for multiple field managers.

Not saying this is better, but another way I could think of is that you have the BpfApplication CRD be a higher level configuration option for users to specify the configuration across multiple nodes. The BpfApplication controller could then create NodeBpfApplication resources that are managed by the controllers running on the nodes. The NodeBpfApplication would specify a single node so that the controllers running on each node can identify which ones they are responsible for reconciling.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about this some more. If you are going to continue with this direction of separate state CRDs, you probably should not include a spec. Instead, have an empty spec and only ever populate the status with information for a user.

This would help prevent confusion from the user side that there are configurable options on this CRD.

Copy link

@anfredette anfredette Mar 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are mostly replicating the declarative configuration they have specified...

Actually, we are mostly not replicating the declarative config. The user specifies the high-level intent in the BpfApplication CRD, and then the bpfman agent on each node applies that intent to the given node. For example, we have an option for the user to indicate that a program should be installed on the primary node interface. The agent running on the node figures out what that interface is, and reflects the name (e.g., eth0) in the state CRD. Or if there's a container selector, the agent finds the containers that match. Also, we need to maintain the program IDs and link IDs so that we can unload/remove them when needed. All these items are likely different on each node.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding your comment about putting the per node info in the status field, we just want to confirm. Are recommending that we move all of the info currently in the Spec section to the Status section? Are there any downsides to maintaining a large amount of info in the Status section?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I would recommend that anything that is capturing state to convey to users goes into the status subresource. It would help prevent users from thinking that the resource is something they can modify. Generally users know not to mess with the status, but it is common practice that the spec is a user defined desired state.

AFAIK, there is no difference in maintaining the info in the status vs the spec. If it works for you today to store things in the spec, there is no reason I can think of as to why it wouldn't work just as well moved to the status.

// The number of times the BpfApplicationState has been updated. Set to 1
// when the object is created, then it is incremented prior to each update.
// This allows us to verify that the API server has the updated object prior
// to starting a new Reconcile operation.
UpdateCount int64 `json:"updateCount"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who/What sets this to 1 when the object is created? Why do you need this to verify that the API server has the updated object? Is using the metadata.generation, which increments on every spec change insufficient?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

controllers are responsible of creating this object and updating it

// AppLoadStatus reflects the status of loading the bpf application on the
// given node.
AppLoadStatus AppLoadStatus `json:"appLoadStatus"`
// Programs is a list of bpf programs contained in the parent application.
// It is a map from the bpf program name to BpfApplicationProgramState
// elements.
Programs []BpfApplicationProgramState `json:"programs,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing SSATags, needs +listType with a value of atomic or map

}

// +genclient
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:resource:scope=Namespaced

// BpfApplicationState contains the per-node state of a BpfApplication.
// +kubebuilder:printcolumn:name="Node",type=string,JSONPath=".spec.node"
// +kubebuilder:printcolumn:name="Status",type=string,JSONPath=`.status.conditions[0].reason`
// +kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp"
type BpfApplicationState struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized mid-review of some of the above fields that this seems to be a separate CRD for conveying the observed state of a BpfApplication to users. Is there a reason why you are using a separate CRD for this instead of utilizing the status subresource on the BpfApplication CRD itself?

metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`

Spec BpfApplicationStateSpec `json:"spec,omitempty"`
Status BpfAppStatus `json:"status,omitempty"`
}

// +kubebuilder:object:root=true
// BpfApplicationStateList contains a list of BpfApplicationState objects
type BpfApplicationStateList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []BpfApplicationState `json:"items"`
}

func (an BpfApplicationState) GetName() string {
return an.Name
}

func (an BpfApplicationState) GetUID() metav1types.UID {
return an.UID
}

func (an BpfApplicationState) GetAnnotations() map[string]string {
return an.Annotations
}

func (an BpfApplicationState) GetLabels() map[string]string {
return an.Labels
}

func (an BpfApplicationState) GetStatus() *BpfAppStatus {
return &an.Status
}

func (an BpfApplicationState) GetClientObject() client.Object {
return &an
}

func (anl BpfApplicationStateList) GetItems() []BpfApplicationState {
return anl.Items
}
100 changes: 100 additions & 0 deletions bpfman-operator/apis/v1alpha1/bpf_application_types.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,100 @@
/*
Copyright 2024 The bpfman Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package v1alpha1

import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// BpfApplicationProgram defines the desired state of BpfApplication
// +union
// +kubebuilder:validation:XValidation:rule="has(self.type) && self.type == 'xdp' ? has(self.xdpInfo) : !has(self.xdpInfo)",message="xdpInfo configuration is required when type is xdp, and forbidden otherwise"
// +kubebuilder:validation:XValidation:rule="has(self.type) && self.type == 'tc' ? has(self.tcInfo) : !has(self.tcInfo)",message="tcInfo configuration is required when type is tc, and forbidden otherwise"
// +kubebuilder:validation:XValidation:rule="has(self.type) && self.type == 'tcx' ? has(self.tcxInfo) : !has(self.tcxInfo)",message="tcxInfo configuration is required when type is tcx, and forbidden otherwise"
// +kubebuilder:validation:XValidation:rule="has(self.type) && self.type == 'uprobe' ? has(self.uprobeInfo) : !has(self.uprobeInfo)",message="uprobeInfo configuration is required when type is uprobe, and forbidden otherwise"
type BpfApplicationProgram struct {
// Name is the name of the function that is the entry point for the BPF
// program
Name string `json:"name"`

// Type specifies the bpf program type
// +unionDiscriminator
// +kubebuilder:validation:Required
// +kubebuilder:validation:Enum:="xdp";"tc";"tcx";"uprobe";"uretprobe"
Type EBPFProgType `json:"type,omitempty"`

// xdp defines the desired state of the application's XdpPrograms.
// +unionMember
// +optional
XDPInfo *XdpProgramInfo `json:"xdpInfo,omitempty"`

// tc defines the desired state of the application's TcPrograms.
// +unionMember
// +optional
TCInfo *TcProgramInfo `json:"tcInfo,omitempty"`

// tcx defines the desired state of the application's TcxPrograms.
// +unionMember
// +optional
TCXInfo *TcxProgramInfo `json:"tcxInfo,omitempty"`

// uprobe defines the desired state of the application's UprobePrograms.
// +unionMember
// +optional
UprobeInfo *UprobeProgramInfo `json:"uprobeInfo,omitempty"`

// uretprobe defines the desired state of the application's UretprobePrograms.
// +unionMember
// +optional
UretprobeInfo *UprobeProgramInfo `json:"uretprobeInfo,omitempty"`
}

// BpfApplicationSpec defines the desired state of BpfApplication
type BpfApplicationSpec struct {
BpfAppCommon `json:",inline"`

// Programs is the list of bpf programs in the BpfApplication that should be
// loaded. The application can selectively choose which program(s) to run
// from this list based on the optional attach points provided.
// +kubebuilder:validation:MinItems:=1
Programs []BpfApplicationProgram `json:"programs,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this meant to be an optional field? What happens if I create a BpfApplication and specify no programs?

Copy link
Contributor

@everettraven everettraven Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally for lists we recommend having an upper bound to the number of entries. Is there a reasonable upper bound that can be specified here?

Additionally, what other constraints exist here? Can I have multiple program entries with the same .name value? If not there should be an admission time check that prevents this, likely using a CEL expression. If yes, are there any constraints when specifying it multiple times?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of a BpfApplication is to ask bpfman to load and possibly attach eBpf programs. I don't see a use case for a BpfApplication CRD with no programs identified. Regarding the bound, I don't know what an acceptable bound would be. The same goes for links.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, this should probably be a required field. For an upper bound, how many programs would you generally expect a user to specify? What number would surprise you as being a really high number?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, it should be required. We'll fix that.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to arbitrarily limit it. However, we'll try to come up with some upper bounds on what's reasonable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can always increase it in the future if needed. You can never decrease it though.

If you truly can't enforce a reasonable upper bound then that is fine, but usually there is a reasonable one that can be set

}

// +genclient
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:resource:scope=Namespaced

// BpfApplication is the Schema for the bpfapplications API
// +kubebuilder:printcolumn:name="NodeSelector",type=string,JSONPath=`.spec.nodeselector`
// +kubebuilder:printcolumn:name="Status",type=string,JSONPath=`.status.conditions[0].reason`
// +kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp"
type BpfApplication struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`

Spec BpfApplicationSpec `json:"spec,omitempty"`
Status BpfAppStatus `json:"status,omitempty"`
Comment on lines +191 to +192
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both these fields should have godoc and either a +required or +optional marker

}

// +kubebuilder:object:root=true
// BpfApplicationList contains a list of BpfApplications
type BpfApplicationList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []BpfApplication `json:"items"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Field should have godoc and either a +required or +optional marker

}
Loading