-
Notifications
You must be signed in to change notification settings - Fork 541
WIP:DNM: Bpfman operator APIs review #2221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
WIP:DNM: Bpfman operator APIs review #2221
Conversation
Signed-off-by: Mohamed Mahmoud <[email protected]>
Hello @msherif1234! Some important instructions when contributing to openshift/api: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had intended to get further on this pass, but got around to reviewing this later than I had planned. I'll plan to circle back around to this on Monday to do a more thorough pass.
That being said, a couple of my initial comments seem to be applicable throughout your API surface. More specifically, writing user readable GoDoc. I would highly recommend looking at https://github.com/openshift/enhancements/blob/master/dev-guide/api-conventions.md#write-user-readable-documentation-in-godoc and updating your GoDoc to follow these guidelines in general.
type BpfApplicationProgramState struct { | ||
BpfProgramStateCommon `json:",inline"` | ||
|
||
// Type specifies the bpf program type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple things here:
In OpenShift we generally recommend using the serialized name in the GoDoc because it matches a bit better with the name format that users of something like kubectl explain ...
will see. Kubernetes prefers using the field name. Because this is an upstream project, we can't necessarily hold you strictly to the OpenShift conventions, but I would highly recommend deciding which one you'd like to go with and stay consistent.
This GoDoc is also not very helpful for users of tooling, like kubectl explain ...
, that display the documentation generated from the GoDoc (which does not include the validation tags). I would recommend following the guidance outlined in https://github.com/openshift/enhancements/blob/master/dev-guide/api-conventions.md#write-user-readable-documentation-in-godoc to come up with a more useful description of the field for users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At a brief glance, the second part of ^ applies to most, if not all, of the GoDoc across your APIs.
// BpfApplicationSpec defines the desired state of BpfApplication | ||
type BpfApplicationStateSpec struct { | ||
// Node is the name of the node for this BpfApplicationStateSpec. | ||
Node string `json:"node"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the "node" in this context? Is it a Kubernetes Node
? Are there any validations you can enforce to ensure a user cannot supply an invalid node name?
Also, this looks like it needs the +required
marker as it appears to be a required field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Node is the nodeName this CRD is not something the customers will be using to configure the feature, this object reflect the state of the application on the node that this instance is running on. the objects that are customer facing are in bpf_application_type.go for namespaces resource and in cluster_bpf_application_type.go for cluster scope resource
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you using an entire CRD to communicate state of resources instead of the status subresource of your existing CRD types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
status won't scale with large cluster so idea was to have nodeState object that reflect per prog type status what happened and overall/ node
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to expand on that a little...
- A single BpfApplication object can create a lot of state! For example, it might load N eBPF probe(s) into every pod on every node.
- When reconciling a BpfApplicationCRD things get complicated since there is one controller running on every node. Each controller is watching for BpfApplicationCRD changes. As every instance of the controller completes its reconciliation it would write back state to that BpfApplicationCRD, which then triggers another update... creating a bit of an "update storm".
- Aside from ☝️ it wasn't clear how to cleanly layout the State within an
BpfApplicationCRD
. Could it bemap[string]State
where the key is the k8s node name where the agent is running?
aside: I would be very happy to not have *StateCRDs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from ☝️ it wasn't clear how to cleanly layout the State within an BpfApplicationCRD. Could it be map[string]State where the key is the k8s node name where the agent is running?
That is one way. You could use server-side apply as well to allow for multiple field managers.
Not saying this is better, but another way I could think of is that you have the BpfApplication
CRD be a higher level configuration option for users to specify the configuration across multiple nodes. The BpfApplication
controller could then create NodeBpfApplication
resources that are managed by the controllers running on the nodes. The NodeBpfApplication
would specify a single node so that the controllers running on each node can identify which ones they are responsible for reconciling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about this some more. If you are going to continue with this direction of separate state CRDs, you probably should not include a spec. Instead, have an empty spec and only ever populate the status with information for a user.
This would help prevent confusion from the user side that there are configurable options on this CRD.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are mostly replicating the declarative configuration they have specified...
Actually, we are mostly not replicating the declarative config. The user specifies the high-level intent in the BpfApplication CRD, and then the bpfman agent on each node applies that intent to the given node. For example, we have an option for the user to indicate that a program should be installed on the primary node interface. The agent running on the node figures out what that interface is, and reflects the name (e.g., eth0) in the state CRD. Or if there's a container selector, the agent finds the containers that match. Also, we need to maintain the program IDs and link IDs so that we can unload/remove them when needed. All these items are likely different on each node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding your comment about putting the per node info in the status field, we just want to confirm. Are recommending that we move all of the info currently in the Spec section to the Status section? Are there any downsides to maintaining a large amount of info in the Status section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I would recommend that anything that is capturing state to convey to users goes into the status subresource. It would help prevent users from thinking that the resource is something they can modify. Generally users know not to mess with the status, but it is common practice that the spec is a user defined desired state.
AFAIK, there is no difference in maintaining the info in the status vs the spec. If it works for you today to store things in the spec, there is no reason I can think of as to why it wouldn't work just as well moved to the status.
// The number of times the BpfApplicationState has been updated. Set to 1 | ||
// when the object is created, then it is incremented prior to each update. | ||
// This allows us to verify that the API server has the updated object prior | ||
// to starting a new Reconcile operation. | ||
UpdateCount int64 `json:"updateCount"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who/What sets this to 1 when the object is created? Why do you need this to verify that the API server has the updated object? Is using the metadata.generation
, which increments on every spec
change insufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
controllers are responsible of creating this object and updating it
// +kubebuilder:printcolumn:name="Node",type=string,JSONPath=".spec.node" | ||
// +kubebuilder:printcolumn:name="Status",type=string,JSONPath=`.status.conditions[0].reason` | ||
// +kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp" | ||
type BpfApplicationState struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized mid-review of some of the above fields that this seems to be a separate CRD for conveying the observed state of a BpfApplication
to users. Is there a reason why you are using a separate CRD for this instead of utilizing the status subresource on the BpfApplication
CRD itself?
Signed-off-by: Mohamed Mahmoud <[email protected]>
type BpfApplicationProgramState struct { | ||
BpfProgramStateCommon `json:",inline"` | ||
|
||
// type specifies the bpf program type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This GoDoc is not descriptive enough for an end-user that knows nothing about this API.
What is a "bpf program type"? How do I know which one I need to use? What values can I put here? Do I have to specify one? What constraints are there when I set a particular value?
When writing the GoDoc for each field it is important to ensure you are conveying enough information to your end users. The GoDoc for each field is what is used in the generated documentation that is shown to users when using something like oc explain ...
to get more information on an API. The markers are not included in the generated documentation.
More specifically, your documentation should answer the questions:
- What is the purpose of this field? What does setting it allow me to achieve?
- How does setting this field interact with other fields?
- What are the limitations of the field?
- Is the field optional or required?
- If the field is optional, what happens if I don't set it?
To get you started, I would expect the GoDoc for this field to read something like:
// type is a required field used to specify the type of the bpf program.
// The type dictates where in the kernel the program is executed.
// The Linux kernel may restrict or allow certain functionality depending on the program type.
//
// Allowed values are XDP, TC, TCX, Uprobe, and UretProbe.
//
// When set to XDP, the program can attach to network devices and will be called on every incoming packet
// received by the network device. When using the XDP program type, the xdp field is required.
//
// When set to TC ....
// ...
In general, these GoDoc guidelines apply to every field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get the spirit of the request and I'll make a pass beefing up the descriptions. But I don't think we can or should give a full eBPF primer in the descriptions. Is it worth providing links to further reading in the description?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would accept links to further reading as long as there is a decent base description. I agree that you shouldn't give a full eBPF primer in the descriptions, but you should give at least a basic description of what each allowed value maps to. Think about the familiarity of your user base and include relevant information to answer the questions I shared above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@everettraven I am in the process of going through all the descriptions for all the attributes and beefing up the content as you described above. In your example, you have the attribute type
as all lower-case. When I display the CRD, the Description is under the attribute name, which is has the first letter of each word capitalized. Using my sample below, would you want attribute names in the Description to be interfaceSelector
as shown below, or like the attribute name Interface Selector
? I would think the latter. Is there a better way to display the content where it makes more sense to go with interfaceSelector
?
$ kubectl describe crd clusterbpfapplications
:
Interface Selector:
Description: interfaceSelector to determine the network interface (or interfaces)
Max Properties: 1
Min Properties: 1
Properties:
:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found the answer: oc explain ...
I also found dev-guide/api-conventions.md and I'm reading through it.
Your example and the examples in the api-conventions.md have each new sentence on a different line and long sentences wrapping. In my terminal, the output seems to wrap at 80 characters and new lines in the comments are honored (see below). Is that just a setting in my terminal (looked through my terminal settings and don't see it) so making sure we don't need to manage line spacing in the GoDoc comments.
$ kubectl explain clusterbpfapplication.spec.programs
GROUP: bpfman.io
KIND: ClusterBpfApplication
VERSION: v1alpha1
FIELD: programs <[]Object>
DESCRIPTION:
programs is the list of eBPF programs in the ClusterBpfApplication that
should be
loaded in kernel memory.
eBPF programs in this list will be loaded on the system based the
nodeSelector.
Even if eBPF program is loaded in kernel memory, it cannot be triggered
until an
attachment point is provided.
:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally we should try to respect the 80 characters, but this isn't something we have strictly enforced AFAIK. If you can put each sentence on a new line that is generally preferred. My example was just a general example, I wasn't expecting it to be taken literally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@everettraven I have a bpfman-operator PR that made a pass through the ClusterBpfApplication CRD and updated all the field descriptions (see PR#419). I also cut and pasted kubectl explain
for each of the fields to help visualize the changes in a gist:
https://gist.github.com/Billy99/b871e60f04944d4b03c9c0106d2c8a43
I am getting ready to propagate my description changes to the other CRDs so hoping you could take a look and see if this what you had in mind before I propagate.
// program | ||
Name string `json:"name"` | ||
|
||
// type specifies the bpf program type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My prior comment on the BpfApplicationProgramState
type
field applies here.
// loaded. The application can selectively choose which program(s) to run | ||
// from this list based on the optional attach points provided. | ||
// +kubebuilder:validation:MinItems:=1 | ||
Programs []BpfApplicationProgram `json:"programs,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this meant to be an optional field? What happens if I create a BpfApplication
and specify no programs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally for lists we recommend having an upper bound to the number of entries. Is there a reasonable upper bound that can be specified here?
Additionally, what other constraints exist here? Can I have multiple program entries with the same .name
value? If not there should be an admission time check that prevents this, likely using a CEL expression. If yes, are there any constraints when specifying it multiple times?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of a BpfApplication is to ask bpfman to load and possibly attach eBpf programs. I don't see a use case for a BpfApplication CRD with no programs identified. Regarding the bound, I don't know what an acceptable bound would be. The same goes for links.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, this should probably be a required field. For an upper bound, how many programs would you generally expect a user to specify? What number would surprise you as being a really high number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, it should be required. We'll fix that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't want to arbitrarily limit it. However, we'll try to come up with some upper bounds on what's reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can always increase it in the future if needed. You can never decrease it though.
If you truly can't enforce a reasonable upper bound then that is fine, but usually there is a reasonable one that can be set
Spec BpfApplicationSpec `json:"spec,omitempty"` | ||
Status BpfAppStatus `json:"status,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both these fields should have godoc and either a +required
or +optional
marker
type BpfApplicationList struct { | ||
metav1.TypeMeta `json:",inline"` | ||
metav1.ListMeta `json:"metadata,omitempty"` | ||
Items []BpfApplication `json:"items"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Field should have godoc and either a +required
or +optional
marker
// BpfApplicationSpec defines the desired state of BpfApplication | ||
type ClBpfApplicationStateSpec struct { | ||
// node is the name of the node for this BpfApplicationStateSpec. | ||
Node string `json:"node"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the node name be any arbitrary string value? What validations can you put in place to ensure that invalid values are rejected at admission time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this auto field with the cluster nodes name again state object is not customer facing this more of the state of the object on specific node
Signed-off-by: Mohamed Mahmoud <[email protected]>
// loaded. The application can selectively choose which program(s) to run | ||
// from this list based on the optional attach points provided. | ||
// +kubebuilder:validation:MinItems:=1 | ||
Programs []ClBpfApplicationProgram `json:"programs,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same questions as above here
// +optional | ||
// +kubebuilder:validation:MaxItems=1 | ||
// +kubebuilder:default:={} | ||
Links []ClFentryAttachInfo `json:"links"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be a list? Will it ever be possible to specify multiple "links" for a FentryProgram
? If I don't specify it, what is the default behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fentry and Fexit programs can only ever have one link. However, I made this a slice also so that the link syntax and semantics were consistent with the other program types. In this case, you can have a max of one link on the list.
For all program types, if a link does not exist in the list, it should not be attached, and if it was attached previously, but is no longer on the list, it will be detached. The same is true for Fentry/Fexit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. This should be explained in the documentation on the field so that users understand this limitation as well.
I would also recommend not making this a slice if it can never have more than a single link to make it clear to a user instead of them thinking that they can specify multiple due to it being a yaml list.
For example, when a user sees:
links:
- ...
they may think they can add more values here and then encounter an admission failure.
Instead, if they saw:
link:
field: value
it would be clear that this type only allows a single link. No workflow happens where a user attempts to add another link and encounters an admission failure.
Ultimately it is up to you if you would like to take this suggestion or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially implemented it the way you suggested. However, when we looked at it holistically, we thought it made more sense for the Fentry/Fexit "links" fields to be a slice just like all the other programs even though in these two cases, it can only hold one link. As a side effect, it helps make the controller code more common between program types. We'd prefer to keep it this way, but we can improve the comments to make it more clear.
// are optional and may be udated after the bpf program has been loaded | ||
// +optional | ||
// +kubebuilder:default:={} | ||
Links []ClKprobeAttachInfo `json:"links"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reasonable upper bound to this? What happens if I don't specify any links?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no errors but on the same time nothing, I don't think we have way to define upper bound for this it will depends on the kernel how much it can handle b4 it complains
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason to specify this program type without providing at least one link? Re: upper bounds, even if it depends on the kernel is there a reasonable value you can specify here? Would it make sense for a user to be able to specify 10,000 links? 100,000?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While programs may be attached based on configuration, it’s often important for all eBPF programs used by a BPF-based application (like netobserv) to be loaded at the same time. This ensures they can share a common set of global variables and maps. In this use case, all programs are loaded together, but attachments occur selectively based on the application’s configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with eBPF, so I'm having trouble parsing how this relates to the questions I've asked in this thread.
Is there a number of links that you think would be fairly common for users to specify? What number would make you think that it is a really high number of links?
Also, can I duplicate links or do they need to be unique?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My answer was in response to why it makes sense to load a program without any links - they may not be needed initially, but may added/modified later based on configuration or other events.
The links need to be unique.
As with programs, we'll come up with an upper bound.
// are optional and may be udated after the bpf program has been loaded | ||
// +optional | ||
// +kubebuilder:default:={} | ||
Links []ClTcAttachInfo `json:"links"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same questions as on other program type links.
|
||
// Optional container pid to attach the tc program in. | ||
// +optional | ||
ContainerPid *int32 `json:"containerPid"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does it mean if there is no container pid specified?
// offset added to the address of the function for uprobe. | ||
// +optional | ||
// +kubebuilder:default:=0 | ||
Offset uint64 `json:"offset"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reasonable upper bound you can set here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The combination of target, function, and offset is used to define the location of an instruction at which the user wants to invoke the uProbe program. So, the max offset would be the size of the largest allowable executable or library. The theoretical limit would depend on a number of factors including things like file format, OS, and file system. In any case, it would be pretty big, and I don't think it would be very helpful because in most cases it would need to be much smaller in practice. If the user provides an offset that is out of bounds for the given target, the eBPF verifier will return an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would that error manifest to users?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The BpfApplication CRD would have Status == "Error", and it would list which BpfApplicationState CRD was in error.
- The BpfApplicationState CRD in question would have Status == "Error"
- The failed link would have LinkStatus == "AttachError" in the state CRD
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the impact to user workloads in this state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other links that don't have errors will still be attached. It's up to the user application to decide what to do when this happens. Then, they may also need to figure out why what they are trying to do doesn't work.
eBPF is complicated, and the kernel isn't very helpful in telling you what's wrong when there's an error. So, we will also log what we can about the error, but often it's just EINVAL
and the eBPF application developer would need to use their knowledge of eBPF and the Linux kernel to figure out what's wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have a failed attachment, does the container it would have been attached to still function correctly?
I want to make sure I'm understanding correctly that outside of the ebpf program itself going into an error state, that a misconfiguration in the offset doesn't have any negative side effects to the actual workload.
If a misconfiguration here would cause workloads to be impacted we should generally try to prevent that configuration from attempting to be rolled out where we can. That might mean admission time validation like setting an arbitrary but generally safe maximum, or doing some runtime evaluations that fail fast prior to attempting to roll out the configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, by "workload" i'm meaning the non-ebpf applications that are running on a cluster via things like Deployment
s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. No, a failed attachment would not affect the container/workload.
We check what we can, but the loads and link attachments end up as a kernel syscall. The kernel then validates everything to make sure it's allowed and safe (e.g., https://docs.kernel.org/bpf/verifier.html). If a load or attach isn't valid, the kernel doesn't load or attach it and returns an error.
That said, I'll think about whether there's a reasonable limit we can use that's less than U64_MAX
.
Offset uint64 `json:"offset"` | ||
|
||
// target is the Library name or the absolute path to a binary or library. | ||
Target string `json:"target"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any validations you can put in place to ensure a valid value is provided by a user at admission time?
// +optional | ||
// +kubebuilder:validation:MaxItems=11 | ||
// +kubebuilder:default:={pipe,dispatcher_return} | ||
ProceedOn []TcProceedOnValue `json:"proceedOn"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does each entry need to be unique?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
u mean we don't duplicate entries in the above list ? if so yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I can use +kubebuilder:validation:UniqueItems:=true
to gurantee that will add it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh that didn't go well
spec.validation.openAPIV3Schema.properties[spec].properties[programs].items.properties[tc].properties[links].items.properties[proceedOn].uniqueItems: Forbidden: uniqueItems cannot be set to true since the runtime complexity becomes quadratic]
any other alternatives ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tried different options as discussed over slack with no luck
https://redhat-internal.slack.com/archives/CE4L0F143/p1741298311519169
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the error you are seeing related to default may be because it is thinking you mean to literally set the string "Pipe,DispatcherReturn"
instead of a slice containing 2 entries.
I think the default tag you are looking for is +kubebuilder:default:=[Pipe,DispatcherReturn]
(it might need the quotations for the string values, not sure)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is my default // +kubebuilder:default:={Pipe,DispatcherReturn}
and its working fine with the [] initializaton
Signed-off-by: Mohamed Mahmoud <[email protected]>
Signed-off-by: Mohamed Mahmoud <[email protected]>
…ess more comments Signed-off-by: Mohamed Mahmoud <[email protected]>
Signed-off-by: Mohamed Mahmoud <[email protected]>
Signed-off-by: Mohamed Mahmoud <[email protected]>
Signed-off-by: Mohamed Mahmoud <[email protected]>
Signed-off-by: Mohamed Mahmoud <[email protected]>
Signed-off-by: Mohamed Mahmoud <[email protected]>
type BpfApplicationProgram struct { | ||
// name is the name of the function that is the entry point for the BPF | ||
// program | ||
// +kubebuilder:validation:Pattern="^[a-zA-Z][a-zA-Z0-9_]+." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General advice is to use CEL expressions to enforce regex matching now so that you can provide a more user-friendly message than returning a regex pattern to them on failed validations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can u pls share more info on how to do so with cel ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like
api/insights/v1alpha1/types_insights.go
Line 322 in 8a7efbf
// +kubebuilder:validation:XValidation:rule=`self.matches("^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")`,message=`resource must consist of only lowercase alphanumeric characters and hyphens` |
bpfman-operator/apis/v1alpha1/cluster_bpf_application_state_types.go
Outdated
Show resolved
Hide resolved
Signed-off-by: Mohamed Mahmoud <[email protected]>
Signed-off-by: Mohamed Mahmoud <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: msherif1234 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Signed-off-by: Mohamed Mahmoud <[email protected]>
@msherif1234: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This API is huge!
I think we might want to tackle this in smaller pieces, and focus on one area, iterate on that, and apply the lessons learned then to other areas of the API rather than trying to review and correct the whole thing in one go.
I'd suggest that we also make sure to fix the linter issues. The linters enforce standards that help the API to become complete and have the validations that we would want to see for a comprehensive API.
The more you can self help on fixing the linter problems, the easier it becomes for us to tackle the trickier problems of API review.
For example, adding a +optional
or +required
as per your requirements to every field would then help us working out what does and doesn't serialize well at the moment which the current structures, and where we might need additional validations
// +kubebuilder:validation:XValidation:rule="has(self.type) && self.type == 'UProbe' ? has(self.uprobe) : !has(self.uprobe)",message="uprobe configuration is required when type is uprobe, and forbidden otherwise" | ||
// +kubebuilder:validation:XValidation:rule="has(self.type) && self.type == 'URetProbe' ? has(self.uretprobe) : !has(self.uretprobe)",message="uretprobe configuration is required when type is uretprobe, and forbidden otherwise" | ||
type BpfApplicationProgramState struct { | ||
BpfProgramStateCommon `json:",inline"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally we recommend to not add other fields to unions, and have just the discriminator and the members.
It's been found to be confusing in the past when we have mixed members and non members
// programs is a list of bpf programs contained in the parent application. | ||
// It is a map from the bpf program name to BpfApplicationProgramState | ||
// elements. | ||
Programs []BpfApplicationProgramState `json:"programs,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing SSATags, needs +listType
with a value of atomic
or map
// kretprobe defines the desired state of the application's KprobePrograms. | ||
// +unionMember | ||
// +optional | ||
KRetProbe *ClKretprobeProgramInfoState `json:"kretprobe,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not kRetProbe
for the json value?
// Whether the program should be attached to the function. | ||
// +optional | ||
// +kubebuilder:validation:MaxItems=1 | ||
// +kubebuilder:default:={} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This default is not correct, you are defaulting to the empty object, but this is a list
Why would you want to default to the empty object?
Given you don't have omitempty
, defaults have no effect anyway for Go clients
|
||
// ClFexitProgramInfo defines the Fexit program details | ||
type ClFexitProgramInfo struct { | ||
ClFexitLoadInfo `json:",inline"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you creating so many inlines? These are very uncommon in Kube APIs, what's the motivation?
|
||
// offset added to the address of the function for kprobe. | ||
// The offset must be zero for kretprobes. | ||
// TODO: Add a webhook to enforce kretprobe offset=0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can likely achieve this validation with CEL validations
This commit includes changes based on multiple review comments. Signed-off-by: Andre Fredette <[email protected]>
@everettraven, Mohamed has left Red Hat, and we still have work to do on this PR. Can someone give other members of the bpfman team permission to update this pr? Thanks, Andre |
@anfredette Since the PR came from Mohamed's own repository, he is the only one who can give that access. You could either contact him and ask him to make you a collaborator on his fork of the repo, or you'll have to copy the branch and make a new PR |
I’ll see if I can get in contact with Mohamed. Thanks, Andre
…On Thu, May 1, 2025 at 4:42 AM Joel Speed ***@***.***> wrote:
*JoelSpeed* left a comment (openshift/api#2221)
<#2221 (comment)>
@anfredette <https://github.com/anfredette> Since the PR came from
Mohamed's own repository, he is the only one who can give that access. You
could either contact him and ask him to make you a collaborator on his fork
of the repo, or you'll have to copy the branch and make a new PR
—
Reply to this email directly, view it on GitHub
<#2221 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAHY7X6MBTB5E347RG4PQR324HM5HAVCNFSM6AAAAABYP2NEL2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDQNBUGM4TANZQGE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
@msherif1234 can you add me and @frobware as collaborators on this pr? |
Done |
Apis review for bpfman-operator in preparation for GA in OCP 4.19