Report mistakes in HW plugin configuration to the PtpConfig status#152
Report mistakes in HW plugin configuration to the PtpConfig status#152gtannous-spec wants to merge 4 commits intok8snetworkplumbingwg:mainfrom
Conversation
|
Thanks for your PR, |
Summary of ChangesHello @gtannous-spec, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly improves the observability and debugging experience for Highlights
Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request enhances PtpConfig status reporting by introducing CRD status condition reporting for hardware plugin misconfigurations, which will significantly improve the debugging experience for operators. However, a critical security vulnerability has been identified: the implementation lacks proper validation for the Name field in the PtpProfile struct. Since this field is a pointer and comes from an external source (the PtpConfig CRD), dereferencing it without a nil check can lead to daemon panics and Denial of Service. Additionally, there are suggestions to improve maintainability and adhere to best practices, mainly around reducing code duplication and context propagation.
c2db9fa to
324fe27
Compare
11d3748 to
4e28fe0
Compare
4e28fe0 to
648b5ca
Compare
648b5ca to
2736dc2
Compare
2736dc2 to
13fb797
Compare
pkg/daemon/ptpconfig_status.go
Outdated
| } | ||
| for _, cfg := range ptpConfigs.Items { | ||
| for _, p := range cfg.Spec.Profile { | ||
| if p.Name != nil && *p.Name == profileName { |
There was a problem hiding this comment.
Are we guaranteed that a profile name is unique?
There was a problem hiding this comment.
In practice the daemon likely expects profile name to be unique. But there is nothing that enforces it on the CRD level it seems.
There was a problem hiding this comment.
Thought that was supported by the webhook
There was a problem hiding this comment.
@gtannous-spec Can you double check the webhook. If there is link it here. If not we probably want to add that it if that what we're using to find it.
There was a problem hiding this comment.
Are we referring to "unique profiles" per one ptpconfig or same profile name across mutliple Ptpconfigs ?
There was a problem hiding this comment.
I think we can force that condition under the namespace openshift-ptp
There was a problem hiding this comment.
[core@master-0 ~]$ oc get packagemanifest ptp-operator -n openshift-marketplace -o jsonpath='{.status.channels[0].currentCSVDesc.installModes}'
[{"supported":true,"type":"OwnNamespace"},{"supported":true,"type":"SingleNamespace"},{"supported":false,"type":"MultiNamespace"},{"supported":false,"type":"AllNamespaces"}
There was a problem hiding this comment.
Single namespace, own namespace
There was a problem hiding this comment.
The PTP Operator controller watches PtpConfig CRs in the openshift-ptp namespace. Hardcoded
There was a problem hiding this comment.
Just wanted to make sure it was by definition and not an assumption :)
| devices: | ||
| - enp108s0f0 | ||
| e825: | ||
| enableDefaultConfig: false |
There was a problem hiding this comment.
Whats going on here?
There was a problem hiding this comment.
I guess this was auto-added when was fixing Merge conflicts, should fix it.
There was a problem hiding this comment.
I see why now it was added..
The unit tests couldn't pass with it
ec3f53b to
dab224c
Compare
dab224c to
0e5bedc
Compare
0e5bedc to
0f06bb3
Compare
0f06bb3 to
ac49a7f
Compare
0ef3ab0 to
7185394
Compare
pkg/daemon/daemon.go
Outdated
|
|
||
| if len(dn.unknownPlugins) > 0 { | ||
| pluginErrors = append(pluginErrors, | ||
| fmt.Errorf("unknown plugins specified (possible typo): %v", dn.unknownPlugins)) |
There was a problem hiding this comment.
If the plugin is never used why throw an error?
There was a problem hiding this comment.
do you mean if the plugin is incorrect from the first place we should return error?
Because in this lines we're catching plugins that couldn't be registered at all in the daemon-level start.
not per profile tho
There was a problem hiding this comment.
Its reported on the profile though. If we don't use that plugin that couldn't be registered then it doesn't matter right?
There was a problem hiding this comment.
Ok good point.
Fixed it, such that if no profile is using that plugin then it doesn't get reported globally.
addons/intel/validate.go
Outdated
| // sysfs pins or DPLL board labels. This mirrors each plugin's runtime behavior: | ||
| // - E810: passes hasSysfsSMAPins (sysfs when SMA1 exists, DPLL otherwise) | ||
| // - E825/E830: passes alwaysSysfs (always sysfs, matching pinConfig.applyPinSet) | ||
| func validatePinNames(devicePins map[string]pinSet, pluginName string, useSysfs func(string) bool) []string { |
There was a problem hiding this comment.
Why not simplify this with a book that defines if you call back to dpll or not?
In fact you can do that just off the plugin name you don't even need to pass anything in.
There was a problem hiding this comment.
You mean the decision is static per plugin?
addons/intel/validate.go
Outdated
| for _, p := range sysfsPins { | ||
| validSet[p] = true | ||
| } | ||
| for pinName := range pins { |
There was a problem hiding this comment.
As I said before returning all of them when you can just loop through the ones you want and check they exist is much simpler. Y
You're having too 3 loops for each device when it could just be one for each.
addons/intel/validate.go
Outdated
|
|
||
| // validatePinValues checks that pin values have the expected "<direction> <channel>" format | ||
| // where direction is 0-2 and channel is a non-negative integer | ||
| func validatePinValues(devicePins map[string]pinSet, pluginName string) []string { |
There was a problem hiding this comment.
Having all these checks in there own functions just means your looping over the same data over and over again.
addons/intel/validate.go
Outdated
| func validateInterconnections(inputs []PhaseInputs) []string { | ||
| var errs []string | ||
| validParts := validPartNames() | ||
| validConns := validConnectorNames() |
There was a problem hiding this comment.
I don't think this is correct. I don't think its caching the correct data structure you'd need. As I said before I'm pretty sure what a valid connector name is depends on the part.
You should get the part name from the input.
Look up that part from the hardware getting the valid connections for that.
Then from there get the valid connections for that part (cache here).
Finally verify the connections are valid.
There was a problem hiding this comment.
I updated that.
7185394 to
f27f8aa
Compare
Strip qualified prefix from profile names in logs and status conditions Add generic detection of HW plugin misconfiguration
…ks SMA pins On newer kernels (4.22+) SMA pins are not exposed via sysfs but are still configurable through DPLL. validatePinNames now accepts a strategy function (hasSysfsSMAPins) to decide per-device whether to check sysfs pins or DPLL board labels, mirroring the runtime pin-configuration path used by each plugin. Made-with: Cursor
…eportPluginStatus Remove indirection through mockable function variables (discoverSysfsPinsFunc, hasDPLLPinLabelFunc, hasSysfsSMAPinsFunc) in favor of direct filesystem and DPLL label lookups. Merge validatePinNames and validatePinValues into a single validateDevicePins pass. Narrow connector validation to be per-part instead of across all hardware specs. Extract inline plugin status reporting from daemon.go into reportPluginStatus() in ptpconfig_status.go, and stop misclassifying hardware config errors as plugin errors.
Overview
This PR adds CRD status condition reporting for hardware plugin misconfiguration in
PtpConfig. When the daemon detects invalid plugin names (typos) or plugin configuration errors during profile application, it now writes aHardwarePluginReadycondition to thePtpConfigstatus, making the failure visible to operators instead of silently continuing. On successful configuration, the condition is set toTrue.addressing CNF-16423
Impact
This is a bug fix / enhancement addressing a painful debugging scenario where typos in the hardware plugin configuration section of
PtpConfigcaused the daemon to silently fail hardware setup and continue running. The change introduces a new status condition on thePtpConfigCRD, which is a non-breaking API addition (additiveConditionsfield onPtpConfigStatus). Requires a corresponding CRD update inptp-operatorto include theConditionsfield in the schema.Files Changed
pkg/daemon/ptpconfig_status.goFindPtpConfigByProfileName(resolves owning PtpConfig by profile name) andUpdatePtpConfigCondition(sets/updates ametav1.Conditionon the PtpConfig status).pkg/daemon/daemon.goptpClientfield toDaemonstruct andNew()signature. Adds plugin name validation inapplyNodePtpProfilebefore plugin execution. Captures errors fromOnPTPConfigChangeand reports them to PtpConfig CRD status via conditions. Clears condition on success.pkg/daemon/plugin.goregisterPluginsto return(PluginManager, []string)-- the second value is a list of unrecognized plugin names.pkg/plugin/plugin.goOnPTPConfigChangeto return[]errorinstead of silently discarding plugin errors. Logs a warning per failed plugin.cmd/main.goptpClienttodaemon.New().pkg/daemon/daemon_internal_test.goNew()call sites and tworegisterPlugins()call sites to match the new signatures.vendor/.../ptpconfig_types.goConditions []metav1.Conditionfield toPtpConfigStatus(vendored copy of ptp-operator CRD type).Technical Implementation
Error Detection
nodeProfile.Pluginsagainstdn.pluginManager.Plugins(registered plugins) before invoking any plugin logic. Unknown names are flagged with descriptive errors.PluginManager.OnPTPConfigChangenow propagates errors returned by individual plugins (e.g., unmarshal failures, missing required fields).CRD Status Reporting
FindPtpConfigByProfileNamequeries the API server to dynamically resolve whichPtpConfigCR owns a given profile name. This works in all daemon modes (full controller, hybrid, and legacy) without requiring metadata injection from the controller.UpdatePtpConfigConditionperforms a read-modify-write on the PtpConfig status usingmeta.SetStatusConditionfor proper upsert semantics.HardwarePluginReady=Falsewith reasonHardwarePluginConfigErrorand a message listing all errors.HardwarePluginReady=Truewith reasonHardwarePluginConfigured.Dependency Wiring
ptpClient(typed PTP clientset) is threaded fromcmd/main.gothroughdaemon.New()into theDaemonstruct, enabling direct API access to PtpConfig resources.