
Report mistakes in HW plugin configuration to the PtpConfig status #152

Open
gtannous-spec wants to merge 4 commits into k8snetworkplumbingwg:main from gtannous-spec:hwconfigs

Conversation

@gtannous-spec
Collaborator

Overview

This PR adds CRD status condition reporting for hardware plugin misconfiguration in PtpConfig. When the daemon detects invalid plugin names (typos) or plugin configuration errors during profile application, it now writes a HardwarePluginReady condition to the PtpConfig status, making the failure visible to operators instead of silently continuing. On successful configuration, the condition is set to True.

Addresses CNF-16423.

Impact

This is a bug fix / enhancement addressing a painful debugging scenario where typos in the hardware plugin configuration section of PtpConfig caused the daemon to silently fail hardware setup and continue running. The change introduces a new status condition on the PtpConfig CRD, which is a non-breaking API addition (additive Conditions field on PtpConfigStatus). Requires a corresponding CRD update in ptp-operator to include the Conditions field in the schema.

Files Changed

| File Changed | Explanation |
| --- | --- |
| pkg/daemon/ptpconfig_status.go | New file. Contains FindPtpConfigByProfileName (resolves owning PtpConfig by profile name) and UpdatePtpConfigCondition (sets/updates a metav1.Condition on the PtpConfig status). |
| pkg/daemon/daemon.go | Adds ptpClient field to Daemon struct and New() signature. Adds plugin name validation in applyNodePtpProfile before plugin execution. Captures errors from OnPTPConfigChange and reports them to PtpConfig CRD status via conditions. Clears condition on success. |
| pkg/daemon/plugin.go | Changes registerPlugins to return (PluginManager, []string); the second value is a list of unrecognized plugin names. |
| pkg/plugin/plugin.go | Changes OnPTPConfigChange to return []error instead of silently discarding plugin errors. Logs a warning per failed plugin. |
| cmd/main.go | Passes ptpClient to daemon.New(). |
| pkg/daemon/daemon_internal_test.go | Updates two New() call sites and two registerPlugins() call sites to match the new signatures. |
| vendor/.../ptpconfig_types.go | Adds Conditions []metav1.Condition field to PtpConfigStatus (vendored copy of ptp-operator CRD type). |

Technical Implementation

Error Detection

  • Validates plugin names in nodeProfile.Plugins against dn.pluginManager.Plugins (registered plugins) before invoking any plugin logic. Unknown names are flagged with descriptive errors.
  • PluginManager.OnPTPConfigChange now propagates errors returned by individual plugins (e.g., unmarshal failures, missing required fields).
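
The name check in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `unknownPluginNames`, the map shapes, and the lowercase plugin names are assumptions standing in for `nodeProfile.Plugins` and `dn.pluginManager.Plugins`.

```go
package main

import (
	"fmt"
	"sort"
)

// unknownPluginNames returns every key of requested that is missing from
// registered, sorted so that error messages are stable across runs.
// Both parameters are hypothetical stand-ins for the daemon's real types.
func unknownPluginNames(requested map[string]any, registered map[string]struct{}) []string {
	var unknown []string
	for name := range requested {
		if _, ok := registered[name]; !ok {
			unknown = append(unknown, name)
		}
	}
	sort.Strings(unknown)
	return unknown
}

func main() {
	// Assumed plugin names, for illustration only.
	registered := map[string]struct{}{"e810": {}, "e825": {}, "e830": {}}
	// The second key ends in the letter O instead of a zero: a typo.
	requested := map[string]any{"e810": nil, "e81O": nil}

	if unknown := unknownPluginNames(requested, registered); len(unknown) > 0 {
		fmt.Printf("unknown plugins specified (possible typo): %v\n", unknown)
	}
}
```

Checking only the names a profile actually requests keeps a plugin that failed to register, but that nobody uses, from being reported.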

CRD Status Reporting

  • FindPtpConfigByProfileName queries the API server to dynamically resolve which PtpConfig CR owns a given profile name. This works in all daemon modes (full controller, hybrid, and legacy) without requiring metadata injection from the controller.
  • UpdatePtpConfigCondition performs a read-modify-write on the PtpConfig status using meta.SetStatusCondition for proper upsert semantics.
  • On error: sets HardwarePluginReady=False with reason HardwarePluginConfigError and a message listing all errors.
  • On success: sets HardwarePluginReady=True with reason HardwarePluginConfigured.
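
To make the upsert semantics concrete, here is a dependency-free sketch. `Condition` and `setCondition` are simplified stand-ins for `metav1.Condition` and `meta.SetStatusCondition` (assumptions made so the example runs with the standard library only); the condition type and reasons are the ones named above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// Condition is a simplified stand-in for metav1.Condition.
type Condition struct {
	Type, Status, Reason, Message string
	LastTransitionTime            time.Time
}

// setCondition upserts c into conds, keyed by Type. The transition time is
// refreshed only when Status changes; Reason and Message are always updated.
func setCondition(conds []Condition, c Condition) []Condition {
	for i := range conds {
		if conds[i].Type != c.Type {
			continue
		}
		if conds[i].Status != c.Status {
			conds[i].Status = c.Status
			conds[i].LastTransitionTime = time.Now()
		}
		conds[i].Reason, conds[i].Message = c.Reason, c.Message
		return conds
	}
	c.LastTransitionTime = time.Now()
	return append(conds, c)
}

// hardwarePluginCondition maps collected plugin errors to a condition.
func hardwarePluginCondition(errs []error) Condition {
	if len(errs) == 0 {
		return Condition{Type: "HardwarePluginReady", Status: "True", Reason: "HardwarePluginConfigured"}
	}
	msgs := make([]string, 0, len(errs))
	for _, err := range errs {
		msgs = append(msgs, err.Error())
	}
	return Condition{
		Type:    "HardwarePluginReady",
		Status:  "False",
		Reason:  "HardwarePluginConfigError",
		Message: strings.Join(msgs, "; "),
	}
}

func main() {
	var conds []Condition
	conds = setCondition(conds, hardwarePluginCondition([]error{errors.New("unknown plugin e81O")}))
	conds = setCondition(conds, hardwarePluginCondition(nil)) // updates in place
	fmt.Println(len(conds), conds[0].Type, conds[0].Status, conds[0].Reason)
}
```

Because the entry is keyed by type, a later successful apply replaces the earlier failure rather than adding a second `HardwarePluginReady` entry.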

Dependency Wiring

  • ptpClient (typed PTP clientset) is threaded from cmd/main.go through daemon.New() into the Daemon struct, enabling direct API access to PtpConfig resources.

@gtannous-spec gtannous-spec added the ok-to-test ok to test label Feb 16, 2026
@github-actions

Thanks for your PR,
Best regards.

@gemini-code-assist
Contributor

Summary of Changes

Hello @gtannous-spec, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the observability and debugging experience for PtpConfig hardware plugin configurations. It addresses a critical issue where misconfigurations, such as typos in plugin names or internal plugin errors, would silently fail without providing feedback to operators. By introducing explicit status conditions on the PtpConfig CRD, the system now clearly communicates the health and configuration status of hardware plugins, making it easier to identify and resolve issues.

Highlights

  • Enhanced Error Reporting: The system now reports hardware plugin misconfigurations and errors directly to the PtpConfig status, preventing silent failures and providing clear feedback to operators.
  • Plugin Validation: Introduced validation for plugin names specified in PtpConfig profiles, identifying unknown or mistyped plugins before execution.
  • Status Conditions: Added a new HardwarePluginReady condition to the PtpConfig status, which is set to False on configuration errors and True upon successful configuration.
  • API Integration: Integrated a PTP client into the daemon, enabling direct querying and updating of PtpConfig resources for status reporting.
  • Error Propagation: Modified plugin interfaces to propagate errors from individual plugins, ensuring all issues encountered during plugin execution are captured and reported.
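
The error-propagation highlight amounts to collecting each plugin's failure instead of dropping it. A minimal sketch, assuming a hypothetical `configChangeFunc` hook type (the real `PluginManager` interface is not shown on this page):

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errMissingField is an example failure a plugin might report.
var errMissingField = errors.New("missing required field")

// configChangeFunc is a hypothetical per-plugin hook, used for illustration.
type configChangeFunc func(profileName string) error

// onConfigChange runs every hook, logs a warning per failure, and returns
// all failures to the caller instead of discarding them.
func onConfigChange(hooks map[string]configChangeFunc, profileName string) []error {
	var errs []error
	for name, hook := range hooks {
		if err := hook(profileName); err != nil {
			log.Printf("warning: plugin %s failed for profile %s: %v", name, profileName, err)
			errs = append(errs, fmt.Errorf("plugin %s: %w", name, err))
		}
	}
	return errs
}

func main() {
	hooks := map[string]configChangeFunc{
		"healthy": func(string) error { return nil },
		"broken":  func(string) error { return errMissingField },
	}
	errs := onConfigChange(hooks, "grandmaster")
	fmt.Printf("%d plugin error(s) collected\n", len(errs))
}
```

Wrapping with `%w` keeps the original error reachable through `errors.Is`, which may matter if callers need to distinguish failure kinds.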

Changelog

  • cmd/main.go
    • Passed the new ptpClient to the daemon.New() function call.
  • go.mod
    • Updated the ptp-operator dependency version to a newer commit.
  • go.sum
    • Updated checksums for the ptp-operator dependency to reflect the version change.
  • pkg/daemon/daemon.go
    • Imported ptpclient and metav1 packages.
    • Added a ptpClient field to the Daemon struct.
    • Updated the New function signature to accept ptpClient.
    • Implemented plugin name validation within applyNodePtpProfile.
    • Added logic to collect plugin errors and report them to the PtpConfig status via conditions.
    • Included logic to clear the HardwarePluginReady condition on successful configuration.
  • pkg/daemon/daemon_internal_test.go
    • Updated calls to daemon.New() to include the new ptpClient parameter (passed as nil for tests).
    • Adjusted calls to registerPlugins() to handle its new return signature.
  • pkg/daemon/plugin.go
    • Modified the registerPlugins function to return a slice of unknownPlugins in addition to the PluginManager.
  • pkg/daemon/ptpconfig_status.go
    • Added a new file containing FindPtpConfigByProfileName to locate PtpConfig by profile name.
    • Added UpdatePtpConfigCondition to set or update metav1.Condition on PtpConfig status.
  • pkg/plugin/plugin.go
    • Changed the OnPTPConfigChange method in PluginManager to return a slice of errors, allowing plugins to report failures.
    • Added logging for plugin failures during OnPTPConfigChange.
  • vendor/github.com/k8snetworkplumbingwg/ptp-operator/api/v1/ptpconfig_types.go
    • Added a Conditions field of type []metav1.Condition to the PtpConfigStatus struct.
  • vendor/github.com/k8snetworkplumbingwg/ptp-operator/api/v1/zz_generated.deepcopy.go
    • Updated the DeepCopyInto method for PtpConfigStatus to correctly handle the new Conditions field.
  • vendor/modules.txt
    • Updated the vendored module entry for ptp-operator to reflect the new version.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request enhances PtpConfig status reporting by introducing CRD status condition reporting for hardware plugin misconfigurations, which will significantly improve the debugging experience for operators. However, a critical security vulnerability has been identified: the implementation lacks proper validation for the Name field in the PtpProfile struct. Since this field is a pointer and comes from an external source (the PtpConfig CRD), dereferencing it without a nil check can lead to daemon panics and Denial of Service. Additionally, there are suggestions to improve maintainability and adhere to best practices, mainly around reducing code duplication and context propagation.

}
for _, cfg := range ptpConfigs.Items {
	for _, p := range cfg.Spec.Profile {
		if p.Name != nil && *p.Name == profileName {
Collaborator

Are we guaranteed that a profile name is unique?

Collaborator

In practice the daemon likely expects profile names to be unique, but nothing seems to enforce that at the CRD level.

Collaborator Author

I thought that was enforced by the webhook.

Collaborator

@gtannous-spec Can you double-check the webhook? If the check is there, link it here. If not, we probably want to add it, if that's what we're relying on to find the PtpConfig.

Collaborator Author

Are we referring to unique profiles within one PtpConfig, or to the same profile name across multiple PtpConfigs?

Collaborator Author

I think we can enforce that condition within the openshift-ptp namespace.

Collaborator

[core@master-0 ~]$ oc get packagemanifest ptp-operator -n openshift-marketplace -o jsonpath='{.status.channels[0].currentCSVDesc.installModes}'
[{"supported":true,"type":"OwnNamespace"},{"supported":true,"type":"SingleNamespace"},{"supported":false,"type":"MultiNamespace"},{"supported":false,"type":"AllNamespaces"}

Collaborator

Single namespace, own namespace

Collaborator

The PTP Operator controller watches PtpConfig CRs in the openshift-ptp namespace. That namespace is hardcoded.

Collaborator

Just wanted to make sure it was by definition and not an assumption :)
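
The uniqueness concern in this thread can be illustrated with a lookup that refuses to guess when a profile name has zero or several owners. This is a sketch under assumptions: `PtpConfig` and `Profile` below are trimmed-down hypothetical shapes, and `findOwner` is not the PR's `FindPtpConfigByProfileName`.

```go
package main

import "fmt"

// Profile and PtpConfig are hypothetical, trimmed-down shapes.
type Profile struct{ Name *string }

type PtpConfig struct {
	Name     string
	Profiles []Profile
}

// findOwner returns the name of the single PtpConfig declaring profileName.
// It returns an error when no config, or more than one, declares it.
func findOwner(configs []PtpConfig, profileName string) (string, error) {
	var owners []string
	for _, cfg := range configs {
		for _, p := range cfg.Profiles {
			// Name is a pointer, so guard against nil before dereferencing.
			if p.Name != nil && *p.Name == profileName {
				owners = append(owners, cfg.Name)
				break // count each config at most once
			}
		}
	}
	switch len(owners) {
	case 0:
		return "", fmt.Errorf("no PtpConfig declares profile %q", profileName)
	case 1:
		return owners[0], nil
	default:
		return "", fmt.Errorf("profile %q is declared by multiple PtpConfigs: %v", profileName, owners)
	}
}

func main() {
	name := "grandmaster"
	configs := []PtpConfig{
		{Name: "cfg-a", Profiles: []Profile{{Name: &name}}},
		{Name: "cfg-b", Profiles: []Profile{{Name: nil}}},
	}
	owner, err := findOwner(configs, "grandmaster")
	fmt.Println(owner, err)
}
```

Returning an error on duplicates avoids writing a status condition to whichever PtpConfig happens to be listed first.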

devices:
- enp108s0f0
e825:
enableDefaultConfig: false
Collaborator

What's going on here?

Collaborator Author

I guess this was auto-added when I was fixing merge conflicts; I should fix it.

Collaborator Author

@gtannous-spec gtannous-spec Mar 17, 2026

I see now why it was added.
The unit tests couldn't pass with it

@github-actions github-actions bot removed the ok-to-test ok to test label Mar 17, 2026
@gtannous-spec gtannous-spec added the ok-to-test ok to test label Mar 17, 2026
@github-actions github-actions bot removed the ok-to-test ok to test label Mar 17, 2026
@gtannous-spec gtannous-spec added ok-to-test ok to test and removed ok-to-test ok to test labels Mar 17, 2026
@github-actions github-actions bot removed the ok-to-test ok to test label Mar 18, 2026
@gtannous-spec gtannous-spec added the ok-to-test ok to test label Mar 18, 2026
@github-actions github-actions bot removed the ok-to-test ok to test label Mar 23, 2026
@gtannous-spec gtannous-spec force-pushed the hwconfigs branch 2 times, most recently from 0ef3ab0 to 7185394 Compare March 23, 2026 18:41
@gtannous-spec gtannous-spec added the ok-to-test ok to test label Mar 23, 2026
@nocturnalastro nocturnalastro added ok-to-test ok to test and removed ok-to-test ok to test labels Mar 24, 2026

if len(dn.unknownPlugins) > 0 {
	pluginErrors = append(pluginErrors,
		fmt.Errorf("unknown plugins specified (possible typo): %v", dn.unknownPlugins))
Collaborator

If the plugin is never used why throw an error?

Collaborator Author

Do you mean that if the plugin name is incorrect in the first place, we should return an error?
In these lines we're catching plugins that couldn't be registered at all at daemon start,
not per profile, though.

Collaborator

It's reported on the profile, though. If we don't use the plugin that couldn't be registered, then it doesn't matter, right?

Collaborator Author

OK, good point.
Fixed it, so that a plugin no profile is using no longer gets reported globally.

// sysfs pins or DPLL board labels. This mirrors each plugin's runtime behavior:
// - E810: passes hasSysfsSMAPins (sysfs when SMA1 exists, DPLL otherwise)
// - E825/E830: passes alwaysSysfs (always sysfs, matching pinConfig.applyPinSet)
func validatePinNames(devicePins map[string]pinSet, pluginName string, useSysfs func(string) bool) []string {
Collaborator

Why not simplify this with a bool that defines whether you fall back to DPLL or not?

In fact, you can decide that from the plugin name alone; you don't even need to pass anything in.

Collaborator Author

You mean the decision is static per plugin?
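
One reading of the reviewer's suggestion is a static table keyed by plugin name. The sketch below assumes lowercase plugin names and a hypothetical `pinSource` helper; the per-plugin values follow the quoted code comment (E810 may use DPLL when SMA pins are absent from sysfs, E825/E830 always use sysfs).

```go
package main

import "fmt"

// dpllFallback records, per plugin, whether pin-name validation may fall
// back to DPLL board labels when sysfs SMA pins are absent. The keys are
// assumed plugin names.
var dpllFallback = map[string]bool{
	"e810": true,
	"e825": false,
	"e830": false,
}

// pinSource reports which source pin names should be validated against.
func pinSource(pluginName string, hasSysfsSMAPins bool) (string, error) {
	fallback, known := dpllFallback[pluginName]
	if !known {
		return "", fmt.Errorf("unknown plugin %q", pluginName)
	}
	if fallback && !hasSysfsSMAPins {
		return "dpll", nil
	}
	return "sysfs", nil
}

func main() {
	cases := []struct {
		plugin string
		sysfs  bool
	}{
		{"e810", false},
		{"e810", true},
		{"e825", false},
	}
	for _, tc := range cases {
		src, err := pinSource(tc.plugin, tc.sysfs)
		fmt.Println(tc.plugin, tc.sysfs, src, err)
	}
}
```

The table removes the strategy-function parameter, at the cost of one place to update when a new plugin is added.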

for _, p := range sysfsPins {
	validSet[p] = true
}
for pinName := range pins {
Collaborator

As I said before, rather than returning all of them, it is much simpler to loop through the ones you want and check that they exist.

You have 3 loops for each device when it could be just one.


// validatePinValues checks that pin values have the expected "<direction> <channel>" format
// where direction is 0-2 and channel is a non-negative integer
func validatePinValues(devicePins map[string]pinSet, pluginName string) []string {
Collaborator

Having all these checks in their own functions just means you're looping over the same data over and over again.
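
A single-pass variant of the checks discussed here might look like the sketch below. The value rule comes from the quoted comment ("<direction> <channel>", direction 0-2, non-negative channel); the function shapes, the plain `map[string]string` pin set, and the pin names are assumptions rather than the PR's types.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// validatePinValue checks the "<direction> <channel>" format: direction is
// an integer in 0-2 and channel is a non-negative integer. Fields are split
// on any whitespace.
func validatePinValue(value string) error {
	fields := strings.Fields(value)
	if len(fields) != 2 {
		return fmt.Errorf("expected \"<direction> <channel>\", got %q", value)
	}
	dir, err := strconv.Atoi(fields[0])
	if err != nil || dir < 0 || dir > 2 {
		return fmt.Errorf("direction must be 0-2, got %q", fields[0])
	}
	ch, err := strconv.Atoi(fields[1])
	if err != nil || ch < 0 {
		return fmt.Errorf("channel must be a non-negative integer, got %q", fields[1])
	}
	return nil
}

// validateDevicePins checks each pin's name and value in one loop.
// validNames is a hypothetical set of pin names known for the device.
func validateDevicePins(pins map[string]string, validNames map[string]bool) []string {
	var errs []string
	for name, value := range pins {
		if !validNames[name] {
			errs = append(errs, fmt.Sprintf("pin %q does not exist", name))
			continue
		}
		if err := validatePinValue(value); err != nil {
			errs = append(errs, fmt.Sprintf("pin %q: %v", name, err))
		}
	}
	sort.Strings(errs) // map iteration order is random; sort for stable output
	return errs
}

func main() {
	pins := map[string]string{"SMA1": "0 1", "SMA2": "3 1", "SMA9": "0 1"}
	valid := map[string]bool{"SMA1": true, "SMA2": true}
	for _, e := range validateDevicePins(pins, valid) {
		fmt.Println(e)
	}
}
```

Each pin is visited once, so name and value problems are reported without re-walking the same map.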

func validateInterconnections(inputs []PhaseInputs) []string {
	var errs []string
	validParts := validPartNames()
	validConns := validConnectorNames()
Collaborator

I don't think this is correct. I don't think it's caching the data structure you'd need. As I said before, I'm pretty sure what counts as a valid connector name depends on the part.

You should get the part name from the input.
Look up that part in the hardware spec.
Then get the valid connections for that part (cache here).
Finally, verify that the connections are valid.

Collaborator Author

I updated that.
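
The per-part flow the reviewer describes can be sketched as below. `PhaseInput`, its fields, the `lookup` callback, and the part and connector names are all hypothetical; the real `PhaseInputs` type and hardware-spec API are not shown on this page.

```go
package main

import "fmt"

// PhaseInput is a hypothetical, simplified input: a part and one connector.
type PhaseInput struct {
	Part      string
	Connector string
}

// validateInterconnections checks each connector against the connectors of
// its own part. lookup is a hypothetical hardware-spec query; its result is
// cached per part so each part is looked up at most once.
func validateInterconnections(inputs []PhaseInput, lookup func(part string) ([]string, bool)) []string {
	var errs []string
	cache := map[string]map[string]bool{}
	for _, in := range inputs {
		conns, seen := cache[in.Part]
		if !seen {
			if names, ok := lookup(in.Part); ok {
				conns = make(map[string]bool, len(names))
				for _, n := range names {
					conns[n] = true
				}
			}
			cache[in.Part] = conns // a nil entry marks an unknown part
		}
		if conns == nil {
			errs = append(errs, fmt.Sprintf("unknown part %q", in.Part))
			continue
		}
		if !conns[in.Connector] {
			errs = append(errs, fmt.Sprintf("connector %q is not valid for part %q", in.Connector, in.Part))
		}
	}
	return errs
}

func main() {
	lookup := func(part string) ([]string, bool) {
		if part == "part-a" {
			return []string{"in1", "out1"}, true
		}
		return nil, false
	}
	inputs := []PhaseInput{{"part-a", "in1"}, {"part-a", "bogus"}, {"part-x", "in1"}}
	for _, e := range validateInterconnections(inputs, lookup) {
		fmt.Println(e)
	}
}
```

Keying the cache by part is what makes a connector name valid for one part and invalid for another.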

Strip qualified prefix from profile names in logs and status conditions
Add generic detection of HW plugin misconfiguration
…ks SMA pins

On newer kernels (4.22+) SMA pins are not exposed via sysfs but are
still configurable through DPLL. validatePinNames now accepts a strategy
function (hasSysfsSMAPins) to decide per-device whether to check sysfs
pins or DPLL board labels, mirroring the runtime pin-configuration path
used by each plugin.

Made-with: Cursor
…eportPluginStatus

  Remove indirection through mockable function variables (discoverSysfsPinsFunc,
  hasDPLLPinLabelFunc, hasSysfsSMAPinsFunc) in favor of direct filesystem and DPLL
  label lookups. Merge validatePinNames and validatePinValues into a single
  validateDevicePins pass. Narrow connector validation to be per-part instead of
  across all hardware specs. Extract inline plugin status reporting from daemon.go
  into reportPluginStatus() in ptpconfig_status.go, and stop misclassifying
  hardware config errors as plugin errors.
Labels

ok-to-test ok to test

4 participants