Conversation

@SchSeba
Collaborator

@SchSeba SchSeba commented Oct 10, 2024

No description provided.

@github-actions

Thanks for your PR,
To run vendors CIs, Maintainers can use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs, Maintainers can use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.

Best regards.

@SchSeba SchSeba force-pushed the daemon_redesign branch 3 times, most recently from 9f5207e to 7d4e946 Compare October 10, 2024 10:23
@coveralls

coveralls commented Oct 10, 2024

Pull Request Test Coverage Report for Build 13526205039

Details

  • 273 of 615 (44.39%) changed or added relevant lines in 18 files are covered.
  • 199 unchanged lines in 12 files lost coverage.
  • Overall coverage increased (+0.9%) to 48.882%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| controllers/sriovnetworknodepolicy_controller.go | 1 | 2 | 50.0% |
| pkg/host/internal/network/network.go | 0 | 1 | 0.0% |
| pkg/host/store/store.go | 0 | 1 | 0.0% |
| pkg/plugins/mellanox/mellanox_plugin.go | 0 | 1 | 0.0% |
| pkg/utils/shutdown.go | 0 | 1 | 0.0% |
| controllers/drain_controller_helper.go | 1 | 3 | 33.33% |
| controllers/drain_controller.go | 4 | 7 | 57.14% |
| pkg/systemd/systemd.go | 2 | 7 | 28.57% |
| pkg/daemon/status.go | 59 | 96 | 61.46% |
| cmd/sriov-network-config-daemon/start.go | 7 | 134 | 5.22% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| cmd/sriov-network-config-daemon/start.go | 2 | 8.94% |
| pkg/client/clientset/versioned/fake/clientset_generated.go | 5 | 46.15% |
| pkg/client/informers/externalversions/sriovnetwork/interface.go | 6 | 0.0% |
| pkg/plugins/fake/fake_plugin.go | 6 | 20.0% |
| pkg/client/informers/externalversions/sriovnetwork/v1/interface.go | 9 | 0.0% |
| pkg/client/clientset/versioned/typed/sriovnetwork/v1/fake/fake_sriovoperatorconfig.go | 15 | 10.81% |
| pkg/client/clientset/versioned/typed/sriovnetwork/v1/fake/fake_sriovnetworknodestate.go | 17 | 25.68% |
| pkg/client/informers/externalversions/sriovnetwork/v1/sriovnetworknodestate.go | 19 | 0.0% |
| pkg/client/informers/externalversions/sriovnetwork/v1/sriovoperatorconfig.go | 19 | 0.0% |
| api/v1/helper.go | 21 | 73.32% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 13522938942: | 0.9% |
| Covered Lines: | 7324 |
| Relevant Lines: | 14983 |

💛 - Coveralls

@SchSeba SchSeba force-pushed the daemon_redesign branch 2 times, most recently from e17b584 to ddc70f2 Compare October 10, 2024 12:45
@SchSeba SchSeba force-pushed the daemon_redesign branch 2 times, most recently from 929886d to 9377404 Compare December 12, 2024 11:54
@SchSeba SchSeba force-pushed the daemon_redesign branch 4 times, most recently from 79cc69b to dc6eb11 Compare December 16, 2024 14:03
@SchSeba SchSeba changed the title [WIP] Daemon redesign Daemon redesign - using controller-runtime Dec 26, 2024
Member

@zeeke zeeke left a comment

Left a few minor comments. This PR hugely improves the config-daemon shape.

c := current.Status.DeepCopy().Interfaces
d := desiredNodeState.Status.DeepCopy().Interfaces
for idx := range d {
// check if it's a new device
Member

Suggested change
// check if it's a new device
if idx >= len(c) {
return true, nil
}
// check if it's a new device

I don't know if it can ever happen (maybe if a new device is hot-plugged between reconcile loops), but it's safer to check the slice length.

Collaborator Author

But in theory (not sure if it's possible) a device could be hot-unplugged and a different one hot-plugged afterwards, so the len would be the same but the device would not be, no?

Member

In both cases, the if idx >= len(c) { check can save us from an index-out-of-range panic, right?
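
A minimal sketch of the two concerns raised in this thread, assuming a trimmed-down, hypothetical `InterfaceExt` type and a hypothetical `statusChanged` helper (the real comparison in the PR works on the full node state and returns an error as well). The length guard runs before any `current[idx]` access, and the per-entry comparison also checks device identity, which covers the unplug-then-plug case where the length stays the same.

```go
package main

import "fmt"

// InterfaceExt is a hypothetical, reduced stand-in for one entry of
// Status.Interfaces; the real type carries many more fields.
type InterfaceExt struct {
	PciAddress string
	NumVfs     int
}

// statusChanged reports whether desired differs from current.
// The length guard comes first, so a longer desired slice can never
// trigger an index-out-of-range panic on current[idx].
func statusChanged(current, desired []InterfaceExt) bool {
	for idx := range desired {
		if idx >= len(current) {
			return true // extra device in desired: treat it as a change
		}
		// Equal lengths do not imply equal devices (unplug followed by
		// plugging a different card), so compare identity too.
		if current[idx].PciAddress != desired[idx].PciAddress ||
			current[idx].NumVfs != desired[idx].NumVfs {
			return true
		}
	}
	return false
}

func main() {
	cur := []InterfaceExt{{PciAddress: "0000:3b:00.0", NumVfs: 4}}
	des := append([]InterfaceExt{}, cur...)
	des = append(des, InterfaceExt{PciAddress: "0000:5e:00.0"})
	fmt.Println(statusChanged(cur, des))
}
```

The sketch only walks `desired`, as the snippet under review does, so a device that disappears from the end of the list would need a separate length comparison.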

Collaborator

@adrianchiris adrianchiris left a comment

partial review

predicate.Funcs
}

func (DrainStateAnnotationPredicate) Create(e event.CreateEvent) bool {
Collaborator

WDYT about removing the Create fn?

The default Create of predicate.Funcs returns true.

Is the object being nil a real concern here?

Collaborator Author

The concern is not nil. Without this, when a new node is added to the cluster the reconcile will not get called.
And if the reconcile is not called, no one adds the labels that make the drain possible.

Collaborator

without this when a new node is added to the cluster the reconcile will not get called without this

Are you sure? I thought it would call reconcile by default, since predicate.Funcs contains an implementation that returns true, which means the event is enqueued for reconcile.

Collaborator Author

I gave it a try and I needed to add it. Without it I was not able to get the create calls :(

I can give it another try

Collaborator

So this ended up being needed?

Collaborator Author

Yep, we need this one, at least according to my tests.
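
For readers following this thread, here is a rough model of the mechanics being discussed. `CreateEvent` and `Funcs` below are simplified stand-ins written for this example, not the controller-runtime types; they mirror the library behaviour that `Funcs.Create` returns true when no `CreateFunc` is set. `DefaultPredicate` and `ExplicitPredicate` are hypothetical names, and the nil check in the explicit method is an assumption inferred from the review question above. The sketch does not explain why the explicit method turned out to be needed in the author's tests.

```go
package main

import "fmt"

// CreateEvent and Funcs are simplified stand-ins for controller-runtime's
// event.CreateEvent and predicate.Funcs, so the example runs without the
// library.
type CreateEvent struct{ Object any }

type Funcs struct {
	CreateFunc func(CreateEvent) bool
}

// Create mirrors the library default: the event passes when no
// CreateFunc has been configured.
func (p Funcs) Create(e CreateEvent) bool {
	if p.CreateFunc != nil {
		return p.CreateFunc(e)
	}
	return true
}

// DefaultPredicate relies on the Create method promoted from Funcs.
type DefaultPredicate struct{ Funcs }

// ExplicitPredicate shadows the promoted method with its own Create.
type ExplicitPredicate struct{ Funcs }

// Create lets the event through unless the object is missing
// (assumed behaviour, for illustration only).
func (ExplicitPredicate) Create(e CreateEvent) bool {
	return e.Object != nil
}

func main() {
	ev := CreateEvent{Object: "node-1"}
	fmt.Println(DefaultPredicate{}.Create(ev), ExplicitPredicate{}.Create(ev))
}
```

In this model both variants enqueue a create event for a real object, which matches the reviewer's expectation; the only difference is how an event with a nil object is treated.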

log.Log.Error(err, "nodeStateSyncHandler(): Failed to fetch node state", "name", vars.NodeName)
return err
reqLogger.Error(err, "failed to check systemd status unexpected error")
return ctrl.Result{}, nil
Collaborator

We don't want to fail here?

Collaborator Author

If we return an error, it will get stuck in a loop. When the result for the systemd operator returns an error we don't retry, so that we don't put the node into a boot loop.

Collaborator

Ack.

Should we update the status with "unexpected error"?

Collaborator Author

Right, nice catch!

@adrianchiris
Collaborator

Once we get this in, I think we should move the node controllers under the controllers/node/ pkg.

@SchSeba SchSeba force-pushed the daemon_redesign branch 4 times, most recently from 80564fa to 00587fb Compare February 3, 2025 08:34
@SchSeba
Collaborator Author

SchSeba commented Feb 3, 2025

Hi @zeeke @adrianchiris @ykulazhenkov can you please make another round of review here?

Collaborator

@adrianchiris adrianchiris left a comment

Overall looks great!

Added some small comments. Once we resolve them we can merge!

@SchSeba SchSeba force-pushed the daemon_redesign branch 2 times, most recently from 174365e to 8768925 Compare February 19, 2025 14:04
Collaborator

@e0ne e0ne left a comment

lgtm

@e0ne
Collaborator

e0ne commented Feb 20, 2025

LGTM. It can be merged once CI passes.


loadedPlugins map[string]plugin.VendorPlugin
lastAppliedGeneration int64
disableDrain bool
Collaborator

It looks like we never set this value.

return ctrl.Result{}, err
}

// update log level
Collaborator

We need to make sure that all these global variables are initialized before the NodeReconciler starts.
Currently, FeatureGates is initialized, but the DisableDrain flag is not.

Collaborator Author

right! done

Collaborator

@adrianchiris adrianchiris left a comment

LGTM,

Can be merged once CI passes and the comments from @ykulazhenkov are addressed.

@SchSeba SchSeba force-pushed the daemon_redesign branch 2 times, most recently from 2f979de to d9f11df Compare February 24, 2025 14:26
Signed-off-by: Sebastian Sch <[email protected]>
It's only a one time run script so it should not be a problem to run
with debug logs.

Signed-off-by: Sebastian Sch <[email protected]>
Signed-off-by: Sebastian Sch <[email protected]>
@SchSeba
Collaborator Author

SchSeba commented Feb 26, 2025

Thanks for all the help folks!

merging

@SchSeba SchSeba merged commit fdc7247 into k8snetworkplumbingwg:master Feb 26, 2025
14 checks passed

6 participants