Support L3 connectivity on vlan sub-interfaces#229

Merged
rrajendran17 merged 1 commit intoharvester:masterfrom
rrajendran17:ipconnectivity
Mar 4, 2026

Copilot AI review requested due to automatic review settings January 14, 2026 22:23
@rrajendran17 rrajendran17 requested a review from a team January 14, 2026 22:30

Copilot AI left a comment

Pull request overview

This pull request adds support for L3 connectivity on VLAN sub-interfaces by introducing a new HostNetworkConfig resource and related validation logic.

Changes:

  • Introduces new HostNetworkConfig validator for managing L3 connectivity on VLAN sub-interfaces
  • Adds validation checks to prevent deletion/updates when overlay VMs are using the cluster network
  • Updates existing validators (VlanConfig, NAD) to include new HostNetworkConfig and VirtualMachine cache dependencies
  • Adds utility functions for managing VLAN sub-interfaces and NAD getters
  • Extensive generated code updates for client/controller integration

Reviewed changes

Copilot reviewed 25 out of 102 changed files in this pull request and generated 3 comments.

Summary per file:
  • pkg/webhook/hostnetworkconfig/validator.go: New validator for the HostNetworkConfig resource with create/update/delete validation
  • pkg/webhook/vlanconfig/validator.go: Added overlay VM checks and new cache dependencies
  • pkg/webhook/nad/validator.go: Added overlay VM checks and new cache dependencies
  • pkg/utils/nad.go: Added GetVlanID() and ListAllNads() utility functions
  • pkg/utils/bridge.go: Added functions for VLAN device management
  • pkg/network/iface/vlan.go: New VLAN sub-interface management functions
  • pkg/network/vlan/vlan.go: Added GetBridgelink() method
  • pkg/utils/fakeclients/hostnetworkconfig.go: Fake client for testing HostNetworkConfig
  • pkg/generated/**: Generated client and controller code for HostNetworkConfig
Comments suppressed due to low confidence (1)

pkg/webhook/hostnetworkconfig/validator.go:1

  • The logic appears inverted. The check should verify if overlay VMs exist when the hostnetworkconfig has Underlay enabled, but the condition on line 362 checks if Underlay is true, then checks for overlay NADs. This seems backwards - if underlay is true, we should be checking for overlay dependencies. The condition should likely be !hostnetworkconfig.Spec.Underlay or the comment on line 242 should be updated to clarify the intended logic.

return nil
}
return c.restClient
}
Contributor Author

@rrajendran17 rrajendran17 Jan 19, 2026

pkg/generated/clientset/versioned/typed/v1/client.go: this file was generated as _client.go. Because the Go toolchain ignores source files whose names begin with an underscore, its definitions were not recognized by the rest of the package, so I had to rename the file from _client.go to client.go.

For all other existing generated files, the path format is pkg/generated/clientset/versioned/typed//v1/<component-name>_client.go, but for some reason the latest path generated for Node with a recent k8s.io version is pkg/generated/clientset/versioned/typed/v1, which is why the client file was named _client.go (empty component name). I am not sure whether this is a bug in k8s.io client-go.
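For context, the go command leaves out of a package any source file whose base name starts with "_" or ".", which explains why the definitions in _client.go were invisible. A minimal sketch of that filename rule follows; ignoredByGoBuild is a hypothetical helper written for illustration, not a client-go or standard-library function.

```go
package main

import (
	"path/filepath"
	"strings"
)

// ignoredByGoBuild is a hypothetical helper that mirrors the go command's rule:
// source files whose base name starts with "_" or "." are not part of the package.
func ignoredByGoBuild(path string) bool {
	name := filepath.Base(path)
	return strings.HasPrefix(name, "_") || strings.HasPrefix(name, ".")
}
```

Under this rule, renaming the file to client.go is what brings it back into the package.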

@w13915984028 w13915984028 self-requested a review January 22, 2026 12:54
@mergify
mergify bot commented Jan 22, 2026

This pull request is now in conflict. Could you fix it @rrajendran17? 🙏

Member

@w13915984028 w13915984028 left a comment

Some questions; the review is still ongoing. Thanks.

Member

@w13915984028 w13915984028 left a comment

thanks.

@w13915984028 w13915984028 requested a review from a team January 23, 2026 08:46
@mergify
mergify bot commented Jan 28, 2026

This pull request is now in conflict. Could you fix it @rrajendran17? 🙏

@rrajendran17
Contributor Author

Thanks @w13915984028 for the review. I have updated the PR addressing review comments.

@w13915984028
Member

w13915984028 commented Jan 30, 2026

Another topic that could be tracked in a new issue/PR: the handling of MTU changes.

Currently, if a user plans to change the MTU, all VMs must be stopped, and the new MTU is propagated to the VMs when they start again.

The L3 sub-interface and its controller need to consider:

(1) The sub-interface inherits the current MTU from its parent cluster network.

(2) The controller detects MTU changes and updates the managed sub-interfaces with the new MTU.

(3) When the MTU is changed, should L3 workloads be stopped or not? This matters especially in the underlay case: when, e.g., the MTU of cn2-br.2026 is changed from 1500 to 9000, can OVN detect this automatically and adapt to it?
(The VMs on top of OVN will still be forced to stop due to the NAD checks.)
This depends on real tests; in the worst case, OVN has to be disabled and enabled again. For third-party usage, we need to mention this in the documentation.

thanks.

@rrajendran17
Contributor Author

Thanks for bringing this up.
Yes, the hostnetworkconfig controller should handle MTU changes and update the VLAN sub-interface.
In non-overlay networks, an MTU update on the NAD from the associated cluster network propagates the new MTU to the VM guest OS interface and to the veth interface on the host.
In overlay networks, however, there is no associated cluster network, and all logical ports of the VM connect to the OVS bridge.
The underlay should continue to handle traffic at the new MTU like any other physical interface; I do not see any challenges there.
The challenge is on the VM side: how the new MTU will be propagated to the overlay VM inside the guest OS.
I will create a separate task to test this feasibility and add changes accordingly.
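Points (1) and (2) of the MTU discussion amount to a small decision the controller could make on every reconcile. This is a minimal sketch under assumptions: subInterface and reconcileMTU are hypothetical names, and actually applying the MTU to the link (for example through netlink) is left out. One Linux detail is relevant here: when a parent device's MTU shrinks, the kernel lowers the MTU of its VLAN devices automatically, but when it grows (as in the 1500 to 9000 example) the VLAN devices keep their old MTU until something raises it explicitly.

```go
package main

// subInterface is a hypothetical view of a managed VLAN sub-interface
// (for example cn2-br.2026) and the current MTU of its parent bridge.
type subInterface struct {
	Name      string
	MTU       int
	ParentMTU int
}

// reconcileMTU returns the MTU the sub-interface should have and whether it
// must be changed. The sub-interface inherits the parent's MTU, which is also
// the largest MTU Linux accepts for a VLAN device. A non-positive ParentMTU is
// treated as "unknown" and leaves the sub-interface untouched.
func reconcileMTU(s subInterface) (int, bool) {
	if s.ParentMTU <= 0 {
		return s.MTU, false
	}
	return s.ParentMTU, s.MTU != s.ParentMTU
}
```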

Member

@w13915984028 w13915984028 left a comment

Some questions, thanks.

// check if any overlay vm exists for the cluster network used by the nad as underlay
func (v *Validator) checkOverlayVMsUsingClusterNetwork(nad *cniv1.NetworkAttachmentDefinition, vlanID int) error {
clusterNetwork := utils.GetNadLabel(nad, utils.KeyClusterNetworkLabel)

Member

From the context below, it looks like:

Must the input NAD (being deleted) be an IsVlanAccessMode NAD? If so, please check that first, or check vid == 0 and return early; this check does not care about overlay-type NADs.

func (nc *NetConf) GetVlanID() int {
	if nc.IsVlanAccessMode() {
		return nc.Vlan
	}
	return 0
}
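The early return suggested in this comment could look like the following minimal sketch. netConf, validateNADDelete, and the overlayVMsExist callback are hypothetical stand-ins for the NAD config and the cache lookups, and the Mode values are assumptions; only GetVlanID mirrors the method quoted above.

```go
package main

import "errors"

// netConf is a hypothetical, trimmed-down NAD config; the Mode values
// "access" and "overlay" are assumptions for this sketch.
type netConf struct {
	Mode string
	Vlan int
}

func (nc *netConf) IsVlanAccessMode() bool { return nc.Mode == "access" }

// GetVlanID mirrors the quoted method: only access-mode NADs carry a VLAN ID.
func (nc *netConf) GetVlanID() int {
	if nc.IsVlanAccessMode() {
		return nc.Vlan
	}
	return 0
}

var errOverlayVMsExist = errors.New("overlay VMs still use this VLAN as underlay")

// validateNADDelete returns early when the NAD has no VLAN ID (for example an
// overlay NAD), so the overlay-VM lookup only runs for VLAN access-mode NADs.
func validateNADDelete(nc *netConf, overlayVMsExist func(vlanID int) bool) error {
	vid := nc.GetVlanID()
	if vid == 0 {
		return nil
	}
	if overlayVMsExist(vid) {
		return errOverlayVMsExist
	}
	return nil
}
```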


if hostnetworkconfig.Spec.Underlay && int(hostnetworkconfig.Spec.VlanID) == vlanID {
if err := v.checkifVMExistsForOverlayNADs(); err != nil {
return fmt.Errorf("hostnetworkconfig %s uses overlay nads on cluster network %s, %w", hostnetworkconfig.Name, clusterNetwork, err)
Member

hostnetworkconfig %s is underlay and is being used by overlay nad ...

}

//no more than one underlay exists for overlay networks. If user has to change underlay setting, old underlay has to be disabled first.
if hostnetworkconfig.Spec.Underlay && newhnc.Spec.Underlay {
Member

It seems the following needs to be added at L251 to avoid checking against itself in the update case:

		if hostnetworkconfig.Name == newhnc.Name {
			continue
		}
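A self-contained version of the "only one underlay" rule with the self-skip in place might look like this. It is a sketch, not the PR's validator: hostNetworkConfig is a hypothetical, trimmed-down type and checkSingleUnderlay is a hypothetical name.

```go
package main

import "fmt"

// hostNetworkConfig is a hypothetical, trimmed-down HostNetworkConfig.
type hostNetworkConfig struct {
	Name     string
	Underlay bool
}

// checkSingleUnderlay rejects newhnc if another config is already the underlay.
// The object being validated is skipped, so updating the current underlay does
// not conflict with its own cached copy.
func checkSingleUnderlay(newhnc hostNetworkConfig, existing []hostNetworkConfig) error {
	if !newhnc.Underlay {
		return nil
	}
	for _, hnc := range existing {
		if hnc.Name == newhnc.Name {
			continue
		}
		if hnc.Underlay {
			return fmt.Errorf("hostnetworkconfig %s is already the underlay; disable it before enabling %s", hnc.Name, newhnc.Name)
		}
	}
	return nil
}
```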


//check if vlanconfig contains all the nodes in the cluster
for _, node := range nodes {
if !matchedNodes.Contains(node.Name) {
Member

Should nodes with node.DeletionTimestamp != nil be skipped?

@mergify
mergify bot commented Feb 27, 2026

This pull request is now in conflict. Could you fix it @rrajendran17? 🙏

@rrajendran17 rrajendran17 force-pushed the ipconnectivity branch 3 times, most recently from d45f10a to 5741595 March 2, 2026 03:17
@rrajendran17 rrajendran17 requested a review from a team March 2, 2026 16:48
@rrajendran17
Contributor Author

rrajendran17 commented Mar 2, 2026

@ibrokethecloud I have addressed the following comments from our last discussion. Please have a look. Thanks.

1. Use status as a subresource.
2. Use a Patch command to update the status of each node in the hostnetworkconfig to avoid conflicts.
3. Include a node selector in the hostnetworkconfig API so that users can selectively configure the VLAN sub-interface on nodes.

Regarding 3:
Case 1: Update the node selector on a hostnetworkconfig; the network controller agent adds or removes the interface based on the node selector labels.
Case 2: Label a node; when a hostnetworkconfig with a matching node selector exists, the network controller agent adds or removes the VLAN interface on the node.
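Both cases reduce to the same per-node decision in the agent. A minimal sketch, assuming an equality-based node selector in which an empty selector matches every node; selectorMatches and desiredAction are hypothetical helpers, and the real controller may well use k8s.io/apimachinery label selectors instead.

```go
package main

// selectorMatches reports whether a node's labels satisfy an equality-based
// node selector: every key in selector must be present with the same value.
// An empty selector matches every node (an assumption for this sketch).
func selectorMatches(selector, nodeLabels map[string]string) bool {
	for k, v := range selector {
		if got, ok := nodeLabels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// desiredAction is a hypothetical per-node decision: add the VLAN
// sub-interface if the node matches and it is missing, remove it if the node
// no longer matches and it still exists, otherwise do nothing.
func desiredAction(selector, nodeLabels map[string]string, intfExists bool) string {
	match := selectorMatches(selector, nodeLabels)
	switch {
	case match && !intfExists:
		return "add"
	case !match && intfExists:
		return "remove"
	default:
		return "none"
	}
}
```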

charts changes: harvester/charts#456

Note: I will create a separate task to handle node label updates that cause removal of VLAN sub-interfaces on a node used as the underlay. We need node webhook validations to restrict this, and they must be handled in the harvester repo.

Member

@w13915984028 w13915984028 left a comment

Some last questions, thanks.

if intfExists {
return h.removeHostNetworkInterface(hnc, true)
} else {
return hnc, nil
Member

removeHostNetworkInterface can fail at its last step, removeHostNetworkPerNodeStatus. When the reconciler runs again, vlan.GetVlan(hnc.Spec.ClusterNetwork) reports that the VLAN no longer exists, so removeHostNetworkPerNodeStatus never gets another chance to be called.

removeHostNetworkInterface

  v, err := vlan.GetVlan(hnc.Spec.ClusterNetwork)
  ...
  h.removeHostNetworkPerNodeStatus

Consider restructuring it as:

if !matchNodeSet {
	if intfExists {
		return h.removeHostNetworkInterface(hnc, true)
	}

	// always ensure the status is cleaned
	err := h.removeHostNetworkPerNodeStatus(hnc)
	if err != nil {
		return nil, err
	}
	return hnc, nil
}
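The failure mode and the fix can be simulated with a small in-memory model. Everything here is hypothetical: fakeHandler only tracks whether the link exists and whether the node's status entry exists, and failStatusOps injects failing status updates. The first pass removes the link but fails to clean the status; with the restructured branch, the requeued pass still cleans it.

```go
package main

import "errors"

// fakeHandler is a hypothetical stand-in for the agent handler.
type fakeHandler struct {
	intfExists    bool // VLAN sub-interface still present on the node
	nodeStatus    bool // hnc.Status.NodeStatus[nodeName] still present
	failStatusOps int  // number of upcoming status updates that fail
}

// removeStatus is idempotent: it does nothing if the entry is already gone.
func (h *fakeHandler) removeStatus() error {
	if !h.nodeStatus {
		return nil
	}
	if h.failStatusOps > 0 {
		h.failStatusOps--
		return errors.New("status update conflict")
	}
	h.nodeStatus = false
	return nil
}

// removeInterface deletes the link first; the status update may still fail.
func (h *fakeHandler) removeInterface() error {
	h.intfExists = false
	return h.removeStatus()
}

// onNodeUnmatched follows the restructured flow: the status is cleaned even
// when the interface is already gone.
func (h *fakeHandler) onNodeUnmatched() error {
	if h.intfExists {
		return h.removeInterface()
	}
	return h.removeStatus()
}
```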

return "", fmt.Errorf("no matching IP found for node %s", nodeName)
}
func (h *Handler) removeHostNetworkPerNodeStatus(hnc *networkv1.HostNetworkConfig) error {
hnCopy := hnc.DeepCopy()
Member

Add the following before L286:

if hnc.Status.NodeStatus == nil || hnc.Status.NodeStatus[nodeName] == nil {
	return nil
}

This makes removeHostNetworkPerNodeStatus idempotent, so it can be called freely from OnChange.
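Combined with the per-node status patching mentioned elsewhere in this PR, removing one node's entry could be done with a JSON merge patch (RFC 7386), where a null value deletes a key; custom resources do not support strategic merge patch, so a merge patch is a natural fit. This is a minimal sketch: the nodeStatus JSON field name, nodeStatusRemovalPatch, and needsRemoval are assumptions, NodeStatus values are simplified to strings, and sending the patch to the status subresource is left out.

```go
package main

import "encoding/json"

// nodeStatusRemovalPatch builds a JSON merge patch that deletes one node's
// entry from status.nodeStatus (the field name is an assumption).
func nodeStatusRemovalPatch(nodeName string) ([]byte, error) {
	patch := map[string]any{
		"status": map[string]any{
			"nodeStatus": map[string]any{nodeName: nil},
		},
	}
	return json.Marshal(patch)
}

// needsRemoval mirrors the proposed guard: skip the API call when the node
// has no status entry, which keeps the removal idempotent.
func needsRemoval(nodeStatus map[string]*string, nodeName string) bool {
	return nodeStatus != nil && nodeStatus[nodeName] != nil
}
```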


//update nodestatus when interface deleted due to node selector changes.
if onChange {
return nil, h.removeHostNetworkPerNodeStatus(hnc)
Member

Should return nil, h.removeHostNetworkPerNodeStatus(hnc) be the following instead?

	err := h.removeHostNetworkPerNodeStatus(hnc)
	if err != nil {
		return nil, err
	}
	return hnc, nil

}

//vlan interface chosen as underlay must have vlanconfig spanning all nodes
if err := v.checkVCSpansAllNodes(newhnc.Spec.ClusterNetwork); err != nil {
Member

@w13915984028 w13915984028 Mar 2, 2026

I recalled one scenario:

Besides mgmt, every other cluster network can skip the witness node (the witness node normally only has mgmt-related NICs).
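The witness-node exception could be folded into the span check as a filter on the set of nodes a cluster network must cover. This is a sketch under assumptions: requiredNodes is hypothetical, and isWitness stands in for however witness nodes are identified (for example a node role label, which this PR does not show).

```go
package main

// requiredNodes returns the nodes a cluster network must span. Only mgmt must
// span every node; other cluster networks may skip witness nodes.
func requiredNodes(clusterNetwork string, nodes []string, isWitness func(string) bool) []string {
	var out []string
	for _, n := range nodes {
		if clusterNetwork != "mgmt" && isWitness(n) {
			continue
		}
		out = append(out, n)
	}
	return out
}
```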

@w13915984028 w13915984028 requested a review from a team March 2, 2026 21:36
cnClient ctlnetworkv1.ClusterNetworkClient
cnCache ctlnetworkv1.ClusterNetworkCache
cnController ctlnetworkv1.ClusterNetworkController
hostNetworkConfigClient ctlnetworkv1.HostNetworkConfigClient
Contributor

The client does not seem to be used anywhere.

}

//reconcile hostnetworkconfig to stop DHCP lease managers associated with the removed uplink
if err := h.reconcileHostNetwork(vs.Status.ClusterNetwork); err != nil {
Contributor

The only purpose of reconcileHostNetwork seems to be triggering a requeue of HostNetworkConfig when the VlanConfig changes.

We can move all of this directly to the hostnetworkconfig controller in the agent and leverage the relatedresource handler from wrangler:

https://github.com/harvester/harvester/blob/master/pkg/controller/master/addon/addon.go#L61

This will ensure all changes for hostnetworkconfig exist in that controller only.
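In wrangler's relatedresource package, a Resolver maps a changed object to the keys of related objects, and relatedresource.Watch enqueues those keys on the target controller. The core of such a resolver for this case, written without the wrangler types so the sketch stays self-contained, might look like this; hnc, key, and resolveHNCsForVlanConfig are hypothetical.

```go
package main

// key identifies a cluster-scoped object by name, playing the role of
// relatedresource.Key in this sketch.
type key struct{ Name string }

// hnc is a hypothetical, trimmed-down HostNetworkConfig.
type hnc struct {
	Name           string
	ClusterNetwork string
}

// resolveHNCsForVlanConfig maps a changed VlanConfig (identified here by the
// cluster network it configures) to the HostNetworkConfigs to requeue: every
// config built on top of that cluster network.
func resolveHNCsForVlanConfig(clusterNetwork string, all []hnc) []key {
	var keys []key
	for _, h := range all {
		if h.ClusterNetwork == clusterNetwork {
			keys = append(keys, key{Name: h.Name})
		}
	}
	return keys
}
```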

Contributor Author

The problem is ordering. On removal, the vlanconfig must have removed the cluster network uplink successfully before the hostnetworkconfig controller cleans up the associated resources; on add or update of a vlanconfig, the cluster network uplink must be available before the hostnetworkconfig controller creates sub-interfaces on top of it.
Would it be better to enqueue the hostnetworkconfig controller from the vlanconfig controller in that case?
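Enqueuing from the vlanconfig side, as proposed here, keeps the ordering explicit: HostNetworkConfigs are only requeued once the uplink work has succeeded. A minimal sketch, assuming hypothetical names throughout: onVlanConfigChange, setupUplink, and enqueue stand in for the real uplink logic and the hostnetworkconfig controller's enqueue call.

```go
package main

import "errors"

// errUplinkNotReady is a hypothetical error a setupUplink step might return.
var errUplinkNotReady = errors.New("cluster network uplink not ready")

// onVlanConfigChange enqueues the HostNetworkConfigs of a cluster network only
// after the uplink has been attached (or detached) successfully, so the
// hostnetworkconfig controller never acts on a half-configured uplink.
func onVlanConfigChange(setupUplink func() error, hncNames []string, enqueue func(name string)) error {
	if err := setupUplink(); err != nil {
		return err // the vlanconfig is retried; no hostnetworkconfig work is triggered
	}
	for _, name := range hncNames {
		enqueue(name)
	}
	return nil
}
```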

}

// node selector matches and host network interface already exists, skip processing
if intfExists {
Contributor

Is it possible that the link is set up but the HostNetworkConfig status update fails? The object would be requeued, the interface would be found, and the controller would exit without actually updating the HostNetworkConfig status.

Contributor Author

Yes, that scenario is possible. I will add code to update the hostnetworkconfig status even if the interface already exists.
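The agreed fix, writing the status on every pass instead of returning as soon as the interface is found, can be sketched as follows. nodeState, reconcileMatchedNode, and errStatusConflict are hypothetical, and createIntf and updateStatus are placeholder callbacks for the real link setup and status patch.

```go
package main

import "errors"

// errStatusConflict is a hypothetical error a status update might return.
var errStatusConflict = errors.New("status update conflict")

// nodeState is a hypothetical snapshot of one node's view of a HostNetworkConfig.
type nodeState struct {
	IntfExists  bool
	StatusReady bool
}

// reconcileMatchedNode creates the interface only if it is missing, but still
// writes the status when it is not yet recorded, so a status update that
// failed on an earlier pass is retried.
func reconcileMatchedNode(s *nodeState, createIntf, updateStatus func() error) error {
	if !s.IntfExists {
		if err := createIntf(); err != nil {
			return err
		}
		s.IntfExists = true
	}
	if s.StatusReady {
		return nil // status already reflects the interface
	}
	if err := updateStatus(); err != nil {
		return err
	}
	s.StatusReady = true
	return nil
}
```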

w13915984028 previously approved these changes Mar 3, 2026
Member

@w13915984028 w13915984028 left a comment

LGTM, thanks for your big efforts.

@mergify
mergify bot commented Mar 4, 2026

This pull request is now in conflict. Could you fix it @rrajendran17? 🙏

oldhnc := oldObj.(*networkv1.HostNetworkConfig)
newhnc := newObj.(*networkv1.HostNetworkConfig)

// ignore the update if the resource is being deleted
Contributor

nit: not needed, as the Delete handler will receive the deletion requests.

vmiCache ctlkubevirtv1.VirtualMachineInstanceCache,
cnCache ctlnetworkv1.ClusterNetworkCache) *Validator {
cnCache ctlnetworkv1.ClusterNetworkCache,
hncCache ctlnetworkv1.HostNetworkConfigCache,
Contributor

hncCache and vmCache seem unused

ibrokethecloud previously approved these changes Mar 4, 2026
Contributor

@ibrokethecloud ibrokethecloud left a comment

lgtm. thanks for the PR.

Member

@w13915984028 w13915984028 left a comment

LGTM, thanks.

Signed-off-by: Renuka Devi Rajendran <renuka.rajendran@suse.com>
@rrajendran17 rrajendran17 merged commit eb271b2 into harvester:master Mar 4, 2026
5 checks passed
4 participants