Implement RDMA subsystem mode change #666
Thanks for your PR,
To skip the vendors CIs use one of:
|
c986251 to
d3641ab
Compare
d3641ab to
a0af988
Compare
a0af988 to
a3ce3a3
Compare
a3ce3a3 to
3952e04
Compare
3952e04 to
bcb0804
Compare
Pull Request Test Coverage Report for Build 11229466077 (Coveralls)
pkg/systemd/systemd.go
Outdated
| newState.Spec.DpConfigVersion = "" | ||
|
|
||
| // shared mode is a default on OS | ||
| rdmaMode := consts.RdmaSubsystemModeShared |
I think we should query or change the mode only if the rdmaMode parameter is explicitly set in the poolConfig, to provide safer behavior for environments that don't use RDMA.
| hostHelpers.TryEnableTun() | ||
| hostHelpers.TryEnableVhostNet() | ||
|
|
||
| rdmaSubsystem, err := hostHelpers.GetRDMASubsystem() |
I think we should execute this logic only if mode configuration is explicitly requested by a user.
done
|
|
||
| // +kubebuilder:validation:Enum=shared;exclusive | ||
| // RDMA subsystem. Allowed value "shared", "exclusive". | ||
| RdmaMode string `json:"rdmaMode,omitempty"` |
Is this option only valid for systemd mode?
Do we want to document this somehow?
Done, as a log message in the SriovNetworkPoolConfig controller.
@e0ne can't we set this using a module parameter?
bcb0804 to
79013d7
Compare
79013d7 to
f60fdf7
Compare
ykulazhenkov
left a comment
I added a few additional comments.
| if conf.RdmaMode != "" { | ||
| rdmaSubsystem, err := hostHelpers.GetRDMASubsystem() | ||
| if err != nil { | ||
| setupLog.Error(err, "failed to get RDMA subsystem mode") |
If conf.RdmaMode is not an empty string, then the user explicitly requested RDMA mode configuration. I think we can return an error in this case. WDYT?
done
| if rdmaSubsystem != conf.RdmaMode { | ||
| err = hostHelpers.SetRDMASubsystem(conf.RdmaMode) | ||
| if err != nil { | ||
| setupLog.Error(err, "failed to set RDMA subsystem mode") |
Do we want to return error here?
we want, thanks!
pkg/host/internal/kernel/kernel.go
Outdated
|
|
||
| func (k *kernel) GetRDMASubsystem() (string, error) { | ||
| log.Log.Info("GetRDMASubsystem(): retrieving RDMA subsystem mode") | ||
| chrootDefinition := utils.GetChrootExtension() |
We have a helper to enter chroot (part of utilsHelper). Do we want to use it here?
We've got the same implementation in all `kernel` methods. Let's do it in the scope of a separate PR.
f60fdf7 to
a5f0d3b
Compare
feb4bd0 to
ab92392
Compare
ab92392 to
f37c6d1
Compare
@SchSeba could you please review this PR?
f37c6d1 to
3d19033
Compare
pkg/utils/cluster.go
Outdated
| return AnnotateObject(ctx, node, key, value, c) | ||
| } | ||
|
|
||
| func FindNodePoolConfig(ctx context.Context, node *corev1.Node, c client.Client) (*sriovnetworkv1.SriovNetworkPoolConfig, []corev1.Node, error) { |
nit: add docstring
Also, I'm thinking we should have two functions:
- find node pool for node
- find nodes for node pool (with special handling for the case where the default node pool was provided)
WDYT?
Also, please add unit tests for whatever we end up with.
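One possible shape for the two-function split proposed above is sketched here. Everything in it is an assumption made for illustration: `node` and `poolConfig` are simplified stand-ins for `corev1.Node` and `SriovNetworkPoolConfig`, selectors are plain label maps, and explicit pools are assumed to have non-empty selectors.

```go
package main

import "fmt"

// node and poolConfig are hypothetical, simplified stand-ins for the real
// Kubernetes types; only the fields needed for matching are modeled.
type node struct {
	Name   string
	Labels map[string]string
}

type poolConfig struct {
	Name     string
	Selector map[string]string
}

// matches reports whether every selector key/value is present in labels.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// findPoolForNode returns the single pool selecting n, the default pool when
// no pool selects it, or an error when more than one pool matches.
func findPoolForNode(n node, pools []poolConfig, def poolConfig) (poolConfig, error) {
	var found []poolConfig
	for _, p := range pools {
		if matches(p.Selector, n.Labels) {
			found = append(found, p)
		}
	}
	switch len(found) {
	case 0:
		return def, nil
	case 1:
		return found[0], nil
	default:
		return poolConfig{}, fmt.Errorf("node %s matches %d pools", n.Name, len(found))
	}
}

// findNodesForPool lists the nodes owned by p. For the default pool this is
// every node that no explicit pool selects.
func findNodesForPool(p poolConfig, nodes []node, pools []poolConfig, def poolConfig) ([]node, error) {
	var out []node
	for _, n := range nodes {
		owner, err := findPoolForNode(n, pools, def)
		if err != nil {
			return nil, err
		}
		if owner.Name == p.Name {
			out = append(out, n)
		}
	}
	return out, nil
}

func main() {
	def := poolConfig{Name: "default"}
	pools := []poolConfig{{Name: "rdma", Selector: map[string]string{"pool": "rdma"}}}
	nodes := []node{
		{Name: "worker-0", Labels: map[string]string{"pool": "rdma"}},
		{Name: "worker-1"},
	}
	fmt.Println(findPoolForNode(nodes[0], pools, def))
	fmt.Println(findNodesForPool(def, nodes, pools, def))
}
```

Because `findNodesForPool` reuses `findPoolForNode`, every listed node is checked against the "exactly one pool" rule, and both functions are small enough to unit test with plain values.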
+1
pkg/utils/cluster.go
Outdated
|
|
||
| var ( | ||
| oneNode = intstr.FromInt32(1) | ||
| defaultNpcl = &sriovnetworkv1.SriovNetworkPoolConfig{Spec: sriovnetworkv1.SriovNetworkPoolConfigSpec{ |
nit: can we use the full name here, e.g. defaultPoolConfig?
Also, the 'l' at the end seems unrelated.
pkg/utils/cluster.go
Outdated
| return nil, nil, err | ||
| } | ||
|
|
||
| // list all the nodes that are also part of this pool and return them |
For those nodes, why aren't we validating that they match exactly one pool config, like in L223?
pkg/daemon/daemon.go
Outdated
| if vars.UsingSystemdMode { | ||
| log.Log.V(0).Info("nodeStateSyncHandler(): writing systemd config file to host") | ||
| // get node object | ||
| node := &corev1.Node{} |
I don't see node being used in this scope.
| mountPath: /host/etc/os-release | ||
| readOnly: true | ||
| {{- end }} | ||
| {{- if .RDMACNIImage }} |
Hi, please rebase this PR now that we have merged the rdma-cni deployment.
| hostHelpers.TryEnableTun() | ||
| hostHelpers.TryEnableVhostNet() | ||
|
|
||
| if conf.Spec.System.RdmaMode != "" { |
I think we can remove this one, as we do the configuration via the modprobe file.
| } | ||
|
|
||
| type System struct { | ||
| RdmaMode string `json:"rdmaMode,omitempty"` |
I think you can also add this here:
// +kubebuilder:validation:Enum=shared;exclusive
// RDMA subsystem. Allowed value "shared", "exclusive".
done
| ns.Name = node.Name | ||
| ns.Namespace = vars.Namespace | ||
| j, _ := json.Marshal(ns) | ||
| netPoolConfig, _, err := utils.FindNodePoolConfig(context.Background(), &node, r.Client) |
Please use the context from the function; don't create a new one.
| } | ||
|
|
||
| // RdmaMode could be set in systemd mode only | ||
| if instance.Spec.RdmaMode != "" { |
Please remove this one, as we support this in both modes.
pkg/host/internal/kernel/kernel.go
Outdated
| return nil | ||
| } | ||
|
|
||
| func (k *kernel) DiscoverRDMASubsystem() (string, error) { |
I think we can move this function to the network or sriov package
pkg/host/internal/kernel/kernel.go
Outdated
|
|
||
| func (k *kernel) DiscoverRDMASubsystem() (string, error) { | ||
| log.Log.Info("DiscoverRDMASubsystem(): retrieving RDMA subsystem mode") | ||
| subsystem, err := netlink.RdmaSystemGetNetnsMode() |
please use the netlink interface in the project so we can have a mock for it on unit tests
pkg/host/internal/kernel/kernel.go
Outdated
| return subsystem, nil | ||
| } | ||
|
|
||
| func (k *kernel) SetRDMASubsystem(mode string) error { |
I think this function is not needed now that we use the modprobe file.
pkg/utils/cluster.go
Outdated
| return AnnotateObject(ctx, node, key, value, c) | ||
| } | ||
|
|
||
| func FindNodePoolConfig(ctx context.Context, node *corev1.Node, c client.Client) (*sriovnetworkv1.SriovNetworkPoolConfig, []corev1.Node, error) { |
+1
e0ae318 to
8ce9b72
Compare
|
Hi @e0ne, can you please rebase the PR?
done
e294415 to
288e028
Compare
SchSeba
left a comment
nice work!
I left some small comments
controllers/drain_controller.go
Outdated
| } | ||
| return defaultNpcl, defaultNodeLists, nil | ||
| } | ||
| return utils.FindNodePoolConfig(ctx, node, dr.Client) |
Should we put this in the helper of the controllers?
I don't want utils to start growing again.
it makes sense. done
| ns.Name = node.Name | ||
| ns.Namespace = vars.Namespace | ||
| j, _ := json.Marshal(ns) | ||
| netPoolConfig, _, err := utils.FindNodePoolConfig(ctx, &node, r.Client) |
Just a general TODO here: I think we should have an in-memory map for this.
could you please elaborate on this?
pkg/host/internal/network/network.go
Outdated
| "github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/vars" | ||
| ) | ||
|
|
||
| var ManifestsPath = "./bindata/manifests" |
Let's put this in consts.
It's not needed anymore, so I deleted it
pkg/host/internal/network/network.go
Outdated
| modeValue = 0 | ||
| } | ||
| config := fmt.Sprintf("options ib_core netns_mode=%d\n", modeValue) | ||
| err := os.WriteFile("/etc/modprobe.d/ib_core.conf", []byte(config), 0644) |
Please use the getExtention helper here so we know whether we are inside a chroot.
pkg/host/internal/network/network.go
Outdated
| return fmt.Errorf("failed to write ib_core config: %v", err) | ||
| } | ||
|
|
||
| err = os.WriteFile(path.Join(consts.Chroot, "/etc/modprobe.d/ib_core.conf"), []byte(config), 0644) |
this looks like a duplicate?
rebase issue, it's deleted now
pkg/render/render.go
Outdated
| return out, nil | ||
| } | ||
|
|
||
| func RenderToString(path string, d *RenderData) (string, error) { |
I don't think we need this function. Where do we use it?
pkg/utils/cluster.go
Outdated
| return AnnotateObject(ctx, node, key, value, c) | ||
| } | ||
|
|
||
| func FindNodePoolConfig(ctx context.Context, node *corev1.Node, c client.Client) (*sriovnetworkv1.SriovNetworkPoolConfig, []corev1.Node, error) { |
Please move this function to the helpers in controllers; that is better than adding more stuff to utils.
deleted
288e028 to
596e73d
Compare
596e73d to
4e5c92b
Compare
4e5c92b to
60432e0
Compare
| ns.Name = node.Name | ||
| ns.Namespace = vars.Namespace | ||
| j, _ := json.Marshal(ns) | ||
| netPoolConfig, _, err := findNodePoolConfig(ctx, &node, r.Client) |
nit: move this before L274 so j contains the RDMA mode information?
| j, _ := json.Marshal(ns) | ||
| netPoolConfig, _, err := findNodePoolConfig(ctx, &node, r.Client) | ||
| if err != nil { | ||
| log.Log.Error(err, "nodeStateSyncHandler(): failed to get SriovNetworkPoolConfig for the current node") |
nit: the function name in the error message is wrong.
| // IsLinkAdminStateUp checks if the admin state of a link is up | ||
| IsLinkAdminStateUp(link Link) bool | ||
| // DiscoverRDMASubsystem returns RDMA subsystem mode | ||
| DiscoverRDMASubsystem() (string, error) |
nit: any chance to stick to the method name from the netlink lib (RdmaSystemGetNetnsMode)?
| return subsystem, nil | ||
| } | ||
|
|
||
| func (n *network) SetRDMASubsystem(mode string) error { |
Should we make a distinction between the following cases?
- mode is "shared"
- mode is "exclusive"
- mode is unspecified (i.e. ""), which means the system default
The last case would mean we need to delete the file.
Changing the default value in the kernel is a one-line change:
https://github.com/torvalds/linux/blob/d3d1556696c1a993eec54ac585fe5bf677e07474/drivers/infiniband/core/device.c#L127
| modeValue = 0 | ||
| } | ||
| config := fmt.Sprintf("options ib_core netns_mode=%d\n", modeValue) | ||
| path := filepath.Join(vars.FilesystemRoot, consts.Host, "etc", "modprobe.d", "ib_core.conf") |
Maybe use a more unique name, e.g. sriov_network_operator_modules_config.conf?
ib_core.conf feels like a file that might already exist with some values that we would override when rewriting it.
Also, I wonder if we should search all conf files to see whether the value is already set there or whether we have a conflict, and log it.
Generally we don't expect this module parameter to be specified in the system.
| if mode == "exclusive" { | ||
| modeValue = 0 | ||
| } | ||
| config := fmt.Sprintf("options ib_core netns_mode=%d\n", modeValue) |
Perhaps add a comment to the beginning of the file, like:
# This file is managed by sriov-network-operator do not edit.
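As a small sketch of that suggestion, the file content could be rendered with the marker line first, which also lets the operator recognize files it owns before overwriting them. `renderIBCoreConfig`, `isManaged`, and `managedHeader` are hypothetical names; the header text is the one proposed in the comment, and `#` lines are treated as comments in modprobe.d files.

```go
package main

import (
	"fmt"
	"strings"
)

// managedHeader is the marker line proposed in the review comment.
const managedHeader = "# This file is managed by sriov-network-operator do not edit."

// renderIBCoreConfig returns the modprobe options file content for the given
// mode, prefixed with the managed-by marker.
func renderIBCoreConfig(mode string) (string, error) {
	var value int
	switch mode {
	case "shared":
		value = 1
	case "exclusive":
		value = 0
	default:
		return "", fmt.Errorf("unsupported RDMA mode %q", mode)
	}
	return fmt.Sprintf("%s\noptions ib_core netns_mode=%d\n", managedHeader, value), nil
}

// isManaged reports whether existing file content starts with the marker.
func isManaged(content string) bool {
	return strings.HasPrefix(content, managedHeader)
}

func main() {
	out, err := renderIBCoreConfig("exclusive")
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
	fmt.Println(isManaged(out), isManaged("options ib_core netns_mode=1\n"))
}
```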
| reqReboot = reqReboot || r | ||
| } | ||
|
|
||
| if dn.currentNodeState.Status.System.RdmaMode != dn.desiredNodeState.Spec.System.RdmaMode { |
We need to handle the case when dn.desiredNodeState.Spec.System.RdmaMode is empty (system default).
In this case we need to delete the file if it is present and decide whether a reboot is needed depending on the current kernel default.
root# modinfo ib_core
filename: /lib/modules/5.15.0-121-generic/kernel/drivers/infiniband/core/ib_core.ko
alias: rdma-netlink-subsys-4
license: Dual BSD/GPL
description: core kernel InfiniBand API
author: Roland Dreier
alias: net-pf-16-proto-20
alias: rdma-netlink-subsys-5
srcversion: C45D89EC6DCCFE96001D79F
depends:
retpoline: Y
intree: Y
name: ib_core
vermagic: 5.15.0-121-generic SMP mod_unload modversions
sig_id: PKCS#7
signer: Build time autogenerated kernel key
sig_key: 5E:7B:57:CA:17:D7:74:58:75:3F:84:AD:DE:07:46:5C:DC:AD:16:4E
sig_hashalgo: sha512
signature: AE:90:AA:07:BB:6C:07:8C:AD:25:51:4B:1A:C6:FC:9F:D1:14:5B:B9:
90:F0:F5:84:E6:85:10:7E:AD:79:B5:04:5E:38:CF:5F:EC:6C:CD:BD:
E5:BD:4D:4A:5D:7F:76:56:5E:DA:F0:C3:EA:63:98:0A:EE:B8:51:06:
42:8F:FD:08:51:28:DC:AD:4A:38:2E:A4:C4:7C:9E:42:4F:37:98:AD:
4D:8F:7F:5C:5C:41:93:27:62:C2:A1:D8:A0:5E:D5:15:25:5A:B9:C6:
8C:4D:17:CC:1F:A1:72:FE:18:5C:08:55:64:E6:A2:A7:2C:DD:57:1D:
03:A1:8C:12:17:76:61:72:E7:F9:A4:8F:F9:26:8F:36:02:8F:C6:56:
7B:A4:9E:6D:1D:ED:28:0E:7A:B5:81:F2:F0:FC:C4:05:0F:37:44:D3:
C6:F4:00:B9:81:E2:32:EB:9B:1B:8E:EF:E5:CA:73:8F:4D:5E:11:80:
51:80:EB:AD:EC:97:2D:30:15:E9:8F:6B:9B:DB:40:5F:89:99:94:B1:
01:16:82:EF:22:01:5A:0F:14:F2:DE:64:68:76:3F:8B:26:F5:E9:97:
E3:7F:DD:23:18:B2:A6:8F:8F:0F:A2:74:E1:B0:18:9F:E0:46:9F:7A:
BE:89:9C:B7:C6:D4:47:64:70:E9:28:69:DC:A1:B0:F9:CB:A3:84:67:
DF:68:A3:3D:E5:93:63:7D:91:A4:86:A9:CC:AA:DA:08:A8:64:97:D5:
CC:BB:13:BB:28:17:87:1B:10:1B:2C:43:A6:0D:A0:05:6F:DB:45:03:
1C:0B:C5:67:37:94:CB:E3:CB:CF:03:6F:81:80:F2:77:E1:FD:09:2A:
8F:0F:FE:EA:C0:B8:CD:14:D2:69:55:0F:2F:82:3D:2D:30:0B:6E:72:
42:0C:F4:AB:6C:F8:D4:CA:45:AF:74:C9:A1:5D:EC:BE:C6:8C:81:4B:
2F:F4:46:EE:F6:28:83:11:B5:0D:EE:38:53:68:EF:1E:AC:AC:A9:B0:
91:C6:76:D4:46:2E:DA:CB:47:66:99:42:84:E2:31:99:35:C2:A5:4B:
04:F8:6A:34:E7:8A:AA:76:F3:83:DF:A8:82:E9:C8:14:05:51:90:F3:
18:31:3D:A7:40:F8:EE:32:B9:F7:C2:01:9F:71:2A:B1:8C:00:34:0F:
F2:7C:DE:50:54:E3:CF:4B:EA:05:43:AF:E3:9D:A1:05:E6:A8:48:EE:
82:B7:6B:06:E3:C5:3D:AA:48:92:63:D8:7B:54:3E:F4:45:C7:5B:F6:
77:97:DD:32:93:ED:AC:DB:AD:EB:24:81:89:24:4F:25:A8:34:EA:63:
A1:D4:FC:D8:B2:B2:41:61:C3:D3:E3:F5
parm: send_queue_size:Size of send queue in number of work requests (int)
parm: recv_queue_size:Size of receive queue in number of work requests (int)
parm: netns_mode:Share device among net namespaces; default=1 (shared) (bool)
parm: force_mr:Force usage of MRs for RDMA READ/WRITE operations (bool)
Maybe parse the output of the command above for:
parm: netns_mode:Share device among net namespaces; default=1 (shared) (bool)
For now the default is shared; I don't know how likely it is to change. Maybe we can assume shared is the default.
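If assuming "shared" turns out not to be enough, the parsing idea from this thread could be sketched as below. `kernelDefaultRDMAMode` and `rebootNeeded` are hypothetical helpers; the regular expression only targets the `netns_mode` line format shown in the `modinfo ib_core` output above and may not hold for other kernel versions.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// netnsModeDefault captures the default value from a modinfo line such as:
//   parm: netns_mode:Share device among net namespaces; default=1 (shared) (bool)
var netnsModeDefault = regexp.MustCompile(`netns_mode:.*default=([01])`)

// kernelDefaultRDMAMode extracts the kernel's default RDMA mode from the
// output of `modinfo ib_core`.
func kernelDefaultRDMAMode(modinfo string) (string, error) {
	m := netnsModeDefault.FindStringSubmatch(modinfo)
	if m == nil {
		return "", errors.New("netns_mode default not found in modinfo output")
	}
	if m[1] == "1" {
		return "shared", nil
	}
	return "exclusive", nil
}

// rebootNeeded treats an empty desired mode as "use the kernel default" and
// reports whether the current mode differs from the effective target.
func rebootNeeded(current, desired, kernelDefault string) bool {
	if desired == "" {
		desired = kernelDefault
	}
	return current != desired
}

func main() {
	const sample = "parm:           force_mr:Force usage of MRs for RDMA READ/WRITE operations (bool)\n" +
		"parm:           netns_mode:Share device among net namespaces; default=1 (shared) (bool)\n"

	def, err := kernelDefaultRDMAMode(sample)
	fmt.Println(def, err)
	fmt.Println(rebootNeeded("exclusive", "", def))
}
```

With the sample line from this thread, an empty desired mode resolves to "shared", so a node currently in "exclusive" mode would be flagged for reboot.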
SchSeba
left a comment
In general, another point we need to take care of here: creating or updating a pool does nothing by itself, because we have to wait for the nodePolicy controller to apply the change.
We have two options here:
- we pass a channel so the pool controller can trigger the policy controller
- we handle the system section in the pool controller directly
I must say I am not sure which is the best option. @adrianchiris @zeeke WDYT?
|
/hold this doesn't work on OCP; it puts the node in a boot loop because the mode didn't change.
We can watch the pool object as well and trigger a reconcile event.
writing files under |
Sure, that can also work, as we don't expect machine changes on it.
I checked with our kernel team on the OCP platform. @e0ne, let me know if you want me to work on this and push the changes for you.
|
Closing this one in favor of #799.
It is now possible to configure the RDMA subsystem mode using the SR-IOV Network Operator in systemd mode.
The RDMA subsystem can't be configured in daemon mode because this must be done on the host before any network namespace is created.