Skip to content

Conversation

@Jaganathancse
Copy link
Contributor

@Jaganathancse Jaganathancse commented Mar 19, 2025

This change is to update default os-net-config nmstate provider for network configuration.
jira: https://issues.redhat.com/browse/OSPRH-6809

@openshift-ci openshift-ci bot requested review from frenzyfriday and viroel March 19, 2025 09:00
@Jaganathancse Jaganathancse requested review from dsneddon, karthiksundaravel and vcandapp and removed request for frenzyfriday and viroel March 19, 2025 09:00
Copy link

@karthiksundaravel karthiksundaravel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks

@ekuris-redhat
Copy link

lgtm

Copy link
Contributor

@jpodivin jpodivin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code change looks fine, but could you please add a link to JIRA item in your PR message?

@rabi
Copy link
Contributor

rabi commented Mar 19, 2025

We probably have to be careful. Are all network configurations used by customers going to work with the provider switch? What's the kind of testing we've done for it. CI would test only one sample configuration.

@vcandapp
Copy link
Contributor

We probably have to be careful. Are all network configurations used by customers going to work with the provider switch? What's the kind of testing we've done for it. CI would test only one sample configuration.

@rabi , we have an EPIC https://issues.redhat.com/browse/OSPRH-8768 which is under QE verification for now.
For FR2 atleast the plan is to enable this in only nodeset

@ekuris-redhat
Copy link

the PR is related to this Jira story : https://issues.redhat.com/browse/OSPRH-6809

@Jaganathancse
Copy link
Contributor Author

Code change looks fine, but could you please add a link to JIRA item in your PR message?

done

@Jaganathancse Jaganathancse requested a review from jpodivin March 19, 2025 11:42
@softwarefactory-project-zuul
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/49f28c4318cf47be931c0dc2796e3f40

✔️ openstack-k8s-operators-content-provider SUCCESS in 3h 29m 32s
podified-multinode-edpm-deployment-crc RETRY_LIMIT Host unreachable in 1h 12m 02s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 34m 44s
✔️ noop SUCCESS in 0s
edpm-ansible-tempest-multinode RETRY_LIMIT Host unreachable in 1h 36m 59s
adoption-standalone-to-crc-ceph-provider FAILURE in 3h 10m 54s

@slagle
Copy link
Contributor

slagle commented Mar 19, 2025

shouldn't there be some type of migration to stop/disable the network service if we are going to change the default? If a deployment was done with network-scripts originally, then we change the default, then the network settings need to be changed, it would then be done with nmstate. But isn't the old config put down with network-scripts still there that could conflict with the new settings?

@rabi
Copy link
Contributor

rabi commented Mar 20, 2025

shouldn't there be some type of migration to stop/disable the network service if we are going to change the default?

Yeah, I think for already configured nodes there won't be any change unless edpm_network_config_update is set to true during update/upgrade/subsequent deploy. But that would lead to some nodes configured with network-scripts and some with nmstate/NetworkManager. Not sure if that's something we want to get into. Also we should cleanup interfaces configured with network-scripts in https://github.com/os-net-config/os-net-config/blob/master/os_net_config/impl_nmstate.py#L2354 when provider is switched. I would suggest we don't change the default in a hurry without looking at and testing all scenarios.

@Jaganathancse
Copy link
Contributor Author

shouldn't there be some type of migration to stop/disable the network service if we are going to change the default? If a deployment was done with network-scripts originally, then we change the default, then the network settings need to be changed, it would then be done with nmstate. But isn't the old config put down with network-scripts still there that could conflict with the new settings?

We are working on ifcfg provider to nmstate provider migration automation and work is inprogress.
For greenfield deployment, we are planning to make default provider is the nmstate and still we can add this parameter in the deployed env for ifcfg provider making 'fakse'.
I think this will not be blocker if any issues happened, and also we have not changed hurry. this activity is planned for and hold to avoid any issues due to this.

@Jaganathancse
Copy link
Contributor Author

shouldn't there be some type of migration to stop/disable the network service if we are going to change the default?

Yeah, I think for already configured nodes there won't be any change unless edpm_network_config_update is set to true during update/upgrade/subsequent deploy. But that would lead to some nodes configured with network-scripts and some with nmstate/NetworkManager. Not sure if that's something we want to get into. Also we should cleanup interfaces configured with network-scripts in https://github.com/os-net-config/os-net-config/blob/master/os_net_config/impl_nmstate.py#L2354 when provider is switched. I would suggest we don't change the default in a hurry without looking at and testing all scenarios.

@rabi
We are suddenly planned and changing now. this activity planned for GA but hold this change to avoid issues and also had issues for NFV SRIOV and OVS DPDK deployments.
Looks like stable nmstate provider for NFV greenfield deployments, so we are planning now to make nmstate provider default.
Also this change will not be blocker in the deployment, still we can add this variable 'false' in nodeset CR to overcome deployment issues with ifcfg provider which is default now.

Also we are working on ifcfg provider to nmstate provider migration automation is in progress now.

Any how we need to update this variable 'false' for adoption ci jobs, we don't have ifcfg provider to nmstate provider migration in OSP 17 now.

@Jaganathancse
Copy link
Contributor Author

shouldn't there be some type of migration to stop/disable the network service if we are going to change the default?

Yeah, I think for already configured nodes there won't be any change unless edpm_network_config_update is set to true during update/upgrade/subsequent deploy. But that would lead to some nodes configured with network-scripts and some with nmstate/NetworkManager. Not sure if that's something we want to get into. Also we should cleanup interfaces configured with network-scripts in https://github.com/os-net-config/os-net-config/blob/master/os_net_config/impl_nmstate.py#L2354 when provider is switched. I would suggest we don't change the default in a hurry without looking at and testing all scenarios.

@rabi We are suddenly planned and changing now. this activity planned for GA but hold this change to avoid issues and also had issues for NFV SRIOV and OVS DPDK deployments. Looks like stable nmstate provider for NFV greenfield deployments, so we are planning now to make nmstate provider default. Also this change will not be blocker in the deployment, still we can add this variable 'false' in nodeset CR to overcome deployment issues with ifcfg provider which is default now.

Also we are working on ifcfg provider to nmstate provider migration automation is in progress now.

Any how we need to update this variable 'false' for adoption ci jobs, we don't have ifcfg provider to nmstate provider migration in OSP 17 now.

@karthiksundaravel please add your comments

@Jaganathancse
Copy link
Contributor Author

recheck

@openshift-ci openshift-ci bot removed the lgtm label Jun 10, 2025
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 10, 2025

New changes are detected. LGTM label has been removed.

@Jaganathancse Jaganathancse force-pushed the default_nmstate_provider branch from a6e0023 to 5e4ecbd Compare June 10, 2025 04:48
@Jaganathancse Jaganathancse requested a review from danpawlik June 10, 2025 04:50
@softwarefactory-project-zuul
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/ff7d0a8d6e9041a39d047a476559c871

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 57m 52s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 19m 58s
cifmw-crc-podified-edpm-baremetal FAILURE in 1h 38m 38s
✔️ noop SUCCESS in 0s
✔️ edpm-ansible-tempest-multinode SUCCESS in 1h 41m 52s
✔️ edpm-ansible-molecule-edpm_bootstrap SUCCESS in 5m 55s
adoption-standalone-to-crc-ceph-provider RETRY_LIMIT in 43m 09s

@evallesp
Copy link

evallesp commented Jun 10, 2025

I see some failures cifmw-crc-podified-edpm-baremetal:
TASK [osp.edpm.edpm_libvirt : Install addditional packages name={{ edpm_libvirt_packages }}, download_only=True] *** Tuesday 10 June 2025 06:13:46 +0000 (0:00:00.125) 0:00:30.085 ********** �[1;30mFAILED - RETRYING: [edpm-compute-0]: Install addditional packages (5 retries left).�[0m �[1;30mFAILED - RETRYING: [edpm-compute-0]: Install addditional packages (4 retries left).�[0m �[1;30mFAILED - RETRYING: [edpm-compute-0]: Install addditional packages (3 retries left).�[0m �[1;30mFAILED - RETRYING: [edpm-compute-0]: Install addditional packages (2 retries left).�[0m �[1;30mFAILED - RETRYING: [edpm-compute-0]: Install addditional packages (1 retries left).�[0m �[0;31mfatal: [edpm-compute-0]: FAILED! => {"attempts": 5, "changed": false, "failures": [], "msg": "Failed to download packages: Curl error (6): Couldn't resolve host name for https://mirrors.fedoraproject.org/metalink?repo=epel-9&arch=x86_64&infra=$infra&content=centos [Could not resolve host: mirrors.fedoraproject.org]", "rc": 1, "results": []}�

Rechecking and holding the node to see what's going on

@evallesp
Copy link

recheck

@Jaganathancse
Copy link
Contributor Author

I see some failures cifmw-crc-podified-edpm-baremetal: TASK [osp.edpm.edpm_libvirt : Install addditional packages name={{ edpm_libvirt_packages }}, download_only=True] *** Tuesday 10 June 2025 06:13:46 +0000 (0:00:00.125) 0:00:30.085 ********** �[1;30mFAILED - RETRYING: [edpm-compute-0]: Install addditional packages (5 retries left).�[0m �[1;30mFAILED - RETRYING: [edpm-compute-0]: Install addditional packages (4 retries left).�[0m �[1;30mFAILED - RETRYING: [edpm-compute-0]: Install addditional packages (3 retries left).�[0m �[1;30mFAILED - RETRYING: [edpm-compute-0]: Install addditional packages (2 retries left).�[0m �[1;30mFAILED - RETRYING: [edpm-compute-0]: Install addditional packages (1 retries left).�[0m �[0;31mfatal: [edpm-compute-0]: FAILED! => {"attempts": 5, "changed": false, "failures": [], "msg": "Failed to download packages: Curl error (6): Couldn't resolve host name for https://mirrors.fedoraproject.org/metalink?repo=epel-9&arch=x86_64&infra=$infra&content=centos [Could not resolve host: mirrors.fedoraproject.org]", "rc": 1, "results": []}�

Rechecking and holding the node to see what's going on

also adoption-standalone-to-crc-ceph-provider failing

@Jaganathancse
Copy link
Contributor Author

rechek

@softwarefactory-project-zuul
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/d67186456c0c409fbf5980251578837c

✔️ openstack-k8s-operators-content-provider SUCCESS in 3h 27m 30s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 20m 45s
cifmw-crc-podified-edpm-baremetal FAILURE in 1h 45m 59s
✔️ noop SUCCESS in 0s
edpm-ansible-tempest-multinode FAILURE in 1h 44m 48s
✔️ edpm-ansible-molecule-edpm_bootstrap SUCCESS in 6m 11s
✔️ adoption-standalone-to-crc-ceph-provider SUCCESS in 3h 10m 56s

@Jaganathancse
Copy link
Contributor Author

recheck

@softwarefactory-project-zuul
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/e62674f7bc404d67b7da1645afb1050a

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 58m 01s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 20m 30s
cifmw-crc-podified-edpm-baremetal FAILURE in 1h 43m 39s
✔️ noop SUCCESS in 0s
✔️ edpm-ansible-tempest-multinode SUCCESS in 1h 44m 37s
✔️ edpm-ansible-molecule-edpm_bootstrap SUCCESS in 6m 25s
adoption-standalone-to-crc-ceph-provider FAILURE in 46m 15s

@Jaganathancse Jaganathancse force-pushed the default_nmstate_provider branch 4 times, most recently from 846e175 to 6a0fc76 Compare June 10, 2025 19:11
This change is to update default os-net-config nmstate
provider for network configuration and also fixing
DNS config update issue when nmste provider or nmsate
tool used for network configuration.
And also defaulting DNS update config
'edpm_bootstrap_network_resolvconf_update' true
for default nmstate provider and alse updated false
for ifcfg provider based edpm_network_config_nmstte
variable.
@Jaganathancse Jaganathancse force-pushed the default_nmstate_provider branch from 6a0fc76 to cf54fe3 Compare June 10, 2025 19:21
@softwarefactory-project-zuul
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/fa517245e07b4d3ca25cda1f09c4bf15

✔️ openstack-k8s-operators-content-provider SUCCESS in 3h 24m 37s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 12m 10s
cifmw-crc-podified-edpm-baremetal FAILURE in 26m 49s
✔️ noop SUCCESS in 0s
✔️ edpm-ansible-tempest-multinode SUCCESS in 1h 28m 37s
✔️ edpm-ansible-molecule-edpm_bootstrap SUCCESS in 5m 46s
✔️ adoption-standalone-to-crc-ceph-provider SUCCESS in 3h 08m 47s

@Jaganathancse
Copy link
Contributor Author

recheck

type: bool
description: "Cleanup network interfaces not in network config"
default: true
edpm_bootstrap_network_resolvconf_update:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping same name used in edpm_bootrap role for edpm_network_config, will propose new PR later based on edpm_network_config role and need changes in other datapalane adoption project also.

@eshulman2 eshulman2 added the lgtm label Jun 11, 2025
Copy link

@karthiksundaravel karthiksundaravel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. lgtm

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 11, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bshephar, Jaganathancse, jpodivin, karthiksundaravel, vcandapp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 50a12d8 into openstack-k8s-operators:main Jun 11, 2025
34 checks passed
eduolivares added a commit to eduolivares/architecture that referenced this pull request Jun 13, 2025
Default value was changed to true (NM is used instead of ifcfg) with
[1].
With the new default value, os-net-config fails on BGP setups.

Related-Bug: OSPRH-17472

[1] openstack-k8s-operators/edpm-ansible#908
softwarefactory-project-zuul bot added a commit to openstack-k8s-operators/architecture that referenced this pull request Jun 13, 2025
[BGP] Set edpm_network_config_nmstate=false

Default value was changed to true (NM is used instead of ifcfg) with [1].
With the new default value, os-net-config fails on BGP setups.
Related-Bug: OSPRH-17472
[1] openstack-k8s-operators/edpm-ansible#908

Reviewed-by: Luca Miccini
Reviewed-by: Candido Campos Rivas
Reviewed-by: John Fulton <[email protected]>
skovili pushed a commit to skovili/architecture that referenced this pull request Oct 13, 2025
Default value was changed to true (NM is used instead of ifcfg) with
[1].
With the new default value, os-net-config fails on BGP setups.

Related-Bug: OSPRH-17472

[1] openstack-k8s-operators/edpm-ansible#908
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.