@rabi
Contributor

@rabi rabi commented Aug 13, 2025

Changing NetworkManager to manage resolv.conf results in the resolv.conf entries written by cloud-init being overwritten on unprovisioned nodes. Let's make the change just before running os-net-config so that the tasks that download packages do not fail.

This is a regression from #908

jira: https://issues.redhat.com/browse/OSPRH-19018

Changing NetworkManager to manage resolv.conf results in the
resolv.conf entries written by cloud-init being overwritten on
unprovisioned nodes. Let's change it just before running
os-net-config so that the tasks that download packages do not
fail.

jira: https://issues.redhat.com/browse/OSPRH-19018
Signed-off-by: rabi <[email protected]>
@openshift-ci openshift-ci bot requested review from olliewalsh and stuggi August 13, 2025 07:01
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/08b46ef8ba9540919ed2bc89543da1fe

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 52m 59s
podified-multinode-edpm-deployment-crc RETRY_LIMIT in 14m 28s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 36m 25s
✔️ noop SUCCESS in 0s
✔️ edpm-ansible-tempest-multinode SUCCESS in 1h 28m 53s
adoption-standalone-to-crc-ceph-provider NODE_FAILURE Node request 099-0007977878 failed in 0s

@rabi
Contributor Author

rabi commented Aug 13, 2025

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/e54e5842f4674649be9909eec7e07e09

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 55m 41s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 12m 55s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 36m 24s
✔️ noop SUCCESS in 0s
✔️ edpm-ansible-tempest-multinode SUCCESS in 1h 38m 47s
adoption-standalone-to-crc-ceph-provider FAILURE in 1h 37m 28s

@rabi
Contributor Author

rabi commented Aug 13, 2025

recheck

Contributor

@slagle slagle left a comment

LGTM, but the NFV folks should probably review as well, since they originally put this change in.
@dsneddon @Jaganathancse

retries: "{{ edpm_network_config_download_retries }}"
delay: "{{ edpm_network_config_download_delay }}"

- name: Import DNS NetworkManager configs tasks
Contributor

Thanks, Rabi, for pushing this. I came across it while looking into an issue in a customer deployment caused by this.

It looks like this clears the issue in the success case, i.e. when os-net-config apply succeeds. But if that fails for any reason, the next configure-network run should fail and get stuck as before, requiring manual intervention. WDYT, or am I missing something here? Since it only changes the resolv.conf parts, maybe we can move this to after os-net-config runs?

Contributor Author

AFAIK, those changes cannot be moved after the os-net-config run, as the DNS changes would be made by NetworkManager during the os-net-config run. I think openstack-k8s-operators/edpm-image-builder#85 would fix all cases, as reloading NetworkManager any number of times won't have any impact after that, but for customers using images without it this would be useful.

Contributor

@karelyatin karelyatin Aug 14, 2025

But considering that the os-net-config run goes through without any new DNS config being applied on the system when edpm_bootstrap_network_resolvconf_update=false, it looks like moving this to a later stage should work; the new config would just roll out a step later.

And we can't even use edpm_bootstrap_network_resolvconf_update=false, as that has other issues with nmstate enabled.

Contributor Author

Though theoretically, if we reload NetworkManager after the os-net-config run it should update resolv.conf, I've not tested whether leaving dns=none in 99-cloud-init.conf while os-net-config runs has any impact. If this can be tested and works, we can move it after the os-net-config run. Feel free to update the PR. However, IMO we should probably focus on openstack-k8s-operators/edpm-image-builder#85, which would fix this without any of these hacks, assuming we want everyone to move to the nmstate provider with os-net-config.

Contributor Author

Also, a failed os-net-config run would normally mess things up anyway, so I wouldn't worry too much about that use case here.

Contributor

@karelyatin karelyatin Aug 14, 2025

Checked more on this and proposed openstack-k8s-operators/openstack-baremetal-operator#316; let's see if that goes fine too.
Not sure if openstack-k8s-operators/edpm-image-builder#85 will work in this scenario considering the above patch, or has that already been validated in this scenario?

Contributor Author

@rabi rabi Aug 14, 2025

Not sure if openstack-k8s-operators/edpm-image-builder#85 will work.

I think I've tested it.

[cloud-admin@edpm-compute-0 ~]$ cat /etc/resolv.conf 
; Created by cloud-init automatically, do not edit.
;
nameserver 192.168.122.80
search ctlplane.example.com

a. Remove dns=none

[cloud-admin@edpm-compute-0 ~]$ NetworkManager --print-config
# NetworkManager configuration: /etc/NetworkManager/NetworkManager.conf, /usr/lib/NetworkManager/conf.d/00-server.conf, /etc/NetworkManager/conf.d/99-cloud-init.conf

[main]
# plugins=
# rc-manager=auto
# migrate-ifcfg-rh=false
# auth-polkit=true
# dhcp=internal
# iwd-config-path=
no-auto-default=*
ignore-carrier=*
configure-and-quit=no

[logging]
# backend=journal
# audit=false

[device]
# wifi.backend=wpa_supplicant

# no-auto-default file "/var/lib/NetworkManager/no-auto-default.state"
[cloud-admin@edpm-compute-0 ~]$ sudo systemctl reload NetworkManager
[cloud-admin@edpm-compute-0 ~]$ cat /etc/resolv.conf 
# Generated by NetworkManager

b. Update renderer

[cloud-admin@edpm-compute-0 ~]$ cat /etc/cloud/cloud.cfg | grep renderer
    renderers: ['network-manager', 'sysconfig', 'eni', 'netplan', 'networkd']
[cloud-admin@edpm-compute-0 ~]$ sudo cloud-init clean --logs --reboot
Connection to 192.168.122.100 closed by remote host.
Connection to 192.168.122.100 closed.

c. Clean up the network configured with sysconfig earlier (this won't be required once the image has the cloud.cfg changes)

[cloud-admin@edpm-compute-0 ~]$ nmcli connection
NAME               UUID                                  TYPE      DEVICE 
System enp1s0      c0ab6b8c-0eac-a1b4-1c47-efe4b2d1191f  ethernet  enp1s0 
lo                 a3135099-7820-4f7a-94f5-c48210ae43eb  loopback  lo     
cloud-init enp1s0  a41601f3-3acc-5f60-ac5f-9d9011ab7c25  ethernet  --     
ens3               35f7245a-9a2b-4111-ba3e-b6fb322a1f25  ethernet  --     
[cloud-admin@edpm-compute-0 ~]$ sudo nmcli connection delete c0ab6b8c-0eac-a1b4-1c47-efe4b2d1191f
Connection 'System enp1s0' (c0ab6b8c-0eac-a1b4-1c47-efe4b2d1191f) successfully deleted.
[cloud-admin@edpm-compute-0 ~]$ nmcli connection
NAME               UUID                                  TYPE      DEVICE 
cloud-init enp1s0  a41601f3-3acc-5f60-ac5f-9d9011ab7c25  ethernet  enp1s0 
lo                 a3135099-7820-4f7a-94f5-c48210ae43eb  loopback  lo     
ens3               35f7245a-9a2b-4111-ba3e-b6fb322a1f25  ethernet  --     
[cloud-admin@edpm-compute-0 ~]$ cat /etc/resolv.conf 
# Generated by NetworkManager
search ctlplane.example.com
nameserver 192.168.122.80
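
Steps a and b above can be sketched as two small shell helpers. This is only an illustration: the function names `remove_dns_none` and `first_renderer` are hypothetical, and the sketch runs against temporary files that stand in for /etc/NetworkManager/conf.d/99-cloud-init.conf and /etc/cloud/cloud.cfg, with sample content assumed for the demo.

```shell
#!/bin/sh
# Hypothetical helper: delete any "dns=none" line from a NetworkManager config file.
remove_dns_none() {
    conf="$1"
    tmp="$(mktemp)"
    # Write the filtered content to a temp file first, then copy it back.
    sed -E '/^[[:space:]]*dns[[:space:]]*=[[:space:]]*none[[:space:]]*$/d' "$conf" > "$tmp" \
        && cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# Hypothetical helper: print the first entry of the cloud-init "renderers" list.
first_renderer() {
    sed -n -E "s/^[[:space:]]*renderers:[[:space:]]*\[[[:space:]]*['\"]?([A-Za-z-]+).*/\1/p" "$1" | head -n 1
}

# Temporary stand-ins for the real files (assumed sample content).
nm_conf="$(mktemp)"
cloud_cfg="$(mktemp)"
trap 'rm -f "$nm_conf" "$cloud_cfg"' EXIT

printf '[main]\ndns = none\n' > "$nm_conf"
printf "    renderers: ['network-manager', 'sysconfig', 'eni', 'netplan', 'networkd']\n" > "$cloud_cfg"

remove_dns_none "$nm_conf"
cat "$nm_conf"              # only the [main] section header remains
first_renderer "$cloud_cfg" # first renderer in the list
```

On a real node the edit would still need to be followed by `sudo systemctl reload NetworkManager`, as in the transcript above.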

Contributor

<< Also, failed os-net-config run would normally mess up things anyway, so I won't overly bother about that use-case here.
But that depends on how it failed. For example, if the failure is due to a wrong os-net-config configuration, we can fix the config in the nodeset and rerun, and that should work. With the current proposal, though, it will just get stuck if the nameserver got wiped in the previous run.
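
The stuck state described here (a resolv.conf left without any nameserver) could be detected before retrying with a check along these lines. It's a rough sketch only: `has_nameserver` is a hypothetical helper, and the two temporary files mimic the resolv.conf contents shown in the transcript earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file has at least one nameserver entry.
has_nameserver() {
    grep -Eq '^[[:space:]]*nameserver[[:space:]]+[^[:space:]]+' "$1"
}

good="$(mktemp)"
wiped="$(mktemp)"
trap 'rm -f "$good" "$wiped"' EXIT

# A populated resolv.conf and one that was reset to just the header.
printf '# Generated by NetworkManager\nsearch ctlplane.example.com\nnameserver 192.168.122.80\n' > "$good"
printf '# Generated by NetworkManager\n' > "$wiped"

for f in "$good" "$wiped"; do
    if has_nameserver "$f"; then
        echo "nameserver present"
    else
        echo "no nameserver, package downloads would fail"
    fi
done
```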

Contributor

@karelyatin karelyatin Aug 18, 2025

Though theoretically, if we reload NetworkManager after the os-net-config run it should update resolv.conf, I've not tested whether leaving dns=none in 99-cloud-init.conf while os-net-config runs has any impact. If this can be tested and works, we can move it after the os-net-config run. Feel free to update the PR. However, IMO we should probably focus on openstack-k8s-operators/edpm-image-builder#85, which would fix this without any of these hacks, assuming we want everyone to move to the nmstate provider with os-net-config.

OK, looking more into it: with openstack-k8s-operators/openstack-baremetal-operator#316 and/or openstack-k8s-operators/edpm-image-builder#85 this PR shouldn't be needed, but it also won't hurt, so the concerns raised about the failure cases are not very relevant. I have also done some tests with nmstate=false, and those went fine too.

Contributor Author

with openstack-k8s-operators/openstack-baremetal-operator#316

I've commented in above PR but adding it here as well...

We should allow global DNS settings as per the OpenStack network data schema https://docs.openstack.org/nova/latest/_downloads/9119ca7ac90aa2990e762c08baea3a36/network_data.json and not only interface-level ones as done in this PR. We allow users to use custom networkData in the nodeset spec, and that can be anything the schema permits. As the current default is to use the nmstate provider with os-net-config (and we plan to remove support for ifcfg scripts), we should switch the renderer as proposed in openstack-k8s-operators/edpm-image-builder#85.

- /etc/NetworkManager/NetworkManager.conf
- /etc/NetworkManager/conf.d/99-cloud-init.conf
- name: Set 'rc-manager=unmanaged' in /etc/NetworkManager/NetworkManager.conf
- name: Unset 'rc-manager=unmanaged' in /etc/NetworkManager/NetworkManager.conf
Contributor

The Reload task below can also be made conditional, i.e. there is no need to reload if the desired config is already in place.
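
The idea of a conditional reload can be sketched in plain shell as below. This is not the role's implementation; in Ansible the same thing is usually done by registering the task result and using `when: <result>.changed`, as the `nm_ovs_status.changed` condition elsewhere in this diff does. `reload_if_changed` is a hypothetical helper, and `echo reloaded` stands in for the real reload command.

```shell
#!/bin/sh
# Hypothetical helper: run an edit command, then run the reload command
# only if the content of the file actually changed.
reload_if_changed() {
    file="$1"
    edit_cmd="$2"
    reload_cmd="$3"
    before="$(cksum < "$file")"
    sh -c "$edit_cmd"
    after="$(cksum < "$file")"
    if [ "$before" != "$after" ]; then
        sh -c "$reload_cmd"
    fi
}

# Temporary stand-in for a NetworkManager config file (assumed sample content).
conf="$(mktemp)"
trap 'rm -f "$conf" "$conf.new"' EXIT
printf '[main]\ndns = none\n' > "$conf"

# Edit: drop the dns line. The second run finds nothing left to change.
edit="grep -v '^dns' '$conf' > '$conf.new'; mv '$conf.new' '$conf'"

first="$(reload_if_changed "$conf" "$edit" "echo reloaded")"
second="$(reload_if_changed "$conf" "$edit" "echo reloaded")"
echo "first=[$first] second=[$second]"
```

The first call changes the file and triggers the reload command; the second call leaves the file untouched and skips it.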

Contributor

@Jaganathancse Jaganathancse left a comment

Looks Good. Thanks Rabi for this PR.

Contributor

@Jaganathancse Jaganathancse left a comment

@rabi
Looks like the edpm_network_config_tool 'nmstate' block also requires these dns_nm_configs.yml changes.

  - name: Configure network with network role from system roles [nmstate]
    when: edpm_network_config_tool == 'nmstate'
    become: true
    block:
      - name: Render network_state variable
        ansible.builtin.set_fact:
          network_state: "{{ edpm_network_config_template | from_yaml }}"
      - name: Load system-roles.network tasks [nmstate]
        ansible.builtin.include_role:
          name: "{{ lookup('ansible.builtin.env', 'EDPM_SYSTEMROLES', default='fedora.linux_system_roles') + '.network' }}"

@openshift-ci openshift-ci bot removed the lgtm label Aug 20, 2025
@openshift-ci
Contributor

openshift-ci bot commented Aug 20, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rabi

state: restarted
when: nm_ovs_status.changed # noqa: no-handler

- name: Import DNS NetworkManager configs tasks
Contributor

This task only configures whether the DNS update will be done by NM or not.
Does this update the DNS config for the nmstate provider when the initial setup uses the cloud-init nmstate config?

@Jaganathancse
Contributor

@rabi As we discussed, I am going to rework and test this issue.
Proposed new draft PR: #1007
Please check it.

@rabi
Contributor Author

rabi commented Aug 21, 2025

Closing this as openstack-k8s-operators/edpm-image-builder#85 has merged. But there are still issues with minor updates as mentioned in #1007 (comment)

@rabi rabi closed this Aug 21, 2025