[update] Add complete IPv6 support to workload launch scripts #3616

Open

sathlan wants to merge 1 commit into openstack-k8s-operators:main from sathlan:update-ipv6-workload-dns

Conversation

@sathlan
Contributor

@sathlan sathlan commented Jan 22, 2026

Add comprehensive IPv6 support for RHOSO deployments in the update
workload launch infrastructure. This enables workload testing on
IPv6 environments that were previously unsupported. Here, IPv6-only
means that no IPv4 routing is defined at all on controller-0 or on
the openstackclient pod.

This also supports a dual-stack IPv6/IPv4 public network.

For IPv6 we attach directly to the IPv6 public subnet, as there are
no FIPs for IPv6 in OpenStack.

To make sure that OVN distributes the RA to the cirros instance, we
create a dummy router that creates the necessary structures in OVN.

Limitation: SRIOV validation is not supported in an IPv6-only setup.

Maintains full backward compatibility with existing IPv4 workflows.

Signed-off-by: Sofer Athlan-Guyot <sathlang@redhat.com>

Partially-Closes: OSPCIX-1114

@sathlan
Contributor Author

sathlan commented Jan 22, 2026

@hjensas Hi, so I've tried to address the fallback with a specific IPv6 DNS server or a specific IPv4 DNS server. I couldn't find a better variable than cifmw_polarion_jump_custom_fields.iprotocol to determine whether it is an IPv4 or an IPv6 deployment. Do you know a better way?

(follow up on your question in #3498)

done
else
echo "Warning: No nameservers found, using public DNS fallback"
{%- if cifmw_polarion_jump_custom_fields.iprotocol|default("IPv4") == "IPv6" %}
Contributor

I am not sure what else can be used, but I have another comment below.

Contributor Author

I'm now checking cifmw_os_net_setup_config[public][public_subnet].ip_version. If it exists and is equal to "6", then I'm switching to the IPv6 setup.
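
A minimal sketch of what that template-side check could look like (the filter chain and the default guard here are my illustration, not the exact patch code):

{% set _public = cifmw_os_net_setup_config | selectattr('name', 'equalto', 'public') | first %}
{% set _subnet = _public.subnets | selectattr('name', 'equalto', 'public_subnet') | first %}
{# Fall back to IPv4 when ip_version is not set on the subnet #}
{% if _subnet.ip_version | default(4) == 6 %}
export IS_IPV6=True
{% endif %}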

@sathlan
Contributor Author

sathlan commented Feb 6, 2026

Result of the test on a uni04delta IPv6 live setup

This is not working.

The uni04delta-ipv6 job creates an IPv4 public network, as defined in the ci-framework-jobs/scenarios/uni/uni04delta-ipv6/04-scenario-vars.yaml file:

cifmw_os_net_setup_config:
  - name: public
    external: true
    shared: false
    is_default: true
    provider_network_type: flat
    provider_physical_network: datacentre
    subnets:
      - name: public_subnet
        cidr: 192.168.122.0/24
        allocation_pool_start: 192.168.122.171
        allocation_pool_end: 192.168.122.250
        gateway_ip: 192.168.122.1
        enable_dhcp: true
  - name: private
    external: false
    shared: true
    subnets:
      - name: private_subnet
        cidr: '10.2.0.0/24'
        allocation_pool_start: 10.2.0.10
        allocation_pool_end: 10.2.0.250
        gateway_ip: 10.2.0.1
        enable_dhcp: true

This is confirmed by the Zuul logs of any uni04delta-ipv6-rhel9-rhoso18.0 deployment job, in the logs/controller-0/ci-framework-data/logs/ansible-post-deployment.log file. Look for the task os_net_setup : Print subnet command creation output.

In any case, in workload_launch, when trying to use an IPv6 DNS server with an IPv4 subnet definition, we get this error:

+ openstack subnet create --subnet-range 192.168.0.0/24 --allocation-pool start=192.168.0.10,end=192.168.0.100 --gateway 192.168.0.254 --dns-nameserver 2620:cf:cf:cf02::1 --dns-nameserver 2620:cf:cf:aaaa::1 --network internal_net_c120974ca7 internal_net_c120974ca7_subnet
BadRequestException: 400: Client Error for url: https://neutron-public-openstack.apps.ocp.openstack.lab/v2.0/subnets, Invalid input for operation: dns_nameserver '2620:cf:cf:cf02::1' does not match the ip_version '4'.

meaning that a subnet definition must be either fully IPv6 or fully IPv4; we cannot mix and match.
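
For illustration, the same command succeeds once the nameserver family matches the subnet (names shortened, addresses illustrative):

# Accepted: IPv4 nameserver on an IPv4 subnet
openstack subnet create --subnet-range 192.168.0.0/24 \
  --dns-nameserver 192.168.122.1 \
  --network internal_net internal_net_subnet

# Rejected with a 400: IPv6 nameserver on an IPv4 subnet
openstack subnet create --subnet-range 192.168.0.0/24 \
  --dns-nameserver 2620:cf:cf:cf02::1 \
  --network internal_net internal_net_subnet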

Given the scope of this instance creation (asserting that networking stays alive during the update), and given that you can define an IPv4 tenant network in an IPv6 setup and that the public network (where the FIP used for pinging lives) is defined as IPv4, I think we just have to make sure that we get a valid IPv4 DNS server on IPv6 deployments.

I'm refactoring the patch along those lines.

===> UPDATE

That didn't work because all IPv4 routing tables are empty, meaning there's no way to reach the public network created during the uni04delta post_deployment setup. We have to switch to full IPv6 for everything:

  • the script;
  • the public network definition in the uni04delta job (sketched below).
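
For illustration, an IPv6 variant of the public network definition could look roughly like this, mirroring the IPv4 one above (CIDR and gateway are illustrative, taken from the public prefix seen later in the ping test; this is a sketch, not the exact job change):

cifmw_os_net_setup_config:
  - name: public
    external: true
    shared: false
    is_default: true
    provider_network_type: flat
    provider_physical_network: datacentre
    subnets:
      - name: public_subnet
        ip_version: 6
        cidr: '2620:cf:cf:aaaa::/64'
        gateway_ip: '2620:cf:cf:aaaa::1'
        enable_dhcp: true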

@sathlan sathlan force-pushed the update-ipv6-workload-dns branch 2 times, most recently from 6cb6fbc to ceebb3f on February 9, 2026 09:54
@sathlan sathlan changed the title [update] Fix IPv6 nameserver support in workload launch script [update] Add complete IPv6 support to workload launch scripts Feb 9, 2026
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/ea1f11865cc0493cbaed4cd258d88d3f

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 59m 52s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 21m 25s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 34m 42s
cifmw-crc-podified-edpm-baremetal-minor-update FAILURE in 1h 46m 28s
✔️ cifmw-pod-zuul-files SUCCESS in 5m 28s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 9m 13s
cifmw-pod-pre-commit FAILURE in 8m 31s
✔️ cifmw-molecule-update SUCCESS in 4m 57s

@sathlan sathlan force-pushed the update-ipv6-workload-dns branch from ceebb3f to c342450 on February 9, 2026 14:55
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/351013ab6ff948c29450f6cf3c2d5fe1

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 53m 15s
podified-multinode-edpm-deployment-crc FAILURE in 38m 46s
cifmw-crc-podified-edpm-baremetal FAILURE in 40m 34s
cifmw-crc-podified-edpm-baremetal-minor-update FAILURE in 1h 39m 06s
✔️ cifmw-pod-zuul-files SUCCESS in 4m 45s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 03s
cifmw-pod-pre-commit TIMED_OUT in 31m 11s
✔️ cifmw-molecule-update SUCCESS in 5m 21s

@sathlan sathlan force-pushed the update-ipv6-workload-dns branch 2 times, most recently from dd6e7ea to 98f4abc on February 9, 2026 18:19
@sathlan
Contributor Author

sathlan commented Feb 9, 2026

New fixes

  • I had a nit in the template that made the script fail (a fi was rendered on the same line as the code because of the trim action);
  • I wrongly pushed the validation script; it has been removed;
  • IPv6 OVN needs a router and a port to start advertising RAs in SLAAC mode. I didn't find this documented, but without the "dummy" router the cirros VM doesn't get its IPv6 address (see the sketch below).
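
A rough sketch of that dummy-router wiring (resource names are illustrative, not the ones from the patch):

# Attaching the IPv6 subnet to a router is what makes OVN start
# sending Router Advertisements, so the VM gets its SLAAC address.
openstack router create dummy_router
openstack router add subnet dummy_router "${IPV6_SUBNET_ID}"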

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/61914a7db9ad441f92f3fd836c2890cc

✔️ openstack-k8s-operators-content-provider SUCCESS in 54m 21s
podified-multinode-edpm-deployment-crc FAILURE in 41m 10s
cifmw-crc-podified-edpm-baremetal FAILURE in 24m 04s
cifmw-crc-podified-edpm-baremetal-minor-update RETRY_LIMIT in 30m 01s
✔️ cifmw-pod-zuul-files SUCCESS in 6m 10s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 9m 55s
cifmw-pod-pre-commit FAILURE in 9m 54s
✔️ cifmw-molecule-update SUCCESS in 6m 05s

@sathlan
Contributor Author

sathlan commented Feb 10, 2026

Validated

Deployed uni04delta-ipv6-update-base in testproject with this patch, together with the ci-framework-jobs change for [uni04deltaIPv6] that "Creates a public and private IPv6 network instead of IPv4", and got a successful deployment and update.

The ping test was able to run:

[1770683518.333013] 64 bytes from 2620:cf:cf:aaaa:f816:3eff:fea4:6d86: icmp_seq=1442 ttl=64 time=0.572 ms
[1770683519.357982] 64 bytes from 2620:cf:cf:aaaa:f816:3eff:fea4:6d86: icmp_seq=1443 ttl=64 time=1.38 ms
[1770683520.358109] 64 bytes from 2620:cf:cf:aaaa:f816:3eff:fea4:6d86: icmp_seq=1444 ttl=64 time=0.617 ms
[1770683521.406265] 64 bytes from 2620:cf:cf:aaaa:f816:3eff:fea4:6d86: icmp_seq=1445 ttl=64 time=1.74 ms
[1770683522.407515] 64 bytes from 2620:cf:cf:aaaa:f816:3eff:fea4:6d86: icmp_seq=1446 ttl=64 time=0.995 ms

--- 2620:cf:cf:aaaa:f816:3eff:fea4:6d86 ping statistics ---
1446 packets transmitted, 1444 received, 0.138313% packet loss, time 1469516ms
rtt min/avg/max/mdev = 0.277/0.841/13.917/0.506 ms

This is associated with that VM:

[zuul@controller-0 update]$ oc rsh openstackclient openstack server list --fit-width
+--------------+--------------+--------+--------------+--------------+-----------------+
| ID           | Name         | Status | Networks     | Image        | Flavor          |
+--------------+--------------+--------+--------------+--------------+-----------------+
| 4cf6e940-6e8 | instance_e6b | ACTIVE | public=2620: | upgrade_work | v1-512M-10G-e6b |
| 4-44b2-b956- | 499f706      |        | cf:cf:aaaa:f | load_e6b499f | 499f706         |
| 38a7d200d305 |              |        | 816:3eff:fea | 706          |                 |
|              |              |        | 4:6d86       |              |                 |
+--------------+--------------+--------+--------------+--------------+-----------------+

It fails because of the ping packet loss, but that is for another issue. Here we can clearly see that we were able to set up the VM and could successfully ping it on its IPv6 address.

@sathlan sathlan marked this pull request as ready for review February 10, 2026 08:44
@sathlan sathlan force-pushed the update-ipv6-workload-dns branch from 98f4abc to 0c75949 on February 10, 2026 08:46
@sathlan
Contributor Author

sathlan commented Feb 10, 2026

Fixed an extra blank line in the roles/update/templates/l3_agent_start_ping.sh.j2 file that was making the cifmw-pod-pre-commit test fail; no need for a full re-validation.

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/c346d5a2cb5e41d6bc1a8ff8d246860a

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 16m 53s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 23m 36s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 29m 19s
cifmw-crc-podified-edpm-baremetal-minor-update FAILURE in 2h 02m 52s
✔️ cifmw-pod-zuul-files SUCCESS in 4m 42s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 01s
✔️ cifmw-pod-pre-commit SUCCESS in 7m 47s
✔️ cifmw-molecule-update SUCCESS in 4m 51s

@sathlan
Contributor Author

sathlan commented Feb 17, 2026

In reply to @hjensas

When looking at this as a reviewer, it is hard to review.

I realize this pattern is common in ci-framework, but is a shell script that is 250+ lines something we want to use jinja2 for? Can we instead set env vars before calling the script and keep this logic in shell scripting? If we did that, we'd get coverage via shellcheck in pre-commit. Also, would it be better to split this into two or three files, one with common functions, one with the IPv6 flow and the other with IPv4, to separate the concerns?

This work has been on-going for some time, I understand there has been hours and hours to validate it is working. So, let's merge this as is - but add something in backlog to re-visit the jinja2 shell script pattern?

/lgtm

Well, we actually want to remove this entirely eventually and move this functionality to the test_operator. This script dates back to... OSP13.

I understand the pain and agree with most points, especially about using a plain bash script instead of a template, but currently that would require some further refactoring around all the tasks it's used in. I don't think this is worth the time and effort; instead we should focus on a refactor that leverages the test_operator somehow.

Thanks for the comments.

@sathlan
Contributor Author

sathlan commented Feb 17, 2026

With the latest patchset on a full run I found an issue.

Switching back to draft.

hjensas
hjensas previously approved these changes Feb 17, 2026
@sathlan
Contributor Author

sathlan commented Feb 17, 2026

RCA of the problem found

I have a local test script, but it was missing this option:

from jinja2 import StrictUndefined

When run in a real environment I got this error:

    msg: 'AnsibleUndefinedVariable: ''dict object'' has no attribute ''ip_version''. ''dict
      object'' has no attribute ''ip_version'''

This comes from the fact that the public network has one subnet where ip_version is not defined, which causes the error.
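
For reference, a minimal local render harness with that option could look like this (the template path and the variable value are hypothetical):

from jinja2 import Environment, StrictUndefined

env = Environment(undefined=StrictUndefined)
with open('workload_launch.sh.j2') as f:
    template = env.from_string(f.read())
# StrictUndefined makes the render fail loudly on any undefined
# attribute, reproducing the AnsibleUndefinedVariable error above.
print(template.render(cifmw_os_net_setup_config=[{'name': 'public', 'subnets': [{}]}]))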

The fix

Add a default value in the filter chain to support that case. I've checked the outcome using my local validation script (with StrictUndefined) and also on a live environment; both were able to render the script.

I also tested locally the case where we only have an IPv4 subnet definition, and we get IS_IPV6=False as expected.
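
For reference, the guarded filter chain (the same one later replaced by runtime detection, see the diff further down), spread over lines for readability:

{% set ipv6_subnet = cifmw_os_net_setup_config
     | selectattr('name', 'equalto', 'public')
     | map(attribute='subnets') | flatten
     | selectattr('ip_version', 'defined')
     | selectattr('ip_version', 'equalto', 6)
     | first | default(none) %}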

@sathlan sathlan dismissed stale reviews from hjensas and holser via 7e43751 February 17, 2026 18:30
@sathlan sathlan force-pushed the update-ipv6-workload-dns branch from db25242 to 7e43751 on February 17, 2026 18:30
@openshift-ci openshift-ci bot removed the lgtm label Feb 17, 2026
@openshift-ci
Contributor

openshift-ci bot commented Feb 17, 2026

New changes are detected. LGTM label has been removed.

@openshift-ci
Contributor

openshift-ci bot commented Feb 17, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from holser. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sathlan sathlan force-pushed the update-ipv6-workload-dns branch from 7e43751 to db16fcb on February 17, 2026 18:31
@sathlan sathlan marked this pull request as ready for review February 17, 2026 18:31
@sathlan sathlan marked this pull request as draft February 18, 2026 10:20
@sathlan sathlan force-pushed the update-ipv6-workload-dns branch from db16fcb to 61ef30b on February 18, 2026 12:46
@sathlan
Contributor Author

sathlan commented Feb 18, 2026

IPv6 Dual-Stack Support for Workload Launch Script

I had to adjust the network definition of the job (in ci-framework-jobs) to be a dual-stack IPv4/IPv6 setup so that the existing tempest tests, which use IPv4, would still work. This caused a problem with the creation of the instance.

Dual-Stack Network Allocation Issue

When dual-stack public networks were introduced (both IPv4 and IPv6 subnets), VM creation failed with Failed to allocate the network(s) because the script attached directly to the public network using --nic net-id="${PUBLIC_NET_ID}". This caused OpenStack to attempt IP allocation from both subnets simultaneously, creating resource conflicts.

Resolution: For IPv6 deployments, create a dedicated port with explicit IPv6 subnet constraint (--fixed-ip subnet=${IPV6_SUBNET_ID}), then attach the VM to that port using --port ${IPV6_PORT_NAME}.
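
A condensed sketch of that flow (the image, flavor and name variables are placeholders):

# Pin a port to the IPv6 subnet, then boot the VM on that port
# instead of --nic net-id="${PUBLIC_NET_ID}", which would pull
# addresses from both subnets.
openstack port create --network "${PUBLIC_NET_ID}" \
  --fixed-ip subnet="${IPV6_SUBNET_ID}" "${IPV6_PORT_NAME}"
openstack server create --image "${IMAGE}" --flavor "${FLAVOR}" \
  --port "${IPV6_PORT_NAME}" "${VM_NAME}"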

Jinja2 Filter Simplification

Replaced complex template-time IPv6 detection with runtime OpenStack commands:

- {% set ipv6_subnet = cifmw_os_net_setup_config | selectattr('name', 'equalto', 'public') | map(attribute='subnets') | flatten | selectattr('ip_version', 'defined') | selectattr('ip_version', 'equalto', 6) | first | default(none) %}
+ export IPV6_SUBNET_ID=$(openstack subnet list --network "${EXTERNAL_NET_NAME}" --ip-version 6 -f value -c ID | head -1)
+ export IS_IPV6=$([ -n "${IPV6_SUBNET_ID}" ] && echo "True" || echo "False")

This eliminates template complexity and uses direct OpenStack API calls for IPv6 subnet detection and VM IP address extraction.

Ping start/stop simplification

ping actually works for both IPv4 and IPv6, so there is no need for ping6, which simplifies the code further.
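
For example, with a recent iputils the same binary handles both address families (address taken from the test output above):

ping -c 4 2620:cf:cf:aaaa:f816:3eff:fea4:6d86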

Status: Waiting for another clean run of the full test.

I will report here when the full test is done.

@sathlan sathlan marked this pull request as ready for review February 19, 2026 13:46
@sathlan
Contributor Author

sathlan commented Feb 19, 2026

Tested from start to finish on a testproject environment.

The VM is created correctly and the ping test starts. There is a high drop rate, but that is for another story and not related to this change.
