@fultonj fultonj commented Dec 19, 2025

This PR will be rebased after the following merges:

#1184

olliewalsh and others added 7 commits December 17, 2025 15:24
Create a scenario to adopt DCN deployments, based on the HCI scenario.

Signed-off-by: Oliver Walsh <[email protected]>
The DCN deployment templates were missing route definitions, and
the DCN roles were using central subnets instead of their own.
This patch fixes both problems with the following changes.

1. Network Routes Added (network_data.yaml.j2)

  a. dcn1/network_data.yaml.j2: Added routes to InternalApi,
     Storage, and Tenant subnets pointing to central
     (172.17.0.0/24, 172.18.0.0/24, 172.19.0.0/24) and dcn2
     (172.17.20.0/24, 172.18.20.0/24, 172.19.20.0/24)

  b. dcn2/network_data.yaml.j2: Added routes to InternalApi,
     Storage, and Tenant subnets pointing to central
     (172.17.0.0/24, 172.18.0.0/24, 172.19.0.0/24) and dcn1
     (172.17.10.0/24, 172.18.10.0/24, 172.19.10.0/24)

  c. central/network_data.yaml.j2: Added routes to InternalApi,
     Storage, and Tenant subnets pointing to dcn1
     (172.17.10.0/24, 172.18.10.0/24, 172.19.10.0/24) and dcn2
     (172.17.20.0/24, 172.18.20.0/24, 172.19.20.0/24)

2. Control Plane Routes Added (config_download.yaml)

  a. dcn1/config_download.yaml: Added host_routes to
     leaf1 subnet for central (192.168.122.0/24) and
     dcn2 (192.168.144.0/24)

  b. dcn2/config_download.yaml: Added host_routes to
     leaf2 subnet for central (192.168.122.0/24) and
     dcn1 (192.168.133.0/24)

  c. central/config_download.yaml: Added host_routes to
     ctlplane-subnet for dcn1 (192.168.133.0/24) and dcn2
     (192.168.144.0/24)

3. Subnet References Fixed (roles.yaml)

  a. dcn1/roles.yaml: Changed ComputeDcn1 networks to use
     internal_api_leaf1, tenant_leaf_1, storage_leaf1

  b. dcn2/roles.yaml: Changed ComputeDcn2 networks to use
     internal_api_leaf2, tenant_leaf_2, storage_leaf2
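
The cross-site pattern in the list above can be sketched in a few lines of Python. This is only an illustration of which destinations each site needs, not the code used by the templates; `SITE_OCTET`, `NETWORK_PREFIX`, and `remote_subnets` are names invented for this sketch, and the addresses are the ones quoted in this commit message.

```python
import ipaddress

# Third octet that identifies each site, taken from the subnets listed above.
SITE_OCTET = {"central": 0, "dcn1": 10, "dcn2": 20}
# First two octets of each isolated network, also taken from the list above.
NETWORK_PREFIX = {"InternalApi": "172.17", "Storage": "172.18", "Tenant": "172.19"}


def remote_subnets(site):
    """Return, per network, the subnets of every site other than `site`."""
    routes = {}
    for network, prefix in NETWORK_PREFIX.items():
        routes[network] = [
            # ip_network() validates the CIDR before it is emitted.
            str(ipaddress.ip_network(f"{prefix}.{octet}.0/24"))
            for other, octet in SITE_OCTET.items()
            if other != site
        ]
    return routes


if __name__ == "__main__":
    for site in SITE_OCTET:
        print(site, remote_subnets(site))
```

Each site ends up with two destinations per network, which matches items 1a through 1c.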

Signed-off-by: John Fulton <[email protected]>
Co-authored-by: Claude <[email protected]>
The new control plane defined in the architecture repo
(examples/dt/dcn_nostorage/control-plane/nncp/values.yaml)
uses unique VLAN IDs per site: 20-23 for central, 30-33 for
dcn1, and 40-43 for dcn2.

The old control plane defined in the data-plane-adoption repo
(tests/vars.dcn_nostorage.yaml) reuses the same VLAN IDs at every
site: 20-23 for central, dcn1, and dcn2.

This mismatch causes problems during adoption testing that require
manual renumbering to fix. This patch updates the original 17
test control plane to use the same unique VLAN IDs per site.
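
A quick way to see the difference between the two numbering schemes is to look for VLAN IDs shared by more than one site. The sketch below is illustrative only: the ranges come from this commit message, while `OLD_VLANS`, `NEW_VLANS`, and `overlapping_sites` are hypothetical names and nothing here reads the actual values files.

```python
from itertools import combinations

# VLAN ranges quoted in the commit message (range() end is exclusive).
OLD_VLANS = {"central": range(20, 24), "dcn1": range(20, 24), "dcn2": range(20, 24)}
NEW_VLANS = {"central": range(20, 24), "dcn1": range(30, 34), "dcn2": range(40, 44)}


def overlapping_sites(vlans_by_site):
    """Return (site_a, site_b, shared_ids) for every pair of sites sharing a VLAN ID."""
    clashes = []
    for (site_a, ids_a), (site_b, ids_b) in combinations(vlans_by_site.items(), 2):
        shared = sorted(set(ids_a) & set(ids_b))
        if shared:
            clashes.append((site_a, site_b, shared))
    return clashes


if __name__ == "__main__":
    print("old scheme clashes:", overlapping_sites(OLD_VLANS))
    print("new scheme clashes:", overlapping_sites(NEW_VLANS))
```

With the old scheme every pair of sites shares all four IDs; with the new scheme no pair shares any.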

Co-authored-by: Claude <[email protected]>
Signed-off-by: John Fulton <[email protected]>
DCN deployments on RHEL 9.4 hypervisors require loose reverse path
filtering to allow asymmetric routing between DCN compute nodes and
the central controller.

Set KernelIpv4ConfAllRpFilter=2 to set net.ipv4.conf.all.rp_filter=2
on all overcloud nodes.

Without this setting, DCN compute nodes cannot communicate with the
central controller's Keystone service during deployment, causing the
nova_wait_for_compute_service task to fail.
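
For reference, the kernel documents rp_filter values 0 (no source validation), 1 (strict), and 2 (loose), and uses the maximum of the `conf/all` and per-interface values when validating packets on an interface. The sketch below shows that rule and how a node could be checked; the helper names are made up for illustration, and `read_rp_filter` only works on a Linux host.

```python
from pathlib import Path

# Meaning of each rp_filter value per the kernel's ip-sysctl documentation.
RP_FILTER_MODES = {0: "off", 1: "strict", 2: "loose"}


def effective_rp_filter(all_value, iface_value):
    """The kernel applies the max of conf/all and conf/<iface> on that interface."""
    return max(all_value, iface_value)


def read_rp_filter(scope="all", root="/proc/sys/net/ipv4/conf"):
    """Read the current rp_filter value for `scope` ('all' or an interface name)."""
    return int((Path(root) / scope / "rp_filter").read_text().strip())


if __name__ == "__main__":
    if Path("/proc/sys/net/ipv4/conf/all/rp_filter").exists():
        current = read_rp_filter("all")
        print("conf/all rp_filter:", current, RP_FILTER_MODES.get(current, "unknown"))
```

Because the maximum wins, setting `all` to 2 gives loose mode on every interface regardless of the per-interface value.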

Note: This issue does not occur on CentOS Stream 9 hypervisors.

Note: It is assumed that the same settings have already been made on
the hypervisor hosting the VMs.

Also, remove redundant ControllerExtraConfig and set
nova::availability_zone::default_schedule_zone: az-central
using the single ControllerExtraConfig.

Co-Authored-By: Claude <[email protected]>
Signed-off-by: John Fulton <[email protected]>
Add network-specific routes to dcn_nostorage.yaml stack configurations
to enable ci-framework's os-net-config template to render cross-site
routes for DCN compute nodes.

Problem:
- ci-framework pre-generates /etc/os-net-config/tripleo_config.yaml
  via ansible template before TripleO Heat deployment runs
- This bypasses TripleO Heat NIC templates completely
- Routes defined in network_data.yaml.j2 are never rendered to
  compute nodes

Solution:
- Add network_routes configuration to dcn1 and dcn2 stacks in
  dcn_nostorage.yaml
- ci-framework's os_net_config_overcloud.yml.j2 template will consume
  these routes and render them to /etc/os-net-config/tripleo_config.yaml

Routes Added:
- DCN1: Routes to central (172.17.0.0/24, 172.18.0.0/24, 172.19.0.0/24)
  and dcn2 (172.17.20.0/24, 172.18.20.0/24, 172.19.20.0/24) via
  appropriate gateways
- DCN2: Routes to central (172.17.0.0/24, 172.18.0.0/24, 172.19.0.0/24)
  and dcn1 (172.17.10.0/24, 172.18.10.0/24, 172.19.10.0/24) via
  appropriate gateways

This enables DCN compute nodes to reach OVN southbound DB and other
services on central controllers using the correct source IP addresses.
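
As a rough picture of what the rendered result looks like, os-net-config route entries are mappings with `ip_netmask` and `next_hop` keys. The sketch below builds such entries from a destination list. It does not reproduce the ci-framework template or the `network_routes` schema, `render_routes` is a hypothetical helper, and the gateway `172.17.10.1` is an assumed placeholder because this commit message does not state the real gateways.

```python
import ipaddress


def render_routes(destinations, next_hop):
    """Build os-net-config style route entries for one interface."""
    # Validate inputs early so a typo fails here rather than on the node.
    gateway = str(ipaddress.ip_address(next_hop))
    return [
        {"ip_netmask": str(ipaddress.ip_network(dest)), "next_hop": gateway}
        for dest in destinations
    ]


if __name__ == "__main__":
    # InternalApi destinations for dcn1; the gateway is an assumed placeholder.
    for route in render_routes(["172.17.0.0/24", "172.17.20.0/24"], "172.17.10.1"):
        print(route)
```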

Related: Commit bfc6d4d added routes to network_data.yaml.j2, but
those routes were never being used due to ci-framework bypassing
Heat templates.

Co-Authored-By: Claude <[email protected]>
Signed-off-by: John Fulton <[email protected]>
Add routes field to DCN subnet definitions in netconfig_networks to enable
proper inter-site connectivity. Routes are templated from edpm_dcn1_routes
and edpm_dcn2_routes variables.

The NetConfig controller propagates these routes to IPSet status, which the
openstack-operator inventory generator reads to create {network}_host_routes
ansible variables for the EDPM network configuration template.

Changes:
- Add routes to internalapidcn1/dcn2 subnets for RabbitMQ/API connectivity
- Add routes to storagedcn1/dcn2 subnets for storage traffic
- Add routes to tenantdcn1/dcn2 subnets for tenant network traffic

This fixes the issue where DCN compute nodes could not connect to RabbitMQ
and other control plane services because they lacked routes to the central site.

Co-Authored-By: Claude <[email protected]>
Signed-off-by: John Fulton <[email protected]>
Problem:
DCN compute nodes' OVN Controller agents cannot connect to the OVN SB database
because the default ovncontroller-config ConfigMap points at a Kubernetes
ClusterIP service (tcp:ovsdbserver-sb.openstack.svc:6642), which is not routable
from external EDPM nodes on different network segments.

While DNS resolution works (via dnsmasq at 192.168.122.80), the resolved
ClusterIP cannot be reached from the DCN sites, which are on different internalapi
subnets (172.17.10.x for dcn1 and 172.17.20.x for dcn2, versus 172.17.0.x for central).

This causes port binding failures when launching VMs in DCN availability zones:
  "Binding failed for port, please check neutron logs for more information"

Evidence:
- Central compute OVN Controller agents: Connected and working (`:-)` status)
- DCN compute OVN Controller agents: NOT registered in OVN SB database
- `ovn-sbctl show` shows only central computes and gateway, no DCN chassis

Root Cause:
Setting the edpm_ovn_dbs variable is insufficient because the edpm_ovn role loads
the ovncontroller-config ConfigMap data, which overrides the ovn-remote setting.
The default ConfigMap (created by the OVNDBCluster operator) uses the ClusterIP.

Solution:
1. Retrieve OVN SB internalapi IPs from pod annotations
2. Create DCN-specific ConfigMap (ovncontroller-config-dcn) with direct IPs
3. Create DCN-specific DataPlaneService (ovn-dcn) referencing this ConfigMap
4. Patch dcn1/dcn2 nodesets to use ovn-dcn service instead of ovn

This ensures DCN nodes connect to OVN SB via routable internalapi IPs:
  tcp:172.17.0.34:6642,tcp:172.17.0.36:6642,tcp:172.17.0.35:6642
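
The connection string above is a comma-separated list of `tcp:<ip>:<port>` endpoints. The sketch below shows how it could be assembled from the pod IPs and checked against the central internalapi subnet. `build_ovn_remote` and the constants are hypothetical names for this illustration, and retrieving the IPs from pod annotations (step 1) is not shown.

```python
import ipaddress

# Central internalapi subnet and OVN SB port, as given in this commit message.
CENTRAL_INTERNALAPI = ipaddress.ip_network("172.17.0.0/24")
OVN_SB_PORT = 6642


def build_ovn_remote(sb_ips, port=OVN_SB_PORT):
    """Join OVN SB endpoints into an ovn-remote string, rejecting unroutable IPs."""
    for ip in sb_ips:
        if ipaddress.ip_address(ip) not in CENTRAL_INTERNALAPI:
            raise ValueError(f"{ip} is not on {CENTRAL_INTERNALAPI}")
    return ",".join(f"tcp:{ip}:{port}" for ip in sb_ips)


if __name__ == "__main__":
    print(build_ovn_remote(["172.17.0.34", "172.17.0.36", "172.17.0.35"]))
```

Rejecting addresses outside 172.17.0.0/24 catches the case where a ClusterIP is passed in by mistake.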

Co-Authored-By: Claude <[email protected]>
Signed-off-by: John Fulton <[email protected]>
@openshift-ci

openshift-ci bot commented Dec 19, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

Once this PR has been reviewed and has the lgtm label, please assign jistr for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/a4d319b24b8342a787b83d63227f4077

✔️ noop SUCCESS in 0s
adoption-standalone-to-crc-ceph FAILURE in 2h 10m 50s
adoption-standalone-to-crc-no-ceph FAILURE in 2h 11m 36s
