Do Not Merge: Dcn adoption pr ovn sb #1186
Conversation
Create a scenario to adopt DCN deployments, based on the HCI scenario.
Signed-off-by: Oliver Walsh <[email protected]>
The DCN deployment templates were missing route definitions, and the
DCN roles were using the central subnets instead of their own.
This patch fixes that by making the following changes (an illustrative
configuration sketch follows this commit message).
1. Network Routes Added (network_data.yaml.j2)
a. dcn1/network_data.yaml.j2: Added routes to InternalApi,
Storage, and Tenant subnets pointing to central
(172.17.0.0/24, 172.18.0.0/24, 172.19.0.0/24) and dcn2
(172.17.20.0/24, 172.18.20.0/24, 172.19.20.0/24)
b. dcn2/network_data.yaml.j2: Added routes to InternalApi,
Storage, and Tenant subnets pointing to central
(172.17.0.0/24, 172.18.0.0/24, 172.19.0.0/24) and dcn1
(172.17.10.0/24, 172.18.10.0/24, 172.19.10.0/24)
c. central/network_data.yaml.j2: Added routes to InternalApi,
Storage, and Tenant subnets pointing to dcn1
(172.17.10.0/24, 172.18.10.0/24, 172.19.10.0/24) and dcn2
(172.17.20.0/24, 172.18.20.0/24, 172.19.20.0/24)
2. Control Plane Routes Added (config_download.yaml)
a. dcn1/config_download.yaml: Added host_routes to
leaf1 subnet for central (192.168.122.0/24) and
dcn2 (192.168.144.0/24)
b. dcn2/config_download.yaml: Added host_routes to
leaf2 subnet for central (192.168.122.0/24) and
dcn1 (192.168.133.0/24)
c. central/config_download.yaml: Added host_routes to
ctlplane-subnet for dcn1 (192.168.133.0/24) and dcn2
(192.168.144.0/24)
3. Subnet References Fixed (roles.yaml)
a. dcn1/roles.yaml: Changed ComputeDcn1 networks to use
internal_api_leaf1, tenant_leaf_1, storage_leaf1
b. dcn2/roles.yaml: Changed ComputeDcn2 networks to use
internal_api_leaf2, tenant_leaf_2, storage_leaf2
Signed-off-by: John Fulton <[email protected]>
Co-authored-by: Claude <[email protected]>
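For readers unfamiliar with these files, here is a minimal, illustrative sketch of the two kinds of route the list above describes. It assumes TripleO network_data v2 route syntax and a config-download subnet definition; the gateway addresses, VLAN ID, and surrounding structure are placeholders rather than text copied from the patch.

```yaml
# Illustrative only: routes for the dcn1 InternalApi subnet in
# dcn1/network_data.yaml.j2 (network_data v2 style), pointing at the
# central and dcn2 InternalApi subnets. VLAN and gateway are placeholders.
- name: InternalApi
  subnets:
    internal_api_leaf1:
      vlan: 30
      ip_subnet: 172.17.10.0/24
      routes:
        - destination: 172.17.0.0/24    # central InternalApi
          nexthop: 172.17.10.1
        - destination: 172.17.20.0/24   # dcn2 InternalApi
          nexthop: 172.17.10.1
---
# Illustrative only: host_routes on the dcn1 control-plane subnet in
# dcn1/config_download.yaml so dcn1 nodes can reach the central and dcn2
# control-plane networks. The next-hop address is a placeholder.
leaf1:
  host_routes:
    - destination: 192.168.122.0/24     # central ctlplane
      nexthop: 192.168.133.1
    - destination: 192.168.144.0/24     # dcn2 ctlplane
      nexthop: 192.168.133.1
```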
The new control plane defined in the architecture repo
(examples/dt/dcn_nostorage/control-plane/nncp/values.yaml) uses unique VLAN
IDs per site: central 20-23, dcn1 30-33, and dcn2 40-43. The old control
plane defined in the data-plane-adoption repo (tests/vars.dcn_nostorage.yaml)
reuses the same VLAN IDs at every site: central 20-23, dcn1 20-23, and
dcn2 20-23. This causes problems during adoption testing that require manual
renumbering to fix. This patch updates the original 17 test control plane to
use the same unique per-site VLAN IDs as the new control plane.
Co-authored-by: Claude <[email protected]>
Signed-off-by: John Fulton <[email protected]>
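Purely as a summary of the numbering scheme above, in YAML form; the key names are invented for readability and are not the actual schema of values.yaml or vars.dcn_nostorage.yaml, and the commit message does not say which network maps to which ID within each range.

```yaml
# Hypothetical layout: only the VLAN ID ranges come from the commit message.
vlan_ids:
  central: [20, 21, 22, 23]
  dcn1: [30, 31, 32, 33]
  dcn2: [40, 41, 42, 43]
```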
DCN deployments on RHEL 9.4 hypervisors require loose reverse path filtering
to allow asymmetric routing between DCN compute nodes and the central
controller. Set KernelIpv4ConfAllRpFilter=2 so that
net.ipv4.conf.all.rp_filter=2 is applied on all overcloud nodes. Without this
setting, DCN compute nodes cannot reach the central controller's Keystone
service during deployment, causing the nova_wait_for_compute_service task to
fail.
Note: This issue does not occur on CentOS Stream 9 hypervisors.
Note: It is assumed that the same settings have already been applied on the
hypervisor hosting the VMs.
Also, remove the redundant ControllerExtraConfig and set
nova::availability_zone::default_schedule_zone: az-central using the single
remaining ControllerExtraConfig.
Co-Authored-By: Claude <[email protected]>
Signed-off-by: John Fulton <[email protected]>
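A minimal sketch of the Heat environment settings this commit describes. The parameter names (KernelIpv4ConfAllRpFilter, ControllerExtraConfig, nova::availability_zone::default_schedule_zone) are taken from the commit message and exist in tripleo-heat-templates / puppet-nova; the file they live in and the surrounding context are assumptions.

```yaml
parameter_defaults:
  # Loose reverse path filtering (net.ipv4.conf.all.rp_filter=2) on all
  # overcloud nodes, required for asymmetric routing between DCN compute
  # nodes and the central controllers on RHEL 9.4 hypervisors.
  KernelIpv4ConfAllRpFilter: 2
  # Single ControllerExtraConfig block; sets the Nova scheduler's default
  # availability zone to the central site.
  ControllerExtraConfig:
    nova::availability_zone::default_schedule_zone: az-central
```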
Add network-specific routes to the dcn_nostorage.yaml stack configurations so
that ci-framework's os-net-config template can render cross-site routes for
DCN compute nodes.
Problem:
- ci-framework pre-generates /etc/os-net-config/tripleo_config.yaml via an
  ansible template before the TripleO Heat deployment runs
- This bypasses the TripleO Heat NIC templates completely
- Routes defined in network_data.yaml.j2 are never rendered onto the compute
  nodes
Solution:
- Add a network_routes configuration to the dcn1 and dcn2 stacks in
  dcn_nostorage.yaml
- ci-framework's os_net_config_overcloud.yml.j2 template consumes these
  routes and renders them to /etc/os-net-config/tripleo_config.yaml
Routes Added:
- DCN1: routes to central (172.17/18/19.0.0/24) and dcn2
  (172.17/18/19.20.0/24) via the appropriate gateways
- DCN2: routes to central (172.17/18/19.0.0/24) and dcn1
  (172.17/18/19.10.0/24) via the appropriate gateways
This enables DCN compute nodes to reach the OVN southbound DB and other
services on the central controllers using the correct source IP addresses
(see the rendered-output sketch after this commit message).
Related: commit bfc6d4d added routes to network_data.yaml.j2, but those
routes were never used because ci-framework bypasses the Heat templates.
Co-Authored-By: Claude <[email protected]>
Signed-off-by: John Fulton <[email protected]>
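As a rough illustration of the rendered result, the fragment below uses os-net-config's classic ip_netmask/next_hop route keys. The interface name, VLAN ID, host address, and next hops are placeholders; the real entries would come from the new network_routes values in dcn_nostorage.yaml.

```yaml
# Illustrative fragment of /etc/os-net-config/tripleo_config.yaml as it
# might look on a dcn1 compute after ci-framework renders the template.
network_config:
  - type: vlan
    device: nic2                         # placeholder interface
    vlan_id: 30                          # placeholder VLAN for dcn1 internal_api
    addresses:
      - ip_netmask: 172.17.10.107/24     # placeholder host address
    routes:
      - ip_netmask: 172.17.0.0/24        # central internal_api
        next_hop: 172.17.10.1            # placeholder gateway
      - ip_netmask: 172.17.20.0/24       # dcn2 internal_api
        next_hop: 172.17.10.1
```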
Add routes field to DCN subnet definitions in netconfig_networks to enable
proper inter-site connectivity. Routes are templated from edpm_dcn1_routes
and edpm_dcn2_routes variables.
The NetConfig controller propagates these routes to IPSet status, which the
openstack-operator inventory generator reads to create {network}_host_routes
ansible variables for the EDPM network configuration template.
Changes:
- Add routes to internalapidcn1/dcn2 subnets for RabbitMQ/API connectivity
- Add routes to storagedcn1/dcn2 subnets for storage traffic
- Add routes to tenantdcn1/dcn2 subnets for tenant network traffic
This fixes the issue where DCN compute nodes could not connect to RabbitMQ
and other control plane services because they lacked routes to the central site.
Co-Authored-By: Claude <[email protected]>
Signed-off-by: John Fulton <[email protected]>
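A hedged sketch of what such a subnet definition could look like, using the routes field of the infra-operator NetConfig CRD as I understand it. The network and subnet names are taken from this commit, while the DNS domain, VLAN, allocation range, and addresses are placeholders based on values elsewhere in this PR rather than on the actual diff.

```yaml
apiVersion: network.openstack.org/v1beta1
kind: NetConfig
metadata:
  name: netconfig
spec:
  networks:
    - name: internalapi
      dnsDomain: internalapi.example.com    # placeholder
      subnets:
        - name: internalapidcn1
          cidr: 172.17.10.0/24
          vlan: 30                          # placeholder VLAN
          allocationRanges:
            - start: 172.17.10.100          # placeholder range
              end: 172.17.10.200
          routes:
            # in the actual change these are templated from edpm_dcn1_routes
            - destination: 172.17.0.0/24    # central internalapi
              nexthop: 172.17.10.1          # placeholder gateway
```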
Problem: DCN compute nodes' OVN Controller agents cannot connect to the OVN
SB database because the default ovncontroller-config ConfigMap uses the
Kubernetes ClusterIP (tcp:ovsdbserver-sb.openstack.svc:6642), which is not
routable from external EDPM nodes on different network segments. While DNS
resolution works (via dnsmasq at 192.168.122.80), the resolved ClusterIP
cannot be reached from the DCN sites, which are on different internalapi
subnets (172.17.10.x for dcn1 and 172.17.20.x for dcn2, versus central's
172.17.0.x). This causes port binding failures when launching VMs in DCN
availability zones: "Binding failed for port, please check neutron logs for
more information".
Evidence:
- Central compute OVN Controller agents: connected and working (`:-)` status)
- DCN compute OVN Controller agents: NOT registered in the OVN SB database
- `ovn-sbctl show` shows only the central computes and the gateway; no DCN
  chassis
Root Cause: Setting the edpm_ovn_dbs variable is insufficient because the
edpm_ovn role loads the ovncontroller-config ConfigMap data, which overrides
the ovn-remote setting. The default ConfigMap (created by the OVNDBCluster
operator) uses the ClusterIP.
Solution:
1. Retrieve the OVN SB internalapi IPs from pod annotations
2. Create a DCN-specific ConfigMap (ovncontroller-config-dcn) with the
   direct IPs
3. Create a DCN-specific DataPlaneService (ovn-dcn) referencing this
   ConfigMap
4. Patch the dcn1/dcn2 nodesets to use the ovn-dcn service instead of ovn
This ensures DCN nodes connect to the OVN SB via routable internalapi IPs:
tcp:172.17.0.34:6642,tcp:172.17.0.36:6642,tcp:172.17.0.35:6642
Co-Authored-By: Claude <[email protected]>
Signed-off-by: John Fulton <[email protected]>
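A heavily hedged sketch of solution steps 2 and 3. The ConfigMap payload is deliberately left empty because its exact keys depend on the ovn-operator version; it should mirror the default ovncontroller-config with the SB remote pointed at the direct internalapi IPs. The OpenStackDataPlaneService fields follow the dataplane.openstack.org/v1beta1 API as I understand it and may differ between openstack-operator releases.

```yaml
# Step 2 (sketch): DCN-specific ConfigMap. The data block is a placeholder;
# copy the default ovncontroller-config and replace the ClusterIP-based SB
# remote with tcp:172.17.0.34:6642,tcp:172.17.0.36:6642,tcp:172.17.0.35:6642.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ovncontroller-config-dcn
  namespace: openstack
data: {}
---
# Step 3 (sketch): DCN-specific DataPlaneService referencing that ConfigMap.
# Step 4 would then swap "ovn" for "ovn-dcn" in the dcn1/dcn2 nodesets.
apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneService
metadata:
  name: ovn-dcn
  namespace: openstack
spec:
  playbook: osp.edpm.ovn      # assumed: same playbook as the default ovn service
  dataSources:
    - configMapRef:
        name: ovncontroller-config-dcn
```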
[APPROVALNOTIFIER] This PR is NOT APPROVED. It needs approval from an approver in each of the affected files; approvers can indicate their approval with the bot's approve command.
Build failed (check pipeline). Buildset: https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/a4d319b24b8342a787b83d63227f4077
✔️ noop SUCCESS in 0s
This PR will be rebased after the following merges:
#1184