OCPBUGS-63472: detect manual network config and set --copy-network installer arg#10414
OCPBUGS-63472: detect manual network config and set --copy-network installer arg#10414rwsu wants to merge 1 commit intoopenshift:mainfrom
Conversation
|
@rwsu: This pull request references Jira Issue OCPBUGS-63472, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughAdds a oneshot agent systemd unit and script that detect manual NetworkManager keyfiles and instruct assisted-service to set the installer Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.11.3)Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@data/data/agent/files/usr/local/bin/agent-set-host-copy-network-arg.sh`:
- Around line 15-17: The current pipeline that sets manual_keyfiles using grep
and xargs will break on filenames with spaces (e.g., "Wired connection
1.nmconnection"); replace the pipeline that reads from "$NM_CONNECTIONS_DIR" and
pipes through xargs with a null-delimited safe approach: use find (or grep with
--null/-Z where available) to emit NUL-separated filenames and consume them with
xargs -0 or a while IFS= read -r -d '' loop, testing each file for the
^[connection] header and assembling manual_keyfiles accordingly; update the code
around the manual_keyfiles assignment in agent-set-host-copy-network-arg.sh to
use this NUL-safe find/xargs or read-loop so files with spaces are handled
correctly.
In `@data/data/agent/systemd/units/agent-set-host-copy-network-arg.service`:
- Around line 3-4: The current unit ordering is racy because
agent-set-host-copy-network-arg.service only specifies
Before=start-cluster-installation.service but lacks ordering against
agent-add-node.service and no readiness gate for install-worker flows; update
agent-set-host-copy-network-arg.service to include explicit ordering against
agent-add-node.service (e.g., Before=agent-add-node.service and/or
Wants=agent-add-node.service as appropriate) and add a readiness/gate for
install-worker flows by introducing a simple target/marker (e.g.,
host-patch-complete.target) that the agent’s PATCH completion emits and make
start-cluster-installation.service require/After= that target (or have
start-cluster-installation.service Wait/Requires that marker) so install workers
cannot proceed until the assisted-service/agent PATCH has completed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e74578f0-3ced-428d-990b-1683fd931fa7
⛔ Files ignored due to path filters (4)
docs/user/agent/agent_installer_services-add_nodes_workflow.svgis excluded by!**/*.svgdocs/user/agent/agent_installer_services-install_workflow.svgis excluded by!**/*.svgdocs/user/agent/agent_installer_services-interactive.svgis excluded by!**/*.svgdocs/user/agent/agent_installer_services-unconfigured_ignition_and_config_image_flow.svgis excluded by!**/*.svg
📒 Files selected for processing (7)
cmd/openshift-install/internal_integration_test.gocmd/openshift-install/testdata/agent/image/assets/default_assets.txtdata/data/agent/files/usr/local/bin/agent-set-host-copy-network-arg.shdata/data/agent/systemd/units/agent-set-host-copy-network-arg.servicedocs/user/agent/agent-services.mdinternal/tshelpers/custom_commands.gopkg/asset/agent/image/ignition.go
data/data/agent/files/usr/local/bin/agent-set-host-copy-network-arg.sh
Outdated
Show resolved
Hide resolved
| After=agent.service | ||
| Before=start-cluster-installation.service |
There was a problem hiding this comment.
This ordering is still racy for add-nodes and install workers.
Before=start-cluster-installation.service only serializes the rendezvous node’s local unit. pkg/asset/agent/image/ignition.go also enables this service in add-nodes, but there is no ordering against agent-add-node.service, and install-workflow workers can still be driven to installing before their local PATCH completes.
Minimal day-2 fix
-Before=start-cluster-installation.service
+Before=start-cluster-installation.service agent-add-node.serviceThe install-worker path still needs a readiness or assisted-service-side gate.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| After=agent.service | |
| Before=start-cluster-installation.service | |
| After=agent.service | |
| Before=start-cluster-installation.service agent-add-node.service |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@data/data/agent/systemd/units/agent-set-host-copy-network-arg.service` around
lines 3 - 4, The current unit ordering is racy because
agent-set-host-copy-network-arg.service only specifies
Before=start-cluster-installation.service but lacks ordering against
agent-add-node.service and no readiness gate for install-worker flows; update
agent-set-host-copy-network-arg.service to include explicit ordering against
agent-add-node.service (e.g., Before=agent-add-node.service and/or
Wants=agent-add-node.service as appropriate) and add a readiness/gate for
install-worker flows by introducing a simple target/marker (e.g.,
host-patch-complete.target) that the agent’s PATCH completion emits and make
start-cluster-installation.service require/After= that target (or have
start-cluster-installation.service Wait/Requires that marker) so install workers
cannot proceed until the assisted-service/agent PATCH has completed.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@data/data/agent/files/usr/local/bin/agent-set-host-copy-network-arg.sh`:
- Around line 31-38: The script currently grabs the first infra-env via jq
'.[0].id' which can pick the wrong infra-env; instead, query all infra-envs
(using curl_assisted_service) and search each infra-env for the host that
matches this machine (e.g., by comparing host identifier fields such as MAC,
hostname, or sensor ID returned in the infra-env's hosts/inventory) and only set
INFRA_ENV_ID when you find the infra-env that contains the matching host; update
the logic around INFRA_ENV_ID, the curl_assisted_service call and the jq filter
(replace '.[0].id' usage) to loop over infra-envs and select the id of the
infra-env that contains the matched host before proceeding.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5d9cfd38-655c-40a4-ad16-a33c11558a31
⛔ Files ignored due to path filters (4)
docs/user/agent/agent_installer_services-add_nodes_workflow.svgis excluded by!**/*.svgdocs/user/agent/agent_installer_services-install_workflow.svgis excluded by!**/*.svgdocs/user/agent/agent_installer_services-interactive.svgis excluded by!**/*.svgdocs/user/agent/agent_installer_services-unconfigured_ignition_and_config_image_flow.svgis excluded by!**/*.svg
📒 Files selected for processing (8)
cmd/openshift-install/internal_integration_test.gocmd/openshift-install/testdata/agent/image/assets/default_assets.txtdata/data/agent/files/usr/local/bin/agent-set-host-copy-network-arg.shdata/data/agent/systemd/units/agent-set-host-copy-network-arg.servicedocs/user/agent/agent-services.mdinternal/tshelpers/custom_commands.gopkg/asset/agent/image/ignition.gopkg/asset/agent/image/ignition_test.go
✅ Files skipped from review due to trivial changes (1)
- pkg/asset/agent/image/ignition.go
🚧 Files skipped from review as they are similar to previous changes (3)
- data/data/agent/systemd/units/agent-set-host-copy-network-arg.service
- docs/user/agent/agent-services.md
- cmd/openshift-install/testdata/agent/image/assets/default_assets.txt
| INFRA_ENV_ID="" | ||
| until [ -n "$INFRA_ENV_ID" ] && [ "$INFRA_ENV_ID" != "null" ]; do | ||
| INFRA_ENV_ID=$(curl_assisted_service "/infra-envs" GET 2>/dev/null | jq -r '.[0].id' 2>/dev/null) || true | ||
| if [ -z "$INFRA_ENV_ID" ] || [ "$INFRA_ENV_ID" = "null" ]; then | ||
| echo "Waiting for assisted-service and infra-env to be available..." | ||
| sleep 5 | ||
| fi | ||
| done |
There was a problem hiding this comment.
Do not select infra-env by .[0].id.
Line 33 assumes the first infra-env belongs to this host. In environments with multiple infra-envs (notably add-nodes/unconfigured flows), this can patch the wrong host or wait forever in later host lookup.
Suggested fix (resolve host across infra-envs, then patch the matched one)
-INFRA_ENV_ID=""
-until [ -n "$INFRA_ENV_ID" ] && [ "$INFRA_ENV_ID" != "null" ]; do
- INFRA_ENV_ID=$(curl_assisted_service "/infra-envs" GET 2>/dev/null | jq -r '.[0].id' 2>/dev/null) || true
- if [ -z "$INFRA_ENV_ID" ] || [ "$INFRA_ENV_ID" = "null" ]; then
- echo "Waiting for assisted-service and infra-env to be available..."
- sleep 5
- fi
-done
-echo "Got INFRA_ENV_ID: $INFRA_ENV_ID"
+INFRA_ENV_ID=""
+host_id=""
+until [ -n "$INFRA_ENV_ID" ] && [ -n "$host_id" ]; do
+ infra_env_ids=$(curl_assisted_service "/infra-envs" GET 2>/dev/null | jq -r '.[].id // empty' 2>/dev/null) || true
+ for infra in $infra_env_ids; do
+ for mac in $local_macs; do
+ host_id=$(curl_assisted_service "/infra-envs/${infra}/hosts" GET 2>/dev/null | \
+ jq -r --arg mac "$mac" '
+ .[] |
+ select(.inventory != null and .inventory != "") |
+ select(
+ .inventory | fromjson | .interfaces // [] |
+ map(.mac_address | ascii_downcase) |
+ index($mac) != null
+ ) |
+ .id
+ ' 2>/dev/null | head -1) || true
+ if [ -n "$host_id" ]; then
+ INFRA_ENV_ID="$infra"
+ break 2
+ fi
+ done
+ done
+ [ -z "$INFRA_ENV_ID" ] && echo "Host not yet found in any infra-env, retrying..." && sleep 5
+done📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| INFRA_ENV_ID="" | |
| until [ -n "$INFRA_ENV_ID" ] && [ "$INFRA_ENV_ID" != "null" ]; do | |
| INFRA_ENV_ID=$(curl_assisted_service "/infra-envs" GET 2>/dev/null | jq -r '.[0].id' 2>/dev/null) || true | |
| if [ -z "$INFRA_ENV_ID" ] || [ "$INFRA_ENV_ID" = "null" ]; then | |
| echo "Waiting for assisted-service and infra-env to be available..." | |
| sleep 5 | |
| fi | |
| done | |
| INFRA_ENV_ID="" | |
| host_id="" | |
| until [ -n "$INFRA_ENV_ID" ] && [ -n "$host_id" ]; do | |
| infra_env_ids=$(curl_assisted_service "/infra-envs" GET 2>/dev/null | jq -r '.[].id // empty' 2>/dev/null) || true | |
| for infra in $infra_env_ids; do | |
| for mac in $local_macs; do | |
| host_id=$(curl_assisted_service "/infra-envs/${infra}/hosts" GET 2>/dev/null | \ | |
| jq -r --arg mac "$mac" ' | |
| .[] | | |
| select(.inventory != null and .inventory != "") | | |
| select( | |
| .inventory | fromjson | .interfaces // [] | | |
| map(.mac_address | ascii_downcase) | | |
| index($mac) != null | |
| ) | | |
| .id | |
| ' 2>/dev/null | head -1) || true | |
| if [ -n "$host_id" ]; then | |
| INFRA_ENV_ID="$infra" | |
| break 2 | |
| fi | |
| done | |
| done | |
| [ -z "$INFRA_ENV_ID" ] && echo "Host not yet found in any infra-env, retrying..." && sleep 5 | |
| done |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@data/data/agent/files/usr/local/bin/agent-set-host-copy-network-arg.sh`
around lines 31 - 38, The script currently grabs the first infra-env via jq
'.[0].id' which can pick the wrong infra-env; instead, query all infra-envs
(using curl_assisted_service) and search each infra-env for the host that
matches this machine (e.g., by comparing host identifier fields such as MAC,
hostname, or sensor ID returned in the infra-env's hosts/inventory) and only set
INFRA_ENV_ID when you find the infra-env that contains the matching host; update
the logic around INFRA_ENV_ID, the curl_assisted_service call and the jq filter
(replace '.[0].id' usage) to loop over infra-envs and select the id of the
infra-env that contains the matched host before proceeding.
…staller arg Add agent-set-host-copy-network-arg.service which runs on every node after agent.service registers the host with assisted-service. If manually-created NetworkManager keyfiles are found in /etc/NetworkManager/system-connections (excluding those auto-generated by nm-initrd-generator), the service sets the --copy-network installer arg for the current host via the assisted-service REST API: PATCH /api/assisted-install/v2/infra-envs/$INFRA_ENV_ID/hosts/$HOST_ID/installer-args This ensures that network configuration created manually via the agent-tui persists into the installed OS via coreos-installer's --copy-network flag. Nodes with no manual network config exit immediately without any API call. The service runs Before=start-cluster-installation.service, which prevents installation from starting on the rendezvous node until the service has completed. For non-rendezvous nodes, there is no equivalent ordering guarantee since start-cluster-installation.service only runs on the rendezvous node. This means the --copy-network flag for non-rendezvous hosts may potentially be applied after cluster installation has already started. In practice the window is small as the service completes quickly once the host is registered, but this remains an open timing consideration. The service is enabled across all agent workflows (install, add-nodes, and unconfigured/ZTP) via getDefaultEnabledServices(), and runs after agent.service. A new isoIgnitionSystemdContains testscript helper is added to verify systemd units are present in the ignition config embedded in the ISO. The default_assets.txt test is extended to verify all services from getDefaultEnabledServices() are present, including the new service. The agent-services.md documentation and workflow SVG diagrams are updated to include the new service. Assisted-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
♻️ Duplicate comments (1)
data/data/agent/files/usr/local/bin/agent-set-host-copy-network-arg.sh (1)
31-73:⚠️ Potential issue | 🟠 MajorResolve infra-env selection by matching host across all infra-envs
Line 33 uses
.[0].id, which can pick the wrong infra-env when multiple infra-envs exist. Then Lines 53-73 only query that infra-env’s hosts, so this can either patch the wrong host or loop forever waiting for a host that will never appear there.Suggested fix
-INFRA_ENV_ID="" -until [ -n "$INFRA_ENV_ID" ] && [ "$INFRA_ENV_ID" != "null" ]; do - INFRA_ENV_ID=$(curl_assisted_service "/infra-envs" GET 2>/dev/null | jq -r '.[0].id' 2>/dev/null) || true - if [ -z "$INFRA_ENV_ID" ] || [ "$INFRA_ENV_ID" = "null" ]; then - echo "Waiting for assisted-service and infra-env to be available..." - sleep 5 - fi -done -echo "Got INFRA_ENV_ID: $INFRA_ENV_ID" +INFRA_ENV_ID="" # Get local MAC addresses (lowercase, skip loopback). These are used to match # this host against the inventory reported by the agent to assisted-service, # since assisted-service identifies hosts by their interface MAC addresses. local_macs=$(ip link show | awk '/link\/ether/{print tolower($2)}') @@ host_id="" until [ -n "$host_id" ]; do - for mac in $local_macs; do - host_id=$(curl_assisted_service "/infra-envs/${INFRA_ENV_ID}/hosts" GET 2>/dev/null | \ - jq -r --arg mac "$mac" ' - .[] | - select(.inventory != null and .inventory != "") | - select( - .inventory | fromjson | .interfaces // [] | - map(.mac_address | ascii_downcase) | - index($mac) != null - ) | - .id - ' 2>/dev/null | head -1) || true - [ -n "$host_id" ] && break + infra_env_ids=$(curl_assisted_service "/infra-envs" GET 2>/dev/null | jq -r '.[].id // empty' 2>/dev/null) || true + for infra in $infra_env_ids; do + for mac in $local_macs; do + host_id=$(curl_assisted_service "/infra-envs/${infra}/hosts" GET 2>/dev/null | \ + jq -r --arg mac "$mac" ' + .[] | + select(.inventory != null and .inventory != "") | + select( + .inventory | fromjson | .interfaces // [] | + map(.mac_address | ascii_downcase) | + index($mac) != null + ) | + .id + ' 2>/dev/null | head -1) || true + if [ -n "$host_id" ]; then + INFRA_ENV_ID="$infra" + break 2 + fi + done done if [ -z "$host_id" ]; then - echo "Host not yet found in assisted-service, retrying..." + echo "Host not yet found in any infra-env, retrying..." sleep 5 fi done +echo "Got INFRA_ENV_ID: $INFRA_ENV_ID"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@data/data/agent/files/usr/local/bin/agent-set-host-copy-network-arg.sh` around lines 31 - 73, The code currently grabs a single infra-env via INFRA_ENV_ID=$(curl_assisted_service "/infra-envs" GET ... | jq -r '.[0].id') then only searches that infra-env for a host, which fails when the host belongs to a different infra-env; change the logic to enumerate all infra-envs returned by curl_assisted_service and search each one's hosts for the MACs from local_macs (via curl_assisted_service "/infra-envs/${ID}/hosts" and the same jq inventory matching) and when a match is found set both INFRA_ENV_ID and host_id and break out; update the loop that uses local_macs/host_id and references to curl_assisted_service, INFRA_ENV_ID, and host_id accordingly so it never assumes .[0].id.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@data/data/agent/files/usr/local/bin/agent-set-host-copy-network-arg.sh`:
- Around line 31-73: The code currently grabs a single infra-env via
INFRA_ENV_ID=$(curl_assisted_service "/infra-envs" GET ... | jq -r '.[0].id')
then only searches that infra-env for a host, which fails when the host belongs
to a different infra-env; change the logic to enumerate all infra-envs returned
by curl_assisted_service and search each one's hosts for the MACs from
local_macs (via curl_assisted_service "/infra-envs/${ID}/hosts" and the same jq
inventory matching) and when a match is found set both INFRA_ENV_ID and host_id
and break out; update the loop that uses local_macs/host_id and references to
curl_assisted_service, INFRA_ENV_ID, and host_id accordingly so it never assumes
.[0].id.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b66c4f13-b262-4289-b360-402fb9749b81
⛔ Files ignored due to path filters (4)
docs/user/agent/agent_installer_services-add_nodes_workflow.svgis excluded by!**/*.svgdocs/user/agent/agent_installer_services-install_workflow.svgis excluded by!**/*.svgdocs/user/agent/agent_installer_services-interactive.svgis excluded by!**/*.svgdocs/user/agent/agent_installer_services-unconfigured_ignition_and_config_image_flow.svgis excluded by!**/*.svg
📒 Files selected for processing (8)
cmd/openshift-install/internal_integration_test.gocmd/openshift-install/testdata/agent/image/assets/default_assets.txtdata/data/agent/files/usr/local/bin/agent-set-host-copy-network-arg.shdata/data/agent/systemd/units/agent-set-host-copy-network-arg.servicedocs/user/agent/agent-services.mdinternal/tshelpers/custom_commands.gopkg/asset/agent/image/ignition.gopkg/asset/agent/image/ignition_test.go
✅ Files skipped from review due to trivial changes (3)
- pkg/asset/agent/image/ignition.go
- docs/user/agent/agent-services.md
- data/data/agent/systemd/units/agent-set-host-copy-network-arg.service
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/asset/agent/image/ignition_test.go
- cmd/openshift-install/internal_integration_test.go
|
/retest-required |
|
@rwsu: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Add agent-set-host-copy-network-arg.service which runs on every node after agent.service registers the host with assisted-service. If manually-created NetworkManager keyfiles are found in /etc/NetworkManager/system-connections (excluding those auto-generated by nm-initrd-generator), the service sets the --copy-network installer arg for the current host via the assisted-service REST API:
PATCH /api/assisted-install/v2/infra-envs/$INFRA_ENV_ID/hosts/$HOST_ID/installer-args
This ensures that network configuration created manually via the agent-tui persists into the installed OS via coreos-installer's --copy-network flag. Nodes with no manual network config exit immediately without any API call.
The service runs Before=start-cluster-installation.service, which prevents installation from starting on the rendezvous node until the service has completed. For non-rendezvous nodes, there is no equivalent ordering guarantee since start-cluster-installation.service only runs on the rendezvous node. This means the --copy-network flag for non-rendezvous hosts may potentially be applied after cluster installation has already started. In practice the window is small as the service completes quickly once the host is registered, but this remains an open timing consideration.
The service is enabled across all agent workflows (install, add-nodes, and unconfigured/ZTP) via getDefaultEnabledServices(), and runs after agent.service.
A new isoIgnitionSystemdContains testscript helper is added to verify systemd units are present in the ignition config embedded in the ISO. The default_assets.txt test is extended to verify all services from getDefaultEnabledServices() are present, including the new service.
The agent-services.md documentation and workflow SVG diagrams are updated to include the new service.
Assisted-by: Claude Sonnet 4.6 (1M context) noreply@anthropic.com