Add worker static IP manifests and update worker scripts#98
Add worker static IP manifests and update worker scripts#98SharonBX wants to merge 1 commit intorh-ecosystem-edge:mainfrom
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: SharonBX The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @SharonBX. Thanks for your PR. I'm waiting for a rh-ecosystem-edge member to verify that this patch is reasonable to test. If it is, they should reply with Tip We noticed you've done this a few times! Consider joining the org to skip this step and gain Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
WalkthroughAdds optional per-worker static IP provisioning: new env var and README doc, NMState-based bridge script overhaul, unit/service updates, new static-IP BareMetalHost and BMC secret manifests, and worker provisioning script changes to validate/generate/apply static-IP manifests. Changes
Sequence DiagramsequenceDiagram
autonumber
actor User
participant Script as Worker Provisioning Script
participant K8s as Kubernetes API
participant BMH as BareMetalHost Controller
participant NMState as NMState Engine
participant Worker as Worker Node
User->>Script: Start provisioning (WORKER_STATIC_IP=true)
Script->>Script: Validate per-worker INT/IP/GW/PL/DNS
Script->>Script: Generate static BMH & Secret manifests
Script->>K8s: Apply bmc-secret-static-ip.yaml
Script->>K8s: Apply baremetalhost-static-ip.yaml
K8s->>BMH: Deliver manifests to provisioning controller
BMH->>Worker: Boot node with injected nmstate secret
Worker->>NMState: Request/apply network configuration
NMState->>Worker: Configure bridge and static IPv4 (IP, GW, DNS)
Worker-->>Script: Provisioning complete
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
manifests/cluster-installation/99-worker-bridge.yaml (1)
52-63:⚠️ Potential issue | 🟠 Major
p0-routing.servicecan fail permanently without bridge readiness/retry guards.Line 54 does not order this unit after
nmstate-bridge.service, and Lines 57-63 have no restart policy. If first execution happens before bridge prerequisites settle, the routing setup may never recover.Suggested hardening
[Unit] Description=Configure p0 interface routing -After=kubelet.service network-online.target -Wants=network-online.target +After=kubelet.service network-online.target nmstate-bridge.service +Wants=network-online.target nmstate-bridge.service [Service] Type=oneshot ExecStart=/usr/local/bin/configure-p0-routing.sh RemainAfterExit=yes StandardOutput=journal StandardError=journal +Restart=on-failure +RestartSec=10🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@manifests/cluster-installation/99-worker-bridge.yaml` around lines 52 - 63, The p0-routing.service unit can fail permanently because it doesn't wait for the bridge and has no restart policy; update the unit for p0-routing.service to depend on and start after nmstate-bridge.service (add After=nmstate-bridge.service and optionally Wants=nmstate-bridge.service) and add a restart policy (e.g., Restart=on-failure and RestartSec=30) so ExecStart=/usr/local/bin/configure-p0-routing.sh will be retried if it runs before the bridge is ready.
🧹 Nitpick comments (2)
manifests/worker-provisioning/baremetalhost-static-ip.yaml (1)
16-16: Avoid hardcoding disabled TLS verification for BMC access.Line 16 sets
disableCertificateVerification: trueunconditionally. Consider parameterizing this (secure default, explicit opt-out for lab/self-signed environments).Suggested template change
- disableCertificateVerification: true + disableCertificateVerification: <BMC_DISABLE_CERT_VERIFICATION>🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@manifests/worker-provisioning/baremetalhost-static-ip.yaml` at line 16, The manifest currently hardcodes disableCertificateVerification: true; change this to a parameterized/template-driven value (e.g., replace the literal with a configurable variable such that the default is false/secure and operators must explicitly opt-in to disable certificate verification for lab/self-signed BMCs). Update the template and/or values schema to expose disableCertificateVerification (default false) and ensure any rendering paths (helm/templating code that produces the resource) read that variable so environments can opt-out explicitly rather than having TLS verification disabled by default.manifests/cluster-installation/99-worker-bridge.yaml (1)
14-19: Inline base64 scripts are difficult to audit and can drift from source scripts.Lines 14 and 19 embed full executable logic as base64, which makes regression review and maintenance harder. Consider generating these payloads from
configuration_templates/*.shin a build step to keep a single source of truth.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@manifests/cluster-installation/99-worker-bridge.yaml` around lines 14 - 19, The manifest embeds full scripts as inline base64 (the contents: source: data:text/plain;charset=utf-8;base64:... entries that write /usr/local/bin/apply-nmstate-bridge.sh and the OVN wait script), which makes auditing hard; add a build step that generates these payloads from configuration_templates/*.sh (encode to the same data:text/plain;base64 form or produce plain file assets) and update 99-worker-bridge.yaml to reference the generated payloads instead of the long inline base64 strings (replace the two contents.source base64 blobs for apply-nmstate-bridge.sh and the OVN wait script with the build-produced values or file references), ensuring the manifest still writes the same target paths (/usr/local/bin/apply-nmstate-bridge.sh and the OVN wait script) and preserving mode/overwrite fields.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@configuration_templates/apply-nmstate-bridge.sh`:
- Around line 44-71: The DHCP vs static detection should inspect the primary
address entry in IFACE_DATA instead of any occurrence of "dynamic":true; update
the conditional that currently greps IFACE_DATA to parse the JSON (e.g., with jq
or python) and check the primary address for the selected IFACE_NAME to see if
its "dynamic" flag is true (use the same IFACE_NAME/IFACE_DATA variables and
keep BRIDGE_NAME, MAC_ADDR, TARGET_MTU references), and for the static branch
ensure you validate GATEWAY (the GATEWAY=$(ip route show dev "$IFACE_NAME" | awk
'/default via/ {print $3}') assignment) before embedding it in the nmstate YAML:
if GATEWAY is empty or not a valid IP, either omit the gateway section from the
generated config or exit with a clear error; apply these changes where the
dynamic check and GATEWAY usage occur so nmstatectl apply does not receive
invalid YAML.
In `@scripts/worker.sh`:
- Around line 63-83: The DNS and prefix-length inputs are not being fully
validated: change validation in the WORKER_STATIC_IP branch so WORKER_${i}_DNS
is split into individual addresses (split on whitespace, commas, and newlines)
and each token is checked with is_valid_ip (instead of treating the whole scalar
as one token), and validate WORKER_${i}_PL (the variable referenced as pl in the
block) is non-empty, an integer, and within a valid range for the IP family
(0–32 for IPv4, 0–128 for IPv6 determined from ipaddr/int) before proceeding;
update the for-loop that currently iterates over $ipaddr $gw $dns to iterate
over ipaddr, gw and each dns token, and add a numeric range check for pl
alongside invoking is_valid_ip for each DNS token.
---
Outside diff comments:
In `@manifests/cluster-installation/99-worker-bridge.yaml`:
- Around line 52-63: The p0-routing.service unit can fail permanently because it
doesn't wait for the bridge and has no restart policy; update the unit for
p0-routing.service to depend on and start after nmstate-bridge.service (add
After=nmstate-bridge.service and optionally Wants=nmstate-bridge.service) and
add a restart policy (e.g., Restart=on-failure and RestartSec=30) so
ExecStart=/usr/local/bin/configure-p0-routing.sh will be retried if it runs
before the bridge is ready.
---
Nitpick comments:
In `@manifests/cluster-installation/99-worker-bridge.yaml`:
- Around line 14-19: The manifest embeds full scripts as inline base64 (the
contents: source: data:text/plain;charset=utf-8;base64:... entries that write
/usr/local/bin/apply-nmstate-bridge.sh and the OVN wait script), which makes
auditing hard; add a build step that generates these payloads from
configuration_templates/*.sh (encode to the same data:text/plain;base64 form or
produce plain file assets) and update 99-worker-bridge.yaml to reference the
generated payloads instead of the long inline base64 strings (replace the two
contents.source base64 blobs for apply-nmstate-bridge.sh and the OVN wait script
with the build-produced values or file references), ensuring the manifest still
writes the same target paths (/usr/local/bin/apply-nmstate-bridge.sh and the OVN
wait script) and preserving mode/overwrite fields.
In `@manifests/worker-provisioning/baremetalhost-static-ip.yaml`:
- Line 16: The manifest currently hardcodes disableCertificateVerification:
true; change this to a parameterized/template-driven value (e.g., replace the
literal with a configurable variable such that the default is false/secure and
operators must explicitly opt-in to disable certificate verification for
lab/self-signed BMCs). Update the template and/or values schema to expose
disableCertificateVerification (default false) and ensure any rendering paths
(helm/templating code that produces the resource) read that variable so
environments can opt-out explicitly rather than having TLS verification disabled
by default.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 97c909fd-183b-41db-b703-88e8f5651a06
📒 Files selected for processing (9)
README.mdci/env.defaultsci/env.templateconfiguration_templates/apply-nmstate-bridge.shdocs/user-guide/worker-provisioning.mdmanifests/cluster-installation/99-worker-bridge.yamlmanifests/worker-provisioning/baremetalhost-static-ip.yamlmanifests/worker-provisioning/bmc-secret-static-ip.yamlscripts/worker.sh
| if [[ "${WORKER_STATIC_IP}" == "true" ]]; then | ||
| for var in WORKER_${i}_INT WORKER_${i}_IPADDR WORKER_${i}_GW WORKER_${i}_PL WORKER_${i}_DNS; do | ||
| if [[ -z "${!var}" ]]; then | ||
| log "ERROR" "WORKER_STATIC_IP is enabled but $var is not set" | ||
| return 1 | ||
| fi | ||
| done | ||
| local ipaddr_var="WORKER_${i}_IPADDR"; local ipaddr="${!ipaddr_var}" | ||
| local gw_var="WORKER_${i}_GW"; local gw="${!gw_var}" | ||
| local int_var="WORKER_${i}_INT"; local int="${!int_var}" | ||
| local pl_var="WORKER_${i}_PL"; local pl="${!pl_var}" | ||
| local dns_var="WORKER_${i}_DNS"; local dns="${!dns_var}" | ||
|
|
||
| #TBD dns array | ||
| for ip in $ipaddr $gw $dns; do | ||
| if ! is_valid_ip "$ip"; then | ||
| log "ERROR" "ip not valid: $ip " | ||
| return 1 | ||
| fi | ||
| done | ||
| fi |
There was a problem hiding this comment.
Fail fast on DNS and prefix-length input.
The current DNS check validates whitespace-separated values one token at a time, but the static secret still renders WORKER_${i}_DNS into a single <DNS_IP> scalar. WORKER_${i}_PL is also accepted verbatim. Both cases can produce network data that only fails later when NMState consumes it.
Proposed fix
if [[ "${WORKER_STATIC_IP}" == "true" ]]; then
for var in WORKER_${i}_INT WORKER_${i}_IPADDR WORKER_${i}_GW WORKER_${i}_PL WORKER_${i}_DNS; do
if [[ -z "${!var}" ]]; then
log "ERROR" "WORKER_STATIC_IP is enabled but $var is not set"
return 1
fi
done
local ipaddr_var="WORKER_${i}_IPADDR"; local ipaddr="${!ipaddr_var}"
local gw_var="WORKER_${i}_GW"; local gw="${!gw_var}"
local int_var="WORKER_${i}_INT"; local int="${!int_var}"
local pl_var="WORKER_${i}_PL"; local pl="${!pl_var}"
local dns_var="WORKER_${i}_DNS"; local dns="${!dns_var}"
- `#TBD` dns array
- for ip in $ipaddr $gw $dns; do
+ for ip in "$ipaddr" "$gw"; do
if ! is_valid_ip "$ip"; then
log "ERROR" "ip not valid: $ip "
return 1
fi
done
+
+ if ! is_valid_ip "$dns"; then
+ log "ERROR" "WORKER_${i}_DNS must be a single IPv4 address until DNS-list support is added"
+ return 1
+ fi
+
+ if ! [[ "$pl" =~ ^[0-9]+$ ]] || (( pl < 0 || pl > 32 )); then
+ log "ERROR" "WORKER_${i}_PL must be an integer between 0 and 32"
+ return 1
+ fi
fiAlso applies to: 104-114
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/worker.sh` around lines 63 - 83, The DNS and prefix-length inputs are
not being fully validated: change validation in the WORKER_STATIC_IP branch so
WORKER_${i}_DNS is split into individual addresses (split on whitespace, commas,
and newlines) and each token is checked with is_valid_ip (instead of treating
the whole scalar as one token), and validate WORKER_${i}_PL (the variable
referenced as pl in the block) is non-empty, an integer, and within a valid
range for the IP family (0–32 for IPv4, 0–128 for IPv6 determined from
ipaddr/int) before proceeding; update the for-loop that currently iterates over
$ipaddr $gw $dns to iterate over ipaddr, gw and each dns token, and add a
numeric range check for pl alongside invoking is_valid_ip for each DNS token.
bdfa4f1 to
480136f
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (3)
scripts/worker.sh (1)
79-85:⚠️ Potential issue | 🟠 MajorDNS and prefix-length validation is incomplete.
The
#TBD dns arraycomment indicates awareness that DNS handling needs work. Currently:
- DNS validation relies on word-splitting (
$dnsunquoted), but the template (bmc-secret-static-ip.yaml) expects a single<DNS_IP>value$pl(prefix-length) is not validated as an integer within valid range (0-32 for IPv4)Until multi-DNS support is implemented, validate that
dnsis a single valid IP and thatplis a valid integer.🐛 Proposed fix
`#TBD` dns array - for ip in $ipaddr $gw $dns; do + for ip in "$ipaddr" "$gw"; do if ! is_valid_ip "$ip"; then log "ERROR" "ip not valid: $ip " return 1 fi done + + if ! is_valid_ip "$dns"; then + log "ERROR" "WORKER_${i}_DNS must be a single valid IP address" + return 1 + fi + + if ! [[ "$pl" =~ ^[0-9]+$ ]] || (( pl < 1 || pl > 32 )); then + log "ERROR" "WORKER_${i}_PL must be an integer between 1 and 32" + return 1 + fi fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/worker.sh` around lines 79 - 85, The current validation loop incorrectly word-splits $dns and doesn't validate prefix length; change validation to explicitly check ipaddr, gw, and dns as single values (use is_valid_ip on each variable individually and ensure dns contains no whitespace so it's a single IP) and add a new check for pl to ensure it's an integer between 0 and 32 (log an "ERROR" and return 1 on failure). Locate and update the block referencing ipaddr, gw, dns and the function is_valid_ip to perform separate validations, and add the pl integer/range check before returning success.configuration_templates/apply-nmstate-bridge.sh (2)
68-101:⚠️ Potential issue | 🟠 MajorMissing gateway validation before YAML generation.
GATEWAYis extracted at line 70 but used at line 99 without checking if it's empty or valid. If no default route exists on$IFACE_NAME, the generated NMState config will contain an emptynext-hop-address, causingnmstatectl applyto fail.🐛 Proposed fix to validate gateway
echo "INFO: Detected STATIC IP. Generating Static bridge config..." GATEWAY=$(ip route show dev "$IFACE_NAME" | awk '/default via/ {print $3}') + if [[ -z "$GATEWAY" ]]; then + echo "ERROR: Could not determine default gateway on $IFACE_NAME" >&2 + exit 1 + fi cat <<EOF > /tmp/br-dpu-config.yml🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@configuration_templates/apply-nmstate-bridge.sh` around lines 68 - 101, The script extracts GATEWAY with GATEWAY=$(ip route show dev "$IFACE_NAME" | awk '/default via/ {print $3}') but does not validate it before embedding it into the generated /tmp/br-dpu-config.yml (used in the routes -> next-hop-address field), which can produce an empty next-hop and break nmstatectl apply; add a validation immediately after the extraction that checks if GATEWAY is non-empty and a plausible IP (e.g. non-empty and matches basic IPv4 pattern), and if invalid either emit a clear error message including $IFACE_NAME and exit non-zero, or alter the YAML generation to omit the routes section when no valid gateway is found so nmstatectl apply won’t receive an empty next-hop-address.
44-44:⚠️ Potential issue | 🟠 MajorDHCP detection not scoped to the primary IP address.
The
grep -q '"dynamic":true'check examines the entire interface JSON, not just the selected IP's entry. On dual-stack or multi-address interfaces where one address is dynamic (e.g., IPv6 SLAAC) and another is static, this can misclassify the primary IP. Usejqto check thedynamicflag for the specific$IP_ADDRentry.🐛 Proposed fix to scope check to primary IP
-if echo "$IFACE_DATA" | grep -q '"dynamic":true'; then +IP_DYNAMIC=$(echo "$IFACE_DATA" | jq -r --arg ip "$IP_ADDR" ' + .addr_info[] | select(.local==$ip) | (.dynamic // false) +' | head -n1) + +if [[ "$IP_DYNAMIC" == "true" ]]; then🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@configuration_templates/apply-nmstate-bridge.sh` at line 44, The current DHCP check scans the whole IFACE_DATA with grep which can misclassify when other addresses are dynamic; update the check to parse IFACE_DATA with jq and select the address object matching the IP_ADDR (compare .address or appropriate field) and test its .dynamic flag (e.g., evaluate if that specific address has "dynamic": true), then branch based on that result; modify the conditional around IFACE_DATA/IFACE_DATA check so it queries the specific address entry for IP_ADDR rather than grepping the entire JSON.
🧹 Nitpick comments (2)
scripts/worker.sh (1)
132-132: Inconsistent indentation: tab vs spaces.Line 132 uses a tab character while the rest of the file uses spaces for indentation.
♻️ Proposed fix
- retry 5 10 apply_manifest "${WORKER_GENERATED_DIR}/${name}-bmc-secret.yaml" false + retry 5 10 apply_manifest "${WORKER_GENERATED_DIR}/${name}-bmc-secret.yaml" false🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/worker.sh` at line 132, The line calling retry 5 10 apply_manifest "${WORKER_GENERATED_DIR}/${name}-bmc-secret.yaml" false uses a tab for indentation while the rest of scripts/worker.sh uses spaces; replace the leading tab with the same number of spaces used elsewhere (match the file's indentation style) so the call to retry/apply_manifest and variables WORKER_GENERATED_DIR and name are consistently indented.configuration_templates/apply-nmstate-bridge.sh (1)
15-22: Whitespace handling inconsistency with related script.The
get_ip_from_ip_hint_file()function reads the IP hint file with barecat, but the related scriptconfiguration_templates/configure-pod-to-worker-routing.sh(lines 15-16) strips whitespace usingtr -d '[:space:]'. Trailing newlines or spaces in the hint file could cause IP lookup failures.♻️ Proposed fix to strip whitespace
get_ip_from_ip_hint_file() { local ip_hint_file="$1" if [[ ! -f "${ip_hint_file}" ]]; then echo "ERROR: IP Hint file not found at $ip_hint_file" >&2 exit 1 fi - cat "${ip_hint_file}" + cat "${ip_hint_file}" | tr -d '[:space:]' }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@configuration_templates/apply-nmstate-bridge.sh` around lines 15 - 22, The get_ip_from_ip_hint_file function currently returns the file contents verbatim which can include trailing newlines/spaces and break IP lookups; update get_ip_from_ip_hint_file to trim all whitespace (e.g., pipe the file through tr -d '[:space:]' or equivalent) before returning so its behavior matches the whitespace-stripping used in configure-pod-to-worker-routing.sh; ensure the function name get_ip_from_ip_hint_file is the one modified and that callers receive the cleaned IP string.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@manifests/cluster-installation/99-worker-bridge.yaml`:
- Line 33: Replace the hardcoded MTU value in the systemd ExecStart for
apply-nmstate-bridge by changing the literal "9000" to the template placeholder
"<NODES_MTU>" so the manifests.sh templating can substitute the real MTU;
specifically update the ExecStart line that calls
/usr/local/bin/apply-nmstate-bridge.sh to pass "<NODES_MTU>" instead of "9000".
In `@manifests/worker-provisioning/bmc-secret-static-ip.yaml`:
- Around line 17-19: Add a numeric-only validation for the WORKER_${i}_PL
variable inside the existing validation loop in worker.sh: in the loop that
checks worker vars (the block referencing WORKER_${i}_IP and WORKER_${i}_PL),
test that the value of WORKER_${i}_PL matches a numeric regex (e.g., '^[0-9]+$')
and if not, print a clear error and exit non‑zero; ensure you reference the same
variable name WORKER_${i}_PL and run this check before proceeding to
sed/template substitution so that invalid prefix-lengths are caught early.
---
Duplicate comments:
In `@configuration_templates/apply-nmstate-bridge.sh`:
- Around line 68-101: The script extracts GATEWAY with GATEWAY=$(ip route show
dev "$IFACE_NAME" | awk '/default via/ {print $3}') but does not validate it
before embedding it into the generated /tmp/br-dpu-config.yml (used in the
routes -> next-hop-address field), which can produce an empty next-hop and break
nmstatectl apply; add a validation immediately after the extraction that checks
if GATEWAY is non-empty and a plausible IP (e.g. non-empty and matches basic
IPv4 pattern), and if invalid either emit a clear error message including
$IFACE_NAME and exit non-zero, or alter the YAML generation to omit the routes
section when no valid gateway is found so nmstatectl apply won’t receive an
empty next-hop-address.
- Line 44: The current DHCP check scans the whole IFACE_DATA with grep which can
misclassify when other addresses are dynamic; update the check to parse
IFACE_DATA with jq and select the address object matching the IP_ADDR (compare
.address or appropriate field) and test its .dynamic flag (e.g., evaluate if
that specific address has "dynamic": true), then branch based on that result;
modify the conditional around IFACE_DATA/IFACE_DATA check so it queries the
specific address entry for IP_ADDR rather than grepping the entire JSON.
In `@scripts/worker.sh`:
- Around line 79-85: The current validation loop incorrectly word-splits $dns
and doesn't validate prefix length; change validation to explicitly check
ipaddr, gw, and dns as single values (use is_valid_ip on each variable
individually and ensure dns contains no whitespace so it's a single IP) and add
a new check for pl to ensure it's an integer between 0 and 32 (log an "ERROR"
and return 1 on failure). Locate and update the block referencing ipaddr, gw,
dns and the function is_valid_ip to perform separate validations, and add the pl
integer/range check before returning success.
---
Nitpick comments:
In `@configuration_templates/apply-nmstate-bridge.sh`:
- Around line 15-22: The get_ip_from_ip_hint_file function currently returns the
file contents verbatim which can include trailing newlines/spaces and break IP
lookups; update get_ip_from_ip_hint_file to trim all whitespace (e.g., pipe the
file through tr -d '[:space:]' or equivalent) before returning so its behavior
matches the whitespace-stripping used in configure-pod-to-worker-routing.sh;
ensure the function name get_ip_from_ip_hint_file is the one modified and that
callers receive the cleaned IP string.
In `@scripts/worker.sh`:
- Line 132: The line calling retry 5 10 apply_manifest
"${WORKER_GENERATED_DIR}/${name}-bmc-secret.yaml" false uses a tab for
indentation while the rest of scripts/worker.sh uses spaces; replace the leading
tab with the same number of spaces used elsewhere (match the file's indentation
style) so the call to retry/apply_manifest and variables WORKER_GENERATED_DIR
and name are consistently indented.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 67c9d889-f40d-4def-9dd8-7e73ab25f5fa
📒 Files selected for processing (9)
README.mdci/env.defaultsci/env.templateconfiguration_templates/apply-nmstate-bridge.shdocs/user-guide/worker-provisioning.mdmanifests/cluster-installation/99-worker-bridge.yamlmanifests/worker-provisioning/baremetalhost-static-ip.yamlmanifests/worker-provisioning/bmc-secret-static-ip.yamlscripts/worker.sh
🚧 Files skipped from review as they are similar to previous changes (4)
- README.md
- ci/env.defaults
- docs/user-guide/worker-provisioning.md
- ci/env.template
| address: | ||
| - ip: <IPADDR> | ||
| prefix-length: <PL> |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if NMState documentation confirms prefix-length must be integer
# and verify how process_template handles numeric substitution
rg -n 'prefix-length' --type yaml -C2Repository: rh-ecosystem-edge/openshift-dpf
Length of output: 475
🏁 Script executed:
# Find the process_template function to understand placeholder substitution
rg -n 'process_template' --type sh -B2 -A10Repository: rh-ecosystem-edge/openshift-dpf
Length of output: 10257
🏁 Script executed:
# Search for WORKER_*_PL variable definition in worker.sh
rg -n 'PL=' . --type sh -B2 -A2Repository: rh-ecosystem-edge/openshift-dpf
Length of output: 57
🏁 Script executed:
# Check for NMState schema or documentation references in the repository
fd -e md -e txt -e yaml | xargs grep -l 'nmstate\|prefix-length' 2>/dev/null | head -20Repository: rh-ecosystem-edge/openshift-dpf
Length of output: 183
🏁 Script executed:
# Look at the actual structure of bmc-secret-static-ip.yaml to understand full context
cat -n manifests/worker-provisioning/bmc-secret-static-ip.yamlRepository: rh-ecosystem-edge/openshift-dpf
Length of output: 869
🏁 Script executed:
# Search for where pl variable is defined or extracted in worker.sh
rg -n '\bpl\b|WORKER_.*PL' scripts/worker.sh -B3 -A1Repository: rh-ecosystem-edge/openshift-dpf
Length of output: 2854
🏁 Script executed:
# Get the full process_template function implementation
sed -n '280,320p' scripts/utils.shRepository: rh-ecosystem-edge/openshift-dpf
Length of output: 1336
🏁 Script executed:
# Check if there are any NMState schema or documentation references in the repo
find . -type f \( -name "*.md" -o -name "README*" \) | xargs grep -l -i nmstate 2>/dev/nullRepository: rh-ecosystem-edge/openshift-dpf
Length of output: 115
🏁 Script executed:
# Search for validation or comment about prefix-length/PL in worker.sh
rg -n 'prefix.*length|PL' scripts/worker.sh -B2 -A2Repository: rh-ecosystem-edge/openshift-dpf
Length of output: 3006
🏁 Script executed:
# Find the escape_sed_replacement function implementation
rg -n 'escape_sed_replacement' scripts/utils.sh -A10Repository: rh-ecosystem-edge/openshift-dpf
Length of output: 1051
🏁 Script executed:
# Check the worker provisioning documentation for any NMState requirements
grep -n -i 'prefix\|nmstate\|static.*ip' docs/user-guide/worker-provisioning.md | head -30Repository: rh-ecosystem-edge/openshift-dpf
Length of output: 822
🏁 Script executed:
# Search for any input validation of WORKER_${i}_PL variable
rg -n 'WORKER_.*PL\|prefix.*length' . --type sh -B5 -A2Repository: rh-ecosystem-edge/openshift-dpf
Length of output: 57
🏁 Script executed:
# Check if there are any NMState schema files or validation in the repo
find . -type f \( -name "*nmstate*" -o -name "*network*" \) | grep -E '\.(yaml|yml|json|schema)$'Repository: rh-ecosystem-edge/openshift-dpf
Length of output: 181
🏁 Script executed:
# Look for any validation or error handling related to nmstate configuration
rg -n 'nmstate\|prefix-length' . --type yaml -C3Repository: rh-ecosystem-edge/openshift-dpf
Length of output: 57
🏁 Script executed:
# Check if YAML validation is performed before applying manifests
rg -n 'apply_manifest\|oc apply\|kubectl apply' scripts/utils.sh -A5 -B2 | head -40Repository: rh-ecosystem-edge/openshift-dpf
Length of output: 57
🏁 Script executed:
# Search NMState documentation or schema references online is not available
# Let me check if there's validation at the Kubernetes/NMState application stage
rg -n 'apply_manifest\|validate' scripts/utils.sh -B2 -A8 | head -50Repository: rh-ecosystem-edge/openshift-dpf
Length of output: 57
🏁 Script executed:
# Check if there are examples in configuration_templates with numeric PL values
cat configuration_templates/nmstate_dhcp_worker_bridge.yamlRepository: rh-ecosystem-edge/openshift-dpf
Length of output: 391
🏁 Script executed:
# Final check: search for any numeric validation patterns in the codebase
rg -n '\[0-9\]|\binteger\b|[^[:digit:]]PL' scripts/worker.sh -C2Repository: rh-ecosystem-edge/openshift-dpf
Length of output: 3006
Add numeric validation for WORKER_${i}_PL variable in worker.sh.
The template structure and sed substitution are correct—<PL> is properly replaced with a bare integer value in the final YAML. However, there is no input validation to ensure WORKER_${i}_PL actually contains numeric characters. If a user provides an invalid value (e.g., WORKER_1_PL="abc"), the generated manifest will contain invalid NMState configuration. Add a check in the validation loop (around line 67–69) to verify that WORKER_${i}_PL matches a numeric pattern (e.g., [0-9]+).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@manifests/worker-provisioning/bmc-secret-static-ip.yaml` around lines 17 -
19, Add a numeric-only validation for the WORKER_${i}_PL variable inside the
existing validation loop in worker.sh: in the loop that checks worker vars (the
block referencing WORKER_${i}_IP and WORKER_${i}_PL), test that the value of
WORKER_${i}_PL matches a numeric regex (e.g., '^[0-9]+$') and if not, print a
clear error and exit non‑zero; ensure you reference the same variable name
WORKER_${i}_PL and run this check before proceeding to sed/template substitution
so that invalid prefix-lengths are caught early.
480136f to
65c6e80
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (2)
scripts/worker.sh (1)
66-86:⚠️ Potential issue | 🟠 MajorReject unsupported DNS/prefix inputs before templating.
WORKER_${i}_DNSis validated token-by-token here, butmanifests/worker-provisioning/bmc-secret-static-ip.yamlstill renders it into a single<DNS_IP>scalar. A value like8.8.8.8 1.1.1.1passes this loop and only fails later when NMState consumes the generated secret.WORKER_${i}_PLis also never checked to be numeric and in range for the current IPv4 template.Suggested fix
- `#TBD` dns array - for ip in $ipaddr $gw $dns; do + if ! [[ "$pl" =~ ^[0-9]+$ ]] || (( pl < 0 || pl > 32 )); then + log "ERROR" "WORKER_${i}_PL must be an integer between 0 and 32" + return 1 + fi + + for ip in "$ipaddr" "$gw"; do if ! is_valid_ip "$ip"; then log "ERROR" "ip not valid: $ip " return 1 fi done + + if [[ "$dns" =~ [[:space:],] ]]; then + log "ERROR" "WORKER_${i}_DNS must be a single IP until multi-DNS rendering is implemented" + return 1 + fi + + if ! is_valid_ip "$dns"; then + log "ERROR" "ip not valid: $dns " + return 1 + fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/worker.sh` around lines 66 - 86, The script currently treats WORKER_${i}_DNS as a single scalar and doesn't validate WORKER_${i}_PL; fix by validating DNS and prefix before templating: split WORKER_${i}_DNS into whitespace-separated tokens and call is_valid_ip on each token (use the same validation used for ipaddr/gw), and validate WORKER_${i}_PL is a numeric integer within the IPv4 prefix range (0-32) before proceeding; update the validation block around the variables ipaddr_var/ipaddr, gw_var/gw, dns_var/dns and the loop that calls is_valid_ip, and add a numeric/range check for pl_var/pl that logs an ERROR and returns 1 on failure.configuration_templates/apply-nmstate-bridge.sh (1)
44-70:⚠️ Potential issue | 🟠 MajorScope DHCP detection to the selected address and fail on missing gateway.
grep '"dynamic":true'matches any address on the interface, so a static primary IPv4 plus SLAAC/DHCPv6 is misclassified as DHCP. The static branch also writes$GATEWAYinto the NMState YAML without checking thatip routeactually found one. Please keep the embedded copy inmanifests/cluster-installation/99-worker-bridge.yamlin sync.Suggested fix
-if echo "$IFACE_DATA" | grep -q '"dynamic":true'; then +IP_DYNAMIC=$(echo "$IFACE_DATA" | jq -r --arg ip "$IP_ADDR" ' + .addr_info[] | select(.local == $ip) | (.dynamic // false) +' | head -n1) + +if [[ "$IP_DYNAMIC" == "true" ]]; then echo "INFO: Detected DHCP (dynamic) IP. Generating DHCP bridge config..." cat <<EOF > /tmp/br-dpu-config.yml @@ else echo "INFO: Detected STATIC IP. Generating Static bridge config..." GATEWAY=$(ip route show dev "$IFACE_NAME" | awk '/default via/ {print $3}') + if [[ -z "$GATEWAY" ]]; then + echo "ERROR: Could not determine default gateway on $IFACE_NAME" >&2 + exit 1 + fi cat <<EOF > /tmp/br-dpu-config.yml🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@configuration_templates/apply-nmstate-bridge.sh` around lines 44 - 70, The DHCP detection is currently too broad (using grep on IFACE_DATA) and the static branch proceeds without verifying a gateway; change the logic to inspect the specific address entry in IFACE_DATA (not any address) to determine if that selected address has "dynamic":true (e.g., parse the JSON for the chosen/primary address) and only then generate the DHCP YAML; in the static branch, require that GATEWAY (obtained from ip route show dev "$IFACE_NAME") is non-empty and error/exit if it's missing instead of writing an empty gateway into the NMState YAML; update references in the IFACE_DATA check and the GATEWAY usage so the generated manifest stays in sync with the embedded copy in manifests/cluster-installation/99-worker-bridge.yaml.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@configuration_templates/apply-nmstate-bridge.sh`:
- Around line 8-13: The current validate_bridge_exists() function exits early
when BRIDGE_NAME exists, which masks drift (wrong port, IP, route, or MTU) and
prevents nmstate-bridge.service from reconciling; change validate_bridge_exists
to perform idempotent verification instead of immediate success: inspect the
bridge's slaves, IP address, routes and MTU (using ip link, ip addr, ip route)
and only return success if all expected values match, otherwise either repair
the configuration (remove/recreate the bridge or correct the mismatches) or
return failure so the oneshot nmstate-bridge.service can apply the proper
configuration; keep the function name validate_bridge_exists and use BRIDGE_NAME
and expected-port/expected-ip/expected-mtu symbols from the script to locate
checks to add.
In `@scripts/worker.sh`:
- Around line 128-131: The generated static-IP Secret may be skipped on retries
because apply_manifest is called with the no-overwrite flag (false); update the
retry invocation in the WORKER_STATIC_IP block to force re-apply so regenerated
files replace existing Secrets — change the call to retry 5 10 apply_manifest
"${WORKER_GENERATED_DIR}/${name}-bmc-secret-static-ip.yaml" true (apply_manifest
in scripts/utils.sh handles the flag).
---
Duplicate comments:
In `@configuration_templates/apply-nmstate-bridge.sh`:
- Around line 44-70: The DHCP detection is currently too broad (using grep on
IFACE_DATA) and the static branch proceeds without verifying a gateway; change
the logic to inspect the specific address entry in IFACE_DATA (not any address)
to determine if that selected address has "dynamic":true (e.g., parse the JSON
for the chosen/primary address) and only then generate the DHCP YAML; in the
static branch, require that GATEWAY (obtained from ip route show dev
"$IFACE_NAME") is non-empty and error/exit if it's missing instead of writing an
empty gateway into the NMState YAML; update references in the IFACE_DATA check
and the GATEWAY usage so the generated manifest stays in sync with the embedded
copy in manifests/cluster-installation/99-worker-bridge.yaml.
In `@scripts/worker.sh`:
- Around line 66-86: The script currently treats WORKER_${i}_DNS as a single
scalar and doesn't validate WORKER_${i}_PL; fix by validating DNS and prefix
before templating: split WORKER_${i}_DNS into whitespace-separated tokens and
call is_valid_ip on each token (use the same validation used for ipaddr/gw), and
validate WORKER_${i}_PL is a numeric integer within the IPv4 prefix range (0-32)
before proceeding; update the validation block around the variables
ipaddr_var/ipaddr, gw_var/gw, dns_var/dns and the loop that calls is_valid_ip,
and add a numeric/range check for pl_var/pl that logs an ERROR and returns 1 on
failure.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 30af0c67-f007-4570-b9c0-5968ef717d33
📒 Files selected for processing (9)
README.mdci/env.defaultsci/env.templateconfiguration_templates/apply-nmstate-bridge.shdocs/user-guide/worker-provisioning.mdmanifests/cluster-installation/99-worker-bridge.yamlmanifests/worker-provisioning/baremetalhost-static-ip.yamlmanifests/worker-provisioning/bmc-secret-static-ip.yamlscripts/worker.sh
🚧 Files skipped from review as they are similar to previous changes (4)
- manifests/worker-provisioning/baremetalhost-static-ip.yaml
- manifests/worker-provisioning/bmc-secret-static-ip.yaml
- ci/env.template
- README.md
| validate_bridge_exists() { | ||
| if ip link show "$BRIDGE_NAME" &> /dev/null; then | ||
| echo "INFO: Bridge '$BRIDGE_NAME' already exists. Configuration assumed complete." | ||
| exit 0 | ||
| fi | ||
| } |
There was a problem hiding this comment.
Don't treat a pre-existing br-dpu link as success.
This exits as soon as the link exists, but a failed previous run can leave br-dpu present with the wrong port, IP, route, or MTU. Because nmstate-bridge.service is oneshot, the node never reconciles that drift after this early success path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@configuration_templates/apply-nmstate-bridge.sh` around lines 8 - 13, The
current validate_bridge_exists() function exits early when BRIDGE_NAME exists,
which masks drift (wrong port, IP, route, or MTU) and prevents
nmstate-bridge.service from reconciling; change validate_bridge_exists to
perform idempotent verification instead of immediate success: inspect the
bridge's slaves, IP address, routes and MTU (using ip link, ip addr, ip route)
and only return success if all expected values match, otherwise either repair
the configuration (remove/recreate the bridge or correct the mismatches) or
return failure so the oneshot nmstate-bridge.service can apply the proper
configuration; keep the function name validate_bridge_exists and use BRIDGE_NAME
and expected-port/expected-ip/expected-mtu symbols from the script to locate
checks to add.
| # Apply manifests | ||
| if [[ "${WORKER_STATIC_IP}" == "true" ]]; then | ||
| retry 5 10 apply_manifest "${WORKER_GENERATED_DIR}/${name}-bmc-secret-static-ip.yaml" false | ||
| fi |
There was a problem hiding this comment.
Re-apply generated static network secrets on retries.
apply_manifest skips existing resources when the second argument is false (scripts/utils.sh:224-246). If a run creates ${name}-network-config and fails later, the next run regenerates the file but never updates the Secret, so the BMH can still boot with stale static-IP data.
Suggested fix
- retry 5 10 apply_manifest "${WORKER_GENERATED_DIR}/${name}-bmc-secret-static-ip.yaml" false
+ retry 5 10 apply_manifest "${WORKER_GENERATED_DIR}/${name}-bmc-secret-static-ip.yaml" true📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Apply manifests | |
| if [[ "${WORKER_STATIC_IP}" == "true" ]]; then | |
| retry 5 10 apply_manifest "${WORKER_GENERATED_DIR}/${name}-bmc-secret-static-ip.yaml" false | |
| fi | |
| # Apply manifests | |
| if [[ "${WORKER_STATIC_IP}" == "true" ]]; then | |
| retry 5 10 apply_manifest "${WORKER_GENERATED_DIR}/${name}-bmc-secret-static-ip.yaml" true | |
| fi |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/worker.sh` around lines 128 - 131, The generated static-IP Secret may
be skipped on retries because apply_manifest is called with the no-overwrite
flag (false); update the retry invocation in the WORKER_STATIC_IP block to force
re-apply so regenerated files replace existing Secrets — change the call to
retry 5 10 apply_manifest
"${WORKER_GENERATED_DIR}/${name}-bmc-secret-static-ip.yaml" true (apply_manifest
in scripts/utils.sh handles the flag).
| ipv4: | ||
| enabled: true |
There was a problem hiding this comment.
and what about ipv6? Do we want to add support later?
There was a problem hiding this comment.
we can add support easily but needs our lab to support it first .
| NODES_MTU=${1:-$DEFAULT_MTU} | ||
| BRIDGE_NAME="br-dpu" | ||
| IP_HINT_FILE="/run/nodeip-configuration/primary-ip" | ||
| TARGET_MTU="${1:-1500}" |
There was a problem hiding this comment.
Maybe lets take mtu from interface ? always?
There was a problem hiding this comment.
but in dhcp mode this is the 1st time we really use etu env var .
in static we use user mtu from IPA to CoreOS
|
PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Summary by CodeRabbit
New Features
Documentation
Configuration
Provisioning
Manifests