⚠️ Improve priority and validation of input variables consumed in wait_for_interface_or_ip #788

Open
mchiappero wants to merge 2 commits into metal3-io:main from mchiappero:waitforiforip-improvements

Conversation

@mchiappero
Contributor

@mchiappero mchiappero commented Nov 14, 2025

What this PR does / why we need it:
The current logic in wait_for_interface_or_ip evaluates the input IP addresses or interfaces in the following order: PROVISIONING_IP, IRONIC_IP, PROVISIONING_INTERFACE. This PR introduces the following improvements:

  • make the test case for IRONIC_IP explicit rather than implicit, improving code readability before the following change
  • "group" PROVISIONING_IP and PROVISIONING_INTERFACE in terms of priority by prioritizing IRONIC_IP instead, resulting in the following order: IRONIC_IP, PROVISIONING_IP, PROVISIONING_INTERFACE
  • include input validation for the externally provided IRONIC_IP
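
As an illustration of the reordering described above, the following minimal sketch compares the two evaluation orders. It is not the code from this PR: `first_set_variable` is a hypothetical helper, and the addresses are arbitrary example values.

```shell
# Hypothetical helper: print the name of the first non-empty variable
# among the arguments, which are given in priority order.
first_set_variable() {
    for name in "$@"; do
        # Indirect lookup of the variable called "$name" (names are fixed
        # by the caller, so eval is not fed untrusted input here).
        eval "value=\${${name}:-}"
        if [ -n "$value" ]; then
            echo "$name"
            return 0
        fi
    done
    return 1
}

# Example inputs: both addresses are set at the same time.
IRONIC_IP=192.0.2.10
PROVISIONING_IP=172.22.0.2

# Evaluation order before this PR.
first_set_variable PROVISIONING_IP IRONIC_IP PROVISIONING_INTERFACE
# Evaluation order proposed by this PR.
first_set_variable IRONIC_IP PROVISIONING_IP PROVISIONING_INTERFACE
```

With both addresses set, the first call prints PROVISIONING_IP and the second prints IRONIC_IP, which is the behavioural difference the reordering introduces.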

What the PR does not address is logging a warning when multiple input variables are set and one inevitably takes precedence over the other(s).

NOTE: based on PR #787

Checklist:

  • Documentation has been updated, if necessary.
  • Integration tests have been added, if necessary.

@metal3-io-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign rozzii for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@metal3-io-bot metal3-io-bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Nov 14, 2025
@metal3-io-bot
Contributor

Hi @mchiappero. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@metal3-io-bot metal3-io-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 14, 2025
@mchiappero mchiappero force-pushed the waitforiforip-improvements branch 3 times, most recently from 85381e6 to 4b626bc Compare November 14, 2025 17:37
Member

@tuminoid tuminoid left a comment

Some minor nitting, and also please squash the commits.

As for the actual logic, I need another look with fresher eyes tomorrow.
/cc @Rozzii @dtantsur
PTAL.

@tuminoid
Member

/ok-to-test

@metal3-io-bot metal3-io-bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 16, 2025
@tuminoid
Member

Also, isn't this the same as #787 ?

@mchiappero
Contributor Author

Also, isn't this the same as #787 ?

The PRs are stacked. As far as I can see, it's not possible to set a different branch, but I should have mentioned that in the description. Let me fix it, along with the proposed changes (thanks for reviewing!).

@mchiappero
Contributor Author

Some minor nitting, and also please squash the commits.

As for the actual logic, I need another look with fresher eyes tomorrow. /cc @Rozzii @dtantsur PTAL.

Sure, but before squashing I'd like to make sure the change is fully understood. By the way, IRONIC_IP is not documented in https://github.com/metal3-io/ironic-image/blob/main/README.md, while it probably should be. Also, which variables take precedence is neither documented nor logged by the scripts. I guess this is maybe something to discuss at the weekly meeting. Let me know what you think!

@mchiappero mchiappero force-pushed the waitforiforip-improvements branch from 4b626bc to b26391c Compare November 17, 2025 20:00
@elfosardo
Member

/hold
let's hold on to this until we deal with #787

@metal3-io-bot metal3-io-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 18, 2025
@mchiappero mchiappero force-pushed the waitforiforip-improvements branch from b26391c to a3971ce Compare November 18, 2025 22:50
@metal3-io-bot
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues will close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@metal3-io-bot metal3-io-bot added lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-rebase Indicates that a PR cannot be merged because it has merge conflicts with HEAD. labels Feb 16, 2026
@Rozzii
Member

Rozzii commented Feb 19, 2026

/remove-lifecycle stale
Not stale, it is the second in a chain of PRs, it was waiting for the first PR to get merged.

@metal3-io-bot metal3-io-bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 19, 2026
How Ironic and related services are configured and bound to the system
interfaces fundamentally depends on three possible inputs:
- a provisioning IP (if PROVISIONING_IP is set), which, if found, is used to
  determine IRONIC_IP
- a provisioning interface (if PROVISIONING_INTERFACE or
  PROVISIONING_MACS is set), by setting IRONIC_IP to the first IP
  address found on such interface
- IRONIC_IP directly, whenever set

Depending on the variable(s) set, the function might search and possibly wait
for the requested interface or IP to appear.

Rework the test logic of the function, so that the three possible cases
are clearly listed and tested separately, by introducing an explicit
check for IRONIC_IP. This makes it possible to:
- validate the value of IRONIC_IP, when externally provided
- test PROVISIONING_INTERFACE separately, in order to improve readability
- validate at least one of the above three inputs is provided and fail
  gracefully otherwise

Note that 1) the order of evaluation is not changed; 2) for this to work,
no default value should be set in PROVISIONING_INTERFACE prior to the
execution of wait_for_interface_or_ip; thus remove "provisioning" from
get_provisioning_interface.

NOTE: a test for PROVISIONING_MACS could also be moved to
wait_for_interface_or_ip and get_provisioning_interface removed, but
this commit aims at minimizing change.

Signed-off-by: Marco Chiappero <marco.chiappero@suse.com>
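
One possible shape for the IRONIC_IP validation mentioned in the commit message above is sketched below. It is an assumption, not the check implemented in this commit: `is_valid_ipv4` is a hypothetical helper, it handles dotted-quad IPv4 only, and a real check would also need to accept IPv6 addresses wherever those are supported.

```shell
# Hypothetical helper: succeed only if $1 is a dotted-quad IPv4 address.
is_valid_ipv4() {
    ip=$1
    # Reject empty input, characters other than digits and dots,
    # leading or trailing dots, and empty octets.
    case "$ip" in
        ''|*[!0-9.]*|.*|*.|*..*) return 1 ;;
    esac
    # Split on dots into the positional parameters of this function.
    old_ifs=$IFS
    IFS=.
    set -- $ip
    IFS=$old_ifs
    [ "$#" -eq 4 ] || return 1
    for octet in "$@"; do
        # At most three digits, and a value no greater than 255.
        [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || return 1
    done
    return 0
}

# Example with an arbitrary documentation-range address.
candidate=192.0.2.10
if is_valid_ipv4 "$candidate"; then
    echo "valid: $candidate"
else
    echo "ERROR: invalid IRONIC_IP: $candidate" >&2
fi
```

Validating the value before it is used for binding means a malformed address is reported up front rather than surfacing later as a confusing service failure.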
Currently wait_for_interface_or_ip determines how to bind the services
onto the system by evaluating the input variables in the following order:
PROVISIONING_IP, IRONIC_IP, PROVISIONING_INTERFACE.

However, IRONIC_IP should likely be considered as overriding any
PROVISIONING_* value. Thus, make sure IRONIC_IP is evaluated at the
beginning of the chain.

Signed-off-by: Marco Chiappero <marco.chiappero@suse.com>
@mchiappero mchiappero force-pushed the waitforiforip-improvements branch from d61deb8 to 32b7266 Compare February 24, 2026 12:04
@mchiappero
Contributor Author

/retest

@tuminoid
Member

/test metal3-centos-e2e-integration-test-main metal3-ubuntu-e2e-integration-test-main

@tuminoid
Member

[2026-02-25T06:37:47.688Z]    - Waiting for task completion (up to 2400 seconds)  - Command: 'check_container httpd-infra'

[2026-02-25T07:17:57.752Z] FAIL - Container httpd-infra running

/retest

@Rozzii
Member

Rozzii commented Mar 2, 2026

/test metal3-centos-e2e-integration-test-main

@mchiappero
Contributor Author

Sorry, I didn't have much time to look into it, and I'm not sure I understand why it's failing; but it looks like it's some Ansible task:

11:09:08  + for name in ipa-downloader vbmc sushy-tools httpd-infra ipxe-builder registry
11:09:08  + sudo docker ps
11:09:08  + grep -w -q 'ipa-downloader$'
11:09:08  sudo: docker: command not found
11:09:08  + sudo docker ps --all
11:09:08  + grep -w -q 'ipa-downloader$'
11:09:08  sudo: docker: command not found
11:09:08  + for name in ipa-downloader vbmc sushy-tools httpd-infra ipxe-builder registry
11:09:08  + sudo docker ps
11:09:08  + grep -w -q 'vbmc$'
11:09:08  sudo: docker: command not found
[...]

What do you think?

@elfosardo
Member

/unhold

@metal3-io-bot metal3-io-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 5, 2026
@elfosardo
Member

/retest

let's give it another try

@metal3-io-bot
Contributor

@mchiappero: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: metal3-centos-e2e-integration-test-main; Commit: 32b7266; Required: true; Rerun command: /test metal3-centos-e2e-integration-test-main

@tuminoid
Member

tuminoid commented Mar 6, 2026

   - Waiting for task completion (up to 2400 seconds)  - Command: 'check_container httpd-infra'

That could be directly related to the change being made.

My bot says:

  Root Cause Analysis

  The failure is DIRECTLY CAUSED by PR #788. Two changes combine to break the httpd-infra container:

   1. get_provisioning_interface(): Default changed from interface="provisioning" to interface="". When no PROVISIONING_MACS are set, the function now returns an empty string instead of "provisioning".
   2. wait_for_interface_or_ip(): New else error branch added:

        else
            echo "ERROR: cannot determine an interface or an IP for binding and creating URLs"
            return 1
        fi

  The httpd-infra podman container receives NO environment variables (IRONIC_IP, PROVISIONING_IP, PROVISIONING_INTERFACE, PROVISIONING_MACS are all empty). The execution path is:

   - IRONIC_IP="" → skip
   - PROVISIONING_IP="" → skip  
   - PROVISIONING_INTERFACE="" (was "provisioning" before PR) → skip
   - Hits new else → return 1 → set -e kills container

  Before the PR: PROVISIONING_INTERFACE would default to "provisioning" via get_provisioning_interface(), and wait_for_interface_or_ip() would loop waiting for that interface to get an IP. In CI, the provisioning bridge exists, so it would succeed.

  Kubernetes Ironic pods are unaffected because they receive PROVISIONING_IP=172.22.0.2 via pod spec environment variables.

I.e., it fails on Ubuntu, which has a local (i.e., non-k8s) Ironic, while passing on CentOS (k8s Ironic).
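
The failure path described in the analysis can be reproduced in isolation with the sketch below. Both functions are simplified, hypothetical stand-ins (the PROVISIONING_MACS lookup and the actual waiting loop are omitted), and only the error message is taken from the analysis above.

```shell
# Simplified stand-in for get_provisioning_interface: the fallback value is
# passed in as $1 so that the behaviour before ("provisioning") and after
# ("") the PR can be compared.
get_provisioning_interface_sketch() {
    fallback=$1
    echo "${PROVISIONING_INTERFACE:-$fallback}"
}

# Simplified stand-in for the last two branches of wait_for_interface_or_ip,
# for a container that has neither IRONIC_IP nor PROVISIONING_IP set.
bind_without_ip_inputs() {
    interface=$(get_provisioning_interface_sketch "$1")
    if [ -n "$interface" ]; then
        echo "waiting for an IP on interface ${interface}"
    else
        echo "ERROR: cannot determine an interface or an IP for binding and creating URLs" >&2
        return 1
    fi
}

# No environment variables at all, as in the httpd-infra container.
unset PROVISIONING_INTERFACE
bind_without_ip_inputs "provisioning"                 # before the PR
bind_without_ip_inputs "" || echo "exit status: $?"   # after the PR
```

With the old fallback the function proceeds to wait on the "provisioning" interface; with the empty fallback it reaches the error branch and returns 1, which is what terminates a script running under `set -e`.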

@metal3-io-bot metal3-io-bot added the needs-rebase Indicates that a PR cannot be merged because it has merge conflicts with HEAD. label Mar 9, 2026
@metal3-io-bot
Contributor

PR needs rebase.

@tuminoid
Member

tuminoid commented Mar 9, 2026

This was superseded by the PR linked above.

/close

@metal3-io-bot
Contributor

@tuminoid: Closed this PR.

@mchiappero
Contributor Author

Let's reopen, PR #935 addresses a different problem.

If I understand it correctly, the only problem is the final else. I wasn't aware of the possibility of running Ironic without any of those variables defined. What should the logic be in such a case?

@tuminoid
Member

Let's reopen, PR #935 addresses a different problem.

Yes true, we still have prioritization etc left here. My bad.
/reopen

If I understand it correctly, the only problem is the final else. I wasn't aware of the possibility of running Ironic without any of those variables defined. What should the logic be in such a case?

We could do either of these, or maybe even both:

  • keep the "provisioning" default here (backwards compatibility)
  • add PROVISIONING_INTERFACE=provisioning default in dev-env (sane testing defaults)

WDYT @elfosardo ?

@metal3-io-bot metal3-io-bot reopened this Mar 10, 2026
@metal3-io-bot
Contributor

@tuminoid: Reopened this PR.

@elfosardo
Member

@tuminoid that should work

Member

@Rozzii Rozzii left a comment

Hi @mchiappero, please rebase the PR.

@mchiappero
Contributor Author

mchiappero commented Mar 13, 2026

Sorry for the late reply, I had to take some time off this week.

Before applying any change, I would like to double check:

  1. my assumptions, as I guess I didn't have a full understanding of the logic when I wrote this. Is the "provisioning" interface name specific to non-metal3 environments? Otherwise, if none of the input variables is provided, it fails in a confusing way. Is the case where no variables are provided valid?

  2. whether it's still a good idea to make IRONIC_IP higher priority, i.e. whether to:

    • apply this change and use the following evaluation order: IRONIC_IP > PROVISIONING_IP > PROVISIONING_INTERFACE (but then, is there a case where none of them would be set?)
    • simply keep it as is, in case there are other reasons I can't see

I think it would be good to document this logic, from a developer perspective, but also from the user point of view. Maybe I can add some edits to README.md in this PR if we have a decision on the outcome.

@mchiappero
Contributor Author

Is the case where no variables are provided valid?

This cannot be valid indeed, otherwise no IRONIC_IP & URLs are generated. I'm not sure I understand why it is not passing the tests.

@tuminoid
Member

Is the case where no variables are provided valid?

This cannot be valid indeed, otherwise no IRONIC_IP & URLs are generated. I'm not sure I understand why it is not passing the tests.

It is not valid, but the variables are simply not defined anywhere in the local Ironic case. To fix that, the suggestion made earlier stands.

We could do either of these, or maybe even both:

  • keep the "provisioning" default here (backwards compatibility)
  • add PROVISIONING_INTERFACE=provisioning default in dev-env (sane testing defaults)

As for question 2 about IRONIC_IP priority, I need to defer to @elfosardo or @Rozzii.

Labels

needs-rebase Indicates that a PR cannot be merged because it has merge conflicts with HEAD. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

5 participants