Skip to content

Conversation

@synthe102
Copy link

@synthe102 synthe102 commented Dec 5, 2025

What this PR does / why we need it:

This PR adds arm64 iPXE binaries to the Ironic image and points arm64 DHCP clients to the correct file.

Linked with #502.

Checklist:

  • Documentation has been updated, if necessary.
  • Integration tests have been added, if necessary.

@metal3-io-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign rozzii for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@metal3-io-bot
Copy link
Contributor

Hi @synthe102. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@metal3-io-bot metal3-io-bot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 5, 2025
@metal3-io-bot metal3-io-bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 5, 2025
@elfosardo
Copy link
Member

/ok-to-test

@metal3-io-bot metal3-io-bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 5, 2025
Copy link

@shenwei66 shenwei66 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi friend,please have a look my comments.

# NOTE(elfosardo): warning should not be treated as errors by default
NO_WERROR=1 make bin/undionly.kpxe "bin-$ARCH-efi/snponly.efi"
NO_WERROR=1 make bin/undionly.kpxe bin-x86_64-efi/snponly.efi && \
NO_WERROR=1 make CROSS=aarch64-linux-gnu- bin-arm64-efi/snponly.efi
Copy link

@shenwei66 shenwei66 Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen if my server is arm? From line 26 to 27.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even without my changes, the image can't be built on arm64. bin/undionly.kpxe can only be built on amd64 servers

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed on Slack that adding arm64 support to amd64 image was a nice first step rather than trying to add in one shot arm64 support and multi-arch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, the assumption is that the build happen on x86
should we consider also the arm64 case?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned above, currently the image can't be built for the linux/arm64 target.

Dockerfile Outdated
dnf install -y gcc git make xz-devel
dnf groupinstall -y 'Development Tools' && \
dnf install -y epel-release && \
dnf install -y gcc git make xz-devel gcc-aarch64-linux-gnu gcc-c++-aarch64-linux-gnu

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if my server is arm64, we can’t execute this command,right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you can ! I tried during development to have a multi-arch image, and it works. Its the undionly.kpxe build that causes issue on arm64.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're giving as assumed that we're running on x86 ?

Copy link
Author

@synthe102 synthe102 Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes as the image can't be cross compiled right now because of syslinux.
It means that:

  • syslinux-nonlinux in main-package-list.txt can't be installed
  • bin/undionly.kpxe can't be compiled on arm64

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since both are required for legacy boot, it sort of makes sense. I guess the path forward is to install them conditionally with understanding that only UEFI can be used for ARM64 clusters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, it does not have to be in this PR since, as you said, ironic-image cannot be built on ARM64 now. But it's relatively easy to fix that, and I think we should do exactly that: only build undionly.kpxe and install syslinux for ironic-image on amd64.

@dtantsur
Copy link
Member

dtantsur commented Dec 6, 2025

/cc @honza

@metal3-io-bot metal3-io-bot requested a review from honza December 6, 2025 15:40
@synthe102
Copy link
Author

/test

@metal3-io-bot
Copy link
Contributor

@synthe102: The /test command needs one or more targets.
The following commands are available to trigger required jobs:

/test markdownlint
/test metal3-centos-e2e-integration-test-main
/test metal3-ubuntu-e2e-integration-test-main
/test shellcheck

The following commands are available to trigger optional jobs:

/test metal3-dev-env-integration-test-centos-main
/test metal3-dev-env-integration-test-ubuntu-main
Details

In response to this:

/test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@synthe102
Copy link
Author

/test metal3-dev-env-integration-test-centos-main
/test metal3-ubuntu-e2e-integration-test-main

@synthe102
Copy link
Author

/test metal3-centos-e2e-integration-test-main

@synthe102
Copy link
Author

/test metal3-ubuntu-e2e-integration-test-main

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds support for arm64 architecture in the iPXE boot process by introducing arm64-specific iPXE binaries. The changes enable arm64 clients to boot using the appropriate snponly-arm64.efi bootloader instead of the x86_64 version.

  • Cross-compilation support added to build arm64 iPXE binaries alongside x86_64 binaries
  • DHCP configuration updated to route arm64 clients (client-arch 11) to the arm64-specific bootloader
  • Runtime script updated to deploy the new arm64 binary to the TFTP boot directory

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
Dockerfile Adds cross-compilation toolchain for aarch64 and builds snponly-arm64.efi binary; copies the arm64 binary to the final image
scripts/rundnsmasq Updates firmware copy operations to include snponly-arm64.efi in both custom and default firmware paths
ironic-config/dnsmasq.conf.j2 Separates arm64 EFI clients (client-arch 11) into dedicated efi-arm64 tag with architecture-specific bootloader configuration for both IPv4 and IPv6

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@synthe102
Copy link
Author

/test metal3-centos-e2e-integration-test-main
/test metal3-ubuntu-e2e-integration-test-main

Copy link
Member

@elfosardo elfosardo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you considering only the scenario where the image is built on x86 ?

echo "keepcache=1" >> /etc/dnf/dnf.conf && \
dnf install -y gcc git make xz-devel
dnf groupinstall -y 'Development Tools' && \
dnf install -y epel-release && \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the need for EPEL?
I would really like to see an alternative, we removed the need for the EPEL repo since a while

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EPEL is needed for gcc-aarch64-linux-gnu and gcc-c++-aarch64-linux-gnu. I'm not familiar with the RHEL ecosystem, if there is another way to grab those packages I'll be glad to use it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, install packages from EPEL without enabling it globally (you can enable a repo just for one command). Just so that we have control over where packages are coming from.

Dockerfile Outdated
dnf install -y gcc git make xz-devel
dnf groupinstall -y 'Development Tools' && \
dnf install -y epel-release && \
dnf install -y gcc git make xz-devel gcc-aarch64-linux-gnu gcc-c++-aarch64-linux-gnu
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're giving as assumed that we're running on x86 ?

# NOTE(elfosardo): warning should not be treated as errors by default
NO_WERROR=1 make bin/undionly.kpxe "bin-$ARCH-efi/snponly.efi"
NO_WERROR=1 make bin/undionly.kpxe bin-x86_64-efi/snponly.efi && \
NO_WERROR=1 make CROSS=aarch64-linux-gnu- bin-arm64-efi/snponly.efi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, the assumption is that the build happen on x86
should we consider also the arm64 case?

Dockerfile Outdated
echo "tsflags=nodocs" >> /etc/dnf/dnf.conf && \
echo "keepcache=1" >> /etc/dnf/dnf.conf && \
dnf install -y gcc git make xz-devel
dnf groupinstall -y 'Development Tools' && \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer we specify packages instead of using a group to avoid bringing in unnecessary dependencies without realizing it.

@synthe102
Copy link
Author

synthe102 commented Dec 11, 2025

It seems there is a race condition caused --mount=type=cache,target=/var/cache/dnf between ironic-builder and the main image where /var/cache/dnf/download_lock.pid can be deleted while still needed in ironic-builder.
I get the following error: [Errno 2] No such file or directory: '/var/cache/dnf/download_lock.pid', removing the cache fixes it.

Edit: fixed by setting sharing=locked in cache settings.

@metal3-io-bot metal3-io-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 11, 2025
@elfosardo
Copy link
Member

It seems there is a race condition caused --mount=type=cache,target=/var/cache/dnf between ironic-builder and the main image where /var/cache/dnf/download_lock.pid can be deleted while still needed in ironic-builder. I get the following error: [Errno 2] No such file or directory: '/var/cache/dnf/download_lock.pid', removing the cache fixes it.

Edit: fixed by setting sharing=locked in cache settings.

@synthe102 that was not happening before though, so what's changed?
dnf has a good locking system and the default shared option for the cache should just work
if we need to change the behavior, I think we need a reason, and probably a separate issue
please try using dnf clean all instead after the installation command

@synthe102
Copy link
Author

synthe102 commented Dec 11, 2025

I think its a race condition where prepare-image.sh runs rm -rf /var/cache/{yum,dnf}/* before all the dnf commands line 11 are done running.
I'm noticing this on my arm macbook where I build so image for amd64 so the cross compile make the line 11 way slower than usual.

@synthe102
Copy link
Author

@elfosardo You're right, its a regression. Introduced by disabling epel:
This errors:

dnf install -y epel-release
dnf config-manager --set-disabled epel
dnf install -y gcc git make xz-devel
dnf install --enablerepo=epel -y gcc-aarch64-linux-gnu gcc-c++-aarch64-linux-gnu

This doesn't:

dnf install -y epel-release
dnf install -y gcc git make xz-devel
dnf install -y gcc-aarch64-linux-gnu gcc-c++-aarch64-linux-gnu

@synthe102
Copy link
Author

Still can't make it work with dnf config-manager --set-disabled epel even when removing rm -rf /var/cache/{yum,dnf}/* from build scripts and the Dockerfile and trying to rely on dnf clear all.

I didn't notice any difference in build time with and without sharing=locked.

@synthe102
Copy link
Author

/test metal3-centos-e2e-integration-test-main
/test metal3-ubuntu-e2e-integration-test-main

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants