Move ARM Linux workflows to GitHub Actions #8977

Open
alexreinking wants to merge 6 commits into main from alexreinking/gha-arm-linux

Conversation

@alexreinking
Member

@alexreinking alexreinking commented Mar 3, 2026

I had Codex write the initial workflow for testing on ARM Linux, based on the master.cfg from the buildbots and from looking at the logs from a real run. It's probably a little wrong, but I can fix it up.

@alexreinking alexreinking requested a review from shoaibkamil March 3, 2026 18:31
@alexreinking alexreinking added the skip_buildbots Do not run buildbots on this PR. Must add before opening PR as we scan labels immediately. label Mar 3, 2026
@alexreinking
Member Author

There appears to be a real bug in how LLVM handles float16 constants on ARM32. It results in a misalignment of the constant pool that manifests as an illegal instruction. A salient part from my transcript:

The FP16 instructions are working perfectly:

vldr.16 s0, [pc, #404]
vmov.f16 s2, #224
vcvt.f16.s32 s4, s4
vfma.f16 s6, s4, s0
vstr.16 s6, [r7]

The crash comes from the instruction bl 0xee1d53a4 at address 0xee1d51e4: the target 0xee1d53a4 lies in the constant pool (the hex dump confirms it's data). Multiple tail-call branches (b 0xee1d53ac, b 0xee1d53b4, b 0xee1d53bc, bl 0xee1d53c4) all go into the same data region.

The mov r1, lr; bl; cmp r0, #0; bne pattern looks like a Halide runtime call (feature check or allocation). The data at the targets shows a repeating pattern — every other word is 0xf004ee1d — which looks like corrupted 8-byte JIT stubs. If I byte-shift the data by 2, each pair decodes as LDR PC, [PC, #-4] (0xe51ff004) + a function address (0xee1dXXXX). This suggests the constant pool is 2 bytes too short, shifting all subsequent stubs by 2 bytes.
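The byte-shift argument above can be reproduced with a small sketch. It assumes little-endian 8-byte stubs made of the word 0xe51ff004 (LDR PC, [PC, #-4]) followed by a 32-bit target address. The three target addresses and the helper names `build_stubs` and `read_words` are hypothetical and chosen only for illustration; they are not taken from the actual crash dump. Reading the same bytes 2 bytes off from the stub boundaries yields 0xf004ee1d in every other word, matching the pattern described.

```python
import struct

# ARM (A32) encoding of `ldr pc, [pc, #-4]`, as quoted in the comment above.
LDR_PC_STUB = 0xE51FF004


def build_stubs(targets):
    """Pack one 8-byte stub per target: the LDR word, then the 32-bit address."""
    return b"".join(struct.pack("<II", LDR_PC_STUB, t) for t in targets)


def read_words(data, offset=0):
    """Read every whole little-endian 32-bit word starting at `offset`."""
    count = (len(data) - offset) // 4
    return list(struct.unpack_from(f"<{count}I", data, offset))


# Hypothetical function addresses in the 0xee1dXXXX range (assumption).
targets = [0xEE1D1234, 0xEE1D5678, 0xEE1D9ABC]
stubs = build_stubs(targets)

aligned = read_words(stubs)            # correctly aligned view of the stubs
shifted = read_words(stubs, offset=2)  # same bytes, viewed 2 bytes off

print("aligned:", [hex(w) for w in aligned])
print("shifted:", [hex(w) for w in shifted])

# In the shifted view, every odd-indexed word is the low half of the LDR
# encoding (0xf004) fused with the high half of an address (0xee1d).
print("every other word is 0xf004ee1d:",
      all(w == 0xF004EE1D for w in shifted[1::2]))
```

The sketch only shows that a 2-byte misalignment is consistent with the observed data. It does not show which component introduced the misalignment.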

This is likely an LLVM ARM32 constant island pass bug where a 16-bit vldr.16 constant pool entry causes misalignment. This is not something our Halide changes can fix — it's an LLVM backend issue.

Want to file this upstream, or should we look for a Halide-level workaround (e.g., avoiding vldr.16 literal pool entries on ARM32)?

@alexreinking alexreinking force-pushed the alexreinking/gha-arm-linux branch from 07c999b to 04d9dd1 Compare March 5, 2026 10:40
@alexreinking alexreinking removed the skip_buildbots Do not run buildbots on this PR. Must add before opening PR as we scan labels immediately. label Mar 5, 2026
@alexreinking alexreinking force-pushed the alexreinking/gha-arm-linux branch 2 times, most recently from f9115e8 to 97d3009 Compare March 5, 2026 15:29
@alexreinking alexreinking requested a review from abadams March 5, 2026 15:51
if (Internal::get_llvm_version() >= 210 &&
    Internal::get_llvm_version() < 220 &&
    get_jit_target_from_environment().has_feature(Target::SVE2)) {
    printf("[SKIP] LLVM 21 has known getFixedValue() assertions on SVE scalable types.\n");
Member

nit: assertions -> "assertion failures" or "bugs"

@abadams
Member

abadams commented Mar 5, 2026

From the tests, it looks like we should just say we only support SVE2 from LLVM 22 onward, because user code is going to hit assertion failures.
