Skip to content

Conversation

@mmuetzel
Copy link
Contributor

GitHub started hosting runners with Ubuntu on ARM64 processors for open source projects for free:
https://github.blog/changelog/2025-01-16-linux-arm64-hosted-runners-now-available-for-free-in-public-repositories-public-preview/

Add one configuration that is using these runners to the build matrix.

According to their blog post, the arm64 runners are more efficient and potentially faster than the x86_64 runners. (But it is still in a preview phase and maybe it will take some time for them to better scale and balance the load.) If that turns out to be true, it should be easy to switch more configurations in that workflow (or the one using the root CMake file) to the arm64 runners (and maybe keep only one or two running on x86_64).

Since there is now the possibility to run tests for this target natively, remove that target from the build matrix of the workflow that emulates it with qemu.

There is an open PR for the Alpine action that would allow running the armv7 tests natively on aarch64. But it hasn't been merged yet:
jirutka/setup-alpine#22
So, keep emulating armv7 on x86_64 for the time being.

@mmuetzel
Copy link
Contributor Author

mmuetzel commented Jan 29, 2025

Hu? Why are the runners that are using Clang on Ubuntu failing? I didn't touch anything in the rules about them...

Looking back on the actions that ran in the last couple of weeks: That isn't new with the changes from here. These runners started failing without any change in this repository.
Might be related to the fact that the ubuntu-latest label was moved from ubuntu-22.04 to ubuntu-24.04 by GitHub recently. That came with a newer version of Clang...

A few tests for LAGraphX are failing with a segfault now.
@DrTimothyAldenDavis: Do you see a pattern for the failing tests?

  The following tests FAILED:
  	 39 - LAGraphX_BF (SEGFAULT)
  	 40 - LAGraphX_FastGraphletTransform (SEGFAULT)
  	 50 - LAGraphX_SquareClustering (SEGFAULT)
  	 55 - LAGraphX_msf (Failed)
  Errors while running CTest
  make: *** [Makefile:74: test] Error 8

@mmuetzel mmuetzel marked this pull request as draft January 29, 2025 10:30
@mmuetzel
Copy link
Contributor Author

mmuetzel commented Jan 29, 2025

Sure enough, those workflows succeed if they are reverted to the Ubuntu 22.04 runners.
That could be a work-around for the time being. But it might also be worth finding out why it is failing on Ubuntu 24.04. (Fwiw, GitHub - at least until now - provides runners for the last two LTS versions of Ubuntu. So, the Ubuntu 22.04 runners will probably be available until some time in 2026.)

What I've missed before: The runners that build for Windows MSVC using Clang as the compiler are also failing the tests for LAGraph.
It might be that the compatibility with newer versions of LLVM Clang is broken for some reason (independent on the OS -- Linux or Windows).

@DrTimothyAldenDavis
Copy link
Owner

Thanks. yes, I've noticed the Windows clang failures. I was considering just deleting those runners.

@mmuetzel
Copy link
Contributor Author

mmuetzel commented Feb 6, 2025

I tried to enable CUDA on the Ubuntu on ARM runner. But CMake couldn't detect a working CUDA compiler (albeit installing the same packages that work for Ubuntu on x86_64).

  -- Looking for a CUDA compiler
  -- Looking for a CUDA compiler - NOTFOUND
  -- CUDA:             not found
  CMake Error at cmake_modules/SuiteSparsePolicy.cmake:367 (message):
    CUDA required for SuiteSparse but not found
  Call Stack (most recent call first):
    CMakeLists.txt:46 (include)
  
  
  -- CUDA:             not enabled
  -- Configuring incomplete, errors occurred!
  Error: Process completed with exit code 1.

I don't know how to install a working CUDA toolchain on ARM or how to tell CMake to use it. I also can't test that locally (no ARM hardware here). So, I opted for disabling CUDA on the ARM runner.

In the last run for the root CMakeLists.txt file on Ubuntu on ARM, checking out the repository failed apparently for some reason.
@DrTimothyAldenDavis: When you come around to it, could you please restart that runner?

Other than that, this should be ready for review now.

@mmuetzel mmuetzel marked this pull request as ready for review February 6, 2025 10:21
@mmuetzel
Copy link
Contributor Author

Failing to checkout the repository intermittently on ARM runners seems to be a known issue:
actions/partner-runner-images#47

Changing to draft until the image providers have figured out a solution for that.

@mmuetzel mmuetzel marked this pull request as draft February 15, 2025 13:45
GitHub started hosting runners with Ubuntu on ARM64 processors for open source projects for free:
https://github.blog/changelog/2025-01-16-linux-arm64-hosted-runners-now-available-for-free-in-public-repositories-public-preview/

Add one configuration that is using these runners to the build matrix.
GitHub hosts Ubuntu on ARM64 runners now.
Remove the emulated runner (in favor of the native one).
LLVM Clang emits a different set of compiler warnings for potential
issues in the source code (compared to GCC). So, it makes sense to keep
building with it in CI.
There is probably little reason (when it comes to coverage) to build
with different configurations with that compiler though.

Keep only one configuration that builds with Clang on Ubuntu (to save
some CI minutes - and reduce the energy footprint).
@mmuetzel
Copy link
Contributor Author

Changing to draft until the image providers have figured out a solution for that.

Apparently, that was caused by a hardware error(?) that they worked around by switching out their runners:
https://github.com/orgs/community/discussions/148648#discussioncomment-12205917

Marking as ready for review again.

@mmuetzel mmuetzel marked this pull request as ready for review February 19, 2025 09:02
@DrTimothyAldenDavis
Copy link
Owner

Looks great.

I hope to release a new version of SuiteSparse soon (v7.9.0). It will include these changes, and the other pending PRs to SuiteSparse once I review them and integrate them. I have a minor bug fix to post in GraphBLAS as well (v9.4.5, currenly in the v9.4.branch in the GraphBLAS repo).

Then probably in about 2 weeks I'll post a GraphBLAS v10.0.0, and add it to SuiteSparse v7.10.0.

@DrTimothyAldenDavis DrTimothyAldenDavis merged commit ba37590 into DrTimothyAldenDavis:dev2 Feb 19, 2025
25 of 26 checks passed
@DrTimothyAldenDavis
Copy link
Owner

The segfaults are because of a clang compiler bug, which affects GraphBLAS/Source/mask/GB_masker.c.

The fix is here, just posted on dev2 a few hours ago:

https://github.com/DrTimothyAldenDavis/SuiteSparse/blame/ece49b5f114771b2eeafefd3edb3f86714ec7e61/GraphBLAS/Source/mask/GB_masker.c#L52-L64

#include "include/GB_compiler.h"
#if GB_COMPILER_CLANG
// On the Mac, this file triggers a bug in AppleClang 16.0.0 when -O3
// optimization is used (MacOS 14.6.1 (23G93), Xcode 16.2):
// Apple clang version 16.0.0 (clang-1600.0.26.6)
// Target: arm64-apple-darwin23.6.0
// Thread model: posix
// InstalledDir: /Library/Developer/CommandLineTools/usr/bin
// See the test for R_sparsity below. R_sparsity is 4 but the if test (for
// R_sparsity 1 or 2) evaluates as true, and then the assertion fails.
#pragma clang optimize off
#endif

That should allow clang to properly compile this file. See also this note in the GraphBLAS/Doc/Changelog:

* AppleClang compiler bug: On the Mac, the Source/mask/GB_masker.c file
triggers a bug in AppleClang 16.0.0 with -O3 in (MacOS 14.6.1 (23G93),
Xcode 16.2, Apple clang version 16.0.0, clang-1600.0.26.6). It also
fails in MacOS 15.2 (Target: arm64-apple-darwin23.6.0). The bug is
triggered by these tests in LAGraph (v1.2 branch, unreleased, Jan 4,
2025):
39 - LAGraphX_BF (SIGTRAP)
40 - LAGraphX_Coarsen_Matching (Failed)
41 - LAGraphX_FastGraphletTransform (SIGTRAP)
49 - LAGraphX_PageRankGX (SIGTRAP)
54 - LAGraphX_SquareClustering (SIGTRAP)
61 - LAGraphX_msf (Failed)
When using clang, optimization is turned off for this file. This has
no impact on performance since GB_masker.c is very simple, consisting
of a single sequence of calls to other methods.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants