Enable more jobs relying on the server's spack and module by yhmtsai · Pull Request #1841 · ginkgo-project/ginkgo

yhmtsai · 2025-05-12T13:46:53Z

This PR enables more jobs to run on our server. These jobs reuse the system's packages and spack installation rather than building everything through spack.

before_script: sets up the environment to use the system packages.
script: additionally accepts the environment variables MODULE_LOAD and SPACK_LOAD to load packages. They are simply expanded to module load ${MODULE_LOAD} and spack load ${SPACK_LOAD} if they are non-empty. The cuda module from spack does not set LD_LIBRARY_PATH, so the script extends LD_LIBRARY_PATH before test_install. RPATH_USE_LINK only helps the pure CPU build, not the HIP part.
image-tags: tags containing "s", like nvidia-gpus, are the new setting. Do we have, or do we need, any additional tags specific to the server? -> using tum
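
The expansion of MODULE_LOAD and SPACK_LOAD described above could look roughly like the following. This is a minimal sketch, not the script from this PR: the function names load_requested_packages and prepend_ld_library_path are hypothetical, and which directory has to be added to LD_LIBRARY_PATH depends on the server's CUDA installation.

```shell
# Hypothetical helpers illustrating the description above; they are not
# copied from the PR's .gitlab-ci.yml.

# Run `module load` / `spack load` only when the variable has content.
load_requested_packages() {
    if [ -n "${MODULE_LOAD:-}" ]; then
        # Intentionally unquoted: the variable is a space-separated list.
        module load ${MODULE_LOAD}
    fi
    if [ -n "${SPACK_LOAD:-}" ]; then
        spack load ${SPACK_LOAD}
    fi
}

# Prepend a directory to LD_LIBRARY_PATH without leaving a stray colon
# when the variable was previously unset or empty.
prepend_ld_library_path() {
    LD_LIBRARY_PATH="${1}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
    export LD_LIBRARY_PATH
}
```

A job would call load_requested_packages once at the start of its script and, for CUDA builds, call prepend_ld_library_path with the CUDA library directory before test_install. The directory itself is an assumption and has to be taken from the actual installation.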

This PR only extends the jobs that rely on the same Linux version and the system packages.
Loading the system packages on a different Linux version or distribution might still work, but that is not the purpose of this PR.

TODO:

move rocky_tum to some registry? It is currently a local image for apptainer. It uses the same Linux version as the server's system, with the necessary components
move the jobs to the appropriate place (for now they are shown together under the quick condition for easy checking)
add more versions from the current package sets

…ion, workspace reallcation This moves CI jobs and fixes cuda12.2 cusparse matrix, coo exception, workspace reallcation found from #1841 Related PR: #1843

sonarqubecloud · 2025-06-24T11:31:38Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

pratikvn

It's a bit weird that the OSX jobs are failing.

yhmtsai · 2025-08-28T14:04:11Z

@pratikvn OSX issue is tracked by #1924

pratikvn

Mostly LGTM! I am a little concerned that so many jobs might overload the TUM system.

pratikvn · 2025-09-04T09:40:32Z

+build/cuda120/openmpi/gcc/cuda/release/static:
+  extends:
+    - .build_and_test_tum_template
+    - .default_variables
+    - .full_test_condition
+    - .use_tum-nvidia
+  variables:
+    BUILD_CUDA: "ON"
+    BUILD_HWLOC: "OFF"
+    ENABLE_HALF: "ON"
+    BUILD_MPI: "ON"
+    BUILD_SHARED_LIBS: "OFF"
+    BUILD_TYPE: "Release"
+    MODULE_LOAD: "cmake/3.18.6 cuda/12.0.1 gcc/12.4.0 openmpi/4.1.8"
+
+build/cuda122/openmpi/gcc/cuda/release/shared:
+  extends:
+    - .build_and_test_tum_template
+    - .default_variables
+    - .full_test_condition
+    - .use_tum-nvidia
+  variables:
+    BUILD_CUDA: "ON"
+    BUILD_HWLOC: "OFF"
+    ENABLE_HALF: "ON"
+    BUILD_TYPE: "Release"
+    MODULE_LOAD: "cmake/3.20.6 cuda/12.2.2 gcc/12.4.0 openmpi/5.0.7"
+
+build/cuda124/mpich/gcc/cuda/release/shared:


I feel that so many jobs might overload the TUM system. Maybe we can:

Switch off MPI or CUDA wherever possible?

Not test multiple versions of the same CUDA major version?

yhmtsai · Sep 4, 2025

That is not in the scope of this PR.
We will allow one GPU to serve multiple instances.
Also, we are working on another project to make the setup provide wider coverage with limited resources.

sonarqubecloud · 2025-09-11T19:09:34Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

yhmtsai requested review from a team, MarcelKoch, pratikvn and upsj May 12, 2025 13:46

yhmtsai self-assigned this May 12, 2025

yhmtsai added the reg:ci-cd (related to the continuous integration system), 1:ST:need-feedback (the PR is somewhat ready, but feedback on a blocking topic is required before a proper review), and 1:ST:no-changelog-entry (skip the wiki check for changelog update) labels May 12, 2025

yhmtsai force-pushed the spack_job branch 13 times, most recently from a3d12fd to 58c1e15 Compare May 15, 2025 15:43

yhmtsai added the 1:ST:run-full-test label May 15, 2025

yhmtsai mentioned this pull request May 16, 2025

Move CI job and fix cuda12.2 cusparse matrix, coo exception, workspace reallcation #1843

Merged

yhmtsai force-pushed the spack_job branch from 062cef0 to cc7270b Compare May 19, 2025 08:26

yhmtsai force-pushed the spack_job branch from d7a18e1 to 322d0a3 Compare May 30, 2025 13:54

yhmtsai added the 1:ST:ready-for-review (this PR is ready for review) label May 30, 2025

yhmtsai force-pushed the spack_job branch from e0b5ff3 to b52df3a Compare June 9, 2025 09:53

yhmtsai force-pushed the spack_job branch from b52df3a to dc37212 Compare June 23, 2025 13:43

yhmtsai force-pushed the spack_job branch 2 times, most recently from faeb610 to 083f458 Compare August 27, 2025 11:32

pratikvn reviewed Aug 27, 2025

Comment thread .gitlab-ci.yml Outdated

yhmtsai force-pushed the spack_job branch from 083f458 to fc86068 Compare August 27, 2025 14:58

yhmtsai requested a review from pratikvn August 28, 2025 14:03

yhmtsai force-pushed the spack_job branch from fc86068 to b75fb23 Compare August 29, 2025 13:18

pratikvn approved these changes Sep 4, 2025

yhmtsai mentioned this pull request Sep 9, 2025

Fix bitvector test by precompiling the kernel in library #1929

Merged

yhmtsai force-pushed the spack_job branch 2 times, most recently from 770d8bb to 3c3e805 Compare September 10, 2025 08:12

yhmtsai added 13 commits September 11, 2025 09:23

add spack job

78fe1f7

side note:
- add cuda 12.2 and do not compile bfloat16 below cuda 12.2
- cuda 11.8.0 starts to support 8.9 and 9.0
- current docker image does not support A770. set up another without docker for A770
- tum does not contain HWLOC yet

comment out the workaround to see the failure

6bccd8f

add different gcc

a61e218

hip adds gcc-toolchain when not system gcc and check dpcpp open files

a602383

check hip flags

9e69e2b

check behavior of CMAKE_INSTALL_RPATH_USE_LINK_PATH

9960d27

current server does not have cmake 3.16.9

09a47fa

update module version according to the server

944039f

bring hip workaround back

1c8c758

add mpich

d6440a7

move LD_LIBRARY_PATH

4073b30

update the method to check g++ version

6d535db

try disable cuda home turn off CMAKE_INSTALL_RPATH_USE_LINK_PATH to see what happens now

move most of new jobs to full pipeline. use LD_LIBRARY_PATH now

a2dc743

yhmtsai force-pushed the spack_job branch from 3c3e805 to a2dc743 Compare September 11, 2025 07:23

yhmtsai added the 1:ST:ready-to-merge (this PR is ready to merge) label and removed the 1:ST:ready-for-review (this PR is ready for review) label Sep 11, 2025

yhmtsai merged commit 36688a5 into develop Sep 11, 2025
19 of 21 checks passed

yhmtsai deleted the spack_job branch September 11, 2025 16:12

yhmtsai merged 13 commits into develop from spack_job
