Description
The size of the zip involving the CUDA compilers is pretty unwieldy:

conda-forge-pinning-feedstock/recipe/conda_build_config.yaml, lines 181 to 189 at ff043bb
which makes it hard to override locally. There's a general desire to keep the zip complexity down and remove entries here, rather than adding them.
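
For orientation, the zipped group referenced above contains roughly the following keys (a reconstruction based on the entries discussed in this issue, not a verbatim quote; the authoritative definition lives in `recipe/conda_build_config.yaml`):

```yaml
zip_keys:
  -                                # [unix]
    - c_compiler_version           # [unix]
    - cxx_compiler_version         # [unix]
    - fortran_compiler_version     # [unix]
    - cuda_compiler                # [linux and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
    - cuda_compiler_version        # [linux and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
    - docker_image                 # [linux and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
    - github_actions_labels        # [linux and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
```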
And indeed, we recently removed `cdt_name` (48621d1) as well as `c_stdlib_version` (c7f5973), and we made it easier to override the image version without having to locally override the whole zip (#6626).
#6910 now added another component (`github_actions_labels`) to the zip, because there exist feedstocks that require GPU agents (a very rare resource), and because, without linking this to the zip that determines non-CUDA vs. CUDA builds, we're stuck with only bad options (a sketch of the intended usage follows the list below):
- waste resources by using GPU agents even for the non-CUDA builds
- local workarounds like h-vetinari/pytorch-cpu-feedstock@6c232ae, which cause several issues:
  - duplicated builds in the CI matrix
  - very long variant config names, which cause failures on Windows (see pytorch-cpu-feedstock#332 "Test: CPU vs GPU workaround" and conda-smithy#2233 "lower limit at which we start shortening filenames for variant configs")
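
For illustration, with `github_actions_labels` zipped to the CUDA keys, an affected feedstock can keep its non-CUDA builds on regular runners and send only the CUDA builds to GPU agents by overriding just that key in its local `recipe/conda_build_config.yaml`. A minimal sketch, assuming the global pinning zips exactly two CUDA variants (`None` plus one CUDA 12.x version) in that order; the runner labels below are placeholders, not real conda-forge labels:

```yaml
# hypothetical feedstock-local override; entries must line up one-to-one
# with the zipped cuda_compiler_version entries from the global pinning
github_actions_labels:
  - cirun-some-cpu-runner   # placeholder: non-CUDA build stays on a CPU agent
  - cirun-some-gpu-runner   # placeholder: CUDA build gets a (scarce) GPU agent
```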
The desire for simplicity in the zip is well-founded, but it doesn't trump reality. Keeping these data apart when they are closely connected causes far more problems than a single extra zip entry (aside from the fact that only 5 feedstocks across all of conda-forge are affected and the required changes are trivial).
The CUDA zip overall is IMO a reflection of irreducible complexity that exists in our packaging constraints. However, there are some nice opportunities to further simplify this zip:
- `docker_image` can be removed as soon as we drop CUDA 11.8 (because then everything can just run on the alma9 images)
- `{c,cxx,fortran}_compiler_version` could be dropped if we accept being limited by the maximum GCC version supported by our lowest-supported CUDA
  - this can obviously be accelerated if we keep only building for the newest CUDA 12.x, as we recently started doing (and assuming that newer nvcc keeps extending its GCC compatibility)
  - the problem case there is CUDA on PPC, which was dropped in CUDA 12.5; CUDA 12.4 is limited to GCC 12
- we might even consider turning `cuda_compiler` into a constant (once we drop 11.8)
So in the most realistic scenario (after we drop 11.8, but assuming we keep CUDA on PPC for the foreseeable future), things would look like:
```yaml
zip_keys:
  -                               # [unix]
    - c_compiler_version          # [unix]
    - cxx_compiler_version        # [unix]
    - fortran_compiler_version    # [unix]
    - cuda_compiler               # [linux and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
    - cuda_compiler_version       # [linux and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
    - github_actions_labels       # [linux and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
```
which is still 2 entries shorter than it was until last November.
If we did drop CUDA support on PPC and relied only on the latest CUDA (and assuming recipes guard their `{{ compiler("cuda") }}` correctly with `cuda_compiler_version != "None"` selectors; see the recipe sketch at the end of this post), the zip could even be reduced to:
```yaml
zip_keys:
  -                               # [unix]
    - c_compiler_version          # [unix]
    - cxx_compiler_version        # [unix]
    - fortran_compiler_version    # [unix]
  # uncoupled!
  -                               # [linux and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
    - cuda_compiler_version       # [linux and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
    - github_actions_labels       # [linux and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
```
but I'm not sure whether we have the appetite for that. IMO the longer zip is less of an issue than dropping support for a platform.
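
For reference, the selector guarding mentioned above is the usual recipe pattern; a minimal sketch of the relevant part of a `meta.yaml`:

```yaml
requirements:
  build:
    - {{ compiler("c") }}
    - {{ compiler("cxx") }}
    # only pull in nvcc for the CUDA-enabled entries of the variant matrix
    - {{ compiler("cuda") }}    # [cuda_compiler_version != "None"]
```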