2.4.x backports #322

Merged
merged 10 commits into conda-forge:v2.4.x on Jan 18, 2025

@mgorny
Contributor

mgorny commented Jan 15, 2025

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

Backported the fixes for the two bugs reported against 2.4.x. I haven't rerendered, because rerendering forces CUDA 12.6, and I wasn't able to get that to work; the most obvious changes result in this build error:

  CMake Error at cmake/public/cuda.cmake:70 (message):
    Failed to find nvToolsExt
  Call Stack (most recent call first):
    cmake/Dependencies.cmake:43 (include)
    CMakeLists.txt:857 (include)

and I'm not sure if it's really worth putting more effort into it at this point.

@conda-forge-admin
Contributor

conda-forge-admin commented Jan 15, 2025

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/meta.yaml:

  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). This parser is not currently used by conda-forge, but may be in the future. We are collecting information to see which recipes are compatible with grayskull.
  • ℹ️ The recipe is not parsable by parser conda-recipe-manager. The recipe can only be automatically migrated to the new v1 format if it is parseable by conda-recipe-manager.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/12831901956. Examine the logs at this URL for more detail.

@hmaarrfk
Contributor

Is the GPU server back online, or should I start building out?

@h-vetinari
Member

> Is the GPU server back online, or should I start building out?

The server is back online (I just double-checked this in another PR). It's still only at half the GPU capacity (as throughout December), apparently due to a faulty motherboard that'll need to be replaced. But it should be good enough to build out things. For more time savings, we could merge conda-forge/conda-forge-pinning-feedstock#6910

Comment on lines +36 to +39
cdt_name: # [linux]
- cos7 # [linux]
- cos7 # [linux]
- cos7 # [linux]
Member

Forgot to remove this one with all the recent zip changes... Oh well, not important enough to repush

@hmaarrfk
Contributor

Thanks for addressing the bugs in the build process.

I had started a build and got through 4-5 of the configs, only to realize that
linux_64_blas_implmklc_compiler_version12cuda_compilercuda-nvcccuda_compiler_version12.0cxx_compiler_version12-log.txt does not pass.

> For more time savings, we could merge conda-forge/conda-forge-pinning-feedstock#6910

I would rather just go all in on CPU runners on this maintenance branch.

The complexity of these zips is really demotivating me from using them.

@h-vetinari
Member

> linux_64_blas_implmklc_compiler_version12cuda_compilercuda-nvcccuda_compiler_version12.0cxx_compiler_version12-log.txt does not pass.

What problem are you running into?

> The complexity of these zips is really demotivating me from using them.

We have very different ways of looking at this. For me it's just a fact of life, and I'm not "using" them so much as they're part of the ambient fabric of conda-forge. And I'd much rather have an extra entry in the zip (we recently removed c_stdlib_version and cdt_name, by the way...) than skip it and wait 9-10h longer for every build.
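
As an illustration of what a zip does to the build matrix, here is a minimal Python sketch. It is not conda-build's actual implementation, and the variant values are hypothetical; only the key names are borrowed from this feedstock's configuration. It contrasts expanding the keys as a full cross product with expanding them index by index, which is the idea behind a `zip_keys` group.

```python
from itertools import product

# Hypothetical values; only the key names come from this thread.
variants = {
    "cuda_compiler_version": ["None", "12.0", "12.6"],
    "c_compiler_version": ["11", "12", "13"],
    "cxx_compiler_version": ["11", "12", "13"],
}


def expand_product(variants):
    """Unzipped: every value of every key is combined with every other."""
    keys = list(variants)
    return [dict(zip(keys, combo)) for combo in product(*variants.values())]


def expand_zipped(variants):
    """Zipped: values at the same index travel together as one variant."""
    keys = list(variants)
    if len({len(values) for values in variants.values()}) != 1:
        raise ValueError("zipped keys must all have the same number of entries")
    return [dict(zip(keys, combo)) for combo in zip(*variants.values())]


print(len(expand_product(variants)))  # 27
print(len(expand_zipped(variants)))   # 3
```

The length check in `expand_zipped` is also why adding a key to a zip means adding one entry per existing variant: every zipped key has to list a value for each position.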

@hmaarrfk
Contributor

I unfortunately deleted the logs. I’ll have new ones in 24 hours or so.

@h-vetinari
Member

h-vetinari commented Jan 17, 2025

Looks OK except for one failure on aarch+CUDA (IOW, this doesn't reproduce the MKL failure you saw, @hmaarrfk):

  building 'torch._C' extension
  creating build/temp.linux-aarch64-cpython-313/torch/csrc
  $BUILD_PREFIX/bin/aarch64-conda-linux-gnu-cc -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O3 -pipe -isystem $PREFIX/include -fdebug-prefix-map=$SRC_DIR=/usr/local/src/conda/pytorch-2.4.1 -fdebug-prefix-map=$PREFIX=/usr/local/src/conda-prefix -I$PREFIX/targets/sbsa-linux/include -I$BUILD_PREFIX/targets/sbsa-linux/include -L$PREFIX/targets/sbsa-linux/lib -L$PREFIX/targets/sbsa-linux/lib/stubs -L$BUILD_PREFIX/targets/sbsa-linux/lib -L$BUILD_PREFIX/targets/sbsa-linux/lib/stubs -Wno-deprecated-declarations -Wno-error=maybe-uninitialized -ffunction-sections -fdata-sections -ffunction-sections -fdata-sections -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem $PREFIX/include -I$PREFIX/targets/sbsa-linux/include -I$BUILD_PREFIX/targets/sbsa-linux/include -L$PREFIX/targets/sbsa-linux/lib -L$PREFIX/targets/sbsa-linux/lib/stubs -L$BUILD_PREFIX/targets/sbsa-linux/lib -L$BUILD_PREFIX/targets/sbsa-linux/lib/stubs -fPIC -I$PREFIX/include/python3.13 -c torch/csrc/stub.c -o build/temp.linux-aarch64-cpython-313/torch/csrc/stub.o -Wall -Wextra -Wno-strict-overflow -Wno-unused-parameter -Wno-missing-field-initializers -Wno-unknown-pragmas -fno-strict-aliasing
  $BUILD_PREFIX/bin/aarch64-conda-linux-gnu-cc -shared -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--allow-shlib-undefined -Wl,-rpath,$PREFIX/lib -Wl,-rpath-link,$PREFIX/lib -L$PREFIX/lib -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--allow-shlib-undefined -Wl,-rpath,$PREFIX/lib -Wl,-rpath-link,$PREFIX/lib -L$PREFIX/lib -Wl,-O2 -Wl,--sort-common -Wl,-z,relro -Wl,-z,lazy -Wl,--allow-shlib-undefined -Wl,-rpath,$PREFIX/lib -Wl,-rpath-link,$PREFIX/lib -L$PREFIX/lib -L$PREFIX/targets/sbsa-linux/lib -L$PREFIX/targets/sbsa-linux/lib/stubs -L$BUILD_PREFIX/targets/sbsa-linux/lib -L$BUILD_PREFIX/targets/sbsa-linux/lib/stubs -T$SRC_DIR/cmake/linker_script.ld -T$SRC_DIR/cmake/linker_script.ld -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O3 -pipe -isystem $PREFIX/include -fdebug-prefix-map=$SRC_DIR=/usr/local/src/conda/pytorch-2.4.1 -fdebug-prefix-map=$PREFIX=/usr/local/src/conda-prefix -I$PREFIX/targets/sbsa-linux/include -I$BUILD_PREFIX/targets/sbsa-linux/include -L$PREFIX/targets/sbsa-linux/lib -L$PREFIX/targets/sbsa-linux/lib/stubs -L$BUILD_PREFIX/targets/sbsa-linux/lib -L$BUILD_PREFIX/targets/sbsa-linux/lib/stubs -Wno-deprecated-declarations -Wno-error=maybe-uninitialized -ffunction-sections -fdata-sections -ffunction-sections -fdata-sections -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem $PREFIX/include -I$PREFIX/targets/sbsa-linux/include -I$BUILD_PREFIX/targets/sbsa-linux/include -L$PREFIX/targets/sbsa-linux/lib -L$PREFIX/targets/sbsa-linux/lib/stubs -L$BUILD_PREFIX/targets/sbsa-linux/lib -L$BUILD_PREFIX/targets/sbsa-linux/lib/stubs build/temp.linux-aarch64-cpython-313/torch/csrc/stub.o -L$SRC_DIR/torch/lib -L$PREFIX/targets/sbsa-linux/lib/stubs -ltorch_python -o build/lib.linux-aarch64-cpython-313/torch/_C.cpython-313-aarch64-linux-gnu.so -Wl,-rpath,$ORIGIN/lib
  $BUILD_PREFIX/aarch64-conda-linux-gnu/bin/ld: error: linker script file '$SRC_DIR/cmake/linker_script.ld' appears multiple times
  collect2: error: ld returned 1 exit status
  error: command '$BUILD_PREFIX/bin/aarch64-conda-linux-gnu-cc' failed with exit code 1
  error: subprocess-exited-with-error
  
  × Building wheel for torch (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
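
The link command in the log passes `-T$SRC_DIR/cmake/linker_script.ld` twice, which is what `ld` complains about. A small Python sketch for spotting this kind of duplication in a long link line follows. The helper names are hypothetical, the command is abbreviated from the log above, and the sketch only recognises the `-T <script>` and `-T<script>` spellings (it would also misread options such as `-Ttext`, which it makes no attempt to exclude).

```python
import shlex
from collections import Counter


def linker_scripts(command):
    """Return the scripts passed via -T, in order of appearance."""
    args = shlex.split(command)
    scripts = []
    i = 0
    while i < len(args):
        arg = args[i]
        if arg == "-T" and i + 1 < len(args):
            # Separate form: -T path/to/script.ld
            scripts.append(args[i + 1])
            i += 2
            continue
        if arg.startswith("-T") and len(arg) > 2:
            # Joined form: -Tpath/to/script.ld
            scripts.append(arg[2:])
        i += 1
    return scripts


def duplicated_linker_scripts(command):
    """Return each linker script that appears more than once."""
    counts = Counter(linker_scripts(command))
    return [script for script, n in counts.items() if n > 1]


# Abbreviated from the failing link step; most flags are omitted.
cmd = (
    "aarch64-conda-linux-gnu-cc -shared "
    "-T$SRC_DIR/cmake/linker_script.ld -T$SRC_DIR/cmake/linker_script.ld "
    "stub.o -ltorch_python -o _C.so"
)
print(duplicated_linker_scripts(cmd))
```

Note that `shlex.split` does not expand `$SRC_DIR`, so the paths are compared as literal strings, which is enough to catch an exact repeat like the one here.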

@hmaarrfk
Contributor

It's great to see the CIs again.

I think I was seeing memory errors on the tiny machine I used to build things on, so this is new to me.

@h-vetinari
Member

OK, looking around, the problem with $SRC_DIR/cmake/linker_script.ld has a fix already.

Comment on lines +7 to +17
c_compiler_version: # [osx]
- 17 # [osx]
cxx_compiler_version: # [osx]
- 17 # [osx]
llvm_openmp: # [osx]
- 17 # [osx]

mkl:
- 2023
libprotobuf:
- 5.28.2
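
For readers unfamiliar with the trailing `# [osx]` comments in the hunk above: they are selectors, and a line carrying one only takes effect when the selector matches the target platform. The sketch below is a deliberately simplified Python illustration of that filtering. It handles only a bare platform name inside the brackets, whereas real selectors are richer expressions, and the function name and sample config are made up for the example.

```python
import re

# Matches a trailing selector comment such as "# [osx]".
SELECTOR = re.compile(r"#\s*\[([^\]]+)\]\s*$")


def apply_selectors(text, platform):
    """Keep unselected lines, plus selected lines whose selector equals `platform`."""
    kept = []
    for line in text.splitlines():
        match = SELECTOR.search(line)
        if match is None:
            kept.append(line)
        elif match.group(1).strip() == platform:
            # Drop the selector comment itself and keep the content.
            kept.append(line[: match.start()].rstrip())
    return "\n".join(kept)


config = """\
c_compiler_version:  # [osx]
  - 17               # [osx]
mkl:
  - 2023
"""

print(apply_selectors(config, "linux"))
```

Under these assumptions, the `linux` view of the sample keeps only the `mkl` pin, while the `osx` view keeps both keys.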
Member

This PR is mainly to keep building 2.4 as before. However, we could remove these bits in another PR and see what a rerender brings (there were some issues with clang 18 in the fbgemm submodule, but there have been no changes in that submodule for ~2 years, so no idea what would have changed between 2.4 & 2.5, where we're successfully using clang 18).

@h-vetinari h-vetinari added the automerge label (Merge the PR when CI passes) Jan 17, 2025
@conda-forge-admin conda-forge-admin merged commit 321cd3e into conda-forge:v2.4.x Jan 18, 2025
27 checks passed
@conda-forge-admin
Contributor

Hi! This is the friendly conda-forge automerge bot!

I considered the following status checks when analyzing this PR:

  • linter: passed
  • azure: passed
  • github-actions: passed

Thus the PR was passing and merged! Have a great day!

@h-vetinari
Member

I keep forgetting that automerge is not compatible with the open-gpu setup... Anyway, pushed a manual commit to trigger the builds on the v2.4.x branch

This was referenced Jan 18, 2025
@h-vetinari h-vetinari mentioned this pull request Feb 2, 2025