
Fix gpu builds #340

Merged
merged 7 commits into conda-forge:main from fix_gpu
Feb 1, 2025
Conversation

h-vetinari
Member

Fall-out from #318, especially the torchinductor tests.

@conda-forge-admin
Contributor

conda-forge-admin commented Jan 30, 2025

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/meta.yaml:

  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). This parser is not currently used by conda-forge, but may be in the future. We are collecting information to see which recipes are compatible with grayskull.
  • ℹ️ The recipe is not parsable by parser conda-recipe-manager. The recipe can only be automatically migrated to the new v1 format if it is parsable by conda-recipe-manager.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/13080209035. Examine the logs at this URL for more detail.

@h-vetinari
Member Author

> Fall-out from #318, especially the torchinductor tests.

By adding these tests, the pytorch test suite run on windows went from

= 7486 passed, 1432 skipped, 43 deselected, 31 xfailed, 75940 warnings in 1067.19s (0:17:47) =

to

= 8122 passed, 1494 skipped, 45 deselected, 31 xfailed, 75969 warnings in 3534.84s (0:58:54) =

I don't have timings for linux yet, but I assume it will also increase (though perhaps less so). In any case, I think I'll constrain these torchinductor tests to run for only one python version.
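As an illustration of what such a constraint could look like, here is a minimal sketch of a gate in the recipe's test script. The version pin (3.12) matches the logs further down, but the variable handling and file list are assumptions, not the feedstock's actual code:

    # sketch: only add the slow torchinductor suite for a single python version
    # (PY_VER is the version string conda-build exposes to the test environment)
    tests="test/test_autograd.py test/test_nn.py test/test_torch.py"
    if [[ "${PY_VER}" == "3.12" ]]; then
        tests="${tests} test/inductor/test_torchinductor.py"
    fi
    python -m pytest -n 2 ${tests} -m "not hypothesis" --durations=50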

@hmaarrfk
Contributor

Would it help if I built the linux-gpu stuff?

@hmaarrfk
Contributor

I guess what I'm asking is: are you ready for me to start some jobs locally?

@h-vetinari
Member Author

> Would it help if I built the linux-gpu stuff?

Thanks for the kind offer! The server has been working well lately but right now the resources are occupied by conda-forge/libmagma-feedstock#24. The torchinductor tests have really been a PITA to fix, but I feel we're finally close now.

linux-aarch64 + CUDA is passing already, and so is windows (aside from the fact that it ran into a timeout, which I've addressed with the last commits). So the only thing I still need to see here is whether the full test suite can pass on linux-64 + CUDA, because even though I tried to look for needles in the haystack, it's possible that some singleton test failures still need to be skipped.

The whole situation is complicated by the fact that we also need some fixes for the CMake metadata to make it usable at all (i.e. compiling against CUDA-enabled libtorch on a CPU agent - which is the situation we're in for all the dependent packages compiling against pytorch). I have a hack for that more or less ready in #339, but I want to separate these concerns, because it's always possible that I still overlooked something.

In my ideal scenario, the CI for this PR and #339 could run (CUDA-only; no aarch) essentially at full speed, if I had the entire server available (hence my request to cancel the builds in conda-forge/libmagma-feedstock#24 for now). Then I could decide at the end of my day (in ~14h) which one to merge, based on whether the CMake stuff is passing or not.

So it depends on how much capacity you have. You could try a build for one of the CUDA variants in #339 and tell me if it passes. If so, we could use the artefact from that for publication right away.

@h-vetinari
Member Author

Update: now running at full steam, so this should be good for now, and we should have at least one passing PR in ~12h that we can merge. If you do have spare cycles for building things locally, I think the much bigger impact would be if we could unblock tensorflow... 😅

@h-vetinari
Member Author

So finally, the MKL+CUDA build passed. There's something strange with the tests though - the run for py311 collects a whole bunch more tests and takes longer than elsewhere (even longer than py312, which is the only version where we run the inductor tests).

TEST START: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py311_h3846359_311.conda
+ python -m pytest -n 2 test/test_autograd.py test/test_autograd_fallback.py test/test_custom_ops.py test/test_linalg.py test/test_mkldnn.py test/test_modules.py test/test_nn.py test/test_torch.py test/test_xnnpack_integration.py -k 'not ((TestTorch and test_print) or (TestAutograd and test_profiler_seq_nr) or (TestAutograd and test_profiler_propagation) or test_mutable_custom_op_fixed_layout or test_BCELoss_weights_no_reduce_cuda or test_ctc_loss_cudnn_tensor_cuda  or (TestTorch and test_index_add_correctness) or test_sdpa_inference_mode_aot_compile or (TestNN and test_grid_sample) or test_indirect_device_assert or (GPUTests and test_scatter_reduce2) or (TestLinalgCPU and test_inverse_errors_large_cpu) or test_base_does_not_require_grad_mode_nothing or test_base_does_not_require_grad_mode_warn or test_composite_registered_to_cpu_mode_nothing)' -m 'not hypothesis' --durations=50
= 13171 passed, 2586 skipped, 91 xfailed, 143216 warnings in 2916.32s (0:48:36) =
TEST END: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py311_h3846359_311.conda
TEST START: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py39_hdffab68_311.conda
+ python -m pytest -n 2 test/test_autograd.py test/test_autograd_fallback.py test/test_custom_ops.py test/test_linalg.py test/test_mkldnn.py test/test_modules.py test/test_nn.py test/test_torch.py test/test_xnnpack_integration.py -k 'not ((TestTorch and test_print) or (TestAutograd and test_profiler_seq_nr) or (TestAutograd and test_profiler_propagation) or test_mutable_custom_op_fixed_layout or test_BCELoss_weights_no_reduce_cuda or test_ctc_loss_cudnn_tensor_cuda  or (TestTorch and test_index_add_correctness) or test_sdpa_inference_mode_aot_compile or (TestNN and test_grid_sample) or test_indirect_device_assert or (GPUTests and test_scatter_reduce2) or (TestLinalgCPU and test_inverse_errors_large_cpu) or test_base_does_not_require_grad_mode_nothing or test_base_does_not_require_grad_mode_warn or test_composite_registered_to_cpu_mode_nothing)' -m 'not hypothesis' --durations=50
== 7552 passed, 1375 skipped, 31 xfailed, 75701 warnings in 458.74s (0:07:38) ==
TEST END: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py39_hdffab68_311.conda
TEST START: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py313_h33c0e77_311.conda
+ python -m pytest -n 2 test/test_autograd.py test/test_autograd_fallback.py test/test_custom_ops.py test/test_linalg.py test/test_mkldnn.py test/test_modules.py test/test_nn.py test/test_torch.py test/test_xnnpack_integration.py -k 'not ((TestTorch and test_print) or (TestCustomOp and test_data_dependent_compile) or (TestCustomOp and test_functionalize_error) or (TestCustomOpAPI and test_compile) or (TestCustomOpAPI and test_fake) or test_compile_int4_mm or test_compile_int8_mm or (TestAutograd and test_profiler_seq_nr) or (TestAutograd and test_profiler_propagation) or test_mutable_custom_op_fixed_layout or test_BCELoss_weights_no_reduce_cuda or test_ctc_loss_cudnn_tensor_cuda  or (TestTorch and test_index_add_correctness) or test_sdpa_inference_mode_aot_compile or (TestNN and test_grid_sample) or test_indirect_device_assert or (GPUTests and test_scatter_reduce2) or (TestLinalgCPU and test_inverse_errors_large_cpu) or test_base_does_not_require_grad_mode_nothing or test_base_does_not_require_grad_mode_warn or test_composite_registered_to_cpu_mode_nothing)' -m 'not hypothesis' --durations=50
== 7532 passed, 1375 skipped, 31 xfailed, 75718 warnings in 455.21s (0:07:35) ==
TEST END: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py313_h33c0e77_311.conda
TEST START: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py310_hca309f4_311.conda
+ python -m pytest -n 2 test/test_autograd.py test/test_autograd_fallback.py test/test_custom_ops.py test/test_linalg.py test/test_mkldnn.py test/test_modules.py test/test_nn.py test/test_torch.py test/test_xnnpack_integration.py -k 'not ((TestTorch and test_print) or (TestAutograd and test_profiler_seq_nr) or (TestAutograd and test_profiler_propagation) or test_mutable_custom_op_fixed_layout or test_BCELoss_weights_no_reduce_cuda or test_ctc_loss_cudnn_tensor_cuda  or (TestTorch and test_index_add_correctness) or test_sdpa_inference_mode_aot_compile or (TestNN and test_grid_sample) or test_indirect_device_assert or (GPUTests and test_scatter_reduce2) or (TestLinalgCPU and test_inverse_errors_large_cpu) or test_base_does_not_require_grad_mode_nothing or test_base_does_not_require_grad_mode_warn or test_composite_registered_to_cpu_mode_nothing)' -m 'not hypothesis' --durations=50
== 7552 passed, 1375 skipped, 31 xfailed, 75701 warnings in 459.08s (0:07:39) ==
TEST END: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py310_hca309f4_311.conda
TEST START: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py312_hdbe889e_311.conda
+ python -m pytest -n 2 test/test_autograd.py test/test_autograd_fallback.py test/test_custom_ops.py test/test_linalg.py test/test_mkldnn.py test/test_modules.py test/test_nn.py test/test_torch.py test/test_xnnpack_integration.py test/inductor/test_torchinductor.py -k 'not ((TestTorch and test_print) or (TestAutograd and test_profiler_seq_nr) or (TestAutograd and test_profiler_propagation) or test_mutable_custom_op_fixed_layout or test_BCELoss_weights_no_reduce_cuda or test_ctc_loss_cudnn_tensor_cuda  or (TestTorch and test_index_add_correctness) or test_sdpa_inference_mode_aot_compile or (TestNN and test_grid_sample) or test_indirect_device_assert or (GPUTests and test_scatter_reduce2) or (TestLinalgCPU and test_inverse_errors_large_cpu) or test_base_does_not_require_grad_mode_nothing or test_base_does_not_require_grad_mode_warn or test_composite_registered_to_cpu_mode_nothing)' -m 'not hypothesis' --durations=50
= 8196 passed, 1429 skipped, 31 xfailed, 76339 warnings in 2177.80s (0:36:17) ==
TEST END: /home/conda/feedstock_root/build_artifacts/linux-64/pytorch-2.5.1-cuda126_mkl_py312_hdbe889e_311.conda

The set of modules and skips is exactly the same as on 3.9 or 3.10, so I don't know what would explain this difference in test collection. Perhaps there are some tests upstream that are only run for 3.11? 🤔
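One way to narrow this down would be to diff the collected test IDs between two interpreter versions using pytest's standard --collect-only flag; a sketch (the file names and the shortened module list are illustrative):

    # list test IDs without running anything, once per interpreter
    python -m pytest --collect-only -q test/test_nn.py test/test_torch.py > collected_py310.txt
    # ...repeat the same command in the py3.11 environment...
    diff collected_py310.txt collected_py311.txt | head -50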

@h-vetinari
Member Author

OK, three extra test failures for the generic blas run (not even from the torchinductor module, so I have no idea why this triggers), but I'm skipping them now and merging this. The CMake stuff needed at least another round, and I want to get the first round of fixes out (esp. a working win+CUDA build with the right bin/lib split). The rest will hopefully follow soon after in #339.

@h-vetinari h-vetinari marked this pull request as ready for review February 1, 2025 10:56
h-vetinari added a commit that referenced this pull request Feb 1, 2025
@h-vetinari h-vetinari merged commit d04bba8 into conda-forge:main Feb 1, 2025
25 of 27 checks passed
@h-vetinari h-vetinari deleted the fix_gpu branch February 1, 2025 10:57
@h-vetinari
Member Author

Ah, the pleasure of flaky tests. After passing here, the MKL build failed on main with:

=========================== short test summary info ============================
FAILED [0.2937s] test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_reentrant_parent_error_on_cpu_cuda - AssertionError: "Simulate error" does not match "grad can be implicitly created only for scalar outputs"

To execute this test, run the following from the base repo dir:
    python test/test_autograd.py TestAutogradDeviceTypeCUDA.test_reentrant_parent_error_on_cpu_cuda

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
= 1 failed, 14514 passed, 2663 skipped, 91 xfailed, 143956 warnings in 4192.22s (1:09:52) =

Also, each test run now collected 13k+ tests and took ~50min. Very weird how that happens.
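If this flake keeps recurring, one option would be to add it to the existing -k deselection. A sketch shown in isolation (the real invocation would extend the long filter in the logs above):

    # sketch: deselect the flaky reentrant-autograd test; the bare name also
    # matches the device-suffixed variant test_reentrant_parent_error_on_cpu_cuda
    python -m pytest -n 2 test/test_autograd.py \
        -k 'not test_reentrant_parent_error_on_cpu' \
        -m 'not hypothesis' --durations=50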

@hmaarrfk
Contributor

hmaarrfk commented Feb 2, 2025

> Also, each test run now collected 13k+ tests and took ~50min. Very weird how that happens.

Does this mean that, with 3.9, 3.10, 3.11, 3.12 and 3.13, the tests take an extra ~250 mins to run on each platform?

@h-vetinari
Member Author

h-vetinari commented Feb 2, 2025

Well, it's supposed to take <10 min per version, plus one run (for 3.12 only) that includes torchinductor and takes a bit longer. And that is exactly what happened for the openblas build on main, for example, but it seems to change randomly. I opened an issue about this: #343

@hmaarrfk
Contributor

hmaarrfk commented Feb 2, 2025

Ah OK, so ~50 mins in total out of the ~10 hours of the build. Got it. I was going to ask how long it takes!

Thanks!!!

@h-vetinari
Member Author

The normal case should be about 1:10h (4×8 min for the runs without inductor + ~40 min for the py3.12 run with it), but when we hit #343, it can suddenly take much longer (worst case almost 5h).
