Add two flash-attn extensions as multi-outputs #19

rongou · 2024-10-17T23:23:02Z

Checklist

Used a personal fork of the feedstock to propose changes
Bumped the build number (if the version is unchanged)
Reset the build number to 0 (if the version changed)
Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
Ensured the license file is being packaged.

Fixes #18

rongou · 2024-10-17T23:23:41Z

This is not quite working yet, but want to make sure I'm on the right track.

cc @jakirkham

conda-forge-admin · 2024-10-17T23:24:50Z

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

weiji14

@rongou, we'll need to give you access to the Openstack server CI. Could you open a PR at https://github.com/Quansight/open-gpu-server/pulls to add your GitHub username to the access/conda-forge-users.json file? See also step 2 of https://conda-forge.org/docs/maintainer/knowledge_base/#packages-that-require-a-gpu-or-long-running-builds for more info.

recipe/meta.yaml

jakirkham

Thanks Rong! 🙏

Tried to put some rough initial thoughts together below. Hopefully that helps

Happy to discuss further as needed 🙂

recipe/meta.yaml

Co-authored-by: jakirkham <[email protected]> Co-authored-by: Wei Ji <[email protected]>

jakirkham · 2024-10-21T18:34:31Z

Please make sure to add this to the extra section at the bottom

extra:
  feedstock-name: flash-attn
  ...

Edit: To change _ to -. Please see this doc

conda-forge-admin · 2024-10-21T23:45:07Z

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found some lint.

Here's what I've got...

For recipe/meta.yaml:

The extra section contained an unexpected subsection name. feedstock_name is not a valid subsection name.

rongou · 2024-10-21T23:46:18Z

Ok, this is more or less structured as we've discussed, but I'm getting some errors when it tries to package the extensions, any ideas?

Packaging flash-attn-fused-dense-lib
number of files: 1
Warning: rpath /home/conda/feedstock_root/build_artifacts/flash-attn-split_1729553591356/_build_env/lib is outside prefix /home/conda/feedstock_root/build_artifacts/flash-attn-split_1729553591356/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac (removing it)
  ERROR (flash-attn-fused-dense-lib,lib/python3.12/site-packages/fused_dense_lib.cpython-312-x86_64-linux-gnu.so): $RPATH/libc10.so not found in packages, sysroot(s) nor the missing_dso_whitelist.
.. is this binary repackaging?
  ERROR (flash-attn-fused-dense-lib,lib/python3.12/site-packages/fused_dense_lib.cpython-312-x86_64-linux-gnu.so): $RPATH/libtorch_cpu.so not found in packages, sysroot(s) nor the missing_dso_whitelist.
.. is this binary repackaging?
  ERROR (flash-attn-fused-dense-lib,lib/python3.12/site-packages/fused_dense_lib.cpython-312-x86_64-linux-gnu.so): $RPATH/libtorch_python.so not found in packages, sysroot(s) nor the missing_dso_whitelist.
.. is this binary repackaging?
  ERROR (flash-attn-fused-dense-lib,lib/python3.12/site-packages/fused_dense_lib.cpython-312-x86_64-linux-gnu.so): $RPATH/libcudart.so.12 not found in packages, sysroot(s) nor the missing_dso_whitelist.
.. is this binary repackaging?
  ERROR (flash-attn-fused-dense-lib,lib/python3.12/site-packages/fused_dense_lib.cpython-312-x86_64-linux-gnu.so): $RPATH/libc10_cuda.so not found in packages, sysroot(s) nor the missing_dso_whitelist.
.. is this binary repackaging?
  ERROR (flash-attn-fused-dense-lib,lib/python3.12/site-packages/fused_dense_lib.cpython-312-x86_64-linux-gnu.so): $RPATH/libtorch_cuda.so not found in packages, sysroot(s) nor the missing_dso_whitelist.
.. is this binary repackaging?
  ERROR (flash-attn-fused-dense-lib,lib/python3.12/site-packages/fused_dense_lib.cpython-312-x86_64-linux-gnu.so): /lib64/libstdc++.so.6 not found in packages, sysroot(s) nor the missing_dso_whitelist.
.. is this binary repackaging?
  ERROR (flash-attn-fused-dense-lib,lib/python3.12/site-packages/fused_dense_lib.cpython-312-x86_64-linux-gnu.so): /lib64/libgcc_s.so.1 not found in packages, sysroot(s) nor the missing_dso_whitelist.
.. is this binary repackaging?
  ERROR (flash-attn-fused-dense-lib,lib/python3.12/site-packages/fused_dense_lib.cpython-312-x86_64-linux-gnu.so): /lib64/libc.so.6 not found in packages, sysroot(s) nor the missing_dso_whitelist.
.. is this binary repackaging?
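The errors above all follow one pattern, so the missing libraries can be pulled out of a build log mechanically. The following is a minimal sketch and not part of the feedstock. The function name `missing_dsos` and the regular expression are assumptions inferred only from the log lines quoted in this comment, not from a documented conda-build format.

```python
import re

# Pattern inferred from the quoted log lines (an assumption):
#   ERROR (<output>,<file>): <library> not found in packages, ...
ERROR_RE = re.compile(
    r"ERROR \((?P<output>[^,]+),(?P<file>[^)]+)\): (?P<lib>\S+) not found in packages"
)


def missing_dsos(log_text):
    """Return the sorted, de-duplicated library names reported as missing."""
    found = set()
    for match in ERROR_RE.finditer(log_text):
        # Drop the "$RPATH/" or "/lib64/" prefix and keep only the file name.
        found.add(match.group("lib").rsplit("/", 1)[-1])
    return sorted(found)


# Two lines copied from the log above, used as sample input.
sample = """\
  ERROR (flash-attn-fused-dense-lib,lib/python3.12/site-packages/fused_dense_lib.cpython-312-x86_64-linux-gnu.so): $RPATH/libc10.so not found in packages, sysroot(s) nor the missing_dso_whitelist.
  ERROR (flash-attn-fused-dense-lib,lib/python3.12/site-packages/fused_dense_lib.cpython-312-x86_64-linux-gnu.so): /lib64/libstdc++.so.6 not found in packages, sysroot(s) nor the missing_dso_whitelist.
"""
print(missing_dsos(sample))
```

Grouping the names this way makes it easier to see which of the output's declared requirements would have to provide each library.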

jakirkham · 2024-10-21T23:52:33Z

Thanks Rong! 🙏

Think there is more to do on this point: #19 (comment)

Would start by copying that into the flash-attn output

The other outputs likely need python in requirements/host and requirements/run

recipe/meta.yaml

conda-forge-admin · 2024-10-22T20:17:09Z

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

recipe/meta.yaml

rongou · 2024-10-22T22:11:12Z

For recipe/meta.yaml:

The extra section contained an unexpected subsection name. feedstock_name is not a valid subsection name.

The linter didn't like it.

jakirkham · 2024-10-22T22:42:29Z

Ah that's because it should be feedstock-name instead of feedstock_name 🤦‍♂️ Sorry about that 😞

ref: https://conda-forge.org/docs/maintainer/adding_pkgs/#feedstock-name

conda-forge-admin · 2024-10-23T17:05:14Z

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.
I do have some suggestions for making it better though...

For recipe/meta.yaml:

libgcc-ng has been superseded by libgcc. Note however, that except in truly exceptional cases, you should not have to add this manually; you can rely on the fact that {{ compiler("c") }} and {{ compiler("cxx") }} will always create the correct run-export for this. If you need to ignore the run-export for whatever reason, the best way to do it is:
```
build:
  ignore_run_exports_from:
    - {{ compiler("c") }}    # depending on which...
    - {{ compiler("cxx") }}  # ... compilers you use
```
libstdcxx-ng has been superseded by libstdcxx. Note however, that except in truly exceptional cases, you should not have to add this manually; you can rely on the fact that {{ compiler("cxx") }} will always create the correct run-export for this. If you need to ignore the run-export for whatever reason, the best way to do it is:
```
build:
  ignore_run_exports_from:
    - {{ compiler("cxx") }}
```

conda-forge-admin · 2024-10-23T17:12:13Z

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

rongou · 2024-10-23T17:14:40Z

@weiji14 @carterbox @jakirkham I think this is ready. How do I get Cirun to kick off? I've already added myself to open-gpu-server: Quansight/open-gpu-server#46

jakirkham · 2024-11-04T21:11:04Z

recipe/meta.yaml

     - cuda-cudart-dev  # [(cuda_compiler_version or "").startswith("12")]
     - libcublas-dev    # [(cuda_compiler_version or "").startswith("12")]
+    - libcurand-dev    # [(cuda_compiler_version or "").startswith("12")]
     - libcusolver-dev  # [(cuda_compiler_version or "").startswith("12")]
     - libcusparse-dev  # [(cuda_compiler_version or "").startswith("12")]


Note that all of these CUDA packages were already here to satisfy PyTorch's header requirements. The only new one is libcurand-dev. Perhaps it is needed because some of the new extensions use other bits of PyTorch that were not used before.
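The selector on these requirement lines is an ordinary Python expression, so its behaviour can be checked outside of conda-build. The following is a minimal sketch. The function name `cuda12_selector` and the sample values are assumptions made for illustration; only the expression itself comes from the recipe.

```python
def cuda12_selector(cuda_compiler_version):
    """Mirror the selector expression used on the requirement lines."""
    # `or ""` keeps the expression from failing when the value is None.
    return (cuda_compiler_version or "").startswith("12")


# Sample variant values (assumed): a CUDA 12 build, a CUDA 11 build,
# the string "None", and a real None.
for value in ("12.0", "11.8", "None", None):
    print(repr(value), cuda12_selector(value))
```

Only a version string beginning with "12" keeps the requirement, so the `-dev` packages listed here would be skipped for the other sample values.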

carterbox · 2024-11-04T22:11:28Z

recipe/setup.py

+            },
+            extra_link_args = ["-Wl,--strip-all"],


Suggested change

-            },
-            extra_link_args = ["-Wl,--strip-all"],
+            },
+            libraries=[
+                'cublas',
+                'cublasLt',
+            ],
+            extra_link_args = ["-Wl,--strip-all"],

https://github.com/Dao-AILab/flash-attention/blob/478ee666cccbd1b8f63648633003059a8dc6827d/csrc/fused_dense_lib/fused_dense_cuda.cu#L11

jakirkham · 2024-11-05

Also needs cuRAND

Suggested change

-            },
-            extra_link_args = ["-Wl,--strip-all"],
+            },
+            libraries=[
+                'cublas',
+                'cublasLt',
+                'curand',
+            ],
+            extra_link_args = ["-Wl,--strip-all"],

rongou · 2024-11-05

Looks like the original setup.py doesn't include these libraries?

https://github.com/Dao-AILab/flash-attention/blob/main/csrc/layer_norm/setup.py

jakirkham · 2024-11-05

Right, Daniel is stating they should be added based on usage found internally, which also makes sense to me

We can also propose they include this change upstream

rongou · 2024-11-05

Don't these libraries get resolved by libcudart?

jakirkham · 2024-11-05

Sorry, not following

The #includes Daniel and I reference come from cuBLAS and cuRAND, meaning the symbols used also come from those libraries

Likely we have gotten lucky, as import torch causes the loader to find these libraries first and thus satisfy the symbols by the time these extensions use them. However, we shouldn't rely on this for at least three reasons:

Loading order could change

If PyTorch changes its dependencies, we won't get them

These packages need to express their version constraint on these libraries so they are correctly satisfied at install time

rongou · 2024-11-05

Hmm, looks like these libraries are explicitly loaded by pytorch, e.g. https://github.com/pytorch/pytorch/blob/6734cb7bf2c1763118dcc430cee6110a88f8f849/torch/__init__.py#L313. Since these packages are all pytorch CUDAExtensions, perhaps they should rely on pytorch to load them.

carterbox · 2024-11-11

Surprisingly, the linker seems to think that none of the curand symbols need to be loaded dynamically. Perhaps this package only uses header code that can be inlined? I have added the cublas, cudart, and python libraries to the linking as needed.

carterbox · 2024-11-07T21:00:37Z

I'm currently building with -Wl,--no-undefined to check which libraries are missing links for symbols.
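As a rough illustration of this check, the link-related keyword arguments for one extension could be assembled as below. This is a sketch under stated assumptions and not the feedstock's actual setup.py: the helper name `extension_link_options` and its `strict` parameter are hypothetical, while `libraries`, `extra_link_args`, and both linker flags are the ones mentioned in this thread.

```python
def extension_link_options(cuda_libraries, strict=True):
    """Build the link-related keyword arguments for one extension."""
    link_args = ["-Wl,--strip-all"]  # flag already present in recipe/setup.py
    if strict:
        # Ask the linker to fail on symbols that no listed library provides.
        link_args.append("-Wl,--no-undefined")
    return {"libraries": list(cuda_libraries), "extra_link_args": link_args}


# Hypothetical usage: these would be passed as **options to the extension
# constructor in setup.py.
options = extension_link_options(["cublas", "cublasLt"])
print(options)
```

With the strict flag on, a library missing from `libraries` should surface as a link error at build time instead of being resolved by whatever happens to be loaded first at import time.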

carterbox · 2024-11-11T17:24:23Z

Waiting on conda-forge/admin-requests#1158

The CUDA 12 builds complete, but the CUDA 11 builds need more time.

carterbox · 2024-11-11T17:31:16Z

Testing to see if 18 hours is enough for CUDA 11 builds.

conda-forge-admin · 2024-11-18T10:51:37Z

Hi! This is the friendly conda-forge automerge bot!

I considered the following status checks when analyzing this PR:

linter: passed
github-actions: passed

Thus the PR was passing and merged! Have a great day!

jakirkham · 2024-11-19T17:27:26Z

Woohoo! 🥳

Thanks everyone! 🙏

Glad to see this one in 😄

Add two flash-attn extensions as multi-outputs

253211c

rongou requested review from carterbox and weiji14 as code owners October 17, 2024 23:23

rongou marked this pull request as draft October 17, 2024 23:23

weiji14 reviewed Oct 18, 2024

jakirkham reviewed Oct 19, 2024

Apply suggestions from code review

96eea86

Co-authored-by: jakirkham <[email protected]> Co-authored-by: Wei Ji <[email protected]>

rongou added 3 commits October 21, 2024 11:40

easier debugging

2b81b6f

build once and split outputs

c369353

MNT: Re-rendered with conda-build 24.9.0, conda-smithy 3.43.0, and conda-forge-pinning 2024.10.21.14.45.36

7186c4c

carterbox reviewed Oct 22, 2024

fix output requirements

aa4487d

rongou marked this pull request as ready for review October 22, 2024 20:19

jakirkham reviewed Oct 22, 2024

rongou added 3 commits October 22, 2024 16:21

Remove unneeded output requirements

a3ff73d

add back feedstock name to extra

5a5f9aa

further clean up output requirements

943fe53

make linter happy

ad5c634

jakirkham reviewed Nov 4, 2024

carterbox reviewed Nov 4, 2024

weiji14 mentioned this pull request Nov 6, 2024

Rebuild for pytorch 2.5, add python 3.13, update to flash-attn 2.7.0.post2 #20

Merged

carterbox added 3 commits November 7, 2024 14:32

BLD: Disallow undefined symbols

26421e4

BLD: Disallow undefined symbols

f1c1351

STY: Format setup.py

e5cc560

carterbox added 3 commits November 7, 2024 15:01

BLD: Rename fused-dense output

0789f14

BLD: Match host deps with library links

0d23df9

CI: Increase timeout to 12 hours

66a46f7

carterbox added 2 commits November 11, 2024 11:27

CI: Bump timeout to 18 hours

16d13d1

MNT: Re-rendered with conda-build 24.9.0, conda-smithy 3.44.3, and conda-forge-pinning 2024.11.11.08.59.26

7190d0b

carterbox force-pushed the multi-output-extensions branch from 4f13ea7 to 576a572 Compare November 13, 2024 20:00

weiji14 mentioned this pull request Nov 14, 2024

flash-attn v2.7.0.post2 #22

Closed

3 tasks

BLD: Need cudatoolkit for CUDA 11

210cead

carterbox force-pushed the multi-output-extensions branch from 576a572 to 210cead Compare November 15, 2024 19:00

carterbox closed this Nov 15, 2024

carterbox reopened this Nov 15, 2024

BLD: Bump build again just to add a commit

5bcbf37

carterbox mentioned this pull request Nov 15, 2024

Cirun failing with "JSON not found in the url at 'users_from_json'" Quansight/open-gpu-server#49

Closed

carterbox closed this Nov 15, 2024

carterbox reopened this Nov 15, 2024

carterbox added 2 commits November 17, 2024 18:41

BLD: Disable debug skips

48a7800

MNT: Re-rendered with conda-build 24.9.0, conda-smithy 3.44.6, and conda-forge-pinning 2024.11.17.06.32.00

4e225cc

carterbox added the automerge Merge the PR when CI passes label Nov 18, 2024

conda-forge-admin merged commit efabe86 into conda-forge:main Nov 18, 2024
10 checks passed
