Skip to content

Add two flash-attn extensions as multi-outputs #19

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 31 commits into from
Nov 18, 2024

Conversation

rongou
Copy link
Contributor

@rongou rongou commented Oct 17, 2024

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

Fixes #18

@rongou rongou marked this pull request as draft October 17, 2024 23:23
@rongou
Copy link
Contributor Author

rongou commented Oct 17, 2024

This is not quite working yet, but want to make sure I'm on the right track.

cc @jakirkham

@conda-forge-admin
Copy link
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

Copy link
Member

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rongou, we'll need to give you access to the Openstack server CI. Could you open a PR at https://github.com/Quansight/open-gpu-server/pulls to add your GitHub username to the access/conda-forge-users.json file? See also step 2 of https://conda-forge.org/docs/maintainer/knowledge_base/#packages-that-require-a-gpu-or-long-running-builds for more info.

Copy link
Member

@jakirkham jakirkham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Rong! 🙏

Tried to put some rough initial thoughts together below. Hopefully that helps

Happy to discuss further as needed 🙂

Co-authored-by: jakirkham <[email protected]>
Co-authored-by: Wei Ji <[email protected]>
@jakirkham
Copy link
Member

jakirkham commented Oct 21, 2024

Please make sure to add this to the extra section at the bottom

extra:
  feedstock-name: flash-attn
  ...

Edit: To change _ to -. Please see this doc

@conda-forge-admin
Copy link
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found some lint.

Here's what I've got...

For recipe/meta.yaml:

  • The extra section contained an unexpected subsection name. feedstock_name is not a valid subsection name.

@rongou
Copy link
Contributor Author

rongou commented Oct 21, 2024

Ok, this is more or less structured as we've discussed, but I'm getting some errors when it tries to package the extensions, any ideas?

Packaging flash-attn-fused-dense-lib
number of files: 1
Warning: rpath /home/conda/feedstock_root/build_artifacts/flash-attn-split_1729553591356/_build_env/lib is outside prefix /home/conda/feedstock_root/build_artifacts/flash-attn-split_1729553591356/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac (removing it)
  ERROR (flash-attn-fused-dense-lib,lib/python3.12/site-packages/fused_dense_lib.cpython-312-x86_64-linux-gnu.so): $RPATH/libc10.so not found in packages, sysroot(s) nor the missing_dso_whitelist.
.. is this binary repackaging?
  ERROR (flash-attn-fused-dense-lib,lib/python3.12/site-packages/fused_dense_lib.cpython-312-x86_64-linux-gnu.so): $RPATH/libtorch_cpu.so not found in packages, sysroot(s) nor the missing_dso_whitelist.
.. is this binary repackaging?
  ERROR (flash-attn-fused-dense-lib,lib/python3.12/site-packages/fused_dense_lib.cpython-312-x86_64-linux-gnu.so): $RPATH/libtorch_python.so not found in packages, sysroot(s) nor the missing_dso_whitelist.
.. is this binary repackaging?
  ERROR (flash-attn-fused-dense-lib,lib/python3.12/site-packages/fused_dense_lib.cpython-312-x86_64-linux-gnu.so): $RPATH/libcudart.so.12 not found in packages, sysroot(s) nor the missing_dso_whitelist.
.. is this binary repackaging?
  ERROR (flash-attn-fused-dense-lib,lib/python3.12/site-packages/fused_dense_lib.cpython-312-x86_64-linux-gnu.so): $RPATH/libc10_cuda.so not found in packages, sysroot(s) nor the missing_dso_whitelist.
.. is this binary repackaging?
  ERROR (flash-attn-fused-dense-lib,lib/python3.12/site-packages/fused_dense_lib.cpython-312-x86_64-linux-gnu.so): $RPATH/libtorch_cuda.so not found in packages, sysroot(s) nor the missing_dso_whitelist.
.. is this binary repackaging?
  ERROR (flash-attn-fused-dense-lib,lib/python3.12/site-packages/fused_dense_lib.cpython-312-x86_64-linux-gnu.so): /lib64/libstdc++.so.6 not found in packages, sysroot(s) nor the missing_dso_whitelist.
.. is this binary repackaging?
  ERROR (flash-attn-fused-dense-lib,lib/python3.12/site-packages/fused_dense_lib.cpython-312-x86_64-linux-gnu.so): /lib64/libgcc_s.so.1 not found in packages, sysroot(s) nor the missing_dso_whitelist.
.. is this binary repackaging?
  ERROR (flash-attn-fused-dense-lib,lib/python3.12/site-packages/fused_dense_lib.cpython-312-x86_64-linux-gnu.so): /lib64/libc.so.6 not found in packages, sysroot(s) nor the missing_dso_whitelist.
.. is this binary repackaging?

@jakirkham
Copy link
Member

Thanks Rong! 🙏

Think there is more to do on this point: #19 (comment)

Would start by copying that into the flash-attn output

The other outputs like need python in requirements/host and requirements/run

@conda-forge-admin
Copy link
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

@rongou rongou marked this pull request as ready for review October 22, 2024 20:19
@rongou
Copy link
Contributor Author

rongou commented Oct 22, 2024

For recipe/meta.yaml:

  • The extra section contained an unexpected subsection name. feedstock_name is not a valid subsection name.

The linter didn't like it.

@jakirkham
Copy link
Member

Ah that's because it should be feedstock-name instead of feedstock_name 🤦‍♂️ Sorry about that 😞

ref: https://conda-forge.org/docs/maintainer/adding_pkgs/#feedstock-name

@conda-forge-admin
Copy link
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.
I do have some suggestions for making it better though...

For recipe/meta.yaml:

  • libgcc-ng has been superseded by libgcc. Note however, that except in truly exceptional cases, you should not have to add this manually; you can rely on the fact that {{ compiler("c") }} and {{ compiler("cxx") }} will always create the correct run-export for this. If you need to ignore the run-export for whatever reason, the best way to do it is:
    build:
      ignore_run_exports_from:
        - {{ compiler("c") }}    # depending on which...
        - {{ compiler("cxx") }}  # ... compilers you use
  • libstdcxx-ng has been superseded by libstdcxx. Note however, that except in truly exceptional cases, you should not have to add this manually; you can rely on the fact that {{ compiler("cxx") }} will always create the correct run-export for this. If you need to ignore the run-export for whatever reason, the best way to do it is:
    build:
      ignore_run_exports_from:
        - {{ compiler("cxx") }}

@conda-forge-admin
Copy link
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

@rongou
Copy link
Contributor Author

rongou commented Oct 23, 2024

@weiji14 @carterbox @jakirkham I think this is ready. How do I get Cirun to kick off? I've already added myself to open-gpu-server: Quansight/open-gpu-server#46

Comment on lines 37 to 41
- cuda-cudart-dev # [(cuda_compiler_version or "").startswith("12")]
- libcublas-dev # [(cuda_compiler_version or "").startswith("12")]
- libcurand-dev # [(cuda_compiler_version or "").startswith("12")]
- libcusolver-dev # [(cuda_compiler_version or "").startswith("12")]
- libcusparse-dev # [(cuda_compiler_version or "").startswith("12")]
Copy link
Member

@jakirkham jakirkham Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that all of these CUDA packages were here before to satisfy PyTorch's header requirements. The only new one is libcurand-dev. Perhaps this comes up as some of the new extensions use other bits from PyTorch that were not used before

recipe/setup.py Outdated
Comment on lines 229 to 230
},
extra_link_args = ["-Wl,--strip-all"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
},
extra_link_args = ["-Wl,--strip-all"],
},
libraries=[
'cublas',
'cublasLt',
],
extra_link_args = ["-Wl,--strip-all"],

https://github.com/Dao-AILab/flash-attention/blob/478ee666cccbd1b8f63648633003059a8dc6827d/csrc/fused_dense_lib/fused_dense_cuda.cu#L11

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also needs cuRAND

Suggested change
},
extra_link_args = ["-Wl,--strip-all"],
},
libraries=[
'cublas',
'cublasLt',
'curand',
],
extra_link_args = ["-Wl,--strip-all"],

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the original setup.py doesn't include these libraries?

https://github.com/Dao-AILab/flash-attention/blob/main/csrc/layer_norm/setup.py

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right Daniel is stating they should be based on usage found internally, which also makes sense to me

We can also propose they include this change upstream

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't these libraries get resolved by libcudart?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry not following

The #includes Daniel and I reference come from cuBLAS and cuRAND. Meaning the symbols used also come from those libraries

Likely we have gotten lucky as import torch causes the loader to find these libraries first and thus satisfy the symbols by the time these extensions use them. However we shouldn't rely on this for at least three reasons:

  1. Loading order could change
  2. If PyTorch changes its dependencies, we won't get them
  3. These packages need to express their version constraint on these libraries so they are correctly satisfied at install time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm looks like these libraries are explicitly loaded by pytorch, e.g. https://github.com/pytorch/pytorch/blob/6734cb7bf2c1763118dcc430cee6110a88f8f849/torch/__init__.py#L313, since these packages are all pytorch CUDAExtensions, perhaps they should rely on pytorch to load them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surprisingly, the linker seems to think that none of the curand symbols are needed to be loaded dynamically. Perhaps, this package uses header stuff that can be inlined? I have added the cublas, cudart, and python libraries to the linking as needed.

@carterbox
Copy link
Member

I'm currently building with -Wl,--no-undefined to check which libraries are missing links for symbols.

@carterbox
Copy link
Member

Waiting on conda-forge/admin-requests#1158

The CUDA 12 builds complete, but the CUDA 11 builds need more time.

@carterbox
Copy link
Member

Testing to see if 18 hours is enought for CUDA 11 builds.

@carterbox carterbox force-pushed the multi-output-extensions branch from 4f13ea7 to 576a572 Compare November 13, 2024 20:00
@weiji14 weiji14 mentioned this pull request Nov 14, 2024
3 tasks
@carterbox carterbox force-pushed the multi-output-extensions branch from 576a572 to 210cead Compare November 15, 2024 19:00
@carterbox carterbox closed this Nov 15, 2024
@carterbox carterbox reopened this Nov 15, 2024
@carterbox carterbox added the automerge Merge the PR when CI passes label Nov 18, 2024
@conda-forge-admin conda-forge-admin merged commit efabe86 into conda-forge:main Nov 18, 2024
10 checks passed
@conda-forge-admin
Copy link
Contributor

Hi! This is the friendly conda-forge automerge bot!

I considered the following status checks when analyzing this PR:

  • linter: passed
  • github-actions: passed

Thus the PR was passing and merged! Have a great day!

@jakirkham
Copy link
Member

Woohoo! 🥳

Thanks everyone! 🙏

Glad to see this one in 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automerge Merge the PR when CI passes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature Request: Add multiple outputs for fused_dense_lib and layer_norm
5 participants