
Add attention_backend to let user choose #1456

Merged
merged 2 commits into pytorch:main from yanbing/fix_1452 on Jan 22, 2025

Conversation

yanbing-j (Contributor)

This PR is submitted for #1452: it adds an argument, attention_backend, that lets the user choose the backend used in SDPA (scaled dot-product attention).

For CPU, only the math and flash_attention backends are valid. For CUDA, all four backends are valid.
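(For context, a minimal sketch of how such a flag maps onto PyTorch's backend selection; `sdpa_kernel` and `SDPBackend` are real `torch.nn.attention` APIs, while the chosen backend and tensor shapes here are illustrative.)

```python
# Minimal sketch: forcing a specific SDPA backend with PyTorch's
# public API. Backend choice and tensor shapes are illustrative.
import torch
from torch.nn.attention import SDPBackend, sdpa_kernel

backend = SDPBackend.FLASH_ATTENTION  # e.g. from --attention-backend

q = torch.randn(1, 8, 16, 64)  # (batch, heads, seq_len, head_dim)
k = torch.randn(1, 8, 16, 64)
v = torch.randn(1, 8, 16, 64)

# Only the listed backend is eligible inside this context manager;
# SDPA raises an error if it cannot satisfy the constraint.
with sdpa_kernel([backend]):
    out = torch.nn.functional.scaled_dot_product_attention(q, k, v)
```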


pytorch-bot bot commented Jan 13, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1456

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Unrelated Failures

As of commit a9dfc0e with merge base 2fc98f7:

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on Jan 13, 2025.

lucylq (Contributor) left a comment

Thanks for making these changes! This looks good to me.

cc @Jack-Khuu

lucylq requested a review from Jack-Khuu on January 13, 2025 at 22:39.
Jack-Khuu (Contributor)

Looks like the ET (ExecuTorch) pin needs bumping, so I'll spin that up.

Comment on lines 1182 to 1185:

```python
if self.builder_args.device == "cpu" and (
    self.builder_args.attention_backend == "efficient_attention"
    or self.builder_args.attention_backend == "cudnn_attention"
):
    print(f"Warning: {self.builder_args.attention_backend} is not supported on CPU. Using math instead.")
    self.builder_args.attention_backend = "math"
```
Contributor:
Can we bump this into the constructor of BuilderArgs?
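(For illustration, a minimal sketch of what that could look like; the field set is trimmed, since the actual BuilderArgs dataclass has many more fields.)

```python
# Hypothetical sketch: moving the CPU fallback into BuilderArgs
# itself via dataclass __post_init__; fields trimmed for illustration.
from dataclasses import dataclass


@dataclass
class BuilderArgs:
    device: str = "cpu"
    attention_backend: str = "math"

    def __post_init__(self):
        # These two backends are CUDA-only; fall back to math on CPU.
        cpu_unsupported = {"efficient_attention", "cudnn_attention"}
        if self.device == "cpu" and self.attention_backend in cpu_unsupported:
            print(
                f"Warning: {self.attention_backend} is not supported on CPU. "
                "Using math instead."
            )
            self.attention_backend = "math"
```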

```diff
@@ -69,6 +69,7 @@ class BuilderArgs:
     prefill_possible: bool = False
     dynamic_shapes: bool = False
     max_seq_length: Optional[int] = None
+    attention_backend: str = "math"
```
Contributor:
Thanks for adding the change!! Mind moving the cast to the actual kernel into the body here?

That way we can do any sanity checks early and we aren't passing a raw string around.
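(A sketch of what resolving the string early might look like; the function name and its placement are assumptions, while the enum values are real `torch.nn.attention.SDPBackend` members.)

```python
# Hypothetical sketch: convert the CLI string to a torch SDPBackend
# enum as soon as arguments are parsed, so bad names fail fast.
from torch.nn.attention import SDPBackend

_SDP_BACKENDS = {
    "math": SDPBackend.MATH,
    "flash_attention": SDPBackend.FLASH_ATTENTION,
    "efficient_attention": SDPBackend.EFFICIENT_ATTENTION,
    "cudnn_attention": SDPBackend.CUDNN_ATTENTION,
}


def resolve_attention_backend(name: str) -> SDPBackend:
    if name not in _SDP_BACKENDS:
        raise ValueError(
            f"Unknown attention backend {name!r}; "
            f"expected one of {sorted(_SDP_BACKENDS)}"
        )
    return _SDP_BACKENDS[name]
```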

yanbing-j (Contributor Author):

I've updated it. Please review again!

Jack-Khuu (Contributor)

@angelayi I recall you mentioning that there might be an export issue with the other SDPA backends?

Do you know if this issue was addressed? Or should we gate those backends to non-export?

Jack-Khuu (Contributor)

Seems like y'all had a chance to explore the export issue in the tagged issue.

Let's add a check in export.py that flags when someone tries to use a backend other than SDPBackend.MATH, to avoid users accidentally stumbling into it.
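(A minimal sketch of such a guard, assuming the backend has already been resolved to an enum; the function name and its exact placement in export.py are assumptions.)

```python
# Hypothetical sketch of the export-time guard suggested above.
from torch.nn.attention import SDPBackend


def _check_export_attention_backend(backend: SDPBackend) -> None:
    # Export currently only exercises the math backend reliably;
    # fail loudly instead of letting users stumble into it.
    if backend != SDPBackend.MATH:
        raise RuntimeError(
            f"attention_backend={backend} is not supported for export; "
            "use the default 'math' backend instead."
        )
```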

Comment on lines 539 to 544:

```python
sdp_backend_dict = {
    'math': torch.nn.attention.SDPBackend.MATH,
    'flash_attention': torch.nn.attention.SDPBackend.FLASH_ATTENTION,
    'efficient_attention': torch.nn.attention.SDPBackend.EFFICIENT_ATTENTION,
    'cudnn_attention': torch.nn.attention.SDPBackend.CUDNN_ATTENTION,
}
```
Contributor:
Can we push this into builder.py as well?

yanbing-j (Contributor Author):
Done.

Commit: "bump this into the constructor of BuilderArgs"
mikekgfb (Contributor) commented Jan 18, 2025

> @angelayi I recall you mentioning that there might be an export issue with the other SDPA backends?
>
> Do you know if this issue was addressed? Or should we gate those backends to non-export?

FLASH seemed to work for AOTI. Maybe we can at least allow a subset, and then methodically put them all through a new test pattern -- which we should do anyway, to test the new options for generate as well. (I suggest a separate PR, not piggybacking here...)

```bash
for backend in math flash cudnn efficient
do
  # cuda
  py3 tc generate stories15M --attention-backend $backend --prompt "hello there"
  py3 tc generate stories15M --attention-backend $backend --prompt "hello there" --compile --compile-prefill
  py3 tc export stories15M --attention-backend $backend --output-dso s.dso
  py3 tc generate stories15M --attention-backend $backend --prompt "hello there" -dso s.dso
  # cpu
  py3 tc generate stories15M --attention-backend $backend --prompt "hello there" -device cpu
  py3 tc generate stories15M --attention-backend $backend --prompt "hello there" --compile --compile-prefill -device cpu
  py3 tc export stories15M --attention-backend $backend --output-dso s.dso -device cpu
  py3 tc generate stories15M --attention-backend $backend --prompt "hello there" -dso s.dso -device cpu
done
```

Jack-Khuu merged commit 45cd239 into pytorch:main on Jan 22, 2025 (54 of 57 checks passed).
Jack-Khuu (Contributor)

Thanks again @yanbing-j, merging this in

We can work on creating a support matrix separately

yanbing-j deleted the yanbing/fix_1452 branch on January 22, 2025.
vmpuri pushed a commit referencing this pull request on Feb 4, 2025: "bump this into the constructor of BuilderArgs" (Co-authored-by: Jack-Khuu <[email protected]>).