Commit e8c5489
ggml-webgpu: FlashAttention refactor + standardize quantization support (#23834)
* Start work on flash_attn refactor
* Refactor
* Split k/v quantization
* Refactor and abstract quantization logic for flash_attn and mul_mat
* Add quantization support to tile path
* formatting
* Move to functions, add a check

1 parent 3c7450c commit e8c5489
11 files changed
Lines changed: 986 additions & 950 deletions
| Original file lines | Diff lines | Diff line change |
|---|---|---|
| 10-17 | 10-20 | 2 lines removed (original 13-14), 5 lines added (diff 13-17) |
Large diffs are not rendered by default.
| Original file lines | Diff lines | Diff line change |
|---|---|---|
| 37-51 | 37-69 | 3 lines removed (original 40-42), 3 lines added (diff 40-42), 18 lines added (diff 49-66) |
| 595-613 | 613-643 | 1 line removed (original 598), 13 lines added (diff 616-628); original lines 603, 607, and 610 each removed, with one line added in each place (diff 633, 637, 640) |