Commit 4452186
feat: add flash-attn to reduce VRAM usage and speed up inference
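Without flash-attention, eager attention materializes an N×N score
matrix per head in every layer, so activation memory grows as O(N²)
in sequence length N. On high-res PDF pages this needs 7+ GB just for
activations, exceeding the MPS memory limit. Flash-attention computes
attention in tiles without materializing that matrix, reducing the
memory to O(N).
1 parent 89cf547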
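The memory argument above can be illustrated with a minimal, pure-Python sketch of the online-softmax tiling idea that flash-attention builds on. This is not the flash-attn kernel or its API: the function names (`naive_attention`, `tiled_attention`, `eager_score_bytes`), the single-query formulation and the `block_size` parameter are illustrative assumptions. The eager version builds a full row of N scores for each query (N×N over the whole sequence), while the tiled version holds at most `block_size` scores plus a running max and normalizer, yet returns the same output.

```python
import math
import random


def eager_score_bytes(n_tokens, n_heads, bytes_per_elem=2):
    """Bytes for one layer's full attention-score tensor (n_heads x N x N)."""
    return n_heads * n_tokens * n_tokens * bytes_per_elem


def naive_attention(q, K, V):
    """Eager attention for one query: builds the full length-N score row first."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, V)) / total
            for j in range(len(V[0]))]


def tiled_attention(q, K, V, block_size=4):
    """Online-softmax attention for one query (illustrative sketch).

    Keys/values are visited one block at a time. Only a running max, a running
    normalizer and a running weighted sum of V rows are kept, so at most
    block_size scores exist at once instead of a full row of N scores.
    """
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * len(V[0])
    for start in range(0, len(K), block_size):
        k_blk = K[start:start + block_size]
        v_blk = V[start:start + block_size]
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated under the old max;
        # on the first block exp(-inf) == 0.0, so nothing carries over.
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


def random_matrix(rows, cols):
    """Hypothetical test data: rows x cols standard-normal values."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    random.seed(0)
    n, d = 10, 4
    Q, K, V = random_matrix(n, d), random_matrix(n, d), random_matrix(n, d)
    diff = max(abs(x - y)
               for q in Q
               for x, y in zip(naive_attention(q, K, V),
                               tiled_attention(q, K, V, block_size=3)))
    print("max abs difference:", diff)
```

For a sense of scale, with purely illustrative values of 8,192 tokens, 16 heads and 2-byte (fp16) elements, `eager_score_bytes(8192, 16)` gives 2^31 bytes, i.e. 2 GiB for a single layer's score tensor; these are not the model's actual dimensions.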
1 file changed
Lines changed: 3 additions & 1 deletion
| Original line | Diff line | Change |
|---|---|---|
| 9 | 9 | |
| 10 | 10 | |
| 11 | 11 | |
| | 12 | + |
| | 13 | + |
| 12 | 14 | |
| 13 | | - |
| | 15 | + |
| 14 | 16 | |
| 15 | 17 | |
| 16 | 18 | |