chore: try to use kernels lib to get flash attention kernel (#30)
To ease some dependencies and open up more paths for easier
benchmarking, this lets us specify flash attention
implementations/backends on the main config objects for both training
and generation.
There are also some typing-related changes here that affect the artifact
structure's internals. I'm going to do a follow-up to fix _a lot_ of
typing issues throughout the repo soon; I have a draft in progress.
---------
Signed-off-by: Aaron Gonzales <aagonzales@nvidia.com>
Signed-off-by: aagonzales <aagonzales@nvidia.com>
|`eager`| Standard PyTorch attention | None (built-in) |
If the default `kernels-community/vllm-flash-attn3` is configured but the `kernels` package is not installed, the backend automatically falls back to `sdpa`.
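The fallback rule above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual implementation: the function names `kernels_installed` and `resolve_attn_implementation` and the `has_kernels` override are hypothetical, and only the default kernel name, the `kernels` package name, and the `sdpa` fallback come from the text.

```python
from importlib.util import find_spec
from typing import Optional

# Values taken from the documentation above.
DEFAULT_KERNEL = "kernels-community/vllm-flash-attn3"
FALLBACK = "sdpa"


def kernels_installed() -> bool:
    """Return True if the `kernels` package can be found on the import path."""
    return find_spec("kernels") is not None


def resolve_attn_implementation(
    configured: str = DEFAULT_KERNEL,
    has_kernels: Optional[bool] = None,  # hypothetical override, handy for testing
) -> str:
    """Apply the documented rule: default kernel + missing `kernels` -> `sdpa`."""
    if has_kernels is None:
        has_kernels = kernels_installed()
    if configured == DEFAULT_KERNEL and not has_kernels:
        return FALLBACK
    # Any other configured value is returned unchanged in this sketch.
    return configured
```

In this sketch an explicitly configured value such as `eager` is never rewritten; only the default kernel triggers the fallback, which matches the sentence above but may differ from how the project handles other kernel names.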
### Generation (`attention_backend`)
Controls the vLLM attention backend used during synthetic data generation. Defaults to `"auto"`, which lets vLLM auto-select the best available backend.
```yaml
# config.yaml
generation:
  attention_backend: "FLASH_ATTN"
```
Common values: `FLASHINFER`, `FLASH_ATTN`, `TORCH_SDPA`, `TRITON_ATTN`, `FLEX_ATTENTION`.
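As a rough sketch of how the setting resolves, the snippet below reads `attention_backend` from an already-parsed config mapping and applies the documented `"auto"` default. The helper names `get_attention_backend` and `is_common_backend` are hypothetical, and the project's real config objects may behave differently. Because the list above is described as common values rather than an exhaustive set, the sketch only reports membership and does not reject other values.

```python
from typing import Any, Mapping

DEFAULT_BACKEND = "auto"  # documented default: vLLM picks the backend
COMMON_BACKENDS = (
    "FLASHINFER",
    "FLASH_ATTN",
    "TORCH_SDPA",
    "TRITON_ATTN",
    "FLEX_ATTENTION",
)


def get_attention_backend(config: Mapping[str, Any]) -> str:
    """Return generation.attention_backend, or "auto" when it is not set."""
    generation = config.get("generation") or {}
    return generation.get("attention_backend", DEFAULT_BACKEND)


def is_common_backend(value: str) -> bool:
    """Check a value against the common values listed above (not exhaustive)."""
    return value in COMMON_BACKENDS


# Equivalent of the YAML example above, after parsing.
config = {"generation": {"attention_backend": "FLASH_ATTN"}}
backend = get_attention_backend(config)
```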
## Artifacts and Workdirs
Safe Synthesizer uses a structured directory format to manage artifacts (trained models, synthetic data, logs).