I was going through our prototype folder and wanted to give my take on what should be promoted, deleted, or requires further discussion:
- spinquant, awq, autoround, hqq: All of these are algorithm implementations. If we are convinced they are correct and pass reference checks relative to the original repos, we should promote them out of prototype. In particular, the benefits we should lean into are accelerated performance using torch.compile and serialization support with the HF hub @jerryzh168
- DORA: This technique didn't end up getting picked up as much as the low-bit optimizers. The idea of low rank as a form of compression is interesting, but I would lean toward deleting this, although I could be convinced that having some latent-space optimizers is relevant considering MLA is so hot right now. Delete DORA #1815
- Profiler: This never worked with torch.compile, so it has limited utility for us unless @jeromeku can fix it; if not, we can delete it
- float8nocompile: This doesn't feel like it should be a prototype feature and I'd like to hear some detail on the promotion plan @danielvegamyhre
- common: This folder probably shouldn't exist
- quantized_training: This should remain in prototype because it primarily targets older or consumer GPUs; concretely, as our focus moves to Blackwell there won't be much of a difference between dtypes for inference vs training. That said, stochastic rounding should be promoted as a utility
- low_bit_optim: This is great work, we should promote it out of prototype
- Split_k: This just solves a very narrow problem, so it should be deleted in favor of using Inductor's matmul templates. It was meant more as an example of how to ship Triton kernels with ao, which isn't hard anyway because it's all JIT. Remove split_k kernel #1816
- mx_formats: With Blackwell out, this should be promoted out of prototype
- Sparsity: 2:4 sparsity for inference should be moved out of prototype; it's likely going to continue being relevant, especially for future Flash Attention implementations
- Kernel: This is a kernel autotuner; it should be deleted since we can just rely on Inductor's max-autotune mechanism
- dtypes: This is mostly for bitnet support; it should either be deleted or refactored into quantized_training
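For context on the stochastic rounding utility mentioned above: the idea is that when casting to a lower-precision format, you round up or down with probability proportional to the distance to each neighbor, so the rounding is unbiased in expectation (which matters for low-precision training). A minimal pure-Python sketch, where a grid of multiples of `step` stands in for the representable values of a real low-precision dtype:

```python
import random

def stochastic_round(x, step=1.0, rng=random.random):
    """Round x to a multiple of `step`, picking the upper neighbor with
    probability equal to x's fractional distance from the lower one.
    This makes the rounding unbiased: E[stochastic_round(x)] == x."""
    lower = (x // step) * step          # nearest grid point at or below x
    frac = (x - lower) / step           # in [0, 1): distance to the lower point
    return lower + step if rng() < frac else lower

# Unlike round-to-nearest, repeated additions of small values don't get
# systematically lost: averaging many stochastic roundings recovers x.
random.seed(0)
samples = [stochastic_round(2.3) for _ in range(20000)]
mean = sum(samples) / len(samples)      # close to 2.3
```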
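And for the 2:4 sparsity point: semi-structured 2:4 sparsity means every contiguous group of 4 weights keeps exactly 2 non-zeros, which is the pattern sparse tensor cores accelerate. A hypothetical magnitude-based pruning sketch in plain Python (real code would operate on tensors via the library's sparsity APIs; this just illustrates the pattern):

```python
def prune_2_to_4(weights):
    """Zero out the two smallest-magnitude values in every contiguous
    group of 4, producing the 2:4 semi-structured sparsity pattern."""
    assert len(weights) % 4 == 0, "length must be a multiple of 4"
    out = list(weights)
    for i in range(0, len(out), 4):
        group = out[i:i + 4]
        # indices of the two smallest-magnitude entries in this group
        drop = sorted(range(4), key=lambda j: abs(group[j]))[:2]
        for j in drop:
            out[i + j] = 0.0
    return out
```

The result has a fixed 50% sparsity with a hardware-friendly layout, unlike unstructured magnitude pruning.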
I'd love to hear more from folks, especially if you disagree with anything!