
Status of prototype features #1807

Open
@msaroufim

Description

I was parsing through our prototype folder and wanted to give my take on what should be promoted, what should be deleted, and what requires further discussion.

  • spinquant, awq, autoround, hqq: All of these are algorithm implementations, and if we are convinced they are correct and pass reference checks relative to the original repos, we should promote them out of prototype. In particular, the benefits we should lean into are accelerated performance using torch.compile and serialization support with the HF hub (see the quantization sketch after this list) @jerryzh168
  • DORA: This technique didn't end up picking up as much traction as the low-bit optimizers. The idea of low rank as a form of compression is interesting, but I would lean toward deleting this, although I could be convinced that keeping some latent-space optimizers is relevant considering MLA is so hot right now. Delete DORA #1815
  • Profiler: This never worked with torch.compile, so it has limited utility for us unless @jeromeku can fix it; if not, we can delete it
  • float8nocompile: This doesn't feel like it should be a prototype feature, and I'd like to hear some detail on the promotion plan @danielvegamyhre
  • common: This folder probably shouldn't exist
  • quantized_training: This should remain in prototype because it primarily targets older or consumer GPUs; concretely, as our focus moves to Blackwell there won't be much of a difference between dtypes for inference vs. training. That said, stochastic rounding should be promoted as a utility (see the stochastic-rounding sketch after this list)
  • low_bit_optim: This is great work; we should promote it out of prototype (see the optimizer sketch after this list)
  • split_k: This just solves a very narrow problem, so it should be deleted in favor of using inductor's matmul templates. It was meant more as an example of how to ship Triton kernels with ao, which, well, isn't hard because it's all JIT. Remove split_k kernel #1816
  • mx_formats: With Blackwell out, this should be promoted out of prototype
  • Sparsity: 2:4 sparsity for inference should be moved out of prototype; it's likely going to continue being relevant, especially for future Flash Attention implementations
  • Kernel: This is a kernel autotuner; it should be deleted since we can just rely on inductor's max-autotune mechanism
  • dtypes: This is mostly for BitNet support; it should either be deleted or refactored into quantized_training
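
To make the spinquant/awq/autoround/hqq point concrete, here is roughly the user-facing flow I have in mind after promotion. This is a minimal sketch using the existing quantize_ entry point; int8_weight_only here is just a stand-in for whichever promoted algorithm configs we land on:

```python
import torch
from torchao.quantization import quantize_, int8_weight_only

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).cuda().bfloat16()

# in-place weight-only quantization; a promoted spinquant/awq/hqq config would
# ideally slot into this same entry point
quantize_(model, int8_weight_only())

# the quantized state_dict serializes with plain torch.save; the HF hub story
# would build on top of this
torch.save(model.state_dict(), "quantized_checkpoint.pt")

# torch.compile is where the accelerated inference comes from
model = torch.compile(model, mode="max-autotune")

x = torch.randn(8, 1024, device="cuda", dtype=torch.bfloat16)
out = model(x)
```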

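
On the stochastic rounding point in quantized_training: here is a hypothetical standalone sketch of what the promoted utility could look like (not the existing torchao function), just to show why it is useful as a general training utility — rounding fp32 to bf16 with zero error in expectation:

```python
import torch

def stochastic_round_to_bf16(x: torch.Tensor) -> torch.Tensor:
    # bf16 is the top 16 bits of the fp32 bit pattern; adding a random value to
    # the lower 16 bits before truncating implements stochastic rounding
    assert x.dtype == torch.float32
    bits = x.view(torch.int32)
    noise = torch.randint(0, 1 << 16, x.shape, device=x.device, dtype=torch.int32)
    rounded = (bits + noise) & ~0xFFFF
    return rounded.view(torch.float32).to(torch.bfloat16)
```
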
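
And for low_bit_optim, part of why promotion is attractive is that it is already essentially a drop-in optimizer swap. A minimal sketch, assuming the current prototype import path (which would change once promoted):

```python
import torch
from torchao.prototype.low_bit_optim import AdamW8bit

model = torch.nn.Linear(1024, 1024).cuda().bfloat16()
optim = AdamW8bit(model.parameters(), lr=1e-3)  # drop-in for torch.optim.AdamW

x = torch.randn(8, 1024, device="cuda", dtype=torch.bfloat16)
model(x).sum().backward()
optim.step()
optim.zero_grad()
```
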
I'd love to hear more from folks, especially if you disagree with anything!

cc @supriyar @jerryzh168 @drisspg @vkuzo @gau-nernst
