Milestones

  • 1. Int8 dynamic quantization (weight-only) support.
    2. Make the UNet run as fast as possible without Triton (Triton has significant CPU overhead and is not stable enough).
    3. Faster convolution kernel with an FP16 accumulator.
    4. Demonstrate the ability to optimize LLMs (I have already used stable-fast to do this internally).

    No due date
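
For readers unfamiliar with item 1, here is a minimal sketch of what weight-only int8 quantization can look like. It is not stable-fast's implementation. It assumes symmetric per-row scaling, with weights stored as int8 and dequantized on the fly while activations stay in floating point. The function names `quantize_weight_int8` and `linear_weight_only` are hypothetical and used only for illustration.

```python
# Illustrative sketch only; NOT stable-fast's actual implementation.
# Assumption: symmetric per-output-row scaling, activations kept in float.

def quantize_weight_int8(weight):
    """Quantize a 2-D weight (list of rows) to int8 values plus one scale per row."""
    q_rows, scales = [], []
    for row in weight:
        max_abs = max(abs(w) for w in row)
        # Map the largest magnitude in the row to 127; avoid dividing by zero.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales


def linear_weight_only(x, q_rows, scales):
    """Compute y = W x, dequantizing each int8 row with its scale at run time."""
    return [
        scale * sum(xi * qi for xi, qi in zip(x, q_row))
        for q_row, scale in zip(q_rows, scales)
    ]


weight = [[0.5, -1.0, 0.25], [2.0, 0.0, -2.0]]
x = [1.0, 2.0, 3.0]
q_rows, scales = quantize_weight_int8(weight)
y = linear_weight_only(x, q_rows, scales)  # close to the float result [-0.75, -4.0]
```

Storing one scale per row keeps the rounding error of each output proportional to that row's largest weight, which is one common design choice; per-tensor or per-group scales are alternatives.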