Repositories list
16 repositories
- A high-performance inference system for large language models, designed for production environments.
- cutlass (Public)
- dynamo (Public)
- whl (Public)
- 3FS (Public)
- flashinfer (Public)
- FlashMLA (Public)
- discussions (Public)
- flash-attention (Public)
- tokenizers (Public)
- xformers (Public)
- FasterTransformer (Public)
- ByteTransformer (Public): optimized BERT transformer inference on NVIDIA GPU. https://arxiv.org/abs/2210.03052