    Repositories list

    • ScaleLLM

      Public
      A high-performance inference system for large language models, designed for production environments.
      C++
      Updated Dec 19, 2025
    • cutlass

      Public
      CUDA Templates for Linear Algebra Subroutines
      C++
      Updated Nov 6, 2025
    • nixl

      Public
      NVIDIA Inference Xfer Library (NIXL)
      C++
      Updated Nov 4, 2025
    • dynamo

      Public
      A Datacenter Scale Distributed Inference Serving Framework
      Rust
      Updated Nov 4, 2025
    • whl

      Public
      Repository for hosting Python wheel (whl) packages.
      HTML
      Updated Sep 13, 2025
    • flux

      Public
      A fast communication-overlapping library for tensor/expert parallelism on GPUs.
      C++
      Updated Apr 15, 2025
    • 3FS

      Public
      A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
      C++
      Updated Feb 28, 2025
    • FlashInfer

      Kernel Library for LLM Serving
      Cuda
      Updated Feb 27, 2025
    • FlashMLA

      Public
      C++
      Updated Feb 26, 2025
    • vcpkg

      Public
      C++ Library Manager for Windows, Linux, and MacOS
      CMake
      Updated Feb 24, 2025
    • Fast and memory-efficient exact attention
      Python
      Updated Oct 15, 2023
    • 💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
      Rust
      Updated Aug 4, 2023
    • xformers

      Public
      Hackable and optimized Transformers building blocks, supporting a composable construction.
      Python
      Updated Aug 1, 2023
    • Transformer-related optimization, including BERT and GPT
      C++
      Updated Jul 28, 2023
    • Optimized BERT transformer inference on NVIDIA GPU. https://arxiv.org/abs/2210.03052
      C++
      Updated Jul 24, 2023