sfc-gh-yewang

Ye Wang (sfc-gh-yewang)

3 followers · 0 following

Popular repositories

flash-attention Public

Forked from vllm-project/flash-attention

Fast and memory-efficient exact attention

Python

onnxruntime Public

Forked from microsoft/onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

C++

TensorRT-LLM Public

Forked from NVIDIA/TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

Python

DeepSpeed Public

Forked from deepspeedai/DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python

CUDA-Learn-Notes Public

Forked from xlite-dev/LeetCUDA

📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

Cuda

flashinfer Public

Forked from flashinfer-ai/flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda