Albert Cheng qiching

🎯 Focusing

LLM training and inference @NVIDIA | ex FTE @google @google-gemini | ex intern @MicrosoftResearch @sarc-acl @baidu and quant research | @umich

7 followers · 26 following

NVIDIA
Santa Clara, CA
22:51 (UTC -07:00)
in/albert-c-4365a9249

Pinned

tpu-inference (Public)
Forked from vllm-project/tpu-inference
Python

vllm-project/vllm (Public)
A high-throughput and memory-efficient inference and serving engine for LLMs
Python · 77.6k stars · 15.9k forks

NVIDIA-NeMo/NeMo (Public)
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Python · 17.1k stars · 3.4k forks

flashinfer-ai/flashinfer (Public)
FlashInfer: Kernel Library for LLM Serving
Python · 5.5k stars · 916 forks

NVIDIA/tilus (Public)
Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.
Python · 477 stars · 24 forks