NVIDIA
Toronto, Canada
Pinned
- NVIDIA/TensorRT-LLM: TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently.
- triton-inference-server/server: The Triton Inference Server provides an optimized cloud and edge inferencing solution.
- triton-inference-server/python_backend: Triton backend that enables pre-processing, post-processing, and other logic to be implemented in Python.
- triton-inference-server/model_analyzer: Triton Model Analyzer is a CLI tool for understanding the compute and memory requirements of Triton Inference Server models.
- learning-to-quantize: Code for "Adaptive Gradient Quantization for Data-Parallel SGD", published in NeurIPS 2020.
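The learning-to-quantize repo accompanies a paper on adaptive gradient quantization for data-parallel SGD. As a rough illustration of the general idea only (this is a generic stochastic uniform quantizer, not the paper's adaptive scheme, and the function name and parameters are invented for this sketch):

```python
import math
import random

def quantize_stochastic(grad, num_levels=4, seed=0):
    """Generic stochastic uniform quantization (illustrative sketch):
    map each coordinate of `grad` onto one of `num_levels` evenly
    spaced magnitudes in [0, max|g|], rounding up or down at random
    so the quantized gradient is unbiased in expectation."""
    rnd = random.Random(seed)
    norm = max(abs(g) for g in grad)
    if norm == 0:
        return [0.0] * len(grad)
    out = []
    for g in grad:
        scaled = abs(g) / norm * (num_levels - 1)  # position in [0, L-1]
        lower = math.floor(scaled)
        # round up with probability equal to the fractional part,
        # which keeps E[quantized] equal to the original coordinate
        level = lower + (rnd.random() < scaled - lower)
        out.append(math.copysign(level / (num_levels - 1) * norm, g))
    return out
```

Each worker would send only the level index and sign per coordinate (plus one scale per vector), which is what makes schemes like this attractive for reducing communication in data-parallel training; the paper's contribution is choosing the level placement adaptively rather than uniformly.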