This repository demonstrates how to serve a chest X-ray classification model from HuggingFace using NVIDIA Triton Inference Server.
- NVIDIA GPU
- Docker with NVIDIA Container Toolkit
Build the image and start the server:

    docker build -t chest-xray-triton .
    docker run -d --gpus "device=0" -p 8000:8000 -p 8001:8001 -p 8002:8002 chest-xray-triton tritonserver --model-repository=/models

Install the client dependencies and run the client:

    # You may need to create a new virtual environment
    pip install -r requirements.txt
    python client.py

You can also create a client in other languages (Java, C++, etc.). Please refer to this guide for more information.
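If the client cannot connect, it can help to confirm that the server has finished loading before sending requests. The following is a minimal sketch using only the Python standard library. It assumes that Triton's HTTP endpoint is published on `localhost:8000`, as in the `docker run` command above; the function name `triton_is_ready` is hypothetical and is not part of `client.py`.

```python
import urllib.error
import urllib.request


def triton_is_ready(base_url: str = "http://localhost:8000", timeout: float = 2.0) -> bool:
    """Return True if Triton's HTTP readiness endpoint answers with status 200."""
    # /v2/health/ready is the server readiness route of Triton's HTTP API.
    url = base_url.rstrip("/") + "/v2/health/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status all count as "not ready".
        return False


if __name__ == "__main__":
    # Assumption: the container from the docker run command is up on this host.
    print("server ready:", triton_is_ready())
```

A result of `False` right after `docker run` usually just means the model is still loading, so retrying after a few seconds is reasonable.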
For further details, please refer to the NVIDIA Triton Inference Server documentation.
To run multiple model instances on a single GPU, increase `count` in the `instance_group` section of the `config.pbtxt` file.
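As an illustration, the sketch below shows what an instance group stanza can look like and how its count could be changed programmatically. The `EXAMPLE_CONFIG` excerpt and the helper `set_instance_count` are assumptions made for this example; the repository's actual `config.pbtxt` may be laid out differently, and editing the file by hand works just as well.

```python
import re

# Hypothetical config.pbtxt excerpt; the repository's real file may differ.
EXAMPLE_CONFIG = """\
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
"""


def set_instance_count(config_text: str, count: int) -> str:
    """Return config_text with the first instance_group count set to `count`."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # Capture everything from "instance_group [ {" up to the digits after "count:".
    pattern = re.compile(r"(instance_group\s*\[\s*\{[^}]*?\bcount\s*:\s*)\d+")
    new_text, replaced = pattern.subn(
        lambda match: match.group(1) + str(count), config_text, count=1
    )
    if replaced == 0:
        raise ValueError("no instance_group count field found")
    return new_text


if __name__ == "__main__":
    # Request two instances of the model on GPU 0.
    print(set_instance_count(EXAMPLE_CONFIG, 2))
```

After changing the file, the image likely needs to be rebuilt and the container restarted so that Triton loads the updated configuration. Each additional instance also consumes GPU memory, so it is worth raising the count gradually.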