Using Perf Analyzer for profiling a HuggingFace BART model

Overview

This example demonstrates how to profile a HuggingFace BART model using Perf Analyzer.

The example consists of the following scripts:

  • install.sh - installs the additional packages and libraries required to run the example
  • server.py - starts the model with Triton Inference Server
  • client.sh - executes HTTP/gRPC requests against the deployed model
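
For orientation, here is a minimal sketch of the inference-callback shape that server.py binds to Triton, assuming PyTriton's list-of-request-dicts convention. The input/output names INPUT_1/OUTPUT_1 and the placeholder transform are illustrative, not the example's actual code — the real server.py would call the HuggingFace BART pipeline instead:

```python
import numpy as np

def infer_fn(requests):
    # PyTriton-style callback: receives a list of request dicts mapping
    # input names to numpy arrays, returns a list of response dicts.
    responses = []
    for request in requests:
        # Text inputs arrive as byte strings; decode them for the model.
        texts = [t.decode("utf-8") for t in request["INPUT_1"].flatten()]
        # Placeholder transform standing in for the BART pipeline call.
        results = [t.upper() for t in texts]
        # Encode back to bytes and restore the batch dimension.
        encoded = np.char.encode(np.array(results), "utf-8")
        responses.append({"OUTPUT_1": encoded.reshape(-1, 1)})
    return responses
```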

Requirements

The example requires the torch package. It can be installed in your current environment using pip:

pip install torch

Alternatively, you can use the NVIDIA PyTorch container:

docker run -it --gpus 1 --shm-size 8gb -v {repository_path}:{repository_path} -w {repository_path} nvcr.io/nvidia/pytorch:24.10-py3 bash

If you choose to use the container, we recommend installing the NVIDIA Container Toolkit.

Quick Start

The step-by-step guide:

  1. Install PyTriton following the installation instructions.
  2. Install the additional packages using install.sh:
./install.sh
  3. In the current terminal, start the model on Triton using server.py:
./server.py
  4. Open a new terminal tab (e.g. Ctrl + T on Ubuntu) or window.
  5. Go to the example directory.
  6. Run client.sh to perform queries on the model:
./client.sh
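
Perf Analyzer can also export its measurements to a CSV file for offline analysis. Below is a hedged sketch of summarizing such a report to find the concurrency level with the highest throughput; the column names (Concurrency, Inferences/Second, p50/p99 latency) and the sample numbers are assumptions based on a typical latency report, so check the header of your own export:

```python
import csv
import io

# Hypothetical sample of a Perf Analyzer CSV export; latencies are in
# microseconds in real reports. Replace with the contents of your own file.
SAMPLE = """Concurrency,Inferences/Second,p50 latency,p99 latency
1,24.6,40100,45300
2,31.2,62800,70900
"""

def best_concurrency(csv_text):
    # Return the (concurrency, throughput) pair with the highest throughput.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    best = max(rows, key=lambda r: float(r["Inferences/Second"]))
    return int(best["Concurrency"]), float(best["Inferences/Second"])

print(best_concurrency(SAMPLE))  # → (2, 31.2)
```

Picking the knee of the throughput/latency curve (rather than raw maximum throughput) is the usual next step once the report contains more concurrency levels.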