Performance benchmark for NVIDIA embedding models.
This tool benchmarks NVIDIA embedding models by sending requests with various configurations and measuring performance metrics such as latency and throughput.
The benchmark suite now requires a configuration file:

```shell
./run.sh my_bench.conf
```

where `my_bench.conf` is a configuration file containing the benchmark parameters.
The configuration file uses a simple key-value format:

```shell
# Required parameters
URL=http://localhost:8000
MODEL=nvidia/nv-embed-base

# Optional parameters (comment out to use defaults)
# Space-separated values for arrays
MODE=query passage
BATCH_SIZE=1 16 32 64
CONCURRENCY=1 4 8 16
```
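As a sketch of how `run.sh` could load such a file (the actual implementation may differ; `load_config` is a hypothetical name), the KEY=VALUE lines can be parsed line by line. Values may contain spaces, so the file is parsed rather than sourced directly:

```shell
load_config() {
    conf="$1"
    [ -f "$conf" ] || { echo "usage: ./run.sh <config file>" >&2; return 1; }

    # Defaults for the optional parameters
    MODE="query passage"
    BATCH_SIZE="1 16 32 64"
    CONCURRENCY="1 4 8 16"

    # Read KEY=VALUE lines, skipping blanks and '#' comments
    while IFS= read -r line; do
        case "$line" in
            ''|'#'*) continue ;;
        esac
        key=${line%%=*}
        value=${line#*=}
        eval "$key=\$value"
    done < "$conf"

    # Required parameters must be present
    : "${URL:?URL must be set in the config file}"
    : "${MODEL:?MODEL must be set in the config file}"

    # Add the "nvidia/" prefix automatically if it was omitted
    case "$MODEL" in
        nvidia/*) ;;
        *) MODEL="nvidia/$MODEL" ;;
    esac
}
```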
- `URL`: The URL of the embedding service
- `MODEL`: The model name to use for embeddings (the "nvidia/" prefix is optional and will be added automatically if omitted)
- `MODE`: Space-separated list of modes to test (default: "query passage")
- `BATCH_SIZE`: Space-separated list of batch sizes to test (default: "1 16 32 64")
- `CONCURRENCY`: Space-separated list of concurrency levels to test (default: "1 4 8 16")
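The three array parameters define a full sweep: one benchmark run per (mode, batch size, concurrency) combination. A minimal sketch of that loop, where `run_one` is a hypothetical stand-in for the real request loop:

```shell
# Sketch of the parameter sweep; run_one is a hypothetical placeholder
# for the code that actually sends requests and records latencies.
MODE="query passage"
BATCH_SIZE="1 16"
CONCURRENCY="1 4"

run_one() {
    # Placeholder: a real run would hit $URL and measure latency
    echo "mode=$1 batch=$2 concurrency=$3"
}

# One run per combination; unquoted expansion splits on spaces
for mode in $MODE; do
    for batch in $BATCH_SIZE; do
        for conc in $CONCURRENCY; do
            run_one "$mode" "$batch" "$conc"
        done
    done
done
```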
An example configuration file is provided in `example.conf`.
Results are saved to `result.csv` in the current directory, with the following columns:
- Model: The model name
- Tokens: Number of tokens per chunk
- Batch size: Size of each batch
- Concurrency: Number of concurrent operations
- Min (ms): Minimum latency in milliseconds
- Median (ms): Median latency in milliseconds
- P90 (ms): 90th percentile latency in milliseconds
- P99 (ms): 99th percentile latency in milliseconds
- Max (ms): Maximum latency in milliseconds
- Throughput: Requests per second
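Because the output is plain CSV, it can be analyzed with standard tools. For example, a sketch that prints the row with the highest throughput, assuming the column order above (Throughput is column 10); the sample rows below are illustrative, not real measurements:

```shell
# Illustrative sample in the result.csv format (numbers are made up)
cat > /tmp/result_sample.csv <<'EOF'
Model,Tokens,Batch size,Concurrency,Min (ms),Median (ms),P90 (ms),P99 (ms),Max (ms),Throughput
nvidia/nv-embed-base,512,16,4,10.1,12.3,15.0,18.2,25.4,310.5
nvidia/nv-embed-base,512,32,8,14.7,17.9,21.3,26.8,34.1,402.2
EOF

# Print the row with the highest throughput (column 10); NR>1 skips the header
awk -F, 'NR > 1 { v = $10 + 0; if (v > best) { best = v; line = $0 } } END { print line }' \
    /tmp/result_sample.csv
```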