guidellm-help.txt
119 lines (115 loc) · 7.3 KB
Usage: guidellm benchmark from-file [OPTIONS] [PATH]

  Load a saved benchmark report and optionally re-export to other formats.

  PATH: Path to the saved benchmark report file (default: ./benchmarks.json).

Options:
--output-path PATH Directory or file path to save re-exported benchmark
results. If a directory, all output formats will be
saved there. If a file, the matching format will be
saved to that file.
--output-formats TEXT Output formats for benchmark results (e.g., console,
json, html, csv).
--help Show this message and exit.
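For instance, a saved report can be re-exported like this (the report path is the documented default; the output directory name is illustrative, and whether `--output-formats` takes a comma-separated list or repeated flags is an assumption here):

```shell
# Re-export a previously saved benchmark report as CSV into ./exports/
guidellm benchmark from-file ./benchmarks.json \
  --output-formats csv \
  --output-path ./exports/
```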
root@d7b17c23597a:/workspace/vllm-rbln# guidellm benchmark run --help
Usage: guidellm benchmark run [OPTIONS]

  Run a benchmark against a generative model. Supports multiple backends, data
  sources, strategies, and output formats. Configuration can be loaded from a
  scenario file or specified via options.

Options:
-c, --scenario [file|[rag|rag.json|chat|chat.json]]
Builtin scenario name or path to config
file. CLI options override scenario
settings.
--target TEXT Target backend URL (e.g.,
http://localhost:8000).
--data TEXT HuggingFace dataset ID, path to dataset,
path to data file (csv/json/jsonl/txt), or
synthetic data config (json/key=value).
--profile, --rate-type [async|poisson|concurrent|throughput|constant|sweep|synchronous]
Benchmark profile type. Options: async,
poisson, concurrent, throughput, constant,
sweep, synchronous.
--rate TEXT Benchmark rate(s) to test. Meaning depends
on profile: sweep=number of benchmarks,
concurrent=concurrent requests,
async/constant/poisson=requests per second.
--backend, --backend-type [openai_http]
Backend type. Options: openai_http.
--backend-kwargs, --backend-args TEXT
JSON string of arguments to pass to the
backend.
--model TEXT Model ID to benchmark. If not provided, uses
first available model.
--request-type [audio_transcriptions|chat_completions|text_completions|audio_translations]
Request type to create for each data sample.
Options: audio_transcriptions,
chat_completions, text_completions,
audio_translations.
--request-formatter-kwargs TEXT
JSON string of arguments to pass to the
request formatter.
--processor TEXT Processor or tokenizer for token count
calculations. If not provided, loads from
model.
--processor-args TEXT JSON string of arguments to pass to the
processor constructor.
--data-args TEXT JSON string of arguments to pass to dataset
creation.
--data-samples INTEGER Number of samples from dataset. -1 (default)
uses all samples and dynamically generates
more.
--data-column-mapper TEXT JSON string of column mappings to apply to
the dataset.
--data-sampler [shuffle] Data sampler type.
--data-num-workers INTEGER Number of worker processes for data loading.
--dataloader-kwargs TEXT JSON string of arguments to pass to the
dataloader constructor.
--random-seed INTEGER Random seed for reproducibility.
--output-dir DIRECTORY The directory path to save file output types
in.
--outputs TEXT The filename.ext for each of the outputs to
create, or the aliases (json, csv, html) for
the output files to create with their
default file names (benchmark.[EXT]).
--output-path PATH Legacy parameter for the output path to save
the output result to. Resolves to fill in
output-dir and outputs based on input path.
--disable-console, --disable-console-outputs
Disable all outputs to the console (updates,
interactive progress, results).
--disable-console-interactive, --disable-progress
Disable interactive console progress
updates.
--warmup, --warmup-percent TEXT
Warmup specification: int, float, or dict as
string (json or key=value). Controls time or
requests before measurement starts. Numeric
in (0, 1): percent of duration or request
count. Numeric >=1: duration in seconds or
request count. Advanced config: see
TransientPhaseConfig schema.
--cooldown, --cooldown-percent TEXT
Cooldown specification: int, float, or dict
as string (json or key=value). Controls time
or requests after measurement ends. Numeric
in (0, 1): percent of duration or request
count. Numeric >=1: duration in seconds or
request count. Advanced config: see
TransientPhaseConfig schema.
--rampup FLOAT The time, in seconds, to ramp up the request
rate over. Only applicable for
Throughput/Concurrent strategies.
--sample-requests, --output-sampling INTEGER
Number of sample requests per status to
save. None (default) saves all, recommended:
20.
--max-seconds FLOAT Maximum seconds per benchmark. If None, runs
until max_requests or data exhaustion.
--max-requests INTEGER Maximum requests per benchmark. If None,
runs until max_seconds or data exhaustion.
--max-errors INTEGER Maximum errors before stopping the
benchmark.
--max-error-rate FLOAT Maximum error rate before stopping the
benchmark.
--max-global-error-rate FLOAT Maximum global error rate across all
benchmarks.
--help Show this message and exit.
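Putting the options above together, a minimal invocation might look like the sketch below. The target URL, warmup fraction, and output directory are placeholder assumptions, not values taken from this help text; under the `sweep` profile, `--rate` is documented to mean the number of benchmarks to run.

```shell
# Hypothetical sweep benchmark against a local OpenAI-compatible server:
# 5 benchmark runs, each capped at 60 s, with 10% of each run as warmup.
guidellm benchmark run \
  --target http://localhost:8000 \
  --backend openai_http \
  --profile sweep \
  --rate 5 \
  --warmup 0.1 \
  --max-seconds 60 \
  --output-dir ./results
```

If `--model` is omitted, the help text states the first available model on the target is used, so it is left out here.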