Commit 687ca03

Update configuration reference
1 parent b7823ca commit 687ca03

1 file changed: +149 -31 lines changed

docs/genai/reference/config.md

Lines changed: 149 additions & 31 deletions
@@ -81,7 +81,7 @@ Below is an example `genai_config.json` for a decoder-only style model:
8181

8282
## Configuration structure
8383

84-
The configuration file is structured as a JSON object with two main sections: `model` and `search`.
84+
The configuration file is structured as a JSON object with `model`, `search`, and optional `engine` sections.
8585

8686

8787
---
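
As a rough illustration of the top-level layout described in this hunk, the sketch below loads a config and checks its sections. The example values are placeholders, `check_top_level` is a hypothetical helper rather than part of the library, and treating `model` and `search` as required is an assumption drawn from the wording above.

```python
import json

# Hypothetical minimal config: only the top-level layout follows the reference;
# the nested values are placeholders chosen for illustration.
EXAMPLE = """
{
  "model": {"vocab_size": 32000},
  "search": {"random_seed": -1},
  "engine": {"static_batching": {"max_batch_size": 4}}
}
"""

REQUIRED_SECTIONS = ("model", "search")  # assumption: both must be present
OPTIONAL_SECTIONS = ("engine",)


def check_top_level(config: dict) -> list[str]:
    """Return a list of problems found in the top-level layout (empty if none)."""
    problems = []
    for name in REQUIRED_SECTIONS:
        if name not in config:
            problems.append(f"missing required section: {name}")
    for name in REQUIRED_SECTIONS + OPTIONAL_SECTIONS:
        if name in config and not isinstance(config[name], dict):
            problems.append(f"section must be a JSON object: {name}")
    return problems


config = json.loads(EXAMPLE)
print(check_top_level(config))
```

A config without an `engine` section passes the same check, since that section is optional.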
@@ -99,6 +99,9 @@ Top-level configuration object.
9999
- **search**: *(object)*
100100
Generation/search parameters.
101101

102+
- **engine**: *(object, optional)*
103+
Batch scheduling configuration.
104+
102105
---
103106

104107
### Config::Model
@@ -151,6 +154,15 @@ Describes the model architecture, files, and tokenization.
151154
- **decoder_start_token_id**: *(int, optional)*
152155
The id of the decoder start token (for encoder-decoder models).
153156

157+
- **image_token_id**: *(int, optional)*
158+
Token id used to delimit images in multi-modal models.
159+
160+
- **video_token_id**: *(int, optional)*
161+
Token id used to delimit video content in multi-modal models.
162+
163+
- **vision_start_token_id**: *(int, optional)*
164+
Token id used to mark the start of vision content in multi-modal models.
165+
154166
- **vocab_size**: *(int)*
155167
The size of the vocabulary.
156168

@@ -176,12 +188,21 @@ Describes the model architecture, files, and tokenization.
176188

177189
#### Model::Encoder
178190

191+
- **session_options**: *(object, optional)*
192+
See [SessionOptions](#sessionoptions).
193+
194+
- **run_options**: *(array of [string, string] pairs, optional)*
195+
Per-run configuration entries applied to the encoder session.
196+
179197
- **filename**: *(string)*
180198
Path to the encoder ONNX file.
181199

182200
- **hidden_size**: *(int)*
183201
Hidden size of the encoder.
184202

203+
- **num_attention_heads**: *(int)*
204+
Number of attention heads.
205+
185206
- **num_key_value_heads**: *(int)*
186207
Number of key-value heads.
187208

@@ -192,16 +213,26 @@ Describes the model architecture, files, and tokenization.
192213
Size of each attention head.
193214

194215
- **inputs**: *(object)*
195-
- **input_features**: *(string)*
196-
Name of the input features tensor.
197216
- **input_ids**: *(string)*
198217
Name of the input ids tensor.
218+
- **embeddings**: *(string)*
219+
Name of the input embeddings tensor.
199220
- **attention_mask**: *(string)*
200221
Name of the attention mask tensor.
222+
- **position_ids**: *(string)*
223+
Name of the position ids tensor.
224+
- **audio_features**: *(string)*
225+
Name of the audio features tensor.
201226

202227
- **outputs**: *(object)*
203228
- **encoder_outputs**: *(string)*
204229
Name of the encoder outputs tensor.
230+
- **hidden_states**: *(string)*
231+
Name of the encoder hidden states tensor.
232+
- **cross_present_key_names**: *(string)*
233+
Name pattern for cross-attention present key tensors.
234+
- **cross_present_value_names**: *(string)*
235+
Name pattern for cross-attention present value tensors.
205236

206237
---
207238

@@ -210,6 +241,12 @@ Describes the model architecture, files, and tokenization.
210241
- **filename**: *(string)*
211242
Path to the embedding ONNX file.
212243

244+
- **session_options**: *(object, optional)*
245+
See [SessionOptions](#sessionoptions).
246+
247+
- **run_options**: *(array of [string, string] pairs, optional)*
248+
Per-run configuration entries applied to the embedding session.
249+
213250
- **inputs**: *(object)*
214251
- **input_ids**: *(string)*
215252
Name of the input ids tensor.
@@ -229,31 +266,68 @@ Describes the model architecture, files, and tokenization.
229266
- **filename**: *(string)*
230267
Path to the vision ONNX file.
231268

269+
- **session_options**: *(object, optional)*
270+
See [SessionOptions](#sessionoptions).
271+
272+
- **run_options**: *(array of [string, string] pairs, optional)*
273+
Per-run configuration entries applied to the vision session.
274+
232275
- **config_filename**: *(string, optional)*
233-
Path to the vision processor config file.
276+
Path to the vision processor config file. Defaults to `processor_config.json`.
234277

235278
- **adapter_filename**: *(string, optional)*
236279
Path to the vision adapter file.
237280

281+
- **spatial_merge_size**: *(int, optional)*
282+
Patch merge size used by some models (for example, Qwen2.5-VL). Defaults to 2.
283+
284+
- **tokens_per_second**: *(float, optional)*
285+
Tokens-per-second parameter used by some models. Defaults to 2.0.
286+
238287
- **inputs**: *(object)*
239288
- **pixel_values**: *(string)*
240289
Name of the pixel values tensor.
241290
- **image_sizes**: *(string)*
242291
Name of the image sizes tensor.
292+
- **image_grid_thw**: *(string)*
293+
Name of the image grid tensor. Defaults to `image_sizes` when not provided.
243294
- **attention_mask**: *(string)*
244295
Name of the image attention mask tensor.
245296

246297
- **outputs**: *(object)*
247298
- **image_features**: *(string)*
248299
Name of the image features output tensor.
249300

301+
- **pipeline**: *(array, optional)*
302+
Ordered list of sub-models for vision pipelines (for example, patch embedding, attention, merge).
303+
- **filename**: *(string)*
304+
Path to the ONNX file.
305+
- **session_options**: *(object, optional)*
306+
Session options for this pipeline model.
307+
- **run_options**: *(array of [string, string] pairs, optional)*
308+
Run options for this pipeline model.
309+
- **model_id**: *(string)*
310+
Identifier used to link outputs to subsequent stages.
311+
- **inputs**: *(array of string)*
312+
Graph input names.
313+
- **outputs**: *(array of string)*
314+
Graph output names.
315+
- **run_on_cpu**: *(bool, optional)*
316+
If true, forces CPU EP when multiple EPs are configured.
317+
250318
---
251319

252320
#### Model::Speech
253321

254322
- **filename**: *(string)*
255323
Path to the speech ONNX file.
256324

325+
- **session_options**: *(object, optional)*
326+
See [SessionOptions](#sessionoptions).
327+
328+
- **run_options**: *(array of [string, string] pairs, optional)*
329+
Per-run configuration entries applied to the speech session.
330+
257331
- **config_filename**: *(string, optional)*
258332
Path to the speech processor config file.
259333

@@ -284,6 +358,9 @@ Describes the model architecture, files, and tokenization.
284358
- **session_options**: *(object)*
285359
See [SessionOptions](#sessionoptions).
286360

361+
- **run_options**: *(array of [string, string] pairs, optional)*
362+
Per-run configuration entries applied to the decoder session.
363+
287364
- **hidden_size**: *(int)*
288365
Size of the hidden layers.
289366

@@ -309,6 +386,10 @@ Describes the model architecture, files, and tokenization.
309386
"left" or "right".
310387
- **slide_key_value_cache**: *(bool)*
311388
Whether to slide the key-value cache.
389+
- **slide_inputs**: *(bool, optional)*
390+
Whether to slide the input prompt along with the cache.
391+
- **layers**: *(array of int, optional)*
392+
Layer indices that use sliding window attention.
312393

313394
- **inputs**: *(object)*
314395
- **input_ids**: *(string)*
@@ -329,20 +410,28 @@ Describes the model architecture, files, and tokenization.
329410
Name for cross-attention past key tensors.
330411
- **cross_past_value_names**: *(string, optional)*
331412
Name for cross-attention past value tensors.
413+
- **past_key_values_length**: *(string)*
414+
Name of the past key values length tensor.
332415
- **current_sequence_length**: *(string)*
333416
Name of the current sequence length tensor.
334417
- **past_sequence_length**: *(string)*
335418
Name of the past sequence length tensor.
336-
- **past_key_values_length**: *(string)*
337-
Name of the past key values length tensor.
338419
- **total_sequence_length**: *(string)*
339420
Name of the total sequence length tensor.
421+
- **cache_indirection**: *(string)*
422+
Name of the cache indirection tensor.
340423
- **encoder_hidden_states**: *(string)*
341424
Name of the encoder hidden states tensor.
342425
- **rnn_prev_states**: *(string, optional)*
343426
Name of the previous RNN states tensor.
344427
- **encoder_attention_mask**: *(string, optional)*
345428
Name of the encoder attention mask tensor.
429+
- **cumulative_sequence_lengths**: *(string, optional)*
430+
Name of the cumulative sequence lengths tensor.
431+
- **past_sequence_lengths**: *(string, optional)*
432+
Name of the past sequence lengths tensor.
433+
- **block_table**: *(string, optional)*
434+
Name of the block table tensor.
346435

347436
- **outputs**: *(object)*
348437
- **logits**: *(string)*
@@ -353,10 +442,8 @@ Describes the model architecture, files, and tokenization.
353442
Name pattern for present value tensors.
354443
- **present_names**: *(string, optional)*
355444
Name for combined present key/value pairs.
356-
- **cross_present_key_names**: *(string, optional)*
357-
Name for cross-attention present key tensors.
358-
- **cross_present_value_names**: *(string, optional)*
359-
Name for cross-attention present value tensors.
445+
- **output_cross_qk_names**: *(string, optional)*
446+
Name pattern for cross-attention QK outputs.
360447
- **rnn_states**: *(string, optional)*
361448
Name of the RNN states output tensor.
362449

@@ -376,6 +463,9 @@ Describes the model architecture, files, and tokenization.
376463
- **session_options**: *(object, optional)*
377464
Session options for this pipeline model.
378465

466+
- **run_options**: *(array of [string, string] pairs, optional)*
467+
Run options for this pipeline model.
468+
379469
- **inputs**: *(array of string)*
380470
List of input tensor names.
381471

@@ -391,6 +481,9 @@ Describes the model architecture, files, and tokenization.
391481
- **run_on_token_gen**: *(bool)*
392482
Whether to run this model during token generation.
393483

484+
- **is_lm_head**: *(bool, optional)*
485+
True if this pipeline model is the language modeling head.
486+
394487
- **reset_session_idx**: *(int)*
395488
Index of the session to reset for memory management.
396489

@@ -412,39 +505,21 @@ Options passed to ONNX Runtime for model execution.
412505
- **enable_mem_pattern**: *(bool, optional)*
413506
Enable/disable memory pattern optimization.
414507

415-
- **disable_cpu_ep_fallback**: *(bool, optional)*
416-
Disable fallback to CPU execution provider.
417-
418-
- **disable_quant_qdq**: *(bool, optional)*
419-
Disable quantization QDQ.
420-
421-
- **enable_quant_qdq_cleanup**: *(bool, optional)*
422-
Enable quantization QDQ cleanup.
423-
424-
- **ep_context_enable**: *(bool, optional)*
425-
Enable execution provider context.
426-
427-
- **ep_context_embed_mode**: *(string, optional)*
428-
Execution provider context embed mode.
429-
430-
- **ep_context_file_path**: *(string, optional)*
431-
Path to execution provider context file.
432-
433508
- **log_id**: *(string, optional)*
434509
Prefix for logging.
435510

436511
- **log_severity_level**: *(int, optional)*
437512
Logging severity level.
438513

514+
- **log_verbosity_level**: *(int, optional)*
515+
Logging verbosity level.
516+
439517
- **enable_profiling**: *(string, optional)*
440518
Enable profiling.
441519

442520
- **custom_ops_library**: *(string, optional)*
443521
Path to custom ops library.
444522

445-
- **use_env_allocators**: *(bool)*
446-
Use environment allocators.
447-
448523
- **config_entries**: *(array of [string, string] pairs)*
449524
Additional config entries.
450525

@@ -477,6 +552,24 @@ Options passed to ONNX Runtime for model execution.
477552
- **options**: *(array of [string, string] pairs)*
478553
Provider-specific options.
479554

555+
- **device_filtering_options**: *(object, optional)*
556+
Device filtering constraints for this provider.
557+
- **hardware_device_type**: *(string, optional)*
558+
Hardware type to target (CPU, GPU, NPU).
559+
- **hardware_device_id**: *(int, optional)*
560+
Hardware device id to target.
561+
- **hardware_vendor_id**: *(int, optional)*
562+
Hardware vendor id to target.
563+
564+
---
565+
566+
### RunOptions
567+
568+
Entries added to `OrtRunOptions` for a specific session run.
569+
570+
- **run_options**: *(array of [string, string] pairs)*
571+
Key/value config entries applied to the run.
572+
480573
---
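
The `run_options` shape added above (an array of `[string, string]` pairs) can be checked before it is handed to a session. The sketch below is one possible validation. `parse_run_options` is a hypothetical helper, and `example.key` is a placeholder, not a real ONNX Runtime run config key.

```python
import json


def parse_run_options(raw) -> dict[str, str]:
    """Validate an array of [string, string] pairs and return it as a dict."""
    if not isinstance(raw, list):
        raise TypeError("run_options must be an array")
    entries = {}
    for item in raw:
        is_pair = isinstance(item, list) and len(item) == 2
        if not is_pair or not all(isinstance(part, str) for part in item):
            raise ValueError(f"expected a [string, string] pair, got {item!r}")
        key, value = item
        # A repeated key keeps the last value; this is a choice made for the sketch.
        entries[key] = value
    return entries


# Placeholder fragment in the documented shape.
fragment = json.loads('{"run_options": [["example.key", "1"]]}')
print(parse_run_options(fragment["run_options"]))
```

Values stay strings throughout, matching the `[string, string]` type given in the reference.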
481574

482575
### Search
@@ -531,6 +624,31 @@ Describes the generation/search parameters.
531624
- **random_seed**: *(int)*
532625
Seed for the random number generator. -1 means use a random device.
533626

627+
- **chunk_size**: *(int, optional)*
628+
Chunk size for prefill chunking during context processing. Chunking is enabled when this value is greater than 0.
629+
630+
---
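
To make the `chunk_size` description concrete, here is a minimal sketch of what prefill chunking means conceptually: the prompt is processed in consecutive slices of at most `chunk_size` tokens. This illustrates the idea only and is not the library's implementation. `prefill_chunks` is a hypothetical name, and treating a value of 0 or less as "no chunking" is an assumption based on the description above.

```python
def prefill_chunks(prompt_tokens: list[int], chunk_size: int) -> list[list[int]]:
    """Split a prompt into consecutive chunks of at most chunk_size tokens."""
    if not prompt_tokens:
        return []
    if chunk_size <= 0:
        # Assumption: chunking is disabled, so the whole prompt is one pass.
        return [prompt_tokens]
    return [
        prompt_tokens[start:start + chunk_size]
        for start in range(0, len(prompt_tokens), chunk_size)
    ]


# Seven placeholder token ids split with chunk_size = 3.
print(prefill_chunks(list(range(7)), 3))
```

The last chunk may be shorter than `chunk_size` when the prompt length is not a multiple of it.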
631+
632+
### Engine
633+
634+
Batching and scheduling settings for the runtime engine.
635+
636+
- **dynamic_batching**: *(object, optional)*
637+
Dynamic batching configuration.
638+
- **block_size**: *(int)*
639+
Total number of slots per block. Defaults to 256.
640+
- **num_blocks**: *(int, optional)*
641+
Total number of blocks per layer.
642+
- **gpu_utilization_factor**: *(float, optional)*
643+
Fraction of free GPU memory to use for key-value cache.
644+
- **max_batch_size**: *(int)*
645+
Maximum batch size for dynamically batching requests. Defaults to 16.
646+
647+
- **static_batching**: *(object, optional)*
648+
Static batching configuration.
649+
- **max_batch_size**: *(int)*
650+
Maximum batch size for static batching. Defaults to 4.
651+
534652
---
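
The defaults listed in the Engine section can be modeled as follows. This is a sketch under stated assumptions: the class and function names are hypothetical, only the field names and default values come from the reference, and the reference does not say whether `dynamic_batching` and `static_batching` may be set together, so the sketch accepts both.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DynamicBatching:
    block_size: int = 256                            # documented default
    num_blocks: Optional[int] = None                 # no default documented
    gpu_utilization_factor: Optional[float] = None   # no default documented
    max_batch_size: int = 16                         # documented default


@dataclass
class StaticBatching:
    max_batch_size: int = 4                          # documented default


def parse_engine(engine: dict):
    """Return (dynamic, static); each is None when its section is absent."""
    dyn = engine.get("dynamic_batching")
    sta = engine.get("static_batching")
    # Unknown keys raise TypeError, which surfaces typos in the config early.
    dynamic = DynamicBatching(**dyn) if dyn is not None else None
    static = StaticBatching(**sta) if sta is not None else None
    return dynamic, static


dynamic, static = parse_engine({"dynamic_batching": {"num_blocks": 64}})
print(dynamic, static)
```

Fields omitted from the JSON fall back to the documented defaults, while optional fields without a documented default stay `None`.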
535653

536654
## Notes
