@@ -81,7 +81,7 @@ Below is an example `genai_config.json` for a decoder-only style model:
8181
8282## Configuration structure
8383
84- The configuration file is structured as a JSON object with two main sections: ` model ` and ` search ` .
84+ The configuration file is structured as a JSON object with `model`, `search`, and optional `engine` sections.
8585
8686
8787---
@@ -99,6 +99,9 @@ Top-level configuration object.
9999- ** search** : * (object)*
100100 Generation/search parameters.
101101
102+ - ** engine** : * (object, optional)*
103+ Batching and scheduling configuration for the runtime engine. See [Engine](#engine).
104+
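As a rough sketch of the top-level layout described above, a `genai_config.json` carrying all three sections could look like the following. The empty objects are placeholders for the fields documented in the sections below, not valid standalone values.

```json
{
  "model": {},
  "search": {},
  "engine": {}
}
```

Because `engine` is marked optional, a file that omits the key entirely should also be accepted.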
102105---
103106
104107### Config::Model
@@ -151,6 +154,15 @@ Describes the model architecture, files, and tokenization.
151154- ** decoder_start_token_id** : * (int, optional)*
152155 The id of the decoder start token (for encoder-decoder models).
153156
157+ - ** image_token_id** : * (int, optional)*
158+ Token id used to delimit images in multi-modal models.
159+
160+ - ** video_token_id** : * (int, optional)*
161+ Token id used to delimit video content in multi-modal models.
162+
163+ - ** vision_start_token_id** : * (int, optional)*
164+ Token id used to mark the start of vision content in multi-modal models.
165+
154166- ** vocab_size** : * (int)*
155167 The size of the vocabulary.
156168
@@ -176,12 +188,21 @@ Describes the model architecture, files, and tokenization.
176188
177189#### Model::Encoder
178190
191+ - ** session_options** : * (object, optional)*
192+ See [SessionOptions](#sessionoptions).
193+
194+ - ** run_options** : * (array of [ string, string] pairs, optional)*
195+ Per-run configuration entries applied to the encoder session.
196+
179197- ** filename** : * (string)*
180198 Path to the encoder ONNX file.
181199
182200- ** hidden_size** : * (int)*
183201 Hidden size of the encoder.
184202
203+ - ** num_attention_heads** : * (int)*
204+ Number of attention heads.
205+
185206- ** num_key_value_heads** : * (int)*
186207 Number of key-value heads.
187208
@@ -192,16 +213,26 @@ Describes the model architecture, files, and tokenization.
192213 Size of each attention head.
193214
194215- ** inputs** : * (object)*
195- - ** input_features** : * (string)*
196- Name of the input features tensor.
197216 - ** input_ids** : * (string)*
198217 Name of the input ids tensor.
218+ - ** embeddings** : * (string)*
219+ Name of the input embeddings tensor.
199220 - ** attention_mask** : * (string)*
200221 Name of the attention mask tensor.
222+ - ** position_ids** : * (string)*
223+ Name of the position ids tensor.
224+ - ** audio_features** : * (string)*
225+ Name of the audio features tensor.
201226
202227- ** outputs** : * (object)*
203228 - ** encoder_outputs** : * (string)*
204229 Name of the encoder outputs tensor.
230+ - ** hidden_states** : * (string)*
231+ Name of the encoder hidden states tensor.
232+ - ** cross_present_key_names** : * (string)*
233+ Name pattern for cross-attention present key tensors.
234+ - ** cross_present_value_names** : * (string)*
235+ Name pattern for cross-attention present value tensors.
205236
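To illustrate how the encoder fields fit together, here is a minimal sketch of a `model.encoder` block for a speech-style encoder-decoder model. The filename, the numeric sizes, and every tensor name are illustrative assumptions, as is the `%d` layer-index placeholder in the name patterns. Fields not visible in this part of the reference are omitted, so the fragment is not a complete encoder section.

```json
{
  "model": {
    "encoder": {
      "filename": "encoder.onnx",
      "hidden_size": 384,
      "num_attention_heads": 6,
      "num_key_value_heads": 6,
      "inputs": {
        "audio_features": "audio_features"
      },
      "outputs": {
        "hidden_states": "encoder_hidden_states",
        "cross_present_key_names": "present_key_cross_%d",
        "cross_present_value_names": "present_value_cross_%d"
      }
    }
  }
}
```

Each value under `inputs` and `outputs` should match the corresponding tensor name in the exported ONNX graph, so the strings above would need to be replaced with the names your model actually uses.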
206237---
207238
@@ -210,6 +241,12 @@ Describes the model architecture, files, and tokenization.
210241- ** filename** : * (string)*
211242 Path to the embedding ONNX file.
212243
244+ - ** session_options** : * (object, optional)*
245+ See [SessionOptions](#sessionoptions).
246+
247+ - ** run_options** : * (array of [ string, string] pairs, optional)*
248+ Per-run configuration entries applied to the embedding session.
249+
213250- ** inputs** : * (object)*
214251 - ** input_ids** : * (string)*
215252 Name of the input ids tensor.
@@ -229,31 +266,68 @@ Describes the model architecture, files, and tokenization.
229266- ** filename** : * (string)*
230267 Path to the vision ONNX file.
231268
269+ - ** session_options** : * (object, optional)*
270+ See [SessionOptions](#sessionoptions).
271+
272+ - ** run_options** : * (array of [ string, string] pairs, optional)*
273+ Per-run configuration entries applied to the vision session.
274+
232275- ** config_filename** : * (string, optional)*
233- Path to the vision processor config file.
276+ Path to the vision processor config file. Defaults to `processor_config.json`.
234277
235278- ** adapter_filename** : * (string, optional)*
236279 Path to the vision adapter file.
237280
281+ - ** spatial_merge_size** : * (int, optional)*
282+ Patch merge size used by some models (for example, Qwen2.5-VL). Defaults to 2.
283+
284+ - ** tokens_per_second** : * (float, optional)*
285+ Tokens-per-second parameter used by some models. Defaults to 2.0.
286+
238287- ** inputs** : * (object)*
239288 - ** pixel_values** : * (string)*
240289 Name of the pixel values tensor.
241290 - ** image_sizes** : * (string)*
242291 Name of the image sizes tensor.
292+ - ** image_grid_thw** : * (string)*
293+ Name of the image grid tensor. Defaults to `image_sizes` when not provided.
243294 - ** attention_mask** : * (string)*
244295 Name of the image attention mask tensor.
245296
246297- ** outputs** : * (object)*
247298 - ** image_features** : * (string)*
248299 Name of the image features output tensor.
249300
301+ - ** pipeline** : * (array, optional)*
302+ Ordered list of sub-models for vision pipelines (for example, patch embedding, attention, merge).
303+ - ** filename** : * (string)*
304+ Path to the ONNX file.
305+ - ** session_options** : * (object, optional)*
306+ Session options for this pipeline model.
307+ - ** run_options** : * (array of [ string, string] pairs, optional)*
308+ Run options for this pipeline model.
309+ - ** model_id** : * (string)*
310+ Identifier used to link outputs to subsequent stages.
311+ - ** inputs** : * (array of string)*
312+ Graph input names.
313+ - ** outputs** : * (array of string)*
314+ Graph output names.
315+ - ** run_on_cpu** : * (bool, optional)*
316+ If true, forces this pipeline model onto the CPU execution provider (EP) when multiple EPs are configured.
317+
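The vision fields above can be sketched as a single-file `model.vision` block. This is a hedged illustration only: the ONNX filename and the tensor names are assumptions, and the numeric values simply restate the documented defaults. The `pipeline` array is left out because its exact JSON nesting is not spelled out in this reference.

```json
{
  "model": {
    "vision": {
      "filename": "vision.onnx",
      "config_filename": "processor_config.json",
      "spatial_merge_size": 2,
      "tokens_per_second": 2.0,
      "inputs": {
        "pixel_values": "pixel_values",
        "image_grid_thw": "image_grid_thw"
      },
      "outputs": {
        "image_features": "image_features"
      }
    }
  }
}
```

Since `config_filename`, `spatial_merge_size`, and `tokens_per_second` all have defaults, they could be dropped from the fragment without changing its meaning.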
250318---
251319
252320#### Model::Speech
253321
254322- ** filename** : * (string)*
255323 Path to the speech ONNX file.
256324
325+ - ** session_options** : * (object, optional)*
326+ See [SessionOptions](#sessionoptions).
327+
328+ - ** run_options** : * (array of [ string, string] pairs, optional)*
329+ Per-run configuration entries applied to the speech session.
330+
257331- ** config_filename** : * (string, optional)*
258332 Path to the speech processor config file.
259333
@@ -284,6 +358,9 @@ Describes the model architecture, files, and tokenization.
284358- ** session_options** : * (object)*
285359 See [SessionOptions](#sessionoptions).
286360
361+ - ** run_options** : * (array of [ string, string] pairs, optional)*
362+ Per-run configuration entries applied to the decoder session.
363+
287364- ** hidden_size** : * (int)*
288365 Size of the hidden layers.
289366
@@ -309,6 +386,10 @@ Describes the model architecture, files, and tokenization.
309386 "left" or "right".
310387 - ** slide_key_value_cache** : * (bool)*
311388 Whether to slide the key-value cache.
389+ - ** slide_inputs** : * (bool, optional)*
390+ Whether to slide the input prompt along with the cache.
391+ - ** layers** : * (array of int, optional)*
392+ Layer indices that use sliding window attention.
312393
313394- ** inputs** : * (object)*
314395 - ** input_ids** : * (string)*
@@ -329,20 +410,28 @@ Describes the model architecture, files, and tokenization.
329410 Name for cross-attention past key tensors.
330411 - ** cross_past_value_names** : * (string, optional)*
331412 Name for cross-attention past value tensors.
413+ - ** past_key_values_length** : * (string)*
414+ Name of the past key values length tensor.
332415 - ** current_sequence_length** : * (string)*
333416 Name of the current sequence length tensor.
334417 - ** past_sequence_length** : * (string)*
335418 Name of the past sequence length tensor.
336- - ** past_key_values_length** : * (string)*
337- Name of the past key values length tensor.
338419 - ** total_sequence_length** : * (string)*
339420 Name of the total sequence length tensor.
421+ - ** cache_indirection** : * (string)*
422+ Name of the cache indirection tensor.
340423 - ** encoder_hidden_states** : * (string)*
341424 Name of the encoder hidden states tensor.
342425 - ** rnn_prev_states** : * (string, optional)*
343426 Name of the previous RNN states tensor.
344427 - ** encoder_attention_mask** : * (string, optional)*
345428 Name of the encoder attention mask tensor.
429+ - ** cumulative_sequence_lengths** : * (string, optional)*
430+ Name of the cumulative sequence lengths tensor.
431+ - ** past_sequence_lengths** : * (string, optional)*
432+ Name of the past sequence lengths tensor.
433+ - ** block_table** : * (string, optional)*
434+ Name of the block table tensor.
346435
347436- ** outputs** : * (object)*
348437 - ** logits** : * (string)*
@@ -353,10 +442,8 @@ Describes the model architecture, files, and tokenization.
353442 Name pattern for present value tensors.
354443 - ** present_names** : * (string, optional)*
355444 Name for combined present key/value pairs.
356- - ** cross_present_key_names** : * (string, optional)*
357- Name for cross-attention present key tensors.
358- - ** cross_present_value_names** : * (string, optional)*
359- Name for cross-attention present value tensors.
445+ - ** output_cross_qk_names** : * (string, optional)*
446+ Name pattern for cross-attention QK outputs.
360447 - ** rnn_states** : * (string, optional)*
361448 Name of the RNN states output tensor.
362449
@@ -376,6 +463,9 @@ Describes the model architecture, files, and tokenization.
376463- ** session_options** : * (object, optional)*
377464 Session options for this pipeline model.
378465
466+ - ** run_options** : * (array of [ string, string] pairs, optional)*
467+ Run options for this pipeline model.
468+
379469- ** inputs** : * (array of string)*
380470 List of input tensor names.
381471
@@ -391,6 +481,9 @@ Describes the model architecture, files, and tokenization.
391481- ** run_on_token_gen** : * (bool)*
392482 Whether to run this model during token generation.
393483
484+ - ** is_lm_head** : * (bool, optional)*
485+ True if this pipeline model is the language modeling head.
486+
394487- ** reset_session_idx** : * (int)*
395488 Index of the session to reset for memory management.
396489
@@ -412,39 +505,21 @@ Options passed to ONNX Runtime for model execution.
412505- ** enable_mem_pattern** : * (bool, optional)*
413506 Enable/disable memory pattern optimization.
414507
415- - ** disable_cpu_ep_fallback** : * (bool, optional)*
416- Disable fallback to CPU execution provider.
417-
418- - ** disable_quant_qdq** : * (bool, optional)*
419- Disable quantization QDQ.
420-
421- - ** enable_quant_qdq_cleanup** : * (bool, optional)*
422- Enable quantization QDQ cleanup.
423-
424- - ** ep_context_enable** : * (bool, optional)*
425- Enable execution provider context.
426-
427- - ** ep_context_embed_mode** : * (string, optional)*
428- Execution provider context embed mode.
429-
430- - ** ep_context_file_path** : * (string, optional)*
431- Path to execution provider context file.
432-
433508- ** log_id** : * (string, optional)*
434509 Prefix for logging.
435510
436511- ** log_severity_level** : * (int, optional)*
437512 Logging severity level.
438513
514+ - ** log_verbosity_level** : * (int, optional)*
515+ Logging verbosity level.
516+
439517- ** enable_profiling** : * (string, optional)*
440518 Enable profiling.
441519
442520- ** custom_ops_library** : * (string, optional)*
443521 Path to custom ops library.
444522
445- - ** use_env_allocators** : * (bool)*
446- Use environment allocators.
447-
448523- ** config_entries** : * (array of [ string, string] pairs)*
449524 Additional config entries.
450525
@@ -477,6 +552,24 @@ Options passed to ONNX Runtime for model execution.
477552- ** options** : * (array of [ string, string] pairs)*
478553 Provider-specific options.
479554
555+ - ** device_filtering_options** : * (object, optional)*
556+ Device filtering constraints for this provider.
557+ - ** hardware_device_type** : * (string, optional)*
558+ Hardware type to target (CPU, GPU, NPU).
559+ - ** hardware_device_id** : * (int, optional)*
560+ Hardware device id to target.
561+ - ** hardware_vendor_id** : * (int, optional)*
562+ Hardware vendor id to target.
563+
564+ ---
565+
566+ ### RunOptions
567+
568+ Entries added to `OrtRunOptions` for a specific session run.
569+
570+ - ** run_options** : * (array of [ string, string] pairs)*
571+ Key/value config entries applied to the run.
572+
480573---
481574
482575### Search
@@ -531,6 +624,31 @@ Describes the generation/search parameters.
531624- ** random_seed** : * (int)*
532625 Seed for the random number generator. -1 means use a random device.
533626
627+ - ** chunk_size** : * (int, optional)*
628+ Chunk size for prefill chunking during context processing. Chunking is enabled when the value is greater than 0.
629+
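As a small illustration of the two `search` fields described last, the fragment below turns on prefill chunking and keeps the random-device seed. The chunk size of 1024 is an arbitrary example value, not a recommendation, and the other search parameters are omitted.

```json
{
  "search": {
    "random_seed": -1,
    "chunk_size": 1024
  }
}
```

Leaving `chunk_size` out, or setting it to 0, would keep chunking disabled according to the description above.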
630+ ---
631+
632+ ### Engine
633+
634+ Batching and scheduling settings for the runtime engine.
635+
636+ - ** dynamic_batching** : * (object, optional)*
637+ Dynamic batching configuration.
638+ - ** block_size** : * (int)*
639+ Total number of slots per block. Defaults to 256.
640+ - ** num_blocks** : * (int, optional)*
641+ Total number of blocks per layer.
642+ - ** gpu_utilization_factor** : * (float, optional)*
643+ Fraction of free GPU memory to use for key-value cache.
644+ - ** max_batch_size** : * (int)*
645+ Maximum batch size for dynamically batching requests. Defaults to 16.
646+
647+ - ** static_batching** : * (object, optional)*
648+ Static batching configuration.
649+ - ** max_batch_size** : * (int)*
650+ Maximum batch size for static batching. Defaults to 4.
651+
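The engine settings above might be written as follows for dynamic batching. This is a sketch under stated assumptions: `block_size` and `max_batch_size` restate the documented defaults, while the `gpu_utilization_factor` of 0.9 is an illustrative value chosen for the example.

```json
{
  "engine": {
    "dynamic_batching": {
      "block_size": 256,
      "gpu_utilization_factor": 0.9,
      "max_batch_size": 16
    }
  }
}
```

A static setup would instead use a `static_batching` object with only `max_batch_size`. The reference does not say whether both objects may appear together, so the example shows just one.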
534652---
535653
536654## Notes