Complete reference for job configuration YAML files.
- Overview
- Cluster Config Discovery
- name
- model
- resources
- slurm
- frontend
- backend
- benchmark
- dynamo
- profiling
- output
- health_check
- infra
- sweep
- Config Overrides
- FormattablePath Template System
- container_mounts
- environment
- extra_mount
- sbatch_directives
- srun_options
- setup_script
- enable_config_dump
- Complete Examples
name: "my-benchmark" # Required: job name
model: # Required: model settings
path: "deepseek-r1"
container: "latest"
precision: "fp8"
resources: # Required: GPU allocation
gpu_type: "gb200"
prefill_nodes: 1
decode_nodes: 2
slurm: # Optional: SLURM overrides
time_limit: "02:00:00"
frontend: # Optional: router/frontend config
type: dynamo
backend: # Optional: worker config
type: sglang
sglang_config:
prefill: {}
decode: {}
benchmark: # Optional: benchmark config
type: "sa-bench"
isl: 1024
osl: 1024
dynamo: # Optional: dynamo version
version: "0.8.0"
profiling: # Optional: profiling config
type: "none"
output: # Optional: output paths
log_dir: "./outputs/{job_id}/logs"
health_check: # Optional: health check settings
max_attempts: 180
interval_seconds: 10
setup_script: "my-setup.sh" # Optional: custom setup scriptsrtctl looks for srtslurm.yaml (cluster-wide settings) in this order:
- `SRTSLURM_CONFIG` environment variable (if set) - explicit path to config file
- Current working directory
- Parent directory (1 level up)
- Grandparent directory (2 levels up)
For users working in deep directory structures (e.g., study directories), set SRTSLURM_CONFIG in your shell profile:
# Add to ~/.bashrc or ~/.zshrc
export SRTSLURM_CONFIG="/path/to/srt-slurm/srtslurm.yaml"

This allows you to run `srtctl apply -f config.yaml` from anywhere without needing srtslurm.yaml nearby.
The srtslurm.yaml file can contain the following fields:
| Field | Type | Description |
|---|---|---|
| `default_account` | string | Default SLURM account |
| `default_partition` | string | Default SLURM partition |
| `default_time_limit` | string | Default job time limit |
| `gpus_per_node` | int | Default GPUs per node |
| `network_interface` | string | Network interface for NCCL |
| `srtctl_root` | string | Root directory for srtctl |
| `output_dir` | string | Custom output directory (overrides srtctl_root/outputs) |
| `model_paths` | dict | Model path aliases |
| `containers` | dict | Container image aliases |
| `default_mounts` | dict | Cluster-wide container mounts |
`output_dir`: When set, job logs are written to `output_dir/{job_id}/logs` instead of `srtctl_root/outputs/{job_id}/logs`. Useful for CI/CD and ephemeral environments.
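Putting these fields together, a hypothetical `srtslurm.yaml` might look like the following; all paths, aliases, and values here are illustrative.

```yaml
# srtslurm.yaml (illustrative values)
default_account: "my-account"
default_partition: "batch"
default_time_limit: "04:00:00"
gpus_per_node: 4
network_interface: "eth0"          # interface used for NCCL
srtctl_root: "/home/user/srt-slurm"
output_dir: "/scratch/ci-outputs"  # optional: overrides srtctl_root/outputs
model_paths:
  deepseek-r1: "/models/deepseek-r1"
containers:
  latest: "/containers/sglang-latest.sqsh"
default_mounts:
  "/cluster/special/libs": "/opt/libs"
```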
| Field | Type | Required | Description |
|---|---|---|---|
| `name` | string | Yes | Job name, used for identification and log prefixes |

name: "deepseek-r1-benchmark"

Model and container configuration.
model:
path: "deepseek-r1" # Alias from srtslurm.yaml or full path
container: "latest" # Container alias from srtslurm.yaml
precision: "fp8" # fp8, fp4, bf16, etc.

| Field | Type | Required | Description |
|---|---|---|---|
| `path` | string | Yes | Model path alias (from srtslurm.yaml) or absolute path |
| `container` | string | Yes | Container alias (from srtslurm.yaml) or .sqsh path |
| `precision` | string | Yes | Model precision (informational: fp4, fp8, fp16, bf16) |
GPU allocation and worker topology.
resources:
gpu_type: "gb200"
gpus_per_node: 4 # GPUs per node (default: from srtslurm.yaml)
prefill_nodes: 2 # Nodes for prefill workers
prefill_workers: 4 # Number of prefill workers
decode_nodes: 4 # Nodes for decode workers
decode_workers: 8 # Number of decode workers

resources:
gpu_type: "h100"
gpus_per_node: 8
agg_nodes: 2 # Nodes for aggregated workers
agg_workers: 4 # Number of aggregated workers

| Field | Type | Default | Description |
|---|---|---|---|
| `gpu_type` | string | - | GPU type: "gb200", "gb300", or "h100" |
| `gpus_per_node` | int | 4 | GPUs per node |
| `prefill_nodes` | int | null | Nodes dedicated to prefill |
| `decode_nodes` | int | null | Nodes dedicated to decode |
| `prefill_workers` | int | null | Number of prefill workers |
| `decode_workers` | int | null | Number of decode workers |
| `agg_nodes` | int | null | Nodes for aggregated mode |
| `agg_workers` | int | null | Number of aggregated workers |
| `gpus_per_prefill` | int | computed | Explicit GPUs per prefill worker |
| `gpus_per_decode` | int | computed | Explicit GPUs per decode worker |
| `gpus_per_agg` | int | computed | Explicit GPUs per aggregated worker |
Notes:
- Set `decode_nodes: 0` to have decode workers share nodes with prefill workers.
- Either use disaggregated mode (prefill_nodes/decode_nodes) OR aggregated mode (agg_nodes), not both.
- GPUs per worker are computed automatically: `(nodes * gpus_per_node) / workers` (see the worked example below).
- Use `gpus_per_prefill`, `gpus_per_decode`, `gpus_per_agg` to explicitly override the computed values.
The ResourceConfig provides several computed properties:
- `is_disaggregated`: True if using prefill/decode mode
- `total_nodes`: Total nodes allocated (prefill + decode or agg)
- `num_prefill`, `num_decode`, `num_agg`: Worker counts for each role
- `gpus_per_prefill`, `gpus_per_decode`, `gpus_per_agg`: GPUs allocated per worker
- `prefill_gpus`, `decode_gpus`: Total GPUs for each role
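As a worked example of the automatic computation (reusing the disaggregated snippet above), the per-worker GPU counts come out as follows:

```yaml
resources:
  gpu_type: "gb200"
  gpus_per_node: 4
  prefill_nodes: 2      # 2 nodes * 4 GPUs = 8 prefill GPUs total
  prefill_workers: 4    # gpus_per_prefill = (2 * 4) / 4 = 2
  decode_nodes: 4       # 4 nodes * 4 GPUs = 16 decode GPUs total
  decode_workers: 8     # gpus_per_decode = (4 * 4) / 8 = 2
```

Here `total_nodes` is 6, `prefill_gpus` is 8, and `decode_gpus` is 16.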
SLURM job settings.
slurm:
time_limit: "04:00:00" # Job time limit
account: "my-account" # SLURM account (overrides srtslurm.yaml)
partition: "batch" # SLURM partition (overrides srtslurm.yaml)

| Field | Type | Default | Description |
|---|---|---|---|
| `time_limit` | string | from srtslurm.yaml | Job time limit (HH:MM:SS) |
| `account` | string | from srtslurm.yaml | SLURM account |
| `partition` | string | from srtslurm.yaml | SLURM partition |
Frontend/router configuration.
frontend:
# Frontend type: "dynamo" (default) or "sglang"
type: dynamo
# Scaling
enable_multiple_frontends: true # Enable nginx + multiple routers
num_additional_frontends: 9 # Additional routers (total = 1 + this)
# CLI args passed to the frontend/router
args:
router-mode: "kv" # dynamo: router-mode
policy: "cache_aware" # sglang: policy
no-kv-events: true # boolean flags
# Environment variables for frontend processes
env:
MY_VAR: "value"

| Field | Type | Default | Description |
|---|---|---|---|
| `type` | str | dynamo | Frontend type: "dynamo" or "sglang" |
| `enable_multiple_frontends` | bool | true | Scale with nginx + multiple routers |
| `num_additional_frontends` | int | 9 | Additional routers beyond master |
| `nginx_container` | str | nginx:1.27.4 | Custom nginx container image |
| `args` | dict | null | CLI args for the frontend |
| `env` | dict | null | Env vars for frontend processes |
See SGLang Router for detailed architecture.
Worker configuration and SGLang settings.
backend:
type: sglang # Backend type (currently only sglang)
# Per-mode environment variables
prefill_environment:
TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "1800"
decode_environment:
TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "1800"
aggregated_environment: {}
# SGLang CLI config per mode
sglang_config:
prefill:
tensor-parallel-size: 4
mem-fraction-static: 0.84
kv-cache-dtype: "fp8_e4m3"
disaggregation-mode: "prefill"
# ... any sglang CLI flag
decode:
tensor-parallel-size: 8
mem-fraction-static: 0.83
data-parallel-size: 8
enable-dp-attention: true
aggregated:
# ... for aggregated mode
# KV events (for kv-aware routing)
kv_events_config:
prefill: true # Enable for prefill workers
decode: true # Enable for decode workers

| Field | Type | Default | Description |
|---|---|---|---|
| `type` | string | sglang | Backend type: "sglang" or "trtllm" |
| `gpu_type` | string | null | GPU type override |
| `prefill_environment` | dict | {} | Environment variables for prefill |
| `decode_environment` | dict | {} | Environment variables for decode |
| `aggregated_environment` | dict | {} | Environment variables for aggregated |
| `sglang_config` | object | null | SGLang CLI configuration per mode |
| `kv_events_config` | bool/dict | null | KV events configuration |
Per-mode SGLang server configuration. Any SGLang CLI flag can be specified (use kebab-case or snake_case):
| Common Flags | Type | Description |
|---|---|---|
| `tensor-parallel-size` | int | Tensor parallelism degree |
| `data-parallel-size` | int | Data parallelism degree |
| `expert-parallel-size` | int | Expert parallelism (MoE models) |
| `mem-fraction-static` | float | GPU memory fraction (0.0-1.0) |
| `kv-cache-dtype` | string | KV cache precision (fp8_e4m3, etc.) |
| `context-length` | int | Max context length |
| `chunked-prefill-size` | int | Chunked prefill batch size |
| `enable-dp-attention` | bool | Enable DP attention |
| `disaggregation-mode` | string | "prefill" or "decode" |
| `disaggregation-transfer-backend` | string | Transfer backend ("nixl" or other) |
| `served-model-name` | string | Model name for API |
| `grpc-mode` | bool | Enable gRPC mode |
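Since either naming style is accepted, the two spellings below should be equivalent (a minimal sketch; values are illustrative):

```yaml
sglang_config:
  decode:
    tensor-parallel-size: 8     # kebab-case
    mem_fraction_static: 0.83   # snake_case, same flag as mem-fraction-static
```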
Note: KV events is a Dynamo frontend feature for kv-aware routing. It allows workers to publish cache/scheduling information over ZMQ for the Dynamo router to make intelligent routing decisions.
Enables --kv-events-config for workers with auto-allocated ZMQ ports.
# Enable with defaults
kv_events_config: true # prefill+decode with publisher=zmq, topic=kv-events
# Per-mode control
kv_events_config:
prefill: true
decode: true
aggregated: true # Enable for aggregated workers
# Custom settings
kv_events_config:
prefill:
publisher: "zmq"
topic: "prefill-events"
decode:
topic: "decode-events" # publisher defaults to "zmq"
aggregated: true # Enable for aggregated mode

Each worker leader gets a globally unique port starting at 5550:
| Worker | Port |
|---|---|
| prefill_0 | 5550 |
| prefill_1 | 5551 |
| decode_0 | 5552 |
| decode_1 | 5553 |
When using type: trtllm, the backend uses TRTLLM with MPI-style launching:
backend:
type: trtllm
# Per-mode environment variables
prefill_environment:
CUDA_LAUNCH_BLOCKING: "1"
decode_environment:
CUDA_LAUNCH_BLOCKING: "1"
# TRTLLM CLI config per mode
trtllm_config:
prefill:
mem-fraction-static: 0.8
chunked-prefill-size: 8192
decode:
mem-fraction-static: 0.9

| Field | Type | Default | Description |
|---|---|---|---|
| `type` | string | - | Must be "trtllm" |
| `prefill_environment` | dict | {} | Environment variables for prefill |
| `decode_environment` | dict | {} | Environment variables for decode |
| `trtllm_config` | object | null | TRTLLM CLI configuration per mode |
Key differences from SGLang backend:
- No aggregated mode support (prefill/decode only)
- Uses MPI-style launching (one srun per endpoint with all nodes)
- Uses `trtllm-llmapi-launch` for distributed launching
- Automatically sets `TRTLLM_EPLB_SHM_NAME` with a unique UUID per endpoint
Benchmark configuration. The type field determines which benchmark runner is used and what additional fields are available.
| Type | Description |
|---|---|
| `manual` | No benchmark (default), manual testing mode |
| `sa-bench` | Throughput/latency serving benchmark |
| `sglang-bench` | SGLang bench_serving benchmark |
| `mmlu` | MMLU accuracy evaluation |
| `longbenchv2` | Long-context evaluation benchmark |
| `router` | Router performance with prefix caching |
| `mooncake-router` | KV-aware routing with Mooncake trace |
No benchmark is run. Use for manual testing and debugging.
benchmark:
type: "manual"

Throughput and latency benchmark at various concurrency levels.
benchmark:
type: "sa-bench"
isl: 1024 # Required: Input sequence length
osl: 1024 # Required: Output sequence length
concurrencies: [256, 512] # Required: Concurrency levels to test
req_rate: "inf" # Optional: Request rate (default: "inf")

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `isl` | int | Yes | - | Input sequence length |
| `osl` | int | Yes | - | Output sequence length |
| `concurrencies` | list/string | Yes | - | Concurrency levels (list or "NxM" format) |
| `req_rate` | string/int | No | "inf" | Request rate |
Concurrencies format: Can be a list [128, 256, 512] or x-separated string "128x256x512".
SGLang bench_serving benchmark at various concurrency levels.
benchmark:
type: "sglang-bench"
isl: 1024 # Required: Input sequence length
osl: 1024 # Required: Output sequence length
concurrencies: [256, 512] # Required: Concurrency levels to test
req_rate: "inf" # Optional: Request rate (default: "inf")

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `isl` | int | Yes | - | Input sequence length |
| `osl` | int | Yes | - | Output sequence length |
| `concurrencies` | list/string | Yes | - | Concurrency levels (list or "NxM" format) |
| `req_rate` | string/int | No | "inf" | Request rate |
Concurrencies format: Can be a list [128, 256, 512] or x-separated string "128x256x512".
MMLU accuracy evaluation using sglang.test.run_eval.
benchmark:
type: "mmlu"
num_examples: 200 # Optional: Number of examples
max_tokens: 2048 # Optional: Max tokens per response
repeat: 8 # Optional: Number of repeats
num_threads: 512 # Optional: Concurrent threads

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `num_examples` | int | No | 200 | Number of examples to run |
| `max_tokens` | int | No | 2048 | Max tokens per response |
| `repeat` | int | No | 8 | Number of repeats |
| `num_threads` | int | No | 512 | Concurrent threads |
Long-context evaluation benchmark.
benchmark:
type: "longbenchv2"
max_context_length: 128000 # Optional: Max context length
num_threads: 16 # Optional: Concurrent threads
max_tokens: 16384 # Optional: Max tokens
num_examples: null # Optional: Number of examples (all if null)
categories: # Optional: Task categories
- "multi_doc_qa"
- "single_doc_qa"

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `max_context_length` | int | No | 128000 | Max context length |
| `num_threads` | int | No | 16 | Concurrent threads |
| `max_tokens` | int | No | 16384 | Max tokens |
| `num_examples` | int | No | all | Number of examples |
| `categories` | list[str] | No | all | Task categories to run |
Router performance benchmark with prefix caching. Requires frontend.type: sglang.
benchmark:
type: "router"
isl: 14000 # Optional: Input sequence length
osl: 200 # Optional: Output sequence length
num_requests: 200 # Optional: Number of requests
concurrency: 20 # Optional: Concurrency level
prefix_ratios: [0.1, 0.3, 0.5, 0.7, 0.9] # Optional: Prefix ratios to test

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `isl` | int | No | 14000 | Input sequence length |
| `osl` | int | No | 200 | Output sequence length |
| `num_requests` | int | No | 200 | Number of requests |
| `concurrency` | int | No | 20 | Concurrency level |
| `prefix_ratios` | list/string | No | "0.1 0.3 0.5 0.7 0.9" | Prefix ratios to test |
KV-aware routing benchmark using Mooncake conversation trace.
benchmark:
type: "mooncake-router"
mooncake_workload: "conversation" # Optional: Trace type
ttft_threshold_ms: 2000 # Optional: Goodput TTFT threshold
itl_threshold_ms: 25 # Optional: Goodput ITL threshold

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `mooncake_workload` | string | No | "conversation" | Trace type (see options below) |
| `ttft_threshold_ms` | int | No | 2000 | Goodput TTFT threshold in ms |
| `itl_threshold_ms` | int | No | 25 | Goodput ITL threshold in ms |
Workload options: "mooncake", "conversation", "synthetic", "toolagent"
Dataset characteristics (conversation trace):
- 12,031 requests over ~59 minutes (3.4 req/s)
- Avg input: 12,035 tokens, Avg output: 343 tokens
- 36.64% cache efficiency potential
Dynamo installation configuration.
dynamo:
version: "0.8.0" # Install from PyPI
# OR
hash: "abc123" # Install from git commit
# OR
top_of_tree: true # Install from main branch

| Field | Type | Default | Description |
|---|---|---|---|
| `install` | bool | true | Whether to install dynamo (set false if pre-installed) |
| `version` | string | "0.8.0" | PyPI version |
| `hash` | string | null | Git commit hash (source install) |
| `top_of_tree` | bool | false | Install from main branch |
Notes:
- Set `install: false` if your container already has dynamo pre-installed.
- Only one of `version`, `hash`, or `top_of_tree` should be specified. `hash` and `top_of_tree` are mutually exclusive.
- When `hash` or `top_of_tree` is set, `version` is automatically cleared.
- Source installs (`hash` or `top_of_tree`) clone the repo and build with maturin.
Profiling configuration for nsys or torch profiler.
profiling:
type: "nsys" # "none", "nsys", or "torch"
# Phase-specific profiling step configs
prefill:
start_step: 10 # Step to start profiling
stop_step: 20 # Step to stop profiling
decode:
start_step: 10
stop_step: 20
# OR for aggregated mode:
aggregated:
start_step: 10
stop_step: 20

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `type` | string | No | "none" | Profiling type: "none", "nsys", "torch" |
| `prefill` | object | Disaggregated | null | Prefill phase config |
| `decode` | object | Disaggregated | null | Decode phase config |
| `aggregated` | object | Aggregated | null | Aggregated phase config |
Each phase config has:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `start_step` | int | No | null | Step to start profiling |
| `stop_step` | int | No | null | Step to stop profiling |
- nsys: NVIDIA Nsight Systems profiling. Wraps worker command with `nsys profile`.
- torch: PyTorch profiler. Sets `SGLANG_TORCH_PROFILER_DIR` environment variable.

- Disaggregated mode requires both `prefill` and `decode` phase configs when profiling is enabled.
- Aggregated mode requires `aggregated` phase config when profiling is enabled.
resources:
gpu_type: "h100"
prefill_nodes: 1
prefill_workers: 1
decode_nodes: 1
decode_workers: 1
profiling:
type: "torch"
prefill:
start_step: 5
stop_step: 15
decode:
start_step: 5
stop_step: 15

resources:
gpu_type: "h100"
agg_nodes: 1
agg_workers: 1
profiling:
type: "nsys"
aggregated:
start_step: 10
stop_step: 25

Output configuration with formattable paths.
output:
log_dir: "./outputs/{job_id}/logs"

| Field | Type | Default | Description |
|---|---|---|---|
| `log_dir` | FormattablePath | "./outputs/{job_id}/logs" | Directory for log files |
The log_dir supports FormattablePath templating. See FormattablePath Template System.
Health check configuration for worker readiness.
health_check:
max_attempts: 180
interval_seconds: 10

| Field | Type | Default | Description |
|---|---|---|---|
| `max_attempts` | int | 180 | Maximum health check attempts (180 = 30 minutes) |
| `interval_seconds` | int | 10 | Seconds between health check attempts |
Notes:
- Default of 180 attempts at 10 second intervals = 30 minutes total wait time.
- Large models (e.g., 70B+ parameters) may require the full 30 minutes to load.
- Reduce `max_attempts` for smaller models or faster testing.
Infrastructure configuration for etcd/nats placement.
infra:
etcd_nats_dedicated_node: true

| Field | Type | Default | Description |
|---|---|---|---|
| `etcd_nats_dedicated_node` | bool | false | Reserve first node for infrastructure services |
Notes:
- When `etcd_nats_dedicated_node: true`, the first allocated node is reserved exclusively for etcd and nats services.
- This can improve stability for large-scale deployments by isolating infrastructure services.
- The reserved node is not used for worker processes.
Parameter sweep configuration for running multiple benchmark variations.
sweep:
mode: "zip" # "zip" or "grid"
parameters:
isl: [512, 1024, 2048]
osl: [128, 256, 512]

| Field | Type | Default | Description |
|---|---|---|---|
| `mode` | string | "zip" | Sweep mode: "zip" or "grid" |
| `parameters` | dict | {} | Parameter name to list of values mapping |
- `zip`: Pairs up parameters at matching indices. Parameters must have equal lengths.
  - Example: `isl=[512, 1024], osl=[128, 256]` produces 2 combinations: `{isl: 512, osl: 128}`, `{isl: 1024, osl: 256}`
- `grid`: Cartesian product of all parameter values.
  - Example: `isl=[512, 1024], osl=[128, 256]` produces 4 combinations: `{isl: 512, osl: 128}`, `{isl: 512, osl: 256}`, `{isl: 1024, osl: 128}`, `{isl: 1024, osl: 256}`
Reference sweep parameters in your config using {placeholder} syntax:
benchmark:
type: "sa-bench"
isl: "{isl}" # Replaced by sweep value
osl: "{osl}" # Replaced by sweep value
concurrencies: [128, 256]
sweep:
mode: "grid"
parameters:
isl: [512, 1024, 2048, 4096]
osl: [128, 256, 512]

Config overrides let you define a base config plus multiple variants in a single YAML file. Each variant deep-merges a small set of changes onto the base, and is submitted as an independent SLURM job. This eliminates the need to duplicate entire config files when testing different parameter combinations.
base:
name: "my-benchmark"
resources:
decode_nodes: 8
backend:
sglang_config:
decode:
tp-size: 32
benchmark:
concurrencies: [8192, 10240]
override_tp64:
backend:
sglang_config:
decode:
tp-size: 64
override_small:
resources:
decode_nodes: 4
benchmark:
concurrencies: [4096]

| Key | Description |
|---|---|
| `base` | Required. A complete, valid config (same structure as a normal recipe). |
| `override_<suffix>` | Optional. Partial config merged onto base. `<suffix>` is appended to the job name. |
Override job names are auto-generated: {base.name}_{suffix}.
The example above produces three jobs: my-benchmark, my-benchmark_tp64, and my-benchmark_small.
| Type | Behavior | Example |
|---|---|---|
| Scalar (str/int/bool) | Override replaces base | tp-size: 32 → tp-size: 64 |
| Dict | Recursive merge — only specified keys change | Override sglang_config.decode.tp-size: 64 leaves other decode keys untouched |
| List | Full replacement (no append) | concurrencies: [4096] replaces [8192, 10240] |
| New key | Added to base | Override adds fields base doesn't have |
| `null` value | Deletes the key from base | `extra_mount: null` removes it |
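As an illustration, deep-merging `override_small` from the example above onto `base` yields roughly the following effective config for the job `my-benchmark_small` (a sketch; unchanged fields omitted):

```yaml
name: "my-benchmark_small"   # suffix appended automatically
resources:
  decode_nodes: 4            # scalar replaced by the override
backend:
  sglang_config:
    decode:
      tp-size: 32            # untouched: not specified in the override
benchmark:
  concurrencies: [4096]      # list fully replaced, not appended
```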
Overrides and sweeps can coexist in the same file. Override expansion happens first, then each variant with a sweep: section is expanded via Cartesian product.
base:
name: "combined"
sweep:
chunked_prefill_size: [4096, 8192]
backend:
sglang_config:
prefill:
chunked-prefill-size: "{chunked_prefill_size}"
override_big:
resources:
decode_nodes: 16

This produces 4 jobs: base × 2 sweep + override_big × 2 sweep.
Files without a base top-level key are treated as normal configs — no behavior change.
FormattablePath is a powerful templating system for paths that supports runtime placeholders and environment variable expansion.
FormattablePath ensures that configuration values with placeholders are always explicitly formatted before use, preventing accidental use of unformatted templates.
# Example usage in config
output:
log_dir: "$HOME/logs/{job_id}/{run_name}"
container_mounts:
"$HOME/data": "/data"
"$HOME/logs/{job_id}": "/logs"

| Placeholder | Type | Description | Example |
|---|---|---|---|
| `{job_id}` | string | SLURM job ID | "12345" |
| `{run_name}` | string | Job name + job ID | "my-benchmark_12345" |
| `{head_node_ip}` | string | IP address of head node | "10.0.0.1" |
| `{log_dir}` | string | Resolved log directory path | "/home/user/outputs/12345/logs" |
| `{model_path}` | string | Resolved model path | "/models/deepseek-r1" |
| `{container_image}` | string | Resolved container image path | "/containers/sglang.sqsh" |
| `{gpus_per_node}` | int | GPUs per node | 8 |
FormattablePath also expands environment variables using $VAR or ${VAR} syntax:
output:
log_dir: "$HOME/outputs/{job_id}/logs"
# Expands to: /home/username/outputs/12345/logs

Common environment variables:
- `$HOME` - User home directory
- `$USER` - Username
- `$SLURM_JOB_ID` - SLURM job ID (also available as `{job_id}`)
Some contexts support additional placeholders:
| Placeholder | Context | Description |
|---|---|---|
| `{nginx_url}` | Frontend config | Nginx URL for load balancing |
| `{frontend_url}` | Frontend config | Frontend/router URL |
| `{index}` | Worker config | Worker index |
| `{host}` | Worker config | Worker host |
| `{port}` | Worker config | Worker port |
# Log directory with job ID
output:
log_dir: "./outputs/{job_id}/logs"
# Mount user data into container
container_mounts:
"$HOME/datasets": "/datasets"
"./outputs/{job_id}": "/outputs"
# Custom paths with environment variables
extra_mount:
- "$SCRATCH/cache:/cache"
- "${DATA_DIR}/models:/models:ro"

Custom container mount mappings with FormattablePath support.
container_mounts:
"$HOME/datasets": "/datasets"
"$HOME/outputs/{job_id}": "/outputs"
"/shared/cache": "/cache"

| Key (Host Path) | Value (Container Path) | Description |
|---|---|---|
| FormattablePath | FormattablePath | Host path -> Container mount path |
Both keys and values support FormattablePath templating with placeholders and environment variables.
The following mounts are always added automatically:
| Host Path | Container Path | Description |
|---|---|---|
| Model path | `/model` | Resolved model directory |
| Log directory | `/logs` | Log output directory |
| `configs/` directory | `/configs` | NATS, etcd binaries |
| Benchmark scripts | `/srtctl-benchmarks` | Bundled benchmark scripts |
You can also define cluster-wide mounts in srtslurm.yaml using the default_mounts field. These are applied to all jobs on the cluster, after the built-in defaults but before job-level mounts.
# In srtslurm.yaml
default_mounts:
"/cluster/special/libs": "/opt/libs"
"$SCRATCH": "/scratch"

Environment variables (e.g., $SCRATCH, $HOME) are expanded. This is useful for mounting cluster-specific paths that are required by certain images without adding them to every job config.
Mounts have the following priority (highest to lowest):
- Job-level `container_mounts` - FormattablePath dict (highest priority)
- Job-level `extra_mount` - simple `host:container` strings
- Cluster-level - `default_mounts` from `srtslurm.yaml`
- Built-in defaults - model, logs, configs, benchmark scripts (lowest priority)
Job-level mounts always take precedence over cluster-level and built-in defaults.
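For example, if a cluster-level mount and a job-level mount both target `/cache`, the job-level entry wins per the priority order above (paths are illustrative):

```yaml
# srtslurm.yaml (cluster-level)
default_mounts:
  "/cluster/cache": "/cache"

# job config: container_mounts takes precedence for /cache
container_mounts:
  "$SCRATCH/my-cache": "/cache"
```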
Global environment variables for all worker processes.
environment:
MY_VAR: "value"
CUDA_LAUNCH_BLOCKING: "1"
NCCL_DEBUG: "INFO"

| Key | Value | Description |
|---|---|---|
| string | string | Environment variable name=value |

Environment variable values support per-worker templating with these placeholders:

| Placeholder | Description | Example |
|---|---|---|
| `{node}` | Hostname of the node where the worker runs | "gpu-01" |
| `{node_id}` | Numeric index of the node in worker list (0-based) | 0, 1, 2 |
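For instance, a value can combine both placeholders (the variable name below is hypothetical):

```yaml
environment:
  WORKER_LABEL: "{node}-{node_id}"   # e.g. "gpu-01-0" on the first worker node
```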
Note: For per-worker-mode environment variables, use backend.prefill_environment, backend.decode_environment, or backend.aggregated_environment.
Additional container mounts as a list of mount specifications.
extra_mount:
- "/local/path:/container/path"
- "/data:/data:ro"
- "$HOME/cache:/cache"

| Format | Description |
|---|---|
| `host_path:container_path` | Read-write mount |
| `host_path:container_path:ro` | Read-only mount |
Note: Unlike container_mounts, extra_mount uses simple string format, not FormattablePath. Environment variables are still expanded.
Additional SLURM sbatch directives.
sbatch_directives:
mail-user: "user@example.com"
mail-type: "END,FAIL"
comment: "Benchmark run for paper"
reservation: "my-reservation"
constraint: "volta"
exclusive: "" # Flag without value
gres: "gpu:8"

| Directive | Example Value | Description |
|---|---|---|
| `mail-user` | "user@example.com" | Email for notifications |
| `mail-type` | "END,FAIL" | When to send email (BEGIN,END,FAIL) |
| `comment` | "My job description" | Job comment for tracking |
| `reservation` | "my-reservation" | Use a specific reservation |
| `constraint` | "volta" | Node feature constraint |
| `exclusive` | "" | Exclusive node access (flag) |
| `gres` | "gpu:8" | Generic resource specification |
| `dependency` | "afterok:12345" | Job dependency |
| `qos` | "high" | Quality of service |
Format: Each directive becomes #SBATCH --{key}={value} or #SBATCH --{key} if value is empty.
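So the directives from the example above would render along these lines (illustrative):

```bash
#SBATCH --mail-user=user@example.com
#SBATCH --mail-type=END,FAIL
#SBATCH --exclusive
#SBATCH --gres=gpu:8
```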
Additional srun options for worker processes.
srun_options:
cpu-bind: "none"
mpi: "pmix"
overlap: "" # Flag without value
ntasks-per-node: "1"

| Option | Example Value | Description |
|---|---|---|
| `cpu-bind` | "none" | CPU binding mode (none, cores, sockets) |
| `mpi` | "pmix" | MPI implementation |
| `overlap` | "" | Allow step overlap (flag) |
| `ntasks-per-node` | "1" | Tasks per node |
| `gpus-per-task` | "1" | GPUs per task |
| `mem` | "0" | Memory per node |
Format: Each option becomes --{key}={value} or --{key} if value is empty.
Run a custom script before dynamo install and worker startup.
setup_script: "install-custom-deps.sh"

| Field | Type | Default | Description |
|---|---|---|---|
| `setup_script` | string | null | Script filename (must be in configs/) |
Notes:
- Script must be located in the `configs/` directory.
- Script runs inside the container before dynamo installation.
- Useful for installing custom SGLang versions, additional dependencies, or patches.
Example setup script (configs/install-sglang-main.sh):
#!/bin/bash
pip install --quiet git+https://github.com/sgl-project/sglang.git

Enable dumping worker configuration to JSON for debugging.
enable_config_dump: true

| Field | Type | Default | Description |
|---|---|---|---|
| `enable_config_dump` | bool | true | Dump config JSON for debugging |
When enabled, worker startup commands include --dump-config-to which writes the resolved configuration to a JSON file.
name: "deepseek-r1-disagg"
model:
path: "deepseek-r1"
container: "0.5.6"
precision: "fp8"
resources:
gpu_type: "gb200"
gpus_per_node: 4
prefill_nodes: 2
prefill_workers: 4
decode_nodes: 4
decode_workers: 8
slurm:
time_limit: "04:00:00"
frontend:
type: dynamo
enable_multiple_frontends: true
args:
router-mode: "kv"
backend:
type: sglang
kv_events_config:
prefill: true
prefill_environment:
TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "1800"
decode_environment:
TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "1800"
sglang_config:
prefill:
tensor-parallel-size: 4
mem-fraction-static: 0.84
kv-cache-dtype: "fp8_e4m3"
decode:
tensor-parallel-size: 8
mem-fraction-static: 0.83
data-parallel-size: 8
benchmark:
type: "sa-bench"
isl: 1024
osl: 1024
concurrencies: [128, 256, 512]
health_check:
max_attempts: 180
interval_seconds: 10
dynamo:
version: "0.8.0"

name: "qwen-agg-router"
model:
path: "qwen3-32b"
container: "latest"
precision: "bf16"
resources:
gpu_type: "h100"
gpus_per_node: 8
agg_nodes: 4
agg_workers: 8
slurm:
time_limit: "02:00:00"
frontend:
type: sglang
enable_multiple_frontends: false
args:
policy: "cache_aware"
backend:
type: sglang
sglang_config:
aggregated:
tensor-parallel-size: 4
mem-fraction-static: 0.9
enable-dp-attention: true
benchmark:
type: "router"
isl: 14000
osl: 200
num_requests: 200
prefix_ratios: [0.1, 0.3, 0.5, 0.7, 0.9]

name: "profile-decode"
model:
path: "llama-70b"
container: "latest"
precision: "fp8"
resources:
gpu_type: "h100"
gpus_per_node: 8
prefill_nodes: 1
prefill_workers: 1
decode_nodes: 1
decode_workers: 1
slurm:
time_limit: "01:00:00"
profiling:
type: "torch"
prefill:
start_step: 5
stop_step: 15
decode:
start_step: 5
stop_step: 15
backend:
type: sglang
sglang_config:
prefill:
tensor-parallel-size: 8
decode:
tensor-parallel-size: 8
benchmark:
type: "sa-bench"
isl: 2048
osl: 256
concurrencies: "32x64"
req_rate: "inf"

name: "sweep-throughput"
model:
path: "deepseek-r1"
container: "latest"
precision: "fp8"
resources:
gpu_type: "gb200"
gpus_per_node: 4
prefill_nodes: 1
prefill_workers: 2
decode_nodes: 2
decode_workers: 4
benchmark:
type: "sa-bench"
isl: "{isl}"
osl: "{osl}"
concurrencies: [64, 128, 256]
sweep:
mode: "grid"
parameters:
isl: [512, 1024, 2048, 4096]
osl: [128, 256, 512, 1024]

base:
name: "disagg-fp8-benchmark"
model:
path: "deepseek-r1"
container: "latest"
precision: "fp8"
resources:
gpu_type: "h100"
gpus_per_node: 8
prefill_nodes: 2
prefill_workers: 2
decode_nodes: 8
decode_workers: 8
backend:
sglang_config:
prefill:
tp-size: 8
decode:
tp-size: 8
benchmark:
type: "sa-bench"
isl: 1024
osl: 8192
concurrencies: [8192, 10240]
# Use TP=64 for both prefill and decode
override_tp64:
backend:
sglang_config:
prefill:
tp-size: 64
decode:
tp-size: 64
# Smaller cluster with fewer decode nodes
override_small:
resources:
decode_nodes: 4
decode_workers: 4
benchmark:
concurrencies: [4096]

name: "custom-setup"
model:
path: "$MODELS_DIR/my-model"
container: "$CONTAINERS_DIR/custom.sqsh"
precision: "fp8"
resources:
gpu_type: "h100"
gpus_per_node: 8
agg_nodes: 2
agg_workers: 4
setup_script: "install-custom-sglang.sh"
environment:
CUSTOM_VAR: "value"
NCCL_DEBUG: "INFO"
container_mounts:
"$HOME/datasets": "/datasets"
"$SCRATCH/cache": "/cache"
extra_mount:
- "/shared/data:/data:ro"
sbatch_directives:
mail-user: "user@example.com"
mail-type: "END,FAIL"
reservation: "gpu-cluster"
srun_options:
cpu-bind: "none"
output:
log_dir: "$HOME/experiments/{job_id}/logs"
health_check:
max_attempts: 120
interval_seconds: 15