12 changes: 9 additions & 3 deletions README.md
@@ -64,12 +64,14 @@ docker create --name aic aiconfigurator:latest && docker cp aic:/workspace/dist
```bash
aiconfigurator cli default --model QWEN3_32B --total_gpus 32 --system h200_sxm
aiconfigurator cli exp --yaml_path exp.yaml
gaiconfigurator cli generate --model_path QWEN3_32B --total_gpus 8 --system h200_sxm
aiconfigurator cli generate --model_path QWEN3_32B --total_gpus 32 --system h200_sxm
aiconfigurator cli support --model_path QWEN3_32B --system h200_sxm
```
- We have three modes: `default`, `exp`, and `generate`.
- We have four modes: `default`, `exp`, `generate`, and `support`.
- Use `default` to find the estimated best deployment by searching the configuration space.
- Use `exp` to run customized experiments defined in a YAML file.
- Use `generate` to quickly create a naive configuration without a parameter sweep.
- Use `support` to check whether AIC supports a model/hardware combination for agg and disagg serving modes.
- Use `--backend` to specify the inference backend: `trtllm` (default), `vllm`, or `sglang`.
- Use `exp` and pass an exp.yaml via `--yaml_path` to customize your experiments, including heterogeneous ones.
- Use `--save_dir DIR` to generate framework configuration files for Dynamo, as sketched below.
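
For example, a minimal sketch that runs the default sweep and also writes Dynamo configuration files (the output directory name is illustrative):

```bash
aiconfigurator cli default --model QWEN3_32B --total_gpus 32 --system h200_sxm --save_dir ./dynamo_configs
```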
@@ -88,7 +90,7 @@ Refer to [CLI User Guide](docs/cli_user_guide.md)
You can also use `aiconfigurator` programmatically in Python:

```python
from aiconfigurator.cli import cli_default, cli_exp, cli_generate
from aiconfigurator.cli import cli_default, cli_exp, cli_generate, cli_support

# 1. Run default agg vs disagg comparison
result = cli_default(model_path="Qwen/Qwen3-32B", total_gpus=32, system="h200_sxm")
@@ -100,6 +102,10 @@ result = cli_exp(yaml_path="my_experiments.yaml")
# 3. Generate a naive configuration
result = cli_generate(model_path="Qwen/Qwen3-32B", total_gpus=8, system="h200_sxm")
print(result["parallelism"]) # {'tp': 1, 'pp': 1, 'replicas': 8, 'gpus_used': 8}

# 4. Check support for a model/system combination
agg, disagg = cli_support(model_path="Qwen/Qwen3-32B", system="h200_sxm")
print(f"Agg supported: {agg}, Disagg supported: {disagg}")
```

An example:
42 changes: 42 additions & 0 deletions docs/cli_user_guide.md
@@ -52,6 +52,48 @@ print(result["parallelism"]) # {'tp': 1, 'pp': 1, 'replicas': 8, 'gpus_used': 8}

> **Note:** This is a naive configuration without memory validation or performance optimization. For production deployments, use `aiconfigurator cli default` to run the full parameter sweep with SLA optimization.

### Support mode
This mode verifies whether AIConfigurator supports a specific model and hardware combination for both aggregated and disaggregated serving. Support is determined by a majority vote over support-matrix test results for models sharing the same architecture.

```bash
aiconfigurator cli support --model_path Qwen/Qwen3-32B --system h200_sxm
```

**Required arguments:**
- `--model_path`: HuggingFace model path (e.g., `Qwen/Qwen3-32B`) or local path containing `config.json`
- `--system`: System name (`h200_sxm`, `gb200_sxm`, `b200_sxm`, `h100_sxm`, `a100_sxm`, `l40s`)

**Optional arguments:**
- `--backend`: Filter by specific backend (`trtllm`, `vllm`, `sglang`). Defaults to `trtllm`.
- `--backend_version`: Filter by a specific backend version. Defaults to the latest version found in the support matrix for the given model/architecture/system/backend combination.
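
For example, to check support against a specific backend version (using the version shown in the output below):

```bash
aiconfigurator cli support --model_path Qwen/Qwen3-32B --system h200_sxm --backend trtllm --backend_version 0.18.0
```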

**Example output:**
```text
============================================================
AIC Support Check Results
============================================================
Model: Qwen/Qwen3-32B
System: h200_sxm
Backend: trtllm
Version: 0.18.0
------------------------------------------------------------
Aggregated Support: YES
Disaggregated Support: YES
============================================================
```

**Python API equivalent:**
```python
from aiconfigurator.cli import cli_support

agg_supported, disagg_supported = cli_support(
    model_path="Qwen/Qwen3-32B",
    system="h200_sxm",
    backend="trtllm"
)
print(f"Agg: {agg_supported}, Disagg: {disagg_supported}")
```
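
When the exact model is not listed in the support matrix, the result is inferred from models sharing the same architecture. The sketch below mirrors the majority-vote threshold the CLI reports (`pass_count > total_count // 2`); the helper name is illustrative, not part of the package:

```python
def majority_vote(pass_count: int, total_count: int) -> bool:
    # Supported when strictly more than half of the matching
    # support-matrix tests passed.
    return total_count > 0 and pass_count > total_count // 2

print(majority_vote(3, 4))  # True: 3/4 passed (>2 required)
print(majority_vote(2, 4))  # False: 2/4 passed (>2 required)
```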

### Default mode
This mode is triggered by
```bash
aiconfigurator cli default --model QWEN3_32B --total_gpus 32 --system h200_sxm
```
2 changes: 2 additions & 0 deletions src/aiconfigurator/cli/__init__.py
@@ -36,11 +36,13 @@
    cli_default,
    cli_exp,
    cli_generate,
    cli_support,
)

__all__ = [
"CLIResult",
"cli_default",
"cli_exp",
"cli_generate",
"cli_support",
]
42 changes: 40 additions & 2 deletions src/aiconfigurator/cli/api.py
@@ -4,8 +4,8 @@
"""
Python API for calling CLI workflows programmatically.

This module provides simple function interfaces to the CLI's "default", "exp", and
"generate" modes, making it easy to use from Python code without going through argparse.
This module provides simple function interfaces to the CLI's "default", "exp",
"generate", and "support" modes, making it easy to use from Python code without going through argparse.
"""

from __future__ import annotations
@@ -25,6 +25,43 @@
from aiconfigurator.cli.report_and_save import save_results
from aiconfigurator.sdk.task import TaskConfig


def cli_support(
    model_path: str,
    system: str,
    *,
    backend: str = "trtllm",
    backend_version: str | None = None,
) -> tuple[bool, bool]:
    """
    Check if AIC supports the model/hardware combo for (agg, disagg).

    Support is determined by a majority vote of PASS status for the given
    architecture, system, backend, and version in the support matrix.

    This is the programmatic equivalent of:
        aiconfigurator cli support --model_path ... --system ...

    Args:
        model_path: HuggingFace model path (e.g., 'Qwen/Qwen3-32B') or local path.
        system: System name (GPU type), e.g., 'h200_sxm', 'b200_sxm'.
        backend: Optional backend name to filter by ('trtllm', 'sglang', 'vllm').
        backend_version: Optional backend database version.

    Returns:
        tuple[bool, bool]: (agg_supported, disagg_supported)
    """
    from aiconfigurator.sdk.common import check_support
    from aiconfigurator.sdk.utils import get_model_config_from_model_path

    # Resolve the architecture so the check can fall back to an
    # architecture-level majority vote when the exact model is not listed.
    try:
        model_info = get_model_config_from_model_path(model_path)
        architecture = model_info[0]
    except Exception:
        architecture = None

    # check_support returns a result object (see _run_support_mode in main.py);
    # unpack it into the documented (agg, disagg) tuple.
    result = check_support(
        model=model_path, system=system, backend=backend, version=backend_version, architecture=architecture
    )
    return result.agg_supported, result.disagg_supported


logger = logging.getLogger(__name__)


@@ -297,4 +334,5 @@ class _MockArgs:
"cli_default",
"cli_exp",
"cli_generate",
"cli_support",
]
99 changes: 99 additions & 0 deletions src/aiconfigurator/cli/main.py
@@ -166,6 +166,37 @@ def _add_generate_mode_arguments(parser):
    )


def _add_support_mode_arguments(parser):
    """Add arguments for the support mode (support matrix check)."""
    parser.add_argument(
        "--model_path",
        type=_validate_model_path,
        required=True,
        help="Model path: HuggingFace model path (e.g., 'Qwen/Qwen3-32B') or "
        "local path to directory containing config.json.",
    )
    parser.add_argument(
        "--system",
        choices=common.SupportedSystems,
        type=str,
        required=True,
        help="System name (GPU type).",
    )
    parser.add_argument(
        "--backend",
        choices=[backend.value for backend in common.BackendName],
        type=str,
        default="trtllm",
        help="Backend name to filter by. Defaults to 'trtllm'.",
    )
    parser.add_argument(
        "--backend_version",
        type=str,
        default=None,
        help="Optional backend version to filter by.",
    )


def configure_parser(parser):
    common_cli_parser = _build_common_cli_parser()
    subparsers = parser.add_subparsers(dest="mode", required=True)
@@ -204,6 +235,15 @@ def configure_parser(parser):
    )
    _add_generate_mode_arguments(generate_parser)

    # Support mode - support matrix check
    support_parser = subparsers.add_parser(
        "support",
        parents=[common_cli_parser],
        help="Check if AIC supports the model/hardware combo for (agg, disagg).",
        description="Verify support for a specific model and system combination using the support matrix.",
    )
    _add_support_mode_arguments(support_parser)


def _get_backend_data_path(system_name: str, backend_name: str, backend_version: str) -> str | None:
    systems_dir = perf_database.get_system_config_path()
@@ -639,6 +679,60 @@ def _run_generate_mode(args):
print("=" * 60 + "\n")


def _run_support_mode(args):
    """Run the support mode to see if a model/hardware combo is supported."""
    model = args.model_path
    system = args.system
    backend = args.backend
    version = args.backend_version

    # If no version is specified, find the latest version in the support matrix
    if not version:
        matrix = common.get_support_matrix()
        versions_for_combo = [
            row["Version"] for row in matrix if row["System"] == system and row["Backend"] == backend
        ]
        if versions_for_combo:
            # Sort versions and take the latest (assumes semantic versioning or lexicographic order)
            version = sorted(set(versions_for_combo), reverse=True)[0]

    logger.info("Checking support for model=%s, system=%s, backend=%s, version=%s", model, system, backend, version)

    # Resolve the architecture for a more precise check
    try:
        model_info = get_model_config_from_model_path(model)
        architecture = model_info[0]
    except Exception:
        architecture = None

    result = common.check_support(
        model=model, system=system, backend=backend, version=version, architecture=architecture
    )

    print("\n" + "=" * 60)
    print(" AIC Support Check Results")
    print("=" * 60)
    print(f" Model: {model}")
    print(f" System: {system}")
    print(f" Backend: {backend}")
    print(f" Version: {version}")
    print("-" * 60)
    print(f" Aggregated Support: {'YES' if result.agg_supported else 'NO'}")
    print(f" Disaggregated Support: {'YES' if result.disagg_supported else 'NO'}")

    # Show an explanation if support was inferred from an architecture majority vote
    if not result.exact_match and result.architecture:
        print("-" * 60)
        print(f" Note: Model '{model}' not found in support matrix.")
        print(f" Support inferred from architecture '{result.architecture}' majority vote:")
        if result.agg_total_count:
            p, t = result.agg_pass_count, result.agg_total_count
            print(f" Aggregated: {p}/{t} passed (>{t // 2} required)")
        if result.disagg_total_count:
            p, t = result.disagg_pass_count, result.disagg_total_count
            print(f" Disaggregated: {p}/{t} passed (>{t // 2} required)")

    print("=" * 60 + "\n")


def main(args):
    logging.basicConfig(
        level=logging.DEBUG if args.debug else logging.INFO,
@@ -652,6 +746,11 @@
        _run_generate_mode(args)
        return

    # Handle support mode separately (no sweeping)
    if args.mode == "support":
        _run_support_mode(args)
        return

if args.mode == "default":
task_configs = build_default_task_configs(
model_path=args.model_path,