Support workload group ONNXPY (Python-generated ONNX) #329

@ssapreTT

Description

Summary

Add a new workload group ONNXPY that behaves like ONNX except the ONNX file is produced at run time by running a Python script. The script is invoked with a temporary output path; on success, the workload is executed like ONNX using that file; on failure, the entire workload is skipped. The temporary file is removed after all instances of the workload have been processed.

Workload config YAML (differences from ONNX)

  • No path in instances
    Instance entries do not include a path attribute (unlike ONNX, where each instance has e.g. path: 'onnx/resnet50.onnx').
  • module without @ qualifier
    The module field is a single script path (e.g. onnx/generate_resnet.py), not a qualified name like ResNet@basicresnet.py. It identifies the Python script that will be run to generate the ONNX file.

Otherwise the block looks like an ONNX block: api: ONNXPY, name, basedir, instances (and optionally params).

Example (illustrative):

- api: ONNXPY
  name: RESNET50
  basedir: workloads
  module: onnx/generate_resnet50.py
  instances:
    rn50_b1_hd: { img_height: 1024, img_width: 1024, bs: 1 }
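
For concreteness, here is a minimal sketch of what such a generator script could look like. Only the --output argument and the exit-code contract come from this issue; the placeholder Identity model and the use of the onnx helper API are illustrative assumptions, and a real script would build or export the actual model.

# Illustrative generator script skeleton. Only --output and the exit code
# are specified by this issue; the model built below is a placeholder.
import argparse
import sys

import onnx
from onnx import TensorProto, helper


def main() -> int:
    parser = argparse.ArgumentParser()
    parser.add_argument('--output', required=True,
                        help='path where the generated ONNX file is written')
    args = parser.parse_args()

    # Placeholder single-node graph; a real script would export e.g. ResNet50.
    x = helper.make_tensor_value_info('x', TensorProto.FLOAT, [1, 3, 1024, 1024])
    y = helper.make_tensor_value_info('y', TensorProto.FLOAT, [1, 3, 1024, 1024])
    graph = helper.make_graph([helper.make_node('Identity', ['x'], ['y'])],
                              'placeholder', [x], [y])

    try:
        onnx.save(helper.make_model(graph), args.output)
    except OSError as err:
        print(f'failed to write {args.output}: {err}', file=sys.stderr)
        return 1  # non-zero exit: Polaris skips the whole workload
    return 0


if __name__ == '__main__':
    sys.exit(main())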

Execution flow

  1. Pre-step (once per ONNXPY workload, before running its instances; see the sketch after this list)
  • Generate a temporary file name (e.g. via tempfile.NamedTemporaryFile(delete=False) or equivalent).
  • Run the Python script given by module (resolved relative to basedir), passing the temp path via --output <path>.
  • If the script exits with a non-zero code: skip the entire workload (do not run any instance of this workload).
  • If the script succeeds: record the temp path for this (wlgroup, wlname) and use it as the ONNX path for all instances of this workload.
  2. Per-instance handling (same as ONNX)
  • For each instance of an ONNXPY workload that was not skipped, treat it like ONNX: use the generated temp path as the model path and call the same ONNX pipeline (e.g. onnx2graph(wli, wpath) as at polaris.py line 132).
  • Instance config (e.g. bs, img_height, img_width) is used as today for ONNX (e.g. in wlcfg and downstream stats).
  3. Cleanup
  • After successfully processing all instances of a given ONNXPY workload, remove the temporary file.
  • If the workload was skipped (script failed), there is no temp file to remove.
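
A condensed sketch of this flow using only the standard library; the function names, the map shape, and the .onnx suffix are illustrative choices, not existing Polaris code:

import os
import subprocess
import sys
import tempfile


# Hypothetical pre-step (step 1): run each ONNXPY generator once, building a
# (wlgroup, wlname) -> temp_path map and a set of skipped workloads.
def generate_onnx_files(onnxpy_scripts):
    temp_paths, skipped = {}, set()
    for (wlgroup, wlname), script in onnxpy_scripts.items():
        fd, tmp = tempfile.mkstemp(suffix='.onnx')
        os.close(fd)
        try:
            ok = subprocess.run([sys.executable, script,
                                 '--output', tmp]).returncode == 0
        except OSError:  # e.g. interpreter/script not runnable
            ok = False
        if ok:
            temp_paths[(wlgroup, wlname)] = tmp
        else:
            skipped.add((wlgroup, wlname))
            os.remove(tmp)  # generation failed; drop the unused temp file
    return temp_paths, skipped


# Hypothetical per-instance path resolution (step 2): ONNXPY instances use
# the pre-generated temp path; other APIs keep reading 'path' from config.
def resolve_model_path(wlgroup, wlname, wlins_cfg, temp_paths):
    if wlgroup == 'ONNXPY':
        return temp_paths[(wlgroup, wlname)]
    return wlins_cfg['path']


# Hypothetical cleanup (step 3), run once all instances have been processed.
def cleanup_onnx_files(temp_paths):
    for tmp in temp_paths.values():
        try:
            os.remove(tmp)
        except FileNotFoundError:
            pass  # already removed; nothing to do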

Code / config touchpoints

  • Workload spec / validation
    • Add an ONNXPY workload model (e.g. PYDWorkloadONNXPYModelValidator in ttsim/config/validators.py) with api: Literal['ONNXPY'], no path in instance config, and module as a plain string (no @); see the sketch after this list.
    • Extend AnyWorkload (and simconfig’s workload class table) so ONNXPY is a recognized API and get_instances() for ONNXPY returns instance configs without a path key (path will be supplied at run time).
  • Polaris driver
    • polaris.py:
      • Pre-phase: Before the main experiment loop (e.g. before or at the start of execute_wl_on_dev), for each unique (wlgroup, wlname) where wlgroup == 'ONNXPY': run the script with --output <tempfile>, and build a map (wlgroup, wlname) -> temp_path on success, or mark that workload as skipped on non-zero exit (as sketched under Execution flow above). Optionally filter the workload list so skipped ONNXPY workloads are not iterated.
      • Path resolution: In the loop where wlpath = wlins_cfg['path'] is used (polaris.py around line 321), for ONNXPY use the precomputed temp path from the map instead of reading path from wlins_cfg.
      • Graph construction: In get_wlgraph (polaris.py around line 130), add a branch for wlg == 'ONNXPY' that mirrors the ONNX branch: same onnx2graph(wli, wpath) and same perf/count handling.
      • Cleanup: After the main loop over experiments, for each ONNXPY workload that was run (has an entry in the temp-path map), remove the corresponding temporary file (e.g. os.remove or Path.unlink), and handle errors if the file was already removed.
  • Script invocation
    • Resolve module relative to basedir (and optionally set cwd to repo root or basedir when running the script; document the chosen behavior).
    • Invoke as: python <module_path> --output <temp_file_path>.
    • No other CLI args are specified in this issue; optional follow-up could add passing workload/instance params if needed.
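
As a sketch of the validator bullet above, assuming the existing validators in ttsim/config/validators.py are pydantic v2 models; the real base class and any shared fields may differ, and everything beyond api/module/instances is an assumption about the surrounding schema:

from typing import Any, Literal

from pydantic import BaseModel, field_validator


# Hypothetical ONNXPY validator; field names follow the YAML example above.
class PYDWorkloadONNXPYModelValidator(BaseModel):
    api: Literal['ONNXPY']
    name: str
    basedir: str
    module: str  # plain script path, no '@' qualifier
    instances: dict[str, dict[str, Any]]

    @field_validator('module')
    @classmethod
    def module_is_plain_script(cls, v: str) -> str:
        if '@' in v:
            raise ValueError("ONNXPY module must be a script path without '@'")
        return v

    @field_validator('instances')
    @classmethod
    def instances_have_no_path(cls, v):
        for iname, icfg in v.items():
            if 'path' in icfg:
                raise ValueError(f"instance {iname}: 'path' is not allowed "
                                 "for ONNXPY (it is supplied at run time)")
        return v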

Edge cases / notes

  • Dry run: For --dryrun, do not run the script; either skip ONNXPY entries or show them as “would run script and then ONNX” without creating temp files.
  • Filtering: If --filterwlg / --filterwl / --filterwli exclude some instances, cleanup should still remove the temp file once all in-scope instances of that ONNXPY workload have been processed.
  • Failure handling: Only the script's exit code is used to decide whether to skip the workload; any exception during script execution (e.g. a missing interpreter) can be treated as a failure, and the workload skipped.

Acceptance criteria

  • Config schema allows api: ONNXPY with module (no @) and instances without path.
  • Polaris runs the module script once per ONNXPY workload with --output <tempfile>; on non-zero exit, that workload’s instances are skipped.
  • Each instance of a successful ONNXPY workload is executed like ONNX (same graph and stats path as polaris.py line 130).
  • Temporary file is removed after all instances of that ONNXPY workload have been processed.
  • Dry run does not create or leave temp files.

Optional clarifications (for implementer or follow-up)

  • Working directory: When running the script, should cwd be the repo root, basedir, or the directory containing the script?
  • Extra script arguments: Should the script receive only --output, or also workload/instance params (e.g. from params or instance config)?
  • Params: Should ONNXPY support an optional top-level params block like ONNX/TTSIM for merging into instance config?

@ramamalladiTT
@lyen1
