Context
New models (e.g., lerobot/smolVLA) often use PyTorch operators or control-flow patterns that the torch-mlir and iree-turbine export paths do not yet support. Debugging these export failures by hand is slow and repetitive. We need a tool that attempts to compile a model, captures the specific operator failures (e.g., aten::fft_* ops, aten::complex), and generates a structured "Gap Report" to guide the implementation of missing shims or MLIR lowerings.
Objective
Develop a Model Export Harness (src/tools/model_audit/) that automates the ingestion, tracing, and lowering analysis of arbitrary PyTorch models, using smolVLA as the primary integration test case.
Scope of Work
- Model Harness (harness.py):
  - Integration with transformers / lerobot to load models and automatically generate valid dummy inputs (shapes/types) for tracing.
  - Support for both the torch.export (AOT) and torch_mlir.compile (JIT) export paths.
- Failure Classifier (analyzer.py):
  - Parses torch-mlir diagnostic logs to identify the root cause of an export failure.
  - Classifies errors into MISSING_OP, TYPE_MISMATCH, or DYNAMIC_SHAPE_ERROR.
- Gap Reporter (reporter.py):
  - Outputs a shim_requirements.yaml listing the specific aten::* ops that need to be decomposed or registered in the compiler backend.
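The analyzer and reporter stages could be prototyped together as a small log-to-YAML pipeline. This is a minimal sketch that assumes diagnostics arrive as plain text lines. The regex patterns, the classify_log and render_shim_requirements names, and the YAML schema (op name plus occurrence count) are all hypothetical. Only the three category names come from this spec.

```python
import json
import re

# Illustrative patterns only: diagnostic wording differs across torch-mlir and
# torch.export versions, so these regexes are assumptions to tune on real logs.
PATTERNS = [
    ("MISSING_OP", re.compile(r"failed to legalize operation '(?:torch\.)?(aten\.\w+)")),
    ("MISSING_OP", re.compile(r"unsupported (?:op|operator)[:\s]+'?(aten::\w+)", re.IGNORECASE)),
    ("TYPE_MISMATCH", re.compile(r"(?:type|dtype) mismatch", re.IGNORECASE)),
    ("DYNAMIC_SHAPE_ERROR", re.compile(r"dynamic shape|data-dependent|symbolic shape", re.IGNORECASE)),
]


def classify_log(log_text):
    """Return (category, op_or_None) for every log line matching a pattern.

    The first matching pattern wins for each line; only the MISSING_OP
    patterns capture an op name.
    """
    findings = []
    for line in log_text.splitlines():
        for category, pattern in PATTERNS:
            match = pattern.search(line)
            if match:
                op = match.group(1) if match.groups() else None
                if op and op.startswith("aten."):
                    # MLIR spells ops as aten.name; ATen uses aten::name.
                    op = "aten::" + op[len("aten."):]
                findings.append((category, op))
                break
    return findings


def render_shim_requirements(findings):
    """Render a shim_requirements.yaml body from classifier findings.

    Each op name is emitted as a JSON string, which is also a valid YAML
    double-quoted scalar, so no YAML library is needed.
    """
    counts = {}
    for category, op in findings:
        if category == "MISSING_OP" and op:
            counts[op] = counts.get(op, 0) + 1
    if not counts:
        return "missing_ops: []\n"
    lines = ["missing_ops:"]
    for op in sorted(counts):
        lines.append(f"  - op: {json.dumps(op)}")
        lines.append(f"    occurrences: {counts[op]}")
    return "\n".join(lines) + "\n"
```

Writing the YAML by hand keeps the reporter dependency-free. If PyYAML is already in the environment, yaml.safe_dump on an equivalent dict would serve equally well.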
Acceptance Criteria (Definition of Done)
We define success by the tool's ability to identify gaps in smolVLA and other reference models:
Test 1: Auto-Input Generation
- Input: lerobot/smolVLA (or a mock VLA model class).
- Condition: Run harness.py.
- Success: The tool infers input shapes (image + text tokens) and executes the model.forward() pass in eager mode without crashing.
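One way the harness could infer inputs for Test 1 is to derive shape/dtype specs from the model config before materializing any tensors. This sketch assumes hypothetical config keys (image_size, num_channels, max_seq_len) and a hypothetical infer_vla_input_specs helper. Real transformers/lerobot configs name these fields differently for each model family. Keeping the sketch torch-free means the shape logic can be unit-tested on its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorSpec:
    """Shape/dtype/fill description of one dummy input, independent of torch."""
    name: str
    shape: tuple
    dtype: str  # a torch dtype name, e.g. "float32" or "int64"
    fill: int = 0


def infer_vla_input_specs(config, batch_size=1):
    """Derive dummy-input specs for an image + text-token model.

    The config keys read here are hypothetical; missing keys fall back to
    common defaults (224x224 RGB images, 32 text tokens).
    """
    image_size = config.get("image_size", 224)
    channels = config.get("num_channels", 3)
    seq_len = config.get("max_seq_len", 32)
    return [
        # NCHW image batch; zeros are enough for tracing.
        TensorSpec("pixel_values", (batch_size, channels, image_size, image_size), "float32"),
        # Token id 0 is always inside the vocabulary.
        TensorSpec("input_ids", (batch_size, seq_len), "int64"),
        # All-ones mask: attend to every token.
        TensorSpec("attention_mask", (batch_size, seq_len), "int64", fill=1),
    ]


# With torch available, each spec could be materialized as:
#   torch.full(spec.shape, spec.fill, dtype=getattr(torch, spec.dtype))
```

The harness would then feed the materialized tensors to model.forward() in eager mode, as Test 1 requires, before attempting any export.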
Test 2: Missing Operator Detection
- Input: A mock model containing an unsupported op (e.g., aten::complex or a specific unsupported FFT op).
- Condition: Run the export harness.
- Success: The tool catches the crash/exception and outputs a JSON report identifying the specific missing op name.
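The capture step in Test 2 might look like the following sketch. Here export_fn stands in for whichever exporter the harness calls, for instance a thin wrapper around torch.export.export. The regex assumes the failing op appears in the exception text as aten::name or aten.name. The run_export_with_report name and the report fields are hypothetical.

```python
import json
import re

# Hypothetical pattern: assumes the failing op name appears in the exception
# message as "aten::name" or "aten.name"; real messages vary by exporter.
_OP_RE = re.compile(r"aten(?:::|\.)([A-Za-z_]\w*)")


def run_export_with_report(model_name, export_fn, *args):
    """Call export_fn(*args) and return a JSON report instead of crashing."""
    try:
        export_fn(*args)
    except Exception as exc:  # any exporter failure is captured, not re-raised
        message = str(exc)
        match = _OP_RE.search(message)
        report = {
            "model": model_name,
            "status": "FAILED",
            "error_type": type(exc).__name__,
            # Overload suffixes such as ".default" are dropped by the regex.
            "missing_op": f"aten::{match.group(1)}" if match else None,
            "message": message.splitlines()[0] if message else "",
        }
        return json.dumps(report, indent=2)
    return json.dumps({"model": model_name, "status": "SUPPORTED"}, indent=2)
```

Returning a report on both paths means the same entry point can serve Test 2 (failure) and the regression check in Test 4 (success).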
Test 3: Shim Spec Generation
- Input: The full smolVLA model (assuming the current compiler stack fails on it).
- Condition: Run the full audit suite.
- Success: Generates artifacts/smolVLA_gaps.yaml containing:
  - missing_ops: List of unsupported ATen operators.
  - locations: Stack traces pointing to where these ops are used in the model code.
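For the locations field, one stdlib-only approach is to keep only the traceback frames that fall inside the model's own source tree. This is a sketch, and user_frames and its source_prefix filter are hypothetical. When export gets far enough to produce a graph, FX nodes from torch.export usually carry a stack_trace entry in node.meta, which may give more precise per-op locations than the exception traceback.

```python
import traceback


def user_frames(exc, source_prefix):
    """Return "file:line in func" for traceback frames under source_prefix.

    source_prefix is a hypothetical filter (e.g. the installed lerobot
    package directory) that drops frames from torch / torch-mlir internals.
    """
    frames = traceback.extract_tb(exc.__traceback__)
    return [
        f"{frame.filename}:{frame.lineno} in {frame.name}"
        for frame in frames
        if frame.filename.startswith(source_prefix)
    ]
```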
Test 4: Successful Lowering (Regression)
- Input: A simple ResNet18 (known to be supported).
- Condition: Run the harness.
- Success: Returns status: SUPPORTED and saves the valid .mlir file to artifacts/.