The Solar package includes comprehensive tests that validate the entire 5-stage analysis pipeline, including its support for kernelbench benchmark models. All test outputs use human-readable YAML without anchors or aliases.
Stage 1: PyTorch Graph Extraction
- Input: model.py
- Output: pytorch_graph.yaml

Stage 2: Einsum Conversion + Rank Renaming
- Input: pytorch_graph.yaml
- Output: einsum_graph.yaml, einsum_graph_renamed.yaml, einsum_graph.pdf (optional)

Stage 3: Hardware-Independent Analysis
- Input: einsum_graph_renamed.yaml
- Output: analysis.yaml

Stage 4: Performance Prediction
- Input: analysis.yaml + arch config
- Output: perf_<arch>.yaml

Stage 5: Timeloop Export (optional)
- Input: einsum_graph_renamed.yaml
- Output: timeloop_graph.yaml
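A small helper can confirm that a run produced every required artifact in the stage list above by checking an output directory against it. This is a minimal sketch. missing_stage_outputs and REQUIRED_OUTPUTS are hypothetical names and are not part of Solar. The optional einsum_graph.pdf and timeloop_graph.yaml are deliberately skipped.

```python
import tempfile
from pathlib import Path

# Hypothetical helper, not part of Solar.
# Required artifacts per stage, copied from the stage list above.
# Optional outputs (einsum_graph.pdf, timeloop_graph.yaml) are left out.
REQUIRED_OUTPUTS = {
    1: ["pytorch_graph.yaml"],
    2: ["einsum_graph.yaml", "einsum_graph_renamed.yaml"],
    3: ["analysis.yaml"],
    4: ["perf_{arch}.yaml"],
}


def missing_stage_outputs(out_dir, arch):
    """Return {stage: [missing file names]} for each stage with missing outputs."""
    out_dir = Path(out_dir)
    missing = {}
    for stage, templates in REQUIRED_OUTPUTS.items():
        names = [t.format(arch=arch) for t in templates]
        absent = [n for n in names if not (out_dir / n).exists()]
        if absent:
            missing[stage] = absent
    return missing


# Usage: only Stage 1 has produced its output so far.
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "pytorch_graph.yaml").touch()
    print(missing_stage_outputs(d, "H100_PCIe"))
    # {2: ['einsum_graph.yaml', 'einsum_graph_renamed.yaml'], 3: ['analysis.yaml'], 4: ['perf_H100_PCIe.yaml']}
```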
tests/
├── conftest.py # Pytest fixtures and configuration
├── test_graph_processing.py # Stage 1: PyTorch graph extraction
├── test_einsum_analyzer.py # Stage 2: Einsum conversion
├── test_model_analyzer.py # Stages 3-4: Analysis + performance
├── test_llm_agent.py # LLM agent and node registry
├── test_standalone_bert.py # Full pipeline on BERT example
└── test_integration.py # End-to-end benchmark tests
# Run all tests (~4-5 minutes)
bash run_tests.sh
# Quick smoke tests (~1 minute)
bash run_tests.sh quick
# Run unit tests only (no integration)
bash run_tests.sh unit
# Run integration tests only
bash run_tests.sh integration
# Run example scripts
bash run_tests.sh examples

# Stage 1: Graph extraction (pytorch_graph.yaml)
bash run_tests.sh graph
# Stage 2: Einsum conversion (einsum_graph.yaml, einsum_graph_renamed.yaml)
bash run_tests.sh einsum
# Stages 3-4: Analysis + performance (analysis.yaml, perf_*.yaml)
bash run_tests.sh model

# LLM agent and node registry
bash run_tests.sh llm
# Standalone BERT example (full 5-stage pipeline)
bash run_tests.sh bert

# Test kernelbench models
bash run_tests.sh kernelbench
# Verbose output
bash run_tests.sh all -v

Alternatively, run pytest directly:

# Run all tests
python3 -m pytest tests/
# Run specific test file
python3 -m pytest tests/test_einsum_analyzer.py -v
# Run tests matching pattern
python3 -m pytest tests/ -k "kernelbench"
python3 -m pytest tests/ -k "Integration"
# With coverage
python3 -m pytest tests/ --cov=solar --cov-report=html

test_graph_processing.py tests PyTorch graph extraction to pytorch_graph.yaml:
- TorchviewProcessor: Core graph extraction using torchview
- PyTorchProcessor: Single-model processing with explicit paths
- BenchmarkProcessor: Batch processing for kernelbench
- RNN model handling with device fallback (meta → cpu)
- Parameter extraction (weights, biases, module args)
Key Tests:
- test_process_graph: End-to-end graph generation
- test_generate_torchview_graph: Torchview integration
- test_is_rnn_model: RNN detection and special handling
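The RNN device fallback (meta → cpu) follows a try-each-device-in-order pattern. The sketch below is a generic illustration of that pattern, not Solar's code. run_with_device_fallback is a hypothetical helper, and fake_trace stands in for whatever tracing call the processor actually makes on each device.

```python
# Hypothetical helper, not part of Solar.
def run_with_device_fallback(fn, devices=("meta", "cpu")):
    """Call fn(device) for each device in order; return (device, result) from the first success."""
    errors = []
    for device in devices:
        try:
            return device, fn(device)
        except Exception as exc:  # e.g. an op with no meta-device implementation
            errors.append(f"{device}: {exc}")
    raise RuntimeError("all devices failed: " + "; ".join(errors))


def fake_trace(device):
    """Hypothetical stand-in for graph extraction that cannot run on meta tensors."""
    if device == "meta":
        raise NotImplementedError("op not supported on meta")
    return f"graph traced on {device}"


print(run_with_device_fallback(fake_trace))  # ('cpu', 'graph traced on cpu')
```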
test_einsum_analyzer.py tests einsum equation generation and conversion to einsum_graph.yaml and einsum_graph_renamed.yaml:
- Dynamic einsum generation: matmul, linear, conv (1D/2D/3D)
- Reduction operations: sum, mean, max, min, prod
- Element-wise operations: relu, sigmoid, add, mul
- Attention operations: scaled_dot_product_attention
- Rank renaming: BFS-based dimension label propagation
- Compute cost: MAC calculation for all operation types
- Memory cost: Element counting for orojenesis/fusion analysis
Key Tests:
- test_matmul: Dynamic matmul einsum (1D-4D)
- test_conv2d: Convolution einsum with stride/padding
- test_torch_prod: Product reduction support
- test_full_model_analysis: Complete model conversion
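Two of the ideas above fit in a few lines each. The first relabels einsum indices to a canonical A, B, C, ... alphabet. The second counts MACs as the product of every distinct index size.

This is a simplified sketch that handles one equation at a time. Solar's rank renaming propagates labels across the whole graph with BFS, and its per-operation cost rules may differ from this generic formula. canonical_rename and einsum_macs are hypothetical names.

Applied to the linear layer in the BERT example, the sketch reproduces the renamed equation ABC,DC->ABD and the 131072 MACs (262144 FLOPs) reported in analysis.yaml.

```python
from math import prod
from string import ascii_uppercase


# Hypothetical helpers, not part of Solar.
def canonical_rename(equation):
    """Relabel einsum indices as A, B, C, ... in order of first appearance."""
    mapping = {}
    out = []
    for ch in equation:
        if ch.isalpha():
            if ch not in mapping:
                mapping[ch] = ascii_uppercase[len(mapping)]
            out.append(mapping[ch])
        else:
            out.append(ch)  # keep ',' and '->' unchanged
    return "".join(out)


def einsum_macs(equation, operand_shapes):
    """MACs of an einsum: product of the sizes of every distinct input index."""
    inputs = equation.split("->")[0].split(",")
    sizes = {}
    for labels, shape in zip(inputs, operand_shapes):
        if len(labels) != len(shape):
            raise ValueError(f"{labels!r} does not match shape {shape}")
        for label, dim in zip(labels, shape):
            if sizes.setdefault(label, dim) != dim:
                raise ValueError(f"inconsistent size for index {label}")
    return prod(sizes.values())


eq = canonical_rename("BMK,NK->BMN")
print(eq)  # ABC,DC->ABD
macs = einsum_macs(eq, [[2, 16, 64], [64, 64]])
print(macs, 2 * macs)  # 131072 262144
```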
test_model_analyzer.py tests hardware-independent analysis (analysis.yaml) and performance prediction (perf_<arch>.yaml):
- EinsumGraphAnalyzer: Compute MACs, FLOPs, orojenesis_bytes, fused_bytes
- EinsumGraphPerfModel: SOL roofline predictions
- Architecture configs: H100_PCIe, A6000, H100_fp32
- LLM agent integration: Dynamic handler generation for unknown ops
- Node registry: Extensible operation handler system
Key Tests:
- test_analyze_graph: Hardware-independent metrics
- test_predict_performance: Roofline modeling
- test_unknown_node_handling: LLM agent fallback
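An SOL roofline prediction takes the slower of the compute-bound time and the memory-bound time. This minimal sketch assumes peak FLOP/s and bandwidth come from an architecture config. The Arch fields and the roofline function are hypothetical. The toy numbers are round values chosen for illustration, not real H100 or A6000 specifications.

```python
from dataclasses import dataclass


# Hypothetical architecture record and roofline helper, not part of Solar.
@dataclass
class Arch:
    name: str
    peak_flops: float     # FLOP/s at the chosen precision
    mem_bandwidth: float  # bytes/s


def roofline(flops, bytes_moved, arch):
    """Return (runtime in ms, limiting resource) under a speed-of-light roofline."""
    compute_s = flops / arch.peak_flops
    memory_s = bytes_moved / arch.mem_bandwidth
    bound = "compute" if compute_s >= memory_s else "memory"
    return max(compute_s, memory_s) * 1e3, bound


toy = Arch("toy", peak_flops=1e12, mem_bandwidth=1e12)
print(roofline(1e12, 1e9, toy))  # (1000.0, 'compute')
print(roofline(1e6, 2e12, toy))  # (2000.0, 'memory')
```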
test_llm_agent.py tests dynamic operation handler generation:
- Agent configuration and initialization
- Code generation for unknown operations
- Handler validation and safety checks
- Caching mechanisms for generated handlers
- Node type registry operations
Key Tests:
- test_agent_initialization: Setup and config
- test_generate_handler: Dynamic code generation
- test_handler_caching: Cache persistence
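The node registry and handler cache work like a dictionary keyed by normalized node type, with the LLM agent consulted only on a miss. This is a minimal in-memory sketch that assumes lowercase normalization. NodeRegistry, register, get, and the generator callable are hypothetical stand-ins. Per the tests listed above, the real agent also validates generated code and persists its cache.

```python
from typing import Callable, Dict

Handler = Callable[[dict], str]


# Hypothetical registry, not Solar's implementation.
class NodeRegistry:
    """Node-type -> handler map; unknown types are generated once and cached."""

    def __init__(self, generator: Callable[[str], Handler]):
        self._handlers: Dict[str, Handler] = {}
        self._generator = generator  # stand-in for the LLM agent
        self.generated = 0           # how many handlers had to be generated

    @staticmethod
    def normalize(node_type: str) -> str:
        # PascalCase and lowercase spellings ("Conv2d", "conv2d") share one entry.
        return node_type.lower()

    def register(self, node_type: str):
        """Decorator that registers a hand-written handler."""
        def decorator(fn: Handler) -> Handler:
            self._handlers[self.normalize(node_type)] = fn
            return fn
        return decorator

    def get(self, node_type: str) -> Handler:
        key = self.normalize(node_type)
        if key not in self._handlers:
            # Cache miss: generate once so later lookups skip the expensive call.
            self._handlers[key] = self._generator(key)
            self.generated += 1
        return self._handlers[key]


registry = NodeRegistry(generator=lambda op: (lambda shapes: f"<generated handler for {op}>"))


@registry.register("Linear")
def linear_handler(shapes):
    return "ABC,DC->ABD"


print(registry.get("linear")({}))  # ABC,DC->ABD
registry.get("GELU")
registry.get("gelu")
print(registry.generated)  # 1
```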
test_standalone_bert.py tests the complete 5-stage pipeline on a BERT-like model:
- Full pipeline: model.py → pytorch_graph.yaml → einsum_graph.yaml → einsum_graph_renamed.yaml → analysis.yaml
- Multi-head attention handling
- Feed-forward network processing
- Embedding layer support
Key Tests:
test_bert_full_pipeline: End-to-end BERT processing
test_integration.py runs end-to-end tests with benchmark suites:
- Kernelbench pipeline: Full directory processing
- Batch processing: Multiple kernels at once
Key Tests:
test_full_kernelbench_pipeline: Kernelbench end-to-end
The examples tests (bash run_tests.sh examples) check that all example scripts run successfully:
- DenseAttention: Full attention matrix computation
- SlidingWindowAttention: Local window attention
- RandomAttention: Random sparse attention
- BlockSparseAttention: Block-sparse attention patterns
- Attention: Multi-head self-attention
- BERT: Complete BERT-like model
Kernelbench conventions:
- File format: {kernel_id}_{name}.py (e.g., 1_ResNet50.py)
- Directory structure: kernelbench/level{N}/
- Output structure: kernelbench_outputs/level{N}/{kernel_id}/
- Node types: PascalCase (e.g., Conv2d, Linear, ReLU)
- Automatic name normalization (PascalCase ↔ lowercase)
- Flexible ID parsing (numeric and string)
- Mixed naming convention support
- Unified analysis pipeline for both benchmark types
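A regular expression can parse the file, directory, and ID conventions above. This sketch assumes the kernel ID is everything before the first underscore. parse_kernel_file, output_dir, and discover are hypothetical helpers, not Solar functions.

```python
import re
from pathlib import Path

# Hypothetical helpers, not part of Solar.
# {kernel_id}_{name}.py: the ID is everything before the first underscore.
KERNEL_FILE = re.compile(r"^(?P<kernel_id>[^_]+)_(?P<name>.+)\.py$")


def parse_kernel_file(path):
    """Return (kernel_id, name); numeric IDs become int, others stay str."""
    match = KERNEL_FILE.match(Path(path).name)
    if match is None:
        raise ValueError(f"not a kernelbench file name: {path}")
    kernel_id = match["kernel_id"]
    return (int(kernel_id) if kernel_id.isdigit() else kernel_id), match["name"]


def output_dir(level, path, root="kernelbench_outputs"):
    """kernelbench_outputs/level{N}/{kernel_id}/ for a given model file."""
    kernel_id, _ = parse_kernel_file(path)
    return Path(root) / f"level{level}" / str(kernel_id)


def _sort_key(path):
    kernel_id, _ = parse_kernel_file(path)
    # Numeric IDs first in numeric order (2 before 10), then string IDs alphabetically.
    return (0, kernel_id, "") if isinstance(kernel_id, int) else (1, 0, kernel_id)


def discover(kernelbench_root, level):
    """List kernelbench/level{N}/*.py model files in kernel-ID order."""
    level_dir = Path(kernelbench_root) / f"level{level}"
    files = [p for p in level_dir.glob("*.py") if KERNEL_FILE.match(p.name)]
    return sorted(files, key=_sort_key)


print(parse_kernel_file("1_ResNet50.py"))  # (1, 'ResNet50')
print(output_dir(1, "1_ResNet50.py"))      # kernelbench_outputs/level1/1 on POSIX systems
```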
Each kernel output directory contains:
level{N}/{kernel_id}/
├── pytorch_graph.yaml # Stage 1 output
├── einsum_graph.yaml # Stage 2 output
├── einsum_graph_renamed.yaml # Stage 2 output (with BFS rank renaming)
├── einsum_graph.pdf # Stage 2 output (optional visualization)
├── analysis.yaml # Stage 3 output
├── perf_<arch>.yaml # Stage 4 output
└── timeloop_graph.yaml # Stage 5 output (optional)
All YAML files use NoAliasDumper for human readability (no &id001 references).
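The no-anchors guarantee can also be checked across an entire output tree, for example after a kernelbench batch run. This is a sketch: find_yaml_aliases is a hypothetical helper. The pattern only targets PyYAML's auto-generated &idNNN/*idNNN labels, so a string value that happens to contain such text would be a false positive.

```python
import re
from pathlib import Path

# Hypothetical helper, not part of Solar.
# PyYAML's auto-generated anchors look like "&id001" and aliases like "*id001".
ANCHOR_OR_ALIAS = re.compile(r"[&*]id\d+")


def find_yaml_aliases(root):
    """Return {path: [matches]} for every *.yaml file under root that contains anchors/aliases."""
    hits = {}
    for path in sorted(Path(root).rglob("*.yaml")):
        found = ANCHOR_OR_ALIAS.findall(path.read_text())
        if found:
            hits[str(path)] = found
    return hits


# e.g. in a test: assert not find_yaml_aliases("kernelbench_outputs")
```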
Tests create sample models dynamically following benchmark conventions:
# Kernelbench-style model
import torch
import torch.nn as nn


class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # padding=3 keeps the 224x224 input at 112x112 after the stride-2 conv
        self.conv1 = nn.Conv2d(3, 64, 7, stride=2, padding=3)
        self.fc = nn.Linear(64 * 112 * 112, 1000)

    def forward(self, x):
        x = self.conv1(x)
        x = x.view(x.size(0), -1)  # flatten to (batch, 64 * 112 * 112)
        return self.fc(x)


def get_inputs():
    """Required function for Solar processing."""
    return [torch.randn(1, 3, 224, 224)]

Common fixtures are defined in conftest.py:
- sample_node_data: Sample node information
- sample_torchview_nodes: Sample graph nodes
- kernelbench_sample_path: Path to test kernelbench model
- tmp_path: Pytest's built-in temporary directory
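The kernelbench-style sample model hardcodes the Linear layer's in_features, so the convolution's output size must match it exactly. The standard nn.Conv2d output-size formula from the PyTorch documentation makes the check explicit. conv_out_size is a hypothetical helper name.

```python
# Hypothetical helper; formula from the PyTorch nn.Conv2d documentation.
def conv_out_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension of nn.Conv2d."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# 224x224 input, kernel 7, stride 2
print(conv_out_size(224, 7, stride=2))             # 109 -> fc would need 64 * 109 * 109 inputs
print(conv_out_size(224, 7, stride=2, padding=3))  # 112 -> matches 64 * 112 * 112
```

Without padding, the flattened tensor would have 64 * 109 * 109 = 760384 features, and the forward pass would fail with a shape mismatch in the Linear layer.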
pytorch_graph.yaml (Stage 1):
model_name: BERT
layers:
  Model.linear:
    type: linear
    node_class: FunctionNode
    input_shapes:
    - - 2
      - 16
      - 64
    output_shapes:
    - - 2
      - 16
      - 64
    weight_nodes:
    - weight
    - bias
    weight_shapes:
    - - 64
      - 64
    - - 64
    module_args:
      in_features: 64
      out_features: 64
    connections:
      inputs: []
      outputs: []

einsum_graph_renamed.yaml (Stage 2 - with BFS rank renaming):
model_name: BERT
layers:
  start:
    type: start
    einsum_equation: ->ABC
    is_real_einsum: false
    is_einsum_supportable: false
    shapes:
      Output:
      - 2
      - 16
      - 64
    connections:
      inputs: []
      outputs:
      - Model.linear
  Model.linear:
    type: linear
    einsum_equation: ABC,DC->ABD
    is_real_einsum: true
    is_einsum_supportable: true
    shapes:
      Input:
      - 2
      - 16
      - 64
      Weight:
      - 64
      - 64
      Output:
      - 2
      - 16
      - 64
    connections:
      inputs:
      - start
      outputs: []

analysis.yaml (Stage 3):
model_name: BERT
total:
  macs: 131072
  flops: 262144
  orojenesis_bytes: 24640
  fused_bytes: 16448
layers:
  Model.linear:
    macs: 131072
    flops: 262144
    orojenesis_bytes: 24640
    fused_bytes: 16448

perf_H100_PCIe.yaml (Stage 4):
arch: H100_PCIe
precision: fp32
total:
  unfused_runtime_ms: 0.0012
  fused_runtime_ms: 0.0008
layers:
  Model.linear:
    unfused_runtime_ms: 0.0012
    fused_runtime_ms: 0.0008

When adding new operation support or features:
import pytest
import yaml


class TestNewFeature:
    """Tests for new feature."""

    def test_stage1_graph_extraction(self, tmp_path):
        """Test graph extraction produces valid pytorch_graph.yaml."""
        from solar.graph import PyTorchProcessor

        # Create test model (replace "..." with a kernelbench-style model that defines get_inputs())
        model_file = tmp_path / "model.py"
        model_file.write_text("...")

        # Process
        processor = PyTorchProcessor()
        success = processor.process_model_file(str(model_file), str(tmp_path))
        assert success

        # Verify pytorch_graph.yaml exists and is valid
        graph_path = tmp_path / "pytorch_graph.yaml"
        assert graph_path.exists()
        with open(graph_path) as f:
            graph = yaml.safe_load(f)
        assert "layers" in graph
        assert "model_name" in graph

    def test_stage2_einsum_conversion(self, tmp_path):
        """Test einsum conversion produces valid einsum_graph.yaml."""
        from solar.einsum import PyTorchToEinsum

        # Create test pytorch_graph.yaml
        # ... convert it with PyTorchToEinsum ...

        # Verify einsum_graph.yaml and einsum_graph_renamed.yaml format
        einsum_path = tmp_path / "einsum_graph.yaml"
        renamed_path = tmp_path / "einsum_graph_renamed.yaml"
        assert einsum_path.exists()
        assert renamed_path.exists()
        with open(renamed_path) as f:
            einsum_graph = yaml.safe_load(f)
        assert "layers" in einsum_graph

        # Verify no YAML anchors/aliases
        content = renamed_path.read_text()
        assert "&id" not in content
        assert "*id" not in content

    def test_kernelbench_support(self):
        """Test feature with kernelbench models (PascalCase)."""
        # Test implementation goes here; skip until written so it cannot pass vacuously
        pytest.skip("TODO: implement kernelbench coverage")

- Test all pipeline stages when adding new operations
- Verify YAML format: no anchors/aliases (&id001, *id001)
- Test kernelbench models (PascalCase node types)
- Use fixtures for common test data and temporary directories
- Mock external dependencies (e.g., LLM API calls) for unit tests
- Include integration tests for end-to-end validation
- Check file outputs: Verify expected files are created with correct structure
- Test error handling: Include tests for invalid inputs and edge cases
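To follow the guideline on mocking external dependencies, unittest.mock can replace the LLM client so unit tests run offline and deterministically. In this sketch, generate_handler and the client.complete(prompt=...) call are hypothetical stand-ins for Solar's agent interface. The only validation shown is a syntax check with compile().

```python
from unittest import mock


# Hypothetical function, not Solar's agent API.
def generate_handler(op_name, client):
    """Ask an LLM client for handler source, then reject code that does not parse."""
    source = client.complete(prompt=f"Write an einsum handler for {op_name}")
    compile(source, f"<handler:{op_name}>", "exec")  # raises SyntaxError on invalid code
    return source


def test_generate_handler_uses_mocked_client():
    client = mock.Mock()
    client.complete.return_value = "def handler(shapes):\n    return 'AB->AB'\n"
    source = generate_handler("Gelu", client)
    assert "def handler" in source
    client.complete.assert_called_once_with(prompt="Write an einsum handler for Gelu")


def test_generate_handler_rejects_invalid_code():
    client = mock.Mock()
    client.complete.return_value = "def handler(:"
    try:
        generate_handler("Gelu", client)
    except SyntaxError:
        return
    raise AssertionError("expected SyntaxError")
```

Because mock.Mock records every call, the test can also assert on the exact prompt sent, without any network access.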
When adding support for a new PyTorch operation:
class TestNewOperation:
    def test_new_operation_einsum(self):
        """Test new operation einsum generation."""
        from solar.einsum import EinsumAnalyzer

        analyzer = EinsumAnalyzer()
        # 3D input (B=1, M=32, K=64) so the shapes match the BMK,NK->BMN equation
        shapes = {"Input": [1, 32, 64], "Weight": [128, 64]}

        # Test einsum generation
        einsum_op = analyzer.get_linear_einsum_op(shapes)
        assert einsum_op.equation == "BMK,NK->BMN"

        # Test compute cost: B * M * N * K MACs
        cost = analyzer.get_compute_cost("Linear", shapes)
        assert cost == 1 * 32 * 128 * 64

    def test_new_operation_full_pipeline(self, tmp_path):
        """Test new operation through full pipeline."""
        # Create model with new operation
        # Run through all 5 stages
        # Verify outputs at each stage
        pass

For CI pipelines, use a tiered approach:
# Tier 1: Quick validation (on every commit)
bash run_tests.sh quick
# Tier 2: Unit tests (on PR)
bash run_tests.sh unit
# Tier 3: Full suite (on merge to main)
bash run_tests.sh all

Example GitHub Actions workflow:

name: Solar Tests

on: [push, pull_request]

jobs:
  quick-tests:
    name: Quick Smoke Tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python 3.8
        uses: actions/setup-python@v4
        with:
          python-version: '3.8'
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-cov
      - name: Run quick tests
        run: |
          bash run_tests.sh quick

  full-tests:
    name: Full Test Suite
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request'
    strategy:
      matrix:
        python-version: ['3.8', '3.9', '3.10', '3.11']
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-cov
      - name: Run all tests with coverage
        run: |
          bash run_tests.sh all
          python3 -m pytest tests/ --cov=solar --cov-report=xml
      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          files: ./coverage.xml

Solar uses a custom NoAliasDumper to ensure all YAML outputs are human-readable:
# Standard PyYAML with anchors (hard to read)
input_shapes: &id001
- - 16
output_shapes: *id001
# Solar with NoAliasDumper (easy to read)
input_shapes:
- - 16
output_shapes:
- - 16

The NoAliasDumper is automatically used for all YAML outputs:
- pytorch_graph.yaml
- einsum_graph.yaml
- einsum_graph_renamed.yaml
- analysis.yaml
- perf_<arch>.yaml
- timeloop_graph.yaml
This makes outputs easier to inspect, diff, and debug, at the cost of slightly larger files.
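PyYAML can produce the anchor-free output shown above if a dumper subclass overrides ignore_aliases. This is a minimal sketch of that common pattern. Solar's actual NoAliasDumper in solar.common.utils may differ, for example in which base dumper it extends.

```python
import yaml


# Sketch of the common pattern; Solar's own class may be implemented differently.
class NoAliasDumper(yaml.SafeDumper):
    """SafeDumper that writes repeated objects out in full instead of as &id001/*id001."""

    def ignore_aliases(self, data):
        return True


shape = [[16]]
layer = {"input_shapes": shape, "output_shapes": shape}  # the same list object twice

default = yaml.dump(layer, Dumper=yaml.SafeDumper, default_flow_style=False)
readable = yaml.dump(layer, Dumper=NoAliasDumper, default_flow_style=False)
print(default)   # anchor on input_shapes, alias on output_shapes
print(readable)  # the list written out twice
```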
- Import Errors
  # Install package in development mode
  pip install -e .

- Missing Dependencies
  # Install all dependencies
  pip install -r requirements.txt
  # For development/testing
  pip install pytest pytest-cov
  # For graph visualization
  pip install graphviz matplotlib

- Test Discovery Issues
  # Run from the solar/ root directory
  cd /path/to/solar
  python3 -m pytest tests/

- Model Loading Failures
  # Check PyTorch and torchview versions (quote the specifiers so the shell does not treat > as a redirect)
  pip install "torch>=2.0.0" "torchview>=0.2.6"
  # For RNN models, ensure CPU fallback is working
  # (Solar automatically handles meta → cpu fallback)

- YAML Anchor/Alias Issues
  # If you see &id001 or *id001 in outputs, ensure NoAliasDumper is used
  # All Solar components should automatically use NoAliasDumper from solar.common.utils

- Graph Visualization Errors
  # Install the graphviz system package
  # Ubuntu/Debian: sudo apt-get install graphviz
  # macOS: brew install graphviz
  # Then install the Python package:
  pip install graphviz
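To diagnose Model Loading Failures, a quick pre-flight check can compare installed versions against the minimums in the pip command above, using the standard library's importlib.metadata. This is a sketch: check_versions, at_least, and release_tuple are hypothetical helpers. Pre-release and local suffixes (such as +cu121) are ignored when comparing.

```python
import re
from importlib import metadata

# Hypothetical helpers, not part of Solar.
# Minimum versions from the pip command above.
MINIMUMS = {"torch": (2, 0, 0), "torchview": (0, 2, 6)}


def release_tuple(version):
    """'2.1.0+cu121' -> (2, 1, 0): keep only the leading dotted numbers."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def at_least(installed, minimum):
    """Compare release numbers, padding with zeros so '2.0' equals '2.0.0'."""
    have = release_tuple(installed)
    width = max(len(have), len(minimum))
    return have + (0,) * (width - len(have)) >= tuple(minimum) + (0,) * (width - len(minimum))


def check_versions(minimums=MINIMUMS):
    """Return human-readable problems; an empty list means all minimums are met."""
    problems = []
    for package, minimum in minimums.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package}: not installed")
            continue
        if not at_least(installed, minimum):
            wanted = ".".join(map(str, minimum))
            problems.append(f"{package}: {installed} installed, need >= {wanted}")
    return problems


print(check_versions() or "torch and torchview versions OK")
```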
Generate coverage reports:
# HTML report
python3 -m pytest tests/ --cov=solar --cov-report=html
open htmlcov/index.html  # macOS; use xdg-open on Linux
# Terminal report
python3 -m pytest tests/ --cov=solar --cov-report=term-missing

Solar includes several example models in examples/:
# Run all examples
bash run_tests.sh examples
# Or run individual examples (each in a subshell so the cd does not carry over to the next line)
(cd examples/DenseAttention && bash run_solar.sh)
(cd examples/SlidingWindowAttention && bash run_solar.sh)
(cd examples/RandomAttention && bash run_solar.sh)
(cd examples/BlockSparseAttention && bash run_solar.sh)
(cd examples/Attention && bash run_solar.sh)
(cd examples/BERT && bash run_solar.sh)

Each example demonstrates:
- Complete 5-stage pipeline
- PDF graph visualization
- Performance prediction on H100
For performance-sensitive components:
import time

import pytest


@pytest.mark.benchmark  # register this marker in pytest.ini if no plugin provides it
def test_einsum_performance():
    """Benchmark einsum generation."""
    from solar.einsum import EinsumAnalyzer

    analyzer = EinsumAnalyzer()
    start = time.perf_counter()  # monotonic, high-resolution timer
    for _ in range(1000):
        analyzer.generate_matmul_einsum([100, 200], [200, 300])
    elapsed = time.perf_counter() - start
    assert elapsed < 1.0  # 1000 calls should complete in under 1 second

Approximate execution times on a typical development machine:
| Test Category | Tests | Time | Command |
|---|---|---|---|
| Quick smoke tests | 2 | ~1 min | bash run_tests.sh quick |
| Graph processing | 10 | ~50 sec | bash run_tests.sh graph |
| Einsum analyzer | 15 | ~37 sec | bash run_tests.sh einsum |
| Model analyzer | 15 | ~39 sec | bash run_tests.sh model |
| LLM agent | 20 | ~39 sec | bash run_tests.sh llm |
| BERT example | 1 | ~44 sec | bash run_tests.sh bert |
| Integration | 6 | ~48 sec | bash run_tests.sh integration |
| Examples | 6 | ~3 min | bash run_tests.sh examples |
| All tests | ~70 | ~5-6 min | bash run_tests.sh |
- Add property-based testing with hypothesis for einsum equation validation
- Implement performance regression tests
- Add mutation testing for critical paths
- Create test data generators for complex transformer models
- Add CI/CD pipeline configuration for automated testing