An automated optimization framework that uses LLMs to suggest compiler pass parameters for Triton GPU kernels. The system implements a closed-loop refinement process that learns from previous attempts to optimize kernel performance.
- PyTorch baseline implementations for matmul and softmax
- Automated benchmarking and performance measurement
- Correctness validation against reference implementations
- Matmul kernel with configurable:
  - Block sizes (M, N, K dimensions)
  - Group size for program ID mapping
  - Pipeline stages
  - Number of warps
- Softmax kernel with configurable:
  - Block size
- Automatic input generation
- Correctness testing with numerical validation
- Performance benchmarking with statistical analysis
- Stability testing across different input sizes
- Stores all kernel versions and optimization attempts
- Tracks parameters, speedup, correctness, and metadata
- Provides history and statistics for analysis
- Structured prompts (a minimal sketch follows this feature list) including:
  - Kernel code
  - Hardware/device information
  - Performance goals and constraints
  - Optimization history
- Heuristic fallback when LLM unavailable
- Iterative refinement with feedback
- Parameter suggestion → testing → scoring → refinement
- Early stopping on good speedup
- Maximum iteration budget
- Comprehensive optimization reports
- Parameter impact analysis
- Stability analysis across input sizes
- Top-performing configurations
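
A minimal sketch of how the structured prompt described above might be assembled. The helper and field names here are illustrative assumptions, not the actual API of llm_optimizer.py:

```python
import json

def build_prompt(kernel_code: str, device_info: dict, history: list) -> str:
    # History entries are assumed to carry "params", "speedup", and "correct" fields.
    recent = history[-5:]  # keep the prompt short: only the latest attempts
    return "\n".join([
        "You are tuning launch/compilation parameters for a Triton GPU kernel.",
        "Goal: maximize speedup over the PyTorch baseline while remaining numerically correct.",
        f"Hardware: {json.dumps(device_info)}",
        "Kernel code:",
        kernel_code,
        "Previous attempts (params, speedup, correct):",
        json.dumps(recent, indent=2),
        "Reply with a JSON object of new parameter values only.",
    ])
```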
```bash
# Install dependencies
pip install -r requirements.txt
```

```bash
# Optimize matmul kernel (default)
python optimizer.py --kernel matmul

# Optimize softmax kernel
python optimizer.py --kernel softmax

# Specify device and iterations
python optimizer.py --kernel matmul --device cuda --max-iterations 20

# With OpenAI API key
python optimizer.py --kernel matmul --api-key YOUR_API_KEY
# Or set environment variable: export OPENAI_API_KEY=your_key
```

```python
from optimizer import KernelOptimizer
# Create optimizer
optimizer = KernelOptimizer(
    kernel_name="matmul",
    device="cuda",
    max_iterations=20,
    llm_api_key="your-api-key",  # Optional
)

# Run optimization
results = optimizer.optimize()
print(f"Best speedup: {results['best_speedup']:.3f}x")
print(f"Best parameters: {results['best_params']}")
```

```python
from test_framework import TestFramework
from triton_kernels import triton_matmul
# Create test framework
framework = TestFramework(device="cuda")
# Test specific parameters
params = {
    "BLOCK_SIZE_M": 128,
    "BLOCK_SIZE_N": 64,
    "BLOCK_SIZE_K": 32,
    "GROUP_SIZE_M": 8,
    "num_stages": 4,
    "num_warps": 8,
}
result = framework.full_test_matmul(params, m=1024, n=1024, k=1024)
print(f"Speedup: {result['speedup']:.3f}x")
print(f"Correct: {result['correct']}")from knowledge_archive import KnowledgeArchive
archive = KnowledgeArchive()
# Get best kernel
best = archive.get_best_kernel("matmul")
print(f"Best speedup: {best['speedup']:.3f}x")
# Get optimization history
history = archive.get_kernel_history("matmul", limit=10)
# Get statistics
stats = archive.get_statistics("matmul")
print(f"Total attempts: {stats['total_attempts']}")compiler-pass-generation/
├── baseline.py # PyTorch baseline implementations
├── triton_kernels.py # Triton kernels with tunable parameters
├── test_framework.py # Testing and benchmarking framework
├── knowledge_archive.py # Storage for optimization results
├── llm_optimizer.py # LLM integration for parameter suggestions
├── optimizer.py # Main optimization loop
├── reporter.py # Report generation system
├── requirements.txt # Python dependencies
└── README.md # This file
- Initialization: Sets up baseline PyTorch functions and Triton kernels with default parameters
- Baseline Benchmarking: Measures baseline performance for comparison
- Optimization Loop (repeats up to max_iterations; a condensed sketch follows this list):
  - LLM suggests new parameter values based on:
    - Current kernel code
    - Hardware characteristics
    - Previous optimization attempts
    - Performance goals
  - Kernel is compiled and tested with new parameters
  - Results are scored (correctness + speedup)
  - Best results are stored in archive
  - Feedback is provided to LLM for next iteration
- Analysis:
  - Parameter impact analysis
  - Stability testing across input sizes
  - Comprehensive reporting
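
In pseudocode, the loop roughly follows the shape below. The helper names (suggest_params, full_test, store) are illustrative assumptions, not the actual APIs of the modules:

```python
def optimize(kernel_name, framework, llm, archive, max_iterations=20, target_speedup=1.5):
    best = {"speedup": 0.0, "params": None}
    feedback = None
    for _ in range(max_iterations):
        params = llm.suggest_params(kernel_name, feedback)        # LLM, or heuristic fallback
        result = framework.full_test(kernel_name, params)         # compile, validate, benchmark
        score = result["speedup"] if result["correct"] else 0.0   # incorrect kernels score zero
        archive.store(kernel_name, params, result)                # persist every attempt
        if score > best["speedup"]:
            best = {"speedup": score, "params": params}
        if score >= target_speedup:                               # early stopping on a good speedup
            break
        feedback = {"params": params, **result}                   # feed results back to the LLM
    return best
```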
Matmul kernel:
- BLOCK_SIZE_M: Block size for M dimension (16, 32, 64, 128)
- BLOCK_SIZE_N: Block size for N dimension (16, 32, 64, 128)
- BLOCK_SIZE_K: Block size for K dimension (16, 32, 64)
- GROUP_SIZE_M: Group size for program ID mapping (1, 2, 4, 8)
- num_stages: Number of pipeline stages (1-5)
- num_warps: Number of warps per block (1, 2, 4, 8, 16)
Softmax kernel:
- BLOCK_SIZE: Block size for processing (256, 512, 1024, 2048, 4096)
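
For illustration, a minimal tutorial-style softmax kernel sketch (not necessarily identical to the one in triton_kernels.py) showing how BLOCK_SIZE enters as a compile-time constant:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def softmax_kernel(out_ptr, in_ptr, in_row_stride, out_row_stride, n_cols,
                   BLOCK_SIZE: tl.constexpr):
    # One program instance handles one row; BLOCK_SIZE must be a power of two >= n_cols.
    row = tl.program_id(0)
    cols = tl.arange(0, BLOCK_SIZE)
    mask = cols < n_cols
    x = tl.load(in_ptr + row * in_row_stride + cols, mask=mask, other=-float("inf"))
    x = x - tl.max(x, axis=0)          # numerically stable softmax
    num = tl.exp(x)
    out = num / tl.sum(num, axis=0)
    tl.store(out_ptr + row * out_row_stride + cols, out, mask=mask)

def softmax(x: torch.Tensor, BLOCK_SIZE: int = 1024) -> torch.Tensor:
    n_rows, n_cols = x.shape
    y = torch.empty_like(x)
    softmax_kernel[(n_rows,)](y, x, x.stride(0), y.stride(0), n_cols,
                              BLOCK_SIZE=BLOCK_SIZE)
    return y
```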
The optimizer generates:
- Console output with optimization progress
- Archive files in the archive/ directory:
  - kernels.json: All kernel versions and metadata
  - metadata.json: Optimization statistics
- Reports in the reports/ directory:
  - {kernel}_optimization_report.txt: Comprehensive optimization report
- Python 3.8+
- PyTorch 2.0+
- Triton 2.0+
- CUDA-capable GPU (for optimal performance)
- OpenAI API key (optional, for LLM suggestions)
- Without an OpenAI API key, the system falls back to heuristic-based parameter suggestions (a sketch follows these notes)
- The framework is designed to work with CUDA, but CPU fallback is available
- Optimization results are stored persistently for later analysis
- The system learns from previous attempts to improve suggestions over time
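
A hedged sketch of what such a heuristic fallback could look like: random sampling of unseen configurations from the matmul candidate values listed above. The function name is hypothetical; the real fallback lives in llm_optimizer.py:

```python
import random

MATMUL_SEARCH_SPACE = {
    "BLOCK_SIZE_M": [16, 32, 64, 128],
    "BLOCK_SIZE_N": [16, 32, 64, 128],
    "BLOCK_SIZE_K": [16, 32, 64],
    "GROUP_SIZE_M": [1, 2, 4, 8],
    "num_stages": [1, 2, 3, 4, 5],
    "num_warps": [1, 2, 4, 8, 16],
}

def heuristic_suggest(history: list) -> dict:
    # Avoid re-testing configurations that already appear in the optimization history.
    tried = {tuple(sorted(h["params"].items())) for h in history}
    while True:
        candidate = {name: random.choice(values) for name, values in MATMUL_SEARCH_SPACE.items()}
        if tuple(sorted(candidate.items())) not in tried:
            return candidate
```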