GinS is an automated system for generating and validating CUDA kernels from PyTorch operations. It uses AI-powered iterative refinement to convert high-level PyTorch operations into optimized CUDA kernels with comprehensive validation.
- Automated CUDA Kernel Generation: Converts PyTorch operations to equivalent CUDA kernels
- Iterative Validation & Refinement: AI-powered error fixing with up to 5 attempts per operation
- Multi-LLM Support: Works with Google Gemini, OpenAI, and Ollama models
- Complete PyTorch Integration: Generates kernels as PyTorch C++ extensions
- Robust Error Handling: Comprehensive validation covering compilation and correctness
- Benchmark Processing: Handles sequences of operations with shared execution context
- Detailed Logging: Saves all attempts, feedback, and results for analysis
The system follows a 3-stage pipeline for each operation:
- Monitor (`src/monitor.py`): Profiles PyTorch operations to capture ATen calls and CUDA kernel information
- Generate (`src/generator.py`): Uses LLMs to generate CUDA kernel code based on profiling data
- Verify (`src/verifier.py`): Compiles and validates generated kernels against ground truth
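The per-operation loop that ties these stages together looks roughly like the sketch below. Here `monitor`, `generate`, and `verify` are placeholder stubs, not the actual APIs of the modules above, which may differ:

```python
# Minimal sketch of the per-operation pipeline; monitor/generate/verify are
# placeholder stubs standing in for src/monitor.py, src/generator.py, and
# src/verifier.py.
MAX_ATTEMPTS = 5

def monitor(op, ctx):
    # Profile the PyTorch op to capture ATen calls and CUDA kernel info.
    return {"op": op, "aten_calls": [], "cuda_kernels": []}

def generate(profile, feedback):
    # Ask the LLM for kernel source, feeding back the previous error if any.
    return "// generated CUDA kernel source"

def verify(kernel_src, op, ctx):
    # Compile the kernel and compare its output against the ground truth.
    return True, None  # (success, error_feedback)

def process_operation(op, ctx):
    profile = monitor(op, ctx)
    feedback = None
    for _ in range(MAX_ATTEMPTS):
        kernel_src = generate(profile, feedback)
        ok, feedback = verify(kernel_src, op, ctx)
        if ok:
            return kernel_src
    return None  # all attempts failed
```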
- Python ≥ 3.12
- PyTorch (CUDA-enabled build)
- NVIDIA GPU with CUDA support
- LLM provider of your choice:
  - Google Gemini API key - required for Gemini-based models
  - Ollama - required for running local models
  - OpenAI API key - required for OpenAI-based models
| Component | Minimum Version | Reason / Notes |
|---|---|---|
| NVCC | ≥ 12.1 | Required for full C++17 support with PyTorch 2.x |
| GCC | ≥ 11.x | Compatible with NVCC 12.1 toolchain |
| PyTorch | ≥ 2.0 | Required for modern extension APIs and C++17 |
| CUDA Driver | ≥ 12.0 | Must support CUDA 12.x toolkit |
Note: if generated code fails to compile, the cause is most likely a compatibility issue with one of these components; update them first.
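To see what your environment actually provides, a quick check (assumes a CUDA-enabled PyTorch build):

```python
import torch

# Report the CUDA toolchain PyTorch was built against and sees at runtime.
print("PyTorch:", torch.__version__)
print("CUDA toolkit (build):", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```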
1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd GinS
   ```

2. Install dependencies:

   ```bash
   python -m venv env
   source env/bin/activate
   pip install -r requirements.txt
   ```

3. Set up API keys (for Gemini):

   ```bash
   export GOOGLE_API_KEY="your-api-key-here"
   ```

4. Install Ollama (optional, for local models):

   ```bash
   # Follow Ollama installation instructions for your platform
   ollama pull llama3.2:latest
   ```
Run the system on a benchmark file:

```bash
python -m src.main benchmarks/initial_testing.json
```

Create a JSON file with the following structure:

```json
[
{
"name": "program1",
"operations": [
{
"assignment": "c",
"operation": "torch.matmul",
"inputs": ["a", "b"]
},
{
"assignment": "d",
"operation": "torch.sin",
"inputs": ["c"]
}
],
"definitions": [
{"variable": "a", "value": "torch.randn(2048, 2048, device=\"cuda\")"},
{"variable": "b", "value": "torch.randn(2048, 2048, device=\"cuda\")"}
]
}
]
```
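For reference, this benchmark is equivalent to the following eager PyTorch program; the definitions initialize the context, then the operations run in order, sharing it:

```python
import torch

# Eager-mode equivalent of the "program1" benchmark above.
a = torch.randn(2048, 2048, device="cuda")  # definition of "a"
b = torch.randn(2048, 2048, device="cuda")  # definition of "b"
c = torch.matmul(a, b)  # operation 1, assigned to "c"
d = torch.sin(c)        # operation 2, reuses "c" from the shared context
```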
Key configuration options in `src/main.py`:

- `MAX_ATTEMPTS = 5`: Maximum retry attempts per operation
- `OUTPUT_DIR = "generated_kernels"`: Directory for output files
```
GinS/
├── src/
│   ├── main.py                  # Main pipeline orchestrator
│   ├── monitor.py               # PyTorch operation profiling
│   ├── generator.py             # LLM-based code generation
│   ├── verifier.py              # Kernel validation and testing
│   └── prompts/
│       └── prompts.py           # LLM system prompts
├── benchmarks/
│   └── initial_testing.json     # Example benchmark file
├── generated_kernels/           # Output directory for results
├── requirements.txt             # Python dependencies
└── run.sh                       # Example run script
```
- Load Benchmark: Parse the JSON file with operations and variable definitions
- Initialize Context: Set up the execution environment with the defined variables
- Process Operations: For each operation:
  - Profile: Monitor PyTorch execution to capture ATen/kernel information
  - Generate: Use an LLM to create CUDA kernel code
  - Validate: Compile and test against the ground truth
  - Refine: If validation fails, use AI to fix errors and retry (default: up to 5 attempts)
- Save Results: Store final kernels and logs
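A hypothetical sketch of that benchmark-level loop, using the JSON schema shown earlier (the real orchestration lives in `src/main.py` and also profiles, generates, and verifies a kernel at each step):

```python
import json
import torch

# Sketch only: loads a benchmark, builds the shared context, and runs the
# ground-truth operations in order. Function and variable names are
# illustrative, not the actual src/main.py API.
def run_benchmark(path):
    with open(path) as f:
        programs = json.load(f)
    for program in programs:
        ctx = {"torch": torch}
        # Initialize the shared execution context from "definitions".
        for d in program["definitions"]:
            ctx[d["variable"]] = eval(d["value"], ctx)
        # Run each operation in order; results land back in the context,
        # so later operations (e.g. torch.sin(c)) can reuse them.
        for op in program["operations"]:
            call = f'{op["operation"]}({", ".join(op["inputs"])})'
            ctx[op["assignment"]] = eval(call, ctx)
```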
The system generates several types of output files:
- `*_inputs.pt`: Input tensors for validation
- `*_gold.pt`: Ground truth output tensors
- `*_iter{N}.log`: Logs for each validation attempt
- `*_kernel_final.cu`: Final generated kernel (if validation succeeded)
- `*_kernel_final_FAILED.cu`: Final kernel (if all attempts failed)
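Assuming the `.pt` files are standard `torch.save` artifacts, they can be inspected directly; the file names below are hypothetical examples, so substitute a real prefix from your `generated_kernels/` directory:

```python
import torch

# Inspect saved validation artifacts (hypothetical file names).
inputs = torch.load("generated_kernels/program1_op0_inputs.pt")
gold = torch.load("generated_kernels/program1_op0_gold.pt")
print(type(inputs))
print(gold.shape, gold.dtype, gold.device)
```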
The included `benchmarks/initial_testing.json` demonstrates:

- Matrix multiplication (`torch.matmul`)
- Element-wise operations (`torch.sin`)
- Sequential operations with shared context
- CUDA not available: Ensure PyTorch is installed with CUDA support
- Compilation errors: Check that generated kernels follow PyTorch C++ extension requirements (NVCC, GCC, etc.)
- API key issues: Verify that the API key is set correctly for hosted models, or that Ollama is running in a separate terminal for local models
- Memory issues: Reduce tensor sizes in benchmark definitions
Enable detailed logging by modifying the logging level in `src/verifier.py`:

```python
logging.basicConfig(level=logging.DEBUG)
```