
TileOPs

Spec-driven GPU operator library for LLMs — designed for AI agents to build, evaluate, and optimize

Built on TileLang

Installation | Quick Start | Docs

Status: TileOPs is under active development. APIs may change.

Overview

TileOPs is a GPU operator library for LLM training and inference, built on TileLang. Beyond providing a growing collection of production-quality operators, TileOPs explores a spec-driven development model where AI agents can read declarative operator specifications, generate kernel implementations, and evaluate them against hardware-theoretical performance bounds — with minimal human scaffolding.

Architecture

Every operator is split into two layers with a strict boundary:

  • Op (L2) — stateless Python entry point. Handles validation, dtype casting, and memory layout. Compatible with CUDA-Graph and torch.compile.
  • Kernel (L1) — TileLang GPU implementation with hardware-specific optimizations (Ampere, Hopper).

This separation keeps user-facing behavior independent of the GPU strategy, allowing agents and developers to modify either layer without affecting the other.
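The two-layer split can be sketched in miniature. This is an illustrative sketch only, not the TileOPs API: the names (`VectorAddOp`, `vector_add_kernel`) are hypothetical, and a NumPy function stands in for a real TileLang kernel.

```python
import numpy as np

def vector_add_kernel(a, b, out):
    # L1 "kernel" layer: in TileOPs this would be a TileLang GPU kernel
    # with hardware-specific tiling; a NumPy stand-in is used here.
    np.add(a, b, out=out)

class VectorAddOp:
    # L2 "op" layer: stateless entry point that owns validation and
    # dtype handling, never the GPU strategy.
    def __init__(self, n, dtype=np.float16):
        self.n, self.dtype = n, dtype

    def __call__(self, a, b):
        assert a.shape == b.shape == (self.n,), "shape mismatch"
        a = a.astype(self.dtype, copy=False)   # dtype casting lives in L2
        b = b.astype(self.dtype, copy=False)
        out = np.empty(self.n, dtype=self.dtype)
        vector_add_kernel(a, b, out)           # dispatch to the L1 kernel
        return out
```

Because the L2 layer never inspects how the kernel is scheduled, an agent can swap in a retuned L1 implementation without touching validation or dtype logic.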

Key Properties

  • Spec-driven — each operator is declared in a machine-readable manifest (ops_manifest.yaml) that specifies signatures, workloads, and roofline formulas, serving as the entry point for both agent code generation and automated validation
  • Roofline-evaluated — kernel performance is measured against Speed-of-Light hardware bounds, not relative baselines
  • Auto-tuning — built-in search over tile sizes, pipelines, and scheduling parameters
  • Lightweight — depends only on TileLang, PyTorch, and einops
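The Speed-of-Light idea in the roofline bullet above can be made concrete. The sketch below is not taken from TileOPs: the peak numbers are nominal H100 SXM figures used purely for illustration, and the function names are hypothetical.

```python
# Assumed peak figures for an H100 SXM (illustrative, not read from TileOPs).
PEAK_FLOPS = 989e12   # FP16 tensor-core FLOP/s
PEAK_BW    = 3.35e12  # HBM3 bytes/s

def speed_of_light_s(flops, bytes_moved):
    # A kernel can be no faster than the slower of its compute limit
    # and its memory-traffic limit.
    return max(flops / PEAK_FLOPS, bytes_moved / PEAK_BW)

# GEMM: C[M,N] = A[M,K] @ B[K,N] in fp16 (2 bytes per element).
M, N, K = 4096, 4096, 4096
flops = 2 * M * N * K                      # one multiply-add = 2 FLOPs
bytes_moved = 2 * (M * K + K * N + M * N)  # read A and B once, write C once
sol = speed_of_light_s(flops, bytes_moved)
```

Reporting measured time as a fraction of `sol` gives an absolute efficiency number, which is what "not relative baselines" means: a kernel at 80% of Speed of Light stays 80% even if a competing library improves.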

Installation

TileOPs can be installed from PyPI or built from source. A CUDA-capable GPU is required.

Prerequisites

  • Python >= 3.10
  • PyTorch >= 2.1
  • CUDA Toolkit
  • NVIDIA GPU: Hopper (SM_90)
  • TileLang == 0.1.8

From PyPI

pip install tileops

From source

git clone https://github.com/tile-ai/TileOPs
cd TileOPs
make install    # dev dependencies + pre-commit hooks

Note

If CUDA and TileLang are already installed system-wide and you encounter build issues:

PIP_NO_BUILD_ISOLATION=1 pip install -e '.[dev]' -v && pre-commit install

Verify:

python -m pytest tests/ -q    # requires a CUDA GPU

Quick Start

import torch
from tileops.ops import GemmOp

M, N, K = 1024, 1024, 512
dtype = torch.float16

gemm = GemmOp(M, N, K, dtype=dtype)

A = torch.randn(M, K, device="cuda", dtype=dtype)
B = torch.randn(K, N, device="cuda", dtype=dtype)

C = gemm(A, B)
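For the shapes above, a back-of-envelope arithmetic-intensity estimate shows what the roofline evaluation would measure this GEMM against. This is pure arithmetic, not a TileOPs call, counting one read of A and B and one write of C at 2 bytes per fp16 element:

```python
# Arithmetic intensity of the Quick Start GEMM (illustrative estimate).
M, N, K = 1024, 1024, 512
flops = 2 * M * N * K                      # one multiply-add = 2 FLOPs
bytes_moved = 2 * (M * K + K * N + M * N)  # fp16 = 2 bytes per element
intensity = flops / bytes_moved            # FLOP per byte
print(intensity)  # → 256.0
```

Whether 256 FLOP/byte lands on the compute or memory side of the roofline depends on the GPU's ridge point (peak FLOP/s divided by peak bandwidth), which is what the Speed-of-Light bound accounts for.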

Documentation

Design docs and development guides are in docs/. The full API reference and performance tables are published at TileOPs.github.io.

Contributing

See workflow.md for branch naming, commit conventions, and the PR process.

License

TileOPs is released under the MIT License.
