[Refactor][Benchmark] Migrate microbench hardware characterization into benchmarks/hardware/ by lcy-seso · Pull Request #639 · tile-ai/TileOPs

lcy-seso · 2026-03-23T12:59:50Z

Closes #630

Summary

Migrate the hardware characterization benchmark suite from tilelang-microbench into benchmarks/hardware/ with memory/, compute/, and system/ subdirectories, plus shared utils/ and results/ directories
Add scripts/generate_gpu_profile.py that reads CSV benchmark results and produces calibrated gpu_profile.yaml files for the roofline model
Include BF16 GEMM throughput benchmark alongside existing FP16 measurement
Generate tileops/perf/profiles/h200.yaml with measured HBM bandwidth (4255 GB/s) and Tensor Core TFLOPS (fp16: 370.3, bf16: 364.0) with calibration factors
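The CSV-to-profile step described above can be pictured with a minimal sketch. It is not the code in scripts/generate_gpu_profile.py: the function names, the CSV column names, and the definition of a calibration factor as measured peak divided by theoretical peak are all assumptions made for illustration. The 4800 GB/s theoretical figure is likewise an assumed spec value, and the sample rows are illustrative except for the 4255 GB/s peak quoted in the summary.

```python
import csv
import io


def best_measurement(csv_text, value_column):
    """Return the largest value found in value_column across all CSV rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return max(float(row[value_column]) for row in reader)


def calibration_factor(measured, theoretical):
    """Fraction of the theoretical peak that was actually measured (assumed definition)."""
    if theoretical <= 0:
        raise ValueError("theoretical peak must be positive")
    return round(measured / theoretical, 3)


# Hypothetical CSV content; the real hbm_peak.csv schema may differ.
sample = "size_mb,bandwidth_gbs\n256,4101.2\n1024,4255.0\n4096,4230.7\n"

measured = best_measurement(sample, "bandwidth_gbs")
# 4800 GB/s is an assumed theoretical HBM peak, not a value taken from the PR.
factor = calibration_factor(measured, 4800.0)
print(measured, factor)
```

Taking the maximum across rows is one plausible way to pick a "peak"; a real generator might instead use a median over the largest transfer sizes.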

Test plan

pytest tests/hardware/test_hardware_structure.py — validates directory structure and file presence (AC-1)
pytest tests/hardware/test_generate_gpu_profile.py — validates profile generation, schema compliance, and calibration factors (AC-2, AC-4)
BF16 GEMM rows present in gemm_throughput.csv and consumed by parse_gemm_results (AC-3)
Full smoke suite: 867 tests passed, 0 failed
Pre-commit lint passes cleanly
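A structure check like the one attributed to test_hardware_structure.py could look roughly like the sketch below. The helper name `missing_subdirs` is hypothetical, and the example builds a throwaway directory tree so it runs without the repository; only the five subdirectory names come from the summary.

```python
import tempfile
from pathlib import Path

# Subdirectories the PR summary says live under benchmarks/hardware/.
EXPECTED_SUBDIRS = ("memory", "compute", "system", "utils", "results")


def missing_subdirs(root):
    """Return the expected subdirectory names that are absent under root."""
    root = Path(root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "benchmarks" / "hardware"
    for name in ("memory", "compute", "system"):
        (root / name).mkdir(parents=True)
    incomplete = missing_subdirs(root)  # utils/ and results/ not created yet

    for name in ("utils", "results"):
        (root / name).mkdir()
    complete = missing_subdirs(root)  # everything present now

print(incomplete, complete)
```

In a pytest test the same helper would be asserted against the real repository root, for example `assert missing_subdirs(repo_root / "benchmarks" / "hardware") == []`, where `repo_root` is again a hypothetical fixture.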

Additional context

The original tilelang-microbench repo remains as-is (archive) — this is a copy, not a move
CUDA benchmarks (.cu files) require nvcc with -arch=sm_90 for H200; they are included for reference but not compiled during CI
CSV result files are checked in to enable offline profile generation without GPU access

🤖 Generated with Claude Code

lcy-seso · 2026-03-23T13:00:18Z

/gemini review

gemini-code-assist · 2026-03-23T13:04:02Z

Warning

Gemini is experiencing higher than usual traffic and was unable to create the summary. Please try again in a few hours by commenting /gemini summary.

Copilot

Pull request overview

Migrates the hardware characterization microbenchmark suite into benchmarks/hardware/ and adds a CSV→YAML profile generator to produce GPU peak/calibration data (e.g., H200) for downstream performance modeling.

Changes:

Add benchmarks/hardware/ microbench suite (memory/compute/system) with shared utils and checked-in CSV results.
Add scripts/generate_gpu_profile.py plus pytest coverage to generate validated GPU profile YAMLs from benchmark CSVs.
Add an H200 profile YAML (tileops/perf/profiles/h200.yaml) with measured HBM bandwidth and FP16/BF16 Tensor Core throughput.

Reviewed changes

Copilot reviewed 33 out of 34 changed files in this pull request and generated 4 comments.


| File | Description |
| --- | --- |
| tileops/perf/__init__.py | Introduces perf module package marker/docstring. |
| tileops/perf/profiles/__init__.py | Introduces profiles subpackage marker/docstring. |
| tileops/perf/profiles/h200.yaml | Adds generated H200 measured/theoretical peaks + calibration factors. |
| scripts/generate_gpu_profile.py | New CLI/script to parse benchmark CSVs and emit GPU profile YAML. |
| tests/hardware/__init__.py | Adds hardware test package marker. |
| tests/hardware/test_hardware_structure.py | Smoke test asserting benchmark directory structure + key files exist. |
| tests/hardware/test_generate_gpu_profile.py | Tests CSV parsing/profile generation + validates committed H200 profile. |
| benchmarks/hardware/__init__.py | Adds top-level hardware benchmark package docs. |
| benchmarks/hardware/run_all.py | Convenience runner to execute core Python benchmarks in a fixed order. |
| benchmarks/hardware/utils/__init__.py | Re-exports shared utilities for benchmarks. |
| benchmarks/hardware/utils/bench.py | Adds do_bench wrapper + bandwidth/TFLOPS calculators. |
| benchmarks/hardware/utils/env.py | Adds env capture and theoretical peak tables for supported GPUs. |
| benchmarks/hardware/utils/output.py | Adds CSV schema/constants and a CSV writer targeting `benchmarks/hardware/results/`. |
| benchmarks/hardware/results/hbm_peak.csv | Checks in HBM peak measurement CSV used for offline profile generation. |
| benchmarks/hardware/results/gemm_throughput.csv | Checks in GEMM throughput CSV including BF16 rows. |
| benchmarks/hardware/memory/__init__.py | Memory benchmarks package marker/docstring. |
| benchmarks/hardware/memory/hbm_bandwidth.py | Python HBM bandwidth benchmark emitting `bandwidth.csv`. |
| benchmarks/hardware/memory/l2_bandwidth.py | L2 “working set” reuse benchmark emitting `l2_cache.csv`. |
| benchmarks/hardware/memory/shared_bandwidth.py | TileLang shared-memory bandwidth benchmark emitting `shared_memory.csv`. |
| benchmarks/hardware/memory/latency.py | Latency suite including pointer-chase integration and CSV output. |
| benchmarks/hardware/memory/pointer_chase.cu | CUDA pointer-chase kernel used by latency benchmark. |
| benchmarks/hardware/memory/hbm_saturation.cu | Standalone CUDA “cookbook” for HBM saturation exploration. |
| benchmarks/hardware/compute/__init__.py | Compute benchmarks package marker/docstring. |
| benchmarks/hardware/compute/gemm_throughput.py | GEMM TFLOPS benchmark for cuBLAS/TileLang (FP16 + BF16). |
| benchmarks/hardware/system/__init__.py | System benchmark package marker with nvcc build notes. |
| benchmarks/hardware/system/stream_sync.py | Python stream/event synchronization overhead experiments. |
| benchmarks/hardware/system/sync_overhead.cu | CUDA sync primitive overhead microbench (reference). |
| benchmarks/hardware/system/atomic_overhead.cu | CUDA atomic contention microbench (reference). |
| benchmarks/hardware/system/bank_conflict.cu | CUDA shared-memory bank conflict microbench (reference). |
| benchmarks/hardware/system/bank_conflict_v2.cu | Updated bank conflict microbench variant (reference). |
| benchmarks/hardware/system/async_copy.cu | CUDA cp.async pipeline microbench (reference). |
| benchmarks/hardware/system/warp_spec.cu | CUDA warp-specialization microbench (reference). |
| benchmarks/hardware/system/occupancy_latency.cu | CUDA occupancy vs throughput microbench (reference). |
| benchmarks/hardware/system/register_spill.cu | CUDA register pressure/spill microbench (reference). |
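For readers unfamiliar with how elapsed time turns into the bandwidth and TFLOPS figures these benchmarks report, the sketch below shows the standard conversions. The function names are hypothetical and are not taken from benchmarks/hardware/utils/bench.py; the formulas are the usual ones (bytes moved per second, and 2·M·N·K floating-point operations for an M×K by K×N GEMM).

```python
def bandwidth_gbs(num_bytes, elapsed_ms):
    """Achieved bandwidth in GB/s (decimal, 1 GB = 1e9 bytes)."""
    return num_bytes / (elapsed_ms * 1e-3) / 1e9


def gemm_tflops(m, n, k, elapsed_ms):
    """Achieved GEMM throughput in TFLOPS.

    A GEMM performs one multiply and one add per (m, n, k) triple,
    hence 2 * m * n * k floating-point operations.
    """
    flops = 2 * m * n * k
    return flops / (elapsed_ms * 1e-3) / 1e12


# Illustrative timings, not measurements from this PR.
print(bandwidth_gbs(2_000_000_000, 0.5))   # 2 GB moved in 0.5 ms
print(gemm_tflops(8192, 8192, 8192, 3.0))  # 8192^3 GEMM in 3 ms
```

For a copy-style bandwidth kernel, `num_bytes` would typically count both the bytes read and the bytes written; whether the PR's benchmarks do so is not stated in the overview.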

benchmarks/hardware/memory/pointer_chase.cu

benchmarks/hardware/memory/latency.py

scripts/generate_gpu_profile.py

gemini-code-assist

Code Review

This pull request is a significant and valuable addition, migrating the hardware microbenchmarks into the main repository with a well-organized structure. The benchmarks are comprehensive, covering memory, compute, and system-level characteristics. The inclusion of a script to generate GPU profiles from the results is a great feature for performance modeling.

My review focuses on a couple of areas to improve robustness and maintainability:

The CSVWriter utility can be made more robust by implementing it as a context manager to prevent resource leaks.
There is a duplicated constant in scripts/generate_gpu_profile.py that should be imported to follow the DRY principle.

Overall, this is an excellent contribution that provides a solid foundation for hardware characterization.
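The context-manager suggestion can be illustrated with a small sketch. This is not the CSVWriter in benchmarks/hardware/utils/output.py: the constructor arguments, the `write_row` method, and the column names are assumptions, and the example writes to a temporary directory rather than benchmarks/hardware/results/.

```python
import csv
import os
import tempfile


class CSVWriter:
    """Hypothetical CSV writer that closes its file even if the body raises."""

    def __init__(self, path, fieldnames):
        self.path = path
        self.fieldnames = fieldnames
        self._file = None
        self._writer = None

    def __enter__(self):
        self._file = open(self.path, "w", newline="")
        self._writer = csv.DictWriter(self._file, fieldnames=self.fieldnames)
        self._writer.writeheader()
        return self

    def write_row(self, row):
        self._writer.writerow(row)

    def __exit__(self, exc_type, exc, tb):
        self._file.close()
        return False  # do not swallow exceptions from the with-body


with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "gemm_throughput.csv")
    with CSVWriter(out_path, ["dtype", "tflops"]) as writer:
        writer.write_row({"dtype": "fp16", "tflops": 370.3})
        writer.write_row({"dtype": "bf16", "tflops": 364.0})
    file_closed = writer._file.closed
    with open(out_path, newline="") as f:
        rows = list(csv.DictReader(f))

print(file_closed, rows)
```

Because `__exit__` runs on both normal exit and exceptions, a benchmark that crashes midway still flushes and closes the partially written CSV, which is the resource-leak concern the review raises.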

benchmarks/hardware/utils/output.py

scripts/generate_gpu_profile.py

…to benchmarks/hardware/

Migrate the hardware characterization benchmark suite from tilelang-microbench into benchmarks/hardware/ with memory/, compute/, and system/ subdirectories. Add scripts/generate_gpu_profile.py to produce gpu_profile.yaml from benchmark results, include BF16 GEMM alongside FP16, and generate an H200 profile with calibrated HBM bandwidth and Tensor Core TFLOPS values.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

lcy-seso added the refactor (Code restructuring without behavior change) and all-ai-powered (Produced entirely by automated contributors) labels Mar 23, 2026

lcy-seso requested a review from Copilot March 23, 2026 13:00

Copilot started reviewing on behalf of lcy-seso March 23, 2026 13:01

Copilot AI reviewed Mar 23, 2026

benchmarks/hardware/memory/pointer_chase.cu (resolved)

benchmarks/hardware/memory/latency.py (resolved)

scripts/generate_gpu_profile.py (resolved)

scripts/generate_gpu_profile.py (resolved)

gemini-code-assist bot reviewed Mar 23, 2026

benchmarks/hardware/utils/output.py (resolved)

scripts/generate_gpu_profile.py (resolved)

lcy-seso marked this pull request as ready for review March 23, 2026 13:15

lcy-seso requested a review from a team March 23, 2026 13:15

lcy-seso force-pushed the refactor/benchmark/issue-630 branch from 12e5a95 to 5b188b2 March 24, 2026 07:35

lcy-seso wants to merge 1 commit into tile-ai:main from lcy-seso:refactor/benchmark/issue-630


2 participants
