Commit 71720b2

Merge pull request #46 from amd/dholanda/scope
Update catalog scope
2 parents 5f12b06 + fbfb584 commit 71720b2

30 files changed

Lines changed: 2 additions & 26219 deletions

.claude-plugin/marketplace.json

Lines changed: 0 additions & 60 deletions
@@ -8,36 +8,12 @@
     "version": "0.1.0"
   },
   "plugins": [
-    {
-      "name": "aiter-reflection",
-      "source": "./skills/aiter-reflection",
-      "skills": "./",
-      "description": "This skill should be used when optimizing AMD GPU kernels on MI300 using the aiter project, including running op tests, benchmarking, iterating on kernel changes, and recording results in the kernel experiment database."
-    },
     {
       "name": "apu-memory-tuner",
       "source": "./skills/apu-memory-tuner",
       "skills": "./",
       "description": "Inspect and tune the shared-vs-dedicated memory split (GTT / UMA Frame Buffer) on AMD Ryzen APUs so larger LLMs and image models fit on the iGPU."
     },
-    {
-      "name": "gpu-architecture-fundamentals",
-      "source": "./skills/gpu-architecture-fundamentals",
-      "skills": "./",
-      "description": "This skill should be used when reasoning about GPU architecture fundamentals to guide kernel optimization choices such as memory hierarchy usage, execution model mapping, block sizing, and latency-aware tuning across HIP, Triton, and PyTorch."
-    },
-    {
-      "name": "hip-kernel-optimization",
-      "source": "./skills/hip-kernel-optimization",
-      "skills": "./",
-      "description": "This skill should be used when writing or tuning HIP kernels on AMD/NVIDIA GPUs, covering memory coalescing, shared-memory tiling, bank conflict avoidance, warp primitives, occupancy, vectorization, async ops, loop unrolling, and profiling."
-    },
-    {
-      "name": "kernel-exp-history",
-      "source": "./skills/kernel-exp-history",
-      "skills": "./",
-      "description": "This skill should be used when optimizing kernels in this repo and needing to consult past optimization experiments, or when recording the current optimization iteration back into the kernel experiment database."
-    },
     {
       "name": "local-ai-app-integration",
       "source": "./skills/local-ai-app-integration",
@@ -56,47 +32,11 @@
       "skills": "./",
       "description": "Performs GPU kernel correctness and performance evaluation and LLM inference benchmarking with Magpie. Analyzes single or multiple kernels (HIP/CUDA/PyTorch), compares kernel implementations, runs vLLM/SGLang benchmarks with profiling and TraceLens, and runs gap analysis on torch traces."
     },
-    {
-      "name": "mi300-hip-programming-insights",
-      "source": "./skills/mi300-hip-programming-insights",
-      "skills": "./",
-      "description": "CDNA3/MI300 HIP programming insights—chiplet/cache model, Infinity Cache, memory coherency, matrix cores, sparsity, and best practices."
-    },
-    {
-      "name": "pytorch-kernel-optimization",
-      "source": "./skills/pytorch-kernel-optimization",
-      "skills": "./",
-      "description": "This skill should be used when optimizing PyTorch models and kernels, including efficient tensor operations, torch.compile, custom autograd/CUDA/Triton extensions, mixed precision, memory and data pipeline tuning, model optimization techniques, CUDA graphs, and profiling."
-    },
     {
       "name": "rocm-doctor",
       "source": "./skills/rocm-doctor",
       "skills": "./",
       "description": "Diagnose why ROCm, PyTorch, or llama.cpp isn't working on an AMD GPU. Matches the symptom against a fixed list of twelve known misconfigurations and proposes the next step."
-    },
-    {
-      "name": "rocprof-compute",
-      "source": "./skills/rocprof-compute",
-      "skills": "./",
-      "description": "This skill should be used when profiling AMD GPU kernels with rocprof-compute to collect metrics, roofline data, and analyze bottlenecks for HIP kernels."
-    },
-    {
-      "name": "triton-hip-reference-kernel-search",
-      "source": "./skills/triton-hip-reference-kernel-search",
-      "skills": "./",
-      "description": "Search and adapt Triton/HIP kernel patterns from a corpus to optimize AMD GPUs; use to find similar ops and reuse tiling/occupancy strategies."
-    },
-    {
-      "name": "triton-kernel-optimization",
-      "source": "./skills/triton-kernel-optimization",
-      "skills": "./",
-      "description": "This skill should be used when writing or tuning Triton GPU kernels, including autotuning block sizes, coalesced accesses, tiled matmul, fused ops, reductions, flash-attention style kernels, quantization, custom gradients, and profiling."
-    },
-    {
-      "name": "triton-kernel-reflection-prompts",
-      "source": "./skills/triton-kernel-reflection-prompts",
-      "skills": "./",
-      "description": "Reflection/self-critique prompts for reviewing and fixing AMD-targeted Triton kernels after generation or test failures."
     }
   ]
 }

README.md

Lines changed: 2 additions & 18 deletions
@@ -57,7 +57,7 @@ Skills earn their keep on repeated, opinionated workflows, exactly where the AMD
 >
 > **Target: ready for testing by June 12.** Until then, treat anything below as a preview.
 
-The initial catalog is organized into five focus areas.
+The initial catalog is organized into four focus areas.
 
 
 ### Application integration
@@ -80,22 +80,6 @@ Diagnose, configure, and ready AMD systems for AI workloads: drivers, BIOS, memo
 | `gfx-target-chooser` | Pick the right `gfx942` / `gfx90a` / `gfx1100` target and matching compiler flags. | _planned_ |
 | `pytorch-rocm-setup` | Get a known-good PyTorch + ROCm stack running on a target node, end to end. | _planned_ |
 
-### Kernel engineering
-
-Author, tune, and reason about GPU kernels for AMD targets.
-
-| Skill | What it does | Source |
-| --- | --- | --- |
-| [`aiter-reflection`](skills/aiter-reflection/SKILL.md) | Optimize AMD GPU kernels on MI300 using the aiter project: op tests, benchmarks, iteration, experiment database. | [Apex](https://github.com/AMD-AGI/Apex) |
-| [`gpu-architecture-fundamentals`](skills/gpu-architecture-fundamentals/SKILL.md) | Reason about memory hierarchy, execution model, block sizing, and latency across HIP, Triton, and PyTorch. | [Apex](https://github.com/AMD-AGI/Apex) |
-| [`hip-kernel-optimization`](skills/hip-kernel-optimization/SKILL.md) | Write and tune HIP kernels: coalescing, shared-memory tiling, bank conflicts, warp primitives, occupancy, vectorization. | [Apex](https://github.com/AMD-AGI/Apex) |
-| [`kernel-exp-history`](skills/kernel-exp-history/SKILL.md) | Consult past kernel optimization experiments and record the current iteration back into the experiment database. | [Apex](https://github.com/AMD-AGI/Apex) |
-| [`mi300-hip-programming-insights`](skills/mi300-hip-programming-insights/SKILL.md) | CDNA3 / MI300 HIP programming insights: chiplet and cache model, Infinity Cache, coherency, matrix cores, sparsity. | [Apex](https://github.com/AMD-AGI/Apex) |
-| [`pytorch-kernel-optimization`](skills/pytorch-kernel-optimization/SKILL.md) | Optimize PyTorch models and kernels: `torch.compile`, custom extensions, mixed precision, CUDA graphs, profiling. | [Apex](https://github.com/AMD-AGI/Apex) |
-| [`triton-hip-reference-kernel-search`](skills/triton-hip-reference-kernel-search/SKILL.md) | Search and adapt Triton / HIP kernel patterns from a corpus to reuse tiling and occupancy strategies. | [Apex](https://github.com/AMD-AGI/Apex) |
-| [`triton-kernel-optimization`](skills/triton-kernel-optimization/SKILL.md) | Write and tune Triton kernels: autotune block sizes, tiled matmul, fused ops, reductions, flash-attention, quantization. | [Apex](https://github.com/AMD-AGI/Apex) |
-| [`triton-kernel-reflection-prompts`](skills/triton-kernel-reflection-prompts/SKILL.md) | Reflection / self-critique prompts for reviewing and fixing AMD-targeted Triton kernels. | [Apex](https://github.com/AMD-AGI/Apex) |
-
 ### Cross-stack porting
 
 Bring existing workloads onto AMD.
@@ -113,7 +97,7 @@ Close the loop from trace to fix to ship.
 | Skill | What it does | Source |
 | --- | --- | --- |
 | [`magpie`](skills/magpie/SKILL.md) | Evaluate GPU kernel correctness and performance, compare kernel implementations, and benchmark vLLM / SGLang inference with profiling, TraceLens, and torch-trace gap analysis. | [Magpie](https://github.com/AMD-AGI/Magpie) |
-| [`rocprof-compute`](skills/rocprof-compute/SKILL.md) | Profile AMD GPU kernels with `rocprof-compute` to collect metrics, roofline data, and bottleneck analysis. | [Apex](https://github.com/AMD-AGI/Apex) |
+| `hyperloom` | Autonomously optimizes LLM inference on AMD GPUs. | _planned_ |
 | `omniperf-tune` | Run `omniperf`, locate the bottleneck, and suggest the fix. | _planned_ |
 | `quark-quantize` | Quantize PyTorch / ONNX models with [AMD Quark](https://github.com/amd/Quark) and export for AMD deployment. | _planned_ |
 

scripts/sources.yml

Lines changed: 0 additions & 19 deletions
@@ -23,25 +23,6 @@
 # the resulting changes for human review.
 
 sources:
-  - name: amd-agi-apex
-    repo: AMD-AGI/Apex
-    ref: main
-    path: tools/skills
-    license: MIT
-    # `skill-creator` is intentionally excluded; this catalog already has
-    # its own `create-skill` story via CONTRIBUTING.md.
-    skills:
-      - aiter-reflection
-      - gpu-architecture-fundamentals
-      - hip-kernel-optimization
-      - kernel-exp-history
-      - mi300-hip-programming-insights
-      - pytorch-kernel-optimization
-      - rocprof-compute
-      - triton-hip-reference-kernel-search
-      - triton-kernel-optimization
-      - triton-kernel-reflection-prompts
-
   - name: amd-agi-magpie
     repo: AMD-AGI/Magpie
     ref: main

skills/aiter-reflection/.federated.json

Lines changed: 0 additions & 9 deletions
This file was deleted.

skills/aiter-reflection/SKILL.md

Lines changed: 0 additions & 72 deletions
This file was deleted.

skills/gpu-architecture-fundamentals/.federated.json

Lines changed: 0 additions & 9 deletions
This file was deleted.

skills/gpu-architecture-fundamentals/SKILL.md

Lines changed: 0 additions & 36 deletions
This file was deleted.

skills/hip-kernel-optimization/.federated.json

Lines changed: 0 additions & 9 deletions
This file was deleted.
