AMD Skills provide agents with knowledge, scripts, and conventions for working with AMD hardware and software.
Skills in this repository follow the standardized Agent Skills format and are designed to interoperate with major coding agents such as Cursor, Claude Code, OpenAI Codex, and Gemini CLI.
AMD Skills is built directly into Claude and Cursor. No install. No setup.
Just ask something like: "Use AMD Skills to integrate local AI into my app".
For other agents, see Manual installation.
A skill is a self-contained folder that bundles everything an agent needs to perform a focused task: instructions, helper scripts, prompts, templates, and references. At its core is a SKILL.md file. Its YAML frontmatter holds a name and a short description that tells the agent when the skill should activate, and the body that follows is the guidance the agent reads while the skill is in use.
skills/
  rocm-doctor/
    SKILL.md
    scripts/
    references/
When an agent decides a skill is relevant (or you invoke it explicitly), it loads that SKILL.md and follows the instructions inside. Descriptions stay in context cheaply; the full body of a skill only loads when the task actually matches.
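The two-stage loading described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular agent implements it: the helper names (`split_frontmatter`, `index_skills`, `load_body`) are hypothetical, the sample SKILL.md body is placeholder text, and the parser handles only flat `key: value` frontmatter rather than full YAML.

```python
from pathlib import Path
import tempfile


def split_frontmatter(text):
    """Split SKILL.md text into (metadata dict, body).

    Assumption: frontmatter is a block of flat `key: value` lines
    delimited by `---`; nested YAML is not handled in this sketch.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1)
                   if line.strip() == "---")
    except StopIteration:
        return {}, text
    meta = {}
    for line in lines[1:end]:
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip().strip('"')
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


def index_skills(root):
    """Cheap pass: keep only the name and description of each skill."""
    index = {}
    for skill_md in sorted(Path(root).glob("*/SKILL.md")):
        meta, _ = split_frontmatter(skill_md.read_text(encoding="utf-8"))
        name = meta.get("name", skill_md.parent.name)
        index[name] = {"description": meta.get("description", ""),
                       "path": skill_md}
    return index


def load_body(index, name):
    """Expensive pass: read the full instructions for one chosen skill."""
    text = index[name]["path"].read_text(encoding="utf-8")
    _, body = split_frontmatter(text)
    return body


# Hypothetical SKILL.md content, for illustration only.
SAMPLE = """---
name: rocm-doctor
description: Diagnose ROCm / PyTorch / llama.cpp failures on AMD GPUs.
---
# Placeholder body

Illustrative instructions would go here.
"""


def demo():
    """Build a throwaway skills/ tree, index it, then load one body."""
    with tempfile.TemporaryDirectory() as tmp:
        skill_dir = Path(tmp) / "skills" / "rocm-doctor"
        skill_dir.mkdir(parents=True)
        (skill_dir / "SKILL.md").write_text(SAMPLE, encoding="utf-8")
        index = index_skills(Path(tmp) / "skills")
        description = index["rocm-doctor"]["description"]
        return description, load_body(index, "rocm-doctor")
```

The index holds one short description per skill, which is what would sit in context; `load_body` is called only after a skill has been selected.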
Documentation describes an API surface: every flag, every option, neutral by design. A skill encodes the opinionated path: which flags, which container image, which gfx target, which environment variables, in what order. It captures the decisions a senior AMD engineer makes without thinking, in a form the agent can apply consistently across teams and repositories.
Skills earn their keep on repeated, opinionated workflows, exactly where the AMD stack lives.
Important: The catalog is under active development. Skills, categories, and descriptions are changing fast. Expect entries to appear, move, and get renamed without notice.
Target: ready for testing by June 12. Until then, treat anything below as a preview.
The initial catalog is organized into five focus areas.
Embed AMD-optimized AI into end-user applications.
| Skill | What it does | Source |
|---|---|---|
| local-ai-app-integration | Integrate local AI into cloud LLM apps for offline support, better privacy, and lower API costs. | in-repo |
| local-ai-use | Route image generation, text-to-speech, and speech-to-text through a local AI server to reduce token cost. | in-repo |
Diagnose, configure, and ready AMD systems for AI workloads: drivers, BIOS, memory pools, gfx targets, and framework setup.
| Skill | What it does | Source |
|---|---|---|
| apu-memory-tuner | Inspect and tune the shared-vs-dedicated memory split (GTT / UMA Frame Buffer) on AMD Ryzen APUs. | in-repo |
| rocm-doctor | Diagnose ROCm / PyTorch / llama.cpp failures on AMD GPUs against a fixed list of known misconfigurations. | in-repo |
| gfx-target-chooser | Pick the right gfx942 / gfx90a / gfx1100 target and matching compiler flags. | planned |
| pytorch-rocm-setup | Get a known-good PyTorch + ROCm stack running on a target node, end to end. | planned |
Author, tune, and reason about GPU kernels for AMD targets.
| Skill | What it does | Source |
|---|---|---|
| aiter-reflection | Optimize AMD GPU kernels on MI300 using the aiter project: op tests, benchmarks, iteration, experiment database. | Apex |
| gpu-architecture-fundamentals | Reason about memory hierarchy, execution model, block sizing, and latency across HIP, Triton, and PyTorch. | Apex |
| hip-kernel-optimization | Write and tune HIP kernels: coalescing, shared-memory tiling, bank conflicts, warp primitives, occupancy, vectorization. | Apex |
| kernel-exp-history | Consult past kernel optimization experiments and record the current iteration back into the experiment database. | Apex |
| mi300-hip-programming-insights | CDNA3 / MI300 HIP programming insights: chiplet and cache model, Infinity Cache, coherency, matrix cores, sparsity. | Apex |
| pytorch-kernel-optimization | Optimize PyTorch models and kernels: torch.compile, custom extensions, mixed precision, CUDA graphs, profiling. | Apex |
| triton-hip-reference-kernel-search | Search and adapt Triton / HIP kernel patterns from a corpus to reuse tiling and occupancy strategies. | Apex |
| triton-kernel-optimization | Write and tune Triton kernels: autotune block sizes, tiled matmul, fused ops, reductions, flash-attention, quantization. | Apex |
| triton-kernel-reflection-prompts | Reflection / self-critique prompts for reviewing and fixing AMD-targeted Triton kernels. | Apex |
Bring existing workloads onto AMD.
| Skill | What it does | Source |
|---|---|---|
| cuda-to-hip | Port CUDA kernels with hipify and flag anything that needs manual review. | planned |
| vllm-rocm | Stand up vLLM on AMD with the right environment variables and model configurations. | planned |
| serving-llms-on-instinct | Deploy LLM inference on AMD Instinct GPUs end-to-end: detect hardware (or onboard via AMD Developer Cloud), validate model fit, apply the right vLLM recipe, and launch a benchmarked endpoint. SGLang and engine/backend selection in later phases. | planned |
Close the loop from trace to fix to ship.
| Skill | What it does | Source |
|---|---|---|
| magpie | Evaluate GPU kernel correctness and performance, compare kernel implementations, and benchmark vLLM / SGLang inference with profiling, TraceLens, and torch-trace gap analysis. | Magpie |
| rocprof-compute | Profile AMD GPU kernels with rocprof-compute to collect metrics, roofline data, and bottleneck analysis. | Apex |
| omniperf-tune | Run omniperf, locate the bottleneck, and suggest the fix. | planned |
| quark-quantize | Quantize PyTorch / ONNX models with AMD Quark and export for AMD deployment. | planned |
The AMD stack is large and moves fast. ROCm, HIP, Ryzen AI, and framework integrations each have their own team, release cadence, and validation matrix. So skills here are federated: each skill is owned and versioned by the team that owns the product it describes, and this repository is the catalog that brings them together.
         ┌─────────────────────────────────────────────────────┐
         │  amd/skills (this repo)                             │
         │                                                     │
         │  skills/           scripts/        .*-plugin/       │
         │  in-repo skills    sources.yml     agent manifests  │
         └──────────────────────┬──────────────────────────────┘
                                │ one install
                                ▼
                      your AI coding agent
                                ▲
                                │ resolves pointers to
┌───────────────┬───────────────┼───────────────┬────────────────┐
│               │               │               │                │
ROCm/ROCm       ROCm/HIP        Ryzen AI repo   lemonade-sdk     ...more
rocm-doctor/    cuda-to-hip/    ryzen-ai-tools/ local-ai-app-    product
gfx-target-...  triton-amd-...  ...             integration/     repos
This repo also acts as an incubator: a skill can start under skills/ to iterate quickly, then graduate to its product repo and be re-pointed from scripts/sources.yml once it has a clear owner, with no change for installed users.
skills/ # All skills the agent can load (in-repo + vendored copies of federated)
.cursor-plugin/ # Cursor plugin manifest
.claude-plugin/ # Claude Code marketplace manifest
.github/workflows/ # CI for validating skills and the `import-external-skills` workflow
scripts/ # Tooling for publishing, regenerating manifests, and importing
scripts/sources.yml # Master list of external skill sources for federation
In-repo skills are authored directly under skills/. Federated skills are declared in scripts/sources.yml and vendored into skills/ by the import-external-skills workflow.
AMD Skills are compatible with Cursor, Claude Code, OpenAI Codex, and Gemini CLI. The flow differs per agent:
Cursor: Install the AMD plugin from this repository through the Cursor plugin flow. The repo ships a .cursor-plugin/plugin.json, so skills are discoverable as soon as the plugin is enabled.
Claude Code: Register this repository as a plugin marketplace, then install individual skills:
/plugin marketplace add amd/skills
/plugin install <skill-name>@amd/skills
OpenAI Codex: Copy or symlink the desired folders from skills/ into one of Codex's standard skill locations (for example $REPO_ROOT/.agents/skills or $HOME/.agents/skills). Codex will discover the SKILL.md files automatically.
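The symlink option for Codex can be scripted. The following is a minimal sketch, assuming a local checkout of this repository; `link_skills` and its parameters are hypothetical names, not tooling shipped in the repo, and creating symlinks on Windows may require extra privileges.

```python
import os
from pathlib import Path


def link_skills(repo_skills, target_dir, names=None):
    """Symlink skill folders from a checkout into a skills directory.

    repo_skills: path to the checkout's skills/ folder.
    target_dir:  destination, e.g. "~/.agents/skills".
    names:       optional set of skill names; None links every skill.
    Returns the names that were newly linked.
    """
    repo_skills = Path(repo_skills).resolve()
    target_dir = Path(target_dir).expanduser()
    target_dir.mkdir(parents=True, exist_ok=True)

    linked = []
    for folder in sorted(repo_skills.iterdir()):
        # Only folders that contain a SKILL.md count as skills.
        if not (folder / "SKILL.md").is_file():
            continue
        if names is not None and folder.name not in names:
            continue
        dest = target_dir / folder.name
        # Leave anything already installed untouched.
        if dest.exists() or dest.is_symlink():
            continue
        os.symlink(folder, dest, target_is_directory=True)
        linked.append(folder.name)
    return linked
```

For example, `link_skills("skills", "~/.agents/skills", names={"rocm-doctor"})` would link a single skill. Symlinks keep the installed copy in step with the checkout, whereas a plain copy would need to be refreshed after each update.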
Gemini CLI: A gemini-extension.json will be provided so the repo can be installed as a Gemini CLI extension:
gemini extensions install https://github.com/amd/skills.git --consent
Once a skill is installed, reference it in plain language while talking to your agent. For example:
- "Use AMD Skills to integrate local AI capabilities into my app with Embeddable Lemonade."
- "Use AMD Skills to convert these CUDA kernels and flag anything that needs manual review."
In most cases the agent picks the right skill on its own from the description; explicit invocation is a fallback, not a requirement.
We welcome contributions from AMD engineers and selected partners. Two paths, matching how the catalog is organized:
- Path A: In-repo skills. Authored directly under skills/. Best for cross-cutting workflows without a natural product home.
- Path B: Product-repo skills. Authored in a product repository and registered here through scripts/sources.yml with a pinned tag. Best for skills that should ship and version with a specific product.
See CONTRIBUTING.md for step-by-step instructions and the rules CI enforces.
Released under the MIT License. See LICENSE for details.
