Skip to content

Commit d6392dc

Browse files
committed
Import magpie
1 parent ab86eb0 commit d6392dc

4 files changed

Lines changed: 317 additions & 0 deletions

File tree

skills/magpie/.federated.json

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
{
2+
"source": "amd-agi-magpie",
3+
"repo": "AMD-AGI/Magpie",
4+
"ref": "main",
5+
"commit": "3820dccb9fc7eb06de92f6388886ad672e299ce5",
6+
"path": "skills/magpie",
7+
"license": "MIT",
8+
"imported_at": "2026-05-28T20:04:55Z"
9+
}

skills/magpie/SKILL.md

Lines changed: 139 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,139 @@
1+
---
2+
name: magpie
3+
description: Performs GPU kernel correctness and performance evaluation and LLM inference benchmarking with Magpie. Analyzes single or multiple kernels (HIP/CUDA/PyTorch), compares kernel implementations, runs vLLM/SGLang benchmarks with profiling and TraceLens, and runs gap analysis on torch traces. Creates kernel config YAMLs, discovers kernels in a project, and queries GPU specs. Use when the user mentions Magpie, kernel analyze or compare, HIP/CUDA kernel evaluation, vLLM/SGLang benchmark, gap analysis, TraceLens, creating kernel configs, or discovering GPU kernels.
4+
---
5+
6+
# Magpie
7+
8+
Magpie is a GPU kernel evaluation and LLM benchmarking framework. Use this skill when performing analyze, compare, benchmark, gap-analysis, or when creating kernel configs or discovering kernels without MCP.
9+
10+
**When describing Magpie's capabilities:** Describe only what is in this skill. Do not add project-specific, pipeline-specific, or other product/org names (e.g. do not mention any parent repo name).
11+
12+
## Entry point
13+
14+
- **CLI:** `magpie` or `python -m Magpie`. Run from the Magpie repo root (or with `PYTHONPATH` including the Magpie package).
15+
- **Setup:** From repo root, `pip install -e .` (or `make install`).
16+
17+
## Analyze (single or multi-kernel)
18+
19+
Analyze kernel(s) for correctness and performance.
20+
21+
**With kernel config (recommended):**
22+
23+
```bash
24+
magpie analyze --kernel-config path/to/kernel.yaml
25+
```
26+
27+
**Inline (single kernel):**
28+
29+
```bash
30+
magpie analyze path/to/kernel.hip --testcase "./run_test.sh"
31+
```
32+
33+
- `-k`, `--kernel-config`: YAML with `kernel` or `kernels` (see template below).
34+
- `-t`, `--testcase`: Command to run the test (required if not in config).
35+
- `--type`: `hip` | `cuda` | `pytorch` (default: hip).
36+
- `--compile-cmd`: Custom compile command.
37+
- `--no-perf`: Skip performance profiling.
38+
- `-o`, `--output-dir`: Output directory (default: `./results`).
39+
40+
**Config template (single kernel):** Use `kernel:` with `id`, `type`, `source_files`, `working_dir`, `testcase_command`, optional `compile_command`, `env`. See [Magpie/kernel_config.yaml.example](Magpie/kernel_config.yaml.example) and [examples/ck_gemm_add.yaml](examples/ck_gemm_add.yaml).
41+
42+
## Compare (multiple kernels)
43+
44+
Compare and rank multiple kernel implementations.
45+
46+
**With config:**
47+
48+
```bash
49+
magpie compare --kernel-config path/to/compare.yaml
50+
```
51+
52+
**Inline:**
53+
54+
```bash
55+
magpie compare kernel1.hip kernel2.hip --testcase "./run_test.sh"
56+
```
57+
58+
- `-k`, `--kernel-config`: YAML with `kernels:` list.
59+
- `--baseline`: Index of baseline kernel (default: 0).
60+
- `--no-perf`, `-o`: Same as analyze.
61+
62+
Example: [examples/ck_grouped_gemm_compare.yaml](examples/ck_grouped_gemm_compare.yaml).
63+
64+
## Benchmark (vLLM / SGLang)
65+
66+
Run framework-level LLM inference benchmarks with optional profiling and gap analysis.
67+
68+
**With config (recommended):**
69+
70+
```bash
71+
magpie benchmark --benchmark-config examples/benchmark_vllm.yaml
72+
```
73+
74+
**CLI overrides:** `magpie benchmark [vllm|sglang] -m <model> --benchmark-config <yaml>` with optional:
75+
76+
- `-m`, `--model`: Model name or path.
77+
- `-p`, `--precision`: fp8 | fp16 | bf16 | fp4 (default: fp8).
78+
- `--tp`: Tensor parallel size (default: 1).
79+
- `--concurrency`, `--input-len`, `--output-len`: Request and sequence settings.
80+
- `--torch-profiler`, `--system-profiler`: Enable profilers.
81+
- `--run-mode`: `docker` (default) or `local`.
82+
- `--docker-image`, `--timeout`, `-o`: Override image, timeout (seconds), output dir.
83+
84+
Example configs: [examples/benchmark_vllm.yaml](examples/benchmark_vllm.yaml), [docs/benchmark.md](docs/benchmark.md).
85+
86+
## Gap analysis (standalone)
87+
88+
Run gap analysis on existing torch trace directories.
89+
90+
```bash
91+
magpie benchmark gap-analysis --trace-dir path/to/torch_trace
92+
```
93+
94+
- `--trace-dir`: Path to `torch_trace` dir or benchmark workspace (required).
95+
- `--start-pct`, `--end-pct`: Analysis window 0–100 (default: 0, 100).
96+
- `--top-k`: Top bottleneck kernels (default: 20).
97+
- `--min-duration-us`: Minimum event duration (µs).
98+
- `--categories`, `--ignore-categories`: Include/exclude event categories.
99+
100+
## GPU info
101+
102+
```bash
103+
magpie --gpu-info
104+
```
105+
106+
Shows vendor, architecture, compiler, profiler. No mode required.
107+
108+
## Create kernel config (no CLI)
109+
110+
When the user needs a kernel config file:
111+
112+
1. Emit YAML matching the structure in [Magpie/kernel_config.yaml.example](Magpie/kernel_config.yaml.example): `kernel:` with `id`, `type` (hip|cuda|pytorch), `source_files`, `working_dir`, `testcase_command`, and optionally `compile_command`, `env`.
113+
2. Write the file to the user's requested path (e.g. `kernel_config.yaml`).
114+
3. Run: `magpie analyze --kernel-config <that_file>`.
115+
116+
For **compare**, use `kernels:` as a list of kernel entries (each with `id`, `type`, `source_files`, etc.).
117+
118+
## Discover kernels (no CLI)
119+
120+
1. Scan the project for `.hip`, `.cu`, or PyTorch kernel files.
121+
2. For each candidate, build a kernel config entry (id, type, source_files, working_dir, testcase_command if inferrable; otherwise prompt user).
122+
3. Optionally write a combined config and run `magpie analyze -k <file>` or `magpie compare -k <file>`.
123+
124+
## Suggest optimizations (no CLI)
125+
126+
1. Read analyze or compare JSON output (from `-o` results or last run).
127+
2. Use `performance_state`, `performance_result.summary`, and per-kernel stats (dispatch count, duration, utilization).
128+
3. Suggest improvements (e.g. memory bandwidth, occupancy, kernel fusion) based on the metrics.
129+
130+
## List / get benchmark results (no CLI)
131+
132+
- **List:** Results live under the benchmark `--output-dir` (default: `./results`); each run has a timestamped workspace (e.g. `results/benchmark_vllm_<timestamp>/`).
133+
- **Get result:** Open `benchmark_report.json` or `inferencex_result.json` in that workspace.
134+
- **Compare runs:** Diff two workspace reports or run two benchmarks and compare; for TraceLens comparison use TraceLens tooling if available.
135+
136+
## Additional resources
137+
138+
- Full CLI reference: [reference.md](reference.md)
139+
- Copy-paste command examples: [examples.md](examples.md)

skills/magpie/examples.md

Lines changed: 66 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,66 @@
1+
# Magpie command examples
2+
3+
Copy-paste examples. Run from the Magpie repo root (or set `PYTHONPATH`).
4+
5+
## GPU info
6+
7+
```bash
8+
magpie --gpu-info
9+
# or
10+
python -m Magpie --gpu-info
11+
```
12+
13+
## Analyze (config file)
14+
15+
```bash
16+
magpie analyze --kernel-config examples/ck_gemm_add.yaml
17+
magpie analyze --kernel-config my_kernel.yaml -o ./out
18+
```
19+
20+
## Analyze (inline)
21+
22+
```bash
23+
magpie analyze ./kernels/matmul.hip --testcase "./run_test.sh"
24+
magpie analyze ./kernel.hip -t "./run_test.sh" --type hip --no-perf
25+
```
26+
27+
## Compare (config file)
28+
29+
```bash
30+
magpie compare --kernel-config examples/ck_grouped_gemm_compare.yaml
31+
```
32+
33+
## Compare (inline)
34+
35+
```bash
36+
magpie compare kernel_v1.hip kernel_v2.hip --testcase "./run_test.sh"
37+
magpie compare k1.hip k2.hip k3.hip -t "./test.sh" --baseline 0
38+
```
39+
40+
## Benchmark (config file)
41+
42+
```bash
43+
magpie benchmark --benchmark-config examples/benchmark_vllm.yaml
44+
magpie benchmark --benchmark-config examples/benchmark_sglang.yaml -o ./results
45+
```
46+
47+
## Benchmark (CLI overrides)
48+
49+
```bash
50+
magpie benchmark vllm -m deepseek-ai/DeepSeek-R1-0528 -p fp8 --tp 8
51+
magpie benchmark --benchmark-config examples/benchmark_vllm.yaml --run-mode local --timeout 1800
52+
```
53+
54+
## Gap analysis (standalone)
55+
56+
```bash
57+
magpie benchmark gap-analysis --trace-dir ./results/benchmark_vllm_20260101_120000/torch_trace
58+
magpie benchmark gap-analysis --trace-dir ./results/run_xyz/torch_trace --top-k 30 --start-pct 20 --end-pct 80
59+
```
60+
61+
## Verbose and config
62+
63+
```bash
64+
magpie -v analyze --kernel-config my.yaml
65+
magpie --config path/to/config.yaml benchmark --benchmark-config examples/benchmark_vllm.yaml
66+
```

skills/magpie/reference.md

Lines changed: 103 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,103 @@
1+
# Magpie CLI reference
2+
3+
Full flag reference for `magpie` / `python -m Magpie`. Global options apply before the mode.
4+
5+
## Global options
6+
7+
| Option | Description |
8+
|--------|-------------|
9+
| `--config`, `-c` | Framework configuration file (default: package config.yaml) |
10+
| `--verbose`, `-v` | Verbose output |
11+
| `--gpu-info` | Show detected GPU info and exit (no mode required) |
12+
| `--environment`, `-e` | `local` \| `container` |
13+
| `--workers`, `-w` | Number of concurrent workers |
14+
| `--docker-image` | Docker image for container environment |
15+
16+
## analyze
17+
18+
| Option | Description |
19+
|--------|-------------|
20+
| `kernels` | Positional: kernel file(s) to analyze |
21+
| `--kernel-config`, `-k` | Kernel configuration YAML |
22+
| `--testcase`, `-t` | Testcase command |
23+
| `--type` | `hip` \| `cuda` \| `pytorch` (default: hip) |
24+
| `--compile-cmd` | Custom compile command |
25+
| `--no-perf` | Skip performance profiling |
26+
| `--output-dir`, `-o` | Output directory (default: ./results) |
27+
28+
## compare
29+
30+
| Option | Description |
31+
|--------|-------------|
32+
| `kernels` | Positional: kernel files to compare |
33+
| `--kernel-config`, `-k` | Kernel configuration YAML (kernels list) |
34+
| `--testcase`, `-t` | Testcase command (optional) |
35+
| `--type` | `hip` \| `cuda` \| `pytorch` (default: hip) |
36+
| `--baseline` | Baseline kernel index (default: 0) |
37+
| `--no-perf` | Skip performance profiling |
38+
| `--output-dir`, `-o` | Output directory (default: ./results) |
39+
40+
## benchmark
41+
42+
| Option | Description |
43+
|--------|-------------|
44+
| `framework` | Optional positional: `vllm` \| `sglang` |
45+
| `--benchmark-config`, `-b` | Benchmark configuration YAML |
46+
| `--model`, `-m` | Model name or path |
47+
| `--precision`, `-p` | fp8 \| fp16 \| bf16 \| fp4 (default: fp8) |
48+
| `--tp` | Tensor parallel size (default: 1) |
49+
| `--concurrency` | Request concurrency (default: 32) |
50+
| `--input-len` | Input sequence length (default: 1024) |
51+
| `--output-len` | Output sequence length (default: 512) |
52+
| `--torch-profiler` | Enable torch profiler |
53+
| `--system-profiler` | Enable system profiler (rocprof/ncu) |
54+
| `--run-mode` | `docker` \| `local` |
55+
| `--docker-image` | Override Docker image |
56+
| `--inferencex-path` | Path to InferenceX (auto-cloned if empty) |
57+
| `--benchmark-script` | Override benchmark script name |
58+
| `--timeout` | Timeout in seconds (default: 3600) |
59+
| `--output-dir`, `-o` | Output directory (default: ./results) |
60+
61+
## benchmark gap-analysis
62+
63+
| Option | Description |
64+
|--------|-------------|
65+
| `--trace-dir` | Path to torch_trace dir or benchmark workspace (required) |
66+
| `--start-pct` | Start of analysis window 0–100 (default: 0) |
67+
| `--end-pct` | End of analysis window 0–100 (default: 100) |
68+
| `--top-k` | Number of top bottleneck kernels (default: 20) |
69+
| `--min-duration-us` | Minimum event duration in µs (default: 0) |
70+
| `--categories` | Event categories to include (space-separated) |
71+
| `--ignore-categories` | Event categories to exclude |
72+
73+
## Kernel config YAML (analyze)
74+
75+
Single kernel:
76+
77+
```yaml
78+
kernel:
79+
id: "my_kernel"
80+
type: hip
81+
source_files: ["./kernel.hip"]
82+
working_dir: "."
83+
testcase_command: "./run_test.sh"
84+
# optional: compile_command, env
85+
```
86+
87+
Multiple kernels (compare):
88+
89+
```yaml
90+
kernels:
91+
- id: "v1"
92+
type: hip
93+
source_files: ["./v1.hip"]
94+
testcase_command: "./run_test.sh"
95+
- id: "v2"
96+
type: hip
97+
source_files: ["./v2.hip"]
98+
testcase_command: "./run_test.sh"
99+
```
100+
101+
## Benchmark config YAML
102+
103+
Top-level key `benchmark:` with `framework`, `model`, `precision`, `envs`, `profiler`, `gap_analysis`, `timeout_seconds`, etc. See `examples/benchmark_vllm.yaml` and `docs/benchmark.md`.

0 commit comments

Comments
 (0)