Commit f553383

Add serving-llms-on-instinct skill (#58)
Co-authored-by: Daniel Holanda <holand.daniel@gmail.com>
1 parent 466410b commit f553383

13 files changed

Lines changed: 12648 additions & 1 deletion

.claude-plugin/marketplace.json

Lines changed: 5 additions & 0 deletions
@@ -33,6 +33,11 @@
       "name": "rocm-doctor",
       "source": "./skills/rocm-doctor",
       "description": "Diagnose why ROCm, PyTorch, or llama.cpp isn't working on an AMD GPU. Matches the symptom against a fixed list of twelve known misconfigurations and proposes the next step."
+    },
+    {
+      "name": "serving-llms-on-instinct",
+      "source": "./skills/serving-llms-on-instinct",
+      "description": "Serve LLMs on AMD Instinct GPUs (MI300X/MI325X/MI350X/MI355X) with vLLM on ROCm. Handles GPU detection, environment validation, vLLM configuration, launch, and health verification."
     }
   ]
 }

.cursor-plugin/marketplace.json

Lines changed: 5 additions & 0 deletions
@@ -33,6 +33,11 @@
       "name": "rocm-doctor",
       "source": "./skills/rocm-doctor",
       "description": "Diagnose why ROCm, PyTorch, or llama.cpp isn't working on an AMD GPU. Matches the symptom against a fixed list of twelve known misconfigurations and proposes the next step."
+    },
+    {
+      "name": "serving-llms-on-instinct",
+      "source": "./skills/serving-llms-on-instinct",
+      "description": "Serve LLMs on AMD Instinct GPUs (MI300X/MI325X/MI350X/MI355X) with vLLM on ROCm. Handles GPU detection, environment validation, vLLM configuration, launch, and health verification."
     }
   ]
 }

.github/skillspector-allow.yml

Lines changed: 110 additions & 0 deletions
@@ -123,3 +123,113 @@ suppressions:
       to locate and replace the rule block in AGENTS.md in place on re-runs. It
       carries no instructions; the surrounding rule text is plain, reviewable
       content by design (it is the installable routing rule itself).
+  - skill: serving-llms-on-instinct
+    rule: SC2
+    file: data/recipes_cache.json
+    match: External Script Fetching
+    reason: >-
+      False positive. The flag is on a `"guide"` markdown string (a recipe doc
+      embedded in this JSON cache, not runnable code). Its shell snippets are
+      illustrative: `uv pip install ... --extra-index-url https://wheels.vllm.ai/nightly`
+      installs vLLM from an HTTPS package index (the recommended-safe pattern),
+      and `curl http://localhost:8000/... | python3 -m json.tool` pipes a
+      localhost API response into a JSON pretty-printer. There is no
+      download-and-execute of a remote script (no `curl ... | bash`/`sh`).
+  - skill: serving-llms-on-instinct
+    rule: P6
+    file: data/recipes_cache.json
+    match: Direct Prompt Extraction
+    reason: >-
+      False positive. The flag is on a `"guide"` markdown string (the
+      Ministral-3-Instruct recipe doc, not runnable code). The matched Python
+      example downloads the model's own publicly published `SYSTEM_PROMPT.txt`
+      via `hf_hub_download` and passes it as the `system` role of a chat request
+      (Mistral's documented setup) — it constructs a prompt, it does not reveal
+      or extract any hidden system prompt. The only output printed is the
+      model's answer (`response.choices[0].message.content`). The trigger is
+      merely the literal token `SYSTEM_PROMPT` in benign example code.
+  - skill: serving-llms-on-instinct
+    rule: TM2
+    file: reference.md
+    match: Chaining Abuse
+    reason: >-
+      False positive. Line 92 is a Troubleshooting one-liner that disables
+      kernel NUMA balancing for GPU workloads:
+      `echo 0 | sudo tee /proc/sys/kernel/numa_balancing`. The `|` is just the
+      idiomatic way to write a root-owned /proc file (echo piped into `sudo
+      tee`), not multi-step tool/command chaining of untrusted or model-derived
+      steps. It is a single fixed, reviewable, human-run sysctl write — no LLM
+      output feeds the pipe and there is no chain depth to bound.
+  - skill: serving-llms-on-instinct
+    rule: TM1
+    file: scripts/detect.py
+    match: Tool Parameter Abuse
+    reason: >-
+      False positive. Line 32 uses `subprocess.run(cmd, shell=True, ...)`, but
+      `shell=True` is intentional and safe here: every `cmd` passed to `_run`
+      is a fixed in-script literal (`amd-smi static --asic --vram --json`,
+      `amd-smi version --json`, and their `sudo` retries) that relies on no
+      shell metacharacters from user input. The only user-controlled values
+      (`--host`/`--user`/`--port`) never enter the shell string — they flow
+      solely into the SSH branch as list-form argv (`ssh ... ssh_target cmd`,
+      no shell), and `port` is int-coerced by argparse. No untrusted or model
+      output reaches the shell, so there is no parameter abuse to reject.
+  - skill: serving-llms-on-instinct
+    rule: TM1
+    file: scripts/validate.py
+    match: Tool Parameter Abuse
+    reason: >-
+      False positive. Same `_run` helper as detect.py: line 33 uses
+      `subprocess.run(cmd, shell=True, ...)` where every `cmd` is a hardcoded
+      diagnostic literal (`test -e /dev/kfd ...`, `ls /dev/dri/renderD* ...`,
+      `cat /proc/sys/kernel/numa_balancing ...`, `printenv HF_TOKEN ...`, etc.)
+      that deliberately uses shell pipes/redirects/globs. The dynamic inputs
+      (`--host`/`--user`/`--port`) only reach the SSH branch as list-form argv,
+      never the shell string, and `port` is int-coerced. No untrusted/model
+      output is interpolated into the command.
+  - skill: serving-llms-on-instinct
+    rule: TM2
+    file: scripts/validate.py
+    match: Chaining Abuse
+    reason: >-
+      False positive. The flagged lines are the NUMA-balancing fix
+      `echo 0 | sudo tee /proc/sys/kernel/numa_balancing`. Line 122 only runs
+      it under the explicit opt-in `--auto-fix` flag (user-approved), while
+      lines 130 and 137 are human-readable `"fix"` advisory strings that are
+      never executed. The `|` is the idiomatic root-owned /proc write (echo
+      into `sudo tee`), a single fixed sysctl command — not multi-step tool
+      chaining of untrusted or model-derived steps.
+  - skill: serving-llms-on-instinct
+    rule: E2
+    file: scripts/estimate_vram.py
+    match: Env Variable Harvesting
+    reason: >-
+      False positive. Line 175 reads `HF_TOKEN` via `os.environ.get`, which is
+      strictly required: it is passed only to `_fetch`, which sets it as the
+      `Authorization: Bearer` header on requests to `https://huggingface.co`
+      (the token's intended recipient) so the tool can read safetensors/config
+      metadata for gated or private models. The token is never logged, printed,
+      or transmitted anywhere else — the emitted JSON contains only model and
+      VRAM fields.
+  - skill: serving-llms-on-instinct
+    rule: E2
+    file: scripts/validate.py
+    match: Env Variable Harvesting
+    reason: >-
+      False positive. Line 151 runs `printenv HF_TOKEN | head -c 4` purely as a
+      presence check; the captured 4-char value is never emitted — only
+      `out.strip()` truthiness is tested to decide whether to advise the user
+      that HF_TOKEN is unset (needed for gated models). No credential is logged
+      or transmitted.
+  - skill: serving-llms-on-instinct
+    rule: P5
+    file: data/recipes_cache.json
+    match: Harmful Content Injection
+    reason: >-
+      False positive. Line 3524 is the `"guide"` for Qwen3Guard-Gen, a
+      text-only safety/guardrail classifier model. The matched string
+      ("Tell me how to make a bomb.") is the demo *input* used to show the
+      moderation model correctly classifying the request as unsafe — the
+      documented output is `# Safety: Unsafe` / `# Categories: Violent`. No
+      harmful instructions are present; it is content-moderation documentation,
+      the opposite of harmful-content injection.
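
The two `TM1` notes in `.github/skillspector-allow.yml` describe a `_run` helper in which local commands are fixed literals executed with `shell=True`, while `--host`/`--user`/`--port` reach `ssh` only as list-form argv. A minimal Python sketch of that pattern follows. The names `build_args` and `run_fixed` and the leading-dash guard are assumptions made for illustration; this is not the code from `scripts/detect.py` or `scripts/validate.py`, which this page does not show.

```python
import subprocess


def build_args(cmd, host=None, user=None, port=22):
    """Return (args, use_shell) for a fixed, in-script diagnostic command.

    Hypothetical helper: `cmd` is expected to be a hardcoded literal,
    never text derived from user or model input.
    """
    if host is None:
        # Local branch: the literal may use pipes or globs, so it goes
        # through the shell as a single string.
        return cmd, True
    if host.startswith("-") or (user and user.startswith("-")):
        # Extra precaution (an assumption, not from the source): refuse
        # values that ssh could parse as options.
        raise ValueError("host/user must not start with '-'")
    target = f"{user}@{host}" if user else host
    # SSH branch: list-form argv, so no local shell parses host/user/port.
    # int() mirrors the int coercion the suppression note attributes to argparse.
    return ["ssh", "-p", str(int(port)), target, cmd], False


def run_fixed(cmd, host=None, user=None, port=22, timeout=30):
    """Run a fixed command locally or over SSH; return (returncode, stdout)."""
    args, use_shell = build_args(cmd, host, user, port)
    proc = subprocess.run(
        args, shell=use_shell, capture_output=True, text=True, timeout=timeout
    )
    return proc.returncode, proc.stdout
```

With this split, `build_args("amd-smi version --json", host="gpu-node", user="alice", port=2222)` yields an argv list and `use_shell=False`, so the user-supplied values are passed to `ssh` as separate arguments and never concatenated into a local shell string.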

README.md

Lines changed: 1 addition & 1 deletion
@@ -88,7 +88,7 @@ Bring existing workloads onto AMD.
 | --- | --- | --- |
 | `cuda-to-hip` | Port CUDA kernels with `hipify` and flag anything that needs manual review. | _planned_ |
 | `vllm-rocm` | Stand up vLLM on AMD with the right environment variables and model configurations. | _planned_ |
-| `serving-llms-on-instinct` | Deploy LLM inference on AMD Instinct GPUs end-to-end: detect hardware (or onboard via AMD Developer Cloud), validate model fit, apply the right vLLM recipe, and launch a benchmarked endpoint. SGLang and engine/backend selection in later phases. | _planned_ |
+| [`serving-llms-on-instinct`](skills/serving-llms-on-instinct/SKILL.md) | Deploy LLM inference on AMD Instinct GPUs end-to-end: detect hardware (or onboard via AMD Developer Cloud), validate model fit, apply the right vLLM recipe, and launch a benchmarked endpoint. SGLang and engine/backend selection in later phases. | in-repo |
 
 ### Performance & delivery
 