Merged
5 changes: 5 additions & 0 deletions .claude-plugin/marketplace.json
@@ -33,6 +33,11 @@
"name": "rocm-doctor",
"source": "./skills/rocm-doctor",
"description": "Diagnose why ROCm, PyTorch, or llama.cpp isn't working on an AMD GPU. Matches the symptom against a fixed list of twelve known misconfigurations and proposes the next step."
},
{
"name": "serving-llms-on-instinct",
"source": "./skills/serving-llms-on-instinct",
"description": "Serve LLMs on AMD Instinct GPUs (MI300X/MI325X/MI350X/MI355X) with vLLM on ROCm. Handles GPU detection, environment validation, vLLM configuration, launch, and health verification."
}
]
}
5 changes: 5 additions & 0 deletions .cursor-plugin/marketplace.json
@@ -33,6 +33,11 @@
"name": "rocm-doctor",
"source": "./skills/rocm-doctor",
"description": "Diagnose why ROCm, PyTorch, or llama.cpp isn't working on an AMD GPU. Matches the symptom against a fixed list of twelve known misconfigurations and proposes the next step."
},
{
"name": "serving-llms-on-instinct",
"source": "./skills/serving-llms-on-instinct",
"description": "Serve LLMs on AMD Instinct GPUs (MI300X/MI325X/MI350X/MI355X) with vLLM on ROCm. Handles GPU detection, environment validation, vLLM configuration, launch, and health verification."
}
]
}
110 changes: 110 additions & 0 deletions .github/skillspector-allow.yml
@@ -123,3 +123,113 @@ suppressions:
to locate and replace the rule block in AGENTS.md in place on re-runs. It
carries no instructions; the surrounding rule text is plain, reviewable
content by design (it is the installable routing rule itself).
- skill: serving-llms-on-instinct
rule: SC2
file: data/recipes_cache.json
match: External Script Fetching
reason: >-
False positive. The flag is on a `"guide"` markdown string (a recipe doc
embedded in this JSON cache, not runnable code). Its shell snippets are
illustrative: `uv pip install ... --extra-index-url https://wheels.vllm.ai/nightly`
installs vLLM from an HTTPS package index (the recommended-safe pattern),
and `curl http://localhost:8000/... | python3 -m json.tool` pipes a
localhost API response into a JSON pretty-printer. There is no
download-and-execute of a remote script (no `curl ... | bash`/`sh`).
- skill: serving-llms-on-instinct
rule: P6
file: data/recipes_cache.json
match: Direct Prompt Extraction
reason: >-
False positive. The flag is on a `"guide"` markdown string (the
Ministral-3-Instruct recipe doc, not runnable code). The matched Python
example downloads the model's own publicly published `SYSTEM_PROMPT.txt`
via `hf_hub_download` and passes it as the `system` role of a chat request
(Mistral's documented setup) — it constructs a prompt, it does not reveal
or extract any hidden system prompt. The only output printed is the
model's answer (`response.choices[0].message.content`). The trigger is
merely the literal token `SYSTEM_PROMPT` in benign example code.
- skill: serving-llms-on-instinct
rule: TM2
file: reference.md
match: Chaining Abuse
reason: >-
False positive. Line 92 is a Troubleshooting one-liner that disables
kernel NUMA balancing for GPU workloads:
`echo 0 | sudo tee /proc/sys/kernel/numa_balancing`. The `|` is just the
idiomatic way to write a root-owned /proc file (echo piped into `sudo
tee`), not multi-step tool/command chaining of untrusted or model-derived
steps. It is a single fixed, reviewable, human-run sysctl write — no LLM
output feeds the pipe and there is no chain depth to bound.
- skill: serving-llms-on-instinct
rule: TM1
file: scripts/detect.py
match: Tool Parameter Abuse
reason: >-
False positive. Line 32 uses `subprocess.run(cmd, shell=True, ...)`, but
`shell=True` is intentional and safe here: every `cmd` passed to `_run`
is a fixed in-script literal (`amd-smi static --asic --vram --json`,
`amd-smi version --json`, and their `sudo` retries) that relies on no
shell metacharacters from user input. The only user-controlled values
(`--host`/`--user`/`--port`) never enter the shell string — they flow
solely into the SSH branch as list-form argv (`ssh ... ssh_target cmd`,
no shell), and `port` is int-coerced by argparse. No untrusted or model
output reaches the shell, so there is no parameter abuse to reject.
- skill: serving-llms-on-instinct
rule: TM1
file: scripts/validate.py
match: Tool Parameter Abuse
reason: >-
False positive. Same `_run` helper as detect.py: line 33 uses
`subprocess.run(cmd, shell=True, ...)` where every `cmd` is a hardcoded
diagnostic literal (`test -e /dev/kfd ...`, `ls /dev/dri/renderD* ...`,
`cat /proc/sys/kernel/numa_balancing ...`, `printenv HF_TOKEN ...`, etc.)
that deliberately uses shell pipes/redirects/globs. The dynamic inputs
(`--host`/`--user`/`--port`) only reach the SSH branch as list-form argv,
never the shell string, and `port` is int-coerced. No untrusted/model
output is interpolated into the command.
- skill: serving-llms-on-instinct
rule: TM2
file: scripts/validate.py
match: Chaining Abuse
reason: >-
False positive. The flagged lines are the NUMA-balancing fix
`echo 0 | sudo tee /proc/sys/kernel/numa_balancing`. Line 122 only runs
it under the explicit opt-in `--auto-fix` flag (user-approved), while
lines 130 and 137 are human-readable `"fix"` advisory strings that are
never executed. The `|` is the idiomatic root-owned /proc write (echo
into `sudo tee`), a single fixed sysctl command — not multi-step tool
chaining of untrusted or model-derived steps.
- skill: serving-llms-on-instinct
rule: E2
file: scripts/estimate_vram.py
match: Env Variable Harvesting
reason: >-
False positive. Line 175 reads `HF_TOKEN` via `os.environ.get`, which is
strictly required: it is passed only to `_fetch`, which sets it as the
`Authorization: Bearer` header on requests to `https://huggingface.co`
(the token's intended recipient) so the tool can read safetensors/config
metadata for gated or private models. The token is never logged, printed,
or transmitted anywhere else — the emitted JSON contains only model and
VRAM fields.
- skill: serving-llms-on-instinct
rule: E2
file: scripts/validate.py
match: Env Variable Harvesting
reason: >-
False positive. Line 151 runs `printenv HF_TOKEN | head -c 4` purely as a
presence check; the captured 4-char value is never emitted — only
`out.strip()` truthiness is tested to decide whether to advise the user
that HF_TOKEN is unset (needed for gated models). No credential is logged
or transmitted.
- skill: serving-llms-on-instinct
rule: P5
file: data/recipes_cache.json
match: Harmful Content Injection
reason: >-
False positive. Line 3524 is the `"guide"` for Qwen3Guard-Gen, a
text-only safety/guardrail classifier model. The matched string
("Tell me how to make a bomb.") is the demo *input* used to show the
moderation model correctly classifying the request as unsafe — the
documented output is `# Safety: Unsafe` / `# Categories: Violent`. No
harmful instructions are present; it is content-moderation documentation,
the opposite of harmful-content injection.
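
The two TM1 suppressions and the validate.py E2 suppression above describe one pattern: fixed command literals run through the local shell, user-supplied connection values passed to `ssh` only as list-form argv, and a token presence check that never emits the value. A minimal sketch of that pattern follows. It is not the code in `scripts/detect.py` or `scripts/validate.py`; the names `build_ssh_argv`, `run`, and `hf_token_present` are hypothetical, and the timeout default is an assumption.

```python
import subprocess
from typing import List, Optional, Tuple


def build_ssh_argv(cmd: str, host: str, user: Optional[str] = None,
                   port: int = 22) -> List[str]:
    """Build list-form argv for ssh (hypothetical helper).

    host, user, and port become separate argv elements, so the local
    shell never parses them. cmd is expected to be a fixed literal.
    """
    target = f"{user}@{host}" if user else host
    # int() mirrors the int coercion argparse would apply to --port.
    return ["ssh", "-p", str(int(port)), target, cmd]


def run(cmd: str, host: Optional[str] = None, user: Optional[str] = None,
        port: int = 22, timeout: int = 30) -> Tuple[int, str]:
    """Run a fixed command literal locally or over SSH (hypothetical helper)."""
    if host:
        # Remote branch: list-form argv, no local shell involved.
        argv = build_ssh_argv(cmd, host, user, port)
        proc = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout)
    else:
        # Local branch: shell=True so the literal may use pipes, globs,
        # and redirects. Only safe because cmd is never built from input.
        proc = subprocess.run(cmd, shell=True, capture_output=True,
                              text=True, timeout=timeout)
    return proc.returncode, proc.stdout


def hf_token_present() -> bool:
    """Report whether HF_TOKEN is set, without returning any of its value."""
    _, out = run("printenv HF_TOKEN | head -c 4")
    # Only truthiness is used; the captured prefix is discarded here.
    return bool(out.strip())
```

Under these assumptions, a caller would pass only in-script literals such as `run("amd-smi version --json")`, and route `--host`/`--user`/`--port` through the keyword arguments. The safety argument in the suppressions depends on that discipline: nothing in `run` itself stops a caller from interpolating untrusted text into `cmd`.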
2 changes: 1 addition & 1 deletion README.md
@@ -88,7 +88,7 @@ Bring existing workloads onto AMD.
| --- | --- | --- |
| `cuda-to-hip` | Port CUDA kernels with `hipify` and flag anything that needs manual review. | _planned_ |
| `vllm-rocm` | Stand up vLLM on AMD with the right environment variables and model configurations. | _planned_ |
-| `serving-llms-on-instinct` | Deploy LLM inference on AMD Instinct GPUs end-to-end: detect hardware (or onboard via AMD Developer Cloud), validate model fit, apply the right vLLM recipe, and launch a benchmarked endpoint. SGLang and engine/backend selection in later phases. | _planned_ |
+| [`serving-llms-on-instinct`](skills/serving-llms-on-instinct/SKILL.md) | Deploy LLM inference on AMD Instinct GPUs end-to-end: detect hardware (or onboard via AMD Developer Cloud), validate model fit, apply the right vLLM recipe, and launch a benchmarked endpoint. SGLang and engine/backend selection in later phases. | in-repo |

### Performance & delivery
