Skip to content

Commit 361f941

Browse files
Merge branch 'main' into maxdokukin/skills/local-ai-use
2 parents 4c1eb3d + 45ecc14 commit 361f941

15 files changed

Lines changed: 902 additions & 7 deletions

File tree

.github/skillspector-allow.yml

Lines changed: 113 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,113 @@
1+
# SkillSpector false-positive allowlist.
2+
#
3+
# SkillSpector's static scan is high-recall / moderate-precision and has no
4+
# native per-finding suppression. This file is the auditable place to record
5+
# findings that are genuinely false positives so the CI gate
6+
# (scripts/skillspector_gate.py) does not fail on them. Everything not listed
7+
# here still fails the build at HIGH/CRITICAL.
8+
#
9+
# Each entry suppresses ONE rule for ONE file within ONE skill:
10+
# skill: skill directory name under skills/
11+
# rule: SkillSpector rule id (e.g. YR1)
12+
# file: path as it appears in the report, relative to the skill dir
13+
# match: (optional) substring that must appear in the finding message, so
14+
# the suppression stays scoped to the specific signature
15+
# reason: why this is a false positive (keep it accurate and specific)
16+
#
17+
# Add entries sparingly and only when the finding is demonstrably benign.
18+
19+
suppressions:
20+
- skill: rocm-doctor
21+
rule: YR1
22+
file: scripts/apply_fix.py
23+
match: backdoor_persistence
24+
reason: >-
25+
False positive. The 'backdoor_persistence' YARA rule's $bashrc_persist
26+
string matches any `echo ... >> ~/.bashrc`. Here it is the documented
27+
remediation that appends `export PATH="/opt/rocm/bin:$PATH"` so ROCm
28+
binaries land on PATH after install. Standard ROCm setup guidance, not a
29+
persistence backdoor or payload.
30+
- skill: rocm-doctor
31+
rule: YR1
32+
file: scripts/diagnose.py
33+
match: backdoor_persistence
34+
reason: >-
35+
False positive. Same $bashrc_persist match: diagnose.py prints the
36+
remediation command `echo 'export PATH=<bin>:$PATH' >> ~/.bashrc` (or
37+
~/.zshrc) for the user to add ROCm/HIP to PATH. No payload, no SSH key
38+
injection, no hidden user.
39+
- skill: rocm-doctor
40+
rule: OH1
41+
file: scripts/apply_fix.py
42+
match: Unvalidated Output Injection
43+
reason: >-
44+
False positive. The flag is on the generic `_run(cmd: list[str], ...)`
45+
helper, which calls `subprocess.run(cmd, ..., shell defaults to False)`
46+
with a list-form argv, so there is no shell interpolation. Every `cmd`
47+
is a hardcoded argv list assembled in-script (e.g.
48+
`["usermod","-a","-G","render,video",user]`, `["modprobe","amdgpu"]`);
49+
the only dynamic pieces are the local username from `$USER`/`$LOGNAME`
50+
and binary paths resolved via `shutil.which`. No LLM/model output ever
51+
reaches this sink, so there is nothing to validate or sanitize.
52+
- skill: rocm-doctor
53+
rule: OH1
54+
file: scripts/examine.py
55+
match: Unvalidated Output Injection
56+
reason: >-
57+
False positive. Same generic `_run(cmd: list[str], ...)` helper as in
58+
apply_fix.py: list-form `subprocess.run` with no shell=True. The read-only
59+
probes only ever pass fixed argv lists (`["rocminfo"]`,
60+
`["lspci","-nn","-D"]`, the PowerShell/CIM `Get-CimInstance` probes, the
61+
framework binary from `shutil.which`). No model output flows into the
62+
command, and there is no shell to inject into.
63+
- skill: rocm-doctor
64+
rule: PE3
65+
file: scripts/examine.py
66+
match: Credential Access
67+
reason: >-
68+
False positive. Line 493 is a code comment ("Resolve uid/gid to names via
69+
/etc/passwd & /etc/group") describing how `_stat_device` maps a device's
70+
owner uid/gid to names. The actual resolution uses the stdlib `pwd`/`grp`
71+
modules (`pwd.getpwuid` / `grp.getgrgid`), not any read of /etc/passwd,
72+
/etc/shadow, .env, or token files. No credential material is accessed.
73+
- skill: local-ai-use
74+
rule: SC2
75+
file: SKILL.md
76+
match: External Script Fetching
77+
reason: >-
78+
False positive. The flagged `curl ... | python -c ...` is not fetching or
79+
executing a remote script: `curl` POSTs an image-generation request to the
80+
local loopback Lemonade Server, and the piped `python -c` only
81+
base64-decodes the JSON response body and writes it to `out.png`. No
82+
remote code is downloaded or run.
83+
- skill: local-ai-use
84+
rule: SC2
85+
file: templates/local-ai-rule.md
86+
match: External Script Fetching
87+
reason: >-
88+
False positive. Same pattern as SKILL.md: the `curl ... | python -c ...`
89+
in the installable rule template POSTs to the local Lemonade Server and
90+
pipes the JSON response into `python -c` purely to base64-decode the image
91+
bytes into `out.png`. No remote script is fetched or executed.
92+
- skill: local-ai-use
93+
rule: OH1
94+
file: scripts/setup_local_ai.py
95+
match: Unvalidated Output Injection
96+
reason: >-
97+
False positive. Both flagged calls (lines 98 and 128) use list-form
98+
subprocess.run argv with no shell=True, so there is no shell
99+
interpolation. Line 98 is fully hardcoded (`lemonade list --downloaded
100+
--json`); line 128 is `lemonade pull <model>` where `model` comes from
101+
argparse defaults / explicit --image-model/--tts-model/--stt-model flags,
102+
not from LLM or model output. Nothing here consumes unvalidated model
103+
output, so there is no injection sink to sanitize.
104+
- skill: local-ai-use
105+
rule: P2
106+
file: templates/local-ai-rule.md
107+
match: Hidden Instructions
108+
reason: >-
109+
False positive. Line 1 is the `<!-- BEGIN amd-skills:local-ai-use -->`
110+
HTML comment, a benign machine-readable marker that setup_local_ai.py uses
111+
to locate and replace the rule block in AGENTS.md in place on re-runs. It
112+
carries no instructions; the surrounding rule text is plain, reviewable
113+
content by design (it is the installable routing rule itself).

.github/workflows/skillspector.yml

Lines changed: 124 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,124 @@
1+
name: skillspector
2+
3+
# Statically scan every skill under skills/ with SkillSpector to catch malicious patterns and
4+
# security risks before they land on main. LLM semantic analysis is
5+
# intentionally disabled (--no-llm): the scan is fully static, needs no API
6+
# key, and runs in an isolated environment via uvx.
7+
#
8+
# Mirrors the discover-matrix-aggregate shape of validate.yml so each skill
9+
# is its own pass/fail and a single aggregate check (the `skillspector` job)
10+
# can be marked required in branch protection.
11+
12+
on:
13+
push:
14+
branches: [main]
15+
pull_request:
16+
paths:
17+
- "skills/**"
18+
- ".github/workflows/skillspector.yml"
19+
workflow_dispatch:
20+
21+
permissions:
22+
contents: read
23+
24+
jobs:
25+
# Enumerate the skills so the scan job can fan out over them with a matrix,
26+
# reusing the same discovery script that validate.yml relies on.
27+
discover-skills:
28+
name: Discover skills
29+
runs-on: ubuntu-latest
30+
outputs:
31+
skills: ${{ steps.discover.outputs.skills }}
32+
steps:
33+
- name: Check out repository
34+
uses: actions/checkout@v4
35+
36+
- name: Set up uv
37+
uses: astral-sh/setup-uv@v7
38+
39+
- name: List skills
40+
id: discover
41+
run: echo "skills=$(uv run scripts/validate_skills.py --list)" >> "$GITHUB_OUTPUT"
42+
43+
scan-skill:
44+
name: Scan skill
45+
needs: discover-skills
46+
runs-on: ubuntu-latest
47+
strategy:
48+
# Don't cancel the other skills when one fails; we want to see every
49+
# skill's scan result in a single run.
50+
fail-fast: false
51+
matrix:
52+
skill: ${{ fromJson(needs.discover-skills.outputs.skills) }}
53+
steps:
54+
- name: Check out repository
55+
uses: actions/checkout@v4
56+
57+
- name: Set up uv
58+
uses: astral-sh/setup-uv@v7
59+
60+
# Run SkillSpector pinned to a specific commit for reproducibility and
61+
# supply-chain safety. To bump it, update the SHA below to the desired
62+
# skillspector commit (e.g. `git ls-remote https://github.com/NVIDIA/skillspector.git main`).
63+
#
64+
# The CLI exits 1 when a skill's *aggregate* risk score is HIGH/CRITICAL
65+
# (score > 50) and 2 on error. We don't gate on the aggregate score,
66+
# because a pile of MEDIUM findings can push the aggregate to HIGH even
67+
# when no single finding is HIGH/CRITICAL. Instead we fail only when an
68+
# individual finding is HIGH or CRITICAL (and always fail on error).
69+
- name: Scan skill with SkillSpector
70+
run: |
71+
mkdir -p reports
72+
report="reports/${{ matrix.skill }}.md"
73+
set +e
74+
uvx --python 3.12 \
75+
--from "git+https://github.com/NVIDIA/skillspector.git@939da7d41eed4282e4d8217fe2254c69f690027e" \
76+
skillspector scan "skills/${{ matrix.skill }}" \
77+
--no-llm --format markdown --output "$report"
78+
code=$?
79+
set -e
80+
echo "----- SkillSpector report: ${{ matrix.skill }} -----"
81+
cat "$report" || true
82+
83+
# Exit code 2 means SkillSpector itself errored; surface that.
84+
if [ "$code" = "2" ]; then
85+
echo "SkillSpector errored (exit code 2)." >&2
86+
exit 2
87+
fi
88+
89+
# Fail when any individual finding is HIGH or CRITICAL, except for
90+
# documented false positives recorded in .github/skillspector-allow.yml.
91+
# SkillSpector has no native suppression, so the gate applies the
92+
# allowlist here (see scripts/skillspector_gate.py).
93+
uv run scripts/skillspector_gate.py \
94+
--report "$report" \
95+
--skill "${{ matrix.skill }}" \
96+
--allowlist .github/skillspector-allow.yml
97+
98+
- name: Upload report
99+
if: always()
100+
uses: actions/upload-artifact@v4
101+
with:
102+
name: skillspector-report-${{ matrix.skill }}
103+
path: reports/${{ matrix.skill }}.md
104+
if-no-files-found: warn
105+
106+
# Single gate that aggregates the per-skill matrix. Branch protection can
107+
# require just this one check: it only passes when every skill scan
108+
# succeeded. Because matrix jobs run independently under `fail-fast: false`,
109+
# we inspect the job result explicitly rather than relying on `needs`
110+
# short-circuiting.
111+
skillspector:
112+
name: SkillSpector security scan
113+
needs: scan-skill
114+
if: always()
115+
runs-on: ubuntu-latest
116+
steps:
117+
- name: Verify all skill scans passed
118+
run: |
119+
echo "scan-skill result: ${{ needs.scan-skill.result }}"
120+
if [ "${{ needs.scan-skill.result }}" != "success" ]; then
121+
echo "One or more skills failed the SkillSpector scan." >&2
122+
exit 1
123+
fi
124+
echo "All skills passed the SkillSpector scan."

.gitignore

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -11,3 +11,6 @@ __pycache__/
1111

1212
# Eval run artifacts
1313
eval/runs/
14+
15+
# Behavioral matrix results
16+
eval/behavioral/results/

eval/behavioral/conftest.py

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
1+
"""pytest wiring for the behavioral harness.
2+
3+
Adds this directory to ``sys.path`` so tests can ``from harness import ...``,
4+
and runs a one-time API preflight so the (expensive) behavioral runs fail
5+
fast with a clear message when the `claude` API isn't reachable -- e.g.
6+
when you're not connected to the network that can reach it.
7+
"""
8+
9+
from __future__ import annotations
10+
11+
import sys
12+
from pathlib import Path
13+
14+
import pytest
15+
16+
sys.path.insert(0, str(Path(__file__).resolve().parent))
17+
18+
from harness import DEFAULT_MODEL, check_api_reachable # noqa: E402
19+
20+
21+
@pytest.fixture(scope="session", autouse=True)
22+
def _require_api_reachable() -> None:
23+
"""Fail the suite up front if the `claude` API can't be reached."""
24+
ok, detail = check_api_reachable(DEFAULT_MODEL)
25+
if not ok:
26+
pytest.fail(
27+
f"claude API not reachable -- are you on the right network? ({detail})"
28+
)

0 commit comments

Comments
 (0)