
Commit 4219bf5

harp-intel and claude authored

feature: add functional-test skill for Claude Code (#687)

Adds a Claude Code skill that runs targeted PerfSpect functional tests on remote targets. The skill analyzes code changes, maps them to affected test categories, runs only the relevant tests, and verifies output correctness. Includes test catalogs for all command categories: benchmark, config, flamegraph, lock, metrics, report, and telemetry.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

1 parent d1c5990 · commit 4219bf5

File tree: 8 files changed, +411 −0 lines changed

Lines changed: 148 additions & 0 deletions

---
name: functional-test
description: >
  Use this skill when running functional tests to validate PerfSpect code changes,
  when the user says "run functional tests", "test my changes", "check for regressions",
  or when verifying a code change did not break existing functionality.
---

> **Skill Loaded:** "Using functional-test skill."

# Functional Test Runner

Run targeted PerfSpect functional tests on a remote target to validate code changes. Identify the specific tests affected by a change, run them, and verify that the output aligns with the change.

## Test script

`../tools/perfspect/functional_test.sh` (relative to the perfspect repo root). Verify the file exists before proceeding.

## Prerequisites

1. **Built binary.** Run `make` (x86_64) or `make perfspect-aarch64` (ARM64). The binary must be at `./perfspect` (or set `PERFSPECT_DIR`).
2. **Remote target.** The user must provide: hostname/IP (`TARGET`), SSH user (`USER_NAME`), and private key path (`PRIVATE_KEY_PATH`). Password-less sudo must be configured on the target.
3. **Target dependencies.** `stress-ng` on the target. For flame tests: `java` and `/tmp/primes.java` (copy from `../tools/perfspect/primes.java`).
## Workflow

### Step 1 — Analyze the code change

Run `git diff main...HEAD` (or the appropriate base). Read the diff. Identify:

- **What changed**: flag names, validation logic, error messages, output formats, collection behavior, report generation, table definitions, script content.
- **Behavioral impact**: Does the change alter a CLI flag? A validation rule? An error message string? An output file format? A collection path? A report table?
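One way to reduce the diff to the changed top-level packages is a small path filter (an illustrative sketch, not part of the skill; sample paths are shown here, and in practice you would pipe `git diff --name-only main...HEAD` through the same filter):

```bash
# Collapse changed file paths to their cmd/<pkg>/ or internal/<pkg>/ prefix,
# ready to look up in the category mapping. Paths outside those trees
# (e.g. main.go) pass through unchanged.
printf 'cmd/lock/lock.go\ncmd/lock/summary.go\ninternal/report/html.go\nmain.go\n' \
  | sed -E 's#^((cmd|internal)/[^/]+/).*#\1#' \
  | sort -u
```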
### Step 2 — Identify affected test categories

Use the code-to-category mapping below to determine which `TEST_*` categories are affected.

| Changed path | Categories |
|---|---|
| `cmd/config/` | `TEST_CONFIG` |
| `cmd/flamegraph/` | `TEST_FLAME` |
| `cmd/lock/` | `TEST_LOCK` |
| `cmd/metrics/` | `TEST_METRICS` |
| `cmd/report/` | `TEST_REPORT` |
| `cmd/benchmark/` | `TEST_BENCHMARK` |
| `cmd/telemetry/` | `TEST_TELEMETRY` |
| `cmd/root.go` | All — trace the specific change to narrow |
| `internal/app/` | All — trace the specific change to narrow |
| `internal/workflow/` | All reporting commands — trace to narrow |
| `internal/extract/` | `TEST_REPORT`, `TEST_TELEMETRY`, `TEST_METRICS` |
| `internal/target/` | All — affects SSH/local execution |
| `internal/script/` | All — affects script execution |
| `internal/report/` | `TEST_REPORT`, `TEST_BENCHMARK`, `TEST_TELEMETRY`, `TEST_METRICS`, `TEST_FLAME` |
| `internal/table/` | `TEST_REPORT`, `TEST_BENCHMARK`, `TEST_TELEMETRY` |
| `internal/cpus/` | All — CPU detection used everywhere |
| `internal/progress/` | All — progress UI used everywhere |
| `internal/util/` | All — trace the specific change to narrow |
| `main.go`, `go.mod`, `go.sum` | All |
| `scripts/`, `tools/` | All — embedded resources |
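The direct rows of this mapping can be expressed as a small lookup function (a hypothetical helper, not part of the test script; the "All" rows still require manual tracing):

```bash
# Map one changed path to the TEST_* categories it affects, following the
# table above. Unlisted paths fall through to "ALL" (trace manually).
categories_for_path() {
  case "$1" in
    cmd/config/*)       echo "TEST_CONFIG" ;;
    cmd/flamegraph/*)   echo "TEST_FLAME" ;;
    cmd/lock/*)         echo "TEST_LOCK" ;;
    cmd/metrics/*)      echo "TEST_METRICS" ;;
    cmd/report/*)       echo "TEST_REPORT" ;;
    cmd/benchmark/*)    echo "TEST_BENCHMARK" ;;
    cmd/telemetry/*)    echo "TEST_TELEMETRY" ;;
    internal/extract/*) echo "TEST_REPORT TEST_TELEMETRY TEST_METRICS" ;;
    internal/table/*)   echo "TEST_REPORT TEST_BENCHMARK TEST_TELEMETRY" ;;
    *)                  echo "ALL" ;;
  esac
}

categories_for_path cmd/lock/lock.go        # TEST_LOCK
categories_for_path internal/extract/x.go   # TEST_REPORT TEST_TELEMETRY TEST_METRICS
```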
### Step 3 — Identify specific affected tests

Read the test catalog for each affected category. Load **only** the doc files for affected categories:

| Category | Test catalog |
|---|---|
| `TEST_CONFIG` | [docs/config-tests.md](docs/config-tests.md) |
| `TEST_FLAME` | [docs/flame-tests.md](docs/flame-tests.md) |
| `TEST_LOCK` | [docs/lock-tests.md](docs/lock-tests.md) |
| `TEST_METRICS` | [docs/metrics-tests.md](docs/metrics-tests.md) |
| `TEST_REPORT` | [docs/report-tests.md](docs/report-tests.md) |
| `TEST_BENCHMARK` | [docs/benchmark-tests.md](docs/benchmark-tests.md) |
| `TEST_TELEMETRY` | [docs/telemetry-tests.md](docs/telemetry-tests.md) |

Within the loaded catalog, find every test whose behavior intersects with the change, using these criteria:

1. **Flag changes** — Tests that pass the changed flag in `t_args`.
2. **Error message changes** — Tests whose `t_expect_stderr` matches the changed error string.
3. **Output format changes** — Tests that exercise the changed format via `--format` in `t_args`.
4. **Collection behavior changes** — Tests that exercise the changed collection path (scope, granularity, duration, live mode, workload-driven, etc.).
5. **Shared infrastructure changes** — If the change is in shared code (`internal/target/`, `internal/script/`, `internal/workflow/`, `internal/app/`, `cmd/root.go`, `main.go`), trace the change to the specific behavior and find tests that trigger it across categories. Do not blindly run all tests.
6. **stdout/stderr pattern changes** — Tests whose `t_expect_stdout` or `t_expect_stderr` contains text the change modifies.
7. **Custom validation function changes** — Tests with `t_expect_func` that validate output artifacts affected by the change.

Build a list of specific test names (`t_name` values) and their category.
### Step 4 — Predict expected test outcomes

For each identified test, determine whether the code change should:

- **Not alter the test result** (regression check) — The test must still PASS with the same output patterns.
- **Change the test's expected behavior** — The test's expectations (`t_expect_exit`, `t_expect_stdout`, `t_expect_stderr`, `t_expect_func`) no longer match the new code. Flag this to the user: the test script itself must be updated. Explain what the new expected values must be.
- **Make a previously-skipped test runnable** — If the change adds support for something that was previously guarded.
### Step 5 — Run the affected test categories

Disable all categories except those containing affected tests:

```bash
TARGET=<host> USER_NAME=<user> PRIVATE_KEY_PATH=<key> \
PERFSPECT_DIR=. \
TEST_CONFIG=false TEST_FLAME=false TEST_LOCK=false TEST_METRICS=false \
TEST_REPORT=false TEST_BENCHMARK=false TEST_TELEMETRY=false \
<enable affected categories here>=true \
../tools/perfspect/functional_test.sh -q -v
```

Add `NO_ROOT=true` if the remote user does not have password-less sudo.
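The true/false env-var list can also be built mechanically from the set of affected categories (an illustrative sketch; the `affected` value below is an example, and the variable names are not part of the test script):

```bash
# Emit TEST_*=true for affected categories and TEST_*=false for the rest.
affected="TEST_METRICS TEST_REPORT"   # example: categories identified in Steps 2-3
all="TEST_CONFIG TEST_FLAME TEST_LOCK TEST_METRICS TEST_REPORT TEST_BENCHMARK TEST_TELEMETRY"
flags=""
for c in $all; do
  case " $affected " in
    *" $c "*) flags="$flags $c=true" ;;
    *)        flags="$flags $c=false" ;;
  esac
done
echo "$flags"
```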
### Step 6 — Verify output aligns with the change

Do not stop at PASS/FAIL. For each affected test:

1. **Read the test output.** Examine `test/output/<N>-<test_name>/stdout.txt`, `stderr.txt`, and `perfspect.log`.
2. **Verify the change is reflected.** Follow the output verification guidance in the category's doc file. Examples:
   - Error message changed → confirm `stderr.txt` contains the new text.
   - New output field added → confirm it appears in `stdout.txt` or generated report files.
   - Chart/report generation changed → confirm the output HTML/JSON/CSV contains the expected new content.
   - Bug fix that eliminated ERROR log entries → confirm `perfspect.log` no longer contains `level=ERROR` for the affected path.
   - Collection behavior changed → confirm `stderr.txt` shows the expected collection messages and `stdout.txt` shows the expected output files.
3. **Check for unintended side effects.** Scan the output of non-target tests in the same category for unexpected ERRORs or changed output patterns.
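The log scan in these checks can be captured as a tiny helper (hypothetical, not part of the test script; it only assumes the `perfspect.log` file name used above):

```bash
# Return non-zero if a test's perfspect.log contains ERROR-level entries,
# printing the offending lines to stderr for inspection.
check_no_errors() {
  log="$1/perfspect.log"
  if grep -q 'level=ERROR' "$log"; then
    echo "ERROR entries in $log:" >&2
    grep 'level=ERROR' "$log" >&2
    return 1
  fi
}
```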
### Step 7 — Report to user

Provide:

- The list of tests identified as affected and why.
- PASS/FAIL status of each.
- For each affected test: what was verified in the output and whether the change is reflected correctly.
- Any tests whose expectations must be updated in the test script (with the specific `t_expect_*` values that must change).
- Any tests that passed but whose output reveals a concern.
## Environment variable reference

| Variable | Default | Purpose |
|---|---|---|
| `PERFSPECT_DIR` | `.` | Path to directory containing the `perfspect` binary |
| `ROOT_OUTPUT_DIR` | `test/output` | Output directory for test artifacts |
| `TARGET` | _(empty)_ | Remote target hostname/IP (empty = local) |
| `USER_NAME` | _(empty)_ | SSH username for remote target |
| `PRIVATE_KEY_PATH` | _(empty)_ | SSH private key path for remote target |
| `NO_ROOT` | `false` | Set to `true` to run without root |
| `TEST_CONFIG` | `true` | Run config tests |
| `TEST_FLAME` | `true` | Run flame tests |
| `TEST_LOCK` | `true` | Run lock tests |
| `TEST_METRICS` | `true` | Run metrics tests |
| `TEST_REPORT` | `true` | Run report tests |
| `TEST_BENCHMARK` | `true` | Run benchmark tests |
| `TEST_TELEMETRY` | `true` | Run telemetry tests |
Lines changed: 30 additions & 0 deletions

# Benchmark Tests (TEST_BENCHMARK)

## Test catalog

| Test name | Args exercised | Validates |
|---|---|---|
| `benchmark default` | `benchmark` | Default benchmark (all benchmarks, default format) |
| `benchmark input` | `benchmark --input <prev>` | Reprocessing from `benchmark default` output |
| `benchmark invalid benchmark` | `benchmark --foo` | Exit 1, unknown flag rejected by cobra |
| `benchmark invalid format` | `benchmark --format invalid` | Exit 1 |

## Flags exercised

`--input`, `--format`, unknown flags (cobra validation)

Note: The test script does not exercise individual benchmark selection flags (`--speed`, `--power`, `--temperature`, `--frequency`, `--memory`, `--cache`, `--storage`) or `--storage-dir`, `--no-summary`. Changes to these flags are covered only by `benchmark default` (which runs with `--all` implicitly).

## Test dependencies

- `benchmark input` depends on the output of `benchmark default` (uses its output directory as `--input`).

## Output verification guidance

- **`benchmark default`**: Verify no `level=ERROR` in `perfspect.log`. Verify the output directory contains benchmark report files. If the change affects benchmark collection, summary table generation, or reference data comparisons, inspect the output report content.
- **`benchmark input`**: Verify reprocessing produces output without re-running benchmarks.
- **`benchmark invalid benchmark`**: Verifies that cobra rejects unknown flags. This test is stable unless the flag name `--foo` is added as a real flag (unlikely).
- **`benchmark invalid format`**: Verify the exit code is 1.
- **If benchmark selection flags change**: Only `benchmark default` (all benchmarks) is tested. Individual benchmark flags are not exercised. If a benchmark is added/removed/renamed, verify `benchmark default` still passes and its output reflects the change.
- **If `--format` options change**: Same pattern as other commands — `benchmark invalid format` still passes, but `benchmark default` output should be checked for the new format.
- **If `--storage-dir` validation changes**: No test exercises this flag directly. Manual verification is required.
Lines changed: 40 additions & 0 deletions

# Config Tests (TEST_CONFIG)

All config tests require root (`t_requires_root=true`).

## Test catalog

| Test name | Args exercised | Validates |
|---|---|---|
| `config help` | `config --help` | Help text prints `Usage:` |
| `config default` | `config` | No-op prints `No changes requested` and `Configuration` |
| `config gov epb epp` | `config --gov performance --epb 0 --epp 0` | Applies governor/epb/epp, stderr confirms each setting |
| `config disable l2hw prefetcher` | `config --pref-l2hw disable` | Prefetcher disable, stderr confirms |
| `config enable l2hw prefetcher no-summary` | `config --pref-l2hw enable --no-summary` | Prefetcher enable with `--no-summary` suppresses stdout table |
| `config invalid core count` | `config --cores 0` | Exit 1, stderr: `invalid flag value, --cores 0, valid values are` |
| `config invalid llc size` | `config --llc 0` | Exit 1, stderr: `invalid flag value, --llc 0, valid values are` |
| `config invalid core frequency` | `config --core-max .05` | Exit 1, stderr: `invalid flag value, --core-max 0.05, valid values are` |
| `config invalid tdp` | `config --tdp 0` | Exit 1, stderr: `invalid flag value, --tdp 0, valid values are` |
| `config invalid epb` | `config --epb 16` | Exit 1, stderr: `invalid flag value, --epb 16, valid values are` |
| `config invalid epp` | `config --epp 256` | Exit 1, stderr: `invalid flag value, --epp 256, valid values are` |
| `config invalid governor` | `config --gov invalid` | Exit 1, stderr: `invalid flag value, --gov invalid, valid values are` |
| `config invalid elc` | `config --elc invalid` | Exit 1, stderr: `invalid flag value, --elc invalid, valid values are` |
| `config invalid uncore max frequency` | `config --uncore-max .05` | Exit 1, stderr: `invalid flag value, --uncore-max 0.05, valid values are` |
| `config invalid uncore min frequency` | `config --uncore-min .05` | Exit 1, stderr: `invalid flag value, --uncore-min 0.05, valid values are` |
| `config invalid uncore max compute frequency` | `config --uncore-max-compute .05` | Exit 1, stderr: `invalid flag value, --uncore-max-compute 0.05, valid values are` |
| `config invalid uncore min compute frequency` | `config --uncore-min-compute .05` | Exit 1, stderr: `invalid flag value, --uncore-min-compute 0.05, valid values are` |
| `config invalid uncore max io frequency` | `config --uncore-max-io .05` | Exit 1, stderr: `invalid flag value, --uncore-max-io 0.05, valid values are` |
| `config invalid uncore min io frequency` | `config --uncore-min-io .05` | Exit 1, stderr: `invalid flag value, --uncore-min-io 0.05, valid values are` |
| `config invalid l2hw prefetcher` | `config --pref-l2hw invalid` | Exit 1, stderr: `invalid flag value, --pref-l2hw invalid, valid values are` |
| `config invalid c6` | `config --c6 invalid` | Exit 1, stderr: `invalid flag value, --c6 invalid, valid values are` |
| `config invalid c1 demotion` | `config --c1-demotion invalid` | Exit 1, stderr: `invalid flag value, --c1-demotion invalid, valid values are` |

## Flags exercised

`--gov`, `--epb`, `--epp`, `--pref-l2hw`, `--no-summary`, `--cores`, `--llc`, `--core-max`, `--tdp`, `--elc`, `--uncore-max`, `--uncore-min`, `--uncore-max-compute`, `--uncore-min-compute`, `--uncore-max-io`, `--uncore-min-io`, `--c6`, `--c1-demotion`, `--help`

## Output verification guidance

- **Positive tests** (`config gov epb epp`, `config disable l2hw prefetcher`, etc.): Verify `stderr.txt` contains the `set <flag> to <value>` confirmation messages. Verify `stdout.txt` contains the `Configuration` table when `--no-summary` is not set, and does not contain it when `--no-summary` is set.
- **Negative tests** (all `config invalid *`): Verify `stderr.txt` contains the exact `Error: invalid flag value, --<flag> <value>, valid values are` message. Verify the exit code is 1.
- **If a validation range changes** (e.g., `--epb` now accepts 0-20 instead of 0-15): The `config invalid epb` test passes `--epb 16` and expects exit 1. If 16 is now valid, this test must be updated — flag to the user with the new boundary value.
Lines changed: 43 additions & 0 deletions

# Flame Tests (TEST_FLAME)

All flame tests require root (`t_requires_root=true`).

## Test catalog

| Test name | Runner | Args exercised | Validates |
|---|---|---|---|
| `flame duration java` | `run_test` | `flame --duration 10 --format all` + java workload | JSON output contains `primes.java` in `Flamegraph[0]["Java Stacks"]` |
| `flame duration native` | `run_test` | `flame --duration 10 --format all` + stress-ng | JSON output contains `stress-ng` in `Flamegraph[0]["Native Stacks"]` |
| `flame dual native stacks` | `run_test` | `flame --duration 10 --format all --dual-native-stacks` + stress-ng | Dual stack mode, JSON validates `stress-ng` in Native Stacks |
| `flame all options` | `run_test` | `flame --duration 10 --frequency 10 --format html,json --no-summary --max-depth 20 --perf-event instructions` + java + `--pids` | All flags combined, JSON validates `primes.java` in Java Stacks |
| `flame with input` | `run_test` | `flame --input <prev_output>` | Reprocessing from raw data produced by `flame all options` |
| `flame invalid format` | `run_test` | `flame --format html,invalid` | Exit 1, stderr: `format options are: all, html, txt, json` |
| `flame invalid duration` | `run_test` | `flame --duration -1` | Exit 1, stderr: `duration must be 0 or greater` |
| `flame invalid frequency` | `run_test` | `flame --frequency 0` | Exit 1, stderr: `frequency must be 1 or greater` |
| `flame sigint native` | `run_sigint_test` | `flame --format all --no-summary` + stress-ng, SIGINT after 15s | Graceful shutdown: log ends with `Shutting down`, `perf` and `processwatch` no longer running, JSON validates `stress-ng` |
| `flame sigint java` | `run_sigint_test` | `flame --format all --no-summary` + java, SIGINT after 15s | Graceful shutdown: log ends with `Shutting down`, JSON validates `primes.java` |

## Flags exercised

`--duration`, `--format`, `--frequency`, `--no-summary`, `--max-depth`, `--perf-event`, `--dual-native-stacks`, `--pids`, `--input`

## Custom validation functions

Tests `flame duration java`, `flame all options`, and `flame sigint java` use:

```bash
jq -r '.["Flamegraph"][0]["Java Stacks"]' "$1"/*_flame.json | grep -q "primes.java"
```

Tests `flame duration native`, `flame dual native stacks`, and `flame sigint native` use:

```bash
jq -r '.["Flamegraph"][0]["Native Stacks"]' "$1"/*_flame.json | grep -q "stress-ng"
```
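The jq path can be sanity-checked against a minimal stand-in for `*_flame.json` (illustrative only; the real file, with many more fields, is produced by the flame tests):

```bash
# Minimal JSON with the same shape the validation functions expect.
printf '{"Flamegraph":[{"Java Stacks":"...;primes.java;main;..."}]}' > /tmp/demo_flame.json
jq -r '.["Flamegraph"][0]["Java Stacks"]' /tmp/demo_flame.json | grep -q "primes.java" && echo "validation passes"
```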
## Output verification guidance

- **Collection tests** (`flame duration java`, `flame duration native`, `flame dual native stacks`, `flame all options`): Verify `*_flame.json` exists in the output directory. Parse it with `jq` to confirm the expected stack type contains the workload name.
- **Input reprocessing** (`flame with input`): Verify it regenerates output from previously-collected raw data without re-collecting.
- **Negative tests**: Verify `stderr.txt` contains the exact error message string. Verify the exit code is 1.
- **SIGINT tests**: Verify the last line of `perfspect.log` contains `Shutting down`. Verify no `perf` or `processwatch` processes remain on the target. Verify the `t_expect_func` JSON validation still passes (data was collected before shutdown).
- **If `--format` options change**: The `flame invalid format` test expects the error `format options are: all, html, txt, json`. Update the expected string if format options are added or removed.
- **If the JSON output structure changes**: The custom validation functions parse `*_flame.json` with specific jq paths. If the JSON schema changes, these tests will fail — flag to the user that both the code and the tests' `t_expect_func` must be updated.
Lines changed: 23 additions & 0 deletions

# Lock Tests (TEST_LOCK)

All lock tests require root (`t_requires_root=true`).

## Test catalog

| Test name | Args exercised | Validates |
|---|---|---|
| `lock all options` | `lock --duration 10 --frequency 22 --package --no-summary --format html` + stress-ng | All lock flags combined, successful collection |
| `lock invalid duration` | `lock --duration 0` | Exit 1 (duration must be > 0) |
| `lock invalid frequency` | `lock --frequency -1` | Exit 1 (frequency must be > 0) |
| `lock invalid format` | `lock --format invalid` | Exit 1 (format must be from: all, html, txt) |

## Flags exercised

`--duration`, `--frequency`, `--package`, `--no-summary`, `--format`

## Output verification guidance

- **`lock all options`**: Verify no `level=ERROR` in `perfspect.log`. Verify the output directory contains the HTML report file. With `--package`, verify the raw data package was downloaded.
- **Negative tests**: Verify the exit code is 1. These tests do not set `t_expect_stderr` patterns, so validation is exit-code-only. If a code change adds specific error messages for lock validation, the tests may need `t_expect_stderr` added.
- **If `--format` options change**: The `lock invalid format` test passes `--format invalid` and expects exit 1. If new format options are added, this test still passes (since `invalid` remains invalid). But if format validation error messages change, verify they still align.
- **If duration/frequency validation changes** (e.g., allowing 0 duration for indefinite collection): `lock invalid duration` passes `--duration 0` and expects exit 1. If 0 becomes valid, this test must be updated — flag to the user.
