The SOC Pack ships a structured live evaluation harness under
tests/eval/. It is a contract-style suite that
hits every tool group on a deployed (or locally running) instance and
asserts the response shape is what Security Copilot / MCP clients
expect.
It complements the existing assets:
| Asset | Scope | When to run |
|---|---|---|
| `pytest tests/` | Unit tests against mocked upstreams. No network. | Every commit (CI). |
| `scripts/smoke.ps1` | 30-second curl-style sanity check, prints a pass/skip/fail table. | Right after Deploy to Azure or an image bump. |
| `scripts/eval.ps1` (this) | Full contract suite over every endpoint, real upstream calls. | Nightly, before promoting an image to production, or before tagging a release. |
Each scenario is a `Scenario` record in `tests/eval/test_live_scenarios.py` with:
- HTTP method + path + params (or JSON body for POST routes).
- Expected status code (defaults to 200).
- A list of contract assertions on the response body (e.g. `cveID` echoed, `vulns` key present, response is a non-empty list).
The harness deliberately does not assert exact values, because upstream threat-intel data changes daily. It catches regressions in schema/shape and in the conditional-registration plumbing.
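As a rough illustration of the record described above, the sketch below models a scenario and evaluates its contract checks against a response. The field names, the `failed_checks` helper, and the `/kev` path with its `cve` parameter are assumptions made for this example; the real `Scenario` in `tests/eval/test_live_scenarios.py` may be shaped differently.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class Scenario:
    # Hypothetical field names, chosen to mirror the list above.
    name: str
    method: str
    path: str
    params: dict = field(default_factory=dict)
    json_body: Optional[dict] = None   # only meaningful for POST routes
    expected_status: int = 200
    checks: tuple = ()                 # (label, predicate) pairs
    requires_upstream_key: bool = False


def failed_checks(scenario: Scenario, status: int, body: Any) -> list:
    """Return the labels of every contract check that did not hold."""
    if status != scenario.expected_status:
        return [f"status {status} != {scenario.expected_status}"]
    return [label for label, check in scenario.checks if not check(body)]


# Illustrative entry: the path and parameter name are placeholders.
kev_lookup = Scenario(
    name="kev-lookup",
    method="GET",
    path="/kev",
    params={"cve": "CVE-2021-44228"},
    checks=(
        ("cveID echoed", lambda b: b.get("cveID") == "CVE-2021-44228"),
        ("vulns key present", lambda b: "vulns" in b),
        ("vulns is a list", lambda b: isinstance(b.get("vulns"), list)),
    ),
)
```

The checks only look at keys, types, and the echoed input, so the scenario keeps passing when the upstream catalogue adds or removes entries.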
Scenarios for tools whose upstream API key is not configured on the target are auto-skipped, in two ways:
- Route absent (HTTP 404): when the env var is unset on the Container App, the route is never registered. The harness skips instead of failing.
- Route present, key still missing (HTTP 503): when the route is registered but the upstream rejects the request, the harness reads the `detail` in the response and skips.
This matches the conditional-registration policy in
src/app.py and lets the same eval suite run against
both minimum-viable deployments (KEV / EPSS / ATT&CK / crt.sh /
ransomware.live / OSV / CIRCL / D3FEND / HIBP only) and full
deployments (every upstream key configured).
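The two skip paths could be expressed as a small classification step, sketched below. This is not the harness's actual code: the `classify` function is hypothetical, and it assumes that only scenarios flagged as requiring an upstream key are eligible for skipping and that any non-empty `detail` on a 503 counts as a missing-key signal. The real harness may match specific `detail` text instead.

```python
from typing import Any


def classify(status: int, body: Any, requires_upstream_key: bool,
             expected_status: int = 200) -> str:
    """Map one response to 'skip', 'pass', or 'fail'."""
    if requires_upstream_key:
        # Route absent: the tool was never registered on this target.
        if status == 404:
            return "skip"
        # Route present but the upstream key is still missing.
        # Assumption: a non-empty "detail" is enough to treat it as such.
        if status == 503 and isinstance(body, dict) and body.get("detail"):
            return "skip"
    return "pass" if status == expected_status else "fail"
```

Restricting the skip to gated scenarios means a 404 on an always-on route still surfaces as a failure rather than being silently skipped.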
Set the following environment variables before running:
| Variable | Required | Default | Notes |
|---|---|---|---|
| `EVAL_TARGET_URL` | ✅ | — | Base URL with scheme. Example: `https://copilot-mcp-soc-pack.<env>.<region>.azurecontainerapps.io`. |
| `EVAL_API_KEY` | conditional | — | `X-API-Key` value. Required if the target enforces auth. |
| `EVAL_TIMEOUT_SECONDS` | optional | `20` | Per-request timeout, in seconds. |
If EVAL_TARGET_URL is unset, the entire harness module is skipped —
this keeps the default pytest tests/ invocation fast and offline.
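One way to read these variables is sketched below, using only the standard library. The `EvalConfig` class and `load_config` function are hypothetical names for this example, not part of the harness; the variable names, the `X-API-Key` header, and the default of 20 come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EvalConfig:
    base_url: str
    api_key: Optional[str]
    timeout_seconds: float

    @property
    def headers(self) -> dict:
        # Send X-API-Key only when a key was provided.
        return {"X-API-Key": self.api_key} if self.api_key else {}


def load_config(env: Mapping[str, str] = os.environ) -> Optional[EvalConfig]:
    """Return the eval settings, or None when EVAL_TARGET_URL is unset."""
    base_url = env.get("EVAL_TARGET_URL", "").strip()
    if not base_url:
        return None
    return EvalConfig(
        base_url=base_url.rstrip("/"),
        api_key=env.get("EVAL_API_KEY") or None,
        timeout_seconds=float(env.get("EVAL_TIMEOUT_SECONDS", "20")),
    )
```

A test module could call `load_config()` at import time and, when it returns `None`, call `pytest.skip(..., allow_module_level=True)` so that no scenario is collected.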

    # Against the deployed Container App
    $env:EVAL_API_KEY = az containerapp secret show -g <rg> -n <app> `
        --secret-name api-key --query value -o tsv
    ./scripts/eval.ps1 -Fqdn copilot-mcp-soc-pack.<env>.<region>.azurecontainerapps.io

    # Against a local uvicorn
    uvicorn src.app:app --port 8080 &
    ./scripts/eval.ps1 -Fqdn localhost:8080 -PytestArgs '-k', 'kev'

Or invoke pytest directly:

    $env:EVAL_TARGET_URL = "https://copilot-mcp-soc-pack.<env>.<region>.azurecontainerapps.io"
    $env:EVAL_API_KEY = "<value>"
    pytest tests/eval -m eval -v

The repository ships a `.github/workflows/eval.yml` workflow that you can trigger manually (`workflow_dispatch`) and that also runs nightly.
It expects two repository secrets:
- `EVAL_TARGET_URL`: base URL of the long-running test deployment.
- `EVAL_API_KEY`: the `X-API-Key` value for that deployment.
Configure them under Settings → Secrets and variables → Actions in your fork. Without those secrets the workflow gracefully no-ops.
To cover a new tool in the eval suite:
- Add a tool module under `src/tools/<name>.py` (see tool-module instructions).
- Append a `Scenario(...)` entry to `SCENARIOS` in `tests/eval/test_live_scenarios.py` that exercises a stable, public, long-lived input.
- Set `requires_upstream_key=True` if the tool is gated.
- Keep contract assertions to shape only (key presence, type, list-min-length). Never assert exact upstream values.
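The three kinds of shape check named in the last step could be written as small reusable predicates, as in the sketch below. The helper names `has_key`, `is_type`, and `min_len` are made up for this example and are not taken from the harness.

```python
from typing import Any, Callable

Check = Callable[[Any], bool]


def has_key(key: str) -> Check:
    """Key presence: the body is an object that contains `key`."""
    return lambda body: isinstance(body, dict) and key in body


def is_type(key: str, expected: type) -> Check:
    """Type: the value under `key` is an instance of `expected`."""
    return lambda body: (
        isinstance(body, dict) and isinstance(body.get(key), expected)
    )


def min_len(key: str, n: int) -> Check:
    """List-min-length: the value under `key` is a list with >= n items."""
    def check(body: Any) -> bool:
        value = body.get(key) if isinstance(body, dict) else None
        return isinstance(value, (list, tuple)) and len(value) >= n
    return check
```

None of these helpers compares a value against upstream data, which makes it harder to slip an exact-value assertion into a new scenario by accident.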