
# Demo Corpus Methodology

## How a demo is structured

Every repo follows this layout:

```
<repo>/
├── README.md              ← what the app pretends to do, plus the warning
├── DEMO.md                ← seeded-finding summary, prose
├── demo.yaml              ← seeded-finding inventory, machine-readable
├── src/                   ← application code with seeded SAST findings
├── infra/
│   ├── terraform/         ← AWS/Azure/GCP IaC with seeded misconfigs
│   └── k8s/               ← K8s manifests with seeded misconfigs
├── Dockerfile             ← container with seeded misconfigs
├── <manifest>             ← requirements.txt | package.json | go.mod | Cargo.toml | <project>.csproj
├── .github/workflows/     ← CI/CD with seeded misconfigs
└── tests/                 ← minimal smoke tests, no security assertions
```
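For machine consumption, `demo.yaml` is the contract. Here is a minimal sketch of a reader, assuming a top-level `findings` list whose entries carry an `id`, a `capability`, and a `file`; those field names are illustrative guesses, not a published schema:

```python
# Hypothetical reader for demo.yaml. The shape sketched in the comment below
# is an assumption about the inventory format, not the published one.
import yaml  # pip install pyyaml

# Assumed shape of demo.yaml:
#   findings:
#     - id: SAST-001
#       capability: sast          # sast | iac | sca | pipeline
#       file: src/app.py
#       rule: hardcoded-secret

def load_inventory(path: str = "demo.yaml") -> list[dict]:
    """Return the seeded-finding inventory as a list of dicts."""
    with open(path) as fh:
        return yaml.safe_load(fh)["findings"]
```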

## Capability coverage matrix

Every repo must cover, at minimum:

| Capability | Floor | Ceiling | Realised via |
| --- | --- | --- | --- |
| SAST | 8 findings | 15 findings | Application code in `src/` |
| IaC | 5 findings | 10 findings | `infra/terraform/`, `infra/k8s/`, `Dockerfile` |
| SCA | 3 vulnerable deps | 6 vulnerable deps | Manifest pinned to historical CVE'd versions |
| SBOM | 1 generated SBOM | 1 generated SBOM | Derived from the manifest, written to `dist/sbom.json` during scan |
| Pipeline misconfig | 3 findings | 6 findings | `.github/workflows/<name>.yml` |

Total per repo: 19–37 seeded findings, plus one generated SBOM. Across corpus v0.1.0: 246 seeded findings (106 SAST · 66 IaC · 40 SCA · 34 pipeline misconfig).
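Given an inventory in the assumed shape from the earlier snippet, the floor/ceiling matrix becomes a mechanical check. A sketch (the bounds mirror the table above; the SBOM row is a generated artifact, not a finding count, so it is omitted):

```python
from collections import Counter

# Floor/ceiling per capability, copied from the matrix above.
BOUNDS = {
    "sast": (8, 15),
    "iac": (5, 10),
    "sca": (3, 6),
    "pipeline": (3, 6),
}

def coverage_failures(findings: list[dict]) -> list[str]:
    """Return one message per capability that falls outside its bounds."""
    counts = Counter(f["capability"] for f in findings)
    return [
        f"{cap}: {counts.get(cap, 0)} seeded, expected {lo}-{hi}"
        for cap, (lo, hi) in BOUNDS.items()
        if not lo <= counts.get(cap, 0) <= hi
    ]

# Usage: assert not coverage_failures(load_inventory()), "repo out of bounds"
```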

## Token budget per scan (measured against corpus v0.1.0)

The whole corpus is 217 files / ~3,300 LOC / ~110 KB of scannable text, which means a full-corpus run is:

| Layer | Model | Input tokens, full corpus | Output tokens (est.) |
| --- | --- | --- | --- |
| Layer 3 triage | Claude Haiku 4.5 (Bedrock EU) | ~28,000 | ~6,000 |
| Layer 4 validation | Claude Sonnet 4.5 (Bedrock EU) | ~5,500 | ~1,500 |

That works out to ~2,800 input tokens per repo for triage across the ten-repo corpus; the counts are low because the apps are small by design. At Bedrock EU pricing a full-corpus scan costs a few pence, so a weekly cadence plus rule- or model-triggered re-scans comes to well under £100/year for continuous validation that can be rerun indefinitely.
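The cost claim is easy to reproduce from the table. A back-of-envelope sketch; the per-million-token prices below are placeholders inserted for illustration, not actual Bedrock EU rates, so substitute the current price card before quoting a number:

```python
# (input $/MTok, output $/MTok) -- ASSUMED placeholder prices, not real rates
PRICES = {"haiku_triage": (1.00, 5.00), "sonnet_validation": (3.00, 15.00)}

# (input tokens, output tokens) per full-corpus scan, from the table above
USAGE = {"haiku_triage": (28_000, 6_000), "sonnet_validation": (5_500, 1_500)}

def scan_cost_usd() -> float:
    total = 0.0
    for layer, (tin, tout) in USAGE.items():
        pin, pout = PRICES[layer]
        total += tin / 1e6 * pin + tout / 1e6 * pout
    return total

print(f"per scan: ${scan_cost_usd():.3f}")              # ~$0.10 at assumed prices
print(f"52 weekly scans: ${52 * scan_cost_usd():.2f}")  # ~$5/year before re-scans
```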

The corpus deliberately stays in this size envelope. Adding more repos is fine; growing existing repos past the 1,500-LOC cap requires a charter amendment so the cadence cost doesn't drift.
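One way to keep that cap honest is a pre-merge guard. A hypothetical sketch, where the extension set is an assumption about what counts as scannable text and nothing like this script is implied to exist in the corpus today:

```python
# Hypothetical CI guard for the 1,500-LOC-per-repo cap. Tune the extension
# set to whatever the corpus actually treats as scannable text.
import sys
from pathlib import Path

SCANNABLE = {".py", ".js", ".ts", ".go", ".rs", ".cs", ".tf", ".yaml", ".yml", ".json"}
LOC_CAP = 1_500

def repo_loc(root: str = ".") -> int:
    """Count non-blank lines across scannable files under root."""
    return sum(
        sum(1 for line in p.read_text(errors="ignore").splitlines() if line.strip())
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in SCANNABLE
    )

if __name__ == "__main__":
    loc = repo_loc()
    if loc > LOC_CAP:
        sys.exit(f"{loc} LOC exceeds the {LOC_CAP}-LOC cap; file a charter amendment")
    print(f"{loc} LOC, within cap")
```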

## Naming convention

All demo repos use invented brand names that do not map to any real company. Every repo's `README.md` includes the warning banner. None of the apps reference real customer data, real internal hostnames, or any DevSecAI internal infrastructure.

## What the demo is not

The demo corpus is not:

- Real customer code (none of these are based on any real customer's systems)
- A measurement of real-world false-positive rate (use customer scan data for that)
- A penetration-testing surface (these apps don't run; static analysis only)
- A way to inflate scan volume: the cadence is published, the trigger reasons are tagged, and the workload is reproducible by anyone

It is:

- A repeatable way to validate the engine end-to-end on every change
- A reproducible artifact a prospective customer can clone and inspect
- A regression baseline that lets us catch detection-rate drops within minutes of a release