Every repo follows this layout:
```
<repo>/
├── README.md            ← what the app pretends to do, plus the warning
├── DEMO.md              ← seeded-finding summary, prose
├── demo.yaml            ← seeded-finding inventory, machine-readable
├── src/                 ← application code with seeded SAST findings
├── infra/
│   ├── terraform/       ← AWS/Azure/GCP IaC with seeded misconfigs
│   └── k8s/             ← K8s manifests with seeded misconfigs
├── Dockerfile           ← container with seeded misconfigs
├── <manifest>           ← requirements.txt | package.json | go.mod | Cargo.toml | <project>.csproj
├── .github/workflows/   ← CI/CD with seeded misconfigs
└── tests/               ← minimal smoke tests, no security assertions
```
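The schema of `demo.yaml` isn't spelled out above; a minimal sketch of what one inventory entry might look like, where the field names and the repo name are illustrative assumptions rather than the real schema:

```yaml
# Hypothetical demo.yaml shape — field names are illustrative, not the real schema.
repo: acme-billing-portal        # invented brand name, per the naming rule
findings:
  - id: SAST-001
    capability: sast
    file: src/routes/export.py
    rule: sql-injection          # what a scanner is expected to flag
    seeded: true
  - id: IAC-001
    capability: iac
    file: infra/terraform/s3.tf
    rule: s3-bucket-public-read
    seeded: true
```

Keeping the inventory machine-readable is what makes the regression baseline automatable: a scan result can be diffed against `findings` without parsing the prose in `DEMO.md`.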
Every repo must cover, at minimum:
| Capability | Floor | Ceiling | Realised via |
|---|---|---|---|
| SAST | 8 findings | 15 findings | Application code in src/ |
| IaC | 5 findings | 10 findings | infra/terraform/, infra/k8s/, Dockerfile |
| SCA | 3 vulnerable deps | 6 vulnerable deps | Manifest pinned to historical CVE'd versions |
| SBOM | 1 generated SBOM | 1 generated SBOM | Derived from manifest, written to dist/sbom.json during scan |
| Pipeline misconfig | 3 findings | 6 findings | .github/workflows/<name>.yml |
Total per repo: 19–37 seeded findings, plus one generated SBOM (an artifact, not a finding). Across corpus v0.1.0: 246 seeded findings (106 SAST · 66 IaC · 40 SCA · 34 pipeline misconfig).
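For a Python repo, the SCA floor could be met by pinning the manifest to versions with well-known historical CVEs. The specific packages below are an illustrative assumption, not the corpus's actual manifest:

```
# requirements.txt — illustrative pins with known historical CVEs
PyYAML==5.3.1      # CVE-2020-14343: code execution via full_load
requests==2.19.1   # CVE-2018-18074: credential leak on redirect
Flask==0.12.2      # CVE-2018-1000656: JSON-decoding denial of service
```

Exact pins (`==`) matter here: an SCA tool resolves the advisory against the declared version, so a floating range would make the seeded finding non-deterministic.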
The whole corpus is 217 files / ~3,300 LOC / ~110 KB of scannable text, so a full-corpus run works out to:
| Layer | Model | Input tokens, full corpus | Output tokens (est.) |
|---|---|---|---|
| Layer 3 triage | Claude Haiku 4.5 (Bedrock EU) | ~28,000 | ~6,000 |
| Layer 4 validation | Claude Sonnet 4.5 (Bedrock EU) | ~5,500 | ~1,500 |
That's ~2,800 input tokens per repo for triage — low token counts because the apps are small by design. At Bedrock EU pricing the cost per full-corpus scan is a few pence; a weekly cadence plus rule/model-triggered re-scans runs at well under £100/year for indefinitely-rerunnable continuous validation.
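The per-repo and per-year figures follow from simple division. A back-of-envelope sketch in which only the token totals come from the tables above; the repo count and the per-million-token prices are assumptions, not sourced numbers:

```python
# Back-of-envelope for the Layer 3 triage cost. Repo count and prices are
# ASSUMED for illustration; only the token totals come from the table above.
CORPUS_INPUT_TOKENS = 28_000   # full-corpus triage input (from the table)
CORPUS_OUTPUT_TOKENS = 6_000   # full-corpus triage output (from the table)
REPOS = 10                     # assumption implied by ~2,800 tokens/repo

per_repo_input = CORPUS_INPUT_TOKENS // REPOS  # ~2,800 tokens per repo

# Hypothetical prices in GBP per million tokens — placeholders, not Bedrock EU's.
PRICE_IN, PRICE_OUT = 0.80, 4.00
scan_cost = (CORPUS_INPUT_TOKENS * PRICE_IN
             + CORPUS_OUTPUT_TOKENS * PRICE_OUT) / 1_000_000
annual = scan_cost * 52        # weekly cadence, ignoring triggered re-scans

print(per_repo_input)          # 2800
print(round(scan_cost, 4))     # a few pence per full-corpus scan
print(round(annual, 2))        # comfortably under £100/year
```

Even if the assumed prices are off by an order of magnitude, the annual figure stays well inside the stated budget, which is the point of the size envelope.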
The corpus deliberately stays in this size envelope. Adding more repos is fine; growing existing repos past the 1,500-LOC cap requires a charter amendment so the cadence cost doesn't drift.
All demo repos use invented brand names that do not map to any real company. Every repo README.md includes the warning banner. None of the apps reference real customer data, real internal hostnames, or any DevSecAI internal infrastructure.
The demo corpus is not:
- Real customer code (none of these are based on any real customer's systems)
- A measurement of real-world false-positive rate (use customer scan data for that)
- A penetration testing surface (these apps don't run; static analysis only)
- A way to inflate scan volume — the cadence is published, the trigger reasons are tagged, and the workload is reproducible by anyone
It is:
- A repeatable way to validate the engine end-to-end on every change
- A reproducible artifact a prospective customer can clone and inspect
- A regression baseline that lets us catch detection-rate drops within minutes of a release
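The regression-baseline point reduces to a set comparison between the seeded finding IDs in `demo.yaml` and what a scan actually reported. A minimal sketch with hypothetical IDs and no real scanner attached; in practice `expected` would be loaded from the inventory files:

```python
# Minimal regression check: seeded findings (expected) vs. scan output (actual).
# IDs are hypothetical; "expected" would really be parsed out of demo.yaml.
expected = {"SAST-001", "SAST-002", "IAC-001", "SCA-001", "PIPE-001"}
actual   = {"SAST-001", "IAC-001", "SCA-001", "PIPE-001"}  # one detection dropped

missed = expected - actual        # detection-rate regressions to alarm on
unexpected = actual - expected    # unseeded findings, worth a manual look
detection_rate = 1 - len(missed) / len(expected)

print(sorted(missed), f"{detection_rate:.0%}")  # → ['SAST-002'] 80%
```

Because the corpus and cadence are fixed, any drop in `detection_rate` between two releases is attributable to the engine change, not to workload drift.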