Merged
16 commits
25 changes: 25 additions & 0 deletions .circleci/config.yml
@@ -9,6 +9,9 @@ executors:
  docker-executor:
    docker:
      - image: cimg/node:22.22.2-browsers
  docker-semgrep-executor:
    docker:
      - image: semgrep/semgrep:1.163.0
  docker-postgres-executor:
    docker:
      - image: cimg/node:22.22.2-browsers
@@ -81,6 +84,9 @@ workflows:
      - test_cucumber:
          requires:
            - build_done
      - sast_scan:
          requires:
            - build_done
      - dynamic_security_scan:
          requires:
            - build_done
@@ -92,6 +98,7 @@ workflows:
            - test_e2e_api
            - test_e2e_utils
            - test_cucumber
            - sast_scan
            - dynamic_security_scan
      - deploy_job:
          name: autodeploy_staging
@@ -597,6 +604,24 @@ jobs:
      - store_artifacts:
          path: reports

  sast_scan:
    executor: docker-semgrep-executor
    resource_class: medium
    steps:
      - checkout
      - run:
          name: Install Node.js for Semgrep comparator
          command: |
            apk add --no-cache nodejs
      - run:
          name: Run Semgrep scan
          command: node ./tools/semgrep-sast.js scan
      - run:
          name: Check Semgrep baseline
          command: node ./tools/semgrep-sast.js check
      - store_artifacts:
          path: reports/semgrep

  deploy_job:
    executor: docker-executor
    resource_class: large
67 changes: 67 additions & 0 deletions docs/adr/0026-semgrep-sast-baseline.md
@@ -0,0 +1,67 @@
# 0026. Add Semgrep SAST Baseline Control

## Status

Proposed

## Context

A configuration management audit performed on March 30, 2026 found no evidence, verifiable from The TTA Hub repository, that a dedicated static application security testing (SAST) control had been run for the baseline.

The TTA Hub repository already contained adjacent security controls:

- Biome linting for static code quality checks
- Yarn audit for dependency vulnerability checks
- OWASP ZAP in CircleCI for dynamic application scanning

Those controls do not satisfy the missing requirement on their own:

- linting is not a dedicated SAST control
- dependency audit is software composition analysis (SCA)
- ZAP is dynamic application security testing (DAST)

The missing control must be reproducible from the repository, run in the same CI system that already enforces the rest of the build gates, and keep an auditable baseline with explicit finding dispositions.

The repository already has a Semgrep GitHub app installation available, but this ticket is intentionally scoped to a CircleCI-only control with a repository-owned evidence path. Relying on app configuration or UI retention alone would not fully address the audit gap.

## Decision

We will add a dedicated SAST control in CircleCI using Semgrep Community Edition.

The implementation stores its control definition and evidence in the repository:

- `security/sast/scan-config.json`
- `security/sast/baseline.json`
- `security/sast/dispositions.json`

CircleCI runs the dedicated SAST job in the pinned Semgrep container image `semgrep/semgrep:1.163.0`, stores SARIF, JSON, and text artifacts under `reports/semgrep`, and then compares the current results against the committed baseline and dispositions using repository-owned Node.js tooling.

The initial control uses the Semgrep Registry `p/default` ruleset with scan scope and exclusions defined in the repository. The gate blocks only net-new findings with Semgrep severity `ERROR`. Baseline findings are allowed only when they have a committed disposition. Findings marked `fixed` must not still appear in the active scan. Baseline generation and CI validation both fail if Semgrep reports scan errors or skipped paths, so incomplete scans cannot be retained as authoritative evidence.
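
The gate rules above can be sketched roughly as follows. This is an illustration only, not the logic in `tools/semgrep-sast.js`: the function name `evaluateGate`, the `{ id, severity }` finding shape, and the `status` field on dispositions are all assumptions made for the example.

```javascript
// Hypothetical sketch: names and data shapes are assumptions, not the repository's tooling.
// Each finding is { id, severity }, where `id` is the stable finding identity.
function evaluateGate({ current, baselineIds, dispositions, blockingSeverity = 'ERROR' }) {
  const baseline = new Set(baselineIds);
  const currentIds = new Set(current.map((finding) => finding.id));
  const failures = [];

  // Rule 1: net-new findings at the blocking severity fail the gate.
  for (const finding of current) {
    if (!baseline.has(finding.id) && finding.severity === blockingSeverity) {
      failures.push({ rule: 'net-new-blocking', id: finding.id });
    }
  }

  for (const id of baseline) {
    const disposition = dispositions[id];
    if (!disposition) {
      // Rule 2: every baseline finding needs a committed disposition.
      failures.push({ rule: 'missing-disposition', id });
    } else if (disposition.status === 'fixed' && currentIds.has(id)) {
      // Rule 3: a finding marked fixed must not still be in the active scan.
      failures.push({ rule: 'fixed-still-present', id });
    }
  }

  return { passed: failures.length === 0, failures };
}
```

Returning a list of failures rather than throwing on the first one lets a single CI run report every gate violation at once.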

Finding identity is based on a stable match key that does not include line numbers. Duplicate matches are sorted by a stable locator before occurrence numbering is assigned, which avoids baseline churn when Semgrep returns otherwise identical matches in a different order. When refreshing the baseline, existing disposition records are merged by finding identity so previously reviewed rationale, ownership, and approval data are preserved.
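
A minimal sketch of that identity scheme is shown below. The field names (`ruleId`, `path`, `snippet`, `startLine`, `startCol`), the choice of SHA-256, and the use of start position as the stable locator are assumptions for illustration; the actual tooling may key on different properties.

```javascript
const crypto = require('node:crypto');

// Hypothetical field names; the real tooling may key on different properties.
function matchKey(match) {
  // Line numbers are deliberately excluded so unrelated edits do not change identity.
  const parts = [match.ruleId, match.path, match.snippet.trim()];
  return crypto.createHash('sha256').update(parts.join('\u0000')).digest('hex');
}

function assignIdentities(matches) {
  const keyed = matches.map((match) => ({ match, key: matchKey(match) }));

  // Sort by key, then by an assumed stable locator (start position), so the
  // order in which Semgrep returned the matches has no effect on numbering.
  keyed.sort((a, b) => {
    if (a.key !== b.key) return a.key < b.key ? -1 : 1;
    return a.match.startLine - b.match.startLine || a.match.startCol - b.match.startCol;
  });

  // Number duplicates of the same key 1, 2, 3, ... in sorted order.
  const seen = new Map();
  return keyed.map(({ match, key }) => {
    const occurrence = (seen.get(key) || 0) + 1;
    seen.set(key, occurrence);
    return { id: `${key}#${occurrence}`, match };
  });
}
```

In this sketch the locator only orders duplicates. It never becomes part of the identifier, so shifting both matches down a file leaves their identities unchanged.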

Baseline refresh always runs a fresh scan before writing the committed baseline and records provenance including the git HEAD SHA, the hash of the committed scan configuration, and git worktree fingerprints that indicate whether the scan came from a dirty checkout. The committed baseline keeps only hashes and counts for local untracked content, while the richer path-level detail remains limited to the ephemeral CI artifact provenance. That prevents a new baseline from being generated silently from stale scan output, reduces unnecessary local filename exposure in repository history, and keeps the audit trail tied to the scanned repository state.
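
As a rough illustration of the committed provenance record, the sketch below keeps only hashes and counts for local changes. Every field name is hypothetical. The inputs are assumed to come from commands such as `git rev-parse HEAD`, `git diff HEAD`, and `git ls-files --others --exclude-standard`, gathered by the caller.

```javascript
const crypto = require('node:crypto');

const sha256 = (text) => crypto.createHash('sha256').update(text).digest('hex');

// All field names are illustrative assumptions, not the committed baseline schema.
// untrackedFiles is a list of { path, contentHash } gathered by the caller.
function buildCommittedProvenance({ headSha, scanConfigText, trackedDiff, untrackedFiles }) {
  // Sort so the fingerprint does not depend on filesystem listing order.
  const sorted = [...untrackedFiles].sort((a, b) => {
    if (a.path === b.path) return 0;
    return a.path < b.path ? -1 : 1;
  });

  return {
    gitHeadSha: headSha,
    scanConfigHash: sha256(scanConfigText),
    worktree: {
      dirty: trackedDiff.length > 0 || untrackedFiles.length > 0,
      trackedChangesHash: sha256(trackedDiff),
      // Only a count and a combined hash are kept: raw local filenames stay out.
      untrackedCount: sorted.length,
      untrackedHash: sha256(sorted.map((e) => `${e.path}\u0000${e.contentHash}`).join('\n')),
    },
  };
}
```

A richer, path-level variant of the same record could be written to the ephemeral CI artifact, while only this reduced form would be committed.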

We chose Semgrep over alternatives such as SonarQube, Checkmarx, and CodeQL for The TTA Hub because:

- it fits the existing CircleCI enforcement model without requiring a separate platform to become the control plane
- it can run locally and in CI with the same committed configuration
- it can emit JSON and SARIF directly for artifact retention
- it is simpler to roll out as a repository-owned control than SonarQube or Checkmarx, which would add more platform dependency and operational overhead
- it avoids making GitHub-specific integration, such as CodeQL or the existing Semgrep app, the only evidence path for this ticket

We are intentionally not using the Semgrep GitHub app as part of the authoritative control for this ticket. If we later want PR annotations or centralized triage in the app, that should be added in a later integration using the same repository-owned rules and evidence model.

## Consequences

This adds another CI job and modestly increases overall pipeline run time.

The team must maintain three committed artifacts:

- the scan configuration
- the baseline scan snapshot
- the disposition log for unresolved baseline findings

Because Semgrep Registry rulesets evolve over time, the committed baseline and disposition comparison logic become the stability mechanism for this change. The control remains reproducible because the CLI version is pinned and the scan invocation is defined in the repository, but the team should expect periodic baseline review as Semgrep community rules change.

This design improves auditability because the scanner configuration, baseline, and dispositions can all be verified directly from The TTA Hub repository history.
4 changes: 4 additions & 0 deletions package.json
@@ -23,6 +23,10 @@
"deps:install": "yarn install && yarn --cwd frontend install",
"deps:frozen": "yarn install --frozen-lockfile && yarn --cwd frontend install --frozen-lockfile",
"deps:audit": "./tools/run-yarn-audit.js && cd frontend && ../tools/run-yarn-audit.js",
"sast:scan": "node ./tools/semgrep-sast.js scan",
"sast:baseline": "node ./tools/semgrep-sast.js generate-baseline",
"sast:dispositions": "node ./tools/semgrep-sast.js seed-dispositions",
"sast:check": "node ./tools/semgrep-sast.js check",
"db:migrate": "sequelize db:migrate && yarn db:ldm",
"db:migrate:ci": "bin/ci-env sequelize db:migrate && yarn db:ldm:ci",
"db:migrate:prod": "sequelize db:migrate --options-path .production.sequelizerc",
52 changes: 52 additions & 0 deletions security/sast/README.md
@@ -0,0 +1,52 @@
# Semgrep SAST

This directory is the repo-owned source of truth for the CircleCI SAST control introduced for ticket `TTAHUB-5242`.

## Files

- `scan-config.json`: pinned Semgrep CLI version, ruleset selection, scan scope, and gate threshold
- `baseline.json`: the retained baseline scan snapshot for the current control
- `dispositions.json`: dispositions for every baseline finding that remains open or intentionally unresolved
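
To make the role of `scan-config.json` concrete, here is a hedged sketch of how such a file could drive the Semgrep invocation. Only the `semgrepVersion` key is confirmed by the install step in this README. The other keys, the example paths, and the `buildSemgrepArgs` helper are assumptions for illustration.

```javascript
// Hypothetical config shape: only `semgrepVersion` is confirmed by this README.
const exampleConfig = {
  semgrepVersion: '1.163.0',
  rulesets: ['p/default'],
  include: ['src', 'frontend/src'], // assumed scan scope
  exclude: ['node_modules', 'coverage'], // assumed exclusions
  blockingSeverity: 'ERROR',
};

// Builds the argument list for `semgrep scan` from the committed configuration,
// so local runs and CI runs use the same invocation.
function buildSemgrepArgs(config, outputPath) {
  const args = ['scan', '--json', '--output', outputPath];
  for (const ruleset of config.rulesets) args.push('--config', ruleset);
  for (const pattern of config.exclude) args.push('--exclude', pattern);
  return [...args, ...config.include];
}
```

The resulting array could be passed to `child_process.spawnSync('semgrep', args)`. Deriving the invocation from one committed file is what keeps local and CI scans comparable.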

## Local workflow

1. Install the pinned Semgrep CLI version:

```bash
SEMGREP_VERSION=$(node -pe "require('./security/sast/scan-config.json').semgrepVersion")
python3 -m pip install --user "semgrep==${SEMGREP_VERSION}"
export PATH="$HOME/.local/bin:$PATH"
```

2. Run the scan if you want to inspect findings without updating the baseline:

```bash
yarn sast:scan
```

3. Refresh the baseline when explicitly approved:

```bash
yarn sast:baseline
yarn sast:dispositions
```

`yarn sast:baseline` runs a fresh scan before writing `baseline.json` and records scan provenance from the exact repository state used to start the scan.
`yarn sast:dispositions` preserves existing reviewed records by signature and only scaffolds entries for newly introduced baseline findings.

4. Check the current findings against the committed baseline and dispositions:

```bash
yarn sast:check
```
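
The merge behavior described for `yarn sast:dispositions` in step 3 might look roughly like the sketch below. The record shape (`id`, `status`, `rationale`, `owner`, `approvedBy`), the `needs-review` placeholder status, and the choice to drop records for findings no longer in the baseline are assumptions, not the documented schema.

```javascript
// Hypothetical record shape: { id, status, rationale, owner, approvedBy }.
function seedDispositions(baselineIds, existingRecords) {
  const existingById = new Map(existingRecords.map((record) => [record.id, record]));

  return baselineIds.map((id) => {
    // Reviewed records are carried over untouched, keyed by finding signature.
    const existing = existingById.get(id);
    if (existing) return existing;

    // New baseline findings get an empty scaffold for a reviewer to complete.
    return { id, status: 'needs-review', rationale: '', owner: '', approvedBy: '' };
  });
}
```

Because existing records are returned as-is, re-running the seeding step after a baseline refresh does not erase earlier review decisions.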

## Operating rules

- CircleCI is the authoritative SAST control for this implementation.
- The Semgrep GitHub app, if present, is informational only and not part of the control evidence path.
- The gate currently blocks only net-new Semgrep findings with severity `ERROR`.
- Every baseline finding must have a committed disposition entry.
- Findings marked `fixed` in `dispositions.json` must not still be present in the live scan.
- Baseline generation and CI validation fail if Semgrep reports scan errors or skipped paths, so the retained baseline cannot silently come from a partial scan.
- Committed baseline provenance records the git HEAD SHA, scan config hash, and git worktree fingerprints, including whether the worktree was dirty and hashes for tracked and untracked local changes.
- The ephemeral `reports/semgrep/provenance.json` artifact may include richer untracked-entry detail for debugging, but raw local filenames are not retained in committed baseline evidence.
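
The partial-scan rule above could be enforced with a check like the following sketch. It assumes the Semgrep JSON report exposes top-level `errors` and `paths.skipped` arrays. The helper name `assertCompleteScan` is hypothetical.

```javascript
// Assumes Semgrep's JSON report has `results`, `errors`, and `paths.skipped`.
function assertCompleteScan(report) {
  const errors = report.errors || [];
  const skipped = (report.paths && report.paths.skipped) || [];

  // Any scan error or skipped path means the report is not authoritative evidence.
  if (errors.length > 0 || skipped.length > 0) {
    throw new Error(
      `Incomplete Semgrep scan: ${errors.length} error(s), ${skipped.length} skipped path(s)`,
    );
  }

  return report.results || [];
}
```

Running this check before both baseline generation and the CI comparison means neither path can proceed on incomplete results.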