[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1296

2026-03-13T22:22:01Z

github-actions[bot]
bot Mar 13, 2026

📊 Current CI/CD Pipeline Status

The repository has a mature, well-structured CI/CD pipeline with 40+ workflow files covering build verification, testing, security scanning, and agentic quality checks. Recent workflow runs show a high success rate: roughly 90% of the last 30 runs passed, the 3 failures were all Secret Digger runs, and only the currently running assessment workflow is pending.

The pipeline is layered across three tiers:

Standard GitHub Actions workflows (.yml files) — deterministic, automated quality gates
Agentic workflows (.md lock files compiled with gh-aw) — AI-driven reviews and specialized checks
Scheduled maintenance — dependency audits, performance monitoring, secret scanning

✅ Existing Quality Gates

The following checks run on pull requests to main:

Check	Workflow File	Scope
Build verification (Node 20 & 22)	`build.yml`	All PRs
ESLint + Markdownlint	`lint.yml`	All PRs
TypeScript type check (`tsc --noEmit`)	`test-integration.yml`	All PRs
Unit test coverage with PR comments	`test-coverage.yml`	Non-markdown PRs
CodeQL (JS/TS + Actions)	`codeql.yml`	All PRs
npm dependency audit (high+critical)	`dependency-audit.yml`	Non-markdown PRs
PR title conventional commits check	`pr-title.yml`	All PRs
Container Trivy scan (agent + squid)	`container-scan.yml`	`containers/**` path changes
Integration tests (4 parallel jobs)	`test-integration-suite.yml`	All PRs
Chroot integration tests (4 jobs)	`test-chroot.yml`	All PRs
Examples smoke tests	`test-examples.yml`	Non-markdown PRs
Setup Action tests	`test-action.yml`	Non-markdown PRs
Documentation link check	`link-check.yml`	`**/*.md` path changes
Docs preview build	`docs-preview.yml`	`docs-site/**` path changes
Security Guard (Claude AI review)	`security-guard.md`	All PRs
Build Test Suite (Copilot AI agent)	`build-test.md`	All PRs

Scheduled / non-PR checks: CodeQL (weekly), Trivy (weekly), dependency audit (weekly), performance benchmarks (weekly), secret digger (hourly), link check (weekly), doc maintainer (daily).

🔍 Identified Gaps

🔴 High Priority

1. Critically low unit test coverage on core components

cli.ts has 0% unit test coverage and docker-manager.ts has only 18% (4% function coverage).
These two files contain the majority of the business logic: CLI argument handling, container orchestration, exit code propagation, cleanup lifecycle, and log streaming.
The overall coverage threshold is set at 38% statements — extremely low for a security-critical firewall that controls network egress for AI agents.
A regression in the cleanup lifecycle, domain normalization, or iptables rule generation would not be caught by unit tests.

2. Container security scan is path-limited — misses source-driven changes

container-scan.yml only triggers on containers/** path changes on PRs.
Changes to src/docker-manager.ts (which controls container configuration, capabilities, seccomp profiles, and volume mounts) do not trigger a Trivy scan.
A security regression in how containers are launched (e.g., dropping the seccomp profile, re-enabling NET_ADMIN) would not trigger a container scan.
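
A minimal sketch of a broadened trigger, assuming `container-scan.yml` uses a standard `pull_request` path filter and targets `main` (the real file may list additional branches or events):

```yaml
# Hypothetical trigger block for container-scan.yml.
# Only the second path entry is new; everything else is assumed.
on:
  pull_request:
    branches: [main]
    paths:
      - 'containers/**'
      # Launch-time container configuration lives here, so changes
      # to this file should also trigger the Trivy scan.
      - 'src/docker-manager.ts'
```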

3. No shell script linting (shellcheck)

Critical shell scripts with no static analysis: containers/agent/setup-iptables.sh, containers/agent/entrypoint.sh, containers/squid/entrypoint.sh, and scripts/ci/cleanup.sh.
setup-iptables.sh configures the iptables rules that enforce the network isolation. A bug (e.g., wrong flag, wrong chain) could silently break the firewall.
shellcheck catches quoting bugs, undefined variables, and unsafe patterns before runtime.
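
As an illustration, a lint job along these lines would cover the four scripts listed above. The job name is hypothetical, and the sketch assumes `shellcheck` is already installed on the runner image:

```yaml
# Hypothetical job for lint.yml; assumes shellcheck is on the runner's PATH.
jobs:
  shellcheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Lint shell scripts
        # shellcheck exits non-zero on any finding, which fails the job.
        run: |
          shellcheck \
            containers/agent/setup-iptables.sh \
            containers/agent/entrypoint.sh \
            containers/squid/entrypoint.sh \
            scripts/ci/cleanup.sh
```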

4. No Dockerfile linting (hadolint)

Neither containers/agent/Dockerfile nor containers/squid/Dockerfile is linted.
hadolint catches Dockerfile anti-patterns (running as root, unpinned base images and packages, inefficient layer ordering) that a Trivy image scan does not detect.
The agent container Dockerfile installs packages with apt-get — version pinning and layer security are not validated.
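
One possible shape for such a job, using the `hadolint/hadolint-action` action with a matrix over both Dockerfiles. The job name and the `warning` threshold are assumptions, not settings taken from the repository:

```yaml
# Hypothetical job for lint.yml or build.yml.
jobs:
  hadolint:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        dockerfile:
          - containers/agent/Dockerfile
          - containers/squid/Dockerfile
    steps:
      - uses: actions/checkout@v4
      - uses: hadolint/hadolint-action@v3.1.0
        with:
          dockerfile: ${{ matrix.dockerfile }}
          # Assumed policy: fail on warnings and errors, not on style notes.
          failure-threshold: warning
```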

🟡 Medium Priority

5. Performance benchmarks not enforced on PRs

performance-monitor.yml runs weekly only and does not trigger on pull requests.
The benchmark covers container startup time, proxy setup latency, and command execution overhead — metrics users care about directly.
A change that increases startup time from 8s to 30s (e.g., slow healthcheck polling) would not be caught before merge.
Note: The workflow creates GitHub issues on regression detection when run on a schedule, but this is reactive, not proactive.

6. Documentation build failures don't block PRs

docs-preview.yml has continue-on-error: true on the build step.
A broken Astro/Starlight documentation build (e.g., invalid MDX, broken imports) does not fail the PR.
Since docs changes are common (the project has active documentation), silent doc build failures accumulate.
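
A sketch of the build step with the flag removed. The step name, working directory, and build command are assumptions about how `docs-preview.yml` is laid out:

```yaml
# Hypothetical build step from docs-preview.yml.
      - name: Build documentation
        working-directory: docs-site
        run: npm run build
        # continue-on-error: true  (removed so that a failed build fails the PR check)
```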

7. Smoke tests are reaction-gated, not automatic

The smoke tests for Claude (smoke-claude.md), Codex (smoke-codex.md), and Copilot (smoke-copilot.md) are triggered by emoji reactions, not automatically on every PR.
While this is intentional to conserve resources, there is no clear policy for which PRs require smoke tests before merge.
High-impact changes (e.g., changes to containers/, src/docker-manager.ts, src/squid-config.ts) could merge without an end-to-end smoke test.

8. No artifact size monitoring

The compiled dist/ output and Docker image sizes are not tracked.
Accidental inclusion of large files (e.g., fixture data, bundled node_modules) in dist/ would not be caught.
Docker image size regressions affect pull times for all users of the GHCR images.
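
A size gate for `dist/` could look roughly like the step below. The 5 MiB budget is a placeholder chosen for illustration, not a measured baseline:

```yaml
# Hypothetical step for build.yml, placed after the build has produced dist/.
      - name: Check dist/ size
        run: |
          max_kb=5120   # placeholder budget (5 MiB); tune to the real baseline
          size_kb=$(du -sk dist | cut -f1)   # du -sk reports size in KiB
          echo "dist/ is ${size_kb} KiB (limit ${max_kb} KiB)"
          if [ "$size_kb" -gt "$max_kb" ]; then
            echo "::error::dist/ exceeds the size budget"
            exit 1
          fi
```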

9. Unverified whether the api-proxy-observability and api-proxy-rate-limit integration tests run in CI

test-integration-suite.yml runs --testPathPatterns="api-proxy" which should match all api-proxy tests.
However, api-proxy-observability.test.ts and api-proxy-rate-limit.test.ts are in the integration folder but do not appear in the pattern list documented in docs/INTEGRATION-TESTS.md. These tests should be verified to actually run in the test-api-proxy job.
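
One way to check this is to have Jest print the files a pattern matches without running them. The step below is a sketch that assumes the job invokes Jest directly and that no separate integration config needs to be passed:

```yaml
# Hypothetical diagnostic step for the test-api-proxy job.
      - name: List tests matched by the api-proxy pattern
        # --listTests prints matching test file paths and exits without running them.
        run: npx jest --listTests --testPathPatterns="api-proxy"
```

If both files appear in the output, the gap is in the documentation only.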

🟢 Low Priority

10. Link check doesn't run when code files change

link-check.yml only triggers on **/*.md path changes.
Source code changes that remove or rename CLI flags, configuration options, or environment variables referenced in documentation produce dangling doc links — but these won't trigger a link check.
Adding a push trigger or broadening the path filter would catch cross-cutting stale references.

11. No OpenSSF Scorecard

As a security-focused, GitHub-owned project, OpenSSF Scorecard is a natural fit.
It would automatically assess branch protection, token permissions, dependency update practices, CI/CD security, and pinned actions.
Results can be published as a badge and monitored for regression.
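
A minimal Scorecard workflow, loosely following the layout of the action's published starter workflow. The cron time and action versions are assumptions and should be checked against current releases:

```yaml
# Hypothetical .github/workflows/scorecard.yml
name: Scorecard
on:
  schedule:
    - cron: '0 6 * * 1'   # weekly; the exact time is arbitrary
  push:
    branches: [main]
permissions: read-all
jobs:
  analysis:
    runs-on: ubuntu-latest
    permissions:
      security-events: write   # upload SARIF to code scanning
      id-token: write          # needed when publishing results
    steps:
      - uses: actions/checkout@v4
        with:
          persist-credentials: false
      - uses: ossf/scorecard-action@v2.4.0
        with:
          results_file: results.sarif
          results_format: sarif
          publish_results: true
      - uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: results.sarif
```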

12. No SBOM generation in the release pipeline

release.yml publishes Docker images to GHCR but does not generate a Software Bill of Materials (SBOM).
Trivy can generate SBOMs as part of the release process, and GitHub's container registry can attach them to image manifests.
This is increasingly expected for security tooling consumed in enterprise environments.
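
As a sketch, an SBOM step using `anchore/sbom-action` (the action named in the recommendations table) might follow each image push. The image reference is a placeholder, and the output filename is an assumption:

```yaml
# Hypothetical step for release.yml, placed after an image has been pushed.
      - name: Generate SBOM for the agent image
        uses: anchore/sbom-action@v0
        with:
          # Placeholder reference; substitute the image actually published.
          image: 'ghcr.io/<owner>/<image>:<tag>'
          format: spdx-json
          output-file: agent-sbom.spdx.json
```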

13. `examples/github-copilot.sh` skipped in CI

The most realistic end-to-end example is not tested in CI because it requires a Copilot token.
A mock/offline variant of this test or a dedicated integration environment with a test token would catch CLI argument regressions in real-world usage patterns.

📋 Actionable Recommendations

#	Gap	Recommended Solution	Complexity	Impact
1	`cli.ts` / `docker-manager.ts` near-zero unit coverage	Add unit tests for `generateSquidConfig`, `generateDockerCompose`, domain normalization, and exit code logic; raise coverage threshold incrementally (target: 60%)	High	🔴 Critical
2	Container scan path-limited	Add `src/docker-manager.ts` to container scan trigger paths: `paths: [containers/**, src/docker-manager.ts]`	Low	🔴 High
3	No shellcheck	Add a `shellcheck` job to `lint.yml` targeting `containers/**/*.sh` and `scripts/**/*.sh`	Low	🔴 High
4	No Dockerfile linting	Add `hadolint` job to `lint.yml` or `build.yml` targeting both Dockerfiles	Low	🟡 Medium
5	No PR performance gate	Add a `performance-regression` job in integration tests that runs a subset of benchmarks (e.g., startup time only) with a failure threshold	Medium	🟡 Medium
6	Docs build failures silent	Remove `continue-on-error: true` from `docs-preview.yml` build step	Low	🟡 Medium
7	Smoke tests not automatic	Add path-based automatic triggering for smoke tests on `containers/` and `src/` changes, without requiring reaction	Low	🟡 Medium
8	No artifact size check	Add a step to `build.yml` that compares the size of `dist/` (e.g., from `du -sk dist/`) against a threshold and fails if the threshold is exceeded	Low	🟢 Low
9	Verify api-proxy-observability runs	Audit `test-integration-suite.yml` testPathPatterns to confirm all integration test files are covered	Low	🟡 Medium
10	Link check scope	Add `src/` and `containers/` to `link-check.yml` paths (with lychee ignoring non-URL content)	Low	🟢 Low
11	No OpenSSF Scorecard	Add `ossf/scorecard-action` workflow with weekly schedule and PR comment	Low	🟢 Low
12	No SBOM	Add `anchore/sbom-action` to `release.yml` after each Docker build	Low	🟢 Low

📈 Metrics Summary

Metric	Value
Total workflow files	42 (21 `.yml` standard + 21 `.md` agentic lock files)
Workflows triggering on PRs	16
Workflows triggering on schedule	10+
Recent workflow success rate (last 30 runs)	~90% (3 failures — all Secret Digger runs, 1 in-progress)
Unit test coverage — statements	38.39%
Unit test coverage — `cli.ts`	0%
Unit test coverage — `docker-manager.ts`	18% (4% function coverage)
Integration test files	30 test files across domain/network/security/chroot/api-proxy
Coverage threshold (enforced)	38% statements, 30% branches, 35% functions

Top 3 Actions for Maximum Impact

Raise unit test coverage on docker-manager.ts and cli.ts — the most important files have almost no unit tests; this is the single highest-value investment for PR quality.
Add shellcheck to the lint workflow: a low-complexity change that catches shell script bugs in the security-critical iptables setup scripts.
Expand container scan trigger paths to include src/docker-manager.ts — a trivial workflow change that closes a meaningful security gap.

AI generated by CI/CD Pipelines and Integration Tests Gap Assessment

expires on Mar 20, 2026, 10:22 PM UTC
