The repository has a well-structured and mature CI/CD pipeline with 40+ workflow files covering build verification, testing, security scanning, and agentic quality checks. Recent workflow runs show high success rates: most workflows are passing at 100%, with only the currently-running assessment workflow pending.
🔍 Identified Gaps
🔴 High Priority
1. Critically low unit test coverage on core components
cli.ts has 0% unit test coverage and docker-manager.ts has only 18% (4% function coverage).
These two files contain the majority of the business logic: CLI argument handling, container orchestration, exit code propagation, cleanup lifecycle, and log streaming.
The overall coverage threshold is set at 38% statements — extremely low for a security-critical firewall that controls network egress for AI agents.
A regression in the cleanup lifecycle, domain normalization, or iptables rule generation would not be caught by unit tests.
2. Container security scan is path-limited — misses source-driven changes
container-scan.yml only triggers on containers/** path changes on PRs.
Changes to src/docker-manager.ts (which controls container configuration, capabilities, seccomp profiles, and volume mounts) do not trigger a Trivy scan.
A security regression in how containers are launched (e.g., dropping the seccomp profile, re-enabling NET_ADMIN) would not trigger a container scan.
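As a rough illustration, the trigger change could look like the fragment below. Only the two path entries come from this assessment; the surrounding `on.pull_request` layout is an assumption about how container-scan.yml is currently written.

```yaml
# Sketch of the pull_request trigger for container-scan.yml.
# The two path entries are taken from the recommendation; the rest is assumed.
on:
  pull_request:
    paths:
      - 'containers/**'
      - 'src/docker-manager.ts'
```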
3. No shell script linting (shellcheck)
Critical shell scripts with no static analysis: containers/agent/setup-iptables.sh, containers/agent/entrypoint.sh, containers/squid/entrypoint.sh, and scripts/ci/cleanup.sh.
setup-iptables.sh configures the iptables rules that enforce network isolation. A bug (e.g., wrong flag, wrong chain) could silently break the firewall.
shellcheck catches quoting bugs, undefined variables, and unsafe patterns before runtime.
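A minimal sketch of such a lint job follows. The job name is hypothetical, and the fragment assumes the `shellcheck` binary is already available on the runner image; if it is not, an install step would be needed first.

```yaml
# Hypothetical job for lint.yml; assumes shellcheck is preinstalled on the runner.
jobs:
  shellcheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Lint shell scripts
        # find + xargs avoids relying on bash globstar for the ** patterns
        run: find containers scripts -name '*.sh' -print0 | xargs -0 shellcheck
```

Using `find` keeps the job independent of which scripts exist today, so newly added `.sh` files under containers/ or scripts/ are linted without editing the workflow.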
4. No Dockerfile linting (hadolint)
Neither containers/agent/Dockerfile nor containers/squid/Dockerfile is linted.
hadolint catches security anti-patterns (running as root, unpinned base images, poor layer ordering) that Trivy does not detect.
The agent container Dockerfile installs packages with apt-get — version pinning and layer security are not validated.
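One possible shape for the lint job is sketched below, using the hadolint/hadolint-action GitHub Action and a matrix over the two Dockerfiles named above. The job name and the pinned action version are assumptions, not part of the existing pipeline.

```yaml
# Hypothetical job; the action version shown is an assumption and should be pinned as the project prefers.
jobs:
  hadolint:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        dockerfile:
          - containers/agent/Dockerfile
          - containers/squid/Dockerfile
    steps:
      - uses: actions/checkout@v4
      - uses: hadolint/hadolint-action@v3.1.0
        with:
          dockerfile: ${{ matrix.dockerfile }}
```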
🟡 Medium Priority
5. Performance benchmarks not enforced on PRs
performance-monitor.yml runs weekly only and does not trigger on pull requests.
The benchmark covers container startup time, proxy setup latency, and command execution overhead — metrics users care about directly.
A change that increases startup time from 8s to 30s (e.g., slow healthcheck polling) would not be caught before merge.
Note: The workflow creates GitHub issues on regression detection when run on a schedule, but this is reactive, not proactive.
6. Documentation build failures don't block PRs
docs-preview.yml has continue-on-error: true on the build step.
A broken Astro/Starlight documentation build (e.g., invalid MDX, broken imports) does not fail the PR.
Since docs changes are common (the project has active documentation), silent doc build failures accumulate.
7. Smoke tests are reaction-gated, not automatic
The smoke tests for Claude (smoke-claude.md), Codex (smoke-codex.md), and Copilot (smoke-copilot.md) are triggered by emoji reactions, not automatically on every PR.
While this is intentional to conserve resources, there is no clear policy for which PRs require smoke tests before merge.
High-impact changes (e.g., changes to containers/, src/docker-manager.ts, src/squid-config.ts) could merge without an end-to-end smoke test.
8. No artifact size monitoring
The compiled dist/ output and Docker image sizes are not tracked.
Accidental inclusion of large files (e.g., fixture data, bundled node_modules) in dist/ would not be caught.
Docker image size regressions affect pull times for all users of the GHCR images.
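A size gate for dist/ could be as small as the step below. The 5120 KiB budget is a made-up placeholder, and the step assumes it runs after the build has produced dist/. It uses `du -sk` rather than `du -sh` so the result is a plain number that can be compared.

```yaml
# Hypothetical step for build.yml; the budget value is a placeholder.
- name: Check dist/ size
  run: |
    max_kb=5120
    # du -sk prints "<size-in-KiB><TAB>dist"; keep only the number
    actual_kb=$(du -sk dist | cut -f1)
    echo "dist/ is ${actual_kb} KiB (limit ${max_kb} KiB)"
    if [ "$actual_kb" -gt "$max_kb" ]; then
      echo "::error::dist/ exceeds the size budget"
      exit 1
    fi
```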
9. Integration tests may not run the api-proxy-observability and api-proxy-rate-limit tests in CI
test-integration-suite.yml runs --testPathPatterns="api-proxy", which should match every test file whose path contains api-proxy.
However, api-proxy-observability.test.ts and api-proxy-rate-limit.test.ts are in the integration folder but do not appear in the pattern list documented in docs/INTEGRATION-TESTS.md. It should be verified that these tests actually run in the test-api-proxy job.
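One way to check this without running the tests is to ask Jest which files the pattern selects. The step below is a sketch: it assumes the job invokes Jest through `npx` with the same configuration the integration suite uses, which may differ in the real workflow.

```yaml
# Hypothetical diagnostic step; prints the test files the pattern selects, then exits.
- name: List tests matched by the api-proxy pattern
  run: npx jest --listTests --testPathPatterns="api-proxy"
```

If api-proxy-observability.test.ts and api-proxy-rate-limit.test.ts appear in the output, the pattern covers them and only the documentation list is out of date.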
🟢 Low Priority
10. Link check doesn't run when code files change
link-check.yml only triggers on **/*.md path changes.
Source code changes that remove or rename CLI flags, configuration options, or environment variables referenced in documentation can leave documentation links dangling, but such changes do not trigger a link check.
Adding a push trigger or broadening the path filter would catch cross-cutting stale references.
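A broadened path filter might look like this. The first entry is the existing filter described above; the other two are the paths suggested in the recommendations, and the `on.pull_request` layout is assumed.

```yaml
# Sketch of a wider trigger for link-check.yml; layout is an assumption.
on:
  pull_request:
    paths:
      - '**/*.md'
      - 'src/**'
      - 'containers/**'
```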
11. No OpenSSF Scorecard
OpenSSF Scorecard is a natural fit for a security-focused, GitHub-owned project.
It would automatically assess branch protection, token permissions, dependency update practices, CI/CD security, and pinned actions.
Results can be published as a badge and monitored for regression.
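The workflow below is a sketch modeled on the commonly used ossf/scorecard-action setup. The schedule, branch name, and pinned versions are assumptions and would need to match the repository's conventions.

```yaml
# Hypothetical scorecard workflow; cron schedule and action versions are assumptions.
name: OpenSSF Scorecard
on:
  schedule:
    - cron: '0 6 * * 1'   # weekly, Monday 06:00 UTC
  push:
    branches: [main]
permissions: read-all
jobs:
  analysis:
    runs-on: ubuntu-latest
    permissions:
      security-events: write   # needed to upload SARIF results
      id-token: write          # needed to publish results
    steps:
      - uses: actions/checkout@v4
        with:
          persist-credentials: false
      - uses: ossf/scorecard-action@v2.4.0
        with:
          results_file: results.sarif
          results_format: sarif
          publish_results: true
      - uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: results.sarif
```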
12. No SBOM generation in the release pipeline
release.yml publishes Docker images to GHCR but does not generate a Software Bill of Materials (SBOM).
Trivy can generate SBOMs as part of the release process, and GitHub's container registry can attach them to image manifests.
This is increasingly expected for security tooling consumed in enterprise environments.
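As one option, an SBOM step using anchore/sbom-action could follow each image build. The image reference below is a placeholder, since the actual GHCR image names are not given here, and the output file name is arbitrary.

```yaml
# Hypothetical step for release.yml; the image reference is a placeholder.
- name: Generate SBOM for the published image
  uses: anchore/sbom-action@v0
  with:
    image: ghcr.io/OWNER/IMAGE:TAG
    format: spdx-json
    output-file: sbom.spdx.json
```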
13. examples/github-copilot.sh skipped in CI
The most realistic end-to-end example is not tested in CI because it requires a Copilot token.
A mock/offline variant of this test or a dedicated integration environment with a test token would catch CLI argument regressions in real-world usage patterns.
📋 Actionable Recommendations
| # | Gap | Recommended Solution | Complexity | Impact |
|---|-----|----------------------|------------|--------|
| 1 | cli.ts / docker-manager.ts near-zero unit coverage | Add unit tests for generateSquidConfig, generateDockerCompose, domain normalization, and exit code logic; raise coverage threshold incrementally (target: 60%) | High | 🔴 Critical |
| 2 | Container scan path-limited | Add src/docker-manager.ts to container scan trigger paths: paths: [containers/**, src/docker-manager.ts] | Low | 🔴 High |
| 3 | No shellcheck | Add a shellcheck job to lint.yml targeting containers/**/*.sh and scripts/**/*.sh | Low | 🔴 High |
| 4 | No Dockerfile linting | Add hadolint job to lint.yml or build.yml targeting both Dockerfiles | Low | 🟡 Medium |
| 5 | No PR performance gate | Add a performance-regression job in integration tests that runs a subset of benchmarks (e.g., startup time only) with a failure threshold | Medium | 🟡 Medium |
| 6 | Docs build failures silent | Remove continue-on-error: true from docs-preview.yml build step | Low | 🟡 Medium |
| 7 | Smoke tests not automatic | Add path-based automatic triggering for smoke tests on containers/** and src/** changes, without requiring reaction | Low | 🟡 Medium |
| 8 | No artifact size check | Add a step to build.yml that checks du -sh dist/ against a threshold and fails if it exceeds it | Low | 🟢 Low |
| 9 | Verify api-proxy-observability runs | Audit test-integration-suite.yml testPathPatterns to confirm all integration test files are covered | Low | 🟡 Medium |
| 10 | Link check scope | Add src/** and containers/** to link-check.yml paths (with lychee ignoring non-URL content) | Low | 🟢 Low |
| 11 | No OpenSSF Scorecard | Add ossf/scorecard-action workflow with weekly schedule and PR comment | Low | 🟢 Low |
| 12 | No SBOM | Add anchore/sbom-action to release.yml after each Docker build | Low | 🟢 Low |
📈 Metrics Summary
| Metric | Value |
|--------|-------|
| Total workflow files | 40+ (21 .yml standard + 21 .md agentic lock files) |
| Workflows triggering on PRs | 16 |
| Workflows triggering on schedule | 10+ |
| Recent workflow success rate (last 30 runs) | ~90% (3 failures, all Secret Digger runs; 1 in progress) |
| Unit test coverage: statements | 38.39% |
| Unit test coverage: cli.ts | 0% |
| Unit test coverage: docker-manager.ts | 18% (4% function coverage) |
| Integration test files | 30 test files across domain/network/security/chroot/api-proxy |
| Coverage threshold (enforced) | 38% statements, 30% branches, 35% functions |
Top 3 Actions for Maximum Impact
1. Raise unit test coverage on docker-manager.ts and cli.ts. The most important files have almost no unit tests; this is the single highest-value investment for PR quality.
2. Add shellcheck to the lint workflow. This is a small change that catches shell script bugs in the security-critical iptables setup scripts.
3. Expand container scan trigger paths to include src/docker-manager.ts. This is a trivial workflow change that closes a meaningful security gap.
📊 Current CI/CD Pipeline Status
The pipeline is layered across three tiers, including:
Standard workflows (.yml files): deterministic, automated quality gates
Agentic workflows (.md lock files compiled with gh-aw): AI-driven reviews and specialized checks
✅ Existing Quality Gates
The following checks run on pull requests to main:
build.yml
lint.yml (tsc --noEmit)
test-integration.yml
test-coverage.yml
codeql.yml
dependency-audit.yml
pr-title.yml
container-scan.yml (containers/** path changes)
test-integration-suite.yml
test-chroot.yml
test-examples.yml
test-action.yml
link-check.yml (**/*.md path changes)
docs-preview.yml (docs-site/** path changes)
security-guard.md
build-test.md
Scheduled / non-PR checks: CodeQL (weekly), Trivy (weekly), dependency audit (weekly), performance benchmarks (weekly), secret digger (hourly), link check (weekly), doc maintainer (daily).