# PyGate

Python CI failures are noisy, tool-specific, and expensive to triage when Ruff, Pyright, and pytest all disagree about what matters first. PyGate turns those failures into one deterministic Python quality gate with bounded auto-repair and structured escalation evidence for humans or agents.
- "The PR failed, but I have to dig through Ruff, Pyright, and pytest separately to understand why."
- "We want fail-fast Python CI, not another lint dashboard."
- "Auto-fix should stop when it stops helping instead of thrashing the repo."
- "If repair cannot finish the job, I want a clean escalation artifact instead of a pile of logs."
```bash
pip install pygate-ci
pygate summarize --input demo/artifacts/failures.json
```

```json
{
  "brief_json_path": ".pygate/agent-brief.json",
  "brief_md_path": ".pygate/agent-brief.md",
  "status": "fail"
}
```

## When To Use It
Use PyGate when you want one deterministic Python CI gate that can normalize Ruff, Pyright, and pytest output, attempt bounded deterministic repair, and escalate with machine-readable evidence when it cannot finish safely.
## When Not To Use It
Do not use PyGate as a generic lint aggregator, a semantic code fixer, or a replacement for the underlying tools. It is a fail-fast gate-and-escalate wrapper around them.
- PyPI package: `pygate-ci`
- CLI command: `pygate`
- Quick Start
- What It Does
- Commands
- Artifacts
- Repair Loop
- Configuration
- GitHub Action
- Limitations
- Roadmap
- Contributing
- License
## Quick Start

```bash
# Prerequisites: ruff, pyright, and pytest must be available in your environment
pip install pygate-ci

# Run quality gates on changed files
echo "src/app.py" > changed.txt
pygate run --mode canary --changed-files changed.txt

# Generate agent brief from failures
pygate summarize --input .pygate/failures.json

# Attempt bounded repair
pygate repair --input .pygate/failures.json --max-attempts 3
```

Note: The PyPI package is `pygate-ci` but the CLI command is `pygate`.
## What It Does

PyGate runs deterministic quality gates on your Python project and produces structured, machine-readable artifacts designed for both humans and AI agents.
| Gate | Tool | Canary | Full |
|---|---|---|---|
| lint | ruff | yes | yes |
| typecheck | pyright | yes | yes |
| test | pytest | configurable | yes |
```
Changed Files ──> Run Gates ──> Findings? ──No──> Pass
                                    |
                                   Yes
                                    |
                                    v
                               Repair Loop ──> Improved? ──Fixed──> Pass
                                                   |
                                                   No
                                                   |
                                                   v
                                       Escalate with Evidence
```
- You tell PyGate which files changed (from your CI diff, PR, etc.)
- It runs lint, typecheck, and optionally tests
- Findings are normalized into a unified schema with severity, rule codes, and evidence (see the sketch below)
- The repair loop applies safe deterministic fixes (`ruff --fix` + `ruff format`)
- If it can't fix everything, it escalates with structured evidence explaining why
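For a sense of what "normalized" means, a single lint finding might look like the record below. The field names here are assumptions for illustration (the authoritative shapes live in `schemas/`); the point is that ruff, pyright, and pytest output all reduce to one shape:

```python
# Illustrative normalized finding. Field names are assumptions,
# not the authoritative schema; see schemas/ for the real shapes.
finding = {
    "gate": "lint",                 # lint | typecheck | test
    "tool": "ruff",
    "rule": "F401",                 # tool-native rule code
    "severity": "error",
    "file": "src/app.py",
    "line": 3,
    "evidence": "'os' imported but unused",
}
```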
## Commands

```bash
pygate run --mode canary|full --changed-files <path>
pygate summarize --input .pygate/failures.json
pygate repair --input .pygate/failures.json [--max-attempts N]
```

Exit codes: `0` = pass, `1` = fail (`run`), `2` = escalated (`repair`).
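Because the exit codes are stable, a thin wrapper can drive the whole run, repair, and summarize flow. A minimal sketch in Python, using only the commands and exit codes documented above (the orchestration itself is ours, not part of PyGate):

```python
import subprocess
import sys

def pygate(*args: str) -> int:
    """Invoke the pygate CLI and return its exit code."""
    return subprocess.run(["pygate", *args]).returncode

# 1. Run the canary gates on the changed files.
status = pygate("run", "--mode", "canary", "--changed-files", "changed.txt")
if status == 0:
    sys.exit(0)                 # all gates passed

# 2. Gates failed (exit code 1): attempt bounded repair.
status = pygate("repair", "--input", ".pygate/failures.json", "--max-attempts", "3")
if status == 2:
    # 3. Repair escalated: emit the agent brief for a human or agent.
    pygate("summarize", "--input", ".pygate/failures.json")
sys.exit(status)
```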
## Artifacts

All artifacts are written to `.pygate/`:

| File | Description |
|---|---|
| `failures.json` | Structured findings with severity, rule codes, and evidence |
| `run-metadata.json` | Gate execution traces (commands, stdout, stderr, durations) |
| `agent-brief.json` | Priority actions and retry policy for AI agents |
| `agent-brief.md` | Human-readable summary |
| `repair-report.json` | Repair attempt history (on success) |
| `escalation.json` | Escalation reason and evidence (on failure) |

JSON Schema files for all artifact types are available in `schemas/` for downstream validation and code generation. See `demo/artifacts/` for sample output.
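For example, an agent harness might read the brief and act on its priority actions. A sketch under assumed field names (`status` and `priority_actions` are guesses; validate against the schemas in `schemas/` before relying on them):

```python
import json
from pathlib import Path

# NOTE: field names below are illustrative assumptions, not the
# authoritative schema. Validate against schemas/ first.
brief = json.loads(Path(".pygate/agent-brief.json").read_text())

if brief.get("status") == "fail":
    for action in brief.get("priority_actions", []):
        print("next:", action)
```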
## Repair Loop

The `repair` command runs a bounded deterministic repair loop:

- Backup workspace
- Fix via `ruff check --fix` + `ruff format` on scoped files
- Re-run gates to measure improvement
- Decide: pass (done), worsened (rollback), no improvement (escalate)
Repair is bounded by the following defaults:

| Parameter | Default |
|---|---|
| Max attempts | 3 |
| Max patch lines | 150 |
| No-improvement abort | 2 consecutive |
| Time cap | 20 minutes |
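In code terms, the policy above amounts to a budgeted fix-measure-decide loop. The sketch below is a simplified illustration, not PyGate's implementation: it measures only ruff findings, and it omits the workspace backup, rollback on worsening, and the patch-line budget:

```python
import json
import subprocess
import time

MAX_ATTEMPTS = 3            # documented default
NO_IMPROVEMENT_ABORT = 2    # consecutive attempts without reduction
TIME_CAP_SECONDS = 1200     # 20-minute cap

def count_findings(paths):
    """Count remaining ruff findings; stands in for re-running all gates."""
    result = subprocess.run(
        ["ruff", "check", "--output-format", "json", *paths],
        capture_output=True, text=True,
    )
    return len(json.loads(result.stdout or "[]"))

def apply_fixes(paths):
    """The deterministic fix step: ruff check --fix + ruff format."""
    subprocess.run(["ruff", "check", "--fix", *paths])
    subprocess.run(["ruff", "format", *paths])

def bounded_repair(paths):
    baseline = count_findings(paths)
    stalled = 0
    deadline = time.monotonic() + TIME_CAP_SECONDS

    for _ in range(MAX_ATTEMPTS):
        if time.monotonic() > deadline:
            return "escalate"           # time cap exhausted
        apply_fixes(paths)
        remaining = count_findings(paths)
        if remaining == 0:
            return "pass"
        if remaining >= baseline:       # no finding reduction this attempt
            stalled += 1
            if stalled >= NO_IMPROVEMENT_ABORT:
                return "escalate"       # NO_IMPROVEMENT
        else:
            baseline, stalled = remaining, 0
    return "escalate"                   # UNKNOWN_BLOCKER: attempts exhausted
```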
When the loop cannot finish, `escalation.json` records one of these codes:

| Code | Meaning |
|---|---|
| `NO_IMPROVEMENT` | 2+ consecutive attempts with no finding reduction |
| `PATCH_BUDGET_EXCEEDED` | Edit exceeded line budget |
| `UNKNOWN_BLOCKER` | Max attempts exhausted |
| `UNRESOLVED_DETERMINISTIC_FAILURES` | Deterministic failures remain after repair |
| `ARCHITECTURAL_CHANGE_REQUIRED` | Structural issues beyond repair scope (reserved) |
| `FLAKY_EVALUATOR` | Gate produces inconsistent results (reserved) |
| `ENVIRONMENT_DRIFT` | Python version or dependency mismatch (reserved) |
| `TEST_FIXTURE_OR_EXTERNAL_DEP` | Tests depend on network, DB, or time (reserved) |
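Because the codes are stable strings, a CI wrapper can branch on them instead of re-parsing logs. A hedged sketch that assumes `escalation.json` exposes the code at a top-level `"code"` key (check `schemas/` for the real field name):

```python
import json
from pathlib import Path

# Assumption: the escalation code lives at a top-level "code" key.
escalation = json.loads(Path(".pygate/escalation.json").read_text())
code = escalation.get("code")

if code in {"FLAKY_EVALUATOR", "ENVIRONMENT_DRIFT"}:
    print("infra problem: notify the CI owners, do not retry blindly")
elif code == "TEST_FIXTURE_OR_EXTERNAL_DEP":
    print("tests need isolation work: route to the owning team")
else:
    print("hand the agent brief to a human or agent:", code)
```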
## Configuration

Configure via `pygate.toml` (standalone) or `[tool.pygate]` in `pyproject.toml`.

`pygate.toml`:

```toml
[policy]
max_attempts = 3
max_patch_lines = 150
abort_on_no_improvement = 2
time_cap_seconds = 1200

[commands]
lint = "ruff check --output-format json ."
typecheck = "pyright --outputjson ."
test = "pytest --json-report --json-report-file=.pygate/pytest-report.json -q"

[gates]
test_in_canary = false
```

Or in `pyproject.toml`:
```toml
[tool.pygate.policy]
max_attempts = 3

[tool.pygate.commands]
lint = "ruff check --output-format json ."

[tool.pygate.gates]
test_in_canary = false
```

## GitHub Action

PyGate ships with a composite GitHub Action for CI integration:
```yaml
- uses: actions/checkout@v4
- uses: hermes-labs-ai/quick-gate-python/.github/actions/pygate@main
  with:
    mode: canary            # or "full"
    repair: "true"          # attempt auto-repair on failures
    max-attempts: 3
    python-version: "3.12"
```

The action detects changed files from the PR, runs gates, optionally repairs, and uploads `.pygate/` artifacts. The post-comment feature requires `pull-requests: write` permission in your workflow.
## Limitations

- **Deterministic repair only (v1)**: The repair loop uses `ruff --fix` and `ruff format`. It cannot fix type errors, failing tests, or issues requiring semantic understanding.
- **No incremental analysis**: All specified gates run on every invocation. There is no caching or incremental mode.
- **Tool availability**: PyGate requires ruff, pyright, and pytest to be installed in the target environment. It does not install them.
- **Single-repo scope**: Designed for single Python projects, not monorepos with multiple packages.
## Roadmap

- Model-assisted repair (LLM-powered fixes for type errors and test failures)
- Coverage gate (fail on coverage drops)
- Security gate (bandit / safety integration)
- Incremental mode (only re-run gates on changed files)
- PyPI trusted publishing via GitHub Actions
- Plugin system for custom gates
## Contributing

See CONTRIBUTING.md for development setup and guidelines.
## About Hermes Labs

Hermes Labs builds AI audit infrastructure for enterprise AI systems — EU AI Act readiness, ISO 42001 evidence bundles, continuous compliance monitoring, agent-level risk testing. We work with teams shipping AI into regulated environments.
Our OSS philosophy — read this if you're deciding whether to depend on us:
- Everything we release is free, forever. MIT or Apache-2.0. No "open core," no SaaS tier upsell, no paid version with the features you actually need. You can run this repo commercially without talking to us.
- We open-source our own infrastructure. The tools we release are what Hermes Labs uses internally — we don't publish demo code, we publish production code.
- We sell audit work, not licenses. If you want an ANNEX-IV pack, an ISO 42001 evidence bundle, gap analysis against the EU AI Act, or agent-level red-teaming delivered as a report, that's at hermes-labs.ai. If you just want the code to run it yourself, it's right here.
The Hermes Labs OSS audit stack (public, production-grade, no SaaS):
**Static audit (before deployment)**
- **lintlang** — Static linter for AI agent configs, tool descriptions, system prompts. `pip install lintlang`
- **rule-audit** — Static prompt audit — contradictions, coverage gaps, priority ambiguities
- **scaffold-lint** — Scaffold budget + technique stacking. `pip install scaffold-lint`
- **intent-verify** — Repo intent verification + spec-drift checks
**Runtime observability (while the agent runs)**

- **little-canary** — Prompt injection detection via sacrificial canary-model probes
- **suy-sideguy** — Runtime policy guard — user-space enforcement + forensic reports
- **colony-probe** — Prompt confidentiality audit — detects system-prompt reconstruction
**Regression & scoring (to prove what changed)**

- **hermes-jailbench** — Jailbreak regression benchmark. `pip install hermes-jailbench`
- **agent-convergence-scorer** — Score how similar N agent outputs are. `pip install agent-convergence-scorer`
**Supporting infra**
