
ZENITH ARCHITECTURE GUIDE

Your study guide for the pitch. Read this when a judge walks over.
Last updated: Session 1 — Core Pipeline scaffolding complete.


The Big Picture

Zenith is a 4-module sequential pipeline. A piece of code or a prompt string enters at Module 1 and a full security report exits at Module 4. Each module hands off a single ZenithPayload object to the next.

User Input (file or prompt)
        │
        ▼
┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│  MODULE 1        │────▶│  MODULE 2        │────▶│  MODULE 3        │────▶│  MODULE 4        │
│  Ingress         │     │  SAST + CVE      │     │  Red/Blue Clash  │     │  Verify + Score  │
│  dev1_ingress/   │     │  dev2_sast/      │     │  dev3_clash/     │     │  dev4_verify/    │
│  ingress.py      │     │  sast_runner.py  │     │  clash_runner.py │     │  verify_runner.py│
└──────────────────┘     └──────────────────┘     └──────────────────┘     └──────────────────┘
                                                                                     │
                                                                                     ▼
                                                                          results/dashboard.html
                                                                          results/report.json

The key insight for judges: The pipeline is self-healing. It doesn't just detect threats — it fights them (Module 3) and then mathematically proves the fix worked (Module 4). No other tool in this space closes that loop automatically.


The Data Contract: shared/payload.py

What: A Python @dataclass called ZenithPayload.
Why this way: Instead of passing 15 loose variables between modules, everyone agrees on one shared object. This is the standard engineering pattern for microservice pipelines — equivalent to a Protobuf message or a TypedDict.
Pitch line: "Our data contract prevents the #1 source of hackathon bugs: one dev names a field injections, another calls it injection_list, and they never talk to each other."

Key fields to know for the pitch

| Field | Set by | Meaning |
| --- | --- | --- |
| `input_type` | Module 1 | `"code_file"` / `"prompt_string"` / `"dependency_manifest"` |
| `injections_detected` | Module 1 | List of prompt-injection pattern labels found |
| `threat_level` | Modules 1→2 | LOW / MEDIUM / HIGH / CRITICAL — escalates, never downgrades |
| `sast_findings` | Module 2 | Semgrep rule hits with line numbers |
| `cve_findings` | Module 2 | Known CVEs from OSV.dev for detected dependencies |
| `red_team_attacks` | Module 3 | Claude's exploit payloads (one per round) |
| `blue_team_patches` | Module 3 | GPT-4o's defensive patches (one per round) |
| `clash_verdict` | Module 3 | `"PATCHED"` / `"UNRESOLVED"` / `"SKIPPED"` |
| `confidence` | Module 4 | 0–1 score: how sure we are the patch works |
| `robustness` | Module 4 | 0–1 score: how hard it was to patch |
| `integrity` | Module 4 | 0–1 score: no regressions introduced |
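The contract described above can be sketched as a dataclass. This is a minimal reconstruction from the field table — the defaults, type annotations, and the extra bookkeeping fields (`raw_input`, `file_path`, `clash_rounds`, `patch_status`, `test_results`, all mentioned elsewhere in this guide) are assumptions, not the actual `shared/payload.py`:

```python
from dataclasses import dataclass, field

@dataclass
class ZenithPayload:
    # Raw input carried through the pipeline; Module 3 replaces it with the final patch
    raw_input: str = ""
    file_path: str = ""
    # Set by Module 1 (Ingress)
    input_type: str = "code_file"            # "code_file" / "prompt_string" / "dependency_manifest"
    injections_detected: list = field(default_factory=list)
    threat_level: str = "LOW"                # LOW / MEDIUM / HIGH / CRITICAL
    # Set by Module 2 (SAST + CVE)
    sast_findings: list = field(default_factory=list)
    cve_findings: list = field(default_factory=list)
    # Set by Module 3 (Red/Blue Clash)
    red_team_attacks: list = field(default_factory=list)
    blue_team_patches: list = field(default_factory=list)
    clash_rounds: int = 0
    clash_verdict: str = "SKIPPED"           # "PATCHED" / "UNRESOLVED" / "SKIPPED"
    patch_status: str = ""
    # Set by Module 4 (Verify + Score)
    test_results: dict = field(default_factory=dict)
    confidence: float = 0.0
    robustness: float = 0.0
    integrity: float = 0.0
```

Because every module receives and returns this one object, a misspelled field fails loudly at construction time instead of silently vanishing between modules.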

Module 1 — src/dev1_ingress/ingress.py

What it does: Accepts input, scans for prompt injection patterns, classifies the input type, sets initial threat level.
Why built this way: Prompt injection is caught at the gate — before any AI ever sees the malicious string. Defence-in-depth: the attacker never gets through the door.
Connects to: Populates the first three fields of ZenithPayload, then hands off to Module 2.

Module 1 Key functions

| Function | What it does | Judge talking point |
| --- | --- | --- |
| `detect_injections(text)` | Runs 12 regex patterns against the input | "We maintain a growing bank of known jailbreak patterns — DAN mode, system prompt overrides, INST-tag injection — all caught before the AI sees it." |
| `triage_input_type(raw, path)` | Classifies as code/prompt/manifest | "Different inputs need different analysis — you can't run Semgrep on a plain-English prompt." |
| `assess_threat_level(injections)` | Maps injection count to LOW/MEDIUM/HIGH/CRITICAL | "We boost severity for architecturally dangerous patterns like XML system-tag injection." |
| `run_ingress(raw, path)` | The public API, called by `core_cli.py` | Creates and returns the `ZenithPayload` skeleton. |
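A minimal sketch of the detection and triage step. The pattern bank here is a three-entry stand-in for the real 12 patterns, and the count-to-level thresholds are assumptions chosen for illustration:

```python
import re

# Hypothetical subset of the pattern bank; labels and regexes are illustrative.
INJECTION_PATTERNS = {
    "dan_mode": re.compile(r"\bDAN\s+mode\b", re.IGNORECASE),
    "system_override": re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE),
    "inst_tag": re.compile(r"\[/?INST\]", re.IGNORECASE),
}

def detect_injections(text):
    """Return the label of every known injection pattern found in the input."""
    return [label for label, pat in INJECTION_PATTERNS.items() if pat.search(text)]

def assess_threat_level(injections):
    """Map injection count to a threat level (thresholds are assumptions)."""
    if not injections:
        return "LOW"
    if len(injections) == 1:
        return "MEDIUM"
    if len(injections) == 2:
        return "HIGH"
    return "CRITICAL"
```

The key property is that this runs on the raw string before any model call, so a jailbreak never reaches an LLM unflagged.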

Module 2 — src/dev2_sast/sast_runner.py

What it does: Runs Semgrep (a real security scanner used by Shopify, Dropbox, etc.) on code files. Queries osv.dev (Google's open vulnerability database) for CVEs in dependencies.
Why built this way: Regex won't catch SQL injection in complex AST patterns — you need a purpose-built AST scanner. Semgrep is the industry standard, and it's free and local.
Connects to: Reads input_type, raw_input, file_path → Writes sast_findings, cve_findings, escalates threat_level.

Module 2 Key functions

| Function | What it does | Judge talking point |
| --- | --- | --- |
| `run_semgrep(file_path)` | Shells out to Semgrep CLI, parses JSON output | "We use the same tool Shopify uses in production. Auto rule detection with `--config auto`." |
| `query_osv_for_package(name, ver)` | Hits api.osv.dev — the same DB GitHub Dependabot uses | "The OSV database is free, has zero rate limits, and covers PyPI, npm, Go, Maven, and more." |
| `query_nvd_for_package(name, ver)` | Graceful fallback/hybrid query to NIST NVD API | "We scan the National Vulnerability Database dynamically. If it rate limits, we handle it gracefully so the demo never crashes." |
| `escalate_threat_level(current, sast, cves)` | Threat level only goes UP, never down | "A HIGH CVE plus three SAST findings = CRITICAL. The score is conservative by design." |
| `run_sast(payload)` | The public API. Orchestrates the Semgrep + CVE scan and returns the updated payload. | |
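The CVE lookup and the ratchet rule can be sketched as below. The OSV v1 `/query` endpoint and its request shape are public API; the escalation thresholds are assumptions extrapolated from the "HIGH CVE plus three SAST findings = CRITICAL" talking point, not the actual `sast_runner.py` logic:

```python
import json
import urllib.request

OSV_API = "https://api.osv.dev/v1/query"

def build_osv_query(name, version, ecosystem="PyPI"):
    """Build the JSON body the OSV v1 /query endpoint expects."""
    return {"package": {"name": name, "ecosystem": ecosystem}, "version": version}

def query_osv_for_package(name, version, ecosystem="PyPI"):
    """POST to api.osv.dev; return a list of vulns, or a warning dict on any failure."""
    body = json.dumps(build_osv_query(name, version, ecosystem)).encode()
    req = urllib.request.Request(OSV_API, data=body,
                                 headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return json.load(resp).get("vulns", [])
    except Exception as exc:  # graceful degradation: the demo must never crash
        return {"warning": f"OSV API error: {exc}"}

def escalate_threat_level(current, sast_findings, cve_findings):
    """Ratchet: the returned level is never lower than `current`."""
    levels = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]
    proposed = current
    if cve_findings and len(sast_findings) >= 3:
        proposed = "CRITICAL"
    elif cve_findings or sast_findings:
        proposed = "HIGH"
    return levels[max(levels.index(current), levels.index(proposed))]
```

Taking the max over the level indices is what makes the ratchet one-way: no later module can talk the threat level back down.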

Module 3 — src/dev3_clash/ (3-file split)

Files: red_team.py (attacker) + blue_team.py (defender) + clash_runner.py (orchestrator)
Why split: Each AI agent is its own class. Swapping Claude for Gemini means editing one file — the orchestrator never changes. Clean separation of concerns.
Connects to: Reads sast_findings, raw_input → Writes red_team_attacks, blue_team_patches, clash_rounds, clash_verdict, patch_status. After the loop, payload.raw_input is replaced with the final patch so Module 4 verifies the fixed code.

red_team.py — RedTeamAgent (Claude 3.5 Sonnet)

| Method | What it does | Judge talking point |
| --- | --- | --- |
| `__init__()` | Checks for `ANTHROPIC_API_KEY`; prints a notice and uses mock if absent | "The demo never crashes — no API key means mock mode, identical outputs." |
| `attack(code, finding)` | Sends vulnerable code + SAST finding to Claude; gets a ≤15-line exploit back | "We give Claude the exact rule that triggered — it knows what to probe, not just the raw code." |
| Mock fallback | Pre-written `OR 1=1` SQL exploit | "Judges see a real exploit payload even in offline mode." |

Win condition: Claude returns the literal string EXPLOIT_IMPOSSIBLE — that sentinel, not a heuristic, is what ends the loop. No fuzzy matching.

blue_team.py — BlueTeamAgent (GPT-4o)

| Method | What it does | Judge talking point |
| --- | --- | --- |
| `__init__()` | Checks for `OPENAI_API_KEY`; uses mock if absent | "Same graceful degradation pattern as the Red Team." |
| `patch(code, attack, finding)` | Sends vulnerable code + exploit script to GPT-4o; gets a secure rewrite | "GPT-4o sees the actual exploit — it patches the real attack surface, not a theoretical one." |
| Mock fallback | Pre-written parameterised query (`WHERE id = ?`) | "Judges see a concrete before/after even in offline mode." |

System prompt discipline: GPT-4o is told "no markdown code blocks" — output is directly runnable Python, zero stripping needed.

clash_runner.py — Orchestrator

| Function | What it does |
| --- | --- |
| `run_clash(payload)` | The public API. Runs the round loop, writes all clash fields back to the payload. |

The Clash Loop (explain to judges)

Round 1: Claude attacks raw_input.
          EXPLOIT_IMPOSSIBLE? → verdict = PATCHED, stop.
          Else: GPT-4o patches it → current_code = patch.

Round 2: Claude attacks the PATCH (not the original).
          Same logic — stop on EXPLOIT_IMPOSSIBLE.

Round 3: Claude attacks the hardened patch.
          Still exploitable? → verdict = UNRESOLVED.

After loop: payload.raw_input = final patch.
            Module 4 verifies THIS code, not the original.
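The loop above can be sketched as follows. The `EXPLOIT_IMPOSSIBLE` sentinel, the 3-round cap, and the payload fields come from this guide; the for/else structure, the agent method signatures, and whether the defender still patches in the final round are assumptions:

```python
EXPLOIT_IMPOSSIBLE = "EXPLOIT_IMPOSSIBLE"  # exact sentinel string from the Red Team
MAX_ROUNDS = 3

def run_clash(payload, red_team, blue_team, finding):
    """Alternate attack/patch for up to three rounds; only the attacker ends the loop."""
    current_code = payload.raw_input
    for round_no in range(1, MAX_ROUNDS + 1):
        attack = red_team.attack(current_code, finding)
        if attack.strip() == EXPLOIT_IMPOSSIBLE:   # exact match, no fuzzy heuristics
            payload.clash_verdict = "PATCHED"
            break
        payload.red_team_attacks.append(attack)
        current_code = blue_team.patch(current_code, attack, finding)
        payload.blue_team_patches.append(current_code)
    else:  # three rounds elapsed without the attacker conceding
        payload.clash_verdict = "UNRESOLVED"
    payload.clash_rounds = round_no
    payload.raw_input = current_code               # Module 4 verifies the patched code
    return payload
```

Note that each round attacks `current_code`, which after round 1 is the latest patch, so the Red Team always probes the most hardened version.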

Pitch line: "The verdict is driven by the attacker. Only Claude saying 'I cannot exploit this anymore' counts as a win. That's adversarial validation — not AI telling itself it fixed the problem."


Module 4 — src/dev4_verify/verify_runner.py

What it does: Auto-generates a pytest test suite from the pipeline findings, runs it, calculates three scores (CRI), writes report.json, generates dashboard.html, and opens it in the browser.
Why built this way: "The AI says it's fixed" is not good enough for a production pipeline. We need proof. Dynamic test generation means no dev writes manual tests during a 28-hour sprint.
Connects to: Reads ALL payload fields → Writes test_results, confidence, robustness, integrity. Produces the two output files.

The CRI Scoring System (memorise this for judges)

Confidence  = (tests_passed / tests_total) × clash_bonus
              ↑ How certain are we the patch is correct?

Robustness  = 1.0 - (clash_rounds / 3)
              ↑ Did we need 1 round or 3? Fewer = more robust.

Integrity   = 1.0 - (failed_tests / total_tests)
              ↑ Did the patch introduce regressions?
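The three formulas above translate directly into code. This is a sketch: the function name mirrors `calculate_scores` from the key-functions table, but its exact signature and the meaning of `clash_bonus` (here an assumed multiplier passed by the caller) are guesses:

```python
def calculate_scores(tests_passed, tests_total, clash_rounds, clash_bonus=1.0):
    """Return (confidence, robustness, integrity) per the CRI formulas."""
    if tests_total == 0:                 # no tests could be generated: score nothing
        return 0.0, 1.0 - (clash_rounds / 3), 0.0
    failed = tests_total - tests_passed
    confidence = (tests_passed / tests_total) * clash_bonus
    robustness = 1.0 - (clash_rounds / 3)    # fewer rounds needed = more robust
    integrity = 1.0 - (failed / tests_total)  # regressions drag this below 1.0
    return confidence, robustness, integrity
```

Worked example: 10 of 10 generated tests pass after a 1-round clash with `clash_bonus=1.0`, giving Confidence 1.0, Robustness 2/3, Integrity 1.0.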

Pitch line: "A system that scores 95% Confidence, 100% Robustness, 100% Integrity patched the vulnerability in round one with zero regressions. That's the number judges see on the dashboard."

Module 4 Key functions

| Function | What it does |
| --- | --- |
| `generate_pytest_module(payload)` | Writes a `.py` test file dynamically from findings |
| `run_pytest(test_source)` | Runs pytest in a subprocess, parses the pass/fail count |
| `calculate_scores(payload, results)` | Returns the (confidence, robustness, integrity) tuple |
| `write_report(payload)` | Serialises the payload to `results/report.json` |
| `write_dashboard(payload)` | Generates the dark-mode HTML dashboard |
| `run_verify(payload)` | The public API. Chains all of the above. |

core_cli.py — The Conductor

What it does: One CLI entry point that chains all 4 modules. Usage: python core_cli.py demo_vuln.py
Why built this way: Lazy imports with try/except mean the pipeline never crashes if a module file is absent. Each dev can work in isolation on their branch without breaking the demo.
Key rule: core_cli.py is the ONLY file on main branch that imports other modules. It imports lazily, not at the top level.
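The lazy-import pattern can be sketched like this. The module paths mirror the repo layout described above, but the registry, the `load_stage` helper, and the skip-notice wording are illustrative, not the actual `core_cli.py`:

```python
import importlib

# Hypothetical stage registry; paths follow the layout in the pipeline diagram.
MODULES = [
    ("MODULE 1", "src.dev1_ingress.ingress", "run_ingress"),
    ("MODULE 2", "src.dev2_sast.sast_runner", "run_sast"),
    ("MODULE 3", "src.dev3_clash.clash_runner", "run_clash"),
    ("MODULE 4", "src.dev4_verify.verify_runner", "run_verify"),
]

def load_stage(label, module_path, func_name):
    """Import a stage only when needed; on failure print a SKIP notice, return None."""
    try:
        return getattr(importlib.import_module(module_path), func_name)
    except Exception:
        print(f"SKIP [{label}] — {module_path} not available")
        return None

def run_pipeline(payload):
    """Chain whichever stages exist; a missing module never crashes the demo."""
    for label, module_path, func_name in MODULES:
        stage = load_stage(label, module_path, func_name)
        if stage is not None:
            payload = stage(payload)  # each stage returns the updated payload
    return payload
```

Because imports happen inside the loop rather than at the top of the file, a dev deleting or breaking their module on a branch degrades the pipeline instead of killing it at startup.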

Demo commands

# Run the full pipeline on a real file
python core_cli.py path/to/file.py

# Scan a raw prompt for injection
python core_cli.py --prompt "Ignore previous instructions..."

# Built-in demo (always works, no setup needed)
python core_cli.py --demo

Graceful Degradation Map (no demo crashes)

| Scenario | What happens |
| --- | --- |
| Module 1 file missing | `⚠ SKIP [MODULE 1]` printed, pipeline continues with a default payload |
| Semgrep not installed | Returns `{"warning": "Semgrep not installed"}` in `sast_findings` |
| No `OPENAI_API_KEY` | Module 3 enters simulate mode, returns a templated patch |
| No `ANTHROPIC_API_KEY` | Module 3 enters simulate mode, returns a templated attack |
| OSV.dev down | Returns `{"warning": "OSV API error: ..."}`, no crash |
| pytest fails 4th time | Module 4 returns `patch_status: "FAILED"`, scores calculated anyway |

What to Say When a Judge Asks "How Does It Work?"

"You submit a piece of code or a prompt. Module 1 is our gate — it checks for prompt injection and classifies what you sent us. Module 2 runs a real static analysis engine, the same one Shopify uses, and cross-references your dependencies against Google's public vulnerability database.

If it finds something, Module 3 kicks off the adversarial clash. We use Claude as the Red Team attacker and GPT-4o as the Blue Team defender. They fight for up to 3 rounds. Claude doesn't know what GPT-4o is going to patch, so there's genuine adversarial tension.

Finally, Module 4 auto-generates a pytest suite, runs it, and gives you three scores: Confidence, Robustness, and Integrity. Those scores appear on this dashboard. High Confidence means we're mathematically certain the patch works. High Robustness means it took just one round. High Integrity means no new bugs were introduced.

The whole loop runs in under 60 seconds for most inputs."