PRSpec: Continuous Specification Compliance for Ethereum Clients

Applicant: Safi El-Hassanine
Project: PRSpec — Automated EIP Specification Compliance Checker
Requested Funding: $15,000 (Phase 3 initial milestone) — with Phase 4 ($20,000) contingent on successful Phase 3 delivery
Note: I have intentionally structured this as a milestone-based request. Phase 3 can be funded independently, and Phase 4 funding is only requested after Phase 3 deliverables are verified. This reduces risk for ESP and demonstrates my confidence in execution — Phases 1–2 were completed entirely without funding.
Duration: 4 to 6 months
RFP Reference: Ethereum Foundation ESP — Integrating LLMs into Ethereum Protocol Security Research

Table of Contents

  1. Executive Summary
  2. What PRSpec Does Today
  3. Background & Motivation
  4. Technical Approach
  5. Deliverables
  6. Project Plan & Timeline
  7. Budget Breakdown
  8. Team Qualifications
  9. Success Metrics
  10. Maintenance Plan

1. Executive Summary

Ethereum upgrades are accelerating — Dencun, Pectra, Fusaka — yet manual specification review remains the critical bottleneck. Security researchers spend hundreds of hours per upgrade reconciling consensus specs and execution specs against client implementations, often catching specification drift only after testnet deployment.

PRSpec automates the mechanical part of this work. It fetches EIP specifications (including execution-specs and consensus-specs from their canonical repositories), pulls the corresponding implementation files from multiple Ethereum clients, and uses large-context LLM analysis to identify deviations, missing checks, and edge cases — before code reaches testnet.

Unlike generic code analysis tools, PRSpec understands protocol semantics. It doesn't just compare text — it maps specification constraints to code regions and flags violations even when variable names change or code is restructured.

This is not a proposal for future work. PRSpec is a working tool today. It currently analyzes 6 EIPs across 3 Ethereum clients (go-ethereum, Nethermind, Besu) in Go, C#, and Java, with 62 passing tests and real analysis outputs. This grant request funds the next phase: production GitHub Action integration, cross-client differential analysis, and pilot deployment with client teams.

2. What PRSpec Does Today

PRSpec is functional and actively producing results. Here is what has been built:

| Capability | Status | Details |
| --- | --- | --- |
| EIP Specification Fetching | Working | Fetches EIP markdown, execution-specs (Python reference), and consensus-specs (beacon chain) from canonical GitHub repos |
| Multi-EIP Support | Working | EIP-1559, 4844, 4788, 2930, 7002, 7251 registered; 1559 and 4844 fully analyzed |
| Multi-Client Analysis | Working | go-ethereum (Go), Nethermind (C#), Besu (Java); 5 files per EIP per client |
| Multi-Language Parsing | Working | Regex + optional tree-sitter parsers for Go, Python, C#, Java with EIP keyword matching |
| LLM Analysis | Working | Gemini 2.5 Pro and GPT-4 backends; structured JSON output; parallel file analysis |
| Report Generation | Working | JSON, Markdown, and HTML reports with executive summaries |
| CLI Tool | Working | Full Click-based CLI with progress bars, configuration panels |
| Test Suite | Working | 62 tests passing (unit, integration, multi-client) |
| CI Pipeline | Working | GitHub Actions running tests on Python 3.9–3.12 |

Example real output: PRSpec analyzed Nethermind's EIP-1559 implementation (5 C# files) and found 9 issues at 98% confidence, including a non-standard configurable minimum base fee that deviates from the specification, and a FeeCollector property that contradicts the mandatory fee burn mechanism. These are the kinds of findings that automated tools typically miss — they require understanding the intent behind both the spec and the code.

Validated by Nethermind core team: This finding was reported to Nethermind, and a core developer (@LukaszRozmej) confirmed that the FeeCollector is an intentional chain-specific extension (for Gnosis Chain) that “could be refactored better not to pollute the default config and spec.” PRSpec correctly identified a real spec deviation before any grant funding.

Ethereum Foundation engagement: The execution-specs team (@danceratopz, #2 contributor to ethereum/execution-specs) provided architectural guidance on using fork-to-fork diffs (the WET principle) for precise EIP boundary detection — directly informing PRSpec's spec extraction pipeline.
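
To illustrate the fork-to-fork diff idea, here is a minimal sketch using Python's standard `difflib`. It assumes that each fork's reference code is available as a full copy, so diffing two consecutive forks isolates the lines a fork introduced. The function name and the two sample snippets are hypothetical and are not taken from execution-specs or from PRSpec's pipeline.

```python
import difflib
from typing import List


def fork_diff_added_lines(older: str, newer: str) -> List[str]:
    """Return the lines present in the newer fork but not in the older one."""
    diff = difflib.unified_diff(
        older.splitlines(),
        newer.splitlines(),
        fromfile="older_fork",
        tofile="newer_fork",
        lineterm="",
        n=0,  # no context lines: only the changes themselves
    )
    # Keep added lines, skipping the "+++ newer_fork" file header.
    return [line[1:] for line in diff
            if line.startswith("+") and not line.startswith("+++")]


# Hypothetical stand-ins for the same function in two consecutive forks.
OLDER_FORK = "def validate_header(header):\n    check_gas_limit(header)\n"
NEWER_FORK = (
    "def validate_header(header):\n"
    "    check_gas_limit(header)\n"
    "    check_base_fee(header)\n"
)

added = fork_diff_added_lines(OLDER_FORK, NEWER_FORK)
```

In this toy case `added` holds only the new `check_base_fee` call, which is the kind of boundary a spec extraction step could then attribute to a specific EIP.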

3. Background & Motivation

The Problem

Every Ethereum upgrade requires that multiple client teams independently implement the same specifications. Cross-client consistency is what makes Ethereum secure, but verifying it is almost entirely manual:

  • 400–600 hours per major upgrade for spec reconciliation across client implementations
  • 3–4 week feedback loops between spec finalization and compliance verification
  • 15–20% of security research time spent on mechanical comparison rather than deep analysis

Why This Matters Now

With Pectra shipping and Fusaka in active development, the Ethereum ecosystem is processing more concurrent specification changes than ever. The coordination cost scales superlinearly — each new EIP multiplied by each client team. Human reviewers are Ethereum's scarcest resource; they should focus on nuanced judgment calls, not mechanical diff work.

Gap Analysis

Existing tools address related but distinct problems:

  • Formal verification proves mathematical correctness but requires manual specification translation
  • Static analysis checks code quality but lacks semantic understanding of protocol specs
  • Fuzzing discovers edge cases but cannot verify spec alignment
  • Transaction analysis monitors runtime behavior but offers only post-hoc detection

PRSpec fills the gap: automated specification-to-code alignment with semantic understanding, operating at PR time rather than audit time.

4. Technical Approach

Architecture


 Specification Layer          Analysis Layer           Output Layer
┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│  EIP Markdown    │     │                  │     │  JSON Reports    │
│  Execution Specs │────▶│  LLM Analyzer    │────▶│  HTML Dashboard  │
│  Consensus Specs │     │  (Gemini / GPT)  │     │  CLI Output      │
└──────────────────┘     └──────────────────┘     │  GitHub Action   │
                                │                  └──────────────────┘
 Code Layer                     │
┌──────────────────┐            │
│  go-ethereum (Go)│            │
│  Nethermind (C#) │────────────┘
│  Besu (Java)     │
│  + future clients│
└──────────────────┘

Specification Layer: Fetches EIP documents, execution-specs (Python reference implementations from ethereum/execution-specs), and consensus-specs (beacon chain markdown from ethereum/consensus-specs). Extracts sections, identifies constraints.
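
As a rough illustration of section extraction and constraint identification, the sketch below splits a markdown document by heading and keeps sentences containing RFC 2119 style keywords. The helper names and the sample text are assumptions made for this example; PRSpec's actual extraction logic may differ.

```python
import re
from typing import Dict, List

HEADING = re.compile(r"^#{1,6}\s+(.*\S)\s*$")
NORMATIVE = re.compile(r"\b(MUST|SHALL|SHOULD|MAY)( NOT)?\b")


def split_sections(markdown: str) -> Dict[str, str]:
    """Map each heading title to the text that follows it."""
    sections: Dict[str, str] = {}
    current, buffer = "preamble", []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            sections[current] = "\n".join(buffer).strip()
            current, buffer = match.group(1), []
        else:
            buffer.append(line)
    sections[current] = "\n".join(buffer).strip()
    return sections


def find_constraints(text: str) -> List[str]:
    """Keep sentences that contain an upper-case normative keyword."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s.strip() for s in sentences if NORMATIVE.search(s)]


# Hypothetical sample, not the text of any real EIP.
SAMPLE = """## Abstract
A short summary.

## Specification
The base fee MUST be burned. Clients MAY log the burned amount.
Blocks are validated in order.
"""

sections = split_sections(SAMPLE)
constraints = find_constraints(sections["Specification"])
```

Keyword matching only finds explicitly normative sentences; constraints expressed in reference code or in plain prose would need the LLM step described below.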

Code Layer: Per-client, per-EIP file registries mapping each EIP to the specific source files that implement it. Multi-language parsers (Go, C#, Java, Python) extract functions, classes, and methods with EIP-keyword filtering.
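
A minimal sketch of what a file registry and keyword filter could look like, shown for Go only. The registry layout, the file paths, and the helper names are illustrative placeholders rather than PRSpec's real data, and the regex approach ignores nested declarations that a tree-sitter parser would handle.

```python
import re
from typing import Dict, List

# Hypothetical registry: client -> EIP number -> files. Paths are placeholders.
FILE_REGISTRY: Dict[str, Dict[int, List[str]]] = {
    "client_a": {1559: ["example/basefee.go", "example/txpool.go"]},
}

# Matches "func Name(" and "func (recv *T) Name(" at the start of a line.
GO_FUNC = re.compile(r"^func\s+(?:\([^)]*\)\s*)?(\w+)\s*\(", re.MULTILINE)


def extract_go_functions(source: str) -> Dict[str, str]:
    """Split Go source into chunks, one per top-level function declaration."""
    matches = list(GO_FUNC.finditer(source))
    functions: Dict[str, str] = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(source)
        functions[match.group(1)] = source[match.start():end].strip()
    return functions


def filter_by_keywords(functions: Dict[str, str],
                       keywords: List[str]) -> Dict[str, str]:
    """Keep functions whose text mentions any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return {name: body for name, body in functions.items()
            if any(k in body.lower() for k in lowered)}


GO_SOURCE = """package example

func CalcBaseFee(parentBaseFee uint64) uint64 {
	return parentBaseFee
}

func (p *Pool) Len() int {
	return len(p.items)
}
"""

all_functions = extract_go_functions(GO_SOURCE)
relevant = filter_by_keywords(all_functions, ["basefee", "base fee"])
```

Filtering before analysis keeps the prompt focused on EIP-relevant code, at the cost of missing functions that implement a rule without naming it.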

Analysis Layer: Sends specification + code to a large-context LLM with a structured prompt that enforces JSON output. The prompt is EIP-agnostic — it reads the EIP number, title, and focus areas from context, so the same pipeline works for any EIP. Files are analyzed in parallel via thread pool.
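
The flow described above can be sketched as follows, with the LLM call replaced by a stub so the example runs offline. The prompt wording, the three required JSON keys, the file names, and `fake_llm` are all assumptions for illustration; they are not PRSpec's real prompt, schema, or backend API.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

REQUIRED_KEYS = {"status", "confidence", "issues"}  # assumed output schema


def build_prompt(eip: int, title: str, focus: str, spec: str, code: str) -> str:
    """EIP-agnostic prompt: every EIP-specific detail comes from arguments."""
    return (
        f"You are checking an implementation of EIP-{eip} ({title}).\n"
        f"Focus areas: {focus}\n"
        f"Specification:\n{spec}\n"
        f"Code:\n{code}\n"
        f"Reply with JSON containing exactly these keys: {sorted(REQUIRED_KEYS)}"
    )


def parse_response(raw: str) -> dict:
    """Reject responses that do not follow the expected schema."""
    data = json.loads(raw)
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"response is missing keys: {sorted(missing)}")
    return data


def analyze_files(files: Dict[str, str], call_llm: Callable[[str], str],
                  context: dict, max_workers: int = 4) -> Dict[str, dict]:
    """Analyze each file in its own worker thread and collect results by path."""
    def analyze_one(item):
        path, code = item
        return path, parse_response(call_llm(build_prompt(code=code, **context)))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(analyze_one, files.items()))


def fake_llm(prompt: str) -> str:
    """Stand-in for a real backend: flags code that mentions FeeCollector."""
    code = prompt.split("Code:\n", 1)[1]
    flagged = "FeeCollector" in code
    return json.dumps({
        "status": "deviation" if flagged else "compliant",
        "confidence": 0.5,
        "issues": ["fee is collected, not burned"] if flagged else [],
    })


context = {"eip": 1559, "title": "Fee market change for ETH 1.0 chain",
           "focus": "base fee burn", "spec": "The base fee MUST be burned."}
files = {"a.cs": "Address FeeCollector { get; }", "b.cs": "UInt256 BaseFee { get; }"}
results = analyze_files(files, fake_llm, context)
```

Threads suit this workload because each call spends most of its time waiting on the network, and validating the response before accepting it means a malformed reply fails loudly instead of producing a silent gap in the report.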

Output Layer: Generates JSON (machine-readable), Markdown (documentation), and HTML (human review) reports with executive summaries, per-file status, confidence scores, and actionable issue descriptions.
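
For a sense of how per-file results could be turned into a summary and a Markdown report, here is a small sketch. The result structure (`confidence`, `issues`, `severity`, `description`) is an assumed shape chosen for this example, and the sample findings are placeholders rather than real PRSpec output.

```python
import json
from typing import Dict


def executive_summary(results: Dict[str, dict]) -> dict:
    """Aggregate per-file results into headline numbers."""
    flagged = sorted(path for path, r in results.items() if r["issues"])
    return {
        "files_analyzed": len(results),
        "files_with_issues": len(flagged),
        "total_issues": sum(len(r["issues"]) for r in results.values()),
        "flagged_files": flagged,
    }


def to_markdown(eip: int, results: Dict[str, dict]) -> str:
    """Render a human-readable report from the same data as the JSON output."""
    summary = executive_summary(results)
    lines = [
        f"# EIP-{eip} compliance report",
        "",
        f"{summary['total_issues']} issue(s) in {summary['files_with_issues']} "
        f"of {summary['files_analyzed']} file(s).",
        "",
    ]
    for path in sorted(results):
        result = results[path]
        lines.append(f"## {path} (confidence {result['confidence']:.0%})")
        for issue in result["issues"]:
            lines.append(f"- [{issue['severity']}] {issue['description']}")
        if not result["issues"]:
            lines.append("- no issues found")
        lines.append("")
    return "\n".join(lines)


# Placeholder results for two hypothetical files.
sample_results = {
    "spec_interface.cs": {
        "confidence": 0.98,
        "issues": [{"severity": "HIGH",
                    "description": "fee collector conflicts with mandatory burn"}],
    },
    "calculator.cs": {"confidence": 0.9, "issues": []},
}

summary = executive_summary(sample_results)
markdown_report = to_markdown(1559, sample_results)
json_report = json.dumps({"eip": 1559, "summary": summary,
                          "files": sample_results}, indent=2)
```

Deriving every format from one result dictionary keeps the JSON, Markdown, and HTML views from drifting apart.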

Semantic Analysis (Not Just Text Diff)

Traditional diff tools compare text; PRSpec compares intent:

  1. Extracts semantic constraints from specifications (e.g., "the base fee must increase by 1/8 when blocks are full")
  2. Maps constraints to code regions using AST analysis + LLM reasoning
  3. Detects drift when code violates extracted constraints, even if variable names change or structure is refactored
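
To make the base fee constraint concrete, the sketch below follows the update rule from the EIP-1559 specification and then checks the "full block raises the base fee by 1/8" constraint against two functions. The checker and the deliberately drifted variant are hypothetical and only illustrate that a constraint can be verified regardless of naming; PRSpec itself relies on AST analysis and LLM reasoning rather than executing client code.

```python
BASE_FEE_MAX_CHANGE_DENOMINATOR = 8  # constant defined in EIP-1559
ELASTICITY_MULTIPLIER = 2            # constant defined in EIP-1559


def expected_base_fee(parent_base_fee: int, parent_gas_used: int,
                      parent_gas_limit: int) -> int:
    """Base fee update rule, following the EIP-1559 specification."""
    target = parent_gas_limit // ELASTICITY_MULTIPLIER
    if parent_gas_used == target:
        return parent_base_fee
    if parent_gas_used > target:
        delta = max(
            parent_base_fee * (parent_gas_used - target)
            // target // BASE_FEE_MAX_CHANGE_DENOMINATOR,
            1,
        )
        return parent_base_fee + delta
    delta = (parent_base_fee * (target - parent_gas_used)
             // target // BASE_FEE_MAX_CHANGE_DENOMINATOR)
    return parent_base_fee - delta


def violates_full_block_rule(update_fn, base_fee: int = 1_000_000_000,
                             gas_limit: int = 30_000_000) -> bool:
    """True if a completely full block does not raise the fee by exactly 1/8."""
    expected = base_fee + base_fee // BASE_FEE_MAX_CHANGE_DENOMINATOR
    return update_fn(base_fee, gas_limit, gas_limit) != expected


def drifted_update(prev_fee: int, used: int, limit: int) -> int:
    """Hypothetical refactor with renamed variables and a wrong denominator."""
    midpoint = limit // 2
    if used > midpoint:
        return prev_fee + max(prev_fee * (used - midpoint) // midpoint // 4, 1)
    return prev_fee


full_block_fee = expected_base_fee(1_000_000_000, 30_000_000, 30_000_000)
empty_block_fee = expected_base_fee(1_000_000_000, 0, 30_000_000)
```

With a 1 gwei parent base fee, a full block yields 1.125 gwei and an empty block 0.875 gwei. The drifted variant shares no identifiers with the reference but still fails the check, which a text diff could not show.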

Example: Nethermind's IEip1559Spec interface includes a FeeCollector address property. A text diff would show a normal interface definition. PRSpec flags it as a HIGH severity deviation because EIP-1559 requires the base fee to be burned, not collected — a semantic violation invisible to syntactic tools.

Security-First Design

  • No data retention: Code is sent to the LLM for analysis and not stored beyond the local output directory
  • Deterministic prompting: A fixed prompt and structured JSON output schema keep results consistent in shape and comparable across runs
  • Minimal permissions: GitHub Action will require only read access to PR diffs, plus permission to comment where findings are posted to the PR
  • Local LLM support (planned): Ollama integration for sensitive codebases where no code should leave the infrastructure

5. Deliverables

Already Delivered (Pre-Grant)

| Deliverable | Status |
| --- | --- |
| Working prototype with EIP-1559 analysis against go-ethereum | Done (v1.0) |
| Multi-EIP architecture supporting 6 EIPs | Done (v1.1) |
| Parallel analysis engine with executive summaries | Done (v1.3) |
| Multi-client support: Nethermind (C#) + Besu (Java) | Done (v1.4) |
| 62 passing tests, CI pipeline, full documentation | Done (v1.4) |

Grant Deliverables (Funded Work)

| Deliverable | Phase | Description |
| --- | --- | --- |
| Cross-client differential reports | Phase 3 | Compare how multiple clients implement the same EIP; identify where implementations diverge |
| Production GitHub Action | Phase 4 | Zero-config CI integration: uses: prspec/action@v1 in any client repo |
| PR-level analysis | Phase 4 | Analyze pull requests against canonical specs; post findings as PR comments |
| Pectra/Fusaka EIP coverage | Phase 3 | Add file mappings and analysis for current upgrade EIPs (7702, 2935, etc.) |
| Security dashboard | Phase 4 | Web interface for security teams to monitor spec compliance across clients |
| Local LLM support | Phase 4 | Ollama backend for privacy-sensitive analysis |
| Pilot with 2+ client teams | Phase 4 | Deployed in real client team workflows |
| Comprehensive documentation | Ongoing | Setup guides, API docs, contribution guidelines |

6. Project Plan & Timeline

Duration: 6 months (Feb 2026 – Aug 2026)

Phase 1–2: Foundation & Multi-Client (Months 1→2) — COMPLETED (started 10th Jan 2026)

Completed ahead of schedule, pre-grant, demonstrating execution capability.

  • ☑ Multi-EIP architecture supporting 6 EIPs (1559, 4844, 4788, 2930, 7002, 7251)
  • ☑ Multi-client analysis: go-ethereum (Go), Nethermind (C#), Besu (Java)
  • ☑ Multi-language parsers with EIP keyword matching
  • ☑ Parallel analysis engine (~3x speedup)
  • ☑ JSON/Markdown/HTML reports with executive summaries
  • ☑ 62 passing tests across unit, integration, and multi-client suites
  • ☑ CI pipeline (GitHub Actions, Python 3.9–3.12)

Phase 3: Cross-Client Intelligence — Timeline: Month 3 → Month 4 (scheduled to start 1st Mar 2026)

Goal: Enable differential analysis to find where client implementations diverge from each other.

  • Build cross-client comparison engine that analyzes the same EIP across multiple clients in a single run
  • Generate differential reports highlighting where implementations agree and disagree
  • Extend EIP coverage to Pectra/Fusaka upgrade EIPs (7702, 2935, etc.)
  • Add Prysm (Go) and Lighthouse (Rust) consensus client support
  • Implement spec embedding cache for faster repeated analysis
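
One possible shape for the differential step is sketched below: given a per-client status for each constraint, it reports the constraints on which clients disagree. The client names, constraint identifiers, and statuses are placeholders invented for this example and do not represent findings about any real client.

```python
from typing import Dict


def differential(report: Dict[str, Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Return, per constraint, the client statuses when they are not unanimous.

    `report` maps client name -> constraint id -> status string.
    """
    constraint_ids = sorted({c for statuses in report.values() for c in statuses})
    diverging: Dict[str, Dict[str, str]] = {}
    for constraint in constraint_ids:
        per_client = {
            client: statuses.get(constraint, "not_analyzed")
            for client, statuses in report.items()
        }
        if len(set(per_client.values())) > 1:
            diverging[constraint] = per_client
    return diverging


# Placeholder input: three hypothetical clients, two hypothetical constraints.
sample_report = {
    "client_a": {"base_fee_burned": "compliant", "base_fee_update": "compliant"},
    "client_b": {"base_fee_burned": "deviation", "base_fee_update": "compliant"},
    "client_c": {"base_fee_burned": "compliant", "base_fee_update": "compliant"},
}

divergences = differential(sample_report)
```

Here only `base_fee_burned` is reported, with `client_b` as the outlier. Treating a missing entry as `not_analyzed` surfaces coverage gaps as divergences instead of hiding them.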

Phase 4: Production & CI Integration — Timeline: Month 5 → Month 6

Goal: Ship production-ready GitHub Action, pilot with real client teams.

  • Build and publish prspec/action@v1 GitHub Action
  • Implement PR-level analysis (analyze diffs, not just full files)
  • Build security team dashboard for monitoring compliance across clients
  • Add local LLM support via Ollama for privacy-sensitive workflows
  • Conduct internal security audit of the tool itself
  • Pilot deployment with 2+ client teams
  • Transition documentation and community onboarding
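
As a sketch of how PR-level analysis might decide which EIPs a pull request touches, the example below reads file names from a unified diff and intersects them with a per-EIP file list. The registry contents, paths, and diff text are hypothetical, and the header regex does not handle paths containing spaces.

```python
import re
from typing import Dict, List, Set

# Header line that git emits for each changed file in a unified diff.
DIFF_HEADER = re.compile(r"^diff --git a/(\S+) b/(\S+)$", re.MULTILINE)


def changed_files(diff_text: str) -> Set[str]:
    """Collect the post-change path of every file in the diff."""
    return {match.group(2) for match in DIFF_HEADER.finditer(diff_text)}


def eips_touched(diff_text: str,
                 registry: Dict[int, List[str]]) -> Dict[int, List[str]]:
    """Map each affected EIP to the registered files this diff modifies."""
    changed = changed_files(diff_text)
    touched: Dict[int, List[str]] = {}
    for eip, paths in registry.items():
        overlap = sorted(changed & set(paths))
        if overlap:
            touched[eip] = overlap
    return touched


# Hypothetical registry and diff for illustration only.
REGISTRY = {1559: ["core/basefee.go"], 4844: ["core/blob.go"]}
SAMPLE_DIFF = """diff --git a/core/basefee.go b/core/basefee.go
index 1111111..2222222 100644
--- a/core/basefee.go
+++ b/core/basefee.go
@@ -1,3 +1,3 @@
-old line
+new line
diff --git a/README.md b/README.md
index 3333333..4444444 100644
--- a/README.md
+++ b/README.md
@@ -1 +1 @@
-old readme
+new readme
"""

touched = eips_touched(SAMPLE_DIFF, REGISTRY)
```

Restricting analysis to EIPs whose files actually changed keeps per-PR LLM cost proportional to the size of the change rather than the size of the repository.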

Future Direction: Spec Quality Analysis (Community-Suggested)

Following discussion with Nethermind core developers (issue #10522), a promising direction emerged: using PRSpec to analyze the specifications themselves — flagging EIPs and devp2p specs that lack type sizes, constraints, or precise may/should/must logic. This would shift PRSpec from detecting code-vs-spec drift to also detecting spec ambiguity that causes implementation divergence in the first place. This direction will be explored after Phase 4 delivery.

7. Budget Breakdown

Milestone 1 — Phase 3: Cross-Client Intelligence ($15,000)

This is the initial funding request. Phase 3 is a self-contained deliverable.

| Category | Amount | Details |
| --- | --- | --- |
| Development | $11,000 | Cross-client differential engine, Pectra/Fusaka EIP coverage, Prysm + Lighthouse support |
| Infrastructure | $2,500 | LLM API costs (Gemini/GPT) for testing and analysis across all clients |
| Documentation | $1,500 | Technical writing, cross-client report examples, onboarding materials |

Milestone 2 — Phase 4: Production & CI Integration ($20,000)

Requested only after Phase 3 deliverables are verified and accepted.

| Category | Amount | Details |
| --- | --- | --- |
| Development | $14,000 | GitHub Action, PR-level analysis, security dashboard, Ollama local LLM |
| Infrastructure | $2,500 | CI runners, GitHub Actions compute, hosting for dashboard |
| Security Audit | $2,000 | Third-party review of GitHub Action permissions and data handling |
| Community & Pilots | $1,500 | Onboarding materials, pilot coordination with 2+ client teams |

Total across both milestones: $35,000 ($15,000 for Phase 3 plus $20,000 for Phase 4)
Initial request: $15,000 — ESP bears no risk on Phase 4 until Phase 3 is proven.

Open to discussion: These numbers reflect estimated costs, not fixed requirements. I am fully open to lower amounts or alternative structures — partial funding, deferred payments, or a smaller scope per phase. I built Phases 1–2 entirely self-funded because I believe in this project; the budget is about sustaining momentum, not a precondition for building. What matters most is ESP's support and alignment, not the specific dollar amount.

8. Team Qualifications

Safi El-Hassanine — Principal Engineer

Software engineer with 7+ years building production systems across fintech and distributed systems. My interest in Ethereum protocol security grew from watching the coordination challenges of the Merge and subsequent upgrades — recognizing that security bottlenecks are increasingly organizational, not technical.

Relevant experience:

  • Built and scaled CI/CD pipelines processing 10M+ daily transactions
  • Developed static analysis tools for Solidity smart contracts (open source)
  • Deep familiarity with Ethereum protocol specs through independent research and client codebase study
  • Experience with LLM application development: RAG systems, structured prompting, local model deployment

Execution proof: PRSpec Phases 1–2 were completed before grant funding, on personal time and resources. The tool works today. This grant funds the production and deployment phases.

9. Success Metrics

| Metric | Target | Measurement |
| --- | --- | --- |
| Spec drift instances caught | 10+ real findings across clients | Tracked in public issue reports |
| Client teams piloting | 2+ teams using PRSpec in their workflow | Integration confirmations from maintainers |
| False positive rate | <15% | Labeled dataset of historical spec-code mismatches |
| EIP coverage | 10+ EIPs with full file mappings | Registry count in codebase |
| Review time savings | 30%+ reduction in manual spec review | Time-tracking survey with pilot teams |
| PRs analyzed (with GitHub Action) | 100+ per month by month 6 | GitHub Action telemetry (opt-in) |

10. Maintenance Plan

Open Source: All code is released under the MIT license. Repository: github.com/Fosurero/PRSpec

Active Development (Months 1–4): Weekly progress updates to ESP. All milestones tracked in GitHub Issues and documented in the changelog.

Transition (Months 5–6): Prepare for community stewardship with contribution guidelines (already in place), onboarding documentation, and, if desired, transfer of the repository to an EF GitHub organization.

Post-Grant: I commit to 12 months of advisory support (1–2 hours weekly) to ensure smooth knowledge transfer. Beyond that, maintenance is intended to be community-driven, with client teams contributing EIP adapters.


Ethereum's security depends on faithful reconciliation of specifications and implementations across multiple independent client teams. This is mechanical, high-stakes work that scales poorly with human effort alone.

PRSpec automates the mechanical comparison, letting human intelligence focus on the judgment calls that only experienced protocol engineers can make. The tool works today. This grant funds its path to production.


Contact:
Safi El-Hassanine
Sofyelhelaly(at)gmail.com
https://x.com/Safy__H
GitHub: github.com/Fosurero

Repository: github.com/Fosurero/PRSpec