A story of building production-grade software with TDD, BDD, and an AI pair programmer
Building infrastructure software for Ethereum is not trivial. Validators, stakers, and application developers all depend on reliable access to both the Execution Layer (EL) and Consensus Layer (CL). When nodes go down or fall behind, requests fail. When requests fail, users suffer.
Vixy was born from a simple need: route requests to healthy nodes, automatically.
But this blog isn't just about what Vixy does—it's about how it was built. In a single day, using Test-Driven Development (TDD), Behavior-Driven Development (BDD), and an AI assistant (Claude), we went from an empty repository to a fully functional, well-tested Ethereum proxy.
Every successful project starts with a plan. Before writing a single line of code, we created AGENT.md—a comprehensive specification that served as both documentation and task list.
The plan broke development into 13 phases:
- Project Setup - Dependencies, file structure, CI/CD
- BDD Infrastructure - Cucumber test harness
- Configuration - TOML parsing with validation
- State Management - Thread-safe node state tracking
- EL Health Check - JSON-RPC block number monitoring
- CL Health Check - Beacon API health and slot monitoring
- Health Monitor - Background health checking loop
- Proxy Server - HTTP and WebSocket request forwarding
- Main Entry Point - CLI and server initialization
- Metrics - Prometheus endpoint
- Final Verification - CI validation
- Enhancements - Status endpoint, configuration options
- Write the Story - This blog post
Each phase had clear deliverables, test requirements, and acceptance criteria. The AI could follow this blueprint autonomously, making decisions within defined boundaries.
We followed strict TDD throughout:
RED Phase - First, we wrote 17 tests that defined the expected behavior:
```rust
#[test]
fn test_parse_hex_block_number_with_prefix() {
    let result = parse_hex_block_number("0x10d4f");
    assert_eq!(result.unwrap(), 68943);
}

#[tokio::test]
async fn test_check_el_node_success() {
    let mock_server = MockServer::start().await;
    // ... mock eth_blockNumber response
    let block_number = check_el_node(&mock_server.uri()).await;
    assert_eq!(block_number.unwrap(), 68943);
}
```

Running `cargo test el` showed 17 failures. Perfect: that's exactly what we wanted.
GREEN Phase - Then we implemented just enough code to make tests pass:
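```rust
// `wrap_err_with` needs the `WrapErr` trait in scope; importing it from
// `eyre` is an assumption here (`color_eyre` re-exports the same items).
use eyre::{Result, WrapErr};

pub fn parse_hex_block_number(hex: &str) -> Result<u64> {
    // Accept both "0x10d4f" and "10d4f".
    let hex_str = hex.strip_prefix("0x").unwrap_or(hex);
    u64::from_str_radix(hex_str, 16)
        .wrap_err_with(|| format!("invalid hex number: {hex}"))
}
```

One by one, tests went green. The rhythm was addictive.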
REFACTOR Phase - With passing tests as our safety net, we cleaned up code without fear.
Building software is never smooth. Here are the challenges we faced:
Early in development, we had a subtle bug: unreachable nodes were being marked as healthy.
The Problem: When a node couldn't be reached, it had block_number = 0. The chain head was also 0 (no nodes responding). So the lag calculation was 0 - 0 = 0, which was within the threshold.
The Fix: We added a check_ok field to track whether the health check succeeded:
```rust
// Before: is_healthy = lag <= max_lag
// After:  is_healthy = check_ok && lag <= max_lag
```

TDD caught this bug immediately. The test `test_el_node_marked_unhealthy_on_connection_failure` failed, showing us the edge case before it could reach production.
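The fix can be sketched in a few lines of Rust. This is a minimal illustration, not Vixy's actual code: the `NodeCheck` struct and the `is_healthy` function are hypothetical names, and only `check_ok`, the lag, and `max_lag` come from the description above.

```rust
/// Hypothetical result of one health check. Only `check_ok` is named in
/// the post; the rest of this struct is an assumption for illustration.
#[derive(Debug, Clone, Copy)]
pub struct NodeCheck {
    /// Whether the health check request itself succeeded.
    pub check_ok: bool,
    /// Last block number reported by the node (0 if it was never reached).
    pub block_number: u64,
}

/// A node counts as healthy only if the check succeeded AND the node is
/// within `max_lag` blocks of the chain head.
pub fn is_healthy(node: NodeCheck, chain_head: u64, max_lag: u64) -> bool {
    // saturating_sub avoids underflow if a node is ahead of the recorded head.
    let lag = chain_head.saturating_sub(node.block_number);
    node.check_ok && lag <= max_lag
}
```

With this shape, the original failure case (unreachable node, chain head at 0) evaluates to unhealthy because `check_ok` is false, even though the computed lag is 0.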
We hit a compiler error when setting up routes:
```
error: invalid route syntax `*path`
```

axum 0.8 changed the wildcard syntax from `*path` to `{*path}`. A quick documentation check revealed the fix. This is the kind of "boring" bug that AI handles well: pattern matching against known issues.
tokio-tungstenite and axum use different types for WebSocket messages. What looked like the same type was actually incompatible:
```rust
// tungstenite uses its own Utf8Bytes
// axum uses its own Utf8Bytes
// They're not the same type!

// Fix: explicit conversion
Message::Text(text.as_str().into())
```

Three hours of human debugging compressed into three minutes of AI analysis.
Beyond unit tests, we used Cucumber for Behavior-Driven Development:
```gherkin
Feature: EL (Execution Layer) Health Check

  Scenario: Healthy EL node within lag threshold
    Given an EL node at block 1000
    And the EL chain head is at block 1002
    And the max EL lag is 5 blocks
    When the health check runs
    Then the EL node should be marked as healthy
    And the EL node lag should be 2 blocks
```

These scenarios served as living documentation. Anyone could read them and understand what Vixy does, without diving into code.
Final BDD Results:
- 3 features (config, EL health, CL health)
- 16 scenarios
- 83 steps
- All passing
By the end of development:
| Metric | Count |
|---|---|
| Unit Tests | 72 |
| BDD Scenarios | 16 |
| BDD Steps | 83 |
| Lines of Rust | ~2,500 |
| Commits | 15+ |
| Development Time | ~8 hours |
All tests pass. All CI checks pass. The code is formatted, linted, and ready for production.
Vixy is a production-ready Ethereum proxy with:
- Health Monitoring - Tracks block numbers (EL) and slots (CL)
- Automatic Failover - Routes to backup nodes when primaries fail
- HTTP Proxy - `/el` for JSON-RPC, `/cl/*` for Beacon API
- WebSocket Proxy - `/el/ws` for subscriptions
- Status Endpoint - `/status` returns JSON with all node states
- Metrics - `/metrics` for Prometheus
```bash
# Create config
cp config.example.toml config.toml

# Run
cargo run -- --config config.toml

# Test EL proxy (geth rejects JSON-RPC requests without a JSON Content-Type)
curl -X POST http://localhost:8080/el \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'

# Check status
curl http://localhost:8080/status
```

Every bug we caught early was a bug we didn't debug in production. The 72 unit tests aren't overhead; they're insurance.
The AI handled:
- Boilerplate code generation
- Error message interpretation
- Documentation lookups
- Repetitive test writing
Humans (or human-AI collaboration) handled:
- Architecture decisions
- Edge case identification
- "Does this make sense?" questions
AGENT.md was the key. With clear phases, acceptance criteria, and examples, the AI could work independently. Vague instructions produce vague results.
DIARY.md captured the journey—not just what was built, but how. Every challenge, every fix, every learning. This blog exists because that documentation exists.
Unit tests and BDD scenarios are great, but they test against mocks. To truly verify Vixy works, we needed real Ethereum nodes.
Kurtosis is a platform for packaging and launching ephemeral backend stacks. With their ethereum-package, we can spin up a complete Ethereum testnet in minutes.
Our test setup:
- 4 EL nodes (geth) - 2 primary, 2 backup
- 4 CL nodes (lighthouse) - for consensus
- Minimal preset - fast block times (2s) for quick testing
```yaml
# kurtosis/network_params.yaml
participants:
  - el_type: geth
    cl_type: lighthouse
    count: 4
network_params:
  preset: minimal
  seconds_per_slot: 2
```

```bash
just integration-test
```

This command:
- Starts a Kurtosis enclave with 4 EL/CL pairs
- Auto-detects node endpoints and generates Vixy config
- Starts Vixy with the generated config
- Runs 15 cucumber scenarios against real nodes
- Cleans up
Test coverage:
- CL proxy forwarding (health, headers, syncing)
- EL proxy forwarding (eth_blockNumber, eth_chainId, batches)
- Single-node failover (stop el-1, verify el-2 handles requests)
- Full backup failover (stop ALL primaries, verify backups take over)
- Health monitoring (status endpoint, node detection, recovery)
- Prometheus metrics
This is the test that proves Vixy's value:
```gherkin
Scenario: Proxy uses backup when all primary nodes are down
  Given all primary EL nodes are stopped
  When I send an eth_blockNumber request to Vixy
  Then I should receive a valid block number response
  And the response should be from a backup node
```

When we stop el-1 and el-2 (both primaries), Vixy automatically routes to el-3 or el-4 (backups). No manual intervention. No downtime.
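The routing rule behind this scenario can be sketched as a small selection function. This is an illustration under stated assumptions, not Vixy's implementation: the `Node` struct, its fields, and `select_node` are hypothetical, and choosing the first healthy node in list order is a simplification.

```rust
/// Hypothetical view of a node as the router sees it (names are
/// assumptions made for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub name: String,
    pub is_backup: bool,
    pub healthy: bool,
}

/// Prefer the first healthy primary; fall back to the first healthy backup
/// only when no primary is healthy. Returns `None` if nothing is healthy.
pub fn select_node(nodes: &[Node]) -> Option<&Node> {
    nodes
        .iter()
        .find(|n| n.healthy && !n.is_backup)
        .or_else(|| nodes.iter().find(|n| n.healthy && n.is_backup))
}
```

With el-1 and el-2 marked unhealthy, this function returns el-3. Once a primary recovers, the same call returns that primary again, so no separate failback step is needed.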
Integration tests caught bugs that unit tests missed:
- Missing Content-Type header - The proxy wasn't forwarding the Content-Type header, causing geth to return HTTP 415. Unit tests with mocks didn't catch this.
- Beacon node 206 responses - Lighthouse returns HTTP 206 when syncing, which our tests incorrectly flagged as failure.
Both were fixed before they could impact production.
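The 206 case is easier to reason about when the status codes are mapped explicitly. The sketch below is illustrative only: `ClHealth` and `classify_cl_health` are hypothetical names, and the mapping follows the Beacon API convention for the node health endpoint (200 ready, 206 syncing, anything else unavailable) rather than Vixy's actual code.

```rust
/// Hypothetical classification of a Beacon API health response.
#[derive(Debug, PartialEq, Eq)]
pub enum ClHealth {
    /// HTTP 200: node is synced and ready.
    Ready,
    /// HTTP 206: node is reachable but still syncing.
    Syncing,
    /// Anything else (for example 503): treat the node as unavailable.
    Unavailable,
}

/// Map an HTTP status code from the health endpoint to a health state.
pub fn classify_cl_health(status: u16) -> ClHealth {
    match status {
        200 => ClHealth::Ready,
        206 => ClHealth::Syncing,
        _ => ClHealth::Unavailable,
    }
}
```

Keeping `Syncing` separate from `Unavailable` lets tests assert that the node responded, and leaves the decision of whether to route traffic to a syncing node as a separate policy choice.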
Vixy is functional, but there's always more to do:
- Round-robin load balancing
- Actual retry logic (infrastructure is in place)
- TLS/HTTPS support
- CL WebSocket support (events API)
- Kubernetes deployment manifests
The foundation is solid. Extensions can be added incrementally, each with their own TDD cycle.
Building Vixy demonstrated that AI-assisted development isn't about replacing programmers—it's about amplifying them. The AI wrote tests, implemented functions, and debugged issues. But it did so within a framework designed by humans, following principles established over decades of software engineering.
TDD, BDD, CI/CD, incremental commits—these aren't old-fashioned practices made obsolete by AI. They're the guardrails that make AI development reliable.
The future of programming is collaboration: humans defining what and why, AI executing how, and tests ensuring correctness.
Vixy is proof that this future works.
Built with Rust, tested with Cucumber, assisted by Claude, powered by coffee and curiosity.
Repository: github.com/your-repo/vixy
License: MIT