Skip to content

Commit 3e9cc24

Browse files
RonTuretzkyclaudeCopilotbagelface
authored
refactor: docker-compose e2e (#95)
* refactor: update docker-compose to use GitHub Container Registry images - Add commonware-avs-node services (1-3) using ghcr.io/breadchaincoop/commonware-avs-node:v0.1.0 - Add commonware-avs-router service using ghcr.io/breadchaincoop/commonware-avs-router:dev - Configure services with proper environment variables and networking - Add orchestrator_with_g2.json config file for BLS key management - Map ports to 4000-4003 to avoid conflicts with local development - Set up volume mounts for keys and configuration files 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * feat: add node key setup script and Docker documentation - Add setup-node-keys.sh script to generate properly formatted BLS keys - Add README_DOCKER.md with complete setup instructions - Script extracts keys from eigenlayer-generated files and formats them for nodes - Documentation covers setup, troubleshooting, and usage * chore: deleting incomplete config * chore: delete * chore: tracking to prevent hallucinations * chore: track working * feat: implement hostname resolution for Docker service discovery - Add hostname resolution using ToSocketAddrs in router's main.rs - Update docker-compose.yml to use debug-v2 router image with hostname support - Configure public_orchestrator.json to use Docker service name 'router' - Add pull_policy: always to node services for consistent PR-55 image updates - Remove accidentally cloned commonware-avs-node directory This allows the router to resolve Docker service names (like node-1:3001) to IP addresses, fixing the AddrParseError(Socket) issues when using Docker Compose networking. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * refactor: address PR review comments - Update eigenlayer and node images from PR versions to dev - Enable restart policy for all node services and router - Restore platform specification for router service - Remove debug println statements from src/main.rs - Fix duplicate RUST_BACKTRACE environment variable Changes made per code review comments to standardize on dev images and enable proper service restart policies. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * feat: enable Docker image builds for pull requests - Add pull_request trigger to Docker CI/CD workflow for main and dev branches - Implement PR-specific tagging using pr-<number> format - Update signer service to depend on eigenlayer completion instead of ethereum start - Maintain platform specification for router service This enables automatic Docker image builds for pull requests, making it easier to test PR changes using the ghcr.io registry with pr-<number> tags. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * chore: fix clippy warnings and remove commonware-avs-node CI checks - Apply cargo fmt to fix formatting issues (trailing whitespace) - Fix clippy uninlined_format_args warnings by using inline format strings - Remove commonware-avs-node from rust-ci.yml matrix build - Simplify CI workflow to only check this repository All clippy warnings have been resolved and the code now passes cargo clippy -- -D warnings. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * fix: allow clippy warnings from auto-generated and existing code Add allows to CI clippy command for warnings that come from: - Auto-generated binding files (unused_attributes, unused_variables) - Existing test code format strings (uninlined_format_args) This ensures CI passes without modifying auto-generated files or requiring changes to existing test code. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * revert: restore original format string in executor.rs Reverted the format string change from commit 208c77e to keep the original code unchanged. The clippy warning for uninlined_format_args is now allowed in the CI configuration instead. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * feat: add Docker Compose based CI workflow Create a new CI workflow that uses docker-compose to test the full stack instead of the manual setup in local-test.yml. Features: - Builds router image locally for testing - Pulls required images from GitHub Container Registry - Generates test BLS keys for operators - Starts all services using docker-compose - Waits for EigenLayer setup to complete - Verifies all services are running - Checks basic connectivity and health - Collects comprehensive logs on failure - Proper cleanup after test completion This provides a more realistic test environment that matches how the services will actually be deployed and interact. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * feat: add counter increment verification to docker-compose test Enhanced the docker-compose CI test to verify the core functionality: - Check initial counter state from router logs - Wait for aggregation cycle (30+ seconds) - Verify counter has incremented - Check for successful signature aggregation - Verify each node is participating and signing messages This ensures the AVS is not just running but actually performing its intended counter increment functionality with proper BLS signature aggregation across all nodes. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * fix: address PR review comments for docker-compose CI workflow - Use example.env as base and properly set FORK_URL to fix ethereum error - Wait for EigenLayer to generate BLS keys instead of creating them manually - Remove flaky P2P connectivity check that relied on log patterns - Increase timeout to 300 seconds matching local-test workflow - Model setup steps after local-test workflow for consistency The workflow now properly: 1. Copies example.env and configures it for LOCAL mode 2. Sets FORK_URL for Holesky forking (fixes ethereum startup error) 3. Waits for EigenLayer to generate operator BLS keys 4. Uses a more robust service status check 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * fix: resolve CI failures for docker-compose test workflow - Add LST_CONTRACT_ADDRESS and all required Holesky contract addresses to .env generation - Fix Docker build error by creating dummy scripts directory during pre-build phase - These changes address the eigenlayer service startup failure and cargo build errors 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * chore: add support for RPC_URL secret and counter address validation - Support using repository secret RPC_URL for fork URL if available - Add validation for counter contract address from deployment file - Ensure operator_keys directory exists before eigenlayer setup 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * fix: address all PR review comments 1. Remove extra allow flags from rust-ci clippy command - Now using only: cargo clippy --all-targets --all-features -- -D warnings 2. Use repository secret for RPC_URL if available - Falls back to public Holesky RPC if secret not set - Uses ${{ secrets.RPC_URL }} for FORK_URL configuration 3. Fix .nodes directory handling - CI creates directory structure as needed - .nodes is gitignored so doesn't need .gitkeep 4. Replace log parsing with smart contract reads for counter test - Directly reads counter value from blockchain using eth_call - Uses function signature 0x8381f58a for number() method - More reliable than parsing logs The CI now properly reads the counter state directly from the smart contract and uses repository secrets for RPC URLs when available. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * chore: refactor nodes dir * fix: address PR review comments for docker-compose CI workflow - Simplified sed commands to just remove comment character instead of full replacement - Removed flaky log checks for signature aggregation and node participation - Extended wait time to 5 aggregation cycles (150 seconds) for more reliable testing - Added explanatory comment for Dockerfile workspace member dummy files 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * fix: resolve CI failures for docker-compose test workflow - Add creation of config/config.json from example in CI workflow - This file is required by the eigenlayer service for proper startup - Fixes "Is a directory (os error 21)" error in eigenlayer container 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * fix: address PR review comments for docker-compose CI workflow - Fix jq parsing error when checking container status - Docker compose ps --format json returns an object, not an array - Use ".State // empty" instead of ".[0].State" for proper parsing 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * fix: address PR review comments for docker-compose CI workflow - Simplified sed commands to use just variable names for uncommenting - Removed unnecessary directory creation (eigenlayer creates .nodes) - Removed router health check as it has no health endpoint - Reverted Dockerfile changes to test if simpler approach works 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * chore: remove old local-test.yml workflow - Removed deprecated local-test.yml that was cloning commonware-avs-node - This workflow is replaced by docker-compose-test.yml which uses Docker images - Simplifies CI by having a single integration test approach 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * chore: refactor nodes directory location to config/.nodes - Moved .nodes directory into config/ for better organization - Updated all docker-compose.yml volume mounts to use new location - Updated .gitignore for new config/.nodes path - Updated CI workflow to check for files in new location - Updated example.env to reference new path - Maintains same .gitkeep structure for directory preservation 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * feat: enhance docker-compose CI test with multiple aggregation scenarios - Added test for fast aggregation frequency (0.5 seconds) - Added test for ingress-enabled mode with explicit ingress requests - Track counter state across all test phases for verification - Provide comprehensive test summary showing increments at each stage - Tests verify: default aggregation, fast aggregation, and ingress modes 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * fix: correct ingress endpoint and port configuration - Added port 8080:8080 to router service in docker-compose for ingress HTTP server - Fixed CI test to use correct /trigger endpoint instead of /ingress - Updated payload format to match expected TaskRequest structure: {"body": {"metadata": {...}}} - The ingress server expects requests at http://localhost:8080/trigger 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * fix: address PR review comments for docker-compose CI workflow - Re-add dummy scripts directory creation in Dockerfile to fix build errors - The workspace includes scripts member which requires a Cargo.toml during pre-build - This was causing CI failures after commit f9658fe which reverted these changes 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * fix: ensure ingress mode is properly activated in CI tests - Use 'docker compose up -d --force-recreate' instead of 'restart' to reload environment variables - Docker restart doesn't reload .env changes, only container recreation does - Added verification to check if ingress mode is actually enabled - Increased wait time for ingress server startup The issue was that INGRESS=true wasn't being picked up after restart because environment variables are set at container creation, not on restart. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * chore: add extensive debug logging to CI ingress test - Added HTTP status code checking for ingress requests - Added raw RPC response logging to diagnose counter read failures - Added router status and log checks during ingress processing - Added proper error handling for invalid/empty RPC responses - Increased wait time after ingress requests to 15 seconds - Added grep for ingress-related log messages This will help diagnose why the counter value read is failing after ingress requests are sent successfully. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * fix: address PR review comments on CI workflow - Remove redundant "Verify services are running" step - Refactor fast aggregation test to wait for 1 minute total - Remove flaky ingress verification that relies on logs 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * chore: add debug logging to CI ingress test Add log check to verify ingress mode is properly activated after recreating the router container with INGRESS=true 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * feat: add extensive debug logging for ingress queue processing Add detailed logging to diagnose ingress timeout issues: - Log orchestrator aggregation cycle start/end - Log task queue push/pop operations with queue sizes - Log HTTP ingress request receipt and queuing - Log wait_for_task timeout progress - Better error handling with explicit error messages This will help identify where tasks are getting lost between the HTTP ingress endpoint and the orchestrator's task creator. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * fix: add missing error macro import in orchestrator Add tracing::error import to fix compilation error in generic.rs 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * style: fix formatting and clippy warnings - Apply cargo fmt to fix code formatting - Fix clippy::uninlined_format_args warning - Ensure all linting checks pass 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * fix: create new hasher for each orchestrator iteration The Sha256 hasher's finalize() method consumes the hasher, making it unusable for subsequent iterations. This caused the orchestrator to fail after the first aggregation cycle. Move hasher creation inside the loop so a fresh hasher is used for each iteration, fixing the counter increment functionality. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * style: fix formatting in orchestrator Remove extra blank line to satisfy cargo fmt 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * fix: connect HTTP ingress queue to task creator queue The ingress system had an architectural issue where the HTTP server and task creator were using completely separate, unconnected queues: - HTTP server was writing to its own Arc<Mutex<Vec<TaskRequest>>> - ListeningCounterCreator was polling from a different SimpleTaskQueue This caused ingress requests to be queued but never processed, leading to timeout errors in the orchestrator. Changes: - Modified HTTP server to accept and use Arc<SimpleTaskQueue> - Updated factories to share the same SimpleTaskQueue instance between HTTP server and ListeningCounterCreator - Removed unused get_queue() method from SimpleTaskQueue - Both components now use the TaskQueue trait methods (push/pop) This ensures tasks added via HTTP ingress are properly processed by the orchestrator's task creator. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * fix: preserve signature collections across aggregation cycles The orchestrator was resetting signature collections on each aggregation cycle, even when the round hadn't changed. This prevented reaching the signature threshold as nodes wouldn't re-sign for the same round. Now signature collections are preserved across cycles until the round changes, allowing the orchestrator to accumulate signatures properly. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * fix: increase timeout for ingress mode to prevent orchestrator panic The orchestrator was crashing when ingress mode was enabled because it timed out waiting for tasks before HTTP requests could arrive. Increased the timeout from 5 seconds to 60 seconds for ingress mode to give external requests time to arrive. This fixes the issue where the counter RPC returns empty (0x) after enabling ingress due to the router having crashed. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * fix: correct counter address variable usage in ingress test The ingress test was failing because it wasn't properly using the counter address from the environment. Fixed by: 1. Properly extracting the counter address from GitHub env context 2. Adding debug output to verify the address is being used correctly 3. Showing the exact RPC request JSON for debugging The test was previously using an empty address, causing eth_call to return 0x (error) instead of the actual counter value. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * fix: prevent Ethereum/EigenLayer recreation during router restart The ingress test was failing because `docker compose up -d router --force-recreate` was also recreating the router's dependencies (Ethereum and EigenLayer containers). This caused the entire blockchain state to be reset, losing all deployed contracts. Changed to stop, remove, and recreate only the router container without affecting its dependencies. This preserves the blockchain state and deployed contracts while still allowing the router to pick up new environment variables. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * fix: use --no-deps flag to prevent EigenLayer restart The previous fix still caused EigenLayer to restart because Docker Compose was starting stopped dependencies when creating the router. Using --no-deps flag prevents Docker Compose from starting any dependencies. This ensures EigenLayer (which exits after initial setup) doesn't get restarted and attempt to redeploy contracts, which would fail or cause inconsistent state. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * cleanup: remove debug output and make ingress timeout configurable Cleaned up the PR by removing debug artifacts added during troubleshooting: CI Workflow: - Removed debug echo statements for checking router status - Removed RPC request JSON debug output - Removed counter address verification logs - Simplified counter value reading after ingress Code Changes: - Reduced verbose queue operation logs (push/pop) - Changed HTTP request logging to debug level - Removed "reusing signatures" info log - Made ingress timeout configurable via INGRESS_TIMEOUT_MS env var (default 30s) - Removed unused imports This makes the PR production-ready while keeping essential monitoring logs. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * refactor: simplify setup by using Docker Compose exclusively Documentation Changes: - Removed manual setup instructions for running outside Docker - Focused README on Docker Compose workflow only - Removed Rust from prerequisites (not needed for running) - Updated ingress mode instructions for Docker environment - Added INGRESS_TIMEOUT_MS to documented environment variables Configuration Changes: - Docker Compose now mounts config.example.json directly (no copy needed) - Removed config.json from .gitignore (no longer generated) - Removed "Prepare for EigenLayer setup" CI step (config copy not needed) This simplifies the setup process and reduces potential configuration errors. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * refactor: address code review feedback - Remove redundant service health checks in CI workflow - Add conditional execution for advanced tests (only on dev branch push) - Remove debug logging from HTTP ingress handler - Revert orchestrator error handling to use simple unwrap() - Remove unused error import from orchestrator 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * ci: add staging branch to docker-image workflow pull request targets 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * refactor: improve INGRESS_TIMEOUT_MS parsing using idiomatic Rust Use .ok() and .and_then() chain for cleaner error handling 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * refactor: rename docker-compose-test.yml to integration-test.yml - Rename workflow file for better clarity - Update workflow display name to "Integration Test" 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * fix: restore eigenlayer dependency on ethereum container Ensure eigenlayer waits for ethereum to start before initialization 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * feat: ensure docker image build completes before integration tests on dev - Add wait-for-docker-build job that waits for Docker CI/CD workflow - Pull latest built image from registry on dev pushes instead of building locally - Build locally only for PRs and other branches - Ensures integration tests always run with the latest pushed image on dev 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * fix: address code review comments - Update README with proper environment configuration instructions - Remove debug/info tracing statements from main.rs - Remove debug import and logging from counter creator 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * chore: remove tracing Co-authored-by: Copilot <[email protected]> * chore: remove bad comment Co-authored-by: Copilot <[email protected]> * ci: add staging branch to integration test workflow triggers 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * refactor: simplify integration test workflow to use workflow_run trigger - Remove complex wait-for-docker-build job - Use workflow_run event to trigger after Docker CI/CD completes - Always pull image from registry (no more local builds) - Update conditionals to use workflow_run context 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * Revert "refactor: simplify integration test workflow to use workflow_run trigger" This reverts commit 9beb548. * fix: simplify integration test workflow - Remove complex wait-for-docker-build job - Always build router image locally for integration tests - This ensures tests run with the exact code that was pushed - Simpler and more reliable than trying to coordinate with Docker CI/CD workflow 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * feat: separate CI testing from user deployment - Users now pull pre-built images from ghcr.io registry - CI uses docker-compose.ci.yml override to build and test local code - This ensures PRs test the actual code changes while users get stable images - Updated README with instructions for both usage modes 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * ci: only build Docker images on merge, not on PRs - Remove pull_request trigger from docker-image.yml workflow - Remove PR-specific tagging logic - Images now only built when code is merged to main, dev, or staging 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * cleanup: remove debugging artifacts from codebase - Remove debug println! statements from main.rs - Remove commented-out debug logging configuration - Remove TODO comments (fixed hardcoded quorum_number, made endpoint URL configurable) - Remove RUST_BACKTRACE=1 from all docker-compose services - Remove unnecessary comments and warnings from counter creator - Clean up trigger_endpoint.sh to use environment variable 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * Revert "cleanup: remove debugging artifacts from codebase" This reverts commit 3cc9fc3. * cleanup: remove debugging artifacts introduced in this PR - Remove warn! statement and comment from counter creator - Remove RUST_BACKTRACE=1 from all docker-compose services 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * fix: restore warn import that is still needed The warn! macro on line 183 was not added in this PR, so we need to keep the import 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * revert: remove unnecessary changes to docker-image.yml Revert docker-image.yml to match dev branch as these changes were not needed for the Docker Compose refactoring 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * simplify: remove health checks from integration test workflow Health checks add complexity without much value since the tests themselves verify that services are working correctly 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * cleanup: remove comments added to creator.rs in this PR Remove "Waiting for task from queue" and "Task retrieved from queue" comments that were added during debugging 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * refactor: create docker-compose.ci.yml dynamically in CI - Remove docker-compose.ci.yml from repository - Create the override file dynamically in the CI workflow - Update README with simpler local development instructions - Keeps the repository cleaner with one less config file 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * refactor: some minor changes to get e2e local working and minor tweaks (#101) Co-authored-by: RonTuretzky <[email protected]> * chore: moving queue closer to where it actually sed --------- Co-authored-by: Claude <[email protected]> Co-authored-by: Copilot <[email protected]> Co-authored-by: bagelface.eth <[email protected]>
1 parent 3e4919e commit 3e9cc24

File tree

20 files changed

+619
-499
lines changed

20 files changed

+619
-499
lines changed

.github/workflows/integration-test.yml

Lines changed: 377 additions & 0 deletions
Large diffs are not rendered by default.

.github/workflows/local-test.yml

Lines changed: 0 additions & 386 deletions
This file was deleted.

0 commit comments

Comments
 (0)