
[PERFORMANCE]: Automatic performance testing and tracking for every build (hey) including SQLite and Postgres / Redis configurations #251

@crivetimihai

Description


🧭 Chore Summary

Implement a comprehensive performance testing and tracking system: make perf-test and make perf-report should automatically benchmark every build, detect regressions, and track performance metrics over time, with historical trend analysis and alerting.


🧱 Areas Affected

  • Performance testing infrastructure / Make targets (make perf-test, make perf-baseline, make perf-report, make perf-compare)
  • CI/CD pipeline integration (GitHub Actions performance jobs)
  • Results storage and historical tracking (database, JSON files, artifacts)
  • Performance visualization and reporting (charts, dashboards, trends)
  • Fast-time-server integration for reference benchmarking
  • Regression detection and alerting system
  • Load testing scenarios and test data management

⚙️ Context / Rationale

Automated performance testing prevents performance regressions and provides visibility into system performance trends over time. Every build should execute standardized performance tests, record detailed metrics, and compare results against historical baselines to catch performance degradation early. This creates a performance safety net and enables data-driven optimization decisions.

What is Automated Performance Testing?
A systematic approach to measuring application performance on every build, storing results over time, and detecting performance regressions through automated comparison with historical baselines.

Key Components:

  • Load Testing: HTTP request throughput, latency, concurrency handling
  • Benchmark Testing: Function-level performance measurement
  • Resource Monitoring: CPU, memory, database performance during tests
  • Trend Analysis: Historical comparison and regression detection
  • Alerting: Automatic notifications when performance degrades

Simple Performance Test Example:

# Basic load test with hey (POST with a JSON-RPC payload from file)
hey -n 1000 -c 50 -m POST -T application/json \
  -D payload.json -o csv http://localhost:4444/rpc > results.csv

# Benchmark test with hyperfine
hyperfine --export-json bench.json \
  'curl -s -X POST -H "Content-Type: application/json" -d @payload.json http://localhost:4444/rpc'

Advanced Performance Testing Setup:

# Performance test configuration
PERFORMANCE_TESTS = {
    "gateway_tools_list": {
        "method": "GET",
        "url": "/tools/",
        "requests": 1000,
        "concurrency": 50,
        "max_latency_p95": 100,  # ms
        "min_throughput": 500    # req/s
    },
    "fast_time_server": {
        "method": "POST", 
        "url": "/rpc",
        "payload": {"jsonrpc": "2.0", "method": "Time.Now", "id": 1},
        "requests": 5000,
        "concurrency": 100,
        "max_latency_p95": 50,
        "min_throughput": 1000
    },
    "gateway_health": {
        "method": "GET",
        "url": "/health",
        "requests": 2000,
        "concurrency": 25,
        "max_latency_p95": 25,
        "min_throughput": 800
    }
}
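A config entry like the ones above could be translated into a hey invocation along these lines (a sketch; `build_hey_command` is a hypothetical helper, and the flags shown are hey's standard `-n`/`-c`/`-m`/`-d`/`-T`/`-o` options):

```python
import json
import shlex


def build_hey_command(base_url: str, cfg: dict) -> list[str]:
    """Translate one PERFORMANCE_TESTS entry into a hey invocation."""
    cmd = [
        "hey",
        "-n", str(cfg["requests"]),
        "-c", str(cfg["concurrency"]),
        "-m", cfg.get("method", "GET"),
        "-o", "csv",
    ]
    if "payload" in cfg:
        # JSON-RPC body sent with the request
        cmd += ["-d", json.dumps(cfg["payload"]), "-T", "application/json"]
    cmd.append(base_url.rstrip("/") + cfg["url"])
    return cmd


cmd = build_hey_command(
    "http://localhost:4444",
    {"method": "GET", "url": "/health", "requests": 2000, "concurrency": 25},
)
print(shlex.join(cmd))
# hey -n 2000 -c 25 -m GET -o csv http://localhost:4444/health
```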

MCPGateway Specific Test Scenarios:

# Performance test matrix for mcpgateway
performance_scenarios:
  # Core API endpoints
  - name: "tools_crud_operations"
    description: "CRUD operations on tools API"
    tests:
      - endpoint: "POST /tools/"
        payload_file: "test_data/tool_create.json"
        requests: 500
        concurrency: 25
        
      - endpoint: "GET /tools/"
        requests: 1000
        concurrency: 50
        
      - endpoint: "GET /tools/{id}"
        requests: 800
        concurrency: 40

  # Fast-time-server integration tests  
  - name: "fast_time_server_integration"
    description: "Performance via fast-time-server proxy"
    setup_cmd: "cd mcp-servers/go/fast-time-server && make run-http &"
    tests:
      - endpoint: "POST /rpc"
        payload_file: "test_data/time_now_request.json"
        requests: 5000
        concurrency: 100
        target_p95_latency: 50
        target_throughput: 1000

  # WebSocket connection stress testing
  - name: "websocket_stress_test"  
    description: "WebSocket connection handling under load"
    tests:
      - concurrent_connections: 100
        messages_per_connection: 50
        connection_duration: 30
        target_max_latency: 100

  # Database performance
  - name: "database_operations"
    description: "Database read/write performance"
    tests:
      - operation: "bulk_tool_insert"
        records: 1000
        target_duration: 5000  # ms
        
      - operation: "complex_query_performance"
        query_file: "test_data/complex_queries.sql"
        target_duration: 200

📦 Related Make Targets

| Target | Purpose |
|--------|---------|
| **make perf-test** | Run full performance test suite and record results |
| **make perf-baseline** | Establish performance baseline for current build |
| **make perf-compare** | Compare current performance against previous baseline |
| **make perf-report** | Generate comprehensive performance report with trends |
| make perf-fast-time | Run fast-time-server specific performance tests |
| make perf-regression | Check for performance regressions (fail if detected) |
| make perf-clean | Clean performance test artifacts and temporary data |
| make perf-serve | Start local performance dashboard for result visualization |
| make perf-export | Export performance data for external analysis |
| make perf-alert | Send performance alerts based on thresholds |
Bold targets are mandatory; CI must fail if performance regressions exceed configured thresholds.


📋 Acceptance Criteria

  • make perf-test executes comprehensive performance test suite measuring latency, throughput, and resource usage.
  • make perf-baseline establishes and stores performance baselines for regression detection.
  • make perf-compare automatically detects performance regressions >10% from baseline.
  • make perf-report generates visual reports with historical trends and performance metrics.
  • Performance tests run automatically on every CI build and store results.
  • Fast-time-server is integrated as performance reference benchmark.
  • Performance dashboard displays real-time trends and historical data.
  • Alerts trigger when performance degrades beyond acceptable thresholds.
  • Test results are stored with git commit metadata for traceability.
  • Performance data exports to common formats (CSV, JSON, InfluxDB).
  • Changelog entry under "Performance" or "Testing".
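As a sketch of the export criterion, a scenario result could be serialized to InfluxDB line protocol like this (the `perf_test` measurement name, tag keys, and field names are illustrative, not an existing schema):

```python
def to_influx_line(scenario: dict, commit: str, ts_ns: int) -> str:
    """Serialize one scenario result to InfluxDB line protocol."""
    # measurement,tag_set field_set timestamp
    tags = f"perf_test,scenario={scenario['name']},commit={commit}"
    fields = ",".join(
        f"{key}={scenario[key]}"
        for key in ("requests_per_second", "latency_p95", "error_rate")
        if key in scenario
    )
    return f"{tags} {fields} {ts_ns}"


line = to_influx_line(
    {"name": "gateway_health", "requests_per_second": 812.4, "latency_p95": 18.2},
    commit="abc1234",
    ts_ns=1700000000000000000,
)
print(line)
# perf_test,scenario=gateway_health,commit=abc1234 requests_per_second=812.4,latency_p95=18.2 1700000000000000000
```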

🛠️ Task List (suggested flow)

  1. Performance testing infrastructure

    mkdir -p performance/{tests,data,results,baselines,reports}
    
    # Create performance test configuration
    cat > performance/config.yaml << 'EOF'
    test_suites:
      - name: "mcpgateway_core"
        base_url: "http://localhost:4444"
        scenarios:
          - name: "tools_api"
            endpoint: "/tools/"
            method: "GET"
            requests: 1000
            concurrency: 50
            thresholds:
              p95_latency_ms: 100
              throughput_rps: 500
              
      - name: "fast_time_server"
        base_url: "http://localhost:8080"
        scenarios:
          - name: "time_now"
            endpoint: "/http"
            method: "POST"
            payload: '{"jsonrpc":"2.0","method":"Time.Now","id":1}'
            requests: 5000
            concurrency: 100
            thresholds:
              p95_latency_ms: 50
              throughput_rps: 1000
    EOF
  2. Makefile integration

    # Performance testing targets
    .PHONY: perf-test perf-baseline perf-compare perf-report perf-clean
    
    PERF_DIR := performance
    PERF_RESULTS := $(PERF_DIR)/results
    PERF_BASELINE := $(PERF_DIR)/baselines
    PERF_REPORTS := $(PERF_DIR)/reports
    BUILD_ID := $(shell git rev-parse --short HEAD)
    TIMESTAMP := $(shell date -u +%Y%m%d_%H%M%S)
    
    perf-test: perf-setup
    	@echo "🏃 Running performance test suite..."
    	@mkdir -p $(PERF_RESULTS)/$(BUILD_ID)
    	@python performance/run_tests.py \
    		--config performance/config.yaml \
    		--output $(PERF_RESULTS)/$(BUILD_ID) \
    		--build-id $(BUILD_ID) \
    		--timestamp $(TIMESTAMP)
    
    perf-baseline:
    	@echo "📊 Establishing performance baseline..."
    	@cp $(PERF_RESULTS)/$(BUILD_ID)/* $(PERF_BASELINE)/
    	@echo "$(BUILD_ID)" > $(PERF_BASELINE)/baseline_commit.txt
    
    perf-compare:
    	@echo "🔍 Comparing against baseline..."
    	@python performance/compare.py \
    		--current $(PERF_RESULTS)/$(BUILD_ID) \
    		--baseline $(PERF_BASELINE) \
    		--threshold 10 \
    		--fail-on-regression
    
    perf-report:
    	@echo "📈 Generating performance report..."
    	@python performance/generate_report.py \
    		--results-dir $(PERF_RESULTS) \
    		--output $(PERF_REPORTS)/report_$(TIMESTAMP).html \
    		--include-trends
    
    perf-fast-time:
    	@echo "⚡ Testing fast-time-server performance..."
    	@cd mcp-servers/go/fast-time-server && make run-http &
    	@sleep 2
    	@hey -n 5000 -c 100 -m POST -T application/json -o csv \
    		-D performance/data/time_request.json \
    		http://localhost:8080/http > $(PERF_RESULTS)/fast_time_$(TIMESTAMP).csv
    	@pkill fast-time-server || true
    
    perf-setup:
    	@echo "🔧 Setting up performance test environment..."
    	@docker-compose -f docker-compose.test.yml up -d postgres redis
    	@sleep 5
    	@python -m mcpgateway.main &
    	@sleep 3
    	@cd mcp-servers/go/fast-time-server && make build && ./fast-time-server -transport http &
    	@sleep 2
    
    perf-clean:
    	@echo "🧹 Cleaning performance test artifacts..."
    	@docker-compose -f docker-compose.test.yml down
    	@pkill -f "mcpgateway.main" || true
    	@pkill fast-time-server || true
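The targets table lists a perf-regression target that the Makefile block above does not define; a minimal sketch, reusing compare.py's --fail-on-regression flag, might be:

```make
# Fail the build when regressions exceed the configured threshold
perf-regression: perf-test
	@echo "🚨 Checking for performance regressions..."
	@python performance/compare.py \
		--current $(PERF_RESULTS)/$(BUILD_ID) \
		--baseline $(PERF_BASELINE) \
		--threshold 10 \
		--fail-on-regression
```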
  3. Performance test runner

    #!/usr/bin/env python3
    # performance/run_tests.py
    """
    Comprehensive performance test runner for mcpgateway.
    
    Features:
    - Multiple test scenarios with configurable thresholds
    - Resource monitoring during tests  
    - Detailed metrics collection and storage
    - Integration with fast-time-server
    """
    
    import json
    import time
    import subprocess
    import psutil
    import argparse
    from pathlib import Path
    from dataclasses import dataclass
    from typing import Dict, List, Optional
    
    @dataclass
    class PerformanceResult:
        """Performance test result."""
        scenario: str
        requests: int
        concurrency: int
        total_time: float
        requests_per_second: float
        latency_p50: float
        latency_p95: float
        latency_p99: float
        error_rate: float
        cpu_usage: float
        memory_usage: float
        
    class PerformanceTestRunner:
        """Runs performance tests and collects detailed metrics."""
        
        def __init__(self, config_path: str, output_dir: str):
            self.config = self._load_config(config_path)
            self.output_dir = Path(output_dir)
            self.output_dir.mkdir(parents=True, exist_ok=True)
            
        def run_all_tests(self) -> List[PerformanceResult]:
            """Run all configured performance tests."""
            results = []
            
            for suite in self.config['test_suites']:
                for scenario in suite['scenarios']:
                    result = self._run_scenario(suite, scenario)
                    results.append(result)
                    
            # Save results
            self._save_results(results)
            return results
            
        def _run_scenario(self, suite: dict, scenario: dict) -> PerformanceResult:
            """Run a single performance test scenario."""
            print(f"🏃 Running {scenario['name']}...")
            
            # Start resource monitoring
            monitor = ResourceMonitor()
            monitor.start()
            
            # Build hey command
            cmd = self._build_hey_command(suite, scenario)
            
            # Run test
            start_time = time.time()
            result = subprocess.run(cmd, capture_output=True, text=True)
            end_time = time.time()
            
            # Stop monitoring
            cpu_usage, memory_usage = monitor.stop()
            
            # Parse hey output
            metrics = self._parse_hey_output(result.stdout)
            
            return PerformanceResult(
                scenario=scenario['name'],
                requests=scenario['requests'],
                concurrency=scenario['concurrency'],
                total_time=end_time - start_time,
                requests_per_second=metrics['rps'],
                latency_p50=metrics['p50'],
                latency_p95=metrics['p95'],
                latency_p99=metrics['p99'],
                error_rate=metrics['error_rate'],
                cpu_usage=cpu_usage,
                memory_usage=memory_usage
            )
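The runner above uses ResourceMonitor and a hey-output parser without defining them. A minimal sketch of both, shown here as module-level code (hook the parser into the class as `_parse_hey_output`), assuming hey's default text summary is parsed with regexes — the exact output fields matched here are an assumption about hey's summary format:

```python
import re
import threading
import time


class ResourceMonitor:
    """Sample system CPU/memory on a background thread during a test run."""

    def __init__(self, interval: float = 0.5):
        self.interval = interval
        self._samples: list[tuple[float, float]] = []
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._sample, daemon=True)

    def _sample(self):
        import psutil  # third-party; already a dependency of run_tests.py
        while not self._stop.is_set():
            self._samples.append(
                (psutil.cpu_percent(), psutil.virtual_memory().percent)
            )
            time.sleep(self.interval)

    def start(self):
        self._thread.start()

    def stop(self) -> tuple[float, float]:
        """Return average (cpu %, memory %) observed while running."""
        self._stop.set()
        self._thread.join(timeout=5)
        if not self._samples:
            return 0.0, 0.0
        cpu = sum(c for c, _ in self._samples) / len(self._samples)
        mem = sum(m for _, m in self._samples) / len(self._samples)
        return cpu, mem


def parse_hey_output(stdout: str) -> dict:
    """Pull throughput and latency percentiles out of hey's text summary."""
    def grab(pattern: str) -> float:
        match = re.search(pattern, stdout)
        return float(match.group(1)) if match else 0.0

    return {
        "rps": grab(r"Requests/sec:\s+([\d.]+)"),
        # hey reports latencies in seconds; convert to ms
        "p50": grab(r"50%\s+in\s+([\d.]+)") * 1000,
        "p95": grab(r"95%\s+in\s+([\d.]+)") * 1000,
        "p99": grab(r"99%\s+in\s+([\d.]+)") * 1000,
        "error_rate": 0.0,  # derive from hey's status-code section if needed
    }
```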
  4. Fast-time-server integration

    # Add fast-time-server as performance reference
    cat > performance/data/time_request.json << 'EOF'
    {"jsonrpc": "2.0", "method": "Time.Now", "id": 1}
    EOF
    
    # Create fast-time-server performance test
    cat > performance/tests/test_fast_time_server.py << 'EOF'
    import pytest
    import requests
    import time
    import statistics
    from concurrent.futures import ThreadPoolExecutor
    
    class TestFastTimeServerPerformance:
        """Performance tests for fast-time-server integration."""
        
        def test_time_now_latency(self):
            """Test Time.Now endpoint latency."""
            url = "http://localhost:8080/http"
            payload = {"jsonrpc": "2.0", "method": "Time.Now", "id": 1}
            
            latencies = []
            for _ in range(100):
                start = time.perf_counter()
                response = requests.post(url, json=payload)
                end = time.perf_counter()
                
                assert response.status_code == 200
                latencies.append((end - start) * 1000)  # ms
                
            p95_latency = statistics.quantiles(latencies, n=20)[18]  # 95th percentile
            assert p95_latency < 50, f"P95 latency {p95_latency:.1f}ms exceeds 50ms threshold"
            
        def test_concurrent_requests(self):
            """Test concurrent request handling."""
            url = "http://localhost:8080/http" 
            payload = {"jsonrpc": "2.0", "method": "Time.Now", "id": 1}
            
            def make_request():
                start = time.perf_counter()
                response = requests.post(url, json=payload)
                end = time.perf_counter()
                return response.status_code == 200, (end - start) * 1000
                
            with ThreadPoolExecutor(max_workers=50) as executor:
                start_time = time.perf_counter()
                futures = [executor.submit(make_request) for _ in range(1000)]
                results = [f.result() for f in futures]
                end_time = time.perf_counter()
                
            success_count = sum(1 for success, _ in results if success)
            latencies = [latency for success, latency in results if success]
            
            throughput = success_count / (end_time - start_time)
            p95_latency = statistics.quantiles(latencies, n=20)[18]
            
            assert success_count >= 995, f"Only {success_count}/1000 requests succeeded"
            assert throughput >= 1000, f"Throughput {throughput:.1f} req/s below 1000 req/s"
            assert p95_latency < 100, f"P95 latency {p95_latency:.1f}ms exceeds 100ms"
    EOF
  5. CI integration

    # Add to existing GitHub Actions workflow
    performance:
      name: 🏃 Performance Testing
      runs-on: ubuntu-latest
      needs: [test]  # Run after tests pass
      if: github.ref == 'refs/heads/main' || github.event_name == 'pull_request'
      
      services:
        postgres:
          image: postgres:15
          env:
            POSTGRES_PASSWORD: testpass
            POSTGRES_DB: testdb
          options: >-
            --health-cmd pg_isready
            --health-interval 10s
            --health-timeout 5s
            --health-retries 5
        
        redis:
          image: redis:7
          options: >-
            --health-cmd "redis-cli ping"
            --health-interval 10s
            --health-timeout 5s
            --health-retries 5
    
      steps:
        - name: ⬇️  Checkout source
          uses: actions/checkout@v4
          with:
            fetch-depth: 0  # Full history for trend analysis
            
        - name: 🐍  Set up Python
          uses: actions/setup-python@v5
          with:
            python-version: "3.12"
            cache: pip
            
    - name: 🔧  Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -e .[dev]
        # hey is a Go load generator (github.com/rakyll/hey), not a pip package
        go install github.com/rakyll/hey@latest
            
        - name: 🏗️  Build fast-time-server
          run: |
            cd mcp-servers/go/fast-time-server
            make build
            
        - name: 🚀  Start services
          run: |
            make perf-setup
            
        - name: 🏃  Run performance tests
          run: |
            make perf-test
            
        - name: 📊  Compare against baseline
          run: |
            make perf-compare || echo "::warning::Performance regression detected"
            
        - name: 📈  Generate performance report
          run: |
            make perf-report
            
        - name: 📎  Upload performance artifacts
          uses: actions/upload-artifact@v4
          with:
            name: performance-results-${{ github.sha }}
            path: |
              performance/results/
              performance/reports/
            retention-days: 30
            
        - name: 💬  Comment PR with results
          if: github.event_name == 'pull_request'
          uses: actions/github-script@v7
          with:
            script: |
              const fs = require('fs');
              const reportPath = 'performance/reports/summary.md';
              if (fs.existsSync(reportPath)) {
                const report = fs.readFileSync(reportPath, 'utf8');
                github.rest.issues.createComment({
                  issue_number: context.issue.number,
                  owner: context.repo.owner,
                  repo: context.repo.repo,
                  body: `## 🏃 Performance Test Results\n\n${report}`
                });
              }
  6. Performance regression detection

    #!/usr/bin/env python3
    # performance/compare.py
    """
    Performance regression detection and alerting.
    """
    
    import json
    import sys
    import argparse
    from pathlib import Path
    from typing import Dict, Any, List
    
    class PerformanceComparator:
        """Compare performance results and detect regressions."""
        
        def __init__(self, threshold_percent: float = 10.0):
            self.threshold = threshold_percent
            
        def compare_results(self, current: Dict[str, Any], baseline: Dict[str, Any]) -> Dict[str, Any]:
            """Compare current results against baseline."""
            comparison = {
                'regressions': [],
                'improvements': [],
                'summary': {}
            }
            
            for scenario in current.get('scenarios', []):
                scenario_name = scenario['name']
                baseline_scenario = self._find_baseline_scenario(baseline, scenario_name)
                
                if not baseline_scenario:
                    continue
                    
                # Compare key metrics
                metrics = ['requests_per_second', 'latency_p95', 'error_rate']
                for metric in metrics:
                    regression = self._check_regression(
                        scenario.get(metric, 0),
                        baseline_scenario.get(metric, 0),
                        metric
                    )
                    
                    if regression:
                        comparison['regressions'].append({
                            'scenario': scenario_name,
                            'metric': metric,
                            'current': scenario.get(metric, 0),
                            'baseline': baseline_scenario.get(metric, 0),
                            'change_percent': regression
                        })
                        
            return comparison
            
        def _check_regression(self, current: float, baseline: float, metric: str) -> float:
            """Check if metric shows regression beyond threshold."""
            if baseline == 0:
                return 0
                
            change_percent = ((current - baseline) / baseline) * 100
            
            # For latency and error_rate, higher is worse
            if metric in ['latency_p95', 'error_rate']:
                return change_percent if change_percent > self.threshold else 0
            # For throughput, lower is worse  
            elif metric == 'requests_per_second':
                return -change_percent if -change_percent > self.threshold else 0
                
            return 0
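compare.py is invoked from the Makefile with --current, --baseline, --threshold, and --fail-on-regression. A self-contained sketch of that entry point follows; it assumes each results directory holds a results.json with a top-level "scenarios" list (the file name and layout are assumptions about the storage format), and inlines the regression check so the example runs on its own:

```python
import argparse
import json
from pathlib import Path


def find_regressions(current: dict, baseline: dict, threshold: float) -> list[dict]:
    """Flag metrics that moved in the 'worse' direction by more than threshold %."""
    higher_is_worse = {"latency_p95", "error_rate"}
    regressions = []
    base_by_name = {s["name"]: s for s in baseline.get("scenarios", [])}

    for scenario in current.get("scenarios", []):
        base = base_by_name.get(scenario["name"])
        if base is None:
            continue
        for metric in ("requests_per_second", "latency_p95", "error_rate"):
            cur, ref = scenario.get(metric, 0), base.get(metric, 0)
            if ref == 0:
                continue
            change = (cur - ref) / ref * 100
            worse = change if metric in higher_is_worse else -change
            if worse > threshold:
                regressions.append(
                    {"scenario": scenario["name"], "metric": metric,
                     "change_percent": change}
                )
    return regressions


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Detect performance regressions")
    parser.add_argument("--current", required=True, type=Path)
    parser.add_argument("--baseline", required=True, type=Path)
    parser.add_argument("--threshold", type=float, default=10.0)
    parser.add_argument("--fail-on-regression", action="store_true")
    args = parser.parse_args(argv)

    current = json.loads((args.current / "results.json").read_text())
    baseline = json.loads((args.baseline / "results.json").read_text())
    regressions = find_regressions(current, baseline, args.threshold)

    for reg in regressions:
        print(f"REGRESSION {reg['scenario']}.{reg['metric']}: "
              f"{reg['change_percent']:+.1f}%")
    return 1 if regressions and args.fail_on_regression else 0
```

Wire it up with `sys.exit(main())` under an `if __name__ == "__main__":` guard so the non-zero exit code fails the Make target.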
  7. Performance dashboard and reporting

    #!/usr/bin/env python3
    # performance/generate_report.py
    """
    Generate comprehensive performance reports with trend analysis.
    """
    
    import json
    import plotly.graph_objects as go
    import plotly.express as px
    from plotly.subplots import make_subplots
    import pandas as pd
    from pathlib import Path
    
    class PerformanceReporter:
        """Generate visual performance reports and trends."""
        
        def generate_html_report(self, results_dir: Path, output_file: Path):
            """Generate comprehensive HTML performance report."""
            
            # Load all historical results
            all_results = self._load_historical_results(results_dir)
            
            # Create visualizations
            fig = make_subplots(
                rows=2, cols=2,
                subplot_titles=[
                    'Throughput Trends',
                    'Latency Trends (P95)',
                    'Error Rate Trends', 
                    'Resource Usage'
                ]
            )
            
            # Throughput chart
            fig.add_trace(
                go.Scatter(
                    x=all_results['timestamps'],
                    y=all_results['throughput'],
                    mode='lines+markers',
                    name='Requests/sec',
                    line=dict(color='blue')
                ),
                row=1, col=1
            )
            
            # Latency chart
            fig.add_trace(
                go.Scatter(
                    x=all_results['timestamps'],
                    y=all_results['latency_p95'],
                    mode='lines+markers', 
                    name='P95 Latency (ms)',
                    line=dict(color='red')
                ),
                row=1, col=2
            )
            
            # Generate HTML report
            html_template = self._get_html_template()
            html_content = html_template.format(
                charts=fig.to_html(include_plotlyjs='cdn'),
                summary_table=self._generate_summary_table(all_results),
                timestamp=pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S')
            )
            
            output_file.write_text(html_content)
  8. Alert system

    #!/usr/bin/env python3
    # performance/alert.py - Performance alert system
    """
    Send alerts when performance regressions are detected.
    """
    
    import json
    import smtplib
    import argparse
    from email.mime.text import MIMEText
    from typing import Any, Dict, List
    
    class PerformanceAlerts:
        """Send performance regression alerts."""
        
        def __init__(self, config: Dict[str, Any]):
            self.config = config
            
        def check_and_alert(self, comparison_results: Dict[str, Any]):
            """Check for regressions and send alerts if needed."""
            
            regressions = comparison_results.get('regressions', [])
            if not regressions:
                return
                
            # Send email alert
            self._send_email_alert(regressions)
            
            # Send Slack alert if configured
            if self.config.get('slack_webhook'):
                self._send_slack_alert(regressions)
                
        def _send_email_alert(self, regressions: List[Dict[str, Any]]):
            """Send email alert for performance regressions."""
            subject = f"🚨 Performance Regression Detected - {len(regressions)} issues"
            
            body_lines = [
                "Performance regression detected in the following scenarios:",
                "",
            ]
            
            for regression in regressions:
                body_lines.append(
                    f"• {regression['scenario']}: {regression['metric']} "
                    f"changed by {regression['change_percent']:.1f}%"
                )
                
            body = "\n".join(body_lines)
            
            # Send email (implementation depends on your email setup)
            print(f"ALERT: {subject}")
            print(body)
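`_send_slack_alert` is referenced above but not shown. A stdlib-only sketch posting to a Slack incoming webhook (the payload uses Slack's simple `{"text": ...}` webhook format; the message wording is illustrative):

```python
import json
import urllib.request


def format_slack_message(regressions: list[dict]) -> dict:
    """Build the webhook payload summarizing detected regressions."""
    lines = [f"🚨 {len(regressions)} performance regression(s) detected:"]
    for reg in regressions:
        lines.append(
            f"• {reg['scenario']}: {reg['metric']} changed by "
            f"{reg['change_percent']:+.1f}%"
        )
    return {"text": "\n".join(lines)}


def send_slack_alert(webhook_url: str, regressions: list[dict]) -> None:
    """POST the regression summary to a Slack incoming webhook."""
    body = json.dumps(format_slack_message(regressions)).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request, timeout=10)
```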
  9. Documentation integration

    Add performance testing section to your docs:

    # Performance Testing

    ## Running Performance Tests

    ```bash
    # Run full performance test suite
    make perf-test

    # Establish new baseline
    make perf-baseline

    # Compare against baseline
    make perf-compare

    # Generate performance report
    make perf-report
    ```

    ## Fast-Time-Server Integration

    The fast-time-server provides a reference benchmark for testing gateway performance:

    ```bash
    # Test fast-time-server performance
    make perf-fast-time
    ```

    ## Performance Thresholds

    | Metric | Threshold |
    |--------|-----------|
    | P95 Latency | < 100 ms |
    | Throughput | > 500 req/s |
    | Error Rate | < 1% |
    
  10. Final validation

    # Test the complete performance testing pipeline
    make perf-setup
    make perf-test
    make perf-baseline
    make perf-compare
    make perf-report
    make perf-clean
    
    # Verify CI integration
    git add performance/
    git commit -m "Add performance testing infrastructure"
    git push  # Triggers CI with performance tests

🧩 Additional Notes

  • Fast-time-server integration: Use your existing fast-time-server as a performance reference point and baseline for gateway overhead measurement.
  • Historical tracking: Store performance results with git commit metadata for precise regression attribution.
  • Resource monitoring: Track CPU, memory, and database performance during tests to identify bottlenecks.
  • Threshold tuning: Start with conservative thresholds and adjust based on actual performance characteristics.
  • Test environment consistency: Use identical test environments (container specs, database config) for reliable comparisons.
  • Regression alerts: Configure alerts for immediate notification when performance degrades beyond acceptable limits.
  • Performance budgets: Set performance budgets for different API endpoints based on usage patterns.

Performance Testing Best Practices:

  • Run tests in isolated environments to avoid interference
  • Use realistic test data and scenarios that match production usage
  • Measure both cold-start and warm-up performance
  • Include database and external service performance in tests
  • Track trends over time, not just point-in-time measurements
  • Automate performance testing as part of CI/CD pipeline
  • Set up alerts for both regressions and improvements
