🧭 Chore Summary
Implement a comprehensive performance testing and tracking system: `make perf-test` and `make perf-report` automatically benchmark every build, detect regressions, and track performance metrics over time, with historical trend analysis and alerting.
🧱 Areas Affected
- Makefile (new performance targets: `make perf-test`, `make perf-baseline`, `make perf-report`, `make perf-compare`)
- `performance/` directory (test configuration, runners, baselines, reports)
- CI workflow (new performance job)
⚙️ Context / Rationale
Automated performance testing prevents performance regressions and provides visibility into system performance trends over time. Every build should execute standardized performance tests, record detailed metrics, and compare results against historical baselines to catch performance degradation early. This creates a performance safety net and enables data-driven optimization decisions.
What is Automated Performance Testing?
A systematic approach to measuring application performance on every build, storing results over time, and detecting performance regressions through automated comparison with historical baselines.
Key Components:
- Load Testing: HTTP request throughput, latency, concurrency handling
- Benchmark Testing: Function-level performance measurement
- Resource Monitoring: CPU, memory, database performance during tests
- Trend Analysis: Historical comparison and regression detection
- Alerting: Automatic notifications when performance degrades
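The regression-detection piece of trend analysis reduces to comparing each new measurement against a rolling baseline. A minimal sketch (the `detect_regression` name and 10% default are illustrative, matching the thresholds used later in this issue):

```python
import statistics

def detect_regression(history: list[float], current: float,
                      threshold_pct: float = 10.0, window: int = 5) -> bool:
    """Flag `current` as a regression when it is more than `threshold_pct`
    percent worse than the rolling mean of the last `window` measurements."""
    if not history:
        return False  # nothing to compare against yet
    baseline = statistics.fmean(history[-window:])
    return current > baseline * (1 + threshold_pct / 100)

# Example: p95 latency history in ms (baseline mean is 49.6 ms)
history = [48.0, 50.0, 49.0, 51.0, 50.0]
print(detect_regression(history, 52.0))  # → False (within 10% of baseline)
print(detect_regression(history, 60.0))  # → True  (more than 10% worse)
```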
Simple Performance Test Example:

```bash
# Basic load test with hey
hey -n 1000 -c 50 -o csv http://localhost:4444/rpc > results.csv

# Benchmark test with hyperfine
hyperfine --export-json bench.json \
  'curl -X POST http://localhost:4444/rpc -d @payload.json'
```
Advanced Performance Testing Setup:

```python
# Performance test configuration
PERFORMANCE_TESTS = {
    "gateway_tools_list": {
        "method": "GET",
        "url": "/tools/",
        "requests": 1000,
        "concurrency": 50,
        "max_latency_p95": 100,  # ms
        "min_throughput": 500,   # req/s
    },
    "fast_time_server": {
        "method": "POST",
        "url": "/rpc",
        "payload": {"jsonrpc": "2.0", "method": "Time.Now", "id": 1},
        "requests": 5000,
        "concurrency": 100,
        "max_latency_p95": 50,
        "min_throughput": 1000,
    },
    "gateway_health": {
        "method": "GET",
        "url": "/health",
        "requests": 2000,
        "concurrency": 25,
        "max_latency_p95": 25,
        "min_throughput": 800,
    },
}
```
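A helper along these lines could enforce the `max_latency_p95`/`min_throughput` fields above against measured results (`check_thresholds` is an illustrative name, not existing tooling):

```python
def check_thresholds(test_name: str, config: dict, measured: dict) -> list[str]:
    """Return human-readable threshold violations for one test, empty if it passed."""
    spec = config[test_name]
    failures = []
    if measured["latency_p95"] > spec["max_latency_p95"]:
        failures.append(
            f"{test_name}: p95 latency {measured['latency_p95']}ms "
            f"exceeds {spec['max_latency_p95']}ms"
        )
    if measured["throughput"] < spec["min_throughput"]:
        failures.append(
            f"{test_name}: throughput {measured['throughput']} req/s "
            f"below {spec['min_throughput']} req/s"
        )
    return failures

config = {"gateway_health": {"max_latency_p95": 25, "min_throughput": 800}}
# One violation: p95 latency over budget, throughput fine
print(check_thresholds("gateway_health", config,
                       {"latency_p95": 30, "throughput": 900}))
```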
MCPGateway-Specific Test Scenarios:

```yaml
# Performance test matrix for mcpgateway
performance_scenarios:
  # Core API endpoints
  - name: "tools_crud_operations"
    description: "CRUD operations on tools API"
    tests:
      - endpoint: "POST /tools/"
        payload_file: "test_data/tool_create.json"
        requests: 500
        concurrency: 25
      - endpoint: "GET /tools/"
        requests: 1000
        concurrency: 50
      - endpoint: "GET /tools/{id}"
        requests: 800
        concurrency: 40

  # Fast-time-server integration tests
  - name: "fast_time_server_integration"
    description: "Performance via fast-time-server proxy"
    setup_cmd: "cd mcp-servers/go/fast-time-server && make run-http &"
    tests:
      - endpoint: "POST /rpc"
        payload_file: "test_data/time_now_request.json"
        requests: 5000
        concurrency: 100
        target_p95_latency: 50
        target_throughput: 1000

  # WebSocket connection stress testing
  - name: "websocket_stress_test"
    description: "WebSocket connection handling under load"
    tests:
      - concurrent_connections: 100
        messages_per_connection: 50
        connection_duration: 30
        target_max_latency: 100

  # Database performance
  - name: "database_operations"
    description: "Database read/write performance"
    tests:
      - operation: "bulk_tool_insert"
        records: 1000
        target_duration: 5000  # ms
      - operation: "complex_query_performance"
        query_file: "test_data/complex_queries.sql"
        target_duration: 200
```
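Each HTTP scenario entry can be translated mechanically into a hey invocation. A sketch, assuming hey's `-n`/`-c`/`-m`/`-D`/`-T` flags (`build_hey_command` is an illustrative name):

```python
import shlex

def build_hey_command(base_url: str, scenario: dict) -> list[str]:
    """Translate one scenario entry into a hey argument vector."""
    method, path = scenario["endpoint"].split(" ", 1)
    cmd = [
        "hey",
        "-n", str(scenario["requests"]),
        "-c", str(scenario["concurrency"]),
        "-m", method,
    ]
    if "payload_file" in scenario:
        # hey reads the request body from a file via -D
        cmd += ["-D", scenario["payload_file"], "-T", "application/json"]
    cmd.append(base_url + path)
    return cmd

scenario = {"endpoint": "POST /rpc",
            "payload_file": "test_data/time_now_request.json",
            "requests": 5000, "concurrency": 100}
print(shlex.join(build_hey_command("http://localhost:8080", scenario)))
```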
📦 Related Make Targets

| Target | Purpose |
|--------|---------|
| **`make perf-test`** | Run full performance test suite and record results |
| `make perf-baseline` | Establish performance baseline for current build |
| **`make perf-compare`** | Compare current performance against previous baseline |
| `make perf-report` | Generate comprehensive performance report with trends |
| `make perf-fast-time` | Run fast-time-server specific performance tests |
| **`make perf-regression`** | Check for performance regressions (fail if detected) |
| `make perf-clean` | Clean performance test artifacts and temporary data |
| `make perf-serve` | Start local performance dashboard for result visualization |
| `make perf-export` | Export performance data for external analysis |
| `make perf-alert` | Send performance alerts based on thresholds |
Bold targets are mandatory; CI must fail if performance regressions exceed configured thresholds.
📋 Acceptance Criteria
- `make perf-test` executes the comprehensive performance test suite, measuring latency, throughput, and resource usage.
- `make perf-baseline` establishes and stores performance baselines for regression detection.
- `make perf-compare` automatically detects performance regressions >10% from baseline.
- `make perf-report` generates visual reports with historical trends and performance metrics.
🛠️ Task List (suggested flow)
- **Performance testing infrastructure**

```bash
mkdir -p performance/{tests,data,results,baselines,reports}

# Create performance test configuration
cat > performance/config.yaml << 'EOF'
test_suites:
  - name: "mcpgateway_core"
    base_url: "http://localhost:4444"
    scenarios:
      - name: "tools_api"
        endpoint: "/tools/"
        method: "GET"
        requests: 1000
        concurrency: 50
        thresholds:
          p95_latency_ms: 100
          throughput_rps: 500
  - name: "fast_time_server"
    base_url: "http://localhost:8080"
    scenarios:
      - name: "time_now"
        endpoint: "/http"
        method: "POST"
        payload: '{"jsonrpc":"2.0","method":"Time.Now","id":1}'
        requests: 5000
        concurrency: 100
        thresholds:
          p95_latency_ms: 50
          throughput_rps: 1000
EOF
```
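Once config.yaml is in place, a small validator can catch malformed suites before any load is generated. A sketch over the already-parsed structure (the `validate_config` name is illustrative; loading the file itself would use e.g. `yaml.safe_load`):

```python
def validate_config(config: dict) -> list[str]:
    """Return a list of problems found in a parsed config.yaml structure."""
    problems = []
    for suite in config.get("test_suites", []):
        if "base_url" not in suite:
            problems.append(f"suite {suite.get('name', '?')}: missing base_url")
        for scenario in suite.get("scenarios", []):
            for key in ("endpoint", "requests", "concurrency", "thresholds"):
                if key not in scenario:
                    problems.append(
                        f"scenario {scenario.get('name', '?')}: missing {key}"
                    )
    return problems

# A well-formed suite produces no problems
config = {"test_suites": [{"name": "mcpgateway_core",
                           "base_url": "http://localhost:4444",
                           "scenarios": [{"name": "tools_api", "endpoint": "/tools/",
                                          "requests": 1000, "concurrency": 50,
                                          "thresholds": {"p95_latency_ms": 100}}]}]}
print(validate_config(config))  # → []
```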
- **Makefile integration**

```make
# Performance testing targets (recipe lines must be indented with tabs)
.PHONY: perf-test perf-baseline perf-compare perf-report perf-fast-time perf-setup perf-clean

PERF_DIR      := performance
PERF_RESULTS  := $(PERF_DIR)/results
PERF_BASELINE := $(PERF_DIR)/baselines
PERF_REPORTS  := $(PERF_DIR)/reports
BUILD_ID      := $(shell git rev-parse --short HEAD)
TIMESTAMP     := $(shell date -u +%Y%m%d_%H%M%S)

perf-test: perf-setup
	@echo "🏃 Running performance test suite..."
	@mkdir -p $(PERF_RESULTS)/$(BUILD_ID)
	@python performance/run_tests.py \
		--config performance/config.yaml \
		--output $(PERF_RESULTS)/$(BUILD_ID) \
		--build-id $(BUILD_ID) \
		--timestamp $(TIMESTAMP)

perf-baseline:
	@echo "📊 Establishing performance baseline..."
	@mkdir -p $(PERF_BASELINE)
	@cp $(PERF_RESULTS)/$(BUILD_ID)/* $(PERF_BASELINE)/
	@echo "$(BUILD_ID)" > $(PERF_BASELINE)/baseline_commit.txt

perf-compare:
	@echo "🔍 Comparing against baseline..."
	@python performance/compare.py \
		--current $(PERF_RESULTS)/$(BUILD_ID) \
		--baseline $(PERF_BASELINE) \
		--threshold 10 \
		--fail-on-regression

perf-report:
	@echo "📈 Generating performance report..."
	@python performance/generate_report.py \
		--results-dir $(PERF_RESULTS) \
		--output $(PERF_REPORTS)/report_$(TIMESTAMP).html \
		--include-trends

perf-fast-time:
	@echo "⚡ Testing fast-time-server performance..."
	@cd mcp-servers/go/fast-time-server && make run-http &
	@sleep 2
	@hey -n 5000 -c 100 -m POST -T application/json -o csv \
		-D performance/data/time_request.json \
		http://localhost:8080/http > $(PERF_RESULTS)/fast_time_$(TIMESTAMP).csv
	@pkill fast-time-server || true

perf-setup:
	@echo "🔧 Setting up performance test environment..."
	@docker-compose -f docker-compose.test.yml up -d postgres redis
	@sleep 5
	@python -m mcpgateway.main &
	@sleep 3
	@cd mcp-servers/go/fast-time-server && make build && ./fast-time-server -transport http &
	@sleep 2

perf-clean:
	@echo "🧹 Cleaning performance test artifacts..."
	@docker-compose -f docker-compose.test.yml down
	@pkill -f "mcpgateway.main" || true
	@pkill fast-time-server || true
```
- **Performance test runner**

```python
#!/usr/bin/env python3
# performance/run_tests.py
"""
Comprehensive performance test runner for mcpgateway.

Features:
- Multiple test scenarios with configurable thresholds
- Resource monitoring during tests
- Detailed metrics collection and storage
- Integration with fast-time-server
"""
import argparse
import json
import subprocess
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Optional

import psutil


@dataclass
class PerformanceResult:
    """Performance test result."""
    scenario: str
    requests: int
    concurrency: int
    total_time: float
    requests_per_second: float
    latency_p50: float
    latency_p95: float
    latency_p99: float
    error_rate: float
    cpu_usage: float
    memory_usage: float


class PerformanceTestRunner:
    """Runs performance tests and collects detailed metrics."""

    def __init__(self, config_path: str, output_dir: str):
        self.config = self._load_config(config_path)
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(parents=True, exist_ok=True)

    def run_all_tests(self) -> List[PerformanceResult]:
        """Run all configured performance tests."""
        results = []
        for suite in self.config['test_suites']:
            for scenario in suite['scenarios']:
                result = self._run_scenario(suite, scenario)
                results.append(result)
        # Save results
        self._save_results(results)
        return results

    def _run_scenario(self, suite: dict, scenario: dict) -> PerformanceResult:
        """Run a single performance test scenario."""
        print(f"🏃 Running {scenario['name']}...")

        # Start resource monitoring
        monitor = ResourceMonitor()
        monitor.start()

        # Build hey command
        cmd = self._build_hey_command(suite, scenario)

        # Run test
        start_time = time.time()
        result = subprocess.run(cmd, capture_output=True, text=True)
        end_time = time.time()

        # Stop monitoring
        cpu_usage, memory_usage = monitor.stop()

        # Parse hey output
        metrics = self._parse_hey_output(result.stdout)

        return PerformanceResult(
            scenario=scenario['name'],
            requests=scenario['requests'],
            concurrency=scenario['concurrency'],
            total_time=end_time - start_time,
            requests_per_second=metrics['rps'],
            latency_p50=metrics['p50'],
            latency_p95=metrics['p95'],
            latency_p99=metrics['p99'],
            error_rate=metrics['error_rate'],
            cpu_usage=cpu_usage,
            memory_usage=memory_usage,
        )

    # Helper methods (_load_config, _build_hey_command, _parse_hey_output,
    # _save_results) and the ResourceMonitor class are elided for brevity.
```
- **Fast-time-server integration**

```bash
# Add fast-time-server as performance reference
cat > performance/data/time_request.json << 'EOF'
{"jsonrpc": "2.0", "method": "Time.Now", "id": 1}
EOF
```

Create the fast-time-server performance test:

```python
# performance/tests/test_fast_time_server.py
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

import requests


class TestFastTimeServerPerformance:
    """Performance tests for fast-time-server integration."""

    def test_time_now_latency(self):
        """Test Time.Now endpoint latency."""
        url = "http://localhost:8080/http"
        payload = {"jsonrpc": "2.0", "method": "Time.Now", "id": 1}
        latencies = []
        for _ in range(100):
            start = time.perf_counter()
            response = requests.post(url, json=payload)
            end = time.perf_counter()
            assert response.status_code == 200
            latencies.append((end - start) * 1000)  # ms

        p95_latency = statistics.quantiles(latencies, n=20)[18]  # 95th percentile
        assert p95_latency < 50, f"P95 latency {p95_latency:.1f}ms exceeds 50ms threshold"

    def test_concurrent_requests(self):
        """Test concurrent request handling."""
        url = "http://localhost:8080/http"
        payload = {"jsonrpc": "2.0", "method": "Time.Now", "id": 1}

        def make_request():
            start = time.perf_counter()
            response = requests.post(url, json=payload)
            end = time.perf_counter()
            return response.status_code == 200, (end - start) * 1000

        with ThreadPoolExecutor(max_workers=50) as executor:
            start_time = time.perf_counter()
            futures = [executor.submit(make_request) for _ in range(1000)]
            results = [f.result() for f in futures]
            end_time = time.perf_counter()

        success_count = sum(1 for success, _ in results if success)
        latencies = [latency for success, latency in results if success]
        throughput = success_count / (end_time - start_time)
        p95_latency = statistics.quantiles(latencies, n=20)[18]

        assert success_count >= 995, f"Only {success_count}/1000 requests succeeded"
        assert throughput >= 1000, f"Throughput {throughput:.1f} req/s below 1000 req/s"
        assert p95_latency < 100, f"P95 latency {p95_latency:.1f}ms exceeds 100ms"
```
- **CI integration**

```yaml
# Add to existing GitHub Actions workflow
performance:
  name: 🏃 Performance Testing
  runs-on: ubuntu-latest
  needs: [test]  # Run after tests pass
  if: github.ref == 'refs/heads/main' || github.event_name == 'pull_request'

  services:
    postgres:
      image: postgres:15
      env:
        POSTGRES_PASSWORD: testpass
        POSTGRES_DB: testdb
      options: >-
        --health-cmd pg_isready
        --health-interval 10s
        --health-timeout 5s
        --health-retries 5
    redis:
      image: redis:7
      options: >-
        --health-cmd "redis-cli ping"
        --health-interval 10s
        --health-timeout 5s
        --health-retries 5

  steps:
    - name: ⬇️ Checkout source
      uses: actions/checkout@v4
      with:
        fetch-depth: 0  # Full history for trend analysis

    - name: 🐍 Set up Python
      uses: actions/setup-python@v5
      with:
        python-version: "3.12"
        cache: pip

    - name: 🔧 Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -e .[dev]
        # hey is a Go binary, not a pip package
        go install github.com/rakyll/hey@latest
        echo "$HOME/go/bin" >> "$GITHUB_PATH"

    - name: 🏗️ Build fast-time-server
      run: |
        cd mcp-servers/go/fast-time-server
        make build

    - name: 🚀 Start services
      run: make perf-setup

    - name: 🏃 Run performance tests
      run: make perf-test

    - name: 📊 Compare against baseline
      run: make perf-compare || echo "::warning::Performance regression detected"

    - name: 📈 Generate performance report
      run: make perf-report

    - name: 📎 Upload performance artifacts
      uses: actions/upload-artifact@v4
      with:
        name: performance-results-${{ github.sha }}
        path: |
          performance/results/
          performance/reports/
        retention-days: 30

    - name: 💬 Comment PR with results
      if: github.event_name == 'pull_request'
      uses: actions/github-script@v7
      with:
        script: |
          const fs = require('fs');
          const reportPath = 'performance/reports/summary.md';
          if (fs.existsSync(reportPath)) {
            const report = fs.readFileSync(reportPath, 'utf8');
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `## 🏃 Performance Test Results\n\n${report}`
            });
          }
```
- **Performance regression detection**

```python
#!/usr/bin/env python3
# performance/compare.py
"""Performance regression detection and alerting."""
import argparse
import json
import sys
from pathlib import Path
from typing import Any, Dict, List


class PerformanceComparator:
    """Compare performance results and detect regressions."""

    def __init__(self, threshold_percent: float = 10.0):
        self.threshold = threshold_percent

    def compare_results(self, current: Dict[str, Any], baseline: Dict[str, Any]) -> Dict[str, Any]:
        """Compare current results against baseline."""
        comparison = {
            'regressions': [],
            'improvements': [],
            'summary': {},
        }
        for scenario in current.get('scenarios', []):
            scenario_name = scenario['name']
            baseline_scenario = self._find_baseline_scenario(baseline, scenario_name)
            if not baseline_scenario:
                continue
            # Compare key metrics
            metrics = ['requests_per_second', 'latency_p95', 'error_rate']
            for metric in metrics:
                regression = self._check_regression(
                    scenario.get(metric, 0),
                    baseline_scenario.get(metric, 0),
                    metric,
                )
                if regression:
                    comparison['regressions'].append({
                        'scenario': scenario_name,
                        'metric': metric,
                        'current': scenario.get(metric, 0),
                        'baseline': baseline_scenario.get(metric, 0),
                        'change_percent': regression,
                    })
        return comparison

    def _check_regression(self, current: float, baseline: float, metric: str) -> float:
        """Return the percent change if the metric regressed beyond the threshold, else 0."""
        if baseline == 0:
            return 0
        change_percent = ((current - baseline) / baseline) * 100
        # For latency and error_rate, higher is worse
        if metric in ['latency_p95', 'error_rate']:
            return change_percent if change_percent > self.threshold else 0
        # For throughput, lower is worse
        elif metric == 'requests_per_second':
            return -change_percent if -change_percent > self.threshold else 0
        return 0

    # _find_baseline_scenario and the argparse entry point are elided for brevity.
```
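The Makefile invokes compare.py with `--fail-on-regression`, so the entry point largely reduces to an exit-code decision. A sketch (the `regression_exit_code` name is illustrative):

```python
def regression_exit_code(comparison: dict, fail_on_regression: bool) -> int:
    """Return a non-zero exit code only when regressions exist and failing is requested."""
    regressions = comparison.get("regressions", [])
    for r in regressions:
        print(
            f"REGRESSION {r['scenario']}.{r['metric']}: "
            f"{r['baseline']} -> {r['current']} ({r['change_percent']:+.1f}%)"
        )
    return 1 if (regressions and fail_on_regression) else 0

comparison = {"regressions": [{"scenario": "tools_api", "metric": "latency_p95",
                               "baseline": 90, "current": 120,
                               "change_percent": 33.3}]}
print(regression_exit_code(comparison, fail_on_regression=True))   # → 1
print(regression_exit_code({"regressions": []}, True))             # → 0
```

The CLI glue would simply `sys.exit(regression_exit_code(...))` after loading the current and baseline result JSON.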
- **Performance dashboard and reporting**

```python
#!/usr/bin/env python3
# performance/generate_report.py
"""Generate comprehensive performance reports with trend analysis."""
from pathlib import Path

import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots


class PerformanceReporter:
    """Generate visual performance reports and trends."""

    def generate_html_report(self, results_dir: Path, output_file: Path):
        """Generate comprehensive HTML performance report."""
        # Load all historical results
        all_results = self._load_historical_results(results_dir)

        # Create visualizations
        fig = make_subplots(
            rows=2, cols=2,
            subplot_titles=[
                'Throughput Trends',
                'Latency Trends (P95)',
                'Error Rate Trends',
                'Resource Usage',
            ],
        )

        # Throughput chart
        fig.add_trace(
            go.Scatter(
                x=all_results['timestamps'],
                y=all_results['throughput'],
                mode='lines+markers',
                name='Requests/sec',
                line=dict(color='blue'),
            ),
            row=1, col=1,
        )

        # Latency chart
        fig.add_trace(
            go.Scatter(
                x=all_results['timestamps'],
                y=all_results['latency_p95'],
                mode='lines+markers',
                name='P95 Latency (ms)',
                line=dict(color='red'),
            ),
            row=1, col=2,
        )

        # Generate HTML report
        html_template = self._get_html_template()
        html_content = html_template.format(
            charts=fig.to_html(include_plotlyjs='cdn'),
            summary_table=self._generate_summary_table(all_results),
            timestamp=pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S'),
        )
        output_file.write_text(html_content)

    # _load_historical_results, _get_html_template, and _generate_summary_table
    # are elided for brevity.
```
- **Alert system**

```python
#!/usr/bin/env python3
# performance/alert.py - Performance alert system
"""Send alerts when performance regressions are detected."""
import smtplib
from email.mime.text import MIMEText
from typing import Any, Dict, List


class PerformanceAlerts:
    """Send performance regression alerts."""

    def __init__(self, config: Dict[str, Any]):
        self.config = config

    def check_and_alert(self, comparison_results: Dict[str, Any]):
        """Check for regressions and send alerts if needed."""
        regressions = comparison_results.get('regressions', [])
        if not regressions:
            return
        # Send email alert
        self._send_email_alert(regressions)
        # Send Slack alert if configured
        if self.config.get('slack_webhook'):
            self._send_slack_alert(regressions)

    def _send_email_alert(self, regressions: List[Dict[str, Any]]):
        """Send email alert for performance regressions."""
        subject = f"🚨 Performance Regression Detected - {len(regressions)} issues"
        body_lines = [
            "Performance regression detected in the following scenarios:",
            "",
        ]
        for regression in regressions:
            body_lines.append(
                f"• {regression['scenario']}: {regression['metric']} "
                f"changed by {regression['change_percent']:.1f}%"
            )
        body = "\n".join(body_lines)
        # Send email (implementation depends on your email setup)
        print(f"ALERT: {subject}")
        print(body)
```
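`_send_slack_alert` is referenced above but not implemented. A sketch using only the stdlib; the message layout is an assumption, and the `{"text": ...}` body follows Slack's incoming-webhook payload shape:

```python
import json
import urllib.request

def build_slack_payload(regressions: list[dict]) -> dict:
    """Format regression alerts as a Slack incoming-webhook message body."""
    lines = [f"🚨 *{len(regressions)} performance regression(s) detected*"]
    for r in regressions:
        lines.append(
            f"• {r['scenario']}: {r['metric']} changed by {r['change_percent']:.1f}%"
        )
    return {"text": "\n".join(lines)}

def send_slack_alert(webhook_url: str, regressions: list[dict]) -> None:
    """POST the payload to a Slack incoming webhook."""
    body = json.dumps(build_slack_payload(regressions)).encode()
    req = urllib.request.Request(
        webhook_url, data=body, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req, timeout=10)
```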
- **Documentation integration**

Add a performance testing section to your docs:

````markdown
# Performance Testing

## Running Performance Tests

```bash
# Run full performance test suite
make perf-test

# Establish new baseline
make perf-baseline

# Compare against baseline
make perf-compare

# Generate performance report
make perf-report
```

## Fast-Time-Server Integration

The fast-time-server provides a reference benchmark for testing gateway performance:

```bash
# Test fast-time-server performance
make perf-fast-time
```

## Performance Thresholds

| Metric | Threshold |
|--------|-----------|
| P95 Latency | < 100ms |
| Throughput | > 500 req/s |
| Error Rate | < 1% |
````
- **Final validation**

```bash
# Test the complete performance testing pipeline
make perf-setup
make perf-test
make perf-baseline
make perf-compare
make perf-report
make perf-clean

# Verify CI integration
git add performance/
git commit -m "Add performance testing infrastructure"
git push  # Triggers CI with performance tests
```
📖 References
🧩 Additional Notes
- Fast-time-server integration: Use your existing fast-time-server as a performance reference point and baseline for gateway overhead measurement.
- Historical tracking: Store performance results with git commit metadata for precise regression attribution.
- Resource monitoring: Track CPU, memory, and database performance during tests to identify bottlenecks.
- Threshold tuning: Start with conservative thresholds and adjust based on actual performance characteristics.
- Test environment consistency: Use identical test environments (container specs, database config) for reliable comparisons.
- Regression alerts: Configure alerts for immediate notification when performance degrades beyond acceptable limits.
- Performance budgets: Set performance budgets for different API endpoints based on usage patterns.
Performance Testing Best Practices:
- Run tests in isolated environments to avoid interference
- Use realistic test data and scenarios that match production usage
- Measure both cold-start and warm-up performance
- Include database and external service performance in tests
- Track trends over time, not just point-in-time measurements
- Automate performance testing as part of CI/CD pipeline
- Set up alerts for both regressions and improvements