Summary
Implements automated performance regression detection with configurable budgets, enforced in CI. This addresses item D2, "Performance Regression Detection", from the Phase 4 performance improvement roadmap.
Changes
1. Performance Budget Validator (validate-performance-budget.ts)
- Validates HTTP benchmark results against configured thresholds
- Supports hard limits (-10% default) and warning thresholds (-3% default)
- Statistical-significance-aware validation (respects `*` markers from benchmarks)
- Strict mode for zero tolerance on any significant regression
- Clear violation/warning/improvement reporting
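The classification logic described above can be sketched as follows. This is a minimal illustration, not the actual `validate-performance-budget.ts` API; all type and function names here are hypothetical:

```typescript
// Hypothetical sketch of the budget check; names are illustrative,
// not the actual validate-performance-budget.ts implementation.
interface EndpointResult {
  name: string;          // e.g. "ping", "query", "body", or "overall"
  changePct: number;     // % change vs. baseline; negative = regression
  significant: boolean;  // true when the benchmark marked it with `*`
}

interface Budget {
  maxRegressionPct: number;              // hard limit, e.g. -10
  warnRegressionPct: number;             // warning level, e.g. -3
  failOnSignificantRegression?: boolean; // strict mode
}

type Verdict = "pass" | "warn" | "fail";

function checkBudget(r: EndpointResult, b: Budget): Verdict {
  // Hard limit: any change at or beyond the max regression fails outright.
  if (r.changePct <= b.maxRegressionPct) return "fail";
  // Strict mode: any statistically significant regression fails.
  if (b.failOnSignificantRegression && r.significant && r.changePct < 0)
    return "fail";
  // Soft threshold: log a warning but let CI pass.
  if (r.changePct <= b.warnRegressionPct) return "warn";
  return "pass";
}
```

Note that the strict-mode branch only fires for results the benchmark itself flagged as significant, which is how noise-driven false positives are kept out even under a zero-tolerance policy.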
2. Budget Configuration (performance-budgets.json)
- Default thresholds: -10% max regression, -3% warning level
- JSON schema included for IDE autocomplete and validation
- Configurable per-endpoint (ping/query/body) and overall thresholds
- Optional `failOnSignificantRegression` flag for strict policies
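A plausible shape for `performance-budgets.json`, based on the options listed above (field names are illustrative; the bundled JSON schema is the authoritative reference):

```json
{
  "$schema": "./performance-budgets.schema.json",
  "maxRegressionPct": -10,
  "warnRegressionPct": -3,
  "failOnSignificantRegression": false,
  "endpoints": {
    "ping":  { "maxRegressionPct": -10, "warnRegressionPct": -3 },
    "query": { "maxRegressionPct": -10, "warnRegressionPct": -3 },
    "body":  { "maxRegressionPct": -10, "warnRegressionPct": -3 }
  }
}
```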
3. CI Integration (.github/workflows/ci.yml)
- Runs automatically after HTTP benchmarks on all PRs
- Fails CI build if performance regresses beyond budgets
- Works seamlessly with existing benchmark infrastructure
- No changes needed to benchmark.ts
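The CI wiring could look roughly like the following step appended to the existing benchmark job. Step name, paths, and flags are illustrative, not the exact `ci.yml` contents:

```yaml
# Illustrative CI step; the actual ci.yml wiring may differ.
- name: Validate performance budget
  working-directory: benchmarks/http-server
  run: bun run validate-performance-budget.ts --results=./benchmark-results.json
```

Because the validator exits nonzero on a budget violation, no extra failure handling is needed in the workflow.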
4. Documentation (README.md)
- Usage instructions and configuration examples
- Behavior documentation (✅ pass, ⚠️ warn, ❌ fail)
- Strict mode explanation
- Integration with existing benchmark workflow
Impact
Before:
- HTTP benchmarks run on PRs and post results as comments
- No automated enforcement of performance standards
- Regressions only caught by manual review
After:
- Automated detection of performance regressions
- CI fails if performance degrades beyond configured thresholds
- Statistical significance considered (avoids false positives from noise)
- Clear feedback on violations, warnings, and improvements
Example CI failure output:

```
❌ Budget Violations:
  ❌ PING: -12.34% (exceeds budget of -10.00%) *
  ❌ OVERALL: -11.50% (exceeds budget of -10.00%)

Performance regression detected that exceeds acceptable thresholds.
Please investigate and optimize before merging.
```
Performance Evidence
Tested with synthetic benchmark data across multiple scenarios:
- Passing scenario (+2% improvement): ✅ Pass
- Warning scenario (-4% regression): ⚠️ Warning logged, CI passes
- Failure scenario (-12% regression): ❌ CI fails with clear error
- Strict mode (-2% significant regression): ❌ Fails in strict mode, passes in normal mode
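The strict-mode scenario above hinges on statistical significance. One common way benchmark suites flag significance, and a plausible reading of the `*` marker here, is non-overlapping confidence intervals between baseline and candidate. A sketch under that assumption (names hypothetical):

```typescript
// Hypothetical: a result is flagged significant (the `*` marker) when the
// baseline and candidate confidence intervals do not overlap.
interface Interval {
  lo: number;
  hi: number;
}

function isSignificant(baseline: Interval, candidate: Interval): boolean {
  // Non-overlapping intervals: candidate is entirely below or above baseline.
  return candidate.hi < baseline.lo || candidate.lo > baseline.hi;
}
```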
Reproducibility
```shell
# Test with custom results and budgets
cd benchmarks/http-server
bun run validate-performance-budget.ts --results=./benchmark-results.json

# Use strict mode (fail on any statistically significant regression)
bun run validate-performance-budget.ts --strict

# Configure thresholds in performance-budgets.json, then re-run validation
```
Validation
- ✅ All tests pass: 3802 passed, 32 skipped
- ✅ Code formatted with Prettier
- ✅ No new linting errors
- ✅ Tested with passing, failing, and warning scenarios
- ✅ Strict mode validated
- ✅ CI workflow updated and tested
Trade-offs
Minimal:
- Adds ~10 seconds to CI runtime for validation step
- Requires maintaining performance-budgets.json configuration
- Teams need to calibrate thresholds based on their requirements
Benefits far outweigh costs:
- Prevents performance regressions from merging unnoticed
- Provides clear, actionable feedback on performance changes
- Builds on existing statistical infrastructure (confidence intervals)
- Configurable to match team's risk tolerance
Future Work
- Consider tracking historical performance trends
- Add performance budget validation for bundle size and type-check times
- Integrate with GitHub Checks API for richer PR status reporting
- Consider performance budget profiles for different deployment targets
Related
- Addresses roadmap item D2: "Performance Regression Detection" (Phase 4)
- Builds on the statistical analysis from PR #9, "Daily Perf Improver - Router Performance Investigation" (benchmark infrastructure)
- Complements existing type-check and bundle size budgets (octocov)
🤖 Generated with Claude Code
Co-Authored-By: Claude [email protected]
AI generated by Daily Perf Improver
Note
This was originally intended as a pull request, but the git push operation failed.
Workflow Run: View run details and download patch artifact
The patch file is available as an artifact (aw.patch) in the workflow run linked above.
To apply the patch locally:
# Download the artifact from the workflow run https://github.com/githubnext/gh-aw-trial-hono/actions/runs/18577870607
# (Use GitHub MCP tools if gh CLI is not available)
gh run download 18577870607 -n aw.patch
# Apply the patch
git am aw.patch

Patch preview (500 of 543 lines):
From 6b1b0216912a9e695877f9d2eab5906265182d5d Mon Sep 17 00:00:00 2001
From: Daily Perf Improver <github-actions[bot]@users.noreply.github.com>
Date: Thu, 16 Oct 2025 23:55:19 +0000
Subject: [PATCH] Add performance budget validation to HTTP benchmarks
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Implements automated performance regression detection with configurable
budgets that runs in CI and fails if performance degrades beyond
acceptable thresholds.
## Changes
1. **Performance budget validator** (`validate-performance-budget.ts`):
- Validates benchmark results against configured thresholds
- Supports hard limits and warning thresholds
- Statistical significance awareness
- Strict mode for zero-tolerance policies
- Clear violation/warning/improvement reporting
2. **Budget configuration** (`performance-budgets.json`):
- Default thresholds: -10% max regression, -3% warning
- JSON schema for editor support and validation
- Configurable per-endpoint and overall thresholds
3. **CI integration** (`.github/workflows/ci.yml`):
- Automatic budget validation on all PRs
- Fails CI if performance regresses beyond budgets
- Works alongside existing HTTP benchmark
4. **Documentation** (`README.md`):
- Usage instructions for budget validation
- Configuration examples
- Behavior documentation (pass/warn/fail)
## Impact
- **Before**: HTTP benchmarks run but don't enforce performance standards
- **After**: Automated detection of performance regressions with configurable
thresholds, preventing performance degradation from merging
## Validation
- All tests pass: 3802 passed, 32 skipped
- Tested with passing, failing, and warning scenarios
- Strict mode tested for zero-tolerance enforcement
- No new linting errors
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
---
.github/workflows/ci.yml | 4 +
ben
... (truncated)