AIPerf v0.2.0
AIPerf Release Notes
Summary
AIPerf v0.2.0 introduces time-based benchmarking with configurable grace periods and request cancellation capabilities. The release adds advanced metrics including goodput measurement for SLO compliance, GPU telemetry, and inter-chunk latency tracking.
New Features
Time-Based Benchmarking
- Time-based benchmarking support - Run benchmarks for a specified duration with configurable grace periods for more realistic testing scenarios
- Benchmark grace period - Added grace period functionality to allow for proper warmup and cooldown phases during benchmarking
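To illustrate how a fixed duration and a grace period could interact, here is a minimal sketch. It is not AIPerf's implementation. The `Request` type, the `classify` function, and the assumed semantics (no new requests after the duration ends, in-flight requests get the grace period to finish) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Request:
    start_s: float    # send time, relative to benchmark start
    latency_s: float  # time the server takes to respond


def classify(requests, duration_s, grace_period_s):
    """Split requests into completed vs. cut off (assumed semantics).

    Assumption: nothing is sent once duration_s has elapsed, and requests
    already in flight get up to grace_period_s extra seconds to finish.
    """
    deadline = duration_s + grace_period_s
    completed, cut_off = [], []
    for r in requests:
        if r.start_s >= duration_s:
            continue  # never sent: the benchmark window has closed
        if r.start_s + r.latency_s <= deadline:
            completed.append(r)
        else:
            cut_off.append(r)
    return completed, cut_off


reqs = [Request(0.0, 1.0), Request(9.5, 2.0), Request(9.9, 10.0), Request(10.0, 0.1)]
done, cut = classify(reqs, duration_s=10.0, grace_period_s=5.0)
print(len(done), len(cut))  # 2 1
```

In this sketch the request sent at 9.5 s finishes inside the grace period and is counted, while the one sent at 9.9 s with a 10 s latency is cut off.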
Request Management & Control
- Request cancellation - Added ability to cancel requests during benchmarking to test timeout behavior and service resilience
- Fixed-schedule for Mooncake traces - Enhanced trace replay with fixed-schedule detection and support for non-fixed-schedule trace formats
- Request rate with concurrency limits - Added ability to limit HTTP connections and control request concurrency for more realistic load testing
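The concurrency limit and request cancellation described above can be pictured with plain `asyncio`. This is a sketch under stated assumptions, not AIPerf's code: `run_with_limits` and its parameters are hypothetical names, and `asyncio.sleep` stands in for a real HTTP call.

```python
import asyncio


async def run_with_limits(delays_s, max_concurrency, cancel_after_s):
    """Run one simulated request per entry in delays_s.

    At most max_concurrency requests run at once. A request still running
    after cancel_after_s seconds is cancelled and recorded as None.
    """
    semaphore = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak_in_flight = 0

    async def one_request(delay_s):
        nonlocal in_flight, peak_in_flight
        async with semaphore:
            in_flight += 1
            peak_in_flight = max(peak_in_flight, in_flight)
            try:
                # asyncio.sleep is a placeholder for the real request.
                await asyncio.wait_for(asyncio.sleep(delay_s), timeout=cancel_after_s)
                return delay_s
            except asyncio.TimeoutError:
                return None  # the request was cancelled
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(one_request(d) for d in delays_s))
    return results, peak_in_flight


results, peak = asyncio.run(run_with_limits([0.01, 0.01, 0.5, 0.01], 2, 0.1))
print(results, peak)  # [0.01, 0.01, None, 0.01] 2
```

The semaphore bounds how many requests are in flight, and `asyncio.wait_for` cancels the slow request once the timeout expires.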
Advanced Metrics & Monitoring
- Goodput metric - Added goodput metric to measure throughput of requests meeting user-defined SLOs, with comprehensive tutorial support
- GPU Telemetry - Integrated GPU monitoring and telemetry collection for comprehensive performance analysis
- Inter-chunk-latency metric - Added inter-chunk latency tracking using raw value lists for detailed streaming performance analysis
- Total ISL/OSL metrics - Added total input/output sequence length metrics with improved CSV/JSON export support
- Per-record metrics export - Enhanced profile export with per-record metrics in profile_export.jsonl
- Mixed ISL/OSL distributions - Support for mixed input/output sequence length distributions in benchmarking
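As a rough illustration of two of the metrics above, goodput can be read as the rate of requests that meet every SLO threshold, and inter-chunk latency as the list of gaps between consecutive streamed chunks. The sketch below is hypothetical: the function names, the metric keys (`ttft_ms`, `latency_ms`), and the "lower is better" SLO convention are assumptions rather than AIPerf's actual definitions.

```python
def goodput(records, slos, duration_s):
    """Requests per second that meet every SLO threshold.

    records: list of dicts mapping metric name -> measured value
    slos:    dict mapping metric name -> maximum allowed value (assumed)
    """
    good = sum(
        1
        for r in records
        if all(r[name] <= limit for name, limit in slos.items())
    )
    return good / duration_s


def inter_chunk_latencies(chunk_times_s):
    """Gaps between consecutive chunk arrival times, kept as a raw list."""
    return [later - earlier for earlier, later in zip(chunk_times_s, chunk_times_s[1:])]


records = [
    {"ttft_ms": 80, "latency_ms": 900},
    {"ttft_ms": 150, "latency_ms": 700},   # misses the TTFT threshold
    {"ttft_ms": 90, "latency_ms": 1200},   # misses the latency threshold
    {"ttft_ms": 60, "latency_ms": 500},
]
slos = {"ttft_ms": 100, "latency_ms": 1000}

print(goodput(records, slos, duration_s=2.0))          # 1.0
print(inter_chunk_latencies([0.0, 0.25, 0.5, 1.0]))    # [0.25, 0.25, 0.5]
```

Keeping the inter-chunk gaps as a raw list, rather than a running average, leaves percentiles and other statistics available to later analysis.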
Video & Multimedia Support
- Synthetic video support - Added support for video benchmarking and synthetic video generation
Enhanced Data Management
- Inputs.json file for dataset traceability - Added dataset traceability through inputs.json file generation
- Request traceability headers - Added X-Request-Id and X-Correlation-Id headers for improved request tracking
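For readers unfamiliar with these headers, the sketch below shows one way a client might attach them. The release notes do not say how AIPerf generates the values, so the use of UUIDs, the `trace_headers` helper, and the idea of reusing one correlation ID across related requests are assumptions.

```python
import uuid


def trace_headers(correlation_id=None):
    """Build tracing headers for one request (illustrative only).

    Assumption: each request gets a fresh request ID, while a correlation
    ID may be shared by several related requests.
    """
    return {
        "X-Request-Id": str(uuid.uuid4()),
        "X-Correlation-Id": correlation_id or str(uuid.uuid4()),
    }


headers = trace_headers(correlation_id="session-1")
print(sorted(headers))  # ['X-Correlation-Id', 'X-Request-Id']
```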
Bug Fixes
Core Functionality
- ZMQ graceful termination - Fixed graceful termination of ZMQ context to prevent hanging processes
- Worker count limits - Capped the default maximum number of workers at 32 to prevent resource exhaustion
- Race conditions in credit issuing - Fixed race conditions in credit issuing strategy for more stable performance
- Startup error handling - Improved error handling during startup with clear error messages and proper process exit
Request Processing
- Empty choices array handling - Fixed IndexError when OpenAI choices array is empty
- Request metadata validation - Fixed bug with request metadata validation for failed requests
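The empty-choices failure mode is easy to reproduce: indexing `choices[0]` on an OpenAI-style response whose `choices` list is empty raises `IndexError`. A defensive parser might look like the sketch below. `first_choice_text` is a hypothetical helper, not the function AIPerf changed.

```python
def first_choice_text(response):
    """Return the first choice's text, or None when there are no choices."""
    choices = response.get("choices") or []
    if not choices:
        return None  # choices[0] would raise IndexError here
    first = choices[0]
    # Streaming chunks carry "delta"; non-streaming responses carry "message".
    payload = first.get("delta") or first.get("message") or {}
    return payload.get("content")


print(first_choice_text({"choices": []}))                                  # None
print(first_choice_text({"choices": [{"message": {"content": "hi"}}]}))    # hi
```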
Export & Data Handling
- CSV export logic - Fixed CSV export parsing to ensure correct data formatting
- JSONL file writing - Resolved issues with writing to JSONL files
- GenAI-Perf JSON format compatibility - Fixed JSON summary export to match GenAI-Perf format for better compatibility
Platform-Specific Fixes
- macOS Textual UI Dashboard - Fixed compatibility issues with Textual UI Dashboard on macOS systems
- Image test random seed - Set proper random seeds for image tests to fix sporadic test failures
Telemetry & Performance
- Telemetry Manager shutdown timing - Fixed an issue where the Telemetry Manager shut down before profile configuration finished
- Goodput release issues - Cherry-picked fix for goodput-related release problems
- CPU usage warnings - Added warnings when worker CPU usage exceeds 85% to help identify performance bottlenecks
Documentation & Tutorials
New Documentation
- Goodput tutorial - Complete tutorial on using the goodput metric for SLO validation
- Advanced features tutorials - Tutorials covering advanced benchmarking features
- Trace replay tutorial with real data - Updated trace replay tutorial with real Mooncake data examples
- Feature comparison with GenAI-Perf - Added detailed feature comparison matrix between AIPerf and GenAI-Perf
Infrastructure & Development
Build & Dependencies
- Flexible dependencies - Made package dependencies more flexible for better compatibility
- PyProject.toml cleanup - Cleaned up and organized pyproject.toml configuration
- License field compliance - Updated pyproject.toml license field for wheeltamer compliance
- Dependency updates - Removed pandas dependency (now using numpy only) and updated numpy to 1.26.4
Refactoring & Performance
Core Components
- Credit processor refactoring - Refactored credit processing system for better performance and maintainability
- Console output enhancements - Added median values to console output for better statistical insight
Performance Optimizations
- Performance test marking - Properly marked SSE tests as performance tests for better test organization
Known Issues
- InvalidStateError - Logs show an InvalidStateError during benchmarking. The error is handled gracefully and does not affect benchmark results.