This document describes the implementation of the CrashLens Retry Loop Detector for the AI Email Categorizer Backend.
The CrashLens Retry Loop Detector is designed to identify and prevent expensive retry patterns in AI model interactions, specifically targeting:
- Excessive retry attempts that indicate poor error handling
- High-cost retry cascades that burn through budget
- Expensive model usage in retry scenarios
- Rapid successive retries without proper backoff
- Fallback model usage patterns

Files added or updated:

- crashlens_retry_policy.yaml: Main policy configuration
  - Defines 5 detection rules with severity levels
  - Sets cost thresholds and budget limits
  - Configures global settings
- crashlens_config.yaml: Updated main configuration
  - Added retry detection section
  - Model fallback mapping
  - Alert thresholds
- crashlens_retry_integration.py: Integration class
  - RetryLoopDetector class for policy enforcement
  - Integration examples for existing code
  - Real-time retry decision making
- test_retry_detection.py: Test suite
  - Creates sample log data with various retry scenarios
  - Tests all policy rules
  - Validates violation detection
- .github/workflows/crashlens-scan.yml: Updated workflow
  - Added retry detection analysis step
  - Generates comprehensive reports
  - Uploads analysis artifacts

Detection rules:

- Trigger: retry_count > 3
- Action: Fail build
- Suggestion: Implement exponential backoff and circuit breaker patterns

- Trigger: Expensive models (gpt-4, gpt-4-turbo, claude-3-opus) + retry_count > 1
- Action: Warn
- Suggestion: Use cheaper fallback models for retries

- Trigger: cost > $0.05 + retry_count > 0
- Action: Warn
- Suggestion: Review error handling and consider model downgrade

- Trigger: retry_count > 2 + total_tokens < 200
- Action: Warn
- Suggestion: Implement proper backoff delays

- Trigger: fallback_count > 0
- Action: Warn
- Suggestion: Monitor primary model issues
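
To make the five triggers above concrete, here is a minimal sketch of how they could be evaluated against a single log entry. It is not the code in crashlens_retry_integration.py: the function name `evaluate_rules` and the entry field names (`retry_count`, `model`, `cost`, `total_tokens`, `fallback_count`) are assumptions chosen to mirror the trigger expressions.

```python
from typing import Any

# Models the policy treats as expensive (taken from the second rule above).
EXPENSIVE_MODELS = {"gpt-4", "gpt-4-turbo", "claude-3-opus"}


def evaluate_rules(entry: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (action, suggestion) pairs for every rule the entry triggers.

    Field names are assumed. Missing numeric fields are treated as zero,
    except total_tokens, which is skipped when absent so that an entry
    without token data does not trip the rapid-retry rule.
    """
    retries = entry.get("retry_count", 0)
    tokens = entry.get("total_tokens")
    findings = []
    if retries > 3:
        findings.append(("fail", "Implement exponential backoff and circuit breaker patterns"))
    if entry.get("model", "") in EXPENSIVE_MODELS and retries > 1:
        findings.append(("warn", "Use cheaper fallback models for retries"))
    if entry.get("cost", 0.0) > 0.05 and retries > 0:
        findings.append(("warn", "Review error handling and consider model downgrade"))
    if retries > 2 and tokens is not None and tokens < 200:
        findings.append(("warn", "Implement proper backoff delays"))
    if entry.get("fallback_count", 0) > 0:
        findings.append(("warn", "Monitor primary model issues"))
    return findings
```

An entry can trigger several rules at once, so the sketch returns a list rather than stopping at the first match; a caller would fail the build if any returned action is "fail".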
- Warning: $0.01 per operation
- Critical: $0.05 per operation
- Daily retry budget: $5.00
- Monthly retry budget: $100.00
- Max cost per trace: $0.20
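
The thresholds above can be read as two separate checks: a per-operation severity level and a set of budget ceilings. The sketch below illustrates that reading. The names `classify_cost` and `within_budget` are hypothetical, and treating each threshold as inclusive is an assumption, since the policy file's exact comparison is not shown here.

```python
# Values copied from the thresholds listed above (USD).
WARNING_COST = 0.01
CRITICAL_COST = 0.05
DAILY_RETRY_BUDGET = 5.00
MONTHLY_RETRY_BUDGET = 100.00
MAX_COST_PER_TRACE = 0.20


def classify_cost(cost: float) -> str:
    """Map a per-operation cost to a severity label (inclusive thresholds assumed)."""
    if cost >= CRITICAL_COST:
        return "critical"
    if cost >= WARNING_COST:
        return "warning"
    return "ok"


def within_budget(trace_cost: float, daily_spend: float, monthly_spend: float) -> bool:
    """Return True only if the trace, daily, and monthly ceilings all hold."""
    return (
        trace_cost <= MAX_COST_PER_TRACE
        and daily_spend <= DAILY_RETRY_BUDGET
        and monthly_spend <= MONTHLY_RETRY_BUDGET
    )
```

For example, `classify_cost(0.02)` returns "warning", and `within_budget(0.25, 1.0, 1.0)` returns False because the trace alone exceeds the $0.20 ceiling.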

    # Run the test suite
    python test_retry_detection.py
    # Test integration
    python crashlens_retry_integration.py

The workflow automatically runs:
- On pushes to main/raj branches
- On pull requests to main
- Daily at 6 AM UTC
- Manual triggers with scan options

    from crashlens_retry_integration import RetryLoopDetector

    # Initialize detector
    detector = RetryLoopDetector()

    # Check if a retry should be allowed. This fragment runs inside the
    # caller's retry loop, which supplies retry_count, total_cost, and
    # current_model.
    retry_decision = detector.should_allow_retry(
        current_retry_count=retry_count,
        cost_so_far=total_cost,
        model=current_model
    )
    if not retry_decision['allow']:
        print(f"Retry blocked: {retry_decision['reason']}")
        break  # leave the surrounding retry loop

    # Check for violations in log entries
    violations = detector.check_retry_violation(log_entry)

- retry-analysis-results.json: Detailed violation report
- Security scan reports
- Performance analysis results
- Log analysis summaries
Automatic comments on pull requests with:
- Violation counts by severity
- Cost analysis
- Recommendations for fixes
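
As an illustration of the first two items, the sketch below builds a comment body from a list of violations. The function name `summarize_violations` and the violation shape (a dict with a `severity` key and an optional `cost` key) are assumptions; the workflow's actual comment format is not shown in this document.

```python
from collections import Counter


def summarize_violations(violations: list[dict]) -> str:
    """Build a plain-text summary: counts per severity, then total cost."""
    counts = Counter(v["severity"] for v in violations)
    total_cost = sum(v.get("cost", 0.0) for v in violations)
    lines = ["Retry analysis summary:"]
    # Sort alphabetically so the output is stable between runs.
    for severity, count in sorted(counts.items()):
        lines.append(f"- {severity}: {count}")
    lines.append(f"- Total retry cost: ${total_cost:.4f}")
    return "\n".join(lines)
```

The returned string can be posted as the comment body by whatever step the workflow uses for pull request comments.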
The system analyzes existing JSONL logs for:
- Retry patterns
- Cost accumulation
- Model usage efficiency
- Error handling effectiveness
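
A rough sketch of this kind of JSONL pass is shown below. It assumes one JSON object per line with `retry_count`, `cost`, and `model` fields; those field names and the function name `analyze_log_lines` are assumptions rather than the project's actual log schema.

```python
import json
from typing import Iterable


def analyze_log_lines(lines: Iterable[str]) -> dict:
    """Aggregate retry counts, retry cost, and per-model retry usage."""
    total = 0
    retried = 0
    retry_cost = 0.0
    retries_by_model: dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log file
        entry = json.loads(line)
        total += 1
        if entry.get("retry_count", 0) > 0:
            retried += 1
            retry_cost += entry.get("cost", 0.0)
            model = entry.get("model", "unknown")
            retries_by_model[model] = retries_by_model.get(model, 0) + 1
    return {
        "total_requests": total,
        "retry_requests": retried,
        "retry_cost": round(retry_cost, 6),
        "retries_by_model": retries_by_model,
    }
```

Because the function accepts any iterable of strings, an open file handle can be passed directly, for example `analyze_log_lines(open(path, encoding="utf-8"))`.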
Edit crashlens_retry_policy.yaml to:
- Adjust retry count limits
- Modify cost thresholds
- Add new detection rules
- Change severity levels
Configure in crashlens_config.yaml:

    retry_detection:
      fallback_models:
        "gpt-4": "gpt-3.5-turbo"
        "claude-3-opus": "claude-3-haiku"
        "gemini-2.0-flash": "gemini-1.5-flash"

Sample output:

    📊 Analysis Results:
       Total requests: 15
       Retry requests: 4
       High-cost retries: 3
       Policy violations: 10
    ⚠️ Policy Violations Detected:
       [CRITICAL] excessive_retry_pattern: Excessive retries detected: 5
       [HIGH] expensive_model_retries: Expensive model gpt-4 used in retry scenario
       [HIGH] high_cost_retry_cascade: High-cost retry detected: $0.1500
- ✅ Automatic analysis on every push/PR
- 📊 Comprehensive reporting
- 🚨 Build failures on critical violations
- 📈 Historical trend tracking
- Deploy: Commit the files and push to trigger the workflow
- Monitor: Review analysis reports and adjust thresholds
- Integrate: Add retry detection to your existing email classification logic
- Optimize: Use fallback models and implement circuit breakers based on recommendations
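
For the Optimize step, one possible shape for a retry wrapper is sketched below: exponential backoff between attempts, with a switch to the configured fallback model after the first failure. It does not implement a circuit breaker. The function name `call_with_backoff`, its parameters, and the `call_model` callable are hypothetical; only the model mapping comes from the configuration in this document.

```python
import time
from typing import Callable

# Fallback mapping as configured in crashlens_config.yaml.
FALLBACK_MODELS = {
    "gpt-4": "gpt-3.5-turbo",
    "claude-3-opus": "claude-3-haiku",
    "gemini-2.0-flash": "gemini-1.5-flash",
}


def call_with_backoff(
    call_model: Callable[[str], str],
    model: str,
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call call_model(model), retrying with doubling delays.

    After the first failure the cheaper fallback model (if any) is used,
    so retries do not keep hitting the expensive model.
    """
    current = model
    for attempt in range(max_retries + 1):
        try:
            return call_model(current)
        except Exception:
            if attempt == max_retries:
                raise  # retry budget exhausted; surface the last error
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
            current = FALLBACK_MODELS.get(current, current)
    raise RuntimeError("unreachable")  # loop always returns or raises
```

The default of `max_retries=3` keeps a request at or below the retry count that the first detection rule tolerates. The `sleep` parameter exists so tests can substitute a recorder instead of actually waiting.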
The system is now ready to detect and prevent expensive retry loops in your AI email categorizer backend!