βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β INTELLIGENT TEST GENERATOR β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
INPUT PROCESSING OUTPUT
βββββ ββββββββββ ββββββ
ββββββββββββββββ ββββββββββββββββββββββ βββββββββββββββ
β Teacher β β Domain Analyzer β β Generated β
β Samples ββββββββββββββ>β β β Tests β
β (optional) β β β’ Extract patternsβ β (10-50 tests)β
ββββββββββββββββ β β’ Understand β βββββββββββββββ
β capabilities β
ββββββββββββββββ β β’ Identify personasβ βββββββββββββββ
β POWER.md ββββββββββββββ>β β’ Analyze β β Domain β
β (optional) β β assertions β β Analysis β
ββββββββββββββββ ββββββββββ¬ββββββββββββ βββββββββββββββ
β
β Domain Understanding
β
βΌ
ββββββββββββββββββββββ
β Intelligent Gen. β
β β
β β’ Build prompts β
β β’ Generate batchesβ
β β’ Ensure diversityβ
β β’ Validate output β
ββββββββββ¬ββββββββββββ
β
βΌ
ββββββββββββββββββββββ
β AWS Bedrock β
β (Claude Models) β
ββββββββββββββββββββββ
test_data_generator/
β
βββ domain_analyzer.py ββββββββββββββββββ
β ββ DomainAnalyzer β
β β ββ analyze_test_samples() β Phase 1: Understanding
β β ββ _extract_structural_patterns()β
β β ββ _extract_domain_understanding()
β β ββ _call_bedrock() β
β β
βββ intelligent_generator.py ββββββββββββ€
β ββ IntelligentTestGenerator β
β β ββ generate_test_cases() β Phase 2: Generation
β β ββ _generate_batch() β
β β ββ _build_generation_prompt() β
β β ββ _validate_and_fix_tests() β
β β
βββ cli.py ββββββββββββββββββββββββββββββ€
ββ main() β Phase 3: Interface
ββ load_teacher_samples() β
ββ load_power_instructions() β
Teacher Samples βββ> Structural Analysis βββ> Patterns
β
βΌ
LLM Analysis (Bedrock)
β
βΌ
Domain Understanding
βββββββββββββββββββββββ
β β’ Capabilities β
β β’ Personas β
β β’ Success criteria β
β β’ Complexity factorsβ
β β’ Edge cases β
βββββββββββββββββββββββ
Domain Understanding βββ> Prompt Builder βββ> Bedrock API
+ β β
Teacher Samples β β
+ βΌ βΌ
Diversity Factor Batch Generation LLM Response
β β β
β β<βββββββββββββββββββ
β β
βββββββββββββββββββ> Validator
β
βΌ
Valid Test Cases
Generated Tests βββ> Individual Files βββ> test_data_expanded/
β β
βββββββββββββ> All Tests File β
β β
ββββββββββββββββββββββββ€
β
Domain Analysis βββ> Analysis File β
β β
ββββββββββββββββββββββββ
β
βΌ
Evolution Config Update
β
βΌ
Run Evolution Workflow
Why?
- Separate understanding from generation
- Reusable domain analysis
- Better quality control
Tradeoffs:
- Two LLM calls instead of one
- Slightly slower, but much better quality
Why?
- Ensures diversity across batches
- Better progress tracking
- Fault tolerance (partial failures OK)
Tradeoffs:
- More API calls
- More complex code
- Better results and reliability
Why?
- LLM output can be imperfect
- Ensures structural consistency
- Reduces manual cleanup
Tradeoffs:
- May mask generation issues
- Extra processing
- Much better usability
Why?
- Different use cases need different diversity
- Allows tuning based on needs
- Explicit control over exploration/exploitation
Tradeoffs:
- Extra parameter to tune
- Complexity in prompt engineering
- Flexibility worth the complexity
Purpose: Deep understanding of domain
Structure:
1. Context (teacher samples + POWER.md)
2. Analysis requirements (capabilities, personas, etc.)
3. Output format (structured JSON)
Temperature: 0.3 (low - want consistent analysis)
Max tokens: 8000
Purpose: Create diverse, valid test cases
Structure:
1. Domain understanding (from analysis)
2. Teacher examples (2-3 samples)
3. Generation requirements (count, complexity, diversity)
4. Output format (test case array)
Temperature: 0.8 (higher - want creativity)
Max tokens: 16000
Purpose: Ensure diversity across batches
Strategy:
- Batch 0: Core capabilities
- Batch 1: User personas
- Batch 2: Edge cases
- Repeat with different aspects
Result: Natural diversity without explicit deduplication
Generated Test
β
βΌ
βββββββββββββββ
β Has ID? β βNoβ> Generate ID
ββββββββ¬βββββββ
β Yes
βΌ
βββββββββββββββ
β Has name? β βNoβ> Generate name
ββββββββ¬βββββββ
β Yes
βΌ
βββββββββββββββ
β Valid β βNoβ> Set default
β complexity? β
ββββββββ¬βββββββ
β Yes
βΌ
βββββββββββββββ
β Has β βNoβ> Skip test
β assertions? β
ββββββββ¬βββββββ
β Yes
βΌ
βββββββββββββββ
β Validate β βInvalidβ> Remove invalid
β assertions β
ββββββββ¬βββββββ
β Valid
βΌ
βββββββββββββββ
β Add to β
β output set β
βββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β EVOLUTION WORKFLOW β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββ
β Test Data β
β Generator β (NEW)
ββββββββ¬ββββββββ
β Expands test set
βΌ
ββββββββββββββββ
β Test Data β
β Loader β (EXISTING)
ββββββββ¬ββββββββ
β
βΌ
ββββββββββββββββ
β Evolution β
β Orchestrator β (EXISTING)
ββββββββ¬ββββββββ
β
βΌ
ββββββββββββββββ
β Adapter β
β (Agent) β (EXISTING)
ββββββββββββββββ
Level 1: Graceful Degradation
- Missing POWER.md? Continue without it
- Some tests invalid? Use valid ones
- Batch failed? Continue with others
Level 2: Validation & Auto-Fix
- Missing fields? Add defaults
- Invalid assertions? Remove them
- Wrong structure? Fix if possible
Level 3: Clear Errors
- No teacher samples? Error & exit
- Bedrock unavailable? Error & exit
- All tests invalid? Error & exit
- Teacher samples: 1-10
- Generated tests: 10-50
- Time: 3-5 minutes
- Cost: $1-2
- Teacher samples: 10-100
- Generated tests: 100-500
- Time: 10-30 minutes
- Cost: $10-20
- Cache domain analysis
- Parallel batch generation
- Incremental generation
- Template extraction
- Local model support
test_basic.py
ββ Import validation
ββ Structural analysis (no API)
ββ Complexity analysis (no API)
ββ Assertion analysis (no API)example.py
ββ Basic generation
ββ With POWER.md
ββ Domain analysis
ββ High diversity
ββ Specific complexity1. Generate tests
2. Update config
3. Run evolution
4. Validate metricsCLI Arguments βββ> Generator Config
β
βΌ
ββββββββββββ
β Region β
β Model ID β
β Temp β
ββββββββββββ
β
βΌ
Bedrock Client
External:
ββ boto3 (AWS SDK)
ββ json (stdlib)
ββ logging (stdlib)
ββ pathlib (stdlib)
ββ argparse (stdlib)
Internal:
ββ domain_analyzer
ββ intelligent_generator
Evolution Framework:
ββ TestCase (core.test_case)
ββ TestCaseLoader (core.test_case)
Architecture Principles:
- Separation of Concerns: Analysis, generation, validation
- Fail-Safe: Graceful degradation and auto-fixing
- Extensibility: Easy to add new strategies
- Integration: Minimal changes to existing workflow
Key Strengths:
- Modular design
- Clear data flow
- Robust error handling
- Well-documented
- Easy to extend
Design Tradeoffs:
- Complexity vs. Quality β Chose quality
- Speed vs. Diversity β Chose diversity
- Automation vs. Control β Balanced both