Skip to content

Latest commit

Β 

History

History
407 lines (347 loc) Β· 13.3 KB

File metadata and controls

407 lines (347 loc) Β· 13.3 KB

Architecture Overview

System Design

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    INTELLIGENT TEST GENERATOR                       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

INPUT                          PROCESSING                       OUTPUT
─────                          ──────────                       ──────

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Teacher    β”‚              β”‚  Domain Analyzer   β”‚         β”‚  Generated  β”‚
β”‚   Samples    │─────────────>β”‚                    β”‚         β”‚   Tests     β”‚
β”‚  (optional)  β”‚              β”‚  β€’ Extract patternsβ”‚         β”‚ (10-50 tests)β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜              β”‚  β€’ Understand      β”‚         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚    capabilities    β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”              β”‚  β€’ Identify personasβ”‚        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  POWER.md    │─────────────>β”‚  β€’ Analyze         β”‚        β”‚   Domain    β”‚
β”‚  (optional)  β”‚              β”‚    assertions      β”‚        β”‚  Analysis   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜              β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                       β”‚
                                       β”‚ Domain Understanding
                                       β”‚
                                       β–Ό
                              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                              β”‚ Intelligent Gen.   β”‚
                              β”‚                    β”‚
                              β”‚  β€’ Build prompts   β”‚
                              β”‚  β€’ Generate batchesβ”‚
                              β”‚  β€’ Ensure diversityβ”‚
                              β”‚  β€’ Validate output β”‚
                              β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                       β”‚
                                       β–Ό
                              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                              β”‚   AWS Bedrock      β”‚
                              β”‚  (Claude Models)   β”‚
                              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Component Architecture

test_data_generator/
β”‚
β”œβ”€β”€ domain_analyzer.py ─────────────────┐
β”‚   β”œβ”€ DomainAnalyzer                   β”‚
β”‚   β”‚  β”œβ”€ analyze_test_samples()        β”‚ Phase 1: Understanding
β”‚   β”‚  β”œβ”€ _extract_structural_patterns()β”‚
β”‚   β”‚  β”œβ”€ _extract_domain_understanding()
β”‚   β”‚  └─ _call_bedrock()               β”‚
β”‚                                        β”‚
β”œβ”€β”€ intelligent_generator.py ────────────
β”‚   β”œβ”€ IntelligentTestGenerator         β”‚
β”‚   β”‚  β”œβ”€ generate_test_cases()         β”‚ Phase 2: Generation
β”‚   β”‚  β”œβ”€ _generate_batch()             β”‚
β”‚   β”‚  β”œβ”€ _build_generation_prompt()    β”‚
β”‚   β”‚  └─ _validate_and_fix_tests()     β”‚
β”‚                                        β”‚
└── cli.py ──────────────────────────────
    β”œβ”€ main()                            β”‚ Phase 3: Interface
    β”œβ”€ load_teacher_samples()            β”‚
    └─ load_power_instructions()         β”‚

Data Flow

Phase 1: Domain Analysis

Teacher Samples ───> Structural Analysis ───> Patterns
                              β”‚
                              β–Ό
                     LLM Analysis (Bedrock)
                              β”‚
                              β–Ό
                    Domain Understanding
                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚ β€’ Capabilities      β”‚
                    β”‚ β€’ Personas          β”‚
                    β”‚ β€’ Success criteria  β”‚
                    β”‚ β€’ Complexity factorsβ”‚
                    β”‚ β€’ Edge cases        β”‚
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Phase 2: Intelligent Generation

Domain Understanding ───> Prompt Builder ───> Bedrock API
       +                       β”‚                   β”‚
Teacher Samples                β”‚                   β”‚
       +                       β–Ό                   β–Ό
Diversity Factor        Batch Generation      LLM Response
       β”‚                       β”‚                   β”‚
       β”‚                       β”‚<β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚                       β”‚
       └──────────────────> Validator
                               β”‚
                               β–Ό
                       Valid Test Cases

Phase 3: Output & Integration

Generated Tests ───> Individual Files ───> test_data_expanded/
      β”‚                                           β”‚
      └────────────> All Tests File              β”‚
                           β”‚                      β”‚
                           └───────────────────────
                                                  β”‚
Domain Analysis ───> Analysis File               β”‚
                           β”‚                      β”‚
                           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                                  β”‚
                                                  β–Ό
                                    Evolution Config Update
                                                  β”‚
                                                  β–Ό
                                    Run Evolution Workflow

Key Design Decisions

1. Two-Phase Approach (Analysis β†’ Generation)

Why?

  • Separate understanding from generation
  • Reusable domain analysis
  • Better quality control

Tradeoffs:

  • Two LLM calls instead of one
  • Slightly slower, but much better quality

2. Batch Generation

Why?

  • Ensures diversity across batches
  • Better progress tracking
  • Fault tolerance (partial failures OK)

Tradeoffs:

  • More API calls
  • More complex code
  • Better results and reliability

3. Validation & Auto-Fix

Why?

  • LLM output can be imperfect
  • Ensures structural consistency
  • Reduces manual cleanup

Tradeoffs:

  • May mask generation issues
  • Extra processing
  • Much better usability

4. Diversity Control

Why?

  • Different use cases need different diversity
  • Allows tuning based on needs
  • Explicit control over exploration/exploitation

Tradeoffs:

  • Extra parameter to tune
  • Complexity in prompt engineering
  • Flexibility worth the complexity

Prompt Engineering Strategy

Domain Analysis Prompt

Purpose: Deep understanding of domain
Structure:
  1. Context (teacher samples + POWER.md)
  2. Analysis requirements (capabilities, personas, etc.)
  3. Output format (structured JSON)
  
Temperature: 0.3 (low - want consistent analysis)
Max tokens: 8000

Generation Prompt

Purpose: Create diverse, valid test cases
Structure:
  1. Domain understanding (from analysis)
  2. Teacher examples (2-3 samples)
  3. Generation requirements (count, complexity, diversity)
  4. Output format (test case array)
  
Temperature: 0.8 (higher - want creativity)
Max tokens: 16000

Batch-Specific Focus

Purpose: Ensure diversity across batches
Strategy:
  - Batch 0: Core capabilities
  - Batch 1: User personas
  - Batch 2: Edge cases
  - Repeat with different aspects
  
Result: Natural diversity without explicit deduplication

Validation Pipeline

Generated Test
      β”‚
      β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Has ID?     β”‚ ─No─> Generate ID
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
       β”‚ Yes
       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Has name?   β”‚ ─No─> Generate name
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
       β”‚ Yes
       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Valid       β”‚ ─No─> Set default
β”‚ complexity? β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
       β”‚ Yes
       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Has         β”‚ ─No─> Skip test
β”‚ assertions? β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
       β”‚ Yes
       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Validate    β”‚ ─Invalid─> Remove invalid
β”‚ assertions  β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
       β”‚ Valid
       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Add to      β”‚
β”‚ output set  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Integration Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                   EVOLUTION WORKFLOW                         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚  Test Data   β”‚
    β”‚  Generator   β”‚ (NEW)
    β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
           β”‚ Expands test set
           β–Ό
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚  Test Data   β”‚
    β”‚   Loader     β”‚ (EXISTING)
    β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
           β”‚
           β–Ό
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚  Evolution   β”‚
    β”‚ Orchestrator β”‚ (EXISTING)
    β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
           β”‚
           β–Ό
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚   Adapter    β”‚
    β”‚    (Agent)   β”‚ (EXISTING)
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Error Handling Strategy

Level 1: Graceful Degradation
  - Missing POWER.md? Continue without it
  - Some tests invalid? Use valid ones
  - Batch failed? Continue with others

Level 2: Validation & Auto-Fix
  - Missing fields? Add defaults
  - Invalid assertions? Remove them
  - Wrong structure? Fix if possible

Level 3: Clear Errors
  - No teacher samples? Error & exit
  - Bedrock unavailable? Error & exit
  - All tests invalid? Error & exit

Scalability Considerations

Current Scale

  • Teacher samples: 1-10
  • Generated tests: 10-50
  • Time: 3-5 minutes
  • Cost: $1-2

Future Scale

  • Teacher samples: 10-100
  • Generated tests: 100-500
  • Time: 10-30 minutes
  • Cost: $10-20

Optimization Opportunities

  1. Cache domain analysis
  2. Parallel batch generation
  3. Incremental generation
  4. Template extraction
  5. Local model support

Testing Strategy

Unit Tests

test_basic.py
  β”œβ”€ Import validation
  β”œβ”€ Structural analysis (no API)
  β”œβ”€ Complexity analysis (no API)
  └─ Assertion analysis (no API)

Integration Tests

example.py
  β”œβ”€ Basic generation
  β”œβ”€ With POWER.md
  β”œβ”€ Domain analysis
  β”œβ”€ High diversity
  └─ Specific complexity

End-to-End Tests

1. Generate tests
2. Update config
3. Run evolution
4. Validate metrics

Configuration Management

CLI Arguments ───> Generator Config
                         β”‚
                         β–Ό
                   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                   β”‚ Region   β”‚
                   β”‚ Model ID β”‚
                   β”‚ Temp     β”‚
                   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                         β”‚
                         β–Ό
                   Bedrock Client

Dependencies

External:
  β”œβ”€ boto3 (AWS SDK)
  β”œβ”€ json (stdlib)
  β”œβ”€ logging (stdlib)
  β”œβ”€ pathlib (stdlib)
  └─ argparse (stdlib)

Internal:
  β”œβ”€ domain_analyzer
  └─ intelligent_generator

Evolution Framework:
  β”œβ”€ TestCase (core.test_case)
  └─ TestCaseLoader (core.test_case)

Summary

Architecture Principles:

  1. Separation of Concerns: Analysis, generation, validation
  2. Fail-Safe: Graceful degradation and auto-fixing
  3. Extensibility: Easy to add new strategies
  4. Integration: Minimal changes to existing workflow

Key Strengths:

  • Modular design
  • Clear data flow
  • Robust error handling
  • Well-documented
  • Easy to extend

Design Tradeoffs:

  • Complexity vs. Quality β†’ Chose quality
  • Speed vs. Diversity β†’ Chose diversity
  • Automation vs. Control β†’ Balanced both