Generate diverse, high-quality test cases by understanding your task domain and learning from teacher samples.
This intelligent test generator solves the problem of limited test data by:
- Understanding your domain - Analyzes source code/documentation and optional teacher test samples using an LLM to extract domain patterns, capabilities, user personas, and success criteria
- Generating diverse tests - Creates new test cases that explore different scenarios, edge cases, and complexity levels
- Maintaining quality - Ensures generated tests follow proper structure and quality standards
Key Feature: Source context (code/docs) is always required for domain understanding. Teacher samples are optional - you can bootstrap from source code alone!
Problem: You need more test samples for agent evolution, but manually creating test cases is time-consuming and you want to ensure good coverage.
Solution: This generator learns from your source context (and, optionally, a small set of teacher samples) and automatically generates diverse, realistic test cases.
- Domain Understanding: Uses Claude to deeply understand what your tests validate
- Targeted Generation: Generates tests covering different capabilities, personas, and scenarios
- Diversity Control: Tune how similar to or different from the teachers generated tests should be
- Quality Validation: Ensures generated tests have proper structure and valid assertions
- Analysis Reports: Provides insights into domain patterns and test coverage
Generate tests directly from your source code without any teacher samples:

```bash
python -m test_data_generator.cli \
  --source-context /path/to/your/source/code/ \
  --count 20 \
  --output generated_tests/
```

The generator analyzes your source code, documentation, and configuration files to understand:
- What capabilities your system has
- What should be tested
- Appropriate test structures and assertions
For best results, combine teacher samples with source context:

```bash
python -m test_data_generator.cli \
  --teacher-samples test_samples/ \
  --source-context /path/to/your/source/code/ \
  --count 20 \
  --output generated_tests/
```

Generate tests at a specific complexity with high diversity:
```bash
python -m test_data_generator.cli \
  --teacher-samples test_samples/ \
  --source-context /path/to/source/folder/ \
  --count 15 \
  --complexity medium \
  --diversity 0.9 \
  --output generated_tests/
```

Analyze your domain without generating tests:
```bash
python -m test_data_generator.cli \
  --source-context /path/to/source/folder/ \
  --teacher-samples test_samples/ \
  --analyze-only \
  --output analysis/
```

```text
Required:
  --source-context PATH    Path to source code/docs directory or file
                           (required for domain understanding)
  --output PATH            Output directory for generated tests

Optional:
  --teacher-samples PATH   Path to teacher test samples (file or directory);
                           omit to bootstrap from source context alone
  --count N                Number of tests to generate (default: 10)
  --complexity LEVEL       Generate a specific complexity: simple, medium, or complex
  --diversity FACTOR       Diversity 0-1: 0 = similar, 1 = very diverse (default: 0.8)
  --region REGION          AWS region (default: us-west-2)
  --model-id MODEL         Bedrock model ID (default: claude-opus-4-5)
  --temperature TEMP       Generation temperature 0-1 (default: 0.8)
  --analyze-only           Only analyze the domain; don't generate tests
  --no-deduplicate         Disable test name deduplication
  --no-ensure-complex      Disable ensuring that 20% of tests are complex
  --use-two-pass-analysis  Use two-pass analysis for large source context
  --loading-strategy STR   Strategy for loading files (default: agent_evaluation)
  --verbose                Enable verbose logging
```
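The `--no-deduplicate` and `--no-ensure-complex` flags imply two post-processing steps that run by default. The sketch below shows what such steps could look like, assuming each test is a dict with hypothetical `name` and `complexity` keys; it is an illustration, not the package's actual code.

```python
# Hypothetical sketch of the post-processing that --no-deduplicate and
# --no-ensure-complex switch off. The "name"/"complexity" keys are assumptions.


def deduplicate_names(tests: list[dict]) -> list[dict]:
    """Return copies of the tests with unique names (repeats get _2, _3, ...)."""
    used: set[str] = set()
    result = []
    for test in tests:
        base = test["name"]
        candidate, suffix = base, 2
        # Keep bumping the suffix until the name is unused, so an existing
        # "login_2" can never collide with a renamed duplicate of "login".
        while candidate in used:
            candidate = f"{base}_{suffix}"
            suffix += 1
        used.add(candidate)
        result.append({**test, "name": candidate})
    return result


def complex_shortfall(tests: list[dict], min_percent: int = 20) -> int:
    """How many more tests would need to be complex to reach min_percent."""
    # Integer ceiling of len(tests) * min_percent / 100, avoiding float rounding.
    needed = (len(tests) * min_percent + 99) // 100
    have = sum(1 for t in tests if t.get("complexity") == "complex")
    return max(0, needed - have)
```

For example, a batch of 10 tests with one complex test has a shortfall of 1, so one more test would need to be generated (or regenerated) as complex.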
The generator analyzes your source context and, if provided, your teacher samples to extract:
- Core capabilities being tested
- Domain-specific patterns and scenarios
- User personas and interaction styles
- Success criteria and quality expectations
- Edge cases and complexity factors
- Assertion patterns and what they validate
Using the domain understanding, it generates tests that:
- Follow the same structure as teacher samples
- Explore different scenarios and edge cases
- Cover different user personas and skill levels
- Maintain appropriate assertion quality
- Match desired complexity distribution
Each generated test is validated to ensure:
- Required fields are present
- Assertions are valid and complete
- Structure matches teacher samples
- Timing and turn settings have reasonable defaults
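The four checks above can be sketched as a small validator. The field names (`name`, `prompt`, `assertions`) and the default values (`timeout_seconds`, `max_turns`) are hypothetical placeholders chosen for illustration; the generator's real schema is defined by your teacher samples and its own code.

```python
from __future__ import annotations

# Hypothetical schema and defaults, for illustration only.
REQUIRED_FIELDS = ("name", "prompt", "assertions")
DEFAULTS = {"timeout_seconds": 300, "max_turns": 10}


def validate_test(test: dict, teacher: dict | None = None) -> tuple[dict, list[str]]:
    """Return (test with defaults filled in, list of validation errors)."""
    errors: list[str] = []

    # 1. Required fields are present and non-empty.
    for field in REQUIRED_FIELDS:
        if test.get(field) in (None, "", []):
            errors.append(f"missing or empty field: {field}")

    # 2. Assertions are a list of non-blank strings.
    assertions = test.get("assertions")
    if isinstance(assertions, list):
        for i, assertion in enumerate(assertions):
            if not isinstance(assertion, str) or not assertion.strip():
                errors.append(f"assertion {i} is empty or not a string")
    elif assertions is not None:
        errors.append("assertions must be a list")

    # 3. Fill in timing/turn defaults without overriding explicit values.
    filled = {**DEFAULTS, **test}

    # 4. Structure matches the teacher: every teacher key should appear.
    if teacher is not None:
        for key in sorted(set(teacher) - set(filled)):
            errors.append(f"key in teacher sample but not in test: {key}")

    return filled, errors
```

Defaults are filled before the teacher comparison, so a teacher that sets `max_turns` explicitly does not flag a generated test that relies on the default.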
After generation, you'll find:
```text
output_directory/
├── domain_analysis.json       # Domain understanding and patterns
├── all_generated_tests.json   # All tests in one file
├── generated_test_001.json    # Individual test files
├── generated_test_002.json
└── ...
```
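To consume these outputs from your own scripts, a minimal loader might look like the following. It assumes `all_generated_tests.json` holds a JSON list of test objects and falls back to the individual `generated_test_*.json` files otherwise; the exact JSON shape is an assumption, so check it against your own output.

```python
import json
from pathlib import Path


def load_generated_tests(output_dir: str) -> list[dict]:
    """Load tests from a generator output directory.

    Assumes all_generated_tests.json is a JSON list; if it is missing or not
    a list, falls back to the individual generated_test_*.json files.
    """
    out = Path(output_dir)
    combined = out / "all_generated_tests.json"
    if combined.is_file():
        data = json.loads(combined.read_text(encoding="utf-8"))
        if isinstance(data, list):
            return data
    # Zero-padded names (001, 002, ...) sort correctly as strings.
    return [
        json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(out.glob("generated_test_*.json"))
    ]
```

The glob pattern does not match `domain_analysis.json` or `all_generated_tests.json`, so only individual test files are picked up in the fallback.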
Use generated tests with your evolution workflow:

```bash
# 1. Generate tests (source context always required)
python -m test_data_generator.cli \
  --source-context src/ \
  --teacher-samples test_samples/ \
  --count 30 \
  --output test_data_expanded/

# 2. Update config to use expanded test data
# Edit examples/config.yaml:
#   test_data:
#     path: "test_data_expanded/"

# 3. Run evolution with more tests (now uses the expanded test set)
python run_evolution.py \
  --config examples/config.yaml \
  --evolve \
  --auto-patch \
  --validation-method agent_judge \
  --mode standard
```

The --diversity parameter controls how different generated tests are from teacher samples:
- 0.0 - 0.3: Low diversity - Stay close to teacher patterns, mostly variations
- 0.4 - 0.6: Medium diversity - Explore different aspects of core capabilities
- 0.7 - 1.0: High diversity - Explore edge cases, error handling, unusual scenarios
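As a quick reference, the bands above can be expressed as a lookup. The listed ranges leave gaps (0.3 to 0.4 and 0.6 to 0.7); this sketch assumes values in a gap fall into the lower band, which is a choice made here and not documented behavior of the generator.

```python
def diversity_band(factor: float) -> tuple[str, str]:
    """Map a --diversity value to the band described in the list above."""
    # Rejects values outside [0, 1] (and NaN, since NaN comparisons are False).
    if not 0.0 <= factor <= 1.0:
        raise ValueError(f"diversity must be between 0 and 1, got {factor}")
    # Assumption: values in the gaps (0.3-0.4, 0.6-0.7) use the lower band.
    if factor < 0.4:
        return "low", "stay close to teacher patterns, mostly variations"
    if factor < 0.7:
        return "medium", "explore different aspects of core capabilities"
    return "high", "explore edge cases, error handling, unusual scenarios"
```

With the default of 0.8, `diversity_band(0.8)` falls in the high band.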
- Provide comprehensive source context: Include documentation, code, configuration files
- Organize your source well: Clear structure helps the analyzer understand your domain
- Include key files: POWER.md, README, main entry points, core logic
- Start with quality teachers: Better teacher samples = better generated tests
- Start at 0.8 diversity: Adjust based on results
- Review generated tests: Spot-check first few generations
- Iterate: Use domain analysis to understand coverage gaps
- Start with lower count: Generate 5-10 tests first to verify quality
- Review structure: First-generation tests may need manual refinement
- Use refined tests as teachers: Use generated tests as teacher samples for next round
- Iterate to improve: Each generation learns from previous results
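The iterate-and-promote loop described above can be sketched as a function that takes the generation step and the review step as plain callables. `bootstrap_teachers` and the review callable are hypothetical names; the commented wiring uses the `generate_test_cases` call shown in the Python examples below, and how you review tests (manual spot-checks, a validator) is up to you.

```python
from __future__ import annotations

from typing import Callable

# A generate function takes (teacher_samples, count) and returns new tests.
GenerateFn = Callable[[list[dict], int], list[dict]]
# A review function returns True for tests you would keep after spot-checking.
ReviewFn = Callable[[dict], bool]


def bootstrap_teachers(
    generate: GenerateFn,
    review: ReviewFn,
    rounds: int,
    count_per_round: int,
    teachers: list[dict] | None = None,
) -> list[dict]:
    """Grow a teacher pool: each round's reviewed tests seed the next round."""
    pool = list(teachers or [])
    for _ in range(rounds):
        batch = generate(pool, count_per_round)
        # Only tests that pass review become teachers for the next round.
        pool.extend(test for test in batch if review(test))
    return pool


# Possible wiring to the real generator (not executed here):
# generate = lambda teachers, n: generator.generate_test_cases(
#     teacher_samples=teachers, count=n, source_context=source_context,
#     diversity_factor=0.8, output_dir='output/')
```

Starting with an empty `teachers` list matches the bootstrap-from-source workflow; keeping `count_per_round` small follows the "start with lower count" advice.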
You can also use the generator in your Python code. Note: source_context is always required.

```python
from test_data_generator import IntelligentTestGenerator
from test_data_generator.context_loader import ContextLoader

# Load source context (required)
loader = ContextLoader(strategy='agent_evaluation')
source_context = loader.load('/path/to/your/source/')

generator = IntelligentTestGenerator(
    region_name='us-west-2',  # your AWS region
    model_id='us.anthropic.claude-opus-4-5-20251101-v1:0',  # a Bedrock model you can access
    temperature=0.8,
)

# Generate from source context only (no teacher samples)
generated = generator.generate_test_cases(
    teacher_samples=[],  # empty list: bootstrap from source context alone
    count=10,
    source_context=source_context,  # required
    diversity_factor=0.8,
    output_dir='output/',
)
print(f"Generated {len(generated)} tests from source context")
```

To combine teacher samples with source context:

```python
from test_data_generator import IntelligentTestGenerator
from test_data_generator.context_loader import ContextLoader

# Load source context (required)
loader = ContextLoader(strategy='agent_evaluation')
source_context = loader.load('/path/to/your/source/')

generator = IntelligentTestGenerator(
    region_name='us-west-2',
    model_id='us.anthropic.claude-opus-4-5-20251101-v1:0',
    temperature=0.8,
)

# Load teacher samples
teacher_samples = [...]  # replace with your loaded teacher test samples

# Generate with both teacher samples and source context
generated = generator.generate_test_cases(
    teacher_samples=teacher_samples,
    count=20,
    source_context=source_context,  # required
    complexity='medium',
    diversity_factor=0.8,
    output_dir='output/',
)
print(f"Generated {len(generated)} tests")
```

- Python 3.11+
- boto3 (AWS Bedrock access)
- AWS credentials configured
- Access to Claude models in Bedrock
No tests generated: Check that teacher samples have valid structure with assertions
Low quality tests: Try lowering diversity factor or providing POWER.md context
Bedrock errors: Verify AWS credentials and model access in your region
Memory issues: Reduce count or generate in smaller batches
"Power instructions very large" warning:
- Tool auto-skips .git, node_modules, __pycache__, etc.
- Auto-skips files >100KB
- Content truncated to 10K chars for analysis
- Use specific file instead of full directory if needed
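To see why a large directory triggers this warning, here is a minimal sketch of the filtering the notes above describe: prune skipped directories, drop files over the size limit, and truncate the combined text. This is not the `ContextLoader` implementation; the constants come from the notes, and treating 100KB as 100 × 1024 bytes is an assumption.

```python
import os
from pathlib import Path

# Values taken from the notes above; the real loader may skip more
# directories ("etc.") and measure sizes differently.
SKIP_DIRS = {".git", "node_modules", "__pycache__"}
MAX_FILE_BYTES = 100 * 1024  # ">100KB", assuming 1 KB = 1024 bytes
MAX_CHARS = 10_000           # analysis truncation limit


def iter_context_files(root: Path):
    """Yield candidate files under root, pruning skipped directories."""
    if root.is_file():
        candidates = [root]
    else:
        candidates = []
        for dirpath, dirnames, filenames in os.walk(root):
            # Editing dirnames in place stops os.walk from descending into them.
            dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
            candidates.extend(Path(dirpath) / name for name in sorted(filenames))
    for path in candidates:
        if path.stat().st_size <= MAX_FILE_BYTES:
            yield path


def load_context(root: str) -> str:
    """Concatenate small text files under root, truncated to MAX_CHARS."""
    parts = []
    for path in iter_context_files(Path(root)):
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip binary or non-UTF-8 files
        parts.append(f"# {path.name}\n{text}")
    return "\n\n".join(parts)[:MAX_CHARS]
```

Because everything past the character limit is discarded, pointing `--source-context` at a single key file (such as POWER.md) keeps the most relevant text inside the budget.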
See the examples directory for sample configurations and generated tests.
```text
test_data_generator/
├── __init__.py                # Package exports
├── cli.py                     # Command-line interface
├── domain_analyzer.py         # Domain understanding via LLM
├── intelligent_generator.py   # Test generation logic
└── README.md                  # This file
```
The generator is designed to be extensible. To customize:
- Modify domain_analyzer.py to extract additional patterns
- Adjust intelligent_generator.py prompts for your domain
- Add custom validation logic for your test structure
Same as parent project.