add test data generator for evaluation#43
Conversation
| generated = generator.generate_test_cases( | ||
| teacher_samples=teacher_samples, | ||
| count=3, | ||
| power_instructions=power_instructions, | ||
| diversity_factor=0.8, | ||
| output_dir="./example_output/with_power" | ||
| ) |
lzongren
left a comment
There was a problem hiding this comment.
Thanks for the PR, pls address the code quality comments @LikeHui92. Not all of them are blocking but good for keeping a healthy repo from the start.
… a call' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
…a call' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
…mplicit (fall through) returns' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
…once' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
|
@lzongren Thanks for the review! Addressed the comments. |
There was a problem hiding this comment.
Major Issues
1. No tests for the generator code
The test_data_generator/ package (~2400 lines of Python across 5 modules) has zero unit tests. ARCHITECTURE.md mentions a "Testing Strategy" with test_basic.py but the file doesn't exist. At minimum, structural analysis and validation logic can be tested without API calls.
2. example.py has broken imports
from intelligent_generator import IntelligentTestGenerator
from domain_analyzer import DomainAnalyzerThese bare imports won't work when running as a package (python -m test_data_generator.example). Should be relative imports: from .intelligent_generator import ...
3. Duplicated dead code in cli.py
cli.py contains a full load_source_context() function (~100 lines) that duplicates nearly all functionality of ContextLoader in context_loader.py. The function is never called since main() uses ContextLoader directly. This dead code should be removed.
…in cli.py and import issues in example.py, Remove generated_test_data from git tracking
|
Addressed review comments:
|
768bd0c to
ceb58e6
Compare
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Resolved conflict in .gitignore by accepting main branch changes. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
Hi @lzongren, please help review again, I've done with the requested changes are done, and resolved conflicts. Thanks! |
There was a problem hiding this comment.
Follow-up Review (after updates)
Great improvements — ATX references down from 223 to 1, generated test files removed, tests added, broken imports fixed. Almost there.
Remaining issues
1. No dependency manifest for evaluation/
There's no pyproject.toml or requirements.txt in the evaluation/ directory. The tool requires boto3 at runtime and the unit tests mock it, but users won't know what to install. At minimum a requirements.txt with boto3>=1.28 would help.
2. Directory name mismatch: test_sample/ vs test_samples/
The README directory tree shows test_sample/ (singular) but the actual file is at evaluation/test_samples/onboarding_intermediate.json (plural). One of them needs to be renamed to match.
3. Duplicated fields in onboarding_intermediate.json
simulated_human_guidance, timeout_seconds, and max_turns appear both at the top level AND inside the metadata object. This looks accidental — the metadata copy should be removed.
4. Emojis in log output
cli.py and test_basic.py use emojis in log messages (✅, ❌, ⚠️). These render poorly in many terminal and CI environments. Consider plain text alternatives like PASS:, FAIL:, WARN:.
…fest, 2. fixed directory naming consistency, 3. removed duplicated JSON fields and so on
|
Hi @lzongren, thanks for the review! I've addressed the comments. Please help review again. Thank you! |
| ) | ||
|
|
||
| # Analyze domain | ||
| analysis = analyzer.analyze_test_samples(teacher_samples, None) |
There was a problem hiding this comment.
source_context is str but here None is passed.
| elif field in {"metadata"}: | ||
| test[field] = {} | ||
| elif field in {"assertions", "simulated_human_guidance"}: | ||
| test[field] = [] |
There was a problem hiding this comment.
Wrong type, simulated_human_guidance is always used as a str but here given as a []
| @@ -0,0 +1,155 @@ | |||
| # Test Data Generator - Testing Guide | |||
There was a problem hiding this comment.
Having three readmes incl. (TEST_README.md, READEME.md and ARCHITECTURE.md) seems to be a bit too much IMO, would you mind to compact them into 1-2 docs?
|
Re: Root-level However, this repo has a stronger convention than most: every component uses Option A — Minimal [build-system]
requires = ["setuptools>=68"]
build-backend = "setuptools.build_meta"
[project]
name = "agent-builder-evaluation"
version = "0.0.0"
requires-python = ">=3.11"
dependencies = ["boto3>=1.28", "pytest>=7.0", "pytest-cov>=4.0"]
[tool.setuptools.packages.find]
where = ["."]
Option B — Keep Either way, would be good to add a CI job for linting/testing this code so it doesn't silently rot. The unit tests in |
…ta_generator and addressed str bugs
|
@lzongren Addressed the comments, thanks for the review! |
Issue #, if available:
Description of changes:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.