add test data generator for evaluation by LikeHui92 · Pull Request #43 · awslabs/agent-builder-toolkit-aws-transform

LikeHui92 · 2026-05-15T01:14:13Z

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

+    generated = generator.generate_test_cases(
+        teacher_samples=teacher_samples,
+        count=3,
+        power_instructions=power_instructions,
+        diversity_factor=0.8,
+        output_dir="./example_output/with_power"
+    )


lzongren

Thanks for the PR. Please address the code quality comments, @LikeHui92. Not all of them are blocking, but fixing them helps keep the repo healthy from the start.

LikeHui92 · 2026-05-15T07:03:13Z

@lzongren Thanks for the review! Addressed the comments.

lzongren

Major Issues

1. No tests for the generator code

The test_data_generator/ package (~2400 lines of Python across 5 modules) has zero unit tests. ARCHITECTURE.md mentions a "Testing Strategy" with test_basic.py but the file doesn't exist. At minimum, structural analysis and validation logic can be tested without API calls.
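As a rough illustration of the kind of test that needs no API access, here is a minimal sketch. `REQUIRED_FIELDS` and `validate_test_case` are hypothetical stand-ins, not names taken from this PR, and the real validation logic may check different fields.

```python
import unittest

# Hypothetical required fields; the real schema in this PR may differ.
REQUIRED_FIELDS = {"name": str, "prompt": str, "assertions": list}


def validate_test_case(test: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in test:
            problems.append(f"missing field: {field}")
        elif not isinstance(test[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}"
            )
    return problems


class ValidateTestCaseTests(unittest.TestCase):
    def test_valid_case_has_no_problems(self):
        case = {"name": "t1", "prompt": "do x", "assertions": []}
        self.assertEqual(validate_test_case(case), [])

    def test_missing_and_mistyped_fields_are_reported(self):
        case = {"name": "t1", "assertions": "not-a-list"}
        self.assertEqual(
            validate_test_case(case),
            ["missing field: prompt", "wrong type for assertions: expected list"],
        )
```

Tests of this shape exercise only dictionaries and pure functions, so they run with `python -m unittest` and no network or credentials.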

2. `example.py` has broken imports

from intelligent_generator import IntelligentTestGenerator
from domain_analyzer import DomainAnalyzer

These bare imports won't work when running as a package (python -m test_data_generator.example). Should be relative imports: from .intelligent_generator import ...
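To make the difference concrete, the sketch below builds a throwaway package in a temporary directory and runs its example module with `python -m`. The package name `demo_pkg` and the one-line class body are invented for the demonstration; only the import form mirrors the suggestion above.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_example_as_module() -> str:
    """Build a throwaway package and run its example module with `python -m`."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"  # hypothetical package name
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "intelligent_generator.py").write_text(
            "class IntelligentTestGenerator:\n"
            "    name = 'generator'\n"
        )
        # Relative import: resolved against the package, not against sys.path.
        (pkg / "example.py").write_text(
            "from .intelligent_generator import IntelligentTestGenerator\n"
            "print(IntelligentTestGenerator.name)\n"
        )
        result = subprocess.run(
            [sys.executable, "-m", "demo_pkg.example"],
            cwd=tmp,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()
```

In this setup, swapping the relative import for the bare `from intelligent_generator import ...` makes the same command fail with `ModuleNotFoundError`, because only the parent directory of `demo_pkg` is on `sys.path`.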

3. Duplicated dead code in `cli.py`

cli.py contains a full load_source_context() function (~100 lines) that duplicates nearly all functionality of ContextLoader in context_loader.py. The function is never called since main() uses ContextLoader directly. This dead code should be removed.

LikeHui92 · 2026-05-16T15:47:07Z

Addressed review comments:

Fixed broken imports in example.py (relative imports)
Removed dead load_source_context() function in cli.py
Added one test sample which has no internal reference
Added comprehensive unit tests (22 tests)
Created evaluation README

LikeHui92 · 2026-05-17T21:32:39Z

Hi @lzongren, please review again. I've made the requested changes and resolved the conflicts. Thanks!

lzongren

Follow-up Review (after updates)

Great improvements — ATX references down from 223 to 1, generated test files removed, tests added, broken imports fixed. Almost there.

Remaining issues

1. No dependency manifest for evaluation/

There's no pyproject.toml or requirements.txt in the evaluation/ directory. The tool requires boto3 at runtime and the unit tests mock it, but users won't know what to install. At minimum a requirements.txt with boto3>=1.28 would help.

2. Directory name mismatch: test_sample/ vs test_samples/

The README directory tree shows test_sample/ (singular) but the actual file is at evaluation/test_samples/onboarding_intermediate.json (plural). One of them needs to be renamed to match.

3. Duplicated fields in onboarding_intermediate.json

simulated_human_guidance, timeout_seconds, and max_turns appear both at the top level AND inside the metadata object. This looks accidental — the metadata copy should be removed.
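A small check along these lines could catch the duplication automatically. This is only a sketch: the sample document and its values are invented for illustration (the real onboarding_intermediate.json is not reproduced here), and both helper names are hypothetical.

```python
import json

# Invented sample shaped like the description above; values are placeholders.
sample = json.loads("""
{
  "simulated_human_guidance": "approve the plan",
  "timeout_seconds": 600,
  "max_turns": 10,
  "metadata": {
    "simulated_human_guidance": "approve the plan",
    "timeout_seconds": 600,
    "max_turns": 10,
    "difficulty": "intermediate"
  }
}
""")


def duplicated_in_metadata(doc: dict) -> list[str]:
    """Return metadata keys that also exist at the top level, sorted by name."""
    metadata = doc.get("metadata", {})
    return sorted(key for key in metadata if key in doc)


def drop_metadata_duplicates(doc: dict) -> dict:
    """Return a copy of doc whose metadata no longer repeats top-level keys."""
    dupes = set(duplicated_in_metadata(doc))
    cleaned = dict(doc)
    cleaned["metadata"] = {
        key: value
        for key, value in doc.get("metadata", {}).items()
        if key not in dupes
    }
    return cleaned
```

Keeping the top-level copy and dropping the metadata copy matches the suggestion above; the opposite choice would work equally well as long as readers of the file agree on one location.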

4. Emojis in log output

cli.py and test_basic.py use emojis in log messages (✅, ❌, ⚠️). These render poorly in many terminal and CI environments. Consider plain text alternatives like PASS:, FAIL:, WARN:.
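One possible shape for the plain-text alternative, as a sketch only: the logger name and both helper functions are hypothetical and would need adapting to how cli.py actually emits its messages.

```python
import logging

# Plain ASCII prefixes in place of emoji markers.
STATUS_PREFIX = {"pass": "PASS:", "fail": "FAIL:", "warn": "WARN:"}
STATUS_LEVEL = {
    "pass": logging.INFO,
    "fail": logging.ERROR,
    "warn": logging.WARNING,
}

logger = logging.getLogger("test_data_generator")  # hypothetical logger name


def status_line(status: str, message: str) -> str:
    """Format a message with an ASCII status prefix."""
    return f"{STATUS_PREFIX[status]} {message}"


def log_status(status: str, message: str) -> None:
    """Log the formatted line at a level matching the status."""
    logger.log(STATUS_LEVEL[status], status_line(status, message))
```

Centralizing the prefixes in one mapping also makes the output easy to grep in CI logs.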

LikeHui92 · 2026-05-18T19:17:29Z

Hi @lzongren, thanks for the review! I've addressed the comments. Please help review again. Thank you!

lzongren · 2026-05-19T04:34:11Z

+    )
+
+    # Analyze domain
+    analysis = analyzer.analyze_test_samples(teacher_samples, None)


source_context is typed as str, but None is passed here.
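There are two straightforward ways to resolve this: pass an empty string at the call site, or widen the annotation to accept None and normalize it once. The sketch below shows the second option. The function is a hypothetical stand-in for the analyzer method, and its return value is invented purely to make the None handling visible.

```python
from typing import Optional


def analyze_test_samples(
    teacher_samples: list[dict],
    source_context: Optional[str] = None,
) -> dict:
    """Hypothetical stand-in for the analyzer; only the None handling matters."""
    # Normalize once so later string operations never see None.
    context = source_context or ""
    return {
        "sample_count": len(teacher_samples),
        "has_source_context": bool(context.strip()),
    }
```

With the annotation widened, `analyzer.analyze_test_samples(teacher_samples, None)` type-checks and the body can keep treating the value as a string.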

lzongren · 2026-05-19T04:37:15Z

+                        elif field in {"metadata"}:
+                            test[field] = {}
+                        elif field in {"assertions", "simulated_human_guidance"}:
+                            test[field] = []


Wrong type: simulated_human_guidance is always used as a str, but here it is defaulted to [].
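One way to keep each default next to its expected type is a single mapping from field name to factory, sketched below. `FIELD_DEFAULTS` and `fill_missing_fields` are hypothetical names; only the three field names come from the diff above.

```python
from typing import Any, Callable

# Factories rather than shared instances, so each test case gets its own
# dict or list and mutations do not leak between cases.
FIELD_DEFAULTS: dict[str, Callable[[], Any]] = {
    "metadata": dict,
    "assertions": list,
    "simulated_human_guidance": str,  # str() returns ""
}


def fill_missing_fields(test: dict) -> dict:
    """Add a type-appropriate default for every missing field, in place."""
    for field, factory in FIELD_DEFAULTS.items():
        if field not in test:
            test[field] = factory()
    return test
```

A table like this replaces the chain of `elif field in {...}` branches and makes a mistyped default easier to spot in review.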

lzongren · 2026-05-19T05:11:29Z

@@ -0,0 +1,155 @@
+# Test Data Generator - Testing Guide


Having three docs (TEST_README.md, README.md and ARCHITECTURE.md) seems a bit too much IMO. Would you mind compacting them into 1-2 docs?

lzongren · 2026-05-19T05:14:57Z

Re: evaluation/ directory placement and packaging

Root-level evaluation/ is fine — that's the common pattern in the Python ecosystem (HF transformers has benchmark/, scikit-learn has benchmarks/, vLLM has benchmarks/). No objection there.

However, this repo has a stronger convention than most: every component uses pyproject.toml with a src/ layout. A bare requirements.txt is the only packaging outlier across the entire repo. Two options worth considering:

Option A — Minimal pyproject.toml at evaluation/ (keeps it installable, consistent):

[build-system]
requires = ["setuptools>=68"]
build-backend = "setuptools.build_meta"

[project]
name = "agent-builder-evaluation"
version = "0.0.0"
requires-python = ">=3.11"
dependencies = ["boto3>=1.28", "pytest>=7.0", "pytest-cov>=4.0"]

[tool.setuptools.packages.find]
where = ["."]

version = "0.0.0" is an established convention for workspace-internal packages that won't be published (see AutoGen's autogen-test-utils for precedent).

Option B — Keep requirements.txt, drop it in a follow-up. Lower friction now, but the inconsistency remains.

Either way, would be good to add a CI job for linting/testing this code so it doesn't silently rot. The unit tests in test_units.py don't need AWS credentials, so they could run cheaply.
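For reference, a credential-free test can look roughly like the sketch below, which injects a `MagicMock` where a real client would go. The `Generator` class and its `invoke` method are hypothetical and are not a boto3 API or code from test_units.py; they only show the injection pattern.

```python
from unittest.mock import MagicMock


class Generator:
    """Hypothetical generator that receives its client by injection."""

    def __init__(self, client):
        self.client = client

    def generate(self, prompt: str) -> str:
        # `invoke` is an invented method name, not a real boto3 call.
        response = self.client.invoke(prompt=prompt)
        return response["text"].strip()


def test_generate_strips_whitespace():
    fake_client = MagicMock()
    fake_client.invoke.return_value = {"text": "  hello  "}

    assert Generator(fake_client).generate("hi") == "hello"
    fake_client.invoke.assert_called_once_with(prompt="hi")
```

Because nothing here imports boto3 or touches the network, a CI job can run such tests on every push without any secrets configured.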

LikeHui92 · 2026-05-19T06:02:09Z

@lzongren Addressed the comments, thanks for the review!

add test data generator and generated test samples for evaluation

ec67893

LikeHui92 requested a review from lzongren May 15, 2026 01:14

github-code-quality Bot found potential problems May 15, 2026

lzongren requested changes May 15, 2026

LikeHui92 and others added 9 commits May 14, 2026 23:05

Potential fix for pull request finding 'Wrong name for an argument in…

bca00fd

… a call' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

Potential fix for pull request finding 'Wrong number of arguments in …

6d23e4f

…a call' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

Potential fix for pull request finding 'Variable defined multiple times'

c5621ed

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

Potential fix for pull request finding 'Use of exit() or quit()'

6c3610a

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

Potential fix for pull request finding 'Unused local variable'

55fe254

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

Potential fix for pull request finding 'Explicit returns mixed with i…

95215b5

…mplicit (fall through) returns' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

Potential fix for pull request finding 'Module is imported more than …

205e7ce

…once' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

Potential fix for pull request finding 'Unused import'

25ee6d6

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

Potential fix for pull request finding 'Unused import'

0212d43

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

github-code-quality Bot found potential problems May 15, 2026

Comment thread evaluation/test_data_generator/intelligent_generator.py Fixed

Comment thread evaluation/test_data_generator/cli.py Fixed

LikeHui92 and others added 2 commits May 14, 2026 23:58

Potential fix for pull request finding 'Unused import'

cf99e94

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

Potential fix for pull request finding 'Empty except'

965fc93

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

lzongren requested changes May 15, 2026

added unit tests for test data generator, fixed duplicated dead code …

ceb58e6

…in cli.py and import issues in example.py, Remove generated_test_data from git tracking

github-code-quality Bot found potential problems May 16, 2026

Comment thread evaluation/test_data_generator/test_basic.py Fixed

Comment thread evaluation/test_data_generator/test_basic.py Fixed

Comment thread evaluation/test_data_generator/test_units.py Fixed

LikeHui92 force-pushed the feature/add_evaluation branch from 768bd0c to ceb58e6 Compare May 16, 2026 17:14

github-code-quality Bot found potential problems May 16, 2026

Comment thread evaluation/test_data_generator/test_basic.py Fixed

Comment thread evaluation/test_data_generator/test_basic.py Fixed

Comment thread evaluation/test_data_generator/test_units.py Fixed

LikeHui92 and others added 2 commits May 16, 2026 10:20

Potential fix for pull request finding 'Unused import'

359a482

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

Potential fix for pull request finding 'Unused import'

ac34ea3

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

LikeHui92 changed the title from "add test data generator and generated test samples for evaluation" to "add test data generator for evaluation" on May 17, 2026

Merge main into feature/add_evaluation and resolve conflicts

ea73e9a

Resolved conflict in .gitignore by accepting main branch changes. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

lzongren reviewed May 18, 2026

resolved issues in the evaluation directory: 1. added dependency mani…

4c48186

…fest, 2. fixed directory naming consistency, 3. removed duplicated JSON fields and so on

lzongren reviewed May 19, 2026

added pyproject.toml and CI job example, compacted docs under test_da…

5bb3ada

…ta_generator and addressed str bugs
