feat: add eval summary report #112

pedroslopez · 2025-10-03T07:26:01Z

This pull request introduces a new CLI for managing connector builder evaluations, refactors the evaluation workflow to use this CLI, and improves the reporting of experiment results. The changes streamline how evaluations are run and how experiment summaries are generated and accessed.

Example: https://github.com/airbytehq/connector-builder-mcp/actions/runs/18216372660/attempts/1#summary-51866377857

CLI Introduction and Workflow Refactor:

Added a new CLI module connector_builder_agents/src/evals/cli.py to manage connector builder evaluations, supporting both running all evaluations and generating experiment reports.
Updated references throughout the codebase and workflow to use the new CLI commands (poe evals run and poe evals report <experiment_id>) instead of the previous run-evals command. This includes changes in .github/workflows/run-evals.yml, connector_builder_agents/src/evals/phoenix_run.py, and poe_tasks.toml.
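
For readers who have not opened the diff, a CLI with the two subcommands described above could be laid out roughly as follows. This is only a sketch using the standard-library argparse module. The actual cli.py may use a different framework, and the function names, help strings, and return values here are hypothetical placeholders.

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with `run` and `report` subcommands (hypothetical layout)."""
    parser = argparse.ArgumentParser(
        prog="evals",
        description="Manage connector builder evaluations.",
    )
    subparsers = parser.add_subparsers(dest="command", required=True)

    # `evals run`: run all evaluations.
    subparsers.add_parser("run", help="Run all evaluations.")

    # `evals report <experiment_id>`: generate a report for one experiment.
    report = subparsers.add_parser("report", help="Generate an experiment report.")
    report.add_argument("experiment_id", help="ID of the experiment to summarize.")
    return parser


def main(argv: list[str] | None = None) -> str:
    """Dispatch to a subcommand; the returned strings stand in for real work."""
    args = build_parser().parse_args(argv)
    if args.command == "run":
        return "run"  # placeholder for running all evaluations
    return f"report:{args.experiment_id}"  # placeholder for report generation
```

With a layout like this, poe evals run and poe evals report <experiment_id> would map to main(["run"]) and main(["report", "<experiment_id>"]) respectively.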

Experiment Reporting Improvements:

Integrated markdown summary generation for experiment results, making it easier to access and share evaluation outcomes. The summary is now generated automatically after running an experiment and can also be produced on demand via the CLI.
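
As an illustration of what markdown summary generation can look like, here is a minimal sketch. The function names, the table columns, the evaluator names, and the summary.md file name are all assumptions for illustration and are not taken from the PR. GITHUB_STEP_SUMMARY is the environment variable GitHub Actions provides for job summaries. Whether this PR writes to it directly is an assumption based on the linked example run.

```python
from __future__ import annotations

import os
from pathlib import Path


def render_summary(experiment_id: str, scores: dict[str, float]) -> str:
    """Render a small markdown table of per-evaluator scores (hypothetical shape)."""
    lines = [
        f"## Experiment {experiment_id}",
        "",
        "| Evaluator | Mean score |",
        "| --- | --- |",
    ]
    for name, score in sorted(scores.items()):
        lines.append(f"| {name} | {score:.2f} |")
    return "\n".join(lines) + "\n"


def publish_summary(markdown: str, results_dir: Path) -> Path:
    """Write the summary to disk and, when running in GitHub Actions, to the job summary."""
    results_dir.mkdir(parents=True, exist_ok=True)
    out = results_dir / "summary.md"  # hypothetical file name
    out.write_text(markdown, encoding="utf-8")

    # GitHub Actions exposes a file path here; appending markdown to it
    # makes the content appear on the workflow run's summary page.
    step_summary = os.environ.get("GITHUB_STEP_SUMMARY")
    if step_summary:
        with open(step_summary, "a", encoding="utf-8") as fh:
            fh.write(markdown)
    return out
```

Keeping rendering separate from publishing means the same markdown can back both the automatic post-run summary and the on-demand report command.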

Internal Refactoring:

Updated the experiment run logic to capture the returned experiment object, enabling downstream reporting and summary generation.
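
The pattern described here, keeping the return value of the run call so later steps can use it, can be sketched as below. The Experiment class and both functions are hypothetical stand-ins, not the objects or calls used in phoenix_run.py.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    """Hypothetical stand-in for the object returned by an experiment run."""

    id: str


def run_experiment() -> Experiment:
    """Hypothetical runner; a real one would execute the evaluations."""
    return Experiment(id="exp-001")


def run_and_report() -> str:
    # Capture the returned object instead of discarding it, so that
    # downstream reporting can refer to the experiment by ID.
    experiment = run_experiment()
    return f"Summary for experiment {experiment.id}"
```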

github-actions · 2025-10-03T07:26:13Z

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This Branch via MCP

To test the changes in this specific branch with an MCP client like Claude Desktop, use the following configuration:

{
  "mcpServers": {
    "connector-builder-mcp-dev": {
      "command": "uvx",
      "args": ["--from", "git+https://github.com/airbytehq/connector-builder-mcp.git@pedro/eval-summary-report", "connector-builder-mcp"]
    }
  }
}

Testing This Branch via CLI

You can test this version of the MCP Server using the following CLI snippet:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/connector-builder-mcp.git@pedro/eval-summary-report#egg=airbyte-connector-builder-mcp' --help

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

/autofix - Fixes most formatting and linting issues
/poe <command> - Runs any poe command in the uv virtual environment
/poe build-connector prompt="Star Wars API" - Run the connector builder using the Star Wars API.

github-actions · 2025-10-03T07:30:48Z

PyTest Results (Fast)

0 tests ±0 | 0 ✅ ±0 | 0s ⏱️ ±0s
0 suites ±0 | 0 💤 ±0
0 files ±0 | 0 ❌ ±0

Results for commit d7ac731. ± Comparison against base commit c1c91b6.

♻️ This comment has been updated with latest results.

aaronsteers · 2025-10-07T21:14:50Z

.gitignore

 # Generated files
 ai-generated-files/
 docs/generated/
+connector_builder_agents/src/evals/results/


Optionally, we could put generated/ in the path, or use another path segment that can be ignored globally, such as test-reports or eval-reports.

aaronsteers

Looking good! 🚀

pedroslopez added 4 commits October 3, 2025 03:09

feat: add eval summary report

4a02cf1

evals as cli

c99bec2

update eun-evals

ea6fdfb

github summary supposedly

1370135

pedroslopez added 2 commits October 3, 2025 03:27

Merge branch 'main' into pedro/eval-summary-report

85ffe1a

fmt

57a0cac

pedroslopez added 2 commits October 3, 2025 03:50

better visibility of errors

218b315

format

d7ac731

pedroslopez changed the title ~~Pedro/eval summary report~~ feat: add eval summary report Oct 3, 2025

pedroslopez requested a review from aaronsteers October 3, 2025 18:43

pedroslopez marked this pull request as ready for review October 3, 2025 18:43

aaronsteers reviewed Oct 7, 2025

aaronsteers approved these changes Oct 7, 2025

pedroslopez merged commit daef9f9 into main Oct 8, 2025
23 checks passed

pedroslopez deleted the pedro/eval-summary-report branch October 8, 2025 14:59
