Contributor

@pedroslopez pedroslopez commented Oct 3, 2025

This pull request introduces a new CLI for managing connector builder evaluations, refactors the evaluation workflow to use this CLI, and improves the reporting of experiment results. The changes streamline how evaluations are run and how experiment summaries are generated and accessed.

Example: https://github.com/airbytehq/connector-builder-mcp/actions/runs/18216372660/attempts/1#summary-51866377857

CLI Introduction and Workflow Refactor:

  • Added a new CLI module connector_builder_agents/src/evals/cli.py to manage connector builder evaluations, supporting both running all evaluations and generating experiment reports.
  • Updated references throughout the codebase and workflow to use the new CLI commands (poe evals run and poe evals report <experiment_id>) instead of the previous run-evals command. This includes changes in .github/workflows/run-evals.yml, connector_builder_agents/src/evals/phoenix_run.py, and poe_tasks.toml.
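
For readers who have not opened the diff, the two-subcommand shape described above can be sketched roughly as follows. This is an illustration only, not the contents of cli.py: the use of argparse, the function names build_parser and main, and the returned strings are all assumptions made for the example. Only the run and report <experiment_id> command names come from this PR.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with `run` and `report <experiment_id>` subcommands."""
    parser = argparse.ArgumentParser(
        prog="evals",
        description="Manage connector builder evaluations (illustrative sketch).",
    )
    subparsers = parser.add_subparsers(dest="command", required=True)

    # `run` takes no arguments in this sketch: it runs every evaluation.
    subparsers.add_parser("run", help="Run all evaluations")

    # `report` needs the ID of an experiment that already exists.
    report = subparsers.add_parser(
        "report", help="Generate a report for an existing experiment"
    )
    report.add_argument("experiment_id", help="ID of the experiment to report on")
    return parser


def main(argv: Optional[List[str]] = None) -> str:
    """Dispatch to a (placeholder) handler and return a label describing it."""
    args = build_parser().parse_args(argv)
    if args.command == "run":
        # Placeholder: the real command would start the evaluation run here.
        return "run"
    # Placeholder: the real command would build the report here.
    return f"report:{args.experiment_id}"


if __name__ == "__main__":
    print(main())
```

Keeping the dispatch in a main(argv) function that accepts an explicit argument list makes the CLI callable from tests and from a task runner alike.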

Experiment Reporting Improvements:

  • Integrated markdown summary generation for experiment results, making it easier to access and share evaluation outcomes. The summary is now generated automatically after running an experiment and can also be produced on demand via the CLI.
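
As a rough idea of what a markdown summary could look like, here is a minimal sketch. It assumes the results are available as a mapping from evaluator name to a list of numeric scores. That data shape, the render_summary name, and the table columns are hypothetical and are not taken from this PR.

```python
from statistics import mean
from typing import Dict, List


def render_summary(experiment_id: str, scores: Dict[str, List[float]]) -> str:
    """Render per-evaluator mean scores as a small markdown table."""
    lines = [
        f"# Experiment {experiment_id}",
        "",
        "| Evaluator | Runs | Mean score |",
        "| --- | --- | --- |",
    ]
    # Sort evaluator names so the output is stable between runs.
    for name in sorted(scores):
        values = scores[name]
        average = mean(values) if values else 0.0
        lines.append(f"| {name} | {len(values)} | {average:.2f} |")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_summary("example-experiment", {"readiness": [1.0, 0.5]}))
```

Returning a string rather than writing a file directly lets the same function serve both the automatic post-run step and the on-demand report command.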

Internal Refactoring:

  • Updated the experiment run logic to capture the returned experiment object, enabling downstream reporting and summary generation.
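
The point of capturing the return value is that later steps need a handle on the experiment. A sketch of that flow, under stated assumptions: the Experiment dataclass and the run_experiment and publish_summary functions are hypothetical stand-ins, and appending to the file named by GITHUB_STEP_SUMMARY (the standard GitHub Actions job-summary mechanism) is a guess at how a summary might reach the run page, not something this description confirms.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class Experiment:
    """Hypothetical stand-in for the object returned by the experiment runner."""

    id: str


def run_experiment() -> Experiment:
    # Placeholder: a real runner would execute the evaluations here.
    return Experiment(id="example-experiment")


def publish_summary(experiment: Experiment, summary_path: Optional[str] = None) -> str:
    """Build a short summary and append it to a file when a target is known."""
    summary = f"## Evaluation summary\n\nExperiment ID: {experiment.id}\n"
    # Fall back to the GitHub Actions job summary file if no path is given.
    target = summary_path or os.environ.get("GITHUB_STEP_SUMMARY")
    if target:
        with Path(target).open("a", encoding="utf-8") as handle:
            handle.write(summary)
    return summary


if __name__ == "__main__":
    # Previously the return value would have been discarded; keeping it
    # makes the experiment available to the reporting step.
    experiment = run_experiment()
    print(publish_summary(experiment))
```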

github-actions bot commented Oct 3, 2025

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This Branch via MCP

To test the changes in this specific branch with an MCP client like Claude Desktop, use the following configuration:

{
  "mcpServers": {
    "connector-builder-mcp-dev": {
      "command": "uvx",
      "args": ["--from", "git+https://github.com/airbytehq/connector-builder-mcp.git@pedro/eval-summary-report", "connector-builder-mcp"]
    }
  }
}

Testing This Branch via CLI

You can test this version of the MCP Server using the following CLI snippet:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/connector-builder-mcp.git@pedro/eval-summary-report#egg=airbyte-connector-builder-mcp' --help

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poe <command> - Runs any poe command in the uv virtual environment
  • /poe build-connector prompt="Star Wars API" - Run the connector builder using the Star Wars API.

github-actions bot commented Oct 3, 2025

PyTest Results (Fast)

0 tests  ±0   0 ✅ ±0   0s ⏱️ ±0s
0 suites ±0   0 💤 ±0 
0 files   ±0   0 ❌ ±0 

Results for commit d7ac731. ± Comparison against base commit c1c91b6.

♻️ This comment has been updated with latest results.

@pedroslopez pedroslopez changed the title from "Pedro/eval summary report" to "feat: add eval summary report" Oct 3, 2025
@pedroslopez pedroslopez marked this pull request as ready for review October 3, 2025 18:43
# Generated files
ai-generated-files/
docs/generated/
connector_builder_agents/src/evals/results/
Contributor

@aaronsteers aaronsteers Oct 7, 2025

Optionally, we can use generated/ in the path or another slug that could be globally ignored, like test-reports or eval-reports as a path part.

Contributor

@aaronsteers aaronsteers left a comment

Looking good! 🚀

@pedroslopez pedroslopez merged commit daef9f9 into main Oct 8, 2025
23 checks passed
@pedroslopez pedroslopez deleted the pedro/eval-summary-report branch October 8, 2025 14:59