feat: add eval summary report #112
👋 Greetings, Airbyte Team Member!
Here are some helpful tips and reminders for your convenience.

Testing This Branch via MCP
To test the changes in this specific branch with an MCP client like Claude Desktop, use the following configuration:

    {
      "mcpServers": {
        "connector-builder-mcp-dev": {
          "command": "uvx",
          "args": ["--from", "git+https://github.com/airbytehq/connector-builder-mcp.git@pedro/eval-summary-report", "connector-builder-mcp"]
        }
      }
    }

Testing This Branch via CLI
You can test this version of the MCP Server using the following CLI snippet:

    # Run the CLI from this branch:
    uvx 'git+https://github.com/airbytehq/connector-builder-mcp.git@pedro/eval-summary-report#egg=airbyte-connector-builder-mcp' --help

PR Slash Commands
Airbyte Maintainers can execute the following slash commands on your PR:
# Generated files
ai-generated-files/
docs/generated/
connector_builder_agents/src/evals/results/
Optionally, we can use generated/ in the path or another slug that could be globally ignored, like test-reports or eval-reports, as a path part.
Looking good! 🚀
This pull request introduces a new CLI for managing connector builder evaluations, refactors the evaluation workflow to use this CLI, and improves the reporting of experiment results. The changes streamline how evaluations are run and how experiment summaries are generated and accessed.
Example: https://github.com/airbytehq/connector-builder-mcp/actions/runs/18216372660/attempts/1#summary-51866377857

CLI Introduction and Workflow Refactor:
A new CLI in connector_builder_agents/src/evals/cli.py manages connector builder evaluations, supporting both running all evaluations and generating experiment reports.
The evaluation workflow now uses the new CLI commands (poe evals run and poe evals report <experiment_id>) instead of the previous run-evals command. This includes changes in .github/workflows/run-evals.yml, connector_builder_agents/src/evals/phoenix_run.py, and poe_tasks.toml. [1] [2] [3]

Experiment Reporting Improvements:

Internal Refactoring:
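
For readers who want a picture of the two CLI entry points named under "CLI Introduction and Workflow Refactor" (run all evaluations, and report on one experiment), here is a minimal sketch. It is not the code from connector_builder_agents/src/evals/cli.py. The use of argparse is an assumption, and build_parser, run_all_evals, generate_report, and the strings they return are hypothetical placeholders.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with two subcommands: `run` and `report <experiment_id>`."""
    parser = argparse.ArgumentParser(
        prog="evals",
        description="Illustrative evals CLI (hypothetical, not the PR's code).",
    )
    subparsers = parser.add_subparsers(dest="command", required=True)

    # `run` takes no arguments in this sketch.
    subparsers.add_parser("run", help="Run all evaluations")

    # `report` takes a single positional experiment ID.
    report = subparsers.add_parser("report", help="Generate a report for one experiment")
    report.add_argument("experiment_id", help="ID of the experiment to summarize")
    return parser


def run_all_evals() -> str:
    # Hypothetical placeholder: a real implementation would launch the evaluations.
    return "ran all evaluations"


def generate_report(experiment_id: str) -> str:
    # Hypothetical placeholder: a real implementation would fetch and summarize results.
    return f"report for experiment {experiment_id}"


def main(argv: Optional[Sequence[str]] = None) -> str:
    """Parse arguments and dispatch to the matching placeholder handler."""
    args = build_parser().parse_args(argv)
    if args.command == "run":
        return run_all_evals()
    return generate_report(args.experiment_id)
```

Calling main(["run"]) or main(["report", "<experiment_id>"]) mirrors the shape of poe evals run and poe evals report <experiment_id>. How poe_tasks.toml wires those tasks to the real module is not shown here.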