feat: add evals using arize phoenix #91
👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This Branch via MCP

To test the changes in this specific branch with an MCP client like Claude Desktop, use the following configuration:

    {
      "mcpServers": {
        "connector-builder-mcp-dev": {
          "command": "uvx",
          "args": ["--from", "git+https://github.com/airbytehq/connector-builder-mcp.git@pedro/evals", "connector-builder-mcp"]
        }
      }
    }

Testing This Branch via CLI

You can test this version of the MCP Server using the following CLI snippet:

    # Run the CLI from this branch:
    uvx 'git+https://github.com/airbytehq/connector-builder-mcp.git@pedro/evals#egg=airbyte-connector-builder-mcp' --help

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:
PyTest Results (Full)

0 tests, 0 ✅, 0s ⏱️

Results for commit 8866540. ♻️ This comment has been updated with latest results.
31ab486 to 83fa599
# Conflicts: # connector_builder_agents/src/run.py
Everything here looks good to me - especially for a first iteration, I think this is strong. 💪
Approving for merge when ready - or re-request my review if you make substantive changes that need another pair of eyes. Thanks!
Not a blocker for the PR, but we can use this space for eval module-level docs. Specifically, the docstring we add here will get rendered in the autogenerated pdoc docs.
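To illustrate the suggestion above: a module-level docstring is the first string literal in a file, and it is what documentation generators such as pdoc pick up for the module's page. The sketch below uses a made-up module source (the wording and the `run_evals` function are hypothetical, not taken from this PR) and checks with the standard library which text counts as the module docstring.

```python
import ast

# Hypothetical module source; the docstring wording is an assumption.
MODULE_SOURCE = '''"""Evaluation framework for the connector builder agents.

Explain here how datasets are loaded, which evaluators run,
and how to start an evaluation.
"""

def run_evals():
    """Placeholder entry point (hypothetical)."""
'''

# ast.get_docstring returns the first string literal of the module,
# i.e. the text that ends up in the module's __doc__ attribute.
module_doc = ast.get_docstring(ast.parse(MODULE_SOURCE))
summary_line = module_doc.splitlines()[0]
print(summary_line)
```

The first line acts as the summary, so it is worth keeping it short and self-contained.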
Evaluation framework and experiment automation:

- `phoenix_run.py`: module that automates connector builder evaluations using the Phoenix framework, including experiment orchestration, dataset management, and evaluator integration.
- `dataset.py`: loads and manages evaluation datasets from a YAML config, including Phoenix dataset creation and hashing for versioning.
- `evaluators.py`: LLM-based readiness and stream coverage evaluators.
- `task.py`: defines the connector build task for experiments, including artifact collection and result formatting.
- `helpers.py`: reads artifacts from the workspace directory.
- `data/connectors.yaml`.

Agent build pipeline refactor:

- `run_connector_build` and `run_manager_developer_build` in `run.py` now return lists of `RunResult` objects instead of `None`, enabling collection and evaluation of build results. They also handle errors gracefully and return partial results if interrupted.

Dependency and CLI updates:

- `pyproject.toml`.
- `run-evals` in `poe_tasks.toml`: runs the evaluation workflow.
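The "hashing for versioning" idea mentioned for the dataset loader can be sketched as follows. This is only an illustration under stated assumptions: the config is taken to be already parsed from YAML into a dict, and the `dataset_hash` helper, the config fields, and the dataset name prefix are all hypothetical rather than the PR's actual code.

```python
import hashlib
import json


def dataset_hash(config: dict) -> str:
    """Return a short, stable hash of a parsed dataset config (hypothetical helper)."""
    # Serialize with sorted keys so that key order does not change the hash.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]


# Hypothetical config contents, standing in for the parsed YAML file.
config = {
    "connectors": [
        {"name": "example-api", "expected_streams": ["users", "posts"]},
    ]
}

# Embedding the hash in the name means an edited config yields a new dataset
# name, while an unchanged config maps back to the same one.
dataset_name = f"builder-connectors-{dataset_hash(config)}"
print(dataset_name)
```

With this approach, whether to create a new dataset or reuse an existing one can be decided by name alone.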
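The pipeline refactor, returning a list of results and keeping partial results on interruption, could look roughly like the sketch below. The `RunResult` dataclass here is a hypothetical stand-in for whatever type the build actually returns, and `run_phases` is an illustrative name, not a function from `run.py`.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Hypothetical stand-in for the result type returned by a build run."""
    phase: str
    output: str


def run_phases(phases):
    """Run (name, callable) pairs in order and return all results gathered so far."""
    results = []
    try:
        for name, phase in phases:
            results.append(RunResult(phase=name, output=phase()))
    except KeyboardInterrupt:
        # Interrupted: stop early but keep the phases that already finished.
        print("Interrupted; returning partial results.")
    except Exception as exc:
        # A failing phase ends the run without discarding earlier results.
        print(f"Build failed: {exc}")
    return results


def interrupted():
    # Simulates the user pressing Ctrl+C during the second phase.
    raise KeyboardInterrupt


partial = run_phases([
    ("plan", lambda: "ok"),
    ("build", interrupted),
    ("test", lambda: "ok"),
])
print([r.phase for r in partial])
```

Because the function always returns a list, a caller such as an evaluation task can score whatever was produced instead of receiving `None` after a failure.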