pedroslopez
Contributor

@pedroslopez pedroslopez commented Sep 23, 2025

Evaluation framework and experiment automation:

  • Added a new phoenix_run.py module to automate connector builder evaluations using the Phoenix framework, including experiment orchestration, dataset management, and evaluator integration.
  • Added dataset.py to load and manage evaluation datasets from a YAML config, including Phoenix dataset creation and hashing for versioning.
  • Introduced evaluators.py with LLM-based readiness and stream coverage evaluators.
  • Added task.py to define the connector build task for experiments, including artifact collection and result formatting.
  • Added helpers.py for reading artifacts from the workspace directory.
  • Created a YAML dataset listing connectors and expected streams for evaluation in data/connectors.yaml.
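
As a rough illustration of the dataset hashing and stream coverage ideas listed above, here is a minimal sketch. It is not the code from dataset.py or evaluators.py: the function names, the entry shape, and the example connector are all hypothetical, and the coverage check is a deterministic stand-in rather than the LLM-based evaluators the PR describes.

```python
import hashlib
import json


def dataset_hash(connectors: list[dict]) -> str:
    """Return a short, stable hash of the dataset contents (hypothetical helper)."""
    # Canonical JSON (sorted keys, fixed separators) so equal content hashes equally.
    canonical = json.dumps(connectors, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]


def stream_coverage(expected: list[str], built: list[str]) -> float:
    """Fraction of expected streams that appear in the built connector."""
    if not expected:
        return 1.0  # nothing was expected, so nothing is missing
    expected_set = set(expected)
    return len(expected_set & set(built)) / len(expected_set)


# Hypothetical entry, shaped like what a parsed data/connectors.yaml might contain.
connectors = [
    {"name": "example-api", "expected_streams": ["users", "orders", "products"]},
]

version = dataset_hash(connectors)
score = stream_coverage(connectors[0]["expected_streams"], ["users", "orders"])
print(f"dataset version {version}, stream coverage {score:.2f}")
```

Hashing the canonical form means that editing the dataset yields a new version string, which is one way to keep results from different dataset revisions apart.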

Agent build pipeline refactor:

  • Refactored run_connector_build and run_manager_developer_build in run.py to return lists of RunResult objects instead of None, enabling collection and evaluation of build results. Both functions now handle errors gracefully and return partial results if interrupted.
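
The "return partial results" behavior described above could look roughly like the following sketch. The RunResult fields, the run_phases function, and the phase names are assumptions made for illustration; the actual definitions in run.py are not shown in this description.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    # Hypothetical stand-in: the real RunResult's fields are not shown here.
    phase: str
    output: str


def run_phases(phases) -> list[RunResult]:
    """Run each (name, callable) pair in order and return whatever completed."""
    results: list[RunResult] = []
    try:
        for name, fn in phases:
            results.append(RunResult(phase=name, output=fn()))
    except KeyboardInterrupt:
        # Interrupted by the user: keep what has been collected so far.
        print("Interrupted; returning partial results.")
    except Exception as exc:
        # A failed phase stops the run but does not discard earlier results.
        print(f"Build failed ({exc}); returning partial results.")
    return results


def failing_phase() -> str:
    raise RuntimeError("simulated failure")


partial = run_phases([("plan", lambda: "ok"), ("build", failing_phase), ("test", lambda: "ok")])
print([r.phase for r in partial])
```

Returning a list in every code path, rather than None, lets a caller such as an evaluation task inspect the completed phases without special-casing failures.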

Dependency and CLI updates:

  • Added required dependencies for Phoenix evaluation, pandas, YAML, and OpenInference instrumentation in pyproject.toml.
  • Added a new CLI task run-evals in poe_tasks.toml to run the evaluation workflow.

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This Branch via MCP

To test the changes in this specific branch with an MCP client like Claude Desktop, use the following configuration:

{
  "mcpServers": {
    "connector-builder-mcp-dev": {
      "command": "uvx",
      "args": ["--from", "git+https://github.com/airbytehq/connector-builder-mcp.git@pedro/evals", "connector-builder-mcp"]
    }
  }
}

Testing This Branch via CLI

You can test this version of the MCP Server using the following CLI snippet:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/connector-builder-mcp.git@pedro/evals#egg=airbyte-connector-builder-mcp' --help

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poe <command> - Runs any poe command in the uv virtual environment
  • /poe build-connector prompt="Star Wars API" - Run the connector builder using the Star Wars API.

github-actions bot commented Sep 23, 2025

PyTest Results (Full)

0 tests   0 ✅  0s ⏱️
0 suites  0 💤
0 files    0 ❌

Results for commit 8866540.

♻️ This comment has been updated with latest results.

github-actions bot commented Sep 23, 2025

PyTest Results (Fast)

0 tests  ±0   0 ✅ ±0   0s ⏱️ ±0s
0 suites ±0   0 💤 ±0 
0 files   ±0   0 ❌ ±0 

Results for commit 8866540. ± Comparison against base commit 6624d45.

♻️ This comment has been updated with latest results.

@pedroslopez pedroslopez changed the base branch from main to pedro/no-globals September 23, 2025 20:10
@pedroslopez pedroslopez changed the title from "wip" to "feat: add evals using arize phoenix" Sep 26, 2025
@github-actions github-actions bot added the enhancement New feature or request label Sep 26, 2025
Contributor

@aaronsteers aaronsteers left a comment

Everything here looks good to me - especially for a first iteration, I think this is strong. 💪

Approving for merge when ready - or re-request my review if you make substantive changes that need another pair of eyes. Thanks!

Contributor

Not a blocker for the PR, but we can use this space for eval module-level docs. Specifically, the docstring we add here will get rendered in the autogenerated pdoc docs.
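
For context, a module-level docstring is the first string literal in a module's source, and documentation generators such as pdoc render it as the module's description. The sketch below uses the standard library's ast module to show which text would be picked up. The docstring wording is a hypothetical example, not text from this PR, and the file the comment refers to is not shown here.

```python
import ast

# Hypothetical module source; only the leading string literal counts as the docstring.
MODULE_SOURCE = '''"""Evaluation framework for connector builder agents.

Runs connector build experiments and scores the results.
"""

from pathlib import Path
'''

# ast.get_docstring returns the cleaned module docstring, or None if there is none.
docstring = ast.get_docstring(ast.parse(MODULE_SOURCE))
print(docstring.splitlines()[0])
```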

@pedroslopez pedroslopez marked this pull request as ready for review September 26, 2025 22:24
Base automatically changed from pedro/no-globals to main September 30, 2025 03:31
# Conflicts:
#	connector_builder_agents/src/run.py
@pedroslopez pedroslopez merged commit 0dacc73 into main Sep 30, 2025
15 checks passed
@pedroslopez pedroslopez deleted the pedro/evals branch September 30, 2025 16:02

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

🧪 Tests: Implement an evals framework to measure success and performance

2 participants